Oxford University Press

English Language Teaching Global Blog



Adaptive testing in ELT with Colin Finnerty | ELTOC 2020

OUP offers a suite of English language tests: the Oxford Online Placement Test (for adults), the Oxford Young Learners Placement Test, the Oxford Test of English (a proficiency test for adults) and, from April 2020, the Oxford Test of English for Schools. What’s the one thing that unites all these tests (apart from them being brilliant!)? Well, they are all adaptive tests. In this blog post, we’ll dip our toes into the topic of adaptive testing, which I’ll be exploring in more detail in my ELTOC session. If you like this post, be sure to come along to the session.

The first standardized tests

Imagine the scene. A test taker walks nervously into the exam room, hands in any forbidden items to the invigilator (a bag, mobile phone, notepad, and so on) and is escorted to a randomly allocated desk, separated from other desks to prevent copying. The test taker completes a multiple-choice test, anonymised to protect against potential bias from the person marking it, all under the watchful eyes of the invigilators. Sound familiar? Now imagine this isn’t happening today, but almost one and a half thousand years ago.

The first recorded standardised test dates back to the year 606. A large-scale, high-stakes exam for the Chinese civil service, it pioneered many of the examination procedures that we take for granted today. And while the system had many features we would shy away from today (the tests were so long that people died while trying to finish them), this approach to standardised testing lasted some 1,300 years until it came to an end in 1905. Coincidentally, that same year the next great innovation in testing was introduced by the French polymath Alfred Binet.

A revolution in testing

Binet was an accomplished academic. His research included investigations into palmistry, the mnemonics of chess players, and experimental psychology. But perhaps his best-known contribution is the IQ test. The test broke new ground, not only because it was the first attempt to measure intelligence, but also because it was the first ever adaptive test. Adaptive testing was an innovation well ahead of its time, and it was another 100 years before it became widely available. But why? To answer this, we first need to explore how traditional paper-based tests work.

The problem with paper-based tests

We’ve all done paper-based tests: everyone gets the same paper of, say, 100 questions. You then get a score out of 100 depending on how many questions you got right. These tests are known as ‘linear tests’ because everyone answers the same questions in the same order. It’s worth noting that many computer-based tests are actually linear, often being just paper-based tests which have been put onto a computer.

But how are these linear tests constructed? Well, they focus on “maximising internal consistency reliability by selecting items (questions) that are of average difficulty and high discrimination” (Weiss, 2011). Let’s unpack what that means with an illustration. Imagine a CEFR B1 paper-based English language test. Most of the items will be around the ‘middle’ of the B1 level, with fewer questions at either the lower or higher end of the B1 range. While this approach provides precise measurements for test takers in the middle of the B1 range, test takers at the extremes will be asked fewer questions at their level, and therefore receive a less precise score. That’s a very inefficient way to measure, and is a missed opportunity to offer a more accurate picture of the true ability of the test taker.
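To make that selection principle a little more concrete, here’s a minimal sketch in Python. Everything in it (the item bank, the field names, the values) is invented for illustration; real test construction involves far more elaborate psychometric modelling.

```python
# A toy item bank with classical item statistics. 'difficulty' is the
# proportion of test takers answering the item correctly (its p-value);
# 'discrimination' is the item-total correlation. All values are invented.
item_bank = [
    {"id": 1, "difficulty": 0.52, "discrimination": 0.45},
    {"id": 2, "difficulty": 0.48, "discrimination": 0.38},
    {"id": 3, "difficulty": 0.90, "discrimination": 0.15},  # too easy, weak
    {"id": 4, "difficulty": 0.10, "discrimination": 0.20},  # too hard, weak
    {"id": 5, "difficulty": 0.55, "discrimination": 0.50},
]

def select_linear_items(bank, n_items=3):
    """Favour items of average difficulty (p close to 0.5) and high discrimination."""
    ranked = sorted(
        bank,
        key=lambda item: (abs(item["difficulty"] - 0.5), -item["discrimination"]),
    )
    return ranked[:n_items]

print([item["id"] for item in select_linear_items(item_bank)])  # [1, 2, 5]
```

In this toy example, the items clustered around average difficulty win out, and the very easy and very hard items never make it onto the paper, which is exactly why test takers at the extremes end up with fewer questions at their level.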

Standard Error of Measurement

Now we’ll develop this idea further. The concept of Standard Error of Measurement (SEM), from Classical Test Theory, is that whenever we measure a latent trait such as language ability or IQ, the measurement will always contain some error. To illustrate, imagine giving the same test to the same test taker on two consecutive days (magically erasing their memory of the first test before the second to avoid practice effects). While their ‘True Score’ (i.e. underlying ability) would remain unchanged, the two measurements would almost certainly show some variation. SEM is a statistical measure of that variation. The smaller the variation, the more reliable the test score is likely to be. Now, applying this concept to the paper-based test example in the previous section, what we will see is that SEM will be higher for test takers at both the lower and higher extremes of the B1 range.
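In Classical Test Theory, the observed score is modelled as the True Score plus error, and SEM is typically estimated from the test’s score standard deviation and its reliability coefficient (SEM = SD × √(1 − reliability)). Here is a minimal sketch of that calculation, with numbers invented purely for illustration:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical Test Theory estimate: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Invented example: a test whose scores have a standard deviation of 8 points
# and a reliability coefficient of 0.91.
sem = standard_error_of_measurement(sd=8.0, reliability=0.91)
print(round(sem, 2))  # 2.4 -> observed scores typically fall within roughly
                      # +/- 2.4 points of the test taker's True Score
```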

Back to our B1 paper-based test example. In Figure 1, the horizontal axis of the graph shows B1 test scores going from low to high, and the vertical axis shows increasing SEM. The higher the SEM, the less precise the measurement. The dotted line illustrates the SEM. We can see that a test taker in the middle of the B1 range will have a low SEM, which means they are getting a precise score. However, measurements for test takers at the low and high ends of the B1 range are less precise.

Aren’t we supposed to treat all test takers the same?

Figure 1. SEM (vertical axis) against B1 test score (horizontal axis) for a linear test.

How computer-adaptive tests work

So how are computer-adaptive tests different? Well, unlike linear tests, computer-adaptive tests have a bank of hundreds of questions which have been calibrated with different difficulties. The questions are presented to the test taker based on a sophisticated algorithm, but in simple terms, if the test taker answers the question correctly, they are presented with a more difficult question; if they answer incorrectly, they are presented with a less difficult question. And so it goes until the end of the test when a ‘final ability estimate’ is produced and the test taker is given a final score.
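The item-selection algorithms used in operational tests are considerably more sophisticated than this, but the basic ‘harder if right, easier if wrong’ logic can be sketched in a few lines. Everything below (the item difficulties, the step sizes, the answer_correctly function) is an invented simplification, not a description of any real test:

```python
import random

# A toy calibrated item bank: each value is an item difficulty on an
# arbitrary ability scale. All numbers here are invented for illustration.
item_bank = sorted(random.uniform(-3, 3) for _ in range(200))

def run_adaptive_test(answer_correctly, n_questions=20):
    """Very simplified adaptive loop: harder after a correct answer, easier after a wrong one."""
    bank = list(item_bank)
    ability_estimate = 0.0            # start in the middle of the scale
    step = 1.0
    for _ in range(n_questions):
        # Present the remaining item closest in difficulty to the current estimate.
        difficulty = min(bank, key=lambda d: abs(d - ability_estimate))
        bank.remove(difficulty)
        if answer_correctly(difficulty):
            ability_estimate += step  # correct -> try something harder
        else:
            ability_estimate -= step  # incorrect -> try something easier
        step = max(step * 0.8, 0.2)   # take smaller steps as the test converges
    return ability_estimate           # the 'final ability estimate'

# Simulate a test taker whose true ability sits at about +1.5 on the same scale.
estimate = run_adaptive_test(lambda d: random.random() < (0.9 if d < 1.5 else 0.3))
print(round(estimate, 2))
```

Even this crude loop homes in on the region of the scale where the test taker starts getting questions wrong as often as right, which is the essence of how an adaptive test targets questions at the right level of challenge.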

Binet’s adaptive test was paper-based and must have been a nightmare to administer. It could only be administered to one test taker at a time, with an invigilator marking each question as the test taker completed it, then finding and administering each successive question. But the advent of the personal computer means that questions can be marked and administered in real time, giving the test taker a seamless testing experience, and allowing a limitless number of people to take the test at the same time.

The advantages of adaptive testing

So why bother with adaptive testing? Well, there are lots of benefits compared with paper-based tests (or indeed linear tests on a computer). Firstly, because the questions are at just the right level of challenge for each test taker, the SEM is the same for everyone, and scores are more precise than those from traditional linear tests (see Figure 2). This means that each test taker is treated fairly. Another benefit is that, because adaptive tests are more efficient, they can be shorter than traditional paper-based tests. That’s good news for test takers. Finally, because the questions presented are neither too difficult nor too easy, test takers won’t be stressed by questions beyond their level, or bored by questions well below it.

This is all good news for test takers, who will benefit from an improved test experience and confidence in their results.

 

Figure 2. SEM against test score for a computer-adaptive test.


ELTOC 2020

If you’re interested in hearing more about how we can make testing a better experience for test takers, come and join me at my ELTOC session. See you there!

 


Colin Finnerty is Head of Assessment Production at Oxford University Press. He has worked in language assessment at OUP for eight years, heading a team which created the Oxford Young Learners Placement Test and the Oxford Test of English. His interests include learner corpora, learning analytics, and adaptive technology.


References

Weiss, D. J. (2011). Better Data From Better Measurements Using Computerized Adaptive Testing. Journal of Methods and Measurement in the Social Sciences, 2(1), 1–27.

Oxford Online Placement Test and Oxford Young Learners Placement Test: www.oxfordenglishtesting.com

The Oxford Test of English and Oxford Test of English for Schools: www.oxfordtestofenglish.com



Find your learner’s reading level | Andrew Dilger

Find your reading level

I have a question for you. Do you know your learners’ reading level in English – I mean, really know it? If your learners are halfway through an A2 coursebook, does that mean their reading level is A2-and-a-half?! The cautious ones among us would say ‘Not necessarily’; the bold ones would say ‘No’. But in an age when efficacy and assessment are all the rage in ELT, plenty of pressure is put on the teaching community (by itself, parents, and other stakeholders) to measure learners’ language skills accurately – down to the nth degree, in fact.

 The dark art of testing

Measuring reading level has always been something of a dark art, or at least a shadowy discipline. Part of the problem is that, as a receptive skill, reading seems to take place entirely inside learners’ heads. We can test comprehension, of course. And how we love to test it – with questions, gapfills, clozes, and multiple-choice tasks, all of which require learners to skim and scan until they go cross-eyed! We often enjoy testing comprehension so much that we squeeze the life out of a text. It’s a wonder we don’t put learners off reading in English altogether.

There are other factors at play, of course – short attention spans in a fast-paced, device-driven world undermining the appeal of ‘deep reading’ is one of them, but that’s an easy target. The main issue is that most learners aren’t reading the right texts for them, or reading them in the right way.

Reading improves all-round ability

If learners want to improve their reading level – and benefit their all-round ability in English – then it’s vital we help them discover how to do this. And don’t just take my word for it. Research by luminaries like Richard Day and Paul Nation has suggested this for years. There are massive gains to be made by learners reading a lot in English – reading extensively for interest and pleasure. For more on this, see this article from El País (use Google Translate if your Spanish is rusty or non-existent).

Reading fluency over reading comprehension

So let’s go back to the question: Do you know your learners’ reading level? The important thing to appreciate is that I’m talking about reading fluency here. Can they read a connected text and understand the majority of the words?

Most publishers have an online test which claims to tell learners their reading level. Take the Macmillan Readers Level Test, for example. In actual fact, it’s a series of grammar and vocabulary sentences with multiple-choice options, i.e. it doesn’t test reading fluency at all. It features prompt pictures for all the items but most are decorative rather than functional. In addition, some of the sentences are unnatural or misleading, e.g. I’ve got an ache in my throat; Did you hear the thunder last night? with the prompt picture showing lightning. The maximum level the test can give is Upper Intermediate and, if you retake it, the questions and options are all in exactly the same order… so you can improve instantly by virtue of having done the test already.

A tool instead of a test

Here at OUP we’ve come up with something different and something new. And we’d like you and your learners to decide how useful it is. For a start, we’re not calling it a test – it’s a tool. A semantic difference perhaps, but an important one. This isn’t a grammar check based on a random text, but something which genuinely attempts to gauge how fluent learners are at reading a page of a published story.

How does it do this? With a disarmingly simple innovation. Learners themselves decide whether they know the meanings of the words or not. They also decide whether a page of a story at a certain level is ‘Too easy’, ‘Too difficult’, or ‘OK’. This is known as the Goldilocks Principle and is common in cognitive science and developmental psychology.
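To show how little machinery such a self-assessment loop needs, here is a purely hypothetical sketch; the level labels, the ask_learner callback, and the stopping rule are all invented for illustration and are not a description of the actual tool:

```python
# Purely illustrative 'Goldilocks'-style level finder. The level labels and
# the ask_learner callback are invented; the real tool works differently.
LEVELS = ["Starter", "Beginner", "Elementary", "Pre-Intermediate",
          "Intermediate", "Upper-Intermediate"]

def find_reading_level(ask_learner, start=2, max_pages=6):
    """Show sample pages, moving down after 'Too difficult' and up after 'Too easy'."""
    level = start
    for _ in range(max_pages):
        verdict = ask_learner(LEVELS[level])   # learner reads a page and judges it
        if verdict == "Too difficult" and level > 0:
            level -= 1
        elif verdict == "Too easy" and level < len(LEVELS) - 1:
            level += 1
        else:                                  # 'OK' (or we can't move any further)
            break
    return LEVELS[level]

# Example: a learner who finds everything below Intermediate too easy.
print(find_reading_level(lambda lvl: "Too easy" if LEVELS.index(lvl) < 4 else "OK"))
# -> 'Intermediate'
```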

‘But students will cheat!’ I hear you cry. If they do, they’re only cheating themselves because they’ll be shown a range of stories at the wrong level. It’s like buying clothes – why would you choose trousers which are two sizes too big if they fall down round your ankles? Instead, what learners need is something that ‘fits’ – something that’s right for them at that stage in their development. This means being able to read confidently at a comfortable level.

What’s the point?

After all, the point of learners finding their reading level isn’t so they can brandish it on a certificate or boast about it on social media. The point is to open up a world of texts, stories, and information which they will find digestible and rewarding – even life-changing.

If YOUR learners want to find their reading level in English, they can try our new tool here. Why don’t YOU try it, too? It’s free and takes less than 10 minutes. Because it’s a beta version, we’re also interested in getting feedback about ways to improve it, so please ask your learners to complete the survey too. Happy reading!                      


 


Andrew Dilger is a Managing Editor at Oxford University Press. He has been involved in English language teaching as a teacher, trainer, and editor for over a quarter of a century. He is passionate about the power of reading and claims to have read something every day of his life since he first went to school.



How can we assess global skills? | ELTOC 2020

We all want our students to develop the global skills needed for modern life and work. We know that our teaching style, our classroom organisation, and what we expect of our students are critical in this. If we want our students to be collaborative and creative we have to provide opportunities for cooperation and problem-solving. However, any attempt to assess these skills raises ‘why?’ and ‘how?’ questions. During my session at ELTOC 2020, I will seek to answer some of these. In the meantime, here’s a brief summary of my approach:

Why do we need to assess this kind of learning?

  1. To signal its importance in the modern world. Language learning cannot be separated from functioning effectively in society. Global skills encourage sensitivity to the needs of others, problem-solving, and effective communication with people from different cultures. Assessing global skills shows we value them.
  2. To convince students, particularly those who are driven by external rewards, that these skills are important and should be attended to.
  3. Because assessment helps students know how well they are doing in these skills, and becomes the basis of feedback on what they need to do next to improve.

How do we assess global skills?

Global skills are broad and complex so we need to assess them in ways that do justice to this. If we want to capture performance in a global skill, some conventional assessment methods might not be fit for purpose. A multiple-choice test assessing creativity (and there have been some) won’t capture what is important. Nor would giving a mark for social responsibility and well-being be appropriate.

Instead, we will need to use more informal ways of gathering evidence in order to make more general holistic judgements about progress. These are the result of regular observations of student performance and continuous monitoring of their progress. This does not involve lots of extra record-keeping by teachers; rather, it relies on their professional skills of both knowing what the skills involve and informally monitoring individuals’ performance.

As part of our students’ development of global skills, we can put the responsibility for gathering evidence of performance on the students themselves. What can they claim they have done to demonstrate a particular cluster of skills? Can they provide evidence of, for example, creativity and communication? The very act of doing this may be evidence of emotional self-regulation and wellbeing.

One of the best ways of capturing their achievements is for students to develop individual portfolios. These can be electronic, paper-based, or a blend of both. The aim is to demonstrate their development in each of the global skill clusters. The teacher’s role is to judge the student’s progress in skill development, drawing on the portfolio alongside their own observations. This then provides an opportunity for feedback on where a student has reached and what steps could be taken to progress further.

How should we put this more holistic approach to the assessment of global skills into practice?

  1. Keep it simple

Our suggestion[i] is that we use just three classifications for each cluster of skills: working towards; achieved; exceeded. Each of these may generate specific feedback – what more is needed; where to go next; how to improve even further.

  2. Trust teacher judgement

The evidence for these holistic judgements comes from the teacher’s own informal evaluation of what is seen, heard and read in the classroom and outside. This is more dependable than a narrow standardised test because of the multiple and continuous opportunities for information gathering. These judgements require teachers to utilise and develop their skills of observation and continuous assessment.

  3. Continuously sample student performance

This does not mean informally assessing every student on every occasion; rather, it involves focusing on different students on different occasions so that, over time, we will have monitored all our students’ performance.

  4. Use any assessments formatively

The purpose of the assessments is to inform students of their performance and to use our judgements to provide feedback on what to do next. The classifications should be used to encourage further development rather than as summative grades.


ELTOC 2020

I hope this is useful. I’ll be expanding on this in my upcoming session at ELTOC 2020. I look forward to seeing you there!


Gordon Stobart is Emeritus Professor of Education at the Institute of Education, University College London, and an honorary research fellow at the University of Oxford. Having worked as a secondary school teacher and an educational psychologist, he spent twenty years as a senior policy researcher. He was a founder member of the Assessment Reform Group, which has promoted assessment for learning internationally. Gordon is the lead author of our Assessment for Learning Position Paper.

[i] ELT Expert Panel (2019). Global Skills: Creating Empowered 21st Century Citizens. Oxford University Press.



Writing ELT tests for teenagers | ELTOC 2020

I don’t want to sound too stuffy, as I firmly believe that 42 is the new 21; however, teenagers today live very different lives to those who came before. A quick comparison of my teenage life and my niece’s seems a good place to start. I was 12 in 1988: my life revolved around school, family, youth club, and the four channels on UK television. I loved music and spent all my pocket money on tapes, spending my evenings memorising the lyrics from the tape inserts. My niece Millie is 12 in 2019, and her teenage years are radically different to mine. Her life still revolves around family and school, but the impact of technology is of fundamental importance, and it creates the biggest difference between our teenage lives.

But what does all of this have to do with assessment? Well, as Director of Assessment at OUP responsible for English language tests, some of which are aimed at teenagers, it’s very much my concern that what we design is appropriate for the end-user. My ELTOC talk will be about designing assessments for teenagers. Let’s start by considering why…

Why do we design a test specifically for teenagers?

Our aim is to make the test as accurate a reflection of the individual’s performance as possible, and that means removing any barriers that increase cognitive load. Tests can be stressful enough, so I see it as a fundamental part of my job to remove any extraneous stress. In terms of a test for teenagers, this means providing them with test items that have a familiar context. Imagine an 11-year-old doing an English language assessment and facing this writing task. It’s not a real task, but it is indicative of lots of exam writing tasks.

The 11-year-old might have the linguistic competence to describe advantages and disadvantages, make comparisons, and even offer their own opinion. However, they are likely to struggle with the concepts in the task. The concepts of work and flexible working will not be familiar enough to enable them to answer the task to the best of their ability.

This is why we develop tests specifically aimed at teenagers. Tests that allow them to demonstrate linguistic competence that is set within domains and contexts that the teenager is familiar with. An alternative question that elicits the same level of language is given below. It might not be the perfect question for everybody but it’s a question that should be more accessible to most teenagers and that allows them to demonstrate linguistic competence within a familiar context.

We have a responsibility to get this right and to provide the best test experience for everybody to enable them to demonstrate their true abilities in the test scenario. For us, behind the scenes, there are lots of guidelines we provide our writers with to try to ensure that the test is appropriate for the target audience, in this case, teenagers. Let’s look at this in more detail.

Writing a test for teenagers

Let’s think about the vocabulary used by a teenager and the vocabulary used by the adults writing our test content: the potential for anachronisms is huge. Let’s look at this issue through the evolution of phone technology.

As well as the item evolving, so has the language: phone / (mobile) phone / (smart) phone. The words in parentheses gradually become redundant as the evolved item becomes the norm, so it’s only useful to say ‘mobile phone’ if you are differentiating it from another type of phone. Those of us who have lived through this evolution may use all of the terms interchangeably, and writers might choose to write tasks about the ‘smartphone’. However, teenagers have only ever known the smart, mobile phone – to them, it’s just a phone! It’s not a big deal unless you’re a teenager in an exam suddenly faced with a phrase that might cause confusion. Other examples of such anachronisms include:

  • Video game, or is it just a game?
  • Do we say desktop, laptop, or just computer?
  • Would you talk about a digital camera or a camera, or would you just use your phone?
  • Are good things: cool, wicked, rad, awesome, chill, lit or maybe you just stan?

Writing tests for teenagers that incorporate the kind of language they are used to needs to be considered, but this should be balanced against maintaining and measuring a ‘standard English’ that is recognised by the majority of people doing the test in different countries around the world, since we produce global tests. Another important consideration is creating tasks of sufficient complexity that we can be sure of the level we are measuring.

As a test provider, we have people whose job it is to solve some of these challenges. For teachers who write assessments for their students, many of the same challenges exist, but with fewer resources available to solve them. This is why you should join me for my ELTOC session!


ELTOC 2020

During my talk, I’ll be sharing my expertise on all things assessment. You’ll learn lots of tips that you can take away about how to design your own classroom assessments for teenagers.

So, if this sounds interesting to you, come along to my session and learn more about designing assessments for teenagers. See you there!


Sarah Rogerson is Director of Assessment at Oxford University Press. She has worked in English language teaching and assessment for 20 years and is passionate about education for all and digital innovation in ELT. As a relative newcomer to OUP, Sarah is really excited about the Oxford Test of English and how well it caters to the 21st-century student.



5 Ways to Improve Feedback in your Classroom

Effective feedback is the key to successful assessment for learning, and can greatly improve your students’ understanding. So how can you ensure that your feedback is as effective as possible? You need to understand what level your students are at and where they need to improve. Your students will also find your feedback more useful if they understand the purpose of what they are learning and know what success looks like.

 

Try these 5 tips to improve feedback in your classroom:

1. Ask questions to elicit deeper understanding

Most questions asked in the classroom are simple recall questions (‘What is a noun?’) or procedural questions (‘Where’s your book?’). Higher-order questions require students to make comparisons, speculate, and hypothesise. By asking more of these questions, you can learn more about the way your students understand and process language, and provide better feedback.

2. Increase wait time

Did you know that most teachers wait less than a second after asking a question before they say something else? Instead of waiting longer, they often re-phrase the question, continue talking, or select a student to answer it. This does not give students time to develop their answers or think deeply about the question. Try waiting just 3 seconds after a recall question and 10 seconds after a higher-order question to greatly improve your students’ answers.

3. Encourage feedback from your students

Asking questions should be a two-way process, where students are able to ask the teacher about issues they don’t understand. However, nervous or shy students often struggle to do so. Encourage students to ask more questions by asking them to come up with questions in groups, or write questions down and hand them in after class.

4. Help students understand what they are learning

Students perform better if they understand the purpose of what they are learning. Encourage students to think about why they are learning by linking each lesson back to what has been learned already, and regularly asking questions about learning intentions.

5. Help students understand the value of feedback

If students recognise the standard they are trying to achieve, they respond to feedback better and appreciate how it will help them progress. Try improving students’ understanding by explaining the criteria for success. You can also provide examples of successful work and work that could be improved for your students to compare.

 

Did you find this article useful? For more information and advice, read our position paper on Effective Feedback:

Download the position paper

 

Chris Robson graduated from the University of Oxford in 2016 with a degree in English Literature, before beginning an internship at Oxford University Press shortly afterwards. After joining ELT Marketing full time to work with our secondary products, including Project Explore, he is now focused on empowering the global ELT community through delivery of our position papers.