Oxford University Press

English Language Teaching Global Blog



Adaptive testing in ELT with Colin Finnerty | ELTOC 2020

OUP offers a suite of English language tests: the Oxford Online Placement Test (for adults), the Oxford Young Learners Placement Test, the Oxford Test of English (a proficiency test for adults) and, from April 2020, the Oxford Test of English for Schools. What’s the one thing that unites all these tests (apart from them being brilliant!)? Well, they are all adaptive tests. In this blog, we’ll dip our toes into the topic of adaptive testing, which I’ll be exploring in more detail in my ELTOC session. If you like this blog, be sure to come along to the session.

The first standardized tests

Imagine the scene. A test taker walks nervously into the exam room, hands in any forbidden items to the invigilator (a bag, mobile phone, notepad, etc.) and is escorted to a randomly allocated desk, separated from other desks to prevent copying. The test taker completes a multiple-choice test, anonymised to protect against potential bias from the person marking it, all under the watchful eyes of the invigilators. Sound familiar? Now imagine this isn’t happening today, but nearly one-and-a-half thousand years ago.

The first recorded standardised tests date back to the year 606. A large-scale, high-stakes exam for the Chinese civil service, it pioneered many of the examination procedures that we take for granted today. And while the system had many features we would shy away from now (the tests were so long that people died while trying to finish them), this approach to standardised testing lasted some thirteen hundred years, until it came to an end in 1905. Coincidentally, that same year the next great innovation in testing was introduced by the French polymath Alfred Binet.

A revolution in testing

Binet was an accomplished academic. His research included investigations into palmistry, the mnemonics of chess players, and experimental psychology. But perhaps his best-known contribution is the IQ test. The test broke new ground, not only for being the first attempt to measure intelligence, but also because it was the first ever adaptive test. Adaptive testing was an innovation well ahead of its time, and it was another 100 years before it became widely available. But why? To answer this, we first need to explore how traditional paper-based tests work.

The problem with paper-based tests

We’ve all done paper-based tests: everyone gets the same paper of, say, 100 questions. You then get a score out of 100 depending on how many questions you got right. These tests are known as ‘linear tests’ because everyone answers the same questions in the same order. It’s worth noting that many computer-based tests are actually linear, often being just paper-based tests which have been put onto a computer.

But how are these linear tests constructed? Well, they focus on “maximising internal consistency reliability by selecting items (questions) that are of average difficulty and high discrimination” (Weiss, 2011). Let’s unpack what that means with an illustration. Imagine a CEFR B1 paper-based English language test. Most of the items will be around the ‘middle’ of the B1 level, with fewer questions at either the lower or higher end of the B1 range. While this approach provides precise measurements for test takers in the middle of the B1 range, test takers at the extremes will be asked fewer questions at their level, and therefore receive a less precise score. That’s a very inefficient way to measure, and is a missed opportunity to offer a more accurate picture of the true ability of the test taker.
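To see why, it helps to borrow an idea from Item Response Theory: a question tells you most about test takers whose ability is close to its difficulty, so a test’s precision depends on where its items sit relative to each test taker. The Python sketch below illustrates this with an invented, purely illustrative item bank on an arbitrary difficulty scale (it is not how any OUP test is built):

```python
import math

def p_correct(ability: float, difficulty: float) -> float:
    """Rasch-style probability that a test taker answers an item correctly."""
    return 1 / (1 + math.exp(difficulty - ability))

def sem_at(ability: float, item_difficulties: list) -> float:
    """SEM ~ 1 / sqrt(test information); each item contributes P * (1 - P)."""
    information = sum(
        p * (1 - p) for p in (p_correct(ability, d) for d in item_difficulties)
    )
    return 1 / math.sqrt(information)

# Invented linear test: items clustered around the middle of B1 (difficulty 0)
bank = [-0.5, -0.25, 0.0, 0.0, 0.0, 0.25, 0.25, 0.5]

for ability, label in [(-1.5, "low B1"), (0.0, "mid B1"), (1.5, "high B1")]:
    print(f"{label}: SEM is roughly {sem_at(ability, bank):.2f}")
# The mid-B1 test taker gets the smallest SEM; both extremes get larger ones.
```

Run on this toy bank, the mid-B1 test taker comes out with the smallest measurement error, which is exactly the pattern described above and pictured in Figure 1.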

Standard Error of Measurement

Now we’ll develop this idea further. The concept of Standard Error of Measurement (SEM), from Classical Test Theory, is that whenever we measure a latent trait such as language ability or IQ, the measurement will always contain some error. To illustrate, imagine giving the same test to the same test taker on two consecutive days (magically erasing their memory of the first test before the second to avoid practice effects). While their ‘True Score’ (i.e. underlying ability) would remain unchanged, the two measurements would almost certainly show some variation. SEM is a statistical measure of that variation: the smaller it is, the more reliable the test score is likely to be. Now, applying this concept to the paper-based test example in the previous section, what we will see is that SEM is higher for test takers at both the lower and higher extremes of the B1 range.
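For readers who like numbers, here is a minimal sketch of the Classical Test Theory formula behind SEM, SEM = SD × √(1 − reliability). The standard deviation and reliability figures are invented for illustration and are not taken from any OUP test:

```python
import math

def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """Classical Test Theory: SEM = SD * sqrt(1 - reliability)."""
    return score_sd * math.sqrt(1 - reliability)

# Invented figures: a score standard deviation of 10 points and reliability of 0.91
sem = standard_error_of_measurement(score_sd=10, reliability=0.91)
print(f"SEM = {sem:.1f} score points")  # SEM = 3.0 score points

# A rough 95% band around an observed score of 62 on this imaginary test:
observed = 62
print(f"True score probably between {observed - 2 * sem:.0f} and {observed + 2 * sem:.0f}")
```

In other words, the more reliable the test, the tighter the band around the observed score in which the test taker’s True Score is likely to sit.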

Back to our B1 paper-based test example. In Figure 1, the horizontal axis shows B1 test scores going from low to high, and the vertical axis shows increasing SEM. The higher the SEM, the less precise the measurement. The dotted line illustrates the SEM across the score range. We can see that a test taker in the middle of the B1 range has a low SEM, which means they are getting a precise score. However, the measurements for test takers at the low and high ends of the B1 range are less precise.

Aren’t we supposed to treat all test takers the same?

Figure 1. SEM across the score range for a linear B1 test.

How computer-adaptive tests work

So how are computer-adaptive tests different? Well, unlike linear tests, computer-adaptive tests have a bank of hundreds of questions which have been calibrated with different difficulties. The questions are presented to the test taker based on a sophisticated algorithm, but in simple terms, if the test taker answers the question correctly, they are presented with a more difficult question; if they answer incorrectly, they are presented with a less difficult question. And so it goes until the end of the test when a ‘final ability estimate’ is produced and the test taker is given a final score.
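Real computer-adaptive tests select items and estimate ability using Item Response Theory models fitted to a calibrated item bank, but the basic up/down logic described above can be sketched in a few lines of Python. Everything here (the step sizes, the simulated test taker, the crude final estimate) is invented for illustration, not a description of how any OUP test works:

```python
import math
import random

def run_adaptive_test(answers_correctly, num_items=20):
    """Toy adaptive loop: a harder item after a correct answer, an easier item
    after an incorrect one, then a crude final ability estimate at the end."""
    difficulty = 0.0   # start with a mid-difficulty question
    step = 1.0         # how far to move the difficulty after each answer
    administered = []

    for _ in range(num_items):
        administered.append(difficulty)
        correct = answers_correctly(difficulty)    # administer and mark in real time
        difficulty += step if correct else -step   # adapt to the response
        step = max(step * 0.8, 0.2)                # settle down as evidence accumulates

    # Crude 'final ability estimate': average difficulty of the later items
    later = administered[num_items // 2:]
    return sum(later) / len(later)

# A simulated test taker whose true ability is 1.5 on an arbitrary scale:
true_ability = 1.5
test_taker = lambda d: random.random() < 1 / (1 + math.exp(d - true_ability))

print(f"Final ability estimate: {run_adaptive_test(test_taker):.2f}")
```

The estimate homes in on the region where the test taker’s answers flip between right and wrong, which is the essence of the approach; production systems simply do this with far better statistics.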

Binet’s adaptive test was paper-based and must have been a nightmare to administer. It could only be administered to one test taker at a time, with an invigilator marking each question as the test taker completed it, then finding and administering each successive question. But the advent of the personal computer means that questions can be marked and administered in real-time, giving the test taker a seamless testing experience, and allowing a limitless number of people to take the test at the same time.

The advantages of adaptive testing

So why bother with adaptive testing? Well, there are lots of benefits compared with paper-based tests (or indeed linear tests on a computer). Firstly, because the questions are pitched at just the right level of challenge for each test taker, the SEM is roughly the same for everyone, and scores are more precise than in traditional linear tests (see Figure 2). This means that each test taker is treated fairly. Another benefit is that, because adaptive tests are more efficient, they can be shorter than traditional paper-based tests. That’s good news for test takers. Finally, because the questions are well targeted, test takers won’t be stressed by being asked questions which are too difficult, or bored by being asked questions which are too easy.

This is all good news for test takers, who will benefit from an improved test experience and confidence in their results.

 

Figure 2. SEM across the score range for a computer-adaptive test.


ELTOC 2020

If you’re interested in hearing more about how we can make testing a better experience for test takers, come and join me at my ELTOC session. See you there!

 


Colin Finnerty is Head of Assessment Production at Oxford University Press. He has worked in language assessment at OUP for eight years, heading a team which created the Oxford Young Learners Placement Test and the Oxford Test of English. His interests include learner corpora, learning analytics, and adaptive technology.


References

Weiss, D. J. (2011). Better Data From Better Measurements Using Computerized Adaptive Testing. Journal of Methods and Measurement in the Social Sciences, 2(1), 1–27.

Oxford Online Placement Test and Oxford Young Learners Placement Test: www.oxfordenglishtesting.com

The Oxford Test of English and Oxford Test of English for Schools: www.oxfordtestofenglish.com



Writing ELT tests for teenagers | ELTOC 2020

I don’t want to sound too stuffy (I firmly believe that 42 is the new 21), but teenagers today live very different lives to those who came before them. A quick comparison of my teenage years and my niece’s seems a good place to start. I was 12 in 1988, and my life revolved around school, family, youth club, and the four channels on UK television. I loved music and spent all my pocket money on tapes, spending my evenings memorising the lyrics from the tape inserts. My niece Millie is 12 in 2019, and her teenage years are radically different to mine. Her life still revolves around family and school, but the impact of technology on her life is of fundamental importance, and it creates the biggest difference between our teenage lives.

But what does all of this have to do with assessment? Well, as Director of Assessment at OUP responsible for English language tests, some of which are aimed at teenagers, it’s very much my concern that what we design is appropriate for the end-user. My ELTOC talk will be about designing assessments for teenagers. Let’s start by considering why…

Why do we design a test specifically for teenagers?

Our aim is to make the test as accurate a reflection of the individual’s performance as possible, and that means removing any barriers that increase cognitive load. Tests can be stressful enough, so I see it as a fundamental part of my job to remove any extraneous stress. In terms of a test for teenagers, this means providing them with test items that have a familiar context. Imagine an 11-year-old doing an English language assessment and facing a writing task about the advantages and disadvantages of flexible working. It’s not a real task, but it is indicative of lots of exam writing tasks.

The 11-year-old might have the linguistic competence to describe advantages and disadvantages, make comparisons, and even offer their own opinion. However, they are likely to struggle with the concepts in the task: work and flexible working will not be familiar enough to enable them to answer to the best of their ability.

This is why we develop tests specifically aimed at teenagers: tests that allow them to demonstrate linguistic competence within domains and contexts that they are familiar with. An alternative question that elicits the same level of language is given below. It might not be the perfect question for everybody, but it should be more accessible to most teenagers, allowing them to demonstrate linguistic competence within a familiar context.

We have a responsibility to get this right and to provide the best test experience for everybody to enable them to demonstrate their true abilities in the test scenario. For us, behind the scenes, there are lots of guidelines we provide our writers with to try to ensure that the test is appropriate for the target audience, in this case, teenagers. Let’s look at this in more detail.

Writing a test for teenagers

Let’s think about the vocabulary used by a teenager and the vocabulary used by the adults writing our test content: the potential for anachronisms is huge. Let’s look at this issue through the evolution of phone technology.

As well as the item evolving, so has the language: phone / (mobile) phone / (smart) phone. The words in parentheses gradually become redundant as the evolved item becomes the norm, so it’s only useful to say ‘mobile phone’ if you are differentiating it from another type of phone. Those of us who have lived through this evolution may use all of the terms interchangeably, and writers might choose to write tasks about the ‘smartphone’. However, teenagers have only ever known the smart, mobile phone – to them, it’s just a phone! It’s not a big deal unless you’re a teenager in an exam suddenly faced with a phrase that might cause confusion. Other examples of such anachronisms include:

  • Video game, or is it just a game?
  • Do we say desktop, laptop, or just computer?
  • Would you talk about a digital camera or a camera, or would you just use your phone?
  • Are good things: cool, wicked, rad, awesome, chill, lit or maybe you just stan?

Writing tests for teenagers that incorporate the kind of language they are used to is important, but it has to be balanced against maintaining and measuring a ‘standard English’ that is recognised by the majority of test takers in different countries around the world, since we produce global tests. Another important consideration is creating tasks of sufficient complexity that we can be sure of the level we are measuring.

As a test provider, we have people whose job it is to solve these challenges. Teachers who write assessments for their students face some of the same challenges, but with fewer resources available to solve them. This is why you should join me for my ELTOC session!


ELTOC 2020

During my talk, I’ll be sharing my expertise on all things assessment. You’ll take away lots of tips about how to design your own classroom assessments for teenagers.

So, if this sounds interesting to you, come along to my session and learn more about designing assessments for teenagers. See you there!


Sarah Rogerson is Director of Assessment at Oxford University Press. She has worked in English language teaching and assessment for 20 years and is passionate about education for all and digital innovation in ELT. As a relative newcomer to OUP, Sarah is really excited about the Oxford Test of English and how well it caters to the 21st-century student.



5 minutes with Sarah Rogerson, Director of Assessment for the Oxford Test of English

Oxford Test of English

A new job and new products

I started at Oxford University Press as Director of Assessment for ELT on January 2nd this year. I remember being asked at my interview what my priorities would be within the first three months of the job. I said one of my main priorities would be to fall in love with the OUP assessment products. Some things you say at interviews because you have to, but this is something I genuinely meant. I need to feel passionate about what I do and see the value in it – I need to fall in love with what I do. So this blog is a love story! It’s a love story about me and the Oxford Test of English.

Where to begin… how about an exotic location!

In my 3rd week at OUP, I visited the OUP España offices in Madrid. I wanted to meet customers, I wanted to know about their problems, I wanted to know their thoughts about the Oxford Test of English, I wanted to know from them what my priorities should be. And so, my colleagues arranged for me to meet 3 very different types of customer in and around Madrid. I was overwhelmed by the positivity of these customers towards a new English language assessment in what is a very competitive market. Some key things that came out of this were that the Oxford Test of English is fit for purpose, friendly and flexible. They loved the fact that the exam can’t be failed, that it’s fully online, that it’s modular, and that it’s on demand. As a newcomer, this was fantastic to hear.

“I arranged to sit the test like an actual student”


As soon as I got back to the UK, I arranged to sit the test as an actual student, and so my love was ignited! A four-skill test, three CEFR levels, and it can be completed in two hours; it solves so many customer pain points. It had me hooked.

The assessment capability at OUP is strong. The Oxford Test of English is really impressive, and our placement test is also a winner! We’ll be revealing a new product in April 2020 and I’m really happy in my new role.

I’m thoroughly excited about the future and building the OUP assessment brand. If you want to know more, check out the Oxford Test of English website, or if you’re coming to the IATEFL conference this year in Liverpool, don’t miss our launch event!

