Oxford University Press

English Language Teaching Global Blog



Adaptive testing in ELT with Colin Finnerty | ELTOC 2020

OUP offers a suite of English language tests: the Oxford Online Placement Test (for adults), the Oxford Young Learners Placement Test, the Oxford Test of English (a proficiency test for adults) and, from April 2020, the Oxford Test of English for Schools. What’s the one thing that unites all these tests (apart from them being brilliant!)? Well, they are all adaptive tests. In this blog, we’ll dip our toes into the topic of adaptive testing, which I’ll be exploring in more detail in my ELTOC session. If you like this blog, be sure to come along to the session.

The first standardized tests

Imagine the scene. A test taker walks nervously into the exam room, hands in any forbidden items to the invigilator (e.g. a bag, mobile phone, notepad, etc.) and is escorted to a randomly allocated desk, separated from other desks to prevent copying. The test taker completes a multiple-choice test, anonymised to protect against potential bias from the person marking it, all under the watchful eyes of the invigilators. Sound familiar? Now imagine this isn’t happening today, but nearly one-and-a-half thousand years ago.

The first recorded standardised test dates back to the year 606: a large-scale, high-stakes exam for the Chinese civil service, it pioneered many of the examination procedures that we take for granted today. And while the system had many features we would now shy away from (the tests were so long that people died while trying to finish them), this approach to standardised testing lasted a millennium until it came to an end in 1905. Coincidentally, that same year the next great innovation in testing was established by French polymath Alfred Binet.

A revolution in testing

Binet was an accomplished academic. His research included investigations into palmistry, the mnemonics of chess players, and experimental psychology. But perhaps his most well-known contribution is the IQ test. The test broke new ground, not only for being the first to attempt to measure intelligence, but also because it was the first ever adaptive test. Adaptive testing was an innovation well ahead of its time, and it was another 100 years before it became widely available. But why? To answer this, we first need to explore how traditional paper-based tests work.

The problem with paper-based tests

We’ve all done paper-based tests: everyone gets the same paper of, say, 100 questions. You then get a score out of 100 depending on how many questions you got right. These tests are known as ‘linear tests’ because everyone answers the same questions in the same order. It’s worth noting that many computer-based tests are actually linear, often being just paper-based tests which have been put onto a computer.

But how are these linear tests constructed? Well, they focus on “maximising internal consistency reliability by selecting items (questions) that are of average difficulty and high discrimination” (Weiss, 2011). Let’s unpack what that means with an illustration. Imagine a CEFR B1 paper-based English language test. Most of the items will be around the ‘middle’ of the B1 level, with fewer questions at either the lower or higher end of the B1 range. While this approach provides precise measurements for test takers in the middle of the B1 range, test takers at the extremes will be asked fewer questions at their level, and therefore receive a less precise score. That’s an inefficient way to measure, and a missed opportunity to build a more accurate picture of each test taker’s true ability.
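
(For the technically curious, here is a minimal sketch in Python of that selection principle. The item pool, scales and cut-offs are invented purely for illustration; this is not how any real OUP test is assembled.)

# A toy illustration of the construction principle quoted from Weiss (2011):
# choose items of roughly average difficulty and high discrimination.
# The item pool, scales and cut-offs below are invented for illustration only.

item_pool = [
    {"id": "q1", "difficulty": 0.1,  "discrimination": 1.4},
    {"id": "q2", "difficulty": -0.9, "discrimination": 0.6},
    {"id": "q3", "difficulty": 0.0,  "discrimination": 1.1},
    {"id": "q4", "difficulty": 1.8,  "discrimination": 1.3},
]

def build_linear_test(pool, test_length=2):
    # keep items close to average difficulty (0 on this scale) that discriminate well
    suitable = [item for item in pool
                if abs(item["difficulty"]) < 0.5 and item["discrimination"] > 1.0]
    # favour the most discriminating of the suitable items
    suitable.sort(key=lambda item: item["discrimination"], reverse=True)
    return suitable[:test_length]

print([item["id"] for item in build_linear_test(item_pool)])  # ['q1', 'q3']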

Standard Error of Measurement

Now we’ll develop this idea further. The concept of Standard Error of Measurement (SEM), from Classical Test Theory, is that whenever we measure a latent trait such as language ability or IQ, the measurement will always contain some error. To illustrate, imagine giving the same test to the same test taker on two consecutive days (magically erasing their memory of the first test before the second to avoid practice effects). While their ‘True Score’ (i.e. underlying ability) would remain unchanged, the two measurements would almost certainly show some variation. SEM is a statistical measure of that variation. The smaller the variation, the more reliable the test score is likely to be. Applying this concept to the paper-based test example in the previous section, we see that SEM will be higher for test takers at both the lower and higher extremes of the B1 range.
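
If you like to see the numbers, the textbook Classical Test Theory formula for SEM is easy to compute. The figures below are invented purely for illustration:

import math

def standard_error_of_measurement(sd_observed, reliability):
    # SEM = standard deviation of observed scores * sqrt(1 - reliability)
    return sd_observed * math.sqrt(1.0 - reliability)

# A test whose scores have a standard deviation of 10 and a reliability of 0.91:
print(round(standard_error_of_measurement(10, 0.91), 1))  # 3.0 - a score of 60 really means roughly 60 +/- 3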

Back to our B1 paper-based test example. In Figure 1, the horizontal axis of the graph shows B1 test scores going from low to high, and the vertical axis shows increasing SEM. The higher the SEM, the less precise the measurement. The dotted line illustrates the SEM. We can see that a test taker in the middle of the B1 range will have a low SEM, which means they are getting a precise score. However, the low and high level B1 test takers’ measurements are less precise.

Aren’t we supposed to treat all test takers the same?

Figure 1.

How computer-adaptive tests work

So how are computer-adaptive tests different? Well, unlike linear tests, computer-adaptive tests draw on a bank of hundreds of questions which have been calibrated at different levels of difficulty. The questions are presented to the test taker by a sophisticated algorithm, but in simple terms, if the test taker answers a question correctly, they are presented with a more difficult question; if they answer incorrectly, they are presented with a less difficult question. And so it goes until the end of the test, when a ‘final ability estimate’ is produced and the test taker is given a final score.
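
Here is a deliberately simplified sketch of that loop in Python. Real computer-adaptive tests, including ours, use Item Response Theory to select items and estimate ability; this toy version just nudges a target difficulty up and down, and every name and number in it is invented:

import random

def present_item(question):
    # Placeholder: a real test would display the item and mark the response.
    return random.random() < 0.5

def run_adaptive_test(item_bank, test_length=20):
    """item_bank: list of (question_text, difficulty) pairs, difficulty on a 1-10 scale."""
    target = 5                        # start in the middle of the difficulty range
    used = set()
    for _ in range(test_length):
        # choose the unused item whose difficulty is closest to the current target
        index, item = min(
            ((i, it) for i, it in enumerate(item_bank) if i not in used),
            key=lambda pair: abs(pair[1][1] - target),
        )
        used.add(index)
        if present_item(item[0]):
            target = min(10, target + 1)   # right answer: try something harder
        else:
            target = max(1, target - 1)    # wrong answer: try something easier
    return target                          # crude stand-in for the final ability estimate

item_bank = [("Question {}-{}".format(d, n), d) for d in range(1, 11) for n in range(3)]
print(run_adaptive_test(item_bank))        # e.g. 5 or 6 for our coin-flip test taker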

Binet’s adaptive test was paper-based and must have been a nightmare to administer. It could only be administered to one test taker at a time, with an invigilator marking each question as the test taker completed it, then finding and administering each successive question. But the advent of the personal computer means that questions can be marked and administered in real-time, giving the test taker a seamless testing experience, and allowing a limitless number of people to take the test at the same time.

The advantages of adaptive testing

So why bother with adaptive testing? Well, there are lots of benefits compared with paper-based tests (or indeed linear tests on a computer). Firstly, because the questions are at just the right level of challenge, the SEM is the same for each test taker, and scores are more precise than those from traditional linear tests (see Figure 2). This means that each test taker is treated fairly. Another benefit is that, because adaptive tests are more efficient, they can be shorter than traditional paper-based tests. That’s good news for test takers. Finally, because the questions presented to each test taker are at just the right level of challenge, test takers won’t be stressed by being asked questions which are too difficult, or bored by being asked questions which are too easy.

This is all good news for test takers, who will benefit from an improved test experience and confidence in their results.

 

Figure 2.


ELTOC 2020

If you’re interested in hearing more about how we can make testing a better experience for test takers, come and join me at my ELTOC session. See you there!

 


Colin Finnerty is Head of Assessment Production at Oxford University Press. He has worked in language assessment at OUP for eight years, heading a team which created the Oxford Young Learners Placement Test and the Oxford Test of English. His interests include learner corpora, learning analytics, and adaptive technology.


References

Weiss, D. J. (2011). Better Data From Better Measurements Using Computerized Adaptive Testing. Journal of Methods and Measurement in the Social Sciences, 2(1), 1–27.

Oxford Online Placement Test and Oxford Young Learners Placement Test: www.oxfordenglishtesting.com

The Oxford Test of English and Oxford Test of English for Schools: www.oxfordtestofenglish.com



What’s new in the new Oxford 3000™️? | ELTOC 2020

A changing language

The Oxford ELT Dictionaries team has relaunched its core word list, the Oxford 3000, billed as ‘the most important words to learn in English’, 14 years on from its initial launch in 2005.

So let’s start with a brainstorm: what has changed in the last 14 years? Jot down any words or phrases that occur to you. Here are some images to get you started.

I’m sure you can think of more.

The items in blue are all now headwords in the Oxford Advanced Learner’s Dictionary online but were not included in the seventh edition of the dictionary, published in 2005. These words, things and concepts either did not exist or barely existed at that time.

The influence of smartphones and social media can also be clearly seen in the revised Oxford 3000.  Words new to the list in the area of media and technology include app, blog, download, edit, scan and update – which all existed in 2005 but have become much more central to our lives and communication since then.

The two criteria we used to determine which words should be included in the revised Oxford 3000 were frequency and relevance.  Frequency was measured in the 2-billion-word Oxford English Corpus. Relevance was determined by measuring frequency in a specially created corpus of ELT Secondary and Adult coursebooks. This enabled us to capture those words – such as cafe and T-shirt – that occur frequently in teaching texts and are familiar to learners from a low level, but are not among the most frequent words in a general corpus.
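
To make the two criteria concrete, here is a rough Python sketch of that kind of two-corpus check. The word counts and thresholds are invented, and this is not the Dictionaries team’s actual pipeline:

# Invented counts for three words in a large general corpus and in a corpus of
# ELT coursebooks. This is an illustration only, not the Oxford 3000 methodology.
general_corpus_freq = {"update": 120000, "cafe": 8000, "photosynthesis": 15000}
coursebook_freq     = {"update": 900,    "cafe": 2500, "photosynthesis": 40}

def qualifies(word, min_general=50000, min_coursebook=500):
    # A word makes the cut if it is very frequent in general English (frequency),
    # or frequent in the teaching texts learners actually meet (relevance).
    return (general_corpus_freq.get(word, 0) >= min_general
            or coursebook_freq.get(word, 0) >= min_coursebook)

for word in ["update", "cafe", "photosynthesis"]:
    print(word, qualifies(word))
# 'cafe' is not among the most frequent general-corpus words, but its prominence
# in coursebooks keeps it on the list; 'photosynthesis' misses on both counts.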

Focus on topics

One result of this increased focus on the texts that learners are actually using to study English is an increase in vocabulary connected with topics that are popular in ELT courses and exams, including sports (athlete, basketball, champion, skiing, stadium, tennis and more), culture (celebrity, classical, creative, gallery, historic, portrait, sculpture, venue), film and TV (cartoon, detective, episode, genre, script, setting) and travel and transport (airline, crew, destination, tourism).

Overall, about 200 words are new to the list. Typically, they are more concrete, lower-level words than the words they have displaced. All the texts in the coursebook corpus are from courses that have been carefully graded against the CEFR. This has made it possible for us to analyse the profile of different vocabulary items across the different CEFR levels and to assign a level to each word. The levels are for guidance only – it is impossible to be definitive about the level of any individual word. Different learners may well encounter the same word at different levels. But broadly speaking, the level assigned represents the level at which we would expect most learners to recognize and understand the word if they read it or hear it spoken – even if they do not yet use it in their own writing or speaking.
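
Purely as an illustration of this kind of profile analysis, here is a toy example in Python. The per-level counts, the threshold and the rule are all invented, and the real analysis is considerably more sophisticated:

# Invented occurrence rates (per million words) for two words at each CEFR level
# of a graded coursebook corpus. The threshold and the rule are also invented.
LEVELS = ["A1", "A2", "B1", "B2"]

occurrences_per_million = {
    "tennis": {"A1": 45, "A2": 60, "B1": 38, "B2": 30},
    "genre":  {"A1": 0,  "A2": 2,  "B1": 25, "B2": 40},
}

def assign_level(word, threshold=20):
    profile = occurrences_per_million[word]
    for level in LEVELS:              # scan upwards from A1
        if profile[level] >= threshold:
            return level              # first level at which the word is well established
    return None                       # not part of this core vocabulary

print(assign_level("tennis"))  # A1
print(assign_level("genre"))   # B1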

The most important words to learn in English

In the revised Oxford 3000, 900 words have been graded at A1 level, 800 at A2, 700 at B1 and 600 at B2. This tapering profile is deliberate, because this is intended as a core vocabulary, not a complete vocabulary. The more learners progress, the more they will want to supplement this core vocabulary with items that are off-list. It is impossible to prescribe what this additional vocabulary should be: it will vary according to the needs and interests of each individual learner. The core list, on the other hand, provides a firm foundation for all learners, whatever their learning context. To learn more about what is important in a core vocabulary, see Julie Moore’s blog.

To see the full, revised Oxford 3000 visit www.oxford3000.com. Here you will also find the brand new Oxford 5000 – an extension of the list for advanced level learners, including 2,000 more words at B2-C1 level. Also available is the new Oxford Phrase List – 750 common phrases including idioms, phrasal verbs, collocations and prepositional phrases, graded from A1 to C1.


ELTOC 2020

Join Diana at ELTOC 2020 for a webinar on helping learners with their core vocabulary using the Oxford 3000 and Oxford 5000. During Diana’s session, you’ll learn how the lists were compiled, the benefits for learners, and how you can use the lists in your teaching.


Diana Lea taught English to learners and trainee teachers in Czechoslovakia, Poland and the UK before joining Oxford University Press in 1994, where she works in the English Language Teaching Division on dictionaries and other vocabulary resources for learners of English. She is the editor of the Oxford Learner’s Thesaurus and the Oxford Learner’s Dictionary of Academic English. Most recently she has been working on Oxford Learner’s Word Lists and preparing the tenth edition of the Oxford Advanced Learner’s Dictionary, to be published in January 2020.

 



5 minutes with Sarah Rogerson, Director of Assessment for the Oxford Test of English


A new job and new products

I started at Oxford University Press as Director of Assessment for ELT on January 2nd this year. I remember being asked at my interview what my priorities would be within the first 3 months of the job. I said one of my main priorities would be to fall in love with the OUP assessment products. Some things you say at interviews because you have to, but this is something I genuinely meant. I need to feel passionate about what I do and see the value in what I do – I need to fall in love with what I do. So this blog is a love story! It’s a love story about me and the Oxford Test of English.

Where to begin… how about an exotic location!

In my 3rd week at OUP, I visited the OUP España offices in Madrid. I wanted to meet customers, I wanted to know about their problems, I wanted to know their thoughts about the Oxford Test of English, I wanted to know from them what my priorities should be. And so, my colleagues arranged for me to meet 3 very different types of customer in and around Madrid. I was overwhelmed by the positivity of these customers towards a new English language assessment in what is a very competitive market. Some key things that came out of this were that the Oxford Test of English is fit for purpose, friendly and flexible. They loved the fact that the exam can’t be failed, that it’s fully online, that it’s modular, and that it’s on demand. As a newcomer, this was fantastic to hear.

“I arranged to sit the test like an actual student”


As soon as I got back to the UK, I arranged to sit the test as an actual student, and so my love was ignited! A 4-skill test, 3 CEFR levels, and it can be completed in 2 hours; it solves so many customer pain points. It had me hooked.

The assessment capability at OUP is strong. The Oxford Test of English is really impressive, and our placement test is also a winner! We’ll be revealing a new product in April 2020 and I’m really happy in my new role.

I’m thoroughly excited about the future and building the OUP assessment brand. If you want to know more, check out the Oxford Test of English website, or if you’re coming to the IATEFL conference this year in Liverpool, don’t miss our launch event!


Sarah Rogerson is Director of Assessment at Oxford University Press. She has worked in English language teaching and assessment for 20 years and is passionate about education for all and digital innovation in ELT. As a relative newcomer to OUP, Sarah is really excited about the Oxford Test of English and how well it caters to the 21st-century student.



Don’t look now – the CEFR is in your classroom


Working in language education, it’s quite hard to escape from the CEFR, or Common European Framework of Reference for Languages. It crops up in courses at language schools and in publishers’ textbooks. International testing bodies label their products as suitable for levels called A2, B1+, or C1.

Ministries of education around the world are vying with each other to set the most demanding targets for the percentage of school children who will reach B2 in the languages they study by the time they graduate. People applying for a Tier 2 visa to do skilled work in the UK need a B1 level certificate in English in reading, writing, speaking and listening. If looking for work using their German language skills, applicants might be asked by their future employer to demonstrate at least an A1 level for unskilled work, B1 for a service role, or C1 for a professional level job involving meetings and negotiations.

Although it’s clearly important that people involved in language education should have a good understanding of such an influential document, there seems to be a lot of confusion about where the CEFR comes from and even about what exactly it is. Let’s start with the first of those points. The CEFR is not a product of the European Union, but was developed by the Council of Europe, an entirely different organisation which is both older (it was founded in 1949) and much bigger (it has 47 member states, many of which are not EU members, including Norway, Russia and Turkey). Its mission includes protecting human rights, democracy and the rule of law, promoting diversity, and combating discrimination against minorities. It has carried out successful campaigns among its members to end the death penalty and to support the rights of people with disabilities. Its work in language education involves promoting linguistic human rights and the teaching and learning of minority languages.

The Council of Europe and language education

As part of this work, the Council of Europe was pioneering in promoting one of the most revolutionary ideas in language education: the communicative approach. Instead of focussing (as teachers usually did before the 1970s) on what learners knew about a language – how many words or how much grammar – the Council of Europe focussed attention on what learners might actually want to do with the language they were learning – the activities they might need to carry out, and the ideas they might want to express. In 1975 the Council of Europe published Jan van Ek’s Threshold Level. This book defined a level (to become “B1” in the CEFR) that a language learner would need in order to be able to live independently for a while in a country where that language is spoken. In 2001 (the European Year of Languages), twenty-five years of further work involving extensive consultation with language teachers and academic experts culminated in the publication of the CEFR. This year, the Council of Europe has published a Companion Volume, available online, which updates and expands on the original publication.

It is part of the Council of Europe’s educational philosophy that learners should be able to move easily between informal learning, schools, universities, and workplace training courses to pick up the practical skills that they need. Of course, doing this is much easier if everyone shares the same basic terms for talking about teaching and learning. If a ‘Beginner’ level class in school A is like an ‘Elementary’ level class in school B or a ‘Preliminary’ class in school C, and the ‘Starters’ book in textbook series X is like the ‘Grade 2’ book in series Y, life in the English classroom can soon get very confusing for the uninitiated. The CEFR provides a shared language to make it easier for teachers, learners, publishers, and testers to communicate across languages, educational sectors, and national boundaries.

School A      School B        School C
Beginner      Elementary      Preliminary

Table 1 shows the need for a shared ‘language’ for talking about levels.

Language learning levels, activities, and contexts

One contribution of the CEFR has been to provide terms for levels – running from Basic (pre-A1, A1 and A2), through Independent (B1 and B2), up to Proficient (C1 and C2) – that are defined in terms of what learners at each level can do with the language they are learning. For example, at the A1 level a learner ‘can use simple phrases and sentences to describe where he/she lives and people he/she knows’, but at B2 ‘can present clear, detailed descriptions on a wide range of subjects related to his/her field of interest’.

CEFR level A1: ‘can use simple phrases and sentences to describe where he/she lives and people he/she knows’
CEFR level B2: ‘can present clear, detailed descriptions on a wide range of subjects related to his/her field of interest’

Table 2 gives examples of what students ‘can do’ at two CEFR levels.

Although levels are important, they are only a small part of what the CEFR offers. In fact, the Council of Europe suggests that levels are too reductive and that it is better to consider learners and learning in terms of profiles of abilities. For example, learners may be very effective speakers and listeners (B2 level), but struggle with the written language (A2 level). The CEFR does not follow the traditional “four skills model” of Reading, Writing, Listening and Speaking, but divides language use activities into reception, interaction, production and mediation. The framework also considers the contexts in which people use languages, recognising that learning a language to keep in touch with one’s grandparents is rather different (and suggests a different skills profile) from learning in order to pursue a career in Engineering.

Describing and explaining, not prescribing or imposing 

The CEFR is not a test or a syllabus, it is not limited to the learning of indigenous “European” languages, and it does not set out what learners should learn. There is no consensus view on what should be learned or what methods should be used, and the CEFR is not a recipe book that recommends or requires its users to adopt a certain teaching method. Educational objectives and standards will inevitably differ according to the target language and the learning context; teaching methods will vary according to the local educational culture. What the CEFR does offer is sets of key questions that encourage educators to think about, describe and explain why they choose to learn, teach or test a language in the way that they do. As part of this process, they are encouraged to question their current aims and methods, but selectivity, flexibility and pluralism are seen as essential. Users choose only those parts of the CEFR scheme that are relevant in their context. If the illustrative descriptors in the CEFR are not suitable for a particular group, users are free to develop alternative descriptors that work better for them – and the CEFR suggests ways of doing just that. Indeed, the new Companion Volume brings together many of the Can-Do descriptors that have been developed since 2001 to fill gaps and expand the scope of the CEFR descriptive scheme.

If you think it’s time you found out more about the CEFR and Companion Volume and how they affect your work, visit the CEFR website to learn more.


Professor Anthony Green is Director of the Centre for Research in English Language Learning and Assessment at the University of Bedfordshire. He has published widely on language assessment and is a former President of the International Language Testing Association (ILTA). His most recent book Exploring Language Assessment and Testing (Routledge, 2014) provides trainee teachers and others with an introduction to this field. Professor Green’s main research interests concern relationships between language assessment, teaching and learning.


Further reading

Need further support, or just want to learn more about language assessment? We recommend that you take a look at these two titles: ‘Language Assessment for Classroom Teachers’ and ‘Focus on Assessment’.



How 100 teachers helped to build the Common European Framework

Glyn Jones is a freelance consultant in language learning and assessment and a PhD student at Lancaster University in the UK. In the past he has worked as an EFL teacher, a developer of CALL (Computer Assisted Language Learning) methods and materials, and – most recently – as a test developer and researcher for two international assessment organisations.

One day in 1994 a hundred English teachers attended a one-day workshop in Zurich, where they watched some video recordings of Swiss language learners performing communicative tasks. Apart from the size of the group, of course, there was nothing unusual about this activity. Teachers often review recordings of learners’ performances, and for a variety of reasons. But what made this particular workshop special was that it was a stage in the development of the Common European Framework of Reference for Languages (CEFR).

The teachers had already been asked to assess some of their own students. They had done this by completing questionnaires made up of CAN DO statements. Each teacher had chosen ten students and, for each of these, checked them against a list of statements such as “Can describe dreams, hopes and ambitions” or “Can understand in detail what is said to him/her in the standard spoken language”. At the workshop the teachers repeated this process, but this time they were all assessing the same small group of learners (the ones performing in the video recordings).

These two procedures provided many hundreds of teacher judgments. By analysing these, the researchers who conducted the study, Brian North and Günther Schneider, were able to discover how the CAN DO statements work in practice, and so to place them relative to each other on a numerical scale. This scale was to become the basis of the now familiar six levels, A1 to C2, of the CEFR.
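
For the statistically curious: descriptor scaling of this kind is typically done with a Rasch-type measurement model, which treats each judgment as the outcome of a ‘contest’ between a learner’s ability and a descriptor’s difficulty. The snippet below shows only the model’s basic probability function, as an illustration rather than a reconstruction of North and Schneider’s actual analysis:

import math

def endorsement_probability(learner_ability, descriptor_difficulty):
    # Probability that a teacher endorses a CAN DO statement for a learner,
    # given the learner's ability and the statement's difficulty (both in logits).
    return 1.0 / (1.0 + math.exp(-(learner_ability - descriptor_difficulty)))

print(round(endorsement_probability(2.0, -1.0), 2))  # 0.95: an easy descriptor, almost always endorsed
print(round(endorsement_probability(2.0, 2.0), 2))   # 0.5: a descriptor right at the learner's level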

This is one of the strengths of the CEFR. Previous scales had been constructed by asking experts to allocate descriptors to levels on the basis of intuition. The CEFR scale was the first to be based on an analysis of the way the descriptors work when they are actually used, with real learners.

For my PhD study I am replicating part of this ground-breaking research.

Why replicate, you might ask?

Firstly, thanks to the Internet I can reach teachers all over the world, whereas North and Schneider were restricted to one country (for good reasons).

Secondly, my study focusses on Writing. This is the skill for which there were the fewest descriptors in the original research (which focussed on Speaking) and which is least well described in the CEFR as a result.

Thirdly, I am including in my study some of the new descriptors which have been drafted recently to fill gaps in the CEFR, so that these can be scaled along with the original descriptors. In short, as well as contributing to the validation of the CEFR, I will be helping to extend it.

If you teach English to adult or secondary-age learners, you could help with this important work. As with the original research, I’m asking teachers to use CAN DO statements to assess some of their learners, and to assess some samples of other learners’ performance (of Writing, this time, not Speaking).

If you would like to participate, please visit my website https://cefrreplication.jimdo.com/ where you can register for the project. From then on everything is done online and at times that suit you. You can also drop me a line there if you would like to find out more.