Oxford University Press

English Language Teaching Global Blog


An English Test For Schools: Introducing Ana And Her Students

Earlier this year, Oxford University Press launched the Oxford Test of English for Schools – an online English proficiency test recommended for 12-16-year-olds. It’s flexible, fast, and available at Approved Test Centres worldwide. Plus, it’s the only proficiency test certified by the University of Oxford.

Teacher Ana Isabel Vázquez from Spain is excited for a version of the Oxford Test of English that has been designed especially for younger students – as she says, it’s “a test adapted to give them the best start on their English journey.”

“The younger we are able to test children’s English, the farther they will be able to take their language learning.”

She uses the Oxford Test of English for Schools to motivate her students, so they “find the confidence to keep learning and using English.” And it works!

An English test that motivates students!

Nerea, 16, one of Ana’s students, is proof: “English will help me get a job, go abroad, learn about other cultures, and be able to communicate with people around the globe. That’s fantastic.”

“It makes me so proud to see the students develop, learn, and feel more confident in how they use English to communicate.”

The Oxford Test of English for Schools assesses 12 to 16-year-olds’ abilities in Reading, Writing, Speaking and Listening. The Listening and Reading modules are adaptive, so the difficulty adjusts in response to students’ answers, while the Speaking and Writing modules use task randomization.

This personalized experience makes the test shorter, less stressful, and more precise than traditional proficiency tests.

“Because a 12-year-old doesn’t want to write about work or finance,” Ana explains. “The Oxford Test of English for Schools will have content adapted to suit children’s interests and life experience. It’ll cover topics like free time and what they did at the weekend.”

A change students welcome

Veronica, 16, says: “We like to answer questions about friendship, free time, cinema, culture – things that affect all of society.”

“I like that I can speak English everywhere and most young people are going to understand me, which gives me the freedom to travel and know I’ll be understood,” adds Fernando, 16.

Like many institutions worldwide, Ana’s school, Colegio Nuestra Señora del Pila, has become an Oxford Test of English Approved Test Centre, meaning they can offer the test securely within their computer room.

“It only takes two hours, and the results are ready in 14 days, which makes everyone feel really comfortable and confident,” says Ana.

“My biggest hope is that the children maintain their English and use it throughout their lives. Our objective is to give them the ability to have a conversation, to be able to communicate – we don’t drill grammar here, we just want them to love English as much as we do!”

And it seems they already do. Maria, 16, says, “Knowing English helps me when I travel to other countries; for now, I can understand other cultures and communicate with people, but maybe speaking English will also help me get a job when I leave school.”

Opening Doors

Ana believes the Oxford Test of English is “a great starting point for showing future generations that they are our hope and that they can conquer the world. Being able to speak English will open doors for them and set them on their journey to success.”


Fast-track your 12 to 16-year-olds’ English language certification with the Oxford Test of English for Schools.

Learn how the test could benefit you and your students on our website.

Oxford Test of English for Schools


Like this? Now read: Watching students find success with the Oxford Test of English

Don’t forget to share this link to our Learning Resources Bank with your students – where they can find additional tips and support to guide them through their English learning journey.


Writing tests for teenagers – where to begin!

Creating items (test questions) for English language assessments is a tricky business, particularly for teens. You need to ensure that the item produces an accurate and valid measurement of the skill you are trying to test while providing the best possible experience for a test taker. In this blog, we’ll look at two important considerations when writing items: context and content. If this whets your appetite, be sure to join me in my Oxford English Assessment Professional Development session where we’ll be exploring in more detail how to write good test items.


Here’s an example of the kind of item you might get in an adult speaking test. But it’s not suitable for teens. Why not?

Some people say that the perks of a job, such as working from home, are more important than the salary. Do you agree or disagree?

As you might have guessed, a 13-year-old may well have some of the linguistic competencies required to tackle this question – describing advantages and disadvantages, comparing ideas, or offering an opinion on the topic – but how many teens have enough experience of work to actually demonstrate those competencies?

So, when it comes to writing items, context is key. Let’s take a look at an alternative question that takes the teen context into consideration.

What are the advantages of homeschooling?

Even for teens who don’t have direct experience of homeschooling, this context is still more accessible to them, meaning they are more able to demonstrate their linguistic competence.


As well as getting the context correct, we also have to get the content correct. Consider this: in our lives, some of us have had landline phones, mobile phones, and smartphones. I bought a Nokia 3210 in 2001, and for the first time ever was able to make phone calls, send messages, and play Snake with one device, all while on the move. When telling my friends about it, I would refer to it as my mobile phone to distinguish it from my landline. And then, much later, I would reference my smartphone to distinguish it from my old phone.

However, for most teenagers, a phone is just a phone, and any talk about non-smart phones will probably just draw blank looks. It might not sound major, but imagine being a teenager in an exam suddenly faced with a phrase that might cause confusion.

In summary

As a test provider, our goal is to solve some of the challenges outlined above. Some of these same challenges exist for teachers who write assessments for their students, and we’ll be talking more about these in the Writing tests for teenagers webinar.

Get practical support and guidance for delivering effective English language assessment online!

Register for our series of webinars delivered by leading OUP ELT Assessment experts:

Register for the webinar


Robin Lee has been working for Oxford University Press for five years and is the product manager for the Oxford Test of English and the Oxford Test of English for Schools. Before joining OUP, he worked as a teacher, teacher trainer, and item writer, mostly in East Asia and Southeast Asia. His interests include data analysis and the use of technology in assessment.


Assessment Literacy – the key concepts you need to know!

Research shows that the typical teacher can spend up to a third of their professional life involved in assessment-related activities (Stiggins and Conklin, 1992), yet a lack of focus on assessment literacy in initial teacher training has left many teachers feeling less than confident in this area. In this blog, we’ll be dipping our toes into some of the key concepts of language testing. If you find this interesting, be sure to sign up for my Oxford English Assessment Professional Development assessment literacy session.

What is assessment literacy?

As with many terms in ELT, there are competing definitions for the term ‘assessment literacy’, but for this blog, we’re adopting Malone’s (2011) definition:

Assessment literacy is an understanding of the measurement basics related directly to classroom learning; language assessment literacy extends this definition to issues specific to language classrooms.

As you can imagine, language assessment literacy (LAL) is a huge area. For now, though, we’re going to limit ourselves to the key concepts encapsulated in ‘VRAIP’.

What’s VRAIP?

VRAIP is an abbreviation for Validity, Reliability, Authenticity, Impact and Practicality. These are key concepts in LAL and can be used as a handy checklist for evaluating language tests. Let’s take each one briefly in turn.


Validity

Face, concurrent, construct, content, criterion, predictive… the list of types of validity goes on, but at its core, validity refers to how well a test measures what it sets out to measure. The different types of validity can help highlight different strengths and weaknesses of language tests, inform us of what test results say about the test taker, and allow us to see if a test is being misused. Take construct validity: it refers to the appropriateness of any inferences made on the basis of the test scores; the test itself is neither valid nor invalid. With that in mind, would you say the test in Figure 1 is a valid classroom progress test of grammar? What about a valid proficiency speaking test?

Figure 1

Student A

Ask your partner the questions about the magazine.


1. What / magazine called?

2. What / read about?

3. How much?



Student B

Answer your partner with this information.

Teen Now Magazine

Download the Teen Now! app on your phone or tablet for all the latest music and fashion news.

Only £30 per year!



Reliability

‘Reliability’ refers to consistency in measurement: however valid a test, without reliability its results cannot be trusted. Yet ironically, there is a general distrust of statistics, reflected in the joke that “a statistician’s role is to turn an unwarranted assumption into a foregone conclusion”. This distrust is often rooted in a lack of appreciation of how statistics work, but the key statistical concepts are well within the average teacher’s grasp. And once you have mastered them, you are in a much stronger position to critically evaluate language tests.
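To make one of those key concepts concrete, here is a toy sketch (not from the original post – the function name and data are invented for illustration) of Cronbach’s alpha, a widely used measure of internal-consistency reliability:

```python
def cronbachs_alpha(item_scores):
    """Cronbach's alpha: internal-consistency reliability.

    item_scores: one list per test item, aligned by test taker,
    e.g. item_scores[0][2] is test taker 2's score on item 0.
    """
    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    k = len(item_scores)                              # number of items
    totals = [sum(col) for col in zip(*item_scores)]  # total score per test taker
    sum_item_var = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))

# Two items that rank the test takers identically are perfectly consistent:
print(cronbachs_alpha([[1, 2, 3], [1, 2, 3]]))  # 1.0
```

Values close to 1 suggest the items are measuring the same underlying trait. In practice a statistical package does this arithmetic for you, but seeing it laid out helps demystify the number.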


Authenticity

The advent of Communicative Language Teaching in the 1970s saw a greater desire for ‘realism’ in the context of the ELT classroom, and since then the place of ‘authenticity’ has continued to be debated. A useful distinction to make is between ‘text’ authenticity and ‘task’ authenticity, the former concerning the ‘realness’ of spoken or written texts, the latter concerning the type of activity used in the test. Intuitively, it feels right to design tests based on ‘real’ texts, using tasks which closely mirror real-world activities the test taker might do in real life. However, as we will see in the Practicality section below, the ideal is rarely realised.


Impact

An English language qualification can open doors and unlock otherwise unrealisable futures. But the flip side is that a lack of such a qualification can play a gatekeeping role, potentially limiting opportunities. As Pennycook (2001) argues, the English language

‘has become one of the most powerful means of inclusion or exclusion from further education, employment and social positions’.

As language tests are often arbiters of English language proficiency, we need to take the potential impact of language tests seriously.

Back in the ELT classroom, a more local instance of impact is ‘washback’, which can be defined as the positive and negative effects that tests have on teaching and learning. An example of negative washback that many exam preparation course teachers will recognise is the long hours spent teaching students how to answer weird, inauthentic exam questions, hours which could more profitably be spent on actually improving the students’ English.

Take the exam question in Figure 2, for instance, which a test taker has completed. To answer it, you need to make sentence B as close in meaning as possible to sentence A by using the upper-case word. But you mustn’t change the upper-case word. And you mustn’t use more than five words. And you must remember to count contracted words as their full forms. Phew! That’s a lot to teach your students. Is this really how we want to spend our precious time with our students?

By the way, the test taker’s answer in Figure 2 didn’t get full marks. Can you see why? The solution is at the end of this blog.

Figure 2

A   I haven’t received an invite from Anna yet.

B   Anna still hasn’t sent an invite.

This type of negative washback typically stems from test design that emphasises reliability at the expense of authenticity. But before we get too critical, we need to appreciate that balancing all these elements is always an exercise in compromise – which brings us nicely to the final concept in VRAIP…


Practicality

There is always a trade-off between validity, reliability, authenticity and impact. Want a really short placement test? Then you’re probably going to have to sacrifice some construct validity. Want a digitally-delivered proficiency test? Then you’re probably going to have to sacrifice some authenticity. Compromise in language testing is inevitable, so we need to be assessment literate enough to recognise when VRAIP is sufficiently balanced for a test’s purpose. If you’d like to boost your LAL, sign up for my assessment literacy session.

If you’re a little rusty, or new to key language assessment concepts such as validity, reliability, impact, and practicality, then my assessment literacy session is the session for you:

Register for the webinar

Solution: The test taker did not get full marks because their answer was not ‘as close as possible’ to sentence A. To get full marks, they needed to write “still hasn’t sent me”.



  • Malone, M. E. (2011). Assessment literacy for language educators. CAL Digest, October 2011.
  • Pennycook, A. (2001). English in the World/The World in English. In A. Burns & C. Coffin (Eds.), Analysing English in a Global Context: A Reader. London: Routledge.
  • Stiggins, R. J., & Conklin, N. F. (1992). In teachers’ hands: Investigating the practices of classroom assessment. Albany: State University of New York Press.


Colin Finnerty is Head of Assessment Production at Oxford University Press. He has worked in language assessment at OUP for eight years, heading a team which created the Oxford Young Learner’s Placement Test and the Oxford Test of English. His interests include learner corpora, learning analytics, and adaptive technology.


Assessment in a Post-Pandemic World

There’s an elephant in the room!

At times, the whole world seems to be falling to pieces around us. Yet the expectation that we carry on and do our best to get through the crisis remains – and rightly so, as learners are looking to educators for guidance and for a way through. I see it as our duty to ensure that the interruption to education is as minimal as possible, and we’re all stepping up to do our bit. That’s why we’re running the Oxford English Assessment Professional Development conference: to provide professional development to teachers who want to know more about assessment. For more information about what else Oxford University Press is doing to support students and teachers, click here.

My session is about assessing online, and by providing access to this kind of professional development for teachers, I hope that our students will ultimately benefit. Now that the elephant called COVID-19 has been addressed, let’s explore what changes it will leave in its wake and how teachers can adapt now to best serve their students.

A changed educational landscape

The current situation means that even teachers who have always avoided online teaching are being forced to deliver lessons and/or content to their students digitally. There’s a spectrum here, from the school that provides a few worksheets to parents to the school that carries out all its lessons via Zoom. Wherever you fall on that spectrum, there’s no denying that we’re all learning to do things differently and, in many ways, the digital revolution in education that has been promised for decades is now being forced upon the world. The impact of these changes is going to last far longer than the pandemic itself.

The continued importance of assessment

Assessment remains important in this new world for all the benefits it brings, and I’ll discuss these more in my talk. In the absence of face-to-face contact, good assessment is more important than ever in providing feedback to students on their learning journey and keeping them engaged and motivated. Delivering this type of assessment online might be a challenge for some teachers, so in this session I’ll talk about some different scenarios where good assessment can be implemented, and I’ll provide you with a toolkit for carrying out assessment online.

Tell me what you want, what you really, really want!

The scenarios I’m going to address are based on what I know about learning, teaching and assessment but I’m not the expert in what’s happening for you right now. It would be awesome if you could leave comments and let me know about any scenarios you would like me to explore or any questions you have about online assessment. I’ll try to include as many as possible in the talk and I’ll make sure there’s a lot of time for questions and discussion. Join me and a community of educators to explore the topic of online assessment in a changed world.



Register for the webinar


Sarah Rogerson is Director of Assessment at Oxford University Press. She has worked in English language teaching and assessment for 20 years and is passionate about education for all and digital innovation in ELT. As a relative newcomer to OUP, Sarah is really excited about the Oxford Test of English and how well it caters to the 21st-century student.


Adaptive testing in ELT with Colin Finnerty | ELTOC 2020

OUP offers a suite of English language tests: the Oxford Online Placement Test (for adults), the Oxford Young Learners Placement Test, the Oxford Test of English (a proficiency test for adults) and, from April 2020, the Oxford Test of English for Schools. What’s the one thing that unites all these tests (apart from them being brilliant!)? Well, they are all adaptive tests. In this blog, we’ll dip our toes into the topic of adaptive testing, which I’ll be exploring in more detail in my ELTOC session. If you like this blog, be sure to come along to the session.

The first standardized tests

Imagine the scene. A test taker walks nervously into the exam room, hands in any forbidden items to the invigilator (a bag, mobile phone, notepad, etc.) and is escorted to a randomly allocated desk, separated from other desks to prevent copying. The test taker completes a multiple-choice test, anonymised to protect against potential bias from the person marking it, all under the watchful eyes of the invigilators. Sound familiar? But imagine this isn’t happening today – it’s happening nearly one-and-a-half thousand years ago.

The first recorded standardised test dates back to the year 606. A large-scale, high-stakes exam for the Chinese civil service, it pioneered many of the examination procedures that we take for granted today. And while the system had many features we would shy away from now (the tests were so long that people died while trying to finish them), this approach to standardised testing lasted some thirteen centuries, coming to an end in 1905. Coincidentally, that same year the next great innovation in testing was established by French polymath Alfred Binet.

A revolution in testing

Binet was an accomplished academic. His research included investigations into palmistry, the mnemonics of chess players, and experimental psychology. But perhaps his most well-known contribution is the IQ test. The test broke new ground, not only for being the first to attempt to measure intelligence, but also because it was the first ever adaptive test. Adaptive testing was an innovation well ahead of its time, and it was another 100 years before it became widely available. But why? To answer this, we first need to explore how traditional paper-based tests work.

The problem with paper-based tests

We’ve all done paper-based tests: everyone gets the same paper of, say, 100 questions. You then get a score out of 100 depending on how many questions you got right. These tests are known as ‘linear tests’ because everyone answers the same questions in the same order. It’s worth noting that many computer-based tests are actually linear, often being just paper-based tests which have been put onto a computer.

But how are these linear tests constructed? Well, they focus on “maximising internal consistency reliability by selecting items (questions) that are of average difficulty and high discrimination” (Weiss, 2011). Let’s unpack what that means with an illustration. Imagine a CEFR B1 paper-based English language test. Most of the items will be around the ‘middle’ of the B1 level, with fewer questions at either the lower or higher end of the B1 range. While this approach provides precise measurements for test takers in the middle of the B1 range, test takers at the extremes will be asked fewer questions at their level, and therefore receive a less precise score. That’s a very inefficient way to measure, and is a missed opportunity to offer a more accurate picture of the true ability of the test taker.

Standard Error of Measurement

Now we’ll develop this idea further. The concept of Standard Error of Measurement (SEM), from Classical Test Theory, is that whenever we measure a latent trait such as language ability or IQ, the measurement will always consist of some error. To illustrate, imagine giving the same test to the same test taker on two consecutive days (magically erasing their memory of the first test before the second to avoid practice effects). While their ‘True Score’ (i.e. underlying ability) would remain unchanged, the two measurements would almost certainly show some variation. SEM is a statistical measure of that variation. The smaller the variation, the more reliable the test score is likely to be. Now, applying this concept to the paper-based test example in the previous section, what we will see is that SEM will be higher for the test takers at both the lower and higher extremes of the B1 range.
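In Classical Test Theory, SEM has a simple formula: the standard deviation of observed scores multiplied by the square root of one minus the test’s reliability. A quick sketch, with purely illustrative numbers (they are not from this post):

```python
import math

def sem(score_sd: float, reliability: float) -> float:
    """Standard Error of Measurement: SEM = SD * sqrt(1 - reliability)."""
    return score_sd * math.sqrt(1 - reliability)

# A test whose scores have an SD of 10 and a reliability of 0.91
# measures to within about +/-3 points (one SEM) most of the time:
print(round(sem(10, 0.91), 1))  # 3.0
```

Notice that as reliability approaches 1, the SEM shrinks towards zero: a perfectly reliable test would give the same score every time.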

Back to our B1 paper-based test example. In Figure 1, the horizontal axis of the graph shows B1 test scores going from low to high, and the vertical axis shows increasing SEM. The higher the SEM, the less precise the measurement. The dotted line illustrates the SEM. We can see that a test taker in the middle of the B1 range will have a low SEM, which means they are getting a precise score. However, the low and high level B1 test takers’ measurements are less precise.

Aren’t we supposed to treat all test takers the same?

Figure 1.

How computer-adaptive tests work

So how are computer-adaptive tests different? Well, unlike linear tests, computer-adaptive tests have a bank of hundreds of questions which have been calibrated with different difficulties. The questions are presented to the test taker based on a sophisticated algorithm, but in simple terms, if the test taker answers the question correctly, they are presented with a more difficult question; if they answer incorrectly, they are presented with a less difficult question. And so it goes until the end of the test when a ‘final ability estimate’ is produced and the test taker is given a final score.
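As a purely illustrative sketch of that up/down logic (real computer-adaptive tests use Item Response Theory and far more sophisticated ability estimation; the item bank and function names here are invented):

```python
def run_adaptive_test(item_bank, answer_fn, n_items=10):
    """Toy staircase-style adaptive test.

    item_bank: list of dicts, each with a 'difficulty' key.
    answer_fn: callable simulating the test taker; returns True
    if they answer the given item correctly.
    """
    items = sorted(item_bank, key=lambda it: it["difficulty"])
    idx = len(items) // 2            # start at medium difficulty
    step = max(1, len(items) // 4)   # big first jumps, then home in
    for _ in range(n_items):
        if answer_fn(items[idx]):
            idx = min(idx + step, len(items) - 1)   # correct -> harder item
        else:
            idx = max(idx - step, 0)                # incorrect -> easier item
        step = max(1, step // 2)
    return items[idx]["difficulty"]  # crude final ability estimate

# Simulate a test taker who can answer anything up to difficulty 0.7:
bank = [{"difficulty": i / 10} for i in range(10)]
print(run_adaptive_test(bank, lambda item: item["difficulty"] <= 0.7))  # 0.8
```

Because each new question is pitched near the current estimate, almost every item carries useful information about the test taker, which is exactly why adaptive tests can be shorter without losing precision.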

Binet’s adaptive test was paper-based and must have been a nightmare to administer. It could only be administered to one test taker at a time, with an invigilator marking each question as the test taker completed it, then finding and administering each successive question. But the advent of the personal computer means that questions can be marked and administered in real-time, giving the test taker a seamless testing experience, and allowing a limitless number of people to take the test at the same time.

The advantages of adaptive testing

So why bother with adaptive testing? Well, there are lots of benefits compared with paper-based tests (or indeed linear tests on a computer). Firstly, because the questions are at just the right level of challenge for each individual, the SEM is the same for every test taker, and scores are more precise than in traditional linear tests (see Figure 2). This means each test taker is treated fairly. Another benefit is that, because adaptive tests are more efficient, they can be shorter than traditional paper-based tests. And because test takers are never faced with questions that are far too difficult, or bored by questions that are far too easy, the experience is less stressful too.

This is all good news for test takers, who will benefit from an improved test experience and confidence in their results.


Figure 2.

Colin spoke further on this topic at ELTOC 2020. Stay tuned to our Facebook and Twitter pages for more information about upcoming professional development events from Oxford University Press.

Colin Finnerty is Head of Assessment Production at Oxford University Press. He has worked in language assessment at OUP for eight years, heading a team which created the Oxford Young Learner’s Placement Test and the Oxford Test of English. His interests include learner corpora, learning analytics, and adaptive technology.

You can catch up on past Professional Development events using our webinar library.

These resources are available via the Oxford Teacher’s Club.

Not a member? Registering is quick and easy to do, and it gives you access to a wealth of teaching resources.


Weiss, D. J. (2011). Better Data From Better Measurements Using Computerized Adaptive Testing. Journal of Methods and Measurement in the Social Sciences, 2(1), 1–27.

Oxford Online Placement Test and Oxford Young Learners Placement Test: https://elt.oup.com/feature/global/oxford-online-placement/

The Oxford Test of English and Oxford Test of English for Schools: www.oxfordtestofenglish.com