
Using Data to Design Tests People Don’t Hate

David Saben is on a mission to make taking tests less painful, and he’s using data to do it. In this episode, he’ll discuss reviving methods developed in 1979 to shorten tests and make them more effective, as well as how to use psychometrics to aid in the design and crafting of an effective test.

David Saben: When I see my son, who’s 11 years old, spending three days in testing when I know there’s absolutely no reason for it, that you can do that in an hour.

Ginette Methot: I’m Ginette.

Curtis Seare: And I’m Curtis.

Ginette: And you are listening to Data Crunch.

Curtis: A podcast about how applied data science, machine learning and artificial intelligence are changing the world.

The father of lean startup methodology once said “There are no facts inside the building so get the heck outside.”

The education industry is no different. Sometimes the facts that’ll make your machine learning career are waiting just outside your office. 

Read more at mode.com/mledu


Ginette: Today we chat with David Saben, the CEO and president of Assessment Systems, an organization innovating in psychometrics (the science of assessment).

Dave: I originally started my career in telecommunications, bringing voice and data services into institutions and into learning institutions. And then what I realized is that connecting universities and for-profit schools online really created a huge opportunity for learning: crossing barriers to learn and meeting learners on their terms with online learning courses. And that brought me through this journey of using technology to make better decisions in learning and knowledge, and how we do that effectively. That has started about a 16-year career focused on using data and e-tools to make a better learning environment for everybody and to make us more effective in the way that we gather information and retain information. And that’s brought me into several areas. One is the learning sciences: how do you deliver learning content more effectively? But also the assessment side: how do you measure what folks are learning effectively and painlessly? And that’s brought me on this journey into the assessment industry, really making sure that every exam that’s delivered, whether it’s in a classroom or a licensure exam, is as fast and as fair as possible, and using data to be able to do that.

So really mitigating the risk of human bias when it comes to measuring a human’s abilities, which is a troublesome area, right?

Curtis: Yeah. And now you say effective and painless. And I know most people hate taking tests, so tell me how you approach that.

Dave: Yeah. Well, I think there’s a lot of ways. I think one of the most important ways is that you make the test faster, right? You know, in 1979, the chairman of Assessment Systems helped create a technology called computerized adaptive testing. It uses algorithms to gauge what you know and what you don’t know and then tailors the content that you see: the next item you see gets progressively more difficult or progressively easier depending on your ability. And what that does is reduce test time by about 50%. We see that with the ASVAB exam that’s given to our service men and women to make their testing experience faster and fairer, and we’re starting to see that really across the world with measurement. So really making those exams tailored to the person’s ability, which is really, really important.
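Editor’s sketch (not from the episode): the adaptive idea Dave describes, where the next item gets harder after a correct answer and easier after a wrong one, can be illustrated with a simple staircase rule. Everything below is an assumption made for illustration: the item bank, the difficulty scale, the halving step size, and the function name `run_adaptive_test` are hypothetical, and production adaptive tests typically use more sophisticated statistical models than this.

```python
def run_adaptive_test(item_difficulties, answers_correctly, max_items=5):
    """Administer up to max_items items, stepping difficulty up or down.

    item_difficulties: dict of hypothetical item id -> difficulty (arbitrary scale).
    answers_correctly: callable(difficulty) -> bool, standing in for the examinee.
    Returns (list of administered item ids, final ability estimate).
    """
    remaining = dict(item_difficulties)
    estimate = 0.0   # start in the middle of the scale (assumption)
    step = 1.0       # how far the estimate moves after each response
    administered = []
    while remaining and len(administered) < max_items:
        # Pick the unused item whose difficulty is closest to the current estimate.
        item = min(remaining, key=lambda i: abs(remaining[i] - estimate))
        difficulty = remaining.pop(item)
        administered.append(item)
        if answers_correctly(difficulty):
            estimate += step   # correct: move up, so the next item is harder
        else:
            estimate -= step   # incorrect: move down, so the next item is easier
        step /= 2              # take smaller steps as evidence accumulates
    return administered, estimate


# Hypothetical five-item bank and a simulated examinee who answers
# correctly whenever the item difficulty is at or below 1.5.
bank = {"q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 1.0, "q5": 2.0}
items, ability = run_adaptive_test(bank, lambda d: d <= 1.5, max_items=3)
print(items, ability)  # ['q3', 'q4', 'q5'] 1.25
```

In this toy run the examinee never sees the two easiest items, which is the source of the time savings: items far from the person’s ability add little information.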

You know, what you don’t want to do is give one test that doesn’t change to everyone, ’cause that’s really, really inefficient. If I’m going through the test and I know the content really well, I just fly right through it, but I still have to take that two-hour exam. It doesn’t change based on my ability. Right? That’s a problem. We probably already know over the first, you know, 15 minutes if you have knowledge of that content area or if you don’t. The exam has to adapt. The measurement tool has to adapt, so it makes it fair. That technology has been around for 40 years, but we still struggle with widespread adoption of the technology for a number of reasons. But that’s one of the first ways you do it. The second way you do it is that after folks have taken the exam, you start running what are called psychometric algorithms over the dataset to understand how the items are performing, how that test item performs with different audiences. What you don’t want to have in that examination is an item that’s unfair, that no one’s getting right. Or conversely, you don’t want an item that everybody’s getting [right]. You want to make sure that’s a fair item, and the way that you do that is through data modeling and through the data sciences. And look, the first real, truly vetted data science is psychometrics, which is the measurement of human intelligence.
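Editor’s sketch (not from the episode): one basic way to check how items are performing after an exam is to compute, for each item, the proportion of examinees who answered it correctly and flag items at either extreme. The function name `flag_items` and the 10% and 90% cut-offs are illustrative assumptions, not thresholds Dave cites, and a real item analysis would also compare performance across groups of test takers.

```python
def flag_items(responses, low=0.10, high=0.90):
    """Flag items that almost nobody or almost everybody answers correctly.

    responses: list of per-examinee lists of 0/1 scores, one entry per item.
    low, high: illustrative cut-offs on the proportion correct (assumptions).
    Returns {item_index: (proportion_correct, flag)}.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    report = {}
    for j in range(n_items):
        # Proportion of examinees who got item j right.
        p = sum(row[j] for row in responses) / n_examinees
        if p < low:
            flag = "too hard"
        elif p > high:
            flag = "too easy"
        else:
            flag = "ok"
        report[j] = (p, flag)
    return report


# Made-up scores for four examinees on three items.
scores = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 1],
    [1, 0, 0],
]
print(flag_items(scores))
# {0: (1.0, 'too easy'), 1: (0.0, 'too hard'), 2: (0.5, 'ok')}
```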

Curtis: Let’s dive into that a little bit, ’cause it’s a super interesting topic and I didn’t know about it until we talked. So tell me a little bit about psychometrics. I mean, it’s old and has been around for a long time. So tell me how that developed, and maybe recent advances, or maybe we’re still using the same stuff that’s been around forever.

Dave: Yeah, it’s been around a very, very long time, Curtis. It’s been around over a hundred years. Charles Spearman, who started it originally, was the father of factor analysis. And what he started to do was try to gauge the understanding of intelligence and what that means, and really apply data to it to make sure that we’re understanding those results. So it started with that whole concept, and then: how do you come up with examinations, and what does it mean when someone takes the examination? How do we know that that’s accurate? How do we know that it’s actually gauging and measuring intelligence? And so that started over a hundred years ago and has worked through evidence-based research for that entire time. And so there are schools where folks go; there are folks called psychometricians. These are people with PhDs in tests and measurement. And what they do is work through a number of psychometric models and research models to determine if people understand that content, or better and faster ways of assessing content. And they use all kinds of models. I mean, it started with dimensional reduction and then algorithms to start predicting performance from the data. Again, one of the oldest data sciences, and it started with gauging human knowledge and human performance.

Curtis: So what you’re talking about is the dynamic test, right? So as they’re answering questions, you’re then predicting what the next question should be to gauge their intelligence on a certain content topic.

Dave: That’s right. And using evidence-based research to do that, right? Not arbitrary decisions. So data models that are behind the scenes say, “Okay, you got this item wrong, you got this item correct.” And then really it changes based on individual performance, which is really, really important. Right? I mean, you know, one size doesn’t fit all in measurement, and that’s a really, really important part of it. If you don’t have that, it makes it very, very hard to effectively gauge intelligence or mastery of content.
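Editor’s sketch (not from the episode): one common family of “data models behind the scenes” is item response theory. Under its simplest form, the Rasch (one-parameter logistic) model, the chance of a correct answer depends only on the gap between the examinee’s ability and the item’s difficulty, and ability can be estimated from the pattern of right and wrong answers by maximum likelihood. The choice of the Rasch model, the function names, and the example difficulties are assumptions for illustration; the episode does not say which model Assessment Systems uses.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(difficulties, scores, iterations=20):
    """Maximum-likelihood ability estimate via Newton-Raphson.

    difficulties: difficulty of each administered item (assumed already known).
    scores: 1 for a correct response, 0 for an incorrect one.
    The estimate is undefined for all-correct or all-incorrect patterns,
    so those are rejected.
    """
    if not 0 < sum(scores) < len(scores):
        raise ValueError("need a mix of correct and incorrect responses")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(theta, b) for b in difficulties]
        # First derivative of the log-likelihood: observed minus expected score.
        gradient = sum(u - p for u, p in zip(scores, probs))
        # Test information: negative second derivative of the log-likelihood.
        information = sum(p * (1.0 - p) for p in probs)
        theta += gradient / information
    return theta


# Two of three hypothetical items answered correctly: the estimate lands above 0.
print(estimate_ability([-1.0, 0.0, 1.0], [1, 1, 0]))
```

An adaptive test would re-run an estimate like this after each response and choose the next item near the updated ability.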

Curtis: I’m curious how you even approached training a model like that. Obviously there’s a lot of domain expertise here, I’m assuming, but how do you approach training a model that can dynamically gauge intelligence and give appropriate questions?

Dave: Well, it’s actually a couple of things. You have to start with modern test theory, and modern test theory is where you’re using a training set to calibrate the sample. That’s been around for a very, very long time: the idea of looking at the individual item and gauging what that means, and then the algorithm that goes around it. So it’s really an algorithmic approach to testing. Looking at that one item, how did you perform? And then tailoring another item and understanding the difficulty of that item. It really starts by field testing tests. Every single time someone takes a test, there’s going to be an unscored item in that test. And you’re going to gather data based on people’s performance on that item, right? ’Cause it’s unscored, and you’re going to see how folks are performing. Is the item difficult? Is it not difficult? And that’s how you start training the models, and then you start creating your examination from that and building it based off of that. But it really starts with modeling it over live data and understanding how human beings perform on it. It’s really an important element of it. And so, you know, that’s always best practice in psychometrics: really applying it to live examinations. And unscored items are what we call field testing.
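Editor’s sketch (not from the episode): the field-testing step Dave describes amounts to estimating how difficult a new, unscored item is from live responses. A minimal version, assuming a Rasch model and assuming each examinee’s ability is already known from the scored part of the test, finds the difficulty at which the expected number of correct answers matches the observed number. The function name `calibrate_difficulty`, the search range of -6 to 6, and the data are hypothetical.

```python
import math


def calibrate_difficulty(abilities, scores, lo=-6.0, hi=6.0, tolerance=1e-6):
    """Estimate the Rasch difficulty of one unscored field-test item.

    abilities: ability estimates for the examinees who saw the item
               (assumed known from the scored items).
    scores: each examinee's 0/1 response to the field-test item.
    lo, hi: assumed search range; a difficulty outside it is reported
            as the nearest endpoint.
    """
    observed = sum(scores)
    if not 0 < observed < len(scores):
        raise ValueError("item needs a mix of correct and incorrect responses")

    def expected_correct(difficulty):
        # Expected number of correct answers if the item had this difficulty.
        return sum(1.0 / (1.0 + math.exp(-(theta - difficulty)))
                   for theta in abilities)

    # Bisection: expected_correct falls steadily as difficulty rises.
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if expected_correct(mid) > observed:
            lo = mid   # model predicts too many correct: item must be harder
        else:
            hi = mid   # model predicts too few correct: item must be easier
    return (lo + hi) / 2


# Made-up field-test data: only the strongest of three examinees got the item right,
# so the estimated difficulty comes out above the middle of the scale.
print(calibrate_difficulty([-1.0, 0.0, 1.0], [0, 0, 1]))
```

Once an item has a calibrated difficulty it can join the scored bank, which is what lets an adaptive test choose items matched to each examinee.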

Curtis: The science behind this is fascinating, but you mentioned earlier there are some other roadblocks in terms of applying this and getting adoption for it. Can we go into that a little bit?

Dave: Yeah, I think, you know, this industry has been around a very, very long time. It’s about a $17 million addressable market, and it’s filled with very, very large corporations that have vested interests in keeping examinations long. I’ll give you an example. Say you’re charging per hour to a test sponsor. A test sponsor would be somebody that owns the test. So if we’re talking about, I don’t know, the certified financial planning exam, the Certified Financial Planning Board are the folks that own the examination, but they’re not the ones who administer the exam. Folks that administer the exam would be a company like Pearson Education or Pearson VUE or Prometric, these very, very large assessment providers. And they’re going to charge those sponsors based on the length of the examination. So the longer the exam is, the more revenue you make per examination; the shorter the exam, the less money you can make.

That’s the model in the industry. That’s how it works. And so there’s a vested interest in not making it shorter, right? Definitely a vested interest in it. And then on top of it you have this idea where they say, “Oh, it’s too difficult to do. It’s too complex to do.” That isn’t the case. I mean, there is software out there that helps move this process forward in a really accelerated manner. It doesn’t take somebody with a spreadsheet crunching numbers. We have software that can do that, and we have software that can do it better, faster, and more precisely than human beings could ever do it. And that’s really the big difference there. But there’s definitely a problem with moving these forward. Again, this technology was created in 1979, a year after I was born.

I mean, and we’re still not seeing widespread adoption across the globe. It’s funny, we’re seeing even more rapid adoption outside the U.S. than we’re seeing in the U.S., which is a pretty scary thing. You know, we put on an annual conference. We have this in about a month in Minneapolis, and computerized adaptive test administrators come together from all over the world, and a large, large portion of our audience is from overseas, bringing this technology to emerging markets and emerging countries. It’s very, very interesting.

Curtis: Got it. What are the results for people that adopt this testing approach, and what kind of benefits do they see from it?

Dave: Oh, geez, Curtis. I mean, just a myriad of benefits. So first of all, you can get a more precise instrument, right? You can get an instrument in assessment that is more precise at gauging competency, and also you can reduce time quite significantly. We just converted a national science exam overseas to a computerized adaptive test, and we saw that the test length was reduced by 74%, which is significant.

Curtis: You can’t complain about that.

Dave: That’s right. I mean, you know, it’s testing without tears. You know, all the studying. You don’t have to spend eight days testing. And it breaks my heart, you know, even in my personal life, when I see my son, who’s 11 years old, spending three days in testing when I know there’s absolutely no reason for it, that you can do that in an hour. You don’t have to do it in four hours of testing. You just have to construct the test more intelligently, and you have to leverage technology. You have to leverage computerized testing and computerized adaptive testing. And that’s a really, really important part of what we’re doing. There has to be a push here as a nation to move our testing online. I mean, 90% of the assessments that are delivered in the state of Texas are still delivered as paper-and-pencil exams. We’re still printing paper, and we’re destroying paper, and we’re destroying forests for absolutely no reason. I mean, we’re better than that.

Curtis: How’s that going forward? I mean, obviously there’s some economic interests at play, maybe some political interests at play. How do you overcome some of these more human challenges to actually allow the data science to work?

Dave: To use a, you know, really an unpleasant analogy: how do you eat an elephant? One bite at a time. And so what you do is you take smaller projects and you bite them off, and then you publish white papers and you start really putting it out in the community and start eliminating those barriers. One of the challenges that folks are always going to run into is that, as human beings, we intrinsically don’t like change. If they think something is working, they have to be able to see a painless way to do it. So one of our goals is to create applications and tools that make it easy for folks to convert from a linear examination, which is just a plain old test, to a more adaptive measurement tool.

And that is, you know, you pick off small little parts. The one example that I gave you with the national science competition is one example where we took a linear examination and we made it adaptive. And our hopes are that once you do projects like that, then you start biting off more of the larger projects. You know, the national examinations, right? The tests that kids take in kindergarten all the way through 12th grade that are gauging their mastery of the content that they’re learning during those formative years. You start applying those all the way through, but you start with smaller projects, and you show the efficiency and you show the effectiveness, and then you help educate the community and you help educate test takers.

You know, when, when you go take a test, which you will take, and I will take, we all take tests. You have to ask yourself every time you take it, isn’t there a better way of doing this? Why am I gonna spend six hours taking a test? Haven’t you folks thought of a better way? This seems like a lot of time out of my life and that’s the one resource we can’t make up is time. So it starts with the individual and it starts with us, with us asking “why.” I told my son to use that example, you know, ask your school why, why do you have to take three days out of your classroom instruction to take a test when it literally can be done in an hour. But this is just, it starts with, with informing the public and the public actually saying, “you know what, listen, this is crazy. I’m not going to take the test if it’s going to take six hours out of my life.”

Curtis: Yeah. What do you look for when you’re building out your data science team? Are you looking for PhDs? You know, what kind of skills work in the psychometrics field?

Dave: Well, in our field primarily it’s PhDs in psychometrics, but more precisely, it’s quantitative psychometrics. So, you know, in the industry of psychometrics, there are application-based psychometricians who will go in and work with an organization, sort of helping them understand the models. And then there’s a smaller subset of those folks, which are quantitative psychometricians, and they’re the ones that actually build the models. So we’re looking for quantitative folks, but again, with a very, very specific focus on tests and measurement, because again, this has been around for a very long time. While we’re seeing, you know, data science as an emerging course of study in academia, psychometrics and quantitative psychometrics have been around for a very, very long time. University of Kansas, University of Minnesota: these institutions have had quantitative psychometric schools for a very, very long time.

And so we have an advantage over lots of other areas in that we actually have schools putting out candidates. Now, it’s not enough, because they’re small cohorts of students, maybe 12 in a graduating class, 13 in a graduating class. But at least there’s been, you know, a focus there. In some of the other data sciences, we’re just not seeing it. I mean, now academia is trying to catch up, and we’re seeing this pattern, right, that’s existed for at least the last, what, 15, 20 years, where academia is spending its time trying to catch up with the emerging workforce. And that becomes a very, very large gap to try to cover when you’re talking about complex things like machine learning and neural nets and, you know, intelligent algorithms. But we have an advantage that way.

Curtis: Where do you want to be in five to 10 years? Like, you know, where can this idea of intelligent assessment take us?

Dave: Well, where I’d like to be in five years is in a place where my son isn’t spending eight hours testing. I’d like to be in a place where, you know, testing becomes faster and fairer and we don’t just eliminate testing. What we’re seeing with the Common Core and some of the pushback on testing is that, because the tests are so long and they become really painful for students and for educators, people just want to throw them out. And that isn’t the way to do it. What you do is you just make them better and make them faster. So, you know, my hope is that we will live in a world where we’re assessing more intelligently and faster, and we’re also providing content remediation to people faster.

And I think this can happen. We’ve seen record-breaking investment in educational technology over the last five years, and it’s continuing to grow. We’re seeing, you know, it’s a worldwide field where, globally, folks are interested in measuring ability and measuring content mastery. So I think, as a world together, we’ll be able to really cross this chasm. And so, you know, five years from now, when my son is a sophomore in high school, hopefully that eight-hour exam is down to an hour or down to half an hour, maybe even down to 15 minutes.

Ginette: Thanks for listening. A big thank you to David Saben.

Attributions

Music

“Loopster” Kevin MacLeod (incompetech.com)

Licensed under Creative Commons: By Attribution 3.0 License