turadg asked: I really like how you and other Khan Academy data scientists post your analyses. As a graduate student in HCI and learning sciences, I've been surprised that none of the analyses (that I've seen!) have any cognitive modeling. Is that true? If so, why? Thanks.
While I’m certainly nowhere close to giving up, in my limited experience comparing cognitively based models with, let’s say, more purely empirical models, the purely empirical models have performed much better.
Online education is a hot field right now. That means a lot of programmers and data scientists are interested in learning about education and instruction; a lot of teachers are rethinking the potential uses of technology and analytics. What may be less apparent or accessible to the newly interested, though, is the fantastic body of research available to them through the field of learning science(s).
The science of learning is concerned with questions like: How do people learn? How does learning vary across subjects, people, and environments? What reliably improves or harms learning? What is known about the effectiveness of online learning? Can we use psychological interventions to boost learning?
With answers to questions like these, online education professionals can have a huge head start on building great learning experiences.
Over the last year, I’ve tried to acquire a solid footing in the learning sciences. Given the surging interest in the field, I made this reading list in an attempt to streamline the process for others.
The list was made with a few goals in mind.
Here’s the list:
How People Learn - (Free after signup with NAP). If you only have time to read one book, this is your winner. Summarizes decades of research in the science of learning and highlights its connection to educational practices. This is the foundation on which everything below is built.
e-Learning and the Science of Instruction - Summary of how learning science informs best practices when designing and delivering learning experiences online.
Knowing What Students Know / Executive Summary - (free from NAP) Learn about the science of effective assessment practices, and the shortcomings of many current-day assessments. Presents a framework for building assessments designed to improve learning, not just score it.
Why Don’t Students Like School? / Sample Article - Written by a cognitive scientist, this overlaps heavily with How People Learn. But it was a fun, short read, and did as much as any reading to drive the learning principles home for me. And at this point you deserve some fun, right? It’s well worth a couple hours and ten bucks.
How Students Learn Math, History, Science / Executive Summary - Follow-up report to How People Learn with chapters written by expert educators in each field. Even greater emphasis on application to the classroom.
Yes, the list is made entirely of books. No blogs, no zippy articles. I’m sorry. But the truth is, while I read a lot of articles, it was through these books that I felt like I was deeply learning and my perspective was changing. And given what I have learned in these books, I think I might know why! Built into the learning experience of this list are:
- Spacing - Most of the books were not read in one sitting. Revisiting them multiple times helped my retention.
- Repetition and practice - While the books have lots of overlap, the redundancy provided extra reinforcement and helped me master the vocabulary so I could begin to focus on the concepts.
- Lots of examples - Book-length writing provided room for the authors to use ample concrete examples. Working through many examples is the most proven path to expert knowledge.
- Thinking about meaning - Something about the commitment and pace of reading a book encouraged me to take time and reflect more deeply about what I read. And as you will learn, knowledge is memory, and memory is the residue of thought.
The list is a work in progress, and I’d love to hear your own suggestions via comment here or on Twitter.
Bayesian networks (and probabilistic graphical models more generally) are cool. We computer geeks can love ‘em because we’re used to thinking of big problems modularly and using data structures. But better than being cool, they’re useful. Especially if you have the kind of problem that involves hundreds or thousands of interrelated variables, any one of which you might want to predict based on some subset of the others. Did I mention that your variables can have noisy, missing, or just plain unobservable data?
At Khan Academy, we’ve got problems like that.
So I sat down to start applying these tools, and there were plenty of online resources to help with the book learnin’. I’m not going to duplicate them—I recommend this or that. When I started coding, though, I had several practical questions and really wanted a simple code example. I didn’t find a great one, so I’m posting what I came up with in the hope of helping the next soul.
THE WORKING EXAMPLE
For our example problem, let’s say there is a hidden (unobservable) binary variable T for every Khan Academy user that represents whether they have mastered a given topic (1=mastered, 0=not mastered). You can think of a topic as a collection of any N exercises. While topic mastery is hidden, we can partially observe performance on the exercises in that topic. I say ‘partially’ because the user may not do problems on all of the exercises. So the example program needs to handle missing data for all of the exercise variables E_i, which for simplicity can also be binary variables (1=good performance, 0=bad). In the idiom of Bayes nets, then, our graph is a single hidden parent T with an arrow to each child E_i.
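Before any learning happens, it can help to see the generative story of this graph in code. Here’s a minimal NumPy sketch that simulates data from it; the parameter names and values (`p_T`, `p_E_given_T`, the ~30% missingness rate) are illustrative stand-ins, not Khan Academy’s actual numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_exercises = 2000, 5

# Hypothetical "true" parameters for the simulation.
p_T = 0.4                                              # P(T = 1): prior on mastery
p_E_given_T = np.array([0.9, 0.85, 0.8, 0.9, 0.75])    # P(E_i = 1 | T = 1)
p_E_given_notT = np.array([0.3, 0.2, 0.4, 0.25, 0.35]) # P(E_i = 1 | T = 0)

# Sample the hidden parent, then each child conditioned on it.
T = rng.random(n_users) < p_T
probs = np.where(T[:, None], p_E_given_T, p_E_given_notT)
E = (rng.random((n_users, n_exercises)) < probs).astype(float)

# Knock out ~30% of entries to simulate exercises a user never attempted.
E[rng.random(E.shape) < 0.3] = np.nan
```

Each row of `E` is one user’s (partially observed) exercise performance; `T` is the hidden mastery bit the learner will have to impute.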
Of course, lots of problems could be modeled by this simple graph structure of a hidden parent variable with a collection of children with missing data. Let me know if you invent another interesting application!
Here’s the code. It contains functions to learn from a simulated example or from a data file representing the evidence from your child variables. The learning algorithm is an expectation-maximization, of which the most interesting piece is the expectation step where we must impute the hidden T-variable given whatever E-variable evidence is available for that data sample. To see the algebra worked out using Bayes rule, check out this excellent write up courtesy of Jascha.
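The linked code is the real reference; as a rough illustration of the algorithm’s shape only, here is a compact NumPy version of EM for this model. All names and initialization choices here are mine, and the E-step is the Bayes-rule imputation described above (done in log space for numerical safety):

```python
import numpy as np

def em_fit(E, n_iters=50):
    """EM for one hidden binary parent T with binary children E_i.

    E: (n_users, n_exercises) array of 0/1, with np.nan where data is missing.
    Returns estimates (p_T, p1, p0) where p1[i] = P(E_i=1 | T=1), p0[i] = P(E_i=1 | T=0).
    """
    n_users, n_ex = E.shape
    obs = ~np.isnan(E)
    Ez = np.nan_to_num(E)                  # zeros where missing; masked via `obs`

    # Asymmetric initialization so the two mixture components don't collapse.
    rng = np.random.default_rng(1)
    p_T = 0.5
    p1 = rng.uniform(0.55, 0.95, n_ex)
    p0 = rng.uniform(0.05, 0.45, n_ex)

    for _ in range(n_iters):
        # E-step: impute P(T=1 | observed E's) per user via Bayes' rule.
        ll1 = np.where(obs, Ez * np.log(p1) + (1 - Ez) * np.log(1 - p1), 0).sum(axis=1)
        ll0 = np.where(obs, Ez * np.log(p0) + (1 - Ez) * np.log(1 - p0), 0).sum(axis=1)
        log_odds = np.log(p_T) - np.log(1 - p_T) + ll1 - ll0
        q = 1 / (1 + np.exp(-log_odds))    # posterior P(T=1) for each user

        # M-step: just weighted counting of the (expected) outcomes.
        p_T = q.mean()
        p1 = (q[:, None] * obs * Ez).sum(axis=0) / (q[:, None] * obs).sum(axis=0)
        p0 = (((1 - q)[:, None] * obs * Ez).sum(axis=0)
              / ((1 - q)[:, None] * obs).sum(axis=0))
        p1, p0 = p1.clip(1e-6, 1 - 1e-6), p0.clip(1e-6, 1 - 1e-6)
    return p_T, p1, p0
```

Note the M-step: with the posterior `q` in hand, the parameter updates are nothing but weighted counts.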
If for some sad reason you have an aversion to matrices (or NumPy), you might also take a look at my rough draft version written with the excellent Pandas library instead of just NumPy. (It doesn’t support handling of missing data for the E-variables and I ended up converting strictly to NumPy for speed.)
What I find fascinating (and what the Pandas version illustrates nicely) is that the most complicated math really needed here is… counting! The idea that we can learn the full joint distribution with nary a gradient or a step size parameter in sight kind of feels like magic to me.
PRACTICAL TIPS AND TRICKS, TRIALS AND ERRORS
A few lessons learned here and over the years:
Despite the simplicity of this example, it is already Really Useful. I created a data set for a subset of the Khan Academy exercises, which you can download here. I heuristically defined each E-variable as whether a user answered at least 85% of problems on that exercise correctly. Once I had learned the full joint distribution (“theta” in the code), I could infer the probability of mastery for a given user on any exercise, including exercises for which they had not yet done any problems. When I plugged those predictions in as an additional feature to our accuracy model, it was a highly significant feature, especially on the first few problems done for an exercise.
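That inference step can be sketched in a few lines. This assumes the parameters are stored as a naive-Bayes-style factorization (`p_T`, `p1`, `p0`, names mine) rather than the `theta` array in the linked code, but the math is the same Bayes-rule imputation:

```python
import numpy as np

def predict_unseen(e_row, p_T, p1, p0):
    """Predict P(E_j = 1) for every exercise, given one user's partial evidence.

    e_row: length-n array of 0/1, with np.nan where the user has no attempts.
    p_T: prior P(T=1); p1[j] = P(E_j=1 | T=1); p0[j] = P(E_j=1 | T=0).
    """
    obs = ~np.isnan(e_row)
    e = np.nan_to_num(e_row)
    # Posterior P(T=1 | observed evidence), via Bayes' rule in log space.
    ll1 = np.where(obs, e * np.log(p1) + (1 - e) * np.log(1 - p1), 0).sum()
    ll0 = np.where(obs, e * np.log(p0) + (1 - e) * np.log(1 - p0), 0).sum()
    q = 1 / (1 + np.exp(-(np.log(p_T / (1 - p_T)) + ll1 - ll0)))
    # Mixture prediction for every exercise, attempted or not.
    return q * p1 + (1 - q) * p0
```

A user who does well on the exercises we did observe gets a higher posterior mastery `q`, which in turn raises the predictions for the exercises they haven’t touched yet.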
Of course, there are many different ways we could construct new features to summarize performance across a pool of exercises, but this is a clean, robust, and transparent option. It’s easily extended to a full hierarchical model of our knowledge map. And the graphical modeling framework is powerful enough that we can eventually accommodate temporal effects, a decision-making agent, prior knowledge of experts (hello, teachers!), and more.
I’m tremendously excited about the potential of probabilistic graphical models to power optimized online learning at Khan Academy and elsewhere. If you’re interested in learning about these modeling techniques in general, hustle over to Stanford’s free and recently-launched online course, and follow me on Twitter for more practical examples and updates. If you want to directly improve the future of education, there’s a place to do that, too.