For the last three months or so I’ve been in school (which is why I haven’t been posting as much lately). Not a real brick-and-mortar school, though: I’ve been participating in the “Data Science Specialization” series of online courses created by faculty at the Johns Hopkins Bloomberg School of Public Health and offered by Coursera, a startup in the online education space. It’s been an interesting experience, and well worth a blog post.
The obvious first question is, why am I doing this? Mainly because I thought it would be fun. I was an applied mathematics (and physics) major in college, enjoyed the courses I had in probability, statistics, stochastic processes, etc., and wanted to revisit what I had learned and (for the most part) forgotten. It’s one of my hobbies, and a (bit) more active one than watching TV or reading. Also, I’ve done some minor fiddling with statistics on the blog (for example, looking at Howard County election data), am thinking about doing more in the future, and wanted a better grounding in how best to do it. Finally, “data scientist” has been one of the most hyped job categories of the last few years, and even though I probably won’t have much occasion to use this stuff in my current job, it certainly can’t hurt to learn new skills in anticipation of future ones.
The next question is, why an online course? Because I didn’t have the time (or the money) to commit to attending an in-person class, but I wanted the structure that a formal class provides. I’ve been (re)learning linear algebra out of a textbook for over four years now, and I still haven’t gotten past chapter 3. Part of the reason is that I’m doing every exercise and blogging about it, but mainly it’s that I don’t have an actual deadline to finish my studies. In the Coursera series there are nine courses, each lasting a month, with quizzes every week and course projects every 2-4 weeks depending on the course. I’ve been doing pretty well in the courses thus far and don’t want to spoil my record. For example, the first project in the current class was due Sunday but I was concerned about missing the deadline and so finished it last Friday night.
I like the way the series of courses is structured as well: not just as a class in statistics, but as coverage of the whole range of skills needed to wrangle data in its various forms, not least the problems of obtaining datasets and cleaning them up. Each class thus far has been only a month long, so the overall commitment is bounded, and I know any work I do today will pay off in a completed course not too far down the road. Week to week, though, it is a fairly serious commitment of time, especially since the video lectures cover only a fraction of what you need to know to do the course projects and correctly answer the more difficult quiz questions. I’ve spent close to 10 hours each week on various aspects of the classes, including a great deal of Internet searching to find the additional information I need. But it’s been time well spent: I feel like I’m getting a good understanding of how to do “data science” tasks. Not that I know everything, but I have a much better picture of what I need to know, and what it would take to finish learning it.
The course I’m currently taking (“Exploratory Data Analysis”), like the others in the series, is what’s been referred to as a MOOC, or “massive open online course”, open at no charge to anyone in the world who wants to participate over the Internet. The instructors provide video lectures and create the quizzes and class projects but are not otherwise directly involved; the students help each other in online discussion forums, assisted by “community TAs”, i.e., former students who volunteer as teaching assistants. MOOCs have recently been the subject of both hype and skepticism; now that I’ve been involved with them day-to-day I can offer a personal perspective on the controversy.
First, I think MOOCs are good for the sort of people who invented them in the first place: Internet-savvy folks with a technological bent who are motivated to learn something and have the free time, background experience, and knowledge to do so effectively. I’ve certainly appreciated having convenient no-charge access to a wide variety of classes, many of which (like the courses I’m taking now) have been put together by people who are leaders and innovators within their fields. I’d even consider paying for at least some of these courses (at $49 each) in order to get a more formal “verified certificate” (as opposed to a “statement of accomplishment”), and may do so for later courses in this series. That’s potentially good news for Coursera, which in the end is a profit-making enterprise.
However, for people who are not Internet-savvy, not all that motivated, or lacking the necessary background, MOOCs aren’t a good choice. In fact, they’re about the worst choice there is. The dropout rates in MOOCs are extremely high (well above 90% in many cases), and the first serious test of MOOCs as a replacement for in-person college courses (at San Jose State University) was not a raging success. That’s not to say that online learning in general is doomed; in its more traditional forms (for example, University of Maryland University College) it’s doing quite fine.
MOOCs are simply the latest in a long line of attempts to move away from the traditional classroom model and “disrupt” the existing educational establishment. They’ll eventually find a place in the overall educational picture, most likely serving a variety of needs: “learning as a hobby” (what I’m doing), high-end vocational education (what Coursera competitor Udacity seems to be morphing into), and supplements to traditional classes. But that’s for the future, and no real concern of mine; in the meantime I’m just trying to learn how to plot in R.