Friday, April 04, 2014

Humans versus Machines, Statistics for the win.

In an online interview, Professor Chris Wiggins said:
... Machine learning sits at the intersection of data engineering and mathematical modeling. The thing that makes it different from statistics traditionally, is far more focus on building algorithms.
Another difference, although this is more of a spiritual difference, is that statistics traditionally has had a stronger emphasis in explaining a data set and machine learning has far more interest in building predictive models. For example, when Netflix tells you what movie to watch or when Amazon predicts what book to buy--that’s machine learning. ...
For years, this has been my understanding of the distinction between Statistical Learning and Machine Learning, but it means more coming from a machine learning expert. Statisticians are indeed obsessed with finding explanations for the variation we observe in a data set, through modeling, visualization, simulation-based model checking, and so on. We are interested in which few features can account for a large proportion of the variation in the outcome of interest, and through what mechanisms. These features are most likely related to the scientific mechanisms behind the data. Understanding such mechanisms is important to any data-driven decision-making process, such as policy making, intervention, and personalized medicine. We do not believe that our models are correct, but we do think they are often very useful. For example, we use a model-based framework to understand what assumptions were made for specific machine learning algorithms. Such understanding is then used to evaluate these algorithms in situations where the assumptions do not hold, and to suggest extensions and modifications that make them work better. Modeling is not our goal. Understanding is. Machines do not need to understand the work they are doing, but humans do. For curious minds, Statistics can help.

Friday, March 14, 2014

Good read for aspiring applied statisticians

Roderick J. Little (2013). In Praise of Simplicity not Mathematistry! Ten Simple Powerful Ideas for the Statistical Scientist. Journal of the American Statistical Association, 108:502, 359-369.

Wednesday, January 15, 2014

MS in Data Science at Columbia

An interview with Susan Murphy

Here is a very interesting interview I read online. Highly recommended.

My favorite quote from this interview, among others, is:
Well, I've got this viewpoint—and I don't know if this is very mature—but I think it's a big game out there, and we all have to be prepared to play it. Everyone is trying to frame things in their own way, and we all have to try and be as educated as possible so we can understand the degree to which it's a game, and know if someone's trying to pull the wool over our eyes. Even if it's someone I agree with!
I feel exactly the same way. Having been a statistician for years, I have become more and more aware that I often interpret the world differently from my non-statistician friends.

Friday, November 01, 2013

R Commander: why is clicking better than typing?

I came across a GUI for R called R Commander. It resembles a typical, more user-friendly interface in which users can explore drop-down menus and select (basic) procedures to apply to their data. I do not find it easier than writing my own R script, but I think it can actually be a blessing to people (e.g., students) who have never written a single script before coming into an Intro Stat class.

I think the main reason this makes life easier for certain user groups is that you don't have to remember much to get started. The interface has the same structure as the other interfaces of a personal computer's operating system (PC or Mac), so you understand, more or less, what you are supposed to do. Most students should therefore already have the essential skills needed to use R Commander before taking the intro stat course, even if they do not have the skills needed for R itself.

The regular R console is another story. You can copy and paste the examples from a teacher's lecture notes without having a clue about what you are doing, which is understandably frustrating. If a student in an intro stat class decides to go deeper into statistics, s/he will eventually need to learn how to program (R, C, Perl, Python, or whatever). This will naturally become more interesting (or less frustrating) once the student is already into statistics.