Wednesday, December 17, 2014

Stacked bar-plot to show different allocation profiles.

In our 2010 paper on estimating personal network sizes, we used the following graph to show the non-random mixing matrix we estimated for personal networks:

Each group of bars represents the composition of the social network of a member of a given ego group, broken down into eight groups of alters (types of acquaintances). The figure demonstrates homophily in social networks: individuals tend to form ties with others who are similar to them.

Today, Shirin asked me how I made this plot. Despite its "busy" appearance, it is actually pretty easy to make. Assume you have two ego groups. You then have two vectors of proportions (compositions), each of length 8. We assume the first 4 entries are for males and the last 4 are for females.
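Here is a minimal sketch of the stacking arithmetic in Python. It is not the code behind the original figure. The proportions are made up, the names `stacked_segments`, `bars_for` and `plot_profiles` are hypothetical, and the layout (one bar for the 4 male entries and one for the 4 female entries per ego group) is only one plausible reading of the setup above. Each segment of a stacked bar starts where the previous one ended, so the bottoms are just running sums of the heights.

```python
from itertools import accumulate

# Made-up compositions for two hypothetical ego groups (each sums to 1).
# Assumption: positions 0-3 are male alter groups, 4-7 female alter groups.
profiles = {
    "ego group A": [0.20, 0.15, 0.10, 0.05, 0.10, 0.15, 0.15, 0.10],
    "ego group B": [0.05, 0.10, 0.15, 0.10, 0.15, 0.20, 0.15, 0.10],
}

def stacked_segments(heights):
    """Return a (bottom, height) pair for each segment of one stacked bar."""
    tops = list(accumulate(heights))
    bottoms = [0.0] + tops[:-1]
    return list(zip(bottoms, heights))

def bars_for(profile):
    """Split a length-8 profile into a male bar and a female bar."""
    return {
        "male": stacked_segments(profile[:4]),
        "female": stacked_segments(profile[4:]),
    }

def plot_profiles(profiles):
    """Draw the bars; matplotlib is imported lazily, only when plotting."""
    import matplotlib.pyplot as plt

    colors = ["#c6dbef", "#6baed6", "#2171b5", "#08306b"]  # one per segment
    fig, ax = plt.subplots()
    x, ticks, names = 0, [], []
    for name, profile in profiles.items():
        for sex, segments in bars_for(profile).items():
            for (bottom, height), color in zip(segments, colors):
                ax.bar(x, height, bottom=bottom, color=color, width=0.8)
            ticks.append(x)
            names.append(f"{name}\n{sex}")
            x += 1
        x += 1  # leave a gap between ego groups
    ax.set_xticks(ticks)
    ax.set_xticklabels(names)
    ax.set_ylabel("proportion of network")
    return fig
```

Calling `plot_profiles(profiles)` returns a figure with four bars in two groups. Putting bars from the same ego group next to each other, with a gap between groups, is what gives the plot its "busy" but readable look.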

Wednesday, December 10, 2014

Circlize your visualization!

Arranging a visualization in a circle is a good way of presenting a system of information such as a network. Such a layout is also known as a chord diagram.

I found a nice R package called circlize that provides functions to create a whole range of cool visualizations. Read their tutorials to have as much fun as you would like. Here is the one I like the most. You start with a grid of images like this (Keith Haring’s doodle)

and make it into a cool circular adaptation:

Tuesday, November 18, 2014

OpenIntro Statistics: an online intro stat book with labs

I came across this nice online portal on introductory statistics: OpenIntro Stats. It has a textbook, labs in R or SAS, teacher resources (slides, learning objectives), videos, and much more. Everything is laid out on a nice, accessible platform, including the LaTeX source files. It is a nice resource for learning intro stat, R/RStudio and LaTeX.

Saturday, November 15, 2014

Visualizing An American Day in real time.

Tom Ireland wrote:
The average American's alarm clock goes off at about 7am to get to work just in time for a 9 to 5 job, only to drive back home, have dinner at 6pm and watch a bit of TV before bed at 10:30pm. But how typical is this routine, really?
After reading your blog, I thought you might be interested to know that at peak times, over 1/3 of Americans are watching TV. You can find this and more fascinating information on our visualization, Busy States of America. With new data as yet unpublished from the Bureau of Labor Statistics, you can see how many Americans are doing common, everyday activities right now. View the real-time visualization here: 
I hope you find our display of the typical American's day interesting and share it with your readers. Let me know if you have any questions.
I think the visualization is pretty nice.

Thursday, November 06, 2014

Google NGram

So, Google is scanning all the books ever published and is making good progress. An interesting project spun off from the scanned text is the Google Books Ngram Viewer, which traces the use of words and phrases through publishing history. The raw data is also available for anyone interested in playing with a big set of interesting data. Here is my take on "Statistical learning" vs. "Statistical modeling" vs. "predictive analytics".

Friday, April 04, 2014

Humans versus Machines, Statistics for the win.

In an online interview, Professor Chris Wiggins said:
... Machine learning sits at the intersection of data engineering and mathematical modeling. The thing that makes it different from statistics traditionally, is far more focus on building algorithms.
Another difference, although this is more of a spiritual difference, is that statistics traditionally has had a stronger emphasis in explaining a data set and machine learning has far more interest in building predictive models. For example, when Netflix tells you what movie to watch or when Amazon predicts what book to buy--that’s machine learning. ...
For years, this has been my understanding of the distinction between Statistical Learning and Machine Learning, but it means more when it comes from a machine learning expert. Statisticians are indeed obsessed with finding explanations of the variations we observe in a data set, through modeling, visualization, simulation-based model checking, etc. We are interested in which few features can explain a large proportion of the variation in the outcome of interest, and through what mechanisms. These features are most likely related to the scientific mechanisms behind the data. Understanding such mechanisms is important to any data-driven decision-making process, such as policy making, intervention, and personalized medicine. We do not believe that our models are correct, but we do think they are often very useful. For example, we use a model-based framework to understand what assumptions were made for specific machine learning algorithms. Such understanding is then used to evaluate these algorithms in situations where the assumptions do not hold and to suggest extensions and modifications to make them work better. Modeling is not our goal. Understanding is. Machines do not need to understand the work they are doing, but humans do. For the curious minds, Statistics can help.

Friday, March 14, 2014

Good read for aspiring applied statisticians

Roderick J. Little (2013) In Praise of Simplicity not Mathematistry! Ten Simple Powerful Ideas for the Statistical Scientist, Journal of the American Statistical Association, 108:502, 359-369.

Wednesday, January 15, 2014

MS in Data Science at Columbia

An interview with Susan Murphy

Here is a very interesting interview I read online. Highly recommended.

My favorite quote from this interview, among others, is:
Well, I've got this viewpoint—and I don't know if this is very mature—but I think it's a big game out there, and we all have to be prepared to play it. Everyone is trying to frame things in their own way, and we all have to try and be as educated as possible so we can understand the degree to which it's a game, and know if someone's trying to pull the wool over our eyes. Even if it's someone I agree with!
I totally feel the same way. Having been a statistician for years, I have become more and more aware that I often interpret the world differently from my non-statistician friends.