Wednesday, February 11, 2015

A statistical read about gender splits in teaching evaluations

A recent article on the NYT's Upshot shared a visualization project built on teaching evaluations from "Rate My Professor", 14 million of them. Reading this article after a long day in the office made me especially "emotional." It confirmed that I have not been delusional.
It suggests that people tend to think more highly of men than women in professional settings, praise men for the same things they criticize women for, and are more likely to focus on a woman’s appearance or personality and on a man’s skills and intelligence.
Actually, the study didn't find that people focus on a woman's appearance as much as expected.

The article made an important point: "The chart makes vivid unconscious biases." The 14 million reviews were not posted to intentionally paint a biased picture of female professors compared with their male colleagues. Universities did not assign only star male professors to teach alongside mediocre female professors. Online teaching evaluations have known biases, as people who feel strongly about what they have to say are more likely to post reviews. But this selection bias cannot explain away the "gender splits" observed. They are due to "unconscious biases" towards women.

What does this term, "unconscious biases", actually suggest? It suggests that if you think you are being fair to your female colleagues, students or professors, you probably are not. If the biases are unconscious, how can we possibly assert that we do not have them? Most of those who wrote the 14 million reviews must have felt they were giving fair reviews. Therefore, statistically speaking, if 14 million reviews that were intended to be "fair" carried so much unconscious bias, we have to do more than just try to be fair in order to offset these biases.

Friday, January 30, 2015

Lego, sampling and bad-behaving confidence intervals

Yesterday, during the second lecture of our Introduction to Data Science course for students in non-quantitative programs, we did a sampling demo adapted from Andrew Gelman's teaching book (A Bag of Tricks).

Change from candies to legos. The original teaching recipe uses candies. A side effect is that the instructor always ends up with a lot of leftover candy, as students are getting more and more health conscious. So this time, I decided to use lego pieces. One advantage of this change is that we can skip the kitchen scale and just count the number of studs (or "points") on the lego pieces.

Preparation. The night before, I counted out two bags of 100 lego pieces each: population A and population B. Population A consists of about 30 large pieces and 70 tiny pieces. Population B consists of 100 similar pieces (4 studs, 6 studs and 8 studs).

In-class demo. At the beginning of the lecture, we explained to the students what they needed to do and passed one bag to half of the class and the other bag to the other half, along with data recording sheets.

Results. Before class, I asked Ke Shen, an MA student in our program who is very good at visualization and R, to create an R Shiny app for this demo, where I could quickly key in the numbers and display the confidence intervals.

Here are population A samples.
Here are population B samples. 

Conclusion. Several things we noticed from this demo:
  1. Sampling lego pieces can be pretty noisy.
  2. All samples of population A over-estimated the true population mean (the red line). Samples of population B seemed to do better.
  3. Population variation affects the width of the confidence intervals.
  4. But even the wider confidence intervals missed the true mean because of the large bias.
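
A small simulation can illustrate why the population A intervals behave so badly. The sketch below is in Python (not the R Shiny app used in class) and rests on assumptions that are not part of the demo: the stud counts for population A (8 for large pieces, 1 for tiny ones), the exact 34/33/33 split of population B, a sample size of 5, and, most importantly, the idea that a hand reaching into the bag picks each piece with probability proportional to its size. All function names are made up for illustration.

```python
import random
import statistics

# Hypothetical stud counts. The demo only fixed "about 30 large and 70 tiny"
# pieces for A and "4, 6 and 8 studs" for B; the exact values are assumptions.
POP_A = [8] * 30 + [1] * 70
POP_B = [4] * 34 + [6] * 33 + [8] * 33


def size_biased_sample(population, n, rng):
    """Draw n pieces without replacement, each draw weighted by stud count."""
    remaining = list(population)
    sample = []
    for _ in range(n):
        idx = rng.choices(range(len(remaining)), weights=remaining, k=1)[0]
        sample.append(remaining.pop(idx))
    return sample


def confidence_interval(sample, z=1.96):
    """Normal-approximation 95% interval for the population mean."""
    mean = statistics.mean(sample)
    half_width = z * statistics.stdev(sample) / len(sample) ** 0.5
    return mean - half_width, mean + half_width


def coverage(population, n=5, reps=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    truth = statistics.mean(population)
    hits = 0
    for _ in range(reps):
        low, high = confidence_interval(size_biased_sample(population, n, rng))
        hits += low <= truth <= high
    return hits / reps


if __name__ == "__main__":
    for name, pop in (("A", POP_A), ("B", POP_B)):
        print(name, "true mean:", statistics.mean(pop), "coverage:", coverage(pop))
```

Under these assumptions, the intervals for the mixed bag A should cover the true mean far less often than those for the homogeneous bag B, which matches points 2 and 4 above: a wide interval centered in the wrong place is still wrong.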

Wednesday, December 17, 2014

Stacked bar-plot to show different allocation profiles.

In our 2010 paper on estimating personal network sizes, we used the following graph to show the non-random mixing matrix we estimated for personal networks:

Each group of bars represents the composition of the social network of a member of a certain ego group, broken down into eight groups of alters (or types of acquaintances). This figure demonstrates the homophily phenomenon in social networks: individuals tend to form ties with others who are similar to them.

Today, Shirin asked me how I made this plot. Despite its "busy" appearance, it is actually pretty easy to make. Assume you have two ego groups. You then have two vectors of proportions (compositions), each of length 8. We assume the first 4 entries are for male alter groups and the last 4 are for female alter groups.
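
The idea can be sketched in a few lines. What follows is not the code behind the figure in the paper; it is a minimal Python illustration with made-up proportions and hypothetical names (`male_ego`, `female_ego`, `stack_segments`, `plot_profiles`). Each segment of a stacked bar starts where the previous one ended, so all that is needed is the running total of the proportions.

```python
from itertools import accumulate

# Made-up compositions for two ego groups over 8 alter groups
# (first 4 entries: male alters, last 4: female alters). Each sums to 1.
male_ego = [0.30, 0.20, 0.10, 0.05, 0.15, 0.10, 0.06, 0.04]
female_ego = [0.12, 0.10, 0.08, 0.05, 0.25, 0.20, 0.12, 0.08]


def stack_segments(proportions):
    """Return a (bottom, height) pair for each segment of one stacked bar."""
    tops = list(accumulate(proportions))
    bottoms = [0.0] + tops[:-1]
    return list(zip(bottoms, proportions))


def plot_profiles(profiles, labels):
    """Draw one stacked bar per ego group (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for x, props in enumerate(profiles):
        for k, (bottom, height) in enumerate(stack_segments(props)):
            # Use the same color for the same alter group in every bar.
            ax.bar(x, height, bottom=bottom, color=f"C{k}")
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_ylabel("Proportion of network")
    return fig
```

Calling `plot_profiles([male_ego, female_ego], ["Male egos", "Female egos"])` gives two bars whose segment heights can be compared directly. With real estimates, homophily would show up as taller same-gender segments in each bar.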

Wednesday, December 10, 2014

Circlize your visualization!

Using a circular layout is a good way of presenting a system of information such as a network. One well-known example of such a layout is the chord diagram.

I found a nice R package called circlize that provides functions to create a whole range of cool visualizations. Read their tutorials to have as much fun as you would like. Here is the one I like the most. You start with a grid of images like this (Keith Haring’s doodle)

and make it into a cool circular adaptation:

Tuesday, November 18, 2014

OpenIntro Statistics: an online intro stat book with labs

I came across this nice online portal on introductory statistics: OpenIntro Stats. It has a textbook, labs in R or SAS, teacher resources (slides, learning objectives), videos, and much more. Everything is laid out on a nice, accessible platform, including the LaTeX source files. It is a nice resource for learning intro stat, R/RStudio and LaTeX.