Monday, April 25, 2011

The really scary phone

We all sort of know that it is hard to keep an entirely private life in today's world. But we had no idea how much of our personal life had been infiltrated until the recent "discovery" of iPhone location data.

According to a WSJ article, Apple and Google didn't realize the value of this gigantic pool of data until recently. There is also the recent discovery that our cell phones can automatically log social networking and behavioral data. The use of the data? Better marketing. Another recent WSJ article points out that our phones can actually log our body language during calls, using the same sensors that have made our iPhone games more fun. Statistical methods could then, potentially, be applied to study the association between body language and the content of phone calls. Suppose A always walks around while calling B but sits at a desk while calling C. That suggests different social ties, possibly different social influence, and thus different marketing value. Existing research has already reached the point where a trained learning method can predict whether two people are discussing politics without recording the conversation. The prediction was based on call history, location, the timing of the call, and so on.

This seems to suggest a new wave of interesting data, gigantic in size and complex in nature, attractive to both machine/statistical learning researchers and social/behavioral researchers.

PS: Michael Malecki posted the R code he used to analyze his own iPhone data. I can't wait to try it on my own iPhone data, though I have yet to find the time.

Saturday, April 23, 2011

Happy 100th Birthday, my dear Tsinghua!

Tsinghua University is celebrating the 100th anniversary of its founding on 4/24. The beautiful campus of Tsinghua is my dearest hometown.

Tuesday, April 12, 2011

2011 Minghui Yu Memorial Conference for PhD students and by PhD students

The doctoral students in the Department of Statistics are pleased to announce the 2011 Minghui Yu Memorial Conference on Sunday, April 17th in the Faculty House on the Columbia University main campus. This event is in honor of Minghui Yu, a student in our department who passed away in a tragic accident three years ago.  
The conference will feature talks by students in statistics, ranging from those just beginning a research program to those who are about to defend their dissertations. In addition to being an occasion to remember our friend and colleague, this event will be an opportunity to learn about exciting new research areas emerging from our department.  
This year we are pleased to have a keynote presentation by Professor Jianqing Fan from the Department of Operations Research and Financial Engineering at Princeton University. Contact Stephanie Zhang [ssz2105 -at-] for questions about this event.  
Thank you to the Department of Statistics and the Graduate Student Advisory Council for their generous support.

Wednesday, April 06, 2011

Like radiation, extrapolation is also everywhere

I immensely enjoyed a recent New York Times article. It covered the current debate over the potentially worldwide radiation pollution caused by the nuclear crisis in Japan. Everyone agrees that the amount of leaked radioactive material is small compared with the ocean into which it was released. The debate is about whether a low, or very low, dose of radiation means no risk or harm.

To answer this question, we need empirical evidence. Data! However, the effects of low-dose radiation have not been statistically established. According to the article, the available results on the relation between radiation and health risks were all obtained at radiation levels above 10 rem. "Current estimates by government agencies for risks from low doses rely on extrapolation from higher doses." Wow, extrapolation! Isn't this more dangerous than the radiation itself?

As we repeatedly preach in W1111, the danger of extrapolation is that the regression relationship beyond the range of the observed data can be very different from the relationship within it. Andrew shared a story from his study of radon, which faced the same problem. Because there were very few sample points with very low radon levels, the predictive model of Y (risk) could, without noticeably changing the fit to the observed data, decrease linearly, taper off toward zero, or even curve up (meaning that a little bit of radon is good for you) as the radon level goes to zero.
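To make the point concrete, here is a minimal Python sketch on made-up numbers (not Andrew's radon data and not real radiation data): five hypothetical dose levels between 10 and 50 rem, with nothing observed below 10. It fits a straight line and a quadratic by ordinary least squares, then compares how well each fits the observed range with what each predicts at zero dose. The data, the risk units, the helper names `polyfit`, `predict`, and `sse`, and the choice of these two model forms are all assumptions made for illustration.

```python
# Hypothetical illustration only: invented dose/risk numbers, arbitrary risk units.
# Two least-squares fits that agree on the observed range (10-50 rem)
# but disagree when extrapolated down to zero dose.


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [b0, b1, ..., b_degree] for b0 + b1*x + ... .
    """
    n = degree + 1
    # Normal equations (X^T X) b = X^T y, where X[i][k] = xs[i] ** k
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution on the upper-triangular system
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (rhs[i] - tail) / a[i][i]
    return coef


def predict(coef, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coef))


def sse(coef, xs, ys):
    """Residual sum of squares on the observed data."""
    return sum((y - predict(coef, x)) ** 2 for x, y in zip(xs, ys))


# Made-up data: every dose is 10 rem or above; nothing is observed below.
DOSES = [10, 20, 30, 40, 50]
RISKS = [1.7, 2.3, 3.4, 4.5, 5.7]

if __name__ == "__main__":
    for name, degree in [("linear", 1), ("quadratic", 2)]:
        coef = polyfit(DOSES, RISKS, degree)
        print(f"{name:9s} SSE on 10-50 rem: {sse(coef, DOSES, RISKS):.3f}  "
              f"predicted risk at 0 rem: {predict(coef, 0):.2f}")
```

With these invented numbers, both fits track the five observed points closely (a residual sum of squares of about 0.124 for the line and 0.021 for the quadratic, which is bound to be smaller since it has one more parameter), yet at zero dose the line predicts about 0.46 and the quadratic about 1.06. Nothing in the 10 to 50 rem data can tell us which, if either, is right below 10 rem.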

Another interesting aspect of this story is the potential long-term impact of the nuclear pollution in Japan. Current studies of radiation only help us understand the effects of short-term, high-dose exposure. The situation surrounding Japan's nuclear crisis requires extrapolation in two directions: into low levels of radiation (as discussed above) and into longer periods of exposure. Double extrapolation!

Biologists offer their insights based on models of DNA mutation and its relation to cancer. Basically, radiation (maybe only strong enough radiation) causes DNA mutations. In the simplest mechanism, radiation cuts up DNA, and the sequence then gets messed up during replication, beyond repair. Such mutations tend to accumulate in our bodies and cause cancers and other disorders in the long run. Based on this biological reasoning, it is hypothesized that radiation, no matter how low the dose, poses a health threat.

So help us Data!