Monday, April 25, 2011

The really scary phone

We all sort of know it is hard to keep an entirely private life in today's world. But we had no idea how much of our personal lives had been infiltrated until the recent "discovery" of iPhone location data.

According to a WSJ article, Apple and Google didn't realize the value of this gigantic pool of data until recently. There is also the recent discovery that social networking and behavioral data can be automatically logged by our cell phones. The use of the data? Better marketing. A recent WSJ article points out that our phones can actually log our body language during phone calls, using the same technology that has made our iPhone games more fun. Therefore, statistical methods can (potentially) be applied to study the association between body language and the content of phone calls. Say A always walks around while calling B, and sits at a desk while calling C. Different social ties, possibly different social influence, and thus different marketing values. Existing research has already reached the point where a trained learning method can predict whether two people are discussing politics without actually logging the conversation. This prediction was based on call history, location, timing of the call, etc.

This seems to suggest a new wave of interesting data, gigantic in size and complex in nature, attractive to both machine/statistical learning researchers and social/behavioral researchers.

PS: Michael Malecki posted the R code he used to analyze his own iPhone data. I can't wait to try this on my own iPhone data, but I have yet to find the time.

Saturday, April 23, 2011

Happy 100th Birthday, my dear Tsinghua!

Tsinghua University is celebrating 100 years since its founding on 4/24. The beautiful campus of Tsinghua is my dearest hometown.

Tuesday, April 12, 2011

2011 Minghui Yu Memorial Conference for PhD students and by PhD students

The doctoral students in the Department of Statistics are pleased to announce the 2011 Minghui Yu Memorial Conference on Sunday, April 17th in the Faculty House on the Columbia University main campus. This event is in honor of Minghui Yu, a student in our department who passed away in a tragic accident three years ago.  
The conference will feature talks by students in statistics, ranging from those just beginning a research program to those who are about to defend dissertations. In addition to being an occasion to remember our friend and colleague, this event will be an opportunity to learn about exciting new research areas emerging from our department.  
This year we are pleased to have a keynote presentation by Professor Jianqing Fan from the Department of Operations Research and Financial Engineering at Princeton University. Contact Stephanie Zhang [ssz2105 -at-] for questions about this event.  
Thank you to the Department of Statistics and the Graduate Student Advisory Council for their generous support.

Wednesday, April 06, 2011

Like radiation, extrapolation is also everywhere

I enjoyed a recent New York Times article immensely. It covered the current discussion surrounding the potentially worldwide radiation pollution caused by the recent nuclear crisis in Japan. Everyone agrees that the leaked radioactive materials are small in amount compared to the ocean into which they were released. The debate is about whether low or very low radiation means no risk or harm.

To answer this question, we need empirical evidence. Data! However, the effects of low radiation have not been statistically established. According to the article, available results on the relation between radiation and health risks have all been obtained at radiation levels greater than 10 rem. "Current estimates by government agencies for risks from low doses rely on extrapolation from higher doses." Wow, extrapolation! Isn't this more dangerous than the radiation itself?

As we repeatedly preach in W1111, the danger of extrapolation is that the regression relationship beyond the range of the observed data can be very different from the one within it. Andrew shared a story from his study on radon. They were faced with the same problem. There were very few sample points with very low radon levels. As a result, without affecting the model fit on the observed data, the predicted Y (risk) can decrease linearly, diminish towards zero, or even curve up (meaning that a little bit of radon is good for you) as the radon level goes to zero.
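Here is a small sketch of that point in Python. Everything in it is made up for illustration: the dose range, the linear "true" relation, the noise level, and the helper name fit_and_extrapolate are my assumptions, not numbers from the radon study or the Times article. It fits polynomials of degree 1, 2 and 3 to simulated risks observed only at doses of 10 and above, then asks each fit what it predicts at dose zero.

```python
import numpy as np


def fit_and_extrapolate(dose, risk, degrees=(1, 2, 3), target_dose=0.0):
    """Fit one polynomial per degree and report (in-sample RMSE, prediction at target_dose).

    Hypothetical helper for illustration only.
    """
    results = {}
    for d in degrees:
        coefs = np.polyfit(dose, risk, deg=d)        # least-squares polynomial fit
        fitted = np.polyval(coefs, dose)             # fitted values on observed doses
        rmse = float(np.sqrt(np.mean((risk - fitted) ** 2)))
        prediction = float(np.polyval(coefs, target_dose))  # extrapolation
        results[d] = (rmse, prediction)
    return results


# Simulated data: doses observed only between 10 and 100 (nothing near zero).
rng = np.random.default_rng(0)
dose = np.linspace(10, 100, 30)
risk = 0.05 * dose + rng.normal(scale=0.5, size=dose.size)  # made-up relation

results = fit_and_extrapolate(dose, risk)
for degree, (rmse, prediction) in results.items():
    print(f"degree {degree}: in-sample RMSE = {rmse:.3f}, "
          f"predicted risk at dose 0 = {prediction:.3f}")
```

The in-sample errors of the three fits should be close to one another, since higher-degree fits can only match the observed points at least as well. The predictions at dose zero are not pinned down by any data, so they are free to disagree, and nothing in the observed range tells us which one to believe.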

Another interesting aspect of this story is the potential long-term impact of the nuclear pollution in Japan. Current studies on radiation only help us understand the effects of short-term, high-dose radiation exposure. The situation surrounding Japan's nuclear crisis requires extrapolation in two directions: to low levels of radiation (as we discussed above) and to longer exposure times. Double extrapolation!

Biologists offer their insights on this based on DNA mutation models and their relation to cancers. Basically, radiation (strong enough radiation, maybe) causes DNA mutation. The simplest mechanism would be that radiation cuts up DNA, and the sequence then gets messed up beyond repair during replication. Such mutations tend to accumulate in our bodies and cause cancers and other disorders in the long run. Based on this biological belief, it is then hypothesized that low radiation, no matter how low, poses a health threat.

So help us Data!

Thursday, March 24, 2011

My Google Status

We were talking about googling someone and Andrew said he is the first Gelman if googled.

I was surprised,
"Gelman, just Gelman?"
"Yes, and I need to work on the Andrew part".

Out of pure curiosity, we tried my name. Tian, first. I was not hopeful, as Tian sounds the same as "sky" in Chinese. And then it came up second! Right after Wikipedia's entry for "Tian". Popularity is a double-edged sword. You will have a Wikipedia page for your name if it is popular enough, and then your own website will be ranked second to the Wikipedia page.

Andrew noticed that Google automatically detected my location and then gave my website an edge. We then tried Chicago. Obviously no Tian in Chicago can beat me!

What about Zheng? I became even less hopeful. After all, this is the 27th most popular last name among all the Chinese in the world. I came in eighth or so this time. Andrew was impressed, "still on the first page! You should put this on your CV." Well, I think I will just blog about it!

Monday, February 14, 2011

The Tiger Parenting

There has been so much talk about the "Tiger Mom" article / book / cultural difference, etc., that I didn't even care to blog about it. I only found out today that Andrew has blogged about it as well. His argument about whether you feel special about yourself made me think. So here are my two cents.

The story of the rabbit and the turtle (you can find so many versions online) essentially summarizes the main message of the education I received, not only from my parents but also from my teachers. The rabbit represents someone who has a lot of talent and the turtle someone who doesn't. The conclusion is two-fold. First, if you don't work hard, your talents are wasted. Second, even for someone with limited talent, working hard has a high reward. Just imagine what hard work will do for someone with a little more talent (you!).

The guideline is then simple: work hard, harder and the hardest, no matter what.

Implicitly, it also means that you are responsible for your own "success". If you "fail", it only means that you didn't work hard enough (you can't be as slow as a turtle, right?).

However, I think a subtle message that didn't get discussed much is the turtle's attitude in the story. The turtle was focused on trying his best, not on winning (at least in some versions of the story). I am sure he enjoyed the race even though he was not at all "good at it." To me, this is the more important lesson of the story: "work hard and enjoy!"