Tuesday, July 22, 2008

Ups and downs of the percentage of working men and women

The New York Times published an article today titled "Women Are Now Equal as Victims of Poor Economy." The whole article can be found here. I found the time plots in this article very interesting. They use some good graphing tactics, for example color and shading, and a Y axis that actually runs from 0 to 100%.

Monday, July 21, 2008

The Black Swan

"The Black Swan" is a best-seller by Nassim Nicholas Taleb, the author of an even bigger best-seller, "Fooled by Randomness." Both books have received more than 300 reviews online. I flipped through the second book in a bookstore and thought it was interesting, especially for teaching statistics to curious and skeptical undergraduate students.

Today, I stumbled upon a special issue of The American Statistician (August 2007) devoted to praise and criticism (mainly the latter) of "The Black Swan." The reason for such a strong reaction from the usually low-key statistical profession is obvious from a few of the excerpts quoted in TAS:

“Statisticians . . . are computing people, not thinkers.”

“Statisticians, it has been shown, tend to leave their brains in the classroom, and engage in the most trivial inferential errors when they are let out on the streets.”

It is worth mentioning that this special issue even caught the attention of Bloomberg News, and the editor of TAS was interviewed for a news story about the book.

TAS also invited Taleb to write a response for this special issue. In it, Taleb said that the book's main criticisms of statistics concern:

1. The unrigorous use of statistics, and reliance on probability in domains where the current methods can lead us to make consequential mistakes (the “high impact”) where, on logical grounds, we need to force ourselves to be suspicious of inference about low probabilities.

2. The psychological effects of statistical numbers in lowering risk consciousness and the suspension of healthy skepticism—in spite of the unreliability of the numbers produced about low-probability events.

3. Finally TBS is critical of the use of commoditized metrics such as “standard deviation,” “Sharpe ratio,” “mean-variance,” and so on in fat-tailed domains where these terms have little practical meaning, and where reliance by the untrained has been significant, unchecked and, alas, consequential.

This is essentially a set of cautions about prediction (extrapolation) based on models, about the effects of outliers and rare events, and about uses of statistics that are motivated by specific probability models. I don't think any good applied statistician would deny the importance of cautious interpretation of statistical analysis, or would "commit" the mistakes outlined above. In fact, I sometimes feel that the conclusions that can be drawn from data analysis are very limited. The usefulness of statistical inference and analysis lies primarily in narrowing down hypotheses and possibilities.
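Taleb's third point, that the standard deviation has little practical meaning in fat-tailed domains, is easy to see in a small simulation. The R sketch below is only an illustration, not something taken from TAS or the book: it assumes the Cauchy distribution as a stand-in for a "fat-tailed domain" and computes the sample standard deviation on growing subsamples, next to the same computation for normal data. For the normal sample the value settles near 1; for the Cauchy sample, which has no finite variance, it does not settle down, because each new extreme draw can move it a lot.

```r
# Sketch: sample standard deviation for thin-tailed (normal) versus
# fat-tailed (Cauchy) data. The Cauchy choice is an illustrative assumption.
set.seed(2008)  # arbitrary seed, only for reproducibility

n <- 1e5
x_normal <- rnorm(n)    # finite variance, true sd = 1
x_cauchy <- rcauchy(n)  # no finite mean or variance

# Sample sd computed on growing prefixes of each sample
sizes <- c(100, 1000, 10000, 100000)
sd_by_size <- data.frame(
  n      = sizes,
  normal = sapply(sizes, function(k) sd(x_normal[1:k])),
  cauchy = sapply(sizes, function(k) sd(x_cauchy[1:k]))
)
print(sd_by_size)
```

Rerunning with other seeds changes the Cauchy column a great deal and the normal column very little, which is exactly why a single "standard deviation" figure can be misleading for fat-tailed data.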

A proposed solution for improving integrity in medical research, centered on statisticians' participation

This may be one of the most interesting articles from the president's column of AMSTAT News that I have read recently. It responds to an editorial in the Journal of the American Medical Association about "impugning the integrity of medical science." As part of the proposed solution to the integrity problem in the pharmaceutical industry, the editorial suggests that clinical trial data collected by a for-profit organization (usually a company) should be analyzed by an academic statistician. The AMSTAT News article challenged this proposal, first by pointing out the implicit accusation against industrial statisticians, and then by noting the conflict of interest that remains for an academic statistician who is paid to do the analysis. Another problem with this solution, according to the AMSTAT News article, is that it separates the design and experimental phases of a study from the analysis phase. I largely agree with the AMSTAT News article. As someone who has done some analysis for medical investigators, my first reaction was: "Since when have statisticians become the lead investigators of such studies? Who decides on the actions that will affect the integrity of a study?"

Monday, February 25, 2008

Identifying gene-gene interactions that are relevant to a disease outcome

In collaboration with Dr. Dimitris Anastassiou (EE, Columbia), we just published a paper, "Identification of gene interactions associated with disease from gene expression data using synergy networks," in BMC Systems Biology.
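As a rough illustration of the idea in the title: one information-theoretic way to measure the synergy of a gene pair (G1, G2) with respect to a disease outcome C is I(G1, G2; C) - I(G1; C) - I(G2; C), the information the pair carries jointly about C beyond what the two genes carry separately. The R sketch below assumes already-discretized expression values and simple plug-in frequency estimates; the helper names mutual_info() and synergy() are hypothetical, and this is not the estimation procedure used in the paper.

```r
# Plug-in mutual information (in bits) between two discrete vectors.
# Hypothetical helper for illustration only.
mutual_info <- function(x, y) {
  p_xy <- table(x, y) / length(x)  # joint relative frequencies
  p_x  <- rowSums(p_xy)            # marginal of x
  p_y  <- colSums(p_xy)            # marginal of y
  terms <- p_xy * log2(p_xy / outer(p_x, p_y))
  sum(terms[p_xy > 0])             # empty cells contribute 0 (0 * log 0 = 0)
}

# Synergy of a gene pair with respect to an outcome:
# I(G1, G2; C) - I(G1; C) - I(G2; C). Hypothetical helper.
synergy <- function(g1, g2, outcome) {
  joint <- interaction(g1, g2, drop = TRUE)  # treat the pair as one variable
  mutual_info(joint, outcome) -
    mutual_info(g1, outcome) - mutual_info(g2, outcome)
}

# Purely synergistic toy case: the outcome is the XOR of two binary "genes",
# so neither gene alone says anything about it, but the pair determines it.
g1 <- c(0, 0, 1, 1)
g2 <- c(0, 1, 0, 1)
outcome <- xor(g1, g2)
synergy(g1, g2, outcome)  # 1 (bit)

# Purely redundant toy case: both "genes" carry the same information.
g <- c(0, 0, 1, 1)
synergy(g, g, g)          # -1 (bit)
```

Positive values flag pairs whose joint behavior matters more than either gene alone, which is the kind of gene-gene interaction the title refers to; negative values indicate redundancy.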

Tuesday, February 19, 2008

Tiling Arrays

Here are some references on this technology for my own use.

  1. Global Identification of Human Transcribed Sequences with Genome Tiling Arrays
    Science 306(5705): 2242-2246, 24 December 2004
    DOI: 10.1126/science.1103388
    Paul Bertone, Viktor Stolc, Thomas E. Royce, Joel S. Rozowsky, Alexander E. Urban, Xiaowei Zhu, John L. Rinn, Waraporn Tongprasit, Manoj Samanta, Sherman Weissman, Mark Gerstein, Michael Snyder
  2. Array of hope
    Nature Genetics 21: 3-4, 1999
    E. S. Lander
  3. Model-based analysis of tiling-arrays for ChIP-chip
    PNAS 103(33): 12457-12462, 15 August 2006
    W. Evan Johnson, Wei Li, Clifford A. Meyer, Raphael Gottardo, Jason S. Carroll, Myles Brown, X. Shirley Liu

Standard Deviation Function in R

I always thought there was no standard deviation function in R, since in Splus it was stdev(), if I remember correctly. I never ran help.search() for it, and only just found out that there is one: sd().
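For anyone else who missed it, here is a small sketch of sd() in use, on a made-up vector chosen only so the arithmetic is easy to check by hand. sd() uses the n - 1 denominator, so it equals sqrt(var(x)); the population version (denominator n) has to be computed separately. The help.search() call is left as a comment because it searches all installed help pages and can be slow.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # toy data: mean 5, sum of squared deviations 32

sd(x)          # sample sd, sqrt(32 / 7), about 2.138
sqrt(var(x))   # same value: sd() is the square root of var()

# Population sd (denominator n), if that is what you need
n <- length(x)
pop_sd <- sqrt(sum((x - mean(x))^2) / n)
pop_sd         # 2

# How one could have found sd() from inside R:
# help.search("standard deviation")
```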