Friday, December 10, 2004

Discussion of Andrey Rzhetsky's talk on "chains of collective reasoning in biology"

This was written on December 9th, 2004, while I was preparing my discussion of Andrey Rzhetsky's talk.

Just a couple of days ago, one of my master's students was discussing research projects with me. He said he had been told that if he found something contradicting a claim in his textbook, he had probably made a mistake in his inference. I said that was probably true, especially if we were talking about the fifth edition of a textbook used nationwide.

When it comes to the creative research we do every day, especially on topics deep in the unknown, we lack an authoritative reference. It is hard to say which feels better: knowing definitively that one is wrong, or the torture of uncertainty under challenges from the opposite direction. From time to time, we have to practice, to some extent, the kind of reasoning presented in this talk. That is why I very much appreciate this project.

Think about the setup of the model, and imagine a perfect world where every piece of work gets published. Then the literature will estimate the realized instance of the truth. On the contrary, if many investigators throw away their work because it conflicts with previously published claims, the published record may quickly converge, to the right answer or to the wrong one, depending on the correctness of the initial discoveries. The rate of convergence depends on the values of the censoring parameters.
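To make that intuition concrete, here is a toy simulation. This is my own sketch, not Andrey's actual model, and all the parameter names are mine: each investigator observes the truth with some error probability, but shelves a finding that contradicts the current published majority with probability `censor`. With no censoring, the fraction of correct claims in the literature tracks the investigators' accuracy; with heavy censoring, the record locks onto whichever side the early papers happened to take.

```python
import random

def simulate_literature(truth=True, p_correct=0.8, censor=0.9,
                        n_papers=200, seed=0):
    """Toy model: each investigator observes `truth` correctly with
    probability p_correct; a finding that contradicts the current
    published majority is discarded with probability `censor`."""
    rng = random.Random(seed)
    published = []
    for _ in range(n_papers):
        finding = truth if rng.random() < p_correct else not truth
        if published:
            majority = sum(published) > len(published) / 2
            if finding != majority and rng.random() < censor:
                continue  # result shelved, never submitted
        published.append(finding)
    # fraction of published claims that agree with the truth
    return sum(f == truth for f in published) / len(published)

# With no censoring the literature hovers around p_correct; with heavy
# censoring it converges toward whichever side the early papers took.
print(simulate_literature(censor=0.0))
print(simulate_literature(censor=0.95))
```

Varying `censor` and `seed` shows the dependence on the censoring parameter and on the luck of the initial discoveries.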

One more thing to note about this data analysis is that the data used are the publication record. If I am not mistaken, the model is set up for the submission reasoning alone. It is hard to assume that the chance a paper is accepted for publication is independent of the paper's claim and of what is already known in the literature. If one simply includes another layer of random publication decisions in the model, there will be an identifiability issue, since the submission and publication review processes are confounded. This may be one issue to address in the future. I can't help but amuse myself by thinking that rejected submissions should be recorded and somehow credited, since they would contribute valuable information to this study.
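A one-line way to see the confounding, with hypothetical probabilities of my own choosing: if each result is independently submitted with probability p_submit and accepted with probability p_accept, only the product p_submit * p_accept enters the expected publication rate, so different pairs with the same product are indistinguishable from publication counts alone.

```python
def expected_publication_rate(p_submit, p_accept):
    # Only the product of the two probabilities is observable from
    # counts of published papers, so the two processes are confounded.
    return p_submit * p_accept

a = expected_publication_rate(0.9, 0.4)
b = expected_publication_rate(0.6, 0.6)
print(a, b)  # both about 0.36 -- the pairs cannot be told apart
```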

The first time I looked at the results of this project, I wondered what the estimates would look like for statistics. In statistics, the intuition and rigor of the mathematical backbone give us some sense of authority. Very often I hear my fellow statisticians (or myself) say "it has to be right" or "there must be something wrong in the simulation." Things are different at the frontier of statistics and biology, though, because the complexity of biological systems is not fully understood. Many different statistical methods, based on different understandings and simplifications of the biological models, are being proposed (there are a couple of good examples in the poster presentation on the 10th floor). Maybe we can achieve some good estimates of the truth here through a lot of collaboration, according to Andrey's equations. Maybe uncertainty is a blessing to collaborative projects.