t+z statistics: March 2006

Wednesday, March 29, 2006

Mean, median, mutual funds and DoND

This morning I saw a commercial in which a man cheered, "More than half of our mutual funds outperform the market's average!" I couldn't help thinking about this statement statistically.

It would be an absolutely dumb statement if it said "median" instead of "average". For "average", if the distribution of returns is highly right-skewed, the mean sits above the median and fewer than half of all funds beat it, so having a more than 50% probability of performing above the mean is a good indicator. But can the copywriter of that commercial be so statistically sophisticated? This could be an amusing w1111 example in the future, even though mutual funds may not be the most appropriate subject: I used to have students who complained about the car examples I used, since they had little experience with automobiles.
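The mean-versus-median point can be checked with a small R sketch. It assumes, purely for illustration, that fund returns follow a standard lognormal distribution (nothing in the commercial says so) and computes the share of funds above each benchmark.

```r
# Illustrative assumption: returns ~ lognormal(meanlog = 0, sdlog = 1).
meanlog <- 0
sdlog   <- 1

dist_median <- exp(meanlog)                 # median of a lognormal
dist_mean   <- exp(meanlog + sdlog^2 / 2)   # mean of a lognormal

# Probability that a single fund beats each benchmark
p_above_median <- plnorm(dist_median, meanlog, sdlog, lower.tail = FALSE)
p_above_mean   <- plnorm(dist_mean,   meanlog, sdlog, lower.tail = FALSE)

cat("P(return > median) =", round(p_above_median, 3), "\n")
cat("P(return > mean)   =", round(p_above_mean, 3), "\n")
```

Under this assumed distribution only about 31% of funds beat the mean, so a fund family in which more than half do would indeed stand out.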

This reminded me of my conversation with Ying Wei this past Monday. It was about the loss of efficiency when estimating the mean of a Gaussian distribution using the sample median rather than the sample mean. That in turn reminded me of the recent popular show on NBC: Deal or No Deal (DoND).
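The efficiency loss is easy to see by simulation. The sketch below is only an illustration: the seed, sample size and number of replications are arbitrary choices of mine. It compares the sampling variance of the mean and the median for standard normal data; asymptotically the ratio is 2/pi, about 0.64.

```r
set.seed(2006)   # arbitrary seed, only for reproducibility
n    <- 101      # sample size per replication (hypothetical choice)
reps <- 5000     # number of simulated samples

samples    <- matrix(rnorm(n * reps), nrow = reps)  # each row is one N(0, 1) sample
est_mean   <- rowMeans(samples)
est_median <- apply(samples, 1, median)

# Relative efficiency of the median with respect to the mean
rel_eff <- var(est_mean) / var(est_median)
cat("simulated relative efficiency:", round(rel_eff, 3), "\n")
cat("asymptotic value 2/pi:        ", round(2 / pi, 3), "\n")
```

In other words, for Gaussian data the median needs roughly 1.5 times as many observations as the mean to reach the same precision.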

Here is my version of what that show is about: the show starts with 26 closed cases containing 26 fixed money values ranging from $0.01 to $1,000,000. The contestant opens several cases at random, in batches. The opened cases are eliminated from the board (i.e., they cannot be won by the contestant). After each batch, the show pauses. Looking at the remaining unrevealed amounts, a banker offers the contestant an amount of money to stop (and give up the chance of winning the biggest remaining value, of course). If the contestant refuses the offer, he/she has to eliminate one or more amounts by random guessing, which may well make the next offer drop.

From the contestant's standpoint, he/she should accept an offer that is higher than the MEDIAN of the remaining values, since he/she only plays once. If he/she keeps on playing, there is at least a 50-50 chance that he/she leaves with a value lower than the offer. On the banker's side, he needs to make an offer that is much lower than the mean, since he plays the game many times. Thus, it is no surprise to me that every time the banker makes an offer, it is much lower than the mean of the remaining values. I have yet to figure out the magical amount (offer minus median) and the reasoning behind it.
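To get a feel for how far apart the two benchmarks are, here is a rough R sketch. The post only gives the range of the amounts, so the board below is an assumption (the 26 amounts used on the US show, as far as I know), and the batch of six opened cases is a hypothetical first round.

```r
# Assumed opening board (not given in the post): the 26 amounts of the US show.
board <- c(0.01, 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750,
           1000, 5000, 10000, 25000, 50000, 75000, 100000, 200000,
           300000, 400000, 500000, 750000, 1000000)

# Summaries the contestant (median) and the banker (mean) would look at
summarise_board <- function(values) {
  c(n = length(values), median = median(values), mean = mean(values))
}

print(summarise_board(board))

# Hypothetical first round: six cases opened at random and removed.
set.seed(1)
opened    <- sample(seq_along(board), 6)
remaining <- board[-opened]
print(summarise_board(remaining))
```

On the full assumed board the median is $875 while the mean is above $130,000, so any offer between the two looks generous to a one-time player and still cheap to the banker.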

Sunday, March 26, 2006

Blog as an alternative to group website

On March 23rd, in Tom's office, we discussed building a website for our "connection" research group (or the "how many X's do you know" group). For something like this to work, someone must take up some tedious work. I suggested a blog. Andrew said he was happy with a blog except for one concern--he could not directly attach a file to his blog post. I was not sure whether blogger.com blogs could do this, so I didn't insist on the idea. Later I tried the "upload file" function of the blogger.com editor and put document1.pdf in the same folder as the blog. However, it did not automatically generate a link to that file; the link needs to be added manually. It is a WORKABLE setup, but not the most convenient one we might wish for.

Regarding security, we can always implement .htaccess-level limited access. Try clicking on my teaching site for w1111. It is pretty easy to set up, but I haven't figured out how to let users change their passwords occasionally. I suppose it is not our biggest concern now.

Friday, March 24, 2006

Data: the only and narrow window

Today, in an ISERP lunch group, Peter Hoff from the University of Washington gave a talk on latent factor models for network data. It is nice to have a 3-hour discussion on a topic, since you get to stop, think and discuss a particular problem without worrying about running out of time.

One of the questions I had was whether the latent factors fitted to the data correspond to demographic characteristics of the nodes in the network. Before I asked, I knew the answer would be "not necessarily". The latent factors just provide a way, or a model, to decompose the variation structure of a network into more interpretable factors that represent the initiator and the receiver of an edge.

There was also other discussion along this direction. I didn't catch all of it since I was busy working out some simple numerical examples to help myself understand better. Then I heard Andrew say: "We cannot claim to infer the data generating mechanism behind the data. We can only infer a data generating mechanism that can generate the data observed."

This reminded me of my thoughts on data and models.

Data (limited observed values) always classify all possible models into equivalence classes. For example, in regression, n points (x_i, y_i) define classes of curves that go through the same values at the x_i's. Regression analysis is simply trying to find the class closest to the data. In a modeling effort, the targeted model space intersects the data's equivalence classes. After the intersection, if more than one model remains in an equivalence class, we get the identifiability issue.
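A toy R example of such an equivalence class, with made-up data and two made-up curves: the curves differ everywhere except at the observed x_i's, so the data cannot tell them apart and they are at exactly the same distance from the data.

```r
# Hypothetical data: five design points and responses
x <- c(1, 2, 3, 4, 5)
y <- c(1.2, 1.9, 3.2, 3.8, 5.1)

# A bump that is zero at every observed x_i and non-zero elsewhere
bump <- function(t) sapply(t, function(ti) prod(ti - x))

f1 <- function(t) t                    # a straight line
f2 <- function(t) t + 0.5 * bump(t)    # a different curve, same values at the x_i's

rss <- function(f) sum((y - f(x))^2)   # distance from a curve to the data

cat("RSS of f1:", rss(f1), "\n")
cat("RSS of f2:", rss(f2), "\n")
cat("f1 and f2 at x = 2.5:", f1(2.5), f2(2.5), "\n")
```

If the model space contains only straight lines, f2 is excluded and the class collapses to a single member; if it contains all polynomials, both remain and the data alone cannot identify which one generated them.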

We can only understand the world to the extent that the data allow. When we ask others about the size of their data sets, we may just sound like coworkers comparing offices: "how's the view in your new office?" "pretty good! the window's much bigger than what I used to have" "wow, nice! you can see so much more now!"

Thursday, March 23, 2006

Finally, found it!

I used to run DOS commands in Splus to automate data generation (naming folders, etc.) using dos( ). But R does not have dos( ). Before, I had only help.searched 'dos' and didn't get what I wanted. Today, something just clicked: I help.searched 'system' and found that the function shell( ) was exactly what I needed.
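For the record, a minimal sketch of this kind of automation in R. The folder names and the run_cmd() helper are hypothetical; shell( ) is only available in R for Windows, so the helper falls back to system( ) elsewhere, and the folders themselves are created with dir.create( ) from base R rather than through an OS command.

```r
# Folder names built programmatically (hypothetical naming scheme)
base_dir <- file.path(tempdir(), "simulation")
run_dirs <- file.path(base_dir, sprintf("run%02d", 1:3))

# Base R can create folders directly, without going through the OS shell
for (d in run_dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# For other OS commands: shell() exists only in R for Windows,
# while system() is available on every platform.
run_cmd <- function(cmd) {
  if (.Platform$OS.type == "windows") {
    shell(cmd, intern = TRUE)
  } else {
    system(cmd, intern = TRUE)
  }
}

print(basename(list.dirs(base_dir, recursive = FALSE)))
print(run_cmd("echo done"))
```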

Remembering stress

I find I wake up to a different mood every day. Sometimes I can come up with a probable explanation, but most of the time I don't know why. On some lucky days, I wake up very excited about the day of work ahead of me. And that is what I want. It is not that I am a workaholic (even though it would not be a bad idea for me to become one). It is just that I will go to work no matter what my mood is, and excitement about work makes the day so much more enjoyable AND productive. I remember I was in such a mood one January morning last year, and then I had a bad fall on the icy street. Then my "work high" disappeared for a couple of months.

Today I woke up feeling exhausted. Every time this happens, I just want to take something that will boost my mood into an unreasonable "work high". I vaguely recall reading about the medical cause of depression, where I learned that our moods are affected by some enzyme in our brain. So I thought maybe the level of that-whatever-it-is thing in my head is highly variable. Maybe there is a way to stabilize it (doing Yoga maybe?).

So I went online and googled.

There is absolute proof that people suffering from depression have changes in their brains compared to people who do not suffer from depression. The hippocampus, a small part of the brain that is vital to the storage of memories, is smaller in people with a history of depression than in those who've never been depressed. A smaller hippocampus has fewer serotonin receptors. Serotonin is a neurotransmitter -- a chemical messenger that allows communication between nerves in the brain and the body. What scientists don't yet know is why the hippocampus is smaller.
Investigators have found that cortisol (a stress hormone that is important to the normal function of the hippocampus) is produced in excess in depressed people. They believe that cortisol has a toxic or poisonous effect on the hippocampus. It's also possible that depressed people are simply born with a smaller hippocampus and are therefore inclined to suffer from depression.

Okay. It is not really an enzyme. The hippocampus (hippo-campus?), which is in charge of memory storage, is smaller in depressed patients. Hmm ... interesting, I thought. Most people who have had depression went through things they DO want to forget. Sometimes we hear ourselves saying "I am trying to forget about this", especially during stressful events. This can be interpreted as a signal to our brain (of course, we think using our brain, don't we?), and our brain takes the hint and signals the hippocampus to become smaller.

So we probably should keep remembering everything no matter how frustrating it is. :)

Tuesday, March 21, 2006

Direct 3D Robot Control

Robert Kass came to our department yesterday and gave a talk on Bayesian curve fitting and neuron firing rate modeling. During his presentation, he showed a segment of video from one of his collaborators' labs in which a monkey controlled a robot arm through its thoughts. The monkey's desire to eat was sensed by neuron sensors on its head and translated by some computational algorithm. Dr. Kass mentioned that similar experiments have been planned in which the method developed by him and his colleagues is to be used in the translation. It is moments like this that make doing applied statistics rewarding.

Tuesday, March 14, 2006

A helpful command in R

setwd(dir) -- set current working directory (base package)
getwd() -- get current working directory (base package)

I used to use my own path string in each program to control the data input/output path. I think this is even easier.
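A small sketch of how the two commands fit together. The folder and file names are made up, and tempdir( ) is used only so that the example runs anywhere.

```r
# Hypothetical project folder; tempdir() is used so the sketch runs anywhere
data_dir <- file.path(tempdir(), "project_data")
dir.create(data_dir, showWarnings = FALSE)

old_wd <- setwd(data_dir)   # setwd() invisibly returns the previous directory
cat("now working in:", getwd(), "\n")

# Relative paths are resolved against the working directory
write.csv(data.frame(x = 1:3), "example.csv", row.names = FALSE)
dat <- read.csv("example.csv")

setwd(old_wd)               # restore the original working directory
```

Saving the value returned by setwd( ) and restoring it at the end keeps a script from leaving the session in an unexpected folder.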
