A colleague and good friend of mine, Dr. Ying Wei sent me a document that contains a list of R colors. She compiled this list when she was working on her thesis. It is such a beautiful file by itself, not to mention that it can be so helpful at times. Upon approval from Ying, I am sharing this list in this post.
Wednesday, September 06, 2006
Have you seen so many colors in R before?
A colleague and good friend of mine, Dr. Ying Wei sent me a document that contains a list of R colors. She compiled this list when she was working on her thesis. It is such a beautiful file by itself, not to mention that it can be so helpful at times. Upon approval from Ying, I am sharing this list in this post.
Thursday, August 24, 2006
From PDF to Excel Table
PDF2XL, our core software product can aid academic researchers in sociology, economics, political science and other disciplines to extract data from government and general publications. Extracting data from these sources is a prerequisite for numerous quantitative research projects. The data is usually provided in PDF format and PDF2XL can drastically reduce the amount of work involved in getting that data into a statistics-based software such as SPSS or Excel.
PDF2XL is regularly priced at $95, but our initial academic install base (UC Irvine, ASU, WSU and others) and their success, led us to believe strongly in its applicability to any research or academic setting. That is why we are introducing PDF2XL to the academic community by giving a free one-year license of PDF2XL to all academic users for the academic year 2006-2007.
We prepared a section on our website that contains information devoted to the academic community, for example, academic outreach main page and a case study about the use of PDF2XL in quantitative research.I have also attached the case study in PDF format. [http://www.cogniview.com/pdf2xl-academic.php]
Out of curiosity and my love for new geeky softwares, I tried this software on a short PDF file. Below is the screenshot. It allows you to select table area and also lets you correct the table recognition done by the program. I can imagine that this will be useful if I need to deal with PDF files with lots of tables.
Saturday, August 19, 2006
The correlation plot in our JASA "how many X" paper
I thought of providing the data from our paper as example but decided not to since we are not the owners of these data.
Thursday, July 20, 2006
Fitting constrained least square regression in R
Yes. There is something in R for such tasks. This requires a special package called mgcv, which should be installed in standard R configuration. See http://sekhon.berkeley.edu/library/mgcv/html/pcls.html. Especially, check out the option Ain and bin.
Here is an example I wrote:
(copy-paste these lines into R console)
## load the special package first.
library(mgcv)
options(digits=3)
## generate some fake data
x.1<-rnorm(100, 0, 1)
x.2<-rnorm(100, 0, 1)
x.3<-rnorm(100, 0, 1)
x.4<-rnorm(100, 0, 1)
y<-1+0.5*x.1-0.2*x.2+0.3*x.3+0.1*x.4+rnorm(100, 0, 0.01)
## make your own design matrix with one column corresponding to the intercept
x.mat<-cbind(rep(1, length(y)), x.1, x.2, x.3, x.4)
## this is the regular least-square regression
ls.print(lsfit(x.mat, y, intercept=FALSE))
## since you already have an X column for intercept, so no need for lsfit to assume another intercept term.
## the penalized constrained least square regression
M<-list(y=y,
w=rep(1, length(y)),
X=x.mat,
C=matrix(0,0,0),
p=rep(1, ncol(x.mat)),
off=array(0,0),
S=list(),
sp=array(0,0),
Ain=diag(ncol(x.mat)),
bin=rep(0, ncol(x.mat)) )
pcls(M)
Monday, July 03, 2006
News Headline starts with "Statisticians"
This news article outlines a research conducted by a statistician in Oxford University on "how interconnected the human life is".
See the following argument:
It's simple math. Every person has two parents, four grandparents and eight great-grandparents. Keep doubling back through the generations — 16, 32, 64, 128 — and within a few hundred years you have thousands of ancestors.
It's nothing more than exponential growth combined with the facts of life. By the 15th century you've got a million ancestors. By the 13th you've got a billion. Sometime around the 9th century — just 40 generations ago — the number tops a trillion.
But wait. How could anybody — much less everybody — alive today have had a trillion ancestors living during the 9th century?
The answer is, they didn't. Imagine there was a man living 1,200 years ago whose daughter was your mother's 36th great-grandmother, and whose son was your father's 36th great-grandfather. That would put him on two branches on your family tree, one on your mother's side and one on your father's. In fact, most of the people who lived 1,200 years ago appear not twice, but thousands of times on our family trees, because there were only 200 million people on Earth back then.
Simple division — a trillion divided by 200 million — shows that on average each person back then would appear 5,000 times on the family tree of every single individual living today.
But things are never average. Many of the people who were alive in the year 800 never had children; they don't appear on anybody's family tree. Meanwhile, more prolific members of society would show up many more than 5,000 times on a lot of people's trees.
Keep going back in time, and there are fewer and fewer people available to put on more and more branches of the 6.5 billion family trees of people living today. It is mathematically inevitable that at some point, there will be a person who appears at least once on everybody's tree.