Thursday, August 24, 2006

From PDF to Excel Table

I received the following email:

PDF2XL, our core software product, can aid academic researchers in sociology, economics, political science and other disciplines to extract data from government and general publications. Extracting data from these sources is a prerequisite for numerous quantitative research projects. The data is usually provided in PDF format and PDF2XL can drastically reduce the amount of work involved in getting that data into statistics software such as SPSS or Excel.

PDF2XL is regularly priced at $95, but our initial academic install base (UC Irvine, ASU, WSU and others) and their success led us to believe strongly in its applicability to any research or academic setting. That is why we are introducing PDF2XL to the academic community by giving a free one-year license of PDF2XL to all academic users for the academic year 2006-2007.

We prepared a section on our website that contains information devoted to the academic community, for example, academic outreach main page and a case study about the use of PDF2XL in quantitative research. I have also attached the case study in PDF format.

Out of curiosity and my love for new geeky software, I tried this program on a short PDF file. Below is the screenshot. It allows you to select the table area and also lets you correct the table recognition done by the program. I can imagine that this will be useful if I need to deal with PDF files with lots of tables.

Saturday, August 19, 2006

The correlation plot in our JASA "how many X" paper

Andrew emailed me saying that he has received a number of requests for the code that we used to make that figure. So here I have prepared a more thoroughly annotated version of the code. R code.

I thought of providing the data from our paper as an example but decided not to, since we are not the owners of these data.