Today I was invited to give comments at the book signing presentation by Seth Stephens-Davidowitz on his book "Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are". Here are my comments, which were random statistical thoughts I had while reading this book.
- “many economists to pay more attention to human behavior”
- “people are not rational”
- Reminded me of the most interesting aspect of this book: behind every Google search, there is a human decision.
- Google search expanded our knowledge.
- There are two types of knowledge: the facts that you know and the facts that you know how to find when needed.
- Google increased the amount of the second type of knowledge by multiple orders of magnitude.
- In 2012, there were an estimated 3.3 billion Google searches per day, 47K a second.
- Now: 63K a second, according to Internet Live Stats.
- Google ranking algorithm:
- How does it change how we acquire knowledge and evaluate evidence?
- Does it personalize results based on your location and prior search patterns?
- In what ways does a Google search differ from people asking friends for opinions or suggestions?
- Google consumer confidence report 2017
- Trust in Google remains high as 72.3% of respondents trust the accuracy of Google search results.
- 63.7% of respondents don’t know how Google makes money from search.
- 65.3% said they would not want more relevant Google search results if it meant Google would use their search history to generate their results – something which Google is doing anyway.
- Google Flu Trends story
- In 2008, researchers from Google experimented with predicting trends in seasonal flu based on people’s searches.
- They published a paper in Nature, explaining the intuitive idea that people who are coming down with the flu would search for flu-related information on Google, providing almost instant signals of overall flu prevalence.
- The research behind this paper was based on machine learning algorithms constructed using search data fitted against real-world flu tracking information from the Centers for Disease Control and Prevention.
- In 2013, Google Flu Trends missed the peak of the 2013 flu season by 140 percent.
- In a 2014 Science paper, researchers found that
- Google’s GFT algorithm simply overfitted.
- It is sensitive to seasonal terms unrelated to the flu, like “high school basketball.”
- Google’s GFT algorithm also did not consider changes in search behavior over time, including influences from Google’s own new search features.
- It was suggested that GFT should be updated in collaboration with CDC.
- This example shows that there is information in “big data,” but using that information to derive correct knowledge and insights requires careful modeling and interdisciplinary collaboration.
- Natural language is hard
- Actually science is hard.
- Google searches reveal secrets
- Related to research conducted by my collaborator Sarah Cowan.
- Pay attention to data collected and not collected: https://www.squawkpoint.com/2015/01/sample-bias/
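
The overfitting point in the Google Flu Trends story can be illustrated with a toy simulation. This is only a minimal sketch and not GFT's actual method: the function names, the number of candidate terms, and the use of pure-noise series are all assumptions made for illustration. It picks, out of many random "search term" series, the one that best tracks a short "flu" series, and then checks how well that term tracks flu on held-out weeks.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spurious_selection_demo(n_terms=2000, n_train=20, n_test=20, seed=0):
    """Select the noise 'term' most correlated with 'flu' on training weeks.

    Returns (train_r, test_r): the absolute correlation of the selected
    term with the flu series on the training weeks and on held-out weeks.
    All series here are hypothetical, independent Gaussian noise.
    """
    rng = random.Random(seed)
    n = n_train + n_test
    # Hypothetical weekly flu signal (pure noise in this toy setup).
    flu = [rng.gauss(0, 1) for _ in range(n)]
    # Candidate search-term series, all unrelated to flu by construction.
    terms = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_terms)]
    # Keep the term that looks best on the training weeks only.
    best = max(terms, key=lambda t: abs(pearson(t[:n_train], flu[:n_train])))
    train_r = abs(pearson(best[:n_train], flu[:n_train]))
    test_r = abs(pearson(best[n_train:], flu[n_train:]))
    return train_r, test_r
```

Calling `spurious_selection_demo()` returns the in-sample and held-out correlations of the selected term. Because every candidate is noise, a high in-sample value is purely an artifact of searching over many terms with few data points, which is the kind of problem that a seasonal term such as “high school basketball” represents.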