Bestselling author and data scientist Seth Stephens-Davidowitz delivered the fall 2019 Atkinson Lecture: “Search and Discover: What the Internet and Big Data Reveal about Who We Are.” He shared the story of Jeff Seder, who used data science to find the perfect racehorse.
After graduating from Harvard, Harvard Law and Harvard Business School, Seder decided the banking life wasn’t for him. He left his international banking job in New York City to use big data to figure out what makes racehorses win, and he began by measuring horse nostrils.
First, Seder theorized that horses with bigger nostrils should be able to breathe better and run faster. He collected and analyzed horse nostril data and found that nostril size wasn’t correlated with winning races. The same was true when he carefully measured the size of horse poop: it was not correlated with winning either. But when he measured horses’ heart sizes, particularly the size of the left ventricle, he found a strong correlation. So he went to work finding a horse.
Meet American Pharoah, a horse with a gargantuan left ventricle, one in the 99th percentile. A left ventricle that big usually meant a defective heart. But American Pharoah’s heart was strong, and Seder would put his theory about ventricle size to the test.
American Pharoah — and Seder’s ventricle-size theory — would pass. Seder’s pick won the Kentucky Derby, the Preakness Stakes and the Belmont Stakes, becoming the first horse in 37 years to win the Triple Crown.
“Big data isn’t bad or good; it’s powerful,” said Stephens-Davidowitz, who urged students to consider looking to big data tools for answers and not to be afraid of failure. “Data scientists are entrepreneurial,” said Stephens-Davidowitz. “And winners fail a lot,” he said, reminding the audience of Seder’s nostril and poop studies.
But data science can do far more than find a winning racehorse. Turning the same technique to people, Stephens-Davidowitz analyzed searches about suicide to find out what precedes them and what leads people to consider suicide. It turns out that searches by people learning about health conditions, in particular stigmatized sexually transmitted infections such as herpes, are strongly associated with searches related to suicide, which opens an avenue for intervention.
“People lie on surveys,” said Stephens-Davidowitz. “But Google is like digital truth serum, giving us insight we would never know otherwise.”
Stephens-Davidowitz said he hoped to inspire students to consider studying data science, and he left the audience with a challenge: “There are still left ventricles out there for you to find.”
To learn more about Willamette’s undergraduate data science program or its four-year data science master’s program, visit willamette.edu/cla/data-science.