Who were the people pushing the limits of their time and circumstances to bring us what we know today as data science? We examine what motivated them to do their important work and how they laid the foundations for our modern world where algorithms and analytics affect everything from communications to transportation to health care—to basically every aspect of our lives.
This is their story.
Ginette: “She was obsessed with her failure. She thought she hadn’t done enough, and it didn’t matter that the public saw her as a heroine. So she ended up writing an 830-page report where she employed some powerful graphics, and this, paired with her other efforts, ended up changing the entire system.”
Ginette and Curtis: “I’m Ginette, and I’m Curtis, and you are listening to Data Crunch, a podcast about how data and prediction shape our world. A Vault Analytics production.”
Ginette: “In our last three episodes, we have just thrown you into the middle of data and prediction and the explosion of data science. And some of you have had some questions, like, How did data science become a thing?
“In the next three episodes, we’re doing a miniseries where we’re going to address some of these questions, and I think you’ll find it very interesting. Our story starts with an impressive woman.
“It’s 1854. It’s the Crimean War, and a woman shows up at a hospital to help. She finds horrifying conditions. To paint an accurate picture for you, here’s a little bit of what she found: the sewage and ventilation systems were broken; the floor was an inch thick with waste, probably human and rodent; the water was contaminated because, as it turned out, the hospital was built over a sewer; rats were hiding under beds and scurrying past, as were bugs; the soldiers’ clothing was swarming with lice and fleas; and on top of that, there were no towels, no basins, no soap, and only 14 baths for 2,000 soldiers. Keep in mind this was 20 years before Pasteur and Koch spread germ theory.
“So she and the 37 nurses that she brought with her set to work, and they did their best to clean up the hospital and help the soldiers. Eventually, because of her, the government sent a sanitary commission. They flushed the sewers; they improved the ventilation. And this helped the situation dramatically. In the end, she reduced the death rate by two thirds.
“But Florence Nightingale went home feeling like she had failed, which you’ll remember we mentioned right at the beginning of the podcast. She felt a lot of soldiers had died needlessly. This drove her to write her famous 830-page report. And she ended up working with leading statistician William Farr, who actually helped invent medical statistics. He would say to her, ‘We don’t want impressions, we want facts.’ And working in that context, she gathered vast amounts of complex army data and analyzed it to find something rather shocking: 16,000 of 18,000 deaths in hospitals were not due to battle wounds but to preventable diseases spread by poor sanitation.
“So these statistics completely changed her understanding. She had thought the deaths were due to inadequate food and lack of supplies, but after the sanitary commission came in, she noticed that the mortality rate dropped significantly. So as Florence prepared her report, she was afraid that people’s eyes would glaze over the numbers and that they wouldn’t grasp the significance of what she was trying to say. So she came up with a clever way to present her data: she ended up using graphics, in particular what she’s known for, the rose chart, to convey her message.”
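For readers curious how a rose chart (also called a polar area diagram) turns numbers into a picture, here is a minimal sketch. It assumes the usual construction, in which every wedge spans the same angle and the wedge's area, not its radius, is proportional to the value. The function names and the monthly counts are hypothetical and for illustration only; they are not Nightingale's figures.

```python
import math

def wedge_radius(value, n_wedges):
    """Radius of an equal-angle wedge whose AREA equals `value`."""
    angle = 2 * math.pi / n_wedges           # every wedge spans the same angle
    # Area of a circular sector is 0.5 * r^2 * angle, so solve for r.
    return math.sqrt(2 * value / angle)

def wedge_area(radius, n_wedges):
    """Area of an equal-angle wedge with the given radius."""
    angle = 2 * math.pi / n_wedges
    return 0.5 * radius ** 2 * angle

# Hypothetical monthly counts, one wedge per month of the year.
monthly_counts = [100, 400, 900]
radii = [wedge_radius(count, 12) for count in monthly_counts]

for count, radius in zip(monthly_counts, radii):
    print(f"count={count:4d}  radius={radius:6.2f}")
```

Because area grows with the square of the radius, a count four times larger produces a wedge only twice as long. Drawing the radius in direct proportion to the count would exaggerate the large months.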
Curtis: “Nowadays, charts are everywhere, but back in her day, the idea of creating a picture that was defined by certain data points was not very common, and so the fact that Nightingale thought to do this was very innovative and clever, and it was important because it was able to communicate what she needed to communicate.
“Her mentor, William Farr, actually advised against this. He’s quoted as saying, ‘You complain that your report would be dry . . . The dryer [sic] the better. Statistics should be the dryest [sic] of all reading.’ So here’s the leading statistician of the day saying that when you write about statistics, it should be as dry as possible. Lucky for a lot of people in the army, Florence Nightingale decided to disregard this advice. Instead, she said that she wanted to use the chart, quote, ‘to affect thro’ the Eyes what we fail to convey to the public through their word-proof ears.’
“She knew that if her insights were buried in this huge document of text, there was a low chance that anyone would do anything with it. But if she could communicate very quickly through a chart, she could make an impact, and she could save a lot of people’s lives.”
Ginette: “And her work didn’t stop there. For instance, she found that during peacetime the average mortality rate for a certain segment of soldiers, 20- to 35-year-olds, was almost double the average civilian mortality rate. Without mincing words, she essentially thought this about the situation: to have a mortality rate this high among the members of the military during peacetime, while the average civilian mortality rate was so much lower, was as criminal as taking soldiers out into a field and just shooting them. She felt that strongly that these deaths were preventable.
“She didn’t stop trying to change things later in life either. She attempted to have a position established at Oxford to teach applied statistics. Since most of the government members were college educated, and a lot of them at Oxford University, she thought it was very important that future leaders knew how to work with statistics. Most of the government departments at that time were collecting lots of data, and she felt that the data they were collecting was ineffectual because they didn’t know how to analyze it. She had this to say about it: ‘the enormous amount of statistics at this moment available . . . is almost absolutely useless.’ It wasn’t that the data was useless (she knew it was extremely valuable), but she also knew that it was languishing in their inboxes because they didn’t know how to gain insights from it. And Florence felt that in order for the government to be effective, all of its legislation and all its administration had to be data driven.
“Unfortunately, Sir Francis Galton didn’t understand her vision, so the position wasn’t instituted. Regardless of her lack of success in that particular area, she used her mathematical genius to save lives, and she was recognized for it. The same year that she printed her report, she became the first female member of the Royal Statistical Society. Then at the International Congress of Statistics, she worked to make medical data collection uniform and consistent because she knew that there was so much more the data could tell them if they properly collected it and analyzed it. So she ended up developing a standard hospital form with William Farr and some other physicians to make it easier to collect this data.”
Curtis: “I’d like to point out here that it’s often the small, mundane aspects of data science that are the most critical. Florence Nightingale created a form, which isn’t anything glamorous or exciting, but without the simple creation of a standard form to collect the data, there would have been nothing to fuel any kind of analysis. Data would have been sporadic, messy, and meaningless without the structure that the standardized form gave to it.”
Ginette: “In the end the International Congress of Statistics approved it. And her legacy lives on today. Not only was she one of the first people to use statistical data in charts, she was probably the first person to use it for social change.
“But where did statistical data visualization start? For that we have to head a little farther back in time to about the 1760s to a man you may have heard of: Joseph Priestley. Joseph is your renaissance man. He was involved in everything. In chemistry, he was the man who discovered oxygen, and he invented soda water, so if you love soda, you should love this man. In English grammar, he wrote a wildly successful English grammar book that gained extensive circulation, probably because he used funny lines like this: ‘Beneath this stone my wife doth lie; she’s now at rest, and so am I.’ In education, he created the first timeline graphs to teach his students about history, and it brought him a lot of recognition, so much so that he received a doctor of law degree from the University of Edinburgh. The impressive thing about Priestley’s timelines is that each line visualized a period of time, and you could compare those lines against each other. You’re saying, ‘What’s the big deal? That’s what a timeline is.’ But nobody had done this before.
“While Joseph Priestley innovated timeline charts, William Playfair, who came after Priestley by about 25 years, is credited as the founder of graphical methods of statistics. They say he invented the line chart, bar chart, and pie chart.
“Why hadn’t graphs developed before William Playfair? At the time, graphical displays of data were not acceptable in science. People thought graphs would taint the accuracy of the data. Also, it was difficult to produce graphs. It wasn’t like popping open Excel, plugging in numbers, and having it spit out an image. It called for very specialized engraving and printing knowledge that most people didn’t have.
“It turns out William learned about engraving, printing, and cartography early on in life. He also happened to have a unique blend of skills that set him up for success, at least in the graph development arena. He was schooled in math by his big brother, famous mathematician John Playfair, and so his early education mixed well with his practical skills: engineering, mapmaking, and printing, which he developed alongside some of the greatest figures of the Scottish Enlightenment.
“Another very interesting thing is that 200 years before experiments actually proved this, William Playfair knew that a chart could ‘facilitate the attainment of information and aid the memory in retaining it.’ For instance, he only used three or four colors in his charts so they wouldn’t overload working memory, and he placed labels next to the lines they represented instead of far away in a legend.
“But no one in England listened to him because he had a checkered reputation. A lot of people thought he was a scoundrel. He had been convicted of libel in England, convicted of swindling in France, and had been involved in blackmail. So with that type of background, people didn’t trust him. Regardless of his shady background, this is essentially where data visualization started. And while there are a few other people who have been noted as contributors, this is really where major visualization headway happened.
“Now let’s combine data visualization with the rise of statistics, and then we’ll have built part of the foundation for data science.
“What surprised me is that the basis for modern statistics really started in the late 1800s and early 1900s. I would have guessed that statistics started a lot earlier. So technically Florence Nightingale wasn’t using what we consider ‘modern statistics’ when she pushed for health reform. So who developed modern statistics? It was a handful of people. We have Sir Francis Galton, who developed standard deviation, correlation, and regression, and then his protege, Karl Pearson, moved statistics into other fields besides science, like politics and business, which is ironic because, as we know, Florence Nightingale had tried to move it into politics earlier, even appealing to Sir Francis Galton to help bring it to the university level.”
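For listeners who have not met the three tools just named, here is a minimal sketch of standard deviation, correlation, and simple linear regression, written out by hand with only Python's standard library. The function names and the small data set are hypothetical and chosen for illustration; this is the common textbook formulation, not a reconstruction of how Galton or Pearson worked.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sample_std_dev(xs):
    """Spread of the values around their mean (n - 1 in the denominator)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def correlation(xs, ys):
    """Pearson correlation: how closely two variables move together (-1 to 1)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def linear_regression(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical paired measurements.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = linear_regression(xs, ys)
print("std dev of xs:", round(sample_std_dev(xs), 4))
print("correlation:  ", round(correlation(xs, ys), 4))
print("fitted line:   y =", round(slope, 4), "* x +", round(intercept, 4))
```

All three measures are built from the same ingredients: deviations from the mean, squared or multiplied together and summed.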
Curtis: “In 1919, a rising statistician by the name of Ronald Fisher was offered a position to work at the Galton Laboratory, which was run by Pearson. He turned it down for a temporary job analyzing crop data, and eventually, he published two of the most influential statistics books ever written: Statistical Methods for Research Workers and The Design of Experiments. For this work and other work, he’s now known as the father of modern statistics. And for anyone out there who has ever taken an introductory data analysis class, he is the man who introduced the Iris Data Set.”
Ginette: “And this brings us to the end of the beginning. In a little over ten minutes, we’ve highlighted some key and interesting contributors who helped build the foundation of data science. Keep in mind that everyone we’ve discussed so far was doing statistics and visualization by hand, an insanely laborious task. In the next episode, we’ll explore how this work was made dramatically more efficient and started to grow in innovative ways with the advent of the computer.”
Curtis: “Thanks again for joining us for another episode of Data Crunch. We really like making the show, and we hope you guys like listening to it. If you do like the show, please consider sharing it with your friends and with your family, teaching someone how to download a podcast, or sharing us on social media. The more people we can reach, the more we’re able to keep creating the show. So anyone that you can share it with, we would appreciate it. If you have any feedback for us, visit us at vaultanalytics.com, and we’d love to hear your comments.”
Sound Credits—A Big Thanks!:
Free Music Archive, Under Creative Commons Attribution 3.0 United States (CC BY 3.0 US)
Zapsplat, Under Attribution
Freesound.org, Under Creative Commons 0