Who were the people pushing the limits of their time and circumstances to bring us what we know today as data science? We examine what motivated them to do their important work and how they laid the foundations for our modern world, where algorithms and analytics affect everything from communications to transportation to health care, and basically every other aspect of our lives.
This is their story.
Transcript:
Ginette: “She was obsessed with her failure. She thought she hadn’t done enough, and it didn’t matter that the public saw her as a heroine. So she ended up writing an 830-page report where she employed some powerful graphics, and this, paired with her other efforts, ended up changing the entire system.”
Ginette and Curtis: “I’m Ginette, and I’m Curtis, and you are listening to Data Crunch, a podcast about how data and prediction shape our world. A Vault Analytics production.”
Ginette: “In our last three episodes, we have just thrown you into the middle of data and prediction and the explosion of data science. And some of you have had some questions, like, How did data science become a thing?
“In the next three episodes, we’re doing a miniseries where we’re going to address some of these questions, and I think you’ll find it very interesting. Our story starts with an impressive woman.
“It’s 1854. It’s the Crimean War, and a woman shows up at a hospital to help. She finds horrifying conditions. To paint an accurate picture for you, here’s a little bit of what she found: the sewage and ventilation systems were broken; the floor was an inch thick with waste, probably human and rodent; the water was contaminated because, come to find out, the hospital was built over a sewer; rats were hiding under beds and scurrying past, as were bugs; the soldiers’ clothing was swarming with lice and fleas; and on top of that, there were no towels, no basins, no soap, and only 14 baths for 2,000 soldiers. Keep in mind this was 20 years before Pasteur and Koch spread germ theory.
“So she and the 37 nurses that she brought with her set to work, and they did their best to clean up the hospital and help the soldiers. Eventually, because of her, the government sent a sanitary commission. They flushed the sewers; they improved the ventilation. And this helped the situation dramatically. In the end, she reduced the death rate by two-thirds.
“But Florence Nightingale went home feeling like she had failed, which you’ll remember we mentioned right at the beginning of the podcast. She felt a lot of soldiers had died needlessly. This drove her to write her famous 830-page report. And she ended up working with leading statistician William Farr, who actually helped invent medical statistics. He would say to her, ‘We don’t want impressions, we want facts.’ Working in that spirit, she gathered vast amounts of complex army data and analyzed it to find something rather shocking: 16,000 of 18,000 deaths in hospitals were not due to battle wounds but to preventable diseases spread by poor sanitation.
“So these statistics completely changed her understanding. She had thought the deaths were due to inadequate food and lack of supplies, but after the sanitary commission came in, she noticed that the mortality rate dropped significantly. So as Florence prepared her report, she was afraid that people’s eyes would glaze over the numbers and that they wouldn’t grasp the significance of what she was trying to say. So she came up with a clever way to present her data: she ended up using graphics, in particular what she’s known for, the rose chart, to convey her message.”
Curtis: “Nowadays, charts are everywhere, but back in her day, the idea of creating a picture that was defined by certain data points was not very common, and so the fact that Nightingale thought to do this was very innovative and clever, and it was important because it was able to communicate what she needed to communicate.
“Her mentor, William Farr, actually advised against this. He’s quoted as saying, ‘You complain that your report would be dry . . . The dryer [sic] the better. Statistics should be the dryest [sic] of all reading.’ So here’s the leading statistician of the day saying that when you write about statistics, it should be as dry as possible. Lucky for a lot of people in the army, Florence Nightingale decided to disregard this advice. Instead, she said that she wanted to use the chart, quote, ‘to affect thro’ the Eyes what we fail to convey to the public through their word-proof ears.’
“She knew that if her insights were buried in this huge document of text, there was a low chance that anyone would do anything with them. But if she could communicate very quickly through a chart, she could make an impact, and she could save a lot of people’s lives.”
Ginette: “And her work didn’t stop there. For instance, during peacetime, she found that the average mortality rate for a certain segment of military soldiers, 20- to 35-year-olds, was almost double the average civilian mortality rate. Without mincing words, she essentially thought this about the situation: to have a mortality rate this high among the members of the military during peacetime, while the average civilian mortality rate was so much lower, was as criminal as taking soldiers out into a field and just shooting them. She felt that strongly that these deaths were preventable.
“She didn’t stop trying to change things later in life either. She attempted to have a position established at Oxford to teach applied statistics. Since most of the government members were college educated, and a lot of them at Oxford University, she thought it was very important that future leaders knew how to work with statistics. Most of the government departments at that time were collecting lots of data, and she felt that the data they were collecting was ineffectual because they didn’t know how to analyze it. She had this to say about it: ‘the enormous amount of statistics at this moment available . . . is almost absolutely useless.’ It wasn’t that the data was useless (she knew it was extremely valuable), but she also knew that it was languishing in their inboxes because they didn’t know how to gain insights from it. And Florence felt that in order for the government to be effective, all of its legislation and all its administration had to be data driven.
“Unfortunately, Sir Francis Galton didn’t understand her vision, so the position wasn’t instituted. Regardless of her lack of success in that particular area, she used her mathematical genius to save lives, and she was recognized for it. The same year that she printed her report, she became the first female member of the Royal Statistical Society. Then at the International Congress of Statistics, she worked to make medical data collection uniform and consistent because she knew that there was so much more that data could tell them if they properly collected it and analyzed it. So she ended up developing a standard hospital form with William Farr and some other physicians to make it easier to collect this data.”
Curtis: “I’d like to point out here that it’s often the small, mundane aspects of data science that are the most critical. Florence Nightingale created a form, which isn’t anything glamorous or exciting, but without the simple creation of a standard form to collect the data, there would have been nothing to fuel any kind of analysis. Data would have been sporadic, messy, and meaningless without the structure that the standardized form gave to it.”
Ginette: “In the end, the International Congress of Statistics approved it. And her legacy lives on today. Not only was she one of the first people to use statistical data in charts, she was probably the first person to use it for social change.
“But where did statistical data visualization start? For that we have to head a little farther back in time, to about the 1760s, to a man you may have heard of: Joseph Priestley. Joseph is your Renaissance man. He was involved in everything. In chemistry, he was the man who discovered oxygen, and he invented soda water, so if you love soda, you should love this man. In English grammar, he wrote a wildly successful grammar book that gained extensive circulation, probably because he used funny lines like this: ‘Beneath this stone my wife doth lie; she’s now at rest, and so am I.’ In education, he created the first timeline graphs to teach his students about history, and it brought him a lot of notoriety, so much so that he received a doctor of law degree from the University of Edinburgh. The impressive thing about Priestley’s timelines is that each line visualized a period of time, and you could compare those lines against each other. You’re saying, ‘What’s the big deal? That’s what a timeline is.’ But nobody had done this before.
“While Joseph Priestley innovated timeline charts, William Playfair, who came after Priestley by about 25 years, is credited as the founder of graphical methods of statistics. They say he invented the line chart, bar chart, and pie chart.
“Why hadn’t graphs developed before William Playfair? At the time, graphical displays of data were not acceptable in science. People thought graphs would taint the accuracy of the data. Also, it was difficult to produce graphs. It wasn’t like popping open Excel, plugging in numbers, and having it spit out an image. It called for very specialized engraving and printing knowledge that most people didn’t have.
“It turns out William learned about engraving, printing, and cartography early on in life. He also happened to have a unique blend of skills that set him up for success, at least in the graph development arena. He was schooled in math by his big brother, famous mathematician John Playfair, and so his early education mixed well with his practical skills: engineering, mapmaking, and printing, which he developed alongside some of the greatest figures of the Scottish Enlightenment.
“Another very interesting thing is that, 200 years before experiments actually proved this, William Playfair knew that charts could ‘facilitate the attainment of information and aid the memory in retaining it.’ For instance, he only used three or four colors in his charts so they wouldn’t overload working memory, and he placed labels next to the lines they represented instead of far away in a legend.
“But no one in England listened to him because he had a checkered reputation. A lot of people thought he was a scoundrel. He had been convicted of libel in England, convicted of swindling in France, and involved in blackmail. So with that type of a background, people didn’t trust him. Regardless of his shady background, this is essentially where data visualization started. And while there are a few other people who have been noted as contributors, this is really where major visualization headway happened.
“Now let’s combine data visualization with the rise of statistics, and then we’ll have built part of the foundation for data science.
“What surprised me is that the basis for modern statistics really started in the late 1800s and early 1900s. I would have guessed that statistics started a lot earlier on. So technically Florence Nightingale wasn’t using what we consider ‘modern statistics’ when she pushed for health reform. So who developed modern statistics? It was a handful of people. We have Sir Francis Galton, who developed standard deviation, correlation, and regression, and then his protege, Karl Pearson, who moved statistics into other fields besides science, like politics and business. That is ironic because, as we know, Florence Nightingale had tried to move it into politics earlier, even appealing to Sir Francis Galton to help bring it to the university level.”
Curtis: “In 1919, a rising statistician by the name of Ronald Fisher was offered a position to work at the Galton Laboratory, which was run by Pearson. He turned it down for a temporary job analyzing crop data, and eventually, he published two of the most influential statistics books ever written: Statistical Methods for Research Workers and The Design of Experiments. For this work and other work, he’s now known as the father of modern statistics. And for anyone out there who has ever taken an introductory data analysis class, he is the man who introduced the Iris data set.”
Ginette: “And this brings us to the end of the beginning. In a little over ten minutes, we’ve highlighted some key and interesting contributors who helped build the foundation of data science. Keep in mind that everyone we’ve discussed so far was doing statistics and visualization by hand, an insanely laborious task. In the next episode, we’ll explore how this work was made dramatically more efficient and started to grow in innovative ways with the advent of the computer.”
Curtis: “Thanks again for joining us for another episode of Data Crunch. We really like making the show, and we hope you guys like listening to it. If you do like the show, please consider sharing it with your friends and with your family, teaching someone how to download a podcast, or sharing us on social media. The more people we can reach, the more we’re able to keep creating the show. So anyone that you can share it with, we would appreciate it. If you have any feedback for us, visit us at vaultanalytics.com, and we’d love to hear your comments.”
Sound Credits: A Big Thanks!
Free Music Archive, Under Creative Commons Attribution 3.0 United States (CC BY 3.0 US)
Something Elated by Broke For Free, slightly modified by clipping
Day Bird by Broke For Free, slightly modified by clipping
My Luck by Broke For Free, slightly modified by clipping
Backed Vibes Clean by Kevin MacLeod, slightly modified by clipping
Zapsplat, Under Attribution
Freesound.org, Under Creative Commons 0
Just Some of Our Randomly Consulted Sources: