The History Of Data Science and Pioneers You Should Know

Data science is a relatively new discipline. The term "Data Science" entered the popular lexicon in the early 21st century to describe a new profession: applying mathematics and statistics to draw insights from large amounts of complex data, or big data. Although the term is relatively recent, the history of the field is extensive.

Graduates with a Master of Science in Data Science are instrumental in furthering the discipline and helping organizations make discoveries from the world's incredible reserves of Big Data. If you are interested in developing a solid data science strategy, you could join those making history in one of today's most cutting-edge fields. Learn more about the disciplinary pioneers who have played a part in the conception and future of Data Science.

Data Science Pioneers

Professionals around the world work daily to advance Data Science, from Big Tech data engineers in Silicon Valley to government officials using AI applications to solve community challenges. Throughout its history, several key figures have been instrumental in the field's development, including the following historical icons.

  • Ada Lovelace: The Countess of Lovelace wrote what is widely regarded as the first computer program, for Charles Babbage's proposed Analytical Engine, more than 30 years before the invention of the electric light bulb. She is seen as an icon in the field of computer science. "The Analytical Engine has no pretensions whatever to originate anything," she wrote. "It can do whatever we know how to order it to perform. It can follow analysis, but it has no power of anticipating any analytical relations or truths."
  • Timnit Gebru: Gebru is a computer scientist who advocates for diversity in technology and is a leading voice in the emerging field of ethical AI. Her work includes studying algorithmic bias and its ethical implications, co-founding "Black in AI," a community supporting the inclusion of Black people in the field of AI, and co-authoring the "Gender Shades" study, which exposed bias in commercial AI systems.
  • Alan Turing: He is widely considered to be the father of theoretical computer science and artificial intelligence. In 1942, Turing traveled to the United States as part of an intelligence exchange and inspected the speech encryption system that enabled secure conversations between Churchill and Roosevelt.
  • Ronald Fisher: He is a historical icon in the world of statistics and is often described as the most important figure in the development of modern statistical research.
  • Claude Shannon: Dr. Claude Shannon founded information theory, making today's digital world possible. He was a mathematician and computer scientist whose work established the "bit" as the basic unit of information and laid the groundwork for digital compression and for encoding and transmitting information between two points.
  • John Tukey: Tukey coined the term "data analysis" and encouraged data scientists to find stories and meaning in data sets.
  • Edward Tufte: He is an American statistician and professor emeritus of political science, statistics, and computer science at Yale University, known for his research on information design and as a pioneer in the field of data visualization.
  • Yoshua Bengio: Bengio is recognized worldwide as a leading expert in artificial intelligence. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, "the Nobel Prize of Computing," shared with Geoffrey Hinton and Yann LeCun.
  • Karen Spärck Jones: She was a British computer scientist who introduced the concept of inverse document frequency for index-term weighting, a principle that underlies modern search engines like Google. In 2019, The New York Times called her "a pioneer of computer science for work combining statistics and linguistics and an advocate for women in the field."

A Brief History of Data Science

The journey to structure, organize and understand data has a long history. The evolution of data science has involved discussions by scientists, statisticians, researchers, computer scientists, and notable industry pioneers for generations. The following timeline traces the evolution of Data Science and its inception, use, and popularity over the years.



  • In 1962, John Tukey published a paper titled The Future of Data Analysis. He described a shift in the world of statistics: the merging of statistics and computers, at a time when computers were first being used to solve mathematical problems and work with statistics.


  • Karen Spärck Jones published Synonymy and Semantic Classification, now considered a foundational paper in natural language processing.


  • Peter Naur used the term "Data Science" throughout his 1974 publication, Concise Survey of Computer Methods. In it, he observed that "the usefulness of data and data processes derives from their application in building and handling models of reality."


  • The International Association for Statistical Computing (IASC) was formed with a mission to "foster worldwide interest in effective statistical computing and to exchange technical knowledge through international contacts and meetings between statisticians, computing professionals, organizations, institutions, governments, and the general public."
  • In 1977, Tukey published the book Exploratory Data Analysis, about the importance of data in selecting and testing hypotheses.




  • Researchers published a journal article titled CoverStory: Automated News Finding in Marketing about how companies leverage customer data in supermarkets to inform marketing strategies. This paper discusses customer data collection, automation, and personalization.


  • Yoshua Bengio, a professor at the Université de Montréal, founded Mila, the Montreal Institute for Learning Algorithms, a research institute on AI.

  • In 1997, IBM's chess-playing supercomputer, Deep Blue, shocked the world when it beat the world chess champion, Garry Kasparov, in a six-game match.


  • The acronym NoSQL was first used by Carlo Strozzi and referred to a lightweight, open-source "relational" database that did not use SQL.
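  • In 1998, Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner published a groundbreaking paper, "Gradient-Based Learning Applied to Document Recognition," showing that convolutional neural networks trained with gradient-based learning can recognize handwritten characters more accurately than other techniques of the time.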



  • Salesforce launched in 1999 and became a pioneer of Software-as-a-Service (SaaS), a precursor to today's cloud-based applications.
  • In 2001, William S. Cleveland published Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics, a plan to broaden the technical work of statistics with a focus on the data analyst. The plan sets out six technical work areas for a university department, government research lab, or corporate research organization and advocates for the appropriate allocation of resources devoted to research in each area.



  • In 2006, Hadoop 0.1.0, an open-source framework for distributed storage and processing of large data sets, was released. Apache Hadoop is still used today as an open-source software library for Big Data research.






  • A statistic about Big Data, widely attributed to IBM, went viral: 90% of the world's data had been created within the previous two years.



  • In 2017, the DeepMind team introduced AlphaZero. Within 24 hours of training, AlphaZero achieved a superhuman level of play in chess, shogi, and Go, defeating the world-champion programs Stockfish, Elmo, and the 3-day version of AlphaGo Zero.
  • PricewaterhouseCoopers (PwC) forecast that job listings for data science and analytics would surge to 2.7 million by 2020.


  • In 2018, Joy Buolamwini and Timnit Gebru co-authored the paper "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification," detailing the gender and racial bias found in commercial AI facial analysis software.


  • In 2020, the WHO and its partners launched the Solidarity Trial, an international clinical trial at the intersection of biology and technology, to generate data and insights on the most effective treatments for COVID-19.


