The History Of Data Science and Pioneers You Should Know
Data science is a relatively new discipline. The term "Data Science" entered the lexicon in the early 21st century to categorize a new profession: the field of applied mathematics and statistics that provides insights based on large amounts of complex data or big data. Although the term Data Science is relatively contemporary, the history of Data Science is extensive.
Graduates with a Master of Science in Data Science are instrumental in furthering the discipline and helping organizations make discoveries from the world's incredible reserves of Big Data. If you are interested in developing a solid data science strategy, you could join those making history in one of today's most cutting-edge fields. Learn more about the disciplinary pioneers who have played a part in the conception and future of Data Science.
Data Science Pioneers
Millions of professionals work daily to advance Data Science, from Big Tech data engineers in Silicon Valley to government officials leveraging AI applications to solve community challenges. Several key figures have been instrumental in the field's development, including the following historical icons.
- Ada Lovelace: This Countess wrote what is widely regarded as one of the world's first computer programs, for Charles Babbage's proposed Analytical Engine, more than 30 years before the invention of the electric light bulb. She is seen as an icon in the field of computer science. "The Analytical Engine has no pretensions whatever to originate anything," she wrote. "It can do whatever we know how to order it to perform. It can follow analysis; but it has no power of anticipating any analytical relations or truths."
- Timnit Gebru: Gebru is a computer scientist who advocates for diversity in technology and is leading the way in the emerging field of ethical AI. Her work has included studying algorithmic bias and its ethical implications; co-founding “Black in AI,” a community supporting the inclusion of Black people in the field of AI; and co-authoring the “Gender Shades” study, which exposed bias in commercial AI systems.
- Alan Turing: He is considered to be the father of theoretical computer science and artificial intelligence. In 1942, Turing traveled to the United States as part of an intelligence exchange, where he inspected the speech encryption system that enabled conversations between Churchill and Roosevelt.
- Ronald Fisher: He was a towering figure in the world of statistics and is often described as the most important figure in the development of modern statistical research.
- Claude Shannon: Dr. Claude Shannon founded information theory, making today's digital world possible. A mathematician and electrical engineer, he established the "bit" as the basic unit of information and laid the groundwork for digital compression and for strategies for encoding and transmitting information between two points.
- John Tukey: Tukey coined the term "data analysis" and encouraged data scientists to find stories and meaning in data sets.
- Edward Tufte: He is an American statistician and professor of political science, statistics, and computer science at Yale University, known for his research on information design and as a pioneer in the field of data visualization.
- Yoshua Bengio: Bengio is recognized worldwide as a leading expert in artificial intelligence. He is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, "the Nobel Prize of Computing," alongside Geoffrey Hinton and Yann LeCun.
- Karen Spärck Jones: She was an iconic British computer scientist behind the concept of inverse document frequency and index-term weighting, principles that underpin modern search engines like Google. In 2019, The New York Times called her "a pioneer of computer science for work combining statistics and linguistics and an advocate for women in the field."
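To make inverse document frequency concrete, here is a minimal sketch in Python. It assumes the common textbook form, idf(term) = log(N / df), where N is the number of documents and df is the number of documents containing the term. The function name and the three sample documents are hypothetical, chosen only for illustration; production search engines use smoothed and more elaborate variants.

```python
import math


def inverse_document_frequency(documents):
    """Return {term: idf} for a list of tokenized documents.

    Uses the plain log(N / df) form; this is an illustrative
    assumption, not the exact weighting of any particular engine.
    """
    n_docs = len(documents)
    doc_freq = {}
    for tokens in documents:
        # Count each term once per document, however often it repeats.
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical three-document corpus.
docs = [
    "data science uses statistics".split(),
    "data engineers build pipelines".split(),
    "statistics describes data".split(),
]
idf = inverse_document_frequency(docs)
```

In this toy corpus, "data" appears in every document and receives a weight of zero, while "pipelines" appears in only one and receives the highest weight. That is the intuition behind index-term weighting: rare terms say more about what a document is about than common ones.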
A Brief History of Data Science
The journey to structure, organize and understand data has a long history. The evolution of data science has involved discussions by scientists, statisticians, researchers, computer scientists, and notable industry pioneers for generations. The following timeline traces the evolution of Data Science and its inception, use, and popularity over the years.
1957
- Arthur Samuel was refining his checkers-playing program at IBM, one of the world's first successful self-learning programs. He went on to coin the term "machine learning" in a 1959 paper.
- IBM released Fortran, a programming language that remains in use today.
1962
- John Tukey wrote a paper titled The Future of Data Analysis. He described a shift in the world of statistics: the merging of statistics and computing at a time when computers were first being used to solve mathematical problems and work with statistics.
1964
- Karen Spärck Jones published Synonymy and Semantic Classification, now considered a foundational paper in natural language processing.
1974
- Peter Naur used the term "Data Science" throughout his 1974 publication, Concise Survey of Computer Methods. He argued that "the usefulness of data and data processes derives from their application in building and handling models of reality."
1977
- The International Association for Statistical Computing (IASC) was formed with a mission to "foster worldwide interest in effective statistical computing and to exchange technical knowledge through international contacts and meetings between statisticians, computing professionals, organizations, institutions, governments, and the general public."
- Tukey published the book Exploratory Data Analysis, about the importance of using data to suggest hypotheses worth testing.
1986
- Geoffrey Hinton, then a professor at Carnegie Mellon University, co-authored a paper with David E. Rumelhart and Ronald J. Williams on applying the backpropagation algorithm to multi-layer neural networks. This application was a milestone in AI because it allowed the networks to learn internal representations of data.
1989
- The first Knowledge Discovery in Databases (KDD) workshop was held. The workshop series later grew into the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, a conference that was still running as of 2022.
1990
- Researchers published a journal article titled CoverStory: Automated News Finding in Marketing about how companies leverage customer data in supermarkets to inform marketing strategies. This paper discusses customer data collection, automation, and personalization.
1993
- Yoshua Bengio, a professor at the Université de Montréal, founded Mila, the Montreal Institute for Learning Algorithms, a research institute on AI.
1997
- IBM's chess-playing supercomputer, Deep Blue, shocked the world when it beat the world chess champion, Garry Kasparov, in a six-game match.
1998
- The acronym NoSQL was first used by Carlo Strozzi and referred to a lightweight, open-source "relational" database that did not use SQL.
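- Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner published a groundbreaking paper, "Gradient-Based Learning Applied to Document Recognition," showing that convolutional neural networks trained with gradient-based learning could recognize handwritten characters more accurately than other techniques of the time.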
1999
- Jacob Zahavi and Robert Stine published Mining Data for Nuggets of Knowledge, a paper that explores how companies must use data to understand customer behaviors and market trends.
2001
- Software-as-a-Service (SaaS) was created, and Salesforce became a pioneer in the SaaS space. This was the precursor to using cloud-based applications.
- William S. Cleveland published Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics, a plan focused on the needs of the data analyst. It sets out six technical work areas for a university department, government research lab, or corporate research organization and advocates for the appropriate allocation of resources devoted to research in each area.
2002
- The International Council for Science: Committee on Data for Science and Technology (CODATA) started publishing the Data Science Journal, which focused on Data Science topics like the description of data systems, publication on the internet, applications, and risk and compliance issues.
2006
- Hadoop 0.1.0, an open-source framework for distributed storage and processing of large data sets, was released. Apache Hadoop is still used today as an open-source software library that allows for Big Data research.
2008
- DJ Patil and Jeff Hammerbacher, then leading data teams at LinkedIn and Facebook, popularized "Data Scientist" as a job title.
2009
- NoSQL was reintroduced when Eric Evans and Johan Oskarsson used it to describe non-relational databases.
2011
- Job listings for data scientists increased by 15,000%.
2012
- Harvard Business Review declared the role of Data Scientist "the sexiest job of the 21st century" in an article by Thomas H. Davenport and DJ Patil.
2013
- A statistic about Big Data, widely attributed to IBM, went viral: 90% of the data in the world had been created within the previous two years.
2015
- Google applied Deep Learning to speech recognition in Google Voice and reported a 49 percent reduction in transcription errors.
- Google open-sourced TensorFlow, a machine learning framework for building and training Deep Learning models at scale.
2017
- The DeepMind team released AlphaZero. In 24 hours, AlphaZero achieved a superhuman level of play in Chess, Shogi, and Go by defeating world-champion programs Stockfish, Elmo, and the 3-day version of AlphaGo Zero.
- PricewaterhouseCoopers (PwC) forecast that job listings for data science and analytics would surge to 2.7 million by 2020.
2018
- Timnit Gebru and Joy Buolamwini co-authored the paper "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification," detailing the tendency toward gender and racial bias found in commercial AI facial recognition software.
2020
- The WHO and its partners launched the Solidarity Trial, an international clinical trial designed to generate the data and insights needed to identify the most effective treatments for COVID-19.
Today
- The market for Big Data analytics in banking could rise to $62.10 billion by 2025.
- Data creation will grow to more than 180 zettabytes by 2025.
- Data science jobs will increase by around 28% by 2026.
- The global machine learning market was valued at $8 billion in 2021 and is anticipated to grow at a 39 percent compound annual growth rate (CAGR) through 2027.
Lead the Future of Data Science With a Master of Science in Data Science from WPI
Do you want to help lead the future of Data Science and work with cutting-edge technologies, like artificial intelligence and machine learning, to make real-world changes? The Master of Science in Data Science online degree program from Worcester Polytechnic Institute (WPI) will prepare you to do just that.
WPI is well known for its experiential learning model, which provides real-world knowledge and develops leadership, collaborative, and critical-thinking skills. Whether you have a background in data science or another field, you can take the necessary steps to become a data scientist or advance your data science career at Worcester Polytechnic Institute.
The 30-credit-hour Master of Science in Data Science degree program takes place entirely online and includes built-in bridge courses for those with an undergraduate degree outside the field. The program coursework will teach you the skills you need to pursue a career as a data scientist, including:
- Programming and math foundations: Develop fundamental skills in computing languages, programming concepts, design and analysis techniques, algorithms, statistics, and linear algebra
- Data science methods and technologies: Learn how to create, manage, and analyze large-scale databases, use relevant statistical techniques such as predictive modeling and clustering, and understand machine learning
- AI & Machine Learning, Big Data Analytics & Management, or Business Intelligence: Choose between these three focus areas, or build your own specialization from various electives to tailor your degree plan to the career you plan to pursue
By earning your MS in Data Science degree at WPI, you will become part of the alumni family at a prestigious, respected university that is ranked:
- #4 National Universities Where Grads Are Paid Well by US News and World Report (2021)
- #5 in Best Career Services by The Princeton Review (2019)
- Among the top 25 STEM Colleges by Forbes, Top 60 Most Innovative Schools by US News & World Report, and Top 30 Best Value Colleges by Payscale.com
- #5 Best Online Master's in Data Science by Fortune Magazine (2022)
Learn more about WPI's Master of Science in Data Science online degree program.