No, I don't know anything about them. I hate all snakes.
That statement, made by a retired electronics technician who specialised in machine coding, set me back on my heels for a bit. I didn't ask if he knew anything about pythons - plural; I wanted to know if he knew anything about Python - singular.
Maybe it was because our video conference connection wasn't that great or, perhaps, as he's an avid gardener, he thought I was referring to reptiles - although I shudder to think about what kind of garden pythons would call home.
Or it could just have been that Python wasn't that renowned or used in his specialised field so he never had a reason to study up on it.
I explained my premise (research for this article), we had a good laugh and then we discussed one of his other areas of competence: the burgeoning field of data science. It was burgeoning at the time of his retirement and is still growing today.
In fact, 'data science' is one of the most googled terms. In itself, that statistic is significant; it's as important as the relationship between data science and Python is.
So let's discover what data science is, exactly, its relationship with Python and how you might build yourself a satisfying and rewarding career by learning about both of them.
What Is Data Science?
How can you tie statistics and data analysis to informatics? By calling it data science.
Strangely enough, there is still no consensus on what, exactly, data science is. In loose terms, it is defined as above but, fundamentally, it remains a concept, not an actual science.
The idea first surfaced in 1962, when American statistician John Tukey described what he did as data analysis. His work incorporated many aspects of today's data science, but it wasn't until 1985 that the term itself made its first official appearance. It took seven more years for formal acknowledgement of a new research field that combined the principles and concepts of statistics and data analysis with computing.
That field would become known as data science.
Each of those dates has special significance in the computer world:
- 1962 saw one of the first computer games, Spacewar!, and the development of RAM and virtual memory
- 1985: the C++ programming language was published; MIT founded its Media Lab; Michael Dell, founder of Dell Computers, opened his first company (custom-built personal computers); and Nintendo released its NES gaming console, taking computer games out of the arcade and into the living room
- 1992: the Intel Paragon parallel supercomputer was deemed the fastest computer in the world. Paragon was significant because it was used to crunch all manner of data, scientific and statistical.
Another significant year for data science: 1991. That's the year that the World Wide Web went public. Computers went mainstream, providing data scientists with a glut of data for the asking, even if they didn't yet know what to do with it all. Since then, data collection and analysis have never been the same.
Prior to these developments, and even through them, statisticians and data analysts had their work cut out for them. First, they would have to gather their data. Next, they would decide which variables to consider and which methods to use to gain insights from it. Finally, they would model the information they had obtained and interpret it for whichever party commissioned the study.
Back then, just gathering relevant data was a monumental (and onerous) task; computing and rendering it took serious brain power. Today, data scientists have an abundance of data literally at their fingertips and they have computers that can spit out scatterplots on demand.
Now, data scientists are embracing the exciting field of machine learning, teaching computers how to improve their algorithms through the use of data. That and data mining (discovering patterns in large datasets) are currently the main directions in which data science is headed.
The Ethos of Python
In 1990, Sir Tim Berners-Lee ran into a major roadblock on his way to making his World Wide Web go live: funding.
The trouble was that his code ran only on NeXT computers. Have you ever seen one or even heard of that brand? NeXT was the company Steve Jobs founded in 1985. Its machines did not gain much of a market share and, just 12 years after the company's founding, the brand was defunct.
At the time, other innovators were building computers with different operating systems, all of which could, presumably, be made compatible with Berners-Lee's web code. The trouble was that CERN was the World Wide Web's financial backer, but its decision-makers baulked at having to pay for additional software versions.
And so, the call went out to all software engineers, programmers... anyone who knew anything about programming languages, to write browsers that would run on all types of machines. Well, actually, it was a text-only page that went out over the existing computer network.
This bit of internet lore is one reason why there are so many programming languages today. Python is a direct descendant of a programming language that may have been a part of that mad scramble: its parent language, ABC, released its first stable version around that time.
That's speculation, of course. Our research stopped short of finding a connection between the dawn of the internet and the ABC programming language.
Another reason for all the programming languages is that different languages address different aspects of computing. Some emphasise high performance, the kind needed in robotics and gaming, while others are written specifically for desired functions; Java is a prime example of such. Where does Python fit in?
The Python programming language was born out of frustration at the overly complex syntax of programming languages. For instance, if you're working in Java or C++ and want to print something, your code will consist of multiple lines, curly brackets, hash signs and other symbols.
By contrast, the print command in Python is a single line that starts with the command itself, print, followed by what needs printing in parentheses and quotes.
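To make that contrast concrete, here is a minimal sketch of printing in Python. The messages and the `greeting` variable are made up for illustration.

```python
# One statement is a complete program: no class, no main function, no braces.
print("Hello, data science!")

# Single quotes work just as well as double quotes.
greeting = 'Hello again!'
print(greeting)
```

Saved to a file and run as-is, those lines are the whole program.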
Python's ethos is simplicity. Indeed, the third statement of the Zen of Python is "Simple is better than complex." Further down that list of 19 principles, we find "Sparse is better than dense." and "Readability counts."
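You don't have to take our word for it: the Zen of Python ships with the language in a standard-library module named `this`. The sketch below is one possible way to pull the principles into a list. It relies on an assumption worth flagging: CPython keeps the text ROT13-encoded in `this.s`, which is an implementation detail rather than a documented interface.

```python
import codecs
import contextlib
import io

# Importing `this` prints the Zen of Python as a side effect, so the
# printout is redirected into a throwaway buffer here.
with contextlib.redirect_stdout(io.StringIO()):
    import this

# Assumption: CPython stores the text ROT13-encoded in `this.s`
# (an implementation detail, not a documented API).
zen_text = codecs.decode(this.s, "rot13")

# Drop blank lines; the first remaining line is the title.
lines = [line for line in zen_text.splitlines() if line.strip()]
title, aphorisms = lines[0], lines[1:]

print(title)
print(len(aphorisms), "principles")
print("Third:", aphorisms[2])
```

For everyday use, simply typing `import this` at the Python prompt prints the whole list.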
Granted, these statements are meant to define how Python should be written: simply, sparsely and readably.
But if you consider them in conjunction with data analysis, don't those adjectives take on a whole different meaning?
Data Science and Python
Python is well-suited to several aspects of computing. For instance, it is one of the top three languages used in web development and is also used in robotics, albeit in a limited capacity. You could even develop computer games in Python!
Of all the fields Python adapts to, data science is one of the areas where it is most widely used. Thanks to the Python Package Index (PyPI), with its nearly 300 thousand modules called packages (among them mathematical libraries and functions), conducting data analysis is largely a matter of plugging in the right module to get the desired results.
One Python library, NumPy, contains an extensive collection of mathematical functions written to operate on multi-dimensional arrays and matrices.
Python was not originally designed for numerical computing. However, because this programming language fired the interest of the scientific community early on, that deficit was soon made up through a special interest group that wasted no time putting together an array computing package.
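As a rough illustration of what working on arrays looks like, the sketch below builds a small two-dimensional NumPy array and summarises it along each dimension. The temperature readings and variable names are invented for the example, and NumPy must be installed separately (it is not part of Python itself).

```python
import numpy as np

# Hypothetical readings: 3 days (rows) x 4 sensors (columns).
readings = np.array([
    [20.0, 21.0, 19.0, 20.0],
    [22.0, 23.0, 21.0, 22.0],
    [24.0, 25.0, 23.0, 24.0],
])

daily_mean = readings.mean(axis=1)   # one average per day (across columns)
sensor_mean = readings.mean(axis=0)  # one average per sensor (across rows)

# Matrix product of the transposed array with itself: a 4 x 4 matrix.
gram = readings.T @ readings

print("Per day:   ", daily_mean)
print("Per sensor:", sensor_mean)
print("Matrix product shape:", gram.shape)
```

Notice that there isn't a single loop in sight: the functions apply to the whole array at once.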
SciPy is another Python package particularly beneficial to data science; its focus is on technical and scientific computing. This library contains linear algebra modules, as well as modules for integration, interpolation and image processing. Most notably, its special functions module is a great tool for data scientists because it contains utilities vital to varying types of analyses, from mathematical to functional.
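Here is a small, hedged taste of the SciPy modules mentioned above: integration, interpolation, linear algebra and special functions. The numbers are toy values chosen so the answers are easy to check by hand, and SciPy (along with NumPy) needs to be installed for the sketch to run.

```python
import numpy as np
from scipy import integrate, interpolate, linalg, special

# Integration: area under x^2 between 0 and 3 (exact answer: 9).
area, error_estimate = integrate.quad(lambda x: x ** 2, 0.0, 3.0)

# Interpolation: estimate a value between known, evenly rising points.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([0.0, 10.0, 20.0, 30.0])
spline = interpolate.CubicSpline(xs, ys)
midpoint = float(spline(0.5))

# Linear algebra: solve 2a = 2 and 4b = 8 as one matrix equation.
coefficients = np.array([[2.0, 0.0], [0.0, 4.0]])
targets = np.array([2.0, 8.0])
solution = linalg.solve(coefficients, targets)

# Special functions: gamma(5) equals 4! = 24.
gamma_of_five = float(special.gamma(5.0))

print(area, midpoint, solution, gamma_of_five)
```

Each of those four tasks would take a good deal of hand-written code without the library.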
And then, there's Matplotlib, Python's plotting library that is capable of embedding plots into applications via an object-oriented application programming interface (API). That all sounds complicated and highly scientific but it essentially boils down to a group of coded computer programs that, when executed, will render analysed data as a scatterplot, a two-dimensional or 3D graph, a line plot or a histogram.
NumPy, SciPy and Matplotlib are three reasons why data science and Python are so firmly enmeshed. Whether a data scientist is crunching marketing data, cosmic data or atmospheric data, each of these libraries has modules capable of analysis and of turning out visually interpreted results.
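To show what that looks like in practice, the sketch below uses Matplotlib's object-oriented interface to draw a scatterplot and a histogram side by side. The study-hours data is invented, the off-screen "Agg" backend is chosen only so the example runs without opening a window, and Matplotlib has to be installed first.

```python
import io

import matplotlib
matplotlib.use("Agg")  # draw off-screen; no display window needed
import matplotlib.pyplot as plt

# Hypothetical data: hours studied vs. test score for eight students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 74, 79, 85]

# One figure holding two plotting areas (axes), side by side.
fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

ax_scatter.scatter(hours, scores)
ax_scatter.set_xlabel("Hours studied")
ax_scatter.set_ylabel("Score")
ax_scatter.set_title("Scatterplot")

ax_hist.hist(scores, bins=4)
ax_hist.set_xlabel("Score")
ax_hist.set_title("Histogram")

fig.tight_layout()

# Render the figure as PNG data in memory instead of writing a file.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png")
png_bytes = png_buffer.getvalue()
```

Passing a filename such as "plots.png" to `fig.savefig` instead of the buffer would save the picture to disk.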
And then, there's pandas, another library written for Python, designed for data analysis and manipulation. That sounds underhanded but data manipulation is an integral part of data analysis. You have to set parameters so that the data under examination will be useful.
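A brief sketch of that kind of (perfectly honest) manipulation: the made-up shop orders below include one missing amount, which is dropped before the remaining data is filtered and totalled. The column names and figures are hypothetical, and pandas must be installed.

```python
import pandas as pd

# Hypothetical shop orders; one amount was never recorded.
orders = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cy", "bob"],
    "amount": [12.5, None, 7.5, 30.0, 10.0],
})

# Manipulation step 1: discard rows with no amount.
clean = orders.dropna(subset=["amount"])

# Manipulation step 2: set a parameter, here orders of 10.00 or more.
large = clean[clean["amount"] >= 10.0]

# Analysis: total spend per customer.
totals = clean.groupby("customer")["amount"].sum()

print(totals)
```

Without the clean-up step, the missing value would quietly distort any average calculated from the raw data.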
Python's vast catalogue of analytical functions and mathematical tools makes this programming language indispensable to data scientists of all stripes.
Without Python applications to lend a hand, data scientists would be crushed under today's huge troves of data, covering everything from cosmological and environmental concerns to how people shop. And, considering how many novel ways analysed data is put to work, it's no wonder that data science is one of today's hottest career fields.
Discover for yourself the many ways Python and data analysis are used...