80/20 Python—Python for Data Science

In this post, you’re going to learn the 20% of Python that you’ll use 80% of the time.

This post is a bit longer and more involved than the other 80/20 Guides, because… well, you’re literally learning a programming language! But stick with it and don’t worry if things don’t make sense the first time around—they rarely do, and you’ll need to use Python for a while for these concepts to stick.

(But this guide is still emphatically not meant to be comprehensive. It will show you how to get up and running quickly with the most useful commands, but Python is a full programming language with way more to it than we can discuss here!)

Before we dive in, just a couple of items.

In case you want to learn the other tools that you’ll need to use as a data scientist, here are the other Project Data Science 80/20 Guides:

And if you need to get a professional data science environment set up on your computer, we have a guide for that: Step-by-Step Guide to Setting Up a Professional Data Science Environment.

Alright, ready to get started?



80/20 Python

What is Python?

Before we start using Python, let’s talk for just a minute about what Python is. To do this, there’s really no better place to start than the first few lines of the Python Wikipedia article:

“Python is an interpreted, high-level and general-purpose programming language. Python’s design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.”

Let’s take this piece by piece.

Python is General-Purpose and High-Level

The first line says:

“Python is an interpreted, high-level and general-purpose programming language.”

General-purpose just means that you can use Python for a lot of different things. You can do data science with Python, but you can also build web applications, or design games, or automate processes on your computer, or control robots and hardware, or all kinds of other things.

Source: https://www.botreetechnologies.com/blog/top-10-python-use-cases-and-applications/

And high-level means that the code you’re writing is fairly human-readable and isn’t at the “lower” levels that have to manually deal with the messy specifics of computer memory and things like that. A Python program can do amazing things in a fairly short number of lines because you’re working at a high level of abstraction, where a lot of the details are taken care of for you.

Python is Interpreted

What does that word “interpreted” mean in the first sentence?

An interpreted language like Python is typically contrasted with compiled languages like Java, and it helps to discuss them both. Although the line between them can be a bit fuzzy, the difference essentially comes down to how your computer runs the code that you write.

In a compiled language, your computer first needs to look at the entire program and convert it into machine-readable code. This step is called “compiling” your program, and it basically means converting code from one language (the code that you wrote) into another language (the language that the computer reads). When you compile your code, you’ll typically create another file that contains the code in the machine-readable language.

In an interpreted language, by contrast, you don’t need to compile your code and you don’t create another file. Your code is read in by the interpreter directly when you run your code, and the interpreter handles everything that needs to happen in order for the code to run. The interpreter will need to convert your code into machine-readable code at some point, but it takes care of this right at runtime rather than before.

The difference between these two gets a little blurrier every year, and “the devil is in the details” with this one. If you want to learn more about interpreted versus compiled languages, you’ll need to dive in deep!

Python is Readable and Uses Whitespace

The second line from Wikipedia says:

“Python’s design philosophy emphasizes code readability with its notable use of significant whitespace.”

This is partially related to Python being a high-level programming language, but there are certainly other high-level languages that are less readable. Python is particularly readable, and this is a great thing for programmers—you probably spend ten times as much time reading code as you do writing it, so having a readable coding language is very nice.

One of the things that makes Python readable is that it uses whitespace as an integral part of the language. For example, the logic inside of for loops and if/then statements needs to be indented, usually by using four spaces. Functions and classes also have whitespace as a main part of their syntax. This means you can quickly scan Python code and see where some of the main logic is happening.

Here’s a very simple program that prints out even numbers between 0 and 100.
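A minimal sketch of such a program might look like the following. It assumes that "between 0 and 100" includes both endpoints, and the variable name even_numbers is chosen purely for illustration.

```python
# Collect and print every even number from 0 through 100
# (both endpoints included, which is an assumption).
even_numbers = []

for number in range(0, 101):
    # A number is even when dividing it by 2 leaves no remainder.
    if number % 2 == 0:
        even_numbers.append(number)
        print(number)
```

Notice how the indentation alone shows which lines belong to the for loop and which belong to the if statement.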

As a more realistic example, here’s some code that’s part of a model-training function. Even if you don’t know what’s going on, you can see the general sections and flow of the code.

Python is Clear, Logical, and Good for Small and Large Projects

Finally, here’s the last sentence from the Wikipedia paragraph:

“Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.”

Python is an incredibly versatile language. It’s good for small scripts that you use once and then throw away. It’s good for massive projects that millions of people use and interact with. You can use Python in a classic OOP (object-oriented programming) way, you can use it procedurally, and you can use it functionally.

However you use Python, the structure of Python code remains fairly intuitive. If you know other languages, you can pick up Python fairly quickly. And, once you learn Python you’ll understand many fundamental programming concepts.

Installing Python

It can often be confusing to learn about a programming language like Python before just diving in and using it a bit, though, so let’s get Python installed and get some hands-on experience!

If you want to just install Python, then head to the Miniconda page and download the Python 3 installer for your operating system. (Don’t use Python 2!) Miniconda is one of the most popular ways for data scientists to install and use Python—it’s basically just an easy way to get Python and Conda installed.

(You may already have Python installed; you can check by running “python --version” in your terminal or command line. I would still recommend using Miniconda if you haven’t downloaded it before, because you’ll want to have Conda installed, which doesn’t come with default Python installations.)

But if you want to get a full data science environment set up—with a code editor and all of that fun stuff—I would highly recommend that you follow our guide to setting up your full professional data science environment. The guides also discuss each piece of the environment a bit more in-depth, which we’re not going to do here.

(These guides cover the same material, just tailored slightly per operating system.)

But if you don’t want to do that, just follow the Miniconda installation instructions to get Python and Conda installed.

Once you get Python installed, open up a terminal (command line), enter “python --version”, and see if anything prints out. If your Python version prints out, then you’re ready to go!

Running Python

The first thing that you need to do to use Python is… well, you need to know how to run Python code.

There are a few very common ways to run Python code:

  1. The Python shell
  2. The IPython shell
  3. Jupyter Notebooks
  4. Scripts

We’ll cover each one of these briefly.

The Python Shell

Want to hop in and start writing and running code immediately? You can use the Python Shell for that.

First, open up a terminal / command line.

Then, just type “python” and hit enter.

Welcome to the Python shell! Here, you can write and run Python code line by line. We’ll run some very simple Python code: define a few variables, multiply some numbers together, and define a function.

The three “>>>” symbols together indicate where you type Python code, and the lines below that without the “>>>” symbols show the console output of the Python code that you run.
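As a rough sketch, the kind of lines you might type at the prompt are shown here as plain code, with what the shell would echo back noted in comments. The names x, y, product, and multiply are illustrative assumptions, not necessarily what the original session used.

```python
# In the shell, each of these lines would follow a ">>>" prompt.
x = 3
y = 4

# Typing "x * y" on its own would make the shell echo 12 on the next line.
product = x * y

# When defining a function in the shell, continuation lines start with "...".
def multiply(a, b):
    return a * b

result = multiply(x, y)
print(product, result)
```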

The variables that you define (like x and y above) are usable for the entire time that you use the shell, unless you delete them. To get out of the shell, we can just type “exit()” or use control-d (on Windows, control-z followed by enter).

And that’s pretty much it for the Python shell!

The IPython Shell

The IPython shell is basically a fancy, more powerful version of the Python shell. The IPython shell doesn’t come with Python by default, so you may need to install it using Conda. IPython comes with Jupyter notebooks, which we’ll need later, so you can go ahead and install that using your terminal (for example, with “conda install jupyter”).

Here’s what you’ll see when you run this command (you’ll need to type “y” when prompted), except you’ll probably have more things listed under the list of packages that will be installed (including ipython).

Now you can run “ipython” and get into the IPython shell.

So what’s different about the IPython shell? Well, one of the most useful differences is the “?” help functionality. All you have to do is throw a question mark after anything, and the documentation for that thing will pop up. Let’s import the math package then use the “?” syntax to get the documentation.

When we hit enter, here’s what shows up.

This Docstring doesn’t have a lot in it, so let’s try it with a different function. Let’s import the pandas package and look at the read_csv() function.

The first part of the documentation shows the parameters that you can use.

If we scroll down, we get some descriptions of the read_csv() function and the parameters.

In addition to the “?” syntax, there’s also tab-complete and tab-suggestions. Let’s go back to the math package and see what objects we can access inside of it. We’ll type “math.” (notice the period at the end), then hit “tab”, and IPython will show us everything we have access to in the math package.

IPython also has a whole suite of extra “magic” commands to do all kinds of things you might want to do. All magic commands start with the “%” symbol, and you can run “%magic” to show the documentation for the magic functions.

Here’s what shows up when we run “%magic” in the shell.

There’s a lot more to IPython, but we’ll leave it there for now. Unless there’s some reason I can’t use IPython, I’ll always use the IPython shell rather than the default Python shell.

Jupyter Notebooks

Jupyter notebooks have become extremely popular over the last handful of years, and it’s easy to see why. Imagine a Google Doc (or Word doc) mixed with an IPython shell, and you basically have a Jupyter notebook.

Jupyter notebooks allow you to write and run Python code, as well as write and format normal text using Markdown—which is basically a way to write plain text and have it turn into formatted text.

People these days often use Jupyter notebooks instead of IPython shells anytime they need to do some ad hoc analysis or impromptu Python. But Jupyter notebooks are very versatile, and many data scientists have found themselves using Jupyter notebooks for more and more. If you look online, you can find people sometimes documenting entire machine learning projects inside a single notebook.

To launch Jupyter notebooks, you first need to start a Jupyter server by running “jupyter notebook” in your terminal.

You’ll see some text that shows that the Jupyter server is now running, and you should have a new window pop open in your browser.

What you’re seeing is the notebook dashboard, and it shows you the folder that you started the Jupyter server in. In my case, I started the server in an “80-20-python” folder that already had a notebook and a script in it.

If you want to go deeper into Jupyter notebooks, check out the Project Data Science 80/20 guide: 80/20 Jupyter Notebooks—Jupyter Notebooks for Data Science.

Python Scripts

Finally, the last way to run Python code is to write a script with the Python code in it, and then run that script from the terminal / command line.

For example, here’s our simple script from earlier.

We can run this code from the terminal by simply typing “python” and then the name of the script.

Although you’ll probably use Jupyter notebooks a lot as a data scientist, Python scripts are still essential. Packages like pandas and Scikit-Learn are essentially nothing but a lot of Python scripts. Python applications are a bunch of Python scripts. You can use Python scripts to clean your data, to create your machine learning models, and to deploy those models as well.

As a data scientist, you’ll write and run many different kinds of Python scripts depending on what you’re trying to do.

When to Use One Method Over Another?

With so many ways to run Python code, you may be asking: “How should I choose which way to run my Python code?”

Here are some basic guidelines for when to use which method:

  • Python Shell: Don’t use the Python shell unless you have to. If you want to use a shell, use the IPython shell instead.
  • IPython Shell: The IPython shell is great for very quick operations—hop in, run a couple lines of code, hop out. I’ll use the IPython shell if I’m going to be doing very quick experimentation and don’t want to create a Jupyter notebook. For anything that’s longer, I’ll use a Jupyter notebook or a script.
  • Jupyter Notebook: Jupyter notebooks are great for data analysis, experimentation, and ad hoc Python work. They’re also good if you want to do a whole experiment from beginning to end and show it to other people. But, if you’re going to be writing code that needs to run in multiple places, or that needs to be integrated into an application, or that you want to import into other code, or that’s part of a bigger project, you’re going to want to use a Python script instead.
  • Python Script: Python scripts are the main way to write and run Python code for most Python programmers. If you need to write reusable functions, or if you need to write end-to-end scripts that run on a regular basis, or if you need to create code that integrates with an application, then you’ll need to create a Python script for that. (Sometimes you’ll create code in a Jupyter notebook and then transfer it to a Python script when needed.)

Enough talking about how to run Python—let’s actually start using Python!

Basic Data Types

One of the first things you want to do when using a new tool is ask “what are the main objects here?” The first types of objects we’re going to talk about are the basic objects: strings, integers, floats, and booleans.

Strings

Here’s a string, which is just text.

You can add strings together, which concatenates them.

And you can access pieces of the string using the square bracket indexing notation.
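Here is a small sketch of those string operations. The variable names and the text itself are made up for illustration.

```python
greeting = "Hello"
name = "Python"

# Adding strings together concatenates them.
message = greeting + ", " + name + "!"
print(message)

# Square brackets pull out pieces of the string; indexing starts at 0.
first_letter = message[0]      # the first character
first_word = message[0:5]      # a slice: index 0 up to, but not including, 5
last_character = message[-1]   # negative indexes count from the end
print(first_letter, first_word, last_character)
```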

If we want to see what methods and attributes are available on our string object, we can type a “.” after a variable and then hit “tab” to auto-suggest what’s available.

Integers

An integer (or int) is just a number without a decimal.

Floats

And a float is a number with a decimal.

Booleans

Finally, a boolean data type can either be True or False, and that’s it.
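To make the three numeric and logical types concrete, here is a short sketch with invented variable names.

```python
number_of_rows = 42     # an integer: no decimal point
average_score = 3.14    # a float: has a decimal point
is_ready = True         # a boolean: either True or False

# Mixing an int and a float in arithmetic gives back a float.
total = number_of_rows + average_score

# Comparisons produce booleans.
is_large = number_of_rows > 100

print(type(number_of_rows), type(average_score), type(is_ready))
print(total, is_large)
```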

Data Structures

Now that we’ve covered basic data types, let’s talk about data structures that can hold these other data types.

Lists

A list stores values in order, and you access these values by their index. A list can store anything as its values, including other lists. Lists are created using square brackets.

You access values from the list using square bracket notation, and Python is zero-indexed which means that the first list value is stored at index 0.

You can check if a value is in a list by using the “in” keyword, and the result is a boolean.

We can add new values to a list using the “.append()” method.
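The list operations described above can be sketched as follows, using a made-up list of colors.

```python
# Lists are created with square brackets and keep their values in order.
colors = ["red", "green", "blue"]

# A list can hold anything, including other lists.
mixed = [1, "two", [3, 4]]

# Python is zero-indexed, so the first value lives at index 0.
first_color = colors[0]

# The "in" keyword checks membership and gives back a boolean.
has_green = "green" in colors
has_purple = "purple" in colors

# .append() adds a new value to the end of the list.
colors.append("yellow")
print(colors, first_color, has_green, has_purple)
```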

Dictionaries

Dictionaries store data in key-value pairs. The key needs to be hashable, which in practice usually means a basic data type (a string, int, float, or boolean), but the value can be anything you want it to be. Dictionaries are created using curly braces and colons.

Here’s a more complicated dictionary that stores all kinds of objects.

You access the values in dictionaries by passing in the key to square brackets.

Similar to lists, you can use the “in” keyword to see if a key is stored in the dictionary. Note that you’re checking for keys here, not values.

You can add new values to a dictionary by using square brackets with a new key, and then assigning that a value.
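A brief sketch of these dictionary operations, with hypothetical names and values:

```python
# A simple dictionary: keys on the left of each colon, values on the right.
ages = {"alice": 31, "bob": 27}

# Values can be any kind of object, including lists and other dictionaries.
profile = {
    "name": "alice",
    "scores": [90, 85],
    "address": {"city": "Denver"},
    "active": True,
}

# Access a value by passing its key in square brackets.
alice_age = ages["alice"]

# "in" checks the keys, not the values.
has_bob = "bob" in ages    # True, because "bob" is a key
has_27 = 27 in ages        # False, because 27 is only a value

# Assigning to a new key adds a new key-value pair.
ages["carol"] = 45
print(ages, alice_age, has_bob, has_27)
```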

Sets

Sets are kind of like lists, but they only store unique values. So in a list, you can have two values that are both the integer 7—but in a set, you can only have the integer 7 once. You create sets by using only curly braces, or by passing a list to “set()”.

We can add values to a set using the “.add()” method.

But if we try to add a value that’s already present—even if we try to add it multiple times!—nothing happens.

Contrast this with a list, where you can have multiple of the same value.

Thus, one of the most valuable reasons for using a set is if you want to track unique or distinct values.
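The contrast between sets and lists can be sketched like this (the values are arbitrary):

```python
# Create a set with curly braces, or by passing a list to set().
unique_numbers = {1, 2, 7}
numbers_from_list = set([7, 7, 8])   # the duplicate 7 is dropped

# .add() inserts a new value.
unique_numbers.add(9)

# Adding a value that is already present changes nothing, however often we try.
unique_numbers.add(7)
unique_numbers.add(7)

# A list, by contrast, happily stores the same value many times.
number_list = [7]
number_list.append(7)
number_list.append(7)

print(unique_numbers, numbers_from_list, number_list)
```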

Variables

You’ll notice that we’ve been assigning all of these data structures and values to variables, which can be named pretty much whatever you want, as long as the variable name only contains letters, numbers, and underscores (with a few exceptions, such as that a variable can’t start with a number).

The convention is to use “snake_case” for variables—lowercase letters separated by underscores when needed. (Snake case… Python… Get it?) The only exception is classes which use “UpperCamelCase”—we’ll talk about classes later.

You can check the data type of a variable using the “type()” built-in function, which we also used above.
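For instance, a quick sketch with a few snake_case variables (all names invented here):

```python
favorite_color = "blue"
row_count = 10
learning_rate = 0.01
is_trained = False
color_list = ["red", "blue"]

# type() reports which class each value belongs to.
print(type(favorite_color))
print(type(row_count))
print(type(learning_rate))
print(type(is_trained))
print(type(color_list))
```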

Control Flow Statements

Let’s talk about some control flow statements—statements that affect the flow and ordering of your code.

Logical Statements: If/Elif/Else

Python makes if-then style logic very easy. Here’s a simple example.

To check multiple conditions in a row—where we only want one condition to pass—we can use the “elif” statement.

Note that you can use an if statement by itself just fine.
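Here is one way these statements might look. The temperature thresholds and variable names are assumptions made up for the example.

```python
temperature = 72

# Only the first condition that passes runs; "else" catches everything left over.
if temperature > 85:
    description = "hot"
elif temperature > 60:
    description = "pleasant"
else:
    description = "cold"
print(description)

# An if statement works fine by itself, with no elif or else.
needs_jacket = False
if temperature < 50:
    needs_jacket = True
print(needs_jacket)
```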

Loops: For Loops

Very often, you’ll want to loop through an iterable of some kind. (An “iterable” is just something that you can iterate through, like a list of items.) A for-loop is how you can do this.

(Just to make it more interesting, we also changed the color to uppercase by using the “.upper()” string method.)

You can iterate through a set just as well as a list.

(We made it a little more interesting this time too! We printed each color out backwards.)

Iterating through a dictionary is pretty straightforward as well—the dictionary returns the keys one at a time.

Although if you do prefer to get the key and value at the same time, you can use the “.items()” method on your dictionary. In this case, two items are returned, which means you need to capture them both as different variables.
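The loops described in this section can be sketched as follows. The colors and ages are placeholder data, and the extra lists exist only so the results are easy to inspect.

```python
colors = ["red", "green", "blue"]

# Loop through a list, printing each color in uppercase.
upper_colors = []
for color in colors:
    upper_colors.append(color.upper())
    print(color.upper())

# Loop through a set, printing each color backwards.
# (A set has no guaranteed order, so the print order may vary.)
color_set = {"red", "green", "blue"}
for color in color_set:
    print(color[::-1])

# Looping through a dictionary gives the keys one at a time.
ages = {"alice": 31, "bob": 27}
for name in ages:
    print(name)

# .items() gives the key and the value together.
pairs = []
for name, age in ages.items():
    pairs.append((name, age))
    print(name, age)
```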

Loops: While Loops

Sometimes, you’ll want to keep looping through a piece of code until some condition is met or something happens—this is where a while loop comes in handy.

Let’s say we want to create a list with all of the even numbers below 100. One way we could do that is using a while loop.

(You’ll notice that we use a cool little “+=” trick here to add two to the variable number. We could also do “number = number + 2”, which would work exactly the same way.)
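A minimal sketch of that while loop, assuming we start counting at 0 and that the variable is called number as in the note above:

```python
even_numbers = []
number = 0

# Keep looping for as long as the condition stays true.
while number < 100:
    even_numbers.append(number)
    number += 2   # same as: number = number + 2

print(even_numbers)
```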

And just to verify that we created the list correctly, let’s take a look at it.

Looks good to me!

A common reason to use a while loop is in situations where you don’t know how much data you’re going to be processing, or when you don’t know when a condition will be met. (If you do know, then you can often use a for loop instead.)

Functions

A function is like a verb—it does something. A function does an action.

Let’s define a very simple function that adds two numbers and then talk through its parts.

We define the function using the def keyword. Then comes the function’s name, where we try to be descriptive: “add_two_numbers”. In the parentheses, we specify the parameters that the function accepts. Parameters are simply variables that the function either needs (required parameters), or doesn’t need but can use (optional parameters). In our case, if we’re going to add two numbers together then we need both numbers! So both of our parameters are required. (We’ll show what an optional parameter looks like in just a second.)

Finally, we have a return statement where we return the result of whatever we did.
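Putting those parts together, the function might look like this. The function name comes from the text above, while the parameter names are assumptions.

```python
def add_two_numbers(first_number, second_number):
    """Add two numbers and return the result."""
    # Both parameters are required because neither has a default value.
    return first_number + second_number


total = add_two_numbers(2, 3)
print(total)
```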

Let’s run this function and see what happens!

Perfect!

Let’s create a function with an optional parameter now. To specify an optional parameter, we need to use the equals sign to give the parameter a default value.
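As a sketch, a function with an optional “name” parameter could look like this. The default value "world" and the exact greeting text are assumptions for illustration.

```python
def print_hello_name(name="world"):
    # No return statement here: the function just prints and finishes.
    print("Hello, " + name + "!")


# With no argument, the default value is used.
print_hello_name()

# Passing an argument overrides the default.
print_hello_name("universe")
```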

The values that we’re passing into our functions are known as arguments—an argument is the specific value that we give to a parameter, while the parameter itself is kind of a named placeholder, just waiting for a value. The string “universe” above is the argument that we passed into the “name” parameter.

(This is a bit of a nuanced point that probably doesn’t quite fall into “80/20”, so don’t worry if that doesn’t make perfect sense.)

Notice that we don’t have to specify the names of the parameters as long as we pass things in the correct order. But, it can often help to be explicit here and specify the parameter names anyway. And, if we use the names, then we can pass the arguments in any order we want. Check this out, for example.
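For example, here is a sketch of calling functions with named arguments. The parameter names are assumed, and subtract_numbers is a hypothetical helper added only because subtraction makes the ordering visible.

```python
def add_two_numbers(first_number, second_number):
    return first_number + second_number


def subtract_numbers(larger, smaller):
    return larger - smaller


# Positional arguments must come in the order the parameters were defined.
positional_sum = add_two_numbers(5, 10)

# Named arguments can come in any order.
named_sum = add_two_numbers(second_number=10, first_number=5)

# With subtraction the order matters, and the names keep it unambiguous.
difference = subtract_numbers(smaller=3, larger=10)

print(positional_sum, named_sum, difference)
```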

(One quick note—notice that the function is returning the sum, even though we’re not assigning it to a variable. Since that returned value is the last thing happening in the cell, the Jupyter notebook shows the value in the “Out[ ]” section.)

You don’t always have to have a return statement for a function—sometimes a function can just do something and be done. Our “print_hello_name” function above didn’t have a return statement, for example.

Classes

A class is like a noun—it is something. A class is a thing. Classes have methods, which are basically just functions “inside of” the class (functions that relate specifically to the class), and attributes, which are just variables inside of a class.

I’m going to go ahead and warn you: classes are where things start getting pretty complicated, so this section might not make sense if you’re a brand-new programmer. But that’s ok, you can make a lot of progress in Python just practicing what you’ve learned so far! Take it one step at a time, and learn more about classes once you’ve tried using Python for a little bit.

Here’s how we define a class.

Now technically, a class is just the blueprint for an object. We need to take this blueprint and create a real object from it. (And just like a blueprint, we can create many different objects from this class if we want!)

This is called instantiation, which returns a specific object created from the class—that specific object is called an instance of the class.

So let’s create an instance of our new MySquare class.

Easy enough! We just instantiated our instance using an initial variable that we passed in. That variable is now stored as an attribute on the instance, which we can access using dot notation.

We also defined a couple of methods which make sense for squares: getting the area of the square and the perimeter of the square. Since methods are basically functions, we call them using the same parentheses notation that we used previously for non-class-based functions. (Plus the dot notation on the class instance.)

And since we chose a side length of 4 for our square, our area and our perimeter end up being the same number! (Different units though, of course—squared units for area, versus non-squared for perimeter.)
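A sketch of the MySquare class walked through above might look like the following. The class name and the side_length attribute come from the text, while the method names get_area and get_perimeter are assumptions.

```python
class MySquare:
    def __init__(self, side_length):
        # Store the value we were given as an attribute on this instance.
        self.side_length = side_length

    def get_area(self):
        return self.side_length ** 2

    def get_perimeter(self):
        return 4 * self.side_length


# Instantiate the class: create one specific square from the blueprint.
square = MySquare(4)

# Attributes are accessed with dot notation; methods also need parentheses.
print(square.side_length)
print(square.get_area(), square.get_perimeter())
```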

Now you’re probably thinking: what the heck is this self variable that we keep using inside of the class? Well, it’s there because the class needs some way of talking about itself when we’re defining the class. As we’re defining attributes and methods of the class, we often want to access other attributes and methods within the class, and “self” is a way of doing that. When we first pass in side_length, we need to store it somewhere, so we create an attribute called side_length on the class. Then in our later methods, we need to access that side_length attribute, which is associated with the class, so we grab it using “self.side_length”, where “self” means “I’ve stored this attribute inside of my class somewhere, and this is how I’m referencing the class.”

Practically speaking, you just need to include “self” as the first parameter of every method when you’re defining it, but you don’t need to pass in anything for self when you’re calling these methods on a class instance. A method called on an instance always gets “self” as the first variable passed into it; Python takes care of this for us.

Packages and Imports

One of the best parts of Python is the massive community it has writing amazing packages for it. Python has packages to do everything! From machine learning, to robotics, to astronomy, to geography, to building games, to building web applications… Python packages are how you do awesome things in Python.

So what are packages? Well, packages are just collections of Python scripts, functions, and classes that do helpful things. Then once a package is created, other people can install it and use it as well. Scikit-Learn is a Python package for machine learning, TensorFlow is a Python package for neural networks, and Matplotlib is a package for data visualization.

Many packages come installed with Python by default. These are called the Python standard library, and you can import them into your Python code using the import statement. Let’s import the os package, which lets us interact with our computer’s operating system.
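As a small sketch of what that import makes possible, here are a few commonly used functions from os. The file name used with os.path.join is a made-up example.

```python
import os

# Ask the operating system which folder we are currently working in.
current_directory = os.getcwd()

# List the files and folders inside that directory.
entries = os.listdir(current_directory)

# Build a file path using the correct separator for this operating system.
data_path = os.path.join("data", "my_file.csv")

print(current_directory)
print(len(entries), "entries found")
print(data_path)
```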

We can use the “?” syntax in Jupyter notebooks to give us the documentation for this package.

Let’s say we want to use a new package that we don’t have installed, like the requests package that helps us use HTTP requests to get data from websites.

First, let’s create a virtual environment using conda, so that we have a separate installation of Python on our computer to install packages for just this project. Using virtual environments is a Python best practice so that you keep your project coding environments separate from each other.

We’ll create the conda environment in our terminal. I’ll go ahead and install Jupyter and pandas right when we create the environment, which is what you’ll probably usually do.

You’ll need to hit “y” at some point to install the environment. Now we can activate our virtual environment.

Now if we want to install a new package—like requests—we can use either pip or conda.

Pip is the primary Python package installer, and it’s totally fine to use pip. But since we’re using conda for virtual environments, it also makes sense to use conda to install packages when you can.

We’ll use conda to install requests. (Make sure to activate your environment first or you’ll install it in the wrong place!)

Now I’m going to make sure I’m using Jupyter notebooks with this virtual environment by starting a new Jupyter server in this terminal.

And then inside of Jupyter notebooks, I can make sure I’m using the right environment by using the “!which python” command—this is actually a command line (bash) command, which you can use in Jupyter notebooks because of the “!” exclamation point syntax.

Now if we import requests, we have it!

Perfect—now we can use the requests package to pull data from websites using HTTP.

Example—Putting It All Together

Let’s talk through some aspects of Python while using the pandas package, one of the most frequently used data science packages.

Pandas is used for exploratory data analysis (EDA), data cleaning, data transformation, feature engineering, data visualization, statistical analysis, and preparing the data for machine learning. If you want to learn more about pandas, check out our guide: 80/20 Pandas—Pandas for Data Science.

First, we’ll import pandas “as pd”, which just means we’re going to call the pandas object “pd” when we use it. This alias is how people usually import pandas.

We’ll read in some data using the “pd.read_csv()” function that is a part of the pandas package, and we’ll assign the results to the variable “df” (which stands for “DataFrame”).

Now if we check the object type of the df variable, we’ll see that the type is a pandas “DataFrame” object. This is an instance of the pandas DataFrame class. (Remember that the class is the blueprint, while the instance is the real object that you make with that blueprint.)

We can use the “DataFrame.head()” method on the DataFrame to take a look at the first 5 rows of data. We call it a method rather than a function because it’s attached to a class instance.

Finally, let’s take a look at the “DataFrame.shape” attribute. We call it an attribute rather than a variable, because (once again!) it’s attached to a class instance.

Other Interesting and Important Python Functionality

List Comprehensions

List comprehensions are just a nifty way to do a for loop on a single line in Python—you’ll see people using list comprehensions all the time to do quick little manipulations with lists and other objects.

Here’s a list comprehension that creates a list of the first letters of colors.

Here’s a list comprehension that squares each number in a list.
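Both list comprehensions can be sketched like this, using small made-up lists:

```python
colors = ["red", "green", "blue"]

# Read it as: "give me color[0] for each color in colors".
first_letters = [color[0] for color in colors]

numbers = [1, 2, 3, 4]

# The same pattern, squaring each number instead.
squares = [number ** 2 for number in numbers]

print(first_letters)
print(squares)
```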

You can also use if/then logic inside of list comprehensions—but I’ll leave that one for you to figure out when you need it.

Try/Except Blocks

Sometimes, you might want to try something with the expectation that it could fail. Obviously you don’t want your entire script or program to break if it does fail, which is where try/except blocks come into the picture.

Let’s say we have a list with a bunch of data, and we want to round all of the floats to integers. Well, strings don’t like to be treated like numbers… so your program will crash at this point without a try/except block.

Maybe we know that we have mixed data types in our data, though, and we’re fine with that. In this case, we might want to use a try/except block to try rounding each item—for the floats—but then printing the item anyway if the round fails—for the strings and other data types.

Here, we explicitly said that if the code breaks due to a “TypeError”, then we want to do something else. You can catch the types of errors explicitly (which is better), or you can catch all exceptions (not as great, but still gets the job done).
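A sketch of that pattern is below. The data list is invented, and the results list is an addition so the outcome is easy to check.

```python
data = [1.7, "hello", 2.2, None, 3.5]
results = []

for item in data:
    try:
        # This works for floats...
        rounded = round(item)
        results.append(rounded)
        print(rounded)
    except TypeError:
        # ...and strings (or None) raise a TypeError, so we keep the item as-is.
        results.append(item)
        print(item)
```

Catching TypeError specifically means that any other, unexpected kind of error would still surface instead of being silently swallowed.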

Formatting Strings Using “.format()”

Often, you’ll want to create a string in Python that uses a variable that you’ve defined elsewhere. For example, let’s say we have a variable with someone’s name, and now we want to print out their name.

Here’s how we can use the “string.format()” string method to accomplish that.

First, you want to define a string that has a variable name in curly braces. Then, you can replace that curly-braced-variable with whatever you want, by doing “.format(name=…)”.
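For example (the template text and the names are placeholders):

```python
# The curly braces mark the spot that .format() will fill in.
template = "Hello, my name is {name}."

greeting = template.format(name="Ada")
print(greeting)

# The same template can be reused with a different value.
another_greeting = template.format(name="Grace")
print(another_greeting)
```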

Formatting Strings Using F-Strings

F-strings are a newer and slightly more elegant way to do the same thing as above. Rather than needing to use “.format()”, though, we can jump straight to using the variable right in the string by placing the letter “f” in front of the string and then using a variable that already exists.

Note that we need to use the exact variable name of what we want to insert into the string. Since we called our variable “my_name_variable”, we need to put that into the curly braces.
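A short sketch, where the variable name my_name_variable comes from the text above and the value and sentence are placeholders:

```python
my_name_variable = "Ada"

# The "f" prefix tells Python to fill in the braces using existing variables.
greeting = f"Hello, my name is {my_name_variable}."
print(greeting)
```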

A general rule is that if you already have the variable defined, go ahead and use the f-strings. If you need to define the variable later—or if you want to use the same string multiple times with different variables—use the “.format()” method.

The “?” for Object Documentation

We’ve already used this, but I want to point it out again just because it’s so useful! Anytime you don’t know how to use something, just slap a question mark at the end of the object and run the cell (or hit enter in an IPython interactive shell). The documentation will pop right up!

The Python Debugger (pdb)

The Python debugger is a great little package that you can use to debug any section of your code.

For example, let’s say we have some code that’s breaking for some reason, and we don’t know why.

(This is a very simple, contrived example, just to show the point.)

Let’s import pdb (the Python debugger) and use a try-except statement to use the Python debugger if our code crashes!

Looks like our code threw an exception again—but this time, we have a little shell that we can use right at the place where the code broke! Let’s take a look at what our “x” variable is.

Oh no, looks like we have a string that we’re trying to add to our numeric total! Looks like dirty data to me…

We might not have known that this was the problem without using the Python debugger to check out our variables.

Additional Resources

And with that, you’ve really learned most of what you need to get going.

One of the best ways to learn Python is simply to use Python for real projects, like the projects we have in the Intro to Practical Data Science course bundle. If you try to learn Python from a textbook or something, you’re going to end up forgetting most of it after one or two months. But if you learn through doing projects, you’ll learn the right functionality when you need it and anything that you learn will be remembered for much longer.

But as you’re learning Python, you’ll find yourself going back to certain resources over and over again. Here are some of the most helpful resources that you’ll use on your journey.

  • The Python Standard Library Reference
    • This page has everything you’ll ever need to know about the amazing built-in functionality that Python has—all of the data structures and packages and features that Python comes with by default. Definitely give this page a good skim. (There’s gold in yonder hills…)
  • StackOverflow
    • The greatest Q&A site on the Internet, hands down—pretty much every programming question you’ve ever thought of. Get ready to love this site.
  • PEP8
    • PEP8 is the official style guide for Python—it has everything you need to make sure your code looks good.
    • But, a good “linter” like Pylint will help you write beautiful code without you needing to memorize all of PEP8.
  • Pylint
    • Just to reiterate… a good linter like Pylint goes a lonnnnnng way. Install it and use it to help you write cleaner code!
  • Managing Conda Environments
    • This page has everything you need to know about creating, using, and removing conda environments. Before I memorized the commands, I used this page pretty much every other week.

And of course, we’re biased, but we think the Project Data Science 80/20 Guides are some of the best, most concise resources on the Internet for data science.

Conclusion

And with that, you’ve just learned the 20% of Python that will get you 80% of the value. A lot more to learn, but you’re well on your way! (And learning Python is incredibly fun—I hope you enjoy the journey.)

Happy learning!


