In this post, you’re going to learn the 20% of Matplotlib that you’ll use 80% of the time.

*(This guide is emphatically **not** meant to be comprehensive. Instead, it will show you how to get up and running quickly with the most useful commands.)*

Just a couple quick things before we dive in.

In case you want to learn more than just Matplotlib, here are the other Project Data Science **80/20 Guides**:

If you need to get a professional data science environment set up on your computer, we have a guide for that: Step-by-Step Guide to Setting Up a Professional Data Science Environment.

And, if you’d rather watch an in-depth video tutorial, we have that for you right here: YouTube – Matplotlib Mega Tutorial.

Alright, ready to get started?

## Table of Contents

- **80/20 Matplotlib**
- **Additional Matplotlib Information**
- **Conclusion**

## 80/20 Matplotlib

### Primary Objects

One of the first things to ask when faced with a new Python package is, “What are the primary data structures, methods, and other objects?”

Since Matplotlib is a data visualization package, the primary objects we’re going to be using relate to pieces of a data visualization.

#### Figure

First, there’s the Matplotlib **figure** object, which is essentially the entire image. There could be one graph on that image, or a million graphs on it—the whole image is the **figure.** The figure also includes any text on the image, including plot titles, axis labels, annotations, legends, etc.

#### Axes

Second, there’s the Matplotlib **axes** object, which is the part of the figure where the graph actually happens. One figure can contain multiple axes objects, in order to have multiple graphs. The axes object is where we’ll actually be plotting our scatter plots, bar charts, line charts, etc.

**One important note about confusing terminology.** On a graph, you’ll typically have an x-axis and a y-axis, which are your horizontal and vertical lines on the bottom and side of your graph. Be careful not to confuse this kind of “axis” with the Matplotlib **“axes”** object. An axis is a single dimension of a graph, like an x-axis, while a Matplotlib **axes** object relates to where a whole plot is being graphed. A good synonym for “axes” might be “graph”, so you can think about it like that if you want.

Here’s a very helpful graphic from the Matplotlib website showing the different pieces of a Matplotlib image.

Source: https://matplotlib.org/tutorials/introductory/usage.html

### Loading Data Using Pandas

Let’s get some data to play with. We’re going to be using the world happiness dataset from Kaggle, which you can download here: https://www.kaggle.com/unsdsn/world-happiness#2019.csv. Specifically, we’ll look at the 2019 data.

We’re going to use the Python package **pandas** to load the data, which is very often how we’ll load data in data science projects. We’ll be doing all of our coding in a **Jupyter notebook.**

**If you want to learn how to use the pandas package, see our guide 80/20 Pandas: Pandas for Data Science.**
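Here’s a minimal sketch of the loading step. It assumes you’ve saved the Kaggle file as `2019.csv` next to your notebook. The `load_happiness` helper is a name we made up for illustration, not part of pandas.

```python
import pandas as pd


def load_happiness(path="2019.csv"):
    """Read the happiness CSV into a pandas DataFrame."""
    return pd.read_csv(path)


# In the notebook (assuming 2019.csv sits in the working directory):
# df = load_happiness()
# df.head()   # displays the first five rows
```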

From the first five rows of the dataset, you can see that we have a list of countries and some data about those countries: GDP per capita, healthy life expectancy, and the happiness score, for example.

### Our First Graph—A Histogram

Suppose we want to take a look at the distribution of happiness scores. We can do this in Matplotlib using a **histogram.** Let’s go ahead and import Matplotlib and get ready to create our first data visualization!

The first thing you’ll notice is that we’re actually importing the **pyplot** module from Matplotlib, and we’re aliasing it as **plt,** which is the common way to import pyplot. Most of the time, pyplot is the only thing you’ll need to import from Matplotlib.

But if you do want to see what else we could import from Matplotlib, you can use **tab-complete** in Jupyter Notebooks to see what other subpackages and modules are there.

But like we said above, we’ll just be sticking to pyplot.

Let’s go ahead and plot a **histogram** of the Score column in our dataset. We can do this in three short lines of code.

First, we create the **figure** and **axes** objects (which are the two primary objects we mentioned in the beginning of the article). Second, we plot our histogram. And third, we show our plot.
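Those three steps might look something like the sketch below. So that it runs anywhere, it uses made-up stand-in numbers rather than the real Kaggle file, and it assumes the column is named `Score`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in data so this runs anywhere;
# in the notebook, df comes from pd.read_csv("2019.csv") instead.
rng = np.random.default_rng(0)
df = pd.DataFrame({"Score": rng.normal(5.4, 1.1, 150)})

fig, ax = plt.subplots()                       # 1. create the figure and axes
counts, bins, patches = ax.hist(df["Score"])   # 2. plot the histogram (10 bins by default)
plt.show()                                     # 3. show the plot
```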

Notice that **ax.hist()** is a **method** on our **axes** object. You plot data visualizations on axes objects, not on the figure object. The figure is a container for axes, titles, and a few other things, but the plotting itself is done on the axes.

You’ll notice that our **x-axis** has a **scale** going from roughly three to eight, while the **y-axis** has a **scale** going from zero to about thirty. The x-axis **ticks** show every integer between three and eight inclusive, while the y-axis **ticks** only show multiples of five.

#### Adding a Figure Title

Let’s add some text to our image to make it more informative. First, we’ll add a title using the **figure.suptitle()** method on our figure object.
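As a rough sketch, adding the title is one extra line. The data below is a made-up stand-in for the `Score` column, and the title wording is our own.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for the real Score column
rng = np.random.default_rng(0)
df = pd.DataFrame({"Score": rng.normal(5.4, 1.1, 150)})

fig, ax = plt.subplots()
ax.hist(df["Score"])
# suptitle() lives on the figure, so it titles the whole image
title = fig.suptitle("Distribution of Happiness Scores")
```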

#### Adding Axis Labels

And we usually want to add an x-axis **label** and a y-axis **label** as well, to describe what the axis represents.
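Axis labels are set on the axes object rather than the figure. Here’s a small sketch using made-up stand-in data; the label wording is our own choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for the real Score column
rng = np.random.default_rng(0)
df = pd.DataFrame({"Score": rng.normal(5.4, 1.1, 150)})

fig, ax = plt.subplots()
ax.hist(df["Score"])
fig.suptitle("Distribution of Happiness Scores")
ax.set_xlabel("Happiness score")        # describes the x-axis
ax.set_ylabel("Number of countries")    # describes the y-axis
```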

This is looking good! We’ve already learned how to do most of the important things in Matplotlib—we can create a graph, we can add a title, and we can add axis labels. I’d say this represents about 60% of what you’ll want to do in Matplotlib.

But, with just a few more pieces of functionality we can get you up to 80%. So let’s keep going.

### Other Graph Types

Histograms are only one type of graph that you’ll want to plot. There are also bar charts, line charts, scatter plots, and more. Let’s show how to do some of those really quickly.

#### Scatter Plot

If we want to look at the relationship between Score and GDP per Capita, that’s a perfect opportunity for a **scatter plot.** We can use the **axes.scatter()** method to accomplish this.
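A scatter plot along those lines could be sketched as follows. The numbers are made up (with a positive relationship built in on purpose), the column names are assumed to match the Kaggle CSV, and putting GDP on the x-axis is our choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in data; the upward trend is baked in, not a real finding
rng = np.random.default_rng(0)
gdp = rng.uniform(0.0, 1.7, 150)
df = pd.DataFrame({
    "GDP per capita": gdp,
    "Score": 3.4 + 2.2 * gdp + rng.normal(0, 0.5, 150),
})

fig, ax = plt.subplots()
points = ax.scatter(df["GDP per capita"], df["Score"])  # x values, then y values
ax.set_xlabel("GDP per capita")
ax.set_ylabel("Happiness score")
```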

We can see a very strong positive correlation there, which is good information to have.

#### Line Chart

Maybe we want to see a **line chart** of all of the GDP values sorted from smallest to largest. We can sort our data using the Python **sorted()** function, and then use the **axes.plot()** method to plot a line chart.
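Here’s one way that could look, using made-up stand-in values for the `GDP per capita` column. When you pass only one sequence to `ax.plot()`, Matplotlib uses 0, 1, 2, … as the x values.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for the real GDP per capita column
rng = np.random.default_rng(0)
df = pd.DataFrame({"GDP per capita": rng.uniform(0.0, 1.7, 150)})

gdp_sorted = sorted(df["GDP per capita"])   # smallest to largest

fig, ax = plt.subplots()
(line,) = ax.plot(gdp_sorted)               # x defaults to the index 0..n-1
ax.set_ylabel("GDP per capita")
```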

*(In this graph, we removed our x-axis label since the x-axis doesn’t really mean anything in this graph, other than being a general numeric index for each data point.)*

#### Bar Chart

Finally, let’s say we want to present the happiness scores for the top five countries. This is a perfect use case for a **bar chart,** which we can do using the **axes.bar()** method. First, we create two variables to hold the data for our top five countries and those countries’ scores, then we create the plot.
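A sketch of that two-step approach is below. The countries and scores are placeholders we invented, and the column names are assumed to match the Kaggle CSV. We use `nlargest()` here to pick the top five, which is just one way to do it.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in data with placeholder country names
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Country or region": [f"Country {i}" for i in range(20)],
    "Score": rng.normal(5.4, 1.1, 20),
})

# Step 1: two variables holding the top five countries and their scores
top5 = df.nlargest(5, "Score")
countries = top5["Country or region"]
scores = top5["Score"]

# Step 2: one bar per country
fig, ax = plt.subplots()
bars = ax.bar(countries, scores)
ax.set_ylabel("Happiness score")
```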

### Using Color

Let’s talk about **color** for a bit, since effective use of color is one of the most important parts of data visualization.

First, we can change the color of a graph by passing a color to the plotting method. For scatter plots and line charts you can use the parameter **c**; histograms and bar charts expect the full name, **color.** Let’s change the color of our scatter plot to green.
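For instance, a green scatter plot might be sketched like this (made-up stand-in data, column names assumed to match the Kaggle CSV):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in data
rng = np.random.default_rng(0)
gdp = rng.uniform(0.0, 1.7, 150)
df = pd.DataFrame({
    "GDP per capita": gdp,
    "Score": 3.4 + 2.2 * gdp + rng.normal(0, 0.5, 150),
})

fig, ax = plt.subplots()
# A single color name colors every point the same way
points = ax.scatter(df["GDP per capita"], df["Score"], c="green")
```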

For scatter plots, we can do something much more powerful though—we can color each individual data point by *another* variable. For example, let’s say that we want to color each data point based on the healthy life expectancy. We can simply pass the healthy life expectancy data to the parameter c.
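Here’s a rough sketch of coloring by a third variable. All three columns are made-up stand-ins, with names assumed to match the Kaggle CSV.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in data
rng = np.random.default_rng(0)
gdp = rng.uniform(0.0, 1.7, 150)
df = pd.DataFrame({
    "GDP per capita": gdp,
    "Score": 3.4 + 2.2 * gdp + rng.normal(0, 0.5, 150),
    "Healthy life expectancy": 0.2 + 0.5 * gdp + rng.normal(0, 0.1, 150),
})

fig, ax = plt.subplots()
# Passing a column to c maps each point's value to a color
points = ax.scatter(
    df["GDP per capita"], df["Score"], c=df["Healthy life expectancy"]
)
```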

#### Colorbar

Look at that! There’s obviously a pattern here. But, there’s nothing here to tell us what those colors mean… Let’s fix that with a **colorbar.**
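One way to add a colorbar is to keep the object that `ax.scatter()` returns and hand it to `fig.colorbar()`. The sketch below uses made-up stand-in data, and the colorbar label is our own wording.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in data
rng = np.random.default_rng(0)
gdp = rng.uniform(0.0, 1.7, 150)
df = pd.DataFrame({
    "GDP per capita": gdp,
    "Score": 3.4 + 2.2 * gdp + rng.normal(0, 0.5, 150),
    "Healthy life expectancy": 0.2 + 0.5 * gdp + rng.normal(0, 0.1, 150),
})

fig, ax = plt.subplots()
points = ax.scatter(
    df["GDP per capita"], df["Score"], c=df["Healthy life expectancy"]
)
cbar = fig.colorbar(points, ax=ax)          # the colorbar gets its own small axes
cbar.set_label("Healthy life expectancy")   # says what the colors mean
```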

#### The cmap Parameter

If we want to change the color palette, we can pass in an argument for the **cmap** parameter (“color map”).
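As a sketch, switching palettes is a single extra argument. We picked `"plasma"`, one of Matplotlib’s built-in colormaps, purely as an example; the data is a made-up stand-in.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in data
rng = np.random.default_rng(0)
gdp = rng.uniform(0.0, 1.7, 150)
df = pd.DataFrame({
    "GDP per capita": gdp,
    "Score": 3.4 + 2.2 * gdp + rng.normal(0, 0.5, 150),
    "Healthy life expectancy": 0.2 + 0.5 * gdp + rng.normal(0, 0.1, 150),
})

fig, ax = plt.subplots()
points = ax.scatter(
    df["GDP per capita"],
    df["Score"],
    c=df["Healthy life expectancy"],
    cmap="plasma",                 # name of a built-in colormap
)
fig.colorbar(points, ax=ax)
```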

### Plotting Multiple Graphs—Same Axes

Now that we’ve covered a lot of the functionality that you can do with a single graph, let’s talk about how to do multiple graphs. The first kind of multi-graph plot that we’ll work with is where you plot multiple graphs on the **same axes.** Doing this is incredibly easy. You simply add one more line of code with your new plot.

For example, let’s say that we want to plot two line charts on the same graph—our first line chart will be the happiness score of each country, and the second will be the healthy life expectancy of each country.

We can simply have two **axes.plot()** method calls on the same axes object. (We’ll remove the plot text for now.)
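Roughly, that looks like the sketch below, with made-up stand-in values for both columns (names assumed to match the Kaggle CSV).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in data
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Score": rng.normal(5.4, 1.1, 150),
    "Healthy life expectancy": rng.uniform(0.1, 1.1, 150),
})

fig, ax = plt.subplots()
ax.plot(df["Score"])                     # first line
ax.plot(df["Healthy life expectancy"])   # second line, drawn on the same axes
```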

#### Adding a Legend

In order to tell which graph is which, we can add a **legend** to our graph. First, we’ll need to pass in a **label** parameter to each plot method. Then, we can call the **axes.legend()** method to add the legend to our graph with the correct labels.
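Here’s a small sketch of the label-then-legend pattern for two line charts. The data is a made-up stand-in and the label text is ours.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in data
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Score": rng.normal(5.4, 1.1, 150),
    "Healthy life expectancy": rng.uniform(0.1, 1.1, 150),
})

fig, ax = plt.subplots()
ax.plot(df["Score"], label="Happiness score")
ax.plot(df["Healthy life expectancy"], label="Healthy life expectancy")
legend = ax.legend()   # builds the legend from the label arguments above
```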

Very often, we’ll want to do multiple histograms on the same graph, to compare the distributions of two variables. For example, let’s say we want to split the countries up into “high happiness” and “low happiness” groups, and then plot the GDP per capita distributions for each of those groups to see if the distributions of GDP per capita are different.

First, we’ll create our variables.

Then, we’ll plot the two histograms on the same axes.
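The two steps could be sketched like this. The data is made up, and splitting the groups at the median score is our own assumption, since any sensible cutoff would work.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in data
rng = np.random.default_rng(0)
gdp = rng.uniform(0.0, 1.7, 150)
df = pd.DataFrame({
    "GDP per capita": gdp,
    "Score": 3.4 + 2.2 * gdp + rng.normal(0, 0.5, 150),
})

# Step 1: create the variables (cutoff at the median is an assumption)
cutoff = df["Score"].median()
high = df.loc[df["Score"] >= cutoff, "GDP per capita"]
low = df.loc[df["Score"] < cutoff, "GDP per capita"]

# Step 2: two hist() calls on the same axes
fig, ax = plt.subplots()
ax.hist(high, label="High happiness")
ax.hist(low, label="Low happiness")
ax.legend()
```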

#### Alpha Parameter for Transparency

The orange histogram is covering a large part of the blue histogram, which isn’t good. We can fix this by using the **alpha** parameter, which sets the transparency of each graph. We’ll set the alpha to something less than 1 so that each graph becomes transparent and we can see both graphs clearly.
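As a sketch, here are the same two histograms with `alpha=0.5`. The value 0.5 is just a reasonable starting point, and the data and median cutoff are made-up assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in data
rng = np.random.default_rng(0)
gdp = rng.uniform(0.0, 1.7, 150)
df = pd.DataFrame({
    "GDP per capita": gdp,
    "Score": 3.4 + 2.2 * gdp + rng.normal(0, 0.5, 150),
})
cutoff = df["Score"].median()   # assumed split point
high = df.loc[df["Score"] >= cutoff, "GDP per capita"]
low = df.loc[df["Score"] < cutoff, "GDP per capita"]

fig, ax = plt.subplots()
# alpha runs from 0 (invisible) to 1 (fully opaque)
_, _, high_bars = ax.hist(high, alpha=0.5, label="High happiness")
_, _, low_bars = ax.hist(low, alpha=0.5, label="Low happiness")
ax.legend()
```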

The alpha parameter is useful anytime you need to plot multiple overlapping graphs like this.

### Plotting Multiple Graphs—Different Axes and Subplots

Rather than doing multiple plots on the same axes, what if we want totally separate axes for each plot? This is where we can use **subplots** with the **plt.subplots()** function.

Let’s say we want to create a two-by-two figure, with four axes total. Before plotting anything, let’s just see what happens when we create that two-by-two figure with four axes.
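Creating the empty grid takes a single call. In this sketch, the first two arguments to `plt.subplots()` are the number of rows and columns.

```python
import matplotlib.pyplot as plt

# 2 rows x 2 columns of empty axes on one figure
fig, axs = plt.subplots(2, 2)

print(type(axs))   # <class 'numpy.ndarray'>
print(axs.shape)   # (2, 2)
```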

You can see that the axs object is now an **array** (technically a **NumPy ndarray**) that holds four different axes objects, one axes per graph. Since we didn’t plot anything, each axes shows up as a blank rectangle for now.

Let’s get each axes object from the array and plot histograms for four of our variables.
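Here’s a sketch that indexes into the array by row and column. The four columns are our guess at a sensible selection, and all the values are made-up stand-ins.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in data; column names assumed to match the Kaggle CSV
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Score": rng.normal(5.4, 1.1, 150),
    "GDP per capita": rng.uniform(0.0, 1.7, 150),
    "Healthy life expectancy": rng.uniform(0.1, 1.1, 150),
    "Social support": rng.uniform(0.3, 1.6, 150),
})

fig, axs = plt.subplots(2, 2)
axs[0, 0].hist(df["Score"])                    # top left
axs[0, 1].hist(df["GDP per capita"])           # top right
axs[1, 0].hist(df["Healthy life expectancy"])  # bottom left
axs[1, 1].hist(df["Social support"])           # bottom right
```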

Beautiful! But once again, we’re missing text which means that we don’t know what data each plot has. We can add text using each of the axes objects. In addition to a title for the whole *figure*, we can also add a title to each *axes*, now that we have multiple axes objects.

#### Tight Layout

It looks like our axes titles are getting all mixed up with our x-axis tick labels. Let’s use the **plt.tight_layout()** function to see if we can fix that.
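Putting the per-axes titles and the layout fix together might look like the sketch below. The data is made up, the choice of four columns is ours, and looping over `axs.flat` is simply a shorter alternative to indexing each axes by hand.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in data; column names assumed to match the Kaggle CSV
rng = np.random.default_rng(0)
columns = ["Score", "GDP per capita", "Healthy life expectancy", "Social support"]
df = pd.DataFrame({name: rng.uniform(0.0, 1.7, 150) for name in columns})

fig, axs = plt.subplots(2, 2)
for ax, name in zip(axs.flat, columns):
    ax.hist(df[name])
    ax.set_title(name)            # title for this one axes

fig.suptitle("World Happiness Variables")   # title for the whole figure
plt.tight_layout()                # re-space the axes so text doesn't overlap
```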

There we go.

### Saving Figures

Finally, we’ve created some cool graphs that we might want to share with others. How do we save graphs out to files? We can save images using the **plt.savefig()** function. The output format is picked from the file extension (PNG, PDF, and SVG are all supported, for example). PNG is a solid default for charts, so let’s use that.
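A minimal sketch of saving a figure is below. The file name `happiness_scores.png` is hypothetical, and the data is a made-up stand-in.

```python
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for the real Score column
rng = np.random.default_rng(0)
df = pd.DataFrame({"Score": rng.normal(5.4, 1.1, 150)})

fig, ax = plt.subplots()
ax.hist(df["Score"])

out_path = Path("happiness_scores.png")   # hypothetical file name
plt.savefig(out_path)   # saves the current figure; format comes from ".png"
```

Since `plt.savefig()` acts on the current figure, calling `fig.savefig(out_path)` should do the same job here if you’d rather stay with the figure object.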

If we look in our current directory, we’ll find our image there—and we can open it like any normal image.

## Additional Matplotlib Information

And with that, we’ve finished the main section of 80/20 Matplotlib! This is enough Matplotlib functionality to get you through many of the data visualization tasks you need to do.

However, Matplotlib is a rather big, hairy, extensive package—so before wrapping up, there are a few more things that you should at least be aware of in case you run into them.

### Two Matplotlib Styles: Object-Oriented vs. MATLAB

You’ll notice that we’ve occasionally used the **plt** (pyplot) object to do certain things, such as showing figures, saving figures, and creating subplots. Some things are most conveniently done through pyplot rather than through the figure or the axes. This can be confusing at first, but just know that for the most part you’ll only be using the figure and axes objects.

But, there is a way of using Matplotlib that relies pretty much entirely on the **plt** object, without using **fig** or **ax** at all. This method of using Matplotlib is called the **MATLAB-style,** where there’s a “hidden” state in the background where your figure and axes live.

For example, in the MATLAB-style way, you can create a scatter plot by just calling **plt.scatter().**
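For comparison, here’s a sketch of that style with no `fig` or `ax` variables at all. The data is a made-up stand-in with column names assumed to match the Kaggle CSV.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in data
rng = np.random.default_rng(0)
gdp = rng.uniform(0.0, 1.7, 150)
df = pd.DataFrame({
    "GDP per capita": gdp,
    "Score": 3.4 + 2.2 * gdp + rng.normal(0, 0.5, 150),
})

plt.figure()   # start a fresh "current" figure in pyplot's hidden state
plt.scatter(df["GDP per capita"], df["Score"])   # draws on the current axes
plt.xlabel("GDP per capita")
plt.ylabel("Happiness score")
```

Every `plt.` call here acts on whichever figure and axes pyplot currently considers active, which is why nothing needs to be passed around.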

Although this seems simpler at first, it becomes more convoluted very quickly as your graphs get more complex. If you prefer this style after doing your own research, then go for it—otherwise, we recommend sticking with the more explicit **fig, ax** way of using Matplotlib, which is called the **object-oriented** style of using Matplotlib.

Matplotlib generally recommends the object-oriented style as well. Here’s a quote from the Matplotlib website:

*“For more complicated applications, this explicitness and clarity becomes increasingly valuable, and the richer and more complete **object-oriented interface** will likely make the program **easier to write and maintain**.”*

*Source: https://matplotlib.org/3.1.0/tutorials/introductory/usage.html#coding-styles*

### Searching for Matplotlib Documentation and Answers

Finally, customizing Matplotlib graphs can take a lot of work and research. The package is so powerful that you can do pretty much *anything* with it… but doing *what you want* can take some digging.

The documentation will very often be your friend in this case, as will StackOverflow. When you find yourself needing to do something, just Google your problem and the documentation (or a StackOverflow answer) should pop right up.

I would recommend including either **axes** or **figure** in your search so that you get results for the object-oriented style, rather than the MATLAB style.

## Conclusion

And with that, you’ve just learned the 20% of Matplotlib that will get you 80% of the value. Feels pretty good, right?

Happy learning!

*PS, here are some extra resources for you.*

- *Here’s a great **introductory guide** on the Matplotlib website:* https://matplotlib.org/tutorials/introductory/usage.html
- ***Seaborn** is a nice data visualization tool built on top of Matplotlib:* https://seaborn.pydata.org/
- ***Plotly** is an interactive data visualization package for Python:* https://plotly.com/python/
- *And finally, here’s the **Project Data Science Matplotlib Mega Tutorial** if you want to dive deeper:* https://youtu.be/axSTGczvYIE
