Mastering Pandas DataFrame Styling for a Stunning Presentation

Pandas
Author

Adam Cseresznye

Published

August 27, 2023

Image by Mika Baumeister

As you wrap up your data analysis journey, you face a fun decision: how to share your discoveries effectively. Tables and graphs both have their moments to shine. Tables work great when you want your audience to spot exact values and compare them. But, here’s the twist: tables can look overwhelming at first, especially if they’re crammed with data.

But don’t worry! Styling and formatting are here to help. Pandas comes to the rescue. It lets you turn your data into stylish tables effortlessly.

In this article, we’ll explore some tricks to make your Pandas DataFrames look awesome and tell a clear data story. For the demonstrations, we’ll dive into Seaborn’s built-in “tips” dataset. This dataset is a nifty data frame with 244 rows and 7 variables, offering a peek into the world of tipping. This comprehensive collection includes variables like the tip amount in dollars, the bill total in dollars, the gender of the bill payer, the presence of smokers in the party, the day of the week, the time of day, and the party size. Ready to roll? Let’s jump right in!

Setup

Code
import pandas as pd
import numpy as np
Code
df = pd.read_csv(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"
)
df.sample(5)
Table 1: Original table
total_bill tip sex smoker day time size
109 14.31 4.00 Female Yes Sat Dinner 2
42 13.94 3.06 Male No Sun Dinner 2
44 30.40 5.60 Male No Sun Dinner 4
37 16.93 3.07 Female No Sat Dinner 3
40 16.04 2.24 Male No Sat Dinner 3

Built-in styling

In this article, we won’t cover every single feature of the Styler class because that might make it too long. Instead, we’ll concentrate on the practical aspects that can simplify your data analysis, improve data comprehension, and make your message clearer to your audience. So, instead of taking a deep dive, we’ll start with the low-hanging fruits and explore the built-in options that Pandas offers.

Before we dive in, let’s prepare an aggregated dataframe for our demonstrations. Here it is:

Code
grouped = df.groupby(["day", "smoker", "time"]).median(numeric_only=True)
grouped
Table 2: How the Data Appears After Grouping and Calculating Medians (Before Pandas Styling)
total_bill tip size
day smoker time
Fri No Dinner 22.490 3.250 2.0
Lunch 15.980 3.000 3.0
Yes Dinner 16.320 3.000 2.0
Lunch 12.790 2.100 2.0
Sat No Dinner 17.820 2.750 2.0
Yes Dinner 20.390 2.690 2.0
Sun No Dinner 18.430 3.020 3.0
Yes Dinner 23.100 3.500 2.0
Thur No Dinner 18.780 3.000 2.0
Lunch 15.235 2.105 2.0
Yes Lunch 16.470 2.560 2.0

Highlighting minimum and maximum values

Check out these handy helpers: highlight_min and highlight_max. They’re your go-to for spotting the lowest and highest values in each column. It’s a speedy way to emphasize the most important data points in each category.

Also, please note the format(precision=1, thousands=",", decimal=".") snippet, this is not exactly a built-in style feature but has everything to do with keeping those float numbers tidy. Pandas tends to display more decimal places than we often require, which can be a bit distracting. To tone it down a notch, we can lean on the format() and format_index() methods to fine-tune the precision. Trust me, it’s super useful!

Code
(
    grouped.style.format(precision=1, thousands=",", decimal=".")
    .highlight_max(
        axis=0, props="color:white; font-weight:bold; background-color:green;"
    )
    .highlight_min(axis=0, props="color:white; font-weight:bold; background-color:red;")
)
Table 3: Highlighting Maximum and Minimum Values with Pandas Styler
      total_bill tip size
day smoker time      
Fri No Dinner 22.5 3.2 2.0
Lunch 16.0 3.0 3.0
Yes Dinner 16.3 3.0 2.0
Lunch 12.8 2.1 2.0
Sat No Dinner 17.8 2.8 2.0
Yes Dinner 20.4 2.7 2.0
Sun No Dinner 18.4 3.0 3.0
Yes Dinner 23.1 3.5 2.0
Thur No Dinner 18.8 3.0 2.0
Lunch 15.2 2.1 2.0
Yes Lunch 16.5 2.6 2.0

Emphasizing Values Within a Range

Imagine you’re want to find the days when tips fell between 3 and 5 dollars. In this scenario, the highlight_between method comes to the rescue. Don’t forget to use the subset argument; it’s your trusty sidekick when you only want to work with selected columns.

Code
(
    grouped.reset_index()
    .style.format(precision=1, thousands=",", decimal=".")
    .highlight_between(
        left=3,
        right=5,
        subset=["tip"],
        axis=1,
        props="color:white; font-weight:bold; background-color:purple;",
    )
)
Table 4: Highlighting Data within a Specified Range Using Pandas Styler
  day smoker time total_bill tip size
0 Fri No Dinner 22.5 3.2 2.0
1 Fri No Lunch 16.0 3.0 3.0
2 Fri Yes Dinner 16.3 3.0 2.0
3 Fri Yes Lunch 12.8 2.1 2.0
4 Sat No Dinner 17.8 2.8 2.0
5 Sat Yes Dinner 20.4 2.7 2.0
6 Sun No Dinner 18.4 3.0 3.0
7 Sun Yes Dinner 23.1 3.5 2.0
8 Thur No Dinner 18.8 3.0 2.0
9 Thur No Lunch 15.2 2.1 2.0
10 Thur Yes Lunch 16.5 2.6 2.0

Highlight Column-Wise Outliers

Utilizing parameters such as q_left=0.05, q_right=0.95, axis=0, and defining props='opacity: 10%;', we can highlight values residing outside the 5-95 percentile range.

Code
(
    grouped.style.format(precision=1, thousands=",", decimal=".").highlight_quantile(
        q_left=0.05, q_right=0.95, axis=0, props="opacity: 10%;"
    )
)
Table 5: Highlighting Outliers
      total_bill tip size
day smoker time      
Fri No Dinner 22.5 3.2 2.0
Lunch 16.0 3.0 3.0
Yes Dinner 16.3 3.0 2.0
Lunch 12.8 2.1 2.0
Sat No Dinner 17.8 2.8 2.0
Yes Dinner 20.4 2.7 2.0
Sun No Dinner 18.4 3.0 3.0
Yes Dinner 23.1 3.5 2.0
Thur No Dinner 18.8 3.0 2.0
Lunch 15.2 2.1 2.0
Yes Lunch 16.5 2.6 2.0

Display Bar Charts within Your Table

Let’s explore a technique for highlighting the significance of values by embedding bar charts right within the cells. The blend of bar heights and color gradients can pack a powerful punch in your data storytelling arsenal. Don’t forget to experiment with the ‘align’ option, a handy tool that helps you position these bars within the cells just right, giving your visuals a polished look. Feel free to play around with the settings and find what clicks best with your data tales.

Code
(
    grouped.reset_index()
    .sort_values(by="total_bill")
    .style.format(precision=1, thousands=",", decimal=".")
    .bar(
        align="mean",
        cmap="bwr",
        height=50,
        width=60,
        props="width: 120px; border-right: 1px solid black;",
    )
)
Table 8: Bar Charts in Your Table with Alignment Set to ‘Mean’
  day smoker time total_bill tip size
3 Fri Yes Lunch 12.8 2.1 2.0
9 Thur No Lunch 15.2 2.1 2.0
1 Fri No Lunch 16.0 3.0 3.0
2 Fri Yes Dinner 16.3 3.0 2.0
10 Thur Yes Lunch 16.5 2.6 2.0
4 Sat No Dinner 17.8 2.8 2.0
6 Sun No Dinner 18.4 3.0 3.0
8 Thur No Dinner 18.8 3.0 2.0
5 Sat Yes Dinner 20.4 2.7 2.0
0 Fri No Dinner 22.5 3.2 2.0
7 Sun Yes Dinner 23.1 3.5 2.0
Code
(
    grouped.reset_index()
    .sort_values(by="total_bill")
    .style.format(precision=1, thousands=",", decimal=".")
    .bar(
        align=0,
        cmap="bwr",
        height=50,
        width=60,
        props="width: 120px; border-right: 1px solid black;",
    )
)
Table 9: Bar Charts in Your Table with Alignment Set using a float number
  day smoker time total_bill tip size
3 Fri Yes Lunch 12.8 2.1 2.0
9 Thur No Lunch 15.2 2.1 2.0
1 Fri No Lunch 16.0 3.0 3.0
2 Fri Yes Dinner 16.3 3.0 2.0
10 Thur Yes Lunch 16.5 2.6 2.0
4 Sat No Dinner 17.8 2.8 2.0
6 Sun No Dinner 18.4 3.0 3.0
8 Thur No Dinner 18.8 3.0 2.0
5 Sat Yes Dinner 20.4 2.7 2.0
0 Fri No Dinner 22.5 3.2 2.0
7 Sun Yes Dinner 23.1 3.5 2.0

Formatting

Below is an illustrative example of the myriad possibilities when it comes to fine-tuning the style of a DataFrame. In this code, we’ve undertaken various stylistic enhancements:

  1. Background Gradient: We’ve applied a nice background color gradient to the columns along the vertical axis.
  2. Caption: We’ve set a descriptive caption for the DataFrame.
  3. Visual Styling: We’ve specified text alignment and introduced hovering options to make highlighted values pop even more.
  4. Column Names Formatting: We’ve reformatted our column names for clarity and aesthetics.
  5. Column Hiding: We’ve concealed the ‘smoker’ column, decluttering our view.
  6. Index Hiding: We’ve also hidden the index labels for a cleaner look.
  7. Numerical Formatting: We’ve individually specified the numerical representation, including adding a dollar sign for currency values.

This demonstration underscores the virtually endless possibilities for customizing the appearance of your DataFrame. However, it’s important to note that the physical attributes set by the set_table_styles method won’t be exported to Excel, should you choose to do so. Just a handy tidbit to keep in mind.

Code
# Start by resetting the index and renaming columns with underscores
# Replace underscores with spaces for better readability
(
    grouped.reset_index().rename(columns=lambda x: x.replace("_", " "))
    # Sort the DataFrame by the 'total bill' column
    .sort_values(by="total bill")
    # Apply Pandas Styler to format the table
    .style
    # Apply background color gradient to columns along the vertical axis (axis=0)
    .background_gradient(cmap="viridis", axis=0)
    # Set a caption for the table
    .set_caption("Exploring Dining Trends: Bill Amounts, Tips, and Party Sizes")
    # Customize the table's visual styling
    .set_table_styles(
        [
            {
                "selector": "th.col_heading",
                "props": "text-align: center; font-size: 1.5em;",
            },
            {"selector": "td", "props": "text-align: center;"},
            {
                "selector": "td:hover",
                "props": "font-style: italic; color: black; font-weight:bold; background-color : #ffffb3;",
            },
        ],
        overwrite=False,
    )
    # Apply custom formatting to the index labels (convert to uppercase)
    .format_index(str.upper, axis=1)
    # Hide the 'smoker' column from the table
    .hide(subset=["smoker"], axis=1)
    # Hide the index label (row numbers)
    .hide(axis="index")
    # Format specific columns with dollar signs and one decimal place
    .format(
        {
            "total bill": "$ {:.1f}",
            "tip": "$ {:.1f}",
            "size": "{:.0f}",
        }
    )
)
Table 10: Formatting tables
DAY TIME TOTAL BILL TIP SIZE
Fri Lunch $ 12.8 $ 2.1 2
Thur Lunch $ 15.2 $ 2.1 2
Fri Lunch $ 16.0 $ 3.0 3
Fri Dinner $ 16.3 $ 3.0 2
Thur Lunch $ 16.5 $ 2.6 2
Sat Dinner $ 17.8 $ 2.8 2
Sun Dinner $ 18.4 $ 3.0 3
Thur Dinner $ 18.8 $ 3.0 2
Sat Dinner $ 20.4 $ 2.7 2
Fri Dinner $ 22.5 $ 3.2 2
Sun Dinner $ 23.1 $ 3.5 2

Defining Custom Styling

If the predefined styling methods don’t meet your needs, you have the flexibility to create your custom styling function and use it with your DataFrame. You can apply styling element by element using the .applymap() method, or you can work on columns or rows using the .apply() method. If you need to work specifically on column header rows or indexes, both .apply_index() and .applymap_index() are at your disposal.

Code
# Define a custom styling function that highlights 'Sun' in yellow
def highlight_sunday(value, color):
    return f"color: {color};" if value == "Sun" else None


# Sample 10 random rows from the DataFrame
(
    df.sample(10).style
    # Set formatting options for numeric values
    .format(precision=1, thousands=",", decimal=".")
    # Apply the highlight_sunday function to the 'day' column and set the color to yellow
    .applymap(highlight_sunday, color="yellow", subset=["day"])
    # Apply opacity styling to cells in the 'size' column where the value is less than or equal to 3
    .applymap(lambda value: "opacity: 50%;" if (value <= 3) else None, subset=["size"])
    # Apply red text color styling to cells in the 'tip' column where the value is less than or equal to 3
    .applymap(lambda value: "color:red;" if (value <= 3) else None, subset=["tip"])
    # Use apply_index to style index labels. If the label is 'tip', 'day', or 'size', set the color to white, otherwise black.
    .apply_index(
        lambda s: np.where(
            s.isin(["tip", "day", "size"]), "opacity: 100%;", "opacity: 10%;"
        ),
        axis=1,
    )
)
Table 11: Applying Custom Styling
  total_bill tip sex smoker day time size
179 34.6 3.5 Male Yes Sun Dinner 2
3 23.7 3.3 Male No Sun Dinner 2
83 32.7 5.0 Male Yes Thur Lunch 2
208 24.3 2.0 Male Yes Sat Dinner 2
53 9.9 1.6 Male No Sun Dinner 2
17 16.3 3.7 Male No Sun Dinner 3
6 8.8 2.0 Male No Sun Dinner 2
182 45.4 3.5 Male Yes Sun Dinner 3
48 28.6 2.0 Male No Sun Dinner 3
61 13.8 2.0 Male Yes Sat Dinner 2

Other Useful Features

Get a Clearer Overview with set_sticky

What if you find yourself facing a DataFrame with more columns than can comfortably fit on your screen, yet you still wish to inspect each column individually? In the past, you might have resorted to using pd.set_option('display.max_columns', xyz) to expand the display. However, there’s a much more elegant solution: set_sticky.

set_sticky introduces a clever CSS trick that permanently pins the index or column headers within a scrolling frame. In our case, although the ‘tips’ DataFrame doesn’t have an excessive number of columns, we’ve concatenated 10 DataFrames together to showcase the remarkable utility of set_sticky. As you scroll horizontally, you’ll notice that you can now conveniently inspect all the columns while the index remains firmly in place, thanks to the magic of set_sticky. Let’s explore this feature below.

Code
(pd.concat([df for i in range(10)], axis=1).head().style.set_sticky(axis="index"))
Table 12: Demonstrating set_sticky
  total_bill tip sex smoker day time size total_bill tip sex smoker day time size total_bill tip sex smoker day time size total_bill tip sex smoker day time size total_bill tip sex smoker day time size total_bill tip sex smoker day time size total_bill tip sex smoker day time size total_bill tip sex smoker day time size total_bill tip sex smoker day time size total_bill tip sex smoker day time size
0 16.990000 1.010000 Female No Sun Dinner 2 16.990000 1.010000 Female No Sun Dinner 2 16.990000 1.010000 Female No Sun Dinner 2 16.990000 1.010000 Female No Sun Dinner 2 16.990000 1.010000 Female No Sun Dinner 2 16.990000 1.010000 Female No Sun Dinner 2 16.990000 1.010000 Female No Sun Dinner 2 16.990000 1.010000 Female No Sun Dinner 2 16.990000 1.010000 Female No Sun Dinner 2 16.990000 1.010000 Female No Sun Dinner 2
1 10.340000 1.660000 Male No Sun Dinner 3 10.340000 1.660000 Male No Sun Dinner 3 10.340000 1.660000 Male No Sun Dinner 3 10.340000 1.660000 Male No Sun Dinner 3 10.340000 1.660000 Male No Sun Dinner 3 10.340000 1.660000 Male No Sun Dinner 3 10.340000 1.660000 Male No Sun Dinner 3 10.340000 1.660000 Male No Sun Dinner 3 10.340000 1.660000 Male No Sun Dinner 3 10.340000 1.660000 Male No Sun Dinner 3
2 21.010000 3.500000 Male No Sun Dinner 3 21.010000 3.500000 Male No Sun Dinner 3 21.010000 3.500000 Male No Sun Dinner 3 21.010000 3.500000 Male No Sun Dinner 3 21.010000 3.500000 Male No Sun Dinner 3 21.010000 3.500000 Male No Sun Dinner 3 21.010000 3.500000 Male No Sun Dinner 3 21.010000 3.500000 Male No Sun Dinner 3 21.010000 3.500000 Male No Sun Dinner 3 21.010000 3.500000 Male No Sun Dinner 3
3 23.680000 3.310000 Male No Sun Dinner 2 23.680000 3.310000 Male No Sun Dinner 2 23.680000 3.310000 Male No Sun Dinner 2 23.680000 3.310000 Male No Sun Dinner 2 23.680000 3.310000 Male No Sun Dinner 2 23.680000 3.310000 Male No Sun Dinner 2 23.680000 3.310000 Male No Sun Dinner 2 23.680000 3.310000 Male No Sun Dinner 2 23.680000 3.310000 Male No Sun Dinner 2 23.680000 3.310000 Male No Sun Dinner 2
4 24.590000 3.610000 Female No Sun Dinner 4 24.590000 3.610000 Female No Sun Dinner 4 24.590000 3.610000 Female No Sun Dinner 4 24.590000 3.610000 Female No Sun Dinner 4 24.590000 3.610000 Female No Sun Dinner 4 24.590000 3.610000 Female No Sun Dinner 4 24.590000 3.610000 Female No Sun Dinner 4 24.590000 3.610000 Female No Sun Dinner 4 24.590000 3.610000 Female No Sun Dinner 4 24.590000 3.610000 Female No Sun Dinner 4

Concatenating DataFrame Outputs

You can combine two or more Stylers if they have the same columns. This is particularly handy when presenting summary statistics for a DataFrame

Code
(
    df.groupby("day")[["total_bill", "tip", "size"]]
    .mean()
    .style.format(precision=1)
    .concat(
        df[["total_bill", "tip", "size"]]
        .agg(["mean", "median", "sum"])
        .style.format(precision=1)
        .relabel_index(["Average", "Median", "Sum"])
    )
)
Table 13: Demonstrating Easily Concatenating Different DataFrame Outputs
  total_bill tip size
day      
Fri 17.2 2.7 2.1
Sat 20.4 3.0 2.5
Sun 21.4 3.3 2.8
Thur 17.7 2.8 2.5
Average 19.8 3.0 2.6
Median 17.8 2.9 2.0
Sum 4827.8 731.6 627.0

Exporting

To save your styled DataFrame in Excel format, you can export it to an .xlsx file, preserving the styling. To do this, ensure you have the ‘openpyxl’ package installed.

And there you have it, folks! This concludes our brief exploration of the formatting options available right within Pandas. I hope you found this article insightful and enjoyable. Until next time, happy coding!