Code
import pandas as pd
import numpy as np
Adam Cseresznye
August 27, 2023
As you wrap up your data analysis journey, you face a fun decision: how to share your discoveries effectively. Tables and graphs both have their moments to shine. Tables work great when you want your audience to spot exact values and compare them. But, here’s the twist: tables can look overwhelming at first, especially if they’re crammed with data.
But don’t worry! Styling and formatting are here to help. Pandas comes to the rescue. It lets you turn your data into stylish tables effortlessly.
In this article, we’ll explore some tricks to make your Pandas DataFrames look awesome and tell a clear data story. For the demonstrations, we’ll dive into Seaborn’s built-in “tips” dataset. This dataset is a nifty data frame with 244 rows and 7 variables, offering a peek into the world of tipping. This comprehensive collection includes variables like the tip amount in dollars, the bill total in dollars, the gender of the bill payer, the presence of smokers in the party, the day of the week, the time of day, and the party size. Ready to roll? Let’s jump right in!
total_bill | tip | sex | smoker | day | time | size | |
---|---|---|---|---|---|---|---|
109 | 14.31 | 4.00 | Female | Yes | Sat | Dinner | 2 |
42 | 13.94 | 3.06 | Male | No | Sun | Dinner | 2 |
44 | 30.40 | 5.60 | Male | No | Sun | Dinner | 4 |
37 | 16.93 | 3.07 | Female | No | Sat | Dinner | 3 |
40 | 16.04 | 2.24 | Male | No | Sat | Dinner | 3 |
In this article, we won’t cover every single feature of the Styler class because that might make it too long. Instead, we’ll concentrate on the practical aspects that can simplify your data analysis, improve data comprehension, and make your message clearer to your audience. So, instead of taking a deep dive, we’ll start with the low-hanging fruits and explore the built-in options that Pandas offers.
Before we dive in, let’s prepare an aggregated dataframe for our demonstrations. Here it is:
total_bill | tip | size | |||
---|---|---|---|---|---|
day | smoker | time | |||
Fri | No | Dinner | 22.490 | 3.250 | 2.0 |
Lunch | 15.980 | 3.000 | 3.0 | ||
Yes | Dinner | 16.320 | 3.000 | 2.0 | |
Lunch | 12.790 | 2.100 | 2.0 | ||
Sat | No | Dinner | 17.820 | 2.750 | 2.0 |
Yes | Dinner | 20.390 | 2.690 | 2.0 | |
Sun | No | Dinner | 18.430 | 3.020 | 3.0 |
Yes | Dinner | 23.100 | 3.500 | 2.0 | |
Thur | No | Dinner | 18.780 | 3.000 | 2.0 |
Lunch | 15.235 | 2.105 | 2.0 | ||
Yes | Lunch | 16.470 | 2.560 | 2.0 |
Check out these handy helpers: highlight_min
and highlight_max
. They’re your go-to for spotting the lowest and highest values in each column. It’s a speedy way to emphasize the most important data points in each category.
Also, please note the format(precision=1, thousands=",", decimal=".")
snippet, this is not exactly a built-in style feature but has everything to do with keeping those float numbers tidy. Pandas tends to display more decimal places than we often require, which can be a bit distracting. To tone it down a notch, we can lean on the format()
and format_index()
methods to fine-tune the precision. Trust me, it’s super useful!
total_bill | tip | size | |||
---|---|---|---|---|---|
day | smoker | time | |||
Fri | No | Dinner | 22.5 | 3.2 | 2.0 |
Lunch | 16.0 | 3.0 | 3.0 | ||
Yes | Dinner | 16.3 | 3.0 | 2.0 | |
Lunch | 12.8 | 2.1 | 2.0 | ||
Sat | No | Dinner | 17.8 | 2.8 | 2.0 |
Yes | Dinner | 20.4 | 2.7 | 2.0 | |
Sun | No | Dinner | 18.4 | 3.0 | 3.0 |
Yes | Dinner | 23.1 | 3.5 | 2.0 | |
Thur | No | Dinner | 18.8 | 3.0 | 2.0 |
Lunch | 15.2 | 2.1 | 2.0 | ||
Yes | Lunch | 16.5 | 2.6 | 2.0 |
Imagine you’re want to find the days when tips fell between 3 and 5 dollars. In this scenario, the highlight_between
method comes to the rescue. Don’t forget to use the subset
argument; it’s your trusty sidekick when you only want to work with selected columns.
day | smoker | time | total_bill | tip | size | |
---|---|---|---|---|---|---|
0 | Fri | No | Dinner | 22.5 | 3.2 | 2.0 |
1 | Fri | No | Lunch | 16.0 | 3.0 | 3.0 |
2 | Fri | Yes | Dinner | 16.3 | 3.0 | 2.0 |
3 | Fri | Yes | Lunch | 12.8 | 2.1 | 2.0 |
4 | Sat | No | Dinner | 17.8 | 2.8 | 2.0 |
5 | Sat | Yes | Dinner | 20.4 | 2.7 | 2.0 |
6 | Sun | No | Dinner | 18.4 | 3.0 | 3.0 |
7 | Sun | Yes | Dinner | 23.1 | 3.5 | 2.0 |
8 | Thur | No | Dinner | 18.8 | 3.0 | 2.0 |
9 | Thur | No | Lunch | 15.2 | 2.1 | 2.0 |
10 | Thur | Yes | Lunch | 16.5 | 2.6 | 2.0 |
Utilizing parameters such as q_left=0.05
, q_right=0.95
, axis=0
, and defining props='opacity: 10%;'
, we can highlight values residing outside the 5-95 percentile range.
total_bill | tip | size | |||
---|---|---|---|---|---|
day | smoker | time | |||
Fri | No | Dinner | 22.5 | 3.2 | 2.0 |
Lunch | 16.0 | 3.0 | 3.0 | ||
Yes | Dinner | 16.3 | 3.0 | 2.0 | |
Lunch | 12.8 | 2.1 | 2.0 | ||
Sat | No | Dinner | 17.8 | 2.8 | 2.0 |
Yes | Dinner | 20.4 | 2.7 | 2.0 | |
Sun | No | Dinner | 18.4 | 3.0 | 3.0 |
Yes | Dinner | 23.1 | 3.5 | 2.0 | |
Thur | No | Dinner | 18.8 | 3.0 | 2.0 |
Lunch | 15.2 | 2.1 | 2.0 | ||
Yes | Lunch | 16.5 | 2.6 | 2.0 |
We can show data trends using both the background_gradient
and text_gradient
methods. These methods introduce gradient-style background colors and text shading to our visualizations. To optimize their impact, it’s advisable to first arrange your data with the sort_values
method before applying the background_gradient
.
day | smoker | time | total_bill | tip | size | |
---|---|---|---|---|---|---|
3 | Fri | Yes | Lunch | 12.8 | 2.1 | 2.0 |
9 | Thur | No | Lunch | 15.2 | 2.1 | 2.0 |
1 | Fri | No | Lunch | 16.0 | 3.0 | 3.0 |
2 | Fri | Yes | Dinner | 16.3 | 3.0 | 2.0 |
10 | Thur | Yes | Lunch | 16.5 | 2.6 | 2.0 |
4 | Sat | No | Dinner | 17.8 | 2.8 | 2.0 |
6 | Sun | No | Dinner | 18.4 | 3.0 | 3.0 |
8 | Thur | No | Dinner | 18.8 | 3.0 | 2.0 |
5 | Sat | Yes | Dinner | 20.4 | 2.7 | 2.0 |
0 | Fri | No | Dinner | 22.5 | 3.2 | 2.0 |
7 | Sun | Yes | Dinner | 23.1 | 3.5 | 2.0 |
day | smoker | time | total_bill | tip | size | |
---|---|---|---|---|---|---|
3 | Fri | Yes | Lunch | 12.8 | 2.1 | 2.0 |
9 | Thur | No | Lunch | 15.2 | 2.1 | 2.0 |
1 | Fri | No | Lunch | 16.0 | 3.0 | 3.0 |
2 | Fri | Yes | Dinner | 16.3 | 3.0 | 2.0 |
10 | Thur | Yes | Lunch | 16.5 | 2.6 | 2.0 |
4 | Sat | No | Dinner | 17.8 | 2.8 | 2.0 |
6 | Sun | No | Dinner | 18.4 | 3.0 | 3.0 |
8 | Thur | No | Dinner | 18.8 | 3.0 | 2.0 |
5 | Sat | Yes | Dinner | 20.4 | 2.7 | 2.0 |
0 | Fri | No | Dinner | 22.5 | 3.2 | 2.0 |
7 | Sun | Yes | Dinner | 23.1 | 3.5 | 2.0 |
Let’s explore a technique for highlighting the significance of values by embedding bar charts right within the cells. The blend of bar heights and color gradients can pack a powerful punch in your data storytelling arsenal. Don’t forget to experiment with the ‘align’ option, a handy tool that helps you position these bars within the cells just right, giving your visuals a polished look. Feel free to play around with the settings and find what clicks best with your data tales.
day | smoker | time | total_bill | tip | size | |
---|---|---|---|---|---|---|
3 | Fri | Yes | Lunch | 12.8 | 2.1 | 2.0 |
9 | Thur | No | Lunch | 15.2 | 2.1 | 2.0 |
1 | Fri | No | Lunch | 16.0 | 3.0 | 3.0 |
2 | Fri | Yes | Dinner | 16.3 | 3.0 | 2.0 |
10 | Thur | Yes | Lunch | 16.5 | 2.6 | 2.0 |
4 | Sat | No | Dinner | 17.8 | 2.8 | 2.0 |
6 | Sun | No | Dinner | 18.4 | 3.0 | 3.0 |
8 | Thur | No | Dinner | 18.8 | 3.0 | 2.0 |
5 | Sat | Yes | Dinner | 20.4 | 2.7 | 2.0 |
0 | Fri | No | Dinner | 22.5 | 3.2 | 2.0 |
7 | Sun | Yes | Dinner | 23.1 | 3.5 | 2.0 |
day | smoker | time | total_bill | tip | size | |
---|---|---|---|---|---|---|
3 | Fri | Yes | Lunch | 12.8 | 2.1 | 2.0 |
9 | Thur | No | Lunch | 15.2 | 2.1 | 2.0 |
1 | Fri | No | Lunch | 16.0 | 3.0 | 3.0 |
2 | Fri | Yes | Dinner | 16.3 | 3.0 | 2.0 |
10 | Thur | Yes | Lunch | 16.5 | 2.6 | 2.0 |
4 | Sat | No | Dinner | 17.8 | 2.8 | 2.0 |
6 | Sun | No | Dinner | 18.4 | 3.0 | 3.0 |
8 | Thur | No | Dinner | 18.8 | 3.0 | 2.0 |
5 | Sat | Yes | Dinner | 20.4 | 2.7 | 2.0 |
0 | Fri | No | Dinner | 22.5 | 3.2 | 2.0 |
7 | Sun | Yes | Dinner | 23.1 | 3.5 | 2.0 |
Below is an illustrative example of the myriad possibilities when it comes to fine-tuning the style of a DataFrame. In this code, we’ve undertaken various stylistic enhancements:
This demonstration underscores the virtually endless possibilities for customizing the appearance of your DataFrame. However, it’s important to note that the physical attributes set by the set_table_styles
method won’t be exported to Excel, should you choose to do so. Just a handy tidbit to keep in mind.
# Start by resetting the index and renaming columns with underscores
# Replace underscores with spaces for better readability
(
grouped.reset_index().rename(columns=lambda x: x.replace("_", " "))
# Sort the DataFrame by the 'total bill' column
.sort_values(by="total bill")
# Apply Pandas Styler to format the table
.style
# Apply background color gradient to columns along the vertical axis (axis=0)
.background_gradient(cmap="viridis", axis=0)
# Set a caption for the table
.set_caption("Exploring Dining Trends: Bill Amounts, Tips, and Party Sizes")
# Customize the table's visual styling
.set_table_styles(
[
{
"selector": "th.col_heading",
"props": "text-align: center; font-size: 1.5em;",
},
{"selector": "td", "props": "text-align: center;"},
{
"selector": "td:hover",
"props": "font-style: italic; color: black; font-weight:bold; background-color : #ffffb3;",
},
],
overwrite=False,
)
# Apply custom formatting to the index labels (convert to uppercase)
.format_index(str.upper, axis=1)
# Hide the 'smoker' column from the table
.hide(subset=["smoker"], axis=1)
# Hide the index label (row numbers)
.hide(axis="index")
# Format specific columns with dollar signs and one decimal place
.format(
{
"total bill": "$ {:.1f}",
"tip": "$ {:.1f}",
"size": "{:.0f}",
}
)
)
DAY | TIME | TOTAL BILL | TIP | SIZE |
---|---|---|---|---|
Fri | Lunch | $ 12.8 | $ 2.1 | 2 |
Thur | Lunch | $ 15.2 | $ 2.1 | 2 |
Fri | Lunch | $ 16.0 | $ 3.0 | 3 |
Fri | Dinner | $ 16.3 | $ 3.0 | 2 |
Thur | Lunch | $ 16.5 | $ 2.6 | 2 |
Sat | Dinner | $ 17.8 | $ 2.8 | 2 |
Sun | Dinner | $ 18.4 | $ 3.0 | 3 |
Thur | Dinner | $ 18.8 | $ 3.0 | 2 |
Sat | Dinner | $ 20.4 | $ 2.7 | 2 |
Fri | Dinner | $ 22.5 | $ 3.2 | 2 |
Sun | Dinner | $ 23.1 | $ 3.5 | 2 |
If the predefined styling methods don’t meet your needs, you have the flexibility to create your custom styling function and use it with your DataFrame. You can apply styling element by element using the .applymap()
method, or you can work on columns or rows using the .apply()
method. If you need to work specifically on column header rows or indexes, both .apply_index()
and .applymap_index()
are at your disposal.
# Define a custom styling function that highlights 'Sun' in yellow
def highlight_sunday(value, color):
return f"color: {color};" if value == "Sun" else None
# Sample 10 random rows from the DataFrame
(
df.sample(10).style
# Set formatting options for numeric values
.format(precision=1, thousands=",", decimal=".")
# Apply the highlight_sunday function to the 'day' column and set the color to yellow
.applymap(highlight_sunday, color="yellow", subset=["day"])
# Apply opacity styling to cells in the 'size' column where the value is less than or equal to 3
.applymap(lambda value: "opacity: 50%;" if (value <= 3) else None, subset=["size"])
# Apply red text color styling to cells in the 'tip' column where the value is less than or equal to 3
.applymap(lambda value: "color:red;" if (value <= 3) else None, subset=["tip"])
# Use apply_index to style index labels. If the label is 'tip', 'day', or 'size', set the color to white, otherwise black.
.apply_index(
lambda s: np.where(
s.isin(["tip", "day", "size"]), "opacity: 100%;", "opacity: 10%;"
),
axis=1,
)
)
total_bill | tip | sex | smoker | day | time | size | |
---|---|---|---|---|---|---|---|
179 | 34.6 | 3.5 | Male | Yes | Sun | Dinner | 2 |
3 | 23.7 | 3.3 | Male | No | Sun | Dinner | 2 |
83 | 32.7 | 5.0 | Male | Yes | Thur | Lunch | 2 |
208 | 24.3 | 2.0 | Male | Yes | Sat | Dinner | 2 |
53 | 9.9 | 1.6 | Male | No | Sun | Dinner | 2 |
17 | 16.3 | 3.7 | Male | No | Sun | Dinner | 3 |
6 | 8.8 | 2.0 | Male | No | Sun | Dinner | 2 |
182 | 45.4 | 3.5 | Male | Yes | Sun | Dinner | 3 |
48 | 28.6 | 2.0 | Male | No | Sun | Dinner | 3 |
61 | 13.8 | 2.0 | Male | Yes | Sat | Dinner | 2 |
set_sticky
What if you find yourself facing a DataFrame with more columns than can comfortably fit on your screen, yet you still wish to inspect each column individually? In the past, you might have resorted to using pd.set_option('display.max_columns', xyz)
to expand the display. However, there’s a much more elegant solution: set_sticky
.
set_sticky
introduces a clever CSS trick that permanently pins the index or column headers within a scrolling frame. In our case, although the ‘tips’ DataFrame doesn’t have an excessive number of columns, we’ve concatenated 10 DataFrames together to showcase the remarkable utility of set_sticky
. As you scroll horizontally, you’ll notice that you can now conveniently inspect all the columns while the index remains firmly in place, thanks to the magic of set_sticky
. Let’s explore this feature below.
total_bill | tip | sex | smoker | day | time | size | total_bill | tip | sex | smoker | day | time | size | total_bill | tip | sex | smoker | day | time | size | total_bill | tip | sex | smoker | day | time | size | total_bill | tip | sex | smoker | day | time | size | total_bill | tip | sex | smoker | day | time | size | total_bill | tip | sex | smoker | day | time | size | total_bill | tip | sex | smoker | day | time | size | total_bill | tip | sex | smoker | day | time | size | total_bill | tip | sex | smoker | day | time | size | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 16.990000 | 1.010000 | Female | No | Sun | Dinner | 2 | 16.990000 | 1.010000 | Female | No | Sun | Dinner | 2 | 16.990000 | 1.010000 | Female | No | Sun | Dinner | 2 | 16.990000 | 1.010000 | Female | No | Sun | Dinner | 2 | 16.990000 | 1.010000 | Female | No | Sun | Dinner | 2 | 16.990000 | 1.010000 | Female | No | Sun | Dinner | 2 | 16.990000 | 1.010000 | Female | No | Sun | Dinner | 2 | 16.990000 | 1.010000 | Female | No | Sun | Dinner | 2 | 16.990000 | 1.010000 | Female | No | Sun | Dinner | 2 | 16.990000 | 1.010000 | Female | No | Sun | Dinner | 2 |
1 | 10.340000 | 1.660000 | Male | No | Sun | Dinner | 3 | 10.340000 | 1.660000 | Male | No | Sun | Dinner | 3 | 10.340000 | 1.660000 | Male | No | Sun | Dinner | 3 | 10.340000 | 1.660000 | Male | No | Sun | Dinner | 3 | 10.340000 | 1.660000 | Male | No | Sun | Dinner | 3 | 10.340000 | 1.660000 | Male | No | Sun | Dinner | 3 | 10.340000 | 1.660000 | Male | No | Sun | Dinner | 3 | 10.340000 | 1.660000 | Male | No | Sun | Dinner | 3 | 10.340000 | 1.660000 | Male | No | Sun | Dinner | 3 | 10.340000 | 1.660000 | Male | No | Sun | Dinner | 3 |
2 | 21.010000 | 3.500000 | Male | No | Sun | Dinner | 3 | 21.010000 | 3.500000 | Male | No | Sun | Dinner | 3 | 21.010000 | 3.500000 | Male | No | Sun | Dinner | 3 | 21.010000 | 3.500000 | Male | No | Sun | Dinner | 3 | 21.010000 | 3.500000 | Male | No | Sun | Dinner | 3 | 21.010000 | 3.500000 | Male | No | Sun | Dinner | 3 | 21.010000 | 3.500000 | Male | No | Sun | Dinner | 3 | 21.010000 | 3.500000 | Male | No | Sun | Dinner | 3 | 21.010000 | 3.500000 | Male | No | Sun | Dinner | 3 | 21.010000 | 3.500000 | Male | No | Sun | Dinner | 3 |
3 | 23.680000 | 3.310000 | Male | No | Sun | Dinner | 2 | 23.680000 | 3.310000 | Male | No | Sun | Dinner | 2 | 23.680000 | 3.310000 | Male | No | Sun | Dinner | 2 | 23.680000 | 3.310000 | Male | No | Sun | Dinner | 2 | 23.680000 | 3.310000 | Male | No | Sun | Dinner | 2 | 23.680000 | 3.310000 | Male | No | Sun | Dinner | 2 | 23.680000 | 3.310000 | Male | No | Sun | Dinner | 2 | 23.680000 | 3.310000 | Male | No | Sun | Dinner | 2 | 23.680000 | 3.310000 | Male | No | Sun | Dinner | 2 | 23.680000 | 3.310000 | Male | No | Sun | Dinner | 2 |
4 | 24.590000 | 3.610000 | Female | No | Sun | Dinner | 4 | 24.590000 | 3.610000 | Female | No | Sun | Dinner | 4 | 24.590000 | 3.610000 | Female | No | Sun | Dinner | 4 | 24.590000 | 3.610000 | Female | No | Sun | Dinner | 4 | 24.590000 | 3.610000 | Female | No | Sun | Dinner | 4 | 24.590000 | 3.610000 | Female | No | Sun | Dinner | 4 | 24.590000 | 3.610000 | Female | No | Sun | Dinner | 4 | 24.590000 | 3.610000 | Female | No | Sun | Dinner | 4 | 24.590000 | 3.610000 | Female | No | Sun | Dinner | 4 | 24.590000 | 3.610000 | Female | No | Sun | Dinner | 4 |
You can combine two or more Stylers if they have the same columns. This is particularly handy when presenting summary statistics for a DataFrame
total_bill | tip | size | |
---|---|---|---|
day | |||
Fri | 17.2 | 2.7 | 2.1 |
Sat | 20.4 | 3.0 | 2.5 |
Sun | 21.4 | 3.3 | 2.8 |
Thur | 17.7 | 2.8 | 2.5 |
Average | 19.8 | 3.0 | 2.6 |
Median | 17.8 | 2.9 | 2.0 |
Sum | 4827.8 | 731.6 | 627.0 |
To save your styled DataFrame in Excel format, you can export it to an .xlsx file, preserving the styling. To do this, ensure you have the ‘openpyxl’ package installed.
And there you have it, folks! This concludes our brief exploration of the formatting options available right within Pandas. I hope you found this article insightful and enjoyable. Until next time, happy coding!
---
title: Mastering Pandas DataFrame Styling for a Stunning Presentation
author: Adam Cseresznye
date: '2023-08-27'
categories:
- Pandas
jupyter: python3
toc: true
format:
html:
code-fold: true
code-tools: true
---
![Image by Mika Baumeister](https://images.unsplash.com/photo-1529078155058-5d716f45d604?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=2069&q=80){fig-align="center" width=50%}
As you wrap up your data analysis journey, you face a fun decision: how to share your discoveries effectively. Tables and graphs both have their moments to shine. Tables work great when you want your audience to spot exact values and compare them. But, here's the twist: tables can look overwhelming at first, especially if they're crammed with data.
But don't worry! Styling and formatting are here to help. Pandas comes to the rescue. It lets you turn your data into stylish tables effortlessly.
In this article, we'll explore some tricks to make your Pandas DataFrames look awesome and tell a clear data story. For the demonstrations, we'll dive into Seaborn's built-in "tips" dataset. This dataset is a nifty data frame with 244 rows and 7 variables, offering a peek into the world of tipping. This comprehensive collection includes variables like the tip amount in dollars, the bill total in dollars, the gender of the bill payer, the presence of smokers in the party, the day of the week, the time of day, and the party size. Ready to roll? Let's jump right in!
# Setup
```{python}
#| tags: []
import pandas as pd
import numpy as np
```
```{python}
#| label: tbl-table1
#| tbl-cap: Original table
#| tags: []
df = pd.read_csv(
"https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv"
)
df.sample(5)
```
# Built-in styling
In this article, we won't cover every single feature of the [Styler class](https://pandas.pydata.org/docs/user_guide/style.html#Table-Visualization) because that might make it too long. Instead, we'll concentrate on the practical aspects that can simplify your data analysis, improve data comprehension, and make your message clearer to your audience. So, instead of taking a deep dive, we'll start with the low-hanging fruits and explore the built-in options that Pandas offers.
Before we dive in, let's prepare an aggregated dataframe for our demonstrations. Here it is:
```{python}
#| label: tbl-table2
#| tbl-cap: How the Data Appears After Grouping and Calculating Medians (Before Pandas Styling)
#| tags: []
grouped = df.groupby(["day", "smoker", "time"]).median(numeric_only=True)
grouped
```
## Highlighting minimum and maximum values
Check out these handy helpers: `highlight_min` and `highlight_max`. They're your go-to for spotting the lowest and highest values in each column. It's a speedy way to emphasize the most important data points in each category.
Also, please note the `format(precision=1, thousands=",", decimal=".")` snippet, this is not exactly a built-in style feature but has everything to do with keeping those float numbers tidy. Pandas tends to display more decimal places than we often require, which can be a bit distracting. To tone it down a notch, we can lean on the `format()` and `format_index()` methods to fine-tune the precision. Trust me, it's super useful!
```{python}
#| label: tbl-table3
#| tbl-cap: Highlighting Maximum and Minimum Values with Pandas Styler
#| tags: []
(
grouped.style.format(precision=1, thousands=",", decimal=".")
.highlight_max(
axis=0, props="color:white; font-weight:bold; background-color:green;"
)
.highlight_min(axis=0, props="color:white; font-weight:bold; background-color:red;")
)
```
## Emphasizing Values Within a Range
Imagine you're want to find the days when tips fell between 3 and 5 dollars. In this scenario, the `highlight_between` method comes to the rescue. Don't forget to use the `subset` argument; it's your trusty sidekick when you only want to work with selected columns.
```{python}
#| label: tbl-table4
#| tbl-cap: Highlighting Data within a Specified Range Using Pandas Styler
#| tags: []
(
grouped.reset_index()
.style.format(precision=1, thousands=",", decimal=".")
.highlight_between(
left=3,
right=5,
subset=["tip"],
axis=1,
props="color:white; font-weight:bold; background-color:purple;",
)
)
```
## Highlight Column-Wise Outliers
Utilizing parameters such as `q_left=0.05`, `q_right=0.95`, `axis=0`, and defining `props='opacity: 10%;'`, we can highlight values residing outside the 5-95 percentile range.
```{python}
#| label: tbl-table5
#| tbl-cap: Highlighting Outliers
#| tags: []
(
grouped.style.format(precision=1, thousands=",", decimal=".").highlight_quantile(
q_left=0.05, q_right=0.95, axis=0, props="opacity: 10%;"
)
)
```
## Spot trends Using Color Gradients
We can show data trends using both the `background_gradient` and `text_gradient` methods. These methods introduce gradient-style background colors and text shading to our visualizations. To optimize their impact, it's advisable to first arrange your data with the `sort_values` method before applying the `background_gradient`.
```{python}
#| label: tbl-table6
#| tbl-cap: Unveiling Data Trends with Pandas Styler's `background_gradient`
#| tags: []
(
grouped.reset_index()
.sort_values(by="total_bill")
.style.format(precision=1, thousands=",", decimal=".")
.background_gradient(cmap="viridis", axis=0)
)
```
```{python}
#| label: tbl-table7
#| tbl-cap: Unveiling Data Trends with Pandas Styler's `text_gradient`
#| tags: []
(
grouped.reset_index()
.sort_values(by="total_bill")
.style.format(precision=1, thousands=",", decimal=".")
.text_gradient(cmap="bwr", axis=0)
)
```
## Display Bar Charts within Your Table
Let's explore a technique for highlighting the significance of values by embedding bar charts right within the cells. The blend of bar heights and color gradients can pack a powerful punch in your data storytelling arsenal. Don't forget to experiment with the 'align' option, a handy tool that helps you position these bars within the cells just right, giving your visuals a polished look. Feel free to play around with the settings and find what clicks best with your data tales.
```{python}
#| label: tbl-table8
#| tbl-cap: Bar Charts in Your Table with Alignment Set to 'Mean'
#| tags: []
(
grouped.reset_index()
.sort_values(by="total_bill")
.style.format(precision=1, thousands=",", decimal=".")
.bar(
align="mean",
cmap="bwr",
height=50,
width=60,
props="width: 120px; border-right: 1px solid black;",
)
)
```
```{python}
#| label: tbl-table9
#| tbl-cap: Bar Charts in Your Table with Alignment Set using a float number
#| tags: []
(
grouped.reset_index()
.sort_values(by="total_bill")
.style.format(precision=1, thousands=",", decimal=".")
.bar(
align=0,
cmap="bwr",
height=50,
width=60,
props="width: 120px; border-right: 1px solid black;",
)
)
```
## Formatting
Below is an illustrative example of the myriad possibilities when it comes to fine-tuning the style of a DataFrame. In this code, we've undertaken various stylistic enhancements:
1. **Background Gradient**: We've applied a nice background color gradient to the columns along the vertical axis.
2. **Caption**: We've set a descriptive caption for the DataFrame.
3. **Visual Styling**: We've specified text alignment and introduced hovering options to make highlighted values pop even more.
4. **Column Names Formatting**: We've reformatted our column names for clarity and aesthetics.
5. **Column Hiding**: We've concealed the 'smoker' column, decluttering our view.
6. **Index Hiding**: We've also hidden the index labels for a cleaner look.
7. **Numerical Formatting**: We've individually specified the numerical representation, including adding a dollar sign for currency values.
This demonstration underscores the virtually endless possibilities for customizing the appearance of your DataFrame. However, it's important to note that the physical attributes set by the `set_table_styles` method won't be exported to Excel, should you choose to do so. Just a handy tidbit to keep in mind.
```{python}
#| label: tbl-table10
#| tbl-cap: Formatting tables
#| tags: []
# Start by resetting the index and renaming columns with underscores
# Replace underscores with spaces for better readability
(
grouped.reset_index().rename(columns=lambda x: x.replace("_", " "))
# Sort the DataFrame by the 'total bill' column
.sort_values(by="total bill")
# Apply Pandas Styler to format the table
.style
# Apply background color gradient to columns along the vertical axis (axis=0)
.background_gradient(cmap="viridis", axis=0)
# Set a caption for the table
.set_caption("Exploring Dining Trends: Bill Amounts, Tips, and Party Sizes")
# Customize the table's visual styling
.set_table_styles(
[
{
"selector": "th.col_heading",
"props": "text-align: center; font-size: 1.5em;",
},
{"selector": "td", "props": "text-align: center;"},
{
"selector": "td:hover",
"props": "font-style: italic; color: black; font-weight:bold; background-color : #ffffb3;",
},
],
overwrite=False,
)
# Apply custom formatting to the index labels (convert to uppercase)
.format_index(str.upper, axis=1)
# Hide the 'smoker' column from the table
.hide(subset=["smoker"], axis=1)
# Hide the index label (row numbers)
.hide(axis="index")
# Format specific columns with dollar signs and one decimal place
.format(
{
"total bill": "$ {:.1f}",
"tip": "$ {:.1f}",
"size": "{:.0f}",
}
)
)
```
# Defining Custom Styling
If the predefined styling methods don't meet your needs, you have the flexibility to create your custom styling function and use it with your DataFrame. You can apply styling element by element using the `.applymap()` method, or you can work on columns or rows using the `.apply()` method. If you need to work specifically on column header rows or indexes, both `.apply_index()` and `.applymap_index()` are at your disposal.
```{python}
#| label: tbl-table11
#| tbl-cap: Applying Custom Styling
#| tags: []
# Define a custom styling function that highlights 'Sun' in yellow
def highlight_sunday(value, color):
return f"color: {color};" if value == "Sun" else None
# Sample 10 random rows from the DataFrame
(
df.sample(10).style
# Set formatting options for numeric values
.format(precision=1, thousands=",", decimal=".")
# Apply the highlight_sunday function to the 'day' column and set the color to yellow
.applymap(highlight_sunday, color="yellow", subset=["day"])
# Apply opacity styling to cells in the 'size' column where the value is less than or equal to 3
.applymap(lambda value: "opacity: 50%;" if (value <= 3) else None, subset=["size"])
# Apply red text color styling to cells in the 'tip' column where the value is less than or equal to 3
.applymap(lambda value: "color:red;" if (value <= 3) else None, subset=["tip"])
# Use apply_index to style index labels. If the label is 'tip', 'day', or 'size', set the color to white, otherwise black.
.apply_index(
lambda s: np.where(
s.isin(["tip", "day", "size"]), "opacity: 100%;", "opacity: 10%;"
),
axis=1,
)
)
```
# Other Useful Features
## Get a Clearer Overview with `set_sticky`
What if you find yourself facing a DataFrame with more columns than can comfortably fit on your screen, yet you still wish to inspect each column individually? In the past, you might have resorted to using `pd.set_option('display.max_columns', xyz)` to expand the display. However, there's a much more elegant solution: `set_sticky`.
`set_sticky` introduces a clever CSS trick that permanently pins the index or column headers within a scrolling frame. In our case, although the 'tips' DataFrame doesn't have an excessive number of columns, we've concatenated 10 DataFrames together to showcase the remarkable utility of `set_sticky`. As you scroll horizontally, you'll notice that you can now conveniently inspect all the columns while the index remains firmly in place, thanks to the magic of `set_sticky`. Let's explore this feature below.
```{python}
#| label: tbl-table12
#| tbl-cap: Demonstrating `set_sticky`
#| tags: []
(pd.concat([df for i in range(10)], axis=1).head().style.set_sticky(axis="index"))
```
## Concatenating DataFrame Outputs
You can combine two or more Stylers if they have the same columns. This is particularly handy when presenting summary statistics for a DataFrame
```{python}
#| label: tbl-table13
#| tbl-cap: Demonstrating Easily Concatenating Different DataFrame Outputs
#| tags: []
(
df.groupby("day")[["total_bill", "tip", "size"]]
.mean()
.style.format(precision=1)
.concat(
df[["total_bill", "tip", "size"]]
.agg(["mean", "median", "sum"])
.style.format(precision=1)
.relabel_index(["Average", "Median", "Sum"])
)
)
```
# Exporting
To save your styled DataFrame in Excel format, you can export it to an .xlsx file, preserving the styling. To do this, ensure you have the 'openpyxl' package installed.
And there you have it, folks! This concludes our brief exploration of the formatting options available right within Pandas. I hope you found this article insightful and enjoyable. Until next time, happy coding!