How to use the Lets-Plot library by JetBrains

Categories: Lets-Plot, ggplot2
Author: Adam Cseresznye
Published: August 5, 2023

When I embarked on my data science journey, my academic background led me to gravitate towards the R programming language. Like many R novices, I began with Hadley Wickham’s R for Data Science book, which introduced me to the wonderful ggplot2 library. As my interest in machine learning grew, I made the switch to Python. Nowadays, for most of my data plotting needs, I rely mainly on matplotlib or seaborn. Though I love these libraries, they often offer several ways of accomplishing the same task, which can make them cumbersome and challenging to learn at first.

That’s why in this article, I’m excited to introduce you to the Lets-Plot library by JetBrains. In my experience, it is about as close to ggplot2’s syntax as you can get in Python. While some traditional Python users might find the syntax a bit unfamiliar at first, I’m here to make a case for this fantastic library and hopefully inspire you to incorporate it into your day-to-day data science work.

To showcase (some of) the key features of Lets-Plot, we will use the penguins dataset 🐧, loaded from a GitHub Gist.

Without further ado, let’s dive right in and discover the power and beauty of Lets-Plot! 🐍📊

# Installation
# pip install lets-plot 
# Import libraries
import numpy as np
import pandas as pd
from lets_plot import *
from lets_plot.mapping import as_discrete
LetsPlot.setup_html()
# Load the penguins dataset from a GitHub Gist and preview the first rows
address = 'https://gist.githubusercontent.com/slopp/ce3b90b9168f2f921784de84fa445651/raw/4ecf3041f0ed4913e7c230758733948bc561f434/penguins.csv'
df = pd.read_csv(address)
df.head()
   rowid  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
0      1   Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
1      2   Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
2      3   Adelie  Torgersen            40.3           18.0              195.0       3250.0  female  2007
3      4   Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN  2007
4      5   Adelie  Torgersen            36.7           19.3              193.0       3450.0  female  2007

Syntax Similarities: Lets-Plot and ggplot2

For our first exercise, I thought it would be useful to replicate a basic plot from Hadley’s book. If you compare my code with his, you’ll notice very little difference between the two: the syntax is nearly identical, which makes the transition from ggplot2 to Lets-Plot a smooth one.

Now, let’s take a closer look at the code. In the ggplot() function, we pass the DataFrame we’ll be working with and set the aesthetic mappings at the global level, so every layer we add inherits them. We map flipper length to the x-axis and body mass to the y-axis. We also map color to species, which groups the data by the three penguin species: Adelie, Gentoo, and Chinstrap. This color mapping works much like seaborn’s hue, so seaborn users should feel right at home.

After ggplot() sets the global aesthetic mappings, geom_point() draws one point per row at the given x and y coordinates, creating a scatter plot of the penguins dataset.

We then add geom_smooth(method='lm'), which draws a smoothed conditional mean. Here 'lm' stands for "linear model", meaning the smoothing is a linear regression fit. The fitted line reveals the overall trend in the data, making the relationship between the two variables easier to see.
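To make the "linear model" part concrete, here is a minimal sketch of the ordinary least-squares fit that method='lm' refers to. It uses pandas and NumPy rather than Lets-Plot. The helper name fit_lm is hypothetical. With group='species', it fits one line per species, as the global color mapping does. With group=None, it fits a single pooled line. I am assuming Lets-Plot's lm smoother uses ordinary least squares, so treat the numbers as a cross-check rather than as the library's internals.

```python
from typing import Dict, Optional, Tuple

import numpy as np
import pandas as pd


def fit_lm(
    data: pd.DataFrame, x: str, y: str, group: Optional[str] = None
) -> Dict[object, Tuple[float, float]]:
    """Hypothetical helper: ordinary least-squares line y = slope * x + intercept.

    With `group`, one line is fitted per category (like a global color mapping);
    with group=None a single pooled line is fitted and stored under the key None.
    Rows missing x or y are dropped before fitting.
    """
    clean = data.dropna(subset=[x, y])
    parts = [(None, clean)] if group is None else clean.groupby(group)
    fits = {}
    for key, part in parts:
        # polyfit with deg=1 returns the coefficients as [slope, intercept]
        slope, intercept = np.polyfit(part[x], part[y], deg=1)
        fits[key] = (float(slope), float(intercept))
    return fits
```

Using the df loaded earlier, fit_lm(df, 'flipper_length_mm', 'body_mass_g', group='species') returns one (slope, intercept) pair per species. Leaving out group returns the single line for the whole dataset.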

Let’s continue exploring more of what Lets-Plot has in store for us! 📊🐧🌈

(ggplot(df,
        aes(x='flipper_length_mm',
            y = 'body_mass_g',
            color='species'
           )
       )
 + geom_point() # Draw points defined by an x and y coordinate, as for a scatter plot.
 + geom_smooth(method='lm') # Add a smoothed conditional mean; 'lm' (linear model) is the smoothing method
) 

In the previous example, the color mapping was set at the global level, so the data was grouped by the three penguin species and a separate regression line was drawn for each group. If we would rather show a single regression line for the entire dataset, regardless of species, we can do so just as easily. We only need to move the color mapping out of ggplot() and into geom_point(). That way, only the points are colored by species, while geom_smooth() fits one line to all the data. Here we also map shape to species, so each species gets its own marker.

To polish the plot further, we can label the x and y axes and add a title and subtitle with labs(). With these small adjustments, we reproduce Hadley’s original plot with very little modification.

(ggplot(df, 
        aes(x='flipper_length_mm',
            y = 'body_mass_g',
           )
       )
 + geom_point(aes(color='species', shape='species'))
 + geom_smooth(method='lm')
 + labs(
    title = "Body mass and flipper length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Flipper length (mm)", y = "Body mass (g)",
  ) 
 + scale_color_viridis() # colorblind-friendly palette, standing in for scale_color_colorblind() used in the book
) 

Visualizing Data Based on Categorical Variables

Lets-Plot provides numerous options for visualizing data by categorical variables, ranging from bar plots, box plots, and violin plots to pie charts. You can check out the Lets-Plot API reference for the full list. The examples below show four of them: the number of penguins per species, body mass by species as box plots, body mass density curves per species, and the relative frequency of each species on each island.

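# Number of penguins per species
penguin_bar = (ggplot(df, aes(x='species'))
               + geom_bar()
              )

# Distribution of body mass within each species
penguin_box = (ggplot(df, aes(x='species', y='body_mass_g'))
               + geom_boxplot()
              )

# Body mass density curves, one semi-transparent curve per species
penguin_density = (ggplot(df, aes(x='body_mass_g', color='species', fill='species'))
                   + geom_density(alpha=0.5)
                  )

# Share of each species on each island: position='fill' stacks every bar to a height of 1
penguin_rel_frequency = (ggplot(df, aes(x='island', fill='species'))
                         + geom_bar(position='fill')
                        )

# Arrange the four plots in a grid with two columns
gggrid([penguin_bar, penguin_box, penguin_density, penguin_rel_frequency], ncol=2)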
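The fourth panel can be cross-checked numerically. position='fill' rescales each stacked bar to a height of 1, so each colored segment shows a species' share of the penguins on that island. A plain pandas sketch of those proportions might look like the following. The helper name species_share_by_island is hypothetical, and it is not how Lets-Plot computes the stack internally.

```python
import pandas as pd


def species_share_by_island(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: proportion of each species within each island.

    Each row sums to 1, matching the full-height bars drawn with position='fill'.
    """
    # normalize="index" divides each island's counts by that island's total
    return pd.crosstab(data["island"], data["species"], normalize="index")
```

Calling species_share_by_island(df) on the penguins data gives one row per island, and each row's entries sum to 1.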

Incorporate Multiple Variables with facet_wrap

So far we’ve seen how easy it is to plot data by a single categorical variable. But what if we want to show relationships involving two or more categorical variables? That’s where facet_wrap() comes in handy: it splits the data into one panel per level of a categorical variable. It works just like facet_wrap() in ggplot2 and is similar in spirit to seaborn’s FacetGrid.

To use facet_wrap(), we define the aesthetics as usual, either globally in ggplot() or locally in a geom. We then add facet_wrap() with the categorical variable we want to split by. In the example below, species is encoded by color and shape, while island determines the panel, so two categorical variables appear in a single figure. It’s as simple as that!

(ggplot(df, aes(x = 'flipper_length_mm', y = 'body_mass_g'))  
 + geom_point(aes(color = 'species', shape = 'species'), size = 4) 
 + facet_wrap('island', nrow=1)
 + labs(title = "Body mass and flipper length based on island",
        subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
        x = "Flipper length (mm)", y = "Body mass (g)",
       )
 + theme(plot_title=element_text(size=20,face='bold'))
 + ggsize(1000,500)
)
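Conceptually, facet_wrap('island') partitions the rows by island and draws the same layers once for each subset. The sketch below uses plain pandas and the hypothetical helper facet_panels to show that partition. It is not Lets-Plot's implementation, but it can be handy for checking how many rows each panel is built from.

```python
from typing import Dict

import pandas as pd


def facet_panels(data: pd.DataFrame, facet: str) -> Dict[object, pd.DataFrame]:
    """Hypothetical helper: split `data` into one subset per level of `facet`.

    Each subset holds exactly the rows that belong to that facet's panel.
    """
    # groupby sorts the levels, so the dict keys follow the sorted level order
    return {level: part for level, part in data.groupby(facet)}
```

For example, {k: len(v) for k, v in facet_panels(df, 'island').items()} counts the rows behind each panel. Note that this count includes rows with missing measurements, which cannot be drawn as points.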

Reordering Categorical Variables Based On Statistics

A task I frequently encounter when visualizing data is ordering categories in ascending or descending order of a statistic, such as the median or mean. In the earlier section “Visualizing Data Based on Categorical Variables,” you may have noticed that the box plot did not order the species by any statistic. Let’s sort them in ascending order of median body mass instead. This makes the plot easier to read and lets us compare the categories at a glance.

# Order the species by the median body mass computed by the boxplot stat (ascending)
(ggplot(df, aes(x=as_discrete('species', order=1, order_by='..middle..'),
                y='body_mass_g'))
 + geom_boxplot()
)

We wrap the column in as_discrete() and specify two things: the ordering direction (order=1 for ascending, -1 for descending) and order_by='..middle..', the median computed by the box plot statistic. The result is a considerably more informative plot. This small addition arranges the categories in a meaningful order, which makes the visualization clearer and the differences between categories easier to interpret.
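If you want to know this order in advance, or reuse it in a table, a short pandas sketch can compute it. The helper order_by_median is hypothetical. It computes the median of a value for each category and sorts the levels. This should match the order that as_discrete(..., order=1, order_by='..middle..') produces for the box plot, assuming both use the ordinary median.

```python
from typing import List

import pandas as pd


def order_by_median(
    data: pd.DataFrame, category: str, value: str, ascending: bool = True
) -> List[object]:
    """Hypothetical helper: category levels sorted by the median of `value`."""
    # median() skips missing values within each group
    medians = data.groupby(category)[value].median()
    return medians.sort_values(ascending=ascending).index.tolist()
```

order_by_median(df, 'species', 'body_mass_g') lists the species from lowest to highest median body mass. Pass ascending=False for the order=-1 case.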

Chaining Pandas Code with Lets-Plot Visualization

One of the best features of the pandas library is how easily it can be extended. With the DataFrame.pipe method, we can pass a DataFrame to any function of our own in the middle of a method chain. The chain can continue as long as the function returns a DataFrame or Series. The last step, however, can return anything, including a Lets-Plot figure. This lets us integrate Lets-Plot into our code as fully as pandas’ own built-in plotting functionality.

While Lets-Plot may be slightly more verbose than pandas plotting, it offers significantly more flexibility for customization, and some may find it visually more appealing. With Lets-Plot integrated into our pandas code, we can build informative plots without breaking the flow of the analysis, making data analysis an even more enjoyable experience.

(df
 .groupby('species')
 [['body_mass_g', 'flipper_length_mm']]
 .mean()           # average measurements per species
 .reset_index()    # turn 'species' back into a regular column for plotting
 .pipe(lambda d: (ggplot(d)
                  # stat='identity': slice sizes come straight from the body_mass_g values
                  + geom_pie(aes(slice='body_mass_g', fill='species'),
                             stat='identity', size=30, hole=0.2, stroke=1.0,
                             labels=layer_labels().line('@body_mass_g').format('@body_mass_g', '{.0f}').size(20)
                            )
                  + labs(title="Body mass based on species",
                         subtitle="Representing how Lets-Plot can be used with DataFrame.pipe",
                         x="", y="",
                        )
                  + theme(axis='blank',
                          plot_title=element_text(size=20, face='bold'))
                  + ggsize(500, 400)
                 )
 )
)
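Here is a minimal sketch of the pipe mechanics on their own, without any plotting. Both helper functions are hypothetical. add_share returns a DataFrame, so the chain can continue. describe_shares returns a plain dict, just as the final pipe step above returns a Lets-Plot figure rather than a DataFrame. The share column also illustrates what the pie encodes: with stat='identity', each slice should cover the fraction of the total that its value represents.

```python
import pandas as pd


def add_share(data: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: add `<column>_share`, each value's fraction of the column total.

    Returning a DataFrame keeps the method chain going.
    """
    return data.assign(**{f"{column}_share": data[column] / data[column].sum()})


def describe_shares(data: pd.DataFrame, label: str, column: str) -> dict:
    """Hypothetical terminal step: return a plain dict instead of a DataFrame.

    pipe() hands back whatever the function returns, so a chain may end in any object.
    """
    return dict(zip(data[label], data[column].round(3)))
```

On the penguins data, a chain such as df.groupby('species', as_index=False)['body_mass_g'].mean().pipe(add_share, 'body_mass_g').pipe(describe_shares, 'species', 'body_mass_g_share') would report each species' share of the summed mean body masses.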

That’s a wrap on this introduction to the Lets-Plot library! There’s much more to explore and learn about this powerful tool. I hope you found this introduction helpful and will consider integrating Lets-Plot into your daily data analysis routine.

Happy coding 🐍🖥️🔍🚀