The Ultimate Speed Test: Pandas vs Polars

Polars
Pandas
Author

Adam Cseresznye

Published

August 13, 2023

After I shared my first article, “Getting Started with Polars”, I started thinking about something exciting: comparing the speed of Polars and Pandas. This curiosity was sparked by all the buzz around the brand-new Pandas 2.0 release and its promise of lightning-fast performance. Pandas 2.0 arrives packed with cool features, including the option to use Apache Arrow (pyarrow) as its backing memory format. The big perk of Apache Arrow is that it makes operations faster and more memory-friendly. Naturally, this got me wondering: how does Pandas 2.0 measure up against Polars? Let’s dive in and find out!
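
As a quick, minimal sketch (the file name below is a placeholder, not part of this benchmark), this is how Pandas 2.0 lets you opt into the Arrow-backed dtypes mentioned above:

Code
import pandas as pd

# Minimal sketch: reading a CSV with the pyarrow parser and Arrow-backed dtypes.
# "some_file.csv" is a placeholder name used purely for illustration.
df = pd.read_csv("some_file.csv", engine="pyarrow", dtype_backend="pyarrow")

# The columns are now backed by Apache Arrow memory, reporting dtypes
# such as int64[pyarrow] or string[pyarrow].
print(df.dtypes)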

Warning

Keep in mind: I occasionally quote specific performance-difference figures. Even though each experiment was repeated 100 times, computation times carry some inherent randomness, so the exact numbers may vary slightly between runs.

Setup

Code
import pandas as pd
import polars as pl
import polars.selectors as cs

import numpy as np
import timeit
import random

import plotly.io as pio
import plotly.express as px

pio.templates.default = "presentation"

print(pd.__version__)
print(pl.__version__)
2.0.3
0.18.11
Note

Throughout this article, we’ll be working with the latest versions of the packages: Pandas 2.0.3 and Polars 0.18.11.

Generate the synthetic data

For the purpose of benchmarking, we’re going to create our own dataset. To ensure simplicity and uniformity, we’ll generate a dataframe containing columns like name, year of birth, and city of residence – pretty straightforward stuff. We’re aiming for a dataset of 10,000 rows. While we could certainly run this benchmark on a dataframe with more than a million rows, our intention is to provide a realistic example. The code we’ll provide allows you to effortlessly explore various scenarios according to your requirements. Feel free to use it or modify it if you want to perform your own test.

Code
def create_sample_dataframe(n_rows: int, name: str, seed: int = 42) -> pl.DataFrame:
    """
    Create a sample Polars DataFrame with the specified number of rows.

    This function generates random data for several columns and creates a Polars
    DataFrame with the specified number of rows. The generated data includes information
    about names, birth years, cities, zip codes, incomes, marital status, number of children,
    and car brands. The resulting DataFrame is written to CSV and Parquet files named
    'sample.csv' and '{name}.parquet', respectively.

    Parameters
    ----------
    n_rows : int
        The number of rows to generate in the sample DataFrame.
    name : str
        The name to be used when writing the Parquet file.
    seed : int, optional
        The seed to use when generating random data. Default is 42.

    Returns
    -------
    pl.DataFrame
        The first 5 rows of the generated sample DataFrame.
    """
    random.seed(seed)
    np.random.seed(seed)

    data = {
        "name": np.random.choice(["Alice", "Bob", "Charlie", "David", "Eva"], n_rows),
        "born": np.random.randint(1950, 1990, size=n_rows),
        "city": np.random.choice(
            ["Shanghai", "San Francisco", "London", "Munich", "Mumbai"], n_rows
        ),
        "zip_code": np.random.randint(10000, 99999, size=n_rows),
        "income": np.random.normal(50000, 10000, size=n_rows),
        "is_married": np.random.choice([True, False], n_rows),
        "children": np.random.randint(0, 5, size=n_rows),
        "car": np.random.choice(["Ford", "BMW", "Toyota", "Bentley", "Mini"], n_rows),
    }
    temp_df = pl.DataFrame(data)
    temp_df.write_csv("sample.csv")
    temp_df.write_parquet(f"{name}.parquet")
    return temp_df.head()


create_sample_dataframe(n_rows=10_000, name="sample", seed=1)
shape: (5, 8)
name     born  city             zip_code  income        is_married  children  car
str      i32   str              i32       f64           bool        i32       str
"David"  1953  "Shanghai"       35716     69693.044244  false       4         "Bentley"
"Eva"    1984  "London"         30931     30547.027839  true        1         "Toyota"
"Alice"  1979  "San Francisco"  85967     63489.882712  false       1         "Mini"
"Bob"    1969  "San Francisco"  64527     56030.692112  false       1         "Toyota"
"David"  1979  "Munich"         53316     51118.005552  true        3         "Ford"

Define helper class

In order to streamline our code, we’ll create a convenient helper class featuring two essential methods: get_time and get_figure. The get_time method leverages the timeit.repeat function to accurately gauge the execution time of our code. In this context, we’ll set the parameters number = 1 and repeat = 100. This configuration runs the operations 100 times, allowing us to derive key statistics like the mean, median, and standard deviation.
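
To make the mechanics concrete, here is a minimal sketch of what timeit.repeat returns (the toy statement is purely illustrative):

Code
import statistics
import timeit

# Each element of `times` is the total runtime (in seconds) of one repeat,
# and each repeat executes the callable `number` times.
times = timeit.repeat(lambda: sum(range(1_000)), number=1, repeat=5)

print(times)                     # five floats, one per repeat
print(statistics.median(times))  # the kind of summary statistic we report later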

The get_figure method, in turn, takes the gathered results and turns them into a box plot using Plotly.

To bring it all together and keep things simple, we’ve designed the run_test function. This function orchestrates the entire process, seamlessly merging the steps outlined above.

Code
class Timeit:
    """
    A class for measuring the execution time of multiple functions.
    This class provides methods for measuring the execution time of
    multiple functions using the `timeit` module. The functions to be
    tested are passed to the constructor as a dictionary, where the
    keys are the names of the functions and the values are the functions
    themselves. The `number` and `repeat` parameters control how many times
    each function is executed and how many times the timing is repeated,
    respectively.

    The `get_time` method measures the execution time of each function and
    stores the results in a dictionary, where the keys are the names of
    the functions and the values are lists of execution times. The `get_figure`
    method generates a box plot of the execution times using the `plotly.express.box`
    function.

    Parameters
    ----------
    func_dict : dict
        A dictionary of functions to test, where the keys are the names of the
        functions and the values are the functions themselves.
    number : int, optional
        The number of times to execute each function in each timing run (default is 1).
    repeat : int, optional
        The number of times to repeat each timing run (default is 100).

    Attributes
    ----------
    results : dict or None
        A dictionary of execution times for each function, where the keys are the names
        of the functions and the values are lists of execution times. This attribute is
        `None` until `get_time` is called.

    Methods
    -------
    get_time()
        Measure the execution time of each function and store the results in the `results` attribute.
    get_figure()
        Generate a box plot of the execution times using `plotly.express.box`.
    """

    def __init__(self, func_dict, number=1, repeat=100):
        self.func_dict = func_dict
        self.number = number
        self.repeat = repeat
        self.results = None

    def get_time(self):
        """
        Measure the execution time of each function and store the results.
        This method uses the `timeit.repeat` function to measure the execution time of each function
        in `func_dict`. The results are stored in a dictionary, where the keys are the names of
        the functions and the values are lists of execution times. This dictionary is also stored
        in the `results` attribute of the `Timeit` object.

        Returns
        -------
        dict
            A dictionary of execution times for each function.
        """
        results = {}
        for k, v in self.func_dict.items():
            results[k] = timeit.repeat(v, number=self.number, repeat=self.repeat)
        self.results = results
        return results

    def get_figure(self):
        """
        Generate a box plot of the execution times.
        This method uses `plotly.express.box` to generate a box plot of the execution times
        stored in `results`. If `results` is `None`, this method raises a `ValueError`.

        Returns
        -------
        plotly.graph_objs.Figure
            A box plot figure object.

        Raises
        ------
        ValueError
            If no results are available (i.e., if `get_time` has not been called).
        """
        if self.results is None:
            raise ValueError("No results available. Please run get_time first.")
        df = pd.DataFrame(self.results)
        median_times = df.median() * 1000
        title = f'Median Execution Times: <br><sup>{", ".join([f"{k}={v:.6f}ms" for k,v in median_times.items()])}</sup>'
        return px.box(
            df,
            points="all",
            labels={"value": "time (sec)", "variable": ""},
            title=title,
            width=600,
            height=500,
        )


def run_test(test: dict, name: str):
    """
    Run a timing test on a dictionary of functions using the Timeit class and
    return the results as a DataFrame.

    This function takes a dictionary of functions as input, where the keys are
    the names of the functions and the values are the functions themselves. It creates
    a Timeit object with this dictionary and uses it to measure the execution time of
    each function. The results are then displayed as a box plot using the `get_figure`
    method of the Timeit object and returned as a Pandas DataFrame with columns renamed
    according to the `name` argument.

    Parameters
    ----------
    test : dict
        A dictionary of functions to test, where the keys are the names of the functions
        and the values are the functions themselves.
    name : str
        A string used to rename the columns of the returned DataFrame.

    Returns
    -------
    pd.DataFrame
        A DataFrame containing the results of the timing test, with columns renamed
        according to the `name` argument.
    """

    # Create a Timeit object
    t = Timeit(test)
    # Measure the execution time of the functions
    results = t.get_time()
    # Generate a box plot of the execution times
    fig = t.get_figure()
    fig.show()

    return pd.DataFrame(results).rename(
        columns={"Pandas": f"{name}_pandas", "Polars": f"{name}_polars"}
    )

Reading in Data from CSV and Parquet Files

CSV

Reading CSV files from disk is a task data scientists find themselves doing all the time, so it makes a natural first benchmark. To make the most of PyArrow’s blazing-fast data handling, we’ll equip Pandas with the engine="pyarrow" and dtype_backend="pyarrow" arguments. Let’s see how these choices shape the performance!

Code
test_dict = {
    "Pandas": lambda: pd.read_csv(
        "sample.csv", engine="pyarrow", dtype_backend="pyarrow"
    ),
    "Polars": lambda: pl.read_csv("sample.csv"),
}
read_csv = run_test(test_dict, "Read csv")
Figure 1: Reading in Data from a CSV File

For the sake of comparison, we’ll also time the same calls with Jupyter’s %%timeit cell magic. You’ll notice that the numbers generated this way align closely with ours.

Code
%%timeit
pd.read_csv("sample.csv", engine="pyarrow", dtype_backend="pyarrow")
6.33 ms ± 234 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Code
%%timeit
pl.read_csv("sample.csv")
3.01 ms ± 456 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Parquet

Now, let’s read the data in Parquet format.

Code
test_dict = {
    "Pandas": lambda: pd.read_parquet(
        "sample.parquet", engine="pyarrow", dtype_backend="pyarrow"
    ),
    "Polars": lambda: pl.read_parquet("sample.parquet"),
}
read_parquet = run_test(test_dict, "Read parquet")
Figure 2: Reading in Data from a Parquet File
Note

Polars unquestionably wins this round, boasting a speed advantage of roughly 2 to 4 times over Pandas.

Selecting Columns

Alright, let’s spice things up a bit and select some columns to see which library races ahead in terms of speed!

File not yet in memory

A clever approach to conserve memory and enhance speed involves reading only the columns essential for operations. Consider a scenario where we’re interested in displaying just the names from this dataset. The big question now: how do these libraries measure up in terms of speed? Let’s find out!

Code
test_dict = {
    "Pandas": lambda: pd.read_csv(
        "sample.csv", engine="pyarrow", dtype_backend="pyarrow", usecols=["name"]
    ),
    "Polars": lambda: pl.scan_csv("sample.csv").select(pl.col("name")).collect(),
}
select_col_not_in_memory = run_test(test_dict, "Select column (not in memory)")
Figure 3: Selecting Columns from a File Not Yet in Memory

File is in memory

As anticipated, Polars continues to showcase its swiftness. It’s worth highlighting the combination of scan_csv (or lazy) and collect in Polars: deferring execution this way gives the library access to its query optimization techniques, which play a pivotal role in the performance gains. OK, one step further: suppose our files are already loaded into memory. Would there still be a difference in performance?

Code
df_pandas = pd.read_csv("sample.csv", engine="pyarrow", dtype_backend="pyarrow")
df_polars = pl.read_csv("sample.csv")
Code
test_dict = {
    "Pandas": lambda: df_pandas.loc[:, "name"],
    "Polars": lambda: df_polars.lazy().select(pl.col("name")).collect(),
}
select_col_in_memory = run_test(test_dict, "Select column")
Figure 4: Selecting Columns from a File Already in Memory
Note

While Polars showed a significant speed advantage when the file still had to be read from disk, both libraries perform similarly once the data is already in memory.

Filtering Rows

Now, let’s explore the scenario where we filter our dataset based on one or more column values.

Based on one condition

For the simple scenario, we’ll filter the data down to individuals named “David”.

Code
test_dict = {
    "Pandas": lambda: (df_pandas.query("name=='David'")),
    "Polars": lambda: (df_polars.lazy().filter((pl.col("name") == "David")).collect()),
}
filter_row_one_condition = run_test(test_dict, "Filter (simple)")
Figure 5: Filtering Rows Based on One Condition

Based on multiple conditions

Now, for a more intricate challenge, we’re going to query the data for individuals who satisfy one of two sets of criteria: those named David, born after 1980, and residing in a city other than London, or those who are married and have at least three children.

Code
test_dict = {
    "Pandas": lambda: (
        df_pandas.query(
            "name=='David' and born>1980 and city != 'London' or is_married == True and children >= 3"
        )
    ),
    "Polars": lambda: (
        df_polars.lazy()
        .filter(
            (pl.col("name") == "David")
            & (pl.col("born") > 1980)
            & (pl.col("city") != "London")
            | (pl.col("is_married") == True) & (pl.col("children") >= 3)
        )
        .collect()
    ),
}
filter_row_multiple_condition = run_test(test_dict, "Filter (complex)")
Figure 6: Filtering Rows Based on Multiple Conditions
Note

Both libraries tackled this challenge quite well, yet Pandas struggled to keep pace with Polars. It’s intriguing that while Pandas required nearly twice the time for the more intricate query, Polars completed it in almost the same time as the simple one. That’s parallelization in action.
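
As a side note, Polars parallelizes this work across all available CPU cores through its global thread pool. Here is a minimal sketch for checking the pool size with pl.threadpool_size() and, if needed, capping it (the environment variable must be set before Polars is imported):

Code
import polars as pl

# Polars sizes its global thread pool to the number of available cores,
# which is why the complex filter barely costs more than the simple one.
print(pl.threadpool_size())

# To cap it, set POLARS_MAX_THREADS in the environment *before* importing polars:
#   POLARS_MAX_THREADS=4 python benchmark.py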

Performing operations on columns

Now, let’s roll up our sleeves and dive into performing some operations on the columns.

Single operation

As a single operation, we’ll simply calculate the century in which these individuals were born.

Code
test_dict = {
    "Pandas": lambda: (df_pandas.assign(born=lambda df: df.born.div(100).round())),
    "Polars": lambda: (
        df_polars.lazy().with_columns((pl.col("born") / 100).round()).collect()
    ),
}
operate_one_column = run_test(test_dict, "Operate (one column)")
Figure 7: Performing a Single Operation on a Column

Multiple operations

Let’s also explore what happens when performing multiple operations on the columns. We’ll mix things up with some string operations, mapping, and math calculations to see how these libraries handle it!

Code
test_dict = {
    "Pandas": lambda: (
        df_pandas.assign(
            born=lambda df: df.born.div(100).round(),
            name=lambda df: df.name.str.lower(),
            city=lambda df: df.city.str.upper(),
            zip_code=lambda df: df.zip_code.mul(2),
            income=lambda df: df.income.div(10).astype("str"),
            is_married=lambda df: df.is_married.map({False: 0, True: 1}),
            children=lambda df: df.children.astype("bool"),
            car=lambda df: df.car.str[0],
        )
    ),
    "Polars": lambda: (
        df_polars.lazy()
        .with_columns(
            [
                (pl.col("born") / 100).round(),
                pl.col("name").str.to_lowercase(),
                pl.col("city").str.to_uppercase(),
                pl.col("zip_code") * 2,
                (pl.col("income") / 10).cast(pl.Utf8),
                pl.col("is_married").map_dict({False: 0, True: 1}),
                pl.col("children").cast(pl.Boolean),
                pl.col("car").str.slice(0, length=1),
            ]
        )
        .collect()
    ),
}
operate_multiple_column = run_test(test_dict, "Operate (more columns)")
Figure 8: Performing Multiple Operations on Columns
Note

Once again, Polars takes the lead. While both libraries required more time for the task involving multiple operations, Polars demonstrated superior scalability in this scenario.

Concatenating Data

Now, let’s turn our attention to concatenating two datasets. Let the merging begin!

Code
df_pandas2 = df_pandas.copy(deep=True)
df_polars2 = df_polars.clone()
Code
test_dict = {
    "Pandas": lambda: pd.concat([df_pandas, df_pandas2], axis=0),
    "Polars": lambda: pl.concat([df_polars, df_polars2], how="vertical"),
}
concatenate = run_test(test_dict, "Concatenate")
Figure 9: Concatenating Dataframes
Note

Once more, Polars shines with a remarkable speed advantage.

Aggregation

Simple

Time to shift our attention to aggregation. First up, a simple task: let’s calculate the mean income grouped by name. Then, for a bit more complexity, we’ll compute several statistics over the born, income, children, and car columns, grouped by name, car, and marital status. Things are about to get interesting!

Code
test_dict = {
    "Pandas": lambda: (df_pandas.groupby("name").income.mean()),
    "Polars": lambda: (df_polars.lazy().groupby("name").mean().collect()),
}
aggregate_simple = run_test(test_dict, "Aggregate (simple)")
Figure 10: Performing a simple aggregation

More complex

Code
test_dict = {
    "Pandas": lambda: (
        df_pandas.groupby(["name", "car", "is_married"]).agg(
            born_min=("born", min),
            born_max=("born", max),
            income_mean=("income", np.mean),
            income_median=("income", np.median),
            children_mean=("children", np.mean),
            car_count=("car", "count"),
        )
    ),
    "Polars": lambda: (
        df_polars.lazy()
        .groupby(["name", "car", "is_married"])
        .agg(
            [
                pl.col("born").min().alias("born_min"),
                pl.col("born").max().alias("born_max"),
                pl.col("income").mean().alias("income_mean"),
                pl.col("income").median().alias("income_median"),
                pl.col("children").mean().alias("children_mean"),
                pl.col("car").count().alias("car_count"),
            ]
        )
        .collect()
    ),
}
aggregate_complex = run_test(test_dict, "Aggregate (complex)")
Figure 11: Performing a complex aggregation
Note

While Pandas showcased noteworthy speed for the simple aggregation, the more intricate task exposed significant disparities between the two libraries. Polars took a commanding lead in this scenario, presenting a considerably faster performance compared to Pandas.

Whole workflow

We’re going to tackle this in two versions. First, we’ll use pl.read_csv for Polars, assessing the time it takes to complete the entire workflow on a DataFrame already in memory. Then, for the second version, we’ll employ pl.scan_csv for Polars. This nifty function lets us lazily read data from a CSV file or multiple files using glob patterns.

The cool part? With pl.scan_csv, the query optimizer can push down predicates and projections to the scan level, which has the potential to cut down on memory overhead. Let’s dive into both versions and see how they stack up!
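
Before running the benchmark, here is a minimal sketch of how you can peek at what the optimizer intends to do with a lazy scan; in the Polars version used here the optimized plan can be printed with describe_optimized_plan (newer releases expose the same information via explain):

Code
# Minimal sketch: inspecting predicate and projection pushdown on a lazy CSV scan.
lazy_query = (
    pl.scan_csv("sample.csv")
    .filter(pl.col("name") == "David")
    .select(["name", "income"])
)

# The optimized plan shows the filter and the two-column projection pushed down
# into the CSV scan itself, so only the needed rows and columns are materialized.
print(lazy_query.describe_optimized_plan())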

Using pl.read_csv

Code
test_dict = {
    "Pandas": lambda: (
        df_pandas.loc[:, lambda df: ~df.columns.isin(["is_married"])]
        .query("name=='David' and born>1980 and city != 'London' and children >= 3")
        .assign(
            born=lambda df: df.born.div(100).round(),
            name=lambda df: df.name.str.lower(),
            city=lambda df: df.city.str.upper(),
            zip_code=lambda df: df.zip_code.mul(2),
            income=lambda df: df.income.div(10),
            children=lambda df: df.children.astype("bool"),
            car=lambda df: df.car.str[0],
        )
        .groupby(
            [
                "name",
                "car",
            ]
        )
        .agg(
            born_min=("born", min),
            born_max=("born", max),
            income_mean=("income", np.mean),
            income_median=("income", np.median),
            children_mean=("children", np.mean),
            car_count=("car", "count"),
        )
    ),
    "Polars": lambda: (
        df_polars.lazy()
        .select(cs.all() - cs.ends_with("married"))
        .filter(
            (pl.col("name") == "David")
            & (pl.col("born") > 1980)
            & (pl.col("city") != "London")
            & (pl.col("children") >= 3)
        )
        .with_columns(
            [
                (pl.col("born") / 100).round(),
                pl.col("name").str.to_lowercase(),
                pl.col("city").str.to_uppercase(),
                pl.col("zip_code") * 2,
                (pl.col("income") / 10),
                pl.col("children").cast(pl.Boolean),
                pl.col("car").str.slice(0, length=1),
            ]
        )
        .groupby(
            [
                "name",
                "car",
            ]
        )
        .agg(
            [
                pl.col("born").min().alias("born_min"),
                pl.col("born").max().alias("born_max"),
                pl.col("income").mean().alias("income_mean"),
                pl.col("income").median().alias("income_median"),
                pl.col("children").mean().alias("children_mean"),
                pl.col("car").count().alias("car_count"),
            ]
        )
        .collect()
    ),
}
whole_workflow_read_csv = run_test(test_dict, "Whole workflow")
Figure 12: Performing a representative workflow

Using pl.scan_csv

Code
test_dict = {
    "Pandas": lambda: (
        df_pandas.loc[:, lambda df: ~df.columns.isin(["is_married"])]
        .query("name=='David' and born>1980 and city != 'London' and children >= 3")
        .assign(
            born=lambda df: df.born.div(100).round(),
            name=lambda df: df.name.str.lower(),
            city=lambda df: df.city.str.upper(),
            zip_code=lambda df: df.zip_code.mul(2),
            income=lambda df: df.income.div(10),
            children=lambda df: df.children.astype("bool"),
            car=lambda df: df.car.str[0],
        )
        .groupby(
            [
                "name",
                "car",
            ]
        )
        .agg(
            born_min=("born", min),
            born_max=("born", max),
            income_mean=("income", np.mean),
            income_median=("income", np.median),
            children_mean=("children", np.mean),
            car_count=("car", "count"),
        )
    ),
    "Polars": lambda: (
        pl.scan_csv("sample.csv")
        .select(cs.all() - cs.ends_with("married"))
        .filter(
            (pl.col("name") == "David")
            & (pl.col("born") > 1980)
            & (pl.col("city") != "London")
            & (pl.col("children") >= 3)
        )
        .with_columns(
            [
                (pl.col("born") / 100).round(),
                pl.col("name").str.to_lowercase(),
                pl.col("city").str.to_uppercase(),
                pl.col("zip_code") * 2,
                (pl.col("income") / 10),
                pl.col("children").cast(pl.Boolean),
                pl.col("car").str.slice(0, length=1),
            ]
        )
        .groupby(
            [
                "name",
                "car",
            ]
        )
        .agg(
            [
                pl.col("born").min().alias("born_min"),
                pl.col("born").max().alias("born_max"),
                pl.col("income").mean().alias("income_mean"),
                pl.col("income").median().alias("income_median"),
                pl.col("children").mean().alias("children_mean"),
                pl.col("car").count().alias("car_count"),
            ]
        )
        .collect()
    ),
}
whole_workflow_scan_csv = run_test(test_dict, "Whole workflow (scan_csv)")
Figure 13: Performing a representative workflow (using pl.scan_csv for Polars)

As you can see, using scan_csv (which reads the file from disk as part of the query rather than starting from an in-memory DataFrame) increased the required time by about 3-4 times. Even with this increase, Polars still finishes roughly 5 times faster than the entire workflow executed with Pandas.

Note

When we consider the entirety of the data processing pipeline, irrespective of the file reading approach, Polars emerges as the victor. It consistently exhibits a considerable speed advantage compared to Pandas.

Putting it all together

Time to bring together all the things we’ve explored! Let’s sum up what we’ve learned.

Code
summary = pd.concat(
    [
        read_csv,
        read_parquet,
        # select_col_not_in_memory,
        select_col_in_memory,
        filter_row_one_condition,
        filter_row_multiple_condition,
        operate_one_column,
        operate_multiple_column,
        concatenate,
        aggregate_simple,
        aggregate_complex,
        whole_workflow_read_csv,
        # whole_workflow_scan_csv
    ],
    axis=1,
)

fig = (
    summary.melt(var_name="experiment", value_name="time_sec")
    .assign(
        package=lambda df: df.experiment.str[-6:],
        experiment=lambda df: df.experiment.str[:-7],
    )
    .groupby(["experiment", "package"])
    .time_sec.mean()
    .mul(1000)
    .reset_index()
    .pivot(columns="package", values="time_sec", index="experiment")
    .sort_values(
        by="polars",
    )
    .reset_index()
    .pipe(
        lambda df: px.bar(
            df,
            y="experiment",
            x=["pandas", "polars"],
            barmode="group",
            text_auto='.2s',
            orientation="h",
            labels={"value": "Average execution Time (msec)", "experiment": ""},
            title= f'<b>Speed Showdown in Data Workflow Operations</b> <br><sup><i>Pandas {pd.__version__} vs. Polars {pl.__version__}</i></sup>',
            width=1000,
            height=700,
        )
    )
)
fig.update_xaxes(tickangle=90)
fig.update_layout(yaxis=dict(automargin=True), legend_title_text="Library")
fig.update_traces(textposition='outside')
Figure 14: Analyzing Speed Differences Between Pandas and Polars in Common Data Science Workflow Operations

Throughout our exploration, Polars has consistently outperformed Pandas. We intentionally focused on tasks that frequently arise in data analysis, encompassing file reading, column selection, filtering, and more. Where Polars truly shines, however, is in aggregation, capitalizing on its parallel, multithreaded execution. Take a glance at Figure 14, and you’ll notice that reading files is the most time-consuming step for Polars. To address this (and reduce memory usage at the same time), we have the nifty scan_csv function that lets Polars operate in lazy mode and optimize the entire data pipeline. It’s clear that Polars packs a punch in the realm of data processing!

Should you consider using Polars? Based on the findings of this experiment, it’s evident that Polars can bring substantial benefits to every stage of your data processing, and its remarkable speed can make a real difference. If you’re open to investing time in learning its API (which isn’t drastically different from Pandas), I’m confident your projects will gain a significant speed boost. And don’t forget, the project’s trajectory looks promising, which bodes well for its future.

Whatever path you choose, ensure you become well-acquainted with your selected library and stay current with emerging technologies. And above all, relish the journey. Happy coding, and until next time! 🐼🐍🤓💻