Introduction to Polars

Polars is a fast DataFrame library for data wrangling and analytics. Like pandas, it lets you load, transform, join, group, and aggregate tabular data, but it is built on a different performance model:

  • Engine: written in Rust (pandas is mostly Python with NumPy/C extensions).

  • Execution model: supports lazy evaluation (build a query plan, then execute efficiently) and query optimization; pandas is mostly eager (does work immediately).

  • Parallelism: Polars is designed to use multiple CPU cores by default for many operations; pandas is often single-threaded for typical DataFrame ops.

  • Memory model: Polars uses Apache Arrow columnar memory under the hood, which is great for speed and interoperability.

Pandas vs Polars

| Dimension | pandas | Polars |
| --- | --- | --- |
| Primary language | Python (with C/NumPy under the hood) | Rust (with Python bindings) |
| Execution model | Eager (operations run immediately) | Eager and lazy (query planning + optimization) |
| Performance | Good for small to medium data; can slow down on large groupbys/joins | Often much faster, especially on large datasets |
| Parallelism | Limited; many ops are single-threaded | Built-in multi-threading by default |
| Memory format | NumPy-based blocks | Apache Arrow, columnar |
| Memory efficiency | Can have higher overhead, especially with object columns | Generally more memory-efficient |
| Data types | Flexible; object dtype common | Strict, explicit dtypes |
| Missing values | Uses NaN, None; nullable dtypes added later | Native null support via Arrow |
| Index | Central concept (powerful but sometimes confusing) | No index (explicit columns only) |
| API style | Imperative, step-by-step | Expression-based, declarative |
| Lazy evaluation | ❌ No | ✅ Yes |
| Query optimization | ❌ No | ✅ Yes |
| Streaming / out-of-core | Limited | Supported (especially with lazy mode) |
| String performance | Often slower (object strings) | Very fast (Arrow strings) |
| Time series support | Very mature | Solid and improving |
| Ecosystem support | Massive (default for ML, stats, viz) | Growing, but smaller |
| Learning curve | Low (widely taught) | Moderate (different mental model) |
| Interoperability | Native to most Python data tools | Easy conversion to/from pandas |
| Typical use cases | Data exploration, ML prep, teaching, quick analysis | ETL pipelines, large data, performance-critical workflows |
| Maturity | Very mature | Newer but rapidly evolving |

Using Polars

Installing Polars

To install polars, use pip in your terminal or command prompt (where your Jupyter environment is set up):

pip install polars

Then import Polars (and the standard-library datetime module used for the example data below):

import polars as pl
import datetime as dt

df_polars = pl.DataFrame(
    {
        "developer": [
            "Alice Chen",
            "Brian Patel",
            "Carlos Gomez",
            "Diana Nguyen",
        ],
        "hire_date": [
            dt.date(2019, 6, 1),
            dt.date(2020, 9, 15),
            dt.date(2018, 3, 22),
            dt.date(2021, 1, 10),
        ],
        "weekly_commits": [45, 30, 60, 25],
        "hours_worked": [40, 38, 45, 35],
    }
)

df_polars

Check the type of the df_polars object to confirm it’s a Polars DataFrame:

type(df_polars)
polars.dataframe.frame.DataFrame

Comparing Polars and Pandas DataFrames

import pandas as pd
import datetime as dt

df_pandas = pd.DataFrame(
    {
        "developer": [
            "Alice Chen",
            "Brian Patel",
            "Carlos Gomez",
            "Diana Nguyen",
        ],
        "hire_date": [
            "2019-06-01",
            "2020-09-15",
            "2018-03-22",
            "2021-01-10",
        ],
        "weekly_commits": [45, 30, 60, 25],
        "hours_worked": [40, 38, 45, 35],
    }
)

df_pandas
type(df_pandas)
pandas.core.frame.DataFrame

You’ll notice that the Polars DataFrame is of type polars.dataframe.frame.DataFrame, while the pandas DataFrame is of type pandas.core.frame.DataFrame. Other than that, they look quite similar!

However, Polars offers a more modern, expression-based API for data transformations.

For example, here is an expression that calculates a productivity score for each developer from their weekly commits and hours worked:

result = df_polars.select(
    pl.col("developer"),
    pl.col("hire_date").dt.year().alias("hire_year"),
    (pl.col("weekly_commits") / pl.col("hours_worked")).alias("productivity_score"),
)
result

You can also add columns to the DataFrame, instead of selecting a subset, with with_columns. Because with_columns keeps every existing column, there is no need to re-select developer:

result = df_polars.with_columns(
    pl.col("hire_date").dt.year().alias("hire_year"),
    (pl.col("weekly_commits") / pl.col("hours_worked")).alias("productivity_score"),
)
result
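For comparison, a rough pandas equivalent of the same derived columns might look like the following. (A small frame is redefined here so the sketch is self-contained; note that because hire_date was stored as strings in df_pandas, it has to be parsed first, whereas Polars already holds real dates.)

```python
import pandas as pd

df_pandas = pd.DataFrame(
    {
        "developer": ["Alice Chen", "Brian Patel"],
        "hire_date": ["2019-06-01", "2020-09-15"],
        "weekly_commits": [45, 30],
        "hours_worked": [40, 38],
    }
)

# pandas builds each new column eagerly, one expression at a time.
result = df_pandas.assign(
    hire_year=pd.to_datetime(df_pandas["hire_date"]).dt.year,
    productivity_score=df_pandas["weekly_commits"] / df_pandas["hours_worked"],
)
print(result)
```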

Filtering

Filtering in Polars can be done using expressions as well. For example, to filter developers with a productivity score greater than 1:

result = df_polars.filter((pl.col("weekly_commits") / pl.col("hours_worked")) > 1)
result

You can also pass multiple predicates as separate arguments; Polars combines them with a logical AND, which is more convenient than chaining bitwise operators (&, |) as in pandas.

result = df_polars.filter(
    pl.col("hire_date").is_between(dt.date(2018, 1, 1), dt.date(2019, 12, 31)),
    pl.col("hours_worked") >= 40,
)
result