Introduction to Polars
Polars is a fast DataFrame library for data wrangling and analytics. Like pandas, it lets you load, transform, join, group, and aggregate tabular data, but it is built on a different performance model:
- **Engine:** written in Rust (pandas is mostly Python with NumPy/C extensions).
- **Execution model:** supports lazy evaluation (build a query plan, then execute it efficiently) with query optimization; pandas is mostly eager (it does the work immediately).
- **Parallelism:** Polars is designed to use multiple CPU cores by default for many operations; pandas is often single-threaded for typical DataFrame operations.
- **Memory model:** Polars uses the Apache Arrow columnar memory format under the hood, which helps both speed and interoperability.
Pandas vs Polars¶
| Dimension | pandas | Polars |
|---|---|---|
| Primary language | Python (with C/NumPy under the hood) | Rust (Python bindings) |
| Execution model | Eager (operations run immediately) | Eager and Lazy (query planning + optimization) |
| Performance | Good for small-to-medium data; can slow down on large groupbys/joins | Often much faster, especially on large datasets |
| Parallelism | Limited; many ops are single-threaded | Built-in multi-threading by default |
| Memory format | NumPy-based blocks | Apache Arrow, columnar |
| Memory efficiency | Can be higher overhead, especially with object columns | Generally more memory-efficient |
| Data types | Flexible; object dtype common | Strict, explicit dtypes |
| Missing values | Uses NaN/None; nullable dtypes added later | Native null support via Arrow |
| Index | Central concept (powerful but sometimes confusing) | No index (explicit columns only) |
| API style | Imperative, step-by-step | Expression-based, declarative |
| Lazy evaluation | ❌ No | ✅ Yes |
| Query optimization | ❌ No | ✅ Yes |
| Streaming / out-of-core | Limited | Supported (especially with lazy mode) |
| String performance | Often slower (object strings) | Very fast (Arrow strings) |
| Time series support | Very mature | Solid and improving |
| Ecosystem support | Massive (default for ML, stats, viz) | Growing, but smaller |
| Learning curve | Low (widely taught) | Moderate (different mental model) |
| Interoperability | Native to most Python data tools | Easy conversion to/from pandas |
| Typical use cases | Data exploration, ML prep, teaching, quick analysis | ETL pipelines, large data, performance-critical workflows |
| Maturity | Very mature | Newer but rapidly evolving |
Using Polars¶
Installing Polars¶
To install Polars, use pip in your terminal or command prompt (in the environment where your Jupyter setup lives):

```
pip install polars
```

Then import Polars (together with `datetime`, used for the date values below) and create a DataFrame:

```python
import polars as pl
import datetime as dt

df_polars = pl.DataFrame(
    {
        "developer": [
            "Alice Chen",
            "Brian Patel",
            "Carlos Gomez",
            "Diana Nguyen",
        ],
        "hire_date": [
            dt.date(2019, 6, 1),
            dt.date(2020, 9, 15),
            dt.date(2018, 3, 22),
            dt.date(2021, 1, 10),
        ],
        "weekly_commits": [45, 30, 60, 25],
        "hours_worked": [40, 38, 45, 35],
    }
)
df_polars
```

Check the type of the `df_polars` object to confirm it is a Polars DataFrame:
```python
type(df_polars)
```

`polars.dataframe.frame.DataFrame`

Comparing Polars and Pandas DataFrames¶
```python
import pandas as pd
import datetime as dt

df_pandas = pd.DataFrame(
    {
        "developer": [
            "Alice Chen",
            "Brian Patel",
            "Carlos Gomez",
            "Diana Nguyen",
        ],
        "hire_date": [
            "2019-06-01",
            "2020-09-15",
            "2018-03-22",
            "2021-01-10",
        ],
        "weekly_commits": [45, 30, 60, 25],
        "hours_worked": [40, 38, 45, 35],
    }
)
df_pandas
```

```python
type(df_pandas)
```

`pandas.core.frame.DataFrame`

You'll notice that the Polars DataFrame is of type `polars.dataframe.frame.DataFrame`, while the pandas DataFrame is of type `pandas.core.frame.DataFrame`. Other than that, they look quite similar! (One subtle difference: the pandas `hire_date` column here holds strings, while the Polars column holds proper `date` values.)
However, Polars offers a more modern, expression-based API for data transformations. For example, here is an expression that calculates a productivity score for each developer from their weekly commits and hours worked:
```python
result = df_polars.select(
    pl.col("developer"),
    pl.col("hire_date").dt.year().alias("hire_year"),
    (pl.col("weekly_commits") / pl.col("hours_worked")).alias("productivity_score"),
)
result
```

Note that `select` returns only the listed columns. To add columns while keeping all the existing ones, use `with_columns` instead:
```python
result = df_polars.with_columns(
    pl.col("hire_date").dt.year().alias("hire_year"),
    (pl.col("weekly_commits") / pl.col("hours_worked")).alias("productivity_score"),
)
result
```

Filtering¶
Filtering in Polars can be done using expressions as well. For example, to filter developers with a productivity score greater than 1:
```python
result = df_polars.filter((pl.col("weekly_commits") / pl.col("hours_worked")) > 1)
result
```

You can also pass multiple expressions as separate arguments; they are combined with a logical AND, which is more convenient than chaining bitwise operators (`&`, `|`) as in pandas:
```python
result = df_polars.filter(
    pl.col("hire_date").is_between(dt.date(2018, 1, 1), dt.date(2019, 12, 31)),
    pl.col("hours_worked") >= 40,
)
result
```
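For comparison, here is a sketch of the equivalent pandas filter on an illustrative copy of the same data. Each condition must be parenthesized and combined with `&`, because plain `and`/`or` raise an error on pandas Series:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "developer": ["Alice Chen", "Brian Patel", "Carlos Gomez", "Diana Nguyen"],
        "hire_date": pd.to_datetime(
            ["2019-06-01", "2020-09-15", "2018-03-22", "2021-01-10"]
        ),
        "hours_worked": [40, 38, 45, 35],
    }
)

# Parenthesized conditions joined with the bitwise & operator.
mask = (df["hire_date"].between("2018-01-01", "2019-12-31")) & (
    df["hours_worked"] >= 40
)
result = df[mask]
print(result)
```

The Polars version reads as a flat list of conditions, while the pandas version forces you to manage operator precedence by hand.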