Narwhals

Project: Scaling Narwhals Documentation and Backend Enhancements

About Narwhals

Narwhals is an open source Python library designed to serve as a lightweight and extensible compatibility layer between different DataFrame libraries. It enables the data ecosystem to become DataFrame-agnostic: a tool written against Narwhals can accept whichever supported DataFrame its users already have. By spreading API standards, Narwhals empowers DataFrame authors to innovate on the implementation side. Since its first release in February 2024, the project has already become a dependency of major data science tools such as Altair, Marimo, Plotly, Py-Shiny, and Vegafusion.

About the project with POSSEE

The proposed deliverables of the project with POSSEE will propel Narwhals’ mission to strengthen the open source data science ecosystem.

Featuring a wider range of DataFrame libraries besides Polars and pandas in the docstring examples will showcase Narwhals’ ability to integrate with various data processing ecosystems.

The migration and developer guides will give users and contributors clear, detailed documentation on how to upgrade to newer versions of Narwhals and how to extend its functionality for their specific use cases.

Improving backend support for lazy execution frameworks like DuckDB and Dask will enable more efficient processing of large datasets, which is essential for modern data science workflows that require high performance and scalability.

Allowing Narwhals to output SQL without requiring a backend installation will further streamline data science pipelines, removing dependencies and simplifying integration with various data sources.

Regular performance benchmarking will ensure that Narwhals meets the demands of high-performance computing while minimizing overhead, providing users with reliable insights into Narwhals’ performance under different conditions.

Project scope

  1. Improve all docstring examples by featuring a wider range of libraries besides Polars and pandas. Eager examples should showcase pandas and Polars; lazy examples should showcase Polars LazyFrame, Dask, and DuckDB.
  2. Improve backend support for the lazy layer (DuckDB / PySpark / Dask).
  3. Write migration guide to upgrade from narwhals.stable.v1 to narwhals.stable.v2.
  4. Write developer guide for extending Narwhals.
  5. Set up regular benchmarking jobs for Narwhals performance and overhead.
  6. Choose a DataFrame-consuming library and make it DataFrame-agnostic using Narwhals.
  7. Allow Narwhals to output SQL without any backend being installed.

Additional information

The proposed scope of work builds on the deliverables of the previous project with POSSEE.