(Pretty) big data wrangling with DuckDB and Polars

Note: These materials were originally prepared as part of the Workshops for Ukraine series. I have since refined and reused them in other contexts.

Website: https://grantmcdermott.com/duckdb-polars

Description: This workshop will introduce you to DuckDB and Polars, two data wrangling libraries at the frontier of high-performance computation. (See benchmarks.) In addition to being extremely fast and portable, both DuckDB and Polars provide user-friendly implementations across multiple languages. This makes them very well suited to production and applied research settings, without the overhead of tools like Spark. We will provide a variety of real-life examples in both R and Python, with the aim of getting participants up and running as quickly as possible. We will learn how to wrangle datasets extending over several hundred million observations in a matter of seconds or less, using only our laptops. And we will learn how to scale to even larger contexts where the data exceeds our computers’ RAM capacity. Finally, we will also discuss some complementary tools and how these can be integrated for an efficient end-to-end workflow (data I/O -> wrangling -> analysis).

Disclaimer: The content for this workshop has been prepared, and is presented, in my personal capacity. Any opinions expressed herein are my own and are not necessarily shared by my employer. Please do not share any recorded material (e.g., audio or video) without the express permission of myself or the workshop organisers. The materials themselves may be freely repurposed and distributed (with attribution) per the accompanying CC BY 4.0 license.

This work is licensed under a Creative Commons Attribution 4.0 International License.

