Manipulating data frames and wrangling data


We have already seen how to perform calculations with data frames by:

  • Adding a column, computed from other columns if desired.

  • Computing aggregations.

  • Using window functions.

These methods are useful for wrangling data frames, for example when converting units or computing statistical point estimates. Polars offers many more methods than we have covered. The Polars documentation and questions on Stack Overflow are among your best references for usage.

What’s to come

In the next sections of this lesson, we discuss how to reshape and combine data frames. In these operations, we are not really doing calculations with data, but rather are working to organize our data set into an easily usable, tidy form. We will cover:

  • Creating a data frame from scratch (as opposed to reading in data from a file).

  • Joining and concatenating data frames (putting two or more data frames together into one).

  • Reshaping data frames by unpivoting and pivoting.

All of these methods are useful for bringing input data into formats that are convenient to work with.

Polars documentation

The Polars documentation is a great resource for learning about these methods. In particular, the sections of the user guide on joining, concatenating, unpivoting, and pivoting are very useful. They use very small, contrived data frames to demonstrate the concepts. There are certainly benefits to this approach, but I find it is often easier to grasp the concepts with real examples. In the following sections of this lesson, we will approach these topics with real examples. You can think of these lessons as introductions and supplements to the official Polars documentation.