Styling data frames

[2]:

import os
import pandas as pd

It is sometimes useful to highlight features of a data frame when viewing it. (Note that this is generally far less useful than making informative plots, which we will come to shortly.) Pandas offers some convenient ways to style the display of a data frame.

To demonstrate, we will again use a data set from Beattie, et al. containing results from a study of the effects of sleep quality on performance in the Glasgow Facial Matching Test (GFMT).

[3]:

# data_path is assumed to point to the directory holding gfmt_sleep.csv
df = pd.read_csv(os.path.join(data_path, "gfmt_sleep.csv"), na_values="*")
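The na_values="*" kwarg matters because, as the argument implies, missing entries in the CSV file are written as asterisks; read_csv() turns them into NaN, which is why nan appears in the table below. Here is a minimal sketch on a tiny in-memory CSV read through io.StringIO. Its contents are a hypothetical stand-in modeled on two rows of the data set, not the real file.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for gfmt_sleep.csv; the second participant
# has no 'confidence incorrect hit' value, recorded as '*'.
csv_text = """participant number,gender,confidence incorrect hit
8,f,90.0
22,f,*
"""

# na_values="*" tells read_csv to treat the string '*' as missing (NaN)
df_demo = pd.read_csv(io.StringIO(csv_text), na_values="*")
print(df_demo)
print(df_demo.dtypes)
```

Without na_values="*", the 'confidence incorrect hit' column would be read as strings (object dtype), because '*' cannot be parsed as a number.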

As a first styling example, let's say we want to highlight rows corresponding to women who scored at or above 75% correct. We can write a function that takes a row of the data frame as its argument, checks the values in the 'gender' and 'percent correct' columns, and returns a list of CSS declarations, one per entry in the row, that color the row green or gray accordingly. We then pass this function to df.style.apply() with the axis=1 kwarg so that it is applied to each row.

[4]:

def highlight_high_scoring_females(s):
    """Return a CSS background color for every entry in row s."""
    if s["gender"] == "f" and s["percent correct"] >= 75:
        return ["background-color: #7fc97f"] * len(s)
    else:
        return ["background-color: lightgray"] * len(s)

# axis=1 passes each row (as a Series) to the function
df.head(10).style.apply(highlight_high_scoring_females, axis=1)

[4]:

	participant number	gender	age	correct hit percentage	correct reject percentage	percent correct	confidence when correct hit	confidence incorrect hit	confidence correct reject	confidence incorrect reject	confidence when correct	confidence when incorrect	sci	psqi	ess
0	8	f	39	65	80	72.500000	91.000000	90.000000	93.000000	83.500000	93.000000	90.000000	9	13	2
1	16	m	42	90	90	90.000000	75.500000	55.500000	70.500000	50.000000	75.000000	50.000000	4	11	7
2	18	f	31	90	95	92.500000	89.500000	90.000000	86.000000	81.000000	89.000000	88.000000	10	9	3
3	22	f	35	100	75	87.500000	89.500000	nan	71.000000	80.000000	88.000000	80.000000	13	8	20
4	27	f	74	60	65	62.500000	68.500000	49.000000	61.000000	49.000000	65.000000	49.000000	13	9	12
5	28	f	61	80	20	50.000000	71.000000	63.000000	31.000000	72.500000	64.500000	70.500000	15	14	2
6	30	m	32	90	75	82.500000	67.000000	56.500000	66.000000	65.000000	66.000000	64.000000	16	9	3
7	33	m	62	45	90	67.500000	54.000000	37.000000	65.000000	81.500000	62.000000	61.000000	14	9	9
8	34	f	33	80	100	90.000000	70.500000	76.500000	64.500000	nan	68.000000	76.500000	14	12	10
9	35	f	53	100	50	75.000000	74.500000	nan	60.500000	65.000000	71.000000	65.000000	14	8	7
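Since the colors do not survive outside a notebook, it can help to look at what the styling function actually returns. A minimal sketch: calling the function row by row ourselves with DataFrame.apply(..., axis=1), on a few rows copied from the table above (participants 8, 16, 18, and 35, keeping only the two columns the function inspects), shows one list of CSS strings per row, with one string per column. The name df_small is just for illustration.

```python
import pandas as pd

# Rows 0, 1, 2, and 9 of the table above, restricted to two columns
df_small = pd.DataFrame(
    {
        "gender": ["f", "m", "f", "f"],
        "percent correct": [72.5, 90.0, 92.5, 75.0],
    },
    index=[0, 1, 2, 9],
)


def highlight_high_scoring_females(s):
    """Return a CSS background color for every entry in row s."""
    if s["gender"] == "f" and s["percent correct"] >= 75:
        return ["background-color: #7fc97f"] * len(s)
    else:
        return ["background-color: lightgray"] * len(s)


# With axis=1, each row is passed in as a Series; the list each call returns
# becomes one entry of the resulting Series.
styles = df_small.apply(highlight_high_scoring_females, axis=1)
print(styles)
```

Participant 35 scored exactly 75%, so the `>=` comparison colors that row green.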

We can get fancier. Let's say we want to shade the 'percent correct' column with a bar whose length corresponds to the value in the column. We use the df.style.bar() method to do so. The subset kwarg specifies which columns get bars.

[5]:

df.head(10).style.bar(subset=["percent correct"], vmin=0, vmax=100)

Note that I have used the vmin=0 and vmax=100 kwargs so that the bars are drawn on the full scale of possible percentages: an empty cell corresponds to 0% correct and a completely filled cell to 100% correct.
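To see what vmin and vmax do, here is a simplified model of how much of each cell a left-anchored bar fills. It is an illustration, not pandas' own code: values are clipped to [vmin, vmax] and then scaled linearly onto [0, 1]. The helper name bar_fraction is hypothetical, and the scores are the first six 'percent correct' values from the table above.

```python
import numpy as np


def bar_fraction(values, vmin, vmax):
    """Fraction of the cell width a left-anchored bar would fill.

    Simplified model for illustration: clip to [vmin, vmax], then rescale
    linearly so that vmin maps to 0 and vmax maps to 1.
    """
    values = np.asarray(values, dtype=float)
    clipped = np.clip(values, vmin, vmax)
    return (clipped - vmin) / (vmax - vmin)


percent_correct = [72.5, 90.0, 92.5, 87.5, 62.5, 50.0]

# Fixed 0 to 100 scale, as with vmin=0, vmax=100 above
print(bar_fraction(percent_correct, 0, 100))

# Scale running from the lowest to the highest score shown
print(bar_fraction(percent_correct, min(percent_correct), max(percent_correct)))
```

On the fixed scale, a bar's length reads directly as the percentage. On a scale taken from the lowest and highest scores shown, the 50% participant would get no bar at all and the 92.5% participant a full one, which exaggerates the differences.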

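Alternatively, I could shade the background of each 'percent correct' cell according to its value using df.style.background_gradient(). The cmap kwarg names a Matplotlib colormap, so Matplotlib must be installed for this to work.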
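background_gradient() gets its colors from a Matplotlib colormap. Below is a rough sketch of the idea, assuming the default settings (no vmin, vmax, low, or high passed) and Matplotlib 3.5 or newer for the matplotlib.colormaps registry: the column's minimum maps to the light end of 'Reds' and its maximum to the dark end. pandas additionally picks a light or dark text color for readability, which this sketch omits. The scores are the ten 'percent correct' values from the table above.

```python
import matplotlib
import matplotlib.colors as mcolors

percent_correct = [72.5, 90.0, 92.5, 87.5, 62.5, 50.0, 82.5, 67.5, 90.0, 75.0]

# Map the column's own minimum to 0 and its maximum to 1
norm = mcolors.Normalize(vmin=min(percent_correct), vmax=max(percent_correct))
cmap = matplotlib.colormaps["Reds"]

# Look up an RGBA color for each normalized value and convert it to hex
colors = [mcolors.to_hex(cmap(norm(v))) for v in percent_correct]
for v, c in zip(percent_correct, colors):
    print(f"{v:5.1f}  {c}")
```

Because the scale comes from the data, the shading shows how each score ranks among these ten rows rather than absolute performance. background_gradient() also accepts vmin and vmax kwargs if a fixed 0 to 100 scale is wanted.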

[6]:

df.head(10).style.background_gradient(subset=["percent correct"], cmap="Reds")

We can also combine several effects by chaining Styler methods.

[7]:

df.head(10).style.bar(
    subset=["percent correct"], vmin=0, vmax=100
).apply(
    highlight_high_scoring_females, axis=1
)

In practice, I almost never use these features, because it is almost always better to display results as a plot than as a table. Still, highlighting can be useful when exploring a data set in tabular form.

Computing environment

[8]:

%load_ext watermark
%watermark -v -p pandas,jupyterlab

Python implementation: CPython
Python version       : 3.11.5
IPython version      : 8.15.0

pandas    : 2.0.3
jupyterlab: 4.0.6