48  Implementation of PCA

Open in Google Colab | Download notebook

Data set download

Code
# Colab setup ------------------
import os, shutil, sys, subprocess, urllib.request
if "google.colab" in sys.modules:
    cmd = "pip install --upgrade watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    data_path = "https://s3.amazonaws.com/bebi103.caltech.edu/data/"
else:
    data_path = "../data/"
# ------------------------------
import warnings

import numpy as np
import scipy.io

import sklearn
import sklearn.decomposition
import sklearn.preprocessing

import bebi103

import bokeh.io
import bokeh.plotting
bokeh.io.output_notebook()
Loading BokehJS ...

48.1 Example data set

As an example of PCA, we will use data from Remedios, et al., 2017. This paper from David Anderson’s lab features a study of mouse behavioral activity (scored by a human) and neuronal activity in the ventrolateral subdivision of the ventromedial hypothalamus (monitored by calcium imaging) of a male mouse when presented with another mouse of a given sex. The authors of the paper have made the data available as a MAT-file, which you can download here: https://s3.amazonaws.com/bebi103.caltech.edu/data/hypothalamus_calcium_imaging_remedios_et_al.mat.

Scipy offers a convenient interface for loading in older versions of MAT files via its scipy.io.loadmat() function. Let us put it to use and see what is in the file.

48.2 Loading older-style MAT files

Scipy offers a convenient interface for loading in older versions of MAT files via its scipy.io.loadmat() function. Let us put it to use and see what is in the file.

data = scipy.io.loadmat(
    os.path.join(data_path, 'hypothalamus_calcium_imaging_remedios_et_al.mat')
)

# Look at what is in the Python variable `data`
data
{'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Jun 27 10:14:33 2022',
 '__version__': '1.0',
 '__globals__': [],
 'attack_vector': array([[0, 0, 0, ..., 0, 0, 0]], shape=(1, 18561), dtype=uint8),
 'neural_data': array([[-0.88539819, -0.80569246, -0.73758439, ..., -0.98890073,
         -1.03551113, -0.99196781],
        [-0.14700497, -0.32165216, -0.33878817, ...,  3.78779148,
          3.83222242,  3.85200285],
        [-1.19636847, -1.18637515, -1.16444654, ..., -0.34082998,
         -0.39085209, -0.38183045],
        ...,
        [-1.07157774, -0.58209432, -0.17334395, ..., -0.8675362 ,
         -0.87503357, -1.08967763],
        [ 1.82225675,  1.30727208,  1.03393152, ...,  1.30556881,
          1.41885948,  0.99312731],
        [ 0.5958771 ,  0.36866712,  0.28664763, ...,  1.78433094,
          1.47429182,  1.85403792]], shape=(115, 18561)),
 'sex_vector': array([[1, 1, 1, ..., 0, 0, 0]], shape=(1, 18561), dtype=uint8)}

It is apparent that scipy.io.loadmat() returns a dictionary with the various Matlab variables as well as some metadata. As given in the '__header__' entry of the dictionary, this is a version 5.0 MAT-file, which is indeed an earlier version than 7.3, which uses HDF5.

According to metadata associated with the paper, data were acquired at 30 Hz, to the time between each measurement is 1/30th of a second. The variables have the following meanings.

  • attack_vector: An entry is zero if the mouse was not attacking his companion in the arena at that time point and one if he was.
  • sex_vector: An entry is one if the presented mouse was female and zero at a given time point and zero if male.
  • neural_data: Each row of this Numpy array corresponds to a single neuron as measured by calcium imaging and each column corresponds to a given time point. The reported value is the relative fluorescent change, \((F(t) - F_0) / F_0\), where \(F_0\) is the average fluorescent intensity over all frames. The reported values are normalized over the standard deviation of a baseline trial and are in units of that standard deviation.

We can pop those data out.

neural_data = data.pop('neural_data') # Row is time, column is neuron
sex_vector = data.pop('sex_vector').squeeze()
attack_vector = data.pop('attack_vector').squeeze()

# Set up time points
t = np.arange(neural_data.shape[1]) / 30

Now we can construct a visualization.

# Kwargs for plotting
kwargs = dict(
    x_interpixel_distance=1/30,
    y_interpixel_distance=1,
    frame_height=150,
    frame_width=600,
    x_axis_label="time (s)",
    y_axis_label="neuron",
)

# Plot neural data
p_neuron = bebi103.image.imshow(neural_data, **kwargs)

# Sex of presented mouse
p_sex = bokeh.plotting.figure(
    frame_height=50,
    frame_width=600,
    x_axis_label=None,
    y_axis_label='sex',
    y_range=['m', 'f'],
    toolbar_location=None,
)
sex_color = ['orchid' if s else 'dodgerblue' for s in sex_vector]
p_sex.scatter(t, ['f' if s else 'm' for s in sex_vector], size=2, color=sex_color)
p_sex.xaxis.major_label_text_font_size = '0pt'

# Attack encoding
p_attack = bokeh.plotting.figure(
    frame_height=50,
    frame_width=600,
    x_axis_label=None,
    y_axis_label='attack',
    y_range=['no', 'yes'],
    toolbar_location=None,
)
attack_color = ['tomato' if a else 'gray' for a in attack_vector]
p_attack.scatter(t, ['yes' if a else 'no' for a in attack_vector], size=2, color=attack_color)
p_attack.xaxis.major_label_text_font_size = '0pt'

# Link x-axes
p_attack.x_range = p_neuron.x_range
p_sex.x_range = p_neuron.x_range

# Show the plots together
bokeh.io.show(bokeh.layouts.column(p_sex, p_attack, p_neuron))