Backend

class caltrig.core.backend.CellClustering(section=None, outliers_list=[], A=None, fft=True, distance_metric='euclidean')[source]

Bases: object

Cell clustering class. This class is used to cluster cells based on their temporal activity, using FFT and agglomerative clustering.

Parameters:
  • section (dict) – A dictionary containing the cell ids as keys and the temporal activity as values.

  • outliers_list (list) – A list of cell ids that should be excluded from the clustering.

  • A (xr.DataArray) – The spatial footprints of the cells.

  • fft (bool, optional) – Whether to use FFT to compute the PSD. By default True.

  • distance_metric (str, optional) – The distance metric to use for the clustering. The options are “euclidean” and “cosine”. By default “euclidean”.

Variables:
  • A (xr.DataArray) – The spatial footprints of the cells.

  • psd_list_pre (dict) – A dictionary containing the cell ids as keys and the PSD as values.

  • psd_list (list) – A list of the PSD values.

  • outliers_list (list) – A list of cell ids that should be excluded from the clustering.

  • special_unit (list) – A list of cell ids that have no activity.

  • distance_metric (str) – The distance metric to use for the clustering.

  • signals (dict) – A dictionary containing the cell ids as keys and the temporal activity as values.

  • linkage_data (np.array) – The linkage data for the clustering.

  • dendro (dict) – The dendrogram data.

  • cluster_indices (np.array) – The cluster indices.

compute_psd(unit)[source]

Compute the power spectral density of the signal for a given cell.

Parameters:

unit (int) – The cell id.
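As a rough illustration of an FFT-based power spectral density, the sketch below computes a one-sided power spectrum for a single trace with NumPy. The helper name `psd_from_fft` and the normalization by trace length are assumptions made for this example; they are not taken from the method's implementation.

```python
import numpy as np

def psd_from_fft(signal):
    """Hypothetical helper: one-sided power spectrum of a 1-D trace."""
    signal = np.asarray(signal, dtype=float)
    # The real-input FFT returns only the non-negative frequency bins.
    spectrum = np.fft.rfft(signal)
    # Power is the squared magnitude; dividing by the length is an assumed normalization.
    return (np.abs(spectrum) ** 2) / signal.size

# A sine wave with exactly 5 cycles over 64 samples concentrates its power in bin 5.
n = 64
trace = np.sin(2 * np.pi * 5 * np.arange(n) / n)
psd = psd_from_fft(trace)
peak_bin = int(np.argmax(psd))
```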

visualize_clusters(t)[source]

Visualize the clusters by assigning a color to each cluster and looking up the corresponding footprint of each cell.

Parameters:

t (int) – The number of clusters to create.

Returns:

cluster_result (dict) – A dictionary containing the cluster results. The keys are the cluster indices and the values are dictionaries containing the cell ids and the image of the cluster.
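The following sketch shows one way agglomerative clustering can be cut into t flat clusters with scipy.cluster.hierarchy. The toy feature matrix, the "average" linkage method and the "maxclust" criterion are assumptions chosen for illustration; the class may use different settings.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy feature vectors, one row per cell; two well-separated groups.
features = np.array([
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [5.0, 5.1, 5.0],
    [5.1, 5.0, 4.9],
])

# Build the hierarchy; the linkage method is an assumption for this sketch.
linkage_data = linkage(features, method="average", metric="euclidean")

# Cut the tree so that at most t flat clusters are formed.
t = 2
cluster_indices = fcluster(linkage_data, t=t, criterion="maxclust")
```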

visualize_clusters_color()[source]

Slightly different approach to visualize the clusters. This colors the cells based on the clusters in the dendrogram results.

Returns:

matplotlib.image.AxesImage – The image of the clustered cells.

visualize_dendrogram(color_threshold=None, ax=None)[source]

Apply dendrogram from scipy.cluster.hierarchy and save result to class attribute.

Parameters:
  • color_threshold (float, optional) – The color threshold for the dendrogram. By default None.

  • ax (matplotlib.axes.Axes, optional) – The axes to plot the dendrogram. By default None.

Returns:

dendro (dict) – The dendrogram data.

class caltrig.core.backend.DataInstance(config_path)[source]

Bases: object

This class is used to store all the data related to a single experiment/recording. This includes all CNMF output data, behavioral/timestamp data and video data.

Parameters:

config_path (str) – The path to the configuration file that contains the paths to the minian, behavior and video files.

Variables:
  • events_type (List[str]) – A list of all the event types that are supported by the program.

  • mouseID (str) – The mouse ID for the experiment.

  • day (str) – The day of the experiment.

  • session (str) – The session of the experiment.

  • group (str) – The group of the experiment.

  • data (dict) – A dictionary that contains all the CNMF output data. The keys are ‘A’, ‘C’, ‘S’, ‘E’, ‘b’, ‘f’, ‘DFF’, ‘YrA’, ‘M’, ‘timestamp(ms)’.

  • video_data (dict) – A dictionary that contains all the video data. The keys are ‘Y_fm_chk’, ‘varr’, ‘Y_hw_chk’, ‘behavior_video’.

  • events (dict) – A dictionary that contains all the behavior event data. The keys are the event types and the values are Event objects.

add_cell_id_group(cell_ids, group_id)[source]

Allocate specific cell ids to a group id

This function will allocate the cell ids to a group id. This will be stored in a dictionary where the key is the cell id and the value is a set of group ids. If group_id is an empty string, we will allocate a number as the group id.

Parameters:
  • cell_ids (list) – List of cell ids to allocate to the group

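  • group_id (str) – The group id to allocate the cells to. If it is an empty string, a number is allocated as the group id.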
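The bookkeeping described above can be sketched with a plain dictionary of sets. The function below is a hypothetical stand-in, and the rule for picking a number when group_id is empty (the smallest unused positive integer) is an assumption, since the documentation does not say how the number is chosen.

```python
from collections import defaultdict

cell_ids_to_groups = defaultdict(set)  # cell id -> set of group ids

def add_cell_id_group(mapping, cell_ids, group_id):
    """Hypothetical stand-in for the allocation described above."""
    if group_id == "":
        # Assumption: use the smallest positive integer not yet used as a group id.
        used = {g for groups in mapping.values() for g in groups}
        n = 1
        while str(n) in used:
            n += 1
        group_id = str(n)
    for cell_id in cell_ids:
        mapping[cell_id].add(group_id)
    return group_id

add_cell_id_group(cell_ids_to_groups, [3, 7], "place_cells")
auto_id = add_cell_id_group(cell_ids_to_groups, [7, 9], "")
```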

add_missed(A)[source]

Adds a missed cell to the data. The missed cell is represented by a footprint mask and added to the M data array and saved to the data folder. A unique missed_id is assigned to the missed cell.

Parameters:

A (np.array) – The footprint mask of the missed cell.

backup_data(name)[source]

Backup a specified data array to the backup folder.

Parameters:

name (str) – The name of the data array to backup.

centroid(A, verbose=False)[source]

Compute centroids of spatial footprint of each cell.

Parameters:
  • A (xr.DataArray) – Input spatial footprints.

  • verbose (bool, optional) – Whether to print message and progress bar. By default False.

Returns:

cents_df (pd.DataFrame) – Centroid of spatial footprints for each cell. Has columns “unit_id”, “height”, “width” and any other additional metadata dimension.

centroid_max(A, verbose=False)[source]

Compute the centroid by taking the maximum value in the image. Nearly the same as centroid(), but it looks better in the 3D visualizations.

Parameters:
  • A (xr.DataArray) – Input spatial footprints.

  • verbose (bool, optional) – Whether to print message and progress bar. By default False.

Returns:

cents_df (pd.DataFrame) – Centroid Max of spatial footprints for each cell. Has columns “unit_id”, “height”, “width” and any other additional metadata dimension.

check_DFF()[source]

Check if the DFF xarray exists and if not create it.

check_E()[source]

Check if the E xarray exists and if not create it.

check_essential_data()[source]

Create a list of essential data that is required for the analysis.

get_SNR(savgol_data, noise)[source]

Calculate the signal-to-noise ratio as a simple ratio. The noise must not be 0, so any 0 value is replaced with the lowest non-zero value.

Parameters:
  • savgol_data (np.array) – The Savitzky-Golay smoothed data.

  • noise (np.array) – The noise data.
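A minimal sketch of the zero-protected ratio, assuming NumPy arrays of equal length. The helper name `snr_ratio` is hypothetical.

```python
import numpy as np

def snr_ratio(savgol_data, noise):
    """Hypothetical helper: element-wise signal/noise with zero-noise protection."""
    savgol_data = np.asarray(savgol_data, dtype=float)
    noise = np.asarray(noise, dtype=float).copy()
    nonzero = noise[noise != 0]
    if nonzero.size == 0:
        raise ValueError("noise contains no non-zero values")
    # Replace zeros with the lowest non-zero noise value to avoid division by zero.
    noise[noise == 0] = nonzero.min()
    return savgol_data / noise

snr = snr_ratio([2.0, 4.0, 6.0], [0.0, 0.5, 2.0])
```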

get_average_peak_dff()[source]

Calculate the average peak dff for each cell. The peak dff is calculated by taking the maximum value of the DFF signal of each transient. Then calculate the average of all the peak dffs.

Returns:

results (dict) – A dictionary where the keys are the unit_ids and the values are the average peak dffs.
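The calculation for a single cell can be sketched as follows. It assumes that transients are marked by a binary mask aligned with the DFF trace (similar in spirit to the E array); the helper name and the mask representation are assumptions for this example.

```python
from itertools import groupby

def average_peak(dff, event_mask):
    """Hypothetical helper: mean of the per-transient maxima of one trace."""
    peaks = []
    index = 0
    # Walk over consecutive runs of equal mask values.
    for is_event, run in groupby(event_mask):
        length = len(list(run))
        if is_event:
            # The peak of a transient is the maximum DFF value inside the run.
            peaks.append(max(dff[index:index + length]))
        index += length
    return sum(peaks) / len(peaks) if peaks else float("nan")

dff = [0.0, 1.0, 3.0, 0.5, 0.0, 2.0, 5.0, 1.0]
mask = [0, 1, 1, 1, 0, 0, 1, 1]
avg = average_peak(dff, mask)
```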

get_cell_ids(group_id, verified=False)[source]

Get the cell ids for the group id.

Parameters:
  • group_id (str) – The group id to extract the cell ids from.

  • verified (bool) – If True, only extract the verified cells.

get_filtered_C()[source]

This function will filter the C data array by multiplying it with the normalized S data array. This has the effect of removing non-event related signals from the C data array.

get_mad(id=None)[source]

Get the median absolute deviation.

Parameters:

id (int) – The unit_id of the cell for which the MAD should be calculated. If None, then the MAD will be calculated for all cells.
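For reference, the median absolute deviation of a trace is the median of the absolute deviations from the median. The sketch below uses only the standard library and applies no scaling factor; whether the method scales the result is not stated here.

```python
from statistics import median

def mad(values):
    """Median absolute deviation (unscaled) of a sequence of numbers."""
    center = median(values)
    return median([abs(v - center) for v in values])

# The outlier 100 barely affects the result, unlike a standard deviation.
result = mad([1, 2, 3, 4, 100])
```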

get_mean_iei_per_cell(transient_frames, cell_id, total_transients, frame_rate=None)[source]

Calculate the mean inter-event interval for a single cell. The mean inter-event interval is calculated by taking the difference between the start of each transient. The mean is then calculated from the differences.

Parameters:
  • transient_frames (xr.DataArray) – The start of each transient for all cells taken as an output of get_mean_iei().

  • cell_id (int) – The cell for which we want to calculate the mean inter-event interval

  • total_transients (xr.DataArray) – This contains the total number of transients for each cell. The number of transients should correspond to the number of 1s in the frames array. If the length of frames is 1 less than the number of transients, then we can assume that the first transient starting at frame 0 was missed by the diff operation in get_transient_frames().

get_noise(savgol_data, id, params={})[source]

The noise is estimated by taking the absolute difference between the DFF data and the Savitzky-Golay smoothed signal. A rolling window is then applied, in which the mean, median or maximum value is taken.

Parameters:
  • savgol_data (np.array) – The Savitzky-Golay smoothed data.

  • id (int) – The unit_id of the cell for which the noise should be calculated.
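A rough sketch of the rolling-window estimate using only the standard library. The parameter names `window` and `method`, the centered window and the shrinking window at the edges are all assumptions; the actual keys accepted in params are not documented here.

```python
from statistics import mean, median

REDUCERS = {"mean": mean, "median": median, "max": max}

def rolling_noise(dff, smoothed, window=3, method="mean"):
    """Hypothetical helper: rolling statistic of |dff - smoothed|."""
    residual = [abs(d - s) for d, s in zip(dff, smoothed)]
    reduce = REDUCERS[method]
    half = window // 2
    out = []
    for i in range(len(residual)):
        # Centered window, truncated at the start and end of the trace.
        lo, hi = max(0, i - half), min(len(residual), i + half + 1)
        out.append(reduce(residual[lo:hi]))
    return out

noise = rolling_noise([1.0, 2.0, 4.0, 2.0], [1.0, 1.0, 2.0, 2.0], window=3, method="max")
```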

get_savgol(id, params={})[source]

Apply the Savitzky-Golay filter to the DFF signal. The result is used to estimate the noise.

Parameters:
  • id (int) – The unit_id of the cell for which the Savitzky-Golay filter should be calculated.

  • params (dict) –

    A dictionary that contains the parameters for the Savitzky-Golay filter. The parameters are:

    • win_len (int, optional) – The length of the filter window. Must be an odd integer. Default is 10.

    • poly_order (int, optional) – The polynomial order. Default is 2.

    • deriv (int, optional) – The order of the derivative to compute. Default is 0.

    • delta (float, optional) – The spacing of the samples to which the filter will be applied. Default is 1.0.

    • mode (str, optional) – The mode parameter for the savgol_filter function. Default is “interp”.
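The params keys above line up naturally with the arguments of scipy.signal.savgol_filter. The sketch below shows that mapping on a synthetic trace; the mapping itself is an assumption about how the dictionary is consumed, and an odd window of 11 is used to satisfy the odd-length requirement.

```python
import numpy as np
from scipy.signal import savgol_filter

# Keys follow the documented params; the window length 11 is chosen for this example.
params = {"win_len": 11, "poly_order": 2, "deriv": 0, "delta": 1.0, "mode": "interp"}

# A quadratic trace: a filter with polynomial order 2 should reproduce it exactly.
x = np.arange(50, dtype=float)
quadratic = 0.5 * x**2 - 3.0 * x + 1.0

smoothed = savgol_filter(
    quadratic,
    window_length=params["win_len"],
    polyorder=params["poly_order"],
    deriv=params["deriv"],
    delta=params["delta"],
    mode=params["mode"],
)
```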

get_timestep(type)[source]

Return a list of the frames where the event of the given type (e.g. ALP) occurs.

Parameters:

type (str) – The type of event to extract the timesteps from.

get_transient_frames(unit_ids=None)[source]

Get the inter-event interval. The approach is as follows: the diff of the E array gives the rising edges, so the start of each transient has a value of 1. The inter-event intervals are then derived by taking the corresponding frame numbers and performing another diff on them.
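The two diff steps can be sketched with NumPy on a binary event trace for one cell. The array below is made up, and the optional correction for a transient starting at frame 0 is an assumption added for illustration, since a plain diff cannot see a rising edge at the first frame.

```python
import numpy as np

# Binary event trace for one cell; 1 marks frames inside a transient.
E = np.array([1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0])

# First diff: a value of 1 marks a rising edge; +1 converts to the frame where the transient starts.
starts = np.flatnonzero(np.diff(E) == 1) + 1

# Optional correction (assumption): a transient already active at frame 0 has no rising edge.
starts_with_first = np.insert(starts, 0, 0) if E[0] == 1 else starts

# Second diff: inter-event intervals in frames between consecutive transient starts.
iei = np.diff(starts_with_first)
```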

get_transient_frames_iti_dict(unit_ids)[source]

Does the same thing as get_transient_frames() but returns two dictionaries. The first dictionary contains the unit_ids as keys and the values are the transient frames. The second dictionary contains the unit_ids as keys and the values are the inter-event intervals.

Parameters:

unit_ids (List[int]) – The list of unit_ids for which the inter-event interval should be calculated.

Returns:

  • frame_start (dict) – A dictionary where the keys are the unit_ids and the values are the start of each transient.

  • iti (dict) – A dictionary where the keys are the unit_ids and the values are the inter-event intervals.

load_cell_groups()[source]

Load cell groups from session_group_ids.json if it exists.

Called during __init__ to auto-load existing groups when opening a session. Converts JSON format {group_name: [cell_ids]} to internal format {cell_id: [group_ids]}. Populates self.cell_ids_to_groups.
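The format conversion amounts to inverting a mapping of lists. A minimal sketch, with a hypothetical helper name and a made-up JSON payload:

```python
import json

def groups_to_cells(saved):
    """Hypothetical helper: invert {group_name: [cell_ids]} into {cell_id: [group_ids]}."""
    cell_to_groups = {}
    for group_name, cell_ids in saved.items():
        for cell_id in cell_ids:
            cell_to_groups.setdefault(cell_id, []).append(group_name)
    return cell_to_groups

# Example payload in the on-disk layout described above.
saved = json.loads('{"A": [1, 2], "B": [2, 3]}')
cell_ids_to_groups = groups_to_cells(saved)
```

The same inversion, applied in the opposite direction, yields the JSON layout again.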

load_data(config_path)[source]

Load the data from the data path specified in the config file.

Parameters:

config_path (str) – The path to the configuration file that contains the paths to the minian, behavior and video files.

merge_cells(cell_ids)[source]

Merge the cells in the list of cell ids by averaging both their spatial footprints and temporal activities. The previous C, S, A, YrA, DFF and E arrays are backed up before the merge is performed. The E array will drop the cell ids that are not in the list of cell ids to merge and it will change the verified status to 0 for the merged cell id.

Parameters:

cell_ids (list) – List of cell ids to merge.

prune_rejected_cells(cells)[source]

Prune the cells that have been rejected from the list of cells.

reject_cells(cells)[source]

Set the good_cells array to 0 for the cells in the list.

remove_cell_id_group(cell_id_group)[source]

Remove the cell ids from the group id.

Parameters:

cell_id_group (list) – List of cell ids to remove from the group id

remove_missed(ids)[source]

Removes the missed cells from the data. The missed cells are identified by their missed_id.

Parameters:

ids (List[int]) – A list of missed_ids that should be removed.

save_cell_groups()[source]

Save current cell groups to session_group_ids.json in the session’s data folder.

Called automatically whenever groups are added/removed in the GUI. Converts internal format {cell_id: [group_ids]} to JSON format {group_name: [cell_ids]}.

Returns:

str – Path to the saved file

unit_id_consistency()[source]

This function will check if the unit_ids are consistent across all data arrays. If not, it will drop the inconsistent unit_ids from all data arrays. This will be achieved by taking the intersection of all unit_ids and then filtering the data arrays.
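The intersection-then-filter idea can be sketched with plain dictionaries standing in for the data arrays, keyed by unit_id. With xarray the filtering step would presumably be a selection along the unit_id dimension; that detail is an assumption and is not shown here.

```python
# Stand-ins for data arrays: each maps unit_id -> some per-cell value.
data = {
    "A": {1: "a1", 2: "a2", 3: "a3"},
    "C": {1: "c1", 2: "c2"},
    "S": {2: "s2", 1: "s1", 4: "s4"},
}

# Unit ids present in every array.
common = set.intersection(*(set(arr) for arr in data.values()))

# Keep only the shared unit ids, in a consistent order, in every array.
filtered = {
    name: {unit_id: arr[unit_id] for unit_id in sorted(common)}
    for name, arr in data.items()
}
```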

update_and_save_E(unit_id, spikes, update_type='Accept Incoming Only')[source]

Update the E array with the final peaks and save it to the minian file.

Parameters:
  • unit_id (int) – The unit_id of the cell for which the E array should be updated.

  • spikes (Union[list, np.ndarray]) – The final peaks that should be added to the E array.

  • update_type (str, optional) –

    The type of update that should be performed. The options are:

    • Accept Incoming Only: Only accept the incoming spikes and ignore any overlapping spikes.

    • Accept Overlapping Only: Accept all spikes including overlapping spikes.

    • Accept All: Accept all spikes and set the E array to 1 for all the spikes.

class caltrig.core.backend.Event(event_type, data, timesteps)[source]

Bases: object

An event in this context refers to external behavioral events, such as RNFs, ALPs, ILPs, etc… This class also contains various methods to extract relevant information for each Event.

Parameters:
  • event_type (str) – The type of behavioral event, e.g. “ALP”, “ILP”, “RNF”, etc…

  • data (xr.DataArray) – The data array that contains all the CNMF output related data.

  • timesteps (List[int]) – A list of timesteps where the event occurs.

get_interval_section(event_frame, duration, delay=0.0, interval=100, type='C')[source]

Return the selection of the data that is within the given time frame.

Parameters:
  • event_frame (int) – Frame at which the event occurs

  • duration (float) – Duration of the event in seconds

  • delay (float) – Specifies how much time from the event frame should be included in the selection. If delay is positive, then the selection will start from the event frame + delay. If delay is negative, then the selection will start from the event frame - delay.

  • interval (int) – The interval at which the data should be sampled. This is in milliseconds.

  • type (str) – Specifies which data type to extract from the minian file. Default is “C”.

get_section(event_frame, duration, delay=0.0, type='C')[source]

Return the selection of the data that is within the given time frame. duration indicates the number of frames.

Parameters:
  • event_frame (int) – event time stamp

  • duration (float) – last time (seconds)

  • delay (float) – before or after (seconds)

set_values()[source]

Update the values dictionary with the values of the event data and the corresponding windows.

caltrig.core.backend.delete_xarray(dpath, var_name='M')[source]

Delete the specified xarray DataArray by removing the zarr file.

The function serves as a convenience method to deal with “Missing” DataArrays. It will be necessary to call this whenever all missing cells have been removed.

Parameters:
  • dpath (str) – The path to the zarr file that should be deleted.

  • var_name (str, optional) – The name of the DataArray that should be deleted. By default “M”, as we expect this to be the missing data array.

caltrig.core.backend.open_minian(dpath, post_process=None, return_dict=True)[source]

Taken from https://github.com/denisecailab/minian/blob/f64c456ca027200e19cf40a80f0596106918fd09/minian/utilities.py#L278. The current version of minian has outdated dependencies and is not compatible with this project.

Load an existing minian dataset.

If dpath is a file, then it is assumed that the full dataset is saved as a single file, and this function will directly call xarray.open_dataset() on dpath. Otherwise, if dpath is a directory, it is assumed that the dataset is saved as a directory of zarr arrays, as produced by save_minian(). This function will then iterate through all the directories under dpath and load them as xr.DataArray with the zarr backend, so it is important that the user makes sure every directory under dpath can be loaded this way. The loaded arrays will be combined as either an xr.Dataset or a dict. Optionally, a user-supplied custom function can be used to post-process the resulting xr.Dataset.

Parameters:
  • dpath (str) – The path to the minian dataset that should be loaded.

  • post_process (Callable, optional) – User-supplied function to post process the dataset. Only used if return_dict is False. Two arguments will be passed to the function: the resulting dataset ds and the data path dpath. In other words the function should have signature f(ds: xr.Dataset, dpath: str) -> xr.Dataset. By default None.

  • return_dict (bool, optional) – Whether to combine the DataArray as a dictionary, where the .name attribute will be used as key. Otherwise the DataArray will be combined using xr.merge(…, compat="no_conflicts"), which will implicitly align the DataArray over all dimensions, so it is important to make sure the coordinates are compatible and will not result in creation of large NaN-padded results. Only used if dpath is a directory, otherwise an xr.Dataset is always returned. By default True.

Returns:

ds (Union[dict, xr.Dataset]) – The resulting dataset. If return_dict is True it will be a dict, otherwise a xr.Dataset.

See also

xarray.open_zarr

for how each directory will be loaded as xr.DataArray

xarray.merge

for how the xr.DataArray will be merged as xr.Dataset

caltrig.core.backend.overwrite_xarray(varr, dpath, retrieve=False)[source]

Save an xarray DataArray to a zarr file.

This function creates a temporary zarr file in the same directory as the existing zarr file, and then renames the temporary file to the original. This is a workaround: saving the zarr file directly to the original path raised errors, and loading the zarr array into memory first caused the same errors.

Parameters:
  • varr (xr.DataArray) – The xarray DataArray that should be saved.

  • dpath (str) – The path to the zarr file that should be saved.

  • retrieve (bool, optional) – Whether the saved xarray DataArray should be read from the zarr file. By default False.

Returns:

arr (xr.DataArray) – The saved xarray DataArray. It will be identical to the input varr, but it will be read from the new zarr file.
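The write-to-temporary-then-rename pattern can be sketched with the standard library alone, treating a zarr store as a directory. The helper `replace_store`, the ".tmp" suffix and the demo writer are hypothetical; the real function writes an xr.DataArray rather than a text file.

```python
import shutil
import tempfile
from pathlib import Path

def replace_store(target: Path, write_store):
    """Hypothetical helper: write to a sibling temporary directory, then swap it in."""
    tmp = target.with_name(target.name + ".tmp")
    if tmp.exists():
        shutil.rmtree(tmp)
    write_store(tmp)  # caller-supplied function that creates the new store at tmp
    if target.exists():
        shutil.rmtree(target)  # the old store is removed only after the new one is written
    tmp.rename(target)
    return target

def write_demo(path: Path):
    # Stand-in for saving an array: create the store directory with one chunk file.
    path.mkdir()
    (path / "chunk0").write_text("new")

with tempfile.TemporaryDirectory() as root:
    store = Path(root) / "C.zarr"
    store.mkdir()
    (store / "chunk0").write_text("old")
    replace_store(store, write_demo)
    content = (store / "chunk0").read_text()
    leftover = (Path(root) / "C.zarr.tmp").exists()
```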

caltrig.core.backend.save_minian(var, dpath, meta_dict=None, overwrite=False, chunks=None, compute=True, mem_limit='500MB')[source]

Taken from https://github.com/denisecailab/minian/blob/f64c456ca027200e19cf40a80f0596106918fd09/minian/utilities.py#L440. The current version of minian has outdated dependencies and is not compatible with this project, hence the function has been copied here.

Save a xr.DataArray with zarr storage backend following minian conventions.

This function will store arbitrary xr.DataArray into dpath with zarr backend. A separate folder will be created under dpath, with folder name var.name + “.zarr”. Optionally metadata can be retrieved from directory hierarchy and added as coordinates of the xr.DataArray. In addition, an on-disk rechunking of the result can be performed using rechunker.rechunk() if chunks are given.

Parameters:
  • var (xr.DataArray) – The array to be saved.

  • dpath (str) – The path to the minian dataset directory.

  • meta_dict (dict, optional) – How metadata should be retrieved from directory hierarchy. The keys should be negative integers representing directory level relative to dpath (so -1 means the immediate parent directory of dpath), and values should be the name of dimensions represented by the corresponding level of directory. The actual coordinate value of the dimensions will be the directory name of corresponding level. By default None.

  • overwrite (bool, optional) – Whether to overwrite the result on disk. By default False.

  • chunks (dict, optional) – A dictionary specifying the desired chunk size. The chunk size should be specified using the dask array chunks convention, except the “auto” specification is not supported. The rechunking operation will be carried out with on-disk algorithms using rechunker.rechunk(). By default None.

  • compute (bool, optional) – Whether to compute var and save it immediately. By default True.

  • mem_limit (str, optional) – The memory limit for the on-disk rechunking algorithm, passed to rechunker.rechunk(). Only used if chunks is not None. By default “500MB”.

Returns:

var (xr.DataArray) – The array representation of saving result. If compute is True, then the returned array will only contain delayed task of loading the on-disk zarr arrays. Otherwise all computation leading to the input var will be preserved in the result.

Examples

The following will save the variable var to directory /spatial_memory/alpha/learning1/minian/important_array.zarr, with the additional coordinates: {“session”: “learning1”, “animal”: “alpha”, “experiment”: “spatial_memory”}.

>>> save_minian(
...     var.rename("important_array"),
...     "/spatial_memory/alpha/learning1/minian",
...     {-1: "session", -2: "animal", -3: "experiment"},
... )
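To make the meta_dict convention concrete, the sketch below derives the coordinates of the example above from the directory hierarchy using only pathlib. The helper `coords_from_path` is hypothetical and only mirrors the documented behavior; it is not the lookup code used by save_minian().

```python
from pathlib import PurePosixPath

def coords_from_path(dpath, meta_dict):
    """Hypothetical helper: map negative directory levels to coordinate values."""
    # Level -1 is the immediate parent directory of dpath, -2 its parent, and so on.
    parents = PurePosixPath(dpath).parent.parts
    return {dim: parents[level] for level, dim in meta_dict.items()}

coords = coords_from_path(
    "/spatial_memory/alpha/learning1/minian",
    {-1: "session", -2: "animal", -3: "experiment"},
)
```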