# Pipeline

While you can use QuPath and cuisto functionalities as you see fit, a pipeline version of those exists. It requires a specific structure to store files (so that the different scripts know where to look for the data). It also requires that you have detections stored as geojson files, which can be achieved using a pixel classifier and further segmentation (see here), for example.

In the event you can't or don't want to follow the pipeline depicted below, but still want to batch-process animals, check the last section.

## Purpose

This is especially useful to perform quantification for several animals at once: you only need to specify the root directory and the animal identifiers that should be pooled together, instead of manually specifying each detections and annotations file.

Three main scripts and functions are used within the pipeline (see the scripts folder):

- `exportPixelClassifierProbabilities.groovy` to create prediction maps of objects of interest
- `segment_images.py` to segment those maps and create geojson files to be imported back into QuPath as detections
- `pipelineImportExport.groovy` to:
    - clear all objects
    - import ABBA regions
    - mirror regions names
    - import geojson detections (from `$folderPrefix$segmentation/$segTag$/geojson`)
    - add measurements to detections
    - add atlas coordinates to detections
    - add hemisphere to detections' parents
    - add regions measurements
        - count for punctual objects
        - cumulated length for line objects (illustrated below)
    - export detections measurements
        - as CSV for punctual objects
        - as JSON for lines
    - export annotations as CSV
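To make the "cumulated length" measurement concrete: for polyline objects it boils down to summing the Euclidean lengths of the segments joining consecutive points. Here is a minimal sketch of that computation (an illustration only, not cuisto's actual implementation):

```python
# Minimal sketch, not cuisto's implementation: the cumulated length of a
# polyline is the sum of the Euclidean lengths of its segments.
import numpy as np


def cumulated_length(points: np.ndarray) -> float:
    """Total length of a polyline given its (n_points, 2) xy coordinates."""
    segments = np.diff(points, axis=0)  # vector of each consecutive segment
    return float(np.sum(np.linalg.norm(segments, axis=1)))


# example: an L-shaped fiber of length 3 + 4 = 7
print(cumulated_length(np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0]])))
```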

## Directory structure

Following a specific directory structure ensures subsequent scripts and functions can find the required files. The good news is that this structure will mostly be created automatically by the segmentation scripts (from QuPath and Python), as long as you fill in the parameters of each script consistently. The structure expected by the Groovy all-in-one script and the cuisto batch-process function is the following:

```
some_directory/
├── AnimalID0/
│   ├── animalid0_qupath/
│   └── animalid0_segmentation/
│       └── segtag/
│           ├── annotations/
│           ├── detections/
│           ├── geojson/
│           └── probabilities/
├── AnimalID1/
│   ├── animalid1_qupath/
│   └── animalid1_segmentation/
│       └── segtag/
│           ├── annotations/
│           ├── detections/
│           ├── geojson/
│           └── probabilities/
```

Info

Except for the root directory and the QuPath project, everything is created automatically based on the parameters provided in the different scripts. Here's a description of the structure and its requirements:

- `animalid0` should be a convenient animal identifier.
- The hierarchy must be followed.
- The experiment root directory, `AnimalID0`, can be named anything but should correspond to one and only one animal.
- Subsequent occurrences of `animalid0` (in sub-folder names) should be lower case.
- `animalid0_qupath` can in practice be named as you wish, but should be the QuPath project.
- `animalid0_segmentation` should be called exactly like this, replacing `animalid0` with the actual animal ID. It will be created automatically by the `exportPixelClassifierProbabilities.groovy` script.
- `segtag` corresponds to the type of segmentation (cells, fibers...). It is specified in the `exportPixelClassifierProbabilities` script. It can be anything, but to determine whether the objects are polygons (which should be counted per region) or polylines (whose cumulated length should be measured), some keywords are hardcoded in the `segment_images.py` and `pipelineImportExport.groovy` scripts:
    - Cells-like, when you need measurements related to their shape (area, circularity...): cells, cell, polygons, polygon
    - Cells-like, when you consider them as punctual: synapto, synaptophysin, syngfp, boutons, points
    - Fibers-like (polylines): fibers, fiber, axons, axon
- `annotations` contains the atlas regions measurements as TSV files.
- `detections` contains the objects' atlas coordinates and measurements as CSV files (for punctual objects) or JSON files (for polyline objects).
- `geojson` contains objects stored as geojson files. They could be generated with the pixel classifier prediction map segmentation.
- `probabilities` contains the prediction maps to be segmented by the `segment_images.py` script.
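Because the pipeline scripts rely on this exact layout, it can be worth verifying it before launching a batch. Below is a minimal sketch of such a check; `check_structure` is a hypothetical helper (not part of cuisto), and it assumes the QuPath project folder follows the conventional `animalid_qupath` naming:

```python
# Hypothetical helper, not part of cuisto: warn about directories missing
# from the layout expected by the pipeline scripts.
from pathlib import Path


def check_structure(root_dir: str, animals: list[str], seg_tag: str) -> None:
    for animal in animals:
        animal_id = animal.lower()  # sub-folder names use the lower-case ID
        animal_dir = Path(root_dir) / animal
        seg_dir = animal_dir / f"{animal_id}_segmentation" / seg_tag
        expected = [
            animal_dir / f"{animal_id}_qupath",  # the QuPath project folder
            seg_dir / "annotations",
            seg_dir / "detections",
            seg_dir / "geojson",
            seg_dir / "probabilities",
        ]
        for path in expected:
            if not path.is_dir():
                print(f"[{animal}] missing: {path}")


check_structure("/path/to/some_directory", ["AnimalID0", "AnimalID1"], "cells")
```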

Tip

You can see an example minimal directory structure with only annotations stored in resources/multi.

## Usage

Tip

Remember that this is merely an example pipeline: you can shortcut it at any point, as long as you end up with TSV files that follow the requirements for cuisto.

1. Create a QuPath project.
2. Register your images on an atlas with ABBA and export the registration back to QuPath.
3. Get or create a pixel classifier, then export the prediction maps with the `exportPixelClassifierProbabilities.groovy` script.
4. Segment those maps with the `segment_images.py` script to generate the geojson files containing the objects of interest.
5. Run the `pipelineImportExport.groovy` script on your QuPath project.
6. Set up your configuration files.
7. Then, analysing your data with any number of animals should be as easy as executing these lines in Python (either directly from IPython or in a script you can easily run again later):
```python
import cuisto

# Parameters
wdir = "/path/to/some_directory"
animals = ["AnimalID0", "AnimalID1"]
config_file = "/path/to/your/config.toml"
output_format = "h5"  # to save the quantification values as hdf5 file

# Processing
cfg = cuisto.Config(config_file)
df_regions, dfs_distributions, df_coordinates = cuisto.process.process_animals(
    wdir, animals, cfg, out_fmt=output_format
)

# Display
cuisto.display.plot_regions(df_regions, cfg)
cuisto.display.plot_1D_distributions(dfs_distributions, cfg, df_coordinates=df_coordinates)
cuisto.display.plot_2D_distributions(df_coordinates, cfg)
```

Tip

You can see a live example in this demo notebook.

## Batch-process animals

It is still possible to process several subjects at once without using the directory structure specified above. The `cuisto.process.process_animals()` (plural) method is merely a wrapper around `cuisto.process.process_animal()` (singular). The former fetches the data from the expected locations; the latter is where the analysis actually happens. Therefore, it is possible to fetch your data yourself and feed it to `process_animal()`.

For example, say you used QuPath's Measure > Export measurements for each of your animals. For each individual, this builds a single file gathering measurements from all your images. Let's collect those individual files in a single directory called "results", and name the files in a consistent manner that identifies "Annotations" and "Detections" as well as the animal identifier, for instance "animal0_annotations.tsv".
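With such a naming scheme, the animal identifiers can even be recovered from the file names instead of being hard-coded. A minimal sketch, assuming the "results" directory and the naming convention described above:

```python
# Hypothetical snippet: infer animal IDs from the annotation file names,
# assuming files named "animalid_annotations.tsv" in a "results" directory.
from pathlib import Path

animals = sorted(
    p.name.removesuffix("_annotations.tsv")
    for p in Path("results").glob("*_annotations.tsv")
)
print(animals)  # e.g. ['animal0', 'animal1']
```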

Important

The configuration file is mandatory, even for single-animal analysis.

The script `batch_process_animals.py` located in `examples` mimics the `process_animals()` functionality.

batch_process_animals.py
"""
This example shows how to collect data from single animal when they were exported
individually from QuPath, resulting in a single file per animal, containing data from
all the image in the each project. Thus, there should be, for each animal, a file
corresponding to Annotations (brain regions) and Detections (objects of interest).

We assume all the pairs of files are located in the same directory, and their file name
is in the form : animalid_annotations.tsv and animalid_detections.tsv.

For fibers, a json file is required to store the coordinates of all the points making
a single fiber. Those would be generated with the exportFibersAtlasCoordinates.groovy
script. We assume all json files corresponding to one animal is stored in a
"animalid_detections" folder.

"""

# import required packages
import os

import cuisto
import pandas as pd
from tqdm import tqdm

# --- Parameters
input_dir = "/path/to/tsv/files"
animals = ("animalid0", "animalid1", "animalid2")
config_file = "/path/to/config/file.toml"

# --- Preparation
# load configuration
cfg = cuisto.Config(config_file)

# initialize lists
df_regions = []
dfs_distributions = []
df_coordinates = []

# --- Processing
pbar = tqdm(animals)

for animal in pbar:
    pbar.set_description(f"Processing {animal}")

    # read annotation for this animal
    df_annotations = pd.read_csv(
        os.path.join(input_dir, f"{animal}_annotations.tsv"),
        index_col="Object ID",
        sep="\t",
    )
    # read detections only to plot spatial distributions, otherwise set
    # df_detections = pd.DataFrame()
    # comment out for fibers
    df_detections = pd.read_csv(
        os.path.join(input_dir, f"{animal}_detections.tsv"),
        index_col="Object ID",
        sep="\t",
    )
    # uncomment for fibers
    # df_detections = cuisto.io.cat_json_dir(
    #     os.path.join(input_dir, f"{animal}_detections"),
    #     hemisphere_names=cfg.hemispheres["names"],  # we need it now for performance
    #     atlas=cfg.bg_atlas,
    # )

    # get results
    df_reg, dfs_dis, df_coo = cuisto.process.process_animal(
        animal,
        df_annotations,
        df_detections,
        cfg,
        compute_distributions=True,  # set to False if df_detections is empty
    )

    # collect results
    df_regions.append(df_reg)
    dfs_distributions.append(dfs_dis)
    df_coordinates.append(df_coo)

# concatenate all results
df_regions = pd.concat(df_regions, ignore_index=True)
dfs_distributions = [
    pd.concat(dfs_list, ignore_index=True) for dfs_list in zip(*dfs_distributions)
]
df_coordinates = pd.concat(df_coordinates, ignore_index=True)

# plot as usual -- animals will be pooled and the mean +/- sem will be shown
cuisto.display.plot_regions(df_regions, cfg)
cuisto.display.plot_1D_distributions(
    dfs_distributions, cfg, df_coordinates=df_coordinates
)
cuisto.display.plot_2D_distributions(df_coordinates, cfg)
```