Usage
Installation
To use seapipe, first install it using pip:
(.venv) $ pip install seapipe
Data Preparation and Setup
Seapipe is a BIDS-standard data processing pipeline, so the data must be organised according to the BIDS specification for seapipe to run properly.
Running seapipe transforms the raw data and saves both intermediate and final results. Therefore, the original raw data must be kept separate from the outputs (derivatives).
This is done by placing the raw dataset inside a directory named rawdata within the root directory.
For example, an EEG data file should follow the structure ~/rootdir/rawdata/sub-01/ses-01/eeg/sub-01_ses-01_task-sleep_acq-PSG_eeg.edf
An example of the data structure looks like this:
my_project-1/
├─ rawdata/
│  ├─ sub-01/
│  │  ├─ ses-01/
│  │  │  └─ eeg/
│  │  │     ├─ sub-01_ses-01_task-sleep_acq-PSG_eeg.edf
│  │  │     ├─ sub-01_ses-01_task-sleep_acq-PSG_eeg.json
│  │  │     ├─ sub-01_ses-01_task-sleep_acq-PSG_events.tsv    *optional for seapipe*
│  │  │     └─ sub-01_ses-01_task-sleep_acq-PSG_channels.tsv  *optional for seapipe*
│  │  ├─ ses-02/
│  │  └─ ...
│  ├─ sub-02/
│  ├─ ...
│  ├─ dataset_description.json
│  └─ participants.tsv
├─ derivatives/
│  ├─ seapipe/
│  └─ ...
└─ tracking.tsv
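The BIDS file names above are built from key-value entities (sub, ses, task, acq) joined by underscores and followed by a suffix and an extension. The sketch below shows that naming convention in plain Python; bids_eeg_path is a hypothetical helper written for illustration and is not part of seapipe.

```python
from pathlib import Path


def bids_eeg_path(root, sub, ses, task, acq=None, suffix="eeg", ext=".edf"):
    """Build a BIDS-style path for an EEG file inside <root>/rawdata/.

    Hypothetical helper for illustration; not a seapipe function.
    """
    # BIDS file names are key-value entities joined by underscores,
    # in a fixed order, followed by a suffix and an extension.
    entities = [f"sub-{sub}", f"ses-{ses}", f"task-{task}"]
    if acq is not None:
        entities.append(f"acq-{acq}")
    filename = "_".join(entities + [suffix]) + ext
    return Path(root) / "rawdata" / f"sub-{sub}" / f"ses-{ses}" / "eeg" / filename


path = bids_eeg_path("my_project-1", "01", "01", "sleep", acq="PSG")
print(path.as_posix())
# my_project-1/rawdata/sub-01/ses-01/eeg/sub-01_ses-01_task-sleep_acq-PSG_eeg.edf
```

The same helper produces the sidecar names by changing suffix and ext, e.g. suffix="channels", ext=".tsv".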
Creating a pipeline
To begin, open Python and import seapipe:
(.venv) $ python
>>> from seapipe import pipeline
Then initialise the pipeline by specifying the path to your dataset:
>>> project_name = pipeline('~/my_project-1/')
Checking your dataset
Before running any analyses, it is important to check your data. For seapipe to run properly, the data needs to be organised in the Brain Imaging Data Structure (BIDS) (see Data Preparation and Setup).
In addition, seapipe is tightly integrated with the Wonambi package, so any annotations (sleep scoring, artefact markings, etc.) must be stored in a Wonambi annotations (.xml) file. For more information, see Annotations file.
To get an overview of your dataset, including whether each participant's directory is BIDS-compatible and how many sessions, recordings (e.g. EDFs) and annotation files it contains, call the audit() method of the pipeline:
>>> project_name.audit()
Summary:
2 files, 2.80 GB
Subjects: 2
Sessions: 2
2024-12-02 18:35:54 - Audit - The dataset appears compatible for SEAPIPE analysis.
         BIDS?  #sessions  #recordings
sub-001  False  2          1            !!
sub-003  False  1          1
This overview is also saved automatically to a file named dataset_audit.csv.
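To illustrate the kind of check the audit performs, here is a simplified sketch using only the standard library. It assumes recordings are .edf files stored in sub-*/ses-*/eeg/ folders; audit_rawdata is a hypothetical function and does not reproduce seapipe's actual audit() logic or output.

```python
from pathlib import Path


def audit_rawdata(rawdata, rec_ext=".edf"):
    """Count sessions and recordings per subject in a BIDS-like rawdata dir.

    Hypothetical sketch for illustration; not seapipe's implementation.
    """
    summary = {}
    for sub_dir in sorted(Path(rawdata).glob("sub-*")):
        if not sub_dir.is_dir():
            continue
        sessions = [d for d in sub_dir.glob("ses-*") if d.is_dir()]
        recordings = list(sub_dir.glob(f"ses-*/eeg/*{rec_ext}"))
        summary[sub_dir.name] = {
            "sessions": len(sessions),
            "recordings": len(recordings),
            # Flag subjects whose session and recording counts differ,
            # e.g. a session folder with no EEG file in it.
            "flag": len(sessions) != len(recordings),
        }
    return summary
```

For example, audit_rawdata("my_project-1/rawdata") returns one dictionary per subject; a subject with two session folders but only one .edf file would be flagged, similar to the "!!" marker above.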
To retrieve a list of all the files inside the root directory, along with the two directory levels immediately above each file (its parent and grandparent), use the list_dataset() method:
>>> project_name.list_dataset()
Directory: project/bids
Files = ['dataset_description.json', 'participants.tsv']
----------
Directory: ses-01/eeg
Files = ['sub-001_ses-01_eeg.edf']
----------
Directory: ses-02/eeg
Files = ['sub-001_ses-02_eeg.edf']
----------
Directory: ses-01/eeg
Files = ['sub-002_ses-01_eeg.edf']
----------
etc.
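A rough standard-library equivalent of this listing can be written with os.walk. The function list_files_by_parent is hypothetical, and its label format only approximates the output shown above.

```python
import os


def list_files_by_parent(root):
    """Group files under `root` by their two nearest parent directories.

    Hypothetical stdlib sketch; seapipe's own output format may differ.
    """
    listing = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        if not filenames:
            continue
        parts = os.path.normpath(dirpath).split(os.sep)
        label = "/".join(parts[-2:])  # e.g. 'ses-01/eeg'
        listing.append((label, sorted(filenames)))
    return listing


for label, files in list_files_by_parent("my_project-1"):
    print(f"Directory: {label}")
    print(f"Files = {files}")
    print("-" * 10)
```

Keeping only the last two path components is what makes entries from different subjects appear with the same label (e.g. ses-01/eeg), as in the output above.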
To retrieve a table of all the analyses that have been run (located in <root_dir>/OUT/), use the track() method:
>>> project_name.track(subs='tracking.tsv',
...                    step=['staging', 'spindles', 'so', 'fooof'],
...                    chan=['Fz (eeg)'],
...                    outfile='progress.csv')
This outputs a table showing which of the requested steps have been run for the subjects, sessions and channels specified:
2024-12-02 18:42:41 - Tracking - Slow oscillation detection has NOT been run.
         ses               staging      spindle      slow_osc  fooof
sub-001  [ses-V1, ses-V2]  [ses-V1, -]  [ses-V1, -]  [-]       [ses-V1, -]
sub-003  [ses-V1]          [ses-V1]     [ses-V1]     [-]       [-]
sub-004  [-]               [-]          [-]          [-]       [-]
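The idea behind this table can be sketched as a check of which expected sessions have output for each step. Purely for illustration, the sketch assumes each step writes to <out_dir>/<step>/<sub>/<ses>/; progress_table and that folder layout are hypothetical and may not match seapipe's real output structure.

```python
from pathlib import Path


def progress_table(out_dir, sessions, steps):
    """Mark, per subject and step, which sessions have an output folder.

    Hypothetical sketch: assumes outputs live in <out_dir>/<step>/<sub>/<ses>/.
    `sessions` maps each subject to the sessions it should have.
    """
    table = {}
    for sub, sub_sessions in sessions.items():
        row = {"ses": list(sub_sessions) or ["-"]}
        for step in steps:
            # '-' stands in for a session whose output folder is missing.
            done = [
                ses if (Path(out_dir) / step / sub / ses).is_dir() else "-"
                for ses in sub_sessions
            ]
            row[step] = done or ["-"]
        table[sub] = row
    return table


sessions = {"sub-001": ["ses-V1", "ses-V2"], "sub-004": []}
for sub, row in progress_table("OUT", sessions, ["staging", "spindles"]).items():
    print(sub, row)
```

Listing sessions positionally, with '-' for gaps, mirrors entries such as [ses-V1, -] in the table above.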
Tracking File
Uniformity in EEG electrode placement is crucial for ensuring consistent signal capture, minimizing artifacts, and improving comparability across recordings. EEG measures scalp electrical activity, meaning even slight variations in electrode positioning can alter recorded signals, affecting amplitude and frequency analyses, and source localization accuracy. Unlike MRI, which provides high-resolution brain images and allows for spatial normalization to a common template, EEG lacks a direct post hoc standardization method, making uniform electrode placement essential.
This has led to standardised systems of electrode placement, most notably the international 10-20 system.
Consequently, each EEG time series is associated with a source electrode, and because EEG measures differences in electrical potential, reference channels are also needed.
However, despite this uniformity in electrode placement, not all recording software uses the same EEG configuration (e.g. channel names, online references, sampling frequencies). This can cause headaches when running pipeline analyses across datasets that are inconsistent in these parameters.
One way seapipe gets around this is by using a tracking file. This file can be in .tsv or .xlsx format, but it must be named tracking.tsv (or tracking.xlsx) and placed in the root directory of the dataset (see Data Preparation and Setup).
Its structure should look like this:
sub     ses    loff   lon    format  chanset1          chanset1_rename  refset1
sub-01  ses-1  330    31500  .edf    F3, C3            F3, C3           M1, M2
sub-01  ses-2  4320   32390  .edf    F3, C3            F3, C3           M1, M2
sub-02  ses-1  1900   29945  .edf    F3 (A2), C3 (A2)  F3, C3           A1, A2
sub-02  ses-2  670    31010  .edf    F3 (A2), C3 (A2)  F3, C3           A1, A2
...
As you can see, there are some inconsistencies in this dataset:
- sub-01 has channels named ‘F3’ and ‘C3’, whereas sub-02 has channels named ‘F3 (A2)’ and ‘C3 (A2)’.
- sub-01 has references named ‘M1’ and ‘M2’, whereas sub-02 has references named ‘A1’ and ‘A2’.
- All subjects and sessions have different lights-off (loff) and lights-on (lon) times, corresponding to the time in bed.
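To illustrate how a tracking file can resolve these inconsistencies, the sketch below parses the .tsv form of the table above with Python's csv module and builds, for each (sub, ses) pair, a mapping from recorded channel names to harmonised names, plus the list of references. parse_tracking is a hypothetical helper, not seapipe's own reader, and it does not handle the .xlsx form.

```python
import csv
import io


def parse_tracking(handle):
    """Parse a tab-separated tracking table into per-(sub, ses) settings.

    Hypothetical sketch based on the columns shown above; not seapipe's reader.
    """
    def split_list(cell):
        # Multi-channel cells are comma-separated, e.g. 'F3 (A2), C3 (A2)'.
        return [c.strip() for c in cell.split(",") if c.strip()]

    settings = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        chans = split_list(row["chanset1"])
        renamed = split_list(row["chanset1_rename"])
        settings[(row["sub"], row["ses"])] = {
            "loff": float(row["loff"]),
            "lon": float(row["lon"]),
            # Map each recorded channel name to its harmonised name.
            "rename": dict(zip(chans, renamed)),
            "refs": split_list(row["refset1"]),
        }
    return settings


example = (
    "sub\tses\tloff\tlon\tformat\tchanset1\tchanset1_rename\trefset1\n"
    "sub-02\tses-1\t1900\t29945\t.edf\tF3 (A2), C3 (A2)\tF3, C3\tA1, A2\n"
)
print(parse_tracking(io.StringIO(example))[("sub-02", "ses-1")]["rename"])
# {'F3 (A2)': 'F3', 'C3 (A2)': 'C3'}
```

To read a real file, pass an open handle instead, e.g. with open('tracking.tsv', newline='') as f: settings = parse_tracking(f).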
Once this tracking file is in place, seapipe can read parameters such as channel names from it. To do this, set:
chan = None
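The behaviour of chan = None can be pictured as a simple fallback: an explicit channel list is used as given, otherwise the names come from the tracking file entry for that subject and session. resolve_channels and the dictionary layout below are hypothetical and only sketch the idea.

```python
def resolve_channels(chan, tracking, sub, ses):
    """Return the channel names to analyse for one subject and session.

    Hypothetical helper: an explicit `chan` list is used as given; with
    chan=None the names are looked up in `tracking`, a dict mapping
    (sub, ses) to the channel names listed in the chanset1 column.
    """
    if chan is not None:
        return list(chan)
    if (sub, ses) not in tracking:
        raise KeyError(f"No tracking entry for {sub}, {ses}")
    return list(tracking[(sub, ses)])


# Channel names as they appear in the chanset1 column of the example table.
tracking = {
    ("sub-01", "ses-1"): ["F3", "C3"],
    ("sub-02", "ses-1"): ["F3 (A2)", "C3 (A2)"],
}

print(resolve_channels(None, tracking, "sub-02", "ses-1"))
# ['F3 (A2)', 'C3 (A2)']
print(resolve_channels(["F3"], tracking, "sub-02", "ses-1"))
# ['F3']
```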
** Coming soon ** A function to read channel information from the channels.tsv file of a BIDS dataset.
Contributing and troubleshooting
If you run into any issues using seapipe, notice that any documentation is incorrect, or would like to suggest functions to be implemented, please either:
- raise an issue on GitHub, or
- contact me directly: nathan.cross.90@gmail.com