Staging

Overview

Sleep staging is the process of dividing a sleep recording into standard physiological states, typically Wake, NREM1, NREM2, NREM3, and REM, across consecutive epochs. In most PSG-style workflows, the recording is scored in 30-second windows using EEG together with EOG and EMG, because each stage has characteristic patterns such as alpha attenuation at sleep onset, spindles and K-complexes in NREM2, high-amplitude slow waves in NREM3, and rapid eye movements with reduced muscle tone in REM.

Traditionally, sleep staging is done manually by a trained scorer who reviews the recording epoch by epoch and assigns stages according to published criteria such as AASM rules. Manual scoring is still the reference standard because it allows contextual judgment when signals are noisy or borderline, but it is slow, labour-intensive, and introduces scorer variability. Automatic staging uses algorithms or machine-learning models to assign stages from the same signals. These methods are much faster and scale well to large datasets, but their quality depends on signal quality, channel availability, and how similar the new data are to the data the model was developed on. In practice, automated staging is often used either as a first-pass scorer or as a way to generate staging that is later reviewed and corrected manually.

Sleep staging is used both as a primary outcome and as a structural framework for downstream analysis. At the macro level, it lets you quantify sleep architecture: total sleep time, sleep efficiency, sleep latency, REM latency, wake after sleep onset (WASO), time spent in each stage, stage proportions, fragmentation, and sleep cycles. At the microstructural or dynamical level, it supports analyses of stage transitions, bout durations, transition entropy, stability of stages, and hypnogram similarity across nights or participants. It is also essential for event-based analyses, because many phenomena of interest, such as spindles, slow oscillations, rapid eye movements, phase-amplitude coupling (PAC), and power spectra, are interpreted differently depending on the stage in which they occur. Staging is therefore both a descriptive summary of sleep organization and the backbone for most stage-specific sleep neurophysiology analyses.
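To make the macro-level metrics concrete, here is a minimal sketch of how a few of them can be derived from a list of epoch labels. It is not seapipe code: the function name `sleep_architecture`, the 30-second epoch length, and the choice to count WASO only up to the last sleep epoch are assumptions made for illustration.

```python
EPOCH_SEC = 30  # assumed epoch length in seconds
SLEEP_STAGES = {"NREM1", "NREM2", "NREM3", "REM"}


def sleep_architecture(hypnogram, epoch_sec=EPOCH_SEC):
    """Return basic architecture metrics from a list of stage labels.

    Assumes the hypnogram spans the whole time in bed (lights off to lights on).
    Returns None if no sleep was scored.
    """
    to_min = epoch_sec / 60
    sleep_idx = [i for i, s in enumerate(hypnogram) if s in SLEEP_STAGES]
    if not sleep_idx:
        return None
    onset, last = sleep_idx[0], sleep_idx[-1]
    total_sleep = len(sleep_idx) * to_min
    time_in_bed = len(hypnogram) * to_min
    # Assumption: WASO counts Wake epochs between sleep onset and the last sleep epoch
    waso = sum(1 for s in hypnogram[onset:last + 1] if s == "Wake") * to_min
    rem_idx = [i for i in sleep_idx if hypnogram[i] == "REM"]
    return {
        "total_sleep_time_min": total_sleep,
        "sleep_latency_min": onset * to_min,
        "waso_min": waso,
        "sleep_efficiency_pct": 100 * total_sleep / time_in_bed,
        "rem_latency_min": (rem_idx[0] - onset) * to_min if rem_idx else None,
    }


# Toy 10-minute recording (20 epochs of 30 s)
hyp = (["Wake"] * 4 + ["NREM1"] * 2 + ["NREM2"] * 6
       + ["Wake"] * 2 + ["NREM3"] * 4 + ["REM"] * 2)
metrics = sleep_architecture(hyp)
print(metrics)
```

Definitions of WASO and REM latency vary slightly between labs, so the boundaries used here should be checked against your own scoring conventions.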

Seapipe supports staging that has been performed externally (e.g., manually) and can also perform automatic sleep staging based on previously published algorithms:

Vallat & Walker (2021)

Sleep ECG - TO DO

U-Sleep (2021) - TO DO

SE-Res-UNet (2025) - TO DO

Functions to automatically score staging

Detecting sleep stages will involve these functions:

Detect sleep stages:

project.detect_sleep_stages()

This will copy the Annotations file from every /sub-XXX/ses-XXX in <xml_dir> to <root_dir>/derivatives/staging/ and write in the detected stages.

Summarise hypnogram dynamics (TIDE):

project.tide()

This will read the scored hypnogram from each Annotations file and export transition matrices, bout-duration summaries, and group-level hypnogram similarity metrics.

Detect stages

Command line argument:

project.detect_sleep_stages(xml_dir = None,
                            out_dir = None,
                            subs = 'all',
                            sessions = 'all',
                            filetype = '.edf',
                            method = 'Vallat2021',
                            qual_thresh = 0.5,
                            eeg_chan = None,
                            ref_chan = None,
                            eog_chan = None,
                            emg_chan = None,
                            rater = None,
                            invert = False,
                            outfile = True)

Positional arguments:

xml_dir (str)

Path to the directory with sub-directories /sub-XXX/ses-XXX containing the input Annotations files.

Default is None which will point to <root_dir>/derivatives/staging/ (Annotations files with sleep stage markings and arousal/artefact events).

out_dir (str)

Output path for the .xml file containing the detected sleep stages

Default is None which will point to <root_dir>/derivatives/staging/

subs (str, NoneType or list)

Subject IDs to analyze

Acceptable options:

Default is 'all' which will point to all the sub-XXX/ directories in <root_dir>/rawdata/

Entering None will point seapipe to the sub column in the tracking file

Entering a list of sub IDs (e.g., ['sub-01', 'sub-02']) will result in detections for those subjects only

sessions (str, NoneType or list)

Session IDs to analyse per subject

Acceptable options:

Default is 'all' which will point to all the ses-XXX/ directories within the sub-XXX/ directories in <root_dir>/rawdata/

Entering None will point seapipe to the ses column in the tracking file

Entering a list of ses visits (e.g., ['ses-V1', 'ses-V2']) will result in detections for those session(s) within each subject only

filetype (str)

Format of files containing EEG signal

Acceptable options:

Default is '.edf' format

The pipeline can also read .eeg, .set formats

method (str)

Automated staging algorithm to use.

Acceptable options:

Currently only 'Vallat2021' is supported.

qual_thresh (float)

Quality threshold. Any epoch whose predicted stage has a confidence lower than this threshold will be set to 'Undefined' for further manual review.
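The thresholding rule can be sketched as follows. This is an illustration only, not the seapipe implementation: it assumes the staging model returns one confidence value per epoch, and the function name `apply_quality_threshold` is hypothetical.

```python
def apply_quality_threshold(stages, confidence, qual_thresh=0.5):
    """Replace low-confidence stage labels with 'Undefined'.

    stages:      list of predicted stage labels, one per epoch
    confidence:  list of prediction confidences in [0, 1], one per epoch
    """
    if len(stages) != len(confidence):
        raise ValueError("stages and confidence must have the same length")
    # Epochs strictly below the threshold are flagged for manual review
    return [s if c >= qual_thresh else "Undefined"
            for s, c in zip(stages, confidence)]


predicted = ["Wake", "NREM1", "NREM2", "REM"]
conf = [0.92, 0.41, 0.50, 0.77]
print(apply_quality_threshold(predicted, conf))
```

Raising qual_thresh leaves more epochs as 'Undefined', which shifts work to the manual reviewer but reduces the risk of accepting uncertain predictions.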

eeg_chan (NoneType or str or list)

EEG channel to use for sleep stage detection

Acceptable options:

Default is None which will point to the stagechan columns in the tracking file

Entering a list of channel names (e.g., ['Fz', 'Cz']) will use only the selected channels for staging (see NOTE in section Channel Names)

ref_chan (NoneType or list)

Reference channel(s) for the channels of interest (e.g., mastoid A1 or A2 or joint mastoids)

Acceptable options:

Default is None which will point to the refset columns in the tracking file. NOTE: If the tracking file does not exist, or it contains no refset columns, the channels will not be re-referenced!

Entering a list of channel names (e.g., ['A1', 'A2']) will re-reference to these channels

Entering an empty list (i.e., []) will perform no re-referencing
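For readers unfamiliar with re-referencing, the sketch below shows the general idea: the mean of the reference channels is subtracted from every channel, sample by sample. It is a generic illustration under the assumption that signals are held as plain lists keyed by channel name; the function `rereference` is hypothetical and is not how seapipe stores or processes data internally.

```python
def rereference(signals, ref_chan):
    """Subtract the mean of the reference channels from every channel.

    signals:  dict mapping channel name -> list of samples (equal lengths)
    ref_chan: list of reference channel names; an empty list means
              no re-referencing is performed
    """
    if not ref_chan:
        return {name: list(samples) for name, samples in signals.items()}
    n_ref = len(ref_chan)
    # Sample-wise mean across the reference channels
    reference = [sum(vals) / n_ref
                 for vals in zip(*(signals[c] for c in ref_chan))]
    return {name: [x - r for x, r in zip(samples, reference)]
            for name, samples in signals.items()}


raw = {"Cz": [10.0, 12.0], "A1": [2.0, 4.0], "A2": [4.0, 6.0]}
linked = rereference(raw, ["A1", "A2"])   # joint mastoids
unchanged = rereference(raw, [])          # no re-referencing
print(linked["Cz"])
```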

eog_chan (NoneType or str or list)

EOG channel to use for sleep stage detection

Acceptable options:

Default is None which will point to the eog columns in the tracking file

Entering a str containing a channel name (e.g., 'EOGr') will use that channel (see NOTE in section Channel Names)

Entering a list of channel names (e.g., ['EOGl', 'EOGr']) will use all the named channels

emg_chan (NoneType or str or list)

EMG channel to use for sleep stage detection

Acceptable options:

Default is None which will point to the emg columns in the tracking file

Entering a str containing a channel name (e.g., 'EMG1') will use that channel (see NOTE in section Channel Names)

Entering a list of channel names (e.g., ['EMG1', 'EMG2']) will use all the named channels

rater (NoneType or list)

Name of the rater in the Annotations file to save the detections under

Acceptable options:

Default is None which will discard the name of the rater.

Note

This assumes there is only one rater per Annotations file (.xml). Make sure the file does not contain multiple raters.

Entering a list of rater names (e.g., ['Rater1', 'Rater2']) will save the detected stages only under those raters in the Annotations file

invert (NoneType or logical)

Option to invert polarity

Acceptable options:

Default is False which will keep the polarity of the recording as it is

Entering None will point to the chanset_invert columns in the tracking file. If the tracking file does not specify chanset_invert columns, the detection will default to False

Entering True will reverse (flip) the polarity of the recording

outfile (str or logical)

Logging of detection

Acceptable options:

Default is True which will create a logfile in <root_dir>/derivatives/audit/logs/

Entering a string <custom_outfile_name.txt> will save the logfile under that custom name

Entering False won’t save a logfile

TIDE: Transitions, Intervals, and Dynamics of Epochs

TIDE reads sleep stages from the staging .xml annotations and exports three families of hypnogram-derived metrics:

Transition matrices
- Row-normalised stage-to-stage transition probabilities.
- resolution='complete' exports a 5 x 5 matrix over ['Wake', 'NREM1', 'NREM2', 'NREM3', 'REM'] by default.
- resolution='reduced' collapses NREM1/NREM2/NREM3 into a single NREM class and exports a 3 x 3 matrix over Wake, NREM, and REM.
Stage bout duration distributions
- Bout durations are calculated from contiguous runs of the same stage.
- Duration-valued columns are reported in minutes and include:
  - stage_mean_bout_dur_min_*
  - stage_median_bout_dur_min_*
  - stage_p75_bout_dur_min_*
- Additional per-stage bout metrics include:
  - stage_skew_bout_dur_*
  - stage_num_bouts_*
  - stage_prop_short_bouts_* (proportion of bouts shorter than 2 min)
- The same six bout metrics are also exported for *_all_stages (all selected stages pooled together).
Hypnogram similarity
- hyp_sim_epoch: epoch-by-epoch agreement after aligning hypnograms from sleep onset.
- hyp_sim_kappa: Cohen’s kappa on the aligned hypnograms.
- hyp_sim_transition_corr: correlation between subject-level transition matrices.

TIDE Outputs

By default TIDE separates subject/session outputs from group summaries:

Subject/session outputs are written to <root_dir>/derivatives/hypnogram/sub-XXX/ses-XXX/
- *_tide_transition_matrix_complete.csv
- *_tide_transition_counts_complete.csv
- *_tide_transition_matrix_reduced.csv
- *_tide_transition_counts_reduced.csv
- *_tide_stage_duration_distributions.csv
Group-level outputs are written to <root_dir>/derivatives/datasets/hypnogram/
- tide_transition_matrix_complete_summary.csv
- tide_transition_matrix_reduced_summary.csv
- tide_stage_duration_distributions_summary.csv
- hyp_sim_epoch.csv
- hyp_sim_kappa.csv
- hyp_sim_transition_corr.csv
- hypnogram_similarity_manifest.csv

Run TIDE from the pipeline

Command line argument:

project.tide(xml_dir = None,
             out_dir = None,
             subject_out_dir = None,
             subs = 'all',
             sessions = 'all',
             stage = None,
             rater = None,
             resolution = 'complete',
             analyses = 'all',
             keyword = None,
             outfile = True)

Positional arguments:

xml_dir (str)

Path to the directory with sub-directories /sub-XXX/ses-XXX containing the staging Annotations files.

Default is None which will point to the staging derivatives directory resolved by seapipe (usually <root_dir>/derivatives/staging/ or <root_dir>/derivatives/staging_auto/ depending on what exists).

out_dir (str)

Output path for group-level .csv summaries.

Default is None which will point to <root_dir>/derivatives/datasets/hypnogram/

subject_out_dir (str)

Output path for subject/session-level files.

Default is None which will point to <root_dir>/derivatives/hypnogram/

subs (str, NoneType or list)

Subject IDs to analyse

Acceptable options:

Default is 'all' which will analyse all sub-XXX/ directories in xml_dir

Entering None will point seapipe to the sub column in the tracking file

Entering a list of subject IDs (e.g., ['sub-01', 'sub-02']) will analyse those subjects only

sessions (str, NoneType or list)

Session IDs to analyse per subject

Acceptable options:

Default is 'all' which will analyse all ses-XXX/ directories within each sub-XXX/ directory in xml_dir

Entering None will point seapipe to the ses column in the tracking file

Entering a list of session IDs (e.g., ['ses-V0', 'ses-V1']) will analyse those sessions only

stage (NoneType, str or list)

Stages to include in the calculations.

Acceptable options:

Default is None which will use ['Wake', 'NREM1', 'NREM2', 'NREM3', 'REM']

Entering a single stage name (e.g., 'REM') or a list of stage names restricts the calculations to those stages

rater (NoneType or str)

Name of the rater in the Annotations file to read staging from

Acceptable options:

Default is None which will select the first rater found in the file

Entering a rater name (e.g., 'Vallat2021') will read staging only from that rater

resolution (str)

Resolution of the transition matrix export.

Acceptable options:

'complete' (default) exports a 5 x 5 matrix

'reduced' exports a 3 x 3 Wake/NREM/REM matrix

analyses (str or list)

Which TIDE analyses to run.

Acceptable options:

Default is 'all' which runs:

'transition_matrix'

'stage_duration_distributions'

'hypnogram_similarity'

Entering one string (e.g., 'hypnogram_similarity') runs only that analysis

Entering a list of names runs the selected subset

keyword (NoneType or str)

Optional substring used to select the correct .xml file when more than one annotations file exists in a subject/session folder.

outfile (str or logical)

Logging of the analysis

Acceptable options:

Default is True which will create a logfile in <root_dir>/derivatives/audit/logs/

Entering a string <custom_outfile_name.txt> will save the logfile under that custom name

Entering False won’t save a logfile

Examples

Run the full TIDE workflow:

project.tide()

Run only the reduced transition matrices:

project.tide(analyses = 'transition_matrix',
             resolution = 'reduced')

Run TIDE on a subset of subjects/sessions and read a specific rater:

project.tide(subs = ['sub-01', 'sub-02'],
             sessions = ['ses-V0'],
             rater = 'Vallat2021')

TIDE metric details

Transition matrices

*_tide_transition_matrix_complete.csv and *_tide_transition_matrix_reduced.csv contain row-normalised probabilities.
Each row sums to 1 within the selected state space.
*_tide_transition_counts_complete.csv and *_tide_transition_counts_reduced.csv contain the underlying transition counts used to calculate those probabilities.
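The relationship between the counts and the row-normalised probabilities can be sketched as below. This is an illustrative reimplementation, not the TIDE source: the function name is hypothetical, self-transitions (staying in the same stage) are assumed to be counted, and epochs with labels outside the state space are simply dropped before counting.

```python
from collections import Counter

STAGES = ["Wake", "NREM1", "NREM2", "NREM3", "REM"]
REDUCED = {"Wake": "Wake", "NREM1": "NREM", "NREM2": "NREM",
           "NREM3": "NREM", "REM": "REM"}


def transition_matrix(hypnogram, resolution="complete"):
    """Return (states, counts, probabilities) for epoch-to-epoch transitions."""
    if resolution == "reduced":
        # Collapse NREM1/NREM2/NREM3 into a single NREM class
        labels = [REDUCED[s] for s in hypnogram if s in REDUCED]
        states = ["Wake", "NREM", "REM"]
    else:
        labels = [s for s in hypnogram if s in STAGES]
        states = STAGES
    pair_counts = Counter(zip(labels[:-1], labels[1:]))
    counts = [[pair_counts[(a, b)] for b in states] for a in states]
    probs = []
    for row in counts:
        total = sum(row)
        # Rows for stages that never occur as a source are left as zeros
        probs.append([c / total if total else 0.0 for c in row])
    return states, counts, probs


hyp = ["Wake", "Wake", "NREM1", "NREM2", "NREM2",
       "NREM3", "NREM2", "REM", "REM", "Wake"]
states, counts, probs = transition_matrix(hyp, resolution="reduced")
for state, row in zip(states, probs):
    print(state, row)
```

Each non-empty row of `probs` is the matching row of `counts` divided by its row total, which is why the rows sum to 1.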

Bout duration metrics

A bout is a contiguous run of epochs with the same stage label.
stage_mean_bout_dur_min_*: arithmetic mean bout duration in minutes.
stage_median_bout_dur_min_*: median bout duration in minutes.
stage_p75_bout_dur_min_*: 75th percentile of the bout duration distribution in minutes.
stage_skew_bout_dur_*: skewness of the bout duration distribution.
stage_num_bouts_*: number of bouts for that stage.
stage_prop_short_bouts_*: proportion of bouts shorter than 2 minutes.
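As a rough sketch of how these bout metrics follow from a hypnogram, the example below extracts contiguous runs with `itertools.groupby` and summarises them. The function names, the 30-second epoch length, and the linear-interpolation percentile are assumptions for illustration; skewness is omitted because the estimator used by TIDE is not stated here.

```python
from itertools import groupby
from statistics import mean, median


def bout_durations(hypnogram, stage, epoch_sec=30):
    """Durations (in minutes) of contiguous runs of `stage`."""
    return [sum(1 for _ in run) * epoch_sec / 60
            for label, run in groupby(hypnogram) if label == stage]


def percentile(values, q):
    """Percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def bout_summary(durations, short_min=2.0):
    """Summarise a non-empty list of bout durations given in minutes."""
    return {
        "mean_bout_dur_min": mean(durations),
        "median_bout_dur_min": median(durations),
        "p75_bout_dur_min": percentile(durations, 75),
        "num_bouts": len(durations),
        # Proportion of bouts strictly shorter than `short_min` minutes
        "prop_short_bouts": sum(d < short_min for d in durations) / len(durations),
    }


hyp = (["NREM2"] * 6 + ["Wake"] + ["NREM2"] * 2 + ["NREM3"] * 3
       + ["NREM2"] * 10 + ["REM"] * 4 + ["NREM2"] * 2)
n2_bouts = bout_durations(hyp, "NREM2")
n2_summary = bout_summary(n2_bouts)
print(n2_bouts, n2_summary)
```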

Whole-hypnogram metrics

p_stay_same_stage: proportion of epoch-to-epoch transitions that stay in the same stage.
stage_shift_rate: number of stage changes divided by the total number of scored epochs in the selected stage set.
transition_entropy: entropy of the transition probability structure across the selected stages.
num_sleep_cycles: approximate number of NREM-to-REM cycles, estimated from bout structure.
rem_first_half_prop: proportion of epochs scored as REM in the first half of the sleep period.
rem_second_half_prop: proportion of epochs scored as REM in the second half of the sleep period.
delta_n3_early_late_ratio: ratio of NREM3 proportion in the first half versus the second half of the sleep period.
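A few of these whole-hypnogram metrics can be sketched directly from their definitions. The example is hedged: the function name is hypothetical, the input list is assumed to already be restricted to the sleep period, and the REM proportions are read as "fraction of epochs in that half scored as REM". The entropy and sleep-cycle metrics are left out because their exact formulas are not given here.

```python
def whole_hypnogram_metrics(hypnogram):
    """Compute simple stability metrics from a list of stage labels."""
    if len(hypnogram) < 2:
        raise ValueError("need at least two scored epochs")
    pairs = list(zip(hypnogram[:-1], hypnogram[1:]))
    n_changes = sum(a != b for a, b in pairs)
    half = len(hypnogram) // 2
    first, second = hypnogram[:half], hypnogram[half:]
    return {
        # Share of consecutive epoch pairs with an unchanged stage
        "p_stay_same_stage": (len(pairs) - n_changes) / len(pairs),
        # Stage changes per scored epoch
        "stage_shift_rate": n_changes / len(hypnogram),
        "rem_first_half_prop": first.count("REM") / len(first),
        "rem_second_half_prop": second.count("REM") / len(second),
    }


hyp = ["NREM1", "NREM2", "NREM2", "NREM3", "NREM3",
       "NREM2", "REM", "REM", "REM", "Wake"]
whole = whole_hypnogram_metrics(hyp)
print(whole)
```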

Hypnogram similarity

hyp_sim_epoch.csv: pairwise epoch-by-epoch agreement between hypnograms, aligned from sleep onset.
hyp_sim_kappa.csv: pairwise Cohen’s kappa between aligned hypnograms.
hyp_sim_transition_corr.csv: pairwise correlation between subject-level transition matrices.
hypnogram_similarity_manifest.csv: manifest listing the subject/session IDs included in the similarity matrices and the number of epochs used from sleep onset.
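The two epoch-level similarity measures can be illustrated for a single pair of hypnograms as follows. This is a sketch under stated assumptions rather than the TIDE code: sleep onset is taken as the first non-Wake scored sleep epoch, both hypnograms are truncated to the shorter length after alignment, and all function names are hypothetical.

```python
from collections import Counter

SLEEP_STAGES = {"NREM1", "NREM2", "NREM3", "REM"}


def align_from_onset(hyp_a, hyp_b):
    """Trim both hypnograms to start at sleep onset and share one length."""
    def onset(hyp):
        return next((i for i, s in enumerate(hyp) if s in SLEEP_STAGES), len(hyp))
    a, b = hyp_a[onset(hyp_a):], hyp_b[onset(hyp_b):]
    n = min(len(a), len(b))
    return a[:n], b[:n]


def epoch_agreement(a, b):
    """Proportion of aligned epochs with identical labels."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(a)
    p_obs = epoch_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_exp = sum(count_a[k] * count_b[k] for k in count_a) / n ** 2
    if p_exp == 1:
        return float("nan")  # undefined when both contain a single identical label
    return (p_obs - p_exp) / (1 - p_exp)


night1 = ["Wake", "Wake", "NREM1", "NREM2", "NREM2", "NREM3", "REM", "REM"]
night2 = ["Wake", "NREM1", "NREM2", "NREM2", "NREM3", "NREM3", "REM", "Wake", "Wake"]
a, b = align_from_onset(night1, night2)
agreement = epoch_agreement(a, b)
kappa = cohens_kappa(a, b)
print(len(a), agreement, kappa)
```

Kappa is lower than raw agreement because part of the observed agreement is expected by chance, given how often each stage occurs in the two hypnograms.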

Lower-level TIDE class

Advanced users can call the lower-level class directly:

from seapipe.stats.tide import tide

T = tide(xml_dir,
         out_dir = None,
         stage = None,
         rater = None,
         subs = 'all',
         sessions = 'all',
         keyword = None,
         subject_out_dir = None)

T.transition_matrix(stage = None, resolution = 'complete')
T.stage_duration_distributions(stage = None)
T.hypnogram_similarity(stage = None)

Here:

out_dir controls the group-level outputs, while subject_out_dir controls the per-subject/per-session files.

The method-level stage argument can be used to override the stage list stored on the class instance.