Annotating a BIDS dataset
What is an annotation?
Annotation refers to metadata that is directly associated with data. Without adequate annotation, your valuable shared data may be of limited use to other researchers and to you in the future.
While the BIDS requirements for annotation are limited, BIDS supports a framework for inserting comprehensive data annotation at several levels in the dataset. For example, BIDS supports annotations of events and subject characteristics using Hierarchical Event Descriptors (HED), an infrastructure and controlled vocabulary for producing standardized machine-actionable annotations.
This tutorial provides a step-by-step process for data annotation in the BIDS framework.
Annotations in BIDS can be done at several levels, including the dataset, subjects, sessions, scans, and events.
Required BIDS annotation files
Dataset sourcing (dataset_description.json)
dataset_description.json is a top-level file that gives details about the source of the dataset, funding, and citation information.
This file does not provide any actual description of the data.
You can fill in this blank dataset description template or use it as a guide.
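As a rough illustration of what such a file holds, the sketch below builds a dataset description in Python and serializes it to JSON. Name and BIDSVersion are the two fields BIDS requires; the other keys are optional BIDS fields. Every value shown, and the helper missing_required_fields, is a placeholder made up for this example.

```python
import json

# Name and BIDSVersion are the two fields BIDS requires; the rest are optional.
# All values are placeholders for illustration, not real metadata.
dataset_description = {
    "Name": "Example visual oddball dataset",
    "BIDSVersion": "1.8.0",
    "License": "CC0",
    "Authors": ["First Author", "Second Author"],
    "Funding": ["Placeholder grant number"],
    "HowToAcknowledge": "Please cite the placeholder reference below.",
    "ReferencesAndLinks": ["Placeholder citation"],
}


def missing_required_fields(description):
    """Return the required keys that are absent or empty (hypothetical helper)."""
    required = ("Name", "BIDSVersion")
    return [key for key in required if not description.get(key)]


# This text is what would be saved as dataset_description.json
# in the root directory of the dataset.
description_text = json.dumps(dataset_description, indent=4)
print(description_text)
print("Missing required fields:", missing_required_fields(dataset_description))
```

A check like this only catches missing fields; a full BIDS validator covers much more.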
Dataset description (README)
The README file is a top-level text file that gives the actual overview of the dataset.
A comprehensive README is essential for users of your data.
You can edit the README template with the vital information needed for others to analyze your dataset.
Subject annotations
Annotations at the subject level can be done in the participants.tsv file, which is a top-level tab-separated value file that provides subject information such as age, sex, and handedness.
Each subject in the dataset should have a row in participants.tsv.
Each type of metadata is provided in a column in this file, and the nature of the column data is described in the top-level participants.json file.
Other subject information such as diagnosis or group may be provided in participants.tsv and its corresponding participants.json file.
Any such information makes your data more valuable to users.
You can edit the participants.tsv template and the corresponding participants.json template to provide this information.
If the dataset includes multiple sets of participant-level measurements, see the BIDS guidelines for adding phenotypic and assessment data.
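The relationship between participants.tsv and participants.json can be sketched as follows. The participant rows, the group column, and the two helper functions are hypothetical. The point is only that every column and every categorical value in the table should have a matching entry in the sidecar.

```python
import csv
import io
import json

# Hypothetical participants.tsv content; "n/a" marks a missing value.
participants_tsv = (
    "participant_id\tage\tsex\thandedness\tgroup\n"
    "sub-01\t34\tF\tR\tcontrol\n"
    "sub-02\t29\tM\tL\tpatient\n"
    "sub-03\tn/a\tF\tR\tcontrol\n"
)

# Hypothetical participants.json content describing each column.
participants_json = {
    "age": {"Description": "Age of the participant", "Units": "years"},
    "sex": {
        "Description": "Sex of the participant",
        "Levels": {"F": "female", "M": "male"},
    },
    "handedness": {
        "Description": "Dominant hand of the participant",
        "Levels": {"R": "right", "L": "left"},
    },
    "group": {
        "Description": "Study group of the participant",
        "Levels": {"control": "healthy control", "patient": "diagnosed patient"},
    },
}

rows = list(csv.DictReader(io.StringIO(participants_tsv), delimiter="\t"))


def undocumented_columns(rows, sidecar):
    """Return table columns that have no entry in the sidecar."""
    columns = rows[0].keys()
    return [c for c in columns if c != "participant_id" and c not in sidecar]


def undocumented_levels(rows, sidecar):
    """Return (column, value) pairs whose value is not listed under Levels."""
    problems = []
    for column, info in sidecar.items():
        levels = info.get("Levels")
        if levels is None:
            continue  # numeric or free-text column, nothing to check
        for row in rows:
            value = row[column]
            if value != "n/a" and value not in levels:
                problems.append((column, value))
    return problems


print(json.dumps(participants_json["sex"], indent=4))
print("Undocumented columns:", undocumented_columns(rows, participants_json))
print("Undocumented levels:", undocumented_levels(rows, participants_json))
```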
Session annotations
At the session level, the optional sessions.tsv and sessions.json files can be used to add annotations that apply to an entire session.
Scans/run annotations
At the scans or run level, the optional scans.tsv and scans.json files can be used to add annotations that apply to an entire run.
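As a small sketch of run-level annotation, the snippet below parses a scans file whose first column, filename, identifies each recording. The file names, the acquisition times, and the operator_note column are invented for illustration. A sessions file works the same way, with session_id as its first column.

```python
import csv
import io

# Hypothetical content of a scans file for one subject.
# operator_note is a made-up, dataset-specific column.
scans_tsv = (
    "filename\tacq_time\toperator_note\n"
    "eeg/sub-01_task-oddball_run-1_eeg.set\t2020-05-01T10:15:00\tn/a\n"
    "eeg/sub-01_task-oddball_run-2_eeg.set\t2020-05-01T10:32:00\t"
    "participant reported drowsiness\n"
)

scans = list(csv.DictReader(io.StringIO(scans_tsv), delimiter="\t"))

# Collect the runs that carry a note, keyed by recording file.
notes = {
    row["filename"]: row["operator_note"]
    for row in scans
    if row["operator_note"] != "n/a"
}

for filename, note in notes.items():
    print(f"{filename}: {note}")
```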
Event annotations
Why is event annotation necessary?
Events provide the crucial linkage between what happens in the experiment and the data itself. Without the information provided by the dataset events, many types of datasets cannot be analyzed.
Beyond marking experimental stimuli, participant responses, instructions, and feedback, events can also mark the initiation and termination of tasks and experimental conditions.
BIDS event infrastructure
Events in BIDS are marked by providing events.tsv files associated with data recordings.
These tab-separated files have one row for each event marker and columns holding information about that event.
BIDS minimum requirements
BIDS requires that events.tsv files have an onset column marking the time that the event occurred, in seconds relative to the start of the correspondingly named data recording file.
The events.tsv files must also have a duration column indicating the duration of the event in seconds.
At the present time, many datasets model events as instantaneous and use n/a in the duration column.
Usually, events.tsv files have additional columns containing information about the events. Optional columns include sample, trial_type, response_time, value, and HED.
The events.tsv files may contain an arbitrary number of additional columns.
All the optional columns are dataset-specific and will be meaningless to dataset users without additional documentation.
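The minimum requirements described above can be checked with a few lines of code. The sketch below is not a substitute for a BIDS validator. The event rows and the functions read_events and check_events are hypothetical, and the check covers only the onset and duration columns.

```python
import csv
import io

# Hypothetical events.tsv content: two required columns plus two optional ones.
events_tsv = (
    "onset\tduration\ttrial_type\tresponse_time\n"
    "1.20\t0.5\ttarget\t0.43\n"
    "3.75\t0.5\tstandard\tn/a\n"
    "6.10\tn/a\tbutton_press\tn/a\n"
)


def read_events(text):
    """Parse tab-separated event text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def check_events(rows):
    """Return a list of problems with the required onset and duration columns."""
    if not rows:
        return ["no event rows"]
    problems = [
        f"missing required column: {column}"
        for column in ("onset", "duration")
        if column not in rows[0]
    ]
    if problems:
        return problems
    for number, row in enumerate(rows, start=1):
        try:
            float(row["onset"])
        except ValueError:
            problems.append(f"row {number}: onset is not a number")
        duration = row["duration"]
        if duration == "n/a":
            continue  # duration not available, which is permitted
        try:
            if float(duration) < 0:
                problems.append(f"row {number}: duration is negative")
        except ValueError:
            problems.append(f"row {number}: duration is not a number")
    return problems


print(check_events(read_events(events_tsv)))
```

The optional trial_type and response_time columns pass through unchecked here, which mirrors the point made above: their meaning depends entirely on the dataset.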
BIDS allows, but does not require, documentation about the meanings of the events.tsv file columns in similarly named events.json files, referred to as JSON sidecars.
Text descriptions of events
The BIDS JSON sidecar format accommodates text descriptions of the meanings and contents of event file columns in the Description and Levels keys.
At a minimum, good text descriptions of the event file columns are needed in order for users to use the data correctly.
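A sidecar that uses the Description and Levels keys might look like the sketch below. The column names, the level values, and all description text are invented for this example. The describe function is a hypothetical helper showing how a user could look up what a value means.

```python
import json

# Hypothetical events.json sidecar with text descriptions only.
events_sidecar = {
    "trial_type": {
        "Description": "Category of the event marked by this row.",
        "Levels": {
            "target": "A rare stimulus that the participant should respond to.",
            "standard": "A frequent stimulus that the participant should ignore.",
            "button_press": "The participant pressed the response button.",
        },
    },
    "response_time": {
        "Description": "Time from stimulus onset to the button press.",
        "Units": "s",
    },
}


def describe(column, value, sidecar):
    """Return the description of a value, falling back to the column description."""
    info = sidecar.get(column, {})
    fallback = info.get("Description", "undocumented")
    return info.get("Levels", {}).get(value, fallback)


print(json.dumps(events_sidecar, indent=4))
print(describe("trial_type", "target", events_sidecar))
print(describe("response_time", "0.43", events_sidecar))
```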
Machine actionable annotation with HED
The difficulty with just providing text descriptions of the event file columns and their contents is that users will usually be required to write custom code to use your data.
BIDS supports Hierarchical Event Descriptors (HED), which is an infrastructure and a controlled vocabulary that allows you to annotate your events in a manner that can be used directly by tools.
Remember: Most users will not be able to work with your dataset without having meaningful information about the dataset events.
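To make the contrast with text descriptions concrete, the sketch below attaches HED strings to the values of a trial_type column through the HED key of a sidecar, then selects events by tag without any dataset-specific code. The events and the tag choices are illustrative and have not been validated against a schema. The two functions are hypothetical helpers, and the comma splitting assumes flat HED strings without parenthesized groups. Real annotations should be validated and searched with the HED tools.

```python
# Hypothetical sidecar fragment: each trial_type value maps to a HED string.
# The tag choices are illustrative and have not been validated.
hed_sidecar = {
    "trial_type": {
        "HED": {
            "target": "Sensory-event, Experimental-stimulus, "
                      "Visual-presentation, Target",
            "standard": "Sensory-event, Experimental-stimulus, "
                        "Visual-presentation, Non-target",
            "button_press": "Agent-action, Participant-response, Press",
        }
    }
}

# Hypothetical event rows, as they might be read from an events.tsv file.
events = [
    {"onset": "1.20", "trial_type": "target"},
    {"onset": "3.75", "trial_type": "standard"},
    {"onset": "6.10", "trial_type": "button_press"},
]


def tags_for_event(event, sidecar):
    """Collect the HED tags the sidecar assigns to one event row.

    Assumes flat, comma-separated HED strings (no parenthesized groups).
    """
    tags = []
    for column, info in sidecar.items():
        hed_string = info.get("HED", {}).get(event.get(column), "")
        tags.extend(t.strip() for t in hed_string.split(",") if t.strip())
    return tags


def onsets_with_tag(events, sidecar, tag):
    """Return the onsets of all events annotated with the given tag."""
    return [e["onset"] for e in events if tag in tags_for_event(e, sidecar)]


print(onsets_with_tag(events, hed_sidecar, "Sensory-event"))
print(onsets_with_tag(events, hed_sidecar, "Press"))
```

Because the selection is written against tags rather than against this dataset's column values, the same query could in principle be reused on another dataset annotated with the same vocabulary.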
Additional information
See Task events and Appendix III: Hierarchical Event Descriptors in the BIDS specification for an overview of events before getting started with your own annotation.
The next section provides an overview of the event annotation process and links to helpful guides and tutorials with the details.
The event annotation process
The goal of event annotation is to provide information about events needed for effective and correct data analysis.
Ideally, most of this information should be in a single events.json sidecar file located in the root directory of your dataset, where it is easy to find and update.
An overview of how event annotation works in BIDS as well as tutorials about using available online tools to facilitate annotation can be found in the BIDS annotation quickstart.
There are several online tools available at HED Tools Online to help you during this process:
- You can extract a ready-to-fill-in JSON sidecar template from a representative events.tsv file in your BIDS dataset. A step-by-step tutorial for doing this can be found in the Create a JSON template tutorial.
- Once you have a template, you can start editing the template directly, or you can convert the template to a spreadsheet and edit your annotations in Excel or another tool. Instructions for doing this are available in the Spreadsheet templates tutorial.
This process and these templates make it convenient to provide basic descriptions as well as HED tags for your dataset events.
A HED annotation quickstart outlines a step-by-step process for selecting HED tags during the annotation process.
HED schemas
The HED tags used to annotate data come from a controlled vocabulary called a HED schema. A HED schema is a structured vocabulary of terms consisting of top-level tags representing general categories in this vocabulary. Each top-level tag is the root of a tree containing tags falling into that category. This structure allows detailed and accurate annotation of events, machine validation of the annotations, and event description-based search across data collected in various studies.
The rules for HED schema vocabularies and HED-compliant tools can be found in the HED Specification.
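The tree structure described above is what makes both validation and category-based search possible. The sketch below uses a toy tree loosely modeled on the standard schema. The nesting is heavily abbreviated and should not be taken as the actual schema contents. find_path and matches are hypothetical helpers, not part of any HED tool.

```python
# Toy vocabulary tree: each key is a tag, each value holds its child tags.
# Abbreviated for illustration; consult the HED schema for the real hierarchy.
toy_schema = {
    "Event": {
        "Sensory-event": {},
        "Agent-action": {},
        "Data-feature": {},
    },
    "Property": {
        "Sensory-property": {
            "Sensory-presentation": {
                "Visual-presentation": {},
                "Auditory-presentation": {},
            },
        },
    },
}


def find_path(tree, tag, prefix=()):
    """Return the path from a top-level tag down to tag, or None if absent."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == tag:
            return path
        found = find_path(children, tag, path)
        if found is not None:
            return found
    return None


def matches(tag, query, tree):
    """True if tag is the query term itself or falls anywhere beneath it."""
    path = find_path(tree, tag)
    return path is not None and query in path


# Validation: a tag is acceptable only if it appears in the vocabulary.
print(find_path(toy_schema, "Sensory-event"))
print(find_path(toy_schema, "Not-a-tag"))

# Search: a query for a general category also finds its more specific tags.
print(matches("Visual-presentation", "Sensory-presentation", toy_schema))
print(matches("Visual-presentation", "Event", toy_schema))
```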
HED library schemas
The HED standard schema contains basic terms that are common across most human neuroimaging, behavioral, and physiological experiments. HED library schemas extend the standard schema with structured vocabularies of terms specific to particular research fields. This allows the HED vocabulary to expand in a scalable manner to support specialized data annotations, for instance, of electrophysiological events (HED-SCORE) or language stimuli (LISA).
Additional details about particular schemas can be found on the HED schemas documentation page. See the HED schema developer’s guide to begin developing your own library schema or to contribute to existing HED vocabularies.