# Developer notes
Notes on how the repository and database are organized, intended for new developers.
## Development workflow

New contributors should follow the Fork-and-Branch workflow; see GitHub's
instructions for details. Regular contributors may choose to follow the Feature
Branch workflow for features that will involve multiple contributors.
## Code organization

- Tables are grouped into schemas by topic (e.g., `common_metrics`).
- Schemas
    - Are defined in a `.py` file.
    - Correspond to MySQL 'databases'.
    - Are organized into modules (e.g., `common`) by folders.
- The `common` module
    - In principle, contains schemas that are shared across all projects.
    - In practice, contains shared tables (e.g., `Session`) and the first
      drafts of schemas that have since been split into their own
      modality-specific modules (e.g., `lfp`).
    - Should not be added to without discussion.
- A pipeline
    - Refers to a set of tables used for processing data of a particular
      modality (e.g., LFP, spike sorting, position tracking).
    - May span multiple schemas.
- For analyses that will be useful only to you, create your own schema.
## Types of tables

Spyglass uses DataJoint's default table tiers.

By convention, an individual pipeline has one or more of the following table
types:

- Common/Multi-pipeline table
- NWB ingestion table
- Parameters table
- Selection table
- Data table
- Merge table (see also the dedicated doc)
### Common/Multi-pipeline

Tables shared across multiple pipelines for shared data types.

- Naming convention: None
- Data tier: `dj.Manual`
- Examples: `IntervalList` (time intervals for any analysis),
  `AnalysisNwbfile` (analysis NWB files)

Note: Because these are stand-alone tables not part of the dependency
structure, developers should include enough information to link entries back
to the pipeline where the data is used.
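
To make the declaration pattern concrete, here is a minimal sketch of a
stand-alone `dj.Manual` table. The schema name and field types are
illustrative rather than the actual Spyglass definitions, and declaring a
schema requires a configured database connection.

```python
import datajoint as dj

schema = dj.schema("common_interval")  # illustrative schema name


@schema
class IntervalList(dj.Manual):
    """Sketch of a shared manual table; field types are assumptions."""

    definition = """
    nwb_file_name: varchar(255)       # name of the source NWB file
    interval_list_name: varchar(200)  # descriptive name for this interval list
    ---
    valid_times: longblob             # array of [start_time, stop_time] pairs
    """
```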
### NWB ingestion

Automatically populated when an NWB file is ingested (i.e., `dj.Imported`) to
keep track of object hashes (i.e., `object_id`) in the NWB file. All such
tables should be included in the `make` method of `Session`.

- Naming convention: None
- Data tier: `dj.Imported`
- Primary key: foreign key from `Session`
- Non-primary key: `object_id`, the unique hash of an object in the NWB file.
- Examples: `Raw`, `Institution`, etc.
- Required methods:
    - `make`: must read information from an NWB file and insert it into the
      table.
    - `fetch_nwb`: retrieve the data specified by the object ID.
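
A minimal sketch of the pattern, assuming a `Session` table in the declaration
context, a hypothetical `get_nwb_file` helper that opens the session's NWB
file, and an illustrative acquisition object name:

```python
import datajoint as dj

schema = dj.schema("common_example")  # illustrative; assumes Session is in scope


@schema
class Raw(dj.Imported):
    definition = """
    -> Session               # one entry per ingested session
    ---
    object_id: varchar(40)   # hash of the corresponding object in the NWB file
    """

    def make(self, key):
        # get_nwb_file is a hypothetical helper that opens the session's NWB file
        nwbf = get_nwb_file(key["nwb_file_name"])
        # Store the unique hash of the raw acquisition object ('e-series' is illustrative)
        self.insert1({**key, "object_id": nwbf.acquisition["e-series"].object_id})
```

A matching `fetch_nwb` method would use the stored `object_id` to pull the
object back out of the file; it is omitted here for brevity.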
### Parameters

Stores the sets of values that may be used in an analysis.

- Naming convention: end with `Parameters` or `Params`
- Data tier: `dj.Manual` or `dj.Lookup`
- Primary key: `{pipeline}_params_name`, a `varchar`
- Non-primary key: `{pipeline}_params`, a `blob` holding a dict of parameters
- Examples: `RippleParameters`, `DLCModelParams`
- Possible method: if `dj.Manual`, include `insert_default`

Notes: Some early instances of Parameters tables (a) used non-primary keys for
each individual parameter, and (b) used the `dj.Manual` rather than
`dj.Lookup` tier, requiring a class method to insert defaults.
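
A sketch of the `dj.Manual` variant with an `insert_default` class method; the
table definition and default values are illustrative, not the real
`RippleParameters` definition:

```python
import datajoint as dj

schema = dj.schema("example_ripple")  # illustrative schema name


@schema
class RippleParameters(dj.Manual):
    definition = """
    ripple_params_name: varchar(80)  # name of this parameter set
    ---
    ripple_params: blob              # dict of parameters
    """

    @classmethod
    def insert_default(cls):
        # skip_duplicates makes this safe to call more than once
        cls().insert1(
            {
                "ripple_params_name": "default",
                "ripple_params": {"filter_band_hz": (150, 250), "threshold_sd": 4.0},
            },
            skip_duplicates=True,
        )
```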
### Selection

A staging area to pair sessions with parameter sets, allowing us to be
selective in the analyses we run. It may not make sense to pair every
parameter set with every session.

- Naming convention: end with `Selection`
- Data tier: `dj.Manual`
- Primary key(s): foreign key references to
    - one or more NWB or data tables
    - optionally, one or more parameter tables
- Non-primary key: None
- Examples: `MetricSelection`, `LFPSelection`

It is possible for a Selection table to collect information from more than one
Parameters table. For example, the Selection table for spike sorting holds
information about both the interval (`SortInterval`) and the group of
electrodes (`SortGroup`) to be sorted.
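
Continuing the hypothetical ripple example, a Selection table pairing an
upstream data table with a parameter table might look like:

```python
import datajoint as dj

schema = dj.schema("example_ripple")  # assumes LFPV1 and RippleParameters in scope


@schema
class RippleSelection(dj.Manual):
    definition = """
    -> LFPV1             # the data to analyze
    -> RippleParameters  # the parameter set to apply
    """
```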
### Data

The output of processing steps associated with a Selection table. Has a `make`
method that carries out the computation specified in the Selection table when
`populate` is called.

- Naming convention: None
- Data tier: `dj.Computed`
- Primary key: foreign key reference to a Selection table.
- Non-primary key: `analysis_file_name` inherited from the `AnalysisNwbfile`
  table (i.e., the name of the analysis NWB file that will hold the output of
  the computation).
- Required methods:
    - `make`: carries out the computation and inserts a new entry; must also
      create an analysis NWB file and insert it into the `AnalysisNwbfile`
      table. Note that this method is never called directly; it is called via
      `populate`. Multiple entries can be run in parallel when called with
      `reserve_jobs=True`.
    - `delete`: an extension of the built-in `delete` method that checks user
      privilege before deleting entries, as a way to prevent accidental
      deletion of computations that take a long time (see below).
- Examples: `QualityMetrics`, `LFPV1`
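
A sketch of the corresponding Data table, again using the hypothetical ripple
example; the `AnalysisNwbfile().create()` call is an assumed API, so check the
real class for the current interface:

```python
import datajoint as dj

schema = dj.schema("example_ripple")  # assumes the tables above are in scope


@schema
class RippleTimes(dj.Computed):
    definition = """
    -> RippleSelection
    ---
    -> AnalysisNwbfile   # the analysis file that holds the output
    """

    def make(self, key):
        # params would drive the detection step below
        params = (RippleParameters & key).fetch1("ripple_params")
        # ... detect ripples and write the results to a new analysis NWB file;
        # create() is an assumed API, and nwb_file_name is assumed to be part
        # of the upstream key ...
        analysis_file_name = AnalysisNwbfile().create(key["nwb_file_name"])
        self.insert1({**key, "analysis_file_name": analysis_file_name})
```

Entries are then computed with `RippleTimes.populate(reserve_jobs=True)`;
`make` itself is never called directly.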
### Merge

Following the convention outlined in the dedicated doc, merges the outputs of
different pipelines dedicated to the same modality as part tables (e.g.,
common LFP, LFP v1, imported LFP) to permit unified downstream processing.

- Naming convention: `{Pipeline}Output`
- Data tier: custom `_Merge` class
- Primary key: `merge_id`, a `uuid`
- Non-primary key: `source`, a `varchar` holding the table name associated
  with that entry
- Required methods: None - see the custom class methods with the `merge_`
  prefix
- Examples: `LFPOutput`, `PositionOutput`
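
A hedged sketch of the shape of a merge table, assuming `_Merge` is importable
from `spyglass.utils.dj_merge_tables` and an upstream `LFPV1` table is in
scope; consult the dedicated doc and existing merge tables for the
authoritative pattern:

```python
import datajoint as dj
from spyglass.utils.dj_merge_tables import _Merge

schema = dj.schema("example_lfp_merge")  # illustrative schema name


@schema
class LFPOutput(_Merge):
    definition = """
    merge_id: uuid
    ---
    source: varchar(32)  # name of the part table holding this entry
    """

    class LFPV1(dj.Part):  # one part table per source pipeline
        definition = """
        -> master
        ---
        -> LFPV1
        """
```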
## Integration with NWB

### NWB files

NWB files contain everything about the experiment and form the starting point
of all analyses.

- Naming: `{animal name}YYYYMMDD.nwb`
- Storage:
    - On disk, in the directory identified by `settings.py` as `raw_dir`
      (e.g., `/stelmo/nwb/raw`)
    - In the database, in the `Nwbfile` table
- Copies:
    - made with an underscore: `{animal name}YYYYMMDD_.nwb`
    - stored in the same `raw_dir`
    - contain pointers to objects in the original file
    - permit adding new parts to the NWB file without risk of corrupting the
      original data
### Analysis files

Hold the results of intermediate steps in the analysis.

- Naming: `{animal name}YYYYMMDD_{10-character random string}.nwb`
- Storage:
    - On disk, in the directory identified by `settings.py` as `analysis_dir`
      (e.g., `/stelmo/nwb/analysis`). Items are further sorted into folders
      matching the original NWB file name.
    - In the database, in the `AnalysisNwbfile` table.
- Examples: filtered recordings, spike times of putative units after sorting,
  or waveform snippets.

Note: Because NWB files and analysis files exist both on disk and as entries
in tables, the two can become out of sync. You can 'equalize' the database
table lists and the set of files on disk by running the `cleanup` method,
which deletes any files not listed in the table from disk.
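
For example (a sketch; the `delete_files` flag is an assumption, so check the
current signature):

```python
from spyglass.common import AnalysisNwbfile, Nwbfile

# Delete files on disk that have no corresponding entry in the tables;
# the delete_files flag is an assumption - check the current signature.
Nwbfile().cleanup(delete_files=True)
AnalysisNwbfile().cleanup(delete_files=True)
```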
### Reading and writing recordings

Recordings start out as an NWB file, which is opened as a
`NwbRecordingExtractor`, a class in `spikeinterface`. When using `sortingview`
to visualize the results of spike sorting, this recording is saved again in
HDF5 format. This duplication should be resolved in the future.
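
A minimal sketch of opening such a recording, with an illustrative file path:

```python
import spikeinterface.extractors as se

# Open the raw data in an NWB file as a spikeinterface recording object
recording = se.NwbRecordingExtractor(
    file_path="/stelmo/nwb/raw/animal20230601_.nwb"  # illustrative path
)
print(recording.get_num_channels(), recording.get_sampling_frequency())
```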
## Naming convention

The following objects should be uniquely named.

- Recordings: underscore-separated concatenations of uniquely defining
  features:
  `NWBFileName_IntervalName_ElectrodeGroupName_PreprocessingParamsName`.
- SpikeSorting: adds `SpikeSorter_SorterParamName` to the name of the
  recording.
- Waveforms: adds `_WaveformParamName` to the name of the sorting.
- Quality metrics: adds `_MetricParamName` to the name of the waveform.
- Analysis NWB files:
  `NWBFileName_IntervalName_ElectrodeGroupName_PreprocessingParamsName.nwb`
- Each recording and sorting is given a truncated UUID string as part of these
  concatenations.

Following broader Python conventions, a method that will not be explicitly
called by the user should start with `_`.
## Time

The `IntervalList` table stores all time intervals in the following format:
`[start_time, stop_time]`, which represents a contiguous span of valid data.
These intervals are used to exclude invalid timepoints, such as missing data
from a faulty connection.

- Intervals can be nested for a set of disjoint intervals.
- Some recordings have explicit PTP timestamps associated with each sample.
  Some older recordings are missing PTP times, and times must be inferred from
  the TTL pulses from the camera.
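
For example, fetching the valid times for a session (the key values here are
illustrative):

```python
from spyglass.common import IntervalList

# Fetch the valid times for one session/interval (key values are illustrative)
valid_times = (
    IntervalList
    & {"nwb_file_name": "animal20230601_.nwb", "interval_list_name": "01_s1"}
).fetch1("valid_times")
# valid_times is an array of [start_time, stop_time] rows, e.g.:
# array([[  0. , 120.5],
#        [130.2, 360. ]])
```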
## Misc

- During development, we suggest using a Docker container. See the example.
- DataJoint is unable to set delete permissions on a per-table basis. If a
  user is able to delete entries in a given table, she can delete entries in
  any table in the schema. The `SpikeSorting` table extends the built-in
  `delete` method to check whether the username matches a list of allowed
  users when `delete` is called. Issues #226 and #586 track the progress of
  generalizing this feature.
- `numpy`-style docstrings will be interpreted by the API docs. To check for
  compliance, monitor the stdout when building the docs (see
  `docs/README.md`).
- `fetch_nwb` is currently repeated across many tables. For progress on a fix,
  follow issue #530.
## Making a release

Spyglass follows Semantic Versioning with versioning of the form `X.Y.Z`
(e.g., `0.4.2`).

1. In `CITATION.cff`, update the `version` key.
2. Make a pull request with the changes.
3. After the pull request is merged, pull the merge commit and tag it with
   `git tag {version}`.
4. Publish the new release tag: run `git push origin {version}`. This will
   rebuild the docs and push updates to PyPI.
5. Make a new release on GitHub.