Insert Data¶
Intro¶
Developer Note: if you may make a PR in the future, be sure to copy this
notebook, and use the gitignore prefix temp to avoid future conflicts.
This is one notebook in a multi-part series on Spyglass.
- To set up your Spyglass environment and database, see the Setup notebook
- For additional info on DataJoint syntax, including table definitions and inserts, see these additional tutorials
Let's start by importing the spyglass package, along with a few others.
import os
import datajoint as dj
# change to the upper level folder to detect dj_local_conf.json
if os.path.basename(os.getcwd()) == "notebooks":
os.chdir("..")
dj.config.load("dj_local_conf.json") # load config for database connection info
# ignore datajoint+jupyter async warnings
import warnings
warnings.simplefilter("ignore", category=DeprecationWarning)
warnings.simplefilter("ignore", category=ResourceWarning)
warnings.simplefilter("ignore", category=UserWarning)
# spyglass.common has the most frequently used tables
import spyglass.common as sgc
# spyglass.data_import has tools for inserting NWB files into the database
import spyglass.data_import as sgi
[2023-10-05 11:48:12,292][INFO]: Connecting root@localhost:3306 [2023-10-05 11:48:12,302][INFO]: Connected root@localhost:3306
Visualizing the database¶
Datajoint enables users to use Python to build and interact with a Relational Database. In a Relational Data Model, each table is an object that can reference information in other tables to avoid redundancy.
DataJoint has built-in tools for generating/saving a Diagram of the relationships between tables. This page describes the notation used.
Polygons are tables, colors reference table type:
- Green rectangle: tables whose entries are entered manually.
- Blue oval: tables whose entries are imported from external files (e.g. NWB file).
- Red circle: tables whose entries are computed from entries of other tables.
- No shape (only text): tables whose entries are part of the table upstream
Lines are dependencies between tables. An upstream table is connected to a downstream table via inheritance of the primary key. This is the set of attributes (i.e., column names) used to uniquely define an entry (i.e., a row)
- Bold lines: the upstream primary key is the sole downstream primary key
- Solid lines: the upstream table as part of the downstream primary key
- Dashed lines: the primary key of upstream table as non-primary key
# Draw tables that are two levels below and one level above Session
dj.Diagram(sgc.Session) - 1 + 2
By adding diagrams together, of adding and subtracting levels, we can visualize key parts of Spyglass.
Note: Notice the Selection tables. This is a design pattern that selects a subset of upstream items for further processing. In some cases, these also pair the selected data with processing parameters.
Example data¶
After exploring the pipeline's structure, we'll now grab some example data. Spyglass will assume that the data is a neural recording with relevant auxiliary in NWB.
We offer a few examples:
minirec20230622.nwb, .3 GB: minimal recording, Linkmediumnwb20230802.nwb, 32 GB: full-featured dataset, Linkmontague20200802.nwb, 8 GB: full experimental recording, Link- For those in the UCSF network, these and many others on
/stelmo/nwb/raw
If you are connected to the Frank lab database, please rename any downloaded
files (e.g., example20200101_yourname.nwb) to avoid naming collisions, as the
file name acts as the primary key across key tables.
from spyglass.utils.nwb_helper_fn import get_nwb_copy_filename
# Define the name of the file that you copied and renamed
nwb_file_name = "minirec20230622.nwb"
nwb_copy_file_name = get_nwb_copy_filename(nwb_file_name)
Spyglass will create a copy with this name.
nwb_copy_file_name
'minirec20230622_.nwb'
Basic Inserts: Lab Team¶
Let's start small by inserting personnel information.
The LabMember table lists all lab members, with an additional part table for
LabMemberInfo. This holds Google account and DataJoint username info for each
member, for authentication purposes.
We can insert lab member information using the NWB file experimenter field
as follows...
# take a look at the lab members
sgc.LabMember.insert_from_nwbfile(nwb_file_name)
Please add the Google user ID for Firstname Lastname in the LabMember.LabMemberInfo table to help manage permissions. Please add the Google user ID for Firstname2 Lastname2 in the LabMember.LabMemberInfo table to help manage permissions.
We can insert
into LabMemberInfo directly with a list of lists that reflect the order of
the fields present in the table. See
this notebook
for examples of inserting with dicts.
sgc.LabMember.LabMemberInfo.insert(
[ # Full name, Google email address, DataJoint username
["Firstname Lastname", "example1@gmail.com", "example1"],
["Firstname2 Lastname2", "example2@gmail.com", "example2"],
],
skip_duplicates=True,
)
sgc.LabMember.LabMemberInfo()
| lab_member_name | google_user_name used for permission to curate | datajoint_user_name used for permission to delete entries |
|---|---|---|
| Firstname Lastname | example1@gmail.com | example1 |
| Firstname2 Lastname2 | example2@gmail.com | example2 |
Total: 2
A LabTeam is a set of lab members who own a set of NWB files and the
associated information in the database. This is often a subgroup that
collaborates on the same projects. Data is associated with a given team,
granting members analysis (e.g., curation) and deletion (coming soon)
privileges.
sgc.LabTeam().create_new_team(
team_name="My Team", # Should be unique
team_members=["Firstname Lastname", "Firstname2 Lastname2"],
team_description="test", # Optional
)
By default, each member is part of their own team. We can see all teams and
members by looking at the
part table
LabTeam.LabTeamMember.
sgc.LabTeam.LabTeamMember()
| team_name | lab_member_name |
|---|---|
| Firstname Lastname | Firstname Lastname |
| My Team | Firstname Lastname |
| Firstname2 Lastname2 | Firstname2 Lastname2 |
| My Team | Firstname2 Lastname2 |
Total: 4
Inserting from NWB¶
spyglass.data_import.insert_sessions helps take the many fields of data
present in an NWB file and insert them into various tables across Spyglass. If
the NWB file is properly composed, this includes...
- the experimenter (replicating part of the process above)
- animal behavior (e.g. video recording of position)
- neural activity (extracellular recording of multiple brain areas)
- etc.
Note: this may take time as Spyglass creates the copy. You may see a prompt about inserting device information.
sgi.insert_sessions(nwb_file_name)
Creating a copy of NWB file minirec20230622.nwb with link to raw ephys data: minirec20230622_.nwb
Populate Session...
No config found at file path /home/cb/wrk/data/raw/minirec20230622_spyglass_config.yaml
Institution...
Lab...
LabMember...
Please add the Google user ID for Firstname2 Lastname2 in the LabMember.LabMemberInfo table to help manage permissions.
Subject...
Populate CameraDevice...
Inserted camera devices ['test camera 1']
Populate Probe...
Probe ID '128c-4s6mm6cm-15um-26um-sl' already exists in the database. Spyglass will use that and not create a new Probe, Shanks, or Electrodes.
Inserted probes {'128c-4s6mm6cm-15um-26um-sl'}
Skipping Apparatus for now...
IntervalList...
LabMember with name lastname, firstname does not exist. Cannot link Session with LabMember in Session.Experimenter.
LabMember with name lastname2, firstname2 does not exist. Cannot link Session with LabMember in Session.Experimenter.
Populate ElectrodeGroup...
Populate Electrode...
No config found at file path /home/cb/wrk/data/raw/minirec20230622_spyglass_config.yaml
Populate Raw...
Estimating sampling rate...
Estimated sampling rate for file: 30000.0 Hz
Importing raw data: Sampling rate: 30000.0 Hz
Number of valid intervals: 2
Populate SampleCount...
Populate DIOEvents...
Populate TaskEpochs
Populate StateScriptFile
Populate VideoFile
No video found corresponding to file minirec20230622_.nwb, epoch 01_s1
No video found corresponding to file minirec20230622_.nwb, epoch 02_s2
RawPosition...
Estimated sampling rate for 12345: 30.0 Hz
WARNING: Setting minimum valid interval to 5.1912336349487305
Estimated sampling rate for 12345: 30.0 Hz
WARNING: Setting minimum valid interval to 5.1912336349487305
Estimated sampling rate for 12345: 30.0 Hz
WARNING: Setting minimum valid interval to 5.339195609092712
Estimated sampling rate for 12345: 30.0 Hz
WARNING: Setting minimum valid interval to 5.339195609092712
Populated PosIntervalMap for minirec20230622_.nwb, 01_s1
Populated PosIntervalMap for minirec20230622_.nwb, 02_s2
Inspecting the data¶
To look at data, we can
query
a table with Table() syntax.
sgc.Lab()
| lab_name |
|---|
| Loren Frank Lab |
Total: 1
The Session table has considerably more fields
sgc.Session.heading.names
['nwb_file_name', 'subject_id', 'institution_name', 'lab_name', 'session_id', 'session_description', 'session_start_time', 'timestamps_reference_time', 'experiment_description']
But a short primary key
sgc.Session.heading.primary_key
['nwb_file_name']
The primary key is shown in bold in the html
sgc.Session()
| nwb_file_name name of the NWB file | subject_id | institution_name | lab_name | session_id | session_description | session_start_time | timestamps_reference_time | experiment_description |
|---|---|---|---|---|---|---|---|---|
| minirec20230622_.nwb | 54321 | UCSF | Loren Frank Lab | 12345 | test yaml insertion | 2023-06-22 15:59:58 | 1970-01-01 00:00:00 | Test Conversion |
Total: 1
Text only interfaces designate the primary key fields with *
print(sgc.Session())
*nwb_file_name subject_id institution_na lab_name session_id session_descri session_start_ timestamps_ref experiment_des +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ minirec2023062 54321 UCSF Loren Frank La 12345 test yaml inse 2023-06-22 15: 1970-01-01 00: Test Conversio (Total: 1)
To see a the
table definition,
including
data types,
use describe.
---separates the primary key:are used to separate field name from data type#can be used to add comments to a field
from pprint import pprint # adds line breaks
pprint(sgc.Session.describe())
('# Table for holding experimental sessions.\n'
'-> sgc.Nwbfile\n'
'---\n'
'-> [nullable] sgc.Subject\n'
'-> [nullable] sgc.Institution\n'
'-> [nullable] sgc.Lab\n'
'session_id=null : varchar(200) \n'
'session_description : varchar(2000) \n'
'session_start_time : datetime \n'
'timestamps_reference_time : datetime \n'
'experiment_description=null : varchar(2000) \n')
sgc.Session & {"nwb_file_name": nwb_copy_file_name}
| nwb_file_name name of the NWB file | subject_id | institution_name | lab_name | session_id | session_description | session_start_time | timestamps_reference_time | experiment_description |
|---|---|---|---|---|---|---|---|---|
| minirec20230622_.nwb | 54321 | UCSF | Loren Frank Lab | 12345 | test yaml insertion | 2023-06-22 15:59:58 | 1970-01-01 00:00:00 | Test Conversion |
Total: 1
Raw is connected to Session with a bold line, so it has the same primary key.
dj.Diagram(sgc.Session) + dj.Diagram(sgc.Raw)
sgc.Raw & {"nwb_file_name": nwb_copy_file_name}
| nwb_file_name name of the NWB file | interval_list_name descriptive name of this interval list | raw_object_id the NWB object ID for loading this object from the file | sampling_rate Sampling rate calculated from data, in Hz | comments | description |
|---|---|---|---|---|---|
| minirec20230622_.nwb | raw data valid times | 4e756642-9203-4f00-b9d0-0e9747c14493 | 30000.0 | No comment | Recording of extracellular voltage |
Total: 1
IntervalList is connected to Session with a solid line because it has
additional primary key attributes. Here, you need to know both nwb_file_name
and interval_list_name to uniquely identify an entry.
# join/split condenses long spaces before field comments
pprint("".join(sgc.IntervalList.describe().split(" ")))
('# Time intervals used for analysis\n'
'-> sgc.Session\n'
'interval_list_name : varchar(200) # descriptive name of this interval list\n'
'---\n'
'valid_times: longblob # numpy array with start and end times for each '
'interval\n')
sgc.IntervalList & {"nwb_file_name": nwb_copy_file_name}
| nwb_file_name name of the NWB file | interval_list_name descriptive name of this interval list | valid_times numpy array with start and end times for each interval |
|---|---|---|
| minirec20230622_.nwb | 01_s1 | =BLOB= |
| minirec20230622_.nwb | 02_s2 | =BLOB= |
| minirec20230622_.nwb | pos 0 valid times | =BLOB= |
| minirec20230622_.nwb | pos 1 valid times | =BLOB= |
| minirec20230622_.nwb | raw data valid times | =BLOB= |
Total: 5
Raw data types like valid_times are shown as =BLOB=. We can inspect
these with fetch
Note: like insert/insert1, fetch can be uses as fetch1 to raise an
error when many (or no) entries are retrieved. To limit to one entry when there
may be many, use query.fetch(limit=1)[0]
(
sgc.IntervalList
& {
"nwb_file_name": nwb_copy_file_name,
"interval_list_name": "pos 1 valid times",
}
).fetch1("valid_times")
array([[1.68747483e+09, 1.68747484e+09]])
In DataJoint operators,
& selects by a condition and - removes a condition.
(
(
(sgc.IntervalList & {"nwb_file_name": nwb_copy_file_name})
- {"interval_list_name": "pos 1 valid times"}
)
- {"interval_list_name": "pos 2 valid times"}
).fetch("interval_list_name")
array(['01_s1', '02_s2', 'pos 0 valid times', 'raw data valid times'],
dtype=object)
IntervalList has an additional secondary key pipeline which can describe the origin of the data.
Because it is a secondary key, it is not required to uniquely identify an entry.
Current values for this key from spyglass pipelines are:
| pipeline | Source |
|---|---|
| position | sg.common.PositionSource |
| lfp_v0 | sg.common.LFP |
| lfp_v1 | sg.lfp.v1.LFPV1 |
| lfp_band | sg.common.LFPBand, sg.lfp.analysis.v1.LFPBandV1 |
| lfp_artifact | sg.lfp.v1.LFPArtifactDetection |
| spikesorting_artifact_v0 | sg.spikesorting.ArtifactDetection |
| spikesorting_artifact_v1 | sg.spikesorting.v1.ArtifactDetection |
| spikesorting_recording_v0 | sg.spikesorting.SpikeSortingRecording |
| spikesorting_recording_v1 | sg.spikesorting.v1.SpikeSortingRecording |
Deleting data¶
Another neat feature of DataJoint is that it automatically maintains
data integrity
with cascading deletes. For example, if we delete our Session entry, all
associated downstream entries are also deleted (e.g. Raw, IntervalList).
Note: The deletion process can be complicated by Merge Tables when the entry is referenced by a part table. To demo deletion in these cases, run the hidden code below.
Quick Merge Insert
import spyglass.lfp as lfp
sgc.FirFilterParameters().create_standard_filters()
lfp.lfp_electrode.LFPElectrodeGroup.create_lfp_electrode_group(
nwb_file_name=nwb_copy_file_name,
group_name="test",
electrode_list=[0],
)
lfp_key = {
"nwb_file_name": nwb_copy_file_name,
"lfp_electrode_group_name": "test",
"target_interval_list_name": "01_s1",
"filter_name": "LFP 0-400 Hz",
"filter_sampling_rate": 30_000,
}
lfp.v1.LFPSelection.insert1(lfp_key, skip_duplicates=True)
lfp.v1.LFPV1().populate(lfp_key)
Deleting Merge Entries
from spyglass.utils.dj_merge_tables import delete_downstream_merge
delete_downstream_merge(
sgc.Nwbfile(),
restriction={"nwb_file_name": nwb_copy_file_name},
dry_run=False, # True will show Merge Table entries that would be deleted
)
session_entry = sgc.Session & {"nwb_file_name": nwb_copy_file_name}
session_entry
| nwb_file_name name of the NWB file | subject_id | institution_name | lab_name | session_id | session_description | session_start_time | timestamps_reference_time | experiment_description |
|---|---|---|---|---|---|---|---|---|
| minirec20230622_.nwb | 54321 | UCSF | Loren Frank Lab | 12345 | test yaml insertion | 2023-06-22 15:59:58 | 1970-01-01 00:00:00 | Test Conversion |
Total: 1
By default, DataJoint is cautious about deletes and will prompt before deleting.
To delete, uncomment the cell below and respond yes in the prompt.
session_entry.delete()
[2023-09-28 08:29:15,814][INFO]: Deleting 4 rows from `common_behav`.`_raw_position__pos_object` INFO:datajoint:Deleting 4 rows from `common_behav`.`_raw_position__pos_object` [2023-09-28 08:29:15,822][INFO]: Deleting 2 rows from `common_behav`.`_raw_position` INFO:datajoint:Deleting 2 rows from `common_behav`.`_raw_position` [2023-09-28 08:29:15,834][INFO]: Deleting 4 rows from `common_behav`.`position_source__spatial_series` INFO:datajoint:Deleting 4 rows from `common_behav`.`position_source__spatial_series` [2023-09-28 08:29:15,841][INFO]: Deleting 2 rows from `common_behav`.`position_source` INFO:datajoint:Deleting 2 rows from `common_behav`.`position_source` [2023-09-28 08:29:15,851][INFO]: Deleting 7 rows from `common_dio`.`_d_i_o_events` INFO:datajoint:Deleting 7 rows from `common_dio`.`_d_i_o_events` [2023-09-28 08:29:15,871][INFO]: Deleting 128 rows from `common_ephys`.`_electrode` INFO:datajoint:Deleting 128 rows from `common_ephys`.`_electrode` [2023-09-28 08:29:15,879][INFO]: Deleting 1 rows from `common_ephys`.`_electrode_group` INFO:datajoint:Deleting 1 rows from `common_ephys`.`_electrode_group` [2023-09-28 08:29:15,887][INFO]: Deleting 1 rows from `common_ephys`.`_raw` INFO:datajoint:Deleting 1 rows from `common_ephys`.`_raw` [2023-09-28 08:29:15,896][INFO]: Deleting 1 rows from `common_ephys`.`_sample_count` INFO:datajoint:Deleting 1 rows from `common_ephys`.`_sample_count` [2023-09-28 08:29:15,908][INFO]: Deleting 2 rows from `common_behav`.`__position_interval_map` INFO:datajoint:Deleting 2 rows from `common_behav`.`__position_interval_map` [2023-09-28 08:29:15,918][INFO]: Deleting 2 rows from `common_task`.`_task_epoch` INFO:datajoint:Deleting 2 rows from `common_task`.`_task_epoch` [2023-09-28 08:29:15,924][INFO]: Deleting 5 rows from `common_interval`.`interval_list` INFO:datajoint:Deleting 5 rows from `common_interval`.`interval_list` [2023-09-28 08:29:15,931][INFO]: Deleting 1 rows from `common_session`.`_session` INFO:datajoint:Deleting 1 rows from `common_session`.`_session` [2023-09-28 08:29:18,765][INFO]: Deletes committed. INFO:datajoint:Deletes committed.
1
We can check that delete worked, both for Session and IntervalList
sgc.Session & {"nwb_file_name": nwb_copy_file_name}
| nwb_file_name name of the NWB file | subject_id | institution_name | lab_name | session_id | session_description | session_start_time | timestamps_reference_time | experiment_description |
|---|---|---|---|---|---|---|---|---|
Total: 0
sgc.IntervalList & {"nwb_file_name": nwb_copy_file_name}
| nwb_file_name name of the NWB file | interval_list_name descriptive name of this interval list | valid_times numpy array with start and end times for each interval |
|---|---|---|
Total: 0
delete is useful for re-running something. Editing entries is possible, but
discouraged because it can lead to
integrity
issues. Instead, re-enter and let the automation handle the rest.
Spyglass falls short, however, in that deleting from Session doesn't also
delete the associated entry in Nwbfile, which has to be removed separately
(for now). This table offers a cleanup method to remove the added files (with
the delete_files argument as True).
Note: this also applies to deleting files from AnalysisNwbfile table.
# Uncomment to delete
# (sgc.Nwbfile & {"nwb_file_name": nwb_copy_file_name}).delete()
[2023-07-18 19:01:15,343][INFO]: Deleting 1 rows from `common_nwbfile`.`nwbfile` INFO:datajoint:Deleting 1 rows from `common_nwbfile`.`nwbfile` [2023-07-18 19:01:17,130][INFO]: Deletes committed. INFO:datajoint:Deletes committed.
1
Note that the file (ends with _.nwb) has not been deleted, even if the entry
was deleted above.
!ls $SPYGLASS_BASE_DIR/raw
minirec20230622.nwb minirec20230622_.nwb montague20200802_tutorial.nwb montague20200802_tutorial_.nwb montague20200802_tutorial__.nwb sub-despereaux_ses-despereaux-08_behavior+ecephys_trim.nwb sub-despereaux_ses-despereaux-08_behavior+ecephys_trim_.nwb tonks20211103_.nwb
We can clean these files with the cleanup method
sgc.Nwbfile().cleanup(delete_files=True)
100%|██████████| 1/1 [00:00<00:00, 304.24it/s]
!ls $SPYGLASS_BASE_DIR/raw
minirec20230622.nwb montague20200802_tutorial.nwb montague20200802_tutorial_.nwb montague20200802_tutorial__.nwb sub-despereaux_ses-despereaux-08_behavior+ecephys_trim.nwb sub-despereaux_ses-despereaux-08_behavior+ecephys_trim_.nwb tonks20211103_.nwb
Up Next¶
In the next notebook, we'll explore tools for syncing.