Insert Data
Intro
Developer Note: if you might make a PR in the future, be sure to work from a copy of this
notebook named with the gitignored prefix temp
to avoid future merge conflicts.
This is one notebook in a multi-part series on Spyglass.
- To set up your Spyglass environment and database, see the Setup notebook
- For additional info on DataJoint syntax, including table definitions and inserts, see these additional tutorials
Let's start by importing the spyglass
package, along with a few others.
import os
import datajoint as dj
# change to the upper level folder to detect dj_local_conf.json
if os.path.basename(os.getcwd()) == "notebooks":
os.chdir("..")
dj.config.load("dj_local_conf.json") # load config for database connection info
# ignore datajoint+jupyter async warnings
import warnings
warnings.simplefilter("ignore", category=DeprecationWarning)
warnings.simplefilter("ignore", category=ResourceWarning)
warnings.simplefilter("ignore", category=UserWarning)
# spyglass.common has the most frequently used tables
import spyglass.common as sgc
# spyglass.data_import has tools for inserting NWB files into the database
import spyglass.data_import as sgi
[2023-10-05 11:48:12,292][INFO]: Connecting root@localhost:3306
[2023-10-05 11:48:12,302][INFO]: Connected root@localhost:3306
Visualizing the database
DataJoint enables users to use Python to build and interact with a relational database. In a relational data model, each table is an object that can reference information in other tables to avoid redundancy.
DataJoint has built-in tools for generating/saving a Diagram of the relationships between tables. This page describes the notation used.
Polygons are tables, colors reference table type:
- Green rectangle: tables whose entries are entered manually.
- Blue oval: tables whose entries are imported from external files (e.g. NWB file).
- Red circle: tables whose entries are computed from entries of other tables.
- No shape (text only): part tables, whose entries belong to their upstream (master) table
Lines are dependencies between tables. An upstream table is connected to a downstream table via inheritance of the primary key, the set of attributes (i.e., column names) used to uniquely define an entry (i.e., a row).
- Bold lines: the upstream primary key is the sole downstream primary key
- Solid lines: the upstream primary key is part of the downstream primary key, alongside additional attributes
- Dashed lines: the upstream primary key is inherited as a non-primary (secondary) attribute
# Draw tables that are two levels below and one level above Session
dj.Diagram(sgc.Session) - 1 + 2
By adding diagrams together, or by adding and subtracting levels, we can visualize key parts of Spyglass.
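For example, adding the diagrams of two tables draws just those tables and the links between them (IntervalList is used here only as an illustrative choice):
# Combine the diagrams of two specific tables to see how they connect
dj.Diagram(sgc.Session) + dj.Diagram(sgc.IntervalList)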
Note: Notice the Selection tables. This is a design pattern that selects a subset of upstream items for further processing. In some cases, these also pair the selected data with processing parameters.
Example data
After exploring the pipeline's structure, we'll now grab some example data. Spyglass assumes the data is a neural recording, with relevant auxiliary data, in NWB format.
We offer a few examples:
- minirec20230622.nwb, 0.3 GB: minimal recording, Link
- mediumnwb20230802.nwb, 32 GB: full-featured dataset, Link
- montague20200802.nwb, 8 GB: full experimental recording, Link
- For those in the UCSF network, these and many others are available on /stelmo/nwb/raw
If you are connected to the Frank lab database, please rename any downloaded
files (e.g., example20200101_yourname.nwb
) to avoid naming collisions, as the
file name acts as the primary key across key tables.
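For example, a downloaded file could be renamed in place before insertion; a minimal sketch, assuming the file sits in the raw directory under SPYGLASS_BASE_DIR (file names here are placeholders):
import os
import shutil
# Rename a downloaded file to include your name before inserting it (sketch)
raw_dir = os.path.join(os.environ["SPYGLASS_BASE_DIR"], "raw")
shutil.move(
    os.path.join(raw_dir, "example20200101.nwb"),
    os.path.join(raw_dir, "example20200101_yourname.nwb"),
)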
from spyglass.utils.nwb_helper_fn import get_nwb_copy_filename
# Define the name of the file that you copied and renamed
nwb_file_name = "minirec20230622.nwb"
nwb_copy_file_name = get_nwb_copy_filename(nwb_file_name)
Spyglass will create a copy with this name.
nwb_copy_file_name
'minirec20230622_.nwb'
Basic Inserts: Lab Team
Let's start small by inserting personnel information.
The LabMember
table lists all lab members, with an additional part table for
LabMemberInfo
. This holds Google account and DataJoint username info for each
member, for authentication purposes.
We can insert lab member information from the NWB file's experimenter
field as follows...
# insert lab members listed in the NWB file's experimenter field
sgc.LabMember.insert_from_nwbfile(nwb_file_name)
Please add the Google user ID for Firstname Lastname in the LabMember.LabMemberInfo table to help manage permissions. Please add the Google user ID for Firstname2 Lastname2 in the LabMember.LabMemberInfo table to help manage permissions.
We can insert
into LabMemberInfo
directly with a list of lists that reflects the order of
the fields present in the table, as shown below; a dict-based sketch follows
the resulting table. See
this notebook
for more examples of inserting with dicts.
sgc.LabMember.LabMemberInfo.insert(
[ # Full name, Google email address, DataJoint username
["Firstname Lastname", "example1@gmail.com", "example1"],
["Firstname2 Lastname2", "example2@gmail.com", "example2"],
],
skip_duplicates=True,
)
sgc.LabMember.LabMemberInfo()
lab_member_name | google_user_name used for permission to curate | datajoint_user_name used for permission to delete entries |
---|---|---|
Firstname Lastname | example1@gmail.com | example1 |
Firstname2 Lastname2 | example2@gmail.com | example2 |
Total: 2
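For comparison, the same rows could be inserted as dicts keyed by field name, which is more robust to changes in field order; a minimal sketch using the field names shown above:
# Equivalent insert using dicts keyed by field name (sketch)
sgc.LabMember.LabMemberInfo.insert(
    [
        {
            "lab_member_name": "Firstname Lastname",
            "google_user_name": "example1@gmail.com",
            "datajoint_user_name": "example1",
        },
        {
            "lab_member_name": "Firstname2 Lastname2",
            "google_user_name": "example2@gmail.com",
            "datajoint_user_name": "example2",
        },
    ],
    skip_duplicates=True,
)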
A LabTeam
is a set of lab members who own a set of NWB files and the
associated information in the database. This is often a subgroup that
collaborates on the same projects. Data is associated with a given team,
granting members analysis (e.g., curation) and deletion (coming soon)
privileges.
sgc.LabTeam().create_new_team(
team_name="My Team", # Should be unique
team_members=["Firstname Lastname", "Firstname2 Lastname2"],
team_description="test", # Optional
)
By default, each member is part of their own team. We can see all teams and
members by looking at the
part table
LabTeam.LabTeamMember
.
sgc.LabTeam.LabTeamMember()
team_name | lab_member_name |
---|---|
Firstname Lastname | Firstname Lastname |
My Team | Firstname Lastname |
Firstname2 Lastname2 | Firstname2 Lastname2 |
My Team | Firstname2 Lastname2 |
Total: 4
Inserting from NWB
spyglass.data_import.insert_sessions
helps take the many fields of data
present in an NWB file and insert them into various tables across Spyglass. If
the NWB file is properly composed, this includes...
- the experimenter (replicating part of the process above)
- animal behavior (e.g. video recording of position)
- neural activity (extracellular recording of multiple brain areas)
- etc.
Note: this may take time as Spyglass creates the copy. You may see a prompt about inserting device information.
sgi.insert_sessions(nwb_file_name)
Creating a copy of NWB file minirec20230622.nwb with link to raw ephys data: minirec20230622_.nwb
Populate Session...
No config found at file path /home/cb/wrk/data/raw/minirec20230622_spyglass_config.yaml
Institution...
Lab...
LabMember...
Please add the Google user ID for Firstname2 Lastname2 in the LabMember.LabMemberInfo table to help manage permissions.
Subject...
Populate CameraDevice...
Inserted camera devices ['test camera 1']
Populate Probe...
Probe ID '128c-4s6mm6cm-15um-26um-sl' already exists in the database. Spyglass will use that and not create a new Probe, Shanks, or Electrodes.
Inserted probes {'128c-4s6mm6cm-15um-26um-sl'}
Skipping Apparatus for now...
IntervalList...
LabMember with name lastname, firstname does not exist. Cannot link Session with LabMember in Session.Experimenter.
LabMember with name lastname2, firstname2 does not exist. Cannot link Session with LabMember in Session.Experimenter.
Populate ElectrodeGroup...
Populate Electrode...
No config found at file path /home/cb/wrk/data/raw/minirec20230622_spyglass_config.yaml
Populate Raw...
Estimating sampling rate...
Estimated sampling rate for file: 30000.0 Hz
Importing raw data: Sampling rate: 30000.0 Hz
Number of valid intervals: 2
Populate SampleCount...
Populate DIOEvents...
Populate TaskEpochs
Populate StateScriptFile
Populate VideoFile
No video found corresponding to file minirec20230622_.nwb, epoch 01_s1
No video found corresponding to file minirec20230622_.nwb, epoch 02_s2
RawPosition...
Estimated sampling rate for 12345: 30.0 Hz
WARNING: Setting minimum valid interval to 5.1912336349487305
Estimated sampling rate for 12345: 30.0 Hz
WARNING: Setting minimum valid interval to 5.1912336349487305
Estimated sampling rate for 12345: 30.0 Hz
WARNING: Setting minimum valid interval to 5.339195609092712
Estimated sampling rate for 12345: 30.0 Hz
WARNING: Setting minimum valid interval to 5.339195609092712
Populated PosIntervalMap for minirec20230622_.nwb, 01_s1
Populated PosIntervalMap for minirec20230622_.nwb, 02_s2
Inspecting the data
To look at data, we can
query
a table with Table()
syntax.
sgc.Lab()
lab_name |
---|
Loren Frank Lab |
Total: 1
The Session
table has considerably more fields
sgc.Session.heading.names
['nwb_file_name', 'subject_id', 'institution_name', 'lab_name', 'session_id', 'session_description', 'session_start_time', 'timestamps_reference_time', 'experiment_description']
But a short primary key
sgc.Session.heading.primary_key
['nwb_file_name']
The primary key is shown in bold in the HTML display
sgc.Session()
nwb_file_name name of the NWB file | subject_id | institution_name | lab_name | session_id | session_description | session_start_time | timestamps_reference_time | experiment_description |
---|---|---|---|---|---|---|---|---|
minirec20230622_.nwb | 54321 | UCSF | Loren Frank Lab | 12345 | test yaml insertion | 2023-06-22 15:59:58 | 1970-01-01 00:00:00 | Test Conversion |
Total: 1
Text-only interfaces designate primary key fields with *
print(sgc.Session())
*nwb_file_name subject_id institution_na lab_name session_id session_descri session_start_ timestamps_ref experiment_des
+------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +------------+
minirec2023062 54321 UCSF Loren Frank La 12345 test yaml inse 2023-06-22 15: 1970-01-01 00: Test Conversio
(Total: 1)
To see the table definition, including data types, use describe.
- --- separates the primary key from the other fields
- : separates a field name from its data type
- # adds a comment to a field
from pprint import pprint # adds line breaks
pprint(sgc.Session.describe())
('# Table for holding experimental sessions.\n'
 '-> sgc.Nwbfile\n'
 '---\n'
 '-> [nullable] sgc.Subject\n'
 '-> [nullable] sgc.Institution\n'
 '-> [nullable] sgc.Lab\n'
 'session_id=null : varchar(200) \n'
 'session_description : varchar(2000) \n'
 'session_start_time : datetime \n'
 'timestamps_reference_time : datetime \n'
 'experiment_description=null : varchar(2000) \n')
sgc.Session & {"nwb_file_name": nwb_copy_file_name}
nwb_file_name name of the NWB file | subject_id | institution_name | lab_name | session_id | session_description | session_start_time | timestamps_reference_time | experiment_description |
---|---|---|---|---|---|---|---|---|
minirec20230622_.nwb | 54321 | UCSF | Loren Frank Lab | 12345 | test yaml insertion | 2023-06-22 15:59:58 | 1970-01-01 00:00:00 | Test Conversion |
Total: 1
Raw
is connected to Session
with a bold line, so it has the same primary key.
dj.Diagram(sgc.Session) + dj.Diagram(sgc.Raw)
sgc.Raw & {"nwb_file_name": nwb_copy_file_name}
nwb_file_name name of the NWB file | interval_list_name descriptive name of this interval list | raw_object_id the NWB object ID for loading this object from the file | sampling_rate Sampling rate calculated from data, in Hz | comments | description |
---|---|---|---|---|---|
minirec20230622_.nwb | raw data valid times | 4e756642-9203-4f00-b9d0-0e9747c14493 | 30000.0 | No comment | Recording of extracellular voltage |
Total: 1
IntervalList
is connected to Session
with a solid line because it has
additional primary key attributes. Here, you need to know both nwb_file_name
and interval_list_name
to uniquely identify an entry.
# join/split condenses long spaces before field comments
pprint("".join(sgc.IntervalList.describe().split(" ")))
('# Time intervals used for analysis\n'
 '-> sgc.Session\n'
 'interval_list_name : varchar(200) # descriptive name of this interval list\n'
 '---\n'
 'valid_times: longblob # numpy array with start and end times for each '
 'interval\n')
sgc.IntervalList & {"nwb_file_name": nwb_copy_file_name}
nwb_file_name name of the NWB file | interval_list_name descriptive name of this interval list | valid_times numpy array with start and end times for each interval |
---|---|---|
minirec20230622_.nwb | 01_s1 | =BLOB= |
minirec20230622_.nwb | 02_s2 | =BLOB= |
minirec20230622_.nwb | pos 0 valid times | =BLOB= |
minirec20230622_.nwb | pos 1 valid times | =BLOB= |
minirec20230622_.nwb | raw data valid times | =BLOB= |
Total: 5
Raw data types like valid_times
are shown as =BLOB=
. We can inspect
these with fetch.
Note: like insert/insert1, fetch
has a stricter counterpart fetch1
that raises an
error when many (or no) entries are retrieved. To limit to one entry when there
may be many, use query.fetch(limit=1)[0]
(
sgc.IntervalList
& {
"nwb_file_name": nwb_copy_file_name,
"interval_list_name": "pos 1 valid times",
}
).fetch1("valid_times")
array([[1.68747483e+09, 1.68747484e+09]])
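For example, to grab the valid times of just one matching entry without raising an error when several entries match, a sketch:
# Fetch a single entry's valid_times even if several rows match the restriction
(
    sgc.IntervalList & {"nwb_file_name": nwb_copy_file_name}
).fetch("valid_times", limit=1)[0]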
Among DataJoint operators,
& restricts to entries matching a condition and -
removes entries matching a condition.
(
(
(sgc.IntervalList & {"nwb_file_name": nwb_copy_file_name})
- {"interval_list_name": "pos 1 valid times"}
)
- {"interval_list_name": "pos 2 valid times"}
).fetch("interval_list_name")
array(['01_s1', '02_s2', 'pos 0 valid times', 'raw data valid times'], dtype=object)
IntervalList
has an additional secondary key pipeline
which can describe the origin of the data.
Because it is a secondary key, it is not required to uniquely identify an entry.
Current values for this key from spyglass pipelines are:
pipeline | Source |
---|---|
position | sg.common.PositionSource |
lfp_v0 | sg.common.LFP |
lfp_v1 | sg.lfp.v1.LFPV1 |
lfp_band | sg.common.LFPBand, sg.lfp.analysis.v1.LFPBandV1 |
lfp_artifact | sg.lfp.v1.LFPArtifactDetection |
spikesorting_artifact_v0 | sg.spikesorting.ArtifactDetection |
spikesorting_artifact_v1 | sg.spikesorting.v1.ArtifactDetection |
spikesorting_recording_v0 | sg.spikesorting.SpikeSortingRecording |
spikesorting_recording_v1 | sg.spikesorting.v1.SpikeSortingRecording |
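Because pipeline is a secondary attribute, it can be used in a restriction like any other field; a minimal sketch (this returns nothing if no such entries exist yet):
# Restrict IntervalList to entries written by the v1 LFP pipeline (sketch)
sgc.IntervalList & {"pipeline": "lfp_v1"}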
Deleting data
Another neat feature of DataJoint is that it automatically maintains
data integrity
with cascading deletes. For example, if we delete our Session
entry, all
associated downstream entries are also deleted (e.g. Raw
, IntervalList
).
Note: The deletion process can be complicated by Merge Tables when the entry is referenced by a part table. To demo deletion in these cases, run the hidden code below.
Quick Merge Insert
import spyglass.lfp as lfp
sgc.FirFilterParameters().create_standard_filters()
lfp.lfp_electrode.LFPElectrodeGroup.create_lfp_electrode_group(
nwb_file_name=nwb_copy_file_name,
group_name="test",
electrode_list=[0],
)
lfp_key = {
"nwb_file_name": nwb_copy_file_name,
"lfp_electrode_group_name": "test",
"target_interval_list_name": "01_s1",
"filter_name": "LFP 0-400 Hz",
"filter_sampling_rate": 30_000,
}
lfp.v1.LFPSelection.insert1(lfp_key, skip_duplicates=True)
lfp.v1.LFPV1().populate(lfp_key)
Deleting Merge Entries
from spyglass.utils.dj_merge_tables import delete_downstream_merge
delete_downstream_merge(
sgc.Nwbfile(),
restriction={"nwb_file_name": nwb_copy_file_name},
dry_run=False, # True will show Merge Table entries that would be deleted
)
session_entry = sgc.Session & {"nwb_file_name": nwb_copy_file_name}
session_entry
nwb_file_name name of the NWB file | subject_id | institution_name | lab_name | session_id | session_description | session_start_time | timestamps_reference_time | experiment_description |
---|---|---|---|---|---|---|---|---|
minirec20230622_.nwb | 54321 | UCSF | Loren Frank Lab | 12345 | test yaml insertion | 2023-06-22 15:59:58 | 1970-01-01 00:00:00 | Test Conversion |
Total: 1
By default, DataJoint is cautious about deletes and will prompt before deleting.
To delete, run the cell below and respond yes
at the prompt.
session_entry.delete()
[2023-09-28 08:29:15,814][INFO]: Deleting 4 rows from `common_behav`.`_raw_position__pos_object`
INFO:datajoint:Deleting 4 rows from `common_behav`.`_raw_position__pos_object`
[2023-09-28 08:29:15,822][INFO]: Deleting 2 rows from `common_behav`.`_raw_position`
INFO:datajoint:Deleting 2 rows from `common_behav`.`_raw_position`
[2023-09-28 08:29:15,834][INFO]: Deleting 4 rows from `common_behav`.`position_source__spatial_series`
INFO:datajoint:Deleting 4 rows from `common_behav`.`position_source__spatial_series`
[2023-09-28 08:29:15,841][INFO]: Deleting 2 rows from `common_behav`.`position_source`
INFO:datajoint:Deleting 2 rows from `common_behav`.`position_source`
[2023-09-28 08:29:15,851][INFO]: Deleting 7 rows from `common_dio`.`_d_i_o_events`
INFO:datajoint:Deleting 7 rows from `common_dio`.`_d_i_o_events`
[2023-09-28 08:29:15,871][INFO]: Deleting 128 rows from `common_ephys`.`_electrode`
INFO:datajoint:Deleting 128 rows from `common_ephys`.`_electrode`
[2023-09-28 08:29:15,879][INFO]: Deleting 1 rows from `common_ephys`.`_electrode_group`
INFO:datajoint:Deleting 1 rows from `common_ephys`.`_electrode_group`
[2023-09-28 08:29:15,887][INFO]: Deleting 1 rows from `common_ephys`.`_raw`
INFO:datajoint:Deleting 1 rows from `common_ephys`.`_raw`
[2023-09-28 08:29:15,896][INFO]: Deleting 1 rows from `common_ephys`.`_sample_count`
INFO:datajoint:Deleting 1 rows from `common_ephys`.`_sample_count`
[2023-09-28 08:29:15,908][INFO]: Deleting 2 rows from `common_behav`.`__position_interval_map`
INFO:datajoint:Deleting 2 rows from `common_behav`.`__position_interval_map`
[2023-09-28 08:29:15,918][INFO]: Deleting 2 rows from `common_task`.`_task_epoch`
INFO:datajoint:Deleting 2 rows from `common_task`.`_task_epoch`
[2023-09-28 08:29:15,924][INFO]: Deleting 5 rows from `common_interval`.`interval_list`
INFO:datajoint:Deleting 5 rows from `common_interval`.`interval_list`
[2023-09-28 08:29:15,931][INFO]: Deleting 1 rows from `common_session`.`_session`
INFO:datajoint:Deleting 1 rows from `common_session`.`_session`
[2023-09-28 08:29:18,765][INFO]: Deletes committed.
INFO:datajoint:Deletes committed.
1
We can check that the delete worked, both for Session
and IntervalList:
sgc.Session & {"nwb_file_name": nwb_copy_file_name}
nwb_file_name name of the NWB file | subject_id | institution_name | lab_name | session_id | session_description | session_start_time | timestamps_reference_time | experiment_description |
---|---|---|---|---|---|---|---|---|
Total: 0
sgc.IntervalList & {"nwb_file_name": nwb_copy_file_name}
nwb_file_name name of the NWB file | interval_list_name descriptive name of this interval list | valid_times numpy array with start and end times for each interval |
---|---|---|
Total: 0
delete
is useful when re-running a step. Editing entries is possible, but
discouraged because it can lead to
integrity
issues. Instead, delete and re-enter, letting the automation handle the rest.
Spyglass falls short, however, in that deleting from Session
doesn't also
delete the associated entry in Nwbfile
, which has to be removed separately
(for now). This table offers a cleanup
method to remove the added files (with
the delete_files
argument set to True
).
Note: this also applies to deleting files from the AnalysisNwbfile
table.
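A sketch of the analogous call, assuming AnalysisNwbfile exposes the same cleanup method and delete_files argument:
# Remove orphaned analysis files (sketch; assumes the same cleanup signature)
sgc.AnalysisNwbfile().cleanup(delete_files=True)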
# Uncomment to delete
# (sgc.Nwbfile & {"nwb_file_name": nwb_copy_file_name}).delete()
[2023-07-18 19:01:15,343][INFO]: Deleting 1 rows from `common_nwbfile`.`nwbfile`
INFO:datajoint:Deleting 1 rows from `common_nwbfile`.`nwbfile`
[2023-07-18 19:01:17,130][INFO]: Deletes committed.
INFO:datajoint:Deletes committed.
1
Note that the file itself (ending with _.nwb
) has not been deleted, even though the entry
was deleted above.
!ls $SPYGLASS_BASE_DIR/raw
minirec20230622.nwb minirec20230622_.nwb montague20200802_tutorial.nwb montague20200802_tutorial_.nwb montague20200802_tutorial__.nwb sub-despereaux_ses-despereaux-08_behavior+ecephys_trim.nwb sub-despereaux_ses-despereaux-08_behavior+ecephys_trim_.nwb tonks20211103_.nwb
We can clean up these files with the cleanup
method.
sgc.Nwbfile().cleanup(delete_files=True)
100%|██████████| 1/1 [00:00<00:00, 304.24it/s]
!ls $SPYGLASS_BASE_DIR/raw
minirec20230622.nwb montague20200802_tutorial.nwb montague20200802_tutorial_.nwb montague20200802_tutorial__.nwb sub-despereaux_ses-despereaux-08_behavior+ecephys_trim.nwb sub-despereaux_ses-despereaux-08_behavior+ecephys_trim_.nwb tonks20211103_.nwb
Up Next
In the next notebook, we'll explore tools for syncing.