Merge Tables¶
Intro¶
Developer Note: if you might make a PR in the future, be sure to copy this notebook and use the gitignore prefix `temp` to avoid future conflicts.
This is one notebook in a multi-part series on Spyglass.
- To set up your Spyglass environment and database, see the Setup notebook
- To insert data, see the Insert Data notebook
- For additional info on DataJoint syntax, including table definitions and inserts, see these additional tutorials
- For information on why we use merge tables, and how to make one, see our documentation
In short, Merge Tables represent the end processing point of a given way of processing the data in our pipelines. Merge Tables allow us to build a new processing pipeline, or a new version of an existing pipeline, without having to drop or migrate the old tables. They allow data to be processed in different ways, but with a unified end result that downstream pipelines can all access.
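For a sense of the structure (the documentation linked above has the authoritative recipe), a Merge Table is a master table keyed by a UUID, with one part table per upstream pipeline. Below is a minimal, hypothetical sketch; the schema and table names are made up for illustration and are not part of any existing Spyglass pipeline.
# Hypothetical sketch of the Merge Table pattern; all names are illustrative.
import datajoint as dj

from spyglass.utils.dj_merge_tables import Merge

schema = dj.schema("temp_merge_demo")  # throwaway schema, using the temp prefix


@schema
class MyPipelineV1(dj.Manual):
    """Stand-in for one upstream way of processing the data."""

    definition = """
    analysis_id: int
    ---
    result: float
    """


@schema
class MyOutput(Merge):
    """Master table: one UUID per entry, whichever pipeline produced it."""

    definition = """
    merge_id: uuid
    ---
    source: varchar(32)
    """

    class MyPipelineV1(dj.Part):
        """One part table per upstream pipeline, pointing back at its source."""

        definition = """
        -> MyOutput
        ---
        -> MyPipelineV1
        """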
Imports¶
Let's start by importing the `spyglass` package, along with a few others.
import os
import datajoint as dj
# change to the upper level folder to detect dj_local_conf.json
if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")
dj.config.load("dj_local_conf.json") # load config for database connection info
# ignore datajoint+jupyter async warnings
import warnings
warnings.simplefilter("ignore", category=DeprecationWarning)
warnings.simplefilter("ignore", category=ResourceWarning)
warnings.simplefilter("ignore", category=UserWarning)
import spyglass.common as sgc
import spyglass.lfp as lfp
from spyglass.utils.nwb_helper_fn import get_nwb_copy_filename
from spyglass.utils.dj_merge_tables import delete_downstream_parts, Merge
from spyglass.common.common_ephys import LFP as CommonLFP # Upstream 1
from spyglass.lfp.lfp_merge import LFPOutput # Merge Table
from spyglass.lfp.v1.lfp import LFPV1 # Upstream 2
[2024-01-29 16:15:00,903][INFO]: Connecting root@localhost:3309 [2024-01-29 16:15:00,912][INFO]: Connected root@localhost:3309
Example data¶
Check to make sure the data inserted in the previous notebook is still there.
nwb_file_name = "minirec20230622.nwb"
nwb_copy_file_name = get_nwb_copy_filename(nwb_file_name)
nwb_file_dict = {"nwb_file_name": nwb_copy_file_name}
sgc.Session & nwb_file_dict
| nwb_file_name | subject_id | institution_name | lab_name | session_id | session_description | session_start_time | timestamps_reference_time | experiment_description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| minirec20230622_.nwb | 54321 | UCSF | Loren Frank Lab | 12345 | test yaml insertion | 2023-06-22 15:59:58 | 1970-01-01 00:00:00 | Test Conversion |
Total: 1
If you haven't already done so, insert data into a Merge Table.
Note: Some existing parents of Merge Tables perform the Merge Table insert as part of the populate methods. This practice will be revised in the future.
sgc.FirFilterParameters().create_standard_filters()
lfp.lfp_electrode.LFPElectrodeGroup.create_lfp_electrode_group(
nwb_file_name=nwb_copy_file_name,
group_name="test",
electrode_list=[0],
)
lfp_key = {
"nwb_file_name": nwb_copy_file_name,
"lfp_electrode_group_name": "test",
"target_interval_list_name": "01_s1",
"filter_name": "LFP 0-400 Hz",
"filter_sampling_rate": 30_000,
}
lfp.v1.LFPSelection.insert1(lfp_key, skip_duplicates=True)
lfp.v1.LFPV1().populate(lfp_key)
LFPOutput.insert([lfp_key], skip_duplicates=True)
Helper functions¶
Merge Tables have multiple custom methods that begin with `merge`. `help` can show us the docstring of each.
merge_methods = [d for d in dir(Merge) if d.startswith("merge")]
print(merge_methods)
['merge_delete', 'merge_delete_parent', 'merge_fetch', 'merge_get_parent', 'merge_get_parent_class', 'merge_get_part', 'merge_html', 'merge_populate', 'merge_restrict', 'merge_restrict_class', 'merge_view']
help(getattr(Merge, merge_methods[-1]))
Help on method merge_view in module spyglass.utils.dj_merge_tables:

merge_view(restriction: str = True) method of datajoint.user_tables.TableMeta instance
    Prints merged view, including null entries for unique columns.

    Note: To handle this Union as a table-like object, use `merge_restrict`

    Parameters
    ----------
    restriction: str, optional
        Restriction to apply to the merged view
Showing data¶
`merge_view` shows a union of the master and all part tables.

Note: Restrict Merge Tables with arguments, not the `&` operator.

- Normally: `Table & "field='value'"`
- Instead: `MergeTable.merge_view(restriction="field='value'")`
LFPOutput.merge_view()
*merge_id *source *nwb_file_name *lfp_electrode *target_interv *filter_name *filter_sampli +------------+ +--------+ +------------+ +------------+ +------------+ +------------+ +------------+ c34f98c5-7de7- LFPV1 minirec2023062 test 01_s1 LFP 0-400 Hz 30000 (Total: 1)
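To apply a restriction, pass it as the `restriction` argument (as in the docstring above) rather than using the `&` operator. For example, restricting the view to our file:
# Restrict the merged view by passing the restriction as an argument, not with `&`
LFPOutput.merge_view(restriction=f"nwb_file_name = '{nwb_copy_file_name}'")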
UUIDs help retain unique entries across all part tables. We can fetch the NWB file by referencing this or other features.
uuid_key = (LFPOutput & nwb_file_dict).fetch(limit=1, as_dict=True)[-1]
restrict = LFPOutput & uuid_key
restrict
| merge_id | source |
| --- | --- |
| c34f98c5-7de7-1daf-6eaf-1e15981def44 | LFPV1 |
Total: 1
result1 = restrict.fetch_nwb(restrict.fetch1("KEY"))
result1
[{'nwb_file_name': 'minirec20230622_.nwb', 'lfp_electrode_group_name': 'test', 'target_interval_list_name': '01_s1', 'filter_name': 'LFP 0-400 Hz', 'filter_sampling_rate': 30000, 'analysis_file_name': 'minirec20230622_R5DWQ6S53S.nwb', 'interval_list_name': 'lfp_test_01_s1_valid times', 'lfp_object_id': 'ffb893d1-a31e-41d3-aec7-8dc8936c8898', 'lfp_sampling_rate': 1000.0, 'lfp': filtered data pynwb.ecephys.ElectricalSeries at 0x129602752674544 Fields: comments: no comments conversion: 1.0 data: <HDF5 dataset "data": shape (10476, 1), type "<i2"> description: filtered data electrodes: electrodes <class 'hdmf.common.table.DynamicTableRegion'> interval: 1 offset: 0.0 resolution: -1.0 timestamps: <HDF5 dataset "timestamps": shape (10476,), type "<f8"> timestamps_unit: seconds unit: volts}]
nwb_key = LFPOutput.merge_restrict(nwb_file_dict).fetch(as_dict=True)[0]
nwb_key
{'merge_id': UUID('c34f98c5-7de7-1daf-6eaf-1e15981def44'), 'source': 'LFPV1', 'nwb_file_name': 'minirec20230622_.nwb', 'lfp_electrode_group_name': 'test', 'target_interval_list_name': '01_s1', 'filter_name': 'LFP 0-400 Hz', 'filter_sampling_rate': 30000}
result2 = LFPOutput().fetch_nwb(nwb_key)
result2 == result1
True
Selecting data¶
There are also functions for retrieving part/parent table(s) and fetching data.
These `get` functions will either return the part table of the Merge Table or the parent table with the source information for that part.
result4 = LFPOutput.merge_get_part(restriction=nwb_file_dict, join_master=True)
result4
| merge_id | source | nwb_file_name | lfp_electrode_group_name | target_interval_list_name | filter_name | filter_sampling_rate |
| --- | --- | --- | --- | --- | --- | --- |
| c34f98c5-7de7-1daf-6eaf-1e15981def44 | LFPV1 | minirec20230622_.nwb | test | 01_s1 | LFP 0-400 Hz | 30000 |
Total: 1
result5 = LFPOutput.merge_get_parent(restriction='nwb_file_name LIKE "mini%"')
result5
| nwb_file_name | lfp_electrode_group_name | target_interval_list_name | filter_name | filter_sampling_rate | analysis_file_name | interval_list_name | lfp_object_id | lfp_sampling_rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| minirec20230622_.nwb | test | 01_s1 | LFP 0-400 Hz | 30000 | minirec20230622_R5DWQ6S53S.nwb | lfp_test_01_s1_valid times | ffb893d1-a31e-41d3-aec7-8dc8936c8898 | 1000.0 |
Total: 1
`fetch` will collect all relevant entries and return them as a list in the format specified by keyword arguments and one's DataJoint config.
result6 = result5.fetch("lfp_sampling_rate") # Sample rate for all mini* files
result6
array([1000.])
`merge_fetch` requires a restriction as the first argument. For no restriction, use `True`.
result7 = LFPOutput.merge_fetch(True, "filter_name", "nwb_file_name")
result7
[array(['LFP 0-400 Hz'], dtype=object), array(['minirec20230622_.nwb'], dtype=object)]
result8 = LFPOutput.merge_fetch(as_dict=True)
result8
{'merge_id': UUID('c34f98c5-7de7-1daf-6eaf-1e15981def44'), 'nwb_file_name': 'minirec20230622_.nwb', 'lfp_electrode_group_name': 'test', 'target_interval_list_name': '01_s1', 'filter_name': 'LFP 0-400 Hz', 'filter_sampling_rate': 30000}
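To fetch only matching entries, pass a restriction in place of `True`. A sketch, assuming a dict restriction is accepted here as it is elsewhere in DataJoint:
# Fetch filter names only for entries matching this NWB file
LFPOutput.merge_fetch(nwb_file_dict, "filter_name")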
Deletion from Merge Tables¶
When deleting from Merge Tables, we can either...

- delete from the Merge Table itself with `merge_delete`, deleting both the master and part.
- use `merge_delete_parent` to delete from the parent sources, getting rid of the entries in the source table they came from.
- use `delete_downstream_parts` to find downstream part tables, like Merge Tables, and get rid of full entries, avoiding orphaned master table entries.
The latter two cases can be destructive, so we include an extra layer of protection with `dry_run`. When true (by default), these functions return a list of tables with the entries that would otherwise be deleted.
LFPOutput.merge_delete(nwb_file_dict) # Delete from merge table
[2024-01-29 16:15:23,054][INFO]: Deleting 1 rows from `lfp_merge`.`l_f_p_output__l_f_p_v1` [2024-01-29 16:15:23,058][INFO]: Deleting 1 rows from `lfp_merge`.`l_f_p_output`
[2024-01-29 16:15:24,953][WARNING]: Deletes cancelled
LFPOutput.merge_delete_parent(restriction=nwb_file_dict, dry_run=True)
[FreeTable(`lfp_v1`.`__l_f_p_v1`) *nwb_file_name *lfp_electrode *target_interv *filter_name *filter_sampli analysis_file_ interval_list_ lfp_object_id lfp_sampling_r +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ minirec2023062 test 01_s1 LFP 0-400 Hz 30000 minirec2023062 lfp_test_01_s1 ffb893d1-a31e- 1000.0 (Total: 1)]
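Once you've reviewed the dry-run output and are sure you want to remove these entries, the same call can be repeated with `dry_run=False` to actually perform the delete. A sketch, left commented out because it permanently removes data:
# After confirming the dry run above, delete from the parent source table
# LFPOutput.merge_delete_parent(restriction=nwb_file_dict, dry_run=False)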
`delete_downstream_parts` is available from any other table in the pipeline, but it does take some time to find the links downstream. If you're using this, you can save time by reassigning your table to a variable, which will preserve a copy of the previous search.

Because the copy is stored, this function may not see additional merge tables you've imported. To refresh this copy, set `reload_cache=True`.
nwbfile = sgc.Nwbfile()
(nwbfile & nwb_file_dict).delete_downstream_parts(
dry_run=True,
reload_cache=False, # if still encountering errors, try setting this to True
)
[16:15:37][INFO] Spyglass: Building merge cache for nwbfile. Found 3 downstream merge tables
dict_values([[*nwb_file_name *analysis_file *lfp_electrode *target_interv *filter_name *filter_sampli *merge_id nwb_file_a analysis_f analysis_file_ analysis_p interval_list_ lfp_object_id lfp_sampling_r +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +--------+ +--------+ +------------+ +--------+ +------------+ +------------+ +------------+ minirec2023062 minirec2023062 test 01_s1 LFP 0-400 Hz 30000 c34f98c5-7de7- =BLOB= =BLOB= =BLOB= lfp_test_01_s1 ffb893d1-a31e- 1000.0 (Total: 1) ]])
This function is run automatically when you use `cautious_delete`, which checks team permissions before deleting.
(nwbfile & nwb_file_dict).cautious_delete()
Up Next¶
In the next notebook, we'll start working with ephys data with spike sorting.