Merge Tables¶
Intro¶
Developer Note: if you might make a PR in the future, be sure to copy this notebook and use the gitignore prefix `temp` to avoid future conflicts.
This is one notebook in a multi-part series on Spyglass.
- To set up your Spyglass environment and database, see the Setup notebook
- To insert data, see the Insert Data notebook
- For additional info on DataJoint syntax, including table definitions and inserts, see these additional tutorials
- For information on why we use merge tables, and how to make one, see our documentation
In short, Merge Tables represent the end processing point of a given way of processing the data in our pipelines. Merge Tables allow us to build a new processing pipeline, or a new version of an existing pipeline, without having to drop or migrate the old tables. They allow data to be processed in different ways, but with a unified end result that downstream pipelines can all access.
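For a sense of the structure (the documentation linked above has the authoritative recipe), a Merge Table is a master table keyed by a UUID, with one part table per upstream pipeline. Below is a minimal, hypothetical sketch; the schema and table names are made up for illustration and are not part of any existing Spyglass pipeline.
# Hypothetical sketch of the Merge Table pattern; all names are illustrative.
import datajoint as dj

from spyglass.utils.dj_merge_tables import Merge

schema = dj.schema("temp_merge_demo")  # throwaway schema, using the temp prefix


@schema
class MyPipelineV1(dj.Manual):
    """Stand-in for one upstream way of processing the data."""

    definition = """
    analysis_id: int
    ---
    result: float
    """


@schema
class MyOutput(Merge):
    """Master table: one UUID per entry, whichever pipeline produced it."""

    definition = """
    merge_id: uuid
    ---
    source: varchar(32)
    """

    class MyPipelineV1(dj.Part):
        """One part table per upstream pipeline, pointing back at its source."""

        definition = """
        -> MyOutput
        ---
        -> MyPipelineV1
        """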
Imports¶
Let's start by importing the `spyglass` package, along with a few others.
import os
import datajoint as dj
# change to the upper level folder to detect dj_local_conf.json
if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")
dj.config.load("dj_local_conf.json") # load config for database connection info
# ignore datajoint+jupyter async warnings
import warnings
warnings.simplefilter("ignore", category=DeprecationWarning)
warnings.simplefilter("ignore", category=ResourceWarning)
warnings.simplefilter("ignore", category=UserWarning)
import spyglass.common as sgc
import spyglass.lfp as lfp
from spyglass.utils.nwb_helper_fn import get_nwb_copy_filename
from spyglass.utils.dj_merge_tables import delete_downstream_parts, Merge
from spyglass.common.common_ephys import LFP as CommonLFP # Upstream 1
from spyglass.lfp.lfp_merge import LFPOutput # Merge Table
from spyglass.lfp.v1.lfp import LFPV1 # Upstream 2
[2024-01-29 16:15:00,903][INFO]: Connecting root@localhost:3309 [2024-01-29 16:15:00,912][INFO]: Connected root@localhost:3309
Example data¶
Check to make sure the data inserted in the previous notebook is still there.
nwb_file_name = "minirec20230622.nwb"
nwb_copy_file_name = get_nwb_copy_filename(nwb_file_name)
nwb_file_dict = {"nwb_file_name": nwb_copy_file_name}
sgc.Session & nwb_file_dict
| nwb_file_name | subject_id | institution_name | lab_name | session_id | session_description | session_start_time | timestamps_reference_time | experiment_description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| minirec20230622_.nwb | 54321 | UCSF | Loren Frank Lab | 12345 | test yaml insertion | 2023-06-22 15:59:58 | 1970-01-01 00:00:00 | Test Conversion |
Total: 1
If you haven't already done so, insert data into a Merge Table.
Note: Some existing parents of Merge Tables perform the Merge Table insert as part of the populate methods. This practice will be revised in the future.
sgc.FirFilterParameters().create_standard_filters()
lfp.lfp_electrode.LFPElectrodeGroup.create_lfp_electrode_group(
nwb_file_name=nwb_copy_file_name,
group_name="test",
electrode_list=[0],
)
lfp_key = {
"nwb_file_name": nwb_copy_file_name,
"lfp_electrode_group_name": "test",
"target_interval_list_name": "01_s1",
"filter_name": "LFP 0-400 Hz",
"filter_sampling_rate": 30_000,
}
lfp.v1.LFPSelection.insert1(lfp_key, skip_duplicates=True)
lfp.v1.LFPV1().populate(lfp_key)
LFPOutput.insert([lfp_key], skip_duplicates=True)
Helper functions¶
Merge Tables have multiple custom methods that begin with `merge`. `help` can show us the docstring of each.
merge_methods = [d for d in dir(Merge) if d.startswith("merge")]
print(merge_methods)
['merge_delete', 'merge_delete_parent', 'merge_fetch', 'merge_get_parent', 'merge_get_parent_class', 'merge_get_part', 'merge_html', 'merge_populate', 'merge_restrict', 'merge_restrict_class', 'merge_view']
help(getattr(Merge, merge_methods[-1]))
Help on method merge_view in module spyglass.utils.dj_merge_tables:

merge_view(restriction: str = True) method of datajoint.user_tables.TableMeta instance
    Prints merged view, including null entries for unique columns.

    Note: To handle this Union as a table-like object, use `merge_restrict`

    Parameters
    ----------
    restriction: str, optional
        Restriction to apply to the merged view
Showing data¶
`merge_view` shows a union of the master and all part tables.

Note: Restrict Merge Tables with arguments, not the `&` operator.

- Normally: `Table & "field='value'"`
- Instead: `MergeTable.merge_view(restriction="field='value'")`
LFPOutput.merge_view()
*merge_id *source *nwb_file_name *lfp_electrode *target_interv *filter_name *filter_sampli +------------+ +--------+ +------------+ +------------+ +------------+ +------------+ +------------+ c34f98c5-7de7- LFPV1 minirec2023062 test 01_s1 LFP 0-400 Hz 30000 (Total: 1)
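To apply a restriction, pass it as the `restriction` argument (as in the docstring above) rather than using the `&` operator. For example, restricting the view to our file:
# Restrict the merged view by passing the restriction as an argument, not with `&`
LFPOutput.merge_view(restriction=f"nwb_file_name = '{nwb_copy_file_name}'")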
UUIDs help retain unique entries across all part tables. We can fetch the NWB file by referencing this or other features.
uuid_key = (LFPOutput & nwb_file_dict).fetch(limit=1, as_dict=True)[-1]
restrict = LFPOutput & uuid_key
restrict
| merge_id | source |
| --- | --- |
| c34f98c5-7de7-1daf-6eaf-1e15981def44 | LFPV1 |
Total: 1
result1 = restrict.fetch_nwb(restrict.fetch1("KEY"))
result1
[{'nwb_file_name': 'minirec20230622_.nwb', 'lfp_electrode_group_name': 'test', 'target_interval_list_name': '01_s1', 'filter_name': 'LFP 0-400 Hz', 'filter_sampling_rate': 30000, 'analysis_file_name': 'minirec20230622_R5DWQ6S53S.nwb', 'interval_list_name': 'lfp_test_01_s1_valid times', 'lfp_object_id': 'ffb893d1-a31e-41d3-aec7-8dc8936c8898', 'lfp_sampling_rate': 1000.0, 'lfp': filtered data pynwb.ecephys.ElectricalSeries at 0x129602752674544 Fields: comments: no comments conversion: 1.0 data: <HDF5 dataset "data": shape (10476, 1), type "<i2"> description: filtered data electrodes: electrodes <class 'hdmf.common.table.DynamicTableRegion'> interval: 1 offset: 0.0 resolution: -1.0 timestamps: <HDF5 dataset "timestamps": shape (10476,), type "<f8"> timestamps_unit: seconds unit: volts}]
nwb_key = LFPOutput.merge_restrict(nwb_file_dict).fetch(as_dict=True)[0]
nwb_key
{'merge_id': UUID('c34f98c5-7de7-1daf-6eaf-1e15981def44'), 'source': 'LFPV1', 'nwb_file_name': 'minirec20230622_.nwb', 'lfp_electrode_group_name': 'test', 'target_interval_list_name': '01_s1', 'filter_name': 'LFP 0-400 Hz', 'filter_sampling_rate': 30000}
result2 = LFPOutput().fetch_nwb(nwb_key)
result2 == result1
True
Selecting data¶
There are also functions for retrieving part/parent table(s) and fetching data.
These `get` functions will either return the part table of the Merge Table or the parent table with the source information for that part.
result4 = LFPOutput.merge_get_part(restriction=nwb_file_dict, join_master=True)
result4
| merge_id | source | nwb_file_name | lfp_electrode_group_name | target_interval_list_name | filter_name | filter_sampling_rate |
| --- | --- | --- | --- | --- | --- | --- |
| c34f98c5-7de7-1daf-6eaf-1e15981def44 | LFPV1 | minirec20230622_.nwb | test | 01_s1 | LFP 0-400 Hz | 30000 |
Total: 1
result5 = LFPOutput.merge_get_parent(restriction='nwb_file_name LIKE "mini%"')
result5
| nwb_file_name | lfp_electrode_group_name | target_interval_list_name | filter_name | filter_sampling_rate | analysis_file_name | interval_list_name | lfp_object_id | lfp_sampling_rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| minirec20230622_.nwb | test | 01_s1 | LFP 0-400 Hz | 30000 | minirec20230622_R5DWQ6S53S.nwb | lfp_test_01_s1_valid times | ffb893d1-a31e-41d3-aec7-8dc8936c8898 | 1000.0 |
Total: 1
`fetch` will collect all relevant entries and return them as a list in the format specified by keyword arguments and one's DataJoint config.
result6 = result5.fetch("lfp_sampling_rate") # Sample rate for all mini* files
result6
array([1000.])
`merge_fetch` requires a restriction as the first argument. For no restriction, use `True`.
result7 = LFPOutput.merge_fetch(True, "filter_name", "nwb_file_name")
result7
[array(['LFP 0-400 Hz'], dtype=object), array(['minirec20230622_.nwb'], dtype=object)]
result8 = LFPOutput.merge_fetch(as_dict=True)
result8
{'merge_id': UUID('c34f98c5-7de7-1daf-6eaf-1e15981def44'), 'nwb_file_name': 'minirec20230622_.nwb', 'lfp_electrode_group_name': 'test', 'target_interval_list_name': '01_s1', 'filter_name': 'LFP 0-400 Hz', 'filter_sampling_rate': 30000}
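To fetch only matching entries, pass a restriction in place of `True`. A sketch, assuming a dict restriction is accepted here as it is elsewhere in DataJoint:
# Fetch filter names only for entries matching this NWB file
LFPOutput.merge_fetch(nwb_file_dict, "filter_name")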
Deletion from Merge Tables¶
When deleting from Merge Tables, we can either...

- delete from the Merge Table itself with `merge_delete`, deleting both the master and part.
- use `merge_delete_parent` to delete from the parent sources, getting rid of the entries in the source table they came from.
- use `delete_downstream_parts` to find downstream part tables, like Merge Tables, and get rid of full entries, avoiding orphaned master table entries.
The latter two cases can be destructive, so we include an extra layer of protection with `dry_run`. When true (by default), these functions return a list of tables with the entries that would otherwise be deleted.
LFPOutput.merge_delete(nwb_file_dict) # Delete from merge table
[2024-01-29 16:15:23,054][INFO]: Deleting 1 rows from `lfp_merge`.`l_f_p_output__l_f_p_v1` [2024-01-29 16:15:23,058][INFO]: Deleting 1 rows from `lfp_merge`.`l_f_p_output`
[2024-01-29 16:15:24,953][WARNING]: Deletes cancelled
LFPOutput.merge_delete_parent(restriction=nwb_file_dict, dry_run=True)
[FreeTable(`lfp_v1`.`__l_f_p_v1`) *nwb_file_name *lfp_electrode *target_interv *filter_name *filter_sampli analysis_file_ interval_list_ lfp_object_id lfp_sampling_r +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ minirec2023062 test 01_s1 LFP 0-400 Hz 30000 minirec2023062 lfp_test_01_s1 ffb893d1-a31e- 1000.0 (Total: 1)]
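Once you've reviewed the dry-run output and are sure you want to remove these entries, the same call can be repeated with `dry_run=False` to actually perform the delete. A sketch, left commented out because it permanently removes data:
# After confirming the dry run above, delete from the parent source table
# LFPOutput.merge_delete_parent(restriction=nwb_file_dict, dry_run=False)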
`delete_downstream_parts` is available from any other table in the pipeline, but it does take some time to find the links downstream. If you're using this, you can save time by reassigning your table to a variable, which will preserve a copy of the previous search.

Because the copy is stored, this function may not see additional merge tables you've imported. To refresh this copy, set `reload_cache=True`.
nwbfile = sgc.Nwbfile()
(nwbfile & nwb_file_dict).delete_downstream_parts(
dry_run=True,
reload_cache=False, # if still encountering errors, try setting this to True
)
[16:15:37][INFO] Spyglass: Building merge cache for nwbfile. Found 3 downstream merge tables
dict_values([[*nwb_file_name *analysis_file *lfp_electrode *target_interv *filter_name *filter_sampli *merge_id nwb_file_a analysis_f analysis_file_ analysis_p interval_list_ lfp_object_id lfp_sampling_r +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +------------+ +--------+ +--------+ +------------+ +--------+ +------------+ +------------+ +------------+ minirec2023062 minirec2023062 test 01_s1 LFP 0-400 Hz 30000 c34f98c5-7de7- =BLOB= =BLOB= =BLOB= lfp_test_01_s1 ffb893d1-a31e- 1000.0 (Total: 1) ]])
This function is run automatically when you use `cautious_delete`, which checks team permissions before deleting.
(nwbfile & nwb_file_dict).cautious_delete()
Up Next¶
In the next notebook, we'll start working with ephys data with spike sorting.