Sync Data¶

Overview¶

This notebook will cover ...

General Kachery information
Setting up Kachery as a host. If you'll use an existing host, skip this.
Setting up Kachery in your database. If you're using an existing database, skip this.
Adding Kachery data.

Imports¶

Developer Note: if you may make a PR in the future, be sure to copy this notebook, and use the gitignore prefix temp to avoid future conflicts.

This is one notebook in a multi-part series on Spyglass.

To set up your Spyglass environment and database, see the Setup notebook
To fully demonstrate syncing features, we'll need to run some basic analyses. This can either be done with code in this notebook or by running another notebook (e.g., LFP)
For additional info on DataJoint syntax, including table definitions and inserts, see these additional tutorials

Let's start by importing the spyglass package and testing that your environment is properly configured for Kachery sharing.

If you haven't already done so, be sure to set up your Spyglass base directory and Kachery sharing directory with Setup.

In [ ]:

import os
import datajoint as dj
import pandas as pd

# change to the upper level folder to detect dj_local_conf.json
if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")
dj.config.load("dj_local_conf.json")  # load config for database connection

import spyglass.common as sgc
import spyglass.sharing as sgs
from spyglass.settings import config

import warnings

warnings.filterwarnings("ignore")

[2023-12-22 08:22:32,189][INFO]: Connecting sambray@lmf-db.cin.ucsf.edu:3306
[2023-12-22 08:22:32,244][INFO]: Connected sambray@lmf-db.cin.ucsf.edu:3306

For example analysis files, run the code hidden below.

Quick Analysis

from spyglass.utils.nwb_helper_fn import get_nwb_copy_filename
import spyglass.data_import as sgi
import spyglass.lfp as lfp

nwb_file_name = "minirec20230622.nwb"
nwb_copy_file_name = get_nwb_copy_filename(nwb_file_name)

sgi.insert_sessions(nwb_file_name)
sgc.FirFilterParameters().create_standard_filters()
lfp.lfp_electrode.LFPElectrodeGroup.create_lfp_electrode_group(
    nwb_file_name=nwb_copy_file_name,
    group_name="test",
    electrode_list=[0],
)
lfp.v1.LFPSelection.insert1(
    {
        "nwb_file_name": nwb_copy_file_name,
        "lfp_electrode_group_name": "test",
        "target_interval_list_name": "01_s1",
        "filter_name": "LFP 0-400 Hz",
        "filter_sampling_rate": 30_000,
    },
    skip_duplicates=True,
)
lfp.v1.LFPV1().populate()

Kachery¶

Cloud¶

This notebook contains instructions for setting up data sharing/syncing through Kachery Cloud, which makes it possible to share analysis results, stored in NWB files. When a user tries to access a file, Spyglass does the following:

Try to load from the local file system/store.
If unavailable, check if it is in the relevant sharing table (i.e., NwbKachery or AnalysisNwbfileKachery).
If present, attempt to download from the associated Kachery Resource to the user's spyglass analysis directory.
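
The lookup order above can be sketched in plain Python. This is a minimal illustration, not Spyglass's actual implementation: `resolve_analysis_file`, the `shared_uris` mapping (standing in for the sharing table), and the `download` callable (standing in for the Kachery download) are all hypothetical names.

```python
from pathlib import Path
from typing import Callable, Mapping, Optional


def resolve_analysis_file(
    file_name: str,
    local_dir: Path,
    shared_uris: Mapping[str, str],
    download: Callable[[str, Path], None],
) -> Optional[Path]:
    """Return a local path for file_name, downloading it only if needed.

    shared_uris and download are hypothetical stand-ins for the sharing
    table and the Kachery download step.
    """
    local_path = local_dir / file_name
    if local_path.exists():  # 1. already in the local store
        return local_path
    uri = shared_uris.get(file_name)  # 2. is the file listed as shared?
    if uri is None:
        return None  # neither local nor shared
    download(uri, local_path)  # 3. fetch into the analysis directory
    return local_path if local_path.exists() else None
```

The download step only runs when the local check fails, so files already on disk never trigger network traffic.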

Note: large file downloads may take a long time, so downloading raw data is not supported. We suggest direct transfer with globus or a similar service.

Zone¶

A Kachery Zone is a cloud storage host. The Frank laboratory has three separate Kachery zones:

franklab.default: Internal file sharing, including figurls.
franklab.collaborators: File sharing with collaborating labs.
franklab.public: Public file sharing (not yet active).

Setting your zone can be done either as an environment variable or as an item in a DataJoint config. Spyglass will automatically handle setting the appropriate zone when downloading database files through Kachery.

Environment variable:

export KACHERY_ZONE=franklab.default
export KACHERY_CLOUD_DIR=/stelmo/nwb/.kachery-cloud

DataJoint Config:

"custom": {
   "kachery_zone": "franklab.default",
   "kachery_dirs": {
      "cloud": "/your/base/path/.kachery-cloud"
   }
}
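
As a rough illustration of how the two settings could be combined, the sketch below reads the zone from a config dictionary shaped like the one above and falls back to the environment variable. The function name `resolve_kachery_zone` is hypothetical, and the precedence (config first, then environment) is an assumption made for this example; check the Spyglass settings module for the actual behavior.

```python
import os
from typing import Mapping, Optional


def resolve_kachery_zone(
    dj_config: Mapping,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick a zone name from a DataJoint-style config or the environment.

    Assumption: the "custom" config entry wins over KACHERY_ZONE.
    """
    custom = dj_config.get("custom") or {}
    return custom.get("kachery_zone") or environ.get("KACHERY_ZONE")


# Example with plain dictionaries standing in for dj.config and os.environ
example_config = {"custom": {"kachery_zone": "franklab.default"}}
zone = resolve_kachery_zone(example_config, {"KACHERY_ZONE": "franklab.collaborators"})
```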

Host Setup¶

If you are a member of a team with a pre-existing database and zone who will be sharing data, please skip to Sharing Data
If you are a collaborator outside your team's network and need to access files shared with you, please skip to Accessing Shared Data

Zones¶

See instructions for setting up new Kachery Zones, including creating a cloud bucket and registering it with the Kachery team.

Notes:

Bucket names cannot include periods, so we substitute a dash, as in franklab-default.
You only need to create an API token for your first zone.
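
The naming rule in the first note can be written as a one-line helper. `zone_to_bucket_name` is a hypothetical name used here for illustration only.

```python
def zone_to_bucket_name(zone_name: str) -> str:
    """Replace periods with dashes, since bucket names cannot include periods."""
    return zone_name.replace(".", "-")


bucket = zone_to_bucket_name("franklab.default")
```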

Resources¶

See instructions for setting up zone resources. This allows for sharing files on demand. We suggest using the same name for the zone and resource.

Note: For each zone, you need to run the local daemon that listens for requests from that zone and uploads data to the bucket for client download when requested. An example of the bash script we use is:

export KACHERY_ZONE=franklab.collaborators
export KACHERY_CLOUD_DIR=/stelmo/nwb/.kachery-cloud
cd /stelmo/nwb/franklab_collaborators_resource
npx kachery-resource@latest share

For convenience, we recommend saving this code as a bash script which can be executed by the local daemon. For franklab members, these scripts can be found in the directory /home/loren/bin/:

run_restart_kachery_collab.sh
run_restart_kachery_default.sh

Database Setup¶

Once you have a hosted zone running, we need to add its information to the Spyglass database. This will allow spyglass to manage linking files from our analysis tables to kachery. First, we'll check existing Zones.

In [2]:

sgs.KacheryZone()

Out[2]:

kachery_zone_name (the name of the kachery zone; same as the name of the kachery resource)	description (description of this zone)	kachery_cloud_dir (kachery cloud directory on local machine where files are linked)	kachery_proxy (kachery sharing proxy)	lab_name
franklab.collaborators	franklab collaborator zone	/stelmo/nwb/.kachery-cloud	https://kachery-resource-proxy.herokuapp.com	Loren Frank
franklab.default	internal franklab kachery zone	/stelmo/nwb/.kachery-cloud	https://kachery-resource-proxy.herokuapp.com	Loren Frank

Total: 2

To add a new hosted Zone, we need to prepare an entry for the KacheryZone table. Note that the kachery_cloud_dir key should be the path for the server daemon hosting the zone, and is not required to be present on the client machine:

In [38]:

zone_name = config.get("KACHERY_ZONE")
cloud_dir = config.get("KACHERY_CLOUD_DIR")

zone_key = {
    "kachery_zone_name": zone_name,
    "description": " ".join(zone_name.split(".")) + " zone",
    "kachery_cloud_dir": cloud_dir,
    "kachery_proxy": "https://kachery-resource-proxy.herokuapp.com",
    "lab_name": sgc.Lab.fetch("lab_name", limit=1)[0],
}
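
If either config value is missing, `config.get` returns `None` and the `zone_name.split(".")` call above fails with an unhelpful `AttributeError`. One possible guard is sketched below; `build_zone_key` is a hypothetical helper, not part of Spyglass, and it simply reproduces the dictionary above with explicit checks.

```python
from typing import Optional

DEFAULT_PROXY = "https://kachery-resource-proxy.herokuapp.com"


def build_zone_key(
    zone_name: Optional[str],
    cloud_dir: Optional[str],
    lab_name: str,
    proxy: str = DEFAULT_PROXY,
) -> dict:
    """Build a KacheryZone entry, failing early on missing settings."""
    if not zone_name:
        raise ValueError("KACHERY_ZONE is not set in the config")
    if not cloud_dir:
        raise ValueError("KACHERY_CLOUD_DIR is not set in the config")
    return {
        "kachery_zone_name": zone_name,
        # same result as " ".join(zone_name.split(".")) + " zone"
        "description": zone_name.replace(".", " ") + " zone",
        "kachery_cloud_dir": cloud_dir,
        "kachery_proxy": proxy,
        "lab_name": lab_name,
    }


example_key = build_zone_key("franklab.default", "/stelmo/nwb/.kachery-cloud", "Loren Frank")
```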

Use caution when inserting into an active database, as it could interfere with ongoing work.

In [39]:

sgs.KacheryZone().insert1(zone_key)

Sharing Data¶

Once the zone exists, we can add AnalysisNWB files we want to share with members of the zone.

The AnalysisNwbfileKachery table links analysis files made within other Spyglass tables with a uri used by Kachery. We can view files already made available through Kachery here:

In [8]:

sgs.AnalysisNwbfileKachery()

Out[8]:

kachery_zone_name (the name of the kachery zone; same as the name of the kachery resource)	analysis_file_name (name of the file)	analysis_file_uri (the uri of the file)
franklab.collaborators	Banner20220224_18NJSA2B42.nwb	sha1://562b488936e5288eb89e7c480ae5c10b31c9cf2f
franklab.collaborators	Frodo20230810_0F936W4B9Z.nwb	sha1://b38d2b0fc1e9cde91cc239e1a0b50e3211b976fc
franklab.collaborators	Frodo20230810_2MJ374GSJX.nwb	sha1://ca9c238b83fd8539658a5100a9770a459a539771
franklab.collaborators	Frodo20230810_4L35OWMGHQ.nwb	sha1://a8452cf8cf6e596b44569eb9189612d2dcd4c7d6
franklab.collaborators	Frodo20230810_63PWL1N0VS.nwb	sha1://ca9c238b83fd8539658a5100a9770a459a539771
franklab.collaborators	Frodo20230810_7LYW2MK0C9.nwb	sha1://ca9c238b83fd8539658a5100a9770a459a539771
franklab.collaborators	Frodo20230810_998JNA1VBF.nwb	sha1://aa0e06028d52f5195cf24d61922ace233d8da783
franklab.collaborators	Frodo20230810_CFKWZTGXX0.nwb	sha1://ca9c238b83fd8539658a5100a9770a459a539771
franklab.collaborators	Frodo20230810_GMCOCDSJ54.nwb	sha1://2889b68d7aa2b30561e62be519c19759facad2d3
franklab.collaborators	Frodo20230810_I25NQSZQ5O.nwb	sha1://973ea71d97aef91e050117bf860ea2ed83950b10
franklab.collaborators	Frodo20230810_JS06HC1RLC.nwb	sha1://088a345c5eadfa3adea021de3f158aa86a527d4e
franklab.collaborators	Frodo20230810_KEEEEBDUNE.nwb	sha1://4aa3199011b1405e745bbe96b62b825cd93bdacd

...

Total: 298

We can share additional results by populating new entries in this table.

To do so we first add these entries to the AnalysisNwbfileKacherySelection table.

Note: This step depends on having previously run an analysis on the example file.

In [40]:

nwb_copy_filename = "minirec20230622_.nwb"

analysis_file_list = (  # Grab all analysis files for this nwb file
    sgc.AnalysisNwbfile() & {"nwb_file_name": nwb_copy_filename}
).fetch("analysis_file_name")

kachery_selection_key = {"kachery_zone_name": zone_name}

for file in analysis_file_list:  # Add all analysis to shared list
    kachery_selection_key["analysis_file_name"] = file
    sgs.AnalysisNwbfileKacherySelection.insert1(
        kachery_selection_key, skip_duplicates=True
    )
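
The loop above reuses one dictionary and inserts it immediately on each pass. If you would rather collect all keys first, each key needs to be its own dictionary. A minimal sketch, where `build_selection_keys` is a hypothetical helper and the file names are made up:

```python
from typing import Dict, Iterable, List


def build_selection_keys(
    zone_name: str, analysis_files: Iterable[str]
) -> List[Dict[str, str]]:
    """Return one independent key per analysis file for the selection table."""
    return [
        {"kachery_zone_name": zone_name, "analysis_file_name": name}
        for name in analysis_files
    ]


keys = build_selection_keys(
    "franklab.collaborators",
    ["example_A.nwb", "example_B.nwb"],  # illustrative names only
)
# With a database connection, the keys could then be passed in one call to
# DataJoint's insert(), e.g.
# sgs.AnalysisNwbfileKacherySelection.insert(keys, skip_duplicates=True)
```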

With those files in the selection table, we can add them as links to the zone by populating the AnalysisNwbfileKachery table:

In [ ]:

sgs.AnalysisNwbfileKachery.populate()

Alternatively, we can share data based on its source table in the database using the helper function share_data_to_kachery().

This takes a list of tables and adds all analysis files associated with entries that match the passed restriction. Here, we share LFP and position data for the session "minirec20230622_.nwb".

In [4]:

from spyglass.sharing import share_data_to_kachery
from spyglass.lfp.v1 import LFPV1
from spyglass.position.v1 import TrodesPosV1

tables = [LFPV1, TrodesPosV1]
restriction = {"nwb_file_name": "minirec20230622_.nwb"}
share_data_to_kachery(
    table_list=tables,
    restriction=restriction,
    zone_name="franklab.collaborators",
)

Managing access¶

If all of that worked,

Go to https://kachery-gateway.figurl.org/admin?zone=your_zone (changing your_zone to the name of your zone)
Go to the Admin/Authorization Settings tab
Add the GitHub login names and permissions for the users you want to share with.
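
To avoid typos when substituting the zone name into the admin address from the first step, the URL can be built programmatically. `kachery_admin_url` is a hypothetical helper; only the base address comes from the steps above.

```python
from urllib.parse import urlencode

ADMIN_BASE = "https://kachery-gateway.figurl.org/admin"


def kachery_admin_url(zone_name: str) -> str:
    """Return the gateway admin page address for the given zone."""
    return f"{ADMIN_BASE}?{urlencode({'zone': zone_name})}"


admin_url = kachery_admin_url("franklab.collaborators")
```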

If those users can connect to your database, they should now be able to use the .fetch_nwb() method to download any AnalysisNwbfiles that have been shared through Kachery.

For example:

from spyglass.spikesorting import CuratedSpikeSorting

test_sort = (
    CuratedSpikeSorting & {"nwb_file_name": "minirec20230622_.nwb"}
).fetch()[0]
sort = (CuratedSpikeSorting & test_sort).fetch_nwb()

Accessing Shared Data¶

If you are a collaborator accessing datasets, you first need to be given access to the zone by a collaborator admin (see above).

If you know the uri for the dataset you are accessing, you can test this process below (the example is for members of franklab.collaborators).

In [ ]:

import kachery_cloud as kcl

path = "/path/to/save/file/to/test"
zone_name = "franklab.collaborators"
uri = "sha1://ceac0c1995580dfdda98d6aa45b7dda72d63afe4"

os.environ["KACHERY_ZONE"] = zone_name
kcl.load_file(uri=uri, dest=path, verbose=True)
assert os.path.exists(path), f"File not downloaded to {path}"
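
Beyond checking that the file exists, you may want to confirm that its contents match the uri. The sketch below assumes that a `sha1://` uri carries the SHA-1 hash of the file contents, which the uri format suggests but which is stated here as an assumption. `file_sha1` and `matches_sha1_uri` are hypothetical helpers built on the standard library only.

```python
import hashlib


def file_sha1(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha1_uri(path: str, uri: str) -> bool:
    """Check a file against a sha1:// uri (assumed to be a content hash)."""
    prefix = "sha1://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a sha1 uri: {uri}")
    # Drop any query string that may follow the hash
    expected = uri[len(prefix):].split("?", 1)[0]
    return file_sha1(path) == expected.lower()
```

Reading in chunks keeps memory use flat, which matters for large NWB files.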

In normal use, Spyglass manages setting the zone and uri when accessing files. In general, the easiest way to access data values is through the fetch1_dataframe() method available on many Spyglass tables. In brief, this checks for the appropriate NWB analysis file in your local directory and, if it is not found, attempts to download it from the appropriate Kachery zone. It then parses the relevant information from that NWB file into a pandas dataframe.

We will look at an example with data from the LFPV1 table:

In [9]:

from spyglass.lfp.v1 import LFPV1

# Here is the data we are going to access
LFPV1 & {
    "nwb_file_name": "Winnie20220713_.nwb",
    "target_interval_list_name": "pos 0 valid times",
}

Out[9]:

nwb_file_name (name of the NWB file)	lfp_electrode_group_name (the name of this group of electrodes)	target_interval_list_name (descriptive name of this interval list)	filter_name (descriptive name of this filter)	filter_sampling_rate (sampling rate for this filter)	analysis_file_name (name of the file)	interval_list_name (descriptive name of this interval list)	lfp_object_id (the NWB object ID for loading this object from the file)	lfp_sampling_rate (the sampling rate, in Hz)
Winnie20220713_.nwb	tetrode_sample_Winnie	pos 0 valid times	LFP 0-400 Hz	30000	Winnie20220713_C52XDICU6D.nwb	lfp_tetrode_sample_Winnie_pos 0 valid times_valid times	a89c590f-290b-4f9c-a568-b9ae67eee96d	1000.0

Total: 1

We can access the data using fetch1_dataframe()

In [10]:

(
    LFPV1
    & {
        "nwb_file_name": "Winnie20220713_.nwb",
        "target_interval_list_name": "pos 0 valid times",
    }
).fetch1_dataframe()

Out[10]:

	0	1	2	3	4	5	6	7	8	9	...	18	19	20	21	22	23	24	25	26	27
time
1.657741e+09	-90	-65	-104	-89	-31	-68	-27	-26	-32	-92	...	-91	-99	-87	-117	-123	-85	-73	-74	-62	13
1.657741e+09	-202	-145	-227	-220	-57	-130	-84	-68	-30	-191	...	-168	-199	-176	-250	-238	-172	-158	-140	-127	54
1.657741e+09	-218	-150	-224	-216	-84	-154	-84	-93	-29	-206	...	-125	-153	-158	-219	-206	-137	-132	-129	-120	69
1.657741e+09	-226	-151	-240	-230	-97	-144	-71	-95	-38	-236	...	-105	-136	-149	-183	-210	-111	-83	-129	-92	116
1.657741e+09	-235	-154	-250	-231	-54	-91	-81	-89	-30	-247	...	-85	-107	-116	-140	-190	-68	-28	-114	-36	193
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
1.657742e+09	-3	-27	-6	29	-227	-442	-1	67	25	-15	...	-83	-217	-61	-248	-196	-63	-111	-211	-52	166
1.657742e+09	44	19	44	82	-175	-407	13	95	62	38	...	3	-112	32	-177	-123	22	-5	-147	54	285
1.657742e+09	94	63	92	129	-121	-341	61	132	88	88	...	62	-28	104	-99	-53	82	61	-62	125	347
1.657742e+09	142	107	135	179	-106	-370	88	178	120	148	...	113	48	199	-44	7	145	108	-13	213	453
1.657742e+09	108	84	95	130	-82	-281	52	134	73	105	...	97	46	169	-16	22	118	94	-3	175	348

901529 rows × 28 columns
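
The time index in the output above is displayed in rounded scientific notation, which makes neighboring rows look identical. Assuming the index holds Unix timestamps in seconds (an assumption based on the magnitude of the values, not confirmed here), one way to make it readable is to convert it to datetimes. The sketch uses a small synthetic dataframe and a hypothetical helper name, so it runs without a database connection.

```python
import pandas as pd


def with_datetime_index(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose index is converted from Unix seconds to UTC datetimes."""
    out = df.copy()
    out.index = pd.to_datetime(out.index, unit="s", utc=True)
    out.index.name = "time"
    return out


# Synthetic stand-in for the dataframe returned by fetch1_dataframe()
demo = pd.DataFrame(
    {0: [-90, -202, -218], 1: [-65, -145, -150]},
    index=pd.Index([0.0, 1.0, 2.0], name="time"),
)
demo_dt = with_datetime_index(demo)
```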

Up Next¶

In the next notebook, we'll explore the details of a table tier unique to Spyglass, Merge Tables.
