Sync Data¶
Overview¶
This notebook will cover ...
- General Kachery information
- Setting up Kachery as a host. If you'll use an existing host, skip this.
- Setting up Kachery in your database. If you're using an existing database, skip this.
- Adding Kachery data.
Imports¶
Developer Note: if you may make a PR in the future, be sure to copy this notebook and use the gitignore prefix temp to avoid future conflicts.
This is one notebook in a multi-part series on Spyglass.
- To set up your Spyglass environment and database, see the Setup notebook
- To fully demonstrate syncing features, we'll need to run some basic analyses. This can either be done with code in this notebook or by running another notebook (e.g., LFP)
- For additional info on DataJoint syntax, including table definitions and inserts, see these additional tutorials
Let's start by importing the spyglass package and testing that your environment is properly configured for kachery sharing.
If you haven't already done so, be sure to set up your Spyglass base directory and Kachery sharing directory with Setup.
import os
import datajoint as dj
# change to the upper level folder to detect dj_local_conf.json
if os.path.basename(os.getcwd()) == "notebooks":
    os.chdir("..")
dj.config.load("dj_local_conf.json") # load config for database connection
import spyglass.common as sgc
import spyglass.sharing as sgs
from spyglass.settings import config
import warnings
warnings.filterwarnings("ignore")
[2023-12-22 08:22:32,189][INFO]: Connecting sambray@lmf-db.cin.ucsf.edu:3306
[2023-12-22 08:22:32,244][INFO]: Connected sambray@lmf-db.cin.ucsf.edu:3306
For example analysis files, run the code hidden below.
Quick Analysis
from spyglass.utils.nwb_helper_fn import get_nwb_copy_filename
import spyglass.data_import as sgi
import spyglass.lfp as lfp
nwb_file_name = "minirec20230622.nwb"
nwb_copy_file_name = get_nwb_copy_filename(nwb_file_name)
sgi.insert_sessions(nwb_file_name)
sgc.FirFilterParameters().create_standard_filters()
lfp.lfp_electrode.LFPElectrodeGroup.create_lfp_electrode_group(
    nwb_file_name=nwb_copy_file_name,
    group_name="test",
    electrode_list=[0],
)
lfp.v1.LFPSelection.insert1(
    {
        "nwb_file_name": nwb_copy_file_name,
        "lfp_electrode_group_name": "test",
        "target_interval_list_name": "01_s1",
        "filter_name": "LFP 0-400 Hz",
        "filter_sampling_rate": 30_000,
    },
    skip_duplicates=True,
)
lfp.v1.LFPV1().populate()
Kachery¶
Cloud¶
This notebook contains instructions for setting up data sharing/syncing through Kachery Cloud, which makes it possible to share analysis results stored in NWB files. When a user tries to access a file, Spyglass does the following:
- Try to load from the local file system/store.
- If unavailable, check if it is in the relevant sharing table (i.e., NwbKachery or AnalysisNWBKachery).
- If present, attempt to download from the associated Kachery Resource to the user's spyglass analysis directory.
Note: large file downloads may take a long time, so downloading raw data is not supported. We suggest direct transfer with globus or a similar service.
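The lookup order above can be sketched in plain Python. This is only an illustration of the control flow, not Spyglass's implementation: resolve_analysis_file, lookup_uri, and download are hypothetical names standing in for the sharing-table query and the Kachery download.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_analysis_file(
    file_name: str,
    analysis_dir: Path,
    lookup_uri: Callable[[str], Optional[str]],
    download: Callable[[str, Path], None],
) -> Path:
    """Return a local path for file_name, downloading it only if needed.

    lookup_uri and download are hypothetical stand-ins for the sharing
    table lookup and the Kachery download; they are not Spyglass APIs.
    """
    local_path = analysis_dir / file_name
    # Step 1: prefer a copy that is already on the local file system.
    if local_path.exists():
        return local_path
    # Step 2: ask the sharing table whether the file has been shared.
    uri = lookup_uri(file_name)
    if uri is None:
        raise FileNotFoundError(f"{file_name} is neither local nor shared")
    # Step 3: download from the resource into the analysis directory.
    download(uri, local_path)
    return local_path
```

Passing the lookup and download steps in as functions keeps the sketch testable without a database or network connection.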
Zone¶
A Kachery Zone is a cloud storage host. The Frank laboratory has three separate Kachery zones:
- franklab.default: Internal file sharing, including figurls
- franklab.collaborators: File sharing with collaborating labs.
- franklab.public: Public file sharing (not yet active)
Setting your zone can be done either as an environment variable or as an item in a DataJoint config. Spyglass will automatically handle setting the appropriate zone when downloading database files through kachery.
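As a rough sketch of how these two sources could be combined, the helper below reads the KACHERY_ZONE environment variable first and falls back to the kachery_zone item of a config's "custom" section. The function name resolve_kachery_zone is hypothetical, and the precedence shown (environment first) is an assumption for illustration; check spyglass.settings for the order Spyglass actually uses.

```python
import os
from typing import Mapping, Optional


def resolve_kachery_zone(
    dj_custom: Mapping[str, object],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[object]:
    """Pick a zone from the environment, falling back to the config dict.

    dj_custom stands for the "custom" section of a DataJoint config.
    """
    environ = os.environ if environ is None else environ
    # Assumption: the environment variable wins over the config entry.
    return environ.get("KACHERY_ZONE") or dj_custom.get("kachery_zone")
```

Accepting the environment as an argument makes it easy to try both cases without touching your real shell settings.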
Environment variable:
export KACHERY_ZONE=franklab.default
export KACHERY_CLOUD_DIR=/stelmo/nwb/.kachery-cloud
DataJoint Config:
"custom": {
    "kachery_zone": "franklab.default",
    "kachery_dirs": {
        "cloud": "/your/base/path/.kachery-cloud"
    }
}
Host Setup¶
If you are a member of a team with a pre-existing database and zone who will be sharing data, please skip to Sharing Data.
If you are a collaborator outside your team's network and need to access files shared with you, please skip to Accessing Shared Data.
Zones¶
See instructions for setting up new Kachery Zones, including creating a cloud bucket and registering it with the Kachery team.
Notes:
- Bucket names cannot include periods, so we substitute a dash, as in franklab-default.
- You only need to create an API token for your first zone.
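The naming convention from the first note is a one-line substitution. The helper name bucket_name_for_zone is hypothetical and exists only to make the rule concrete.

```python
def bucket_name_for_zone(zone_name: str) -> str:
    """Replace periods with dashes, since bucket names cannot contain periods."""
    return zone_name.replace(".", "-")


print(bucket_name_for_zone("franklab.default"))  # franklab-default
```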
Resources¶
See instructions for setting up zone resources. This allows for sharing files on demand. We suggest using the same name for the zone and resource.
Note: For each zone, you need to run the local daemon that listens for requests from that zone and uploads data to the bucket for client download when requested. An example of the bash script we use is:
export KACHERY_ZONE=franklab.collaborators
export KACHERY_CLOUD_DIR=/stelmo/nwb/.kachery-cloud
cd /stelmo/nwb/franklab_collaborators_resource
npx kachery-resource@latest share
For convenience, we recommend saving this code as a bash script that can be executed by the local daemon. For franklab members, these scripts can be found in the directory /home/loren/bin/:
- run_restart_kachery_collab.sh
- run_restart_kachery_default.sh
Database Setup¶
Once you have a hosted zone running, we need to add its information to the Spyglass database. This will allow spyglass to manage linking files from our analysis tables to kachery. First, we'll check existing Zones.
sgs.KacheryZone()
kachery_zone_name the name of the kachery zone. Note that this is the same as the name of the kachery resource. | description description of this zone | kachery_cloud_dir kachery cloud directory on local machine where files are linked | kachery_proxy kachery sharing proxy | lab_name |
---|---|---|---|---|
franklab.collaborators | franklab collaborator zone | /stelmo/nwb/.kachery-cloud | https://kachery-resource-proxy.herokuapp.com | Loren Frank |
franklab.default | internal franklab kachery zone | /stelmo/nwb/.kachery-cloud | https://kachery-resource-proxy.herokuapp.com | Loren Frank |
Total: 2
To add a new hosted Zone, we need to prepare an entry for the KacheryZone table.
Note that the kachery_cloud_dir key should be the path for the server daemon hosting the zone, and is not required to be present on the client machine:
zone_name = config.get("KACHERY_ZONE")
cloud_dir = config.get("KACHERY_CLOUD_DIR")
zone_key = {
    "kachery_zone_name": zone_name,
    "description": " ".join(zone_name.split(".")) + " zone",
    "kachery_cloud_dir": cloud_dir,
    "kachery_proxy": "https://kachery-resource-proxy.herokuapp.com",
    "lab_name": sgc.Lab.fetch("lab_name", limit=1)[0],
}
Use caution when inserting into an active database, as it could interfere with ongoing work.
sgs.KacheryZone().insert1(zone_key, skip_duplicates=True)
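Before inserting, it can help to preview the entry and catch unset values, because config.get returns None when a setting is missing and zone_name.split would then fail. The sketch below builds the same dictionary from plain values with no database access. build_zone_key is a hypothetical helper, and the lab name is passed in rather than fetched.

```python
from typing import Dict, Optional

DEFAULT_PROXY = "https://kachery-resource-proxy.herokuapp.com"


def build_zone_key(
    zone_name: Optional[str],
    cloud_dir: Optional[str],
    lab_name: str,
    proxy: str = DEFAULT_PROXY,
) -> Dict[str, str]:
    """Assemble a KacheryZone entry, failing early on missing settings."""
    if not zone_name or not cloud_dir:
        raise ValueError("KACHERY_ZONE and KACHERY_CLOUD_DIR must both be set")
    return {
        "kachery_zone_name": zone_name,
        # Same rule as above: "franklab.default" -> "franklab default zone"
        "description": " ".join(zone_name.split(".")) + " zone",
        "kachery_cloud_dir": cloud_dir,
        "kachery_proxy": proxy,
        "lab_name": lab_name,
    }
```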
Sharing Data¶
Once the zone exists, we can add AnalysisNWB files we want to share with members of the zone.
The AnalysisNwbfileKachery table links analysis files made within other spyglass tables with a uri used by kachery. We can view files already made available through kachery here:
sgs.AnalysisNwbfileKachery()
kachery_zone_name the name of the kachery zone. Note that this is the same as the name of the kachery resource. | analysis_file_name name of the file | analysis_file_uri the uri of the file |
---|---|---|
franklab.collaborators | Banner20220224_18NJSA2B42.nwb | sha1://562b488936e5288eb89e7c480ae5c10b31c9cf2f |
franklab.collaborators | Frodo20230810_0F936W4B9Z.nwb | sha1://b38d2b0fc1e9cde91cc239e1a0b50e3211b976fc |
franklab.collaborators | Frodo20230810_2MJ374GSJX.nwb | sha1://ca9c238b83fd8539658a5100a9770a459a539771 |
franklab.collaborators | Frodo20230810_4L35OWMGHQ.nwb | sha1://a8452cf8cf6e596b44569eb9189612d2dcd4c7d6 |
franklab.collaborators | Frodo20230810_63PWL1N0VS.nwb | sha1://ca9c238b83fd8539658a5100a9770a459a539771 |
franklab.collaborators | Frodo20230810_7LYW2MK0C9.nwb | sha1://ca9c238b83fd8539658a5100a9770a459a539771 |
franklab.collaborators | Frodo20230810_998JNA1VBF.nwb | sha1://aa0e06028d52f5195cf24d61922ace233d8da783 |
franklab.collaborators | Frodo20230810_CFKWZTGXX0.nwb | sha1://ca9c238b83fd8539658a5100a9770a459a539771 |
franklab.collaborators | Frodo20230810_GMCOCDSJ54.nwb | sha1://2889b68d7aa2b30561e62be519c19759facad2d3 |
franklab.collaborators | Frodo20230810_I25NQSZQ5O.nwb | sha1://973ea71d97aef91e050117bf860ea2ed83950b10 |
franklab.collaborators | Frodo20230810_JS06HC1RLC.nwb | sha1://088a345c5eadfa3adea021de3f158aa86a527d4e |
franklab.collaborators | Frodo20230810_KEEEEBDUNE.nwb | sha1://4aa3199011b1405e745bbe96b62b825cd93bdacd |
...
Total: 298
We can share additional results by populating new entries in this table.
To do so, we first add these entries to the AnalysisNwbfileKacherySelection table.
Note: This step depends on having previously run an analysis on the example file.
nwb_copy_filename = "minirec20230622_.nwb"
analysis_file_list = (  # Grab all analysis files for this nwb file
    sgc.AnalysisNwbfile() & {"nwb_file_name": nwb_copy_filename}
).fetch("analysis_file_name")
kachery_selection_key = {"kachery_zone_name": zone_name}
for file in analysis_file_list:  # Add all analysis to shared list
    kachery_selection_key["analysis_file_name"] = file
    sgs.AnalysisNwbfileKacherySelection.insert1(
        kachery_selection_key, skip_duplicates=True
    )
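The loop above reuses one dictionary and overwrites its analysis_file_name on each pass. As an alternative sketch, the rows can be built up front as independent dictionaries, which makes them easy to inspect before anything is written. build_selection_rows is a hypothetical helper and touches no database.

```python
from typing import Dict, Iterable, List


def build_selection_rows(
    zone_name: str, analysis_file_names: Iterable[str]
) -> List[Dict[str, str]]:
    """Return one independent dict per unique file name, order preserved."""
    unique_names = dict.fromkeys(analysis_file_names)  # drops repeats
    return [
        {"kachery_zone_name": zone_name, "analysis_file_name": name}
        for name in unique_names
    ]
```

A list like this could then be handed to the selection table's insert method with skip_duplicates=True instead of calling insert1 once per file; that usage is untested here.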
With those files in the selection table, we can add them as links to the zone by populating the AnalysisNwbfileKachery table:
sgs.AnalysisNwbfileKachery.populate()
Alternatively, we can share data based on its source table in the database using the helper function share_data_to_kachery().
This will take a list of tables and add all associated analysis files for entries matching the passed restriction. Here, we are sharing LFP and position data for the session "minirec20230622_.nwb".
from spyglass.sharing import share_data_to_kachery
from spyglass.lfp.v1 import LFPV1
from spyglass.position.v1 import TrodesPosV1
tables = [LFPV1, TrodesPosV1]
restriction = {"nwb_file_name": "minirec20230622_.nwb"}
share_data_to_kachery(
    table_list=tables,
    restriction=restriction,
    zone_name=zone_name,
)
Managing access¶
If all of that worked,
- Go to https://kachery-gateway.figurl.org/admin?zone=your_zone (changing your_zone to the name of your zone)
- Go to the Admin/Authorization Settings tab
- Add the GitHub login names and permissions for the users you want to share with.
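To avoid typos when substituting your zone into the admin address, the URL can be built programmatically. This is a small convenience sketch. admin_url is a hypothetical helper, and the base address is the one given in the first step.

```python
from urllib.parse import urlencode

ADMIN_BASE = "https://kachery-gateway.figurl.org/admin"


def admin_url(zone_name: str) -> str:
    """Build the gateway admin URL for a given zone."""
    return f"{ADMIN_BASE}?{urlencode({'zone': zone_name})}"


print(admin_url("franklab.collaborators"))
```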
If those users can connect to your database, they should now be able to use the .fetch_nwb() method to download any AnalysisNwbfiles that have been shared through Kachery.
For example:
from spyglass.spikesorting import CuratedSpikeSorting
test_sort = (
    CuratedSpikeSorting & {"nwb_file_name": "minirec20230622_.nwb"}
).fetch()[0]
sort = (CuratedSpikeSorting & test_sort).fetch_nwb()
Accessing Shared Data¶
If you are a collaborator accessing datasets, you first need to be given access to the zone by a collaborator admin (see above).
If you know the uri for the dataset you are accessing, you can test this process below (the example is for members of franklab.collaborators):
import kachery_cloud as kcl
path = "/path/to/save/file/to/test"
zone_name = "franklab.collaborators"
uri = "sha1://ceac0c1995580dfdda98d6aa45b7dda72d63afe4"
os.environ["KACHERY_ZONE"] = zone_name
kcl.load_file(uri=uri, dest=path, verbose=True)
assert os.path.exists(path), f"File not downloaded to {path}"
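The existence check above confirms only that something was written to the path. A stricter check, sketched below, compares the file's SHA-1 digest with the hash in the uri. This assumes that the hex string in a sha1:// uri is the SHA-1 of the file's bytes; sha1_of_file and matches_uri are hypothetical helpers and not part of kachery_cloud.

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large NWB files need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_uri(path: str, uri: str) -> bool:
    """Compare a local file against a sha1:// uri (assumed content hash)."""
    prefix = "sha1://"
    if not uri.startswith(prefix):
        raise ValueError(f"Expected a sha1:// uri, got {uri!r}")
    # Ignore any query string that may follow the hash.
    expected = uri[len(prefix):].split("?", 1)[0].lower()
    return sha1_of_file(path) == expected
```

For the download above, this would be called as matches_uri(path, uri) after kcl.load_file returns.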
In normal use, spyglass will manage setting the zone and uri when accessing files.
In general, the easiest way to access data values will be through the fetch1_dataframe() function, which is part of many of the spyglass tables. In brief, this will check for the appropriate nwb analysis file in your local directory and, if it is not found, attempt to download it from the appropriate kachery zone.
It will then parse the relevant information from that nwb file into a pandas dataframe.
We will look at an example with data from the LFPV1 table:
from spyglass.lfp.v1 import LFPV1
# Here is the data we are going to access
LFPV1 & {
    "nwb_file_name": "Winnie20220713_.nwb",
    "target_interval_list_name": "pos 0 valid times",
}
nwb_file_name name of the NWB file | lfp_electrode_group_name the name of this group of electrodes | target_interval_list_name descriptive name of this interval list | filter_name descriptive name of this filter | filter_sampling_rate sampling rate for this filter | analysis_file_name name of the file | interval_list_name descriptive name of this interval list | lfp_object_id the NWB object ID for loading this object from the file | lfp_sampling_rate the sampling rate, in HZ |
---|---|---|---|---|---|---|---|---|
Winnie20220713_.nwb | tetrode_sample_Winnie | pos 0 valid times | LFP 0-400 Hz | 30000 | Winnie20220713_C52XDICU6D.nwb | lfp_tetrode_sample_Winnie_pos 0 valid times_valid times | a89c590f-290b-4f9c-a568-b9ae67eee96d | 1000.0 |
Total: 1
We can access the data using fetch1_dataframe():
(
    LFPV1
    & {
        "nwb_file_name": "Winnie20220713_.nwb",
        "target_interval_list_name": "pos 0 valid times",
    }
).fetch1_dataframe()
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
time | |||||||||||||||||||||
1.657741e+09 | -90 | -65 | -104 | -89 | -31 | -68 | -27 | -26 | -32 | -92 | ... | -91 | -99 | -87 | -117 | -123 | -85 | -73 | -74 | -62 | 13 |
1.657741e+09 | -202 | -145 | -227 | -220 | -57 | -130 | -84 | -68 | -30 | -191 | ... | -168 | -199 | -176 | -250 | -238 | -172 | -158 | -140 | -127 | 54 |
1.657741e+09 | -218 | -150 | -224 | -216 | -84 | -154 | -84 | -93 | -29 | -206 | ... | -125 | -153 | -158 | -219 | -206 | -137 | -132 | -129 | -120 | 69 |
1.657741e+09 | -226 | -151 | -240 | -230 | -97 | -144 | -71 | -95 | -38 | -236 | ... | -105 | -136 | -149 | -183 | -210 | -111 | -83 | -129 | -92 | 116 |
1.657741e+09 | -235 | -154 | -250 | -231 | -54 | -91 | -81 | -89 | -30 | -247 | ... | -85 | -107 | -116 | -140 | -190 | -68 | -28 | -114 | -36 | 193 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1.657742e+09 | -3 | -27 | -6 | 29 | -227 | -442 | -1 | 67 | 25 | -15 | ... | -83 | -217 | -61 | -248 | -196 | -63 | -111 | -211 | -52 | 166 |
1.657742e+09 | 44 | 19 | 44 | 82 | -175 | -407 | 13 | 95 | 62 | 38 | ... | 3 | -112 | 32 | -177 | -123 | 22 | -5 | -147 | 54 | 285 |
1.657742e+09 | 94 | 63 | 92 | 129 | -121 | -341 | 61 | 132 | 88 | 88 | ... | 62 | -28 | 104 | -99 | -53 | 82 | 61 | -62 | 125 | 347 |
1.657742e+09 | 142 | 107 | 135 | 179 | -106 | -370 | 88 | 178 | 120 | 148 | ... | 113 | 48 | 199 | -44 | 7 | 145 | 108 | -13 | 213 | 453 |
1.657742e+09 | 108 | 84 | 95 | 130 | -82 | -281 | 52 | 134 | 73 | 105 | ... | 97 | 46 | 169 | -16 | 22 | 118 | 94 | -3 | 175 | 348 |
901529 rows × 28 columns
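As a quick sanity check on this output, the row count and the lfp_sampling_rate of 1000.0 from the table entry imply the recording length. The time index also appears to hold Unix epoch seconds. The arithmetic below assumes uniform sampling with no gaps and uses the rounded timestamp as displayed, so treat the results as approximate.

```python
from datetime import datetime, timezone

n_samples = 901_529          # row count reported for the dataframe
lfp_sampling_rate = 1000.0   # Hz, from the LFPV1 entry

# Assumption: samples are evenly spaced with no gaps in the interval.
duration_s = n_samples / lfp_sampling_rate
print(f"{duration_s:.3f} s (about {duration_s / 60:.1f} min)")

# Assumption: the index is in Unix epoch seconds (value rounded as displayed).
first_timestamp = 1.657741e9
start = datetime.fromtimestamp(first_timestamp, tz=timezone.utc)
print(start.date())
```

The resulting date agrees with the session name Winnie20220713, which supports reading the index as epoch seconds.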
Up Next¶
In the next notebook, we'll explore the details of a table tier unique to Spyglass, Merge Tables.