Database Management¶
While Spyglass can help you organize your data, there are a number of things you'll need to do to manage users, database backups, and file cleanup.
Some these tasks should be set to run regularly. Cron jobs can help with automation.
MySQL Version¶
The Frank Lab's database is running MySQL 8.0 with a number of custom configurations set by our system admin to reflect UCSF's IT security requirements.
DataJoint's default docker container for MySQL is version 5.7. As the Spyglass team has hit select compatibility issues, we've worked with the DataJoint team to update the open source package to support MySQL 8.0.
While the Spyglass team won't be able to support earlier versions, if you run into any issues declaring Spyglass tables with an 8.0 instance, please let us know.
User Management¶
The DatabaseSettings class provides a
number of methods to help you manage users. By default, it will write out a
temporary .sql
file and execute it on the database.
Privileges¶
DataJoint schemas correspond to MySQL databases. Privileges are managed by schema/database prefix.
SELECT
privileges allow users to read, write, and delete data.ALL
privileges allow users to create, alter, or drop tables and schemas in addition to operations above.
In practice, DataJoint only permits alterations of secondary keys on existing tables, and more derstructive operations would require using DataJoint to execeute MySQL commands.
Shared schema prefixes are those defined in the Spyglass package (e.g.,
common
, lfp
, etc.). A 'user schema' is any schema with the username as
prefix. User types differ in the privileges they are granted on these prifixes.
Declaring a table with the SpyglassMixin on a schema other than a shared module
or the user's own prefix will raise a warning.
Users roles¶
When a database is first initialized, the team should run add_roles
to create
the following roles:
dj_guest
:SELECT
on all schemas.dj_collab
:ALL
on user schema,SELECT
on all other schemas.dj_user
:ALL
on shared and user schema,SELECT
on all other schemas.dj_admin
:ALL
on all schemas.
If new shared modules are introduced, the add_module
method should be used to
expand the privileges of the dj_user
role.
Setting Passwords¶
New users are generated with the password temppass
. In order to change this,
we recommend downloading DataJoint 0.14.2
(currently pre-release).
Then, you the user can reset within Python:
Database Backups¶
The following codeblockes are a series of files used to back up our database and migrate the contents to another server. Some conventions to note:
.host
: files used in the host's context.container
: files used inside the database Docker container.env
: files used to set environment variables used by the scripts for database name, backup name, and backup credentials
This backup process uses a dedicated backup user, that an admin would need to criate with the relevant permissions.
mysql.env.host¶
MySQL host environment variables
Values may be adjusted as needed for different building images.ROOT_PATH=/usr/local/containers/mysql # path to this container's working area
# variables for building image
SRC=ubuntu
VER=20.04
DOCKERFILE=Dockerfile.base
# variables for referencing image
IMAGE=mysql8
TAG=u20
# variables for running the container
CNAME=mysql-datajoint
MACADDR=4e:b0:3d:42:e0:70
RPORT=3306
# variables for initializing/relaunching the container
# - where the mysql data and backups will live - these values
# are examples
DB_PATH=/data/db
DB_DATA=mysql
DB_BACKUP=/data/mysql-backups
# backup info
BACK_USER=mysql-backup
BACK_PW={password}
BACK_DBNAME={database}
# mysql root password - make sure to remove this AFTER the container
# is initialized - and this file will be replicated inside the container
# on initialization, so remove it from there: /opt/bin/mysql.env
backup-database.sh.host¶
This script runs the mysql-backup container script (exec inside the container) that dumps the database contents for each database as well as the entire database. Use cron to set this to run on your desired schedule.
MySQL host docker exec
mysql-backup-xfer.csh.host¶
This script transfers the backup to another server 'X' and is specific for us as it uses passwordless ssh keys to a local unprivileged user on X that has the mysql backup area on X as that user's home.
MySQL host transfer script
myenv.csh.container¶
Docker container environment variables
mysql-backup.csh.container¶
Generate backups from within container
#!/bin/csh
source /opt/bin/myenv.csh
set td=`date +"%Y%m%d"`
cd /${db_backup}
mkdir ${back_dbname}-${td}
set list=`echo "show databases;" | mysql --user=${back_user} --password=${back_pw}`
set cnt=0
foreach db ($list)
if ($cnt == 0) then
echo "dumping mysql databases on $td"
else
echo "dumping MySQL database : $db"
# Per-schema backups
mysqldump $db --max_allowed_packet=512M --user=${back_user} --password=${back_pw} > /${db_backup}/${back_dbname}-${td}/mysql.${db}.sql
endif
@ cnt = $cnt + 1
end
# Full database backup
mysqldump --all-databases --max_allowed_packet=512M --user=${back_user} --password=${back_pw} > /${db_backup}/${back_dbname}-${td}/mysql-all.sql
File Cleanup¶
Spyglass is designed to hold metadata for analyses that reference NWB files on disk. There are several tables that retain lists of files that have been generated during analyses. If someone deletes analysis entries, files will still be on disk.
Additionally, there are periphery tables such as IntervalList
which are used
to store entries created by downstream tables. These entries are not
automatically deleted when the downstream entry is removed. To minimize interference
with ongoing user entry creation, we recommend running these cleanups on a less frequent
basis (e.g. weekly).
To remove orphaned files and entries, we run the following commands in our cron jobs:
from spyglass.common import AnalysisNwbfile, IntervalList
from spyglass.spikesorting import SpikeSorting
from spyglass.common.common_nwbfile import schema as nwbfile_schema
from spyglass.decoding.v1.sorted_spikes import schema as spikes_schema
from spyglass.decoding.v1.clusterless import schema as clusterless_schema
def main():
AnalysisNwbfile().nightly_cleanup()
SpikeSorting().nightly_cleanup()
IntervalList().cleanup()
nwbfile_schema.external['analysis'].delete(delete_external_files=True))
nwbfile_schema.external['raw'].delete(delete_external_files=True))
spikes_schema.external['analysis'].delete(delete_external_files=True))
clusterless_schema.external['analysis'].delete(delete_external_files=True))
The delete
calls above use DataJoint's ExternalTable.delete
method, which
will remove files from disk that are no longer referenced in the database.