# Coding for Reuse
Reusing code requires that it be faster to read and change than it would be to start from scratch.
We can speed up that process by ...
- Making reading predictable.
- Atomizing - separating pieces into the smallest meaningful chunks.
- Leaving notes via type hints, docstrings, and comments.
- Getting ahead of errors.
- Automating as much of the above as possible.
This document pulls from resources like Tidy First and SOLID Principles. Experienced object-oriented developers may find these principles familiar.
## Predictable Formatting
- Many programming languages offer flexibility in how they are written.
- Tools like `black` and `isort` take away stylistic preferences in favor of one norm (see the sketch below).
- Strict line limits (e.g., 80) make it easier to do side-by-side comparisons in git interfaces.
- `black` is also useful for detecting an error on save - if it doesn't run on what you wrote, there's an error somewhere.
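For instance, `isort` separates standard-library imports from third-party ones and alphabetizes each group. A minimal before/after sketch (the modules here are just for illustration):

```python
# before: imports in arbitrary order
import sys
from pynwb import NWBFile
import datajoint as dj
import os

# after running `isort`: grouped by origin, alphabetized within each group
import os
import sys

import datajoint as dj
from pynwb import NWBFile
```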
Let's look at a few examples of the same code block formatted different ways...
### Original
```python
def get_data_interface(nwbfile, data_interface_name, data_interface_class=None, unused_other_arg=None):
    params = { 'centroid_method': "two_pt_centroid", 'points': {'point1': 'greenLED', "point2": 'redLED_C'}, 'interpolate': True}
    ret = []
    for module in nwbfile.processing.values():
        match = module.data_interfaces.get(data_interface_name, None)
        if match is not None:
            if data_interface_class is not None and not isinstance(match, data_interface_class):
                continue
            ret.append(match)
    if len(ret) > 1:
        print(f"Multiple data interfaces with name '{data_interface_name}' found with identifier {nwbfile.identifier}.")
    if len(ret) >= 1:
        return ret[0]
    return None
```
### Black formatted
With `black`, we have a limited line length, and indents reflect embedding.
```python
def get_data_interface(  # Each arg gets its own line
    nwbfile,
    data_interface_name,
    data_interface_class=None,
    unused_other_arg=None,
):
    params = {  # dictionaries show embedding
        "centroid_method": "two_pt_centroid",
        "points": {
            "point1": "greenLED",
            "point2": "redLED_C",
        },
        "interpolate": True,
    }
    ret = []
    for module in nwbfile.processing.values():
        match = module.data_interfaces.get(data_interface_name, None)
        if match is not None:
            if data_interface_class is not None and not isinstance(
                match, data_interface_class
            ):  # long lines broken up
                continue
            ret.append(match)
    if len(ret) > 1:
        print(  # long strings need to be broken up manually
            f"Multiple data interfaces with name '{data_interface_name}' "
            f"found in NWBFile with identifier {nwbfile.identifier}. "
        )
    if len(ret) >= 1:
        return ret[0]
    return None
```
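Both tools are one command at the terminal: `black .` rewrites every file under the current directory, and `black --check .` reports what would change without writing anything.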
### Control flow adjustments
Though this is more subjective, we can do even better by adjusting the logic to follow how we read.
```python
def get_data_interface(...):
    params = {...}
    ret = []

    # decide no input early: everything is an instance of `object`
    data_interface_class = data_interface_class or object

    for match in [  # generate via list comprehension
        module.data_interfaces.get(data_interface_name)
        for module in nwbfile.processing.values()
    ]:  # only process good case, no `continue`
        if match and isinstance(match, data_interface_class):
            ret.append(match)

    if len(ret) > 1:
        print(...)

    return ret[0] if len(ret) >= 1 else None  # fits on one line
```
## Atomizing
Working memory limits our ability to understand long code blocks.
We can extract pieces into separate places to give them a name, turning a set of steps into 'one' memory chunk.
Depending on the scope, chunks can be separated with ...
- Paragraph breaks - to group instructions together.
- Conditional assignment - for data maps local to a function (see the sketch below).
- Methods of a class - for functions that deserve a separate name.
- Helpers in a script - for functions used multiple times in a schema.
- Util scripts in a package - for functions used throughout a project.
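As a minimal sketch of the 'conditional assignment' level, with hypothetical names:

```python
def describe(animal: str) -> str:
    """Return a one-line description of a supported animal."""
    sounds = {"cat": "meow", "dog": "woof"}  # data map local to the function
    if animal not in sounds:
        raise ValueError(f"Unsupported animal {animal}")
    return f"The {animal} says {sounds[animal]}"  # assignment by lookup, not if/elif
```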
### Atomizing example
- Let's read the next function as if we're revisiting old code.
- This example was taken from an existing project and adjusted for demonstration.
- Please review without commentary and make mental notes of what each line is doing and how it relates to other lines.
**No commentary**
```python
class MyTable(dj.Computed):
...
def make(self, key):
rat_name = key["rat_name"]
ron_all_dict = {"some_data": 1}
tonks_all_dict = {"other_data": 2}
try:
if len((OtherTable & key).fetch("cluster_id")[0]) > 0:
if rat_name == "ron":
data_dict = ron_all_dict
elif rat_name == "tonks":
data_dict = tonks_all_dict
else:
raise ValueError(f"Unsupported rat {rat_name}")
for data_key, data_value in data_dict.items():
try:
if data_value == 1:
cluster_spike_times = (OtherTable & key).fetch_nwb()[
0
]["units"]["spike_times"]
else:
cluster_spike_times = (OtherTable & key).fetch_nwb()[
data_value - 1
]["units"]["spike_times"][data_key]
self.insert1(cluster_spike_times)
except KeyError:
print("cluster missing", key["nwb_file_name"])
else:
print("no spikes")
except IndexError:
print("no data")
**With Commentary**
Note how the numbers correspond to their counterparts - 1Q, 1A, 2Q, 2A ...

```python
class MyTable(dj.Computed):
...
def make(self, key):
rat_name = key["rat_name"] # 1Q. Can this function handle others?
ron_all_dict = {"some_data": 1} # 2Q. Are these parameters?
tonks_all_dict = {"other_data": 2}
try: # 3Q. What error could be thrown? And by what?
if len((OtherTable & key).fetch("cluster_id")[0]) > 0: # 4Q. What happens if none?
if rat_name == "ron":
data_dict = ron_all_dict # 2A. ok, we decide the data here
elif rat_name == "tonks":
data_dict = tonks_all_dict
                else: # 1A. Ok, we can only do these two
raise ValueError(f"Unsupported rat {rat_name}")
for data_key, data_value in data_dict.items(): # 2A. Maybe parameter?
try: # 5Q. What could throw an error?
if data_value == 1:
cluster_spike_times = (OtherTable & key).fetch_nwb()[
0
]["units"]["spike_times"] # 6Q. What do we need this for?
else:
cluster_spike_times = (OtherTable & key).fetch_nwb()[
data_value - 1
]["units"]["spike_times"][data_key]
self.insert1(cluster_spike_times) # 6A. Ok, insertion
                    except KeyError: # 5A. Maybe this fetch is unreliable?
print("cluster missing", key["nwb_file_name"])
else:
print("no spikes") # 4A. Ok we bail if no clusters
except IndexError: # 3A. What could have thrown this? Are we sure nothing else?
print("no data")
### Embedding
- The process of stream-of-consciousness coding often generates an embedding trail from the core out.
- Our mental model of A -> B -> C -> D may actually read like `D( C( B( A )))` or, as sketched below ...
    - Prepare for D
    - Open a loop for C
    - Add caveat B
    - Do core process A
    - Check other condition B
    - Close D
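A minimal sketch of that shape, with hypothetical steps:

```python
def process(items):  # D. prepare the result container
    results = []
    for item in items:  # C. open a loop
        if item is not None:  # B. add a caveat
            results.append(item * 2)  # A. do the core process
        else:  # B. check the other condition
            results.append(0)
    return results  # D. close by returning
```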
Let's contrast with an approach that reduces embedding.
```python
import logging

logger = logging.getLogger(__name__)  # assuming a module-level logger


class MyTable(dj.Computed):
    ...

    def _get_cluster_times(self, key, nth_file, index):  # We will need times
        clust = (OtherTable & key).fetch_nwb()[nth_file]["units"]["spike_times"]
        try:  # Looks like this indexing may not return the data
            return clust if nth_file == 0 else clust[index]  # if/then handled here
        except KeyError:  # Show as err, keep moving
            logger.error(f"Cluster missing: {key['nwb_file_name']}")

    def make(self, key):
        rat_paramsets = {"ron": {"some_data": 1}, "tonks": {"other_data": 2}}  # informative variable name

        if (rat_name := key["rat_name"]) not in rat_paramsets:  # walrus operator `:=` can assign within `if`
            raise ValueError(f"Unsupported rat {rat_name}")  # we can only handle a subset of rats
        rat_params = rat_paramsets[rat_name]  # conditional assignment

        if not len((OtherTable & key).fetch("cluster_id")[0]):  # paragraph breaks separate chunks conceptually
            logger.info(f"No spikes for {key}")  # log level can be adjusted at run
            return  # bail early, nothing to process

        insertion_list = []  # We're gonna insert something
        for file_index, file_n in rat_params.items():
            times = self._get_cluster_times(key, file_n - 1, file_index)  # there it is, clusters
            if times is not None:  # skip clusters the helper logged as missing
                insertion_list.append(times)

        self.insert(insertion_list)  # separate inserts happen all at once
```
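Note how each guard clause exits early: by the time we reach the insert, every bad case has already been handled, so the reader never holds more than one open condition in mind.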
## Comments, type hints, and docstrings
It's tempting to leave comments in code, but they can become outdated and confusing. Instead, try atomizing and using type hints and docstrings.
Type hints are not enforced, but make it much easier to tell the design intent when rereading. Docstrings are similarly optional, but make it easy to get prompts without looking at the code again via `help(myfunc)`.
### Type hints
```python
from typing import Dict, Optional, Union

import datajoint as dj
import pynwb
from pynwb.core import NWBDataInterface

def get_data_interface(
    nwbfile: pynwb.NWBFile,
    data_interface_name: Union[str, list],  # one or the other
    other_arg: Optional[Dict[str, Dict[str, dj.FreeTable]]] = None,  # show embedding
) -> NWBDataInterface:  # What it returns. `None` if no return
    pass
```
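Note that hints are not checked at runtime: the call below runs without complaint against the stub above, and only a static checker such as `mypy` would flag it.

```python
get_data_interface("not an NWBFile", 42)  # runs fine; a type checker would object
```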
### Docstrings
- Spyglass uses the NumPy docstring style, as opposed to Google's.
- These are rendered in the API documentation.
```python
def get_data_interface(*args, **kwargs):
    """One-line description.

    Additional notes or further description in case the one line above is
    not enough.

    Parameters
    ----------
    nwbfile : pynwb.NWBFile
        Description of the arg. e.g., The NWB file object to search in.
    data_interface_name : Union[str, list]
        More here.
    data_interface_class : Dict[str, Dict[str, dj.FreeTable]], optional
        More here.

    Warns
    -----
    LoggerWarning
        Why warn.

    Raises
    ------
    ValueError
        Why it would hit this error.

    Returns
    -------
    data_interface : NWBDataInterface

    Examples
    --------
    >>> data_interface = get_data_interface(mynwb, "interface_name")
    """
    pass
```
## Error detection with linting
- Packages like `ruff` can show you bad code 'smells' while you write and fix some for you.
- PEP8, Flake8, and other standards will flag issues like ...
    - F401: Module imported but unused
    - E402: Module level import not at top of file
    - E713: Test for membership should be `not in`
- `black` will fix a subset of Flake8 issues, but not all. `ruff` identifies or fixes these rules and many others (see the sketch below).
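As a sketch, a file containing two of the smells above (the file name and its contents are hypothetical):

```python
# smelly.py
import os  # F401: imported but unused

allowed = ["ron", "tonks"]
if not "hermione" in allowed:  # E713: should be `"hermione" not in allowed`
    print("unsupported rat")
```

Running `ruff check smelly.py` reports both rules, and `ruff check --fix smelly.py` can rewrite many such findings automatically.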
## Automation
- `black`, `isort`, and `ruff` can be run on save in most IDEs by searching their extensions.
- `pre-commit` is a tool that can be used to run these checks before each commit, ensuring that all your code is formatted, as defined in a `yaml` file.
```yaml
default_stages: [commit, push]
exclude: (^.github/|^docs/site/|^images/)

repos:
  - repo: https://github.com/ambv/black
    rev: 24.1.1
    hooks:
      - id: black
        language_version: python3.9

  - repo: https://github.com/codespell-project/codespell
    rev: v2.2.6
    hooks:
      - id: codespell
        args: [--toml, pyproject.toml]
        additional_dependencies:
          - tomli
```
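Once this file is saved as `.pre-commit-config.yaml` in the repository root, run `pre-commit install` to register the hooks; they will then run on every commit, or on demand with `pre-commit run --all-files`.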