📓 Notebook Development
This guide is a work in progress
What are notebooks?
“Jupyter notebooks are documents that combine live runnable code with narrative text (Markdown), equations (LaTeX), images, interactive visualizations and other rich output” (JupyterLab documentation)
There are several ways notebooks are used:
Exploratory messing-around (example)
won’t be very tidy, and reproducibility is low
an easy way to store your ideas quickly for later, or to share them with somebody else
Demonstrations and documentation (example)
used to demonstrate how to use a particular package / system
a good way to showcase something, and can form part of wider documentation
Tutorials (scientific or otherwise) (example)
ordered and informative, with a focus on the teaching element
can be used as a resource for a workshop
for a deep dive, read Teaching and Learning with Jupyter
Reproducible scientific analysis
publishable scientific content
portability and reproducibility are important
can be supplementary material for a publication
Basis for interactive dashboards (example)
combining powerful libraries to deploy a low-code dashboard developed directly in a notebook
Writing a notebook
Preamble: The top of the notebook should contain the following things to orientate the user:
Short introduction to what the notebook contains, including links to related notebooks & relevant resources.
List of non-standard requirements for the notebook, e.g. data accessed by the notebook or additional packages to be installed. In the context of the VRE, "non-standard" means anything the VRE does not currently support (we can then investigate supporting it if appropriate). For a more sophisticated setup, consider a requirements.txt and/or environment.yml to specify packages (and versions).
Import all modules used in the notebook, and specify data file paths (use pathlib for platform-agnostic paths). This makes it clear which resources outside the notebook are required to run it: if you can run this first code cell, you should be able to run the rest.
Organisation: Divide the notebook into manageable sections separated by sub-titles and descriptive text.
Refactoring: As you experiment with things, the notebook will inevitably get disorganised and hard to follow. You should occasionally review this and merge or split code cells into logical units. Before leaving the notebook (and definitely before sharing with others!), restart the kernel and run the notebook from top to bottom to ensure it is valid.
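As an illustration of the preamble advice, here is a minimal sketch of a first code cell that gathers imports and data paths in one place. The directory and file names (`data/measurements.csv`, `outputs`) are hypothetical placeholders, and only the standard library is imported so the sketch runs anywhere; a real notebook would also import its third-party packages (e.g. viresclient) in this cell.

```python
# --- First code cell: everything the notebook depends on ---

# Imports. Only the standard library is used in this sketch; a real
# notebook would also import its third-party packages here.
from pathlib import Path

# Data paths (hypothetical layout). Path objects are joined with "/"
# and use the correct separator on Windows, macOS and Linux.
DATA_DIR = Path("data")
INPUT_FILE = DATA_DIR / "measurements.csv"
OUTPUT_DIR = Path("outputs")

# Report missing inputs immediately, so problems surface in this cell
# rather than halfway through the notebook.
missing = [path for path in (INPUT_FILE,) if not path.exists()]
if missing:
    print("Missing required files:", *missing, sep="\n  ")
```

Keeping the paths as module-level constants in this one cell means that moving the data only requires changing a single line.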
Pitfalls of notebooks
Out-of-order execution: As you iteratively explore an analysis, it is easy to edit cells and re-execute them in a different order from the one in which they appear. This can rapidly get the notebook into an ambiguous state, where the code written in the notebook no longer represents what has actually been run. Avoid this with frequent review and refactoring.
Managing the namespace: A long notebook can contain too many variables to keep track of. You may inadvertently re-use a variable name that you used earlier, with unforeseen consequences. Variables should only be available within the scope where they are relevant, and having too many variables defined at any given moment makes the notebook hard for the reader to follow. Avoid this by refactoring the contents of a code cell into a function.
Re-using code across notebooks: Often you will want to re-use a recipe developed in another notebook. You can simply copy the code from one notebook into another, and refactoring it into portable, documented functions makes this easier. However, this is not very maintainable: when you want to change the code, do you update both copies? If the code is particularly important and often re-used, it should be moved into an importable Python module, or even into a core package (e.g. viresclient).
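To make the namespace advice concrete, here is a minimal sketch of refactoring a cell into a function. The computation and the names `readings`, `threshold` and `sum_above` are hypothetical, chosen only to show the pattern.

```python
# Before: a cell like this leaves `readings`, `threshold`, `selected` and
# `total` in the global namespace, where a later cell can silently reuse
# or overwrite them:
#
#     readings = [3, 8, 1, 10]
#     threshold = 5
#     selected = [r for r in readings if r > threshold]
#     total = sum(selected)

# After: the same logic as a documented function. Only `sum_above` is
# added to the namespace; `selected` exists only while the function runs.
def sum_above(values, threshold):
    """Return the sum of the items in `values` that are greater than `threshold`.

    Hypothetical helper, used only to illustrate the refactoring step.
    """
    selected = [v for v in values if v > threshold]
    return sum(selected)


total = sum_above([3, 8, 1, 10], threshold=5)
print(total)
```

If such a function turns out to be useful in several notebooks, it can be moved into a module file next to them (for example a hypothetical `analysis_utils.py`) and imported with `from analysis_utils import sum_above`, so that there is only one copy to maintain.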
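One way to catch the out-of-order state before sharing a notebook is to inspect the execution counts saved in the `.ipynb` file, which is plain JSON. The sketch below assumes the nbformat v4 layout (a top-level `cells` list whose code cells carry `source` and `execution_count`); the helper names `load_notebook` and `execution_order_ok` are hypothetical, not part of any Jupyter API.

```python
import json
from pathlib import Path


def load_notebook(path):
    """Read a saved .ipynb file (which is plain JSON) into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def execution_order_ok(nb):
    """Return True if the notebook's code cells were run top to bottom.

    Assumes the nbformat v4 layout: a top-level "cells" list in which each
    code cell has a "source" (string or list of strings) and an
    "execution_count" (None if the cell has not been run).
    """
    counts = [
        cell.get("execution_count")
        for cell in nb.get("cells", [])
        # Skip markdown/raw cells, and blank code cells, which frontends
        # may not send to the kernel (an assumption; behaviour can vary).
        if cell.get("cell_type") == "code"
        and "".join(cell.get("source", "")).strip()
    ]
    # After restarting the kernel and running every cell in order, the
    # counts are exactly 1, 2, 3, ...; anything else means out-of-order
    # execution, re-runs, or cells that were never executed.
    return counts == list(range(1, len(counts) + 1))
```

For example, `execution_order_ok(load_notebook("analysis.ipynb"))` (with a hypothetical file name) reports whether the saved copy was produced by a clean top-to-bottom run. This only inspects what was saved; re-executing the notebook headlessly, e.g. with `jupyter nbconvert --to notebook --execute analysis.ipynb`, is a stronger check that it still runs.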
More detailed style guidance and worked examples
Problems with notebooks: challenges with version control, integration with IDEs, testing and CI, linting, code quality, maintainability and extensibility
Improvements to workflow through Jupyter extensions
Diagram showing progress of a tool from notebook (usable by this notebook) to module+notebook (usable by any notebook in this repository) to package+notebook (usable by anybody) – increasing maturity