Over the last several years, Jupyter Notebooks have become a staple in the data scientist’s toolkit. As both an interface for interactive computation and a document standard for structuring blocks of computational and narrative content, notebooks are ubiquitous, flexible, and powerful.
Since their inception, notebooks were aimed at many communication processes for scientists and educators, including journal publishing pipelines. However, there are still gaps in the current Jupyter Notebook landscape in terms of writing and publishing.
With this post we announce the Executable Book Project — a Sloan-funded grant to improve the state of writing and publishing with Jupyter Notebooks, along with the broader open source science ecosystem. The project is a collaboration between the Australian National University, UC Berkeley, and Northern Arizona University. We’ll cover some of the major challenges we hope to tackle below.
tl;dr: What are we hoping to build?#
From a user’s perspective, we want to enable:
Generation of a high-quality book from a single folder containing either markdown or Jupyter notebooks (authors can write in either). The book will support common publishing features, such as citations, cross-references, and numbered equations. As a part of this process, all executable code in the source files will be run, intelligently cached and embedded in the document. Book output formats will include HTML and PDF, as well as Jupyter notebooks that allow readers to step through and interact with content.
A simple command-line interface that allows users to focus on the content creation process, not on the backend stack that runs the actual building.
A smooth and convenient author-edit-compile cycle, including productivity tools for use in common editors (e.g., code completion). The option of using human-readable text-based source material will facilitate version control and distributed collaboration.
This stack should be entirely open source.
What are our principles and constraints?#
In running this project, we aim to adhere to several principles that we believe will result in higher-quality technology that aligns with the core principles of the open source community. Here are a few key components:
Support casual users equally to power users. Complicating the feature space to support a “power user feature” should be done with great care.
Build modular components that are useful elsewhere. Rather than building a single vertical stack, find parts of the workflow that naturally separate. Create modular tools for these parts so that they may benefit the community outside of the context of building interactive books.
Use pre-existing technology where possible. Rather than re-inventing the wheel, make every effort to utilize pre-existing open source tech.
Use pre-existing standards where possible. In the event that we must create new patterns of content creation or tooling, utilize prior art in the open source community as much as possible, especially when it comes to markup languages.
Contribute improvements upstream. Where we utilize pre-existing tools, contribute improvements to them as we build off of them for this project.
Design for the future. While we have a bit of funding now, it won’t last forever. This means the technology should be easy for potential developers to read, understand, and modify/improve.
Users should not need to know anything about the build system. If they want to dig into the guts of our infrastructure, they can, but knowledge of Sphinx, Latex, etc should not be a requirement. They should only need to use a simple tool to control the process.
Don’t try to do everything. Focus our tools on publishing computational documents with reasonable choices made for the user.
What does it mean to publish executable documents?#
We think of this process in three main parts: authoring, execution, and publishing.
Authoring is when the writer creates their content. They may do this in a notebook-focused interface like the Jupyter Notebook UI or Jupyter Lab, or in a traditional text editor. They require the ability to enrich their document with information such as equations, citations, and cross-references.
Execution occurs after the author has written their content and wants to prepare it for publishing. In this step, all executable content needs to be run against one or more languages. Authors may or may not want to run everything each time they publish, depending on how long the components take to run.
Publishing occurs once all of the content has been written and run. It is the final step where the author builds “final” artifacts from their build process. These artifacts might take many forms - the two most-common ones being an HTML website and a PDF.