In the beginning …

… there was a 9-track tape shipped from Bell Labs. S would be installed for “unix”. Extensions could be programmed in Fortran. I started using this around 1985.

S REPORT presaged Sweave and knitr: troff, eqn, pic could be interlarded with S computations.

Here’s a taste of inputs and outputs for S REPORT. In the left hand panel of Figure 1, .PP starts a paragraph, .EQ and .EN offset an equation markup, and backslash-left-bracket is used to start a block of S code that will be executed when the report is compiled. A right-bracket closes the code block. This markup generates the second paragraph in the right hand panel.

Figure 1. Left panel: troff, eqn, and S code for processing by S REPORT. Right panel: output including a table for which code is not shown. From Becker and Chambers (1984)

Fast forward:

In May 2002, Bioconductor 1.0 was released with 15 packages, developed for version 1.5 of R. By April of 2025 the collection has grown to over 2300 software packages, developed for version 4.5 of R. To give a sense of the components and processes in play, I made a sketch:

Figure 2. Author’s agitated overview of Bioconductor in early 2024. The motivation was, in part, an encounter with the notion that LaTeX might be replaced by Typst.

The “infinity signs” are used with “refresh” circular arrows to indicate that the associated resources can be modified at any time, while Bioconductor’s release is nominally “stable” for six months.

In sum

Figure 2 depicts many aspects of Bioconductor that are invisible to users as they do their work. “Ask and you shall receive” is not an overt commitment of Bioconductor’s core. I found it interesting to reflect on the ways in which new capabilities are introduced into the system. The user experience, particularly for those doing statistical analysis of numerical data, is not dramatically different from that of the user of S in the 1980s. The number of technologies involved, and the maintenance efforts required for those technologies and their interactions, is much greater.

It is natural, in my view, to refer to Bioconductor as an ecosystem, as it forms a home for many beings and processes engaged in biological data science. “Ecosystem”, for me, leads to “ecology”, and in my view, ecology is not just a study of interactions between beings and their environments, but includes a normative component. An ecology of Bioconductor will address how to manage its growth and change, identifying principles for expanding along new frontiers, for forming new connections, and for shedding materials that have lost value.

Upcoming

My purpose here is to set the stage for a series of posts examining some relatively new but underappreciated aspects of the project, including

biocViews, EDAM, and the description of Bioconductor resources
approaching GPU computing for genomics with Bioconductor
containers and binaries to speed environment creation
subcommunity dynamics: scverse, spatialdata, Bioconductor … how to relate?
- python interop
- OME-Zarr
values derived from excursions involving WebAssembly
“vendoring” sources of important libraries
curating genomic annotation
large arrays: sparsity and indexing

Reference

Becker, Richard A., and John M. Chambers. 1984. S: An Interactive Environment for Data Analysis and Graphics. Pacific Grove, CA: Wadsworth.