Glue: A Python Application for Multi-Dimensional Linked Data Exploration
Thomas Robitaille (Freelance scientific software developer, lead glue developer), firstname.lastname@example.org
Rapid increases in the volume and variety of data available to scientists offer unprecedented opportunities for discovery, along with challenges for interactive data exploration and analysis. Astronomers routinely work with heterogeneous data spread across catalogs, images, spectra, spectral cubes, and more complex datasets, and being able to visualize and explore these data together is crucial to enabling discoveries. To address this challenge, we have developed¹ a new open-source Python package named glue that allows scientists to visualize many different types of data and to explore relationships within and across related datasets by selecting subsets of data, interactively or programmatically, in 1, 2, or 3 dimensions. Glue then propagates those selections live across all open visualizations of the data, such as 2D or 3D scatter plots, histograms, images, and volume renderings. This use of multiple views of a dataset, with selections propagating between them, is usually referred to as brushing and linking in the visualization community.
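The brushing-and-linking idea can be sketched in a few lines of plain Python. This is a minimal illustration under our own assumptions, not glue's internal design (glue's own implementation is more elaborate): `LinkedSelection`, `range_selection`, and `box_selection` are hypothetical names, the catalog values are made up, and a real viewer would redraw its highlighted points instead of appending to a list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, float]
Predicate = Callable[[Row], bool]


@dataclass
class LinkedSelection:
    """One selection on a dataset, shared by every viewer that displays it."""

    rows: List[Row]
    selected: List[int] = field(default_factory=list)
    listeners: List[Callable[[List[int]], None]] = field(default_factory=list)

    def subscribe(self, on_change: Callable[[List[int]], None]) -> None:
        # Each viewer registers a callback; a real viewer would redraw highlights.
        self.listeners.append(on_change)

    def select(self, predicate: Predicate) -> None:
        # Brushing: a selection made in any viewer becomes a predicate on rows.
        self.selected = [i for i, row in enumerate(self.rows) if predicate(row)]
        # Linking: every subscribed viewer receives the same row indices.
        for notify in self.listeners:
            notify(list(self.selected))


def range_selection(attr: str, lo: float, hi: float) -> Predicate:
    """1D selection, e.g. dragging over a range of a histogram."""
    return lambda row: lo <= row[attr] <= hi


def box_selection(x: str, y: str, xlim, ylim) -> Predicate:
    """2D selection, e.g. dragging a rectangle on a scatter plot."""
    return lambda row: (xlim[0] <= row[x] <= xlim[1]
                        and ylim[0] <= row[y] <= ylim[1])


# Hypothetical toy catalog (made-up values) shown in two linked views.
catalog = [
    {"redshift": 0.5, "age": 2.0},
    {"redshift": 1.2, "age": 1.0},
    {"redshift": 2.8, "age": 0.4},
]
selection = LinkedSelection(catalog)
histogram_updates, scatter_updates = [], []
selection.subscribe(histogram_updates.append)
selection.subscribe(scatter_updates.append)

# Brush a redshift range in the "histogram", then a box in the "scatter plot".
selection.select(range_selection("redshift", 1.0, 3.0))
selection.select(box_selection("redshift", "age", (0.0, 1.5), (0.5, 3.0)))
```

Both "viewers" receive identical updates regardless of where the selection was made, which is the essence of linking: the selection belongs to the data, not to the view.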
At present, glue has one main (desktop-based) graphical user interface, which most users interact with (see Figure 1). This interface consists of a canvas area to which users can add various data viewers, each of which is effectively a new window on the canvas. Which viewers are chosen, and how they are arranged, depends on the particular question being investigated. On the left is a sidebar showing all loaded datasets, as well as controls for the data viewers once these are created. In a typical session, users load datasets, drag them onto the main canvas to create viewers, and then make selections in the different viewers. If multiple viewers show the same dataset, selections made in one viewer propagate to all the others.
While linking selections within a single dataset is already useful in itself (for instance, for tabular data with dozens of fields/columns), the unique power of glue is its ability to link heterogeneous datasets that share conceptually related components. Datasets can be linked graphically: the user selects the two datasets to link, and then either picks one component in each and marks them as conceptually identical, or chooses an arbitrary mapping function that takes two or more components as input. For example, users can indicate that a column in one table (for instance, a J-band magnitude) is equivalent to a column in a different table. A more complex example would be to link the equatorial coordinates in a table with the Galactic coordinates of an image. More generally, any mapping between components, including non-linear ones, is possible.
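As an illustration of a many-to-many, non-linear link, the function below maps two components (RA, Dec) to two others (Galactic l, b) using the standard spherical-trigonometry transformation with commonly quoted J2000 values for the north Galactic pole. It is a plain-Python sketch, not code taken from glue: the function and constant names are ours, and for real work a dedicated library such as astropy.coordinates is the safer choice.

```python
import math

# Commonly quoted J2000 constants (degrees): position of the north Galactic
# pole and the Galactic longitude of the north celestial pole.
RA_NGP = 192.85948
DEC_NGP = 27.12825
L_NCP = 122.93192


def equatorial_to_galactic(ra_deg, dec_deg):
    """Map equatorial (RA, Dec) in degrees to Galactic (l, b) in degrees.

    A link function of this shape takes two components from one dataset and
    returns two components that can be matched against another dataset.
    """
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    ra0, dec0 = math.radians(RA_NGP), math.radians(DEC_NGP)
    dra = ra - ra0

    # sin b = sin(dec) sin(dec0) + cos(dec) cos(dec0) cos(ra - ra0)
    sin_b = (math.sin(dec) * math.sin(dec0)
             + math.cos(dec) * math.cos(dec0) * math.cos(dra))
    lat = math.asin(max(-1.0, min(1.0, sin_b)))  # clamp rounding errors

    # cos b sin(l_NCP - l) and cos b cos(l_NCP - l), combined with atan2.
    y = math.cos(dec) * math.sin(dra)
    x = (math.sin(dec) * math.cos(dec0)
         - math.cos(dec) * math.sin(dec0) * math.cos(dra))
    lon = (L_NCP - math.degrees(math.atan2(y, x))) % 360.0
    return lon, math.degrees(lat)


# The Galactic centre should land very close to l = 0 (or 360), b = 0.
gal_l, gal_b = equatorial_to_galactic(266.40499, -28.93617)
```

An identity link, such as declaring two J-band magnitude columns equivalent, is simply the degenerate case where the mapping function returns its input unchanged.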
The session in Figure 1 shows a near-infrared image of the Hubble Ultra Deep Field (HUDF) and a catalog (Ryan et al. 2007) of galaxies in the field, which includes the positions of the galaxies as well as various properties such as redshift and age. Linking the coordinates of the galaxies in the catalog with the coordinates of the image allows the sources to be overplotted on the image. The red points show a subset selected in the scatter plot, while the blue points show a subset selected in the histogram.
We have also developed 3D viewers (shown in Figure 2), namely a 3D scatter-plot viewer and a 3D volume-rendering viewer, as well as more specialized viewers, including one that displays trees/dendrograms of data and allows structures in the data to be selected. In addition to using glue as a standalone application, users can also launch it from an IPython session or a Jupyter notebook. Finally, glue makes it easy to save sessions to files, and to load them later or share them with collaborators.
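Selecting a structure in a dendrogram typically means selecting that branch together with everything nested inside it. The sketch below shows one plausible way to do this with a depth-first walk over a tree, then turns the result into a pixel mask so the same selection could be shown in an image view. It is an assumption-based illustration, not glue's dendrogram viewer: the tree, the label array, and the function names are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical dendrogram: each structure ID maps to the IDs of its children.
CHILDREN: Dict[int, List[int]] = {
    0: [1, 2],  # root branch
    1: [3, 4],
    2: [],
    3: [],
    4: [],
}


def structure_with_descendants(children: Dict[int, List[int]], node: int) -> Set[int]:
    """Return the selected structure plus every structure nested inside it."""
    selected: Set[int] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in selected:
            continue
        selected.add(current)
        stack.extend(children.get(current, []))
    return selected


def pixel_mask(labels: List[List[int]], selected: Set[int]) -> List[List[bool]]:
    """Flag pixels whose structure label belongs to the selection."""
    return [[label in selected for label in row] for row in labels]


branch = structure_with_descendants(CHILDREN, 1)  # {1, 3, 4}
labels = [[3, -1], [2, 4]]  # structure ID per pixel, -1 means no structure
mask = pixel_mask(labels, branch)  # [[True, False], [False, True]]
```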
Glue is designed from the ground up to be applicable to many different fields of science, as well as to industry. Many aspects of the application can easily be customized by users, ranging from simple additions such as custom colormaps for images or custom linking functions, to custom viewers or arbitrary plugins that may create new windows and operate in any way on the data or on any other aspect of the application. Plugins can be distributed as normal Python packages that, once installed, are automatically loaded by glue.
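A common way for a Python application to load plugins automatically once they are installed is through package entry points. The sketch below shows that general mechanism with the standard library's `importlib.metadata`. The group name `glue.plugins` matches our understanding of glue's plugin documentation but should be treated as an assumption and checked against the current docs; `find_plugins` and `load_plugins` are hypothetical helpers, not glue functions.

```python
from importlib.metadata import entry_points

# Entry-point group that plugin packages advertise themselves under.
# Assumption: check the exact group name against glue's plugin documentation.
PLUGIN_GROUP = "glue.plugins"


def find_plugins(group=PLUGIN_GROUP):
    """Return the entry points that installed packages declare for `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9 return a plain dict


def load_plugins(group=PLUGIN_GROUP):
    """Import each plugin and call its setup function, isolating failures."""
    loaded, failed = [], []
    for ep in find_plugins(group):
        try:
            setup = ep.load()  # imports the module and returns the named object
            setup()            # the plugin registers its viewers, readers, etc.
            loaded.append(ep.name)
        except Exception as exc:  # one broken plugin should not stop the app
            failed.append((ep.name, repr(exc)))
    return loaded, failed
```

On the packaging side, a plugin would declare an entry such as `myplugin = myplugin:setup` under that group in its metadata (`myplugin` is a placeholder name). Catching exceptions per plugin is a design choice: a single faulty add-on then degrades gracefully instead of preventing the application from starting.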
Using this plugin framework, participants in the James Webb Space Telescope Data Analysis Development Forum (DADF) have been developing data analysis applications, usable inside or outside of glue, to analyze spectroscopic data (e.g., from Hubble and Webb). As shown in Figure 3, when used in glue, these applications display catalogs of sources with spectroscopic data; once a subset of sources has been selected based on any of the source attributes, the spectroscopic data (either 1D spectra or spectral cubes) for that subset can be explored in more detail. Other plugins under development include one that allows the WorldWide Telescope application (now developed by the American Astronomical Society) to be used inside glue, as well as plugins that add support for reading medical and geospatial file formats.
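The catalog-to-spectra workflow amounts to selecting rows by an attribute and then following a shared source ID to the associated spectroscopic data. The sketch below shows that join in plain Python under stated assumptions: the catalog, the spectra, and the helper names are hypothetical toy values, not the DADF applications or their data model.

```python
from typing import Callable, Dict, List, Tuple

Spectrum = List[Tuple[float, float]]  # (wavelength in microns, flux) pairs

# Hypothetical catalog: one row per source, with an ID shared with the spectra.
catalog = [
    {"id": "src1", "redshift": 0.8},
    {"id": "src2", "redshift": 2.1},
    {"id": "src3", "redshift": 2.6},
    {"id": "src4", "redshift": 3.1},  # no spectrum available for this source
]

# Hypothetical 1D spectra keyed by source ID (made-up values).
spectra: Dict[str, Spectrum] = {
    "src1": [(1.10, 0.8), (1.20, 0.9)],
    "src2": [(1.10, 0.3), (1.20, 0.5)],
    "src3": [(1.10, 1.4), (1.20, 1.1)],
}


def select_sources(rows: List[dict], predicate: Callable[[dict], bool]) -> List[str]:
    """Return the IDs of catalog rows that satisfy the selection."""
    return [row["id"] for row in rows if predicate(row)]


def spectra_for(ids: List[str], spectra_by_id: Dict[str, Spectrum]) -> Dict[str, Spectrum]:
    """Gather spectra for the selected IDs, skipping sources without data."""
    return {i: spectra_by_id[i] for i in ids if i in spectra_by_id}


high_z = select_sources(catalog, lambda row: row["redshift"] > 2.0)
subset_spectra = spectra_for(high_z, spectra)
```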
In addition to supporting the needs of Hubble data analysis and future missions such as Webb, we plan to make sure that glue can be used with very large datasets, as well as with simulations on grids that are not regular Cartesian grids (in collaboration with the yt project). We are also investigating ways to bring glue to the browser, so that users can more easily explore remote datasets.
Trying out glue and getting involved
Glue is compatible with Linux, Mac OS X, and Windows; works with Python 2.7 or Python 3.3 and above; and is easy to install using the conda package manager (it is available in both the conda-forge and astroconda channels).
The development of glue is done in the open, and all the code is open source: the source code for the main glue package, as well as for many of the plugins, can be found on GitHub, and we have user and developer mailing lists. We welcome anyone interested in joining the project!
References
Brammer, G. B., et al. 2012, ApJ, 758L
Grogin, N. A., et al. 2011, ApJS, 197, 35
Koekemoer, A., et al. 2011, ApJS, 197, 36
Ryan, R. E., Jr., et al. 2007, AAS 210, BAAS, 39, p. 104
¹ The glue project is led by Alyssa Goodman at the Harvard-Smithsonian Center for Astrophysics, and development has been funded by the NASA JWST project through a contract with STScI.