Glue: A Python Application for Multi-Dimensional Linked Data Exploration

Thomas Robitaille (Freelance scientific software developer, lead glue developer), thomas.robitaille@gmail.com

Introduction
Rapid increases in the volume and variety of data available to scientists offer unprecedented opportunities for discovery, but they also pose challenges for interactive data exploration and analysis. Astronomers routinely work with heterogeneous data spread across catalogs, images, spectra, spectral cubes, and more complex datasets, and being able to visualize and explore these data together is crucial to enabling discoveries. To address this challenge, we have developed¹ glue, a new open-source Python package that lets scientists visualize many different types of data and explore relationships within and across related datasets by selecting subsets of data interactively or programmatically in one, two, or three dimensions. With glue, users can watch those selections propagate live across all open visualizations of the data, such as 2D or 3D scatter plots, histograms, images, and volume renderings (watch a short introductory video). Using multiple views of a dataset and having selections propagate between them is known in the visualization community as brushing and linking.
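A selection in one, two, or three dimensions can be thought of as a set of interval constraints on one, two, or three components, evaluated row by row to produce a boolean mask. The plain-Python sketch below illustrates that idea; `BoxSelection` and the catalog values are hypothetical and are not glue's API (glue represents selections with its own subset-state classes).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Toy dataset: named components (columns) of equal length.
Dataset = Dict[str, List[float]]


@dataclass(frozen=True)
class BoxSelection:
    """Hypothetical axis-aligned selection on one, two, or three components.

    Each (name, low, high) entry is an inclusive interval on one component:
    one entry is a 1D range, two a rectangle, three a box.
    """
    bounds: Tuple[Tuple[str, float, float], ...]

    def to_mask(self, data: Dataset) -> List[bool]:
        # Start with every row selected, then apply each interval in turn.
        n_rows = len(next(iter(data.values())))
        mask = [True] * n_rows
        for name, low, high in self.bounds:
            mask = [m and low <= v <= high for m, v in zip(mask, data[name])]
        return mask


# Made-up example values, not taken from any real catalog.
catalog: Dataset = {
    "redshift": [0.5, 1.2, 2.8, 3.1],
    "age": [9.0, 4.5, 1.2, 0.8],
}

range_1d = BoxSelection((("redshift", 1.0, 3.0),))
rect_2d = BoxSelection((("redshift", 1.0, 3.5), ("age", 0.0, 2.0)))

print(range_1d.to_mask(catalog))  # [False, True, True, False]
print(rect_2d.to_mask(catalog))   # [False, False, True, True]
```

Because the selection is stored as constraints rather than as a list of row indices, the same object can be built from a mouse drag or from code, and it can be re-evaluated on any dataset that has components with those names.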

Main functionality
At present, glue has one main desktop graphical user interface, which most users interact with (see Figure 1). This interface consists of a canvas area to which users can add data viewers, each of which is effectively a new window on the canvas. Which viewers are chosen, and how they are arranged, depends on the particular question being investigated. On the left is a sidebar listing all loaded datasets, as well as controls for the data viewers once these are created. In a typical session, users load datasets, drag them onto the main canvas to create viewers, and then make selections in the different viewers. If multiple viewers show the same dataset, a selection made in one of them propagates to all the others.
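The propagation described above is essentially an observer pattern: the session owns the datasets, and every registered viewer is notified when a subset changes, so any viewer showing that dataset can redraw. The classes below (`Session`, `Viewer`) are a hypothetical, GUI-free sketch; glue itself routes such updates through its own message hub, which is not reproduced here.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class Viewer:
    """Toy stand-in for a data viewer; records the mask it would draw."""

    def __init__(self, name: str, dataset: str):
        self.name = name
        self.dataset = dataset
        self.highlighted: List[bool] = []

    def on_subset_changed(self, dataset: str, mask: List[bool]) -> None:
        # A real viewer would redraw; here we only keep masks for our dataset.
        if dataset == self.dataset:
            self.highlighted = mask


class Session:
    """Holds datasets and viewers and broadcasts selection changes."""

    def __init__(self) -> None:
        self.datasets: Dict[str, Dict[str, List[float]]] = {}
        self.viewers: List[Viewer] = []

    def add_data(self, label: str, data: Dict[str, List[float]]) -> None:
        self.datasets[label] = data

    def add_viewer(self, viewer: Viewer) -> Viewer:
        self.viewers.append(viewer)
        return viewer

    def select(self, dataset: str, predicate: Callable[[Row], bool]) -> List[bool]:
        data = self.datasets[dataset]
        names = list(data)
        # Build one dict per row from the columns and evaluate the predicate.
        rows = (dict(zip(names, values)) for values in zip(*data.values()))
        mask = [predicate(row) for row in rows]
        # Notify every viewer; each decides whether the change concerns it.
        for viewer in self.viewers:
            viewer.on_subset_changed(dataset, mask)
        return mask


session = Session()
session.add_data("catalog", {"redshift": [0.5, 1.2, 2.8], "age": [9.0, 4.5, 1.2]})
session.add_data("spectra", {"flux": [1.0, 2.0]})
scatter = session.add_viewer(Viewer("scatter", "catalog"))
histogram = session.add_viewer(Viewer("histogram", "catalog"))
spectrum = session.add_viewer(Viewer("spectrum", "spectra"))

# A selection made "in" the scatter plot also updates the histogram.
session.select("catalog", lambda row: row["redshift"] > 1.0)
print(histogram.highlighted)  # [False, True, True]
print(spectrum.highlighted)   # [] (different dataset, unchanged)
```

Broadcasting to every viewer and letting each one filter keeps the session ignorant of what each viewer displays, so new viewer types can be added without changing the selection logic.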

Figure 1: The main glue application window, showing image data (WFC3 at 1.6 µm) and catalog data (Ryan et al. 2007) from the Hubble Ultra Deep Field survey. The scatter plot (top left) shows the age and redshift of all galaxies in the catalog, with the red and blue points showing subsets. The histogram (top right) shows the best-matching spectral template. The image (bottom) shows the WFC3 data with the catalog galaxies overlaid. The yellow shape is a selection in progress, made with a lasso selection tool; once the user finishes the selection, the red subset will update to include only the points inside the yellow region.
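Turning a lasso outline like the one in Figure 1 into a subset amounts to a point-in-polygon test for every plotted point. The sketch below uses the standard even-odd (ray-casting) rule; it is an illustration only, glue's own region-of-interest code may use a different method, and the function names and triangle are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd rule: count edge crossings of a ray from (x, y) toward +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through y can cross it
        # (this also guarantees y1 != y2, so the division below is safe).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def lasso_mask(xs: Sequence[float], ys: Sequence[float],
               polygon: Sequence[Point]) -> List[bool]:
    """Boolean mask of the scatter-plot points inside the lasso polygon."""
    return [point_in_polygon(x, y, polygon) for x, y in zip(xs, ys)]


lasso = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]  # a hypothetical triangular lasso
print(lasso_mask([1.0, 3.0, 0.5, 5.0], [1.0, 3.0, 3.0, 1.0], lasso))
# [True, False, True, False]
```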

Linking selections within a single dataset is already useful, for instance for tabular data with dozens of fields/columns, but the unique power of glue is its ability to link heterogeneous datasets that share conceptually related components. Datasets can be linked graphically: the user selects two datasets to link and then either picks one component in each and marks them as conceptually identical, or chooses an arbitrary mapping function that takes two or more components as input. For example, users can indicate that a column in one table (for instance, a J-band magnitude) is equivalent to a column in a different table. A more complex example is linking the equatorial coordinates in a table with the Galactic coordinates of an image. More generally, links can be arbitrary mappings between components, including non-linear ones.
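Both kinds of link can be sketched in plain Python. An identity link is a declared equivalence between components, which lets a selection expressed on one dataset's component be re-evaluated on the linked component of another. A mapping link is a function; here it is the conversion from equatorial (RA, Dec) to Galactic (l, b) coordinates using the J2000 orientation of the Galactic pole. `IDENTITY_LINKS`, `linked_component`, and the table and column labels are hypothetical, and for real work a library such as astropy.coordinates handles frame conversions more rigorously.

```python
import math
from typing import Dict, List, Tuple

# Identity links: pairs of (dataset, component) declared equivalent.
IDENTITY_LINKS: List[Tuple[Tuple[str, str], Tuple[str, str]]] = [
    (("catalog_a", "jmag"), ("catalog_b", "J")),
]


def linked_component(dataset: str, component: str, target: str) -> str:
    """Name of the component in `target` linked to dataset.component."""
    for (d1, c1), (d2, c2) in IDENTITY_LINKS:
        if (d1, c1) == (dataset, component) and d2 == target:
            return c2
        if (d2, c2) == (dataset, component) and d1 == target:
            return c1
    raise KeyError(f"{dataset}.{component} is not linked to {target}")


# Mapping link: equatorial (RA, Dec) to Galactic (l, b), J2000, in degrees.
RA_NGP = 192.85948   # RA of the north Galactic pole
DEC_NGP = 27.12825   # Dec of the north Galactic pole
L_NCP = 122.93192    # Galactic longitude of the north celestial pole


def equatorial_to_galactic(ra_deg: float, dec_deg: float) -> Tuple[float, float]:
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    ra_p, dec_p = math.radians(RA_NGP), math.radians(DEC_NGP)
    d_ra = ra - ra_p
    sin_b = (math.sin(dec) * math.sin(dec_p)
             + math.cos(dec) * math.cos(dec_p) * math.cos(d_ra))
    b = math.asin(max(-1.0, min(1.0, sin_b)))  # clamp rounding errors
    y = math.cos(dec) * math.sin(d_ra)
    x = (math.sin(dec) * math.cos(dec_p)
         - math.cos(dec) * math.sin(dec_p) * math.cos(d_ra))
    l = (L_NCP - math.degrees(math.atan2(y, x))) % 360.0
    return l, math.degrees(b)


# A range selection on catalog_a.jmag, re-evaluated on catalog_b.
catalog_b: Dict[str, List[float]] = {"J": [19.5, 20.5, 23.0]}
column = linked_component("catalog_a", "jmag", "catalog_b")
print([20.0 <= v <= 22.0 for v in catalog_b[column]])  # [False, True, False]

# The Galactic center (J2000) maps to l close to 0 (or 360) and b close to 0.
print(equatorial_to_galactic(266.405, -28.936))
```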

The session in Figure 1 shows a near-infrared image of the Hubble Ultra Deep Field (HUDF) and a catalog (Ryan et al. 2007) of galaxies in the field, which includes the positions of the galaxies as well as properties such as redshift and age. Linking the catalog coordinates of the galaxies with the coordinates of the image allows the sources to be overlaid on the image. The red points show a subset selected in the scatter plot, while the blue points show a subset selected in the histogram.
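Overlaying catalog sources on an image means mapping each source's (RA, Dec) onto the image's pixel grid. glue does this through the image's world coordinate system (WCS); the sketch below instead uses a simplified small-field linear approximation with made-up image parameters, purely to show the shape of such a link. It ignores the map projection and RA wrap-around at 0/360 degrees; for real data, use a full WCS implementation such as astropy.wcs.

```python
import math
from typing import List, Sequence, Tuple


def world_to_pixel(ra: float, dec: float,
                   crval: Tuple[float, float],
                   crpix: Tuple[float, float],
                   cdelt: Tuple[float, float]) -> Tuple[float, float]:
    """Linear small-field approximation of a celestial WCS (degrees).

    crval: (RA, Dec) at the reference pixel crpix = (x, y).
    cdelt: degrees per pixel along x and y; x is usually negative because
    RA increases to the left (east) in sky images.
    """
    # Offsets on the sky; RA offsets shrink by cos(Dec).
    d_ra = (ra - crval[0]) * math.cos(math.radians(crval[1]))
    d_dec = dec - crval[1]
    return crpix[0] + d_ra / cdelt[0], crpix[1] + d_dec / cdelt[1]


def overlay_positions(ras: Sequence[float], decs: Sequence[float],
                      **wcs) -> List[Tuple[float, float]]:
    """Pixel positions at which to draw catalog sources on the image."""
    return [world_to_pixel(r, d, **wcs) for r, d in zip(ras, decs)]


# Made-up image parameters: 0.06 arcsec pixels, reference pixel at the center.
WCS = dict(crval=(53.16, -27.79), crpix=(2000.0, 2000.0),
           cdelt=(-0.06 / 3600.0, 0.06 / 3600.0))

print(overlay_positions([53.16, 53.17], [-27.79, -27.78], **WCS))
```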

We have also developed 3D viewers, shown in Figure 2: a 3D scatter-plot viewer and a 3D volume-rendering viewer. Glue also offers more specialized viewers, such as one that can show trees/dendrograms of the data and allows structures in the data to be selected. In addition to using glue as a standalone application, users can launch it from an IPython session or a Jupyter notebook. Finally, glue makes it easy to save sessions to files, reload them later, and share them with collaborators.
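Saving a session can be as simple as serializing a description of it: which files were loaded, which subsets were defined, and which viewers were open. The sketch below round-trips such a description through JSON using only the standard library. The schema, class names, and file names are hypothetical; glue's own session files have their own structure, which is not reproduced here.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class SubsetRecord:
    label: str
    dataset: str
    component: str
    low: float
    high: float


@dataclass
class ViewerRecord:
    kind: str      # e.g. "scatter", "histogram", "image"
    dataset: str
    x: str
    y: str = ""    # unused by 1D viewers such as histograms


@dataclass
class SessionState:
    data_files: Dict[str, str] = field(default_factory=dict)  # label -> path
    subsets: List[SubsetRecord] = field(default_factory=list)
    viewers: List[ViewerRecord] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "SessionState":
        raw = json.loads(text)
        return cls(
            data_files=raw["data_files"],
            subsets=[SubsetRecord(**s) for s in raw["subsets"]],
            viewers=[ViewerRecord(**v) for v in raw["viewers"]],
        )

    def save(self, path: str) -> None:
        Path(path).write_text(self.to_json())

    @classmethod
    def load(cls, path: str) -> "SessionState":
        return cls.from_json(Path(path).read_text())


state = SessionState(
    data_files={"image": "hudf_f160w.fits", "catalog": "galaxies.csv"},
    subsets=[SubsetRecord("high-z", "catalog", "redshift", 2.0, 6.0)],
    viewers=[ViewerRecord("scatter", "catalog", x="redshift", y="age"),
             ViewerRecord("histogram", "catalog", x="template")],
)
print(SessionState.from_json(state.to_json()) == state)  # True
```

Storing file paths and selection definitions rather than the data themselves keeps session files small, but collaborators then need access to the same data files.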

Figure 2: Examples of visualizations made with the new 3D viewers in glue. From left to right: the ¹³CO spectral cube of the L1448 star-forming region in the Perseus molecular cloud; the locations of earthquakes around the world (with yellower colors indicating locations deeper in the Earth’s crust); two medical scans from different instruments, visualized in grey and green.

Glue is designed from the ground up to be applicable to many different fields of science, as well as to industry. Users can easily customize many aspects of the application, from simple ones such as custom colormaps for images or custom linking functions, to custom viewers or arbitrary plugins that may create new windows and operate in any way on the data or on any other aspect of the application. Plugins can be distributed as ordinary Python packages, which the glue application loads automatically once they are installed.
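The customization hooks and automatic plugin loading described above can be illustrated with two common Python patterns: a decorator that registers user functions (here, a custom link function) in a registry, and discovery of installed packages through entry points with `importlib.metadata`. In the sketch below, `register_link`, `LINK_REGISTRY`, the convention that each entry point refers to a setup function, and the group name are all hypothetical illustrations, not glue's actual configuration API.

```python
import math
from importlib import metadata
from typing import Callable, Dict, List

# Registry of custom link functions, keyed by a display name.
LINK_REGISTRY: Dict[str, Callable[..., float]] = {}


def register_link(name: str) -> Callable[[Callable[..., float]], Callable[..., float]]:
    """Decorator that adds a function to LINK_REGISTRY under `name`."""
    def decorator(func: Callable[..., float]) -> Callable[..., float]:
        LINK_REGISTRY[name] = func
        return func
    return decorator


@register_link("Flux density (Jy) to AB magnitude")
def jy_to_ab_mag(flux_jy: float) -> float:
    # The AB magnitude system has its zero point at 3631 Jy.
    return -2.5 * math.log10(flux_jy / 3631.0)


def find_plugins(group: str) -> List[metadata.EntryPoint]:
    """Entry points that installed packages advertise under `group`."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):        # Python 3.10+
        return list(eps.select(group=group))
    return list(eps.get(group, []))   # Python 3.8-3.9 dict interface


def load_plugins(group: str) -> List[str]:
    """Import each advertised object and call it (assumed to be a setup function)."""
    loaded = []
    for ep in find_plugins(group):
        setup = ep.load()
        setup()
        loaded.append(ep.name)
    return loaded


print(LINK_REGISTRY["Flux density (Jy) to AB magnitude"](363100.0))  # about -5
print(load_plugins("example.hypothetical.plugins"))  # empty unless a package registers it
```

Entry points let a package announce a plugin in its own metadata at install time, so the host application can find it without being told where it lives.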

Using this plugin framework, participants in the James Webb Space Telescope Data Analysis Development Forum (DADF) have been developing data analysis applications, usable inside or outside of glue, to analyze spectroscopic data (e.g., from Hubble and Webb). As shown in Figure 3, when these applications are used in glue, catalogs of sources with spectroscopic data can be displayed, and once a subset of sources has been selected based on any of the source attributes, the spectroscopic data (either 1D spectra or spectral cubes) for that subset can be explored in more detail. Other plugins in development include one that allows the WorldWide Telescope application (now developed by the American Astronomical Society) to be used inside glue, as well as plugins that add support for reading medical and geospatial file formats.

Figure 3: An example of glue being used to explore data from the CANDELS (Grogin et al. 2011; Koekemoer et al. 2011) and 3D-HST (Brammer et al. 2012) surveys. The scatter plot (top left) shows the magnitude of the galaxies versus their ellipticity; the table viewer (top right) shows the entries in the catalog, with the selected galaxy highlighted; the spectrum viewer (bottom left) is the Specviz tool being developed at STScI, showing the spectrum of the selected source; and the image viewer (bottom right) shows an ACS image of part of the field, including the selected source.

In addition to supporting the needs of Hubble data analysis and future missions such as Webb, we plan to make sure that glue can handle very large datasets, as well as simulations on grids that are not regular Cartesian grids (in collaboration with the yt project). We are also investigating ways to bring glue to the browser, so that users can more easily explore remote datasets.

Trying out glue and getting involved

Glue is compatible with Linux, MacOS X, and Windows; works with Python 2.7 and Python 3.3 and above; and is easy to install using the conda package manager (it is available in both the conda-forge and astroconda channels).

The development of glue is done in the open, and all the code is open source: the source code for the main glue package, as well as for many of the plugins, can be found on GitHub, and we have user and developer mailing lists. We welcome anyone interested in joining the project!

References

Brammer, G. B., et al. 2012, ApJ, 758L

Grogin, N. A., et al. 2011, ApJS, 197, 35

Koekemoer, A., et al. 2011, ApJS, 197, 36

Ryan, R. E., Jr., et al. 2007, AAS 210, BAAS, 39, p. 104


1 The glue project is led by Alyssa Goodman at the Harvard-Smithsonian Center for Astrophysics, and development has been funded by the NASA JWST project through a contract with STScI.