Google Summer of Code, 2017: Animint
Summary of the Animated Interactive Plots (animint) project under the R project for statistical computing.
Student: Faizan Uddin Fahad Khan
Mentors: Toby Dylan Hocking and Carson Sievert
Abstract
animint
package in R allows animated data visualization which is a useful tool for obtaining an intuitive understanding of patterns in multivariate data sets used extensively in data related environments. It uses the ggplot2
syntax and is based on the widely accepted Grammar of Graphics. This aim of the project was to build upon the already existing animint
package by adding new useful features all the while making it more portable and developer friendly.
Introduction
The motivation for this GSoC project stemmed from the fact that the animint
package depends on the ggplot2
package. This in itself is not a problem but it has been the root cause of many animint
issues encountered by users in the past (animint
issues #176, #173, #104). To handle these frequent recurring issues, we needed to make some additions to the ggplot2
source code, but our changes were not accepted by the maintainers of the package (ggplot2
PR #1649). So it was decided that since we cannot make the changes to the original code and depending on a fork also creates problems, the best way to go would be to have our own fork edited specially for animint
functionality. The fork would then be renamed to avoid issues where users had issues using the CRAN version of ggplot2
. This idea led us to create two new packages: animint2
- a successor to animint
, and ggplot2Animint
- animint2
specific version of ggplot2
.
First Evaluation
For the major part of the first coding period, the main aim was to clean up the animint
internals in animint2
and try to skim off any redundant code (Refactor Code). The original implementation used a mutable meta environment which made the internals complicated and it was hard to contain errors. This was cleaned up to a large extent with separate new functions including appropriate documentation which helped later on in the project.
Second Evaluation
After completing the refactoring part, we needed to solve the ggplot2
dependency issue which was the top on our priority list. After much effort and deliberation, it was concluded that the newest version of ggplot2
, version 2.2.1, was giving us an error which was hard and cumbersome to trace. Since our requirements were fulfilled by the ggplot2
v2.1.0 already, we decided to fall back on that. It saved us some needless effort and a lot of time. The validate-params branch in the ggplot2
fork was edited to remove the ggplot2
dependency and the new package was named ggplot2Animint
. This package supported the change in syntax proposed for later stages in the project.
Final Evaluation
After removing the dependency on ggplot2
, the first task was to adapt the animint2
code to the new syntax. The two major features of the animint
package: showSelected
and clickSelects
were to be passed as params, instead of the older convention where they were bundled with the other aesthetics
of the layer.
# Older syntax
geom_point(aes(xVar, yVar,
clickSelects=clickVar,
showSelected=showVar, showSelected2=showVar2,
showSelected.variable=selector.name,
showSelected.value=selector.value,
))
# New syntax
geom_point(aes(xVar, yVar),
clickSelects="clickVar",
showSelected=c("showVar", "showVar2", selector.name="selector.value"))
So all the tests and examples in the package had to be rewritten for the code to work. Getting the code to work proved to be suprisingly (and notoriously) more complicated than imagined because the ggplot2
code apparently uses environments to reference plots. As such, the animint
code needed to either enforce the code to make deep copies of the plots to be rendered (which needed excess resources and computation) or needed a workaround. We preferred the workaround to keep only the original mapping for our purposes instead of the entire plot. After the CI build was passing, the PR was merged to support the new syntax. The package also depends on the ggplot2Animint
package instead of the the CRAN ggplot2
, so further changes to ggplot2
won’t require any changes to the animint2
codebase.
Link to commits
The links to all the commits are given below:
Future Work
Currently the aim is to introduce a data.table
dependency in animint2
for performance optimization. The progress can be tracked here. Another idea for further improvement would be to port the geom specific code from animint2
to ggplot2Animint
.