heddlr
This vignette serves as a basic introduction to the
heddlr
package, a set of utilities to make it easier to
write R Markdown documents with sections that repeat or which might need
to add or remove sections based on an underlying data source. In order
to demonstrate the essentials of how the package works, let’s imagine we
have a super cool R Markdown document, which looks something like
this:
---
title: "My cool report!"
author: "Captain heddlr"
output: html_document
---
# Let's talk about irises!
## Iris setosa
This species of flower is great! It has a mean sepal length of
`.r mean(iris[iris$Species == "setosa", "Sepal.Length"])`, and a
mean sepal width of `.r mean(iris[iris$Species == "setosa", "Sepal.Width"])`.
That looks like this on a graph!
.```{r}
iris %>%
filter(Species == "setosa") %>%
ggplot(aes(Sepal.Length, Sepal.Width)) +
geom_point()
.```
## Iris virginica
This species of flower is great! It has a mean sepal length of
`.r mean(iris[iris$Species == "virginica", "Sepal.Length"])`, and a
mean sepal width of `.r mean(iris[iris$Species == "virginica", "Sepal.Width"])`.
That looks like this on a graph!
.```{r}
iris %>%
filter(Species == "virginica") %>%
ggplot(aes(Sepal.Length, Sepal.Width)) +
geom_point()
.```
This is a great report, and it probably didn’t take that long to create. However, one day Joe down the hall points out that there are actually more types of irises than the ones you’re always talking about - and your database already has information about one called versicolor!
If you wanted, you could go ahead and copy and paste the species section again, making sure to change all the species names to versicolor. However, there’s some level of risk associated with that – you can wind up with errors in your reporting if you miss replacing a value. More importantly, though, is that copying and pasting by hand doesn’t scale to reports which are put out often and need multiple sections added or removed. It can save a lot of time and energy to instead automate that task away.
That’s the motivation behind heddlr
1: to reduce that
repetitive work and, as a side effect, simplify your code. To do so,
heddlr
looks at reports as a collection of components,
which we’ll call patterns. As it happens, our super cool report above is
made up of two patterns – first, the setup material:
---
title: "My cool report!"
author: "Captain heddlr"
output: html_document
---
.```{r setup}
library(dplyr)
library(ggplot2)
.```
# Let's talk about irises!
And secondly the species-specific section, which gets repeated for each species in our dataset – I’ve swapped the specific name out for a placeholder value:
## Iris SPECIES_NAME
This species of flower is great! It has a mean sepal length of
`.r mean(iris[iris$Species == "SPECIES_NAME", "Sepal.Length"])`, and a
mean sepal width of `.r mean(iris[iris$Species == "SPECIES_NAME", "Sepal.Width"])`.
That looks like this on a graph!
.```{r}
iris %>%
filter(Species == "SPECIES_NAME") %>%
ggplot(aes(Sepal.Length, Sepal.Width)) +
geom_point()
.```
When using heddlr
, we’ll usually go ahead and save each
of those patterns in their own files – for our example report, we’ll
name those files setup_pattern.Rmd
and
species_pattern.Rmd
respectively2.
We then have a few ways we can import them into our R session. Let’s
load heddlr
and then walk through them:
The first and most straightforward method is to use
heddlr::import_pattern()
, which does more or less what
you’d expect and imports a single pattern into a single R object.
# These can be any sort of plaintext file -- I tend to save them as .Rmd,
# so that I can see code highlighting in R Studio with them, but any extension
# should work fine
setup_pattern <- import_pattern("setup_pattern.Rmd")
species_pattern <- import_pattern("species_pattern.Rmd")
This gives us objects that contain strings like this:
#> [1] "---\ntitle: \"My cool report!\"\nauthor: \"Captain heddlr\"\noutput: html_document\n---\n\n```{r setup}\nlibrary(dplyr)\nlibrary(ggplot2)\n```\n\n# Let's talk about irises!\n\n"
However, it can be helpful in reports with multiple patterns to store
everything in one object, just to have fewer things floating around your
top-level environment. heddlr
provides
heddlr::import_draft()
for this purpose, wrapping an
lapply
call which will return a single list object holding
all of your patterns:
iris_draft <- import_draft(
"setup_pattern" = "setup_pattern.Rmd",
"species_pattern" = "species_pattern.Rmd"
)
#> $setup_pattern
#> [1] "---\ntitle: \"My cool report!\"\nauthor: \"Captain heddlr\"\noutput: html_document\n---\n\n```{r setup}\nlibrary(dplyr)\nlibrary(ggplot2)\n```\n\n# Let's talk about irises!\n\n"
#>
#> $species_pattern
#> [1] "## Iris SPECIES_NAME\n\nThis species of flower is great! It has a mean sepal length of \n`r mean(iris[iris$Species == \"SPECIES_NAME\", \"Sepal.Length\"])`, and \nmean sepal width of `r mean(iris[iris$Species == \"SPECIES_NAME\", \"Sepal.Width\"])`. \nThat looks like this on a graph!\n\n```{r}\niris %>%\n filter(Species == \"SPECIES_NAME\") %>%\n ggplot(aes(Sepal.Length, Sepal.Width)) + \n geom_point()\n```\n"
Now that we’ve got our patterns into R, it’s time to start working
with them. To demonstrate how we do that with heddlr
, we
first need to load a few libraries:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
library(purrr)
Our first function that we’ll use to work with patterns is
heddlr::heddle()
. On the most basic level, this is the
function that will replace the placeholders in our patterns with our
data. If we have a vector containing the values we want to use for each
pattern, we’re able to use this function as follows and get a vector in
return:
# heddle takes three arguments: data, pattern, placeholder to replace
heddle(unique(iris$Species), "This is a pattern - CODE ", "CODE")
#> [1] "This is a pattern - setosa " "This is a pattern - versicolor "
#> [3] "This is a pattern - virginica "
In the context of our example document, this function call would look like this:
#> [1] "## Iris setosa\n\nThis species of flower is great! It has a mean sepal length of \n`r mean(iris[iris$Species == \"setosa\", \"Sepal.Length\"])`, and \nmean sepal width of `r mean(iris[iris$Species == \"setosa\", \"Sepal.Width\"])`. \nThat looks like this on a graph!\n\n```{r}\niris %>%\n filter(Species == \"setosa\") %>%\n ggplot(aes(Sepal.Length, Sepal.Width)) + \n geom_point()\n```\n"
It isn’t super important, but I think of the outputs from
heddle()
as being components of a larger
template, and will be using similar terminology from here on
out.
If we wanted to instead use heddle()
as part of a
magrittr pipeline, we can create new dataframe columns using
dplyr::mutate()
:
iris %>%
distinct(Species) %>%
# exact same pattern of arguments: data, pattern, placeholder to replace
mutate(component = heddle(Species, iris_draft$species_pattern, "SPECIES_NAME"))
#> Species
#> 1 setosa
#> 2 versicolor
#> 3 virginica
#> component
#> 1 ## Iris setosa\n\nThis species of flower is great! It has a mean sepal length of \n`r mean(iris[iris$Species == "setosa", "Sepal.Length"])`, and \nmean sepal width of `r mean(iris[iris$Species == "setosa", "Sepal.Width"])`. \nThat looks like this on a graph!\n\n```{r}\niris %>%\n filter(Species == "setosa") %>%\n ggplot(aes(Sepal.Length, Sepal.Width)) + \n geom_point()\n```\n
#> 2 ## Iris versicolor\n\nThis species of flower is great! It has a mean sepal length of \n`r mean(iris[iris$Species == "versicolor", "Sepal.Length"])`, and \nmean sepal width of `r mean(iris[iris$Species == "versicolor", "Sepal.Width"])`. \nThat looks like this on a graph!\n\n```{r}\niris %>%\n filter(Species == "versicolor") %>%\n ggplot(aes(Sepal.Length, Sepal.Width)) + \n geom_point()\n```\n
#> 3 ## Iris virginica\n\nThis species of flower is great! It has a mean sepal length of \n`r mean(iris[iris$Species == "virginica", "Sepal.Length"])`, and \nmean sepal width of `r mean(iris[iris$Species == "virginica", "Sepal.Width"])`. \nThat looks like this on a graph!\n\n```{r}\niris %>%\n filter(Species == "virginica") %>%\n ggplot(aes(Sepal.Length, Sepal.Width)) + \n geom_point()\n```\n
We can also use heddle()
as its own step in a pipeline.
When we pass heddle()
a dataframe like this, instead of a
vector, we have to specify which column we want to replace the
placeholder with:
iris %>%
distinct(Species) %>%
# the data argument is provided by %>%
# so we just provide the pattern and placeholder
# (format "PLACEHOLDER" = Variable)
heddle("This is a pattern - CODE ", "CODE" = Species)
#> [1] "This is a pattern - setosa " "This is a pattern - versicolor "
#> [3] "This is a pattern - virginica "
An advantage of this method is we can now replace multiple placeholders with our data in a single function call:
iris %>%
distinct(Species) %>%
heddle("This is a pattern - CODE ", "CODE" = Species, "This" = Species)
#> [1] "setosa is a pattern - setosa "
#> [2] "versicolor is a pattern - versicolor "
#> [3] "virginica is a pattern - virginica "
One last way we can use heddle()
is to use
purrr::map()
to apply it to nested columns made via
tidyr::nest()
. To do so, we just provide arguments in the
same way as above:
iris %>%
nest(nested = Species) %>%
mutate(component = map(nested,
heddle,
"This is a pattern - CODE ",
"CODE" = Species
)) %>%
head(2)
#> # A tibble: 2 × 6
#> Sepal.Length Sepal.Width Petal.Length Petal.Width nested component
#> <dbl> <dbl> <dbl> <dbl> <list> <list>
#> 1 5.1 3.5 1.4 0.2 <tibble [1 × 1]> <chr [1]>
#> 2 4.9 3 1.4 0.2 <tibble [1 × 1]> <chr [1]>
This is also the supported way to change multiple placeholder values
while saving a component as a column in a dataframe via mutate – nest
the columns you’re using to replace placeholders with and then use
purrr
to replace the placeholders in one step.
You’ll notice that the output of this method is another list column,
not the same string column we’re used to seeing as an output. Those
familiar with purrr:map()
might know that we can get a
character output from many dataframes with purrr::map_chr()
– however, this doesn’t work with dataframes where you’ll get more than
one output from the map()
call:
iris %>%
nest(nested = Species) %>%
mutate(component = map_chr(nested, heddle,
"This is a pattern - CODE ",
"CODE" = Species
))
# > Error: Result 102 must be a single string, not a character vector of length 2
Instead, we can turn to our next function,
heddlr::make_template()
. There are a few different ways
that we can use this function. For our current situation, for instance,
we can use purrr::map_chr()
to apply
make_template()
to our new component list column,
transforming it into the normal set of strings we’re used to:
iris %>%
nest(nested = Species) %>%
mutate(
component = map(nested, heddle, "This is a pattern - CODE ", "CODE" = Species),
component = map_chr(component, make_template)
) %>%
head(2)
#> # A tibble: 2 × 6
#> Sepal.Length Sepal.Width Petal.Length Petal.Width nested component
#> <dbl> <dbl> <dbl> <dbl> <list> <chr>
#> 1 5.1 3.5 1.4 0.2 <tibble [1 × 1]> "This is a…
#> 2 4.9 3 1.4 0.2 <tibble [1 × 1]> "This is a…
In addition to this, there are two other contexts we can use
make_template()
. If we pass it a dataframe and a vector, it
will collapse that vector into a single string:
iris %>%
nest(nested = Species) %>%
mutate(
component = map(nested, heddle,
"This is a pattern - CODE ",
"CODE" = Species
),
component = map_chr(component, make_template)
) %>%
head(2) %>%
make_template(component)
#> [1] "This is a pattern - setosa This is a pattern - setosa "
If instead the first argument we pass it is a vector, it will combine all the vectors you provide it into a single string:
In the context of our example report, these steps would look something like this:
species_template <- iris %>%
distinct(Species) %>%
mutate(component = heddle(
Species,
iris_draft$species_pattern,
"SPECIES_NAME"
)) %>%
make_template(component)
report_template <- make_template(iris_draft$setup_pattern, species_template)
I refer to these output strings as templates, which are the
second to last step in the heddlr
pipeline. However, in
order to actually create our reports, we need to export these templates
to .Rmd files. In order to do that, heddlr
provides a
helpful function, aptly named heddlr::export_template()
.
Normally, export_template
takes two arguments – the
template to write out, and the file to write it out to. Here, I’m going
to tell it to print to stdout()
instead – that will just
output here what our sample report would look like:
#> ---
#> title: "My cool report!"
#> author: "Captain heddlr"
#> output: html_document
#> ---
#>
#> ```{r setup}
#> library(dplyr)
#> library(ggplot2)
#> ```
#>
#> # Let's talk about irises!
#>
#> ## Iris setosa
#>
#> This species of flower is great! It has a mean sepal length of
#> `r mean(iris[iris$Species == "setosa", "Sepal.Length"])`, and
#> mean sepal width of `r mean(iris[iris$Species == "setosa", "Sepal.Width"])`.
#> That looks like this on a graph!
#>
#> ```{r}
#> iris %>%
#> filter(Species == "setosa") %>%
#> ggplot(aes(Sepal.Length, Sepal.Width)) +
#> geom_point()
#> ```
#> ## Iris versicolor
#>
#> This species of flower is great! It has a mean sepal length of
#> `r mean(iris[iris$Species == "versicolor", "Sepal.Length"])`, and
#> mean sepal width of `r mean(iris[iris$Species == "versicolor", "Sepal.Width"])`.
#> That looks like this on a graph!
#>
#> ```{r}
#> iris %>%
#> filter(Species == "versicolor") %>%
#> ggplot(aes(Sepal.Length, Sepal.Width)) +
#> geom_point()
#> ```
#> ## Iris virginica
#>
#> This species of flower is great! It has a mean sepal length of
#> `r mean(iris[iris$Species == "virginica", "Sepal.Length"])`, and
#> mean sepal width of `r mean(iris[iris$Species == "virginica", "Sepal.Width"])`.
#> That looks like this on a graph!
#>
#> ```{r}
#> iris %>%
#> filter(Species == "virginica") %>%
#> ggplot(aes(Sepal.Length, Sepal.Width)) +
#> geom_point()
#> ```
And there we have it! Repeating the above steps will always generate sections for each flower included in your dataset, whether or not Joe down the hall remembered to tell you about it.
To recap, the essential steps to make our report can be summed up as follows:
import_pattern()
or import_draft()
heddle()
make_template()
export_template()
I’ll usually save these steps into a “generator” file, which I can
then run every time I need to regenerate my report. I’ll also often
include a step in that file that calls rmarkdown::render()
on the generated report file, so that I can just run a single script in
order to remake and rebuild my entire report. Obviously, this can be
something of an overkill for reports as simple as our example here, but
it can be a huge time saver for more complex reports that have more
repeating sections, built more often, or are based on more dynamic data
sources that require maintenance to add or remove components between
builds. It also can help make your code DRYer3 so you can make edits
in a single place and see them replicated across your entire report.
If you want to see this process applied to a somewhat more complicated example, check out how we can make this dashboard via this article.
It’s a play on the loom component, since
we’re trying to automate the Sweave/knitr process and someone already
took the name loomr
. ↩︎
In order for the vignettes for this package to build
correctly, I haven’t actually saved these files off separately – if you
look at the source code for this vignette, you’ll notice I’m not
actually using the heddlr
functions, but rather using some
workarounds to get the same exact results. ↩︎