An R package is a bundled collection of R functions and data. While wider distribution on platforms such as CRAN requires meeting specific quality standards, anyone can create a package for personal use or share one through a GitHub repository. This highlights the collaborative and open-source nature of R development.
Creating and distributing an R package involves several steps. In this post, I give a high-level overview of the process and show how to distribute a package through a GitHub repository.
1. Set Up Your Package Structure
a. Install the devtools and roxygen2 Packages
Before you start creating your package, make sure you have the devtools and roxygen2 packages installed. devtools is an R package that provides a set of functions and tools to help with developing an R package. Created by Hadley Wickham, it simplifies many tasks, from building the package structure to documenting and distributing the package. While it is possible to create an R package using only base R functionality, devtools provides a more streamlined and modern approach to package development.
roxygen2 is another R package, designed for creating and maintaining documentation within R packages. It simplifies the process of documenting your code by allowing you to embed documentation directly within your R script files.
install.packages(c("devtools", "roxygen2"))
b. Create Your Package Directory
Use the devtools package to create a new directory for your package. The following R commands will create a directory with the necessary files and folders.
library(devtools)
create("path/to/YourPackageName")
Navigate to the package directory to start developing your R functions and documentation. Calling devtools::create("YourPackageName") creates the following folder and files:
- R/: Place your R scripts here.
- .gitignore: Specifies intentionally untracked files that Git should ignore. It is commonly used to exclude files generated during the development process and other non-essential files.
- .Rbuildignore: Lists files and directories that should be excluded when the package is built, in the same way that .gitignore excludes files from Git.
- DESCRIPTION: Package metadata, including its name, version, dependencies, and other information. It is central to package management in R.
- YourPackageName.Rproj: Used by RStudio to denote a project. It contains project-specific settings and configurations.
- NAMESPACE: Specifies the exports and imports of the package. It defines which functions and objects are accessible to users of the package.
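To make the layout concrete, here is a minimal sketch that builds a similar skeleton by hand with base R only. It is an illustration under assumptions, not what devtools does internally: the temporary location, the field values, and the exact ignore patterns are placeholders chosen for this example.

```r
# Hypothetical package name and location, used only for illustration.
pkg_dir <- file.path(tempdir(), "YourPackageName")

# R/ holds the package's R scripts.
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION is a DCF file made of "Field: value" pairs.
write.dcf(
  data.frame(
    Package = "YourPackageName",
    Title = "What the Package Does",
    Version = "0.0.0.9000",
    Description = "A longer description of the package.",
    stringsAsFactors = FALSE
  ),
  file = file.path(pkg_dir, "DESCRIPTION")
)

# Placeholder NAMESPACE; roxygen2 would normally regenerate this file.
writeLines(
  "# Generated by roxygen2: do not edit by hand",
  file.path(pkg_dir, "NAMESPACE")
)

# Build-time exclusions are regular expressions matched against file paths.
writeLines(
  c("^.*\\.Rproj$", "^\\.Rproj\\.user$"),
  file.path(pkg_dir, ".Rbuildignore")
)

# List what was created, including hidden files.
skeleton_files <- list.files(pkg_dir, all.files = TRUE, no.. = TRUE)
print(skeleton_files)
```

In day-to-day work, devtools::create() remains the more convenient route; the sketch only shows that the skeleton is a handful of plain files.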
2. Create R Files
The next step is to create your R scripts in the R/ directory of your package. Unlike Python, R doesn’t have a single, universally agreed-upon naming convention for functions and objects. However, general principles that apply to any programming language also apply to R:
- Choose consistency: Whichever style you choose, stick with it throughout your code for better readability and maintainability.
- Clarity matters: Use names that are descriptive and accurately reflect the object’s purpose and role.
- Avoid ambiguity: Do not reuse names for different objects, and choose names that don’t easily create confusion.
- Keep it concise: Long, convoluted names can be cumbersome and hinder readability. Aim for names that are informative without being excessive.
Additionally, if you’re working with existing R packages, it can be a good practice to match their naming conventions for better integration. For example, if you are working extensively with the Tidyverse packages developed by Hadley Wickham, you can consider adopting his style guide:
- Function naming: Use verbs for function names (e.g., filter, select, mutate).
- Variable naming: Use nouns for variable names, written in lowercase with underscores separating words (snake_case).
- Assignment: Use <- for assignment rather than =.
- Spacing: Use spaces around binary operators (with a few exceptions such as :) and after commas.
- Indentation: Use two spaces per indentation level.
- Curly Braces: Put opening curly braces on the same line as the function or control statement and place closing curly braces on their own line.
- Line Length: Limit lines to 80 characters.
- Comments: Keep comments on a separate line, starting with # and a single space.
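As a small illustration of these conventions, the sketch below defines a helper in the snake_case naming that the tidyverse style guide recommends. The function name rescale_values and its arguments are hypothetical, invented for this example.

```r
# Hypothetical helper: a verb for the function name, nouns for variables.
rescale_values <- function(values, lower = 0, upper = 1) {
  # Guard against a constant vector, which would divide by zero below.
  value_range <- max(values) - min(values)
  if (value_range == 0) {
    return(rep(lower, length(values)))
  }

  # Map the smallest value to `lower` and the largest to `upper`.
  scaled <- (values - min(values)) / value_range
  lower + scaled * (upper - lower)
}

rescaled <- rescale_values(c(2, 4, 6))
print(rescaled)
```

Note the spaces around <-, -, and /, the two-space indentation, and the comments on their own lines.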
a. Document Your Code
While writing R files and functions, you can document your code using the roxygen2 package. The documentation is written using a special syntax in comments, and roxygen2 processes these comments to generate standard documentation files.
In a typical R package setup, it is common to have multiple R files. You may place the associated Roxygen comment blocks for each file and/or function. Here’s an example structure with four R files:
package.R: This file is dedicated to package-level metadata.
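# The first line of the block is the title and the next paragraph is the
# description, so separate @title and @description tags are not needed.
# Recent roxygen2 releases recommend documenting the "_PACKAGE" sentinel
# instead of NULL with @docType package.
#' My R Package
#'
#' Description of your package.
#'
#' @docType package
#' @name mypackage
#' @author Your Name
#' @importFrom dplyr mutate filter
#' @keywords package
NULL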
function.R: This file contains functions with their individual Roxygen documentation.
#' Function 1
#'
#' Description of function 1.
#'
#' @param x A numeric vector.
#' @return The mean of x.
#' @export
my_function1 <- function(x) {
  mean(x)
}

#' Function 2
#'
#' Description of function 2.
#'
#' @param y A character vector.
#' @return The result of some operation.
#' @export
my_function2 <- function(y) {
  # function code here
}
data.R: This file contains data-related code.
#' Sample dataset
#'
#' A data frame containing some sample data.
#'
#' @name sample_data
#' @export
sample_data <- data.frame(
  x = 1:10,
  y = rnorm(10)
)
utils.R: This file contains utility functions.
#' Utility function
#'
#' Description of the utility function.
#'
#' @param a Some parameter.
#' @return Some result.
#' @export
utility_function <- function(a) {
  # function code here
}
This is just one way to organize your files. The key is to have a dedicated file (such as package.R) for package-level metadata, and then use other files for organizing your functions, data, and utility code.
b. Unit Testing
In R, the testthat package provides a framework for writing and running unit tests for your functions. For example, suppose that you have two R files named func_numerical_deriv.R and test_numerical_deriv.R in the package folder, with the following contents:
func_numerical_deriv.R:
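# Approximate the derivative of fn at x with a central difference.
# Named deriv to match the calls in the test file.
deriv <- function(fn, x) {
  eps <- .Machine$double.eps
  # Step size: scaled to x, with a fixed fallback at x = 0.
  if (x == 0) {
    h <- 2 * eps
  } else {
    h <- sqrt(eps) * x
  }
  (fn(x + h) - fn(x - h)) / (2 * h)
}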
test_numerical_deriv.R:
library(testthat)
context("check numerical derivative function")
source("func_numerical_deriv.R")
test_that("Derivatives match on simple functions", {
  expect_equal(deriv(function(x) x^2, 1), 2)
  expect_equal(deriv(function(x) 2*x, -5), 2)
  expect_equal(deriv(function(x) x^2, 0), 0)
  expect_equal(deriv(function(x) 2*x, 0), 2)
  expect_equal(deriv(function(x) exp(x), 0), exp(0))
})
## Test passed 😀
test_that("Error thrown when derivative doesn't exist", {
  expect_error(deriv(function(x) log(x), 0))
})
## ── Warning: Error thrown when derivative doesn't exist ─────────────────────────
## NaNs produced
## Backtrace:
## ▆
## 1. ├─testthat::expect_error(deriv(function(x) log(x), 0))
## 2. │ └─testthat:::quasi_capture(...)
## 3. │ ├─testthat (local) .capture(...)
## 4. │ │ └─base::withCallingHandlers(...)
## 5. │ └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo))
## 6. └─global deriv(function(x) log(x), 0)
## 7. └─fn(x - h)
##
## ── Failure: Error thrown when derivative doesn't exist ─────────────────────────
## `deriv(function(x) log(x), 0)` did not throw an error.
## Error:
## ! Test failed
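The second test fails because evaluating log() at a negative number does not stop execution: R returns NaN and signals a warning, so expect_error has nothing to catch. The base R sketch below reproduces this outside of testthat. It restates the function from func_numerical_deriv.R under the name the tests call; the variables signal and result are introduced here for illustration only.

```r
# Central-difference derivative, restated from func_numerical_deriv.R.
deriv <- function(fn, x) {
  eps <- .Machine$double.eps
  h <- if (x == 0) 2 * eps else sqrt(eps) * x
  (fn(x + h) - fn(x - h)) / (2 * h)
}

# Record which kind of condition, if any, the call signals first.
signal <- tryCatch(
  {
    deriv(function(x) log(x), 0)
    "none"
  },
  warning = function(w) "warning",
  error = function(e) "error"
)
print(signal)

# With the warning silenced, the call returns NaN instead of stopping.
result <- suppressWarnings(deriv(function(x) log(x), 0))
print(is.nan(result))
```

One way to make the test pass would be to have the function call stop() when the result is not finite; another would be to test with expect_warning instead. Which one is appropriate depends on how you want the function to behave.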
The three key components of the testthat package are expectations, tests, and context/files. The expectation is the smallest unit of testing. It checks one aspect of a function’s output at a time. That is, it tests whether a call to a function does what you expect for a test case. In testthat, expectations start with expect_ and take two arguments: the actual result from the test case and your expectation for the case. Some of the most useful expectations are:
- expect_equal / expect_identical: Check for equality within numerical precision or exact equivalence.[1]
- expect_match: Checks whether a string matches a regular expression.
- expect_output: Checks the output of a function the same way expect_match would.
- expect_warning / expect_error: Checks whether the function gives an error or warning when it should.
- expect_is: Checks whether the function gives a result of the correct class.
- expect_true / expect_false: Catch-alls for cases the other expectations don’t cover.
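The difference between expect_equal and expect_identical mirrors the difference between all.equal() and identical() in base R. The sketch below uses only base functions; treat it as an analogy rather than a description of testthat's internals, which vary between testthat editions.

```r
# Floating-point arithmetic leaves a tiny rounding error here.
x <- sqrt(2)^2

# Exact comparison fails, tolerant comparison succeeds.
exact_match <- identical(x, 2)
close_match <- isTRUE(all.equal(x, 2))

# identical() also distinguishes integer from double storage.
same_type <- identical(1L, 1)
same_value <- isTRUE(all.equal(1L, 1))

print(c(
  exact_match = exact_match,
  close_match = close_match,
  same_type = same_type,
  same_value = same_value
))
```

For numeric results from computations, a tolerant comparison is usually the safer default; exact comparison is useful when the type itself is part of what you want to test.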
3. Prepare and Review Your Documents
At this point, you have finished writing your R files with Roxygen comments. To generate the NAMESPACE file and the help files under man/, run devtools::document() in your R session; roxygen2 will process the comments and prepare the documentation files. The DESCRIPTION file was already created along with the package directory, so you fill it in by editing it directly. You should review both DESCRIPTION and NAMESPACE before distributing your package.
The DESCRIPTION file contains metadata and information about your package. It helps users and other developers understand key details about the package. Here is an example DESCRIPTION file:
Package: MyPackage
Type: Package
Title: A Sample R Package
Version: 0.1.0
Date: 2023-01-01
Authors@R: c(
    person("John Doe", email = "john.doe@example.com", role = c("aut", "cre")),
    person("Jane Smith", email = "jane.smith@example.com", role = "aut")
    )
Maintainer: John Doe <john.doe@example.com>
Description: This is a sample R package for demonstration purposes.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
URL: https://github.com/johndoe/MyPackage
BugReports: https://github.com/johndoe/MyPackage/issues
Depends: R (>= 3.5.0)
Imports: dplyr, ggplot2
Suggests: testthat
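Because DESCRIPTION is a plain DCF file, you can inspect it from R with read.dcf(). The sketch below writes a deliberately incomplete file to a temporary path and reports which fields are absent. The list in `required` is an illustrative checklist chosen for this example, not an official or exhaustive one.

```r
# Write a deliberately incomplete DESCRIPTION to a temporary file.
desc_path <- tempfile("DESCRIPTION")
writeLines(
  c(
    "Package: MyPackage",
    "Version: 0.1.0",
    "Depends: R (>= 3.5.0)",
    "Imports: dplyr, ggplot2",
    "Suggests: testthat"
  ),
  desc_path
)

# read.dcf() returns a character matrix with one column per field.
fields <- read.dcf(desc_path)

# Illustrative checklist of fields to look for.
required <- c("Package", "Version", "Title", "Description", "License")
missing_fields <- setdiff(required, colnames(fields))
print(missing_fields)

# Split the comma-separated Imports field into package names.
imports <- strsplit(fields[1, "Imports"], ",\\s*")[[1]]
print(imports)
```

A check like this is only a quick sanity pass; devtools::check() performs the full set of package checks.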
Before distributing your package, review the DESCRIPTION file and, if necessary, update its fields by editing the file directly.
The NAMESPACE file is used to define your package’s namespace, which controls the visibility and accessibility of functions, objects, and other elements. It specifies which functions and objects are exported (visible to users) and which are kept internal to the package. Here is an example NAMESPACE file:
# Generated by roxygen2: do not edit by hand
export(my_function1, my_function2)
export(sample_data)
exportPattern("^plot_")
importFrom(dplyr, mutate, filter)
importFrom(stats, median, sd)
S3method(print, my_class)
- export: These lines specify which functions and datasets are exported and will be visible to users when they load the package. In this example, my_function1, my_function2, and sample_data are exported. The exportPattern line additionally exports every object whose name starts with “plot_”.
- importFrom: These lines specify which functions are imported from other packages. In this case, mutate and filter are imported from the dplyr package, and median and sd are imported from the stats package. Functions from the base package, such as mean, are always available and need no import.
- S3method: This line specifies that a particular S3 method[2] (print for the S3 class my_class) is part of the package. This is relevant for packages that define S3 methods for generic functions.
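To show what an S3method(print, my_class) entry refers to, here is a minimal base R sketch of S3 dispatch. The constructor new_my_class and the value field are hypothetical names invented for this example.

```r
# Hypothetical constructor: an S3 object is a value with a class attribute.
new_my_class <- function(value) {
  structure(list(value = value), class = "my_class")
}

# The method is a function named <generic>.<class>.
print.my_class <- function(x, ...) {
  cat("<my_class> value: ", x$value, "\n", sep = "")
  invisible(x)
}

obj <- new_my_class(42)

# print() dispatches on the class attribute and calls print.my_class().
printed <- capture.output(print(obj))
print(printed)
```

Inside a package, registering the method in NAMESPACE lets print() find it for users of the package even though print.my_class itself is not exported.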
Please note that in practice you won’t need to edit the NAMESPACE file by hand. Instead, you typically use roxygen2 during package development, and it generates or updates NAMESPACE automatically based on the Roxygen comments in your source code. DESCRIPTION, in contrast, is mostly maintained by hand; roxygen2 only touches a few of its fields, such as RoxygenNote. The examples above are just to illustrate the syntax and content of the files.
4. Distribute Your Package
Finally, you are ready to distribute your package. The easiest way to do it is through GitHub. Once you have pushed your files to your repository, users can install your package directly from R using the devtools package:
devtools::install_github("yourusername/YourPackageName")
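install_github() skips the installation when the repository has not changed since the last install. To reinstall the package anyway, a user can add the force = TRUE argument to the command: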
devtools::install_github("yourusername/YourPackageName", force=TRUE)
[1] expect_identical is built upon the identical() function, which also checks for data type.
[2] In R, S3 methods (named after version 3 of the S language, in which the system was introduced) refer to a type of object-oriented programming used for defining and dispatching generic functions. S3 classes are typically used for simple and informal OOP, as the system is lightweight and easy to use. Besides S3, R also has S4, which is suited for complex systems that need formal class definitions, multiple dispatch, and formally declared inheritance.