How to make an R package

 

An R package is a bundled collection of R functions and data. While wider distribution on platforms such as CRAN requires meeting specific quality standards, anyone can create a package for personal use or share one through a GitHub repository. This reflects the collaborative, open-source nature of R development.

Creating and distributing an R package involves several steps. In this post, I give a high-level overview of the process and show how to distribute a package through a GitHub repository.



1. Set Up Your Package Structure

a. Install the devtools and roxygen2 Packages

Before you start creating your package, make sure you have the devtools and roxygen2 packages installed. The devtools package provides a set of functions and tools that help you develop R packages. Created by Hadley Wickham, it simplifies many tasks, from building the package structure to documenting and distributing the package. While it is possible to create an R package with base R tools alone, devtools provides a more streamlined and modern approach to package development.

The roxygen2 package is designed for creating and maintaining documentation within R packages. It simplifies the process of documenting your code by allowing you to embed documentation directly within your R script files.

install.packages(c("devtools", "roxygen2"))


b. Create Your Package Directory

Use the devtools package to create a new directory for your package. The following R commands create a directory with the necessary files and folders.

library(devtools)
create("path/to/YourPackageName")


Next, navigate to the package directory and start developing your R functions and documentation. Calling devtools::create("path/to/YourPackageName") creates the following folder and files:

  • R/: Place your R scripts here.
  • .gitignore: This file is used to specify intentionally untracked files that Git should ignore. It’s commonly used to exclude files generated during the development process and other non-essential files.
  • .Rbuildignore: This file lists files and directories that should be excluded when building the package. Similar to .gitignore, it helps in specifying files that should not be included in the package when it is built.
  • DESCRIPTION: Package metadata, including its name, version, dependencies, and other information. It is very important for package management in R.
  • YourPackageName.Rproj: It is used by RStudio to denote a project. It contains project-specific settings and configurations.
  • NAMESPACE: The NAMESPACE file is used to specify the exports and imports of the package. It defines which functions and objects are accessible to users of the package.
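
As a quick sanity check, the entries listed above can be verified with base R. This is only a sketch: the helper name missing_entries and the throwaway DemoPackage directory are made up for illustration, and the list of expected entries simply mirrors the bullets above (the .Rproj file is left out because its name depends on the package).

```r
# Entries that devtools::create() is described as producing (see the list above).
expected <- c("R", "DESCRIPTION", "NAMESPACE", ".gitignore", ".Rbuildignore")

# Hypothetical helper: return the expected entries that are missing under `path`.
missing_entries <- function(path, entries = expected) {
  entries[!file.exists(file.path(path, entries))]
}

# Demonstration on a throwaway directory that only contains R/ and DESCRIPTION.
demo_dir <- file.path(tempdir(), "DemoPackage")
dir.create(file.path(demo_dir, "R"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, "DESCRIPTION"))
missing_entries(demo_dir)
```

Pointing missing_entries() at a directory created by devtools::create() should return an empty character vector if the skeleton is complete.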



2. Create R Files

The next step is to create your R scripts in the R/ directory of your package. When naming functions and objects, keep in mind that, unlike Python, R doesn’t have a single, universally agreed-upon naming convention. However, general principles that hold for any programming language also apply to R:

  • Choose consistency: Whichever style you choose, stick with it throughout your code for better readability and maintainability.
  • Clarity matters: Use names that are descriptive and accurately reflect the object’s purpose and role.
  • Avoid ambiguity: Do not reuse names for different objects, and choose names that don’t easily create confusion.
  • Keep it concise: Long, convoluted names can be cumbersome and hinder readability. Aim for names that are informative without being excessive.

Additionally, if you’re working with existing R packages, it can be a good practice to match their naming conventions for better integration. For example, if you are working extensively with the Tidyverse packages developed by Hadley Wickham, you can consider adopting his style guide:

  1. Function naming: Use verbs for function names (e.g., filter, select, mutate).
  2. Variable naming: Use nouns for variable names, in lowercase, with underscores separating words (snake_case).
  3. Assignment: Use <- for assignment rather than =.
  4. Spacing: Use spaces around all binary operators (except :) and after commas.
  5. Indentation: Use two spaces per indentation level.
  6. Curly Braces: Put opening curly braces on the same line as the function or control statement and place closing curly braces on their own line.
  7. Line Length: Limit lines to 80 characters.
  8. Comments: Keep comments on a separate line, starting with #.
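
To make these points concrete, here is a small sketch of a function written in that style. The function rescale and its arguments are hypothetical and exist only to illustrate the conventions; they are not part of any package.

```r
# Hypothetical helper illustrating the style points above:
# a verb for the function name, nouns for variables, `<-` for assignment,
# two-space indentation, and spaces around operators and after commas.
rescale <- function(values, lower = 0, upper = 1) {
  span <- range(values, na.rm = TRUE)
  # Map the smallest value to `lower` and the largest to `upper`.
  scaled <- (values - span[1]) / (span[2] - span[1])
  lower + scaled * (upper - lower)
}

rescale(c(2, 4, 6))
# 0.0 0.5 1.0
```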


a. Document Your Code

While writing R files and functions, you can document your code using the roxygen2 package. The documentation is written in specially formatted comments (lines starting with #'), and roxygen2 processes these comments to generate standard documentation files.

In a typical R package setup, it is common to have multiple R files. Each function, dataset, or package-level entry gets its own Roxygen comment block placed directly above it. Here’s an example structure with four R files:

package.R: This file is dedicated to package-level metadata.

# The "_PACKAGE" sentinel tells roxygen2 to document the package itself.
# The first line is the title and the next paragraph is the description.
# No @export here: there is no object to export.
#' My R Package
#'
#' Description of your package.
#'
#' @author Your Name
#' @importFrom dplyr mutate filter
#' @keywords internal
"_PACKAGE"


function.R: This file contains functions with their individual Roxygen documentation.

#' Function 1
#'
#' Description of function 1.
#'
#' @param x A numeric vector.
#' @return The mean of x.
#' @export
my_function1 <- function(x) {
  mean(x)
}

#' Function 2
#'
#' Description of function 2.
#'
#' @param y A character vector.
#' @return The result of some operation.
#' @export
my_function2 <- function(y) {
  # function code here
}


data.R: This file contains data-related code.

#' Sample dataset
#'
#' A data frame containing some sample data.
#'
#' @name sample_data
#' @export
sample_data <- data.frame(
  x = 1:10,
  y = rnorm(10)
)


utils.R: This file contains utility functions.

#' Utility function
#'
#' Description of the utility function.
#'
#' @param a Some parameter.
#' @return Some result.
#' @export
utility_function <- function(a) {
  # function code here
}


This is just one way to organize your files. The key is to have a dedicated file (such as package.R) for package-level metadata, and then use other files for organizing your functions, data, and utility code.
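
Not every helper has to be exported. As a sketch, assuming you want a utility to stay internal, you can leave out @export and add roxygen2's @noRd tag so that no help page is written for it. The function clamp below is hypothetical and only serves as an illustration.

```r
#' Clamp values to a range
#'
#' Hypothetical internal helper: there is no @export tag, so it stays out of
#' NAMESPACE, and @noRd tells roxygen2 not to write a help page for it.
#'
#' @param x A numeric vector.
#' @param lower,upper Lower and upper bounds.
#' @return `x` with values outside the range replaced by the nearest bound.
#' @noRd
clamp <- function(x, lower, upper) {
  pmin(pmax(x, lower), upper)
}

clamp(c(-1, 0.5, 2), lower = 0, upper = 1)
# 0.0 0.5 1.0
```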


b. Unit Testing

In R, the testthat package provides a framework for writing and running unit tests for your functions. For example, suppose that you have two R files named func_numerical_deriv.R and test_numerical_deriv.R in the package folder, with the following contents:

func_numerical_deriv.R:

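# Central-difference estimate of the derivative of fn at x.
# Note: when sourced into the global environment, this masks stats::deriv().
deriv <- function(fn, x) {
  eps <- .Machine$double.eps
  # Step size: scaled to x, with a tiny fixed step at x = 0.
  if (x == 0) {
    h <- 2 * eps
  } else {
    h <- sqrt(eps) * x
  }

  (fn(x + h) - fn(x - h)) / (2 * h)
}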


test_numerical_deriv.R:

library(testthat)
## Warning: package 'testthat' was built under R version 4.3.2
context("check numerical derivative function")
source("func_numerical_deriv.R")

test_that("Derivatives match on simple functions", {
  expect_equal(deriv(function(x) x^2, 1), 2)
  expect_equal(deriv(function(x) 2*x, -5), 2)
  expect_equal(deriv(function(x) x^2, 0), 0)
  expect_equal(deriv(function(x) 2*x, 0), 2)
  expect_equal(deriv(function(x) exp(x), 0), exp(0))
})
## Test passed 😀
test_that("Error thrown when derivative doesn't exist", {
  expect_error(deriv(function(x) log(x), 0))
})
## ── Warning: Error thrown when derivative doesn't exist ─────────────────────────
## NaNs produced
## Backtrace:
##     ▆
##  1. ├─testthat::expect_error(deriv(function(x) log(x), 0))
##  2. │ └─testthat:::quasi_capture(...)
##  3. │   ├─testthat (local) .capture(...)
##  4. │   │ └─base::withCallingHandlers(...)
##  5. │   └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo))
##  6. └─global deriv(function(x) log(x), 0)
##  7.   └─fn(x - h)
## 
## ── Failure: Error thrown when derivative doesn't exist ─────────────────────────
## `deriv(function(x) log(x), 0)` did not throw an error.
## Error:
## ! Test failed


The three key components of the testthat package are expectations, tests, and contexts/files. An expectation is the smallest unit of testing. It checks one aspect of a function’s output at a time; that is, it tests whether a call to a function does what you expect for a given test case. In testthat, expectations start with expect_ and typically take two arguments: the actual result from the test case and the value you expect. In the output above, the second test fails because log() of a negative number returns NaN with a warning rather than throwing an error, so expect_error() has nothing to catch. Some of the most useful expectations are:

  • expect_equal/expect_identical: Check for equality within numerical precision or exact equivalence.¹
  • expect_match: Checks whether a string matches a regular expression.
  • expect_output: Checks the output of a function the same way expect_match would.
  • expect_warning/expect_error: Checks whether the function gives an error or warning when it should.
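  • expect_is: Checks whether the function gives a result of the correct class. In the third edition of testthat, expect_is is deprecated in favor of expect_s3_class, expect_s4_class, and expect_type.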
  • expect_true/expect_false: Catch-alls for cases the other expectations don’t cover.
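
The sketch below combines a few of these expectations and shows one possible way to make the failing expect_error() test from the output above pass. It assumes that a non-finite estimate should be treated as an error; safe_deriv is a hypothetical variant written for this illustration, not a function from testthat or from the files above.

```r
library(testthat)

# Hypothetical variant of the derivative helper: it stops when the estimate
# is not finite, so that expect_error() has an actual error to catch.
safe_deriv <- function(fn, x) {
  eps <- .Machine$double.eps
  h <- if (x == 0) 2 * eps else sqrt(eps) * x
  # log() of a negative number only warns; silence that and check the result.
  estimate <- suppressWarnings((fn(x + h) - fn(x - h)) / (2 * h))
  if (!is.finite(estimate)) {
    stop("derivative is not finite at x = ", x)
  }
  estimate
}

test_that("safe_deriv works on simple functions", {
  expect_equal(safe_deriv(function(x) x^2, 1), 2)
  expect_true(is.numeric(safe_deriv(sin, 0)))
  expect_error(safe_deriv(log, 0), "not finite")
})
```

Whether a NaN result should be an error or a returned value is a design decision; the point here is only that expect_error() passes when the function actually calls stop().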



3. Prepare and Review Your Documents

At this point, you should have finished writing your R files with Roxygen comments. To generate the NAMESPACE file and the help files under man/, run devtools::document() in your R session; roxygen2 will process the comments and write the files. The DESCRIPTION file was already created by devtools::create() and is filled in by hand. You should review both DESCRIPTION and NAMESPACE before distributing your package.
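
A typical development cycle can be sketched as a small wrapper around devtools. The wrapper name run_dev_cycle is made up for illustration; the calls inside it are regular devtools functions, and nothing runs until the wrapper is called on a real package directory with devtools installed.

```r
# Hypothetical wrapper around a common devtools cycle.
# `pkg` is the path to the package root (assumed to exist when called).
run_dev_cycle <- function(pkg = ".") {
  devtools::document(pkg)  # regenerate NAMESPACE and man/*.Rd from Roxygen comments
  devtools::load_all(pkg)  # load the package code into the current session
  devtools::check(pkg)     # run R CMD check on the package
}

# Example call (path is a placeholder):
# run_dev_cycle("path/to/YourPackageName")
```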

The DESCRIPTION file contains metadata and information about your package. It helps users and other developers understand key details about the package. Here is an example DESCRIPTION file:

Package: MyPackage
Type: Package
Title: A Sample R Package
Version: 0.1.0
Date: 2023-01-01
Authors@R: c(
    person("John", "Doe", email = "john.doe@example.com", role = c("aut", "cre")),
    person("Jane", "Smith", email = "jane.smith@example.com", role = "aut")
    )
Maintainer: John Doe <john.doe@example.com>
Description: This is a sample R package for demonstration purposes.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
URL: https://github.com/johndoe/MyPackage
BugReports: https://github.com/johndoe/MyPackage/issues
Depends: R (>= 3.5.0)
Imports: dplyr, ggplot2
Suggests: testthat


Before distributing your package, review the DESCRIPTION file and, if necessary, edit it directly; apart from a few bookkeeping fields such as RoxygenNote, it is not generated from the Roxygen comments in your R files.
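
One way to review the fields programmatically is base R’s read.dcf(), which parses the "Field: value" format that DESCRIPTION uses. The sketch below writes a tiny, made-up DESCRIPTION to a temporary file purely for illustration; in practice you would point read.dcf() at your package’s own file.

```r
# Write a tiny, made-up DESCRIPTION to a temporary file for illustration.
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: MyPackage",
  "Title: A Sample R Package",
  "Version: 0.1.0",
  "Imports: dplyr, ggplot2",
  "Suggests: testthat"
), desc_path)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(desc_path)
desc[1, "Version"]

# Split the comma-separated Imports field into individual package names.
imports <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])
imports
```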

The NAMESPACE file is used to define your package’s namespace, which controls the visibility and accessibility of functions, objects, and other elements. It specifies which functions and objects are exported (visible to users) and which are kept internal to the package. Here is an example NAMESPACE file:

# Generated by roxygen2: do not edit by hand

export(my_function1, my_function2)
export(sample_data)
exportPattern("^plot_")

importFrom(dplyr, mutate, filter)
importFrom(stats, sd)

S3method(print, my_class)


  • export: These lines specify which functions and datasets are exported and will be visible to users when they load the package. In this example, my_function1, my_function2, and sample_data are exported. The exportPattern line additionally exports every object whose name matches the regular expression, here any name starting with “plot_”.
  • importFrom: These lines specify which functions are imported from other packages. In this case, mutate and filter are imported from the dplyr package, and sd is imported from the stats package. Functions in the base package, such as mean, are always available and need no import.
  • S3method: This line specifies that a particular S3 method² (print for the S3 class my_class) is part of the package. This is relevant for packages that define S3 methods for generic functions.
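
To see what exporting means in practice, you can inspect the namespace of a package that ships with R. The sketch below uses the stats package and base R functions only; the variable names are arbitrary.

```r
# Names that the stats package exports (reachable via stats:: or after library(stats)).
exported <- getNamespaceExports("stats")
"sd" %in% exported

# Everything defined inside the namespace, exported or not.
all_objects <- ls(asNamespace("stats"), all.names = TRUE)

# Objects that exist in the namespace but are kept internal.
internal <- setdiff(all_objects, exported)
length(internal) > 0
```

The same calls work on your own package once it is installed, which makes them a handy check that NAMESPACE exports exactly what you intended.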

Please note that in practice, you won’t need to edit the NAMESPACE file by hand. Instead, roxygen2 generates and updates it from the Roxygen comments in your source code whenever you run devtools::document(). The DESCRIPTION file, by contrast, is maintained by hand, with roxygen2 touching only a few fields. The examples above are just to illustrate the syntax and content of the files.



4. Distribute Your Package

Finally, you are ready to distribute your package. The easiest way to do so is through GitHub. Once you have pushed your files to your repository, users can install your package directly from R using the devtools package:

devtools::install_github("yourusername/YourPackageName")


Re-running the command installs the latest version when the repository has changed. To force a reinstall even when it has not, a user can add the force = TRUE argument:

devtools::install_github("yourusername/YourPackageName", force=TRUE)
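
After installing, a user may want to confirm which version ended up in their library. A minimal sketch using base R follows; the helper name installed_version is hypothetical, and "YourPackageName" is the same placeholder used above.

```r
# Hypothetical helper: return the installed version of `pkg` as a string,
# or NA if the package is not installed.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

installed_version("YourPackageName")
```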

  1. expect_identical is built upon the identical() function, which also checks for data type.

  2. In R, S3 refers to a type of object-oriented programming used for defining and dispatching generic functions; the name comes from version 3 of the S language, not from an acronym. S3 classes are typically used for simple and informal OOP, as the system is lightweight and easy to use. Other than S3, R also has S4, which is suited for complex systems that need formal class definitions, multiple dispatch, and inheritance.
