Macro: SAS Macro Basics

A macro is essentially a rule or pattern that tells the program to replace a certain input with a predefined sequence of code. While less common today[1], macros are programming constructs that let you avoid duplicating the same code over and over, and thereby improve the maintainability of your program: if you need to modify a piece of code used in multiple places, you only need to update the macro definition.

SAS is a great example of where macros come into their own for automating tasks and improving code. The SAS macro facility, consisting of the macro language and macro processor, extends the capabilities of SAS by enabling you to:

  • Pass values throughout your SAS program.
  • Dynamically generate code at runtime.
  • Conditionally execute DATA or PROC steps.
  • Create generalized and flexible code.

The SAS macro language defines macro variables (to store and call text literals) and macros (to save and call specific actions). These elements are "resolved" by the SAS macro processor, which substitutes each macro reference with its stored text before your SAS program is compiled.

It is very important to keep in mind that macro references (% or &) are resolved prior to the compilation of SAS steps, as described in the diagram above. Indeed, this diagram can serve as a useful resource in addressing frequently raised questions in SAS macro programming, such as:

  • Can macro %IF statements be used interchangeably with DATA step IF statements?
  • Why can't I assign a DATA step variable value to a macro variable using the %LET statement?
  • Why can't I use a DATA step IF to conditionally execute a %LET statement?
  • Why do data set variables not have values when using them in %IF statements?

One general answer to all four questions is that macro references are resolved before the program data vector (PDV) is populated. In other words, at the time a macro reference is resolved, DATA step variables do not yet have values.
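
The timing difference can be seen in a small sketch. It assumes the SASHELP.CLASS sample data set is available and uses CALL SYMPUTX, a DATA step routine that runs at execution time, as one common way to copy a data value into a macro variable. The names from_let and from_symputx are made up for this illustration.

```sas
DATA _NULL_;
  SET SASHELP.CLASS (OBS=1);
  /* Processed while the step is still being compiled:
     stores the literal text Age, not a value from the data */
  %LET from_let = Age;
  /* Runs during execution, when Age already holds a value in the PDV */
  CALL SYMPUTX('from_symputx', Age);
RUN;

%PUT from_let: &from_let;
%PUT from_symputx: &from_symputx;
```

In the log, from_let shows the text Age, while from_symputx shows the age stored in the first observation.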

Defining and Using Macro Variables

Macro variables, also known as symbolic variables, are very powerful tools all by themselves. Even if you know nothing else about the macro language other than how to define and call macro variables properly, you can accomplish a great deal and confidently say that you are proficient in SAS macro programming at your job interview.

You can define a macro variable using a %LET statement:

%LET macro-variable-name = text-value;

The naming rules for the macro variables are as follows:

  • A name can be up to 32 characters long.
  • A name must begin with a letter or underscore (_).
  • Subsequent characters can consist of letters, numbers, and underscores in any combination.

After the keyword %LET, specify your chosen macro variable name, an equal sign (=), and the text literal to be stored in the variable. The text literal can be up to 65,534 characters in length. When naming a macro variable, it is important to choose a name that does not conflict with existing functionality in the SAS macro language. In particular, SAS reserves names beginning with "SYS" for automatic macro variables. Using such names for your own macro variables can lead to unintended behavior or errors, so it is advisable to avoid starting your macro variable names with "SYS" altogether.

Now, let's see how macro variables are defined and used in action! Here's an example SAS program:

/* Import data: MyData.WineQuality */
DATA MyData.WineQuality;
INFILE '/home/u63368964/source/wine-quality.csv' DLM=',' FIRSTOBS=2;
INPUT Type $ FixedAcid VolatileAcid CitricAcid ResidualSugar Chlorides FreeSulfurDioxide TotalSulfurDioxide Density pH Sulphates Alcohol Quality;
Type = PROPCASE(Type);
RUN;

/* Defining and using macro variables */
%LET dsn = WineQuality;
%LET nobs = 20;

PROC PRINT DATA = &dsn (OBS = &nobs);
TITLE1 "First &nobs Rows of &dsn Data";
TITLE2 'First &nobs Rows of &dsn Data';
RUN;

In this example, two macro variables, dsn and nobs, are defined by the %LET statements. When this program is submitted for execution, SAS first resolves the macro references, replacing &dsn with WineQuality and &nobs with 20. Observe that the text literals in the %LET statements are not enclosed in quotation marks. If they were, the quotation marks would become part of the stored values and appear in the resolved text, which is generally not what you want. Also note that when a macro variable should be resolved inside a quoted string, that string must be enclosed in double quotes ("), not single quotes ('). Otherwise, the macro variable will not be resolved: TITLE1 above prints First 20 Rows of WineQuality Data, whereas TITLE2 prints the text First &nobs Rows of &dsn Data unchanged.

In this example SAS program, the PROC PRINT is generalized to accept any data set and any number of observations. If you need to print a different data set or adjust the number of observations, you can easily modify the macro variable values without altering the core logic of the program.
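
As a quick sketch of this reuse, the same step can be pointed at another data set by changing only the two %LET statements. SASHELP.CLASS is used here purely as an assumed stand-in, since it ships with most SAS installations.

```sas
/* Only the macro variable values change; the PROC PRINT step is untouched */
%LET dsn = SASHELP.CLASS;
%LET nobs = 5;

PROC PRINT DATA = &dsn (OBS = &nobs);
TITLE1 "First &nobs Rows of &dsn Data";
RUN;
```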

Resolving Macro Variables 

As we've seen earlier, before executing SAS statements, any macro variables in the code are resolved first. The resolved text values are then substituted back into the statements. Predicting what the SAS statements will look like after resolution is generally straightforward, but it can occasionally be less intuitive. This is particularly true when a macro variable reference is concatenated with other references or embedded within a text string.

Adding Text Before and After Macro Variables

In computer programming, employing suffixes or prefixes when naming objects is a widely adopted practice for several reasons. It helps to organize related objects together, prevent naming conflicts, and provide additional context for code readers. By appending a macro variable to data set names, data set variables, or any other strings, you can easily implement this practice in your SAS program.

Basically, when resolved, a macro variable is replaced by its text literal. You don't need any concatenation operator, because what the macro variable substitutes is a piece of the code line itself, not a data value. The SAS code with the resolved text is then compiled for execution. For example:

%LET wine_type = White;

DATA Only&wine_type;
SET WineQuality;
WHERE Type="&wine_type";
RUN;

In this example, the macro variable &wine_type is appended to a preceding text, Only. Thus, this code will be resolved as:

DATA OnlyWhite;
SET WineQuality;
WHERE Type="White";
RUN;

Similarly, you can also put text values after a macro variable reference. However, in this case, it could be unclear whether the following text is part of the macro variable name or not. For example, let's take a look at the following program referencing the wine_type defined earlier:

DATA &wine_typeWines;
SET WineQuality;
WHERE Type="&wine_type";
RUN;

In the program, our intention was to name the newly created data set WhiteWines, after resolving &wine_type. However, SAS reads the longest possible name after the ampersand, so it looks for a macro variable called wine_typeWines rather than wine_type followed by the text Wines. Since no such variable is defined, the reference is not resolved. To avoid this, you can add a period (.) after the macro reference. This period serves as a marker indicating the end of the macro variable name:

DATA &wine_type.Wines;
SET WineQuality;
WHERE Type="&wine_type";
RUN;

This time, your program will be resolved as:

DATA WhiteWines;
SET WineQuality;
WHERE Type="White";
RUN;

Sometimes the first character of the string after the macro variable needs to be a period. However, a single period appended to the macro variable serves as a delimiter and will not appear in the resolved text. To get around this, use a double period (..) when a single period (.) is desired in the text. For example:

%LET libref = MyData;
%LET dsn = WineQuality;
%LET batch = White;

DATA &libref..&dsn&batch;
SET &dsn;
WHERE Type = "&batch";
RUN;

DATA &libref..&dsn.Red;
SET &dsn;
WHERE Type NE "&batch";
RUN;

In this example, double periods are used to produce a single literal period. On the other hand, observe that no period is needed before &batch, as the ampersand itself marks the end of the preceding reference and there is no chance of confusion.
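
One way to check how the periods behave, without creating any data sets, is to write the resolved text to the log with %PUT. The sketch below simply redefines the same three macro variables so that it can be run on its own.

```sas
%LET libref = MyData;
%LET dsn = WineQuality;
%LET batch = White;

%PUT &libref..&dsn&batch;   /* MyData.WineQualityWhite */
%PUT &libref..&dsn.Red;     /* MyData.WineQualityRed */
%PUT &libref.&dsn;          /* MyDataWineQuality: the single period is consumed */
```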

Multiple Level Resolution

When two or more ampersands (&) appear consecutively, successive passes or scans are necessary to achieve the final resolution. You can think of the double ampersand (&&) as a special reference that resolves to a single ampersand. To illustrate, let's define macro variables as follows:

%LET libref = SASHELP;
%LET dsn = NVST;
%LET n = 5;
%LET NVST5 = SASHELP.NVST5;

With the variables, let's consider the two combinations listed below:

Combination    First Scan Resolves To    Second Scan Resolves To
&&dsn&n        &dsn5                     No such macro variable is defined, so the reference cannot be resolved
&&&dsn&n       &NVST5                    SASHELP.NVST5

In the first case, && is resolved into &. Next comes the plain text dsn. Lastly, &n is resolved into 5. Thus, after the first scan, the resolved value is &dsn5. Since no such macro variable is defined, the second scan cannot resolve it, and SAS reports in the log that the apparent symbolic reference DSN5 was not resolved.

On the other hand, in the second case, we encounter a triple ampersand (&&&). During the first scan, the first two of the triple ampersand are resolved into &. Following this, the last ampersand and its subsequent text, dsn, are interpreted as a macro variable, &dsn, and will be resolved into NVST. Then the remaining &n is resolved into 5. Consequently, the outcome of this first scan is &NVST5. This first scan outcome, &NVST5, is then further resolved as it is defined, resulting in SASHELP.NVST5.

PROC PRINT DATA=&&&dsn&n;
TITLE "&&&dsn&n";
RUN;
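
To watch the individual scans, one option is the SYMBOLGEN system option, which writes each resolution step to the log. The sketch below redefines the relevant macro variables so that it runs on its own and uses %PUT instead of a PROC step to keep the log short.

```sas
%LET dsn = NVST;
%LET n = 5;
%LET NVST5 = SASHELP.NVST5;

OPTIONS SYMBOLGEN;      /* write each resolution step to the log */
%PUT &&&dsn&n;
OPTIONS NOSYMBOLGEN;    /* switch the tracing off again */
```

The log should show && resolving to &, then dsn and n being resolved, and finally NVST5 resolving to SASHELP.NVST5.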

Printing Macro Variables in the Log: %PUT

The SAS log is a text file that provides detailed information about the execution of a SAS program. It contains messages, warnings, errors, and other diagnostic information generated during the processing of SAS code. The log helps users identify and troubleshoot issues in their programs, such as syntax errors, data errors, or any other unexpected behavior.

The %PUT statement, which is analogous to the DATA step PUT statement, prints the current values of macro variables, along with any other text, to the SAS log. For example:

%PUT Libref: &libref;
%PUT Dataset: &dsn;
%PUT N: &n;

Several reserved words are available for you to print out macro variables through %PUT:

  • _ALL_: List all macro variables in all referencing environments.
  • _AUTOMATIC_: List all of automatic macro variables.
  • _GLOBAL_: List all user-defined global macro variables.
  • _LOCAL_: List all user-defined macro variables that are accessible only in the current referencing environment.
  • _USER_: List all user-defined macro variables, global and local, in each referencing environment.
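
A minimal sketch of these keywords in use follows. The macro variables project and cutoff are made up for this illustration.

```sas
%LET project = WineStudy;
%LET cutoff = 7;

%PUT _USER_;        /* lists project and cutoff, plus any other user-defined variables */
%PUT _AUTOMATIC_;   /* lists SYSDATE, SYSTIME, and the other automatic variables */
```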

Automatic Macro Variables

Some macro variables are created automatically by the macro processor. You can employ these variables in the same manner as any other macro variable. Listed below are some commonly used automatic variables:

  • &SYSDATE: Date that the session began executing (DATE7. form).
  • &SYSDATE9: Date that the session began executing displayed with a four-digit year (DATE9. form).
  • &SYSDAY: Day of the week that the session began executing.
  • &SYSTIME: Time of the day that the SAS session began executing.
  • &SYSLAST: Name of the last SAS data set created, in the form libref.dataset.
  • &SYSDSN: Name of the last SAS data set created with the library and data set name separated with at least one space.
  • &SYSERR: Stores the return codes of PROC and DATA steps.
  • &SYSCC: Stores the overall session return code.
  • &SYSPARM: Specifies a character string that can be passed into SAS programs. Usually used in the batch environment, this macro variable accesses the same value as is stored in the SYSPARM= system option and can also be retrieved using the SYSPARM() DATA step function.
  • &SYSRC: Indicates the last return code from your operating environment.
  • &SYSSITE: Contains the current site number.
  • &SYSSCP: Gives the name of the host operating environment.
  • &SYSUSERID: This macro variable stores the operating system username used for the current login session. If the operating system has not captured the user ID, &SYSUSERID receives "default."
  • &SYSMACRONAME: This automatic macro variable provides the name of the macro that is currently running. &SYSMACRONAME is commonly used for documenting the execution flow of your SAS application. Note that if &SYSMACRONAME is used in open code (outside of any macro definition), it will have a null value. Also note that if macros are nested, &SYSMACRONAME will store the name of the innermost macro.
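
As a small sketch of how these variables are typically used, the statements below stamp the log and a report footnote with session information. SASHELP.CLASS is an assumed sample data set, and the exact values depend on your session and environment.

```sas
%PUT Session started on &SYSDAY, &SYSDATE9 at &SYSTIME;
%PUT Running as &SYSUSERID on &SYSSCP;

PROC PRINT DATA=SASHELP.CLASS (OBS=5);
TITLE "Class listing";
FOOTNOTE "Produced on &SYSDATE9 by &SYSUSERID";
RUN;

FOOTNOTE;   /* clear the footnote so it does not carry over to later output */
```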

Macro Functions

Numerical Evaluations on Macro Variables

Again, what a macro variable holds is neither character nor numeric. Rather, it holds a literal text value that will be substituted into your SAS code. Thus, arithmetic written in a %LET statement is not evaluated; it is simply stored as text. However, we sometimes want the expression to be evaluated directly. One solution for such cases is the %EVAL function. When it is called, %EVAL always performs integer arithmetic. For example:

%LET A = 5;
%LET B = &A + 1;
%LET C = %EVAL(&B + 1);
%LET D = %EVAL(&A / 2);
%LET E = %EVAL(&A + 0.2);

%PUT A: &A;
%PUT B: &B;
%PUT C: &C;
%PUT D: &D;
%PUT E: &E;

We see that the macro variable &B resolves to the literal 5 + 1, rather than being evaluated as 6. On the other hand, &C evaluates the expression and resolves to 7. Similarly, the %EVAL function evaluates &A / 2, but in this case it truncates the decimal part and returns the whole number 2.

The %EVAL function only takes whole numbers and performs the four basic operations: addition, subtraction, multiplication, and division. Thus, when it comes to %EVAL(&A + 0.2), SAS throws an error. For floating-point evaluations, you should use the %SYSEVALF function instead of %EVAL. For example:

%LET X = 5/3;
%PUT Default: %SYSEVALF(&X);
%PUT Bool: %SYSEVALF(&X, BOOLEAN);
%PUT Ceil: %SYSEVALF(&X, CEIL);
%PUT Floor: %SYSEVALF(&X, FLOOR);
%PUT Integer: %SYSEVALF(&X, INTEGER);
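
To make the contrast between the two functions concrete, the sketch below evaluates the same kind of expression with each. The macro variables price and qty are made up for this illustration.

```sas
%LET price = 19.99;
%LET qty = 3;

%PUT Integer division: %EVAL(7 / 2);                 /* 3 */
%PUT Floating division: %SYSEVALF(7 / 2);            /* 3.5 */
%PUT Order total: %SYSEVALF(&price * &qty);
%PUT Rounded up: %SYSEVALF(&price * &qty, CEIL);     /* 60 */
```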

Macro Functions for Text Modification

Sometimes, you may need to modify literal texts stored in a macro variable or extract information from it. Text functions can be quite helpful in these scenarios. Here are some commonly used text functions for the purposes:

  • %INDEX(arg1, arg2): Searches arg1 for the first occurrence of arg2. If there is a match, returns the position of its first character; otherwise, returns 0.
  • %LENGTH(arg): Determines the length of its argument.
  • %SCAN(arg1, arg2, <delimiters>): Searches arg1 for the n-th word, where n is given by arg2, and returns that word. If <delimiters> is omitted, the default word delimiters are used, as in the DATA step SCAN function.
  • %SUBSTR(arg, pos, <length>): Returns a portion of arg, starting at position pos and spanning <length> characters. If <length> is omitted, it returns everything from pos to the end of the string.
  • %UPCASE(arg): Converts all characters in arg to upper case. This function is useful when you need to compare text strings that may have inconsistent case.

For example:

%LET my_pangram = The jovial fox jumps over the lazy dog;
%LET pos_jumps = %INDEX(%UPCASE(&my_pangram), JUMPS);
%LET my_substr = %SUBSTR(&my_pangram, &pos_jumps, %LENGTH(jumps));
%PUT &my_pangram;
%PUT &pos_jumps;
%PUT &my_substr;

%LET x = XYZ.ABC/XYY;
%LET word = %SCAN(&x, 3);
%LET part = %SCAN(&x, 1, z);
%PUT WORD is &word and PART is &part;

Defining and Using Macros

SAS Macros are an extension of the macro variables we've discussed thus far. They can perform more complex tasks beyond the capabilities of macro variables alone. You can define a macro by enclosing code blocks of your interest between %MACRO and %MEND. For example:

%LET libref = SASHELP;
%LET dsn = RETAIL;
%LET nobs = 20;

%MACRO head;
PROC PRINT DATA=&libref..&dsn (OBS=&nobs);
TITLE "First &nobs observations of &libref..&dsn";
RUN;
%MEND head;

In this example, the %MACRO statement starts the definition of a SAS macro named head, and %MEND head; marks the end of it. Once you've defined the macro, you can call it anywhere in your SAS program using the following syntax:

%head;

When it is called, the head macro first resolves the global macro variables libref, dsn, and nobs. Subsequently, it executes PROC PRINT based on the resolved values.
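
To see the statements that a macro call actually generates, one option is the MPRINT system option, which echoes them to the log. The sketch below repeats a version of the head macro so that it can be run on its own, and assumes the SASHELP.CLASS sample data set in place of RETAIL.

```sas
%LET libref = SASHELP;
%LET dsn = CLASS;
%LET nobs = 5;

%MACRO head;
PROC PRINT DATA=&libref..&dsn (OBS=&nobs);
TITLE "First &nobs observations of &libref..&dsn";
RUN;
%MEND head;

OPTIONS MPRINT;      /* echo the statements generated by the macro */
%head;
OPTIONS NOMPRINT;    /* switch the echo off again */
```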

Understanding Scope of Macro Variables

Unlike data set variables, macro variables have their values stored in symbol tables in memory. These tables act as a dictionary, mapping macro variable names to their corresponding values and scope, which defines the visibility and accessibility of a macro variable. For example:

%LET global_var = global;

%MACRO show_scope;
%LET local_var = local;
%PUT ***** Inside the macro *****;
%PUT Global Variable: &global_var;
%PUT Local Variable: &local_var;
%MEND show_scope;

%show_scope;

%PUT ***** Outside the macro *****;
%PUT Global Variable: &global_var;
%PUT Local Variable: &local_var;

  • Global Scope: Macro variables that are defined outside of any macro definition are called global macro variables. Each global variable holds a single value that is accessible to all macros throughout your program.
  • Local Scope: A local macro variable's value is only accessible within the macro where it is defined or macros nested inside that macro. Since macros can call other macros, this creates a hierarchy with multiple levels of nested local symbol tables.

In the example program shown above, %LET global_var = global; defines a global macro variable named global_var. Since it is outside any macro definition, it has global scope and is accessible throughout the program. On the other hand, local_var is defined within the show_scope macro. Thus, this variable has local scope and is only accessible within the macro and any nested macros.
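
One related point is that a %LET inside a macro updates an existing global variable of the same name instead of creating a local one. The %LOCAL statement can be used to force a local variable. The sketch below illustrates this with made-up names (counter, without_local, with_local).

```sas
%LET counter = 10;

%MACRO without_local;
  %LET counter = 99;    /* updates the existing global variable */
%MEND without_local;

%MACRO with_local;
  %LOCAL counter;       /* creates a separate local variable */
  %LET counter = 1;
  %PUT Inside with_local: &counter;      /* 1 */
%MEND with_local;

%with_local;
%PUT After with_local: &counter;         /* 10 */

%without_local;
%PUT After without_local: &counter;      /* 99 */
```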

Defining Macros with Parameters

When creating a macro, it is possible to rely on macro variables defined by %LET statements, but doing so can easily become cumbersome. In particular, a %LET statement inside a macro is not flexible enough for handling different values or arguments; you will need to edit the macro each time you want to change a variable's value.

To address this limitation, creating macros with parameters is the preferred approach. Macro parameters allow you to pass different values into the macro during each call, making it adaptable to various scenarios. The assignment of values to the parameters is made when the macro is called, not when the macro is coded.

For macros with a small number of parameters (typically fewer than four), where the order of the parameters is clear or not very important, it is convenient to define and use a macro with positional parameters. For example:

%MACRO stacking_two_datasets(dsn1, dsn2);
DATA Output;
SET &dsn1 &dsn2;
RUN;

PROC PRINT DATA=Output;
TITLE "Dataset: &dsn1 + &dsn2";
RUN;
%MEND stacking_two_datasets;

The stacking_two_datasets macro is defined with two positional parameters, dsn1 and dsn2. Inside the macro, these parameters are used directly in the SET statement of the DATA step. This macro can be called as follows:

%stacking_two_datasets(SASHELP.NVST1, SASHELP.NVST2);

When stacking_two_datasets is called, the provided values are assigned by position: SASHELP.NVST1 is assigned to dsn1 and SASHELP.NVST2 is assigned to dsn2. The macro code then uses these assigned values inside the macro body.

In essence, the order in which you list the arguments when calling the macro determines which parameter they are assigned to. So, mixing up the order of arguments can lead to errors or unexpected results.

Sometimes, you don't know how many parameters will be needed for your macro. Rather, you would like to leave the number undetermined and make your macro accept any number of values. In such cases, you can use a single parameter as a placeholder for any number of values. For example:

%MACRO stacking_datasets(ds_list);
DATA Output;
SET &ds_list;
RUN;

PROC PRINT DATA=Output;
TITLE "Dataset: &ds_list";
RUN;
%MEND stacking_datasets;

%stacking_datasets(SASHELP.NVST1 SASHELP.NVST2 SASHELP.NVST3);
%stacking_datasets(SASHELP.NVST1 SASHELP.NVST2 SASHELP.NVST3 SASHELP.NVST4 SASHELP.NVST5);

Here, when stacking_datasets is called, the parameter &ds_list is resolved to the list of the provided arguments: %stacking_datasets(SASHELP.NVST1 SASHELP.NVST2 SASHELP.NVST3); calls the macro with three arguments and %stacking_datasets(SASHELP.NVST1 SASHELP.NVST2 SASHELP.NVST3 SASHELP.NVST4 SASHELP.NVST5); calls it with five arguments. This trick can make your macro take any number of arguments as needed. 

Notice that there are no commas between the data set names in the macro call. A comma would separate parameters, whereas here the whole list is passed as one text string for a single parameter. Moreover, the resolved text is placed directly in a SET statement, where multiple data sets are listed without commas.

In the %MACRO statement, you can also designate parameters as keywords. Unlike positional parameters, these keyword parameters can be supplied in any order and may have default values assigned to them. In particular, when you have more than four parameters, want to specify default values, or when parameter names can provide some additional information, it is convenient to define a macro with keyword parameters. For example:

%MACRO head(libref=, dsn=, nobs=5, var_format_pair=);
PROC PRINT DATA=&libref..&dsn (OBS=&nobs);
TITLE "First &nobs observations of &libref..&dsn";
FORMAT &var_format_pair;
RUN;
%MEND head;

In this example, head is defined with four keyword parameters: libref, dsn, nobs, and var_format_pair. Among the four parameters, the nobs is assigned its default value, while the others are not. Now, let's call this macro as follows:

%head(libref=SASHELP, dsn=RENT, var_format_pair=Date EURDFDD10. Amount EUROX12.2);

In the macro call, three keyword arguments are provided, for libref, dsn, and var_format_pair; nobs resolves to its default value of 5. Note that if a keyword parameter has no default value and no argument is supplied for it, it resolves to a null string.
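
Because keyword arguments are matched by name, they can be listed in any order, and a default can be overridden simply by naming the parameter. The sketch below repeats the head macro so that it runs on its own; the SASHELP.CLASS data set and the 6.1 format applied to Height and Weight are assumptions chosen for illustration.

```sas
%MACRO head(libref=, dsn=, nobs=5, var_format_pair=);
PROC PRINT DATA=&libref..&dsn (OBS=&nobs);
TITLE "First &nobs observations of &libref..&dsn";
FORMAT &var_format_pair;
RUN;
%MEND head;

/* Arguments in a different order than the definition; nobs overrides its default of 5 */
%head(var_format_pair=Height Weight 6.1, nobs=10, dsn=CLASS, libref=SASHELP);
```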

In a more common scenario, you would specify both keyword and positional parameters in the %MACRO statement. In this case, however, you must list positional parameters before any keyword parameters. For example:

%MACRO stock_chart(ticker, period, int, open=*, high=*, low=*, close=*);
DATA StockSubset;
SET MyData.SP500;
WHERE Ticker = "&ticker" AND DATE >= INTNX("&int", MAX(Date), -&period);
RUN;

PROC SGPLOT DATA=StockSubset;
TITLE "&ticker Stock";
FOOTNOTE "Last &period &int";
&open SERIES X = DATE Y = OPEN;
&high SERIES X = DATE Y = HIGH;
&low SERIES X = DATE Y = LOW;
&close SERIES X = DATE Y = CLOSE;
YAXIS LABEL = 'USD';
RUN;
%MEND stock_chart;

The stock_chart is defined with 7 parameters: ticker, period, and int are defined as positional parameters, while the remaining four are defined as keyword parameters. In the definition, all positional parameters are placed before keyword parameters. Calling this macro also requires positional parameters prior to any keyword parameters:

%stock_chart(ABT, 5, Year, high=, low=);

In this code line, the macro stock_chart is invoked with five arguments: ABT, 5, Year, high=, and low=. Note that each keyword parameter has a default value of *, which turns the corresponding SERIES statement into a comment statement and so acts as an automatic commenting-out feature. When a null value is provided instead, the asterisk disappears and the associated variable is plotted on the stock chart.


Documenting Your Macro

After creating a macro, documenting it is generally considered good practice. Well-documented macros are easier to modify and debug, as they provide explanations about the macros' purpose, functionality, and parameters, as well as their intended behavior. Furthermore, when working in a team environment, documentation promotes code sharing and reusability within the team, reducing redundant effort spent writing new code with the same functionality.

Depending on the team and project, there may be different style guides and templates for writing documentation, but they typically follow the rules listed below:

  • Use descriptive parameter names, so that users can easily grasp what each parameter does.
  • Supply default values, whenever reasonable defaults are available.
  • Line up the parameters, one per line, each with its default value and an explanation, such as the range of acceptable values or some examples.

For example:

%MACRO stock_chart(ticker, period, int, open=*, high=*, low=*, close=*);
/*
ticker Ticker symbol of the SP500 company of interest.
period Desired time period for analysis.
int Unit of time interval. Available options are: Year | Month | Day
open=* To plot open price on the chart, pass open=
high=* To plot high price on the chart, pass high=
low=* To plot low price on the chart, pass low=
close=* To plot close price on the chart, pass close=
*/
DATA StockSubset;
SET MyData.SP500;
WHERE Ticker = "&ticker" AND DATE >= INTNX("&int", MAX(Date), -&period);
RUN;

PROC SGPLOT DATA=StockSubset;
TITLE "&ticker Stock";
FOOTNOTE "Last &period &int";
&open SERIES X = DATE Y = OPEN;
&high SERIES X = DATE Y = HIGH;
&low SERIES X = DATE Y = LOW;
&close SERIES X = DATE Y = CLOSE;
YAXIS LABEL = 'USD';
RUN;
%MEND stock_chart;


[1] In modern programming languages, such as Python or Java, the functionality that macros provide is largely replaced by variables and functions.
