Macro: Control Structures

A control structure in programming is a block that supports variable evaluation and conditional decision making. It determines the path your program follows based on specified conditions, which allows you to control program flow. The SAS macro language, as a programming tool, also provides control structures with its own syntax. In fact, implementing a statistical algorithm using SAS macros typically requires conditional logic.

%IF-%THEN/%ELSE Statements

Remember that IF-THEN/ELSE statements in a DATA step allow you to write programs that execute conditionally based on the values of variables in the Program Data Vector (PDV). The macro %IF-%THEN and %ELSE statements act similarly to DATA step IF-THEN/ELSE statements. However, unlike DATA step statements, the macro %IF-%THEN/%ELSE statements do not directly evaluate variables in the PDV.

Instead, they manage program flow by evaluating whether a macro variable equals a value that you specify. This is because the macro processor resolves macro references before the DATA step is compiled and executed. Thus, %IF-%THEN statements compare a macro variable with another macro variable or a literal. For example:

%IF &SYSDAY = Sunday %THEN %DO;
%LET workout_routine = Leg;
%PUT Today is &workout_routine. day!;
%END;

In this example, %IF compares the resolved value of &SYSDAY with the literal Sunday. If they are equal, the macro variable workout_routine is assigned the value Leg, and %PUT writes the message to the log.

The %DO block is analogous to the DO block in a DATA step. It begins with %DO; and ends with %END;. You need a %DO block when you want to place multiple macro calls, multiple macro statements, or multiple SAS statements after %THEN.
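
To make the contrast with the DATA step concrete, here is a minimal sketch. The data set name class_flag and the variable Group are made up for illustration, and the open-code %IF assumes a SAS release that supports it (SAS 9.4M5 or later).

```sas
/* DATA step IF: evaluated for every observation while the step executes */
DATA class_flag;
  SET SASHELP.CLASS;
  LENGTH Group $ 7;              /* long enough for "Younger" */
  IF Age >= 14 THEN Group = "Older";
  ELSE Group = "Younger";
RUN;

/* Macro %IF: evaluated once by the macro processor.          */
/* It decides whether the PROC PRINT text is generated at all. */
%IF &SYSDAY = Sunday %THEN %DO;
  PROC PRINT DATA=class_flag;
  RUN;
%END;
```

The DATA step IF looks at Age in the PDV row by row, whereas the %IF only sees the text that &SYSDAY resolves to.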

To specify what to do when the %IF condition is not true, you should add an %ELSE block. For example:

%IF &SYSDAY = Sunday %THEN %DO;
%LET workout_routine = Leg;
%PUT Today is &workout_routine. day!;
%END;
%ELSE %DO;
%LET workout_routine = Push;
%PUT Today is &workout_routine. day!;
%END;



However, you cannot nest %IF blocks in open code. A nested %IF-%THEN block must be inside a macro definition. So, the following code will not work in open code:

%IF &SYSDAY = Sunday %THEN %DO;
  %LET workout_routine = Leg;
  %PUT Today is &workout_routine. day!;
%END;
%ELSE %DO;
  %IF &SYSDAY = Monday %THEN %DO;
    %LET workout_routine = Push;
    %PUT Today is &workout_routine. day!;
  %END;
  %ELSE %DO;
    %LET workout_routine = Pull;
    %PUT Today is &workout_routine. day!;
  %END;
%END;
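
Wrapping the same logic in a macro definition removes the open-code restriction. The following is a minimal sketch; the macro name pick_routine is hypothetical, and the single %PUT at the end is a small simplification of the code above.

```sas
/* Hypothetical macro: nested %IF blocks are allowed inside a definition */
%MACRO pick_routine;
  %LOCAL workout_routine;
  %IF &SYSDAY = Sunday %THEN %DO;
    %LET workout_routine = Leg;
  %END;
  %ELSE %DO;
    %IF &SYSDAY = Monday %THEN %DO;
      %LET workout_routine = Push;
    %END;
    %ELSE %DO;
      %LET workout_routine = Pull;
    %END;
  %END;
  %PUT Today is &workout_routine. day!;
%MEND pick_routine;

%pick_routine
```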

You can also use the AND, OR, and NOT operators to build more complex expressions in the %IF condition. For example:

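/* The %ELSE %IF chain keeps the three branches mutually exclusive. */
/* It is placed inside a macro because open code does not allow it. */
%MACRO set_routine;
  %IF &SYSDAY = Sunday OR &SYSDAY = Wednesday %THEN %DO;
    %LET workout_routine = Leg;
    %PUT Today is &workout_routine. day!;
  %END;
  %ELSE %IF &SYSDAY = Monday OR &SYSDAY = Thursday %THEN %DO;
    %LET workout_routine = Push;
    %PUT Today is &workout_routine. day!;
  %END;
  %ELSE %DO;
    %LET workout_routine = Pull;
    %PUT Today is &workout_routine. day!;
  %END;
%MEND set_routine;
%set_routine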


Iterative %DO-%END Blocks 

In SAS macros, %DO-%END blocks are not only used for grouping multiple statements that you want to execute under the same program flow path. You can also use them for iteration, with arguments that specify the loop's behavior. Iterative %DO-%END blocks are analogous to the iterative DO-END blocks of the DATA step, except that:

  • %WHILE and %UNTIL clauses cannot be added to an iterative (start %TO stop) specification.
  • Increments are integer only.
  • Only one specification is allowed.
  • %DO defines and increments a macro variable, not a data set variable.
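
The integer-only increment can be illustrated with %BY. This is a minimal sketch with a hypothetical macro name, odd_steps; the start, stop, and %BY values are all assumed to resolve to integers.

```sas
/* Hypothetical macro: steps through odd numbers with an integer %BY value */
%MACRO odd_steps(first, last);
  %LOCAL i;
  %DO i = &first %TO &last %BY 2;
    %PUT i = &i;
  %END;
%MEND odd_steps;

%odd_steps(1, 9)
```

This call writes i = 1, i = 3, i = 5, i = 7, and i = 9 to the log. A fractional value such as %BY 0.5 is not accepted, because the macro processor evaluates the loop bounds and increment with integer arithmetic.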

A common use case for the iterative %DO-%END block is generating a series of data set or variable names that share a prefix or suffix. For example:

%MACRO names(name, first, last);
  %LOCAL i; /* keep the loop counter local to this macro */
  %DO i = &first %TO &last;
    &name._&i
  %END;
%MEND names;
%PUT %names(MyData, 1, 5);

In this example, the local macro variable &i starts at &first and is incremented by one until it reaches &last. In each iteration of the %DO-%END loop, &name is concatenated with an underscore and &i. Notice that the line &name._&i is not closed by a semicolon. This is because it is supposed to generate a list of text literals; it is neither a macro statement nor a DATA step statement.

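So, if you append a semicolon to &name._&i, SAS will throw four errors after successfully generating MyData_1. The first generated semicolon ends the %PUT statement, and the remaining four names are then submitted to SAS as standalone statements, which are not valid.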
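
The faulty variant can be sketched as follows. The macro name names_bad is hypothetical and exists only to show the effect of the extra semicolon.

```sas
/* Hypothetical faulty variant: note the semicolon after &name._&i */
%MACRO names_bad(name, first, last);
  %LOCAL i;
  %DO i = &first %TO &last;
    &name._&i;
  %END;
%MEND names_bad;

/* The call below generates roughly this text:                   */
/*   %PUT MyData_1; MyData_2; MyData_3; MyData_4; MyData_5; ;    */
/* Only MyData_1 belongs to %PUT; the rest reach the SAS compiler */
%PUT %names_bad(MyData, 1, 5);
```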

You can call this macro for naming a series of SAS data sets or variables. For example:

/* Creates five empty data sets from scratch */
DATA %names(MyData, 1, 5);
ATTRIB
VarA LENGTH=8 FORMAT=BEST12. LABEL="Dummy variable A"
VarB LENGTH=8 FORMAT=BEST12. LABEL="Dummy variable B"
VarC LENGTH=8 FORMAT=BEST12. LABEL="Dummy variable C"
;
STOP;
RUN;

%WHILE and %UNTIL specify conditions that control when the iterations stop. At the start of each iteration, %DO %WHILE first evaluates the condition. If it is true, the statements in the current iteration are executed; otherwise, the loop ends. For example:

%MACRO count_while(n);
  %PUT Count starts at: &n;
  %DO %WHILE(&n < 3);
    %PUT *** &n ***;
    %LET n = %EVAL(&n + 1);
  %END;
  %PUT Count ends at: &n;
%MEND count_while;

%count_while(1);
%count_while(5);

Conversely, %DO %UNTIL executes the body of the current iteration first and then evaluates the condition. If the condition is true, the loop terminates. As a result, the body always runs at least once. For example:

%MACRO count_until(n);
  %PUT Count starts at: &n;
  %DO %UNTIL(&n >= 3);
    %PUT *** &n ***;
    %LET n = %EVAL(&n + 1);
  %END;
  %PUT Count ends at: &n;
%MEND count_until;

%count_until(1);
%count_until(5);
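
The difference between the two forms is easiest to see when the condition is already satisfied at the start. The wrapper below is a hypothetical sketch that assumes %count_while and %count_until have already been compiled.

```sas
/* Hypothetical wrapper: runs both loops from the same starting value */
%MACRO compare_loops(n);
  %count_while(&n)
  %count_until(&n)
%MEND compare_loops;

%compare_loops(5)
```

With a starting value of 5, %count_while never enters its loop and reports that the count ends at 5. %count_until runs its body once, writes *** 5 ***, and reports that the count ends at 6.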

Macros Invoking Macros

In SAS macro programming, it is common practice to define macros in the global scope as separate building blocks and to invoke them from another macro. This approach promotes modularity by compartmentalizing functionality and improves maintainability by allowing for isolated changes. For example, consider the %interleaving_two_datasets macro shown below:

%MACRO interleaving_two_datasets(dsn1, dsn2, by_var_list);
%sorting_obs(&dsn1, out_dsn1, &by_var_list);
%sorting_obs(&dsn2, out_dsn2, &by_var_list);
DATA Output;
SET out_dsn1 out_dsn2;
BY &by_var_list;
RUN;
%MEND interleaving_two_datasets;

%MACRO sorting_obs(input_dsn, output_dsn, by_var_list);
PROC SORT DATA=&input_dsn OUT=&output_dsn;
BY &by_var_list;
RUN;
%MEND sorting_obs;

%interleaving_two_datasets(SASHELP.NVST1, SASHELP.NVST2, Date);

In this example, %interleaving_two_datasets is defined with parameters whose values %sorting_obs receives as arguments. The values that you specify when you call %interleaving_two_datasets are resolved and passed to %sorting_obs. Thus, invoking %interleaving_two_datasets with the three arguments SASHELP.NVST1, SASHELP.NVST2, and Date generates the following code:

PROC SORT DATA=SASHELP.NVST1 OUT=out_dsn1;
BY Date;
RUN;

PROC SORT DATA=SASHELP.NVST2 OUT=out_dsn2;
BY Date;
RUN;

DATA Output;
SET out_dsn1 out_dsn2;
BY Date;
RUN;

Notice that the macro %interleaving_two_datasets is defined before the macro that it calls. This works because of how the SAS macro processor handles macro definitions:

  1. Compilation: When the macro processor encounters a %MACRO statement, it compiles the definition up to the matching %MEND and stores it. Macro calls inside the definition, such as %sorting_obs, are stored as text and are not looked up at this point.
  2. Macro Expansion: When the macro processor later encounters a macro call, it executes the compiled macro and replaces the call with the text that the macro generates. This process is called macro expansion.
  3. In-place Substitution: Importantly, the macro expansion happens in place. Any macro calls inside the executing macro are looked up and expanded at that moment, not when the outer macro was defined.

Thus, as long as each macro is defined before it is called, the order in which macros are defined does not matter. In this example, the calls to %sorting_obs are not executed until %interleaving_two_datasets itself is executed.
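
A stripped-down sketch of the same behavior, using two hypothetical macros, demo_outer and demo_inner:

```sas
/* demo_outer is compiled first; the call to %demo_inner is only stored */
%MACRO demo_outer;
  %PUT demo_outer starts;
  %demo_inner
  %PUT demo_outer ends;
%MEND demo_outer;

/* demo_inner is defined afterwards */
%MACRO demo_inner;
  %PUT demo_inner runs;
%MEND demo_inner;

/* Both definitions are compiled by now, so the call succeeds */
%demo_outer
```

The log shows demo_outer starts, demo_inner runs, and demo_outer ends, in that order. Moving the %demo_outer call above the definition of demo_inner would fail, because the inner macro would not yet exist when it is invoked.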

In practice, a separate macro that does nothing but call other macros is often referred to as a master macro. This approach promotes code reusability and efficiency. By incorporating the control structures we've discussed so far, such as conditionals and loops, you can craft master macros that automate complex workflows, conditionally execute tasks, or iterate through processes in a single organized batch.
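
As a rough sketch of such a master macro, the hypothetical %sort_all below loops over numbered data sets and calls %sorting_obs only for those that exist. It assumes that %sorting_obs from the earlier example has been compiled; the parameter names and the sorted_ output prefix are made up for illustration.

```sas
/* Hypothetical master macro: combines a loop, a condition, and a macro call */
%MACRO sort_all(prefix, n, by_var_list);
  %LOCAL i;
  %DO i = 1 %TO &n;
    /* EXIST returns 1 when the data set is available */
    %IF %SYSFUNC(EXIST(&prefix.&i)) %THEN %DO;
      %sorting_obs(&prefix.&i, sorted_&i, &by_var_list);
    %END;
    %ELSE %DO;
      %PUT NOTE: &prefix.&i does not exist and is skipped.;
    %END;
  %END;
%MEND sort_all;

%sort_all(SASHELP.NVST, 2, Date)
```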

Bad Practice: Nesting Macro Definitions

Nested macro definitions occur when the %MACRO through %MEND statements of one macro are enclosed within another macro's definition. Almost always, this results from the programmer not understanding how macros are stored; nesting macro definitions is only very rarely necessary or advisable.

For example, in the following program, %interleaving_two_datasets is rewritten to nest the definition of %sorting_obs. Although this program works just as before, it is inefficient. Every time %interleaving_two_datasets is executed, the nested macro %sorting_obs is redundantly recompiled:

%MACRO interleaving_two_datasets(dsn1, dsn2, by_var_list);
  %MACRO sorting_obs(input_dsn, output_dsn, by_var_list);
    PROC SORT DATA=&input_dsn OUT=&output_dsn;
    BY &by_var_list;
    RUN;
  %MEND sorting_obs;
  %sorting_obs(&dsn1, out_dsn1, &by_var_list);
  %sorting_obs(&dsn2, out_dsn2, &by_var_list);
  DATA Output;
  SET out_dsn1 out_dsn2;
  BY &by_var_list;
  RUN;
%MEND interleaving_two_datasets;

There is no need for this. If you define %sorting_obs in the global scope, its definition will be compiled only once. Then, whenever you invoke %sorting_obs within %interleaving_two_datasets calls, the compiled definition of %sorting_obs will be reused.

So, it is almost always best practice to avoid nesting macro definitions. The only justifiable use I can think of would be when the nested macro's definition needs to vary based on a conditional evaluation. However, even in such cases, I would recommend defining two separate macros in the global scope and invoking the appropriate one based on the evaluation.
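
That recommendation might look like the following sketch. All three macro names and the order= parameter are hypothetical; the point is only that both variants are compiled once in the global scope and a third macro picks one at run time.

```sas
/* Two hypothetical variants, each compiled once in the global scope */
%MACRO sort_ascending(input_dsn, output_dsn, by_var);
  PROC SORT DATA=&input_dsn OUT=&output_dsn;
    BY &by_var;
  RUN;
%MEND sort_ascending;

%MACRO sort_descending(input_dsn, output_dsn, by_var);
  PROC SORT DATA=&input_dsn OUT=&output_dsn;
    BY DESCENDING &by_var;
  RUN;
%MEND sort_descending;

/* Hypothetical dispatcher: chooses a variant instead of redefining one */
%MACRO sort_dispatch(input_dsn, output_dsn, by_var, order=ASC);
  %IF %UPCASE(&order) = DESC %THEN %DO;
    %sort_descending(&input_dsn, &output_dsn, &by_var)
  %END;
  %ELSE %DO;
    %sort_ascending(&input_dsn, &output_dsn, &by_var)
  %END;
%MEND sort_dispatch;

%sort_dispatch(SASHELP.NVST1, nvst1_desc, Date, order=DESC)
```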

MLOGIC and MPRINT Options

When modifying or debugging a macro, printing its execution process can be very helpful. The MLOGIC option traces the macro's logical flow in the SAS log, such as the start and end of each macro execution, the parameter values, and the results of %IF conditions. The MPRINT option, on the other hand, displays the SAS statements that the macro generates. For example:

OPTIONS MLOGIC MPRINT;

%interleaving_two_datasets(SASHELP.NVST1, SASHELP.NVST2, Date);

OPTIONS NOMLOGIC NOMPRINT;
