Macro: Data Set Interface

In many macro language programs and applications, there is very often a need to define macro variables or parameters based on values from SAS data sets. However, because macro references are resolved before any DATA or PROC steps, directly using data set values for your macros is not possible. Thus, a specialized interface is required to achieve this. In this blog post, we will explore how to extract values from SAS data sets and convert them into macro variable values. Let's get started!

Using the SYMPUT Routine

As discussed in an earlier blog post, macro language references are resolved and executed prior to the compilation of the DATA step. This means that you cannot use %LET statement and assign a macro variable with DATA step variable. To illustrate, let's consider the following SAS program:

DATA Employees;
INFILE '/home/u63368964/source/employees.csv' FIRSTOBS=2 DLM=',' TRUNCOVER; FORMAT Dob DATE10. StartDate DATE10. EndDate DATE10.; INPUT EmployeeID $4. GivenName :$15. SurName :$15. Dob DDMMYY10. JobTitle :$25. StartDate :DATE10. EndDate ?:DATE10.;
IF EndDate = '.' THEN YearsInOffice = INTCK('Year', StartDate, TODAY()); ELSE YearsInOffice = INTCK('Year', StartDate, EndDate); RUN; PROC FREQ DATA=Employees NOPRINT; TABLES JobTitle / OUT=JobTitles_Freq; RUN; PROC SORT DATA=JobTitles_Freq; BY DESCENDING COUNT; RUN;

This program counts the number of employees in the Employees data set by JobTitle through PROC FREQ and outputs the results as a new data set, JobTitles_Freq. Then, it sorts the JobTitles_Freq data set in descending order based on the count of job titles, so that the most frequent job title appear first.

Now, let's say that we want to take the first observation from the JobTitles_Freq and store it in a macro variable. Using this most frequent JobTitle, the following pseudo code intends to filter the Employees data set and apply PROC MEANS on the YearsInOffice variable. So, the following pseudo code aim to obtain the average YearsInOffice for the most frequent JobTitle:

DATA MostFreqJobTitle;
SET JobTitles_Freq(OBS=1);
%LET job_title = JobTitle;
RUN;

%PUT &job_title; PROC MEANS DATA=Employees; VAR YearsInOffice; WHERE JobTitle = "&job_title"; RUN;

However, since the %LET statement executed long before the DATA step is even compiled, this program will not work as intended: Instead of the DATA step variable value JobTitle, the macro variable &job_title contains a text literal, J-o-b-T-i-t-l-e.

To get around this problem, we can use the SYMPUT routine, which places DATA step information onto the macro variable symbol tables. Similar to %LET statement, the SYMPUT routine can be used to assign a new macro variable. For example:

DATA MostFreqJobTitle; SET JobTitles_Freq(OBS=1); CALL SYMPUT('job_title', JobTitle); RUN; PROC MEANS DATA=Employees; VAR YearsInOffice; WHERE JobTitle = "&job_title"; RUN;

It is worth noting that this is not a macro level statement; it is a SAS DATA step routine. So, in the code lines shown above, the macro reference &job_title in the PROC MEANS are resolved, prior to the SYMPUT call itself. Thus, at the moment it is first resolved, technically it has not defined yet. 

Then, during the execution of DATA MostFreqJobTitle, the macro variable job_title receives the value contained in the PDV variable JobTitle. Subsequently, this received value (which is Production Technician I) is used in the WHERE statement of the PROC MEANS. 

Without knowing this order of execution can results in unintended behavior. For example, let's consider the following DATA step:

DATA Employees2; SET Employees; CALL SYMPUT('first_name', GivenName); NickName = "&first_name"; RUN;

In this example, before the DATA Employees2 is even compiled, the macro variable reference &first_name must be resolved. Since this variable has not yet been defined by the CALL SYMPUT at this point, SAS throws a warning message  and says that "WARNING: Apparent symbolic reference FIRST_NAME not resolved." Consequently, the DATA step variable NickName will contain the characters exactly like they are specified in the quotation marks.

Thus, defining a macro variable using SYMPUT and attempting to access it within the same DATA step will not yield the expected result, which is a very common mistake in SAS programming. 

Using a SAS Data Set As a Control File

Very often, macros are defined with tons of parameters. Especially, when these macros are called frequently throughout your program, manually passing all the arguments each time can become cumbersome. In such cases, you can store those arguments in a SAS data sets. These values will then be used as a control information during execution, taking advantage of SYMPUT routine. For example:

DATA Control; Libref = 'MyData'; Dataset = 'Boston'; NumObs = '20'; RUN; %MACRO macro_resolution; DATA _NULL_; SET Control; CALL SYMPUT('libref', Libref); CALL SYMPUT('dsn', Dataset);
CALL SYMPUT('nobs', NumObs);
RUN; %MEND macro_resolution;

%MACRO my_print; PROC PRINT DATA=&libref..&dsn (OBS=&nobs); TITLE "First &nobs of &libref..&dsn"; RUN;
%MEND my_print; %macro_resolution;
%my_print;

In this example, the SAS data set Control contains three DATA step variables: Libref, Dataset, and NumObs. Then the %macro_resolution defines all macro variables needed for later use. So, whenever required, you can call the control information. When a control data set contains information about data sets as shown in this example, we often refer to it as metadata.

Post a Comment

0 Comments