A macro is essentially a rule or pattern that tells the program to replace a certain input with a predefined sequence of code. While less common today[1], macros are programming constructs that let you avoid duplicating the same code over and over, which improves the maintainability of your program: if you need to modify a piece of code used in multiple places, you only need to update the macro definition.
SAS is a great example of where macros come into their own for automating tasks and improving code. The SAS macro facility, consisting of the macro language and macro processor, extends the capabilities of SAS by enabling you to:
- Pass values throughout your SAS program.
- Dynamically generate code at runtime.
- Conditionally execute DATA or PROC steps.
- Create generalized and flexible code.
The SAS macro language defines macro variables (to store and call text literals) and macros (to save and call specific actions). These elements are "resolved" by the SAS macro processor, which substitutes each macro reference with its stored text before your SAS program is compiled.
It is very important to keep in mind that macro references (% or &) are resolved prior to the compilation of SAS steps, as described in the diagram above. Indeed, this diagram can serve as a useful resource in addressing frequently raised questions in SAS macro programming, such as:
- Can macro %IF statements be used interchangeably with DATA step IF statements?
- Why can't I assign a DATA step variable value to a macro variable using the %LET statement?
- Why can't I use a DATA step IF to conditionally execute a %LET statement?
- Why do data set variables not have values when using them in %IF statements?
One answer covers all four questions: macro references are resolved before the program data vector (PDV) is populated. In other words, at the time a macro reference is resolved, DATA step variables do not yet have values.
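The timing issue behind these questions can be seen in a short sketch. It assumes the SASHELP.CLASS sample data set that ships with SAS, and the macro variable names wrong and right are made up for illustration. CALL SYMPUTX is a DATA step routine that runs at execution time, so unlike %LET it can see DATA step values:

```sas
DATA _NULL_;
    SET SASHELP.CLASS (OBS=1);
    /* %LET is handled by the macro processor while the step is still being
       compiled, so it stores the text Age, not the value of the variable. */
    %LET wrong = Age;
    /* CALL SYMPUTX runs when the step executes, so it stores the value of Age. */
    CALL SYMPUTX('right', Age);
RUN;

%PUT wrong=&wrong;
%PUT right=&right;
```

In the log, wrong resolves to the text Age, while right holds the Age value read from the first observation.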
Defining and Using Macro Variables
Macro variables, also known as symbolic variables, are very powerful tools all by themselves. Even if you know nothing else about the macro language other than how to define and call macro variables properly, you can accomplish a great deal and confidently say that you are proficient in SAS macro programming at your job interview.
You can define a macro variable using a %LET statement:
%LET macro-variable-name = text-value;
The naming rules for the macro variables are as follows:
- A name can be up to 32 characters long.
- A name must begin with a letter or underscore (_).
- Subsequent characters can consist of letters, numbers, and underscores in any combination.
After the keyword %LET, specify your chosen macro variable name, an equal sign (=), and the text literal to be stored in the variable. The text literal can be up to 65,534 characters in length. When naming a macro variable, choose a name that does not conflict with existing functionality in the SAS macro language. In particular, SAS reserves names beginning with "SYS" for automatic macro variables. Using such names for your own macro variables can lead to unintended behavior or errors, so it is advisable to avoid starting your macro variable names with "SYS" altogether.
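As a small illustration of how %LET stores its text, the following sketch uses made-up variable names (city, quoted, empty). It shows that leading and trailing blanks are dropped, while quotation marks are kept as part of the value:

```sas
%LET city   =    New York   ;   /* leading and trailing blanks are dropped */
%LET quoted = "New York";       /* the quotation marks become part of the value */
%LET empty  = ;                 /* a null (empty) value is allowed */

/* The brackets make the exact stored text visible in the log. */
%PUT [&city] [&quoted] [&empty];
```

The log line should read [New York] ["New York"] [], which makes it easy to see what was actually stored.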
Now, let's see how macro variables are defined and used in action! Here's an example SAS program:
/* Import data: MyData.WineQuality */
DATA MyData.WineQuality;
    INFILE '/home/u63368964/source/wine-quality.csv' DLM=',' FIRSTOBS=2;
    INPUT Type $ FixedAcid VolatileAcid CitricAcid
          ResidualSugar Chlorides
          FreeSulfurDioxide TotalSulfurDioxide
          Density pH Sulphates Alcohol
          Quality;
    Type = PROPCASE(Type);
RUN;
/* Defining and using macro variables */
%LET dsn = WineQuality;
%LET nobs = 20;
PROC PRINT DATA = &dsn (OBS = &nobs);
    TITLE1 "First &nobs Rows of &dsn Data";  /* double quotes: references are resolved */
    TITLE2 'First &nobs Rows of &dsn Data';  /* single quotes: text is printed as written */
RUN;
In this example, two macro variables, dsn and nobs, are defined by the %LET statements. When this program is submitted for execution, SAS first resolves the macro references, replacing &dsn with WineQuality and &nobs with 20. Observe that the text literals are not enclosed by quotation marks. If they were, the resolved values would also be enclosed by quotation marks, which you generally don't want. Also note that when a macro variable should be resolved inside a quoted string, that string must be enclosed by double quotes ("), not single quotes ('). Otherwise, the macro variable will not be resolved.
In this example SAS program, the PROC PRINT is generalized to accept any data set and any number of observations. If you need to print a different data set or adjust the number of observations, you can easily modify the macro variable values without altering the core logic of the program.
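To make the point concrete, here is a sketch that points the same step at a different data set. It assumes the SASHELP.CLASS sample data set that ships with SAS; only the two %LET lines differ from the program above:

```sas
/* Only the two %LET lines change; the PROC PRINT step itself is untouched. */
%LET dsn  = SASHELP.CLASS;
%LET nobs = 5;

PROC PRINT DATA = &dsn (OBS = &nobs);
    TITLE1 "First &nobs Rows of &dsn Data";
RUN;

TITLE1;  /* clear the title so it does not carry over to later output */
```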
Resolving Macro Variables
As we've seen earlier, any macro variables in the code are resolved before the SAS statements are executed. The resolved text values are then substituted back into the statements. Predicting what the statements will look like after resolution is generally straightforward, but it can occasionally be less intuitive. This is particularly true when a macro variable reference is concatenated with other references or embedded within a text string.
Adding Text Before and After Macro Variables
In computer programming, employing suffixes or prefixes when naming objects is a widely adopted practice for several reasons. It helps organize related objects, prevent naming conflicts, and provide additional context for code readers. By appending a macro variable to data set names, data set variables, or any other strings, you can easily implement this practice in your SAS program.
When resolved, a macro variable is simply replaced by its text literal. No concatenation operator is needed, because the substitution happens in the code text itself, not in a data value. The SAS code with the resolved text is then compiled for execution. For example:
%LET wine_type = White;
DATA Only&wine_type;
    SET WineQuality;
    WHERE Type = "&wine_type";
RUN;
In this example, the macro variable &wine_type is appended to a preceding text, Only. Thus, this code will be resolved as:
DATA OnlyWhite;
    SET WineQuality;
    WHERE Type = "White";
RUN;
Similarly, you can also put text values after a macro variable reference. However, in this case, it could be unclear whether the following text is part of the macro variable name or not. For example, let's take a look at the following program referencing the wine_type defined earlier:
DATA &wine_typeWines;
    SET WineQuality;
    WHERE Type = "&wine_type";
RUN;
In this program, our intention was to name the newly created data set WhiteWines, after resolving &wine_type. However, because the text Wines directly follows the reference, SAS reads the whole token &wine_typeWines as a single macro variable name, and no such variable exists. To avoid this, add a period (.) after the macro reference. The period serves as a marker indicating the end of the macro variable name:
DATA &wine_type.Wines;
    SET WineQuality;
    WHERE Type = "&wine_type";
RUN;
This time, your program will be resolved as:
DATA WhiteWines;
    SET WineQuality;
    WHERE Type = "White";
RUN;
Sometimes the first character of the string after the macro variable is itself a period. However, a single period appended to the macro variable serves as a delimiter and will not appear in the resolved text. To get around this, use a double period (..) wherever a single literal period (.) is desired in the resolved text. For example:
%LET libref = MyData;
%LET dsn = WineQuality;
%LET batch = White;
DATA &libref..&dsn&batch;
    SET &dsn;
    WHERE Type = "&batch";
RUN;
DATA &libref..&dsn.Red;
    SET &dsn;
    WHERE Type NE "&batch";
RUN;
In this example, each double period produces a single literal period. Note that no period is needed between &dsn and &batch, because the ampersand that starts &batch already marks the end of the name dsn.
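The delimiter rules can be checked quickly in the log with %PUT. The sketch below uses a made-up macro variable, year, and made-up text around it:

```sas
%LET year = 2024;

%PUT sales_&year;        /* text before a reference needs no delimiter: sales_2024 */
%PUT &year.Q1;           /* one period ends the name and disappears: 2024Q1 */
%PUT report_&year..txt;  /* two periods leave one literal period: report_2024.txt */
```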
Multiple Level Resolution
When two or more ampersands (&) appear consecutively, successive passes or scans are necessary to achieve the final resolution. You can think of the double ampersand (&&) as a special reference that resolves to a single ampersand. To illustrate, let's define macro variables as follows:
%LET libref = SASHELP;
%LET dsn = NVST;
%LET n = 5;
%LET NVST5 = SASHELP.NVST5;
With the variables, let's consider the two combinations listed below:
Combination | First Scan Resolves To | Second Scan Resolves To |
&&dsn&n | &dsn5 | No such macro variable is defined, so the reference stays unresolved and SAS issues a warning. |
&&&dsn&n | &NVST5 | SASHELP.NVST5 |
In the first case, && resolves to &, the text dsn is left as it is, and &n resolves to 5. Thus, after the first scan, the result is &dsn5. Since no such macro variable is defined, the second scan cannot resolve it, and SAS issues a warning in the log.
On the other hand, in the second case, we encounter a triple ampersand (&&&). During the first scan, the first two of the triple ampersand are resolved into &. Following this, the last ampersand and its subsequent text, dsn, are interpreted as a macro variable, &dsn, and will be resolved into NVST. Then the remaining &n is resolved into 5. Consequently, the outcome of this first scan is &NVST5. This first scan outcome, &NVST5, is then further resolved as it is defined, resulting in SASHELP.NVST5.
PROC PRINT DATA=&&&dsn&n;
    TITLE "&&&dsn&n";
RUN;
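The double-ampersand form becomes useful when a numbered family of macro variables exists. The following sketch uses made-up names (wine1, wine2, i) and turns on the SYMBOLGEN system option so that each resolution step is written to the log:

```sas
%LET wine1 = Red;
%LET wine2 = White;
%LET i = 2;

OPTIONS SYMBOLGEN;   /* log each resolution step */
%PUT &&wine&i;       /* first scan gives the reference to wine2, second scan gives White */
OPTIONS NOSYMBOLGEN;
```

Unlike the &&dsn&n case in the table, this works because a macro variable named wine2 is actually defined.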
Printing Macro Variables in the Log: %PUT
The SAS log is a text file that provides detailed information about the execution of a SAS program. It contains messages, warnings, errors, and other diagnostic information generated during the processing of SAS code. The log helps users identify and troubleshoot issues in their programs, such as syntax errors, data errors, or any other unexpected behavior.
The %PUT statement, which is analogous to the DATA step PUT statement, prints the current values of macro variables, along with any other text, to the SAS log. For example:
%PUT Libref: &libref;
%PUT Dataset: &dsn;
%PUT Batch: &n;
Several reserved words are available for you to print out macro variables through %PUT:
- _ALL_: Lists all macro variables in all referencing environments.
- _AUTOMATIC_: Lists all automatic macro variables.
- _GLOBAL_: Lists all user-defined global macro variables.
- _LOCAL_: Lists all user-defined local macro variables in the current referencing environment (the macro that is currently executing).
- _USER_: Lists all user-defined macro variables, both global and local.
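A few common %PUT patterns are sketched below. The macro variables are the ones defined earlier, redefined here so that the snippet runs on its own; the &= form is assumed to be available (it was introduced in SAS 9.3):

```sas
%LET libref = SASHELP;
%LET dsn = NVST;

%PUT &=libref &=dsn;               /* prints each name together with its value */
%PUT NOTE: Reading &libref..&dsn;  /* a NOTE: prefix makes the line look like a SAS note */
%PUT _USER_;                       /* lists the macro variables you have created */
```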
Automatic Macro Variables
Some macro variables are created automatically by the macro processor. You can employ these variables in the same manner as any other macro variable. Listed below are some commonly used automatic variables:
- &SYSDATE: Date that the session began executing (DATE7. form).
- &SYSDATE9: Date that the session began executing displayed with a four-digit year (DATE9. form).
- &SYSDAY: Day of the week that the session began executing.
- &SYSTIME: Time of the day that the SAS session began executing.
- &SYSLAST: Name of the last SAS data set created, in libref.dataset form (library and data set name separated by a period).
- &SYSDSN: Name of the last SAS data set created with the library and data set name separated with at least one space.
- &SYSERR: Stores the return codes of PROC and DATA steps.
- &SYSCC: Stores the overall session return code.
- &SYSPARM: Specifies a character string that can be passed into SAS programs. Usually used in the batch environment, this macro variable accesses the same value as is stored in the SYSPARM= system option and can also be retrieved using the SYSPARM() DATA step function.
- &SYSRC: Indicates the last return code from your operating environment.
- &SYSSITE: Contains the current site number.
- &SYSSCP: Gives the name of the host operating environment.
- &SYSUSERID: This macro variable stores the operating system username used for the current login session. If the operating system has not captured the user ID, &SYSUSERID receives "default."
- &SYSMACRONAME: This automatic macro variable provides the name of the macro that is currently running. &SYSMACRONAME is commonly used for documenting the execution flow of your SAS application. Note that if &SYSMACRONAME is used in open code (outside of any macro definition), it has a null value. Also note that if macros are nested, &SYSMACRONAME stores the name of the innermost macro.
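Automatic macro variables are referenced like any other macro variable. As a minimal sketch, the following statements write a few of them to the log; the DATA _NULL_ step assumes the SASHELP.CLASS sample data set and exists only so that &SYSERR has a step to report on:

```sas
%PUT Session started on &SYSDAY, &SYSDATE9 at &SYSTIME;
%PUT Running on &SYSSCP as user &SYSUSERID;

DATA _NULL_;
    SET SASHELP.CLASS (OBS=1);
RUN;

/* &SYSERR holds the return code of the step that has just finished. */
%PUT Return code of the last step: &SYSERR;
```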
Macro Functions
Numerical Evaluations on Macro Variables
Again, what a macro variable holds is neither character nor numeric. Rather, it holds a literal text value that will be substituted into your SAS code. Thus, it is generally not possible to perform arithmetic directly on a macro variable. However, we sometimes want an expression to be evaluated. One solution for such cases is the %EVAL function. When it is called, %EVAL always performs integer arithmetic. For example:
%LET A = 5;
%LET B = &A + 1;
%LET C = %EVAL(&B + 1);
%LET D = %EVAL(&A / 2);
%LET E = %EVAL(&A + 0.2);
%PUT A: &A;
%PUT B: &B;
%PUT C: &C;
%PUT D: &D;
%PUT E: &E;
We see that the macro variable &B resolves to the literal text 5 + 1, rather than being evaluated as 6. On the other hand, &C is evaluated and resolves to 7. Similarly, the %EVAL function evaluates &A / 2, but in this case it drops the fractional part and returns the whole number 2.
The %EVAL function accepts only integer operands and evaluates arithmetic and logical expressions on them. Thus, when it comes to %EVAL(&A + 0.2), SAS throws an error. For floating-point evaluations, you should use the %SYSEVALF function instead of %EVAL. For example:
%LET X = 5/3;
%PUT Default: %SYSEVALF(&X);
%PUT Bool: %SYSEVALF(&X, BOOLEAN);    /* 1, because the result is nonzero */
%PUT Ceil: %SYSEVALF(&X, CEIL);       /* 2 */
%PUT Floor: %SYSEVALF(&X, FLOOR);     /* 1 */
%PUT Integer: %SYSEVALF(&X, INTEGER); /* 1 */
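Putting the two functions side by side, the following sketch (with made-up variable names) shows where integer evaluation is enough and where %SYSEVALF is needed. The &= form of %PUT is assumed to be available (SAS 9.3 or later):

```sas
%LET n = 7;

%LET next   = %EVAL(&n + 1);       /* 8 */
%LET half   = %EVAL(&n / 2);       /* 3, the fractional part is dropped */
%LET is_big = %EVAL(&n > 5);       /* 1, comparisons return 1 (true) or 0 (false) */
%LET exact  = %SYSEVALF(&n / 2);   /* 3.5 */

%PUT &=next &=half &=is_big &=exact;
```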
Macro Functions for Text Modification
Sometimes, you may need to modify the literal text stored in a macro variable or extract information from it. Text functions can be quite helpful in these scenarios. Here are some commonly used text functions for these purposes:
- %INDEX(arg1, arg2): Searches arg1 for the first occurrence of arg2 and returns the position of the first match, or 0 if there is no match.
- %LENGTH(arg): Returns the length of its argument.
- %SCAN(arg1, arg2, <delimiters>): Returns the n-th word of arg1, where n is given by arg2. If delimiters is omitted, the same default word delimiters as in the DATA step SCAN function are used.
- %SUBSTR(arg, pos, <length>): Returns the portion of arg that starts at position pos and is length characters long. If length is omitted, it returns everything up to the end of the string.
- %UPCASE(arg): Converts all characters in arg to upper case. This function is useful when you need to compare text strings that may have inconsistent case.
For example:
%LET my_pangram = The jovial fox jumps over the lazy dog;
%LET pos_jumps = %INDEX(%UPCASE(&my_pangram), JUMPS);
%LET my_substr = %SUBSTR(&my_pangram, &pos_jumps, %LENGTH(jumps));
%PUT &my_pangram;
%PUT &pos_jumps;
%PUT &my_substr;
%LET x = XYZ.ABC/XYY;
%LET word = %SCAN(&x, 3);
%LET part = %SCAN(&x, 1, z);
%PUT WORD is &word and PART is &part;
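A typical use of these functions is taking a two-level data set name apart. The sketch below uses made-up variable names and the name SASHELP.CLASS purely as text; it does not read the data set:

```sas
%LET full = SASHELP.CLASS;

%LET lib     = %SCAN(&full, 1, .);    /* SASHELP */
%LET mem     = %SCAN(&full, 2, .);    /* CLASS */
%LET n_chars = %LENGTH(&full);        /* 13 */
%LET dot_pos = %INDEX(&full, .);      /* 8 */
%LET first3  = %SUBSTR(&mem, 1, 3);   /* CLA */

%PUT &=lib &=mem &=n_chars &=dot_pos &=first3;
```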
Defining and Using Macros
SAS Macros are an extension of the macro variables we've discussed thus far. They can perform more complex tasks beyond the capabilities of macro variables alone. You can define a macro by enclosing code blocks of your interest between %MACRO and %MEND. For example:
%LET libref = SASHELP;
%LET dsn = RETAIL;
%LET nobs = 20;
%MACRO head;
PROC PRINT DATA=&libref..&dsn (OBS=&nobs);
TITLE "First &nobs observations of &libref..&dsn";
RUN;
%MEND head;
In this example, %MACRO head; starts the definition of a SAS macro named head, and %MEND head; marks the end of the definition. Once you've defined the macro, you can call it anywhere in your SAS program using the following syntax:
%head;
When it is called, head first resolves the global macro variables libref, dsn, and nobs. Subsequently, it executes PROC PRINT based on the resolved macro variables.
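Because the references are resolved when the macro is called, not when it is defined, changing the global values changes what the macro prints. The sketch below repeats the definition so that it runs on its own, assumes the SASHELP.CLASS sample data set, and uses the MPRINT system option to show the generated code in the log:

```sas
%LET libref = SASHELP;
%LET dsn    = RETAIL;
%LET nobs   = 20;

%MACRO head;
    PROC PRINT DATA=&libref..&dsn (OBS=&nobs);
        TITLE "First &nobs observations of &libref..&dsn";
    RUN;
%MEND head;

OPTIONS MPRINT;     /* write the code generated by the macro to the log */
%head;              /* prints SASHELP.RETAIL */

%LET dsn  = CLASS;  /* change only the global values */
%LET nobs = 5;
%head;              /* the same macro now prints SASHELP.CLASS */
OPTIONS NOMPRINT;
```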
Understanding Scope of Macro Variables
Unlike data set variables, macro variables have their values stored in symbol tables in memory. These tables act as a dictionary, mapping macro variable names to their corresponding values and scope, which defines the visibility and accessibility of a macro variable. For example:
%LET global_var = global;
%MACRO show_scope;
    %LET local_var = local;
    %PUT ***** Inside the macro *****;
    %PUT Global Variable: &global_var;
    %PUT Local Variable: &local_var;
%MEND show_scope;
%show_scope;
%PUT ***** Outside the macro *****;
%PUT Global Variable: &global_var;
%PUT Local Variable: &local_var;
- Global Scope: Macro variables that are defined outside of any specific macro definition is called global macro variables. Each global variable holds a single value that is accessible to all macros throughout your program.
- Local Scope: A local macro variable's value is only accessible within the macro where it is defined or macros nested inside that macro. Since macros can call other macros, this creates a hierarchy with multiple levels of nested local symbol tables.
In the example program shown above, %LET global_var = global; defines a global macro variable named global_var. Since it is defined outside any macro definition, it has global scope and is accessible throughout the program. On the other hand, local_var is defined within the show_scope macro. Thus, this variable has local scope and is only accessible within the macro and any nested macros. As a result, the final %PUT statement, which runs outside the macro, cannot resolve &local_var, and SAS writes a warning about an unresolved symbolic reference to the log.
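The scope of a macro variable can also be set explicitly with the %LOCAL and %GLOBAL statements and inspected with %SYMGLOBL, %SYMEXIST, and %PUT _LOCAL_. The following is a small sketch; the names scope_demo, tmp, and shared are made up for illustration, and it assumes a fresh session in which no macro variable named tmp or shared already exists.

```sas
%MACRO scope_demo;
    %LOCAL tmp;                     /* tmp lives in the local symbol table of scope_demo */
    %LET tmp = inside only;

    %GLOBAL shared;                 /* shared is created in the global symbol table */
    %LET shared = available everywhere;

    %PUT _LOCAL_;                   /* lists the local macro variables of scope_demo */
%MEND scope_demo;

%scope_demo;

/* After the call, the local symbol table of scope_demo no longer exists */
%PUT shared is global: %SYMGLOBL(shared);        /* writes 1 */
%PUT tmp exists after the call: %SYMEXIST(tmp);  /* writes 0 */
```

Declaring working variables with %LOCAL is a common way to keep a macro from accidentally overwriting a global macro variable that happens to share the same name.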
Defining Macros with Parameters
When creating a macro, it is possible to define and use macro variables with %LET statements, but relying solely on %LET can quickly become cumbersome. In particular, a %LET statement inside a macro is not flexible enough to handle different values or arguments; you need to edit the macro each time you want to change a variable's value.
To address this limitation, creating macros with parameters is the preferred approach. Macro parameters allow you to pass different values into the macro during each call, making it adaptable to various scenarios. The assignment of values to the parameters is made when the macro is called, not when the macro is coded.
For macros with a small number of parameters (typically fewer than four), where the order of the parameters is clear or not very important, it is convenient to define and use a macro with positional parameters. For example:
%MACRO stacking_two_datasets(dsn1, dsn2);
DATA Output;
SET &dsn1 &dsn2;
RUN;
PROC PRINT DATA=Output;
TITLE "Dataset: &dsn1 + &dsn2";
RUN;
%MEND stacking_two_datasets;
The stacking_two_datasets macro is defined with two positional parameters, dsn1 and dsn2. Inside the macro, the references &dsn1 and &dsn2 are used directly in the SET statement of the DATA step. This macro can be called as follows:
%stacking_two_datasets(SASHELP.NVST1, SASHELP.NVST2);
When stacking_two_datasets is called, the provided values are assigned by position: SASHELP.NVST1 is assigned to dsn1 and SASHELP.NVST2 is assigned to dsn2. The macro code then uses these assigned values inside the macro body.
In essence, the order in which you list the arguments when calling the macro determines which parameter they are assigned to. So, mixing up the order of arguments can lead to errors or unexpected results.
Sometimes, you don't know in advance how many parameters your macro will need, and you would like it to accept any number of values. In such cases, you can use a single parameter as a placeholder for the whole list. For example:
%MACRO stacking_datasets(ds_list);
DATA Output;
SET &ds_list;
RUN;
PROC PRINT DATA=Output;
TITLE "Dataset: &ds_list";
RUN;
%MEND stacking_datasets;
%stacking_datasets(SASHELP.NVST1 SASHELP.NVST2 SASHELP.NVST3);
%stacking_datasets(SASHELP.NVST1 SASHELP.NVST2 SASHELP.NVST3 SASHELP.NVST4 SASHELP.NVST5);
Here, when stacking_datasets is called, &ds_list resolves to the entire list supplied in the call: the first call passes three data set names, and the second passes five. This trick lets your macro take as many data sets as needed.
Notice that there are no commas between the data set names in the macro call. In a macro call, commas separate parameters, so the whole space-separated list is passed as a single value. When resolved, the macro reference becomes that text string, and because the SET statement expects data set names separated by spaces, the list must not contain any commas.
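If you need to handle the items of such a list one at a time rather than pass the whole list to a single statement, one possible approach is to walk through it with %SCAN inside a macro loop. The sketch below is illustrative only; the macro name print_each and its working variables i and ds are hypothetical. A blank is passed to %SCAN as the only delimiter because the default delimiters include the period, which would split a name such as SASHELP.NVST1 in two.

```sas
%MACRO print_each(ds_list);
    %LOCAL i ds;
    %LET i = 1;
    %LET ds = %SCAN(&ds_list, &i, %STR( ));      /* first name in the list */

    %DO %WHILE(%LENGTH(&ds) > 0);                /* stop when no name is left */
        PROC PRINT DATA=&ds (OBS=5);
            TITLE "First 5 observations of &ds";
        RUN;

        %LET i = %EVAL(&i + 1);
        %LET ds = %SCAN(&ds_list, &i, %STR( ));  /* next name, or null at the end */
    %END;
%MEND print_each;

%print_each(SASHELP.NVST1 SASHELP.NVST2 SASHELP.NVST3);
```

Each pass through the loop generates one complete PROC PRINT step, so the call above produces three separate listings.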
In the %MACRO statement, you can also designate parameters as keywords. Unlike positional parameters, keyword parameters can be supplied in any order and may have default values assigned to them. In particular, when you have more than four parameters, want to specify default values, or want the parameter names to convey additional information, it is convenient to define a macro with keyword parameters. For example:
%MACRO head(libref=, dsn=, nobs=5, var_format_pair=);
PROC PRINT DATA=&libref..&dsn (OBS=&nobs);
TITLE "First &nobs observations of &libref..&dsn";
FORMAT &var_format_pair;
RUN;
%MEND head;
In this example, head is defined with four keyword parameters: libref, dsn, nobs, and var_format_pair. Of the four, only nobs is assigned a default value. Now, let's call this macro as follows:
%head(libref=SASHELP, dsn=RENT, var_format_pair=Date EURDFDD10. Amount EUROX12.2);
In the macro call, keyword arguments are provided for libref, dsn, and var_format_pair; nobs resolves to its default value of 5. Note that if a keyword parameter has no default value and no argument is supplied for it, it resolves to a null string.
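Because an omitted keyword parameter silently resolves to a null string, a macro may want to check for that case before generating any code. The following is one way to sketch such a guard; head_checked is a hypothetical variant of head written for this illustration, and SASHELP.CLASS is assumed only as sample data.

```sas
%MACRO head_checked(libref=SASHELP, dsn=, nobs=5);
    /* dsn has no default, so it is null unless the caller supplies it */
    %IF %LENGTH(&dsn) = 0 %THEN %DO;
        %PUT ERROR: The dsn= parameter is required.;
        %RETURN;                     /* leave the macro without generating a step */
    %END;

    PROC PRINT DATA=&libref..&dsn (OBS=&nobs);
        TITLE "First &nobs observations of &libref..&dsn";
    RUN;
%MEND head_checked;

%head_checked(nobs=3, dsn=CLASS);    /* keyword arguments can be given in any order */
%head_checked();                     /* dsn is null, so only the log message is written */
```

The first call relies on the default for libref and overrides the default for nobs; the second shows the guard stopping the macro before PROC PRINT is generated.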
In a more common scenario, you would specify both keyword and positional parameters in the %MACRO statement. In this case, however, you must list positional parameters before any keyword parameters. For example:
%MACRO stock_chart(ticker, period, int, open=*, high=*, low=*, close=*);
DATA StockSubset;
SET MyData.SP500;
WHERE Ticker = "&ticker" AND DATE >= INTNX("&int", MAX(Date), -&period);
RUN;
PROC SGPLOT DATA=StockSubset;
TITLE "&ticker Stock";
FOOTNOTE "Last &period &int";
&open SERIES X = DATE Y = OPEN;
&high SERIES X = DATE Y = HIGH;
&low SERIES X = DATE Y = LOW;
&close SERIES X = DATE Y = CLOSE;
YAXIS LABEL = 'USD';
RUN;
%MEND stock_chart;
The stock_chart macro is defined with seven parameters: ticker, period, and int are positional parameters, while the remaining four are keyword parameters. In the definition, all positional parameters are placed before the keyword parameters. Calling this macro likewise requires the positional arguments to come before any keyword arguments:
%stock_chart(ABT, 5, Year, high=, low=);
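In this line, the macro stock_chart is invoked with five arguments: ABT, 5, Year, high=, and low=. Note that each keyword parameter has a default value of *, which acts as an automatic commenting-out feature: when, for example, &open resolves to *, the generated statement begins with an asterisk, and SAS treats it as a comment. When a null value is provided, on the other hand, the SERIES statement is left intact and the associated variable is plotted. In this call, the high and low prices are plotted, while the open and close prices are skipped.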
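To try the commenting-out switch without the MyData.SP500 data set, the same idea can be sketched on a smaller scale. The macro toggle_demo below is hypothetical, and it assumes the SASHELP.STOCKS sample data set (with the variables Stock, Date, High, and Low) is available in your installation.

```sas
%MACRO toggle_demo(high=*, low=*);
    PROC SGPLOT DATA=SASHELP.STOCKS;
        WHERE Stock = "IBM";
        /* With the default *, each line below starts with an asterisk */
        /* and is read as a comment statement; with a null value, the  */
        /* SERIES statement is generated as written.                   */
        &high SERIES X = Date Y = High;
        &low  SERIES X = Date Y = Low;
    RUN;
%MEND toggle_demo;

%toggle_demo(high=);           /* plots High only */
%toggle_demo(high=, low=);     /* plots both High and Low */
```

At least one switch should be set to null in each call, since PROC SGPLOT needs at least one plot statement to draw.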
Documenting Your Macro
After creating a macro, documenting it is generally considered good practice. Well-documented macros are easier to modify and debug, as they explain the macro's purpose, functionality, and parameters, as well as its intended behavior. Furthermore, when working in a team environment, documentation promotes code sharing and reusability within the team, reducing the redundant effort of writing new code with the same functionality.
Depending on the team and project, there may be different style guides and templates for writing documentation, but they typically follow the rules listed below:
- Use descriptive parameter names, so that users can easily grasp what each parameter does.
- Supply default values, whenever reasonable defaults are available.
- Line up the parameters one per line, each with its default value and an explanation, such as the range of acceptable values or some examples.
For example:
%MACRO stock_chart(ticker, period, int, open=*, high=*, low=*, close=*);
/*
ticker     Ticker symbol of the SP500 company of interest.
period     Desired time period for analysis.
int        Unit of time interval. Available options are: Year | Month | Day
open=*     To plot open price on the chart, pass open=
high=*     To plot high price on the chart, pass high=
low=*      To plot low price on the chart, pass low=
close=*    To plot close price on the chart, pass close=
*/
    DATA StockSubset;
        SET MyData.SP500;
        WHERE Ticker = "&ticker" AND DATE >= INTNX("&int", MAX(Date), -&period);
    RUN;
    PROC SGPLOT DATA=StockSubset;
        TITLE "&ticker Stock";
        FOOTNOTE "Last &period &int";
        &open SERIES X = DATE Y = OPEN;
        &high SERIES X = DATE Y = HIGH;
        &low SERIES X = DATE Y = LOW;
        &close SERIES X = DATE Y = CLOSE;
        YAXIS LABEL = 'USD';
    RUN;
%MEND stock_chart;
[1] In modern programming languages, such as Python or Java, the functionality that macros provide has largely been replaced by variables and functions.
In data analysis, printing out observations from a data set is useful in many situations. For example, when exploring a new data set, printing out the first few observations can give you an initial sense of its structure and content. Moreover, during data cleaning, selectively printing and reviewing observations of interest makes it easier to spot outliers or missing values. Printing out observations is also essential for documentation, enhancing the reproducibility and reliability of your data reports.
In this guide, we will explore how to print observations with custom formats using PROC PRINT and PROC FORMAT with some practical examples. Let's get started!
PROC PRINT
PROC PRINT is one of the most frequently used procedures in SAS programming. As its name implies, it prints the observations stored in a data set. The basic syntax of the procedure is:
PROC PRINT DATA=MyData;
    TITLE 'Your Title';                 /* Optional title text */
    FOOTNOTE 'Your footnotes';          /* Optional footnote text */
    LABEL variable1 = 'variable one'    /* Optional labels for variables */
          variable2 = 'variable two';
RUN;
For any procedure, unless specified otherwise, SAS uses the most recently created dataset. PROC PRINT is no exception. In practice, it is almost always recommended to specify the DATA= option explicitly for clarity, as it is often hard to quickly determine which dataset was created last.
In addition to DATA=, some useful options for PROC PRINT are:
- NOOBS: By default, SAS prints the observation numbers along with the variables. If you don't want observation numbers, add the NOOBS option to the PROC PRINT statement.
- LABEL: This option uses variable labels instead of variable names in the output. It enhances the readability of your output and is particularly useful for documentation purposes.
- (OBS=n): This data set option prints only the first n observations.
The following code shows all of these options together:
PROC PRINT DATA=MyData.Boston (OBS=20) NOOBS LABEL;
    TITLE1 'Boston Housing Dataset';
    TITLE2 'First 20 Obs';
    FOOTNOTE 'http://lib.stat.cmu.edu/datasets/boston';
    LABEL CRIM = 'Crime rate'
          ZN = '% residential area'
          INDUS = '% non-retail business area'
          CHAS = 'Riverside'
          NOX = 'Nitric oxides'
          RM = 'Rooms per dwelling'
          AGE = '% old units'
          DIS = 'Distance to business centers'
          RAD = 'Radial highway accessibility'
          TAX = 'Tax per $10,000'
          PTRATIO = 'Pupil-teacher ratio'
          LSTAT = '% lower status of the population';
RUN;
Optionally, you can also add the following statements to PROC PRINT:
- BY variable-list;
  - In the context of PROC PRINT, the BY statement starts a new section in the output for each new value of the BY variables and prints the values of the BY variables at the top of each section. Note that the data must be presorted by the BY variables.
- ID variable-list;
  - When you use the ID statement, the observation numbers are not printed. Instead, the variables in the ID variable list appear on the left-hand side of the page.
- SUM variable-list;
  - The SUM statement prints sums for the variables in the list.
- VAR variable-list;
  - The VAR statement specifies which variables to print and in what order. Without a VAR statement, all variables in the SAS dataset are printed in the order in which they occur in the dataset.
- FORMAT variable format;
  - You can change the appearance of printed values using standard data formats.
  - For numeric values, you can specify a format along with the width w and the number of decimals d (formatw.d). Note that the period and the d decimal places also count toward w. For example, 5.3 can display values up to 9.999.
  - For character values, you must put a dollar sign in front to indicate that it is a character format ($formatw.). It takes only the width w.
  - Internally, the only two data types a SAS dataset can have are numeric and character. Date values are stored as the number of days since Jan 1, 1960. Thus, to display them as actual dates, you must specify a date format.
For example:
PROC SORT DATA=SASHELP.BASEBALL OUT=SortedByTeam;
    BY Team;
RUN;
PROC PRINT DATA=SortedByTeam;
    TITLE "86's MLB Players";
    BY Team;
    SUM nAtBat nHits nHome nRuns nRBI nBB nOuts nAssts nError;
    VAR Name nAtBat nHits nHome nRuns nRBI nBB nOuts nAssts nError Salary;
    FORMAT Salary DOLLAR13.2;
RUN;
This program prints observations from the SASHELP.BASEBALL data set after sorting them by Team. For each Team, PROC PRINT prints the values of the variables listed in the VAR statement and calculates the sums of the variables listed in the SUM statement. The Salary values are displayed with the DOLLAR13.2 format.
Here are some selected standard data formats that are commonly employed:
Format | Description | Input value | Format used | Result |
Character | ||||
$UPCASEw. | Converts character values to upper case. w ranges 1-32767, defaults to 8. | my cat | $UPCASE6. | MY CAT |
$w. | Writes standard character data - does not trim leading blanks (same as $CHARw.). w ranges 1-32767, defaults to 1. | my cat | $8. '*' | my cat  * |
 | | my snake | $8. '*' | my snake* |
Date, Time, and Datetime | ||||
DATEw. | Writes SAS date values in form ddmmmyy or ddmmmyyyy. w ranges 5-11, defaults to 7. | 8966 | DATE7. | 19JUL84 |
 | | 8966 | DATE9. | 19JUL1984 |
DATETIMEw.d | Writes SAS datetime values in form ddmmmyy:hh:mm:ss.ss. w ranges 7-40, defaults to 16. | 12182 | DATETIME13. | 01JAN60:03:23 |
 | | 12182 | DATETIME18.1 | 01JAN60:03:23:02.0 |
DTDATEw. | Writes SAS datetime values in form ddmmmyy or ddmmmyyyy. w ranges 5-9, defaults to 7. | 12182 | DTDATE7. | 01JAN60 |
 | | 12182 | DTDATE9. | 01JAN1960 |
EURDFDDw. | Writes SAS date values in form dd.mm.yy or dd.mm.yyyy. w ranges 2-10, defaults to 8. | 8966 | EURDFDD8. | 19.07.84 |
 | | 8966 | EURDFDD10. | 19.07.1984 |
JULIANw. | Writes SAS date values in Julian date form yyddd or yyyyddd. w ranges 5-7, defaults to 5. | 8966 | JULIAN5. | 84201 |
 | | 8966 | JULIAN7. | 1984201 |
MMDDYYw. | Writes SAS date values in form mm/dd/yy or mm/dd/yyyy. w ranges 2-10, defaults to 8. | 8966 | MMDDYY8. | 07/19/84 |
 | | 8966 | MMDDYY6. | 071984 |
TIMEw.d | Writes SAS time values in form hh:mm:ss.ss. w ranges 2-20, defaults to 8. | 12182 | TIME8. | 3:23:02 |
 | | 12182 | TIME11.2 | 3:23:02.00 |
WEEKDATEw. | Writes SAS date values in form day-of-week, month-name dd, yy or yyyy. w ranges 3-37, defaults to 29. | 8966 | WEEKDATE15. | Thu, Jul 19, 84 |
 | | 8966 | WEEKDATE29. | Thursday, July 19, 1984 |
WORDDATEw. | Writes SAS date values in form month-name dd, yyyy. w ranges 3-32, defaults to 18. | 8966 | WORDDATE12. | Jul 19, 1984 |
 | | 8966 | WORDDATE18. | July 19, 1984 |
Numeric | ||||
BESTw. | SAS decides best format - default format for numeric data. w ranges 1-32. | 1200001 | BEST6. | 1.20E6 |
 | | 1200001 | BEST8. | 1200001 |
COMMAw.d | Writes numbers with commas. w ranges 2-32, defaults to 6. | 1200001 | COMMA9. | 1,200,001 |
 | | 1200001 | COMMA12.2 | 1,200,001.00 |
DOLLARw.d | Writes numbers with a leading $ and commas separating every three digits. w ranges 2-32, defaults to 6. | 1200001 | DOLLAR10. | $1,200,001 |
 | | 1200001 | DOLLAR13.2 | $1,200,001.00 |
Ew. | Writes numbers in scientific notation. w ranges 7-32, defaults to 12. | 1200001 | E7. | 1.2E+06 |
EUROXw.d | Writes numbers with a leading € and periods separating every three digits. w ranges 2-32, defaults to 6. | 1200001 | EUROX13.2 | €1.200.001,00 |
PERCENTw.d | Writes numeric data as percentages. w ranges 4-32, defaults to 6. | 0.05 | PERCENT9.2 | 5.00% |
w.d | Writes standard numeric data. w ranges 1-32. | 23.635 | 6.3 | 23.635 |
 | | 23.635 | 5.2 | 23.64 |
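A quick way to see a few of these formats in action is to build a one-row data set and print it with a FORMAT statement. The data set FormatDemo and its variables below are made up for this sketch; the values mirror the input values used in the table.

```sas
DATA FormatDemo;
    Amount   = 1200001;
    SaleDate = '19JUL1984'D;    /* date literal, stored internally as 8966 */
    Share    = 0.05;
RUN;

PROC PRINT DATA=FormatDemo NOOBS;
    FORMAT Amount   COMMA12.2
           SaleDate DATE9.
           Share    PERCENT9.2;
RUN;
```

The formats only change how the values are displayed; the values stored in FormatDemo remain the raw numbers.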
PROC FORMAT
Occasionally, standard data formats listed earlier are not enough for particular requirements, prompting the need for custom formats tailored to your specific needs. For example, let's consider a data set introduced below:
The data set contains 913 responses from a survey on perceptions and practices of using Wikipedia as a teaching resource conducted among faculty members from two different universities located in Barcelona, Spain. Excluding AGE and YEARSEXP, 51 variables are coded as follows:
- GENDER: 0=Male; 1=Female
- DOMAIN: 1=Arts & Humanities; 2=Science; 3=Health Sciences; 4=Engineering & Architecture; 5=Law & Politics
- UNIVERSITY: 1=UOC (Universitat Oberta de Catalunya); 2=UPF (Universitat Pompeu Fabra)
- UOC_POSITION and OTHER_POSITION: 1=Professor, 2=Associate, 3=Assistant, 4=Lecturer, 5=Instructor, 6=Adjunct
- OTHERSTATUS, PhD, and USERWIKI: 0=No; 1=Yes
- All remaining survey items are 5-point Likert scales: 1=Strongly disagree/Never; 2=Disagree/Rarely; 3=Neither agree or disagree/Sometimes; 4=Agree/Often; 5=Strongly agree/Always
Printing this data set with user-defined formats is very convenient, as it removes the need for a data code book to interpret the values. In SAS, PROC FORMAT creates custom formats that can later be associated with variables in a FORMAT statement. The basic syntax of PROC FORMAT is as follows:
PROC FORMAT;
    VALUE name range-1 = 'formatted-text-1'
               range-2 = 'formatted-text-2'
               ⁞
               range-n = 'formatted-text-n';
RUN;
Here, name is the name of the format you are creating. Note that if the format is for character data, the name must start with a dollar sign ($name). Format names must be unique, can be up to 32 characters long (including the $ for character data), must not start or end with a number, and cannot contain any special characters except underscores.
In the VALUE statement, each range represents a value (or range of values) of a variable that is mapped to the text given in quotation marks on the right side of the equal sign. These formatted texts can be up to 32,767 characters long, but some procedures print only the first 8 or 16 characters.
PROC FORMAT;
    /* -< excludes the upper bound, so the age ranges do not overlap */
    VALUE Fmt_AgeGroup
        LOW -< 40 = "Under 40"
        40 -< 65  = "40 to under 65"
        65 - HIGH = "65 and over";
    VALUE Fmt_BinaryAnswer
        0 = "No"
        1 = "Yes";
    VALUE Fmt_University
        1 = "UOC"
        2 = "UPF";
    VALUE Fmt_Position
        1 = "Professor"
        2 = "Associate"
        3 = "Assistant"
        4 = "Lecturer"
        5 = "Instructor"
        6 = "Adjunct";
    VALUE Fmt_Gender
        0 = "Male"
        1 = "Female";
    VALUE Fmt_Likert
        1 = "Strongly disagree / Never"
        2 = "Disagree / Rarely"
        3 = "Neither agree or disagree / Sometimes"
        4 = "Agree / Often"
        5 = "Strongly agree / Always";
    VALUE Fmt_Domain
        1 = "Arts & Humanities"
        2 = "Science"
        3 = "Health Sciences"
        4 = "Engineering & Architecture"
        5 = "Law & Politics"
        OTHER = "Others";
RUN;
PROC PRINT DATA=MyData.Wiki4HE;
    TITLE "Wiki4HE";
    FORMAT AGE Fmt_AgeGroup.
           GENDER Fmt_Gender.
           DOMAIN Fmt_Domain.
           PhD Fmt_BinaryAnswer.
           USERWIKI Fmt_BinaryAnswer.
           OTHERSTATUS Fmt_BinaryAnswer.
           UNIVERSITY Fmt_University.
           UOC_POSITION Fmt_Position.
           OTHER_POSITION Fmt_Position.
           PU1 -- Exp5 Fmt_Likert.;
RUN;
In the SAS program above, PROC FORMAT defines several user-defined formats (UDFs) that assign labels to numeric codes. Each VALUE statement creates a UDF with a name and mappings between numeric values and corresponding character labels. The keywords LOW and HIGH are used in ranges to indicate the lowest and highest non-missing values, respectively. The OTHER keyword is used to assign a format to any values not listed in the VALUE statement.
Subsequently, in PROC PRINT, the UDFs created by PROC FORMAT are applied: AGE uses Fmt_AgeGroup, GENDER uses Fmt_Gender, and so forth. Note that PU1 -- Exp5 means all variables from PU1 through Exp5, in the order in which they are stored in the data set.
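All of the formats above are numeric. For character variables, the format name starts with a dollar sign and the ranges are quoted strings. The following is a minimal sketch using the SASHELP.CLASS sample data set, whose Sex variable holds the codes F and M; the format name $Fmt_Sex is made up for this illustration.

```sas
PROC FORMAT;
    VALUE $Fmt_Sex
        "F"   = "Female"
        "M"   = "Male"
        OTHER = "Unknown";
RUN;

PROC PRINT DATA=SASHELP.CLASS (OBS=5) NOOBS;
    VAR Name Sex Age;
    FORMAT Sex $Fmt_Sex.;    /* the period marks Fmt_Sex as a format name */
RUN;
```

As with the numeric formats, the stored values of Sex stay as F and M; only their printed appearance changes.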