SAS PROC Step Basics

SAS procedures (PROCs) are pre-built routines that are integrated into the SAS environment. They are tested and maintained by SAS, and most analysis and reporting tasks, along with many data manipulation tasks, are carried out through them. By providing well-tested code as PROCs, SAS promotes statistical reliability, as well as the maintainability and reusability of common operations across a SAS program. Here's a breakdown of what PROCs can do:

  • Data Analysis and Manipulation: PROCs can sort, summarize, and analyze data. This includes calculating statistics, creating tables and reports, and performing various statistical tests. (e.g., PROC MEANS for descriptive statistics, PROC FREQ for frequency tables)
  • Reporting and Visualization: PROCs can generate various charts, graphs, and formatted reports to visualize your data analysis. (e.g., procedures in SAS/GRAPH for creating visualizations)
  • SQL Queries: Some PROCs allow you to write SQL queries within SAS to interact with relational databases. (e.g., PROC SQL for database queries)

In terms of data manipulation, unlike the DATA step, whose main purpose is to create a new SAS dataset, most PROC steps do not create a new dataset unless you ask for one (for example, through an output option such as OUT= in PROC SORT). Instead, they apply pre-built operations, such as sorting, to an existing SAS dataset to facilitate further data analysis.

PROC Statement

Each PROC has its own set of options and statements, but all of them follow the same basic form. A PROC step begins with the keyword PROC, followed by the procedure's name, such as CONTENTS, MEANS, or SORT. Any options come after the name. For example, the DATA= option specifies which SAS dataset to use as input for the procedure. If it is omitted, SAS uses the most recently created dataset, which is not necessarily the dataset you intend to use.

To apply a procedure to a permanent SAS dataset, specify the dataset's two-level name (libref.dataset) in the DATA= option. For example:
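A minimal sketch, assuming the sample dataset SASHELP.CARS that ships with SAS is available (the choice of library and dataset is ours, suggested by the Make and MSRP variables used later in this post):

```sas
/* Assumption: SASHELP.CARS is available in your SAS installation. */
/* SASHELP is the libref; CARS is the dataset name.                */
PROC CONTENTS DATA=sashelp.cars;
RUN;
```

Here PROC CONTENTS simply describes the dataset (variables, types, lengths), which makes it a low-risk way to check that the two-level name resolves.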


TITLE and FOOTNOTE Statements

The TITLE and FOOTNOTE statements are used to add titles and footnotes, respectively, to your PROC result. Both TITLE and FOOTNOTE statements are global statements, meaning that they are technically not a part of any PROC or DATA step. However, considering that the statements apply to the procedure output, it generally makes sense to put them with the procedure.

The TITLE statement consists of the keyword TITLE followed by your desired title enclosed in quotation marks. The FOOTNOTE statement follows the same syntax, with the keyword FOOTNOTE preceding your footnote text enclosed in quotation marks. You can use either single or double quotation marks. For plain text the choice is a matter of preference; the one functional difference is that macro variable references (such as &SYSDATE) are resolved only inside double quotation marks.

If your title or footnote text contains an apostrophe, you have two options: you can either enclose the text in double quotation marks, or keep the single quotation marks and write the apostrophe twice (two consecutive single quotation marks). For example:
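Both options can be sketched as follows. The title and footnote texts are made up for illustration, and SASHELP.CARS is assumed to be available:

```sas
/* Option 1: enclose the text in double quotation marks */
TITLE "Here's the Title";

/* Option 2: keep single quotation marks and write the apostrophe twice */
FOOTNOTE 'Here''s the Footnote';

/* Assumption: SASHELP.CARS sample dataset; OBS=5 keeps the output short */
PROC PRINT DATA=sashelp.cars (OBS=5);
RUN;
```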


Titles and footnotes stay in effect until you replace them with new ones or cancel them with a null statement. For example:
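A short sketch of cancelling titles and footnotes with null statements, again assuming the SASHELP.CARS sample dataset and illustrative texts:

```sas
TITLE 'Cars Listing';
FOOTNOTE 'Source: SASHELP.CARS';

/* This output carries the title and the footnote */
PROC PRINT DATA=sashelp.cars (OBS=5);
RUN;

/* Null statements cancel all current titles and footnotes */
TITLE;
FOOTNOTE;

/* This output carries neither */
PROC PRINT DATA=sashelp.cars (OBS=5);
RUN;
```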


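When you specify a new title or footnote, it replaces the existing one with the same number and cancels those with higher numbers. You can define up to 10 title lines (TITLE1 to TITLE10) and 10 footnote lines (FOOTNOTE1 to FOOTNOTE10); TITLE without a number is equivalent to TITLE1. For example: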
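The numbering behavior can be sketched like this (title texts are illustrative, and SASHELP.CARS is an assumed sample dataset):

```sas
TITLE1 'Main Title';
TITLE2 'Subtitle';
TITLE3 'Third Line';

/* Output shows all three title lines */
PROC PRINT DATA=sashelp.cars (OBS=5);
RUN;

/* Replaces TITLE2 and cancels TITLE3; TITLE1 is left unchanged */
TITLE2 'New Subtitle';

/* Output shows 'Main Title' and 'New Subtitle' only */
PROC PRINT DATA=sashelp.cars (OBS=5);
RUN;
```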


LABEL Statements

By default, SAS uses variable names to label your output. However, if you require more descriptive names for your variables, you can create them using the LABEL statements. Each label can be up to 256 characters long. For example:
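A minimal sketch, assuming the SASHELP.CARS sample dataset; the label texts are our own:

```sas
/* The LABEL option tells PROC PRINT to display labels instead of names */
PROC PRINT DATA=sashelp.cars (OBS=5) LABEL;
    VAR Make Model MSRP;
    LABEL Make = 'Manufacturer'
          MSRP = 'Suggested Retail Price (USD)';
RUN;
```

PROC PRINT shows labels only when the LABEL (or SPLIT=) option is specified; most other procedures use labels automatically.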


Note that when a LABEL statement is used in a DATA step, the labels become part of the dataset. On the other hand, when used in a PROC step, the labels stay in effect only for the duration of that particular step.

BY Statement

The BY statement names the variable(s) that determine how a procedure processes the observations. It is required for PROC SORT, where it specifies the variables to sort by. For all other PROCs, the BY statement is optional.

The variables listed in the BY statement are referred to as BY variables. When used in a PROC, other than PROC SORT, the BY statement instructs SAS to perform separate analyses for each unique combination of the BY variable values. However, it is important to note that for this functionality to work, a SAS dataset must be pre-sorted by the BY variables, typically achieved through PROC SORT. Otherwise, SAS will throw an error. For example:
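As a sketch of the failing case, assume the SASHELP.CARS sample dataset, whose observations are not ordered by the Origin variable:

```sas
/* Assumption: SASHELP.CARS is not sorted by Origin.               */
/* BY-group processing is therefore expected to stop with an error */
/* in the log saying the dataset is not sorted in ascending        */
/* sequence.                                                       */
PROC MEANS DATA=sashelp.cars;
    BY Origin;
    VAR MSRP;
RUN;
```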


The SAS log reports an error because we used a BY variable in PROC MEANS without first sorting the observations by that variable. If the observations are sorted by the BY variable, SAS applies the MEANS procedure separately to each unique value of the variable. For example:
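A sketch of the working case, under the same assumption about SASHELP.CARS. The output dataset name work.cars_sorted is hypothetical; OUT= is used so that the sorted copy is written to the WORK library and the sample dataset is left untouched:

```sas
/* Step 1: sort a copy of the data by the BY variable */
PROC SORT DATA=sashelp.cars OUT=work.cars_sorted;
    BY Origin;
RUN;

/* Step 2: one set of MSRP statistics per value of Origin */
PROC MEANS DATA=work.cars_sorted;
    BY Origin;
    VAR MSRP;
RUN;
```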



Subsetting in Procedures with the WHERE Statement

One optional statement for any PROC that reads a SAS dataset is the WHERE statement. It allows you to specify a subset of the data to be used in the analysis. While you can also achieve this through a DATA step with IF statements, the WHERE statement serves as a convenient shortcut. Unlike subsetting IFs, which create a new SAS dataset after filtering, the WHERE statement in a PROC directly filters observations and applies the procedure on the current dataset. Thus, it is typically more efficient to use the WHERE statement than to first use subsetting IFs and then apply the procedure.
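The two approaches can be compared side by side. This is a sketch that assumes the SASHELP.CARS sample dataset; work.acura is a hypothetical name for the intermediate dataset:

```sas
/* Two steps: a subsetting IF creates a new dataset, then the PROC runs */
DATA work.acura;
    SET sashelp.cars;
    IF Make = 'Acura';
RUN;

PROC MEANS DATA=work.acura;
    VAR MSRP;
RUN;

/* One step: WHERE filters the observations as the PROC reads them */
PROC MEANS DATA=sashelp.cars;
    WHERE Make = 'Acura';
    VAR MSRP;
RUN;
```

Both PROC MEANS steps should summarize the same observations; the second avoids writing the intermediate dataset.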

 

Here are the most frequently used operators for conditional expressions:

Symbolic      Mnemonic          Example
=             EQ                WHERE Make = 'Acura';
^=, ~=, <>    NE                WHERE Make ^= 'Acura';
>             GT                WHERE MSRP > 40000;
<             LT                WHERE MSRP < 40000;
>=            GE                WHERE MSRP >= 40000;
<=            LE                WHERE MSRP <= 40000;
&             AND               WHERE Make = 'Acura' AND MSRP <= 40000;
|, !          OR                WHERE Make = 'Acura' OR Make = 'Audi';
              IS NOT MISSING    WHERE MSRP IS NOT MISSING;
              BETWEEN AND       WHERE MSRP BETWEEN 30000 AND 40000;
              CONTAINS          WHERE Make CONTAINS 'ura';
              IN (LIST)         WHERE Make IN ('Acura', 'Audi', 'BMW');
