Getting Started with SAS Programming Using SAS OnDemand for Academics

SAS (Statistical Analysis System) is a comprehensive software suite for advanced analytics, data management, business intelligence, and predictive analysis. Initially developed as a project to support agricultural research at North Carolina State University, SAS gained widespread adoption beyond academia and once stood as the top choice for enterprises seeking robust data solutions. In recent years, however, SAS has faced increasing challenges in the rapidly evolving data analytics market. While it remains a preferred choice in some highly regulated industries, such as finance and pharmaceuticals, its high subscription costs and steep learning curve have made it less accessible to startups and individual learners. The rise of open-source tools, driven by collaborative innovation in the data science community, has further highlighted the limitations of SAS as a proprietary software suite. Its more limited access to research and talent pools has slowed SAS's adoption of cutting-edge AI technologies, making it appear increasingly outdated in a fast-paced industry.

In response to these challenges, SAS introduced SAS Studio, a web-based interface for writing and executing SAS programs without installation, and SAS OnDemand for Academics, which provides free access to SAS Studio. This initiative aims to make SAS more accessible to individual learners and cultivate a new generation of data scientists equipped with SAS programming skills. 

First Look at SAS Studio

Let's navigate to SAS OnDemand for Academics. Signing in to your account and clicking "Launch" starts a new session. The user interface consists of three major parts: the top menu, the navigation pane, and the work area.

  1. Top Menu: Overall application controls and functionalities of SAS Studio:
    • Search for and open files that have been uploaded to your SAS Studio environment.
    • Switch between the SAS Programmer and Visual Programmer perspectives.
    • Customize your SAS Studio work environment.
  2. Navigation Pane: Sections to manage and organize your work files:
    • Server Files and Folders: Lets you browse and access the files stored in your SAS Studio environment.
    • Tasks and Utilities: A collection of predefined tasks and workflows that you can readily use for common data processing needs.
    • Snippets: A collection of code snippets for common data processing tasks. You can also create your own for later use.
    • Libraries: Lists the SAS libraries available in your session and the SAS data sets they contain.
    • File Shortcuts: Lets you create and manage file shortcuts.
  3. Work Area: The main space where you create your SAS programs, each in either the SAS Programmer or the Visual Programmer perspective.

SAS Studio has two "perspectives" for different user needs: SAS Programmer and Visual Programmer. SAS Programmer is the default perspective when you open SAS Studio. It allows you to write, edit, run, and debug SAS code directly. Program files created in this perspective have the .sas extension.

The Visual Programmer perspective, on the other hand, allows you to build workflows. In this perspective, you can drag and drop files and items from the "Tasks and Utilities" section on the left panel. You can also add your own SAS program to the Process Flow: click the + sign on the menu bar under the work area and select SAS Program. Double-clicking the new node opens a text editor where you can write and run your SAS program.

In the Visual Programmer perspective, you can see the whole process of your data analysis project at a glance. Nodes are connected in the order in which the data is processed, so you can visually confirm the workflow. The working files are saved as Process Flow files with the .cpf extension.

How to Upload External Data Files and Import Them into a SAS Dataset

SAS Studio is a cloud application. Before any data processing, you must first upload the data files stored on your local machine. To upload one, go to the "Server Files and Folders" section on the navigation pane and click "Files". Next, select the destination folder under "Files" and click the "Upload" button.

After uploading the selected file, the next step is to create a new SAS dataset. SAS cannot directly process raw data files like CSV or Excel. These files must first be imported into a SAS dataset. The easiest way is to use the point-and-click interface of SAS Studio.

Right-click on the uploaded data file and select "Import Data".

By default, the imported SAS dataset is saved in the temporary WORK library and named "Import", so click "Change" and replace the library and dataset name. Next, in the "Start reading data at row" field, enter the row number at which data reading should start. In a SAS dataset, every column must have an appropriate length and data type. The "Guessing rows" field sets the number of rows SAS reads to determine these attributes. Choose a number that is smaller than the total number of rows but still reasonably large. When everything is set, click the "Save" and "Run" buttons to start the import.
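The same import can also be written as code. The following is a minimal sketch using PROC IMPORT, assuming a CSV file with a header row; the path, the file name sales.csv, and the output name work.sales are hypothetical placeholders, not values from this walkthrough.

```sas
/* Hypothetical path and names: replace your-user-name, sales.csv,
   and work.sales with your own. */
proc import datafile='/home/your-user-name/sales.csv'
    out=work.sales      /* library.dataset to create */
    dbms=csv            /* file type of the source */
    replace;            /* overwrite the output dataset if it exists */
    getnames=yes;       /* take column names from the first row */
    datarow=2;          /* start reading data at row 2 */
    guessingrows=1000;  /* rows scanned to infer types and lengths */
run;
```

Here DATAROW and GUESSINGROWS play the roles of the "Start reading data at row" and "Guessing rows" fields in the dialog.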

SAS Libraries

A SAS library is essentially a storage location for SAS datasets: it groups related datasets under a specific name and provides a reference you can use in your code. By default, all SAS datasets are stored temporarily in the WORK library and are automatically deleted at the end of the current session. To avoid this, create a new library for your project and store your datasets there.

One of the ways to create a new SAS library is the LIBNAME statement. Here's the basic syntax:

LIBNAME MyData '/path/to/your/library';

In SAS OnDemand for Academics (SAS ODA), your library paths will always begin with '/home/your-user-name/'. You can find your user name in the bottom-right corner of the browser window.
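As a short sketch of how a permanent library is used, the example below assigns a library and copies a dataset into it. The folder name myproject is a hypothetical placeholder and is assumed to already exist under your home directory; sashelp.class is a sample table that ships with SAS.

```sas
/* Hypothetical folder: it must already exist under your home directory. */
libname mydata '/home/your-user-name/myproject';

/* Copy a sample table into the permanent library.
   The two-level name is library.dataset. */
data mydata.class;
    set sashelp.class;
run;
```

A dataset written as mydata.class stays on disk after the session ends, whereas one written as work.class would be deleted.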

Alternatively, you can create a library through SAS ODA's graphical interface. Navigate to the "Libraries" section on the left panel, then right-click "My Libraries". You will find the "New Library" option.

In the "New Library" window:

  • Name: Specify the name for the new library. It should be descriptive and no longer than 8 characters, must start with a letter or an underscore, and can contain only letters, digits, and underscores (no blanks or other special characters).
  • Path: The directory where the library is located, equivalent to the file path specified in a LIBNAME statement. You may click the Browse button to select the directory for the new library.
  • Re-create this Library at start-up: Optionally, you can have SAS "remember" the new library every time you start a new SAS Studio session. This ensures that the library is available in each new session.

Getting Started with the SAS Programming Language

The SAS programming language is a specialized tool designed for working with SAS products. It is part of the SAS software suite and is geared toward tasks such as data manipulation, statistical modeling, and reporting. Some might argue that point-and-click is good enough for SAS and that there is no need to learn the language. However, although basic data analysis can be performed through SAS's menu-driven interface, mastering the language allows users to understand the underlying processes, automate repetitive tasks, and create highly customized analysis workflows.

In essence, a SAS program is a collection of steps, each of which is either a DATA step or a PROC step. The DATA step creates a new SAS dataset by reading an external data file or another existing SAS dataset. During this process, the data is manipulated by the SAS statements inside the step, giving users far more flexibility than the basic menu-driven import interface introduced earlier.

The PROC step, on the other hand, is used to manipulate and analyze existing SAS datasets. SAS procedures cover a wide range of tasks, such as summarizing data, running statistical analyses, and creating visualizations. Note that the PROC step relies solely on SAS's predefined procedures; you cannot create your own procedures as you can in PL/SQL, and the only thing you can do is specify options, which vary depending on the procedure being used.[1]
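To make the two kinds of steps concrete, here is a minimal sketch with one DATA step followed by one PROC step. It uses sashelp.class, a sample table that ships with SAS; the output name work.class_metric and the new column names are illustrative choices.

```sas
/* DATA step: build a new dataset from an existing one. */
data work.class_metric;
    set sashelp.class;              /* read the sample table row by row */
    height_cm = height * 2.54;      /* inches to centimeters */
    weight_kg = weight * 0.4536;    /* pounds to kilograms */
run;

/* PROC step: summarize the new dataset with a predefined procedure. */
proc means data=work.class_metric mean min max;
    var height_cm weight_kg;        /* columns to summarize */
run;
```

The keywords after the dataset name in PROC MEANS (mean, min, max) are options that select which statistics the procedure reports.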

Within a SAS step, you can provide some instructions through SAS statements. Here's a breakdown of the basic syntax of a SAS statement:

  • SAS Keywords: Every SAS statement begins with a keyword that specifies the action you want SAS to undertake. For example, the DATA keyword initiates a new DATA step, PROC PRINT prints the data set most recently created, etc.
  • Options/References: Following keywords, you may also include options or references to provide additional details for the action:
    • Literals: Numeric or text string values.
    • Variables[2]: Named column vectors in your SAS dataset holding data values. 
    • Expressions: Combinations of variables, literals, and operators.
    • Options: Additional specifications modifying the behavior of SAS statements.
  • Semicolon: Every SAS statement must end with a semicolon. It signals to SAS that the instruction is complete. Omitting a semicolon is a very common mistake that even experienced SAS programmers often make, so double-check that you haven't forgotten one in your SAS statements.
  • Case: SAS keywords and the syntax itself are not case sensitive, meaning they can be written in upper- or lowercase with no difference in functionality.

There really aren't any widely accepted rules about how to format a SAS program. SAS statements can start in any column, continue on the next line (as long as you don't split a word in two), or share a line with other statements. However, neatly organizing your SAS code is always helpful, as it makes your program more maintainable.
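As an illustration of these syntax and layout rules, the two steps below are equivalent as far as SAS is concerned; only the casing and line layout differ. The dataset sashelp.class is a sample table that ships with SAS.

```sas
/* Compact layout: lowercase keywords, all statements on one line. */
proc print data=sashelp.class; where age > 13; var name age; run;

/* Same step: uppercase keywords, one statement per line, indented. */
PROC PRINT DATA=sashelp.class;
    WHERE age > 13;    /* expression: variable, operator, literal */
    VAR name age;      /* variables (columns) to print */
RUN;
```

Each statement starts with a keyword (PROC, WHERE, VAR, RUN) and ends with a semicolon, regardless of how it is laid out.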

Adding Comments

As in any other programming language, you can add comments for code reviewers. There are two main ways to include comments in a SAS program:

  • Single-line comments:
    • Start the comment with an asterisk (*).
    • All text up to the next semicolon (;) is treated as a comment and ignored by SAS.
  • Multi-line comments:
    • Start the comment with /* and end it with */.
    • Everything between /* and */ is treated as a comment, even if it spans multiple lines.
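Both comment styles can be sketched as follows, again using the sample table sashelp.class. Note that the asterisk style ends at the first semicolon, so the comment text itself must not contain one.

```sas
* Print the first rows of the sample table;
proc print data=sashelp.class (obs=5);
run;

/* A block comment can span
   several lines and may contain semicolons; */
proc means data=sashelp.class;
    var height weight;   /* it can also follow a statement on the same line */
run;
```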

Programming Tips

People with no prior programming experience often get frustrated when their programs don't work correctly on the first try. Don't try to tackle a long, complicated program all at once. By starting small, building on what works, and consistently checking your results, you can work far more efficiently.

Even if you get errors, don't get discouraged. Even experienced SAS programmers make simple mistakes: they forget a semicolon, misspell a word, or place statements in the wrong order. These small mistakes can cause a whole list of errors. Sometimes programs run without throwing any errors and are still incorrect, so it is always good practice to test your code on small cases.
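One way to test on a small case is to limit how many rows a step reads while you develop it. The sketch below uses the OBS= dataset option on the sample table sashelp.class; the output name work.class_small and the bmi column are illustrative.

```sas
/* Develop against the first 5 rows only. */
data work.class_small;
    set sashelp.class (obs=5);
    bmi = (weight / height**2) * 703;   /* BMI from pounds and inches */
run;

/* Inspect the result before scaling up. */
proc print data=work.class_small;
run;
```

Once the output and the Log tab look right, remove (obs=5) to run the step on the full table.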


[1] This is why the PROC step is not considered a full programming language on its own: you can only call the procedures SAS provides and configure them through options. The DATA step does support assignment statements, conditional branching (IF-THEN/ELSE), and iteration (DO loops) over the rows it processes, while conditional or repeated generation of entire steps is handled by the SAS macro facility.
[2] This is not a variable in the general programming sense of an identifier bound to a value. Rather, it refers to a table column in a SAS dataset.
