Introduction to SAS and Hadoop

Duration: 16 hours

This course teaches you how to use SAS programming methods to read, write, and manipulate Hadoop data. The Base SAS methods covered include reading and writing raw data with the DATA step, managing the Hadoop file system, and executing MapReduce and Pig code from SAS via the HADOOP procedure. The course also covers SAS/ACCESS Interface to Hadoop methods, which use LIBNAME access and SQL pass-through techniques to read and write Hive or Cloudera Impala table structures. Although they are not covered in detail, the course gives a brief overview of additional SAS and Hadoop technologies, including DS2, high-performance analytics, SAS LASR Server, and In-Memory Statistics, as well as the computing infrastructure and data access methods that support them. This course is included in the Expert Exchange on Hadoop: Using SAS/ACCESS service offering, which helps you configure SAS/ACCESS Interface to Hadoop or SAS/ACCESS Interface to Impala to work with your Hadoop environment.

Learn how to:

  • Access Hadoop distributions using the LIBNAME statement and the SQL pass-through facility
  • Create and use SQL procedure pass-through queries
  • Use options and efficiency techniques for optimizing data access performance
  • Join data using the SQL procedure and the DATA step
  • Read and write Hadoop files with the FILENAME statement
  • Execute and use Hadoop commands with PROC HADOOP
  • Use Base SAS procedures with Hadoop

Who should attend: SAS programmers who need to access data in Hadoop from within SAS.

Prerequisites:

Before attending this course, you should be comfortable programming in SAS and Structured Query Language (SQL). You can gain SQL experience from the SQL1 – Essentials course and SAS knowledge from the Programming 1 – Essentials course. A working knowledge of Hadoop is helpful.

This course addresses Base SAS software and SAS/ACCESS Interface to Hadoop.