If you wish to reference a file in S3 from a Pig script, you might do something like this:

set fs.s3n.awsSecretAccessKey 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
set fs.s3n.awsAccessKeyId 'xxxxxxxxxxxxxxxxxxxxx';
A = load 's3n://<bucket>/<path-to-file>' USING TextLoader;

If you're on HDP 2.2.6, you'll likely see this error:

Error: java.io.IOException: No FileSystem for scheme: s3n

The following steps resolve this issue:

In core-site.xml add:

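<property>
  <name>fs.s3n.impl</name>
  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  <description>The FileSystem for s3n: (Native S3) uris.</description>
</property>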

Then add to the MR2 and/or TEZ class path(s):
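The exact class path entries are missing from this post. As a rough sketch only, the change might look like the fragment below. It assumes hadoop-aws.jar lives in a directory written here as the placeholder /path/to/hadoop-aws-dir (hypothetical; check where your HDP install actually puts the jar), and EXISTING_VALUE stands for whatever the property is already set to on your cluster.

```xml
<!-- mapred-site.xml (MR2): keep the current value and append the jar's directory. -->
<!-- EXISTING_VALUE and /path/to/hadoop-aws-dir are placeholders, not real values. -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>EXISTING_VALUE:/path/to/hadoop-aws-dir/*</value>
</property>

<!-- tez-site.xml (Tez): entries here are added to the class path of Tez containers. -->
<!-- Again, keep any existing value and append the placeholder directory. -->
<property>
  <name>tez.cluster.additional.classpath.prefix</name>
  <value>EXISTING_VALUE:/path/to/hadoop-aws-dir/*</value>
</property>
```

Only the property for the execution engine you actually use with Pig needs changing. Overwriting rather than appending to either value would likely break other jobs.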


These settings ensure two things:

  1. That the worker YARN containers spawned by Pig have access to the hadoop-aws.jar file
  2. That the worker YARN containers know which class implements the file system identified by the "s3n://" scheme

I spoke to the author. This is still definitely relevant to HDP 2.2 and, I think, HDP 2.3.

s3n is deprecated in newer versions of Hadoop (see https://wiki.apache.org/hadoop/AmazonS3), so it's better to use s3a. To use it, prefix the path with s3a:// when accessing files.

The following properties need to be configured first:
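The property list is missing from this post. As a minimal sketch, assuming the standard s3a credential properties are what was meant, core-site.xml would carry something like the following. The values are placeholders, and the fs.s3a.impl entry is only an assumption for older builds that do not already map the s3a scheme.

```xml
<!-- core-site.xml: credentials used by the s3a connector (placeholder values). -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>

<!-- Assumption: only needed if the s3a scheme is not already registered. -->
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
```

With these in place, the load statement from the Pig script would reference 's3a://<bucket>/<path-to-file>' instead of the s3n path.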


