Snowflake and Amazon S3 Integration

There is a high demand for storing and analyzing large volumes of data. This demand is being satisfied by numerous data storage and data warehouse providers. In this article, we talk about two of the most popular services — Amazon S3 and Snowflake. We discuss why people choose them, highlight their benefits, and check their native connectivity options and possible alternatives.

Table of Contents

  1. What is Amazon S3?
  2. Amazon S3 Benefits
  3. What is Snowflake?
  4. Snowflake Benefits
  5. Amazon S3 Snowflake Integration
  6. Manual Connection of Amazon S3 to Snowflake
  7. Integration of Amazon S3 with Snowflake Using Skyvia
  8. Conclusion

What is Amazon S3?

Amazon S3 is one of the most popular cloud storage services. It stores data as objects within buckets. S3 earned its popularity due to extremely high scalability and availability.

Amazon S3 Benefits

Amazon S3 is a complex system that provides numerous benefits. We recommend checking the official S3 documentation to learn about them in detail. For this overview, we highlight the most prominent ones.

Scalability

In Amazon S3, you can store an almost unlimited number and volume of objects. It allows you to scale storage services on the go according to your current needs.

Availability

Amazon states that the service provides 99.9% availability for the data you store. That’s an extremely high number, and judging by our user experience and general feedback across the internet, that statement is pretty close to reality.

Cost Efficiency

With Amazon S3, you don’t need to guess how much storage you will need in the future. You can store as much data as you want and access it when needed. You only pay for the storage you use.

Security

Amazon S3 ensures you don’t lose valuable data by automatically making copies of your objects on multiple devices across multiple facilities. It also lets you preserve, retrieve, and restore every version of every object in an Amazon S3 bucket, so you can easily recover data if a user accidentally deletes something or an application fails.

Bucket policies, AWS Identity and Access Management (IAM) policies, access control lists (ACLs), and S3 Access Points help you precisely control the access to your objects and buckets.

What is Snowflake?

Snowflake is a cloud-based data warehouse. Like all data warehouses, Snowflake collects and aggregates data from one or many sources so data can be used to produce business insights.

Complex analytical queries usually require pausing database updates for the duration of query execution, which is not always possible on a production database. That’s why businesses use Snowflake as a repository for aggregating data and executing queries on it, separately from the main database.

Snowflake Benefits

Snowflake is known for making the biggest software IPO in history in 2020, and for a good reason: the company managed to clearly communicate the benefits Snowflake provides. Here is a list of the main ones.

Effective Data Sharing

Snowflake’s architecture enables smooth data sharing between users. You can even provide access to your data in Snowflake to users who aren’t Snowflake customers by creating reader accounts for them.

Accessibility and Performance

Concurrency-related delays and failures are kept to a minimum: the multi-cluster architecture makes sure all your concurrent queries run smoothly.

Security

Snowflake is HIPAA, HITRUST CSF, PCI DSS, FedRAMP, and IRAP Protected compliant and is trusted by many governmental, health, and banking institutions.

Simplicity

It requires no on-premises installation or management and provides several built-in features such as speed optimization, security, easy data restoration, and others.

Amazon S3 Snowflake Integration

Amazon S3 stores vast volumes of data. You can derive valuable insights from this data by uploading it to a purpose-built cloud data warehouse such as Snowflake. To integrate Snowflake and Amazon S3, you can use native methods or third-party solutions. Let’s check both options in detail so you can decide what fits your needs best.

Manual Connection of Amazon S3 to Snowflake

You can set up a manual connection between Snowflake and Amazon S3 using Snowflake’s built-in tools. The process consists of six steps.

Note! Completing the instructions in this section requires permissions in AWS to create and manage IAM policies and roles.

Step 1. Create an IAM Policy

Snowflake needs the following permissions on an S3 bucket to access files in the folder (and sub-folders):

  • s3:GetBucketLocation
  • s3:GetObject
  • s3:GetObjectVersion
  • s3:ListBucket

As a best practice, Snowflake recommends creating an IAM policy with these permissions. Afterward, you can attach the policy to an IAM role and use the credentials generated by AWS.

  • Log into the AWS Management Console.
  • Go to Dashboard > Identity & Access Management.
  • Open Account settings on the left.
  • Activate your AWS region by expanding the Security Token Service Regions list and choosing Activate next to your region.
  • Open Policies on the left.
  • Click Create Policy.
  • Click the JSON tab.

Add a policy document that allows Snowflake to access the S3 bucket and folder. Copy and paste the following text into the policy editor:

Note! Make sure to replace <bucket> and <prefix> with your actual bucket name and folder path prefix. Also note that, in addition to the read permissions listed above, this policy grants write and delete permissions (s3:PutObject, s3:DeleteObject, and their versioned counterparts), which Snowflake needs if you unload data to the bucket or purge staged files after loading.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
              "s3:PutObject",
              "s3:GetObject",
              "s3:GetObjectVersion",
              "s3:DeleteObject",
              "s3:DeleteObjectVersion"
            ],
            "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::<bucket>",
            "Condition": {
                "StringLike": {
                    "s3:prefix": [
                        "<prefix>/*"
                    ]
                }
            }
        }
    ]
}
  • Click Review policy.
  • Enter the policy name and click Create Policy.

Step 2. Create the IAM Role in AWS

While in AWS Management Console, choose Identity & Access Management (IAM):

  • Choose Roles on the left.
  • Click Create Role.
  • Select Another AWS Account as the trusted entity type.
  • In the Account ID field, enter your own AWS account ID. You will modify this value later.
  • Select Require External ID. Enter any value as your ID. You will modify it later.
  • Click Next.
  • Select the policy you created in Step 1.
  • Click Next.
  • Enter a name and description for the role, and click Create Role.
  • Save the Role ARN value from the role summary page. In the next step, you will create a Snowflake integration that references this role.

Step 3. Create a Cloud Storage Integration in Snowflake

Create a storage integration using the CREATE STORAGE INTEGRATION command.

CREATE STORAGE INTEGRATION <integration_name>
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = '<iam_role>'
  STORAGE_ALLOWED_LOCATIONS = ('s3://<bucket>/<path>/', 's3://<bucket>/<path>/')
  [ STORAGE_BLOCKED_LOCATIONS = ('s3://<bucket>/<path>/', 's3://<bucket>/<path>/') ]

Replace the following placeholders with your values:

  • integration_name
  • iam_role
  • bucket
  • path
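
For illustration, here is what the command might look like with hypothetical values filled in. The role ARN, bucket names, and paths below are placeholders, not real values; we name the integration s3_int so that it matches the stage definition in Step 6:

CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::001234567890:role/mysnowflakerole'
  STORAGE_ALLOWED_LOCATIONS = ('s3://mybucket1/mypath1/', 's3://mybucket2/mypath2/');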

Step 4. Retrieve the AWS IAM User for your Snowflake Account

Execute the DESCRIBE INTEGRATION command to retrieve the ARN for the AWS IAM user that was created automatically for your Snowflake account:

DESC INTEGRATION <integration_name>;

Replace <integration_name> with the name of the integration you created in Step 3.

Save the following values for the next step:

  • STORAGE_AWS_IAM_USER_ARN
  • STORAGE_AWS_EXTERNAL_ID
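
For illustration, the relevant rows of the output might look like this (the values below are made-up placeholders; yours will differ):

DESC INTEGRATION s3_int;

property                  | property_value
--------------------------+------------------------------------------------
STORAGE_AWS_IAM_USER_ARN  | arn:aws:iam::123456789001:user/abc1-b-self1234
STORAGE_AWS_EXTERNAL_ID   | MYACCOUNT_SFCRole=2_a123456789/abcdefg=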

Step 5. Grant the IAM User Permissions to Access Bucket Objects

  • Log into the AWS Management Console.
  • Choose Identity & Access Management (IAM).
  • Choose Roles from the left-hand navigation pane.
  • Click on the role you created in Step 2.
  • Click on the Trust Relationships tab.
  • Click the Edit Trust Relationship button.
  • Modify the policy document with the DESC STORAGE INTEGRATION output values you recorded in Step 4, where snowflake_user_arn is STORAGE_AWS_IAM_USER_ARN and snowflake_external_id is STORAGE_AWS_EXTERNAL_ID.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "<snowflake_user_arn>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<snowflake_external_id>"
        }
      }
    }
  ]
}
  • Click Update Trust Policy.

Step 6. Create an External Stage

Create the stage using the CREATE STAGE command, referencing the storage integration you created in Step 3.
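
Note that the stage definition below references a named file format, my_csv_format, which must already exist. If you don’t have one yet, create it first. The options below are assumptions for a typical comma-separated file with a header row; adjust them to match your files:

CREATE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;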

CREATE STAGE my_s3_stage
  STORAGE_INTEGRATION = s3_int
  URL = 's3://bucket1/path1'
  FILE_FORMAT = my_csv_format;

Finally, you can access your S3 data with the LIST <stage_name>; command, where <stage_name> is the name of the stage you created.
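
For example, with the stage above, LIST shows the staged files, and COPY INTO loads them into a table. Here, my_table is a hypothetical target table that you have already created in Snowflake with columns matching your CSV files:

LIST @my_s3_stage;

COPY INTO my_table
  FROM @my_s3_stage;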

Integration of Amazon S3 with Snowflake Using Skyvia

If you don’t want to spend much time creating and manually configuring a connection between Amazon S3 and Snowflake, you can use Skyvia to create integration packages and run them manually or on a schedule, based on your needs. To do so, you’ll need to follow three simple steps:

  • Create Amazon S3 Connection.
  • Create Snowflake Connection.
  • Create and run a package of choice. As an example, we create an Import package that imports a CSV file from Amazon S3 to Snowflake.

The instructions below assume that you have already created a Skyvia account. If you haven’t, you can do that for free by visiting the Skyvia app page.

Create Amazon S3 Connection

To create an Amazon S3 connection, go to +New > Connection, select Amazon S3, and do the following:

Create Amazon S3 Connection in Skyvia
  1. Enter Access Key ID — the first part of your Amazon Web Services access key.
  2. Enter Secret Key — the second part of your Amazon Web Services access key.
  3. Enter Security Token (optional) — a session token used with temporary security credentials.
  4. Set the Region — the AWS region where your S3 storage is hosted.
  5. Enter the Bucket Name — the name of your S3 bucket to load CSV files from or to.
  6. Click Create Connection.

Create Snowflake Connection

To create a Snowflake connection, go to +New > Connection, select Snowflake, and do the following:

Create Snowflake Connection in Skyvia
  1. Enter your Snowflake Domain name.
  2. Enter your Snowflake User name.
  3. Enter the user Password.
  4. Enter the Database name.
  5. Click Create Connection.

Create Integration Package

Once connections to Snowflake and Amazon S3 are created, you can create a package of choice. For this example, we create a CSV import package:

  • Go to +New > Import.
  • Choose the CSV from storage service option.
Create Integration Package in Skyvia
  • Choose Amazon S3 as Source and Snowflake as Target and click Add New.
  • Select your CSV file and click Next Step.
  • Choose the table you want to import data to and the type of import action.
Create Integration Package in Skyvia 2
  • Map columns if needed and click Save.

The Import package is ready to use. You can run it manually or on a schedule.

Conclusion

Amazon S3 and Snowflake are great cloud tools for storing and working with data. S3 Snowflake integration gives you a deeper understanding of the data you store in S3. You can use both built-in and third-party solutions to make Amazon S3 and Snowflake share data. The integration process with Skyvia requires no more than knowledge of your credentials, and it also lets you combine data imported from S3 with data from other sources. However, the manual approach is still an option if you prefer classical integration.

Amanda Claymore
Content Marketer
