Cloud computing has become central to most businesses today. Organizations across the world are evaluating strategies to either migrate their existing infrastructure to the cloud or build new products on it from the start. When we talk about the cloud, the conversation is mostly about AWS, Azure, and GCP. Although there are other players in the market, AWS currently holds around 34% of the global market, which makes it the leading cloud computing provider.
Table of contents
- Benefits of Using AWS RDS for PostgreSQL
- A step-by-step guide to exporting data using AWS RDS for PostgreSQL
- Pros and cons
- Make it easier with Skyvia Data Integration
- Pros and cons
- Conclusion
At the time of writing, AWS offers more than 300 services in various domains such as compute, storage, databases, big data, machine learning, and AI. Although not an exhaustive list, some of the most commonly used services from AWS are its Simple Storage Service, also known as S3, and the Relational Database Service, otherwise known as RDS.
AWS S3 is a simple object storage service that allows users to upload files and store them in the cloud. As an analogy, it can be compared to Google Drive or Dropbox, where users upload and keep their files. Since it is hosted and managed by AWS, data stored in S3 is durable and highly available. Amazon also offers the Relational Database Service, which lets users host relational databases in the cloud. It is offered as a PaaS, which means users do not need to provision any infrastructure. Amazon takes care of running the virtual machines and hosting the databases on the backend, and users are provided with a neat user interface to manage their databases. Users can run all major database engines, such as MySQL and PostgreSQL, on Amazon RDS.
In this article, we will explore how to export data from a PostgreSQL database running in Amazon RDS to a bucket in S3. After reading this article, readers will be able to:
- Get an overview of exporting data from Amazon RDS to S3 buckets
- Verify the PostgreSQL version of the Amazon RDS instance
- Specify the S3 bucket and configure access to it
- Use the aws_s3.query_export_to_s3 function to export data
Please note – This article assumes that readers already have a PostgreSQL instance up and running in AWS RDS and can access the database with rds_superuser permissions using a GUI (preferred) or CLI tool. Also, to verify the version of AWS services, we will use the AWS CLI which must be configured correctly.
Benefits of Using AWS RDS for PostgreSQL
While working with AWS S3 and RDS, it is often necessary to move data back and forth between these two services. Various use cases may require users to load data from S3 into RDS, as well as to export data from an RDS database to Amazon S3.
A step-by-step guide to exporting data using AWS RDS for PostgreSQL
Let’s take a look at the various steps that we need to take in order to export data from a PostgreSQL database running in Amazon RDS to an S3 bucket.
Step 1 – Install the aws_s3 extension
Step 2 – Provide sufficient access to the S3 bucket policy
Step 3 – Specify the AWS S3 bucket or path to export the data
Step 4 – Export data using aws_s3.query_export_to_s3 function
Step 5 – (Optional) Troubleshooting access to Amazon S3 buckets
Let’s go through each of the above steps and export data to an S3 bucket.
Step 1 – Install the aws_s3 extension
To install the aws_s3 extension, we need to execute the following statement. This can be done either from a GUI client (pgAdmin 4) or from the CLI.
CREATE EXTENSION aws_s3 CASCADE;
The CASCADE option will install the required dependencies along with the extension. Once the installation is successful, it can be verified by running the following command.
SELECT * FROM pg_extension;
This will return a list of installed extensions on the PostgreSQL instance. As we can see, the two extensions – aws_commons and aws_s3 – have been installed successfully.
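If other extensions are already installed on the instance, the check can be narrowed down to just the two of interest, for example:
-- Optional: list only the two extensions installed by the statement above
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('aws_s3', 'aws_commons');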
We can also verify whether the AWS RDS instance supports the aws_s3 extension. This can be done by running the following command in a terminal.
aws rds describe-db-engine-versions --region us-east-1 --engine postgres --engine-version 14.4
This uses the AWS CLI to query the RDS service for details about the chosen engine version. The output should contain the string “s3Export” as part of SupportedFeatureNames.
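If the full JSON output is too verbose, the check can be narrowed down, for example by piping the output through grep (assuming a Unix-like shell):
# Optional: confirm that s3Export appears among the supported feature names
aws rds describe-db-engine-versions --region us-east-1 --engine postgres --engine-version 14.4 | grep s3Export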
Step 2 – Provide sufficient access to the S3 bucket policy
In this section, we will focus on creating a role and a policy in AWS that will allow the RDS instance to export data into the S3 bucket. By default, an RDS instance has no privileges to write data into an S3 bucket.
Navigate to IAM in the AWS Management Console and create a new policy. Specifically, we need to allow the s3:PutObject permission on our S3 bucket. The following policy document can be used to create the policy in AWS. Please adjust the source bucket name accordingly.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RdsDataExport",
            "Action": [
                "s3:PutObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::rds-data-export-demo/*"
            ]
        }
    ]
}
The next step is to create a role in IAM and attach this policy to the role. This can also be done in the AWS Management Console.
The final step in this process is to allow the PostgreSQL instance to assume this role. This will allow the RDS service to write data into the S3 bucket.
Navigate to the AWS Management Console, select the RDS service, and open the DB instance that we are going to connect to. Under the Connectivity & security tab, scroll down to the Manage IAM roles section. In the Add IAM roles to this instance dropdown, select the newly created role, and set the feature to s3Export. Click Add role.
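For readers who prefer the CLI, the same setup can be sketched with the commands below. The policy and trust documents are assumed to be saved locally, and the role name, policy name, account ID, and DB instance identifier are placeholders for illustration:
# 1. Create the policy from the JSON document shown above (saved as rds-export-policy.json)
aws iam create-policy --policy-name rds-s3-export-policy --policy-document file://rds-export-policy.json
# 2. Create a role that the RDS service can assume (rds-trust-policy.json must name rds.amazonaws.com as the trusted principal)
aws iam create-role --role-name rds-s3-export-role --assume-role-policy-document file://rds-trust-policy.json
# 3. Attach the policy to the role (the account ID is a placeholder)
aws iam attach-role-policy --role-name rds-s3-export-role --policy-arn arn:aws:iam::123456789012:policy/rds-s3-export-policy
# 4. Associate the role with the DB instance for the s3Export feature
aws rds add-role-to-db-instance --db-instance-identifier my-postgres-instance --feature-name s3Export --role-arn arn:aws:iam::123456789012:role/rds-s3-export-role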
Step 3 – Specify the AWS S3 bucket or path to export the data
In this section, we will configure the aws_s3 extension in our PostgreSQL database and set up a URI for the S3 bucket. The following parameters are required to configure the URI.
- Bucket Name – The name of the S3 bucket under which data is to be exported.
- Path – A key that will be prepended to the exported object.
- AWS Region – The AWS region in which the S3 bucket exists.
In order to configure the URI, the aws_commons.create_s3_uri function needs to be executed with the above parameters.
SELECT aws_commons.create_s3_uri(
'rds-data-export-demo',
'export',
'us-west-1'
) AS s3_export_uri;
This S3 URI will be provided to the aws_s3.query_export_to_s3 function so that it can connect to the S3 bucket and export the data. Note that the :'s3_export_uri' variable syntax used in the next step is specific to psql: when running the statement above in psql, terminate it with \gset so that the result is stored in the s3_export_uri variable. Other clients can pass the aws_commons.create_s3_uri(...) call inline instead, as shown in the sketch later in the next step.
Step 4 – Export data using aws_s3.query_export_to_s3 function
The aws_s3.query_export_to_s3 function expects the following parameters to export the data to an S3 bucket.
- SQL Statement – A string containing the SQL statement that selects the data to be exported. In this case, we will provide the following SQL statement – SELECT * FROM customers_export.
- S3 URI – The S3 URI (s3_export_uri) that was created in the previous section. This will allow RDS to connect to the S3 bucket and export the data.
Once these parameters are ready, the aws_s3.query_export_to_s3 function can be executed as follows.
SELECT *
FROM aws_s3.query_export_to_s3(
'SELECT * FROM customers_export',
:'s3_export_uri'
);
Execute the query, and if the permissions are set properly, a file will be created in S3. The file will contain the rows returned by the SQL statement. If no formatting options are set, the output file is tab-separated.
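As a variation, and reusing the example bucket, region, and table names from this article, the S3 URI can also be built inline and an options string (passed through to the PostgreSQL COPY command) can be supplied to produce CSV output; this form avoids the psql-only variable syntax. The object key export/customers.csv below is illustrative:
-- Sketch: single-statement export as CSV, no psql variable required
SELECT *
FROM aws_s3.query_export_to_s3(
    'SELECT * FROM customers_export',
    aws_commons.create_s3_uri('rds-data-export-demo', 'export/customers.csv', 'us-west-1'),
    options := 'format csv'
);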
To view the file in S3, use the AWS CLI to first list the files in the bucket and then download the exported file to the local machine.
aws s3 ls s3://rds-data-export-demo/
aws s3 cp s3://rds-data-export-demo/export ./export
Open the downloaded file with any editor of choice and verify its contents.
Step 5 – (Optional) Troubleshooting access to Amazon S3 buckets
This step is optional; it is not required if everything was configured correctly in the previous steps and the export worked. However, if the data export from RDS to S3 failed, the following points can help troubleshoot the issue.
- IAM Policy and Role – The most common issue while exporting data from Amazon RDS to the S3 bucket is incorrect permissions. Verify that the RDS service has been granted write access to the S3 bucket and that there is no policy restricting the RDS service from talking to the S3 bucket.
- S3 Region – To avoid cross-region access issues and additional data transfer costs, it is advisable to host the S3 bucket and the RDS instance in the same AWS region.
Pros and Cons
Let’s take a look at some of the major pros and cons of exporting data via the aws_s3 extension.
Pros
- Working with AWS RDS directly allows users to have more customizability over what data needs to be exported.
- It allows users to control granular access levels and use existing SQL queries to export data into S3.
Cons
- Using the extension requires administrative (rds_superuser) privileges on the RDS database instance. This can be a blocker, as users may not have administrative rights to the database and therefore cannot install the extension themselves.
- Correct IAM policies and roles need to be provided to the RDS instance so that it can export data to the S3 bucket. Also, if the bucket is in another AWS account, then the cross-account bucket policy must also be updated to allow RDS to write data to the bucket.
- Scheduling automated exports using the PostgreSQL extension is a hassle and requires technical knowledge to write cron jobs and schedule the exports (a minimal sketch of such a cron job is shown below). In addition, a server needs to be up and running at all times to run the scheduled exports.
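For illustration, here is a minimal sketch of what such a scheduled export could look like, assuming an always-on host with psql installed, network access to the RDS instance, credentials stored in ~/.pgpass, and placeholder endpoint, database, user, and object key names:
# Hypothetical crontab entry: run the export every day at 02:00 (all names below are placeholders)
0 2 * * * psql "host=mydb.example-id.us-west-1.rds.amazonaws.com dbname=mydb user=exporter" -c "SELECT * FROM aws_s3.query_export_to_s3('SELECT * FROM customers_export', aws_commons.create_s3_uri('rds-data-export-demo', 'export/daily', 'us-west-1'));"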
Make it easier with Skyvia Data Integration
Skyvia is a popular cloud-based SaaS application that provides data integration capabilities in the cloud. Users can sign up for the tool and start building integrations right away. With its ease of use and simple GUI, Skyvia allows non-technical users to build robust data pipelines to move data between various cloud platforms and services such as Amazon RDS and S3.
Let’s take a look at how easy it is to develop a data pipeline with Skyvia Data Integration that will export data from Amazon RDS to an S3 bucket. This is the same exercise that we performed in the previous section using the aws_s3 extension on the PostgreSQL database.
Step 1 – Sign up for Skyvia
Navigate to https://app.skyvia.com/ and sign up with your public profile. After signing up, we are provided with a Default Workspace, where we can create multiple objects, for example, Connections, Agents, Packages, etc. This is the one-stop shop for all the objects that can be created within the Skyvia Data Integration platform.
For our use case, we need to first establish connections to the Amazon RDS Database instance and also the S3 bucket. These connections can then be used to create a Package in Skyvia that will export data from the PostgreSQL database to the S3 bucket.
Step 2 – Create Connections
Creating connections to the AWS account from Skyvia is straightforward and can be done through an intuitive user interface.
Let’s first create the connection to the PostgreSQL database running on Amazon RDS. The following details are required to establish a successful connection with RDS.
- RDS Database Public Endpoint – The public endpoint is the connection string for the database instance and it can be found in the RDS Management Console under Connectivity and security. Grab the endpoint and the port on which PostgreSQL is running.
- Database Username – The master username used to connect to the database instance.
- Database Password – The master password used to connect to the database instance.
- Default Database – The default database to which Skyvia would connect. This would be the database from which the data is to be exported to S3.
Note: The Security Group must be configured to allow inbound connections to the RDS Instance.
Click on Connections and then Create new to create a new connection. In the search bar, search for PostgreSQL and click on the connector.
Select the Connection Mode as Direct, and provide the Server, Port, User ID, and Password. In the Database dropdown, select the name of the database and click Test Connection to validate if the connection is successful.
If the details are correct, a success message will be displayed. Click on Create Connection to create the connection.
Now, let’s create the connection to the Amazon S3 bucket. In order to connect to the bucket, we need an Access Key ID and Secret Access Key from the AWS Management Console. To learn how to create and use an access key, please follow the official documentation from AWS.
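For reference, if a dedicated IAM user is used for Skyvia, an access key pair can also be generated from the AWS CLI (the user name below is a placeholder):
# Hypothetical: create an access key pair for the IAM user that Skyvia will use
aws iam create-access-key --user-name skyvia-export-user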
Select Amazon S3 from the Connector dropdown and provide the details as follows.
- Access Key ID – The Access Key ID for the user.
- Secret Key – The Secret Key for the user.
- Region – The preferred AWS region.
- Bucket Name – The bucket to which data will be exported.
Click on Test Connection to validate that the connection is successful. If a success message appears, click on Create Connection to create the connection to the S3 bucket.
Step 3 – Create a Package to export data
In this step, we will use the connections created in the previous step to create a package that will export data from a table in the PostgreSQL database. Click on Packages and Export to create a new export package.
Select the RDS connection as the Source Connection, and for the Target, select the Target Type as To storage service and the S3 connection that we created in the previous step. Provide a Folder name in which the exported files will be created and keep the other settings as default.
Once the connections are provided, the next step is to create a Task. A Task is a unit that can be executed to export the data. On the right-hand pane, click on Add new for Tasks.
In the Task Editor modal view that appears, select the Editor Mode as Simple and provide a name for the Target File. Select Compress Type as zip or gzip if compression is desired, otherwise select None. Also, select the checkboxes next to the field names that need to be exported to the CSV file. Unchecked fields will not be included in the exported file. Optionally, users can also filter and sort the exported data by using Filter and Order By controls available.
Click on Next step to map the field names from the database to the exported file. For simplicity, let’s leave the field names as they are and click Save to save the task.
Click Validate to validate the package and then click Create to create the package. Once the package is created, click on Run to export data from the Amazon RDS database to the S3 bucket.
Step 4 – Run and Monitor
Click on the Monitor tab to monitor the status of the package. The initial state will be Queued. Once the execution is completed, the status will be updated with the number of success and error rows.
To verify if the file has been exported correctly, navigate to the S3 bucket from AWS Management Console and download the file to the local machine.
Open the file in a code editor of choice to view the contents.
Pros and Cons
Some of the major pros and cons of using the Skyvia connector for PostgreSQL are discussed below.
Pros
- With the Skyvia connector, it is very easy for non-technical users to create data pipelines and export data from Amazon RDS to S3 buckets.
- The intuitive GUI allows users to filter, sort, and select the columns to be exported without writing any custom code.
- Skyvia provides built-in scheduling of data exports from Amazon RDS to S3 buckets. This can be done by simply selecting when and where to export the data.
- No third-party extensions are required to be installed.
Cons
- There are no video tutorials to follow while creating the data pipelines with Skyvia.
- With the free tier, there are some limitations in usage; however, with the paid version, there are a plethora of options to explore.
Conclusion
As we have seen in the sections above, exporting data using the native aws_s3 extension involves a lot of manual steps to configure access correctly. If the permissions are not set correctly, the export will fail and will need troubleshooting. The most common problem is that the role assumed by RDS does not have sufficient permissions to write data to the S3 bucket.
However, all these problems can be easily overcome by using a cloud data platform like Skyvia Data Integration. With its ease of use and intuitive UI, even non-technical users can create packages and export data from RDS to S3.