Editor’s note: This article has been updated for accuracy and comprehensiveness in December 2023.
These days, it is hard to find a business owner or high-level manager who does not understand the importance of data. Data helps them better understand their customers’ needs, analyze their companies’ processes, study competitors, and make important business decisions. Working with data is now part of nearly every task and process.
Since data is involved in so many different tasks, there are many different tools for storing, analyzing, using, and processing it. For example, companies use CRM software like Salesforce to work with customers and to collect and process data on their customers and all their interactions. For general data storage and data analysis, companies usually use relational databases or cloud data warehouses, and Amazon Redshift is currently one of the most popular cloud data warehouses.
However, it’s not always convenient to have different data locked in different tools. Data must work together in order to provide you with a 360-degree view of your customers and of the situation in general. It is important to have your data connected, and this article on Salesforce to Redshift integration is what you need: it describes in detail how to connect Salesforce to Redshift both manually and automatically.
Table of contents
Important information
- Why Do You Need to Connect Salesforce to Redshift?
- How to Connect Salesforce to Redshift Using Native Tools
- How to Connect Salesforce to Redshift Using Salesforce API
- How to Connect Salesforce to Redshift Using Third-Party Tools
Additional information
Key Features of Salesforce
Salesforce is the most popular cloud CRM on the market. It has become the standard for customer relationship management and is used by almost one in five businesses. Besides customer relationship management, Salesforce offers a unified platform with a number of additional tools for marketing (Salesforce Marketing Cloud), data analysis (like Salesforce Wave Analytics), customer service, and more. The platform’s functionality can also be extended with custom or third-party applications.
Key Features of Redshift
Amazon Redshift is part of the Amazon Web Services (AWS) platform. It is a Massively Parallel Processing (MPP) data warehouse that can split data processing tasks and run them simultaneously on multiple processors, providing exceptional performance for complex analytical queries over huge volumes of data. It can store and process data at petabyte scale and offers relatively cheap storage per gigabyte in comparison to Salesforce. Like Salesforce in its own market, it is a leader among cloud MPP data warehouse solutions.
Why Do You Need to Connect Salesforce to Redshift?
Let’s see why Salesforce and Redshift integration is important and how it can be useful for business. First, you may need to load Salesforce data into Redshift to use Redshift’s immense performance for data analysis. While Salesforce has its own data analysis tools, some of them come with an additional price tag, and connecting Salesforce to Redshift can be a better option. Besides, you can also load data from other applications and tools used in your company into Redshift and analyze all the data together, building a 360-degree business view.
Second, you may just want to have a copy of Salesforce data elsewhere, and Redshift can be a cost-effective place to store it. In this case, you can build a Salesforce to Redshift pipeline that pulls Salesforce data whenever it is updated and applies the updates to Redshift. This way, you will have an up-to-date copy of your Salesforce data in Redshift and can access it whenever you need.
Another scenario is Salesforce data archiving. Storing large data volumes in Salesforce can be quite costly. If a significant portion of this data is legacy records that you rarely need to access but must keep for legal and compliance reasons, a good option is to move the data from Salesforce to Redshift. Keep the fresh records that you need to work with in Salesforce and archive the rest to a Redshift database. There are even ways to connect Salesforce to Redshift and query Redshift data from the Salesforce interface. We describe how to build such Redshift to Salesforce integration later.
How to Connect Salesforce to Redshift Using Native Tools
Salesforce offers several native ways and tools for connecting Salesforce to Redshift. However, the most convenient of them usually come at an additional cost and are not available for all Salesforce editions.
Let’s list some of the ways that Salesforce provides via its own tools:
- MuleSoft AnyPoint Platform. This is a Salesforce-owned solution for automatic data integration that supports Salesforce and Redshift integration as well as integration with other apps and systems. It comes at an additional cost that depends on data sources and use cases; there is no published pricing, so you need to contact them for a quote.
- CSV Export and Import to Redshift. As an alternative, you can manually export Salesforce data to CSV files and then import them to Redshift. If you have the Salesforce Enterprise, Performance, Unlimited, or Developer edition, you can use the Data Loader export wizard to export data to CSV files. However, if you have Salesforce Professional or Essentials, the Data Loader export wizard is not available. In this case, you can use the Salesforce Export Backup Data feature, which allows you to export your data to CSV files every 29 days (or every week in the Enterprise, Performance, and Unlimited editions). After you export data to CSV files, you can upload them to Amazon S3 and then import them into Redshift using the COPY command. This approach involves a lot of manual work but does not incur any additional costs. Let’s describe it in more detail.
Exporting Salesforce Data via Data Loader Export Wizard
First, you need to download and install Data Loader. You can download it from Salesforce Setup: open the Salesforce Setup, type “Data Loader” (without quotes) in the Quick Find box, and, under Integrations, click Data Loader.
On the Data Loader page of Salesforce Setup, you can download the Data Loader application. Download, install, and run it. Then perform the following steps:
1. In the Data Loader window, click Export.
2. Click Log in and sign in to Salesforce, then click Next.
3. Select a Salesforce object to export data from and specify the CSV file name. Click Next. Data Loader allows you to export both predefined and custom objects.
4. On the next wizard page, you can configure the SOQL query used for exporting data. We only need to select the fields that we want to export, but if you want to filter data, you can also configure WHERE clauses (a sample query is shown after these steps). For our purposes, we can just click Select all fields and then click Finish.
5. Finally, click Yes to start export. The data will be exported to the specified file.
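For reference, the query that the wizard builds is plain SOQL. A simplified sketch with a filtering WHERE clause might look like this (the wizard generates the full field list for you, and your org may have custom fields):

SELECT Id, Name, Industry, AnnualRevenue, CreatedDate
FROM Account
WHERE CreatedDate >= 2023-01-01T00:00:00Z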
Exporting Salesforce Data via Export Backup Data
To export data to CSV using the Salesforce Export Backup Data feature, perform the following steps:
1. Open the Salesforce Setup.
2. Type “Data Export” (without quotes) in the Quick Find box and, under Data, click Data Export.
3. Click Export Now. Note that this feature allows you to export data on a weekly basis or every 29 days, depending on your Salesforce edition, and it may not be available if you have used it recently.
4. Select Salesforce objects to export and click Start Export. You can export both predefined and custom objects.
The data export will start, and after some time, you will receive an email with a link to the exported data.
5. Click this link to open the page where you can download the export results.
Importing Data to Redshift
If you haven’t prepared the tables for the data in Redshift, you need to create them first. For example, you can execute the following statement to create a table for Salesforce Accounts (note that your Account object may have custom fields not included in this statement):
CREATE TABLE "Account" (
"Id" VARCHAR(18) NOT NULL,
"IsDeleted" BOOLEAN NOT NULL,
"MasterRecordId" VARCHAR(18),
"Name" VARCHAR(255) NOT NULL,
"Type" VARCHAR(255),
"ParentId" VARCHAR(18),
"BillingStreet" VARCHAR(255),
"BillingCity" VARCHAR(40),
"BillingState" VARCHAR(80),
"BillingPostalCode" VARCHAR(20),
"BillingCountry" VARCHAR(80),
"BillingLatitude" DOUBLE PRECISION,
"BillingLongitude" DOUBLE PRECISION,
"BillingGeocodeAccuracy" VARCHAR(255),
"BillingAddress" TEXT,
"ShippingStreet" VARCHAR(255),
"ShippingCity" VARCHAR(40),
"ShippingState" VARCHAR(80),
"ShippingPostalCode" VARCHAR(20),
"ShippingCountry" VARCHAR(80),
"ShippingLatitude" DOUBLE PRECISION,
"ShippingLongitude" DOUBLE PRECISION,
"ShippingGeocodeAccuracy" VARCHAR(255),
"ShippingAddress" TEXT,
"Phone" VARCHAR(40),
"Fax" VARCHAR(40),
"AccountNumber" VARCHAR(40),
"Website" VARCHAR(255),
"PhotoUrl" VARCHAR(255),
"Sic" VARCHAR(20),
"Industry" VARCHAR(255),
"AnnualRevenue" DOUBLE PRECISION,
"NumberOfEmployees" INTEGER,
"Ownership" VARCHAR(255),
"TickerSymbol" VARCHAR(20),
"Description" TEXT,
"Rating" VARCHAR(255),
"Site" VARCHAR(80),
"OwnerId" VARCHAR(18) NOT NULL,
"CreatedDate" TIMESTAMP NOT NULL,
"CreatedById" VARCHAR(18) NOT NULL,
"LastModifiedDate" TIMESTAMP NOT NULL,
"LastModifiedById" VARCHAR(18) NOT NULL,
"SystemModstamp" TIMESTAMP NOT NULL,
"LastActivityDate" DATE,
"LastViewedDate" TIMESTAMP,
"LastReferencedDate" TIMESTAMP,
"Jigsaw" VARCHAR(20),
"JigsawCompanyId" VARCHAR(20),
"AccountSource" VARCHAR(255),
"SicDesc" VARCHAR(80),
CONSTRAINT "PK_Account" PRIMARY KEY ("Id")
);
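Note that Redshift does not enforce PRIMARY KEY constraints; it only uses them as hints for the query planner. For large tables, you may also want to choose distribution and sort keys explicitly. A minimal sketch (the right keys depend on your data volumes and query patterns):

CREATE TABLE "Account" (
"Id" VARCHAR(18) NOT NULL,
"Name" VARCHAR(255) NOT NULL,
"CreatedDate" TIMESTAMP NOT NULL,
-- ... the remaining columns as above ...
CONSTRAINT "PK_Account" PRIMARY KEY ("Id")
)
DISTSTYLE KEY
DISTKEY ("Id")
SORTKEY ("CreatedDate");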
After creating the necessary tables, you need to upload the CSV files to an Amazon S3 bucket. This is the easiest way of importing CSV files to Amazon Redshift manually. If you don’t have an Amazon S3 bucket or don’t know how to upload files there, please read the corresponding topic in the AWS documentation.
The uploaded CSV files can be imported into Redshift tables with the COPY command. The COPY command has the following syntax:
COPY table_name [ column_list ] FROM data_source authorization [ options ]
For example:
copy Account
from 's3://awssampledbuswest2/tickit/Account.csv'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
CSV
IGNOREHEADER 1;
If the columns in the CSV file and in the table are the same and in the same order, we can omit the column list. However, Data Loader exports CSV files with columns sorted alphabetically, so we may need to specify the column list explicitly:
copy Account(AccountNumber, AccountSource, AnnualRevenue, BillingAddress, BillingCity,
BillingCountry, BillingGeocodeAccuracy, BillingLatitude, BillingLongitude, BillingPostalCode,
BillingState, BillingStreet, CreatedById, CreatedDate, Description, Fax, Id, Industry,
IsDeleted, Jigsaw, JigsawCompanyId, LastActivityDate, LastModifiedById, LastModifiedDate,
LastReferencedDate, LastViewedDate, MasterRecordId, Name, NumberOfEmployees, OwnerId,
Ownership, ParentId, Phone, PhotoUrl, Rating, ShippingAddress, ShippingCity, ShippingCountry,
ShippingGeocodeAccuracy, ShippingLatitude, ShippingLongitude, ShippingPostalCode,
ShippingState, ShippingStreet, Sic, SicDesc, Site, SystemModstamp, TickerSymbol, Type, Website)
from 's3://awssampledbuswest2/tickit/Account.csv'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
CSV
IGNOREHEADER 1;
These examples use an AWS Identity and Access Management (IAM) role to access the specified bucket, which is the recommended way. You can also authenticate with your access key ID and secret access key, as in the example CREDENTIALS clause below, but this is not recommended.
credentials 'aws_access_key_id=<your-access-key-id>;aws_secret_access_key=<your-secret-access-key>'
We need to add the CSV and IGNOREHEADER options because we are importing a comma-separated file with one header row.
After the import, you can remove the CSV files from the Amazon S3 bucket, or even remove the bucket itself, to reduce Amazon S3 storage costs.
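Before you remove the files, you can run a quick sanity check that the rows were loaded and that there were no load errors, for example:

SELECT COUNT(*) FROM "Account";

SELECT query, filename, line_number, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;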
Limitations of Manual Integration
We have considered the manual integration of Salesforce to Redshift, which does not involve any additional costs and suits all Salesforce editions. This approach, however, requires quite a lot of manual work and is not well suited for continuous integration, when data in Redshift needs to be regularly updated with fresh records from Salesforce. Let’s consider better ways of building a Salesforce to Redshift data pipeline: first via the Salesforce APIs, and then using third-party solutions.
How to Connect Salesforce to Redshift Using Salesforce API
We’ve explored the manual way of integrating Salesforce with Redshift. However, there is one more method for bringing different tools together, a very traditional one but still popular: APIs. Salesforce offers a large number of them. Here are the most popular and widely used APIs provided by Salesforce:
- REST API
- SOAP API
- Bulk API
- Metadata API
- Streaming API
- Apex API
Such variety gives developers room to customize the integration of Salesforce data into other apps to the desired level. As for Redshift, not all of the above-mentioned API types are applicable. Let’s first focus on preparing Salesforce data before its transfer to Amazon Redshift and then walk through a specific example of how to connect Salesforce with Redshift on the basis of the available APIs.
Prepare Your Salesforce Data for Amazon Redshift
Loading Salesforce data into Amazon Redshift starts with designing the schema in the data warehouse. You’ll have to adhere to a relational database schema with tables and columns. Make sure that the data type of each attribute corresponds to one supported by Redshift.
In general, you design a model that corresponds to the table structure within Salesforce for easier data transfer. This step is essential, as it impacts all further operations on the data within the data warehouse.
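For reference, typical Salesforce field types map to Redshift column types roughly as follows (a sketch based on the Account table above; adjust lengths and precision to your data):

-- Salesforce field type        -> Redshift column type
-- Id / Lookup (reference)      -> VARCHAR(18)
-- Checkbox                     -> BOOLEAN
-- Text(n) / Picklist           -> VARCHAR(n) / VARCHAR(255)
-- Number / Currency / Percent  -> DOUBLE PRECISION or DECIMAL(p, s)
-- Date                         -> DATE
-- Date/Time                    -> TIMESTAMP
-- Long Text Area               -> VARCHAR(MAX)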
Another thing to consider is that Amazon Redshift can’t pull data directly from Salesforce. Data first needs to be loaded to one of the data sources supported as input by Redshift:
- Amazon S3
- DynamoDB
- Kinesis Firehose
So, Salesforce data makes one stop on the way to its final destination, and then, with the help of additional tools, the data is pulled from these sources into Amazon Redshift.
Note that Skyvia Replication removes the need for this extra step: it creates the database table structure in Redshift automatically and transfers Salesforce data directly there.
Load Data from Salesforce to Amazon Redshift
Here, you will find detailed instructions on the process, including minor preparatory and intermediary procedures.
PREREQUISITES
Actions on the Salesforce side
- Find your Salesforce security token.
- Define a custom platform event.
- Create a connected app.
Actions on the Amazon side
- Create an AWS account or log in to an existing one to use Amazon EventBridge.
- Enable Amazon S3 bucket for EventBridge.
STEP 1. ESTABLISH CONNECTION TO SALESFORCE API
- Open the Amazon EventBridge console.
- Go to API destinations -> Connections -> Create connection.
- Provide the name and description of the connection.
- Select Partners for the Destinations type and select Salesforce for Partner Destinations.
- In the Authorization endpoint field, type the OAuth authorization endpoint URL of your Salesforce org.
NOTE: The URL will be different if you are using a sandbox, with or without enhanced domains.
- Select POST from the HTTP method drop-down list.
- Enter the client ID and client secret associated with your Salesforce connected app.
- Enter your Salesforce credentials.
- Click Create.
STEP 2. CREATE API DESTINATION
- Open the Amazon EventBridge console.
- Go to API destinations -> Create API destination.
- Provide the name and description of the API destination.
- In the API destination endpoint field, type an endpoint of the following form:
- https://MyDomainName.my.salesforce.com/services/data/v54.0/sobjects/MyEvent__e, where MyEvent__e is the platform event to which you want to send information.
- Select POST from the HTTP method drop-down list.
- In the Invocation rate limit field, type 300.
- Select the Use an existing connection option and specify the Salesforce connection created in the previous step.
- Click Create.
STEP 3. CREATE A RULE
This step is necessary to organize the sending of events to Salesforce once an Amazon S3 object is created.
- Open the Amazon EventBridge console.
- Go to Rules -> Create rule.
- Provide the name and description of the rule.
- Set Default for the Event bus option and select Rule with an event pattern in the Rule type field.
- Click Next.
- Select Other in the Event source field.
- In the Event pattern field, type "source": ["aws.s3"].
- Click Next.
- Choose EventBridge API destination as the target type and specify the existing API destination for Salesforce created in the previous step.
- In the Execution role, select Create a new role….
- Click Next.
- Click Create rule.
To test your rule, create an Amazon S3 object by uploading a file to an EventBridge-enabled bucket. The information about the created object will be sent to the Salesforce platform event.
STEP 4. LOAD DATA FROM S3 TO REDSHIFT
For that, please check the official documentation for transferring data from Amazon S3 to Amazon Redshift.
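In essence, this step again comes down to a COPY command. A minimal sketch, assuming the staged files are JSON documents (the bucket path and IAM role below are placeholders):

COPY "Account"
FROM 's3://<your-bucket>/salesforce/account/'
IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'
FORMAT AS JSON 'auto';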
As you can see, the process of Salesforce to Redshift integration with APIs isn’t that simple. This method is definitely more sophisticated than manual export or automated data transfer with third-party tools. However, integration with APIs has its benefits: it offers a high degree of customization, which isn’t available with other approaches.
How to Connect Salesforce to Redshift Using Third-Party Tools
There are a number of third-party tools that can be used for Salesforce and Redshift integration. We describe one of them below.
Connector for Salesforce to Redshift Replication
Let’s consider Salesforce to Redshift replication using Skyvia. Skyvia is a powerful data platform with many data-related tools, including ETL tools to configure Salesforce to Redshift ETL (as well as reverse ETL tools for extracting data from Redshift and putting it back to Salesforce). Its data replication tool offers an easy way to copy cloud data into a database or cloud data warehouse.
Replication configuration is very easy with Skyvia – you can set it up in under 5 minutes. After you’ve configured replication once, you can schedule data loads from Salesforce into Redshift, and Skyvia will automatically import Salesforce data into Redshift incrementally (meaning that only modified data will be loaded).
Let’s describe how to configure replication in Skyvia. First, you need to register a Skyvia account (it’s free). Then create connections to your data sources – Salesforce and Redshift.
CREATING CONNECTIONS
To create a connection, select New and then select Connection on the left. Then select the respective connector. Skyvia supports a large number of connectors including Redshift and Salesforce connectors.
To create a connection to Salesforce, click Sign In with Salesforce and then sign in to Salesforce. Then you can enter a connection name and save the connection.
Alternatively, you can select Username and Password Authentication and specify your username, password, and security token.
For Redshift connection, you need to specify the corresponding connection parameters for Redshift.
Also, don’t forget to click Advanced Settings and enter the parameters for connecting to Amazon S3 storage.
CREATING REPLICATION PACKAGE
Now let’s proceed to building a Salesforce to Redshift pipeline. Select New and then, under Integration, select Replication. Then, in the replication package editor, select the source and target connections. After selecting the connections, all we need to do is select the Salesforce objects to replicate to Redshift.
Then you can also click Schedule and set a schedule for automatic replication.
After this, the Salesforce to Redshift replication will export data from Salesforce to Redshift automatically and keep the data in Redshift up to date without any user interaction. It can be used with any Salesforce edition.
As you can see, automating Salesforce and Redshift integration with Skyvia is very simple and requires just a few minutes. However, it’s often necessary not just to load data from Salesforce to Redshift, but also to load it back. For example, this is important for Salesforce data archiving, when you move legacy Salesforce data to Amazon Redshift to save on Salesforce storage costs but still want to be able to access it from Salesforce from time to time. Let’s see how Skyvia can help you integrate Amazon Redshift with Salesforce.
Integrate Redshift to Salesforce in 3 Steps
If you simply need to load data from Redshift to Salesforce, you can use the Skyvia Import or Data Flow tools. These are ETL tools that can extract data from a data source, transform it if necessary, and load it to the target. But in the case of Salesforce data archiving, loading data back into Salesforce is not always suitable. So we will consider a different scenario: connecting Redshift to Salesforce as an external data source via Salesforce Connect.
Salesforce Connect allows you to connect OData endpoints as external data sources and work with OData entities as with Salesforce objects. Skyvia allows you to publish data from different data sources as OData endpoints and offers the Skyvia Connect product for this. Using Skyvia Connect and Salesforce Connect together, you can publish Redshift data as an OData endpoint and then link it to Salesforce as an external data source.
Suppose we have loaded legacy accounts and contacts to Redshift. Let’s see how we can link them back via Salesforce Connect and Skyvia Connect.
STEP 1: CREATING ODATA ENDPOINT IN SKYVIA CONNECT.
1. Select New and then, under Connect, select OData Endpoint.
2. Click Simple mode.
3. Select your Redshift connection.
4. Select the tables whose data you want to publish. In our case, we select the account and contact tables.
5. Let’s change the generated entity names and add the ‘legacy_’ prefix to them so that the Salesforce external objects have names easily distinguishable from the predefined Salesforce Account and Contact objects. To do it, click the pencil icon near the table name. This opens the Edit Entity dialog box.
In this dialog, you can edit the resulting entity and entity set names, as well as the names of entity properties. You can also hide certain properties so that they are not published via the OData endpoint. Here, we need to add the ‘legacy_’ prefix to the values of the Name and Entity Set boxes, for both the account and contact tables. After this, click Next step.
6. In the next step, we configure security settings. Here we can add usernames and passwords if we want our endpoint to require authentication, and we can restrict access to the endpoint by specifying IP address ranges.
To add authentication, click Public and then +Add new. Then enter the required user name and password. For example, we will add the user skyvia with the password skyviapassword for this endpoint, but in non-test environments, we recommend that you use more secure passwords.
In a similar way, you can add IP Address ranges to allow access only from these ranges. After you finish configuring endpoint security, click Next step.
7. Finally, we need to specify the endpoint name and configure a couple of additional endpoint options, namely the OData protocol version and whether the endpoint will allow writing data. Skyvia supports both OData v1-v3 and OData v4, and it allows creating writable OData endpoints, so you can modify their data in Salesforce using Salesforce Connect. After you configure these settings, click Save endpoint.
Our OData endpoint is ready, and we can copy its URL to link it in Salesforce Connect.
STEP 2: CREATING EXTERNAL DATA SOURCE IN SALESFORCE
First, we need to say that Salesforce Connect is not available in all editions: it is available only in the Enterprise, Performance, and Unlimited editions (as well as in the Developer edition) and comes at an extra cost.
To create an external data source in Salesforce, we need to perform the following steps:
1. In the Salesforce Setup, type “External” (without quotes) in the Quick Find box and, under Develop, click External Data Sources.
2. Click New External Data Source.
3. Enter values for External Data Source (a user-friendly name) and Name (unique external data source identifier) boxes.
4. In the Type list select Salesforce Connect: OData 4.0.
5. Paste the endpoint URL into the URL box.
6. Select the Writable External Objects checkbox if you use an endpoint to a writable data source.
7. Optionally, configure other settings, like High Data Volume depending on your data in the data source.
8. If you created an endpoint with user accounts and passwords, you need to configure the Authentication settings. Set Identity Type to Named Principal, Authentication Protocol to Password Authentication, and specify the endpoint Username and Password. In our example, they are skyvia and skyviapassword.
9. We have specified all the necessary settings, so click Save.
10. Select the tables you want to sync and click Sync. This will create the necessary external objects automatically.
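After the sync, the external objects can be queried much like regular Salesforce objects, for example with SOQL. A minimal sketch (Salesforce appends the __x suffix to external object names; the exact object and field names below are assumptions that depend on your sync results):

SELECT ExternalId, name__c
FROM legacy_account__x
LIMIT 10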
STEP 3: CREATING TABS FOR EXTERNAL OBJECTS
We have created external objects in Salesforce, so now let’s create tabs in the Salesforce interface to work with them.
1. In the Salesforce Setup, type “Tabs” (without quotes) in the Quick Find box and, under Create, click Tabs.
2. Click New.
3. Select the required Object and set the Tab Style and Description.
4. Click Next.
5. Specify the tab visibility settings and click Next.
6. Configure the tab availability for custom apps and click Save.
More Tools for Salesforce and Redshift
Skyvia offers more data-related tools than those described in this article. For example, it allows you to perform Salesforce backup and restore. It also provides a powerful online SQL client for cloud and database data, from which you can query Salesforce with SQL or mass-update Salesforce records. Skyvia’s pricing plans are not based on the number of data sources or Salesforce users, and it has free pricing plans for its products that allow you to use Skyvia with small data volumes for free.