Table of contents
- Key Features of Salesforce
- Key Features of AWS Redshift
- Why Do You Need to Connect Salesforce to Redshift?
- How to Connect Salesforce to Redshift Using Native Tools
- How to Connect Salesforce to Redshift Using Third-Party Tools
- More Tools for Salesforce and Redshift
These days, one can hardly find a business owner or high-level manager who does not understand the importance of data. Data helps them better understand their customers’ needs, analyze the processes in their companies, study competitors, and make important business decisions. Working with data is now a part of nearly every task and process.
Since data is involved in so many different tasks, there are many different tools for storing, analyzing, using, and processing it. For example, companies use CRM software like Salesforce to work with customers and to collect and process data on their customers and all interactions with them. For general data storage and analysis, companies usually use relational databases or cloud data warehouses. Amazon Redshift is currently one of the most popular cloud data warehouses.
However, it’s not always convenient to have different data locked in different tools. Data must work together in order to provide you with a 360-degree view of your customers and of the situation in general. It is important to have your data connected, and this article on Salesforce to Redshift integration is here to help. It describes in detail how to integrate Amazon Redshift with Salesforce both manually and automatically.
Key Features of Salesforce
Salesforce is the most popular cloud CRM on the market. It has become the de facto standard for customer relationship management and is used by almost one in five businesses. Besides customer relationship management, Salesforce offers a unified platform with a number of additional tools for marketing (Salesforce Marketing Cloud), data analysis (such as Salesforce Wave Analytics), customer service, and more. The platform’s functionality can also be extended with custom or third-party applications.
Key Features of AWS Redshift
Amazon Redshift is part of the Amazon Web Services (AWS) platform. It is a Massively Parallel Processing (MPP) data warehouse that splits data processing tasks and runs them simultaneously on multiple processors, providing exceptional performance for complex analytical queries over huge volumes of data. It can store and process data at petabyte scale and offers relatively cheap storage per gigabyte compared to Salesforce. Like Salesforce in CRM, it is a leader in the market of cloud MPP data warehouse solutions.
Why Do You Need to Connect Salesforce to Redshift?
Let’s see why Salesforce and Redshift integration is important and how it can be useful for business. First, you may want to load Salesforce data into Redshift to use Redshift’s immense performance for data analysis. While Salesforce has its own data analysis tools, some of them come with an additional price tag, and connecting Salesforce to Redshift can be a better option. Besides, you can also load data from other applications and tools used in the company into Redshift and analyze all the data together, building a 360-degree business view.
Second, you may simply want a copy of your Salesforce data elsewhere, and Redshift can be a cost-effective place to store it. In this case, you can build a Salesforce to Redshift pipeline that pulls Salesforce data whenever it is updated and applies the updates to Redshift. This way, you will have an up-to-date copy of your Salesforce data in Redshift and can access it whenever you need.
Another scenario is Salesforce data archiving. Storing large data volumes in Salesforce can be quite costly. If a significant portion of this data consists of legacy records that you rarely need to access but must keep for legal and compliance reasons, a good option is to move them from Salesforce to Redshift. Keep the fresh records you actively work with in Salesforce and archive the rest to a Redshift database. There are even ways to connect Salesforce to Redshift and query Redshift data from the Salesforce interface. We describe how to build such Redshift to Salesforce integration later in this article.
How to Connect Salesforce to Redshift Using Native Tools
Salesforce offers several native ways and tools for connecting Salesforce to Redshift. However, the most convenient of them usually come at an additional cost and are not available in all Salesforce editions.
Let’s list some of the ways that Salesforce provides via its own tools:
- MuleSoft Anypoint Platform. This is a Salesforce-owned solution for automatic data integration that supports Salesforce and Redshift integration as well as integration with other apps and systems. It comes at an additional cost that depends on data sources and use cases; there is no well-defined published pricing, so you need to contact them for a quote.
- CSV Export and Import to Redshift. As an alternative, you can manually export Salesforce data to CSV files and then import them to Redshift. If you have the Salesforce Enterprise, Performance, Unlimited, or Developer edition, you can use the Data Loader export wizard to export data to CSV files. However, if you have Salesforce Professional or Essentials, the Data Loader export wizard is not available. In this case, you can use the Salesforce Export Backup Data feature, which allows you to export your data to CSV files every 29 days (or every week in the Enterprise, Performance, and Unlimited editions). After you export data to CSV files, you can upload them to Amazon S3 and then import them to Redshift using the COPY command. This approach involves a lot of manual actions but does not incur any additional costs. Let’s describe it in more detail.
Exporting Salesforce Data via Data Loader Export Wizard
First, you need to download and install Data Loader. You can download it from your Salesforce account settings. Open the Salesforce Setup, type “Data Loader” (without quotes) in the Quick Find box and, under Integrations, click Data Loader.
On the Data Loader page of Salesforce Setup, you can download the Data Loader application. Download, install, and run it. Then perform the following steps:
1. In the Data Loader window, click Export.
2. Click Log in and sign in to Salesforce, then click Next.
3. Select a Salesforce object to export data from and specify the CSV file name. Click Next. Data Loader allows you to export both predefined and custom objects.
4. On the next wizard page, you can configure the SOQL query used for exporting data. We only need to select the fields that we want to export, but if you want to filter data, you can also configure WHERE clauses. For our purposes, we can just click Select all fields and then click Finish.
5. Finally, click Yes to start export. The data will be exported to the specified file.
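Under the hood, the wizard simply builds a SOQL query from your selections. As an illustration, here is a minimal Python sketch of how such an export query string can be assembled (the field list and filter are hypothetical examples, not the full Account field set):

```python
def build_soql(sobject, fields, where=None):
    """Build a simple SOQL SELECT statement, optionally with a WHERE clause."""
    query = f"SELECT {', '.join(fields)} FROM {sobject}"
    if where:
        query += f" WHERE {where}"
    return query

# Illustrative subset of Account fields.
fields = ["Id", "Name", "Industry", "AnnualRevenue", "CreatedDate"]

print(build_soql("Account", fields))
print(build_soql("Account", fields, "AnnualRevenue > 1000000"))
```

The second call shows how a WHERE clause from step 4 of the wizard would end up in the query.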
Exporting Salesforce Data via Export Backup Data
To export data to CSV using the Salesforce Export Backup Data feature, perform the following steps:
1. Open the Salesforce Setup.
2. Type “Data Export” (without quotes) in the Quick Find box and, under Data, click Data Export.
3. Click Export Now. Note that this feature allows you to export data on a weekly basis or every 29 days, depending on your Salesforce edition, and it may not be available if you have used it recently.
4. Select Salesforce objects to export and click Start Export. You can export both predefined and custom objects.
The data export will start, and after some time, you will receive an email with a link to the exported data.
5. Click this link to open the page where you can download the export results.
Importing Data to Redshift
If you haven’t yet prepared tables for the data in Redshift, you need to create them first. For example, you can execute the following statement to create a table for Salesforce Accounts (note that your Account object may have custom fields not included in this statement):
CREATE TABLE "Account" (
  "Id" VARCHAR(18) NOT NULL,
  "IsDeleted" BOOLEAN NOT NULL,
  "MasterRecordId" VARCHAR(18),
  "Name" VARCHAR(255) NOT NULL,
  "Type" VARCHAR(255),
  "ParentId" VARCHAR(18),
  "BillingStreet" VARCHAR(255),
  "BillingCity" VARCHAR(40),
  "BillingState" VARCHAR(80),
  "BillingPostalCode" VARCHAR(20),
  "BillingCountry" VARCHAR(80),
  "BillingLatitude" DOUBLE PRECISION,
  "BillingLongitude" DOUBLE PRECISION,
  "BillingGeocodeAccuracy" VARCHAR(255),
  "BillingAddress" TEXT,
  "ShippingStreet" VARCHAR(255),
  "ShippingCity" VARCHAR(40),
  "ShippingState" VARCHAR(80),
  "ShippingPostalCode" VARCHAR(20),
  "ShippingCountry" VARCHAR(80),
  "ShippingLatitude" DOUBLE PRECISION,
  "ShippingLongitude" DOUBLE PRECISION,
  "ShippingGeocodeAccuracy" VARCHAR(255),
  "ShippingAddress" TEXT,
  "Phone" VARCHAR(40),
  "Fax" VARCHAR(40),
  "AccountNumber" VARCHAR(40),
  "Website" VARCHAR(255),
  "PhotoUrl" VARCHAR(255),
  "Sic" VARCHAR(20),
  "Industry" VARCHAR(255),
  "AnnualRevenue" DOUBLE PRECISION,
  "NumberOfEmployees" INTEGER,
  "Ownership" VARCHAR(255),
  "TickerSymbol" VARCHAR(20),
  "Description" TEXT,
  "Rating" VARCHAR(255),
  "Site" VARCHAR(80),
  "OwnerId" VARCHAR(18) NOT NULL,
  "CreatedDate" TIMESTAMP NOT NULL,
  "CreatedById" VARCHAR(18) NOT NULL,
  "LastModifiedDate" TIMESTAMP NOT NULL,
  "LastModifiedById" VARCHAR(18) NOT NULL,
  "SystemModstamp" TIMESTAMP NOT NULL,
  "LastActivityDate" DATE,
  "LastViewedDate" TIMESTAMP,
  "LastReferencedDate" TIMESTAMP,
  "Jigsaw" VARCHAR(20),
  "JigsawCompanyId" VARCHAR(20),
  "AccountSource" VARCHAR(255),
  "SicDesc" VARCHAR(80),
  CONSTRAINT "PK_Account" PRIMARY KEY ("Id")
);
After creating the necessary tables, you need to upload the CSV files to an Amazon S3 bucket. This is the easiest way of importing CSV files to Amazon Redshift manually. If you don’t have an Amazon S3 bucket or don’t know how to upload files there, please read the corresponding topic in the AWS documentation.
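For reference, a single file can be uploaded with the AWS CLI in one command; the bucket name and key below are placeholders you would replace with your own:

```shell
# Upload the exported CSV to an S3 bucket so that Redshift's COPY command
# can read it. Requires the AWS CLI configured with credentials that are
# allowed to write to the bucket.
aws s3 cp Account.csv s3://your-bucket/salesforce/Account.csv
```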
The uploaded CSV files can be imported into Redshift tables with the COPY command, which has the following syntax:
COPY table_name [ column_list ] FROM data_source CREDENTIALS access_credentials [options]
For example:
copy Account
from 's3://awssampledbuswest2/tickit/Account.csv'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
CSV IGNOREHEADER 1;
If the columns in the CSV file and in the table are the same, we can omit the column list. Note, however, that Data Loader exports CSV files with columns sorted alphabetically, so we may need to specify the column list for such files:
copy Account(AccountNumber, AccountSource, AnnualRevenue, BillingAddress, BillingCity,
BillingCountry, BillingGeocodeAccuracy, BillingLatitude, BillingLongitude, BillingPostalCode,
BillingState, BillingStreet, ChannelProgramLevelName, ChannelProgramName, CreatedById,
CreatedDate, Id, Industry, IsCustomerPortal, IsDeleted, IsPartner, Jigsaw, JigsawCompanyId,
LastActivityDate, LastModifiedById, LastModifiedDate, LastReferencedDate, LastViewedDate,
MasterRecordId, Name, NumberOfEmployees, OwnerId, Ownership, ParentId, Phone, PhotoUrl,
Rating, ShippingAddress, ShippingCity, ShippingCountry, ShippingGeocodeAccuracy,
ShippingLatitude, ShippingLongitude, ShippingPostalCode, ShippingState, ShippingStreet,
Sic, SicDesc, Site, SystemModstamp)
from 's3://awssampledbuswest2/tickit/Account.csv'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>'
CSV IGNOREHEADER 1;
These examples use an AWS Identity and Access Management (IAM) role to access the specified bucket, which is the recommended approach. You can also authenticate with your access key ID and secret access key, as in the example of the credentials clause below, but this is not recommended.
credentials 'aws_access_key_id=<your-access-key-id>;aws_secret_access_key=<your-secret-access-key>'
We need to add the CSV and IGNOREHEADER options because we are importing a comma-separated file with one header row.
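Because Data Loader writes columns alphabetically, you can derive the COPY column list directly from the CSV header instead of typing it by hand. A stdlib-only Python sketch (the header content is inlined for illustration; in practice you would open the real exported file, and the bucket and role ARN are placeholders):

```python
import csv
import io

# Sample of a Data Loader export: the header row lists columns alphabetically.
sample = "AccountNumber,AnnualRevenue,Id,Name\n123,500000,001xx0000001,Acme\n"

# Read the header row and turn it into a COPY column list.
header = next(csv.reader(io.StringIO(sample)))
column_list = ", ".join(header)

# Splice the column list into a COPY statement template.
copy_sql = (
    f"copy Account({column_list}) "
    "from 's3://your-bucket/salesforce/Account.csv' "
    "iam_role 'arn:aws:iam::<aws-account-id>:role/<role-name>' "
    "CSV IGNOREHEADER 1;"
)
print(copy_sql)
```

This keeps the column list in sync with whatever fields you selected during export.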
Limitations of Manual Integration
We have considered manual Salesforce to Redshift integration, which does not involve any additional costs and suits all Salesforce editions. This approach, however, requires quite a lot of manual actions and is not well suited for continuous integration, when data in Redshift needs to be regularly updated with fresh records from Salesforce. Let’s consider a better way of building a Salesforce to Redshift data pipeline, this time using third-party solutions.
How to Connect Salesforce to Redshift Using Third-Party Tools
There are a number of third-party tools that can be used for Salesforce and Redshift integration. We will describe one of them below.
Connector for Salesforce to Redshift Replication
Let’s consider Salesforce to Redshift replication using Skyvia. Skyvia is a powerful data platform with lots of data-related tools, including ETL tools to configure Salesforce to Redshift ETL (it also includes reverse ETL tools for extracting data from Redshift and loading it back into Salesforce). Its data replication tool offers an easy way to copy cloud data into a database or cloud data warehouse.
Replication configuration is very easy with Skyvia – you can set it up in under 5 minutes. After you’ve configured replication once, you can schedule data loads from Salesforce into Redshift, and Skyvia will automatically import Salesforce data into Redshift incrementally, meaning that only modified data will be loaded.
To create a connection, select New and then select Connection on the left. Then select the respective connector. Skyvia supports a large number of connectors including Redshift and Salesforce connectors.
To create a connection to Salesforce, click Sign In with Salesforce and then sign in to Salesforce. Then you can enter a connection name and save the connection.
Alternatively, you can select Username and Password Authentication and specify your username, password, and security token.
For Redshift connection, you need to specify the corresponding connection parameters for Redshift.
Also, don’t forget to click Advanced Settings and enter the parameters for connecting to Amazon S3 storage.
CREATING A REPLICATION PACKAGE
Now let’s proceed to building a Salesforce to Redshift pipeline. Select New and then, under Integration, select Replication. Then, in the replication package editor, select the source and target connections. After selecting connections, all we need to do is select the Salesforce objects to replicate to Redshift.
Then you can also click Schedule and set a schedule for automatic replication.
After this, the Salesforce to Redshift replication will export data from Salesforce to Redshift automatically and keep the data in Redshift up-to-date without any user interaction. It can be used with any Salesforce edition.
As you can see, automating Salesforce and Redshift integration with Skyvia is very simple and takes just a few minutes. However, it’s often necessary not just to load data from Salesforce to Redshift, but also to load it back. For example, this matters for Salesforce data archiving, when you move legacy Salesforce data to Amazon Redshift to save on Salesforce storage costs but still want to be able to access it from Salesforce from time to time. Let’s see how Skyvia can help you integrate Amazon Redshift with Salesforce.
Integrate Redshift to Salesforce in 3 Steps
If you simply need to load data from Redshift to Salesforce, you can use the Skyvia Import or Data Flow tools. They are ETL tools that can extract data from a data source, transform it if necessary, and load it to the target. But in the case of Salesforce data archiving, loading data back into Salesforce is not always suitable. So we will consider a different scenario – connecting Redshift to Salesforce as an external data source via Salesforce Connect.
Salesforce Connect allows you to connect OData endpoints as external data sources and work with OData entities as if they were Salesforce objects. Skyvia can publish data from different data sources as OData endpoints and offers the Skyvia Connect product for this. Using Skyvia Connect and Salesforce Connect together, you can publish Redshift data as an OData endpoint and then link it to Salesforce as an external data source.
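Once the integration is configured, external objects can be queried with SOQL much like regular Salesforce objects. A hypothetical sketch, assuming the ‘legacy_’ entity names we configure below and Salesforce’s standard “__x” suffix for external objects:

```sql
-- External objects are suffixed with "__x"; ExternalId and DisplayUrl are
-- standard fields that Salesforce adds to every external object.
SELECT ExternalId, DisplayUrl FROM legacy_account__x LIMIT 10
```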
STEP 1: CREATING ODATA ENDPOINT IN SKYVIA CONNECT.
1. Select New and then, under Connect, select OData Endpoint.
2. Click Simple mode.
3. Select your Redshift connection.
4. Select the tables whose data you want to publish. In our case, we select the account and contact tables.
5. Let’s change the generated entity names and add the ‘legacy_’ prefix to them. We do this so that the Salesforce external objects have names that are easily distinguishable from the predefined Salesforce Account and Contact objects. To do it, click the pencil icon near the table name. This opens the Edit Entity dialog box.
In this dialog, you can edit the resulting entity and entity set names as well as the names of entity properties. You can also hide certain properties so that they are not published via the OData endpoint. Here, we need to add the ‘legacy_’ prefix to the values of the Name and Entity Set boxes, for both the account and contact tables. After this, click Next step.
6. In the next step, we configure security settings. Here we can add usernames and passwords if we want our endpoint to require authentication, and we can restrict access to the endpoint by specifying IP address ranges.
To add authentication, click Public and then +Add new. Then enter the required username and password. For example, we will add the user skyvia with the password skyviapassword for this endpoint, but in non-test environments, we recommend using stronger passwords.
In a similar way, you can add IP Address ranges to allow access only from these ranges. After you finish configuring endpoint security, click Next step.
7. Finally, we need to specify the endpoint name and configure a couple of additional endpoint options, namely the OData protocol version and whether the endpoint will allow writing data. Skyvia supports both OData v1–v3 and OData v4, and it can create writable OData endpoints, so you can modify their data in Salesforce via Salesforce Connect. That’s all; after you configure these settings, click Save endpoint.
Our OData endpoint is ready, and we can copy its URL to link it in Salesforce Connect.
STEP 2: CREATING EXTERNAL DATA SOURCE IN SALESFORCE
First, note that Salesforce Connect is not available in all editions. It is available only in the Enterprise, Performance, and Unlimited editions (as well as in the Developer edition) and comes at an extra cost.
To create an external data source in Salesforce, we need to perform the following steps:
1. In the Salesforce Setup, type “External” (without quotes) in the Quick Find box and, under Develop, click External Data Sources.
2. Click New External Data Source.
3. Enter values for External Data Source (a user-friendly name) and Name (unique external data source identifier) boxes.
4. In the Type list, select Salesforce Connect: OData 4.0.
5. Paste the endpoint URL into the URL box.
6. Select the Writable External Objects checkbox if you use an endpoint to a writable data source.
7. Optionally, configure other settings, like High Data Volume depending on your data in the data source.
8. If you created an endpoint with user accounts and passwords, you need to configure the Authentication settings. Set Identity Type to Named Principal, Authentication Protocol to Password Authentication, and specify the endpoint Username and Password. In our example, they are skyvia and skyviapassword.
9. We have specified all the necessary settings, so click Save.
10. Select the tables you want to sync and click Sync. This will create the necessary external objects automatically.
STEP 3: CREATING TABS FOR EXTERNAL OBJECTS
We have created external objects in Salesforce, and now let’s create tabs in Salesforce interface to work with them.
1. In the Salesforce Setup, type “Tabs” (without quotes) in the Quick Find box and, under Create, click Tabs.
2. Click New.
3. Select the required Object and set the Tab Style and Description.
4. Click Next.
5. Specify the tab visibility settings and click Next.
6. Configure the tab availability for custom apps and click Save.
More Tools for Salesforce and Redshift
Skyvia offers more data-related tools than those described in this article. For example, it can perform Salesforce backup and restore. It also provides a powerful online SQL client for cloud and database data, from which you can query Salesforce with SQL or mass-update Salesforce records. Skyvia’s pricing plans are not based on the number of data sources or Salesforce users, and it offers free plans that let you use Skyvia with small data volumes at no cost.