Is Power BI ETL even a thing? It may be a surprise if you know Power BI mainly as a data visualization tool. But part of business intelligence is data preparation. And Power BI is not disconnected from that. As you will see later, there’s more to Power BI than meets the eye.
So, are there data preparation techniques or methods in Power BI? Read on to find out. Here’s a quick overview of what we’re going to cover:
Table of contents
- Is Power BI an ETL Tool?
- ETL Processes and Capabilities in Power BI
- Using Power Query for ETL
- ETL with Power BI Dataflows
- Skyvia ETL Tool for Power BI
- Conclusion
Is Power BI an ETL Tool?
Power BI is a business intelligence tool. It’s kind of an ETL tool because ETL is part of business intelligence.
To clarify this further, let’s revisit what ETL means. ETL is Extract, Transform, Load. It’s a form of a data pipeline to integrate various data sources. Without it, your analytical reports and dashboards look old because of outdated data. ETL helps update them so your reports are current.
Meanwhile, here’s how Microsoft defines Power BI: Power BI is a collection of software services, apps, and connectors that work together to turn your unrelated sources of data into coherent, visually immersive, and interactive insights. – Microsoft Docs
The “visually immersive and interactive insights” part is the output we all see. But note that it’s “a collection of services, apps, and connectors”. And you guessed it right. ETL is one of them.
So, Power BI has an ETL tool plus more. The ETL part makes the data coherent. Meanwhile, the data modeling and visualization part makes it a very appealing tool for data analysis. Moving forward, let’s discuss its capabilities and harness ETL using Power BI.
ETL Process and Capabilities in Power BI
When you connect to a data source in Microsoft Power BI, you pick the tables you need for your report or dashboard. It can be product sales or whatever. Underneath that is Power Query – the ETL engine. If you didn’t perform any transformations, you may not have seen it.
So, here are the Power BI ETL capabilities:
- Able to connect to various data sources and extract data from them.
- A visual editor that allows hundreds of data transformations, like sorting, grouping, column splitting, and more.
- The M language. Any transformation you do visually is written in M. You can also edit the code using the Advanced Editor.
- Perform AI insights through Azure Machine Learning and Cognitive Services.
Power Query is also available in other Microsoft products aside from Power BI. These include Excel, SQL Server Analysis Services, and more. So, you can reuse the Power Query skills you gain from one product to another.
But how does it work? The diagram below shows a typical ETL process. But in the case of Power Query, the destination is limited to where it is supported. In our case, this is Power BI.
Let’s discuss ETL tools Power BI has to offer through examples.
Using Power Query for ETL
Let’s start with the Power BI ETL tool using Power Query. Whether you’re using the Power BI Desktop or Power BI Service, you’ll be using Power Query. Also, using Power Query Desktop is free. So, learning it from here is a good starting point.
To see this in action, let’s use Power Query ETL for a data warehouse in Power BI. For more details about the data structure of this data warehouse, please refer to this article.
STEP 1: Open Power BI and Connect to the Data Source
For this example, we’re using SQL Server Database as the data source.
Enter the connection settings for the data source. You may be asked for a user ID and password upon clicking OK.
STEP 2: Select the Tables You Need
Then, another window will appear to ask you to pick the tables you need. Select tables from the data source.
The screenshot shows the tables within the data warehouse. Mark checked all the dimensions and fact tables. Then, select Transform Data. At this point, we’re done with the extraction part of the ETL.
STEP 3: Transform the Data Based on Your Requirements
There are many data transformations you can do in Power Query. But in this example, let’s assume you need to group the data.
CREATE A NEW QUERY BY REFERENCING AN EXISTING TABLE
First, make a reference to the FactFireInsuranceSales table. See below how to do it.
Right-click the FactFireInsuranceSales table. Then, select Reference. Another query called FactInsuranceSales(2) will appear. Then, rename this to MonthlyProductSales by selecting it and pressing F2.
GROUP THE DATA WITH AN AGGREGATION
Referring to the screenshot, here’s how it is done:
- Click Group By.
- Then, click Advanced. We need to group using 2 columns. So, Basic won’t do.
- Click Add grouping. And when a dropdown list appears, select product_id.
- Then, define the aggregation. Enter Total Premium in New column name. Then, select the Operation. And last, select the premium column to sum.
- Finally, click OK to create the grouping.
The final result of the transformations we did is seen in the next screenshot.
All the steps needed appear in the right pane under Applied Steps. You can select any of them to modify a step. Meanwhile, all the steps generated the M language code seen below.
let
Source = FactFireInsuranceSales,
#"Expanded dimDate" = Table.ExpandRecordColumn(Source, "dimDate", {"year_month_number"}, {"dimDate.year_month_number"}),
#"Grouped Rows" = Table.Group(#"Expanded dimDate", {"dimDate.year_month_number", "product_id"}, {{"Total Premium", each List.Sum([premium]), type number}})
in
#"Grouped Rows"
Once you’re done, click Close & Apply. And your data will be imported to Power BI Desktop as the destination of this ETL setup.
ETL with Power BI Dataflows
ETL with Power BI Dataflows will take your ETL process to the next level. Read on to know more.
What Are Power BI Dataflows?
Power BI Dataflows are collections of entities or tables created and managed in workspaces in the Power BI service. This is much like a table in your database. And then, you update them using refresh schedules. It works by using Power Query and loads the result to Azure Data Lake Storage Gen2.
Before we begin, you need a Power BI Service with a premium license.
5 Easy Ways to Do ETL with Power BI Dataflows
Creating Dataflows is similar to the one we did with Power BI Desktop because of ETL Power Query. But first, log in to your Power BI Service. Then, open your workspace and click New. Then, select Dataflow.
Then, you will see 4 options to create a Dataflow as shown below on the screenshot.
These 4 options allow you to use different methods in creating Dataflows. These are explained next.
#1. CREATE DATAFLOWS USING NEW TABLES
Refer to the above screenshot and click Add new tables to use this first method. This allows you to connect to a data source and select the tables you need.
Use cases: Any table or entity in a data source you wish to perform ETL. This can be sales records, customer information, and many more.
So, the next step is to connect to your data source. We’re going to use the same SQL Server database in this example. First, you select the SQL Server data source for your Dataflow.
Second, you configure connection settings for a SQL Server data source in Power BI Dataflow.
Then, you choose the tables you need. See an example screenshot below.
Once you checkmark all the tables you need, click Transform data. You can now perform similar functions like grouping data. Finally, don’t forget to name and save your new Dataflow.
#2. CREATE DATAFLOWS USING LINKED TABLES
The second method involves creating a link to an existing table. This allows you to reuse an existing table.
Use cases: Any table you can reuse in different Dataflows. One example is reusing a Date dimension table in a data warehouse.
To start with this method, refer to four options in creating a new Dataflow, which we described above. Then, click Add link tables. You will be taken to a page that lists all the Dataflows and tables in a tree hierarchy. Select the table you wish to link. Then, click Transform Data. See the screenshot below and, finally, click Save and Close.
#3. CREATE DATAFLOWS USING COMPUTED TABLES
The third method involves referencing a table and performing in-storage computations. Since it’s just a reference to a table, calculations will not involve the external data source. The result is a new table called a computed table.
Tables and linked tables can be edited. And you can start creating computed tables from here. To do that, right-click a table and mark Enable load as checked. Then, create a Reference to it. Check this out in the screenshot below.
Then, rename the resulting computed table and do any transformations as needed.
#4. CREATE DATAFLOWS USING IMPORT/EXPORT
The fourth method is using the Import/Export model. This allows you to export an existing Dataflow to a JSON file. Then, use the Import model to create a Dataflow in another workspace using the exported file.
Now, refer again to four options in creating a new Dataflow, which we described above. To import the model we just exported, go to another workspace, and create a new Dataflow. And then choose the Import model. Then, browse for the JSON file you exported and click OK.
#5. CREATE DATAFLOWS USING CDM FOLDER
The fifth method of creating a Dataflow is through the use of a CDM folder. This allows you to reference a table that has been written by another application in the Common Data Model (CDM) format. You need to provide the complete path to the CDM format file stored in Azure Data Lake Storage Gen 2. Note that you also need to have permission to access that folder.
To do that, refer to four options above. Then, create a new Dataflow and click Create and attach. Then, see what you need to input next and, finally, click Create and attach.
Consuming a Dataflow in Power BI
You can use the Dataflows you created in Power BI Desktop. See how this is done in the screenshot below.
Selecting Power BI dataflows requires you to sign in to Power BI Service. Then, you need to choose the tables you need in your Power BI report.
There’s so much more about Dataflows as an ETL tool in Power BI than we can discuss in this article. Visit these valuable links for more information:
- Benefits of using Power BI Dataflows and their use-cases.
- Configuring Incremental Refresh for faster loading.
- Limitations and considerations in using Power BI Dataflows.
- Power BI ETL best practices using Dataflows.
Skyvia ETL Tool for Power BI
Skyvia is a cloud data platform that combines both ETL and Reverse ETL elements. Skyvia makes it easy to put (replicate) data into DWH to be further analyzed through Power BI (analytics reports, visualization, etc) by offering ETL functionality. In addition to this, Skyvia also offers the Reverse ETL functionality, which returns the required actionable data back to the operational system. That means that any user can not only build dashboards in Power BI, but also return the data back to operational systems if necessary. What’s also worth mentioning is that Skyvia is a completely code-free solution, and you don’t need to install it to use.
To replicate data into DWH, you can use Skyvia Replication. And further use data for reporting and visualization by means of BI tools.
To send data back to any operational system, use Skyvia Import. More about Reverse ETL can be found in this article.
Conclusion
You’ve learned a great deal today in using ETL in Power BI. You’ve also learned a more flexible option of using Skyvia ETL to power your reports in Power BI.
Do you have any questions and comments? Then, write them in the Comments section below.