Tips and tricks: Handling large data in Power BI using the dataflow

Nishan
Aug 29, 2021

Overview of dataset size and limits

The dataset size Power BI can handle depends on the Premium plan, specifically on the maximum capacity of the SKU. Once published to a Premium workspace, a dataset can grow as large as 12 GB.

The recommended .pbix file size limits for upload or publish to the Power BI service, by SKU:

  • P1< 3 GB (If the premium plan is P1 then your .pbix file should be less than 3 GB)
  • P2< 6 GB (If the premium plan is P2 then your .pbix file should be less than 6 GB)
  • P3, P4, P5 up to 10 GB (If the premium plan is P3, P4, P5 then your .pbix file should be less than 10 GB)

Large data load and Import Mode

An import model is loaded when it is queried (interactive operation), or if it needs to be refreshed (background operation).

When working with large data in Import mode, people often encounter slow response times and long data loads in Power BI Desktop; the wait can stretch up to an hour. Before the user can continue working on the report, the import model has to process the query and fully load the data into memory, which is time consuming.

Even a small change in the Power Query Editor triggers a full reload of the data.

Dataflow


Simply put, a dataflow is a flow of data from one point to another. Dataflows are pretty common in Power BI these days, and a dataflow is in many ways equivalent to the Power Query Editor in Power BI Desktop: users can shape and transform the data in the dataflow itself, along with some additional features.

The range of dataflow use cases keeps expanding, and it has become a handy instrument for particular needs. Once a dataflow is set up, users can bypass the need to deal with the gateway in the Power BI service. Incremental refresh is considered one of the major benefits of using a dataflow.
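
To make the comparison concrete, here is a minimal sketch of what a dataflow entity's query could look like in the dataflow's Power Query (M) editor. The server, database, table, and column names are hypothetical; the point is that these are the same kinds of steps you would author in Power BI Desktop.

```
// Hypothetical dataflow entity written in Power Query M.
// "ServerName", "SalesDB", "FactSales" and the column names are placeholders.
let
    Source = Sql.Database("ServerName", "SalesDB"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // The same kind of shaping steps you would build in Power BI Desktop's Power Query Editor
    RemovedColumns = Table.RemoveColumns(FactSales, {"ETLBatchID"}),
    ChangedTypes = Table.TransformColumnTypes(RemovedColumns, {{"OrderDate", type date}, {"SalesAmount", type number}})
in
    ChangedTypes
```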

Dataflows for large data


A dataflow is really useful for avoiding query execution timeouts and the long waits of the import model (where the data has to fully reload on any change). By limiting the amount of data in the dataflow and using it to develop the report, the user can save a good amount of time and work on the report effectively, without worrying about endless waits for queries to execute.
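
As a rough illustration, the "cap" can be nothing more than an extra filter step in the dataflow's query. The sketch below uses hypothetical names and keeps only the last 30 days of a fact table while the report is being developed; deleting that single step later releases the cap.

```
// Sketch of a development-time data cap (all names are placeholders).
let
    Source = Sql.Database("ServerName", "SalesDB"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // Development cap: keep only the last 30 days of data while building the report.
    // Remove this step once the report is published to pull the full table again.
    RecentRows = Table.SelectRows(
        FactSales,
        each [OrderDate] >= Date.AddDays(Date.From(DateTime.LocalNow()), -30)
    )
in
    RecentRows
```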

Below are the steps to perform this operation:

  1. Create a dataflow.
  2. Limit the amount of data in the dataflow. Keep only enough data to develop the report, for example with a filter step like the one sketched above.
  3. Identify a column that can help you audit the records, e.g. a date column or any column holding the primary key.
  4. Start using the dataflow to develop the report in Power BI Desktop. You will notice the data loading faster.
  5. After you’re done working on the report, publish it to the workspace.
  6. Open the dataflow you’ve created (the dataflow resides in the workspace) and release the data cap, for example by removing the filter step. Now the dataflow is pulling everything.
  7. Refresh the dataflow so it captures every record from the table. This will take quite a bit of time since you’ve released the data cap from the dataflow.
  8. Once the dataflow refresh is complete, refresh the dataset. This will also take a good amount of time.
  9. Apply incremental refresh on the dataflow (see the sketch after this list). This helps your dataflow and dataset refresh faster by pulling only those records that are not already in the tables. After the first complete run with incremental refresh applied, refresh times on both the dataflow and the dataset should drop.
  10. Repeat this process every time you’re working on a report that pulls data from large tables.
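
For step 9, dataflow incremental refresh is configured through the entity's incremental refresh settings in the Power BI service rather than by hand-writing parameters. Conceptually, though, it comes down to the same filter pattern Power BI Desktop uses with its reserved RangeStart and RangeEnd datetime parameters: only the rows inside the refresh window are reloaded. The sketch below illustrates that pattern with hypothetical names.

```
// Illustrative only: the filter pattern behind incremental refresh.
// RangeStart and RangeEnd are the reserved datetime parameters Power BI uses for
// incremental refresh in Desktop; a dataflow applies an equivalent window for you
// once incremental refresh is configured on the entity.
let
    Source = Sql.Database("ServerName", "SalesDB"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // Only rows whose OrderDateTime falls inside the current refresh window are reloaded.
    Windowed = Table.SelectRows(
        FactSales,
        each [OrderDateTime] >= RangeStart and [OrderDateTime] < RangeEnd
    )
in
    Windowed
```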

Furthermore, the August 2021 Power BI Desktop update introduced Evaluation configuration settings, which let queries be evaluated simultaneously instead of one by one, contributing to faster processing times. You can follow the complete August 2021 feature updates here.
