What are Data Flows

23-01-2019  0 Comment(s)

The new Azure Data Factory (ADF) Data Flow capability is analogous to those from SSIS: a data flow allows you to build data transformation logic using a graphical interface. A really interesting aspect about ADF Data Flows is that they use Azure Databricks as the runtime engine underneath -- however, you don't actually have to know Spark or Databricks in order to be able to use ADF Data Flows. The goal is for it to be a low code/no code way to transform data at scale.

Power BI Dataflows (yes, this one is branded as one word) are a new type of object in a Power BI Workspace which will allow you to load data into a Common Data Model. Data is loaded via a web-based version of Power Query, which is why this capability is referred to as self-service data prep. The resulting data is stored in Azure Data Lake Storage Gen 2. Once in the Common Data Model in the data lake, it can be reused among various Power BI datasets -- allowing the data load, transformations, and cleansing to be done once rather than by numerous PBIX files. This capability was known for a little while during the private preview as Power BI Datapools or as 'Common Data Service for Analytics' (CDS-A) -- but the final name looks like it's going to be Power BI Dataflows.

It's still early so there's not a lot of info available online yet. James Serra wrote up a nice summary and has a few links on his blog. Also, here's a diagram that Chris and I included in the recently updated whitepaper Planning a Power BI Enterprise Deployment which shows our initial understanding of the Power BI Dataflows capability:

Comment Here


No Comments to Show