Azure Data Factory (ADF), is a great tool to schedule and orchestrate cloud activities. Although ADF has many features to cover various data integration needs, it is not as flexible as its on-premises predecessor, SQL Server Integration Services (SSIS). To address these limitations, ADF v2 now includes Execute SSIS activity. In this post, I leverage SSIS with Power Query as a Source for transformation.
When an Excel file is posted in the Microsoft teams channel, it should be picked up for processing and after processing the processed data should be loaded into the Azure SQL database. For Processing the data as I mentioned we use Power Query as a Source in SSIS Package.
The Solution proposed here involves direct interaction of logic apps between Microsoft Teams, Azure Blob Storage, and Azure Data Factory(ADF). This Solution also involves SSIS – IR to run SSIS packages in ADF Pipeline.
Tools and Services used
- Visual Studio code for development of SSIS package
- Azure Logic Apps
- Azure Data Factory (ADF)
- Azure Blob Storage
- Azure Sql Database
Part 1 Developing SSIS Package in Visual Studio
A simple SSIS Package has been developed as shown below, we used SuperBowl Dataset for testing.
Execute SQL Task Will Delete all the records in Azure SQL Database Table before loading any, Since this is a routine procedure in Prod to clear the staging table before loading it with new data.
Azure SQL DB schema
Configure connection properties for Azure SQL DB and input SQL Statement as shown below
Data Flow Task
The Dataflow task comprises Power Query Source, Data Conversion, and OLE DB Destination
Power Query Source
The adoption of Power BI Desktop and Power BI service has been gaining movement ever since it has been released. Power Query is used to import/transform the data into the model and Power View is used to visualize the data in reports.
The ability to load and transform the various data sources using menus makes the tool widely available to business users. The Power Query functional language known as “M” is at the heart of the import/ transform process. Microsoft announced the general release of SSIS Projects version 3.3 for Visual Studio 2019. Now any Power Query developed by the business users can now be replaced by an SSIS data flow. This package can be scheduled to run on-premises or in-cloud. Alerting can be added to the package for notification of errors during execution. The data source of the Power BI report can be changed to a pre-processed SQL table instead of the original data source that requires transformation. An UPSERT data movement pattern can guarantee that data is always available for reporting.
But using Power Query as a source in the SSIS package is a bit tricky, and it involves a lot of tweaks to be made in order to work properly. In this post, I discuss one of the methods of how to make Power Query Source extract data from EXCEL sheets located in Azure Blob and load the transformed data into Azure SQL DB. With this solution, any business user can have self-serve data transformation on cloud-scale writing, no code.
Power Query M Script Development
Power Query Scripts are developed from Power BI Desktop. Using Power BI Desktop we can do all necessary transformations and finally, we can copy the Power Query query from the advanced Editor.
Copy the M code and paste it into SSIS Power Query Source in SSIS Package
Once pasted, go to connection manager and click detect data source. After that, create a new connection with Azure Blobs as a source, pass on appropriate credentials, and test the connection.
To load data into Azure SQL DB, we need to do some data conversion to match the destination table data type.
Once it has been done, go to OLE DB Destination and do the necessary connection and mappings
Once all mappings and connections are set, make sure that you do not have any errors,
Warnings may be ignored for testing. Now we execute the SSIS package.
Now verify the target table
Check the record count, which should match with SSIS Pkg execution results.
Now we have a working SSIS Pkg which pulls data from Azure Blob storage and does the transformation using Power query and loads into Azure SQL DB.
In the next part, we will execute this SSIS package in the Azure ADF pipeline using Execute SSIS package task.0