Run SSIS Project in Azure SSIS-IR with Power Query as a Source using Azure File Share and Call SSIS Packages in an ADF Pipeline

Introduction

Azure Data Factory (ADF) is a great tool to schedule and orchestrate cloud activities. Although ADF has many features to cover various data integration needs, it is not as flexible as its on-premises predecessor, SQL Server Integration Services (SSIS). To address these limitations, ADF v2 now includes an Execute SSIS Package activity. In this post, I leverage SSIS with Power Query as a Source for transformation.

Business Requirement

When an Excel file is posted in a Microsoft Teams channel, it should be picked up for processing, and the processed data should then be loaded into an Azure SQL database. As mentioned, we use Power Query as a Source in the SSIS package to process the data.

Solution

Solution Architecture

The solution proposed here uses Azure Logic Apps to connect Microsoft Teams, Azure Blob Storage, and Azure Data Factory (ADF). It also uses the Azure-SSIS Integration Runtime (SSIS IR) to run the SSIS packages from an ADF pipeline.

Tools and Services used

  1. Visual Studio 2019 (SSIS Projects extension) for SSIS package development
  2. Azure Logic Apps
  3. Azure Data Factory (ADF)
  4. Azure Blob Storage
  5. Azure SQL Database

Part 1: Developing the SSIS Package in Visual Studio

A simple SSIS package has been developed, as shown below; we used the Super Bowl dataset for testing.

SSIS Pkg

The Execute SQL Task deletes all records in the Azure SQL Database table before loading any new ones, since clearing the staging table before loading it with new data is a routine procedure in production.

DELETE  ssis.[superbowl_dest]

Azure SQL DB schema
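
For reference, below is a minimal sketch of what the destination staging table could look like. The column names and data types are assumed from the public Super Bowl dataset and are purely illustrative; the actual schema is the one shown in the screenshot above.

-- Hypothetical staging table (assumed columns and types; adjust to the actual schema above)
CREATE TABLE ssis.[superbowl_dest]
(
    [SB]         NVARCHAR(10)  NULL,  -- Super Bowl number, e.g. 'LIV'
    [Date]       DATE          NULL,
    [Winner]     NVARCHAR(100) NULL,
    [Winner_Pts] INT           NULL,
    [Loser]      NVARCHAR(100) NULL,
    [Loser_Pts]  INT           NULL,
    [MVP]        NVARCHAR(100) NULL,
    [Stadium]    NVARCHAR(100) NULL,
    [City]       NVARCHAR(100) NULL,
    [State]      NVARCHAR(50)  NULL
);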

Configure the connection properties for the Azure SQL DB and enter the SQL statement as shown below.

Execute SQL Task

Data Flow Task

The Data Flow Task consists of a Power Query Source, a Data Conversion transformation, and an OLE DB Destination.

Data Flow Task

Power Query Source

Adoption of Power BI Desktop and the Power BI service has been gaining momentum ever since their release. Power Query is used to import and transform the data into the model, and Power View is used to visualize the data in reports.

The ability to load and transform various data sources through menus makes the tool widely accessible to business users. The Power Query functional language, known as "M", is at the heart of the import/transform process. With Microsoft's general release of SSIS Projects version 3.3 for Visual Studio 2019, any Power Query query developed by business users can now be replaced by an SSIS data flow. The package can be scheduled to run on-premises or in the cloud, and alerting can be added to the package for notification of errors during execution. The data source of the Power BI report can then be changed to a pre-processed SQL table instead of the original data source that requires transformation, and an UPSERT data movement pattern can guarantee that data is always available for reporting.

However, using Power Query as a source in an SSIS package is a bit tricky, and it involves a number of tweaks to work properly. In this post, I discuss one method of making the Power Query Source extract data from Excel sheets located in Azure Blob Storage and load the transformed data into Azure SQL DB. With this solution, any business user can have self-serve data transformation at cloud scale without writing any code.

Power Query M Script Development

Power Query scripts are developed in Power BI Desktop. There we can apply all the necessary transformations and then copy the resulting M query from the Advanced Editor.

Power Query Output
PQ_Advanced Editor

Copy the M code and paste it into the Power Query Source in the SSIS package.

PQ_Query_SSIS

Once pasted, go to the connection manager and click Detect Data Source. Then create a new connection with Azure Blob as the source, supply the appropriate credentials, and test the connection.

PQ_Blob_TestConnection

To load the data into Azure SQL DB, we need to perform some data conversions to match the destination table's data types.

Data Type Conversion
Data Type Conversion Settings

Once that is done, go to the OLE DB Destination and configure the necessary connection and column mappings.

OLE Db
OLE Db Mappings

Once all mappings and connections are set, make sure that there are no errors.

Final SSIS Pkg

Warnings may be ignored for testing. Now we execute the SSIS package.

Successful SSIS Pkg Execution
Successful Data Flow Task Execution

Now verify the target table.

Target table with data loaded by running SSIS Pkg
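
As a quick check, the loaded rows can be inspected directly; a minimal sketch, using the same destination table name as the package:

-- Inspect a sample of the rows loaded by the package
SELECT TOP (10) *
FROM ssis.[superbowl_dest];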

Check the record count, which should match the SSIS package execution results.

Record count
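
To compare against the execution results, a simple count of the destination table (same assumed table name as above) should line up with the rows written by the Data Flow Task:

-- Row count should match the rows reported by the Data Flow Task
SELECT COUNT(*) AS loaded_rows
FROM ssis.[superbowl_dest];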

We now have a working SSIS package that pulls data from Azure Blob Storage, transforms it using Power Query, and loads it into Azure SQL DB.

In the next part, we will execute this SSIS package in an ADF pipeline using the Execute SSIS Package activity.
