Being able to effectively derive business values from the massive enterprise data is more important than ever to the IT-enabled business agility. In the era of digital transformations, businesses are shifting IT enterprise data solutions from on-premise to Cloud, to take advantage of the elasticity, scalability, security, and supportability of Cloud Storage and Compute power. With Microsoft Azure, one can store rich structured and unstructured data for analytics purposes.
When it comes to SAP data, most of the data sources are still reside in on-premise systems or IaaS systems. Bringing the massive and complex SAP data to Azure for scale-out analytics has been a challenge on the critical path to many SAP customers.
SAP Data Complexity
Integrating with SAP is always challenging for many SAP customers.
First, SAP’s data models are very powerful yet complex. A deep understanding of SAP system metadata and data models is extremely important. Second, SAP offers strong security control and governance on its data. In most cases, the SAP data integration needs to go through the application layer, which is not as straightforward as working with the database directly. This requires rich knowledge of SAP application layer APIs.
Third, doing incremental loads (or “delta loads”) is the only viable option when working with large tables, but the delta logic on some SAP tables is not easy. Some large tables do not even offer “Changed On” timestamp fields. In consideration of data modeling, the logic of delta calculation, and logic of table joins, in many cases, the SAP standard extractors would be the better option to extract data, but then, working with SAP extractors requires deep know-how.
Advantage of Azure Data Factory V2
Azure Data Factory (ADF) is the cloud-based, serverless, scalable, highly available data integration service in Azure.
Comparing to ADF V1, the ADF V2 is a big leap forward, with SSIS support through the Integration Runtime (IR) feature. Customers are now able to lift and shift SSIS packages from on-premise to Azure, to fully take advantage of both the Azure benefit and the powerful SSIS extraction, transformation, and loading capabilities. This is especially valuable to customers who have already invested in SSIS. It is also the enabler for the opportunity of direct integration with SAP complex data models without intermediate data staging hops.
ADF SAP ECC Adapter
The SAP ECC Adapter can be used to extract data from an SAP system using the oData protocol. So any object that can be exposed as an oData service can be used. For example :
- SAP tables or Views
- Data Extractors
- HANA Tables and Views exposed via HANA XS
Note that this also means the Adapter is not limited to ECC also. For more info on the ECC Adapter see Copy data from SAP ECC using Azure Data Factory
In this example we’ll extract product data. The system I’m using is a ABAP 7.5 SP2 – Developer Edition (which is running on Azure). Please have a look at ABAP Developer Edition Installation Guide to get hold of the system.
This system contains an OData service that lists product data for a WebShop. The product data can be seen in a Fiori App at the following URL
https://sapprd:44300/sap/bc/ui5_ui5/ui2/ushell/shells/abap/FioriLaunchpad.html#Shell-home. See the Manage Products app.
- the URLs assume you’ve mapped the hostname (eg. sapprd) of your SAP system to an IP address in your hosts file.
- the user needs the roles SAP_EPM_BCR_PURCHASER_T to see the Manage Products App. Use transaction su01 to assign these roles
The product data can be retrieved via
http://spapprd:8000/sap/opu/odata/sap/EPM_REF_APPS_PROD_MAN_SRV/Products. The meta data of the OData Service can be found at
The target for the extraction will be a Azure SQL DataBase.
Azure SQL DataBase Setup
First, we must create an SQL DataBase and prepare it to receive the product data. To create an Azure SQL Database. We need to create a table that will contain the product data. You can do this with the following SQL script
create table NPLProducts ( id nvarchar(25) not null, currencycode nvarchar(3), stockquantity int, name nvarchar(50), description text, subcategoryid nvarchar(50), subcategoryname nvarchar(50), maincategoryid nvarchar(50), maincategoryname nvarchar(50), supplierid nvarchar(50), suppliername nvarchar(50), lastmodified date, price decimal, quantityunit nvarchar(10), measureunit nvarchar(10), PRIMARY KEY (id) );
The SQL Statements can be executed via the Query Editor in the Azure Portal or via Azure Data Studio. For more info on Azure Data Studio, see What is Azure Data Studio.
Azure Data Factory Setup
First, you need to create a Data Factory. In this Data Factory, we’ll need to create a connection to the SAP System using the SAP ECC Connector. As URL you’ll need to use the base URL of the OData Service: http://x.x.x.x:8000/sap/opu/odata/sap/EPM_REF_APPS_PROD_MAN_SRV/. Note: you need to use the IP address of your SAP system here unless you have a proper DNS entry for your SAP hostname.
Linked service for Azure SQL Database
We also need to define a connection to Azure SQL Database
Defining Data Sets
As a next step you need to define the DataSets. For the SAP System:
If the URL entered within the Connection is correct, ADF will display a drop-down list of the Entity Sets listed in the OData Metadata. Use preview data to test the connection.
Data set for Azure SQL Database
As the last step we can define a pipeline to extract from SAP (source) to SQl Server (sink). For this we use the Copy Data Action. As Source select the SAP Product DataSet and as sink the SQL Product DataSet. Also verify the mapping.
To test the Pipeline you can use ‘Add Trigger > Trigger Now’. This will start the pipeline.
After a successful run, you can use SQL to verify the result.
SELECT * FROM [dbo].[NPLProducts]
So for we had done the initial load into the SQL Db, but for successive incremental loads, we have to devise logic in such a way that every time there was a change or would only like to synchronize the modified products. See you with the logic in the next post0