Using Azure Data Factory to Bring SAP Data to Azure via Self-Hosted IR and SSIS IR

Abstract

With the innovation from Microsoft on the interoperability between Self-hosted IR (SHIR) and SSIS IR and the availability of SSIS IR Express Custom Setup, the experience of bringing complex SAP data to Azure is further improved. This technical paper explains the new approach of setting up ADF SHIR and SSIS IR to bring SAP data to Azure storage with extremely high performance, scalability, security and productivity, with virtually no-code configuration and deployment.

Executive Summary

Two years ago, we introduced the approach of leveraging Azure Data Factory to bring high volume and complex SAP data to Azure via ADF SSIS-IR. Though it was a great PaaS ETL experience to run SAP data extractions in the cloud, quite some manual steps were still required to prepare the installation script and deployment steps. The firewall configuration between Azure and on-premise SAP was also a headache to a lot of SAP customers because none of the existing ways like using SAP Router, setting up VNET or using Express Route is easy.

With the newly introduced interoperability between SHIR and SSIS-IR, finally we have a much elegant way to seamlessly design and deploy SAP ETL SSIS packages to Azure with the full confidence of enterprise security compliance. There is no need to spend additional effort customizing firewall rules anymore.

Also, with SSIS IR Express Custom Setup, there is no more customization of the installation script for AecorSoft component in SSIS IR.

The value propositions of utilizing SHIR and SSIS IR to bring SAP to Azure are obvious:

  • Standardized E2E design and deployment without coding or customization;

  • Complete enterprise security compliance between Azure and SAP instances;

  • Further improved DevOps efficiency to manage SAP ETL delivery in Azure;

  • Unmatched performance and versatility of SAP data extraction.

Architecture

How-To

This is as easy as 1-2-3:

  1. SSIS Package Design;

  2. ADF Configuration;

  3. Deployment and Execution in ADF.

     

Step 1: SSIS Package Design

Preparation

For Visual Studio 2019, ensure the latest SSIS Project Extension for Visual Studio is installed, which is available on Visual Studio Marketplace at https://marketplace.visualstudio.com/items?itemName=SSIS.SqlServerIntegrationServicesProjects .

For Visual Studio 2017, latest standalone SSDT for VS2017 is required which can be downloaded from https://docs.microsoft.com/en-us/sql/ssdt/download-sql-server-data-tools-ssdt?view=sql-server-2017#ssdt-for-vs-2017-standalone-installer . Please notice that upgrading existing Visual Studio 2017 via Visual Studio Installer may not bring in the latest SSDT. Therefore, it is still recommended to execute the SSDT for VS2017 standalone installer.

Development

First, create a new connection to SAP by using AecorSoft Connection Manager “AISAP”:

Then, set the “ConnectByProxy” property to “True”:

This setting is critical to ensure the interoperability between SHIR and SSIS IR.

In the development environment, it is fine if the Microsoft Self-hosted Integration Runtime Host Service is not installed, or installed but turned off. The unit test of SSIS package execution from within Visual Studio is not impacted by this setting. Therefore, simply set this “ConnectByProxy” to “True” in the development environment.

As an example, a simple SSIS data flow is developed to extract SAP MARA data to Azure Data Lake Gen2 storage:

Step 2: ADF Configuration

Two steps are required: SHIR configuration and SSIS IR configuration.

SHIR Configuration

Through ADF Portal, create an Azure Self-Hosted IR. Then, download and install the Self-Hosted IR runtime on the on-premise computer.

SSIS IR Configuration

Make sure the deployment strategy (SSISDB, MSDB, or file based) is decided in advance.

Through ADF Portal, start to create Azure SSIS IR. In the Advanced Settings, first check “Customize your Azure-SSIS Integration Runtime with additional …”, and then click “+ New”:

In next step, choose “Install licensed component”, “AecorSoft’s Integration Service”, and then enter the license key obtained from AecorSoft:

Now it is time to enable the interoperability with SHIR. Check “Set up Self-Hosted Integration Runtime as a proxy” :

Then, select the previously created Self-Hosted IR.

The next important configuration is the “Staging storage linked service”. The purpose of the storage is to act as the staging between SHIR and SSIS IR. If there has not been any Blob storage configured previously yet, it is the time to choose “New” to create a new one. If a new storage linked service is being created, make sure “AutoResolveIntegrationRuntime” is selected under “Connect via integration Runtime”.

Finally, the SSIS IR is created.

 

Step3: Deployment and Execution in ADF

Assume we would use the SSISDB way to deploy the SSIS project.

Deployment

From Visual Studio development environment, in the Deployment Wizard, choose “SSIS in Azure Data Factory”:

Provide the Azure SQL database server name (in the form of “<name>.database.windows.net”, enter the user name and password, and deploy the SSISDB.

Execution

In ADF Portal, create a new ADF pipeline, select the “Execute SSIS package” Activity:

We get this:

Set the basic settings for the package:

Under “Connection Managers”, we can verify that the value for “ConnectByProxy” property is “True”:

Now the pipeline can be triggered or scheduled to run successfully:

 

Conclusion

Thanks to Azure Data Factory product team, now we have a beautiful experience to bring SAP data to Azure through the seamless interoperability between SHIR and SSIS IR, as well as the much simplified SSIS IR Express Custom setup experience. This End-to-end configuration does not require any additional customization in the CorpNet firewall, and does not require any custom installation script to be maintained.

SAP customers can now use Microsoft ADF and AecorSoft’s SAP Premium Certified software to extract SAP Table, Transparent/Cluster/Pool Table Join, Extractor, InfoCube, DSO, InfoSet, BAPI, Hierarchy, S/4 CDS View, and Long Text to Azure data platform in extremely performing, scalable and secure manner.