Use Azure Data Factory to Bring SAP Data to Cloud

Executive Summary

Being able to effectively derive business values from the massive enterprise data is more important than ever to the IT enabled business agility. In the era of digital transformations, businesses are shifting IT enterprise data solutions from on-premise to Cloud, to take advantage of the elasticity, scalability, security, and supportability of Cloud Storage and Compute power. Microsoft Azure is one of the best cloud data platforms on the market, with rich structured and unstructured data storage support, versatile analytics, and rich business insight user experiences.

When it comes to SAP data, most of the data sources are still reside in on-premise systems or IaaS systems. Bringing the massive and complex SAP data to Azure for scale-out analytics has been a challenge on the critical path to many SAP customers.

With the release of Azure Data Factory V2 SSIS Integration Runtime and AecorSoft 's smart SAP connectivity solutions, it is now possible to use all Azure based features to integrate with SAP data sources securely and productively, with amazingly high performance (much faster than BOBJ!), to help SAP customers make the most out of their SAP data with Azure Data Integration, Storage and Compute.

SAP Data Complexity

Integrating with SAP is always challenging to many SAP customers.

First, SAP’s data models are very powerful yet complex. Use SAP IDES Ehp7 as an example. There are more than 120K system tables with sophisticated relationship to keep data integrity throughout the business workflows. There are both normalized and denormalized data models. There are tabular and multi-dimensional data structures. Some tables have binary fields which cannot be interpreted directly but have to be translated through multiple function calls. Deep understanding of SAP system metadata and data models is extremely important.

Second, SAP offers strong security control and governance on its data. In most cases, the SAP data integration needs to go through the application layer, which is not as straightforward as working with database directly. This requires rich knowledge of SAP application layer APIs.

Third, doing incremental loads (or “delta loads”) is the only viable option when working with large tables, but the delta logic on some SAP tables is not easy. Some large tables do not even offer “Changed On” timestamp fields. In consideration of data modeling, logic of delta calculation, and logic of table joining, in many cases, the SAP standard extractors would be the better option to extract data, but then, working with SAP extractors requires deep know-how.

 

Advantage of Azure Data Azure Data Factory V2

Azure Data Factory (ADF) is the cloud based, serverless, scalable, highly available data integration service in Azure. Although ADF comes with many connectors out of the box, it was not enough to integrate with complex SAP data. The support for SAP data types was limited to dimensional data from BW and HANA, and ODATA endpoints which heavily depend on SAP NetWeaver Gateway ODATA configurations and development.

Comparing to ADF V1, the ADF V2 is a big leap forward, with SSIS support through the Integration Runtime (IR) feature. Customers are now able to lift and shift SSIS packages from on-premise to Azure, to fully take advantage of both the Azure benefit and the powerful SSIS extraction, transformation, and loading capabilities. This is especially valuable to customers who already invested in SSIS. It is also the enabler for the opportunity of direct integration with SAP complex data models without intermediate data staging hops.

 

The Journey of Bringing SAP data to Azure

One key value of ADF V2 is the ability to lift and shift SSIS packages. Before the data factory is explored, SSIS and SAP integration has to be built. Leveraging “AecorSoft Integration Service for SSIS”, it is now possible to use SSIS to extract large amount of data from SAP in extremely performing way, even multiple times faster (with special turbo function enabled) than SAP’s own BOBJ Data Services.

One key value of ADF V2 is the ability to lift and shift SSIS packages. Before the data factory is explored, SSIS and SAP integration has to be built. Leveraging "AecorSoft Integration Service for SSIS", it is now possible to use SSIS to extract large amount of data from SAP in extremely performing way, even multiple times faster (with special turbo function enabled) than SAP’s own BOBJ Data Services.

Recent performance test shows that AecorSoft is more than 3 times faster than BOBJ Data Services in extracting SAP tables.

 

High Level Architecture

Feature Introduction

"AecorSoft Integration Service for SSIS" is a set of Data Source Components, which enable SSIS data flows to extract massive SAP data. It is one of the most versatile products on the marketplace, without hard dependency on any custom “Z” function in SAP. Built with sophisticated data processing algorithms and deep insight to SAP’s complex data models, AecorSoft Integration Service is able to support multiple types of SAP ECC / BW / HANA data, including Table, Table Join, InfoCube, BEx Query, InfoSet Query, BAPI/RFC function, and Extractor, and provide the best performance and user experience.

SSIS developers are now able to navigate and discover SAP metadata at ease, and treat SAP data source just like an ordinary source, while under the hood the components access SAP through SAP application layer, with full compliance to SAP’s security and authorization models.

AecorSoft Integration Service makes “fast datawarehousing” possible, by significantly reducing the integration efforts with SAP data sources.

Supported Kinds of SAP Data

SAP Table

SAP Table Join

SAP BAPI/RFC

InfoSet Query

SAP InfoCube / BEx Query (QueryCube)

SAP Extractor

HANA objects

 

ADF V2 and SAP

A successful ADF V2 + SAP integration setup consists four aspects:

  1. the tools preparation;
  2. the Azure objects preparation;
  3. the AecorSoft SAP component installation;
  4. the SSIS job deployment / execution.

Tools Preparation

Three tools are required: Azure PowerShell, Storage Explorer, and SQL Server Management Studio.

Azure PowerShell

Azure PowerShell is used for cmdlet executions, to set up variables and manage the Azure IR. Install the latest executable from https://github.com/Azure/azure-powershell/releases/ .

Azure Storage Explorer

Azure Storage Explorer is used to interact with the Azure storage account containers to store setup script and inspect the setup logs. It can be downloaded from http://storageexplorer.com/ .

SQL Server Management Studio

Version 17.2 or later of SQL Server Management Studio (SSMS) is required. Latest SSMS can be downloaded from https://docs.microsoft.com/sql/ssms/download-sql-server-management-studio-ssms .

 

Azure Objects Preparation

(Notice that example names are provided here to help cross reference with PowerShell cmdlet example. Please avoid using these names in your configuration but choose your own.)

With the proper Azure subscription, these Azure objects need to be provisioned:

Azure Storage Account

The storage account is used to host the SSIS SAP component setup script.

On Azure portal, create a new storage account via: Create a resource -> Storage -> Storage account (use the name “aecorsoftstorageaccount” as example).

In the storage account, create a blob container (use the name “aecorsoftblobcontainer” as example). The setup script and the installer file for AecorSoft Integration Service for SSIS, which is the set of core SAP components to allow SSIS to integrate with SAP, will be put in this container.

Azure SQL Database

Azure SQL Database is used to host the deployed SSIS jobs.

On Azure portal, create a new database via: Create a resource -> Databases -> SQL Database.

Note down the DB name (use the name “aecorsoftdb” as example), server name (use the name “aecorsoftdbserver” as example, whose full name will be “aecorsoftdbserver.database.windows.net”), admin user name and password.

Azure Data Factory

On Azure portal, create a new data factory via: Create a resource -> Data + Analytics -> Data Factory, then give the name (use the name “aecorsoftadfv2” as example), select subscription, resource group, select V2 version (very important), and location, and click “Create”.

Azure-SSIS IR

Azure-SSIS IR is used as the compute infrastructure for SSIS execution in Azure.

Once the V2 data factory is created, in the Azure Data Factory “Let’s get started” page, click “Configure SSIS Integration Runtime”. Specify the name (use the name “aecorsoftadfv2ssisir” as example) and other required information.

Next, on the “SQL Settings” page, put in the Azure SQL database endpoint name (“aecorsoftdbserver.database.windows.net” as previously created), admin user name, password, and other required information.

Next, configure the maximum parallel executions per node.

If Virtual Network is needed (e.g. for accessing on-premise SAP system from Azure), tick the VNet checkbox and fill out the requirement VNet information.

Now, Azure will spend 20 - 30 minutes to complete the IR related setup, including the initiation of SSISDB and the preparation of the SSIS catalog.

The steps above can also be done programmatically through PowerShell scripts.

 

AecorSoft SAP Component Installation

In order for SSIS to access SAP, AecorSoft Integration Service for SSIS needs to be installed. There are two scenarios: 1) development oriented component installation; 2) deployment and run oriented component installation.

Development Oriented SAP Component Installation

The development of SSIS packages is done locally in Visual Studio with SQL Server Development Tools. After the execution of AecorSoft component installer, multiple SAP source components will show up in the SSIS toolbox:

This will enable SSIS to integrate with SAP and access multiple types of SAP objects. Once the SSIS project is completed, it is ready to be deployed to Azure Data Factory V2 engine.

Deployment and Run Oriented SAP Component Installation

The AecorSoft SAP components need to be installed in Azure-SSIS IR such that ADF V2 is able to execute the deployed SSIS jobs.

Prepare the setup script

The setup script has to have the name “main.cmd”, which will be provided as part of AecorSoft installation package.

Next, use Azure Storage Explorer to upload “main.cmd” to the blob container. In the example, the blob container is named “aecorsoftblobcontainer”. Please use your own blob container name.

Next, create the Shared Access Signature (SAS) URI by right clicking on the blob container and select “Get Shared Access Signature”. Pick the proper validity period, and choose Read, Write and List under “Permissions”. Then, copy and save the URI.

Prepare the PowerShell variables

Make sure the IR is off. Open PowerShell in Admin mode. Execute these statements (replace the example names with actual object names):

$DataFactoryName = "aecorsoftadfv2"
$AzureSSISIRName = "aecorsoftadfv2ssisir"
$ResourceGroupName = "aecorsoftstorage"
$SetupScriptContainerSasUri = "https://......"

Start Azure-SSIS IR and Execute the Setup Script

Set-AzureRmDataFactoryV2IntegrationRuntime -DataFactoryName $DataFactoryName -Name $AzureSSISIRName -ResourceGroupName $ResourceGroupName -SetupScriptContainerSasUri $SetupScriptContainerSasUri

Start-AzureRmDataFactoryV2IntegrationRuntime -DataFactoryName $DataFactoryName -Name $AzureSSISIRName -ResourceGroupName $ResourceGroupName

After 20-30 minutes, the IR will be up and running, and ready to execute SSIS jobs in cloud.

 

SSIS Job Deployment and Execution

Use Visual Studio or SSMS to deploy the SSIS project to the Azure SQL Database which is created previously (the example is “aecorsoft.database.windows.net”).

Make sure the IR is up and running. Once the SSIS project deployment is successful, execute the project though SSMS, and ADF V2 and IR engine will execute the SSIS job in the IR cluster.

The job status can be monitored in SSMS. Please notice that at this moment, the SSIS execution status cannot be monitored in Azure Data Factory portal in the web browser.

 

Conclusion

The power of Azure Data Factory V2 with SSIS IR, the superb data extraction and data management features of SSIS, and the versatility of the AecorSoft SAP components lead to an efficient and elegant end-to-end solution to bring SAP data to Azure securely and productively. Please contact AecorSoft via online form or email for details of the SAP component software.

By delivering SAP data to Azure through ADF V2 with high performance and low latency, customers are now able to conduct advanced analytics on the massive SAP data in cloud, and truly take advantage of the elasticity of clout computing to drive efficient business insights.