Tuesday, December 26, 2023

Running SSIS packages in Azure Data Factory - scaling and monitoring

Lifting and shifting SSIS packages to Azure Data Factory (ADF) can provide several benefits. By moving your on-premises SSIS workloads to Azure, you can reduce operational costs and the burden of managing infrastructure that you have when you run SSIS on-premises or on Azure virtual machines. 

You can also improve availability by specifying multiple nodes per cluster, on top of the high availability features of Azure and of Azure SQL Database. Scalability improves as well: you can specify multiple cores per node (scale up) and multiple nodes per cluster (scale out). See Lift and shift SQL Server Integration Services workloads to the cloud.

To lift and shift SSIS packages to ADF, you can use the Azure-SSIS Integration Runtime (IR) in ADF. The Azure-SSIS IR is a cluster of virtual machines dedicated to executing SSIS packages. You define the node size (cores and memory) and the number of nodes during the initial configuration (Lift and shift SSIS packages using Azure Data Factory on SQLHack).
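To make the scale up and scale out settings concrete, here is a minimal Python sketch that builds the compute part of an Azure-SSIS IR definition as a plain dictionary. The property names (nodeSize, numberOfNodes, maxParallelExecutionsPerNode) follow the ADF resource definition as I understand it, so check them against the current documentation before relying on them. The helper name, the IR name and the region are made up for illustration, and the catalog, edition and license settings are left out.

```python
import json


def build_ssis_ir_definition(name, location, node_size, number_of_nodes,
                             max_parallel_per_node):
    """Return a dict shaped like the compute section of an Azure-SSIS IR.

    Hypothetical helper for illustration only: it builds a dictionary and
    does not call Azure. SSIS catalog, edition and license settings are
    intentionally omitted.
    """
    if number_of_nodes < 1:
        raise ValueError("number_of_nodes must be at least 1")
    if max_parallel_per_node < 1:
        raise ValueError("max_parallel_per_node must be at least 1")
    return {
        "name": name,
        "properties": {
            "type": "Managed",
            "typeProperties": {
                "computeProperties": {
                    "location": location,
                    # Scale up: a larger node size means more cores and memory.
                    "nodeSize": node_size,
                    # Scale out: more nodes in the cluster.
                    "numberOfNodes": number_of_nodes,
                    "maxParallelExecutionsPerNode": max_parallel_per_node,
                }
            },
        },
    }


# Example: a single D4_v3 node (name and region are placeholders).
single_node = build_ssis_ir_definition(
    name="my-ssis-ir",
    location="WestEurope",
    node_size="Standard_D4_v3",
    number_of_nodes=1,
    max_parallel_per_node=4,
)
print(json.dumps(single_node, indent=2))
```

Changing only node_size is a scale up; changing only number_of_nodes is a scale out.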

Even though there is a Microsoft article that explains how to Configure the Azure-SSIS integration runtime for high performance, there is not much guidance on how to run it at the lowest possible cost while still completing the jobs. So which is better: a larger node size on a single node, or a smaller node size on multiple nodes? In my experience, it is perfectly possible to run most jobs on a single node, and so far we have been running all of ours on a D4_v3 (4 cores, 16 GB, Standard). If you decide to run on a smaller configuration, I would recommend monitoring failures, capacity usage and throughput. (See Monitor integration runtime in Azure Data Factory for more details.)
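One way to reason about the single node versus multiple nodes question is to compare the cost per run of each option and keep only the options that finish in time. The Python sketch below does that under the assumption that the IR is billed per node-hour while it is running. The prices and run times are placeholders, not real Azure prices or measurements, so replace them with the current price list and your own job durations.

```python
from dataclasses import dataclass


@dataclass
class IrOption:
    label: str
    price_per_node_hour: float  # placeholder value, not a real Azure price
    nodes: int
    run_hours: float            # measured duration of the job window

    def cost_per_run(self) -> float:
        # Assumes billing per node-hour for the time the IR is running.
        return self.price_per_node_hour * self.nodes * self.run_hours


def cheapest_option(options, max_run_hours):
    """Return the cheapest option that finishes within max_run_hours.

    Returns None when no option meets the deadline.
    """
    feasible = [o for o in options if o.run_hours <= max_run_hours]
    if not feasible:
        return None
    return min(feasible, key=lambda o: o.cost_per_run())


# Illustrative numbers only.
options = [
    IrOption("1 larger node", price_per_node_hour=1.00, nodes=1, run_hours=2.0),
    IrOption("2 smaller nodes", price_per_node_hour=0.50, nodes=2, run_hours=2.5),
]
best = cheapest_option(options, max_run_hours=3.0)
print(best.label if best else "no option finishes in time")
```

The run times are the important input here: a smaller node size only saves money if the jobs do not take proportionally longer, which is why monitoring throughput after any downsizing matters.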



Reference: