Azure Data Factory Pipeline Fails with 'Activity Timeout' on Large Data Set Copy
Hey everyone, I'm running into an issue that's driving me crazy, and it should probably be simple. I have an Azure Data Factory pipeline that fails with an 'Activity Timeout' error when copying a large data set from Azure SQL Database to Azure Blob Storage. The data set is around 10 GB, and the error message reads:

```
Activity 'Copy Data' failed: The activity timeout expired.
```

I've set the timeout for the copy activity to 60 minutes in the pipeline settings, but it still fails after around 30 minutes. I suspect it might be related to the performance of the Azure SQL Database or the configuration of the copy activity itself. Here's the relevant part of my pipeline JSON definition:

```json
{
  "name": "CopyDataPipeline",
  "activities": [
    {
      "name": "Copy Data",
      "type": "Copy",
      "typeProperties": {
        "source": {
          "type": "SqlSource",
          "sqlReaderQuery": "SELECT * FROM MyLargeTable"
        },
        "sink": {
          "type": "BlobSink",
          "copyBehavior": "PreserveHierarchy"
        }
      },
      "policy": {
        "timeout": "01:00:00",
        "retry": 3,
        "retryInterval": "00:01:00"
      }
    }
  ]
}
```

I've tried increasing the performance level of my Azure SQL Database from S1 to S3, but it did not resolve the timeout. I have also enabled partitioning in the copy activity to try to speed up the copy operation:

```json
"partitionOptions": {
  "enablePartitioning": true,
  "partitionColumn": "Id",
  "partitionCount": 4,
  "partitionRange": {
    "start": 1,
    "end": 1000000
  }
}
```

Despite these adjustments, the timeout persists. Is there a recommended strategy for handling large data sets in Azure Data Factory, or is there a specific setting I might be missing that could help avoid this timeout? My development environment is Linux, and I'm editing the JSON pipeline definitions directly. Any help would be greatly appreciated!
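For completeness, here is the shape I was trying to approximate with the partitioning snippet above, based on my reading of the copy activity documentation. I'm not certain the property names I used (`AzureSqlSource`, `partitionOption`, `partitionSettings`, `parallelCopies`) are exactly what my Data Factory version expects, so this sketch may itself be part of the problem:

```json
{
  "name": "Copy Data",
  "type": "Copy",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "partitionOption": "DynamicRange",
      "partitionSettings": {
        "partitionColumnName": "Id",
        "partitionLowerBound": "1",
        "partitionUpperBound": "1000000"
      }
    },
    "sink": {
      "type": "BlobSink",
      "copyBehavior": "PreserveHierarchy"
    },
    "parallelCopies": 4
  },
  "policy": {
    "timeout": "01:00:00",
    "retry": 3,
    "retryInterval": "00:01:00"
  }
}
```

If that structure is wrong, or if the timeout needs to be configured somewhere other than the activity `policy`, a pointer to the right place would be great.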