CodexBloom - Programming Q&A Platform

Azure Data Factory: How to Handle Duplicate Rows in Sink When Using Mapping Data Flow?

👀 Views: 2 💬 Answers: 1 📅 Created: 2025-06-08
azure-data-factory data-flow data-transformation json

I'm integrating two systems using Azure Data Factory (ADF) with a Mapping Data Flow to transform data from an Azure SQL Database into an Azure Blob Storage destination. I've looked through the documentation, but I'm still confused about one scenario: duplicate rows are being written to the sink whenever the source table contains duplicate records. The source dataset has a primary key, but the data flow does not seem to take it into account during the transformation.

I have tried using the `Aggregate` transformation to group by the primary key, but that doesn't help, because the duplicates may contain different values in other columns that I need to retain.

Here's a simplified version of my data flow:

1. **Source**: Azure SQL Database (table `Orders`)
2. **Transformation**: `Select` to filter specific columns, followed by `Aggregate` to group by `OrderID`.
3. **Sink**: Azure Blob Storage (CSV format).

In the sink output, I see rows like this:

| OrderID | ProductName | Quantity |
|---------|-------------|----------|
| 1       | Widget      | 10       |
| 1       | Widget      | 15       |

I expected only one row per `OrderID`. Here's the ADF Mapping Data Flow snippet I used for aggregation:

```json
{
  "name": "Aggregate",
  "inputs": ["Source"],
  "groupBy": ["OrderID"],
  "aggregates": [
    { "name": "Quantity", "aggregateFunction": "sum" }
  ],
  "outputs": ["Sink"]
}
```

I've also tried adding a `Filter` transformation before the `Aggregate`, but it did not resolve the problem. The pipeline executes successfully, yet I still end up with duplicates in the output.

Is there a best practice for ensuring that only unique records are written to the sink in this scenario? Any suggestions for modifications or a different approach would be greatly appreciated. My development environment is Linux, and this is part of a larger service I'm building. What are your experiences with this?
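In case it helps clarify what I mean by "retaining the other columns", here is a rough sketch of the direction I was considering, written in the same simplified JSON notation as the snippet above (so not an exact ADF export). The idea is to add an aggregate expression for every non-key column, e.g. `first()` for `ProductName`, so each `OrderID` collapses to one row; I'm treating the choice of `first` here as an assumption rather than a known-correct setting.

```json
{
  "name": "Aggregate",
  "inputs": ["Source"],
  "groupBy": ["OrderID"],
  "aggregates": [
    { "name": "ProductName", "aggregateFunction": "first" },
    { "name": "Quantity", "aggregateFunction": "sum" }
  ],
  "outputs": ["Sink"]
}
```

Would something along these lines be the recommended way to collapse duplicates while keeping the other columns, or is there a cleaner pattern for deduplication in Mapping Data Flows?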