Handling Non-Unique MultiIndex Columns in Pandas DataFrame during Concatenation
I'm working on a personal project where I need to concatenate multiple DataFrames that have non-unique column names under a MultiIndex for the columns. I'm using pandas 1.4.2. My goal is to keep these non-unique columns identifiable after concatenation. However, when I concatenate the DataFrames with `pd.concat()`, I get a warning about duplicate columns and end up with what looks like data loss.

Here's a simplified version of my code:

```python
import pandas as pd

# Build a MultiIndex for the columns with named levels
arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('letters', 'numbers'))

# Two DataFrames sharing the same MultiIndex columns
df1 = pd.DataFrame({
    ('A', 'one'): [1, 2],
    ('A', 'two'): [3, 4],
    ('B', 'one'): [5, 6],
    ('B', 'two'): [7, 8]
})
df1.columns = index

df2 = pd.DataFrame({
    ('A', 'one'): [9, 10],
    ('A', 'two'): [11, 12],
    ('B', 'one'): [13, 14],
    ('B', 'two'): [15, 16]
})
df2.columns = index

# Concatenate along axis=0 (stack the rows)
result = pd.concat([df1, df2], axis=0)
print(result)
```

After running this code, I get a warning: `FutureWarning: Index has duplicates.` The resulting DataFrame seems to lose rows, and I can't easily distinguish between the identical columns in the output, which makes it hard to analyze the data further.

I've read through the pandas documentation and tried the `keys` parameter of `pd.concat()`, but it only seems to add an extra level to the index rather than resolve the duplication (see the sketch at the end of this post).

What's the best way to handle this situation so that I can concatenate my DataFrames without losing data or creating confusion from duplicate column names? Any pointers in the right direction?
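For reference, here's roughly what my `keys` attempt looked like (the labels `'first'` and `'second'` are just placeholders for illustration):

```python
# Same df1/df2 as above; with axis=0, keys adds an outer level to the
# *row* index, tagging each source frame but leaving the columns untouched
tagged = pd.concat([df1, df2], axis=0, keys=['first', 'second'])
print(tagged)
# The row index now has two levels: ('first', 0), ('first', 1),
# ('second', 0), ('second', 1) -- the column labels are unchanged
```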