How to Properly Use `pd.concat()` with a Nested List of DataFrames Without Losing Indexing?
I've been struggling with this for a few days now and could really use some help.

I'm trying to concatenate multiple DataFrames stored in a nested list structure, but I'm running into issues with retaining the original indexes. I have a list of lists containing DataFrames, where each sublist represents a different category of data. When I use `pd.concat()` on these nested lists, the resulting DataFrame gives no indication of which sublist a row came from: I end up with a plain integer index, which does not help in differentiating between the categories.

Here's a simplified version of what I'm doing:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})

nested_dfs = [[df1, df2], [df3]]

# Attempt to concatenate
result = pd.concat([pd.concat(sublist) for sublist in nested_dfs])
print(result)
```

This outputs a DataFrame with a repeating integer index:

```
    A   B
0   1   3
1   2   4
0   5   7
1   6   8
0   9  11
1  10  12
```

I expected to see a structure that preserves some indicator of which sublist each row comes from. I tried resetting the index before concatenation and using the `keys` parameter in `pd.concat()`, but those didn't seem to resolve the indexing issue. Here's what I tried:

```python
result = pd.concat(
    [pd.concat(sublist).reset_index(drop=True) for sublist in nested_dfs],
    keys=[0, 1],
)
```

But this still doesn't give me a clear hierarchical structure in the index. I'm using pandas version 1.5.3. Is there a way to achieve my desired result while maintaining clarity in the index? Any insights would be appreciated. Thanks in advance!

For context: I'm using Python on Linux, and this is part of a larger desktop app I'm building.
This issue appeared after updating to Python LTS. Any feedback is welcome!
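
To make the goal concrete, here is a minimal sketch of one way the nested structure might be carried into the index: pass `keys` to both the inner and the outer `pd.concat()` call, so that each nesting level becomes its own index level. The positional labels and the level names `category`, `frame`, and `row` are assumptions made up for illustration, not something pandas requires.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df3 = pd.DataFrame({'A': [9, 10], 'B': [11, 12]})

nested_dfs = [[df1, df2], [df3]]

# Inner concat: label each frame by its position within its sublist.
per_category = [
    pd.concat(sublist, keys=list(range(len(sublist))))
    for sublist in nested_dfs
]

# Outer concat: label each sublist by its position in nested_dfs.
result = pd.concat(per_category, keys=list(range(len(nested_dfs))))

# Hypothetical level names, chosen only to make the hierarchy readable.
result.index.names = ['category', 'frame', 'row']

print(result)
```

With the levels named, `result.loc[1]` selects every row from the second sublist, and `result.xs(0, level='frame')` selects the first frame of each category. If the categories have meaningful names, those could be passed as the outer `keys` instead of integers.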