CodexBloom - Programming Q&A Platform

Tuning Merge Sort for Large Datasets in C++ - Unexplained Performance Degradation

👀 Views: 80 💬 Answers: 1 📅 Created: 2025-08-27
sorting merge-sort performance C++

After trying multiple solutions online, I still can't figure this out. I'm implementing Merge Sort in C++ to sort large datasets, and I've noticed significant performance degradation once arrays grow past 1 million elements. Here's my implementation:

```cpp
#include <algorithm>
#include <cstdlib>   // for rand()
#include <iostream>
#include <vector>

// Merge the sorted subranges arr[left..mid] and arr[mid+1..right].
void merge(std::vector<int>& arr, int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;

    // Copy each half into a temporary buffer.
    std::vector<int> L(n1);
    std::vector<int> R(n2);
    for (int i = 0; i < n1; i++) L[i] = arr[left + i];
    for (int j = 0; j < n2; j++) R[j] = arr[mid + 1 + j];

    // Merge the two buffers back into arr.
    int i = 0, j = 0, k = left;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k++] = L[i++];
        } else {
            arr[k++] = R[j++];
        }
    }
    while (i < n1) { arr[k++] = L[i++]; }
    while (j < n2) { arr[k++] = R[j++]; }
}

void mergeSort(std::vector<int>& arr, int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);
        merge(arr, left, mid, right);
    }
}

int main() {
    std::vector<int> data(1000000);
    std::generate(data.begin(), data.end(), rand);
    mergeSort(data, 0, static_cast<int>(data.size()) - 1);
    return 0;
}
```

I'm compiling with GCC 10.2 and `-O2 -std=c++11`. With smaller datasets the function performs as expected, but performance drops dramatically at scale: sorting 10 million elements takes over 5 seconds, which seems excessive. Profiling with gprof shows a disproportionately high share of the time spent in the `merge` function.

I've considered switching to an iterative version of Merge Sort, or even to a different algorithm like Timsort, but first I'd like to understand why my current implementation struggles with large arrays. Are there common pitfalls or optimizations I might be missing? Is there a better approach? This is for a CLI tool running on CentOS, and the project is pinned to C++11. What would be the recommended way to handle this?
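For what it's worth, this is roughly how I'm measuring the 10-million-element case (simplified from my actual benchmark; it assumes the `mergeSort` from the snippet above is linked in):

```cpp
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

void mergeSort(std::vector<int>& arr, int left, int right);  // defined in the snippet above

int main() {
    const int N = 10000000;  // the case where I see > 5 seconds
    std::vector<int> data(N);
    std::generate(data.begin(), data.end(), rand);

    auto start = std::chrono::steady_clock::now();
    mergeSort(data, 0, N - 1);
    auto end = std::chrono::steady_clock::now();

    std::cout << "sorted " << N << " elements in "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms\n";
    return 0;
}
```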
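And this is the direction I've sketched for the iterative rewrite: a bottom-up pass that reuses one scratch buffer instead of allocating two fresh vectors on every `merge` call. It's only lightly tested, and the name `mergeSortIterative` is just mine, so treat it as a rough sketch rather than something I've verified at scale:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Bottom-up merge sort: merge runs of width 1, 2, 4, ... using a single
// scratch buffer allocated once up front.
void mergeSortIterative(std::vector<int>& arr) {
    const std::size_t n = arr.size();
    std::vector<int> scratch(n);  // one allocation for the whole sort

    for (std::size_t width = 1; width < n; width *= 2) {
        for (std::size_t left = 0; left < n; left += 2 * width) {
            std::size_t mid = std::min(left + width, n);
            std::size_t right = std::min(left + 2 * width, n);

            // Merge arr[left..mid) and arr[mid..right) into scratch.
            std::size_t i = left, j = mid, k = left;
            while (i < mid && j < right)
                scratch[k++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];
            while (i < mid) scratch[k++] = arr[i++];
            while (j < right) scratch[k++] = arr[j++];
        }
        // Copy the merged runs back before the next pass.
        std::copy(scratch.begin(), scratch.end(), arr.begin());
    }
}
```

Would this kind of change actually address the slowdown, or am I chasing the wrong thing?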