How to efficiently find unique rows in a NumPy array with potential floating-point inaccuracies?
I've been struggling with this for a few days and could really use some help. I'm working with a large NumPy array of floating-point rows and need to find the unique rows. However, I'm running into issues caused by floating-point arithmetic, which lead to unexpected results: two rows that should be considered equal are sometimes treated as distinct by `np.unique`.

Here's a snippet of my current approach:

```python
import numpy as np

# Sample data with slight floating-point differences
arr = np.array([[1.000001, 2.000001],
                [1.000002, 2.000002],
                [1.000000, 2.000000]])

# Attempting to find unique rows
unique_rows = np.unique(arr, axis=0)
print(unique_rows)
```

The output I'm getting is:

```
[[1.       2.      ]
 [1.000001 2.000001]
 [1.000002 2.000002]]
```

I expected it to identify that the first two rows are essentially the same. I've also tried a custom comparison function, but that made my code significantly slower and still didn't yield the right results.

Is there a more efficient way to handle this, perhaps by rounding the values or using a tolerance level? How can I implement that without compromising performance?

For context, this is for an API that needs to handle such data, and I'm using Python 3.9. Any suggestions or best practices would be greatly appreciated.
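To make the rounding idea concrete, here's a minimal sketch of the direction I was considering — rounding to a fixed number of decimals before calling `np.unique`. The `decimals=5` tolerance is just a guess on my part, not something I've validated against my real data:

```python
import numpy as np

arr = np.array([[1.000001, 2.000001],
                [1.000002, 2.000002],
                [1.000000, 2.000000]])

# Round to a fixed number of decimals so rows that differ only by
# tiny floating-point noise collapse to the same representation.
decimals = 5  # assumption: differences below ~1e-5 should be ignored
rounded = np.round(arr, decimals=decimals)

# Deduplicate the rounded copy; return_index gives the positions of the
# first occurrences so the original (unrounded) rows can be kept.
_, first_idx = np.unique(rounded, axis=0, return_index=True)
unique_rows = arr[np.sort(first_idx)]
print(unique_rows)
```

With this sample data all three rows collapse into one, since they all differ by less than 1e-5, so I'm not sure whether rounding is actually the right tool here or whether a true tolerance-based comparison would behave better.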