How can I make np.unique with return_counts treat nearly-equal floating point values as the same?
I'm working on a personal project and have run into a confusing scenario with `np.unique` while counting occurrences of elements in a floating point array. I created a simple array of floats that contains duplicates, and I expected to get the unique values along with their counts, but the results are not what I anticipated.

Here's the code snippet I'm using:

```python
import numpy as np

data = np.array([0.1, 0.1, 0.2, 0.3, 0.3, 0.3])
unique_vals, counts = np.unique(data, return_counts=True)
print("Unique values:", unique_vals)
print("Counts:", counts)
```

When I run this, I get the following output:

```
Unique values: [0.1 0.2 0.3]
Counts: [2 1 3]
```

This output is expected. But when I modify the array to include a value that is very close to one of the duplicates, like so:

```python
data = np.array([0.1, 0.1, 0.2, 0.3, 0.3000001, 0.3])
```

I get:

```
Unique values: [0.1 0.2 0.3 0.3000001]
Counts: [2 1 1 1]
```

I was expecting the last two values (`0.3` and `0.3000001`) to be considered the same because they are so close in value. I had assumed NumPy would handle floating point comparisons with some tolerance, but `np.unique` evidently compares values exactly. Is there a way to modify my approach so that nearly-equal values are counted together?

I've considered rounding the values before passing them to `np.unique` (see the sketch below), but that feels like a hack rather than a proper solution.

I'm currently using NumPy 1.24.0 on Debian. Any suggestions on how to handle this would be greatly appreciated!
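For reference, this is the rounding workaround I've been experimenting with. The choice of `decimals=3` is arbitrary and just happens to fit this sample data:

```python
import numpy as np

data = np.array([0.1, 0.1, 0.2, 0.3, 0.3000001, 0.3])

# Round to a fixed number of decimal places before counting, so that
# values within roughly 0.001 of each other collapse to the same key.
# decimals=3 is an arbitrary choice for this sample data.
rounded = np.round(data, decimals=3)
unique_vals, counts = np.unique(rounded, return_counts=True)
print("Unique values:", unique_vals)  # [0.1 0.2 0.3]
print("Counts:", counts)              # [2 1 3]
```

This gives the counts I want for this example, but rounding uses fixed bin edges: two values that are very close but straddle a boundary (e.g. `0.2995` and `0.3005`) would still land in different bins, which is why it feels fragile to me.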