MATLAB: using `parfor` to parallelize a custom function over a large dataset
Hey everyone, I'm running into an issue that's driving me crazy. I'm trying to speed up processing of a large dataset using `parfor` in MATLAB R2023b, but I'm running into problems with the variable assignments. I'm applying a custom function to each element of a large cell array. The regular `for` loop works as expected, but the `parfor` loop runs very slowly and sometimes throws this error:

```
Error using parfor (line 162)
Variable 'data' want to be classified as reduction variable. It must be scalar.
```

Here is a simplified version of my code:

```matlab
% Sample data
data = cell(1, 1000);
for i = 1:1000
    data{i} = rand(1, 100); % Generating random data
end

dataOut = cell(size(data));
distribution = zeros(1, 10); % Example output vector

parfor i = 1:length(data)
    if ~isempty(data{i})
        dataOut{i} = myFunction(data{i});
        distribution(dataOut{i}) = distribution(dataOut{i}) + 1;
    end
end

% Custom function to apply (local functions go at the end of a script file)
function output = myFunction(input)
    output = mean(input); % Example operation
end
```

I've tried opening a pool of workers with `parpool`, and I made sure that every variable used inside the `parfor` loop is indexed by the loop variable and that iterations don't depend on each other. Still, the problem persists. The error message is particularly confusing because I'm not trying to reduce `data`, just iterate through it.

What am I doing wrong? Is there a recommended approach for handling variable assignments in `parfor` loops, especially when you want to accumulate results across iterations, as with `distribution` here?

I'm working on a service that needs to handle this, and it happens in both development and production on Windows 11. I'm open to any suggestions.
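For context, here is a minimal sketch of the restructuring I'm considering: keep only sliced variables inside the `parfor` loop and build the histogram afterwards. It rests on assumptions that aren't in my real code: `mean` stands in for `myFunction`, the `means` vector is a hypothetical replacement for `dataOut`, and the 10 equal-width bins over [0, 1] are an illustrative binning rule. I haven't verified that this is the recommended pattern.

```matlab
% Sketch only: sliced output inside parfor, accumulation after the loop.
data = cell(1, 1000);
for i = 1:numel(data)
    data{i} = rand(1, 100);
end

n = numel(data);
means = nan(1, n);        % sliced output: iteration i writes only means(i)

parfor i = 1:n
    x = data{i};          % sliced input: iteration i reads only data{i}
    if ~isempty(x)
        means(i) = mean(x);   % stand-in for myFunction
    end
end

% Assumption: 10 equal-width bins over [0, 1]; the real binning rule may differ.
edges = linspace(0, 1, 11);
distribution = histcounts(means(~isnan(means)), edges);
```

The idea is that no iteration reads or writes anything another iteration touches, so the indexed update of `distribution` no longer happens inside the loop at all.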