OCI Data Flow Pipeline Fails with 'Internal Server Error' on Large Datasets Using Python SDK
I'm upgrading from an older version and I've been banging my head against this for hours. I'm having trouble with an OCI Data Flow pipeline that throws an 'Internal Server Error' when processing larger datasets. This happens when I try to submit jobs that read from an OCI Object Storage bucket containing around 10,000 files, each around 1 MB in size. I've been using the OCI Python SDK version 2.12.0 to configure and submit these jobs.

Here's a snippet of my code:

```python
import oci


def submit_data_flow_job():
    config = oci.config.from_file()
    data_flow_client = oci.data_flow.DataFlowClient(config)

    # Define the job details
    job_details = oci.data_flow.models.CreateJobDetails(
        display_name='MyDataFlowJob',
        application_id='your_application_id',
        compartment_id='your_compartment_id',
        configuration={'input_bucket': 'oci://mybucket/path/to/files'},
        arguments=['--input', 'oci://mybucket/path/to/files/*'],
    )

    try:
        response = data_flow_client.create_job(job_details)
        print('Job submitted: ', response.data.id)
    except oci.exceptions.ServiceError as e:
        print('Error: ', e)


submit_data_flow_job()
```

When I run this code, the error message I receive is:

```
oci.exceptions.ServiceError: (500, 'Internal Server Error')
```

I've tried splitting the files into smaller batches and submitting them one by one, which works but is not efficient (a rough sketch of that workaround is at the bottom of this post). I've also checked the logs in the OCI console, but they don't provide much insight into what might be going wrong.

Is there any known limitation with large datasets that I might be exceeding, or is there a better way to handle this scenario? Any suggestions or best practices for using OCI Data Flow with large files would be greatly appreciated! I'm coming from a different tech stack and learning Python.
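
For reference, here is roughly what my batching workaround looks like. Treat it as a sketch: `BATCH_SIZE` and `list_input_objects` are names I made up for this post, the way the batch is passed to the application is simplified, and it reuses the same `create_job` call as in the snippet above.

```python
import oci

BATCH_SIZE = 500  # placeholder value; I picked this by trial and error


def list_input_objects(object_storage_client, namespace, bucket, prefix):
    """Collect every object name under the prefix, following list_objects pagination."""
    names = []
    start = None
    while True:
        kwargs = {'prefix': prefix, 'limit': 1000}
        if start:
            kwargs['start'] = start
        response = object_storage_client.list_objects(namespace, bucket, **kwargs)
        names.extend(obj.name for obj in response.data.objects)
        start = response.data.next_start_with
        if not start:
            break
    return names


def submit_in_batches():
    config = oci.config.from_file()
    object_storage_client = oci.object_storage.ObjectStorageClient(config)
    data_flow_client = oci.data_flow.DataFlowClient(config)

    namespace = object_storage_client.get_namespace().data
    names = list_input_objects(
        object_storage_client, namespace, 'mybucket', 'path/to/files/'
    )

    # One job per chunk of BATCH_SIZE objects instead of one job over all ~10,000 files.
    for batch_index, offset in enumerate(range(0, len(names), BATCH_SIZE)):
        batch = names[offset:offset + BATCH_SIZE]
        job_details = oci.data_flow.models.CreateJobDetails(
            display_name='MyDataFlowJob-batch-{}'.format(batch_index),
            application_id='your_application_id',
            compartment_id='your_compartment_id',
            # Simplified: my real job hands the batch to the application differently.
            arguments=['--input'] + ['oci://mybucket/' + name for name in batch],
        )
        response = data_flow_client.create_job(job_details)
        print('Submitted batch', batch_index, '->', response.data.id)


submit_in_batches()
```

This avoids the 500, but it turns one logical job into a pile of separate runs that I then have to track, which is why I'd prefer a single submission if possible.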