AWS Glue Job 'InvalidInput' Error When Processing Large Files
I'm integrating two systems and running an AWS Glue ETL job that processes files from S3 and loads them into Redshift. Recently, I started getting an 'InvalidInput' error when trying to process larger files (around 1GB). The error message is:

`Error: InvalidInput: One or more parameters are not valid. (Service: Glue, Status Code: 400, Request ID: <request-id>)`

I've checked the configurations and they seem fine. My script uses the following code to read from S3:

```python
import sys
import boto3
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args['JOB_NAME'], args)  # initialize the job (needed since job bookmarks are enabled)

# Read the source CSV from S3 as a DynamicFrame
input_df = glueContext.create_dynamic_frame.from_options(
    connection_type='s3',
    connection_options={
        'paths': ['s3://my-bucket/my-large-file.csv']
    },
    format='csv'
)

# Processing logic here

job.commit()
```

I've also tried increasing the DPUs from 10 to 20, but the error persists. The Glue version is 2.0 and job bookmarks are enabled. I also checked the IAM role permissions, and the role has full access to S3 and Glue. The job runs fine with smaller files (around 100MB), so the issue seems specific to larger files.

Any suggestions on what could be going wrong or how to troubleshoot this further? For context, I'm running this in Python on Linux as part of a service that needs to handle these files. Thanks for any help!
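
Update: for reference, this is roughly how I double-checked the job configuration before posting (a minimal sketch using boto3's `get_job`; the job name `my-etl-job` is a placeholder, not the real name):

```python
import boto3

glue = boto3.client('glue')

# Pull the job definition to confirm Glue version, capacity, and default arguments
# ('my-etl-job' is a placeholder job name)
response = glue.get_job(JobName='my-etl-job')
job_def = response['Job']

print(job_def['GlueVersion'])               # expecting '2.0'
print(job_def.get('MaxCapacity'))           # allocated DPUs (or WorkerType/NumberOfWorkers)
print(job_def.get('DefaultArguments', {}))  # --job-bookmark-option, --TempDir, etc.
```

Everything there looks as expected, so I'm not sure where the invalid parameter is coming from.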