CodexBloom - Programming Q&A Platform

AWS Glue Job Fails with 'NoSuchBucket' Error When Accessing S3 Data

👀 Views: 22 💬 Answers: 1 📅 Created: 2025-07-23
aws aws-glue s3 data-processing Python

I'm running an AWS Glue job to process data stored in an S3 bucket, but it fails with a `NoSuchBucket` error even though the bucket exists and contains the expected files. The Glue job is defined using the following Python script:

```python
import sys

from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
# Use a lowercase name so the GlueContext class is not shadowed by its instance
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Reading data from S3 via the Glue Data Catalog
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_table",
    transformation_ctx="datasource0"
)

# Further processing...

job.commit()
```

I have double-checked the bucket name in the AWS S3 console and verified that the IAM role attached to the Glue job has the permissions needed to read from the bucket. The IAM policy includes:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-existing-bucket",
        "arn:aws:s3:::my-existing-bucket/*"
      ]
    }
  ]
}
```

I suspect the issue is related to the Glue job's execution environment or to how the bucket is referenced. I've also tried passing the full S3 path directly through `create_dynamic_frame.from_options`, like this:

```python
# Read directly from the S3 path, bypassing the Data Catalog table
datasource0 = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-existing-bucket/my-folder/"]},
    format="json"
)
```

But I still get the same `NoSuchBucket` error. I've also confirmed that the bucket's region matches the region where the Glue job is running. Could it be a delay in the bucket's availability, or a caching issue? Any insights on resolving this error would be greatly appreciated. This is part of a larger service I'm building.
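To narrow down the "how the bucket is referenced" suspicion, here is a minimal sketch of a check that could be run outside the job. It assumes `boto3` is installed and that credentials for the same IAM role are available. The helper names (`split_s3_uri`, `catalog_table_bucket`, `bucket_is_reachable`) are hypothetical and only for illustration. The idea is to compare the bucket that the Data Catalog table actually points at with the bucket name checked in the console, and then ask S3 directly whether that bucket is reachable.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an S3 URI into (bucket, key_prefix); raise ValueError otherwise."""
    parsed = urlparse(uri.strip())
    if parsed.scheme not in ("s3", "s3a", "s3n"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    if not parsed.netloc:
        raise ValueError(f"no bucket name in: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def catalog_table_bucket(database, table_name, region_name=None):
    """Return the bucket named in the catalog table's storage location."""
    import boto3  # imported lazily so split_s3_uri works without boto3

    glue = boto3.client("glue", region_name=region_name)
    table = glue.get_table(DatabaseName=database, Name=table_name)
    location = table["Table"]["StorageDescriptor"]["Location"]
    return split_s3_uri(location)[0]


def bucket_is_reachable(bucket, region_name=None):
    """Return True if a HeadBucket call succeeds; print the error code if not."""
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3", region_name=region_name)
    try:
        s3.head_bucket(Bucket=bucket)
        return True
    except ClientError as err:
        print(f"head_bucket failed for {bucket!r}: {err.response['Error']['Code']}")
        return False
```

For example, if `catalog_table_bucket("my_database", "my_table")` returns something other than `my-existing-bucket`, the `from_catalog` read is targeting a different bucket than the one verified in the console. That is only one possible cause, so a matching name does not rule out other explanations.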
What's the best practice here? For context, I'm working with Python in a Docker container on Windows 11.