CodexBloom - Programming Q&A Platform

GCP Dataflow job fails with 'java.lang.IllegalArgumentException: Invalid input path' when using custom sources

👀 Views: 25 💬 Answers: 1 📅 Created: 2025-06-10
gcp dataflow apache-beam Java

I'm working on a project where I run a Dataflow job that reads from a custom source in GCP, but when I try to run the pipeline it fails with `java.lang.IllegalArgumentException: Invalid input path`. The job reads from a Google Cloud Storage (GCS) bucket, and I'm using Apache Beam 2.37.0. My input path is defined as follows:

```java
String inputPath = "gs://my-bucket/my-data/*.csv";
PCollection<String> lines = p.apply("ReadFromGCS", TextIO.read().from(inputPath));
```

I'm certain the GCS bucket and file paths are valid, because I can access them from the GCP console. I also checked permissions: the service account attached to the Dataflow job has the `Storage Object Viewer` role on the bucket.

To troubleshoot, I tried running the job against a specific file instead of a wildcard:

```java
String inputPath = "gs://my-bucket/my-data/my-file.csv";
```

I still got the same error. I also verified that the GCS bucket is in the same region as my Dataflow job (`us-central1`).

I'm not sure what's going wrong. Is there a specific format the input path needs, or some additional configuration I might be missing? The project is a mobile app built with Java, if that matters. How would you solve this, and is it even possible? Any insights would be greatly appreciated!
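Since the question asks whether the input path needs a specific format, one hedged way to rule that out is a plain-Java sanity check on the exact string passed to `TextIO.read().from(...)`. This is a minimal sketch, not part of Beam: `GcsPathCheck` and `describeProblem` are hypothetical names, and the bucket-name rule is a simplified version of the Cloud Storage naming rules. Passing this check says nothing about whether the objects exist, whether the service account can read them, or whether Beam can resolve the `gs://` scheme in the running job.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: sanity-checks a GCS path string before it is handed to TextIO. */
public class GcsPathCheck {

    // gs://<bucket>/<object or glob>. Simplified bucket rule: 3-63 characters of
    // lowercase letters, digits, '.', '_' or '-', starting and ending with a letter
    // or digit (ignores the longer limit allowed for dotted bucket names).
    private static final Pattern GCS_PATH =
            Pattern.compile("^gs://([a-z0-9][a-z0-9._-]{1,61}[a-z0-9])/(.+)$");

    /** Returns null if the path looks well-formed, otherwise a short description of the problem. */
    public static String describeProblem(String path) {
        if (path == null || path.isEmpty()) {
            return "path is empty";
        }
        if (!path.equals(path.trim())) {
            return "path has leading or trailing whitespace";
        }
        if (path.contains("\\")) {
            return "path contains a backslash; GCS paths use '/'";
        }
        if (!path.startsWith("gs://")) {
            return "path does not start with gs://";
        }
        Matcher m = GCS_PATH.matcher(path);
        if (!m.matches()) {
            return "bucket name or object part is malformed";
        }
        return null; // looks well-formed; existence and permissions are not checked
    }

    public static void main(String[] args) {
        String[] candidates = {
            "gs://my-bucket/my-data/*.csv",        // wildcard path from the question
            "gs://my-bucket/my-data/my-file.csv",  // single-file path from the question
            " gs://my-bucket/my-data/my-file.csv", // stray leading space
            "gs:/my-bucket/my-data/my-file.csv",   // missing slash after the scheme
            "gs://My_Bucket/my-data/my-file.csv",  // uppercase letters in the bucket name
        };
        for (String c : candidates) {
            String problem = describeProblem(c);
            System.out.println("[" + c + "] -> " + (problem == null ? "ok" : problem));
        }
    }
}
```

Running `main` prints one line per candidate; only the first two, which mirror the paths in the question, are reported as `ok`. If the real path passes this check, the problem most likely lies in the job's configuration or environment rather than in the string itself.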