CodexBloom - Programming Q&A Platform

GCP Dataflow job fails with 'java.lang.IllegalStateException: Input split is not of type FileSplit' in TextIO

👀 Views: 1 💬 Answers: 1 📅 Created: 2025-06-10
gcp dataflow cloud-storage java

I'm upgrading from an older version and, after trying multiple solutions I found online, I still can't figure this out. I'm working on a GCP Dataflow job that uses the TextIO transform to read files from a Cloud Storage bucket. After deploying my pipeline, the job fails with `java.lang.IllegalStateException: Input split is not of type FileSplit`. This seems to occur when the input file isn't recognized as a suitable format for TextIO, but I've double-checked that my files are plain text and my job configuration appears correct.

Here's the relevant portion of my Dataflow pipeline code:

```java
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);

p.apply("ReadLines", TextIO.read().from("gs://my-bucket/input/*.txt"))
 .apply("ProcessLines", ParDo.of(new ProcessLineFn()));

p.run().waitUntilFinish();
```

I've verified the following:

1. The input files in the Cloud Storage bucket are not corrupted (I can open them without issues).
2. The permissions are set correctly; the Dataflow service account has read access to the bucket.
3. I tried specifying a single file instead of a wildcard, and the error persists.
4. I'm using the Dataflow SDK version 2.33.0, which should be compatible with the GCP environment.

Despite these checks, the job continues to fail with the same error. Has anyone encountered this before, or have suggestions on how to resolve it? Is there a better approach, and could someone point me to the right documentation?

For context: this is a production application. I'm running the latest version of Java in a Docker container on Ubuntu 22.04.