CodexBloom - Programming Q&A Platform

GCP Dataflow job fails with 'java.lang.IllegalArgumentException: Invalid input format' when processing Avro data

👀 Views: 168 💬 Answers: 1 📅 Created: 2025-06-09
google-cloud-dataflow apache-beam avro Java

I'm learning this framework and have tried several approaches, but none seem to work. I'm trying to run a Dataflow job using Apache Beam to process an Avro file stored in Google Cloud Storage, but I'm getting an error during execution. The error message states: `java.lang.IllegalArgumentException: Invalid input format`.

Here's how I'm setting up my pipeline:

```java
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);

p.apply("ReadAvro", AvroIO.read(MyAvroClass.class)
        .from("gs://my-bucket/my-data/*.avro"))
 .apply("ProcessData", ParDo.of(new MyProcessingFn()));

p.run().waitUntilFinish();
```

I've verified that the Avro files are well-formed and can be read correctly with other tools, such as `avro-tools`. I also checked the dependencies in my `pom.xml`:

```xml
<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-avro</artifactId>
    <version>2.41.0</version>
</dependency>
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.10.2</version>
</dependency>
```

I tried specifying the schema directly in the AvroIO read method, but that didn't resolve the issue. I also confirmed that my IAM roles are set up correctly to allow access to the GCS bucket.

Is there something specific I'm missing in terms of configuration, or could this be a version compatibility issue between Beam and Avro? Any insights, examples, or troubleshooting steps would be greatly appreciated.

For context: I'm using Java on macOS, and I'm also working with Java in a Docker container on Windows 10.
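
One sanity check that might help narrow this down, offered as a sketch rather than a confirmed fix: because the pipeline reads a `*.avro` glob, a single object in that prefix that is not an Avro object container file could be enough to make the read fail. The Avro specification says a container file starts with the four bytes `O`, `b`, `j`, `0x01`. The helper below is hypothetical (the class and method names are mine, not part of Beam or Avro) and assumes the files have been copied to local disk first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of Beam or Avro: checks whether data starts
// with the Avro object container magic bytes 'O', 'b', 'j', 0x01.
final class AvroMagicCheck {
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // Returns true if the first four bytes of the header equal the Avro magic.
    static boolean hasAvroMagic(byte[] header) {
        return header != null
            && header.length >= MAGIC.length
            && Arrays.equals(Arrays.copyOf(header, MAGIC.length), MAGIC);
    }

    // Reads up to four bytes from a local file and checks them (Java 11+).
    static boolean isAvroContainerFile(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return hasAvroMagic(in.readNBytes(MAGIC.length));
        }
    }
}
```

For example, after copying the objects locally (say with `gsutil cp`), calling `AvroMagicCheck.isAvroContainerFile(Path.of("part-0001.avro"))` on each file (the file name is a placeholder) would flag any that are empty or in a different format. This only checks the header; it does not validate the schema or the data blocks.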