CodexBloom - Programming Q&A Platform

GCP Dataflow Job Fails with 'Invalid Parameter' Error When Using Apache Beam 2.30.0 with Custom Transform

👀 Views: 65 💬 Answers: 1 📅 Created: 2025-06-14
gcp dataflow apache-beam python

I'm trying to run a Dataflow job using Apache Beam (version 2.30.0) that includes a custom transform to filter and process messages from Pub/Sub. However, I keep getting an 'Invalid Parameter' error when I attempt to execute the pipeline. The transform I'm using looks something like this:

```python
import apache_beam as beam


class CustomFilter(beam.DoFn):
    def process(self, element):
        if 'important_key' in element:
            yield element


def run():
    with beam.Pipeline(options=beam.options.pipeline_options.PipelineOptions()) as p:
        (p
         | 'ReadFromPubSub' >> beam.io.ReadFromPubSub(
             subscription='projects/my-project/subscriptions/my-subscription')
         | 'FilterImportant' >> beam.ParDo(CustomFilter())
         | 'WriteToGCS' >> beam.io.WriteToText('gs://my-bucket/output'))


if __name__ == '__main__':
    run()
```

The error message I receive in the Dataflow logs is:

```
Invalid Parameter: The provided parameters do not match the expected types.
```

I've checked the Pub/Sub messages and they are formatted as JSON strings, which I parse inside the `CustomFilter` transform. I suspect the issue might be related to how I'm reading or processing the incoming messages, but I'm unsure how to debug it. I've already tried using `print()` statements to log the elements in the `process` function, but those logs aren't showing up in Dataflow. I've also verified that the Pub/Sub subscription is delivering messages correctly and that the GCS bucket is configured properly. Has anyone encountered a similar issue or know how to resolve this?
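For context, the snippet above is simplified; the real `process` method decodes the Pub/Sub payload (which `ReadFromPubSub` delivers as raw bytes by default) and parses it as JSON. It looks roughly like this (the error handling here is a sketch, not my exact code):

```python
import json
import logging

import apache_beam as beam


class CustomFilter(beam.DoFn):
    def process(self, element):
        # ReadFromPubSub yields the raw message payload as bytes by default,
        # so decode before parsing.
        try:
            record = json.loads(element.decode('utf-8'))
        except (UnicodeDecodeError, ValueError):
            logging.warning('Dropping unparseable message: %r', element)
            return
        if 'important_key' in record:
            # Re-serialize so the downstream text sink receives a str, not a dict.
            yield json.dumps(record)
```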
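On the missing `print()` output: my next step is to switch to the standard `logging` module, which I understand the Dataflow workers forward to Cloud Logging, whereas bare `print()` to stdout is not reliably captured. A minimal sketch of what I plan to try:

```python
import logging

import apache_beam as beam


class CustomFilter(beam.DoFn):
    def process(self, element):
        # logging output at INFO level and above should show up in the
        # Dataflow job's worker logs in Cloud Logging.
        logging.info('CustomFilter received: %r', element)
        if 'important_key' in element:
            yield element
```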