AWS Lambda and DynamoDB: Slow performance with large datasets despite using batch operations
I've hit a roadblock and tried several approaches without success. I'm working on an AWS Lambda function that reads records from a DynamoDB table and writes them to another table after some transformations. With larger datasets (around 10,000 records), the function takes much longer than expected, and I'm hitting throttling errors even though I'm using batch operations for the writes.

Here's the basic structure of my Lambda function:

```python
import boto3

def lambda_handler(event, context):
    dynamodb = boto3.resource('dynamodb')
    source_table = dynamodb.Table('SourceTable')
    dest_table = dynamodb.Table('DestinationTable')

    # Fetch all records, following scan pagination
    response = source_table.scan(Limit=10000)
    items = response['Items']
    while 'LastEvaluatedKey' in response:
        response = source_table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
        items.extend(response['Items'])

    # Process and batch the writes (batch_writer handles chunking into
    # 25-item BatchWriteItem requests)
    with dest_table.batch_writer() as batch:
        for item in items:
            transformed_item = transform(item)  # transform() is defined elsewhere
            batch.put_item(Item=transformed_item)

    return {'statusCode': 200, 'body': 'Processed {} items'.format(len(items))}
```

I've set the reserved concurrency for the Lambda function to 5, but I still get throttling exceptions when the incoming dataset is large. Increasing the provisioned throughput on the DynamoDB table didn't seem to help much either. I'm seeing this error intermittently in my CloudWatch logs:

```
ThrottlingException: Rate exceeded
```

With smaller datasets (around 1,000 records) the function performs adequately. I've also looked at optimizing the reads with indexes, but the bottleneck seems to be the write operations to the destination table. Does anyone have suggestions on how to improve write performance with DynamoDB in this scenario?
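One thing I'm wondering about is whether accumulating all 10,000 items in memory before writing is part of the problem, and whether processing the scan page by page (writing each page before fetching the next) would behave better. This is just a sketch of the pagination helper I have in mind, where `scan_fn` stands in for `table.scan`:

```python
def scan_pages(scan_fn, **kwargs):
    """Yield each page of items from a DynamoDB-style scan so pages can be
    transformed and written as they arrive, instead of accumulating the
    whole dataset first. scan_fn is expected to behave like table.scan:
    it returns a dict with 'Items' and, while more pages remain, a
    'LastEvaluatedKey' to pass back as ExclusiveStartKey."""
    response = scan_fn(**kwargs)
    yield response['Items']
    while 'LastEvaluatedKey' in response:
        response = scan_fn(ExclusiveStartKey=response['LastEvaluatedKey'], **kwargs)
        yield response['Items']
```

The idea would be to open the `batch_writer()` once and feed it one page at a time from this generator, but I'm not sure how much that helps with the throttling itself.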
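I'm also considering wrapping the writes in my own retry loop with exponential backoff and jitter when a `ThrottlingException` comes back. The helper below is just a sketch I put together (the base delay, cap, and retry count are values I picked, not anything from boto3):

```python
import random

def backoff_delays(max_retries=5, base=0.1, cap=5.0):
    """Yield one delay (in seconds) per retry attempt, using exponential
    backoff with full jitter: attempt n gets a random delay in
    [0, min(cap, base * 2**n)]."""
    for attempt in range(max_retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))
```

The caller would `time.sleep()` on each yielded delay before retrying the failed write, and give up once the generator is exhausted. Would something like this be the right direction, or should client-side retries be left to the boto3 retry configuration?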
Is there a better way to handle large batch writes, or any specific best practices I might be overlooking? My development environment is Debian, in case it matters. Any help would be greatly appreciated. Thanks in advance!