Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s3-sqs input: log a lambda event summary in case of errors #860

Merged
merged 9 commits into from
Dec 13, 2024

Conversation

zmoog
Copy link
Contributor

@zmoog zmoog commented Nov 29, 2024

What does this PR do?

If there is an exception when processing a lambda event from the s3-sqs input, ESF summarizes the lambda event and adds the summary as extra information to the exception log message.

The summary contains the essential elements for the lambda event. For the s3-sqs input, it includes the bucket ARN and the object key. The summary limits the number of records to max_records.

ESF currently supports lambda event summary for s3-sqs only. We want to hear users' feedback before making additional changes or extending the summary to other input types.

Why is it important?

For some error classes (e.g., encoding errors), knowing the S3 objects in flight in the lambda event helps speed up investigation and troubleshooting.

Users can achieve the same goal by enabling APM, but this extra log helps those users who do not or won't use APM for any reason.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.md and updated share/version.py, if my change requires a new release.

How to test this PR locally

Deploy ESF and send an SQS message with a S3 that points to a non-existent S3 object:

aws sqs send-message \
    --queue-url https://sqs.eu-west-1.amazonaws.com/627286350134/mbranca-esf-notifications \
    --message-body '{"Records":[{"awsRegion":"eu-west-1","eventTime": "2024-01-31T19:06:30.680Z","eventName":"ObjectCreated:Put","eventSource":"aws:s3","eventVersion":"2.1","s3":{"bucket":{"arn":"arn:aws:s3:::mbranca-esf-data","name":"mbranca-esf-data"},"object":{"key":"AWSLogs/123456/I-do-not-exist"}}}]}'

Logs

Here is a sample log with the lambda event summary:

{
    "@timestamp": "2024-11-29T12:06:05.404Z",
    "log.level": "error",
    "message": "exception raised",
    "ecs": {
        "version": "1.6.0"
    },
    "error": {
        "message": "An error occurred (404) when calling the HeadObject operation: Not Found",
        "stack_trace": "  File \"/var/task/handlers/aws/utils.py\", line 103, in wrapper\n    return func(lambda_event, lambda_context)\n  File \"/var/task/handlers/aws/handler.py\", line 532, in lambda_handler\n    for es_event, last_ending_offset, last_event_expanded_offset, current_s3_record in _handle_s3_sqs_event(\n  File \"/var/task/handlers/aws/s3_sqs_trigger.py\", line 141, in _handle_s3_sqs_event\n    for log_event, starting_offset, ending_offset, event_expanded_offset in events:\n  File \"/var/task/storage/s3.py\", line 78, in get_by_lines\n    s3_object_head = self._s3_client.head_object(Bucket=self._bucket_name, Key=self._object_key)\n  File \"/var/task/botocore/client.py\", line 565, in _api_call\n    return self._make_api_call(operation_name, kwargs)\n  File \"/var/task/elasticapm/instrumentation/packages/base.py\", line 213, in call_if_sampling\n    return self.call(module, method, wrapped, instance, args, kwargs)\n  File \"/var/task/elasticapm/instrumentation/packages/botocore.py\", line 106, in call\n    result = wrapped(*args, **kwargs)\n  File \"/var/task/botocore/client.py\", line 1017, in _make_api_call\n    raise error_class(parsed_response, operation_name)\n",
        "type": "ClientError"
    },
    "log": {
        "logger": "root",
        "origin": {
            "file": {
                "line": 128,
                "name": "utils.py"
            },
            "function": "wrapper"
        },
        "original": "exception raised"
    },
    "process": {
        "name": "MainProcess",
        "pid": 8,
        "thread": {
            "id": 139632879478592,
            "name": "MainThread"
        }
    },
    "service": {
        "name": "mbranca-esf-ApplicationElasticServerlessForwarder-0VuQ7x2yawGZ"
    },
    "span": {
        "id": "4a348c93af0ca023"
    },
    "summary": {
        "aws:sqs": {
            "records": [
                {
                    "bucket": {
                        "arn": "arn:aws:s3:::mbranca-esf-data",
                        "name": "mbranca-esf-data"
                    },
                    "object": {
                        "key": "AWSLogs/123456/I-do-not-exist"
                    }
                }
            ],
            "total_records": 1
        }
    },
    "trace": {
        "id": "88556e9d3f65fc709e53555b64b44a66"
    },
    "transaction": {
        "id": "c8dc45b175c39628"
    }
}

@zmoog zmoog added the enhancement New feature or request label Nov 29, 2024
@zmoog zmoog self-assigned this Nov 29, 2024
Copy link

github-actions bot commented Nov 29, 2024

Minimum allowed coverage is 100%

Generated by 🐒 cobertura-action against 99c147c

@zmoog zmoog marked this pull request as ready for review November 29, 2024 16:02
@zmoog zmoog requested a review from a team December 9, 2024 16:07
handlers/aws/utils.py Outdated Show resolved Hide resolved
handlers/aws/utils.py Outdated Show resolved Hide resolved
handlers/aws/utils.py Outdated Show resolved Hide resolved
constanca-m
constanca-m previously approved these changes Dec 11, 2024
Copy link
Contributor

@constanca-m constanca-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Maurizio, looks good!


# We stop adding records to the summary once we reach
# the `max_records` limit.
if len(aws_sqs_summary[first_records_key]) < max_records:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you also break the loop when reaching that? If so you can reverse the check and add a break to avoid looping for nothing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was continuing the loop to get the total number of records.

Here's the lambda event structure (with minor changes for readability):

{
    "Records": [
        {
            "messageId": "059f36b4-87a3-44ab-83d2-661975830a7d",
            "receiptHandle": "AQEBwJnKyrHigUMZj6rYigCgxlaS3SLy0a...",
            "body": "{\"Records\":[{\"awsRegion\":\"eu-west-1\",\"eventTime\":\"2024-01-31T19:06:30.680Z\",\"eventName\":\"ObjectCreated:Put\",\"eventSource\":\"aws:s3\",\"eventVersion\":\"2.1\",\"s3\":{\"bucket\":{\"arn\":\"arn:aws:s3:::mbranca-esf-data\",\"name\":\"mbranca-esf-data\"},\"object\":{\"key\":\"AWSLogs/123456/I-do-not-exist\"}}}]}",
            "attributes": {
                "ApproximateReceiveCount": "1",
                "SentTimestamp": "1545082649183",
                "SenderId": "AIDAIENQZJOLO23YVJ4VO",
                "ApproximateFirstReceiveTimestamp": "1545082649185"
            },
            "messageAttributes": {
                "myAttribute": {
                    "stringValue": "myValue", 
                    "stringListValues": [], 
                    "binaryListValues": []
                }
            },
            "md5OfBody": "e4e68fb7bd0e697a0ae8f1bb342846b3",
            "eventSource": "aws:sqs",
            "eventSourceARN": "arn:aws:sqs:us-east-2:123456789012:my-queue",
            "awsRegion": "us-east-2"
        }
    ]
}

So my goal is to provide the total number of the following elements (hoppa jq style):

.Records[].body.Records

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I see you point now, we can skip the inner loop 🤦

handlers/aws/utils.py Outdated Show resolved Hide resolved
@zmoog zmoog merged commit 43ce2b3 into main Dec 13, 2024
5 checks passed
@zmoog zmoog deleted the zmoog/lambda-event-summary-s3-sqs branch December 13, 2024 09:55
@zmoog zmoog changed the title [s3-sqs] Log a Lambda event summary on exception s3-sqs input: log a lambda event summary in case of errors Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants