[BUG] ApproximateCreationDateTime from DynamoDBStreamRecord in milliseconds when originating from Kinesis, not seconds. #478

Open
seanlane opened this issue Jan 4, 2023 · 4 comments
Labels
type/events issue or feature request related to the events package

Comments

@seanlane

seanlane commented Jan 4, 2023

This is essentially the same issue as aws/aws-lambda-dotnet#839, but without crashing deserialization. I'm guessing that is because this library deserializes into a float64, which does not overflow on the larger value. The relevant points of discussion are:

It seems that the value of ApproximateCreationDateTime will be in seconds when coming from a DynamoDB Stream, but in milliseconds when coming from a Kinesis Stream:

The approximate date and time when the stream record was created, in UNIX epoch time format and rounded down to the closest second

ApproximateCreationDateTime indicates the time of the modification in milliseconds.

There appears to be an internal ticket that's being tracked, so I wanted to open an issue here as well for AWS to monitor and hopefully resolve in the near future. Thanks!

@seanlane
Author

seanlane commented Jan 4, 2023

It may not crash, but unmarshaling and then marshaling a millisecond value overflows: the value is read as seconds, which yields a date far beyond what time.UnixNano can represent in an int64. Toy example:

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// SecondsEpochTime serializes a time.Time in JSON as a UNIX epoch time in seconds
type SecondsEpochTime struct {
	time.Time
}

const secondsToNanoSecondsFactor = 1000000000
const milliSecondsToNanoSecondsFactor = 1000000

func TestUnmarshalJSON(epoch float64) time.Time {
	epochSec := int64(epoch)
	epochNano := int64((epoch - float64(epochSec)) * float64(secondsToNanoSecondsFactor))
	return time.Unix(epochSec, epochNano)
}

func TestMarshalJSON(t time.Time) ([]byte, error) {
	// UnixNano() returns the epoch in nanoseconds
	unixTime := float64(t.UnixNano()) / float64(secondsToNanoSecondsFactor)
	return json.Marshal(unixTime)
}

func testVal(test float64) time.Time {
	convertedTime := TestUnmarshalJSON(test)
	convertBack, _ := TestMarshalJSON(convertedTime)
	fmt.Printf("%f\t%s\t%s\n", test, convertedTime.String(), convertBack)
	return convertedTime
}

func main() {
	testVal(1669739327580.0) // Millisecond
	testVal(1669739327.0)    // Second
}

Output:

1669739327580.000000	54881-12-06 15:53:00 +0000 UTC	-8914383127.569197
1669739327.000000	2022-11-29 16:28:47 +0000 UTC	1669739327
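
The negative number in the round trip above follows from the range of time.UnixNano, which returns int64 nanoseconds and is only defined for dates between roughly 1678 and 2262. The following is a minimal sketch of that limit. The helper names (fitsUnixNano, secondsFits, maxUnixNanoYear) are made up for illustration and are not part of aws-lambda-go.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// fitsUnixNano reports whether t can be expressed as int64 nanoseconds
// since the Unix epoch, i.e. whether t.UnixNano() is well defined.
func fitsUnixNano(t time.Time) bool {
	lo := time.Unix(0, math.MinInt64)
	hi := time.Unix(0, math.MaxInt64)
	return !t.Before(lo) && !t.After(hi)
}

// secondsFits interprets sec as a seconds-resolution Unix timestamp
// (what the library does today) and checks the resulting time.
func secondsFits(sec int64) bool {
	return fitsUnixNano(time.Unix(sec, 0))
}

// maxUnixNanoYear returns the last calendar year reachable with int64 nanoseconds.
func maxUnixNanoYear() int {
	return time.Unix(0, math.MaxInt64).UTC().Year()
}

func main() {
	fmt.Println(maxUnixNanoYear())          // 2262
	fmt.Println(secondsFits(1669739327))    // true: a real seconds timestamp
	fmt.Println(secondsFits(1669739327580)) // false: milliseconds read as seconds
}
```

A check like this could guard a marshal step, but it only detects the problem; it does not say which unit the original value used.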

@bmoffatt
Collaborator

aws/aws-lambda-dotnet#839 (comment) claims that this only occurs when the dynamo event is first passed through kinesis. Do I understand that correctly? If so, I'm not sure if this is something that's supportable, and the function should operate on the Kinesis event rather than the Dynamo event

@bmoffatt bmoffatt added the type/events issue or feature request related to the events package label Apr 27, 2023
@seanlane
Author

seanlane commented Jul 5, 2024

I missed the reply on this issue from last year, but it seems to still be relevant.

...this only occurs when the dynamo event is first passed through kinesis. Do I understand that correctly?

Technically, yes, the pipeline is DynamoDB table -> DynamoDB stream -> Kinesis Data stream -> Firehose (where a transformation Lambda is called) -> Firehose destination

That said, the events passed into the transformation lambda are Kinesis Firehose Events which contain Kinesis Firehose Event Records, which in turn have a Data field.

When configured as above, the Data field contains a DynamoDBEventRecord, which contains the DynamoDBStreamRecord that has the field we're concerned with, ApproximateCreationDateTime.

So Data doesn't contain something like a KinesisEventRecord / KinesisRecord, which would have an ApproximateArrivalTimestamp, which is what I think was suggested in the previous comment.

There is the ApproximateArrivalTimestamp field in the KinesisFirehoseEventRecord, but this would be a timestamp for the event being accepted by the Firehose stream, as opposed to the time when the change was made in DynamoDB or when it was put into the Kinesis Data stream. The Kinesis documentation also suggests that it may not be accurate enough (emphasis mine):

Each Amazon Kinesis record includes a value, ApproximateArrivalTimestamp, that is set when a stream successfully receives and stores a record. This is commonly referred to as a server-side time stamp, whereas a client-side time stamp is set when a data producer creates or sends the record to a stream (a data producer is any data source putting data records into a stream, for example with PutRecords). The time stamp has millisecond precision. There are no guarantees about the time stamp accuracy, or that the time stamp is always increasing.

Lastly, the original issue that was referenced (aws/aws-lambda-dotnet#839) appears to have been closed using the same workaround that we implemented back in 2023: check whether the timestamp is more than 5,000 years in the future and, if so, reinterpret the value as milliseconds:

// Imports needed: context, encoding/json, time, github.com/aws/aws-lambda-go/events

func firehoseHandler(ctx context.Context, firehoseEvent events.KinesisFirehoseEvent) (
	events.KinesisFirehoseResponse, error) {
	var response events.KinesisFirehoseResponse
	for _, firehoseRecord := range firehoseEvent.Records {

		var ddbRecord events.DynamoDBEventRecord
		err := json.Unmarshal(firehoseRecord.Data, &ddbRecord)
		if err != nil {
			// Do something...
		}
		timeWritten := getDdbCreationTime(ddbRecord.Change)
		_ = timeWritten // Continue processing...
	}
	return response, nil
}

func getDdbCreationTime(e events.DynamoDBStreamRecord) time.Time {
	t := e.ApproximateCreationDateTime

	// There is a bug in the aws-lambda-go library, where timestamps from DDB events in DynamoDB streams are in seconds,
	// but timestamps from DDB events in Kinesis Streams are in milliseconds, and the aws-lambda-go library
	// parses both as seconds. If this is a Kinesis event, the time.Time object should be at least 50,000
	// years in the future (give or take a few thousand years) when the input was parsed as a seconds timestamp.
	// We can convert it back tho. See https://github.com/aws/aws-lambda-go/issues/478 for more details
	if t.Year() > (time.Now().Year() + 5000) { // Just test 5K years into the future, should be sufficient
		// t.Unix() holds the original millisecond value, so split it into seconds and nanoseconds.
		ms := t.Unix()
		return time.Unix(ms/1000, (ms%1000)*1_000_000)
	}
	return t.Time
}

I'm not sure if that same fix should be incorporated here into the library, but the above has been working reasonably well for us over the last 18 months.
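
For discussion, here is a rough sketch of what a magnitude-based check could look like if it ran at unmarshal time instead of after the fact. The type flexEpochTime, the helper parseEpoch, and the thresholds are all assumptions made for this example; none of them exist in aws-lambda-go. The thresholds rely on the observation that 1e11 seconds is beyond the year 5000, while 1e11 milliseconds and 1e14 microseconds both fall in 1973.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// flexEpochTime is a hypothetical type (not part of aws-lambda-go) that
// guesses the unit of a Unix epoch number from its magnitude.
type flexEpochTime struct {
	time.Time
}

// UnmarshalJSON picks seconds, milliseconds, or microseconds by size.
func (e *flexEpochTime) UnmarshalJSON(b []byte) error {
	var v float64
	if err := json.Unmarshal(b, &v); err != nil {
		return err
	}
	switch {
	case v >= 1e14: // assumed to be microseconds
		e.Time = time.UnixMicro(int64(v)).UTC()
	case v >= 1e11: // assumed to be milliseconds
		e.Time = time.UnixMilli(int64(v)).UTC()
	default: // seconds, possibly with a fractional part
		sec := int64(v)
		e.Time = time.Unix(sec, int64((v-float64(sec))*1e9)).UTC()
	}
	return nil
}

// parseEpoch decodes a single JSON number using flexEpochTime.
func parseEpoch(raw string) (time.Time, error) {
	var e flexEpochTime
	err := json.Unmarshal([]byte(raw), &e)
	return e.Time, err
}

func main() {
	for _, raw := range []string{"1669739327", "1669739327580", "1731101300058336"} {
		parsed, err := parseEpoch(raw)
		fmt.Println(raw, parsed, err)
	}
}
```

Guessing from magnitude keeps working for records that carry no unit information, at the cost of misreading genuine timestamps from before 1973 in the larger units.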

@amalakar

Confirming that I am experiencing the same error while using a Kinesis DynamoDB stream with Firehose/Lambda. There is an ApproximateCreationDateTimePrecision field which can be used to infer the unit of the field.

{

            "dynamodb": {
                "ApproximateCreationDateTime": 1731101300058336,
                "ApproximateCreationDateTimePrecision": "MICROSECOND",

             }
}
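
A minimal sketch of how that precision field might be used, decoding the raw JSON directly rather than going through events.DynamoDBStreamRecord. The struct and function names here are hypothetical. "MICROSECOND" comes from the payload above; "MILLISECOND" as the other possible value, and treating a missing field as seconds, are assumptions in this sketch. Kinesis records that lack the field would still need a magnitude check like the workaround earlier in the thread.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// streamRecord is a hypothetical, trimmed-down view of the "dynamodb" object.
type streamRecord struct {
	ApproximateCreationDateTime          float64 `json:"ApproximateCreationDateTime"`
	ApproximateCreationDateTimePrecision string  `json:"ApproximateCreationDateTimePrecision"`
}

// envelope is a hypothetical wrapper matching the payload shape shown above.
type envelope struct {
	DynamoDB streamRecord `json:"dynamodb"`
}

// creationTime converts the raw number using the declared precision.
func creationTime(r streamRecord) time.Time {
	v := r.ApproximateCreationDateTime
	switch r.ApproximateCreationDateTimePrecision {
	case "MICROSECOND":
		return time.UnixMicro(int64(v)).UTC()
	case "MILLISECOND": // assumed value, not shown in this thread
		return time.UnixMilli(int64(v)).UTC()
	default:
		// Assumption: no precision field means seconds, as with DynamoDB Streams.
		sec := int64(v)
		return time.Unix(sec, int64((v-float64(sec))*1e9)).UTC()
	}
}

// creationTimeFromJSON decodes a payload and returns its creation time.
func creationTimeFromJSON(payload string) (time.Time, error) {
	var env envelope
	if err := json.Unmarshal([]byte(payload), &env); err != nil {
		return time.Time{}, err
	}
	return creationTime(env.DynamoDB), nil
}

func main() {
	payload := `{"dynamodb": {"ApproximateCreationDateTime": 1731101300058336,
		"ApproximateCreationDateTimePrecision": "MICROSECOND"}}`
	created, err := creationTimeFromJSON(payload)
	fmt.Println(created, err)
}
```

Reading the unit from the record avoids guessing from magnitude, but it only helps for payloads that actually include the precision field.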
