I’ve been wondering how well Amazon DynamoDB would fit an event store implementation. There were two different designs I wanted to explore; both are described in this article, and I ended up implementing one of them. The source code is available on GitHub, including a NuGet package feed on nuget.org.
Event Store Basics
An event store is a storage concept that stores events in chronological order. These events describe business-critical changes within a domain aggregate. Besides storing the events of the aggregate, the aggregate state can be stored as a snapshot to avoid reading all events each time the aggregate needs to be rebuilt. This can boost performance both in terms of latency and in the amount of data that needs to be transferred.
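To illustrate the idea, here is a minimal sketch in C# (the types are hypothetical and not taken from the actual source code): the aggregate is rebuilt from the latest snapshot plus only the events recorded after it, instead of replaying the full history every time.

```csharp
using System.Collections.Generic;

// Hypothetical event and snapshot types for a bank account aggregate.
public record AccountEvent(long Version, decimal AmountChanged);
public record AccountSnapshot(long Version, decimal Balance);

public static class AccountRebuilder
{
    public static decimal Rebuild(AccountSnapshot? snapshot, IEnumerable<AccountEvent> trailingEvents)
    {
        // Start from the snapshotted state, or from scratch if no snapshot exists yet.
        var balance = snapshot?.Balance ?? 0m;

        // Replay only the events written after the snapshot was taken.
        foreach (var e in trailingEvents)
            balance += e.AmountChanged;

        return balance;
    }
}
```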
DynamoDB
DynamoDB is a large-scale distributed NoSQL database that can handle millions of requests per second. It’s built to handle structured documents grouped into partitions, where items sharing a partition key are stored in sort key order. This has obvious similarities to what an event stream for an aggregate looks like, which seems promising!
Another neat feature of DynamoDB is DynamoDB Streams and Kinesis Data Streams for DynamoDB, both of which can stream changes in a table to various other AWS services and clients. There is no need to implement an outbox or integrate with a separate message broker. Add point-in-time recovery and it is possible to stream the whole event store at any time!
Separated Snapshot and Events
Let’s start with the first design, which uses composite keys to store snapshots and events grouped by aggregate.
The events are grouped into commits to create an atomic unit with consistency guarantees without using transactions. Commits use a monotonically increasing id as the sort key, while the snapshot uses zero. Since sort keys determine the order in which items are stored, the commits are ordered chronologically with the snapshot leading, meaning the commits of events can be fetched with a range query while the snapshot can be fetched separately. It makes little sense to fetch them all at once, even though that would also be possible. The snapshot includes the sort key of the commit it represents up to, in order to know where to start querying for any events that have not yet been applied to the snapshot.
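Roughly, the two reads could look like this with the AWS SDK for .NET. This is a sketch only; the table name and the PK, SK and version attribute names are assumptions, not the actual schema used by the package.

```csharp
using System.Collections.Generic;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

var client = new AmazonDynamoDBClient();

// Fetch the snapshot, which lives at sort key 0.
var snapshot = await client.GetItemAsync(new GetItemRequest
{
    TableName = "EventStore",
    Key = new Dictionary<string, AttributeValue>
    {
        ["PK"] = new AttributeValue { S = "order-123" },
        ["SK"] = new AttributeValue { N = "0" }
    }
});

// Range query for the commits written after the version the snapshot represents.
var snapshotVersion = snapshot.Item.TryGetValue("version", out var v) ? v.N : "0";
var commits = await client.QueryAsync(new QueryRequest
{
    TableName = "EventStore",
    KeyConditionExpression = "PK = :pk AND SK > :after",
    ExpressionAttributeValues = new Dictionary<string, AttributeValue>
    {
        [":pk"] = new AttributeValue { S = "order-123" },
        [":after"] = new AttributeValue { N = snapshotVersion }
    }
});
```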
Do note that there is an item size limit of 400 KB in DynamoDB, which should be more than enough to represent both commits and snapshots, but since both are stored as opaque binary data they can also be compressed. Besides lowering item size and round-trip latency, this can also lower read and write cost.
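As a sketch, compressing a serialized commit payload before writing it could be as simple as the following (GZip is chosen arbitrarily here; the article only notes that compression is possible):

```csharp
using System.IO;
using System.IO.Compression;

static byte[] Compress(byte[] payload)
{
    using var output = new MemoryStream();
    // Dispose the GZipStream before reading the buffer so the trailing bytes are flushed.
    using (var gzip = new GZipStream(output, CompressionLevel.Fastest))
        gzip.Write(payload, 0, payload.Length);
    return output.ToArray();
}
```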
When storing a commit, the sort key is monotonically increased. This is predictable and can therefore be used in a condition to introduce both optimistic concurrency and idempotency, preventing inconsistency when multiple competing writers store events at the same time or a network error occurs during a write operation.
"ConditionExpression": "attribute_not_exists(PK)"
A snapshot can be stored once it is smaller than the accumulated non-snapshotted commits, to save cost and lower latency. As snapshots are detached from the event stream, it doesn’t matter whether storing one after a commit succeeds or fails; if it fails, the write operation can be re-run at any time. Updating a snapshot keeps the optimistic concurrency and idempotency guarantees by only writing if the version the new snapshot points to is higher than that of the currently stored snapshot, or if the version attribute is missing altogether, which means no snapshot exists yet.
"ConditionExpression": "attribute_not_exists(version) OR version < :version"
More about conditional writes can be found here.
This was the solution I chose to implement!
Interleaving Snapshots and Events
This was an alternative I wanted to try out: interleaving snapshots and events in one continuous stream.
The idea was to require only a single request to fetch both the latest snapshot and the trailing, non-snapshotted events, lowering the number of round trips to DynamoDB and increasing possible throughput. Reading commits and the latest snapshot would be done by reading in reverse chronological order until a snapshot is found.
This however presents a problem. If a snapshot is stored after, say, every 10th commit, 11 items have to be queried to avoid multiple round trips, even though the first item could be a snapshot, making the other 10 items redundant. Furthermore, there are no guarantees about when a snapshot gets written, hence there is no way to know upfront exactly how many items to read to reach the latest snapshot.
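For reference, a read in this design could be sketched like this (never implemented, and the names are assumptions): query newest-first and stop at the first snapshot, over-fetching by a guessed page size because the snapshot’s distance from the head is unknown.

```csharp
var page = await client.QueryAsync(new QueryRequest
{
    TableName = "EventStore",
    KeyConditionExpression = "PK = :pk",
    ExpressionAttributeValues = new Dictionary<string, AttributeValue>
    {
        [":pk"] = new AttributeValue { S = "order-123" }
    },
    ScanIndexForward = false, // reverse chronological order, newest items first
    Limit = 11                // snapshot interval + 1 is only a guess, not a guarantee
});

// The client then scans page.Items for the first snapshot and discards anything older;
// if no snapshot is found within the page, another round trip is needed after all.
```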
Another problem is that all snapshots have to be read when reading the whole event stream.
Conclusion
DynamoDB turns out to be quite a good candidate for the job as the persistence engine of an event store. It supports the design of an ordered list of events including snapshots, and it has the capability to stream the events to other services. Replaying a single aggregate can be done with a simple range query, and its schemaless design makes it easy to store both commits and snapshots in the same table. Its distributed nature enables almost limitless scalability, and the fact that it is a managed service makes operating it a breeze.
Very nice indeed!