Serverless Diary: What you need to know about NoSQL AWS Dynamo Database
Having shared my experience of implementing microservices in a serverless style in my previous blogs, let me dive a level deeper and share the design considerations to be aware of for each component of our microservices. The focus of this blog is the right way to design NoSQL DynamoDB tables, which in my opinion is a prerequisite before I share my learnings on how to approach event-driven architecture. More on that in my future blogs. Let’s use the simple diagram below as a starting point to discuss DynamoDB. I will extend this design in future blogs as I touch on one service at a time and the role it plays in an event-driven serverless architecture.
II. Isn't DynamoDB just another database?
Why even bother going through a blog to understand design considerations for the NoSQL Dynamo database? Why not just crack on with the build and modify/extend as we go along? Here’s why:
- Even if you have several implementations under your belt that used either a SQL or a NoSQL database, recognize that this is still very much a vendor-specific (AWS) service, and you really need to understand how to approach it if you wish to leverage its true benefits and power.
- A wing-it-as-you-go approach will prove both painful and expensive. More on that in a bit.
- It’s a database, but in the AWS serverless world, this is THE database that provides a very powerful way of implementing event-driven architectures.
III. My Top 4 Picks
1. Define your Access Patterns upfront
Straight from AWS documentation -
“NoSQL design requires a different mindset than RDBMS design. For an RDBMS, you can create a normalized data model without thinking about access patterns. You can then extend it later when new questions and query requirements arise. By contrast, in Amazon DynamoDB, you shouldn’t start designing your schema until you know the questions that it needs to answer. Understanding the business problems and the application use cases upfront is absolutely essential.”
Defining the access patterns upfront allows you to correctly identify partition keys (HASH) and sort keys (RANGE). Getting this right is crucial: you can always add more Global Secondary Indexes (GSIs) later, but you get only one shot, at table creation time, at defining your primary key and Local Secondary Indexes (LSIs). Imagine you need to change your primary key in production after 2 months because you identified it incorrectly. Guess what, the cost is very high. The key schema of an existing table cannot be changed, so currently the only way to correct it is to recreate the table with the right keys and perform some kind of data migration activity (ETL) to load the data back.
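To make the link between access patterns and keys concrete, here is a minimal sketch in Python. It only builds the keyword arguments for the boto3 DynamoDB client's `create_table` call. The table name, attribute names, index name, and the two access patterns are hypothetical and chosen purely for illustration.

```python
from typing import Any, Dict

# Hypothetical access patterns for an illustrative "Orders" table:
#   1. List all orders of a customer, sorted by date -> table primary key
#   2. Fetch a single order by its order ID          -> global secondary index


def build_orders_table_definition(table_name: str = "Orders") -> Dict[str, Any]:
    """Return keyword arguments for the DynamoDB create_table call."""
    return {
        "TableName": table_name,
        # Only attributes used in a key schema are declared here.
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "order_date", "AttributeType": "S"},
            {"AttributeName": "order_id", "AttributeType": "S"},
        ],
        # Primary key: fixed at table creation time.
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        # GSIs can also be added after the table exists.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "order-id-index",
                "KeySchema": [
                    {"AttributeName": "order_id", "KeyType": "HASH"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        # On-demand billing, so no provisioned throughput is specified.
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With AWS credentials configured, the definition could be applied with `boto3.client("dynamodb").create_table(**build_orders_table_definition())`. Writing the access patterns as comments next to the key schema keeps the reasoning behind each key visible to the rest of the team.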
I work very closely with my development team and always assume that not everyone is experienced working with Dynamo. Hence, I use the below format while documenting my design, something which has proved quite useful.
2. Privacy and Security — Dynamo VPC endpoint
Security in the cloud is our responsibility. A VPC endpoint for DynamoDB enables resources in your VPC, such as EC2 instances, to use their private IP addresses to access DynamoDB with no exposure to the public internet. This decision ensures that the route to DynamoDB stays entirely within the Amazon network and never crosses the public internet.
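As a rough sketch of what setting this up could look like, the Python snippet below builds the parameters for the boto3 EC2 client's `create_vpc_endpoint` call for a gateway endpoint. The VPC ID, route table ID, and region used in the example are placeholders, not values from any real environment.

```python
from typing import Any, Dict, Iterable


def build_dynamodb_endpoint_request(
    vpc_id: str,
    route_table_ids: Iterable[str],
    region: str,
) -> Dict[str, Any]:
    """Return keyword arguments for the EC2 create_vpc_endpoint call."""
    return {
        # A gateway endpoint is wired in through route tables.
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        # Regional service name of DynamoDB.
        "ServiceName": f"com.amazonaws.{region}.dynamodb",
        # Traffic from subnets using these route tables goes via the endpoint.
        "RouteTableIds": list(route_table_ids),
    }


# Placeholder identifiers, for illustration only.
example_request = build_dynamodb_endpoint_request(
    vpc_id="vpc-00000000",
    route_table_ids=["rtb-00000000"],
    region="eu-west-1",
)
```

The request could then be sent with `boto3.client("ec2").create_vpc_endpoint(**example_request)`. In practice this is often declared in infrastructure-as-code templates instead of being called from application code.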
3. Define/Agree on Data retention and Archiving policy
Defining your data archiving policy upfront allows you to leverage a unique and free feature: DynamoDB Time to Live (TTL). You don’t need to design a separate solution for archiving needs, imagine that!! I have had a few use cases where data was of no use the day after. The only thing I had to do was enable TTL on the table, specifying the attribute name (a Number attribute holding a timestamp in Unix epoch format, in seconds), and Dynamo took care of deleting the item as per the TTL specified. Keep in mind that expired items are removed by a background process, so deletion does not happen at the exact expiry time. If you have more evolved needs, then you can still use this feature to delete the item from the table using TTL but archive it in S3 using DynamoDB Streams. I will cover this in a future blog when discussing streaming patterns.
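The TTL setup described above can be sketched roughly as follows in Python. The attribute name `expires_at`, the item layout, and the helper names are assumptions made for this example. Only the shape of the `update_time_to_live` request follows the boto3 DynamoDB client.

```python
import time
from typing import Any, Dict, Optional

SECONDS_PER_DAY = 24 * 60 * 60


def expiry_epoch(days_to_keep: int, now: Optional[float] = None) -> int:
    """Return the Unix epoch time (in seconds) at which an item should expire."""
    current = time.time() if now is None else now
    return int(current) + days_to_keep * SECONDS_PER_DAY


def build_ttl_request(table_name: str, attribute_name: str = "expires_at") -> Dict[str, Any]:
    """Return keyword arguments for the DynamoDB update_time_to_live call."""
    return {
        "TableName": table_name,
        "TimeToLiveSpecification": {
            "Enabled": True,
            "AttributeName": attribute_name,
        },
    }


def build_item(pk: str, payload: str, days_to_keep: int,
               now: Optional[float] = None) -> Dict[str, Dict[str, str]]:
    """Build an item in DynamoDB's low-level format with a TTL attribute.

    The TTL attribute must be of type Number (N); the low-level API
    transports numbers as strings.
    """
    return {
        "pk": {"S": pk},
        "payload": {"S": payload},
        "expires_at": {"N": str(expiry_epoch(days_to_keep, now))},
    }
```

TTL would be enabled once per table with `boto3.client("dynamodb").update_time_to_live(**build_ttl_request("MyTable"))`, after which every item written with an `expires_at` value becomes eligible for deletion once that time has passed.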
4. Define the Strategy and Approach to Transactional Behaviours
A few years back, managing transactional behavior with DynamoDB was a challenge, and it would have been correct to state that Dynamo wasn’t the right choice for this use case. (Although if you are truly designing an event-based architecture, compensation handlers are the way to go.) That has all changed, and now you can manage ACID transactions over multiple tables within a region as a native construct in AWS. This topic is very well covered by AWS here. My advice still remains the same: any sort of transactional behavior should be the last resort, and if there is a need, limit it to using DynamoDB conditional checks only. The good news is there is no actual underlying locking of records, and I would guess that underneath it uses a CAS (Compare And Swap) algorithm.
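A conditional check can be illustrated with a small Python sketch of optimistic, compare-and-set style versioning at the application level. This says nothing about how DynamoDB works internally. The `version` and `status` attributes, the key name `pk`, and the function name are assumptions made for the example. The returned dictionary matches the parameters of the boto3 DynamoDB client's `update_item` call.

```python
from typing import Any, Dict


def build_conditional_update(
    table_name: str,
    pk: str,
    new_status: str,
    expected_version: int,
) -> Dict[str, Any]:
    """Return keyword arguments for a DynamoDB update_item call that only
    succeeds if the stored version still equals expected_version."""
    return {
        "TableName": table_name,
        "Key": {"pk": {"S": pk}},
        # Write the new status and bump the version in one atomic update.
        "UpdateExpression": "SET #s = :new_status, #v = :next_version",
        # The write is rejected if another writer changed the item first.
        "ConditionExpression": "#v = :expected_version",
        # Aliases avoid clashes with DynamoDB reserved words such as STATUS.
        "ExpressionAttributeNames": {"#s": "status", "#v": "version"},
        "ExpressionAttributeValues": {
            ":new_status": {"S": new_status},
            ":expected_version": {"N": str(expected_version)},
            ":next_version": {"N": str(expected_version + 1)},
        },
    }
```

When the condition fails, the boto3 client raises `ConditionalCheckFailedException`, and the caller can re-read the item and retry or give up. No record is locked while this happens, which fits the advice above to prefer conditional checks over full transactions.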
Final Reflections and takeaways:
To leverage the power of serverless, you have to give in to the best practices and styles of vendor-specific services, be it AWS, GCP, Azure, etc. The serverless architecture style is a powerful concept, but it is powered by grouping individual cloud-native services together. The only way to reap maximum benefits is by understanding each service, its associated best practices, and how it integrates with other services.
Be innovative and brave in the serverless world, but always do your homework upfront!!