Serverless Diary: The Ultimate Guide to Caching in the Cloud

Original Photo by Philippe Donn from Pexels


As cloud computing is becoming a more common way of storing and managing data for all small to large-sized enterprises, the importance of understanding data usage patterns and the distance from the requestor entity has become vastly important. Distance is key to improved response times. Following are the three factors to a quicker response time:

  1. What kind of storage it uses (SSD, HDD),
  2. What format the data is stored in (binary, JSON etc).

I. What is caching

A cache is a high-speed data storage layer that needs to be served faster than is possible by accessing the primary data storage location. With modern architecture styles and patterns, caching definition itself has evolved and data on the cache can be both transient or durable** depending on the use case.

II. Why & When use caching

Numerous use cases can benefit from caching. One fact about data on the cloud is- it is “distributed data.” A simple stab at deciphering the word “Distributed” is — requested data can be on physically different machines, buildings, zones, and even regions. So obvious challenge is bringing together this data from all sources(could be single or multiple) for a given request quickly. Depending upon whether the data is static, dynamic, or frequently accessed, there are different ways of designing and serving this data to benefit from:

  1. Reduce cost by serving more data from the cache, resulting in less traffic to backend application and database, which reduces cost (true for serverless architecture) and also reduces the load on the backend. In simpler words, you can serve more read-heavy traffic with less cost.

III. Where to cache

Let’s look at a frequently used and repeatable serverless design on AWS with and without caching. The below diagram will help in understanding in what layers caching can be applied.

  1. CDN cache- The Amazon Cloudfront service is a great option to cache static contents like images, files, videos fronting your s3 buckets (s3 as the origin of cloud front)hosting your websites. With Lambda@Edge compute, the possibilities are endless of what you can serve and secure via your CDN network.
  2. Application Cache
    a. Gateway caching: You can enable API caching in Amazon API Gateway to cache your endpoint's responses. With caching, you can reduce the number of calls made to your endpoint and also improve the latency of requests to your API.
    b. On-host caching (Lambda): This is a very powerful option if you have a more customized use case. In the serverless world, lambda would act as an on-host cache. There are a couple of ways you can implement caching with lambdas.
    1. The preferred way is by adding lambda extensions to your lambda functions. With this approach, you can create an in-memory, fast and secure cache that caches responses from various back-end services like databases, parameter store, etc, and is refreshed as per the configured TTL(Time to Live). Refer to this article for implementation details.
    2. A more traditional approach builds on the fact that any declarations in your Lambda function code (outside the handler code) remain initialized, providing additional optimization when the function is invoked again. So whatever needs to be cached goes outside the handler code. Life cycle management of the cached object needs custom logic to implement TTL.
    c. In-Memory caching: AWS Elastic cache (Redis/Memcached)is a fully managed service(but not serverless) that provides you the option of using Redis/Memcache as an external/side cache. It is a popular choice for real-time use cases like Caching, Session Stores, Gaming, Geospatial Services, Real-Time Analytics, and Queuing.
  3. Database cache- Caching response of databases can provide you that additional ultra-low latency. If that is required, then DAX is a purpose-built cache service for dynamodb and is more efficient compared to side cache like ElasticCache.


Now that you are aware of all the caching tools in your serverless toolbox, let me wrap up with 6 key takeaways:

  1. You should not cache the same data at multiple layers. Understand and Identify the appropriate caching layer, as per your use case.
  2. Always apply TTL to your cached items.
  3. Have a contingency process for forcing a cache refresh on-demand.
  4. If using an in-memory cache like ElasticCache for data storage (instead of as a traditional cache), agree on RTO(Recovery Time Objective) and RPO(Recovery Point Objective) upfront.
  5. If using the lambda caching technique (declaration outside the handler code), be aware that each instance of lambda will end up refreshing from the cache at different times(as per the TTL logic implemented). This can lead to inconsistent results at a given point in time, as a call to API gateway can end up either creating a new lambda or reusing an existing warm lambda with its TTL policy. Be mindful to check if this is acceptable for your use case.

Digital Architect, Agile practitioner, Mentor and Fitness Enthusiast.