Serverless Diary: How to Achieve Strong Consistency with AppConfig Feature Flags
I. Introduction
“One ring to rule them all, one ring to find them, One ring to bring them all, and in the darkness bind them” — J. R. R. Tolkien (Lord of the Rings)
In the previous blog, we introduced AWS AppConfig and looked at a use case showing how dynamic feature toggles can improve the agility of software delivery.
AWS AppConfig started as a solution developed and consumed internally within AWS to support agile delivery and tackle common configuration use cases and challenges. It added so much value internally that AWS decided to wrap up this solution, built using industry best practices, and share it with all of us.
Frameworks and SaaS services shine because they abstract away much of a solution's complexity, so consumers can interact with a layer that is very easy to use. But this also implies these services and frameworks must be used in a particular fashion. "Opinionated framework" is the term often used to describe such behavior.
What do we do if a few use cases aren't the standard, documented, textbook scenarios offered by the framework? What if the out-of-the-box functionality doesn't support those few use cases? Do we start looking for alternatives, such as another SaaS service, because we didn't get a 100% match for all the use cases within a project? Is there any guarantee that a new SaaS solution will cope with all future requirements arising from an agile delivery project? No, there isn't. And then there is always the temptation to build something custom and greenfield, which has its own demerits, as I addressed in my previous blog.
This blog focuses on how we can use a single feature flag across various components (front end, back end) to manage consistent behavior of a feature in serverless architectures.
“One Flag to manage them All, One Flag to find them, and with Design bind them to be Strongly Consistent”
— a parody of the popular novel/movie “Lord of the Rings”
II. The Challenge
“As a microservices serverless architecture using frontend SPAs deployed in an S3 bucket and backend APIs as lambdas behind API Gateway, I want the ability to disable/enable features consistently across backend and frontend applications using a single feature flag managed centrally.”
A feature toggle helps us manage the rollout of functionality across several components and services. Consider an example: we have a website that offers features X and Y. The requirement is the ability to disable/enable feature X consistently, as needed, for both the front end and the back end. Given the front end is a Single Page Application (SPA) offering features X and Y, we want to switch off the feature X journey on both the frontend and the backend, as we don't want users to see pages specific to feature X when the journey is disabled.
So how do we go about doing that using AWS AppConfig? We can create one feature flag in AppConfig that manages both the frontend and the backend, with the feature flag configuration exposed via an API for the frontend.
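As an illustrative sketch (the exact schema is defined by AppConfig's feature flags configuration profile type, and the names simply match this example), the hosted configuration for such a flag might look like:

```json
{
  "flags": {
    "isFeatureXEnabled": {
      "name": "isFeatureXEnabled"
    }
  },
  "values": {
    "isFeatureXEnabled": {
      "enabled": false
    }
  },
  "version": "1"
}
```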
This approach meets the requirement of having a single feature flag managing the entire feature across the backend and the front end. But it doesn't satisfy the requirement that both the front end and the back end stay consistent on the feature flag's value. Let's dig deeper into what this means by looking at the diagram below.
Let’s look at the sequence of steps from the above diagram (figure 1):
- In steps a, b, and c, the developer checks in the feature flag configuration to create and deploy it within the AWS AppConfig service. The AppConfig configuration-profile name is ‘FeatureList’, and the feature flag ‘isFeatureXEnabled’ is deployed to the environment ‘test-1’. From this point onwards, the new or updated feature flag is available to be consumed by lambdas.
- In step 1, the user loads the application, which makes a backend call via API Gateway to the /config endpoint hosted by the feature X lambda. In step 2, the feature X lambda makes a local call to the lambda layer to retrieve the feature flag value and returns the response to the front end. The feature X lambda is attached to a lambda layer that provides local caching and a managed abstraction layer. This layer asynchronously updates the lambda's cache with the latest configuration from the AppConfig service, per the TTL (Time To Live) defined by the lambda environment variable AWS_APPCONFIG_EXTENSION_POLL_INTERVAL_SECONDS.
- In step 3, the front end receives the feature flag configuration as the response. If ‘isFeatureXEnabled’ is false, the feature X journey and associated pages are hidden from the user. If ‘isFeatureXEnabled’ is true, the user can go through the feature X journey pages and submit the response to the backend endpoint supporting feature X submission. The backend then checks via the lambda layer whether ‘isFeatureXEnabled’ is true and, if so, executes the feature X functionality. If ‘isFeatureXEnabled’ is false, it responds with an HTTP 503 ‘Service Unavailable’ message.
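As a rough sketch (not the exact implementation), the /config handler from step 2 might read the locally cached configuration from the layer's HTTP endpoint; the application name below is illustrative, and the payload shape assumes AppConfig's feature-flags format:

```python
import json
import urllib.request

# The AppConfig Lambda extension serves the cached configuration over a
# local HTTP endpoint; the application name 'my-app' is illustrative.
APPCONFIG_URL = (
    "http://localhost:2772/applications/my-app"
    "/environments/test-1/configurations/FeatureList"
)


def extract_flags(raw_config):
    """Flatten the feature-flag payload into simple booleans."""
    return {name: attrs["enabled"] for name, attrs in raw_config.items()}


def handler(event, context):
    """GET /config: return the current feature flag values to the front end."""
    with urllib.request.urlopen(APPCONFIG_URL) as response:
        raw_config = json.loads(response.read())
    return {
        "statusCode": 200,
        "body": json.dumps(extract_flags(raw_config)),
    }
```

Note that the handler never calls the AppConfig service directly; it only reads whatever the layer has cached locally, which is exactly why different warm instances can disagree.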
This is where the problem of eventual consistency appears. Let's consider a scenario with the following assumptions:
- The lambda layer TTL is set to 15 minutes, and the featureX value is false in the AppConfig service. So, from a front-end perspective, the feature X journey is currently disabled.
- The front-end website has variable traffic ranging from 10 to 100 requests per second. This implies that, for the feature X lambda, there could be anywhere between 10 and 100 warm lambda instances, each with a different initialization time and its own local cache/lambda layer.
So what does this mean? It means that if we have 100 warm instances of lambda X at a given time, these instances may each synchronize with the AppConfig service at a different point in time, anywhere between 0 and 15 minutes after an update.
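To build an intuition for the scale of the problem, here is a small, purely illustrative simulation (all numbers hypothetical): each warm instance's next poll of AppConfig lands at a random point within the TTL window, so shortly after a flag change a large fraction of instances still serve the stale value.

```python
import random

TTL_SECONDS = 15 * 60  # AWS_APPCONFIG_EXTENSION_POLL_INTERVAL_SECONDS


def stale_fraction(num_instances, seconds_since_update, seed=42):
    """Fraction of warm instances that have not yet re-polled AppConfig
    'seconds_since_update' seconds after the flag was changed."""
    rng = random.Random(seed)
    # Each instance's next poll lands uniformly within the TTL window,
    # because instances were initialized at different times.
    next_poll = [rng.uniform(0, TTL_SECONDS) for _ in range(num_instances)]
    stale = sum(1 for t in next_poll if t > seconds_since_update)
    return stale / num_instances


if __name__ == "__main__":
    # One minute after the update, most instances are still stale.
    print(f"stale after 60s: {stale_fraction(100, 60):.0%}")
```

Any single front-end session can therefore hit a mix of fresh and stale instances, which is the inconsistency described below.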
Now let’s look at a sequence of steps for a common scenario:
- The business requests enabling featureX, so the developer updates the feature flag to true and checks in the change. The CI/CD pipeline deploys using the ‘AllAtOnce’ deployment strategy, and AWS AppConfig is updated with the latest configuration.
- I start the front-end journey at this point, which results in the front end calling backend lambda X via API Gateway. The invocation happens to hit warm lambda instance 1, which has already synchronized with AppConfig and returns true, allowing me to continue journey X.
- After completing the form with featureX-related attributes, I make a submission, and this time I happen to trigger lambda instance 11. This instance has not yet synchronized with the latest AppConfig update and fetches the value false for featureX from its local cache. As a result, the user gets back a service-not-available error from that instance of the lambda. This is the challenge!
This behavior can persist for up to 15 minutes, as defined by the TTL. Theoretically, we could lower the TTL to a very small value, but that negates the benefit of caching provided by the layer and also implies higher cost, as more calls need to happen between the lambda layer and the AWS AppConfig service.
Let’s look at the solution to deal with similar use cases and scenarios.
III. The Solution
One way to overcome the eventual consistency issue is to add metadata during deployment that allows the backend lambda to compute whether all backend lambda instances are consistent with the new values. Let's take a closer look at the updated design below, which solves the consistency issue.
Let’s look at the sequence of steps from the above diagram (figure 2):
- In steps a and b, the developer checks in the updated feature flag configuration. For the purpose of illustration, the configuration profile is named FeatureList, with the flag isFeatureXEnabled = true (changed from the previous false). The CI pipeline creates a new version in AppConfig with isFeatureXEnabled=true.
- In step c, the deployment pipeline runs and initiates the process of deploying the new configuration version to the environment test-1. As part of this deployment, a timestamp attribute is injected into the new configuration in the environment. Looking at the design, we can observe that the injected attribute is “lastModifiedTimestamp” in the test-1 environment.
- Steps 1 and 2 remain unchanged: the front end requests the feature flag configuration from the backend. Step 3 is the additional logic in the lambda that takes care of eventual consistency. It does this by computing the total time since the last AppConfig deployment and checking whether it is greater than the configured TTL (environment variable AWS_APPCONFIG_EXTENSION_POLL_INTERVAL_SECONDS).
config(): pseudo logic for computing the response payload

    if (isFeatureXEnabled) {
        if ((currentTimestamp - lastModifiedTimestamp) > AWS_APPCONFIG_EXTENSION_POLL_INTERVAL_SECONDS) {
            isBackendConsistent = true;
        }
    }
    return {
        "isFeatureXEnabled": ${isFeatureXEnabled},
        "isBackendConsistent": ${isBackendConsistent}
    }

where:
- isFeatureXEnabled and lastModifiedTimestamp are read from the local cache provided by the lambda layer, synchronised with AppConfig,
- AWS_APPCONFIG_EXTENSION_POLL_INTERVAL_SECONDS is configured as a lambda environment variable, and
- isBackendConsistent is the attribute added by the backend to indicate whether all backend lambda instances are definitely synchronised.
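The same pseudo logic as a runnable Python sketch; here the flag value and timestamp are passed in as plain parameters, whereas in the real lambda they would be read from the layer's local cache (the 45-second fallback matches the extension's default poll interval):

```python
import os
import time


def build_config_response(is_feature_x_enabled, last_modified_ts,
                          now=None, ttl=None):
    """Compute the /config payload, marking the backend consistent only
    once every warm instance must have re-polled AppConfig (i.e. more
    than one TTL has elapsed since the deployment timestamp)."""
    now = now if now is not None else time.time()
    ttl = ttl if ttl is not None else int(
        os.environ.get("AWS_APPCONFIG_EXTENSION_POLL_INTERVAL_SECONDS", "45"))
    is_backend_consistent = bool(
        is_feature_x_enabled and (now - last_modified_ts) > ttl
    )
    return {
        "isFeatureXEnabled": is_feature_x_enabled,
        "isBackendConsistent": is_backend_consistent,
    }
```

The key observation is that the check needs no coordination between instances: one TTL after the deployment timestamp, every instance is guaranteed to have polled at least once, so any instance can derive consistency locally.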
- In step 4, the front end checks that both isFeatureXEnabled and isBackendConsistent are true, and only then allows the user to progress through the journey. This ensures the front end only switches on feature X when all backend lambda instances have the latest configuration value for feature X, i.e. all backend lambda instances are ‘strongly consistent’.
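The front-end gate from step 4 then reduces to a simple conjunction; it is sketched in Python here for consistency with the other snippets, though in the SPA it would of course be JavaScript:

```python
def should_show_feature_x(config_response):
    """Front-end gate: enable the feature X journey only when the flag is
    on AND every backend instance is guaranteed to agree."""
    return bool(config_response.get("isFeatureXEnabled")
                and config_response.get("isBackendConsistent"))
```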
IV. Summary
In this article, we looked at how we can use a single feature flag across several components in a serverless architecture, addressing the challenge of ‘strong consistency’ while still using a managed service like AWS AppConfig as a dynamic feature toggle framework. The purpose of this article isn't to promote this style, where we end up working around the consistency implications of lambda's scaling model. The takeaway is that there is an elegant solution if one finds oneself with a use case similar to the one presented. It can also cater to serverless use cases such as switching on a feature globally with ‘strong consistency’. However, I must warn that this implementation approach, with its desire to be strongly consistent, comes with an inherent risk.
If the newly launched feature has an issue requiring rollback, all users are impacted, rather than a selected few. For this reason, AWS recommends rolling out a feature gradually over a defined window, exposing only a small percentage of users to the new feature at first. If issues arise, AWS AppConfig can monitor CloudWatch alarms and roll the deployment back automatically before it reaches 100% of users.
In this blog, we mostly looked at the design aspect of the problem. In the following blog, I will focus on the build aspects: how easily we can use some nifty features of AppConfig to add metadata (like lastModifiedTimestamp) to our deployed configurations, with the raw alternative being to have your CD pipeline manage it.
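As a small taste of the pipeline-managed alternative, here is a hedged sketch of stamping the configuration content before it is published (the helper below is hypothetical, and the AWS API calls are indicated only as comments using the boto3 appconfig client's create_hosted_configuration_version and start_deployment operations):

```python
import json
import time


def inject_timestamp(config_json, now=None):
    """Stamp the configuration content with a lastModifiedTimestamp
    (epoch seconds) so backend lambdas can compute consistency."""
    config = json.loads(config_json)
    config["lastModifiedTimestamp"] = int(now if now is not None else time.time())
    return json.dumps(config)


# The CD pipeline would then publish the stamped content, e.g. with boto3:
#   client = boto3.client("appconfig")
#   client.create_hosted_configuration_version(
#       ApplicationId=..., ConfigurationProfileId=...,
#       Content=inject_timestamp(content), ContentType="application/json")
#   client.start_deployment(...)
```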