Serverless Diary: How to Increase Agility with Feature Toggles — Native Approach

Anuj Kothiyal
7 min read · Aug 14, 2022

Photograph of a person on a laptop wearing an AWS AppConfig T-shirt

I. Introduction

“Don’t build what you can buy, don’t buy what you can rent”

With every new line of code a developer writes comes a feeling of creation, and with it satisfaction. Not everyone realizes that every added line of code also increases the complexity of the system and the cost of maintaining and operating it.

As Software Architects, we need to be honest when designing systems and answer one important question: is the problem we are solving unique? Are there SaaS solutions on the market that already solve it? Speed to market is critical, and leveraging SaaS offerings should always be a consideration when solving a problem for an enterprise.

There are plenty of custom ways to design a feature toggle framework using serverless components, but the more code we write, the more complexity we invite. The focus of this blog is to extend and iterate on the previous design and provide an alternative approach to feature toggles using AWS AppConfig.

II. Understanding the Service — AppConfig

AppConfig is a capability of AWS Systems Manager used to create, manage, and quickly deploy application configurations. A configuration is a collection of settings that influence the behavior of your application.

There are a lot of use cases and features that AppConfig supports. I will focus on the AppConfig feature flags use case to explain how it fits into the design discussed in this blog.

  1. Creation of Application and Environments: In AWS AppConfig, an application is simply an organizational construct, like a folder. For each application, we define one or more environments. An environment is a logical deployment group of AWS AppConfig applications, such as the applications in a Test or Production environment.
  2. Creation and Storage of Flags: All feature flag frameworks need a source where feature flags are stored and managed. When using AppConfig, we can store these feature flags within the AppConfig service itself or reference existing configurations such as S3 objects, AWS CodePipeline, and AWS Systems Manager parameters or documents. JSON and YAML are the most commonly used formats for defining feature flags.
  3. Validation of Flags: A validator ensures that our configuration data is syntactically and semantically correct. We can create validators either as a JSON Schema or as an AWS Lambda function.
  4. Deployment of Application Flags into an Environment: When we are ready to switch a feature on or off, AppConfig gives us the option of deploying the change into an environment in seconds or rolling it out slowly to assess its impact. The latter is the recommended way to de-risk the delivery of a feature. The AWS AppConfig resource that controls deployments is called a deployment strategy. We can create a custom deployment strategy or use one of the predefined strategies provided by AWS.
  5. Rollback: You can configure Amazon CloudWatch alarms for each environment to monitor key success metrics and failures. AWS AppConfig monitors these alarms during a configuration deployment and for the bake time defined within the deployment strategy. If an alarm is triggered, the system automatically rolls back the configuration without requiring manual intervention.
  6. Fetching Feature Flags in Lambda using the AWS-AppConfig-Extension layer: The AppConfig extension encapsulates best practices: it handles calling the AWS AppConfig service, managing a local cache of retrieved data, tracking the configuration tokens needed for subsequent service calls, and periodically checking for configuration updates in the background.
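To make point 3 more concrete, here is a minimal sketch of what a Lambda validator could look like. It assumes that AppConfig passes the configuration to the function as a base64-encoded `content` field and treats a raised exception as a failed validation. The two rules it enforces (every flag value must be declared, and `enabled` must be a boolean) are hypothetical examples rather than anything AppConfig requires.

```python
import base64
import json


def validate_flags(content: bytes) -> None:
    """Raise ValueError if the flag document breaks our (hypothetical) rules."""
    try:
        document = json.loads(content)
    except ValueError as exc:  # covers JSON and encoding errors
        raise ValueError(f"configuration is not valid JSON: {exc}") from exc
    if not isinstance(document, dict):
        raise ValueError("top level of the configuration must be a JSON object")

    declared = document.get("flags", {})
    values = document.get("values", {})
    if not isinstance(declared, dict) or not isinstance(values, dict):
        raise ValueError("'flags' and 'values' must be JSON objects")

    for name, value in values.items():
        # Semantic rule (assumption): a value may only exist for a declared flag.
        if name not in declared:
            raise ValueError(f"value given for undeclared flag '{name}'")
        # Semantic rule (assumption): 'enabled' must be a real boolean.
        if not isinstance(value, dict) or not isinstance(value.get("enabled"), bool):
            raise ValueError(f"flag '{name}' needs a boolean 'enabled' field")


def handler(event, context):
    """Lambda entry point: decode the content and validate it."""
    validate_flags(base64.b64decode(event["content"]))
```

Returning normally signals a valid configuration. A JSON Schema validator would cover the syntactic checks with less code, so a Lambda function mostly pays off for rules that a schema cannot express.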

III. The Challenge

In the last blog, we saw how easy it is to get started with feature flags. But as the pace of development picks up and the number of applications and services grows, managing them becomes challenging. AWS AppConfig shines in such scenarios and provides numerous benefits over the classic approach of using environment variables, a few of them being:

  1. Centralized feature management service with controls in place to reduce configuration change-related errors.
  2. Toggling a feature doesn’t require an application code deployment, which avoids the Lambda cold starts that come with one. AppConfig provides a dynamic feature toggle approach, compared to the static feature toggle approach we discussed in the previous blog.
  3. Control over the feature launch rate, and automatic rollback in case of any issues, limiting the negative impact on services.

IV. The Build

Let us iterate on the use case design ‘figure2- Option 1 Classic Approach’ from the previous blog and look at the updated design to introduce the AWS AppConfig service.

Reminding ourselves of the problem we are trying to solve, but this time using AppConfig:

“As an Administrator managing a citizen-facing COVID testing website, I want the flexibility to offer PCR Test Centre appointments for a limited time as a contingency measure on top of the always available PCR and LFD Home tests.”

figure1: Native Approach using AWS AppConfig

Let’s look at the developer steps from the above diagram (figure1):

  • Step a: A developer creates or updates the configuration profile and feature flag as JSON files and checks the change in to the source code repository.
  • Step b: The check-in triggers the CI pipeline, which creates or updates the feature flag for the given configuration profile. For example, the pipeline illustrated in the above diagram created a configuration profile in the AWS AppConfig service named TestOptionsFeatureFlag with a feature flag named pcrWalkin.
  • Step c: We now deploy the configuration profile (TestOptionsFeatureFlag) with its feature flags to a specific environment created within AWS AppConfig. In the above example, the pipeline deploys the changes to the test-1 environment using AppConfig.Canary10Percent20Minutes as the chosen deployment strategy. This strategy processes the deployment exponentially over 20 minutes, followed by a 10-minute bake time during which the associated CloudWatch alarms are still monitored. If an alarm is triggered during this time, AppConfig rolls back the deployment.
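To get a feel for what "exponentially" means in step c, the sketch below computes the cumulative share of targets at each step. It assumes the growth formula G*(2^N) that AWS documents for exponential strategies, with G as the growth factor and N as the step number, and it assumes the final step is capped at 100%. How those steps are spread across the 20 minutes is not modelled here.

```python
from typing import List


def rollout_percentages(growth_factor: float) -> List[float]:
    """Cumulative percentage of targets reached after each exponential step."""
    if growth_factor <= 0:
        raise ValueError("growth_factor must be positive")
    steps: List[float] = []
    n = 0
    while True:
        percentage = growth_factor * (2 ** n)
        if percentage >= 100:
            # Assumption: the last step simply covers all remaining targets.
            steps.append(100)
            return steps
        steps.append(percentage)
        n += 1


# A 10% growth factor, as in AppConfig.Canary10Percent20Minutes.
print(rollout_percentages(10))
```

With a growth factor of 10, the rollout reaches 10%, 20%, 40%, 80% and finally 100% of targets, so most traffic sees the change only after the earlier, smaller steps have passed without an alarm.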

As soon as the deployment is complete, the new or updated feature flags are available for the lambda service to use.
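Steps b and c could be scripted in the pipeline roughly as follows. This is a sketch, not the pipeline used for this design: it assumes the application, environment, and configuration profile already exist and that their IDs are supplied by the pipeline, and it takes the boto3 AppConfig client as a parameter so that it can be exercised without AWS access. The flag document follows the hosted feature flag format, with `pcrWalkin` switched on as an example.

```python
import json

# Example flag document for the TestOptionsFeatureFlag profile.
FLAG_DOCUMENT = {
    "version": "1",
    "flags": {"pcrWalkin": {"name": "pcrWalkin"}},
    "values": {"pcrWalkin": {"enabled": True}},
}


def publish_and_deploy(
    appconfig,
    application_id: str,
    environment_id: str,
    profile_id: str,
    document: dict,
    strategy_id: str = "AppConfig.Canary10Percent20Minutes",
) -> int:
    """Store a new hosted version of the flags, then start deploying it."""
    # Step b: create a new version of the feature flag configuration.
    version = appconfig.create_hosted_configuration_version(
        ApplicationId=application_id,
        ConfigurationProfileId=profile_id,
        Content=json.dumps(document).encode("utf-8"),
        ContentType="application/json",
    )
    # Step c: roll that version out to the environment with the chosen strategy.
    deployment = appconfig.start_deployment(
        ApplicationId=application_id,
        EnvironmentId=environment_id,
        DeploymentStrategyId=strategy_id,
        ConfigurationProfileId=profile_id,
        ConfigurationVersion=str(version["VersionNumber"]),
    )
    return deployment["DeploymentNumber"]
```

In a real pipeline the first argument would be `boto3.client("appconfig")`, and the three IDs would come from the infrastructure-as-code outputs or from lookups by name.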

Now looking at the citizen user journey as per the above diagram (figure1):

  • Step 1: Citizens can access the website to fill in various details like Symptoms etc.
  • Step 2: Before the Testing options page loads, the controller invokes a backend API (GET /covid/test-options) to retrieve the list of testing options. The API gateway proxies the request to the reference-data lambda.
  • Step 3: The reference-data lambda retrieves the /test-options enum, which includes the value pcr-test-centre (refer step3) as part of the newly developed feature to support the Walkin PCR Test.
    We want the ability to dynamically enable/disable this feature without requiring application code deployment as per requirements.
  • Step 4: The response object from the enum is converted into a JSON object and wrapped in conditional logic that uses an externalized boolean flag, pcrWalkin, defined within the AWS AppConfig service. The reference-data lambda has the AWS-AppConfig-Extension layer attached to it. This lambda layer has built-in functionality to fetch and cache configuration and feature flag data from AWS AppConfig automatically, at an interval controlled by the lambda environment variable AWS_APPCONFIG_EXTENSION_POLL_INTERVAL_SECONDS, which defaults to 45. Thanks to the caching provided by the layer, the lambda can always retrieve the feature flag value from the cache with a local call (hence fast and at no additional cost).
    Let’s take a stab at understanding the pseudo-code in step 4 of the design, which fetches the value from the local cache provided by the lambda layer. The lambda layer keeps this cache in sync with the AppConfig configuration through a background process. So whenever a feature flag is updated in AWS AppConfig, the lambda layer asynchronously updates the local cache of each warm lambda instance.
    response = enumToJson(testOptions);
    // getConfigFromLocalCache() makes a local call to an HTTP endpoint
    // to retrieve the cached value of the AppConfig feature flags, where
    //   application_name   = serverless-demo
    //   environment_name   = test-1 (AppConfig environment)
    //   configuration_name = TestOptionsFeatureFlag (AppConfig configuration name)
    //   flag_name          = pcrWalkin (flag name defined within AppConfig)
    configData = getConfigFromLocalCache("http://localhost:2772/applications/serverless-demo/environments/test-1/configurations/TestOptionsFeatureFlag");
    if (configData.pcrWalkin.enabled == false) {
        response.remove("pcr-test-centre");
    }
    return response;
  • When pcrWalkin is false (disabled), the pcr-test-centre value is filtered out of the response payload.
  • The Front end Controller receives the response from endpoint /covid/test-options and displays the available list for users to select, controlled dynamically via the feature flag managed within the AWS AppConfig service.
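For readers who prefer something runnable over pseudo-code, here is one way step 4 might look as a Python Lambda handler. The local URL is the one from the pseudo-code. The option names other than pcr-test-centre are hypothetical, and treating a missing flag as disabled is a design choice of this sketch rather than something the design prescribes.

```python
import json
import urllib.request
from typing import Callable, Dict, List

# Local endpoint exposed by the AWS-AppConfig-Extension layer.
FLAGS_URL = (
    "http://localhost:2772/applications/serverless-demo"
    "/environments/test-1/configurations/TestOptionsFeatureFlag"
)

# Hypothetical stand-in for the /test-options enum.
TEST_OPTIONS = ["pcr-home", "lfd-home", "pcr-test-centre"]


def fetch_flags(url: str = FLAGS_URL) -> Dict:
    """Read the cached feature flags from the extension's local endpoint."""
    with urllib.request.urlopen(url, timeout=2) as response:
        return json.loads(response.read())


def get_test_options(fetch: Callable[[], Dict] = fetch_flags) -> List[str]:
    """Return the test options, hiding pcr-test-centre when the flag is off."""
    options = list(TEST_OPTIONS)
    flags = fetch()
    # Assumption: a missing flag is treated the same as a disabled one.
    if not flags.get("pcrWalkin", {}).get("enabled", False):
        options.remove("pcr-test-centre")
    return options


def handler(event, context):
    """API Gateway proxy handler for GET /covid/test-options."""
    return {"statusCode": 200, "body": json.dumps(get_test_options())}
```

Passing the fetch function in as a parameter keeps the toggle logic testable without the extension running, for example `get_test_options(lambda: {"pcrWalkin": {"enabled": False}})`.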

If you haven’t seen AppConfig in action I would recommend going through this blog article for a quick hands-on example to get a feel of the service and how easy it is with the lambda extension layer.

V. 3 Key Takeaways

  1. “To a man with a hammer, everything looks like a nail”. Just because we have the means to make things dynamic doesn’t mean every configuration and feature in our lambda should be made dynamic. Feature toggles come with an added cost, so their use should be decided on a case-by-case basis. Consider putting an upper limit on the total number of feature toggles allowed in a service or application. This constraint ensures a team remains proactive in removing old release toggles and regularly revisits the usefulness of persistent toggles, like the ones required to support Operations.
  2. Don’t reinvent the wheel. Use the AWS-AppConfig-Extension layer with the lambda to offload the integration with AppConfig and the caching to the lambda layer. That means less code and complexity for the team to manage.
  3. The ability to switch a feature on/off or use a specific configuration without application code deployment allows automating testing of both active and dormant functionality. Use AWS AppConfig SDK as part of the automation or regression script to test all variants of features supported by Feature Toggles.
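As a sketch of takeaway 3, a regression script could read the currently deployed flag through the AppConfig data-plane SDK and derive the response it expects from the API. The boto3 `appconfigdata` client is passed in as a parameter so that the logic can be tried with a stub. The option names other than pcr-test-centre are hypothetical.

```python
import json
from typing import List


def read_flag_enabled(client, application: str, environment: str,
                      profile: str, flag: str) -> bool:
    """Fetch the deployed configuration once and return the flag's state."""
    session = client.start_configuration_session(
        ApplicationIdentifier=application,
        EnvironmentIdentifier=environment,
        ConfigurationProfileIdentifier=profile,
    )
    latest = client.get_latest_configuration(
        ConfigurationToken=session["InitialConfigurationToken"]
    )
    flags = json.loads(latest["Configuration"].read())
    return bool(flags.get(flag, {}).get("enabled", False))


def expected_test_options(pcr_walkin_enabled: bool) -> List[str]:
    """Options the API should return for a given state of pcrWalkin."""
    options = ["pcr-home", "lfd-home"]  # hypothetical always-on options
    if pcr_walkin_enabled:
        options.append("pcr-test-centre")
    return options
```

The script would then call GET /covid/test-options and compare the payload with `expected_test_options(...)`, once with the flag enabled and once with it disabled, using `boto3.client("appconfigdata")` as the client.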

At first glance, this may look like quite a bit of work: we have replaced one lambda environment variable with an additional AWS service and a CI/CD pipeline. The key thing to remember is that, at scale, this design lets us centralize and manage feature flags from a single place, without worrying about maintaining flags spread across various services. It allows us to be more agile by switching configurations safely and quickly.

We now understand both approaches to feature toggles: the classic (static) approach and the native (dynamic) approach using AWS AppConfig. In the next blog, we will up the game by introducing more evolved use cases and looking at how we can best use AWS AppConfig to support more complex real-life scenarios.

Anuj Kothiyal

Lead Digital Architect, Agile practitioner, Mentor and Fitness Enthusiast.