Serverless Diary: How to Increase Agility with Feature Toggles — Classical Approach
I. Introduction
“Iterate Fast and Release Often”
Being Agile means many things, and one of the critical concepts in the context of software delivery is the ability to release your software frequently into production. The DevOps practice of “Continuous Delivery” provides a means of achieving this. Continuous delivery is a well-known concept in the modern public cloud era. The availability of native tools and services like AWS CodePipeline provides an easy technical means of delivering software into production as the code is being built and tested.
If we have the technical means of deploying our code to production, what is the main challenge in being agile and sticking to a release cadence or continuous delivery model?
II. The Challenge
Let’s consider a scenario that will resonate with many of us working in an Agile delivery model. Regardless of which Agile methodology (Scrum, Kanban, or others) an organization follows, one of the desired outcomes is delivering features into production frequently, and preferably at a regular cadence, with more frequent releases being the obvious gold standard.
Let us look at a few common scenarios while working across multiple teams and value streams:
- A team fails to complete the feature in a given sprint or before the expected release cadence.
- The team completed the feature ahead of time, but due to business or policy guidance, it mustn’t go live before a specific date.
- The team completed the feature, but the downstream or third-party systems are not ready or have a different release cadence.
- The team completed and deployed the feature as required, but an edge case scenario broke the downstream system, needing rollback by that system.
There are plenty of scenarios, but you can see a common theme emerging here. The team needs a mechanism to continuously deliver work and be prepared to handle uncertainty and impact from both within and outside the team to manage changes to live services in production.
III. The Solution
One way of dealing with the above challenges is to be reactive and approach each scenario differently. Using long-lived feature branches and release management dependencies allows a team to address the challenges mentioned in the previous section. It is a common approach I have observed in a few projects. Scenarios like the fourth one above may even require a rollback (another application code release) by all impacted systems. Not very agile.
The Feature Toggle approach fits well into our software lifecycle to address the challenges discussed. It acts as an enabler for feature teams to be Agile by allowing them to continuously check in code and release it into production, hiding the “Partial” or “Not Ready” feature behind a simple “if” statement. Nothing fancy about it.
if (isFeatureXReady) { renderFeatureX(); }
Enabling or disabling a feature doesn’t require an application code change and allows more controlled and safe experimentation in production environments.
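As a minimal sketch of that idea in Python, assume the toggle is supplied through an environment variable. The variable name FEATURE_X_READY, the accepted "truthy" strings, and both helper names are illustrative assumptions, not part of any particular framework:

```python
import os
from typing import List


def is_enabled(flag_name: str, default: bool = False) -> bool:
    """Read a boolean toggle from the environment (hypothetical convention)."""
    raw = os.environ.get(flag_name)
    if raw is None:
        # An unset flag falls back to the default, keeping the feature hidden.
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def render_page() -> List[str]:
    """Return the sections to render; feature X stays hidden behind the toggle."""
    sections = ["header", "existing-content"]
    if is_enabled("FEATURE_X_READY"):
        sections.append("feature-x")
    return sections
```

Because the decision is read from configuration rather than hard-coded, the unfinished feature can ship to production in a dormant state and be switched on later without touching the application code.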
This blog will introduce the classical feature toggle approach in serverless architecture design. The subsequent blogs will build on the existing use case and introduce AWS AppConfig and how it helps accelerate working in an agile environment.
IV. The Build
Our fight with COVID-19 continues with the shift from the pandemic to the endemic phase, and we still need to be prepared for all eventualities. With that in mind, consider a hypothetical possibility.
“As an Administrator managing a citizen-facing COVID testing website, I want the flexibility to offer PCR Test Centre appointments for a limited time as a contingency measure on top of the always-available PCR and LFD home tests.”
Let us look at our AS-IS design and understand the existing landscape before we iterate and extend functionality.
As per Figure 1, we can observe the following:
- Citizens can access a website to fill in various details, such as symptoms.
- Various testing options are displayed to select from: ordering LFD or PCR home test kits. The list is dynamic and fetched from the backend reference data service via API Gateway configured with a Lambda authoriser (a standard serverless API design pattern).
- After selecting one of the testing options, the citizen fills in any detail specific to the testing type and other things like delivery address, etc.
- The citizen submits the form to confirm the order. The front-end code calls the appropriate endpoint based on the user's selection, and API Gateway routes the order submission payload to the relevant Lambda.
Now let’s look at the next version of the design, which uses the classic serverless approach with AWS Lambda to add a new conditional feature.
As per Figure 2, we can observe the following developments since the AS-IS design:
- Before the testing options page loads, the controller invokes a backend API (GET /covid/test-options) to retrieve the list of testing options.
- API Gateway proxies the request to the reference-data Lambda, which retrieves the /test-options enum. The enum has a new value, “pcr-test-centre” (refer to step 3), as part of a newly developed feature to support walk-in PCR tests.
- We want the ability to enable or disable this feature in production in line with current COVID testing demand and government policies. Hence the response built from the enum is converted into a JSON object and wrapped in conditional logic, which is controlled by an externalized boolean flag, “isPCRWalkinDisabled” (a Lambda environment variable).
- When isPCRWalkinDisabled is true, the Lambda filters the pcr-test-centre value out of the response payload (refer to step 4).
- The front-end controller receives the response from the endpoint /covid/test-options and displays the available list for users to select from (refer to steps 5 and 6).
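The steps above can be sketched as a small Python Lambda handler. This is only an illustration under stated assumptions: the option names "lfd-home-test" and "pcr-home-test", the "testOptions" response key, and the choice to treat a missing variable as "disabled" are all hypothetical; only “pcr-test-centre” and isPCRWalkinDisabled come from the design described here.

```python
import json
import os

# Hypothetical stand-in for the /test-options enum held by the reference-data Lambda.
TEST_OPTIONS = ["lfd-home-test", "pcr-home-test", "pcr-test-centre"]


def is_pcr_walkin_disabled() -> bool:
    # Assumption: a missing variable means "disabled", so the feature stays dormant.
    return os.environ.get("isPCRWalkinDisabled", "true").strip().lower() == "true"


def lambda_handler(event, context):
    """Return the test options, hiding the walk-in PCR option when the flag is set."""
    options = list(TEST_OPTIONS)
    if is_pcr_walkin_disabled():
        options = [option for option in options if option != "pcr-test-centre"]
    # Response shape expected by an API Gateway Lambda proxy integration.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"testOptions": options}),
    }
```

The front end needs no knowledge of the flag; it simply renders whatever list the endpoint returns.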
The team can now develop and release tickets and stories into production behind the static feature flag, with isPCRWalkinDisabled set to true. This feature can remain dormant in production while developers progress onto new features without requiring a long-lived feature branch to manage the project and release dependencies. When the website administrator wishes to activate or deactivate the walk-in PCR test feature, the Lambda environment variable is updated accordingly and the function is re-deployed in production. The website administrator can now offer walk-in PCR tests for a given period, such as in winter when demand is high, and remove the option for other periods.
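One way the flag update might be scripted is sketched below, assuming the boto3 Lambda client is used; the post itself does not prescribe a tool, and an infrastructure-as-code pipeline would work just as well. The helper name and the function name in the usage comment are hypothetical. Since update_function_configuration replaces the whole set of environment variables, the sketch reads the current values first and merges the flag into them.

```python
def set_pcr_walkin_disabled(lambda_client, function_name: str, disabled: bool) -> dict:
    """Flip isPCRWalkinDisabled on a Lambda function, preserving its other variables."""
    current = lambda_client.get_function_configuration(FunctionName=function_name)
    # A function without environment variables has no "Environment" key in the response.
    variables = dict(current.get("Environment", {}).get("Variables", {}))
    variables["isPCRWalkinDisabled"] = "true" if disabled else "false"
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Environment={"Variables": variables},
    )
    return variables


# Usage (assumes boto3 is installed and AWS credentials are configured;
# "reference-data" is a hypothetical function name):
#   import boto3
#   set_pcr_walkin_disabled(boto3.client("lambda"), "reference-data", disabled=False)
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before pointing it at a real account.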
V. Summary
Summarising the pros and cons of implementing a feature flag using the “Classic Approach” with Lambdas:
Feature toggles are fantastic as they provide an agile and safe environment to perform fail-fast experiments, manage the rollout of complex epics, and mitigate the problems associated with release deployment and dependencies. But, as with everything in life, this comes at a price. They are costly because, over time, they increase code complexity, add technical debt, and create more room for bugs. Hence, paying off that technical debt should not be neglected, so that the organization can reap maximum benefits across all phases of the software lifecycle.