Serverless Diary: 3 Expert Tips for Designing a Distributed Logging System

Original Photo by Markus Spiske from Pexels

Introduction

“Without you, logging, I will be lost in this distributed universe”
- love, serverless architecture

Serverless architectures are the current trend, pushing to abstract application design and development away from infrastructure concerns. I have touched upon the numerous benefits of serverless designs in my previous posts. But as with all good things in life, serverless comes with its own set of challenges, and logging in the serverless world is one of the more complex ones. This article provides guidelines and an efficient approach to distributed logging for serverless architectures.

3 Reasons — Why logging

  1. Many enterprises have adopted a multi-cloud strategy, which allows them to overcome the limits on redundancy, scale, cost, and features of any single cloud provider. Running in multiple clouds requires centralized visibility and control, and a distributed logging design using serverless components across the various public clouds is fundamental to achieving that centralized view.
  2. Logs are one of the 3 pillars of observability (the other two being metrics and traces). Good logging design is key to making your systems more observable.

3 tiers - Log Management Infrastructure Architecture

  1. Log Generation
    This tier contains the serverless components that generate the logs. Examples: AWS Lambda, API Gateway, CloudTrail, etc.
  2. Log Analysis and Storage
    This tier contains the serverless components responsible for receiving log data, transforming it where required, and forwarding it in near real time to a storage service. Examples: AWS CloudWatch, Kinesis Data Firehose, and an S3 bucket. (A sketch of how this tier can be wired up follows this list.)
  3. Log Monitoring
    This tier provides visualization tools to monitor and review log data and the results of automated analysis. For this tier, many enterprises prefer 3rd party tools like Alert Logic, Splunk, or Kibana. The ELK stack is a good alternative to Splunk if you are using AWS as your only cloud provider.
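For concreteness, here is a minimal sketch (Python with boto3) of how tier 1 and tier 2 can be stitched together: a CloudWatch log group is subscribed to a Kinesis Data Firehose delivery stream that delivers to S3. The log group name, stream ARN, and IAM role ARN are placeholders for whatever exists in your account, not values prescribed by this article.

```python
# Sketch: stream an existing CloudWatch log group into Kinesis Data Firehose.
# All names and ARNs below are hypothetical placeholders.
import boto3

logs = boto3.client("logs", region_name="us-east-1")

LOG_GROUP = "/aws/lambda/orders-service"  # tier 1: log generation
DELIVERY_STREAM_ARN = (
    "arn:aws:firehose:us-east-1:123456789012:deliverystream/central-logs"
)
CW_TO_FIREHOSE_ROLE_ARN = (
    "arn:aws:iam::123456789012:role/cwlogs-to-firehose"  # allows firehose:PutRecord*
)

# The subscription filter forwards every new log event (empty pattern) to
# Firehose, which buffers the data and writes it to the S3 bucket configured
# on the delivery stream.
logs.put_subscription_filter(
    logGroupName=LOG_GROUP,
    filterName="central-logging",
    filterPattern="",  # empty pattern = forward all events
    destinationArn=DELIVERY_STREAM_ARN,
    roleArn=CW_TO_FIREHOSE_ROLE_ARN,
)
```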

3 guidelines - What to log

  1. Any Failures
    These typically include application and system errors (syntax and runtime errors, connectivity and performance issues), input/output validation failures (protocol violations, invalid parameters), and authentication and authorization failures.
  2. Selected Successful Events
    You want to be careful here, as this is the area where it is easy to log more than what is required. Depending upon the business and security requirements, you may want to log data like authentication successes (see the sketch after this list).
  3. Statutory or regulatory activities
    These must be identified and be proportionate to the business and security risks and threats. Example: access to a restricted system or functionality.
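To make these categories concrete, here is a minimal structured-logging sketch for a Python Lambda handler. The helper name, field names, and event shape are illustrative assumptions only; anything printed by a Lambda function ends up in CloudWatch Logs.

```python
# Sketch: emit failures and selected successes as structured JSON log lines.
# Field names and the event shape are illustrative, not prescribed.
import json
import time


def log_event(level, event_type, **details):
    """Write one structured JSON log line (hypothetical helper)."""
    print(json.dumps({
        "timestamp": time.time(),
        "level": level,
        "event_type": event_type,
        **details,
    }))


def handler(event, context):
    user = event.get("user", "anonymous")
    try:
        if not event.get("token"):
            # Authentication/authorization failure -> always log
            log_event("ERROR", "auth_failure", user=user, reason="missing token")
            return {"statusCode": 401}

        # Selected successful event -> log only what the business/security
        # requirements actually need (e.g. authentication success).
        log_event("INFO", "auth_success", user=user)
        return {"statusCode": 200}
    except Exception as exc:
        # Application/system error -> log the failure with context
        log_event("ERROR", "unhandled_error", user=user, error=str(exc))
        raise
```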

3’s in action

3 tier logging architecture

Steps

  1. Several AWS services can publish logs directly to CloudWatch. For the others, AWS CloudTrail collates and stores Application Programming Interface (API) activity from AWS services within its scope and then forwards that API data to CloudWatch, together with other metrics.
  2. AWS CloudWatch streams the log data into Kinesis Data Firehose. Metrics are also made available from the same CloudWatch data.
  3. Kinesis Data Firehose then pushes the data into an S3 bucket. For old data required only for compliance purposes, it is recommended to use an S3 lifecycle policy to change the storage class to Glacier.
    If you are using a 3rd party product like Splunk, a Lambda function configured with Firehose can wrap each record as a Splunk HEC event in JSON format and push it to Splunk (a sketch of such a transformation function follows these steps).
  4. 3rd party products can now access the AWS data via an IAM role; for example, Grafana can read CloudWatch metrics and Alert Logic can read the data in S3.
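If you take the Splunk route in step 3, the Firehose transformation Lambda could look roughly like the sketch below. The sourcetype and the exact HEC fields are assumptions to adjust to your Splunk setup, not a verified configuration.

```python
# Sketch: Kinesis Data Firehose transformation Lambda that re-wraps CloudWatch
# Logs records as Splunk HEC events in JSON. Sourcetype/field values are placeholders.
import base64
import gzip
import json


def handler(event, context):
    output = []
    for record in event["records"]:
        # Records arriving from a CloudWatch Logs subscription are gzip-compressed.
        payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))

        hec_events = [
            {
                "time": log_event["timestamp"] / 1000.0,  # CloudWatch ms -> seconds
                "sourcetype": "aws:cloudwatch",           # placeholder sourcetype
                "source": payload.get("logGroup", "unknown"),
                "event": log_event["message"],
            }
            for log_event in payload.get("logEvents", [])
        ]

        # Firehose expects every recordId back, with the transformed data re-encoded.
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                "".join(json.dumps(e) for e in hec_events).encode("utf-8")
            ).decode("utf-8"),
        })

    return {"records": output}
```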

The above is just one of several ways of making AWS logs and metrics available for monitoring. As stated in my earlier blog on integration, choosing an appropriate integration style should ideally be driven by the existing solution landscape.

3 takeaways

  1. Write Informative Structured Logs
    Write your logs as structured JSON from the start. CloudWatch Logs (and other 3rd party tools) understands and parses JSON, which gives you a lot of power to filter and analyze the logs.
  2. Be mindful of excessive logging
    Serverless architectures are prone to creeping costs around logging and storage, so don't blindly log everything; it all adds up quickly when your production load grows suddenly or over time. For example, data ingestion into CloudWatch costs $0.50 per GB, so a busy system generating 10 GB of logs per day is already looking at a monthly cost of around $150 from data ingestion alone. It is also a good idea to archive old logs to cheaper storage like S3 Glacier Deep Archive to reduce costs (a lifecycle-rule sketch follows this list).
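For the archiving point in the second takeaway, a lifecycle rule along the following lines moves older log objects to the Glacier Deep Archive storage class. The bucket name, prefix, and 90-day threshold are illustrative assumptions; pick values that match your retention requirements.

```python
# Sketch: transition old log objects to Glacier Deep Archive to cut storage cost.
# Bucket name, prefix, and the 90-day threshold are illustrative values.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="central-logs-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 90, "StorageClass": "DEEP_ARCHIVE"}
                ],
                # Optionally add an "Expiration" block to delete objects
                # entirely once the retention period ends.
            }
        ]
    },
)
```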

