Stop Drawing Your Infrastructure: The Shift to State-Driven Diagrams

April 22, 2026 · One min read

Originally published at HackerNoon

Engineering Resilience: A Deep Dive into Chaos Engineering in Distributed Systems

April 7, 2026 · One min read

Originally published at HackerNoon

Kill Heroes, Build Systems and Processes

March 9, 2026 · One min read

Originally published at HackerNoon

Stop Throwing AI at Broken Systems: Fix Your Engineering Culture First

February 20, 2026 · One min read

Originally published at HackerNoon

Use the 4-C Framework to Build Observability in Cloud Native Environments

January 22, 2025 · One min read

Originally published at HackerNoon

Accelerating cloud migration from console to IAC tools using “Terraform Import”

July 12, 2024 · 5 min read

With advantages such as reduced upfront cost and little to no maintenance cost, organizations both large and small are moving to the cloud for their storage, compute, and sometimes their complete end-to-end business operations.

These different stages are based on the cloud maturity of the organization, which is divided into five progressive stages (although stage 4 is quite new). To simplify:

Stage 0: When all data storage, web/data hosting, and operations are done entirely on legacy/local systems.

Stage 1: When the company keeps all its data storage, web/data hosting, and most of its operations on legacy systems, and only a few costly operations are moved to the cloud.

Stage 2: When the company chooses to migrate certain parts, which might be operations, storage, or compute needs, to the cloud, and the rest are served from legacy back-end systems.

Stage 3: When the company has completed its migration and has its entire business in the cloud.

Stage 4: When the company has created a multi-cloud deployment in either active-active or active-passive mode.

What is Infrastructure as Code (IAC)?

When a company is in the initial stages of cloud adoption, such as stage 1 and stage 2, its cloud infrastructure can be built entirely from the console. But as the infrastructure scales in stage 3 and stage 4, the console is no longer practical: there is simply too much to create, maintain, and update by hand. To reduce this problem, we have “Infrastructure as Code (IAC)” tools such as Terraform and Ansible. (Terraform is better suited to provisioning infrastructure, while Ansible is better suited to configuring infrastructure that has already been provisioned, i.e. configuration management.)

Infrastructure as Code is a concept in which provisioning, maintaining, and managing infrastructure (networks, virtual machines, load balancers, and connection topology) is done through machine-readable definition files (often written in YAML or JSON, or in HCL in Terraform's case), rather than through physical hardware configuration or interactive configuration tools.

Implementing IAC tools within an existing development environment can be daunting, as the resources and knowledge needed to implement them may be scarce at first. To overcome this problem, we are going to talk about a method that eases your Terraforming journey: a command within the Terraform environment called terraform import. In this post, for the sake of demonstration, we will be using AWS (as it has the largest share among cloud providers).

What is terraform import and how can it help us?

Before we explore the implementation, let's try to understand what this command is and what it does.

Terraform import is one of the commands within the Terraform environment. It imports the configuration settings, preferred storage selections, and security configuration of our already created resources into the Terraform state file. (The state file is like the treasurer of the Terraform environment: it keeps track of which resources are up and running and which have been destroyed.)

Importing a resource into the state file helps us understand:

How our preferences in the console translate into Terraform syntax
Which roles or policies a resource needs in order to work
How to create multiple resources by leveraging the existing roles and policies within the cloud environment

To understand this better, let's implement an example in an AWS environment.

Terraform import implementation in AWS

For the example, let us create a t2.micro EC2 instance:

EC2 instance:

Let's first create an EC2 instance in the console and later use terraform import to import the resource state into the Terraform state file.

Launching EC2 from console:

First, log into the AWS console and type EC2 in the search bar.
Create an EC2 instance with the t2.micro instance type and an Ubuntu image.
After a couple of minutes the EC2 instance is created along with its IAM role, and its status changes to running.
Now, in order to import it, let's open our favorite IDE (which in my case is VS Code).
Before we can use Terraform with AWS, we need to configure the AWS CLI and the Terraform CLI (please refer to this link to do it: https://alexander.holbreich.org/2019-terraforming-aws/).
Once AWS and Terraform are configured, create an empty resource block in VS Code, for example resource "aws_instance" "test" {} in a .tf file.
Initialize Terraform within the directory using terraform init.
Then type terraform import aws_instance.test i-0cc2507156b9510c1 (the general form is “terraform import aws_instance.<resource name> <instance ID>”) and press enter. Once the resource has been imported, Terraform prints a confirmation message.
You can view the imported resource in your local Terraform state file using terraform state show aws_instance.test (the general form is “terraform state show aws_instance.<resource name>”).
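The general form of the two commands in the steps above can be captured in a small helper. This is only a sketch: the function names are hypothetical, the helpers just assemble the command-line arguments (they do not run Terraform), and the resource name and instance ID are the ones from this walkthrough.

```python
import re
from typing import List

# Terraform names start with a letter or underscore and may contain
# letters, digits, underscores, and hyphens.
_NAME_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9_-]*$")


def resource_address(resource_type: str, name: str) -> str:
    """Return a Terraform resource address such as 'aws_instance.test'."""
    for part in (resource_type, name):
        if not _NAME_PATTERN.match(part):
            raise ValueError(f"invalid Terraform identifier: {part!r}")
    return f"{resource_type}.{name}"


def build_import_command(resource_type: str, name: str, resource_id: str) -> List[str]:
    """Arguments for: terraform import <type>.<name> <id>"""
    return ["terraform", "import", resource_address(resource_type, name), resource_id]


def build_state_show_command(resource_type: str, name: str) -> List[str]:
    """Arguments for: terraform state show <type>.<name>"""
    return ["terraform", "state", "show", resource_address(resource_type, name)]


if __name__ == "__main__":
    print(" ".join(build_import_command("aws_instance", "test", "i-0cc2507156b9510c1")))
    print(" ".join(build_state_show_command("aws_instance", "test")))
```

Building the argument list separately from running it makes it easy to review the exact commands before importing several resources in a row.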

Hope this tip accelerates your AWS journey and helps you build amazing infrastructure. Happy building!

If you would like to connect with me, you can follow my blog here, or find me on LinkedIn or GitHub.

Product-Management in Agile Projects: Addressing Technical Debt in DevOps Projects

July 12, 2024 · 5 min read

While developing products in DevOps teams, we make decisions about which features to develop and how to ship them quickly in order to meet customer requirements. Often these decisions cause more problems in the long run. These kinds of decisions lead to “Technical Debt”.

Tech debt is a phenomenon that arises when we prioritise speed of delivery now at the expense of things like code quality or maintainability. Although delivering products quickly is key to staying relevant in this agile world, we also have to make sure that the changes are sustainable.

In this article, we’ll talk about what technical debt is, how to handle quick decisions during development, and give examples to help you understand how to avoid future issues.

Tech debt is the extra work that has to be done later because of the technical decisions we make now. Although the term was coined by software developer Ward Cunningham in 1992, it still holds relevance.

Usually, technical debt occurs when teams rush to push new features within deadlines, writing code without thinking about other considerations such as security and extensibility. Over time the tech debt increases and becomes difficult to manage. The only way to deal with it then is to overhaul the entire system and rewrite everything from scratch. To prevent this scenario, we need to continuously groom the tech debt, and to do that we need to understand the type of tech debt we are dealing with.

Causes of Tech Debts:

Prudent and deliberate: Opting for swift shipping and deferring consequences signifies deliberate debt. This approach is favoured when the product’s significance is relatively low, and the benefits of quick delivery outweigh potential risks.

Reckless and deliberate: Despite knowing how to craft superior code, prioritising rapid delivery over quality leads to reckless and deliberate debt.

Prudent and inadvertent: Prudent and inadvertent debt occurs when there’s a commitment to producing top-tier code, but a superior solution is discovered post-implementation.

Reckless and inadvertent: Reckless and inadvertent debt arises when a team strives for excellence in code without possessing the necessary expertise. Often, the team remains unaware of the mistakes they’re making.

Given these different causes of tech debt, let's try to understand the types of tech debt. These can be broadly categorised under three main heads.

Types of Tech Debts:

Code Debt: When we talk about tech debt, code debt is the first thing that comes to mind. It is due to bad coding practices, not following proper coding standards, insufficient code documentation, etc. This type of debt causes problems in terms of maintainability, extensibility, security, etc.
Testing Debt: This occurs when the overall testing strategy is inadequate, which includes the absence of unit tests, integration tests, and adequate test coverage. This kind of debt causes us to lose confidence in pushing new code changes and increases the risk of defects and bugs surfacing in production, potentially leading to system failures and customer dissatisfaction.
Documentation Debt: This manifests when documentation is either insufficient or outdated. It poses challenges for both new and existing team members in comprehending the system and the rationale behind certain decisions, thereby impeding efficiency in maintenance and development efforts.

Architecture Debt:

Design Debt: This results from flawed or outdated software architecture or design choices. It includes overly complex designs, improper use of patterns, and a lack of modularity. Design debt creates obstacles to scalability and the smooth incorporation of new features.
Infrastructure Debt: This is linked to the operational environment of the software, encompassing issues such as outdated servers, inadequate deployment practices, or the absence of comprehensive disaster recovery plans. Infrastructure debt can result in performance bottlenecks and increased periods of downtime.
Dependency Debt: This arises from reliance on outdated or unsupported third-party libraries, frameworks, or tools. Such dependency exposes the software to potential security vulnerabilities and integration complexities.

People/Management Debt:

Process Debt: This relates to inefficient or outdated development processes and methodologies. It includes poor communication practices, a lack of adoption of agile methodologies, and a lack of robust collaboration tools. Additionally, not automating the process can greatly affect the software delivery’s agility.
People/Technical Skills Debt: This occurs when the team lacks essential skills or knowledge, resulting in the implementation of sub-optimal solutions. Investing in training and development initiatives can help reduce this type of debt.

Managing and Prioritising Tech Debt

Technical debt is something that happens when teams develop products in an agile way. It's like borrowing against the future by taking shortcuts now. But if the team knows about this debt and has a plan to deal with it later, it can actually help prioritise tasks. Whether the debt was intentional or not, it is crucial that the team grooms the technical debt during a backlog refinement session.

Value to customer vs Cost of solving it

Do It Right Away: These tasks are crucial for the product’s smooth operation.
A Worthy Investment: These tasks contribute to the product’s long-term health, such as upgrading outdated systems.
Quick and Easy Wins: These are minor tasks that can be fixed easily. They’re great for familiarising new team members with the product.
Not Worth Considering: Sometimes, the problem might solve itself or it might not be worth the time and effort to fix, especially if a system upgrade or retirement is planned.
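One way to read the four buckets above is as a two-by-two grid of customer value against cost of solving. The sketch below assumes a particular mapping (high value and low cost lands in "Do It Right Away", and so on). That mapping is my reading rather than something fixed, and the 1-to-10 scores, the threshold of 5, and the function name are hypothetical.

```python
def prioritise(value: int, cost: int, threshold: int = 5) -> str:
    """Place a tech-debt item in one of the four buckets.

    value and cost are hypothetical 1-to-10 scores agreed on by the team
    during backlog refinement; anything above the threshold counts as "high".
    """
    high_value = value > threshold
    high_cost = cost > threshold
    if high_value and not high_cost:
        return "Do It Right Away"
    if high_value and high_cost:
        return "A Worthy Investment"
    if not high_value and not high_cost:
        return "Quick and Easy Wins"
    return "Not Worth Considering"


if __name__ == "__main__":
    backlog = {
        "patch failing login flow": (9, 2),
        "upgrade outdated build system": (8, 8),
        "rename confusing variable": (2, 1),
        "rewrite module due for retirement": (2, 9),
    }
    for item, (value, cost) in backlog.items():
        print(f"{item}: {prioritise(value, cost)}")
```

Scoring every item the same way keeps the refinement discussion about the scores themselves rather than about which bucket an item "feels" like.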

While facing deadlines and working on new products, it’s easy to overlook accumulating technical debts. But if left unchecked, these debts can cause long-term problems. It’s key to balance the need for quick solutions with the importance of long-term stability.

While fast delivery and continuous improvement are central to agile development, it’s important to be mindful of accruing technical debts. Effectively managing technical debt can help ensure your projects’ long-term success.

Liked my content? Feel free to reach out to me on LinkedIn for interesting content and productive discussions.

Developing Real-time log monitoring and email alerting with Server-less Architecture using Terraform

July 12, 2024 · 6 min read

Why Log Monitoring?

Let's say that you have built an app (here we are building an app based on a micro-service architecture) using a containerized solution in EKS (Elastic Kubernetes Service), or are running a standalone app on an EC2 (Elastic Compute Cloud) instance. To monitor this app, we send the application logs to CloudWatch Logs. But having to keep a constant eye on this log group is tiresome and sometimes technically challenging, as there are a hundred other micro-services that send their own logs to their own log groups. And as the app scales up, we need to invest more human resources in mundane tasks such as monitoring these logs, resources that could be better utilized in developing new business frontiers.

What if we could build an automated solution that scales efficiently in terms of cost and performance, monitors the logs for us, and alerts us if there are any issues? We can build this tool in one of the two architecture styles mentioned below:

Server-based architecture (or)
Server-less architecture

Server-Centric (or) Server-less Architecture?

With the advent of cloud technologies, we have moved from server-centric to on-demand servers and now to server-less. Before we choose between server-centric, on-demand, or server-less architecture, we must ask ourselves a few questions:

How am I going to serve the feature that I am developing? (Is it an extension of an available ecosystem or a stand-alone feature?)
What should its availability and scalability be? What is its runtime requirement?
Does the feature have stateful or stateless functionality?
What is my budget for running this feature?

If your answers to the above questions are ambiguous, remember one thing: prefer server-less over server-centric if your solution can be built as server-less (your cloud architect might help you with the decision).

In my case, my log-monitoring system is:

A standalone system.
Event-based (the event here is a log), which needs to be highly available and scalable for logs from different services.
Stateless.
Minimal in budget.

Given the above answers, I have chosen server-less architecture.

Case Example

This system can be better illustrated by an example. Let's say that we have built our application in Java (the application runs in Tomcat within a node in EKS) and that this application is deployed within the EKS cluster.

Example Log -1

java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.
 at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:367)

Example Log -2

at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:118)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:143)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)

We would like to get notified every time the application log contains the keyword “ERROR” or “Connection Exception”, as seen in the log above. To achieve this, let's build our monitoring and alerting system.

Key components to build Log Monitoring and Alerting System

AWS Cloud-watch Logs
AWS log filter pattern
AWS Lambda
Simple Notification Service (SNS)
Email Subscription

We combine the above AWS resources, as shown in the architecture diagram above, to create a Real-time server-less log monitoring system.

Building Infrastructure and Working with Terraform

Let's first create a log group, which will receive all the application logs.

terraform {
 required_providers {
  aws = {
   source = "hashicorp/aws"
   version = "~> 3.0"
  }
 }
}

# Configure the AWS Provider
provider "aws" {
 region = var.region
}

# Extract the current account details
data "aws_caller_identity" "current" {}

data "aws_region" "current" {}

# Create a Log group to send your application logs
resource "aws_cloudwatch_log_group" "logs" {
 name = "Feature_Logs"
}

Once this resource is created, we direct all the log traffic from the application layer in EKS to this log group. As the application starts working, all its outputs and errors are sent as log streams to this log group.

After the above step, we start receiving the logs. We would like to be notified every time the application layer throws an error or a connection exception, so our desired keywords within the log streams of the CloudWatch log group are “Error” and “Connection Exception”.
We can do this using a CloudWatch Logs subscription filter, which parses all those logs and picks out the ones that contain either the keyword “Error” or other such keywords.

resource "aws_cloudwatch_log_subscription_filter" "logs_lambdafunction_logfilter" {
 name = "logs_lambdafunction_logfilter"

 # role_arn = aws_iam_role.iam_for_moni_pre.arn
 log_group_name = aws_cloudwatch_log_group.logs.name

 # Matches either term; filter patterns are case-sensitive. Change the error patterns here.
 filter_pattern = "?SQLTransientConnectionException ?Error"

 destination_arn = aws_lambda_function.logmonitoring_lambda.arn

 # CloudWatch Logs must be allowed to invoke the function before the filter is created
 depends_on = [aws_lambda_permission.allow_cloudwatch]
}

When a CloudWatch Logs subscription filter sends logs to a receiving service such as AWS Lambda, they are base64-encoded and compressed in gzip format. To decode and unzip the logs and send them to SNS, we need the AWS Lambda service.

We create this Lambda function as a log-triggered event handler (thanks to CloudWatch Logs): it receives the log events from the log group, decodes them from base64, unzips them, and sends the log to the SNS topic, whose ARN is passed to the Lambda function as an environment variable.

resource "aws_lambda_function" "logmonitoring_lambda" {
 function_name = "logmonitoring_lambda"
 # Assumes an archive_file data source named Resource_monitoring_lambda that zips the Python script
 filename   = data.archive_file.Resource_monitoring_lambda.output_path
 handler    = "lambda_function.lambda_handler"
 package_type = "Zip"
 role      = aws_iam_role.iam_for_moni_pre.arn
 runtime    = "python3.9"
 source_code_hash = data.archive_file.Resource_monitoring_lambda.output_base64sha256

 timeouts {}

 tracing_config {
  mode = "PassThrough"
 }

 environment {
  variables = {
   # The key must match the name the Lambda handler reads from os.environ
   sns_arn = aws_sns_topic.logsns.arn
  }
 }
}

resource "aws_lambda_permission" "allow_cloudwatch" {
 statement_id = "AllowExecutionFromCloudWatch"
 action    = "lambda:InvokeFunction"
 function_name = aws_lambda_function.logmonitoring_lambda.function_name
 principal   = "logs.${data.aws_region.current.name}.amazonaws.com"
 source_arn  = "arn:aws:logs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:*"
}

Having received the decoded logs from Lambda, the SNS (Simple Notification Service) topic sends the filtered log to its email subscription, and the owner of the subscribed address receives an email about it.

resource "aws_sns_topic" "logsns" {
 name = "logsns"
}

resource "aws_sns_topic_subscription" "snstoemail_email_target" {
 topic_arn = aws_sns_topic.logsns.arn
 protocol = "email"
 endpoint = var.email
}

Because this architecture is server-less, its resources are only invoked when such keywords appear in the logs. Hence this method is cost-optimized.

If you would like to connect with me, you can follow my blog here or find me on LinkedIn, and you can find all the code on my GitHub.

Here is the lambda python script:

import base64
import gzip
import json
import os

import boto3


def lambda_handler(event, context):
    # CloudWatch Logs delivers the payload base64-encoded and gzip-compressed
    compressed = base64.b64decode(event["awslogs"]["data"])
    json_body = json.loads(gzip.decompress(compressed).decode("utf-8"))
    print(json_body)

    # The topic ARN is set by Terraform in the sns_arn environment variable
    topic_arn = os.environ["sns_arn"]
    print(topic_arn)

    sns = boto3.client("sns")
    response = sns.publish(
        TopicArn=topic_arn,
        Message=json.dumps(json_body),
    )
    print(response)
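To check the decoding step without deploying anything, the sketch below builds a payload the same way CloudWatch Logs delivers it to Lambda (JSON, gzip-compressed, then base64-encoded under awslogs.data) and decodes it again. The helper names and the sample values are made up for illustration; only the awslogs/data envelope and the logEvents list follow the real event shape.

```python
import base64
import gzip
import json


def encode_awslogs_event(payload: dict) -> dict:
    """Build a test event shaped like the one CloudWatch Logs sends to Lambda."""
    raw = json.dumps(payload).encode("utf-8")
    data = base64.b64encode(gzip.compress(raw)).decode("ascii")
    return {"awslogs": {"data": data}}


def decode_awslogs_event(event: dict) -> dict:
    """Reverse the envelope: base64-decode, gunzip, then parse the JSON body."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed).decode("utf-8"))


def extract_messages(event: dict) -> list:
    """Return just the log lines carried in the event."""
    return [entry["message"] for entry in decode_awslogs_event(event).get("logEvents", [])]


if __name__ == "__main__":
    # Sample values are illustrative, not captured from a real log group.
    sample = {
        "logGroup": "Feature_Logs",
        "logEvents": [
            {
                "id": "1",
                "timestamp": 0,
                "message": "java.sql.SQLTransientConnectionException: HikariPool-1 - "
                "Connection is not available, request timed out after 30000ms.",
            }
        ],
    }
    event = encode_awslogs_event(sample)
    for line in extract_messages(event):
        print(line)
```

Publishing only the extracted messages instead of the whole JSON body would make the alert email shorter, at the cost of losing the log group and stream names.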

How AWS CloudWatch Agent on Kubernetes Blew Our AWS Bill

July 12, 2024 · 4 min read

When running a microservice-based architecture, traffic flows from the front-end, passes through multiple microservices, and eventually receives the final response from the back-end. Kubernetes is a container orchestrating service that helps us run and manage these numerous microservices, including multiple copies of them if necessary.

During the lifecycle of a request, if it fails at a specific microservice while moving from one service to another, pinpointing the exact point of failure becomes challenging. Observability is a paradigm that allows us to understand the system end-to-end. It provides insights into the “what,” “where,” and “why” of any event within the system, and how it may impact application performance.

There are various monitoring tools available for microservice setups in Kubernetes, both open-source (such as Prometheus and Grafana) and enterprise solutions (such as App Dynamics, DataDog, and AWS CloudWatch). Each tool may serve a specific purpose.

Story Time — How we built our Kubernetes

In one of our projects, we decided to build a lower environment on an AWS Kubernetes cluster using Amazon Elastic Kubernetes Service (EKS) on Amazon Elastic Compute Cloud (EC2). We had more than 80 microservices running on EKS, which were built and deployed into the Kubernetes cluster using GitLab pipelines. During the initial development phase, we had poorly built Docker images that consumed a significant amount of disk space and included unnecessary components. Additionally, we were not using multi-stage builds, which further increased the image size. For monitoring purposes, we deployed the AWS CloudWatch agent, which uses Fluentd to aggregate logs from all the nodes and send them to CloudWatch Logs.

Setting up Container Insights on Amazon EKS and Kubernetes

How to install and set up CloudWatch Container Insights on Amazon EKS or Kubernetes.

During a routine cost check, we made a startling discovery. The billing for AWS CloudWatch Logs (where the CloudWatch agent sends logs) in our setup was typically around 20–30 dollars per day, but it had spiked to 700–900 dollars per day. This had been going on for five days, resulting in a bill of 4500 dollars solely for the CloudWatch Logs and NAT gateway (used for sending logs to CloudWatch via public HTTPS). As an initial response, we stopped the CloudWatch agent daemon set and refreshed the entire EKS setup with new nodes.

What went wrong

As a temporary fix, we halted the CloudWatch agent running as a daemon set in our cluster to prevent further billing. Upon investigation, we discovered that a large number of pods were in an evicted state. The new pods attempting to start (as Kubernetes tries to match the desired state specified in manifests/Helm charts) were also going into the evicted state. This led to a high volume of logs, which were then sent to CloudWatch Logs via the CloudWatch agent. Since log billing is based on ingestion and storage, it significantly contributed to our AWS bill. This eviction was caused by a condition called “node disk pressure.” The node disk pressure occurred due to the following reasons:

The existing pod had generated a large number of logs, occupying significant disk space.
When a new version of the app was deployed in the cluster, the new container (approximately 3 GB in size) could not start due to insufficient available space.
After multiple attempts to start the pod, it went into an evicted state.
As the current pod was evicted, the deployment controller deployed another pod to match the desired state specified in the deployment.
These events generated more logs, further consuming available disk space.
This cycle continued for five days, exacerbating the situation.

How we resolved it

To address the problem, we implemented the following solutions:

Once we identified the issue, we refreshed Kubernetes by replacing the existing nodes with a new set. This action cleared up all the disk space on the nodes, and since all our logs are stored in CloudWatch Logs, we resolved the log-related concerns.
Additionally, we implemented multi-stage builds, which reduced the overall image size for deployment.
Lastly, we set up CloudWatch alarms to trigger when the disk usage percentage exceeds a certain threshold.
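The alarm in the last step boils down to comparing a disk usage percentage with a threshold. As a rough local illustration (not the CloudWatch alarm itself), the sketch below computes that percentage with Python's standard library. The 80 percent threshold and the function names are assumptions, not values from our setup.

```python
import shutil


def used_percent(total_bytes: int, used_bytes: int) -> float:
    """Disk usage as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def exceeds_threshold(total_bytes: int, used_bytes: int, threshold_percent: float = 80.0) -> bool:
    """True when usage is above the (assumed) alarm threshold."""
    return used_percent(total_bytes, used_bytes) > threshold_percent


def check_path(path: str = "/", threshold_percent: float = 80.0) -> bool:
    """Check the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return exceeds_threshold(usage.total, usage.used, threshold_percent)


if __name__ == "__main__":
    print("above threshold:", check_path("/"))
```

An alarm like this would have fired well before the nodes reached disk pressure and started evicting pods.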

Deploy and Run Hashicorp Vault With TLS Security in AWS Cloud

July 12, 2024 · 9 min read

Often in software engineering, when we are developing new features, it is quite common to embed sensitive information, such as passwords, secret keys, or tokens, so that our code can do its intended job. Different professionals within the IT realm use secrets in different ways, such as the following:

Developers use secrets such as API tokens, database credentials, or other sensitive information within the code.
DevOps engineers might have to export certain values as environment variables and write the values in the YAML file for the CI/CD pipeline to run efficiently.
Cloud engineers might have to pass credentials, secret tokens, and other secret information to access their respective cloud. (In the case of AWS, even if we save these in a .credentials file, we still have to pass the filename in the terraform block, which indicates that the credentials are available locally on the computer.)
System administrators might have to send different logins and passwords to employees so that they can access different services.

But writing or sharing secrets in plain text is quite a security problem, as anyone with access to the code base might read the secret, or an attacker might mount a man-in-the-middle attack. To counter this, the development world has different options, such as importing secrets from another file (YAML, .py, etc.) or exporting them as environment variables. But both of these still have a problem: a person with access to a single config file or to the machine can echo out (read: print) the password. Given these problems, it would be very useful to deploy a single solution that serves all the IT professionals mentioned above and more. This is the ideal place to introduce Vault.

HashiCorp Vault — an Introduction

HashiCorp Vault is a secrets and encryption management system based on user identity. If we have to compare it with AWS, it is like an IAM user-based resource (read Vault here) management system which secures your sensitive information. This sensitive information can be API encryption keys, passwords, and certificates.

Its workflow can be visualized as follows:

Hosting Cost of Vault

Local hosting: This method is usually used if the secrets are to be accessed only by local users or during the development phase. It should be avoided if the secrets engines have to be shared with other people. As it runs within the local development environment, there is no additional investment for deployment. Vault can be hosted directly on a local machine or via its official Docker image.
Public cloud hosting (EC2 in AWS / Virtual Machine in Azure): If the idea is to set up Vault to share with people across different regions, hosting it on a public cloud is a good idea. Although we can achieve the same with on-prem servers, the upfront costs and scalability are quite a hassle. In the case of AWS, we can easily secure the endpoint by hosting Vault on an EC2 instance and creating a security group that controls which IPs can access it. If you feel more adventurous, you can map this to a domain name and route it through Route 53 so that the Vault is accessible on a domain as a service to end users. In the case of EC2 hosting with an AWS-defined domain, the cost is $0.0116/hr.
Vault cloud hosting (HashiCorp Cloud Platform): If you don't want to set up infrastructure in public cloud environments, there is the option of choosing the cloud hosted by HashiCorp. We can think of it as a SaaS-based cloud platform that lets us use Vault as a service on a subscription basis. Since HashiCorp itself manages the cloud, we can expect a consistent user experience. For the cost, it has three production-grade options: Starter at $0.50/hr, Standard at $1.58/hr, and Plus at $1.84/hr (as seen in July 2022).

Example of Self-Hosting in AWS Cloud

Our goal in this project is to create a Vault instance in EC2 and store static secrets in the Key-Value secrets engine. These secrets are later retrieved by the Terraform script, which, when applied, pulls the secrets from the Vault secrets engine and uses them to create infrastructure in AWS.

To create a ready-to-use Vault, we are going to follow these steps:

Create an EC2 Linux instance with ssh keys to access it.
SSH into the instance and install the Vault to get it up and running
Configure the Vault Secrets Manager

Step 1: Create an EC2 Linux instance with ssh keys to access it

To create an EC2 instance and access it remotely via SSH, we need to create the Key pair. First, let's create an SSH key via the AWS console.

Once the Keys have been created and downloaded into the local workbench, we create an EC2 (t2.micro) Linux instance and associate it with the above keys. The size of the EC2 can be selected based on your requirements, but usually, a t2.micro is more than enough.

Step 2: SSH into the instance and install the Vault to get it up and running

Once the status of the EC2 instance changes to running, open the directory in which you have saved the SSH (.pem) key. Open a terminal and type ssh -i <keyname.pem> ec2-user@<public DNS or IPv4 address> (ec2-user is the default user on Amazon Linux images; Ubuntu images, which the apt commands below assume, use ubuntu). Once we have established a successful SSH session into our EC2 instance, we can install the Vault using the following commands:

wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg

echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list

sudo apt update && sudo apt install vault

The above commands install Vault in the EC2 environment. Sometimes the second command is known to throw errors. In case of an error, replace $(lsb_release -cs) with “jammy”. [This entire process can be automated by copying the above commands into the EC2 user data while creating the EC2 instance.]

Step 3: Configure the HashiCorp Vault

Before initializing the vault, let's ensure it is properly installed by running the following command:

vault

Let's make sure there is no environment variable called VAULT_TOKEN. To do this, use the following command:

$ unset VAULT_TOKEN

Once we have installed the Vault, we need to configure it, which is done using HCL files. These HCL files contain data such as the storage backend, listeners, cluster address, UI settings, etc. As discussed in Vault's architecture, the backend on which the data is stored is separate from the Vault engine, and it has to persist even when the Vault is locked (a stateful resource). In addition to that, we need to specify the following details:

Listener ports: the port(s) on which the Vault listens for API requests.
API address: specifies the address to advertise to route client requests.
Cluster address: indicates the address and port to be used for communication between the Vault nodes in a cluster.

To secure it further, we can use TLS-based communication. This step is optional and is only needed if you want to secure your environment further. The TLS certificate can be generated using openssl in Linux.

# Installs openssl  
sudo apt install openssl  
  
#Generates TLS Certificate and Private Key  
openssl req -newkey rsa:4096 -x509 -sha512 -days 365 -nodes -out certificate.pem -keyout privatekey.pem 

Insert the TLS Certificate and Private Key file paths in their respective arguments in the listener “tcp” block.

tls_cert_file: Specifies the path to the certificate for TLS in PEM encoded file format.
tls_key_file: Specifies the path to the private key for the certificate in PEM-encoded file format.

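# Configuration in config.hcl file

storage "raft" {
  path    = "./vault/data"
  node_id = "node1"
}
listener "tcp" {
  address = "127.0.0.1:8200"
  # With tls_disable = "true" the listener serves plain HTTP and the certificate settings are not used;
  # set it to "false" (and switch the addresses to https) to enable TLS
  tls_disable   = "true"
  tls_cert_file = "certificate.pem"
  tls_key_file  = "privatekey.pem"
}
disable_mlock = true
api_addr      = "http://127.0.0.1:8200"
cluster_addr  = "https://127.0.0.1:8201"
ui            = true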

Once the configuration file is created, we create the folder where our backend data will reside: vault/data.

mkdir -p ./vault/data

Once done, we can start the vault server using the following command:

vault server -config=config.hcl

With the server running on the backend and settings from the config file, open a second terminal, point the Vault CLI at the server, and initialize Vault:

export VAULT_ADDR='http://127.0.0.1:8200'  
  
vault operator init

Initialization creates five unseal keys, called Shamir keys (of which three are needed to unseal Vault with the default settings), and an initial root token. This is the only time that all of this data is known by Vault, so these details must be saved securely in order to unseal Vault later. In practice, the Shamir keys are distributed among key stakeholders in the project, and the key threshold is set so that Vault can be unsealed only when a majority agree to do so.
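To make the threshold idea concrete, here is a minimal Python sketch of Shamir's secret sharing using the same split of five shares with a threshold of three. It is only an illustration of the principle: the prime, the integer secret, and the function names split_secret and recover_secret are assumptions made for this example, and Vault's own implementation differs in its details.

```python
import secrets

# A Mersenne prime comfortably larger than the demo secret (an assumption for this sketch).
PRIME = 2**127 - 1


def split_secret(secret, shares=5, threshold=3):
    """Split an integer secret into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        # Horner's rule, modulo PRIME.
        result = 0
        for c in reversed(coeffs):
            result = (result * x + c) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Rebuild the secret by Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = 123456789
    unseal_shares = split_secret(key)
    # Any three of the five shares are enough to rebuild the key.
    print(recover_secret(unseal_shares[:3]) == key)
    print(recover_secret([unseal_shares[0], unseal_shares[2], unseal_shares[4]]) == key)
```

Each stakeholder would hold one (x, y) pair; fewer than three pairs do not determine the polynomial, which is why no single key holder can unseal on their own.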

Once we have created these Keys and the initial token, we need to unseal the vault:

vault operator unseal

Run this command once per key until the threshold number of keys has been supplied; the Sealed status then changes to false.

Then we log in to the Vault using the Initial root token.

vault login

Once authenticated successfully, you can explore the different secrets engines, such as the Transit secrets engine, which encrypts data in transit, and the Key-Value store, which securely stores key-value pairs such as passwords and credentials.

As seen from the process, Vault is pretty robust in terms of encryption, and as long as the Shamir keys and initial root token are handled carefully, we can ensure the security and integrity of the stored secrets.

And you have a pretty secure Vault engine (protected by its own Shamir keys) running on a free AWS EC2 instance (which is, in turn, guarded by the security groups)!

**Want to Connect?**  
  
If you want to connect with me, you can do so on [LinkedIn](https://www.linkedin.com/in/krishnadutt/).

Optimizing Golang Docker images with multi-stage build

July 12, 2024 · 5 min read

With the increasing scale of development required to build a product, a large number of developers are needed to develop, share, and maintain the code. And as each developer's environment differs from the others, it becomes quite a hassle to create similar environments with similar library versions. To solve this issue we use Docker, which gives all the developers a consistent environment. When using Docker, we often face the problem of large images, sometimes running to several GBs. This defeats the very idea with which Docker evolved beyond classic VMs: “To create light weight and resource-light images that the developers can work with easily”.

To solve this problem of Docker image bloat, we have several solutions, such as using a .dockerignore file to leave out unnecessary files, using distroless/minimal base images, or minimizing the number of layers. But when building an application we need various tools, which rules out distroless images for the build. And since a build involves several steps, there is only so much room to reduce the layers within the Dockerfile.

The tools that we use to build the application are often not needed when running the application. So what if we could somehow separate out these build tools and keep only what is needed to run the application? Enter multi-stage builds in Docker.

Docker : Multi-Stage Builds

A multi-stage build is an implementation of the Builder pattern in a Dockerfile, which helps minimize the size of the final container, improve run-time performance, and organize Docker commands better. A multi-stage build breaks the single-stage Dockerfile into different sections (you can think of them as different jobs, like build, stage, etc.) within the same Dockerfile, thereby separating the environments. As each stage uses a base image that is useful only to that stage while passing its outputs to the next stage, we can keep the final Docker image lean. The same can be done using different Dockerfiles in a CI (Continuous Integration) pipeline, by passing the output of one stage to another. But Docker's multi-stage feature removes the need to create all these steps in our pipeline and helps keep it clean.

Creating the Multi-Stage Docker file

To explain this, we will build and run a Movie application written in Golang, which performs basic CRUD operations. The code for the app can be found here.

As we know, a Go app needs to be compiled in order to run. Compilation creates an executable (specific to that OS), and only this executable is required to run the application. To illustrate the power of multi-stage builds, let’s first build it with a single-stage Dockerfile.

Once we run docker build on the above file, we get an image of around 350 MB.

Now let's separate the build stage and the execution stage into two different environments. For the build stage, let's use the Alpine-based Golang image, which comes loaded with all the tools required to run, test, build, and validate Go code. We build our application with the tools in this environment. Once this is done, we pass the executable to the execution/production environment, which runs it.

Since the executable is already created, we do not need most of the previous environment's tools and can work with a base Alpine image. Once we run docker build on this file, we observe that the image is around 13 MB (named crud_multistage in the below picture) compared to 350 MB (named crud in the below picture) from the single-stage Dockerfile. The multi-stage build gave a reduction of about 96% in the total size of the Docker image.

Since this image is very small, it is more portable and easier to deploy in production. Although the multi-stage build sounds like a fantastic idea, there are certain scenarios where it should be used and certain scenarios where it should be avoided.

When not to use Multi-stage build:

When the language you are using does not package the application and its requirements into a single file (as Go does) or at least into a small group of files (as JavaScript bundlers do).
If you are planning to run docker exec commands on the final artifact to explore the application code.
If you need the tools and files used in the build stage further down the line to debug the final artifact.

When to use Multi-stage build:

When you want to minimize the total size of the final Docker image that you deploy into production.
When you want to speed up your CI/CD processes by running independent stages in the Dockerfile in parallel.
When the different layers in your Dockerfile are straightforward and standardized.
When you are fine with losing the build intermediaries and only want the final Docker artifact.

InfraSecOps : Enable Monitoring and automated continuous Compliance of Security Groups using Cloud-watch and Lambda

July 12, 2024 · 5 min read

As DevOps engineers, we use different compute resources in our cloud to make sure that workloads run efficiently. To restrict the traffic reaching those compute resources (EC2/ECS/EKS instances in the case of AWS), we create stateful firewalls (such as security groups in AWS). As lead engineers, we often define the best practices for configuring security groups. But in a large organization working on the cloud, monitoring that every team follows these best practices is a tedious task that eats up a lot of productive hours. And we cannot ignore it, because deviations cause security compliance issues.

For example, a security group might be given the following configuration by a new developer (or some rogue engineer). In the event below, a security group that is supposed to restrict traffic to AWS resources is configured to allow all kinds of traffic, on all protocols, from the entire internet. This defeats the purpose of securing the resource with a security group; we might as well remove it.

{
    "version": "0",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {
      "responseElements": {
        "securityGroupRuleSet": {
          "items": [
            {
              "groupOwnerId": "XXXXXXXXXXXXX",
              "groupId": "sg-0d5808ef8c4eh8bf5a",
              "securityGroupRuleId": "sgr-035hm856ly1e097d5",
              "isEgress": false,
              "ipProtocol": "-1",  --> It allows traffic from all protocols
              "fromPort": -1, --> to all the ports
              "toPort": -1,
              "cidrIpv4": "0.0.0.0/0" --> from entire internet, which is a bad practice.
            }
          ]
        }
      }
    }
}

This kind of mistake can be made while building a proof of concept or while testing a feature, and it can cost us a lot in terms of security. Monitoring for these things manually takes a toll on cloud engineers and consumes a lot of time. What if we could automate this monitoring and create a self-healing mechanism that detects deviations from best practices and remediates them?

The solution that I have built in AWS watches each security group ingress rule (it can be extended to egress rules too): the ports it allows, the protocol it uses, and the IP range it communicates with. These security group rules are compared with the baseline rules that we define for our security compliance, and any deviating rules are automatically removed. The baseline rules are configured in the Python code (which can be modified based on the requirement).
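The comparison against the baseline can be sketched in a few lines of Python. This is not the code from the repo, only a minimal illustration that assumes the event has the shape shown in the JSON above; the names OPEN_CIDR, find_violations, and rules_to_revoke are hypothetical.

```python
# Baseline values treated as violations; these names and values are assumptions for this sketch.
OPEN_CIDR = "0.0.0.0/0"
ALL_PROTOCOLS = "-1"


def find_violations(rule):
    """Return the reasons an ingress rule breaks the baseline (empty list if compliant)."""
    if rule.get("isEgress"):
        return []  # only ingress rules are checked in this sketch
    violations = []
    if rule.get("cidrIpv4") == OPEN_CIDR:
        violations.append("open to the entire internet")
    if rule.get("ipProtocol") == ALL_PROTOCOLS:
        violations.append("all protocols allowed")
    if rule.get("fromPort") == -1 and rule.get("toPort") == -1:
        violations.append("all ports allowed")
    return violations


def rules_to_revoke(event):
    """Collect (groupId, ruleId, reasons) for each non-compliant rule in a CloudTrail event."""
    detail = event.get("detail") or {}
    response = detail.get("responseElements") or {}
    rule_set = response.get("securityGroupRuleSet") or {}
    flagged = []
    for rule in rule_set.get("items") or []:
        reasons = find_violations(rule)
        if reasons:
            flagged.append((rule["groupId"], rule["securityGroupRuleId"], reasons))
    return flagged


if __name__ == "__main__":
    sample_event = {
        "detail": {
            "responseElements": {
                "securityGroupRuleSet": {
                    "items": [
                        {
                            "groupId": "sg-example",
                            "securityGroupRuleId": "sgr-example",
                            "isEgress": False,
                            "ipProtocol": "-1",
                            "fromPort": -1,
                            "toPort": -1,
                            "cidrIpv4": "0.0.0.0/0",
                        }
                    ]
                }
            }
        }
    }
    for group_id, rule_id, reasons in rules_to_revoke(sample_event):
        print(group_id, rule_id, reasons)
```

In a Lambda, each flagged pair could then be passed to the boto3 EC2 client's revoke_security_group_ingress call (GroupId plus SecurityGroupRuleIds) to delete the rule; that wiring is left out here so the sketch runs without AWS credentials.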

Components used to build this system

AWS CloudTrail
AWS EventBridge rule
AWS Lambda
AWS SNS
S3 Bucket

1. Whenever a new activity (creation, modification, or deletion of a rule) is performed on a security group, it is not sent as an event log to CloudWatch but recorded as an API call in CloudTrail. So to monitor these events, we first need to enable CloudTrail. The trail records the API calls made to the EC2 service and saves them in a log file in an S3 bucket.

2. Once these API calls are being recorded, we need to filter only those related to security groups. This can be done either by sending every API call directly to a Lambda or via an AWS EventBridge rule. The former is costly, as each API call invokes the Lambda, so we create an EventBridge rule that passes on only the relevant API calls from the EC2 service.

3. These filtered API events are sent to the Lambda, which checks them against the port, protocol, and traffic source we have configured in the Python code. (In this example, I am checking ingress rules for the wildcard IP range, which is the entire internet, and for all ports. You can also filter on protocols that you don't want to allow. Refer to the code for details.)

4. The Python code goes through the security group rules in the event, finds those that violate the baseline, and deletes them.

Figure: Creating a rogue security group rule
Figure: The Lambda taking action and deleting the rogue rule

5. Once these are deleted, SNS is used to send an email with event details such as the ARN of the security group rule, the role ARN of the person who created the rule, and the ways in which the rule violates the baseline security compliance. This email alerting helps us identify the actors causing these deviations and give them proper training on security compliance. The details are also logged in the CloudWatch log groups created in the present architecture.
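The filtering in step 2 can be pictured as an EventBridge event pattern. The sketch below writes one possible pattern as a Python dictionary and adds a deliberately simplified matcher to show which events would reach the Lambda. The choice of event names is an assumption for illustration, and the matches function covers only exact-value matching, a small subset of what EventBridge patterns support.

```python
import json

# Event pattern for an EventBridge rule; the chosen event names are an assumption for this sketch.
SECURITY_GROUP_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        "eventName": ["AuthorizeSecurityGroupIngress", "ModifySecurityGroupRules"],
    },
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must exist and hold one of the listed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested part of the event.
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


if __name__ == "__main__":
    # The JSON form is what an EventBridge rule takes as its event pattern.
    print(json.dumps(SECURITY_GROUP_PATTERN, indent=2))

    ingress_event = {
        "source": "aws.ec2",
        "detail-type": "AWS API Call via CloudTrail",
        "detail": {
            "eventSource": "ec2.amazonaws.com",
            "eventName": "AuthorizeSecurityGroupIngress",
        },
    }
    print(matches(SECURITY_GROUP_PATTERN, ingress_event))
```

Filtering at the rule level means unrelated EC2 API calls, such as launching an instance, never invoke the Lambda, which is where the cost saving over the Lambda-only approach comes from.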

For the entire Python code along with the Terraform code, please refer to the following GitHub repo. To replicate this system in your environment, change the baseline security rules that you want to monitor in the Python code and run terraform apply in the terminal. Sit back and have a cup of coffee while Terraform builds this system in your AWS account.

Liked my content? Feel free to reach out on LinkedIn for interesting content and productive discussions.

Exploring an Object-Oriented Jenkins Pipeline for Terraform: A novel architecture design in Jenkins' multi-stage Terraform CD pipeline to improve CI/CD granularity

July 12, 2024 · 3 min read

Usually, when we perform terraform plan, terraform destroy, or terraform apply, we apply these actions to all the resources in our target files, often main.tf (you can use any name for the file, but this name is just used as a convention).

In the age of CI/CD, when everything from data and application code to infrastructure code runs as pipelines, it is usually difficult to achieve this granularity. Usually, at least in Terraform, we have three different pipelines for the three actions: terraform plan, terraform apply, and terraform destroy. And when we select a certain action (let's say terraform plan), this action is performed on all the stages and on all resources within the pipeline.

But when we observe all these pipelines, there is a commonality that can be abstracted out into a general pipeline from which the dynamic behavior can be inherited. Just as a class lets us build different objects with different attribute values, is it possible to create a similar base class (read: pipeline) which, when instantiated, creates different pipeline objects?

One Pipeline to create them all

The Modular Infrastructure

In order to build this class-based pipeline, we first need a Terraform script that is loosely coupled and modular in nature. For this, we have created a modular script with three modules named “Networking,” “Compute,” and “Notifications.” The components that each of these modules creates are as follows:

Networking: 1 VPC and 1 subnet
Compute: 1 IAM role, 1 Lambda, 1 EC2 t2.micro instance
Notifications: 1 SNS topic and 1 email subscription

And the file structure is as follows:

Once we have this ready, let’s create a Groovy script in declarative style in a Jenkinsfile.

Class-Based Jenkins Pipeline

To create this class-based architecture style to flexibly create pipeline objects at the action and resource level, we are going to utilize a feature called “parameters” in Jenkins. This feature helps us create multiple objects using a single base class Jenkins pipeline. In this example, let’s create three actions namely:

terraform plan: This creates and prints out a plan of the resources that we are going to create in the respective provider ( can be AWS, Kubernetes, GCP, Azure, etc.)
terraform apply: This command creates the resources in the respective provider and creates a state-file that saves the current state of resources in it.
terraform destroy: This removes all the resources that are listed within the state-file.

These actions are performed on three modules/resources namely “Networking,” “Compute,” and “Notifications.”

The above parameters create a UI for the end user, as shown below, which would help the end user to create objects of the base pipeline on the fly.

Based on the actions selected and the resources on which these actions have to be performed, Jenkins creates a dynamic pipeline according to your requirement. In the picture below, we see that we ran terraform apply on the networking and compute resources in run #24, and on networking and notifications in run #25. To clean up the infrastructure, we ran terraform destroy in run #26.
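The idea of turning one parameterized pipeline into many pipeline "objects" can be sketched outside Jenkins as well. The Python below maps a selected action and a set of selected modules to Terraform commands. It is an illustration under stated assumptions rather than the Jenkinsfile from the repo: it assumes each module is scoped with Terraform's -target flag, that the module blocks are named networking, compute, and notifications, and that -auto-approve is passed for non-interactive apply and destroy runs. build_commands is a hypothetical helper.

```python
ACTIONS = ("plan", "apply", "destroy")
# Module names as they might appear in the Terraform code (an assumption for this sketch).
MODULES = ("networking", "compute", "notifications")


def build_commands(action, selected_modules):
    """Return one terraform command per selected module, scoped with -target."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    unknown = [m for m in selected_modules if m not in MODULES]
    if unknown:
        raise ValueError(f"unknown modules: {unknown}")
    commands = []
    for module in selected_modules:
        command = ["terraform", action, f"-target=module.{module}"]
        if action != "plan":
            # Assumed here so that a non-interactive run does not wait for confirmation.
            command.append("-auto-approve")
        commands.append(command)
    return commands


if __name__ == "__main__":
    # Mirrors a run where apply is selected for networking and compute only.
    for command in build_commands("apply", ["networking", "compute"]):
        print(" ".join(command))
```

Each combination of action and modules plays the role of an object instantiated from the same base pipeline, so a new combination needs no new pipeline definition.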

The present approach implemented is more in line with Continuous delivery principles than continuous deployment.

For the Jenkins file and Terraform code, refer to this link.

**Want to Connect?**

Feel free to reach out on [LinkedIn](https://www.linkedin.com/in/krishnadutt/) for interesting content and productive discussions.

DevOps Wizardry: Crafting Your Parlay GitHub Action - Improve your Development Process with Personalized Custom Automation

July 12, 2024 · 7 min read

Recently, while trying to integrate a DevSecOps tool into my pipeline, I was looking for a GitHub Action that would simplify my workflow. Since I could not find one, I had to write the commands inline. Although writing them within the script is not a hassle, it would be beneficial to have an action that we could call directly, pass parameters to, and run within the pipeline.

In this blog, I will walk you through the steps to create a custom GitHub Action that satisfies your requirements. The blog has 2 parts:

Understanding what GitHub Actions are
Creating your custom GitHub actions

GitHub actions:

Often when we write pipelines, we have a set of actions that we would like to perform based on the type of application we are developing. To run these actions across the repos in our organization, we would have to copy and paste this code across the repositories, which makes the process error-prone and a maintenance hassle. It would be better to take the DRY principle of software engineering and apply it to the CI/CD world.

A GitHub Action is exactly this principle in practice. We create and host the required action in a public GitHub repository, and pipelines across repositories call it to perform the steps it defines. Now that we understand what a GitHub Action is, let's explore how to build a custom one that automates a set of steps. For this blog, I illustrate it with the SBOM enrichment tool Parlay, for which I have built a custom action.

Creating Custom Action — A case on Parlay

We will be creating our custom action in the following steps:

Defining inputs and outputs in action.yml
Developing business logic in bash script
Dockerize the bash application
Test the action
Publish it in GitHub action Marketplace

Defining inputs and outputs in action.yml

To start creating a custom action, create a Git repository, clone it to your local system, and open it in your favourite code editor. We start by creating a file named action.yml. This file defines the inputs that the action takes, the outputs that it gives, and the environment it runs in. For our use case we have 3 inputs and 1 output. The action.yml should have the following arguments:

name: The name of the action, which is used to search for it in the GitHub Actions marketplace. Since it will be published in the marketplace, its name should be globally unique, like an S3 bucket name.
description: This describes what your action would do. This would be helpful to identify which action would be the right fit for our use case.
inputs: Defines the list of options which would be used within the action. These can be compulsory or optional, which can be defined using “required” argument. In our current use case we are passing 3 arguments, input_file_name, enricher and output_file_name.
outputs: Lists the outputs that the action gives.
runs: Defines the environment in which the action will execute, which in our case is Docker.

The action.yml will look something like this:

# action.yml
name: "Parlay Github Action"  
description: "Runs Parlay on the given input file using the given enricher and outputs it in your given output file"  
branding:  
  icon: "shield"  
  color: "gray-dark"  
inputs:  
  input_file_name:  
    description: "Name of the input SBOM file to enrich"  
    required: true  
  enricher:  
    description: "The enricher used to enrich the parlay sbom. Currently parlay supports ecosystems, snyk, scorecard(openssf scorecard)"  
    required: true  
    default: ecosystems  
  output_file_name:  
    description: "Name of the output file to save the SBOM enriched using the parlay's enricher"  
    required: true  
outputs:  
  output_file_name:  
    description: "Prints the output file"  
runs:  
  using: "docker"  
  image: "Dockerfile"  
  args:  
    - ${{ inputs.input_file_name }}  
    - ${{ inputs.enricher }}  
    - ${{ inputs.output_file_name }}

Developing business logic in bash script

Once we have defined the inputs, outputs, and environment that we are going to use, we define what we are going to do with those inputs (basically our logic) in a file. We can write this in either JavaScript or bash. For my current use case, I am using bash.

In my logic, I first check that all the inputs are given; if not, the action fails. Once I have these 3 arguments, I construct the command, run it, and save the result in an output file. This file is then printed to stdout, formatted using the jq utility.

#!/bin/bash
# Check that all three arguments are provided
if [ "$#" -ne 3 ]; then
    echo "Usage: $0 <input_file_name> <enricher> <output_file_name>"
    exit 1
fi
# Extract arguments (same order as the args list in action.yml)
INPUT_INPUT_FILE_NAME=$1
INPUT_ENRICHER=$2
INPUT_OUTPUT_FILE_NAME=$3
# Run parlay directly instead of through eval, so the inputs are not re-parsed by the shell
if parlay "$INPUT_ENRICHER" enrich "$INPUT_INPUT_FILE_NAME" > "$INPUT_OUTPUT_FILE_NAME"; then
    echo "Command executed successfully: parlay $INPUT_ENRICHER enrich $INPUT_INPUT_FILE_NAME"
    # Pretty-print the enriched SBOM to stdout
    jq . "$INPUT_OUTPUT_FILE_NAME"
else
    echo "Error executing command: parlay $INPUT_ENRICHER enrich $INPUT_INPUT_FILE_NAME"
    # Fail the action when parlay fails
    exit 1
fi

Dockerize the bash application

Once we have the bash script ready, we dockerize it using the following Dockerfile. Whenever the action is invoked, the logic defined in the bash script runs in an isolated Docker container. In addition to the bash script in entrypoint.sh, we also add the required packages such as wget and jq, and install the parlay binary.

# Base image
FROM --platform=linux/amd64 alpine:latest
# Installs required packages for our script
RUN apk add --no-cache bash wget jq
# Install parlay
RUN wget https://github.com/snyk/parlay/releases/download/v0.1.4/parlay_Linux_x86_64.tar.gz
RUN tar -xvf parlay_Linux_x86_64.tar.gz
RUN mv parlay /usr/bin/parlay
RUN ls /usr/bin | grep parlay
RUN parlay
# Copies the repository files into the container filesystem
COPY . .
# Make the script executable
RUN chmod +x /entrypoint.sh
# File to execute when the docker container starts up
ENTRYPOINT ["/entrypoint.sh"]

Test the action

No software is good without running some tests on it. To test the action, let's first push the code to GitHub. Once pushed, let's define the pipeline in a pipeline.yaml file in the .github/workflows folder. As the input file, I am using a sample SBOM file in CycloneDX format, which I have pushed to GitHub. In my pipeline.yaml file, I check out the GitHub repo and run my action, krishnaduttPanchagnula/parlayaction@main, on cyclonedx.json.

on: [push]  
jobs:  
  custom_test:  
    runs-on: ubuntu-latest  
    name: We test it locally with act  
    steps:  
      - name: Checkout git branch  
        uses: actions/checkout@v1  
          
      - name: Run Parlay locally and get result  
        uses: krishnaduttPanchagnula/parlayaction@main  
        id: parlay  
        with:  
          input_file_name: ./cyclonedx.json  
          enricher: ecosystems  
          output_file_name: enriched_cyclonedx.json

Once the pipeline runs, this should give output in the std-out in pipeline console as follows.

Parlay Github action Output

Publish it in GitHub action Marketplace

Once we have tested the action and it is running fine, we are going to publish it to the GitHub Actions marketplace. To do so, our custom action should have a globally unique name. To make it stand out, we can add an icon and colour to identify the action in the marketplace.

Once that is done, you will see the “Draft a release” button. Ensure that your action.yml file has a name, description, icon, and color.

Once you have tick marks, you would be guided to release page where you can mention the title and version of the release. After that, click on “publish release” and you should be able to see your action in GitHub Actions marketplace.

Liked my content? Feel free to reach out on LinkedIn for interesting content and productive discussions.

Secure your data and internet traffic with your Personalized VPN in AWS

July 12, 2024 · 8 min read

Introduction

In today’s era, the internet has become embedded into the very fabric of our lives. It has revolutionized the way we communicate, work, shop, and entertain ourselves. With the increasing amount of personal information that we share online, data security has become a major concern. Cyber-criminals are constantly on the lookout for sensitive data such as credit card information, social security numbers, and login credentials to use it for identity theft, fraud, or other malicious activities.

Moreover, governments and companies also collect large amounts of data on individuals, including browsing history, location, and personal preferences, to model the behavior of the users using deep-learning clustering models. This data can be used to coerce users psychologically to buy their products or form an opinion that they want us to form.

To overcome this issue, we can use a VPN which can be used to mask the user’s identity and route our traffic through a remote server. In addition, we can bypass internet censorship and access content that may be restricted in our region, which enables us to access our freedom to consume the data we want rather than what the governments/legal entities want us to consume. The VPNs we will discuss are of two types: Public VPNs such as Nord VPN, Proton VPN, etc., and private VPNs. Let’s try to understand the differences amongst them.

Private vs Public VPN

Public VPNs are VPN services that are available to the general public for a fee or for free. These services typically have servers located all around the world, and users can connect to any of these servers to access the internet.

Private VPNs, on the other hand, are VPNs that are created and managed by individuals or organizations for their own use. Private VPNs are typically used by businesses to allow remote employees to securely access company resources, or by individuals to protect their online privacy and security.

Not all VPNs are created equal, and using a public VPN service carries the following risks compared with a private VPN:

Risks of Using Public VPNs

Trustworthiness of the VPN Provider

When using a public VPN, you are essentially entrusting your online security to the VPN provider. If the provider is untrustworthy or has a history of privacy breaches, your data could be compromised. Unfortunately, many public VPN providers have been caught logging user data or even selling it to third-party companies for profit.

Potential for Malware or Adware

Some public VPNs have been found to include malware or adware in their software. This can be particularly dangerous, as malware can be used to steal sensitive information, while adware can slow down your computer and make browsing the web more difficult.

Unreliable Security of your Data

Public VPNs may not always provide reliable security. As the service is managed by a third party, it is difficult to understand how their system works behind closed doors. They may keep logs that can easily be used to identify the user, which defeats the idea of anonymous use.

Benefits of Creating Your Own Personal VPN

Complete Control Over Your Online Security

By creating your own personal VPN, you have complete control over your online security. You can choose the encryption protocol, server locations, and other security features, ensuring that your data is protected to the fullest extent possible.

No Third-Party Involvement

When using a public VPN provider, you are relying on a third party to protect your online security. By creating your own personal VPN, you eliminate this risk entirely, as there are no third parties involved in your online security.

Cost-Effective

While some public VPN providers charge high monthly fees for their services, creating your own personal VPN can be a cost-effective solution. By using open-source software and free server software, you can create a VPN that is just as secure as a public VPN provider's, without worrying about the privacy of your browsing history or excessive costs.

Setting up OpenVPN in AWS

OpenVPN is an open-source project that can be used to create your custom VPN using their community edition and setting things up on your VPN server. Once the VPN server is set up, we use the Open-VPN client to connect to our VPN server and tunnel our traffic through the instance. For setting up the Open-VPN server, we are going to need the following things:

An AWS account
A little bit of curiosity..!

We are going to set up the VPN server on an AWS EC2 instance, which the OpenVPN clients on all our devices will connect to. The OpenVPN company also provides a purpose-built OpenVPN Access Server as an EC2 AMI, which comes out of the box with AWS-friendly integration and which we are going to use in this blog.

Setup Open-VPN server in AWS:

Once you have set up your AWS account, log in and search for EC2.
Once you are in the AWS EC2 console, switch to the region you want your VPN to be in and then click the “Launch instances” button on the right side of the screen.
In the EC2 creation console, search for AMIs named “openvpn”. You will see many AMIs. Select one based on the number of VPN connections you require. For the current demonstration, I am choosing the AMI that serves two VPN connections.
Choosing this AMI sets up the security group by itself. Ensure that the EC2 instance is publicly accessible (either with an Elastic IP or by placing the instance in a public subnet). Once done, press “Launch Instance”.
When we connect to the EC2 instance, we are greeted with the OpenVPN server agreement. Create the settings as shown below and, at the end, create a password.
Once done, open https://<your-EC2-public-IP>:943/admin, where you will see a login page. Enter the username and password that you set on the VPN server; in my case, the username is openvpn.
You will land on the OpenVPN settings page. In Configuration > VPN Settings, scroll to the bottom and toggle “Have clients use specific DNS servers” to ON. In the primary DNS field enter 1.1.1.1 and in the secondary DNS field enter 8.8.8.8. After this, click Save Changes at the bottom of the screen.
If you scroll to the top you will see a banner with “Update Running Server”; click on it.
You are all set on the OpenVPN server side!

Connecting to Open-VPN server from our device:

Once the server is configured, we need a client to connect to our OpenVPN server. For that purpose we need to install “OpenVPN Connect”.

For Windows : Download and install the open-VPN connect from here
For Mobile : Search for “openvpn connect” in the play-store (for Android) and in app-store(for apple)
For Linux:

First, as root, ensure that apt supports the HTTPS transport:

apt install apt-transport-https

Install the OpenVPN repository key used by the OpenVPN 3 Linux packages:

curl -fsSL https://swupdate.openvpn.net/repos/openvpn-repo-pkg-key.pub | gpg --dearmor > /etc/apt/trusted.gpg.d/openvpn-repo-pkg-keyring.gpg

Then install the proper repository. Replace $DISTRO with the release name of your Debian/Ubuntu distribution. For the distro list, refer here

curl -fsSL https://swupdate.openvpn.net/community/openvpn3/repos/openvpn3-$DISTRO.list > /etc/apt/sources.list.d/openvpn3.list
apt update
apt install openvpn3

Once installed, open “OpenVPN Connect”; we should see something like the screen below.
In the URL form, enter the IP of your EC2 instance and click NEXT. Accept the certificate pop-ups that appear during this process.
In the username form, enter the username that you set on the server, and do the same for the password. Then click IMPORT.
Once imported, click on the radio button and enter your credentials again.
Once connected, you should see a screen like this. Voila! Enjoy using your private VPN in EC2.

Liked my content ? Feel free to reach out to my LinkedIn for interesting content and productive discussions.

Deploy and Run Hashicorp Vault With TLS Security in AWS Cloud

July 12, 2024 · 9 min read

Developers use secrets such as API tokens, database credentials, and other sensitive information within their code.
DevOps engineers might have to export certain values as environment variables and write them into the YAML file for the CI/CD pipeline to run efficiently.
Cloud engineers might have to pass credentials, secret tokens, and other secret information to access their respective cloud (in the case of AWS, even if we save these in a .credentials file, we still have to pass the filename in the terraform block, which reveals that the credentials are available locally on the computer).
System administrators might have to send different logins and passwords to employees so that they can access different services.

HashiCorp Vault — an Introduction

Its workflow can be visualized as follows:

Hosting Cost of Vault

Local hosting: This method is usually chosen when the secrets are accessed only by local users or during the development phase. It should be avoided if the secret engines have to be shared with other people. As it runs within the local development environment, there is no additional investment for deployment. Vault can be hosted directly on a local machine or run from its official Docker image.
Public cloud hosting (EC2 in AWS / Virtual Machine in Azure): If the idea is to set up Vault to share with people across different regions, hosting it on a public cloud is a good idea. Although we can achieve the same with on-prem servers, upfront costs and scalability are quite a hassle. In the case of AWS, we can easily secure the endpoint by hosting Vault on an EC2 instance and creating a security group that controls which IPs can access the instance. If you feel more adventurous, you can map it to a domain name and route through Route 53 so that Vault is accessible on a domain as a service to the end users. In the case of EC2 hosting with an AWS-defined domain, the cost is $0.0116/hr.
Vault cloud hosting (HashiCorp Cloud Platform): If you don’t want to set up infrastructure in a public cloud environment, there is the option of choosing the cloud hosted by HashiCorp. We can think of it as a SaaS-based cloud platform that lets us use Vault as a service on a subscription basis. Since HashiCorp itself manages the cloud, we can expect a consistent user experience. For the cost, it has three production-grade options: Starter at $0.50/hr, Standard at $1.58/hr, and Plus at $1.84/hr (as seen in July 2022).

Example of Self-Hosting in AWS Cloud

To create a ready-to-use Vault, we are going to follow these steps:

Create an EC2 Linux instance with ssh keys to access it.
SSH into the instance and install the Vault to get it up and running
Configure the Vault secrets manager

Step 1: Create an EC2 Linux instance with ssh keys to access it

To create an EC2 instance and access it remotely via SSH, we need to create the Key pair. First, let's create an SSH key via the AWS console.

Step 2: SSH into the instance and install Vault to get it up and running

wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg > /dev/null

echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list

sudo apt update && sudo apt install vault

Step 3: Configure HashiCorp Vault

Before initializing Vault, let's ensure it is properly installed by running the following command:

vault

Let's make sure there is no environment variable called VAULT_TOKEN. To do this, use the following command:

$ unset VAULT_TOKEN

Listener Ports: the port/s on which the Vault listens for API requests.
API address: Specifies the address to advertise to route client requests.
Cluster address: Indicates the address and port to be used for communication between the Vault nodes in a cluster.

To secure the setup further, we can use TLS-based communication. This step is optional; try it only if you want to harden your environment. The TLS certificate can be generated using openssl on Linux.

# Installs openssl  
sudo apt install openssl  
  
#Generates TLS Certificate and Private Key  
openssl req -newkey rsa:4096 -x509 -sha512 -days 365 -nodes -out certificate.pem -keyout privatekey.pem 

Insert the TLS Certificate and Private Key file paths in their respective arguments in the listener “tcp” block.

tls_cert_file: Specifies the path to the certificate for TLS in PEM encoded file format.
tls_key_file: Specifies the path to the private key for the certificate in PEM-encoded file format.

# Configuration in config.hcl

storage "raft" {
  path    = "./vault/data"
  node_id = "node1"
}

listener "tcp" {
  address = "127.0.0.1:8200"
  # With "true" the listener serves plain HTTP.
  # Set it to "false" to serve TLS with the files below, and then use https in api_addr and VAULT_ADDR.
  tls_disable   = "true"
  tls_cert_file = "certificate.pem"
  tls_key_file  = "privatekey.pem"
}

disable_mlock = true
api_addr      = "http://127.0.0.1:8200"
cluster_addr  = "https://127.0.0.1:8201"
ui            = true

Once these are created, we create the folder where our backend will rest: vault/data.

mkdir -p ./vault/data

Once done, we can start the vault server using the following command:

vault server -config=config.hcl

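With the server running, open a second terminal, point the Vault CLI at the server, and initialize Vault with the backend and settings from the config file. Initialization prints the unseal keys and the initial root token, so store them safely.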

export VAULT_ADDR='http://127.0.0.1:8200'  
  
vault operator init

Once we have created these Keys and the initial token, we need to unseal the vault:

vault operator unseal

Here we need to supply the threshold number of keys to unseal. Once we supply that, the sealed status changes to false.
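The threshold behaviour comes from Shamir's secret sharing: the key is split into shares, and any threshold-sized subset can rebuild it. The sketch below is a toy illustration over a prime field, not Vault's own implementation, which differs in detail. The names `split_secret` and `recover_secret` are hypothetical, and the 5-share, 3-threshold split mirrors the defaults of `vault operator init`.

```python
import secrets

# Toy illustration of Shamir's secret sharing over a prime field.
PRIME = 2**127 - 1  # a Mersenne prime, large enough for a demo secret


def split_secret(secret, shares, threshold):
    """Return `shares` points on a random polynomial f with f(0) == secret."""
    # The polynomial has degree threshold - 1, so `threshold` points define it.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        # Horner's rule, reduced modulo PRIME at every step.
        acc = 0
        for coeff in reversed(coeffs):
            acc = (acc * x + coeff) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Rebuild f(0) from the given points with Lagrange interpolation."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


if __name__ == "__main__":
    demo_shares = split_secret(123456789, shares=5, threshold=3)
    # Any three of the five shares rebuild the secret.
    print(recover_secret(demo_shares[:3]) == 123456789)
```

This is why the order of the unseal keys does not matter, and why fewer keys than the threshold are not enough to unseal.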

Then we log in to the Vault using the Initial root token.

vault login

As seen from the process, Vault is pretty robust in terms of encryption, and as long as the Shamir keys and the initial token are handled carefully, we can ensure the security and integrity of our secrets.

And you have a pretty secure Vault engine (protected by its own Shamir keys) running on a free AWS EC2 instance (which is, in turn, guarded by the security groups)!

**Want to Connect?**  
  
If you want to connect with me, you can do so on [LinkedIn](https://www.linkedin.com/in/krishnadutt/).

Pushing Digital Transformation boundaries beyond Technology : A Radical Perspective

July 12, 2024 · 5 min read

Digital transformation is a radical re-imagination of how an organization utilizes bleeding-edge technologies to fundamentally change their business models and performance. Implementing technology in both processes and products is key to digital transformation, as it is not just about implementing new technologies, but about fundamentally changing the way an organization functions.

Digital transformation is both digital and cultural. On the digital side, it involves the implementation of new technologies and the optimization of processes and systems to take advantage of those technologies. This can include things like cloud computing, data analytics, automation, and other cutting-edge technologies.

However, digital transformation is not just about the technology. It also involves a cultural shift within the organization. This includes things like customer-centricity, agility, continuous learning, unbounded collaboration, and an appetite for risk. These cultural changes are necessary to enable the organization to stay competitive in an increasingly digital world.

Current digital transformation efforts focus on the “digital” part, implementing new technologies, but not on the “transformation” part, which is about how we function as individuals in a social setting. Lots of companies adopt the cool ideas of DevOps without fundamentally changing how they work.

Why should we care about Digital Transformation:

As technology advances, it is essential to change the way we live, work, and do business. Organizations that fail to adapt to these changes will struggle to stay competitive and may eventually be left behind.

Digital transformation is not just about implementing new technologies, but about fundamentally changing the way an organization functions.

This includes optimizing processes, products, systems, and organizational structure to take advantage of the latest technologies. By embracing digital transformation, organizations can improve their business performance, reduce costs, and increase efficiency.

Digital transformation can also help organizations improve their customer experience. By using technology to collect and analyze data, organizations can better understand their customers and provide personalized, relevant products and services. This can lead to increased customer satisfaction and loyalty.

Characteristics for Digital transformation culture:

Customer-centricity: In the past, organizations would implement the same transformation strategies for all of their customers. However, this one-size-fits-all approach is no longer effective. Today’s organizations must consider each customer’s unique vision and goals, and create personalized transformation strategies that align with those goals.
Agility: In a rapidly-changing world, organizations must be able to pivot quickly and adapt to new situations. Hierarchical structures, while useful for reliability, can be a hindrance to agility. As such, many organizations are adopting agile methodologies and flattening their hierarchies to enable faster decision-making and response times.
Continuous learning: As technology and the world around us change rapidly, organizations must be able to adapt and learn new skills and knowledge. This requires a culture of curiosity and a willingness to try new things. Organizations are hiring and working with individuals who are open to new ideas and ready to build new products and services.
Unbounded collaboration: In the past, teams within organizations would often work in silos, with limited communication and collaboration across teams. Today, organizations are fostering cultures that encourage and incentivize cross-team collaboration. This cross-functional knowledge sharing leads to more innovation and better results.
Appetite for risk: Many of the most exciting innovations are created at the edge of what is currently known. This requires organizations to venture into unknown territories, which can be risky. However, by fostering a culture of intelligent failure (failures that occur when trying to do new things) and minimizing preventable failures (failures due to sloppy work), organizations can improve their appetite for risk and drive innovation.

Actions to instill cultural change towards digital transformation:

Communicate the importance and benefits of digital transformation: Employees may be resistant to change, especially if they do not understand why it is necessary. By communicating the importance and benefits of digital transformation, organizations can help employees understand why it is necessary and how it will benefit the organization and its customers.
Encourage and reward experimentation: Digital transformation requires a culture of continuous learning and experimentation. Organizations should encourage employees to try new things and should reward them for their efforts, even if those efforts don’t always lead to success.
Foster collaboration and knowledge sharing: Digital transformation often involves cross-functional collaboration and the sharing of knowledge and expertise across teams. Organizations should foster a culture that encourages and incentivizes collaboration and knowledge sharing.
Provide training and support: Digital transformation can be a daunting process, especially for employees who are not familiar with the latest technologies. Organizations should provide training and support to help employees learn new skills and adapt to the changes brought about by digital transformation.
Create a positive, inclusive culture: Digital transformation can be stressful and disruptive, especially for employees who may feel threatened by the changes it brings. Organizations should strive to create a positive, inclusive culture that supports and empowers employees during the transformation process.

The future of digital transformation is uncertain, but it is likely that technology will continue to advance and play an increasingly important role in our lives and in business. Organizations must continue to embrace digital transformation in order to stay competitive and adapt to the changing digital landscape. By taking steps to improve the social culture around digital transformation, organizations can make it easier for employees to adapt to the changes brought about by digital transformation.

Developing Real-time resource monitoring via email on AWS using Terraform

July 12, 2024 · 4 min read

One of the main tasks of an SRE is to maintain the infrastructure built for deploying the application. As each service exposes its logs in a different way, we need a plethora of SNS topics and Lambda functions to monitor the infrastructure. This increases the cost of monitoring, which can compel management to drop the monitoring system.

But what if I say that we can build this monitoring system for less than 24 cents? And what if I say that you can deploy the entire monitoring system with a single command, “terraform apply”? Sounds like something you would like to know? Hop on the Terraform ride!

Key components to build the infrastructure

To create a monitoring system that sends email alerts, we need three components:

EventBridge
SNS
Email subscription

We can build a rudimentary monitoring system by combining these components. But the logs we get by email would look like the following:

{
  "version": "1.0",
  "timestamp": "2022-02-01T12:58:45.181Z",
  "requestContext": {
    "requestId": "a4ac706f-1aea-4b1d-a6d2-5e6bb58c4f3e",
    "functionArn": "arn:aws:lambda:ap-south-1:498830417177:function:gggg:$LATEST",
    "condition": "Success",
    "approximateInvokeCount": 1
  },
  "requestPayload": {
    "Records": [
      {
        "eventVersion": "2.1",
        "eventSource": "aws:s3",
        "awsRegion": "ap-south-1",
        "eventTime": "2022-02-01T12:58:43.330Z",
        "eventName": "ObjectCreated:Put",
        "userIdentity": {
          "principalId": "A341B33DQLH0UH"
        },
        "requestParameters": {
          "sourceIPAddress": "43.241.67.169"
        },
        "responseElements": {
          "x-amz-request-id": "GX86AGXCNXB5ZYVQ",
          "x-amz-id-2": "CPVpR8MNcPsNBzxcF8nOFqXbAIU60/zQlNC6njLp+wNFtC/ZnZF0SFhfMuhLOSpEqMFvvPqLA+tyvaXJSYMXAByR5EuDM0VF"
        },
        "s3": {
          "s3SchemaVersion": "1.0",
          "configurationId": "09dae0eb-9352-4d8a-964f-1026c76a5dcc",
          "bucket": {
            "name": "sddsdsbbb",
            "ownerIdentity": {
              "principalId": "A341B33DQLH0UH"
            },
            "arn": "arn:aws:s3:::sddsdsbbb"
          },
          "object": {
            "key": "[variables.tf]",
            "size": 402,
            "eTag": "09ba37f25be43729dc12f2b01a32b8e8",
            "sequencer": "0061F92E834A4ECD4B"
          }
        }
      }
    ]
  },
  "responseContext": {
    "statusCode": 200,
    "executedVersion": "$LATEST"
  },
  "responsePayload": "binary/octet-stream"
}

Not so easy to read, right? What if we could improve it, making it legible enough for anyone to understand what is happening?

To make it easy to read, we use an EventBridge feature called the input transformer, together with its input template. This feature lets us transform the log into our desired format without using any Lambda function.

Infrastructure Working

The way our infrastructure works is as follows:

Our EventBridge rule collects the events from the AWS account, using an event filter.
Once collected, the events are sent to the input transformer, which parses them and reads out the components we want.
After this, we use the parsed data to build our desired format with the input template.

Input transformer and input template for the EventBridge rule

This transformed data is published to the SNS topic that we have created.
We create a subscription for this SNS topic via email, SMS, or HTTP.
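To make the transformer step concrete, here is a minimal Python sketch of what an input transformer does conceptually: pull values out of the event by path, then substitute them into the template. It is a simplification and not EventBridge's implementation; it only handles plain dotted paths, renders missing values as empty strings, and the function names and the sample event are hypothetical.

```python
import json
import re


def resolve_path(event, path):
    """Resolve a simple dotted path such as "$.detail.state"; None if missing."""
    value = event
    keys = path[2:].split(".") if path.startswith("$.") else path.split(".")
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def render(event, input_paths, template):
    """Fill every <name> placeholder in the template from the event."""
    values = {name: resolve_path(event, path) for name, path in input_paths.items()}

    def substitute(match):
        value = values.get(match.group(1))
        if value is None:
            return ""  # simplification: missing values become empty strings
        return value if isinstance(value, str) else json.dumps(value)

    return re.sub(r"<([A-Za-z0-9_-]+)>", substitute, template)


if __name__ == "__main__":
    # Hypothetical event, shaped like an EC2 state-change notification.
    event = {
        "source": "aws.ec2",
        "detail-type": "EC2 Instance State-change Notification",
        "resources": ["arn:aws:ec2:ap-south-1:111122223333:instance/i-0123456789abcdef0"],
        "detail": {"state": "stopped"},
    }
    paths = {
        "Source": "$.source",
        "detail-type": "$.detail-type",
        "resources": "$.resources",
        "state": "$.detail.state",
        "status": "$.detail.status",
    }
    template = (
        "Resource name : <Source> , Action name : <detail-type>, "
        "details : <status> <state>, Arn : <resources>"
    )
    print(render(event, paths, template))
```

The email then carries one readable line per event instead of the raw JSON document.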

And voila! Your infrastructure is ready to report the changes!

Here is the entire terraform code:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}

# Inputs referenced below as var.account and var.email
variable "account" {
  type = string # your AWS account number
}

variable "email" {
  type = string # address that receives the alerts
}

# Configure the AWS Provider
provider "aws" {
  region = "ap-south-1" # insert your region code
}

resource "aws_cloudwatch_event_rule" "eventtosns" {
  name = "eventtosns"
  event_pattern = jsonencode(
    {
      account = [
        var.account, # insert your account number
      ]
    }
  )
}

resource "aws_cloudwatch_event_target" "eventtosns" {
  # arn of the target and rule id of the event rule
  arn  = aws_sns_topic.eventtosns.arn
  rule = aws_cloudwatch_event_rule.eventtosns.id

  input_transformer {
    input_paths = {
      Source      = "$.source",
      detail-type = "$.detail-type",
      resources   = "$.resources",
      state       = "$.detail.state",
      status      = "$.detail.status"
    }
    input_template = "\"Resource name : <Source> , Action name : <detail-type>, details : <status> <state>, Arn : <resources>\""
  }
}

resource "aws_sns_topic" "eventtosns" {
  name = "eventtosns"
}

resource "aws_sns_topic_subscription" "snstoemail_email-target" {
  topic_arn = aws_sns_topic.eventtosns.arn
  protocol  = "email"
  endpoint  = var.email
}

# aws_sns_topic_policy.eventtosns:
resource "aws_sns_topic_policy" "eventtosns" {
  arn = aws_sns_topic.eventtosns.arn
  policy = jsonencode(
    {
      Id = "default_policy_ID"
      Statement = [
        {
          Action = [
            "SNS:GetTopicAttributes",
            "SNS:SetTopicAttributes",
            "SNS:AddPermission",
            "SNS:RemovePermission",
            "SNS:DeleteTopic",
            "SNS:Subscribe",
            "SNS:ListSubscriptionsByTopic",
            "SNS:Publish",
            "SNS:Receive",
          ]
          # IAM policy JSON expects Condition = { operator = { key = value } }
          Condition = {
            StringEquals = {
              "AWS:SourceOwner" = var.account
            }
          }
          Effect = "Allow"
          Principal = {
            AWS = "*"
          }
          Resource = aws_sns_topic.eventtosns.arn
          Sid      = "__default_statement_ID"
        },
        {
          Action = "sns:Publish"
          Effect = "Allow"
          Principal = {
            Service = "events.amazonaws.com"
          }
          Resource = aws_sns_topic.eventtosns.arn
          Sid      = "AWSEvents_lambdaless_Idcb618e86-b782-4e67-b507-8d10aaca5f09"
        },
      ]
      Version = "2008-10-17"
    }
  )
}

This entire infrastructure can be deployed by running terraform apply on the code above.


Vulnerability Identification of Images and Files using SBOM with Trivy

July 12, 2024 · 7 min read

As we embrace digital progress and shift to using containers for our applications, we’re dealing with a variety of core images that form the basis of our apps. These images vary based on our app’s needs, given the different ways we build them. Unfortunately, in companies with a moderate level of DevOps experience (maturity levels 2 to 3), the focus often leans heavily towards getting new features out quickly, sometimes overlooking security.

For us developers, the challenge is keeping up with the ever-changing world of software. This includes the extra pieces they need (dependencies), the building blocks (components), and the tools (libraries) that make them work together across our whole system. All these parts combined are what we call the Software Bill of Materials (SBOM). To tackle this, we need to understand the SBOM of each image before we use it as a foundation for our apps. But doing this manually is complex and time-consuming. Plus, every time we want to update to a newer image version, we have to go through the whole process again, slowing down our ability to deliver our products.

The solution? Automation. By using automation, we can navigate the intricate process of understanding the SBOM. It helps us make organized lists of all the parts, find any known problems, and suggest ways to fix them — all done with tools like Trivy. Before we dive into the detailed steps, let’s make sure we’re all on the same page about what the SBOM really means.

What is SBOM?

SBOM stands for Software Bill of Materials. Think of it like a detailed list of all the parts that make up our software. This list includes things like tools from other people, building blocks, and even the rules that guide how they’re used, all with exact version details.

SBOM is important because it gives us a big picture of all the different parts that come together to create a specific software. This helps our DevSecOps teams find out if there are any possible risks, understand how they could affect us, and take steps to fix them. All of this makes our software strong and secure.

Why create SBOM?

Finding and Managing Outside Parts: SBOM helps us see all the software we use from others. It shows us different versions and even points out any possible security issues. With this info, we can make smart choices about what we use, especially when it comes to libraries and tools from other sources.
Making Our Supply Chain Secure: SBOM acts like a detailed map for our software. This map helps us make sure everything is safe and guards against any tricks or attacks on our software supply chain. We can even use SBOM to check if the people we get our software from follow good security rules.

SBOM format

Software Package Data Exchange (SPDX): This open standard serves as a software bill of materials (SBOM), identifying and cataloging components, licenses, copyrights, security references, and other metadata related to software. While its primary purpose is to enhance license compliance, it also contributes to software supply-chain transparency and security improvement.
Software Identification Tags (SWID): These tags contain descriptive details about a specific software release, including its product and version. They also specify the organizations and individuals involved in producing and distributing the product. These tags establish a product’s lifecycle, from installation on an endpoint to deletion.
CycloneDX (CDX): The CycloneDX project establishes standards in XML, JSON, and protocol buffers. It offers an array of official and community-supported tools that either generate or work with this standard. While similar to SPDX, CycloneDX is a more lightweight alternative.

In addition to creating an SBOM, Trivy can scan an SBOM generated either by Trivy or by other tools to identify vulnerabilities and their severity. Beyond detection, it also suggests possible fixes for the identified vulnerabilities.

SBOM with Trivy

Trivy is a comprehensive security scanner built and maintained by the Aqua Security team. It is reliable and fast, and it has several built-in scanners for different security issues and for the different targets where those issues can be found. It can scan for the following:

OS packages and software dependencies in use (SBOM)
Known vulnerabilities (CVEs)
IaC misconfigurations
Sensitive information and secrets

Today we will focus on the SBOM capabilities of Trivy. In this tutorial, we will do the following:

First, create the SBOM using Trivy
Then, analyze the created SBOM to find the vulnerabilities

For the sake of the demo, we will use one of the most used Docker images, NGINX, particularly nginx:1.21.5. We will take its Docker Hub image and run the scanner to generate the SBOM. Once the SBOM is generated, we will use it to get the list of vulnerabilities and possible fixes for them.

Generating SBOM

To generate the SBOM, make sure Trivy is installed on your workbench. If it is not, you can install it using the commands in this link. In my case, my PC is Debian-based, so I installed it using the following commands:

sudo apt-get install wget apt-transport-https gnupg lsb-release
wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | gpg --dearmor | sudo tee /usr/share/keyrings/trivy.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb $(lsb_release -sc) main" | sudo tee -a /etc/apt/sources.list.d/trivy.list
sudo apt-get update
sudo apt-get install trivy

If you are using a Debian derivative, you might face an issue with “lsb_release -sc” in the command. To overcome it, replace $(lsb_release -sc) with one of the following values: wheezy, jessie, stretch, buster, trusty, xenial, bionic.

Once installed you should see the following, when you run

trivy --version

Once Trivy is installed, we can generate the SBOM for three kinds of targets: a whole directory containing the files, a single file, or a container image.

For a folder containing a group of language-specific files

trivy fs --format cyclonedx --output result.json /path/to/your_project

For a specific file

trivy fs --format cyclonedx --output result.json ./trivy-ci-test/Pipfile.lock

For a container image

trivy image --format spdx-json --output result.json nginx:1.21.5

Once started, the scanner inspects the files or image and determines which language the application is written in. It then downloads the database for that language, collects the list of libraries available in that language, and checks which of them are used in the current context.

These libraries are listed, along with the OS-level libraries, in an SBOM in the format we requested.
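To get a feel for what the generated file contains, a short Python sketch can list the components of a CycloneDX JSON SBOM, such as the one written by the `trivy fs --format cyclonedx` commands above. It assumes the standard top-level `components` array with `name` and `version` fields; the helper name `list_components` and the sample document are hypothetical, and the SPDX output of the image command has a different layout that this sketch does not cover.

```python
import json


def list_components(sbom):
    """Return sorted (name, version) pairs from a CycloneDX JSON document."""
    return sorted(
        (component.get("name", ""), component.get("version", ""))
        for component in sbom.get("components", [])
    )


if __name__ == "__main__":
    # Hypothetical, heavily trimmed CycloneDX document for illustration.
    sample = json.loads(
        """
        {
          "bomFormat": "CycloneDX",
          "components": [
            {"type": "library", "name": "requests", "version": "2.0.0"},
            {"type": "library", "name": "flask", "version": "1.0.0"}
          ]
        }
        """
    )
    for name, version in list_components(sample):
        print(f"{name} {version}")
```

For a real file, load it with `json.load(open("result.json"))` and pass the result to `list_components`.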

Listing Vulnerabilities of SBOM

Once the SBOM is generated, we can create the list of known vulnerabilities in the dependencies of the image or files, such as libraries and OS packages. In addition to identifying these, Trivy also suggests the version in which each vulnerability is fixed, making it an effective tool not only to identify vulnerabilities but also to resolve them.

In order for us to generate this list, we use the following command

trivy sbom result.json

This will generate the following list

As you can observe, we get the name of the library, the CVE number of the vulnerability, its severity (such as HIGH, MEDIUM, or LOW), the status of the fix (fixed, not fixed, or will_not_fix), the fixed version if there is one, and details about the vulnerability.

Based on this information, we can upgrade the vulnerable libraries that have a fix, understand the vulnerability level of the libraries that are not fixed, and remove them if they are not required. In addition, we have the opportunity to look into alternatives to the vulnerable libraries.
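The triage described above can be sketched in a few lines of Python. The record shape used here (`library`, `severity`, `fixed_version`) is a simplified, hypothetical one chosen for illustration and is not Trivy's output schema; the CVE-free sample data is made up as well.

```python
# Lower rank means more urgent; unknown severities sort last.
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def triage(findings):
    """Split findings into (upgrade, review), each sorted by severity."""
    upgrade, review = [], []
    for finding in findings:
        # A known fixed version means we can simply upgrade the library.
        if finding.get("fixed_version"):
            upgrade.append(finding)
        else:
            review.append(finding)

    def rank(finding):
        return SEVERITY_ORDER.get(finding.get("severity"), len(SEVERITY_ORDER))

    return sorted(upgrade, key=rank), sorted(review, key=rank)


if __name__ == "__main__":
    # Hypothetical findings for illustration only.
    findings = [
        {"library": "libfoo", "severity": "LOW", "fixed_version": "1.2.1"},
        {"library": "libbar", "severity": "HIGH", "fixed_version": ""},
        {"library": "libbaz", "severity": "HIGH", "fixed_version": "3.0.4"},
    ]
    upgrade, review = triage(findings)
    print("upgrade:", [f["library"] for f in upgrade])
    print("review or remove:", [f["library"] for f in review])
```

Libraries in the second list are the ones to assess for removal or replacement, since no fixed version is available for them yet.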

For more information and documentation, visit this site.


Deploy and Run Hashicorp Vault With TLS Security in AWS Cloud

July 12, 2024 · 9 min read

In an ideal IaC world, all our infrastructure changes are written as code and pushed to GitHub, which triggers a CI/CD pipeline in either Jenkins or CircleCI, and the changes are reflected in our favorite public cloud. But reality is far from this, even in companies in stage four of cloud maturity. It can be for a plethora of reasons, such as the following:

The company is still in its initial stages of cloud automation
There are multiple stakeholders across different teams who are developing proofs-of-concept via console.
An ad-hoc manual hot-fix is introduced to stabilize the current production.
The user is not aware of IaC tools.

Given these reasons, different categories of drift are introduced into the system, each of which has its own remediation actions. This article explains terraform drift, its categories, remediation strategies, and tools to monitor terraform drift.

To understand these concepts better, let’s first explore what terraform drift is and how Terraform detects this drift.

What is Terraform Drift?

When we create resources (i.e., terraform apply) using Terraform, it stores information about the current infrastructure in a file named terraform.tfstate, kept either locally or in a remote backend. On each subsequent terraform apply, this file is updated with the current state of the infrastructure. But when we make manual changes via the console or CLI, those changes are applied in the cloud environment but are not recorded in the state file.

Terraform drift can be understood as the difference between the state of infrastructure defined in our Terraform code and state file and the actual state of infrastructure present in our cloud environment.

In any of the above situations, changing the infrastructure outside Terraform code leaves our Terraform state file in a very different state from the cloud environment. So the next time we apply the Terraform code, we would see a drift, which might cause Terraform to either change or destroy resources. Understanding how different kinds of drift creep into our infrastructure helps us mitigate such risks.

Types of Drifts

We can categorize the Terraform configuration drift into three categories:

Emergent drift: Drift observed when infrastructure that was initially applied via Terraform (so its state is present in the Terraform state file) is changed outside of the Terraform ecosystem.
Pseudo drift: “Changes” seen in the plan/apply cycle due to the ordering of items in a list and other provider idiosyncrasies.
Introduced drift: New infrastructure created outside of Terraform.

It is sometimes argued that introduced drift should not be considered drift, as the infrastructure is set up entirely via the console. But the idea of using Terraform is to automate infrastructure processes entirely via code, so any manual or hybrid change is considered drift.
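As a rough mental model, emergent and introduced drift can be told apart by comparing what the state file records with what actually exists in the cloud. The Python sketch below assumes both sides are available as plain dictionaries keyed by resource ID, which is a simplification for illustration; the function name `classify_drift` and the sample resources are hypothetical. Pseudo drift needs an order-insensitive comparison and is not covered here.

```python
def classify_drift(state, cloud):
    """Compare recorded state with the actual cloud resources.

    Both arguments map a resource ID to a dict of its attributes.
    """
    # Managed by Terraform, but changed or deleted outside of it.
    emergent = sorted(rid for rid, attrs in state.items() if cloud.get(rid) != attrs)
    # Exists in the cloud, but Terraform has never heard of it.
    introduced = sorted(rid for rid in cloud if rid not in state)
    return {"emergent": emergent, "introduced": introduced}


if __name__ == "__main__":
    # Hypothetical resources for illustration only.
    state = {
        "bucket-logs": {"versioning": True},
        "vm-web": {"size": "small"},
    }
    cloud = {
        "bucket-logs": {"versioning": True},
        "vm-web": {"size": "large"},       # resized via the console
        "vm-poc": {"size": "small"},       # created via the console
    }
    print(classify_drift(state, cloud))
```

In this model, `vm-web` shows up as emergent drift and `vm-poc` as introduced drift, which matches the two remediation paths discussed in the following sections.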

Managing Emergent Drift

As mentioned, emergent drift is observed when infrastructure applied and managed by Terraform is modified outside of the Terraform ecosystem. This can be managed based on the state that we prefer :

Infrastructure state preferred: If our preferred state is the one in the cloud, we change our Terraform configuration file (usually main.tf) and its dependent modules to match it, so that the next time we run terraform apply, the configuration file and the Terraform state file are in sync.
Configuration state preferred: If our preferred state is the one in our configuration file, we just run terraform apply with that configuration. This reverts the manual changes in the cloud and applies what is defined in the Terraform configuration file.
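Either choice starts with knowing whether Terraform sees a difference at all. A minimal sketch, assuming the terraform binary is on the PATH and the working directory is already initialized: with the -detailed-exitcode flag, terraform plan exits with 0 when there are no changes, 1 on error, and 2 when changes are present. The helper names below are hypothetical.

```python
import subprocess


def interpret_plan_exit_code(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a label."""
    if code == 0:
        return "no_changes"
    if code == 2:
        return "changes_present"
    return "error"


def check_for_drift(workdir: str) -> str:
    """Run a plan in workdir and report whether Terraform sees any difference."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

Calling check_for_drift on a Terraform project directory returns "changes_present" when the plan is not empty, which is the cue to pick one of the two options above.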

Managing Pseudo Drift

Pseudo drift can be observed when the ordering of certain resources, or of certain arguments for a resource, differs between the configuration file and the state file. This drift is uncommon but is occasionally observed with some providers. To understand this better, let’s take an example of creating a multi-availability zone RDS.

resource "aws_db_instance" "default" {
  allocated_storage    = 10
  engine               = "mysql"
  engine_version       = "5.7"
  instance_class       = "db.t3.micro"
  availability_zone    = ["us-east-1b", "us-east-1c", "us-east-1a"] # us-east-1a was added later
  name                 = "mydb"
  username             = "foo"
  password             = "foobarbaz"
  parameter_group_name = "default.mysql5.7"
  skip_final_snapshot  = true
}

Initially, we only wanted us-east-1b and us-east-1c, but later added us-east-1a. When we applied this configuration, it ran successfully. Being the careful SREs that we are, we run terraform plan to confirm that everything is the way we wanted. But to our surprise, the plan might show this resource being modified again, with a change on the “availability zone” line. Even if we apply again, the same change can keep showing up in subsequent terraform plan and apply cycles.

To manage this, we run terraform show, which prints the current state. Locate the availability zone argument and note the order in which its values appear in the list. Copy the values in that order into the Terraform configuration file, and you should be good to go.
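The reordering step can be sketched as a small helper. This assumes the list from the configuration and the list printed by terraform show have already been copied into Python lists by hand; the function name is hypothetical.

```python
from typing import List


def align_to_state_order(config_values: List[str], state_values: List[str]) -> List[str]:
    """Reorder config_values so they follow the order recorded in the state.

    Values present in the configuration but not yet in the state keep their
    relative order and are appended at the end.
    """
    in_config = set(config_values)
    ordered = [value for value in state_values if value in in_config]
    placed = set(ordered)
    ordered += [value for value in config_values if value not in placed]
    return ordered


config_zones = ["us-east-1b", "us-east-1c", "us-east-1a"]
state_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
aligned = align_to_state_order(config_zones, state_zones)
```

The aligned list is what goes back into the configuration file, so that the next plan no longer reports an ordering-only change.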

Managing Introduced Drift

Introduced drift happens when new infrastructure is provisioned in the cloud outside the Terraform ecosystem. This is the most gruesome drift and requires a conscientious effort from the engineer to detect and handle, as there is no trace of these changes in the Terraform state file. Unless we go through each resource in the console, read the CloudWatch logs, check the billing console, or learn about it from the person who made the change, this drift is quite difficult to detect. It can also happen when we run terraform destroy and some resources fail to be destroyed.

If we can identify the manually provisioned resource, there are two approaches, depending on the environment it is in:

Provisioning anew: If the resource is not in a production-grade environment, it is recommended that we destroy it and then create a module for it within our Terraform configuration. This way, the infrastructure is logged, tracked, and monitored via the Terraform state file, and all the resources are created via Terraform.
Terraform import: If the resource is in a production-grade environment, it is difficult to create it anew. In this case, we bring the resource under Terraform management with terraform import, which records the existing resource in the state file. We then make sure the Terraform configuration file contains a resource block that matches it (terraform show displays the imported attributes), so that subsequent plans report the same configuration as the state present in the cloud.
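The two approaches can be summarized as a short decision helper. This is only a sketch: the function name, the resource address aws_instance.web, and the instance id are hypothetical, and the only real CLI shape relied on is terraform import ADDRESS ID. For the import path, a matching resource block is assumed to already exist in the configuration.

```python
import shlex
from typing import List


def remediation_steps(is_production: bool, address: str, resource_id: str) -> List[List[str]]:
    """Return the Terraform commands to run for a manually created resource."""
    if not address or not resource_id:
        raise ValueError("both a resource address and a cloud resource id are required")
    if is_production:
        # Keep the live resource: record it in the state, then confirm with a plan
        return [
            ["terraform", "import", address, resource_id],
            ["terraform", "plan"],
        ]
    # Non-production: the manual resource is deleted by hand first, then
    # recreated from the configuration
    return [["terraform", "apply"]]


for command in remediation_steps(True, "aws_instance.web", "i-0123456789abcdef0"):
    print(shlex.join(command))
```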

Drift Identification and Monitoring

All this drift management is possible only when we can detect that there is a drift. In the case of emergent and pseudo drift, we can identify them using the terraform plan command, which compares the configuration and the current state file with the resources in the cloud (previously created with Terraform). But this fails in the case of introduced drift, as there is no state for resources created outside the Terraform ecosystem. So it serves us better to detect this kind of drift beforehand and to automate the detection. This detection can be done using two tools:

CloudQuery

If you prefer a data-centric approach with a visualization dashboard, this solution is for you. CloudQuery is an open source tool that compares the state file with the resources in our chosen cloud provider, then formats and loads this data into a PostgreSQL database. Because the drift detection results are stored in PostgreSQL with a column marking each resource as managed or unmanaged, we can use this flag as a filter in our favorite dashboard solution, such as Tableau or Power BI, to monitor infrastructure state drift. (For more information, refer to https://www.cloudquery.io/docs/cli/commands/cloudquery.)

providers:
  # provider configurations
  - name: aws
    configuration:
      accounts:
        - id: <UNIQUE ACCOUNT IDENTIFIER>
          # Optional. Role ARN we want to assume when accessing this account
          # role_arn: <YOUR_ROLE_ARN>
          # Named profile in config or credential file from where CQ should grab credentials
          local_profile: default
      # By default assumes all regions
      regions:
        - us-east-1
        - us-west-2
      # The maximum number of times that a request will be retried for failures.
      max_retries: 5
      # The maximum back off delay between attempts. The backoff delays exponentially with a jitter based on the number of attempts. Defaults to 30 seconds.
      max_backoff: 20
    # list of resources to fetch
    resources:
      - "*"
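The managed/unmanaged flag described above lends itself to a simple aggregate query for a dashboard. The sketch below is an illustration only: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the table name drift_results and its columns are hypothetical, not CloudQuery's actual schema.

```python
import sqlite3
from typing import Dict

# In-memory stand-in for the PostgreSQL database; schema is hypothetical
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drift_results (resource_id TEXT, resource_type TEXT, managed INTEGER)"
)
conn.executemany(
    "INSERT INTO drift_results VALUES (?, ?, ?)",
    [
        ("i-1", "aws_instance", 1),
        ("i-2", "aws_instance", 0),
        ("bucket-a", "aws_s3_bucket", 0),
        ("bucket-b", "aws_s3_bucket", 0),
    ],
)


def unmanaged_by_type(connection: sqlite3.Connection) -> Dict[str, int]:
    """Count resources that are not managed by Terraform, grouped by type."""
    rows = connection.execute(
        "SELECT resource_type, COUNT(*) FROM drift_results "
        "WHERE managed = 0 GROUP BY resource_type ORDER BY resource_type"
    ).fetchall()
    return dict(rows)


counts = unmanaged_by_type(conn)
```

A dashboard tool would run an equivalent query on a schedule and chart the counts over time.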

Driftctl

If you are more of a CLI person who loves working in the terminal, this tool is for you. Driftctl helps us track and detect both managed and unmanaged drift with a single command.

Since this is a CLI-based tool, it can be easily integrated into a CI/CD pipeline such as a Jenkins pipeline, and the results can be pushed as output to the PR in GitHub. If that is not your cup of coffee, run it as a cron job within your system. Create a log group to collect the logs, and then use log monitoring solutions such as Fluentd or Prometheus/Grafana to visualize them and create alerts. For more information, read https://docs.driftctl.com/0.35.0/installation.

# To scan a local state file
driftctl scan
# To scan a state file stored in an AWS S3 backend
driftctl scan --from tfstate+s3://my-bucket/path/to/state.tfstate

Conclusion

It is always better to prevent drift from creeping into our infrastructure than to create remediations after it has crept in. Finally, I would suggest that it always pays to follow better coding practices:

Always try to build automated infrastructure. Even if you perform manual steps, import the resulting resources into your Terraform configuration and then apply it.
Write and apply code incrementally.
Implement a drift-tracking system with custom alerting that mails the SRE about any infra drift observed.
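The alerting idea in the last point can be sketched with Python's standard library. The function name, the addresses, and the shape of the summary dictionary are assumptions for illustration; the message would be handed to smtplib.SMTP(...).send_message() with your own mail server details.

```python
from email.message import EmailMessage
from typing import Dict


def build_drift_alert(summary: Dict[str, int], sender: str, recipient: str) -> EmailMessage:
    """Build an email that reports how many resources drifted, per drift type."""
    total = sum(summary.values())
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"[infra-drift] {total} drifted resource(s) detected"
    lines = [f"{kind}: {count}" for kind, count in sorted(summary.items())]
    message.set_content("Drift summary\n" + "\n".join(lines))
    return message


# Hypothetical counts produced by a drift scan
alert = build_drift_alert(
    {"emergent": 2, "introduced": 1},
    sender="drift-bot@example.com",
    recipient="sre@example.com",
)
```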

**Liked my content?** Feel free to reach out on [LinkedIn](https://www.linkedin.com/in/krishnadutt/) for interesting content and productive discussions.

Tfblueprintgen: A Tool to Simplify Terraform Folder Setup and Provide Base Resource Modules

July 12, 2024 · 4 min read

Whether it’s a React front-end app, a Go CLI tool, or a UI design project, there is always initial toil in figuring out the optimal folder structure. As this initial decision strongly influences how flexible, extensible, self-explanatory, and maintainable our projects are, it is a key decision for ensuring a smooth developer experience.

When working with a new tool, technology, or framework, our journey typically starts with reading the “getting started” handbook on its official website, or some articles on the same topic. We use these resources to get our hands dirty, often using their structure as a foundation for more complex real-world projects. But these articles and tutorials serve us well only in the initial phases of a project, when the complexity is low. When we are solving a complex problem involving multiple actors, legibility and maintainability take precedence, and it becomes a daunting task to refactor later, or sometimes to rewrite everything from scratch. To reduce this hassle and tackle the issue head on, I’ve distilled my Terraform experience into a CLI tool. This tool generates a battle-tested folder structure along with basic modules, allowing us to hit the ground running.

Structuring Terraform Folders

Most companies and their ops teams find it cumbersome to manage multiple environments across multiple regions for their applications. We can structure our Terraform folders in any of the following ways:

Folder structure organized by region
Folder structure organized by resources (like AWS EC2 or Azure Functions)
Folder structure organized by use case (like front-end app or networking)
Folder structure organized by account
Folder structure organized by environment
A hybrid of all the above

Given the above options, it becomes quite confusing for teams starting with Terraform to decide how to structure their projects. Based on my experience, here are my 2 cents on how to structure a Terraform project:

Create modular code, with each module containing all the resources required for one use case. These modules serve as base blueprints that can be reused across different environments.
For example, in the case of AWS, the front-end module should consist of CloudFront, an S3 bucket, the CloudFront origin access control, the S3 bucket policy, and the S3 bucket public access block.
Create a folder for each of the environments that you are deploying. This holds as long as the architecture and the deployment strategy do not change across environments.

Tfblueprintgen: A Terraform tool to generate folder structure and base blueprints

Based on the above postulates, I have created a CLI tool called Tfblueprintgen, which generates the folder structure along with modular working blocks for creating AWS building blocks. The generated folder structure looks something like the one below.

Image 1 : Generated Terraform folder structure with base modules

To run the tool, download the Windows or Linux binary from here, or build your own binary from here. Then run the binary (on Windows, double-click Tfblueprintgen.exe; on Linux, run ./Tfblueprintgen).

Image 2 : Running the tfblueprintgen tool

As shown in Image 1, the tool generates two things:

A parent folder which contains the main Terraform files (outputs.tf, variables.tf and main.tf) for each environment, separated into their own folders.
A modules folder which contains the different basic resources, segregated into their own folders.

These modules can be used within each of the environment folders by calling them with a module block, and can then be applied using “terraform apply”.
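To make the layout tangible, here is a rough Python sketch that creates a comparable skeleton of empty files. It is not how Tfblueprintgen is implemented, and the directory names environments and modules, as well as the function name, are assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable, List

# The three files each environment and each module receives
TF_FILES = ("main.tf", "variables.tf", "outputs.tf")


def create_layout(root: Path, environments: Iterable[str], modules: Iterable[str]) -> List[Path]:
    """Create empty Terraform files for every environment and module under root."""
    targets = [root / "environments" / env / name for env in environments for name in TF_FILES]
    targets += [root / "modules" / module / name for module in modules for name in TF_FILES]
    for path in targets:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
    return targets
```

For example, create_layout(Path("infra"), ["dev", "prod"], ["frontend"]) would create infra/environments/dev/main.tf, infra/modules/frontend/main.tf, and so on.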

With this setup, you can hit the ground running in no time. Feel free to add more ideas as issues, and stay tuned to the project.

Liked my content? Feel free to reach out on LinkedIn for interesting content and productive discussions.

From Scratch to Brew: Creating Personalized Formulae using tfblueprintgen in Homebrew

July 12, 2024 · 4 min read

Homebrew has become the de facto way to get and install open source apps in the Apple ecosystem. While Homebrew has a vast repository of applications that it supports, we sometimes need to create and publish our own packages to be installed and consumed by other people. In this blog, we discuss how to create a custom brew package for an app named tfblueprintgen, how to test it, and how to install it as a Homebrew package. The blog is divided into the following sections:

Understanding tfblueprintgen
Setting up the development environment
Creating and Testing Homebrew formula
Testing it locally
Pushing it upstream and installing it from the upstream repo

Understanding tfblueprintgen

Tfblueprintgen is an open-source command-line tool developed using the Charmbracelet CLI assets. It generates a modular file structure with code for your Terraform projects, speeding up the development process. By automating the creation of boilerplate files and directory structures, Tfblueprintgen streamlines setting up new Terraform projects. To learn more about the project, refer to this.

Setting up the development environment

Once you have your application up on GitHub, package your application for release by pushing it.

Once the release is complete, we can see that our application is available in a compressed format such as .tar.gz or .zip.

First, let’s set up our dev environment:

Set HOMEBREW_NO_INSTALL_FROM_API=1 in your shell environment.
Run brew tap homebrew/core and wait for the clone to complete. If the clone fails, run the following commands before running brew tap homebrew/core again.

git config --global core.compression 0
git clone --depth 1 https://github.com/Homebrew/homebrew-core.git
git fetch --depth=2147483647
git pull --all

Once this is done, we are good to create the Homebrew formula.

Creating and Testing Homebrew formula

To create the boilerplate Homebrew formula, run “brew create <url of .tar.gz>”. In my case it is:

brew create https://github.com/krishnaduttPanchagnula/tfblueprintgen/archive/refs/tags/0.3.tar.gz

Running the above command opens a formula file for editing in vim. This file contains the following:

desc provides a brief description of the package.
homepage is left blank in this case.
url specifies the download URL for the package source code.
sha256 is the SHA-256 checksum of the package, which Homebrew uses to verify the integrity of the downloaded archive.
license declares the software license for the package.
depends_on specifies the dependencies of the current formula.
install contains the instructions for building and installing the package.
test defines a test to ensure that the package was installed correctly by checking the version output.
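The sha256 field above is simply the SHA-256 digest of the release archive, so it can be recomputed locally to double-check a formula. A minimal sketch using Python's hashlib; the function names are hypothetical and the archive is assumed to have been downloaded beforehand.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_formula(path: Path, expected_sha256: str) -> bool:
    """Check a downloaded archive against the checksum written in the formula."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

Calling matches_formula with the downloaded .tar.gz and the value from the formula returns True only when the archive is byte-for-byte the one the formula expects.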

Make changes to the install and test blocks so that they reflect the installation and testing of your application. In my case, I have made the following changes:

class Tfblueprintgen < Formula
  desc "This contains the formula for installing tfblueprintgen. tfblueprintgen cli utility developed using charmbracelet CLI assets, which generates the Modular file structure with the code for your Terraform code to speed up the development."
  homepage ""
  url "https://github.com/krishnaduttPanchagnula/tfblueprintgen/archive/refs/tags/0.3.tar.gz"
  sha256 "0ef05a67fa416691c849bd61d312bfd2e2194aadb14d9ac39ea2716ce6a834a6"
  license "MIT"
  depends_on "go" => :build

  def install
    # Build the binary from main.go and install it into Homebrew's bin directory
    system "go", "build", "-o", "tfblueprintgen", "main.go"
    bin.install "tfblueprintgen"
  end

  test do
    # The installed binary should report the packaged version
    expected_version = "Tfblueprintgen version: 0.3"
    actual_version = shell_output("#{bin}/tfblueprintgen --version").strip
    assert_match expected_version, actual_version
  end
end

Once the formula is defined, install it using the following command:

brew install tfblueprintgen.rb

This command downloads the package source, builds it according to the formula instructions (defined in the install function), and installs the resulting binary. To test the installed binary, run brew test followed by the formula name. In my case the command is:

brew test tfblueprintgen

If the tests go well, you should see the test process run without any errors.

Pushing it upstream and installing it from the upstream repo

Once the formula has been written and tested, it is time to publish it. Create a repository whose name starts with the prefix “homebrew-”, which in my case is “homebrew-tfblueprintgen”. Clone this repo locally, move your formula into that folder, and push it to GitHub.

To install the package from the tap stored on GitHub:

Run brew tap with the GitHub user name and repository name, in the form user/homebrew-name.
Then run brew install with the formula name.

In my case this is

brew tap krishnaduttPanchagnula/homebrew-tfblueprintgen 

brew install tfblueprintgen

Voila, your package can now be installed via homebrew.

Developing Visual Documentation using diagrams in python : Diagrams as Code - a novel approach for graphics

July 12, 2024 · 5 min read

As developers, we have all read the documentation for different frameworks and libraries while developing features. But when it comes to writing documentation for our own feature, we are usually in a hurry, because our sprint has ended or the project has been pushed way beyond its deadline.

In addition, when we write documentation in black ink (just reminiscing about the previous version of documentation, in literal ink!), it is sometimes very difficult to communicate complex cloud architecture or systems design via text alone. We overcome this problem using images, but then we have to leave our beloved IDEs to create them in Illustrator or Photoshop. What if I told you that we can develop awesome graphics right from our IDEs, using Python?

Introducing Diagrams, a Python library which lets you create cloud architecture diagrams using code!

Diagrams

Diagrams is a Python library that implements the Diagrams as Code (DaC) approach. It helps us build the architecture of our system as code and track changes to that architecture. It presently covers major providers such as AWS, Azure, and GCP, as well as others such as DigitalOcean and Alibaba Cloud. In addition, it supports on-premises nodes covering services such as Apache, Docker, and Hadoop.
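The core idea of describing an architecture as code can be illustrated without the library itself. The sketch below is a simplified stand-in, not the Diagrams API: it turns a list of edges into Graphviz DOT text, the graph description language that Graphviz (which Diagrams relies on for rendering) understands. The function name and node names are hypothetical.

```python
from typing import Iterable, Tuple


def to_dot(title: str, edges: Iterable[Tuple[str, str]]) -> str:
    """Render a left-to-right directed graph as Graphviz DOT text."""
    lines = [f'digraph "{title}" {{', "  rankdir=LR;"]
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(
    "resource monitoring",
    [("EventBridge", "Lambda"), ("Lambda", "SNS"), ("SNS", "Email")],
)
print(dot_text)
```

Because the diagram is plain text produced by code, a change in the architecture shows up as an ordinary diff in version control.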

Advantages of using Diagrams

Still considering whether to use diagrams or not? How about the following reasons:

No additional software overhead: To create diagrams traditionally, we would use software such as Illustrator or Photoshop, which requires additional licenses. Even if we choose open source alternatives such as Inkscape or GIMP, we still need to install them. With Diagrams there is far less overhead: just pip install diagrams (it relies on Graphviz for rendering) and you are good to go!
No need to search for high-resolution images: We want high-resolution images that can be exported to a screen of any size, and it is often a hassle to find them. Thanks to Diagrams’ built-in repository of icons, we can build high-resolution architecture diagrams with ease.
Ease of editing: Let’s say your architecture changes during the project timeline (hey, I know it happens), and changing each of the components manually takes a lot of time and effort. Thanks to the Diagrams as Code approach, we can make these changes with a few lines of code.
Reusability: Creating diagrams via code lets us replicate them without additional effort. All we need to do is import the code and, lo and behold, we have our work ready in front of us.

Now that we have seen the reasons to use it, let’s get our hands dirty working with Diagrams in a Python environment.

Diagrams Implementation example with custom node development and clustering:

Here I am going to create the diagram for a project on developing real-time resource monitoring via email on AWS using Terraform. In brief, I developed a serverless architecture that sends notifications for any state or status change in a clean, readable format (rather than in complicated JSON) in real time via an email service. This architecture is built in AWS and deployed using Terraform. For more details, read this article.

At a high level, the components involved are:

Eventbridge
SNS and
Email

The email component is not available in the Diagrams library. We can create our own email node using custom node development, where we pass a local image as a new node, using the following code.

from diagrams.custom import Custom
email = Custom('Name that you want to see', 'path of the image')

Now that we have our components ready, let’s code:

with Diagram("AWS resource monitoring via email notification") as diagram1:
    email = '/content/drive/MyDrive/gmail-new-icon-vector-34182308.jpg'
    emailicon = Custom('Email notification', email)
    Eventbridge("Event bridge rule") >> Lambda("Lambda") >> SNS('SNS') >> emailicon

By implementing the above code, we get the following:

As we have developed this in an AWS environment using Terraform, I would like to wrap the above code in clusters, using diagrams.Cluster.

with Diagram("AWS resource monitoring via email notification") as diag:
    email = '/content/drive/MyDrive/gmail-new-icon-vector-34182308.jpg'
    emailicon = Custom('Email notification', email)
    with Cluster("Terraform"):
        with Cluster('AWS'):
            Eventbridge("Event bridge rule") >> Lambda("Lambda") >> SNS('SNS') >> emailicon

After embedding it in the cluster, the final image looks like:

Final image for the entire architecture

Here is the final code in totality:

from diagrams import Cluster, Diagram
from diagrams.aws.compute import Lambda
from diagrams.aws.integration import SNS, Eventbridge
from diagrams.custom import Custom

with Diagram("AWS resource monitoring via email notification") as diag:
    # Local image used as the icon for the custom email node
    email = '/content/drive/MyDrive/gmail-new-icon-vector-34182308.jpg'
    emailicon = Custom('Email notification', email)
    # Nested clusters: the AWS resources sit inside the Terraform boundary
    with Cluster("Terraform"):
        with Cluster('AWS'):
            Eventbridge("Event bridge rule") >> Lambda("Lambda") >> SNS('SNS') >> emailicon

Follow me on Medium and GitHub for more cloud and DevOps related content.

Happy Learning and Good Day..!
