
5 posts tagged with "Hashicorp"

View All Tags

Deploy and Run Hashicorp Vault With TLS Security in AWS Cloud

· 9 min read

Often in software engineering, when developing new features, we need to embed certain sensitive information, such as passwords, secret keys, or tokens, for our code to do its intended job. Different professionals within the IT realm handle such secrets in different ways:

  • Developers use secrets such as API tokens, database credentials, or other sensitive information within the code.
  • DevOps engineers might have to export certain values as environment variables and write them into YAML files for CI/CD pipelines to run.
  • Cloud engineers might have to pass credentials, secret tokens, and other secret information to access their respective cloud. (In the case of AWS, even if we save these in a credentials file, we still have to pass the filename in the Terraform block, which means the credentials are available locally on the machine.)
  • System administrators might have to send different logins and passwords to employees so they can access different services.

But writing or sharing secrets in plain text is a real security problem: anyone with access to the code base can read the secret, or an attacker could mount a man-in-the-middle attack while it is being shared. To counter this, developers have options like importing secrets from another file (YAML, .py, etc.) or exporting them as environment variables. But both still have a problem: a person with access to a single config file or to the machine can simply echo (read: print) the password. Given these problems, it would be very useful to deploy a single solution that serves all the IT professionals mentioned above and more. This is the ideal place to introduce Vault.
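To see why environment variables alone are weak protection, note that anyone or anything sharing the shell environment can print them back out (the variable name here is just an illustration):

```shell
# Export a secret the way many CI/CD setups do
export DB_PASSWORD='s3cr3t-value'

# Any user or process sharing this environment can read it back
echo "$DB_PASSWORD"
```

Vault's value is precisely that secrets are fetched on demand from a sealed, access-controlled store rather than sitting in files or environments.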

HashiCorp Vault — an Introduction

HashiCorp Vault is a secrets and encryption management system based on user identity. If we have to compare it with AWS, it is like an IAM user-based resource (read Vault here) management system which secures your sensitive information. This sensitive information can be API encryption keys, passwords, and certificates.

Its workflow can be visualized as follows:

Hosting Cost of Vault

  • Local hosting: This is usually done when the secrets are accessed only by local users or during the development phase; it should be avoided if the secrets engines have to be shared with other people. As everything stays within the local development environment, there is no additional investment for deployment. Vault can be hosted directly on a local machine or via its official Docker image.
  • Public cloud hosting (EC2 in AWS / Virtual Machine in Azure): If the idea is to set up Vault to share with people across different regions, hosting it on a public cloud is a good idea. Although we can achieve the same with on-prem servers, the upfront costs and scalability are quite a hassle. In the case of AWS, we can easily secure the endpoint by hosting Vault on an EC2 instance and creating a security group controlling which IPs can access the EC2. If you feel more adventurous, you can map this to a domain name and route it from Route 53, so the Vault is accessible on a domain as a service to end users. For EC2 hosting with an AWS-defined domain, the cost is $0.0116/hr.
  • Vault cloud hosting (HashiCorp Cloud Platform): If you don't want to set up infrastructure in a public cloud environment, there is the option of the cloud hosted by HashiCorp. Think of it as a SaaS platform that lets us use Vault as a service on a subscription basis. Since HashiCorp itself manages the cloud, we can expect a consistent user experience. For cost, it has three production-grade options: Starter at $0.50/hr, Standard at $1.58/hr, and Plus at $1.84/hr (as seen in July 2022).

Example of Self-Hosting in AWS Cloud

Our goal in this project is to create a Vault instance in EC2 and store static secrets in the Key-Value secrets engine. These secrets are later retrieved by a Terraform script, which, when applied, pulls the secrets from the Vault secrets engine and uses them to create infrastructure in AWS.

To create a ready-to-use Vault, we are going to follow these steps:

  1. Create an EC2 Linux instance with ssh keys to access it.
  2. SSH into the instance and install the Vault to get it up and running
  3. Configure the Vault server and its secrets engine

Step 1: Create an EC2 Linux instance with ssh keys to access it

To create an EC2 instance and access it remotely via SSH, we need to create a key pair. First, let's create an SSH key via the AWS console.

Once the keys have been created and downloaded to the local workstation, we create an EC2 (t2.micro) Linux instance and associate it with the above keys. The instance size can be selected based on your requirements, but usually a t2.micro is more than enough.

Step 2: SSH into the instance and install Vault to get it up and running

Once the status of the EC2 changes to running, open the directory in which you have saved the SSH (.pem) key. Open a terminal and type ssh -i <keyname.pem> ec2-user@<public DNS/IPv4>. Once we have established a successful SSH session into our EC2 instance, we can install Vault using the following commands:

wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg

echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list

sudo apt update && sudo apt install vault

The above commands install Vault in the EC2 environment. The second command is known to sometimes throw errors; in that case, replace $(lsb_release -cs) with the codename of your Ubuntu release (e.g., "jammy"). [This entire process can be automated by copying the above commands into the EC2 user data when creating the instance.]
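As a sketch of that user-data automation (this script is an assumption, not from the original article; user data runs as root at first boot, so sudo is unnecessary):

```shell
#!/bin/bash
# Hypothetical EC2 user-data script automating the install commands above.
# Runs once, as root, when the instance first boots.
wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | tee /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | tee /etc/apt/sources.list.d/hashicorp.list
apt update && apt install -y vault
```

Paste this into the "User data" field of the EC2 launch wizard, and Vault is ready by the time you first SSH in.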

Step 3: Configure the HashiCorp Vault

Before initializing the Vault, let's ensure it is properly installed by running the following command:

vault

Let's make sure there is no environment variable called VAULT_TOKEN. To do this, use the following command:

$ unset VAULT_TOKEN

Once Vault is installed, we need to configure it, which is done using HCL files. These HCL files contain data such as the storage backend, listeners, cluster address, UI settings, etc. As discussed in Vault's architecture, the backend on which the data is stored is separate from the Vault engine and must persist even when the Vault is sealed (it is a stateful resource). In addition, we need to specify the following details:

  • Listener ports: the port(s) on which the Vault listens for API requests.
  • API address: Specifies the address to advertise for routing client requests.
  • Cluster address: Indicates the address and port used for communication between the Vault nodes in a cluster. To secure the setup further, we can enable TLS-based communication; this step is optional. The TLS certificate can be generated using openssl on Linux.
# Installs openssl
sudo apt install openssl

#Generates TLS Certificate and Private Key
openssl req -newkey rsa:4096 -x509 -sha512 -days 365 -nodes -out certificate.pem -keyout privatekey.pem

Insert the TLS Certificate and Private Key file paths in their respective arguments in the listener “tcp” block.

  • tls_cert_file: Specifies the path to the certificate for TLS in PEM encoded file format.
  • tls_key_file: Specifies the path to the private key for the certificate in PEM-encoded file format.
# Configuration in the config.hcl file

storage "raft" {
  path    = "./vault/data"
  node_id = "node1"
}

listener "tcp" {
  address       = "127.0.0.1:8200"
  tls_disable   = "true" # set to "false" to serve TLS with the files below
  tls_cert_file = "certificate.pem"
  tls_key_file  = "privatekey.pem"
}

disable_mlock = true
api_addr      = "http://127.0.0.1:8200"
cluster_addr  = "https://127.0.0.1:8201"
ui            = true

Once these are created, we create the folder where our backend will rest: vault/data.

mkdir -p ./vault/data

Once done, we can start the vault server using the following command:

vault server -config=config.hcl

This starts our Vault instance with the backend and settings specified in the config file. Next, point the CLI at the server and initialize the Vault:

export VAULT_ADDR='http://127.0.0.1:8200'

vault operator init

After initialization, Vault prints five unseal keys (Shamir key shares, three of which are needed to unseal the Vault under the default settings) and an initial root token. This is the only time Vault ever displays all of this data, so these details must be saved securely in order to unseal the Vault later. In practice, the Shamir key shares should be distributed among key stakeholders in the project, and the key threshold set so that the Vault can only be unsealed when a majority are in consensus.

Once we have saved these keys and the initial root token, we need to unseal the Vault:

vault operator unseal

Here we need to supply the threshold number of keys to unseal. Once we supply that, the sealed status changes to false.
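Putting the initialize-and-unseal sequence together as a sketch (the share and threshold counts shown are Vault's defaults, passed explicitly here; the key values are placeholders you paste from the init output):

```shell
# Initialize: 5 key shares, any 3 of which can unseal the Vault
vault operator init -key-shares=5 -key-threshold=3

# Supply any three of the five unseal keys (placeholders shown)
vault operator unseal <unseal-key-1>
vault operator unseal <unseal-key-2>
vault operator unseal <unseal-key-3>

# Confirm: the "Sealed" field should now report false
vault status
```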

Then we log in to the Vault using the Initial root token.

vault login

Once authenticated, you can explore the different secrets engines: for example, the Transit secrets engine, which encrypts data in transit, or the Key-Value (KV) secrets engine, which securely stores key-value pairs such as passwords and credentials.
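As a quick sketch of the KV engine (the mount path, secret path, and values here are illustrative, not from the article):

```shell
# Enable a KV version 2 secrets engine at the path "secret"
vault secrets enable -path=secret kv-v2

# Store an illustrative secret, then read it back
vault kv put secret/myapp/db username=appuser password=example-pass
vault kv get secret/myapp/db
```

These are the static secrets that the Terraform script mentioned earlier would later pull out via the Vault provider.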

As seen from the process, Vault is robust in terms of encryption, and as long as the Shamir keys and initial root token are handled carefully, we can ensure the security and integrity of our secrets.

And you have a pretty secure Vault engine (protected by its own Shamir keys) running on a free-tier AWS EC2 instance (which is, in turn, guarded by its security group)!

**Want to Connect?**

If you want to connect with me, you can do so on [LinkedIn](https://www.linkedin.com/in/krishnadutt/).


Developing Real-time resource monitoring via email on AWS using Terraform

· 4 min read

One of the main tasks of an SRE is to maintain the infrastructure developed for deploying the application. As each service exposes its logs in a different way, we need a plethora of SNS topics and Lambdas to monitor the infrastructure. This increases the cost of monitoring, which may compel management to drop the monitoring system altogether.

But what if I said we could build this monitoring system for less than 24 cents? And what if I said you could deploy the entire system with a single command, terraform apply? Sounds like something you would like to know? Hop on the Terraform ride!

Key components to build the infrastructure

To create a monitoring system that sends email alerts, we need three components:

  1. Event Bridge
  2. SNS
  3. Email subscription

We can build a rudimentary monitoring system by combining these components. But the logs we would get by email look like the following:

{
  "version": "1.0",
  "timestamp": "2022-02-01T12:58:45.181Z",
  "requestContext": {
    "requestId": "a4ac706f-1aea-4b1d-a6d2-5e6bb58c4f3e",
    "functionArn": "arn:aws:lambda:ap-south-1:498830417177:function:gggg:$LATEST",
    "condition": "Success",
    "approximateInvokeCount": 1
  },
  "requestPayload": {
    "Records": [
      {
        "eventVersion": "2.1",
        "eventSource": "aws:s3",
        "awsRegion": "ap-south-1",
        "eventTime": "2022-02-01T12:58:43.330Z",
        "eventName": "ObjectCreated:Put",
        "userIdentity": {
          "principalId": "A341B33DQLH0UH"
        },
        "requestParameters": {
          "sourceIPAddress": "43.241.67.169"
        },
        "responseElements": {
          "x-amz-request-id": "GX86AGXCNXB5ZYVQ",
          "x-amz-id-2": "CPVpR8MNcPsNBzxcF8nOFqXbAIU60/zQlNC6njLp+wNFtC/ZnZF0SFhfMuhLOSpEqMFvvPqLA+tyvaXJSYMXAByR5EuDM0VF"
        },
        "s3": {
          "s3SchemaVersion": "1.0",
          "configurationId": "09dae0eb-9352-4d8a-964f-1026c76a5dcc",
          "bucket": {
            "name": "sddsdsbbb",
            "ownerIdentity": {
              "principalId": "A341B33DQLH0UH"
            },
            "arn": "arn:aws:s3:::sddsdsbbb"
          },
          "object": {
            "key": "[variables.tf]",
            "size": 402,
            "eTag": "09ba37f25be43729dc12f2b01a32b8e8",
            "sequencer": "0061F92E834A4ECD4B"
          }
        }
      }
    ]
  },
  "responseContext": {
    "statusCode": 200,
    "executedVersion": "$LATEST"
  },
  "responsePayload": "binary/octet-stream"
}

Not so easy to read, right? What if we could improve it, making it legible enough for anyone to understand what is happening?

To make it easy to read, we use an EventBridge feature called the input transformer, together with an input template. This helps us transform the log into our desired format without using any Lambda function.

Infrastructure Working

The way our infrastructure works is as follows:

  1. Our EventBridge rule collects the logs from all events in the AWS account, using an event filter.

  2. Once collected, these are sent to the input transformer, which parses out the fields we care about.

  3. We then use this parsed data to build our desired format using the input template.

Input transformer and input template for the EventBridge rule

  4. The transformed data is published to the SNS topic that we have created.

  5. We create a subscription to this SNS topic via email, SMS, or HTTP.

And voila! You have your infrastructure ready to report the changes!

Here is the entire terraform code:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}

# Configure the AWS Provider
provider "aws" {
  region = "ap-south-1" # insert your region code
}

resource "aws_cloudwatch_event_rule" "eventtosns" {
  name = "eventtosns"
  event_pattern = jsonencode(
    {
      account = [
        var.account, # insert your account number
      ]
    }
  )
}

# arn of the target and rule id of the event rule
resource "aws_cloudwatch_event_target" "eventtosns" {
  arn  = aws_sns_topic.eventtosns.arn
  rule = aws_cloudwatch_event_rule.eventtosns.id

  input_transformer {
    input_paths = {
      Source        = "$.source",
      "detail-type" = "$.detail-type",
      resources     = "$.resources",
      state         = "$.detail.state",
      status        = "$.detail.status"
    }
    input_template = "\"Resource name : <Source> , Action name : <detail-type>, details : <status> <state>, Arn : <resources>\""
  }
}

resource "aws_sns_topic" "eventtosns" {
  name = "eventtosns"
}

resource "aws_sns_topic_subscription" "snstoemail_email-target" {
  topic_arn = aws_sns_topic.eventtosns.arn
  protocol  = "email"
  endpoint  = var.email
}

# aws_sns_topic_policy.eventtosns:
resource "aws_sns_topic_policy" "eventtosns" {
  arn = aws_sns_topic.eventtosns.arn
  policy = jsonencode(
    {
      Id = "default_policy_ID"
      Statement = [
        {
          Action = [
            "SNS:GetTopicAttributes",
            "SNS:SetTopicAttributes",
            "SNS:AddPermission",
            "SNS:RemovePermission",
            "SNS:DeleteTopic",
            "SNS:Subscribe",
            "SNS:ListSubscriptionsByTopic",
            "SNS:Publish",
            "SNS:Receive",
          ]
          Condition = {
            StringEquals = {
              "AWS:SourceOwner" = [
                var.account,
              ]
            }
          }
          Effect = "Allow"
          Principal = {
            AWS = "*"
          }
          Resource = aws_sns_topic.eventtosns.arn
          Sid      = "__default_statement_ID"
        },
        {
          Action = "sns:Publish"
          Effect = "Allow"
          Principal = {
            Service = "events.amazonaws.com"
          }
          Resource = aws_sns_topic.eventtosns.arn
          Sid      = "AWSEvents_lambdaless_Idcb618e86-b782-4e67-b507-8d10aaca5f09"
        },
      ]
      Version = "2008-10-17"
    }
  )
}

This entire infrastructure can be deployed by running terraform apply on the above code.
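The deploy-and-confirm sequence might look like this (the variable names match the code above; the account number and email are placeholder values):

```shell
# Download the AWS provider and check the configuration
terraform init
terraform validate

# Preview, then create the rule, topic, policy, and subscription
terraform plan  -var="account=123456789012" -var="email=you@example.com"
terraform apply -var="account=123456789012" -var="email=you@example.com"

# Finally, click "Confirm subscription" in the email AWS sends you,
# or the SNS subscription stays in "pending confirmation"
```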

Liked my content? Feel free to reach out to me on LinkedIn for interesting content and productive discussions.

Understanding and Managing Terraform Drift

· 9 min read

In an ideal IaC world, all our infrastructure changes are implemented by pushing updated code to GitHub, which triggers a CI/CD pipeline in Jenkins or CircleCI, and the changes are reflected in our favorite public cloud. But reality is far from this, even in companies at stage four of cloud maturity. This can be for a plethora of reasons, such as the following:

  • The company is still in the initial stages of cloud automation.
  • Multiple stakeholders across different teams are developing proofs-of-concept via the console.
  • An ad-hoc manual hot-fix is introduced to stabilize current production.
  • The user is not aware of IaC tools.

Given these reasons, different categories of drift are introduced into the system, each with its own remediation actions. This article explains Terraform drift, its categories, remediation strategies, and tools to monitor it.

To understand these concepts better, let's first explore what Terraform drift is and how Terraform detects it.

What is Terraform Drift?

When we create resources with Terraform (i.e., terraform apply), it stores information about the current infrastructure, either locally or in a remote backend, in a file named terraform.tfstate. On each subsequent terraform apply, this file is updated with the current state of the infrastructure. But when we make manual changes via the console or CLI, those changes are applied in the cloud environment but never recorded in the state file.

Terraform drift is the difference between the state of the infrastructure recorded in our Terraform state file and the actual state of the infrastructure in our cloud environment.

In any of the above situations, making infrastructure changes outside the Terraform code causes the Terraform state file to diverge from the cloud environment. So the next time we apply the Terraform code, we see a drift, which might cause Terraform to change or destroy resources. Understanding how the different kinds of drift creep into our infrastructure helps us mitigate such risks.

Types of Drifts

We can categorize the Terraform configuration drift into three categories:

  1. Emergent drift — Drift observed when infrastructure originally applied via Terraform (so its state is present in the Terraform state file) is changed outside of the Terraform ecosystem.
  2. Pseudo drift — "Changes" seen in the plan/apply cycle due to the ordering of items in lists and other provider idiosyncrasies.
  3. Introduced drift — New infrastructure created outside of Terraform.

It is sometimes debated whether introduced drift should count as drift at all, since the infrastructure is set up entirely via the console. But the idea of using Terraform is to automate infrastructure processes entirely via code, so any manual or hybrid change is considered drift.

Managing Emergent Drift

As mentioned, emergent drift is observed when infrastructure applied and managed by Terraform is modified outside of the Terraform ecosystem. It can be managed based on the state that we prefer:

  • Infrastructure state preferred: If our preferred state is the one in the cloud, we update our Terraform configuration (usually the main.tf file) and its dependent modules so that the next time we run terraform apply, the configuration file and the Terraform state file are in sync.
  • Configuration state preferred: If our preferred state is the one in our configuration file, we just run terraform apply. This negates all the manual changes in the cloud and applies the configuration from the Terraform configuration file.
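A minimal workflow for the infrastructure-state-preferred option, assuming Terraform v0.15.4 or later (which introduced -refresh-only mode):

```shell
# Show what differs between the state file and the real infrastructure,
# without proposing any changes to the infrastructure itself
terraform plan -refresh-only

# Accept the cloud's current state into terraform.tfstate
# (the configuration files must then be updated by hand to match)
terraform apply -refresh-only
```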

Managing Pseudo Drift

Pseudo drift can be observed when the ordering of certain resources, or of certain arguments within a resource, differs between the configuration file and the state file. This drift is not common, but it can occasionally be observed with some providers. To understand it better, let's take the example of creating a multi-availability-zone RDS.

resource "aws_db_instance" "default" {
  allocated_storage    = 10
  engine               = "mysql"
  engine_version       = "5.7"
  instance_class       = "db.t3.micro"
  availability_zone    = ["us-east-1b", "us-east-1c", "us-east-1a"] # us-east-1a was added later
  name                 = "mydb"
  username             = "foo"
  password             = "foobarbaz"
  parameter_group_name = "default.mysql5.7"
  skip_final_snapshot  = true
}

Initially, we only wanted us-east-1b and 1c, but later added 1a. When we applied this configuration, it ran successfully. Being the careful SREs that we are, we then ran terraform plan to confirm that everything is the way we wanted. To our surprise, we might see Terraform proposing the resource again, with changes on the availability-zone line, and this change can keep appearing in subsequent terraform apply cycles.

To manage this, run terraform show, which displays the current state file. Locate the availability-zone argument and note the order in which the values appear in the list. Copy that order into the Terraform configuration file, and you should be good to go.

Managing Introduced Drift

Introduced drift happens when new infrastructure is provisioned in the cloud outside the Terraform ecosystem. This is the most troublesome drift, requiring a conscientious effort from the engineer to detect and handle, since there is no trace of these changes in the Terraform state file. Unless we inspect each resource via the console, read the CloudWatch logs, check the billing console, or learn of it from the person who made the change, it is quite difficult to detect. It can also happen when we run terraform destroy and some resources fail to be destroyed.

If we can identify the manually provisioned resource, there are two approaches, depending on the environment it lives in:

  1. Provisioning anew: If the resource is not in a production-grade environment, it is recommended to destroy it and then create a module for it within our Terraform configuration file. This way, the infrastructure is logged, tracked, and monitored via the Terraform state file, and all resources are created via Terraform.
  2. Terraform import: If the resource is in a production-grade environment, it is difficult to recreate. In this case, we bring it under management with terraform import, which maps the existing cloud resource into the Terraform state file; we then write matching HCL for it in the configuration file, so subsequent applies keep the state file and the cloud in sync.

Drift Identification and Monitoring

All this management of drift is possible only when we can detect that there is drift. In the case of emergent and pseudo drift, we can identify it using the terraform plan command, which compares the current state file with resources in the cloud (previously created with Terraform). But this fails for introduced drift, as there is no state for a resource created outside the Terraform ecosystem. So it serves us better to detect this kind of drift proactively and automate the detection alongside our IaC tooling. This detection can be done using two tools:
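Conceptually, what these tools do is set arithmetic over resource identifiers: compare what the state file knows about with what actually exists in the account. A toy Python sketch of that idea follows; the function name and resource IDs are hypothetical, not any tool's real API:

```python
# Toy sketch of drift detection: compare resource IDs recorded in the
# Terraform state with IDs actually present in the cloud account.
def classify_drift(state_ids, cloud_ids):
    state, cloud = set(state_ids), set(cloud_ids)
    return {
        "managed": sorted(state & cloud),    # tracked by Terraform, present in cloud
        "unmanaged": sorted(cloud - state),  # introduced drift: created outside Terraform
        "missing": sorted(state - cloud),    # in state, but deleted from the cloud
    }

report = classify_drift(
    state_ids=["sg-111", "db-mydb"],
    cloud_ids=["sg-111", "db-mydb", "ec2-manual-box"],
)
print(report["unmanaged"])  # ['ec2-manual-box']
```

The hard part in practice is producing the `cloud_ids` side, which is exactly what CloudQuery and driftctl automate for us.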

CloudQuery

If you like a data-centric approach with a visualization dashboard, this solution is for you. CloudQuery is an open-source tool that compares the state file with the resources in our desired cloud provider, then formats and loads this data into a PostgreSQL database. Since the drift-detection command is built on top of PostgreSQL and marks each resource as managed or unmanaged, we can use this flag as a filter in our favorite dashboard solution, such as Tableau or Power BI, to monitor infrastructure state drift. (For more information, refer to https://www.cloudquery.io/docs/cli/commands/cloudquery.)

```yaml
providers:
  # provider configurations
  - name: aws
    configuration:
      accounts:
        - id: <UNIQUE ACCOUNT IDENTIFIER>
          # Optional. Role ARN we want to assume when accessing this account
          # role_arn: <YOUR_ROLE_ARN>
          # Named profile in config or credential file from where CQ should grab credentials
          local_profile: default
      # By default assumes all regions
      regions:
        - us-east-1
        - us-west-2
      # The maximum number of times that a request will be retried for failures.
      max_retries: 5
      # The maximum backoff delay between attempts. The backoff delays exponentially
      # with a jitter based on the number of attempts. Defaults to 30 seconds.
      max_backoff: 20
    # List of resources to fetch
    resources:
      - "*"
```
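The max_retries and max_backoff settings above describe a capped exponential backoff with jitter. As a hedged illustration (the exact formula CloudQuery uses may differ), the idea in Python:

```python
import random

# Sketch of a capped exponential backoff with full jitter: the delay
# ceiling doubles with each attempt, capped at max_backoff, and the
# actual sleep is drawn uniformly below that ceiling.
def backoff_delay(attempt, base=1.0, max_backoff=20.0):
    ceiling = min(base * (2 ** attempt), max_backoff)
    return random.uniform(0, ceiling)

delays = [backoff_delay(n) for n in range(5)]
print(all(0 <= d <= 20.0 for d in delays))  # True: every delay respects the cap
```

Jitter matters here because many parallel fetchers retrying in lockstep would otherwise hammer the cloud API at the same instants.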

Driftctl

If you are more of a CLI person who loves working in the terminal, this tool is for you. Driftctl helps us track and detect both managed and unmanaged drift with a single command.

Since this is a CLI-based tool, it can be easily integrated into a CI/CD pipeline, such as a Jenkins pipeline, with the results pushed as output to the PR in GitHub. If that is not your cup of tea, run it as a cron job on your system: create a log group to collect the output, then use log-monitoring solutions such as Fluentd or a Prometheus/Grafana stack to visualize it and create alerts. For more information, read https://docs.driftctl.com/0.35.0/installation.

```shell
# To scan the local state file
driftctl scan

# To scan a state file stored in an AWS S3 backend
driftctl scan --from tfstate+s3://my-bucket/path/to/state.tfstate
```
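For the cron-job and alerting route, driftctl can emit machine-readable reports that a small script can post-process. A hedged sketch follows, assuming a simplified JSON shape; the field names are illustrative, so check your driftctl version's actual output schema before relying on them:

```python
import json

# Simplified stand-in for a driftctl scan report (field names assumed).
sample_report = json.loads(
    '{"summary": {"total_resources": 10, "total_managed": 8, "total_unmanaged": 2}}'
)

def should_alert(report, threshold=1.0):
    """Alert when IaC coverage drops below the threshold (1.0 = everything managed)."""
    s = report["summary"]
    coverage = s["total_managed"] / s["total_resources"]
    return coverage < threshold

print(should_alert(sample_report))  # True: 2 of 10 resources are unmanaged
```

A cron job could feed the real scan output through a function like this and mail the SRE team only when unmanaged resources appear.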

Conclusion

It is always better to prevent drift from creeping into our infrastructure than to remediate it after the fact. Finally, I would like to suggest a few better coding practices:

  • Always try to build automated infrastructure. Even if you perform manual steps, import them into the Terraform configuration before applying further changes.
  • Write and apply code incrementally.
  • Implement a drift-tracking system with custom alerting that mails the SREs about any infra drift observed.
**Liked my content?** Feel free to reach out to me on [LinkedIn](https://www.linkedin.com/in/krishnadutt/) for interesting content and productive discussions.

Developing Visual Documentation Using Diagrams in Python: Diagrams as Code - a novel approach for graphics

· 5 min read

We as developers have all read the documentation of different frameworks and libraries while developing features. But when it comes to writing documentation for our own features, we are usually in a hurry, because the sprint has ended or the project is well past its deadline.

In addition, when we develop documentation in black ink (just reminiscing about the previous version of documentation, in literal ink!), it is sometimes very difficult to communicate a complex cloud architecture or systems design through text alone. We overcome this problem with images, but then we have to leave our beloved IDEs to create them in Illustrator or Photoshop. What if I told you that we can develop awesome graphics right from our IDEs, using Python?

Introducing Diagrams, a Python library that lets you create cloud architecture diagrams using code!

Diagrams

Diagrams is a Python library built for Diagrams as Code (DaC). It helps us describe the architecture of our system as code and track changes to that architecture. At present it covers major providers such as AWS, Azure, and GCP, along with others such as DigitalOcean and Alibaba Cloud. It also supports on-prem components, covering services such as Apache, Docker, and Hadoop.

Advantages of using Diagrams

Still considering whether to use Diagrams? How about the following reasons:

  1. No additional software overhead: To create diagrams traditionally, we might use software such as Illustrator or Photoshop, which requires additional licenses. Even open-source options such as Inkscape or GIMP still need to be installed. With Diagrams there is no such thing: just pip install diagrams and you are good to go!
  2. No need to search for high-resolution images: When building these diagrams, we want high-resolution images that scale to a screen of any size, and finding them is often a hassle. Thanks to Diagrams' built-in repository of images, we can build high-resolution architecture diagrams with ease.
  3. Ease of editing: Let's say your architecture changes during the project timeline (hey, it happens in every project). Changing each of these components manually takes a lot of time and effort. Thanks to the Diagrams-as-code approach, we do this work with a few lines of code.
  4. Reusability: Creating diagrams via code lets us replicate the output without any additional effort. All we need to do is import the code and, lo and behold, our work is ready in front of us. Thanks to the power of code, replication and reuse come for free.

Now that we have seen the reasons to use it, let's get our hands dirty working with Diagrams in a Python environment.

Diagrams Implementation example with custom node development and clustering:

Here I am going to create the diagram for my project on developing real-time resource monitoring via email on AWS using Terraform. To summarize the project: I developed a serverless architecture that generates notifications for any state or status change in a clean, readable format (rather than complicated JSON), in real time, via email. The architecture is built on AWS and deployed using Terraform. For more details, read this article.

At a high level, the components of the architecture are:

  1. Eventbridge
  2. SNS and
  3. Email

The email component is not available in the Diagrams library. We can build our own email node using the custom node development method, passing a local image as a new node with the following code.

```python
from diagrams.custom import Custom

email = Custom('Name that you want to see', 'path of the image')
```

Now that we have our components ready, let's code:

```python
with Diagram("AWS resource monitoring via email notification") as diagram1:
    email = '/content/drive/MyDrive/gmail-new-icon-vector-34182308.jpg'
    emailicon = Custom('Email notification', email)
    Eventbridge("Event bridge rule") >> Lambda("Lambda") >> SNS("SNS") >> emailicon
```

By implementing the above code, we get the following:

As we developed this in the AWS environment using Terraform, I would like to wrap the code above in clusters using diagrams.Cluster.

```python
with Diagram("AWS resource monitoring via email notification") as diag:
    email = '/content/drive/MyDrive/gmail-new-icon-vector-34182308.jpg'
    emailicon = Custom('Email notification', email)
    with Cluster("Terraform"):
        with Cluster("AWS"):
            Eventbridge("Event bridge rule") >> Lambda("Lambda") >> SNS("SNS") >> emailicon
```

After embedding it in the cluster, the final image looks like:

Final image for the entire architecture

Here is the Final code in totality:

```python
from diagrams import Cluster, Diagram
from diagrams.aws.compute import Lambda
from diagrams.aws.integration import SNS, Eventbridge
from diagrams.custom import Custom

with Diagram("AWS resource monitoring via email notification") as diag:
    email = '/content/drive/MyDrive/gmail-new-icon-vector-34182308.jpg'
    emailicon = Custom('Email notification', email)
    with Cluster("Terraform"):
        with Cluster("AWS"):
            Eventbridge("Event bridge rule") >> Lambda("Lambda") >> SNS("SNS") >> emailicon
```

Follow me on Medium and GitHub for more cloud and DevOps related content.

Happy Learning and Good Day..!