
2 posts tagged with "Docker"


Optimizing Golang Docker images with multi-stage build

· 5 min read

As the scale of development required to build a product grows, a large number of developers are needed to develop, share and maintain the code. And since each developer's environment differs from the others, it becomes quite a hassle to recreate similar environments with matching library versions. To solve this we use Docker, which gives all developers the same environment. When using Docker, we often face the problem of bloated images, sometimes racking up several GBs of space. This very much defeats the idea that sets Docker apart from classic VMs: creating lightweight, resource-light images that developers can work with easily.

To solve this problem of Docker image bloat, we have several options, such as using a .dockerignore file to exclude unnecessary files, using distroless/minimal base images, or minimizing the number of layers. But when we are building an application we need various build tools, which rules out distroless images during the build itself. And since a build involves several steps, there is only so much room to reduce the number of layers within the Dockerfile.
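As an aside, a minimal .dockerignore for a Go project might look like the following (the entries are illustrative, not taken from the demo repo):

```
# Illustrative .dockerignore: keep VCS data and local artifacts out of the build context
.git
.gitignore
README.md
bin/
*.test
```

Keeping the build context small speeds up every docker build, independently of which base image you choose.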

The tools we use to build an application are often not needed to run it. So what if we could separate out these build tools and ship only what is required to run the application? Enter multi-stage builds in Docker.

Docker: Multi-Stage Builds

Multi-stage builds are an implementation of the Builder Pattern within a Dockerfile. They help minimize the size of the final container, improve runtime performance, and allow for better organization of Docker commands. A multi-stage build breaks the single-stage Dockerfile into different sections (you can think of them as different jobs, like build, stage, etc.) within the same Dockerfile, thereby separating environments. Since each stage uses a base image that is only useful to that stage, while passing its outputs on to the next stage, we can keep the final image lean. The same effect can be achieved with separate Dockerfiles in a CI (Continuous Integration) pipeline, passing the output of one stage to another, but Docker's multi-stage feature removes the need to wire these steps into the pipeline ourselves and helps keep it clean.

Creating the Multi-Stage Dockerfile

To explain this, we will build and run a Movie application written in Golang, which performs basic CRUD operations. The code for the app can be found here.

As we know, a Go app must be compiled before it can run. Compilation produces an executable (specific to the target OS), and only this executable is required to run the application. To illustrate the power of multi-stage builds, let's first build it with a single-stage Dockerfile.
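A single-stage Dockerfile for a Go app of this kind typically looks like the following sketch (the Go version, port, and binary name are illustrative, not taken from the demo repo):

```dockerfile
# Single-stage build: the final image carries the entire Go toolchain
FROM golang:1.19

WORKDIR /app

# Copy dependency manifests first so downloads are cached across source changes
COPY go.mod go.sum ./
RUN go mod download

# Copy the source and compile the binary
COPY . .
RUN go build -o /crud .

EXPOSE 8080
CMD ["/crud"]
```

Note that everything above the final CMD, including the compiler and module cache, ends up baked into the image.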

Once we run docker build on this single-stage Dockerfile, the resulting image is around 350 MB.

Now let's separate the build stage and the execution stage into two different environments. For the build-stage environment, we use the Golang image based on Alpine, which comes loaded with all the tools required to run, test, build and validate Go code. We build our application with this environment's tools, then pass the executable on to the execution/production environment, which only has to run it.
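A multi-stage Dockerfile following this structure might look like the sketch below (the Go version, port, and binary name are again illustrative):

```dockerfile
# Stage 1: build environment with the full Go toolchain
FROM golang:1.19-alpine AS builder

WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .
# CGO_ENABLED=0 yields a statically linked binary that runs on plain Alpine
RUN CGO_ENABLED=0 go build -o /crud .

# Stage 2: minimal runtime environment
FROM alpine:3.17

# Copy only the compiled binary from the builder stage
COPY --from=builder /crud /crud

EXPOSE 8080
CMD ["/crud"]
```

The key line is COPY --from=builder: only the named artifact crosses the stage boundary, so the Go toolchain never reaches the final image.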

Since the executable is already built, we no longer need most of the build environment's tools and can work from a base Alpine image. Once we run docker build on this file, we observe that the resulting image is around 13 MB (tagged crud_multistage) compared to 350 MB (tagged crud) from the single-stage Dockerfile: roughly a 95% reduction in the total size of the Docker image.

Since this image is very small, it is easier to move around and to deploy in production. Although multi-stage builds sound like a fantastic idea, there are scenarios where they should be used and scenarios where they should be avoided.

When not to use Multi-stage build:

  • When the language you are using does not package everything into a single file (as Go does) or at least into a small group of files (as a bundled JavaScript app does).
  • If you plan to run docker exec commands against the final artifact to explore the application code.
  • If you need the tools and files used in the build stage later on to debug the final artifact.

When to use Multi-stage build:

  • When you want to minimize the total size of the final Docker image that you deploy into production.
  • When you want to speed up your CI/CD processes by running independent stages of the Dockerfile in parallel.
  • When the different layers in your Dockerfile are straightforward and standardized.
  • When you are fine with losing the build intermediaries and only want the final Docker artifact.
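Relatedly, when you do want to inspect a build intermediary, an individual stage of a multi-stage Dockerfile can be built on its own with docker build's --target flag (the stage name builder matches the sketch above and is illustrative):

```shell
# Build and tag only the builder stage, stopping before the runtime stage
docker build --target builder -t crud:build .
```

This gives you a debuggable image of the build environment without changing the production Dockerfile.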

Vulnerability Identification of Images and Files using SBOM with Trivy

· 7 min read

As we embrace digital progress and shift to using containers for our applications, we’re dealing with a variety of core images that form the basis of our apps. These images vary based on our app’s needs, given the different ways we build them. Unfortunately, in companies with a moderate level of DevOps maturity (levels 2 to 3), the focus often leans heavily towards getting new features out quickly, sometimes overlooking security.

For us developers, the challenge is keeping up with the ever-changing world of software. This includes the extra pieces our applications need (dependencies), the building blocks (components), and the tools (libraries) that make them work together across our whole system. All these parts combined are what we call the Software Bill of Materials (SBOM). To tackle this, we need to understand the SBOM of each image before we use it as a foundation for our apps. But doing this manually is complex and time-consuming. Plus, every time we want to update to a newer image version, we have to go through the whole process again, slowing down our ability to deliver our products.

The solution? Automation. By using automation, we can navigate the intricate process of understanding the SBOM. It helps us make organized lists of all the parts, find any known problems, and suggest ways to fix them — all done with tools like Trivy. Before we dive into the detailed steps, let’s make sure we’re all on the same page about what the SBOM really means.

What is an SBOM?

SBOM stands for Software Bill of Materials. Think of it like a detailed list of all the parts that make up our software. This list includes things like tools from other people, building blocks, and even the rules that guide how they’re used, all with exact version details.

SBOM is important because it gives us a big picture of all the different parts that come together to create a specific software. This helps our DevSecOps teams find out if there are any possible risks, understand how they could affect us, and take steps to fix them. All of this makes our software strong and secure.

Why create SBOM?

  1. Finding and Managing Outside Parts: SBOM helps us see all the software we use from others. It shows us different versions and even points out any possible security issues. With this info, we can make smart choices about what we use, especially when it comes to libraries and tools from other sources.
  2. Making Our Supply Chain Secure: SBOM acts like a detailed map for our software. This map helps us make sure everything is safe and guards against any tricks or attacks on our software supply chain. We can even use SBOM to check if the people we get our software from follow good security rules.

SBOM format

  • Software Package Data Exchange (SPDX): This open standard serves as a software bill of materials (SBOM), identifying and cataloging components, licenses, copyrights, security references, and other metadata related to software. While its primary purpose is to enhance license compliance, it also contributes to software supply-chain transparency and security improvement.
  • Software Identification Tags (SWID): These tags contain descriptive details about a specific software release, including its product and version. They also specify the organizations and individuals involved in producing and distributing the product. These tags establish a product’s lifecycle, from installation on an endpoint to deletion.
  • CycloneDX (CDX): The CycloneDX project establishes standards in XML, JSON, and protocol buffers. It offers an array of official and community-supported tools that either generate or work with this standard. While similar to SPDX, CycloneDX is a more lightweight alternative.

In addition to creating SBOMs, Trivy can scan an SBOM, whether generated by Trivy or by other tools, to identify the severity of each problem. Beyond vulnerability detection, it also suggests possible fixes for the identified vulnerabilities.

SBOM with Trivy

Trivy is a comprehensive security scanner, built and maintained by the Aqua Security team. It is reliable and fast, and ships with several built-in scanners covering different security issues and different targets where it can find them. It can be used to scan for the following:

  • OS packages and software dependencies in use (SBOM)
  • Known vulnerabilities (CVEs)
  • IaC misconfigurations
  • Sensitive information and secrets

Today we will focus on the SBOM scanning capabilities of Trivy. In this tutorial, we will do the following:

  • First, create the SBOM using Trivy
  • Then, analyze the created SBOM to find the vulnerabilities

For the sake of the demo we will use one of the most widely used Docker images, NGINX, specifically nginx:1.21.5. We will pull its Docker Hub image and run the scanner to generate the SBOM. Once the SBOM is generated, we will use it to get the list of vulnerabilities and the possible fixes for them.

Generating SBOM

To generate the SBOM, first make sure Trivy is installed on your workstation. If not, you can install it using the commands in this link. In my case, my PC is Debian-based, so I installed it using the following commands:

sudo apt-get install wget apt-transport-https gnupg lsb-release
wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | gpg --dearmor | sudo tee /usr/share/keyrings/trivy.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb $(lsb_release -sc) main" | sudo tee -a /etc/apt/sources.list.d/trivy.list
sudo apt-get update
sudo apt-get install trivy

If you are using a Debian derivative, you might face issues with the "lsb_release -sc" part of the command, because its codename may not be in Trivy's repository. To work around this, you can substitute one of the following supported codenames: wheezy, jessie, stretch, buster, trusty, xenial, bionic.
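For example, pinning the codename to bionic instead of relying on lsb_release would look like this (same repository line as above, with the codename substituted by hand):

```shell
# Hard-code a supported codename in place of $(lsb_release -sc)
echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb bionic main" | sudo tee /etc/apt/sources.list.d/trivy.list
```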

Once installed, you can verify the setup by running

trivy --version

Once Trivy is installed, we can generate an SBOM for three kinds of targets: a container image, a whole directory of files, or a single file.

  • For a folder containing group of language specific files
trivy fs --format cyclonedx --output result.json /path/to/your_project
  • For a specific file
trivy fs --format cyclonedx --output result.json ./trivy-ci-test/Pipfile.lock
  • For a container image
trivy image --format spdx-json --output result.json nginx:1.21.5

Once run, the scanner inspects the files/image to determine which language the application is written in. It then downloads the database for that specific language ecosystem and matches the libraries present in the current context against it.

These libraries, along with the OS-level packages, are listed in an SBOM in the format we requested.
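To give a feel for the output, a CycloneDX SBOM is a JSON document whose core is a list of components; a heavily trimmed sketch is shown below (the component values are illustrative, not actual scan output):

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "components": [
    {
      "type": "library",
      "name": "openssl",
      "version": "1.1.1k",
      "purl": "pkg:deb/debian/openssl@1.1.1k"
    }
  ]
}
```

Each component entry carries the exact name and version that the vulnerability scan in the next step matches against known CVEs.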

Listing Vulnerabilities of SBOM

Once the SBOM is generated, we can list the known vulnerabilities of the dependencies in the image/files, such as libraries, OS packages etc. In addition to identifying these, Trivy also suggests the version in which each vulnerability is fixed, making it an effective tool not only for identifying vulnerabilities but also for resolving them.

In order for us to generate this list, we use the following command

trivy sbom result.json

This will generate a list of vulnerability findings.

For each finding, we get the name of the library, the CVE number, its severity (HIGH, MEDIUM, LOW), the status of the vulnerability fix (fixed, not fixed or will_not_fix), the fixed version when one exists, and details on the vulnerability.

Based on this information, we can upgrade the libraries whose vulnerabilities have fixes, understand the risk level of the unfixed ones, and remove them if they are not required. In addition, we have the opportunity to look into alternatives to the vulnerable libraries.

For more information and documentation, visit this site.

Liked my content? Feel free to reach out on LinkedIn for interesting content and productive discussions.