Implementation of Docker Image Scanner

Introduction to Docker images

This post is meant as a starting point that offers a few simple ideas.

To scan a Docker image, we first have to understand what a Docker image is.

A Docker image is made up of file systems layered on top of one another. The bottom layer is the bootfs, and the layer above it is the rootfs.

bootfs is the lowest-level boot file system of the Docker image, containing the bootloader and the OS kernel.

rootfs usually contains the file system that the operating system needs in order to run. This layer serves as the base image.

On top of the base image, further image layers are added, such as emacs, apache, etc.

How to analyze images

There are only two ways to analyze an image: static analysis and dynamic analysis. The open source implementations available for reference are

Clair, which focuses on static analysis, and Weave Scope, which focuses on container correlation analysis and monitoring. Weave Scope, however, does not seem to have much to do with security, so the author gives some ideas for dynamic analysis later in this post.

First, let's look at the well-known Clair. Clair currently supports static analysis of appc and Docker containers.

The overall structure of Clair is as follows.

Clair contains the following core modules.

Fetcher - collects vulnerability data from public sources

Detector - identifies the Features contained in a container image

Image Format - the container image formats known to Clair, including Docker and ACI

Notification Hook - notifies users/machines when a new vulnerability is discovered or an existing vulnerability changes

Databases - stores the container layers and the vulnerabilities

Worker - each POSTed layer starts a worker that runs layer detection

Compilation and use

Clair currently has 21 releases. Here we use the 20th release, namely v2.0.0, for the source code dissection.

To reduce errors during compilation, it is recommended to compile on Ubuntu. Make sure that git, bzr, rpm, xz and similar tools are installed before compiling. Use Golang 1.8.3 or higher. Also make sure PostgreSQL is installed; the version the author uses is 9.5, and it is suggested that you use the same version.

Build clair with go build

Build analyze-local-images with go build

Here Clair acts as the server and analyze-local-images acts as the client.

Usage is simple. For example, analyze the nginx:latest image with analyze-local-images.

The whole process of interaction between the two can be simplified as follows.

Analyze-local-images source code analysis

When using analyze-local-images, we can specify a number of parameters.

analyze-local-images -endpoint "" -my-address "" nginx:latest

where endpoint is the address of the Clair server and my-address is the address of the client running analyze-local-images.

postLayerURI is the Clair API v1 route to which layer data is sent, and getLayerFeaturesURI is the Clair API v1 route from which vulnerability information is fetched.

analyze-local-images calls the intMain() function from its main function. intMain first parses the user's input parameters, such as the endpoint shown above.

The main execution flow of analyze-local-images is as follows.


func intMain() int {

// Parse command line arguments and assign values to some global variables just defined.


// Create a temporary directory

tmpPath, err := ioutil.TempDir("", "analyze-local-image-")

// This creates a folder whose name starts with analyze-local-image- in the /tmp directory.

// To observe the changes in the directory under /tmp, comment out the line defer os.RemoveAll(tmpPath) and recompile.


// Call AnalyzeLocalImage method to analyze the image

go func() {




The image is extracted into the tmp directory with the following directory structure.
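As a rough illustration of that layout, the sketch below reads the layer order from the manifest.json file that docker save places at the root of the archive. It assumes the legacy layout in which every layer entry has the form "<id>/layer.tar"; the type manifestEntry and the function layerIDsFromManifest are hypothetical names chosen for this example, not code from analyze-local-images.

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// manifestEntry mirrors one element of the manifest.json array
// written by `docker save`.
type manifestEntry struct {
	Config   string
	RepoTags []string
	Layers   []string
}

// layerIDsFromManifest returns the layer directory names of the first
// image in the manifest, ordered from the bottom layer to the top layer.
// Assumption: each entry in Layers looks like "<id>/layer.tar".
func layerIDsFromManifest(data []byte) ([]string, error) {
	var entries []manifestEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return nil, err
	}
	if len(entries) == 0 {
		return nil, errors.New("manifest.json contains no image")
	}
	ids := make([]string, 0, len(entries[0].Layers))
	for _, l := range entries[0].Layers {
		ids = append(ids, strings.TrimSuffix(l, "/layer.tar"))
	}
	return ids, nil
}
```

Given a manifest whose Layers field is ["aaa/layer.tar", "bbb/layer.tar"], the function returns the IDs aaa and bbb in that order, which is the order in which the layers would be submitted.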

The two main methods that analyze-local-images uses to interact with the Clair server are analyzeLayer and getLayer. analyzeLayer sends data in JSON format to Clair. getLayer fetches the response from Clair, decodes the JSON data, and formats the output.

func AnalyzeLocalImage(imageName string, minSeverity database.Severity, endpoint, myAddress, tmpPath string) error {

// Save the image to the tmp directory

// by calling the save method.

// The save method first uses docker save <image name> to package the image into a tar file,

// then uses the tar command to extract that file into the tmp directory.

err := save(imageName, tmpPath)


// Call the historyFromManifest method, read the manifest.json file to get the id name of each layer, save it in layerIDs.

// If not available from the manifest.json file, then read the history

layerIDs, err := historyFromManifest(tmpPath)

if err != nil {

layerIDs, err = historyFromCommand(imageName)



// If clair is not on the local machine, enable HTTP service on analyze-local-images, default port is 9279


// Analyze each layer by sending the layer.tar file of each layer to the clair server

err = analyzeLayer(endpoint, tmpPath+"/"+layerIDs[i]+"/layer.tar", layerIDs[i], layerIDs[i-1])



// AnalyzeLocalImage (continued)


// Get vulnerability information

layer, err := getLayer(endpoint, layerIDs[len(layerIDs)-1])

//Print vulnerability report


for _, feature := range layer.Features {

if len(feature.Vulnerabilities) > 0 {

for _, vulnerability := range feature.Vulnerabilities {

severity := database.Severity(vulnerability.Severity)

isSafe = false

if minSeverity.Compare(severity) > 0 {



hasVisibleVulnerabilities = true

vulnerabilities = append(vulnerabilities, vulnerabilityInfo)




// Sort the vulnerabilities and pretty-print the report
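The filtering and sorting step in the excerpt above can be sketched in isolation. This is a minimal sketch, not the code of analyze-local-images: the finding type, the severityRank table, and the visibleFindings function are hypothetical names, and the ranking simply follows Clair's severity names from Unknown up to Defcon1.

```go
package main

import "sort"

// severityRank orders Clair's severity names from least to most severe.
// A name that is not in the table ranks as 0, the same as "Unknown".
var severityRank = map[string]int{
	"Unknown": 0, "Negligible": 1, "Low": 2, "Medium": 3,
	"High": 4, "Critical": 5, "Defcon1": 6,
}

// finding is a hypothetical, flattened row of the report.
type finding struct {
	Feature  string
	Name     string
	Severity string
}

// visibleFindings drops findings below minSeverity and sorts the rest
// from most to least severe, keeping the input order for ties.
func visibleFindings(all []finding, minSeverity string) []finding {
	threshold := severityRank[minSeverity]
	var out []finding
	for _, f := range all {
		if severityRank[f.Severity] < threshold {
			continue // below the requested minimum, not shown
		}
		out = append(out, f)
	}
	sort.SliceStable(out, func(i, j int) bool {
		return severityRank[out[i].Severity] > severityRank[out[j].Severity]
	})
	return out
}
```

With a minimum severity of Low, a Negligible finding is dropped and the remaining ones are listed with High first, then Medium, then Low.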



At this point, the source code of analyze-local-images has been analyzed. As you can see, what analyze-local-images does is simple:

it sends each layer.tar to Clair, fetches the results of Clair's analysis through the API, and prints them locally.

Clair source code dissection

After analyze-local-images sends the layer.tar file, it is mainly processed by the ProcessLayer method in /worker.go.

Let's start with a brief explanation of Clair's directory structure. We only need to focus on the annotated folders.

-- api // API interface

-- cmd // server main program


-- database // database related


-- ext // extended functionality

-- pkg // general-purpose methods

-- testdata


To be able to understand Clair in depth, we still have to start our analysis with its main function.


func main() {

// Parse command line arguments; database configuration is read from /etc/clair/config.yaml by default


// Load configuration file

config, err := LoadConfig(*flagConfigPath)

if err != nil {

log.WithError(err).Fatal("failed to load configuration")


// Initialize logging system


//start clair




func Boot(config *Config) {


// Open database

db, err := database.Open(config.Database)

if err != nil {



defer db.Close()

// Start the notifier service


go clair.RunNotifier(config.Notifier, db, st)

// Start clair's Rest API service


go api.Run(config.API, db, st)


// Start clair's health check service

go api.RunHealth(config.API, db, st)

// Start the updater service


go clair.RunUpdater(config.Updater, db, st)

// Wait for interruption and shut down gracefully.


log.Info("Received interruption, gracefully stopping ...")



After go api.Run is executed, Clair starts its REST service.


func Run(cfg *Config, store database.Datastore, st *stopper.Stopper) {

defer st.End()

// Do not start the service if the configuration is empty


srv := &graceful.Server{

Timeout: 0, // Already handled by our TimeOut middleware

NoSignalHandling: true, // We want to use our own Stopper

Server: &http.Server{

Addr: ":" + strconv.Itoa(cfg.Port),

TLSConfig: tlsConfig,

Handler: http.TimeoutHandler(newAPIHandler(cfg, store), cfg.Timeout, timeoutResponse),



// Start HTTP service

listenAndServeWithStopper(srv, st, cfg.CertFile, cfg.KeyFile)

log.Info("main API stopped")


The call to newAPIHandler in api.Run generates an API handler that handles all API requests.


func newAPIHandler(cfg *Config, store database.Datastore) http.Handler {

router := make(router)

router["/v1"] = v1.NewRouter(store, cfg.PaginationKey)

return router
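The router value above is a map keyed by API version prefix. One way such a map can act as an http.Handler is sketched below with the standard library only. The prefixRouter type, newDemoRouter and statusFor are hypothetical names for this example, and the stand-in /v1 handler is a plain ServeMux rather than the httprouter instance Clair uses.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"strings"
)

// prefixRouter maps a URL prefix such as "/v1" to the handler that
// serves that API version.
type prefixRouter map[string]http.Handler

// ServeHTTP strips the matching prefix and delegates to its handler.
func (rt prefixRouter) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	for prefix, h := range rt {
		if strings.HasPrefix(r.URL.Path, prefix+"/") {
			http.StripPrefix(prefix, h).ServeHTTP(w, r)
			return
		}
	}
	http.NotFound(w, r)
}

// newDemoRouter registers a stand-in /v1 handler that only knows
// the /namespaces path.
func newDemoRouter() http.Handler {
	v1 := http.NewServeMux()
	v1.HandleFunc("/namespaces", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"Namespaces":[]}`))
	})
	return prefixRouter{"/v1": v1}
}

// statusFor sends an in-memory GET request and returns the status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}
```

With this arrangement a request for /v1/namespaces reaches the versioned handler as /namespaces, while a path under an unregistered prefix such as /v2 is answered with 404.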


All the handlers for this router are registered in /api/v1/router.go.

func NewRouter(store database.Datastore, paginationKey string) *httprouter.Router {

router := httprouter.New()

ctx := &context

// Layers

router.POST("/layers", httpHandler(postLayer, ctx))

router.GET("/layers/:layerName", httpHandler(getLayer, ctx))

router.DELETE("/layers/:layerName", httpHandler(deleteLayer, ctx))

// Namespaces

router.GET("/namespaces", httpHandler(getNamespaces, ctx))

// Vulnerabilities

router.GET("/namespaces/:namespaceName/vulnerabilities", httpHandler(getVulnerabilities, ctx))

router.POST("/namespaces/:namespaceName/vulnerabilities", httpHandler(postVulnerability, ctx))

router.GET("/namespaces/:namespaceName/vulnerabilities/:vulnerabilityName", httpHandler(getVulnerability, ctx))

router.PUT("/namespaces/:namespaceName/vulnerabilities/:vulnerabilityName", httpHandler(putVulnerability, ctx))

router.DELETE("/namespaces/:namespaceName/vulnerabilities/:vulnerabilityName", httpHandler(deleteVulnerability, ctx))

// Fixes

router.GET("/namespaces/:namespaceName/vulnerabilities/:vulnerabilityName/fixes", httpHandler(getFixes, ctx))

router.PUT("/namespaces/:namespaceName/vulnerabilities/:vulnerabilityName/fixes/:fixName", httpHandler(putFix, ctx))

router.DELETE("/namespaces/:namespaceName/vulnerabilities/:vulnerabilityName/fixes/:fixName", httpHandler(deleteFix, ctx))

// Notifications

router.GET("/notifications/:notificationName", httpHandler(getNotification, ctx))

router.DELETE("/notifications/:notificationName", httpHandler(deleteNotification, ctx))

// Metrics

router.GET("/metrics", httpHandler(getMetrics, ctx))

return router


The handlers themselves are implemented in /api/v1/routes.go.

The layer.tar file sent by analyze-local-images, for example, will eventually be handed off to the postLayer method for processing.

func postLayer(w http.ResponseWriter, r *http.Request, p httprouter.Params, ctx *context) (string, int) {


err = clair.ProcessLayer(ctx.Store, request.Layer.Format, request.Layer.Name, request.Layer.ParentName, request.Layer.Path, request.Layer.Headers)



The ProcessLayer method is defined in /worker.go.

func ProcessLayer(datastore database.Datastore, imageFormat, name, parentName, path string, headers map[string]string) error {

// Parameter validation


// Check whether the layer is already in the database

layer, err := datastore.FindLayer(name, false, false)

if err != nil && err != commonerr.ErrNotFound {

return err


// If the layer exists and the Engine Version recorded for it in the DB is greater than or equal to 3 (the current worker version), the layer has already been analyzed and the function returns. Otherwise detectContent parses the data.

// Analyze the content.

layer.Namespace, layer.Features, err = detectContent(imageFormat, name, path, headers, layer.Parent)

if err != nil {

return err


return datastore.InsertLayer(layer)
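The decision that ProcessLayer makes, namely whether a layer needs to be analyzed again, can be sketched against an in-memory store. This is a simplified illustration rather than Clair's code: memStore, layerRecord and processLayer are hypothetical names, and the detection step is passed in as a function so that the sketch stays self-contained.

```go
package main

import "errors"

// engineVersion is the worker version mentioned in the excerpt above.
const engineVersion = 3

var errNotFound = errors.New("layer not found")

// layerRecord is what the store remembers about an analyzed layer.
type layerRecord struct {
	Name          string
	EngineVersion int
	Features      []string
}

// memStore is a hypothetical in-memory replacement for the datastore.
type memStore map[string]layerRecord

func (m memStore) find(name string) (layerRecord, error) {
	rec, ok := m[name]
	if !ok {
		return layerRecord{}, errNotFound
	}
	return rec, nil
}

// processLayer runs detect unless the store already holds a result
// produced by the current (or a newer) engine version. It reports
// whether detect actually ran.
func processLayer(store memStore, name string, detect func() ([]string, error)) (bool, error) {
	rec, err := store.find(name)
	if err != nil && !errors.Is(err, errNotFound) {
		return false, err
	}
	if err == nil && rec.EngineVersion >= engineVersion {
		return false, nil // already analyzed, nothing to do
	}
	features, err := detect()
	if err != nil {
		return false, err
	}
	store[name] = layerRecord{Name: name, EngineVersion: engineVersion, Features: features}
	return true, nil
}
```

Submitting the same layer twice therefore runs detection only once, while a record written by an older engine version is analyzed again and overwritten.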


The detectContent method is as follows.

func detectContent(imageFormat, name, path string, headers map[string]string, parent *database.Layer) (namespace *database.Namespace, featureVersions []database.FeatureVersion, err error) {


// Detect the namespace

namespace, err = detectNamespace(name, files, parent)

if err != nil {



// Detect the feature versions

featureVersions, err = detectFeatureVersions(name, files, namespace, parent)

if err != nil {






A simple implementation of a static scanner for Docker images

With the source code analysis above, covering both analyze-local-images and Clair, we can start by implementing a simple static parser for Docker images. It analyzes a Docker image layer by layer and outputs the software feature versions, which helps us understand how Clair works.

The github link is given directly here:.

Interested parties can download and test it for themselves.

The simple architecture of the Docker image static scanner is given here.
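As one illustration of what the two detection steps of such a scanner might look like for a Debian-based layer, the sketch below derives a namespace from the contents of /etc/os-release and lists package versions from the contents of /var/lib/dpkg/status. It is a simplified sketch and not the code of the linked project or of Clair: the function names are hypothetical, and the file contents are assumed to have already been read out of layer.tar.

```go
package main

import (
	"bufio"
	"strings"
)

// namespaceFromOSRelease derives a "<id>:<version>" namespace from the
// contents of /etc/os-release. It returns "" if a field is missing.
func namespaceFromOSRelease(content string) string {
	var id, version string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		parts := strings.SplitN(sc.Text(), "=", 2)
		if len(parts) != 2 {
			continue
		}
		value := strings.Trim(parts[1], `"`)
		switch parts[0] {
		case "ID":
			id = value
		case "VERSION_ID":
			version = value
		}
	}
	if id == "" || version == "" {
		return ""
	}
	return id + ":" + version
}

// featuresFromDpkgStatus extracts "package version" pairs from the
// contents of /var/lib/dpkg/status.
func featuresFromDpkgStatus(content string) []string {
	var features []string
	var pkg string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "Package: "):
			pkg = strings.TrimPrefix(line, "Package: ")
		case strings.HasPrefix(line, "Version: ") && pkg != "":
			features = append(features, pkg+" "+strings.TrimPrefix(line, "Version: "))
			pkg = ""
		}
	}
	return features
}
```

The namespace matters because the same package version can be vulnerable on one distribution release and patched on another, so feature versions are only meaningful together with it.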

In-depth analysis of Docker images

(1) Webshell detection

For webshell detection, three approaches can be used.

Approach 1: fuzzy hash

The fuzzy hash algorithm uses the following:

We implemented a Go language binding based on its API: gossdeep

There are two main API functions, one is Fuzzy_hash_file and one is Fuzzy_compare.

1. Extract the fuzzy hash of a file


2. Compare fuzzy hashes


Approach 2: YARA rules engine

Detection based on a YARA rule base


Approach 3: machine learning

Machine learning with a classification algorithm: CNN-Text-Classification

(2) Trojan virus detection

We know that the open source antivirus engine ClamAV has a very powerful virus database, which mainly contains:

1) MD5 hashes of known malicious binary files

2) MD5 hashes of PE (the Windows executable file format) sections

3) Hexadecimal signatures (shellcode)

4) Archive metadata signatures

5) A whitelist database of known legitimate files

We can convert ClamAV's virus database to YARA rules for malicious code identification. Trojan detection can also be performed using open source YARA rules.
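The first signature type in the list, the MD5 hash of a known malicious file, is simple enough to check directly. The sketch below does a plain hash lookup instead of going through YARA, so it is a substitute technique and not the conversion described above. It assumes signature lines in the "MD5:size:name" form used by ClamAV hash databases; the names signature, parseHashSignatures and match are hypothetical.

```go
package main

import (
	"bufio"
	"crypto/md5"
	"encoding/hex"
	"strconv"
	"strings"
)

// signature is one entry of a hash-based signature list.
type signature struct {
	Size int
	Name string
}

// parseHashSignatures reads lines of the form "MD5:size:name" and
// indexes them by lowercase MD5 hex digest. Lines that do not fit this
// form, including entries without a numeric size, are skipped.
func parseHashSignatures(db string) map[string]signature {
	sigs := make(map[string]signature)
	sc := bufio.NewScanner(strings.NewReader(db))
	for sc.Scan() {
		parts := strings.Split(strings.TrimSpace(sc.Text()), ":")
		if len(parts) < 3 {
			continue
		}
		size, err := strconv.Atoi(parts[1])
		if err != nil {
			continue
		}
		sigs[strings.ToLower(parts[0])] = signature{Size: size, Name: parts[2]}
	}
	return sigs
}

// match returns the signature name for content, or "" if the MD5 digest
// and the size of content do not match any signature.
func match(sigs map[string]signature, content []byte) string {
	sum := md5.Sum(content)
	sig, ok := sigs[hex.EncodeToString(sum[:])]
	if !ok || sig.Size != len(content) {
		return ""
	}
	return sig.Name
}
```

A scanner would call match on every regular file extracted from a layer. Exact hashes only catch unmodified samples, which is why the pattern-based signature types and YARA rules remain necessary.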

(3) Image history analysis

(4) Dynamic scanning

From Docker's configuration file, we can obtain the ports the image exposes. After starting the image in a simulated run, it can be scanned with a conventional vulnerability scanner.
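Reading the exposed ports can be sketched as follows. The example assumes the image configuration JSON stored alongside the layers, in which exposed ports appear under config.ExposedPorts as keys such as "80/tcp". The names imageConfig and exposedPorts are hypothetical, and the sketch only lists the ports; launching the container and scanning it are left out.

```go
package main

import (
	"encoding/json"
	"sort"
)

// imageConfig models only the part of an image configuration JSON that
// lists exposed ports, e.g. {"config":{"ExposedPorts":{"80/tcp":{}}}}.
type imageConfig struct {
	Config struct {
		ExposedPorts map[string]struct{} `json:"ExposedPorts"`
	} `json:"config"`
}

// exposedPorts returns the exposed "port/protocol" entries in sorted
// (lexicographic) order so that the output is stable between runs.
func exposedPorts(data []byte) ([]string, error) {
	var cfg imageConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	ports := make([]string, 0, len(cfg.Config.ExposedPorts))
	for p := range cfg.Config.ExposedPorts {
		ports = append(ports, p)
	}
	sort.Strings(ports)
	return ports, nil
}
```

The resulting list gives the scanner its targets once the container is running, instead of probing the whole port range.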

(5) Call monitoring

Detecting files and system calls with the Docker API

Only some ideas for in-depth analysis are given here; due to space limitations, a detailed presentation will follow in a future article.
