Implementation of Docker Image Scanner
Introduction to Docker images
This post is only meant as a quick starting point to give you some simple ideas.
Before we can scan a Docker image, we first have to understand how Docker images are put together.
A Docker image is a stack of layered file systems. The bottom layer is the bootfs, and everything above it is the rootfs.
The bootfs is the lowest-level boot file system of the image, containing the bootloader and the OS kernel.
The rootfs contains the file system the operating system needs to run, and it serves as the base image.
On top of the base image, further layers are added for software such as emacs, apache, and so on.
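If you want to see these layers for a local image, docker history will list them together with the instruction that created each one, for example docker history nginx:latest.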
How to analyze images
Broadly speaking there are two ways to analyze an image: static analysis and dynamic analysis. The open-source implementations worth referencing are
Clair, which focuses on static analysis, and Weave Scope, which focuses on container correlation analysis and monitoring. Weave Scope, however, does not have much to do with security, so the author will give some ideas for dynamic analysis later in this post.
First, let's look at the well-known Clair. Clair currently supports static analysis of appc and Docker containers.
The overall structure of Clair is as follows.
Clair contains the following core modules.
Fetcher - collects vulnerability data from public sources
Detector - identifies the Features contained in a container image
Image Format - the container image formats Clair knows about, including Docker and ACI
Notification Hook - notifies users/machines when new vulnerabilities are discovered or an existing vulnerability is updated
Databases - stores container layers and vulnerabilities
Worker - each POSTed Layer starts a worker that performs layer detection
Compilation and use
Clair currently has 21 releases. We use the 20th one, namely v2.0.0, for this source code walkthrough.
To reduce errors during compilation, it is recommended to build on Ubuntu, and to make sure that git, bzr, rpm, xz and so on are installed before compiling. Use Go 1.8.3 or higher, and make sure PostgreSQL is installed; the version I am using is 9.5, and I suggest you stay consistent with that.
Build clair with go build github.com/coreos/clair/cmd/clair
Build analyze-local-images with go build github.com/coreos/analyze-local-images
Clair acts as the server and analyze-local-images acts as the client.
Basic usage is as follows: analyze the nginx:latest image via analyze-local-images.
The whole process of interaction between the two can be simplified as follows.
Analyze-local-images source code analysis
When using analyze-local-images, we can specify a number of parameters.
analyze-local-images -endpoint "http://10.28.182.152:6060"
-my-address "10.28.182.151" nginx:latest
where endpoint is the address of the Clair host and my-address is the address of the client running analyze-local-images.
postLayerURI is the route for sending data to the Clair API v1, and getLayerFeaturesURI is the route for fetching vulnerability information from the Clair API v1.
analyze-local-images calls the intMain() function from main(), and intMain first parses the user's input parameters, such as the endpoint mentioned above.
The main execution flow of analyze-local-images is:
main() -> intMain() -> AnalyzeLocalImage() -> analyzeLayer() -> getLayer()
func intMain() int {
	// Parse the command line arguments and assign them to the global variables defined earlier.
	......
	// Create a temporary directory
	tmpPath, err := ioutil.TempDir("", "analyze-local-image-")
	// This creates a folder starting with "analyze-local-image-" under /tmp.
	// To observe the changes under /tmp more clearly, comment out the
	// "defer os.RemoveAll(tmpPath)" line and recompile.
	......
	// Call the AnalyzeLocalImage method to analyze the image
	go func() {
		analyzeCh <- AnalyzeLocalImage(imageName, minSeverity, *endpoint, *myAddress, tmpPath)
	}()
}
The image is extracted into the tmp directory with the following directory structure.
The two main methods analyze-local-images uses to interact with the Clair server are analyzeLayer and getLayer. analyzeLayer sends data to Clair in JSON format, while getLayer requests the analysis result from Clair, decodes the JSON it returns, and formats the output.
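To make that interaction concrete, here is a minimal sketch of the kind of request analyzeLayer issues. The route and the Layer field names follow the Clair v1 API; the helper name, the hard-coded addresses and the status handling are illustrative, not the actual analyze-local-images code.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// postLayerSketch mimics what analyzeLayer does: it tells Clair where to fetch
// one layer.tar and which layer is its parent.
func postLayerSketch(endpoint, layerID, parentID, layerURL string) error {
	payload := map[string]interface{}{
		"Layer": map[string]string{
			"Name":       layerID,  // layer id taken from manifest.json
			"Path":       layerURL, // URL Clair can download layer.tar from
			"ParentName": parentID, // empty string for the bottom layer
			"Format":     "Docker",
		},
	}
	body, err := json.Marshal(payload)
	if err != nil {
		return err
	}
	resp, err := http.Post(endpoint+"/v1/layers", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		return fmt.Errorf("clair returned %s", resp.Status)
	}
	return nil
}

func main() {
	// Illustrative values only; in analyze-local-images they come from the
	// command line and from walking the extracted image.
	err := postLayerSketch("http://10.28.182.152:6060", "layer-id", "",
		"http://10.28.182.151:9279/layer-id/layer.tar")
	fmt.Println(err)
}

With that in mind, here is the relevant part of AnalyzeLocalImage itself: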
func AnalyzeLocalImage(imageName string, minSeverity database.Severity, endpoint, myAddress, tmpPath string) error {
	// Save the image to the tmp directory by calling the save method.
	// save first packages the image into a tar file using "docker save <image name>",
	// then extracts that file into the tmp directory with the tar command.
	err := save(imageName, tmpPath)
	.......
	// Call the historyFromManifest method, which reads manifest.json to get the id of each layer and stores them in layerIDs.
	// If they cannot be obtained from manifest.json, fall back to reading the image history.
	layerIDs, err := historyFromManifest(tmpPath)
	if err != nil {
		layerIDs, err = historyFromCommand(imageName)
	}
	......
	// If Clair is not on the local machine, start an HTTP service on analyze-local-images; the default port is 9279.
	......
	// Analyze each layer and send the layer.tar file under each layer to the Clair server.
	err = analyzeLayer(endpoint, tmpPath+"/"+layerIDs[i]+"/layer.tar", layerIDs[i], layerIDs[i-1])
	......
}
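As an aside, historyFromManifest essentially just reads the Layers array out of the manifest.json that docker save writes. A simplified sketch of that idea (my own illustration, not the exact analyze-local-images code):

package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// layerIDsFromManifest reads <tmpPath>/manifest.json (written by docker save)
// and returns the layer ids in order, oldest first.
func layerIDsFromManifest(tmpPath string) ([]string, error) {
	f, err := os.Open(filepath.Join(tmpPath, "manifest.json"))
	if err != nil {
		return nil, err
	}
	defer f.Close()

	// docker save writes a JSON array with one entry per exported image.
	var manifest []struct {
		Layers []string `json:"Layers"` // e.g. "<id>/layer.tar"
	}
	if err := json.NewDecoder(f).Decode(&manifest); err != nil {
		return nil, err
	}
	if len(manifest) == 0 {
		return nil, fmt.Errorf("empty manifest")
	}

	var ids []string
	for _, entry := range manifest[0].Layers {
		ids = append(ids, strings.TrimSuffix(entry, "/layer.tar"))
	}
	return ids, nil
}

func main() {
	// The path is illustrative; it would be the temporary directory created earlier.
	ids, err := layerIDsFromManifest("/tmp/analyze-local-image-123456789")
	fmt.Println(ids, err)
}

Back in AnalyzeLocalImage, the remainder of the function fetches and prints the results: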
func AnalyzeLocalImage(imageName string, minSeverity database.Severity, endpoint, myAddress, tmpPath string) error {
	......
	// Get the vulnerability information
	layer, err := getLayer(endpoint, layerIDs[len(layerIDs)-1])
	// Print the vulnerability report
	......
	for _, feature := range layer.Features {
		if len(feature.Vulnerabilities) > 0 {
			for _, vulnerability := range feature.Vulnerabilities {
				severity := database.Severity(vulnerability.Severity)
				isSafe = false
				if minSeverity.Compare(severity) > 0 {
					continue
				}
				hasVisibleVulnerabilities = true
				vulnerabilities = append(vulnerabilities, vulnerabilityInfo{vulnerability, feature, severity})
			}
		}
	}
	// Sort and prettify the output report
	.....
}
At this point we have walked through the source code of analyze-local-images, and as you can see, what it does is simple:
it sends each layer.tar to Clair, then fetches Clair's analysis results through the API and prints them locally.
Clair source code dissection
After analyze-local-images sends the layer.tar file, it is mainly processed by the ProcessLayer method in /worker.go.
Let's start with a brief look at Clair's directory structure; we only need to focus on the annotated folders.
--api            // API interface
--cmd            // server main program
--contrib
--database       // database related
--Documentation
--ext            // extension functions
--pkg            // general-purpose helpers
--testdata
--vendor
To be able to understand Clair in depth, we still have to start our analysis with its main function.
/cmd/clair/main.go
func main() {
	// Parse command line arguments; by default the database configuration is read from /etc/clair/config.yaml
	......
	// Load the configuration file
	config, err := LoadConfig(*flagConfigPath)
	if err != nil {
		log.WithError(err).Fatal("failed to load configuration")
	}
	// Initialize the logging system
	......
	// Start clair
	Boot(config)
}
/cmd/clair/main.go
func Boot(config *Config) {
	......
	// Open the database
	db, err := database.Open(config.Database)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Start the notifier service
	st.Begin()
	go clair.RunNotifier(config.Notifier, db, st)

	// Start clair's REST API service
	st.Begin()
	go api.Run(config.API, db, st)

	// Start clair's health check service
	st.Begin()
	go api.RunHealth(config.API, db, st)

	// Start the updater service
	st.Begin()
	go clair.RunUpdater(config.Updater, db, st)

	// Wait for interruption and shut down gracefully.
	waitForSignals(syscall.SIGINT, syscall.SIGTERM)
	log.Info("Received interruption, gracefully stopping ...")
	st.Stop()
}
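The st.Begin()/st.Stop() calls implement a simple stopper pattern: every background service registers itself, Boot blocks until a signal arrives, and then all services are asked to stop and waited for. Stripped of Clair specifics, the idea looks roughly like this standard-library-only sketch (not Clair's actual stopper package):

package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

// stopper tracks running services and lets them be shut down together.
type stopper struct {
	wg   sync.WaitGroup
	stop chan struct{}
}

func newStopper() *stopper { return &stopper{stop: make(chan struct{})} }

func (s *stopper) Begin()                { s.wg.Add(1) }    // a service is starting
func (s *stopper) End()                  { s.wg.Done() }    // a service has exited
func (s *stopper) Chan() <-chan struct{} { return s.stop }  // closed when stopping
func (s *stopper) Stop()                 { close(s.stop); s.wg.Wait() }

func runService(name string, s *stopper) {
	defer s.End()
	for {
		select {
		case <-s.Chan():
			fmt.Println(name, "stopping")
			return
		case <-time.After(time.Second):
			// periodic work, e.g. poll for new vulnerability data
		}
	}
}

func main() {
	s := newStopper()
	for _, name := range []string{"notifier", "api", "updater"} {
		s.Begin()
		go runService(name, s)
	}

	// Wait for SIGINT/SIGTERM, then stop everything gracefully.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
	<-sig
	s.Stop()
}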
After api.Run is executed, Clair opens its REST service.
/api/api.go
func Run(cfg *Config, store database.Datastore, st *stopper.Stopper) {
	defer st.End()
	// Do not start the service if the configuration is empty
	......
	srv := &graceful.Server{
		Timeout:          0,    // Already handled by our TimeOut middleware
		NoSignalHandling: true, // We want to use our own Stopper
		Server: &http.Server{
			Addr:      ":" + strconv.Itoa(cfg.Port),
			TLSConfig: tlsConfig,
			Handler:   http.TimeoutHandler(newAPIHandler(cfg, store), cfg.Timeout, timeoutResponse),
		},
	}
	// Start the HTTP service
	listenAndServeWithStopper(srv, st, cfg.CertFile, cfg.KeyFile)
	log.Info("main API stopped")
}
The call to newAPIHandler in api.Run builds the handler that serves all API requests.
/api/router.go
func newAPIHandler(cfg *Config, store database.Datastore) http.Handler {
	router := make(router)
	router["/v1"] = v1.NewRouter(store, cfg.PaginationKey)
	return router
}
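The router type here is essentially a map from a version prefix such as "/v1" to a sub-router, and its ServeHTTP strips the prefix before dispatching. A minimal, simplified sketch of that idea (not Clair's actual /api/router.go):

package main

import (
	"fmt"
	"net/http"
	"strings"
)

// versionRouter maps an API version prefix such as "/v1" to the handler for
// that version, mirroring the idea behind Clair's router type.
type versionRouter map[string]http.Handler

func (vr versionRouter) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	for prefix, h := range vr {
		if strings.HasPrefix(r.URL.Path, prefix) {
			// Strip the version prefix so the sub-router sees "/layers", etc.
			r.URL.Path = strings.TrimPrefix(r.URL.Path, prefix)
			h.ServeHTTP(w, r)
			return
		}
	}
	http.NotFound(w, r)
}

func main() {
	v1 := http.NewServeMux()
	v1.HandleFunc("/layers", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "v1 layers endpoint")
	})

	router := versionRouter{"/v1": v1}
	fmt.Println(http.ListenAndServe(":6060", router))
}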
All the routes handled by this router are registered in /api/v1/router.go:
func NewRouter(store database.Datastore, paginationKey string) *httprouter.Router {
	router := httprouter.New()
	ctx := &context{store, paginationKey}

	// Layers
	router.POST("/layers", httpHandler(postLayer, ctx))
	router.GET("/layers/:layerName", httpHandler(getLayer, ctx))
	router.DELETE("/layers/:layerName", httpHandler(deleteLayer, ctx))

	// Namespaces
	router.GET("/namespaces", httpHandler(getNamespaces, ctx))

	// Vulnerabilities
	router.GET("/namespaces/:namespaceName/vulnerabilities", httpHandler(getVulnerabilities, ctx))
	router.POST("/namespaces/:namespaceName/vulnerabilities", httpHandler(postVulnerability, ctx))
	router.GET("/namespaces/:namespaceName/vulnerabilities/:vulnerabilityName", httpHandler(getVulnerability, ctx))
	router.PUT("/namespaces/:namespaceName/vulnerabilities/:vulnerabilityName", httpHandler(putVulnerability, ctx))
	router.DELETE("/namespaces/:namespaceName/vulnerabilities/:vulnerabilityName", httpHandler(deleteVulnerability, ctx))

	// Fixes
	router.GET("/namespaces/:namespaceName/vulnerabilities/:vulnerabilityName/fixes", httpHandler(getFixes, ctx))
	router.PUT("/namespaces/:namespaceName/vulnerabilities/:vulnerabilityName/fixes/:fixName", httpHandler(putFix, ctx))
	router.DELETE("/namespaces/:namespaceName/vulnerabilities/:vulnerabilityName/fixes/:fixName", httpHandler(deleteFix, ctx))

	// Notifications
	router.GET("/notifications/:notificationName", httpHandler(getNotification, ctx))
	router.DELETE("/notifications/:notificationName", httpHandler(deleteNotification, ctx))

	// Metrics
	router.GET("/metrics", httpHandler(getMetrics, ctx))

	return router
}
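Of these routes, analyze-local-images only needs POST /layers and GET /layers/:layerName. The GET route accepts the features and vulnerabilities query parameters, so fetching a layer's report looks roughly like the following sketch (a simplified illustration of what getLayer does, not its actual code):

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// getLayerReport asks Clair for the features and vulnerabilities of one layer.
// The route and query parameters are from the Clair v1 API; the rest is a
// simplified illustration.
func getLayerReport(endpoint, layerID string) error {
	url := endpoint + "/v1/layers/" + layerID + "?features&vulnerabilities"
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// analyze-local-images decodes this JSON into the v1 API's layer envelope
	// structures; here we just dump the raw response.
	_, err = io.Copy(os.Stdout, resp.Body)
	return err
}

func main() {
	// Illustrative values: Clair's address and the id of the top layer.
	if err := getLayerReport("http://10.28.182.152:6060", "layer-id"); err != nil {
		fmt.Println(err)
	}
}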
The concrete handlers live in /api/v1/routes.go.
The layer.tar file sent by analyze-local-images, for example, is eventually handed off to the postLayer method for processing.
func postLayer(w http.ResponseWriter, r *http.Request, p httprouter.Params, ctx *context) (string, int) {
	......
	err = clair.ProcessLayer(ctx.Store, request.Layer.Format, request.Layer.Name, request.Layer.ParentName, request.Layer.Path, request.Layer.Headers)
	......
}
The ProcessLayer method is defined in /worker.go.
func ProcessLayer(datastore database.Datastore, imageFormat, name, parentName, path string, headers map[string]string) error {
	// Validate the parameters
	......
	// Check whether the layer is already stored
	layer, err := datastore.FindLayer(name, false, false)
	if err != nil && err != commonerr.ErrNotFound {
		return err
	}
	// If the layer exists and its recorded engine version is greater than or equal to
	// the current worker version (3), it has already been analyzed and we return early.
	// Otherwise detectContent parses the layer's data.
	// Analyze the content.
	layer.Namespace, layer.Features, err = detectContent(imageFormat, name, path, headers, layer.Parent)
	if err != nil {
		return err
	}
	return datastore.InsertLayer(layer)
}
The detectContent method looks like this:
func detectContent(imageFormat, name, path string, headers map[string]string, parent *database.Layer) (namespace *database.Namespace, featureVersions []database.FeatureVersion, err error) {
	......
	// Detect the namespace
	namespace, err = detectNamespace(name, files, parent)
	if err != nil {
		return
	}
	// Parse the feature versions
	featureVersions, err = detectFeatureVersions(name, files, namespace, parent)
	if err != nil {
		return
	}
	......
	return
}
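To give a feel for what "parsing feature versions" means in practice: for Debian-family layers Clair reads /var/lib/dpkg/status out of the layer tarball and extracts package names and versions. The sketch below is my own much-simplified illustration of that idea, not Clair's actual feature lister:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// listDpkgPackages extracts package name/version pairs from a dpkg status
// file such as the /var/lib/dpkg/status found inside a Debian/Ubuntu layer.
func listDpkgPackages(statusPath string) (map[string]string, error) {
	f, err := os.Open(statusPath)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	pkgs := make(map[string]string)
	var current string
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		switch {
		case strings.HasPrefix(line, "Package: "):
			current = strings.TrimPrefix(line, "Package: ")
		case strings.HasPrefix(line, "Version: ") && current != "":
			pkgs[current] = strings.TrimPrefix(line, "Version: ")
			current = ""
		}
	}
	return pkgs, scanner.Err()
}

func main() {
	// The path is illustrative: in a real scanner it points into an extracted layer.
	pkgs, err := listDpkgPackages("/var/lib/dpkg/status")
	fmt.Println(len(pkgs), "packages found, err:", err)
}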
A simple implementation of a static scanner for Docker images
Building on the source code analysis of analyze-local-images and Clair above, we can implement a simple static Docker image scanner that analyzes an image layer by layer and outputs the software feature versions it finds. Doing so is a good way to understand how Clair works.
The GitHub link is given directly here:
https://github.com/MXi4oyu/DockerXScan/releases/tag/0.1
Interested readers can download it and test it for themselves.
The simple architecture of the Docker image static scanner is given here.
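As a rough end-to-end outline, the skeleton below shows the first step such a scanner performs: export the image with docker save and unpack it, after which the manifest and dpkg sketches shown earlier take over. The image name and directory handling are illustrative only, not the linked project's code.

package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"os/exec"
)

// saveImage shells out to the docker CLI: docker save writes the image as a
// tarball, and tar extracts it into tmpPath for layer-by-layer inspection.
func saveImage(image, tmpPath string) error {
	tarball := tmpPath + "/image.tar"
	if out, err := exec.Command("docker", "save", "-o", tarball, image).CombinedOutput(); err != nil {
		return fmt.Errorf("docker save: %v: %s", err, out)
	}
	if out, err := exec.Command("tar", "-xf", tarball, "-C", tmpPath).CombinedOutput(); err != nil {
		return fmt.Errorf("tar: %v: %s", err, out)
	}
	return nil
}

func main() {
	tmpPath, err := ioutil.TempDir("", "simple-scan-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(tmpPath)

	if err := saveImage("nginx:latest", tmpPath); err != nil {
		panic(err)
	}
	// Next steps, as sketched earlier in this post:
	//   1. read the layer order from manifest.json
	//   2. walk each <id>/layer.tar
	//   3. collect feature versions (e.g. dpkg packages) per layer
	//   4. match the features against a vulnerability database
	fmt.Println("image exported to", tmpPath)
}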
In-depth analysis of Docker images
(1) Webshell detection
For webshell detection, we can use three approaches.
Approach 1: fuzzy hashing
The fuzzy hash implementation used is ssdeep: https://ssdeep-project.github.io
We implemented a Go language binding on top of its API: gossdeep
It exposes two main API functions: Fuzzy_hash_file and Fuzzy_compare.
1. Extract a file's fuzzy hash
Fuzzy_hash_file("/var/www/shell.php")
2. Compare two fuzzy hashes
Fuzzy_compare("3:YD6xL4fYvn:Y2xMwvn","3:YD6xL4fYvn:Y2xMwvk")
Approach 2: YARA rules engine
Detection based on a YARA rule base:
Yara("./libs/php.yar","/var/www/")
Approach 3: machine learning
Machine learning classification algorithms, e.g. CNN text classification:
https://github.com/dennybritz/cnn-text-classification-tf/
(2) Trojan virus detection
We know that the open-source antivirus engine ClamAV has a very extensive signature database, consisting mainly of:
1) MD5 hashes of known malicious binary files
2) MD5 hashes of PE (the Windows executable file format) sections
3) Hexadecimal signatures (shellcode)
4) Archive metadata signatures
5) A whitelist database of known legitimate files
We can convert ClamAV's signature database into YARA rules for malicious code identification, or perform trojan detection directly with open-source YARA rules.
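Even without converting the signatures, you can simply run ClamAV's own scanner over the extracted layers. A minimal sketch that shells out to clamscan (assuming ClamAV is installed; the directory is a placeholder for an extracted image):

package main

import (
	"fmt"
	"os/exec"
)

// clamScan runs clamscan recursively over dir and reports only infected files.
// clamscan exits with code 1 when something is found, so a non-nil err together
// with output is not necessarily a failure.
func clamScan(dir string) (string, error) {
	out, err := exec.Command("clamscan", "-r", "--infected", dir).CombinedOutput()
	return string(out), err
}

func main() {
	report, err := clamScan("/tmp/analyze-local-image-123456789")
	fmt.Println(report)
	if err != nil {
		fmt.Println("clamscan exit:", err)
	}
}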
(3) Image history analysis
(4) Dynamic scanning
From the Docker image's configuration we can get the ports it exposes. After a simulated run of the image, it can be scanned with an ordinary vulnerability scanner.
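For example, the exposed ports can be read straight from the image configuration with docker inspect; a small sketch:

package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// exposedPorts asks docker for the ExposedPorts section of an image's config,
// e.g. {"80/tcp":{}} for nginx:latest.
func exposedPorts(image string) ([]string, error) {
	out, err := exec.Command("docker", "inspect",
		"--format", "{{json .Config.ExposedPorts}}", image).Output()
	if err != nil {
		return nil, err
	}
	var ports map[string]struct{}
	if err := json.Unmarshal(out, &ports); err != nil {
		return nil, err
	}
	var list []string
	for p := range ports {
		list = append(list, p) // e.g. "80/tcp"
	}
	return list, nil
}

func main() {
	ports, err := exposedPorts("nginx:latest")
	fmt.Println(ports, err)
}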
(5) Call monitoring
Detect file changes and system calls through the Docker API.
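For the file part of this, docker diff (the containers/{id}/changes endpoint of the Docker API) lists everything a running container has added, changed or deleted; a quick sketch (the container name is a placeholder):

package main

import (
	"fmt"
	"os/exec"
)

// containerChanges lists files added (A), changed (C) or deleted (D) inside a
// running container since it was started, using the docker CLI.
func containerChanges(container string) (string, error) {
	out, err := exec.Command("docker", "diff", container).CombinedOutput()
	return string(out), err
}

func main() {
	changes, err := containerChanges("webshell-test")
	fmt.Println(changes, err)
}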
Due to space limitations, only some ideas for in-depth analysis are given here; a detailed write-up will follow in a future article.