Docker Source Code Analysis, Part 4 (based on version 1.8.2): How Docker Images Are Fetched and Stored

Date: 2024-08-10 15:34:02

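I was busy with other things for a while, so this Docker source code analysis was put on hold; today I'm picking it up again. In the previous part we saw that the docker client and the docker daemon (which starts an HTTP server) form a client/server architecture: commands issued on the client side are received and handled by the docker daemon.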

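When we use the docker run command (the same applies to docker build with a Dockerfile) and the image we need is not available locally, the docker daemon first downloads the image and then stores it locally. A Docker image is a rather remarkable thing: it is made up of multiple layers, and each layer's parent layer is the one directly beneath it. The topmost layer (top layer) is readable and writable, and the user's changes take effect in this layer; every layer below the top layer is read-only. This way of stacking layers is itself a kind of filesystem, namely a union filesystem (UnionFS).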
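To make the layering idea concrete, here is a minimal in-memory sketch of how a union view resolves a path: layers are searched from the top down, and the first layer that contains the path wins. The `layer` type and the `lookup` function are hypothetical and model read-only lookup only; real union filesystems such as aufs also handle deletions (whiteouts) and copy-on-write, which this sketch ignores.

```go
package main

import "fmt"

// layer is a hypothetical model of one image layer: path -> file content.
type layer map[string]string

// lookup resolves path against a stack of layers ordered from bottom
// (index 0) to top (last index). The topmost layer containing the path
// wins, which is how upper layers shadow lower ones in a union view.
func lookup(layers []layer, path string) (string, bool) {
	for i := len(layers) - 1; i >= 0; i-- {
		if content, ok := layers[i][path]; ok {
			return content, true
		}
	}
	return "", false
}

func main() {
	base := layer{"/etc/os-release": "ubuntu", "/bin/sh": "dash"}
	top := layer{"/etc/os-release": "patched"} // writable top layer
	stack := []layer{base, top}

	v, _ := lookup(stack, "/etc/os-release")
	fmt.Println(v) // patched: the top layer shadows the base layer
	v, _ = lookup(stack, "/bin/sh")
	fmt.Println(v) // dash: falls through to the base layer
}
```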

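Picking up where the previous part left off, this article analyzes how the docker pull command is implemented, that is, how Docker downloads an image and how it stores it.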

The client side of the docker pull command lives in api/client/pull.go:

func (cli *DockerCli) CmdPull(args ...string) error {
	cmd := Cli.Subcmd("pull", []string{"NAME[:TAG|@DIGEST]"}, "Pull an image or a repository from a registry", true)
	allTags := cmd.Bool([]string{"a", "-all-tags"}, false, "Download all tagged images in the repository")
	addTrustedFlags(cmd, true)
	cmd.Require(flag.Exact, 1)

	cmd.ParseFlags(args, true)
	remote := cmd.Arg(0)

	taglessRemote, tag := parsers.ParseRepositoryTag(remote)
	if tag == "" && !*allTags {
		tag = tags.DefaultTag
		fmt.Fprintf(cli.out, "Using default tag: %s\n", tag)
	} else if tag != "" && *allTags {
		return fmt.Errorf("tag can't be used with --all-tags/-a")
	}

	ref := registry.ParseReference(tag)

	// Resolve the Repository name from fqn to RepositoryInfo
	repoInfo, err := registry.ParseRepositoryInfo(taglessRemote)
	if err != nil {
		return err
	}

	if isTrusted() && !ref.HasDigest() {
		// Check if tag is digest
		authConfig := registry.ResolveAuthConfig(cli.configFile, repoInfo.Index)
		return cli.trustedPull(repoInfo, ref, authConfig)
	}

	v := url.Values{}
	v.Set("fromImage", ref.ImageName(taglessRemote))

	_, _, err = cli.clientRequestAttemptLogin("POST", "/images/create?"+v.Encode(), nil, cli.out, repoInfo.Index, "pull")
	return err
}


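ParseRepositoryTag (pkg/parsers/parsers.go) extracts the tag from the string that follows pull; the remaining part is called taglessRemote. For example:

(a) … is split into two parts: … and 10.04;

(b) ubuntu:10.04 is split into two parts: ubuntu and 10.04;

(c) there is also a digest form (as I understand it, a digest is a summary computed over the image), where sha256 is the method used to compute the digest. A name containing @ is split into the parts to the left and right of the @, namely … and sha256:xxx...

If the extracted tag is empty and the all-tags flag (-a/--all-tags) is not set, the tag falls back to its default value, latest. A tag and --all-tags cannot be given together.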
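The splitting rules above can be sketched as follows. This is a simplified re-implementation written for illustration, not the actual parsers.ParseRepositoryTag code; the names splitRepoTag and resolveTag are hypothetical. The one subtle point it models is that a colon only introduces a tag if no / follows it, so a registry port such as localhost:5000 is not mistaken for a tag.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitRepoTag splits NAME[:TAG|@DIGEST] into the tagless name and the tag
// or digest (hypothetical helper, simplified for illustration).
func splitRepoTag(remote string) (string, string) {
	// Digest form: everything after '@' is the digest, e.g. "sha256:...".
	if i := strings.Index(remote, "@"); i >= 0 {
		return remote[:i], remote[i+1:]
	}
	// Tag form: the last ':' starts a tag unless a '/' follows it,
	// in which case the ':' belonged to a registry host:port.
	i := strings.LastIndex(remote, ":")
	if i < 0 {
		return remote, ""
	}
	if tag := remote[i+1:]; !strings.Contains(tag, "/") {
		return remote[:i], tag
	}
	return remote, ""
}

// resolveTag applies the CmdPull rules: default to "latest" when no tag is
// given and --all-tags is off; reject a tag combined with --all-tags.
func resolveTag(tag string, allTags bool) (string, error) {
	if tag == "" && !allTags {
		return "latest", nil
	}
	if tag != "" && allTags {
		return "", errors.New("tag can't be used with --all-tags/-a")
	}
	return tag, nil
}

func main() {
	for _, s := range []string{"ubuntu:10.04", "ubuntu", "localhost:5000/foo/bar", "ubuntu@sha256:abc"} {
		name, tag := splitRepoTag(s)
		fmt.Printf("%q -> name=%q tag=%q\n", s, name, tag)
	}
	tag, _ := resolveTag("", false)
	fmt.Println(tag) // latest
}
```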

ref := registry.ParseReference(tag) (registry/reference.go) converts the extracted tag into an internal tagReference or digestReference.

repoInfo, err := registry.ParseRepositoryInfo(taglessRemote) (registry/config.go) converts taglessRemote into a RepositoryInfo struct.

RepositoryInfo (registry/types.go) describes everything about an image except its tag, possibly including a URL path. It is defined as follows:

type RepositoryInfo struct {
	Index         *IndexInfo // registry information
	RemoteName    string     // "library/ubuntu-12.04-base"
	LocalName     string     // "ubuntu-12.04-base"
	CanonicalName string     // ""
	Official      bool       // true for a name like ubuntu, false for a name like xxx/ubuntu
}


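The differences between the three names are as shown in the code comments. Index is another struct (registry/types.go) that describes a registry; it mainly contains Name (the registry's name, e.g. the official registry) and Mirrors (this registry's mirrors, represented as a list of URLs):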


// RepositoryInfo Examples:
// {
//   "Index" : {
//     "Name" : "",
//     "Mirrors" : ["", ""],
//     "Secure" : true,
//     "Official" : true,
//   },
//   "RemoteName" : "library/debian",
//   "LocalName" : "debian",
//   "CanonicalName" : ""
//   "Official" : true,
// }
//
// {
//   "Index" : {
//     "Name" : "",
//     "Mirrors" : [],
//     "Secure" : false,
//     "Official" : false,
//   },
//   "RemoteName" : "user/repo",
//   "LocalName" : "",
//   "CanonicalName" : "",
//   "Official" : false,
// }
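The examples suggest how LocalName and RemoteName relate on the official index: an official image such as debian gets the library/ namespace on the remote side, while a name that already has a namespace keeps it and is not official. A minimal sketch of just that rule, assuming nothing beyond what the examples and the Official comment show (remoteNameFor is hypothetical; the real registry code also handles index names and validation, which this ignores):

```go
package main

import (
	"fmt"
	"strings"
)

// remoteNameFor derives a RemoteName from a LocalName on the official index
// (hypothetical helper). Names without a namespace get "library/"; the
// second result mirrors the Official field in the examples.
func remoteNameFor(localName string) (string, bool) {
	if !strings.Contains(localName, "/") {
		return "library/" + localName, true
	}
	return localName, false
}

func main() {
	fmt.Println(remoteNameFor("debian"))    // library/debian true
	fmt.Println(remoteNameFor("user/repo")) // user/repo false
}
```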

If the registry the docker daemon will contact later requires authentication, the credentials for that registry, authConfig, are looked up from repoInfo.Index and cli.configFile (api/client/cli.go). AuthConfig is defined in cliconfig/config.go:

type AuthConfig struct {
	Username      string `json:"username,omitempty"`
	Password      string `json:"password,omitempty"`
	Auth          string `json:"auth"`
	Email         string `json:"email"`
	ServerAddress string `json:"serveraddress,omitempty"`
}
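In the Docker client config file, the Auth field holds the base64 encoding of "username:password". A minimal sketch of encoding and decoding that value with the standard library; encodeAuth and decodeAuth are hypothetical names for illustration, not the cliconfig API.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// encodeAuth builds the value stored in AuthConfig.Auth (hypothetical helper).
func encodeAuth(username, password string) string {
	return base64.StdEncoding.EncodeToString([]byte(username + ":" + password))
}

// decodeAuth reverses encodeAuth. It splits on the first ':' only, so a
// password that itself contains ':' survives the round trip.
func decodeAuth(auth string) (string, string, error) {
	raw, err := base64.StdEncoding.DecodeString(auth)
	if err != nil {
		return "", "", err
	}
	parts := strings.SplitN(string(raw), ":", 2)
	if len(parts) != 2 {
		return "", "", errors.New("invalid auth field")
	}
	return parts[0], parts[1], nil
}

func main() {
	auth := encodeAuth("user", "pass")
	fmt.Println(auth) // dXNlcjpwYXNz
	user, pass, err := decodeAuth(auth)
	fmt.Println(user, pass, err) // user pass <nil>
}
```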


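If content trust is enabled and the reference is not a digest, trustedPull (api/client/trust.go) is called next. In the end trustedPull, too, goes through the REST API; without content trust, CmdPull calls

_, _, err = cli.clientRequestAttemptLogin("POST", "/images/create?"+v.Encode(), nil, cli.out, repoInfo.Index, "pull")

directly. Either way, the pull request is sent to the docker server for handling.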
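The request the client sends is an ordinary HTTP POST whose query string carries the image name. A minimal sketch of building that path with net/url, mirroring the v.Set("fromImage", ...) and v.Encode() calls in CmdPull; pullPath is a hypothetical helper, and the real client also adds an API version prefix and authentication headers, which are omitted here.

```go
package main

import (
	"fmt"
	"net/url"
)

// pullPath builds the path and query string of the pull request
// (hypothetical helper mirroring CmdPull).
func pullPath(imageName string) string {
	v := url.Values{}
	v.Set("fromImage", imageName)
	// Encode escapes reserved characters in the query, so ':' becomes %3A.
	return "/images/create?" + v.Encode()
}

func main() {
	fmt.Println(pullPath("ubuntu:14.04")) // /images/create?fromImage=ubuntu%3A14.04
}
```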

As covered in the first two parts of this series, the docker server handler for the pull request is postImagesCreate (api/server/image.go).

// Creates an image from Pull or from Import
func (s *Server) postImagesCreate(version version.Version, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
	// ...
	if image != "" { //pull
		if tag == "" {
			image, tag = parsers.ParseRepositoryTag(image)
		}
		metaHeaders := map[string][]string{}
		for k, v := range r.Header {
			if strings.HasPrefix(k, "X-Meta-") {
				metaHeaders[k] = v
			}
		}

		imagePullConfig := &graph.ImagePullConfig{
			MetaHeaders: metaHeaders,
			AuthConfig:  authConfig,
			OutStream:   output,
		}

		err = s.daemon.Repositories().Pull(image, tag, imagePullConfig)
	} else { //import
		if tag == "" {
			repo, tag = parsers.ParseRepositoryTag(repo)
		}

		src := r.Form.Get("fromSrc")

		// 'err' MUST NOT be defined within this block, we need any error
		// generated from the download to be available to the output
		// stream processing below
		var newConfig *runconfig.Config
		newConfig, err = builder.BuildFromConfig(s.daemon, &runconfig.Config{}, r.Form["changes"])
		if err != nil {
			return err
		}

		err = s.daemon.Repositories().Import(src, repo, tag, message, r.Body, output, newConfig)
	}
	if err != nil {
		if !output.Flushed() {
			return err
		}
		sf := streamformatter.NewJSONStreamFormatter()
		// ...
	}

	return nil
}


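Only the important parts of postImagesCreate are shown; the parts replaced by ellipses mainly extract parameters such as the image name from the HTTP request. When image is non-empty, the docker server itself has to talk to the docker registry over HTTP to download the image, so it first packages imagePullConfig, which is used when communicating with the registry. It then calls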

err = s.daemon.Repositories().Pull(image, tag, imagePullConfig)
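The X-Meta- header filtering in postImagesCreate is a small, self-contained pattern. Sketched on its own with net/http (filterMetaHeaders is a hypothetical name): http.Header stores keys in canonical form, which is why a case-sensitive prefix match on "X-Meta-" still works when a client sends lower-case header names.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// filterMetaHeaders keeps only headers whose canonical name starts with
// "X-Meta-", copying them into a plain map (hypothetical helper).
func filterMetaHeaders(h http.Header) map[string][]string {
	meta := map[string][]string{}
	for k, v := range h {
		if strings.HasPrefix(k, "X-Meta-") {
			meta[k] = v
		}
	}
	return meta
}

func main() {
	h := http.Header{}
	h.Set("x-meta-build", "42") // Set canonicalizes the key to "X-Meta-Build"
	h.Set("Content-Type", "application/json")
	fmt.Println(filterMetaHeaders(h)) // map[X-Meta-Build:[42]]
}
```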

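s.daemon.Repositories() (daemon/daemon.go) returns a *graph.TagStore (graph/tags.go). TagStore is an important type: it holds the Graph used to store images and manages the various repositories, while pullingPool and pushingPool ensure that the same image is not pulled or pushed by two operations at the same time: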

type TagStore struct {
	path  string
	graph *Graph
	// Repositories is a map of repositories, indexed by name.
	Repositories map[string]Repository
	trustKey     libtrust.PrivateKey
	// ...
	// FIXME: move push/pull-related fields
	// to a helper type
	pullingPool     map[string]chan struct{}
	pushingPool     map[string]chan struct{}
	registryService *registry.Service
	eventsService   *events.Events
	trustService    *trust.Store
}


Next comes TagStore's Pull() method:

func (s *TagStore) Pull(image string, tag string, imagePullConfig *ImagePullConfig) error {
	// ...
	for _, endpoint := range endpoints {
		logrus.Debugf("Trying to pull %s from %s %s", repoInfo.LocalName, endpoint.URL, endpoint.Version)

		if !endpoint.Mirror && (endpoint.Official || endpoint.Version == registry.APIVersion2) {
			if repoInfo.Official {
				// ...
			}
		}

		puller, err := NewPuller(s, endpoint, repoInfo, imagePullConfig, sf)
		if err != nil {
			lastErr = err
			continue
		}
		if fallback, err := puller.Pull(tag); err != nil {
			if fallback {
				if _, ok := err.(registry.ErrNoSupport); !ok {
					// Because we found an error that's not ErrNoSupport, discard all subsequent ErrNoSupport errors.
					discardNoSupportErrors = true
					// save the current error
					lastErr = err
				} else if !discardNoSupportErrors {
					// Save the ErrNoSupport error, because it's either the first error or all encountered errors
					// were also ErrNoSupport errors.
					lastErr = err
				}
				continue
			}
			logrus.Debugf("Not continuing with error: %v", err)
			return err
		}

		s.eventsService.Log("pull", logName, "")
		return nil
	}
	// ...
}





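For each endpoint, a Puller is created with puller, err := NewPuller(s, endpoint, repoInfo, imagePullConfig, sf) to start pulling the image; sf is simply a JSON stream formatter. If the puller fails with an error that allows fallback, the loop moves on to the next endpoint; any other error is returned immediately.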
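The endpoint loop in TagStore.Pull boils down to: try each endpoint in turn, move on when the puller says falling back is allowed, and stop at the first success or at the first error that does not allow fallback. A minimal sketch under simplified assumptions: the endpoint type, errNoSupport and the pull callback are hypothetical, and the ErrNoSupport bookkeeping of the real code is reduced to remembering the last error.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoSupport is a hypothetical error meaning "this endpoint cannot serve
// the request, try another one".
var errNoSupport = errors.New("endpoint does not support this operation")

// endpoint is a hypothetical, trimmed-down registry endpoint.
type endpoint struct {
	URL     string
	Version int
}

// pullFromEndpoints tries endpoints in order. pull models puller.Pull: it
// reports whether falling back to the next endpoint is allowed on error.
func pullFromEndpoints(endpoints []endpoint, pull func(endpoint) (bool, error)) (endpoint, error) {
	lastErr := errors.New("no endpoints found")
	for _, ep := range endpoints {
		fallback, err := pull(ep)
		if err == nil {
			return ep, nil // first success wins
		}
		if !fallback {
			return endpoint{}, err // not continuing with this error
		}
		lastErr = err // remember it and try the next endpoint
	}
	return endpoint{}, lastErr
}

func main() {
	eps := []endpoint{{"https://v2.registry.invalid", 2}, {"https://v1.registry.invalid", 1}}
	ep, err := pullFromEndpoints(eps, func(e endpoint) (bool, error) {
		if e.Version == 2 {
			return true, errNoSupport // allow fallback to the v1 endpoint
		}
		return false, nil
	})
	fmt.Println(ep.URL, err) // https://v1.registry.invalid <nil>
}
```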

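NewPuller looks at the endpoint (following REST API design, the endpoint URL carries the API version) to decide whether to use the v1 or v2 protocol. I will mainly analyze v2, which is implemented in graph/pull_v2.go: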

func (p *v2Puller) Pull(tag string) (fallback bool, err error) {
	// TODO(tiborvass): was ReceiveTimeout
	p.repo, err = NewV2Repository(p.repoInfo, p.endpoint, p.config.MetaHeaders, p.config.AuthConfig)
	if err != nil {
		logrus.Debugf("Error getting v2 registry: %v", err)
		return true, err
	}

	p.sessionID = stringid.GenerateRandomID()

	if err := p.pullV2Repository(tag); err != nil {
		if registry.ContinueOnError(err) {
			logrus.Debugf("Error trying v2 registry: %v", err)
			return true, err
		}
		return false, err
	}
	return false, nil
}

func (p *v2Puller) pullV2Repository(tag string) (err error) {
	var tags []string
	taggedName := p.repoInfo.LocalName
	if len(tag) > 0 {
		tags = []string{tag}
		taggedName = utils.ImageReference(p.repoInfo.LocalName, tag)
	} else {
		var err error

		manSvc, err := p.repo.Manifests(context.Background())
		if err != nil {
			return err
		}

		tags, err = manSvc.Tags()
		if err != nil {
			return err
		}
	}

	c, err := p.poolAdd("pull", taggedName)
	if err != nil {
		if c != nil {
			// Another pull of the same repository is already taking place; just wait for it to finish
			p.config.OutStream.Write(p.sf.FormatStatus("", "Repository %s already being pulled by another client. Waiting.", p.repoInfo.CanonicalName))
			<-c
			return nil
		}
		return err
	}
	defer p.poolRemove("pull", taggedName)

	var layersDownloaded bool
	for _, tag := range tags {
		// pulledNew is true if either new layers were downloaded OR if existing images were newly tagged
		// TODO(tiborvass): should we change the name of `layersDownload`? What about message in WriteStatus?
		pulledNew, err := p.pullV2Tag(tag, taggedName)
		if err != nil {
			return err
		}
		layersDownloaded = layersDownloaded || pulledNew
	}

	writeStatus(taggedName, p.config.OutStream, p.sf, layersDownloaded)

	return nil
}



Let's look at the function behind c, err := p.poolAdd("pull", taggedName) (graph/tags.go):

func (store *TagStore) poolAdd(kind, key string) (chan struct{}, error) {
	store.Lock()
	defer store.Unlock()

	if c, exists := store.pullingPool[key]; exists {
		return c, fmt.Errorf("pull %s is already in progress", key)
	}
	if c, exists := store.pushingPool[key]; exists {
		return c, fmt.Errorf("push %s is already in progress", key)
	}

	c := make(chan struct{})
	switch kind {
	case "pull":
		store.pullingPool[key] = c
	case "push":
		store.pushingPool[key] = c
	default:
		return nil, fmt.Errorf("Unknown pool type")
	}
	return c, nil
}
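The pool logic can be reproduced in isolation. In this sketch the pool type and its add/remove methods are hypothetical stand-ins for the TagStore fields and poolAdd/poolRemove, and it assumes that removing a key closes its channel. Under that assumption, a second caller for the same key gets the existing channel back together with an error and can block on it until the first operation finishes, which is how pullV2Repository waits for another client's pull of the same repository.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a hypothetical stand-in for TagStore's pullingPool/pushingPool.
type pool struct {
	sync.Mutex
	pulling map[string]chan struct{}
	pushing map[string]chan struct{}
}

func newPool() *pool {
	return &pool{pulling: map[string]chan struct{}{}, pushing: map[string]chan struct{}{}}
}

// add registers key; if a pull or push of key is already running it returns
// that operation's channel together with an error.
func (p *pool) add(kind, key string) (chan struct{}, error) {
	p.Lock()
	defer p.Unlock()
	if c, ok := p.pulling[key]; ok {
		return c, fmt.Errorf("pull %s is already in progress", key)
	}
	if c, ok := p.pushing[key]; ok {
		return c, fmt.Errorf("push %s is already in progress", key)
	}
	c := make(chan struct{})
	switch kind {
	case "pull":
		p.pulling[key] = c
	case "push":
		p.pushing[key] = c
	default:
		return nil, fmt.Errorf("unknown pool type %q", kind)
	}
	return c, nil
}

// remove unregisters key and closes its channel, waking every waiter.
func (p *pool) remove(kind, key string) {
	p.Lock()
	defer p.Unlock()
	var m map[string]chan struct{}
	switch kind {
	case "pull":
		m = p.pulling
	case "push":
		m = p.pushing
	default:
		return
	}
	if c, ok := m[key]; ok {
		close(c)
		delete(m, key)
	}
}

func main() {
	p := newPool()
	p.add("pull", "ubuntu:14.04")
	c, err := p.add("pull", "ubuntu:14.04") // a second client arrives
	fmt.Println(err)
	done := make(chan struct{})
	go func() { <-c; close(done) }() // wait like pullV2Repository does
	p.remove("pull", "ubuntu:14.04")
	<-done
	fmt.Println("waiter released")
}
```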


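As mentioned earlier, poolAdd ensures that at any moment only one pull or push of a given name is in progress; when the download finishes, defer p.poolRemove("pull", taggedName) lifts this restriction. Next comes pullV2Tag, the function that actually performs the download, a rather long piece of code: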

func (p *v2Puller) pullV2Tag(tag, taggedName string) (verified bool, err error) {
	logrus.Debugf("Pulling tag from V2 registry: %q", tag)
	out := p.config.OutStream

	manSvc, err := p.repo.Manifests(context.Background())
	if err != nil {
		return false, err
	}

	manifest, err := manSvc.GetByTag(tag)
	if err != nil {
		return false, err
	}
	verified, err = p.validateManifest(manifest, tag)
	if err != nil {
		return false, err
	}
	if verified {
		logrus.Printf("Image manifest for %s has been verified", taggedName)
	}

	pipeReader, pipeWriter := io.Pipe()
	go func() {
		if _, err := io.Copy(out, pipeReader); err != nil {
			logrus.Errorf("error copying from layer download progress reader: %s", err)
			if err := pipeReader.CloseWithError(err); err != nil {
				logrus.Errorf("error closing the progress reader: %s", err)
			}
		}
	}()
	defer func() {
		if err != nil {
			// All operations on the pipe are synchronous. This call will wait
			// until all current readers/writers are done using the pipe then
			// set the error. All successive reads/writes will return with this
			// error.
			pipeWriter.CloseWithError(errors.New("download canceled"))
		}
	}()

	out.Write(p.sf.FormatStatus(tag, "Pulling from %s", p.repo.Name()))

	var downloads []*downloadInfo

	var layerIDs []string
	defer func() {
		p.graph.Release(p.sessionID, layerIDs...)
	}()

	for i := len(manifest.FSLayers) - 1; i >= 0; i-- {
		img, err := image.NewImgJSON([]byte(manifest.History[i].V1Compatibility))
		if err != nil {
			logrus.Debugf("error getting image v1 json: %v", err)
			return false, err
		}
		p.graph.Retain(p.sessionID, img.ID)
		layerIDs = append(layerIDs, img.ID)

		// Check if exists
		if p.graph.Exists(img.ID) {
			logrus.Debugf("Image already exists: %s", img.ID)
			out.Write(p.sf.FormatProgress(stringid.TruncateID(img.ID), "Already exists", nil))
			continue
		}
		out.Write(p.sf.FormatProgress(stringid.TruncateID(img.ID), "Pulling fs layer", nil))

		d := &downloadInfo{
			img:    img,
			digest: manifest.FSLayers[i].BlobSum,
			// TODO: seems like this chan buffer solved hanging problem in go1.5,
			// this can indicate some deeper problem that somehow we never take
			// error from channel in loop below
			err: make(chan error, 1),
			out: pipeWriter,
		}
		downloads = append(downloads, d)

		// ... (the download of this layer is started in a goroutine here)
	}

	// run clean for all downloads to prevent leftovers
	for _, d := range downloads {
		defer func(d *downloadInfo) {
			if d.tmpFile != nil {
				// ...
				if err := os.RemoveAll(d.tmpFile.Name()); err != nil {
					logrus.Errorf("Failed to remove temp file: %s", d.tmpFile.Name())
				}
			}
		}(d)
	}

	var tagUpdated bool
	for _, d := range downloads {
		if err := <-d.err; err != nil {
			return false, err
		}
		if d.layer == nil {
			continue
		}
		// if tmpFile is empty assume download and extracted elsewhere
		d.tmpFile.Seek(0, 0)
		reader := progressreader.New(progressreader.Config{
			In:        d.tmpFile,
			Out:       out,
			Formatter: p.sf,
			Size:      d.size,
			NewLines:  false,
			ID:        stringid.TruncateID(d.img.ID),
			Action:    "Extracting",
		})

		err = p.graph.Register(d.img, reader)
		if err != nil {
			return false, err
		}

		// FIXME: Pool release here for parallel tag pull (ensures any downloads block until fully extracted)
		out.Write(p.sf.FormatProgress(stringid.TruncateID(d.img.ID), "Pull complete", nil))
		tagUpdated = true
	}

	manifestDigest, _, err := digestFromManifest(manifest, p.repoInfo.LocalName)
	if err != nil {
		return false, err
	}

	// Check for new tag if no layers downloaded
	if !tagUpdated {
		repo, err := p.Get(p.repoInfo.LocalName)
		if err != nil {
			return false, err
		}
		if repo != nil {
			if _, exists := repo[tag]; !exists {
				tagUpdated = true
			}
		} else {
			tagUpdated = true
		}
	}

	if verified && tagUpdated {
		out.Write(p.sf.FormatStatus(p.repo.Name()+":"+tag, "The image you are pulling has been verified. Important: image verification is a tech preview feature and should  not be relied on to provide security."))
	}

	firstID := layerIDs[len(layerIDs)-1]
	if utils.DigestReference(tag) {
		// TODO(stevvooe): Ideally, we should always set the digest so we can
		// use the digest whether we pull by it or not. Unfortunately, the tag
		// store treats the digest as a separate tag, meaning there may be an
		// untagged digest image that would seem to be dangling by a user.
		if err = p.SetDigest(p.repoInfo.LocalName, tag, firstID); err != nil {
			return false, err
		}
	} else {
		// only set the repository/tag -> image ID mapping when pulling by tag (i.e. not by digest)
		if err = p.Tag(p.repoInfo.LocalName, tag, firstID, true); err != nil {
			return false, err
		}
	}

	if manifestDigest != "" {
		out.Write(p.sf.FormatStatus("", "Digest: %s", manifestDigest))
	}

	return tagUpdated, nil
}



The core of pullV2Tag is the loop that walks manifest.FSLayers from the last entry to the first:

for i := len(manifest.FSLayers) - 1; i >= 0; i-- {
	img, err := image.NewImgJSON([]byte(manifest.History[i].V1Compatibility))
	if err != nil {
		logrus.Debugf("error getting image v1 json: %v", err)
		return false, err
	}
	p.graph.Retain(p.sessionID, img.ID)
	layerIDs = append(layerIDs, img.ID)
	// ...
}

Here manifest is a Manifest struct:

type Manifest struct {
	// ...
	// Name is the name of the image's repository
	Name string `json:"name"`
	// Tag is the tag of the image specified by this manifest
	Tag string `json:"tag"`
	// Architecture is the host architecture on which this image is intended to
	// run
	Architecture string `json:"architecture"`
	// FSLayers is a list of filesystem layer blobSums contained in this image
	FSLayers []FSLayer `json:"fsLayers"`
	// History is a list of unstructured historical data for v1 compatibility
	History []History `json:"history"`
}
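Why iterate from the last FSLayers entry to the first, and why is firstID taken from the end of layerIDs? In a v2 schema1 manifest, fsLayers and history list the top layer first (index 0 is the topmost layer, whose v1 ID is the image ID). Walking backwards therefore visits the base layer first, so each layer's parent is handled before the layer itself, and the last ID appended is the top layer. A minimal sketch of this ordering and of the "Already exists" skip; planPull is a hypothetical helper that works on plain ID strings instead of the real manifest types.

```go
package main

import "fmt"

// planPull takes layer IDs in manifest order (top layer first), walks them
// from base to top like pullV2Tag (hypothetical helper), and returns:
//   - order: IDs in the order they are processed (base first)
//   - download: IDs that are not present locally yet
//   - topID: the ID that ends up being tagged (firstID in pullV2Tag)
func planPull(manifestIDs []string, exists func(string) bool) (order, download []string, topID string) {
	for i := len(manifestIDs) - 1; i >= 0; i-- {
		id := manifestIDs[i]
		order = append(order, id)
		if exists(id) {
			continue // "Already exists": nothing to download
		}
		download = append(download, id)
	}
	if len(order) > 0 {
		topID = order[len(order)-1]
	}
	return order, download, topID
}

func main() {
	local := map[string]bool{"base": true}
	order, dl, top := planPull([]string{"app", "libs", "base"}, func(id string) bool { return local[id] })
	fmt.Println(order, dl, top) // [base libs app] [libs app] app
}
```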



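Next, p.graph (graph/graph.go) comes into play. Graph maintains the different versions of image files and the relationships between them; its driver defaults to aufs (daemon/graphdriver/aufs/aufs.go) on hosts that support it: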

type Graph struct {
	root             string
	idIndex          *truncindex.TruncIndex
	driver           graphdriver.Driver
	imageMutex       imageMutex // protect images in driver.
	retained         *retainedLayers
	tarSplitDisabled bool
}



p.graph.Retain(p.sessionID, img.ID)

This adds the sessionID and img.ID to the following data structure in graph:

type retainedLayers struct {
	layerHolders map[string]map[string]struct{} // map[layerID]map[sessionID]
	// ...
}
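A minimal sketch of how a map[layerID]map[sessionID] can back Retain and Release. The method names follow the calls seen in pullV2Tag, but the bodies below are written for illustration and are an assumption, not copied from graph.go; isRetained is a hypothetical helper. A layer counts as retained as long as at least one session still holds it.

```go
package main

import (
	"fmt"
	"sync"
)

// retainedLayers maps layerID -> set of sessionIDs holding that layer.
type retainedLayers struct {
	mu           sync.Mutex
	layerHolders map[string]map[string]struct{}
}

func newRetainedLayers() *retainedLayers {
	return &retainedLayers{layerHolders: map[string]map[string]struct{}{}}
}

// Retain records that sessionID holds each of the given layers.
func (r *retainedLayers) Retain(sessionID string, layerIDs ...string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, id := range layerIDs {
		holders, ok := r.layerHolders[id]
		if !ok {
			holders = map[string]struct{}{}
			r.layerHolders[id] = holders
		}
		holders[sessionID] = struct{}{}
	}
}

// Release drops sessionID's hold; a layer with no holders left is forgotten.
func (r *retainedLayers) Release(sessionID string, layerIDs ...string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, id := range layerIDs {
		if holders, ok := r.layerHolders[id]; ok {
			delete(holders, sessionID)
			if len(holders) == 0 {
				delete(r.layerHolders, id)
			}
		}
	}
}

// isRetained reports whether any session still holds layerID.
func (r *retainedLayers) isRetained(layerID string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	_, ok := r.layerHolders[layerID]
	return ok
}

func main() {
	r := newRetainedLayers()
	r.Retain("s1", "layerA")
	r.Retain("s2", "layerA")
	r.Release("s1", "layerA")
	fmt.Println(r.isRetained("layerA")) // true: s2 still holds it
	r.Release("s2", "layerA")
	fmt.Println(r.isRetained("layerA")) // false
}
```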



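The retainedLayers structure records, for each layer ID, which pull sessions currently hold (retain) that layer, so the layer is kept while those pulls are in progress; the deferred p.graph.Release call in pullV2Tag drops the session's holds when the pull ends. Whether a layer has already been downloaded is a separate check: if p.graph.Exists(img.ID) returns true, the image is already in the graph and the loop simply continues with the next layer; otherwise a downloadInfo for the layer is appended to downloads. Each layer is then downloaded in a goroutine. The download first uses the TagStore pool described earlier to check whether the same layer is already being downloaded; if not, it calls ioutil.TempFile() and downloads the layer content into a temporary file. When the function returns, the deferred functions clean up these temporary files.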
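The temp-file handling described above (download into a temporary file, clean it up in a deferred function) can be sketched with the standard library alone. downloadToTemp and demo are hypothetical: downloadToTemp reads from any io.Reader instead of a registry blob and uses os.CreateTemp, the current replacement for ioutil.TempFile. It rewinds the file before returning, like the d.tmpFile.Seek(0, 0) call in pullV2Tag, so the caller can read the content back for extraction.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// downloadToTemp copies src into a new temporary file and rewinds it so the
// caller can read the layer back (hypothetical helper). On failure the
// partially written file is removed.
func downloadToTemp(src io.Reader) (*os.File, error) {
	tmp, err := os.CreateTemp("", "layer-")
	if err != nil {
		return nil, err
	}
	fail := func(err error) (*os.File, error) {
		tmp.Close()
		os.Remove(tmp.Name())
		return nil, err
	}
	if _, err := io.Copy(tmp, src); err != nil {
		return fail(err)
	}
	if _, err := tmp.Seek(0, io.SeekStart); err != nil {
		return fail(err)
	}
	return tmp, nil
}

// demo downloads some content, reads it back, and removes the temp file in
// a deferred cleanup, then reports whether the file is really gone.
func demo() (string, bool, error) {
	f, err := downloadToTemp(strings.NewReader("layer tar bytes"))
	if err != nil {
		return "", false, err
	}
	name := f.Name()
	content, err := func() (string, error) {
		defer func() {
			f.Close()
			os.RemoveAll(name)
		}()
		b, err := io.ReadAll(f)
		return string(b), err
	}()
	_, statErr := os.Stat(name)
	return content, os.IsNotExist(statErr), err
}

func main() {
	content, removed, err := demo()
	fmt.Println(content, removed, err) // layer tar bytes true <nil>
}
```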


Each downloaded layer is then registered with err = p.graph.Register(d.img, reader) (graph/graph.go):

func (graph *Graph) Register(img *image.Image, layerData io.Reader) (err error) {
	if err := image.ValidateID(img.ID); err != nil {
		return err
	}

	graph.imageMutex.Lock(img.ID)
	defer graph.imageMutex.Unlock(img.ID)

	// Skip register if image is already registered
	if graph.Exists(img.ID) {
		return nil
	}

	defer func() {
		// If any error occurs, remove the new dir from the driver.
		// Don't check for errors since the dir might not have been created.
		if err != nil {
			graph.driver.Remove(img.ID)
		}
	}()

	if err := os.RemoveAll(graph.imageRoot(img.ID)); err != nil && !os.IsNotExist(err) {
		return err
	}

	graph.driver.Remove(img.ID)

	tmp, err := graph.mktemp("")
	defer os.RemoveAll(tmp)
	if err != nil {
		return fmt.Errorf("mktemp failed: %s", err)
	}

	// Create root filesystem in the driver
	if err := createRootFilesystemInDriver(graph, img, layerData); err != nil {
		return err
	}

	// Apply the diff/layer
	if err := graph.storeImage(img, layerData, tmp); err != nil {
		return err
	}
	// Commit
	if err := os.Rename(tmp, graph.imageRoot(img.ID)); err != nil {
		return err
	}
	graph.idIndex.Add(img.ID)
	return nil
}


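After validating and locking the image ID, Register checks whether the image has already been registered in the graph; if it has, it returns nil right away. Next it deletes any existing paths for this image. As explained below, Docker creates several directories when it stores an image.

graph.imageRoot(img.ID) is the directory /var/lib/docker/graph/imageID: every folder under /var/lib/docker/graph is named after an image ID. When the docker daemon starts, it creates a new Graph instance, which loads the existing images from the contents of this directory via its restore() method (graph/graph.go).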
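Since every folder under the graph root is named after an image ID, discovering the stored images at startup amounts to listing that directory. A minimal sketch of that idea: listImageIDs and demo are hypothetical, and filtering on 64 lowercase hex characters (the length of a full image ID) is a simplification that conveniently skips temporary directories; the real restore() does more than this.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
	"strings"
)

// validID matches a full 64-character hex image ID.
var validID = regexp.MustCompile(`^[a-f0-9]{64}$`)

// listImageIDs returns the names of sub-directories of root that look like
// image IDs, skipping anything else such as temporary directories.
func listImageIDs(root string) ([]string, error) {
	entries, err := os.ReadDir(root)
	if err != nil {
		return nil, err
	}
	var ids []string
	for _, e := range entries {
		if e.IsDir() && validID.MatchString(e.Name()) {
			ids = append(ids, e.Name())
		}
	}
	return ids, nil
}

// demo builds a throw-away graph root with one image dir and one temp dir.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "graph-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	id := strings.Repeat("ab", 32) // 64 hex characters
	for _, name := range []string{id, "tmp_123"} {
		if err := os.Mkdir(filepath.Join(root, name), 0700); err != nil {
			return nil, err
		}
	}
	return listImageIDs(root)
}

func main() {
	ids, err := demo()
	fmt.Println(len(ids), err) // 1 <nil>
}
```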

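graph.driver.Remove(img.ID): the graph contains a driver. Taking aufs as the example, its files are stored under /var/lib/docker/aufs, which contains three subdirectories: mnt, layers and diff. Each of them has an entry named after the image ID. The entry under mnt is the mount point where the image is mounted with this layer on top; the file under layers lists the IDs of all ancestor images of this image; and the directory under diff holds the actual filesystem contents of this layer.

Having removed any leftover directories, Register creates the new ones: createRootFilesystemInDriver(graph, img, layerData) calls the driver's Create function (daemon/graphdriver/aufs/aufs.go):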

func (a *Driver) Create(id, parent string) error {
	if err := a.createDirsFor(id); err != nil {
		return err
	}
	// Write the layers metadata
	f, err := os.Create(path.Join(a.rootPath(), "layers", id))
	if err != nil {
		return err
	}
	defer f.Close()

	if parent != "" {
		ids, err := getParentIds(a.rootPath(), parent)
		if err != nil {
			return err
		}

		if _, err := fmt.Fprintln(f, parent); err != nil {
			return err
		}
		for _, i := range ids {
			if _, err := fmt.Fprintln(f, i); err != nil {
				return err
			}
		}
	}
	return nil
}
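The layers metadata written by Create is just a text file: the parent's ID on the first line, followed by the parent's own ancestors, so each file holds the complete chain down to the base layer. A self-contained sketch of that bookkeeping against a temporary directory; writeLayers, readParentIDs and demo are hypothetical helpers modeled on the Create code above (readParentIDs plays the role of getParentIds), and no aufs mounting happens here.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
)

// readParentIDs reads the ancestor list stored in <root>/layers/<id>.
func readParentIDs(root, id string) ([]string, error) {
	f, err := os.Open(filepath.Join(root, "layers", id))
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var ids []string
	s := bufio.NewScanner(f)
	for s.Scan() {
		ids = append(ids, s.Text())
	}
	return ids, s.Err()
}

// writeLayers creates <root>/layers/<id> listing the parent first, then the
// parent's ancestors, like the aufs driver's Create.
func writeLayers(root, id, parent string) error {
	f, err := os.Create(filepath.Join(root, "layers", id))
	if err != nil {
		return err
	}
	defer f.Close()
	if parent == "" {
		return nil // a base layer has an empty ancestor list
	}
	ancestors, err := readParentIDs(root, parent)
	if err != nil {
		return err
	}
	for _, line := range append([]string{parent}, ancestors...) {
		if _, err := fmt.Fprintln(f, line); err != nil {
			return err
		}
	}
	return nil
}

// demo builds base <- mid <- top and returns the ancestors recorded for top.
func demo() ([]string, error) {
	root, err := os.MkdirTemp("", "aufs-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	if err := os.Mkdir(filepath.Join(root, "layers"), 0700); err != nil {
		return nil, err
	}
	for _, l := range [][2]string{{"base", ""}, {"mid", "base"}, {"top", "mid"}} {
		if err := writeLayers(root, l[0], l[1]); err != nil {
			return nil, err
		}
	}
	return readParentIDs(root, "top")
}

func main() {
	ids, err := demo()
	fmt.Println(ids, err) // [mid base] <nil>
}
```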



Next, the actual content of the image is stored with graph.storeImage(img, layerData, tmp). The storeImage function (graph/graph.go):

func (graph *Graph) storeImage(img *image.Image, layerData io.Reader, root string) (err error) {
	// Store the layer. If layerData is not nil, unpack it into the new layer
	if layerData != nil {
		if err := graph.disassembleAndApplyTarLayer(img, layerData, root); err != nil {
			return err
		}
	}

	if err := graph.saveSize(root, img.Size); err != nil {
		return err
	}

	f, err := os.OpenFile(jsonPath(root), os.O_CREATE|os.O_WRONLY|os.O_TRUNC, os.FileMode(0600))
	if err != nil {
		return err
	}

	defer f.Close()

	return json.NewEncoder(f).Encode(img)
}


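disassembleAndApplyTarLayer unpacks the downloaded layer into /var/lib/docker/aufs/diff/imageID. Next, the image size is saved as a file, and the image metadata is written out as JSON; both go into the temporary directory /var/lib/docker/graph/tmp_xxxxx created earlier by tmp, err := graph.mktemp("").

Once all of this data has been stored, os.Rename(tmp, graph.imageRoot(img.ID)) renames the temporary directory /var/lib/docker/graph/tmp_xxxxx to /var/lib/docker/graph/imageID.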
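Writing everything into a temporary directory and then renaming it makes the registration appear all at once: rename within one filesystem is atomic on POSIX systems, so a reader sees either no imageID directory or a complete one. A minimal sketch of the pattern; imageMeta, commitImage, demo and the file names layersize and json are assumptions for illustration, not necessarily what graph.go uses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// imageMeta is a hypothetical, trimmed-down image description.
type imageMeta struct {
	ID   string `json:"id"`
	Size int64  `json:"size"`
}

// commitImage writes the size and JSON metadata into a fresh temp dir under
// graphRoot, then renames that dir to graphRoot/<id> in a single step.
func commitImage(graphRoot string, img imageMeta) error {
	tmp, err := os.MkdirTemp(graphRoot, "tmp_")
	if err != nil {
		return err
	}
	defer os.RemoveAll(tmp) // no-op after a successful rename
	size := []byte(strconv.FormatInt(img.Size, 10))
	if err := os.WriteFile(filepath.Join(tmp, "layersize"), size, 0600); err != nil {
		return err
	}
	data, err := json.Marshal(img)
	if err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(tmp, "json"), data, 0600); err != nil {
		return err
	}
	return os.Rename(tmp, filepath.Join(graphRoot, img.ID))
}

// demo commits one image into a throw-away root and reads its JSON back.
func demo() (imageMeta, error) {
	root, err := os.MkdirTemp("", "graph-")
	if err != nil {
		return imageMeta{}, err
	}
	defer os.RemoveAll(root)
	if err := commitImage(root, imageMeta{ID: "abc123", Size: 42}); err != nil {
		return imageMeta{}, err
	}
	data, err := os.ReadFile(filepath.Join(root, "abc123", "json"))
	if err != nil {
		return imageMeta{}, err
	}
	var got imageMeta
	err = json.Unmarshal(data, &got)
	return got, err
}

func main() {
	img, err := demo()
	fmt.Println(img.ID, img.Size, err) // abc123 42 <nil>
}
```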

The last step of Register is graph.idIndex.Add(img.ID), which adds the ID to idIndex. idIndex is a trie that lets users conveniently look up an image by a prefix of its ID.
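What idIndex gives the user is prefix lookup: a short ID prefix resolves to the full ID as long as it is unambiguous. The sketch below reproduces that behavior with a sorted slice and binary search instead of a trie, purely to keep it short; prefixIndex, errNotFound and errAmbiguous are hypothetical, and truncindex's actual data structure and error messages differ.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

var (
	errNotFound  = errors.New("no such id")
	errAmbiguous = errors.New("ambiguous prefix")
)

// prefixIndex keeps full IDs sorted, so IDs sharing a prefix are adjacent.
type prefixIndex struct{ ids []string }

// Add inserts id, keeping the slice sorted and free of duplicates.
func (p *prefixIndex) Add(id string) {
	i := sort.SearchStrings(p.ids, id)
	if i < len(p.ids) && p.ids[i] == id {
		return
	}
	p.ids = append(p.ids, "")
	copy(p.ids[i+1:], p.ids[i:])
	p.ids[i] = id
}

// Get resolves a prefix to the single full ID that starts with it.
func (p *prefixIndex) Get(prefix string) (string, error) {
	if prefix == "" {
		return "", errNotFound
	}
	i := sort.SearchStrings(p.ids, prefix) // first id >= prefix
	if i == len(p.ids) || !strings.HasPrefix(p.ids[i], prefix) {
		return "", errNotFound
	}
	if i+1 < len(p.ids) && strings.HasPrefix(p.ids[i+1], prefix) {
		return "", errAmbiguous
	}
	return p.ids[i], nil
}

func main() {
	var idx prefixIndex
	idx.Add("4a3f9c")
	idx.Add("4a7b21")
	idx.Add("9e0d11")
	fmt.Println(idx.Get("9e"))  // 9e0d11 <nil>
	fmt.Println(idx.Get("4a"))  // empty ID and errAmbiguous
	fmt.Println(idx.Get("4a3")) // 4a3f9c <nil>
}
```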

That's all on docker pull for now. In the next article, while the topic is still fresh, I'll dig into the secrets of docker run.