---
title: 'Docker: bind mounts & file ownership'
excerpt: Docker + bind mounts + non-root users = pain.
---
If you want to:

1. run your Docker service as a user other than root,
2. share a writable directory between your host and the container,

you're in for a treat!
The thing is, files stored in the shared directory retain their ownership (and
by that I mean their numeric UIDs and GIDs, which are all that matter) after
being mounted in the container.

Case in point:

    docker run -it --rm -v "$( pwd ):/data" alpine touch /data/test.txt

would create the file ./test.txt owned by root:root.

You can fix that by using the `--user` parameter:

    docker run -it --rm -v "$( pwd ):/data" --user "$( id -u ):$( id -g )" alpine touch /data/test.txt

That would create the file ./test.txt owned by the current user (provided the
current working directory is writable by the current user, of course).

More often though, instead of a simple `touch` call, you have a 24/7 service,
which absolutely mustn't run as root, regardless of whether `--user` was
specified or not.
In such cases, the logical solution would be to create a regular user in the
container, and use it to run the service.
In fact, that's what many popular images do, e.g. [Redis][Redis Dockerfile] and
[MongoDB][MongoDB Dockerfile].

[Redis Dockerfile]: https://github.com/docker-library/redis/blob/cc1b618d51eb5f6bf6e3a03c7842317b38dbd7f9/6.0/Dockerfile#L4
[MongoDB Dockerfile]: https://github.com/docker-library/mongo/blob/5cbf7be9a486932b7e472a39e432c9a444628465/4.2/Dockerfile#L4
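
In a Debian-based image, that typically looks something like this (a rough
approximation, not the exact line from either Dockerfile; the flags and the
UID/GID vary between images):

    # Create a dedicated system group and user with a fixed UID/GID
    # (999 is a common choice).
    RUN groupadd -r -g 999 redis && useradd -r -g redis -u 999 redis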

How do you run the service as a regular user though?
It's tempting to use the `USER` directive in the Dockerfile, but it can be
overridden by `--user`:

    $ cat Dockerfile
    FROM alpine
    
    RUN addgroup --gid 9099 test-group && \
        adduser \
            --disabled-password \
            --gecos '' \
            --home /home/test-user \
            --ingroup test-group \
            --uid 9099 \
            test-user
    
    RUN touch /root.txt
    USER test-user:test-group
    RUN touch /home/test-user/test-user.txt
    
    CMD id && stat -c '%U %G' /root.txt && stat -c '%U %G' /home/test-user/test-user.txt

    $ docker build -t id .
    ...

    $ docker run -it --rm id
    uid=9099(test-user) gid=9099(test-group)
    root root
    test-user test-group

    $ docker run -it --rm --user root id
    uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),20(dialout),26(tape),27(video)
    root root
    test-user test-group

I suppose that's the reason why many popular images override the ENTRYPOINT,
using a custom script (and `gosu`, a lightweight `sudo`-like tool designed for
dropping privileges in containers) to forcefully drop root privileges (for
example, see [Redis][Redis entrypoint], [MongoDB][MongoDB entrypoint]).

[Redis entrypoint]: https://github.com/docker-library/redis/blob/cc1b618d51eb5f6bf6e3a03c7842317b38dbd7f9/6.0/docker-entrypoint.sh#L11
[MongoDB entrypoint]: https://github.com/docker-library/mongo/blob/5cbf7be9a486932b7e472a39e432c9a444628465/4.2/docker-entrypoint.sh#L12
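
Stripped of image-specific details, the privilege-dropping part of such an
entrypoint boils down to something like this (a simplified sketch, not the
exact script from either image):

    #!/bin/sh
    # If the container was started as root, re-execute this same script as
    # the unprivileged service user.  Unlike sudo, gosu replaces the current
    # process instead of spawning a child, so signal handling keeps working
    # as expected.
    if [ "$( id -u )" = '0' ]; then
        exec gosu redis "$0" "$@"
    fi

    # Already running as a non-root user: just run the command.
    exec "$@"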

Now, what if such a service needs persistent storage?
A good solution would be to use Docker volumes.
For development though, you often need to simply share a directory between your
host and the container, and it has to be writable by both the host and the
container process.
This can be accomplished using _bind mounts_.
For example, let's try to map ./data to /data inside a Redis container (this
assumes ./data doesn't exist and you're running as a regular user with UID
1000; press Ctrl+C to stop Redis):

    $ mkdir data

    $ stat -c '%u' data
    1000

    $ docker run -it --rm --name redis -v "$( pwd )/data:/data" redis:6.0
    ...

    $ stat -c '%u' data
    999

As you can see, ./data changed its owner from the user with UID 1000 (the host
user) to the user with UID 999 (the `redis` user inside the container).
This is done in Redis' ENTRYPOINT script just before dropping root privileges,
so that the `redis-server` process owns the /data directory and thus can write
to it.
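
The relevant part of the entrypoint looks roughly like this (simplified; the
actual script is more selective about what it chowns):

    # Still running as root: make sure the service user owns the data
    # directory before dropping privileges and re-executing as that user.
    chown -R redis /data
    exec gosu redis "$0" "$@"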

If you want to preserve the ownership of ./data, Redis' image (and many others)
explicitly accommodates this by _not_ changing its owner if the container is
run as anyone other than root.
For example:

    $ mkdir data

    $ stat -c '%u' data
    1000

    $ docker run -it --rm --name redis -v "$( pwd )/data:/data" --user "$( id -u ):$( id -g )" redis:6.0
    ...

    $ stat -c '%u' data
    1000

Sometimes `--user` is not enough though.
The specified user is almost certainly missing from the container's
/etc/passwd, it doesn't have a $HOME directory, etc.
All of that can cause problems with some applications.
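
For example, something like this shows both problems at once (the exact output
depends on your UID):

    $ docker run -it --rm --user "$( id -u ):$( id -g )" alpine sh -c 'whoami; echo "$HOME"'
    whoami: unknown uid 1000
    /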

One scenario I had to deal with is making an image that bundles all the gems
(and a specific Ruby version) for my Ruby web application.
That application shouldn't be run as root, but it must be able to pick up code
changes on the fly, and I should be able to `docker exec` into the container,
update the dependencies (along with Gemfile[.lock], with those changes
reflected on the host without messing up file metadata), and restart the app.
It's quite easy to install the dependencies in the Dockerfile, but they (along
with the mapped Gemfile[.lock]) need to be writable by the user running the
service.
The solution often suggested is to create a container user with a fixed UID
(one that matches the host user's UID).
That way, it would be able to update the dependencies stored in the container,
as well as write to the bind mount owned by the host user with the same UID.
Additionally, file ownership info would be preserved on the host!

We can create a user with a fixed UID either when

1. building the image (using build `ARG`uments), or
2. first starting the container (passing the required UID in an environment
variable).

The advantage of creating the user when building the image is that we can also
install the dependencies in the Dockerfile, eliminating the need to reinstall
them in every container.
The disadvantage is that the image needs to be rebuilt for every user on every
machine.
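
A rough sketch of this approach might look like this (the base image, the `app`
user and the `USER_ID`/`GROUP_ID` argument names are made up for illustration):

    $ cat Dockerfile
    FROM ruby:2.7

    # Build arguments with fallback values for when the image is built by root.
    ARG USER_ID=999
    ARG GROUP_ID=999

    # Create a user with the host user's UID/GID and give it the application
    # directory.
    RUN groupadd -g "$GROUP_ID" app && \
        useradd --create-home --uid "$USER_ID" --gid "$GROUP_ID" app && \
        mkdir /app && chown app:app /app

    USER app
    WORKDIR /app

    # Install the dependencies as the app user, so that it can update them
    # later.
    COPY --chown=app:app Gemfile Gemfile.lock ./
    RUN bundle install

    $ docker build -t app --build-arg "USER_ID=$( id -u )" --build-arg "GROUP_ID=$( id -g )" .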

Creating the user when first starting the container has the advantage of not
requiring image rebuilds.
But, as the dependencies need to be installed after creating the user, you'd
have to waste resources by installing them for every user and every app on
every machine (each time a container is created).
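
A sketch of the second approach could be an ENTRYPOINT script along these lines
(the `LOCAL_UID`/`LOCAL_GID` variable names and the `app` user are, again, made
up; this assumes a Debian-based image with `gosu` installed):

    #!/bin/sh
    # Create the user on container startup, using the UID/GID passed via
    # environment variables, then drop privileges and run the command.
    : "${LOCAL_UID:=999}"
    : "${LOCAL_GID:=999}"

    if ! id app > /dev/null 2>&1; then
        groupadd -g "$LOCAL_GID" app
        useradd --create-home --uid "$LOCAL_UID" --gid "$LOCAL_GID" app
    fi

    exec gosu app "$@"

The container would then be started with something like
`docker run -e "LOCAL_UID=$( id -u )" -e "LOCAL_GID=$( id -g )" ...`.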

For my project [jekyll-docker] I opted for the former approach, making sure the
`jekyll` process runs with the same UID as the user who built the image (unless
it was built by root, in which case it falls back to a custom UID of 999).

[jekyll-docker]: https://github.com/egor-tensin/jekyll-docker

Useful links
------------

* [Docker and \-\-userns-remap, how to manage volume permissions to share data between host and container?](https://stackoverflow.com/q/35291520/514684)
* [What is the (best) way to manage permissions for Docker shared volumes?](https://stackoverflow.com/q/23544282/514684)
* [Handling Permissions with Docker Volumes](https://denibertovic.com/posts/handling-permissions-with-docker-volumes/)
* [File Permissions: the painful side of Docker](https://blog.gougousis.net/file-permissions-the-painful-side-of-docker/)
* [Avoiding Permission Issues With Docker-Created Files](https://vsupalov.com/docker-shared-permissions/)