Writing a Docker Volume Plugin for CephFS

We are currently evaluating Ceph as persistent volume storage for our on-premise Docker/Kubernetes cluster. Kubernetes officially supports Ceph RBD and CephFS as storage volume drivers. Docker currently does not offer a Docker Volume Plugin for CephFS.

There are some plugins available online, though. A Google search turns up a handful of plugins that support CephFS, but the results are quite old (> 2 years) and outdated, or they pull in too many dependencies, like direct communication with the Ceph cluster.

This blog post will be a little longer, as it is necessary to provide some basic facts about Ceph and because there are some odd pitfalls during plugin creation. Without the great Docker Volume Plugin for SSHFS, written by Victor Vieux, it would not have been possible for me to get a clue about the Docker Volume Plugin structure! Thank you for your work!

Source code of the Docker Volume Plugin for CephFS can be found here.

About Ceph

Basically, Ceph is a storage platform that provides three types of storage: RBD (RADOS Block Device), CephFS (shared filesystem) and object storage (S3-compatible protocol). Besides this, Ceph offers some APIs to operate the Ceph storage remotely. Usually, RBD and CephFS are mounted by installing the Ceph client on your Linux machine via APT, YUM or whatever is available. This client-side software lets you use the Ceph Linux kernel module with a classic mount command like mount -t ceph .... Alternatively, FUSE can be used. Using the client-side bindings can be tricky when the Ceph cluster (e.g. the Mimic release) and the Ceph client (e.g. Luminous) run different versions. Someone may then create an RBD device with a newer feature set than the client supports, which can result in a file system that cannot be mounted.

RBD devices are meant to be mounted exclusively by exactly one end system, like a container, which makes sense, as you would also never share a physical device between two end systems. RBD block devices therefore cannot be shared between multiple containers. Most RBD volume plugins are able to create such a device during volume creation if it does not exist. This means that the plugin must be able to communicate with the Ceph cluster, either via the Ceph client software installed on the server or via one of the Ceph API libraries.

CephFS is a shared filesystem which is backed by the Ceph cluster and which can be shared between multiple end systems like any other shared file system you may know. It has some nice features like file system paths which can be authorised separately.

The Kubernetes Persistent Volume documentation contains a matrix about the different file systems and which modes (ReadWriteOnce, ReadOnlyMany, ReadWriteMany) they support.

Docker Volume Plugin Anatomy

Thanks to the great work of Victor Vieux I was able to get used to the anatomy of a Docker Volume plugin, as the official Docker documentation is a little bit, uhm, short. I'm not a programmer, but the docker/go-plugins-helpers GitHub repository in particular contains a lot of useful stuff, and all in all I was able to copy/paste/change the plugin within a day.

The api.go file of the plugin helper contains the interface method description which needs to be implemented by a plugin.

Some words about the interface:

Get and List are used to retrieve the information about a volume and to list the volumes powered by the volume plugin when someone executes docker volume ls.

Create creates the volume with the volume plugin but it will not call the mount command at this time. The volume is only created and nothing more.

Mount is called when a container that uses the created volume starts.

Path is used to track the mount paths for the container.

Unmount is called when the container stops.

Remove is called when the deletion of the volume is requested.

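Capabilities tells Docker what the volume driver supports. In the plugin helper this is the scope of the driver (local or global). Plugin-level settings, for example host networking if the plugin needs network communication, go into config.json instead.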

Besides this, every plugin contains a config.json file which describes the configuration (and capabilities) of the plugin.

The plugin itself must use a special file structure, called rootfs!

Howto write the plugin

OK, I admit, I just copied the Docker Volume SSHFS plugin 🙂 and after that I did the following (besides learning the structure):

1) I changed the config.json of the plugin and removed all the things that my plugin does not need
2) I changed the functions mentioned above to reflect the needs of my plugin
3) I packed everything together, tested it and uploaded it.

Points 1) and 2) are just programming and configuring. But 3) is more interesting, because that is where the pitfalls are, and these pitfalls are described in the following section.

The pitfalls

Pitfall 1 Vendors

The first thing I did during development was to refresh the vendors. And this was also my first problem, as it was not possible to get the plugin up and running. There is a little bug in the api.go of the helper: the CreatedAt field cannot be JSON encoded if it is empty. There is already a GitHub PR for it, which simply adds the needed annotations to the struct. You can use the PR, or you can add the needed annotations to the struct yourself.
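
The original snippet is not part of this text, so here is a minimal sketch of what such annotations could look like. It assumes that the needed annotation is the omitempty JSON tag, which leaves an empty field out of the encoded output. The Volume struct below only mirrors the shape of the helper's struct for illustration, and the encode function is a made-up helper for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Volume mirrors the shape of the helper's volume struct for illustration only.
// The omitempty tags are an assumption about what "the needed annotations" are.
type Volume struct {
	Name       string
	Mountpoint string                 `json:",omitempty"`
	CreatedAt  string                 `json:",omitempty"`
	Status     map[string]interface{} `json:",omitempty"`
}

// encode is a hypothetical helper that returns the JSON form of a volume.
func encode(v Volume) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// CreatedAt is empty here, so the field is omitted from the output.
	fmt.Println(encode(Volume{Name: "data"})) // prints {"Name":"data"}
}
```

With the tag in place, an empty CreatedAt simply does not show up in the response instead of being sent as an empty string.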

Pitfall 2 Make

The SSHFS Docker Volume plugin is great! Make your life easier and use the provided Makefile! You can create the plugin rootfs with it (make rootfs) and you can easily create the plugin with it (make create)!

Pitfall 3 Push

After I had done all the work, I uploaded the source code to GitLab and created a pipeline to push the resulting Docker container image to Docker Hub so everyone can use it. But this does not work. After fiddling around for an hour, I had my eye-opener: the docker plugin command has a separate push function. So you have to use docker plugin push to push a Docker plugin to Docker Hub!

Be aware: the Docker Hub repository must not exist before your first push! If you create a repository manually or push a container image into it, it will be flagged as a container repository and you can never ever push a plugin to it! The error message will be denied: requested access to the resource is denied.

To be able to push the plugin, it must be installed (at least created) in your local Docker engine. Otherwise you cannot push it!

Pitfall 4 Wrong Docker image

Make sure you use the correct Docker image when you are writing a plugin. If you build your binary on Ubuntu, you might not be able to run it inside your final Docker Volume Plugin container because the image you use is based on Alpine (or the other way around). A typical reason is that the two images ship different C libraries (glibc on Ubuntu, musl on Alpine).

Pitfall 5 Unresolved dependencies

Be sure to include all your dependencies in your Docker image build process. For example, if you need the gluster-client, you have to install it in your Dockerfile so that the dependencies are in place when the Docker Volume Plugin image is loaded by the container engine.

Pitfall 6 Linux capabilities

Inside the Docker plugin configuration, you have to specify all Linux capabilities your plugin needs. If you miss a capability, the plugin will not behave the way you expect. For example, a plugin that mounts file systems needs CAP_SYS_ADMIN.
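
As a rough illustration, the Go sketch below embeds a hypothetical config.json fragment and checks which required capabilities are not declared in it. The fragment mirrors the options of the example docker run command in the Debug section (host network, SYS_ADMIN, /dev/fuse) and is not the plugin's actual configuration. The list of required capabilities and the function name missingCapabilities are assumptions too.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// exampleConfig is a hypothetical fragment of a plugin config.json.
const exampleConfig = `{
  "network": {"type": "host"},
  "linux": {
    "capabilities": ["CAP_SYS_ADMIN"],
    "devices": [{"path": "/dev/fuse"}]
  }
}`

// pluginConfig decodes only the part of config.json this sketch looks at.
type pluginConfig struct {
	Linux struct {
		Capabilities []string `json:"capabilities"`
	} `json:"linux"`
}

// missingCapabilities returns the entries of required that the config does not declare.
func missingCapabilities(configJSON string, required []string) ([]string, error) {
	var cfg pluginConfig
	if err := json.Unmarshal([]byte(configJSON), &cfg); err != nil {
		return nil, err
	}
	declared := map[string]bool{}
	for _, c := range cfg.Linux.Capabilities {
		declared[c] = true
	}
	var missing []string
	for _, r := range required {
		if !declared[r] {
			missing = append(missing, r)
		}
	}
	return missing, nil
}

func main() {
	// The required list is made up for this example.
	missing, err := missingCapabilities(exampleConfig, []string{"CAP_SYS_ADMIN", "CAP_DAC_OVERRIDE"})
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("missing capabilities:", missing)
}
```

A check like this could run in a build pipeline before the plugin is created, so that a forgotten capability shows up early and not as strange plugin behaviour later.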

Debug

A word about debugging a Docker Volume Plugin. Besides the information you get from the Docker site (debugging via the Docker socket), I found it helpful to simply run the resulting Docker Volume Plugin image as a normal container via docker run. This lets you check whether the image includes everything you need for what you later want to do with your plugin. If you go this way, you have to use the correct docker run options with all the capabilities, devices and the privileged flag. Yes, Docker Volume Plugins run privileged! Here is an example command: docker run -ti --rm --privileged --net=host --cap-add SYS_ADMIN --device /dev/fuse myrootfsimage bash. After this, test if all features are working.

That's all! If you have questions, just contact me via the various channels.
-M

Mario Kleinsasser
Doing Linux since 2000 and containers since 2009. Like to hack new and interesting stuff. Containers, Python, DevOps, automation and so on. Interested in science and I like to read (if I found the time). Einstein said "Imagination is more important than knowledge. For knowledge is limited." - I say "The distance between faith and knowledge is infinite. (c) by me". Interesting contacts are always welcome - nice to meet you out there - if you like, don't hesitate and contact me! - M