Writing a Docker Volume Plugin for CephFS

Currently we are evaluating Ceph for our Docker/Kubernetes on-premise cluster for persistent volume storage. Kubernetes officially supports CephRBD and CephFS as storage volume driver. Docker does not offer a Docker Volume plugin for CephFS currently.

But there are some plugins available online. A Google search comes up with a handful plugins that supports the CephFS protocol but the results are quite old (> 2 years) and outdated or they are using too much dependencies like direct Ceph cluster communication.

This blog post will be a little longer, as it is necessary to provide some basic facts about Ceph and because there are some odd pitfalls during the Plugin creation. Without the great Docker Volume Plugin for SSHFS which is written by Victor Vieux it won’t be possible for me to get the clue about the Docker Volume Plugin structure! Thank you for your work!

Source code of the Docker Volume Plugin for CephFS can be found here.

About Ceph

Basically Ceph is a storage platform that provides three types of storage: RBD (Rados Block Device), CephFS (Shared Filesystem) and ObjectStorage(S3 compatible protocol). Beside this, Ceph offers some API interfaces to operate the Ceph storage remotely. Usually the mounting of the RBD and CephFS is enabled by installing the Ceph client part into your Linux machine via APT, YUM or whatever available. This client side software will install a Linux kernel module which can be used for a classic mount command like mount -t ceph .... Alternatively the use of fuse is also possible. The usage of the client side bindings can be tricky, when different versions of the Ceph Cluster (eg Minic release) and Ceph Client (eg Luminous) are in use. This may lead to the situation where someone creates a RBD device which has a newer feature set than the client which may lead to a non mountable file system.

RBD devices are meant to be exclusively mounted by exactly one end system, like a container which is pretty clear as you would also never share a physical device between two end systems. RBD block devices therefore cannot be shared between multiple containers. Most of the RBD volume plugins are able to create such a device during the creation of a volume if it does not exist. This means that the plugin must be able to communicate with the Ceph Cluster either via the installed Ceph Client software on the server or via the implementation of one of the Ceph API libraries.

CephFS is a shared filesystem which is backed by the Ceph cluster and which can be shared between multiple end systems like any other shared file system you may know. It has some nice features like file system paths which can be authorised separately.

The Kubernetes Persistent Volume documentation contains a matrix about the different file systems and which modes (ReadWriteOnce, ReadOnlyMany, ReadWriteMany) they support.

Docker Volume Plugin Anatomy

Due to the great work of Victor Vieux I was able to get used to the anatomy of the Docker Volume plugin as the official Docker documentation is a little bit, uhm, short. I’am not a programmer ( Especially the docker GitHub go-plugin-helpers repository contains a lot of useful stuff and in sum I was able to copy/paste/change the plugin within a day.

The api.go file of the plugin helper contains the interface method description which needs to be implemented by a plugin.

Some words about the interface:

Get and List are used retrieve the information about a volume and to list the volumes powered by the volume plugin when someone executes docker volume ls.

Create creates the volume with the volume plugin but it will not call the mount command at this time. The volume is only created and nothing more.

Mount is called when a container, which will use created volume, starts.

Path is used to track the mount paths for the container.

Unmount is called when the containers stops.

Remove is called, when the deletion of the volume is requested.

Capabilities is used to describe the needed capabilities of the Docker volume plugin, for example net=host if the plugin needs network communication.

Beside this, every plugin contains a config.json file which describes the configuration (and capabilities) of the plugin.

The plugin itself must use a special file structure, called rootfs!

Howto write the plugin

OK, I admit, I just copied the Docker Volume SSHFS plugin 🙂 and after that I did the following (beside learning the structure):

1) I changed the config.json of the plugin and removed all the things that my plugin does not need
2) I changed the functions mentioned above to reflect the needs of my plugin
3) I packed together everything, test it, uploaded it.

For point 1) and 2), this is just programming and configuring. But 3) is more interesting because the are the pitfalls an this pitfalls are described in the following section.

The pitfalls

Pitfall 1 Vendors

The first thing I did during the development was to refresh the vendors. And this was also my first problem, at it was not possible to get the Plugin up and running. There is a little bug in the api.go of the helper. The CreatedAt cannot be JSON encoded if it empty. There is already a GitHub PR for it, which simply adds the needed annotations to the config. You can use the PR or you just add the needed annotations to the struct like this:

Pitfall 2 Make

The SSHFS Docker Volume is great! Make yourself life easier and use the provided Makefile! You can create the plugin rootfs with it ( make rootfs) and you can easily create the plugin with it ( make create)!

Pitfall 3 Push

After I’ve done all the work I uploaded the source code to GitLab and created a pipeline to push the resulting Docker Container image to Docker Hub so everyone can use it. But this won’t work. After fiddling around an hour, I had the eye opener. The command docker plugin has a separate push function. So you have to use docker plugin push to push a Docker plugin to Docker Hub!

Be aware: The Docker push repository must not exist before your fist push! If you create a repository manually or you push a Container into it, it will be flagged as Container repository and you can never ever push a plugin to it! The error message will be denied: requested access to the resource is denied.

To be able to push the plugin, it must be installed (at least created) in your local Docker engine. Otherwise you cannot push it!

Pitfall 4 Wrong Docker image

Be aware that you use the correct Docker image if you are writing a plugin. If you build your binary with Ubuntu, you might not be able to run it inside your final Docker Volume Plugin container because the image you use is based on Alpine (or the other way around)

Pitfall 5 Unresolved dependencies

Be sure to include all you dependencies in your Docker image build process. For example: If you need the gluster-client, you will have to install them in your Dockerfile to have the dependencies in place when the Docker Volume Plugin image is loaded by the container engine.

Pitfall 6 Linux capabilities

Inside the Docker Plugin configuration, you have to specify all Linux capabilities you need for your plugin. If you miss a capability, the plugin will not do what you like that it does. Eg:

Debug

A word about debugging a Docker Volume Plugin. Beside the information you get from the Docker site (debug via docker socket), I found it helpful to just use the resulting Docker Volume image as a normal Container via docker run. This gives you the ability to test if the Docker image is including all the stuff that you can do what you want with your plugin later. If you go this way, you have to use the correct docker run options with all the capabilities, devices and the privileged flag. Yes, Docker Volume Plugins run privileged! Here is a example command: docker run -ti --rm --privileged --net=host --cap-add SYS_ADMIN --device /dev/fuse myrootfsimage bash. After this, test if all features are working.

Thats all! If you have questions, just contact me via the various channels.
-M

Mario Kleinsasser on GithubMario Kleinsasser on LinkedinMario Kleinsasser on Twitter
Mario Kleinsasser
Mario Kleinsasser
Doing Linux since 2000 and containers since 2009. Like to hack new and interesting stuff. Containers, Python, DevOps, automation and so on. Interested in science and I like to read (if I found the time). Einstein said "Imagination is more important than knowledge. For knowledge is limited." - I say "The distance between faith and knowledge is infinite. (c) by me". Interesting contacts are always welcome - nice to meet you out there - if you like, don't hesitate and contact me! - M

Docker Swarm Network – Down the Rabbit Hole

Last week we tracked down a recurring problem with our Docker Swarm, more exactly with the Docker overlay network. To anticipate it, there is a merge which might fix this, but not for Docker-CE 18.03 . The pull mentioned is also not included in Docker-CE 18.06.1 but it is already merged into Moby and part of Docker-CE 18.09.0-ce-tp5 which means that the fix should be available with Docker-CE 18.09.

Description of the problem

If you try to start a container or if you have Docker Swarm which starts containers for you, you might see that the containers cannot start on specific hosts. If you take a look into the log files, you find lines like this:

This means, that a network VXLAN interface for a new container which would like to join a overlay network already exists.

Explanation

The next sentences are not deeply scientific, they are more a sum up of multiple information an experience. As I understand, the startup sequence of a container (driven by dockerd) which uses a overlay network is as follows:

1) Create a VXLAN interface which uses the VXLAN id of the associated Docker network ( docker network create --driver=overlay ...) – at this point the VXLAN interface is visible on the host ( ip -d link show)
2) Then dockerd puts the VXLAN interface into the namespace of the container – at this point the VXLAN interface is not visible anymore on the host
3) When the container stops, the device is given back to the host
4) The device is deleted by the dockerd

Between 3) and 4) a race condition happens and the network device is not deleted.

The important hint to find out more was given by the user gitbensons on github – Kudos to him! He pointed out, that it is possible to find the already existing VXLAN device by running strace against the dockerd process. Here is the strace command to use just before starting an affected container.

THIS COMMAND IS DANGEROUS! IF YOU RUN IT FOR TOO LONG, YOU WILL PROBABLY KILL THE DOCKERD!!!

In the output of the previous command, you can see, that the affected device has the name vx-00106c-clblt. The last five characters of the device name, in this example clblt are specifying the affected overlay network id (short). Login to a Docker manger, run docker network ls | grep clblt and you can find the name of the affected overlay network.

At this point we know, which VXLAN device is still there but shouldn’t. In the next step, just list all vx-* devices on the affected host by doing:

Ups. Now we have a problem. All of this devices are dead ( state DOWN) but where not deleted! This means that on this Docker host, it will not be possible to start containers which would like to join one of the affected overlay networks (look at the id’s).

Solution

After finding the problematic device, you can delete it with ip link delete vx-00100f-drzik for example. Maybe it would be a good practice to delete all devices and to monitor your hosts if there are such devices, as is an indicator that something happens which will prevent starting further containers for the affected networks.

Summary

From the Urban dictionary: Rabbit Hole: Metaphor for the conceptual path which is thought to lead to the true nature of reality. Infinitesimally deep and complex, venturing too far down is probably not that great of an idea.

It is hard to accept, that the error message does not write which file is already existing. I know the cause is found in golang, because if you only print err, you will not get any information about which file is already existing. Writing which interface is already existing would be nice, or deleting it automatically on container start would be even nicer 🙂 But I won’t dig deeper, as there is already a merge … don’t forget Rabbit Holes are dangerous 😉

Mario Kleinsasser on GithubMario Kleinsasser on LinkedinMario Kleinsasser on Twitter
Mario Kleinsasser
Mario Kleinsasser
Doing Linux since 2000 and containers since 2009. Like to hack new and interesting stuff. Containers, Python, DevOps, automation and so on. Interested in science and I like to read (if I found the time). Einstein said "Imagination is more important than knowledge. For knowledge is limited." - I say "The distance between faith and knowledge is infinite. (c) by me". Interesting contacts are always welcome - nice to meet you out there - if you like, don't hesitate and contact me! - M

Testing Remote TCP/IP Connectivity – IBM i (AS400)

Preface

I’m used to run telnet to do a quick check if a remote server is reachable and listening on a specific port. When trying this on i5/OS with telnet CMD you may get headache!
After some research I ended up with openssl in PASE to succeed my task on IBM i (AS400).

telnet vs openssl syntax

On Telnet 5250 Command Line you first have to enter PASE using

Then run

instead of

as it is not installed in PASE.

Excamples

Success, with server using SSL

with openssl

with telnet

Failure

with openssl

with telnet

Success, with server not using SSL

with openssl

with telnet

Markus Neuhold on BehanceMarkus Neuhold on EmailMarkus Neuhold on GithubMarkus Neuhold on Twitter
Markus Neuhold
Markus Neuhold
IBMi (AS400) sysadmin since 1997, linux fanboy and loving open source, docker and all about tech and science.

Docker South Austria meets DevOps & Security Meetup in Vienna

Just to let you know, I will give a talk at the DevOps & Security Meetup Vienna. The topic of my talk will be GitOps: DevOps in Real Life.

Here is the featured text of the talk in German:

Container-Technologien in einem On-Premises Umfeld einzuführen bringt viele Veränderungen mit sich. Besonders die Evolution in der Zusammenarbeit zwischen den Teams, sowie der ständige Wandel und die Weiterentwicklung der Technologien, führen zu immer neuen Herausforderungen aber auch zu immer neuen, kreativen und innovativen Lösungen. Seit dem Beginn der Veränderungen vor zwei Jahren haben wir uns ständig verbessert und die GitOps Methode eingeführt. Viele dieser Entwicklungen finden häufig in Public Clouds statt, stehen jedoch ebenso On-Premises zur Verfügung. Dass der Einsatz dieser Technologien On-Premises möglich ist und ausgezeichnet funktioniert, werde ich in meinem Talk zeigen. Special Guests: GitLab, Puppet, Prometheus, Elastic, Docker, CoreDNS, Kubernetes und viele mehr 🙂

If you like, please join us in Vienna on the 3th of July!

-M

Mario Kleinsasser on GithubMario Kleinsasser on LinkedinMario Kleinsasser on Twitter
Mario Kleinsasser
Mario Kleinsasser
Doing Linux since 2000 and containers since 2009. Like to hack new and interesting stuff. Containers, Python, DevOps, automation and so on. Interested in science and I like to read (if I found the time). Einstein said "Imagination is more important than knowledge. For knowledge is limited." - I say "The distance between faith and knowledge is infinite. (c) by me". Interesting contacts are always welcome - nice to meet you out there - if you like, don't hesitate and contact me! - M

Kubernetes the roguelike way

It has been a while since the last post, but we had a busy time. Together with my colleagues I wrote a large documentation called Kubernetes the roguelike way over the last few weeks.

The documentation is about our setup of Kubernetes which we use on premise. We will continue the documentation as we are moving forward in our progress. Have a lot of fun and if you have suggestions, please open an issue in the GitLab project.

Mario Kleinsasser on GithubMario Kleinsasser on LinkedinMario Kleinsasser on Twitter
Mario Kleinsasser
Mario Kleinsasser
Doing Linux since 2000 and containers since 2009. Like to hack new and interesting stuff. Containers, Python, DevOps, automation and so on. Interested in science and I like to read (if I found the time). Einstein said "Imagination is more important than knowledge. For knowledge is limited." - I say "The distance between faith and knowledge is infinite. (c) by me". Interesting contacts are always welcome - nice to meet you out there - if you like, don't hesitate and contact me! - M

Speaker at DevOps Gathering 2018

This year and for the first time I attended as speaker at the DevOps Gathering 2018 conference at Bochum Germany! Long story short, it was an extremely great experience for me!

The idea for giving this talk was born in October 2017. I knew that Peter Rossback (a long term friend of us) invented a conference at Bochum Germany for the first time in March 2017. Due to our own journey along the DevOps way, which started in the beginning of 2017, I decide to try to give something back to the container community. Of course, we and I are already participating to various Open Source projects, but I looked for the chance to give a talk too to find out if I am convenient with it. After putting in my CFP for the DevOps Gathering 2018, the organizers decide to give me the chance to speak to a larger audience.

I’ve started the detailed preparation of the talk at the end of the last year and continuously updated it along the things changed at work. There are always last knowings you would like to reflect in a talk. There are fine details here and there and on the last weekend before the talk I decided to recreate a graphic I use in the presentation on my own for example.

My trip to the Devops Gathering 2018 started on Monday afternoon with the drive to the Salzburg airport. As I arrived there, I recognized that may flight to Düsseldorf was delayed by an hour. Not a huge problem as I had plenty of time and after some waiting the plane to Düsseldorf took off. After a short flight (one and a half hour) I took the train from Düsseldorf to Bochum and arrived there at approximately 9 pm. On the first day the speakers dinner took place at a Greek restaurant in Bochum which I managed to attend. That was the first that I met with the other speakers. And obviously it was great (see the pictures)

The next day I arrived at the conference location at 8 am. I already knew the boys and girls from the Bee42 gmbh and therefore it was a comfortable situation for me. The conference took place at the GData campus, a very nice location. And as the hour hand came close to 9 pm, the location gets filled up with people. The DevOps Gathering was sold out and in sum there were more than 150 attendees.

On this day after the lunch break my talk takes place at 1:15 pm. I was already on stage because I placed a video (including audio) in my talk. Audio is always an interesting thing. You never know if it works during a presentation, if you do not try it beforehand. I talked to the tech-staff and we managed it to get it up and running with audio too. By the way, all tracks were recorded. Here are the videos of the tracks from the last year and I am sure that the new ones will pop up there too in the following days.

From my point of view, my talk runs very smooth. No huge problems, only some minor ones. My talk, as all of the other talks, lasts 45 minutes (in English of course) with Q&A afterwards. And to sum it up, I was really fun! Yes, I think it is exciting for me to give talks. If you never try things out and leave your comfort zone, you will never knew what is exciting and satisfaction for you.

During the whole day I had the possibility to talk with the other speakers and of course with attendees too and that was very interesting! In the evening the DevOps Gathering party takes place at the GData campus. Drinks, food, music, cool people and tons of fun! If you have the chance, attend to the next conference if it is possible for you! It is worth it! I came in contact with a lot of people there. For example Roland Huß(Red Hat), Docker captain Viktor Farcic (CloudBees), Thomas Fricke (CTO of Endocode), Jan Bruder (Rancher Labs) just to name some of them and many others, though.

On the second day I had to travel back and therefore I had to left the conference early. But from the talks which I have seen on the second day, I can say that the second day was great too!

After my journey to the DevOps Gathering 2018 I can say that it was very exiting and enjoyable to give a talk, to meet a lot of great people and finally to learn much new things. The discussions we had were great too. Everyone has his or her own experience but we were always able to understand and respect the position of each other and this is what makes conferences so wonderful! Side note: I did this adventure on my own expense. Why? As I wrote above, if you never leave your comfort zone you will never find out what is exiting and enjoyable for you. Now I know, that I would like to continue this way! There was also an attendees questionnaire about the conference in general and the speakers. I am anxious what my marks are and now I am waiting for the videos of the DevOps Gathering 2018.

You can find all speaker an their talks here – click. Have a lot of fun!

-M-

Mario Kleinsasser on GithubMario Kleinsasser on LinkedinMario Kleinsasser on Twitter
Mario Kleinsasser
Mario Kleinsasser
Doing Linux since 2000 and containers since 2009. Like to hack new and interesting stuff. Containers, Python, DevOps, automation and so on. Interested in science and I like to read (if I found the time). Einstein said "Imagination is more important than knowledge. For knowledge is limited." - I say "The distance between faith and knowledge is infinite. (c) by me". Interesting contacts are always welcome - nice to meet you out there - if you like, don't hesitate and contact me! - M

Older blog entries...