GCP, Ansible and GitLab - Part II

Following the blog post GCP, Ansible, GitLab and Puppet - Part I, here comes part II. As you may have noticed, Puppet is currently not used for the setup. I stripped it away, so to sum it up, it is not in use for the Google Cloud Platform instances at the moment. As often, this will be a longer post, and it is probably the final part from a technical point of view. Maybe I will write a third one after we have used the setup for a couple of months. If you need details about some of these steps, let me know via Twitter.

This post is divided into multiple sections covering different topics. It also covers the GitLab pipeline schedules, which run the playbooks in an Ansible container started by GitLab scheduled CI/CD.

The Ansible setup

As described in part I, we use Ansible to set up the GCP instances. I’ve reworked the Ansible playbooks so that they call some webhooks during the play run, which lets us track and monitor some statistics with Zabbix. For now, these playbooks use Ansible 2.8. Ansible 2.9 introduced some really important changes to the Ansible GCP modules, so the plays written for Ansible 2.8 are not compatible with Ansible 2.9. The main differences are the facts modules, which were renamed, and the behavior of the labeling process. Maybe I will post an update on this.
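To make the renaming concrete, here is a minimal sketch of the same network lookup in both versions. The module names follow the upstream rename of the GCP facts modules to info modules in Ansible 2.9; the variables are the ones used throughout this post. To my knowledge the 2.9 info modules also return their list under resources instead of items, so treat the result key as something to verify against your installed version.

```yaml
# Ansible 2.8: lookup via the facts module
- name: get info on a network (Ansible 2.8)
  gcp_compute_network_facts:
    filters:
      - name = "{{ gcp_network_vpc }}"
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
  register: network

# Ansible 2.9: same parameters, but the module is now called *_info
- name: get info on a network (Ansible 2.9)
  gcp_compute_network_info:
    filters:
      - name = "{{ gcp_network_vpc }}"
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
  register: network
```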

Create GCP instance Ansible playbook

As I wrote in part I, creating the GCP instance is easy if you create the network at the same time as the instance. If not, as in our case, you first have to get the information about the already existing VPC network within your project. The trick is to handle the result correctly: the module returns a list under the key items, so you have to pick the element you need explicitly, for example: subnetwork['items'][0].
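A short sketch of why the bracket notation matters: in Jinja2, network.items resolves to the dictionary method items() rather than to the key, so network['items'] is the form that reaches the returned list. The assert task and the fact name gcp_network are my additions for illustration, not part of the original playbook.

```yaml
# Fail early with a readable message if the lookup matched nothing
- name: make sure the VPC network was found
  assert:
    that:
      - network['items'] | length > 0
    fail_msg: "No network named {{ gcp_network_vpc }} found in {{ gcp_project }}"

# Keep the first (and only expected) match under a shorter name
- name: remember the matching network
  set_fact:
    gcp_network: "{{ network['items'][0] }}"
```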

Next, the instances must be labeled. Later, this will help us find all running and not running instances without harming any instances that were created alongside the GitLab runners in the same VPC network inside the same GCP account. As written above, labeling instances is a little bit complicated in Ansible 2.8, because it cannot be done during instance creation. This is the reason why we have to use the GCP resource URL to update the labels afterwards.

- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    type: gitlab-runner
    region: europe-west3
    zone: europe-west3-a
    gcp_instance_name: "{{ nodename }}"
    gcp_project: yourprojectid
    gcp_cred_kind: serviceaccount
    gcp_cred_file: yourserviceaccount.json
    gcp_network_vpc: hub-private-vpc
    gcp_network_subnetwork_vpc: private-subnet
  tasks:
    # Look up the already existing VPC network and subnetwork
    - name: get info on a network
      gcp_compute_network_facts:
        filters:
        - name = "{{ gcp_network_vpc }}"
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: network

    - name: debug
      debug:
        var: network['items'][0]

    - name: get info on a subnet-network
      gcp_compute_subnetwork_facts:
        filters:
        - name = "{{ gcp_network_subnetwork_vpc }}"
        project: "{{ gcp_project }}"
        region: "{{ region }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: subnetwork

    - name: debug
      debug:
        var: subnetwork['items'][0]

    - name: create a disk
      gcp_compute_disk:
        name: "{{ gcp_instance_name }}-disk"
        size_gb: 50
        source_image: projects/ubuntu-os-cloud/global/images/family/ubuntu-1804-lts
        zone: "{{ zone }}"
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
        state: present
      register: disk

    - name: create an instance
      gcp_compute_instance:
        name: "{{ gcp_instance_name }}-instance"
        machine_type: n1-highcpu-8
        scheduling:
          preemptible: 'true'
        disks:
        - auto_delete: 'true'
          boot: 'true'
          source: "{{ disk }}"
        network_interfaces:
        - network: "{{ network['items'][0] }}"
          subnetwork: "{{ subnetwork['items'][0] }}"
          access_configs:
          - name: External NAT
            type: ONE_TO_ONE_NAT
        zone: "{{ zone }}"
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
        state: present
      register: result

    - name: debug
      debug:
        msg: "{{ result.selfLink }}"

    # Labels cannot be set during creation here, so they are added via the resource URL
    - name: Add labels on an existing instance (using resource_url)
      gce_labels:
        project_id: "{{ gcp_project }}"
        credentials_file: "{{ gcp_cred_file }}"
        labels:
          type: gitlab-runner
        resource_url: "{{ result.selfLink }}"
        state: present

Configure GCP instance Ansible playbook

After the creation of the GCP instance has finished, the instance has to be configured. There is nothing special here, just installing Docker and handling some configuration tasks, like changing the DNS resolvers, disabling the netplan DNS handling, and some other minor changes. We need those changes because we will run the GitLab runners within a private VPC connected via VPN. The only really important thing here is the linting of the netplan configuration. Editing YAML with Ansible is §$%& !

- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    type: gitlab-runner
    region: europe-west3
    zone: europe-west3-a
    gcp_instance_name: "{{ nodename }}"
    gcp_project: yourprojectid
    gcp_cred_kind: serviceaccount
    gcp_cred_file: yourserviceaccount.json
    gcp_network_vpc: hub-private-vpc
    gcp_network_subnetwork_vpc: private-subnet
  tasks:
    - name: get info on all instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
          - labels.type:gitlab-runner
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: allinstances

    - name: debug
      debug:
        msg: "{{ allinstances }}"

    # networkIP is the internal address, reachable through the VPN
    - name: Add all instance internal IPs to host group
      add_host:
        name: "{{ item.networkInterfaces.0.networkIP }}"
        groups:
          - gcpinstances
      with_items: "{{ allinstances['items'] }}"

- hosts:  gcpinstances
  gather_facts: no
  tasks:
    - name: Wait for SSH to come up
      local_action:
        module: wait_for
        # host and port are assumptions; without a condition wait_for only sleeps until its timeout
        host: "{{ inventory_hostname }}"
        port: 22

- hosts:  gcpinstances
  vars:
    ansible_user: sa_your_sa_user_id
  gather_facts: yes
  tasks:
    - name: Pinging on "{{inventory_hostname}}"
      ping:

    - name: Configure systemd resolved configuration
      copy:
        src:  /var/opt/ansible-prod/files/gcp-gitlab-runner-systemd/resolved.conf
        dest: /etc/systemd/resolved.conf
      become: yes

    - name: Disable cloud.cfg
      copy:
        src:  /var/opt/ansible-prod/files/gcp-gitlab-runner-systemd/99-disable-network-config.cfg
        dest: /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
      become: yes

    # The block must keep the exact indentation of the netplan file
    - name: Disable netplan dns
      blockinfile:
        path: /etc/netplan/50-cloud-init.yaml
        insertafter: '.*dhcp4: true'
        block: |2
                      dhcp4-overrides:
                          use-dns: no
      become: yes

    - name: Add Docker GPG key
      apt_key: url=https://download.docker.com/linux/ubuntu/gpg
      become: yes

    - name: Add Docker APT repository
      apt_repository:
        repo: deb [arch=amd64] https://download.docker.com/linux/ubuntu {{ansible_distribution_release}} stable
      become: yes

    - name: Install Docker
      apt:
        name: "docker-ce=5:19.03.5~3-0~ubuntu-bionic"
        state: present
        update_cache: yes
      become: yes

    - name: Copy Docker gitlab-runner systemd file
      copy:
        src:  /var/opt/ansible-prod/files/gcp-gitlab-runner-systemd/docker-gitlab-runner.service
        dest: /etc/systemd/system/docker-gitlab-runner.service
      become: yes

    - name: Copy Docker gitlab-runner systemd start file
      template:
        src:  /var/opt/ansible-prod/files/gcp-gitlab-runner-systemd/docker-gitlab-runner-start.sh.j2
        dest: /usr/local/bin/docker-gitlab-runner-start.sh
        mode: "0744"
      become: yes

    - name: Copy Docker gitlab-runner systemd stop file
      copy:
        src:  /var/opt/ansible-prod/files/gcp-gitlab-runner-systemd/docker-gitlab-runner-stop.sh
        dest: /usr/local/bin/docker-gitlab-runner-stop.sh
        mode: "0744"
      become: yes

    - name: Enable Docker gitlab-runner systemd service
      systemd:
        name: docker-gitlab-runner
        daemon_reload: yes
        enabled: yes
        state: started
        masked: no
      become: yes

Stopall GCP instance Ansible playbook

The next thing we need is an Ansible playbook to stop all running instances. It will later be used by the GitLab scheduled CI/CD pipeline. The playbook is split into two parts, because otherwise it is not possible to trigger a webhook for every machine that is stopped. We use this webhook to submit the result of the playbook to Elastic, where we can retrieve a neat statistic of how often we stopped the GitLab runners. You can find this in the second of the YAML files below. The trick is that a task with a with_items statement can only run one single module, but hey, you can use an include there and include multiple tasks from another file! Yes!

In the first part of the Ansible playbook, we trigger some webhooks to monitor the status of the GCP GitLab runners; we will do the same when starting the GitLab runners. With this, we get nice graphs showing how many GitLab runners are still running, how many had to be restarted and so on.
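The include trick boils down to the following pattern. This is a generic sketch, not the playbook from this post: the file name per_instance_tasks.yml and the loop variable name instance are hypothetical, and loop_control is optional. It only renames the loop variable, so the included tasks would reference instance.name instead of item.name.

```yaml
# A looped task can call exactly one module, but that module may be
# include_tasks, which pulls in any number of tasks for each element.
- name: run several tasks per instance
  include_tasks: per_instance_tasks.yml   # hypothetical file name
  loop: "{{ running['items'] }}"
  loop_control:
    loop_var: instance   # avoids clashes with loops inside the included file
```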

Here comes the first part of the Ansible playbook

- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    type: gitlab-runner
    region: europe-west3
    zone: europe-west3-a
    gcp_instance_name: "{{ nodename }}"
    gcp_project: yourprojectid
    gcp_cred_kind: serviceaccount
    gcp_cred_file: yourserviceaccount.json
    gcp_network_vpc: hub-private-vpc
    gcp_network_subnetwork_vpc: private-subnet
  tasks:
    # Report the state before anything is stopped
    - name: Get NOTRUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
        - labels.type:gitlab-runner AND status:TERMINATED
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: notrunning

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-pre-terminated
        method: POST
        body: "{{ notrunning }}"
        body_format: json

    - name: Get RUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
        - labels.type:gitlab-runner AND status:RUNNING
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: running

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-pre-running
        method: POST
        body: "{{ running }}"
        body_format: json

    # One include per running instance: stop it and report it
    - name: STOP all running instances
      include_tasks: gcp_compute_stopall_webhook_gitlabrunner.yml
      with_items: "{{ running['items'] }}"

    # Report the state after the stop run
    - name: Get RUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
        - labels.type:gitlab-runner AND status:RUNNING
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: running

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-post-running
        method: POST
        body: "{{ running }}"
        body_format: json

Here comes the second part of the Ansible playbook

- name: STOP running instance
  gcp_compute_instance:
    name: "{{ item.name }}"
    status: TERMINATED
    zone: "{{ zone }}"
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
    state: present

- name: Trigger webhook
  uri:
    url: https://your-webhook-url/webhook/gcp-glr
    method: POST
    body: "{{ item }}"
    body_format: json

Keepalive GCP instance Ansible playbook

As you may have noticed, we are using preemptible instances for the GitLab runners, because they are cheap and work perfectly well for CI/CD jobs, in our case QF tests (web UI frontend tests). Preemptible GCP instances can be stopped by GCP at any time. Therefore we run a GitLab CI/CD scheduled pipeline every 5 minutes to keep the runners alive. Of course there are better ways to achieve this, like starting the machine on every GitLab job run, but currently this is not easy to integrate. The following playbooks mirror the stopall ones, except that they start GitLab runners that have been stopped.

Here comes the first part of the Ansible playbook

- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    type: gitlab-runner
    region: europe-west3
    zone: europe-west3-a
    gcp_instance_name: "{{ nodename }}"
    gcp_project: yourprojectid
    gcp_cred_kind: serviceaccount
    gcp_cred_file: yourserviceaccount.json
    gcp_network_vpc: hub-private-vpc
    gcp_network_subnetwork_vpc: private-subnet
  tasks:
    # Report the state before anything is started
    - name: Get NOTRUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
        - labels.type:gitlab-runner AND status:TERMINATED
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: notrunning

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-pre-terminated
        method: POST
        body: "{{ notrunning }}"
        body_format: json

    - name: Get RUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
        - labels.type:gitlab-runner AND status:RUNNING
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: running

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-pre-running
        method: POST
        body: "{{ running }}"
        body_format: json

    # One include per stopped instance: start it and report it
    - name: START all not running instances
      include_tasks: gcp_compute_keepalive_webhook_gitlabrunner.yml
      with_items: "{{ notrunning['items'] }}"

    # Report the state after the start run
    - name: Get RUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
        - labels.type:gitlab-runner AND status:RUNNING
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: running

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-post-running
        method: POST
        body: "{{ running }}"
        body_format: json

Here comes the second part of the Ansible playbook

- name: Start all not running instances / Report Webhook
  gcp_compute_instance:
    name: "{{ item.name }}"
    status: RUNNING
    zone: "{{ zone }}"
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
    state: present

- name: Webhook call
  uri:
    url: https://your-webhook-url/webhook/gcp-glr
    method: POST
    body: "{{ item }}"
    body_format: json

Cool, but show me some graphs now!

First, the webhooks described above result in a nice Zabbix graph where you can see how often the GCP GitLab runners were not running. The graph shows one full week, and there were only two occasions where some of the currently five GCP GitLab runner instances were not running. We use the GCP GitLab runners from 7 am until 6 pm. The crossing lines mark the points where the keepalive and stopall runs kick in, in the morning and in the evening.

Inside Elastic, we can see the Ansible output of the playbook runs.

Running the Ansible playbooks with GitLab

After we’ve created the playbooks, we would like to run them. This can of course be done by a simple cron job, but we can also do it with GitLab, which gives us some benefits. The first one is that we can use the Git repository where our Ansible playbook sources are stored. The second is that we can use secure variables for the GCP credentials. The last but most important one is that we can create an Ansible image which contains exactly the Ansible version we need to run these playbooks. Later we can easily migrate the playbooks to a newer version without breaking the existing ones, and of course we can move the Ansible playbook sources to a separate repository.
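As an illustration of the second point, a job could write the service account key from a protected CI/CD variable to the file the playbooks expect. This is only a sketch under assumptions: the job name and the variable name GCP_SERVICE_ACCOUNT_JSON are hypothetical, and the target path has to match whatever gcp_cred_file points to in your setup.

```yaml
# .gitlab-ci.yml fragment (sketch)
keepalive:                      # hypothetical job name
  stage: build
  script:
    # GCP_SERVICE_ACCOUNT_JSON is a hypothetical protected variable defined
    # in the project's CI/CD settings, so the key never lands in Git.
    - printf '%s' "$GCP_SERVICE_ACCOUNT_JSON" > yourserviceaccount.json
    - ./keepalive.sh
  only:
    - schedules
```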

Creating an Ansible Docker image

The Dockerfile for the Ansible image is pretty simple, because it just installs Python pip, Ansible and the dependencies which are needed for the GCP modules.

FROM ubuntu:18.04

RUN apt update && apt install -y python-pip openssh-client git && pip install requests google-auth google-api-python-client ansible==2.8.8

Keepalive Git branch

The stopall and the keepalive branch contain mostly the same files; only the triggered scripts differ.

Here is the .gitlab-ci.yml:

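image: <your-gitlab-registry>/image:gcp-2.8.8-2020-02-03-01

variables:
  DOCKER_DRIVER: overlay

services:
  - docker:dind

# The job names below are placeholders
keepalive-manual:
  stage: build
  script:
    - ./keepalive.sh
  tags:
    - docker-build
  only:
    - manual

keepalive-schedule:
  stage: build
  script:
    - ./keepalive.sh
  tags:
    - docker-build
  only:
    - schedules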

And the keepalive.sh contains the following:

git clone https://gitlab+deploy-token-31:$ANSIBLE_REPOSITORY_DEPLOY_TOKEN@<your-ansible-source-repository>/ansible/legacy.git /var/opt/ansible-prod

cd /var/opt/ansible-prod/plays/gcp-compute-gitlabrunner-prod


Running the GitLab schedules

Finally, you need to configure some schedules to run. In the end it will look like this:

And the result will be:


This post should give you an idea of how you can run GitLab runners in GCP on preemptible instances. If you need more details, reach out to me on Twitter (or somewhere else)! Happy hacking!

