GCP, Ansible and GitLab - Part II

Following the blog post GCP, Ansible, GitLab and Puppet - Part I, here comes part II. As you may have noticed, Puppet is currently not used in this setup. Therefore, I stripped it out; in short, it is not in use for the Google Cloud Platform instances at the moment. As often, this will be a longer post, and it is probably the final part from the technical point of view. Maybe I will write a third one after we have used the setup for a couple of months. If you need details about any of these steps, let me know via Twitter.

This post is divided into multiple sections covering different topics. It also covers the GitLab pipeline schedules, which run the playbooks in an Ansible container executed by GitLab scheduled CI/CD.

The Ansible setup

As described in part I, we use Ansible to set up the GCP instances. I've reworked the Ansible playbooks so that they can call webhooks during the play run, which lets us track and monitor some statistics with Zabbix. For now, these playbooks use Ansible 2.8. Ansible 2.9 introduced some really important changes to the Ansible GCP modules, so the plays written for Ansible 2.8 are not compatible with Ansible 2.9. The main differences are in the facts-gathering modules, which were renamed, and in the behavior of the labeling process. Maybe I will post an update on this.
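
To make the rename concrete, here is a minimal sketch that maps the three 2.8 fact modules used in this post to their 2.9 names, where the `_facts` suffix became `_info`. The helper `renamed_for_29` is hypothetical and only covers this naming pattern; it does not touch the labeling changes.

```python
# Hypothetical helper: map the Ansible 2.8 GCP fact modules used in this post
# to the names they carry from Ansible 2.9 onwards (*_facts became *_info).
GCP_FACT_MODULES_28 = [
    "gcp_compute_network_facts",
    "gcp_compute_subnetwork_facts",
    "gcp_compute_instance_facts",
]


def renamed_for_29(module_name: str) -> str:
    """Return the Ansible 2.9 name for a 2.8 GCP *_facts module."""
    suffix = "_facts"
    if module_name.startswith("gcp_") and module_name.endswith(suffix):
        return module_name[: -len(suffix)] + "_info"
    # Anything else (for example gce_labels) is returned unchanged.
    return module_name


for old in GCP_FACT_MODULES_28:
    print(f"{old} -> {renamed_for_29(old)}")
```

A rename alone is not a full migration, so treat this only as a checklist of the module names to revisit.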

Create GCP instance Ansible playbook

As I wrote in part I, creating the GCP instance is easy if you create the network at the same time as the instance. If not, as in our case, you first have to fetch the information about the already existing VPC network within your project. The trick is to take care with the result: the matching resources are returned as a list under the key items, and since items is also the name of a built-in dict method, you have to use bracket notation to get at the network information, for example: subnetwork['items'][0].
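
The same pitfall can be shown in plain Python. This is a minimal sketch, assuming the registered result is a dict with an `items` list; the sample values and the helper `first_item` are made up for illustration.

```python
# Assumed shape of a registered gcp_compute_*_facts result in Ansible 2.8:
# a dict whose "items" key holds the list of matching resources.
subnetwork = {
    "items": [
        {"name": "private-subnet", "region": "europe-west3"},  # sample values
    ]
}

# Attribute-style access finds the built-in dict method, not the data.
print(callable(subnetwork.items))


def first_item(result: dict) -> dict:
    """Return the first matching resource or fail loudly if none matched."""
    items = result.get("items") or []
    if not items:
        raise LookupError("the filter matched no resource")
    return items[0]


print(first_item(subnetwork)["name"])
```

The explicit check for an empty list is an addition of this sketch; in the playbook an empty result would surface as a failing index lookup instead.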

The next requirement is that the instances must be labeled. Later, this will help us find all running and not running instances without harming any instances that were created alongside the GitLab runners in the same VPC network inside the same GCP account. As written above, labeling instances in Ansible 2.8 is a little bit complicated, because it cannot be done during the instance creation. This is the reason why we have to use the GCP resource URL to update the labels afterwards.

- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    type: gitlab-runner
    region: europe-west3
    zone: europe-west3-a
    gcp_instance_name: "{{ nodename }}"
    gcp_project: yourprojectid
    gcp_cred_kind: serviceaccount
    gcp_cred_file: yourserviceaccount.json
    gcp_network_vpc: hub-private-vpc
    gcp_network_subnetwork_vpc: private-subnet
  tasks:
    - name: get info on a network
      gcp_compute_network_facts:
        filters:
        - name = "{{ gcp_network_vpc }}"
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: network

    - name: debug
      debug:
        var: network['items'][0] 
    
    - name: get info on a subnet-network
      gcp_compute_subnetwork_facts:
        filters:
        - name = "{{ gcp_network_subnetwork_vpc }}"
        project: "{{ gcp_project }}"
        region: "{{ region }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: subnetwork

    - name: debug
      debug:
        var: subnetwork['items'][0]
    
    - name: create a disk
      gcp_compute_disk:
        name: "{{ gcp_instance_name }}-disk"
        size_gb: 50
        source_image: projects/ubuntu-os-cloud/global/images/family/ubuntu-1804-lts
        zone: "{{ zone }}"
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
        state: present
      register: disk
    
    - name: create an instance
      gcp_compute_instance:
        name: "{{ gcp_instance_name }}-instance"
        machine_type: n1-highcpu-8
        scheduling:
          preemptible: 'true'
        disks:
        - auto_delete: 'true'
          boot: 'true'
          source: "{{ disk }}"
        network_interfaces:
        - network: "{{ network['items'][0] }}"
          subnetwork: "{{ subnetwork['items'][0] }}"
          access_configs:
          - name: External NAT
            type: ONE_TO_ONE_NAT
        zone: "{{ zone }}"
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
        state: present
      register: result

    - name: debug
      debug:
        msg: "{{ result.selfLink }}"

    - name: Add labels on an existing instance (using resource_url)
      gce_labels:
        project_id: "{{ gcp_project }}"
        credentials_file: "{{ gcp_cred_file }}"
        labels:
          type: gitlab-runner
        resource_url: "{{ result.selfLink }}"
        state: present
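
The resource URL that the labeling task receives is the `selfLink` of the created instance. As a minimal sketch, assuming the usual Compute Engine layout `.../projects/<project>/zones/<zone>/instances/<name>`, it can be split into its parts like this. The sample URL and the helper `parse_instance_self_link` are hypothetical.

```python
from urllib.parse import urlparse


def parse_instance_self_link(self_link: str) -> dict:
    """Split an instance selfLink into project, zone and instance name."""
    parts = urlparse(self_link).path.strip("/").split("/")
    # Expected tail: projects/<project>/zones/<zone>/instances/<name>
    start = parts.index("projects")
    keys = parts[start::2]        # projects, zones, instances
    values = parts[start + 1::2]  # the value following each key
    return dict(zip(keys, values))


# Sample URL built from the placeholder values used in the playbook above.
sample = (
    "https://www.googleapis.com/compute/v1/projects/yourprojectid"
    "/zones/europe-west3-a/instances/runner-01-instance"
)
print(parse_instance_self_link(sample))
```

This is handy for a quick check that the label update targets the instance you expect.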

Configure GCP instance Ansible playbook

After the creation of the GCP instance has finished, the instance has to be configured. There is nothing special here, just installing Docker and handling some configuration tasks, like changing the DNS resolvers, disabling netplan, and some other minor changes. We need those changes because we will run the GitLab runners within a private VPC connected via VPN. The only really important thing here is the linting of the netplan configuration. Editing YAML with Ansible is §$%& !

- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    type: gitlab-runner
    region: europe-west3
    zone: europe-west3-a
    gcp_instance_name: "{{ nodename }}"
    gcp_project: yourprojectid
    gcp_cred_kind: serviceaccount
    gcp_cred_file: yourserviceaccount.json
    gcp_network_vpc: hub-private-vpc
    gcp_network_subnetwork_vpc: private-subnet
  tasks:
    - name: get info on the instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
          - labels.type:gitlab-runner
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: allinstances

    - name: debug
      debug:
        msg: "{{ allinstances }}"

    - name: Add all instance internal IPs to host group
      add_host: 
        name: "{{ item.networkInterfaces.0.networkIP }}"
        groups:
          - gcpinstances
      with_items: "{{ allinstances['items'] }}"


- hosts:  gcpinstances
  gather_facts: no
  tasks:
    - name: Wait for SSH to come up
      local_action:
        module: wait_for
          host={{inventory_hostname}}
          port=22 
          delay=1 
          timeout=180
          
- hosts:  gcpinstances
  vars:
    ansible_user: sa_your_sa_user_id
  gather_facts: yes
  tasks:
    - name: Pinging on "{{inventory_hostname}}"
      ping:
    
    - name: Configure systemd resolved configuration
      copy:
        src:  /var/opt/ansible-prod/files/gcp-gitlab-runner-systemd/resolved.conf
        dest: /etc/systemd/resolved.conf
      become: yes

    - name: Disable cloud.cfg
      copy:
        src:  /var/opt/ansible-prod/files/gcp-gitlab-runner-systemd/99-disable-network-config.cfg
        dest: /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
      become: yes

    - name: Disable netplan dns
      blockinfile:
        path: /etc/netplan/50-cloud-init.yaml
        insertafter: '.*dhcp4: true'
        block: |2
                      dhcp4-overrides:
                          use-dns: no
      become: yes

    - name: Add Docker GPG key
      apt_key: url=https://download.docker.com/linux/ubuntu/gpg
      become: yes

    - name: Add Docker APT repository
      apt_repository:
        repo: deb [arch=amd64] https://download.docker.com/linux/ubuntu {{ansible_distribution_release}} stable
      become: yes

    - name: Install Docker
      apt:
        name: "docker-ce=5:19.03.5~3-0~ubuntu-bionic"
        state: present
        update_cache: yes
      become: yes

    - name: Copy Docker gitlab-runner systemd file
      copy:
        src:  /var/opt/ansible-prod/files/gcp-gitlab-runner-systemd/docker-gitlab-runner.service
        dest: /etc/systemd/system/docker-gitlab-runner.service
      become: yes

    - name: Copy Docker gitlab-runner systemd start file
      template:
        src:  /var/opt/ansible-prod/files/gcp-gitlab-runner-systemd/docker-gitlab-runner-start.sh.j2
        dest: /usr/local/bin/docker-gitlab-runner-start.sh
        mode: "0744"
      become: yes

    - name: Copy Docker gitlab-runner systemd stop file
      copy:
        src:  /var/opt/ansible-prod/files/gcp-gitlab-runner-systemd/docker-gitlab-runner-stop.sh
        dest: /usr/local/bin/docker-gitlab-runner-stop.sh
        mode: "0744"
      become: yes

    - name: Enable Docker gitlab-runner systemd service
      systemd:
        name: docker-gitlab-runner
        daemon_reload: yes
        enabled: yes
        state: started
        masked: no
      become: yes
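
Because the netplan edit depends entirely on indentation, it helps to see what the "Disable netplan dns" task does to the file. The following is a minimal sketch that imitates the insert-after-last-match behavior on a sample file. The sample netplan content is an assumption about what the image ships, and the function is stricter than the real module because it raises when the pattern is not found.

```python
import re

# Assumed content of /etc/netplan/50-cloud-init.yaml (sample, not from the post).
SAMPLE_NETPLAN = """\
network:
    version: 2
    ethernets:
        ens4:
            dhcp4: true
            set-name: ens4
"""

# The block from the playbook after YAML strips the indentation given by "|2".
BLOCK = """\
            dhcp4-overrides:
                use-dns: no
"""


def insert_after_last_match(text, pattern, block, marker="ANSIBLE MANAGED BLOCK"):
    """Insert block, wrapped in marker comments, after the last matching line."""
    lines = text.splitlines()
    regex = re.compile(pattern)
    matches = [i for i, line in enumerate(lines) if regex.search(line)]
    if not matches:
        raise ValueError("pattern not found")
    position = matches[-1] + 1
    wrapped = [f"# BEGIN {marker}", *block.splitlines(), f"# END {marker}"]
    return "\n".join(lines[:position] + wrapped + lines[position:]) + "\n"


patched = insert_after_last_match(SAMPLE_NETPLAN, r".*dhcp4: true", BLOCK)
print(patched)
```

If the override does not line up with the `dhcp4` key, netplan will reject the file, which is why linting the result is worth the effort.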

Stopall GCP instance Ansible playbook

The next thing we need is an Ansible playbook to stop all running instances. This will later be used by the GitLab scheduled CI/CD pipeline. The Ansible playbook is split into two parts, because otherwise it is not possible to trigger a webhook for every machine that is stopped. We use this webhook to submit the result of the playbook to Elastic. There we can retrieve a neat statistic of how often we stopped the GitLab runners. You can find this in the second of the YAML files below. The catch is that a task using with_items can only run one single module per item, but hey, you can use an include there and include multiple tasks from another file! Yes!

In the first part of the Ansible playbook, we trigger some webhooks to monitor the status of the GCP GitLab runners, and we will do the same when starting the GitLab runners. With this, we get nice graphs showing how many GitLab runners are still running, how many had to be restarted, and so on.

Here comes the first part of the Ansible playbook

- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    type: gitlab-runner
    region: europe-west3
    zone: europe-west3-a
    gcp_instance_name: "{{ nodename }}"
    gcp_project: yourprojectid
    gcp_cred_kind: serviceaccount
    gcp_cred_file: yourserviceaccount.json
    gcp_network_vpc: hub-private-vpc
    gcp_network_subnetwork_vpc: private-subnet
  tasks:
    - name: Get NOTRUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
        - labels.type:gitlab-runner AND status:TERMINATED
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: notrunning

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-pre-terminated
        method: POST
        body: "{{ notrunning }}"
        body_format: json

    - name: Get RUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
        - labels.type:gitlab-runner AND status:RUNNING
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: running

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-pre-running
        method: POST
        body: "{{ running }}"
        body_format: json

    - name: STOP all running instances
      include_tasks: gcp_compute_stopall_webhook_gitlabrunner.yml
      with_items: "{{ running['items'] }}"

    - name: Get RUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
        - labels.type:gitlab-runner AND status:RUNNING
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: running

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-post-running
        method: POST
        body: "{{ running }}"
        body_format: json
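
For readers who want to test the receiving side without running Ansible, the "Webhook call" tasks boil down to a JSON POST. Here is a minimal sketch using only the Python standard library. The payload is a made-up sample, the URL is the placeholder from the playbook, and the request is only built, not sent.

```python
import json
import urllib.request


def build_webhook_request(url: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST like the uri tasks above."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_webhook_request(
    "https://your-webhook-url/webhook/gcp-pre-running",
    {"items": [{"name": "runner-01-instance", "status": "RUNNING"}]},  # sample
)
print(request.get_method(), request.full_url)
# Sending is left out on purpose; urllib.request.urlopen(request) would do it.
```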

Here comes the second part of the Ansible playbook

---
- name: STOP all running instances
  gcp_compute_instance:
    name: "{{ item.name }}"
    status: TERMINATED
    zone: "{{ zone }}"
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
    state: present
- name: Trigger webhook
  uri:
    url: https://your-webhook-url/webhook/gcp-glr
    method: POST
    body: "{{ item }}"
    body_format: json
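
The effect of combining the include with with_items is easiest to see as a nested loop: for every instance, all tasks of the included file run in order before the next instance is handled. A minimal sketch with stand-in functions follows; `stop_instance` and `trigger_webhook` are hypothetical and only record what would happen.

```python
def stop_instance(instance: dict, log: list) -> None:
    """Stand-in for the gcp_compute_instance task with status TERMINATED."""
    log.append(("stop", instance["name"]))


def trigger_webhook(instance: dict, log: list) -> None:
    """Stand-in for the uri task that reports the stopped instance."""
    log.append(("webhook", instance["name"]))


# The included file, expressed as an ordered list of tasks.
INCLUDED_TASKS = [stop_instance, trigger_webhook]


def run_include_for_each(items: list) -> list:
    """Run every included task for one item before moving to the next item."""
    log = []
    for item in items:
        for task in INCLUDED_TASKS:
            task(item, log)
    return log


running = [{"name": "runner-01-instance"}, {"name": "runner-02-instance"}]
for entry in run_include_for_each(running):
    print(entry)
```

This ordering is what makes the per-machine webhook possible: each report is sent right after its own instance was stopped.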

Keepalive GCP instance Ansible playbook

As you may have noticed, we are using preemptible instances for the GitLab runners, because they are cheap and work perfectly for CI/CD jobs, in our case QF tests (web UI frontend tests). Preemptible GCP instances can be stopped by GCP at any time. Therefore, we run a GitLab CI/CD scheduled pipeline every 5 minutes to keep the runners alive. Of course there are better ways to achieve this, like starting the machine on every GitLab job run, but currently this is not easy to integrate. The following playbooks mirror the stopall ones, except that they start GitLab runners that have been stopped.

Here comes the first part of the Ansible playbook

- hosts: localhost
  connection: local
  gather_facts: no
  vars:
    type: gitlab-runner
    region: europe-west3
    zone: europe-west3-a
    gcp_instance_name: "{{ nodename }}"
    gcp_project: yourprojectid
    gcp_cred_kind: serviceaccount
    gcp_cred_file: yourserviceaccount.json
    gcp_network_vpc: hub-private-vpc
    gcp_network_subnetwork_vpc: private-subnet
  tasks:
    - name: Get NOTRUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
        - labels.type:gitlab-runner AND status:TERMINATED
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: notrunning

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-pre-terminated
        method: POST
        body: "{{ notrunning }}"
        body_format: json
    
    - name: Get RUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
        - labels.type:gitlab-runner AND status:RUNNING
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: running

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-pre-running
        method: POST
        body: "{{ running }}"
        body_format: json

    - name: START all not running instances
      include_tasks: gcp_compute_keepalive_webhook_gitlabrunner.yml
      with_items: "{{ notrunning['items'] }}"

    - name: Get RUNNING instances
      gcp_compute_instance_facts:
        zone: "{{ zone }}"
        filters:
        - labels.type:gitlab-runner AND status:RUNNING
        project: "{{ gcp_project }}"
        auth_kind: "{{ gcp_cred_kind }}"
        service_account_file: "{{ gcp_cred_file }}"
      register: running

    - name: Webhook call
      uri:
        url: https://your-webhook-url/webhook/gcp-post-running
        method: POST
        body: "{{ running }}"
        body_format: json

Here comes the second part of the Ansible playbook

- name: Start all not running instances / Report Webhook
  gcp_compute_instance:
    name: "{{ item.name }}"
    status: RUNNING
    zone: "{{ zone }}"
    project: "{{ gcp_project }}"
    auth_kind: "{{ gcp_cred_kind }}"
    service_account_file: "{{ gcp_cred_file }}"
    state: present
- name: Webhook call
  uri:
    url: https://your-webhook-url/webhook/gcp-glr
    method: POST
    body: "{{ item }}"
    body_format: json
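
The selection logic behind keepalive is small: of all instances carrying the gitlab-runner label, start only those that are TERMINATED and leave everything else alone. A minimal sketch on sample data follows; the instance list and the helper `select_by_status` are made up for illustration and only imitate what the GCP filter expressions do on the server side.

```python
def select_by_status(instances: list, status: str) -> list:
    """Imitate the filter 'labels.type:gitlab-runner AND status:<status>'."""
    return [
        instance
        for instance in instances
        if instance.get("labels", {}).get("type") == "gitlab-runner"
        and instance["status"] == status
    ]


# Sample inventory: two runners and one unrelated machine in the same project.
instances = [
    {"name": "runner-01-instance", "status": "RUNNING",
     "labels": {"type": "gitlab-runner"}},
    {"name": "runner-02-instance", "status": "TERMINATED",
     "labels": {"type": "gitlab-runner"}},
    {"name": "some-other-vm", "status": "TERMINATED", "labels": {}},
]

to_start = select_by_status(instances, "TERMINATED")
print([instance["name"] for instance in to_start])
```

The unlabeled machine is never selected, which is the reason the instances were labeled in the first place.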

Cool, but show me some graphs now!

First, the webhooks described above result in a nice Zabbix graph where you can see how often the GCP GitLab runners were not running. The graph shows one full week, and there were only two occasions where some of the currently five GCP GitLab runner instances were not running. We use the GCP GitLab runners from 7 am until 6 pm. The crossing lines mark the points where the stopall and keepalive runs start in the morning and in the evening.

Inside Elastic, we can see the Ansible output of the playbook runs.

Running the Ansible playbooks with GitLab

After we've created the playbooks, we would like to run them. This can be done with a simple cron job, of course, but we can also do it with GitLab, which gives us some benefits. The first one is that we can use the Git repository where our Ansible playbook sources are stored. The second is that we can use secure variables for the GCP credentials. The last but most important one is that we can create an Ansible image which contains exactly the Ansible version we need to run these Ansible playbooks. Later we can easily migrate the Ansible playbooks to a newer version without breaking the existing ones, and of course we can move the Ansible playbook sources to a separate repository.

Creating an Ansible Docker image

The Dockerfile for the Ansible image is pretty simple, because it just installs Python pip, Ansible and the dependencies needed by the Ansible GCP modules.

FROM ubuntu:18.04

RUN apt update && apt install -y python-pip openssh-client git && pip install requests google-auth google-api-python-client ansible==2.8.8

Keepalive Git branch

The stopall and the keepalive branches contain mostly the same content; only the triggered scripts differ.

Here is the .gitlab-ci.yml:

image: <your-gitlab-registry>/image:gcp-2.8.8-2020-02-03-01

variables:
  DOCKER_DRIVER: overlay

services:
  - docker:dind

keepalive:
  stage: build
  script:
    - ./keepalive.sh 
  tags:
    - docker-build
  only:
    - manual

keepalive-schedule:
  stage: build
  script:
    - ./keepalive.sh 
  tags:
    - docker-build
  only:
    - schedules

And the keepalive.sh contains the following:

#!/bin/bash
git clone https://gitlab+deploy-token-31:$ANSIBLE_REPOSITORY_DEPLOY_TOKEN@<your-ansible-source-repository>/ansible/legacy.git /var/opt/ansible-prod

cd /var/opt/ansible-prod/plays/gcp-compute-gitlabrunner-prod

./gcp_keepalive_gitlab-runner.sh

Running the GitLab schedules

Finally, you need to configure some schedules to run. In the end, this will look like this:

And the result will be:

Summary

This post should give you an idea of how you can run GitLab runners in GCP with preemptible instances. If you need more details, reach out to me on Twitter (or somewhere else)! Happy hacking!

Closing: Icon information

Icons made by itim2101 from www.flaticon.com.

Mario

Posted on: Sun, 16 Feb 2020 00:00:00 UTC by Mario Kleinsasser
  • Container
  • GitLab
  • GCP
  • Ansible