Hugo with Lunr Search and GitLab Pipelines

Implementing a static site search for Hugo is not as easy as I thought 😂. But with some online resources and a bit of hacking, I was able to get the static search for our site up and running with Lunr. The Lunr index file is built by a GitLab pipeline. Read on if you are interested in how we did it.

Hugo and search

As you might know, Hugo is a static site generator that lets you create websites which can be hosted on GitLab Pages, for example. The blog you are currently reading is based on Hugo. The benefit is that you do not need a database; the drawback is that you have no built-in search function. You could use something like a custom Google search bar, but that might not be an ideal solution.

If you search the Internet, you will quickly find Hugo's own summary of search solutions for static websites: https://gohugo.io/tools/search/. The list is not that long, and you will quickly notice that Lunr is one of the popular solutions. So let’s have a look at Lunr.

Lunr

Lunr, or more exactly Lunr.js, is a JavaScript-based search engine, similar in spirit to Apache Solr, that works with static index files. You therefore have to create such an index file, which Lunr.js can search afterwards. The Hugo search page lists multiple projects that give you some hints on how to do this. For our page I stuck with the hugo-lunr example. This project is simple enough (for me) to act as a starting point. We also need some JavaScript, so I took the Hugo Workflow GitHub Gist as boilerplate. Both projects are outdated, so I picked up the jigsaw pieces and added some glue. So, let’s head over.

hugo-lunr (backend)

The first thing I tried was to create an index file that Lunr.js can use, so I tried the hugo-lunr Node.js project. The documentation of the project is OK, but the Hugo front matter of our site, the site you are currently reading, is a little bit different. After reading a bit about Node.js and npm packages (I had never worked with them before), I had to patch two files. The first one is the package.json, which configures and calls the index script that creates the index from the Markdown source files. Here we go:

{
	"name": "index-hugo-content",
	"scripts": {
		"index": "hugo-lunr -i \"content/post/**\" -o static/lunr/index.json"
	},
	"dependencies": {
		"hugo-lunr": "0.0.4"
	}
}

The package.json defines a script called index, which runs the hugo-lunr module. The module in turn takes all files inside the content/post folder and writes the resulting index file to static/lunr/index.json.

So far so good, but which data will be indexed? That is why I patched the index.js file of the hugo-lunr package; the patched copy is kept as package-index.js. Our current front matter for the n0r1sk.com website looks like this:

---
Title: "Hugo with Lunr Search and GitLab Pipelines"
Date: 2020-02-03
Description: ""
Aliases: []
Tags: []
Categories: ["Hugo", "Lunr", "GitLab"]
Authors: ["mario"]
Featuredimage: "/media/2020/02/03/hugo-lunr-gitlab.png"
Twitter:
  card: "summary"
  title: ""
  description: ""
  image: ""

---

Implementing a static site search 

We use a Title, Categories and, of course, the content of the post itself. The goal is to index these three fields, which is done in line 107 of the following file, the var item assignment at the end of readFile.

var fs = require('fs');
var glob = require('glob');
var matter = require('gray-matter');
var removeMd = require('remove-markdown');
var striptags = require('striptags');
var path = require('path');
/*
exports.hugolunr = function(){
	var h = new HugoLunr();
	return h.index();
}*/

module.exports = HugoLunr;


function HugoLunr(input, output){
	var self = this;
	var stream;
	this.list = [];

	//defaults
	this.input = 'content/**';
	this.output = 'public/lunr.json';

	if(process.argv.indexOf("-o") != -1){ //does output flag exist?
		this.setOutput(process.argv[process.argv.indexOf("-o") + 1]); //grab the next item
	}

	if(process.argv.indexOf("-i") != -1){ //does input flag exist?
		this.setInput(process.argv[process.argv.indexOf("-i") + 1]); //grab the next item
	}

	this.baseDir = path.dirname(this.input);
}

HugoLunr.prototype.setInput = function(input) {
	this.input = input;
}

HugoLunr.prototype.setOutput = function(output) {
	this.output = output;
}

HugoLunr.prototype.index = function(input, output){
	var self = this;

	if (input){
		self.input = input;
	}

	if (output){
		self.output = output;
	}

	self.list = [];
	self.stream = fs.createWriteStream(self.output);
	self.readDirectory(self.input);
	self.stream.write(JSON.stringify(self.list, null,4) );
	self.stream.end();
}


HugoLunr.prototype.readDirectory = function(path){
	var self = this;
	var files = glob.sync(path);
	var len = files.length;
	for (var i=0;i<len;i++){
		var stats = fs.lstatSync(files[i]);
		if (!stats.isDirectory()){
			self.readFile(files[i]);
		}
	}
  	return true;
}

HugoLunr.prototype.readFile = function(filePath){
	var self = this;
	var ext = path.extname(filePath);
	var meta = matter.read(filePath, {delims: '---', lang:'yaml'});
	if (meta.data.draft === true){
		return;
	}

	if (ext == '.md'){
		var plainText = removeMd(meta.content);
	} else {
		var plainText = striptags(meta.content);
	}

	var uri = '/' + filePath.substring(0,filePath.lastIndexOf('.'));
	uri = uri.replace(self.baseDir +'/', '');

	if (meta.data.slug !=  undefined){
		uri = path.dirname(uri) + meta.data.slug;
	}

	if (meta.data.url != undefined){
		uri = meta.data.url
	}

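	var tags = [];
	// Our front matter keeps the tags in "Categories"; fall back to an empty list.
	if (meta.data.Categories != undefined){
		tags = meta.data.Categories;
	}
	console.log(meta.data); // debug output: shows the parsed front matter in the build log
	var item = {'uri' : uri , 'title' : meta.data.Title, 'content':plainText, 'tags':tags};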
	self.list.push(item);
}
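To make the mapping from front matter to index entry easier to see in isolation, here is a minimal sketch. The helper name toIndexItem is hypothetical and not part of hugo-lunr, and the URI in the example is made up for illustration. It assumes the front matter has already been parsed into a plain object, as gray-matter does with meta.data in the file above.

```javascript
// Hypothetical helper (not part of hugo-lunr): builds one index entry
// from already-parsed front matter and the plain-text body of a post.
function toIndexItem(uri, data, plainText) {
    return {
        uri: uri,
        // Our front matter uses capitalized keys, so read Title and Categories.
        title: data.Title || '',
        content: plainText,
        tags: Array.isArray(data.Categories) ? data.Categories : []
    };
}

// Example with the front matter shown earlier (the URI is an assumption).
var exampleItem = toIndexItem(
    '/hugo-with-lunr-search-and-gitlab-pipelines',
    {
        Title: 'Hugo with Lunr Search and GitLab Pipelines',
        Categories: ['Hugo', 'Lunr', 'GitLab']
    },
    'Implementing a static site search'
);
console.log(JSON.stringify(exampleItem, null, 4));
```

Falling back to an empty string and an empty list keeps the entry shape stable even for posts without a Title or Categories.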

The entries in the resulting index file look like this:

[
    {
        "uri": "/1-meetup-allgemeines-und-umfrage",
        "title": "1. Meetup: Allgemeines & Umfrage / General & Survey",
        "content": "p style=\"color: green; margin-bottom: 20px;\"We are planning a Docker Meetup in the next months. Therefore we have created a Google forms poll which you can see below. The poll is provided in German language as we are currently expecting only German talking people to come to our first Meetup. The description will only be provided in German language at the moment. If you do not understand German but you would like to attend (if you are from Italy or Slovenia for example), please contact us! If you are interested to hold a lightning talk or if you would like to share your Docker story (approximately 10 minutes) in English, please contact us. You can find our contact information in the left menu or you can head over to our a href=\"https://www.meetup.com/Docker-South-Austria/\" target=\"_blank\"Meetup page./a/p\n\np style=\"margin-bottom: 20px;\"Wir planen in den nächsten Monaten ein erstes Docker Meetup in Spittal an der Drau. Vorraussichtlich wird das Meetup in den Räumlichkeiten des a href=\"http://www.bfi-kaernten.at/autdehtml-5-standorte.php?pageId=bfi-spittal-38\" target=\"blank\"bfi-Spittal/a stattfinden. bAufgrund der Platzsituation müssen wir die Teilnehmeranzahl für das Meetup auf 12 Personen begrenzen!/b Die Organisation des Meetups (Agenda, RSVP, ...) wird über unsere a href=\"https://www.meetup.com/Docker-South-Austria/\" target=\"blank\"Meetup Seite/a erfolgen (Anmeldung erforderlich). Ein Termin für dieses Meetup steht noch nicht fest, da wir zuerst die Themen sammeln möchten, welche für die Teilnehmer und Teilnehmerinnen von Interesse sind. Aus diesem Grund findet ihr unterhalb dieser Zeilen eine entsprechende Umfrage, mit der Bitte diese auszufüllen./p\n\np style=\"margin-bottom: 20px;\"Die bereits vorgeschlagenen Themen kommen von Bernhard Rausch (CI/CD mit GitLab) und mir (Mario Kleinsasser, Docker 101), da dies Themenbereiche sind, die wir selber aufgrund unserer Erfahrung sehr gut kennen. 
Solltet ihr weitere Vorschläge haben, so könnt ihr diese gerne bei der Umfrage angeben./p\n\np style=\"margin-bottom: 20px;\"Den Zeitrahmen für das Meetup haben wir mit 2-3 Stunden festgelegt, wobei der Start des Meetups bvoraussichtlich um 18:30 Uhr/b sein wird./p\n\np style=\"margin-bottom: 20px;\"Vielen Dank für das Ausfüllen der Umfrage!/p\n\nUmfrage\n\niframe src=\"https://docs.google.com/forms/d/e/1FAIpQLScirfHgM4TfGwIQvMFp0wVs4o8tAXip9lC8ojCTTeSIWTu0DA/viewform?embedded=true&hl=de\" width=\"100%\" height=\"1100px\" frameborder=\"0\" marginheight=\"0\" marginwidth=\"0\"Loading.../iframe\n\n Meetup Treffpunkt\nWird in Kürze bekannt gegeben.\n",
        "tags": [
            "General"
        ]
    },...
]
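The sample entry contains fragments such as p style=... in its content, so it appears that for raw HTML inside a Markdown post only the angle brackets were removed while the tag text was kept. One possible way to tidy this up, sketched here under that assumption, is to strip HTML tags before the Markdown is removed. The helper stripHtmlTags is hypothetical; in the real module the already imported striptags package could do the same job.

```javascript
// Hypothetical helper: removes HTML tags (including their attributes)
// from a string. The regex is only a sketch and does not cover every
// edge case, for example a ">" inside an attribute value.
function stripHtmlTags(text) {
    return text.replace(/<\/?[a-zA-Z][^>]*>/g, '');
}

var rawSnippet = '<p style="margin-bottom: 20px;">Vielen Dank!</p>\n\n<b>Umfrage</b>';
console.log(stripHtmlTags(rawSnippet));
```

In readFile, such a helper would be applied to meta.content before the text is handed to removeMd.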

OK, now that we have our index file, let’s head over to the frontend.

hugo-lunr (frontend)

The last missing piece is the frontend implementation. This is done with a small JavaScript file:

var idx;
var pagesIndex;

function call_search() {
    // The index is loaded asynchronously; do nothing until it is ready.
    if (!idx) {
        return;
    }
    var query = $('#searchfield').val();
    // Map each Lunr result back to its page entry via the uri (the ref).
    var res = idx.search(query).map(function(result) {
        return pagesIndex.filter(function(page) {
            return page.uri === result.ref;
        })[0];
    });
    console.log(res);
    render(res);
}

function render(result){
    var output = $('#front-posts');
    output.empty();
    for (var i = 0; i < result.length; ++i) {
        output.append('<div class="row"><div class="col-12 text-center"><a href="/post/' + result[i].uri.toLowerCase() +'">' + result[i].title + '</a></div></div>');
    }
}


function initLunr() {
    // First retrieve the index file
    $.getJSON("/lunr/index.json")
        .done(function(index) {
            pagesIndex = index;
            console.log("index:", pagesIndex);

            // Set up lunrjs by declaring the fields we use
            // Also provide their boost level for the ranking
            idx = lunr(function() {
                this.field("uri", {
                    boost: 10
                });
                this.field("content");

                // ref is the result item identifier (I chose the page URL)
                this.ref("uri");

                for (var i = 0; i < pagesIndex.length; ++i) {
                    this.add(pagesIndex[i]);
                }
                
            });

        });
}

// Init Lunr to have the data on hand
initLunr();

The most important part here is the initLunr() function, which creates the search index from the static index file. The call_search() function runs the Lunr query for the text in the search field, and render(result) renders the items found.
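As a possible refinement, the two steps after the query (mapping result refs back to pages and building the markup) can be written as small pure functions. This is only a sketch: buildLookup, resolveResults, escapeHtml and renderRow are hypothetical names that are not part of Lunr.js or of the file above. It assumes, as in the code above, that each Lunr result carries the page uri in its ref property.

```javascript
// Build a uri -> page lookup once, instead of filtering the whole
// page list again for every search hit.
function buildLookup(pages) {
    var lookup = new Map();
    for (var i = 0; i < pages.length; ++i) {
        lookup.set(pages[i].uri, pages[i]);
    }
    return lookup;
}

// Map search results (objects with a "ref" property) back to pages,
// keeping the result order and skipping refs that are not in the lookup.
function resolveResults(results, lookup) {
    var pages = [];
    for (var i = 0; i < results.length; ++i) {
        var page = lookup.get(results[i].ref);
        if (page) {
            pages.push(page);
        }
    }
    return pages;
}

// Escape characters that have a special meaning in HTML, so that a
// title like "General & Survey" is inserted as text.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Build the markup for one result row, using the same layout as render().
function renderRow(page) {
    return '<div class="row"><div class="col-12 text-center"><a href="/post/' +
        escapeHtml(page.uri.toLowerCase()) + '">' +
        escapeHtml(page.title) + '</a></div></div>';
}
```

With these helpers, call_search() could build the lookup once after the index file is loaded and then pass each resolved page through renderRow() before appending it to the result container.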

GitLab pipeline

To create the Lunr index automatically, I’ve created a GitLab pipeline that builds the Lunr index file as an artifact. The artifact is then picked up during the Hugo build run and included in the static resources. Here you can see the important pieces of the GitLab pipeline configuration. It is important that package-index.js is copied to the correct place before the npm script is started.

stages:
  - build-lunr
  - build-hugo
  - build-image
  - deploy-image

build-lunr:
  stage: build-lunr
  image: node:13.7.0-stretch
  script:
    - npm install hugo-lunr
    - cp package-index.js node_modules/hugo-lunr/lib/index.js
    - npm run index
  artifacts:
    paths:
      - static/lunr/index.json
  only:
    - tags

build-hugo:
  stage: build-hugo
  image: monachus/hugo:v0.62.2
  script:
    - hugo
  artifacts:
    paths:
      - public/
  dependencies:
    - build-lunr
  only:
    - tags
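If you want the pipeline to fail early when the index is missing or malformed, a small Node.js check could run after npm run index. The following is a sketch under my own assumptions: the function checkIndexFile and the idea of a separate check script are hypothetical and not part of hugo-lunr or of our pipeline.

```javascript
var fs = require('fs');

// Hypothetical check: returns the number of entries if the file holds a
// non-empty JSON array whose entries all have string values for "uri",
// "title" and "content"; throws an Error otherwise.
function checkIndexFile(file) {
    var entries = JSON.parse(fs.readFileSync(file, 'utf8'));
    if (!Array.isArray(entries) || entries.length === 0) {
        throw new Error(file + ' does not contain a non-empty JSON array');
    }
    entries.forEach(function(entry, i) {
        ['uri', 'title', 'content'].forEach(function(key) {
            if (typeof entry[key] !== 'string') {
                throw new Error('entry ' + i + ' has no string "' + key + '"');
            }
        });
    });
    return entries.length;
}

// Example call, e.g. at the end of a check script:
// console.log('index OK, ' + checkIndexFile('static/lunr/index.json') + ' entries');
```

Because an uncaught Error makes the Node.js process exit with a non-zero code, a job running such a script would stop before the Hugo build starts.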

Conclusion

This post shows how you can create a simple static search for your Hugo-based website, including a GitLab pipeline. At the moment, this is a good-to-go solution for our needs. If you would like to see this as a standalone npm module, please send me a tweet. If there are interested people out there, I can create a separate module! Thanks for reading!

Mario

Posted on: Mon, 03 Feb 2020 00:00:00 UTC by Mario Kleinsasser
  • Hugo
  • Lunr
  • GitLab