Learn coding concepts by building real projects with modern technologies

Percolate Queries in ElasticSearch - How I Built an Alerts System

By Lane Wagner on Nov 14, 2019

Once upon a time, a company I worked for had a problem, we had thousands of messages flowing through our data pipeline every second, and we wanted to be able to send real-time emails, SMS, and Slack alerts when messages matching specific criteria were found. A simple solution built using ElasticSearch’s percolate queries ended up being our saving grace.

Our first failed attempt to build an alerting system utilized PipelineDB. To make a long story short, not only was that architecture rigid and hard to make changes to, it didn’t scale well and was constantly having performance issues. We would get called out by users for not sending alerts that should have triggered.

Enter ElasticSearch

elasticsearch

Elasticsearch is a NoSQL distributed database that is good for, well, searching. I would never recommend it as a transactional database for basic CRUD actions, but aggregations, metrics, and percolate queries are where it shines.

Learn Go by writing Go code

I'm a senior engineer learning Go, and the pace of Boot.dev's Go Mastery courses has been perfect for me. The diverse community in Discord makes the weekly workshops a blast, and other members are quick to help out with detailed answers and explanations.

- Daniel Gerep from Cassia, Brasil

What is a percolate query?

Percolate queries can be simply thought of as an inverse search. Instead of sending a query to an index and getting the matching documents, you send a document to an index and get the matching queries. This is exactly what most alerting systems need.

What does it look like?

From elastic’s documentation, we will create an index with a mapping (which is basically a loosey-goosy SQL schema) for an index that holds percolating queries:

PUT /my-index
{
    "mappings": {
        "properties": {
             "threshold": {
                 "type": "long"
             },
             "count": {
                 "type": "long"
             },
             "query": {
                 "type": "percolator"
             }
        }
    }
}

Now that we have an index that can store percolating queries, we can register a new query:

PUT /my-index/my-doc/1?refresh
{
    "threshold": 100,
    "query" : {
        "bool" : {
            "must": {
                "query_string": {
                    "default_field": "query_string",
                    "query": "count:>100"
                }
            }
        }
    }
}

The query object contains all the logic for percolation. If a document’s count field is greater than 100 then this query will be returned in the document’s result set. The only purpose of the threshold field is for convenience, that is, when we are doing CRUD operations on our queries, we can manage the threshold in its own field instead of parsing the query string every time.

Now, lets percolate a document and see if it matches:

GET /my-index/_search
{
    "query" : {
        "percolate" : {
            "field" : "query",
            "document" : {
                "count" : 101
            }
        }
    }
}

Response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my-index",
        "_type": "my-doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "threshold": 100,
          "query": {
            "bool": {
              "must": {
                "query_string": {
                  "default_field": "query_string",
                  "query": "count:>100"
                }
              }
            }
          }
        }
      }
    ]
  }
}

Because the count was greater than the threshold, the percolate query was returned! As you can see, this works great for an alerting system because users can create “alerts” which we store as percolating queries. For example, a user can create a query that triggers when a twitter post mentions their name, or when a temperature in a city is above a certain threshold.

Use it

Percolate queries are perfect for when you have an ever changing set of criteria (probably created by users) that many documents need to be checked against. I’ve used it for alerting and auto-tagging systems in the past. Let me know on twitter if you have questions or can think of another interesting use case for them!

Learn to code by building real projects

Related Reading