Elasticseach: search examples with Python

2 minute read

This post will focus on performing search calls on elasticsearch index using the elasticsearch-python library. Let’s start

Basic Search (match_all)

The following

body={
    "query": {
        "match_all": {}
    }
}

es = Elasticsearch("http://my.elasticsearch:9200")
res = es.search(index="some_index", body=body)
print(res)

The result looks like this (I reduced the hits part from 10 items to 2)

{
   "took":769,
   "timed_out":false,
   "_shards":{
      "total":1,
      "successful":1,
      "skipped":0,
      "failed":0
   },
   "hits":{
      "total":{
         "value":10000,
         "relation":"gte"
      },
      "max_score":1.0,
      "hits":[
         {
            "_index":"logstash",
            "_type":"_doc",
            "_id":"XaxaaUBe_Owq5UYnY1R",
            "_score":1.0,
            "_source":{
               "job":"build-image",
               "build":2
            }
         },
         {
            "_index":"logstash",
            "_type":"_doc",
            "_id":"WdasdadBe_Owq5UYnY1R",
            "_score":1.0,
            "_source":{
               "job":"compile",
               "build":5
            }
         },
         ...
      ]
   }
}

The interesting thing we immediately learn from this output is that it doesn’t really returns all documents/hits. I got back 10 documents and that’s a good thing, as you probably wouldn’t want to get 100,000 documents back.

Count the number of hits

Counting the number of hits can be done this way: print("Number of hits:", len(res["hits"]["hits"]))

Increase hits size

Use the size argument in your search call: res = es.search(index="logstash", body=body, size=20)

Search for word in a given field

From now on, the only thing will be changing is the body variable. So I won’t be showing the whole code, only the parts that are changing.

body={
    "query": {
        "match": {
            "job": {
                "query": "compile"
            }
        }
    }
}

Running the above code will basically check in which documents, the field job includes the word compile. You’ll get back all the documents where this condition is met.

Search for multiple words in a given field

You can also search for multiple words. There basically nothing you need to change in the above code except adding the words you are interested in. The reason is that by default something like "query": "compile build install" is treated like "query": "compile OR build OR install"

body={
    "query": {
        "match": {
            "job": {
                "query": "compile"
            }
        }
    }
}

Search for a sentence in a given field

But you don’t always want to search with OR. Sometimes you would like to search for documents where a field contains all the words you mention . For that you can use the AND operator. Note that the order doesn’t matter. It will simply check the words (separated by space) are in the value of the field.

body={
    "query": {
        "match": {
            "job": {
                "query": "compile the package",
                "operator": "AND"
            }
        }
    }
}

This Yoda phrasing works exactly the same way:

body={
    "query": {
        "match": {
            "job": {
                "query": "the compile package",
                "operator": "AND"
            }
        }
    }
}

Count each unique value of a given field

Aggregations is a powerful tool, allowing you run much more complex queries. Let’s see any example where we will count in how many documents each unique value of the field job is included.

body={
    "size": 0,
    "aggs": {
        "group_by_job": {
            "terms": {
                "size": 4000,
                "field": "job.keyword"
            }
        }
    }
}

Note that we set the size in the body to 0 to not show any documents and under the aggs we set it to 4000 to get all the aggregation results. This is basically how you control size from the body itself instead of es.search method.

The result looks like this:

...
{
   "aggregations":{
      "group_by_job":{
         "doc_count_error_upper_bound":0,
         "sum_other_doc_count":1506,
         "buckets":[
            {
               "key":"compile",
               "doc_count":1569
            },
            {
               "key":"build-image",
               "doc_count":873
            }
         ]
      }
   }
}
...

Comments