Elasticsearch Content Type Benchmarks

Overview

The geonames data set contains a lot of structured data. String fields are always indexed as text with a raw keyword subfield. We run the following variations (which we call "challenges" in Rally):
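The following is a minimal sketch of the string-field pattern described above: each string field is indexed as text and also gets a keyword subfield named "raw". It assumes a typeless mapping (Elasticsearch 7.x and later), and the helper names and field names are hypothetical illustrations, not taken from the track itself.

```python
import json


def string_field_mapping() -> dict:
    # Full-text "text" field plus an exact-value "raw" keyword subfield.
    return {
        "type": "text",
        "fields": {
            "raw": {"type": "keyword"},
        },
    }


def build_mappings(string_fields: list[str]) -> dict:
    # Hypothetical helper: apply the same pattern to every listed string field.
    return {
        "mappings": {
            "properties": {name: string_field_mapping() for name in string_fields}
        }
    }


if __name__ == "__main__":
    # Illustrative field names only; the real track defines its own mapping.
    print(json.dumps(build_mappings(["name", "country_code"]), indent=2))
```

With this pattern, queries can search "name" as analyzed text, while sorts, aggregations, and exact matches use "name.raw".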

Append: Indexes the whole document corpus using Elasticsearch default settings. The only change is setting the number of replicas to 0: we benchmark a single-node cluster, where replica shards cannot be allocated, and Rally only starts the benchmark once the cluster health turns green. Document ids are unique, so all index operations are append-only. After indexing, a few queries are run in parallel by multiple clients.
Append Fast: Indexes the whole document corpus using a setup that leads to higher indexing throughput than the default settings. Document ids are unique, so all index operations are append-only.
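The replica adjustment in the Append challenge can be illustrated with a small Python sketch. The index settings body shows the single change from the defaults. The health check is a deliberately simplified model, under the assumption that the only constraint is that Elasticsearch never places two copies of the same shard on one node. It ignores other allocation rules such as disk watermarks or allocation awareness. Both function names are hypothetical.

```python
def append_index_settings(number_of_replicas: int = 0) -> dict:
    # Only the replica count is changed; all other settings stay at their defaults.
    return {"settings": {"index": {"number_of_replicas": number_of_replicas}}}


def expected_health(node_count: int, number_of_replicas: int) -> str:
    # Simplified model: copies of the same shard must live on different nodes,
    # so at most node_count - 1 replicas per shard can be allocated.
    # Unassigned replicas leave the cluster "yellow"; all assigned gives "green".
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return "green" if number_of_replicas <= node_count - 1 else "yellow"


if __name__ == "__main__":
    print(expected_health(1, 1))  # default of 1 replica on a single node
    print(expected_health(1, 0))  # replicas set to 0, as in the Append challenge
```

Under this model, the default of one replica on a single node can never be allocated, so the cluster stays yellow and Rally would not start. Setting the replica count to 0 is what lets it turn green.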

The benchmarks are run either with an out-of-the-box configuration of Elasticsearch ("default settings") or with a larger heap of 4 GB ("4g heap"). For more details, please refer to the geonames track specification and our benchmarking methodology. The benchmark results are also provided as a Kibana dashboard.