Overview
The geonames data set contains a lot of structured data. String fields are always indexed as text
with a raw keyword subfield. We run the following variations (which we call "challenges" in Rally):
- Append: Indexes the whole document corpus using Elasticsearch default settings. We only adjust the
number of replicas, because we benchmark a single-node cluster and Rally only starts the benchmark once the cluster
turns green. Document ids are unique, so all index operations are append-only. Afterwards, a couple of queries are run
in parallel by multiple clients.
- Append Fast: Indexes the whole document corpus using a setup that leads to a higher indexing
throughput than the default settings. Document ids are unique, so all index operations are append-only.
- Id Conflicts: Indexes the whole document corpus using a setup that leads to a higher indexing
throughput than the default settings. Rally produces duplicate ids for 25% of all documents (not configurable), so we
can simulate a scenario that mostly appends, with some updates in between.
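The Id Conflicts scenario can be illustrated with a small sketch. This is not Rally's actual implementation; the function name generate_ids, the seed, and the id format are hypothetical and only show how a 25% chance of reusing an already emitted id turns a stream of appends into a mix of appends and updates.

```python
import random


def generate_ids(num_docs, conflict_probability=0.25, seed=42):
    """Yield document ids; with the given probability, reuse an earlier id.

    Hypothetical helper for illustration only, not part of Rally.
    """
    rng = random.Random(seed)
    seen = []      # ids emitted so far, candidates for a conflict
    next_id = 0    # counter used to build fresh, unique ids
    for _ in range(num_docs):
        if seen and rng.random() < conflict_probability:
            # Duplicate id: indexing this document acts as an update.
            yield rng.choice(seen)
        else:
            # Fresh id: indexing this document is a plain append.
            doc_id = f"{next_id:010d}"
            next_id += 1
            seen.append(doc_id)
            yield doc_id


if __name__ == "__main__":
    ids = list(generate_ids(10_000))
    duplicates = len(ids) - len(set(ids))
    print(f"share of conflicting ids: {duplicates / len(ids):.2%}")
```

With unique ids (conflict_probability=0.0), the same generator reproduces the append-only behavior of the other two challenges.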
The benchmarks are run either with an out-of-the-box configuration of Elasticsearch ("default settings") or with a larger heap
of 4GB ("4g heap"). For more details, please refer to the geonames track specification and
have a look at our benchmarking methodology. The benchmark results are also provided as a Kibana dashboard.
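As a rough illustration of the index setup described in this overview, the following sketch builds an index body in which a string field is mapped as text with a raw keyword subfield and the replica count is adjusted for a single-node cluster. The field name "name", the replica value of 0, and the typeless mapping layout (as used by recent Elasticsearch versions) are assumptions; the authoritative definition is the geonames track specification.

```python
import json


def text_with_raw_keyword():
    """Mapping for a string field: analyzed text plus an exact-match 'raw' subfield."""
    return {
        "type": "text",
        "fields": {
            "raw": {"type": "keyword"},
        },
    }


index_body = {
    "settings": {
        # Assumption: zero replicas, so a single-node cluster can turn green.
        "index": {"number_of_replicas": 0},
    },
    "mappings": {
        "properties": {
            # "name" is an illustrative field name, not taken from the text above.
            "name": text_with_raw_keyword(),
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
```

Full-text queries would then target the field itself, while exact matches, sorting, and aggregations would use the raw subfield.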