Elasticsearch Content Type Benchmarks

Overview

The PMC data set contains scientific papers from PubMed Central® (PMC). We use it to evaluate the performance of Elasticsearch on full-text content. We run the following variations (which Rally calls "challenges"):

Append: Indexes the whole document corpus using Elasticsearch default settings. We adjust only the number of replicas, because we benchmark a single-node cluster and Rally starts the benchmark only once the cluster status is green. Document IDs are unique, so all index operations are append-only. After indexing, a couple of queries are run in parallel by multiple clients.
Append Fast: Indexes the whole document corpus using settings tuned for higher indexing throughput than the defaults. Document IDs are unique, so all index operations are append-only.
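
The following is a minimal sketch, not the actual track definition, of why the replica count matters on a single node and what an append-only bulk request can look like. It assumes that replicas are set to 0 (the text above only says the number is adjusted), that unique IDs are obtained by omitting "_id" so Elasticsearch generates them, and that the index is named "pmc". The helper names and the health check are hypothetical simplifications.

```python
import json


def index_settings(number_of_replicas: int) -> dict:
    """Build an index-settings body in which only the replica count is overridden."""
    return {"settings": {"index": {"number_of_replicas": number_of_replicas}}}


def expected_health(data_nodes: int, number_of_replicas: int) -> str:
    """Simplified model of cluster health for one index.

    A replica shard is never allocated on the same node as its primary,
    so all copies can only be assigned if there are at least
    number_of_replicas + 1 data nodes.
    """
    if data_nodes < 1:
        return "red"
    return "green" if data_nodes >= number_of_replicas + 1 else "yellow"


def bulk_body(docs: list, index_name: str = "pmc") -> str:
    """Build an NDJSON bulk body of append-only index operations.

    No "_id" is supplied, so Elasticsearch assigns a fresh ID to each
    document and no existing document is ever overwritten.
    The index name "pmc" is an assumption for illustration.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # With one replica, a single node cannot reach green status.
    print(expected_health(data_nodes=1, number_of_replicas=1))
    # With zero replicas, it can.
    print(expected_health(data_nodes=1, number_of_replicas=0))
    print(json.dumps(index_settings(0)))
    print(bulk_body([{"title": "Example paper"}]), end="")
```

In this model a single node with one replica per shard stays yellow, which is why a benchmark that waits for green status needs the replica count changed.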

The benchmarks are run either with an out-of-the-box configuration of Elasticsearch ("default settings") or with a larger heap of 4 GB ("4g heap"). For more details, please refer to the PMC track specification and our benchmarking methodology.