Overview
The PMC data set contains scientific papers from PubMed Central® (PMC).
We use it to evaluate the performance of Elasticsearch for full-text content. We run the following variations (which we call
"challenges" in Rally):
- Append: Indexes the whole document corpus using Elasticsearch default settings. We only adjust the
number of replicas, because we benchmark a single-node cluster and Rally only starts the benchmark once the cluster
turns green. Document ids are unique, so all index operations are append-only. After indexing, a couple of queries are
run in parallel by multiple clients.
- Append Fast: Indexes the whole document corpus using a configuration tuned for higher indexing
throughput than the default settings. Document ids are unique, so all index operations are append-only.
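The replica adjustment mentioned for the Append challenge can be illustrated with a small sketch. It is not taken from the PMC track itself: the helper names `index_settings` and `expected_health` are hypothetical, and the health check is a deliberately simplified model that assumes all primary shards are assigned. The only Elasticsearch detail relied on is the `index.number_of_replicas` setting and the rule that a replica is never allocated on the same node as its primary.

```python
import json


def index_settings(number_of_replicas: int = 0) -> dict:
    """Build an index-creation body that only overrides the replica count.

    Hypothetical helper for illustration; everything else stays at the
    Elasticsearch defaults.
    """
    return {"settings": {"index": {"number_of_replicas": number_of_replicas}}}


def expected_health(data_nodes: int, number_of_replicas: int) -> str:
    """Simplified model of cluster health, assuming all primaries are assigned.

    Each replica needs a node that does not already hold a copy of the same
    shard, so every copy can only be placed if replicas < data nodes.
    """
    return "green" if number_of_replicas < data_nodes else "yellow"


if __name__ == "__main__":
    # Settings body for a single-node benchmark: no replicas.
    print(json.dumps(index_settings(0), indent=2))
    # With one replica on one node, the replica stays unassigned.
    print(expected_health(data_nodes=1, number_of_replicas=1))
    # With zero replicas, the single node can reach green.
    print(expected_health(data_nodes=1, number_of_replicas=0))
```

Under this model, a single node with one replica per shard would remain yellow, which is why the replica count is the one setting changed in an otherwise default configuration.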
The benchmarks are run either with an out-of-the-box configuration of Elasticsearch ("default settings") or with a larger heap
of 4GB ("4g heap"). For more details, please refer to the PMC track specification and have a
look at our benchmarking methodology.