Overview

The NYC taxi data set contains the rides taken in yellow taxis in New York in 2015. The data is provided by the NYC Taxi and Limousine Commission. We use it to evaluate the performance of Elasticsearch for structured data. We run the following variation (which we call a "challenge" in Rally):

  • Append: Indexes the whole document corpus using Elasticsearch default settings. We only adjust the number of replicas, because we benchmark a single-node cluster and Rally only starts the benchmark once the cluster turns green. Document ids are unique, so all index operations are append-only. After indexing, a couple of queries are run in parallel by multiple clients.
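
The replica adjustment can be illustrated with a minimal sketch. It assumes the replica count is set to 0 (the text above only says the number of replicas is adjusted), and it uses a deliberately simplified health model: Elasticsearch never allocates a replica on the same node as its primary, so a single node can only hold all shard copies when there are no replicas. The function name expected_health is hypothetical and not part of Rally or Elasticsearch.

```python
import json

# Assumption: zero replicas. The overview only states that the replica
# count is adjusted; the exact value here is illustrative.
INDEX_SETTINGS = {"settings": {"index": {"number_of_replicas": 0}}}


def expected_health(data_nodes: int, replicas: int) -> str:
    """Simplified model of cluster health for one index.

    A replica is never allocated on the same node as its primary, so
    every shard copy can be assigned only if there are at least
    (replicas + 1) data nodes. Hypothetical helper for illustration.
    """
    if data_nodes < 1:
        return "red"  # no node can hold even the primaries
    return "green" if data_nodes >= replicas + 1 else "yellow"


if __name__ == "__main__":
    replicas = INDEX_SETTINGS["settings"]["index"]["number_of_replicas"]
    print(json.dumps(INDEX_SETTINGS))
    print("single node, adjusted replicas:", expected_health(1, replicas))
    print("single node, one replica:", expected_health(1, 1))
```

Under this model a single-node cluster with one replica per shard stays yellow, which is why the benchmark would never start without the adjustment.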

The benchmarks are run against an out-of-the-box configuration of Elasticsearch, but with a larger heap of 4GB. For more details, please refer to the NYC taxis track specification and have a look at our benchmarking methodology.