The NYC taxi data set contains the rides taken in yellow taxis in New York in 2015, as provided by the NYC Taxi and Limousine Commission. We use it to evaluate the performance of Elasticsearch on structured data. We run the following variation (called a "challenge" in Rally):
The benchmarks are run against an out-of-the-box configuration of Elasticsearch, except that the heap size is increased to 4GB. For more details, please refer to the NYC taxis track specification and have a look at our benchmarking methodology.
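As a rough illustration of how such a run might be reproduced, the Python sketch below assembles an esrally command line. It assumes Rally's `race` subcommand, the track name `nyc_taxis`, and Rally's `4gheap` car as the way to obtain the 4GB heap on top of otherwise default settings. `build_rally_command` is a hypothetical helper, and the challenge name is left to the caller rather than guessed.

```python
import shlex


# Hypothetical helper (not part of Rally): builds an esrally command line
# as a list of arguments. Assumes Rally's "race" subcommand, the "nyc_taxis"
# track, and the "4gheap" car, which sets a 4GB heap on a default config.
def build_rally_command(challenge=None, car="4gheap", track="nyc_taxis"):
    cmd = ["esrally", "race", f"--track={track}", f"--car={car}"]
    if challenge is not None:
        # Select one variation ("challenge") of the track; without this flag
        # Rally runs the track's default challenge.
        cmd.append(f"--challenge={challenge}")
    return cmd


if __name__ == "__main__":
    # Print the command as a shell-ready string.
    print(shlex.join(build_rally_command()))
```

The returned list can be passed directly to `subprocess.run` on a machine where Rally is installed; keeping the arguments as a list rather than a single string avoids shell-quoting issues.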