Benchmarking Redis on AWS ElastiCache

Amazon announced last week that the AWS ElastiCache service now supports the Redis protocol, which means that AWS is joining the crowded ranks of Redis-as-a-Service providers. AWS lets you choose among nine different instance types for your Redis to run on, which left us wondering: what's the best deal? We decided to benchmark all the available instance types to see which gives the best raw performance and which gives the best bang for the buck.

We learned that only a few of the nine instance types are actually worth your money. For most users, the m1-medium, m1-large, and especially m2-xlarge instances offer the best combination of price, performance, and available memory. And you should avoid the c1-xlarge and m1-xlarge instance types, as they are strictly worse than the alternatives. Read on for details about what we benchmarked and how to choose the best instance type for your application.

Evaluation Criteria

We evaluated each instance with respect to four different factors:

  • Price: What is the hourly cost of this instance? All of the numbers cited in this post are prices for unreserved instances in the US-East region as of September 2013. Prices will differ for reserved instances and in other regions, but should be comparable to the numbers cited here.

  • Memory: How much RAM does the instance have available? This directly determines the size of your Redis, and for some applications this will be the dominant deciding factor. We found that, generally speaking, instances with more memory are worse values in terms of price-to-performance, so you should often choose the smallest instance that your application can get away with.

  • Latency: What's the 99th percentile latency per Redis operation? It's generally difficult to measure the network performance of EC2 instances, and the data that Amazon provides are vague (they classify network performance as "Very Low", "Low", "Moderate", or "High"). In general, we found that the worst-case latency was acceptable for all but the t1-micro and m1-small instances, and that paying more for an instance with a "High" network performance rating did not significantly improve latency. (A sketch of one way to collect this kind of percentile measurement appears after this list.)

  • Throughput: How many operations per second (OPS) could the Redis server sustain? We used the redis-benchmark utility to measure throughput for a mix of different operation types, repeating the benchmark several times to ensure the results were consistent; see the "Details" section for specifics on our benchmark runs.
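
Here's a minimal sketch of how a per-operation latency percentile can be collected from a driver host. It uses the redis-py client against a placeholder endpoint and times individual GETs from application code, so it illustrates the measurement rather than reproducing our exact harness:

    # Minimal sketch: measure 99th percentile GET latency from a driver host.
    # Assumes the redis-py client; the host name below is a placeholder.
    import time
    import redis

    r = redis.Redis(host="my-cache.example.cache.amazonaws.com", port=6379)

    r.set("latency:probe", "x")
    samples = []
    for _ in range(10000):
        start = time.time()
        r.get("latency:probe")
        samples.append((time.time() - start) * 1000.0)  # milliseconds

    samples.sort()
    p99 = samples[int(len(samples) * 0.99) - 1]
    print("p99 GET latency: %.2f ms over %d samples" % (p99, len(samples)))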

The Results

Here's a summarized version of the four criteria above:

The four metrics we evaluated are shown in the first four columns. The last column, "OPS / $", is an indication of value - how much throughput you're getting per dollar spent. As you can see, there's a wide range of value here; the tiny t1-micro seems to be 150x more efficient per dollar spent than the massive m2-4xlarge. However, the t1-micro has some attributes that make it overperform on these sorts of benchmarks; more on that below.
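
For the curious, the "OPS / $" column is just sustained throughput divided by the hourly on-demand price. Here's a tiny sketch of the calculation; the throughput figures and the m1-medium price are deliberately made-up placeholders, not our measured results, and only the m2-xlarge price comes from this post:

    # Sketch of the "OPS / $" value calculation. The throughput figures and the
    # m1-medium price are illustrative placeholders, NOT our benchmark results;
    # the m2-xlarge price is the one quoted later in this post.
    instances = {
        # name: (hourly_price_usd, sustained_ops_per_sec)
        "m1-medium": (0.12, 50000),    # placeholder values
        "m2-xlarge": (0.505, 90000),   # $0.505/hr from this post; OPS is a placeholder
    }

    for name, (price, ops) in sorted(instances.items()):
        print("%-10s %10.0f OPS per dollar-hour" % (name, ops / price))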

[Chart: price vs. throughput for each instance type]

This chart shows another view of the price / throughput ratio of hosts. Generally speaking, as you pay more, you get better performance, but some instance types have better ratios than others (in particular, the m1-medium, m1-large, and m2-xlarge offer excellent price-to-performance ratios).

So why is there so much variance in throughput, and why don't the most expensive instances outperform the mid-range instances? The answer lies in the architecture of Redis itself, which is "mostly single-threaded". One of the things you pay for with more expensive EC2 instances is more cores, but those cores tend to sit idle if all the host is doing is running Redis. Amazon measures instance performance as the number of "EC2 Compute Units" (ECUs) available to an instance and the number of cores on that instance. The best-performing hosts have a high ECU-per-core ratio, and the best-value hosts have a low total core count (so you're not paying for unused CPUs).
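
You can see this single-threaded behavior for yourself. Here's a rough sketch (assuming the redis-py client and a placeholder host) that samples the CPU section of INFO while a benchmark runs from another machine; if the ratio of CPU time to wall-clock time stays around 1.0 or below, the extra cores are sitting idle:

    # Sketch: estimate how many cores the Redis process is actually using,
    # by sampling INFO's cpu section while a benchmark runs elsewhere.
    # Assumes the redis-py client and a placeholder host name.
    import time
    import redis

    r = redis.Redis(host="my-cache.example.cache.amazonaws.com", port=6379)

    def cpu_seconds():
        info = r.info("cpu")
        return info["used_cpu_sys"] + info["used_cpu_user"]

    before = cpu_seconds()
    wall_start = time.time()
    time.sleep(30)                      # let the external benchmark run
    cpu_used = cpu_seconds() - before
    wall = time.time() - wall_start

    # A value near 1.0 means Redis is saturating a single core; it won't go much higher.
    print("approximate cores in use: %.2f" % (cpu_used / wall))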

Instance Breakdown and Recommendations

So assuming you're not memory-constrained, which instances should you pick, and which should you avoid?

Avoid High-Latency Instances

Don't use the t1-micro or m1-small instances for anything serving interactive traffic (e.g. as a cache for a web server). The 99th percentile latency for these hosts is much, much worse than for the other hosts. While the price / performance ratio of the t1-micro looks good in the benchmark data, this instance type is only capable of "bursting" to high performance for short periods of time (about a minute) before being throttled by Amazon. In a longer-running benchmark, the throughput of the t1-micro would look more like that of the m1-small.
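
One rough way to observe the throttling is to run many short benchmark passes back to back and watch the reported throughput fall off once the burst window is exhausted. A sketch, assuming redis-benchmark is installed on the driver host, that your build supports the --csv output flag, and a placeholder endpoint:

    # Sketch: repeat short redis-benchmark passes and log throughput over time,
    # to see whether an instance (e.g. the t1-micro) gets throttled after bursting.
    # Assumes redis-benchmark supports --csv output ("TEST","requests_per_second");
    # the host name is a placeholder.
    import subprocess
    import time

    HOST = "my-cache.example.cache.amazonaws.com"
    start = time.time()

    for _ in range(30):                        # ~30 short passes
        out = subprocess.check_output([
            "redis-benchmark", "-h", HOST, "-t", "get",
            "-n", "100000", "--csv",
        ])
        # csv output looks like: "GET","87719.30"
        rps = float(out.decode().strip().split(",")[1].strip('"'))
        print("%6.0fs elapsed: %8.0f GET ops/sec" % (time.time() - start, rps))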

Some Instances are Strictly Worse

Choosing an instance usually involves a trade-off between the four factors above, but a couple of instance types are just a bad deal. The c1-xlarge is worse than both the m1-large and the m2-xlarge along every dimension; choose one of those instead. Likewise, the m1-xlarge is strictly dominated by the m2-xlarge - the latter is less expensive, has more memory, delivers almost twice the throughput, and has comparable latency.

Only Pay for Extra Cores if You Need Them

The m2-2xlarge and m2-4xlarge are very expensive instances. They don't outperform the m2-xlarge (since they have the same ECU speed), but they do offer more memory (a whopping 33.8 GB and 68 GB respectively). If you absolutely must have this much memory, choose one of these instance types, but if you can get away with 16.7 GB, the m2-xlarge is a much more cost-effective choice.
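
If you're not sure which side of that line you fall on, a quick check (a sketch assuming the redis-py client, pointed at whatever Redis you run today) is to read used_memory from INFO:

    # Sketch: check how much memory your current dataset actually occupies,
    # to decide which instance size you can get away with.
    # Assumes the redis-py client and a placeholder host name.
    import redis

    r = redis.Redis(host="my-existing-redis.example.com", port=6379)
    mem = r.info("memory")

    used_gb = mem["used_memory"] / (1024.0 ** 3)
    peak_gb = mem["used_memory_peak"] / (1024.0 ** 3)
    print("currently used: %.2f GB (peak: %.2f GB)" % (used_gb, peak_gb))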

The Winners

The instance types that offer the best mix of the four factors are the m1-medium, m1-large, and m2-xlarge instances (in increasing order of price). The m2-xlarge has 3.25 ECUs per core, the fastest per-core rating available, as well as excellent network performance, and costs only $0.505 per hour. If you're on a budget, the m1-medium and m1-large offer decent performance at low prices. These instances do suffer from tighter memory constraints, but for many caching use-cases they should have ample available space.

Methodology

We collected the benchmark numbers using the command redis-benchmark -r 1000000 -n 64000000 -t get,set,lpush,lpop -P 16 (64 million requests spread across GET, SET, LPUSH, and LPOP, over a random keyspace of one million keys, with 16 commands pipelined per request). This was cobbled together from the Redis documentation and mailing lists.
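
Note that the -P 16 flag pipelines 16 commands per round trip, which substantially inflates raw OPS relative to a client issuing one command at a time. Here's a sketch of the same effect from application code, assuming the redis-py client and a placeholder endpoint:

    # Sketch: the effect of pipelining (what redis-benchmark's -P flag does),
    # timed from application code. Assumes redis-py and a placeholder host.
    import time
    import redis

    r = redis.Redis(host="my-cache.example.cache.amazonaws.com", port=6379)
    N = 20000

    # One round trip per command.
    start = time.time()
    for i in range(N):
        r.set("k:%d" % i, "v")
    ops_unpipelined = N / (time.time() - start)

    # 16 commands per round trip, roughly matching redis-benchmark -P 16.
    start = time.time()
    pipe = r.pipeline(transaction=False)
    for i in range(N):
        pipe.set("k:%d" % i, "v")
        if i % 16 == 15:
            pipe.execute()
    pipe.execute()
    ops_pipelined = N / (time.time() - start)

    print("unpipelined: %8.0f ops/sec" % ops_unpipelined)
    print("pipelined:   %8.0f ops/sec" % ops_pipelined)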

The host running the benchmark was an extra-large instance in the same availability zone as the target Redis. While there was still some network latency inherent in these benchmarks, running redis-cli --latency showed a consistent 0.2-0.5ms of latency between our test driver and target host.

One peril with EC2 is the "noisy neighbor" problem. It's impossible to know if any of these benchmarks were influenced by heavy network traffic or CPU-intensive neighbors on the same physical hardware. In particular, the benchmarked performance of the m2-4xlarge in our tests was a little lower than expected considering the underlying hardware - it's possible that this was due to an oversubscribed underlying host.

Finally, we ran our benchmarks against non-durable Redis instances, meaning that we were not using replication or Redis's AOF feature.
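
For illustration, here's how persistence is typically switched off on a self-managed Redis. ElastiCache manages these settings on your behalf, so treat this as a sketch of what "non-durable" means rather than our exact setup; it assumes the redis-py client and a placeholder host:

    # Sketch: disabling persistence on a self-managed Redis, to illustrate the
    # "non-durable" configuration. Assumes redis-py and a placeholder host;
    # ElastiCache handles these settings through its own configuration mechanism.
    import redis

    r = redis.Redis(host="my-own-redis.example.com", port=6379, decode_responses=True)
    r.config_set("appendonly", "no")  # no append-only file
    r.config_set("save", "")          # no RDB snapshots
    print(r.config_get("appendonly"), r.config_get("save"))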

Next steps

What else would you like to see us measure? We're thinking of running these same metrics against some of the other Redis-as-a-Service providers, as well as against self-managed Redis instances (on EC2 or on "bare metal" hardware). Leave us a comment here and we'll do a "part 2" with the most popular benchmarking questions.