If you're interested in Memcached and Redis, this article is worth a few minutes of your time; otherwise, feel free to skip it.
Why Redis beats Memcached for caching
Memcached is sometimes more efficient, but Redis is almost always the better choice.
Memcached or Redis? It's a question that nearly always arises in any discussion about squeezing more performance out of a modern, database-driven Web application. When performance needs to be improved, caching is often the first step taken, and Memcached and Redis are typically the first places to turn.
These renowned cache engines share a number of similarities, but they also have important differences. Redis, the newer and more versatile of the two, is almost always the superior choice.
The similarities
Let's start with the similarities. Both Memcached and Redis serve as in-memory, key-value data stores, although Redis is more accurately described as a data structure store. Both Memcached and Redis belong to the NoSQL family of data management solutions, and both are based on a key-value data model. They both keep all data in RAM, which of course makes them supremely useful as a caching layer. In terms of performance, the two data stores are also remarkably similar, exhibiting almost identical characteristics (and metrics) with respect to throughput and latency.
Both Memcached and Redis are mature and hugely popular open source projects. Memcached was originally developed by Brad Fitzpatrick in 2003 for the LiveJournal website. Since then, Memcached has been rewritten in C (the original implementation was in Perl) and released as open source, and it has become a cornerstone of modern Web applications. Current development of Memcached is focused on stability and optimizations rather than adding new features.
Redis was created by Salvatore Sanfilippo in 2009, and Sanfilippo remains the lead developer of the project today. Redis is sometimes described as "Memcached on steroids," which is hardly surprising considering that parts of Redis were built in response to lessons learned from using Memcached. Redis has more features than Memcached and is, thus, more powerful and flexible.
Used by many companies and in countless mission-critical production environments, both Memcached and Redis are supported by client libraries in every conceivable programming language, and they're included in a multitude of packages for developers. In fact, it's a rare Web stack that does not include built-in support for Memcached, Redis, or both.
Why are Memcached and Redis so popular? Not only are they extremely effective, they're also relatively simple. Getting started with either Memcached or Redis is considered easy work for a developer. It takes only a few minutes to set up and get them working with an application. Thus, a small investment of time and effort can have an immediate, dramatic impact on performance -- usually by orders of magnitude. A simple solution with a huge benefit; that's as close to magic as you can get.
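To make the "small investment, big payoff" point concrete, here is a minimal sketch of the usual cache-aside pattern. It is an illustration only: `DictCache` is a hypothetical in-process stand-in for a Memcached or Redis client, and `slow_db_lookup` is a made-up placeholder for a real database query.

```python
class DictCache:
    """In-process stand-in for a cache server (hypothetical, not a real client)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


db_calls = 0  # counts how often the "database" is actually queried


def slow_db_lookup(user_id):
    """Placeholder for an expensive database query."""
    global db_calls
    db_calls += 1
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(cache, user_id):
    """Cache-aside: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = slow_db_lookup(user_id)
    cache.set(key, value)  # populate the cache for the next caller
    return value


user_cache = DictCache()
first = get_user(user_cache, 42)   # miss: goes to the database
second = get_user(user_cache, 42)  # hit: served from the cache
print(db_calls)
```

With a real server, the `get` and `set` calls would go over the network through a client library, but the shape of the application code stays about the same.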
When to use Memcached
Because Redis is newer and has more features than Memcached, Redis is almost always the better choice. However, Memcached could be preferable when caching relatively small and static data, such as HTML code fragments. Memcached's internal memory management, while not as sophisticated as that of Redis, is more efficient in the simplest use cases because it consumes comparatively fewer memory resources for metadata. Strings (the only data type supported by Memcached) are ideal for storing data that's only read, because strings require no further processing.
That said, Memcached's memory management efficiency diminishes quickly when data size is dynamic, at which point Memcached's memory can become fragmented. Also, large data sets often involve serialized data, which always requires more space to store. While Memcached is effectively limited to storing data in its serialized form, the data structures in Redis can store any aspect of the data natively, thus reducing serialization overhead.
The second scenario in which Memcached has an advantage over Redis is in scaling. Because Memcached is multithreaded, you can easily scale up by giving it more computational resources, but you will lose part or all of the cached data (depending on whether you use consistent hashing). Redis, which is mostly single-threaded, can scale horizontally via clustering without loss of data. Clustering is an effective scaling solution, but it is comparatively more complex to set up and operate.
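The remark about consistent hashing can be illustrated with a small sketch. It compares how many keys change owner when a fourth cache node is added, first with naive modulo placement and then with a hash ring. The node names, the `HashRing` class, and the choice of MD5 with 100 virtual points per node are assumptions made for this example; real client libraries use their own schemes.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable hash of a string (MD5 is used only for its even distribution)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def modulo_node(key, nodes):
    """Naive placement: hash modulo the number of nodes."""
    return nodes[h(key) % len(nodes)]


class HashRing:
    """Minimal consistent-hash ring with virtual points per node."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, h(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"key-{i}" for i in range(2000)]
old_nodes = ["cache-a", "cache-b", "cache-c"]
new_nodes = old_nodes + ["cache-d"]

moved_modulo = sum(
    modulo_node(k, old_nodes) != modulo_node(k, new_nodes) for k in keys
)

old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
moved_ring = sum(old_ring.node_for(k) != new_ring.node_for(k) for k in keys)

print(f"modulo placement: {moved_modulo} of {len(keys)} keys moved")
print(f"hash ring:        {moved_ring} of {len(keys)} keys moved")
```

Every key that moves is effectively a cache miss after the resize, which is why the placement scheme decides whether you lose part of the cached data or nearly all of it.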
When to use Redis
You'll almost always want to use Redis because of its data structures. With Redis as a cache, you gain a lot of power (such as the ability to fine-tune cache contents and durability) and greater efficiency overall. Once you use the data structures, the efficiency boost becomes tremendous for specific application scenarios.
Redis' superiority is evident in almost every aspect of cache management. Caches employ a mechanism called data eviction to make room for new data by deleting old data from memory. Memcached's data eviction mechanism employs a Least Recently Used algorithm and somewhat arbitrarily evicts data that's similar in size to the new data.
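For readers who have not met the Least Recently Used idea before, the following is a textbook-style sketch of it. It is not Memcached's actual implementation (it ignores item sizes and memory accounting entirely); the `LRUCache` class and its capacity-in-items limit are assumptions chosen to keep the example short.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest access first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a read counts as a "use"
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used key

    def keys(self):
        return list(self._items)


lru = LRUCache(capacity=2)
lru.set("a", 1)
lru.set("b", 2)
lru.get("a")     # "a" is now more recently used than "b"
lru.set("c", 3)  # over capacity: "b" is evicted
print(lru.keys())  # ['a', 'c']
```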
Redis, by contrast, allows for fine-grained control over eviction, letting you choose from six different eviction policies. Redis also employs more sophisticated approaches to memory management and eviction candidate selection. Redis supports both lazy eviction, in which data is evicted only when more space is needed, and active eviction, in which data is evicted proactively. Memcached, on the other hand, provides lazy eviction only.
Redis gives you much greater flexibility regarding the objects you can cache. While Memcached limits key names to 250 bytes and works with plain strings only, Redis allows key names and values to be as large as 512MB each, and they are binary safe. Plus, Redis has five primary data structures to choose from, opening up a world of possibilities to the application developer through intelligent caching and manipulation of cached data.
Beyond caching
Using Redis data structures can simplify and optimize several tasks -- not only while caching, but even when you want the data to be persistent and always available. For example, instead of storing objects as serialized strings, developers can use a Redis Hash to store an object's fields and values, and manage them using a single key. Redis Hash saves developers the need to fetch the entire string, deserialize it, update a value, reserialize the object, and replace the entire string in the cache with its new value for every trivial update -- that means lower resource consumption and increased performance.
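The difference between the two update paths can be sketched without a server. In the example below, plain Python dictionaries stand in for the data store, and the returned character counts are only a rough proxy for what would cross the network. The key name `user:1` and both helper functions are hypothetical; in Redis itself the second path would correspond to a single field-level command such as HINCRBY.

```python
import json

# Stand-ins for the server side: one store of opaque strings, one of field maps.
string_store = {}
hash_store = {}

user = {"name": "Ada", "email": "ada@example.com", "visits": "1"}

# Approach 1: the whole object lives under one key as a serialized string.
string_store["user:1"] = json.dumps(user)


def bump_visits_serialized(key):
    """Fetch, deserialize, update, reserialize, and replace the whole value."""
    raw = string_store[key]
    obj = json.loads(raw)
    obj["visits"] = str(int(obj["visits"]) + 1)
    new_raw = json.dumps(obj)
    string_store[key] = new_raw
    return len(raw) + len(new_raw)  # characters moved in both directions


# Approach 2: the object's fields live in a hash under one key.
hash_store["user:1"] = dict(user)


def bump_visits_hash(key):
    """Update a single field in place, as a hash field command would."""
    fields = hash_store[key]
    fields["visits"] = str(int(fields["visits"]) + 1)
    return len("visits") + len(fields["visits"])  # only field name and value


moved_serialized = bump_visits_serialized("user:1")
moved_hash = bump_visits_hash("user:1")
print(moved_serialized, moved_hash)
```

Both stores end up holding the same object; the hash path simply touches one field instead of round-tripping the entire value.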
Other data structures offered by Redis (such as lists, sets, sorted sets, hyperloglogs, bitmaps, and geospatial indexes) can be used to implement even more complex scenarios. Using sorted sets for time-series data ingestion and analysis is another example of a Redis data structure enormously reducing complexity and lowering bandwidth consumption.
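As a rough picture of why a sorted set suits time-series data, the sketch below keeps members ordered by a numeric score (here, a timestamp) and answers range queries by score. `SortedSetSketch` is a simplified stand-in written for this example: unlike a real Redis sorted set, it does not enforce unique members, and its two methods only loosely mirror what ZADD and ZRANGEBYSCORE do.

```python
import bisect


class SortedSetSketch:
    """Keeps members ordered by numeric score, loosely like a sorted set."""

    def __init__(self):
        self._scores = []   # kept sorted ascending
        self._members = []  # parallel to _scores

    def add(self, score, member):
        """Insert a member at the position its score dictates."""
        idx = bisect.bisect_right(self._scores, score)
        self._scores.insert(idx, score)
        self._members.insert(idx, member)

    def range_by_score(self, lo, hi):
        """Return members whose score is within [lo, hi], in score order."""
        start = bisect.bisect_left(self._scores, lo)
        stop = bisect.bisect_right(self._scores, hi)
        return self._members[start:stop]


readings = SortedSetSketch()
readings.add(1000, "1000:cpu=0.41")
readings.add(1060, "1060:cpu=0.57")
readings.add(940, "940:cpu=0.38")    # late arrival still lands in order
readings.add(1120, "1120:cpu=0.49")

window = readings.range_by_score(1000, 1100)
print(window)  # ['1000:cpu=0.41', '1060:cpu=0.57']
```

Because the store returns only the requested window, the client never has to pull the full series and filter it locally, which is where the bandwidth saving comes from.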
Another important advantage of Redis is that the data it stores isn't opaque, so the server can manipulate it directly. A considerable share of the 180-plus commands available in Redis are devoted to data processing operations and embedding logic in the data store itself via server-side Lua scripting. These built-in commands and user scripts give you the flexibility of handling data processing tasks directly in Redis without having to ship data across the network to another system for processing.
Redis offers optional and tunable data persistence designed to bootstrap the cache after a planned shutdown or an unplanned failure. While we tend to regard the data in caches as volatile and transient, persisting data to disk can be quite valuable in caching scenarios. Having the cache's data available for loading immediately after restart allows for much shorter cache warm-up and removes the load involved in repopulating and recalculating cache contents from the primary data store.
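The warm-up benefit can be pictured with a toy snapshot-and-reload cycle. This is only a sketch of the idea: the JSON file, the `save_snapshot` and `load_snapshot` helpers, and the file name are all assumptions for the example and have nothing to do with the on-disk formats Redis actually uses.

```python
import json
import os
import tempfile


def save_snapshot(cache, path):
    """Write the cache to disk, then atomically swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    os.replace(tmp_path, path)


def load_snapshot(path):
    """Return the saved cache, or an empty one if no snapshot exists."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "cache.snapshot.json")

    cold = load_snapshot(path)  # first start: nothing to load, cache is cold

    warm_cache = {"user:1": "Ada", "page:/home": "<html>...</html>"}
    save_snapshot(warm_cache, path)  # e.g. before a planned shutdown

    restored = load_snapshot(path)   # after restart: contents are back

print(len(cold), len(restored))
```

A cache restored this way can serve hits immediately, instead of sending every early request to the primary data store while it refills.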
Data replication too
Redis can also replicate the data that it manages. Replication can be used for implementing a highly available cache setup that can withstand failures and provide uninterrupted service to the application. A cache failure falls only slightly short of application failure in terms of the impact on user experience and application performance, so having a proven solution that guarantees the cache's contents and service availability is a major advantage in most cases.
Last but not least, in terms of operational visibility, Redis provides a slew of metrics and a wealth of introspective commands with which to monitor and track usage and abnormal behavior. Real-time statistics about every aspect of the database, the display of all commands being executed, the listing and managing of client connections -- Redis has all that and more.
When developers realize the effectiveness of Redis’ persistence and in-memory replication capabilities, they often use it as a first-responder database, usually to analyze and process high-velocity data and provide responses to the user while a secondary (often slower) database maintains a historical record of what happened. When used in this manner, Redis can also be ideal for analytics use cases.
Redis for analytics
Three analytics scenarios come immediately to mind. In the first scenario, when using something like Apache Spark to iteratively process large data sets, you can use Redis as a serving layer for data previously calculated by Spark. In the second scenario, using Redis as your shared, in-memory, distributed data store can accelerate Spark processing speeds by a factor of 45 to 100. Finally, an all too common scenario is one in which reports and analytics need to be customizable by the user, but retrieving data from inherently batch data stores (like Hadoop or an RDBMS) takes too long. In this case, an in-memory data structure store such as Redis is the only practical way of getting submillisecond paging and response times.
When using extremely large operational data sets or analytics workloads, running everything in-memory might not be cost effective. To achieve submillisecond performance at lower cost, Redis Labs created a version of Redis that runs on a combination of RAM and flash, with the option to configure RAM-to-flash ratios. While this opens up several new avenues to accelerate workload processing, it also gives developers the option to simply run their “cache on flash.”
Open source software continues to provide some of the best technologies available today. When it comes to boosting application performance through caching, Redis and Memcached are the most established and production-proven candidates. However, given its richer functionality, more advanced design, many potential uses, and greater cost efficiency at scale, Redis should be your first choice in nearly every case.