DESIGN

Original article: https://nsq.io/overview/design.html

NOTE: for accompanying visual illustration see this slide deck.

NSQ is a successor to simplequeue (part of simplehttp) and as such is designed to (in no particular order):

  • support topologies that enable high availability and eliminate SPOFs (single points of failure)
  • address the need for stronger message delivery guarantees
  • bound the memory footprint of a single process (by persisting some messages to disk)
  • greatly simplify configuration requirements for producers and consumers
  • provide a straightforward upgrade path
  • improve efficiency

Simplifying Configuration and Administration

A single nsqd instance is designed to handle multiple streams of data at once. Streams are called “topics” and a topic has one or more “channels”. Each channel receives a copy of all the messages for a topic. In practice, a channel maps to a downstream service consuming a topic.

Topics and channels are not configured a priori. Topics are created on first use by publishing to the named topic or by subscribing to a channel on the named topic. Channels are created on first use by subscribing to the named channel.

Topics and channels all buffer data independently of each other, preventing a slow consumer from causing a backlog for other channels (the same applies at the topic level).

A channel can, and generally does, have multiple clients connected. Assuming all connected clients are in a state where they are ready to receive messages, each message will be delivered to a random client. For example:

[Figure: nsqd clients]

To summarize, messages are multicast from topic -> channel (every channel receives a copy of all messages for that topic) but evenly distributed from channel -> consumers (each consumer receives a portion of the messages for that channel).
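The fan-out described above can be modeled in a few lines. This is a toy sketch, not nsqd's implementation: the `Topic` and `Channel` classes, the `delivered` and `backlog` attributes, and the topic, channel, and consumer names are all hypothetical and exist only to show multicast from topic to channels followed by random delivery within a channel.

```python
import random
from collections import defaultdict


class Channel:
    """One independent queue per downstream service (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.consumers = []                  # names of connected, ready clients
        self.delivered = defaultdict(list)   # consumer name -> messages received
        self.backlog = []                    # messages waiting for a consumer

    def put(self, message):
        if not self.consumers:
            # Buffered here only; other channels are unaffected.
            self.backlog.append(message)
            return
        consumer = random.choice(self.consumers)   # one random ready client
        self.delivered[consumer].append(message)


class Topic:
    def __init__(self, name):
        self.name = name
        self.channels = {}

    def channel(self, name):
        # Channels are created on first use.
        if name not in self.channels:
            self.channels[name] = Channel(name)
        return self.channels[name]

    def publish(self, message):
        # Multicast: every channel receives its own copy of the message.
        for ch in self.channels.values():
            ch.put(message)


clicks = Topic("clicks")
clicks.channel("metrics").consumers += ["metrics-1", "metrics-2", "metrics-3"]
clicks.channel("archive").consumers += ["archive-1"]
for i in range(9):
    clicks.publish(f"msg-{i}")

for ch in clicks.channels.values():
    print(ch.name, {name: len(msgs) for name, msgs in ch.delivered.items()})
```

Both channels end up with all nine messages, while within the "metrics" channel the nine messages are split across its three consumers. Random selection only approximates an even split over a small number of messages.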

NSQ also includes a helper application, nsqlookupd, which provides a directory service where consumers can look up the addresses of nsqd instances that provide the topics they are interested in subscribing to. In terms of configuration, this decouples the consumers from the producers (they both individually only need to know where to contact common instances of nsqlookupd, never each other), reducing complexity and maintenance.

At a lower level each nsqd has a long-lived TCP connection to nsqlookupd over which it periodically pushes its state. This data is used to inform which nsqd addresses nsqlookupd will give to consumers. For consumers, an HTTP /lookup endpoint is exposed for polling.
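As a rough illustration of the polling side, the sketch below builds a /lookup URL and extracts nsqd addresses from a response body. The article does not describe the response format, so the JSON shape (a `producers` list with `broadcast_address` and `tcp_port` fields, optionally wrapped in a `data` envelope by older releases), the port 4161, and the sample addresses are assumptions to verify against your nsqlookupd version. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode


def lookup_url(lookupd_http_address, topic):
    """Build the polling URL for one nsqlookupd instance."""
    query = urlencode({"topic": topic})
    return f"http://{lookupd_http_address}/lookup?{query}"


def producer_addresses(body):
    """Extract (host, tcp_port) pairs from a /lookup response body."""
    payload = json.loads(body)
    # Assumption: older releases wrap the payload in a "data" envelope.
    payload = payload.get("data", payload)
    return [
        (p["broadcast_address"], p["tcp_port"])
        for p in payload.get("producers", [])
    ]


# Illustrative response only; not captured from a real server.
SAMPLE = """
{"channels": ["metrics"],
 "producers": [
   {"broadcast_address": "10.0.0.5", "tcp_port": 4150, "http_port": 4151},
   {"broadcast_address": "10.0.0.6", "tcp_port": 4150, "http_port": 4151}
 ]}
"""

print(lookup_url("127.0.0.1:4161", "clicks"))
print(producer_addresses(SAMPLE))
```

A consumer would fetch that URL periodically and connect over TCP to each address returned.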

To introduce a new distinct consumer of a topic, simply start up an NSQ client configured with the addresses of your nsqlookupd instances. There are no configuration changes needed to add either new consumers or new publishers, greatly reducing overhead and complexity.

NOTE: in future versions, the heuristic nsqlookupd uses to return addresses could be based on depth, number of connected clients, or other “intelligent” strategies. The current implementation simply returns all of them. Ultimately, the goal is to ensure that all producers are being read from such that depth stays near zero.

It is important to note that the nsqd and nsqlookupd daemons are designed to operate independently, without communication or coordination between siblings.

We also think that it’s really important to have a way to view, introspect, and manage the cluster in aggregate. We built nsqadmin to do this. It provides a web UI to browse the hierarchy of topics/channels/consumers and inspect depth and other key statistics for each layer. Additionally, it supports a few administrative commands such as removing and emptying a channel (which is a useful tool when messages in a channel can be safely thrown away in order to bring depth back to 0).

[Figure: nsqadmin]

Straightforward Upgrade Path

This was one of our highest priorities. Our production systems handle a large volume of traffic, all built upon our existing messaging tools, so we needed a way to slowly and methodically upgrade specific parts of our infrastructure with little to no impact.

First, on the message producer side we built nsqd to match simplequeue. Specifically, nsqd exposes an HTTP /put endpoint, just like simplequeue, to POST binary data (with the one caveat that the endpoint takes an additional query parameter specifying the “topic”). Services that wanted to switch to publishing to nsqd only had to make minor code changes.
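To make the "minor code change" concrete, here is a sketch that builds (without sending) such a POST using only the standard library. The address `127.0.0.1:4151`, the topic name, and the helper name `build_put_request` are assumptions for illustration. The path `/put` follows this article; later nsqd releases name the HTTP publish endpoint `/pub`, so check the version you run.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_put_request(nsqd_http_address, topic, body):
    """Build (but do not send) a POST of raw bytes to nsqd's /put endpoint."""
    query = urlencode({"topic": topic})   # the extra "topic" query parameter
    return Request(
        f"http://{nsqd_http_address}/put?{query}",
        data=body,
        method="POST",
    )


request = build_put_request("127.0.0.1:4151", "clicks", b"\x00\x01binary payload")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=5)
```

Compared with a simplequeue-style publisher, the only difference in this sketch is the `topic` parameter in the query string.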

Second, we built libraries in both Python and Go that matched the functionality and idioms we had been accustomed to in our existing libraries. This eased the transition on the message consumer side by limiting the code changes to bootstrapping. All business logic remained the same.

Finally, we built utilities to glue old and new components together. These are all available in the examples directory in the repository:

  • nsq_pubsub - expose a pubsub-like HTTP interface to topics in an NSQ cluster
  • nsq_to_file - durably write all messages for a given topic to a file
  • nsq_to_http - perform HTTP requests for all messages in a topic to (multiple) endpoints

Eliminating SPOFs

NSQ is designed to be used in a distributed fashion. nsqd clients are connected (over TCP) to all instances providing the specified topic. There are no middle-men, no message brokers, and no SPOFs:

[Figure: nsq clients]

This topology eliminates the need to chain single, aggregated feeds. Instead you consume directly from all producers. Technically, it doesn’t matter which client connects to which NSQ; as long as there are enough clients connected to all producers to satisfy the volume of messages, you’re guaranteed that all will eventually be processed.

For nsqlookupd, high availability is achieved by running multiple instances. They don’t communicate directly with each other and data is considered eventually consistent. Consumers poll all of their configured nsqlookupd instances and union the responses. Stale, inaccessible, or otherwise faulty nodes don’t grind the system to a halt.
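The poll-and-union behaviour can be sketched as follows. This is an illustration under assumptions, not a client library's code: `discover` and `fake_fetch` are hypothetical names, `fetch` stands in for whatever performs the HTTP request, and the addresses are invented. A failing instance is skipped so that the remaining instances still contribute.

```python
def discover(lookupd_addresses, topic, fetch):
    """Union the producers reported by every reachable nsqlookupd."""
    producers = set()
    for address in lookupd_addresses:
        try:
            producers.update(fetch(address, topic))
        except OSError:
            # A stale or unreachable node must not stop discovery.
            continue
    return producers


def fake_fetch(address, topic):
    """Stand-in for an HTTP call; the third instance is treated as down."""
    directory = {
        "lookupd-1:4161": [("10.0.0.5", 4150)],
        "lookupd-2:4161": [("10.0.0.5", 4150), ("10.0.0.6", 4150)],
    }
    if address not in directory:
        raise ConnectionRefusedError(address)
    return directory[address]


found = discover(
    ["lookupd-1:4161", "lookupd-2:4161", "lookupd-3:4161"], "clicks", fake_fetch
)
print(sorted(found))
```

Because the result is a set, an nsqd reported by several nsqlookupd instances appears once, and an instance that has not yet heard about a new nsqd simply contributes less to the union.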

Message Delivery Guarantees

NSQ guarantees that a message will be delivered at least once, though duplicate messages are possible. Consumers should expect this and de-dupe or perform idempotent operations.

This guarantee is enforced as part of the protocol and works as follows (assume the client has successfully connected and subscribed to a topic):

  1. client indicates they are ready to receive messages
  2. NSQ sends a message and temporarily stores the data locally (in the event of re-queue or timeout)
  3. client replies FIN (finish) or REQ (re-queue) indicating success or failure respectively. If client does not reply NSQ will time out after a configurable duration and automatically re-queue the message
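The three steps above can be sketched as a small state machine. This is a toy model and not nsqd's code: the class `InFlightChannel`, its method names, and the use of plain numbers for time are assumptions chosen to keep the example deterministic.

```python
from collections import deque


class InFlightChannel:
    """Toy model of the send / FIN / REQ / timeout cycle."""

    def __init__(self, msg_timeout):
        self.msg_timeout = msg_timeout
        self.queue = deque()    # messages waiting to be sent
        self.in_flight = {}     # message id -> (body, deadline)
        self.attempts = {}      # message id -> number of deliveries

    def publish(self, msg_id, body):
        self.queue.append((msg_id, body))

    def send(self, now):
        """Step 2: hand out a message but keep a local copy until FIN."""
        if not self.queue:
            return None
        msg_id, body = self.queue.popleft()
        self.in_flight[msg_id] = (body, now + self.msg_timeout)
        self.attempts[msg_id] = self.attempts.get(msg_id, 0) + 1
        return msg_id, body

    def fin(self, msg_id):
        """Step 3, success: the local copy can be discarded."""
        self.in_flight.pop(msg_id)

    def req(self, msg_id):
        """Step 3, failure: put the message back for another attempt."""
        body, _deadline = self.in_flight.pop(msg_id)
        self.queue.append((msg_id, body))

    def expire(self, now):
        """No reply before the deadline: re-queue automatically."""
        overdue = [m for m, (_b, deadline) in self.in_flight.items() if deadline <= now]
        for msg_id in overdue:
            self.req(msg_id)


ch = InFlightChannel(msg_timeout=60)
ch.publish("a", b"first")
ch.publish("b", b"second")
ch.send(now=0)        # "a" goes out
ch.send(now=0)        # "b" goes out
ch.fin("a")           # the client finished "a"
ch.expire(now=61)     # the client never answered for "b", so it is re-queued
redelivered = ch.send(now=61)
print(redelivered, ch.attempts["b"])
```

Message "b" is delivered twice in this run, which is exactly the duplicate delivery that consumers are asked to tolerate.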

This ensures that the only edge case that would result in message loss is an unclean shutdown of an nsqd process. In that case, any messages that were in memory (or any buffered writes not flushed to disk) would be lost.

If preventing message loss is of the utmost importance, even this edge case can be mitigated. One solution is to stand up redundant nsqd pairs (on separate hosts) that receive copies of the same portion of messages. Because you’ve written your consumers to be idempotent, doing double-time on these messages has no downstream impact and allows the system to endure any single node failure without losing messages.
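One way to get the idempotence this relies on is to key processing on an identifier carried in the message itself. The sketch below assumes each event has an application-level `event_id` that is the same in both copies; that identifier, the class `IdempotentHandler`, and the sample events are hypothetical and not part of NSQ.

```python
class IdempotentHandler:
    """Apply each logical event at most once, however often it is delivered."""

    def __init__(self):
        self.seen = set()   # application-level event ids already applied
        self.total = 0

    def handle(self, event_id, amount):
        if event_id in self.seen:
            return False    # duplicate delivery: acknowledge and ignore
        self.seen.add(event_id)
        self.total += amount
        return True


# The same events arrive once from each nsqd of a redundant pair.
from_nsqd_a = [("evt-1", 10), ("evt-2", 5)]
from_nsqd_b = [("evt-1", 10), ("evt-2", 5)]

handler = IdempotentHandler()
applied = [handler.handle(eid, amt) for eid, amt in from_nsqd_a + from_nsqd_b]
print(applied, handler.total)
```

The in-memory `seen` set grows without bound and is lost on restart, so a real consumer would more likely rely on a store with expiry or on operations that are naturally idempotent.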

The takeaway is that NSQ provides the building blocks to support a variety of production use cases and configurable degrees of durability.

Bounded Memory Footprint

nsqd provides a configuration option --mem-queue-size that will determine the number of messages that are kept in memory for a given queue. If the depth of a queue exceeds this threshold, messages are transparently written to disk. This bounds the memory footprint of a given nsqd process to mem-queue-size * #_of_channels_and_topics:

[Figure: message overflow]
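As a worked example of that bound, the snippet below counts the queues in a hypothetical layout and multiplies by the in-memory limit. The function name, the topic and channel names, and the value 10,000 are illustrative assumptions.

```python
def max_messages_in_memory(mem_queue_size, channels_by_topic):
    """Upper bound on queued messages held in RAM by one nsqd process."""
    topics = len(channels_by_topic)
    channels = sum(len(chs) for chs in channels_by_topic.values())
    # Every topic and every channel is its own queue with its own limit.
    return mem_queue_size * (topics + channels)


layout = {
    "clicks": ["metrics", "archive"],
    "signups": ["email"],
}
bound = max_messages_in_memory(10_000, layout)
print(bound)  # 2 topics + 3 channels = 5 queues, so 50000 messages
```

The bound is expressed in messages; multiplying by a typical message size gives a rough figure in bytes.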

Also, an astute observer might have identified that this is a convenient way to gain an even higher guarantee of delivery by setting this value to something low (like 1 or even 0). The disk-backed queue is designed to survive unclean restarts (although messages might be delivered twice).

Also, related to message delivery guarantees, clean shutdowns (by sending an nsqd process the TERM signal) safely persist the messages currently in memory, in-flight, deferred, and in various internal buffers.

Note, a topic/channel whose name ends in the string #ephemeral will not be buffered to disk and will instead drop messages after passing the mem-queue-size. This enables consumers which do not need message guarantees to subscribe to a channel. An ephemeral channel also disappears after its last client disconnects. For an ephemeral topic, this implies that at least one channel has been created, consumed, and deleted (typically an ephemeral channel).
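The difference between a durable and an ephemeral queue once the in-memory limit is reached can be sketched like this. It is a simplified model: the class `ModelQueue`, the plain list standing in for the disk-backed queue, and the channel names are assumptions, and only the `#ephemeral` suffix rule comes from the text above.

```python
EPHEMERAL_SUFFIX = "#ephemeral"


class ModelQueue:
    """Toy queue showing overflow to disk versus dropping."""

    def __init__(self, name, mem_queue_size):
        self.name = name
        self.mem_queue_size = mem_queue_size
        self.memory = []
        self.disk = []      # stand-in for the disk-backed queue
        self.dropped = 0

    @property
    def ephemeral(self):
        return self.name.endswith(EPHEMERAL_SUFFIX)

    def put(self, message):
        if len(self.memory) < self.mem_queue_size:
            self.memory.append(message)
        elif self.ephemeral:
            self.dropped += 1            # no disk buffering: the message is lost
        else:
            self.disk.append(message)    # transparent overflow to disk


durable = ModelQueue("archive", mem_queue_size=2)
sampling = ModelQueue("debug-tail#ephemeral", mem_queue_size=2)
for i in range(5):
    durable.put(i)
    sampling.put(i)

print(len(durable.memory), len(durable.disk), durable.dropped)
print(len(sampling.memory), len(sampling.disk), sampling.dropped)
```

With five messages and a limit of two, the durable queue keeps everything (two in memory, three on disk) while the ephemeral one keeps two and drops three.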

Efficiency

NSQ was designed to communicate over a “memcached-like” command protocol with simple size-prefixed responses.
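A size-prefixed framing scheme of this kind can be sketched as below. The article only says that responses are size-prefixed; the specific layout used here (a 4-byte big-endian size, a 4-byte frame type, then the payload, with the size covering the frame type and payload) and the frame type constants are assumptions based on the published NSQ TCP protocol specification and should be checked against it. The function names are hypothetical.

```python
import struct

FRAME_TYPE_RESPONSE = 0
FRAME_TYPE_ERROR = 1
FRAME_TYPE_MESSAGE = 2


def encode_frame(frame_type, data):
    # The size field counts the 4-byte frame type plus the payload.
    return struct.pack(">ii", len(data) + 4, frame_type) + data


def decode_frames(buffer):
    """Split a byte buffer into (frame_type, data) pairs plus leftover bytes."""
    frames = []
    offset = 0
    while len(buffer) - offset >= 4:
        (size,) = struct.unpack_from(">i", buffer, offset)
        if size < 4:
            raise ValueError("frame too small to hold a frame type")
        if len(buffer) - offset - 4 < size:
            break   # the rest of this frame has not arrived yet
        (frame_type,) = struct.unpack_from(">i", buffer, offset + 4)
        data = buffer[offset + 8 : offset + 4 + size]
        frames.append((frame_type, data))
        offset += 4 + size
    return frames, buffer[offset:]


partial = encode_frame(FRAME_TYPE_RESPONSE, b"_heartbeat_")[:6]
stream = (
    encode_frame(FRAME_TYPE_RESPONSE, b"OK")
    + encode_frame(FRAME_TYPE_MESSAGE, b"payload")
    + partial
)
frames, rest = decode_frames(stream)
print(frames, rest == partial)
```

The size prefix lets a reader know how many bytes to wait for before handing a frame on, which is why the truncated third frame is returned as leftover bytes instead of being parsed.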

All message data is kept in the core including metadata like number of attempts, timestamps, etc. This eliminates the copying of data back and forth from server to client, an inherent property of the previous toolchain when re-queueing a message. This also simplifies clients as they no longer need to be responsible for maintaining message state.

Also, by reducing configuration complexity, setup and development time is greatly reduced (especially in cases where a topic has more than one consumer).

For the data protocol, we made a key design decision that maximizes performance and throughput by pushing data to the client instead of waiting for it to pull. This concept, which we call RDY state, is essentially a form of client-side flow control.

When a client connects to nsqd and subscribes to a channel it is placed in a RDY state of 0. This means that no messages will be sent to the client. When a client is ready to receive messages it sends a command that updates its RDY state to some number it is prepared to handle, say 100. Without any additional commands, 100 messages will be pushed to the client as they are available (each time decrementing the server-side RDY count for that client).

Client libraries are designed to send a command to update RDY count when it reaches ~25% of the configurable max-in-flight setting (and properly account for connections to multiple nsqd instances, dividing appropriately).

[Figure: nsq protocol]
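The RDY mechanism can be simulated with a counter per connection. This is a sketch under assumptions and not a client library's logic: the class `Connection`, the helpers `per_connection_rdy` and `maybe_refill`, and the numbers 200 and 2 are hypothetical; only the starting value of 0, the per-message decrement, the ~25% refill point, and the division across connections come from the text.

```python
class Connection:
    """Server-side view of one subscribed client (toy model)."""

    def __init__(self):
        self.rdy = 0          # a new subscription starts at RDY 0
        self.received = []

    def set_rdy(self, count):
        """Models the client's command that updates its RDY state."""
        self.rdy = count

    def push(self, message):
        """Deliver one message if the client has credit left."""
        if self.rdy == 0:
            return False
        self.rdy -= 1         # each push decrements the server-side count
        self.received.append(message)
        return True


def per_connection_rdy(max_in_flight, connection_count):
    """Divide the max-in-flight budget across nsqd connections."""
    return max(1, max_in_flight // connection_count)


def maybe_refill(conn, rdy_target):
    """Client-side rule: top up once the remaining count is ~25% of target."""
    if conn.rdy <= rdy_target * 0.25:
        conn.set_rdy(rdy_target)
        return True
    return False


conn = Connection()
pushed_before_rdy = conn.push("too early")   # RDY is 0, nothing is sent
target = per_connection_rdy(max_in_flight=200, connection_count=2)
conn.set_rdy(target)

refills = 0
for i in range(250):
    conn.push(f"msg-{i}")
    refills += maybe_refill(conn, target)

print(pushed_before_rdy, target, len(conn.received), refills, conn.rdy)
```

Refilling before the count reaches zero keeps messages flowing without a pause, while the cap on RDY limits how much unprocessed work the server can push at a client.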

This is a significant performance knob as some downstream systems are able to more easily batch process messages and benefit greatly from a higher max-in-flight.

Notably, because it is both buffered and push based with the ability to satisfy the need for independent copies of streams (channels), we’ve produced a daemon that behaves like simplequeue and pubsub combined. This is powerful in terms of simplifying the topology of our systems where we would have traditionally maintained the older toolchain discussed above.

Go

We made a strategic decision early on to build the NSQ core in Go. We recently blogged about our use of Go at bitly (http://word.bitly.com/post/29550171827/go-go-gadget) and alluded to this very project - it might be helpful to browse through that post to get an understanding of our thinking with respect to the language.

Regarding NSQ, Go channels (not to be confused with NSQ channels) and the language’s built-in concurrency features are a perfect fit for the internal workings of nsqd. We leverage buffered channels to manage our in-memory message queues and seamlessly write overflow to disk.
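The buffered-channel pattern has a rough analogue in Python's standard library, shown here in Python to stay consistent with the other examples in this article. A bounded `queue.Queue` plays the role of the buffered Go channel, and a failed non-blocking put falls back to appending a length-prefixed record to a file. The class `OverflowQueue`, the record format, and the rule of reading memory before disk are assumptions made for this sketch; nsqd's actual disk queue differs.

```python
import queue
import struct
import tempfile


class OverflowQueue:
    """Bounded in-memory queue that spills to a file when full."""

    def __init__(self, mem_queue_size, backing_file):
        # Note: maxsize must be >= 1 here; queue.Queue treats 0 as unbounded.
        self.memory = queue.Queue(maxsize=mem_queue_size)
        self.backing_file = backing_file
        self.read_offset = 0

    def put(self, body):
        try:
            self.memory.put_nowait(body)   # like a non-blocking channel send
        except queue.Full:
            # Memory is full: append a length-prefixed record to disk instead.
            self.backing_file.seek(0, 2)
            self.backing_file.write(struct.pack(">I", len(body)) + body)

    def get(self):
        try:
            return self.memory.get_nowait()
        except queue.Empty:
            pass
        self.backing_file.seek(self.read_offset)
        header = self.backing_file.read(4)
        if len(header) < 4:
            return None                    # nothing buffered anywhere
        (size,) = struct.unpack(">I", header)
        body = self.backing_file.read(size)
        self.read_offset += 4 + size
        return body


with tempfile.TemporaryFile() as f:
    q = OverflowQueue(mem_queue_size=2, backing_file=f)
    for i in range(4):
        q.put(f"m{i}".encode())
    drained = [q.get() for _ in range(5)]

print(drained)
```

The producer never blocks in this model: a message either fits in memory or goes to the file, which is the "seamless overflow" idea in miniature.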

The standard library makes it easy to write the networking layer and client code. The built-in memory and CPU profiling hooks highlight opportunities for optimization and require very little effort to integrate. We also found it really easy to test components in isolation, mock types using interfaces, and iteratively build functionality.

Last modified: February 22, 2021, 10:59 PM