Learning Mesos

Date: 2022-12-07 14:59:15


Introduction to Mesos

Mesos: use an entire data center as if it were a single computer (one resource pool)

We have presented Mesos, a thin management layer that allows diverse cluster computing frameworks to efficiently share resources. Mesos is built around two design elements: a fine-grained resource sharing model at the level of tasks within a job, and a decentralized scheduling mechanism called resource offers that lets applications choose which resources to use. Together, these elements let Mesos achieve high utilization, respond rapidly to workload changes, and cater to frameworks with diverse needs, while remaining simple and scalable.
 

What is Mesos:

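  Apache Mesos is an open-source cluster manager that abstracts CPU, memory, storage, and other compute resources, and supports fault-tolerant, elastic distributed systems. The Mesos kernel runs on every machine in the cluster and provides applications with APIs for resource management and scheduling.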

A distributed operating system kernel

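Mesos is built on the same principles as the Linux kernel; the only difference is the level of abstraction. The Mesos kernel runs on every machine and, through its APIs, gives applications resource management and scheduling across data centers and clouds. These applications include Hadoop, Spark, Kafka, and Elasticsearch. Mesos can also be combined with the Marathon framework to manage large-scale containerized applications such as Docker containers.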

Mesos Architecture:

[Figure: Mesos architecture diagram]

The figure above shows the main components of Mesos. Mesos consists of a master daemon that manages the slave daemons running on each cluster node, and Mesos applications (also called frameworks) that run tasks on these slaves.

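The master uses resource offers to share resources such as CPU, memory, disk, and network at a fine granularity across applications. It decides how many resources to offer each framework according to a configured policy, such as fair sharing or priority. To support a wider range of policies, the master has a modular design, so new allocation modules can easily be added as plugins.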
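
To make the idea of pluggable allocation policies concrete, here is a minimal sketch in Python. It is not the Mesos allocator API: the names `fair_share`, `strict_priority`, `POLICIES`, and `allocate` are hypothetical, only CPUs are modeled, and the "fair sharing" here is a plain even split chosen for illustration.

```python
from typing import Callable, Dict

# Hypothetical policy signature (an assumption, not a Mesos interface):
# (free CPUs, {framework name: priority}) -> {framework name: CPUs offered}
Policy = Callable[[int, Dict[str, int]], Dict[str, int]]


def fair_share(free_cpus: int, frameworks: Dict[str, int]) -> Dict[str, int]:
    """Split the free CPUs as evenly as possible; priorities are ignored."""
    names = sorted(frameworks)
    if not names:
        return {}
    base, extra = divmod(free_cpus, len(names))
    # Leftover CPUs go to the first frameworks in name order.
    return {n: base + (1 if i < extra else 0) for i, n in enumerate(names)}


def strict_priority(free_cpus: int, frameworks: Dict[str, int]) -> Dict[str, int]:
    """Offer all free CPUs to the framework with the highest priority value."""
    if not frameworks:
        return {}
    top = max(frameworks, key=frameworks.get)
    return {n: (free_cpus if n == top else 0) for n in frameworks}


# Registering a new policy is just adding an entry, which mimics the
# "add a module as a plugin" idea in a very reduced form.
POLICIES: Dict[str, Policy] = {"fair": fair_share, "priority": strict_priority}


def allocate(policy_name: str, free_cpus: int, frameworks: Dict[str, int]) -> Dict[str, int]:
    """Decide how many CPUs to offer each framework under the named policy."""
    return POLICIES[policy_name](free_cpus, frameworks)


if __name__ == "__main__":
    frameworks = {"fw1": 1, "fw2": 2}
    print(allocate("fair", 5, frameworks))
    print(allocate("priority", 5, frameworks))
```

The point of the sketch is the separation of concerns: the caller only names a policy, and the decision logic can be swapped without touching the rest of the master.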

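A framework running on Mesos has two parts: a scheduler, which registers with the master to obtain cluster resources, and an executor process, which runs on slave nodes and executes the framework's tasks. The master decides how many resources to offer each framework, and the framework's scheduler chooses which of the offered resources to use. When a framework accepts offered resources, it sends its tasks through the master to the slaves that provided those resources.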


Event flow:

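  1. Slave1 reports to the master that it has 4 CPUs and 4 GB of memory available.
  2. The master sends Framework1 a resource offer describing the resources available on Slave1.
  3. The scheduler in Framework1 replies to the master that it has two tasks to run on Slave1: one needs <2 CPUs, 1 GB RAM> and the other needs <1 CPU, 2 GB RAM>.
  4. Finally, the master sends these tasks to Slave1. Slave1 then still has 1 CPU and 1 GB of memory unused, so the allocation module can offer these resources to Framework2.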
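
The arithmetic in the four steps above can be checked with a small Python sketch. The `Resources` class and the `launch` function are hypothetical names made up for this illustration; they are not Mesos types.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resources:
    """A bundle of resources (illustrative type, not a Mesos class)."""
    cpus: int
    mem_gb: int

    def __sub__(self, other: "Resources") -> "Resources":
        return Resources(self.cpus - other.cpus, self.mem_gb - other.mem_gb)

    def fits_in(self, other: "Resources") -> bool:
        return self.cpus <= other.cpus and self.mem_gb <= other.mem_gb


def launch(offer: Resources, tasks: List[Resources]) -> Resources:
    """Deduct each task from the offer and return what is left over."""
    remaining = offer
    for task in tasks:
        if not task.fits_in(remaining):
            raise ValueError(f"task {task} does not fit in {remaining}")
        remaining = remaining - task
    return remaining


# Step 1: Slave1 reports 4 CPUs and 4 GB of free memory.
slave1_free = Resources(cpus=4, mem_gb=4)
# Step 2: the master offers all of it to Framework1.
offer = slave1_free
# Step 3: Framework1's scheduler replies with two tasks.
tasks = [Resources(cpus=2, mem_gb=1), Resources(cpus=1, mem_gb=2)]
# Step 4: the tasks are launched; the remainder can be offered to Framework2.
leftover = launch(offer, tasks)
print(leftover)  # Resources(cpus=1, mem_gb=1)
```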

Allocator: the allocator in the Mesos master periodically allocates resources to a framework.

Mesos two-level scheduling

Mesos introduces a distributed two-level scheduling mechanism called resource offers.

Mesos decides how many resources to offer each framework, while frameworks decide which resources to accept and which computations to run on them.

One approach would be for Mesos to implement a centralized scheduler that takes as input framework requirements, resource availability, and organizational policies, and computes a global schedule for all tasks. While this approach can optimize scheduling across frameworks, it faces several challenges. The first is complexity. The scheduler would need to provide a sufficiently expressive API to capture all frameworks’ requirements, and to solve an on-line optimization problem for millions of tasks. Even if such a scheduler were feasible, this complexity would have a negative impact on its scalability and resilience. Second, as new frameworks and new scheduling policies for current frameworks are constantly being developed, it is not clear whether we are even at the point to have a full specification of framework requirements. Third, many existing frameworks implement their own sophisticated scheduling, and moving this functionality to a global scheduler would require expensive refactoring.

Instead, Mesos takes a different approach: delegating control over scheduling to the frameworks. This is accomplished through a new abstraction, called a resource offer, which encapsulates a bundle of resources that a framework can allocate on a cluster node to run tasks. Mesos decides how many resources to offer each framework, based on an organizational policy such as fair sharing, while frameworks decide which resources to accept and which tasks to run on them. While this decentralized scheduling model may not always lead to globally optimal scheduling, we have found that it performs surprisingly well in practice, allowing frameworks to meet goals such as data locality nearly perfectly. In addition, resource offers are simple and efficient to implement, allowing Mesos to be highly scalable and robust to failures.
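
The framework's half of this two-level scheme can be sketched as a filter over incoming offers. The Python below is a simplified illustration, not framework code from Mesos: `choose_offers` is a hypothetical function, offers are reduced to `(host, cpus)` pairs, and an accepted offer is taken whole.

```python
from typing import List, Set, Tuple

Offer = Tuple[str, int]  # (host name, CPUs offered); a simplified stand-in


def choose_offers(
    offers: List[Offer], preferred_hosts: Set[str], cpus_needed: int
) -> Tuple[List[Offer], List[Offer]]:
    """Accept offers on hosts that hold the input data until the demand is met.

    Everything else is declined, so the master can offer those resources
    to other frameworks.
    """
    accepted: List[Offer] = []
    declined: List[Offer] = []
    for host, cpus in offers:
        if cpus_needed > 0 and host in preferred_hosts:
            accepted.append((host, cpus))
            cpus_needed -= cpus
        else:
            declined.append((host, cpus))
    return accepted, declined


if __name__ == "__main__":
    offers = [("node1", 2), ("node2", 4), ("node3", 2)]
    # Assumed for the example: the job's input data lives on node2 and node3.
    accepted, declined = choose_offers(offers, {"node2", "node3"}, cpus_needed=4)
    print("accepted:", accepted)
    print("declined:", declined)
```

Note that the master never needs to know why an offer was declined. Placement preferences such as data locality stay inside the framework, which is what keeps the master simple.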

Mesos’s flexible fine-grained sharing model also has other advantages. First, even organizations that only use one framework can use Mesos to run multiple instances of that framework in the same cluster, or multiple versions of the framework. Our contacts at Yahoo! and Facebook indicate that this would be a compelling way to isolate production and experimental Hadoop workloads and to roll out new versions of Hadoop.
Second, by providing a means of sharing resources across frameworks, Mesos allows framework developers to build specialized frameworks targeted at particular problem domains rather than one-size-fits-all abstractions. Frameworks can therefore evolve faster and provide better support for each problem domain.
 

Mesos high availability

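    Mesos achieves high availability by running multiple Mesos masters: one active master (called the leader or leading master) and several backup masters that take over if it goes down. The active leader is elected through Apache ZooKeeper, and the other nodes in the cluster are then notified, including the other masters, the slave nodes, and the schedulers (scheduler drivers).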

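Mesos implements two levels of ZooKeeper leader election abstractions, one in src/zookeeper and the other in src/master (see contender|detector.hpp|cpp).
The lower-level LeaderContender and LeaderDetector implement a generic ZooKeeper election algorithm, loosely following the ZooKeeper leader election recipe (without herd-effect handling, because the master group is small, typically 3).
The higher-level MasterContender and MasterDetector wrap the ZooKeeper contender and detector abstractions as adapters that provide and interpret the ZooKeeper data.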
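
The election idea behind the contender and detector roles can be illustrated without a ZooKeeper server. The Python sketch below is an in-memory simulation under stated assumptions: `ElectionGroup` is a hypothetical class, `join` stands in for a contender creating an ephemeral sequential node, `leader` stands in for what a detector would observe, and the lowest sequence number wins. It is not the Mesos C++ implementation and does not talk to ZooKeeper.

```python
import itertools
from typing import Dict, Optional


class ElectionGroup:
    """In-memory stand-in for a ZooKeeper election group (not a real client)."""

    def __init__(self) -> None:
        self._seq = itertools.count()
        self._members: Dict[int, str] = {}  # sequence number -> master address

    def join(self, address: str) -> int:
        """Contender side: register a candidate and return its sequence number."""
        seq = next(self._seq)
        self._members[seq] = address
        return seq

    def leave(self, seq: int) -> None:
        """Simulate a candidate's node vanishing when its session ends."""
        self._members.pop(seq, None)

    def leader(self) -> Optional[str]:
        """Detector side: the member with the lowest sequence number leads."""
        if not self._members:
            return None
        return self._members[min(self._members)]


if __name__ == "__main__":
    group = ElectionGroup()
    # Addresses are illustrative placeholders for a group of three masters.
    m1 = group.join("master1:5050")
    m2 = group.join("master2:5050")
    m3 = group.join("master3:5050")
    print(group.leader())  # master1:5050
    group.leave(m1)        # the leading master goes down
    print(group.leader())  # master2:5050
```

With only three masters, having every candidate re-check the full membership after a change is cheap, which matches the remark above that herd-effect handling is left out for small master groups.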