OpenStack Nova Analysis: The Nova Scheduler Scheduling Algorithm

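The previous article covered how the Nova Scheduler service starts up. Since Nova Scheduler's job is to decide where instances run, its core is the scheduling algorithm. This article analyzes that algorithm.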

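In the configuration file, the default scheduling driver class is FilterScheduler, defined in nova/nova/scheduler/filter_scheduler.py. The algorithm is simple in principle: first "filter" the hosts, then "weigh" the ones that remain.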

class FilterScheduler(driver.Scheduler):
    def schedule_run_instance(self, context, request_spec,
                              admin_password, injected_files,
                              requested_networks, is_first_time,
                              filter_properties):
        # Collect the parameters needed for scheduling
        payload = dict(request_spec=request_spec)
        # Emit a notification that scheduling has started
        notifier.notify(context, notifier.publisher_id("scheduler"),
                        'scheduler.run_instance.start', notifier.INFO,
                        payload)
        ...
        # Run the scheduling algorithm to get the list of weighed hosts
        weighted_hosts = self._schedule(context, "compute", request_spec,
                                        filter_properties, instance_uuids)
        ...
        # Assign a compute node to each instance
        for num, instance_uuid in enumerate(instance_uuids):
            ...
            try:
                try:
                    # Take the next host chosen by _schedule
                    weighted_host = weighted_hosts.pop(0)
                except IndexError:
                    raise exception.NoValidHost(reason="")
                # Create the instance on that compute node
                self._provision_resource(context, weighted_host,
                                         request_spec,
                                         filter_properties,
                                         requested_networks,
                                         injected_files, admin_password,
                                         is_first_time,
                                         instance_uuid=instance_uuid)
            except Exception as ex:
                ...
        # Emit a notification that scheduling has finished
        notifier.notify(context, notifier.publisher_id("scheduler"),
                        'scheduler.run_instance.end', notifier.INFO, payload)

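The core of the algorithm is implemented in FilterScheduler's _schedule method. The _provision_resource call that follows is effectively a remote call to the run_instance method of the Nova Compute service. Below we focus on _schedule, which contains the core of the algorithm.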
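To make the per-instance loop of schedule_run_instance concrete, here is a minimal, self-contained sketch. It assumes the hosts have already been chosen by _schedule; NoValidHost and assign_hosts are hypothetical stand-ins, and the _provision_resource call is replaced by recording the placement in a dict. Because _schedule stops early when no host passes the filters, the list it returns can be shorter than the number of instances, which is exactly when the IndexError path fires. The real method wraps this in an outer try/except whose body is elided above; the sketch simply lets NoValidHost propagate.

```python
class NoValidHost(Exception):
    """Hypothetical stand-in for nova's NoValidHost exception."""


def assign_hosts(weighed_hosts, instance_uuids):
    """Pair each instance with the next host from the list built by _schedule.

    Hosts are popped from the front of a copy of the list; running out of
    hosts raises NoValidHost, mirroring the IndexError branch above.
    """
    hosts = list(weighed_hosts)  # copy so the caller's list is left untouched
    placements = {}
    for instance_uuid in instance_uuids:
        try:
            host = hosts.pop(0)
        except IndexError:
            raise NoValidHost("no host left for %s" % instance_uuid)
        # In Nova, _provision_resource would be called here to reach nova-compute.
        placements[instance_uuid] = host
    return placements
```

For example, assign_hosts(["node-a", "node-b"], ["uuid-1", "uuid-2"]) maps uuid-1 to node-a and uuid-2 to node-b, while a single host for two instances raises NoValidHost.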

def _schedule(self, context, topic, request_spec, filter_properties,
              instance_uuids=None):
    # Get an elevated (admin) copy of the request context
    elevated = context.elevated()
    # Properties of the instance(s) to create
    instance_properties = request_spec['instance_properties']
    # Flavor (instance type) of the instance(s)
    instance_type = request_spec.get("instance_type", None)
    ...
    # Read the scheduler configuration options
    config_options = self._get_configuration_options()
    properties = instance_properties.copy()
    if instance_uuids:
        properties['uuid'] = instance_uuids[0]
    self._populate_retry(filter_properties, properties)

    # Build the properties passed to the host filters
    filter_properties.update({'context': context,
                              'request_spec': request_spec,
                              'config_options': config_options,
                              'instance_type': instance_type})
    self.populate_filter_properties(request_spec,
                                    filter_properties)

    # Get the state of every active host
    hosts = self.host_manager.get_all_host_states(elevated)
    selected_hosts = []
    # Number of instances to launch
    if instance_uuids:
        num_instances = len(instance_uuids)
    else:
        num_instances = request_spec.get('num_instances', 1)
    # For each instance, pick one of the highest-weighted hosts
    for num in xrange(num_instances):
        # Keep only the hosts that pass every filter
        hosts = self.host_manager.get_filtered_hosts(hosts,
                filter_properties)
        if not hosts:
            break
        # Weigh the remaining hosts (sorted from highest to lowest weight)
        weighed_hosts = self.host_manager.get_weighed_hosts(hosts,
                filter_properties)
        # The new instance goes to a host picked at random from the N best
        # (highest-weighted) hosts, where N = scheduler_host_subset_size
        scheduler_host_subset_size = CONF.scheduler_host_subset_size
        if scheduler_host_subset_size > len(weighed_hosts):
            scheduler_host_subset_size = len(weighed_hosts)
        if scheduler_host_subset_size < 1:
            scheduler_host_subset_size = 1
        # Pick one host at random from that top-N subset
        chosen_host = random.choice(weighed_hosts[0:scheduler_host_subset_size])
        selected_hosts.append(chosen_host)
        # A host has been chosen, so update its resource usage before
        # choosing a host for the next instance
        chosen_host.obj.consume_from_instance(instance_properties)
        ...
    return selected_hosts

The scheduling algorithm boils down to four steps:

1. Get the list of available compute nodes

(1) hosts = self.host_manager.get_all_host_states(elevated)

class HostManager(object):
    def get_all_host_states(self, context):
        # Fetch all compute nodes from the database
        compute_nodes = db.compute_node_get_all(context)
        seen_nodes = set()
        for compute in compute_nodes:
            # Service record of the node
            service = compute['service']
            # No service on this node: probably a stale node, skip it
            if not service:
                continue
            # Host name of the service and hypervisor name of the node
            host = service['host']
            node = compute.get('hypervisor_hostname')
            state_key = (host, node)
            # Capabilities and host state cached by the HostManager
            capabilities = self.service_states.get(state_key, None)
            host_state = self.host_state_map.get(state_key)
            # An existing host_state means this is a known node
            if host_state:
                # Refresh the node's capabilities
                host_state.update_capabilities(capabilities,
                                               dict(service.iteritems()))
            else:
                # Create state for a newly seen node
                host_state = self.host_state_cls(host, node,
                        capabilities=capabilities,
                        service=dict(service.iteritems()))
                self.host_state_map[state_key] = host_state
            # Refresh the node's hardware resource usage
            host_state.update_from_compute_node(compute)
            seen_nodes.add(state_key)
        # Cached nodes that were not seen in this pass are dead
        dead_nodes = set(self.host_state_map.keys()) - seen_nodes
        # Drop the cached state of dead nodes
        for state_key in dead_nodes:
            host, node = state_key
            del self.host_state_map[state_key]
        return self.host_state_map.itervalues()

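As shown, this method does two things: it retrieves all currently active compute nodes, and it updates and maintains the node state cached by the HostManager object.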

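The method first calls db.compute_node_get_all to read the compute nodes from the database. Each record holds the latest CPU, memory and disk information of the node, which is maintained by the Nova Compute service: Nova Compute updates a node's hardware resource information after every instance operation, and also runs a periodic task (update_available_resource) that refreshes it. The capabilities variable holds the node capabilities cached by the HostManager object (CPU, memory and disk usage); these are likewise reported to the Nova Scheduler service periodically, by the Nova Compute periodic task update_capabilities.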
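The caching logic can be tried out without a database. The sketch below is a simplified model, not Nova's code: HostState and ToyHostManager are hypothetical, compute rows are plain dicts, capabilities are left out, and only free_ram_mb is tracked. It keeps the same (host, hypervisor_hostname) keys, reuses the cached state of known nodes, and drops cache entries for nodes that no longer appear.

```python
class HostState(object):
    """Hypothetical, trimmed-down host state that tracks free RAM only."""

    def __init__(self, host, node):
        self.host = host
        self.node = node
        self.free_ram_mb = 0

    def update_from_compute_node(self, compute):
        # Refresh the resource view from the latest database row.
        self.free_ram_mb = compute["free_ram_mb"]


class ToyHostManager(object):
    def __init__(self):
        # Cache keyed by (host, hypervisor_hostname), as in get_all_host_states.
        self.host_state_map = {}

    def get_all_host_states(self, compute_nodes):
        seen_nodes = set()
        for compute in compute_nodes:
            service = compute.get("service")
            if not service:
                # Row without a service: treat it as stale and skip it.
                continue
            state_key = (service["host"], compute.get("hypervisor_hostname"))
            host_state = self.host_state_map.get(state_key)
            if host_state is None:
                # First time we see this node: create and cache its state.
                host_state = HostState(*state_key)
                self.host_state_map[state_key] = host_state
            host_state.update_from_compute_node(compute)
            seen_nodes.add(state_key)
        # Anything cached but not seen in this pass is considered dead.
        for state_key in set(self.host_state_map) - seen_nodes:
            del self.host_state_map[state_key]
        return list(self.host_state_map.values())
```

A HostState object survives across calls for as long as its node keeps appearing in the database, so the scheduler keeps updating the same object between scheduling passes.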

(2) self.host_manager.get_filtered_hosts(hosts, filter_properties)

class HostManager(object):
    def get_filtered_hosts(self, hosts, filter_properties, filter_class_names=None):
        ...
        # Resolve the list of filter classes to apply
        filter_classes = self._choose_host_filters(filter_class_names)
        ...
        # Return the hosts that pass all filters
        return self.filter_handler.get_filtered_objects(filter_classes,
                                                        hosts, filter_properties)

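To decide whether a compute node is usable, Nova Scheduler defines a number of filters, each checking one property of the node. Only nodes that pass every filter are considered usable hosts. The method above first calls _choose_host_filters to get the list of filters, then calls the get_filtered_objects method of filter_handler to apply them. In addition, get_filtered_hosts accepts two keys through filter_properties: ignore_hosts, listing hosts to exclude, and force_hosts, restricting scheduling to the listed hosts.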
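A simplified sketch of how the two keys could shape the result, under stated assumptions: hosts are plain names, filters are plain predicates, and, as I understand the Grizzly code, forced hosts skip the filters altogether. The function filtered_hosts and the not_small filter are hypothetical, not HostManager's real API.

```python
def filtered_hosts(hosts, filter_properties, filters):
    """Sketch of get_filtered_hosts with ignore_hosts/force_hosts handling.

    hosts: list of host names (hypothetical simplification).
    filters: list of predicates taking (host, filter_properties).
    """
    ignore_hosts = set(filter_properties.get("ignore_hosts") or [])
    force_hosts = set(filter_properties.get("force_hosts") or [])
    # Hosts the caller asked to avoid are removed first.
    candidates = [h for h in hosts if h not in ignore_hosts]
    if force_hosts:
        # Assumption: forced hosts bypass the filters entirely.
        return [h for h in candidates if h in force_hosts]
    # Otherwise a host must pass every filter.
    return [h for h in candidates
            if all(f(h, filter_properties) for f in filters)]


def not_small(host, filter_properties):
    """Hypothetical filter that rejects the host named 'small'."""
    return host != "small"
```

With hosts ["a", "small", "b"], the filter alone keeps a and b, ignore_hosts=["a"] leaves only b, and force_hosts=["small"] returns small even though the filter would reject it.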

a. The _choose_host_filters method

class HostManager(object):
    def _choose_host_filters(self, filter_cls_names):
        # If the caller passed no filter names, use the default filters
        if filter_cls_names is None:
            filter_cls_names = CONF.scheduler_default_filters
        # Wrap a single name in a list
        if not isinstance(filter_cls_names, (list, tuple)):
            filter_cls_names = [filter_cls_names]
        good_filters = []
        bad_filters = []
        # Walk through every configured filter name
        for filter_name in filter_cls_names:
            found_class = False
            # Walk through every registered filter class
            for cls in self.filter_classes:
                # A name that matches a registered class is a good filter
                if cls.__name__ == filter_name:
                    good_filters.append(cls)
                    found_class = True
                    break
            # A name with no registered class is a bad filter
            if not found_class:
                bad_filters.append(filter_name)
        ...
        return good_filters

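The method walks through the filter names in filter_cls_names and keeps the good ones, where a good filter is one that has been registered beforehand. Registration happens in HostManager's constructor, which calls the get_matching_classes method of the filter_handler object; get_matching_classes registers all filters defined in the nova.scheduler.filters package.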
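The same lookup can be written more compactly with a dictionary from class name to class. This is a sketch with hypothetical names (REGISTERED_FILTERS, DEFAULT_FILTERS, FilterNotFound and the two empty filter classes). The excerpt elides what happens to bad_filters; the sketch assumes an unknown filter name is reported as an error, which I believe Nova does through its SchedulerHostFilterNotFound exception.

```python
class FilterNotFound(Exception):
    """Hypothetical stand-in for Nova's SchedulerHostFilterNotFound."""


class RamFilter(object):
    """Empty placeholder filter class (hypothetical)."""


class ComputeFilter(object):
    """Empty placeholder filter class (hypothetical)."""


# Hypothetical registry: what get_matching_classes would have collected.
REGISTERED_FILTERS = [RamFilter, ComputeFilter]
# Hypothetical default list standing in for CONF.scheduler_default_filters.
DEFAULT_FILTERS = ["RamFilter", "ComputeFilter"]


def choose_host_filters(filter_cls_names=None):
    if filter_cls_names is None:
        filter_cls_names = DEFAULT_FILTERS
    if not isinstance(filter_cls_names, (list, tuple)):
        filter_cls_names = [filter_cls_names]
    # Index the registered classes by name for constant-time lookups.
    by_name = dict((cls.__name__, cls) for cls in REGISTERED_FILTERS)
    # Good filters keep the order in which they were configured.
    good_filters = [by_name[name] for name in filter_cls_names if name in by_name]
    bad_filters = [name for name in filter_cls_names if name not in by_name]
    if bad_filters:
        # Assumption: unknown names are reported as an error.
        raise FilterNotFound(", ".join(bad_filters))
    return good_filters
```

The dictionary avoids the nested loop but returns the good filters in the same configuration order.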

b. The get_filtered_objects method

class BaseFilterHandler(loadables.BaseLoader):
    def get_filtered_objects(self, filter_classes, objs, filter_properties):
        # Apply each filter in turn
        for filter_cls in filter_classes:
            # Call the filter class's filter_all method
            objs = filter_cls().filter_all(objs, filter_properties)
        return list(objs)

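This method applies the filters chosen above to check whether each compute node is usable, and finally returns the list of usable compute nodes.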

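It calls each filter's filter_all method in turn; filter_all returns an iterator over the hosts that pass that filter. Every filter inherits from BaseHostFilter, which in turn inherits from BaseFilter. filter_all is defined in BaseFilter as follows: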

class BaseFilter(object):
    def filter_all(self, filter_obj_list, filter_properties):
        for obj in filter_obj_list:
            if self._filter_one(obj, filter_properties):
                yield obj

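filter_obj_list is the list of compute nodes to filter. filter_all calls _filter_one on each node and, whenever _filter_one returns True, yields a reference to that host. BaseFilter's own _filter_one always returns True; the subclass BaseHostFilter overrides _filter_one to call the host_passes method that each filter implements. BaseHostFilter's _filter_one is defined as follows: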

class BaseHostFilter(filters.BaseFilter):
    def _filter_one(self, obj, filter_properties):
        return self.host_passes(obj, filter_properties)

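host_passes returns True when the host passes the filter's check. A host is considered usable only if it passes every filter: since each filter_all generator consumes the output of the previous one, a host rejected by one filter is never seen by the filters after it.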
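Put together, the filter classes form a chain of lazy generators. The sketch below reproduces BaseFilter, BaseHostFilter and get_filtered_objects as shown above and adds two hypothetical filters (EnoughRamFilter, EnabledFilter). Hosts are plain dicts and the requested memory sits under a hypothetical memory_mb key of filter_properties; the real filters work on HostState objects and read the flavor from filter_properties['instance_type'], which _schedule puts there.

```python
class BaseFilter(object):
    def _filter_one(self, obj, filter_properties):
        # Base behaviour: accept everything.
        return True

    def filter_all(self, filter_obj_list, filter_properties):
        # Lazily yield only the objects that pass this filter.
        for obj in filter_obj_list:
            if self._filter_one(obj, filter_properties):
                yield obj


class BaseHostFilter(BaseFilter):
    def _filter_one(self, obj, filter_properties):
        return self.host_passes(obj, filter_properties)

    def host_passes(self, host_state, filter_properties):
        raise NotImplementedError()


class EnoughRamFilter(BaseHostFilter):
    """Hypothetical filter: the host must have the requested RAM free."""

    def host_passes(self, host_state, filter_properties):
        return host_state["free_ram_mb"] >= filter_properties["memory_mb"]


class EnabledFilter(BaseHostFilter):
    """Hypothetical filter: skip hosts whose service is disabled."""

    def host_passes(self, host_state, filter_properties):
        return not host_state["disabled"]


def get_filtered_objects(filter_classes, objs, filter_properties):
    for filter_cls in filter_classes:
        # Each pass wraps the previous generator; nothing runs until list().
        objs = filter_cls().filter_all(objs, filter_properties)
    return list(objs)
```

With a request for 2048 MB, a host with too little RAM is dropped by the first filter and a disabled host by the second, so only hosts passing both come out of list().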

2. Compute the weights of the available compute nodes

The get_weighed_hosts method

class HostManager(object):
    def get_weighed_hosts(self, hosts, weight_properties):
        return self.weight_handler.get_weighed_objects(self.weight_classes,
                                                       hosts, weight_properties)

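get_weighed_hosts is simpler than get_filtered_hosts. It takes no external argument like weight_class_names; instead it uses weigher classes registered in advance (self.weight_classes = self.weight_handler.get_matching_classes(CONF.scheduler_weight_classes)). In the current G (Grizzly) release, Nova supports only the RAMWeigher weigher class.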

Like get_filtered_hosts, get_weighed_hosts delegates the work to a handler, calling the get_weighed_objects method of the weight_handler object to compute the weights. It is defined as follows:

class BaseWeightHandler(loadables.BaseLoader):
    def get_weighed_objects(self, weigher_classes, obj_list, weighing_properties):
        if not obj_list:
            return []
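        # Wrap each host in a WeighedObject with an initial weight of 0.0
        weighed_objs = [self.object_class(obj, 0.0) for obj in obj_list]
        # Apply every weigher class
        for weigher_cls in weigher_classes:
            # Instantiate the weigher and let it add to each host's weight
            weigher = weigher_cls()
            weigher.weigh_objects(weighed_objs, weighing_properties)
        # Sort hosts by weight, highest first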
        return sorted(weighed_objs, key=lambda x: x.weight, reverse=True)

The key step above is the call to each weigher's weigh_objects method. Every weigher inherits from BaseHostWeigher, which inherits from BaseWeigher. weigh_objects is defined as follows:

class BaseWeigher(object):
    def weigh_objects(self, weighed_obj_list, weight_properties):
        for obj in weighed_obj_list:
            # host weight = previous weight + multiplier * weight assigned by this weigher
            obj.weight += (self._weight_multiplier() *
                           self._weigh_object(obj.obj, weight_properties))

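So a host's final weight is the weighted sum of the weights assigned to it by each weigher class: _weight_multiplier returns the current weigher's multiplier, and _weigh_object returns the weight that the current weigher assigns to the host.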
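Written as a formula, with weighers $i = 1, \dots, n$, $m_i$ the value returned by _weight_multiplier and $w_i(h)$ the value returned by _weigh_object for host $h$, and every host starting at 0.0, the final weight is:

```latex
W(h) = \sum_{i=1}^{n} m_i \, w_i(h)
\qquad \text{RAMWeigher only:} \quad
W(h) = m_{\mathrm{ram}} \cdot \mathrm{free\_ram\_mb}(h)
```

With $m_{\mathrm{ram}} = 1.0$, hosts with 2048 MB and 4096 MB free get weights 2048 and 4096, so the second host ranks first.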

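Since Nova currently supports only the RAMWeigher class, let us look at its _weight_multiplier and _weigh_object methods. Its multiplier is set by the ram_weight_multiplier option in nova.conf, which defaults to 1.0. Its _weigh_object method returns the host's free memory, so hosts with more free RAM rank higher.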
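The weighing path can also be run on its own. The sketch below mirrors BaseWeigher.weigh_objects and get_weighed_objects as shown above; WeighedHost, the dict-based hosts and the class attribute standing in for CONF.ram_weight_multiplier are assumptions for illustration. StackingRAMWeigher is a hypothetical subclass that flips the multiplier to -1.0, showing that a negative multiplier makes hosts with less free RAM rank first (packing instances together instead of spreading them).

```python
class WeighedHost(object):
    """Hypothetical wrapper pairing a host with its accumulated weight."""

    def __init__(self, obj, weight):
        self.obj = obj
        self.weight = weight


class BaseWeigher(object):
    def _weight_multiplier(self):
        return 1.0

    def _weigh_object(self, obj, weight_properties):
        raise NotImplementedError()

    def weigh_objects(self, weighed_obj_list, weight_properties):
        for obj in weighed_obj_list:
            # Accumulate multiplier * raw weight onto the existing weight.
            obj.weight += (self._weight_multiplier() *
                           self._weigh_object(obj.obj, weight_properties))


class RAMWeigher(BaseWeigher):
    # Stand-in for CONF.ram_weight_multiplier (default 1.0).
    ram_weight_multiplier = 1.0

    def _weight_multiplier(self):
        return self.ram_weight_multiplier

    def _weigh_object(self, host_state, weight_properties):
        # Raw weight: the host's free memory in MB.
        return host_state["free_ram_mb"]


class StackingRAMWeigher(RAMWeigher):
    # Hypothetical: a negative multiplier favours the fullest hosts.
    ram_weight_multiplier = -1.0


def get_weighed_objects(weigher_classes, obj_list, weighing_properties):
    if not obj_list:
        return []
    weighed_objs = [WeighedHost(obj, 0.0) for obj in obj_list]
    for weigher_cls in weigher_classes:
        weigher_cls().weigh_objects(weighed_objs, weighing_properties)
    # Highest weight first.
    return sorted(weighed_objs, key=lambda x: x.weight, reverse=True)
```

For hosts with 2048, 8192 and 4096 MB free, RAMWeigher ranks them 8192, 4096, 2048, while StackingRAMWeigher reverses that order.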

3. Randomly pick one compute node, out of the scheduler_host_subset_size highest-weighted nodes, to host the instance
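The clamping and random choice from _schedule can be isolated into a small function. choose_host and its rng parameter are hypothetical; the body follows the _schedule excerpt. random.choice raises IndexError on an empty list, which cannot happen in _schedule because the loop breaks as soon as no host passes the filters.

```python
import random


def choose_host(weighed_hosts, subset_size, rng=random):
    """Pick one host at random from the top `subset_size` entries.

    weighed_hosts must already be sorted best-first and be non-empty.
    """
    # Clamp the configured size into the range [1, len(weighed_hosts)].
    if subset_size > len(weighed_hosts):
        subset_size = len(weighed_hosts)
    if subset_size < 1:
        subset_size = 1
    return rng.choice(weighed_hosts[0:subset_size])
```

With subset_size set to 1 (or anything below 1) the choice is deterministic and always returns the best-ranked host; larger values spread instances across several near-best hosts.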

4. Update the selected compute node's hardware resource information, reserving resources for the instance
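Step 4 is what makes multi-instance requests behave sensibly: consume_from_instance lowers the chosen host's free resources in the scheduler's in-memory view, so the next iteration filters and weighs against updated numbers. The sketch below strings all four steps together in toy form. HostState and schedule are hypothetical simplifications that track only free RAM; the "filter" is a free-RAM check and the "weigher" ranks by free RAM, mimicking RAMWeigher with a multiplier of 1.0.

```python
import random


class HostState(object):
    """Hypothetical host state that tracks free RAM only."""

    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb

    def consume_from_instance(self, instance):
        # Reserve the instance's memory in the scheduler's in-memory view.
        self.free_ram_mb -= instance["memory_mb"]


def schedule(hosts, instance, num_instances, subset_size=1):
    """Toy version of the _schedule loop: filter, weigh, choose, consume."""
    selected = []
    for _ in range(num_instances):
        # Step 1: keep hosts with enough free RAM (re-filtering the
        # survivors of the previous round, as _schedule does).
        hosts = [h for h in hosts if h.free_ram_mb >= instance["memory_mb"]]
        if not hosts:
            break
        # Step 2: weigh by free RAM, highest first.
        ranked = sorted(hosts, key=lambda h: h.free_ram_mb, reverse=True)
        # Step 3: random choice among the top subset_size hosts (clamped).
        size = max(1, min(subset_size, len(ranked)))
        chosen = random.choice(ranked[:size])
        selected.append(chosen.name)
        # Step 4: consume resources before placing the next instance.
        chosen.consume_from_instance(instance)
    return selected
```

With two hosts of 4096 MB and 3072 MB and four requested 2048 MB instances, the placements go a, b, a and then stop, because after three placements neither host has 2048 MB left. Without the consume step, host a would win every round.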
