How to drive Ansible programmatically and concurrently?

Time: 2022-11-20 07:26:05

I would like to use Ansible to execute a simple job on several remote nodes concurrently. The actual job involves grepping some log files and then post-processing the results on my local host (which has software not available on the remote nodes).

The command line ansible tools don't seem well-suited to this use case because they mix together ansible-generated formatting with the output of the remotely executed command. The Python API seems like it should be capable of this though, since it exposes the output unmodified (apart from some potential unicode mangling that shouldn't be relevant here).

A simplified version of the Python program I've come up with looks like this:

from sys import argv

import ansible.inventory
import ansible.runner

# Run the "command" module against every host matching the pattern,
# with up to 10 parallel forks.
runner = ansible.runner.Runner(
    pattern='*', forks=10,
    module_name="command",
    module_args=(
        """
        sleep 10
        """),
    inventory=ansible.inventory.Inventory(argv[1]),
)
results = runner.run()

Here, sleep 10 stands in for the actual log grepping command - the idea is just to simulate a command that's not going to complete immediately.

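For completeness, the local post-processing just walks the returned dictionary; the sketch below assumes the 1.x Runner result layout, with responding hosts under "contacted" and unreachable ones under "dark":

# Sketch only: assumes Runner.run() returns {"contacted": {...}, "dark": {...}}
# as the Ansible 1.x API does.
for host, result in results["contacted"].items():
    print("%s: %s" % (host, result.get("stdout", "")))
for host, failure in results["dark"].items():
    print("%s: unreachable: %s" % (host, failure))
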
However, upon running this, I observe that the amount of time taken seems proportional to the number of hosts in my inventory. Here are the timing results against inventories with 2, 5, and 9 hosts respectively:

exarkun@top:/tmp$ time python howlong.py two-hosts.inventory
real    0m24.285s
user    0m0.216s
sys     0m0.120s
exarkun@top:/tmp$ time python howlong.py five-hosts.inventory                                                                                   
real    0m55.120s
user    0m0.224s
sys     0m0.160s
exarkun@top:/tmp$ time python howlong.py nine-hosts.inventory
real    1m57.272s
user    0m0.360s
sys     0m0.284s
exarkun@top:/tmp$

Some other random observations:

  • ansible all --forks=10 -i five-hosts.inventory -m command -a "sleep 10" exhibits the same behavior
  • ansible all -c local --forks=10 -i five-hosts.inventory -m command -a "sleep 10" appears to execute things concurrently (but only works for local-only connections, of course)
  • ansible all -c paramiko --forks=10 -i five-hosts.inventory -m command -a "sleep 10" appears to execute things concurrently

Perhaps this suggests the problem is with the ssh transport and has nothing to do with using ansible via the Python API as opposed to from the command line.

What is wrong here that prevents the default transport from taking only around ten seconds regardless of the number of hosts in my inventory?

3 Answers

#1 (score: 5)

Some investigation reveals that ansible is looking for the hosts in my inventory in ~/.ssh/known_hosts. My configuration has HashKnownHosts enabled. ansible isn't ever able to find the host entries it is looking for because it doesn't understand the hashed known_hosts entry format.

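The entries are actually present, just hashed; ssh-keygen does understand the hashed format, so the presence of a host can be confirmed with it (your.host.example is just a placeholder for a real inventory host):

ssh-keygen -F your.host.example
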
Whenever ansible's ssh transport can't find the known hosts entry, it acquires a global lock for the duration of the module's execution. The result of this confluence is that all execution is effectively serialized.

A temporary work-around is to give up some security and disable host key checking by putting host_key_checking = False into ~/.ansible.cfg. Another work-around is to use the paramiko transport (but this is incredibly slow, perhaps tens or hundreds of times slower than the ssh transport, for some reason). Another work-around is to let some unhashed entries get added to the known_hosts file for ansible's ssh transport to find.

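For example, the first work-around is just a small entry in ~/.ansible.cfg (assuming the usual [defaults] section):

~/.ansible.cfg
----
[defaults]
host_key_checking = False
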
#2 (score: 3)

Since you have HashKnownHosts enabled, you should upgrade to the latest version of Ansible. Version 1.3 added support for hashed known_hosts, see the bug tracker and changelog. This should solve your problem without compromising security (workaround using host_key_checking=False) or sacrificing speed (your workaround using paramiko).

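To check which version you are currently running and upgrade (assuming a pip-managed install):

ansible --version
pip install --upgrade ansible
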
#3 (score: 0)

With the Ansible 2.0 Python API, I switched off StrictHostKeyChecking with:

import ansible.constants

# Disable host key checking globally for this process; set this before
# any plays or runners create connections.
ansible.constants.HOST_KEY_CHECKING = False

I managed to speed up Ansible considerably by setting the following on the managed computers. Newer sshd versions already have this as the default, I think, so it might not be needed in your case.

/etc/ssh/sshd_config
----
UseDNS no
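
The setting only takes effect after the SSH daemon is restarted; the exact command depends on the distribution, for example:

service ssh restart        # e.g. Debian/Ubuntu
systemctl restart sshd     # e.g. RHEL/CentOS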
