I am designing an Auto Scaling system for my application, which runs on Amazon EC2 instances. The application reads messages from SQS and processes them.
The Auto Scaling system will monitor two things:
- The number of messages in the SQS queue,
- The total number of processes running across all EC2 instances.
For example, if the number of messages in SQS exceeds 3000, I want the system to scale out: launch a new EC2 instance and deploy the code to it. Whenever the number of messages drops below 2000, I want the system to terminate an EC2 instance.
I am doing this with Ruby and Capistrano. My question is:
I am unable to find a way to determine the number of processes running across all EC2 machines and store that number in a variable. Could you help me?
1 Answer
#1
You might want to use cron and the CloudWatch API to push the numbers to CloudWatch yourself, as part of an Auto Scaling group policy. By numbers I mean the process count from each instance: ps aux | grep your_process | wc -l
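If you prefer to stay in Ruby rather than shell out, here is a minimal per-instance sketch, assuming the aws-sdk-cloudwatch gem and a worker script named process_name.rb (the namespace and metric name are placeholders), run from cron every minute or so:

require 'aws-sdk-cloudwatch'

# Count this instance's worker processes; pgrep -fc matches the full command line
count = `pgrep -fc process_name.rb`.to_i

cloudwatch = Aws::CloudWatch::Client.new(region: 'us-east-1')
cloudwatch.put_metric_data(
  namespace: 'Custom/App',          # hypothetical custom namespace
  metric_data: [{
    metric_name: 'RunningProcesses',
    value: count,
    unit: 'Count'
  }]
)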
CloudWatch will let you set an alarm on that custom metric, aggregated with the SUM statistic of the process counts across either all running instances or the Auto Scaling group.
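A hedged sketch of such an alarm with the same Ruby SDK; the alarm name, threshold, and scaling-policy ARN are all illustrative placeholders:

require 'aws-sdk-cloudwatch'

cloudwatch = Aws::CloudWatch::Client.new(region: 'us-east-1')
cloudwatch.put_metric_alarm(
  alarm_name: 'too-many-worker-processes',
  namespace: 'Custom/App',
  metric_name: 'RunningProcesses',
  statistic: 'Sum',                 # aggregate the per-instance counts
  period: 60,
  evaluation_periods: 2,
  threshold: 100,                   # illustrative threshold
  comparison_operator: 'GreaterThanThreshold',
  alarm_actions: ['<scale-out-policy-arn>']   # placeholder scaling-policy ARN
)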
Something to get you started:
Pushing RAM Memory metrics manually: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/mon-scripts-perl.html
One more: http://aws.typepad.com/aws/2011/05/amazon-cloudwatch-user-defined-metrics.html
For memory it looks simple, as Amazon already provides scripts for this. For process counts you might need to dig into those scripts or read the official API docs.
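The SQS half of the question works the same way, except the queue's built-in ApproximateNumberOfMessagesVisible metric already exists, so nothing needs to be pushed. A sketch of the scale-out alarm for the 3000-message threshold from the question (queue name and policy ARN are placeholders):

require 'aws-sdk-cloudwatch'

cloudwatch = Aws::CloudWatch::Client.new(region: 'us-east-1')
cloudwatch.put_metric_alarm(
  alarm_name: 'sqs-backlog-high',
  namespace: 'AWS/SQS',
  metric_name: 'ApproximateNumberOfMessagesVisible',
  dimensions: [{ name: 'QueueName', value: 'my-queue' }],  # placeholder queue name
  statistic: 'Average',
  period: 300,
  evaluation_periods: 1,
  threshold: 3000,                          # scale-out threshold from the question
  comparison_operator: 'GreaterThanThreshold',
  alarm_actions: ['<scale-out-policy-arn>'] # placeholder scaling-policy ARN
)
# A mirror-image alarm with threshold 2000 and 'LessThanThreshold'
# would trigger the scale-in policy.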
EDIT:
If you are worried about a single point of failure in the watching system and you have a list of servers, it might be preferable to examine them in parallel from a remote server:
rm -f ~/count.log
# SSH into every host in parallel; ~/ListofIP.txt is assumed to hold one IP per line
for IP in `cat ~/ListofIP.txt`
do
    ssh -i /path/to/keyfile root@${IP} "ps -ef | grep process_name.rb | grep -v grep | wc -l" >> ~/count.log &
done
# Wait for all background SSH jobs to finish
wait
# Sum up the numbers from ~/count.log
TOTAL=`awk '{ sum += $1 } END { print sum }' ~/count.log`
# Push the total to CloudWatch (assumes the AWS CLI is installed and configured)
aws cloudwatch put-metric-data --namespace "Custom/App" \
    --metric-name "RunningProcesses" --value "${TOTAL}" --unit Count
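You would typically run this watcher from cron every minute on a small management host and point the CloudWatch alarm described above at the resulting Custom/App metric; note that the watcher host itself then becomes the thing you have to keep alive.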