First - a little bit about my background: I have been programming for some time (10 years at this point) and am fairly competent when it comes to coding up ideas. I started working on web-application programming just over a year ago, and thankfully discovered Node.js, which made web-app creation feel a lot more like traditional programming. Now I have a Node.js app that I've been developing for some time and that is running in production on the web. My main confusion stems from the fact that I am very new to the world of web development and don't really know what's important and what isn't when it comes to monitoring my application.
I am using a Joyent SmartMachine, and the analytics options they provide are a little overwhelming. There are so many different options and configurations, and I have no clue what purpose each analytic really serves. For the questions below, I'd appreciate any answer, whether it's specific to Joyent's Cloud Analytics or completely general.
QUESTION ONE
Right now, my main concern is to figure out how my application is utilizing the server it runs on. I want to know whether my application has the right amount of resources allocated to it. Does the number of requests it receives make its current server overkill, or does it warrant extra resources? What analytics are important to look at for a Node.js app for that purpose? (I'm using both MongoDB and Redis on separate servers, if that makes a difference.)
QUESTION TWO
What other statistics are generally important to look at when managing a server in production? I'm used to programs that run once to do something specific (e.g. a raytracer that finishes once it has computed an image), as opposed to web apps, which run continuously and interact with many clients. I'm sure there are many things that are obvious to long-time server administrators but aren't to newbies like me.
QUESTION THREE
What's important to look at when dealing with Node.js specifically? What statistics/analytics become particularly critical with Node.js's single-threaded event loop versus more standard server systems?
I have other questions about how databases play into the equation, but I think this is enough for now...
3 Answers
#1 (15 votes)
We have been running node.js in production for nearly a year, starting on the 0.4 series and currently on 0.8. The web app is based on Express 2 and 3, with mongo, redis and memcached.
A few facts:
- node cannot handle a large V8 heap; once it grows past about 200 MB you will start seeing increased CPU usage
- node always seems to leak memory, or at least to grow a large heap without actually using it. I suspect memory fragmentation, since V8 profiling and valgrind show no leaks in JS space or in the resident heap. Early 0.8 was awful in this respect: RSS could be 1 GB with a 50 MB heap.
- hanging requests are hard to track. We wrote our own middleware to monitor them, especially since our app is long-poll based
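The hanging-request middleware mentioned above could be sketched roughly like this (Express-style; the threshold value and log messages are assumptions, not the answerer's actual code):

```javascript
// Report requests that are never responded to, or that exceed a
// latency threshold. THRESHOLD_MS is an illustrative value.
const THRESHOLD_MS = 30 * 1000;

function slowRequestReporter(req, res, next) {
  const started = Date.now();
  // Fires if no response has been sent within the threshold.
  const timer = setTimeout(() => {
    console.warn(`no response after ${THRESHOLD_MS}ms: ${req.method} ${req.url}`);
  }, THRESHOLD_MS);
  // 'finish' fires once the response has been fully sent.
  res.on('finish', () => {
    clearTimeout(timer);
    const elapsed = Date.now() - started;
    if (elapsed > THRESHOLD_MS) {
      console.warn(`slow response (${elapsed}ms): ${req.method} ${req.url}`);
    }
  });
  next();
}
```

Installed early in the middleware chain (`app.use(slowRequestReporter)`), this catches both requests the code never responds to and ones that merely respond slowly.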
My suggestions:
- Run multiple instances per machine, at least one per CPU. Balance them with haproxy, nginx, or similar, with session affinity.
- Write middleware to report hung connections, i.e. ones the code never responded to or whose latency exceeded a threshold.
- Restart instances often, at least weekly.
- Write a poller that prints out memory stats from the process module once per minute.
- Use supervisord and Fabric for easy process management.
Monitor CPU and the reported memory stats, and restart when a threshold is crossed.
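A minimal sketch of the per-minute memory poller suggested above, using `process.memoryUsage()`; the 200 MB threshold echoes the heap-size observation earlier, but the exact number and what you do on breach (log, alert, or exit so your process manager respawns) are up to you:

```javascript
// Return current memory stats in whole megabytes.
function memStatsMB() {
  const m = process.memoryUsage();
  const mb = (n) => Math.round(n / (1024 * 1024));
  return { rss: mb(m.rss), heapTotal: mb(m.heapTotal), heapUsed: mb(m.heapUsed) };
}

// Log stats once per minute; warn when the V8 heap crosses the
// point where CPU usage was observed to climb (~200 MB).
const timer = setInterval(() => {
  const s = memStatsMB();
  console.log(`rss=${s.rss}MB heapTotal=${s.heapTotal}MB heapUsed=${s.heapUsed}MB`);
  if (s.heapUsed > 200) {
    console.warn('heap over threshold; consider restarting this instance');
  }
}, 60 * 1000);
timer.unref(); // don't keep the process alive just for the poller
```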
#2 (6 votes)
Whatever the type of web app, Node.js or otherwise, load testing will tell you whether your application has the right amount of server resources. A good website I recently found for this is Load Impact.
The real question to answer is: WHEN does the load time begin to increase as the number of concurrent users increases? A tipping point is reached at a certain number of concurrent users, after which server performance starts to degrade. So load test according to how many users you expect to reach your website in the near future.
How can you estimate the number of users to expect?
Installing Google Analytics or another analytics package on your pages is a must! That way you can see how many daily users visit your website and how your traffic grows from month to month, which helps in predicting future visits and therefore the expected load on your server.
Even if I know the number of users, how can I estimate the actual load?
The answer lies in the F12 developer tools available in all browsers. Open your website in any browser and press F12 (or Ctrl+Shift+I in Opera), which should open the browser's developer tools. On Firefox make sure you have Firebug installed; on Chrome and Internet Explorer it should work out of the box. Go to the Net or Network tab and refresh your page. This will show you the number of HTTP requests and the bandwidth used per page load!
So the formula to work out daily server load is simple:
Number of HTTP requests per page load × average number of page loads per user per day × expected number of users per day = total HTTP requests to the server per day
And...
Number of MB transferred per page load × average number of page loads per user per day × expected number of users per day = total bandwidth required per day
I've always found it easier to calculate these figures on a daily basis and then extrapolate to weeks and months.
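Plugging hypothetical numbers into the two formulas above (all four inputs here are made-up illustrations, the kind you would read off the Network tab and your analytics package):

```javascript
// Illustrative inputs -- substitute your own measurements.
const requestsPerPageLoad = 40;  // from the browser's Network tab
const mbPerPageLoad = 1.5;       // MB transferred per page load
const pagesPerUserPerDay = 8;    // from your analytics package
const dailyUsers = 2000;         // expected, from growth trends

const totalRequestsPerDay = requestsPerPageLoad * pagesPerUserPerDay * dailyUsers;
const totalBandwidthMBPerDay = mbPerPageLoad * pagesPerUserPerDay * dailyUsers;

console.log(totalRequestsPerDay);    // 640000 requests/day
console.log(totalBandwidthMBPerDay); // 24000 MB/day
```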
#3 (3 votes)
Node.js is single-threaded, so you should definitely start a process for every CPU your machine has. Cluster is by far the best way to do this, with the added benefit of being able to restart dead workers and detect unresponsive ones.
You also want to load test until your requests start timing out or exceed what you consider a reasonable response time. This gives you a good idea of the upper limit your server can handle. Blitz is one of many options worth a look.
I have never used Joyent's statistics, but NodeFly and their node-nodefly-gcinfo are great tools for monitoring node processes.