Okay, here's the scenario. I have a utility that processes a large number of records and writes information to the database accordingly.
It works on these records in multi-threaded batches. Each batch writes to the same log file to create a workflow trace for each record. Potentially, we could be making close to a million log writes in a day.
Should this log be made into a database residing on another server? Considerations:
- The obvious disadvantage of multiple threads writing to the same log file is that the log messages get interleaved. In a database, they could be grouped by batch ID.
- Performance: which would slow down the batch processing more, writing to a local file or sending log data to a database on another server on the same network? In theory the log file is faster, but is there a gotcha here?
Are there any optimizations that can be done on either approach?
Thanks.
10 Answers
#1
2
I second the other answers here: it depends on what you are doing with the data.
We have two scenarios here:
- The majority of our logging goes to a DB, since admin users of the products we build need to be able to view the logs in a nice little app with all the bells and whistles.
- We log all of our diagnostics and debug info to file. We have no real need to "prettify" it and, to be honest, we don't often need it at all, so for the most part we just log and archive.
I would say that if the user is doing anything with the logs, then log to the DB; if they're just for you, then a file will probably suffice.
#2
6
The interesting question, should you decide to log to the database, is: where do you log database connection errors?
If I'm logging to a database, I always have a secondary log location (file, event log, etc) in case there are communication errors. It really does make it easier to diagnose issues later on.
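A minimal sketch of that fallback idea in Python; the `send_to_db` call here is a hypothetical stand-in for whatever database client you actually use, and it is rigged to fail so the fallback path is exercised:

```python
import logging

def send_to_db(message: str) -> None:
    """Hypothetical DB write; replace with your real client call."""
    raise ConnectionError("DB unreachable")  # simulate an outage

def log_with_fallback(message: str, fallback: logging.Logger) -> str:
    """Try the DB first; on a communication error, record both the
    failure and the original message in the secondary location."""
    try:
        send_to_db(message)
        return "db"
    except ConnectionError as exc:
        fallback.error("DB logging failed (%s); message: %s", exc, message)
        return "file"

fallback = logging.getLogger("fallback")
fallback.addHandler(logging.FileHandler("fallback.log"))
print(log_with_fallback("batch 42 finished", fallback))  # -> file
```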
#3
3
One thing that comes to mind is that you could have each thread write to its own log file and then do a daily batch run to combine them.
If you are logging to a database you will probably need to do some tuning and optimization, especially if the DB is across the network. At the very least you will need to reuse the DB connections.
Furthermore, do you have any specific need to have the log in a database? If all you need is the equivalent of a "grep", then I don't think you gain much by logging to a database.
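The daily combine step could be sketched like this, assuming each per-thread file begins every line with a sortable timestamp (the file-name pattern is illustrative):

```python
import glob
import heapq

def merge_logs(pattern: str, out_path: str) -> int:
    """Merge per-thread log files (each internally sorted) into one file,
    ordered by the leading timestamp. Returns the number of lines written."""
    files = [open(path) for path in sorted(glob.glob(pattern))]
    count = 0
    with open(out_path, "w") as out:
        # heapq.merge does a streaming k-way merge, so the daily run
        # never holds more than one line per input file in memory.
        for line in heapq.merge(*files):
            out.write(line)
            count += 1
    for f in files:
        f.close()
    return count
```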
#4
2
Not sure if it helps, but there's also a utility called Microsoft LogParser that you can supposedly use to parse text-based log files and query them as if they were a database. From the website:
Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®. You tell Log Parser what information you need and how you want it processed. The results of your query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart. Most software is designed to accomplish a limited number of specific tasks. Log Parser is different... the number of ways it can be used is limited only by the needs and imagination of the user. The world is your database with Log Parser.
I haven't used the program myself, but it seems quite interesting!
#5
2
Or how about logging to a queue? That way you can swap out pollers whenever you like and log to different targets. It makes things like rolling over and archiving log files very easy. It's also nice because you can add pollers that log to different things, for example:
- a poller that looks for error messages and posts them to your FogBugz account
- a poller that looks for access violations ('x tried to access /foo/y/bar.html') to a 'hacking attempts' file
- etc.
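A minimal sketch of the queue idea with Python's stdlib; the poller here just writes to a file, but any other sink (DB, bug tracker, etc.) could be slotted in the same way:

```python
import queue
import threading

log_queue: "queue.Queue" = queue.Queue()

def poller(out_path: str) -> None:
    """Consume log messages off the queue until a None sentinel arrives."""
    with open(out_path, "w") as out:
        while True:
            msg = log_queue.get()
            if msg is None:
                break
            out.write(msg + "\n")

worker = threading.Thread(target=poller, args=("app.log",))
worker.start()
log_queue.put("record 1 processed")
log_queue.put("record 2 processed")
log_queue.put(None)  # shut the poller down
worker.join()
```

Producer threads only ever touch the queue, so there is no contention on the file itself.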
#6
1
Database, since you mentioned multiple threads. Synchronization as well as filtered retrieval are my reasons for that answer.
See if you actually have a performance problem before deciding to switch to files.
"Knuth: premature optimization is the root of all evil." I never got any further in that book... :)
#7
1
There are ways you can work around the limitations of file logging.
You can always start each log entry with a thread ID of some kind, and grep out the individual thread IDs. Or use a different log file for each thread.
I've logged to a database in the past, from a separate thread at a lower priority. I must say, queryability is very valuable when you're trying to figure out what went wrong.
#8
1
How about logging to a database file, say a SQLite database? I believe it can handle multi-threaded writes, although that may have its own performance overheads.
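A sketch of that with Python's `sqlite3` module; each thread opens its own connection (connections aren't shareable across threads by default) and SQLite serializes the writers itself, with the busy timeout absorbing lock contention. The file path and table are illustrative:

```python
import sqlite3
import threading

DB_PATH = "workflow.db"  # illustrative path

def init_db() -> None:
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute("DROP TABLE IF EXISTS log")
        conn.execute("CREATE TABLE log (batch_id INTEGER, message TEXT)")

def log_batch(batch_id: int) -> None:
    # One connection per thread; timeout lets a writer wait out the
    # database-level lock instead of failing immediately.
    conn = sqlite3.connect(DB_PATH, timeout=10)
    with conn:
        conn.execute(
            "INSERT INTO log VALUES (?, ?)", (batch_id, f"batch {batch_id} done")
        )
    conn.close()

init_db()
threads = [threading.Thread(target=log_batch, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```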
#9
0
I think it depends greatly on what you are doing with the log files afterwards.
Of the two operations, writing to the log file will be faster, especially as you are suggesting writing to a database on another server.
However if you are then trying to process and search the log files on a regular basis then the best place to do this would be a database.
If you use a logging framework like log4net, they often provide simple config-file-based ways of redirecting output to a file or a database.
#10
0
I like Gaius' answer. Put all the log statements in a thread-safe queue and then process them from there. For the DB you could batch them up, say 100 log statements per batch, and for a file you could just stream them into the file as they come off the queue.
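The DB side of that could be sketched like this; an in-memory SQLite database stands in for the remote server, and the batch size and table name are illustrative:

```python
import queue
import sqlite3

BATCH_SIZE = 100

def drain_in_batches(q: "queue.Queue", conn: sqlite3.Connection) -> int:
    """Pull log rows off the queue and insert them BATCH_SIZE at a time.
    A None sentinel flushes the final partial batch. Returns rows written."""
    written = 0
    batch = []
    while True:
        item = q.get()
        if item is not None:
            batch.append(item)
        if item is None or len(batch) >= BATCH_SIZE:
            if batch:
                # One round trip per batch instead of one per statement.
                conn.executemany("INSERT INTO log VALUES (?, ?)", batch)
                conn.commit()
                written += len(batch)
                batch.clear()
            if item is None:
                return written

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (batch_id INTEGER, message TEXT)")
q: "queue.Queue" = queue.Queue()
for i in range(250):
    q.put((i, f"record {i} processed"))
q.put(None)
print(drain_in_batches(q, conn))  # -> 250
```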
File or DB? As many others have said, it depends on what you need the log for.