NoSQL or RDBMS for audit data

Time: 2022-09-23 17:43:50

I know similar questions have been asked on this subject, but I haven't seen one that covers all of my requirements.

I should start by saying that I only have experience with RDBMSs, so I apologize if I get anything about NoSQL wrong.

I'm creating a database that would hold a large amount of audit logs (about 1TB).

I'm using it for:

  1. Fast data writing (a massive amount of audit logs is written all the time)

  2. Search - search over the audit data (search actions performed by a certain user, at a certain time or a certain action... the database should support searching any of the 'columns' very quickly)

  3. Analytics & Reporting - Generate daily, weekly and monthly reports from the data (they are predefined at the moment; if they become more dynamic, does that affect which solution I should choose?)

Reliability (support for fail-over or any similar feature), Scalability (if I grow from 1TB to 2TB, 10TB or 100TB, are any of the solutions unable to support that amount of data?) and of course Performance (in the use cases I specified) are very important to me.

I know RDBMS and that would be my easy way of starting, but I'm really concerned that after a while, the DB would simply not keep up with the pace.

My question is: should I pick an RDBMS or a NoSQL solution, and why? If NoSQL, since the options are so different, which of them do you think fits my needs?

1 solution

#1


7  

Generally there isn't a right or wrong answer here.

Fast data writing: either solution will be OK, although you didn't say what volume per second you are storing. Both solutions have things to watch out for.

Search (very quick) over all columns: for smaller volumes, say a few hundred GB, either solution will be OK (assuming skilled people put it together). You didn't say how fast or how often you search, so if it is many times per minute this consideration becomes more important. Fast search can often reduce the ability to write high volumes quickly, because the indexes required for search have to be updated on every write.
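One rough way to see the index/write trade-off on your own hardware is to time the same bulk insert with and without indexes. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for whatever RDBMS you end up with; the table name `audit_log`, its columns and the helper `time_inserts` are assumptions made up for the illustration, and the absolute numbers will differ a lot between engines.

```python
import sqlite3
import time

# Hypothetical audit columns, chosen only for this illustration.
COLUMNS = ("ts", "user_name", "action")


def time_inserts(n_rows, indexed_columns=()):
    """Insert n_rows into a fresh in-memory table and return (seconds, row_count).

    Every column named in indexed_columns gets its own index before the
    insert starts, so each write also has to maintain those indexes.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE audit_log (ts INTEGER, user_name TEXT, action TEXT)")
    for col in indexed_columns:
        if col not in COLUMNS:
            raise ValueError(f"unknown column: {col}")
        conn.execute(f"CREATE INDEX idx_{col} ON audit_log ({col})")

    rows = [(i, f"user{i % 100}", f"action{i % 10}") for i in range(n_rows)]

    start = time.perf_counter()
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO audit_log VALUES (?, ?, ?)", rows)
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
    conn.close()
    return elapsed, count


if __name__ == "__main__":
    plain, _ = time_inserts(200_000)
    indexed, _ = time_inserts(200_000, COLUMNS)
    print(f"no indexes: {plain:.3f}s, index on every column: {indexed:.3f}s")
```

Running it with your real column count and batch size gives a first feel for how much "search any column quickly" costs you on the write side.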

Audit records typically have a time component, so a search that is time constrained, e.g. within the last 7 days, will be significantly faster than a search over all records.
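To make the time-constrained search concrete, here is a minimal sketch, again using Python's built-in `sqlite3` as a stand-in for a real RDBMS. The schema, the index name `idx_audit_ts` and the helper functions are assumptions for illustration only; the point is just that an index on the timestamp lets the query touch the recent slice of the table instead of all of it.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical query: one user's actions since a cutoff timestamp.
RECENT_SQL = (
    "SELECT ts, action FROM audit_log "
    "WHERE ts >= ? AND user_name = ? ORDER BY ts"
)


def build_demo_db(now, days=30):
    """Create an in-memory table with one fake audit row per day."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE audit_log ("
        "id INTEGER PRIMARY KEY, ts TEXT NOT NULL, "
        "user_name TEXT NOT NULL, action TEXT NOT NULL)"
    )
    # Index on the timestamp so range filters do not scan every row.
    conn.execute("CREATE INDEX idx_audit_ts ON audit_log (ts)")
    rows = [
        ((now - timedelta(days=d)).isoformat(), f"user{d % 3}", "login")
        for d in range(days)
    ]
    conn.executemany(
        "INSERT INTO audit_log (ts, user_name, action) VALUES (?, ?, ?)", rows
    )
    return conn


def recent_actions(conn, user_name, now, days=7):
    """Return the user's actions from the last `days` days, oldest first."""
    cutoff = (now - timedelta(days=days)).isoformat()
    return conn.execute(RECENT_SQL, (cutoff, user_name)).fetchall()


def query_plan(conn, user_name, now, days=7):
    """Return SQLite's textual plan for the same query, for inspection."""
    cutoff = (now - timedelta(days=days)).isoformat()
    plan = conn.execute("EXPLAIN QUERY PLAN " + RECENT_SQL, (cutoff, user_name))
    return " | ".join(row[3] for row in plan.fetchall())


if __name__ == "__main__":
    now = datetime(2022, 9, 23, 17, 43, 50, tzinfo=timezone.utc)
    conn = build_demo_db(now)
    print(recent_actions(conn, "user0", now))
    print(query_plan(conn, "user0", now))
```

Timestamps are stored as ISO 8601 strings with a fixed UTC offset, so string comparison matches chronological order. On larger systems the same idea is usually taken further by partitioning the table by date, which is not shown here.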

Reporting: when you get up to 100 TB, you are going to need some real tricks, or a big budget, to get fast reporting. For static reporting, you will probably end up creating one program that generates multiple reports at once to save I/O. Dynamic reports will be the tricky part.
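The "one program, multiple reports" idea can be sketched as a single pass over the records that feeds several aggregates at once, so the data is read only once however many reports you need. The record layout `(timestamp, user, action)` and the report names below are assumptions for the example, not anything prescribed by a particular database.

```python
from collections import Counter


def build_reports(records):
    """Scan the records once and fill several report aggregates together.

    Each record is assumed to be a (timestamp, user, action) tuple whose
    timestamp is an ISO 8601 string starting with YYYY-MM-DD.
    """
    by_user = Counter()
    by_action = Counter()
    by_day = Counter()
    for ts, user, action in records:
        by_user[user] += 1
        by_action[action] += 1
        by_day[ts[:10]] += 1  # first 10 characters are the date part
    return {"by_user": by_user, "by_action": by_action, "by_day": by_day}


if __name__ == "__main__":
    sample = [
        ("2022-09-22T10:00:00", "alice", "login"),
        ("2022-09-22T10:05:00", "alice", "delete"),
        ("2022-09-23T09:00:00", "bob", "login"),
    ]
    for name, counts in build_reports(sample).items():
        print(name, dict(counts))
```

Because `records` can be any iterable, the same function works on a database cursor or a file reader that streams rows, which matters once the data no longer fits in memory.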

My opinion? Since you know RDBMS, I would start with that and ship the solution. This buys you time to learn the real problems you will encounter (the "no premature optimization" approach that many on SO are keen on). During this initial period you can start evaluating NoSQL solutions and learning them. I am assuming here that you want to run your own hardware and database; if you want to use cloud solutions, go to them straight away.
