Using an application's internal cache when working with Cassandra

Time: 2021-09-30 04:50:25

As I've been working with traditional relational databases for a long time, moving to NoSQL, especially Cassandra, is a big change. I usually design my application so that everything in the database is loaded into the application's internal caches on startup, and if there is any update to a database table, its corresponding cache is updated as well. For example, if I have a table Student, on startup all data in that table is loaded into StudentCache, and when I want to insert/update/delete, I call a service which updates both of them at the same time. The aim of my design is to avoid selecting directly from the database.
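
To make the design concrete, here is a minimal sketch of it. `InMemoryStudentTable` is a hypothetical stand-in for the Student table (it is not a Cassandra driver API), and the `StudentService` name and its methods are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    student_id: int
    name: str


class InMemoryStudentTable:
    """Hypothetical stand-in for the Student table; not a real driver API."""

    def __init__(self, rows=()):
        self._rows = {s.student_id: s for s in rows}

    def select_all(self):
        return list(self._rows.values())

    def upsert(self, student):
        self._rows[student.student_id] = student

    def delete(self, student_id):
        self._rows.pop(student_id, None)


class StudentService:
    """Loads every row at startup and updates table and cache together."""

    def __init__(self, table):
        self._table = table
        # Eager load: memory use grows with the size of the whole table.
        self._cache = {s.student_id: s for s in table.select_all()}

    def get(self, student_id):
        # Reads are served from the cache only; the table is not queried.
        return self._cache.get(student_id)

    def save(self, student):
        # Write to the table first, so a failed write does not leave
        # a cache entry that the table never received.
        self._table.upsert(student)
        self._cache[student.student_id] = student

    def remove(self, student_id):
        self._table.delete(student_id)
        self._cache.pop(student_id, None)
```

Note that this only stays consistent if every write goes through the one service instance; a second application instance writing to the same table would not be seen by this cache.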

In Cassandra, as the idea is to build tables containing all the needed data so that joins are unnecessary, I wonder if my favorite design is still useful, or whether it is more effective to query data directly from the database (i.e. from one table) when required.

2 Solutions

#1


3  

Based on the use case you describe, I'd say that querying data as you need it avoids storing data you don't need. Plus, what if your dataset is 5 GB? Are you still going to load the entire dataset?

Maybe consider a design where you don't load all the data on startup, but load it as needed, then store it and check this store before querying again, like a cache does!
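
A rough sketch of that load-as-needed idea, sometimes called cache-aside. The `load_row` callable is an assumption: it stands for whatever single-row query your application runs against the database, and `LazyCache` is a made-up name.

```python
class LazyCache:
    """Checks the local store first and only queries on a miss."""

    def __init__(self, load_row):
        # load_row is a hypothetical callable: key -> row, or None if absent.
        self._load_row = load_row
        self._rows = {}

    def get(self, key):
        if key in self._rows:
            return self._rows[key]      # hit: no query is issued
        row = self._load_row(key)       # miss: query once
        if row is not None:
            self._rows[key] = row       # keep it for the next lookup
        return row

    def invalidate(self, key):
        # Call this after a write so the next read fetches fresh data.
        self._rows.pop(key, None)


# Usage with a fake loader that records which keys were queried.
queried = []
fake_table = {1: "Ann", 2: "Bob"}

def load_row(key):
    queried.append(key)
    return fake_table.get(key)

cache = LazyCache(load_row)
cache.get(1)
cache.get(1)  # second lookup is served from the cache
```

Only the rows that are actually requested end up in memory, which is the point of the suggestion above.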

Cassandra is built to scale, but your design can't handle scaling: you'll reach a point where your dataset is too large to hold in the client. Based on that, you should think about the tradeoff: lots of on-the-fly querying vs. storing everything in the client. I would advise direct queries, but store the data when you do carry out a query; don't discard it and then carry out the same query again!
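
One way to get that middle ground is to keep query results but cap how many are kept, so memory stays bounded however large the dataset grows. The sketch below uses least-recently-used eviction, which is my choice for illustration and not something the tradeoff above requires; `run_query` and `BoundedQueryCache` are hypothetical names.

```python
from collections import OrderedDict


class BoundedQueryCache:
    """Keeps at most max_entries query results, evicting the least recently used."""

    def __init__(self, run_query, max_entries=1000):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        # run_query is a hypothetical callable: key -> query result.
        self._run_query = run_query
        self._max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key):
        if key in self._entries:
            # Mark as most recently used and reuse the stored result.
            self._entries.move_to_end(key)
            return self._entries[key]
        result = self._run_query(key)
        self._entries[key] = result
        if len(self._entries) > self._max_entries:
            # Drop the oldest entry so the cache never exceeds its cap.
            self._entries.popitem(last=False)
        return result

    def __len__(self):
        return len(self._entries)
```

For a plain function with hashable arguments, the standard library's `functools.lru_cache` decorator gives similar behavior with less code.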

#2


1  

I would suggest querying the data directly, as saving all the data in the application makes the application's performance depend on the size of the input. Now this might be a good thing if you know that the amount of data will never exceed your target machine's memory.

Should you, however, decide that this limit should change (higher!), you will be faced with a problem. Taking this approach will be fast when it comes down to searching (assuming you sort the result at the start), but it will pretty much kill maintainability.
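
To illustrate why searching is fast once everything is loaded and sorted, here is a small sketch using binary search over an in-memory snapshot. `SortedSnapshot` and the (key, value) row shape are assumptions made for the example.

```python
import bisect


class SortedSnapshot:
    """Rows loaded once, sorted by key, then searched with binary search."""

    def __init__(self, rows):
        # rows: iterable of (key, value) pairs; sorted once at startup.
        pairs = sorted(rows, key=lambda pair: pair[0])
        self._keys = [key for key, _ in pairs]
        self._values = [value for _, value in pairs]

    def find(self, key):
        # O(log n) lookup instead of scanning every row.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def between(self, low, high):
        """Values whose keys fall in the inclusive range [low, high]."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return self._values[start:end]
```

The snapshot is only as fresh as the last load, and every write has to keep the sorted lists in order, which is part of the maintainability cost mentioned above.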

Your former favorite approach is, however, still useful should you choose it.

