Informed opinions needed about the merits of a flat file database. I'm considering using a flat file database scheme to manage data for a custom blog. It would be deployed on a Linux OS variant and written in Java.
What are the possible negatives or positives regarding performance for reading and writing both articles and comments?
Would article retrieval crap out because it's a flat file rather than an RDBMS if it were to get slashdotted? (Wishful thinking)
I'm not against using a RDBMS, just asking the community their opinion on the viability of such a software architecture scheme.
Follow up: In the case of this question I would define "flat file" as "file system-based". For example, each blog entry and its accompanying metadata would be in a single file, making for many files organized by the date structure of the folders (e.g. blogs\testblog2\2008\12\01 == 12/01/2008).
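(For illustration only, a small Java sketch of how I picture resolving a post's file from that date layout; the class and names are just placeholders.)

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;

// Hypothetical helper: maps a blog name and post date to the per-post file,
// mirroring the blogs\testblog2\2008\12\01 layout described above.
public final class PostPaths {
    private static final Path ROOT = Paths.get("blogs");

    public static Path postFile(String blog, LocalDate date, String slug) {
        return ROOT.resolve(blog)
                   .resolve(String.valueOf(date.getYear()))
                   .resolve(String.format("%02d", date.getMonthValue()))
                   .resolve(String.format("%02d", date.getDayOfMonth()))
                   .resolve(slug + ".txt");
    }
}
```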
11 Answers
#1
16
Flat file databases have their place and are quite workable for the right domain.
Mail servers and NNTP servers of the past really pushed the limits of how far you can take these things (which is actually quite far -- file systems can have millions of files and directories).
Flat file DBs' two biggest weaknesses are indexing and atomic updates, but if the domain is suitable these may not be an issue.
But you can, for example, with proper locking, do an "atomic" index update using basic file system commands, at least on Unix.
A simple case is having the indexing process run through the data to create the new index file under a temporary name. Then, when you are done, you simply rename (either the system call rename(2) or the shell mv command) the new file over the old one. rename and mv are atomic operations on a Unix system (i.e. it either works or it doesn't, and there's never a missing "in-between" state).
Same with creating new entries. Basically, write the file fully to a temp file, then rename or mv it into its final place. Then you never have an "intermediate" file in the "DB". Otherwise, you might have a race condition (such as a process reading a file that is still being written and reaching the end before the writing process is complete -- an ugly race condition).
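In Java, that write-then-rename idiom might look roughly like the sketch below. Files.move with ATOMIC_MOVE maps to rename(2) on typical Unix file systems, though atomicity ultimately depends on the file system, so treat this as a sketch rather than a guarantee.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public final class AtomicWriter {
    // Write content to a temp file in the same directory, then rename it
    // over the final name so readers never see a half-written entry.
    public static void writeAtomically(Path target, String content) throws IOException {
        Path dir = target.getParent();
        Path temp = Files.createTempFile(dir, "entry-", ".tmp");
        try {
            Files.write(temp, content.getBytes(StandardCharsets.UTF_8));
            // On Linux this is a single rename(2); REPLACE_EXISTING covers overwriting an old version.
            Files.move(temp, target,
                       StandardCopyOption.ATOMIC_MOVE,
                       StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(temp); // no-op if the move succeeded
        }
    }
}
```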
If your primary indexing works well with directory names, then that works just fine. You can use a hashing scheme, for example, to create directories and subdirectories to locate new files.
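A rough illustration of such a hashing scheme in Java (the two-hex-digit fan-out is just an assumption for the example):

```java
import java.nio.file.Path;

public final class HashedLayout {
    // Fan entries out across 256 subdirectories based on a hash of the key,
    // e.g. "my_java_post" -> <root>/3f/my_java_post.txt
    public static Path locate(Path root, String key) {
        int bucket = Math.floorMod(key.hashCode(), 256);
        return root.resolve(String.format("%02x", bucket)).resolve(key + ".txt");
    }
}
```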
Finding a file using the file name and directory structure is very fast as most filesystems today index their directories.
If you're putting a million files in a directory, there may well be tuning issues you'll want to look into, but out of the box most will handle tens of thousands easily. Just remember that if you need to SCAN the directory, there's going to be a lot of files to scan. Partitioning via directories helps prevent that.
But that all depends on your indexing and searching techniques.
Effectively, a stock, off-the-shelf web server serving up static content is a large flat file database, and the model works pretty well.
Finally, of course, you have the plethora of free Unix file-system-level tools at your disposal, but all of them have issues with zillions of files (forking grep 1,000,000 times to find something in a file has performance tradeoffs -- the overhead simply adds up).
If all of your files are on the same file system, then hard links also give you options (since they, too, are atomic) in terms of putting the same file in different places (basically for indexing).
For example, you could have a "today" directory, a "yesterday" directory, a "java" directory, and the actual message directory.
So, a post could be linked in the "today" directory, the "java" directory (because the post is tagged with "java", say), and in its final place (say /articles/2008/12/01/my_java_post.txt). Then, at midnight, you run two processes. The first one takes all files in the "today" directory, checks their create date to make sure they're not "today" (since the process can take several seconds and a new file might sneak in), and renames those files to "yesterday". Next, you do the same thing for the "yesterday" directory, only here you simply delete them if they're out of date.
Meanwhile, the file is still in the "java" and the ".../12/01" directories. Since you're using a Unix file system and hard links, the "file" only exists once; these are all just pointers to the file. None of them is "the" file, they're all the same.
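A minimal Java sketch of that linking step, assuming the index directories already exist and everything lives on one file system (the paths are only examples):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public final class PostIndexer {
    // After the post file has been written to its final place, hard-link it into
    // the "today" and per-tag directories so each view is just a directory listing.
    // (Files.createLink throws if a link with the same name already exists.)
    public static void indexPost(Path post, Path todayDir, Path tagDir) throws IOException {
        Files.createLink(todayDir.resolve(post.getFileName()), post);
        Files.createLink(tagDir.resolve(post.getFileName()), post);
    }
}

// Usage (paths are only examples):
// indexPost(Paths.get("/articles/2008/12/01/my_java_post.txt"),
//           Paths.get("/index/today"), Paths.get("/index/java"));
```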
You can see that while each individual file move is atomic, the bulk is not. For example, while the "today" script is running, the "yesterday" directory can well contain files from both "yesterday" and "the day before" because the "yesterday" script had not yet run.
In a transactional DB, you would do that all at once.
But, simply, it is a tried and true method. Unix, in particular, works VERY well with that idiom, and the modern file systems can support it quite well as well.
#2
13
(answer copied and modified from here)
I would advise against using a flat file for anything besides read-only access, because then you'd have to deal with concurrency issues like making sure only one process is writing to the file at once. Instead, I recommend SQLite, a fully functional SQL database that's stored in a file. SQLite already has built-in concurrency, so you don't have to worry about things like file locking, and it's really fast for reads.
If, however, you are doing lots of database changes, it's best to do them all at once inside a transaction. This will only write the changes to the file once, as opposed to every time a change query is issued. This dramatically increases the speed of making multiple changes.
When a change query is issued, whether it's inside a transaction or not, the whole database is locked until that query finishes. This means that extremely large transactions could adversely affect the performance of other processes, because they must wait for the transaction to finish before they can access the database. In practice, I haven't found this to be that noticeable, but it's always good practice to try to minimize the number of database-modifying queries you issue, and it's certainly faster than trying to use a flat file.
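If you went the SQLite route from Java, the batching advice above translates to plain JDBC roughly like this; it assumes a SQLite JDBC driver (such as sqlite-jdbc) is on the classpath, and the table is made up for the example:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public final class CommentStore {
    // Insert a batch of comments in one transaction so SQLite writes
    // the file once instead of once per statement.
    public static void saveComments(String articleId, List<String> comments) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:blog.db")) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO comments(article_id, body) VALUES (?, ?)")) {
                for (String body : comments) {
                    ps.setString(1, articleId);
                    ps.setString(2, body);
                    ps.addBatch();
                }
                ps.executeBatch();
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }
}
```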
#3
3
This has been done in ASP.NET with DasBlog. It uses file-based storage.
A few details are listed on this older link. http://www.hanselman.com/blog/UpcomingDasBlog19.aspx
You can also get more details on http://dasblog.info/Features.aspx
I've heard some mixed opinions on the performance. I'd suggest you research it a bit more to see if that type of system would work well for you. It's the closest thing to what you describe that I've heard of.
#4
2
Writing your own engine in native code can outperform a general purpose database.
However, the quality of the engine and the feature level will never approach that of a general-purpose database. All the things that databases give you as core features -- indexing, transactions, referential integrity -- you would have to implement yourself.
There's nothing wrong with reinventing the wheel (after all, Linux was just that), but keep in mind your expectations and time commitment.
#5
2
I'm not answering this to argue whether flat file databases are good or bad; others have done an ample job at that.
However, some have been pointing at SQLite, which does its job just fine. Since you are using Java, your best option would be HSQLDB, which does precisely the same as SQLite, but is implemented in Java and embeds into your application.
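Opening an embedded, file-backed HSQLDB database from Java is just a JDBC URL; something like the following sketch (the file name and DDL are illustrative, and SA with an empty password is HSQLDB's default account):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public final class EmbeddedDb {
    public static void main(String[] args) throws SQLException {
        // "jdbc:hsqldb:file:" stores the database in local files next to the application.
        try (Connection conn = DriverManager.getConnection("jdbc:hsqldb:file:blogdb", "SA", "");
             Statement st = conn.createStatement()) {
            // First run only: create a simple articles table.
            st.execute("CREATE TABLE articles ("
                    + "id INTEGER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, "
                    + "title VARCHAR(200), body LONGVARCHAR)");
        }
    }
}
```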
#6
2
Most of the time a flat file database is enough. But you will thank your younger self if you start your project with a database. This could be SQLite, if you don't want to set up a whole database system like PostgreSQL.
#7
0
Horrible idea. Appending would involve seeking to the end of the file every time you want to add something. Updating would require rewriting the entire file each time. Reading involves a table scan (or maintaining a separate index, which would have the same problems with writing/updating). Just use a database unless, of course, you re-implement all the stuff that an RDBMS already provides to make your solution even moderately scalable.
#8
0
They seem to work quite well for high-write, low-read, no-update databases, where new data is appended.
Web servers and their cousins rely on them heavily for log files.
DBMS software uses them for logs as well.
If your design falls within these limits, you're in good company, it seems. You might want to keep metadata and pointers in a database, and set up some kind of fast asynchronous queue-writer to buffer the comments, but the filesystem is already pretty good at that level of buffering and write-locking.
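A minimal sketch of that kind of asynchronous queue-writer in Java, with a single thread draining a queue and appending comments to per-article files (the layout and names are assumptions):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public final class CommentWriter implements Runnable {
    private record Comment(Path file, String body) {}

    private final BlockingQueue<Comment> queue = new LinkedBlockingQueue<>();

    // Called from request threads: cheap, never blocks on disk.
    public void submit(Path commentFile, String body) {
        queue.add(new Comment(commentFile, body));
    }

    // Single writer thread: only one thread ever appends, so no file locking is needed.
    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Comment c = queue.take();
                Files.write(c.file(),
                            List.of(c.body()),
                            StandardCharsets.UTF_8,
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            e.printStackTrace(); // a real writer would retry or log properly
        }
    }
}
```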
#9
0
Flat file databases are possible but consider the following.
Databases need to attain all the ACID elements (atomicity, consistency, isolation, durability) and, if you're going to ensure that's all done in a flat file (especially with concurrent access), you've basically written a full-blown DBMS.
So why not use a full-blown DBMS in the first place?
You'll save yourself the time and money involved in writing it (and re-writing it many times, I'll guarantee) if you just go with one of the free options (SQLite, MySQL, PostgreSQL, and so on).
#10
0
You can use flat file databases if the data is small enough and doesn't need a lot of random access. A big file with lots of random access will be very slow. And no complex queries: no joins, no SUM, no GROUP BY, etc. You also cannot expect to fetch hierarchical data from a flat file. XML format is much better for complex structures.
#11
-1
Check out http://jsondb.io, an open-source Java-based database that has most of what you are looking for. It saves data as flat .json files and offers multithreading support, encryption support, ORM support, atomicity support, and XPath-based advanced query support.
Disclaimer: I created this database.