Minimal code to reliably store Java objects in a file

Time: 2022-10-24 19:38:18

In my tiny little standalone Java application I want to store information.

My requirements:

  • read and write Java objects (I do not want to use SQL, and querying is not required)
  • easy to use
  • easy to set up
  • minimal external dependencies

I therefore want to use JAXB to store all the information in a simple XML file in the file system. My example application looks like this (copy all the code into a file called Application.java and compile it; on Java 8 there are no additional requirements, whereas Java 11 and later need JAXB added as a dependency):

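// JAXB (javax.xml.bind) ships with Java 8; since Java 11 it is no longer
// part of the JDK and must be added as a dependency.
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.transform.stream.StreamSource;

@XmlRootElement
// The fields are package-private; without FIELD access JAXB would bind only
// public members and the XML file would stay empty.
@XmlAccessorType(XmlAccessType.FIELD)
class DataStorage {
    String emailAddress;
    // Initialized so a freshly created DataStorage can be modified without a NullPointerException.
    List<String> familyMembers = new ArrayList<>();
    // List<Address> addresses;
}

public class Application {

    private static JAXBContext jc;
    private static File storageLocation = new File("data.xml");

    public static void main(String[] args) throws Exception {
        jc = JAXBContext.newInstance(DataStorage.class);

        DataStorage dataStorage = load();

        // the main application will be executed here

        // data manipulation like this:
        dataStorage.emailAddress = "me@example.com";
        dataStorage.familyMembers.add("Mike");

        save(dataStorage);
    }

    protected static DataStorage load() throws JAXBException {
        if (storageLocation.exists()) {
            StreamSource source = new StreamSource(storageLocation);
            return (DataStorage) jc.createUnmarshaller().unmarshal(source);
        }
        // no file yet: start with empty data
        return new DataStorage();
    }

    protected static void save(DataStorage dataStorage) throws JAXBException {
        // writes directly over data.xml, i.e. the non-atomic write discussed below
        jc.createMarshaller().marshal(dataStorage, storageLocation);
    }
}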

How can I overcome these downsides?

  • Starting the application multiple times could lead to inconsistencies: several users could run the application from a network drive and run into concurrency issues
  • Aborting the write process might lead to corrupted data or losing all data

5 answers

#1


2  

To answer your three issues you mentioned:

Starting the application multiple times could lead to inconsistencies

Why would it lead to inconsistencies? If you mean that multiple concurrent edits will lead to inconsistencies, you just have to lock the file before editing. The easiest way is to create a lock file next to the data file. Before starting an edit, check whether the lock file exists.

If you want to make it more fault tolerant, you could also put a timeout on the lock, e.g. a lock file is valid for 10 minutes. You could write a randomly generated UUID into the lock file and, before saving, check whether the UUID still matches.
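
The lock file with a UUID and a timeout could look roughly like the following Java sketch. The file name data.xml.lock, the 10-minute timeout and the class and method names are assumptions for illustration. It creates the lock with StandardOpenOption.CREATE_NEW, so checking for the lock and creating it happen as one atomic step rather than two. Breaking a stale lock is itself racy; the UUID check right before saving is what catches that case. On a network drive the modification time comes from the file server, so clock differences between machines can shift the effective timeout.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;

// Sketch: a lock file holding a random owner token (UUID) that expires after a timeout.
// The file name and the timeout value are assumptions.
class LockFile {
    static final Duration TIMEOUT = Duration.ofMinutes(10);

    // Returns this instance's token if the lock was taken, or null if another
    // instance holds a lock that has not expired yet.
    static String tryAcquire(Path lock) throws IOException {
        try {
            Instant modified = Files.getLastModifiedTime(lock).toInstant();
            if (modified.plus(TIMEOUT).isAfter(Instant.now())) {
                return null; // lock is still valid
            }
            Files.deleteIfExists(lock); // stale lock: break it
        } catch (NoSuchFileException e) {
            // no lock file: nothing to break
        }
        String token = UUID.randomUUID().toString();
        try {
            // CREATE_NEW checks for existence and creates the file atomically,
            // so two instances cannot both create the lock.
            Files.write(lock, token.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return token;
        } catch (FileAlreadyExistsException e) {
            return null; // another instance was faster
        }
    }

    // Call right before saving: if our lock expired and was re-taken by another
    // instance, the token no longer matches and we must not save.
    static boolean stillOwned(Path lock, String token) throws IOException {
        try {
            return new String(Files.readAllBytes(lock), StandardCharsets.UTF_8).equals(token);
        } catch (NoSuchFileException e) {
            return false;
        }
    }

    static void release(Path lock, String token) throws IOException {
        if (stillOwned(lock, token)) {
            Files.deleteIfExists(lock);
        }
    }

    public static void main(String[] args) throws IOException {
        Path lock = Paths.get("data.xml.lock");
        String token = tryAcquire(lock);
        if (token == null) {
            System.out.println("data.xml is being edited by another instance");
            return;
        }
        try {
            // ... load data.xml and let the user edit ...
            if (stillOwned(lock, token)) {
                // ... save data.xml ...
            }
        } finally {
            release(lock, token);
        }
    }
}
```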

Several users could run the application on a network drive and experience concurrency issues

I think this is the same as number 1.

Aborting the write process might lead to corrupted data or losing all data

This can be solved by making the write atomic or the file immutable. To make it atomic, don't edit the file directly: copy the file and edit the copy. After the copy is saved, rename it so that it replaces the original. If you want to be on the safer side, you could instead append a timestamp to the file name and never edit or delete a file. Every time an edit is made, you create a new copy with a newer timestamp in its name, and for reading you always read the newest one.
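
A minimal sketch of the atomic "edit a copy, then rename" approach in Java, assuming the data is already serialized to bytes (for example by marshalling the JAXB object into a ByteArrayOutputStream). The class name and the ".tmp" suffix are hypothetical. Because the temporary name is fixed, it assumes only one instance saves at a time, for example while holding a lock.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Sketch: write the new content to a sibling temp file, then rename it over the target.
class AtomicSave {
    static void save(Path target, byte[] content) throws IOException {
        // Same directory means same file system, which the atomic rename requires.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(content);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // flush to disk before the rename makes the new data visible
        }
        try {
            // On Linux, macOS and Windows this replaces an existing target, but the
            // Files.move documentation calls replacement under ATOMIC_MOVE implementation-specific.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback when the file system cannot rename atomically (no longer all-or-nothing).
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        save(Paths.get("data.xml"), "<dataStorage/>".getBytes(StandardCharsets.UTF_8));
    }
}
```

When the atomic rename is available, a reader opening data.xml sees either the complete old file or the complete new one, and an aborted save leaves at most a stray data.xml.tmp behind.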

#2


7  

Looking at your requirements:

  • Starting the application multiple times
  • Several users could run the application on a network drive
  • Protection against data corruption

I believe that storage based on an XML file will not be sufficient. If you consider a full relational database overkill, you could still go for H2. It is a very lightweight database that would solve all the problems above (perhaps not perfectly, but certainly much better than a handwritten XML database), and it is still very easy to set up and maintain.

You can configure it to persist your changes to disk, run it as a standalone server that accepts multiple connections, or run it embedded as part of your application.

Regarding the "How do you save the data" part:

If you do not want to use an advanced ORM library (like Hibernate or another JPA implementation), you can still use plain old JDBC, or Spring JDBC, which is very lightweight and easy to use.

"What do you save"

H2 is a relational database, so whatever you save will end up in columns. But if you really do not plan to query your data (or apply migration scripts to it), saving your already XML-serialized objects is an option. You can define a table with an ID column and a "data" VARCHAR column and store your XML there. The length of the XML should not be a problem in H2 (for very large documents, a CLOB column is the usual choice).

Note: Saving XML in a relational database is generally not a good idea. I am only advising you to evaluate this option, because you seem confident that you only need a certain set of features from what an SQL implementation can provide.

#3


3  

Inconsistency and concurrency problems are usually handled in one of two ways:

  • by locking
  • by versioning

Corrupted writes cannot be handled very well at the application level. The file system should support journaling, which mitigates the problem to some extent. You can also achieve this yourself by

  • making your own journaling file (i.e. a short-lived separate file containing changes to be committed to the real data file).

All of these features are available even in the simplest relational databases, e.g. H2 or SQLite, and even a web page can use such features in HTML5. Reimplementing them from scratch is overkill, and implementing the data storage layer properly will actually turn your simple needs into something quite complicated.

But, just for the record:

Concurrency handling with locks

  • before you start changing the XML, use a file lock to gain exclusive access to the file; see also the question "How can I lock a file using java (if possible)"
  • once the update is done and you have successfully closed the file, release the lock
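
In Java, such a lock can be taken with FileChannel.lock() from java.nio. The following sketch (class and method names are hypothetical) holds an exclusive lock for the whole read-modify-write cycle. Java file locks coordinate separate processes, not threads inside one JVM, and how reliably they work on a network drive depends on the file system and the server.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.function.UnaryOperator;

// Sketch: hold an exclusive FileLock for the whole read-modify-write cycle.
class LockedUpdate {
    static void update(Path file, UnaryOperator<String> change) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock lock = ch.lock()) { // blocks until no other process holds a lock on the file
            String updated = change.apply(readAll(ch));
            ch.truncate(0);
            ch.position(0);
            ByteBuffer buf = ByteBuffer.wrap(updated.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);
        } // try-with-resources releases the lock, then closes the channel
    }

    static String readAll(FileChannel ch) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
        ch.position(0);
        while (buf.hasRemaining() && ch.read(buf) >= 0) {
            // keep reading until the buffer is full or the end of the file is reached
        }
        return new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Example: increment a counter stored in a text file.
        update(Paths.get("counter.txt"),
                s -> String.valueOf(s.isEmpty() ? 1 : Integer.parseInt(s.trim()) + 1));
    }
}
```

Note that the file is still rewritten in place, so a crash in the middle of the write can leave it half-written: the lock addresses concurrency, not corruption.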

Consistency (atomicity) handling with locks

  • Other application instances may still try to read the file while one of the apps is writing it. This can cause inconsistency (a so-called dirty read). Ensure that during writing, the writer process holds an exclusive lock on the file. If it cannot obtain an exclusive lock, the writer has to wait a bit and retry.

  • An application reading the file should read it only when it can gain access (i.e. no other instance holds an exclusive lock), and then close the file. If reading is not possible because another instance holds the lock, wait and retry.

  • An external application (e.g. Notepad) can still change the XML. You may prefer to take an exclusive lock while reading the file as well.
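
The reader side could use a shared lock with a wait-and-retry loop, as in this Java sketch; the class name, the number of attempts and the delay are assumptions. FileChannel.tryLock returns null instead of blocking when another process holds a conflicting lock. On systems without shared locks, Java requests an exclusive lock instead, and whether a lock stops a program like Notepad is platform-dependent (Windows typically enforces locks, most Unix systems treat them as advisory). Taking an exclusive lock while reading, as the last point suggests, would additionally require opening the channel for writing.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Sketch of a reader: read under a shared lock, wait and retry while a writer holds an exclusive lock.
class LockedRead {
    static String read(Path file, int maxAttempts, long delayMillis)
            throws IOException, InterruptedException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                // shared = true: other readers may hold the lock at the same time, writers may not
                try (FileLock lock = ch.tryLock(0, Long.MAX_VALUE, true)) {
                    if (lock != null) {
                        return readAll(ch);
                    }
                } // a null resource is simply not closed
                Thread.sleep(delayMillis); // a writer holds the lock: wait and retry
            }
        }
        throw new IOException("could not get a read lock on " + file);
    }

    static String readAll(FileChannel ch) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
        ch.position(0);
        while (buf.hasRemaining() && ch.read(buf) >= 0) {
            // keep reading until the buffer is full or the end of the file is reached
        }
        return new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(read(Paths.get("data.xml"), 10, 200));
    }
}
```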

Basic journaling

The idea here is that if you need to do a lot of writes (or might later want to roll back your writes), you don't want to touch the real file. Instead:

  • changes are written to a separate journaling file, created and locked by your app instance

  • your app instance does not lock the main file; it locks only the journaling file

  • once all the writes are ready, your app opens the real file with an exclusive write lock, applies every change recorded in the journaling file, and then closes the file.
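
A very small journaling sketch in Java, under clearly hypothetical assumptions: the "real" data is a plain text file with one family member per line instead of the XML, and each change is appended to a separate ".journal" file as "ADD name" or "REMOVE name". The class and method names are invented for illustration. Locking of the journal and the main file, as described above, is left out for brevity, and a torn last journal line is not detected.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical journaling sketch: pending changes go to "<file>.journal", the real file is
// only touched by commit(). Locking is omitted.
class Journal {
    private final Path dataFile;
    private final Path journalFile;

    Journal(Path dataFile) {
        this.dataFile = dataFile;
        this.journalFile = dataFile.resolveSibling(dataFile.getFileName() + ".journal");
    }

    // Records a change without touching the real file.
    void record(String op, String name) throws IOException {
        Files.write(journalFile, Collections.singletonList(op + " " + name), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Applies every journaled change to the real file, then discards the journal.
    void commit() throws IOException {
        if (!Files.exists(journalFile)) {
            return;
        }
        Set<String> members = read();
        for (String entry : Files.readAllLines(journalFile, StandardCharsets.UTF_8)) {
            int space = entry.indexOf(' ');
            if (space < 0) {
                continue; // ignore malformed lines
            }
            String op = entry.substring(0, space);
            String name = entry.substring(space + 1);
            if (op.equals("ADD")) {
                members.add(name);
            } else if (op.equals("REMOVE")) {
                members.remove(name);
            }
        }
        // Replace the real file in one step, so a crash leaves either the old or the new file.
        Path tmp = dataFile.resolveSibling(dataFile.getFileName() + ".tmp");
        Files.write(tmp, members, StandardCharsets.UTF_8);
        Files.move(tmp, dataFile, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
        Files.delete(journalFile);
    }

    // Rolling back just means discarding the journal: the real file was never touched.
    void rollback() throws IOException {
        Files.deleteIfExists(journalFile);
    }

    Set<String> read() throws IOException {
        Set<String> members = new LinkedHashSet<>();
        if (Files.exists(dataFile)) {
            members.addAll(Files.readAllLines(dataFile, StandardCharsets.UTF_8));
        }
        return members;
    }

    public static void main(String[] args) throws IOException {
        Journal journal = new Journal(Paths.get("family.txt"));
        journal.record("ADD", "Mike");
        journal.commit();
        System.out.println(journal.read());
    }
}
```

Because ADD and REMOVE on a set are idempotent, calling commit() again after a crash between the rename and the journal deletion produces the same file content, which is what makes replaying the journal safe in this sketch.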

As you can see, the lock-based solution turns the file into a shared resource that is protected by locks, so only one application can access it at a time. This solves the concurrency issues, but it also makes file access a bottleneck. That is why modern databases such as Oracle use versioning instead of locking for reads. Versioning means that the old and the new version of the file are available at the same time. Readers are served from the old, complete version. Once the new version has been completely written, it is merged into the old version and the new data becomes available at once. This is trickier to implement, but since all applications can read in parallel at any time, it scales much better.

#4


2  

Note that your simple approach won't handle concurrent writes by different instances. If two instances make changes and save, simply picking the newest file will lose the changes from the other instance. As mentioned in other answers, you should probably use file locking for this.

A relatively simple solution:

  • Use a separate lock file, "data.xml.lck", and lock it while writing the data file.
  • As mentioned in my comment, write to a temporary file "data.xml.tmp" first, then rename it to the final name "data.xml" when the write is complete. This gives reasonable assurance that anyone reading the file will get a complete file.
  • Even with file locking, you still have to handle the "merge" problem (one instance reads, another writes, then the first wants to write). To handle this, keep a version number in the file content. When an instance wants to write, it first acquires the lock, then compares its local version number with the version number in the file. If it is out of date, it must merge what is in the file with its local changes. Then it can write a new version.
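
These three steps could be combined roughly as in the following Java sketch. It rests on assumptions: the file format (first line is the version number, the rest is the payload such as the XML), the class and method names, and the merge function are all hypothetical, and a real merge is application-specific. The lock is a FileLock on the separate data.xml.lck file, and the new version is written to data.xml.tmp and then renamed over data.xml.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.function.BinaryOperator;

// Sketch: lock file + temp-file rename + version check. Hypothetical format: "<version>\n<payload>".
class VersionedStore {
    static final class Snapshot {
        final long version;
        final String payload;

        Snapshot(long version, String payload) {
            this.version = version;
            this.payload = payload;
        }
    }

    static Snapshot read(Path data) throws IOException {
        if (!Files.exists(data)) {
            return new Snapshot(0, "");
        }
        String text = new String(Files.readAllBytes(data), StandardCharsets.UTF_8);
        int newline = text.indexOf('\n');
        return new Snapshot(Long.parseLong(text.substring(0, newline)), text.substring(newline + 1));
    }

    // Saves localPayload, which was derived from version baseVersion. If another instance has
    // saved in the meantime, merge(filePayload, localPayload) combines both first.
    // Returns the version number that was written.
    static long save(Path data, long baseVersion, String localPayload,
                     BinaryOperator<String> merge) throws IOException {
        Path lockPath = data.resolveSibling(data.getFileName() + ".lck");
        Path tmp = data.resolveSibling(data.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(lockPath, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = ch.lock()) { // only one writer at a time
            Snapshot current = read(data);
            String payload = current.version == baseVersion
                    ? localPayload
                    : merge.apply(current.payload, localPayload); // out of date: merge first
            long newVersion = current.version + 1;
            Files.write(tmp, (newVersion + "\n" + payload).getBytes(StandardCharsets.UTF_8));
            // Readers see either the complete old file or the complete new one.
            Files.move(tmp, data, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
            return newVersion;
        }
    }

    public static void main(String[] args) throws IOException {
        Path data = Files.createTempDirectory("store").resolve("data.xml");
        BinaryOperator<String> appendLines = (fromFile, local) -> fromFile + "\n" + local; // hypothetical merge
        // Two instances both read version 0 and each save one change.
        save(data, 0, "Mike", appendLines); // up to date: written as version 1
        save(data, 0, "Anna", appendLines); // out of date: merged, written as version 2
        System.out.println(read(data).payload);
    }
}
```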

#5


0  

After thinking about it for a while, I would want to try to implement it like this:

  • Open the data.<timestamp>.xml file with the latest timestamp.
  • Open it in read-only mode only.
  • Make changes.
  • Save the result as a new data.<timestamp>.xml file: never overwrite an existing file, and check that no file with a newer timestamp exists.
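
The timestamp scheme could be sketched in Java like this; the class and method names are hypothetical, and the file names follow the data.<timestamp>.xml pattern above. StandardOpenOption.CREATE_NEW guarantees that an existing file is never overwritten. The check for a newer file and the write are still two separate steps, so two instances saving at almost the same moment can both succeed and one change is lost, which is the problem answer #4 points out; a crash in the middle of the write can also leave a truncated newest file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: every save creates a new data.<timestamp>.xml; existing files are never overwritten.
class TimestampedStore {
    private static final Pattern NAME = Pattern.compile("data\\.(\\d+)\\.xml");

    // Highest timestamp among data.<timestamp>.xml files in dir, or -1 if there are none.
    static long latest(Path dir) throws IOException {
        long max = -1;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "data.*.xml")) {
            for (Path p : files) {
                Matcher m = NAME.matcher(p.getFileName().toString());
                if (m.matches()) {
                    max = Math.max(max, Long.parseLong(m.group(1)));
                }
            }
        }
        return max;
    }

    static String load(Path dir, long timestamp) throws IOException {
        byte[] bytes = Files.readAllBytes(dir.resolve("data." + timestamp + ".xml"));
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Saves xml as a new version. basedOn is the timestamp of the file the changes were made on;
    // returns false without writing if a newer file already exists.
    static boolean save(Path dir, long basedOn, String xml) throws IOException {
        if (latest(dir) != basedOn) {
            return false; // another instance saved in the meantime
        }
        long timestamp = Math.max(System.currentTimeMillis(), basedOn + 1);
        try {
            // CREATE_NEW fails instead of overwriting an existing file.
            Files.write(dir.resolve("data." + timestamp + ".xml"),
                    xml.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(".");
        long base = latest(dir);
        String xml = base < 0 ? "<dataStorage/>" : load(dir, base);
        // ... make changes to xml ...
        if (!save(dir, base, xml)) {
            System.out.println("a newer version exists; reload and reapply the changes");
        }
    }
}
```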
