My application has to read data stored in a file and load the values into variables or arrays to work on them.
My question is: which file format will be fast and easy for retrieving data from the file?
I was thinking of using .xml, .ini, or just a simple .txt file. But to read a .txt file I will have to write a lot of code with many if/else conditions.
I don't know how to use .ini and .xml. But if they are better and faster, I'll learn them first and then use them. Kindly guide me.
5 Solutions
#1
4
I will assume that, for you, raw performance is less of a priority than the robustness of the system.
For simple data consisting of values paired with names, an INI file would probably be the simplest solution. More complex structured data would lead you toward XML. According to a previously asked question, if you are working in C# (and hence, presumably, in .NET), XML is generally preferred because support for it is built into the .NET libraries. As XML is more flexible and can change with the needs of the program, I would also personally recommend XML over INI as a file standard. It will take more work to learn the XML library, but it will quickly pay off, and XML is a standardized system.
Text could be fast, but you would either be sacrificing a great deal of robust parsing behavior for the sake of speed, or spending far more man-hours developing and maintaining a high-speed specialized parser.
For references on reading XML files (natively supported in the .NET libraries):
- MSDN XMLTextReader Article
- MSDN XMLReader Article
- Writing Data to XML with XMLSerializer
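As a rough illustration of the XML route, here is a minimal sketch that round-trips an object through XmlSerializer (from System.Xml.Serialization). The AppSettings class, its properties, and the file name settings.xml are hypothetical stand-ins for whatever your application actually stores.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical container for the values the application needs.
public class AppSettings
{
    public string Name { get; set; } = "";
    public int[] Values { get; set; } = Array.Empty<int>();
}

public static class XmlSettingsDemo
{
    // Writes the object to an XML file, replacing any existing file.
    public static void Save(AppSettings settings, string path)
    {
        var serializer = new XmlSerializer(typeof(AppSettings));
        using (var stream = new FileStream(path, FileMode.Create))
        {
            serializer.Serialize(stream, settings);
        }
    }

    // Reads the XML file back into a new AppSettings instance.
    public static AppSettings Load(string path)
    {
        var serializer = new XmlSerializer(typeof(AppSettings));
        using (var stream = new FileStream(path, FileMode.Open))
        {
            return (AppSettings)serializer.Deserialize(stream);
        }
    }

    public static void Main()
    {
        Save(new AppSettings { Name = "demo", Values = new[] { 1, 2, 3 } }, "settings.xml");
        AppSettings loaded = Load("settings.xml");
        Console.WriteLine(loaded.Name + ": " + string.Join(",", loaded.Values));
    }
}
```

The serializer only handles public properties and fields of a public type with a parameterless constructor, which is why AppSettings is written the way it is.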
For references on reading INI files (not natively supported in the .NET libraries):
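Because the .NET libraries have no built-in INI reader, the parsing has to be done by hand or by a third-party library. The following is only a sketch of a hand-rolled reader. It assumes the common INI conventions of [section] headers, key=value lines, and comment lines starting with ; or #. The IniReader class and its Parse method are hypothetical names, not an existing API.

```csharp
using System;
using System.Collections.Generic;

public static class IniReader
{
    // Returns section -> (key -> value). Keys that appear before any [section] go under "".
    public static Dictionary<string, Dictionary<string, string>> Parse(IEnumerable<string> lines)
    {
        var result = new Dictionary<string, Dictionary<string, string>>(StringComparer.OrdinalIgnoreCase);
        string section = "";
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0 || line[0] == ';' || line[0] == '#')
                continue;                                   // blank line or comment
            if (line[0] == '[' && line[line.Length - 1] == ']')
            {
                section = line.Substring(1, line.Length - 2).Trim();
                continue;
            }
            int eq = line.IndexOf('=');
            if (eq <= 0)
                continue;                                   // not a key=value line; skip it
            if (!result.TryGetValue(section, out var keys))
            {
                keys = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
                result[section] = keys;
            }
            keys[line.Substring(0, eq).Trim()] = line.Substring(eq + 1).Trim();
        }
        return result;
    }

    public static void Main()
    {
        var ini = Parse(new[] { "; sample", "[window]", "width = 800", "height=600" });
        Console.WriteLine(ini["window"]["width"]);          // prints 800
    }
}
```

To read a real file, pass File.ReadLines(path) from System.IO as the argument. All values come back as strings, so numeric settings still need int.Parse or similar.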
#2
4
If it is tabular data, then it is probably faster to just use CSV (comma-separated values) files.
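For a sense of how little code a simple CSV reader needs, here is a sketch that turns lines of comma-separated numbers into rows of doubles. It assumes there is no header row and that no field is quoted or contains a comma; real-world CSV often breaks those assumptions. The CsvReader class and ParseRows method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CsvReader
{
    // Parses rows of comma-separated numbers; assumes no quoting and no header row.
    public static List<double[]> ParseRows(IEnumerable<string> lines)
    {
        var rows = new List<double[]>();
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue;                                   // skip empty lines
            string[] cells = line.Split(',');
            var row = new double[cells.Length];
            for (int i = 0; i < cells.Length; i++)
                row[i] = double.Parse(cells[i], CultureInfo.InvariantCulture);
            rows.Add(row);
        }
        return rows;
    }

    public static void Main()
    {
        List<double[]> rows = ParseRows(new[] { "1,2.5,3", "", "4,5,6" });
        Console.WriteLine(rows.Count);                      // prints 2
    }
}
```

Parsing with CultureInfo.InvariantCulture keeps "2.5" meaning two and a half even on machines whose regional settings use a decimal comma.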
If it is structured data (like a tree), then you can use the XML parser in C#, which is fast (but will take some learning effort on your part).
If the data is like a dictionary, then INI will be a better option. It really depends on the type of data in your application.
Or, if you don't mind an RDBMS, that would be a better option. Usually, a good RDBMS is optimized to handle large amounts of data and read it really quickly.
#3
1
If you don't mind having a binary file (one that people can't read and modify themselves), the fastest option would be serializing an array of numbers to a file and deserializing it from the file.
The file will be smaller because the data is stored more efficiently, requiring fewer I/O operations to read it. It will also require minimal parsing (really minimal), so reading will be lightning fast.
Suppose your numbers are located here:
int[] numbers = ..... ;
You save them to a file with this code:
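// Needs: using System.IO; using System.Runtime.Serialization.Formatters.Binary;
using (var file = new FileStream(filename, FileMode.Create))
{
    var formatter = new BinaryFormatter();
    // Serialize takes the stream first, then the object to write.
    formatter.Serialize(file, numbers);
}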
To read the data from the file, you open it and then use:
numbers = (int[])formatter.Deserialize(file);
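One caveat: BinaryFormatter is marked obsolete in .NET 5 and later, because deserializing untrusted data with it is unsafe. A possible alternative that keeps the same "small binary file, almost no parsing" idea is to write the array yourself with BinaryWriter and read it back with BinaryReader. This is only a sketch, not the method from the answer above; the IntArrayFile class, its file layout (a length prefix followed by the values), and the file name numbers.bin are assumptions made for illustration.

```csharp
using System;
using System.IO;

public static class IntArrayFile
{
    // Layout: one Int32 element count, then each element as a 4-byte little-endian Int32.
    public static void Save(string path, int[] numbers)
    {
        using (var writer = new BinaryWriter(File.Open(path, FileMode.Create)))
        {
            writer.Write(numbers.Length);
            foreach (int n in numbers)
                writer.Write(n);
        }
    }

    // Reads the count first, then exactly that many values.
    public static int[] Load(string path)
    {
        using (var reader = new BinaryReader(File.Open(path, FileMode.Open)))
        {
            var numbers = new int[reader.ReadInt32()];
            for (int i = 0; i < numbers.Length; i++)
                numbers[i] = reader.ReadInt32();
            return numbers;
        }
    }

    public static void Main()
    {
        Save("numbers.bin", new[] { 10, 20, 30 });
        Console.WriteLine(string.Join(",", Load("numbers.bin")));   // prints 10,20,30
    }
}
```

The trade-off is the same as with any binary format: the file is compact and quick to load, but it cannot be inspected or edited in a text editor, and the reader must agree exactly with the writer on the layout.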
#4
1
I think that @Ian T. Small addressed the difference between the file types well.
Given @Shaharyar's responses to @Aniket, I just wanted to add to the DBMS conversation, bearing in mind the limited information we have about the scope.
Will the data set grow? How many entries constitute "many fields"?
I agree that an RDBMS (relational DBMS) is a potential solution for a large data set. The next question is: what is a large data set?
When (and which) a DBMS is a good idea
When @Shaharyar says "many fields", are we talking tens or hundreds of fields?
=> 10-20 fields wouldn't necessitate the overhead (install size, CRUD code, etc.) of an RDBMS. XML serialization of the object is far simpler.
=> If there is an indeterminate number of fields (i.e. the number of fields increases over time), he needs ACID compliance, or he has hundreds of fields, then I'd say @Aniket is spot on.
@Matt's suggestion of NoSQL is also great. It will provide high throughput (far more than required for an update every few seconds) and simplified serialization/deserialization.
The only downside I see here is application size/configuration. (Even the lightweight, easy-to-configure MongoDB will add tens of MB for the DBMS facilities and driver. Not ideal for a small < 1 MB application meant for fast, easy distribution.) Oh, and @Shaharyar, if you do require ACID compliance, please be sure to check the database first. MongoDB versions before 4.0, for example, do not offer multi-document ACID transactions. That is not to say you will ever lose data; there are just no guarantees.
Another Option - No DBMS but increased throughput
The last suggestion I'd like to make will require a little code (specifically, an object to act as a buffer).
If
1. the data set is small (tens, not hundreds),
2. the number of fields is fixed,
3. there is no requirement for ACID compliance, and
4. you're concerned about increased transaction loads (i.e. lots of updates per second),
then you can also just cache changes in a datastore object and flush them on program close, or via a timer every 'n' seconds/minutes/etc.
Per @Ian T. Small's post, we would use the native XML class serialization built into the .NET Framework.
The following is just an oversimplified sketch, but it should give you an idea:
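using System.Timers;
public class FieldContainer
{
    private bool ChangeMade;   // set to true whenever a field changes
    private readonly Timer timer = new Timer(5 * 60 * 1000);   // interval in ms (5 minutes)
    public FieldContainer()
    {
        timer.Elapsed += OnTimerTick;
        timer.Start();
    }
    private void OnTimerTick(object sender, ElapsedEventArgs e)
    {
        if (!ChangeMade) return;   // nothing to flush
        UpdateXMLFlatFile();
        ChangeMade = false;
    }
    private void UpdateXMLFlatFile() { /* serialize the fields to the XML file here */ }
}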
#5
0
How fast does it need to be?
txt will be the fastest option, but you have to program the parser yourself. (Speed does come at a cost.)
XML is probably the easiest to implement, as you have XmlSerializer (or other classes) to do the hard work.
For small configuration files (~0.5 MB and smaller) you won't be able to tell any difference in speed. When it comes to really big files, txt with a custom file format is probably the way to go. However, you can always go either way: look at projects like OpenStreetMap, which have huge XML files (> 10 GB) that are still usable.