I have a text file that is being written to as part of a very large data extract. The first line of the text file is the number of "accounts" extracted.
Because of the nature of this extract, that number is not known until the very end of the process, but the file can be large (a few hundred megs).
What is the BEST way in C# / .NET to open a file (in this case a simple text file), and replace the data that is in the first "line" of text?
IMPORTANT NOTE: - I do not need to replace a "fixed amount of bytes" - that would be easy. The problem here is that the data that needs to be inserted at the top of the file is variable.
IMPORTANT NOTE 2: - A few people have asked about / mentioned simply keeping the data in memory and then replacing it... however, that's completely out of the question. This process is being updated precisely because it sometimes crashes when loading a few gigs into memory.
6 Answers
#1
If you can, insert a placeholder that you overwrite at the end with the actual number, padded with spaces.
If that is not an option, write your data to a cache file first. When you know the actual number, create the output file, write the number, and append the data from the cache.
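The cache-file variant could be sketched roughly as follows. The class `HeaderPrepender`, the method `WriteWithHeader`, and its parameters are hypothetical names chosen for illustration. The sketch assumes the header is a single line of UTF-8 text and that the cache file was written without a byte order mark. `Stream.CopyTo` copies through a fixed-size buffer, so the data is never held in memory all at once.

```csharp
using System;
using System.IO;
using System.Text;

public static class HeaderPrepender
{
    // Hypothetical helper: writes headerLine to outputPath, then streams the
    // contents of cachePath after it.
    public static void WriteWithHeader(string cachePath, string outputPath, string headerLine)
    {
        using (FileStream output = File.Create(outputPath))
        {
            // Encode the header without a byte order mark (assumption: plain UTF-8 text).
            byte[] header = new UTF8Encoding(false).GetBytes(headerLine + Environment.NewLine);
            output.Write(header, 0, header.Length);

            using (FileStream cache = File.OpenRead(cachePath))
            {
                // CopyTo reads and writes in buffer-sized chunks.
                cache.CopyTo(output);
            }
        }
    }
}
```

Once the copy has succeeded, the cache file can be deleted with `File.Delete`. The cost of this approach is that the data is written to disk twice.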
#2
BEST is very subjective. For any smallish file, you can easily open the entire file in memory and replace what you want using a string replace and then re-write the file.
Even largish files would not be that hard to load into memory. In these days of multi-gigabyte memory, I would consider hundreds of megabytes still easy to handle in memory.
Have you tested this naive approach? Have you seen a real issue with it?
If this is a really large file (gigabytes in size), I would consider writing all of the data first to a temp file and then write the correct file with the header line going in first and then appending the rest of the data. Since it is only text, I would probably just shell out to DOS:
TYPE temp.txt >> outfile.txt
#3
I do not need to replace a "fixed amount of bytes"
Are you sure? If you write a big number to the first line of the file (UInt32.MaxValue or UInt64.MaxValue), then once you know the actual number you can overwrite those bytes with it, left-padded with zeros so that it is still a valid integer of the same width. For example:
Replace 999999 - your "large number placeholder"
With 000100 - the actual number of accounts
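A minimal sketch of that in-place overwrite, assuming the placeholder occupies the first 10 bytes of the file, the file is ASCII-compatible, and it has no byte order mark. The class `CountHeader`, its `Width` constant, and the method names are hypothetical. Ten digits is enough to hold UInt32.MaxValue (4294967295).

```csharp
using System;
using System.IO;
using System.Text;

public static class CountHeader
{
    // Hypothetical fixed width of the count field, in digits (and bytes).
    public const int Width = 10;

    // Formats the count left-padded with zeros to exactly Width digits.
    public static string Format(uint count)
    {
        return count.ToString("D" + Width);
    }

    // Overwrites the first Width bytes of an existing file with the padded count.
    // FileMode.Open does not truncate, so the rest of the file is left untouched.
    public static void OverwriteCount(string path, uint count)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(Format(count));
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.Seek(0, SeekOrigin.Begin);
            fs.Write(bytes, 0, bytes.Length);
        }
    }
}
```

Because the replacement has exactly the same length as the placeholder, nothing after the first line has to move, however large the file is.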
#4
If I understand the question correctly, you are asking:
What is the BEST way in C# / .NET to open a file (in this case a simple text file), and replace the data that is in the first "line" of text?
How about placing a token, {UserCount}, at the top of the file when it is first created?
Then use TextReader to read the file line by line. If it is the first line, look for {UserCount} and replace it with your value. Write out each line you read using TextWriter.
Example:
    int lineNumber = 1;
    int userCount = 1234;
    string line = null;
    using (TextReader tr = File.OpenText("OriginalFile"))
    using (TextWriter tw = File.CreateText("ResultFile"))
    {
        while ((line = tr.ReadLine()) != null)
        {
            if (lineNumber == 1)
            {
                line = line.Replace("{UserCount}", userCount.ToString());
            }
            tw.WriteLine(line);
            lineNumber++;
        }
    }
#5
If the extracted file is only a few hundred megabytes, then you can easily keep all of the text in-memory until the extraction is complete. Then, you can write your output file as the last operation, starting with the record count.
#6
OK, earlier I suggested an approach that would be better when dealing with existing files.
However, in your situation you want to create the file and, during the create process, go back to the top and write out the user count. This will do just that.
Here is one way to do it that avoids having to write a temporary file.
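    using System.IO;
    using System.Text;

    private void WriteUsers()
    {
        // The header is reserved at a fixed width so that overwriting it later
        // cannot run into the records that follow it.
        const string prefix = "User Count: ";
        const int countWidth = 10; // enough digits for any non-negative int
        ASCIIEncoding enc = new ASCIIEncoding();
        int userCounter = 0;
        using (StreamWriter sw = File.CreateText("myfile.txt"))
        {
            // Write a placeholder line.
            // Note this line will later contain our user count.
            sw.WriteLine(prefix + new string(' ', countWidth));
            // Write out the records and keep track of the count
            for (int i = 1; i < 100; i++)
            {
                sw.WriteLine("User" + i);
                userCounter++;
            }
            // Flush buffered text before seeking; otherwise it would be
            // written at the new position and corrupt the file.
            sw.Flush();
            // Get the base stream and set the position to 0
            sw.BaseStream.Position = 0;
            // Pad the count so the header has the same byte length as the placeholder.
            string userCountString = prefix + userCounter.ToString().PadRight(countWidth);
            byte[] userCountBytes = enc.GetBytes(userCountString);
            sw.BaseStream.Write(userCountBytes, 0, userCountBytes.Length);
        }
    }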