[Repost] Reading and Searching Large Files in C#

Date: 2021-09-22 03:21:12

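I recently needed to search log files quickly. The files are larger than 4 GB.

The requirements are:

1. Read a specified line from a file of about 4 GB while the program uses no more than 500 MB of memory.

2. Searching through up to 1 GB of content should take about 20 s.

At first this did not seem hard. After a day of research, I found that it calls for memory-mapped files.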

After going through the relevant material, I found that it is still fairly complex, especially the character handling.
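One source of that complexity is that a fixed-size chunk of bytes can end in the middle of a multi-byte character. The sketch below is one way to deal with this, assuming the file is UTF-8. It uses a single `System.Text.Decoder`, which keeps the bytes of an incomplete character and completes it on the next call. The class name `ChunkedDecoding` and the method `DecodeInChunks` are made up for illustration.

```csharp
using System;
using System.Text;

public static class ChunkedDecoding
{
    // Decodes UTF-8 bytes that are consumed in fixed-size chunks.
    // One Decoder instance is reused, so a character whose bytes are split
    // across two chunks is carried over instead of being corrupted.
    public static string DecodeInChunks(byte[] data, int chunkSize)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(chunkSize)];
        StringBuilder sb = new StringBuilder();

        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            bool flush = offset + count >= data.Length; // last chunk: emit any leftover state
            int n = decoder.GetChars(data, offset, count, chars, 0, flush);
            sb.Append(chars, 0, n);
        }
        return sb.ToString();
    }
}
```

Decoding each chunk independently, for example with a new `StreamReader` per chunk, would instead produce replacement characters wherever a character straddles a chunk boundary.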

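I wrote a demo of my own in an attempt to meet these requirements.

Unfortunately, in testing, a query over about 1 GB of content took about 100 s.

The program is as follows: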

    using System;
    using System.Diagnostics;
    using System.IO;
    using System.IO.MemoryMappedFiles;
    using System.Text;

    namespace ConsoleDemo
    {
        class Program
        {
            private const string TXT_FILE_PATH = @"E:\开源学习\超大文本文件读取\File\a.txt";
            // Sentinel that every line break is normalized to; assumed not to occur in the log text.
            private const string SPLIT_VARCHAR = "囧";
            private const char SPLIT_CHAR = '囧';

            static void Main(string[] args)
            {
                //long ttargetRowNum = 39999999;
                long ttargetRowNum = 10000000; // 1-based number of the line to fetch
                Stopwatch watch = Stopwatch.StartNew();
                string line = CreateMemoryMapFile(ttargetRowNum);
                watch.Stop();
                Console.WriteLine(line);
                Console.WriteLine(string.Format("Looked up line {0}; elapsed: {1}s", ttargetRowNum, watch.Elapsed.TotalSeconds));
                Console.ReadLine();
            }

            /// <summary>
            /// Maps the file into memory and returns the requested line (1-based),
            /// or an empty string if the file has fewer lines.
            /// </summary>
            private static string CreateMemoryMapFile(long ttargetRowNum)
            {
                using (FileStream fs = new FileStream(TXT_FILE_PATH, FileMode.Open, FileAccess.Read))
                using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(
                    fs, null, fs.Length, MemoryMappedFileAccess.Read, HandleInheritability.None, false))
                {
                    long curRowNum = 1; // number of the line currently being accumulated
                    long offset = 0;
                    //int limit = 250;
                    int limit = 200;    // bytes mapped per view
                    StringBuilder sbDefineRowLine = new StringBuilder();
                    try
                    {
                        while (offset < fs.Length)
                        {
                            long remaining = fs.Length - offset;
                            long size = remaining > limit ? limit : remaining;
                            string ss;
                            using (MemoryMappedViewStream mmStream = mmf.CreateViewStream(offset, size, MemoryMappedFileAccess.Read))
                            using (StreamReader sr = new StreamReader(mmStream))
                            {
                                // Replace "\r\n" before "\n"; in the reverse order the "\r" is left behind.
                                // Caveat: a multi-byte character or a "\r\n" pair that straddles
                                // a view boundary is not handled here.
                                ss = sr.ReadToEnd().Replace("\r\n", SPLIT_VARCHAR).Replace("\n", SPLIT_VARCHAR);
                            }
                            offset += size;
                            sbDefineRowLine.Append(ss);
                            if (ss.IndexOf(SPLIT_CHAR) < 0) continue; // no line completed by this view

                            // Consume every complete line that is now in the buffer.
                            string buffered = sbDefineRowLine.ToString();
                            int start = 0;
                            int pos;
                            while ((pos = buffered.IndexOf(SPLIT_CHAR, start)) >= 0)
                            {
                                if (curRowNum == ttargetRowNum)
                                    return buffered.Substring(start, pos - start);
                                curRowNum++;
                                start = pos + 1;
                            }
                            sbDefineRowLine.Remove(0, start); // keep only the unfinished line
                        }

                        // The last line may have no trailing line break.
                        if (curRowNum == ttargetRowNum && sbDefineRowLine.Length > 0)
                            return sbDefineRowLine.ToString();
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine(e.Message);
                    }
                    return string.Empty;
                }
            }
        }
    }
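The demo maps a new 200-byte view on every iteration and decodes each one to a string, which is likely a large part of the cost. One possible direction, offered as an unmeasured sketch rather than a verified fix, is to read the file sequentially in large buffers, count `\n` bytes without decoding anything, and decode only the bytes of the target line. The sketch assumes UTF-8 (or another encoding in which `\n` is the single byte 0x0A). The names `LineFinder` and `FindLine` and the 1 MiB default buffer are illustrative choices.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineFinder
{
    // Returns line `targetLine` (1-based) of a UTF-8 file, or null if the file has fewer lines.
    // Memory use is one read buffer plus the bytes of the target line.
    public static string FindLine(string path, long targetLine, int bufferSize = 1 << 20)
    {
        if (targetLine < 1) throw new ArgumentOutOfRangeException(nameof(targetLine));

        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                              FileShare.ReadWrite, bufferSize, FileOptions.SequentialScan))
        {
            byte[] buffer = new byte[bufferSize];
            MemoryStream lineBytes = new MemoryStream(); // bytes of the target line only
            long currentLine = 1;
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                int i = 0;
                while (i < read)
                {
                    int nl = Array.IndexOf(buffer, (byte)'\n', i, read - i);
                    if (currentLine < targetLine)
                    {
                        if (nl < 0) break;       // no more line breaks in this buffer
                        currentLine++;           // skip a line without decoding it
                        i = nl + 1;
                    }
                    else
                    {
                        int end = nl < 0 ? read : nl;
                        lineBytes.Write(buffer, i, end - i);
                        if (nl >= 0) return Decode(lineBytes);
                        break;                   // target line continues in the next buffer
                    }
                }
            }
            // Last line without a trailing line break.
            return currentLine == targetLine && lineBytes.Length > 0 ? Decode(lineBytes) : null;
        }
    }

    // Decodes the collected bytes in one step, dropping a trailing '\r' from "\r\n" files.
    // A UTF-8 byte order mark at the start of line 1 is not stripped.
    private static string Decode(MemoryStream ms)
    {
        byte[] bytes = ms.ToArray();
        int len = bytes.Length;
        if (len > 0 && bytes[len - 1] == (byte)'\r') len--;
        return Encoding.UTF8.GetString(bytes, 0, len);
    }
}
```

A call would look like `string line = LineFinder.FindLine(path, 10000000);`. Because the target line is decoded in one piece, a multi-byte character cannot be split across buffers. Whether this meets the 20 s goal depends on the disk and would need to be measured.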

  

Suggestions for a better approach are welcome.

References:

https://msdn.microsoft.com/zh-cn/library/dd997372(v=vs.110).aspx?cs-save-lang=1&cs-lang=csharp#code-snippet-1