I need to write a frequency analysis console program in C#. It has to show the 10 most frequent letters in a text file. I have managed to display the first 10 letters read by the program together with the frequency of each character. However, I don't know how to sort the results by frequency. This is the code I have so far.
I must also give the user the option to run the frequency analysis in case-sensitive mode (as it is right now) or in case-insensitive mode. Help with this issue would also be appreciated. Thank you!
static void Main(string[] args)
{
    // 1. Array to store frequencies, indexed by character code.
    int[] c = new int[char.MaxValue + 1];

    // 2. Read the text file line by line (the file is opened only once).
    string fileName = @"c:\Users\user\Documents\Visual Studio 2015\Projects\ConsoleApplication1\ConsoleApplication1\App_Data\text.txt";
    foreach (string line in File.ReadLines(fileName, Encoding.UTF8))
    {
        // 3. Iterate over each character and increment its counter.
        foreach (char t in line)
        {
            c[t]++;
        }
    }

    // 4. Write the first 10 letters found.
    //    They come out in character-code order, not sorted by frequency.
    int counter = 0;
    for (int i = 0; i < c.Length && counter < 10; i++)
    {
        if (c[i] > 0 && char.IsLetterOrDigit((char)i))
        {
            ++counter;
            Console.WriteLine("Letter: {0} Frequency: {1}", (char)i, c[i]);
        }
    }
    Console.ReadLine();
}
3 Answers
#1
3
If all you want to do is find the frequencies, you don't need a dictionary at all; you can use LINQ. This is exactly the kind of task LINQ was designed for:
...
using System.Linq;
...
static void Main(string[] args) {
    var result = File
        .ReadLines(@"...", Encoding.UTF8)
        .SelectMany(line => line) // string into characters
        .Where(c => char.IsLetterOrDigit(c))
        .GroupBy(c => c)
        .Select(chunk => new {
            Letter = chunk.Key,
            Count = chunk.Count() })
        .OrderByDescending(item => item.Count)
        .ThenBy(item => item.Letter) // in case of a tie, sort by letter
        .Take(10)
        .Select(item => $"{item.Letter} freq. {item.Count}"); // $"..." - C# 6.0 syntax
    Console.Write(string.Join(Environment.NewLine, result));
}
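The question also asks for a case-insensitive mode. One possible way to add it to the LINQ pipeline above is to fold every character to lower case before grouping. This is only a sketch: the class LinqFrequencySketch, the helper TopLetters, and its ignoreCase and top parameters are hypothetical names, and the sample input is an in-memory array rather than a file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LinqFrequencySketch
{
    // Hypothetical helper: returns the most frequent letters/digits in the lines.
    public static List<KeyValuePair<char, int>> TopLetters(
        IEnumerable<string> lines, bool ignoreCase, int top = 10)
    {
        return lines
            .SelectMany(line => line)                 // string into characters
            .Where(c => char.IsLetterOrDigit(c))
            // In case-insensitive mode, 'A' and 'a' are counted as the same key.
            .Select(c => ignoreCase ? char.ToLowerInvariant(c) : c)
            .GroupBy(c => c)
            .Select(g => new KeyValuePair<char, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key)                       // in case of a tie, sort by letter
            .Take(top)
            .ToList();
    }

    public static void Main()
    {
        // Sample data standing in for File.ReadLines(...).
        var lines = new[] { "Aa bB", "a" };
        foreach (var pair in TopLetters(lines, ignoreCase: true))
        {
            Console.WriteLine($"{pair.Key} freq. {pair.Value}");
        }
    }
}
```

With a real file, the lines argument could be the result of File.ReadLines, and ignoreCase could come from whatever the user chooses at the console.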
#2
0
It would be easier to use the actual Dictionary type in C# here, rather than an array:
Dictionary<char, int> characterCountDictionary = new Dictionary<char, int>();
You add a key if it doesn't exist already (inserting a value of 1), or you increment the value if it does exist. Then you can pull out the keys of your dictionary as a list and sort them by their counts, looking up each value as you iterate. For case-insensitive mode, you would just convert every upper-case letter to lower case before inserting it into the dictionary.
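The approach described above could look roughly like the following. It is a minimal sketch, not a definitive implementation: the class FrequencySketch, the helper CountCharacters, and its ignoreCase parameter are hypothetical names, and the input is a short in-memory string instead of a file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FrequencySketch
{
    // Hypothetical helper: counts the letters and digits in the given text.
    public static Dictionary<char, int> CountCharacters(IEnumerable<char> text, bool ignoreCase)
    {
        var counts = new Dictionary<char, int>();
        foreach (char raw in text)
        {
            if (!char.IsLetterOrDigit(raw)) continue;

            // Fold to lower case first when case-insensitive mode is requested.
            char key = ignoreCase ? char.ToLowerInvariant(raw) : raw;

            int current;
            counts.TryGetValue(key, out current); // current is 0 when the key is new
            counts[key] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        var counts = CountCharacters("Hello, World", ignoreCase: true);

        // Sort the keys by their counts (highest first), then look up each value.
        var topKeys = counts.Keys
            .OrderByDescending(k => counts[k])
            .ThenBy(k => k)
            .Take(10);

        foreach (char k in topKeys)
        {
            Console.WriteLine("Letter: {0} Frequency: {1}", k, counts[k]);
        }
    }
}
```

For the sample string, 'l' occurs three times and 'o' twice, so those two keys come first in the sorted list.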
Here's the MSDN page with examples for Dictionary: https://msdn.microsoft.com/en-us/library/xfhwa508(v=vs.110).aspx#Examples
#3
0
I like @Dmitry Bychenko's answer because it's very terse. But if you have a very large file, that solution may not be optimal for you, because it ends up holding the whole file's contents in memory while processing them (GroupBy has to buffer every character before the groups can be counted). In my tests, I saw around 1 GB of memory usage for a 500 MB file. The solution below, while not quite as terse, uses constant memory (basically 0) and ran as fast as or faster than the LINQ version in my tests.
Dictionary<char, int> freq = new Dictionary<char, int>();
using (StreamReader sr = new StreamReader(@"yourBigFile")) {
    string line;
    while ((line = sr.ReadLine()) != null) {
        foreach (char c in line) {
            if (!freq.ContainsKey(c)) {
                freq[c] = 0;
            }
            freq[c]++;
        }
    }
}
var result = freq
    .Where(c => char.IsLetterOrDigit(c.Key))
    .OrderByDescending(x => x.Value)
    .Take(10);
Console.WriteLine(string.Join(Environment.NewLine, result));