I will try to explain the problem as well as I can. I have a text file with email addresses and names. It looks like this: Barb Beney "de.mariof@vienna.aa", "Beny Beney" bet@catering.at, etc., all on the same line. This is just an example; I have thousands of such entries in one big text file. I want to extract the emails and names so that I get something like this in the end:
Beny Beney bet@catering.at, that is, each name and its address separated, next to each other on one line, and without quote marks. In the end, all duplicate addresses should be eliminated from the file.
I wrote the code for extracting the email addresses and it works, but I don't know how to do the rest: how to extract the names, put each one on the same line as its address, and eliminate duplicates. I hope I described it properly so you know what I'm trying to do. This is the code I have:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Text.RegularExpressions;
using System.IO;
namespace Email
{
    class Program
    {
        static void Main(string[] args)
        {
            ExtractEmails(@"C:\Users\drake\Desktop\New.txt", @"C:\Users\drake\Desktop\Email.txt");
        }
        public static void ExtractEmails(string inFilePath, string outFilePath)
        {
            string data = File.ReadAllText(inFilePath);
            Regex emailRegex = new Regex(@"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
                RegexOptions.IgnoreCase);
            MatchCollection emailMatches = emailRegex.Matches(data);
            StringBuilder sb = new StringBuilder();
            foreach (Match emailMatch in emailMatches)
            {
                sb.AppendLine(emailMatch.Value);
            }
            File.WriteAllText(outFilePath, sb.ToString());
        }
    }
}
3 solutions
#1
0
For the new desired formatting, you could do something like this:
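private string[] parseEmails(string bigStringIn)
{
    // Remove the quote marks, then split the text on commas.
    string bigString = bigStringIn.Replace("\"", "");
    // Each element has the form: name lastname email@some.com
    // (elements may still carry leading or trailing whitespace).
    string[] output = bigString.Split(',');
    return output;
}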
It takes the string with the mail addresses, removes the quote marks, then splits the string into a string array whose elements have the format: name lastname email@some.com
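To put each name on the same line as its address, the split entries can be combined with the e-mail regex from the question. This is only a sketch: the class and method names (ContactParser, ParseContacts) are made up for illustration, and it assumes that entries are separated by commas, that names never contain commas, and that each entry holds at most one address.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code.
public static class ContactParser
{
    // Same e-mail pattern as in the question's ExtractEmails method.
    private static readonly Regex EmailRegex = new Regex(
        @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*", RegexOptions.IgnoreCase);

    // Returns one "Name email" string per comma-separated entry that contains an address.
    public static List<string> ParseContacts(string data)
    {
        var result = new List<string>();
        // Drop the quote marks first, as in the answer above.
        string unquoted = data.Replace("\"", "");
        foreach (string rawEntry in unquoted.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string entry = rawEntry.Trim();
            Match m = EmailRegex.Match(entry);
            if (!m.Success)
                continue; // no address in this entry, skip it

            // Whatever is left after cutting out the address is treated as the name.
            string name = entry.Remove(m.Index, m.Length).Trim();
            result.Add(name.Length > 0 ? name + " " + m.Value : m.Value);
        }
        return result;
    }
}
```

Calling ContactParser.ParseContacts(File.ReadAllText(inFilePath)) would give a list that can be written out with File.WriteAllLines. Because the address is located by the regex, it does not matter whether the quotes were around the name or around the address.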
To delete the duplicated entries, a nested for loop should do the trick, checking (maybe after a .Split()) for matching strings.
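A nested loop compares every entry with every other one, which gets slow with thousands of entries. As an alternative (not what the answer above proposes), a HashSet can remember the addresses already seen. The sketch below assumes each line has the form "name lastname email@some.com", so the address is the last space-separated token, and it treats addresses that differ only in letter case as duplicates. ContactDeduplicator and RemoveDuplicateAddresses are illustrative names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original code.
public static class ContactDeduplicator
{
    // Keeps the first line seen for each address and drops later lines with the same address.
    public static List<string> RemoveDuplicateAddresses(IEnumerable<string> lines)
    {
        // Case-insensitive set of the addresses encountered so far.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var unique = new List<string>();
        foreach (string line in lines)
        {
            string trimmed = line.Trim();
            if (trimmed.Length == 0)
                continue; // ignore blank lines

            // Assumption: the address is the last token on the line.
            // If there is no space, LastIndexOf returns -1 and the whole line is the address.
            string address = trimmed.Substring(trimmed.LastIndexOf(' ') + 1);

            // Add returns false when the address is already in the set.
            if (seen.Add(address))
                unique.Add(trimmed);
        }
        return unique;
    }
}
```

With this approach two lines with different names but the same address count as duplicates, which matches the request to eliminate duplicate addresses.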
#2
0
Welcome. You can use this code; it creates a new file that contains all the e-mails without duplicates:
static void Main(string[] args)
{
    TextWriter w = File.CreateText(@"C:\Users\drake\Desktop\NonDuplicateEmails.txt");
    ExtractEmails(@"C:\Users\drake\Desktop\New.txt", @"C:\Users\drake\Desktop\Email.txt");
    TextReader r = File.OpenText(@"C:\Users\drake\Desktop\Email.txt");
    RemovingAllDupes(r, w);
}
public static void RemovingAllDupes(TextReader reader, TextWriter writer)
{
    string currentLine;
    HashSet<string> previousLines = new HashSet<string>();
    while ((currentLine = reader.ReadLine()) != null)
    {
        // Add returns true if it was actually added,
        // false if it was already there
        if (previousLines.Add(currentLine))
        {
            writer.WriteLine(currentLine);
        }
    }
    // Close both files so the output is flushed and the handles are released.
    writer.Close();
    reader.Close();
}
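The same idea can be written more compactly with LINQ. This is a sketch rather than a drop-in replacement: LineDeduplicator and WriteDistinctLines are illustrative names, the comparison is case-insensitive (an assumption that suits e-mail addresses), and the input and output paths must be different files because the input is still being read while the output is written.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original code.
public static class LineDeduplicator
{
    // Copies inPath to outPath, keeping only one copy of each distinct line.
    public static void WriteDistinctLines(string inPath, string outPath)
    {
        // File.ReadLines reads lazily, line by line; Distinct tracks the lines seen so far.
        var distinct = File.ReadLines(inPath).Distinct(StringComparer.OrdinalIgnoreCase);
        File.WriteAllLines(outPath, distinct);
    }
}
```

For the files in this thread the call would be LineDeduplicator.WriteDistinctLines(@"C:\Users\drake\Desktop\Email.txt", @"C:\Users\drake\Desktop\NonDuplicateEmails.txt").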
#3
0
You can also use this code with big files:
static void Main(string[] args)
{
    ExtractEmails(@"C:\Users\drake\Desktop\New.txt", @"C:\Users\drake\Desktop\Email.txt");
    var sr = new StreamReader(File.OpenRead(@"C:\Users\drake\Desktop\Email.txt"));
    // File.Create truncates an existing file; File.OpenWrite would leave old content past the new end.
    var sw = new StreamWriter(File.Create(@"C:\Users\drake\Desktop\NonDuplicateEmails.txt"));
    RemovingAllDupes(sr, sw);
}
public static void RemovingAllDupes(StreamReader str, StreamWriter stw)
{
    // Only the hash code of each line is stored, which saves memory on big files.
    // Caveat: two different lines can share a hash code, and the later one is then dropped.
    var lines = new HashSet<int>();
    while (!str.EndOfStream)
    {
        string line = str.ReadLine();
        int hc = line.GetHashCode();
        if (lines.Contains(hc))
            continue;
        lines.Add(hc);
        stw.WriteLine(line);
    }
    stw.Flush();
    stw.Close();
    str.Close();
}