How to deduplicate a large batch of phone numbers

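On a web page, a text box holds 100,000+ phone numbers, one per line. When a button is clicked, how can the duplicate numbers be removed quickly?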

13000000000
18666666666
13888888888
13222222222
13000000000
13222222222
...........

After deduplication:
13000000000
18666666666
13888888888
13222222222

29 solutions

#1

var result = textBox.Split(new string[] { "\r\n" }).Distinct();

#2

Here's an example:

using System;
using System.Collections.Generic;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var data = new List<string> { "asd fkaskfasdf asdf a", "kas 82", "2378482342389423", "kas 82", "39" };
            foreach (var r in 去重(data))
            {
                Console.WriteLine(r);
            }
            Console.ReadKey();
        }

        // 去重 = "deduplicate": lazily yields each value the first time it is seen.
        private static IEnumerable<string> 去重(List<string> data)
        {
            var dic = new HashSet<string>();
            foreach (var x in data)
            {
                // HashSet<T>.Add returns false if x is already in the set.
                if (dic.Add(x))
                {
                    yield return x;
                }
            }
        }
    }
}

#3

Quoting reply #1:

var result = textBox.Split(new string[] { "\r\n" }).Distinct();

Could you go into more detail? I'd like to handle this in the code-behind (.cs) on the server side.

#4

Or, going a step further (since the 100,000+ entries have all been written into a single string anyway):

using System;
using System.Collections.Generic;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var text = "asd fkaskfasdf asdf a\r\nkas 82\r\n2378482342389423\r\nkas 82\r\n39";
            var data = 读取行(text);
            foreach (var r in 去重(data))
            {
                Console.WriteLine(r);
            }
            Console.ReadKey();
        }

        // 读取行 = "read lines": lazily yields one line of the text at a time.
        private static IEnumerable<string> 读取行(string text)
        {
            using (var rd = new StringReader(text))
            {
                string line;
                while ((line = rd.ReadLine()) != null)
                {
                    yield return line;
                }
            }
        }

        // 去重 = "deduplicate": lazily yields each value the first time it is seen.
        private static IEnumerable<string> 去重(IEnumerable<string> data)
        {
            var dic = new HashSet<string>();
            foreach (var x in data)
            {
                // HashSet<T>.Add returns false if x is already in the set.
                if (dic.Add(x))
                {
                    yield return x;
                }
            }
        }
    }
}

#5

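Quoting reply #4:

Or, going a step further (since the 100,000+ entries have all been written into a single string anyway):

using System;
using System.Collections.Generic;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {

        static void Ma……

I tried it and it seems very slow. I put in 200,000 numbers and it took a long time to run.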

#6

public static List<string> NoSomeValue(string[] InPut)
{
    List<string> list = new List<string>();
    if (InPut.Length > 0)
    {
        // Sorting puts equal values next to each other.
        // Note: this reorders the caller's array, so the original order is lost.
        Array.Sort(InPut);

        // First pass: count the distinct values.
        int size = 1;
        for (int i = 1; i < InPut.Length; i++)
            if (InPut[i] != InPut[i - 1])
                size++;

        String[] myTempData = new String[size];
        int j = 0;
        myTempData[j++] = InPut[0];

        // Second pass: copy each value that differs from its predecessor.
        for (int i = 1; i < InPut.Length; i++)
            if (InPut[i] != InPut[i - 1])
                myTempData[j++] = InPut[i];

        list.AddRange(myTempData);
        return list;
    }

    list.AddRange(InPut);
    return list;
}

NoSomeValue(textBox1.Text.Split(new string[] { "\r\n" }, StringSplitOptions.None))

#7

LINQ is still the simplest and fastest way.

The code in reply #1 is the right answer.

#8

Quoting reply #1:

var result = textBox.Split(new string[] { "\r\n" }).Distinct();

That's already the right answer, and it isn't slow. But shouldn't textBox.Split be changed to textBox.Text.Split?

#9

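Quoting reply #5:

I tried it and it seems very slow. I put in 200,000 numbers and it took a long time to run.

No program would "take a long time" to process this, unless you are also counting the time spent printing the output to the screen.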

#10

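Quoting reply #8:

Quoting reply #1:
var result = textBox.Split(new string[] { "\r\n" }).Distinct();

That's already the right answer, and it isn't slow. But shouldn't textBox.Split be changed to textBox.Text.Split?

I've never used LINQ. Could you write the code out more completely? Thanks!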

#11

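Quoting reply #9:

Quoting reply #5:
I tried it and it seems very slow. I put in 200,000 numbers and it took a long time to run.

No program would "take a long time" to process this, unless you are also counting the time spent printing the output to the screen.

        protected void Button2_Click(object sender, EventArgs e)
        {
            string str = "";
            var text = this.TextMobile.Value;
            var data = 读取行(text);
            foreach (var r in 去重(data))
            {
                // Each += copies the whole string built so far.
                str += r + "\n";
            }

            this.TextMobile2.Value = str;
        }

        // 读取行 = "read lines": lazily yields one line of the text at a time.
        private static IEnumerable<string> 读取行(string text)
        {
            using (var rd = new StringReader(text))
            {
                string line;
                while ((line = rd.ReadLine()) != null)
                {
                    yield return line;
                }
            }
        }

        // 去重 = "deduplicate": lazily yields each value the first time it is seen.
        private static IEnumerable<string> 去重(IEnumerable<string> data)
        {
            var dic = new HashSet<string>();
            foreach (var x in data)
            {
                if (dic.Add(x))
                {
                    yield return x;
                }
            }
        }

100,000 numbers took several minutes.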
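
A likely culprit for a multi-minute run in a handler like the one above is the `str += r + "\n"` loop rather than the deduplication itself: every concatenation copies the whole string built so far, so the work grows roughly quadratically with the number of lines. The following is a minimal sketch under that assumption. The names `DedupeJoin` and `DedupeLines` are made up for illustration and do not come from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class DedupeJoin
{
    // Hypothetical helper: returns the distinct lines of text, in order of
    // first appearance, joined with "\n".
    public static string DedupeLines(string text)
    {
        var seen = new HashSet<string>();
        var unique = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // HashSet<T>.Add returns false when the line was already seen.
                if (seen.Add(line))
                    unique.Add(line);
            }
        }
        // One join at the end instead of one string copy per line.
        return string.Join("\n", unique.ToArray());
    }

    static void Main()
    {
        Console.WriteLine(DedupeLines("13000000000\r\n18666666666\r\n13000000000"));
    }
}
```

In the button handler this would be called as `this.TextMobile2.Value = DedupeJoin.DedupeLines(this.TextMobile.Value);`. Sending 100,000 lines back to the browser still takes time of its own, which this sketch does not address.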

#12



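            string s = textBox.Text;
            HashSet<string> hs = new HashSet<string>();
            // Needs an unsafe context (compile with /unsafe).
            // Assumes every record is exactly 13 chars: 11 digits + "\r\n".
            unsafe
            {
                fixed (char* p = s)
                {
                    for (int i = 0; i < s.Length; i += 13)
                        // Clamp the last record so we never read past the end of the string.
                        hs.Add(new string(p, i, Math.Min(13, s.Length - i)));
                }
            }
            string newstring = string.Concat(hs);

This also works, and it keeps the line breaks.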

#13

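If the line break is "\n", change it to:

            string s = textBox.Text;
            HashSet<string> hs = new HashSet<string>();
            // Needs an unsafe context (compile with /unsafe).
            // Assumes every record is exactly 12 chars: 11 digits + "\n".
            unsafe
            {
                fixed (char* p = s)
                {
                    for (int i = 0; i < s.Length; i += 12)
                        // Clamp the last record so we never read past the end of the string.
                        hs.Add(new string(p, i, Math.Min(12, s.Length - i)));
                }
            }
            string newstring = string.Concat(hs);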

#14

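Better to play it safe and just use Substring.

            string s = textBox.Text;
            HashSet<string> hs = new HashSet<string>();
            // Assumes every record is exactly 12 chars: 11 digits + "\n".
            for (int i = 0; i < s.Length; i += 12)
                // Clamp the last record in case it has no trailing line break.
                hs.Add(s.Substring(i, Math.Min(12, s.Length - i)));
            string newstring = string.Concat(hs);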

#15

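Quoting reply #3:

Quoting reply #1:
var result = textBox.Split(new string[] { "\r\n" }).Distinct();

Could you go into more detail? I'd like to handle this in the code-behind (.cs) on the server side.

You're kidding, right? Your reflex on seeing a single line of code is "could you go into more detail"?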

#16

Use a Dictionary<string, int> to check for duplicates. I once deduplicated more than a million records that way, and it was much faster than using a List.
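
The reply gives no code, so here is a minimal sketch of what a `Dictionary<string, int>` approach could look like. The speed difference comes from lookup cost: a dictionary finds a key by hash in roughly constant time on average, while `List<T>.Contains` scans the list element by element. The names `DictionaryDedupe` and `CountOccurrences` are illustrative assumptions, and storing an occurrence count as the value is only one possible use of the `int`.

```csharp
using System;
using System.Collections.Generic;

static class DictionaryDedupe
{
    // Hypothetical helper: maps each number to how many times it appeared.
    // The dictionary's keys are the deduplicated numbers.
    public static Dictionary<string, int> CountOccurrences(IEnumerable<string> numbers)
    {
        var counts = new Dictionary<string, int>();
        foreach (var n in numbers)
        {
            int c;
            counts.TryGetValue(n, out c); // c is 0 when the key is missing
            counts[n] = c + 1;
        }
        return counts;
    }

    static void Main()
    {
        var input = new[] { "13000000000", "18666666666", "13000000000" };
        foreach (var pair in CountOccurrences(input))
            Console.WriteLine(pair.Key + " x" + pair.Value);
    }
}
```

When the counts are not needed, a `HashSet<string>` as used in replies #2 and #4 does the same job with less bookkeeping.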

#17

In more detail:
Put two text boxes and one button on the form, and write this in the button's click handler:

var result = textBox1.Text.Split(new string[] { "\r\n" }).Distinct();

textBox2.Text = string.Join("\r\n", result.ToArray());

Then run it. Done.

#18

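Quoting reply #17:

In more detail:
Put two text boxes and one button on the form, and write this in the button's click handler:

var result = textBox1.Text.Split(new string[] { "\r\n" }).Distinct();
textBox2.Text = string.Join("\r\n", result.ToArray());

Then run it. Done.

The first line doesn't compile. It reports invalid arguments.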

#19

It's just missing an argument:

var result = "123".Split(new string[] { "\r\n" }, StringSplitOptions.None).Distinct();

Didn't you notice that yourself?

#20

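Quoting reply #17:

In more detail:
Put two text boxes and one button on the form, and write this in the button's click handler:

var result = textBox1.Text.Split(new string[] { "\r\n" }).Distinct();
textBox2.Text = string.Join("\r\n", result.ToArray());

Then run it. Done.

var result = textBox1.Text.Split(new string[] { "\r\n" }).Distinct();

The error says: the best overloaded method match for "string.Split(params char[])" has some invalid arguments.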

#21

Quoting reply #20:

The error says: the best overloaded method match for "string.Split(params char[])" has some invalid arguments.

var result = textBox1.Text.Split( "\r\n".ToCharArray()).Distinct();
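
One caveat with splitting on `"\r\n".ToCharArray()`: the two characters are treated as separate delimiters, so every "\r\n" pair produces an empty entry between them, and `Distinct()` then leaves a single empty string in the result. A minimal sketch of one way around that, using the `Split(char[], StringSplitOptions)` overload; the names `SplitDemo` and `DistinctLines` are made up for illustration.

```csharp
using System;
using System.Linq;

static class SplitDemo
{
    // Hypothetical helper: splits on '\r' and '\n' individually, drops the
    // empty entries that "\r\n" pairs and trailing line breaks produce,
    // then removes duplicates.
    public static string[] DistinctLines(string text)
    {
        return text
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Distinct()
            .ToArray();
    }

    static void Main()
    {
        var text = "13000000000\r\n18666666666\n13000000000\r\n";
        foreach (var line in DistinctLines(text))
            Console.WriteLine(line);
    }
}
```

A side effect of splitting on both characters is that input with mixed "\r\n" and "\n" line endings is handled the same way.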

#22

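Taking several minutes is clearly too slow.

Strip out the invalid entries.
Keep only the digits.

After that, the deduplication step itself for 100,000+ numbers can be done in under 0.01 seconds.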
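
The reply describes the clean-up only in words, so the following is a hedged sketch of "strip invalid entries, keep only digits, then deduplicate". It assumes a valid number is exactly 11 ASCII digits, as in the sample data in the question. That rule and the names `CleanAndDedupe` and `Clean` are assumptions, not something the thread specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CleanAndDedupe
{
    // Hypothetical helper. Assumption: a valid number is exactly 11 ASCII digits.
    public static List<string> Clean(IEnumerable<string> lines)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (var raw in lines)
        {
            var s = raw.Trim();
            // Drop blank or malformed entries before deduplicating.
            if (s.Length != 11 || !s.All(ch => ch >= '0' && ch <= '9'))
                continue;
            // Keep only the first occurrence of each valid number.
            if (seen.Add(s))
                result.Add(s);
        }
        return result;
    }

    static void Main()
    {
        var input = new[] { " 13000000000 ", "abc", "13000000000", "", "18666666666" };
        foreach (var n in Clean(input))
            Console.WriteLine(n);
    }
}
```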

#23

This reply was deleted by a moderator on 2012-08-04 17:03:23.

#24

Quoting reply #7:

LINQ is still the simplest and fastest way.

The code in reply #1 is the right answer.

Agreed.

#25

This reply was deleted by a moderator on 2014-09-09 00:07:56.

#26

With small batches you can't see the difference. With large batches the costs add up, so the approach matters.

#27

This reply was deleted by a moderator on 2012-08-06 09:41:05.

#28

This reply was deleted by a moderator on 2012-08-06 13:01:21.

#29

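Quoting reply #24:

Quoting reply #7:

LINQ is still the simplest and fastest way.

The code in reply #1 is the right answer.
Agreed.

I agree that LINQ gives the simplest code, but the fastest? I really can't go along with that. If processing efficiency matters to you, and 100,000 is not your largest volume, take a good look at the code in replies #2, #4, #13, and #16. We deal with large data volumes quite often.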
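
The thread never measures the two approaches side by side. A rough way to check the claim on your own machine is sketched below with `Stopwatch`. The data is synthetic and generated with a fixed seed, and every name (`DedupeBenchmark`, `MakeNumbers`, `CountWithLinq`, `CountWithHashSet`) is made up for illustration. `Enumerable.Distinct` is itself hash-based, so the gap may turn out smaller than the discussion suggests, and a single timed run like this is only indicative.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class DedupeBenchmark
{
    // Generates count synthetic 11-digit numbers drawn from at most
    // distinct different values, so duplicates are guaranteed when count > distinct.
    public static string[] MakeNumbers(int count, int distinct)
    {
        var rng = new Random(42);
        var data = new string[count];
        for (int i = 0; i < count; i++)
            data[i] = (13000000000L + rng.Next(distinct)).ToString();
        return data;
    }

    public static int CountWithLinq(string[] data)
    {
        return data.Distinct().Count();
    }

    public static int CountWithHashSet(string[] data)
    {
        var set = new HashSet<string>();
        foreach (var s in data)
            set.Add(s);
        return set.Count;
    }

    static void Main()
    {
        var data = MakeNumbers(200000, 50000);

        var sw = Stopwatch.StartNew();
        int viaLinq = CountWithLinq(data);
        sw.Stop();
        Console.WriteLine("Distinct: " + viaLinq + " unique, " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        int viaSet = CountWithHashSet(data);
        sw.Stop();
        Console.WriteLine("HashSet:  " + viaSet + " unique, " + sw.ElapsedMilliseconds + " ms");
    }
}
```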
