How to group similar items in a list using Haskell?

Given a list of tuples like this:

dic = [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]

How can I group the items of dic into a list grp such that:

grp  = [(1,["aa","bb","cc"]), (2, ["aa"]), (3, ["ff","gg"])]

I'm a newcomer to Haskell and seem to be falling in love with it.
Using group or groupBy from Data.List only groups similar adjacent items in a list. I wrote an inefficient function for this, but it fails with memory errors because I need to process a very large list of coded strings. I hope you can help me find a more efficient way.

5 solutions

#1

Here's my solution:

import Data.Function (on)
import Data.List (sortBy, groupBy)
import Data.Ord (comparing)

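-- Ord already implies Eq, so a single constraint is enough.
myGroup :: Ord a => [(a, b)] -> [(a, [b])]
myGroup = map (\l -> (fst . head $ l, map snd l))  -- head is safe: groupBy never yields empty groups
          . groupBy ((==) `on` fst)                -- group runs of equal keys
          . sortBy (comparing fst)                 -- bring equal keys together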

This works by first sorting the list by key with sortBy:

[(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
=> [(1,"aa"),(1,"cc"),(1,"bb"),(2,"aa"),(3,"ff"),(3,"gg")]

then grouping the list elements by the associated key with groupBy:

[(1,"aa"),(1,"cc"),(1,"bb"),(2,"aa"),(3,"ff"),(3,"gg")]
=> [[(1,"aa"),(1,"cc"),(1,"bb")],[(2,"aa")],[(3,"ff"),(3,"gg")]]

and then transforming the grouped items to tuples with map:

[[(1,"aa"),(1,"cc"),(1,"bb")],[(2,"aa")],[(3,"ff"),(3,"gg")]]
=> [(1,["aa","cc","bb"]),(2,["aa"]),(3,["ff","gg"])]

Since sortBy is stable and compares only the keys here, the values within each group keep their input order. If the values should be sorted as well, replace sortBy (comparing fst) with sort, which requires adding an Ord b constraint.

Testing:

> myGroup dic
[(1,["aa","cc","bb"]),(2,["aa"]),(3,["ff","gg"])]
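
A possible variant, not part of the original answer: the lambda above relies on head, which is safe only because groupBy never produces empty groups. The sketch below makes that guarantee visible in the types by using groupBy from Data.List.NonEmpty. The name myGroupNE is made up for illustration.

```haskell
import Data.Function (on)
import Data.List (sortBy)
import qualified Data.List.NonEmpty as NE
import Data.Ord (comparing)

-- Hypothetical variant of myGroup: each group is a NonEmpty list,
-- so taking its first element needs no partial function.
myGroupNE :: Ord a => [(a, b)] -> [(a, [b])]
myGroupNE =
  map (\g -> (fst (NE.head g), map snd (NE.toList g)))
    . NE.groupBy ((==) `on` fst)   -- groups adjacent pairs with equal keys
    . sortBy (comparing fst)       -- stable sort on the key only
```

It should behave like myGroup on any input; only the intermediate type differs.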

#2

Whenever possible, reuse library code.

import Data.Map (Map, fromListWith)

sortAndGroup :: Ord k => [(k, a)] -> Map k [a]
sortAndGroup assocs = fromListWith (++) [(k, [v]) | (k, v) <- assocs]

Try it out in ghci:

*Main> sortAndGroup [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
fromList [(1,["bb","cc","aa"]),(2,["aa"]),(3,["gg","ff"])]
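
Note that the values in the output above come out in reverse input order, because fromListWith applies the combining function as f newValue oldValue. If you need the list-of-pairs shape from the question, with the values sorted, a sketch along these lines may help. groupSorted is a hypothetical name, and sorting the values assumes they have an Ord instance.

```haskell
import Data.List (sort)
import qualified Data.Map.Strict as Map

-- Hypothetical helper: group values by key, sort each group,
-- and return the result as an association list in ascending key order.
groupSorted :: (Ord k, Ord v) => [(k, v)] -> [(k, [v])]
groupSorted assocs =
  Map.toList
    (Map.map sort
      (Map.fromListWith (++) [(k, [v]) | (k, v) <- assocs]))
```

Prepending a singleton with (++) is cheap, so the grouping itself stays at O(n log n); the per-group sort is extra work you can drop if the order of values does not matter.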

#3

You can also use the TransformListComp extension, for example:

Prelude> :set -XTransformListComp 
Prelude> import GHC.Exts (groupWith, the)
Prelude GHC.Exts> let dic = [ (1, "aa"), (1, "bb"), (1, "cc") , (2, "aa"), (3, "ff"), (3, "gg")]
Prelude GHC.Exts> [(the key, value) | (key, value) <- dic, then group by key using groupWith]
[(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]
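
If you would rather not enable the extension, groupWith from GHC.Exts can also be called as an ordinary function. Here is a rough equivalent of the comprehension above; the name groupPairs is an assumption made for this sketch.

```haskell
import GHC.Exts (groupWith)

-- Hypothetical helper: groupWith sorts by the key and groups equal keys,
-- then each non-empty group is turned into (key, values).
groupPairs :: Ord k => [(k, v)] -> [(k, [v])]
groupPairs pairs =
  [ (k, map snd g) | g@((k, _) : _) <- groupWith fst pairs ]
```

The pattern g@((k, _) : _) picks the key from the first pair of each group, which plays the role of the in the comprehension version.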

#4

If the list is not sorted on the first element, I don't think you can do better than O(n log n).
- One simple way is to sort the list first and then use any of the approaches for sorted input described in the second part of this answer.
- You can use a map from Data.Map such as Map k [a], with the first element of each tuple as the key, and keep adding to the list of values.
- You can write your own more complex function, but whatever you try, it will still take O(n log n).
If the list is already sorted on the first element (the example in the question almost is, apart from its last item), then the task is trivial with something like groupBy, as given in the answer by @Mikhail, or with foldr, and there are numerous other ways.
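
The second option in the list above could look roughly like this. It is only a sketch: groupByKey is a made-up name, and the strict left fold is one reasonable choice for large inputs rather than something the approach requires.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- Hypothetical helper: walk the list once, prepending each value
-- to the list already stored under its key.
groupByKey :: Ord k => [(k, a)] -> Map.Map k [a]
groupByKey = foldl' step Map.empty
  where
    -- insertWith calls (++) with the new singleton first, then the old list
    step acc (k, v) = Map.insertWith (++) k [v] acc
```

Because each value is prepended, the lists end up in reverse input order; apply Map.map reverse afterwards if the original order matters.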

An example of using foldr is here:

  grp :: Eq a => [(a,b)] -> [(a,[b])]
  grp = foldr f []
     where 
       f (z,s) [] = [(z,[s])] 
       f (z,s) a@((x,y):xs)  | x == z = (x,s:y):xs 
                             | otherwise = (z,[s]):a
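
To see why the input must be sorted first, the sketch below repeats the fold under the name grpAdjacent and adds a hypothetical wrapper grpAll that sorts by key before folding. Both names are assumptions made for this illustration.

```haskell
import Data.List (sortOn)

-- Same fold as above: merges a pair into the head group only when
-- the keys match, so only adjacent equal keys are combined.
grpAdjacent :: Eq a => [(a, b)] -> [(a, [b])]
grpAdjacent = foldr f []
  where
    f (z, s) [] = [(z, [s])]
    f (z, s) acc@((x, ys) : rest)
      | x == z    = (x, s : ys) : rest
      | otherwise = (z, [s]) : acc

-- Hypothetical wrapper: a stable sort on the key makes equal keys adjacent.
grpAll :: Ord a => [(a, b)] -> [(a, [b])]
grpAll = grpAdjacent . sortOn fst
```

On the dic from the question, grpAdjacent leaves the trailing (1,"bb") in a group of its own, giving [(1,["aa","cc"]),(2,["aa"]),(3,["ff","gg"]),(1,["bb"])], while grpAll gives [(1,["aa","cc","bb"]),(2,["aa"]),(3,["ff","gg"])].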

#5

{-# LANGUAGE TransformListComp #-}

import GHC.Exts (groupWith, sortWith, the)

process :: [(Integer, String)] -> [(Integer, [String])]
process list =
  [ (the a, b)
  | let info = [ (x, y) | (x, y) <- list, then sortWith by y ]  -- sort by value first
  , (a, b) <- info
  , then group by a using groupWith                              -- then group by key
  ]
