Show a list of repeated words in Haskell

I need to write a function that finds the repeated words in a string and returns them as a list of strings, in order of occurrence, ignoring non-letters.

e.g. at the Hugs prompt:

repetitions :: String -> [String]

repetitions > "My bag is is action packed packed."
output> ["is","packed"]
repetitions > "My name  name name is Sean ."
output> ["name","name"]
repetitions > "Ade is into into technical drawing drawing ."
output> ["into","drawing"]

4 solutions

#1

To split a string into words, use the words function (in the Prelude). To eliminate non-word characters, filter each word with Data.Char.isAlphaNum. Zip the list with its own tail to get adjacent pairs (x, y), then keep the x of every pair where x == y (with filter and map, or with a fold that conses x onto the result whenever x == y).

Something like:

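import Data.Char (isAlphaNum)
repetitions :: String -> [String]
repetitions s = map fst . filter (uncurry (==)) . zip l $ tail l
  where l = filter (not . null) (map (filter isAlphaNum) (words s))  -- drop tokens such as "." that become empty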

I'm not sure that works, but it should give you a rough idea.
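
The explanation mentions folding over the pairs. As a sketch (my own variant, not the answerer's code; the name `repetitionsFold` is hypothetical), the filter/map step can be written as a single `foldr` that conses `x` whenever it equals its right-hand neighbour `y`:

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical fold-based variant of the zip approach above.
repetitionsFold :: String -> [String]
repetitionsFold s = foldr keep [] (zip ws (drop 1 ws))
  where
    -- strip non-alphanumeric characters and drop tokens that become empty
    ws = filter (not . null) (map (filter isAlphaNum) (words s))
    -- cons x onto the result only when it equals the following word
    keep (x, y) acc
      | x == y    = x : acc
      | otherwise = acc
```

Because `foldr` processes the pairs right to left but conses onto the front, the result keeps the original left-to-right order of occurrence.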

#2

I am new to this language, so my solution may look somewhat ugly to a Haskell veteran, but anyway:

let repetitions x = concat (map tail (filter (\g -> length g > 1) (Data.List.group (words (filter (\c -> (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == ' ') x)))))

This part removes every character that is not a letter or a space from a string s:

filter (\c -> (c >= 'a' && c <= 'z') || (c>='A' && c <= 'Z') ||  c==' ') s
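
To try this step on its own, here is a small sketch that gives the predicate a name (`isAsciiLetterOrSpace` and `lettersAndSpaces` are hypothetical names, not from the answer):

```haskell
-- Hypothetical name for the answer's character predicate:
-- keep ASCII letters and plain spaces, drop everything else.
isAsciiLetterOrSpace :: Char -> Bool
isAsciiLetterOrSpace c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == ' '

lettersAndSpaces :: String -> String
lettersAndSpaces = filter isAsciiLetterOrSpace
```

Note that only the space character survives, so a newline or tab between two words is removed and the words are glued together (`lettersAndSpaces "a\nb"` gives `"ab"`).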

This one splits a string s into words and groups adjacent equal words into sublists, returning a list of lists:

Data.List.group (words s)
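
One detail worth seeing in isolation: `group` from `Data.List` only collects runs of *adjacent* equal elements, which is why this approach finds consecutive repeats only. A minimal sketch (the name `groupedWords` is hypothetical):

```haskell
import Data.List (group)

-- Split into words, then collect runs of adjacent equal words.
-- Equal words separated by other words end up in different groups.
groupedWords :: String -> [[String]]
groupedWords s = group (words s)
```

For example, `groupedWords "is is a is"` gives `[["is","is"],["a"],["is"]]`: the final "is" is not merged with the first two.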

This part then removes all lists with fewer than two elements:

filter (\x -> (length x) > 1) s

Finally, we drop one element from each remaining list and concatenate the results into a single list:

concat (map tail s)
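
Putting the four steps together as a compiled definition rather than a prompt `let`, a sketch might look like this (the intermediate names `cleaned`, `runs` and `repeatedRuns` are illustrative, not part of the original answer):

```haskell
import Data.List (group)

-- Answer #2's pipeline with named intermediate steps.
repetitions :: String -> [String]
repetitions s = concat (map tail repeatedRuns)  -- 4. drop one word per run, then flatten
  where
    -- 1. keep only ASCII letters and spaces
    cleaned = filter (\c -> (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == ' ') s
    -- 2. split into words and group adjacent equal words
    runs = group (words cleaned)
    -- 3. keep only runs of at least two equal words
    repeatedRuns = filter (\g -> length g > 1) runs
```

Because `tail` removes exactly one word per run, a run of three "name"s contributes two, which matches the expected `["name","name"]` in the question.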

#3

This might be inelegant, but it is conceptually very simple. I'm assuming that it's looking for consecutive duplicate words, as in the examples.

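import Data.Char (isAlpha)

-- a wrapper that allows you to give the input as a String;
-- non-letters are stripped from each word so that "packed." matches "packed"
repetitions :: String -> [String]
repetitions s = repetitionsLogic (filter (not . null) (map (filter isAlpha) (words s)))
-- does the real work: keep each word that equals the word after it
repetitionsLogic :: [String] -> [String]
repetitionsLogic [] = []
repetitionsLogic [_] = []
repetitionsLogic (a:as)
    | a == head as = a : repetitionsLogic as
    | otherwise    = repetitionsLogic as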
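
A variant sketch (not the answerer's code; `repetitionsPairs` is a hypothetical name) that matches the first two words directly in the pattern, which avoids calling the partial function `head` and folds the empty and single-word cases into one catch-all clause:

```haskell
import Data.Char (isAlpha)

-- Hypothetical variant: compare the first two words via pattern matching.
repetitionsPairs :: String -> [String]
repetitionsPairs = go . filter (not . null) . map (filter isAlpha) . words
  where
    go (a:b:rest)
      | a == b    = a : go (b:rest)
      | otherwise = go (b:rest)
    go _ = []  -- zero or one word left: nothing more to compare
```

The recursion still passes `b:rest` along, so a run like "name name name" yields two matches, the same as the original.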

#4

Building on what Alexander Prokofyev answered:

repetitions x = concat (map tail (filter (\x -> (length x) > 1) (Data.List.group (words (filter (\c -> (c >= 'a' && c <= 'z') || (c>='A' && c <= 'Z') || c==' ') x)))))

Remove unnecessary parentheses:

repetitions x = concat (map tail (filter (\x -> length x > 1) (Data.List.group (words (filter (\c -> c >= 'a' && c <= 'z' || c>='A' && c <= 'Z' || c==' ') x)))))

Use $ to remove more parentheses (each $ can replace an opening parenthesis if the matching closing parenthesis is at the end of the expression):

repetitions x = concat $ map tail $ filter (\x -> length x > 1) $ Data.List.group $ words $ filter (\c -> c >= 'a' && c <= 'z' || c>='A' && c <= 'Z' || c==' ') x
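
The rewrite works because `$` is declared `infixr 0`: it has the lowest precedence and associates to the right, so `f $ g $ h x` parses as `f (g (h x))`. A small check of that equivalence on the last three stages of this pipeline (the names `withParens` and `withDollar` are illustrative):

```haskell
-- Parenthesised form of the last three stages.
withParens :: [[String]] -> [String]
withParens groups = concat (map tail (filter (\g -> length g > 1) groups))

-- Same stages with each opening parenthesis replaced by `$`.
withDollar :: [[String]] -> [String]
withDollar groups = concat $ map tail $ filter (\g -> length g > 1) groups
```

Both give `["b","b","c"]` for the input `[["a"],["b","b","b"],["c","c"]]`.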

Replace the character ranges with functions from Data.Char, and merge concat and map into concatMap:

repetitions x = concatMap tail $ filter (\x -> length x > 1) $ Data.List.group $ words $ filter (\c -> isAlpha c || isSeparator c) x
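
The Data.Char predicates are not an exact drop-in for the ASCII ranges: `isAlpha` accepts any Unicode letter, and `isSeparator` accepts Unicode space characters (such as the no-break space) but not control characters such as newline or tab. A quick sketch comparing the two filters (`rangeFilter` and `charFilter` are illustrative names):

```haskell
import Data.Char (isAlpha, isSeparator)

-- The hand-written ASCII ranges from the earlier step.
rangeFilter :: String -> String
rangeFilter = filter (\c -> c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c == ' ')

-- The Data.Char version: Unicode letters plus Unicode space separators.
charFilter :: String -> String
charFilter = filter (\c -> isAlpha c || isSeparator c)
```

For `"caf\233 au lait."` (that is, "café au lait."), `rangeFilter` drops the accented letter while `charFilter` keeps it; both drop the period.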

Use a section (a partially applied operator, made possible by currying) in point-free style to simplify (\x -> length x > 1) to ((>1) . length). This composes length with the section (>1) in a right-to-left pipeline.

repetitions x = concatMap tail $ filter ((>1) . length) $ Data.List.group $ words $ filter (\c -> isAlpha c || isSeparator c) x
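
The section `(>1)` stands for `\n -> n > 1`, so `(>1) . length` is `\xs -> length xs > 1`. A small check that the two forms agree (`longerThanOne` and `longerThanOnePF` are illustrative names):

```haskell
-- The lambda from the earlier step.
longerThanOne :: [a] -> Bool
longerThanOne = \x -> length x > 1

-- Its point-free rewrite: apply length first, then the section (> 1).
longerThanOnePF :: [a] -> Bool
longerThanOnePF = (> 1) . length
```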

Eliminate the explicit "x" variable to make the whole expression point-free:

repetitions = concatMap tail . filter ((>1) . length) . Data.List.group . words . filter (\c -> isAlpha c || isSeparator c)

Now the entire function, read from right to left, is a pipeline that keeps only letter or separator characters, splits the result into words, groups adjacent equal words, keeps the groups with more than one element, and then drops the first element of each remaining group and concatenates the rest, so a run of n equal words contributes n - 1 copies.
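
For reference, a self-contained module version of the final point-free definition, with the imports it needs spelled out (importing `group` unqualified instead of writing `Data.List.group`):

```haskell
import Data.Char (isAlpha, isSeparator)
import Data.List (group)

-- Read right to left: keep letters and separators, split into words,
-- group adjacent equal words, keep groups with more than one word,
-- then drop the first word of each group and concatenate the rest.
repetitions :: String -> [String]
repetitions = concatMap tail . filter ((> 1) . length) . group . words . filter (\c -> isAlpha c || isSeparator c)
```

Loaded into GHCi, `repetitions "My name  name name is Sean ."` gives `["name","name"]`, matching the question.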

#1