How do I split a string in Haskell?

Time: 2021-09-10 21:42:14

Is there a standard way to split a string in Haskell?

lines and words work great for splitting on a space or newline, but surely there is a standard way to split on a comma? I couldn't find it on Hoogle.

To be specific, I'm looking for something where split "," "my,comma,separated,list" returns ["my","comma","separated","list"]

Thanks.

12 answers

#1


109  

There is a package for this called split.

cabal install split

Use it like this:

ghci> import Data.List.Split
ghci> splitOn "," "my,comma,separated,list"
["my","comma","separated","list"]

It comes with many other functions, for example for splitting on matching delimiters or on any of several delimiters.

#2


138  

Remember that you can look up the definition of Prelude functions!

http://www.haskell.org/onlinereport/standard-prelude.html

Looking there, the definition of words is:

words   :: String -> [String]
words s =  case dropWhile Char.isSpace s of
                      "" -> []
                      s' -> w : words s''
                            where (w, s'') = break Char.isSpace s'

So, change it into a function that takes a predicate:

wordsWhen     :: (Char -> Bool) -> String -> [String]
wordsWhen p s =  case dropWhile p s of
                      "" -> []
                      s' -> w : wordsWhen p s''
                            where (w, s'') = break p s'

Then call it with whatever predicate you want!

main = print $ wordsWhen (==',') "break,this,string,at,commas"
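
Note that wordsWhen, like words, collapses runs of delimiters and never returns an empty field. If empty fields matter (as in CSV-like data), a variant along the following lines might help. This is only a sketch; the name splitWhenKeep is made up for illustration and is not a library function.

```haskell
-- Hypothetical variant of wordsWhen that keeps empty fields.
splitWhenKeep :: (Char -> Bool) -> String -> [String]
splitWhenKeep p s = case break p s of
  (w, [])       -> [w]                       -- no delimiter left: this is the last field
  (w, _ : rest) -> w : splitWhenKeep p rest  -- skip exactly one delimiter and continue
```

With this definition, splitWhenKeep (== ',') "a,,b," evaluates to ["a","","b",""], whereas wordsWhen (== ',') gives ["a","b"] for the same input.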

#3


22  

If you use Data.Text, there is splitOn:

http://hackage.haskell.org/packages/archive/text/0.11.2.0/doc/html/Data-Text.html#v:splitOn

Data.Text is included in the Haskell Platform.

So for instance:

import qualified Data.Text as T
main = print $ T.splitOn (T.pack " ") (T.pack "this is a test")

or:

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
main = print $ T.splitOn " " "this is a test"
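
If the rest of the program works with String rather than Text, the conversion can be hidden in a small wrapper. The following is a sketch under that assumption; splitOnStr is a hypothetical name, not part of the text package.

```haskell
import qualified Data.Text as T

-- Hypothetical helper: split a String on a non-empty String delimiter
-- by going through Data.Text. T.splitOn raises an error on an empty delimiter.
splitOnStr :: String -> String -> [String]
splitOnStr delim = map T.unpack . T.splitOn (T.pack delim) . T.pack
```

For example, splitOnStr "," "my,comma,separated,list" evaluates to ["my","comma","separated","list"]. Packing and unpacking copies the data, so for large inputs it is probably better to stay in Text throughout.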

#4


17  

In the module Text.Regex (part of the Haskell Platform), there is a function:

splitRegex :: Regex -> String -> [String]

which splits a string based on a regular expression. The API can be found at Hackage.

#5


11  

Use Data.List.Split, which comes from the split package:

[me@localhost]$ ghci
Prelude> import Data.List.Split
Prelude Data.List.Split> let l = splitOn "," "1,2,3,4"
Prelude Data.List.Split> :t l
l :: [[Char]]
Prelude Data.List.Split> l
["1","2","3","4"]
Prelude Data.List.Split> let { convert :: [String] -> [Integer]; convert = map read }
Prelude Data.List.Split> let l2 = convert l
Prelude Data.List.Split> :t l2
l2 :: [Integer]
Prelude Data.List.Split> l2
[1,2,3,4]

#6


10  

Try this one:

import Data.List (unfoldr)

separateBy :: Eq a => a -> [a] -> [[a]]
separateBy chr = unfoldr sep where
  sep [] = Nothing
  sep l  = Just . fmap (drop 1) . break (== chr) $ l

Only works for a single char, but should be easily extendable.
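
One possible way to extend separateBy to a delimiter of several elements is sketched below. The name separateOn is hypothetical, and the sketch assumes the delimiter is non-empty (an empty delimiter would loop forever).

```haskell
import Data.List (isPrefixOf, unfoldr)

-- Hypothetical extension of separateBy to a multi-element delimiter.
-- Assumes the delimiter is non-empty. Like separateBy, a trailing
-- delimiter does not produce a final empty field.
separateOn :: Eq a => [a] -> [a] -> [[a]]
separateOn delim = unfoldr sep
  where
    sep [] = Nothing
    sep l  = Just (go l)
    -- Collect elements until the delimiter is a prefix of what remains,
    -- then drop the delimiter and hand back the rest.
    go [] = ([], [])
    go rest@(x : xs)
      | delim `isPrefixOf` rest = ([], drop (length delim) rest)
      | otherwise               = let (w, r) = go xs in (x : w, r)
```

For example, separateOn " and " "this and is and a and test" evaluates to ["this","is","a","test"].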

#7


8  

split :: Eq a => a -> [a] -> [[a]]
split d [] = []
split d s = x : split d (drop 1 y) where (x,y) = span (/= d) s

E.g.

split ';' "a;bb;ccc;;d"
> ["a","bb","ccc","","d"]

A single trailing delimiter will be dropped:

split ';' "a;bb;ccc;;d;"
> ["a","bb","ccc","","d"]

#8


5  

I started learning Haskell yesterday, so correct me if I'm wrong but:

split :: Eq a => a -> [a] -> [[a]]
split x y = func x y [[]]
    where
        func x [] z = reverse $ map (reverse) z
        func x (y:ys) (z:zs) = if y==x then 
            func x ys ([]:(z:zs)) 
        else 
            func x ys ((y:z):zs)

gives:

*Main> split ' ' "this is a test"
["this","is","a","test"]

or maybe you wanted

*Main> splitWithStr  " and " "this and is and a and test"
["this","is","a","test"]

which would be:

splitWithStr :: Eq a => [a] -> [a] -> [[a]]
splitWithStr x y = func x y [[]]
    where
        func x [] z = reverse $ map (reverse) z
        func x (y:ys) (z:zs) = if (take (length x) (y:ys)) == x then
            func x (drop (length x) (y:ys)) ([]:(z:zs))
        else
            func x ys ((y:z):zs)

#9


5  

I don't know how to add a comment to Steve's answer, but I would like to recommend the GHC libraries documentation, and in there specifically the sublist functions in Data.List.

It is a much better reference than the plain Haskell report.

More generally, a fold with a rule for when to start a new sublist should solve it too.
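
A minimal sketch of that fold idea, assuming a single-element delimiter; the name splitFold is made up for illustration:

```haskell
-- Hypothetical fold-based splitter: start a new sublist at each delimiter.
splitFold :: Eq a => a -> [a] -> [[a]]
splitFold d = foldr step [[]]
  where
    step x acc@(cur : done)
      | x == d    = [] : acc          -- delimiter: open a new, empty sublist
      | otherwise = (x : cur) : done  -- otherwise extend the current sublist
    step _ []     = [[]]              -- unreachable: the accumulator is never empty
```

For example, splitFold ';' "a;bb;;d" evaluates to ["a","bb","","d"]. Unlike some of the other answers, this version keeps a trailing empty field, so splitFold ';' "a;" gives ["a",""].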

#10


2  

In addition to the efficient, pre-built functions given in other answers, I'll add my own, which are simply part of a repertoire of Haskell functions I wrote while learning the language in my own time:

-- Correct but inefficient implementation
wordsBy :: String -> Char -> [String]
wordsBy s c = reverse (go s []) where
    go s' ws = case dropWhile (== c) s' of
        ""   -> ws
        rest -> go (dropWhile (/= c) rest) (takeWhile (/= c) rest : ws)

-- Splits by a predicate to allow for more complex conditions, e.g. (\c -> c == ',' || c == ';')
wordsByF :: String -> (Char -> Bool) -> [String]
wordsByF s f = reverse (go s []) where
    go s' ws = case dropWhile f s' of
        ""   -> ws
        rest -> go (dropWhile (not . f) rest) (takeWhile (not . f) rest : ws)

The solutions are at least tail-recursive so they won't incur a stack overflow.

#11


1  

Example in GHCi:

ghci> import qualified Text.Regex as R
ghci> R.splitRegex (R.mkRegex "x") "2x3x777"
["2","3","777"]

#12


0  

Without importing anything, you can substitute a space for the delimiter character, since the separator that words splits on is whitespace. Something like:

words [if c == ',' then ' ' else c|c <- "my,comma,separated,list"]

or

words $ let f ',' = ' '; f c = c in map f "my,comma,separated,list"

You can make this into a function with parameters. Instead of a single character to match, you can match any of several, as in:

words [if elem c ";,.:-+@!$#?" then ' ' else c | c <- "my,comma;separated!list"]
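
Turned into a function with parameters, the idea might look like the sketch below. The name splitOnAny is hypothetical. One caveat of this approach: whitespace already present in the input also acts as a separator, and empty fields are dropped, because words does the actual splitting.

```haskell
-- Hypothetical helper: replace every character in delims with a space,
-- then let words do the splitting.
-- Caveat: existing whitespace in the input also separates fields,
-- and empty fields are dropped.
splitOnAny :: [Char] -> String -> [String]
splitOnAny delims s = words [if c `elem` delims then ' ' else c | c <- s]
```

For example, splitOnAny ";,.:-+@!$#?" "my,comma;separated!list" evaluates to ["my","comma","separated","list"].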
