How to split a string in Haskell?

163

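Is there a standard way to split a string in Haskell?

lines and words work well for splitting on spaces or newlines, but surely there is a standard way to split on a comma?

I couldn't find one on Hoogle.

To be specific, I'm looking for something where split "," "my,comma,separated,list" returns ["my","comma","separated","list"].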

string haskell

— Eric Wilson

21

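I would really like to see such a function in a future release of Data.List or even the Prelude. It is so common, and it is a nuisance when it is not available for code golf.

— 2011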

135

There is a package for this called split.

cabal install split

Use it like this:

ghci> import Data.List.Split
ghci> splitOn "," "my,comma,separated,list"
["my","comma","separated","list"]

It comes with many other functions for splitting on matching delimiters or on several delimiters.

— Jonno_FTW

9

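Cool. I wasn't aware of this package. This is the ultimate split package, as it gives a lot of control over the operation (trimming spaces in the results, leaving separators in the result, removing consecutive separators, and so on). There are so many ways of splitting lists that no single split function could cover every need; you really do need that kind of package.

— gawi, 2011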

1

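Otherwise, if external packages are acceptable, MissingH also provides a split function: hackage.haskell.org/packages/archive/MissingH/1.2.0.0/doc/html/… That package also provides plenty of other "nice-to-have" functions, and I find that quite a few packages depend on it.

— Emmanuel Touzery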

41

As of the most recent release, the split package is now part of the Haskell Platform.

— The Internet

14

import Data.List.Split (splitOn) and go to town. splitOn :: Eq a => [a] -> [a] -> [[a]]

— The Internet

1

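@RussAbbott the split package is included in the Haskell Platform when you download it (haskell.org/platform/contents.html), but it is not automatically loaded when building your project. Add split to the build-depends list in your cabal file. For example, if your project is called hello, then in the hello.cabal file, below the executable hello line, put a line like "  build-depends: base, split" (note the two-space indent). Then build with the cabal build command. Cf. haskell.org/cabal/users-guide/…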

— expz

164

Remember that you can look up the definition of Prelude functions!

http://www.haskell.org/onlinereport/standard-prelude.html

Looking there, the definition of words is:

words   :: String -> [String]
words s =  case dropWhile Char.isSpace s of
                      "" -> []
                      s' -> w : words s''
                            where (w, s'') = break Char.isSpace s'

So, change it into a function that takes a predicate:

wordsWhen     :: (Char -> Bool) -> String -> [String]
wordsWhen p s =  case dropWhile p s of
                      "" -> []
                      s' -> w : wordsWhen p s''
                            where (w, s'') = break p s'

Then call it with whatever predicate you want!

main = print $ wordsWhen (==',') "break,this,string,at,commas"
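
One behaviour worth noting: like words, wordsWhen skips runs of delimiters, so empty fields disappear. If empty fields matter (for example in CSV-like data), a variant built directly on break keeps them. The following is only a sketch; the name splitWhenKeep is made up for illustration, and wordsWhen is repeated so the block stands alone.

```haskell
-- wordsWhen as defined above: consecutive delimiters are collapsed.
wordsWhen :: (Char -> Bool) -> String -> [String]
wordsWhen p s = case dropWhile p s of
  "" -> []
  s' -> w : wordsWhen p s''
    where (w, s'') = break p s'

-- Hypothetical variant: every delimiter ends a field, so empty
-- fields (including a trailing one) are preserved.
splitWhenKeep :: (Char -> Bool) -> String -> [String]
splitWhenKeep p s = case break p s of
  (field, [])       -> [field]                      -- no delimiter left
  (field, _ : rest) -> field : splitWhenKeep p rest -- drop the delimiter

-- wordsWhen     (== ',') "a,,b,"  gives ["a","b"]
-- splitWhenKeep (== ',') "a,,b,"  gives ["a","","b",""]
```

Which of the two is appropriate depends on whether "a,,b" should be read as two fields or three.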

— Steve

31

If you use Data.Text, there is splitOn:

http://hackage.haskell.org/packages/archive/text/0.11.2.0/doc/html/Data-Text.html#v:splitOn

This is included in the Haskell Platform.

So, for instance:

import qualified Data.Text as T
main = print $ T.splitOn (T.pack " ") (T.pack "this is a test")

or:

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
main = print $ T.splitOn " " "this is a test"

— Emmanuel Touzery

1

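@RussAbbott you probably need a dependency on the text package, or you need to install it. That would belong in another question, though.

— Emmanuel Touzery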

Couldn't match type 'T.Text' with 'Char' Expected type: [Char] Actual type: [T.Text]
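
An error of this shape is consistent with using the result of T.splitOn, which is a list of Text values, where a String is expected. A minimal sketch of converting at both ends, assuming the text package is available; the helper name splitOnStr is hypothetical:

```haskell
import qualified Data.Text as T

-- Hypothetical wrapper: take and return ordinary Strings, doing the
-- actual splitting with Data.Text.splitOn.
-- Note: T.splitOn raises an error if the delimiter is empty.
splitOnStr :: String -> String -> [String]
splitOnStr delim = map T.unpack . T.splitOn (T.pack delim) . T.pack

-- splitOnStr "," "my,comma,separated,list"
--   gives ["my","comma","separated","list"]
```

If the rest of the program already works with Text, it is usually simpler to stay in Text and skip the pack and unpack steps.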

— Andrew Koster

19

In the Text.Regex module (part of the Haskell Platform), there is a function:

splitRegex :: Regex -> String -> [String]

that splits a string based on a regular expression. The API can be found on Hackage.

— evilcandybag

Could not find module ‘Text.Regex’ Perhaps you meant Text.Read (from base-4.10.1.0)

— Andrew Koster

18

Use Data.List.Split, which is provided by the split package:

[me@localhost]$ ghci
Prelude> import Data.List.Split
Prelude Data.List.Split> let l = splitOn "," "1,2,3,4"
Prelude Data.List.Split> :t l
l :: [[Char]]
Prelude Data.List.Split> l
["1","2","3","4"]
Prelude Data.List.Split> let { convert :: [String] -> [Integer]; convert = map read }
Prelude Data.List.Split> let l2 = convert l
Prelude Data.List.Split> :t l2
l2 :: [Integer]
Prelude Data.List.Split> l2
[1,2,3,4]
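
Note that map read throws a runtime error on a field that is not a valid number. A hedged sketch of a safer conversion using readMaybe from base; the names splitOnChar and parseInts are made up for illustration, and the small splitter is included only so the block does not depend on the split package:

```haskell
import Text.Read (readMaybe)

-- Minimal single-character splitter (keeps empty fields).
splitOnChar :: Char -> String -> [String]
splitOnChar d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOnChar d rest

-- Parse a comma-separated list of integers.
-- Returns Nothing if any field fails to parse, instead of crashing.
parseInts :: String -> Maybe [Integer]
parseInts = traverse readMaybe . splitOnChar ','

-- parseInts "1,2,3,4"  gives Just [1,2,3,4]
-- parseInts "1,x,3"    gives Nothing
```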

— antimatter

14

Try this one:

import Data.List (unfoldr)

separateBy :: Eq a => a -> [a] -> [[a]]
separateBy chr = unfoldr sep where
  sep [] = Nothing
  sep l  = Just . fmap (drop 1) . break (== chr) $ l

It only works for a single character, but it should be easy to extend.
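
One possible extension to a multi-element delimiter, written with explicit recursion rather than unfoldr, is sketched below. The name splitOnList is hypothetical, and returning the input unsplit for an empty delimiter is an arbitrary choice made here to avoid an infinite loop. Unlike separateBy, this version keeps a trailing empty field.

```haskell
import Data.List (isPrefixOf)

-- Split a list on every occurrence of a (non-empty) delimiter list.
splitOnList :: Eq a => [a] -> [a] -> [[a]]
splitOnList []    s = [s]  -- assumption: an empty delimiter means "do not split"
splitOnList delim s = go s
  where
    n = length delim

    go xs = case breakOnDelim xs of
      (chunk, Nothing)   -> [chunk]          -- no more delimiters
      (chunk, Just rest) -> chunk : go rest  -- continue after the delimiter

    -- Collect elements up to the next delimiter; Just holds what follows it.
    breakOnDelim [] = ([], Nothing)
    breakOnDelim xs@(y : ys)
      | delim `isPrefixOf` xs = ([], Just (drop n xs))
      | otherwise =
          let (chunk, rest) = breakOnDelim ys
          in  (y : chunk, rest)

-- splitOnList " and " "this and is and a and test"
--   gives ["this","is","a","test"]
```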

— fuz

10

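Without importing anything, you can simply substitute a space for the separator character, since whitespace is what words splits on. Something like:

words [if c == ',' then ' ' else c|c <- "my,comma,separated,list"]

or

words $ let f ',' = ' '; f c = c in map f "my,comma,separated,list"

You can make this into a function with parameters. You can also drop the single character-to-match parameter by matching many characters at once, as in:

[if elem c ";,.:-+@!$#?" then ' ' else c|c <-"my,comma;separated!list"]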
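
Wrapped up as a function, the same idea might look like the sketch below. The name splitOnAny and its argument order are assumptions made for illustration. Because it relies on words, it also splits on any whitespace already present in the input and drops empty fields.

```haskell
-- Replace every delimiter character with a space, then let words
-- do the splitting.
splitOnAny :: [Char] -> String -> [String]
splitOnAny delims s = words [if c `elem` delims then ' ' else c | c <- s]

-- splitOnAny ",;!" "my,comma;separated!list"
--   gives ["my","comma","separated","list"]
-- splitOnAny "," "hello world,foo"
--   gives ["hello","world","foo"]   (the existing space also splits)
```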

— fp_mora

9

split :: Eq a => a -> [a] -> [[a]]
split d [] = []
split d s = x : split d (drop 1 y) where (x,y) = span (/= d) s

For example:

split ';' "a;bb;ccc;;d"
> ["a","bb","ccc","","d"]

A single trailing delimiter is dropped:

split ';' "a;bb;ccc;;d;"
> ["a","bb","ccc","","d"]

— Frank Meisschaert

6

I started learning Haskell yesterday, so correct me if I'm wrong, but:

split :: Eq a => a -> [a] -> [[a]]
split x y = func x y [[]]
    where
        func x [] z = reverse $ map (reverse) z
        func x (y:ys) (z:zs) = if y==x then 
            func x ys ([]:(z:zs)) 
        else 
            func x ys ((y:z):zs)

gives:

*Main> split ' ' "this is a test"
["this","is","a","test"]

or maybe you wanted

*Main> splitWithStr  " and " "this and is and a and test"
["this","is","a","test"]

which would be:

splitWithStr :: Eq a => [a] -> [a] -> [[a]]
splitWithStr x y = func x y [[]]
    where
        func x [] z = reverse $ map (reverse) z
        func x (y:ys) (z:zs) = if (take (length x) (y:ys)) == x then
            func x (drop (length x) (y:ys)) ([]:(z:zs))
        else
            func x ys ((y:z):zs)

— Robin Begbie

1

I was looking for a built-in split, having been spoiled by languages with well-developed libraries. But thanks anyway.

— Eric Wilson

3

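You wrote this in June, so I assume you have moved on in your journey :) As an exercise, try rewriting this function without reverse or length, since using these functions incurs an algorithmic complexity penalty and also prevents application to an infinite list. Have fun!

— Tony Morris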

5

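I don't know how to add a comment to Steve's answer, but I would like to recommend the GHC libraries documentation, and in there specifically the Sublist functions in Data.List.

It is much better as a reference than just reading the plain Haskell report.

More generally, a fold with a rule for when to start a new sublist should solve it too.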
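
As a rough sketch of the fold idea (the name splitFold is hypothetical): a right fold that starts a new sublist whenever it meets the delimiter and otherwise prepends the element to the current one. The lazy pattern ~(w:ws) is what lets results be produced incrementally, so this version needs neither reverse nor length and also works on infinite input.

```haskell
-- Split on a single delimiter element using foldr.
-- The accumulator is never empty: it starts as [[]] and every step
-- returns a non-empty list, so the lazy pattern always matches.
splitFold :: Eq a => a -> [a] -> [[a]]
splitFold d = foldr step [[]]
  where
    step c ~(w : ws)
      | c == d    = [] : w : ws   -- delimiter: open a new (empty) sublist
      | otherwise = (c : w) : ws  -- ordinary element: extend current sublist

-- splitFold ';' "a;bb;;c"                 gives ["a","bb","","c"]
-- take 3 (splitFold ',' (cycle "ab,"))    gives ["ab","ab","ab"]
```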

— Evi1M4chine

2

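In addition to the efficient, pre-built functions given in the other answers, I'll add my own, which are simply part of a repertoire of Haskell functions I wrote while learning the language in my own time: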

-- Correct but inefficient implementation
wordsBy :: String -> Char -> [String]
wordsBy s c = reverse (go s []) where
    go s' ws = case (dropWhile (\c' -> c' == c) s') of
        "" -> ws
        rem -> go ((dropWhile (\c' -> c' /= c) rem)) ((takeWhile (\c' -> c' /= c) rem) : ws)

-- Breaks up by predicate function to allow for more complex conditions (\c -> c == ',' || c == ';')
wordsByF :: String -> (Char -> Bool) -> [String]
wordsByF s f = reverse (go s []) where
    go s' ws = case ((dropWhile (\c' -> f c')) s') of
        "" -> ws
        rem -> go ((dropWhile (\c' -> (f c') == False)) rem) (((takeWhile (\c' -> (f c') == False)) rem) : ws)

These solutions are at least tail-recursive, so they won't cause a stack overflow.

— Irfan Hamid

2

An example in ghci:

> import qualified Text.Regex as R
> R.splitRegex (R.mkRegex "x") "2x3x777"
["2","3","777"]

— Andrey

1

Please don't use regular expressions to split strings. Thank you.

— kirelagin

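@kirelagin, why this comment? I'm learning Haskell, and I'd like to know the rationale behind it.

— Enrico Maria De Angelis

@Andrey, is there a reason why I can't even run the first line in my ghci?

— Enrico Maria De Angelis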

1

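@EnricoMariaDeAngelis Regular expressions are a powerful tool for string matching. It makes sense to use them when you are matching something non-trivial. If you just want to split a string on something as trivial as another fixed string, there is absolutely no need for regular expressions: they will only make the code more complex and, likely, slower.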

— kirelagin

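"Please don't use regular expressions to split strings." WTF, why not??? Splitting a string with a regular expression is a perfectly reasonable thing to do. There are plenty of cases where a string needs to be split but the delimiter isn't always exactly the same.

— Andrew Koster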

2

I find this simpler to follow:

split :: Char -> String -> [String]
split c xs = case break (==c) xs of 
  (ls, "") -> [ls]
  (ls, x:rs) -> ls : split c rs

— mxs