Processing JSON Data in Haskell

JSON has four basic value types (strings, numbers, Booleans, and null) and two compound types (objects and arrays). We define a JSON type as follows:

-- file: Json.hs

module Json where

data JValue = JString String

           |  JNumber Double

           |  JBool Bool

           |  JNull

           |  JObject [(String, JValue)]

           |  JArray [JValue]

              deriving(Eq, Ord, Show)
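To get a feel for the type, here is a small sketch that builds a value and pulls pieces back out by pattern matching. The names sample, getString and getField are hypothetical, introduced only for this illustration; the JValue declaration is repeated so that the snippet loads on its own.

```haskell
-- Sketch only: sample, getString and getField are hypothetical names,
-- not part of Json.hs.
data JValue = JString String
            | JNumber Double
            | JBool Bool
            | JNull
            | JObject [(String, JValue)]
            | JArray [JValue]
              deriving (Eq, Ord, Show)

-- A nested sample value built from the constructors.
sample :: JValue
sample = JObject [ ("name", JString "demo")
                 , ("tags", JArray [JString "a", JNull])
                 ]

-- Extract the String from a JString; any other constructor yields Nothing.
getString :: JValue -> Maybe String
getString (JString s) = Just s
getString _           = Nothing

-- Look up a key in a JObject; values that are not objects have no fields.
getField :: String -> JValue -> Maybe JValue
getField k (JObject kvs) = lookup k kvs
getField _ _             = Nothing
```

Loaded into GHCi, getField "name" sample >>= getString evaluates to Just "demo", while getField "missing" sample evaluates to Nothing.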

Printing JSON data

-- file: PutJson.hs

module PutJson where

import Data.List (intercalate)

import Json

renderJValue :: JValue -> String

renderJValue (JString s) = show s

renderJValue (JNumber n) = show n

renderJValue (JBool True) = "true"

renderJValue (JBool False) = "false"

renderJValue JNull         = "null"

renderJValue (JObject o) = "{" ++ pairs o ++ "}"

    where pairs [] = ""

          pairs ps = intercalate "," (map renderPair ps)

          renderPair (k, v) = show k ++ ":" ++ renderJValue v

renderJValue (JArray a) = "[" ++ values a ++ "]"

    where values [] = ""

          values vs = intercalate "," (map renderJValue vs)

putJValue :: JValue -> IO ()

putJValue v = putStrLn (renderJValue v)

Finally, write the file that calls it:

-- file: Main.hs

module Main where

import Json

import PutJson

main :: IO ()

main = putJValue (JObject [("foo", JNumber 1), ("bar", JBool False)])

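Enter the following command at the command line:

ghc -o demo Json.hs PutJson.hs Main.hs

This produces an executable (demo.exe on Windows). Running demo at the command line prints the rendered JValue. When the value has a complicated structure, though, output in this form is hard for a person to read. How can we print it in a friendlier way?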
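To see concretely what the compact renderer produces, the following single-file sketch repeats the definitions from Json.hs and PutJson.hs (slightly condensed, relying on intercalate returning "" for an empty list) and defines two test values. The names flat and nested are made up for this illustration.

```haskell
import Data.List (intercalate)

data JValue = JString String
            | JNumber Double
            | JBool Bool
            | JNull
            | JObject [(String, JValue)]
            | JArray [JValue]
              deriving (Eq, Ord, Show)

-- Compact renderer: no spaces, no line breaks.
renderJValue :: JValue -> String
renderJValue (JString s)   = show s
renderJValue (JNumber n)   = show n
renderJValue (JBool True)  = "true"
renderJValue (JBool False) = "false"
renderJValue JNull         = "null"
renderJValue (JObject o)   = "{" ++ intercalate "," (map renderPair o) ++ "}"
  where renderPair (k, v) = show k ++ ":" ++ renderJValue v
renderJValue (JArray a)    = "[" ++ intercalate "," (map renderJValue a) ++ "]"

-- The value used in Main.hs.
flat :: JValue
flat = JObject [("foo", JNumber 1), ("bar", JBool False)]

-- A hypothetical nested value.
nested :: JValue
nested = JObject [("xs", JArray [JNumber 1, JNull, JObject [("k", JString "v")]])]
```

In GHCi, putStrLn (renderJValue flat) prints {"foo":1.0,"bar":false} and putStrLn (renderJValue nested) prints {"xs":[1.0,null,{"k":"v"}]}. Everything lands on a single line, however deep the nesting goes.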

Suppose we have a module Prettify whose API turns a JValue into a Doc rather than the String used above. We rewrite the render function as follows:

-- file: PrettyJson.hs

renderJValue :: JValue -> Doc

renderJValue (JBool True)  = text "true"

renderJValue (JBool False) = text "false"

renderJValue JNull         = text "null"

renderJValue (JNumber num) = double num

renderJValue (JString str) = string str

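Here the Doc type and the text and double functions are provided by our Prettify module. The string function is defined in PrettyJson.hs on top of that API.

First, here is the definition of string: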

-- file: PrettyJson.hs

string :: String -> Doc

string = enclose '"' '"' . hcat . map oneChar

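This definition is written in point-free style. Function application normally associates from left to right, whereas a chain of compositions is applied from right to left. Written out with an explicit argument, the definition is equivalent to: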

-- file: PrettyJson.hs

string :: String -> Doc

string s = enclose '"' '"' (hcat (map oneChar s))    -- (1)
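The same rewrite can be tried on ordinary list functions, independently of Doc. In this sketch the names shout and shout' are invented for illustration; the two definitions are meant to behave identically.

```haskell
import Data.Char (toUpper)

-- Point-free: a pipeline of functions, applied from right to left.
shout :: String -> String
shout = (++ "!") . map toUpper . filter (/= ' ')

-- Pointed: the same pipeline with the argument s written out.
shout' :: String -> String
shout' s = (++ "!") (map toUpper (filter (/= ' ') s))
```

In GHCi, both shout "a b c" and shout' "a b c" evaluate to "ABC!".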

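As you can see, this style drops the argument s and replaces the parentheses with dots. The definition uses several other functions, which we now take apart one by one. The enclose function wraps a Doc value with one character in front and one character behind. It is defined as follows: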

-- file: PrettyJson.hs

enclose :: Char -> Char -> Doc -> Doc

enclose left right x = char left <> x <> char right

char is a function that converts a Char into a Doc. We do not know its body yet, so for now we stub it out with undefined. Its signature is:

-- file: PrettyJson.hs

char :: Char -> Doc

char c = undefined

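<> is an infix function: written between its two operands, it acts as an operator. Function application binds more tightly than any operator, so the applications of char are evaluated first. In the body of enclose, char converts the Char arguments left and right into two Doc values, which are then joined to the third Doc value x by the two <> operators. It follows that <> takes two Doc values and yields another Doc. In other words, <> plays the same role for Doc that (++) plays for String. On GHC versions whose Prelude already exports an operator named <>, the file also needs import Prelude hiding ((<>)) at the top so that the two do not clash. The signature is: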

-- file: PrettyJson.hs

(<>) :: Doc -> Doc -> Doc

a <> b = undefined

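Looking back at (1), we can now see that string wraps the Doc value (hcat (map oneChar s)) in double quotes ("). Next comes hcat, the Doc counterpart of concat on lists: it joins a whole list of Doc values into one, whereas <> joins exactly two. Its signature is: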

-- file: PrettyJson.hs

hcat :: [Doc] -> Doc

hcat xs = undefined
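Until the real Prettify module exists, the stubs can be swapped for a throwaway implementation so that enclose and hcat can be exercised. The sketch below assumes that a Doc is nothing more than a wrapper around String. That is an assumption made purely for illustration and not how a real pretty printer would represent documents. The sketch hides the Prelude's own <> before defining a new one.

```haskell
import Prelude hiding ((<>))

-- Assumption for illustration only: a Doc is a plain String in disguise.
newtype Doc = Doc String deriving (Eq, Show)

text :: String -> Doc
text = Doc

char :: Char -> Doc
char c = Doc [c]

-- Join two documents, like (++) on strings.
(<>) :: Doc -> Doc -> Doc
Doc a <> Doc b = Doc (a ++ b)

-- Join a whole list of documents, like concat on lists.
hcat :: [Doc] -> Doc
hcat = foldr (<>) (Doc "")

-- Same definition as in PrettyJson.hs.
enclose :: Char -> Char -> Doc -> Doc
enclose left right x = char left <> x <> char right
```

With these stand-ins loaded in GHCi, enclose '[' ']' (hcat [text "a", char 'b']) evaluates to Doc "[ab]".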

Finally, map applies oneChar to every element of the list s, so all that remains is to see what oneChar does. Since s is a String, oneChar must turn a Char into a Doc. Its full definition is:

-- file: PrettyJson.hs

oneChar :: Char -> Doc

oneChar c = case lookup c simpleEscapes of

              Just r -> text r

              Nothing | mustEscape c -> hexEscape c

                      | otherwise    -> char c

    where mustEscape c = c < ' ' || c == '\x7f' || c > '\xff'

simpleEscapes :: [(Char, String)]

simpleEscapes = zipWith ch "\b\n\f\r\t\\\"/" "bnfrt\\\"/"

    where ch a b = (a, ['\\', b])

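This definition is somewhat long, so let us go through it piece by piece. First, simpleEscapes is a list of pairs. The first component of each pair is a character that has a short escape (\b, \n, \f, \r, \t, \\, \", /), and the second is the text to print for it. For example, '\b' is printed as "\\b". We can check this in GHCi: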

Prelude> :load PrettyJson.hs

Prelude> take 8 simpleEscapes

[('\b',"\\b"),('\n',"\\n"),('\f',"\\f"),('\r',"\\r"),('\t',"\\t"),('\\',"\\\\"),('"',"\\\""),('/',"\\/")]

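lookup checks whether the character c is the first component of some pair in simpleEscapes. If it is, the case expression yields the text form of that escape. If c has no short escape but still has to be escaped (according to mustEscape), hexEscape is applied to it. Otherwise c is an ordinary character that needs no escaping, and char converts it from Char to Doc directly.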
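The lookup step can be tried without any Doc machinery. This sketch returns a String instead of a Doc and leaves out the hexEscape branch entirely. The name escapeSimple is hypothetical.

```haskell
simpleEscapes :: [(Char, String)]
simpleEscapes = zipWith ch "\b\n\f\r\t\\\"/" "bnfrt\\\"/"
  where ch a b = (a, ['\\', b])

-- Hypothetical String-based cousin of oneChar: use the short escape if
-- the character has one, otherwise return the character unchanged.
escapeSimple :: Char -> String
escapeSimple c = case lookup c simpleEscapes of
                   Just r  -> r
                   Nothing -> [c]
```

In GHCi, concatMap escapeSimple "a\tb\"c" evaluates to "a\\tb\\\"c": the tab and the double quote have each been replaced by a backslash sequence, and the other characters pass through untouched.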

One function, hexEscape, is still unexplained. Its definition is:

-- file: PrettyJson.hs

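hexEscape :: Char -> Doc

hexEscape c | d < 0x10000 = smallHex d

            | otherwise   = astral (d - 0x10000)

    where d = ord c

The character c is handled in one of two ways, depending on whether its Unicode code point is below 0x10000. The ord function comes from Data.Char, which has to be imported at the top of the file.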

-- file: PrettyJson.hs

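smallHex :: Int -> Doc

smallHex x = text "\\u"

          <> text (replicate (4 - length h) '0')

          <> text h

    where h = showHex x ""

Here showHex comes from the Numeric library, which has to be imported at the top of the file. showHex renders a number in hexadecimal. replicate repeats a value of some type a a given number of times, producing a list of a:

replicate :: Int -> a -> [a]

Because hexEscape only calls smallHex when d < 0x10000, the hexadecimal form of the code point has between 1 and 4 digits, so 4 - length h lies in the range [0, 3]. smallHex therefore prints its argument x like this: first "\\u", then enough zeros to pad the hexadecimal form of x to four digits, then the hexadecimal digits of x themselves.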
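The padding is easy to check on plain strings. The sketch below is a hypothetical String-returning variant called smallHexS. It is not part of PrettyJson.hs and assumes 0 <= x <= 0xffff.

```haskell
import Numeric (showHex)

-- Hypothetical String-returning variant of smallHex: "\u" followed by
-- exactly four hexadecimal digits (assumes 0 <= x <= 0xffff).
smallHexS :: Int -> String
smallHexS x = "\\u" ++ replicate (4 - length h) '0' ++ h
  where h = showHex x ""
```

In GHCi, putStrLn (smallHexS 0x7f) prints \u007f (two zeros of padding), and putStrLn (smallHexS 0xffff) prints \uffff (no padding).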

smallHex can print values up to 0xffff, but valid Unicode code points go up to 0x10ffff. For code points in the range (0xffff, 0x10ffff], hexEscape applies astral, which is defined as follows:

-- file: PrettyJson.hs

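astral :: Int -> Doc

astral n = smallHex (a + 0xd800) <> smallHex (b + 0xdc00)

    where a = (n `shiftR` 10) .&. 0x3ff

          b = n .&. 0x3ff

Here shiftR is a right shift and (.&.) is bitwise AND; both come from Data.Bits. Let us see how astral treats its argument n: the low 10 bits become b, and bits 11 to 20 (the next 10 bits) become a. Since hexEscape subtracts 0x10000 before calling astral, n lies in [0, 0xfffff] and fits in 20 bits, so a is the high 10 bits and b is the low 10 bits. Adding 0xd800 and 0xdc00 turns them into the two halves of a UTF-16 surrogate pair, each of which fits in four hexadecimal digits.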
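A worked example helps here. The sketch below pulls the bit arithmetic out of astral into a hypothetical helper, surrogates, that returns the two numbers instead of a Doc. A second helper, surrogateHex (also made up for this illustration), takes a code point above 0xffff and shows the pair in hexadecimal.

```haskell
import Data.Bits (shiftR, (.&.))
import Numeric (showHex)

-- Split a 20-bit offset into (high surrogate, low surrogate).
surrogates :: Int -> (Int, Int)
surrogates n = (a + 0xd800, b + 0xdc00)
  where a = (n `shiftR` 10) .&. 0x3ff   -- high 10 bits
        b = n .&. 0x3ff                 -- low 10 bits

-- Hexadecimal spelling of the pair for a code point above 0xffff.
surrogateHex :: Int -> (String, String)
surrogateHex cp = (showHex hi "", showHex lo "")
  where (hi, lo) = surrogates (cp - 0x10000)
```

In GHCi, surrogateHex 0x1d11e evaluates to ("d834","dd1e"): the offset is 0xd11e, its high 10 bits are 0x34 and its low 10 bits are 0x11e. At the two ends of the range, surrogateHex 0x10000 gives ("d800","dc00") and surrogateHex 0x10ffff gives ("dbff","dfff").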
