WBXML (WMLC) Study Notes

Date: 2021-10-14 20:21:03


Data Types
============================================
Type      Description
bit       : 1 bit of data
byte      : 8 bits of opaque data
u_int8    : 8 bit unsigned integer
mb_u_int32: 32 bit unsigned integer, encoded in multi-byte integer format.

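multi-byte integer format: the most significant bit of each byte is a continuation flag that tells whether more bytes belong to the current integer (1 = more bytes follow, 0 = this is the last byte). The remaining 7 bits of each byte carry the value, most significant group first.
    For example, 0xA0 is encoded as the two bytes 0x81 0x20, while 0x60 is encoded unchanged as the single byte 0x60.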
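The following Python sketch implements the multi-byte integer format described above. Each byte carries 7 value bits, and the high bit is set on every byte except the last. The function names `encode_mb_u_int32` and `decode_mb_u_int32` are illustrative only and are not taken from any library.

```python
from typing import Tuple


def encode_mb_u_int32(value: int) -> bytes:
    """Encode a non-negative integer (< 2**32) in WBXML multi-byte format."""
    if not 0 <= value < 2**32:
        raise ValueError("mb_u_int32 must be in range [0, 2**32)")
    groups = [value & 0x7F]                     # last byte: continuation bit = 0
    value >>= 7
    while value:
        groups.append(0x80 | (value & 0x7F))    # earlier bytes: continuation bit = 1
        value >>= 7
    return bytes(reversed(groups))              # most significant group first


def decode_mb_u_int32(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode an mb_u_int32 starting at data[pos]; return (value, next_pos)."""
    value = 0
    for _ in range(5):                          # 32 bits fit in at most 5 groups of 7
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:                     # continuation bit clear: last byte
            return value, pos
    raise ValueError("mb_u_int32 longer than 5 bytes")


print(encode_mb_u_int32(0xA0).hex(" "))   # 81 20
print(encode_mb_u_int32(0x60).hex(" "))   # 60
print(decode_mb_u_int32(b"\x81\x20"))     # (160, 2)
```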

Document Structure (BNF)
========================================================
start = version publicid charset strtbl body
strtbl = length *byte
body = *pi element *pi
element = ([switchPage] stag) [ 1*attribute END ] [ *content END ]
content = element | string | extension | entity | pi | opaque
stag = TAG | (literalTag index)
literalTag = LITERAL | LITERAL_A | LITERAL_C | LITERAL_AC
attribute = attrStart *attrValue
attrStart = ([switchPage] ATTRSTART) | ( LITERAL index )
attrValue = ([switchPage] ATTRVALUE) | string | extension | entity | opaque
extension = [switchPage] (( EXT_I termstr ) | ( EXT_T index ) | EXT)
string = inline | tableref
switchPage = SWITCH_PAGE pageindex
inline = STR_I termstr
tableref = STR_T index
entity = ENTITY entcode
entcode = mb_u_int32 // UCS-4 character code
pi = PI attrStart *attrValue END
opaque = OPAQUE length *byte
version = u_int8 // WBXML version number
publicid = mb_u_int32 | ( zero index )
charset = mb_u_int32
termstr = charset-dependent string with termination
index = mb_u_int32 // integer index into string table.
length = mb_u_int32 // integer length.
zero = u_int8 // containing the value zero (0)
pageindex = u_int8
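This is a minimal sketch of reading the document header (`version publicid charset strtbl`) in the order the `start` rule gives. The names `parse_header` and `Header` are hypothetical. The sketch stops at the first byte of `body` and does not validate the version or charset values.

```python
from typing import NamedTuple, Optional, Tuple


def read_mb_u_int32(data: bytes, pos: int) -> Tuple[int, int]:
    """Read a multi-byte integer; return (value, next_pos)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


class Header(NamedTuple):
    version: int
    publicid: int                   # 0 means "look at publicid_index"
    publicid_index: Optional[int]   # string table offset when publicid == 0
    charset: int                    # IANA MIBenum
    strtbl: bytes
    body_offset: int                # where the `body` rule starts


def parse_header(data: bytes) -> Header:
    pos = 0
    version = data[pos]                          # version = u_int8
    pos += 1
    publicid, pos = read_mb_u_int32(data, pos)   # publicid = mb_u_int32 | (zero index)
    publicid_index = None
    if publicid == 0:
        publicid_index, pos = read_mb_u_int32(data, pos)
    charset, pos = read_mb_u_int32(data, pos)    # charset = mb_u_int32
    length, pos = read_mb_u_int32(data, pos)     # strtbl = length *byte
    strtbl = data[pos:pos + length]
    pos += length
    return Header(version, publicid, publicid_index, charset, strtbl, pos)


# version 1.3, publicid 4 (WML 1.1), charset 106 (UTF-8), empty string table, <wml/>
doc = bytes([0x03, 0x04, 0x6A, 0x00, 0x7F, 0x01])
print(parse_header(doc))
# Header(version=3, publicid=4, publicid_index=None, charset=106, strtbl=b'', body_offset=4)
```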

Version
=======================================================
version = u_int8 // WBXML version number
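WBXML version number: the high 4 bits hold the major version minus 1, and the low 4 bits hold the minor version. For example, 0x01 means version 1.1.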
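Here is the version-byte arithmetic as a small sketch. The helper names `decode_version` and `encode_version` are illustrative.

```python
def decode_version(byte: int) -> str:
    major = (byte >> 4) + 1   # high nibble stores (major version - 1)
    minor = byte & 0x0F       # low nibble stores the minor version
    return f"{major}.{minor}"


def encode_version(major: int, minor: int) -> int:
    if not (1 <= major <= 16 and 0 <= minor <= 15):
        raise ValueError("version does not fit in one byte")
    return ((major - 1) << 4) | minor


print(decode_version(0x01))              # 1.1
print(decode_version(0x03))              # 1.3
print(f"0x{encode_version(1, 3):02X}")   # 0x03
```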

Public Identifier
=============================================================
publicid = mb_u_int32 | ( zero index )
zero = u_int8 // containing the value zero (0)
Defined public identifier values (hexadecimal):
0 String table index follows; the public identifier is encoded as a literal in the string table.
1 Unknown or missing public identifier.
2 "-//WAPFORUM//DTD WML 1.0//EN" (WML 1.0)
3 DEPRECATED "-//WAPFORUM//DTD WTA 1.0//EN" (WTA Event 1.0)
4 "-//WAPFORUM//DTD WML 1.1//EN" (WML 1.1)
5 "-//WAPFORUM//DTD SI 1.0//EN" (Service Indication 1.0)
6 "-//WAPFORUM//DTD SL 1.0//EN" (Service Loading 1.0)
7 "-//WAPFORUM//DTD CO 1.0//EN" (Cache Operation 1.0)
8 "-//WAPFORUM//DTD CHANNEL 1.1//EN" (Channel 1.1)
9 "-//WAPFORUM//DTD WML 1.2//EN" (WML 1.2)
A "-//WAPFORUM//DTD WML 1.3//EN" (WML 1.3)
B "-//WAPFORUM//DTD PROV 1.0//EN" (Provisioning 1.0)
C "-//WAPFORUM//DTD WTA-WML 1.2//EN" (WTA-WML 1.2)
D "-//WAPFORUM//DTD CHANNEL 1.2//EN" (Channel 1.2)
E-7F Reserved
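In the header, the string table comes after `charset`. A `publicid` of 0 can therefore only be resolved once `strtbl` has been read. The sketch below uses a hypothetical function, `resolve_public_id`, to map the table above to strings. It assumes the string table uses a charset whose strings end in a single 0x00 byte, such as UTF-8.

```python
from typing import Optional

# Public identifier values from the table above (keys are the hex codes).
PUBLIC_IDS = {
    0x02: "-//WAPFORUM//DTD WML 1.0//EN",
    0x03: "-//WAPFORUM//DTD WTA 1.0//EN",   # deprecated
    0x04: "-//WAPFORUM//DTD WML 1.1//EN",
    0x05: "-//WAPFORUM//DTD SI 1.0//EN",
    0x06: "-//WAPFORUM//DTD SL 1.0//EN",
    0x07: "-//WAPFORUM//DTD CO 1.0//EN",
    0x08: "-//WAPFORUM//DTD CHANNEL 1.1//EN",
    0x09: "-//WAPFORUM//DTD WML 1.2//EN",
    0x0A: "-//WAPFORUM//DTD WML 1.3//EN",
    0x0B: "-//WAPFORUM//DTD PROV 1.0//EN",
    0x0C: "-//WAPFORUM//DTD WTA-WML 1.2//EN",
    0x0D: "-//WAPFORUM//DTD CHANNEL 1.2//EN",
}


def resolve_public_id(publicid: int, index: Optional[int], strtbl: bytes) -> Optional[str]:
    """Return the public identifier string, or None if unknown (0x01) or reserved."""
    if publicid == 0:
        # "zero index" form: `index` is a byte offset into the string table.
        # Assumes a charset with a single 0x00 terminator (e.g. UTF-8).
        end = strtbl.index(b"\x00", index)
        return strtbl[index:end].decode("utf-8")
    return PUBLIC_IDS.get(publicid)


print(resolve_public_id(0x04, None, b""))   # -//WAPFORUM//DTD WML 1.1//EN
```

The identifier `-//EXAMPLE//DTD X//EN` in the test data is a made-up placeholder for a DTD that has no assigned code.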

Charset
===========================================================
charset = mb_u_int32
The character set is identified by its IANA charset MIBenum value (decimal). Common values:
GBK     : 113
GB2312  : 2025
Big5    : 2026
UTF-8   : 106
UTF-16  : 1015
UTF-16BE: 1013
Full list: http://www.iana.org/assignments/character-sets
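A `termstr` is "charset-dependent", so a decoder has to map the MIBenum to a text codec and know the width of the terminator. This sketch (hypothetical `read_termstr`) makes two assumptions. First, it maps the MIB values listed above to Python codec names. Second, it assumes UTF-16 strings end with two zero bytes on a 2-byte boundary, while every other listed charset ends with a single 0x00.

```python
from typing import Tuple

# MIBenum -> Python codec name (assumed mapping for the values listed above).
MIB_TO_CODEC = {
    106: "utf-8",
    113: "gbk",
    1013: "utf-16-be",
    1015: "utf-16",
    2025: "gb2312",
    2026: "big5",
}


def read_termstr(data: bytes, pos: int, mib: int) -> Tuple[str, int]:
    """Decode a terminated string at data[pos]; return (text, next_pos)."""
    codec = MIB_TO_CODEC[mib]
    if codec.startswith("utf-16"):
        # Assumption: the terminator is two zero bytes on a 2-byte boundary.
        end = pos
        while end + 1 < len(data):
            if data[end] == 0 and data[end + 1] == 0:
                return data[pos:end].decode(codec), end + 2
            end += 2
        raise ValueError("unterminated UTF-16 string")
    # UTF-8, GBK, GB2312 and Big5 never use 0x00 inside a multi-byte character,
    # so a single 0x00 byte ends the string.
    end = data.index(b"\x00", pos)
    return data[pos:end].decode(codec), end + 1


print(read_termstr("中文".encode("gbk") + b"\x00", 0, 113))   # ('中文', 5)
```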

String Table
===================================================
strtbl = length *byte
length = mb_u_int32
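References into the string table (STR_T, LITERAL and the `zero index` form of publicid) are byte offsets from the beginning of the table, as the STR_T entry in the global token table states. This hypothetical helper, `strtbl_lookup`, reads from the offset up to the next 0x00 terminator and assumes UTF-8 by default. Because it only reads forward from the offset, an offset that points into the middle of a stored string returns that string's suffix.

```python
def strtbl_lookup(strtbl: bytes, offset: int, encoding: str = "utf-8") -> str:
    """Return the NUL-terminated string that starts at byte `offset`."""
    if not 0 <= offset < len(strtbl):
        raise IndexError("string table offset out of range")
    end = strtbl.find(b"\x00", offset)
    if end == -1:
        raise ValueError("string at offset is not NUL-terminated")
    return strtbl[offset:end].decode(encoding)


table = b"abc\x00xyz\x00"
print(strtbl_lookup(table, 0))   # abc
print(strtbl_lookup(table, 4))   # xyz
print(strtbl_lookup(table, 1))   # bc
```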

Tokens
===================================================
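TAG token structure:

Bit(s) Description
7      Indicates whether the tag has attributes. If this bit is 0, the tag has no attributes;
       if it is 1, one or more attributes follow, terminated by the END token (0x01).
6      Indicates whether the element has content. If this bit is 0, the element has no content
       and no end tag; if it is 1, any amount of content follows, terminated by the END token (0x01).
5 - 0  The tag identity (the tag token value).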
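The bit layout above can be expressed as three masks. The helper names `split_tag` and `make_tag` are illustrative. The example uses 0x27, the token for WML `card` in the WML Tag Tokens table.

```python
from typing import Tuple

HAS_ATTRS = 0x80     # bit 7: attributes follow, terminated by END
HAS_CONTENT = 0x40   # bit 6: content follows, terminated by END
TAG_MASK = 0x3F      # bits 5-0: tag identity


def split_tag(token: int) -> Tuple[int, bool, bool]:
    """Return (tag_id, has_attributes, has_content)."""
    return token & TAG_MASK, bool(token & HAS_ATTRS), bool(token & HAS_CONTENT)


def make_tag(tag_id: int, attrs: bool, content: bool) -> int:
    return tag_id | (HAS_ATTRS if attrs else 0) | (HAS_CONTENT if content else 0)


print(hex(make_tag(0x27, attrs=True, content=True)))   # 0xe7
print(split_tag(0x67))                                 # (39, False, True)
```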

Global Tokens:
==================================================================================
Token Name  Hex  Description
SWITCH_PAGE 0    Change the code page for the current token state. Followed by a
                 single u_int8 indicating the new code page number.
END         1    Indicates the end of an attribute list or the end of an element.
ENTITY      2    A character entity. Followed by a mb_u_int32 encoding the
                 character entity number.
STR_I       3    Inline string. Followed by a termstr.
LITERAL     4    An unknown attribute name, or an unknown tag possessing no
                 attributes or content. Followed by a mb_u_int32 that encodes
                 an offset into the string table.
EXT_I_0    40    Inline string document-type-specific extension token. Token is
                 followed by a termstr.
EXT_I_1    41    Inline string document-type-specific extension token. Token is
                 followed by a termstr.
EXT_I_2    42    Inline string document-type-specific extension token. Token is
                 followed by a termstr.
PI         43    Processing instruction.
LITERAL_C  44    An unknown tag possessing content but no attributes.
EXT_T_0    80    Inline integer document-type-specific extension token. Token is
                 followed by a mb_u_int32.
EXT_T_1    81    Inline integer document-type-specific extension token. Token is
                 followed by a mb_u_int32.
EXT_T_2    82    Inline integer document-type-specific extension token. Token is
                 followed by a mb_u_int32.
STR_T      83    String table reference. Followed by a mb_u_int32 encoding a
                 byte offset from the beginning of the string table.
LITERAL_A  84    An unknown tag possessing attributes but no content.
EXT_0      C0    Single-byte document-type-specific extension token.
EXT_1      C1    Single-byte document-type-specific extension token.
EXT_2      C2    Single-byte document-type-specific extension token.
OPAQUE     C3    Opaque document-type-specific data.
LITERAL_AC C4    An unknown tag possessing both attributes and content.
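In the tag code space, the global tokens occupy low-6-bit values 0x00 to 0x04 in each of the four bit-7/bit-6 combinations (0x, 4x, 8x, Cx). Any other value is a tag token. The hypothetical `walk_body` below sketches how a decoder might dispatch on the global tokens in element content. It handles only part of the grammar: SWITCH_PAGE, END, ENTITY, STR_I, STR_T, OPAQUE and tags without attributes. It assumes UTF-8 text and raises on everything else (LITERAL*, EXT*, PI, attributes).

```python
from typing import List, Tuple

# Global token values (hex) from the table above.
SWITCH_PAGE, END, ENTITY, STR_I = 0x00, 0x01, 0x02, 0x03
STR_T, OPAQUE = 0x83, 0xC3


def read_mb_u_int32(data: bytes, pos: int) -> Tuple[int, int]:
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def walk_body(body: bytes, strtbl: bytes) -> List[Tuple[str, object]]:
    """Turn a WBXML body into (event, payload) tuples; UTF-8 text assumed."""
    events: List[Tuple[str, object]] = []
    pos, page = 0, 0
    while pos < len(body):
        tok = body[pos]
        pos += 1
        if tok == SWITCH_PAGE:
            page = body[pos]                      # u_int8 code page number
            pos += 1
        elif tok == END:
            events.append(("end", None))
        elif tok == ENTITY:
            code, pos = read_mb_u_int32(body, pos)
            events.append(("text", chr(code)))    # UCS-4 character code
        elif tok == STR_I:
            end = body.index(b"\x00", pos)
            events.append(("text", body[pos:end].decode("utf-8")))
            pos = end + 1
        elif tok == STR_T:
            offset, pos = read_mb_u_int32(body, pos)
            end = strtbl.index(b"\x00", offset)
            events.append(("text", strtbl[offset:end].decode("utf-8")))
        elif tok == OPAQUE:
            length, pos = read_mb_u_int32(body, pos)
            events.append(("opaque", body[pos:pos + length]))
            pos += length
        elif (tok & 0x3F) >= 0x05 and not (tok & 0x80):
            # A tag token without attributes (bit 7 clear).
            events.append(("start", (page, tok & 0x3F)))
            if not (tok & 0x40):
                events.append(("end", None))      # bit 6 clear: no content, no END
        else:
            # LITERAL*, EXT*, PI and tags with attributes are out of scope here.
            raise NotImplementedError(f"token 0x{tok:02X} not handled by this sketch")
    return events


body = bytes([0x7F, 0x67, 0x60, 0x03]) + b"Hi\x00" + bytes([0x83, 0x00, 0x26, 0x01, 0x01, 0x01])
for event in walk_body(body, b"there\x00"):
    print(event)
```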

WML Tag Tokens
=====================================================================================
TagName Token
a        1C
anchor   22
access   23
b        24
big      25
br       26
card     27
do       28
em       29
fieldset 2A
go       2B
head     2C
i        2D
img      2E
input    2F
meta     30
noop     31
p        20
postfield 21
pre      1B
prev     32
onevent  33
optgroup 34
option   35
refresh  36
select   37
setvar   3E
small    38
strong   39
table    1F
td       1D
template 3B
timer    3C
tr       1E
u        3D
wml      3F
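Using the tag table above, this is a small sketch of encoding an attribute-free WML tree into a complete WBXML document. The tree shape `(name, [children])`, the helper names, and the header choices are all assumptions made for illustration. The header uses version 1.3, public ID 0x04 (WML 1.1), UTF-8 (106) and an empty string table. Text is always emitted as STR_I inline strings.

```python
# Subset of the WML tag tokens above (code page 0).
WML_TAGS = {"wml": 0x3F, "card": 0x27, "p": 0x20, "br": 0x26, "b": 0x24}
STR_I, END = 0x03, 0x01


def encode_element(node) -> bytes:
    """node is (tag_name, children); each child is a node or a str."""
    name, children = node
    token = WML_TAGS[name]
    if not children:
        return bytes([token])                 # bit 6 clear: no content, no END
    out = bytearray([token | 0x40])           # bit 6 set: content follows
    for child in children:
        if isinstance(child, str):
            out += bytes([STR_I]) + child.encode("utf-8") + b"\x00"   # termstr
        else:
            out += encode_element(child)
    out.append(END)
    return bytes(out)


def encode_document(root) -> bytes:
    # version 1.3, publicid WML 1.1, charset UTF-8 (MIB 106 = 0x6A), strtbl length 0
    return bytes([0x03, 0x04, 0x6A, 0x00]) + encode_element(root)


doc = ("wml", [("card", [("p", ["Hello", ("br", [])])])])
print(encode_document(doc).hex(" "))
# 03 04 6a 00 7f 67 60 03 48 65 6c 6c 6f 00 26 01 01 01
```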

Attribute Start Tokens
=============================================================================================================
Tokens with a value less than 128 indicate the start of an attribute. The attribute start token fully
identifies the attribute name, e.g., URL=, and may optionally specify the beginning of the attribute value, e.g.,
PUBLIC="TRUE". Unknown attribute names are encoded with the globally unique code LITERAL (see section
5.8.4.5 of the WBXML specification). LITERAL must not be used to encode any portion of an attribute value.
AttributeName AttributeValuePrefix Token
accept-charset                        5
accesskey                             5E
align                                 52
align          bottom                 6
align          center                 7
align          left                   8
align          middle                 9
align          right                  A
align          top                    B
alt                                   C
cache-control  no-cache               64
class                                 54
columns                               53
content                               D
content        application/vnd.wap.wmlc;charset=  5C
domain                                F
emptyok        false                  10
emptyok        true                   11
enctype                               5F
enctype application/x-www-form-urlencoded  60
enctype multipart/form-data           61
format                                12
forua          false                  56
forua          true                   57
height                                13
href                                  4A
href           http://                4B
href           https://               4C
hspace                                14
http-equiv                            5A
http-equiv     Content-Type           5B
http-equiv     Expires                5D
id                                    55
ivalue                                15
iname                                 16
label                                 18
localsrc                              19
maxlength                             1A
method         get                    1B
method         post                   1C
mode           nowrap                 1D
mode           wrap                   1E
multiple       false                  1F
multiple       true                   20
name                                  21
newcontext     false                  22
newcontext     true                   23
onenterbackward                       25
onenterforward                        26
onpick                                24
ontimer                               27
optional       false                  28
optional       true                   29
path                                  2A
scheme                                2E
sendreferer    false                  2F
sendreferer    true                   30
size                                  31
src                                   32
src            http://                58
src            https://               59
ordered        true                   33
ordered        false                  34
tabindex                              35
title                                 36
type                                  37
type           accept                 38
type           delete                 39
type           help                   3A
type           password               3B
type           onpick                 3C
type           onenterbackward        3D
type           onenterforward         3E
type           ontimer                3F
type           options                45
type           prev                   46
type           reset                  47
type           text                   48
type           vnd.                   49
value                                 4D
vspace                                4E
width                                 4F
xml:lang                              50
xml:space      preserve               62
xml:space      default                63
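Because a start token may also carry a value prefix, an encoder has to choose among several candidate tokens for one attribute. For example, `href`, `href="http://"` and `href="https://"` could all start a single `href` attribute. The sketch below picks the candidate with the longest matching prefix and returns the rest of the value for the attribute value tokens or strings that follow. The longest-prefix rule is this sketch's own heuristic, not something the specification mandates. `ATTR_START` is a hand-copied subset of the table above.

```python
from typing import Tuple

# Subset of the attribute start table above: (name, value prefix) -> token.
ATTR_START = {
    ("href", ""): 0x4A,
    ("href", "http://"): 0x4B,
    ("href", "https://"): 0x4C,
    ("align", ""): 0x52,
    ("align", "center"): 0x07,
    ("method", "get"): 0x1B,
    ("method", "post"): 0x1C,
    ("title", ""): 0x36,
}


def attr_start(name: str, value: str) -> Tuple[int, str]:
    """Pick the token whose prefix covers most of `value`; return (token, rest)."""
    best = None
    for (attr, prefix), token in ATTR_START.items():
        if attr == name and value.startswith(prefix):
            if best is None or len(prefix) > len(best[1]):
                best = (token, prefix)
    if best is None:
        raise KeyError(f"no start token for {name!r}; encode it with LITERAL instead")
    token, prefix = best
    return token, value[len(prefix):]


print(attr_start("href", "https://example.com"))   # (76, 'example.com')
print(attr_start("align", "center"))               # (7, '')
```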


Attribute Value Tokens
==================================================================================================================
Tokens with a value of 128 or greater represent a well-known string present in an attribute value.
These tokens may only be used to represent attribute values. Unknown attribute values are encoded with string,
entity or extension codes (see section 5.8.4 of the WBXML specification).
All tokenised attributes must begin with a single attribute start token and may be followed by zero or more
attribute value, string, entity, opaque or extension tokens.
Attribute Value  Token
.com/            85
.edu/            86
.net/            87
.org/            88
accept           89
bottom           8A
clear            8B
delete           8C
help             8D
http://          8E
http://www.      8F
https://         90
https://www.     91
middle           93
nowrap           94
onenterbackward  96
onenterforward   97
onpick           95
ontimer          98
options          99
password         9A
reset            9B
text             9D
top              9E
unknown          9F
wrap             A0
www.             A1
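After the start token, the rest of the value can mix value tokens with inline strings. This sketch, the hypothetical `encode_attr_value`, scans the value greedily and emits the longest well-known string from a subset of the table above. All other text is collected into STR_I inline strings (UTF-8 assumed). Greedy longest-match is one reasonable strategy; WBXML allows other splits of the same value. The example encodes `href="http://www.example.com/"` using the bare `href` start token 0x4A.

```python
# Subset of the attribute value tokens above.
ATTR_VALUES = {".com/": 0x85, ".edu/": 0x86, ".net/": 0x87, ".org/": 0x88,
               "http://": 0x8E, "http://www.": 0x8F, "https://": 0x90,
               "https://www.": 0x91, "www.": 0xA1}
STR_I = 0x03


def encode_attr_value(value: str) -> bytes:
    out, literal, i = bytearray(), "", 0

    def flush() -> None:
        # Emit any pending plain text as an inline string (STR_I termstr).
        nonlocal literal
        if literal:
            out.extend(bytes([STR_I]) + literal.encode("utf-8") + b"\x00")
            literal = ""

    while i < len(value):
        match = max((s for s in ATTR_VALUES if value.startswith(s, i)), key=len, default=None)
        if match:
            flush()
            out.append(ATTR_VALUES[match])
            i += len(match)
        else:
            literal += value[i]
            i += 1
    flush()
    return bytes(out)


encoded = bytes([0x4A]) + encode_attr_value("http://www.example.com/")
print(encoded.hex(" "))   # 4a 8f 03 65 78 61 6d 70 6c 65 00 85
```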

References:
WBXML:
http://www.openmobilealliance.org/release_program/docs/CopyrightClick.asp?pck=Browsing&file=V2_1-20061020-A/WAP-192-WBXML-20010725-a.pdf

WML:
http://www.openmobilealliance.org/release_program/docs/CopyrightClick.asp?pck=Browsing&file=V2_1-20061020-A/WAP-191-WML-20000219-a.pdf