WMLC/WBXML Study Notes
Data Types
============================================
Type Description
bit : 1 bit of data
byte : 8 bits of opaque data
u_int8 : 8 bit unsigned integer
mb_u_int32: 32 bit unsigned integer, encoded in multi-byte integer format.
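multi-byte integer format: the most significant bit of each byte is a continuation flag. 1 means more bytes of the same integer follow; 0 means this is the last byte. The remaining 7 bits of each byte carry the value, most significant group first.
For example, 0xA0 is encoded as the two bytes 0x81 0x20, while 0x60 is encoded as the single byte 0x60.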
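The encoding rule above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the function names `encode_mb_u_int32` and `decode_mb_u_int32` are made up for these notes.

```python
from typing import Tuple


def encode_mb_u_int32(value: int) -> bytes:
    """Encode a non-negative integer below 2**32 in multi-byte integer format."""
    if not 0 <= value < 2 ** 32:
        raise ValueError("value out of range for mb_u_int32")
    groups = [value & 0x7F]  # last byte: continuation bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes: continuation bit set
        value >>= 7
    return bytes(reversed(groups))  # most significant 7-bit group first


def decode_mb_u_int32(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, next_offset) for the integer starting at data[offset]."""
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # continuation bit clear: this was the last byte
            return value, offset


print(encode_mb_u_int32(0xA0).hex())  # the 0xA0 example from the notes
print(encode_mb_u_int32(0x60).hex())  # the 0x60 example from the notes
```

The decoder returns the next offset as well as the value, so the caller can keep reading the stream from where the integer ended.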
Document Structure in BNF
========================================================
start = version publicid charset strtbl body
strtbl = length *byte
body = *pi element *pi
element = ([switchPage] stag) [ 1*attribute END ] [ *content END ]
content = element | string | extension | entity | pi | opaque
stag = TAG | (literalTag index)
literalTag = LITERAL | LITERAL_A | LITERAL_C | LITERAL_AC
attribute = attrStart *attrValue
attrStart = ([switchPage] ATTRSTART) | ( LITERAL index )
attrValue = ([switchPage] ATTRVALUE) | string | extension | entity | opaque
extension = [switchPage] (( EXT_I termstr ) | ( EXT_T index ) | EXT)
string = inline | tableref
switchPage = SWITCH_PAGE pageindex
inline = STR_I termstr
tableref = STR_T index
entity = ENTITY entcode
entcode = mb_u_int32 // UCS-4 character code
pi = PI attrStart *attrValue END
opaque = OPAQUE length *byte
version = u_int8 // WBXML version number
publicid = mb_u_int32 | ( zero index )
charset = mb_u_int32
termstr = charset-dependent string with termination
index = mb_u_int32 // integer index into string table.
length = mb_u_int32 // integer length.
zero = u_int8 // containing the value zero (0)
pageindex = u_int8
Version
=======================================================
version = u_int8 // WBXML version number
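The WBXML version number. The upper 4 bits hold the major version minus 1 and the lower 4 bits hold the minor version; for example, 0x01 means version 1.1.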
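A small sketch of how the version byte splits into its two halves. The helper name `decode_version` is hypothetical and exists only for this illustration.

```python
def decode_version(byte: int) -> str:
    """Turn the WBXML version byte into a 'major.minor' string."""
    major = (byte >> 4) + 1  # upper nibble stores major version minus 1
    minor = byte & 0x0F      # lower nibble stores the minor version
    return f"{major}.{minor}"


print(decode_version(0x01))
```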
Public Identifier
=============================================================
publicid = mb_u_int32 | ( zero index )
zero = u_int8 // containing the value zero (0)
The available public identifiers are listed below (values in hexadecimal):
0 String table index follows; public identifier is encoded as a literal in the string table.
1 Unknown or missing public identifier.
2 "-//WAPFORUM//DTD WML 1.0//EN" (WML 1.0)
3 DEPRECATED "-//WAPFORUM//DTD WTA 1.0//EN" (WTA Event 1.0)
4 "-//WAPFORUM//DTD WML 1.1//EN" (WML 1.1)
5 "-//WAPFORUM//DTD SI 1.0//EN" (Service Indication 1.0)
6 "-//WAPFORUM//DTD SL 1.0//EN" (Service Loading 1.0)
7 "-//WAPFORUM//DTD CO 1.0//EN" (Cache Operation 1.0)
8 "-//WAPFORUM//DTD CHANNEL 1.1//EN" (Channel 1.1)
9 "-//WAPFORUM//DTD WML 1.2//EN" (WML 1.2)
A "-//WAPFORUM//DTD WML 1.3//EN" (WML 1.3)
B "-//WAPFORUM//DTD PROV 1.0//EN" (Provisioning 1.0)
C "-//WAPFORUM//DTD WTA-WML 1.2//EN" (WTA-WML 1.2)
D "-//WAPFORUM//DTD CHANNEL 1.2//EN" (Channel 1.2)
E - 7F Reserved
Charset
===========================================================
charset = mb_u_int32
The character encoding is identified by its IANA Charset MIBenum value. Commonly used values (decimal):
GBK : 113
GB2312: 2025
Big5 : 2026
UTF-8 : 106
UTF-16: 1015
UTF-16BE:1013
For the complete list, see: http://www.iana.org/assignments/character-sets
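When decoding strings, the MIBenum has to be turned into a codec. The sketch below maps only the values listed above onto Python's built-in codec names; the table `MIBENUM_TO_CODEC` and the helper `decode_text` are assumptions made for illustration, and a real decoder would need a much larger table.

```python
# IANA MIBenum -> Python codec name (only the values listed in these notes).
MIBENUM_TO_CODEC = {
    113: "gbk",
    2025: "gb2312",
    2026: "big5",
    106: "utf-8",
    1015: "utf-16",
    1013: "utf-16-be",
}


def decode_text(raw: bytes, mibenum: int) -> str:
    """Decode raw string bytes using the charset named in the WBXML header."""
    try:
        codec = MIBENUM_TO_CODEC[mibenum]
    except KeyError:
        raise ValueError(f"unsupported charset MIBenum: {mibenum}") from None
    return raw.decode(codec)


print(decode_text(b"WML", 106))
```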
String Table
===================================================
strtbl = length *byte
length = mb_u_int32
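Putting the header fields together (version, publicid, charset, strtbl), a parser might look like the sketch below. All names (`Header`, `parse_header`, `table_string`) are hypothetical. It also assumes that strings in the string table end with a single NUL byte, which holds for single-byte charsets and UTF-8 but not for UTF-16.

```python
from typing import NamedTuple, Optional, Tuple


def read_mb_u_int32(data: bytes, pos: int) -> Tuple[int, int]:
    """Read one multi-byte integer; return (value, next_position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


class Header(NamedTuple):
    version: int
    publicid: Optional[int]        # numeric public identifier, if present
    publicid_index: Optional[int]  # string table index, if publicid was 0
    charset: int                   # IANA MIBenum
    strtbl: bytes                  # raw string table
    body_offset: int               # where the body starts


def parse_header(data: bytes) -> Header:
    version = data[0]
    publicid, pos = read_mb_u_int32(data, 1)
    publicid_index = None
    if publicid == 0:  # "zero index" form: the identifier lives in the string table
        publicid_index, pos = read_mb_u_int32(data, pos)
        publicid = None
    charset, pos = read_mb_u_int32(data, pos)
    length, pos = read_mb_u_int32(data, pos)
    strtbl = data[pos:pos + length]
    return Header(version, publicid, publicid_index, charset, strtbl, pos + length)


def table_string(strtbl: bytes, index: int, encoding: str = "utf-8") -> str:
    """Resolve a byte offset into the string table (assumes NUL termination)."""
    end = strtbl.index(b"\x00", index)
    return strtbl[index:end].decode(encoding)


# Version 1.1, WML 1.1 public id (4), UTF-8 (106 = 0x6A), 4-byte string table.
sample = bytes.fromhex("01046a0461626300") + b"\x7f"
header = parse_header(sample)
print(header.publicid, header.charset, table_string(header.strtbl, 0))
```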
Tokens
===================================================
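Structure of a TAG token:
Bit(s) Description
7 Indicates whether the tag has attributes. If the bit is 0, the tag has no attributes.
If the bit is 1, the tag is followed by one or more attributes; the attribute list is terminated by an END token (0x01).
6 Indicates whether the tag is an element with content. If the bit is 0, the tag has no content and no end tag.
If the bit is 1, the tag is followed by any amount of content, terminated by an END token (0x01).
5 - 0 The tag identity (tag code).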
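The bit layout can be unpacked with two masks and a flag test each. A minimal sketch, with the hypothetical helper name `split_tag_token`:

```python
from typing import Tuple


def split_tag_token(token: int) -> Tuple[int, bool, bool]:
    """Return (tag_code, has_attributes, has_content) for a tag token byte."""
    has_attributes = bool(token & 0x80)  # bit 7
    has_content = bool(token & 0x40)     # bit 6
    tag_code = token & 0x3F              # bits 5 - 0
    return tag_code, has_attributes, has_content


# 0xE7 = 0x27 (card in the WML tag table) with both flag bits set.
print(split_tag_token(0xE7))
```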
Global Tokens:
==================================================================================
Token Name Token Description
SWITCH_PAGE 0 Change the code page for the current token state. Followed by a
single u_int8 indicating the new code page number.
END 1 Indicates the end of an attribute list or the end of an element.
ENTITY 2 A character entity. Followed by a mb_u_int32 encoding the
character entity number.
STR_I 3 Inline string. Followed by a termstr.
LITERAL 4 An unknown attribute name, or unknown tag possessing no
attributes or content. Followed by a mb_u_int32 that encodes
an offset into the string table.
EXT_I_0 40 Inline string document-type-specific extension token. Token is
followed by a termstr.
EXT_I_1 41 Inline string document-type-specific extension token. Token is
followed by a termstr.
EXT_I_2 42 Inline string document-type-specific extension token. Token is
followed by a termstr.
PI 43 Processing instruction.
LITERAL_C 44 An unknown tag possessing content but no attributes.
EXT_T_0 80 Inline integer document-type-specific extension token. Token is
followed by a mb_u_int32.
EXT_T_1 81 Inline integer document-type-specific extension token. Token is
followed by a mb_u_int32.
EXT_T_2 82 Inline integer document-type-specific extension token. Token is
followed by a mb_u_int32.
STR_T 83 String table reference. Followed by a mb_u_int32 encoding a
byte offset from the beginning of the string table.
LITERAL_A 84 An unknown tag possessing attributes but no content.
EXT_0 C0 Single-byte document-type-specific extension token.
EXT_1 C1 Single-byte document-type-specific extension token.
EXT_2 C2 Single-byte document-type-specific extension token.
OPAQUE C3 Opaque document-type-specific data.
LITERAL_AC C4 An unknown tag possessing both attributes and content.
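Looking at the table, every global token has a value of 0 to 4 in its low six bits, with the top two bits selecting the row group (00, 40, 80, C0). The sketch below uses that observation to tell global tokens apart from application tokens. The dictionary and helper names are made up for these notes.

```python
# Global token values (hex) copied from the table above.
GLOBAL_TOKENS = {
    0x00: "SWITCH_PAGE", 0x01: "END", 0x02: "ENTITY", 0x03: "STR_I", 0x04: "LITERAL",
    0x40: "EXT_I_0", 0x41: "EXT_I_1", 0x42: "EXT_I_2", 0x43: "PI", 0x44: "LITERAL_C",
    0x80: "EXT_T_0", 0x81: "EXT_T_1", 0x82: "EXT_T_2", 0x83: "STR_T", 0x84: "LITERAL_A",
    0xC0: "EXT_0", 0xC1: "EXT_1", 0xC2: "EXT_2", 0xC3: "OPAQUE", 0xC4: "LITERAL_AC",
}


def is_global_token(token: int) -> bool:
    """True when the low six bits are 0..4, as for every entry in the table."""
    return (token & 0x3F) <= 0x04


def global_token_name(token: int) -> str:
    """Name of a global token; raises KeyError for application tokens."""
    return GLOBAL_TOKENS[token]


print(is_global_token(0x83), global_token_name(0x83))
print(is_global_token(0x7F))
```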
WML Tag Tokens
=====================================================================================
TagName Token
a 1C
anchor 22
access 23
b 24
big 25
br 26
card 27
do 28
em 29
fieldset 2A
go 2B
head 2C
i 2D
img 2E
input 2F
meta 30
noop 31
p 20
postfield 21
pre 1B
prev 32
onevent 33
optgroup 34
option 35
refresh 36
select 37
setvar 3E
small 38
strong 39
table 1F
td 1D
template 3B
timer 3C
tr 1E
u 3D
wml 3F
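To see the tag table in action, here is a deliberately small decoder sketch for a body that uses only WML tag tokens, inline strings (STR_I) and END. It is an illustration under stated assumptions, not a full WBXML parser: attributes, code page switches, string table references and all other global tokens are rejected, strings are assumed to be NUL-terminated, and the name `decode_body` is hypothetical.

```python
# WML tag codes (hex) copied from the table above.
WML_TAGS = {
    0x1B: "pre", 0x1C: "a", 0x1D: "td", 0x1E: "tr", 0x1F: "table", 0x20: "p",
    0x21: "postfield", 0x22: "anchor", 0x23: "access", 0x24: "b", 0x25: "big",
    0x26: "br", 0x27: "card", 0x28: "do", 0x29: "em", 0x2A: "fieldset",
    0x2B: "go", 0x2C: "head", 0x2D: "i", 0x2E: "img", 0x2F: "input",
    0x30: "meta", 0x31: "noop", 0x32: "prev", 0x33: "onevent", 0x34: "optgroup",
    0x35: "option", 0x36: "refresh", 0x37: "select", 0x38: "small", 0x39: "strong",
    0x3B: "template", 0x3C: "timer", 0x3D: "u", 0x3E: "setvar", 0x3F: "wml",
}
END, STR_I = 0x01, 0x03


def decode_body(body: bytes, encoding: str = "utf-8") -> str:
    """Render an attribute-free WMLC body as markup text."""
    out = []
    open_tags = []  # names of elements waiting for their END token
    pos = 0
    while pos < len(body):
        token = body[pos]
        pos += 1
        if token == END:
            out.append(f"</{open_tags.pop()}>")
        elif token == STR_I:
            end = body.index(b"\x00", pos)  # assumes a one-byte NUL terminator
            out.append(body[pos:end].decode(encoding))
            pos = end + 1
        elif (token & 0x3F) <= 0x04:
            raise NotImplementedError(f"global token 0x{token:02X} not handled here")
        elif token & 0x80:
            raise NotImplementedError("attributes are not handled in this sketch")
        else:
            name = WML_TAGS[token & 0x3F]
            if token & 0x40:  # content bit set: element stays open until END
                out.append(f"<{name}>")
                open_tags.append(name)
            else:
                out.append(f"<{name}/>")
    return "".join(out)


# wml, card and p with content (0x40 set), "Hi", an empty br, then three ENDs.
print(decode_body(bytes.fromhex("7f67600348690026010101")))
```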
Attribute Start Tokens
=============================================================================================================
Tokens with a value less than 128 indicate the start of an attribute. The attribute start token fully
identifies the attribute name, e.g., URL=, and may optionally specify the beginning of the attribute value, e.g.,
PUBLIC="TRUE". Unknown attribute names are encoded with the globally unique code LITERAL (see section
5.8.4.5 of the WBXML specification). LITERAL must not be used to encode any portion of an attribute value.
In the table below, token values are hexadecimal; rows with only two fields have no value prefix.
AttributeName AttributeValuePrefix Token
accept-charset 5
accesskey 5E
align 52
align bottom 6
align center 7
align left 8
align middle 9
align right A
align top B
alt C
cache-control no-cache 64
class 54
columns 53
content D
content application/vnd.wap.wmlc;charset= 5C
domain F
emptyok false 10
emptyok true 11
enctype 5F
enctype application/x-www-form-urlencoded 60
enctype multipart/form-data 61
format 12
forua false 56
forua true 57
height 13
href 4A
href http:// 4B
href https:// 4C
hspace 14
http-equiv 5A
http-equiv Content-Type 5B
http-equiv Expires 5D
id 55
ivalue 15
iname 16
label 18
localsrc 19
maxlength 1A
method get 1B
method post 1C
mode nowrap 1D
mode wrap 1E
multiple false 1F
multiple true 20
name 21
newcontext false 22
newcontext true 23
onenterbackward 25
onenterforward 26
onpick 24
ontimer 27
optional false 28
optional true 29
path 2A
scheme 2E
sendreferer false 2F
sendreferer true 30
size 31
src 32
src http:// 58
src https:// 59
ordered true 33
ordered false 34
tabindex 35
title 36
type 37
type accept 38
type delete 39
type help 3A
type password 3B
type onpick 3C
type onenterbackward 3D
type onenterforward 3E
type ontimer 3F
type options 45
type prev 46
type reset 47
type text 48
type vnd. 49
value 4D
vspace 4E
width 4F
xml:lang 50
xml:space preserve 62
xml:space default 63
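Because one attribute name can have several start tokens with different value prefixes, an encoder has to choose among them. One plausible strategy, sketched below, is to pick the longest prefix that matches the value and emit the rest as a string. The table is only a small excerpt of the one above, and the name `choose_attr_start` and the longest-match policy are assumptions made for illustration.

```python
from typing import Tuple

# (attribute name, value prefix) -> start token; an excerpt of the table above.
ATTR_START = {
    ("href", ""): 0x4A,
    ("href", "http://"): 0x4B,
    ("href", "https://"): 0x4C,
    ("method", "get"): 0x1B,
    ("method", "post"): 0x1C,
    ("type", ""): 0x37,
    ("type", "accept"): 0x38,
    ("title", ""): 0x36,
}


def choose_attr_start(name: str, value: str) -> Tuple[int, str]:
    """Return (start_token, remaining_value) using the longest matching prefix."""
    candidates = [
        (len(prefix), token, prefix)
        for (attr, prefix), token in ATTR_START.items()
        if attr == name and value.startswith(prefix)
    ]
    if not candidates:
        # A real encoder would fall back to LITERAL plus a string table entry.
        raise LookupError(f"no start token for {name}={value!r}")
    _, token, prefix = max(candidates)  # longest prefix wins
    return token, value[len(prefix):]


token, rest = choose_attr_start("href", "https://example.org/")
print(hex(token), rest)
```

Note that some attributes, such as method, appear in the table only with a value prefix, so an unlisted value has no matching start token at all.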
Attribute Value Tokens
==================================================================================================================
Tokens with a value of 128 or greater represent a well-known string present in an attribute value.
These tokens may only be used to represent attribute values. Unknown attribute values are encoded with string,
entity or extension codes (see section 5.8.4 of the WBXML specification).
All tokenised attributes must begin with a single attribute start token and may be followed by zero or more
attribute value, string, entity or extension tokens.
Attribute Value Token
.com/ 85
.edu/ 86
.net/ 87
.org/ 88
accept 89
bottom 8A
clear 8B
delete 8C
help 8D
http:// 8E
http://www. 8F
https:// 90
https://www. 91
middle 93
nowrap 94
onenterbackward 96
onenterforward 97
onpick 95
ontimer 98
options 99
password 9A
reset 9B
text 9D
top 9E
unknown 9F
wrap A0
www. A1
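An attribute list is therefore a sequence of start tokens (below 128), each followed by value tokens (128 and above) and strings, closed by END. The sketch below decodes such a list using small excerpts of the two tables. It handles only STR_I among the global tokens and assumes NUL-terminated strings; `decode_attributes` is a hypothetical name.

```python
from typing import List, Tuple

# Excerpts of the attribute start and attribute value tables above.
ATTR_START = {
    0x4A: ("href", ""), 0x4B: ("href", "http://"), 0x4C: ("href", "https://"),
    0x21: ("name", ""), 0x36: ("title", ""), 0x1C: ("method", "post"),
}
ATTR_VALUE = {
    0x85: ".com/", 0x86: ".edu/", 0x87: ".net/", 0x88: ".org/",
    0x8E: "http://", 0x8F: "http://www.", 0x90: "https://", 0x91: "https://www.",
}
END, STR_I = 0x01, 0x03


def decode_attributes(data: bytes, pos: int = 0,
                      encoding: str = "utf-8") -> Tuple[List[Tuple[str, str]], int]:
    """Decode attributes up to END; return (list of (name, value), next_position)."""
    attrs = []  # each entry is [name, value_so_far]
    while data[pos] != END:
        token = data[pos]
        pos += 1
        if token == STR_I:
            end = data.index(b"\x00", pos)  # assumes a one-byte NUL terminator
            piece = data[pos:end].decode(encoding)
            pos = end + 1
        elif (token & 0x3F) <= 0x04:
            raise NotImplementedError(f"global token 0x{token:02X} not handled here")
        elif token >= 0x80:
            piece = ATTR_VALUE[token]  # well-known value fragment
        else:
            name, prefix = ATTR_START[token]  # a new attribute begins
            attrs.append([name, prefix])
            continue
        if not attrs:
            raise ValueError("attribute value before any attribute start token")
        attrs[-1][1] += piece
    return [(name, value) for name, value in attrs], pos + 1


raw = (bytes.fromhex("4a8f03") + b"example\x00" + b"\x85"
       + bytes.fromhex("3603") + b"Home\x00" + b"\x01")
print(decode_attributes(raw))
```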