如何为霍夫曼编码和解码创建树?

时间:2022-05-01 21:16:34

For my assignment, I am to do a encode and decode for huffman trees. I have a problem creating my tree, and I am stuck.

我的任务是对霍夫曼树进行编码和解码。我在创建树时遇到了问题,我被困住了。

Don't mind the print statements - they are just for me to test and see what the output is when my function runs.

不要介意print语句——它们只是供我测试和查看函数运行时的输出。

For the first for loop, I got all the values and index from the text file I used in my main block for testing.

对于第一个For循环,我从主块中用于测试的文本文件中获取所有值和索引。

In the second for loop I inserted all the stuff into the priority queue.

在第二个for循环中,我将所有内容插入到优先队列中。

I am so stuck about where to go next - I'm trying to make nodes, but I am confused about how to progress. Can someone tell me if I'm doing this right?

我一直纠结于下一步该往哪里走——我在尝试创建节点,但我对如何前进感到困惑。如果我做对了,有人能告诉我吗?

def _create_code(self, frequencies):
    '''(HuffmanCoder, sequence(int)) -> NoneType
    iterate over index into the sequence keeping it 256 elements long, '''
    #fix docstring
    p = PriorityQueue()
    print frequencies

    index = 0 
    for value in frequencies:
        if value != 0:
            print value #priority
            print index #elm
            print '-----------'       
        index = index + 1


    for i in range(len(frequencies)):
        if frequencies[i] != 0:
            p.insert(i, frequencies[i])  
            print i,frequencies[i]
            if p.is_empty():
                a = p.get_min()
                b = p.get_min()
                n1 = self.HuffmanNode(None, None, a)
                n2 = self.HuffmanNode(None, None, b)
                print a, b, n1, n2
    while not p.is_empty():
        p.get_min()

I manually inserted the first two to start my tree, is that correct?

我手动插入前两个来启动我的树,对吗?

How do I keep going? I know the idea of it, just code-wise I am very stuck.

我要怎么继续?我知道它的想法,只是代码上的我很困。

This is using python by the way. I tried looking at Wikipedia, I know the steps, I just need help on code and how I should keep going, thanks!

顺便说一下,这是在使用python。我试着去看*,我知道这些步骤,我只需要在代码方面的帮助,以及我该如何继续下去,谢谢!

The HuffmanNode comes from this nested class:

HuffmanNode来自这个嵌套类:

class HuffmanNode(object):

    def __init__(self, left=None, right=None, root=None):
        self.left = left
        self.right = right
        self.root = root

5 个解决方案

#1


11  

The Huffman algorithm in Wikipedia tells you exactly how to create the node tree, so your program can be based on that algorithm, or another like it. Here is a Python program with comments showing the corresponding wikipedia algorithm step. The test data is frequencies of the letters of the alphabet in English text.

*上的Huffman算法告诉你如何创建节点树,这样你的程序就可以基于这个算法,或者其他类似的算法。下面是一个Python程序,它的注释显示了相应的wikipedia算法步骤。测试数据是英文文本中字母的频率。

Once the node tree is created, you need to walk it down to assign Huffman codes to each symbol in your dataset. Since this is homework, that step is up to you, but a recursive algorithm is the simplest and most natural way to handle it. It's only six more lines of code.

一旦创建了节点树,您需要向下走,为数据集中的每个符号分配Huffman代码。由于这是家庭作业,所以这一步取决于您,但是递归算法是最简单、最自然的处理方法。只多了六行代码。

import queue

class HuffmanNode(object):
    def __init__(self, left=None, right=None, root=None):
        self.left = left
        self.right = right
        self.root = root     # Why?  Not needed for anything.
    def children(self):
        return((self.left, self.right))

freq = [
    (8.167, 'a'), (1.492, 'b'), (2.782, 'c'), (4.253, 'd'),
    (12.702, 'e'),(2.228, 'f'), (2.015, 'g'), (6.094, 'h'),
    (6.966, 'i'), (0.153, 'j'), (0.747, 'k'), (4.025, 'l'),
    (2.406, 'm'), (6.749, 'n'), (7.507, 'o'), (1.929, 'p'), 
    (0.095, 'q'), (5.987, 'r'), (6.327, 's'), (9.056, 't'), 
    (2.758, 'u'), (1.037, 'v'), (2.365, 'w'), (0.150, 'x'),
    (1.974, 'y'), (0.074, 'z') ]

def create_tree(frequencies):
    p = queue.PriorityQueue()
    for value in frequencies:    # 1. Create a leaf node for each symbol
        p.put(value)             #    and add it to the priority queue
    while p.qsize() > 1:         # 2. While there is more than one node
        l, r = p.get(), p.get()  # 2a. remove two highest nodes
        node = HuffmanNode(l, r) # 2b. create internal node with children
        p.put((l[0]+r[0], node)) # 2c. add new node to queue      
    return p.get()               # 3. tree is complete - return root node

node = create_tree(freq)
print(node)

# Recursively walk the tree down to the leaves,
#   assigning a code value to each symbol
def walk_tree(node, prefix="", code={}):
    return(code)

code = walk_tree(node)
for i in sorted(freq, reverse=True):
    print(i[1], '{:6.2f}'.format(i[0]), code[i[1]])

When run on the alphabet data, the resulting Huffman codes are:

当运行在字母表数据上时,产生的霍夫曼代码为:

e  12.70 100
t   9.06 000
a   8.17 1110
o   7.51 1101
i   6.97 1011
n   6.75 1010
s   6.33 0111
h   6.09 0110
r   5.99 0101
d   4.25 11111
l   4.03 11110
c   2.78 01001
u   2.76 01000
m   2.41 00111
w   2.37 00110
f   2.23 00100
g   2.02 110011
y   1.97 110010
p   1.93 110001
b   1.49 110000
v   1.04 001010
k   0.75 0010111
j   0.15 001011011
x   0.15 001011010
q   0.10 001011001
z   0.07 001011000

#2


5  

@Dave walk_tree is missing tree processing code

@Dave walk_tree缺少树处理代码

# Recursively walk the tree down to the leaves,
# assigning a code value to each symbol
def walk_tree(node, prefix="", code={}):
    if isinstance(node[1].left[1], HuffmanNode):
        walk_tree(node[1].left,prefix+"0", code)
    else:
        code[node[1].left[1]]=prefix+"0"
    if isinstance(node[1].right[1],HuffmanNode):
        walk_tree(node[1].right,prefix+"1", code)
    else:
        code[node[1].right[1]]=prefix+"1"
    return(code)

#3


3  

One more solution returning a dictionary {label:code} and a recursive dictionary tree containing the resulting graph. The input vals is in form of dictionary {label:freq}:

还有一个解决方案返回字典{label:code}和包含结果图的递归字典树。输入值为字典{label:freq}:

def assign_code(nodes, label, result, prefix = ''):    
    childs = nodes[label]     
    tree = {}
    if len(childs) == 2:
        tree['0'] = assign_code(nodes, childs[0], result, prefix+'0')
        tree['1'] = assign_code(nodes, childs[1], result, prefix+'1')     
        return tree
    else:
        result[label] = prefix
        return label

def Huffman_code(_vals):    
    vals = _vals.copy()
    nodes = {}
    for n in vals.keys(): # leafs initialization
        nodes[n] = []

    while len(vals) > 1: # binary tree creation
        s_vals = sorted(vals.items(), key=lambda x:x[1]) 
        a1 = s_vals[0][0]
        a2 = s_vals[1][0]
        vals[a1+a2] = vals.pop(a1) + vals.pop(a2)
        nodes[a1+a2] = [a1, a2]        
    code = {}
    root = a1+a2
    tree = {}
    tree = assign_code(nodes, root, code)   # assignment of the code for the given binary tree      
    return code, tree

This can be used as:

这可以用作:

freq = [
(8.167, 'a'), (1.492, 'b'), (2.782, 'c'), (4.253, 'd'),
(12.702, 'e'),(2.228, 'f'), (2.015, 'g'), (6.094, 'h'),
(6.966, 'i'), (0.153, 'j'), (0.747, 'k'), (4.025, 'l'),
(2.406, 'm'), (6.749, 'n'), (7.507, 'o'), (1.929, 'p'), 
(0.095, 'q'), (5.987, 'r'), (6.327, 's'), (9.056, 't'), 
(2.758, 'u'), (1.037, 'v'), (2.365, 'w'), (0.150, 'x'),
(1.974, 'y'), (0.074, 'z') ]    
vals = {l:v for (v,l) in freq}
code, tree = Huffman_code(vals)

text = 'hello' # text to encode
encoded = ''.join([code[t] for t in text])
print('Encoded text:',encoded)

decoded = []
i = 0
while i < len(encoded): # decoding using the binary graph
    ch = encoded[i]  
    act = tree[ch]
    while not isinstance(act, str):
        i += 1
        ch = encoded[i]  
        act = act[ch]        
    decoded.append(act)          
    i += 1

print('Decoded text:',''.join(decoded))

One can visualize the tree with Graphviz as: 如何为霍夫曼编码和解码创建树?

我们可以用Graphviz来可视化这棵树:

The figure was generated by the following script as (Graphviz is needed):

下图由以下脚本生成(需要Graphviz):

def draw_tree(tree, prefix = ''):    
    if isinstance(tree, str):            
        descr = 'N%s [label="%s:%s", fontcolor=blue, fontsize=16, width=2, shape=box];\n'%(prefix, tree, prefix)
    else: # Node description
        descr = 'N%s [label="%s"];\n'%(prefix, prefix)
        for child in tree.keys():
            descr += draw_tree(tree[child], prefix = prefix+child)
            descr += 'N%s -> N%s;\n'%(prefix,prefix+child)
    return descr

import subprocess
with open('graph.dot','w') as f:
    f.write('digraph G {\n')
    f.write(draw_tree(tree))
    f.write('}') 
subprocess.call('dot -Tpng graph.dot -o graph.png', shell=True)

#4


1  

I was working out this problem today, to try and match results in above response. For most part, this solution works well but only thing i find non-intuitive is to add [0] and [1] in printing the non-node (leaf). But this answers miracles question - you can essentially print it using any traversal mechanism

我今天正在解决这个问题,试图在上面的回复中匹配结果。对于大多数情况,这个解决方案都很有效,但是我发现唯一不直观的是在打印非节点(叶子)时添加[0]和[1]。但这就回答了“奇迹”的问题——你基本上可以使用任何遍历机制来打印它

import queue

class HuffmanNode(object):
    def __init__(self,left=None,right=None,root=None):
        self.left = left
        self.right = right
        self.root = root
    def children(self):
        return (self.left,self.right)
    def preorder(self,path=None):
        if path is None:
            path = []
        if self.left is not None:
            if isinstance(self.left[1], HuffmanNode):
                self.left[1].preorder(path+[0])
            else:
                print(self.left,path+[0])
        if self.right is not None:
            if isinstance(self.right[1], HuffmanNode):
                self.right[1].preorder(path+[1])
            else:
                print(self.right,path+[1])

freq = [
    (8.167, 'a'), (1.492, 'b'), (2.782, 'c'), (4.253, 'd'),
    (12.702, 'e'),(2.228, 'f'), (2.015, 'g'), (6.094, 'h'),
    (6.966, 'i'), (0.153, 'j'), (0.747, 'k'), (4.025, 'l'),
    (2.406, 'm'), (6.749, 'n'), (7.507, 'o'), (1.929, 'p'), 
    (0.095, 'q'), (5.987, 'r'), (6.327, 's'), (9.056, 't'), 
    (2.758, 'u'), (1.037, 'v'), (2.365, 'w'), (0.150, 'x'),
    (1.974, 'y'), (0.074, 'z') ]

def encode(frequencies):
    p = queue.PriorityQueue()
    for item in frequencies:
        p.put(item)

    #invariant that order is ascending in the priority queue
    #p.size() gives list of elements
    while p.qsize() > 1:
        left,right = p.get(),p.get()
        node = HuffmanNode(left,right)
        p.put((left[0]+right[0],node))
    return p.get()

node = encode(freq)
print(node[1].preorder())

#5


1  

@Dave class HuffmanNode(object) has a subtle bug. When the two frequencies are equal an exception is thrown: For example let

@Dave类HuffmanNode(对象)有一个小错误。当两个频率相等时,抛出一个异常:例如,let

    freq = [ (200/3101, 'd'), (100/3101, 'e'), (100/3101, 'f') ]

Then you get TypeError: unorderable types: HuffmanNode() < str(). The problem has to do with the PriorityQueue implementation. I suspect when the first elements of the tuples compare equal, PriorityQueue wants to compare the second elements one of which is a python object. You add the lt method to your class and the problem is solved.

然后得到TypeError:无序类型:HuffmanNode() < str()。这个问题与PriorityQueue实现有关。我怀疑当元组的第一个元素比较相等时,PriorityQueue希望比较第二个元素,其中一个是python对象。将lt方法添加到类中,问题就解决了。

    def __lt__(self,other):
        return 0

#1


11  

The Huffman algorithm in Wikipedia tells you exactly how to create the node tree, so your program can be based on that algorithm, or another like it. Here is a Python program with comments showing the corresponding wikipedia algorithm step. The test data is frequencies of the letters of the alphabet in English text.

*上的Huffman算法告诉你如何创建节点树,这样你的程序就可以基于这个算法,或者其他类似的算法。下面是一个Python程序,它的注释显示了相应的wikipedia算法步骤。测试数据是英文文本中字母的频率。

Once the node tree is created, you need to walk it down to assign Huffman codes to each symbol in your dataset. Since this is homework, that step is up to you, but a recursive algorithm is the simplest and most natural way to handle it. It's only six more lines of code.

一旦创建了节点树,您需要向下走,为数据集中的每个符号分配Huffman代码。由于这是家庭作业,所以这一步取决于您,但是递归算法是最简单、最自然的处理方法。只多了六行代码。

import queue

class HuffmanNode(object):
    def __init__(self, left=None, right=None, root=None):
        self.left = left
        self.right = right
        self.root = root     # Why?  Not needed for anything.
    def children(self):
        return((self.left, self.right))

freq = [
    (8.167, 'a'), (1.492, 'b'), (2.782, 'c'), (4.253, 'd'),
    (12.702, 'e'),(2.228, 'f'), (2.015, 'g'), (6.094, 'h'),
    (6.966, 'i'), (0.153, 'j'), (0.747, 'k'), (4.025, 'l'),
    (2.406, 'm'), (6.749, 'n'), (7.507, 'o'), (1.929, 'p'), 
    (0.095, 'q'), (5.987, 'r'), (6.327, 's'), (9.056, 't'), 
    (2.758, 'u'), (1.037, 'v'), (2.365, 'w'), (0.150, 'x'),
    (1.974, 'y'), (0.074, 'z') ]

def create_tree(frequencies):
    p = queue.PriorityQueue()
    for value in frequencies:    # 1. Create a leaf node for each symbol
        p.put(value)             #    and add it to the priority queue
    while p.qsize() > 1:         # 2. While there is more than one node
        l, r = p.get(), p.get()  # 2a. remove two highest nodes
        node = HuffmanNode(l, r) # 2b. create internal node with children
        p.put((l[0]+r[0], node)) # 2c. add new node to queue      
    return p.get()               # 3. tree is complete - return root node

node = create_tree(freq)
print(node)

# Recursively walk the tree down to the leaves,
#   assigning a code value to each symbol
def walk_tree(node, prefix="", code={}):
    return(code)

code = walk_tree(node)
for i in sorted(freq, reverse=True):
    print(i[1], '{:6.2f}'.format(i[0]), code[i[1]])

When run on the alphabet data, the resulting Huffman codes are:

当运行在字母表数据上时,产生的霍夫曼代码为:

e  12.70 100
t   9.06 000
a   8.17 1110
o   7.51 1101
i   6.97 1011
n   6.75 1010
s   6.33 0111
h   6.09 0110
r   5.99 0101
d   4.25 11111
l   4.03 11110
c   2.78 01001
u   2.76 01000
m   2.41 00111
w   2.37 00110
f   2.23 00100
g   2.02 110011
y   1.97 110010
p   1.93 110001
b   1.49 110000
v   1.04 001010
k   0.75 0010111
j   0.15 001011011
x   0.15 001011010
q   0.10 001011001
z   0.07 001011000

#2


5  

@Dave walk_tree is missing tree processing code

@Dave walk_tree缺少树处理代码

# Recursively walk the tree down to the leaves,
# assigning a code value to each symbol
def walk_tree(node, prefix="", code={}):
    if isinstance(node[1].left[1], HuffmanNode):
        walk_tree(node[1].left,prefix+"0", code)
    else:
        code[node[1].left[1]]=prefix+"0"
    if isinstance(node[1].right[1],HuffmanNode):
        walk_tree(node[1].right,prefix+"1", code)
    else:
        code[node[1].right[1]]=prefix+"1"
    return(code)

#3


3  

One more solution returning a dictionary {label:code} and a recursive dictionary tree containing the resulting graph. The input vals is in form of dictionary {label:freq}:

还有一个解决方案返回字典{label:code}和包含结果图的递归字典树。输入值为字典{label:freq}:

def assign_code(nodes, label, result, prefix = ''):    
    childs = nodes[label]     
    tree = {}
    if len(childs) == 2:
        tree['0'] = assign_code(nodes, childs[0], result, prefix+'0')
        tree['1'] = assign_code(nodes, childs[1], result, prefix+'1')     
        return tree
    else:
        result[label] = prefix
        return label

def Huffman_code(_vals):    
    vals = _vals.copy()
    nodes = {}
    for n in vals.keys(): # leafs initialization
        nodes[n] = []

    while len(vals) > 1: # binary tree creation
        s_vals = sorted(vals.items(), key=lambda x:x[1]) 
        a1 = s_vals[0][0]
        a2 = s_vals[1][0]
        vals[a1+a2] = vals.pop(a1) + vals.pop(a2)
        nodes[a1+a2] = [a1, a2]        
    code = {}
    root = a1+a2
    tree = {}
    tree = assign_code(nodes, root, code)   # assignment of the code for the given binary tree      
    return code, tree

This can be used as:

这可以用作:

freq = [
(8.167, 'a'), (1.492, 'b'), (2.782, 'c'), (4.253, 'd'),
(12.702, 'e'),(2.228, 'f'), (2.015, 'g'), (6.094, 'h'),
(6.966, 'i'), (0.153, 'j'), (0.747, 'k'), (4.025, 'l'),
(2.406, 'm'), (6.749, 'n'), (7.507, 'o'), (1.929, 'p'), 
(0.095, 'q'), (5.987, 'r'), (6.327, 's'), (9.056, 't'), 
(2.758, 'u'), (1.037, 'v'), (2.365, 'w'), (0.150, 'x'),
(1.974, 'y'), (0.074, 'z') ]    
vals = {l:v for (v,l) in freq}
code, tree = Huffman_code(vals)

text = 'hello' # text to encode
encoded = ''.join([code[t] for t in text])
print('Encoded text:',encoded)

decoded = []
i = 0
while i < len(encoded): # decoding using the binary graph
    ch = encoded[i]  
    act = tree[ch]
    while not isinstance(act, str):
        i += 1
        ch = encoded[i]  
        act = act[ch]        
    decoded.append(act)          
    i += 1

print('Decoded text:',''.join(decoded))

One can visualize the tree with Graphviz as: 如何为霍夫曼编码和解码创建树?

我们可以用Graphviz来可视化这棵树:

The figure was generated by the following script as (Graphviz is needed):

下图由以下脚本生成(需要Graphviz):

def draw_tree(tree, prefix = ''):    
    if isinstance(tree, str):            
        descr = 'N%s [label="%s:%s", fontcolor=blue, fontsize=16, width=2, shape=box];\n'%(prefix, tree, prefix)
    else: # Node description
        descr = 'N%s [label="%s"];\n'%(prefix, prefix)
        for child in tree.keys():
            descr += draw_tree(tree[child], prefix = prefix+child)
            descr += 'N%s -> N%s;\n'%(prefix,prefix+child)
    return descr

import subprocess
with open('graph.dot','w') as f:
    f.write('digraph G {\n')
    f.write(draw_tree(tree))
    f.write('}') 
subprocess.call('dot -Tpng graph.dot -o graph.png', shell=True)

#4


1  

I was working out this problem today, to try and match results in above response. For most part, this solution works well but only thing i find non-intuitive is to add [0] and [1] in printing the non-node (leaf). But this answers miracles question - you can essentially print it using any traversal mechanism

我今天正在解决这个问题,试图在上面的回复中匹配结果。对于大多数情况,这个解决方案都很有效,但是我发现唯一不直观的是在打印非节点(叶子)时添加[0]和[1]。但这就回答了“奇迹”的问题——你基本上可以使用任何遍历机制来打印它

import queue

class HuffmanNode(object):
    def __init__(self,left=None,right=None,root=None):
        self.left = left
        self.right = right
        self.root = root
    def children(self):
        return (self.left,self.right)
    def preorder(self,path=None):
        if path is None:
            path = []
        if self.left is not None:
            if isinstance(self.left[1], HuffmanNode):
                self.left[1].preorder(path+[0])
            else:
                print(self.left,path+[0])
        if self.right is not None:
            if isinstance(self.right[1], HuffmanNode):
                self.right[1].preorder(path+[1])
            else:
                print(self.right,path+[1])

freq = [
    (8.167, 'a'), (1.492, 'b'), (2.782, 'c'), (4.253, 'd'),
    (12.702, 'e'),(2.228, 'f'), (2.015, 'g'), (6.094, 'h'),
    (6.966, 'i'), (0.153, 'j'), (0.747, 'k'), (4.025, 'l'),
    (2.406, 'm'), (6.749, 'n'), (7.507, 'o'), (1.929, 'p'), 
    (0.095, 'q'), (5.987, 'r'), (6.327, 's'), (9.056, 't'), 
    (2.758, 'u'), (1.037, 'v'), (2.365, 'w'), (0.150, 'x'),
    (1.974, 'y'), (0.074, 'z') ]

def encode(frequencies):
    p = queue.PriorityQueue()
    for item in frequencies:
        p.put(item)

    #invariant that order is ascending in the priority queue
    #p.size() gives list of elements
    while p.qsize() > 1:
        left,right = p.get(),p.get()
        node = HuffmanNode(left,right)
        p.put((left[0]+right[0],node))
    return p.get()

node = encode(freq)
print(node[1].preorder())

#5


1  

@Dave class HuffmanNode(object) has a subtle bug. When the two frequencies are equal an exception is thrown: For example let

@Dave类HuffmanNode(对象)有一个小错误。当两个频率相等时,抛出一个异常:例如,let

    freq = [ (200/3101, 'd'), (100/3101, 'e'), (100/3101, 'f') ]

Then you get TypeError: unorderable types: HuffmanNode() < str(). The problem has to do with the PriorityQueue implementation. I suspect when the first elements of the tuples compare equal, PriorityQueue wants to compare the second elements one of which is a python object. You add the lt method to your class and the problem is solved.

然后得到TypeError:无序类型:HuffmanNode() < str()。这个问题与PriorityQueue实现有关。我怀疑当元组的第一个元素比较相等时,PriorityQueue希望比较第二个元素,其中一个是python对象。将lt方法添加到类中,问题就解决了。

    def __lt__(self,other):
        return 0