How to convert words to numbers (using my own keys and values) in Python 3?

I am writing a Python 3 script that takes the words in a text file and converts them into numbers (my own values, not ASCII, so no ord function). I have assigned each letter an integer, and I would like each word's value to be the sum of its letters' values. The goal is to group all words that share the same value in a dictionary. I am having great trouble recombining the split words as numbers and adding them together. I am completely stuck with this script (it is not complete yet).

By the way, I know there is an easier way to create the l_n dictionary below, but since I've already written it out, I'm a little too lazy to change it for now. I will do so once the script is complete.
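
For reference, here is one shorter way to build the same letter-to-number mapping (a = 1 ... z = 26, with uppercase letters getting the same values). This is a sketch using the standard `string.ascii_lowercase` constant; it is not necessarily the "easier way" the asker had in mind.

```python
import string

# Map 'a'..'z' to 1..26; enumerate's start=1 makes 'a' map to 1 instead of 0.
l_n = {ch: i for i, ch in enumerate(string.ascii_lowercase, start=1)}

# Add the uppercase letters with the same values. The inner dict is fully
# built before update() runs, so l_n is not modified while being iterated.
l_n.update({ch.upper(): value for ch, value in l_n.items()})
```

The result has 52 keys, matching the hand-written dictionary.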

l_n = {
    "A": 1, "a": 1,
    "B": 2, "b": 2,
    "C": 3, "c": 3,
    "D": 4, "d": 4,
    "E": 5, "e": 5,
    "F": 6, "f": 6,
    "G": 7, "g": 7,
    "H": 8, "h": 8,
    "I": 9, "i": 9,
    "J": 10, "j": 10,
    "K": 11, "k": 11,
    "L": 12, "l": 12,
    "M": 13, "m": 13,
    "N": 14, "n": 14,
    "O": 15, "o": 15,
    "P": 16, "p": 16,
    "Q": 17, "q": 17,
    "R": 18, "r": 18,
    "S": 19, "s": 19,
    "T": 20, "t": 20,
    "U": 21, "u": 21,
    "V": 22, "v": 22,
    "W": 23, "w": 23,
    "X": 24, "x": 24,
    "Y": 25, "y": 25,
    "Z": 26, "z": 26,
    }

words_list = []

def read_words(file):
    opened_file = open(file, "r")
    contents = opened_file.readlines()

    for i in range(len(contents)):
        words_list.extend(contents[i].split())

    opened_file.close()

    return words_list

read_words("file1.txt")
new_words_list = list(set(words_list))

numbers_list = []
w_n = {}

def words_to_numbers(new_words_list, l_n):
    local_list = new_words_list[:]
    local_number_list = []

    for word in local_list:
        local_number_list.append(word.split())
        for key in l_n:
            # local_number_list = local_number_list.replace(...)  <- I am stuck on the logic in this section.
            pass

words_to_numbers(new_words_list, l_n)
print(w_n)  # local_list only exists inside the function; w_n is meant to hold the grouped words
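
The loop above stalls for two reasons: `str.split()` splits on whitespace, so calling it on a single word returns a one-element list rather than its letters, and `local_number_list` is a list, which has no `.replace` method. A minimal sketch of the missing step, assuming a trimmed-down `l_n` with only the letters this example needs, is to iterate the word's characters directly and sum their values:

```python
# A partial letter mapping, just enough for this example (assumption).
l_n = {"h": 8, "e": 5, "l": 12, "o": 15}

word = "hello"
print(word.split())  # ['hello']: split() breaks on whitespace, so the word stays whole
print(list(word))    # ['h', 'e', 'l', 'l', 'o']: iterating a string yields its characters

# Sum the value of each character instead of trying to replace() anything.
total = sum(l_n[ch] for ch in word)
print(total)         # 52
```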

I've tried looking for an answer on * but was unable to find one.

Thank you for your help.

3 Solutions

#1

You will have to handle punctuation, but otherwise you just need to sum the values of each word's letters and group the words by that sum, which you can do with a defaultdict:

lines = """am writing a Python script that will take words in a text file and convert them into numbers (my own, not ASCII, so no ord function).
I have assigned each letter to an integer and would like each word to be the sum of its letters' numerical value.
The goal is to group each word with the same numerical value into a dictionary.
I am having great trouble recombining the split words as numbers and adding them together"""

from collections import defaultdict

d = defaultdict(list)
for line in lines.splitlines():
    for word in line.split():
        d[sum(l_n.get(ch,0) for ch in word)].append(word)

Output:

from pprint import pprint as pp

pp(dict(d))
{1: ['a', 'a', 'a'],
 7: ['be'],
 9: ['I', 'I'],
 14: ['am', 'am'],
 15: ['an'],
 17: ['each', 'each', 'each'],
 19: ['and', 'and', 'and'],
 20: ['as'],
 21: ['of'],
 23: ['in'],
 28: ['is'],
 29: ['no'],
 32: ['file'],
 33: ['the', 'The', 'the', 'the'],
 34: ['so'],
 35: ['to', 'to', 'goal', 'to'],
 36: ['have'],
 37: ['take', 'ord', 'like'],
 38: ['(my', 'same'],
 39: ['adding'],
 41: ['ASCII,'],
 46: ['them', 'them'],
 48: ['its'],
 49: ['that', 'not'],
 51: ['great'],
 52: ['own,'],
 53: ['sum'],
 56: ['will'],
 58: ['into', 'into'],
 60: ['word', 'word', 'with'],
 61: ['value.', 'value', 'having'],
 69: ['text'],
 75: ['would'],
 76: ['split'],
 77: ['group'],
 78: ['assigned', 'integer'],
 79: ['words', 'words'],
 80: ['letter'],
 85: ['script'],
 92: ['numbers', 'numbers'],
 93: ['trouble'],
 96: ['numerical', 'numerical'],
 97: ['convert'],
 98: ['Python', 'together'],
 99: ["letters'"],
 100: ['writing'],
 102: ['function).'],
 109: ['recombining'],
 118: ['dictionary.']}

sum(l_n.get(ch, 0) for ch in word) adds up the values of all the letters in the word (characters missing from l_n, such as punctuation, count as 0). We use that sum as the key and append the word to its list. The defaultdict creates an empty list the first time a key is seen, so we end up with all the words that have the same sum grouped together in lists.
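
To see the grouping on a small scale, here is a self-contained sketch on four words from the output above. It assumes a lowercase-only version of `l_n` with the same 1 to 26 values, purely to keep the example short.

```python
from collections import defaultdict

# Lowercase-only letter values, a = 1 ... z = 26 (assumption for this demo).
l_n = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz", start=1)}

d = defaultdict(list)
for word in ["to", "goal", "ord", "like"]:
    # A key seen for the first time starts as an empty list, so append() always works.
    d[sum(l_n.get(ch, 0) for ch in word)].append(word)

print(dict(d))  # {35: ['to', 'goal'], 37: ['ord', 'like']}
```

"to" (20 + 15) and "goal" (7 + 15 + 1 + 12) both sum to 35, so they share a list, matching the full output.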

Also, as John commented, you can store only the lowercase letters in the dict and call .lower() on each word: sum(l_n.get(ch, 0) for ch in word.lower())
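
A minimal sketch of that suggestion, assuming a lowercase-only `l_n` with the same 1 to 26 values; `word_value` is a hypothetical helper name used only for illustration:

```python
import string

# Only lowercase keys are stored: a = 1 ... z = 26 (assumption, same values as l_n).
l_n = {ch: i for i, ch in enumerate(string.ascii_lowercase, start=1)}

def word_value(word):
    # Lowercasing first means "The" and "the" get the same value,
    # so the uppercase half of the dictionary is no longer needed.
    return sum(l_n.get(ch, 0) for ch in word.lower())

print(word_value("The"), word_value("the"))  # 33 33
```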

If you want to remove all punctuation you can use str.translate:

from collections import defaultdict
from string import punctuation
d = defaultdict(list)
for line in lines.splitlines():
    for word in line.split():
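        # Python 3: str.translate takes a table; maketrans's third argument lists characters to delete
        word = word.translate(str.maketrans("", "", punctuation))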
        d[sum(l_n.get(ch,0) for ch in word)].append(word)

Which would output:

{1: ['a', 'a', 'a'],
 7: ['be'],
 9: ['I', 'I'],
 14: ['am', 'am'],
 15: ['an'],
 17: ['each', 'each', 'each'],
 19: ['and', 'and', 'and'],
 20: ['as'],
 21: ['of'],
 23: ['in'],
 28: ['is'],
 29: ['no'],
 32: ['file'],
 33: ['the', 'The', 'the', 'the'],
 34: ['so'],
 35: ['to', 'to', 'goal', 'to'],
 36: ['have'],
 37: ['take', 'ord', 'like'],
 38: ['my', 'same'],
 39: ['adding'],
 41: ['ASCII'],
 46: ['them', 'them'],
 48: ['its'],
 49: ['that', 'not'],
 51: ['great'],
 52: ['own'],
 53: ['sum'],
 56: ['will'],
 58: ['into', 'into'],
 60: ['word', 'word', 'with'],
 61: ['value', 'value', 'having'],
 69: ['text'],
 75: ['would'],
 76: ['split'],
 77: ['group'],
 78: ['assigned', 'integer'],
 79: ['words', 'words'],
 80: ['letter'],
 85: ['script'],
 92: ['numbers', 'numbers'],
 93: ['trouble'],
 96: ['numerical', 'numerical'],
 97: ['convert'],
 98: ['Python', 'together'],
 99: ['letters'],
 100: ['writing'],
 102: ['function'],
 109: ['recombining'],
 118: ['dictionary']}

If you don't want duplicate words appearing then use a set:

d = defaultdict(set)
....
d[sum(l_n.get(ch,0) for ch in word)].add(word)
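
Putting the pieces together, a self-contained sketch of the set-based version with punctuation stripping might look like this. It assumes lowercase-only letter values with `.lower()`, and `group_unique` is a hypothetical function name; words that are pure punctuation become empty after stripping and are skipped.

```python
from collections import defaultdict
from string import ascii_lowercase, punctuation

# Letter values a = 1 ... z = 26 (assumption, same as l_n but lowercase only).
l_n = {ch: i for i, ch in enumerate(ascii_lowercase, start=1)}
# Translation table that deletes every ASCII punctuation character.
strip_punct = str.maketrans("", "", punctuation)

def group_unique(text):
    d = defaultdict(set)
    for word in text.split():
        word = word.translate(strip_punct)
        if word:  # skip tokens such as "--" that were only punctuation
            # A set silently ignores repeats of the same word.
            d[sum(l_n.get(ch, 0) for ch in word.lower())].add(word)
    return dict(d)

print(group_unique("the cat, the dog."))  # {33: {'the'}, 24: {'cat'}, 26: {'dog'}}
```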

#2

I think this is also a good way of doing this:

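import string

letters = string.ascii_lowercase  # string.lowercase does not exist in Python 3

def give_sum(word):
    ans = 0
    for ch in word:
        if ch.lower() in letters:
            # 'a' is at index 0, so add 1 to get a = 1 ... z = 26
            ans += letters.find(ch.lower()) + 1
    return ans

w_n = {}
with open('file1.txt') as f:
    for line in f:
        for word in line.split():
            # append to the group for this sum instead of overwriting it
            w_n.setdefault(give_sum(word), []).append(word)
print(w_n)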

PS: adapt the code to your requirements.

#3

As you mentioned, this is not the best way, but if we code it exactly your way, this would be the completed code; I checked it and it works.
You need to change your words_to_numbers function so that it calculates the value of each string from your l_n dictionary and stores the result in w_n, whose keys are the sums (as strings) and whose values are lists of words.

l_n = {
    "A": 1, "a": 1,
    "B": 2, "b": 2,
    "C": 3, "c": 3,
    "D": 4, "d": 4,
    "E": 5, "e": 5,
    "F": 6, "f": 6,
    "G": 7, "g": 7,
    "H": 8, "h": 8,
    "I": 9, "i": 9,
    "J": 10, "j": 10,
    "K": 11, "k": 11,
    "L": 12, "l": 12,
    "M": 13, "m": 13,
    "N": 14, "n": 14,
    "O": 15, "o": 15,
    "P": 16, "p": 16,
    "Q": 17, "q": 17,
    "R": 18, "r": 18,
    "S": 19, "s": 19,
    "T": 20, "t": 20,
    "U": 21, "u": 21,
    "V": 22, "v": 22,
    "W": 23, "w": 23,
    "X": 24, "x": 24,
    "Y": 25, "y": 25,
    "Z": 26, "z": 26,
    }

words_list = []

def read_words(file):
    opened_file = open(file, "r")
    contents = opened_file.readlines()

    for i in range(len(contents)):
        words_list.extend(contents[i].split())

    opened_file.close()

    return words_list

read_words("file1.txt")
new_words_list = list(set(words_list))

print("new_word_list", new_words_list)
numbers_list = []
w_n = {}


def words_to_numbers(new_words_list,l_n):
    local_list = new_words_list[:]
    for word in local_list:
        tmp = 0
        for ch in word:
            tmp += l_n.get(ch, 0)  # characters not in l_n (e.g. punctuation) count as 0
        if str(tmp) in w_n:
            w_n[str(tmp)].append(word)
        else:
            tmp_lis = []
            tmp_lis.append(word)
            w_n[str(tmp)] = tmp_lis

    return w_n

print("the_answer_is ==> ", words_to_numbers(new_words_list, l_n))
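
The if/else that creates or extends each list can be collapsed with `dict.setdefault`, which returns the existing list for a key or inserts and returns a new empty one. A sketch of that variant, keeping the string keys used above but building `w_n` locally instead of using the global, with a small hypothetical `sample_l_n` for the usage example:

```python
def words_to_numbers(words, l_n):
    w_n = {}
    for word in words:
        total = sum(l_n.get(ch, 0) for ch in word)
        # setdefault returns the list already stored under this key,
        # or inserts an empty list first, so append() always has a target.
        w_n.setdefault(str(total), []).append(word)
    return w_n

# Hypothetical partial mapping, just enough for these three words.
sample_l_n = {"t": 20, "o": 15, "g": 7, "a": 1, "l": 12, "r": 18, "d": 4}
print(words_to_numbers(["to", "goal", "ord"], sample_l_n))
# {'35': ['to', 'goal'], '37': ['ord']}
```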