I have a Python script that analyzes a set of error messages and checks whether each message matches a certain pattern (a regular expression) in order to group the messages. For example, "file x does not exist" and "file y does not exist" would both match "file .* does not exist" and be counted as two occurrences of the "file not found" category.
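A minimal sketch of the grouping described above, with the pairs still hard-coded. This is not the actual script; the names `PATTERNS` and `count_categories` are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical hard-coded table: regular expression -> display string.
PATTERNS = {
    r"file .* does not exist": "file not found",
    r"user .* not found": "authorization error",
}

def count_categories(messages, patterns=PATTERNS):
    """Count how many messages fall into each category."""
    compiled = [(re.compile(rx), label) for rx, label in patterns.items()]
    counts = Counter()
    for message in messages:
        for regex, label in compiled:
            if regex.search(message):
                counts[label] += 1
                break  # the first matching pattern wins
    return counts

counts = count_categories([
    "file x does not exist",
    "file y does not exist",
    "user bob not found",
])
print(counts)
```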
As the number of patterns and categories is growing, I'd like to put these "regular expression/display string" pairs in a configuration file, basically a dictionary serialization of some sort.
I would like this file to be editable by hand, so I'm ruling out any form of binary serialization, and I'd rather not resort to XML serialization either, to avoid problems with characters that need escaping (&, <, > and so on).
Do you have any idea of what could be a good way of accomplishing this?
Update: thanks to Daren Thomas and Federico Ramponi, but I cannot have an external python file with possibly arbitrary code.
6 Solutions
#1
35
You have two decent options:
- Python standard config file format using ConfigParser
- YAML using a library like PyYAML
The standard Python configuration files look like INI files with [sections] and key : value or key = value pairs. The advantages to this format are:
- No third-party libraries necessary
- Simple, familiar file format.
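As a rough sketch of how the pattern/category pairs might look in this format: one section per category, each holding its regular expression. The section names, the `pattern` key and the sample text are assumptions for illustration. In Python 3 the module is named `configparser`. Interpolation is switched off so that a `%` in a regular expression is read literally.

```python
import configparser
import re

# Hypothetical file contents; in practice this would live in a file such as errors.ini.
SAMPLE = """
[file not found]
pattern = file .* does not exist

[authorization error]
pattern = user .* not found
"""

# interpolation=None keeps '%' in a regex from being treated as a placeholder.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE)

# Build a list of (compiled regex, category) pairs, in file order.
rules = [(re.compile(parser[section]["pattern"]), section)
         for section in parser.sections()]

for regex, category in rules:
    if regex.search("file x does not exist"):
        print(category)
```

Putting the regular expression in the value rather than the key sidesteps two quirks of the format: keys are lowercased by default, and a `:` or `=` inside a key would be read as the delimiter.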
YAML is different in that it is designed to be a human friendly data serialization format rather than specifically designed for configuration. It is very readable and gives you a couple different ways to represent the same data. For your problem, you could create a YAML file that looks like this:
file .* does not exist : file not found
user .* not found : authorization error
Or like this:
{ file .* does not exist: file not found,
user .* not found: authorization error }
Using PyYAML couldn't be simpler:
import yaml
# safe_load builds only plain Python objects; recent PyYAML releases require an explicit Loader for yaml.load
with open('my.yaml') as f:
    errors = yaml.safe_load(f)
At this point errors is a Python dictionary with the expected format. YAML is capable of representing more than dictionaries: if you prefer a list of pairs, use this format:
-
  - file .* does not exist
  - file not found
-
  - user .* not found
  - authorization error
Or
[ [file .* does not exist, file not found],
[user .* not found, authorization error]]
This produces a list of lists when the file is loaded.
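A minimal sketch of putting the list-of-pairs layout to work, assuming PyYAML is installed. The sample text and the `categorize` helper are illustrative only.

```python
import re
import yaml  # PyYAML, a third-party package

# Hypothetical file contents in the list-of-pairs layout (compact nested form).
SAMPLE = """
- - file .* does not exist
  - file not found
- - user .* not found
  - authorization error
"""

# safe_load builds only plain Python objects (lists, dicts, strings, ...).
pairs = yaml.safe_load(SAMPLE)

# Compile each pattern once; the list keeps the order of the file.
rules = [(re.compile(pattern), label) for pattern, label in pairs]

def categorize(message):
    """Return the label of the first matching rule, or None."""
    for regex, label in rules:
        if regex.search(message):
            return label
    return None

print(categorize("user alice not found"))
```

A list keeps the rules in file order, which matters if two patterns can match the same message.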
One advantage of YAML is that you could use it to export your existing, hard-coded data out to a file to create the initial version, rather than cut/paste plus a bunch of find/replace to get the data into the right format.
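A sketch of that export step, assuming PyYAML is installed. The `existing` dictionary is a stand-in for whatever is currently hard-coded in the script.

```python
import yaml  # PyYAML, a third-party package

# Hypothetical stand-in for the table currently hard-coded in the script.
existing = {
    "file .* does not exist": "file not found",
    "user .* not found": "authorization error",
}

# default_flow_style=False asks for the block layout, one pair per line.
text = yaml.safe_dump(existing, default_flow_style=False)
print(text)

# Loading the text back should give the same dictionary.
restored = yaml.safe_load(text)
```

Writing `text` to a file then gives you an initial version to edit by hand.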
The YAML format will take a little more time to get familiar with, but using PyYAML is even simpler than using ConfigParser, with the added advantage that YAML gives you more options for how your data is represented.
Either one sounds like it will fit your current needs. ConfigParser will be easier to start with, while YAML gives you more flexibility in the future if your needs expand.
Best of luck!
#2
37
I sometimes just write a Python module (i.e. a file) called config.py or something similar, with the following contents:
config = {
    'name': 'hello',
    'see?': 'world'
}
This can then be 'read' like so:
from config import config
config['name']
config['see?']
easy.
#3
8
I've heard that ConfigObj is easier to work with than ConfigParser. It is used by a lot of big projects: IPython, Trac, TurboGears, etc.
From their introduction:
ConfigObj is a simple but powerful config file reader and writer: an ini file round tripper. Its main feature is that it is very easy to use, with a straightforward programmer's interface and a simple syntax for config files. It has lots of other features though:
- Nested sections (subsections), to any level
- List values
- Multiple line values
- String interpolation (substitution)
- Integrated with a powerful validation system
  - including automatic type checking/conversion
  - repeated sections
  - and allowing default values
- When writing out config files, ConfigObj preserves all comments and the order of members and sections
- Many useful methods and options for working with configuration files (like the 'reload' method)
- Full Unicode support
#4
4
I think you want the ConfigParser module in the standard library. It reads and writes INI style files. The examples and documentation in the standard documentation I've linked to are very comprehensive.
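A short sketch of both directions, writing and reading, using the Python 3 module name `configparser`. The `table` dictionary and the `pattern` key are assumptions for illustration, and an in-memory buffer stands in for a real file.

```python
import configparser
import io

# Hypothetical category -> pattern table to be written out.
table = {
    "file not found": "file .* does not exist",
    "authorization error": "user .* not found",
}

# interpolation=None so that '%' in a pattern is stored and read literally.
writer = configparser.ConfigParser(interpolation=None)
for category, pattern in table.items():
    writer[category] = {"pattern": pattern}

# Write to an in-memory buffer; a real script would pass an open file instead.
buffer = io.StringIO()
writer.write(buffer)
ini_text = buffer.getvalue()
print(ini_text)

# Reading the text back recovers the same table.
reader = configparser.ConfigParser(interpolation=None)
reader.read_string(ini_text)
recovered = {section: reader[section]["pattern"] for section in reader.sections()}
```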
#5
4
If you are the only one that has access to the configuration file, you can use a simple, low-level solution. Keep the "dictionary" in a text file as a list of tuples (regexp, message) exactly as if it was a python expression:
[
    ("file .* does not exist", "file not found"),
    ("user .* not authorized", "authorization error")
]
In your code, load it, then eval it, and compile the regexps in the result:
import re

# caution: eval runs whatever is in the file, so you must be sure of its contents
with open("messages.py") as f:
    messages = eval(f.read())
messages = [(re.compile(r), m) for (r, m) in messages]
and you end up with a list of tuples (compiled_regexp, message).
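If other people may edit the file, one possible variation (a swap on my part, not the approach shown above) is to replace `eval` with `ast.literal_eval` from the standard library, which parses Python literals only. A minimal sketch, with the file contents inlined as a string for illustration:

```python
import ast
import re

# Hypothetical file contents: a Python literal, as in the example above.
SAMPLE = '''
[
    ("file .* does not exist", "file not found"),
    ("user .* not authorized", "authorization error"),
]
'''

# literal_eval accepts only literals (strings, numbers, tuples, lists, dicts, ...).
messages = ast.literal_eval(SAMPLE.strip())
messages = [(re.compile(r), m) for (r, m) in messages]

# Anything that is not a literal, such as a function call, raises ValueError.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
print("call rejected:", rejected)
```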
#6
3
I typically do as Daren suggested, just make your config file a Python script:
patterns = {
    'file .* does not exist': 'file not found',
    'user .* not found': 'authorization error',
}
Then you can use it as:
import re
import config

# log_message is the error message being classified
for pattern in config.patterns:
    if re.search(pattern, log_message):
        print(config.patterns[pattern])
This is what Django does with their settings file, by the way.