How to classify file contents into different groups based on regular expressions?

Date: 2021-01-15 01:06:18

I have a flat file that contains a list of the packages that exist in the system. I want to find out whether each package is

  1. a batch component (by convention, names begin with batch),
  2. a service (names end with serv),
  3. a messaging daemon (names end with d),
  4. a web component (names end with web),
  5. or one that doesn't fall into any category (meaning it is not named per convention).

I have written this bash script for the purpose:

grep 'serv$' pack_list.txt > serv_list.txt
grep 'd$' pack_list.txt > daemon_list.txt
grep '^batch' pack_list.txt > batch_list.txt
grep 'web$' pack_list.txt > web_list.txt
grep -v 'serv$' pack_list.txt | grep -v 'd$' | grep -v '^batch' | grep -v 'web$' > uncat_list.txt

While it satisfies my current requirement and does not take much time, I cannot help but wonder whether some other language would be a better choice for this kind of operation.

---EDIT--

Example input would be:

fileserv
batch_file_processor
userweb
processord

Each would go into a different file.

To clarify what I am looking for: I am looking for a language in which this kind of processing has better syntactic support than:

  1. A separate command like grep for each regex.
  2. A series of if conditions, as Python or Perl would require.

Something along the lines of:

switch line.match($1):
    case (pattern1):
          ...
    case (pattern2):
          ...

Any suggestions?

1 solution

#1


A single Awk process can do this much better: it matches each line against your patterns and redirects the output to the appropriate file:

awk '{
  if ($0 ~ /serv$/) { print > "serv_list.txt" }
  else if ($0 ~ /d$/) { print > "daemon_list.txt" }
  # ... and so on
  else { print > "uncat_list.txt" }
}' pack_list.txt
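
The same idea can also be written as Awk pattern-action rules, which reads fairly close to the switch/case form asked for in the question. What follows is only a minimal sketch under a few assumptions: the four naming conventions and output file names are taken from the question, the sample input is the question's example plus one hypothetical name (misc_tool) added to exercise the fallback rule, and the order of the rules is an arbitrary choice.

```shell
# Sample input: the four names from the question plus "misc_tool",
# a hypothetical name added here so the fallback rule has something to catch.
printf '%s\n' fileserv batch_file_processor userweb processord misc_tool > pack_list.txt

# Create every output file up front so each list exists even if nothing matches it.
: > serv_list.txt; : > daemon_list.txt; : > batch_list.txt
: > web_list.txt;  : > uncat_list.txt

# Each rule is "pattern { action }". "next" skips the remaining rules,
# so a line is written to exactly one file: the first rule that matches wins.
awk '
  /serv$/  { print > "serv_list.txt";   next }
  /d$/     { print > "daemon_list.txt"; next }
  /^batch/ { print > "batch_list.txt";  next }
  /web$/   { print > "web_list.txt";    next }
           { print > "uncat_list.txt" }   # no pattern: runs for every line that got this far
' pack_list.txt
```

Note one difference from the grep script: there, a name matching two conventions ends up in two files, whereas here the first matching rule decides. If that matters, reorder the rules or drop the next statements.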
