Matching a file name with grep

The overarching problem: I have file names of the form JohnSmith14_120325_A10_6.raw and I want to match them with a regex. I've run into a couple of issues building a working example, but I don't think I can solve them until I understand the basics.

I just recently learned about piping, and one of the cool things I learned is that I can do the following:

X=ll_paprika.sc    # (don't ask)
VAR=`echo "$X" | cut -d _ -f 2`
echo "$VAR"

which gives me paprika.sc. Now when I try to apply the same piping idea with grep, nothing happens.

X=ll_paprika.sc
VAR=`echo $X | grep *.sc`
echo $VAR

Can anyone explain what I am doing wrong?

Second question: How does one match a single underscore using regex?

Here's what I'm ultimately trying to do:

VAR=`echo $X | grep -e "^[a-bA-Z][a-bA-Z0-9]*(_){1}[0-9]*(_){1}[a-bA-Z0-9]*(_){1}[0-9](\.){1}(raw)"`

So the basic idea of my pattern is that the file name must start with a letter, followed by any number of letters and digits. Then an _ must delimit a series of digits, another _ must delimit the next set of digits and letters, and another _ must delimit the next set of digits. Finally, it must have a single period followed by raw. This looks grossly wrong and ugly (because I am not sure about the syntax). So how does one match a file extension? Can someone put up a simple example for something like ll_paprika.sc so that I can figure out how to write my own regex?

Thanks.

2 answers

#1

X=ll_paprika.sc
VAR=`echo $X | grep *.sc`
echo $VAR

This isn't doing what you want because grep matches a line and prints that whole line. *.sc does in fact match ll_paprika.sc, so grep returns the entire line and it ends up in $VAR.

If you want just part of it, the cut approach is probably better. There is a grep -o option that returns only the matching portion, but for this you'd basically have to spell out the very thing you were looking for, at which point why bother?
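A small sketch of the difference between plain grep and grep -o, assuming a grep that supports -o (GNU and BSD grep both do; it is not part of POSIX). The pattern [^_]*$ is just one illustrative way to say "everything after the last underscore".

```shell
# Sketch: grep prints the whole matching line, grep -o only the matched text.
X=ll_paprika.sc

# The pattern is single-quoted so the shell passes it to grep unchanged;
# '\.' is a literal dot and '$' anchors the match at the end of the line.
WHOLE=$(printf '%s\n' "$X" | grep '\.sc$')
echo "$WHOLE"   # prints ll_paprika.sc (the entire line)

# -o (GNU/BSD extension) prints only the matched text: here, the run of
# non-underscore characters at the end of the line.
PART=$(printf '%s\n' "$X" | grep -o '[^_]*$')
echo "$PART"    # prints paprika.sc
```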

the file name must start with a letter

grep -E "^[a-zA-Z]

and then it can have any number of letters and numbers following it

[a-zA-Z0-9]*

and it must have an _ delimit a series of numbers and another _ to delimit the next set of numbers and characters and another _ to delimit the next set of numbers

_[0-9]+_[a-zA-Z0-9]+_[0-9]+

and then it must have a single period followed by raw.

\.raw$"
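Put together, with the middle segment allowed to contain letters as well as digits (as in A10), the pieces form one extended regular expression. This is a sketch assuming a POSIX shell and grep -E; the check function and every file name except JohnSmith14_120325_A10_6.raw are made up for illustration. Note that _ has no special meaning in a regex, so a bare _ matches exactly one underscore, while the dot must be escaped as \. to mean a literal period.

```shell
# Sketch: the whole file-name rule as one extended regex (grep -E).
# '_' is an ordinary character, so each bare _ matches exactly one underscore.
# '\.' matches a literal period; a bare '.' would match any character.
PATTERN='^[a-zA-Z][a-zA-Z0-9]*_[0-9]+_[a-zA-Z0-9]+_[0-9]+\.raw$'

# check (hypothetical helper): report whether $1 matches PATTERN.
check() {
    if printf '%s\n' "$1" | grep -Eq "$PATTERN"; then
        echo "match: $1"
    else
        echo "no match: $1"
    fi
}

check JohnSmith14_120325_A10_6.raw    # match
check 14JohnSmith_120325_A10_6.raw    # no match: starts with a digit
check JohnSmith14_120325_A10_6xraw    # no match: no period before raw
check JohnSmith14__120325_A10_6.raw   # no match: two underscores in a row
```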

#2

For the first, use:

VAR=`echo "$X" | grep -E '\.sc$'`
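A quick way to check the extension test is to filter a few names; apart from ll_paprika.sc, the names below are invented examples. The single quotes matter: an unquoted *.sc, as in the question, is a shell glob and can be expanded against files in the current directory before grep ever sees it.

```shell
# Sketch: keep only names that end in ".sc".
# Single quotes stop the shell from treating * or $ specially before grep runs.
# '\.' is a literal dot; '$' anchors the match at the end of the line.
MATCHED=$(printf '%s\n' ll_paprika.sc ll_paprika.scx paprikasc notes.sc.bak | grep -E '\.sc$')
echo "$MATCHED"   # prints ll_paprika.sc
```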

For the second, you can try this alternative instead:

VAR=`echo "$X" | grep -E '^[[:alpha:]][[:alnum:]]*_[[:digit:]]+_[[:alnum:]]+_[[:digit:]]+\.raw$'`

Note that the character classes in your expression differ from your description: in some places they only allow a-b, rather than a-z, for lowercase letters. This example allows all alphanumeric characters in those places.
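To see the effect of the a-b ranges, this sketch runs the question's pattern (loosely translated to grep -E syntax, with the {1} groups dropped and a $ anchor added) and the POSIX-class pattern against the example name JohnSmith14_120325_A10_6.raw. The helper name check_name is hypothetical.

```shell
# Sketch: compare the question's a-b ranges with POSIX character classes.
NAME=JohnSmith14_120325_A10_6.raw

# The question's classes in grep -E syntax: lowercase limited to a and b.
NARROW='^[a-bA-Z][a-bA-Z0-9]*_[0-9]*_[a-bA-Z0-9]*_[0-9]\.raw$'
# POSIX classes: any letter, any letter or digit, any digit.
WIDE='^[[:alpha:]][[:alnum:]]*_[[:digit:]]+_[[:alnum:]]+_[[:digit:]]+\.raw$'

# check_name (hypothetical helper): $1 = label, $2 = extended regex.
check_name() {
    if printf '%s\n' "$NAME" | grep -Eq "$2"; then
        echo "$1: match"
    else
        echo "$1: no match"
    fi
}

check_name narrow "$NARROW"   # prints "narrow: no match" ('o' is not in a-b)
check_name wide "$WIDE"       # prints "wide: match"
```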
