The overarching problem: So I have a file name that comes in the form of JohnSmith14_120325_A10_6.raw and I want to match it using regex. I have a couple of issues in building a working example but unfortunately my issues won't be solved unless I get the basics.
So I have just recently learned about piping and one of the cool things I learned was that I can do the following.
X=ll_paprika.sc   # don't ask
VAR=`echo $X | cut -d _ -f 2`
echo $VAR
which gives me paprika.sc. Now when I try the same pipe idea with grep, nothing happens.
X=ll_paprika.sc
VAR=`echo $X | grep *.sc`
echo $VAR
Can anyone explain what I am doing wrong?
Second question: How does one match a single underscore using regex?
Here's what I am ultimately trying to do;
VAR=`echo $X | grep -e "^[a-bA-Z][a-bA-Z0-9]*(_){1}[0-9]*(_){1}[a-bA-Z0-9]*(_){1}[0-9](\.){1}(raw)"`
So the basic idea of my pattern here is that the file name must start with a letter and then it can have any number of letters and numbers following it, and it must have an _ delimit a series of numbers, another _ to delimit the next set of numbers and characters, and another _ to delimit the next set of numbers, and then it must have a single period followed by raw. This looks grossly wrong and ugly (because I am not sure about the syntax). So how does one match a file extension? Can someone put up a simple example for something like ll_paprika.sc so that I can figure out how to do my own regex?
Thanks.
2 Answers
#1
3
X=ll_paprika.sc
VAR=`echo $X | grep *.sc`
echo $VAR
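There are two reasons this isn't doing what you want. First, grep takes a regular expression, not a shell glob. Left unquoted, *.sc may be expanded by the shell into matching file names before grep ever sees it, and in grep's default basic regular expressions a leading * is a literal asterisk, so *.sc does not match ll_paprika.sc. A quoted pattern such as '\.sc$' does. Second, grep matches a line and returns it: once the pattern matches, it returns that whole line and sticks it in $VAR, not just the matching part.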
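To make the glob-versus-regex point concrete, here is a minimal sketch, assuming a POSIX shell and any standard grep. The variable name NOMATCH and the second file name are made up for illustration.

```shell
X=ll_paprika.sc

# Quote the pattern so the shell never expands it as a file glob.
# '\.sc$' means: a literal dot, then "sc", at the end of the line.
VAR=$(printf '%s\n' "$X" | grep '\.sc$')
echo "$VAR"

# A name with a different extension produces no output.
# grep exits non-zero in that case, hence the "|| true".
NOMATCH=$(printf '%s\n' "ll_paprika.txt" | grep '\.sc$' || true)
echo "[$NOMATCH]"
```

The first echo prints the whole line ll_paprika.sc, since grep returns matching lines rather than matching parts; the second prints empty brackets.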
If you want to just get a part of it, the cut line is probably better. There is a grep -o option that returns only the matching portion, but for this you'd basically have to put in the thing you were looking for, at which point why bother?
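For comparison, here is a rough sketch of three ways to pull out just the part after the underscore. It assumes the name contains exactly one underscore and that your grep supports -o (GNU and BSD grep do, but it is not required by POSIX). The PART_* variable names are only for illustration.

```shell
X=ll_paprika.sc

# cut: split on "_" and keep the second field.
PART_CUT=$(printf '%s\n' "$X" | cut -d '_' -f 2)

# grep -o: print only the text the pattern matched.
PART_GREP=$(printf '%s\n' "$X" | grep -o '[a-z]*\.sc$')

# Parameter expansion: strip the shortest prefix ending in "_", no subprocess.
PART_EXP=${X#*_}

echo "$PART_CUT $PART_GREP $PART_EXP"
```

All three yield paprika.sc here. The grep -o version shows the point made above: you have to spell out what the wanted part looks like, while cut and the parameter expansion only need to know the delimiter.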
the file name must start with a letter
grep -E "^[a-zA-Z]
and then it can have any number of letters and numbers following it
[a-zA-Z0-9]*
and it must have an _ delimit a series of numbers and another _ to delimit the next set of numbers and characters and another _ to delimit the next set of numbers
_[0-9]+_[a-zA-Z0-9]+_[0-9]+
and then it must have a single period followed by raw.
\.raw$"
Put together, that is grep -E "^[a-zA-Z][a-zA-Z0-9]*_[0-9]+_[a-zA-Z0-9]+_[0-9]+\.raw$". The underscore is an ordinary character in a regex, so it simply matches itself. The dot has to be escaped, because a bare . matches any character, and -E is what lets + work without a backslash.
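As a quick check of the assembled pattern, here is a small sketch, assuming a POSIX shell and a grep that supports -E and -q. The helper name is_raw_name and the two rejected file names are hypothetical, chosen only to exercise the pattern.

```shell
# Extended regex: letter, letters/digits, _digits, _letters/digits, _digits, ".raw" at the end.
PATTERN='^[a-zA-Z][a-zA-Z0-9]*_[0-9]+_[a-zA-Z0-9]+_[0-9]+\.raw$'

# Succeeds (exit status 0) when the given name matches the pattern.
is_raw_name() {
    printf '%s\n' "$1" | grep -Eq "$PATTERN"
}

for NAME in JohnSmith14_120325_A10_6.raw 14John_120325_A10_6.raw JohnSmith14_120325_A10_6xraw; do
    if is_raw_name "$NAME"; then
        echo "match:    $NAME"
    else
        echo "no match: $NAME"
    fi
done
```

Only the first name should match: the second starts with a digit, and the third has an x where the literal dot is required.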
#2
0
For the first, use:
VAR=`echo "$X" | grep -E '\.sc$'`
For the second, you can try this alternative instead:
VAR=`echo "$X" | grep -E '^[[:alpha:]][[:alnum:]]*_[[:digit:]]+_[[:alnum:]]+_[[:digit:]]+\.raw$'`
Note that the character classes in your expression differ from the description that follows it: in some places they only allow a-b for lower-case characters. This example allows all alphanumeric characters in those places.
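The difference between the two sets of character classes can be seen with a short sketch like the one below, assuming a POSIX shell and a grep that supports -E and -q. The variable names NARROW, WIDE and the *_RESULT variables are made up for illustration, and only the first part of each pattern is tested.

```shell
NAME=JohnSmith14_120325_A10_6.raw

# Classes as written in the question: only a, b and upper-case letters, plus digits.
NARROW='^[a-bA-Z][a-bA-Z0-9]*_'
# POSIX classes as used in this answer: any letter, then any letters or digits.
WIDE='^[[:alpha:]][[:alnum:]]*_'

if printf '%s\n' "$NAME" | grep -Eq "$NARROW"; then NARROW_RESULT="match"; else NARROW_RESULT="no match"; fi
if printf '%s\n' "$NAME" | grep -Eq "$WIDE"; then WIDE_RESULT="match"; else WIDE_RESULT="no match"; fi

echo "narrow: $NARROW_RESULT"
echo "wide:   $WIDE_RESULT"
```

The narrow pattern stops at the o in John, which is outside a-b, so it never reaches the first underscore.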