grep: match a string plus the characters before and after it

Date: 2022-09-13 00:27:49

I have two files of strings, where each string in fileA matches part of a string in fileB. Every string is unique and has exactly one match. I would like to match the strings in fileA against fileB and print each match together with the 10 characters before and after it.

Maybe grep -f would work, but how do I get the 10 characters before and after the match?

也许grep -f可以工作,但是如何获得前后10个字符。

FileA

     TGAGGTAGTAGTTTGTACAGTT
     ACTGTACAGGCCACTGCCTTGC
     TGAGGTAGTAGTTTGTGCTGTT

FileB

     CCAGGCTGAGGTAGTAGTTTGTACAGTTTGAGGGTCTATGATACCACCCGGTACAGGAGA
     TAACTGTACAGGCCACTGCCTTGCCAGG

     CTGGCTGAGGTAGTAGTTTGTGCTGTTGGTCGGGTTGTGACATTGCCCGCTGTGGAGATA
     ACTGCGCAAGCTACTGCCTTGCTAG

     GCTTGGGACACATACTTCTTTATATGCCCATATGAACCTGCTAAGCTATGGAATGTAAAG
     AAGTATGTATTTCAGGC

     CTGTAGCAGCACATCATGGTTTACATACTACAGTCAAGATGCGAATCATTATTTGCTGCT
     CTAG

2 answers

#1


You can loop over the lines of fileA with a while loop and run grep on fileB for each one:

while read -r line; do
    # Print each match plus up to 10 characters on either side of it
    grep -o ".\{0,10\}$line.\{0,10\}" fileB.txt
done < fileA.txt
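To try the loop in isolation, here is a minimal sketch that wraps it in a function and runs it on two records from the question. The function name context_grep and the temporary files are illustrative assumptions, not part of the answer. Blank lines in the pattern file are skipped, because an empty pattern would turn the regex into ".\{0,10\}.\{0,10\}" and match every line of fileB.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative): for each line of the pattern
# file, print every match in the target file plus up to 10 characters on
# either side. Patterns are used as regexes, exactly as in the loop above.
context_grep() {
    local patterns=$1 target=$2 line
    while read -r line; do
        [ -n "$line" ] || continue    # skip blank lines in the pattern file
        grep -o ".\{0,10\}$line.\{0,10\}" "$target"
    done < "$patterns"
}

# Recreate a small subset of the sample data in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' TGAGGTAGTAGTTTGTACAGTT ACTGTACAGGCCACTGCCTTGC > "$tmp/fileA.txt"
printf '%s\n' \
    CCAGGCTGAGGTAGTAGTTTGTACAGTTTGAGGGTCTATGATACCACCCGGTACAGGAGA \
    TAACTGTACAGGCCACTGCCTTGCCAGG > "$tmp/fileB.txt"

context_grep "$tmp/fileA.txt" "$tmp/fileB.txt"
# CCAGGCTGAGGTAGTAGTTTGTACAGTTTGAGGGTCTA
# TAACTGTACAGGCCACTGCCTTGCCAGG
rm -r "$tmp"
```

In the second record the match sits near both ends of the line, so fewer than 10 characters are available on each side; the {0,10} interval lets grep take whatever is there instead of failing.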

This example assumes that fileA.txt contains no characters that are special in a regular expression (such as ., *, [ or \); otherwise they would change what the pattern matches. If it might, escape them first:

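while read -r line; do
    # Make the line match literally: wrap every character except ^ in a
    # bracket expression ([.] is a literal dot) and backslash-escape ^
    search=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<< "$line")
    grep -o ".\{0,10\}$search.\{0,10\}" fileB.txt
done < fileA.txt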
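The sketch below shows why escaping matters, using a hypothetical helper bre_escape (the name and the sample text are assumptions for illustration). It wraps every character except ^ in a bracket expression and backslash-escapes ^, so the resulting basic regex matches the input string literally. Blindly prefixing every non-alphanumeric character with a backslash is riskier, because in GNU grep's basic regexes sequences such as \( , \{ and \+ are operators rather than literals.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an arbitrary string into a basic regex (BRE)
# that matches it literally. [.] is a literal dot, [*] a literal star,
# and ^ (which cannot be wrapped as [^]) is written as \^ instead.
bre_escape() {
    printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'
}

text='xxA.Cxx AXC'
pattern='A.C'

# Unescaped: "." matches any character, so both A.C and AXC are found.
printf '%s\n' "$text" | grep -o "$pattern"
# A.C
# AXC

# Escaped: the pattern becomes [A][.][C], so only the literal A.C is found.
printf '%s\n' "$text" | grep -o "$(bre_escape "$pattern")"
# A.C
```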

#2


Alternatively, you can use sed to turn each line of fileA into a pattern and pass the resulting pattern list to grep on standard input, so grep reads fileB only once:

sed 's/^/.{,10}/;s/$/.{,10}/' fileA | grep -oEf - fileB

Here, the sed command turns fileA into these patterns:

.{,10}TGAGGTAGTAGTTTGTACAGTT.{,10}
.{,10}ACTGTACAGGCCACTGCCTTGC.{,10}
.{,10}TGAGGTAGTAGTTTGTGCTGTT.{,10}

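and we use the -E option for extended regular expressions, in which .{,10} means "up to 10 arbitrary characters" ({,m} is a GNU grep extension; other greps need .{0,10}). The - after -Ef tells grep to read the pattern file given to -f from standard input.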
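Both answers build regular expressions from fileA. If the strings might contain regex metacharacters, a fixed-string search avoids escaping altogether. The sketch below swaps in a different technique from either answer: awk's index() and substr(), wrapped in a hypothetical function context_fixed. It assumes each pattern is a single whitespace-free word, as in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative): fixed-string variant of the task.
# $1 = pattern file (one whitespace-free string per line, e.g. fileA)
# $2 = file to search (e.g. fileB)
context_fixed() {
    awk '
        # First file: collect the non-blank patterns.
        NR == FNR { if (NF) pats[++n] = $1; next }
        # Second file: find every literal occurrence of every pattern.
        {
            for (i = 1; i <= n; i++) {
                p = pats[i]; off = 0; rest = $0
                while ((pos = index(rest, p)) > 0) {
                    start = off + pos                     # 1-based column in $0
                    from = (start > 10) ? start - 10 : 1  # up to 10 chars before
                    # substr stops at the end of the line, so "up to 10 after" is automatic
                    print substr($0, from, start + length(p) + 10 - from)
                    off = start; rest = substr($0, off + 1)
                }
            }
        }' "$1" "$2"
}

# Example: the "." in A.C is matched literally, so AXC is not reported.
tmp=$(mktemp -d)
printf '%s\n' 'A.C' > "$tmp/pats"
printf '%s\n' '0123456789abcdefghijA.CklmnopqrstuvwxyzAXC' > "$tmp/text"
context_fixed "$tmp/pats" "$tmp/text"
# abcdefghijA.Cklmnopqrst
rm -r "$tmp"
```

Usage on the question's data would be context_fixed fileA fileB. Unlike grep -o, which resumes scanning after the end of each match (including its trailing context), this restarts the search one character after each occurrence, so occurrences that fall inside another match's context are still reported.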
