I have two files of strings, where each string in fileA matches part of a string in fileB. Every string is unique and has exactly one match. I would like to match the strings in fileA against the strings in fileB and print each match together with the 10 characters before and after it.
Maybe grep -f would work, but how do I get the 10 characters before and after the match?
FileA
TGAGGTAGTAGTTTGTACAGTT
ACTGTACAGGCCACTGCCTTGC
TGAGGTAGTAGTTTGTGCTGTT
FileB
CCAGGCTGAGGTAGTAGTTTGTACAGTTTGAGGGTCTATGATACCACCCGGTACAGGAGA
TAACTGTACAGGCCACTGCCTTGCCAGG
CTGGCTGAGGTAGTAGTTTGTGCTGTTGGTCGGGTTGTGACATTGCCCGCTGTGGAGATA
ACTGCGCAAGCTACTGCCTTGCTAG
GCTTGGGACACATACTTCTTTATATGCCCATATGAACCTGCTAAGCTATGGAATGTAAAG
AAGTATGTATTTCAGGC
CTGTAGCAGCACATCATGGTTTACATACTACAGTCAAGATGCGAATCATTATTTGCTGCT
CTAG
2 Solutions
#1
You can use a while loop over fileA and grep:
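# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line; do
    # -o prints only the matched part: up to 10 characters, the string, up to 10 characters
    grep -o ".\{0,10\}$line.\{0,10\}" fileB.txt
done < fileA.txt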
This example assumes that the lines of fileA.txt contain no characters that grep would interpret as regex syntax. Otherwise you need to escape them:
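while IFS= read -r line; do
    # Escape only the basic-regex metacharacters; in GNU grep a backslash
    # before +, ?, {, ( or | would turn them into operators instead
    search=$(sed 's/[][\.*^$]/\\&/g' <<< "$line")
    grep -o ".\{0,10\}$search.\{0,10\}" fileB.txt
done < fileA.txt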
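For reuse, the loop above could be wrapped in a small function. This is only a sketch: the name context_matches and the optional third argument (the context width, defaulting to 10) are hypothetical and not part of the original answer, and it assumes bash and a grep that supports -o.

```shell
# Hypothetical helper: for each line of PATTERN_FILE, print its match in
# TARGET_FILE together with up to N characters of context on each side.
# usage: context_matches PATTERN_FILE TARGET_FILE [N]
context_matches() {
    local patterns=$1 target=$2 n=${3:-10} line search
    while IFS= read -r line; do
        # Skip empty lines: an empty string would match everywhere
        [ -n "$line" ] || continue
        # Escape the basic-regex metacharacters so the line is matched literally
        search=$(printf '%s\n' "$line" | sed 's/[][\.*^$]/\\&/g')
        grep -o ".\{0,$n\}$search.\{0,$n\}" "$target"
    done < "$patterns"
}
```

With the question's data saved as fileA.txt and fileB.txt (each displayed FileB line taken as one line of the file), context_matches fileA.txt fileB.txt should print:
CCAGGCTGAGGTAGTAGTTTGTACAGTTTGAGGGTCTA
TAACTGTACAGGCCACTGCCTTGCCAGG
CTGGCTGAGGTAGTAGTTTGTGCTGTTGGTCGGGTTG
Fewer than 10 characters of context appear wherever the match sits close to the start or end of a line.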
#2
You can use sed to turn each line of fileA into a pattern and pass the patterns to grep through stdin:
sed 's/^/.{0,10}/;s/$/.{0,10}/' fileA | grep -oEf - fileB
Here, the sed part produces something like this:
.{0,10}TGAGGTAGTAGTTTGTACAGTT.{0,10}
.{0,10}ACTGTACAGGCCACTGCCTTGC.{0,10}
.{0,10}TGAGGTAGTAGTTTGTGCTGTT.{0,10}
We use the -E option for extended regexes, so the interval braces need no backslashes. The - after -Ef tells grep to read the pattern file (the argument to -f) from standard input.
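If the lines of fileA may contain regex metacharacters, the same single-pass idea can be combined with escaping. The following is a sketch under assumptions, not the answerer's code: the function name context_matches_once and the optional context-width argument are hypothetical, and the escape set covers the characters that are special in extended regexes.

```shell
# Hypothetical helper: build one extended-regex pattern per line of
# PATTERN_FILE and let a single grep process scan TARGET_FILE.
# usage: context_matches_once PATTERN_FILE TARGET_FILE [N]
context_matches_once() {
    local patterns=$1 target=$2 n=${3:-10}
    # 1) drop empty lines, 2) escape ERE metacharacters,
    # 3) wrap each line in "up to N characters" on both sides
    sed -e '/^$/d' \
        -e 's/[][\.*^$+?(){}|]/\\&/g' \
        -e "s/^/.{0,$n}/" \
        -e "s/\$/.{0,$n}/" "$patterns" |
        grep -oEf - "$target"
}
```

Because grep reads fileB only once here, the results come out in the order they occur in fileB rather than in fileA order, and -o reports non-overlapping matches, so two strings that lie within N characters of each other on the same line may not both be printed with full context.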