I have a text document and I'm trying to get the text between the words "abstract" and "keywords" (in R). This is the code I'm using:
gsub(".*abstract\\s*|keywords.*", "\\1", string)
However, this didn't work because somewhere else in the text the word "abstract" occurred so I made it non-greedy like this (added ? in front of abstract)
gsub(".*?abstract\\s*|keywords.*", "\\1", string)
But for some reason it now takes the text between "abstract" and "keywords" (which is what I want), but ALSO the text starting from the second "abstract" appearing in the text, all the way to the end. Any ideas?
2 solutions
#1
1
It doesn't look like you are capturing anything in your pattern. You need a capture group, written with (), in there to actually grab something, so that \\1 will return your target:
words <- c("these are some different abstract words that might be between keywords or they might just be bounded by abstract ideas")
gsub(".* abstract (.*) keywords.*", "\\1", words)
[1] "words that might be between"
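The pattern above works for this sample because "keywords" only follows the first "abstract". As a rough sketch, assuming you want the text between the first "abstract" and the next "keywords", lazy quantifiers with sub() and perl = TRUE are one option. The variable text and its contents are made up for illustration.

```r
# Hypothetical sample text: "abstract" appears twice, "keywords" once.
text <- "intro abstract first part keywords k1 k2 abstract second part"

# .*?    lazily skips up to the first "abstract"
# (.*?)  lazily captures up to the next "keywords"
# .*     consumes the rest, so only the captured group is kept
pattern <- ".*?abstract\\s*(.*?)\\s*keywords.*"

between <- sub(pattern, "\\1", text, perl = TRUE)
print(between)

# Caveat: when the pattern does not match, sub() returns the input unchanged.
unmatched <- sub(pattern, "\\1", "no markers here", perl = TRUE)
print(unmatched)
```

Note that with perl = TRUE the dot does not match a newline by default, so for text spanning several lines you would probably need to prefix the pattern with (?s).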
#2
1
I think this should give you what you are looking for:
regmatches(string, gregexpr("(?<=abstract).*(?=keywords)", string, perl = TRUE))
What it does:
- (?<=abstract) uses a lookbehind to require that the match starts right after the word "abstract"
- .* matches any number of characters (greedily, so if "keywords" occurs more than once it runs up to the last one)
- (?=keywords) uses a lookahead to require that the match ends right before the word "keywords"
- gregexpr looks for the given regular expression in string
- perl = TRUE enables the lookahead and lookbehind syntax
- regmatches pulls out the matching pieces of the string using the match data returned by gregexpr.
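As a hedged sketch of the same lookaround idea, assuming only the first "abstract" to "keywords" span is wanted: regexpr() returns just the first match, a lazy .*? stops at the nearest "keywords", and trimws() drops the surrounding spaces. The sample text is invented for illustration.

```r
# Hypothetical sample text: "abstract" appears twice, "keywords" once.
text <- "intro abstract first part keywords k1 k2 abstract second part"

# regexpr() finds only the first match; .*? stops at the nearest "keywords".
first_hit <- regexpr("(?<=abstract).*?(?=keywords)", text, perl = TRUE)
between <- trimws(regmatches(text, first_hit))
print(between)

# gregexpr() returns a list with one element per input string,
# so [[1]] extracts the matches for the single string used here.
all_hits <- regmatches(
  text,
  gregexpr("(?<=abstract).*?(?=keywords)", text, perl = TRUE)
)[[1]]
print(trimws(all_hits))
```

The second "abstract" produces no match here because no "keywords" follows it.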