Regex: get the text between two words (R)

I have a text document and I'm trying to get the text between the words "abstract" and "keywords" (in R). This is the code I'm using:

gsub(".*abstract\\s*|keywords.*", "\\1", string)

However, this didn't work because the word "abstract" also occurs elsewhere in the text, so I made it non-greedy like this (adding ? in front of abstract):

gsub(".*?abstract\\s*|keywords.*", "\\1", string)

But for some reason it now returns the text between "abstract" and "keywords" (which is what I want), but ALSO the text from the second occurrence of "abstract" all the way to the end. Any ideas?
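A minimal sketch, on made-up text (the variable names are hypothetical), of why the greedy version overshoots: `.*` consumes as much as it can, so it runs to the *last* "abstract", while the lazy `.*?` stops at the first one. It only demonstrates the prefix part of the pattern, not the full `|keywords.*` alternation.

```r
# Hypothetical text: "abstract" appears again after "keywords"
txt <- "intro abstract wanted text keywords later an abstract idea"

# Greedy: .* runs to the LAST "abstract", so the wanted text is removed too
greedy <- sub(".*abstract\\s*", "", txt, perl = TRUE)

# Lazy: .*? stops at the FIRST "abstract"
lazy <- sub(".*?abstract\\s*", "", txt, perl = TRUE)

greedy  # "idea"
lazy    # "wanted text keywords later an abstract idea"
```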

2 solutions

#1

It doesn't look like you are capturing anything in your pattern. You need parentheses to create a capture group, so that \\1 in the replacement actually returns your target:

words <- c("these are some different abstract words that might be between keywords or they might just be bounded by abstract ideas")
gsub(".* abstract (.*) keywords.*", "\\1", words)
[1] "words that might be between"
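A sketch of the same capture-group idea adapted to the question's situation, assuming "abstract" may appear again after "keywords" (the sample text is hypothetical). With `perl = TRUE`, the lazy `.*?` before "abstract" anchors on the first occurrence, and the lazy `(.*?)` stops at the first "keywords" after it.

```r
# Hypothetical text: a second "abstract" appears after "keywords"
txt <- "title abstract This paper studies regex keywords regex, R; see the abstract syntax tree"

# (?s) lets . also match newlines, in case the document spans several lines;
# lazy quantifiers pick the first "abstract" and the first following "keywords"
between <- sub("(?s).*?abstract\\s*(.*?)\\s*keywords.*", "\\1", txt, perl = TRUE)

between  # "This paper studies regex"
```

Note that if the pattern does not match at all, `sub()` returns the input unchanged, so it can be worth checking with `grepl()` first.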

#2

I think this should give you what you are looking for:

regmatches(string, gregexpr("(?<=abstract).*(?=keywords)", string, perl = TRUE))

What it does:

(?<=abstract) uses a lookbehind, so the match starts right after the word "abstract" (the word itself is not part of the match)
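.* matches any sequence of characters in between; it is greedy, so it extends to the last "keywords" it can reach (and, in PCRE, it does not cross newlines unless (?s) is used)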
(?=keywords) uses a lookahead, so the match ends right before the word "keywords" (again without including it)
gregexpr finds all matches of the regular expression in string
perl = TRUE switches to the PCRE engine, which supports lookahead and lookbehind (the default engine does not)
regmatches extracts the matched substrings from string, using the match positions returned by gregexpr.
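A small sketch, on hypothetical text containing two "abstract ... keywords" pairs, of how the greedy `.*` in the lookaround pattern behaves compared with a lazy `.*?` variant (the lazy version is a suggested tweak, not part of the answer above).

```r
# Hypothetical text with two abstract ... keywords pairs
txt <- "abstract first part keywords x abstract second part keywords y"

# Greedy .*: one match spanning from the first "abstract" to the last "keywords"
greedy <- regmatches(txt, gregexpr("(?<=abstract).*(?=keywords)", txt, perl = TRUE))[[1]]

# Lazy .*?: one match per pair
lazy <- regmatches(txt, gregexpr("(?<=abstract).*?(?=keywords)", txt, perl = TRUE))[[1]]

greedy        # " first part keywords x abstract second part "
trimws(lazy)  # "first part" "second part"
```

`trimws()` strips the spaces that sit between the delimiters and the captured text.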

#1