Python: Extract text from a string using a key phrase

Time: 2023-01-27 20:26:04

I'm struggling to find a way to do this; any help would be great.

I have a long string: the Title field. Here are some samples:

AIR-LAP1142N-A-K
AIR-LP142N-A-K
Used Airo 802.11n Draft 2.0 SingleAccess Point AIR-LP142N-A-9
Airo AIR-AP142N-A-K9 IOS Ver 15.2
MINT Lot of (2) AIR-LA112N-A-K9 - Dual-band-based 802.11a/g/n
Genuine Airo 112N  AP AIR-LP114N-A-K9 PoE
Wireless AP AIR-LP114N-A-9  Airy 50 availiable

I need to pull the part number out of the Title and assign it to a variable named ‘PartNumber’. The part number always starts with the characters ‘AIR-’.

For example:

Title = 'AIR-LAP1142N-A-K9 W/POWER CORD'
PartNumber = yourformula(Title)

print(PartNumber) should then output AIR-LAP1142N-A-K9

I am fairly new to Python and would greatly appreciate help. I would like it to print ONLY the part number, not any of the text before or after it.

4 answers

#1

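def yourFunction(title):
    # Return the first whitespace-separated word that starts with 'AIR-'
    for word in title.split():
        if word.startswith('AIR-'):
            return word
    return None  # no word starts with 'AIR-'

>>> Title = 'AIR-LAP1142N-A-K9 W/POWER CORD'
>>> PartNumber = yourFunction(Title)
>>> print(PartNumber)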
AIR-LAP1142N-A-K9
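
As a quick check, the same split-and-startswith loop can be run over every sample title from the question. This is a sketch: the titles list is copied from the question, and the function returns None for any title in which no word starts with 'AIR-'.

```python
def yourFunction(title):
    # Return the first whitespace-separated word that starts with 'AIR-'
    for word in title.split():
        if word.startswith('AIR-'):
            return word
    return None  # no word starts with 'AIR-'

# Sample titles copied from the question
titles = [
    'AIR-LAP1142N-A-K',
    'AIR-LP142N-A-K',
    'Used Airo 802.11n Draft 2.0 SingleAccess Point AIR-LP142N-A-9',
    'Airo AIR-AP142N-A-K9 IOS Ver 15.2',
    'MINT Lot of (2) AIR-LA112N-A-K9 - Dual-band-based 802.11a/g/n',
    'Genuine Airo 112N  AP AIR-LP114N-A-K9 PoE',
    'Wireless AP AIR-LP114N-A-9  Airy 50 availiable',
]

for title in titles:
    print(yourFunction(title))
```

Because split() with no arguments treats any run of whitespace as one separator, the double spaces in the last two titles don't cause empty words.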

#2

What you're looking for is called a regular expression; Python implements them in the re module. For instance, you could write something like:

>>> import re
>>> def format_title(title):
...     # Capture 'AIR-' plus every non-whitespace character after it
...     return re.search(r"(AIR-\S*)", title).group(1)
...
>>> Title = "Cisco AIR-LAP1142N-A-K9 W/POWER CORD"
>>> PartNumber = format_title(Title)
>>> print(PartNumber)
AIR-LAP1142N-A-K9

The \S* matches every non-whitespace character from AIR- up to the next whitespace character (space, tab, or newline).
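
Because \S matches any non-whitespace character, punctuation stuck to the part number is captured as well. A minimal sketch, using a made-up title with a trailing comma for illustration:

```python
import re

def format_title(title):
    # Capture 'AIR-' plus every non-whitespace character after it
    return re.search(r"(AIR-\S*)", title).group(1)

# Hypothetical title: the comma is not whitespace, so \S* keeps it
print(format_title("Cisco AIR-LAP1142N-A-K9, W/POWER CORD"))
```

This prints AIR-LAP1142N-A-K9, with the comma attached. If titles like that can occur, a stricter character class such as [A-Z0-9-] stops at the comma instead.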

#3

This is a sensible time to use a regular expression. It looks like the part number consists of upper-case letters, hyphens, and numbers, so this should work:

import re

def extract_part_number(title):
    # Match 'AIR-' followed by one or more upper-case letters, digits, or hyphens
    return re.search(r'(AIR-[A-Z0-9\-]+)', title).groups()[0]

This will raise an AttributeError if it gets a string that doesn't contain anything that looks like a part number, because re.search returns None when there is no match. So you'll probably want to check that re.search didn't return None before calling groups() on the result.
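
A minimal sketch of that check, assuming you'd rather get None back than an exception when no part number is present (the names extract_part_number_or_none and PART_NUMBER_RE are hypothetical):

```python
import re
from typing import Optional

# Same pattern as above: 'AIR-' followed by upper-case letters, digits, or hyphens
PART_NUMBER_RE = re.compile(r'AIR-[A-Z0-9\-]+')

def extract_part_number_or_none(title: str) -> Optional[str]:
    # Hypothetical helper: return the first part number, or None if there isn't one
    match = PART_NUMBER_RE.search(title)
    if match is None:
        return None
    return match.group(0)  # the whole match is the part number

print(extract_part_number_or_none('Wireless AP AIR-LP114N-A-9  Airy 50 availiable'))
print(extract_part_number_or_none('Used Airo access point'))
```

Since the whole match is the part number, the capturing group isn't needed here and group(0) returns it directly. Compiling the pattern once with re.compile is optional but convenient when the function runs over many titles.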

#4

You could use the .split() method, which splits the text into a list of the parts separated by whitespace.

To do this, I'd make a new variable (name it whatever you like); for this example, let's call it titleSplitList, so that titleSplitList = Title.split().

From here, if the part number is always the second item of titleSplitList (as in a title like "Cisco AIR-LAP1142N-A-K9 W/POWER CORD"), you could assign it to a new variable with:

PartNumber = titleSplitList[1]
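
Indexing only works if the part number always sits at the same position. A quick way to check is to look at what split() produces for a couple of the sample titles; this sketch uses two titles taken from the question:

```python
# Two titles taken from the question
titles = [
    'AIR-LAP1142N-A-K9 W/POWER CORD',
    'Airo AIR-AP142N-A-K9 IOS Ver 15.2',
]

for title in titles:
    titleSplitList = title.split()
    # Find the index of the first word that starts with 'AIR-'
    index = next(i for i, word in enumerate(titleSplitList) if word.startswith('AIR-'))
    print(titleSplitList, index)
```

The part number is at index 0 in the first title and index 1 in the second, so a fixed titleSplitList[1] only suits titles shaped like the second one.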

Hope this helps.
