Extract words that contain only alphanumeric characters from a given string

Time: 2022-09-13 11:33:10

Can someone please tell me how I can extract the "model name" from the product names below? For example, all I need is to extract "SGS45A08GB" from "Bosch SGS45A08GB Silver Dishwasher". It seems I have to create a regex that identifies the words in a given string that have alphanumeric values. Can someone give me a C# example of how to do this?

Some example strings with model names:

    Bosch SGS45A08GB Silver Dishwasher
    Bosch Avantixx SGS45A02GB Dishwasher, White
    Bosch SMS53E12GB White Dishwasher
    Bosch SGS45A08GB Dishwashers
    BOSCH SGI45E15E Full-size Semi-Integrated Dishwasher
    Bosch SKS60E02GB Compact Dishwasher, White
    BOSCH SRV43M03GB Slimline Integrated Dishwasher
    BOSCH Classixx SGS45C12GB Full-size Dishwasher - White
    BOSCH SGS45A02GB Dishwashers
    Bosch 18V Cordless Drill Driver
    Bosch PSB 18V Li-Ion Hammer Drill
    Bosch SGS45A08GB Dishwasher
    Bosch SGS45A08 12Place Full Size Dishwasher in Silver

EDIT: Adding more product names:

    Hitachi DH24DVC 4kg Cordless SDS Plus Hammer Drill 24V
    DeWalt DW965K 12V Angled Drill Driver
    Grove Modern Bathroom Suite with Acrylic Bath
    Bosch GBH24V 3.2kg SDS Plus Drill 24V
    Makita LS0714/1 190mm Sliding Compound Mitre Saw 110V
    Grove Modern Bathroom Suite with Steel Bath
    Swann All-in-One Monitoring & Recording Kit with LCD
    Makita BHR202RFE LXT 3.2kg SDS+ Rotary Hammer Drill 18V
    DeWalt DW625EK-GB 2000W Router 240V
    Trade Triple-Extension Ladder ELT340
    Makita 6391DWPE3 18V Drill Driver
    Erbauer ERF298MSW 165mm Sliding Compound Mitre Saw 24V

2 solutions

#1

If you define "alphanumeric" as a string that contains both uppercase ASCII letters and digits, and you assume a minimum length for a model name (say, 8 characters), then you can match all the model names in your examples using:

using System;
using System.Text.RegularExpressions;

// subjectString holds one product name, e.g. from the list above
string subjectString = "Bosch SGS45A08GB Silver Dishwasher";
Regex regexObj = new Regex(
    @"\b             # word boundary
    (?=[A-Z]*[0-9])  # assert presence of at least one ASCII digit
    (?=[0-9]*[A-Z])  # assert presence of at least one ASCII uppercase letter
    [0-9A-Z]{8,}     # match at least 8 characters
    \b               # until a word boundary", 
    RegexOptions.IgnorePatternWhitespace);
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
    // matched text: matchResults.Value
    // match start: matchResults.Index
    // match length: matchResults.Length
    Console.WriteLine(matchResults.Value);
    matchResults = matchResults.NextMatch();
} 
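
To see what this pattern does across several of the sample names at once, here is a minimal console sketch. It assumes C# 9 or later (top-level statements), and the `ExtractModel` helper and `products` array are illustrative additions, not part of the original answer. It keeps only the first match per product name.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Same pattern as above: a whole word of at least 8 characters from [0-9A-Z]
// containing at least one digit and at least one uppercase letter.
var strictModel = new Regex(
    @"\b               # word boundary
    (?=[A-Z]*[0-9])    # at least one ASCII digit
    (?=[0-9]*[A-Z])    # at least one ASCII uppercase letter
    [0-9A-Z]{8,}       # at least 8 characters
    \b                 # word boundary",
    RegexOptions.IgnorePatternWhitespace);

// A few of the sample names from the question.
string[] products =
{
    "Bosch SGS45A08GB Silver Dishwasher",
    "Bosch Avantixx SGS45A02GB Dishwasher, White",
    "BOSCH SGI45E15E Full-size Semi-Integrated Dishwasher",
    "Bosch PSB 18V Li-Ion Hammer Drill",
    "Bosch SGS45A08 12Place Full Size Dishwasher in Silver",
};

// Expected models, in order: SGS45A08GB, SGS45A02GB, SGI45E15E, (none), SGS45A08
foreach (string product in products)
{
    string model = ExtractModel(strictModel, product) ?? "(none)";
    Console.WriteLine($"{model,-12} {product}");
}

// Hypothetical helper: returns the first matching word, or null if none qualifies.
static string? ExtractModel(Regex pattern, string product)
{
    Match match = pattern.Match(product);
    return match.Success ? match.Value : null;
}
```

The drill gets no model: `PSB` and `18V` are shorter than 8 characters, and `12Place` fails because it contains lowercase letters.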

I think that uppercase ASCII letters and digits are a reasonable assumption for model names, but if that's not correct, you'll need to show us more examples.

EDIT: With your new examples, the following regex works, but the constraints are getting looser and looser, and you'll probably never find a regex that reliably matches all possible model names.

Regex regexObj = new Regex(
    @"\b             # word boundary
    (?=\S*[0-9])     # assert at least one ASCII digit before the next whitespace
    (?=\S*[A-Z])     # assert at least one ASCII uppercase letter before the next whitespace
    [0-9A-Z/-]{6,}   # match at least 6 characters, slashes and hyphens allowed
    \b               # until a word boundary", 
    RegexOptions.IgnorePatternWhitespace);
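
The looser pattern can be checked the same way against some of the newer names. This is again a sketch that assumes C# 9 or later; `ExtractModel` is a hypothetical helper that reports only the first match in each name.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Same loose pattern as above: at least 6 characters from [0-9A-Z/-], with a
// digit and an uppercase letter in the same whitespace-delimited token.
var looseModel = new Regex(
    @"\b               # word boundary
    (?=\S*[0-9])       # a digit before the next whitespace
    (?=\S*[A-Z])       # an uppercase letter before the next whitespace
    [0-9A-Z/-]{6,}     # at least 6 characters
    \b                 # word boundary",
    RegexOptions.IgnorePatternWhitespace);

string[] products =
{
    "Hitachi DH24DVC 4kg Cordless SDS Plus Hammer Drill 24V",
    "Makita LS0714/1 190mm Sliding Compound Mitre Saw 110V",
    "DeWalt DW625EK-GB 2000W Router 240V",
    "Trade Triple-Extension Ladder ELT340",
    "Grove Modern Bathroom Suite with Steel Bath",
};

// Expected models, in order: DH24DVC, LS0714/1, DW625EK-GB, ELT340, (none)
foreach (string product in products)
{
    string model = ExtractModel(looseModel, product) ?? "(none)";
    Console.WriteLine($"{model,-12} {product}");
}

// Hypothetical helper: returns the first matching token, or null if none qualifies.
static string? ExtractModel(Regex pattern, string product)
{
    Match match = pattern.Match(product);
    return match.Success ? match.Value : null;
}
```

`LS0714/1` and `DW625EK-GB` match because `/` and `-` are in the character class; the Grove line contains no digits, so it yields nothing.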

#2

Well, dude, this is the best I could do. Note that some of the items don't have a model number at all:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication3 {
    class Program {
        static void Main(string[] args) {
            string _data = @"Bosch SGS45A08GB Silver Dishwasher
            Bosch Avantixx SGS45A02GB Dishwasher, White
            Bosch SMS53E12GB White Dishwasher
            Bosch SGS45A08GB Dishwashers
            BOSCH SGI45E15E Full-size Semi-Integrated Dishwasher
            Bosch SKS60E02GB Compact Dishwasher, White
            BOSCH SRV43M03GB Slimline Integrated Dishwasher
            BOSCH Classixx SGS45C12GB Full-size Dishwasher - White
            BOSCH SGS45A02GB Dishwashers
            Bosch 18V Cordless Drill Driver
            Bosch PSB 18V Li-Ion Hammer Drill
            Bosch SGS45A08GB Dishwasher
            Bosch SGS45A08 12Place Full Size Dishwasher in Silver";

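            // The trailing \s+ is part of each match (it also swallows the newline and
            // the next line's indentation), so trim it before printing.
            Regex _expression = new Regex(@"\p{Lu}{3}\d+\w+\s+");
            foreach (Match _match in _expression.Matches(_data)) {
                Console.WriteLine(_match.Value.Trim());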
            }
            Console.ReadKey();
        }
    }
}
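
To see why some of the newer items come out empty, the sketch below runs this answer's pattern unchanged over a few names from the question's EDIT. It assumes C# 9 or later for top-level statements; `ExtractModel` is a hypothetical helper that trims the whitespace captured by the trailing `\s+`.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// This answer's pattern, unchanged: three uppercase letters, one or more digits,
// one or more word characters, then whitespace, which must follow the model.
var answer2 = new Regex(@"\p{Lu}{3}\d+\w+\s+");

string[] products =
{
    "Bosch GBH24V 3.2kg SDS Plus Drill 24V",
    "Erbauer ERF298MSW 165mm Sliding Compound Mitre Saw 24V",
    "DeWalt DW965K 12V Angled Drill Driver",
    "Trade Triple-Extension Ladder ELT340",
};

// Expected models, in order: GBH24V, ERF298MSW, (none), (none)
foreach (string product in products)
{
    string model = ExtractModel(answer2, product) ?? "(none)";
    Console.WriteLine($"{model,-12} {product}");
}

// Hypothetical helper: first match with trailing whitespace trimmed, or null.
static string? ExtractModel(Regex pattern, string product)
{
    Match match = pattern.Match(product);
    return match.Success ? match.Value.Trim() : null;
}
```

The pattern needs at least three uppercase letters directly before the digits and some whitespace after the model, so `DW965K` (only two leading letters) and `ELT340` (at the end of the string) are not found.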
