Swift 2: How do I split a string without losing the characters it was split on?

Time: 2020-12-06 21:36:46

Assuming I have the string "a:b:c", how do I split it so that I end up with the array ["a", ":", "b", ":", "c"]?

My ultimate goal is a method that I can pass a regexp for whatever delimiters I want (not just ":"), but I can't figure out how to split a string in Swift 2 without losing the characters it was split on.

[edit] To clarify (based on comments): I'm not trying to split the string character by character, and I'm not trying to split on ":" specifically. It's just an arbitrary delimiter that I thought would make a simple example. I want to know how to split a string on ANY delimiter defined in a regexp and NOT lose the delimiter. "fooBerry-BazClom*" split on something like [B\\-*] would get me ["foo", "B", "erry", "-", "B", "azClom", "*"].

2 Solutions

#1


1  

I believe this will do the trick (though I'm not sure it is very efficient):

import Foundation

extension String
{
   // Splits the string on any character in searchSet and keeps each delimiter as its own element.
   func componentsStartingFromCharactersInSet(searchSet: NSCharacterSet) -> [String]
   {
      if self == "" { return [] }

      if let firstDelimiter = rangeOfCharacterFromSet(searchSet)
      {
         let delimiter       = self.substringWithRange(firstDelimiter)
         var result:[String] = []

         // Recurse on whatever follows the delimiter.
         if let rightIndex = firstDelimiter.last?.successor()
         { result = self.substringFromIndex(rightIndex).componentsStartingFromCharactersInSet(searchSet) }

         result.insert(delimiter, atIndex:0)

         // Prepend the text before the delimiter, unless the string starts with it.
         if !hasPrefix(delimiter)
         { result.insert(self.substringToIndex(firstDelimiter.first!), atIndex:0) }

         return result
      }
      return [self]   // no delimiter left
   }
}

Use it as follows:

let searchSet = NSCharacterSet(charactersInString:"B\\-*")
"fooBerry-BazClom*".componentsStartingFromCharactersInSet(searchSet)

returns ["foo", "B", "erry", "-", "B", "azClom", "*"]

Given that you need a regular expression to express the delimiters, I'm not certain what you're aiming for, but here's a modified version based on regular expressions (with some fiddling to convert between range types):

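import Foundation

extension String
{
   // Length in UTF-16 code units, which is what NSRange and NSRegularExpression count.
   var length:Int {return (self as NSString).length }

   // Converts an NSRange into a Range<String.Index>.
   // Note: this treats UTF-16 offsets as Character offsets, so it is only correct
   // when every Character in the string is a single UTF-16 code unit.
   func stringRange(range:NSRange) -> Range<String.Index>
   {
     let start = self.startIndex.advancedBy(range.location)
     let end   = start.advancedBy(range.length)
     return Range<String.Index>(start: start, end: end)
   }

   func componentsFromRegExp(regExp:String) -> [String]
   {
     if self == "" { return [] }
     do
     {
        // CaseInsensitive means a pattern such as "B" also matches "b".
        let expression = try NSRegularExpression(pattern: regExp, options: NSRegularExpressionOptions.CaseInsensitive)
        return self.componentsFromRegExp(expression)
     }
     catch { return [self] }   // invalid pattern: return the string unsplit
   }

   func componentsFromRegExp(regExp:NSRegularExpression) -> [String]
   {
      if self == "" { return [] }

      if let firstMatch = regExp.firstMatchInString(self, options:NSMatchingOptions(rawValue:0), range:NSMakeRange(0, self.length) )
         where firstMatch.range.length > 0
      {
         let firstDelimiter  = self.stringRange(firstMatch.range)
         let delimiter       = self.substringWithRange(firstDelimiter)
         var result:[String] = []

         // Recurse on whatever follows the delimiter.
         if let rightIndex = firstDelimiter.last?.successor()
         { result = self.substringFromIndex(rightIndex).componentsFromRegExp(regExp) }

         result.insert(delimiter, atIndex:0)

         // Prepend the text before the delimiter, if any. Comparing indices rather than
         // calling hasPrefix(delimiter) stays correct when the string merely begins
         // with the same text as a later match.
         if firstDelimiter.startIndex != self.startIndex
         { result.insert(self.substringToIndex(firstDelimiter.startIndex), atIndex:0) }

         return result
      }
      return [self]   // no match left
   }
}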

I had to use a different syntax in the regular expression to define the delimiters. That's why I'm not sure I fully understood what you need.

"fooBerry-BazClom*".componentsFromRegExp("B|-|\\*")
// returns ["foo", "B", "erry", "-", "B", "azClom", "*"]
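
For readers on a current toolchain, the same idea can also be written without recursion. The following is only a sketch under assumptions: it uses Swift 5 syntax rather than the Swift 2 syntax of this thread, and the function name splitKeepingDelimiters is made up for illustration. It walks the regex matches once, emitting the text before each match and then the match itself, and it converts between NSRange and String indices with Foundation's own initializers instead of counting Characters.

```swift
import Foundation

/// Splits `text` on every match of `pattern`, keeping each matched delimiter
/// as its own element. Empty pieces (for example between two adjacent
/// delimiters) are skipped. Hypothetical helper, not from the original answers.
func splitKeepingDelimiters(_ text: String, pattern: String) throws -> [String] {
    let regex = try NSRegularExpression(pattern: pattern)
    // NSRegularExpression works in UTF-16 offsets, so build the search range
    // from the String itself rather than from a Character count.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)

    var pieces: [String] = []
    var cursor = text.startIndex   // end of the previous match

    for match in regex.matches(in: text, options: [], range: fullRange) {
        // Convert the UTF-16 based NSRange back into String indices.
        guard let range = Range(match.range, in: text), !range.isEmpty else { continue }
        if cursor < range.lowerBound {
            pieces.append(String(text[cursor..<range.lowerBound]))  // text before the delimiter
        }
        pieces.append(String(text[range]))                          // the delimiter itself
        cursor = range.upperBound
    }
    if cursor < text.endIndex {
        pieces.append(String(text[cursor...]))                      // trailing text
    }
    return pieces
}

if let parts = try? splitKeepingDelimiters("fooBerry-BazClom*", pattern: "[B\\-*]") {
    print(parts)   // ["foo", "B", "erry", "-", "B", "azClom", "*"]
}
```

Because the matches are visited in order and only once, this avoids re-scanning and re-copying the remainder of the string on every delimiter, which is what the recursive versions above do.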

#2


0  

You can solve this by putting a backreference in your template. It feels a little crude to me, but of the proposed solutions it's the fastest by a long shot (see the performance notes at the end of this answer).

let y="YupNope-FractalOrangexbluey";
let testPattern="(Nope|-|[xy])";

func splitStingOnRegex(aString: String, aPattern: String) -> Array<String> {
    do{
        let regEx = try NSRegularExpression(pattern: aPattern, options: NSRegularExpressionOptions())
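        let template = "\u{16E5}\u{16E5}\u{16E5}$1\u{16E5}\u{16E5}\u{16E5}"
        // $1 refers to the first capture group, so the pattern must wrap the delimiters in parentheses.
        // U+16E5 is a runic letter unlikely to show up three times in a row in modern (or, I hope, ancient) text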
        let modifiedString = regEx.stringByReplacingMatchesInString(
            aString, options: NSMatchingOptions(),
            range: NSMakeRange(0, aString.utf16.count), // NSRange counts UTF-16 units, not Characters
            withTemplate:template)
        let cleanedSideBySideMatches = modifiedString.stringByReplacingOccurrencesOfString("\u{16E5}\u{16E5}\u{16E5}\u{16E5}\u{16E5}\u{16E5}", withString: "\u{16E5}\u{16E5}\u{16E5}", options: NSStringCompareOptions.LiteralSearch, range: nil)

        let arrPlusOne = cleanedSideBySideMatches.componentsSeparatedByString("\u{16E5}\u{16E5}\u{16E5}")
        // Splitting leaves an empty string wherever the input starts or ends with a match.
        // Drop those instead of always discarding the last element, which would lose
        // the final piece of an input that does not end with a delimiter.
        return arrPlusOne.filter { !$0.isEmpty }
    } catch {
        return []
    }
}

splitStingOnRegex(y, aPattern: testPattern);
// ["Yup", "Nope", "-", "FractalOrange", "x", "blue", "y"]

Alternatively, you can get an array of the matches and an array of the pieces that didn't match, and zip them together.

func newSplitStringOnRegex(aString: String, aPattern: String) -> Array<String>{
    do {
        let regEx = try NSRegularExpression(pattern: aPattern, options: NSRegularExpressionOptions())
        let template = "\u{16E5}\u{16E5}\u{16E5}"
        let aNSString = aString as NSString;
        // U+16E5 is a runic letter unlikely to show up three times in a row in modern (or, I hope, ancient) text
        var modifiedString = regEx.stringByReplacingMatchesInString(
            aString, options: NSMatchingOptions(),
            range: NSMakeRange(0, aNSString.length), // NSRange counts UTF-16 units, not Characters
            withTemplate:template)
        // if the first match was at the beginning
        // we'll end up with an extra "" at the start of our array when we split
        if modifiedString.hasPrefix(template) {
            modifiedString = (modifiedString as NSString).substringFromIndex(3);
        }

        let unmatchedItems = modifiedString.componentsSeparatedByString(template)
        let matchRanges = regEx.matchesInString(aString, options: NSMatchingOptions(), range: NSMakeRange(0, aNSString.length));
        let matches = matchRanges.map { aNSString.substringWithRange($0.range)}
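        // now let's zip the matched and unmatched items together
        // (the pairing puts the unmatched piece first, so it assumes the string does not start with a match)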
        let merged = zip(unmatchedItems, matches).map{[$0.0, $0.1]}.flatMap({$0});

        // zip will leave any extra items off the end
        // because this is ultimately a split we'll never have more than one extra
        if unmatchedItems.count > matches.count {
            return merged + [unmatchedItems.last!];
        } else if matches.count > unmatchedItems.count {
            return merged + [matches.last!];
        }
        // no extras
        return merged;

    } catch {
        return Array<String>();
    }
}

newSplitStringOnRegex(text, aPattern: testPattern); // `text` is the author's test input, which is not shown here
// ["Yup", "Nope", "", "Nope", "FractalOrange", "-", "blue", "x", "", "y"]

Performance notes

Testing these two plus Alain T's two solutions on my computer, with a test string that had ~50 matches and ~50 delimiters, I ran each one 1000 times and got these results:

  • splitStingOnRegex (my first solution) ~7 seconds
  • newSplitStringOnRegex (my second solution) ~32.5 seconds
  • componentsFromRegExp (Alain's 2nd) ~152 seconds
  • componentsStartingFromCharactersInSet (Alain's 1st) ~122 seconds

So there you have it. Crufty simple-minded solutions for the win. ;)
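
The timings above come from repeated runs. A minimal harness for that kind of comparison might look like the sketch below. It assumes Swift 5 syntax (not the Swift 2 of this thread); measure and sample are hypothetical names, and the closure body is a stand-in for whichever split function is being timed. Absolute numbers will vary with the machine and build settings, so only the relative ordering is meaningful.

```swift
import Foundation

/// Runs `body` `iterations` times and returns the elapsed wall-clock seconds.
/// Hypothetical helper, not part of the original answers.
func measure(iterations: Int, _ body: () -> Void) -> TimeInterval {
    let start = Date()
    for _ in 0..<iterations { body() }
    return Date().timeIntervalSince(start)
}

// A test string with 50 words, each followed by a "-" delimiter.
let sample = String(repeating: "word-", count: 50)

let seconds = measure(iterations: 1000) {
    _ = sample.components(separatedBy: "-")   // stand-in for the function under test
}
print("1000 runs took \(seconds) seconds")
```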
