JavaScript and regex: split a string and keep the delimiter

Date: 2021-03-14 21:43:31

I have a string:

var string = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc"

I would like to split this string on the delimiter <br /> followed by a special character.

To do that, I am using this:

string.split(/<br \/>&#?[a-zA-Z0-9]+;/g);

I am getting what I need, except that I am losing the delimiter. Here is the example: http://jsfiddle.net/JwrZ6/1/

How can I keep the delimiter?

7 answers

#1


57  

Use positive lookahead so that the regular expression asserts that the special character exists, but does not actually match it:

string.split(/<br \/>(?=&#?[a-zA-Z0-9]+;)/g);

See it in action.

Update: fixed typo (moved literal ; inside lookahead parens)
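
To make the difference concrete, here is a small sketch run against the string from the question. The variable names are illustrative, and the third pattern, which puts the whole delimiter inside the lookahead, is an assumed variation rather than part of the answer above.

```javascript
const input = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc";

// Plain split: the whole delimiter is consumed and lost.
const lost = input.split(/<br \/>&#?[a-zA-Z0-9]+;/);
// ["aaaaaa", " bbbb", " cccc"]

// Lookahead on the entity: <br /> is consumed, the entity stays with the next piece.
const keepEntity = input.split(/<br \/>(?=&#?[a-zA-Z0-9]+;)/);
// ["aaaaaa", "&dagger; bbbb", "&Dagger; cccc"]

// Assumed variation: with everything inside the lookahead, <br /> is kept as well.
const keepAll = input.split(/(?=<br \/>&#?[a-zA-Z0-9]+;)/);
// ["aaaaaa", "<br />&dagger; bbbb", "<br />&Dagger; cccc"]

console.log(lost, keepEntity, keepAll);
```

A lookahead matches a position rather than characters, so whatever sits inside it is never removed by split().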

#2


96  

I had a similar but slightly different problem. Anyway, here are examples of three different scenarios for where to keep the delimiter.

"1、2、3".split("、")          // ["1", "2", "3"]
"1、2、3".split(/(、)/g)       // ["1", "、", "2", "、", "3"]
"1、2、3".split(/(?=、)/g)     // ["1", "、2", "、3"]
"1、2、3".split(/(?!、)/g)     // ["1、", "2、", "3"]
"1、2、3".split(/(.*?、)/g)    // ["", "1、", "", "2、", "3"]

Warning: The fourth will only work to split single characters. ConnorsFan presents an alternative:

// Split a path, but keep the slashes that follow directories
var str = 'Animation/rawr/javascript.js';
var tokens = str.match(/[^\/]+\/?|\//g);
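
For a multi-character delimiter that should stay at the end of each piece, a lookbehind assertion is another option. This is a sketch, assuming an engine with ES2018 lookbehind support; the sample string and variable names are made up for illustration.

```javascript
const text = "one<br />two<br />three";

// Lookbehind: split right after each delimiter, so it stays at the end of the piece.
const trailing = text.split(/(?<=<br \/>)/);
// ["one<br />", "two<br />", "three"]

// Lookahead: split right before each delimiter, so it stays at the start of the piece.
const leading = text.split(/(?=<br \/>)/);
// ["one", "<br />two", "<br />three"]

console.log(trailing, leading);
```

Both patterns match an empty position, so no characters are dropped and joining the pieces with an empty string reproduces the input.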

#3


29  

If you wrap the delimiter in parentheses, it will be part of the returned array.

string.split(/(<br \/>&#?[a-zA-Z0-9]+;)/g);
// returns ["aaaaaa", "<br />&dagger;", " bbbb", "<br />&Dagger;", " cccc"]

Depending on which part you want to keep, change which subgroup you capture:

string.split(/(<br \/>)&#?[a-zA-Z0-9]+;/g);
// returns ["aaaaaa", "<br />", " bbbb", "<br />", " cccc"]

You could improve the expression by ignoring the case of letters:

string.split(/(<br \/>)&#?[a-z0-9]+;/gi);

You can also use predefined character classes: \d equals [0-9] and \w equals [a-zA-Z0-9_]. This means your expression could look like this:

string.split(/<br \/>(&#?[a-z\d]+;)/gi);

There is a good Regular Expression Reference on JavaScriptKit.

#4


3  

I also answered this here: JavaScript Split Regular Expression keep the delimiter

Use the (?=pattern) lookahead in the regex. For example:

var string = '500x500-11*90~1+1';
string = string.replace(/(?=[$-/:-?{-~!"^_`\[\]])/gi, ",");
string = string.split(",");

This will give you the following result:

[ '500x500', '-11', '*90', '~1', '+1' ]

You can also split directly, starting again from the original string:

string = string.split(/(?=[$-/:-?{-~!"^_`\[\]])/gi);

This gives the same result:

[ '500x500', '-11', '*90', '~1', '+1' ]
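
One caveat worth illustrating: the replace-then-split variant assumes the input contains no commas of its own, while the direct split does not. The sketch below uses a simplified operator set and a made-up input, both of which are assumptions for illustration only.

```javascript
// Simplified, assumed operator set; the answer above uses a broader character class.
const operators = /(?=[-+*~])/;
const input = "1,5+2*3";

// Replace-then-split: the comma already present in the input causes an extra split.
const viaReplace = input.replace(/(?=[-+*~])/g, ",").split(",");
// ["1", "5", "+2", "*3"]

// Direct split on the lookahead: the existing comma is left alone.
const direct = input.split(operators);
// ["1,5", "+2", "*3"]

console.log(viaReplace, direct);
```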

#5


0  

This extension function splits a string by a substring or a RegExp and keeps each delimiter. The second parameter (ahead) decides where the delimiter goes: at the start of the following chunk when true, otherwise at the end of the preceding chunk.

    String.prototype.splitKeep = function (splitter, ahead) {
        var self = this;
        var result = [];
        if (splitter != '') {
            var matches = [];
            // Collect each matched value and its index
            var replaceName = splitter instanceof RegExp ? "replace" : "replaceAll";
            var r = self[replaceName](splitter, function (m, i, e) {
                matches.push({ value: m, index: i });
                return getSubst(m);
            });
            // Finds split substrings
            var lastIndex = 0;
            for (var i = 0; i < matches.length; i++) {
                var m = matches[i];
                var nextIndex = ahead == true ? m.index : m.index + m.value.length;
                if (nextIndex != lastIndex) {
                    var part = self.substring(lastIndex, nextIndex);
                    result.push(part);
                    lastIndex = nextIndex;
                }
            };
            if (lastIndex < self.length) {
                var part = self.substring(lastIndex, self.length);
                result.push(part);
            };
            // Substitution of matched string
            function getSubst(value) {
                var substChar = value[0] == '0' ? '1' : '0';
                var subst = '';
                for (var i = 0; i < value.length; i++) {
                    subst += substChar;
                }
                return subst;
            };
        }
        else {
            result.push(String(self));
        };
        return result;
    };

The test:

    test('splitKeep', function () {
        // String
        deepEqual("1231451".splitKeep('1'), ["1", "231", "451"]);
        deepEqual("123145".splitKeep('1', true), ["123", "145"]);
        deepEqual("1231451".splitKeep('1', true), ["123", "145", "1"]);
        deepEqual("hello man how are you!".splitKeep(' '), ["hello ", "man ", "how ", "are ", "you!"]);
        deepEqual("hello man how are you!".splitKeep(' ', true), ["hello", " man", " how", " are", " you!"]);
        // Regex
        deepEqual("mhellommhellommmhello".splitKeep(/m+/g), ["m", "hellomm", "hellommm", "hello"]);
        deepEqual("mhellommhellommmhello".splitKeep(/m+/g, true), ["mhello", "mmhello", "mmmhello"]);
    });
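
For readers who prefer not to extend String.prototype, the same idea can be sketched as a standalone function built on matchAll. This is not the answer's code: the function signature, the escaping of string splitters, and the handling of empty matches are assumptions of this sketch.

```javascript
// Hypothetical standalone variant of splitKeep (a sketch, not the original implementation).
function splitKeep(text, splitter, ahead) {
  // matchAll needs a global regex; string splitters are escaped so they match literally.
  const re = splitter instanceof RegExp
    ? new RegExp(splitter.source, splitter.flags.includes("g") ? splitter.flags : splitter.flags + "g")
    : new RegExp(splitter.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"), "g");
  const result = [];
  let last = 0;
  for (const m of text.matchAll(re)) {
    if (m[0] === "") continue; // ignore empty matches
    // Cut before the delimiter (ahead) or after it (behind).
    const cut = ahead ? m.index : m.index + m[0].length;
    if (cut !== last) {
      result.push(text.slice(last, cut));
      last = cut;
    }
  }
  // Whatever remains after the last cut is the final chunk.
  if (last < text.length) result.push(text.slice(last));
  return result;
}

console.log(splitKeep("hello man how are you!", " "));
console.log(splitKeep("mhellommhellommmhello", /m+/g, true));
```

The expectations in the test above carry over unchanged when written as splitKeep(text, splitter, ahead) calls.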

#6


0  

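I've been using this:

String.prototype.splitBy = function (delimiter) {
  // Wrap the pattern in a capturing group so split() also returns each delimiter.
  var delimiterRE = new RegExp('(' + delimiter + ')', 'g');

  return this.split(delimiterRE).reduce((chunks, item) => {
    if (item.match(delimiterRE)) {
      chunks.push(item);                  // a delimiter starts a new chunk
    } else if (chunks.length === 0) {
      if (item !== '') chunks.push(item); // text before the first delimiter
    } else {
      chunks[chunks.length - 1] += item;  // text after a delimiter joins its chunk
    }
    return chunks;
  }, []);
};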

Except that you shouldn't mess with String.prototype, so here's a function version:

var splitBy = function (text, delimiter) {
  // Wrap the pattern in a capturing group so split() also returns each delimiter.
  var delimiterRE = new RegExp('(' + delimiter + ')', 'g');

  return text.split(delimiterRE).reduce(function (chunks, item) {
    if (item.match(delimiterRE)) {
      chunks.push(item);                  // a delimiter starts a new chunk
    } else if (chunks.length === 0) {
      if (item !== '') chunks.push(item); // text before the first delimiter
    } else {
      chunks[chunks.length - 1] += item;  // text after a delimiter joins its chunk
    }
    return chunks;
  }, []);
};

So you could do:

var haystack = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc";
var needle = '<br \/>&#?[a-zA-Z0-9]+;';
var result = splitBy(haystack, needle);
console.log(JSON.stringify(result, null, 2));

And you'll end up with:

[
  "aaaaaa",
  "<br />&dagger; bbbb",
  "<br />&Dagger; cccc"
]

#7


0  

function formatDate(dt, format) {
    var monthNames = [
        "Enero", "Febrero", "Marzo",
        "Abril", "Mayo", "Junio", "Julio",
        "Agosto", "Setiembre", "Octubre",
        "Noviembre", "Diciembre"
    ];
    var Days = [
        "Domingo", "Lunes", "Martes", "Miercoles",
        "Jueves", "Viernes", "Sabado"
    ];

    function pad(n, width, z) {
        z = z || '0';
        n = n + '';
        return n.length >= width ? n : new Array(width - n.length + 1).join(z) + n;
    }

    function replace(val, date) {
        switch (val) {
            case 'yyyy':
                return date.getFullYear();
            case 'YYYY':
                return date.getFullYear();
            case 'yy':
                return (date.getFullYear() + "").substring(2);
            case 'YY':
                return (date.getFullYear() + "").substring(2);
            case 'MMMM':
                return monthNames[date.getMonth()];
            case 'MMM':
                return monthNames[date.getMonth()].substring(0, 3);
            case 'MM':
                return pad(date.getMonth() + 1, 2);
            case 'M':
                return date.getMonth() + 1;
            case 'dd':
                return pad(date.getDate(), 2);
            case 'd':
                return date.getDate();
            case 'DD':
                return Days[date.getDay()];
            case 'D':
                return Days[date.getDay()].substring(0, 3);
            case 'HH':
                return pad(date.getHours(), 2);
            case 'H':
                return date.getHours();
            case 'mm':
                return pad(date.getMinutes(), 2);
            case 'm':
                return date.getMinutes();
            case 'ss':
                return pad(date.getSeconds(), 2);
            case 's':
                return date.getSeconds();
            default:
                return val;
        }
    }

    var ds = format.split(/( |,|:)/g);
    var newFormat = '';
    for (var i = 0; i < ds.length; i++) {
        newFormat += replace(ds[i], dt);
    }
    return newFormat;
}

var a = "2016-08-22T16:02:05.645Z";
var d = new Date(Date.parse(a));
// var d = new Date();
console.log(formatDate(d, 'd de MMMM, de YYYY H:mm'));
