Splitting a string in Lua with a specified delimiter

Time: 2021-12-30 21:35:19

I'm trying to write a split() function in Lua that takes a delimiter of my choice and defaults to a space. The default case works fine. The problem starts when I pass a delimiter to the function: for some reason it doesn't return the last substring. The function:

function split(str,sep)
if sep == nil then
    words = {}
    for word in str:gmatch("%w+") do table.insert(words, word) end
    return words
end
return {str:match((str:gsub("[^"..sep.."]*"..sep, "([^"..sep.."]*)"..sep)))} -- BUG!! doesnt return last value
end

I tried running this:

local str = "a,b,c,d,e,f,g"
local sep = ","
t = split(str,sep)
for i,j in ipairs(t) do
    print(i,j)
end

and I get:

1   a
2   b
3   c
4   d
5   e
6   f

I can't figure out where the bug is...

3 solutions

#1


5  

When splitting strings, the easiest way to avoid corner cases is to append the delimiter to the string, provided you know the string cannot already end with the delimiter:

str = "a,b,c,d,e,f,g"
str = str .. ','
for w in str:gmatch("(.-),") do print(w) end

Alternatively, you can use a pattern with an optional delimiter:

str = "a,b,c,d,e,f,g"
for w in str:gmatch("([^,]+),?") do print(w) end

Actually, we don't need the optional delimiter at all, since we're capturing runs of non-delimiter characters:

str = "a,b,c,d,e,f,g"
for w in str:gmatch("([^,]+)") do print(w) end
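The two approaches above are not quite interchangeable when the input can contain empty fields. A minimal sketch of the difference, assuming an input like "a,,b" (the collect helper is hypothetical and exists only for this example):

```lua
-- Hypothetical helper: gather everything an iterator yields into a table.
local function collect(iter)
  local out = {}
  for w in iter do out[#out + 1] = w end
  return out
end

local str = "a,,b"

-- Appending the delimiter and matching lazily keeps the empty field.
local kept = collect((str .. ","):gmatch("(.-),"))

-- Matching runs of non-delimiters skips the empty field.
local skipped = collect(str:gmatch("([^,]+)"))

print(#kept, #skipped)  --> 3   2
```

Which behaviour is wanted depends on the data; for the "a,b,c,d,e,f,g" input from the question both give the same seven fields.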

#2


0  

Here's the split function I usually use for all my "splitting" needs:

function split(s, sep)
    local fields = {}

    local sep = sep or " "
    local pattern = string.format("([^%s]+)", sep)
    string.gsub(s, pattern, function(c) fields[#fields + 1] = c end)

    return fields
end

t = split("a,b,c,d,e,f,g",",")
for i,j in pairs(t) do
    print(i,j)
end

#3


0  

"[^"..sep.."]*"..sep This is what causes the problem. You are matching a string of characters which are not the separator followed by the separator. However, the last substring you want to match (g) is not followed by the separator character.


The quickest way to fix this is to append the separator to the string before matching, so that g is followed by a separator like every other field. Note that \0 cannot serve as an end-of-string marker here: Lua patterns give it no such meaning, and a pattern like "[^"..sep.."\0]*"..sep would still require a trailing separator.
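To see why the last field goes missing, it can help to print the pattern that the gsub call from the question generates. This is only a sketch on a shortened input ("a,b,c"); the variable names are made up for the example, and appending the separator is one possible workaround rather than the only one:

```lua
local str, sep = "a,b,c", ","

-- Same expression as in the question: each "field + separator" run is
-- rewritten into a capture group followed by the separator.
local generated = str:gsub("[^" .. sep .. "]*" .. sep, "([^" .. sep .. "]*)" .. sep)
print(generated)  --> ([^,]*),([^,]*),c

-- The trailing "c" is left as a literal, so match() has no capture for it.
-- Appending the separator first means the last field gets rewritten too.
local padded = str .. sep
local fixed = padded:gsub("[^" .. sep .. "]*" .. sep, "([^" .. sep .. "]*)" .. sep)
local fields = { padded:match(fixed) }
print(#fields, fields[3])  --> 3   c
```

Keep in mind that this approach creates one capture per field, and Lua limits the number of captures in a pattern, so it is not suitable for long lists.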

I'd say your approach is overly complicated in general. First, you can just match the individual substrings that do not contain the separator; second, you can do this in a for loop using the gmatch function:

local result = {}
-- gmatch (not gsub) returns an iterator over every match of the pattern
for field in your_string:gmatch(("[^%s]+"):format(your_separator)) do
  table.insert(result, field)
end
return result

EDIT: The above code, made a bit simpler:

-- the leading % escapes the separator; this assumes a single punctuation character
local pattern = "[^%" .. your_separator .. "]+"
for field in string.gmatch(your_string, pattern) do
  table.insert(result, field)
end

EDIT2: Keep in mind that you should also escape your separator. A separator like % could cause problems if you don't escape it as %%:

function escape(str)
  -- the outer parentheses drop gsub's second return value (the match count)
  return (str:gsub("([%^%$%(%)%%%.%[%]%*%+%-%?])", "%%%1"))
end
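Putting the escaping and the gmatch loop together, a split function might look roughly like this. It is a sketch that assumes a single-character separator and that empty fields can be skipped; the local names are chosen for the example:

```lua
-- Escape every pattern magic character so it is matched literally.
local function escape(s)
  return (s:gsub("([%^%$%(%)%%%.%[%]%*%+%-%?])", "%%%1"))
end

-- Split s on a single-character separator (defaults to a space).
-- Empty fields are skipped because the pattern needs at least one character.
local function split(s, sep)
  sep = sep or " "
  local fields = {}
  for field in s:gmatch("([^" .. escape(sep) .. "]+)") do
    fields[#fields + 1] = field
  end
  return fields
end

local t = split("a%b%c", "%")
print(#t, t[1], t[3])  --> 3   a   c
```

Without the escape call, a separator of % would produce the pattern "([^%]+)", which Lua rejects as malformed.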
