将两个字符的字符串转换成布尔数组的快速方法是什么?

时间:2022-12-22 21:40:56

I have a long string (sometimes over 1000 characters) that I want to convert to an array of boolean values. And it needs to do this many times, very quickly.

我有一个很长的字符串(有时超过1000个字符),我想把它转换成一个布尔值数组。它需要这样做很多次,非常快。

let input: String = "001"
let output: [Bool] = [false, false, true]

My naive attempt was this:

我天真的尝试是:

input.characters.map { $0 == "1" }

But this is a lot slower than I'd like. My profiling has shown me that the map is where the slowdown is, but I'm not sure how much simpler I can make that.

但这比我想要的要慢得多。我的分析显示,地图是减速的地方,但我不确定我能做得多简单。

I feel like this would be wicked fast without Swift's/ObjC's overhead. In C, I think this is a simple for loop where a byte of memory is compared to a constant, but I'm not sure what the functions or syntax is that I should be looking at.

我觉得如果没有斯威夫特的/ObjC的管理,这将是非常邪恶的。在C语言中,我认为这是一个简单的for循环,它将一个字节的内存与一个常量进行比较,但是我不确定应该查看什么函数或语法。

Is there a way to do this much faster?

有没有一种更快的方法呢?

UPDATE:

更新:

I also tried a

我也尝试了

output = []
for char in input.characters {
    output.append(char == "1")
}

And it's about 15% faster. I'm hoping for a lot more than that.

而且速度快了15%。我希望能有更多。

8 个解决方案

#1


12  

This is faster:

这是更快:

// Algorithm 'A'
let input = "0101010110010101010"
var output = Array<Bool>(count: input.characters.count, repeatedValue: false)
for (index, char) in input.characters.enumerate() where char == "1" {
    output[index] = true
}

Update: under input = "010101011010101001000100000011010101010101010101"

更新:输入以下= " 010101010110101010010001000000110101010101010101010101 "

0.0741 / 0.0087, where this approach is faster that author's in 8.46 times. With bigger data correlation more positive.

0。0741 / 0。0087,这个方法比作者快8。46倍。数据相关性越大越好。

Also, with using nulTerminatedUTF8 speed a little increased, but not always speed higher than algorithm A:

同样,使用nulTerminatedUTF8速度会稍微增加一点,但并不总是比算法a的速度快:

// Algorithm 'B'
let input = "10101010101011111110101000010100101001010101"
var output = Array<Bool>(count: input.nulTerminatedUTF8.count, repeatedValue: false)
for (index, code) in input.nulTerminatedUTF8.enumerate() where code == 49 {
    output[index] = true
}

In result graph appears, with input length 2196, where first and last 0..1, A – second, B – third point. A: 0.311sec, B: 0.304sec

结果出现了输入长度为2196的图,其中第一个和最后一个为0。1, A -秒,B - 3分。0.311秒,0.304秒

将两个字符的字符串转换成布尔数组的快速方法是什么?

#2


5  

import Foundation

let input:String = "010101011001010101001010101100101010100101010110010101010101011001010101001010101100101010100101010101011001010101001010101100101010100101010"
var start  = clock()
var output = Array<Bool>(count: input.nulTerminatedUTF8.count, repeatedValue: false)
var index = 0
for val in input.nulTerminatedUTF8 {
    if val != 49 {
        output[index] = true
    }
    index+=1
}
var diff = clock() - start;
var msec = diff * 1000 / UInt(CLOCKS_PER_SEC);
print("Time taken \(Double(msec)/1000.0) seconds \(msec%1000) milliseconds");

This should be really fast. Try it out. For 010101011010101001000100000011010101010101010101 it takes 0.039 secs.

这应该很快。试一下。010101010110101010010001000000110101010101010101010101010101需要0.039秒。

#3


1  

I would guess that this is as fast as possible:

我猜这是越快越好:

let targ = Character("1")
let input: String = "001" // your real string goes here
let inputchars = Array(input.characters)
var output:[Bool] = Array.init(count: inputchars.count, repeatedValue: false)
inputchars.withUnsafeBufferPointer {
    inputbuf in
    output.withUnsafeMutableBufferPointer {
        outputbuf in
        var ptr1 = inputbuf.baseAddress
        var ptr2 = outputbuf.baseAddress
        for _ in 0..<inputbuf.count {
            ptr2.memory = ptr1.memory == targ
            ptr1 = ptr1.successor()
            ptr2 = ptr2.successor()
        }
    }
}
// output now contains the result

The reason is that, thanks to the use of buffer pointers, we are simply cycling through contiguous memory, just like the way you cycle through a C array by incrementing its pointer. Thus, once we get past the initial setup, this should be as fast as it would be in C.

原因是,由于使用了缓冲区指针,我们只是在连续内存中循环,就像您在C数组中循环使用指针的方式一样。因此,一旦我们通过了初始设置,它应该和C一样快。

EDIT In an actual test, the time difference between the OP's original method and this one is the difference between

在实际的测试中编辑,OP的原始方法和这个方法之间的时间差是两者之间的差

13.3660290241241

and

0.219357967376709

which is a pretty dramatic speed-up. I hasten to add, however, that I have excluded the initial set-up from the timing test. This line:

这是一个非常戏剧性的加速。不过,我马上补充说,我已经把最初的设置从计时测试中排除了。这条线:

let inputchars = Array(input.characters)

...is particularly expensive.

…尤其昂贵。

#4


1  

This should be a little faster than the enumerate() where char == "1" version (0.557s for 500_000 alternating ones and zeros vs. 1.159s algorithm 'A' from diampiax)

这应该比enumerate()要快一些,其中char ==“1”版本(500_000个交替的1和0与来自diampiax的1.159s算法‘a’对应的是0.557s)

let input = inputStr.utf8
let n = input.count
var output = [Bool](count: n, repeatedValue: false)
let one = UInt8(49) // 1
for (idx, char) in input.enumerate() {
    if char == one { output[idx] = true }
}

but it's also a lot less readable ;-p

但是它的可读性也差很多;-p

edit: both versions are slower than the map variant, maybe you forgot to compile with optimizations?

编辑:两个版本都比map版本慢,也许您忘记使用优化编译了?

#5


0  

One more step should speed that up even more. Using reserveCapacity will resize the array once before the loops starts instead of trying to do it as the loop runs.

再多走一步就会让这一速度进一步加快。使用存储容量将在循环开始之前调整数组的大小,而不是在循环运行时尝试这样做。

var output = [Bool]()
output.reserveCapacity(input.characters.count)
for char in input.characters {
    output.append(char == "1")
}

#6


0  

Use withCString(_:) to retrieve a raw UnsafePointer<Int8>. Iterate over that and compare to 49 (ascii value of "1").

使用cstring(_:)检索原始UnsafePointer 。对其进行迭代并与49 (ascii值为“1”)进行比较。

#7


0  

What about a more functional style? It's not fastest (47 ms), today, for sure...

更实用的风格呢?它不是最快的(47毫秒),当然,今天……

import Cocoa

let start = clock()

let bools = [Bool](([Character] ("010101011001010101001010101100101010100101010110010101010101011001010101001010101100101010100101010101011001010101001010101100101010100101010".characters)).map({$0 == "1"}))

let msec = (clock() - start) * 1000 / UInt(CLOCKS_PER_SEC);
print("Time taken \(Double(msec)/1000.0) seconds \(msec%1000) milliseconds");

#8


0  

I need to some testing to be sure but I think one issue with many approaches given including the original map is that they need to iterate over the string to count the characters and then a second time to actually process the characters.

我需要做一些测试来确定,但是我认为包括原始映射在内的许多方法存在的一个问题是,它们需要遍历字符串来计数字符,然后第二次实际处理字符。

Have you tried:

你有试过:

let output = [Bool](input.characters.lazy.map { $0 == "1" })

This might only do a single iteration.

这可能只执行一次迭代。

The other thing that could speed things up is if you can avoid using strings but instead use arrays of characters of an appropriate encoding (particularly if is more fixed size units (e.g. UTF16 or ASCII). Then then length lookup will be O(1) rather than O(n) and the iteration may be faster too

另一个可以加快速度的事情是,如果你可以避免使用字符串,而是使用适当编码的字符数组(特别是如果是更固定大小的单位(如UTF16或ASCII)。然后,长度查找将是O(1)而不是O(n),迭代也可能更快

BTW always test performance with the optimiser enabled and never in the Playground because the performance characteristics are completely different, sometimes by a factor of 100.

BTW总是在启用了optimiser的情况下测试性能,而不是在操场上,因为性能特征是完全不同的,有时是100倍。

#1


12  

This is faster:

这是更快:

// Algorithm 'A'
let input = "0101010110010101010"
var output = Array<Bool>(count: input.characters.count, repeatedValue: false)
for (index, char) in input.characters.enumerate() where char == "1" {
    output[index] = true
}

Update: under input = "010101011010101001000100000011010101010101010101"

更新:输入以下= " 010101010110101010010001000000110101010101010101010101 "

0.0741 / 0.0087, where this approach is faster that author's in 8.46 times. With bigger data correlation more positive.

0。0741 / 0。0087,这个方法比作者快8。46倍。数据相关性越大越好。

Also, with using nulTerminatedUTF8 speed a little increased, but not always speed higher than algorithm A:

同样,使用nulTerminatedUTF8速度会稍微增加一点,但并不总是比算法a的速度快:

// Algorithm 'B'
let input = "10101010101011111110101000010100101001010101"
var output = Array<Bool>(count: input.nulTerminatedUTF8.count, repeatedValue: false)
for (index, code) in input.nulTerminatedUTF8.enumerate() where code == 49 {
    output[index] = true
}

In result graph appears, with input length 2196, where first and last 0..1, A – second, B – third point. A: 0.311sec, B: 0.304sec

结果出现了输入长度为2196的图,其中第一个和最后一个为0。1, A -秒,B - 3分。0.311秒,0.304秒

将两个字符的字符串转换成布尔数组的快速方法是什么?

#2


5  

import Foundation

let input:String = "010101011001010101001010101100101010100101010110010101010101011001010101001010101100101010100101010101011001010101001010101100101010100101010"
var start  = clock()
var output = Array<Bool>(count: input.nulTerminatedUTF8.count, repeatedValue: false)
var index = 0
for val in input.nulTerminatedUTF8 {
    if val != 49 {
        output[index] = true
    }
    index+=1
}
var diff = clock() - start;
var msec = diff * 1000 / UInt(CLOCKS_PER_SEC);
print("Time taken \(Double(msec)/1000.0) seconds \(msec%1000) milliseconds");

This should be really fast. Try it out. For 010101011010101001000100000011010101010101010101 it takes 0.039 secs.

这应该很快。试一下。010101010110101010010001000000110101010101010101010101010101需要0.039秒。

#3


1  

I would guess that this is as fast as possible:

我猜这是越快越好:

let targ = Character("1")
let input: String = "001" // your real string goes here
let inputchars = Array(input.characters)
var output:[Bool] = Array.init(count: inputchars.count, repeatedValue: false)
inputchars.withUnsafeBufferPointer {
    inputbuf in
    output.withUnsafeMutableBufferPointer {
        outputbuf in
        var ptr1 = inputbuf.baseAddress
        var ptr2 = outputbuf.baseAddress
        for _ in 0..<inputbuf.count {
            ptr2.memory = ptr1.memory == targ
            ptr1 = ptr1.successor()
            ptr2 = ptr2.successor()
        }
    }
}
// output now contains the result

The reason is that, thanks to the use of buffer pointers, we are simply cycling through contiguous memory, just like the way you cycle through a C array by incrementing its pointer. Thus, once we get past the initial setup, this should be as fast as it would be in C.

原因是,由于使用了缓冲区指针,我们只是在连续内存中循环,就像您在C数组中循环使用指针的方式一样。因此,一旦我们通过了初始设置,它应该和C一样快。

EDIT In an actual test, the time difference between the OP's original method and this one is the difference between

在实际的测试中编辑,OP的原始方法和这个方法之间的时间差是两者之间的差

13.3660290241241

and

0.219357967376709

which is a pretty dramatic speed-up. I hasten to add, however, that I have excluded the initial set-up from the timing test. This line:

这是一个非常戏剧性的加速。不过,我马上补充说,我已经把最初的设置从计时测试中排除了。这条线:

let inputchars = Array(input.characters)

...is particularly expensive.

…尤其昂贵。

#4


1  

This should be a little faster than the enumerate() where char == "1" version (0.557s for 500_000 alternating ones and zeros vs. 1.159s algorithm 'A' from diampiax)

这应该比enumerate()要快一些,其中char ==“1”版本(500_000个交替的1和0与来自diampiax的1.159s算法‘a’对应的是0.557s)

let input = inputStr.utf8
let n = input.count
var output = [Bool](count: n, repeatedValue: false)
let one = UInt8(49) // 1
for (idx, char) in input.enumerate() {
    if char == one { output[idx] = true }
}

but it's also a lot less readable ;-p

但是它的可读性也差很多;-p

edit: both versions are slower than the map variant, maybe you forgot to compile with optimizations?

编辑:两个版本都比map版本慢,也许您忘记使用优化编译了?

#5


0  

One more step should speed that up even more. Using reserveCapacity will resize the array once before the loops starts instead of trying to do it as the loop runs.

再多走一步就会让这一速度进一步加快。使用存储容量将在循环开始之前调整数组的大小,而不是在循环运行时尝试这样做。

var output = [Bool]()
output.reserveCapacity(input.characters.count)
for char in input.characters {
    output.append(char == "1")
}

#6


0  

Use withCString(_:) to retrieve a raw UnsafePointer<Int8>. Iterate over that and compare to 49 (ascii value of "1").

使用cstring(_:)检索原始UnsafePointer 。对其进行迭代并与49 (ascii值为“1”)进行比较。

#7


0  

What about a more functional style? It's not fastest (47 ms), today, for sure...

更实用的风格呢?它不是最快的(47毫秒),当然,今天……

import Cocoa

let start = clock()

let bools = [Bool](([Character] ("010101011001010101001010101100101010100101010110010101010101011001010101001010101100101010100101010101011001010101001010101100101010100101010".characters)).map({$0 == "1"}))

let msec = (clock() - start) * 1000 / UInt(CLOCKS_PER_SEC);
print("Time taken \(Double(msec)/1000.0) seconds \(msec%1000) milliseconds");

#8


0  

I need to some testing to be sure but I think one issue with many approaches given including the original map is that they need to iterate over the string to count the characters and then a second time to actually process the characters.

我需要做一些测试来确定,但是我认为包括原始映射在内的许多方法存在的一个问题是,它们需要遍历字符串来计数字符,然后第二次实际处理字符。

Have you tried:

你有试过:

let output = [Bool](input.characters.lazy.map { $0 == "1" })

This might only do a single iteration.

这可能只执行一次迭代。

The other thing that could speed things up is if you can avoid using strings but instead use arrays of characters of an appropriate encoding (particularly if is more fixed size units (e.g. UTF16 or ASCII). Then then length lookup will be O(1) rather than O(n) and the iteration may be faster too

另一个可以加快速度的事情是,如果你可以避免使用字符串,而是使用适当编码的字符数组(特别是如果是更固定大小的单位(如UTF16或ASCII)。然后,长度查找将是O(1)而不是O(n),迭代也可能更快

BTW always test performance with the optimiser enabled and never in the Playground because the performance characteristics are completely different, sometimes by a factor of 100.

BTW总是在启用了optimiser的情况下测试性能,而不是在操场上,因为性能特征是完全不同的,有时是100倍。