I have some text (either natural-language text or an arithmetic expression) and I want to split it into words.
If I had a single delimiter, I'd use:
std::stringstream stringStream(inputString);
std::string word;
while (std::getline(stringStream, word, delimiter))
{
    wordVector.push_back(word);
}
How can I break the string into tokens with several delimiters?
4 solutions
#1
39
Assuming one of the delimiters is newline, the following reads the line and further splits it by the delimiters. For this example I've chosen the delimiters space, apostrophe, and semi-colon.
std::stringstream stringStream(inputString);
std::string line;
while (std::getline(stringStream, line))
{
    std::size_t prev = 0, pos;
    while ((pos = line.find_first_of(" ';", prev)) != std::string::npos)
    {
        // Skip empty tokens produced by adjacent delimiters
        if (pos > prev)
            wordVector.push_back(line.substr(prev, pos - prev));
        prev = pos + 1;
    }
    // Remaining text after the last delimiter, if any
    if (prev < line.length())
        wordVector.push_back(line.substr(prev, std::string::npos));
}
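The loop above can also be wrapped in a reusable function. The following is only a minimal sketch: the name `splitAny` is hypothetical (it is not from the answer), and it assumes that empty tokens between adjacent delimiters should be dropped, as the loop above does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (name chosen for illustration): splits `text` on any
// character found in `delims`, dropping empty tokens between adjacent delimiters.
std::vector<std::string> splitAny(const std::string& text, const std::string& delims)
{
    std::vector<std::string> words;
    std::size_t prev = 0, pos;
    while ((pos = text.find_first_of(delims, prev)) != std::string::npos)
    {
        if (pos > prev)
            words.push_back(text.substr(prev, pos - prev));
        prev = pos + 1;
    }
    // Remaining text after the last delimiter, if any
    if (prev < text.length())
        words.push_back(text.substr(prev));
    return words;
}
```

If newline is simply added to the delimiter set, for example `splitAny(inputString, " ';\n")`, the outer `std::getline` loop is probably not needed at all.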
#2
17
If you have boost, you could use:
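#include <boost/algorithm/string.hpp>
#include <string>
#include <vector>

std::string inputString("One!Two,Three:Four");
std::string delimiters("!,:");  // must list every separator used in inputString
std::vector<std::string> parts;
boost::split(parts, inputString, boost::is_any_of(delimiters));
// parts now holds "One", "Two", "Three", "Four"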
#3
3
I don't know why nobody pointed out the manual way, but here it is:
const std::string delims(";,:. \n\t");

// Returns true if c is one of the characters in delims
inline bool isDelim(char c) {
    for (std::size_t i = 0; i < delims.size(); ++i)
        if (delims[i] == c)
            return true;
    return false;
}
and inside the function:
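std::stringstream stringStream(inputString);
std::string word;
int c;  // int, not char: get() returns int so that EOF differs from every valid character
while (stringStream) {
    word.clear();
    // Read word: stop at a delimiter or at end of input
    while ((c = stringStream.get()) != EOF && !isDelim(static_cast<char>(c)))
        word.push_back(static_cast<char>(c));
    if (c != EOF)
        stringStream.unget();
    // Skip the empty word produced when the input starts with a delimiter
    if (!word.empty())
        wordVector.push_back(word);
    // Read delims
    while ((c = stringStream.get()) != EOF && isDelim(static_cast<char>(c)))
        ;
    if (c != EOF)
        stringStream.unget();
}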
This way you can do something useful with the delims if you want.
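As one illustration of doing something useful with the delimiters, they can be kept as tokens of their own, which suits the arithmetic-expression case from the question. This is a rough sketch rather than part of the answer: the function name `tokenizeKeepingDelims` and the `skip` parameter are hypothetical, and it iterates over the string directly instead of using a stream.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on the characters in `delims`, but keeps
// each delimiter as its own one-character token. Characters in `skip`
// (whitespace by default) also separate tokens and are discarded.
std::vector<std::string> tokenizeKeepingDelims(const std::string& text,
                                               const std::string& delims,
                                               const std::string& skip = " \t\n")
{
    std::vector<std::string> tokens;
    std::string word;
    for (char c : text)
    {
        const bool isDelim = delims.find(c) != std::string::npos;
        const bool isSkip = skip.find(c) != std::string::npos;
        if (!isDelim && !isSkip)
        {
            word.push_back(c);  // ordinary character: extend the current word
            continue;
        }
        if (!word.empty())
        {
            tokens.push_back(word);  // a separator ends the current word
            word.clear();
        }
        if (isDelim)
            tokens.push_back(std::string(1, c));  // keep the delimiter itself
    }
    if (!word.empty())
        tokens.push_back(word);
    return tokens;
}
```

For example, calling it on `"12+ 3*(4-x)"` with the delimiters `"+-*/()"` gives the tokens `12`, `+`, `3`, `*`, `(`, `4`, `-`, `x`, `)`.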
#4
0
This is for anyone interested in how to do it yourself, without using Boost.
Assume the delimiter string may be very long, say of length M. Checking whether a single character is a delimiter by scanning that string costs O(M), so doing it for every character of an input string of length N costs O(M*N).
I would use a dictionary instead (conceptually a map from delimiter to boolean). Here a simple boolean array is enough: the entry at the index equal to a character's ASCII value is true if that character is a delimiter.
Checking whether a character is a delimiter is then O(1), so iterating over the string is O(N) overall (plus O(M) to build the table).
Here is my sample code:
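#include <iostream>
#include <string>
#include <vector>

using std::cout;
using std::endl;
using std::string;
using std::vector;

const int dictSize = 256;

vector<string> tokenizeMyString(const string &s, const string &del)
{
    // Lookup table indexed by character value. It is rebuilt on every call
    // (not static) so delimiters from an earlier call do not leak into this one.
    bool dict[dictSize] = { false };
    vector<string> res;
    for (string::size_type i = 0; i < del.size(); ++i) {
        // Cast to unsigned char: plain char may be signed, giving a negative index
        dict[static_cast<unsigned char>(del[i])] = true;
    }
    string token;
    for (char c : s) {
        if (dict[static_cast<unsigned char>(c)]) {
            // Delimiter: finish the current token, if there is one
            if (!token.empty()) {
                res.push_back(token);
                token.clear();
            }
        }
        else {
            token += c;
        }
    }
    // Last token, when the string does not end with a delimiter
    if (!token.empty()) {
        res.push_back(token);
    }
    return res;
}

int main()
{
    string delString = "MyDog:Odie, MyCat:Garfield  MyNumber:1001001";
    // the delimiters are " " (space) and "," (comma)
    vector<string> res = tokenizeMyString(delString, " ,");
    for (auto &i : res) {
        cout << "token: " << i << endl;
    }
    return 0;
}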
Note: tokenizeMyString returns the vector by value. Because the vector is a local object, the compiler can apply return value optimization (here the named variant, NRVO) and construct it directly in the caller's storage; where it does not, the vector is moved rather than copied (C++11 and later) :)