I have already parsed the file and split its content into enums and enum classes.
std::string sourceString = readFromFile(typesHDestination);

// Compile both patterns once, outside the loops.
// Braces are escaped as \\{ and \\} so the regex engine sees \{ and \}.
const boost::regex enumRegex("(?<data_type>enum|enum\\s+class)\\s+(?<enum_name>\\w+)\\s*\\{(?<content>[^\\}]+?)\\s*\\}\\s*");
const boost::regex itemRegex("(?<name>\\w+)(?:(?:\\s*=\\s*(?<value>[^,\\s]+)(?:(?:,)|(?:\\s*)))|(?:(?:\\s*)|(?:,)))");

boost::smatch xResults;
std::string::const_iterator Start = sourceString.cbegin();
std::string::const_iterator End = sourceString.cend();
while (boost::regex_search(Start, End, xResults, enumRegex))
{
    std::cout << xResults["data_type"]
              << " " << xResults["enum_name"] << "\n{\n";
    // Search for the individual enumerators inside the matched body only.
    std::string::const_iterator ContentStart = xResults["content"].first;
    std::string::const_iterator ContentEnd = xResults["content"].second;
    boost::smatch xResultsInner;
    while (boost::regex_search(ContentStart, ContentEnd, xResultsInner, itemRegex))
    {
        std::cout << xResultsInner["name"] << ": " << xResultsInner["value"] << std::endl;
        ContentStart = xResultsInner[0].second;
    }
    // Continue after the enum that was just handled.
    Start = xResults[0].second;
    std::cout << "}\n";
}
This works fine as long as the enums contain no comments.
I tried to add a named group <comment> to capture the comments inside the enums, but failed every time. For example, (\/{2}\s*.+) is the sub-pattern I tried for comments that start with a double slash.
I tested with an online regex tester and with boost::regex.
- The first step, from the *.cpp file to <data_type> <enum_name> <content>, uses this regex:
(?'data_type'enum|enum\s+class)\s+(?'enum_name'\w+)\s*{\s*(?'content'[^}]+?)\s*}\s*
- The second step, from <content> to <name> <value> <comment>, uses this regex:
(?'name'\w+)(?:(?:\s*=\s*(?'value'[^\,\s/]+)(?:(?:,)|(?:\s*)))|(?:(?:\s*)|(?:,)))
The last one contains an error. Is there any way to fix it and also capture the comments in a group?
2 answers
#1
As some comments pointed out, parsing a source file with regular expressions is probably not a good idea, except in simple cases.
Take, for example, this source file from http://en.cppreference.com/w/cpp/language/enum:
#include <iostream>

// enum that takes 16 bits
enum smallenum: int16_t
{
    a,
    b,
    c
};

// color may be red (value 0), yellow (value 1), green (value 20), or blue (value 21)
enum color
{
    red,
    yellow,
    green = 20,
    blue
};

// altitude may be altitude::high or altitude::low
enum class altitude: char
{
    high='h',
    low='l', // C++11 allows the extra comma
};

// the constant d is 0, the constant e is 1, the constant f is 3
enum
{
    d,
    e,
    f = e + 2
};

//enumeration types (both scoped and unscoped) can have overloaded operators
std::ostream& operator<<(std::ostream& os, color c)
{
    switch(c)
    {
        case red   : os << "red";    break;
        case yellow: os << "yellow"; break;
        case green : os << "green";  break;
        case blue  : os << "blue";   break;
        default    : os.setstate(std::ios_base::failbit);
    }
    return os;
}

std::ostream& operator<<(std::ostream& os, altitude al)
{
    return os << static_cast<char>(al);
}

int main()
{
    color col = red;
    altitude a;
    a = altitude::low;

    std::cout << "col = " << col << '\n'
              << "a = "   << a   << '\n'
              << "f = "   << f   << '\n';
}
The key pattern here is that each definition starts with enum and ends with ; and you cannot predict the text between the enum and the ; because there are too many possibilities. For that you can use .*? (the lazy star), which matches as little text as possible.
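To make the difference between the lazy and the greedy star concrete, here is a minimal sketch. It is only an illustration under a few assumptions: it uses std::regex rather than boost::regex, the helper name first_enum is made up for this example, and [\s\S] replaces the dot because in std::regex's default ECMAScript grammar the dot does not match a newline.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring that starts with "enum"
// and ends with ';' (or an empty string if there is none).
// lazy == true  -> quantifier *? stops at the nearest ';'
// lazy == false -> quantifier *  runs on to the last ';' in the text
std::string first_enum(const std::string& text, bool lazy)
{
    const std::regex rx(lazy ? R"(enum[\s\S]*?;)" : R"(enum[\s\S]*;)");
    std::smatch m;
    return std::regex_search(text, m, rx) ? m.str(0) : std::string();
}
```

With two enums in the input, the lazy version returns only the first definition, while the greedy version swallows everything up to the last semicolon in the text.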
Thus, to extract all the enums, I use the following code (note that it is not the most efficient way):
// requires <boost/regex.hpp>, <cstdint>, <fstream>, <iostream>, <string>
boost::regex rx( "^\\s*(enum.*?;)" );
boost::match_results< std::string::const_iterator > mr; // or boost::smatch

// read the whole file into a single string
std::ifstream ifs( "file.cpp" );
const std::uintmax_t file_size = ifs.seekg( 0, std::ios_base::end ).tellg();
ifs.seekg( 0, std::ios_base::beg ); // rewind
std::string whole_file( file_size, ' ' );
ifs.read( &*whole_file.begin(), file_size );
ifs.close();

// print each match, then keep searching in the text that follows it
while( boost::regex_search( whole_file, mr, rx ) ){
    std::cout << mr.str( 1 ) << '\n';
    whole_file = mr.suffix().str();
}
The output will be:
enum smallenum: int16_t
{
    a,
    b,
    c
};
enum color
{
    red,
    yellow,
    green = 20,
    blue
};
enum class altitude: char
{
    high='h',
    low='l', // C++11 allows the extra comma
};
enum
{
    d,
    e,
    f = e + 2
};
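Since the loop in the code above copies the rest of the text after every match, one possibly cheaper variant is to walk over the matches with a regex iterator. The following is only a sketch under assumptions, not the code from this answer: it uses std::regex instead of boost::regex, the function name extract_enums is hypothetical, (?:^|\n) stands in for the line anchor, and [\s\S] stands in for the dot (which does not match a newline in std::regex's default grammar).

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every "enum ... ;" span whose "enum" keyword
// is the first non-blank token on its line. The text is scanned once and the
// remaining input is never copied.
std::vector<std::string> extract_enums(const std::string& text)
{
    // Group 1 is the enum definition itself, without the leading whitespace.
    static const std::regex rx(R"((?:^|\n)\s*(enum\b[\s\S]*?;))");
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it)
        found.push_back(it->str(1));
    return found;
}
```

A caller would read the file into a std::string first and then print each element of the returned vector. A word such as "enum" inside a // comment is skipped because it is not the first token on its line.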
Of course, for something this simple I prefer to use:
perl -lne '$/=undef;print $1 while/^\s*(enum.*?;)/smg' file.cpp
which produces the same output.
The following pattern may help if you want to match each section separately:
^\s*(enum[^{]*)\s*({)\s*([^}]+)\s*(};)
But again, this is not a good idea except for simple source files, since C++ source code is free-form and not all programmers follow the same formatting conventions. For example, in the pattern above I assumed, with (};), that the } is immediately followed by ; and if someone separates them (which is still valid code), the pattern will fail to match.
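The fragility described here is easy to reproduce. Below is a small sketch under stated assumptions: it uses std::regex rather than boost::regex, the names EnumSections and split_sections are invented for the example, the braces are escaped for the ECMAScript grammar, and the leading ^\s* is dropped because std::regex is not guaranteed to treat ^ as a line anchor the way Boost does.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: the header and body of the first enum found.
struct EnumSections {
    bool found;
    std::string header; // e.g. "enum color "
    std::string body;   // the text between the braces
};

// Hypothetical helper: applies the section pattern to the text.
// Like the original pattern, it requires "};" with nothing in between.
EnumSections split_sections(const std::string& text)
{
    static const std::regex rx(R"((enum[^{]*)\s*(\{)\s*([^}]+)\s*(\};))");
    std::smatch m;
    if (!std::regex_search(text, m, rx))
        return {false, "", ""};
    return {true, m.str(1), m.str(3)};
}
```

For "enum color { red, green };" the sketch finds a header and a body, but for the equally valid "enum color { red, green }" followed by a newline and then ";" it reports no match at all.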
#2
I agree that using regex to parse complicated data is not the best solution. I left a few important conditions out of my question. First of all, I was parsing generated source code containing enums and enum classes, so there were no surprises in the code and its layout was regular. That is why I am parsing regular code with regex.
The answer (the first step is unchanged, the second has been fixed): how to parse enums/enum classes with regex:
- The first step, from the *.cpp file to <data_type> <enum_name> <content>, uses this regex:
(?'data_type'enum|enum\s+class)\s+(?'enum_name'\w+)\s*{\s*(?'content'[^}]+?)\s*}\s*
- The second step, from <content> to <name> <value> <comment>, uses this regex:
^\s*(?'name'\w+)(?:(?:\s*=\s*(?'value'[^,\n/]+))|(?:[^,\s/]))(?:(?:\s$)|(?:\s*,\s*$)|(?:[^/]/{2}\s(?'comment'.*$)))
All tests passed. (The original post showed the matched text highlighted in color.)
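For readers without Boost, the second step can be approximated with the standard library. This is a simplified sketch and not the exact pattern above: std::regex has no named groups, so numbered groups 1 to 3 stand in for name, value and comment; the names Enumerator and parse_enumerator are hypothetical; and the function assumes it is given one line of the enum body at a time, with no '/' or ',' inside a value.

```cpp
#include <regex>
#include <string>

// Hypothetical result type for one enumerator line.
struct Enumerator {
    bool ok;             // false if the line is not an enumerator
    std::string name;
    std::string value;   // empty when there is no "= value" part
    std::string comment; // empty when there is no trailing // comment
};

// Hypothetical helper: parses a single line such as
//   low='l', // C++11 allows the extra comma
Enumerator parse_enumerator(const std::string& line)
{
    // group 1: name, group 2: optional value, group 3: optional comment text
    static const std::regex rx(
        R"(\s*(\w+)(?:\s*=\s*([^,/\s][^,/]*?))?\s*,?\s*(?://\s*(.*))?)");
    std::smatch m;
    if (!std::regex_match(line, m, rx))
        return {false, "", "", ""};
    return {true, m.str(1), m.str(2), m.str(3)};
}
```

The caller is expected to split the <content> group into lines (for example with std::getline) and skip every line for which ok is false.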