Using a C-style array as the backend for STL string operations

Date: 2022-01-14 22:23:47

I'm writing a library to read some specific file formats. The files are read through memory-mapped files (boost::interprocess templates). I have to run some searches on these files with std::regex. To avoid unnecessary copying, I want to use the memory-mapped file directly (as a C-style char array).

After some research I came up with the following two approaches:

  • Using the pubsetbuf method of a streambuf object
  • Using the char* pointer as iterator

However, since support for the first one is optional for the STL vendor, I'm stuck with the second approach. Because the constructor of std::string::iterator is declared private, and the whole iterator implementation seems to be vendor-specific as well, I wrote my own iterator:

template<typename T>
class PointerIterator: std::iterator<std::input_iterator_tag, T> {
public:
    PointerIterator(T* first, std::size_t count): first_(first), last_(first + count) {}
    PointerIterator(T* first, T* last): first_(first), last_(last) {}

    class iterator {
    public:
        iterator(T* p): ptr_(p) {}
        iterator(const iterator& it): ptr_(it.ptr_) {}
        iterator& operator++() {
            ++ptr_;
            return *this;
        }
        iterator operator++(int) {
            iterator temp(*this);
            ++ptr_;
            return temp;
        }
        bool operator==(const iterator& it) { return ptr_ == it.ptr_; }
        bool operator!=(const iterator& it) { return ptr_ != it.ptr_; }
        T& operator*() { return *ptr_; }
    private:
        T* ptr_;
    };    
    iterator begin() {
        return iterator(first_);
    }
    iterator end() {
        return iterator(last_);
    }
private:
    T* first_;
    T* last_;
};

The iterator works, but for use with std::regex_search (or other char-related STL functions) it apparently has to be of the same type as the STL iterators.

Is there a generic way to convert my iterators to the STL ones (portable across STL implementations), or to achieve the whole thing with another approach I haven't mentioned?

Edit:

The code that calls std::regex_search:

std::regex re(...);
boost::interprocess::mapped_region region(...);
char* first = static_cast<char*>(region.get_address());
char* last = first + 5000;

// ...

PointerIterator<char> wrapper(first, last);
std::smatch match;
while (std::regex_search(wrapper.begin(), wrapper.end(), match, re)) {  // Error: No matching function call to 'regex_search'
     // do something
}

Thanks

1 Answer

#1


3  

std::smatch is an alias for std::match_results<std::string::const_iterator>, that is, std::match_results instantiated with string::const_iterator as its iterator type. std::regex_search deduces one iterator type from both the range arguments and the match object, so the begin and end arguments passed to it must also be of type string::const_iterator.
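To make the type relationship concrete, here is a minimal sketch. The two static_assert lines restate what the standard library defines; first_match is a hypothetical helper introduced only for illustration, and it assumes the buffer is addressed through const char* pointers.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <type_traits>

// The match-result aliases differ only in the iterator type they are bound to.
static_assert(std::is_same<std::smatch,
                           std::match_results<std::string::const_iterator>>::value,
              "smatch is bound to std::string::const_iterator");
static_assert(std::is_same<std::cmatch,
                           std::match_results<const char*>>::value,
              "cmatch is bound to const char*");

// Hypothetical helper (not part of the question's code): returns the first
// match of `re` inside [first, last), or an empty string if nothing matches.
// Both pointers are const char*, which is the iterator type std::cmatch expects.
inline std::string first_match(const char* first, const char* last,
                               const std::regex& re)
{
    assert(first <= last);  // the range must be valid
    std::cmatch m;
    if (std::regex_search(first, last, m, re)) {
        return m.str();
    }
    return std::string();
}
```

Because the range is given by two pointers, the buffer does not have to be null-terminated, which is the situation with a mapped file region.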

In C++, raw pointers satisfy the requirements of bidirectional iterators (they are in fact random-access iterators), so there is no need to wrap them in an iterator class. To search a buffer addressed by a char pointer, you can either use std::cmatch or use std::match_results and specify the iterator type explicitly. Because std::regex_search deduces a single iterator type from the range arguments and from the match object, all three have to agree: PointerIterator<char>::iterator will not deduce against std::cmatch, and neither will a plain char*. The two snippets below therefore pass const char* pointers directly. A stand-alone example follows them.

const char* begin = first;  // char* converts implicitly to const char*
const char* end = last;
std::cmatch match; // <<--

while (std::regex_search(begin, end, match, re))
{
    // do something
    begin = match[0].second;  // continue after this match (assumes a non-empty match)
}

...using std::match_results instead.

const char* begin = first;
const char* end = last;
std::match_results<const char*> match; // <<--

while (std::regex_search(begin, end, match, re))
{
    // do something
    begin = match[0].second;  // continue after this match (assumes a non-empty match)
}

Below is a stand-alone example that should make this clearer. It is based on the example on cppreference.com and uses const char* instead of std::string as the search target.

#include <cstring>
#include <iostream>
#include <regex>
#include <string>

int main()
{
    const char *haystack = "Roses are #ff0000";
    const std::size_t size = std::strlen(haystack);

    // Three capture groups, one per colour channel.
    std::regex pattern(
        "#([a-f0-9]{2})"
        "([a-f0-9]{2})"
        "([a-f0-9]{2})");

    std::cmatch results;  // match_results bound to const char*

    std::regex_search(haystack, haystack + size, results, pattern);

    // Index 0 is the whole match; 1..3 are the capture groups.
    for (std::size_t i = 0; i < results.size(); ++i) {
        std::csub_match sub_match = results[i];
        std::string sub_match_str = sub_match.str();
        std::cout << i << ": " << sub_match_str << '\n';
    }
}

This produces the following output.

0: #ff0000
1: ff
2: 00
3: 00
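If every match in the buffer is needed rather than only the first, std::cregex_iterator (the regex iterator bound to const char*) walks a pointer range without any copying. The sketch below assumes the mapped data is reachable as a const char* plus a length; find_all is a hypothetical helper name, and it copies each match into a std::string only to keep the example easy to check.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every match of `re` in [data, data + size).
// The buffer does not need to be null-terminated.
inline std::vector<std::string> find_all(const char* data, std::size_t size,
                                         const std::regex& re)
{
    std::vector<std::string> found;
    std::cregex_iterator it(data, data + size, re);
    const std::cregex_iterator end;  // default-constructed: end of sequence
    for (; it != end; ++it) {
        found.push_back(it->str());  // *it is a std::cmatch
    }
    return found;
}
```

In real code the std::cmatch objects could be used in place: their sub-matches hold pointers into the original buffer, so they stay valid only as long as the mapping does.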
