Over a million developers have joined DZone.

Speeding Up string_view String Split Implementation

DZone's Guide to

Speeding Up string_view String Split Implementation

Learn how you can improve your string split code for even better performance from string_view in this tutorial.

· Performance Zone ·
Free Resource

Maintain Application Performance with real-time monitoring and instrumentation for any application. Learn More!

Thank you for all the comments about the string_view performance! Last week, I got a lot of feedback on how to improve the initial string split code.

Have a look at how can we update the code and get some better performance.


Last week, I showed a few examples of string_view. Obviously, in most of the cases, string_view was much faster than the standard string. A view is a non-owning reference, so there's no need to copy the data - only [ptr, len] is needed to mark the referenced range. Moreover, string_view was added into the Standard Library because of the performance.

Maybe my string_view vs string tests were not needed because the results were too obvious?

As always, it's not that easy. Running benchmarks is hard, and sometimes results might be entirely unexpected.

For example, the last time one string implementation was faster than the string_view counterpart...

Here's the simple benchmark of string split algorithm results from GCC 8.1.

As you can see, the string_view version is slower!

Let's try to understand why.

The Series

This article is part of my series about C++17 Library Utilities. Here's the list of the other topics that I'll cover:

Resources about C++17 STL:

The Case

The algorithm that I tested last week was a string split implementation. As you saw in the image above, the performance of string_view was not perfect.

Here's the code:

split(const std::string& str, const std::string& delims = " ")
    std::vector<std::string> output;
    auto first = std::cbegin(str);

    while (first != std::cend(str))
        const auto second = std::find_first_of(first, std::cend(str), 
                  std::cbegin(delims), std::cend(delims));

        if (first != second)
            output.emplace_back(first, second);

        if (second == std::cend(str))

        first = std::next(second);

    return output;

Now the string_view version:

splitSV(std::string_view strv, std::string_view delims = " ")
    std::vector<std::string_view> output;
    size_t first = 0;

    while (first < strv.size())
        const auto second = strv.find_first_of(delims, first);

        if (first != second)
            output.emplace_back(strv.substr(first, second-first));

        if (second == std::string_view::npos)

        first = second + 1;

    return output;

The readers pointed out that the initial implementations used different combinations of features:

  • string implementation used iterators and std::find_first_of
  • string_view used std::string_view::find_first_of — a member function.

When you change the string_view view version, so that it uses the std::find_first_of then the performance is much better!

For example:

See the benchmark: @QuickBench

One possible reason why the member function is slower than the std::find_first_of is that the member method uses memchr. See this comment by "en-em".

The generic std::find_first_of can be fully inlined by the compiler, while the member function is not. It would be an interesting experiment to figure out exactly why the generic std:: function is faster than a member method. Is memchr that slow (At least in GCC implementation)?

The second improvement comes from JFT who also implemented the algorithms using pointers and not iterators. That also gave a lot of speed increase.

Another idea was that we might preallocate some space at the beginning — so that we have fewer vector reallocations. For example, we can assume each word is 5...6 words and then use .reserve(). While it works nicely, we might end up with a bit larger vector - and later you'd probably want to shrink_to_fit(). And in total, I've noticed that it doesn't bring much performance gain. Some more tests would be needed here.

Final Benchmark

Here are the results from running 6 versions of the benchmark:

  • StringSplit - string with std::string::find_first_of — member function
  • StringSplitStd - string with std::find_first_of with iterators
  • StringSplitPtr - string with std::find_first_of with pointers
  • StringViewSplit - string_view with std::string_view::find_first_of — member function
  • StringViewSplitStd - string_view with std::find_first_of with iterators
  • StringViewSplitPtr - string_view with std::find_first_of with pointers

GCC 8.1:

See at Quick Bench

And Clang 6.0 version:

The benchmark uses a static string, so there's a chance that the compiler could optimize its usage somehow.

Here are the results from MSVC 2017.7. I've used a large string — 547412 characters, loaded from a file.

string length: 547412
test iterations: 100
string split: 731.008 ms
string split std: 586.843 ms
string split ptr: 562.683 ms
string_view split: 406.436 ms
string_view split std: 223.27 ms
string_view split ptr: 208.758 ms

In both experiments, we can see that the version of string_view, with std::find_first_of and pointer implementation is the fastest.


Once again thanks for all the comments under the last article. I hope I gathered all the essential details from the feedback.

Here's the GitHub to the MSVC tests. The results of those quick benchmarks have to taken with care. It's always best to measure the final scenario, rather than sometimes artificial examples. Such benchmarks might give you a general direction towards the final solution (see Don't trust quick-bench results you see on the internet).

Collect, analyze, and visualize performance data from mobile to mainframe with AutoPilot APM. Learn More!

c++ ,visual c++ ,programming ,tutorial ,performance ,strings

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}