
C++ benchmark – std::vector VS std::list

By Baptiste Wicht · Dec. 06, 12
An updated version of this article is available: C++ benchmark – std::vector vs std::list vs std::deque

In C++, the two most used data structures are std::vector and std::list. In this article, we will compare the performance of these two data structures in practice on several different workloads. Throughout the article, "list" refers to the std::list implementation and "vector" refers to the std::vector implementation.

It is generally said that a list should be used when random inserts and removes will be performed (O(1) for a list versus O(n) for a vector). If we look only at the complexity, searching in both data structures should be roughly equivalent, the complexity being O(n). When a random insert or remove is performed on a vector, all the subsequent data needs to be moved, so each of those elements is copied. That is why the size of the data type is an important factor when comparing these two data structures.

However, in practice, there is a huge difference in how the memory caches are used. All the data in a vector is contiguous, whereas std::list allocates memory separately for each element. How does that change the results in practice?

Keep in mind that all the tests are performed on vector and list, even if other data structures could be better suited to the given workload.

In the graphs and in the text, n refers to the number of elements in the collection.

All the tests have been performed on an Intel Core i7 Q820 @ 1.73GHz. The code has been compiled for 64 bits with GCC 4.7.2 with -O2 and -march=native, and with C++11 support (-std=c++11).
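
The full source code of the benchmark is linked at the end of the article. The snippets in the following sections are minimal sketches of each workload rather than the exact code behind the graphs; they assume a simple std::chrono timing helper along these lines:

#include <chrono>
#include <iostream>

// Minimal timing helper (a sketch, not the code used to produce the graphs).
template <typename Functor>
void bench(const char* name, Functor functor) {
    auto start = std::chrono::steady_clock::now();
    functor();
    auto end = std::chrono::steady_clock::now();
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
    std::cout << name << ": " << ms << " ms" << std::endl;
}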

Fill

The first test fills the data structures by adding elements to the back of the container. Two variations of vector are used: vector_pre is a std::vector whose size is passed as a parameter to the constructor, resulting in only one memory allocation.
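
As a rough sketch, assuming a plain 8-byte integer element type, the fill workload looks like this:

#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

// Sketch of the fill workload with 8-byte elements; n is the number of elements.
void fill_vector_pre(std::size_t n) {
    std::vector<std::int64_t> v(n);   // size given to the constructor: a single allocation
    for (std::size_t i = 0; i < n; ++i) {
        v[i] = static_cast<std::int64_t>(i);
    }
}

void fill_vector(std::size_t n) {
    std::vector<std::int64_t> v;      // grows as needed, reallocating occasionally
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<std::int64_t>(i));
    }
}

void fill_list(std::size_t n) {
    std::list<std::int64_t> l;        // one allocation per node
    for (std::size_t i = 0; i < n; ++i) {
        l.push_back(static_cast<std::int64_t>(i));
    }
}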

[Graph: fill with 8-byte elements, vector_pre vs. vector vs. list, 1,000 to 10,000,000 elements, time in milliseconds (0 to 1,200)]
[Graph: fill with 1,024-byte elements, vector_pre vs. vector vs. list, 1,000 to 10,000,000 elements, time in milliseconds (0 to 24,000)]

All data structures are impacted the same way when the data size increases, because there is more memory to allocate. vector_pre is clearly the winner of this test, being an order of magnitude faster than a list and about twice as fast as a vector without pre-allocation. The results are directly linked to the number of allocations that have to be performed, allocation being slow. Whatever the data size, push_back to a vector will always be faster than to a list. This is logical because a vector allocates more memory than necessary and so does not need to allocate memory for each element.

But this test is not very interesting; generally, building the data structure is not critical. What is critical are the operations performed on the data structure, and those are tested in the coming sections.

Random Find

The first operation tested is search. The container is filled with all the numbers in [0, n] and shuffled. Then, each number in [0, n] is searched for in the container with std::find, which performs a simple linear search.
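
A sketch of this workload, assuming 8-byte integer elements (the same loop is run against a std::list; only the container type changes):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Sketch of the random find workload on a vector of 8-byte integers.
std::size_t random_find(std::size_t n) {
    std::vector<std::int64_t> v(n);
    std::iota(v.begin(), v.end(), 0);                    // fill with 0..n-1
    std::shuffle(v.begin(), v.end(), std::mt19937(42));  // random order

    std::size_t found = 0;
    for (std::int64_t i = 0; i < static_cast<std::int64_t>(n); ++i) {
        // std::find performs a simple linear search
        if (std::find(v.begin(), v.end(), i) != v.end()) {
            ++found;
        }
    }
    return found;
}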

[Graph: random find with 8-byte elements, vector vs. list, time in milliseconds]

Yes, vector is represented in the graph; its line is the same as the x-axis! Performing a linear search in a vector is several orders of magnitude faster than in a list.

The only reason is the usage of cache lines. When data is accessed, it is fetched from main memory into the cache. Not only the accessed data is fetched, but a whole cache line. As the elements in a vector are contiguous, when you access an element, the next element is automatically in the cache. As main memory is orders of magnitude slower than the cache, this makes a huge difference. In the list case, the processor spends its whole time waiting for data to be fetched from memory into the cache.

If we increase the size of the data type to 1KB, the results remain the same, only slower:

[Graph: random find with 1KB elements, vector vs. list, time in milliseconds]

Random Insert

In the case of random insert, in theory, the list should be much faster, its insert operation being O(1) versus O(n) for a vector.

The container is filled with all the numbers in [0, n] and shuffled. Then, 1000 random values are inserted at a random position in the container. The random position is found by linear search. In both cases, the complexity of the search is O(n); the only difference comes from the insert that follows the search.
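
A sketch of this workload; the helper below is illustrative and assumes the insertion point is located by searching for a random existing value:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>

// Sketch of the random insert workload. Container is std::vector<std::int64_t>
// or std::list<std::int64_t>; only the container type differs between runs.
template <typename Container>
void random_insert(Container& container, std::size_t n, std::mt19937& generator) {
    std::uniform_int_distribution<std::int64_t> distribution(0, static_cast<std::int64_t>(n) - 1);
    for (int i = 0; i < 1000; ++i) {
        // O(n) linear search for the insertion point in both containers
        auto position = std::find(container.begin(), container.end(), distribution(generator));
        // O(1) insert for the list, O(n) element moves for the vector
        container.insert(position, distribution(generator));
    }
}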

[Graph: random insert with 8-byte elements, vector vs. list, time in milliseconds]

Even though the vector should in theory be slower than the list, it is almost an order of magnitude faster. Again, this is because finding the position in a list is much slower than copying a lot of small elements.

If we increase the size of the elements:

[Graph: random insert with larger elements, vector vs. list, time in milliseconds]

The two lines are getting closer, but vector is still faster.

If we increase it to 1KB:

[Graph: random insert with 1KB elements, vector vs. list, time in milliseconds]

This time, list outperforms vector by an order of magnitude! The performance of random insert in a list is not impacted much by the size of the data type, whereas vector suffers a lot when large sizes are used. We can also see that list doesn't seem to care about the size of the collection. That is because the size of the collection only impacts the search, not the insertion, and as only a few searches are performed, it does not change the results much.

If the iterator were already known (no need for linear search), it would be faster to insert into a list than into a vector.

Random Remove

In theory, random remove is the same case as random insert. Now that we've seen the results for random insert, we could expect the same behavior for random remove.

The container is filled with all the numbers in [0, n] and shuffled. Then, 1000 values are removed from random positions in the container.
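
A sketch of this workload, assuming the element to remove is located by a linear search for a random value, mirroring the insert test:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>

// Sketch of the random remove workload; only the container type changes between runs.
template <typename Container>
void random_remove(Container& container, std::size_t n, std::mt19937& generator) {
    std::uniform_int_distribution<std::int64_t> distribution(0, static_cast<std::int64_t>(n) - 1);
    for (int i = 0; i < 1000; ++i) {
        // O(n) linear search in both containers
        auto position = std::find(container.begin(), container.end(), distribution(generator));
        if (position != container.end()) {
            // O(1) unlink for the list, O(n) element moves for the vector
            container.erase(position);
        }
    }
}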

[Graph: random remove with 8-byte elements, vector vs. list, time in milliseconds]

Again, vector is several times faster and seems to scale better. Again, this is because it is very cheap to copy small elements.

Let's increase the element size directly to 1KB.

[Graph: random remove with 1KB elements, vector vs. list, time in milliseconds]

The two lines have been reversed!

The behavior of random remove is the same as the behavior of random insert, for the same reasons.

Push Front

The next operation we will compare is inserting elements at the front of the collection. This is the worst case for vector, because after each insertion, all the previously inserted elements have to be moved and copied. For a list, it makes no difference compared to pushing to the back.
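
A sketch of this workload; since std::vector has no push_front, an insert at begin() is used for it:

#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

// Sketch of the push front workload with 8-byte elements.
void push_front_vector(std::size_t n) {
    std::vector<std::int64_t> v;
    for (std::size_t i = 0; i < n; ++i) {
        v.insert(v.begin(), static_cast<std::int64_t>(i));  // O(n): shifts every existing element
    }
}

void push_front_list(std::size_t n) {
    std::list<std::int64_t> l;
    for (std::size_t i = 0; i < n; ++i) {
        l.push_front(static_cast<std::int64_t>(i));          // O(1): links a new node at the front
    }
}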

[Graph: push front with 8-byte elements, vector vs. list, time in milliseconds]

The results are crystal clear and as expected: vector is very bad at inserting elements at the front. This needs no further explanation. There is no need to change the data size; it would only make vector even slower.

Sort

The next operation tested is sorting a vector or a list. For a vector, std::sort is used, and for a list, the sort member function is used.
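
A sketch of this workload:

#include <algorithm>
#include <cstdint>
#include <list>
#include <vector>

// Sketch of the sort workload: std::sort requires random-access iterators, so
// it is used for the vector, while std::list provides its own sort() member.
void sort_vector(std::vector<std::int64_t>& v) {
    std::sort(v.begin(), v.end());  // moves/copies elements around
}

void sort_list(std::list<std::int64_t>& l) {
    l.sort();                       // relinks nodes, no element copies
}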

[Graph: sort with 8-byte elements, vector vs. list, time in milliseconds]

We can see that sorting a list is several times slower. This comes from the poor usage of the cache.

If we increase the size of the elements to 1KB:

[Graph: sort with 1KB elements, vector vs. list, time in milliseconds]

This time, the list is faster than the vector. It is not very clear on the graph, but the values for the list are almost the same as in the previous results. That is because std::list::sort() does not perform any copies; only the pointers to the elements are changed. On the other hand, swapping two elements in a vector involves at least three copies, so the cost of sorting increases as the cost of copying increases.

Number Crunching

Finally, we can also test a number crunching operation. Here, random elements are inserted into a container that is kept sorted. That means the position where the element has to be inserted is first found by iterating through the elements, and then the element is inserted there. As we are talking about number crunching, only 8-byte elements are tested.
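
A sketch of this workload; the value range used below is an arbitrary assumption:

#include <cstddef>
#include <cstdint>
#include <random>

// Sketch of the number crunching workload: n random values are inserted while
// keeping the container sorted; only the container type changes between runs.
template <typename Container>
void sorted_insert(Container& container, std::size_t n, std::mt19937& generator) {
    std::uniform_int_distribution<std::int64_t> distribution(0, 1000000);
    for (std::size_t i = 0; i < n; ++i) {
        auto value = distribution(generator);
        // iterate until an element greater than or equal to the new value is found
        auto position = container.begin();
        while (position != container.end() && *position < value) {
            ++position;
        }
        container.insert(position, value);
    }
}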

[Graph: number crunching (sorted insert) with 8-byte elements, vector vs. list, time in milliseconds]

We can clearly see that vector is more than an order of magnitude faster than list, and the gap only widens as the size of the collection increases. This is because traversing the list is much more expensive than copying the elements of the vector.

Conclusion

To conclude, we can state some facts about each data structure:

  • std::vector is insanely faster than std::list at finding an element
  • std::vector always performs faster than std::list with very small data
  • std::vector is always faster at pushing elements to the back than std::list
  • std::list handles large elements very well, especially for sorting or inserting at the front

These are the simple conclusions on the usage of each data structure:

  • For number crunching: use std::vector
  • For linear search: use std::vector
  • For random insert/remove: use std::list (if the data size is very small, less than 64 bytes on my machine, use std::vector)
  • For big data sizes: use std::list (unless it is intended for searching)

If you have the time, the best way to decide in practice is always to benchmark both versions, or even to try other data structures.

I hope you found this article interesting. If you have any comments or an idea for another workload you would like to see tested, don't hesitate to post a comment ;) If you have questions about the results, don't hesitate either.

The source code of the benchmark is available online: https://github.com/wichtounet/articles/blob/master/src/vector_list/bench.cpp


Published at DZone with permission of Baptiste Wicht, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
