Optimize Memory Access to Increase Your Coding Performance

Learn about common memory coding problems as well as best practices and tips to help you increase your code performance.

By Levy Williamson · Sep. 06, 17 · Opinion


While CPUs have become blindingly fast over the past 20 years, memory access times have remained relatively constant. To get the most performance out of their code, serious programmers must not only choose the best algorithms but also design for highly efficient memory access. Although compilers have improved in recent years, they cannot detect and remap most avoidable memory-related coding oversights. For this reason, steering clear of the pitfalls of memory access requires careful analysis.

This article briefly describes some common memory-related coding problems, along with best-practice tips that can increase code performance. Many of the issues come down to understanding virtual memory concepts, such as the difference between stack and heap memory, as well as the more subtle interactions of program execution with the cache hierarchy.

Memory Available to an Executing Thread

Two high-level virtual memory concepts for program execution are the stack and the heap. The stack is the memory allocated to an executing thread; it holds local variables, return addresses for executing functions, and other process information. Each thread has its own stack. The heap, also called the free store, is dynamically allocated memory: a shared pool accessible by all executing threads. Memory on the stack is much faster to access because variables are arranged in a contiguous LIFO layout, while memory on the heap is relatively slower because it is allocated in blocks scattered across RAM, requiring more complex, multithreading-safe management.

Variables stored on the stack are statically sized, such as fixed-size arrays (e.g. A[100]) that cannot grow during thread execution. Once these variables go out of scope, or the running function returns, they are removed from the stack automatically. Variables created dynamically on the heap, with new in C++ or malloc() in C, remain on the heap until they are explicitly deallocated in the program code with free, delete, or delete[].
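
As a brief illustration (a minimal C++ sketch, not production code) of the two lifetimes:

    #include <cstdlib>   // std::malloc, std::free

    void example() {
        int stackArray[100];   // fixed size, lives on this thread's stack;
                               // reclaimed automatically when example() returns
        stackArray[0] = 1;

        int* heapArray = new int[100];                         // C++ heap allocation
        int* cBuffer   = static_cast<int*>(
                             std::malloc(100 * sizeof(int)));  // C-style heap allocation

        // ... use the buffers ...

        delete[] heapArray;    // heap memory must be released manually,
        std::free(cBuffer);    // and with the matching deallocator
    }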

For the purpose of coding, the stack is faster because its access pattern is contiguous and consists of incrementing or decrementing a pointer, making allocation and deallocation trivial. The stack also efficiently reuses bytes that are directly mapped to the cache. A disadvantage of the stack is that variables cannot be resized during execution, and deeply recursive calls can quickly fill the stack, potentially producing a stack overflow. In contrast, variables created at runtime on the heap, which is available to all threads, have relatively slower access due to several factors associated with virtual memory management. Also, because the heap is a global resource, multithreaded synchronization overhead degrades access time. A common technique is to allocate large blocks of heap memory up front and dole them out to arrays as needed, essentially creating a local memory manager, as sketched below.
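
One minimal sketch of such a local memory manager, assuming a simple bump-allocation scheme (the class and names here are invented for illustration, not a standard API):

    #include <cstddef>   // std::size_t
    #include <cstdlib>   // std::malloc, std::free

    // Illustrative bump allocator: grab one large heap block up front,
    // then hand out contiguous slices by bumping an offset. Real pools
    // also handle alignment, exhaustion, and thread safety.
    class Pool {
        char*       base;
        std::size_t offset;
        std::size_t capacity;
    public:
        explicit Pool(std::size_t bytes)
            : base(static_cast<char*>(std::malloc(bytes))),
              offset(0), capacity(bytes) {}
        ~Pool() { std::free(base); }

        void* allocate(std::size_t bytes) {
            if (offset + bytes > capacity) return nullptr;  // pool exhausted
            void* p = base + offset;
            offset += bytes;          // allocation is just a pointer bump
            return p;
        }
        // Individual frees are skipped; the whole block is released at once.
    };

Because allocation is a single pointer bump with no lock, a thread that owns its own pool sidesteps both heap fragmentation and synchronization overhead.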

The next section describes the cache hierarchy in more detail, along with best practices for memory allocation.

The Memory Hierarchy and Performance

At the heart of virtual memory management is the need to coordinate blocks and pages of memory across the memory hierarchy of modern computer architectures. This is difficult because both access times and sizes vary considerably down the hierarchy [2]. The L1 cache can be accessed within 1-5 cycles and has a size of roughly 32-64K, while the L2 cache takes 5-20 cycles per access, with larger sizes of roughly 128-512K. These on-chip caches are considerably faster than main memory (RAM), which requires on the order of 40-100 cycles or more.

Thus, significant performance degradation arises from non-optimal use of the cache hierarchy. Problems associated with reading from and writing to the caches can be summarized as the 3 C's of cache inefficiency:

  • Compulsory misses: the first time a page is loaded produces unavoidable misses.
  • Capacity misses: these occur when all the active data cannot fit into the available cache space, forcing cache pages to be reloaded.
  • Conflict misses: when data maps to the same cache lines, accesses can evict one another, resulting in cache thrashing (see the sketch after this list).
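
As a rough illustration of the conflict-miss case (a minimal sketch; the stride and cache geometry here are assumptions that vary by CPU), striding through memory at a large power-of-two interval can map every access to the same small set of cache lines:

    #include <cstddef>
    #include <vector>

    // Touch one byte every 4096 bytes (an assumed power-of-two stride).
    // On many CPUs these addresses all map to the same cache set, so the
    // lines evict one another even though the cache is mostly empty.
    void thrash(std::vector<char>& buf) {
        const std::size_t stride = 4096;
        for (int pass = 0; pass < 100; ++pass)
            for (std::size_t i = 0; i < buf.size(); i += stride)
                ++buf[i];
    }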

Since inefficient cache utilization results in lower performance, coding practices must be cache-aware to improve program execution efficiency. While modern compilers can optimize code in many ways, they are unable to fix some of the most blatant memory access problems. The good news is that there are common practices that can help avoid such cases.

Practical Coding Tips

A useful mnemonic for remembering the classes of techniques used to improve memory access performance is the 3 R's [1][3], as follows:

  • Rearrange (code, data): Change layout to increase spatial locality.
  • Reduce (size, number of cache lines read): Smaller/smarter formats, compression.
  • Reuse (cache lines): Increase temporal (and spatial) locality.

Rearranging code and data refers to a set of techniques that optimize how data maps to cache pages. Two examples are memory access order and memory alignment. To avoid cache thrashing, arrays should be scanned in increasing address order, and multidimensional arrays should use the rightmost index in the innermost loop (e.g. for i to size; for j to size; A[i][j]), since in C and C++ the rightmost index varies fastest in memory.
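
A minimal C++ sketch of the two traversal orders:

    const int N = 1024;
    static float A[N][N];   // row-major: A[i][0..N-1] are adjacent in memory

    void cacheFriendly() {
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                A[i][j] += 1.0f;   // inner loop drives the rightmost index,
    }                              // so consecutive accesses are adjacent

    void cacheHostile() {
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                A[j][i] += 1.0f;   // inner loop strides N * sizeof(float)
    }                              // bytes, hitting a new cache line each time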

Memory alignment refers to arranging similarly-typed fields together in a structure, with the most restrictively aligned types first. Compilers enforce an alignment requirement for each fundamental type, inserting padding so that every field starts at a multiple of its alignment. For maximum efficiency, and to avoid padding between fields, it is best practice to arrange the members of an object from largest to smallest.
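
For instance (a minimal sketch; exact sizes and padding depend on the compiler and target ABI), reordering members can noticeably shrink a structure:

    #include <cstdint>

    // Poor ordering: padding is inserted so each field starts at a
    // multiple of its alignment (24 bytes on a typical 64-bit ABI).
    struct Loose {
        std::uint8_t  a;   // 1 byte + 7 bytes of padding
        std::uint64_t b;   // 8 bytes
        std::uint8_t  c;   // 1 byte + 7 bytes of tail padding
    };

    // Largest-to-smallest ordering packs the same fields into 16 bytes.
    struct Tight {
        std::uint64_t b;   // 8 bytes
        std::uint8_t  a;   // 1 byte
        std::uint8_t  c;   // 1 byte + 6 bytes of tail padding
    };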

Another code rearrangement is called hot/cold splitting, where locality of reference is obtained by splitting data structures into frequently used ("hot") and infrequently used ("cold") segments and allocating all the hot data together.
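
A minimal sketch of hot/cold splitting (the structure and field names here are purely illustrative):

    #include <string>

    // Cold fields, touched only on rare paths, live behind a pointer...
    struct ParticleCold {
        std::string debugName;
        float       spawnTime;
    };

    // ...so the hot loop streams through small, densely packed records
    // and each cache line holds more useful particles.
    struct Particle {
        float         pos[3];   // read every frame
        float         vel[3];   // read every frame
        ParticleCold* cold;     // rarely dereferenced
    };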

Reduce and reuse refer to techniques that minimize memory operations by exploiting temporal locality, thereby reducing cache fetches. This is accomplished by reusing data while it is still in the cache, for example by merging loops that touch the same data.
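
For example (a small sketch), two passes over the same array can be merged so that each cache line is fetched only once:

    #include <cstddef>

    // Before: two passes; by the second loop, the early elements of
    // data[] have likely been evicted and must be refetched from RAM.
    void twoPasses(float* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) data[i] *= 2.0f;
        for (std::size_t i = 0; i < n; ++i) data[i] += 1.0f;
    }

    // After: one fused pass; each cache line is fetched once and reused.
    void fused(float* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            data[i] = data[i] * 2.0f + 1.0f;
    }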

Conclusions

The art of writing cache-aware code requires careful analysis with memory profiling tools. While there are many more memory optimization techniques, the general techniques described in this short article will go a long way toward increasing the execution performance of many programs.

For more information on code performance improvements, see C++ code performance and C optimization.

References

[1] C. Ericson. Real-Time Collision Detection. Morgan Kaufmann, 2005. ISBN 1558607323. (See the chapter on memory optimization.)

[2] J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach, 2nd ed. Morgan Kaufmann, 1996.

[3] D. N. Truong, F. Bodin, and A. Seznec. "Improving cache behavior of dynamically allocated data structures." Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques (PACT), Paris, 1998, pp. 322-329. doi: 10.1109/PACT.1998.727268.


Opinions expressed by DZone contributors are their own.
