Python Memory Issues: Tips and Tricks

Like most garbage collected languages, memory management in Python is indirect; you have to find and break references to unused data to help the garbage collector keep the heap clean.

By Neville Carvalho · Oct. 04, 16 · Tutorial

Python aims to remove much of the memory-management complexity that languages like C and C++ involve. It does so with automatic garbage collection: objects are reclaimed once nothing refers to them. However, for large, long-running systems developed in Python, dealing with memory management is a fact of life.

In this blog, I will share thoughts on reducing Python memory consumption and on root-causing memory consumption/bloat issues. These come from what we learned while building Datos IO’s RecoverX Distributed Backup and Recovery platform, which is developed primarily in Python (with some components in C++, Java, and bash too).

Python Garbage Collection

The Python interpreter keeps a reference count for each object in use. When an object is no longer referred to, the garbage collector is free to release it and reclaim the allocated memory. For example, if you are using regular Python (that is, CPython, not Jython), this is the point at which the interpreter calls free()/delete() for the object.
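The behaviour described above can be observed with a weak reference, which watches an object without keeping it alive. This is a minimal sketch that assumes CPython and Python 3; the class `Node` and both function names are hypothetical, invented for the illustration. The second function shows a reference cycle, which reference counting alone does not free and which is left to the cyclic collector in the `gc` module.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only for this illustration."""


def refcount_demo():
    obj = Node()
    probe = weakref.ref(obj)      # a weak reference does not keep obj alive
    alias = obj                   # second strong reference to the same object
    del obj
    alive_with_alias = probe() is not None
    del alias                     # the last strong reference is gone
    alive_after_del = probe() is not None
    return alive_with_alias, alive_after_del


def cycle_demo():
    was_enabled = gc.isenabled()
    gc.disable()                  # keep the cyclic collector from running early
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a   # a and b now refer to each other
        probe = weakref.ref(a)
        del a, b                  # counts never reach zero because of the cycle
        alive_before_collect = probe() is not None
        gc.collect()              # the cyclic collector finds and frees the pair
        alive_after_collect = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before_collect, alive_after_collect


if __name__ == "__main__":
    print(refcount_demo())
    print(cycle_demo())
```

On other Python implementations the timing of reclamation can differ, so treat the exact moment an object disappears as a CPython detail.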

Useful Tools

Resource

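The ‘resource’ module reports the peak resident memory consumption of your program so far. Note that ru_maxrss is a high-water mark, not the current value, and on Linux it is reported in kilobytes.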

[Resident memory is the actual RAM your program is using]

>>> import resource
>>> resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
4332
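Because the unit of `ru_maxrss` differs between platforms (kilobytes on Linux, bytes on macOS), a small wrapper can make readings comparable. The following is a sketch only: the helper name `peak_rss_mb` is hypothetical, and the `resource` module is available on Unix-like systems only.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes, macOS reports bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    before = peak_rss_mb()
    data = [0] * 5_000_000        # allocate a large list to move the high-water mark
    after = peak_rss_mb()
    print("peak RSS before: {0:.1f} MB, after: {1:.1f} MB".format(before, after))
```

Since the value is a maximum, it never goes down, even after `data` is deleted.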

objgraph

‘objgraph’ is a useful module that shows you the objects that are currently in memory

[objgraph documentation and examples are available at: https://mg.pov.lt/objgraph/]

Let's look at this simple usage of objgraph:

import objgraph
import random

class Foo(object):
   def __init__(self):
       self.val = None

   def __str__(self):
       return "foo - val: {0}".format(self.val)

def f():
   l = []
   # Create 10000 objects so that Foo stands out in the type counts
   for i in range(10000):
      foo = Foo()
      #print("id of foo: {0}".format(id(foo)))
      #print("foo is: {0}".format(foo))
      l.append(foo)

   return l

def main():
   d = {}
   l = f()
   d['k'] = l
   print("list l has {0} objects of type Foo()".format(len(l)))

   objgraph.show_most_common_types()
   objgraph.show_backrefs(random.choice(objgraph.by_type('Foo')),
                          filename="foo_refs.png")

   objgraph.show_refs(d, filename='sample-graph.png')

if __name__ == "__main__":
   main()

$ python test1.py

list l has 10000 objects of type Foo()
dict                       10423
Foo                        10000   <-- Guilty as charged!
tuple                      3349
wrapper_descriptor         945
function                   860
builtin_function_or_method 616
method_descriptor          338
weakref                    199
member_descriptor          161
getset_descriptor          107

Notice that we are also holding 10,423 instances of ‘dict’ in memory. We’ll come back to that in the Slots section.

Visualizing the objgraph Dependencies

Objgraph has a nice capability that can show you why these Foo() objects are being held in memory, that is, who is holding references to them (in this case, it is the list ‘l’).

Objgraph renders its graphs with graphviz. On RedHat/CentOS, you can install graphviz using: sudo yum install graphviz*

To see the objects the dictionary, d, refers to:

objgraph.show_refs(d, filename='sample-graph.png')

[Image: sample-graph.png, the objects that d refers to]


What is more interesting from a memory usage perspective is the question: why is my object not getting freed? That is, who is holding a reference to it?

This snippet shows how objgraph provides this info:

objgraph.show_backrefs(random.choice(objgraph.by_type('Foo')),
                       filename="foo_refs.png")

[Image: foo_refs.png, the chain of back-references to one Foo object]

In this case, we looked at a random object of type Foo. We know that this particular object is being held in memory because of its referrers back up the chain being in scope.

Sometimes, this can help when figuring out why objects that we think are no longer used (referred to) are not being freed by Python’s garbage collector.
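When graphviz is not available, the same question (who refers to this object?) can be asked in text form with the standard library. This sketch uses `gc.get_referrers` rather than objgraph; the helper name `find_container_referrers` and the variable `registry` are hypothetical, and the result only covers containers that the garbage collector tracks.

```python
import gc


class Foo:
    def __init__(self):
        self.val = None


def find_container_referrers(target):
    """Return the list, dict, set and tuple objects that directly refer to target."""
    gc.collect()  # drop unreachable garbage first so it does not show up
    containers = (list, dict, set, tuple)
    return [ref for ref in gc.get_referrers(target) if isinstance(ref, containers)]


# A long-lived list that keeps every Foo alive, like the list 'l' in the article.
registry = [Foo() for _ in range(3)]

if __name__ == "__main__":
    for holder in find_container_referrers(registry[0]):
        print("held by a {0} with {1} items".format(type(holder).__name__, len(holder)))
```

This only walks one level up the chain; objgraph's show_backrefs follows it further and draws the result.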

The challenge is sometimes knowing that Foo() is the class that is holding a lot of memory. We can use heapy to answer that part.

Heapy

Heapy is a useful tool for debugging memory consumption/leaks. See http://guppy-pe.sourceforge.net/. I have generally used heapy along with objgraph. I typically use heapy to watch the allocation growth of different objects over time. Heapy can show which objects are holding the most memory, and objgraph can help in finding the backref chain (as in the objgraph section above) to understand exactly why they cannot be freed.

My typical usage of heapy is calling a function like the one below at different spots in the code, to try to find where memory usage is spiking and to gather a clue about which objects might be causing the issue:

import resource

import objgraph
from guppy import hpy

def dump_heap(h, i):
   """
   @param h: The heap (from hp = hpy(), h = hp.heap())
   @param i: Identifier str
   """

   print("Dumping stats at: {0}".format(i))
   # ru_maxrss is the peak resident size, in KB on Linux
   print('Memory usage: {0} (MB)'.format(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss/1024))
   print("Most common types:")
   objgraph.show_most_common_types()

   print("heap is:")
   print("{0}".format(h))

   # Partition the heap by the kinds of objects that refer to each entry
   by_refs = h.byrcs
   print("by references: {0}".format(by_refs))

   print("More stats for top element..")
   print("By clodo (class or dict owner): {0}".format(by_refs[0].byclodo))
   print("By size: {0}".format(by_refs[0].bysize))
   print("By id: {0}".format(by_refs[0].byid))
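The same idea of comparing memory at different spots in the code can also be tried with the standard library's `tracemalloc` module. This is not heapy and not what the article's system used; it is a sketch of an alternative under the assumption of Python 3, with a hypothetical helper `top_growth` and a made-up workload named `payload`.

```python
import tracemalloc


def top_growth(before, after, limit=3):
    """Return the source lines whose allocated size changed the most."""
    stats = after.compare_to(before, "lineno")  # sorted by largest change first
    return stats[:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()
payload = [str(i) * 10 for i in range(50000)]   # hypothetical allocation spike
after = tracemalloc.take_snapshot()
tracemalloc.stop()

if __name__ == "__main__":
    for stat in top_growth(before, after):
        print(stat)
```

Each printed entry names a file and line number together with the size and count difference, which points at where the growth happened rather than at which type is growing.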
 

Memory Reduction Tips

In this section I will describe a couple of tips that I’ve found to reduce memory consumption.

Slots

Use slots for objects that you have a lot of. Slotting tells the Python interpreter that a dynamic dict is not needed for your object (in the objgraph example above, we saw that each Foo() object had a dict inside it).

Defining your class with slots tells the Python interpreter that the attributes/members of your class are fixed, and this can lead to significant memory savings!

Consider the following code:

import resource

class Foo(object):
   #__slots__ = ('val1', 'val2', 'val3', 'val4', 'val5', 'val6')

   def __init__(self, val):
      self.val1 = val+1
      self.val2 = val+2
      self.val3 = val+3
      self.val4 = val+4
      self.val5 = val+5
      self.val6 = val+6

def f(count):
   l = []

   for i in range(count):
      foo = Foo(i)
      l.append(foo)

   return l

def main():
   count = 10000
   l = f(count)

   # Peak resident size of the whole process, in KB on Linux
   mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
   print("Memory usage is: {0} KB".format(mem))
   print("Size per foo obj: {0} KB".format(float(mem)/count))

if __name__ == "__main__":
   main()


[vagrant@datosdev temp]$ python test2.py
Memory usage is: 16672 KB
Size per foo obj: 1.6672 KB

Now un-comment this line: #__slots__ = ('val1', 'val2', 'val3', 'val4', 'val5', 'val6')

[vagrant@datosdev temp]$ python test2.py
Memory usage is: 6576 KB
Size per foo obj: 0.6576 KB

In this case, that is a 60% memory reduction!
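What slots change structurally can be checked directly, without measuring memory. In this sketch the classes `Plain` and `Slotted` and the helper `accepts_new_attribute` are hypothetical names made up for the illustration.

```python
class Plain(object):
    def __init__(self, val):
        self.val1 = val + 1
        self.val2 = val + 2


class Slotted(object):
    __slots__ = ('val1', 'val2')   # fixed attribute set, no per-instance dict

    def __init__(self, val):
        self.val1 = val + 1
        self.val2 = val + 2


def accepts_new_attribute(obj):
    """Return True if an attribute outside the declared set can be added."""
    try:
        obj.extra = 1
        return True
    except AttributeError:
        return False


if __name__ == "__main__":
    for obj in (Plain(1), Slotted(1)):
        print("{0}: has __dict__={1}, accepts new attribute={2}".format(
            type(obj).__name__, hasattr(obj, '__dict__'), accepts_new_attribute(obj)))
```

The trade-off is visible here: a slotted instance has no `__dict__`, so code that adds attributes on the fly stops working for that class.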

Read more about Slotting here: http://www.elfsternberg.com/2009/07/06/python-what-the-hell-is-a-slot/

Interning: Beware of Interned Strings!

Python caches some immutable objects, such as strings (which strings are cached is implementation dependent). This is called interning.

 >>> t = "abcdefghijklmnopqrstuvwxyz"
 >>> p = "abcdefghijklmnopqrstuvwxyz"
 >>> id(t)
 139863272322872

 >>> id(p)
 139863272322872

The Python interpreter does this to save memory and to speed up comparison. For example, if two strings have the same id (reference), they are the same object and therefore equal.

However, if your program creates a lot of small strings, you could be bloating memory.
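Interning can also be requested explicitly with `sys.intern`. The sketch below assumes Python 3 (where the function lives in `sys`); the helper `build_key` is a hypothetical name. It shows two equal strings built at runtime being mapped to a single shared object.

```python
import sys


def build_key(parts):
    """Build a string at runtime; such strings are not interned automatically."""
    return "_".join(parts)


a = build_key(["node", "42"])
b = build_key(["node", "42"])

# sys.intern returns the one canonical copy for strings with equal contents.
ia = sys.intern(a)
ib = sys.intern(b)

if __name__ == "__main__":
    print("equal contents:", a == b)
    print("same object before interning:", a is b)
    print("same object after interning:", ia is ib)
```

Whether `a is b` holds before interning is an implementation detail, which is why the sketch prints it instead of relying on it.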

Use Format Instead of ‘+’ for Generating Strings

When building strings, prefer format over concatenation.

That is,

 st = "{0}_{1}_{2}_{3}".format(a,b,c,d)   # Better for memory. Does not create temp strs
 st2 = a + '_' + b + '_' + c + '_' + d # Creates a temp str at each concatenation

In our system, we found significant memory savings when we changed certain string construction from concat to format.
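For comparison, here is the same string built three ways. The `str.join` variant is not mentioned in the article and is included only as another way to build the result in one step; the function names and sample values are hypothetical.

```python
def key_with_format(a, b, c, d):
    # Single formatting call that produces the final string directly.
    return "{0}_{1}_{2}_{3}".format(a, b, c, d)


def key_with_join(a, b, c, d):
    # Single join call over the parts.
    return "_".join((a, b, c, d))


def key_with_concat(a, b, c, d):
    # Each '+' produces an intermediate string before the final one exists.
    return a + '_' + b + '_' + c + '_' + d


if __name__ == "__main__":
    parts = ("cluster", "node", "disk", "0")
    print(key_with_format(*parts))
    print(key_with_join(*parts))
    print(key_with_concat(*parts))
```

All three return the same value, so switching between them is a local change that can be measured with the tools described earlier.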

System Level Concerns

The tips I’ve discussed above should help you to hunt down memory consumption issues in your system. However, over time, the memory consumed by your Python processes will continue to grow. This seems to be a combination of:

  • How the C memory allocator in Python works. This is essentially memory fragmentation: the allocator cannot call ‘free’ on a memory chunk unless the entire chunk is unused, and chunk usage is usually not perfectly aligned to the objects that you are creating and using.
  • Memory growth that is also related to the “Interning” topic discussed above.

In my experience, it is only possible to reduce the rate at which memory consumption grows in Python. What we have done at Datos IO is to implement a worker model for certain memory-hungry processes. For a sizeable unit of work, we spin up a worker process. When the worker process’s work is done, it is killed, which is the only guaranteed way to return all the memory back to the OS :). Good memory housekeeping makes it possible to increase the size of the work allocated, that is, to let the worker run for a longer time.
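A worker model of this kind can be sketched with the standard `multiprocessing` module. This is not the Datos IO implementation, which the article does not show; the function names are hypothetical, and the sketch assumes a Unix system where the "fork" start method is available. Setting `maxtasksperchild=1` makes the pool replace its worker process after every unit of work, so whatever memory the unit used goes back to the OS when that process exits.

```python
import multiprocessing as mp
import os


def unit_of_work(n):
    """Hypothetical memory-hungry task; returns the worker's pid and a result."""
    data = [i * i for i in range(n)]
    return os.getpid(), sum(data)


def run_units(sizes):
    # Assumption: Unix "fork" start method, chosen to keep the sketch short.
    ctx = mp.get_context("fork")
    # One worker at a time, retired after a single task.
    with ctx.Pool(processes=1, maxtasksperchild=1) as pool:
        return pool.map(unit_of_work, sizes, chunksize=1)


if __name__ == "__main__":
    for pid, total in run_units([10, 100]):
        print("worker pid {0} produced {1}".format(pid, total))
```

With other start methods the functions must be importable by the child process, and the call to `run_units` has to stay behind the `if __name__ == "__main__":` guard.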

Summary

I’ve described some tips to reduce memory consumption in Python processes. When looking for memory leaks in your code, one approach is to use heapy to find out which objects are holding the most memory, and then possibly to use objgraph to find out why these are not getting freed (if you think they should be).

Overall, I feel that hunting for memory problems in Python is an art. Over time, you will develop better intuitions about where memory bloat and leaks are coming from in your system, and will be able to solve them faster! Happy Hunting!

Related links

Other Python memory links I’ve found useful:

  1. http://chase-seibert.github.io/blog/2013/08/03/diagnosing-memory-leaks-python.html
  2. http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm