DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
The Latest "Software Integration: The Intersection of APIs, Microservices, and Cloud-Based Systems" Trend Report
Get the report
  1. DZone
  2. Coding
  3. Languages
  4. Hacking into Python Objects Internals

Hacking into Python Objects Internals

Chris Smith user avatar by
Chris Smith
·
Feb. 10, 12 · Interview
Like (0)
Save
Tweet
Share
5.51K Views

Join the DZone community and get the full member experience.

Join For Free
You know, Python represents every object using the low-level C API PyObject (or PyVarObject for variable-size objects) structure, so, concretely, you can cast any Python object pointer to this type; this inheritance is built by hand, every new object must have a leading macro called PyObject_HEAD which defines the PyObject header for the object. The PyObject structure is declared in Include/object.h as:


Originally Authored by Christian S. Perone


typedef struct _object {
    PyObject_HEAD
} PyObject;

and the PyObject_HEAD macro is defined as:

#define PyObject_HEAD                   \
    _PyObject_HEAD_EXTRA                \
    Py_ssize_t ob_refcnt;               \
    struct _typeobject *ob_type;

… with two fields (forget the _PyObject_HEAD_EXTRA, it’s only used for a tracing debug feature) called ob_refcnt and ob_type, representing the reference counting for the object and the type of the object. I know you can use sys.getrefcount to get the reference counting of an object, but hacking the object memory using ctypes is by far more powerful, since you can get the contents of any field of the object (in cases where you don’t have a native API for that), I’ll show more examples later, but lets focus on the reference counting field of the object.

Getting the reference count (ob_refcnt)

So, in Python, we have the built-in function id(), this function returns the identity of the object, but, looking at its definition on CPython implementation, you’ll notice that id() returns the memory address of the object, see the source in Python/bltinmodule.c:

static PyObject *
builtin_id(PyObject *self, PyObject *v)
{
    return PyLong_FromVoidPtr(v);
}


… the function PyLong_FromVoidPtr returns a Python long object from a void pointer. So, in CPython, this value is the address of the object in the memory as shown below:

>>> value = 666
>>> hex(id(value))
'0x8998e50' # memory address of the 'value' object


Now that we have the memory address of the object, we can use the Python ctypes module to get the reference counting by accessing the attribute ob_refcnt, here is the code needed to do that:

>>> value = 666
>>> value_address = id(value)
>>>
>>> ob_refcnt = ctypes.c_long.from_address(value_address)
>>> ob_refcnt
c_long(1)


What I’m doing here is getting the integer value from the ob_refcnt attribute of the PyObject in memory.  Let’s add a new reference for the object ‘value’ we created, and then check the reference count again:

>>> value_ref = value
>>> id(value_ref) == id(value)
True
>>> ob_refcnt
c_long(2)


Note that the reference counting was increased by 1 due to the new reference variable called ‘value_ref’.

Interned strings state (ob_sstate)

Now, getting the reference count wasn’t even funny, we already had the sys.getrefcount API for that, but what about the interned state of the strings ? In order to avoid the creation of different allocations for the same string (and to speed comparisons), Python uses a dictionary that works like a “cache” for strings, this dictionary is defined in Objects/stringobject.c:

/* This dictionary holds all interned strings.  Note that references to
strings in this dictionary are *not* counted in the string's ob_refcnt.
When the interned string reaches a refcnt of 0 the string deallocation
function will delete the reference from this dictionary.

Another way to look at this is that to say that the actual reference
count of a string is:  s->ob_refcnt + (s->ob_sstate?2:0)
*/
static PyObject *interned;


I also copied here the comment about the dictionary, because is interesting to note that the strings in the dictionary aren’t counted in the string’s ob_refcnt.

So, the interned state of a string object is hold in the attribute ob_sstate of the string object, let’s see the definition of the Python string object:

typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];

    /* Invariants:
    *     ob_sval contains space for 'ob_size+1' elements.
    *     ob_sval[ob_size] == 0.
    *     ob_shash is the hash of the string or -1 if not computed yet.
    *     ob_sstate != 0 iff the string object is in stringobject.c's
    *       'interned' dictionary; in this case the two references
    *       from 'interned' to this object are *not counted* in ob_refcnt.
    */
} PyStringObject;


As you can note, strings objects inherit from the PyObject_VAR_HEAD macro, which defines another header attribute, let’s see the definition to get the complete idea of the structure:

#define PyObject_VAR_HEAD               \
    PyObject_HEAD                       \
    Py_ssize_t ob_size; /* Number of items in variable part */


The PyObject_VAR_HEAD macro adds another field called ob_size, which is the number of items on the variable part of the Python object (i.e. the number of items on a list object). So, before getting to the ob_sstate field, we need to shift our offset to skip the fields ob_refcnt (long), ob_type (void*) (from PyObject_HEAD), the field ob_size (long) (from PyObject_VAR_HEAD) and the field ob_shash (long) from the PyStringObject. Concretely, we need to skip this offset (3 fields with size long and one field with size void*) of bytes:

>>> ob_sstate_offset = ctypes.sizeof(ctypes.c_long)*3 + ctypes.sizeof(ctypes.c_voidp)
>>> ob_sstate_offset
16


Now, let’s prepare two cases, one that we know that isn’t interned and another that is surely interned, then we’ll force the interned state of the other non-interned string to check the result of the ob_sstate attribute:

>>> a = "lero"
>>> b = "".join(["l", "e", "r", "o"])
>>> ctypes.c_long.from_address(id(a) + ob_sstate_offset)
c_long(1)
>>> ctypes.c_long.from_address(id(b) + ob_sstate_offset)
c_long(0)
>>> ctypes.c_long.from_address(id(intern(b)) + ob_sstate_offset)
c_long(1)


Note that the interned state for the object “a” is 1 and for the object “b” is 0. After forcing the interned state of the variable “b”, we can see that the field ob_sstate has changed to 1.

Changing internal states (evil mode)

Now, let’s suppose we want to change some internal state of a Python object through the interpreter. Let’s try to change the value of an int object. Int objects are defined in Include/intobject.h:

typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;


As you can see, the internal value of an int is stored in the field ob_ival, to change it, we just need to skip the ob_refcnt (long) and the ob_type (void*) from the PyObject_HEAD:

>>> value = 666
>>> ob_ival_offset = ctypes.sizeof(ctypes.c_long) + ctypes.sizeof(ctypes.c_voidp)
>>> ob_ival = ctypes.c_int.from_address(id(value)+ob_ival_offset)
>>> ob_ival
c_long(666)
>>> ob_ival.value = 8
>>> value
8


And that is it, we have changed the value of the int value directly in the memory.

I hope you liked it, you can play with lots of other Python objects like lists and dicts, note that this method is just intended to show how the Python objects are structured in the memory and how you can change them using the native API, but obviously, you’re not supposed to use this to change the value of ints lol.


Update 11/29/11
: you’re not supposed to do such things on your production code or something like that, in this post I’m doing lazy assumptions about arch details like sizes of primitives, etc. Be warned.


Source: http://pyevolve.sourceforge.net/wordpress/?p=2171

Object (computer science) Python (language)

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Strategies for Kubernetes Cluster Administrators: Understanding Pod Scheduling
  • 5 Steps for Getting Started in Deep Learning
  • OpenVPN With Radius and Multi-Factor Authentication
  • What Are the Different Types of API Testing?

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: