Metaprogramming
This article is taken from IronPython in Action from Manning Publications. This segment shows how to use the metaprogramming capabilities of Python in the form of metaclasses, which can be used to achieve tasks that are much harder, or even impossible, with other approaches. For the table of contents, the Author Forum, and other resources, go to http://www.manning.com/foord/.
It is being reproduced here by permission from Manning Publications. Manning ebooks are sold exclusively through Manning. Visit the book's page for more information.
Softbound print: March 2009 | 496 pages
ISBN: 1933988339
Metaprogramming is the programming language or runtime assisting with the writing or modification of your program. The classic example of this is runtime code generation. In Python and IronPython this is supported through the exec statement and built‐in compile / eval functions. Python source code is text, so generating code using string manipulation and then executing it is relatively easy.
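As a minimal sketch of generating code as text and then running it, the snippet below builds the source for a small function with string formatting, compiles it, and executes it in its own namespace. The names 'make_adder_source' and 'add_five' and the '<generated>' filename are illustrative only. The 'exec(code, namespace)' form is used because it is accepted both as the Python 2 exec statement and as the Python 3 function.

```python
def make_adder_source(name, amount):
    # Build Python source code as ordinary text.
    return "def %s(value):\n    return value + %d\n" % (name, amount)

source = make_adder_source('add_five', 5)

# compile() turns the text into a code object; 'exec' mode accepts statements.
code = compile(source, '<generated>', 'exec')

# Run the code object in a fresh namespace rather than in the current module.
namespace = {}
exec(code, namespace)
add_five = namespace['add_five']

# eval() handles a single expression and returns its value.
result = eval('add_five(10) * 2', namespace)
```

Keeping the generated names in a separate dictionary makes it easy to see exactly what the generated code defined.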
However, code generation has to be relatively deterministic (your code that generates the strings is going to be following a set of rules that you determine), meaning that it is usually easier just to provide objects rather than go through the intermediate step of generating and executing code. An exception is when you generate code from user input, perhaps by translating a domain specific language into Python. This is the approach taken by the Resolver spreadsheet(1), which translates a 'Python‐like' formula language into Python expressions.
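To make the idea of translating a small formula language into Python concrete, here is a toy sketch. It is not how Resolver's translator works; the formula syntax (cell references such as A1 combined with arithmetic operators) and the names 'translate', 'evaluate' and 'cells' are all assumptions made for illustration.

```python
import re

# Hypothetical formula syntax: cell references such as A1 or B12, combined
# with ordinary arithmetic operators.
CELL_REFERENCE = re.compile(r'\b([A-Z]+[0-9]+)\b')

def translate(formula):
    # Rewrite each cell reference as a dictionary lookup in Python source.
    return CELL_REFERENCE.sub(r"cells['\1']", formula)

def evaluate(formula, cells):
    expression = translate(formula)
    # An empty __builtins__ keeps the formula away from built-in functions.
    # This is a convenience for the sketch, not a security boundary.
    return eval(expression, {'__builtins__': {}}, {'cells': cells})

cells = {'A1': 2, 'B2': 5}
python_source = translate('A1 + B2 * 3')
value = evaluate('A1 + B2 * 3', cells)
```

Here the generated code depends on input that is not known in advance, which is the case where generating and evaluating source is most worthwhile.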
Python has further support for metaprogramming through something called metaclasses. These allow you to customize class creation; that is, to modify classes or perform actions at the point at which they are defined. Metaclasses have a reputation for being 'deep black magic':
"Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don't." ‐‐ Python Guru Tim Peters
They are seen as 'magic' because they can modify code away from the point at which it actually appears in your source. In fact, the basic principles are simple to grasp and no good book on Python would be complete without an introduction to them.
8.3.1 Introduction to Metaclasses
In Python everything is an object. Functions and classes are all first class objects that can be created at runtime and passed around your code. Every object has a type. For most objects their type is their class, so what is the type of a class? The answer is that classes are instances of their metaclass.
Just as objects are created by calling their class with the correct parameters, classes are created by calling their metaclass with certain parameters. The default metaclass (the type of classes) is 'type'. This leads to the wonderful expression:
type(type) is type
'type' is itself a class, so its type is 'type'!
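These relationships can be checked interactively. The short sketch below uses hypothetical names ('Point', 'Dynamic', 'describe') and also shows that calling 'type' with a name, a tuple of bases and a dictionary of attributes creates a class, just as a class statement would.

```python
class Point(object):
    pass

point = Point()

# An instance's type is its class...
instance_type = type(point)
# ...and a class's type is its metaclass, which defaults to 'type'.
class_type = type(Point)
metaclass_type = type(type)

# Calling the metaclass with a name, a tuple of bases and a dictionary of
# attributes creates a class at runtime.
def describe(self):
    return 'a dynamically created class'

Dynamic = type('Dynamic', (object,), {'describe': describe})
description = Dynamic().describe()
```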
So what does this have to do with metaprogramming? Python is an interpreted language and classes are created at runtime, so by providing a custom metaclass we can control what happens when a class is created, including modifying the class.
As usual the easiest way to explain this is to show it in action. Listing 8.8 shows the simplest possible metaclass.
Listing 8.8 The Simplest Example of a Metaclass
class PointlessMetaclass(type):  # 1
    def __new__(meta, name, bases, classDict):  # 2
        return type.__new__(meta, name, bases, classDict)  # 3

class SomeClass(object):
    __metaclass__ = PointlessMetaclass  # 4
(annotation) <#1 Metaclasses inherit from 'type'>
(annotation) <#2 Arguments received by the metaclass>
(annotation) <#3 This instantiates the class>
(annotation) <#4 Set the metaclass on the class>
This metaclass doesn't actually do anything, but it illustrates the basics. You set the metaclass on a class by assigning the '__metaclass__' attribute inside the class definition. Subclasses automatically inherit the metaclass of their superclass(2). When the class is created, the metaclass is called with a set of arguments. These are the class name, a tuple of base classes and a dictionary of all the attributes (including methods) defined in the class. To customize class creation with a metaclass you need to inherit from 'type' and override the '__new__' method.
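The arguments and the inheritance behavior can be observed with a metaclass that records what it is called with. This is a sketch with hypothetical names ('RecordingMetaclass', 'created', 'Base', 'Child'). To keep it runnable on both Python 2 and Python 3, the base class is created by calling the metaclass directly instead of assigning '__metaclass__'.

```python
created = []

class RecordingMetaclass(type):
    def __new__(meta, name, bases, classDict):
        # Record what the metaclass was called with, ignoring the
        # double-underscore entries the interpreter adds itself.
        attributes = sorted(key for key in classDict
                            if not key.startswith('__'))
        created.append((name, bases, attributes))
        return type.__new__(meta, name, bases, classDict)

# Calling the metaclass with the class name, bases and attribute dictionary
# has the same effect as a class statement that names the metaclass.
Base = RecordingMetaclass('Base', (object,), {'answer': 42})

class Child(Base):
    # No metaclass is named here; it is inherited from Base.
    def method(self):
        return self.answer
```

After this runs, 'created' holds one entry for 'Base' and one for 'Child', showing that the metaclass was invoked for the subclass as well.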
The '__new__' Constructor
So far, we have talked about '__init__' as the constructor method of objects. '__init__' receives self as the first argument, the freshly created instance, and it initializes it.
The instance that '__init__' receives is actually created by '__new__', so it is technically more correct to call '__new__' the constructor and '__init__' an initializer. '__new__' receives the class as the first argument, plus any additional arguments that were used in the construction (it receives the same arguments as '__init__').
'__new__' is needed for creating immutable values; setting the value in '__init__' would mean that values could be modified simply by calling '__init__' on an instance again. You can subclass the builtin types in Python, but to customize immutable values, like 'str' and 'int', you need to override '__new__' rather than '__init__'.
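As a small illustration of customizing an immutable type, the sketch below subclasses 'str' and fixes the value in '__new__'. 'UpperString' is a hypothetical name chosen for the example.

```python
class UpperString(str):
    def __new__(cls, value):
        # The value has to be fixed here: by the time __init__ runs the
        # string already exists and cannot be changed.
        return str.__new__(cls, value.upper())

shout = UpperString('hello')
```

The result is a real string, so it can be used anywhere a 'str' is expected, while still being an instance of the subclass.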
You can try this out by defining methods and attributes on 'SomeClass', and putting print statements in the body of 'PointlessMetaclass'. You will see that methods appear as functions in the 'classDict', keyed by their name. This means that inside the metaclass you can modify this dictionary (plus the 'name' and the 'bases' tuple if you want) to change how the class is created. But what is this actually useful for?
8.3.2 Uses of Metaclasses
Despite their reputation for being 'deep magic', sometimes a metaclass can achieve something that is difficult or impossible to do via other means. They are invoked at class creation time, so they can be used to perform operations that would otherwise require a manual step. These include(3):
- Register classes as they are created. This is often done by database ORMs (Object Relational Mappers), as the classes you create relate to the shape of the database table you will interact with. This can also be useful for autoregistering plugin classes.
- Enable new coding techniques. For example the 'Elixir'(4) ORM framework uses metaclasses to enable a declarative way of defining database schemas.
- Provide interface registration, including auto‐discovery of features and adaptation.
- Class verification: preventing subclassing or verifying code quality such as checking that all methods have docstrings, or that classes meet a particular standard.
- Decorate all the methods in a class. This can be useful for logging, tracing and profiling purposes.
- Mixing in appropriate methods without having to use inheritance. This can be one way of avoiding multiple inheritance. You can also load in methods from non‐code definitions, for example by loading XML to create classes.
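The first use in the list, registering classes as they are created, can be sketched in a few lines. The names here ('PluginMetaclass', 'registry', 'Plugin' and the two exporter classes) are hypothetical, and the base class is created by calling the metaclass directly so that the sketch runs on both Python 2 and Python 3.

```python
registry = {}

class PluginMetaclass(type):
    def __new__(meta, name, bases, classDict):
        cls = type.__new__(meta, name, bases, classDict)
        # Register only subclasses: the base class itself has no parent
        # that was created by this metaclass.
        if any(isinstance(base, PluginMetaclass) for base in bases):
            registry[name] = cls
        return cls

Plugin = PluginMetaclass('Plugin', (object,), {})

class CsvExporter(Plugin):
    pass

class XmlExporter(Plugin):
    pass
```

Simply defining a subclass is enough to make it discoverable through 'registry'; no separate registration call is needed.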
We can show a practical use of metaclasses with a profiling metaclass. This is an example of the fifth use of metaclasses listed above, wrapping every method in a class with a decorator function.
8.3.3 A Profiling Metaclass
One of the use cases above is wrapping all methods with a decorator. We can use this for profiling by recording method calls and how long they take. This is useful if you are looking to optimize the performance of your application. With IronPython you always have the option of moving parts of your code into C# to improve performance, but before you consider this it is important to profile so that you know exactly which parts of your code are the bottlenecks (the results are often not what you would expect). At Resolver we have been through this process many times, and have always managed to improve performance by optimizing our Python code; we haven't had to drop down into C# to improve speed so far.
For profiling IronPython code you can use the .NET 'DateTime'(5) class.
from System import DateTime
start = DateTime.Now
someFunction()
timeTaken = (DateTime.Now - start).TotalMilliseconds
There is a drawback to the code above. 'DateTime' has a granularity of about 15 milliseconds. For timing individual calls to fast-running code, this can be far too coarse. An alternative is to use a high performance timer class from 'System.Diagnostics', the 'Stopwatch'(6) class. Listing 8.9 is the code to time a function call using a 'Stopwatch'(7).
Listing 8.9 Timing a Function Call with the Stopwatch Class
from System.Diagnostics import Stopwatch
s = Stopwatch()
s.Start()
someFunction()
s.Stop()
timeTaken = s.ElapsedMilliseconds
We can wrap this code in a decorator that tracks the number of calls to functions, and how long each call takes. A decorator that does this is shown in listing 8.10.
Listing 8.10 A Function Decorator that Times Calls
from System.Diagnostics import Stopwatch

timer = Stopwatch()
times = {}  # 1

def profiler(function):  # 2
    def wrapped(*args, **keywargs):
        if not timer.IsRunning:  # 3
            timer.Start()
        start = timer.ElapsedMilliseconds
        retVal = function(*args, **keywargs)
        timeTaken = timer.ElapsedMilliseconds - start
        name = function.__name__
        function_times = times.setdefault(name, [])  # 4
        function_times.append(timeTaken)  # 5
        return retVal
    return wrapped
(annotation) <#1 A cache to store times>
(annotation) <#2 The decorator which wraps functions>
(annotation) <#3 Start the timer if necessary>
(annotation) <#4 Fetch the list that stores times>
(annotation) <#5 Add the current duration>
The profiler decorator takes a function and returns a wrapped function that times how long the call takes. This is stored in a cache (the 'times' dictionary), keyed by the function name.
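The same pattern can be tried without .NET. The sketch below is a variant rather than the listing itself: it swaps 'Stopwatch' for the standard library's 'timeit.default_timer', uses 'functools.wraps' so the wrapped function keeps its original name, and records the duration even when the call raises. The names 'timed', 'call_times' and 'add' are hypothetical.

```python
from functools import wraps
from timeit import default_timer

call_times = {}

def timed(function):
    @wraps(function)  # keep the original __name__ and docstring
    def wrapped(*args, **keywargs):
        start = default_timer()
        try:
            return function(*args, **keywargs)
        finally:
            # Record the duration in milliseconds even if the call raises.
            elapsed_ms = (default_timer() - start) * 1000.0
            call_times.setdefault(function.__name__, []).append(elapsed_ms)
    return wrapped

@timed
def add(a, b):
    return a + b

total = add(2, 3)
add(4, 5)
```

Because the cache is keyed by function name alone, two functions with the same name (for example methods on different classes) would share one entry; a fuller version might key on the module and class as well.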
Optimizing IronPython Code
It is always important to profile code when optimizing. The bottlenecks are rarely quite where you expect them and you can only find effective ways of speeding up your code if you know exactly which bits are slow.
For experienced Python programmers optimizing IronPython code this is especially important, because the performance of IronPython is so different to CPython. For example, function calls are much less expensive, as are arithmetic and operations involving the primitive types. On the other hand, operations with some of the Python types like tuples and dictionaries can be more expensive(8).
Now we need a metaclass that can apply this decorator to all the methods in a class. We will be able to recognize methods by using 'FunctionType' from the Python standard library types module. Listing 8.11 shows a metaclass that does this.
Importing from the Python Standard Library
Don't forget that in order to import from the Python standard library it needs to be on your path. This happens automatically if you installed IronPython from the 'msi' installer. Otherwise you need the Python standard library (the easiest way to obtain it is to install the appropriate version of Python) and set the path to the library in the 'IRONPYTHONPATH' environment variable.
Listing 8.11 A Profiling Metaclass that Wraps Methods with 'profiler'
from types import FunctionType  # 1

class ProfilingMetaclass(type):
    def __new__(meta, classname, bases, classDict):
        for name, item in classDict.items():
            if isinstance(item, FunctionType):  # 2
                classDict[name] = profiler(item)  # 3
        return type.__new__(meta, classname, bases, classDict)
(annotation) <#1 Alternatively 'FunctionType = type(some_function)'>
(annotation) <#2 Check for methods>
(annotation) <#3 Wrap methods with profiler>
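To see which class attributes the 'FunctionType' check picks up, the sketch below applies the same loop with a stand-in decorator that counts calls instead of timing them, so it can run without .NET. All names ('count_calls', 'CountingMetaclass', 'Example' and its attributes) are hypothetical. The class is created by calling the metaclass directly so the example runs on both Python 2 and Python 3.

```python
from types import FunctionType

def count_calls(function):
    # Stand-in decorator: counts calls instead of timing them.
    def wrapped(*args, **keywargs):
        wrapped.calls += 1
        return function(*args, **keywargs)
    wrapped.calls = 0
    return wrapped

class CountingMetaclass(type):
    def __new__(meta, classname, bases, classDict):
        # Iterate over a copy of the items while replacing values.
        for name, item in list(classDict.items()):
            if isinstance(item, FunctionType):
                classDict[name] = count_calls(item)
        return type.__new__(meta, classname, bases, classDict)

def double(self, value):
    return value * 2

def helper(value):
    return value + 1

Example = CountingMetaclass('Example', (object,), {
    'double': double,                # plain function: wrapped
    'helper': staticmethod(helper),  # staticmethod object: left alone
    'limit': 10,                     # data attribute: left alone
})

instance = Example()
results = [instance.double(2), instance.double(3), Example.helper(1)]
```

Only plain functions are instances of 'FunctionType', so static methods (and class methods) pass through unwrapped. A metaclass that needs to cover them would have to check for those wrapper types explicitly.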
Of course, having created a profiling metaclass we need to use it. Listing 8.12 shows how to use the 'ProfilingMetaclass'.
Listing 8.12 Timing Method Calls on an Object with the ProfilingMetaclass
from System.Threading import Thread

class Test(object):
    __metaclass__ = ProfilingMetaclass

    def __init__(self):
        counter = 0
        while counter < 100:
            counter += 1
            self.method()

    def method(self):
        Thread.CurrentThread.Join(20)

t = Test()

for name, calls in times.items():
    print 'Function: %s' % name
    print 'Called: %s times' % len(calls)
    print ('Total time taken: %s seconds' %
           (sum(calls) / 1000.0))
    avg = (sum(calls) / float(len(calls)))
    print 'Max: %sms, Min: %sms, Avg: %sms' % (max(calls), min(calls), avg)
When the 'Test' class is created all the methods are wrapped with the profiler. When it is constructed, it calls 'method' 100 times. When the code has run, we print out the entries in the 'times' cache and can analyze the results. It should print something like this:
Function: method
Called: 100 times
Total time taken: 2.07 seconds
Max: 39ms, Min: 16ms, Avg: 20.7ms
Function: __init__
Called: 1 times
Total time taken: 2.093 seconds
Max: 2093ms, Min: 2093ms, Avg: 2093.0ms
If this were real code, we would know how many calls were being made to each method and how long they were taking. With this simple technique, the times include calls to other methods and will also include overhead for the profiling code itself. It is possible to write more sophisticated profiling code that tracks trees of calls and lets you see how long different branches of your code take.
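One possible shape for such a profiler is sketched below. It keeps a stack of active calls so that each function's time can be split into inclusive time (including the calls it makes) and exclusive time (its own work only). This is an illustration under stated assumptions, not code from the book: 'CallTreeProfiler' and its attributes are hypothetical names, the clock is injectable, and the usage example passes a fake clock that advances by one unit per reading so the numbers are deterministic. The sketch is not thread-safe, and recursive calls would have their inclusive time counted more than once.

```python
from timeit import default_timer

class CallTreeProfiler(object):
    def __init__(self, clock=default_timer):
        self.clock = clock
        self.stack = []    # one [name, time spent in child calls] per active call
        self.totals = {}   # name -> [inclusive time, exclusive time]

    def profile(self, function):
        name = function.__name__
        def wrapped(*args, **keywargs):
            frame = [name, 0.0]
            self.stack.append(frame)
            start = self.clock()
            try:
                return function(*args, **keywargs)
            finally:
                elapsed = self.clock() - start
                self.stack.pop()
                if self.stack:
                    # Charge this call's time to the caller's child total.
                    self.stack[-1][1] += elapsed
                entry = self.totals.setdefault(name, [0.0, 0.0])
                entry[0] += elapsed
                entry[1] += elapsed - frame[1]
        return wrapped

# A fake clock that advances by one unit per reading.
ticks = iter(range(100))
tree_profiler = CallTreeProfiler(clock=lambda: float(next(ticks)))

@tree_profiler.profile
def inner():
    pass

@tree_profiler.profile
def outer():
    inner()
    inner()

outer()
```

With the fake clock, 'outer' ends up with an inclusive time of 5 units and an exclusive time of 3, because 2 units were spent inside the two calls to 'inner'.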
So metaclasses are magic, but perhaps not as black as they are painted. This is true of all the 'Python magic' that we have been learning about. Python's magic methods are magic because they are invoked by the interpreter on your behalf, not because they are difficult to understand. Using these protocol methods is a normal part of programming in Python. You should now be confident in working out which methods correspond to which language features and how to implement them.
(1) See http://www.resolversystems.com
(2) Meaning that you can't have incompatible metaclasses where you have multiple inheritance.
(3) This borrows some of the use cases for metaclasses from an excellent presentation by Mike C. Fletcher: http://www.vrplumber.com/programming/metaclasses-pycon.pdf
(4) Elixir is a declarative layer built on top of the popular Python ORM SQLAlchemy. http://elixir.ematia.de/trac/wiki
(5) http://msdn2.microsoft.com/en-us/library/system.datetime.aspx
(6) http://msdn2.microsoft.com/en-us/library/system.diagnostics.stopwatch(vs.80).aspx. Note that on multi-core AMD processors the Stopwatch can sometimes return negative numbers (appearing to travel backwards in time!). The solution is to apply this fix: http://support.microsoft.com/?id=896256
(7) The Python standard library 'time.clock()' is implemented using Stopwatch under the hood.
(8) This is hardly surprising: the Python builtin types are written in C and have been fine-tuned over the last fifteen years or more. Their implementation in IronPython is barely a handful of years old.