DZone

Eli Bendersky

Cary, US

Joined Dec 2011

About

Eli's favorite programming languages are Python and C. He's also proficient in C++, and has varying levels of familiarity with Perl, Java, Ruby, JavaScript, Common Lisp, Scheme, Ada, and a few assembly languages. @elibendersky

Stats

Reputation: 727
Pageviews: 678.5K
Articles: 22
Comments: 0
Articles

The Confusion Matrix in Statistical Tests
It's been a crazy flu season, so you may be interested in how big data and statistics can shed light on the diagnostic flu tests available to doctors.
March 28, 2018
· 19,601 Views · 1 Like
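The teaser above can be made concrete with a small sketch. The counts below are invented for illustration; sensitivity, specificity, and precision follow directly from the four cells of a confusion matrix.

```python
# Hypothetical confusion matrix for a diagnostic flu test (invented numbers).
tp, fn = 90, 10     # sick patients: test positive / test negative
fp, tn = 45, 855    # healthy patients: test positive / test negative

sensitivity = tp / (tp + fn)   # how many sick patients the test catches
specificity = tn / (tn + fp)   # how many healthy patients it clears
precision = tp / (tp + fp)     # how many positive results are true

print(sensitivity, specificity)  # 0.9 0.95
```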
Conditional Probability and Bayes' Theorem
A doctor orders a blood test that is 90% accurate, and Bob's test comes back positive. You might be tempted to say there's a 90% chance he has the disease, but that would be wrong.
March 15, 2018
· 15,253 Views · 3 Likes
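The resolution of that puzzle is Bayes' theorem: the answer depends on the base rate of the disease, not just the test's accuracy. A minimal sketch, assuming a hypothetical 1% prevalence alongside 90% sensitivity and specificity:

```python
sensitivity = 0.90   # P(positive | disease)
specificity = 0.90   # P(negative | no disease)
prevalence = 0.01    # P(disease) - an assumed base rate

# Total probability of a positive test: true positives + false positives.
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes' theorem: P(disease | positive).
posterior = sensitivity * prevalence / p_positive
print(round(posterior, 3))  # 0.083 - nowhere near 90%
```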
Concurrent Servers: Part 4 - libuv
Learn about rewriting your server with libuv for better performance, how it works, and handling time-consuming tasks in callbacks using a thread pool.
November 14, 2017
· 10,315 Views · 3 Likes
Interacting With a Long-Running Child Process in Python
There's no single one-size-fits-all solution for this task, but here are a handful of recipes for the most commonly occurring situations.
July 12, 2017
· 21,103 Views · 2 Likes
Clojure Concurrency and Blocking With core.async
Take a deep dive into the performance problem of concurrent apps that use core.async where blocking operations are involved. If you're a fan of Clojure, you need to read this.
June 26, 2017
· 5,382 Views · 3 Likes
Reducers, Transducers, and core.async in Clojure
This deep dive into Clojure's reducers, transducers and core.async is not for the faint of heart. Get to know how and when to use these tools for abstraction.
June 10, 2017
· 11,784 Views · 6 Likes
A Polyglot's Guide to Multiple Dispatch Part 4: Clojure
The conclusion of this series about multiple dispatch focuses on using the technique in Clojure.
May 11, 2016
· 5,415 Views · 1 Like
A Polyglot's Guide to Multiple Dispatch Part 3: Common Lisp
Part 3 on multiple dispatch goes back to the roots of the concept in Common Lisp.
May 9, 2016
· 23,446 Views · 3 Likes
A Polyglot's Guide to Multiple Dispatch, Part 2: Python
Part 2 of this Multiple Dispatch series focuses on implementing multiple dispatch in Python.
May 3, 2016
· 5,654 Views · 1 Like
The Promises and Challenges of std::async Task-based Parallelism in C++11
An overview of std::async and the promises and challenges of task-based parallelism in C++11.
April 1, 2016
· 11,987 Views · 6 Likes
gRPC Sample in C++ and Python
Google recently open sourced gRPC, which adds remote procedure calls on top of Protobufs, including code generation and serialization that works with multiple languages.
March 21, 2016
· 13,911 Views · 5 Likes
C++11 Threads, Affinity, and Hyperthreading
For years, the C and C++ standards treated concurrency and multithreading as outside their scope. Here's a bit of background, an intro, and some experimentation related to threads, hyperthreading, and affinity.
March 20, 2016
· 26,293 Views · 8 Likes
Returning Multiple Values from Functions in C++
In this day and age, programmers often need to return multiple values from functions in C++. Author Eli Bendersky provides an overview of some of the options available to accomplish this feat, along with a peek at what's in store for this necessity in C++.
March 9, 2016
· 110,632 Views · 4 Likes
Memory Layout of Multi-Dimensional Arrays
A look at how multi-dimensional arrays are laid out in memory, and why that layout matters to developers.
October 2, 2015
· 19,536 Views · 8 Likes
On Parsing C, Type Declarations and Fake Headers
pycparser has become fairly popular in the past couple of years (especially following its usage in cffi). This means I get more questions by email, which leads me to getting tired of answering the same questions :-) So this blog post is a one-stop shop for the (by far) most frequently asked question about pycparser: how to handle headers that your code #includes. I've certainly written about this before, and it's mentioned in the README, but I feel that additional details are needed to provide a more complete answer to the different variations of this question.

First, a disclaimer. This post assumes some level of familiarity with the C programming language and how it's compiled. You must know about the C preprocessor (the thing that handles directives like #include and #define), and have a general understanding of how multiple source files (most often a .c file and any number of .h files) get combined into a single translation unit for compilation. If you don't have a strong grasp of these concepts, I would hold off on using pycparser until you learn more about them.

So what's the problem?

The problem arises when the code you want to analyze with pycparser #includes a header file:

    #include <someheader.h>

    int foo() {
        // my code
    }

Since this is true of virtually all real-life code, it's a problem almost everyone faces.

How to handle headers with pycparser

In general, pycparser does not concern itself with headers, or C preprocessor directives in general. The CParser object expects preprocessed code in its parse method, period. So you have two choices:

  • Provide preprocessed code to pycparser. This means you first preprocess the code by invoking, say, gcc -E (or clang -E, or cpp, or whatever way you have to preprocess code [1]).
  • Use pycparser's parse_file convenience function; it will invoke the preprocessor for you.

Great, so now you can handle headers.
However, this is unlikely to solve all your problems, because pycparser will have trouble parsing some library headers; first and foremost, it will probably have trouble parsing the standard library headers. Why? Because while pycparser fully supports C99, many library headers are full of compiler extensions and other clever tricks for compatibility across multiple platforms. While it's entirely possible to parse them with pycparser [2], this requires work. Work that you may not have the skills or the time to do. Work that, fortunately, is almost certainly unnecessary. Why isn't it necessary? Because, in all likelihood, you don't really need pycparser to parse those headers at all.

What pycparser actually needs to parse headers for

To understand this bold claim, you must first understand why pycparser needs to parse headers. Let's start with a more basic question: why does the C compiler need to parse the headers your file includes? For a number of reasons; some of them syntactic, but most of them semantic.

Syntactic issues are those that may prevent the compiler from parsing the code. #defines are one, types are another. For example, the C code:

    {
        T * x;
    }

cannot be properly parsed unless we know whether:

  • Either T or x is a macro #defined to something.
  • T is a type that was previously created with a typedef.

For a thorough explanation of this issue, look at this article and other related postings on my website.

Semantic reasons are those that won't prevent the compiler from parsing the code, but will prevent it from properly understanding and verifying it. For example, declarations of functions being used, full declarations of structs, and so on. These take up the vast majority of real-world header files. But as it turns out, since pycparser only cares about parsing the code into an AST, and doesn't do any semantic analysis or further processing, it doesn't care about these issues.
In other words, given the code:

    {
        foo(a.b);
    }

pycparser can construct a proper AST (given that none of foo, a or b are type names). It doesn't care what the actual declaration of foo is, whether a is indeed a variable of struct type, or whether it has a field named b [3]. So pycparser requires very little from header files. This is how the idea of "fake headers" was born.

Fake headers

Let's get back to this simple code sample:

    #include <someheader.h>

    int foo() {
        // my code
    }

So we've established two key ideas:

  • pycparser needs to know what someheader.h contains so it can properly parse the code.
  • pycparser needs only a very small subset of someheader.h to perform its task.

The idea of fake headers is simple. Instead of actually parsing someheader.h and all the other headers it transitively includes (this probably includes lots of system and standard library headers too), why not create a "fake" someheader.h that only contains the parts of the original that are necessary for parsing - the #defines and the typedefs?

The cool part about typedefs is that pycparser doesn't actually care what a type is defined to be. T may be a pointer to a function accepting an array of struct types, but all pycparser needs to see is:

    typedef int T;

so it knows that T is a type. It doesn't care what kind of type it is.

So what do you have to do to parse your program?

OK, so now you hopefully have a better understanding of what headers mean for pycparser, and how to work around having to parse tons of system headers. What does this actually mean for your program, though? Will you now have to scour through all your headers, "faking them out"? Unlikely. If your code is standards-compliant C, then most likely pycparser will have no issue parsing all your headers. But you probably don't want it to parse the system headers. In addition to being nonstandard, these headers are usually large, which means longer parsing time and larger ASTs.
So my suggestion would be: let pycparser parse your headers, but fake out the system headers, and possibly any other large library headers used by your code. As far as the standard headers go, pycparser already provides you with nice fakes in its utils folder. All you need to do is provide this flag to the preprocessor [4]:

    -I<path-to-pycparser>/utils/fake_libc_include

And it will be able to find header files like stdio.h and sys/types.h with the proper types defined. I'll repeat: the flag shown above is almost certainly sufficient to parse a C99 program that only relies on the C runtime (i.e. has no other library dependencies).

Real-world example

OK, enough theory. Now I want to work through an example to help ground these suggestions in reality. I'll take a well-known open-source C project and use pycparser to parse one of its files, fully showing all the steps taken until a successful parse is done. I'll pick Redis.

Let's start at the beginning, by cloning the Redis git repo:

    /tmp$ git clone git@github.com:antirez/redis.git

I'll be using the latest released pycparser (version 2.13 at the time of writing). I'll also clone its repository into /tmp so I can easily access the fake headers:

    /tmp$ git clone git@github.com:eliben/pycparser.git

A word on methodology: when initially exploring how to parse a new project, I'm always preprocessing separately. Once I figure out the flags/settings/extra faking required to successfully parse the code, it's all very easy to put in a script.

Let's take the main Redis file (redis/src/redis.c) and attempt to preprocess it. The first preprocessor invocation simply adds the include paths for Redis's own headers (they live in redis/src) and pycparser's fake libc headers:

    /tmp$ gcc -E -Iredis/src -Ipycparser/utils/fake_libc_include redis/src/redis.c > redis_pp.c
    In file included from redis/src/redis.c:30:0:
    redis/src/redis.h:48:17: fatal error: lua.h: No such file or directory
    compilation terminated.

Oops, no good.
Redis is looking for Lua headers. Let's see if it carries this dependency along:

    /tmp$ find redis -name lua
    redis/deps/lua

Indeed! We should be able to add the Lua headers to the preprocessor path too:

    /tmp$ gcc -E -Iredis/src -Ipycparser/utils/fake_libc_include \
               -Iredis/deps/lua/src redis/src/redis.c > redis_pp.c

Great, no more errors. Now let's try to parse it with pycparser. I'll load pycparser in an interactive terminal, but any other technique (such as running one of the example scripts) will work:

    : import pycparser
    : pycparser.parse_file('/tmp/redis_pp.c')
    ... backtrace
    ---> 55 raise ParseError("%s: %s" % (coord, msg))
    ParseError: /usr/include/x86_64-linux-gnu/sys/types.h:194:20: before: __attribute__

This error is strange. Note where it occurs: in a system header included in the preprocessed file. But we should have no system headers there - we specified the fake headers path. What gives? The reason this is happening is that gcc knows about some pre-set system header directories and will add them to its search path. We can block this, making sure it only looks in the directories we explicitly specify with -I, by providing it with the -nostdinc flag. Let's re-run the preprocessor:

    /tmp$ gcc -nostdinc -E -Iredis/src -Ipycparser/utils/fake_libc_include \
               -Iredis/deps/lua/src redis/src/redis.c > redis_pp.c

Now I'll try to parse the preprocessed code again:

    : pycparser.parse_file('/tmp/redis_pp.c')
    ... backtrace
    ---> 55 raise ParseError("%s: %s" % (coord, msg))
    ParseError: redis/src/sds.h:74:5: before: __attribute__

OK, progress! If we look at the code where this error occurs, we'll note a GNU-specific __attribute__ pycparser doesn't support.
No problem, let's just #define it away:

    $ gcc -nostdinc -E -D'__attribute__(x)=' -Iredis/src \
          -Ipycparser/utils/fake_libc_include \
          -Iredis/deps/lua/src redis/src/redis.c > redis_pp.c

If I try to parse again, it works:

    : pycparser.parse_file('/tmp/redis_pp.c')

I can also run one of the example scripts now to see that we can do something more interesting with the AST:

    /tmp$ python pycparser/examples/func_defs.py redis_pp.c
    sdslen at redis/src/sds.h:47
    sdsavail at redis/src/sds.h:52
    rioWrite at redis/src/rio.h:93
    rioRead at redis/src/rio.h:106
    rioTell at redis/src/rio.h:119
    rioFlush at redis/src/rio.h:123
    redisLogRaw at redis/src/redis.c:299
    redisLog at redis/src/redis.c:343
    redisLogFromHandler at redis/src/redis.c:362
    ustime at redis/src/redis.c:385
    mstime at redis/src/redis.c:396
    exitFromChild at redis/src/redis.c:404
    dictVanillaFree at redis/src/redis.c:418
    ... many more lines
    main at redis/src/redis.c:3733

This lets us see all the functions defined in redis.c and the headers included in it using pycparser.

This was fairly straightforward - all I had to do was set the right preprocessor flags, really. In some cases, it may be a bit more difficult. The most obvious problem you may encounter is a new header you'll need to fake away. Luckily, that's very easy - just take a look at the existing ones (say, at stdio.h). These headers can be copied to other names/directories to make sure the preprocessor will find them properly. If you think there's a standard header I forgot to include in the fake headers, please open an issue and I'll add it.

Note that we didn't have to fake out the headers of Redis (or Lua, for that matter). pycparser handled them just fine. The same has a high chance of being true for your C project as well.

[1] On Linux, at least, gcc should be there on the command line. On OS X, you'll need to install "command-line developer tools" to get a command-line clang. If you're in Microsoft-land, I recommend downloading pre-built clang binaries for Windows.
[2] And this has been done by many folks. pycparser was made to parse the standard C library, windows.h, parts of the Linux kernel headers, and so on.

[3] Note that this describes the most common use of pycparser, which is to perform simple analyses on the source, or to rewrite parts of existing source in some way. More complex uses may actually require full parsing of type definitions, structs and function declarations. In fact, you can certainly create a real C compiler using pycparser as the frontend. Such uses will require full parsing of headers, so fake headers won't do. As I mentioned above, it's possible to make pycparser parse the actual headers of libraries and so on; it just takes more work.

[4] Depending on the exact preprocessor you're using, you may need to provide it with another flag telling it to ignore the system headers whose paths are hard-coded into it. Read on to the example for more details.
June 4, 2015
· 7,863 Views
Visualizing Matrix Multiplication as a Linear Combination
When multiplying two matrices, there's a manual procedure we all know how to go through. Each result cell is computed separately as the dot product of a row in the first matrix with a column in the second matrix. While it's the easiest way to compute the result manually, it may obscure a very interesting property of the operation: multiplying A by B is the linear combination of A's columns using coefficients from B. Another way to look at it is that it's a linear combination of the rows of B using coefficients from A. In this quick post I want to show a colorful visualization that will make this easier to grasp.

Right-multiplication: combination of columns

Let's begin by looking at the right-multiplication of a matrix X by a column vector. Representing the columns of X by colorful boxes helps visualize this: attaching a scalar coefficient a to a column vector just means multiplying that vector by a. The result is another column vector - a linear combination of X's columns, with the vector's entries a, b, c as the coefficients.

Right-multiplying X by a matrix is more of the same: each resulting column is a different linear combination of X's columns. If you look hard at the equation and squint a bit, you can recognize this column-combination property by examining each column of the result matrix.

Left-multiplication: combination of rows

Now let's examine left-multiplication. Left-multiplying a matrix X by a row vector is a linear combination of X's rows. Left-multiplying by a matrix is the same thing repeated for every result row: it becomes the linear combination of the rows of X, with the coefficients taken from the rows of the matrix on the left.

(The original post illustrates each of these cases with color-coded equations and diagrams, which are not reproduced here.)
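The column-combination property described above can be checked numerically. This plain-Python sketch (names are my own) builds each column of X*B as a linear combination of X's columns, with coefficients drawn from the corresponding column of B, and compares it with the ordinary row-by-column product:

```python
def matmul(X, B):
    """Ordinary matrix product: each cell is a row-by-column dot product."""
    return [[sum(X[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(X))]

def column_combination(X, B, j):
    """Column j of X*B as a linear combination of X's columns,
    using the entries of B's column j as coefficients."""
    rows, inner = len(X), len(B)
    col = [0] * rows
    for k in range(inner):       # scale X's k-th column by B[k][j] and accumulate
        for i in range(rows):
            col[i] += B[k][j] * X[i][k]
    return col

X = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
product = matmul(X, B)           # [[19, 22], [43, 50]]
for j in range(2):
    assert [row[j] for row in product] == column_combination(X, B, j)
```

The symmetric row-combination property can be checked the same way by iterating over rows of the product and combining B's rows with coefficients from X.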
April 12, 2015
· 24,361 Views · 47 Likes
Redirecting All Kinds of stdout in Python
A common task in Python (especially while testing or debugging) is to redirect sys.stdout to a stream or a file while executing some piece of code. However, simply "redirecting stdout" is sometimes not as easy as one would expect; hence the slightly strange title of this post. In particular, things become interesting when you want C code running within your Python process (including, but not limited to, Python modules implemented as C extensions) to also have its stdout redirected according to your wish. This turns out to be tricky and leads us into the interesting world of file descriptors, buffers and system calls. But let's start with the basics.

Pure Python

The simplest case arises when the underlying Python code writes to stdout, whether by calling print, sys.stdout.write or some equivalent method. If the code you have does all its printing from Python, redirection is very easy. With Python 3.4 we even have a built-in tool in the standard library for this purpose - contextlib.redirect_stdout. Here's how to use it:

    from contextlib import redirect_stdout
    import io

    f = io.StringIO()
    with redirect_stdout(f):
        print('foobar')
        print(12)
    print('Got stdout: "{0}"'.format(f.getvalue()))

When this code runs, the actual print calls within the with block don't emit anything to the screen, and you'll see their output captured in the stream f. Incidentally, note how perfect the with statement is for this goal - everything within the block gets redirected; once the block is done, things are cleaned up for you and redirection stops.

If you're stuck on an older and uncool Python, prior to 3.4 [1], what then? Well, redirect_stdout is really easy to implement on your own.
I'll change its name slightly to avoid confusion:

    from contextlib import contextmanager
    import sys

    @contextmanager
    def stdout_redirector(stream):
        old_stdout = sys.stdout
        sys.stdout = stream
        try:
            yield
        finally:
            sys.stdout = old_stdout

So we're back in the game:

    f = io.StringIO()
    with stdout_redirector(f):
        print('foobar')
        print(12)
    print('Got stdout: "{0}"'.format(f.getvalue()))

Redirecting C-level streams

Now, let's take our shiny redirector for a more challenging ride:

    import ctypes
    import os

    libc = ctypes.CDLL(None)

    f = io.StringIO()
    with stdout_redirector(f):
        print('foobar')
        print(12)
        libc.puts(b'this comes from C')
        os.system('echo and this is from echo')
    print('Got stdout: "{0}"'.format(f.getvalue()))

I'm using ctypes to directly invoke the C library's puts function [2]. This simulates what happens when C code called from within our Python code prints to stdout - the same would apply to a Python module using a C extension. Another addition is the os.system call to invoke a subprocess that also prints to stdout. What we get from this is:

    this comes from C
    and this is from echo
    Got stdout: "foobar
    12
    "

Err... no good. The prints got redirected as expected, but the output from puts and echo flew right past our redirector and ended up in the terminal without being caught. What gives? To grasp why this didn't work, we first have to understand what sys.stdout actually is in Python.

Detour - on file descriptors and streams

This section dives into some internals of the operating system, the C library, and Python [3]. If you just want to know how to properly redirect printouts from C in Python, you can safely skip to the next section (though understanding how the redirection works will be difficult). Files are opened by the OS, which keeps a system-wide table of open files, some of which may point to the same underlying disk data (two processes can have the same file open at the same time, each reading from a different place, etc.)
File descriptors are another abstraction, which is managed per-process. Each process has its own table of open file descriptors that point into the system-wide table. (There's a good schematic of this in The Linux Programming Interface.) File descriptors allow sharing open files between processes (for example when creating child processes with fork). They're also useful for redirecting from one entry to another, which is relevant to this post. Suppose that we make file descriptor 5 a copy of file descriptor 4. Then all writes to 5 will behave in the same way as writes to 4. Coupled with the fact that the standard output is just another file descriptor on Unix (usually index 1), you can see where this is going. The full code is given in the next section.

File descriptors are not the end of the story, however. You can read and write to them with the read and write system calls, but this is not the way things are typically done. The C runtime library provides a convenient abstraction around file descriptors - streams. These are exposed to the programmer as the opaque FILE structure with a set of functions that act on it (for example fprintf and fgets). FILE is a fairly complex structure, but the most important things to know about it are that it holds a file descriptor to which the actual system calls are directed, and that it provides buffering, to ensure that the system call (which is expensive) is not invoked too often. Suppose you emit stuff to a binary file, a byte or two at a time. Unbuffered writes to the file descriptor with write would be quite expensive because each write invokes a system call. On the other hand, using fwrite is much cheaper because the typical call to this function just copies your data into its internal buffer and advances a pointer. Only occasionally (depending on the buffer size and flags) will an actual write system call be issued. With this information in hand, it should be easy to understand what stdout actually is for a C program.
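The descriptor-duplication trick described in this detour can be seen in a minimal, self-contained sketch: save a copy of fd 1 with dup, point fd 1 at a temporary file with dup2, and even a raw write to fd 1 lands in the file. (This bypasses sys.stdout entirely, so no print buffering is involved.)

```python
import os
import tempfile

saved_fd = os.dup(1)                      # keep a copy of the real stdout
tmp = tempfile.TemporaryFile(mode='w+b')
os.dup2(tmp.fileno(), 1)                  # fd 1 now refers to the temp file

os.write(1, b'captured\n')                # a raw write to fd 1

os.dup2(saved_fd, 1)                      # restore the original stdout
os.close(saved_fd)
tmp.seek(0)
data = tmp.read()                         # b'captured\n'
tmp.close()
```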
stdout is a global FILE object kept for us by the C library, and it buffers output to file descriptor number 1. Calls to functions like printf and puts add data into this buffer. fflush forces its flushing to the file descriptor, and so on.

But we're talking about Python here, not C. So how does Python translate calls to sys.stdout.write to actual output? Python uses its own abstraction over the underlying file descriptor - a file object. Moreover, in Python 3 this file object is further wrapped in an io.TextIOWrapper, because what we pass to print is a Unicode string, but the underlying write system calls accept binary data, so encoding has to happen en route.

The important take-away from this is: Python and a C extension loaded by it (this is similarly relevant to C code invoked via ctypes) run in the same process, and share the underlying file descriptor for standard output. However, while Python has its own high-level wrapper around it - sys.stdout - the C code uses its own FILE object. Therefore, simply replacing sys.stdout cannot, in principle, affect output from C code. To make the replacement deeper, we have to touch something shared by the Python and C runtimes - the file descriptor.

Redirecting with file descriptor duplication

Without further ado, here is an improved stdout_redirector that also redirects output from C code [4]:

    from contextlib import contextmanager
    import ctypes
    import io
    import os
    import sys
    import tempfile

    libc = ctypes.CDLL(None)
    c_stdout = ctypes.c_void_p.in_dll(libc, 'stdout')

    @contextmanager
    def stdout_redirector(stream):
        # The original fd stdout points to. Usually 1 on POSIX systems.
        original_stdout_fd = sys.stdout.fileno()

        def _redirect_stdout(to_fd):
            """Redirect stdout to the given file descriptor."""
            # Flush the C-level buffer stdout
            libc.fflush(c_stdout)
            # Flush and close sys.stdout - also closes the file descriptor (fd)
            sys.stdout.close()
            # Make original_stdout_fd point to the same file as to_fd
            os.dup2(to_fd, original_stdout_fd)
            # Create a new sys.stdout that points to the redirected fd
            sys.stdout = io.TextIOWrapper(os.fdopen(original_stdout_fd, 'wb'))

        # Save a copy of the original stdout fd in saved_stdout_fd
        saved_stdout_fd = os.dup(original_stdout_fd)
        try:
            # Create a temporary file and redirect stdout to it
            tfile = tempfile.TemporaryFile(mode='w+b')
            _redirect_stdout(tfile.fileno())
            # Yield to caller, then redirect stdout back to the saved fd
            yield
            _redirect_stdout(saved_stdout_fd)
            # Copy contents of temporary file to the given stream
            tfile.flush()
            tfile.seek(0, io.SEEK_SET)
            stream.write(tfile.read())
        finally:
            tfile.close()
            os.close(saved_stdout_fd)

There are a lot of details here (such as managing the temporary file into which output is redirected) that may obscure the key approach: using dup and dup2 to manipulate file descriptors. These functions let us duplicate file descriptors and make any descriptor point at any file. I won't spend more time on them - go ahead and read their documentation if you're interested. The detour section should provide enough background to understand them.

Let's try this:

    f = io.BytesIO()
    with stdout_redirector(f):
        print('foobar')
        print(12)
        libc.puts(b'this comes from C')
        os.system('echo and this is from echo')
    print('Got stdout: "{0}"'.format(f.getvalue().decode('utf-8')))

This gives us:

    Got stdout: "and this is from echo
    this comes from C
    foobar
    12
    "

Success! A few things to note:

  • The output order may not be what we expected. This is due to buffering. If it's important to preserve order between different kinds of output (i.e.
between C and Python), further work is required to disable buffering on all relevant streams.

  • You may wonder why the output of echo was redirected at all. The answer is that file descriptors are inherited by subprocesses. Since we rigged fd 1 to point to our file instead of the standard output prior to forking echo, this is where its output went.

  • We use a BytesIO here. This is because at the lowest level, file descriptors are binary. It may be possible to do the decoding when copying from the temporary file into the given stream, but that can hide problems. Python has its own in-memory understanding of Unicode, but who knows what the right encoding is for data printed out from underlying C code? This is why this particular redirection approach leaves the decoding to the caller.

  • The above also makes this code specific to Python 3. There's no magic involved, and porting to Python 2 is trivial, but some assumptions made here don't hold (such as sys.stdout being an io.TextIOWrapper).

Redirecting the stdout of a child process

We've just seen that the file descriptor duplication approach lets us grab the output from child processes as well. But it may not always be the most convenient way to achieve this task. In the general case, you typically use the subprocess module to launch child processes, and you may launch several such processes either in a pipe or separately. Some programs will even juggle multiple subprocesses launched this way in different threads. Moreover, while these subprocesses are running you may want to emit something to stdout and you don't want this output to be captured. So, managing the stdout file descriptor in the general case can be messy; it is also unnecessary, because there's a much simpler way.
The subprocess module's Swiss Army knife Popen class (which serves as the basis for much of the rest of the module) accepts a stdout parameter, which we can use to ask it for access to the child's stdout:

    import subprocess

    echo_cmd = ['echo', 'this', 'comes', 'from', 'echo']
    proc = subprocess.Popen(echo_cmd, stdout=subprocess.PIPE)
    output = proc.communicate()[0]
    print('Got stdout:', output)

The subprocess.PIPE argument can be used to set up actual child process pipes (a la the shell), but in its simplest incarnation it captures the process's output. If you only launch a single child process at a time and are interested in its output, there's an even simpler way:

    output = subprocess.check_output(echo_cmd)
    print('Got stdout:', output)

check_output will capture and return the child's standard output to you; it will also raise an exception if the child exits with a non-zero return code.

Conclusion

I hope I covered most of the common cases where "stdout redirection" is needed in Python. Naturally, all of the same applies to the other standard output stream - stderr. Also, I hope the background on file descriptors was sufficiently clear to explain the redirection code; squeezing this topic into such a short space is challenging. Let me know if any questions remain or if there's something I could have explained better. Finally, while it is conceptually simple, the code for the redirector is quite long; I'll be happy to hear if you find a shorter way to achieve the same effect.

[1] Do not despair. As of February 2015, a sizable chunk of Python programmers worldwide are in the same boat.

[2] Note the bytes passed to puts. This being Python 3, we have to be careful since libc doesn't understand Python's Unicode strings.

[3] The following description focuses on Unix/POSIX systems; also, it's necessarily partial. Large book chapters have been written on this topic - I'm just trying to present some key concepts relevant to stream redirection.
[4] The approach taken here is inspired by this Stack Overflow answer.
February 23, 2015
· 17,989 Views
Dynamically Generating Python Test Cases
Testing is crucial. While many different kinds and levels of testing exist, there’s good library support only for unit tests (the Python unittest package and its moral equivalents in other languages). However, unit testing does not cover all kinds of testing we may want to do – for example, all kinds of whole-program tests and integration tests. This is where we usually end up with a custom "test runner" script. Having written my share of such custom test runners, I’ve recently gravitated towards a very convenient approach, which I want to share here. In short, I’m actually using Python’s unittest, combined with the dynamic nature of the language, to run all kinds of tests. Let’s assume my tests are some sort of data files which have to be fed to a program. The output of the program is compared to some "expected results" file, or maybe is encoded in the data file itself in some way. The details of this are immaterial, but seasoned programmers encounter such testing rigs very frequently. It commonly comes up when the program under test is a data-transformation mechanism of some sort (compiler, encryptor, encoder, compressor, translator, etc.). So you write a "test runner": a script that looks at some directory tree, finds all the "test files" there, runs each through the transformation, compares, reports, etc. I’m sure all these test runners share a lot of common infrastructure – I know that mine do. Why not employ Python’s existing "test runner" capabilities to do the same?
Here’s a very short code snippet that can serve as a template to achieve this:

```python
import unittest

class TestsContainer(unittest.TestCase):
    longMessage = True

def make_test_function(description, a, b):
    def test(self):
        self.assertEqual(a, b, description)
    return test

if __name__ == '__main__':
    testsmap = {
        'foo': [1, 1],
        'bar': [1, 2],
        'baz': [5, 5]}

    for name, params in testsmap.items():
        test_func = make_test_function(name, params[0], params[1])
        setattr(TestsContainer, 'test_{0}'.format(name), test_func)

    unittest.main()
```

What happens here: The test class TestsContainer will contain dynamically generated test methods. make_test_function creates a test function (a method, to be precise) that compares its inputs. This is just a trivial template – it could do anything, and there can be multiple such "makers" for multiple purposes. The loop creates test functions from the data description in testsmap and attaches them to the test class. Keep in mind that this is a very basic example. I hope it’s obvious that testsmap could really be test files found on disk, or whatever else. The main idea here is the dynamic test method creation. So what do we gain from this, you may ask? Quite a lot. unittest is powerful – armed to its teeth with useful tools for testing. You can now invoke tests from the command line, control verbosity, control "fast fail" behavior, easily filter which tests to run and which not to run, use all kinds of assertion methods for readability and reporting (why write your own smart list comparison assertions?). Moreover, you can build on top of any number of third-party tools for working with unittest results – HTML/XML reporting, logging, automatic CI integration, and so on. The possibilities are endless. One interesting variation on this theme is aiming the dynamic generation at a different testing "layer". unittest defines any number of "test cases" (classes), each with any number of "tests" (methods).
In the code above, we generate a bunch of tests into a single test case. Here’s a sample invocation to see this in action:

```
$ python dynamic_test_methods.py -v
test_bar (__main__.TestsContainer) ... FAIL
test_baz (__main__.TestsContainer) ... ok
test_foo (__main__.TestsContainer) ... ok

======================================================================
FAIL: test_bar (__main__.TestsContainer)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "dynamic_test_methods.py", line 8, in test
    self.assertEqual(a, b, description)
AssertionError: 1 != 2 : bar

----------------------------------------------------------------------
Ran 3 tests in 0.001s

FAILED (failures=1)
```

As you can see, all data pairs in testsmap are translated into distinctly named test methods within the single test case TestsContainer. Very easily, we can cut this a different way, by generating a whole test case for each data item:

```python
import unittest

class DynamicClassBase(unittest.TestCase):
    longMessage = True

def make_test_function(description, a, b):
    def test(self):
        self.assertEqual(a, b, description)
    return test

if __name__ == '__main__':
    testsmap = {
        'foo': [1, 1],
        'bar': [1, 2],
        'baz': [5, 5]}

    for name, params in testsmap.items():
        test_func = make_test_function(name, params[0], params[1])
        klassname = 'Test_{0}'.format(name)
        globals()[klassname] = type(klassname,
                                    (DynamicClassBase,),
                                    {'test_gen_{0}'.format(name): test_func})

    unittest.main()
```

Most of the code here remains the same. The difference is in the lines within the loop: now instead of dynamically creating test methods and attaching them to the test case, we create whole test cases – one per data item, with a single test method. All test cases derive from DynamicClassBase and hence from unittest.TestCase, so they will be auto-discovered by the unittest machinery. Now an execution will look like this:

```
$ python dynamic_test_classes.py -v
test_gen_bar (__main__.Test_bar) ... FAIL
test_gen_baz (__main__.Test_baz) ... ok
test_gen_foo (__main__.Test_foo) ... ok

======================================================================
FAIL: test_gen_bar (__main__.Test_bar)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "dynamic_test_classes.py", line 8, in test
    self.assertEqual(a, b, description)
AssertionError: 1 != 2 : bar

----------------------------------------------------------------------
Ran 3 tests in 0.000s

FAILED (failures=1)
```

Why would you want to generate whole test cases dynamically rather than just single tests? It all depends on your specific needs, really. In general, test cases are better isolated and share less than tests within one test case. Moreover, you may have a huge number of tests and want to use tools that shard your tests for parallel execution – in this case you almost certainly need separate test cases. I’ve used this technique in a number of projects over the past couple of years and found it very useful; more than once, I replaced a whole complex test runner program with about 20-30 lines of code using this technique, and gained access to many more capabilities for free. Python’s built-in test discovery, reporting and running facilities are very powerful. Coupled with third-party tools they can be even more powerful. Leveraging all this power for any kind of testing, and not just unit testing, is possible with very little code, due to Python’s dynamism. I hope you find it useful too.
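Worth noting for readers on Python 3.4 and later: unittest also grew subTest, a lighter-weight middle ground between the two layers above. Subtests don't become separately named tests (so filtering and sharding don't see them), but failures are still reported per data item. A minimal sketch:

```python
import unittest

class SubTestExample(unittest.TestCase):
    def test_pairs(self):
        # Each dict entry is checked as its own subtest; a failure in one
        # pair is reported with its name without aborting the others.
        testsmap = {'foo': [1, 1], 'baz': [5, 5]}
        for name, (a, b) in testsmap.items():
            with self.subTest(name=name):
                self.assertEqual(a, b, name)
```

Use dynamic methods or classes when you need per-item selection and parallelism; use subTest when a single loop with labeled failures is enough.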
April 25, 2014
· 11,825 Views · 2 Likes
Clearing the Database with Django Commands
In a previous post, I presented a method of loading initial data into a Django database by using a custom management command. An accompanying task is cleaning the database up. Here I want to discuss a few options for doing that. First, some general design notes on Django management commands. If you run manage.py help you’ll see a whole bunch of commands starting with sql. These all share a common idiom – print SQL statements to the standard output. Almost all DB engines have means to pipe commands from the standard input, so this plays great with the Unix philosophy of building pipes of single-task programs. Django even provides a convenient shortcut for us to access the actual DB that’s being used with a given project – the dbshell command. As an example, we have the sqlflush command, which returns a list of the SQL statements required to return all tables in the database to the state they were in just after they were installed. In a simple blog-like application with "post" and "tag" models, it may return something like:

```
$ python manage.py sqlflush
BEGIN;
DELETE FROM "auth_permission";
DELETE FROM "auth_group";
DELETE FROM "django_content_type";
DELETE FROM "django_session";
DELETE FROM "blogapp_tag";
DELETE FROM "auth_user_groups";
DELETE FROM "auth_group_permissions";
DELETE FROM "auth_user_user_permissions";
DELETE FROM "blogapp_post";
DELETE FROM "blogapp_post_tags";
DELETE FROM "auth_user";
DELETE FROM "django_admin_log";
COMMIT;
```

Note there’s a lot of tables here, because the project also installed the admin and auth applications from django.contrib. We can actually execute these SQL statements, and thus wipe out all the DB tables in our database, by running:

```
$ python manage.py sqlflush | python manage.py dbshell
```

For this particular sequence, since it’s so useful, Django has a special built-in command named flush. But there’s a problem with running flush that may or may not bother you, depending on what your goals are.
It wipes out all tables, and this means authentication data as well. So if you’ve created a default admin user when jump-starting the application, you’ll have to re-create it now. Perhaps there’s a more gentle way to delete just your app’s data, without messing with the other apps? Yes. In fact, I’m going to show a number of ways. First, let’s see what other existing management commands have to offer. sqlclear will emit the commands needed to drop all tables in a given app. For example:

```
$ python manage.py sqlclear blogapp
BEGIN;
DROP TABLE "blogapp_tag";
DROP TABLE "blogapp_post";
DROP TABLE "blogapp_post_tags";
COMMIT;
```

So we can use it to target a specific app, rather than using the kill-all approach of flush. There’s a catch, though. While flush runs delete to wipe all data from the tables, sqlclear removes the actual tables. So in order to be able to work with the database, these tables have to be re-created. Worry not, there’s a command for that:

```
$ python manage.py sql blogapp
BEGIN;
CREATE TABLE "blogapp_post_tags" (
    "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
    "post_id" integer NOT NULL REFERENCES "blogapp_post" ("id"),
    "tag_id" varchar(50) NOT NULL REFERENCES "blogapp_tag" ("name"),
    UNIQUE ("post_id", "tag_id")
);
CREATE TABLE "blogapp_post" (
    "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
    <.......>
);
CREATE TABLE "blogapp_tag" (
    <.......>
);
COMMIT;
```

So here’s a first way to do a DB cleanup: pipe sqlclear appname into dbshell. Then pipe sql appname into dbshell. An alternative way, which I like less, is to take the subset of DELETE statements generated by sqlflush, save them in a text file, and pipe it through to dbshell when needed.
For example, for the blog app discussed above, these statements should do it:

```
BEGIN;
DELETE FROM "blogapp_tag";
DELETE FROM "blogapp_post";
DELETE FROM "blogapp_post_tags";
COMMIT;
```

The reason I don’t like it is that it forces you to have explicit table names stored somewhere, which is a duplication of the existing models. If you happen to change some of your foreign keys, for example, tables will need changing, so this file will have to be regenerated. The approach I like best is more programmatic. Django’s model API is flexible and convenient, and we can just use it in a custom management command:

```python
from django.core.management.base import BaseCommand
from blogapp.models import Post, Tag

class Command(BaseCommand):
    def handle(self, *args, **options):
        Tag.objects.all().delete()
        Post.objects.all().delete()
```

Save this code as blogapp/management/commands/clear_models.py, and now it can be invoked with:

```
$ python manage.py clear_models
```
March 24, 2014
· 18,693 Views
Classical Inheritance in JavaScript ES5
JavaScript’s prototype-based inheritance is interesting and has its uses, but sometimes one just wants to express classical inheritance, familiar from C++ and Java. This need has been recognized by the ECMAScript committee, and classes are being discussed for inclusion in the next version of the standard. It was surprisingly hard for me to find a good and simple code sample that shows how to cleanly and correctly express inheritance with ES5 (a lot of links discuss how to implement the pre-ES5 tools required for that) and explains why the thing works. Mozilla’s Object.create reference came close, but not quite there, because it still left some open questions. Hence this short post. Without further ado, the following code defines a parent class named Shape with a constructor and a method, and a derived class named Circle that has its own method:

```javascript
// Shape - superclass
// x,y: location of shape's bounding rectangle
function Shape(x, y) {
  this.x = x;
  this.y = y;
}

// Superclass method
Shape.prototype.move = function(x, y) {
  this.x += x;
  this.y += y;
}

// Circle - subclass
function Circle(x, y, r) {
  // Call constructor of superclass to initialize superclass-derived members.
  Shape.call(this, x, y);

  // Initialize subclass's own members
  this.r = r;
}

// Circle derives from Shape
Circle.prototype = Object.create(Shape.prototype);
Circle.prototype.constructor = Circle;

// Subclass methods. Add them after Circle.prototype is created with
// Object.create
Circle.prototype.area = function() {
  return Math.PI * this.r * this.r;
}
```

The most interesting part here, the one that actually performs the feat of inheritance, is these two lines, so I’ll explain them a bit:

```javascript
Circle.prototype = Object.create(Shape.prototype);
Circle.prototype.constructor = Circle;
```

The first line is the magic – it sets up the prototype chain. To understand it, you must first understand that "the prototype of an object" and "the .prototype property of an object" are different things.
If you don’t, go read up on that a bit. The first line, interpreted very technically, says: the prototype of new objects created with the Circle constructor is an object whose prototype is the prototype of objects created by the Shape constructor. Yeah, that’s a handful. But it can be simplified as: each Circle has a Shape as its prototype. What about the second line? While not strictly necessary, it’s there to preserve some useful invariants, as we’ll see below. Since the assignment to Circle.prototype kills the existing Circle.prototype.constructor (which was set to Circle when the Circle constructor was created), we restore it. Let’s whip up a JavaScript console and load that code inside, to quickly try some stuff:

```
> var shp = new Shape(1, 2)
undefined
> [shp.x, shp.y]
[1, 2]
> shp.move(1, 1)
undefined
> [shp.x, shp.y]
[2, 3]
```

… but we’re here for the circles:

```
> var cir = new Circle(5, 6, 2)
undefined
> [cir.x, cir.y, cir.r]
[5, 6, 2]
> cir.move(1, 1)
undefined
> [cir.x, cir.y, cir.r]
[6, 7, 2]
> cir.area()
12.566370614359172
```

So far so good: a Circle initialized itself correctly using the Shape constructor; it responds to the methods inherited from Shape, and to its own area method too. Let’s check that the prototype shenanigans worked as expected:

```
> var shape_proto = Object.getPrototypeOf(shp)
undefined
> var circle_proto = Object.getPrototypeOf(cir)
undefined
> Object.getPrototypeOf(circle_proto) === shape_proto
true
```

Great.
Now let’s see what instanceof has to say:

```
> cir instanceof Shape
true
> cir instanceof Circle
true
> shp instanceof Shape
true
> shp instanceof Circle
false
```

Finally, here are some things we can do with the constructor property that wouldn’t have been possible had we not preserved it:

```
> cir.constructor === Circle
true
// Create a new Circle object based on an existing Circle instance
> var new_cir = new cir.constructor(3, 4, 1.5)
undefined
> new_cir
Circle {x: 3, y: 4, r: 1.5, constructor: function, area: function}
```

A lot of existing code (and programmers) expects the constructor property of objects to point back to the constructor function used to create them with new. In addition, it is sometimes useful to be able to create a new object of the same class as an existing object, and here as well the constructor property is useful. So that is how we express classical inheritance in JavaScript. It is very explicit, and hence on the long-ish side. Hopefully the future ES standards will provide nice sugar for succinct class definitions.
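For readers arriving here after that sugar landed: ES2015 did add class syntax, which desugars to essentially the prototype wiring shown above (including the constructor property fix-up). A sketch of the same hierarchy:

```javascript
// The Shape/Circle hierarchy from above in ES2015 class syntax.
class Shape {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
  move(x, y) {
    this.x += x;
    this.y += y;
  }
}

class Circle extends Shape {
  constructor(x, y, r) {
    super(x, y);  // plays the role of Shape.call(this, x, y)
    this.r = r;
  }
  area() {
    return Math.PI * this.r * this.r;
  }
}
```

With this form, `extends` sets up the prototype chain and `constructor` is preserved automatically, so the two "magic" lines are no longer needed.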
October 24, 2013
· 7,961 Views · 1 Like
Displaying all argv in x64 assembly
Recently I’ve been doing some x64 assembly hacking, and something I had to Google a bit and collect from a few places is how to go over all command-line arguments (colloquially known as argv from C) and do something with them. I already discussed how arguments get passed into a program in the past (not the C main, mind you, but rather the real entry point of a program – _start), so what was left is just a small matter of implementation. Here it is, in GNU Assembly (gas) syntax for Linux. This is pure assembly code – it does not use the C standard library or runtime at all. It demonstrates a lot of interesting concepts such as reading command-line arguments, issuing Linux system calls and string processing.

```
#---------------- DATA ----------------#
.data

# We need buf_for_itoa to be large enough to contain a 64-bit integer.
# endbuf_for_itoa will point to the end of buf_for_itoa and is useful
# for passing to itoa.
.set BUFLEN, 32
buf_for_itoa: .space BUFLEN, 0x0
.set endbuf_for_itoa, buf_for_itoa + BUFLEN - 1

newline_str: .asciz "\n"
argc_str:    .asciz "argc: "

#---------------- CODE ----------------#
.globl _start
.text

_start:
    # On entry to _start, argc is in (%rsp), argv[0] in 8(%rsp),
    # argv[1] in 16(%rsp) and so on.
    lea argc_str, %rdi
    call print_cstring

    mov (%rsp), %r12            # save argc in r12

    # Convert the argc value to a string and print it out
    mov %r12, %rdi
    lea endbuf_for_itoa, %rsi
    call itoa
    mov %rax, %rdi
    call print_cstring
    lea newline_str, %rdi
    call print_cstring

    # In a loop, pick argv[n] for 0 <= n < argc and print it out,
    # followed by a newline. r13 holds n.
    xor %r13, %r13
.L_argv_loop:
    mov 8(%rsp, %r13, 8), %rdi  # argv[n] is in (rsp + 8 + 8*n)
    call print_cstring
    lea newline_str, %rdi
    call print_cstring
    inc %r13
    cmp %r12, %r13
    jl .L_argv_loop

    # exit(0)
    mov $60, %rax
    mov $0, %rdi
    syscall
```

This code uses a couple of support functions. The first is print_cstring:

```
# Function print_cstring
# Print a null-terminated string to stdout.
# Arguments:
#   rdi     address of string
# Returns: void
print_cstring:
    # Find the terminating null
    mov %rdi, %r10
.L_find_null:
    cmpb $0, (%r10)
    je .L_end_find_null
    inc %r10
    jmp .L_find_null
.L_end_find_null:
    # r10 points to the terminating null, so r10-rdi is the length
    sub %rdi, %r10

    # Now that we have the length, we can call sys_write
    # sys_write(unsigned fd, char* buf, size_t count)
    mov $1, %rax
    # Populate address of string into rsi first, because the later
    # assignment of fd clobbers rdi.
    mov %rdi, %rsi
    mov $1, %rdi
    mov %r10, %rdx
    syscall
    ret
```

More interestingly, here is itoa. It’s a bit more general than what I actually use in the main program because it also supports negative numbers. It can convert any number that fits into a 64-bit register. Note the unusual API for receiving and returning the place where the actual string is written. Since it’s very natural for an itoa implementation to emit the digits in reverse, I wanted to avoid actual string reversing by writing the digits into a buffer from the end towards the beginning.

```
# Function itoa
# Convert an integer to a null-terminated string in memory.
# Assumes that there is enough space allocated in the target
# buffer for the representation of the integer. Since the number itself
# is accepted in the register, its value is bounded.
# Arguments:
#   rdi:    the integer
#   rsi:    address of the *last* byte in the target buffer
# Returns:
#   rax:    address of the first byte in the target string that
#           contains valid information.
itoa:
    movb $0, (%rsi)             # Write the terminating null and advance.
    dec %rsi

    # If the input number is negative, we mark it by placing 1 into r9
    # and negate it. In the end we check if r9 is 1 and add a '-' in front.
    mov $0, %r9
    cmp $0, %rdi
    jge .L_input_positive
    neg %rdi
    mov $1, %r9
.L_input_positive:

    mov %rdi, %rax              # Place the number into rax for the division.
    mov $10, %r8                # The base is in r8

.L_next_digit:
    # Prepare rdx:rax for division by clearing rdx. rax remains from the
    # previous div. rax will be rax / 10, rdx will be the next digit to
    # write out.
    xor %rdx, %rdx
    div %r8

    # Write the digit to the buffer, in ascii
    dec %rsi
    add $0x30, %dl
    movb %dl, (%rsi)

    cmp $0, %rax                # We're done when the quotient is 0.
    jne .L_next_digit

    # If we marked in r9 that the input is negative, it's time to add that
    # '-' in front of the output.
    cmp $1, %r9
    jne .L_itoa_done
    dec %rsi
    movb $0x2d, (%rsi)
.L_itoa_done:
    mov %rsi, %rax              # rsi points to the first byte now; return it.
    ret
```

Some notes about the code: GAS vs. Intel syntax: I used to believe the Intel syntax is better looking, but grew to tolerate GAS because it’s the default used by tools on Linux. After a very short time you get used to it and don’t really mind it any longer. Yes, even the weird indirect addressing syntax (mov 8(%rsp, %r13, 8), %rdi) grows on you. In other words, focus on the code, not syntax. I could pick any representation for strings, but ended up going with the C-like null-terminated strings. If you look carefully at print_cstring you’ll notice that a length-prefix representation could be better, since the write system call doesn’t care about the null and wants the length passed explicitly. However, since real-life assembly code often does have to inter-operate with C, null-terminated strings make more sense. Even though my own functions could use any calling convention, I’m sticking with the System V AMD64 ABI. It’s natural because system calls use it as well w.r.t. argument and return value passing. AFAIU they can also clobber scratch registers, so care must be taken to preserve information in registers around system calls. Related posts: Creating a tiny ‘Hello World’ executable in assembly
July 25, 2013
· 7,829 Views
Shared Counter with Python’s Multiprocessing
One of the methods of exchanging data between processes with the multiprocessing module is directly shared memory via multiprocessing.Value. As with any method that’s very general, it can sometimes be tricky to use. I’ve seen a variation of this question asked a couple of times on StackOverflow: I have some processes that do work, and I want them to increment some shared counter because [... some irrelevant reason ...] – how can this be done?

The wrong way

And surprisingly enough, some answers given to this question are wrong, since they use multiprocessing.Value incorrectly, as follows:

```python
import time
from multiprocessing import Process, Value

def func(val):
    for i in range(50):
        time.sleep(0.01)
        val.value += 1

if __name__ == '__main__':
    v = Value('i', 0)
    procs = [Process(target=func, args=(v,)) for i in range(10)]

    for p in procs: p.start()
    for p in procs: p.join()

    print(v.value)
```

This code is a demonstration of the problem, distilling only the usage of the shared counter. A "pool" of 10 processes is created to run the func function. All processes share a Value and increment it 50 times. You would expect this code to eventually print 500, but in all likelihood it won’t. Here’s some output taken from 10 runs of that code:

```
> for i in {1..10}; do python sync_nolock_wrong.py; done
435
464
484
448
491
481
490
471
497
494
```

Why does this happen? I must admit that the documentation of multiprocessing.Value can be a bit confusing here, especially for beginners. It states that by default, a lock is created to synchronize access to the value, so one may be falsely led to believe that it would be OK to modify this value in any way imaginable from multiple processes. But it’s not.

Explanation – the default locking done by Value

This section is advanced and isn’t strictly required for the overall flow of the post. If you just want to understand how to synchronize the counter correctly, feel free to skip it. The locking done by multiprocessing.Value is very fine-grained.
Value is a wrapper around a ctypes object, which has an underlying value attribute representing the actual object in memory. All Value does is ensure that only a single process or thread may read or write this value attribute simultaneously. This is important, since (for some types, on some architectures) writes and reads may not be atomic. I.e. to actually fill up the object’s memory, the CPU may need several instructions, and another process reading the same (shared) memory at the same time could see some intermediate, invalid state. The built-in lock of Value prevents this from happening. However, when we do this:

```python
val.value += 1
```

What Python actually performs is the following (disassembled bytecode with the dis module). I’ve annotated the locking done by Value in #<-- comments:

```
 0 LOAD_FAST        0 (val)
 3 DUP_TOP                      #<--- Value lock acquired
 4 LOAD_ATTR        0 (value)   #<--- Value lock released
 7 LOAD_CONST       1 (1)
10 INPLACE_ADD
11 ROT_TWO                      #<--- Value lock acquired
12 STORE_ATTR       0 (value)   #<--- Value lock released
```

So it’s obvious that while process #1 is now at instruction 7 (LOAD_CONST), nothing prevents process #2 from also loading the (old) value attribute and being on instruction 7 too. Both processes will proceed incrementing their private copy and writing it back. The result: the actual value got incremented only once, not twice.

The right way

Fortunately, this problem is very easy to fix.
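You can reproduce that disassembly yourself with the dis module. The exact opcode names vary by interpreter version (the listing above is from Python 2; newer Pythons use different opcodes for the in-place add), but the separate attribute load and store around the addition are always visible, and that gap is the race window:

```python
import dis

def increment(val):
    val.value += 1

# List the opcode names for the increment; whatever the version, a
# LOAD_ATTR and a STORE_ATTR bracket the addition, and nothing makes
# the three steps atomic across processes.
ops = [ins.opname for ins in dis.get_instructions(increment)]
print(ops)
```

Seeing LOAD_ATTR and STORE_ATTR as distinct instructions is exactly why Value's per-access lock is not enough for read-modify-write.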
A separate Lock is needed to guarantee the atomicity of modifications to the Value:

```python
import time
from multiprocessing import Process, Value, Lock

def func(val, lock):
    for i in range(50):
        time.sleep(0.01)
        with lock:
            val.value += 1

if __name__ == '__main__':
    v = Value('i', 0)
    lock = Lock()
    procs = [Process(target=func, args=(v, lock)) for i in range(10)]

    for p in procs: p.start()
    for p in procs: p.join()

    print(v.value)
```

Now we get the expected result:

```
> for i in {1..10}; do python sync_lock_right.py; done
500
500
500
500
500
500
500
500
500
500
```

A value and a lock may seem like too much baggage to carry around at all times. So, we can create a simple "synchronized shared counter" object to encapsulate this functionality:

```python
import time
from multiprocessing import Process, Value, Lock

class Counter(object):
    def __init__(self, initval=0):
        self.val = Value('i', initval)
        self.lock = Lock()

    def increment(self):
        with self.lock:
            self.val.value += 1

    def value(self):
        with self.lock:
            return self.val.value

def func(counter):
    for i in range(50):
        time.sleep(0.01)
        counter.increment()

if __name__ == '__main__':
    counter = Counter(0)
    procs = [Process(target=func, args=(counter,)) for i in range(10)]

    for p in procs: p.start()
    for p in procs: p.join()

    print(counter.value())
```

Bonus: since we’ve now placed a more coarse-grained lock on the modification of the value, we may throw away Value with its fine-grained lock altogether, and just use multiprocessing.RawValue, which simply wraps a shared object without any locking.

Source: http://eli.thegreenplace.net/2012/01/04/shared-counter-with-pythons-multiprocessing/
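A sketch of that bonus suggestion, with the Counter rebuilt on RawValue (the class and function names here are mine, for illustration):

```python
from multiprocessing import Process, RawValue, Lock

class RawCounter(object):
    # Same synchronized counter, but built on RawValue: since our own
    # Lock already serializes every access, Value's built-in per-access
    # lock would be redundant overhead.
    def __init__(self, initval=0):
        self.val = RawValue('i', initval)
        self.lock = Lock()

    def increment(self):
        with self.lock:
            self.val.value += 1

    def value(self):
        with self.lock:
            return self.val.value

def worker(counter, n):
    for _ in range(n):
        counter.increment()

if __name__ == '__main__':
    counter = RawCounter(0)
    procs = [Process(target=worker, args=(counter, 50)) for _ in range(10)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value())
```

The invariant is unchanged: every read and write of the shared memory happens under the one coarse lock, so the counter stays consistent without Value's extra layer.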
January 17, 2012
· 15,432 Views

