The reason that it has not been done in CPython so far is that it's even more work: we would need to care not only about carefully adding fine-grained locks everywhere, but also about reference counting; and there are a lot more C extension modules that would need care, too. And we don't have locking primitives as performant as Java's, which have been hand-tuned since ages (e.g. to use help from the JIT compiler).
One commenter recognized that approaching this problem using Transactional Memory...
. . . is much, much harder to get right than it looks like in today effectful/imperative languages. Sure, it looks wonderful on paper, but if your language doesn't help you control side-effects it will give you a very hard time. -- gasche
Still, the PyPy team is suggesting that they have a plan for Python developers to use all their cores without ever having to write threads. You can follow further developments and details on the project at the PyPy Status blog.