Getting the Most Out of Your Django Database
Join the DZone community and get the full member experience.
Join For FreeDatabase performance is a crucial factor in web application
performance, and can mean the difference between a responsive web
application and a slow one. Here, we summarise methods for identifying
database performance issues, and how to approach fixing them.
Benchmarking Overall Performance
First, it is worth establishing whether database queries are a
performance bottleneck, or whether you should be focusing your efforts
on something else. There are a number of ways to do this, and two that
I’ve found simple and effective are django-snippetscream
and django-debug-toolbar
.
Install django-snippetscream
:
pip install django-snippetscream
and add the following middleware class to settings.py
:
MIDDLEWARE_CLASSES = MIDDLEWARE_CLASSES + ('snippetscream.ProfileMiddleware',)
Now, you can simply append
?prof
to your application URLs
to profile the code run to generate the page. This gives a quick way of
telling whether any particular methods are consuming disproportionate
resources. Sample output is shown below.72063 function calls (68833 primitive calls) in 0.267 seconds Ordered by: internal time, call count ncalls tottime percall cumtime percall filename:lineno(function) 8084 0.034 0.000 0.034 0.000 env/lib/python2.7/posixpath.py:60(join) 8515 0.031 0.000 0.039 0.000 env/lib/python2.7/posixpath.py:130(islink)
See the snippetscream documentation for more details.
The django-debug-toolbar is another easy-to-use profiler (and has many other functions). To install:
pip install django-debug-toolbar
And add the following to your settings.py
:
MIDDLEWARE_CLASSES = MIDDLEWARE_CLASSES + ('debug_toolbar.middleware.DebugToolbarMiddleware',) INSTALLED_APPS = INSTALLED_APPS + ('debug_toolbar',) INTERNAL_IPS = ('127.0.0.1',)
Now, when you visit your site in a browser, the SQL query view will display queries that have been executed and the time taken.
So, you can now judge for yourself whether your application is slow
executing queries or if there is some other performance bottleneck. To
deal with database issues, read on.
A Closer Look at SQL Query Generation in Django
Now that you have a couple of tools by which to measure performance, let’s look at some examples of optimising database performance. We’ll use the following simple models for our discussion.
class Article(models.Model): title = models.CharField(max_length=255) owner = models.ForeignKey(User) tags = models.ManyToManyField('Tag') content = models.TextField() class Tag(models.Model): name = models.CharField(max_length=255)
Fetching Multiple Models at Once
For our first example, assume we want to print a list of articles with the author’s name. We could do the following:
for article in Article.object.all(): print article.title print article.owner.name
How many SQL queries would this generate? Let’s use debugsqlshell
:
SELECT `testapp_article`.`id`, `testapp_article`.`owner_id`, `testapp_article`.`title`, `testapp_article`.`content` FROM `testapp_article` [0.13ms] SELECT `auth_user`.`id`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`password`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`is_superuser`, `auth_user`.`last_login`, `auth_user`.`date_joined` FROM `auth_user` WHERE `auth_user`.`id` = 1 [0.84ms] etc...
For each article, there will be a separate query to retrieve details about the user. We can avoid this by using select_related
, which tells
Django’s ORM to select related models in the same query:
for article in Article.objects.all().select_related(): print article.owner.id SELECT `testapp_article`.`id`, `testapp_article`.`owner_id`, `testapp_article`.`title`, `testapp_article`.`content`, `auth_user`.`id`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`password`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`is_superuser`, `auth_user`.`last_login`, `auth_user`.`date_joined` FROM `testapp_article` INNER JOIN `auth_user` ON (`testapp_article`.`owner_id` = `auth_user`.`id`) [0.39ms]
Here, we see that the article’s owner is retrieved using the JOIN
clause. So in this example, use of select_related
has halved the number of queries executed.
One thing that should be remembered about select_related
is that it does not work for many-to-many fields.
In our example, that means that article tags would not be fetched in a single query.
For these model attributes, there is a new method included in Django 1.4 called prefetch_related
. Django 1.4 is currently an alpha release, download it here.
Let’s look at how prefetch_related
reduces the number of
queries with our example classes. First, observe what happens if we
access article tags without prefetching:
for article in Article.objects.all().select_related(): print article.tags.all() ...: SELECT `testapp_article`.`id`, `testapp_article`.`owner_id`, `testapp_article`.`content` FROM `testapp_article` [0.50ms] SELECT `testapp_tag`.`id`, `testapp_tag`.`name` FROM `testapp_tag` INNER JOIN `testapp_article_tags` ON (`testapp_tag`.`id` = `testapp_article_tags`.`tag_id`) WHERE `testapp_article_tags`.`article_id` = 1 LIMIT 21 [1.36ms] [<Tag: Tag object>, <Tag: Tag object>] SELECT `testapp_tag`.`id`, `testapp_tag`.`name` FROM `testapp_tag` INNER JOIN `testapp_article_tags` ON (`testapp_tag`.`id` = `testapp_article_tags`.`tag_id`) WHERE `testapp_article_tags`.`article_id` = 2 LIMIT 21 [0.36ms] [<Tag: Tag object>, <Tag: Tag object>] SELECT `testapp_tag`.`id`, `testapp_tag`.`name` FROM `testapp_tag` INNER JOIN `testapp_article_tags` ON (`testapp_tag`.`id` = `testapp_article_tags`.`tag_id`) WHERE `testapp_article_tags`.`article_id` = 3 LIMIT 21 [0.36ms]
Separate queries are generated to retrieve the tags. And now with tag prefetching:
for article in Article.objects.all().prefetch_related('tags'): print article.tags.all() ...: SELECT `testapp_article`.`id`, `testapp_article`.`owner_id`, `testapp_article`.`content` FROM `testapp_article` [0.21ms] SELECT (`testapp_article_tags`.`article_id`) AS `_prefetch_related_val`, `testapp_tag`.`id`, `testapp_tag`.`name` FROM `testapp_tag` INNER JOIN `testapp_article_tags` ON (`testapp_tag`.`id` = `testapp_article_tags`.`tag_id`) WHERE `testapp_article_tags`.`article_id` IN (1, 2, 3) [121.34ms]
As we can see, all tags are fetched in a singlee query using an IN
statement. So, select_related
and prefetch_related
are two effective ways of reducing the number of queries needed for accessing models.
Other Techniques for Reducing Generated Queries
If you find that your models are too complex to benefit from the above methods, you can always start writing your own SQL. Of course, this comes with the usual disclaimers: your code may be less portable between databases, and may be harder to maintain.
One method of doing this is to use the extra()
method on a queryset like so:
for article in Article.objects.all().extra('raw SQL here')
This can be useful if you need, for example, a nested SELECT
clause. The other option is to use raw queries, which are documented here.
One last point worth mentioning is to be careful with the use of iterator()
,
which loads only a subset of a query set into memory. If you then
iterate over this queryset, this can generate lots of queries. Here’s a longer discussion.
Conclusion
Hopefully this post has provided a useful discussion of query generation in Django and how you can optimise database access in your application.
I’ll close with a couple of useful resources: some useful tips on writing performant models, and the official django optimization page.
Source: http://blueprintforge.com/blog/2012/01/24/measuring-optimising-database-performance-in-django/
Opinions expressed by DZone contributors are their own.
Comments