During the last weeks I noticed that my blog was becoming slower and slower (and if you visited my blog you might have noticed it as well).
After a bit of troubleshooting on the server (which was kind of painful due to the slow connection via RDC from Europe to my server in the US) I figured out that most of the slowness came from my SQL Server Express instance that was taking up too much memory.
And I found out that I had 3 tables that were extremely big in size:
- Subtext_Log - logs all errors happening on the server
- Subtext_URLs - all the referrals' urls
- Subtext_Referrals - a 1:1 mapping table that counts how many visits per referral
The first was about 20k row in size, and the other two, same size, over 670k rows. It is not a big number by itself, but if you contextualize it on the (virtual) hardware it runs on (a tiny VM instance of half-core and 614Mb of RAM that hosts both IIS and SQL Server) it was a huge number. The overall size of the database was around 350MB.
The size of these tables are pretty easy to explain: each new visit from coming from a link or from a search engine creates a new record (and subsequent just increase the count) and I never cleaned the tables since I launched my blog which received more or less 1150k page views of that kind.
Googling around I also found I'm not the only one with those problems:
- Rob Mensching had the same problem in 2008, and he truncated the 3 tables and also the viewcount table
- Travis Illig, from the subtext team had this problem at least three times, in 2008 when he wrote an automated maintenance procedure and a few days ago as well, when he updated this procedure to work with Subtext 2.5
- Phil Haack had this problem, too, back in 2006 and posted a clean up script that removed from the tables all the urls that come from search engines and other various search pages.
I tried to use the page Travis wrote, but with a the very slow site his page always timed out. I also tried Phil’s approach, cleaning the tables from the search referrals, but, not sure why, the delete statement failed after one hour of execution. So I fell back to the more brutal but quicker and safer "truncate table" approach. And after cleaning all tables I rebuilt the indexes and shrunk the database. From the original 350MB it went down to 30MB, which are much more easy to handle for my tiny VM instance.
I lost the stats about the urls that brought in the most visitors, but I guess I could retrieve the same information with a bit of practice on Google Analytics.
This feature, the referrals’ logging, was introduced in 2003 in the original .Text, when there was not free stats analyzer. But now there is Google Analytics and many other similar services, so I think it's time for a redesign.
A possible solution for this problem might be adding some configuration options to fine tune the behavior of the referrals logging:
- don't track referrals at all
- track only referrals that don't come from search engines (based on regex)
- track referrals only for the latest posts
- automatically purge referrals older than 1 month
- track everything
And also introduce some kind of auto-purge of Log entries after a period of time or when the size reaches a specified number of rows.
If you have Subtext, please comment and let me know what you think about this possible solution to this problem.
Final suggestion: if you have a Subtext-powered blog, periodically check the size of these 3 tables and clean them if they grow too much.