So, last night I was working on my URL shortening website (jmbk.nl) and the application that generates the short URLs and redirects existing ones to the actual URLs. And for some unknown reason, the redirections just weren’t working on the actual website. I’d get server errors (even “404-Not Found” server errors) and my redirection ASPX page just didn’t seem to get called. It was, to be polite, a mess. And, because it was late at night and I was tired, I was flailing around trying stupid stuff to see what stuck, and nothing would.
This morning I was more calm and collected and finally worked out what the issue was. Trouble is, when laid out like this, it sounds ruddy obvious and I should not have had a problem in the first place.
Let’s briefly explain how it all works. The application has a routine that takes a long URL (such as http://blog.boyet.com/blog/pcplus/pcplus-285-calculating-pi/), converts it to a short URL (like http://jmbk.nl/d3T7A), and stores the pair in a database. That bit is easy. Now the fun stuff: the short URL does not actually exist. When you click on it, magic happens, a 404 error is raised and my redirection page is supposed to gain control, work out what short URL was used, and convert it back to the long URL (essentially, it just looks it up in the database). It then does a Response.Redirect to that URL. To the end-user, it is as if he just clicked on a short URL and got the actual web page.
The problem is in that phrase “magic happens”. I was assuming one thing and some entirely different other thing was happening.
This is what I was assuming. I had a section in my web.config that said:
    <customErrors mode="On">
      <error statusCode="404" redirect="Redirection.aspx" />
    </customErrors>
(This is in the system.web section.) I was reading this as “Got a 404? Go to Redirection.aspx”. Right? Wrong. All right, it’s partially correct but the bit I was assuming was correct was hopelessly wrong.
The other thing to point out is that the jmbk.nl website is being run on GoDaddy. It is hosted on a Windows server running IIS7 and ASP.NET 3.5. I’d also set the “404 Error Behavior” option on the hosting dashboard to the same ASPX page, although part of me was wondering why. Surely my web.config setting was enough?
Therein lies the rub. No, it isn’t. You see, there are two different ways a 404 error could be generated. My issue was I thought that there was only one.
First things first: when you try and follow a link to your domain, it is IIS that will get the request and decide what to do. (I’m assuming my case here, obviously I should say “the web server” to be more generic.) Part of IIS’ configuration is a list of file extensions and what to do with those extensions. GoDaddy have got their IIS set up such that if you request an ASPX page (or ASHX, ASMX, etc, etc), IIS will just pass the whole request over to ASP.NET and let it do with it what it will.
So, if I request http://jmbk.nl/DoesNotExist.aspx, IIS will pass it on to ASP.NET, and ASP.NET will work out that that ASPX page is bogus; it does not exist. It’s a 404. ASP.NET then goes to the web.config file, finds out what I wanted it to do with 404 errors and then redirects the request to http://jmbk.nl/Redirection.aspx?aspxerrorpath=/DoesNotExist.aspx (note that ASP.NET passes the site-relative path of the missing page, not the full URL). My redirection page gains control, works out what was originally requested (using Request.QueryString["aspxerrorpath"]) and can do some valuable work with it.
But what if the original request was not for an obvious ASP.NET page? Say, for a short URL like http://jmbk.nl/d3T7A? In this case, IIS will not pass the request on to ASP.NET. Instead it tries to find that page itself. If it can’t find the page, it’s a 404 again, but this time generated by IIS itself. IIS looks to its own custom error page list to see what to do with a 404. The way I’ve set up the hosting of this website with GoDaddy, the custom error page is the same Redirection.aspx page. IIS redirects the request (and, yes, it gets transferred to ASP.NET). This time, however, the redirected request is very different: http://jmbk.nl/Redirection.aspx?404;http://jmbk.nl:80/d3T7A.
Notice a couple of things: there is no aspxerrorpath key, and in fact the query string (the bit after the question mark) is not even a name/value dictionary. There are no key=value pairs to be seen anywhere; Request.QueryString["aspxerrorpath"] is going to get you nothing.
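To make the difference concrete, here is a small sketch that parses both kinds of query string. It uses HttpUtility.ParseQueryString as a stand-in for what Request.QueryString does (an assumption on my part that the two parse the same way; the class name QueryStringDemo is made up for the illustration), and it needs a reference to System.Web.

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

public static class QueryStringDemo
{
    public static void Main()
    {
        // Query string as produced by the ASP.NET customErrors redirect.
        NameValueCollection aspNetStyle =
            HttpUtility.ParseQueryString("aspxerrorpath=/DoesNotExist.aspx");

        // Query string as produced by the IIS custom error page.
        NameValueCollection iisStyle =
            HttpUtility.ParseQueryString("404;http://jmbk.nl:80/d3T7A");

        // The first lookup finds the key; the second finds no such key.
        Console.WriteLine(aspNetStyle["aspxerrorpath"] ?? "(null)");
        Console.WriteLine(iisStyle["aspxerrorpath"] ?? "(null)");
    }
}
```

The first line printed is /DoesNotExist.aspx and the second is (null): the IIS-style string simply has no aspxerrorpath entry to look up.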
The solution is to either have two different redirection pages, one for ASP.NET 404s and one for IIS 404s with different processing, or to use a different way to get at the query string. In the end, I went for the second option and used Request.ServerVariables["QUERY_STRING"] to get the entire query string. Once I had that it was pretty easy to extract the non-existing URL, no matter how the query string had been created.
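Something along these lines would do the job. This is a minimal sketch rather than the exact code running on jmbk.nl: the class and method names (MissingUrlParser, ExtractMissingUrl, GetShortCode) are invented for the example, and it assumes the query string only ever takes one of the two shapes described above.

```csharp
using System;

public static class MissingUrlParser
{
    // Returns the URL (or path) that caused the 404, or null if the
    // query string matches neither of the two known shapes.
    public static string ExtractMissingUrl(string rawQuery)
    {
        if (string.IsNullOrEmpty(rawQuery))
            return null;

        string query = rawQuery.TrimStart('?');

        // IIS shape: "404;http://jmbk.nl:80/d3T7A"
        const string iisPrefix = "404;";
        if (query.StartsWith(iisPrefix, StringComparison.Ordinal))
            return query.Substring(iisPrefix.Length);

        // ASP.NET shape: "aspxerrorpath=/DoesNotExist.aspx"
        // (the value may arrive percent-encoded, so decode it).
        const string aspNetPrefix = "aspxerrorpath=";
        if (query.StartsWith(aspNetPrefix, StringComparison.OrdinalIgnoreCase))
            return Uri.UnescapeDataString(query.Substring(aspNetPrefix.Length));

        return null;
    }

    // Takes whatever follows the last '/' as the short code to look up.
    public static string GetShortCode(string missingUrl)
    {
        if (string.IsNullOrEmpty(missingUrl))
            return null;

        int slash = missingUrl.LastIndexOf('/');
        return slash >= 0 ? missingUrl.Substring(slash + 1) : missingUrl;
    }
}
```

In the redirection page, the call would look something like MissingUrlParser.ExtractMissingUrl(Request.ServerVariables["QUERY_STRING"]), followed by GetShortCode on the result and the database lookup. For the IIS example above that yields d3T7A.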
As I said, it all sounds very obvious describing it like this, but I was seriously confused about it all last night. My research showed me that I’m not the only person so confused.