.NET: Avoid Using System.Uri for Domain Validation
Last time I wrote about System.Uri, the topic was a bug that prevents trailing dots from being used for REST resources. This time the issue is different: what about relying on System.Uri for domain validation?
Originally authored by Rodrigo De Castro
It's not uncommon to see System.Uri being used to validate an input that is supposed to be a domain name. I've seen code like this trying to validate domains:
public static bool IsDomainValid(string name)
{
    try
    {
        // The constructor throws UriFormatException when the host cannot be parsed.
        new Uri("http://" + name);
        return true;
    }
    catch (UriFormatException)
    {
        return false;
    }
}
Or, relying on the Host property in addition to UriFormatException, something like this:
public static bool IsDomainValid(string domainName)
{
    try
    {
        if (StringComparer.OrdinalIgnoreCase.Equals(new Uri("http://" + domainName).Host, domainName))
        {
            return true;
        }
        return false;
    }
    catch (UriFormatException)
    {
        return false;
    }
}
Preliminary tests with this code show that completely wrong domains (like 3@#@@.com) are rejected, so it looks like great code. Best of all, we don't have to write any domain validation ourselves.
Now, what about the following domains?
--------.com
-test.com
test-.com
They are all considered valid by System.Uri. However, according to RFC 1035 or RFC 1123, they are not, because a label may neither start nor end with a hyphen. According to RFC 1035, not even a digit-only domain (like 999.com) is valid, since a label must start with a letter, but System.Uri is fine with all of them.
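If you want to check these claims on your own runtime, a small probe along the following lines may help. It is only a sketch: the UriProbe class and its method names are made up for this illustration, it reuses the Host-comparison approach shown earlier, and no output is listed because the verdicts may differ between .NET versions.

```csharp
using System;
using System.Collections.Generic;

public static class UriProbe
{
    // Same idea as the Host-comparison check: accept the name only if
    // System.Uri parses it and reports it back unchanged as the host.
    public static bool UriAccepts(string domainName)
    {
        try
        {
            var uri = new Uri("http://" + domainName);
            return StringComparer.OrdinalIgnoreCase.Equals(uri.Host, domainName);
        }
        catch (UriFormatException)
        {
            return false;
        }
    }

    // Runs every candidate through UriAccepts and records the verdict.
    public static Dictionary<string, bool> Probe(IEnumerable<string> candidates)
    {
        var results = new Dictionary<string, bool>();
        foreach (var candidate in candidates)
        {
            results[candidate] = UriAccepts(candidate);
        }
        return results;
    }
}
```

Calling UriProbe.Probe(new[] { "--------.com", "-test.com", "test-.com", "999.com" }) and printing each pair shows which of the names your version of System.Uri lets through.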
I played with some of the internal flags and it seems that, if you use E_HostNotCanonical (256), it starts rejecting some of these invalid domain names, but I couldn't work out the rules it follows. And since there are different RFCs and different interpretations of them, it would be really hard for System.Uri to do precise validation unless the caller could specify which RFC the domain is expected to comply with.
At the end of the day, you're better off understanding the RFC you want to comply with and implementing the proper regular expression for that. In my case, I wanted it to be compatible with RFC 1123, so this is the regular expression I started with:
"^(?![0-9]+$)(?!-)[a-zA-Z0-9-]{1,63}(?<!-)$"
I then relaxed it to the following after learning that digit-only labels are accepted by RFC 1123 (there are multiple interpretations, but I read the RFC and was convinced that it was fine):
"^(?!-)[a-zA-Z0-9-]{1,63}(?<!-)$"
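To see how the two patterns differ on individual labels, here is a minimal sketch. LabelRules and its members are hypothetical names chosen for this illustration; the patterns themselves are copied from the article unchanged.

```csharp
using System.Text.RegularExpressions;

public static class LabelRules
{
    // First attempt: also rejects labels made up only of digits.
    public static readonly Regex StrictLabel =
        new Regex(@"^(?![0-9]+$)(?!-)[a-zA-Z0-9-]{1,63}(?<!-)$");

    // Relaxed version: digit-only labels are allowed.
    public static readonly Regex RelaxedLabel =
        new Regex(@"^(?!-)[a-zA-Z0-9-]{1,63}(?<!-)$");

    public static bool IsStrictLabel(string label)
    {
        return label != null && StrictLabel.IsMatch(label);
    }

    public static bool IsRelaxedLabel(string label)
    {
        return label != null && RelaxedLabel.IsMatch(label);
    }
}
```

Both checks reject "-test", "test-" and "--------", and both accept "test". They disagree only on a label such as "999", which the strict pattern rejects and the relaxed one accepts.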
This regular expression applies to each domain label (the text between the dots). On its own it is not sufficient for the rightmost label, which additionally must not start with a digit, so that a domain name can be told apart from an IP address.
Also, this regular expression says nothing about the overall length, so you need an explicit check that the entire domain is at most 255 characters long.
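Putting the pieces together, a whole-domain check might look like the sketch below. DomainValidator and IsValidDomain are hypothetical names, and the sketch assumes that the rightmost label follows the same pattern plus the no-leading-digit rule. One deliberate deviation from the pattern above: the final $ is replaced by \z, because in .NET $ also matches just before a trailing newline.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DomainValidator
{
    // Relaxed per-label pattern from the article, anchored with \z instead of $
    // so that a label followed by a newline is not accepted.
    private static readonly Regex Label =
        new Regex(@"^(?!-)[a-zA-Z0-9-]{1,63}(?<!-)\z");

    public static bool IsValidDomain(string domain)
    {
        // Explicit length check for the entire domain.
        if (string.IsNullOrEmpty(domain) || domain.Length > 255)
        {
            return false;
        }

        string[] labels = domain.Split('.');
        foreach (string label in labels)
        {
            // Empty labels (as in "a..com") fail here too.
            if (!Label.IsMatch(label))
            {
                return false;
            }
        }

        // The rightmost label must not start with a digit, which keeps
        // IP-address-like input such as "192.168.0.1" out.
        string last = labels[labels.Length - 1];
        return !char.IsDigit(last[0]);
    }
}
```

With this sketch, IsValidDomain("test.com") and IsValidDomain("999.com") return true, while "-test.com", "test-.com", "--------.com" and "192.168.0.1" are rejected. A name written with a trailing dot is rejected as well, because the split leaves an empty last label; strip the dot first if you need to accept that form.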
Source: http://blog.sacaluta.com/2011/11/net-do-not-use-systemuri-for-domain.html