|
How to parse domain name?
07-18-2008, 07:10 PM
|
How to parse domain name?
|
Posts: 5,662
Name: John Alexander
|
What is domain.co.uk? Isn't .co the code for Colombia? Are you allowed to have two country codes in a row?
Given a URL, how does a web browser know which part is the domain name and which part is the extension? If the extension were always .com, .org, or .net, it would be easy: find the last dot and take the word before it. But since the extension can be one or two labels (or more?), I'm not sure what the rule is.
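To make the "find the last dot" idea concrete, here is a minimal Python sketch of it. The function name `naive_domain` is made up for this illustration, and this is not a claim about how any browser actually works; it only shows where the simple rule holds and where it breaks.

```python
from urllib.parse import urlsplit


def naive_domain(url: str) -> str:
    """Return the last two dot-separated labels of the URL's host name."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return ".".join(labels[-2:])


# Works when the extension is a single label such as .com
print(naive_domain("http://www.example.com/index.html"))
# Gives the wrong answer when the extension has two labels such as .co.uk
print(naive_domain("http://www.google.co.uk/search/search.asp"))
```

The first call yields example.com, but the second yields co.uk, which is the extension and not the registered name.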
|
|
|
|
07-18-2008, 07:51 PM
|
Re: How to parse domain name?
|
Posts: 41,519
Name: Chris Hirst
Location: Blackpool. UK
|
The ccTLD is .uk only.
The second levels of .co, .org, .gov, .me, .mod and a few others are sub-domains of .uk created and managed by Nominet for specific business areas (theoretically).
There is a .uk sub of .com as well (.uk.com). It is run by CentralNic, not by Verisign, which operates the .com registry itself.
__________________
Chris. ->> Links are advertising NOT optimising!! <<-
A foolish consistency is the hobgoblin of little minds
Thought for today:- Is SEO the only industry where all the cowboys are Indians?
|
|
|
|
07-21-2008, 05:10 AM
|
Re: How to parse domain name?
|
Posts: 38
|
If my memory doesn't fail me, the .co in .co.uk is one of the sub-domains of .uk. In that position it is not the country code for Colombia, which is the separate top-level domain .co.
|
|
|
|
07-21-2008, 05:22 AM
|
Re: How to parse domain name?
|
Posts: 41,519
Name: Chris Hirst
Location: Blackpool. UK
|
|
|
|
|
07-24-2008, 07:53 AM
|
Re: How to parse domain name?
|
Posts: 165
|
Interesting... Never thought about it.
|
|
|
|
07-24-2008, 11:31 AM
|
Re: How to parse domain name?
|
Posts: 3,621
Name: Thierry
Location: I'm the uber Spaminator !
|
Given the structure of the DNS, it makes sense to me.
The address is resolved from right to left, starting at the root servers, until an authoritative server is found at the end of the chain or every part has been resolved.
So, in theory, the first request asks one of the 13 root servers which name servers handle the .uk zone.
Once that is known, a request is sent to those .uk servers to find which one handles the .co.uk zone.
And so on, until the whole address is resolved or the first unresolvable segment is reached.
So, as the .uk and .co zones are different, there is no risk of collision between them, even if the same server happens to answer for both.
http://en.wikipedia.org/wiki/Root_server
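As a rough illustration of the right-to-left order described above, this Python sketch lists the names that are worked through for a host name, starting from the TLD. The function name `zones_right_to_left` is invented for the example, and no DNS query is sent; it only builds the sequence of names.

```python
from typing import List


def zones_right_to_left(hostname: str) -> List[str]:
    """List the names a lookup works through, starting from the TLD."""
    labels = hostname.rstrip(".").split(".")
    # Build "uk", then "co.uk", then "google.co.uk", and so on.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


print(zones_right_to_left("www.google.co.uk"))
```

For www.google.co.uk the list is uk, co.uk, google.co.uk, www.google.co.uk. Not every name in such a list is necessarily a separately delegated zone; that can only be learned by asking the servers.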
__________________
Only a biker knows why a dog sticks his head out the window.
Last edited by tripy; 07-24-2008 at 11:33 AM..
|
|
|
|
07-24-2008, 03:35 PM
|
Re: How to parse domain name?
|
Posts: 5,662
Name: John Alexander
|
Quote:
Originally Posted by tripy
Given the structure of the DNS, it makes sense to me.
The address is resolved from right to left, starting at the root servers, until an authoritative server is found at the end of the chain or every part has been resolved.
|
Based on this, it sounds like we can't parse out the registered domain name with 100% certainty without sending queries to DNS?
|
|
|
|
07-24-2008, 05:17 PM
|
Re: How to parse domain name?
|
Posts: 3,621
Name: Thierry
Location: I'm the uber Spaminator !
|
Indeed, John.
That's the reason why every DNS server is in fact more than just a DNS server.
The service is split between a server and a resolver.
The server is the part that answers requests for a zone it handles.
The resolver is the part that forwards the query when it does not.
A resolver (usually the one nearest to you, meaning your ISP's resolver) caches those results, to avoid hammering the root servers with requests that have already been resolved to an IP.
That's the propagation lag you see on every DNS modification you make to a zone.
Since each resolver keeps a cache of its results, it relies on the TTL published with each record to decide how long to keep the zone info cached.
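The caching behaviour can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how BIND or any real resolver is implemented: the class name `TtlCache`, its methods, and the injectable `clock` parameter are all invented for the illustration, and the address comes from the range reserved for documentation.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Toy resolver cache: remembers an answer until its TTL runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl_seconds: float) -> None:
        # Store the answer together with the moment it stops being valid.
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the next lookup goes back upstream.
            del self._entries[name]
            return None
        return address


# A fake clock makes the expiry visible without waiting.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("www.example.com", "192.0.2.10", ttl_seconds=300)
print(cache.get("www.example.com"))  # still cached
now[0] = 301.0
print(cache.get("www.example.com"))  # expired, so a fresh lookup is needed
```

Until the TTL has passed, the cache keeps returning the old address, which is the lag seen after changing a zone.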
Looking for information to back up what I wrote, I found this:
http://en.wikipedia.org/wiki/Domain_Name_System
Quote:
Address resolution mechanism
(This description deliberately uses the fictional .example TLD in accordance with the DNS guidelines.) In theory a full host name may have several name segments, (e.g ahost.ofasubnet.ofabiggernet.inadomain.example). In practice, full host names will frequently consist of just three segments (ahost.inadomain.example, and most often www.inadomain.example). For querying purposes, software interprets the name segment by segment, from right to left, using an iterative search procedure. At each step along the way, the program queries a corresponding DNS server to provide a pointer to the next server which it should consult.
A DNS recursor consults three nameservers to resolve the address www.wikipedia.org.
As originally envisaged, the process was as simple as:
- the local system is pre-configured with the known addresses of the root servers in a file of root hints, which need to be updated periodically by the local administrator from a reliable source to be kept up to date with the changes which occur over time.
- query one of the root servers to find the server authoritative for the next level down (so in the case of our simple hostname, a root server would be asked for the address of a server with detailed knowledge of the example top level domain).
- querying this second server for the address of a DNS server with detailed knowledge of the second-level domain (inadomain.example in our example).
- repeating the previous step to progress down the name, until the final step which would, rather than generating the address of the next DNS server, return the final address sought.
The diagram illustrates this process for the real host www.wikipedia.org.
The mechanism in this simple form has a difficulty: it places a huge operating burden on the root servers, with every search for an address starting by querying one of them. Being as critical as they are to the overall function of the system, such heavy use would create an insurmountable bottleneck for trillions of queries placed every day. The section DNS in practice describes how this is addressed.
|
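The iterative procedure in the quoted passage can be imitated with in-memory tables instead of real servers. Everything in this Python sketch is made up for illustration: the server names, the `REFERRALS` and `ADDRESSES` tables, the `resolve` function, and the address (taken from the documentation range). It is a simulation of the referral chain only, with no network access and no caching.

```python
from typing import Dict, List, Tuple

# Made-up referral data: each "server" knows only the next step down.
REFERRALS: Dict[str, Dict[str, str]] = {
    "root-server": {"example": "example-tld-server"},
    "example-tld-server": {"inadomain.example": "inadomain-server"},
}
# Made-up final answers held by the last server in the chain.
ADDRESSES: Dict[str, Dict[str, str]] = {
    "inadomain-server": {"www.inadomain.example": "192.0.2.80"},
}


def resolve(hostname: str) -> Tuple[str, List[str]]:
    """Follow referrals from the root; return the address and servers asked."""
    labels = hostname.split(".")
    server = "root-server"
    asked = [server]
    # Walk the name right to left, one more label per step.
    for i in range(len(labels) - 1, -1, -1):
        answer = ADDRESSES.get(server, {}).get(hostname)
        if answer is not None:
            return answer, asked
        suffix = ".".join(labels[i:])
        next_server = REFERRALS.get(server, {}).get(suffix)
        if next_server is None:
            raise LookupError(f"no referral for {suffix!r} at {server}")
        server = next_server
        asked.append(server)
    answer = ADDRESSES.get(server, {}).get(hostname)
    if answer is None:
        raise LookupError(f"{hostname!r} not found at {server}")
    return answer, asked


address, asked = resolve("www.inadomain.example")
print(address, asked)
```

Three servers are consulted in turn for www.inadomain.example, matching the three-step description in the quote.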
Last edited by tripy; 07-24-2008 at 05:20 PM..
|
|
|
|
07-24-2008, 05:51 PM
|
Re: How to parse domain name?
|
Posts: 5,662
Name: John Alexander
|
Thanks so much, Tripy! Talkupation is definitely due here, but I've given out too much lately, so I'll have to return.
Another developer has come up with a way to parse out the vast majority of domain names from URLs, but it involves a regular expression. A 50 KB regular expression. It matches any "ccTLD" and any "gTLD" and the Cartesian product. I'm not terribly happy about that method, but it's what we've got.
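For comparison, the same kind of matching can be sketched with a plain set lookup in place of one large regular expression. To be clear, this is not the other developer's method, only a hedged alternative: the `KNOWN_SUFFIXES` sample is tiny and hand-picked, and the function name `registered_domain` is invented for the example. A real list would have to cover every extension under which names are registered and be kept up to date.

```python
from urllib.parse import urlsplit

# Tiny hand-picked sample for illustration only; nowhere near complete.
KNOWN_SUFFIXES = {"com", "org", "net", "uk", "co.uk", "org.uk", "uk.com"}


def registered_domain(url: str) -> str:
    """Return the longest known suffix plus the one label to its left."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    labels = host.split(".")
    # Try the longest candidate suffix first so "co.uk" wins over "uk".
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in KNOWN_SUFFIXES:
            return ".".join(labels[i - 1:])
    raise ValueError(f"no known suffix in {host!r}")


print(registered_domain("http://www.google.co.uk/search/search.asp"))
print(registered_domain("http://www.example.com/index.html"))
```

With this sample set, the two calls give google.co.uk and example.com. The answer is only as good as the suffix list, so it still falls short of 100% certainty.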
|
|
|
|
07-24-2008, 05:55 PM
|
Re: How to parse domain name?
|
Posts: 3,621
Name: Thierry
Location: I'm the uber Spaminator !
|
You're welcome, John.
Glad to see that all those countless nights crawling through the BIND manual and DNS specs weren't (totally) useless
:-P
DNS is a really complicated part of the internet, but a very interesting one too.
Quote:
|
Another developer has come up with a way to parse out the vast majority of domain names from URLs, but it involves a regular expression. A 50 KB regular expression.
|
Ouch... a 50 KB regexp!?
It always brings to mind the all-too-famous
Quote:
Some people, when confronted with a problem, think “I know,
I'll use regular expressions.” Now they have two problems.
|
And now I better understand your question about multithreaded regexp compilation...
Last edited by tripy; 07-24-2008 at 06:02 PM..
|
|
|
|
07-25-2008, 10:01 AM
|
Re: How to parse domain name?
|
Posts: 41,519
Name: Chris Hirst
Location: Blackpool. UK
|
50 KB sounds like a lot for something like this:
Code:
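<%
' Split the host name of a URL into its dot-separated labels.
Dim o_RegEx, p_uri, n_url, slash_pos, host_parts
p_uri = "http://www.google.co.uk/search/search.asp"
Set o_RegEx = New RegExp
' Strip only a leading scheme, in any letter case
o_RegEx.Pattern = "^(https?|ftp)://"
o_RegEx.IgnoreCase = True
n_url = o_RegEx.Replace(p_uri, "")
Set o_RegEx = Nothing
' Keep what comes before the first "/"; a URL without a path has none
slash_pos = InStr(n_url, "/")
If slash_pos > 0 Then n_url = Left(n_url, slash_pos - 1)
host_parts = Split(n_url, ".")
%>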
host_parts(ubound(host_parts)) has the TLD
host_parts(ubound(host_parts)-1) has the second level
and so on
Taken from one of my spam link dropping / referrer spam protection methods
( p_uri would normally be the request.servervariables("HTTP_REFERER") BTW )
|
|
|
|
|