Digital Credibility and Phishing, Part I – Domain Names

This post follows up on the two-minute tech tip I recorded for Episode 252 of the Just the Books podcast. It is the first post in a series covering digital credibility and phishing.

Domain Names
A website is known by its domain name, the friendly version of its internet address. The name has a few parts; for this tip, I am focusing on the end.

At the end is the TLD, the top-level domain. The generic ones are .com, .net, .org, etc. Registration of some TLDs is restricted, for example .mil, .gov, and .edu. Each country also has its own two-letter TLD, known as a ccTLD (country code TLD). This is why addresses ending in .co.uk exist: .uk is the ccTLD for the United Kingdom.

In general there are two more parts to a domain name. Working from the right, the next part is what you get when you purchase a domain name. For the Just the Books podcast, it’s the just-the-books part. The last part, at the far left, is the part that the domain holder can generally set themselves. It’s often www but could be anything; if desired, we could make a name that points directly to the show notes, such as shownotes.just-the-books.com (note: I’ve not done this).
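To make the three parts concrete, here is a minimal Python sketch that splits a name the same way, working from the right. The function name split_hostname is my own invention for illustration, and the sketch assumes a single-label TLD such as .com; it would mislabel names under two-part endings like .co.uk, which need a list of public suffixes to handle properly.

```python
def split_hostname(hostname):
    """Naively split a hostname into (subdomain, registered name, TLD).

    Assumption: the TLD is a single label such as "com". Names under
    two-part endings like "co.uk" are NOT handled correctly here.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"expected at least two labels, got {hostname!r}")
    tld = labels[-1]                    # the end: com, net, org, ...
    registered = labels[-2]             # the part you purchase
    subdomain = ".".join(labels[:-2])   # the part the holder sets; may be empty
    return subdomain, registered, tld


print(split_hostname("www.just-the-books.com"))
print(split_hostname("shownotes.just-the-books.com"))
```

The first call reports www as the holder-chosen part, and the second reports shownotes, with just-the-books and com unchanged in both.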

How can you check a link in a webpage?
Below, I explain one quick way to test. Please note that while I made these screenshots on my Mac, the menus are worded similarly on Windows.

For simplicity, let’s use a page from The New York Times. We want to make sure that links to the full articles are still within The New York Times website, nytimes.com.

Right-click on the link and select “Copy Link Address”. If you have a one-button mouse, hold down Control as you click to bring up the menu.

Next, paste into a plain text editor. On Macs, the default is TextEdit; Windows machines have Notepad. You want to use a plain text editor and not your word processing program of choice because a plain text editor drops the webpage formatting (“HTML”) and displays the actual URL.

It should look something like this.

Let’s evaluate what you see:

  1. Ignore the http:// part. It’s mostly for the computer, not humans.
  2. The www. isn’t important for this either.
  3. nytimes. looks right. So far so good.
  4. .com Great!
  5. Now we see the most important part: next comes a /. For the computer, and for you, that means the domain name is done and we get to the webpage-specific parts.

Congratulations, it is a link that looks right.
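If you are comfortable with a little code, the same check can be sketched in Python using the standard library's urllib.parse module. This is only an illustration of the steps above, not a complete phishing detector. The function name link_stays_on and the sample links (including their paths) are made up for the example.

```python
from urllib.parse import urlsplit


def link_stays_on(url, expected_domain):
    """Return True if the link's host is expected_domain or a name under it.

    Mirrors the manual check: skip the scheme, read the host, and stop
    at the first / after it.
    """
    host = (urlsplit(url).hostname or "").lower()
    expected = expected_domain.lower()
    # Exact match, or a name ending in ".nytimes.com" (such as www.nytimes.com).
    return host == expected or host.endswith("." + expected)


# Hypothetical links, for illustration only.
print(link_stays_on("http://www.nytimes.com/some/article.html", "nytimes.com"))
print(link_stays_on("http://www.nytimes.com.example.net/login", "nytimes.com"))
```

The second link is the kind to watch for: nytimes.com appears in it, but the name continues past .com before the first /, so the real domain is example.net and the check returns False.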

Using this method to test the results you get from a search engine is trickier because search engines want to know which results are actually clicked, and they do this by passing you through their own domain first. If we search Google for the New York Times and copy and paste the resulting link, by default in most web browsers we get this massive block of … stuff:

http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CI8BEBYwAA&url=http%3A%2F%2Fwww.nytimes.com%2F&ei=uV62T4XnC8bM6QGAuZzFCg&usg=AFQjCNEtLodOdxWZSGdJpL7WJaEeUJVlnw&sig2=B5xbatumCielOdBT8trDNA

How do you parse that?

Start with the chunk http://www.google.com/. This is helpful because it’s Google’s domain, and because we see the / we know we’re now going to some webpage-specific stuff.

Next we find the geeky bits. Let’s take it slow and scan through what we see. We know that we want to find nytimes.com in there, so take it slow… do you see it?

Great! But there’s lots of weirdness around it and no slash, so you aren’t sure whether the domain name is done and the webpage-specific stuff has started. The www.nytimes.com is surrounded by encoded special characters. Spaces aren’t allowed in a URL, and some characters have special meaning for programs, so they are often encoded as a % followed by two characters. %2F means a /. We see that surrounding the text we’re looking for: %2F%2Fwww.nytimes.com%2F, which really means //www.nytimes.com/. That’s a good sign.
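You don't have to decode these by eye. As a small sketch, Python's standard urllib.parse.unquote function reverses this kind of encoding; the variable names here are just for the example.

```python
from urllib.parse import unquote

# The chunk from the Google link, with each / encoded as %2F.
encoded = "%2F%2Fwww.nytimes.com%2F"
decoded = unquote(encoded)

print(decoded)  # //www.nytimes.com/
```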

In this case we shouldn’t ignore the http://, which we find just to the left of www.nytimes.com, right after url=. How? The %3A means a :, so http%3A%2F%2F is really http:// and now we have all the parts. I don’t expect you to remember that, but please note the similarity in length and format of each chunk of special encoding. So we see &url=http://www.nytimes.com/. Congrats! That’s what we’re looking for, and this link is going where we expect.
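For the curious, the whole hunt can also be sketched in Python. The standard library can split the link at the ?, find the url= piece, and decode it in one go. The function name redirect_target is made up for this example, the link is a shortened version of the one above, and the assumption that the destination lives in a parameter named url holds for this Google link but may not hold for other search engines.

```python
from urllib.parse import urlsplit, parse_qs


def redirect_target(link, param="url"):
    """Return the decoded destination stored in the link's query, or None.

    Assumption: the destination is carried in a query parameter named
    `param` ("url" in the Google link from this post).
    """
    query = urlsplit(link).query          # everything after the ?
    values = parse_qs(query).get(param)   # parse_qs also decodes %3A, %2F, ...
    return values[0] if values else None


# Shortened version of the link above; the tracking parameters are trimmed.
google_link = (
    "http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1"
    "&url=http%3A%2F%2Fwww.nytimes.com%2F"
)

print(redirect_target(google_link))  # http://www.nytimes.com/
```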

You are probably wondering how to prevent this and get directly to your links without your search engine tracking extra information about what you clicked. Don’t worry, personal privacy will be covered in the future. If you are concerned now, please ask and let me know you’d like that covered sooner. Next week I plan to continue with digital credibility and phishing by discussing security certificates and SSL.