How the Robots Meta Tag and Robots.txt Affect Your Presence on Search Engines


 


What is a Robot
When you submit a query to a search engine such as Google, you get a very long list of websites that provide information related to your query. Have you ever wondered how search engines collect this list? To find information on the hundreds of millions of web pages that exist, a search engine employs an automated program that crawls the web and collects data from websites. This automated program is called a robot, spider or crawler.
Search engines usually try to crawl every page of a website, but some websites contain pages that should not be indexed. If you do not want a page or directory to be indexed, you have to tell the search engines. This can be done in two ways:
1. Robot Meta Tag
2. Robots.txt file

 

Robot Meta Tag

The HTML <META> tag can tell robots not to index the content of a page, and/or not to scan it for links to follow.

For example:

<html>

<head>

<title>…</title>

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

</head>

There are two important considerations when using the robots <META> tag:

  • Robots can ignore your <META> tag. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
  • The NOFOLLOW directive only applies to links on this page. A robot may well find the same links on some other page without a NOFOLLOW (perhaps on some other site), and so still arrive at your undesired page.

Don’t confuse this NOFOLLOW with the rel="nofollow" link attribute, which applies to a single link rather than to the whole page.

Here’s a list of the values you can specify within the "content" attribute of this tag:

 

Value      Description
index      Allows indexing of the page.
noindex    Disallows indexing of the page.
follow     Instructs the crawler to follow the links contained within the page.
nofollow   Instructs the crawler not to follow the links contained within the page.
none       Neither index the page nor follow its links (same as noindex, nofollow).

Examples:

<meta name="robots" content="noindex,nofollow" />
<meta name="robots" content="noindex,follow" />
<meta name="robots" content="index,nofollow" />
<meta name="robots" content="none">
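
To make the values above concrete, here is a minimal Python sketch of how a well-behaved crawler might read the tag. It is only an illustration built on the standard library's html.parser module; the names RobotsMetaParser and robots_policy are made up for this example, and real search engines recognise more values than the five listed in the table.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_policy(html):
    """Return (may_index, may_follow) for a page; both default to True."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    may_index = not ({"noindex", "none"} & found)
    may_follow = not ({"nofollow", "none"} & found)
    return may_index, may_follow


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex,follow" /></head></html>'
    print(robots_policy(page))
```

A page without any robots meta tag comes back as (True, True), which matches the default behaviour of indexing the page and following its links.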

Robots.txt File

Robots.txt is a regular text file that has special meaning to the majority of search engines on the web. By defining a few rules in this text file, you can instruct robots not to crawl and index certain files or directories within your site, or the site as a whole. For example, you may not want Google to crawl the /images directory of your site, because that is of no use to you and wastes your site’s bandwidth. “Robots.txt” lets you tell Google just that.

You need a robots.txt file only if your site includes content that you don’t want search engines to index. If you want search engines to index everything in your site, you don’t need a robots.txt file (not even an empty one).

How to create robots.txt file

  • Create a text file named exactly “robots.txt”.
  • Write the rules, using the format described below.
  • Save the file in the root directory of your website, not in a subdirectory.

The format of the file’s content is very simple. It uses two directives:

  • User-agent: the robot the following rule applies to
  • Disallow: the URL you want to block

These two lines are considered a single entry in the file. You can include as many entries as you want. You can include multiple Disallow lines and multiple user-agents in one entry.
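
As a rough illustration of this entry format, the following Python sketch builds the text of a robots.txt file from a list of entries. The function name render_robots_txt and the (user agents, disallowed paths) tuple layout are assumptions made for this example, not part of any standard tool.

```python
def render_robots_txt(entries):
    """Build robots.txt text from (user_agents, disallowed_paths) pairs.

    Each pair becomes one entry: its User-agent lines followed by its
    Disallow lines. Entries are separated by a blank line.
    """
    blocks = []
    for user_agents, disallowed_paths in entries:
        lines = [f"User-agent: {agent}" for agent in user_agents]
        lines += [f"Disallow: {path}" for path in disallowed_paths]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    text = render_robots_txt([
        (["*"], ["/cgi-bin/", "/tmp/"]),
        (["Googlebot-Image"], ["/images/dogs.jpg"]),
    ])
    print(text, end="")
```

The output can be saved as robots.txt as it stands; the blank line between entries is what separates one entry from the next.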

Examples:

  • To block the entire site:
    User-agent: *
    Disallow: /
  • To block a directory and everything in it:
    User-agent: *
    Disallow: /junk-directory/
  • To block a page:
    User-agent: *
    Disallow: /private_file.html
  • To remove a specific image from Google Images:
    User-agent: Googlebot-Image
    Disallow: /images/dogs.jpg
  • To remove all images on your site from Google Images:
    User-agent: Googlebot-Image
    Disallow: /
  • To block files of a specific file type (for example, .gif), use the following:
    User-agent: Googlebot
    Disallow: /*.gif$
  • To exclude all robots from parts of the server:
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /junk/
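  • To exclude all files in a directory except one, move the files you want blocked into a separate subdirectory (here “stuff”), leave the one file in the level above it, and disallow the subdirectory: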
    User-agent: *
    Disallow: /~joe/stuff/
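
Most of the examples above work by simple prefix matching: a URL is blocked when its path begins with the value of a Disallow line. A minimal Python sketch of that idea follows. The helper name is_blocked is hypothetical, and the sketch deliberately ignores the * and $ pattern characters used in the .gif example, so treat it as an illustration rather than a full robots.txt matcher.

```python
def is_blocked(path, disallow_rules):
    """Return True if `path` starts with any non-empty Disallow value.

    An empty Disallow value blocks nothing, so it is skipped.
    Pattern characters such as * and $ are NOT interpreted here.
    """
    return any(bool(rule) and path.startswith(rule) for rule in disallow_rules)


if __name__ == "__main__":
    rules = ["/cgi-bin/", "/tmp/", "/junk/"]
    print(is_blocked("/cgi-bin/script.pl", rules))
    print(is_blocked("/index.html", rules))
```

This also shows why the last example works: with the single rule /~joe/stuff/, a file such as /~joe/index.html does not start with the rule and so stays crawlable, while everything under /~joe/stuff/ is blocked.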

Save your robots.txt file in the root directory of your website.

Test a robots.txt file

The Test robots.txt tool will show you if your robots.txt file is accidentally blocking Googlebot from a file or directory on your site, or if it’s permitting Googlebot to crawl files that should not appear on the web. When you enter the text of a proposed robots.txt file, the tool reads it in the same way Googlebot does, and lists the effects of the file and any problems found.

Test a site’s robots.txt file:

  1. On the Google Webmaster Tools Home page, click the site you want.
  2. Under Health, click Blocked URLs.
  3. If it’s not already selected, click the Test robots.txt tab.
  4. Copy the content of your robots.txt file, and paste it into the first box.
  5. In the URLs box, list the site to test against.
  6. In the User-agents list, select the user-agents you want.

Any changes you make in this tool will not be saved. To save any changes, you’ll need to copy the contents and paste them into your robots.txt file.
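
If you want a quick local check before using the tool, Python's standard urllib.robotparser module can evaluate a proposed robots.txt against a list of URLs. The sketch below is only an approximation: the function name check_urls, the PROPOSED rules and the www.example.com URLs are made up for illustration, and this parser does plain prefix matching, so its results may differ from Googlebot's for pattern rules such as /*.gif$.

```python
from urllib.robotparser import RobotFileParser

# Example rules, taken from the kinds of entries shown earlier in this post.
PROPOSED = """\
User-agent: *
Disallow: /junk-directory/
Disallow: /private_file.html

User-agent: Googlebot-Image
Disallow: /images/dogs.jpg
"""


def check_urls(robots_txt, user_agent, urls):
    """Return {url: allowed} for one user agent, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


if __name__ == "__main__":
    urls = [
        "http://www.example.com/index.html",
        "http://www.example.com/junk-directory/page.html",
        "http://www.example.com/images/dogs.jpg",
    ]
    for agent in ("SomeBot", "Googlebot-Image"):
        for url, allowed in check_urls(PROPOSED, agent, urls).items():
            print(agent, url, "allowed" if allowed else "blocked")
```

Note that a robot with its own entry (Googlebot-Image here) follows only that entry, so the rules listed under User-agent: * do not apply to it.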

 

Difference between Robot Meta Tag & robots.txt

  • The robots META tag serves a similar purpose to the robots.txt file, but it works on one page at a time and is not as reliable: not all robots honour the robots meta tag. Use it if your site is in a subdirectory like www.yourdomain.com/users/mypage/ and you can’t get the server administrator to make changes to the robots.txt file.
  • When you block URLs via robots.txt, Google may still show those pages in its results if they are linked from somewhere else on the web. A better way to keep a particular page out of the index completely is to use a robots noindex meta tag on a per-page basis. The page must then stay crawlable, because a robot that is blocked by robots.txt never gets to see the meta tag.