When you submit your Web site to search engines, remember that your
whole site can be indexed and shown on their results pages. To prevent
this you can install a text file called robots.txt, which gives you more
control over where the spiders go. With HTML documents you can tell the
indexing spider not to index a page by using the Robot Meta Tag. However,
you cannot do this with cgi files, images or text documents.
Although the robots.txt file is not going to help you in your quest for
website traffic, it will help you control where the indexing spider goes.
Most of us have a cgi-bin and some other directories that we wish to keep
out of search results, so it is wise to install this file. Keep in mind
that it is only a request that well-behaved spiders honor. It does not
block access, so it offers no real protection from unscrupulous people.
The basic format of the robots.txt file
is a listing of particular directory paths to disallow.
For example,
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /temp/
Disallow: /private/
Disallow: /nothanks/
In this case, we have denied access for
all robots to the cgi-bin, images, temp, private and nothanks
directories.
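To see how a well-behaved spider might interpret the rules above, here is a minimal sketch that uses Python's standard urllib.robotparser module. The agent name "ExampleBot" and the sample paths are made up for illustration; real spiders may apply the rules slightly differently.

```python
from urllib.robotparser import RobotFileParser

# The example rules from the text above.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /temp/
Disallow: /private/
Disallow: /nothanks/
"""


def is_allowed(path, agent="ExampleBot"):
    """Return True if the rules let the given agent fetch the path.

    "ExampleBot" is a hypothetical spider name used only for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, path)


print(is_allowed("/index.html"))        # True
print(is_allowed("/cgi-bin/form.cgi"))  # False
```

Any path under a disallowed directory is refused, while everything else stays open to the spider.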
To install a robots.txt file, simply copy the above statements (adding your sensitive directories) into a blank
text file named robots.txt and upload it to the root of your Web site. (If you have a free site this file may not work, and you may
have to ask your provider to install one for you.) To check whether you
already have a robots.txt file installed on your Web site,
open /robots.txt at the root of your site in a browser.
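The installation steps can also be sketched in Python. The helper names build_robots_txt and robots_url are hypothetical, as is the www.example.com address; the sketch only builds the file contents and the address where the file should live, and leaves the upload to you.

```python
from urllib.parse import urlsplit, urlunsplit


def build_robots_txt(directories, user_agent="*"):
    """Build robots.txt text that disallows each listed directory."""
    lines = [f"User-agent: {user_agent}"]
    for directory in directories:
        name = directory.strip("/")
        if name:  # skip empty entries
            lines.append(f"Disallow: /{name}/")
    return "\n".join(lines) + "\n"


def robots_url(site_url):
    """Return the root-level robots.txt address for any page on a site."""
    parts = urlsplit(site_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(build_robots_txt(["cgi-bin", "/private/"]))
print(robots_url("http://www.example.com/some/page.html"))
# http://www.example.com/robots.txt
```

Save the generated text as robots.txt, upload it to the root of the site, then load the address from robots_url in a browser to confirm it is in place.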
Please be aware that the robots.txt file is open for all to read, so it
is unwise to list specific files in it. Use the Robot Meta Tag to control
the indexing of specific HTML documents.
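For reference, the Robot Meta Tag goes in the head of an HTML document. The following rough sketch, using Python's standard html.parser module, shows a sample page carrying the tag and reads back its directives. The page contents and the RobotsMetaFinder class are invented for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives += [
                part.strip().lower()
                for part in content.split(",")
                if part.strip()
            ]


def robots_directives(html):
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# A made-up page that asks spiders not to index it or follow its links.
PAGE = """<html><head>
<title>Private notes</title>
<meta name="robots" content="noindex, nofollow">
</head><body>...</body></html>"""

print(robots_directives(PAGE))  # ['noindex', 'nofollow']
```

Unlike an entry in robots.txt, the tag does not advertise the page's address in a publicly readable list.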