Knowledgebase

How do I limit what search engines can index?

Search engines such as Google run programs called 'spiders' or 'robots' that continually crawl the web, indexing content for inclusion in their search databases. Most users view inclusion in search engine listings in a positive light, and high search rankings can translate into significant revenue for commercial sites. Even so, not everyone wants every page and file stored on their account to be publicly available through web searches. This is where /robots.txt comes in.

Most search engine robots will comply with a site owner's wishes about excluding content by following the Robots Exclusion Standard, which is implemented through a small ASCII text file named robots.txt in the root web-accessible directory of a given domain. When a compliant robot visits a site, the first thing it does is check the top-level directory for a file named 'robots.txt'. If the file is found, the robot reads the directives in it, which tell the robot what content, if any, it may or may not visit and index. In most cases these directives are honoured.
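The check a compliant robot performs can be sketched with Python's standard urllib.robotparser module. This is only an illustration, not how any particular search engine is implemented: the rules, the robot name 'ExampleBot' and the URLs under domain.com are made-up placeholders, and the rules are parsed from a string rather than fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, as they might appear at http://domain.com/robots.txt
ROBOTS_TXT = """\
User-Agent: *
Disallow: /private/
Disallow: /drafts/notes.html
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # 'ExampleBot' is a placeholder robot name, not a real crawler
    for url in (
        "http://domain.com/index.html",
        "http://domain.com/private/report.html",
        "http://domain.com/drafts/notes.html",
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "ExampleBot", url) else "blocked"
        print(url, "->", verdict)
```

A real robot would download the file from the site root first; parsing a string here keeps the sketch self-contained.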

Creating /robots.txt files
To create a /robots.txt file, open a plain text editor such as Windows Notepad, type or paste your directives, and save the file as 'robots.txt'. Then upload the file to the /public_html directory so that its URL is http://domain.com/robots.txt.

/robots.txt syntax
All valid /robots.txt files must contain at least two lines in the following format:

User-Agent: [robot name or * for all robots]
Disallow: [name of file or directory you do not want indexed]

Unless you want to implement different rules for specific robots, the User-Agent line should contain just an asterisk (*), a wildcard that means 'these rules apply to all robots'. Disallow lines specify the files or folders you do not want indexed by search engines. Each file or folder to be excluded must be listed separately on its own line, and the original standard does not support wildcards in Disallow directives. You can have as many or as few Disallow lines as necessary.
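As a rough sketch of the syntax rules above, the following Python helper assembles the text of a robots.txt file from a list of paths. The function name build_robots_txt and the example paths are hypothetical. The wildcard check simply mirrors the rule stated in this article, and the empty 'Disallow:' line used when no paths are given means nothing is excluded under the Robots Exclusion Standard.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Build robots.txt content: one User-Agent line, then one Disallow line per path."""
    lines = [f"User-Agent: {user_agent}"]
    if not disallowed_paths:
        # An empty Disallow value excludes nothing, but keeps the two-line minimum
        lines.append("Disallow:")
    for path in disallowed_paths:
        if not path.startswith("/"):
            raise ValueError(f"path must start with '/': {path!r}")
        if "*" in path:
            # Mirrors the article's rule: no wildcards in Disallow directives
            raise ValueError(f"wildcards are not supported: {path!r}")
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example paths are placeholders; list each file or folder separately
    print(build_robots_txt(["/private/", "/drafts/notes.html"]), end="")
```

The returned text can be saved as robots.txt and uploaded to /public_html as described above.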

For more details on /robots.txt and the Robots Exclusion Standard visit The Web Robots Pages at http://www.robotstxt.org.

