Archive for January, 2008
A robots.txt file tells search engine spiders what not to crawl on your site. The file has to be named “robots.txt” and placed in the root of your site for spiders to find and read it. You cannot have more than one robots.txt file per site. Subdomains are the exception: they are treated as individual domains, so each subdomain can have its own robots.txt in its root.
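To make the "one file per domain root" rule concrete, here is a minimal Python sketch that works out which robots.txt applies to a given page. The function name robots_txt_url and the hostnames are placeholders chosen for illustration, not part of any real site.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt location that applies to page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the file always sits at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Placeholder URLs: a main domain and a subdomain each get their own file.
print(robots_txt_url("http://yoursite.com/forums/showthread.php?t=42"))
print(robots_txt_url("http://blog.yoursite.com/2008/01/some-post"))
```

The two calls print different locations because the subdomain is treated as its own site.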
You may not want spiders to crawl certain sections of your site for various reasons, or perhaps a page would not be useful to the reader if it showed up in a search engine result. Many of the extra URLs that vBulletin produces are irrelevant to the end user, and a visitor who lands on one of them may leave from the same page they entered on, costing you a valuable guest.
Google Webmaster Tools has a robots.txt generator that you can use to help create the file. There are also a slew of other ways to block content, such as using “NOINDEX” meta tags or password-protecting directories. You can also use the services in Google Webmaster Tools to remove content from Google’s index.
User-agent: *
Disallow: /ajax.php
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /global.php
Disallow: /image.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /subscription.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /usernote.php
Copy and paste the lines above into your favorite text editor, save the file as robots.txt, and upload it to your website root. You may need to change the paths of the files. For example, if your forums are at http://yoursite.com/forums, you’ll need to use “Disallow: /forums/filename.php” instead.
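If you would rather not edit every path by hand, the prefix change can be scripted. The following is a rough Python sketch, not an official vBulletin tool: build_robots_txt and BLOCKED_SCRIPTS are names made up for this example, and the list is shortened to a few of the scripts from the rules above.

```python
# A few script names taken from the rule list (shortened for brevity).
BLOCKED_SCRIPTS = ["ajax.php", "login.php", "private.php", "search.php"]


def build_robots_txt(scripts, forum_path="/"):
    """Build robots.txt text that disallows each script under forum_path."""
    # Normalise the prefix so "/forums", "forums/" and "/forums/" all work.
    prefix = "/" + forum_path.strip("/")
    if not prefix.endswith("/"):
        prefix += "/"
    lines = ["User-agent: *"]
    lines += [f"Disallow: {prefix}{name}" for name in scripts]
    return "\n".join(lines) + "\n"


# Forums installed at http://yoursite.com/forums
print(build_robots_txt(BLOCKED_SCRIPTS, "/forums"))
```

Save the printed text as robots.txt in the site root. The file still belongs at the root of the domain even when the forums live in a subdirectory.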
Good Robots.txt Practices For vBulletin
- Avoid allowing search-result-like pages to be crawled. Nothing is more annoying than landing on a search result page from a search result page.
- Don’t let a large number of auto-generated pages with the same, or only slightly different, content be crawled. Why you’d build a large number of pages with only slightly different content is beyond us to begin with.
- Don’t allow URLs created as a result of proxy services to be crawled.
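- Use more secure methods than robots.txt, such as password protection, to keep search engines away from private or sensitive data. The file is publicly readable, and spiders are not forced to obey it.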
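One way to check that your rules actually follow the first practice is to run them through Python’s standard urllib.robotparser module before uploading. This is only a sketch: the two rules are a small sample, is_crawlable is a helper name invented here, and the URLs are illustrative. It models how a well-behaved spider reads the file, so it says nothing about crawlers that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# A small sample of rules; paste your full robots.txt text here instead.
RULES = """\
User-agent: *
Disallow: /search.php
Disallow: /private.php
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_crawlable(url, agent="*"):
    """Return True if a spider that obeys the rules may fetch url."""
    return parser.can_fetch(agent, url)


# Search result pages should be blocked; an ordinary thread URL
# (illustrative) should remain crawlable.
print(is_crawlable("http://yoursite.com/search.php?do=getnew"))
print(is_crawlable("http://yoursite.com/showthread.php?t=42"))
```

The first call prints False and the second prints True, which is the outcome you want for search result pages versus regular content.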
