Hack Removal Service

1 January 2008
Category:
Articles
Comments: 5

Use Robots.txt To Prevent vBulletin Duplicate Content

A robots.txt file tells search engine spiders what not to crawl on your site. The file must be named “robots.txt” and placed in the root of your site for spiders to find and read it. You cannot have more than one robots.txt file per site; subdomains are of course the exception, since they are treated as separate domains, so you can place a robots.txt in each subdomain root.

You may not want spiders to crawl certain sections of your site for various reasons, or because a page would be of no use to a reader who found it in search results. Many of the extra URLs that vBulletin produces are irrelevant to the end user, and a visitor who lands on one of those pages may leave from the very page they entered on, costing you a valuable guest.

Google Webmaster Tools has a robots.txt generator that you can use to help create the file. There are also a slew of other ways to block content, such as using “NOINDEX” meta tags, or password protecting directories. You can also use the services in Google Webmaster Tools to remove content from their index.
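For reference, the page-level “NOINDEX” alternative mentioned above is a robots meta tag placed in the head of any page you want kept out of the index. This is a generic sketch, not vBulletin-specific markup:

```html
<!-- Asks compliant crawlers not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">
```

Unlike robots.txt, which stops a page from being crawled, this tag lets the page be crawled but asks that it not be indexed.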

User-agent: *
Disallow: /ajax.php
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /global.php
Disallow: /image.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /subscription.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /usernote.php

Copy and paste the previous lines into your favorite text editor, save the file as robots.txt, and upload it to your website root. You may need to change the paths of the files; for example, if your forums are at http://yoursite.com/forums you’ll need to use “Disallow: /forums/filename.php” instead.
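As a sketch, assuming a forum installed under a /forums directory, the first few entries would look like this, with the rest of the list adjusted the same way:

```
User-agent: *
Disallow: /forums/ajax.php
Disallow: /forums/attachment.php
Disallow: /forums/calendar.php
```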

Good Robots.txt Practices For vBulletin

  • Avoid allowing search-result-like pages to be crawled. Nothing is more annoying than landing on a search result page from a search result page.
  • Don’t let a large number of auto-generated pages with the same, or only slightly different, content be crawled. Why you’d build a large number of pages with only slightly different content is beyond us to begin with.
  • Don’t allow URLs created as a result of proxy services to be crawled.
  • Find more secure methods, such as password protection, for blocking search engines from crawling private or sensitive data.
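If you want to sanity-check your rules before uploading, Python’s standard library ships a robots.txt parser. The sketch below uses a trimmed version of the file from this article and confirms that blocked vBulletin scripts are reported as not fetchable:

```python
from urllib.robotparser import RobotFileParser

# A trimmed version of the robots.txt shown in this article.
rules = """\
User-agent: *
Disallow: /search.php
Disallow: /memberlist.php
Disallow: /printthread.php
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Disallowed vBulletin scripts should not be fetchable...
print(parser.can_fetch("Googlebot", "/search.php"))          # False
# ...while ordinary thread pages remain crawlable.
print(parser.can_fetch("Googlebot", "/showthread.php?t=1"))  # True
```

Keep in mind that robots.txt is advisory: well-behaved crawlers honor it, but it is not an access control.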

5 responses on “Use Robots.txt To Prevent vBulletin Duplicate Content”

  1. kma says:

    It is very helpful,
    thanks

  2. Audax says:

    I think this is also important

    Disallow: /faq.php

    Why generate duplicate content as thousand other sites with boring standard faq.

  3. Todd says:

    Scam robots or spiders don’t follow the robots.txt; it’s an optional thing, and it is very easy to change your identity. I can crawl any website with ease and make it believe I am a Google robot. I can post instructions on this here if you need proof, but just search Google and you will find tons of answers. So the above will not stop crawling of your website.

    general.useragent.override Googlebot/2.1 (+http://www.googlebot.com/bot.html)

    This can be done using firefox also.

    Just search ” changing user agent in firefox”

  4. Todd says:

    Also: think about this, even if you could stop scammers from crawling your website, you would also stop every search engine from indexing your website. So the above comment is a joke:

    Disallow: /faq.php

    What would that do? OK, so you would keep your FAQ all to yourself and not share it with Google and the rest of the search engines???

    Oh yeah, that’s good advice!!

  5. Audax says:

    Why disallow the faq?
    Because there are thousands of vbulletins out there and most of the operators leave the faq as they are by default.
    So do I.
    That’s a lot of text which appears verbatim on thousands of other sites = duplicate content.

    Why not prevent that?
    Of course if you modified the faq you should allow bot access.
    That’s what I meant!
