Use Robots.txt To Prevent vBulletin Duplicate Content
A robots.txt file tells search engine spiders what not to crawl on your site. The file must be named “robots.txt” and placed in the root of your site so that spiders can find and read it. Each site can have only one robots.txt file. Subdomains are the exception: they are treated as individual domains, so each subdomain can have its own robots.txt in its root.
You may not want spiders to crawl certain sections of your site for various reasons, or a page may simply not be useful to a reader who finds it in a search engine result. Many of the extra URLs that vBulletin produces are irrelevant to the end user, and a visitor who lands on one of them may well leave from the same page they entered on, costing you a valuable guest.
Google Webmaster Tools has a robots.txt generator that you can use to help create the file. There is also a slew of other ways to block content, such as using “NOINDEX” meta tags or password-protecting directories. You can also use the services in Google Webmaster Tools to remove content from Google’s index.
User-agent: *
Disallow: /ajax.php
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /global.php
Disallow: /image.php
Disallow: /inlinemod.php
Disallow: /joinrequests.php
Disallow: /login.php
Disallow: /member.php
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /subscription.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /usernote.php
Copy and paste the lines above into your favorite text editor, save the file as robots.txt, and upload it to your website root. You may need to change the paths of the files: for example, if your forums are at http://yoursite.com/forums, you’ll need to use “Disallow: /forums/filename.php” instead.
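If your forums live in a subdirectory, rewriting every path by hand is tedious. The following is a minimal Python sketch of one way to generate the rules with a prefix. The function name `build_robots_txt` and the `VBULLETIN_SCRIPTS` list are hypothetical names made up for this illustration, and the list is only a short subset of the files above, so extend it to match your own setup.

```python
# Hypothetical helper for illustration; not part of vBulletin.
# Subset of the script names from the rules above; extend as needed.
VBULLETIN_SCRIPTS = [
    "ajax.php",
    "member.php",
    "newreply.php",
    "printthread.php",
    "search.php",
]


def build_robots_txt(forum_path="/", scripts=VBULLETIN_SCRIPTS):
    """Return robots.txt text that disallows each script under forum_path."""
    # Normalize the forum path so "/", "forums", and "/forums/" all work.
    prefix = "/" + forum_path.strip("/")
    if prefix != "/":
        prefix += "/"
    lines = ["User-agent: *"]
    lines += [f"Disallow: {prefix}{name}" for name in scripts]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Forums installed at http://yoursite.com/forums
    print(build_robots_txt("/forums"), end="")
```

Save the printed output as robots.txt in the site root, not in the forums directory, since spiders only look for the file at the root.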
Good Robots.txt Practices For vBulletin
- Avoid letting search-result-like pages be crawled. Nothing is more annoying than landing on a search result page from a search result page.
- Don’t let a large number of auto-generated pages with the same, or only slightly different, content be crawled. Why you’d build a large number of pages with only slightly different content is beyond us to begin with.
- Don’t allow URLs created as a result of proxy services to be crawled.
- Use more secure methods than robots.txt to keep private or sensitive data away from search engines.
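Before uploading a rule set, it can help to check which URLs it actually blocks. The sketch below uses Python's standard `urllib.robotparser` module against a shortened copy of the rules. The helper name `is_crawlable`, the sample URLs, and the trimmed `RULES` text are assumptions made for this illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Shortened sample of the rules from this article (illustrative only).
RULES = """\
User-agent: *
Disallow: /search.php
Disallow: /printthread.php
Disallow: /member.php
"""


def is_crawlable(url_path, rules=RULES, agent="Googlebot"):
    """Return True if the given rules allow `agent` to fetch `url_path`."""
    parser = RobotFileParser()
    # parse() takes the file contents as an iterable of lines.
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url_path)


if __name__ == "__main__":
    # Hypothetical URLs: a thread page and a search result page.
    for path in ("/showthread.php?t=1", "/search.php?do=getnew"):
        print(path, "allowed" if is_crawlable(path) else "blocked")
```

Disallow rules match by path prefix, so the query string after search.php is covered by the same rule. This check only shows how a well-behaved spider reads the file; it does nothing to stop clients that ignore robots.txt.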



5 Comments
November 15th, 2008 at 4:38 am
It is very helpful
thanks
January 16th, 2009 at 12:14 pm
I think this is also important:
Disallow: /faq.php
Why generate the same duplicate content as a thousand other sites with a boring standard FAQ?
February 6th, 2009 at 1:28 pm
Scan robots or spiders don’t follow the robots.txt; this is an optional thing, and it is very easy to change your identity. I can crawl any website with ease and make it believe I am a Google robot. I can post instructions on this here if you need proof, but just search Google and you will find tons of answers. So the above will not stop crawling of your website.
general.useragent.override Googlebot/2.1 (+http://www.googlebot.com/bot.html)
This can be done using firefox also.
Just search “changing user agent in firefox”
February 6th, 2009 at 1:31 pm
Also: Think about this, even if you could stop a scammer from crawling your website, you would also stop any search engine from indexing your website. So the above comment is a joke:
Disallow: /faq.php
What would that do? OK, so you would keep your FAQ all to yourself and not share it with Google and the rest of the search engines???
Oh yeah, that’s good advice!!
February 18th, 2009 at 6:17 am
Why disallow the faq?
Because there are thousands of vBulletins out there and most of the operators leave the FAQ as it is by default.
So do I.
That’s a lot of text which is exactly the same on thousands of other sites = duplicate content.
Why not prevent that?
Of course if you modified the faq you should allow bot access.
That’s what I meant!