Blocking index.php using robots.txt

28 January 2009

All websites should have a robots.txt file, which either allows or disallows Google access to certain pages or directories of your website.

If you don't need to block Google from any of your pages, you should still have a robots.txt file - simply one that tells Google it is allowed to crawl everything.
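A minimal robots.txt that does exactly that - letting every crawler into everything - would look something like this (the empty Disallow line means nothing is blocked):

User-agent: *
Disallow: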

The robots.txt file is often used to stop Google from crawling duplicate content, which Google can treat as cheating and may penalise a website for.
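As a quick sketch, suppose printer-friendly duplicates of your pages lived under a /print/ directory (a made-up path, purely for illustration); you could keep crawlers away from the duplicates like so:

User-agent: *
Disallow: /print/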

What I'm wondering is - what happens if we disallow Google from indexing index.php?

Although in my case the complete Content Management System runs off index.php alone, I don't actually have any links pointing to index.php, so disallowing it would, in my eyes, be OK.
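The rule I have in mind would be something along these lines:

User-agent: *
Disallow: /index.php

One thing worth noting: Disallow works by URL prefix, so this would also block URLs such as /index.php?page=about. On the other hand, if the CMS rewrites its URLs so that index.php never appears in the address bar, those rewritten URLs would not be blocked at all, since robots.txt matches the URL, not the script behind it.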

However, I have recently had a problem with a website that was 3rd in Google one day and not even in Google's index the next. Please read this report at PeterNichol.com.

Even if access to index.php were disallowed in the robots.txt file, I still wouldn't expect the effect to be as drastic as the whole site being removed from Google's index.

This is something I will need to look into further, and I may need to play with the robots.txt file until the site is once again listed in Google.

If anyone knows any reason why this may have happened, please do contact me ASAP.






Current comments:

Ashley Ward says:

Ouch, bad result for peternichol.com

I also don't think this should affect the website though, as there are no links to index.php. But with the whole site hanging off index.php, I think I'd just allow the page in the robots.txt file, just to be safe. I'm no expert though :S

29 January 2009 12:15am

