Robots.txt file and disallowing irrelevant folders

Posted by Community Admin on 05-Aug-2018 23:45

All Replies

Posted by Community Admin on 30-Jan-2017 00:00

I'm new to the Sitefinity deployment environment, and I want to use the robots.txt file to disallow access to irrelevant folders that don't need to be crawled.

1.  Does anyone have best practices or information on which folders to block from crawling (script and code folders, etc.)?

2.  Also, which folder should the file be placed in? Obviously the root folder, but what name does it go by, if any?

Thanks,

lk

Posted by Community Admin on 31-Jan-2017 00:00

Hi Liz,

The file should be named robots.txt, and it must be in the root of your site. Basically, there are no folders or files you need to list in it. Search engines will figure it out themselves. If you have pages that should not be found in search engines (e.g. your error page or iframes), you can optionally list them. You can also add a line that points to your XML sitemap if you have one. A robots.txt file then looks like this, for example:

Sitemap: https://www.example.com/sitemapxml.xml

User-agent: *
Disallow:
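To sanity-check a file like the one above, Python's standard urllib.robotparser module can parse it and report what a compliant crawler would be allowed to fetch. This is only a sketch: the /error-page path and the ExampleBot user agent are made-up placeholders, not Sitefinity defaults, and the https:// scheme on the sitemap URL is an assumption (the Sitemap line needs a full URL).

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content. "/error-page" is a hypothetical path used
# only for illustration; replace it with your own error page, if any.
ROBOTS_TXT = """\
Sitemap: https://www.example.com/sitemapxml.xml

User-agent: *
Disallow: /error-page
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = load_rules(ROBOTS_TXT)

# "ExampleBot" is a placeholder user agent; it falls under the "*" group.
print(parser.can_fetch("ExampleBot", "https://www.example.com/"))            # True
print(parser.can_fetch("ExampleBot", "https://www.example.com/error-page"))  # False

# site_maps() needs Python 3.8 or newer.
print(parser.site_maps())
```

Everything that is not listed under Disallow stays crawlable, which is why the file can stay this short.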

Posted by Community Admin on 31-Jan-2017 00:00

Hi Arno

I understand robots.txt, but I am not totally familiar with Sitefinity on Windows. I typically use Apache. So my question remains the same: which folders are irrelevant? This does make a difference when it comes to SEO. For example, in WordPress environments there are folders containing code and scripts that we do not want search engines to crawl, and I was curious whether Sitefinity is similar. I would think there should be best-practices information on this issue.

thanks.

Posted by Community Admin on 31-Jan-2017 00:00

Hi Liz,

There's really no need to add such folders to robots.txt, regardless of which CMS or operating system you're using. Search engines start crawling at your homepage, or even another page, and simply follow links from there. They are not interested in anything but pages and assets like images, videos, etc. Excluding a folder like \bin or \Templates does not make a difference.
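The link-following behaviour described above can be illustrated with a toy crawler. This is a heavily simplified sketch, not how any real search engine works: the SITE dictionary is a made-up in-memory site, and real crawlers also use sitemaps, external links and other signals. The point is only that a file nobody links to, such as the hypothetical /bin/app.dll, is never discovered by following links.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory site: path -> HTML body. No page links to /bin/.
SITE = {
    "/": '<a href="/about">About</a> <a href="/news">News</a>',
    "/about": '<a href="/">Home</a>',
    "/news": '<a href="/about">About</a> <img src="/images/logo.png">',
    "/bin/app.dll": "",
}


class LinkExtractor(HTMLParser):
    """Collects the href and src attribute values found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(site, start="/"):
    """Breadth-first crawl that only discovers paths reachable via links."""
    seen = {start}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        extractor = LinkExtractor()
        # Paths missing from the toy site (e.g. image files) have no HTML.
        extractor.feed(site.get(path, ""))
        for link in extractor.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


discovered = crawl(SITE)
print(sorted(discovered))
print("/bin/app.dll" in discovered)  # False: nothing links to it
```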

The less you exclude, the better. A few years ago there was some concern that excluding CSS files was actually harmful, as search engines could think you had something to hide. I'm not sure if that's still the case, but in any case there is no need to block specific folders or scripts.

Perhaps someone else can provide a list of folders you can exclude. I don't have one as I never did it, and never had an issue with search engines either ;-)

Note that robots.txt is just a suggestion to search engines. It does not guarantee anything. If there are files that should not be seen by the public, they need to be password protected.
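The difference between a suggestion and real protection can be sketched in a few lines. Everything here is hypothetical: the /private/ folder, the bot names and the server_allows function are placeholders, and the authentication check stands in for whatever password protection your server actually provides.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks crawlers to stay out of a hypothetical /private/ folder.
advisory_rules = RobotFileParser()
advisory_rules.parse(["User-agent: *", "Disallow: /private/"])

URL = "https://www.example.com/private/report.pdf"


def polite_bot_will_request(url):
    """A well-behaved crawler consults robots.txt before requesting a URL."""
    return advisory_rules.can_fetch("PoliteBot", url)


def rude_bot_will_request(url):
    """A badly behaved crawler never reads robots.txt and requests anything."""
    return True


def server_allows(path, authenticated):
    """Stand-in for server-side protection: /private/ requires a login."""
    return authenticated or not path.startswith("/private/")


print(polite_bot_will_request(URL))  # False: it chooses to stay away
print(rude_bot_will_request(URL))    # True: robots.txt did not stop it
print(server_allows("/private/report.pdf", authenticated=False))  # False
```

Only the last check is enforced by the server; the first two depend entirely on how the crawler chooses to behave.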
