Here Are Answers to Your Questions About Robots and Automation

Michael asks…
Is it possible to view an archive of a site that has robots.txt?
I am looking for an archive of a post that someone made on a site, but they have since left the site, and their post is gone as well. The site does not allow robots, so Google and the like do not have cached pages for it.
Any help?
conventi answers:
If you know the exact post URL, you can still view it, as long as the post wasn’t deleted or set to private.

Joseph asks…
What is the purpose of the robots.txt file on websites?
I want to know: what is the purpose of the robots.txt file on websites?
How can we use it, and what benefits does it bring?
conventi answers:
Robots.txt is a plain-text (not HTML) file you put at the root of your site to tell search robots which pages you would like them not to visit. Obeying robots.txt is by no means mandatory for search engines, but they generally honor what they are asked not to do.
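To make that concrete, here is a minimal sketch of how a well-behaved crawler might consult the rules before fetching a page. It uses Python's standard `urllib.robotparser`; the rules, the `ExampleBot` name, and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not taken from any real site.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(EXAMPLE_ROBOTS)

# A polite crawler asks before every fetch; nothing forces it to.
home_ok = parser.can_fetch("ExampleBot", "https://www.example.com/index.html")
private_ok = parser.can_fetch("ExampleBot", "https://www.example.com/private/notes.html")
print(home_ok, private_ok)
```

The check happens entirely on the crawler's side, which is why the file is a request rather than an access control.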

Lizzie asks…
Do I have to make a separate robots.txt and sitemap.xml for every subdomain of my website?
I have a website and 5 subdomains, and only my main homepage has a sitemap.xml and robots.txt. I want all of my subdomains to be indexed by search engines. Do I have to make a robots.txt and sitemap.xml for every subdomain?
conventi answers:
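Probably so, if you want to give crawlers rules for each one: robots.txt is read per hostname, so the file on your main domain does not apply to the subdomains.
You don’t want too many subdomains, because Google may consider it spam and penalize you by not showing you on its front pages.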
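As a rough illustration of why each subdomain is handled separately, this sketch derives the robots.txt location a crawler would look up for a given page. The `robots_url` helper and the example.com hostnames are hypothetical; the only assumption is the usual convention that the file sits at the root of the host being crawled.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Hypothetical main site and two subdomains.
pages = [
    "https://example.com/about",
    "https://blog.example.com/post/1",
    "https://shop.example.com/cart?item=3",
]
for page in pages:
    print(page, "->", robots_url(page))
```

Each hostname maps to its own file, so rules placed on example.com are never consulted for blog.example.com.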

Sandra asks…
Do I need a robots.txt file for my simple one page website?
I have a simple one page text only website. I went to submit it to the “Scrub the Web” search engine, and the “meta tag analyzer” said I need to construct a “robots.txt file” for my website. Should I do it?
conventi answers:
You don’t need a robots.txt file at all . . . It’s optional. And since your site is text only, and you want a search engine to find it, you definitely don’t need it. The robots.txt file is to stop search engine crawlers from getting tangled in parts of the site meant only for humans (like the forms for answering questions, or signing up for new accounts, that sort of thing). Also, if you’re on Google and Yahoo you probably don’t need to sign up anywhere else. Have a nice day!
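If you want to see why no rules means no restrictions, here is a small sketch with Python's standard `urllib.robotparser`. An empty rule set stands in for a site without any robots.txt rules; `ExampleBot` and the URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Simulate a robots.txt that contains no rules at all.
empty = RobotFileParser()
empty.parse([])

# With nothing disallowed, the parser treats every URL as fetchable.
allowed = empty.can_fetch("ExampleBot", "https://example.com/")
print(allowed)
```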

Steven asks…
How do I set up my robots.txt to only allow one directory?
I just did a site redesign in a new CMS and we didn’t move one of the pieces over due to lack of time. I only want one of the directories of the old site to be crawled. Is there any way to do this other than moving all of the content I don’t want into a folder and disallowing it and then allowing the directory I do want to be crawled?
conventi answers:
Hi!
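Basically, it should look like this:
User-agent: *
Allow: /theotherplace/
Disallow: /
Keep both rules under a single User-agent line. Disallow: / blocks everything, and the Allow line (listed first, since some parsers stop at the first matching rule) reopens the one directory you want crawled.
But look at the two links below for much more help.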
Greetings,
/// Micke
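Before deploying a rule set that allows only one directory, it can help to test it offline. This is a minimal sketch using Python's standard `urllib.robotparser`; the rules, the `is_allowed` helper, the `ExampleBot` name, and the example.com host are assumptions for illustration, and real search engines may resolve conflicting rules differently from this parser, which applies the first rule that matches.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: open one directory, block the rest.
RULES = """\
User-agent: *
Allow: /theotherplace/
Disallow: /
"""

def is_allowed(path: str, rules: str = RULES, agent: str = "ExampleBot") -> bool:
    """Check whether `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, "https://www.example.com" + path)

for path in ["/theotherplace/page.html", "/thisplace/page.html", "/"]:
    print(path, is_allowed(path))
```

Only the path under /theotherplace/ should come back as allowed; everything else falls through to the blanket Disallow.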
Powered by Yahoo! Answers