Harold
Posts:2494
Senior member
Member since: 2006-11-30
Subject: robots.txt file
I've never used a robots.txt file on my site in its 11+ years of existence. The site has approx. 300 static HTML pages, a small amount of JavaScript, and image files. Nothing fancy.

Does the site need a robots.txt file?

If so, what should the file have in it?
June 08, 2008 08:58PM
DamonHD
Posts:6158
Moderator
Member since: 2006-11-30
Subject: Re: robots.txt file
I almost never use robots.txt.

I do use it in one specific case: to keep monstrously stupid robots from spidering dead aliases of my main site. I do this in conjunction with behaviour-based defences, so that when the stupid robots ignore robots.txt anyway, I stomp on them!

I haven't felt the need beyond that since ~1993 or whenever we first put up a site.

So, you likely don't need one.

Rgds

Damon
June 08, 2008 09:10PM
GegaBit
Posts:3311
Senior member
Member since: 2006-11-30
Subject: Re: robots.txt file
Hi Harold,
I was going to write a complete answer, but found someone who explains it more simply:

[www.robotstxt.org]

Quote:
Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.
It works like this: a robot wants to visit a Web site URL, say www.example.com/welcome.html. Before it does so, it first checks for www.example.com/robots.txt, and finds:

Quote:
User-agent: *
Disallow: /

Quote:
Does the site need a robots.txt file?

For good bots (Google, Yahoo, Live...), only if you need to keep parts of your site from being indexed (like the images folder or cgi-bin, for example), or need to tell a major search engine to crawl your pages less often. If you're asking about strategic advantages, the answer is no: robots.txt will not improve your site's rankings.
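For the "what should it have in it" part: a file that keeps, say, an images folder and cgi-bin out of the index and asks for slower crawling might look like the ROBOTS_TXT string below. This minimal Python sketch feeds it to the standard library's urllib.robotparser to show how a compliant crawler would read it. The paths, the bot name "AnyBot" and the 10-second delay are assumptions for illustration, and Crawl-delay is a non-standard extension that not every search engine honours.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep /images/ and /cgi-bin/ out of the index and
# ask crawlers that support the non-standard Crawl-delay field to wait 10 s.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Crawl-delay: 10
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text the way a well-behaved crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in (
        "http://www.example.com/welcome.html",
        "http://www.example.com/images/logo.gif",
        "http://www.example.com/cgi-bin/form.pl",
    ):
        # can_fetch() answers: may this user agent request this URL?
        print(url, "allowed" if rp.can_fetch("AnyBot", url) else "disallowed")
    print("Crawl-delay:", rp.crawl_delay("AnyBot"))
```

Anything not matched by a Disallow line stays crawlable, and a site with no robots.txt at all is treated as fully crawlable.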

As for bad bots:
I think you're better off not opening that can of worms; it's like looking at your pillow under a microscope. If you haven't had to care what kind of bots visit your site for this long, count your blessings.



Edited 1 time(s). Last edit at 06/08/2008 09:19PM by GegaBit.
June 08, 2008 09:17PM
Harold
Posts:2494
Senior member
Member since: 2006-11-30
Subject: Re: robots.txt file
Thanks, guys. Why might I want to exclude my images folder from being indexed?

As for bad bots, well, I've heard of them. Perhaps I should be concerned about them?
June 08, 2008 09:22PM
GegaBit
Posts:3311
Senior member
Member since: 2006-11-30
Subject: Re: robots.txt file
1- The images bit was just an example.
2- Trust me, don't. Leave bad bots alone :)
June 08, 2008 09:45PM
Joshua
Posts:2831
Administrator
Member since: 2007-03-16
Subject: Re: robots.txt file
Quote:
User-agent: *
Disallow: /

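Just so you know: if you don't know robots.txt well, DON'T put this example on your site. "Disallow: /" under "User-agent: *" tells every robot that honours the file to stay out of your entire site.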
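Here is why, as a minimal Python sketch using the standard urllib.robotparser module (the URLs are made up): "Disallow: /" matches every path, so a robot that honours the file may fetch nothing at all, while an empty "Disallow:" line means the opposite and allows everything.

```python
from urllib.robotparser import RobotFileParser

BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"
ALLOW_EVERYTHING = "User-agent: *\nDisallow:\n"

# Hypothetical pages on the site.
PAGES = [
    "http://www.example.com/",
    "http://www.example.com/welcome.html",
    "http://www.example.com/articles/page42.html",
]


def allowed_pages(robots_txt: str, agent: str = "Googlebot") -> list:
    """Return the pages a compliant crawler would still fetch."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [url for url in PAGES if rp.can_fetch(agent, url)]


if __name__ == "__main__":
    print("Disallow: / ->", allowed_pages(BLOCK_EVERYTHING))  # no pages
    print("Disallow:   ->", allowed_pages(ALLOW_EVERYTHING))  # all pages
```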
June 08, 2008 09:55PM
James
Posts:1757
Moderator
Member since: 2006-11-29
Subject: Re: robots.txt file
I use it to keep some of the admin pages out of the search engines, but very sparingly.
June 09, 2008 08:13AM
annzeise
Posts:22
Junior member
Member since: 2008-12-14
Subject: Re: robots.txt file
Because someone might link directly to an image.

I had some biker on a biker board decide to use one of my images as his avatar. Linked directly to my site to do it. I'd rather he had just "stolen" the image and hosted it himself.

He posted a lot, and every time anyone read the board, the image was pulled from my site, using my bandwidth.

Finally, I switched out the image with a new red one that said something like "Bandwidth Thief!" and wrote to the owner of the board.
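Hotlink protection, such as an .htaccess rule, usually works by looking at the Referer header the browser sends along with the image request. The Python function below is only a sketch of that decision, not an Apache rule; the host names and file extensions are hypothetical, and because some browsers and proxies strip the Referer, requests with no Referer are let through here, which is a common but optional choice.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical: the host names allowed to embed our images.
OUR_HOSTS = {"www.example.com", "example.com"}
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


def is_hotlink(request_path: str, referer: Optional[str]) -> bool:
    """True if an image is being embedded by a page on someone else's site."""
    if not request_path.lower().endswith(IMAGE_EXTENSIONS):
        return False  # only images are protected in this sketch
    if not referer:
        return False  # no Referer (stripped by privacy settings or proxies): allow
    host = (urlparse(referer).hostname or "").lower()
    return host not in OUR_HOSTS


if __name__ == "__main__":
    print(is_hotlink("/images/bike.jpg", "http://www.example.com/gallery.html"))  # False
    print(is_hotlink("/images/bike.jpg", "http://bikerboard.example.net/thread/1"))  # True
    print(is_hotlink("/images/bike.jpg", None))  # False
```

When the answer is True, the server can refuse the request or, as Ann did by hand, send a substitute image instead.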

I mostly use the .htaccess file to ban certain spiders.

I recently had a problem with one spider getting "stuck" on my site and spinning around in circles. I found out that it kept landing on my 404 page and couldn't get out, so I changed all the relative links on that page to absolute ones. I'm going to see if that prevents spider traps. At least they won't use my bandwidth as badly.
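One common way a 404 page becomes a spider trap: if the 404 page is served at whatever nonexistent URL was requested, every relative link on it is resolved against that URL, so following the link produces a new, deeper nonexistent URL, which returns the same 404 page with the same link, and so on. This minimal Python sketch uses the standard urllib.parse.urljoin to simulate a crawler following one hypothetical link ("articles/index.html") a few times from a made-up missing page; the absolute form ("/articles/index.html") resolves to the same address every time.

```python
from urllib.parse import urljoin


def follow(start: str, link: str, steps: int) -> list:
    """Resolve the same link repeatedly, as a naive crawler would if every
    resulting URL served the same 404 page containing that link."""
    urls = [start]
    for _ in range(steps):
        urls.append(urljoin(urls[-1], link))
    return urls


if __name__ == "__main__":
    start = "http://www.example.com/missing/page.html"
    for url in follow(start, "articles/index.html", 3):   # relative: keeps growing
        print(url)
    print()
    for url in follow(start, "/articles/index.html", 3):  # absolute: stays put
        print(url)
```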

Ann Zeise
A to Z Home's Cool
[homeschooling.gomilpitas.com]
December 17, 2008 12:27PM
Harold
Posts:2494
Senior member
Member since: 2006-11-30
Subject: Re: robots.txt file
Ann, I use .htaccess to prevent hotlinking, so I don't think I would ALSO need robots.txt to do the same, right?
December 17, 2008 04:43PM
GegaBit
Posts:3311
Senior member
Member since: 2006-11-30
Subject: Re: robots.txt file
robots.txt prevents nothing.
It acts as a recommendation that only well-behaved bots (a minority of what crawls out there) follow, nothing more.
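To put that another way: the robots.txt check is code that runs inside the robot, not on your server. In this minimal Python sketch (the /admin/ path, the URLs, the bot names and the honours_robots_txt switch are all made up for illustration), the same file stops the polite crawler but has no effect on one that simply skips the check. That is why anything you actually need to block, hotlinking included, has to be handled on the server side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking robots to skip an admin area.
ROBOTS_TXT = "User-agent: *\nDisallow: /admin/\n"

URLS = [
    "http://www.example.com/index.html",
    "http://www.example.com/admin/login.html",
]


def urls_requested(urls, robots_txt, agent, honours_robots_txt):
    """Return the URLs a crawler would actually request.

    The robots.txt check runs inside the crawler itself; a crawler that
    skips it (honours_robots_txt=False) requests everything, and nothing
    in robots.txt stops it.
    """
    if not honours_robots_txt:
        return list(urls)
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [url for url in urls if rp.can_fetch(agent, url)]


if __name__ == "__main__":
    print("polite bot:", urls_requested(URLS, ROBOTS_TXT, "PoliteBot", True))
    print("bad bot:   ", urls_requested(URLS, ROBOTS_TXT, "BadBot", False))
```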
December 17, 2008 05:31PM
Roger
Posts:46
Junior member
Member since: 2008-12-16
Subject: Re: robots.txt file
One thing I do with robots.txt is to exclude a "bad bot trap" page. Hidden in the regular pages of my site is a link to the trap. I record the IP addresses of any visits to the trap page and periodically review them. This gives me the IPs of bad robots I can block with my firewall or .htaccess file.
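A trap like this could be sketched as follows, assuming an Apache-style "combined" access log and a hypothetical trap URL /bot-trap.html that is listed under Disallow in robots.txt and linked invisibly from normal pages. Well-behaved robots read robots.txt and never request it, so any IP that does is a candidate for blocking, after a manual review, since the occasional human can stumble on the link too. The log lines and IP addresses below are invented sample data.

```python
import re
from collections import Counter

# Hypothetical trap URL. robots.txt would contain:
#   User-agent: *
#   Disallow: /bot-trap.html
TRAP_PATH = "/bot-trap.html"

# Client IP, then the requested path, from an Apache "combined" log line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')

SAMPLE_LOG = [
    '203.0.113.7 - - [18/Dec/2008:22:40:01 -0800] "GET /bot-trap.html HTTP/1.1" 200 512 "-" "BadBot/1.0"',
    '198.51.100.2 - - [18/Dec/2008:22:41:10 -0800] "GET /robots.txt HTTP/1.1" 200 64 "-" "GoodBot/2.1"',
    '203.0.113.7 - - [18/Dec/2008:22:42:30 -0800] "GET /bot-trap.html?x=1 HTTP/1.1" 200 512 "-" "BadBot/1.0"',
]


def trap_visitors(log_lines):
    """Count requests for the trap page per client IP."""
    hits = Counter()
    for line in log_lines:
        m = LOG_LINE.match(line)
        # Ignore any query string so /bot-trap.html?x=1 still counts.
        if m and m.group(2).split("?", 1)[0] == TRAP_PATH:
            hits[m.group(1)] += 1
    return hits


if __name__ == "__main__":
    # IPs to review by hand before blocking them in .htaccess or a firewall.
    for ip, count in trap_visitors(SAMPLE_LOG).most_common():
        print(ip, count)
```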

-- Roger
December 18, 2008 10:52PM
Ian C. Purdie
Posts:2220
Senior member
Member since: 2008-12-12
Subject: Re: robots.txt file
Quote:
This gives me the IPs of bad robots I can block with my firewall or .htaccess file.

I had some spamming fool who had a bot to fill out my contact forms with stupid junk.

I thought of going your way Roger until I quickly found out there were literally thousands of IPs he was using.

Used another method.
December 18, 2008 11:03PM
Roger
Posts:46
Junior member
Member since: 2008-12-16
Subject: Re: robots.txt file
> I had some spamming fool who had a bot to fill out
> my contact forms with stupid junk.
>
> I thought of going your way Roger until I quickly
> found out there were literally thousands of IPs
> he was using.
>
> Used another method.

A lot of the "spambots" that spam online forms and forums and send email spam are Trojan horse programs that get installed when someone downloads a free game or their computer gets a virus. That's one reason why they can spam you from so many different IP addresses.

[soapbox]
So, it's important to advise your visitors to keep their antivirus software up-to-date and to use programs like Spybot-S&D periodically.
[/soapbox]

-- Roger
December 23, 2008 06:57PM
