Last modified, Expires and GYM - a big deal?
Posted by Dan Frost on Mon, 06/08/2007 - 13:33
From here on in I'm going to refer to Google, Yahoo and MSN as "GYM". As far as I'm concerned, they're the three search engines most people concern themselves with. We've been doing some work on 304-related stuff and have come to a few conclusions.

GYM have a limited amount of crawling capability and obviously don't distribute this resource evenly. They will give more resource to sites that are more "important" than others. So if you have a large number of high-quality links to your site, there's a good chance it's fairly well crawled, both in terms of regularity and depth. Let's call this large number of quality links "love".

So if a website doesn't have that much love, it will only get a bit of attention. So how about we make the most of that attention by guiding the crawler to the pages we want "loving"? Pah! I hear you say, we already do that. I've got unique page titles, a good folder and internal linking structure and lovely, unique content on every page. That's great, but if you don't have enough love, GYM won't even look for that page because it will be busy re-crawling your main pages as regularly as possible... unless you use the Expires or Last-Modified headers.

Matt Cutts has said that if they check the HTTP status of a page and don't get a 304, but then see the page hasn't changed, they get a bit miffed. It makes perfect sense: why should they re-crawl a page when it hasn't changed? It's a waste of time and love...

Unless your homepage changes significantly each day, why not try keeping its Last-Modified date fixed for, say, a week? Perhaps, and I'm guessing here, GYM will give its love to other pages on your site that were previously unloved. This matches what we've found in creating crawlscore: if a page tells us it's not changed, we can quickly move on to another page.
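To make the 304 idea a bit more concrete, here's a minimal sketch of how a server might answer a crawler's conditional request. It isn't how GYM or crawlscore work internally; the function name `respond` and the one-week `max_age` default are my own assumptions for illustration. The only real pieces are the standard HTTP headers (`If-Modified-Since`, `Last-Modified`, `Expires`) and Python's standard library.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, if_modified_since=None, now=None,
            max_age=timedelta(days=7)):
    """Return (status, headers) for a GET on a page.

    Hypothetical helper: last_modified and now must be UTC-aware datetimes,
    if_modified_since is the raw request header value (or None).
    """
    now = now or datetime.now(timezone.utc)
    # HTTP dates only have one-second resolution, so drop the microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        # Assumption: tell clients the page is good for a week.
        "Expires": format_datetime(now + max_age, usegmt=True),
    }
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: just send the full page
        if since is not None and since.tzinfo is not None \
                and last_modified <= since:
            # Nothing changed since the crawler's copy: no body needed.
            return 304, headers
    return 200, headers


if __name__ == "__main__":
    changed = datetime(2007, 8, 1, 12, 0, tzinfo=timezone.utc)
    print(respond(changed, "Wed, 01 Aug 2007 12:00:00 GMT")[0])
    print(respond(changed, "Tue, 31 Jul 2007 12:00:00 GMT")[0])
```

A crawler that already holds the current copy gets a 304 and can move on; one with an older copy (or no `If-Modified-Since` header at all) gets the full page with a 200.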
In GYM's case, it may be that you have, say, 4 minutes of crawler time every 48 hours (depending on how much love you have), so if that time is taken up re-crawling pages that have hardly changed, the crawlers never get to the other pages you really do need crawling.

This would explain why some big players who have poorly optimised sites still get indexed. They have so much link love that GYM can spend the time to trawl through it. For those less fortunate, it could be that the more help you give GYM, the more they will help you.
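Here's a back-of-the-envelope sketch of that crawl-time argument. Only the 4-minute budget comes from the guess above; the page counts and the per-request costs (2 seconds for a full fetch, 0.2 seconds for a 304) are made-up numbers, and `pages_crawled` is a hypothetical helper, not anything GYM has published.

```python
def pages_crawled(budget_ms, unchanged_flags, full_fetch_ms, not_modified_ms):
    """Count how many pages a crawler gets through, in order, before its
    time budget runs out. unchanged_flags[i] is True if page i answers 304."""
    spent = 0
    crawled = 0
    for unchanged in unchanged_flags:
        cost = not_modified_ms if unchanged else full_fetch_ms
        if spent + cost > budget_ms:
            break  # out of crawler time; the remaining pages go unvisited
        spent += cost
        crawled += 1
    return crawled


BUDGET_MS = 4 * 60 * 1000   # "4 minutes of crawler time", as guessed above
FULL_FETCH_MS = 2000        # assumption: a full page download
NOT_MODIFIED_MS = 200       # assumption: a quick 304 reply

# 100 main pages that haven't changed, followed by 100 deeper pages.
without_304 = [False] * 200
with_304 = [True] * 100 + [False] * 100

if __name__ == "__main__":
    print(pages_crawled(BUDGET_MS, without_304, FULL_FETCH_MS, NOT_MODIFIED_MS))  # 120
    print(pages_crawled(BUDGET_MS, with_304, FULL_FETCH_MS, NOT_MODIFIED_MS))     # 200
```

With these invented numbers, the crawler reaches only 20 of the deeper pages when every main page is fetched in full, and all 100 of them when the unchanged main pages answer with a 304.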