Frequently Asked Questions

What is Expires?

Expires is an HTTP response header containing a date and time. Until that date has passed, a browser or cache may reuse its stored copy of the file without asking the server for it again.

Yahoo uses Expires (http://developer.yahoo.net/blog/archives/2007/05/high_performanc_2.html) and recommends changing the filename whenever you need to change a file that was previously served with a far-future Expires date, because browsers will not re-request the old filename until that date has passed.

Generally speaking, Expires is less relevant on smaller sites.
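
As a rough illustration of how a browser or cache might use the header, the sketch below parses an Expires value and checks whether a stored copy may still be reused. The function name is_fresh and the example date are made up for this illustration, and real caches also take other headers (such as Cache-Control) into account.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header, now):
    """Return True if a cached copy may still be reused at `now`."""
    try:
        expires_at = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        # An unparseable Expires value (for example "0") is treated
        # as already expired.
        return False
    if expires_at.tzinfo is None:
        # Assumption for this sketch: a date without a zone is UTC.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now < expires_at


header = "Thu, 15 Apr 2010 20:00:00 GMT"  # example value, not from a real site
print(is_fresh(header, datetime(2010, 1, 1, tzinfo=timezone.utc)))  # True
print(is_fresh(header, datetime(2011, 1, 1, tzinfo=timezone.utc)))  # False
```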

What is "Last Modified"?

A web server will generally generate a Last Modified date automatically. If the content is dynamic, the Last Modified date will usually change on every request unless you control it through your publishing / content management system.

Last Modified is used by browsers and search engine crawlers alike to see if the content has changed since the webpage was last downloaded.
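
The check described above can be sketched as follows. This is only a minimal illustration of the server's side of the exchange: the client sends back the date it was given (in an If-Modified-Since request header) and the server answers 304 (not modified) or 200 (here is the new content). The function name modified_status is an assumption made for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def modified_status(last_modified, if_modified_since):
    """Return 304 if the page is unchanged since the client's copy, else 200."""
    if if_modified_since is None:
        return 200  # first visit: nothing to compare against
    try:
        client_copy = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unreadable date: send the full page
    if client_copy.tzinfo is None:
        client_copy = client_copy.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution.
    unchanged = last_modified.replace(microsecond=0) <= client_copy
    return 304 if unchanged else 200


page_changed = datetime(2007, 5, 1, 12, 0, tzinfo=timezone.utc)
header = format_datetime(page_changed, usegmt=True)  # value sent as Last-Modified

print(modified_status(page_changed, None))    # 200
print(modified_status(page_changed, header))  # 304
```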

What is an ETag?

An ETag is an HTTP response header (not generally seen by the user) that identifies a particular version of the content at a URL. ETags are generally used for caching.

A crawler or browser stores the ETag when the page is first requested and sends it back when re-requesting the page. If the ETag still matches, the server can answer that the page hasn't changed, so the local, cached version can be used, which saves downloading the document again.

They are generated by the web server for each document. If the document changes then a new ETag will be generated automatically.
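
To make this concrete, here is a small sketch of one way an ETag could be generated and compared. Hashing the document body is an assumption chosen for the example (web servers use various schemes), and make_etag and etag_status are hypothetical names. The sketch also ignores details such as weak ETags and lists of several ETags in one request.

```python
import hashlib


def make_etag(body):
    """Derive a quoted ETag from the document body (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def etag_status(body, if_none_match):
    """Return 304 when the client's ETag matches the current one, else 200."""
    if if_none_match is not None and if_none_match == make_etag(body):
        return 304
    return 200


original = b"<html><body>Hello</body></html>"
stored_etag = make_etag(original)  # what the browser kept from its first visit

print(etag_status(original, stored_etag))                            # 304
print(etag_status(b"<html><body>Changed</body></html>", stored_etag))  # 200
```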

Live.com use ETags but Yahoo! do not. It is thought that Google also ignores them but this hasn't been confirmed.

Do I need a sitemap?

Yes. Search engines need to see as many of your pages as possible to be able to have them in their index. The more pages of your site in a search engine's index, the more chance you have of a user finding your content via a search engine. Sitemaps help search engines to crawl your site properly, which could result in an increase in traffic.

A sitemap enables a crawler to view all of your pages and ensure that they are read and stored correctly. If you don't use a sitemap then a crawler has to try and find links to your pages from your website. This isn't always as easy as you'd expect and a sitemap can have a dramatic effect on the way a search engine crawls your site.
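
For readers who have not seen one, an XML sitemap is essentially a list of page URLs. The sketch below builds a very small one using the sitemaps.org namespace. It is not how crawlscore generates its sitemaps; build_sitemap and the example.com URLs are assumptions for illustration, and optional fields such as lastmod are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serialising.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/about",
]))
```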

Why use crawlscore?

Most people view and check their website through their browser. Google, Yahoo and other search engines see your site in a very different way to most browsers so many potential issues are hidden from your view.

crawlscore gives you a search engine view of your website via reports and charts that enable you to understand the structure of your website better.

It may be that your site has lots of duplicate page titles, temporary redirects and so on that affect your rank in the search engines.

The other major benefit is that crawlscore will generate sitemaps for Google, Yahoo and every other widely used format.

All of this is available without installing any software on your PC or server. Your sitemaps are automatically generated at the end of each crawl and you can upload them straight to Google and/or your site.

Is the site internal linking structure important?

Absolutely. When a crawler looks at your site it's trying to find out what the most important pages are - it's safe to assume that a page with many internal links to it is probably something you want people to see so the search engines treat it accordingly.

Also, a crawler wants, as quickly as possible, to find all your pages and get them indexed - if your site structure is good and easy to follow then you make the crawler's job easier.

A sitemap can help with this but you really should ensure that your site "flows"...
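
One simple way to look at your own internal linking is to count how many other pages on the site link to each page. The sketch below does this for a hand-written set of pages. It is a simplified illustration only: inbound_link_counts is a hypothetical name, the URLs are examples, and search engines weigh links in far more sophisticated ways.

```python
from collections import Counter
from urllib.parse import urljoin, urlparse


def inbound_link_counts(pages):
    """Count internal links pointing at each URL.

    `pages` maps a page URL to the list of href values found on that page.
    Links to other hosts and links from a page to itself are ignored.
    """
    counts = Counter()
    for source, hrefs in pages.items():
        host = urlparse(source).netloc
        # Resolve relative links, and count each target once per source page.
        targets = {urljoin(source, href) for href in hrefs}
        for target in targets:
            if urlparse(target).netloc == host and target != source:
                counts[target] += 1
    return counts


site = {
    "http://www.example.com/": ["/about", "/products", "http://other.example.org/"],
    "http://www.example.com/about": ["/products", "/"],
    "http://www.example.com/products": ["/"],
}
for url, count in sorted(inbound_link_counts(site).items()):
    print(count, url)
```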

Do I really need to worry about meta tags?

Probably. Most search engines ignore them because they have been abused, but unique, keyword-rich meta keywords and descriptions can help with search engines such as Yahoo.

Why are duplicate page titles a problem?

Page titles are widely regarded as one of the most important factors of Search Engine Optimisation (SEO). Not only are page titles key in the way your page appears in search engine results but they also have an effect on the way the search engines rank your site or page.

We believe that each and every page on your website should have a unique page title. If you're using Javascript to dynamically generate your page titles, remember that neither search engines nor crawlscore (by design) will be able to see them.
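
If you want a quick check of your own, the sketch below reads the title from each page's HTML and reports any title shared by more than one URL. It is not crawlscore's implementation; TitleParser, duplicate_titles and the sample pages are assumptions for illustration. Like a crawler, it only sees titles that are present in the HTML itself, not ones set later by Javascript.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def duplicate_titles(pages):
    """Map each title used by more than one page to the URLs that use it."""
    by_title = defaultdict(list)
    for url, html in pages.items():
        parser = TitleParser()
        parser.feed(html)
        parser.close()
        by_title[parser.title.strip()].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


pages = {
    "/": "<html><head><title>Home</title></head></html>",
    "/about": "<html><head><title>About us</title></head></html>",
    "/news": "<html><head><title>Home</title></head></html>",
}
print(duplicate_titles(pages))  # {'Home': ['/', '/news']}
```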

Does size matter?

Assuming we're talking about the size of HTML, Javascript or images, then yes, it can have an effect.

This is only our opinion, but a page with over 100k of HTML may be fairly difficult to crawl. Our crawler can handle much larger pages because it's designed to do so, but a page that size probably isn't that useful to users either, so it's worth considering breaking the page down into smaller chunks if possible.

All of the main search engine crawlers have a lot of work to do in crawling billions of web pages. If a crawler comes across a site that's making it work hard perhaps it will move on to another site.

Why should I care about Javascript on my web pages?

In some cases you shouldn't, but if your links are written in Javascript then the search engines can't find them, which means some of your pages may not be indexed.

The rule of thumb is that if you do anything in Javascript then, generally speaking, a search engine can't read it.
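
The sketch below shows why. It is a very simple link finder that, like a crawler that does not run Javascript, only looks for ordinary anchor tags in the HTML. LinkParser, find_links and the sample page are made up for this illustration. Only the plain link is found; the two Javascript-driven ones are invisible to it.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag, without running any scripts."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def find_links(html):
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


PAGE = """
<a href="/about">About</a>
<span onclick="window.location='/products'">Products</span>
<script>function go() { window.location = "/contact"; }</script>
"""

print(find_links(PAGE))  # ['/about']
```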

How is crawlscore different from Google Webmaster Tools?

Google Webmaster Tools (GWT) is a very handy application and it's free. You can, and you should, sign up here: www.google.com/webmasters/sitemaps/

crawlscore differs in a number of ways but in essence we believe that our software is more comprehensive and easier to use. We are able to crawl a complete site in a fairly short period of time - apart from anything else this is because we don't have billions of websites to crawl unlike Google!

We will have a comparison matrix here so you can see the key differences, but it has to be said that we feel crawlscore complements GWT. You can use crawlscore to ensure your site is crawlable and then, in due course, this should be reflected in GWT.

As an example, you might run crawlscore on your 100-page website today. It should be fully crawled, with completed reports available, by the next day. You can then make any necessary changes and re-crawl to see if your changes worked. To do this in GWT could take months.

Obviously crawlscore shows a lot of information that is currently outside the remit of GWT. For Google to start talking about factors such as Javascript, page size and so on could be a minefield for them.

What is a 301, 302, etc?

When a webpage is requested, the web server will reply to the browser or crawler with an HTTP status code. Users are generally not aware of them but browsers and crawlers use them extensively.

The most common codes are:

200 - page ok
301 - permanent redirect
302 - temporary redirect
404 - file not found
500 - internal server error

If a crawler comes across a site with a lot of 301, 302 or 404 responses, this can lead to crawling problems. 302s in particular can be used (along with other frowned-upon SEO techniques) to mislead search engines.

Ideally your site will return no 302, 404 or 500 responses. 301s often have to be used, for example when redirecting domain names (e.g. example.com to www.example.com), but they should be used sparingly.
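
As a small illustration of the advice above, the sketch below takes a set of crawl results (URL and status code) and picks out the ones worth a second look. The results shown are invented, describe and needs_attention are hypothetical names, and treating every 302, 404 and 5xx response as a problem is just the rule of thumb from this answer, not a fixed standard.

```python
from http import HTTPStatus


def describe(code):
    """Return the code with its standard reason phrase, if it has one."""
    try:
        return f"{code} - {HTTPStatus(code).phrase}"
    except ValueError:
        return f"{code} - unknown"


def needs_attention(results):
    """Return the URLs whose status is a 302, a 404 or any 5xx code."""
    return [url for url, code in results.items()
            if code in (302, 404) or 500 <= code <= 599]


crawl = {  # invented example data
    "/": 200,
    "/old-page": 301,
    "/promo": 302,
    "/missing": 404,
    "/broken": 500,
}
for url, code in crawl.items():
    print(url, describe(code))
print(needs_attention(crawl))  # ['/promo', '/missing', '/broken']
```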

crawlscore will show other HTTP codes. For a full list of HTTP codes please see the following links:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
http://en.wikipedia.org/wiki/List_of_HTTP_status_codes