How is Crawl Score going to benefit your organisation?
Key pieces of unique functionality
| Identifies which pages link to a 404 (broken link) |
| Generates site maps in all widely used formats |
| Reporting interface gives site-wide information on structure, metadata and internal link weight |
Standard edition functionality
Report and filter HTTP status of every page
| 301 (permanent redirects) |
| 302 (temporary redirects) |
| 404 (page not found) |
| 500 (internal server error) |
| All other HTTP status codes |
- Increased search engine traffic
- Improved user experience
The HTTP status of a page can change the way a search engine crawls it, so it matters to both users and search engine crawlers that every page returns the correct status.
With so many sites being database-driven and managed from a Content Management System, it is relatively easy for the wrong status codes to be used, causing reduced traffic and/or a poor user experience.
The HTTP status of a page is generally not visible to a user in a web browser, so a website may appear to work just fine when it is tested. A full diagnostic crawl checks that pages are using the correct status codes.
Crawl Score reports the HTTP status of every page so that all pages can be viewed, checked and corrected if necessary.
HTTP reporting in depth:
A page status of 301 is used to indicate a “permanent redirect”. This means that the page has moved permanently, and the server usually “forwards” the crawler to the new URL (e.g. www.website.com/oldpage forwards to www.website.com/newpage).
This is useful when users have a particular page bookmarked or when a “friendly” URL is publicized. However, these pages should not generally be linked to internally (i.e. from pages of the website) or externally.
Search engines should generally disregard the original page (e.g. www.website.com/oldpage) and remove it from their index.
A page status of 302 is used to indicate a “temporary redirect”. This is generally used when a page is currently not available or when you would like search engines to re-visit the page in the future. If a 302 remains in place for a long period of time, it is possible that search engines will ignore the page as, by definition, the redirect should be temporary.
A page status of 404 is used to indicate “page not found”. A link to such a page is a broken link and is undesirable for users and search engine crawlers. A user who reaches this page is generally at a dead end and may leave the site.
If a search engine crawler finds an excessive number of broken links, this can harm your search engine ranking, as the crawler spends time following links that add no value.
Crawl Score also identifies which pages link to a 404 page (a broken link).
A page status of 500 is used to indicate an “internal server error”. This usually means that the site has an error in its code or database. To a user this is similar to a broken link, in that their journey is interrupted by an unexpected page.
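The status report and the "which pages link to a 404" report described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CRAWL` data, the function names and the data layout are hypothetical and are not taken from Crawl Score itself.

```python
from collections import defaultdict

# Hypothetical crawl result: URL -> (HTTP status, internal links found on that page).
CRAWL = {
    "/": (200, ["/products", "/oldpage", "/missing"]),
    "/products": (200, ["/", "/missing"]),
    "/oldpage": (301, []),
    "/missing": (404, []),
}


def status_label(status):
    """Map a status code to the categories listed in the report."""
    labels = {
        301: "permanent redirect",
        302: "temporary redirect",
        404: "page not found",
        500: "internal server error",
    }
    return labels.get(status, "other")


def pages_linking_to(crawl, status):
    """Return {target URL: sorted list of pages linking to it} for targets with the given status."""
    referrers = defaultdict(list)
    for page, (_, links) in crawl.items():
        for target in links:
            if target in crawl and crawl[target][0] == status:
                referrers[target].append(page)
    return {target: sorted(pages) for target, pages in referrers.items()}
```

Calling `pages_linking_to(CRAWL, 404)` lists, for each broken URL, the pages whose links need correcting; the same call with 301 shows internal links that still point at redirected pages.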
Report and filter page titles
| Filter by site section |
| Highlights duplications |
- Increased search engine rankings
- Improved click through rates
- Increased search engine traffic
Page titles in depth:
Search engines use page titles as a major factor in ranking. The page title is also the link text that appears in the search engine results, so it’s important that page titles are accurate and concise.
Crawl Score displays all page titles in a report so that you can check spelling, consistency and uniqueness.
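As an illustration of the uniqueness check, duplicated titles can be found by grouping URLs under a normalised title. The sketch below assumes the titles have already been collected into a dictionary; the function name and the normalisation rule (trim whitespace, ignore case) are assumptions, not a description of how Crawl Score works internally.

```python
from collections import defaultdict


def duplicate_titles(titles_by_url):
    """Group URLs by normalised title and keep only titles used by more than one page."""
    groups = defaultdict(list)
    for url, title in titles_by_url.items():
        # Assumption: titles differing only in case or surrounding spaces count as duplicates.
        groups[title.strip().lower()].append(url)
    return {title: sorted(urls) for title, urls in groups.items() if len(urls) > 1}
```

For example, if two pages carry the titles "Products" and "products ", both URLs are reported together under the single key "products".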
Report and filter meta keywords and meta descriptions
| Filter by site section |
| Highlights duplications |
| Highlights missing content |
- Increased click through rates
- Increased search engine traffic
Meta keywords/descriptions in depth:
Meta keywords are generally not used by search engines, but meta descriptions are generally displayed under the hyperlink in the search results (see Page Titles above).
Crawl Score reports on meta keywords and descriptions in such a way that you can quickly see any issues with spelling, grammar, consistency and accuracy.
Better meta descriptions could result in an improved click-through rate to your website and thus increased revenue.
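A "missing content" check of the kind listed above might look like the following sketch, which uses Python's standard `html.parser` module. The class and function names are hypothetical, and the rule that a whitespace-only description counts as empty is an assumption.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Names are compared case-insensitively; a missing content attribute becomes "".
            self.meta[name.lower()] = (attributes.get("content") or "").strip()


def description_issue(html):
    """Return "missing", "empty" or None for the page's meta description."""
    parser = MetaExtractor()
    parser.feed(html)
    if "description" not in parser.meta:
        return "missing"
    if not parser.meta["description"]:
        return "empty"
    return None
```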
Page sizes
| Filter by site section |
| Show pages that are larger than recommended sizes |
- Faster load times
- Improved user experience
Page size reporting in depth:
Crawl Score will calculate the size of the HTML of each page and show any pages that have more than 30k and/or 50k of HTML code. Although there are other elements that can influence the size of a web page (e.g. images and external JavaScript), excessive HTML can drastically affect the speed of a website.
By analyzing and optimizing the amount of HTML, it is likely that your website will load more quickly for users and be easier for search engines to crawl.
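The 30k and 50k thresholds can be expressed as a simple banding function. In this sketch the function name is hypothetical, and it assumes that "k" means 1,024 bytes of UTF-8 encoded HTML; the source does not say exactly how the size is measured.

```python
def html_size_band(html, warn_kb=30, limit_kb=50):
    """Classify a page by the size of its HTML against the two thresholds."""
    # Assumption: size is measured in bytes of UTF-8 encoded HTML, with 1k = 1024 bytes.
    size_kb = len(html.encode("utf-8")) / 1024
    if size_kb > limit_kb:
        return f"over {limit_kb}k"
    if size_kb > warn_kb:
        return f"over {warn_kb}k"
    return "ok"
```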
Internal link structure
| Show number of internal links to each page |
| Order by most or least linked to |
- More targeted search engine traffic
- Increased search engine traffic
Internal link structure reporting in depth:
The internal linking structure of a website can have a drastic effect on the way a search engine ranks the pages of that website.
If a large image links to an important section of your website, a user will realize that this is probably the page they’re looking for, but a search engine will see it as just one link. By ensuring you have sufficient links to the most important pages, you may be able to help search engines understand which parts of your website are key.
Crawl Score shows how many times each page is linked to from within the website in a report format, so you can view the most linked to, the least linked to and so on.
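Counting inbound internal links is a small graph exercise. The sketch below assumes a hypothetical mapping of each page to the internal links found on it, and returns the pages ordered from most to least linked to; the names are illustrative only.

```python
def inbound_link_counts(links_by_page):
    """Return (page, inbound link count) pairs, most linked to first."""
    counts = {page: 0 for page in links_by_page}
    for links in links_by_page.values():
        for target in links:
            # Only count links to pages that were actually crawled.
            if target in counts:
                counts[target] += 1
    # Sort by count descending, then by URL so that ties are stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Reversing the returned list gives the "least linked to" ordering, which is a quick way to spot important pages that have too few internal links.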
Caching reports
| Show Last Modified for each page |
| Order by Last Modified date |
| Show pages that have no Last Modified |
| Show ETags for each page |
| Show pages that have no ETag |
| Show Expires for each page |
| Show pages that have no Expires |
- Faster page load times
- Improved user experience
- Content crawled and indexed more regularly by search engines
- Improved search engine traffic
Caching reporting in depth:
Proper use of caching can have a massive effect on both the user experience and the way search engines crawl your site.
The “Last Modified” date is generally hidden from the user when using a web browser, but most browsers will check their local cache to see whether they already hold a copy of a page and whether its Last Modified date has changed. If the date has not changed, the web page will not be downloaded again, which improves performance.
A search engine crawler will do exactly the same thing and will therefore use its resources to crawl new content rather than re-download unchanged content.
“Expires” and “ETags” are used in a similar way, although “ETags” are used less widely by search engine crawlers.
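To make the mechanism concrete, the sketch below shows two things: a check for which of the three caching headers a response lacks, and the conditional request headers (`If-Modified-Since` and `If-None-Match`, both standard HTTP headers) that a browser or crawler can send when re-checking a cached page. The function names and the sample data are hypothetical.

```python
CACHE_HEADERS = ("Last-Modified", "ETag", "Expires")


def missing_cache_headers(response_headers):
    """List the caching headers that a response does not carry."""
    present = {name.lower() for name in response_headers}
    return [header for header in CACHE_HEADERS if header.lower() not in present]


def conditional_request_headers(cached_headers):
    """Build the headers for re-validating a cached copy of a page."""
    lookup = {name.lower(): value for name, value in cached_headers.items()}
    request = {}
    if "last-modified" in lookup:
        request["If-Modified-Since"] = lookup["last-modified"]
    if "etag" in lookup:
        request["If-None-Match"] = lookup["etag"]
    return request
```

If the page is unchanged, a server that supports these headers can answer with status 304 (Not Modified) and no body, which is what saves the download.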
Extended functionality
Metadata functionality
Metadata – a definition:
Metadata is data used to describe the nature and content of resources, in order to allow third-party systems to search those resources.
Metadata is extra information which one adds to a resource in order to describe it more efficiently. A parallel example is a library catalogue entry for a book: the book is the resource, and the catalogue entry contains extra information about the book, including author, title, publication date and so on. The library catalogue then allows someone to search for a given book without having to read every book in the library.
This is the role of metadata. It allows automated searches to find relevant resources without having to trawl through the entire contents of documents.
Most Public Sector websites, in the UK and abroad, should adhere to various metadata standards. Crawl Score has “libraries” of a number of worldwide standards, including eGMS 3.1, which is relevant to UK Public Sector websites.
In the UK the standard is known as the e-Government Metadata Standard (eGMS). It is part of the e-Government Interoperability Framework (eGIF), adoption of and compliance with which is mandatory.
- eGMS contains many elements which refer to specific information about resources (e.g. TITLE, AUTHOR, DATE, CONTRIBUTOR)
- some elements contain refinements which describe different aspects of each element (e.g. DATE.ISSUED, DATE.MODIFIED)
- we give these elements values to describe the resource (e.g. AUTHOR = John Doe)
- some elements require values in specific formats (e.g. DATEs must be formatted YYYY-MM-DD)
- some elements must contain values picked from a controlled vocabulary (e.g. AUDIENCE)
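The format and vocabulary rules in the list above lend themselves to automated checks. The following sketch validates DATE values against YYYY-MM-DD and AUDIENCE values against a vocabulary. The vocabulary entries are placeholders only (the real controlled list is not given here), and the function names are hypothetical.

```python
import re
from datetime import date

# Placeholder values: NOT the real eGMS controlled vocabulary for AUDIENCE.
AUDIENCE_VOCABULARY = {"Placeholder audience A", "Placeholder audience B"}


def is_valid_date(value):
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def validate(metadata):
    """Return a list of problems for a {element or refinement: value} mapping."""
    problems = []
    for name, value in metadata.items():
        # A refinement such as DATE.ISSUED is checked by the rule for its element (DATE).
        element = name.split(".")[0].upper()
        if element == "DATE" and not is_valid_date(value):
            problems.append(f"{name}: not a valid YYYY-MM-DD date")
        elif element == "AUDIENCE" and value not in AUDIENCE_VOCABULARY:
            problems.append(f"{name}: value not in controlled vocabulary")
    return problems
```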
Metadata reporting functionality
Crawl Score will crawl every page on a given website and extract the metadata specified.
This content is then available in a number of ways, including:
| Show Mandatory, Recommended and Optional tags for all pages |
| Count of compliant and non-compliant pages |
| Drill down to each tag |
| Filter metadata reports by site section (e.g. www.website.com/products) |
| Create and report upon custom tags |
| Extensive library of pre-built standards (e.g. DC 1.1, eGMS 3.1) |
| Show missing tags |
| Show where tags are present but content is empty |
An example of how a public sector organisation may use Crawl Score
Suppose a public sector customer wishes to check their eGMS 3.1 compliance. The chances are that their site will not be compliant (most aren’t).
To satisfy the minimum requirement, they need to have the following tags on each and every one of their web pages:
| eGMS 3.1 mandatory elements | | |
| eGMS.Accessibility | eGMS.Creator | eGMS.Date |
| eGMS.Identifier | eGMS.Publisher | eGMS.Subject |
| eGMS.Title | | |
1. eGMS.Accessibility
Example :
<meta name="eGMS.accessibility" scheme="WCAG" content="Double-A">
In this case, the customer would need to know the accessibility level of each page. This is unlikely and is not a trivial task (even though public sector sites should adhere to at least the WCAG A standard). However, the accessibility standards themselves are not within the remit of Crawl Score (CS).
The customer will need either to check the accessibility of each page or to have their CMS provider insert this data in each page (usually a fairly simple task).
They could “cheat” and insert the element with its content set to “not known”. This would actually comply with the regulations but obviously isn’t very useful.
2. eGMS.Creator
Example :
<meta name="eGMS.Creator" content="Corporate Affairs, Sunderland NHS" />
It is likely that this should be the same for each page on a given website so again, the
technology team could insert this value across the site.
3. eGMS.Date
If this element isn’t present then it can probably be extracted from the article details within
the CMS and then updated across the site.
4. eGMS.Identifier
Example :
<meta name="eGMS.identifier" content="http://purl.oclc.org/NET/e-GMS_v1">
In most cases, this is the URL of the page in question.
This could be generated “on the fly” by the CMS or inserted as a value by the tech team.
5. eGMS.Publisher
Example :
<meta name="eGMS.publisher" content="The Stationery Office, St Crispins, Duke Street, Norwich NR3 1PD, 0870 610 5522, esupport@theso.co.uk">
It is likely that this should be the same for each page on a given website so again, the technology team could insert this value across the site.
6. eGMS.Subject
Example :
<meta name="eGMS.subject.keyword" scheme="CurriculumOnline" content="En-0383 Joined-up writing">
This is perhaps one of the most difficult elements as it is likely that this will have to be
entered manually.
7. eGMS.Title
Example :
<meta name="eGMS.title" content="e-Government Metadata Standard version 3">
This element could probably be taken from the article title and therefore the technology
team could insert this value across the site.
From the above you can see that there is quite a bit of involvement from other departments, so it may take time for any changes to take place.
Once these changes have been carried out, a re-crawl will need to take place. A regular re-crawl will help to keep track of any new issues, as well as any other issues with the website (page titles, broken links, etc.).
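The per-page compliance check that each re-crawl repeats could be sketched as follows, using Python's standard `html.parser`. The element list comes from the table of mandatory elements; everything else (names, the case-insensitive matching, and treating a refinement such as eGMS.subject.keyword as satisfying its parent element) is an assumption made for illustration.

```python
from html.parser import HTMLParser

# The seven mandatory eGMS 3.1 elements, lower-cased for matching.
MANDATORY = ("accessibility", "creator", "date", "identifier",
             "publisher", "subject", "title")


class EgmsMetaParser(HTMLParser):
    """Collect the content of every <meta name="eGMS.*"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.elements = {}  # element name -> content

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if not name.startswith("egms."):
            return
        # Assumption: "egms.subject.keyword" counts towards the "subject" element.
        element = name.split(".")[1]
        content = (attributes.get("content") or "").strip()
        if content or element not in self.elements:
            self.elements[element] = content


def check_page(html):
    """Report missing tags, tags with empty content, and overall compliance."""
    parser = EgmsMetaParser()
    parser.feed(html)
    missing = [e for e in MANDATORY if e not in parser.elements]
    empty = [e for e in MANDATORY if e in parser.elements and not parser.elements[e]]
    return {"missing": missing, "empty": empty, "compliant": not missing and not empty}
```

Running `check_page` over every crawled page and counting the `compliant` flags would give the compliant and non-compliant page totals, with the `missing` and `empty` lists providing the drill-down.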
I think this is a reasonable example of how CS may be used. I guess its value diminishes over time, and therefore we may lose subscribers after 12 months. After the initial crawl(s) and fixes, CS takes on the role of quality assurance by regularly crawling the site and flagging up any problems.
There is a possibility that the perceived value of this service to the customer decreases over time. We need to consider this.

