How is Crawl Score going to benefit your organisation?

Key pieces of unique functionality

Identifies which pages link to a 404 (broken link)
Generates site maps in all widely used formats
Reporting interface gives site-wide information on structure, metadata and internal link weight
 

Standard edition functionality

Report and filter HTTP status of every page

301 (permanent redirects)
302 (temporary redirects)
404 (page not found)
500 (internal server error)
All other HTTP status codes
 
Benefits:
  • Increased search engine traffic
  • Improved user experience

The HTTP status of a page can change the way a search engine crawls the page and it is important for both the user experience and search engine crawlers that pages have the correct status.

With so many sites being database driven and managed from a Content Management System, it is relatively easy for the wrong status codes to be used, causing reduced traffic and/or a poor user experience.

The HTTP status of a page is generally not visible to a user in a web browser, so a website may appear to work just fine when it is tested. A full diagnostic crawl will show whether pages are using the correct status codes.

Crawl Score reports the HTTP status of every page in such a way that all pages can be viewed, checked and corrected if necessary.


HTTP reporting in depth:
A page status of 301 is used to indicate a “permanent redirect”. This means that the page no longer exists at that URL, and the server usually “forwards” the browser or crawler to the new URL (EG www.website.com/oldpage forwards to www.website.com/newpage).

This is useful when users have a particular page bookmarked or if a “friendly” URL is publicized. However, redirected URLs should generally not be linked to, either internally (IE from pages of the website) or externally.

Search engines should generally disregard the original page (EG www.website.com/oldpage) and remove it from their index.

A page status of 302 is used to indicate a “temporary redirect”. This is generally used when a page is currently not available or when you would like search engines to re-visit the page in the future. If a 302 stays in place for a long period of time, it is possible that search engines will ignore the page because, by definition, the redirect should be temporary.

A page status of 404 is used to indicate “page not found”. This is a broken link and is undesirable for users and search engine crawlers. If a user finds this page then generally they are at a dead-end and may leave the site.

If a search engine crawler finds an excessive number of broken links, this can have a negative effect on your search engine ranking, as the crawler will be spending time following links that add no value.

Crawl Score will also identify which pages link to a 404 page (a broken link).

A page status of 500 is used to indicate an “internal server error”. This usually means that the site has an error in its code or database. To a user this is similar to a broken link, in that their journey is interrupted by an unexpected page.
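
As a rough illustration of the status reporting described above, the sketch below labels status codes and works out which pages link to a 404. It is not Crawl Score's own implementation: the function names, the label wording and the sample crawl data are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical labels for the status codes discussed above.
STATUS_LABELS = {
    301: "permanent redirect",
    302: "temporary redirect",
    404: "page not found",
    500: "internal server error",
}


def label_status(code):
    """Return a human-readable label for an HTTP status code."""
    if code in STATUS_LABELS:
        return STATUS_LABELS[code]
    if 200 <= code < 300:
        return "ok"
    return "other"


def pages_linking_to_404(statuses, links):
    """Map each 404 URL to the sorted list of pages that link to it.

    statuses: dict of URL -> HTTP status code from a finished crawl
    links: dict of source URL -> list of URLs that page links to
    """
    referrers = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            if statuses.get(target) == 404:
                referrers[target].add(source)
    return {url: sorted(pages) for url, pages in referrers.items()}


# Made-up crawl results, for illustration only.
statuses = {"/": 200, "/oldpage": 301, "/newpage": 200, "/missing": 404}
links = {"/": ["/oldpage", "/missing"], "/newpage": ["/missing", "/"]}
broken = pages_linking_to_404(statuses, links)
```

Here `broken` lists the home page and `/newpage` as the pages that need their link to `/missing` corrected.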

Report and filter page titles

Filter by site section
Highlights duplications
 
Benefits:
  • Increased search engine rankings
  • Improved click through rates
  • Increased search engine traffic

Page titles in depth:
Search engines use page titles as a major factor in ranking. The page title is also used as the link text in search engine results, so it’s important that page titles are accurate and concise.

Crawl Score displays all page titles in a report so that you can check spelling, consistency and uniqueness.
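
A duplicate-title check of the kind described above could be sketched as follows. This is a minimal example rather than Crawl Score's own logic: the function name, the choice to ignore case and surrounding whitespace, and the sample titles are assumptions.

```python
from collections import defaultdict


def find_duplicate_titles(titles):
    """Group URLs that share the same page title.

    titles: dict of URL -> page title text
    Titles are compared after trimming whitespace and ignoring case.
    """
    groups = defaultdict(list)
    for url, title in titles.items():
        groups[title.strip().lower()].append(url)
    return {t: sorted(urls) for t, urls in groups.items() if len(urls) > 1}


# Made-up titles, for illustration only.
titles = {
    "/": "Acme Widgets",
    "/products": "Products | Acme Widgets",
    "/products/blue": "Products | Acme Widgets",
    "/contact": "acme widgets ",
}
duplicates = find_duplicate_titles(titles)
```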

Report and filter meta keywords and meta descriptions

Filter by site section
Highlights duplications
Highlights missing content
 
Benefits:
  • Increased click through rates
  • Increased search engine traffic

Meta keywords/descriptions in depth:
Meta keywords are generally not used by search engines, but meta descriptions are generally displayed under the link in search results (see Page Titles above). Crawl Score reports on meta keywords and descriptions in such a way that you can quickly see any issues with spelling, grammar, consistency and accuracy.

Better meta descriptions could result in an improved click-through rate to your website and thus increased revenue.

Page sizes

Filter by site section
Show pages that are larger than recommended sizes
 
Benefits:
  • Faster load times
  • Improved user experience

Page size reporting in depth:
Crawl Score will calculate the size of the HTML of a page and show any pages that have more than 30k and/or 50k of HTML code. Although there are other elements that can influence the size of a web page (EG images and external JavaScript), excessive HTML can drastically affect the speed of a website.

By analyzing and optimizing the amount of HTML, it is likely that your website will load more quickly for users and be easier for search engines to crawl.
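
The size check above can be sketched in a few lines. This is an illustration only: it assumes that "30k" and "50k" mean kilobytes of 1024 bytes and that size is measured on the UTF-8 encoded HTML, neither of which is confirmed here, and the band names are made up.

```python
# Assumption: "30k" and "50k" are read as kilobytes of 1024 bytes.
WARN_BYTES = 30 * 1024
LIMIT_BYTES = 50 * 1024


def size_band(html):
    """Classify an HTML document by its UTF-8 encoded size."""
    size = len(html.encode("utf-8"))
    if size > LIMIT_BYTES:
        return "over 50k"
    if size > WARN_BYTES:
        return "over 30k"
    return "ok"


# Made-up pages, for illustration only.
small_page = "<html><body>Hello</body></html>"
large_page = "<html><body>" + "x" * 60_000 + "</body></html>"
```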

Internal link structure

Show number of internal links to each page
Order by most or least linked to
 
Benefits:
  • More targeted search engine traffic
  • Increased search engine traffic

Internal link structure reporting in depth:
The internal linking structure of a website can have a drastic effect on the way a search engine ranks pages of a website.

If a large image links to an important section of your website, a user will realize that this is probably the page they’re looking for, but a search engine will see it as just one link. By ensuring you have sufficient links to the most important pages, you may be able to help search engines understand which parts of your website are key.

Crawl Score shows how many times each page is linked to from within the website in a report format so you can view the most linked to, the least linked to and so on.
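
Counting internal links per page, as described above, could look something like the sketch below. The function name, the decision to ignore a page's links to itself, and the sample link data are assumptions made for the example.

```python
from collections import Counter


def inbound_link_counts(links):
    """Count how many internal links point at each page.

    links: dict of source URL -> list of internal URLs that page links to
    Pages that are never linked to are reported with a count of 0.
    """
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in targets:
            if target != source:  # ignore a page linking to itself
                counts[target] += 1
    return counts


# Made-up link data, for illustration only.
links = {
    "/": ["/products", "/contact"],
    "/products": ["/", "/contact", "/products"],
    "/contact": ["/"],
    "/orphan": ["/"],
}
counts = inbound_link_counts(links)
# Most linked to first; ties are broken alphabetically by URL.
most_linked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Reversing the sort order gives the least linked to pages, which is where pages such as `/orphan` would show up.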

Caching reports

Show Last Modified for each page
Order by Last Modified date
Show pages that have no Last Modified
Show ETags for each page
Show pages that have no ETag
Show Expires for each page
Show pages that have no Expires
 
Benefits:
  • Faster page load times
  • Improved user experience
  • Content crawled and indexed more regularly by search engines
  • Improved search engine traffic

Caching reporting in depth:
Proper use of caching can have a massive effect on both the user experience and the way the search engines crawl your site.

The “Last Modified” date is generally hidden from the user when using a web browser, but most browsers will check whether a copy of the page exists in the local cache and whether the Last Modified date has changed. If it has not changed, the web page is not downloaded again, which improves performance.

A search engine crawler can do much the same thing and will therefore use its resources to crawl new content rather than re-download unchanged content.

“Expires” and “ETags” are used in a similar way, although “ETags” are used less widely by search engine crawlers.
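
The caching checks above can be illustrated with a simplified model. In real HTTP the server makes the comparison and answers a conditional request with a 304 status. The sketch below only mimics that decision locally, and its function names and sample header values are assumptions.

```python
from email.utils import parsedate_to_datetime

CACHE_HEADERS = ("Last-Modified", "ETag", "Expires")


def missing_cache_headers(headers):
    """List the caching headers that are absent or empty in a response."""
    present = {name.lower() for name, value in headers.items() if value}
    return [name for name in CACHE_HEADERS if name.lower() not in present]


def cached_copy_is_current(cached, current):
    """Decide whether a cached copy can be reused without a re-download.

    Both arguments are dicts of response headers. ETags are compared
    first; otherwise the Last-Modified dates are compared.
    """
    if cached.get("ETag") and current.get("ETag"):
        return cached["ETag"] == current["ETag"]
    if cached.get("Last-Modified") and current.get("Last-Modified"):
        cached_date = parsedate_to_datetime(cached["Last-Modified"])
        current_date = parsedate_to_datetime(current["Last-Modified"])
        return current_date <= cached_date
    return False  # nothing to compare, so download again


# Made-up header values, for illustration only.
cached = {"Last-Modified": "Mon, 03 Mar 2008 10:00:00 GMT"}
unchanged = {"Last-Modified": "Mon, 03 Mar 2008 10:00:00 GMT"}
updated = {"Last-Modified": "Tue, 04 Mar 2008 09:30:00 GMT"}
```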


Extended functionality

Metadata functionality


Metadata – a definition :
Metadata is data used to describe the nature and content of resources, in order to allow third-party systems to search those resources.
Metadata is extra information which one adds to a resource in order to describe it more efficiently. A parallel example is a library catalogue entry for a book: the book is the resource, and the catalogue entry contains extra information about the book, including author, title, publication date and so on. The library catalogue then allows someone to search for a given book without having to read every book in the library.

This is the role of metadata. It allows automated searches to find relevant resources without having to trawl the entire contents of documents.

Most Public Sector websites, in the UK and abroad, should adhere to various metadata standards. Crawl Score has “libraries” of a number of worldwide standards, including eGMS 3.1, which is relevant to UK Public Sector websites.

In the UK the standard is known as the e-Government Metadata Standard (eGMS). It is part of the e-Government Interoperability Framework (eGIF); adoption of, and compliance with, the framework is mandatory.

  • eGMS contains many elements which refer to specific information about resources (EG TITLE, AUTHOR, DATE, CONTRIBUTOR)
  • some elements contain refinements which describe different aspects of each element (EG DATE.ISSUED, DATE.MODIFIED)
  • we give these elements values to describe the resource (EG AUTHOR = John Doe)
  • some elements require values in specific formats (EG DATEs must be formatted YYYY-MM-DD)
  • some elements must contain values picked from a controlled vocabulary (EG AUDIENCE)
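
The value rules in the list above could be checked along these lines. This is a sketch under stated assumptions: the audience vocabulary shown is hypothetical (the real eGMS controlled vocabulary is not reproduced here), and the function names and problem messages are invented for the example.

```python
import re
from datetime import datetime

# Hypothetical vocabulary, for illustration only; not the real eGMS list.
AUDIENCE_VOCABULARY = {"Businesses", "Citizens", "Public sector staff"}


def is_valid_date(value):
    """Return True for a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def validate_element(name, value):
    """Return a list of problems found with one element value.

    A refinement such as DATE.ISSUED is checked by the rules of its
    parent element (DATE).
    """
    problems = []
    base = name.split(".")[0].upper()
    if not value.strip():
        problems.append("empty value")
    elif base == "DATE" and not is_valid_date(value):
        problems.append("not a valid YYYY-MM-DD date")
    elif base == "AUDIENCE" and value not in AUDIENCE_VOCABULARY:
        problems.append("value not in controlled vocabulary")
    return problems
```

For example, `validate_element("DATE.ISSUED", "14/03/2008")` reports a date problem, while `validate_element("AUTHOR", "John Doe")` reports none.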

Metadata reporting functionality

Crawl Score will crawl every page on a given website and extract the metadata specified.
This content is then available in a number of ways including:
Show mandatory, recommended and optional tags for all pages
Count of compliant and non-compliant pages
Drill down to each tag
Filter metadata reports by site section (EG www.website.com/products)
Create and report upon custom tags
Extensive library of pre-built standards (EG DC 1.1, eGMS 3.1)
Show missing tags
Show where tags are present but content is empty
 

An example of how a public sector organisation may use Crawl Score

Suppose a public sector customer wishes to check their eGMS 3.1 compliance. The chances are that their site will not be compliant (most aren’t).

To satisfy the minimum requirement they need to have the following tags on each and every one of their web pages:

eGMS 3.1 mandatory elements
  • eGMS.Accessibility
  • eGMS.Creator
  • eGMS.Date
  • eGMS.Identifier
  • eGMS.Publisher
  • eGMS.Subject
  • eGMS.Title

1. eGMS.Accessibility
Example :
<meta name="eGMS.accessibility" scheme="WCAG" content="Double-A">
In this case, the customer would need to know the accessibility level of each page. This is unlikely, and establishing it is not a trivial task (even though public sector sites should adhere to at least the WCAG A standard…). However, the accessibility standards themselves are not within the remit of CS.

The customer will need to either check the accessibility of each page or have their CMS provider insert this data in each page (a fairly simple task usually).

They could “cheat” and insert the element with content as “not known”. This would actually comply with the regulations but obviously isn’t very useful.

2. eGMS.Creator
Example :
<meta name="eGMS.Creator" content="Corporate Affairs, Sunderland NHS" />
It is likely that this should be the same for each page on a given website so again, the technology team could insert this value across the site.

3. eGMS.Date
If this element isn’t present then it can probably be extracted from the article details within the CMS and then updated across the site.

4. eGMS.Identifier
Example :
<meta name="eGMS.identifier" content="http://purl.oclc.org/NET/e-GMS_v1">
In most cases, this is the URL of the page in question. This could be generated “on the fly” by the CMS or inserted as a value by the tech team.

5. eGMS.Publisher
Example :
<meta name="eGMS.publisher" content="The Stationery Office, St Crispins, Duke Street, Norwich NR3 1PD, 0870 610 5522, esupport@theso.co.uk">
It is likely that this should be the same for each page on a given website so again, the technology team could insert this value across the site.

6. eGMS.Subject
Example :
<meta name="eGMS.subject.keyword" scheme="CurriculumOnline" content="En-0383 Joined-up writing">
This is perhaps one of the most difficult elements as it is likely that this will have to be entered manually.

7. eGMS.Title
Example :
<meta name="eGMS.title" content="e-Government Metadata Standard version 3">
This element could probably be taken from the article title and therefore the technology team could insert this value across the site.

From the above you can see that there is quite a bit of involvement from other departments, so it may take time for any changes to take place.

Once these changes have been carried out, a re-crawl will need to take place. A regular re-crawl will help to keep track of any new issues and, of course, any other issues with the website (page titles, broken links, etc).
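
The check that each crawl or re-crawl repeats for every page could be sketched as below, using Python's standard `html.parser` module. This is not Crawl Score's own code. The class and function names are invented, and treating a refinement such as eGMS.subject.keyword as satisfying its parent element is an assumption.

```python
from html.parser import HTMLParser

# The mandatory eGMS 3.1 elements named in this example, lower-cased
# so that tag names can be compared without regard to case.
MANDATORY = [
    "egms.accessibility", "egms.creator", "egms.date", "egms.identifier",
    "egms.publisher", "egms.subject", "egms.title",
]


class MetaCollector(HTMLParser):
    """Collect the name and content of every <meta> tag on a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name:
            self.meta[name] = (attributes.get("content") or "").strip()


def check_page(html):
    """Return (missing, empty) lists of mandatory elements for one page."""
    collector = MetaCollector()
    collector.feed(html)
    missing, empty = [], []
    for element in MANDATORY:
        # Assumption: a refinement counts towards its parent element.
        values = [value for name, value in collector.meta.items()
                  if name == element or name.startswith(element + ".")]
        if not values:
            missing.append(element)
        elif not any(values):
            empty.append(element)
    return missing, empty


# Sample page built from tags in the style of the examples above.
page = """
<html><head>
<meta name="eGMS.Creator" content="Corporate Affairs, Sunderland NHS" />
<meta name="eGMS.title" content="e-Government Metadata Standard version 3">
<meta name="eGMS.subject.keyword" scheme="CurriculumOnline" content="En-0383 Joined-up writing">
<meta name="eGMS.publisher" content="">
</head><body></body></html>
"""
missing, empty = check_page(page)
```

For the sample page, `missing` holds the accessibility, date and identifier elements, and `empty` holds the publisher element, whose tag is present but has no content.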

I think this is a reasonable example of how CS may be used. I guess its value diminishes over time and therefore we may lose subscribers after 12 months. After the initial crawl(s) and fixes, CS then takes the role of quality assurance by regularly crawling the site and flagging up any problems.

There is a possibility that the customer’s perceived value of this service decreases over time. We need to consider this.