Ok, so we've been busy

Posted by Dan Frost on Wed, 04/07/2007 - 10:12

Not a great start to this blog: nearly two months between posts, but we've been busy... honest.

The idea was to post regular progress updates here, but we've had our noses to the grindstone getting the software ready for launch.

It's looking like we're going to go live next week as a beta with a few known issues. There's a long list of other improvements, but the issues below are the ones we want to get sorted next week.

They are:

1) Documents such as PDF and DOC files are not being properly identified, so they incorrectly appear in the duplicate/no-title reports.

2) Dynamic and session-based URLs - we are identifying these but not yet reporting on them.

3) Summary charts for each report - at present the reporting is mostly text-based, but we're going to add chart representations of most sections.

4) Meta dates - we're not checking meta dates at present, but we will report on them.

5) Framed sites - we're in two minds about what to do here. Framed sites are quite difficult to crawl, but Google seems to manage it sometimes, so we're going to need to do the same. We'll report on framed pages and flag them up, as frames are certainly not a good thing in terms of SEO or accessibility.
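Most of the issues above come down to per-URL or per-page checks. The Python sketch below is not CrawlScore's code; it is one hypothetical way such checks could look, using only the standard library (`urllib.parse`, `html.parser`). Everything in it is an assumption for illustration: the names `SESSION_KEYS`, `DOCUMENT_EXTENSIONS`, `is_document`, `url_flags`, `PageScanner` and `scan_page`, the list of session parameter names, the reading of "meta date" as any `<meta>` tag whose `name` contains "date", and treating a page as framed when it uses `<frameset>` or `<frame>` (iframes are not counted).

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse

# Assumed (illustrative) query-parameter names that usually carry session IDs.
SESSION_KEYS = {"phpsessid", "jsessionid", "aspsessionid", "sessionid", "sid"}

# Assumed non-HTML document extensions, used only when no Content-Type is known.
DOCUMENT_EXTENSIONS = (".pdf", ".doc", ".xls", ".ppt", ".rtf")
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def is_document(url, content_type=""):
    """Return True for non-HTML resources (PDF, DOC, ...) so they can be
    kept out of the duplicate/no-title reports."""
    ctype = content_type.split(";")[0].strip().lower()
    if ctype:
        # Prefer the server's Content-Type header when we have one.
        return ctype not in HTML_TYPES
    # Otherwise fall back to the file extension in the URL path.
    return urlparse(url).path.lower().endswith(DOCUMENT_EXTENSIONS)


def url_flags(url):
    """Return (is_dynamic, has_session_id) for a URL."""
    parts = urlparse(url)
    keys = {k.lower() for k in parse_qs(parts.query, keep_blank_values=True)}
    # Java servlet containers may append ";jsessionid=..." to the last path
    # segment; urlparse exposes that part as `params`.
    has_session = bool(keys & SESSION_KEYS) or "jsessionid=" in parts.params.lower()
    # Treat any query string, or an embedded session ID, as "dynamic".
    is_dynamic = bool(parts.query) or has_session
    return is_dynamic, has_session


class PageScanner(HTMLParser):
    """Records whether a page uses frames and collects meta date values."""

    def __init__(self):
        super().__init__()
        self.framed = False
        self.meta_dates = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values may be None.
        if tag in ("frameset", "frame"):
            self.framed = True
        elif tag == "meta":
            attr = {name: value or "" for name, value in attrs}
            if "date" in attr.get("name", "").lower():
                self.meta_dates.append(attr.get("content", ""))


def scan_page(html):
    """Parse an HTML string and return the populated scanner."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner


if __name__ == "__main__":
    print(url_flags("http://example.com/shop.php?id=3&PHPSESSID=abc"))  # (True, True)
    print(is_document("http://example.com/report.pdf"))  # True
```

Checking the Content-Type header before the file extension is a deliberate choice in this sketch: a URL ending in `.pdf` can still serve HTML, and many PDFs are served from URLs with no extension at all.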

We're just making some final performance improvements to the crawler, and then we'll improve this website a bit and go live. We're effectively rewriting the crawler soon, but in terms of its ability to crawl, it will be the same. The rewrite is purely for the sake of performance and scalability.

CrawlScore will be free whilst in beta, and crawls will be limited by time rather than by number of pages. All of our crawlers are in the UK at present, but we will have crawlers hosted in the US and probably Germany, so that sites hosted in those countries can use "local" crawlers for better performance.