Crawler now stable...

Posted by Dan Frost on Fri, 13/07/2007 - 17:29

It looks like we've cracked the crawler problem. We're running tests right now and all seems to be well, apart from our internet connection, which is extremely ropey. For the tests we're running the crawler from a supposedly "high capacity" ADSL line, so perhaps the ISP is freaking out.

The crawler still uses a single thread per site, but we're running 9 threads per crawler and it's coping quite well.
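For anyone curious what "one thread per site, 9 threads per crawler" looks like in practice, here is a rough Python sketch. It is not our actual crawler code: `crawl_site`, `crawl_all` and the `fetch_pages` callback are made-up names for illustration, and the only number taken from this post is the 9.

```python
from concurrent.futures import ThreadPoolExecutor

CRAWLER_THREADS = 9  # the figure quoted in this post


def crawl_site(site, fetch_pages):
    """Crawl a single site sequentially, on whichever worker thread picks it up.

    `fetch_pages` is a hypothetical callback that yields the pages of a site.
    """
    return list(fetch_pages(site))


def crawl_all(sites, fetch_pages, max_workers=CRAWLER_THREADS):
    """Crawl many sites at once, but never more than one thread per site."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per site, so a site is only ever touched by one thread.
        futures = {site: pool.submit(crawl_site, site, fetch_pages) for site in sites}
        return {site: future.result() for site, future in futures.items()}
```

Because each site is a single task, a slow site ties up one worker at most, and the other workers carry on with the remaining sites.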

Next week we'll be doing a bit more on the reporting side and trying to make the whole interface a little more user friendly. Then, hopefully, it's back to the crawler to make it multi-threaded per site. We need to be careful, as we don't want so many threads that the crawler kills the site it's crawling, but I think 3 threads per site will be fine. It will be user configurable anyway and only applicable to larger sites.
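One way the per-site cap could work, sketched in Python. This is an assumption about a possible design, not what we have built: `SiteLimiter`, `fetch` and the `fetch_url` callback are hypothetical names, and the default of 3 is simply the figure mentioned above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

DEFAULT_THREADS_PER_SITE = 3  # the figure suggested in this post; user configurable


class SiteLimiter:
    """Caps how many threads may fetch from the same site at the same time."""

    def __init__(self, threads_per_site=DEFAULT_THREADS_PER_SITE):
        if threads_per_site < 1:
            raise ValueError("threads_per_site must be at least 1")
        self._limit = threads_per_site
        self._lock = threading.Lock()
        self._semaphores = {}

    def _semaphore_for(self, site):
        # Create each site's semaphore lazily; the lock stops two threads
        # from creating separate semaphores for the same site.
        with self._lock:
            if site not in self._semaphores:
                self._semaphores[site] = threading.BoundedSemaphore(self._limit)
            return self._semaphores[site]

    def fetch(self, site, url, fetch_url):
        """Run the hypothetical `fetch_url(url)` once a slot for `site` is free."""
        with self._semaphore_for(site):
            return fetch_url(url)
```

With something like this, the crawler could still run 9 worker threads in total, and any worker calling `limiter.fetch(site, url, fetch_url)` would wait whenever 3 requests to that site are already in flight.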