A problem with robots.txt & other minor issues
Posted by Dan Frost on Fri, 27/07/2007 - 15:09
We've noticed a bug with switching the robots.txt setting on and off. We're changing it so that the crawler uses robots.txt by default, and the fix should be done on Monday.

On a couple of sites the crawl completed and then all the crawl data was cleared. If you have this problem, please re-crawl. We're looking into it.

There are still a few IE7 issues that we'll get sorted in the middle of next week. They don't stop anything from working, but the site doesn't look as good as it does in Firefox.

We've worked out a very cool (read: efficient) way of recording which pages each page links to and which pages link to it, which means we can do some really cool stuff with presenting website topology. At the moment you can see the number of times a page is linked to, but you can't see which pages link to it. The volume of data needed to store this information was a bit of a headache, but we've found a very efficient way of providing it. It won't be available for a while, because we've got a major revision of the crawler under way that we want to finish first. That revision is purely about scalability and should be completed in the next couple of weeks. Then we can really get going with the really nice stuff.

Expect some improvements to the reporting next week: a sprinkling of Ajax and a few extra reports (e.g. dynamic pages).
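
For anyone curious what recording "what pages are linked to and what they link to" might look like, here is a minimal sketch in Python. It is not how our crawler stores the data (that isn't described in this post); it only illustrates the general idea of keeping both directions of every link, with each URL mapped to a small integer ID so the link sets hold numbers rather than repeated strings. The class name `LinkGraph` and all of its methods are made up for this example.

```python
from collections import defaultdict


class LinkGraph:
    """Toy in-memory link record (hypothetical; not the crawler's real storage)."""

    def __init__(self):
        self._ids = {}                      # URL -> small integer ID
        self._urls = []                     # integer ID -> URL
        self._outbound = defaultdict(set)   # page ID -> IDs of pages it links to
        self._inbound = defaultdict(set)    # page ID -> IDs of pages linking to it

    def _page_id(self, url):
        # Assign the next free integer the first time a URL is seen.
        if url not in self._ids:
            self._ids[url] = len(self._urls)
            self._urls.append(url)
        return self._ids[url]

    def add_link(self, source, target):
        # Record the link in both directions so either lookup is cheap.
        s, t = self._page_id(source), self._page_id(target)
        self._outbound[s].add(t)
        self._inbound[t].add(s)

    def links_from(self, url):
        """Pages that `url` links to, as a sorted list of URLs."""
        pid = self._ids.get(url)
        if pid is None:
            return []
        return sorted(self._urls[t] for t in self._outbound.get(pid, ()))

    def links_to(self, url):
        """Pages that link to `url`, as a sorted list of URLs."""
        pid = self._ids.get(url)
        if pid is None:
            return []
        return sorted(self._urls[s] for s in self._inbound.get(pid, ()))

    def inbound_count(self, url):
        """Number of distinct pages linking to `url`."""
        return len(self.links_to(url))


graph = LinkGraph()
graph.add_link("/", "/about")
graph.add_link("/contact", "/about")
graph.add_link("/about", "/")

print(graph.inbound_count("/about"))  # how many pages link to /about
print(graph.links_to("/about"))       # which pages they are
```

The count is what the reports show today; the second lookup is the "which pages are they" part. A real store for large crawls would need something more compact and persistent than Python sets, so treat this purely as an illustration.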