RSS Scan Times

Chad Dickerson (via Scott Shuda, via Dave Winer) now sees a massive surge of RSS newsreader activity at the top of every hour, presumably because most people configure their newsreaders to wake up at that time to pull their feeds. “If I didn’t know how RSS worked, I would think we were being slammed by a bunch of zombies sitting on compromised home PCs. Our hourly RSS surge has all the characteristics of a distributed DoS attack, and although the requests are legitimate and small, the sheer number of requests in that short time period creates some aggravating scaling issues.”

Scott and Dave proudly scan at odd times past the hour.

I scan no sooner than one hour after the last time the feed was scanned. This means that if the aggregator starts up after a long period of inactivity, it immediately scans all the feeds. Thereafter it scans them an hour after that. If you get impatient and hit refresh before the hour is up, it won’t scan again until another hour has passed. The practical upshot is that there is no guaranteed time past the hour at which I scan; it is randomly distributed.
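For the curious, here is a rough Python sketch of that rule. It is not my aggregator's actual code; the names (Feed, is_due, scan_due_feeds, refresh) and the fetch callback are made up for illustration. It also assumes that a manual refresh scans right away and restarts that feed's one-hour clock.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

SCAN_INTERVAL = 3600.0  # seconds; the one-hour minimum gap between scans


@dataclass
class Feed:
    url: str
    last_scanned: Optional[float] = None  # None means "never scanned"


def is_due(feed: Feed, now: float) -> bool:
    """A feed is due if it was never scanned or its last scan is at least an hour old."""
    return feed.last_scanned is None or now - feed.last_scanned >= SCAN_INTERVAL


def scan_due_feeds(feeds: List[Feed], now: float,
                   fetch: Callable[[str], object]) -> List[str]:
    """Scan every due feed and return the URLs that were scanned.

    Call this periodically (say, once a minute). No wall-clock time is
    special: each feed's next scan depends only on its own last scan.
    """
    scanned = []
    for feed in feeds:
        if is_due(feed, now):
            fetch(feed.url)          # hypothetical callback that pulls the feed
            feed.last_scanned = now  # restart this feed's one-hour clock
            scanned.append(feed.url)
    return scanned


def refresh(feed: Feed, now: float, fetch: Callable[[str], object]) -> None:
    """Manual refresh (assumed behavior): scan now and restart the clock."""
    fetch(feed.url)
    feed.last_scanned = now
```

Because the only state is each feed's last scan time, the scan times across many users end up spread over the hour according to when each person happened to start up or refresh, rather than piling up at :00.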

This is the only scheme that makes sense to me. Scanning at scheduled times (particularly the top of the hour) is bad for servers and for consumers. As a server I don’t want to be hit all at once. As a consumer I want the news as soon as I can get it (subject to not flooding the server), not at some arbitrary time past the hour. I’m stunned that this is not how all aggregators work.