Bug with MSNBot/1.1
Recently, Microsoft announced the release of MSNBot/1.1, which is designed to save bandwidth and improve their search rankings. We applaud this effort. However, we’ve discovered what we believe to be a bug in MSNBot/1.1.
MSNBot/1.1 is using the If-Modified-Since: HTTP header to save bandwidth. This is a good idea; it allows the server to say “nope, hasn’t changed” rather than sending back an entire huge file and letting Microsoft figure out if it changed or not. Better for them, better for you.
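To make the "nope, hasn't changed" exchange concrete, here is a minimal sketch of the decision a server might make when it sees that header. The function name `status_for_conditional_get` and its parameters are made up for illustration; real servers handle more cases (ETags, other conditional headers) than this does.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def status_for_conditional_get(if_modified_since, last_modified):
    """Return 304 if the resource is unchanged since the client's copy, else 200.

    if_modified_since: raw header value (str) or None if the header is absent.
    last_modified: timezone-aware datetime of the resource's last change.
    Both names are hypothetical, chosen for this sketch.
    """
    if if_modified_since is None:
        return 200  # no condition supplied: send the full response
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if client_time.tzinfo is None:
        # Treat a date without zone information as UTC (an assumption).
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304  # "nope, hasn't changed"
    return 200
```

With the date from the example request, a file last changed on 1 Dec 2006 would get a 304, and one changed on 1 Jan 2007 would get a full 200 response.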
However, when MSNBot/1.1 issues an If-Modified-Since: header, it garbles the HTTP request. Here’s an example:
GET / HTTP/1.1
Accept: text/html, text/plain, text/xml, application/*, Model/vnd.dwf, drawing/x-dwf
Host: www.xxxxxxxxxxxx.com
Accept-Encoding: gzip, deflate
From: msnbot(at)microsoft.com
If-Modified-Since: Sun, 17 Dec 2006 23:34:11 GMT
 
User-Agent: msnbot/1.1 (+http://search.msn.com/msnbot.htm)
Connection: Close
Notice the extra line between the If-Modified-Since: header and the User-Agent: header. If it were just a blank line, it would be mostly fine; the User-Agent: and Connection: headers would just be regarded as part of a useless request body. Unfortunately, that extra line is actually a space on a line by itself. To be sure, we confirmed that with tcpdump:
0x0020: 5018 ffff 4296 0000 4745 5420 2f20 4854 P...B...GET./.HT
0x0030: 5450 2f31 2e31 0d0a 4163 6365 7074 3a20 TP/1.1..Accept:.
0x0040: 7465 7874 2f68 746d 6c2c 2074 6578 742f text/html,.text/
0x0050: 706c 6169 6e2c 2074 6578 742f 786d 6c2c plain,.text/xml,
0x0060: 2061 7070 6c69 6361 7469 6f6e 2f2a 2c20 .application/*,.
0x0070: 4d6f 6465 6c2f 766e 642e 6477 662c 2064 Model/vnd.dwf,.d
0x0080: 7261 7769 6e67 2f78 2d64 7766 0d0a 486f rawing/x-dwf..Ho
0x0090: 7374 3a20 7777 772e xxxx xxxx xxxx xxxx st:.www.xxxxxxxx
0x00a0: xxxx xxxx 2e63 6f6d 0d0a 4163 6365 7074 xxxx.com..Accept
0x00b0: 2d45 6e63 6f64 696e 673a 2067 7a69 702c -Encoding:.gzip,
0x00c0: 2064 6566 6c61 7465 0d0a 4672 6f6d 3a20 .deflate..From:.
0x00d0: 6d73 6e62 6f74 2861 7429 6d69 6372 6f73 msnbot(at)micros
0x00e0: 6f66 742e 636f 6d0d 0a49 662d 4d6f 6469 oft.com..If-Modi
0x00f0: 6669 6564 2d53 696e 6365 3a20 5375 6e2c fied-Since:.Sun,
0x0100: 2031 3720 4465 6320 3230 3036 2032 333a .17.Dec.2006.23:
0x0110: 3334 3a31 3120 474d 540d 0a20 0d0a 5573 34:11.GMT.....Us
0x0120: 6572 2d41 6765 6e74 3a20 6d73 6e62 6f74 er-Agent:.msnbot
0x0130: 2f31 2e31 2028 2b68 7474 703a 2f2f 7365 /1.1.(+http://se
0x0140: 6172 6368 2e6d 736e 2e63 6f6d 2f6d 736e arch.msn.com/msn
0x0150: 626f 742e 6874 6d29 0d0a 436f 6e6e 6563 bot.htm)..Connec
0x0160: 7469 6f6e 3a20 436c 6f73 650d 0a0d 0a tion:.Close....
(Look for the “0d 0a20 0d0a” after If-Modified-Since:.)
That’s a nasty violation of the HTTP spec, and causes the request to be dropped (and, on our system, logged, which is how we found out about it). We’d like Microsoft to fix this. 🙂
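For anyone who wants to check their own captures for the same thing, here is a rough sketch of a scan for a header line containing only spaces or tabs. It is not the check our system actually runs; the function name is invented, and `www.example.com` stands in for the redacted hostname.

```python
def find_whitespace_only_header_lines(raw_request: bytes):
    """Return 1-based line numbers of header lines holding only spaces/tabs.

    Hypothetical helper for illustration. Only the part of the request before
    the first truly empty line (CRLF CRLF) is examined.
    """
    head, _, _ = raw_request.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    bad = []
    # Line 1 is the request line; header lines start at line 2.
    for number, line in enumerate(lines[1:], start=2):
        if line and not line.strip(b" \t"):
            bad.append(number)
    return bad


# A shortened version of the request above, with a placeholder hostname.
raw = (
    b"GET / HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"If-Modified-Since: Sun, 17 Dec 2006 23:34:11 GMT\r\n"
    b" \r\n"
    b"User-Agent: msnbot/1.1 (+http://search.msn.com/msnbot.htm)\r\n"
    b"Connection: Close\r\n"
    b"\r\n"
)
print(find_whitespace_only_header_lines(raw))
```

The scan works on bytes rather than decoded text so that it sees exactly what arrived on the wire, the same way the tcpdump output does.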
I’m posting about it here because, as with most large companies, the people responsible for MSNBot appear to be pretty well insulated from the outside world, no doubt due to several billion “your crap search engine lists my site as #372 when you search for cute kittens!” complaints. We don’t have a lot of members up in Redmond, for obvious reasons, but we do have some. And we have a bunch more who know people up there. So this is sort of a throwback to the days when mail was delivered by handing it to anyone who looked like they might be headed in the right direction.
So…. Where do you want to go today? Can you pass along a message when you get there?
Update: It’s resolved; see the comments for full details.
6 Comments
JDW,
I’ve got a friend that manages Microsoft’s Live Search. I’ll pass it on to him to see if he has any control over it.
See, that’s what I’m talkin’ about! Thanks. -jdw
Comment by Michael Schnuerle — April 3, 2008 #
JDW..
Michael notified me of the issue. I will submit this to the crawl team for review and triage. Thanks for giving us the details.
Jeremiah Andrick
Program Manager Live Search
Excellent! -jdw
Comment by Jeremiah Andrick — April 3, 2008 #
Thanks a lot Jeremiah, on the ball as usual!
Here’s a link to his blog for those of you who are interested:
http://blogs.msdn.com/webmaster
Comment by Michael Schnuerle — April 3, 2008 #
I got in to work today, all ready to send to Live Search – Crawler Support… and I find the blog post already has comments from their PM. 😀
Go Internet.
Comment by Scott Robinson — April 3, 2008 #
Just as a follow up. We pushed a hotfix that we believe addressed this issue. The changes should be solid. Let me know if there are ever issues in the future.
cheers
Jeremiah
Nice work! -jdw
Comment by Jeremiah Andrick — April 4, 2008 #
Who said Microsoft doesn’t listen? (FYI, I’m in the Redmond area!)
Maybe you should update the blog post for those who don’t read the comments?
Comment by Tyler Menezes — April 4, 2008 #