Errata for Programming Collective Intelligence
I'm making my way through Toby Segaran's excellent new book "Programming Collective Intelligence," and I'm posting here some of the errata I've found in the code so far that haven't been reported or published on the O'Reilly site yet. I'll report them there, but I also want to explain them here. (I can't get the Python code to indent using the code markup plugin. Please let me know if you have suggestions.)
Chapter 3, Discovering Groups
generatefeedvector.py
The main body of this file bombs on
title,wc=getwordcounts(feedurl)
because the URL http://www.techeblog.com/index.php/feed/ toward the bottom of
http://kiwitobes.com/clusters/feedlist.txt
no longer returns an RSS feed. We could remove that URL from feedlist.txt, find the working RSS URL for techeblog, or make our code robust enough to handle this problem in general. For the last option, wrap the call to getwordcounts in a try/except block:
try:
    title,wc=getwordcounts(feedurl)
except AttributeError:
    continue
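Here is a rough, self-contained sketch of the same skip-on-failure pattern. The getwordcounts below is a hypothetical stand-in I made up so the example runs on its own (the book's version parses a real feed), and the URLs are placeholders. It also records which feeds were skipped, which is my addition and not part of the book's code.

```python
def getwordcounts(url):
    # Hypothetical stand-in for the book's function: it pretends that
    # any URL containing 'broken' no longer returns a usable feed.
    if 'broken' in url:
        raise AttributeError('feed has no title')
    return url.upper(), {'python': 3, 'data': 1}


def collect_wordcounts(feedurls):
    """Return (wordcounts, skipped), skipping feeds that fail to parse."""
    wordcounts = {}
    skipped = []
    for feedurl in feedurls:
        try:
            title, wc = getwordcounts(feedurl)
        except AttributeError:
            # Remember the bad URL and move on to the next feed.
            skipped.append(feedurl)
            continue
        wordcounts[title] = wc
    return wordcounts, skipped
```

Keeping a list of skipped URLs makes it easy to see afterwards which entries in feedlist.txt have gone stale.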
The variable feedlist in the line
frac=float(bc)/feedlist
is referenced but not initialized or computed before that.
The fix is to initialize feedlist to zero and increment it as each feedurl is successfully processed:
feedlist = 0
for feedurl in file('feedlist.txt'):
    try:
        title,wc=getwordcounts(feedurl)
    except AttributeError:
        continue
    feedlist += 1
    wordcounts[title]=wc
    for word,count in wc.items():
        apcount.setdefault(word,0)
        if count>1:
            apcount[word]+=1
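To show how the counter feeds into the fraction, here is a minimal sketch that works on data already in memory. The function name select_words, the feedcount variable, and the 0.1 and 0.5 default bounds are my assumptions for illustration; adjust them to match whatever your copy of the script uses.

```python
def select_words(feeds, lower=0.1, upper=0.5):
    """Pick words that appear (more than once) in a moderate share of feeds.

    feeds is a list of (title, wordcount_dict) pairs, one per feed that
    parsed successfully. lower and upper are assumed default bounds.
    """
    feedcount = 0   # plays the role of the integer counter in the post
    apcount = {}    # word -> number of feeds where the word occurs > 1 time
    for title, wc in feeds:
        feedcount += 1
        for word, count in wc.items():
            apcount.setdefault(word, 0)
            if count > 1:
                apcount[word] += 1

    if feedcount == 0:
        return []   # avoid dividing by zero when every feed failed

    wordlist = []
    for word, bc in apcount.items():
        frac = float(bc) / feedcount
        if lower < frac < upper:
            wordlist.append(word)
    return sorted(wordlist)
```

Because the counter only goes up for feeds that actually parsed, the fraction is computed against the feeds you really have, not against every line in feedlist.txt.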
Lastly for Chapter 3, the string handling chokes on a character from one of the feeds that doesn't bridge the ASCII and Unicode worlds. I googled for a solution and came up with this simple fix. Change
out = open('blogdata.txt','w')
out.write('Blog')
to
out = codecs.open('blogdata.txt','wb','utf-8')
out.write('Blog')
You must
import codecs
I'm not up to speed on unicode so don't ask me how it works; it works.
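For what it's worth, my understanding is this: in Python 2, a plain file object implicitly encodes unicode strings as ASCII when you write them, so any non-ASCII character raises UnicodeEncodeError, while a file opened through codecs.open with 'utf-8' encodes the text as UTF-8 instead. The sketch below shows the same idea using io.open, which I've swapped in for codecs.open because it behaves the same way on newer Python versions. The function name and the sample title are made up for illustration.

```python
import io


def write_titles(path, titles):
    """Write a 'Blog' header plus one title per line, encoded as UTF-8."""
    # io.open with an explicit encoding accepts unicode text and encodes
    # it on the way out, so non-ASCII characters do not raise an error.
    with io.open(path, 'w', encoding='utf-8') as out:
        out.write(u'Blog\n')
        for title in titles:
            out.write(title + u'\n')
```

For example, write_titles('blogdata.txt', [u'Caf\u00e9 Tech']) writes the accented character as its two-byte UTF-8 sequence instead of failing.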
That's it for Chapter 3. More later as I make my way through the book. Btw, I just checked Toby's blog and found that you can download the source code.
This entry was posted on Friday, December 21st, 2007 and is filed under books, python.

February 20th, 2008 12:50
nice find on the codecs fix. i’m digging this book as well, but am astounded at how rampant the typos are. i’ve submitted all of the ones i have found on the errata page for the book. there is now quite a long list of user-submitted errata as well.
October 23rd, 2008 01:53
I am looking for some idea and stumble upon your posting
decide to wish you Thanks. Eugene
March 8th, 2010 07:30
Thanks for this. The amount of errors in the book are unacceptable!