Errata for Programming Collective Intelligence

I'm making my way through Toby Segaran's excellent new book "Programming Collective Intelligence," and I'm posting here some of the errata I've found in the code so far that haven't yet been reported or published on the O'Reilly site. I'll report them, but I also want to explain them here. (I can't get the Python code to indent using the code markup plugin. Please let me know if you have suggestions.)

Chapter 3, Discovering Groups

generatefeedvector.py

The main body of this file bombs on

 
title,wc=getwordcounts(feedurl)

because the URL http://www.techeblog.com/index.php/feed/ toward the bottom of

http://kiwitobes.com/clusters/feedlist.txt

no longer returns an RSS feed. We could remove that URL from feedlist.txt, find the working RSS URL for techeblog, or make our code more robust so that it copes with this kind of problem in general. For the last option, wrap the call to getwordcounts inside the loop in a try/except block and skip any feed that fails:

 
try:
    title,wc=getwordcounts(feedurl)
except AttributeError:
    continue

The variable feedlist in the line

 
frac=float(bc)/feedlist

is referenced but never initialized or computed before that point.

The fix is to initialize feedlist as a counter and increment it each time a feedurl is processed successfully:

 
feedlist = 0
for feedurl in file('feedlist.txt'):
    try:
        title,wc=getwordcounts(feedurl)
    except AttributeError:
        continue
    feedlist += 1
    wordcounts[title]=wc
    for word,count in wc.items():
        apcount.setdefault(word,0)
        if count>1:
            apcount[word]+=1
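To check that the skip-and-count logic behaves as described, here is a minimal, self-contained sketch that runs without any network access. The getwordcounts below is a hypothetical stand-in, not the book's function: it returns canned data for two made-up URLs and raises AttributeError for anything else, imitating a URL that no longer serves a feed. The example.com URLs, the word counts, and the count_feeds wrapper are all my own assumptions for illustration.

```python
def getwordcounts(feedurl):
    """Hypothetical stand-in for the book's getwordcounts (no network access)."""
    fake_feeds = {
        'http://example.com/a': ('Blog A', {'python': 3, 'data': 1}),
        'http://example.com/b': ('Blog B', {'python': 2, 'cluster': 5}),
    }
    if feedurl not in fake_feeds:
        # Imitate the failure seen when a URL no longer returns a feed.
        raise AttributeError('no title for %s' % feedurl)
    return fake_feeds[feedurl]


def count_feeds(feedurls):
    apcount = {}     # word -> number of blogs in which it appears more than once
    wordcounts = {}  # blog title -> {word: count}
    feedlist = 0     # number of feeds parsed successfully
    for feedurl in feedurls:
        try:
            title, wc = getwordcounts(feedurl.strip())
        except AttributeError:
            continue  # skip the broken feed without counting it
        feedlist += 1
        wordcounts[title] = wc
        for word, count in wc.items():
            apcount.setdefault(word, 0)
            if count > 1:
                apcount[word] += 1
    return apcount, wordcounts, feedlist


# Lines as they might come out of feedlist.txt, including one dead URL.
urls = ['http://example.com/a\n',
        'http://example.com/broken\n',
        'http://example.com/b\n']
apcount, wordcounts, feedlist = count_feeds(urls)

# Same division as the frac line, with feedlist now holding a number.
fractions = dict((w, float(bc) / feedlist) for w, bc in apcount.items())
```

With this made-up data, feedlist ends up as 2 rather than 3, so the fractions are computed only over feeds that actually parsed. Note that if every feed failed, feedlist would be 0 and the division would raise ZeroDivisionError, so a real script may want to guard against that.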

Lastly for Chapter 3, the string handling chokes on a non-ASCII character from one of the feeds when the results are written out. I googled for a solution and came up with this simple fix. Change

 
out = open('blogdata.txt','w')
out.write('Blog')

to

 
out = codecs.open('blogdata.txt','wb','utf-8')
out.write('Blog')

For this to work, you must also add this at the top of the file:

 
import codecs

I'm not up to speed on unicode so don't ask me how it works; it works.
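For the curious, here is a rough sketch of what I believe is going on. A file opened with codecs.open and an encoding name encodes each Unicode string to bytes in that encoding as it is written, whereas converting such a string to ASCII fails as soon as it contains a character outside the ASCII range. The blog title below is made up, and the file goes to a temporary directory so the sketch doesn't overwrite a real blogdata.txt.

```python
import codecs
import os
import tempfile

# Made-up blog title containing one non-ASCII character (e with an acute accent).
title = u'Caf\u00e9 Blog'

# Converting the title to ASCII fails; this is the kind of error that
# shows up when Unicode text is written to a plain file in Python 2.
try:
    title.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# codecs.open wraps the file so that every write is encoded as UTF-8.
path = os.path.join(tempfile.mkdtemp(), 'blogdata.txt')
out = codecs.open(path, 'wb', 'utf-8')
out.write(u'Blog')
out.write(u'\t' + title + u'\n')
out.close()

# Read the raw bytes back to see what actually landed on disk.
with open(path, 'rb') as f:
    raw = f.read()
```

In the raw bytes, the accented character is stored as the two-byte UTF-8 sequence 0xC3 0xA9, while the plain ASCII characters are stored one byte each, unchanged.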

That's it for Chapter 3. More later as I make my way through the book. Btw, I just checked Toby's blog and found that you can download the source code.


Post Info

This entry was posted on Friday, December 21st, 2007 and is filed under books, python.


One Response to “Errata for Programming Collective Intelligence”

  • 1
    jasonz
    February 20th, 2008 12:50

    nice find on the codecs fix. i’m digging this book as well, but am astounded at how rampant the typos are. i’ve submitted all of the ones i have found on the errata page for the book. there is now quite a long list of user-submitted errata as well.


