Plumbing Project: Yahoo! Pipes for Aggregating Comments across a Syndicated Course Blog

So last week I mentioned that after figuring out how to display comment counts on syndicated posts, the next hurdle was to figure out how to display the most recent comments across a whole collection of syndicated blogs/posts.

The dilemma lies in the fact that while there is an RSS feed for each post’s comments, that feed needs to be essentially “exploded” and then all of the exploded feeds need to be aggregated together in order to get a view of comments across all of the syndicated blogs. I wasn’t sure how we were going to do that within WordPress/FeedWordPress. As you can imagine, as the content for ds106 grows, there are more and more posts being aggregated. EACH of those posts has an associated comment feed, and each of those feeds could have any number of comments. Potentially, we could be talking about thousands of comments, in the end. That’s a big feed to both construct and parse.

On Tuesday it occurred to me that maybe I could tackle this in Yahoo! Pipes, so I took a stab at it and I’ve been sort of successful.

Here’s the pipe’s process:

Step One: Fetch the feed out of ds106.us. We’re already off to a tough start. Right now Jim is syndicating the last 600 posts on the blog. That’s so that as students joined the class this spring and subscribed to the feed, they could get a backlog of everything that had come before. Keep in mind that there are almost 900 syndicated posts on the site, and, um, we’re in the third week of class. So, I’m already not starting with everything on my plate. Theoretically, if you go leave a comment on the very first post that was syndicated into the site, I can’t capture any information about it.

But, we’ll forge ahead.

Step Two: I’m going to pull out one element of the feed: the wfw:commentRSS property which I’ve already determined is pretty universal (esp. on WordPress blogs) and contains the URL of the post’s comment feed.

Step Three: Now I’m going to run through a loop that basically fetches the data contained in those comment feeds that I just pulled out of the site feed. For some reason, I had to use this function on Yahoo! Pipes as opposed to the Fetch Feed function again. I don’t know why. I’m not that smart. Whatever it finds in those comment feeds is then emitted into a new feed.

Steps Four & Five: Now I have to do some kind of klugey stuff. I know I need to filter out items that are empty — I have empty items because not every comment feed has contents (because no one has commented on those posts yet). But for some reason, the Filter function in Yahoo! Pipes doesn’t work when I put in a regular expression to try and match, say, the item title to a null value. However, I can use the Regex function to replace a null item title with a string — in this case “XXX.” And then, I can filter out all of the titles that are equal to “XXX.” It’s stupid, but it works.

Step Six: Now I truncate the feed to the last 10 items, because that’s all I want to display on the course site in the sidebar (and I’m trying to limit the amount of parsing WordPress needs to do).

Step Seven: I output the pipe.

If I’m lucky, all the pieces fit together and I get a widget on the ds106 sidebar of the last 10 or so comments.

But sometimes, I just get a feed error. Probably because the whole thing is pretty intensive. So, now I ned to figure out if there is a better way. I’m thinking I need some way to cache this data as it’s compiled. I mean, it’s not going to be changing a lot and there’s no reason to repeat the parsing of every feed every time. But, I admit, I’m not sure what steps to try next.

Ultimately, I’d like to get this to a Yahoo! Pipe that we could clone anytime we’re running a course blog so that we can include recent comments all the time.

As always, ideas, thoughts, and reactions are welcome!!