grep for craigslist?

kenta kenta at guster.net
Tue Feb 12 08:41:43 EST 2013


On Tue, Feb 12, 2013 at 8:08 AM, David Rysdam <david at rysdam.org> wrote:
> I used to subscribe to the CL RSS feed. But it's a lot of junk to click
> "read" on. It's even more when you realize how poorly the posters are at
> putting things in the right category, so you basically have to subscribe
> to everything.
>
> For the last couple weeks, I've been just periodically, manually
> searching terms and that works well...other than the fact that I'm doing
> work a computer should be doing.
>
> I'm not interested in some fly-by-night, fee-based solution. I found If
> This Then That, but it doesn't seem to be running my sample recipe and
> the interface is so infantilized there's no way to tell what's
> wrong. But even if it worked, I'm not sure how happy I'm going to be
> trusting my data and processing to a "free" service.
>
> Surely this problem has been solved by now with some combination of
> wget, a procmail-like "patterns and rules" tool and a database to track
> what you've seen before.

My solution is to use RSS feeds, but with search filters applied. For
example, I am constantly looking for Honda S2000s or parts for them.
Reading the general Cars and Trucks feed would be insanity, but you
can build a search via their normal page and then slap "&format=rss"
onto the end of the URL. Combine this with their limited query
capability (http://www.craigslist.org/about/help/search) to exclude
some things and you can get something like this:

Search for "Honda S2000" in CTA (cars and trucks), minus "2000" and
"2001" (model years that had plastic windows), with a price of
$1-15000, returned as RSS:

http://boston.craigslist.org/search/cta?zoomToPosting=&altView=&query=honda+S2000+-2000+-2001&srchType=T&minAsk=1&maxAsk=15000&format=rss
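
If you would rather generate these URLs than hand-edit them, here is a
rough Python sketch. The parameter names (query, srchType, minAsk,
maxAsk, format) are simply copied from the URL above; build_search_url
and its arguments are made-up names for illustration, and I am assuming
the site keeps accepting these parameters.

```python
from urllib.parse import urlencode


def build_search_url(site, category, query, min_ask, max_ask):
    """Build a CL search URL that returns RSS (hypothetical helper).

    The parameter names are taken from the example URL in this message;
    nothing here is an official API.
    """
    params = [
        ("zoomToPosting", ""),
        ("altView", ""),
        ("query", query),       # "-term" excludes a term
        ("srchType", "T"),      # value copied as-is from the example URL
        ("minAsk", min_ask),
        ("maxAsk", max_ask),
        ("format", "rss"),      # the part that turns the page into a feed
    ]
    return "http://%s.craigslist.org/search/%s?%s" % (
        site, category, urlencode(params))


url = build_search_url("boston", "cta", "honda S2000 -2000 -2001", 1, 15000)
print(url)
```

urlencode turns the spaces in the query into "+" and leaves the "-"
exclusions alone, so this reproduces the URL above.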

There does appear to be a limit on the length of the URL or maybe on
the number of search terms. I once tried to exclude a lot of things
and, if I recall correctly, the query did not return any results.

The key here is query refinement. What's good is that you can
experiment using CL's standard page, then when you're done slap
"&format=rss" on the end and put that in your RSS reader.

So I combine about 5-6 different RSS feeds with specific search terms
and subscribe to them all in Google Reader so I can use them
seamlessly at work, at home, and on mobile. I mark items as read as I
see them and I don't see them again, except for reposts from people
who constantly repost to CL, sometimes every day.

(Hitting reply all this time :/)


More information about the gnhlug-discuss mailing list