grep for craigslist?

Greg Rundlett (freephile) greg at freephile.com
Thu Feb 14 15:04:04 EST 2013


I've already contemplated such a move (my service would be called
Greg'sList: it's Craigslist, only better :-) ). But their TOS explicitly
rules out any such possibility.

5. UNAUTHORIZED ACCESS AND ACTIVITIES

...

Any copying, aggregation, display, distribution, performance or derivative
use of craigslist or any content posted on craigslist whether done directly
or through intermediaries (including but not limited to by means of
spiders, robots, crawlers, scrapers, framing, iframes or RSS feeds) is
prohibited. As a limited exception, general purpose Internet search engines
and noncommercial public archives will be entitled to access craigslist
without individual written agreements executed with CL that specifically
authorize an exception to this prohibition if, in all cases and individual
instances: (a) they provide a direct hyperlink to the relevant craigslist
website, service, forum or content; (b) they access craigslist from a
stable IP address using an easily identifiable agent; and (c) they comply
with CL's robots.txt file; provided however, that CL may terminate this
limited exception as to any search engine or public archive (or any person
or entity relying on this provision to access craigslist without their own
written agreement executed with CL), at any time and in its sole
discretion, upon written notice, including, without limitation, by email
notice.

Any access to or use of craigslist to design, develop, test, update,
operate, modify, maintain, support, market, advertise, distribute or
otherwise make available any program, application or service (including,
without limitation, any device, technology, product, computer program,
mobile device application, website, or mechanical or personal service) that
enables or provides access to, use of, operation of or interoperation with
craigslist (including, without limitation, to access content, post content,
cross-post content, re-post content, respond or reply to content, verify
content, transmit content, create accounts, verify accounts, use accounts,
circumvent and/or automate technological security measures or restrictions,
or flag content) is prohibited. This prohibition specifically applies but
is not limited to software, programs, applications and services for use or
operation on or by any computer and/or any electronic, wireless and/or
mobile device, technology or product that exists now or in the future.
...

~ Greg

Greg Rundlett


On Thu, Feb 14, 2013 at 1:23 PM, Joshua Judson Rosen
<rozzin at geekspace.com> wrote:

> You should turn craigsearch into a web service.
>
> I have a bunch of 50%-off coupons for new domain registrations at
> gandi.net, if you want. Looks like "craigsearch.com" is already taken,
> though, so you'll have to pick another name....
>
> David Rysdam <david at rysdam.org> writes:
> >
> > I liked, but didn't LIKE like, that solution
> [...]
> > Which means that I can now fix that error AND announce the first
> > release of craigsearch!
> >
> > You can't use regular expressions and it would be hard to fix
> > that. You'd basically have to download all of CL and search it
> > locally. Instead, I'm searching for a term at a time using the above
> > URL-based query scheme and then parsing the results. Fortunately, they
> > seem to be very easily parseable (for now).
> >
> > The usage is either
> >
> >     craigsearch <term>
> >
> > where <term> can be multiple words searched for together (i.e. "milling
> > machine" returns hits where both "milling" and "machine" appear) or
> >
> >     craigsearch -f <file-of-terms>
> >
> > similar to how grep -f works.
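
To make those two behaviors concrete, here is a rough sketch in Python. It
is not taken from craigsearch (I don't know what it is written in), and the
names load_terms and matches_all_words are made up for illustration. It
assumes one term per line in the file, skips blank lines (grep -f would
treat an empty pattern as matching everything), and does the "all words
must appear" check as a plain case-insensitive substring test on a local
string rather than by querying craigslist.

```python
from pathlib import Path


def load_terms(path):
    """Read one search term per line, skipping blank lines (hypothetical helper)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def matches_all_words(term, text):
    """True if every word of a (possibly multi-word) term occurs in text.

    Assumption: case-insensitive substring matching, so "mill" also
    matches "milling". The real search may well behave differently.
    """
    haystack = text.lower()
    return all(word in haystack for word in term.lower().split())
```

With a file containing "milling machine" on one line and "lathe" on the
next, load_terms returns both terms, and matches_all_words("milling
machine", ...) only accepts text that contains both words.
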
> >
> > This is only part of the solution, of course. The next step is to save
> > the output from one run and diff it against the output on another run,
> > notifying me of new things. That's pretty trivial[1].
> >
> > [1] Except for the diff itself. The man page seems to indicate that you
> > can get the "one-sided" differences by using some kind of format thing,
> > but my diff won't take any of the options the man page gives for that
> > and it's so confusingly written I can't even tell what is *supposed* to
> > work. I eventually resorted to just
> >
> >     diff A B | grep '^>'
> >
> > Am I missing something obvious or is diff just broken? The man page sure
> > isn't helpful.
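
The "format thing" is probably GNU diff's line-format options. Assuming
your diff is GNU diff, something like

    diff --old-line-format='' --unchanged-line-format='' --new-line-format='%L' A B

should print only the lines that are new in B; other diff implementations
may not accept those options. If the order of the listings doesn't matter,
comm -13 on sorted copies of the two files is another route.

For the save-and-compare step itself, a small script may be simpler than
either. The sketch below is a plain set difference, not a real diff: it
reports each line of the new run that did not appear anywhere in the old
run, in the order it appears, and ignores lines that merely moved. The
function name new_lines and the one-listing-per-line file layout are my
assumptions, not anything craigsearch defines.

```python
def new_lines(old_path, new_path):
    """Return lines found in new_path that do not occur anywhere in old_path.

    Order follows new_path; a line repeated in new_path is reported once.
    """
    with open(old_path, encoding="utf-8") as f:
        seen = {line.rstrip("\n") for line in f}

    fresh = []
    with open(new_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line not in seen:
                seen.add(line)  # remember it so duplicates are not reported twice
                fresh.append(line)
    return fresh
```

Calling new_lines("A", "B") then gives the list to send in a notification;
an empty list means nothing new turned up since the previous run.
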
> >
> >
> >
> > _______________________________________________
> > gnhlug-discuss mailing list
> > gnhlug-discuss at mail.gnhlug.org
> > http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/
>
> --
> "Don't be afraid to ask (λf.((λx.xx) (λr.f(rr))))."
>