Re: searching string url
comp.lang.python archive

From: Mike Meyer <mwm@mired.org>
Date: Thu Jul 28 2005 - 05:38:54 CEST

googlinggoogler@hotmail.com writes:

> Anyway, to the original replier - I wish it was homework ;-), that
> would mean I wouldn't be trying to find myself a job as a recent
> graduate... I decided to crawl something similar to the yellow pages
> (do you have them in the US?) for my selected area and then find all
> pages corresponding to my ideal field of work, and grab their details
> into a txt file.

I'm actually working on a general framework for doing this kind of
thing. It's designed specifically for walking through a collection of
pages from a web-based search engine, applying extra criteria to the
results, and then running a bit of code on any that pass that check.
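
The framework itself isn't shown here, but the shape described above
(walk the result pages, apply extra criteria, run code on whatever
passes) can be sketched roughly as follows. All the names
(walk_results, process, passes_check, action) are hypothetical, and
the pages are plain lists of strings standing in for fetched search
results.

```python
from typing import Callable, Iterable, Iterator, List


def walk_results(pages: Iterable[Iterable[str]]) -> Iterator[str]:
    """Yield every result from every page of search results, in order."""
    for page in pages:
        for result in page:
            yield result


def process(
    pages: Iterable[Iterable[str]],
    passes_check: Callable[[str], bool],
    action: Callable[[str], None],
) -> List[str]:
    """Run action on each result that passes the check.

    Returns the results that were handled, in the order they were seen.
    """
    handled = []
    for result in walk_results(pages):
        if passes_check(result):
            action(result)
            handled.append(result)
    return handled


if __name__ == "__main__":
    # Made-up data: two "pages" of result strings.
    pages = [
        ["Acme Plumbing", "Bob's Bakery"],
        ["City Plumbing Supplies"],
    ]
    process(pages, lambda r: "Plumbing" in r, print)
```

Keeping the check and the action as plain callables means a new site
only needs a new way of producing the pages.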

It works for one site, but my attempt to try it on a second site
turned up a fundamental flaw. My first site used full URLs for
everything, so I happily passed soup between various methods. The
second site used relative URLs for everything, and it all broke.
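
One way around the relative-URL problem is to resolve every link
against the URL of the page it came from before handing it on. Below
is a minimal sketch, assuming Python 3 and only the standard library
(html.parser standing in for whatever parser produces the soup). The
names LinkCollector and absolute_links, and the example.com-style
URLs, are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # urljoin leaves absolute URLs alone and resolves
                # relative ones against the page they appeared on.
                self.links.append(urljoin(self.base_url, href))


def absolute_links(html, base_url):
    """Return all links in html as absolute URLs."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = '<a href="/listing/42">A</a> <a href="next.html">Next</a>'
    for link in absolute_links(page, "http://search.example/results/p1.html"):
        print(link)
```

Because the links are absolute once they leave the parser, the rest
of the code no longer has to know which page they came from.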

> Trouble is I keep thinking of cool new bits to add, Python truly is a
> beautiful language. Ideally I would like to somehow write all the
> information into a Word mail merge - but I think that requires more
> research!

Given a working scrape, the only extra work is how to get it into a
mail merge. That depends on your platform and the software you're
using to send the mail. Shouldn't be all that hard.
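
As one possible route, many mail-merge tools (Word among them) can
read a delimited text file with a header row as their data source, so
a rough sketch is to dump the scraped details as CSV. The function
name write_merge_source, the field names, and the sample record are
assumptions for illustration, not anything from the scraper itself.

```python
import csv
import io


def write_merge_source(records, fieldnames, out):
    """Write records (dicts) to out as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for record in records:
        writer.writerow(record)


if __name__ == "__main__":
    # Made-up record standing in for scraped details.
    records = [{"Name": "Acme, Ltd", "Phone": "555-0100"}]
    buffer = io.StringIO()
    write_merge_source(records, ["Name", "Phone"], buffer)
    print(buffer.getvalue())
```

To write to a real file, open it with newline="" so the csv module
controls the line endings itself.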

      <mike

-- 
Mike Meyer <mwm@mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Received on Thu Sep 29 17:12:20 2005