The web crawler uses the internet classes of .NET to search through web pages for:
- Email addresses
- Phone numbers and their type (cell, fax, etc.)
- Street addresses
The process starts by retrieving search engine results for an appropriate
subject. A seed set of web pages is extracted from the search results pages,
and the web crawl begins.
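The seed extraction step could be sketched roughly as follows. This is an illustration in Python rather than the project's .NET code. The names `LinkCollector` and `extract_seed_urls` are hypothetical, and the rule of dropping links that point back to the search engine's own host is an assumption, not something the project description states.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_seed_urls(results_html, results_url):
    """Return external http(s) links from a results page, in order, without repeats."""
    collector = LinkCollector()
    collector.feed(results_html)
    engine_host = urlparse(results_url).netloc
    seeds = []
    seen = set()
    for href in collector.hrefs:
        absolute = urljoin(results_url, href)  # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # ignore mailto:, javascript:, and similar links
        if parts.netloc == engine_host:
            continue  # assumed rule: skip the engine's own navigation links
        if absolute not in seen:
            seen.add(absolute)
            seeds.append(absolute)
    return seeds
```

The returned list would then serve as the starting queue for the crawl.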
- Crawls through a specified number of web page levels.
- Pattern matching using regular expressions (Regex).
- Duplicate checking for email addresses and phone numbers.
- Verification of email addresses (pending solution).
- Send HTML or text email messages directly from selected email table records.
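A minimal sketch of how a level-limited crawl, regex matching, and duplicate checking might fit together is shown here in Python for illustration; it is not the project's .NET code. The patterns are simplified assumptions (the phone pattern only covers ten-digit US-style numbers), and `fetch` is a hypothetical callable that returns a page's HTML, or `None` on failure.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Simplified, assumed patterns; real-world pages need more variants.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b")
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def crawl(seeds, fetch, max_depth):
    """Breadth-first crawl up to max_depth link levels below the seed pages."""
    emails, phones = set(), set()  # sets give duplicate checking for free
    visited = set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved
        # Normalize before storing so equivalent spellings collapse together.
        emails.update(m.lower() for m in EMAIL_RE.findall(html))
        phones.update(re.sub(r"\D", "", m) for m in PHONE_RE.findall(html))
        if depth < max_depth:
            for href in HREF_RE.findall(html):
                link = urljoin(url, href)
                if link.startswith(("http://", "https://")) and link not in visited:
                    visited.add(link)
                    queue.append((link, depth + 1))
    return emails, phones
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library and makes it easy to try against canned pages.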
Project status
- Verification of email addresses. A COM object may be needed here, since
  SMTP has so far been unable to verify email addresses satisfactorily.
- Improved algorithm to better associate phone types (cell, fax, etc.) with
  phone numbers. Given the countless styles for writing a web page, this is a
  best-effort approach.
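One possible best-effort heuristic for the phone type association is to look for a label such as "Fax" in the text just before each number. The sketch below, in Python for illustration, is not the project's algorithm. The keyword table, the 30-character window, and the name `classify_phones` are all assumptions.

```python
import re

# Assumed ten-digit US-style pattern; other formats would need more variants.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b")

# Hypothetical keyword table mapping page labels to phone types.
TYPE_KEYWORDS = {
    "cell": "cell", "mobile": "cell",
    "fax": "fax",
    "tel": "phone", "telephone": "phone", "phone": "phone", "office": "phone",
}
LABEL_RE = re.compile(r"\b(" + "|".join(TYPE_KEYWORDS) + r")\b", re.IGNORECASE)


def classify_phones(text, window=30):
    """Pair each phone number with the nearest label in the text just before it."""
    results = []
    prev_end = 0
    for match in PHONE_RE.finditer(text):
        # Look back at most `window` characters, but never past the previous
        # number, so one label is not reused for several numbers.
        start = max(prev_end, match.start() - window)
        labels = LABEL_RE.findall(text[start:match.start()])
        phone_type = TYPE_KEYWORDS[labels[-1].lower()] if labels else "unknown"
        results.append((match.group(), phone_type))
        prev_end = match.end()
    return results
```

A number with no recognizable label nearby is reported as "unknown" rather than guessed, which matches the best-effort nature of the task.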
ianlane@ianlane.com