Web Crawler - Extracts & verifies emails, phones, street addresses from web pages
The web crawler uses the internet classes of .NET to search through web pages for:
  • Email Addresses
  • Phone numbers and their type (cell, fax, etc.)
  • Street addresses
The process starts by retrieving search engine results for the subject of interest. A seed set of web pages is extracted from the search results pages, and the web crawl begins.
  • Crawls through a specified number of web page levels
  • Pattern matching using regular expressions (Regex)
  • Duplicate checking for email addresses and phone numbers
  • Verification of email addresses (pending solution)
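The crawl, pattern-matching, and duplicate-checking steps listed above can be illustrated with a minimal sketch. It is written in Python rather than the project's .NET code, which is not shown here, so every name in it (`crawl`, `fetch`, `max_depth`, the three patterns) is an assumption made for illustration. The regular expressions are deliberately simple and only cover common US-style phone formats and plain email addresses. Pages are served from an in-memory dictionary so the example runs without network access.

```python
import re
from collections import deque

# Simplified, illustrative patterns; real pages need broader expressions.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")
LINK_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def normalize_phone(raw):
    """Keep only the digits so differently formatted copies compare equal."""
    return re.sub(r"\D", "", raw)


def crawl(seed_urls, fetch, max_depth):
    """Breadth-first crawl from the seed pages, following links max_depth levels.

    `fetch` is a hypothetical callable that takes a URL and returns the page
    text, or None if the page could not be retrieved. Links are used exactly
    as written; relative URLs are not resolved in this sketch.
    """
    emails, phones = set(), set()          # sets discard duplicates
    seeds = list(dict.fromkeys(seed_urls))  # drop repeated seeds, keep order
    visited = set(seeds)
    queue = deque((url, 0) for url in seeds)

    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue

        emails.update(found.lower() for found in EMAIL_RE.findall(html))
        phones.update(normalize_phone(found) for found in PHONE_RE.findall(html))

        # Only queue further links while still above the level limit.
        if depth < max_depth:
            for link in LINK_RE.findall(html):
                if link not in visited:
                    visited.add(link)
                    queue.append((link, depth + 1))

    return emails, phones


# Stand-in for the web: three tiny pages linked in a chain.
PAGES = {
    "http://example.com/": '<a href="http://example.com/contact">Contact</a> Info@Example.com',
    "http://example.com/contact": 'Call (555) 123-4567 or 555.123.4567, info@example.com '
                                  '<a href="http://example.com/deep">more</a>',
    "http://example.com/deep": "hidden@example.com",
}

emails, phones = crawl(["http://example.com/"], PAGES.get, max_depth=1)
print(sorted(emails), sorted(phones))
```

With `max_depth=1` the third page is never fetched, and the two spellings of the same address and phone number collapse to one entry each because they are normalized before being added to the sets.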
Email HTML or text messages directly from selected email table records.
Project status
  • Verification of email addresses. A COM object may be needed here, since so far SMTP seems unable to verify email addresses satisfactorily.
  • Improved algorithm to better associate phone types (cell, fax, etc.) with phone numbers. Given the countless ways a web page can be written, this is a best-effort approach.
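One possible best-effort heuristic for the phone-type item above is to label each number with the nearest type keyword that precedes it. The sketch below is in Python and is not the project's algorithm, which is still pending. The keyword table, the `window` size, and the function name `guess_phone_types` are all hypothetical choices made for illustration.

```python
import re

# Simplified US-style phone pattern, for illustration only.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")

# Hypothetical keyword table mapping words seen on a page to a phone type.
KEYWORD_TO_TYPE = {"cell": "cell", "mobile": "cell", "fax": "fax"}
KEYWORD_RE = re.compile(
    r"\b(" + "|".join(KEYWORD_TO_TYPE) + r")\b", re.IGNORECASE
)


def guess_phone_types(text, window=30):
    """Pair each phone number with the closest type keyword before it.

    Only the text between the previous phone number and this one is
    searched, limited to the last `window` characters, so a label is not
    reused for a later number. Numbers with no keyword nearby are
    reported as "unknown".
    """
    results = []
    prev_end = 0
    for match in PHONE_RE.finditer(text):
        start = max(prev_end, match.start() - window)
        context = text[start:match.start()]
        hits = KEYWORD_RE.findall(context)
        # The last hit is the keyword nearest to the number.
        phone_type = KEYWORD_TO_TYPE[hits[-1].lower()] if hits else "unknown"
        results.append((match.group(), phone_type))
        prev_end = match.end()
    return results


sample = "Cell: 555-123-4567 Fax: 555-123-9999 or call 555-000-1111"
for number, phone_type in guess_phone_types(sample):
    print(number, phone_type)
```

A heuristic like this fails whenever the label follows the number or sits in a separate table cell, which is why any approach of this kind stays best effort.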
ianlane@ianlane.com