Search engines have always interested me, and I’ve often wondered how they are put together. At a high level they consist of a few parts: a crawler, a scraper, a query engine, and a ranking system. I decided to set myself a small project: build one that can crawl a single domain and store the content of every page. This post focuses on the crawler/scraper part.
Scrape A Page

Initially, all we want to do is download a single page and scrape the text from it.
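
As a rough illustration of that first step, here is a minimal sketch in Python using only the standard library. The language choice, the names `fetch_html`, `extract_text`, and `TextExtractor`, the `toy-crawler/0.1` user agent string, and the decision to skip `<script>` and `<style>` contents are all assumptions made for this example, not necessarily how the project itself does it.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []     # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        stripped = data.strip()
        if self._skip_depth == 0 and stripped:
            self._chunks.append(stripped)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the text of an HTML document as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def fetch_html(url, timeout=10):
    """Download a page and decode it using the charset the server declares."""
    # The user agent below is a made-up placeholder for this sketch.
    request = Request(url, headers={"User-Agent": "toy-crawler/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (needs network access):
#   print(extract_text(fetch_html("https://example.com/")))
```

Keeping the download and the text extraction in separate functions means the extraction can be tried on a plain HTML string without touching the network. A third-party parser would likely cope better with messy real-world markup, but the standard library is enough to show the idea.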