Basic Site Crawler

Wed, Jan 15, 2020 | 700 Words

Search engines have always interested me, and I've often wondered how one is put together. At a high level, a search engine has a few main parts: a crawler, a scraper, a query engine, and a ranking system. I set myself a small project to build one that could crawl a single domain and store the content of every page on it. This post focuses on the crawler and scraper.

Scrape A Page

To start, all we want to do is download a single page and scrape the text from it.
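This section doesn't commit to a language or library, so here is a minimal sketch in Python that uses only the standard library: `urllib.request.urlopen` for the download and `html.parser.HTMLParser` for pulling out the text. The names `fetch_page`, `extract_text` and `TextExtractor` are hypothetical and chosen only for illustration. The sketch assumes the page is HTML, and it skips the contents of `<script>` and `<style>`, because those are code rather than page text.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects text runs from an HTML document, skipping <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.parts = []
        self._skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are not inside a script/style element.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the page's text, one text run per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "\n".join(parser.parts)


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Download a single page and decode it with the charset the server declares."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

To scrape one page, you would call `extract_text(fetch_page("https://example.com/"))`. Keeping the download and the parsing in separate functions means the parser can be tested on plain HTML strings without touching the network. The crawler can also reuse `fetch_page` later when it starts following links.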