Running a .NET Core Web Crawler on a Raspberry Pi

Here we see how to build a simple web crawler that runs on your Raspberry Pi so you can automate routine tasks.

With a web crawler that runs on a Raspberry Pi, you can automate a boring daily task, such as price monitoring or market research.

Introduction

Recently, I developed an interest in IoT and the Raspberry Pi, and since I'm a .NET developer, I started to explore .NET Core on a Linux stack. The reason is simple: a Linux stack is cheap and runs almost everywhere. I built my website in .NET Core, and it runs on Ubuntu on Linode for $5/month. Next, I started exploring the Raspberry Pi, which runs the Debian-based Linux distribution Raspbian. My first project is a web crawler in C# that runs on a Raspberry Pi, collects the latest shopping deals from popular sites such as Amazon or Best Buy, and then posts the data to a WebApi that feeds my site, http://www.fairnet.com/deal.

Prerequisites

Visual Studio 2017 with the ".NET Core cross-platform development" workload installed. You can download the free community edition.

Using the Code

Launch Visual Studio 2017. Select File > New > Project from the menu bar. In the New Project dialog, select the Visual C# node followed by the .NET Core node. Then select the Console App (.NET Core) project template.

Install the HtmlAgilityPack and Newtonsoft.Json NuGet packages.
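If you prefer the command line to the NuGet dialog in Visual Studio, the same packages can probably be added with the dotnet CLI. This is a minimal sketch, assuming the .NET Core SDK is on your PATH and the commands run from the folder that contains the project file. The guard is only there so the snippet does nothing harmful when run elsewhere.

```shell
# Packages named in the article.
packages="HtmlAgilityPack Newtonsoft.Json"

# Only call the CLI when it is installed and a project file is present.
if command -v dotnet >/dev/null 2>&1 && ls ./*.csproj >/dev/null 2>&1; then
  for pkg in $packages; do
    dotnet add package "$pkg"
  done
else
  echo "Run this from the project folder on a machine with the .NET Core SDK"
fi
```

Each dotnet add package call adds a PackageReference entry to the project file, which is the same result the Visual Studio dialog produces.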


HtmlAgilityPack is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT.

The following code requests the page and uses HtmlAgilityPack to select the nodes that hold each deal:

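// Requires: using System.Collections.Generic; using System.Net.Http; using HtmlAgilityPack;
// This fragment runs inside an async method; url holds the address of the page to crawl.
HttpClient client = new HttpClient();
using (var response = await client.GetAsync(url))
{
    using (var content = response.Content)
    {
        var result = await content.ReadAsStringAsync();
        var document = new HtmlDocument();
        document.LoadHtml(result);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var nodes = document.DocumentNode.SelectNodes("//div[@class='item-inner clearfix']");
        var storeData = new List<Store>();
        if (nodes != null)
        {
            foreach (var node in nodes)
            {
                Store _store = ParseHtml(node);
                storeData.Add(_store);
            }
        }

        // A relative URI such as "/api/stores" requires client.BaseAddress to be set to the WebApi host.
        HttpResponseMessage resp = await client.PostAsJsonAsync<List<Store>>(@"/api/stores", storeData);
    }
}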


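I post the parsed data to a WebApi endpoint, where it gets saved in MongoDB. In .NET Core 2, PostAsJsonAsync is typically provided as an extension method by the Microsoft.AspNet.WebApi.Client NuGet package, so that package needs to be installed as well:

HttpResponseMessage resp = await client.PostAsJsonAsync<List<Store>>(@"/api/stores", storeData);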


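The ParseHtml method extracts the fields of interest from each matched node:

// Requires: using System.Linq; using HtmlAgilityPack;
// imgIndex, titIndex, pricIndex, and retpricIndex are integer positions defined elsewhere in the class;
// their values depend on the markup of the site being crawled.
private static Store ParseHtml(HtmlNode node)
{
    var _store = new Store();

    // ElementAt throws ArgumentOutOfRangeException if the node has fewer matching descendants than expected.
    _store.Image = node.Descendants("img").ElementAt(imgIndex).OuterHtml;
    // Falls back to "not found" when the first link has no href attribute.
    _store.Link = node.Descendants("a").Select(s => s.GetAttributeValue("href", "not found")).FirstOrDefault();
    _store.Title = node.Descendants("a").ElementAt(titIndex).InnerText;
    _store.Price = node.Descendants("span").ElementAt(pricIndex).InnerText;
    _store.RetailPrice = node.Descendants("span").ElementAt(retpricIndex).InnerText;

    return _store;
}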


Next, I need to set up my Raspberry Pi so that .NET code can run on it.

Supplies required:

  • Raspberry Pi 3 Model B

  • HDMI cable

  • USB mouse / keyboard

  • microSD card

  • 2 Amp USB power supply

Setting Up a Raspberry Pi

  1. The recommended OS is called Raspbian. Download it here: https://www.raspberrypi.org/downloads/raspbian/
  2. Install .NET Core 2 onto the Raspberry Pi.
  3. Deploy this application to your Pi running Raspbian.

Once Raspbian is installed, configure the Raspberry Pi so that you can connect to it from your development machine.

Enable SSH from the Raspberry Pi Configuration screen.

Next, we need to find the IP address of the Raspberry Pi.

Open a terminal on your Pi and type:

hostname -I
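hostname -I can print several space-separated addresses, for example when both Ethernet and Wi-Fi are connected or IPv6 is enabled. The sketch below picks the first one, which is often the address you want for SSH. first_address is a hypothetical helper name, not part of any tool.

```shell
# Read space-separated addresses on stdin and print only the first one.
first_address() {
  awk '{ print $1; exit }'
}

# The -I option is specific to Linux (as on Raspbian); skip the call where it is unsupported.
if hostname -I >/dev/null 2>&1; then
  hostname -I | first_address
fi
```

If the first address is not reachable from your development machine, try the other entries that hostname -I lists.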


Next, install PuTTY to connect from your development machine.

The default username and password for Raspbian are “pi” and “raspberry”.

Install .NET Core 2 onto the Raspberry Pi:

# Update the Raspbian install
sudo apt-get -y update

# Install the packages necessary for .NET Core
sudo apt-get -y install libunwind8 gettext

# Download the nightly binaries for .NET Core 2
wget https://dotnetcli.blob.core.windows.net/dotnet/Runtime/release/2.0.0/dotnet-runtime-latest-linux-arm.tar.gz

# Create a folder to hold the .NET Core 2 installation
sudo mkdir -p /opt/dotnet

# Extract the downloaded archive into the dotnet installation folder
sudo tar -xvf dotnet-runtime-latest-linux-arm.tar.gz -C /opt/dotnet

# set up a symbolic link to a directory on the path so we can call dotnet
sudo ln -s /opt/dotnet/dotnet /usr/local/bin


Run the dotnet --info command to see the version installed on Raspbian.
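If dotnet fails to start, it may be worth checking the CPU architecture. The linux-arm runtime targets 32-bit ARMv7, which is what Raspbian reports on a Raspberry Pi 3 (armv7l), whereas ARMv6 boards such as the Pi Zero are not supported by .NET Core. The sketch below is a rough check under those assumptions. arm_runtime_verdict is a hypothetical helper, and any architecture it does not recognise is reported as unknown instead of guessed.

```shell
# Map a `uname -m` value to a verdict for the linux-arm runtime.
arm_runtime_verdict() {
  case "$1" in
    armv7l) echo "supported" ;;
    armv6l) echo "unsupported" ;;
    *)      echo "unknown" ;;
  esac
}

arch="$(uname -m)"
echo "CPU architecture $arch: $(arm_runtime_verdict "$arch")"

# Print the runtime details only when the dotnet symlink is already in place.
if command -v dotnet >/dev/null 2>&1; then
  dotnet --info
fi
```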

On your development machine, publish a release build of the project that targets linux-arm:

dotnet publish -c release -r linux-arm


Now create a folder for the web crawler on the Pi and transfer the published files to it using FTP. Then run the crawler with dotnet:

dotnet webcrawler.dll
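Since SSH is already enabled on the Pi, scp may be a convenient alternative to FTP for the transfer. The sketch below assumes an OpenSSH client on the development machine (PuTTY ships a similar tool, pscp). None of the following values come from the original project: 192.168.1.20 is a hypothetical address standing in for the output of hostname -I, /home/pi/webcrawler is an assumed target folder, and the publish path assumes the project targets netcoreapp2.0.

```shell
# Hypothetical values: replace with your Pi's address and preferred folder.
pi_host="192.168.1.20"
remote_dir="/home/pi/webcrawler"

# Expected output folder of `dotnet publish -c release -r linux-arm`,
# assuming the project targets netcoreapp2.0.
publish_dir="bin/release/netcoreapp2.0/linux-arm/publish"

if [ -d "$publish_dir" ]; then
  # Create the folder on the Pi, then copy the publish output into it.
  ssh "pi@$pi_host" "mkdir -p $remote_dir"
  scp -r "$publish_dir"/* "pi@$pi_host:$remote_dir/"
else
  echo "Nothing to copy: run dotnet publish first ($publish_dir not found)"
fi
```

After the copy, the dotnet webcrawler.dll command is run from inside the target folder on the Pi.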

