Scraping data in C# is easy with HtmlAgilityPack. This short tutorial focuses on how to select nodes and subnodes with HtmlAgilityPack. For this project we need htmlagilitypack.dll, which is also distributed as the HtmlAgilityPack NuGet package. Next, add a reference to it, as for any other DLL in your project, and import its namespace (using HtmlAgilityPack;).
First, we load the page from a given URL:
string url = "http://www.nhl.com/ice/schedulebyseason.htm";
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(url);
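If the HTML is already in memory, for example saved to disk earlier or embedded in a unit test, it can be parsed without a network request. The sketch below assumes the HtmlAgilityPack package is referenced. It relies on HtmlDocument.LoadHtml, which parses a string instead of downloading a page. The class and method names (HtmlLoading, LoadFromString) are hypothetical.

```csharp
using HtmlAgilityPack;

// Hypothetical helper: parse HTML that is already in memory instead of
// downloading it with HtmlWeb.Load.
public static class HtmlLoading
{
    public static HtmlAgilityPack.HtmlDocument LoadFromString(string html)
    {
        // Fully qualified to avoid the clash with System.Windows.Forms.HtmlDocument
        // in WinForms projects.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html); // parses the string; no network access
        return doc;
    }
}
```

The returned document exposes the same DocumentNode as the one returned by web.Load(url). The selection code that follows works unchanged on either document.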
There is an ambiguity with HtmlDocument, because System.Windows.Forms also contains an HtmlDocument class, so we use the fully qualified name HtmlAgilityPack.HtmlDocument. To select just one node, we can use the SelectSingleNode method, which takes an XPath expression:
HtmlAgilityPack.HtmlNode justOneNode = doc.DocumentNode.SelectSingleNode("//a[@class=' last']");
In this example we selected the anchor whose class attribute is exactly ' last' (including the leading space):
<a class=" last" href="http://…
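Two details of SelectSingleNode are worth handling. First, it returns null rather than throwing when nothing matches. Second, an exact @class comparison breaks as soon as the attribute gains another class or loses its leading space. The sketch below assumes the HtmlAgilityPack package. The SingleNodeExample class and its constants are hypothetical. The token-matching XPath is the standard XPath 1.0 idiom for "class list contains a word", shown here as an alternative to the exact match used above.

```csharp
using HtmlAgilityPack;

public static class SingleNodeExample
{
    // Exact match, as in the text: class must be the string " last", leading space included.
    public const string ExactXPath = "//a[@class=' last']";

    // Token match (XPath 1.0 idiom): any <a> whose class list contains the word "last",
    // regardless of surrounding whitespace or other classes.
    public const string TokenXPath =
        "//a[contains(concat(' ', normalize-space(@class), ' '), ' last ')]";

    // Returns the inner HTML of the first node matching xpath, or null if none matches.
    public static string ReadFirst(HtmlAgilityPack.HtmlDocument doc, string xpath)
    {
        // SelectSingleNode returns null when nothing matches, so guard before reading InnerHtml.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node?.InnerHtml;
    }
}
```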
Once the node is selected, we can read its content through the InnerHtml property:
string str_write = justOneNode.InnerHtml;
The text can then be displayed, for example, in a TextBox.
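InnerHtml returns the node's inner markup, including child tags. When only the visible text is wanted, InnerText strips the tags. It may still contain HTML entities such as &amp;, which HtmlEntity.DeEntitize decodes. The sketch below assumes the HtmlAgilityPack package. The NodeTextExample class is hypothetical.

```csharp
using HtmlAgilityPack;

public static class NodeTextExample
{
    // Returns both views of a node: raw inner markup and decoded plain text.
    public static (string html, string text) Read(HtmlNode node)
    {
        string html = node.InnerHtml;                        // child tags and entities kept as-is
        string text = HtmlEntity.DeEntitize(node.InnerText); // tags stripped, entities decoded
        return (html, text);
    }
}
```

For an anchor like <a href="#"><b>Rangers</b> &amp; Devils</a>, the first value keeps the <b> tags and the second is plain "Rangers & Devils".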
What if we want to select several nodes? We can use LINQ with the DescendantNodes() method. The result is stored in an implicitly typed local variable (var), since the LINQ query returns an IEnumerable<HtmlNode>:
var row_nodes = doc.DocumentNode.DescendantNodes()
    .Where(n => n.Name == "a")
    .Where(n => n.GetAttributeValue("class", null) == "");
After this LINQ query, we have an enumerable collection of the nodes that match the given conditions. This is similar to selecting elements by id or class in jQuery. In this example, we selected the anchors whose class attribute is present but empty:
<a class="" href="http://…
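The same selection can also be written with Descendants("a"), which filters by tag name directly, or as a single XPath query passed to SelectNodes. The sketch below assumes the HtmlAgilityPack package. The MultiNodeExample class is hypothetical. It reads the attribute through node.Attributes so that a missing class attribute (null) stays distinct from an empty one (""). SelectNodes returns null by default when nothing matches, so the XPath version falls back to an empty list.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class MultiNodeExample
{
    // LINQ version: <a> elements whose class attribute exists and is empty.
    public static List<HtmlNode> ByLinq(HtmlAgilityPack.HtmlDocument doc) =>
        doc.DocumentNode.Descendants("a")
           .Where(n => n.Attributes["class"]?.Value == "")
           .ToList();

    // XPath version of the same query; SelectNodes yields null (by default) when
    // nothing matches, so substitute an empty list.
    public static List<HtmlNode> ByXPath(HtmlAgilityPack.HtmlDocument doc) =>
        doc.DocumentNode.SelectNodes("//a[@class='']")?.ToList()
        ?? new List<HtmlNode>();
}
```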
To read their values, we use a foreach loop:
foreach (HtmlNode node in row_nodes)
{
    string str_text = node.InnerHtml;
    textBox1.Text += str_text;
}
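Appending to textBox1.Text inside the loop runs the values together with no separator. As an alternative, the sketch below builds all lines first and assigns the result once, reading each anchor's href with GetAttributeValue. It assumes the HtmlAgilityPack package. The CollectExample class and the "text -> href" line layout are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class CollectExample
{
    // Builds one line per anchor in the form "text -> href", separated by newlines,
    // so the caller can assign the whole result to a control in one step.
    public static string Describe(IEnumerable<HtmlNode> anchors) =>
        string.Join(Environment.NewLine,
            anchors.Select(a =>
                HtmlEntity.DeEntitize(a.InnerText).Trim() + " -> " +
                a.GetAttributeValue("href", ""))); // "" when the anchor has no href
}
```

Usage: textBox1.Text = CollectExample.Describe(row_nodes); (set the TextBox's Multiline property to true so the line breaks are shown).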
HtmlAgilityPack makes scraping data from a web page's HTML easy in C#.