Pickaxe - Screen scrape directly into SQL Server tables

by breeve 14. May 2016 11:45

Pickaxe uses SQL like syntax to scrape data from web sites. It is hosted on github and is open source.

Recently, I added functionality to insert scraped results directly into SQL Server. The example below screen scrapes Lake Travis water levels and inserts the results directly into SQL server. Visit the Pickaxe download page to run the code below. The attached youtube video shows how the code was developed.

create mssql levels(month string, year int, level float)
    connectionstring = 'Server=localhost;Database=scrape;Trusted_Connection=True;',
    dbtable = 'LakeLevels'

insert overwrite levels
    pick 'td:nth-child(1) p.bold' match '(\d+)-(\w+)' replace '$2',
    pick 'td:nth-child(1) p.bold' match '(\d+)-(\w+)' replace '$1',
    pick 'td:nth-child(2)' match '\d{3}\.\d{2}'
from download page (
        'http://www.golaketravis.com/waterlevel/' + pick '' take attribute 'href'
    from download page 'http://www.golaketravis.com/waterlevel/'
    where nodes = 'table[width="100%"] td[style="background-color: #62ABCC;"] p.white a'
    ) with (thread(10))
where nodes = 'table[width="600"] tr'



Powered by BlogEngine.NET
Theme by Mads Kristensen

About Me

I am a Principal Engineer with 16 years experience developing and releasing software products. I started developing in C/C++ then moved into .NET and C#. Currently working in Python/Flask and Docker. Have tech lead multiple projects. I have developed products in Windows Forms, ASP.NET/MVC, Silverlight, WPF, and Python. I currently reside in Austin, Texas.

Own Projects

Pickaxe - An easy to use web page scraper. If you know a little SQL and how basic CSS selectors work, no web scraping product will be easier to use.


FligthQuery - Query FlightAware FlightXml API with SQL


Created ASP.NET MVC forum originally targeting home owner associations but now in use by an investor group.


Currently Reading