by breeve
14. May 2016 11:45
Pickaxe uses SQL like syntax to scrape data from web sites. It is hosted on github and is open source.
Recently, I added functionality to insert scraped results directly into SQL Server. The example below screen scrapes Lake Travis water levels and inserts the results directly into SQL server. Visit the Pickaxe download page to run the code below. The attached youtube video shows how the code was developed.
create mssql levels(month string, year int, level float)
with(
connectionstring = 'Server=localhost;Database=scrape;Trusted_Connection=True;',
dbtable = 'LakeLevels'
)
insert overwrite levels
select
pick 'td:nth-child(1) p.bold' match '(\d+)-(\w+)' replace '$2',
pick 'td:nth-child(1) p.bold' match '(\d+)-(\w+)' replace '$1',
pick 'td:nth-child(2)' match '\d{3}\.\d{2}'
from download page (
select
'http://www.golaketravis.com/waterlevel/' + pick '' take attribute 'href'
from download page 'http://www.golaketravis.com/waterlevel/'
where nodes = 'table[width="100%"] td[style="background-color: #62ABCC;"] p.white a'
) with (thread(10))
where nodes = 'table[width="600"] tr'