Web scraping county, population and median home value from United States postal zip codes.
I switched from using InternetExplorer.Application to New MSXML2.XMLHTTP60. For the first time, I also broke the code into smaller subs and functions, among other things.
- This code ran successfully once; other test runs returned only 70-100 records.
- With InternetExplorer.Application, I projected the code to complete in 1 h 45 min by timing 20 records. With the XML method, I projected 25-30 min, as it takes about 5 min to fetch 70-100 records.
- Excel goes completely blank when running (white screen).
There appear to be several things I can do:
- Early binding (which I didn’t understand how to implement based on the thread)
- Creating VBScripts to simulate multi-threading. I haven't created VBScripts with Excel before, so this option is taking me a while as I read up on VBScript in more depth.
- I can't find the link now, but I read elsewhere that jumping between cells slows things down. According to that thread, I should store all values in an array first and write them to the corresponding cells only after everything has been retrieved, instead of retrieving and writing one value at a time. (I think I can handle this, but I'd welcome pointers on whether it actually helps.)
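The array idea from the last bullet can be sketched roughly as follows. This is a minimal sketch, not a tested drop-in: `BufferedWriteSketch` and `FetchValue` are hypothetical names, `FetchValue` is only a stub standing in for the real web request, and the zip codes are assumed to sit in column C from C2 down, with at least two rows of data.

```vba
Option Explicit

' Hypothetical stub standing in for the real web request.
Private Function FetchValue(ByVal zipCode As String, ByVal fieldIndex As Long) As String
    FetchValue = zipCode & "-" & fieldIndex
End Function

Public Sub BufferedWriteSketch()
    Dim zipRange As Range
    Dim zips As Variant
    Dim results() As Variant
    Dim r As Long, c As Long

    ' Assumes at least two rows, so that .Value returns a 2-D array
    ' rather than a single scalar.
    Set zipRange = Range("C2", Range("C2").End(xlDown))
    zips = zipRange.Value

    ReDim results(1 To UBound(zips, 1), 1 To 3)

    ' Fill the in-memory array first; no cells are touched in this loop.
    For r = 1 To UBound(zips, 1)
        For c = 1 To 3
            results(r, c) = FetchValue(CStr(zips(r, 1)), c)
        Next c
    Next r

    ' One write for the whole block (columns D:F) instead of one per cell.
    zipRange.Offset(0, 1).Resize(UBound(zips, 1), 3).Value = results
End Sub
```

Cell writes are usually cheap compared with a network request, so the gain from this alone may be small next to reducing the number of requests.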
    'ZipCodeScrape Variables
    Public ZipCodeRange As Range
    Public cell As Variant

    'Web Variables
    Public IE As MSXML2.XMLHTTP60
    Public url As String
    Public post As Object
    Public HTML As MSHTML.HTMLDocument
    Public HTMLbody As MSHTML.HTMLBody
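On the early-binding bullet: as far as I can tell, the declarations above are already early bound, since they name types from the "Microsoft XML, v6.0" and "Microsoft HTML Object Library" references (set under Tools > References in the VBA editor). A minimal sketch of the difference, using hypothetical procedure names:

```vba
' Early binding: requires references to "Microsoft XML, v6.0" and
' "Microsoft HTML Object Library" (Tools > References).
Sub EarlyBoundSketch()
    Dim request As MSXML2.XMLHTTP60
    Dim doc As MSHTML.HTMLDocument
    Set request = New MSXML2.XMLHTTP60
    Set doc = New MSHTML.HTMLDocument
End Sub

' Late binding: no references needed; the objects are resolved at run time.
Sub LateBoundSketch()
    Dim request As Object
    Dim doc As Object
    Set request = CreateObject("MSXML2.XMLHTTP.6.0")
    Set doc = CreateObject("htmlfile")
End Sub
```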
Gathering zip codes and using a function to retrieve data
    Sub ZipCodeScrape()

        Set IE = New MSXML2.XMLHTTP60
        url = "https://www.unitedstateszipcodes.org/"
        Set ZipCodeRange = Range("C2", Range("C2").End(xlDown))

        Dim TargetElement(1 To 3) As String
        TargetElement(1) = "County:"
        TargetElement(2) = "Population"
        TargetElement(3) = "Median Home Value"

        Dim i As Integer

        For Each cell In ZipCodeRange
            For i = 1 To 3
                cell.Offset(0, i).Value = dataScrape("th", TargetElement(i), "td")
            Next i
        Next cell

    End Sub
Here is the function I’m using to retrieve the data
    Private Function dataScrape(ByVal TagName As String, Element As String, targetTagName)

        IE.Open "GET", url & cell.Value, False
        IE.send

        While IE.readyState <> 4: DoEvents: Wend

        Set HTML = New MSHTML.HTMLDocument
        Set HTMLbody = HTML.body
        HTMLbody.innerHTML = IE.responseText

        For Each post In HTMLbody.getElementsByTagName(TagName)
            If InStr(post.innerText, Element) > 0 Then
                dataScrape = post.ParentNode.getElementsByTagName(targetTagName)(0).innerText
                Exit For
            End If
        Next post

    End Function
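One thing I notice is that `dataScrape` is called three times per zip code and sends a new request on every call, so each page is downloaded three times. A minimal sketch of fetching once per zip code and reading all three labels from the same parsed document follows. `ScrapeZip` and its parameters are hypothetical names, the th/td layout is assumed from the code above, and this has not been tested against the live site.

```vba
' Hypothetical helper: one request per zip code, all labels read from one document.
' Requires the same two references as the code above.
Private Function ScrapeZip(ByVal baseUrl As String, ByVal zipCode As String, _
                           ByRef labels() As String) As Variant
    Dim request As MSXML2.XMLHTTP60
    Dim doc As MSHTML.HTMLDocument
    Dim header As Object
    Dim found() As Variant
    Dim i As Long

    ' Elements start out Empty until a matching label is found.
    ReDim found(LBound(labels) To UBound(labels))

    Set request = New MSXML2.XMLHTTP60
    request.Open "GET", baseUrl & zipCode, False   ' synchronous: send returns when done
    request.send

    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = request.responseText

    For Each header In doc.getElementsByTagName("th")
        For i = LBound(labels) To UBound(labels)
            If IsEmpty(found(i)) Then
                If InStr(header.innerText, labels(i)) > 0 Then
                    ' Like the original, this assumes the row contains a td cell.
                    found(i) = header.parentNode.getElementsByTagName("td")(0).innerText
                End If
            End If
        Next i
    Next header

    ScrapeZip = found
End Function
```

In the loop, the call might look like `values = ScrapeZip(url, cell.Value, TargetElement)`, with `values(1)` to `values(3)` then written to the three offset cells.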