There is a program , that parsing a website . Program works well , but too long . I want to simplify/speed it up.
How program works:
- First , the program finds needed hyperlink in Excel
- Then by the hyperlink , the program goes to the site , where it finds a certain table of elements. Then it takes out the “href” of each element , turns it into a hyperlink , and inserts it into Excel in the 1-st table
- Then again by the hyperlink , the program goes to the site , where it finds a certain table of elements. Then it extracts the text of each element and inserts it into Excel in the 2-nd table
-
Then it goes through the elements of the 1-st and 2-nd tables , so that in the 3-rd table each element contains a “hyperlink +text”
Sub Softãèïåðññûëêè() Application.DisplayAlerts = False Call mainìàññèâû Application.DisplayAlerts = True End Sub Sub mainìàññèâû() Dim r As Range Dim firstAddress As String Dim iLoop As Long Dim book1 As Workbook Dim sheetNames(1 To 19) As String Dim Ssilka As String '!!! 1. First , the program finds needed hyperlink sheetNames(1) = "Ëèñò1" sheetNames(2) = "Ëèñò2" sheetNames(3) = "Ëèñò3" sheetNames(4) = "Ëèñò4" sheetNames(5) = "Ëèñò5" sheetNames(6) = "Ëèñò6" sheetNames(7) = "Ëèñò7" sheetNames(8) = "Ëèñò8" sheetNames(9) = "Ëèñò9" sheetNames(10) = "Ëèñò10" sheetNames(11) = "Ëèñò11" sheetNames(12) = "Ëèñò12" sheetNames(13) = "Ëèñò13" sheetNames(14) = "Ëèñò14" sheetNames(15) = "Ëèñò15" sheetNames(16) = "Ëèñò16" sheetNames(17) = "Ëèñò17" sheetNames(18) = "Ëèñò18" sheetNames(19) = "Ëèñò19" Set book1 = Workbooks.Open("E:\Super M\Ïðîåêò ñòàâêè\Ïîèñê ðåøåíèÿ\Óñîâ 7\Óñëîâèÿ äëÿ àíäåðäîãîâ.xlsm") iLoop = -1 With book1.Worksheets("Ëèñò1").Range("R34:R99") For Each r In .Rows If r.Value = 1 Then iLoop = iLoop + 1 Ssilka = r.Offset(, -13).Hyperlinks.Item(1).Address .Parent.Parent.Worksheets(sheetNames(1)).Activate .Parent.Parent.Save extractTable Ssilka, book1, iLoop End If Next r End With book1.Save book1.Close Exit Sub End Sub Function extractTable(Ssilka As String, book1 As Workbook, iLoop As Long) Dim oDom As Object, oTable As Object, oRow As Object Dim iRows As Integer, iCols As Integer Dim x As Integer, y As Integer Dim data() Dim oHttp As Object Dim oRegEx As Object Dim sResponse As String Dim oRange As Range Dim Perem1 As String Dim Perem2 As String '!!!2. Then by the hyperlink , the program goes to the site , where it finds a certain table of elements.Then it takes out the "href" of each element , turns it into a hyperlink , and inserts it into Excel in the 1-st table ' get page Set oHttp = CreateObject("MSXML2.XMLHTTP") oHttp.Open "GET", Ssilka, False oHttp.Send ' cleanup response sResponse = StrConv(oHttp.responseBody, vbUnicode) Set oHttp = Nothing sResponse = Mid$ (sResponse, InStr(1, sResponse, "<!DOCTYPE ")) Set oRegEx = CreateObject("vbscript.regexp") With oRegEx .MultiLine = True .Global = True .IgnoreCase = False .Pattern = "<(script|SCRIPT)[\w\W]+?</>" sResponse = .Replace(sResponse, "") End With Set oRegEx = Nothing ' create Document from response Set oDom = CreateObject("htmlFile") oDom.Write sResponse DoEvents ' table with results, indexes starts with zero Set oTable = oDom.getelementsbytagname("table")(3) DoEvents iRows = oTable.Rows.Length iCols = oTable.Rows(1).Cells.Length ' first row and first column contain no intresting data ReDim data(1 To iRows - 1, 1 To iCols - 1) ' fill in data array For x = 1 To iRows - 1 Set oRow = oTable.Rows(x) For y = 1 To iCols - 1 If oRow.Cells(y).Children.Length > 0 Then data(x, y) = oRow.Cells(y).getelementsbytagname("a")(0).getattribute("href") End If Next y Next x Set oRow = Nothing Set oTable = Nothing Set oDom = Nothing ' put data array on worksheet Set oRange = book1.ActiveSheet.Cells(110, 26 + (iLoop * 21)).Resize(iRows - 1, iCols - 1) oRange.NumberFormat = "@" oRange.Value = data oRange.Replace What:="about:", Replacement:="http://allscores.ru/soccer/" Set oRange = Nothing '!!!! 3. Then again by the hyperlink , the program goes to the site , where it finds a certain table of elements. Then it extracts the text of each element and inserts it into Excel in the 2-nd table ' get page Set oHttp = CreateObject("MSXML2.XMLHTTP") oHttp.Open "GET", Ssilka, False oHttp.Send ' cleanup response sResponse = StrConv(oHttp.responseBody, vbUnicode) Set oHttp = Nothing sResponse = Mid$ (sResponse, InStr(1, sResponse, "<!DOCTYPE ")) Set oRegEx = CreateObject("vbscript.regexp") With oRegEx .MultiLine = True .Global = True .IgnoreCase = False .Pattern = "<(script|SCRIPT)[\w\W]+?</>" sResponse = .Replace(sResponse, "") End With Set oRegEx = Nothing ' create Document from response Set oDom = CreateObject("htmlFile") oDom.Write sResponse DoEvents ' table with results, indexes starts with zero Set oTable = oDom.getelementsbytagname("table")(3) DoEvents iRows = oTable.Rows.Length iCols = oTable.Rows(1).Cells.Length ' first row and first column contain no intresting data ReDim data(1 To iRows - 1, 1 To iCols - 1) ' fill in data array For x = 1 To iRows - 1 Set oRow = oTable.Rows(x) For y = 1 To iCols - 1 If oRow.Cells(y).Children.Length > 0 Then data(x, y) = oRow.Cells(y).innerText End If Next y Next x Set oRow = Nothing Set oTable = Nothing Set oDom = Nothing ' put data array on worksheet Set oRange = book1.ActiveSheet.Cells(185, 26 + (iLoop * 21)).Resize(iRows - 1, iCols - 1) oRange.NumberFormat = "@" oRange.Value = data Set oRange = Nothing '!!! 4. Then it goes through the elements of the 1-st and 2-nd tables , so that in the 3-rd table each element contains a "hyperlink +text" For A = 0 To 4 For B = 0 To 65 Perem1 = book1.ActiveSheet.Cells(110 + B, (26 + (iLoop * 21)) + A).Value Perem2 = book1.ActiveSheet.Cells(185 + B, (26 + (iLoop * 21)) + A).Value book1.ActiveSheet.Hyperlinks.Add Anchor:=Cells(34 + B, (26 + (iLoop * 21)) + A), Address:=Perem1, TextToDisplay:=Perem2 Next Next End Function