I have a Java program that scrapes one of my web pages. The page content is loaded through JavaScript, and the entire site requires JavaScript to be enabled; otherwise it just shows a text error telling you to enable JavaScript before you can proceed. Since any scraping of this site therefore needs JavaScript support, I have gone the HtmlUnit route with the following code:
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// try-with-resources so the WebClient is closed even if getPage() throws
try (WebClient webClient = new WebClient()) {
    webClient.getOptions().setJavaScriptEnabled(true);
    HtmlPage page = webClient.getPage(URL);
    HtmlElement rx = page.getHtmlElementById("ID");
    String text = rx.getTextContent();
    return text;
}
However, even though HtmlUnit is a GUI-less browser, it is still a full browser engine under the hood, so it is noticeably slower than the simpler parsing methods I could have used if JavaScript weren't required. With that said, does anyone know of a faster way to scrape a JavaScript-dependent website?
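For reference, this is the kind of tuning I understand HtmlUnit itself allows (a rough sketch using its standard WebClientOptions setters plus NicelyResynchronizingAjaxController, which makes the client wait for AJAX calls); it trims some of the per-page work, but I suspect it still won't get close to a plain HTML parse:

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setCssEnabled(false);                  // skip CSS processing
    webClient.getOptions().setThrowExceptionOnScriptError(false); // don't abort on JS errors
    webClient.setAjaxController(new NicelyResynchronizingAjaxController()); // resynchronize AJAX calls
    HtmlPage page = webClient.getPage(URL);                       // same URL placeholder as above
    webClient.waitForBackgroundJavaScript(10_000);                // let background JS finish (up to 10s)
    return page.getHtmlElementById("ID").getTextContent();
}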
Edit: I don't know much about PhantomJS, but could that be a better option here?
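In case it helps frame answers: my rough understanding is that PhantomJS is a headless WebKit browser that can be driven from Java through Selenium's GhostDriver binding (PhantomJSDriver). A minimal sketch of what I imagine that would look like, assuming the phantomjs binary is installed on the PATH and the selenium-java and phantomjsdriver dependencies are on the classpath:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.phantomjs.PhantomJSDriver;
import org.openqa.selenium.remote.DesiredCapabilities;

// Sketch only: assumes the phantomjs executable can be found on the PATH
DesiredCapabilities caps = new DesiredCapabilities();
caps.setJavascriptEnabled(true);
WebDriver driver = new PhantomJSDriver(caps);
try {
    driver.get(URL);  // same URL placeholder as above (a String here)
    return driver.findElement(By.id("ID")).getText();
} finally {
    driver.quit();    // stops the background phantomjs process
}

My assumption is that the driver keeps a single phantomjs process alive, so reusing one driver across many pages would avoid paying the process startup cost on every request. Is that actually faster than HtmlUnit in practice?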