Crawling/Mirroring Browser-Generated Web Pages
tl;dr: When parts of a site are generated on the fly by JavaScript running in the browser, they can’t be archived directly by fetching the served HTML alone. Augmenting a simple crawler with an easy-to-use browser automation subsystem (Selenium) makes this very straightforward.

Background

If you got past the title of this post, chances are you’ve executed a command something like this:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org

This...