oh, maybe you all would know this.
any tips for scraping a #wordpress-based site for the urls of all posts by a particular author? I tried a few combinations of lynx -dump, wget, & grep but don't know enough about any of them.
i.e. https://site.tld/author/authorsname, https://site.tld/author/authorsname/page/2, page/3, etc., where the posts are like https://site.tld/1970/01/01/title-of-post
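For the URL layout described above, here is a minimal Python sketch using only the standard library. It assumes the author archive uses the `/author/<name>/page/<n>` pattern, that posts use dated `/YYYY/MM/DD/slug` permalinks as in the example, and that a page past the end returns an HTTP error. The names `fetch_page`, `extract_post_urls`, and `author_post_urls` are made up for illustration, and a real site may need tweaks (a user agent, a delay between requests, a different permalink pattern).

```python
import re
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Assumed permalink shape, taken from the example: /1970/01/01/title-of-post
POST_PATH = re.compile(r"^/\d{4}/\d{2}/\d{2}/[^/]+/?$")


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_post_urls(html, page_url):
    """Return same-site post URLs found in one page, in order, deduplicated."""
    parser = LinkCollector()
    parser.feed(html)
    site = urlparse(page_url).netloc
    found = []
    for href in parser.hrefs:
        parts = urlparse(urljoin(page_url, href))
        # Drop ?query and #fragment so ".../title-of-post/#comments" is not a new URL.
        url = parts._replace(query="", fragment="").geturl()
        if parts.netloc == site and POST_PATH.match(parts.path) and url not in found:
            found.append(url)
    return found


def fetch_page(url):
    """Download a page as text; return None on an HTTP error such as 404."""
    try:
        with urlopen(url, timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except HTTPError:
        return None


def author_post_urls(site, author, fetch=fetch_page, max_pages=500):
    """Walk /author/<author>, /author/<author>/page/2, ... and collect post URLs."""
    urls = []
    for n in range(1, max_pages + 1):
        page = f"{site}/author/{author}"
        if n > 1:
            page += f"/page/{n}"
        html = fetch(page)
        if html is None:
            break  # ran past the last archive page
        new = [u for u in extract_post_urls(html, page) if u not in urls]
        if not new:
            break  # nothing new on this page, so stop rather than loop forever
        urls.extend(new)
    return urls
```

Calling `author_post_urls("https://site.tld", "authorsname")` and printing each item gives one post URL per line. Stopping when a page adds nothing new is a guard for sites that serve a normal page instead of a 404 past the end of the archive.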
@nev What you need is a spider. There was a tool that allowed you to download all the data from a site, or at least list all the links...
This list might help you.
https://en.wikipedia.org/wiki/Web_crawler#Open-source_crawlers
@nev You have options for whether to download external links, and whether to download pages outside a given path. It takes a bit of trial and error, but it's excellent for backups.
@nev Official HTTrack / WinHTTrack manual:
@nev Good luck! 😉 👍
@rick_777 this looks handy, thanks!
@nev Found a page with general instructions and comments:
https://wptavern.com/how-to-archive-a-site-you-dont-have-access-to