Google Groups Scraper
A small script that replaces the old PHP script for downloading messages stored in the black hole of Google Groups.
How to use it?
This script requires formail(1) from the procmail package. Any version is OK, so please install it from your distribution's repositories. Then run:
pip install beautifulsoup4 PyYAML
python gg_scraper.py 'https://groups.google.com/forum/#!forum/<group_name>'
Background
I would never have started without the inspiration of a comment by Sean Hogan on my previous post about the locked-down nature of Google Groups:
At least google-groups appears to follow google's own advice on making AJAX sites accessible to the google web-crawler. See https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
So for me, http://groups.google.com/d/forum/jbrout:FBv1oMXRZkxB6YShaFuNHc3-Moc&cuid=3654582 redirects (eventually) to https://groups.google.com/forum/#!forum/jbrout which can be viewed in raw HTML as https://groups.google.com/forum/?_escaped_fragment_=forum/jbrout
Google Groups seems a perfect use-case for extreme-progressive-enhancement, but what would I know.
regards, Sean
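The URL rewriting Sean describes can be sketched in a few lines of Python. This is only an illustration of the mapping from the `#!` form to the `?_escaped_fragment_=` form, not necessarily how gg_scraper.py does it; the function name `escaped_fragment_url` is made up for this example.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escaped_fragment_url(url):
    """Rewrite a '#!' (hash-bang) URL into its '?_escaped_fragment_=' form.

    Hypothetical helper for illustration; not taken from gg_scraper.py.
    """
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        # Not a hash-bang URL, so there is nothing to rewrite.
        return url
    fragment = parts.fragment[1:]
    # Keep '/' unescaped so the result matches the form quoted above.
    token = "_escaped_fragment_=" + quote(fragment, safe="/")
    query = parts.query + "&" + token if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(escaped_fragment_url("https://groups.google.com/forum/#!forum/jbrout"))
```

For the jbrout group this prints the same raw-HTML address given in the comment, https://groups.google.com/forum/?_escaped_fragment_=forum/jbrout.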
All issues, questions, complaints, or (even better!) patches should be sent by email to the ~mcepl/devel@lists.sr.ht mailing list (for patches, use git send-email). For issue tracking I use git-bug in this repo.