Google Group Scraper

A small script that replaces the old PHP script for downloading messages stored in the black hole of Google Groups.

How to use it?

This script requires formail(1) from the procmail package. Any version will do, so install it from your distribution's repositories. Then run:

pip install beautifulsoup4 PyYAML
python gg_scraper.py 'https://groups.google.com/forum/#!forum/<group_name>'
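Before running the scraper it can help to confirm that every prerequisite named above is present. The following is a minimal sketch; `missing_dependencies()` is a hypothetical helper written for this README, not part of gg_scraper.py. It only checks that `formail` is on `PATH` and that the `bs4` and `yaml` modules (installed by the `pip` command above) can be imported.

```python
import shutil
import importlib.util
from typing import List

# Import name -> pip package name, matching the pip command above.
PY_MODULES = {"bs4": "beautifulsoup4", "yaml": "PyYAML"}


def missing_dependencies() -> List[str]:
    """Hypothetical helper: return the prerequisites that are not available."""
    missing = []
    # formail(1) ships with procmail and has to be findable on PATH.
    if shutil.which("formail") is None:
        missing.append("formail (procmail)")
    # find_spec() returns None when a top-level module cannot be imported.
    for module, package in PY_MODULES.items():
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All dependencies found.")
```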

Background

I would never have started without the inspiration of a comment by Sean Hogan on my previous post about the locked-down nature of Google Groups:

At least google-groups appears to follow google's own advice on making AJAX sites accessible to the google web-crawler. See https://developers.google.com/webmasters/ajax-crawling/docs/getting-started

So for me, http://groups.google.com/d/forum/jbrout:FBv1oMXRZkxB6YShaFuNHc3-Moc&cuid=3654582 redirects (eventually) to https://groups.google.com/forum/#!forum/jbrout which can be viewed in raw HTML as https://groups.google.com/forum/?_escaped_fragment_=forum/jbrout

Google Groups seems a perfect use-case for extreme-progressive-enhancement, but what would I know.

regards, Sean
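The trick Sean describes is a purely mechanical URL rewrite: the `#!` "hash-bang" fragment becomes the value of an `_escaped_fragment_` query parameter, which the server answers with plain HTML. A minimal sketch of that mapping, assuming the rules of Google's AJAX crawling scheme (append with `&` if a query already exists, percent-escape characters such as `&`, `#` and `%` in the fragment). `escaped_fragment_url()` is a hypothetical name, not necessarily how gg_scraper.py does it, and Google has since deprecated this crawling scheme, so the rewritten URLs may no longer be served.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escaped_fragment_url(url: str) -> str:
    """Rewrite a '#!' AJAX URL into its '_escaped_fragment_' HTML form."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        # Not a hash-bang URL; nothing to rewrite.
        return url
    # Drop the leading '!' and escape the rest, keeping '/', '=' and ':'
    # readable so that 'forum/jbrout' survives unchanged.
    value = quote(parts.fragment[1:], safe="/=:")
    param = "_escaped_fragment_=" + value
    # Preserve any existing query string and append the new parameter.
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(escaped_fragment_url("https://groups.google.com/forum/#!forum/jbrout"))
```

Applied to the example from the quoted comment, this yields `https://groups.google.com/forum/?_escaped_fragment_=forum/jbrout`.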


All issues, questions, complaints, or (even better!) patches should be sent by email to the ~mcepl/devel@lists.sr.ht mailing list (for patches, use git send-email). For issue tracking I use git-bug in this repository.