-rw-r--r-- | LICENSE | 22
-rw-r--r-- | README.md | 49
-rw-r--r-- | README.rst | 55
3 files changed, 71 insertions, 55 deletions
@@ -0,0 +1,22 @@
+Copyright © 2024 Matěj Cepl, mcepl at cepl dot eu
+
+Permission is hereby granted, free of charge, to any person
+obtaining a copy of this software and associated documentation
+files (the “Software”), to deal in the Software without
+restriction, including without limitation the rights to use,
+copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following
+conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..54d636d
--- /dev/null
+++ b/README.md
@@ -0,0 +1,49 @@
+# Google Group Scrapper
+
+![Build Status](https://secure.travis-ci.org/mcepl/gg_scraper.png)
+
+A small script as a replacement of [the old PHP
+script](http://saturnboy.com/2010/03/scraping-google-groups/) for
+downloading messages stored in the black hole of the Google Groups.
+
+## How to use it?
+
+This script requires `formail(1)` from `procmail` package. Any version
+is OK, so please install it from your distribution's repositories. Then
+run:
+
+    pip install beautifulsoup4 PyYAML
+    python gg_scraper.py 'https://groups.google.com/forum/#!forum/<group_name>'
+
+## Background
+
+I would never start without an inspiration from the
+[comment](https://luther.ceplovi.cz/blog/2013/09/19/we-should-stop-even-pretending-google-is-trying-to-do-the-right-thingtm/#comment-133-by-sean-hogan)
+by Sean Hogan on my previous post on the theme of locked down nature of
+Google Groups:
+
+> At least google-groups appears to follow google\'s own advice on
+> making AJAX sites accessible to the google web-crawler. See
+> <https://developers.google.com/webmasters/ajax-crawling/docs/getting-started>
+>
+> So for me,
+> <http://groups.google.com/d/forum/jbrout:FBv1oMXRZkxB6YShaFuNHc3-Moc&cuid=3654582>
+> redirects (eventually) to
+> <https://groups.google.com/forum/#!forum/jbrout> which can be viewed
+> in raw HTML as
+> <https://groups.google.com/forum/?_escaped_fragment_=forum/jbrout>
+>
+> Google Groups seems a perfect use-case for
+> extreme-progressive-enhancement, but what would I know.
+>
+> regards, Sean
+
+------------------------------------------------------------------------
+
+All issues, questions, complaints, or (even
+better!) patches should be send via email to
+[~mcepl/<devel@lists.sr.ht>](mailto:~mcepl/devel@lists.sr.ht)
+email list (for patches use [git
+send-email](https://git-send-email.io/)). For the issue tracking
+I use [git-bug](https://github.com/MichaelMure/git-bug) in this
+repo.
diff --git a/README.rst b/README.rst
deleted file mode 100644
index 5f656b0..0000000
--- a/README.rst
+++ /dev/null
@@ -1,55 +0,0 @@
-=====================
-Google Group Scrapper
-=====================
-
-.. image:: https://secure.travis-ci.org/mcepl/gg_scraper.png
-   :alt: Build Status
-
-A small script as a replacement of `the old PHP script`_ for downloading messages stored in the black hole of the Google Groups.
-
-.. _`the old PHP script`:
-    http://saturnboy.com/2010/03/scraping-google-groups/
-
-How to use it?
---------------
-
-This script requires ``formail(1)`` from ``procmail`` package. Any
-version is OK, so please install it from your distribution’s
-repositories. Then run:
-
-::
-
-    pip install beautifulsoup4 PyYAML
-    python gg_scraper.py 'https://groups.google.com/forum/#!forum/<group_name>'
-
-Background
-----------
-
-I would never start without an inspiration from the comment_ by Sean Hogan on my previous post on the theme of locked down nature of Google Groups:
-
-    At least google-groups appears to follow google's own advice on making AJAX sites accessible to the google web-crawler. See https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
-
-    So for me,
-    http://groups.google.com/d/forum/jbrout:FBv1oMXRZkxB6YShaFuNHc3-Moc&cuid=3654582
-    redirects (eventually) to
-    https://groups.google.com/forum/#!forum/jbrout
-    which can be viewed in raw HTML as
-    https://groups.google.com/forum/?_escaped_fragment_=forum/jbrout
-
-    Google Groups seems a perfect use-case for extreme-progressive-enhancement, but what would I know.
-
-    regards,
-    Sean
-
-.. _comment:
-    https://luther.ceplovi.cz/blog/2013/09/19/we-should-stop-even-pretending-google-is-trying-to-do-the-right-thingtm/#comment-133-by-sean-hogan
-
-----
-
-All issues, questions, complaints, or (even
-better!) patches should be send via email to
-[~mcepl/devel@lists.sr.ht](mailto:~mcepl/devel@lists.sr.ht)
-email list (for patches use [git
-send-email](https://git-send-email.io/)). For the issue tracking
-I use [git-bug](https://github.com/MichaelMure/git-bug) in this
-repo.
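
Note on the README's Background quote: it describes rewriting a `#!` (hash-bang) URL into its `_escaped_fragment_` form so the page can be fetched as raw HTML. The following is a minimal Python sketch of that rewrite only. The function name `escaped_fragment_url` is hypothetical and is not taken from `gg_scraper.py`. Whether Google Groups still answers such URLs is not something the sketch checks.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escaped_fragment_url(url: str) -> str:
    """Rewrite a '#!' URL into its '_escaped_fragment_' query form.

    Hypothetical helper for illustration; not part of gg_scraper.py.
    """
    parts = urlsplit(url)
    # Only hash-bang fragments ("#!...") take part in the rewrite.
    if not parts.fragment.startswith("!"):
        return url
    # Drop the leading "!" and percent-escape the rest, keeping "/" readable.
    fragment = quote(parts.fragment[1:], safe="/")
    query = f"_escaped_fragment_={fragment}"
    # Preserve any query string that was already present.
    if parts.query:
        query = f"{parts.query}&{query}"
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


if __name__ == "__main__":
    print(escaped_fragment_url("https://groups.google.com/forum/#!forum/jbrout"))
```

For the `jbrout` URL from the quote, this yields the same raw-HTML address that the quote gives.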