Commit message [tag] | Author | Date | Files | Lines (-/+)
* Switch README to Markdown and add LICENSE. [HEAD, master] | Matěj Cepl | 2024-01-19 | 3 | -55/+71
* Update communication options. | Matěj Cepl | 2024-01-19 | 1 | -14/+9
* Add a crucial comment by Sean Hogan in full. | Matěj Cepl | 2024-01-19 | 1 | -1/+15
* Use as the url of the project the gitlab one. [0.10.0] | Matěj Cepl | 2014-12-28 | 2 | -2/+2
* Make python 2.7 default again and clean up. [0.9.0] | Matěj Cepl | 2014-12-12 | 4 | -35/+25
    Switch setup.py to use setuptools.
    Fixes #1, fixes #2
* Fix demangling. [0.8] | Matěj Cepl | 2014-03-17 | 1 | -6/+21
    * remove some reasons why import could crash
    * use case ignoring RE instead of plain .replace (cache those REs)
* Fix logic to actually demangle the mbox. | Matěj Cepl | 2014-03-17 | 1 | -2/+2
* Make README.rst less Debian-centric. | Matěj Cepl | 2014-02-13 | 2 | -3/+12
* Ignore links in welcome message | Izidor Matušov | 2014-02-13 | 1 | -1/+3
    Some groups, e.g. django-oscar [1], have links in welcome message.
    Those are not supposed to be a link to the next page, ignore them.
    [1]: https://groups.google.com/forum/#!forum/django-oscar
* Failing test for not recognizing a welcome message. | Matěj Cepl | 2014-02-13 | 1 | -0/+6
    See https://github.com/izidormatusov/gg_scraper/commit/75559fc4d3b5d84c75d0b100361f523452c7e78d
* Fix setup.py.RunTests.run() to return proper exit code. | Matěj Cepl | 2014-02-13 | 4 | -26/+35
    The previous situation didn't fail on Travis-CI. But this does, so I
    had to include also a number of fixes for revealed issues.
    Fixing #314, #315, #316, and #317
* Functional testing | Matěj Cepl | 2014-02-11 | 2 | -7/+109
    From https://github.com/izidormatusov/gg_scraper/commit/d85483b2204591951f4667d4b0f15a9c0570ec58
* Fix links in README | Izidor Matušov | 2014-02-08 | 1 | -3/+3
* Clean up README | Izidor Matušov | 2014-02-08 | 1 | -6/+20
    Allow simple copy'n'paste for usage
* Eliminate do_redirect() method. [0.6] | Matěj Cepl | 2014-01-11 | 2 | -19/+14
    One less HTTP connection, which was actually not needed.
* Make whole script compatible with python 2.6 | Matěj Cepl | 2014-01-11 | 6 | -30/+74
    How low we fell :(
    Also:
    * On Python 2.6 we have to send bytes to proc.communicate() not unicode str (Fixes #288)
* Rewrite Group.get_topics to be iterative rather than recursive. | Matěj Cepl | 2014-01-11 | 1 | -15/+17
    Fixes #284
* Sort unmangled addresses in the configuration file by frequency. | Matěj Cepl | 2014-01-11 | 3 | -9/+15
    Fixes #287
* scrapper -> scraper [0.5] | Matěj Cepl | 2014-01-11 | 6 | -32/+32
    Woops!
    scrapper: a fighter or aggressive competitor, especially one always
    ready or eager for a fight, argument, or contest: the best lightweight
    scrapper in boxing; a rugged political scrapper.
    That's not what I meant.
* Bump the release to 0.4 [0.4] | Matěj Cepl | 2014-01-04 | 2 | -1/+2
* Update pip install in .travis.yml | Matěj Cepl | 2014-01-04 | 3 | -5/+7
    Supporting 2.6 would be too difficult ... patches welcome!
* Make at least unittests running with Python 2.7 | Matěj Cepl | 2014-01-04 | 5 | -22/+49
    Add support for Travis-CI
    Fixes #283
* Fix setup.py | Matěj Cepl | 2014-01-03 | 4 | -2/+9
* Create setup.py and add GPLv3 license. | Matěj Cepl | 2014-01-03 | 2 | -0/+71
    Fixes #282
* Improve README.rst | Matěj Cepl | 2014-01-03 | 1 | -4/+18
* Skip over unfilled items in the unmangling config. file. | Matěj Cepl | 2014-01-03 | 1 | -0/+2
    Fixes #281
* Survive HTTPErrors on malformed URLs. | Matěj Cepl | 2014-01-03 | 1 | -10/+16
* Ignore results of the work | Matěj Cepl | 2014-01-03 | 1 | -0/+2
* Some logging to allow some following of the process. | Matěj Cepl | 2014-01-02 | 1 | -8/+12
    Fixes #280
* Unmangle mbox according to the configuration file. | Matěj Cepl | 2014-01-02 | 4 | -11/+327
    Fixes #273, #276
* For each group generate also a list of all mangled addresses. | Matěj Cepl | 2014-01-02 | 5 | -17/+63
    Google Groups (rightly) protects addresses against spammers. There is
    (obviously) no way to find the true values of these addresses
    programmatically, so we just generate a list of all affected ones,
    which could later be completed with the true values (collected
    somehow) and fixed by some other script.
    Fixes #275
* Creating raw MBOX fixed (tests included) | Matěj Cepl | 2014-01-02 | 5 | -34/+467
    Fix #278 and #271
* General structure of operation and MBOX writing. | Matěj Cepl | 2013-12-30 | 17 | -2762/+46
    So far, only unit test for the latter.
* Collect raw article | Matěj Cepl | 2013-12-30 | 3 | -1/+125
* Collect articles for one topic. | Matěj Cepl | 2013-12-29 | 5 | -20/+83
* Convert to python3. | Matěj Cepl | 2013-12-28 | 4 | -44/+50
* Collecting topics. | Matěj Cepl | 2013-12-28 | 18 | -7/+2893
    Added also some testing pages.
* Add a note about the inspiration for the project. | Matěj Cepl | 2013-12-27 | 1 | -0/+3
* Remove BE repository and move to Bugzilla | Matěj Cepl | 2013-12-27 | 9 | -303/+1
    https://luther.ceplovi.cz/bugzilla/buglist.cgi?quicksearch=product%3Agg_scrapper
* Start of the project. | Matěj Cepl | 2013-11-22 | 12 | -0/+375
    Based on the unescaping algorithm from
    https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
    (of which I was reminded by Sean Hogan (http://www.meekostuff.net/))
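
For readers unfamiliar with the AJAX crawling scheme mentioned in the first commit, here is a minimal sketch of the URL mapping it describes: a `#!` fragment is moved into an `_escaped_fragment_` query parameter, and back. This is not the project's code. The function names are hypothetical, and the set of escaped characters follows my reading of the scheme (bytes up to 0x20, bytes from 0x7F up, and `#`, `%`, `&`, `+`), so treat it as an assumption.

```python
from urllib.parse import unquote

MARKER = '_escaped_fragment_='


def to_escaped_fragment(url):
    """Map a '#!' (hash-bang) URL to its '_escaped_fragment_' form.

    Hypothetical helper, written for illustration only.
    """
    base, sep, fragment = url.partition('#!')
    if not sep:
        return url  # no hash-bang fragment, nothing to do
    # Percent-escape only the bytes the scheme asks for (assumed set).
    escaped = ''.join(
        '%{:02X}'.format(b)
        if (b <= 0x20 or b >= 0x7F or b in b'#%&+') else chr(b)
        for b in fragment.encode('utf-8'))
    joiner = '&' if '?' in base else '?'
    return base + joiner + MARKER + escaped


def from_escaped_fragment(url):
    """Reverse mapping: rebuild the '#!' URL (the "unescaping" direction)."""
    idx = url.find(MARKER)
    if idx == -1:
        return url
    base = url[:idx].rstrip('?&')
    return base + '#!' + unquote(url[idx + len(MARKER):])


if __name__ == '__main__':
    original = 'https://groups.google.com/forum/#!forum/django-oscar'
    crawlable = to_escaped_fragment(original)
    print(crawlable)
    print(from_escaped_fragment(crawlable))
```

The forum URL used in the usage lines is the one quoted in the "Ignore links in welcome message" commit above.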