Commit message  Author  Age  Files  Lines
* Bump the release to 0.40.4  Matěj Cepl  2014-01-04  2  -1/+2
|
* Update pip install in .travis.yml  Matěj Cepl  2014-01-04  3  -5/+7
|   Supporting 2.6 would be too difficult ... patches welcome!
* Make at least unittests running with Python 2.7  Matěj Cepl  2014-01-04  5  -22/+49
|   Add support for Travis-CI
|   Fixes #283
* Fix setup.py  Matěj Cepl  2014-01-03  4  -2/+9
|
* Create setup.py and add GPLv3 license.  Matěj Cepl  2014-01-03  2  -0/+71
|   Fixes #282
* Improve README.rst  Matěj Cepl  2014-01-03  1  -4/+18
|
* Skip over unfilled items in the unmangling config. file.  Matěj Cepl  2014-01-03  1  -0/+2
|   Fixes #281
* Survive HTTPErrors on malformed URLs.  Matěj Cepl  2014-01-03  1  -10/+16
|
* Ignore results of the work  Matěj Cepl  2014-01-03  1  -0/+2
|
* Some logging to allow some following of the process.  Matěj Cepl  2014-01-02  1  -8/+12
|   Fixes #280
* Unmangle mbox according to the configuration file.  Matěj Cepl  2014-01-02  4  -11/+327
|   Fixes #273, #276
* For each group generate also a list of all mangled addresses.  Matěj Cepl  2014-01-02  5  -17/+63
|   Google Groups (rightly) protects addresses against spammers. There is
|   (obviously) no way to recover the true value of these addresses
|   programmatically, so we just generate a list of all affected ones, which
|   could later be completed with the true values (collected somehow) and
|   fixed by some other script.
|
|   Fixes #275
* Creating raw MBOX fixed (tests included)  Matěj Cepl  2014-01-02  5  -34/+467
|   Fix #278 and #271
* General structure of operation and MBOX writing.  Matěj Cepl  2013-12-30  17  -2762/+46
|   So far, only unit test for the latter.
* Collect raw article  Matěj Cepl  2013-12-30  3  -1/+125
|
* Collect articles for one topic.  Matěj Cepl  2013-12-29  5  -20/+83
|
* Convert to python3.  Matěj Cepl  2013-12-28  4  -44/+50
|
* Collecting topics.  Matěj Cepl  2013-12-28  18  -7/+2893
|   Added also some testing pages.
* Add a note about the inspiration for the project.  Matěj Cepl  2013-12-27  1  -0/+3
|
* Remove BE repository and move to Bugzilla  Matěj Cepl  2013-12-27  9  -303/+1
|   https://luther.ceplovi.cz/bugzilla/buglist.cgi?quicksearch=product%3Agg_scrapper
* Start of the project.  Matěj Cepl  2013-11-22  12  -0/+375
    Based on the unescaping algorithm from
    https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
    (of which I was reminded by Sean Hogan (http://www.meekostuff.net/))
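
The AJAX-crawling scheme referenced in the first commit maps a "hash-bang" URL (`#!state`) to a crawler-friendly one (`?_escaped_fragment_=state`) and back. The following is only a minimal sketch of that mapping, not the project's actual code: the function names `to_crawler_url` and `from_crawler_url` are hypothetical, and the set of escaped characters follows my reading of the scheme (control characters, space, `#`, `%`, `&`, `+` and non-ASCII bytes).

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# Query parameter name used by the AJAX-crawling scheme.
FRAGMENT_KEY = "_escaped_fragment_="

# Assumption: every printable ASCII character except '#', '%', '&' and '+'
# is left unescaped; everything else is percent-encoded.
SAFE = "".join(chr(c) for c in range(0x21, 0x7F) if chr(c) not in "#%&+")


def to_crawler_url(pretty_url):
    """Turn '...#!state' into '...?_escaped_fragment_=state' (hypothetical helper)."""
    scheme, netloc, path, query, fragment = urlsplit(pretty_url)
    if not fragment.startswith("!"):
        return pretty_url  # not a hash-bang URL, nothing to do
    escaped = FRAGMENT_KEY + quote(fragment[1:], safe=SAFE)
    # The escaped fragment is appended after any existing query parameters.
    query = query + "&" + escaped if query else escaped
    return urlunsplit((scheme, netloc, path, query, ""))


def from_crawler_url(crawler_url):
    """Reverse of to_crawler_url: the 'unescaping' direction (hypothetical helper)."""
    scheme, netloc, path, query, _ = urlsplit(crawler_url)
    head, sep, tail = query.rpartition(FRAGMENT_KEY)
    if not sep:
        return crawler_url  # no escaped fragment present
    return urlunsplit((scheme, netloc, path, head.rstrip("&"), "!" + unquote(tail)))


if __name__ == "__main__":
    # 'example' is a placeholder group name, not a real one from the project.
    print(to_crawler_url("https://groups.google.com/forum/#!forum/example"))
```

A scraper would request the crawler-friendly form to obtain server-rendered HTML, and use the reverse direction to report the URL a human would see in the browser.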