| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | Switch README to Markdown and add LICENSE. [HEAD] [master] | Matěj Cepl | 2024-01-19 | 3 | -55/+71 |
| | |||||
* | Update communication options. | Matěj Cepl | 2024-01-19 | 1 | -14/+9 |
| | |||||
* | Add a crucial comment by Sean Hogan in full. | Matěj Cepl | 2024-01-19 | 1 | -1/+15 |
| | |||||
* | Use as the url of the project the gitlab one. [0.10.0] | Matěj Cepl | 2014-12-28 | 2 | -2/+2 |
| | |||||
* | Make python 2.7 default again and clean up. [0.9.0] | Matěj Cepl | 2014-12-12 | 4 | -35/+25 |
| | | | | | | Switch setup.py to use setuptools. Fixes #1, fixes #2 | ||||
* | Fix demangling. [0.8] | Matěj Cepl | 2014-03-17 | 1 | -6/+21 |
| | | | | | | * remove some reasons why import could crash * use case ignoring RE instead of plain .replace (cache those REs) | ||||
* | Fix logic to actually demangle the mbox. | Matěj Cepl | 2014-03-17 | 1 | -2/+2 |
| | |||||
* | Make README.rst less Debian-centric. | Matěj Cepl | 2014-02-13 | 2 | -3/+12 |
| | |||||
* | Ignore links in welcome message | Izidor Matušov | 2014-02-13 | 1 | -1/+3 |
| | | | | | | | Some groups, e.g. django-oscar [1], have links in welcome message. Those are not supposed to be a link to the next page, ignore them. 1: https://groups.google.com/forum/#!forum/django-oscar | ||||
* | Failing test for not recognizing a welcome message. | Matěj Cepl | 2014-02-13 | 1 | -0/+6 |
| | | | | See https://github.com/izidormatusov/gg_scraper/commit/75559fc4d3b5d84c75d0b100361f523452c7e78d | ||||
* | Fix setup.py.RunTests.run() to return proper exit code. | Matěj Cepl | 2014-02-13 | 4 | -26/+35 |
| | | | | | | | | | The previous situation didn't fail on Travis-CI. But this does, so I had to include also a number of fixes for revealed issues. Fixing #314, #315, #316, and #317 | ||||
* | Functional testing | Matěj Cepl | 2014-02-11 | 2 | -7/+109 |
| | | | | | From https://github.com/izidormatusov/gg_scraper/commit/d85483b2204591951f4667d4b0f15a9c0570ec58 | ||||
* | Fix links in README | Izidor Matušov | 2014-02-08 | 1 | -3/+3 |
| | |||||
* | Clean up README | Izidor Matušov | 2014-02-08 | 1 | -6/+20 |
| | | | | Allow simple copy'n'paste for usage | ||||
* | Eliminate do_redirect() method. [0.6] | Matěj Cepl | 2014-01-11 | 2 | -19/+14 |
| | | | | One less HTTP connection, which was actually not needed. | ||||
* | Make whole script compatible with python 2.6 | Matěj Cepl | 2014-01-11 | 6 | -30/+74 |
| | | | | | | | | How low we fell :( Also: * On Python 2.6 we have to send bytes to proc.communicate() not unicode str (Fixes #288) | ||||
* | Rewrite Group.get_topics to be iterative rather than recursive. | Matěj Cepl | 2014-01-11 | 1 | -15/+17 |
| | | | | Fixes #284 | ||||
* | Sort unmangled addresses in the configuration file by frequency. | Matěj Cepl | 2014-01-11 | 3 | -9/+15 |
| | | | | Fixes #287 | ||||
* | scrapper -> scraper [0.5] | Matěj Cepl | 2014-01-11 | 6 | -32/+32 |
| | | | | | | | | | | Woops! scrapper: a fighter or aggressive competitor, especially one always ready or eager for a fight, argument, or contest: the best lightweight scrapper in boxing; a rugged political scrapper. That's not what I meant. | ||||
* | Bump the release to 0.4 [0.4] | Matěj Cepl | 2014-01-04 | 2 | -1/+2 |
| | |||||
* | Update pip install in .travis.yml | Matěj Cepl | 2014-01-04 | 3 | -5/+7 |
| | | | | Supporting 2.6 would be too difficult ... patches welcome! | ||||
* | Make at least unittests running with Python 2.7 | Matěj Cepl | 2014-01-04 | 5 | -22/+49 |
| | | | | | Add support for Travis-CI Fixes #283 | ||||
* | Fix setup.py | Matěj Cepl | 2014-01-03 | 4 | -2/+9 |
| | |||||
* | Create setup.py and add GPLv3 license. | Matěj Cepl | 2014-01-03 | 2 | -0/+71 |
| | | | | Fixes #282 | ||||
* | Improve README.rst | Matěj Cepl | 2014-01-03 | 1 | -4/+18 |
| | |||||
* | Skip over unfilled items in the unmangling config. file. | Matěj Cepl | 2014-01-03 | 1 | -0/+2 |
| | | | | Fixes #281 | ||||
* | Survive HTTPErrors on malformed URLs. | Matěj Cepl | 2014-01-03 | 1 | -10/+16 |
| | |||||
* | Ignore results of the work | Matěj Cepl | 2014-01-03 | 1 | -0/+2 |
| | |||||
* | Some logging to allow some following of the process. | Matěj Cepl | 2014-01-02 | 1 | -8/+12 |
| | | | | Fixes #280 | ||||
* | Unmangle mbox according to the configuration file. | Matěj Cepl | 2014-01-02 | 4 | -11/+327 |
| | | | | Fixes #273, #276 | ||||
* | For each group generate also a list of all mangled addresses. | Matěj Cepl | 2014-01-02 | 5 | -17/+63 |
| | | | | | | | | | | Google Groups (rightly) protects addresses against spammers. There is (obviously) no way how to find true value of these addresses programmatically, so we just generate list of all affected ones, which could be later completed with true values (collected somehow) and fixed by some other script. Fixes #275 | ||||
* | Creating raw MBOX fixed (tests included) | Matěj Cepl | 2014-01-02 | 5 | -34/+467 |
| | | | | Fix #278 and #271 | ||||
* | General structure of operation and MBOX writing. | Matěj Cepl | 2013-12-30 | 17 | -2762/+46 |
| | | | | So far, only unit test for the latter. | ||||
* | Collect raw article | Matěj Cepl | 2013-12-30 | 3 | -1/+125 |
| | |||||
* | Collect articles for one topic. | Matěj Cepl | 2013-12-29 | 5 | -20/+83 |
| | |||||
* | Convert to python3. | Matěj Cepl | 2013-12-28 | 4 | -44/+50 |
| | |||||
* | Collecting topics. | Matěj Cepl | 2013-12-28 | 18 | -7/+2893 |
| | | | | Added also some testing pages. | ||||
* | Add a note about the inspiration for the project. | Matěj Cepl | 2013-12-27 | 1 | -0/+3 |
| | |||||
* | Remove BE repository and move to Bugzilla | Matěj Cepl | 2013-12-27 | 9 | -303/+1 |
| | | | | https://luther.ceplovi.cz/bugzilla/buglist.cgi?quicksearch=product%3Agg_scrapper | ||||
* | Start of the project. | Matěj Cepl | 2013-11-22 | 12 | -0/+375 |
Based on the unescaping algorithm from https://developers.google.com/webmasters/ajax-crawling/docs/getting-started (of which I was reminded by Sean Hogan (http://www.meekostuff.net/)) |