diff options
author | W. Trevor King <wking@drexel.edu> | 2009-07-12 08:38:40 -0400 |
---|---|---|
committer | W. Trevor King <wking@drexel.edu> | 2009-07-12 08:38:40 -0400 |
commit | 76d552e5401df990a601f245f30f45d7c13cdd1e (patch) | |
tree | 5c510a12e8cb3df1dd5d30cd5aebb6b7938e2ceb /.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments | |
parent | a65b273fa14df2a085342bac14abb8a2167ff98a (diff) | |
download | bugseverywhere-76d552e5401df990a601f245f30f45d7c13cdd1e.tar.gz |
Added be-mbox-to-xml.
Reworked to allow "be comment" to handle unicode strings (see bug
e4ed63f6-9000-4d0b-98c3-487269140141). The solution was to escape all
the unicode to produce and ASCII string before calling
ElementTree.XML, and then converting back to unicode afterwards.
Added a unicode-containing comment to the end of bug
f7ccd916-b5c7-4890-a2e3-8c8ace17ae3a so that there's a handy unicode
comment for testing.
XML headers (e.g. '<?xml version="1.0" encoding="UTF-8" ?>') are
now added to all xml output from be.
Switched non-text/* encoding library to base64 instead of
email.encoders, which makes that code in libbe/comment.py simpler.
Changed libbe/mapfile.py error encoding from string_escape to
unicode_escape so it can handle unicode.
Everything's still untested, and be-xml-to-mbox doesn't handle unicode
yet, but I felt this commit was getting a bit unwieldy ;).
Diffstat (limited to '.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments')
10 files changed, 72 insertions, 0 deletions
diff --git a/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/07fc448f-c42e-4846-929a-8924de485766/body b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/07fc448f-c42e-4846-929a-8924de485766/body new file mode 100644 index 0000000..0598d70 --- /dev/null +++ b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/07fc448f-c42e-4846-929a-8924de485766/body @@ -0,0 +1,8 @@ +<type 'unicode'> <body>�</body> +Traceback (most recent call last): + File "<string>", line 1, in <module> + File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 963, in XML + parser.feed(text) + File "/usr/lib/python2.5/xml/etree/ElementTree.py", line 1245, in feed + self._parser.Parse(data, 0) +UnicodeEncodeError: 'ascii' codec can't encode character u'\u1234' in position 6: ordinal not in range(128) diff --git a/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/07fc448f-c42e-4846-929a-8924de485766/values b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/07fc448f-c42e-4846-929a-8924de485766/values new file mode 100644 index 0000000..cd8d8b9 --- /dev/null +++ b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/07fc448f-c42e-4846-929a-8924de485766/values @@ -0,0 +1,11 @@ +Content-type: text/plain + + +Date: Sun, 12 Jul 2009 11:34:22 +0000 + + +From: W. Trevor King <wking@drexel.edu> + + +In-reply-to: faa686bf-c0eb-48bf-8a0b-d9a2e02bd132 + diff --git a/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/520a9829-8d90-43ce-be64-868b8321e5b0/body b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/520a9829-8d90-43ce-be64-868b8321e5b0/body new file mode 100644 index 0000000..397d4b6 --- /dev/null +++ b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/520a9829-8d90-43ce-be64-868b8321e5b0/body @@ -0,0 +1 @@ +It looks like etree wants a byte string, not unicode input diff --git a/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/520a9829-8d90-43ce-be64-868b8321e5b0/values b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/520a9829-8d90-43ce-be64-868b8321e5b0/values new file mode 100644 index 0000000..8bdaf52 --- /dev/null +++ b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/520a9829-8d90-43ce-be64-868b8321e5b0/values @@ -0,0 +1,11 @@ +Content-type: text/plain + + +Date: Sun, 12 Jul 2009 11:42:16 +0000 + + +From: W. Trevor King <wking@drexel.edu> + + +In-reply-to: faa686bf-c0eb-48bf-8a0b-d9a2e02bd132 + diff --git a/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/8b54e56e-c693-4594-998f-5bd6c1f385d7/body b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/8b54e56e-c693-4594-998f-5bd6c1f385d7/body new file mode 100644 index 0000000..ce2bb8d --- /dev/null +++ b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/8b54e56e-c693-4594-998f-5bd6c1f385d7/body @@ -0,0 +1,5 @@ +For example, this works: + +python -c 'from xml.etree import ElementTree; a=u"<body>\u1234</body>"; print type(a), a; b=ElementTree.XML(a.encode("unicode_escape")); print type(b.text), unicode(b.text).decode("unicode_escape");' + +Ugly though :p. Ah well. diff --git a/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/8b54e56e-c693-4594-998f-5bd6c1f385d7/values b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/8b54e56e-c693-4594-998f-5bd6c1f385d7/values new file mode 100644 index 0000000..1784e0e --- /dev/null +++ b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/8b54e56e-c693-4594-998f-5bd6c1f385d7/values @@ -0,0 +1,11 @@ +Content-type: text/plain + + +Date: Sun, 12 Jul 2009 11:46:57 +0000 + + +From: W. Trevor King <wking@drexel.edu> + + +In-reply-to: 520a9829-8d90-43ce-be64-868b8321e5b0 + diff --git a/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/bb124fd9-08f5-4f82-a035-6355e8403075/body b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/bb124fd9-08f5-4f82-a035-6355e8403075/body new file mode 100644 index 0000000..89a8f8d --- /dev/null +++ b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/bb124fd9-08f5-4f82-a035-6355e8403075/body @@ -0,0 +1 @@ +That's with Python 2.5.2 and ElementTree "2326 2005-03-17 07:45:21Z fredrik" diff --git a/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/bb124fd9-08f5-4f82-a035-6355e8403075/values b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/bb124fd9-08f5-4f82-a035-6355e8403075/values new file mode 100644 index 0000000..cca07c3 --- /dev/null +++ b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/bb124fd9-08f5-4f82-a035-6355e8403075/values @@ -0,0 +1,11 @@ +Content-type: text/plain + + +Date: Sun, 12 Jul 2009 11:37:55 +0000 + + +From: W. Trevor King <wking@drexel.edu> + + +In-reply-to: 07fc448f-c42e-4846-929a-8924de485766 + diff --git a/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/faa686bf-c0eb-48bf-8a0b-d9a2e02bd132/body b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/faa686bf-c0eb-48bf-8a0b-d9a2e02bd132/body new file mode 100644 index 0000000..57e050d --- /dev/null +++ b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/faa686bf-c0eb-48bf-8a0b-d9a2e02bd132/body @@ -0,0 +1,5 @@ +Isolated problem to: + +python -c 'from xml.etree import ElementTree; a=u"<body>\u1234</body>"; print type(a), a; b=ElementTree.XML(a);' + +Output attached below diff --git a/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/faa686bf-c0eb-48bf-8a0b-d9a2e02bd132/values b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/faa686bf-c0eb-48bf-8a0b-d9a2e02bd132/values new file mode 100644 index 0000000..e430ea0 --- /dev/null +++ b/.be/bugs/e4ed63f6-9000-4d0b-98c3-487269140141/comments/faa686bf-c0eb-48bf-8a0b-d9a2e02bd132/values @@ -0,0 +1,8 @@ +Content-type: text/plain + + +Date: Sun, 12 Jul 2009 11:31:13 +0000 + + +From: W. Trevor King <wking@drexel.edu> + |