aboutsummaryrefslogtreecommitdiffstats
path: root/worker/lib/parse.go
Commit message (Collapse)AuthorAgeFilesLines
* headers: enable partial header fetchingTim Culverhouse2023-05-161-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | Enable partial header fetching by creating config values for headers to specifically include, or specifically exclude. The References field will always be fetched, regardless of the include list. Envelope data is always fetched, but is not shown with :toggle-headers, since it isn't in the RFC822 struct unless explicitly included in the list. Partial headers can break the cache on changes. Update the cache tag key to include the state of the partially-fetched headers. Partial header fetching can have a significant performance increase for IMAP, and for all backends a resource improvement. Some data to support this is below. Gathered by opening aerc, selecting a mailbox with approximately 800 messages and scrolling to the end. Received measured with nethogs, RAM from btop Received | RAM ------------------------------------- All Headers | 9,656 kb | 103 MB Minimum Headers | 896 kb | 36 MB Signed-off-by: Tim Culverhouse <tim@timculverhouse.com> Acked-by: Moritz Poldrack <moritz@poldrack.dev> Acked-by: Robin Jarry <robin@jarry.cc>
* messageinfo: report message sizesKoni Marti2023-04-261-0/+10
| | | | | | | Report sizes of the message across all backends. Signed-off-by: Koni Marti <koni.marti@gmail.com> Acked-by: Robin Jarry <robin@jarry.cc>
* parse msg-id lists more liberallyNguyễn Gia Phong2023-03-261-13/+4
| | | | | | | | | | | | | | | Some user agents deliberately generate non-standard message identifier lists in In-Reply-To and References headers. Instead of failing silently, aerc now falls back to a desperate parser scavaging whatever looking like <id-left@id-right>. As the more liberal parser being substituted with, References header are now stored for IMAP not only when there's a parsing error. Fixes: 31d2f5be3cec ("message-info: add explicit References field") Signed-off-by: Nguyễn Gia Phong <mcsinyx@disroot.org> Acked-by: Robin Jarry <robin@jarry.cc>
* lint: always run golangci-lint@latestRobin Jarry2023-02-261-1/+1
| | | | | | | | | | | | Do not store the dependency in tools.go as there may be conflicts with some indirect dependencies of aerc. Run gofumpt and golangci-lint from their latest tagged release. This should fix issues with go 1.20. Bonus, it drains a bit of fat from go.sum. Signed-off-by: Robin Jarry <robin@jarry.cc> Acked-by: Moritz Poldrack <moritz@poldrack.dev>
* model: change flags array to bitmaskRobin Jarry2023-01-041-1/+1
| | | | | | | Using a list of integers is not optimal. Use a bit mask instead. Signed-off-by: Robin Jarry <robin@jarry.cc> Acked-by: Tim Culverhouse <tim@timculverhouse.com>
* logging: rename package to logRobin Jarry2022-12-021-2/+2
| | | | | | | | | | Use the same name than the builtin "log" package. That way, we do not risk logging in the wrong place. Suggested-by: Tim Culverhouse <tim@timculverhouse.com> Signed-off-by: Robin Jarry <robin@jarry.cc> Tested-by: Bence Ferdinandy <bence@ferdinandy.com> Acked-by: Tim Culverhouse <tim@timculverhouse.com>
* maildir: fallback when parsing for in-reply-to failsKoni Marti2022-11-131-4/+7
| | | | | | | | | | Fallback to just using the raw In-Reply-To header content when it cannot be properly parsed as a message-id. Fixes: ca903d422826 ("envelope: add InReplyTo field") Reported-by: Bence Ferdinandy <bence@ferdinandy.com> Signed-off-by: Koni Marti <koni.marti@gmail.com> Tested-by: Bence Ferdinandy <bence@ferdinandy.com>
* message-info: add explicit References fieldTim Culverhouse2022-11-091-0/+5
| | | | | | | | | | Add an explicit References field to message info. This is useful for storing information needed for threading without storing all of the header values, keeping system RAM usage lower. Signed-off-by: Tim Culverhouse <tim@timculverhouse.com> Tested-by: Inwit <inwit@sindominio.net> Acked-by: Robin Jarry <robin@jarry.cc>
* envelope: add InReplyTo fieldTim Culverhouse2022-11-091-0/+9
| | | | | | | | | | A standard IMAP envelope response includes the In-Reply-To header field. Add this field to the aerc model of Envelope. Update envelope parser to set this value. Update imap worker to set this value. Signed-off-by: Tim Culverhouse <tim@timculverhouse.com> Tested-by: Inwit <inwit@sindominio.net> Acked-by: Robin Jarry <robin@jarry.cc>
* maildir: keep less data in memory for sortingTim Culverhouse2022-11-061-0/+48
| | | | | | | | | | | | | | | | Sorting opens and reads portions of every file within a directory in order to gather the data needed. Specifically, RFC822Headers and BodyStructure are not needed. The RFC822Headers field stores a lot of information, and the BodyStructure field requires parsing the body of the email. Don't set these two values when parsing. Note: in my testing, this dropped sorting a 52k archive from 2.2gb of ram usage, to < 500mb Signed-off-by: Tim Culverhouse <tim@timculverhouse.com> Acked-by: Robin Jarry <robin@jarry.cc>
* lib: parse address header fields to utf-8Koni Marti2022-09-291-6/+9
| | | | | | | | | When parsing address header fields, the field charset is automatically decoded to UTF-8. If the charset is unknown, use the raw field value. Fixes: https://todo.sr.ht/~rjarry/aerc/91 Signed-off-by: Koni Marti <koni.marti@gmail.com> Tested-by: Tim Culverhouse <tim@timculverhouse.com>
* charset: handle unknown charsets more user-friendlyKoni Marti2022-09-251-1/+16
| | | | | | | | | Do not throw an error when the charset is unknown; the message entity can still be read, but log the error instead. Reported-by: falsifian Signed-off-by: Koni Marti <koni.marti@gmail.com> Acked-by: Robin Jarry <robin@jarry.cc>
* parse: remove trailing whitespace from rfc1123z regexThomas Faughnan2022-08-221-1/+1
| | | | | | | | | | | | | | | | | | | | When there is no Date header in a message, aerc falls back to the Received header and tries to extract an rfc1123z date from it (introduced in commit d1600e46). The current regex for extracting the date incorrectly allows for trailing whitespace, causing time.Parse() to fail inside of parseReceivedHeader(). As a result, the message's date is shown as "???????????????????" in the message list and as "0000-12-31 07:03 PM" in the message view (the latter is likely related to the zero value of time.Time). Steps to reproduce: 1) Send yourself a message with no Date header, e.g. with msmtp: printf 'Subject: foo bar\n\nbody text\n' | msmtp --set-date-header=off me@example.com 2) Note the message's displayed date in aerc's message list and message view. Signed-off-by: Thomas Faughnan <tom@tjf.sh> Acked-by: Robin Jarry <robin@jarry.cc>
* tests: fix errors after lint seriesRobin Jarry2022-08-041-1/+1
| | | | | | | | | | | | | | | | | | | Fix the following test failures: FAIL: TestMessageInfoHandledError (0.00s) parse_test.go:53: could not parse envelope: date parsing failed: unrecognized date format: FAIL: TestReader (0.07s) gpg_test.go:27: using GNUPGHOME = /tmp/TestReader2384941142/001 reader_test.go:108: Test case: Invalid Signature reader_test.go:112: gpg.Read() = gpgmail: failed to read PGP message: gpg: failed to run verification: exit status 1 Fixes: 5ca6022d007b ("lint: ensure errors are at least logged (errcheck)") Fixes: 70bfcfef4257 ("lint: work nicely with wrapped errors (errorlint)") Signed-off-by: Robin Jarry <robin@jarry.cc> Signed-off-by: Moritz Poldrack <moritz@poldrack.dev>
* lint: work nicely with wrapped errors (errorlint)Moritz Poldrack2022-08-041-13/+13
| | | | | | | | Error wrapping as introduced in Go 1.13 adds some additional logic to use for comparing errors and adding information to it. Signed-off-by: Moritz Poldrack <moritz@poldrack.dev> Acked-by: Robin Jarry <robin@jarry.cc>
* lint: apply new formatting rulesMoritz Poldrack2022-08-011-2/+2
| | | | | | | Run `make fmt`. Signed-off-by: Moritz Poldrack <git@moritz.sh> Acked-by: Robin Jarry <robin@jarry.cc>
* parse: fix content-type parsing errorKoni Marti2022-06-071-1/+26
| | | | | | | | | | | If an error occurs when parsing the content-type, check if the content-type is quoted; if so, remove quotes. If this is not the case, then return a text/plain content-type as a sane fallback option, so that the message can be at least viewed in plaintext. Fixes: https://todo.sr.ht/~rjarry/aerc/44 Signed-off-by: Koni Marti <koni.marti@gmail.com> Acked-by: Robin Jarry <robin@jarry.cc>
* pgp: ensure CRLF line endings in pgpmail readerKoni Marti2022-04-251-0/+11
| | | | | | | | | | | | | | | | | | Ensure CRLF line endings in the pgpmail reader. Fix the pgp signature verification for maildir and notmuch. These backends do not return the full message body with CRLF line endings. But the accepted OpenPGP convention is for signed data to end with a <CR><LF> sequence (see RFC3156). If this is not the case the signed and transmitted data are considered not the same and thus signature verification fails. Link: https://datatracker.ietf.org/doc/html/rfc3156 Reported-by: Tim Culverhouse <tim@timculverhouse.com> Signed-off-by: Koni Marti <koni.marti@gmail.com> Tested-by: Tim Culverhouse <tim@timculverhouse.com>
* go vet: composite literal uses unkeyed fieldsMoritz Poldrack2022-03-181-2/+2
| | | | | | | | This commit fixes all occurrences of the abovementioned lint-error in the codebase. Signed-off-by: Moritz Poldrack <git@moritz.sh> Acked-by: Robin Jarry <robin@jarry.cc>
* maildir,notmuch: avoid leaking open filesNguyễn Gia Phong2022-01-191-1/+2
| | | | | | | | | | Previously, Message.NewReader returned the wrapped buffered reader without a reference to the opened file, so the files descriptors were left unclosed after reading. Now, the file reader is returned directly and closed on the call site. Buffering is not needed here because it is an implementation detail of go-message. Fixes: https://todo.sr.ht/~rjarry/aerc/9
* go.mod: change base git urlRobin Jarry2021-11-051-1/+1
| | | | | | | I'm not sure what are the implications but it seems required. Link: https://github.com/golang/go/issues/20883 Signed-off-by: Robin Jarry <robin@jarry.cc>
* lib/parse: simplify parseAddressListReto Brunner2021-02-221-13/+5
|
* lib/parse: use go-message msgid parsing if it succeedsReto Brunner2020-11-141-2/+6
|
* remove models.Address in favor of go-message AddressReto Brunner2020-11-141-4/+4
| | | | | | | We made a new type out of go-message/mail.Address without any real reason. This suddenly made it necessary to convert from one to the other without actually having any benefit whatsoever. This commit gets rid of the additional type
* handle message unknown charset errorJeff Martin2020-08-311-2/+6
| | | | | | | | | | | | | | This change handles message parse errors by printing the error when the user tries to view the message. Specifically only handling unknown charset errors in this patch, but there are many types of invalid messages that can be handled in this way. aerc currently leaves certain messages in the msglist in the pending (spinner) state, and I'm unable to view or modify the message. aerc also only prints parse errors with message when they are initially loaded. This UX is a little better, because you can still see the header info about the message, and if you try to view it, you will see the specific error.
* base models.Address on the mail.Address typeReto Brunner2020-08-201-10/+1
| | | | | | | | | | | | This allows us to hook into the std libs implementation of parsing related stuff. For this, we need to get rid of the distinction between a mailbox and a host to just a single "address" field. However this is already the common case. All but one users immediately concatenated the mbox/domain to a single address. So this in effects makes it simpler for most cases and we simply do the transformation in the special case.
* improve date parsing for notmuch/maildirReto Brunner2020-08-101-24/+49
| | | | | | | | | | | | | | | | | | | | If a message date would fail to parse, the worker would never receive the MessageInfo it asked for, and so it wouldn't display the message. The problem is the spec for date formats is too lax, so trying to ensure we can parse every weird date format out there is not a strategy we want to pursue. On the other hand, preventing the user from reading and working with a message due to the error format is also not a solution. The maildir and notmuch workers will now fallback to the internal date, which is based on the received header if we can't parse the format of the Date header. The UI will also fallback to the received header whenever the date header can't be parsed. This patch is based on the work done by Lyudmil Angelov <lyudmilangelov@gmail.com> But tries to handle a parsing error a bit more gracefully instead of just returning the zero date.
* Remove hard coded bodystruct path everywhereReto Brunner2020-07-271-6/+4
| | | | | | | Aerc usually used the path []int{1} if it didn't know what the proper path is. However this only works for multipart messages and breaks if it isn't one. This patch removes all the hard coding and extracts the necessary helpers to lib.
* Make it easier to debug date parsing errorsLyudmil Angelov2020-07-111-1/+1
| | | | | | | | | When message dates failed to parse, the error displayed would try to include the time object it failed to obtain, which would display as something like 0001-01-01 00:00:00 UTC, which isn't of much help. Instead, display the text we were trying to parse into a date, which makes the problem easier to debug.
* Guess date from received if not presentelumbella2020-05-061-1/+41
|
* Initial support for PGP decryption & signaturesDrew DeVault2020-03-031-3/+3
|
* worker/lib/parse: be more tolerant with parsing email addressesTimmy Douglas2020-02-061-1/+1
|
* maildir/notmuch: don't re-encode readersReto Brunner2020-01-051-37/+4
|
* Add labels to index format (%g)Reto Brunner2019-12-271-0/+6
| | | | Exposes the notmuch tags accordingly, stubs it for the maildir worker.
* Parse Reply-To header while parsing envelopeSrivathsan Murali2019-11-171-0/+5
|
* all: rewrite references to strings.Index to strings.ContainsWagner Riffel2019-09-041-1/+1
| | | | Signed-off-by: Wagner Riffel <wgrriffel@gmail.com>
* Extract message parsing to common worker moduleReto Brunner2019-08-081-0/+240
Things like FetchEntityPartReader etc can be reused by most workers working with raw email files from disk (or any reader for that matter). This patch extract that common functionality in a separate package.