| | | |
|---|---|---|
| author | Jake Hunsaker <jhunsake@redhat.com> | 2022-01-13 13:52:34 -0500 |
| committer | Jake Hunsaker <jhunsake@redhat.com> | 2022-01-17 12:24:06 -0500 |
| commit | ed618678fd3d07e68e1a430eb7d225a9701332e0 | |
| tree | ca347bf38aa8a5f84b4cc89fbc0b026b2bec5b14 /tests/unittests | |
| parent | f270220fddb70ef71a8da0376333b2454d7c4983 | |
| download | sos-ed618678fd3d07e68e1a430eb7d225a9701332e0.tar.gz | |
[clean,parsers] Build regex lists for static items only once
For parsers such as the username and keyword parsers, we don't discover
new items through parsing archives - these parsers use static lists
determined before we begin the actual obfuscation process.
As such, we can build a list of regexes for these static items once, and
then reference those regexes during execution, rather than rebuilding
the regex for each of these items for every obfuscation.
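As a rough illustration of this approach, the following is a minimal sketch built around a hypothetical `StaticItemParser` class. The method name `generate_item_regexes()` matches the call added in this commit's test diff, but the body shown here is an assumption, not the sos implementation. The `obfuscateditemN` replacement tokens and the case-insensitive, word-bounded matching are also illustrative choices.

```python
import re
from typing import Dict, List, Tuple


class StaticItemParser:
    """Hypothetical parser for a fixed list of items (e.g. usernames or
    keywords) known before obfuscation starts. Not the sos implementation."""

    def __init__(self, items: List[str]) -> None:
        self.items = items
        # Maps each static item to its (illustrative) obfuscated replacement.
        self.mapping: Dict[str, str] = {
            item: f"obfuscateditem{i}" for i, item in enumerate(items)
        }
        # Filled once by generate_item_regexes(), then reused for every line.
        self.regexes: Dict[str, re.Pattern] = {}

    def generate_item_regexes(self) -> None:
        """Compile one case-insensitive, word-bounded regex per static item."""
        for item in self.items:
            self.regexes[item] = re.compile(
                r"\b" + re.escape(item) + r"\b", re.IGNORECASE
            )

    def parse_line(self, line: str) -> Tuple[str, int]:
        """Replace every known item in `line` using the precompiled regexes."""
        count = 0
        # Longest items first, so a longer item (e.g. 'foo.bar') is not
        # partially rewritten by a shorter one (e.g. 'foo').
        for item in sorted(self.regexes, key=len, reverse=True):
            line, n = self.regexes[item].subn(self.mapping[item], line)
            count += n
        return line, count


parser = StaticItemParser(["foobar", "jdoe"])
parser.generate_item_regexes()  # built once, before any file is processed
print(parser.parse_line("user jdoe logged in to FooBar"))
# ('user obfuscateditem1 logged in to obfuscateditem0', 2)
```

Because the item list never changes during a run, the compiled patterns can live on the parser for its whole lifetime; only `parse_line()` runs per line.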
For use cases where hundreds of items (e.g. hundreds of usernames) are
being obfuscated, this results in a significant performance increase.
Individual per-file gains are minor, fractions of a second, but these
gains add up over the hundreds to thousands of files a typical archive
can be expected to contain.
Signed-off-by: Jake Hunsaker <jhunsake@redhat.com>
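To get a feel for where the per-line savings come from, here is a rough micro-benchmark sketch comparing the two strategies on a hypothetical list of 300 usernames. The names `obfuscate_rebuilding` and `obfuscate_precompiled` and the sample line are invented for illustration. Timings depend on the machine and Python version, so no figures are claimed here. Note that Python's `re` module caches recently compiled patterns, so the rebuilding variant does not necessarily recompile every time; much of its overhead comes from rebuilding each pattern string and looking it up in that cache, and the gap may widen once the item count outgrows the cache.

```python
import re
import timeit

# Hypothetical workload: a few hundred static usernames and one sample line.
USERNAMES = [f"user{i:03d}" for i in range(300)]
LINE = "session opened for user user042 by user150 (uid=0)"


def obfuscate_rebuilding(line: str) -> str:
    """Old approach: build each item's pattern string on every call."""
    for name in USERNAMES:
        line = re.sub(r"\b" + re.escape(name) + r"\b", "obfuscateduser", line)
    return line


# New approach: compile every pattern once, up front.
COMPILED = [re.compile(r"\b" + re.escape(name) + r"\b") for name in USERNAMES]


def obfuscate_precompiled(line: str) -> str:
    """Reuse the precompiled patterns for every line."""
    for pattern in COMPILED:
        line = pattern.sub("obfuscateduser", line)
    return line


if __name__ == "__main__":
    # Both approaches must produce identical output before timing them.
    assert obfuscate_rebuilding(LINE) == obfuscate_precompiled(LINE)
    for fn in (obfuscate_rebuilding, obfuscate_precompiled):
        seconds = timeit.timeit(lambda: fn(LINE), number=200)
        print(f"{fn.__name__}: {seconds:.3f}s for 200 lines")
```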
Diffstat (limited to 'tests/unittests')
-rw-r--r-- | tests/unittests/cleaner_tests.py | 1 |
1 files changed, 1 insertions, 0 deletions
```diff
diff --git a/tests/unittests/cleaner_tests.py b/tests/unittests/cleaner_tests.py
index cb20772f..b59eade9 100644
--- a/tests/unittests/cleaner_tests.py
+++ b/tests/unittests/cleaner_tests.py
@@ -105,6 +105,7 @@ class CleanerParserTests(unittest.TestCase):
         self.host_parser = SoSHostnameParser(config={}, opt_domains='foobar.com')
         self.kw_parser = SoSKeywordParser(config={}, keywords=['foobar'])
         self.kw_parser_none = SoSKeywordParser(config={})
+        self.kw_parser.generate_item_regexes()
 
     def test_ip_parser_valid_ipv4_line(self):
         line = 'foobar foo 10.0.0.1/24 barfoo bar'
```
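The one-line test change appears to follow from the new design: once the regexes are built ahead of time rather than on demand, a parser constructed directly in a test presumably needs `generate_item_regexes()` called before it can match anything. The sketch below mirrors that `setUp()` pattern using a hypothetical `KeywordParserStub` in place of sos's `SoSKeywordParser`; the stub's internals and the `obfuscatedwordN` tokens are assumptions for illustration only.

```python
import re
import unittest


class KeywordParserStub:
    """Hypothetical stand-in for a keyword parser whose regexes are only
    built when generate_item_regexes() is called (not the sos class)."""

    def __init__(self, keywords=None):
        self.keywords = keywords or []
        self.regexes = {}

    def generate_item_regexes(self):
        # Compile each static keyword once; dict keeps insertion order.
        for kw in self.keywords:
            self.regexes[kw] = re.compile(re.escape(kw), re.IGNORECASE)

    def parse_line(self, line):
        count = 0
        # Only precompiled regexes are consulted; none exist until generated.
        for idx, reg in enumerate(self.regexes.values()):
            line, n = reg.subn(f"obfuscatedword{idx}", line)
            count += n
        return line, count


class KeywordParserTests(unittest.TestCase):
    def setUp(self):
        self.kw_parser = KeywordParserStub(keywords=["foobar"])
        # Mirrors the added line in the diff: build the regexes once, up front.
        self.kw_parser.generate_item_regexes()

    def test_keyword_is_obfuscated(self):
        line, count = self.kw_parser.parse_line("this is a foobar test")
        self.assertEqual(line, "this is a obfuscatedword0 test")
        self.assertEqual(count, 1)

    def test_without_generation_nothing_matches(self):
        parser = KeywordParserStub(keywords=["foobar"])
        self.assertEqual(parser.parse_line("foobar"), ("foobar", 0))
```

Saved to a file, these tests can be run with `python -m unittest`.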