


(from discussion on OpenOffice.org questions list)
##################################################

:date: 2005-11-27T17:21:00
:category: computer
:tags: OpenOffice, vim, regexp

Regular expressions are a fairly complex beast, and probably quite
unnatural unless you have some sort of programming training. In that
sense it is questionable how useful regexps are in a generic word
processor for the general public, but if you happen to have regexp
experience from tools like perl, awk, grep, lex, and the like, then you
can express quite complex searches efficiently.

OK, first of all there is a famous `quote from Jamie Zawinski <cite of
Jamie Zawinski_>`_: ‘Some people, when confronted with a problem, think
“I know, I’ll use regular expressions.” Now they have two problems.’
There is something to that ``:-)``. Nevertheless, I use regexps quite
often, and when kept to a reasonable level of complexity they can be
quite useful. But they **are** difficult to use and the learning curve
**is** quite steep. Perl (probably the best and fastest implementation
of REs currently available) has four manpages for REs (perlrequick,
perlretut, perlre, and perlreref).

Sideshow for serious geeks: first read this_, `its continuation`_, and
the conclusion_. The explanation of this mystery is simple but
thought-provoking: `apparently Perl has support for REs so complex, that all
other RE implementations break down on them, but this complexity has its
cost in slightly lower speed`_. And BTW, I do not use Perl if I don’t
have to (I much prefer Python_), but apparently Perl is better here than
anybody else.

Back to our main presentation tonight: there seem to be two ways to
deal with REs in OpenOffice.org (and elsewhere). Either you ignore
them, or you bite the bullet and learn them. Actually, the first way is
not as ridiculous as it seems. As vi people have repeated many times
(`vi-family editors`_ have nothing but REs for searching): “a plain
string is a valid RE and will be evaluated as such” (let’s ignore the
case sensitivity of REs for a moment); i.e., when you are searching for
“moron”, you can just put “moron” into your RE field and everything
will work as expected. In that position you are no worse off than if
there were no REs at all.

However, learning REs is not as difficult as it seems from looking at
some really advanced examples (yeah, sure you want an example; this RE
in Python syntax, ``r"(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$"``, parses US
phone numbers and returns their parts in separate groups; `courtesy of
Mark Pilgrim`_). For starters you can begin with something as simple as
“\ ``colou?r``\ ”, which matches both “color” and “colour”, and even
that will be incredibly helpful. Just throw “regular expression
tutorial” into your friendly Google and you will find a lot of stuff
that can help. You only have to be aware of a couple of things. First of
all, there are at least two incompatible lines of REs living happily “in
the wild” (for more info on that, read the `article on Wikipedia <aricle
on Wikipedia_>`_). The best way to deal with this is to learn just the
type of RE used in the application you’re going to use (for OOo, I just
randomly stumbled upon `a tutorial on REs in OOo <some tutoliar on RE in
OOo_>`_). BTW, you could just go to the “List of Regular Expressions”
page in Help, but it is really just reference material, which is not
enough for somebody who doesn’t know what’s going on.
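
To make the two patterns above concrete, here is a small sketch in
Python (again an assumption: it uses Python's ``re`` module rather than
OOo's Find & Replace dialog, and the sample strings and the variable
names ``colour`` and ``phone`` are invented for illustration):

.. code-block:: python

    import re

    # "u?" means "an optional u", so one pattern covers both spellings.
    colour = re.compile(r"colou?r")
    spellings = colour.findall("color, colour, colr")
    print(spellings)

    # The phone-number RE quoted above: three groups of digits separated
    # by any run of non-digits, then an optional extension at the end.
    phone = re.compile(r"(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$")
    with_ext = phone.search("800-555-1212 ext. 1234").groups()
    without_ext = phone.search("(800) 555-1212").groups()
    print(with_ext)
    print(without_ext)

The parentheses in the pattern are what split the number into separate
groups; when there is no extension, the last group is just an empty
string.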

One last thing: thank you, OOo developers, for including full-size REs
in OOo and not something crippled like `“wildcards” in M$ Word`_ (which
are just a small subset of REs packaged for non-geeks). This and other
things (XSLT filters and scripting, albeit the latter is severely
underdocumented) have made OOo much more than just another free office
suite (there are others): it is a serious platform for doing things in
the proper geek-like way. Thanks!

.. _`cite of Jamie Zawinski`:
    https://www.jwz.org/hacks/marginal.html
.. _this:
    https://www.tbray.org/ongoing/When/200x/2004/08/22/PJre
.. _`its continuation`:
    https://www.tbray.org/ongoing/When/200x/2004/08/26/PJre2
.. _conclusion:
    https://www.tbray.org/ongoing/When/200x/2005/11/20/Regex-Promises
.. _`apparently Perl has support for REs so complex, that all other RE implementations break down on them, but this complexity has its cost in slightly lower speed`:
    http://perlmonks.org/index.pl?node_id=502408
.. _Python:
    http://www.python.org
.. _`vi-family editors`:
    https://en.wikipedia.org/wiki/Vi
.. _`courtesy of Mark Pilgrim`:
    https://diveinto.org/python3/regular-expressions.html#phonenumbers
.. _`aricle on Wikipedia`:
    https://en.wikipedia.org/wiki/Regular_expression
.. _`some tutoliar on RE in OOo`:
    http://homepage.ntlworld.com/garryknight/linux/ooregexp.html
.. _`“wildcards” in M$ Word`:
    http://office.microsoft.com/en-ca/assistance/HP051894331033.aspx