Finally, UTF-8 locale (and about Compose)

:date: 2005-09-23T01:16:00
:category: computer
:tags: utf8, Linux

I have finally bitten the bullet and switched my locale to
`cs\_CZ.UTF-8`_. When I was still writing this blog in gvim (`the end of my
relation with vim`_ and here_), I began to write it in `UTF-8`_ and it
was such a relief. Suddenly, I didn’t have to use ugly kludges like \`\`
or --. Of course, the problem is that there are suddenly so many
additional characters available that no keyboard layout can handle all
of them (I think), so some other solution has to be found.
Vim has digraphs_, which are really quite useful, but like everything
else in vim, they have no connection to the outside world. Switching to
Kate/KWrite was a very pleasant experience, but obviously they have no
native digraphs. My first reaction was to use `HTML entities`_ and
translate them to pure UTF-8 with `my special Python
script`_. However, I felt very strongly that this was not the right way.
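
To give an idea of what such a translation step involves, here is a minimal sketch. It is not the script linked above; it assumes a modern Python 3, where the standard library's ``html.unescape`` already knows the named and numeric character references. The function name ``entities_to_text`` and the sample string are made up for illustration.

```python
import html


def entities_to_text(source: str) -> str:
    """Replace HTML character references (named and numeric) with real characters."""
    # html.unescape handles both &name; and &#number; forms.
    return html.unescape(source)


# Hypothetical sample text typed with entities instead of real characters.
sample = "&ldquo;Use Compose key&rdquo; &ndash; P&#345;&iacute;klad"
converted = entities_to_text(sample)

# Encoding explicitly shows the bytes that would end up in a UTF-8 file.
utf8_bytes = converted.encode("utf-8")
print(converted)
```

The drawback is the one described above: the text has to be typed with entities first and converted afterwards, instead of entering the characters directly.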

`I asked on cz.comp.linux`_ about people’s experience with inserting
these non-keyboardish characters and the answer was “Use Compose key”. I
began to search on Google for how to make it work and finally
found that the best source of information about the
Compose key combinations (aside from `the article on
Wikipedia`_) is `directly in my computer`_. The only problem was that
with an ISO 8859-2 based locale only a very small part of the
combinations actually worked. This was the last straw that broke my
resistance to switching the whole computer to UTF-8. The problem is (as always)
`Midnight Commander`_, whose Debian version doesn’t work with UTF-8 at
all (panel frames are especially affected). So, again, Googling
and Googling until I found `this thread on some discussion board`_,
which contains `a link to patched version of MC`_ (which also requires a
`non-standard version of slang`_) that somehow works in my console.
However, MC is not critical for me anymore, now that Krusader_ is
finally `stable enough and featurefull enough to compete with MC`_.
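
The Compose table on the computer is a plain text file, so it can be searched with a few lines of code. The following is only a sketch under some assumptions: the embedded ``SAMPLE`` is a hand-written excerpt in the Compose file format, not a copy of any particular file, and the names ``SAMPLE``, ``LINE`` and ``compose_sequences`` are invented here. On many systems the real file lives somewhere like ``/usr/share/X11/locale/<locale>/Compose``, but that path is an assumption and differs between distributions and X11 versions.

```python
import re

# Hand-written excerpt in the X11 Compose file format (illustrative only).
SAMPLE = '''\
# comment line
<Multi_key> <o> <c> : "©" copyright # COPYRIGHT SIGN
<Multi_key> <less> <less> : "«" guillemotleft # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
<Multi_key> <period> <period> : "…" ellipsis # HORIZONTAL ELLIPSIS
<dead_acute> <a> : "á" aacute # LATIN SMALL LETTER A WITH ACUTE
'''

# One or more <keysym> tokens, a colon, then the quoted result string.
LINE = re.compile(r'^((?:<\w+>\s*)+):\s*"((?:[^"\\]|\\.)+)"')


def compose_sequences(text):
    """Return {result: [key, ...]} for sequences that start with the Compose key."""
    table = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # comments, blank lines, include directives
        keys = re.findall(r'<(\w+)>', match.group(1))
        if keys[0] != "Multi_key":
            continue  # skip dead-key sequences, keep only Compose ones
        # If several sequences give the same result, the last one wins.
        table[match.group(2)] = keys[1:]
    return table


for result, keys in compose_sequences(SAMPLE).items():
    print(result, "= Compose +", " ".join(keys))
```

To search a real table, the contents of the Compose file would be read with ``encoding="utf-8"`` and passed to the function instead of ``SAMPLE``.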

One more problem: when I switched to UTF-8, many filenames with
accented characters were suddenly broken. I thought that Linux
filesystems already stored all metadata in UTF-8. Oh well, they probably
don’t. So I had to run the output of ``locate`` through ``cstocs`` and then
use ``diff`` to find out what had changed.
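
The same check can be sketched in Python instead of ``cstocs`` and ``diff``. This is not what I ran, only an illustration: it assumes the old names are ISO 8859-2 and treats any name that is not valid UTF-8 as one of them. The function names ``repair_name`` and ``plan_renames`` are hypothetical, and nothing is actually renamed.

```python
def repair_name(raw: bytes, legacy: str = "iso-8859-2") -> str:
    """Decode a filename, falling back to the legacy charset if it is not UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 was written in the old locale.
        return raw.decode(legacy)


def plan_renames(names):
    """Yield (old_bytes, new_bytes) pairs for the names that would change."""
    for raw in names:
        fixed = repair_name(raw).encode("utf-8")
        if fixed != raw:
            yield raw, fixed


# Simulated directory listing: one legacy name, one already in UTF-8, one plain ASCII.
listing = [
    "žluťoučký.txt".encode("iso-8859-2"),
    "žluťoučký.txt".encode("utf-8"),
    b"plain.txt",
]
for old, new in plan_renames(listing):
    print(old, "->", new)
```

On a real directory the byte names could come from ``os.listdir(b".")``, which returns ``bytes`` when given a ``bytes`` path. A legacy name that happens to be valid UTF-8 would slip through unchanged, so the resulting list still deserves a look before renaming anything.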

Looking at this whole issue from at least some distance, it seems to me that
the Compose key actually combines the best of all the options: it works as well
as vim’s digraphs, but it is X11-wide, which is cool (and yes, of
course, it is much better than M$-Windows’s ``Alt+<number>``).

.. _`cs\_CZ.UTF-8`:
.. _`the end of my relation with vim`:
.. _here:
.. _`UTF-8`:
.. _digraphs:
.. _`HTML entities`:
.. _`my special Python script`:
.. _`I asked on cz.comp.linux`:
.. _`the article on Wikipedia`:
.. _`directly in my computer`:
.. _`Midnight Commander`:
.. _`this thread on some discussion board`:
.. _`a link to patched version of MC`:
.. _`non-standard version of slang`:
.. _Krusader:
.. _`stable enough and featurefull enough to compete with MC`: