93 lines
3.2 KiB
Text
93 lines
3.2 KiB
Text
$Header: /cvsroot/unac/unac/README,v 1.5 2002/09/02 10:40:09 loic Exp $
|
|
|
|
What is it ?
|
|
------------
|
|
|
|
unac is a C library that removes accents from characters, regardless
|
|
of the character set (ISO-8859-15, ISO-CELTIC, KOI8-RU...) as long as
|
|
iconv(3) is able to convert it into UTF-16 (Unicode). For instance
|
|
the string été will become ete. It provides a command line interface
|
|
(unaccent) that removes accents from an input flow or a string given
|
|
in argument. When using the library function or the command, the
|
|
charset of the input must be specified. The input is converted to
|
|
UTF-16 using iconv(3), accents are removed and the result is converted
|
|
back to the original charset. The iconv -l command on GNU/Linux will
|
|
show all charset supported.
|
|
|
|
Where is the documentation ?
|
|
----------------------------
|
|
|
|
The manual page of the unaccent command : man unaccent.
|
|
The manual page of the unac library : man unac.
|
|
|
|
How to install it ?
|
|
-------------------
|
|
|
|
For OS that are not GNU/Linux we recommend to use the iconv library
|
|
provided by Bruno Haible <haible@ilog.fr> at
|
|
ftp://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.8.tar.gz.
|
|
|
|
./configure [--with-iconv=/my/local]
|
|
|
|
make all
|
|
|
|
make check
|
|
|
|
make install
|
|
|
|
How to link with unac ?
|
|
-------------------------
|
|
|
|
Assuming you've installed unac in the /usr/local directory use something
|
|
similar to the following:
|
|
|
|
In the sources:
|
|
...
|
|
#include <unac.h>
|
|
...
|
|
|
|
On the command line:
|
|
|
|
cc -I/usr/local/include -o prog prog.cc -L/usr/local/lib -lunac
|
|
|
|
Where can I download it ?
|
|
-------------------------
|
|
The main distribution site is http://www.senga.org/unac/.
|
|
|
|
What is the license ?
|
|
---------------------
|
|
unac is distributed under the GNU GPL, as found at
|
|
http://www.gnu.org/licenses/gpl.txt. Unicode data files are
|
|
under the following license, which is compatible with the
|
|
GNU GPL:
|
|
|
|
http://www.unicode.org/Public/3.2-Update/UnicodeData-3.2.0.html#UCD_Terms
|
|
UCD Terms of Use
|
|
|
|
Disclaimer
|
|
|
|
The Unicode Character Database is provided as is by Unicode, Inc. No
|
|
claims are made as to fitness for any particular purpose. No
|
|
warranties of any kind are expressed or implied. The recipient agrees
|
|
to determine applicability of information provided. If this file has
|
|
been purchased on magnetic or optical media from Unicode, Inc., the
|
|
sole remedy for any claim will be exchange of defective media within
|
|
90 days of receipt.
|
|
|
|
This disclaimer is applicable for all other data files accompanying
|
|
the Unicode Character Database, some of which have been compiled by
|
|
the Unicode Consortium, and some of which have been supplied by other
|
|
sources. Limitations on Rights to Redistribute This Data
|
|
|
|
Recipient is granted the right to make copies in any form for internal
|
|
distribution and to freely use the information supplied in the
|
|
creation of products supporting the Unicode(TM) Standard. The files
|
|
in the Unicode Character Database can be redistributed to third
|
|
parties or other organizations (whether for profit or not) as long as
|
|
this notice and the disclaimer notice are retained. Information can
|
|
be extracted from these files and used in documentation or programs,
|
|
as long as there is an accompanying notice indicating the source.
|
|
|
|
Loic Dachary
|
|
loic@senga.org
|
|
http://www.senga.org/
|