'''
''' Copyright (C) 2000, 2001, 2002 Loic Dachary <loic@senga.org>
'''
''' This program is free software; you can redistribute it and/or modify
''' it under the terms of the GNU General Public License as published by
''' the Free Software Foundation; either version 2 of the License, or
''' (at your option) any later version.
'''
''' This program is distributed in the hope that it will be useful,
''' but WITHOUT ANY WARRANTY; without even the implied warranty of
''' MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
''' GNU General Public License for more details.
'''
''' You should have received a copy of the GNU General Public License
''' along with this program; if not, write to the Free Software
''' Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
'''
.TH unaccent 1 local
.SH NAME
unaccent \- remove accents from an input stream or a string

.SH SYNOPSIS
unaccent [--debug_low] [--debug_high] [-h] charset [string] [expected]
.SH DESCRIPTION
With a single argument,
.I unaccent
reads data from stdin, replaces accented letters with their unaccented
equivalents, and writes the result to stdout.
If the second argument
.B ('string')
is provided,
.I unaccent
transforms that string instead and prints the result on the standard output.
The charset of the input
string, or of the data read from stdin, is specified by the
.B 'charset'
argument (ISO-8859-15, for instance). The output uses the same charset.
.P
If the
.B 'expected'
argument is provided, the output string is compared to it. If the two
differ,
.I unaccent
exits with an error.
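.P
As a sketch of how this comparison might be used from a script, the shell
function below wraps the call and reports the outcome. The name
.B check_unaccent
is hypothetical, and the sketch assumes that the error exit is a non-zero
exit status.

.nf
.ft CW
```shell
# check_unaccent CHARSET STRING EXPECTED  (hypothetical helper, not part of unac)
# Assumption: unaccent returns a non-zero status when its result differs
# from EXPECTED. A missing unaccent binary is also reported as a mismatch.
check_unaccent() {
    if unaccent "$1" "$2" "$3" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "mismatch"
    fi
}
```
.ft R
.fi

A call such as
.B check_unaccent ISO-8859-1 ete ete
would then print a single word describing the result.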
.P
.I unaccent
relies on the
.B iconv(3)
library to convert from the specified charset to UTF-16BE (or UTF-16
if UTF-16BE is not available). Check the iconv manual pages for the
available charsets. On GNU/Linux, the command

.nf
.ft CW
iconv -l
.ft R
.fi

shows all available charsets.
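.P
To test for one particular charset rather than reading the whole list, a
small shell function can ask the
.B iconv
command to convert empty input. This is only a sketch: the name
.B charset_supported
is hypothetical, and it assumes an iconv command that accepts the
.B -f
and
.B -t
options and fails on an unknown charset name.

.nf
.ft CW
```shell
# charset_supported NAME  (hypothetical helper)
# Succeeds if iconv can convert from NAME to UTF-16BE, the encoding
# unaccent converts to. Empty input keeps the probe cheap.
# Caveat: on a system without UTF-16BE this probe fails for every NAME.
charset_supported() {
    iconv -f "$1" -t UTF-16BE </dev/null >/dev/null 2>&1
}

if charset_supported ISO-8859-15; then
    echo "ISO-8859-15 is available"
fi
```
.ft R
.fi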

.SH OPTIONS
.TP
.B --debug_low
Prints human-readable information about the unaccentuation process. See
.B unac(3)
for more information.

.TP
.B --debug_high
Prints very detailed information about the unaccentuation process.
See
.B unac(3)
for more information.

.TP
.B --help, -h
Prints a short usage message and exits.

.SH EXAMPLES
Remove accents from the string
.B été
and check that the result is
.B ete.

.nf
.ft CW
unaccent ISO-8859-1 été ete
.ft R
.fi

.P
Remove accents from the file
.B myfile
and write the result to the file
.B myfile.unaccent.

.nf
.ft CW
unaccent ISO-8859-1 < myfile > myfile.unaccent
.ft R
.fi

.SH SEE ALSO
unac(3), iconv(3)

.SH AUTHOR
Loic Dachary loic@senga.org
.nf
.ft CW
http://www.senga.org/unac/
.ft R
.fi