<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Recoll updated filters</title>

<meta name="generator" content="HTML Tidy, see www.w3.org">
<meta name="Author" content="Jean-Francois Dockes">
<meta name="Description" content=
"recoll is a simple full-text search system for unix and linux
based on the powerful and mature xapian engine">
<meta name="Keywords" content=
"full text search, desktop search, unix, linux">
<meta http-equiv="Content-language" content="en">
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
<meta name="robots" content="All,Index,Follow">

<link type="text/css" rel="stylesheet" href="../styles/style.css">
</head>

<body>

<div class="rightlinks">
<ul>
<li><a href="../index.html">Home</a></li>
<li><a href="../download.html">Downloads</a></li>
<li><a href="../usermanual/index.html">User manual</a></li>
<li><a href="../usermanual/rcl.install.html">Installation</a></li>
<li><a href="../index.html#support">Support</a></li>
</ul>
</div>

<div class="content">

<h1>Updated filters for Recoll</h1>

<p>This page describes new and updated filters. They will be
part of the next release, but can be installed on the current
release if you need them.</p>

<p>For updated filters, you just need to copy the script to the
filters directory, which is typically either <span
class="filename">/usr/share/recoll/filters</span> or <span
class="filename">/usr/local/share/recoll/filters</span>. Please check
that the script is executable after copying it, and make it so if
needed (<tt>chmod a+x <i>scriptname</i></tt>).</p>
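
<p>As a minimal sketch of the copy-and-chmod step above, the
hypothetical shell function below (<tt>install_filter</tt> is not part
of Recoll) copies a downloaded script to a filters directory and makes
it executable. The default directory is only an assumption: adjust it
to your installation.</p>

<pre>
# Hypothetical helper, not shipped with Recoll.
# Usage: install_filter SCRIPT [FILTERS_DIR]
install_filter() {
    script=$1
    # Assumed default; may be /usr/local/share/recoll/filters on your system.
    dir=${2:-/usr/share/recoll/filters}
    cp "$script" "$dir/" || return 1
    chmod a+x "$dir/$(basename "$script")"
}
</pre>

<p>For example, <tt>install_filter rclepub</tt> can be run from the
download directory. Writing to the system directory will usually
require root privileges.</p>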

<p>For new filters, you'll need to copy the script file as
above, possibly install the supporting application, and usually
edit the
<span class="filename">mimemap</span>,
<span class="filename">mimeview</span> and
<span class="filename">mimeconf</span> files, either in the
shared directory
(<span class="filename">/usr[/local]/share/recoll/examples</span>), or
in your personal configuration directory
(<span class="filename">$HOME/.recoll</span> or
<span class="filename">$RECOLL_CONFDIR</span>).</p>

<p>Alternatively, you can replace your system files with
these updated and complete versions:
<a href="mimemap">mimemap</a>,
<a href="mimeconf">mimeconf</a>,
<a href="mimeview">mimeview</a>.</p>

<blockquote>
<p>There is a new rclepub filter for EPUB ebooks. It is not included
in Recoll versions before 1.18.0.</p>
<p>rclchm needs to be updated for all Recoll versions up
to and including 1.17.1.</p>
<p>If you are running an older Recoll version, you really
should upgrade.</p>
</blockquote>


<h2>EPUB documents</h2>

<p>New <a href="rclepub">rclepub</a> filter for EPUB documents.
This needs
the <a href="http://pypi.python.org/pypi/epub/0.5.0">
Python epub decoding module</a>. The mimeview, mimemap and
mimeconf files in this directory have the appropriate
entries.</p>
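
<p>For illustration, the EPUB entries would look roughly like the
following. The MIME type, the <tt>execm</tt> handler line and the
viewer command are assumptions made for this sketch; the files in this
directory contain the authoritative lines.</p>

<pre>
# mimemap: map the file suffix to a MIME type (assumed values)
.epub = application/epub+zip

# mimeconf: choose the indexing filter for that MIME type
[index]
application/epub+zip = execm rclepub

# mimeview: choose a viewer; ebook-viewer is only an example
[view]
application/epub+zip = ebook-viewer %f
</pre>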

<h2>Updated Open Document filter</h2>

<p>The <a href="rclsoff">new filter</a> correctly handles
exported Google Docs
documents and also, in some cases, OpenOffice/LibreOffice ones. The
previous filters concatenated all the text inside exported
Google Docs documents without any spacing.</p>

<h2>TAR archives</h2>

<p>New <a href="rcltar">rcltar</a> filter for tar archives. The
indexing of tar archives is disabled by default in the sample
configuration (stored here). To enable it, you'll need to add
an <tt>application/x-tar = execm rcltar</tt> line to the
[index] section of your <span class="filename">$HOME/.recoll/mimeconf</span>.</p>
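
<p>As a sketch, the relevant part of the personal mimeconf would then
read as follows (any other entries already in the section stay
unchanged):</p>

<pre>
[index]
application/x-tar = execm rcltar
</pre>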

<h2>XML files</h2>

<p>By default, the current Recoll version does not index XML
content (except for known formats like Dia, SVG, etc.). This
new <a href="rclxml">rclxml</a> filter will extract the data
from any XML file. Only text data is extracted, not attribute
values. The other option is to treat XML files as plain text
(see the comment in mimeconf) and index everything, including
a lot of garbage.</p>


<h2>DIA files</h2>
<p><a href="rcldia">rcldia</a> is a new filter
for <a href="http://projects.gnome.org/dia/">Dia</a> files,
contributed by Stefan Friedel.</p>

<h2>CHM files</h2>
<p><a href="rclchm">rclchm</a>. The previous version of the
filter mishandled files which had encoded internal URLs (not
very frequent, but it happens).</p>

<h2>Okular annotations</h2>
<p><a href="rclokulnote">rclokulnote</a>. Okular lets you create
annotations for PDF documents and stores them in XML format
somewhere under ~/.kde. This filter does not format the data
nicely, but it will at least let you find it.</p>

<h2>Gnumeric</h2>
<p><a href="rclgnm">rclgnm</a>. Needs xsltproc and
gunzip. As <tt>.gnumeric</tt> was in the list of
explicitly ignored suffixes, you can't just add the MIME type
and indexer script lines to your local mimemap and mimeconf: you
also need to define recoll_noindex in the local mimemap (to
override the system one, which
contains <tt>.gnumeric</tt>). The simplest approach may be to
just replace the system files with those above.</p>
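
<p>A sketch of the local changes described above. The MIME type name
and the <tt>exec</tt> handler line are assumptions for this example,
and the <tt>recoll_noindex</tt> value is deliberately not spelled out:
copy it from the system mimemap and drop <tt>.gnumeric</tt> from the
list.</p>

<pre>
# local mimemap (assumed MIME type name)
.gnumeric = application/x-gnumeric
# Also define recoll_noindex here: the system value without .gnumeric

# local mimeconf
[index]
application/x-gnumeric = exec rclgnm
</pre>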

<h2>Rar archive support</h2>
<p><a href="rclrar">rclrar</a>. This is up to date in Recoll
1.16.2 but may be added to Recoll 1.15. It needs the Python
rarfile module.</p>

<h2>Mimehtml support</h2>
<p>This is based on the internal mail filter, so you just need to
download and install the configuration files (mimemap and
mimeconf). It will only work with Recoll 1.15 and later.</p>

<h2>Konqueror webarchive (.war) filter</h2>
<p><a href="rclwar">rclwar</a></p>

<h2>Updated zip archive filter</h2>
<p>The filter is corrected to handle UTF-8 paths in zip archives:
<a href="rclzip">rclzip</a>. It is up to date in Recoll 1.16, but
may be useful with Recoll 1.15.</p>

<h2>Updated audio tag filter</h2>
<p>The mutagen-based rclaudio filter delivered with Recoll 1.14.2
used a very recent mutagen interface, which only works with
mutagen versions after 1.17 (probably: it works with 1.19 at least,
and does not with 1.15).
You can download the <a href="rclaudio">corrected script
here</a>. It is not useful with Recoll 1.5 or 1.6.
</p>

</div>
</body>
</html>