<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Recoll updated filters</title>

<meta name="generator" content="HTML Tidy, see www.w3.org">
<meta name="Author" content="Jean-Francois Dockes">
<meta name="Description" content=
"recoll is a simple full-text search system for unix and linux
based on the powerful and mature xapian engine">
<meta name="Keywords" content=
"full text search, desktop search, unix, linux">
<meta http-equiv="Content-language" content="en">
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
<meta name="robots" content="All,Index,Follow">

<link type="text/css" rel="stylesheet" href="../styles/style.css">
</head>

<body>

<div class="rightlinks">
<ul>
<li><a href="../index.html">Home</a></li>
<li><a href="../download.html">Downloads</a></li>
<li><a href="../usermanual/index.html">User manual</a></li>
<li><a href="../usermanual/rcl.install.html">Installation</a></li>
<li><a href="../index.html#support">Support</a></li>
</ul>
</div>

<div class="content">

<h1>Updated filters for Recoll</h1>

<p>The following describes new and updated filters, which will be
part of the next release, but can be installed on an older
release if you need them.</p>

<p>For updated filters, you just need to copy the script to the
filters directory, which is typically either <span
class="filename">/usr/share/recoll/filters</span> or <span
class="filename">/usr/local/share/recoll/filters</span>. Please check
that the script is executable after copying it, and make it so if
needed (<tt>chmod a+x <i>scriptname</i></tt>).</p>

<p>For new filters, you'll need to copy the script file as
above, possibly install the supporting application, and usually
edit the
<span class="filename">mimemap</span>,
<span class="filename">mimeview</span> and
<span class="filename">mimeconf</span> files, either in the
shared directory
(<span class="filename">/usr[/local]/share/recoll/examples</span>), or
in your personal configuration directory
(<span class="filename">$HOME/.recoll</span> or
<span class="filename">$RECOLL_CONFDIR</span>).</p>
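<p>As a rough sketch of what these edits look like, the fragments below
register a made-up format in the three separate files. The <tt>.foo</tt>
suffix, the <tt>application/x-foo</tt> MIME type, the <tt>rclfoo</tt>
script and the <tt>fooviewer</tt> command are all hypothetical
placeholders, not part of any Recoll release; substitute the values that
go with the filter you are installing.</p>
<pre>
# mimemap: associate the file name suffix with a MIME type
.foo = application/x-foo

# mimeconf: tell the indexer which filter handles that MIME type
[index]
application/x-foo = exec rclfoo

# mimeview: choose the application used to open results
[view]
application/x-foo = fooviewer %f
</pre>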

<p>Alternatively, you can replace your system files with
these updated and complete versions:
<a href="mimemap">mimemap</a>,
<a href="mimeconf">mimeconf</a> and
<a href="mimeview">mimeview</a>.</p>

<p>There is a slightly more detailed description of the filter
installation procedure on the
<a href="https://bitbucket.org/medoc/recoll/wiki/FilterRetrofit.wiki">
Recoll Wiki</a>.</p>

<p>The following entries are in reverse chronological order. Each
lists the latest Recoll release on which the update makes sense
(newer releases have an up-to-date version of the filter).</p>

<p>However, if you are running a Recoll version older than 1.17,
you should really upgrade.</p>

<h2>PowerPoint documents (1.20 and older)</h2>

<p>The <b>rclppt</b> filter was based on <b>catppt</b>, but this
seems to fail quite often on newer PPT
documents. The new version is based on code from
the <b>libreoffice</b> <b>mso-dump</b> project. It is both
reasonably fast and quite thorough.
</p>

<p>Installation:</p>
<ul>
<li>As <tt>recollindex</tt> was executing <b>catppt</b>
directly in the default configuration, you will also need to add
the following to
the <tt>mimeconf</tt> file (e.g. <tt>~/.recoll/mimeconf</tt>):
<pre>
[index]
application/vnd.ms-powerpoint = exec rclppt
</pre>
</li>
<li>Copy the following 3 files to the Recoll filters directory (e.g.
<i>/usr/share/recoll/filters</i>) and make sure
that <tt>ppt-dump.py</tt> and <tt>rclppt</tt> are executable.
<ul>
<li><a href="rclppt">rclppt</a></li>
<li><a href="ppt-dump.py">ppt-dump.py</a></li>
<li><a href="msodump.zip">msodump.zip</a></li>
</ul>
</li>
</ul>

<h2>EPUB documents (1.17 and older)</h2>

<p>New <a href="rclepub">rclepub</a> filter for EPUB documents.
This needs
the <a href="http://pypi.python.org/pypi/epub/0.5.0">
python epub decoding module</a>. </p>

<h2>CHM files (1.17.1 and older)</h2>
<p><a href="rclchm">rclchm</a>. The previous version of the
filter mishandled files which had encoded internal URLs (not
very frequent, but it happens).</p>

<h2>Updated Open Document filter (1.17 and older)</h2>

<p>The <a href="rclsoff">new filter</a> will correctly handle
exported Google Docs documents and also Open/LibreOffice ones in
some cases. The previous filters concatenated all the text
inside the exported Google docs without any spacing...</p>

<h2>TAR archives (1.17 and older)</h2>

<p>New <a href="rcltar">rcltar</a> filter for tar archives. The
indexing of tar archives is disabled by default in the sample
configuration (stored here). This is an <tt>execm</tt>
filter, not an <tt>exec</tt> one. To enable it, you'll need to add
the following line to the [index] section of your
$HOME/.recoll/mimeconf:<br>
<tt>application/x-tar = execm rcltar</tt></p>

<h2>XML files (1.17 and older)</h2>

<p>By default, the current recoll version does not index xml
content (except for known formats like dia, svg etc.). This
new <a href="rclxml">rclxml</a> filter will extract the data
from any xml file. Only text data is extracted, not attribute
values. The other option is to treat xml files as plain text
(see the comment in mimeconf) and index everything, including
a lot of garbage.</p>

<h2>DIA files (1.16 and older)</h2>
<p><a href="rcldia">rcldia</a> is a new filter
for <a href="http://projects.gnome.org/dia/">Dia</a> files,
contributed by Stefan Friedel.</p>

<h2>Okular annotations (1.16 and older)</h2>
<p><a href="rclokulnote">rclokulnote</a>. Okular lets you create
annotations for PDF documents and stores them in xml format
somewhere under ~/.kde. This filter does not do a nice job of
formatting the data, but will at least let you find it...</p>

<h2>Gnumeric (1.16 and older)</h2>
<p><a href="rclgnm">rclgnm</a>. Needs xsltproc and
gunzip. As <tt>.gnumeric</tt> was in the list of
explicitly ignored suffixes, you can't just add the mime
and indexer script lines to your local mimemap and mimeconf; you
also need to define <tt>recoll_noindex</tt> in the local mimemap (to
override the system one, which
contains <tt>.gnumeric</tt>). The simplest approach may be to
just replace the system files with those above.</p>
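<p>A sketch of the local overrides, under stated assumptions: the
<tt>application/x-gnumeric</tt> MIME type name, the use of <tt>exec</tt>
for <tt>rclgnm</tt>, and the suffixes shown for <tt>recoll_noindex</tt>
are illustrative guesses, not values taken from a Recoll release. Copy
the real <tt>recoll_noindex</tt> line from the system mimemap and delete
<tt>.gnumeric</tt> from it.</p>
<pre>
# Local mimemap (e.g. ~/.recoll/mimemap)
.gnumeric = application/x-gnumeric
# Illustrative value only: use the system list minus .gnumeric
recoll_noindex = .tar.gz .tgz .tar.bz2 .tbz

# Local mimeconf (e.g. ~/.recoll/mimeconf)
[index]
application/x-gnumeric = exec rclgnm
</pre>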

<h2>Rar archive support (1.15 and older)</h2>
<p><a href="rclrar">rclrar</a>. This is up to date in Recoll
1.16.2 but may be added to Recoll 1.15. It needs the Python
rarfile module. </p>

<h2>Mimehtml support (1.15)</h2>
<p>This is based on the internal mail filter, so you just need to
download and install the configuration files (mimemap and
mimeconf). It will only work with 1.15 and later.</p>

<h2>Konqueror webarchive (.war) filter (1.15)</h2>
<p><a href="rclwar">rclwar</a></p>

<h2>Updated zip archive filter (1.15)</h2>
<p>The filter is corrected to handle utf-8 paths in zip archives:
<a href="rclzip">rclzip</a>. Up to date in Recoll 1.16, but
may be useful with Recoll 1.15.</p>

<h2>Updated audio tag filter (1.14)</h2>
<p>The mutagen-based rclaudio filter delivered with recoll 1.14.2
used a very recent mutagen interface which will only work with
mutagen versions after 1.17 (probably; it at least works with 1.19
and doesn't with 1.15).
You can download the <a href="rclaudio">corrected script
here</a>. Not useful with Recoll 1.5 or 1.6.
</p>

</div>
</body>
</html>