improve the documentation of execm filters and include a pointer to a simple sample

Jean-Francois Dockes 2016-05-23 19:21:53 +02:00
parent ba2002192d
commit db1a33db06
2 changed files with 77 additions and 57 deletions

@ -20,8 +20,8 @@ alink="#0000FF">
<div class="titlepage"> <div class="titlepage">
<div> <div>
<div> <div>
<h1 class="title"><a name="idp37043056" id= <h1 class="title"><a name="idp49492016" id=
"idp37043056"></a>Recoll user manual</h1> "idp49492016"></a>Recoll user manual</h1>
</div> </div>
<div> <div>
@ -109,13 +109,13 @@ alink="#0000FF">
multiple indexes</a></span></dt> multiple indexes</a></span></dt>
<dt><span class="sect2">2.1.3. <a href= <dt><span class="sect2">2.1.3. <a href=
"#idp42671856">Document types</a></span></dt> "#idp55120704">Document types</a></span></dt>
<dt><span class="sect2">2.1.4. <a href= <dt><span class="sect2">2.1.4. <a href=
"#idp42691536">Indexing failures</a></span></dt> "#idp55140304">Indexing failures</a></span></dt>
<dt><span class="sect2">2.1.5. <a href= <dt><span class="sect2">2.1.5. <a href=
"#idp42698992">Recovery</a></span></dt> "#idp55147760">Recovery</a></span></dt>
</dl> </dl>
</dd> </dd>
@ -981,8 +981,8 @@ alink="#0000FF">
<div class="titlepage"> <div class="titlepage">
<div> <div>
<div> <div>
<h3 class="title"><a name="idp42671856" id= <h3 class="title"><a name="idp55120704" id=
"idp42671856"></a>2.1.3.&nbsp;Document types</h3> "idp55120704"></a>2.1.3.&nbsp;Document types</h3>
</div> </div>
</div> </div>
</div> </div>
@ -1075,8 +1075,8 @@ indexedmimetypes = application/pdf
<div class="titlepage"> <div class="titlepage">
<div> <div>
<div> <div>
<h3 class="title"><a name="idp42691536" id= <h3 class="title"><a name="idp55140304" id=
"idp42691536"></a>2.1.4.&nbsp;Indexing "idp55140304"></a>2.1.4.&nbsp;Indexing
failures</h3> failures</h3>
</div> </div>
</div> </div>
@ -1116,8 +1116,8 @@ indexedmimetypes = application/pdf
<div class="titlepage"> <div class="titlepage">
<div> <div>
<div> <div>
<h3 class="title"><a name="idp42698992" id= <h3 class="title"><a name="idp55147760" id=
"idp42698992"></a>2.1.5.&nbsp;Recovery</h3> "idp55147760"></a>2.1.5.&nbsp;Recovery</h3>
</div> </div>
</div> </div>
</div> </div>
@ -1701,10 +1701,9 @@ metadatacmds = ; tags = tmsu tags %f
"margin-left: 0.5in; margin-right: 0.5in;"> "margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3> <h3 class="title">Note</h3>
<p>This is just an example. Depending on the <span class= <p>Depending on the <span class="application">tmsu</span>
"application">tmsu</span> version, you may need/want to version, you may need/want to add options like
add options like <code class= <code class="literal">--database=/some/db</code>.</p>
"literal">--database=/some/db</code>.</p>
</div> </div>
<p>You may want to restrict this processing to a subset of <p>You may want to restrict this processing to a subset of
@ -5719,7 +5718,7 @@ dir:recoll dir:src -dir:utils -dir:common
<span class="command"><strong>recollindex</strong></span>. <span class="command"><strong>recollindex</strong></span>.
This latter kind will not be described here.</p> This latter kind will not be described here.</p>
<p>There are currently (1.18 and since 1.13) two kinds of <p>There are currently (since version 1.13) two kinds of
external executable input handlers:</p> external executable input handlers:</p>
<div class="itemizedlist"> <div class="itemizedlist">
@ -5870,14 +5869,20 @@ dir:recoll dir:src -dir:utils -dir:common
<p>If you can program and want to write an <code class= <p>If you can program and want to write an <code class=
"literal">execm</code> handler, it should not be too "literal">execm</code> handler, it should not be too
difficult to make sense of one of the existing modules. difficult to make sense of one of the existing modules.
For example, look at <span class= There is a sample one with many comments, not actually
"command"><strong>rclzip</strong></span> which uses Zip used by <span class="application">Recoll</span>, which
file paths as identifiers (<code class= would index a text file as one document per line. Look
"literal">ipath</code>), and <span class= for <code class="filename">rcltxtlines.py</code> in the
"command"><strong>rclics</strong></span>, which uses an <code class="filename">src/filters</code> directory in
integer index. Also have a look at the comments inside the <span class="application">Recoll</span> <a class=
the <code class="filename">internfile/mh_execm.h</code> "ulink" href="https://bitbucket.org/medoc/recoll/src"
file and possibly at the corresponding module.</p> target="_top">BitBucket repository</a> (the sample is not in
the distributed release at the moment).</p>
<p>You can also have a look at the slightly more complex
<span class="command"><strong>rclzip</strong></span>
which uses Zip file paths as identifiers (<code class=
"literal">ipath</code>).</p>
<p><code class="literal">execm</code> handlers sometimes <p><code class="literal">execm</code> handlers sometimes
need to make a choice for the nature of the <code class= need to make a choice for the nature of the <code class=
@ -5958,15 +5963,17 @@ dir:recoll dir:src -dir:utils -dir:common
<p>If no suffix association is found for the file name, <p>If no suffix association is found for the file name,
<span class="application">Recoll</span> will try to <span class="application">Recoll</span> will try to
execute the <span class="command"><strong>file execute a system command (typically <span class=
-i</strong></span> command to determine a MIME type.</p> "command"><strong>file -i</strong></span> or <span class=
"command"><strong>xdg-mime</strong></span>) to determine
a MIME type.</p>
<p>The association of file types to handlers is performed <p>The second element is the association of MIME types to
in the <a class="link" href= handlers in the <a class="link" href=
"#RCL.INSTALL.CONFIG.MIMECONF" title= "#RCL.INSTALL.CONFIG.MIMECONF" title=
"5.4.5.&nbsp;The mimeconf file"><code class= "5.4.5.&nbsp;The mimeconf file"><code class=
"filename">mimeconf</code> file</a>. A sample will "filename">mimeconf</code> file</a>. A sample will
probably be of better help than a long explanation:</p> probably be better than a long explanation:</p>
<pre class="programlisting"> <pre class="programlisting">
[index] [index]
@ -9404,18 +9411,24 @@ x-my-tag = mailmytag
file name extension to MIME type mappings.</p> file name extension to MIME type mappings.</p>
<p>For file names without an extension, or with an <p>For file names without an extension, or with an
unknown one, the system's <span class= unknown one, a system command (<span class=
"command"><strong>file</strong></span> <code class= "command"><strong>file</strong></span> <code class=
"option">-i</code> command will be executed to determine "option">-i</code>, or <span class=
the MIME type (this can be switched off inside the main "command"><strong>xdg-mime</strong></span>) will be
configuration file).</p> executed to determine the MIME type (this can be switched
off, or the command changed inside the main configuration
file).</p>
<p>The mappings can be specified on a per-subtree basis, <p>The mappings can be specified on a per-subtree basis,
which may be useful in some cases. Example: <span class= which may be useful in some cases. Example: <span class=
"application">gaim</span> logs have a <code class= "application">okular</span> notes have a <code class=
"filename">.txt</code> extension but should be handled "filename">.xml</code> extension but should be handled
specially, which is possible because they are usually all specially, which is possible because they are usually all
located in one place.</p> located in one place. Example:</p>
<pre class="programlisting">
[~/.kde/share/apps/okular/docdata]
.xml = application/x-okular-notes
</pre>
<p>The <code class="varname">recoll_noindex</code> <p>The <code class="varname">recoll_noindex</code>
<code class="filename">mimemap</code> variable has been <code class="filename">mimemap</code> variable has been

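Reviewer sketch for the `mimemap` hunk above: the per-subtree mapping it describes can be illustrated with a small Python lookup. This is only an illustration of the idea, not Recoll's implementation. The `MIMEMAP` dictionary, the `lookup_mime` function, the `.xml = text/xml` global entry and the "deepest matching section wins" rule are all assumptions made for the example. Only the `.doc` entry and the okular section come from the documentation text.

```python
import os

# Hypothetical in-memory form of a mimemap file: section -> {suffix: MIME type}.
# The empty section name stands for the global (top-level) mappings.
MIMEMAP = {
    "": {".doc": "application/msword", ".xml": "text/xml"},  # .xml entry is assumed
    "~/.kde/share/apps/okular/docdata": {".xml": "application/x-okular-notes"},
}


def lookup_mime(path, mimemap=MIMEMAP, home=None):
    """Return the MIME type mapped to the suffix of path, or None.

    Assumption: when several sections match, the deepest subtree wins.
    A None result is where a real indexer would fall back to a system
    command such as `file -i` or `xdg-mime`.
    """
    home = home or os.path.expanduser("~")
    path = os.path.abspath(path)
    suffix = os.path.splitext(path)[1].lower()
    best_len, best = -1, None
    for section, mappings in mimemap.items():
        if suffix not in mappings:
            continue
        if section == "":
            root = ""  # global section applies everywhere
        else:
            root = os.path.abspath(section.replace("~", home, 1))
            if not (path == root or path.startswith(root + os.sep)):
                continue  # file is outside this subtree
        if len(root) > best_len:
            best_len, best = len(root), mappings[suffix]
    return best
```

With these sample mappings, an `.xml` file under the okular `docdata` directory resolves to `application/x-okular-notes`, while the same suffix elsewhere gets the global type.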
@ -3877,7 +3877,7 @@ dir:recoll dir:src -dir:utils -dir:common
inside <command>recollindex</command>. This latter kind will not inside <command>recollindex</command>. This latter kind will not
be described here.</para> be described here.</para>
<para>There are currently (1.18 and since 1.13) two kinds of <para>There are currently (since version 1.13) two kinds of
external executable input handlers: external executable input handlers:
<itemizedlist> <itemizedlist>
<listitem><para>Simple <literal>exec</literal> handlers <listitem><para>Simple <literal>exec</literal> handlers
@ -3988,13 +3988,18 @@ dir:recoll dir:src -dir:utils -dir:common
<para>If you can program and want to write <para>If you can program and want to write
an <literal>execm</literal> handler, it should not be too an <literal>execm</literal> handler, it should not be too
difficult to make sense of one of the existing modules. For difficult to make sense of one of the existing modules. There is
example, look at <command>rclzip</command> which uses Zip a sample one with many comments, not actually used by &RCL;,
file paths as identifiers (<literal>ipath</literal>), which would index a text file as one document per line. Look for
and <command>rclics</command>, which uses an integer <filename>rcltxtlines.py</filename> in the
index. Also have a look at the comments inside <filename>src/filters</filename> directory in the &RCL; <ulink
the <filename>internfile/mh_execm.h</filename> file and url="https://bitbucket.org/medoc/recoll/src">BitBucket
possibly at the corresponding module.</para> repository</ulink> (the sample
is not in the distributed release at the moment).</para>
<para>You can also have a look at the slightly more complex
<command>rclzip</command> which uses Zip
file paths as identifiers (<literal>ipath</literal>).</para>
<para><literal>execm</literal> handlers sometimes need to make <para><literal>execm</literal> handlers sometimes need to make
a choice for the nature of the <literal>ipath</literal> a choice for the nature of the <literal>ipath</literal>
@ -4045,13 +4050,13 @@ dir:recoll dir:src -dir:utils -dir:common
.doc = application/msword .doc = application/msword
</programlisting> </programlisting>
If no suffix association is found for the file name, &RCL; will try If no suffix association is found for the file name, &RCL; will try
to execute the <command>file -i</command> command to determine a to execute a system command (typically <command>file -i</command> or
MIME type.</para> <command>xdg-mime</command>) to determine a MIME type.</para>
<para>The association of file types to handlers is performed in <para>The second element is the association of MIME types to handlers
the <link linkend="RCL.INSTALL.CONFIG.MIMECONF"> in the <link linkend="RCL.INSTALL.CONFIG.MIMECONF">
<filename>mimeconf</filename> file</link>. A sample will probably be <filename>mimeconf</filename> file</link>. A sample will probably be
of better help than a long explanation:</para> better than a long explanation:</para>
<programlisting> <programlisting>
[index] [index]
@ -6543,18 +6548,20 @@ x-my-tag = mailmytag
<para><filename>mimemap</filename> specifies the <para><filename>mimemap</filename> specifies the
file name extension to MIME type mappings.</para> file name extension to MIME type mappings.</para>
<para>For file names without an extension, or with an unknown <para>For file names without an extension, or with an unknown one,
one, the system's <command>file</command> <option>-i</option> a system command (<command>file</command> <option>-i</option>, or
command will be <command>xdg-mime</command>) will be executed to determine the MIME
executed to determine the MIME type (this can be switched off type (this can be switched off, or the command changed inside the
inside the main configuration file).</para> main configuration file).</para>
<para>The mappings can be specified on a per-subtree basis, <para>The mappings can be specified on a per-subtree basis,
which may be useful in some cases. Example: which may be useful in some cases. Example:
<application>gaim</application> logs have a <application>okular</application> notes have a
<filename>.txt</filename> extension but <filename>.xml</filename> extension but
should be handled specially, which is possible because they should be handled specially, which is possible because they
are usually all located in one place.</para> are usually all located in one place. Example:
<programlisting>[~/.kde/share/apps/okular/docdata]
.xml = application/x-okular-notes</programlisting></para>
<para>The <varname>recoll_noindex</varname> <para>The <varname>recoll_noindex</varname>
<filename>mimemap</filename> variable has been moved to <filename>mimemap</filename> variable has been moved to
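
Reviewer sketch for the `execm` hunk above: the "one document per line" idea, with an integer-like `ipath` per sub-document, might look roughly like this in Python. This is not the code of `rcltxtlines.py` and it does not implement the `execm` communication protocol. The `LineDocuments` class and its methods are hypothetical names, and skipping blank lines is an assumption made for the example.

```python
class LineDocuments:
    """Expose each non-empty line of a text file as a separate sub-document."""

    def __init__(self, path, encoding="utf-8"):
        # Read the whole file once; undecodable bytes are replaced, not fatal.
        with open(path, encoding=encoding, errors="replace") as f:
            self.lines = f.read().splitlines()

    def ipaths(self):
        # Use 1-based line numbers (as strings) as sub-document identifiers.
        # Blank lines are skipped (an assumption of this sketch).
        return [str(i) for i, line in enumerate(self.lines, 1) if line.strip()]

    def get(self, ipath):
        # Return the text of the sub-document designated by ipath.
        return self.lines[int(ipath) - 1]
```

A line number used as `ipath` stays stable only while the file is unchanged, which is the kind of choice about the nature of the `ipath` that the documentation says handlers have to make.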