doc: clarifications in the synonyms section
parent
1b0e8bb5b6
commit
766e7d4804
2 changed files with 142 additions and 101 deletions
|
@ -20,8 +20,8 @@ alink="#0000FF">
|
||||||
<div class="titlepage">
|
<div class="titlepage">
|
||||||
<div>
|
<div>
|
||||||
<div>
|
<div>
|
||||||
<h1 class="title"><a name="idp48862656" id=
|
<h1 class="title"><a name="idp59627200" id=
|
||||||
"idp48862656"></a>Recoll user manual</h1>
|
"idp59627200"></a>Recoll user manual</h1>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div>
|
<div>
|
||||||
|
@ -109,13 +109,13 @@ alink="#0000FF">
|
||||||
multiple indexes</a></span></dt>
|
multiple indexes</a></span></dt>
|
||||||
|
|
||||||
<dt><span class="sect2">2.1.3. <a href=
|
<dt><span class="sect2">2.1.3. <a href=
|
||||||
"#idp54428144">Document types</a></span></dt>
|
"#idp65068656">Document types</a></span></dt>
|
||||||
|
|
||||||
<dt><span class="sect2">2.1.4. <a href=
|
<dt><span class="sect2">2.1.4. <a href=
|
||||||
"#idp54447824">Indexing failures</a></span></dt>
|
"#idp65088336">Indexing failures</a></span></dt>
|
||||||
|
|
||||||
<dt><span class="sect2">2.1.5. <a href=
|
<dt><span class="sect2">2.1.5. <a href=
|
||||||
"#idp48768496">Recovery</a></span></dt>
|
"#idp65095792">Recovery</a></span></dt>
|
||||||
</dl>
|
</dl>
|
||||||
</dd>
|
</dd>
|
||||||
|
|
||||||
|
@ -293,9 +293,8 @@ alink="#0000FF">
|
||||||
line</a></span></dt>
|
line</a></span></dt>
|
||||||
|
|
||||||
<dt><span class="sect1">3.4. <a href=
|
<dt><span class="sect1">3.4. <a href=
|
||||||
"#RCL.SEARCH.SYNONYMS">Using Synonyms (<span class=
|
"#RCL.SEARCH.SYNONYMS">Using Synonyms
|
||||||
"application">Recoll</span> 1.22 and
|
(1.22)</a></span></dt>
|
||||||
later)</a></span></dt>
|
|
||||||
|
|
||||||
<dt><span class="sect1">3.5. <a href=
|
<dt><span class="sect1">3.5. <a href=
|
||||||
"#RCL.SEARCH.PTRANS">Path translations</a></span></dt>
|
"#RCL.SEARCH.PTRANS">Path translations</a></span></dt>
|
||||||
|
@ -500,12 +499,10 @@ alink="#0000FF">
|
||||||
are specific to Unix, and not valid on <span class=
|
are specific to Unix, and not valid on <span class=
|
||||||
"application">Windows</span>. Some described features are
|
"application">Windows</span>. Some described features are
|
||||||
also not available on <span class=
|
also not available on <span class=
|
||||||
"application">Windows</span>.</p>
|
"application">Windows</span>. The manual will be
|
||||||
|
progressively updated. Until this happens, most references to
|
||||||
<p>The manual will be progressively updated for <span class=
|
shared files can be translated by looking under the Recoll
|
||||||
"application">Windows</span>. Until this happens, most
|
installation directory (esp. the <code class=
|
||||||
references to files can be translated by looking under the
|
|
||||||
Recoll installation directory (esp. the <code class=
|
|
||||||
"filename">Share</code> subdirectory). The user configuration
|
"filename">Share</code> subdirectory). The user configuration
|
||||||
is stored by default under <code class=
|
is stored by default under <code class=
|
||||||
"filename">AppData/Local/Recoll</code> inside the user
|
"filename">AppData/Local/Recoll</code> inside the user
|
||||||
|
@ -546,12 +543,18 @@ alink="#0000FF">
|
||||||
the <span class="guilabel">Top directories</span>
|
the <span class="guilabel">Top directories</span>
|
||||||
section).</p>
|
section).</p>
|
||||||
|
|
||||||
<p>Also be aware that you may need to install the
|
<p>Also be aware that, on Unix/Linux, you may need to
|
||||||
appropriate <a class="link" href="#RCL.INSTALL.EXTERNAL"
|
install the appropriate <a class="link" href=
|
||||||
title="5.2. Supporting packages">supporting
|
"#RCL.INSTALL.EXTERNAL" title=
|
||||||
applications</a> for document types that need them (for
|
"5.2. Supporting packages">supporting applications</a>
|
||||||
example <span class="application">antiword</span> for
|
for document types that need them (for example <span class=
|
||||||
<span class="application">Microsoft Word</span> files).</p>
|
"application">antiword</span> for <span class=
|
||||||
|
"application">Microsoft Word</span> files).</p>
|
||||||
|
|
||||||
|
<p>The <span class="application">Recoll</span> installation
|
||||||
|
for <span class="application">Windows</span> is
|
||||||
|
self-contained and includes most useful auxiliary programs.
|
||||||
|
You will just need to install Python 2.7.</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div class="sect1">
|
<div class="sect1">
|
||||||
|
@ -978,8 +981,8 @@ alink="#0000FF">
|
||||||
<div class="titlepage">
|
<div class="titlepage">
|
||||||
<div>
|
<div>
|
||||||
<div>
|
<div>
|
||||||
<h3 class="title"><a name="idp54428144" id=
|
<h3 class="title"><a name="idp65068656" id=
|
||||||
"idp54428144"></a>2.1.3. Document types</h3>
|
"idp65068656"></a>2.1.3. Document types</h3>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
@ -1072,8 +1075,8 @@ indexedmimetypes = application/pdf
|
||||||
<div class="titlepage">
|
<div class="titlepage">
|
||||||
<div>
|
<div>
|
||||||
<div>
|
<div>
|
||||||
<h3 class="title"><a name="idp54447824" id=
|
<h3 class="title"><a name="idp65088336" id=
|
||||||
"idp54447824"></a>2.1.4. Indexing
|
"idp65088336"></a>2.1.4. Indexing
|
||||||
failures</h3>
|
failures</h3>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
@ -1113,8 +1116,8 @@ indexedmimetypes = application/pdf
|
||||||
<div class="titlepage">
|
<div class="titlepage">
|
||||||
<div>
|
<div>
|
||||||
<div>
|
<div>
|
||||||
<h3 class="title"><a name="idp48768496" id=
|
<h3 class="title"><a name="idp65095792" id=
|
||||||
"idp48768496"></a>2.1.5. Recovery</h3>
|
"idp65095792"></a>2.1.5. Recovery</h3>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
@ -4562,36 +4565,46 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
|
||||||
<h2 class="title" style="clear: both"><a name=
|
<h2 class="title" style="clear: both"><a name=
|
||||||
"RCL.SEARCH.SYNONYMS" id=
|
"RCL.SEARCH.SYNONYMS" id=
|
||||||
"RCL.SEARCH.SYNONYMS"></a>3.4. Using Synonyms
|
"RCL.SEARCH.SYNONYMS"></a>3.4. Using Synonyms
|
||||||
(<span class="application">Recoll</span> 1.22 and
|
(1.22)</h2>
|
||||||
later)</h2>
|
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<p>There are a number of different uses for synonyms in
|
<p><b>Term synonyms: </b>there are a number of ways to
|
||||||
text search. They can be used at index time (either to
|
use term synonyms for searching text:</p>
|
||||||
increase or decrease the number of indexed terms), or at
|
|
||||||
query time, to reduce user terms to a set of canonical
|
|
||||||
ones, or to expand queries to match texts containing
|
|
||||||
synonyms of the user terms.</p>
|
|
||||||
|
|
||||||
<p>Only the last approach is used in <span class=
|
<div class="itemizedlist">
|
||||||
"application">Recoll</span>. Synonym groups can be defined
|
<ul class="itemizedlist" style="list-style-type: disc;">
|
||||||
so that a user query term which is found to be part of a
|
<li class="listitem">
|
||||||
synonym group will be optionally expanded into an OR query
|
<p>At index creation time, they can be used to alter
|
||||||
for all synonyms.</p>
|
the indexed terms, either increasing or decreasing
|
||||||
|
their number, by expanding the original terms to all
|
||||||
|
synonyms, or by reducing all synonym terms to a
|
||||||
|
canonical one.</p>
|
||||||
|
</li>
|
||||||
|
|
||||||
<p>What is it good for ? The synonyms function is probably
|
<li class="listitem">
|
||||||
not going to help you find your letters to Mr. Smith. It is
|
<p>At query time, they can be used to match texts
|
||||||
best used for domain-specific searches. For example, it was
|
containing terms which are synonyms of the ones
|
||||||
initially suggested by a user performing searches among
|
specified by the user, either by expanding the query
|
||||||
historical documents: the synonyms file would contains
|
for all synonyms, or by reducing the user entry to
|
||||||
nicknames and aliases for each of the persons of
|
canonical terms (the latter only works if the
|
||||||
interest.</p>
|
corresponding processing has been performed while
|
||||||
|
creating the index).</p>
|
||||||
|
</li>
|
||||||
|
</ul>
|
||||||
|
</div>
|
||||||
|
|
||||||
<p>In practise, synonym groups are defined inside ordinary
|
<p><span class="application">Recoll</span> only uses
|
||||||
text files. Each line in the file defines a group.
|
synonyms at query time. A user query term which is part of a
|
||||||
Example:</p>
|
synonym group will be optionally expanded into an
|
||||||
|
<code class="literal">OR</code> query for all terms in the
|
||||||
|
group.</p>
|
||||||
|
|
||||||
|
<p>Synonym groups are defined inside ordinary text files.
|
||||||
|
Each line in the file defines a group.</p>
|
||||||
|
|
||||||
|
<p>Example:</p>
|
||||||
<pre class="programlisting">
|
<pre class="programlisting">
|
||||||
hi hello "good morning"
|
hi hello "good morning"
|
||||||
|
|
||||||
|
@ -4601,29 +4614,39 @@ bye goodbye "see you" \
|
||||||
|
|
||||||
</pre>
|
</pre>
|
||||||
|
|
||||||
<p>As usual lines beginning with a <code class=
|
<p>As usual, lines beginning with a <code class=
|
||||||
"literal">#</code> are comments, empty lines are ignored,
|
"literal">#</code> are comments, empty lines are ignored,
|
||||||
and lines can be continued by ending them with a
|
and lines can be continued by ending them with a
|
||||||
backslash.</p>
|
backslash.</p>
|
||||||
|
|
||||||
<p>The synonyms are searched for matches with user terms
|
|
||||||
after these are stem-expanded, but the contents of the
|
|
||||||
synonyms file itself is not subjected to stem expansion
|
|
||||||
(1.22). This means that a match will not be found if the
|
|
||||||
form present in the synonyms file is not present anywhere
|
|
||||||
in the document set.</p>
|
|
||||||
|
|
||||||
<p>Multi-word synonyms are supported, but be aware that
|
<p>Multi-word synonyms are supported, but be aware that
|
||||||
these will generate phrase queries, which may degrade
|
these will generate phrase queries, which may degrade
|
||||||
performance (and also, no stemming).</p>
|
performance and will disable stemming expansion for the
|
||||||
|
phrase terms.</p>
|
||||||
|
|
||||||
<p>A synonyms file can be specified in the GUI preferences,
|
<p>The synonyms file can be specified in the <span class=
|
||||||
or as an option to <span class=
|
"guilabel">Search parameters</span> tab of the <span class=
|
||||||
"command"><strong>recollq</strong></span>.</p>
|
"guilabel">GUI configuration</span> <span class=
|
||||||
|
"guilabel">Preferences</span> menu entry, or as an option
|
||||||
|
for command-line searches.</p>
|
||||||
|
|
||||||
<p>This feature is new in <span class=
|
<p>Once the file is defined, the use of synonyms can be
|
||||||
"application">Recoll</span> 1.22 and will probably need to
|
enabled or disabled directly from the <span class=
|
||||||
be refined after some user feedback.</p>
|
"guilabel">Preferences</span> menu.</p>
|
||||||
|
|
||||||
|
<p>The synonyms are searched for matches with user terms
|
||||||
|
after the latter are stem-expanded, but the contents of the
|
||||||
|
synonyms file itself are not subjected to stem expansion.
|
||||||
|
This means that a match will not be found if the form
|
||||||
|
present in the synonyms file is not present anywhere in the
|
||||||
|
document set.</p>
|
||||||
|
|
||||||
|
<p>The synonyms function is probably not going to help you
|
||||||
|
find your letters to Mr. Smith. It is best used for
|
||||||
|
domain-specific searches. For example, it was initially
|
||||||
|
suggested by a user performing searches among historical
|
||||||
|
documents: the synonyms file would contain nicknames and
|
||||||
|
aliases for each of the persons of interest.</p>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div class="sect1">
|
<div class="sect1">
|
||||||
|
|
|
@ -57,10 +57,8 @@
|
||||||
<application>MS-Windows</application>. Many references in this
|
<application>MS-Windows</application>. Many references in this
|
||||||
manual, especially file locations, are specific to Unix, and not
|
manual, especially file locations, are specific to Unix, and not
|
||||||
valid on &WIN;. Some described features are also not available on
|
valid on &WIN;. Some described features are also not available on
|
||||||
&WIN;.</para>
|
&WIN;. The manual will be progressively updated. Until this happens,
|
||||||
|
most references to shared files can be translated by looking under
|
||||||
<para>The manual will be progressively updated for &WIN;. Until this
|
|
||||||
happens, most references to files can be translated by looking under
|
|
||||||
the Recoll installation directory (esp. the
|
the Recoll installation directory (esp. the
|
||||||
<filename>Share</filename> subdirectory). The user configuration is
|
<filename>Share</filename> subdirectory). The user configuration is
|
||||||
stored by default under <filename>AppData/Local/Recoll</filename>
|
stored by default under <filename>AppData/Local/Recoll</filename>
|
||||||
|
@ -87,11 +85,16 @@
|
||||||
</menuchoice>, then adjust the <guilabel>Top
|
</menuchoice>, then adjust the <guilabel>Top
|
||||||
directories</guilabel> section).</para>
|
directories</guilabel> section).</para>
|
||||||
|
|
||||||
<para>Also be aware that you may need to install the
|
<para>Also be aware that, on Unix/Linux, you may need to install the
|
||||||
appropriate <link linkend="RCL.INSTALL.EXTERNAL"> supporting
|
appropriate <link linkend="RCL.INSTALL.EXTERNAL"> supporting
|
||||||
applications</link> for document types that need them (for
|
applications</link> for document types that need them (for
|
||||||
example <application>antiword</application> for
|
example <application>antiword</application> for
|
||||||
<application>Microsoft Word</application> files).</para>
|
<application>Microsoft Word</application> files).</para>
|
||||||
|
|
||||||
|
<para>The &RCL; installation for &WIN; is self-contained and includes
|
||||||
|
most useful auxiliary programs. You will just need to install Python
|
||||||
|
2.7.</para>
|
||||||
|
|
||||||
</sect1>
|
</sect1>
|
||||||
|
|
||||||
<sect1 id="RCL.INTRODUCTION.SEARCH">
|
<sect1 id="RCL.INTRODUCTION.SEARCH">
|
||||||
|
@ -3062,28 +3065,32 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
|
||||||
</sect1>
|
</sect1>
|
||||||
|
|
||||||
<sect1 id="RCL.SEARCH.SYNONYMS">
|
<sect1 id="RCL.SEARCH.SYNONYMS">
|
||||||
<title>Using Synonyms (&RCL; 1.22 and later)</title>
|
<title>Using Synonyms (1.22)</title>
|
||||||
|
|
||||||
<para>There are a number of different uses for synonyms in text
|
<formalpara><title>Term synonyms:</title>
|
||||||
search. They can be used at index time (either to increase or decrease
|
<para>there are a number of ways to use term synonyms for searching text:
|
||||||
the number of indexed terms), or at query time, to reduce user terms to
|
<itemizedlist>
|
||||||
a set of canonical ones, or to expand queries to match texts containing
|
<listitem><para>At index creation time, they can be used to alter the
|
||||||
synonyms of the user terms.</para>
|
indexed terms, either increasing or decreasing their number, by
|
||||||
|
expanding the original terms to all synonyms, or by
|
||||||
|
reducing all synonym terms to a canonical one.</para></listitem>
|
||||||
|
<listitem><para>At query time, they can be used to match texts
|
||||||
|
containing terms which are synonyms of the ones specified by the user,
|
||||||
|
either by expanding the query for all synonyms, or by reducing the user
|
||||||
|
entry to canonical terms (the latter only works if the corresponding
|
||||||
|
processing has been performed while creating the index).</para></listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
</para>
|
||||||
|
</formalpara>
|
||||||
|
|
||||||
<para>Only the last approach is used in &RCL;. Synonym groups can be
|
<para>&RCL; only uses synonyms at query time. A user query term which is
|
||||||
defined so that a user query term which is found to be part of a
|
part of a synonym group will be optionally expanded into an
|
||||||
synonym group will be optionally expanded into an OR query for all
|
<literal>OR</literal> query for all terms in the group.</para>
|
||||||
synonyms.</para>
|
|
||||||
|
|
||||||
<para>What is it good for ? The synonyms function is probably not going
|
<para>Synonym groups are defined inside ordinary text files. Each line
|
||||||
to help you find your letters to Mr. Smith. It is best used for
|
in the file defines a group.</para>
|
||||||
domain-specific searches. For example, it was initially suggested by a
|
|
||||||
user performing searches among historical documents: the synonyms file
|
|
||||||
would contains nicknames and aliases for each of the persons of
|
|
||||||
interest.</para>
|
|
||||||
|
|
||||||
<para>In practise, synonym groups are defined inside ordinary text
|
<para>Example:
|
||||||
files. Each line in the file defines a group. Example:
|
|
||||||
<programlisting>
|
<programlisting>
|
||||||
hi hello "good morning"
|
hi hello "good morning"
|
||||||
|
|
||||||
|
@ -3091,26 +3098,37 @@ hi hello "good morning"
|
||||||
bye goodbye "see you" \
|
bye goodbye "see you" \
|
||||||
"au revoir"
|
"au revoir"
|
||||||
</programlisting>
|
</programlisting>
|
||||||
As usual lines beginning with a <literal>#</literal> are comments,
|
</para>
|
||||||
|
|
||||||
|
<para>As usual, lines beginning with a <literal>#</literal> are comments,
|
||||||
empty lines are ignored, and lines can be continued by ending them with
|
empty lines are ignored, and lines can be continued by ending them with
|
||||||
a backslash.
|
a backslash.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>The synonyms are searched for matches with user terms after these
|
|
||||||
are stem-expanded, but the contents of the synonyms file itself is not
|
|
||||||
subjected to stem expansion (1.22). This means that a match
|
|
||||||
will not be found if the form present in the synonyms file is not
|
|
||||||
present anywhere in the document set.</para>
|
|
||||||
|
|
||||||
<para>Multi-word synonyms are supported, but be aware that these will
|
<para>Multi-word synonyms are supported, but be aware that these will
|
||||||
generate phrase queries, which may degrade performance (and also, no
|
generate phrase queries, which may degrade performance and will disable
|
||||||
stemming).</para>
|
stemming expansion for the phrase terms.</para>
|
||||||
|
|
||||||
<para>A synonyms file can be specified in the GUI preferences, or as an
|
<para>The synonyms file can be specified in the <guilabel>Search
|
||||||
option to <command>recollq</command>.</para>
|
parameters</guilabel> tab of the <guilabel>GUI configuration</guilabel>
|
||||||
|
<guilabel>Preferences</guilabel> menu entry, or as an option for
|
||||||
|
command-line searches.</para>
|
||||||
|
|
||||||
|
<para>Once the file is defined, the use of synonyms can be enabled or
|
||||||
|
disabled directly from the <guilabel>Preferences</guilabel>
|
||||||
|
menu.</para>
|
||||||
|
|
||||||
<para>This feature is new in &RCL; 1.22 and will probably need to be
|
<para>The synonyms are searched for matches with user terms after the
|
||||||
refined after some user feedback.</para>
|
latter are stem-expanded, but the contents of the synonyms file itself
|
||||||
|
are not subjected to stem expansion. This means that a match will not be
|
||||||
|
found if the form present in the synonyms file is not present anywhere
|
||||||
|
in the document set.</para>
|
||||||
|
|
||||||
|
<para>The synonyms function is probably not going to help you find your
|
||||||
|
letters to Mr. Smith. It is best used for domain-specific searches. For
|
||||||
|
example, it was initially suggested by a user performing searches among
|
||||||
|
historical documents: the synonyms file would contain nicknames and
|
||||||
|
aliases for each of the persons of interest.</para>
|
||||||
|
|
||||||
|
|
||||||
</sect1>
|
</sect1>
|
||||||
|
|
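Note on the synonyms file format described in this change: the rules in the new text (one group per line, "#" comment lines, empty lines ignored, backslash continuation, quoted multi-word synonyms, optional expansion of a matching user term into an OR query) can be illustrated with a small Python sketch. This is not Recoll's implementation. The function names parse_synonym_groups and expand_term are hypothetical, exact case-sensitive matching is an assumption, and stem expansion is left out entirely.

```python
import re

# A token is either a double-quoted multi-word synonym or a bare word.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def _tokens(line):
    """Split one logical line into terms, dropping empty quoted strings."""
    return [q or w for q, w in _TOKEN.findall(line) if q or w]


def parse_synonym_groups(text):
    """Return a list of synonym groups, one per logical line (hypothetical helper)."""
    groups = []
    pending = ""  # text accumulated from lines ending with a backslash
    for raw in text.splitlines():
        stripped = raw.strip()
        # Comments and empty lines are skipped when not inside a continuation.
        if not pending and (not stripped or stripped.startswith("#")):
            continue
        line = pending + stripped
        if line.endswith("\\"):
            pending = line[:-1] + " "
            continue
        pending = ""
        terms = _tokens(line)
        if terms:
            groups.append(terms)
    # Tolerate a dangling continuation on the last line of the file.
    if pending.strip():
        terms = _tokens(pending)
        if terms:
            groups.append(terms)
    return groups


def expand_term(term, groups):
    """Expand a user term into an OR query over its group, if it has one.

    Multi-word synonyms are quoted so that they read as phrase searches.
    Assumption: matching is exact; no stemming or case folding is done here.
    """
    for group in groups:
        if term in group:
            return " OR ".join(f'"{t}"' if " " in t else t for t in group)
    return term


sample = (
    'hi hello "good morning"\n'
    "\n"
    "# a comment line\n"
    'bye goodbye "see you" \\\n'
    '"au revoir"\n'
)
groups = parse_synonym_groups(sample)
print(expand_term("hello", groups))
```

A term that belongs to no group is returned unchanged, which mirrors the manual's point that the feature mostly helps domain-specific searches where the groups have been prepared in advance.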