doc:added multithreading section

Jean-Francois Dockes 2016-05-26 10:19:46 +02:00
parent dadf10d0ea
commit a1a2bbf952
3 changed files with 732 additions and 366 deletions


@@ -19,7 +19,7 @@ case $RCLVERS in
1.14*) PPANAME=recoll-ppa;;
*) PPANAME=recoll15-ppa;;
esac
PPANAME=recollexp-ppa
#PPANAME=recollexp-ppa
echo "PPA: $PPANAME. Type CR if Ok, else ^C"
read rep
@@ -42,7 +42,7 @@ check_recoll_orig()
debdir=debian
# Note: no new releases for lucid: no webkit. Or use old debianrclqt4 dir.
series="precise trusty utopic vivid wily xenial"
series=trusty
series=
if test "X$series" != X ; then
check_recoll_orig
@@ -141,7 +141,7 @@ done
### Unity Scope
series="trusty utopic vivid wily xenial"
series=
series=xenial
debdir=debianunityscope
if test ! -d ${debdir}/ ; then


@@ -800,6 +800,103 @@ indexedmimetypes = application/pdf
</sect2>
<sect2 id="RCL.INDEXING.CONFIG.THREADS">
<title>Indexing thread usage configuration</title>
<para>The &RCL; indexing process
<command>recollindex</command> can use multiple threads to
speed up indexing on multiprocessor systems. The work done
to index files is divided into four stages, some of which
can be executed by multiple threads. The stages are:
<orderedlist>
<listitem>File system walking: this is always performed by
the main thread.</listitem>
<listitem>File conversion and data extraction.</listitem>
<listitem>Text processing (splitting, stemming,
etc.)</listitem>
<listitem>&XAP; index update.</listitem>
</orderedlist>
</para>
<para>You can also read a
<ulink url="http://www.recoll.org/idxthreads/threadingRecoll.html">
longer document</ulink> about the transformation of
&RCL; indexing to multithreading.</para>
<para>The threads configuration is controlled by two
configuration file parameters.</para>
<variablelist>
<varlistentry><term><varname>thrQSizes</varname></term>
<listitem><para>This variable defines the job input queues
configuration. There are three possible queues, for stages
2, 3 and 4, and this parameter should give the queue depth
for each stage (three integer values). If a value of -1 is
used for a given stage, no queue is used for it, and the
thread which performed the previous stage goes on to perform
this one. In practice, deep queues have not been shown to
increase performance. A value of 0 for the first queue tells
&RCL; to perform autoconfiguration: nothing else needs to be
set in this case, and <varname>thrTCounts</varname> is not
used. This is the default configuration.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>thrTCounts</varname></term>
<listitem><para>This defines the number of threads used
for each stage. If a value of -1 is used for one of
the queue depths, the corresponding thread count is
ignored. It makes no sense to use a value other than 1
for the last stage because updating the &XAP; index is
necessarily single-threaded (and protected by a
mutex).</para>
</listitem>
</varlistentry>
</variablelist>
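<para>As a minimal illustration of the autoconfiguration case
described above, a single line such as the following should be
enough to request it. This assumes that a lone 0 is accepted
without the two other queue depths, as the parameter description
suggests. Since autoconfiguration is the default, setting it
explicitly should only be needed to override a value set
elsewhere.
<programlisting>
thrQSizes = 0
</programlisting>
</para>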
<para>The following example would use three queues (of depth 2),
with 4 threads for converting source documents, 2 for
processing their text, and one to update the index. In testing,
this was the best configuration on the test system (a
four-processor machine with multiple disks).
<programlisting>
thrQSizes = 2 2 2
thrTCounts = 4 2 1
</programlisting>
</para>
<para>The following example would use a single queue, and the
complete processing for each document would be performed by
a single thread (several documents will still be processed
in parallel in most cases). The threads will use mutual
exclusion when entering the index update stage. In practice,
performance would generally be close to the previous case,
but worse in certain cases (e.g. a Zip archive would be
processed purely sequentially), so the previous approach is
preferred. Your mileage may vary. The last two values for
<varname>thrTCounts</varname> are ignored.
<programlisting>
thrQSizes = 2 -1 -1
thrTCounts = 6 1 1
</programlisting>
</para>
<para>The following example would disable
multithreading. Indexing will be performed by a single
thread.
<programlisting>
thrQSizes = -1 -1 -1
</programlisting>
</para>
</sect2>
<sect2 id="RCL.INDEXING.CONFIG.GUI">
<title>The index configuration GUI</title>

File diff suppressed because it is too large