doc:added multithreading section
This commit is contained in:
parent dadf10d0ea
commit a1a2bbf952
3 changed files with 732 additions and 366 deletions
@@ -19,7 +19,7 @@ case $RCLVERS in
 1.14*) PPANAME=recoll-ppa;;
 *) PPANAME=recoll15-ppa;;
 esac
-PPANAME=recollexp-ppa
+#PPANAME=recollexp-ppa
 echo "PPA: $PPANAME. Type CR if Ok, else ^C"
 read rep

@@ -42,7 +42,7 @@ check_recoll_orig()
 debdir=debian
 # Note: no new releases for lucid: no webkit. Or use old debianrclqt4 dir.
 series="precise trusty utopic vivid wily xenial"
-series=trusty
+series=

 if test "X$series" != X ; then
 check_recoll_orig
@@ -141,7 +141,7 @@ done

 ### Unity Scope
 series="trusty utopic vivid wily xenial"
-series=
+series=xenial

 debdir=debianunityscope
 if test ! -d ${debdir}/ ; then

@@ -800,6 +800,103 @@ indexedmimetypes = application/pdf
 </sect2>


+
+
+
+<sect2 id="RCL.INDEXING.CONFIG.THREADS">
+<title>Indexing thread usage configuration GUI</title>
+
+<para>The &RCL; indexing process
+<command>recollindex</command> can use multiple threads to
+speed up indexing on multiprocessor systems. The work done
+to index files is divided into several stages, and some of the
+stages can be executed by multiple threads. The stages are:
+<orderedlist>
+<listitem>File system walking: this is always performed by
+the main thread.</listitem>
+<listitem>File conversion and data extraction.</listitem>
+<listitem>Text processing (splitting, stemming,
+etc.)</listitem>
+<listitem>&XAP; index update.</listitem>
+</orderedlist>
+</para>
+<para>You can also read a
+<ulink url="http://www.recoll.org/idxthreads/threadingRecoll.html">
+longer document</ulink> about the transformation of
+&RCL; indexing to multithreading.</para>
+
+<para>The threads configuration is controlled by two
+configuration file parameters.</para>
+
+<variablelist>
+
+<varlistentry><term><varname>thrQSizes</varname></term>
+<listitem><para>This variable defines the job input queues
+configuration. There are three possible queues for stages
+2, 3 and 4, and this parameter should give the queue depth
+for each stage (three integer values). If a value of -1 is
+used for a given stage, no queue is used, and the thread
+will go on performing the next stage. In practice, deep
+queues have not been shown to increase performance. A value
+of 0 for the first queue tells &RCL; to perform
+autoconfiguration (no need for anything else in this case,
+thrTCounts is not used) - this is the default
+configuration.</para>
+</listitem>
+</varlistentry>
+
+<varlistentry><term><varname>thrTCounts</varname></term>
+<listitem><para>This defines the number of threads used
+for each stage. If a value of -1 is used for one of
+the queue depths, the corresponding thread count is
+ignored. It makes no sense to use a value other than 1
+for the last stage because updating the &XAP; index is
+necessarily single-threaded (and protected by a
+mutex).</para>
+</listitem>
+</varlistentry>
+
+</variablelist>
+
+<para>The following example would use three queues (of depth 2),
+and 4 threads for converting source documents, 2 for
+processing their text, and one to update the index. This was
+tested to be the best configuration on the test system
+(quad-processor with multiple disks).
+<programlisting>
+thrQSizes = 2 2 2
+thrTCounts = 4 2 1
+</programlisting>
+</para>
+
+<para>The following example would use a single queue, and the
+complete processing for each document would be performed by
+a single thread (several documents will still be processed
+in parallel in most cases). The threads will use mutual
+exclusion when entering the index update stage. In practice
+the performance would be close to the previous case in
+general, but worse in certain cases (e.g. a Zip archive
+would be processed purely sequentially), so the previous
+approach is preferred. YMMV... The last two values for
+thrTCounts are ignored.
+<programlisting>
+thrQSizes = 2 -1 -1
+thrTCounts = 6 1 1
+</programlisting>
+</para>
+
+<para>The following example would disable
+multithreading. Indexing will be performed by a single
+thread.
+<programlisting>
+thrQSizes = -1 -1 -1
+</programlisting>
+</para>
+
+</sect2>
+
+
+
 <sect2 id="RCL.INDEXING.CONFIG.GUI">
 <title>The index configuration GUI</title>

File diff suppressed because it is too large