doc:added multithreading section
This commit is contained in:
parent
dadf10d0ea
commit
a1a2bbf952
3 changed files with 732 additions and 366 deletions
|
@ -19,7 +19,7 @@ case $RCLVERS in
|
||||||
1.14*) PPANAME=recoll-ppa;;
|
1.14*) PPANAME=recoll-ppa;;
|
||||||
*) PPANAME=recoll15-ppa;;
|
*) PPANAME=recoll15-ppa;;
|
||||||
esac
|
esac
|
||||||
PPANAME=recollexp-ppa
|
#PPANAME=recollexp-ppa
|
||||||
echo "PPA: $PPANAME. Type CR if Ok, else ^C"
|
echo "PPA: $PPANAME. Type CR if Ok, else ^C"
|
||||||
read rep
|
read rep
|
||||||
|
|
||||||
|
@ -42,7 +42,7 @@ check_recoll_orig()
|
||||||
debdir=debian
|
debdir=debian
|
||||||
# Note: no new releases for lucid: no webkit. Or use old debianrclqt4 dir.
|
# Note: no new releases for lucid: no webkit. Or use old debianrclqt4 dir.
|
||||||
series="precise trusty utopic vivid wily xenial"
|
series="precise trusty utopic vivid wily xenial"
|
||||||
series=trusty
|
series=
|
||||||
|
|
||||||
if test "X$series" != X ; then
|
if test "X$series" != X ; then
|
||||||
check_recoll_orig
|
check_recoll_orig
|
||||||
|
@ -141,7 +141,7 @@ done
|
||||||
|
|
||||||
### Unity Scope
|
### Unity Scope
|
||||||
series="trusty utopic vivid wily xenial"
|
series="trusty utopic vivid wily xenial"
|
||||||
series=
|
series=xenial
|
||||||
|
|
||||||
debdir=debianunityscope
|
debdir=debianunityscope
|
||||||
if test ! -d ${debdir}/ ; then
|
if test ! -d ${debdir}/ ; then
|
||||||
|
|
|
@ -800,6 +800,103 @@ indexedmimetypes = application/pdf
|
||||||
</sect2>
|
</sect2>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
<sect2 id="RCL.INDEXING.CONFIG.THREADS">
|
||||||
|
<title>Indexing threads configuration</title>
|
||||||
|
|
||||||
|
<para>The &RCL; indexing process
|
||||||
|
<command>recollindex</command> can use multiple threads to
|
||||||
|
speed up indexing on multiprocessor systems. The work done
|
||||||
|
to index files is divided into several stages, and some of the
|
||||||
|
stages can be executed by multiple threads. The stages are:
|
||||||
|
<orderedlist>
|
||||||
|
<listitem>File system walking: this is always performed by
|
||||||
|
the main thread.</listitem>
|
||||||
|
<listitem>File conversion and data extraction.</listitem>
|
||||||
|
<listitem>Text processing (splitting, stemming,
|
||||||
|
etc.)</listitem>
|
||||||
|
<listitem>&XAP; index update.</listitem>
|
||||||
|
</orderedlist>
|
||||||
|
</para>
|
||||||
|
<para>You can also read a
|
||||||
|
<ulink url="http://www.recoll.org/idxthreads/threadingRecoll.html">
|
||||||
|
longer document</ulink> about the transformation of
|
||||||
|
&RCL; indexing to multithreading.</para>
|
||||||
|
|
||||||
|
<para>The threads configuration is controlled by two
|
||||||
|
configuration file parameters.</para>
|
||||||
|
|
||||||
|
<variablelist>
|
||||||
|
|
||||||
|
<varlistentry><term><varname>thrQSizes</varname></term>
|
||||||
|
<listitem><para>This variable defines the job input queues
|
||||||
|
configuration. There are three possible queues for stages
|
||||||
|
2, 3 and 4, and this parameter should give the queue depth
|
||||||
|
for each stage (three integer values). If a value of -1 is
|
||||||
|
used for a given stage, no queue is used, and the thread
|
||||||
|
will go on performing the next stage. In practice, deep
|
||||||
|
queues have not been shown to increase performance. A value
|
||||||
|
of 0 for the first queue tells &RCL; to perform
|
||||||
|
autoconfiguration (no need for anything else in this case,
|
||||||
|
thrTCounts is not used) - this is the default
|
||||||
|
configuration.</para>
|
||||||
|
</listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
|
<varlistentry><term><varname>thrTCounts</varname></term>
|
||||||
|
<listitem><para>This defines the number of threads used
|
||||||
|
for each stage. If a value of -1 is used for one of
|
||||||
|
the queue depths, the corresponding thread count is
|
||||||
|
ignored. It makes no sense to use a value other than 1
|
||||||
|
for the last stage because updating the &XAP; index is
|
||||||
|
necessarily single-threaded (and protected by a
|
||||||
|
mutex).</para>
|
||||||
|
</listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
|
</variablelist>
|
||||||
|
|
||||||
|
<para>The following example would use three queues (of depth 2),
|
||||||
|
and 4 threads for converting source documents, 2 for
|
||||||
|
processing their text, and one to update the index. This was
|
||||||
|
tested to be the best configuration on the test system
|
||||||
|
(quad-processor with multiple disks).
|
||||||
|
<programlisting>
|
||||||
|
thrQSizes = 2 2 2
|
||||||
|
thrTCounts = 4 2 1
|
||||||
|
</programlisting>
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>The following example would use a single queue, and the
|
||||||
|
complete processing for each document would be performed by
|
||||||
|
a single thread (several documents will still be processed
|
||||||
|
in parallel in most cases). The threads will use mutual
|
||||||
|
exclusion when entering the index update stage. In practice
|
||||||
|
the performance would be close to the previous case in
|
||||||
|
general, but worse in certain cases (e.g. a Zip archive
|
||||||
|
would be processed purely sequentially), so the previous
|
||||||
|
approach is preferred. YMMV... The last two values for
|
||||||
|
thrTCounts are ignored.
|
||||||
|
<programlisting>
|
||||||
|
thrQSizes = 2 -1 -1
|
||||||
|
thrTCounts = 6 1 1
|
||||||
|
</programlisting>
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>The following example would disable
|
||||||
|
multithreading. Indexing will be performed by a single
|
||||||
|
thread.
|
||||||
|
<programlisting>
|
||||||
|
thrQSizes = -1 -1 -1
|
||||||
|
</programlisting>
|
||||||
|
</para>
|
||||||
|
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
<sect2 id="RCL.INDEXING.CONFIG.GUI">
|
<sect2 id="RCL.INDEXING.CONFIG.GUI">
|
||||||
<title>The index configuration GUI</title>
|
<title>The index configuration GUI</title>
|
||||||
|
|
||||||
|
|
File diff suppressed because it is too large
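
The rules the new section gives for thrQSizes and thrTCounts can be restated as a small Python sketch. This is an illustration of the documented semantics only, not Recoll's actual parsing code; the names plan_threads, StagePlan and STAGES are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Stages 2, 3 and 4 from the documentation; stage 1 (file system
# walking) always runs in the main thread and has no queue.
STAGES = ["conversion", "text processing", "index update"]


@dataclass
class StagePlan:
    name: str
    queue_depth: Optional[int]  # None: no queue in front of this stage
    threads: Optional[int]      # None: the thread count is ignored


def plan_threads(thr_q_sizes: str, thr_t_counts: str = "") -> Optional[List[StagePlan]]:
    """Interpret the two parameters as the section describes them.

    Returns None when autoconfiguration is requested (first queue depth 0),
    otherwise one StagePlan per stage. Hypothetical helper, for illustration.
    """
    depths = [int(v) for v in thr_q_sizes.split()]
    if not depths:
        raise ValueError("thrQSizes is empty")
    if depths[0] == 0:
        return None  # autoconfiguration: nothing else is needed
    if len(depths) != 3:
        raise ValueError("thrQSizes needs three integer values")

    counts = [int(v) for v in thr_t_counts.split()]
    plan = []
    for i, (name, depth) in enumerate(zip(STAGES, depths)):
        if depth == -1:
            # No queue: the previous stage's thread carries on into this
            # stage, so the matching thrTCounts value is ignored.
            plan.append(StagePlan(name, None, None))
        else:
            if i >= len(counts):
                raise ValueError(f"thrTCounts has no value for stage '{name}'")
            plan.append(StagePlan(name, depth, counts[i]))
    return plan
```

Calling plan_threads("2 -1 -1", "6 1 1") yields a plan with one queue served by six threads and no queue for the last two stages, which matches the second example in the added section.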