Jean-Francois Dockes 2015-08-06 08:02:47 +02:00
parent fdfcdbb47a
commit 8b3ea3e763
2 changed files with 19 additions and 17 deletions

@@ -121,10 +121,10 @@ subdirectory, because of all the places they're referred from
<p><a href="recoll-1.20.6.tar.gz">recoll-1.20.6.tar.gz</a>.</p> <p><a href="recoll-1.20.6.tar.gz">recoll-1.20.6.tar.gz</a>.</p>
<h3>Release 1.21.0</h3> <h3>Release 1.21.1</h3>
<p>Not the right choice if you are after complete stability: <p>Not the right choice if you are after complete stability:
<a href="recoll-1.21.0.tar.gz">recoll-1.21.0.tar.gz</a>. See what's <a href="recoll-1.21.1.tar.gz">recoll-1.21.1.tar.gz</a>. See what's
new in the <a href="release-1.21.html">release notes</a>.</p> new in the <a href="release-1.21.html">release notes</a>.</p>
<!-- <!--

@@ -7,12 +7,12 @@
== Introduction == Introduction
Recoll is a big process which executes many others, mostly for extracting The Recoll indexer, *recollindex*, is a big process which executes many
text from documents. Some of the executed processes are quite short-lived, others, mostly for extracting text from documents. Some of the executed
and the time used by the process execution machinery can actually dominate processes are quite short-lived, and the time used by the process execution
the time used to translate data. This document explores possible approaches machinery can actually dominate the time used to translate data. This
to improving performance without adding excessive complexity or damaging document explores possible approaches to improving performance without
reliability. adding excessive complexity or damaging reliability.
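
The claim that process execution overhead can dominate the useful work is easy to check roughly. The following is a minimal sketch, not part of Recoll: `time_spawns` and `TRIVIAL_CHILD` are hypothetical names, and the child is an interpreter that exits immediately, standing in for a short-lived text extractor. The measured figure includes child startup and exit, so it is only an upper bound on the cost of process creation itself.

```python
import subprocess
import sys
import time


def time_spawns(argv, count):
    """Return the average wall-clock seconds needed to start and reap one child."""
    start = time.perf_counter()
    for _ in range(count):
        # Each iteration pays the full process creation, exec and wait cost.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return (time.perf_counter() - start) / count


# Hypothetical stand-in for a short-lived extractor: it does no work at all.
TRIVIAL_CHILD = [sys.executable, "-c", "pass"]

if __name__ == "__main__":
    per_child = time_spawns(TRIVIAL_CHILD, 20)
    print(f"average cost per child: {per_child * 1000:.2f} ms")
```

Comparing this per-child figure with the time a real extractor spends on a small document gives a first idea of how much of the indexing time goes to the execution machinery rather than to translating data.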
Studying fork/exec performance is not exactly a new venture, and there are Studying fork/exec performance is not exactly a new venture, and there are
many texts which address the subject. While researching, though, I found many texts which address the subject. While researching, though, I found
@@ -32,9 +32,10 @@ identical processes.
space initialized from an executable file, inheriting some of the resources space initialized from an executable file, inheriting some of the resources
under various conditions. under various conditions.
As processes became bigger the copy-before-discard operation wasted This was all fine with the small processes of the first Unix systems, but
significant resources, and was optimized using two methods (at very as time progressed, processes became bigger and the copy-before-discard
different points in time): operation was found to waste significant resources. It was optimized using
two methods (at very different points in time):
- The first approach was to supplement +fork()+ with the +vfork()+ call, which - The first approach was to supplement +fork()+ with the +vfork()+ call, which
is similar but does not duplicate the address space: the new process is similar but does not duplicate the address space: the new process
@@ -176,7 +177,7 @@ a single thread, and +fork()+ if it ran multiple ones.
After another careful look at the code, I could see few issues with After another careful look at the code, I could see few issues with
using +vfork()+ in the multithreaded indexer, so this was committed. using +vfork()+ in the multithreaded indexer, so this was committed.
The only change necessary was to get rid on an implementation of the The only change necessary was to get rid of an implementation of the
lacking Linux +closefrom()+ call (used to close all open descriptors above a lacking Linux +closefrom()+ call (used to close all open descriptors above a
given value). The previous Recoll implementation listed the +/proc/self/fd+ given value). The previous Recoll implementation listed the +/proc/self/fd+
directory to look for open descriptors but this was unsafe because of directory to look for open descriptors but this was unsafe because of
@@ -200,13 +201,14 @@ same times as the +fork()+/+vfork()+ options.
The tests were performed on an Intel Core i5 750 (4 cores, 4 threads). The tests were performed on an Intel Core i5 750 (4 cores, 4 threads).
The last line is just for the fun: *recollindex* 1.18 (single-threaded)
needed almost 6 times as long to process the same files...
It would be painful to play it safe and discard the 60% reduction in It would be painful to play it safe and discard the 60% reduction in
execution time offered by using +vfork()+. execution time offered by using +vfork()+, so this was adopted for Recoll
1.21. To this day, no problems were discovered, but, still crossing
fingers...
To this day, no problems were discovered, but, still crossing fingers... The last line in the table is just for the fun: *recollindex* 1.18
(single-threaded) needed almost 6 times as long to process the same
files...
//// ////
Objections to vfork: Objections to vfork: