Jean-Francois Dockes 2015-08-06 08:26:39 +02:00
commit e9e1c6ea6d
14 changed files with 158 additions and 84 deletions

@ -1018,6 +1018,14 @@ Chapter 5. Installation and configuration
Maximum handler execution time, after which it is aborted. Some Maximum handler execution time, after which it is aborted. Some
postscript programs just loop... postscript programs just loop...
filtermaxmbytes
Recoll 1.20.7 and later. Maximum handler memory utilisation. This
uses setrlimit(RLIMIT_AS) on most systems (total virtual memory
space size limit). Some programs may start with 500 MBytes of
mapped shared libraries, so take this into account when choosing a
value. The default is a liberal 2000MB.
filtersdir filtersdir
A directory to search for the external input handler scripts used A directory to search for the external input handler scripts used
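The limit described for filtermaxmbytes can be illustrated with a small Python sketch. This is not Recoll's implementation, only an assumption-laden illustration of applying setrlimit(RLIMIT_AS) to a child process before it executes. The helper names `capped_soft_limit` and `run_limited` are hypothetical, and the sketch assumes that one MByte is 2**20 bytes.

```python
import resource
import subprocess
import sys

MB = 1024 * 1024  # assumption: one "MByte" is taken as 2**20 bytes


def capped_soft_limit(mbytes):
    """Return the soft RLIMIT_AS value to request, never above the hard limit."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    wanted = mbytes * MB
    if hard != resource.RLIM_INFINITY and wanted > hard:
        wanted = hard
    return wanted


def run_limited(cmd, mbytes=2000):
    """Run cmd with its total virtual address space capped (hypothetical helper)."""
    soft = capped_soft_limit(mbytes)

    def set_limit():
        # Runs in the child between fork() and exec(): only the soft limit
        # is changed, the hard limit is left as inherited.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(cmd, preexec_fn=set_limit,
                          capture_output=True, text=True)
```

Because RLIMIT_AS counts all mapped memory, including shared libraries, a value that looks generous can still be too small for a handler that maps several hundred MBytes at startup, which is why the documented default is liberal.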

@ -1858,25 +1858,23 @@ Chapter 3. Searching
third option has been available in recent releases and is probably now third option has been available in recent releases and is probably now
the best one: use PRE tags with line wrapping. the best one: use PRE tags with line wrapping.
o Use desktop preferences to choose document editor: if this is checked, o Choose editor applications: this opens a dialog which allows you to
the xdg-open utility will be used to open files when you click the select the application to be used to open each MIME type. The default
Open link in the result list, instead of the application defined in is normally to use the xdg-open utility, but you can override it.
mimeview. xdg-open will in term use your desktop preferences to choose
an appropriate application.
o Exceptions: when using the desktop preferences for opening documents, o Exceptions: even when xdg-open is used by default for opening
these are MIME types that will still be opened according to Recoll documents, you can set exceptions for MIME types that will still be
preferences. This is useful for passing parameters like page numbers opened according to Recoll preferences. This is useful for passing
or search strings to applications that support them (e.g. evince). parameters like page numbers or search strings to applications that
This cannot be done with xdg-open which only supports passing one support them (e.g. evince). This cannot be done with xdg-open which
parameter. only supports passing one parameter.
o Choose editor applications this will let you choose the command o Document filter choice style: this will let you choose if the document
started by the Open links inside the result list, for specific categories are displayed as a list, a set of buttons, or a menu.
document types.
o Display category filter as toolbar... this will let you choose if the o Start with simple search mode: this lets you choose the value of the
document categories are displayed as a list or a set of buttons. simple search type on program startup. Either a fixed value (e.g.
Query Language), or the value in use when the program last exited.
o Auto-start simple search on white space entry: if this is checked, a o Auto-start simple search on white space entry: if this is checked, a
search will be executed each time you enter a space in the simple search will be executed each time you enter a space in the simple
@ -2159,7 +2157,10 @@ Chapter 3. Searching
recollq is not built by default. You can use the Makefile in the query recollq is not built by default. You can use the Makefile in the query
directory to build it. This is a very simple program, and if you can directory to build it. This is a very simple program, and if you can
program a little c++, you may find it useful to tailor its output format program a little c++, you may find it useful to tailor its output format
to your needs. to your needs. Note that recollq is only really useful on systems where the
Qt libraries (or even the X11 ones) are not available. Otherwise, just use
recoll -t, which takes the exact same parameters and options which are
described for recollq.
recollq has a man page (not installed by default, look in the doc/man recollq has a man page (not installed by default, look in the doc/man
directory). The Usage string is as follows: directory). The Usage string is as follows:
@ -4286,6 +4287,14 @@ Chapter 5. Installation and configuration
Maximum handler execution time, after which it is aborted. Some Maximum handler execution time, after which it is aborted. Some
postscript programs just loop... postscript programs just loop...
filtermaxmbytes
Recoll 1.20.7 and later. Maximum handler memory utilisation. This
uses setrlimit(RLIMIT_AS) on most systems (total virtual memory
space size limit). Some programs may start with 500 MBytes of
mapped shared libraries, so take this into account when choosing a
value. The default is a liberal 2000MB.
filtersdir filtersdir
A directory to search for the external input handler scripts used A directory to search for the external input handler scripts used

@ -709,6 +709,12 @@ bool TextSplit::text_to_words(const string &in)
// confusing. // confusing.
// ie "MySQL manual" is matched by "MySQL manual" and // ie "MySQL manual" is matched by "MySQL manual" and
// "my sql manual" but not "mysql manual" // "my sql manual" but not "mysql manual"
// A possibility would be to emit both my and sql at the
// same position. All non-phrase searches would work, and
// both "MySQL manual" and "mysql manual" phrases would
// match too. "my sql manual" would not match, but this is
// not an issue.
case A_ULETTER: case A_ULETTER:
if (m_span.length() && if (m_span.length() &&
charclasses[(unsigned char)m_span[m_span.length() - 1]] == charclasses[(unsigned char)m_span[m_span.length() - 1]] ==

@ -2917,7 +2917,11 @@ MimeType=*/*
use the <filename>Makefile</filename> in the use the <filename>Makefile</filename> in the
<filename>query</filename> directory to build it. This is a very <filename>query</filename> directory to build it. This is a very
simple program, and if you can program a little c++, you may find it simple program, and if you can program a little c++, you may find it
useful to tailor its output format to your needs.</para> useful to tailor its output format to your needs. Note that recollq is
only really useful on systems where the Qt libraries (or even the X11
ones) are not available. Otherwise, just use <literal>recoll
-t</literal>, which takes the exact same parameters and options which
are described for <command>recollq</command>.</para>
<para><command>recollq</command> has a man page (not installed by <para><command>recollq</command> has a man page (not installed by
default, look in the <filename>doc/man</filename> directory). The default, look in the <filename>doc/man</filename> directory). The

@ -114,19 +114,26 @@ def main (args):
except getopt.GetoptError: except getopt.GetoptError:
error("error parsing input options\n") error("error parsing input options\n")
usage(exname) usage(exname)
return return False
status = True
try: try:
dumper = PPTDumper(args[0], globals.params) dumper = PPTDumper(args[0], globals.params)
if not dumper.dump(): if not dumper.dump():
error("ppt-dump: dump error " + args[0] + "\n") error("ppt-dump: dump error " + args[0] + "\n")
status = False
except: except:
error("ppt-dump: FAILURE (bad format?) " + args[0] + "\n") error("ppt-dump: FAILURE (bad format?) " + args[0] + "\n")
status = False
if globals.params.dumpText: if globals.params.dumpText:
print(globals.textdump.replace("\r", "\n")) print(globals.textdump.replace("\r", "\n"))
return(status)
if __name__ == '__main__': if __name__ == '__main__':
main(sys.argv) if main(sys.argv):
sys.exit(0)
else:
sys.exit(1)
# vim:set filetype=python shiftwidth=4 softtabstop=4 expandtab: # vim:set filetype=python shiftwidth=4 softtabstop=4 expandtab:

@ -28,9 +28,8 @@
#include <string> #include <string>
#include <iostream> #include <iostream>
#include <map> #include <map>
#ifndef NO_NAMESPACES
using namespace std; using namespace std;
#endif /* NO_NAMESPACES */
#include "cstr.h" #include "cstr.h"
#include "internfile.h" #include "internfile.h"
@ -550,6 +549,10 @@ bool FileInterner::dijontorcl(Rcl::Doc& doc)
// doc with an ipath, not the last one which is usually text/plain We // doc with an ipath, not the last one which is usually text/plain We
// also set the author and modification time from the last doc which // also set the author and modification time from the last doc which
// has them. // has them.
//
// The stack can contain objects with an ipath element (corresponding
// to actual embedded documents), and, at the top, elements without an
// ipath element, corresponding to format translations of the last doc.
// //
// The docsize is fetched from the first element without an ipath // The docsize is fetched from the first element without an ipath
// (first non container). If the last element directly returns // (first non container). If the last element directly returns
@ -579,7 +582,8 @@ void FileInterner::collectIpathAndMT(Rcl::Doc& doc) const
const map<string, string>& docdata = (*hit)->get_meta_data(); const map<string, string>& docdata = (*hit)->get_meta_data();
if (getKeyValue(docdata, cstr_dj_keyipath, ipathel)) { if (getKeyValue(docdata, cstr_dj_keyipath, ipathel)) {
if (!ipathel.empty()) { if (!ipathel.empty()) {
// We have a non-empty ipath // Non-empty ipath. This stack element is for an
// actual embedded document, not a format translation.
hasipath = true; hasipath = true;
getKeyValue(docdata, cstr_dj_keymt, doc.mimetype); getKeyValue(docdata, cstr_dj_keymt, doc.mimetype);
getKeyValue(docdata, cstr_dj_keyfn, doc.meta[Rcl::Doc::keyfn]); getKeyValue(docdata, cstr_dj_keyfn, doc.meta[Rcl::Doc::keyfn]);
@ -593,8 +597,18 @@ void FileInterner::collectIpathAndMT(Rcl::Doc& doc) const
getKeyValue(docdata, cstr_dj_keydocsize, doc.fbytes); getKeyValue(docdata, cstr_dj_keydocsize, doc.fbytes);
doc.ipath += cstr_isep; doc.ipath += cstr_isep;
} }
getKeyValue(docdata, cstr_dj_keyauthor, doc.meta[Rcl::Doc::keyau]); // We set the author field from the innermost doc which has
getKeyValue(docdata, cstr_dj_keymd, doc.dmtime); // one: allows finding, e.g. an image attachment having no
// metadata by a search on the sender name. Only do this for
// actually embedded documents (avoid replacing values from
// metacmds for the topmost one). For a topmost doc, author
// will be merged by dijontorcl() later on. About same for
// dmtime, but an external value will be replaced, not
// augmented if dijontorcl() finds an internal value.
if (hasipath) {
getKeyValue(docdata, cstr_dj_keyauthor, doc.meta[Rcl::Doc::keyau]);
getKeyValue(docdata, cstr_dj_keymd, doc.dmtime);
}
} }
// Trim empty tail elements in ipath. // Trim empty tail elements in ipath.
@ -878,12 +892,6 @@ FileInterner::Status FileInterner::internfile(Rcl::Doc& doc, const string& ipath
return FIAgain; return FIAgain;
} }
// Temporary while we fix backend things
static string urltolocalpath(string url)
{
return url.substr(7, string::npos);
}
bool FileInterner::tempFileForMT(TempFile& otemp, RclConfig* cnf, bool FileInterner::tempFileForMT(TempFile& otemp, RclConfig* cnf,
const string& mimetype) const string& mimetype)
{ {

@ -235,8 +235,11 @@ Usage(void)
int main(int argc, char **argv) int main(int argc, char **argv)
{ {
// If "-t" is present at all, we don't do the GUI thing and pass the // if we are named recollq or option "-t" is present at all, we
// whole to recollq for command line / pipe usage. // don't do the GUI thing and pass the whole to recollq for
// command line / pipe usage.
if (!strcmp(argv[0], "recollq"))
exit(recollq(&theconfig, argc, argv));
for (int i = 0; i < argc; i++) { for (int i = 0; i < argc; i++) {
if (!strcmp(argv[i], "-t")) { if (!strcmp(argv[i], "-t")) {
exit(recollq(&theconfig, argc, argv)); exit(recollq(&theconfig, argc, argv));
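The dispatch rule in this hunk (behave as recollq when invoked under that name or when "-t" appears anywhere on the command line) can be sketched in Python as follows. The function name `wants_command_line` is hypothetical. The `basename()` call is an assumption added for illustration: the C++ code in the diff compares argv[0] verbatim, so as written it would presumably match only when the program is started as plain "recollq" without a directory prefix.

```python
import os


def wants_command_line(argv):
    """Return True when the program should run in recollq (non-GUI) mode."""
    # Assumption for illustration: strip any directory part of argv[0].
    # The diff compares argv[0] to "recollq" without doing this.
    if os.path.basename(argv[0]) == "recollq":
        return True
    # As in the diff, every argument (argv[0] included) is checked for "-t".
    return "-t" in argv
```

For example, `wants_command_line(["recoll", "-t", "some query"])` selects command-line mode, while `wants_command_line(["recoll"])` leaves the GUI path in effect.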

@ -85,8 +85,7 @@ void RclMain::showFragButs()
connect(fragbuts, SIGNAL(fragmentsChanged()), connect(fragbuts, SIGNAL(fragmentsChanged()),
this, SLOT(onFragmentsChanged())); this, SLOT(onFragmentsChanged()));
} else { } else {
delete fragbuts; deleteZ(fragbuts);
fragbuts = 0;
} }
} else { } else {
// Close and reopen, in hope that makes us visible... // Close and reopen, in hope that makes us visible...

@ -279,6 +279,9 @@ void RclMain::init()
QKeySequence seq("Ctrl+Shift+s"); QKeySequence seq("Ctrl+Shift+s");
QShortcut *sc = new QShortcut(seq, this); QShortcut *sc = new QShortcut(seq, this);
connect(sc, SIGNAL (activated()), sSearch, SLOT (takeFocus())); connect(sc, SIGNAL (activated()), sSearch, SLOT (takeFocus()));
QKeySequence seql("Ctrl+l");
sc = new QShortcut(seql, this);
connect(sc, SIGNAL (activated()), sSearch, SLOT (takeFocus()));
connect(&m_watcher, SIGNAL(fileChanged(QString)), connect(&m_watcher, SIGNAL(fileChanged(QString)),
this, SLOT(idxStatus())); this, SLOT(idxStatus()));

@ -12,8 +12,15 @@
using namespace std; using namespace std;
// #define LOG_PARSER
#ifdef LOG_PARSER
#define LOGP(X) {cerr << X;}
#else
#define LOGP(X)
#endif
int yylex(yy::parser::semantic_type *, yy::parser::location_type *, int yylex(yy::parser::semantic_type *, yy::parser::location_type *,
WasaParserDriver *); WasaParserDriver *);
void yyerror(char const *); void yyerror(char const *);
static void qualify(Rcl::SearchDataClauseDist *, const string &); static void qualify(Rcl::SearchDataClauseDist *, const string &);
@ -46,8 +53,8 @@ static void addSubQuery(WasaParserDriver *d,
%type <sd> query %type <sd> query
%type <str> complexfieldname %type <str> complexfieldname
/* Non operator tokens need precedence because of the possibility of /* Non operator tokens need precedence because of the possibility of
concatenation which needs to have lower prec than OR */ concatenation which needs to have lower prec than OR */
%left <str> WORD %left <str> WORD
%left <str> QUOTED %left <str> QUOTED
%left <str> QUALIFIERS %left <str> QUALIFIERS
@ -60,13 +67,14 @@ static void addSubQuery(WasaParserDriver *d,
topquery: query topquery: query
{ {
LOGP("END PARSING\n");
d->m_result = $1; d->m_result = $1;
} }
query: query:
query query %prec UCONCAT query query %prec UCONCAT
{ {
//cerr << "q: query query" << endl; LOGP("q: query query\n");
Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang); Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang);
addSubQuery(d, sd, $1); addSubQuery(d, sd, $1);
addSubQuery(d, sd, $2); addSubQuery(d, sd, $2);
@ -74,7 +82,7 @@ query query %prec UCONCAT
} }
| query AND query | query AND query
{ {
//cerr << "q: query AND query" << endl; LOGP("q: query AND query\n");
Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang); Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang);
addSubQuery(d, sd, $1); addSubQuery(d, sd, $1);
addSubQuery(d, sd, $3); addSubQuery(d, sd, $3);
@ -82,7 +90,7 @@ query query %prec UCONCAT
} }
| query OR query | query OR query
{ {
//cerr << "q: query OR query" << endl; LOGP("q: query OR query\n");
Rcl::SearchData *top = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang); Rcl::SearchData *top = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang);
Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_OR, d->m_stemlang); Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_OR, d->m_stemlang);
addSubQuery(d, sd, $1); addSubQuery(d, sd, $1);
@ -92,13 +100,13 @@ query query %prec UCONCAT
} }
| '(' query ')' | '(' query ')'
{ {
//cerr << "q: ( query )" << endl; LOGP("q: ( query )\n");
$$ = $2; $$ = $2;
} }
| |
fieldexpr %prec UCONCAT fieldexpr %prec UCONCAT
{ {
//cerr << "q: fieldexpr" << endl; LOGP("q: fieldexpr\n");
Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang); Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang);
d->addClause(sd, $1); d->addClause(sd, $1);
$$ = sd; $$ = sd;
@ -107,12 +115,12 @@ fieldexpr %prec UCONCAT
fieldexpr: term fieldexpr: term
{ {
// cerr << "fe: simple fieldexpr: " << $1->gettext() << endl; LOGP("fe: simple fieldexpr: " << $1->gettext() << endl);
$$ = $1; $$ = $1;
} }
| complexfieldname EQUALS term | complexfieldname EQUALS term
{ {
// cerr << "fe: " << *$1 << " = " << $3->gettext() << endl; LOGP("fe: " << *$1 << " = " << $3->gettext() << endl);
$3->setfield(*$1); $3->setfield(*$1);
$3->setrel(Rcl::SearchDataClause::REL_EQUALS); $3->setrel(Rcl::SearchDataClause::REL_EQUALS);
$$ = $3; $$ = $3;
@ -120,7 +128,7 @@ fieldexpr: term
} }
| complexfieldname CONTAINS term | complexfieldname CONTAINS term
{ {
// cerr << "fe: " << *$1 << " : " << $3->gettext() << endl; LOGP("fe: " << *$1 << " : " << $3->gettext() << endl);
$3->setfield(*$1); $3->setfield(*$1);
$3->setrel(Rcl::SearchDataClause::REL_CONTAINS); $3->setrel(Rcl::SearchDataClause::REL_CONTAINS);
$$ = $3; $$ = $3;
@ -128,7 +136,7 @@ fieldexpr: term
} }
| complexfieldname SMALLER term | complexfieldname SMALLER term
{ {
// cerr << "fe: " << *$1 << " < " << $3->gettext() << endl; LOGP("fe: " << *$1 << " < " << $3->gettext() << endl);
$3->setfield(*$1); $3->setfield(*$1);
$3->setrel(Rcl::SearchDataClause::REL_LT); $3->setrel(Rcl::SearchDataClause::REL_LT);
$$ = $3; $$ = $3;
@ -136,7 +144,7 @@ fieldexpr: term
} }
| complexfieldname SMALLEREQ term | complexfieldname SMALLEREQ term
{ {
// cerr << "fe: " << *$1 << " <= " << $3->gettext() << endl; LOGP("fe: " << *$1 << " <= " << $3->gettext() << endl);
$3->setfield(*$1); $3->setfield(*$1);
$3->setrel(Rcl::SearchDataClause::REL_LTE); $3->setrel(Rcl::SearchDataClause::REL_LTE);
$$ = $3; $$ = $3;
@ -144,7 +152,7 @@ fieldexpr: term
} }
| complexfieldname GREATER term | complexfieldname GREATER term
{ {
// cerr << "fe: " << *$1 << " > " << $3->gettext() << endl; LOGP("fe: " << *$1 << " > " << $3->gettext() << endl);
$3->setfield(*$1); $3->setfield(*$1);
$3->setrel(Rcl::SearchDataClause::REL_GT); $3->setrel(Rcl::SearchDataClause::REL_GT);
$$ = $3; $$ = $3;
@ -152,7 +160,7 @@ fieldexpr: term
} }
| complexfieldname GREATEREQ term | complexfieldname GREATEREQ term
{ {
// cerr << "fe: " << *$1 << " >= " << $3->gettext() << endl; LOGP("fe: " << *$1 << " >= " << $3->gettext() << endl);
$3->setfield(*$1); $3->setfield(*$1);
$3->setrel(Rcl::SearchDataClause::REL_GTE); $3->setrel(Rcl::SearchDataClause::REL_GTE);
$$ = $3; $$ = $3;
@ -160,7 +168,7 @@ fieldexpr: term
} }
| '-' fieldexpr | '-' fieldexpr
{ {
// cerr << "fe: - fieldexpr[" << $2->gettext() << "]" << endl; LOGP("fe: - fieldexpr[" << $2->gettext() << "]" << endl);
$2->setexclude(true); $2->setexclude(true);
$$ = $2; $$ = $2;
} }
@ -170,13 +178,13 @@ fieldexpr: term
complexfieldname: complexfieldname:
WORD WORD
{ {
// cerr << "cfn: WORD" << endl; LOGP("cfn: WORD" << endl);
$$ = $1; $$ = $1;
} }
| |
complexfieldname CONTAINS WORD complexfieldname CONTAINS WORD
{ {
// cerr << "cfn: complexfieldname ':' WORD" << endl; LOGP("cfn: complexfieldname ':' WORD" << endl);
$$ = new string(*$1 + string(":") + *$3); $$ = new string(*$1 + string(":") + *$3);
delete $1; delete $1;
delete $3; delete $3;
@ -185,7 +193,7 @@ complexfieldname CONTAINS WORD
term: term:
WORD WORD
{ {
//cerr << "term[" << *$1 << "]" << endl; LOGP("term[" << *$1 << "]" << endl);
$$ = new Rcl::SearchDataClauseSimple(Rcl::SCLT_AND, *$1); $$ = new Rcl::SearchDataClauseSimple(Rcl::SCLT_AND, *$1);
delete $1; delete $1;
} }
@ -197,13 +205,13 @@ WORD
qualquote: qualquote:
QUOTED QUOTED
{ {
// cerr << "QUOTED[" << *$1 << "]" << endl; LOGP("QUOTED[" << *$1 << "]" << endl);
$$ = new Rcl::SearchDataClauseDist(Rcl::SCLT_PHRASE, *$1, 0); $$ = new Rcl::SearchDataClauseDist(Rcl::SCLT_PHRASE, *$1, 0);
delete $1; delete $1;
} }
| QUOTED QUALIFIERS | QUOTED QUALIFIERS
{ {
// cerr << "QUOTED[" << *$1 << "] QUALIFIERS[" << *$2 << "]" << endl; LOGP("QUOTED[" << *$1 << "] QUALIFIERS[" << *$2 << "]" << endl);
Rcl::SearchDataClauseDist *cl = Rcl::SearchDataClauseDist *cl =
new Rcl::SearchDataClauseDist(Rcl::SCLT_PHRASE, *$1, 0); new Rcl::SearchDataClauseDist(Rcl::SCLT_PHRASE, *$1, 0);
qualify(cl, *$2); qualify(cl, *$2);
@ -318,8 +326,9 @@ static int parseString(WasaParserDriver *d, yy::parser::semantic_type *yylval)
break; break;
case '"': case '"':
/* End of string. Look for qualifiers */ /* End of string. Look for qualifiers */
while ((c = d->GETCHAR()) && !isspace(c)) while ((c = d->GETCHAR()) && (isalnum(c) || c == '.'))
d->qualifiers().push_back(c); d->qualifiers().push_back(c);
d->UNGETCHAR(c);
goto out; goto out;
default: default:
value->push_back(c); value->push_back(c);
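The change to the qualifier scan can be sketched in Python. Before the change, every non-space character following the closing quote was taken as a qualifier, so a character such as ")" directly after the quote would be consumed. After the change, only alphanumerics and "." are collected and the first other character is left for the lexer. The function name `read_qualifiers` and the string-plus-index interface are hypothetical, chosen for illustration only; the original works on a character stream with GETCHAR() and UNGETCHAR().

```python
def read_qualifiers(text, pos):
    """Collect qualifier characters starting at pos, just after a closing quote.

    Returns (qualifiers, new_pos). new_pos points at the first character that
    is not part of the qualifiers, mirroring the UNGETCHAR() call in the diff.
    """
    start = pos
    while pos < len(text):
        ch = text[pos]
        # ASCII letters and digits, or '.', as in: isalnum(c) || c == '.'
        if not ((ch.isascii() and ch.isalnum()) or ch == "."):
            break
        pos += 1
    return text[start:pos], pos
```

With the input `"a b"p10)` and `pos` set just after the closing quote, the sketch collects `p10` and stops at the parenthesis, which stays available as the next token.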

@ -91,11 +91,11 @@ bool SearchData::maybeAddAutoPhrase(Rcl::Db& db, double freqThreshold)
string field; string field;
vector<string> words; vector<string> words;
// Walk the clause list. If we find any non simple clause or different // Walk the clause list. If this is not an AND list, or we find any
// field names, bail out. // non simple clause or different field names, bail out.
for (qlist_it_t it = m_query.begin(); it != m_query.end(); it++) { for (qlist_it_t it = m_query.begin(); it != m_query.end(); it++) {
SClType tp = (*it)->m_tp; SClType tp = (*it)->m_tp;
if (tp != SCLT_AND && tp != SCLT_OR) { if (tp != SCLT_AND) {
LOGDEB2(("SearchData::maybeAddAutoPhrase: wrong tp %d\n", tp)); LOGDEB2(("SearchData::maybeAddAutoPhrase: wrong tp %d\n", tp));
return false; return false;
} }

@ -121,10 +121,10 @@ subdirectory, because of all the places they're referred from
<p><a href="recoll-1.20.6.tar.gz">recoll-1.20.6.tar.gz</a>.</p> <p><a href="recoll-1.20.6.tar.gz">recoll-1.20.6.tar.gz</a>.</p>
<h3>Release 1.21.0</h3> <h3>Release 1.21.1</h3>
<p>Not the right choice if you are after complete stability: <p>Not the right choice if you are after complete stability:
<a href="recoll-1.21.0.tar.gz">recoll-1.21.0.tar.gz</a>. See what's <a href="recoll-1.21.1.tar.gz">recoll-1.21.1.tar.gz</a>. See what's
new in the <a href="release-1.21.html">release notes</a>.</p> new in the <a href="release-1.21.html">release notes</a>.</p>
<!-- <!--

@ -7,12 +7,12 @@
== Introduction == Introduction
Recoll is a big process which executes many others, mostly for extracting The Recoll indexer, *recollindex*, is a big process which executes many
text from documents. Some of the executed processes are quite short-lived, others, mostly for extracting text from documents. Some of the executed
and the time used by the process execution machinery can actually dominate processes are quite short-lived, and the time used by the process execution
the time used to translate data. This document explores possible approaches machinery can actually dominate the time used to translate data. This
to improving performance without adding excessive complexity or damaging document explores possible approaches to improving performance without
reliability. adding excessive complexity or damaging reliability.
Studying fork/exec performance is not exactly a new venture, and there are Studying fork/exec performance is not exactly a new venture, and there are
many texts which address the subject. While researching, though, I found many texts which address the subject. While researching, though, I found
@ -32,9 +32,10 @@ identical processes.
space initialized from an executable file, inheriting some of the resources space initialized from an executable file, inheriting some of the resources
under various conditions. under various conditions.
As processes became bigger the copy-before-discard operation wasted This was all fine with the small processes of the first Unix systems, but
significant resources, and was optimized using two methods (at very as time progressed, processes became bigger and the copy-before-discard
different points in time): operation was found to waste significant resources. It was optimized using
two methods (at very different points in time):
- The first approach was to supplement +fork()+ with the +vfork()+ call, which - The first approach was to supplement +fork()+ with the +vfork()+ call, which
is similar but does not duplicate the address space: the new process is similar but does not duplicate the address space: the new process
@ -176,7 +177,7 @@ a single thread, and +fork()+ if it ran multiple ones.
After another careful look at the code, I could see few issues with After another careful look at the code, I could see few issues with
using +vfork()+ in the multithreaded indexer, so this was committed. using +vfork()+ in the multithreaded indexer, so this was committed.
The only change necessary was to get rid on an implementation of the The only change necessary was to get rid of an implementation of the
lacking Linux +closefrom()+ call (used to close all open descriptors above a lacking Linux +closefrom()+ call (used to close all open descriptors above a
given value). The previous Recoll implementation listed the +/proc/self/fd+ given value). The previous Recoll implementation listed the +/proc/self/fd+
directory to look for open descriptors but this was unsafe because of directory to look for open descriptors but this was unsafe because of
@ -200,13 +201,14 @@ same times as the +fork()+/+vfork()+ options.
The tests were performed on an Intel Core i5 750 (4 cores, 4 threads). The tests were performed on an Intel Core i5 750 (4 cores, 4 threads).
The last line is just for the fun: *recollindex* 1.18 (single-threaded)
needed almost 6 times as long to process the same files...
It would be painful to play it safe and discard the 60% reduction in It would be painful to play it safe and discard the 60% reduction in
execution time offered by using +vfork()+. execution time offered by using +vfork()+, so this was adopted for Recoll
1.21. To this day, no problems were discovered, but, still crossing
fingers...
To this day, no problems were discovered, but, still crossing fingers... The last line in the table is just for fun: *recollindex* 1.18
(single-threaded) needed almost 6 times as long to process the same
files...
//// ////
Objections to vfork: Objections to vfork:

@ -1,7 +1,7 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html> <html>
<head> <head>
<title>Recoll 1.20 series release notes</title> <title>Recoll 1.21 series release notes</title>
<meta name="Author" content="Jean-Francois Dockes"> <meta name="Author" content="Jean-Francois Dockes">
<meta name="Description" <meta name="Description"
content="recoll is a simple full-text search system for unix and linux based on the powerful and mature xapian engine"> content="recoll is a simple full-text search system for unix and linux based on the powerful and mature xapian engine">
@ -23,7 +23,7 @@
</div> </div>
<div class="content"> <div class="content">
<h1>Release notes for Recoll 1.20.x</h1> <h1>Release notes for Recoll 1.21.x</h1>
<h2>Caveats</h2> <h2>Caveats</h2>
@ -55,8 +55,23 @@
see the manual</a>). If you do so, you must then reset the see the manual</a>). If you do so, you must then reset the
index.</p> index.</p>
<h2>Minor releases</h2>
<ul>
<li>1.21.1:
<ul>
<li>Force memory usage limits on external filters.</li>
<li>GUI: add Ctrl+l as a shortcut to return focus to the
search entry (compat with web browsers).</li>
<li>result list popup allows saving results from web cache
to files.</li>
<li>The web history indexer also processes non-html files
(e.g.: pdfs).</li>
</ul>
</li>
</ul>
<h2>Changes in Recoll 1.21</h2> <h2>Changes in Recoll 1.21.0</h2>
<ul> <ul>
<li>Allow saving queries to files and reloading them <li>Allow saving queries to files and reloading them
@ -71,9 +86,10 @@
<li>Improve indexing speed by always using vfork() for <li>Improve indexing speed by always using vfork() for
spawning external commands.</li> spawning external commands.</li>
<li>The pdf filter gains the capability to run OCR (tesseract) on <li>The pdf filter gains the capability to run OCR (tesseract) on
image-only files.</li> image-only files. This happens automatically on image-only
<li>Improved check about when we should try to uncompress pdfs if tesseract is available.</li>
stuff. Will eliminate some of the most dreadful case of <li>Improved checks about when we should try to uncompress
stuff. Will eliminate some of the most dreadful cases of
recollindex having an impact on system performance.</li> recollindex having an impact on system performance.</li>
<li>Warn if non-existent paths are listed in the configuration <li>Warn if non-existent paths are listed in the configuration
file (help with typos).</li> file (help with typos).</li>