merge
commit
e9e1c6ea6d
14 changed files with 158 additions and 84 deletions
|
@ -1018,6 +1018,14 @@ Chapter 5. Installation and configuration
|
||||||
Maximum handler execution time, after which it is aborted. Some
|
Maximum handler execution time, after which it is aborted. Some
|
||||||
postscript programs just loop...
|
postscript programs just loop...
|
||||||
|
|
||||||
|
filtermaxmbytes
|
||||||
|
|
||||||
|
Recoll 1.20.7 and later. Maximum handler memory utilisation. This
|
||||||
|
uses setrlimit(RLIMIT_AS) on most systems (total virtual memory
|
||||||
|
space size limit). Some programs may start with 500 MBytes of
|
||||||
|
mapped shared libraries, so take this into account when choosing a
|
||||||
|
value. The default is a liberal 2000MB.
|
||||||
|
|
||||||
filtersdir
|
filtersdir
|
||||||
|
|
||||||
A directory to search for the external input handler scripts used
|
A directory to search for the external input handler scripts used
|
||||||
|
|
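The filtermaxmbytes entry above says the limit is applied with setrlimit(RLIMIT_AS). As a rough illustration only (this is not Recoll's code), the same idea can be sketched with Python's standard resource module. The names mbytes_to_bytes, make_limiter and run_handler are hypothetical, and treating one MByte as 1024 * 1024 bytes is an assumption.

```python
import resource
import subprocess


def mbytes_to_bytes(mbytes):
    # Assumption: one MByte is 1024 * 1024 bytes.
    return mbytes * 1024 * 1024


def make_limiter(max_mbytes):
    """Return a function that caps the address space of the calling process."""
    wanted = mbytes_to_bytes(max_mbytes)

    def limit():
        soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        # Never ask for more than the existing hard limit allows.
        if hard != resource.RLIM_INFINITY:
            new_soft = min(wanted, hard)
        else:
            new_soft = wanted
        resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))

    return limit


def run_handler(cmd, max_mbytes=2000):
    # preexec_fn runs in the child between fork and exec (POSIX only),
    # so the limit applies to the handler and not to the caller.
    return subprocess.run(cmd, preexec_fn=make_limiter(max_mbytes))
```

Because RLIMIT_AS counts all mapped virtual memory, including shared libraries, a value that looks generous can still be too small for a handler that maps several hundred MBytes at startup.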
43
src/README
|
@ -1858,25 +1858,23 @@ Chapter 3. Searching
|
||||||
third option has been available in recent releases and is probably now
|
third option has been available in recent releases and is probably now
|
||||||
the best one: use PRE tags with line wrapping.
|
the best one: use PRE tags with line wrapping.
|
||||||
|
|
||||||
o Use desktop preferences to choose document editor: if this is checked,
|
o Choose editor applications: this opens a dialog which allows you to
|
||||||
the xdg-open utility will be used to open files when you click the
|
select the application to be used to open each MIME type. The default
|
||||||
Open link in the result list, instead of the application defined in
|
is normally to use the xdg-open utility, but you can override it.
|
||||||
mimeview. xdg-open will in term use your desktop preferences to choose
|
|
||||||
an appropriate application.
|
|
||||||
|
|
||||||
o Exceptions: when using the desktop preferences for opening documents,
|
o Exceptions: even when xdg-open is used by default for opening
|
||||||
these are MIME types that will still be opened according to Recoll
|
documents, you can set exceptions for MIME types that will still be
|
||||||
preferences. This is useful for passing parameters like page numbers
|
opened according to Recoll preferences. This is useful for passing
|
||||||
or search strings to applications that support them (e.g. evince).
|
parameters like page numbers or search strings to applications that
|
||||||
This cannot be done with xdg-open which only supports passing one
|
support them (e.g. evince). This cannot be done with xdg-open which
|
||||||
parameter.
|
only supports passing one parameter.
|
||||||
|
|
||||||
o Choose editor applications this will let you choose the command
|
o Document filter choice style: this will let you choose if the document
|
||||||
started by the Open links inside the result list, for specific
|
categories are displayed as a list or a set of buttons, or a menu.
|
||||||
document types.
|
|
||||||
|
|
||||||
o Display category filter as toolbar... this will let you choose if the
|
o Start with simple search mode: this lets you choose the value of the
|
||||||
document categories are displayed as a list or a set of buttons.
|
simple search type on program startup. Either a fixed value (e.g.
|
||||||
|
Query Language, or the value in use when the program last exited.
|
||||||
|
|
||||||
o Auto-start simple search on white space entry: if this is checked, a
|
o Auto-start simple search on white space entry: if this is checked, a
|
||||||
search will be executed each time you enter a space in the simple
|
search will be executed each time you enter a space in the simple
|
||||||
|
@ -2159,7 +2157,10 @@ Chapter 3. Searching
|
||||||
recollq is not built by default. You can use the Makefile in the query
|
recollq is not built by default. You can use the Makefile in the query
|
||||||
directory to build it. This is a very simple program, and if you can
|
directory to build it. This is a very simple program, and if you can
|
||||||
program a little c++, you may find it useful to tailor its output format
|
program a little c++, you may find it useful to tailor its output format
|
||||||
to your needs.
|
to your needs. Note that recollq is only really useful on systems where the
|
||||||
|
Qt libraries (or even the X11 ones) are not available. Otherwise, just use
|
||||||
|
recoll -t, which takes the exact same parameters and options which are
|
||||||
|
described for recollq.
|
||||||
|
|
||||||
recollq has a man page (not installed by default, look in the doc/man
|
recollq has a man page (not installed by default, look in the doc/man
|
||||||
directory). The Usage string is as follows:
|
directory). The Usage string is as follows:
|
||||||
|
@ -4286,6 +4287,14 @@ Chapter 5. Installation and configuration
|
||||||
Maximum handler execution time, after which it is aborted. Some
|
Maximum handler execution time, after which it is aborted. Some
|
||||||
postscript programs just loop...
|
postscript programs just loop...
|
||||||
|
|
||||||
|
filtermaxmbytes
|
||||||
|
|
||||||
|
Recoll 1.20.7 and later. Maximum handler memory utilisation. This
|
||||||
|
uses setrlimit(RLIMIT_AS) on most systems (total virtual memory
|
||||||
|
space size limit). Some programs may start with 500 MBytes of
|
||||||
|
mapped shared libraries, so take this into account when choosing a
|
||||||
|
value. The default is a liberal 2000MB.
|
||||||
|
|
||||||
filtersdir
|
filtersdir
|
||||||
|
|
||||||
A directory to search for the external input handler scripts used
|
A directory to search for the external input handler scripts used
|
||||||
|
|
|
@ -709,6 +709,12 @@ bool TextSplit::text_to_words(const string &in)
|
||||||
// confusing.
|
// confusing.
|
||||||
// ie "MySQL manual" is matched by "MySQL manual" and
|
// ie "MySQL manual" is matched by "MySQL manual" and
|
||||||
// "my sql manual" but not "mysql manual"
|
// "my sql manual" but not "mysql manual"
|
||||||
|
|
||||||
|
// A possibility would be to emit both my and sql at the
|
||||||
|
// same position. All non-phrase searches would work, and
|
||||||
|
// both "MySQL manual" and "mysql manual" phrases would
|
||||||
|
// match too. "my sql manual" would not match, but this is
|
||||||
|
// not an issue.
|
||||||
case A_ULETTER:
|
case A_ULETTER:
|
||||||
if (m_span.length() &&
|
if (m_span.length() &&
|
||||||
charclasses[(unsigned char)m_span[m_span.length() - 1]] ==
|
charclasses[(unsigned char)m_span[m_span.length() - 1]] ==
|
||||||
|
|
|
@ -2917,7 +2917,11 @@ MimeType=*/*
|
||||||
use the <filename>Makefile</filename> in the
|
use the <filename>Makefile</filename> in the
|
||||||
<filename>query</filename> directory to build it. This is a very
|
<filename>query</filename> directory to build it. This is a very
|
||||||
simple program, and if you can program a little c++, you may find it
|
simple program, and if you can program a little c++, you may find it
|
||||||
useful to taylor its output format to your needs.</para>
|
useful to tailor its output format to your needs. Note that recollq is
|
||||||
|
only really useful on systems where the Qt libraries (or even the X11
|
||||||
|
ones) are not available. Otherwise, just use <literal>recoll
|
||||||
|
-t</literal>, which takes the exact same parameters and options which
|
||||||
|
are described for <command>recollq</command>.</para>
|
||||||
|
|
||||||
<para><command>recollq</command> has a man page (not installed by
|
<para><command>recollq</command> has a man page (not installed by
|
||||||
default, look in the <filename>doc/man</filename> directory). The
|
default, look in the <filename>doc/man</filename> directory). The
|
||||||
|
|
|
@ -114,19 +114,26 @@ def main (args):
|
||||||
except getopt.GetoptError:
|
except getopt.GetoptError:
|
||||||
error("error parsing input options\n")
|
error("error parsing input options\n")
|
||||||
usage(exname)
|
usage(exname)
|
||||||
return
|
return False
|
||||||
|
|
||||||
|
status = True
|
||||||
try:
|
try:
|
||||||
dumper = PPTDumper(args[0], globals.params)
|
dumper = PPTDumper(args[0], globals.params)
|
||||||
if not dumper.dump():
|
if not dumper.dump():
|
||||||
error("ppt-dump: dump error " + args[0] + "\n")
|
error("ppt-dump: dump error " + args[0] + "\n")
|
||||||
|
status = False
|
||||||
except:
|
except:
|
||||||
error("ppt-dump: FAILURE (bad format?) " + args[0] + "\n")
|
error("ppt-dump: FAILURE (bad format?) " + args[0] + "\n")
|
||||||
|
status = False
|
||||||
|
|
||||||
if globals.params.dumpText:
|
if globals.params.dumpText:
|
||||||
print(globals.textdump.replace("\r", "\n"))
|
print(globals.textdump.replace("\r", "\n"))
|
||||||
|
return(status)
|
||||||
|
|
||||||
if __name__ == '__main__':
|
if __name__ == '__main__':
|
||||||
main(sys.argv)
|
if main(sys.argv):
|
||||||
|
sys.exit(0)
|
||||||
|
else:
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
# vim:set filetype=python shiftwidth=4 softtabstop=4 expandtab:
|
# vim:set filetype=python shiftwidth=4 softtabstop=4 expandtab:
|
||||||
|
|
|
@ -28,9 +28,8 @@
|
||||||
#include <string>
|
#include <string>
|
||||||
#include <iostream>
|
#include <iostream>
|
||||||
#include <map>
|
#include <map>
|
||||||
#ifndef NO_NAMESPACES
|
|
||||||
using namespace std;
|
using namespace std;
|
||||||
#endif /* NO_NAMESPACES */
|
|
||||||
|
|
||||||
#include "cstr.h"
|
#include "cstr.h"
|
||||||
#include "internfile.h"
|
#include "internfile.h"
|
||||||
|
@ -550,6 +549,10 @@ bool FileInterner::dijontorcl(Rcl::Doc& doc)
|
||||||
// doc with an ipath, not the last one which is usually text/plain We
|
// doc with an ipath, not the last one which is usually text/plain We
|
||||||
// also set the author and modification time from the last doc which
|
// also set the author and modification time from the last doc which
|
||||||
// has them.
|
// has them.
|
||||||
|
//
|
||||||
|
// The stack can contain objects with an ipath element (corresponding
|
||||||
|
// to actual embedded documents), and, at the top, elements without an
|
||||||
|
// ipath element, corresponding to format translations of the last doc.
|
||||||
//
|
//
|
||||||
// The docsize is fetched from the first element without an ipath
|
// The docsize is fetched from the first element without an ipath
|
||||||
// (first non container). If the last element directly returns
|
// (first non container). If the last element directly returns
|
||||||
|
@ -579,7 +582,8 @@ void FileInterner::collectIpathAndMT(Rcl::Doc& doc) const
|
||||||
const map<string, string>& docdata = (*hit)->get_meta_data();
|
const map<string, string>& docdata = (*hit)->get_meta_data();
|
||||||
if (getKeyValue(docdata, cstr_dj_keyipath, ipathel)) {
|
if (getKeyValue(docdata, cstr_dj_keyipath, ipathel)) {
|
||||||
if (!ipathel.empty()) {
|
if (!ipathel.empty()) {
|
||||||
// We have a non-empty ipath
|
// Non-empty ipath. This stack element is for an
|
||||||
|
// actual embedded document, not a format translation.
|
||||||
hasipath = true;
|
hasipath = true;
|
||||||
getKeyValue(docdata, cstr_dj_keymt, doc.mimetype);
|
getKeyValue(docdata, cstr_dj_keymt, doc.mimetype);
|
||||||
getKeyValue(docdata, cstr_dj_keyfn, doc.meta[Rcl::Doc::keyfn]);
|
getKeyValue(docdata, cstr_dj_keyfn, doc.meta[Rcl::Doc::keyfn]);
|
||||||
|
@ -593,8 +597,18 @@ void FileInterner::collectIpathAndMT(Rcl::Doc& doc) const
|
||||||
getKeyValue(docdata, cstr_dj_keydocsize, doc.fbytes);
|
getKeyValue(docdata, cstr_dj_keydocsize, doc.fbytes);
|
||||||
doc.ipath += cstr_isep;
|
doc.ipath += cstr_isep;
|
||||||
}
|
}
|
||||||
getKeyValue(docdata, cstr_dj_keyauthor, doc.meta[Rcl::Doc::keyau]);
|
// We set the author field from the innermost doc which has
|
||||||
getKeyValue(docdata, cstr_dj_keymd, doc.dmtime);
|
// one: allows finding, e.g. an image attachment having no
|
||||||
|
// metadata by a search on the sender name. Only do this for
|
||||||
|
// actually embedded documents (avoid replacing values from
|
||||||
|
// metacmds for the topmost one). For a topmost doc, author
|
||||||
|
// will be merged by dijontorcl() later on. About same for
|
||||||
|
// dmtime, but an external value will be replaced, not
|
||||||
|
// augmented if dijontorcl() finds an internal value.
|
||||||
|
if (hasipath) {
|
||||||
|
getKeyValue(docdata, cstr_dj_keyauthor, doc.meta[Rcl::Doc::keyau]);
|
||||||
|
getKeyValue(docdata, cstr_dj_keymd, doc.dmtime);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Trim empty tail elements in ipath.
|
// Trim empty tail elements in ipath.
|
||||||
|
@ -878,12 +892,6 @@ FileInterner::Status FileInterner::internfile(Rcl::Doc& doc, const string& ipath
|
||||||
return FIAgain;
|
return FIAgain;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Temporary while we fix backend things
|
|
||||||
static string urltolocalpath(string url)
|
|
||||||
{
|
|
||||||
return url.substr(7, string::npos);
|
|
||||||
}
|
|
||||||
|
|
||||||
bool FileInterner::tempFileForMT(TempFile& otemp, RclConfig* cnf,
|
bool FileInterner::tempFileForMT(TempFile& otemp, RclConfig* cnf,
|
||||||
const string& mimetype)
|
const string& mimetype)
|
||||||
{
|
{
|
||||||
|
|
|
@ -235,8 +235,11 @@ Usage(void)
|
||||||
|
|
||||||
int main(int argc, char **argv)
|
int main(int argc, char **argv)
|
||||||
{
|
{
|
||||||
// If "-t" is present at all, we don't do the GUI thing and pass the
|
// if we are named recollq or option "-t" is present at all, we
|
||||||
// whole to recollq for command line / pipe usage.
|
// don't do the GUI thing and pass the whole to recollq for
|
||||||
|
// command line / pipe usage.
|
||||||
|
if (!strcmp(argv[0], "recollq"))
|
||||||
|
exit(recollq(&theconfig, argc, argv));
|
||||||
for (int i = 0; i < argc; i++) {
|
for (int i = 0; i < argc; i++) {
|
||||||
if (!strcmp(argv[i], "-t")) {
|
if (!strcmp(argv[i], "-t")) {
|
||||||
exit(recollq(&theconfig, argc, argv));
|
exit(recollq(&theconfig, argc, argv));
|
||||||
|
|
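The dispatch rule in the hunk above (run as recollq when the program is named recollq, or when "-t" appears anywhere on the command line) can be sketched as follows. This is an illustration, not the project's code: wants_command_line is a hypothetical name, and comparing only the last path component of argv[0] is an assumption added here, since the diff compares argv[0] to "recollq" directly.

```python
import os.path


def wants_command_line(argv):
    """Return True when the GUI should be skipped in favour of recollq."""
    # Assumption: look at the basename so that an invocation through a
    # full path such as /usr/bin/recollq is also recognised.
    if argv and os.path.basename(argv[0]) == "recollq":
        return True
    # Like the loop in the diff, this scans every argument, argv[0] included.
    return "-t" in argv
```

For example, wants_command_line(["recoll", "-t", "some", "query"]) selects the command line path, while wants_command_line(["recoll", "some", "query"]) does not.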
|
@ -85,8 +85,7 @@ void RclMain::showFragButs()
|
||||||
connect(fragbuts, SIGNAL(fragmentsChanged()),
|
connect(fragbuts, SIGNAL(fragmentsChanged()),
|
||||||
this, SLOT(onFragmentsChanged()));
|
this, SLOT(onFragmentsChanged()));
|
||||||
} else {
|
} else {
|
||||||
delete fragbuts;
|
deleteZ(fragbuts);
|
||||||
fragbuts = 0;
|
|
||||||
}
|
}
|
||||||
} else {
|
} else {
|
||||||
// Close and reopen, in hope that makes us visible...
|
// Close and reopen, in hope that makes us visible...
|
||||||
|
|
|
@ -279,6 +279,9 @@ void RclMain::init()
|
||||||
QKeySequence seq("Ctrl+Shift+s");
|
QKeySequence seq("Ctrl+Shift+s");
|
||||||
QShortcut *sc = new QShortcut(seq, this);
|
QShortcut *sc = new QShortcut(seq, this);
|
||||||
connect(sc, SIGNAL (activated()), sSearch, SLOT (takeFocus()));
|
connect(sc, SIGNAL (activated()), sSearch, SLOT (takeFocus()));
|
||||||
|
QKeySequence seql("Ctrl+l");
|
||||||
|
sc = new QShortcut(seql, this);
|
||||||
|
connect(sc, SIGNAL (activated()), sSearch, SLOT (takeFocus()));
|
||||||
|
|
||||||
connect(&m_watcher, SIGNAL(fileChanged(QString)),
|
connect(&m_watcher, SIGNAL(fileChanged(QString)),
|
||||||
this, SLOT(idxStatus()));
|
this, SLOT(idxStatus()));
|
||||||
|
|
|
@ -12,8 +12,15 @@
|
||||||
|
|
||||||
using namespace std;
|
using namespace std;
|
||||||
|
|
||||||
|
// #define LOG_PARSER
|
||||||
|
#ifdef LOG_PARSER
|
||||||
|
#define LOGP(X) {cerr << X;}
|
||||||
|
#else
|
||||||
|
#define LOGP(X)
|
||||||
|
#endif
|
||||||
|
|
||||||
int yylex(yy::parser::semantic_type *, yy::parser::location_type *,
|
int yylex(yy::parser::semantic_type *, yy::parser::location_type *,
|
||||||
WasaParserDriver *);
|
WasaParserDriver *);
|
||||||
void yyerror(char const *);
|
void yyerror(char const *);
|
||||||
static void qualify(Rcl::SearchDataClauseDist *, const string &);
|
static void qualify(Rcl::SearchDataClauseDist *, const string &);
|
||||||
|
|
||||||
|
@ -46,8 +53,8 @@ static void addSubQuery(WasaParserDriver *d,
|
||||||
%type <sd> query
|
%type <sd> query
|
||||||
%type <str> complexfieldname
|
%type <str> complexfieldname
|
||||||
|
|
||||||
/* Non operator tokens need precedence because of the possibility of
|
/* Non operator tokens need precedence because of the possibility of
|
||||||
concatenation which needs to have lower prec than OR */
|
concatenation which needs to have lower prec than OR */
|
||||||
%left <str> WORD
|
%left <str> WORD
|
||||||
%left <str> QUOTED
|
%left <str> QUOTED
|
||||||
%left <str> QUALIFIERS
|
%left <str> QUALIFIERS
|
||||||
|
@ -60,13 +67,14 @@ static void addSubQuery(WasaParserDriver *d,
|
||||||
|
|
||||||
topquery: query
|
topquery: query
|
||||||
{
|
{
|
||||||
|
LOGP("END PARSING\n");
|
||||||
d->m_result = $1;
|
d->m_result = $1;
|
||||||
}
|
}
|
||||||
|
|
||||||
query:
|
query:
|
||||||
query query %prec UCONCAT
|
query query %prec UCONCAT
|
||||||
{
|
{
|
||||||
//cerr << "q: query query" << endl;
|
LOGP("q: query query\n");
|
||||||
Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang);
|
Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang);
|
||||||
addSubQuery(d, sd, $1);
|
addSubQuery(d, sd, $1);
|
||||||
addSubQuery(d, sd, $2);
|
addSubQuery(d, sd, $2);
|
||||||
|
@ -74,7 +82,7 @@ query query %prec UCONCAT
|
||||||
}
|
}
|
||||||
| query AND query
|
| query AND query
|
||||||
{
|
{
|
||||||
//cerr << "q: query AND query" << endl;
|
LOGP("q: query AND query\n");
|
||||||
Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang);
|
Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang);
|
||||||
addSubQuery(d, sd, $1);
|
addSubQuery(d, sd, $1);
|
||||||
addSubQuery(d, sd, $3);
|
addSubQuery(d, sd, $3);
|
||||||
|
@ -82,7 +90,7 @@ query query %prec UCONCAT
|
||||||
}
|
}
|
||||||
| query OR query
|
| query OR query
|
||||||
{
|
{
|
||||||
//cerr << "q: query OR query" << endl;
|
LOGP("q: query OR query\n");
|
||||||
Rcl::SearchData *top = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang);
|
Rcl::SearchData *top = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang);
|
||||||
Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_OR, d->m_stemlang);
|
Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_OR, d->m_stemlang);
|
||||||
addSubQuery(d, sd, $1);
|
addSubQuery(d, sd, $1);
|
||||||
|
@ -92,13 +100,13 @@ query query %prec UCONCAT
|
||||||
}
|
}
|
||||||
| '(' query ')'
|
| '(' query ')'
|
||||||
{
|
{
|
||||||
//cerr << "q: ( query )" << endl;
|
LOGP("q: ( query )\n");
|
||||||
$$ = $2;
|
$$ = $2;
|
||||||
}
|
}
|
||||||
|
|
|
|
||||||
fieldexpr %prec UCONCAT
|
fieldexpr %prec UCONCAT
|
||||||
{
|
{
|
||||||
//cerr << "q: fieldexpr" << endl;
|
LOGP("q: fieldexpr\n");
|
||||||
Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang);
|
Rcl::SearchData *sd = new Rcl::SearchData(Rcl::SCLT_AND, d->m_stemlang);
|
||||||
d->addClause(sd, $1);
|
d->addClause(sd, $1);
|
||||||
$$ = sd;
|
$$ = sd;
|
||||||
|
@ -107,12 +115,12 @@ fieldexpr %prec UCONCAT
|
||||||
|
|
||||||
fieldexpr: term
|
fieldexpr: term
|
||||||
{
|
{
|
||||||
// cerr << "fe: simple fieldexpr: " << $1->gettext() << endl;
|
LOGP("fe: simple fieldexpr: " << $1->gettext() << endl);
|
||||||
$$ = $1;
|
$$ = $1;
|
||||||
}
|
}
|
||||||
| complexfieldname EQUALS term
|
| complexfieldname EQUALS term
|
||||||
{
|
{
|
||||||
// cerr << "fe: " << *$1 << " = " << $3->gettext() << endl;
|
LOGP("fe: " << *$1 << " = " << $3->gettext() << endl);
|
||||||
$3->setfield(*$1);
|
$3->setfield(*$1);
|
||||||
$3->setrel(Rcl::SearchDataClause::REL_EQUALS);
|
$3->setrel(Rcl::SearchDataClause::REL_EQUALS);
|
||||||
$$ = $3;
|
$$ = $3;
|
||||||
|
@ -120,7 +128,7 @@ fieldexpr: term
|
||||||
}
|
}
|
||||||
| complexfieldname CONTAINS term
|
| complexfieldname CONTAINS term
|
||||||
{
|
{
|
||||||
// cerr << "fe: " << *$1 << " : " << $3->gettext() << endl;
|
LOGP("fe: " << *$1 << " : " << $3->gettext() << endl);
|
||||||
$3->setfield(*$1);
|
$3->setfield(*$1);
|
||||||
$3->setrel(Rcl::SearchDataClause::REL_CONTAINS);
|
$3->setrel(Rcl::SearchDataClause::REL_CONTAINS);
|
||||||
$$ = $3;
|
$$ = $3;
|
||||||
|
@ -128,7 +136,7 @@ fieldexpr: term
|
||||||
}
|
}
|
||||||
| complexfieldname SMALLER term
|
| complexfieldname SMALLER term
|
||||||
{
|
{
|
||||||
// cerr << "fe: " << *$1 << " < " << $3->gettext() << endl;
|
LOGP("fe: " << *$1 << " < " << $3->gettext() << endl);
|
||||||
$3->setfield(*$1);
|
$3->setfield(*$1);
|
||||||
$3->setrel(Rcl::SearchDataClause::REL_LT);
|
$3->setrel(Rcl::SearchDataClause::REL_LT);
|
||||||
$$ = $3;
|
$$ = $3;
|
||||||
|
@ -136,7 +144,7 @@ fieldexpr: term
|
||||||
}
|
}
|
||||||
| complexfieldname SMALLEREQ term
|
| complexfieldname SMALLEREQ term
|
||||||
{
|
{
|
||||||
// cerr << "fe: " << *$1 << " <= " << $3->gettext() << endl;
|
LOGP("fe: " << *$1 << " <= " << $3->gettext() << endl);
|
||||||
$3->setfield(*$1);
|
$3->setfield(*$1);
|
||||||
$3->setrel(Rcl::SearchDataClause::REL_LTE);
|
$3->setrel(Rcl::SearchDataClause::REL_LTE);
|
||||||
$$ = $3;
|
$$ = $3;
|
||||||
|
@ -144,7 +152,7 @@ fieldexpr: term
|
||||||
}
|
}
|
||||||
| complexfieldname GREATER term
|
| complexfieldname GREATER term
|
||||||
{
|
{
|
||||||
// cerr << "fe: " << *$1 << " > " << $3->gettext() << endl;
|
LOGP("fe: " << *$1 << " > " << $3->gettext() << endl);
|
||||||
$3->setfield(*$1);
|
$3->setfield(*$1);
|
||||||
$3->setrel(Rcl::SearchDataClause::REL_GT);
|
$3->setrel(Rcl::SearchDataClause::REL_GT);
|
||||||
$$ = $3;
|
$$ = $3;
|
||||||
|
@ -152,7 +160,7 @@ fieldexpr: term
|
||||||
}
|
}
|
||||||
| complexfieldname GREATEREQ term
|
| complexfieldname GREATEREQ term
|
||||||
{
|
{
|
||||||
// cerr << "fe: " << *$1 << " >= " << $3->gettext() << endl;
|
LOGP("fe: " << *$1 << " >= " << $3->gettext() << endl);
|
||||||
$3->setfield(*$1);
|
$3->setfield(*$1);
|
||||||
$3->setrel(Rcl::SearchDataClause::REL_GTE);
|
$3->setrel(Rcl::SearchDataClause::REL_GTE);
|
||||||
$$ = $3;
|
$$ = $3;
|
||||||
|
@ -160,7 +168,7 @@ fieldexpr: term
|
||||||
}
|
}
|
||||||
| '-' fieldexpr
|
| '-' fieldexpr
|
||||||
{
|
{
|
||||||
// cerr << "fe: - fieldexpr[" << $2->gettext() << "]" << endl;
|
LOGP("fe: - fieldexpr[" << $2->gettext() << "]" << endl);
|
||||||
$2->setexclude(true);
|
$2->setexclude(true);
|
||||||
$$ = $2;
|
$$ = $2;
|
||||||
}
|
}
|
||||||
|
@ -170,13 +178,13 @@ fieldexpr: term
|
||||||
complexfieldname:
|
complexfieldname:
|
||||||
WORD
|
WORD
|
||||||
{
|
{
|
||||||
// cerr << "cfn: WORD" << endl;
|
LOGP("cfn: WORD" << endl);
|
||||||
$$ = $1;
|
$$ = $1;
|
||||||
}
|
}
|
||||||
|
|
|
|
||||||
complexfieldname CONTAINS WORD
|
complexfieldname CONTAINS WORD
|
||||||
{
|
{
|
||||||
// cerr << "cfn: complexfieldname ':' WORD" << endl;
|
LOGP("cfn: complexfieldname ':' WORD" << endl);
|
||||||
$$ = new string(*$1 + string(":") + *$3);
|
$$ = new string(*$1 + string(":") + *$3);
|
||||||
delete $1;
|
delete $1;
|
||||||
delete $3;
|
delete $3;
|
||||||
|
@ -185,7 +193,7 @@ complexfieldname CONTAINS WORD
|
||||||
term:
|
term:
|
||||||
WORD
|
WORD
|
||||||
{
|
{
|
||||||
//cerr << "term[" << *$1 << "]" << endl;
|
LOGP("term[" << *$1 << "]" << endl);
|
||||||
$$ = new Rcl::SearchDataClauseSimple(Rcl::SCLT_AND, *$1);
|
$$ = new Rcl::SearchDataClauseSimple(Rcl::SCLT_AND, *$1);
|
||||||
delete $1;
|
delete $1;
|
||||||
}
|
}
|
||||||
|
@ -197,13 +205,13 @@ WORD
|
||||||
qualquote:
|
qualquote:
|
||||||
QUOTED
|
QUOTED
|
||||||
{
|
{
|
||||||
// cerr << "QUOTED[" << *$1 << "]" << endl;
|
LOGP("QUOTED[" << *$1 << "]" << endl);
|
||||||
$$ = new Rcl::SearchDataClauseDist(Rcl::SCLT_PHRASE, *$1, 0);
|
$$ = new Rcl::SearchDataClauseDist(Rcl::SCLT_PHRASE, *$1, 0);
|
||||||
delete $1;
|
delete $1;
|
||||||
}
|
}
|
||||||
| QUOTED QUALIFIERS
|
| QUOTED QUALIFIERS
|
||||||
{
|
{
|
||||||
// cerr << "QUOTED[" << *$1 << "] QUALIFIERS[" << *$2 << "]" << endl;
|
LOGP("QUOTED[" << *$1 << "] QUALIFIERS[" << *$2 << "]" << endl);
|
||||||
Rcl::SearchDataClauseDist *cl =
|
Rcl::SearchDataClauseDist *cl =
|
||||||
new Rcl::SearchDataClauseDist(Rcl::SCLT_PHRASE, *$1, 0);
|
new Rcl::SearchDataClauseDist(Rcl::SCLT_PHRASE, *$1, 0);
|
||||||
qualify(cl, *$2);
|
qualify(cl, *$2);
|
||||||
|
@ -318,8 +326,9 @@ static int parseString(WasaParserDriver *d, yy::parser::semantic_type *yylval)
|
||||||
break;
|
break;
|
||||||
case '"':
|
case '"':
|
||||||
/* End of string. Look for qualifiers */
|
/* End of string. Look for qualifiers */
|
||||||
while ((c = d->GETCHAR()) && !isspace(c))
|
while ((c = d->GETCHAR()) && (isalnum(c) || c == '.'))
|
||||||
d->qualifiers().push_back(c);
|
d->qualifiers().push_back(c);
|
||||||
|
d->UNGETCHAR(c);
|
||||||
goto out;
|
goto out;
|
||||||
default:
|
default:
|
||||||
value->push_back(c);
|
value->push_back(c);
|
||||||
|
|
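The parseString() change above stops collecting qualifiers at the first character that is neither alphanumeric nor a dot, then pushes that character back for the lexer. A minimal sketch of that scanning rule, assuming the input is held in a plain string and using the hypothetical name read_qualifiers (Python's str.isalnum() is also broader than C's isalnum() for non-ASCII input):

```python
def read_qualifiers(text, pos):
    """Collect qualifier characters starting at text[pos].

    Returns (qualifiers, next_pos). next_pos points at the first character
    that was not consumed, which plays the role of the UNGETCHAR() call.
    """
    end = pos
    while end < len(text) and (text[end].isalnum() or text[end] == "."):
        end += 1
    return text[pos:end], end
```

With the previous "anything up to white space" rule, a closing parenthesis directly after the quote would have been collected as part of the qualifiers; here read_qualifiers("p5) OR x", 0) stops before the parenthesis.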
|
@ -91,11 +91,11 @@ bool SearchData::maybeAddAutoPhrase(Rcl::Db& db, double freqThreshold)
|
||||||
|
|
||||||
string field;
|
string field;
|
||||||
vector<string> words;
|
vector<string> words;
|
||||||
// Walk the clause list. If we find any non simple clause or different
|
// Walk the clause list. If this is not an AND list, or we find any
|
||||||
// field names, bail out.
|
// non simple clause or different field names, bail out.
|
||||||
for (qlist_it_t it = m_query.begin(); it != m_query.end(); it++) {
|
for (qlist_it_t it = m_query.begin(); it != m_query.end(); it++) {
|
||||||
SClType tp = (*it)->m_tp;
|
SClType tp = (*it)->m_tp;
|
||||||
if (tp != SCLT_AND && tp != SCLT_OR) {
|
if (tp != SCLT_AND) {
|
||||||
LOGDEB2(("SearchData::maybeAddAutoPhrase: wrong tp %d\n", tp));
|
LOGDEB2(("SearchData::maybeAddAutoPhrase: wrong tp %d\n", tp));
|
||||||
return false;
|
return false;
|
||||||
}
|
}
|
||||||
|
|
|
@ -121,10 +121,10 @@ subdirectory, because of all the places they're referred from
|
||||||
|
|
||||||
<p><a href="recoll-1.20.6.tar.gz">recoll-1.20.6.tar.gz</a>.</p>
|
<p><a href="recoll-1.20.6.tar.gz">recoll-1.20.6.tar.gz</a>.</p>
|
||||||
|
|
||||||
<h3>Release 1.21.0</h3>
|
<h3>Release 1.21.1</h3>
|
||||||
|
|
||||||
<p>Not the right choice if you are after complete stability:
|
<p>Not the right choice if you are after complete stability:
|
||||||
<a href="recoll-1.21.0.tar.gz">recoll-1.21.0.tar.gz</a>. See what's
|
<a href="recoll-1.21.1.tar.gz">recoll-1.21.1.tar.gz</a>. See what's
|
||||||
new in the <a href="release-1.21.html">release notes</a>.</p>
|
new in the <a href="release-1.21.html">release notes</a>.</p>
|
||||||
|
|
||||||
<!--
|
<!--
|
||||||
|
|
|
@ -7,12 +7,12 @@
|
||||||
|
|
||||||
== Introduction
|
== Introduction
|
||||||
|
|
||||||
Recoll is a big process which executes many others, mostly for extracting
|
The Recoll indexer, *recollindex*, is a big process which executes many
|
||||||
text from documents. Some of the executed processes are quite short-lived,
|
others, mostly for extracting text from documents. Some of the executed
|
||||||
and the time used by the process execution machinery can actually dominate
|
processes are quite short-lived, and the time used by the process execution
|
||||||
the time used to translate data. This document explores possible approaches
|
machinery can actually dominate the time used to translate data. This
|
||||||
to improving performance without adding excessive complexity or damaging
|
document explores possible approaches to improving performance without
|
||||||
reliability.
|
adding excessive complexity or damaging reliability.
|
||||||
|
|
||||||
Studying fork/exec performance is not exactly a new venture, and there are
|
Studying fork/exec performance is not exactly a new venture, and there are
|
||||||
many texts which address the subject. While researching, though, I found
|
many texts which address the subject. While researching, though, I found
|
||||||
|
@ -32,9 +32,10 @@ identical processes.
|
||||||
space initialized from an executable file, inheriting some of the resources
|
space initialized from an executable file, inheriting some of the resources
|
||||||
under various conditions.
|
under various conditions.
|
||||||
|
|
||||||
As processes became bigger the copy-before-discard operation wasted
|
This was all fine with the small processes of the first Unix systems, but
|
||||||
significant resources, and was optimized using two methods (at very
|
as time progressed, processes became bigger and the copy-before-discard
|
||||||
different points in time):
|
operation was found to waste significant resources. It was optimized using
|
||||||
|
two methods (at very different points in time):
|
||||||
|
|
||||||
- The first approach was to supplement +fork()+ with the +vfork()+ call, which
|
- The first approach was to supplement +fork()+ with the +vfork()+ call, which
|
||||||
is similar but does not duplicate the address space: the new process
|
is similar but does not duplicate the address space: the new process
|
||||||
|
@ -176,7 +177,7 @@ a single thread, and +fork()+ if it ran multiple ones.
|
||||||
After another careful look at the code, I could see few issues with
|
After another careful look at the code, I could see few issues with
|
||||||
using +vfork()+ in the multithreaded indexer, so this was committed.
|
using +vfork()+ in the multithreaded indexer, so this was committed.
|
||||||
|
|
||||||
The only change necessary was to get rid on an implementation of the
|
The only change necessary was to get rid of an implementation of the
|
||||||
lacking Linux +closefrom()+ call (used to close all open descriptors above a
|
lacking Linux +closefrom()+ call (used to close all open descriptors above a
|
||||||
given value). The previous Recoll implementation listed the +/proc/self/fd+
|
given value). The previous Recoll implementation listed the +/proc/self/fd+
|
||||||
directory to look for open descriptors but this was unsafe because of
|
directory to look for open descriptors but this was unsafe because of
|
||||||
|
@ -200,13 +201,14 @@ same times as the +fork()+/+vfork()+ options.
|
||||||
|
|
||||||
The tests were performed on an Intel Core i5 750 (4 cores, 4 threads).
|
The tests were performed on an Intel Core i5 750 (4 cores, 4 threads).
|
||||||
|
|
||||||
The last line is just for the fun: *recollindex* 1.18 (single-threaded)
|
|
||||||
needed almost 6 times as long to process the same files...
|
|
||||||
|
|
||||||
It would be painful to play it safe and discard the 60% reduction in
|
It would be painful to play it safe and discard the 60% reduction in
|
||||||
execution time offered by using +vfork()+.
|
execution time offered by using +vfork()+, so this was adopted for Recoll
|
||||||
|
1.21. To this day, no problems have been discovered, but still crossing
|
||||||
|
fingers...
|
||||||
|
|
||||||
To this day, no problems were discovered, but, still crossing fingers...
|
The last line in the table is just for fun: *recollindex* 1.18
|
||||||
|
(single-threaded) needed almost 6 times as long to process the same
|
||||||
|
files...
|
||||||
|
|
||||||
////
|
////
|
||||||
Objections to vfork:
|
Objections to vfork:
|
||||||
|
|
|
@ -1,7 +1,7 @@
|
||||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
||||||
<html>
|
<html>
|
||||||
<head>
|
<head>
|
||||||
<title>Recoll 1.20 series release notes</title>
|
<title>Recoll 1.21 series release notes</title>
|
||||||
<meta name="Author" content="Jean-Francois Dockes">
|
<meta name="Author" content="Jean-Francois Dockes">
|
||||||
<meta name="Description"
|
<meta name="Description"
|
||||||
content="recoll is a simple full-text search system for unix and linux based on the powerful and mature xapian engine">
|
content="recoll is a simple full-text search system for unix and linux based on the powerful and mature xapian engine">
|
||||||
|
@ -23,7 +23,7 @@
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
<div class="content">
|
<div class="content">
|
||||||
<h1>Release notes for Recoll 1.20.x</h1>
|
<h1>Release notes for Recoll 1.21.x</h1>
|
||||||
|
|
||||||
<h2>Caveats</h2>
|
<h2>Caveats</h2>
|
||||||
|
|
||||||
|
@ -55,8 +55,23 @@
|
||||||
see the manual</a>). If you do so, you must then reset the
|
see the manual</a>). If you do so, you must then reset the
|
||||||
index.</p>
|
index.</p>
|
||||||
|
|
||||||
|
<h2>Minor releases</h2>
|
||||||
|
<ul>
|
||||||
|
<li>1.21.1:
|
||||||
|
<ul>
|
||||||
|
<li>Force memory usage limits on external filters.</li>
|
||||||
|
<li>GUI: add Ctrl+l as a shortcut to return focus to the
|
||||||
|
search entry (compat with web browsers).</li>
|
||||||
|
<li>result list popup allows saving results from web cache
|
||||||
|
to files.</li>
|
||||||
|
<li>The web history indexer also processes non-html files
|
||||||
|
(e.g.: pdfs).</li>
|
||||||
|
|
||||||
|
</ul>
|
||||||
|
</li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
<h2>Changes in Recoll 1.21</h2>
|
<h2>Changes in Recoll 1.21.0</h2>
|
||||||
|
|
||||||
<ul>
|
<ul>
|
||||||
<li>Allow saving queries to files and reloading them
|
<li>Allow saving queries to files and reloading them
|
||||||
|
@ -71,9 +86,10 @@
|
||||||
<li>Improve indexing speed by always using vfork() for
|
<li>Improve indexing speed by always using vfork() for
|
||||||
spawning external commands.</li>
|
spawning external commands.</li>
|
||||||
<li>The pdf filter gains the capability to run OCR (tesseract) on
|
<li>The pdf filter gains the capability to run OCR (tesseract) on
|
||||||
image-only files.</li>
|
image-only files. This happens automatically on image-only
|
||||||
<li>Improved check about when we should try to uncompress
|
pdfs if tesseract is available.</li>
|
||||||
stuff. Will eliminate some of the most dreadful case of
|
<li>Improved checks about when we should try to uncompress
|
||||||
|
stuff. Will eliminate some of the most dreadful cases of
|
||||||
recollindex having an impact on system performance.</li>
|
recollindex having an impact on system performance.</li>
|
||||||
<li>Warn if non-existent paths are listed in the configuration
|
<li>Warn if non-existent paths are listed in the configuration
|
||||||
file (help with typos).</li>
|
file (help with typos).</li>
|
||||||
|
|