Use structured comments in recoll.conf and use them to generate the docbook and man page texts

Jean-Francois Dockes 2016-05-26 18:20:09 +02:00
parent a1a2bbf952
commit 8200bb78d2
6 changed files with 2376 additions and 2267 deletions


@@ -54,315 +54,565 @@ Where values are lists, white space is used for separation, and elements with
embedded spaces can be quoted with double-quotes. embedded spaces can be quoted with double-quotes.
.SH OPTIONS .SH OPTIONS
.TP .TP
.BI "topdirs = " directories .BI "topdirs = "string
Specifies the list of directories to index (recursively). Space-separated list of files or
directories to recursively index. Default to ~ (indexes
$HOME). You can use symbolic links in the list, they will be followed,
independently of the value of the followLinks variable.
.TP .TP
.BI "skippedNames = " patterns .BI "skippedNames = "string
A space-separated list of patterns for names of files or directories that Files and directories which should be ignored.
should be completely ignored. The list defined in the default file is: White space separated list of wildcard patterns (simple ones, not paths,
.sp must contain no / ), which will be tested against file and directory
.nf names. The list in the default configuration does not exclude hidden
*~ #* bin CVS Cache caughtspam tmp directories (names beginning with a dot), which means that it may index
quite a few things that you do not want. On the other hand, email user
agents like Thunderbird usually store messages in hidden directories, and
you probably want this indexed. One possible solution is to have '.*' in
'skippedNames', and add things like '~/.thunderbird' '~/.evolution' to
'topdirs'. Not even the file names are indexed for patterns in this
list, see the 'noContentSuffixes' variable for an alternative approach
which indexes the file names. Can be redefined for any
subtree.
.TP
.BI "noContentSuffixes = "string
List of name endings (not necessarily dot-separated suffixes) for
which we don't try MIME type identification, and don't uncompress or
index content. Only the names will be indexed. This
complements the now obsoleted recoll_noindex list from the mimemap file,
which will go away in a future release (the move from mimemap to
recoll.conf allows editing the list through the GUI). This is different
from skippedNames because these are name ending matches only (not
wildcard patterns), and the file name itself gets indexed normally. This
can be redefined for subdirectories.
.TP
.BI "skippedPaths = "string
Paths we should not go into. Space-separated list of
wildcard expressions for filesystem paths. Can contain files and
directories. The database and configuration directories will
automatically be added. The expressions are matched using 'fnmatch(3)'
with the FNM_PATHNAME flag set by default. This means that '/' characters
must be matched explicitly. You can set 'skippedPathsFnmPathname' to 0
to disable the use of FNM_PATHNAME (meaning that '/*/dir3' will match
'/dir1/dir2/dir3'). The default value contains the usual mount point for
removable media to remind you that it is a bad idea to have Recoll work
on these (esp. with the monitor: media gets indexed on mount, all data
gets erased on unmount). Explicitly adding '/media/xxx' to the topdirs
will override this.
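
The effect of FNM_PATHNAME on a skippedPaths pattern can be illustrated with a short Python sketch. Python's fnmatch module has no FNM_PATHNAME flag, so the helper names below are hypothetical and only approximate fnmatch(3): the "pathname" variant matches component by component, and assumes patterns contain no bracket expressions with '/' inside them.

```python
import fnmatch


def match_plain(path, pattern):
    """Approximate fnmatch(3) with flags 0: '*' may also match '/'."""
    return fnmatch.fnmatchcase(path, pattern)


def match_pathname(path, pattern):
    """Approximate fnmatch(3) with FNM_PATHNAME: wildcards never match '/'.

    Sketch only: path and pattern are split on '/' and compared
    component by component, so both must have the same depth.
    """
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(
        fnmatch.fnmatchcase(part, pat)
        for part, pat in zip(path_parts, pattern_parts)
    )
```

With these helpers, '/*/dir3' matches '/dir1/dir2/dir3' only in the plain variant, which corresponds to setting skippedPathsFnmPathname to 0.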
.TP
.BI "skippedPathsFnmPathname = "bool
Set to 0 to
override use of FNM_PATHNAME for matching skipped
paths.
.TP
.BI "daemSkippedPaths = "string
skippedPaths equivalent specific to
real time indexing. This enables having parts of the tree
which are initially indexed but not monitored. If daemSkippedPaths is
not set, the daemon uses skippedPaths.
.TP
.BI "zipSkippedNames = "string
Space-separated list of wildcard expressions for names that should
be ignored inside zip archives. This is used directly by
the zip handler, and has a function similar to skippedNames, but works
independently. Can be redefined for subdirectories. Supported by recoll
1.20 and newer. See
https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members
.fi
The list can be redefined for subdirectories, but is only actually changed
for the top level ones in
.I topdirs
.TP .TP
.BI "skippedPaths = " patterns .BI "followLinks = "bool
A space-separated list of patterns for paths the indexer should not descend Follow symbolic links during
into. Together with topdirs, this allows pruning the indexed tree to one's indexing. The default is to ignore symbolic links to avoid
content. multiple indexing of linked files. No effort is made to avoid duplication
.B daemSkippedPaths when this option is set to true. This option can be set individually for
can be used to define a specific value for the real time indexing monitor. each of the 'topdirs' members by using sections. It can not be changed
below the 'topdirs' level. Links in the 'topdirs' list itself are always
followed.
.TP .TP
.BI "skippedPathsFnmPathname = " 0/1 .BI "indexedmimetypes = "string
The values in the *skippedPaths variables are matched by default with Restrictive list of
fnmatch(3), with the FNM_PATHNAME and FNM_LEADING_DIR flags. This means indexed mime types. Normally not set (in which case all
that '/' characters must be matched explicitly. You can set supported types are indexed). If it is set,
skippedPathsFnmPathname to 0 to disable the use of FNM_PATHNAME (meaning only the types from the list will have their contents indexed. The names
that /*/dir3 will match /dir1/dir2/dir3). will be indexed anyway if indexallfilenames is set (default). MIME
type names should be taken from the mimemap file. Can be redefined for
subtrees.
.TP .TP
.BI "followLinks = " boolean .BI "excludedmimetypes = "string
Specifies if the indexer should follow List of excluded MIME
symbolic links while walking the file tree. The default is types. Lets you exclude some types from indexing. Can be
to ignore symbolic links to avoid multiple indexing of redefined for subtrees.
linked files. No effort is made to avoid duplication when
this option is set to true. This option can be set
individually for each of the
.I topdirs
members by using sections. It can not be changed below the
.I topdirs
level.
.TP .TP
.BI "indexedmimetypes = " list .BI "compressedfilemaxkbs = "int
Recoll normally indexes any file which it knows how to read. This list lets Size limit for compressed
you restrict the indexed mime types to what you specify. If the variable is files. We need to decompress these in a
unspecified or the list empty (the default), all supported types are temporary directory for identification, which can be wasteful in some
processed. cases. Limit the waste. Negative means no limit. 0 results in no
processing of any compressed file. Default 50 MB.
.TP .TP
.BI "compressedfilemaxkbs = " value .BI "textfilemaxmbs = "int
Size limit for compressed (.gz or .bz2) files. These need to be Size limit for text
decompressed in a temporary directory for identification, which can be very files. Mostly for skipping monster
wasteful if 'uninteresting' big compressed files are present. Negative logs. Default 20 MB.
means no limit, 0 means no processing of any compressed file. Defaults
to \-1.
.TP .TP
.BI "textfilemaxmbs = " value .BI "indexallfilenames = "bool
Maximum size for text files. Very big text files are often uninteresting Index the file names of
logs. Set to \-1 to disable (default 20MB). unprocessed files Index the names of files the contents of
which we don't index because of an excluded or unsupported MIME
type.
.TP .TP
.BI "textfilepagekbs = " value .BI "usesystemfilecommand = "bool
If this is set to other than \-1, text files will be indexed as multiple Use a system command
documents of the given page size. This may be useful if you do want to for file MIME type guessing as a final step in file type
index very big text files as it will both reduce memory usage at index time identification This is generally useful, but will usually
and help with loading data to the preview window. A size of a few megabytes cause the indexing of many bogus 'text' files. See 'systemfilecommand'
would seem reasonable (default: 1000 : 1MB). for the command used.
.TP .TP
.BI "membermaxkbs = " "value in kilobytes" .BI "systemfilecommand = "string
This defines the maximum size for an archive member (zip, tar or rar at Command used to guess
the moment). Bigger entries will be skipped. Current default: 50000 (50 MB). MIME types if the internal methods fails This should be a
"file -i" workalike. The file path will be added as a last parameter to
the command line. 'xdg-mime' works better than the traditional 'file'
command, and is now the configured default (with a hard-coded fallback to
'file')
.TP .TP
.BI "indexallfilenames = " boolean .BI "processwebqueue = "bool
Recoll indexes file names into a special section of the database to allow Decide if we process the
specific file names searches using wild cards. This parameter decides if Web queue. The queue is a directory where the Recoll Web
file name indexing is performed only for files with mime types that would browser plugins create the copies of visited pages.
qualify them for full text indexing, or for all files inside
the selected subtrees, independent of mime type.
.TP .TP
.BI "usesystemfilecommand = " boolean .BI "textfilepagekbs = "int
Decide if we use the Page size for text
.B "file \-i" files. If this is set, text/plain files will be divided
system command as a final step for determining the mime type for a file into documents of approximately this size. Will reduce memory usage at
(the main procedure uses suffix associations as defined in the index time and help with loading data in the preview window at query
.B mimemap time. Particularly useful with very big files, such as application or
file). This can be useful for files with suffixless names, but it will system logs. Also see textfilemaxmbs and
also cause the indexing of many bogus "text" files. compressedfilemaxkbs.
.TP .TP
.BI "processbeaglequeue = " 0/1 .BI "membermaxkbs = "int
If this is set, process the directory where Beagle Web browser plugins copy Size limit for archive
visited pages for indexing. Of course, Beagle MUST NOT be running, else members. This is passed to the filters in the environment
things will behave strangely. as RECOLL_FILTER_MAXMEMBERKB.
.TP .TP
.BI "beaglequeuedir = " directory path .BI "indexStripChars = "bool
The path to the Beagle indexing queue. This is hard-coded in the Beagle Decide if we store
plugin as ~/.beagle/ToIndex so there should be no need to change it. character case and diacritics in the index. If we do,
.TP searches sensitive to case and diacritics can be performed, but the index
.BI "indexStripChars = " 0/1 will be bigger, and some marginal weirdness may sometimes occur. The
Decide if we strip characters of diacritics and convert them to lower-case default is a stripped index. When using multiple indexes for a search,
before terms are indexed. If we don't, searches sensitive to case and
diacritics can be performed, but the index will be bigger, and some
marginal weirdness may sometimes occur. The default is a stripped index
(indexStripChars = 1) for now. When using multiple indexes for a search,
this parameter must be defined identically for all. Changing the value this parameter must be defined identically for all. Changing the value
implies an index reset. implies an index reset.
.TP .TP
.BI "maxTermExpand = " value .BI "nonumbers = "bool
Maximum expansion count for a single term (e.g.: when using wildcards). The Decides if terms will be
default of 10000 is reasonable and will avoid queries that appear frozen generated for numbers. For example "123", "1.5e6",
while the engine is walking the term list. 192.168.1.4, would not be indexed if nonumbers is set ("value123" would
still be). Numbers are often quite interesting to search for, and this
should probably not be set except for special situations, ie, scientific
documents with huge amounts of numbers in them, where setting nonumbers
will reduce the index size. This can only be set for a whole index, not
for a subtree.
.TP .TP
.BI "maxXapianClauses = " value .BI "dehyphenate = "bool
Maximum number of elementary clauses we can add to a single Xapian Determines if we index
query. In some cases, the result of term expansion can be multiplicative, 'coworker' also when the input is 'co-worker'. This is new
and we want to avoid using excessive memory. The default of 100 000 should in version 1.22, and on by default. Setting the variable to off allows
be both high enough in most cases and compatible with current typical restoring the previous behaviour.
hardware configurations.
.TP .TP
.BI "nonumbers = " 0/1 .BI "nocjk = "bool
If this set to true, no terms will be generated for numbers. For example Decides if specific East Asian
"123", "1.5e6", 192.168.1.4, would not be indexed ("value123" would still (Chinese Korean Japanese) characters/word splitting is turned
be). Numbers are often quite interesting to search for, and this should off. This will save a small amount of CPU if you have no CJK
probably not be set except for special situations, ie, scientific documents documents. If your document base does include such text but you are not
with huge amounts of numbers in them. This can only be set for a whole interested in searching it, setting nocjk may be a
index, not for a subtree. significant time and space saver.
.TP .TP
.BI "nocjk = " boolean .BI "cjkngramlen = "int
If this set to true, specific east asian (Chinese Korean Japanese) This lets you adjust the size of
characters/word splitting is turned off. This will save a small amount of n-grams used for indexing CJK text. The default value of 2 is
cpu if you have no CJK documents. If your document base does include such probably appropriate in most cases. A value of 3 would allow more precision
text but you are not interested in searching it, setting and efficiency on longer words, but the index will be approximately twice
.I nocjk as large.
may be a significant time and space saver.
.TP .TP
.BI "cjkngramlen = " value .BI "indexstemminglanguages = "string
This lets you adjust the size of n-grams used for indexing CJK text. The Languages for which to create stemming expansion
default value of 2 is probably appropriate in most cases. A value of 3 data. Stemmer names can be found by executing 'recollindex
would allow more precision and efficiency on longer words, but the index -l', or this can also be set from a list in the GUI.
will be approximately twice as large.
.TP .TP
.BI "indexstemminglanguages = " languages .BI "defaultcharset = "string
A list of languages for which the stem expansion databases will be Default character
built. See recollindex(1) for possible values. set. This is used for files which do not contain a
character set definition (e.g.: text/plain). Values found inside files,
e.g. a 'charset' tag in HTML documents, will override it. If this is not
set, the default character set is the one defined by the NLS environment
($LC_ALL, $LC_CTYPE, $LANG), or ultimately iso-8859-1 (cp-1252 in fact).
If for some reason you want a general default which does not match your
LANG and is not 8859-1, use this variable. This can be redefined for any
sub-directory.
.TP .TP
.BI "defaultcharset = " charset .BI "unac_except_trans = "string
The name of the character set used for files that do not contain a A list of characters,
character set definition (ie: plain text files). This can be redefined for encoded in UTF-8, which should be handled specially
any subdirectory. when converting text to unaccented lowercase. For
example, in Swedish, the letter a with diaeresis has full alphabet
citizenship and should not be turned into an a.
Each element in the space-separated list has the special character as
first element and the translation following. The handling of both the
lowercase and upper-case versions of a character should be specified, as
membership in the list will turn off both standard accent and case
processing. The value is global and affects both indexing and querying.
Examples:
Swedish:
unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå
. German:
unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
In French, you probably want to decompose oe and ae and nobody would type
a German ß
unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
. The default for all until someone protests follows. These decompositions
are not performed by unac, but it is unlikely that someone would type the
composed forms in a search.
unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
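
The element layout described for unac_except_trans (special character first, translation following) can be sketched in Python as below. The function name is hypothetical and this is not Recoll's own parser; it assumes each special character is a single precomposed code point.

```python
def parse_unac_except_trans(value):
    """Map each special character to its translation (illustrative sketch)."""
    table = {}
    for element in value.split():
        # The first character is the special character; the rest of the
        # element is the translation, which may be more than one character.
        table[element[0]] = element[1:]
    return table


table = parse_unac_except_trans("ßss œoe Œoe æae Æae")
```

Here 'ß' maps to 'ss', and both 'œ' and 'Œ' map to 'oe', which shows why the lowercase and uppercase forms each need their own entry.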
.TP .TP
.BI "unac_except_trans = " "list of utf-8 groups" .BI "maildefcharset = "string
This is a list of characters, encoded in UTF-8, which should be handled Overrides the default
specially when converting text to unaccented lowercase. For example, in character set for email messages which don't specify
Swedish, the letter "a with diaeresis" has full alphabet citizenship and one. This is mainly useful for readpst (libpst) dumps,
should not be turned into an a. which are utf-8 but do not say so.
.br
Each element in the space-separated list has the special character as first
element and the translation following. The handling of both the lowercase
and upper-case versions of a character should be specified, as appartenance
to the list will turn-off both standard accent and case processing.
.br
Note that the translation is not limited to a single character.
.br
This parameter cannot be redefined for subdirectories, it is global,
because there is no way to do otherwise when querying. If you have document
sets which would need different values, you will have to index and query
them separately.
.TP .TP
.BI "maildefcharset = " character set name .BI "localfields = "string
This can be used to define the default character set specifically for email Set fields on all files
messages which don't specify it. This is mainly useful for readpst (libpst) (usually of a specific fs area). Syntax is the usual:
dumps, which are utf-8 but do not say so. name = value ; attr1 = val1 ; [...]
value is empty so this needs an initial semi-colon. This is useful, e.g.,
for setting the rclaptg field for application selection inside
mimeview.
.TP .TP
.BI "localfields = " "fieldname = value:..." .BI "testmodifusemtime = "bool
This allows setting fields for all documents under a given Use mtime instead of
directory. Typical usage would be to set an "rclaptg" field, to be used in ctime to test if a file has been modified. The time is used
mimeview to select a specific viewer. If several fields are to be set, they in addition to the size, which is always used.
should be separated with a colon (':') character (which there is currently Setting this can reduce re-indexing on systems where extended attributes
no way to escape). Ie: localfields= rclaptg=gnus:other = val, then select are used (by some other application), but not indexed, because changing
specifier viewer with mimetype|tag=... in mimeview. extended attributes only affects ctime.
Notes:
- This may prevent detection of change in some marginal file rename cases
(the target would need to have the same size and mtime).
- You should probably also set noxattrfields to 1 in this case, except if
you still prefer to perform xattr indexing, for example if the local
file update pattern makes it of value (as in general, there is a risk
for pure extended attributes updates without file modification to go
undetected). Perform a full index reset after changing this.
.TP .TP
.BI "dbdir = " directory .BI "noxattrfields = "bool
The name of the Xapian database directory. It will be created if needed Disable extended attributes
when the database is initialized. If this is not an absolute pathname, it conversion to metadata fields. This probably needs to be
will be taken relative to the configuration directory. set if testmodifusemtime is set.
.TP .TP
.BI "idxstatusfile = " "file path" .BI "metadatacmds = "string
The name of the scratch file where the indexer process updates its Define commands to
status. Default: idxstatus.txt inside the configuration directory. gather external metadata, e.g. tmsu tags.
There can be several entries, separated by semi-colons, each defining
which field name the data goes into and the command to use. Don't forget the
initial semi-colon. All the field names must be different. You can use
aliases in the "field" file if necessary.
As a not too pretty hack conceded to convenience, any field name
beginning with "rclmulti" will be taken as an indication that the command
returns multiple field values inside a text blob formatted as a recoll
configuration file ("fieldname = fieldvalue" lines). The rclmultixx name
will be ignored, and field names and values will be parsed from the data.
Example: metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f
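
As a rough illustration of the metadatacmds syntax, the following Python sketch splits a value into field names and commands. The function names are hypothetical, and the real Recoll parser may treat quoting and attributes differently; this only mirrors the rules stated above (semicolon-separated entries, leading semicolon, distinct field names, "rclmulti" prefix).

```python
def parse_metadatacmds(value):
    """Split a metadatacmds value into {field name: command} (sketch)."""
    commands = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            # The initial semicolon produces an empty first entry.
            continue
        field, _, command = entry.partition("=")
        field = field.strip()
        if field in commands:
            raise ValueError("duplicate field name: " + field)
        commands[field] = command.strip()
    return commands


def returns_multiple_fields(field):
    """True if the field name marks a command whose output is a
    configuration-style blob of 'fieldname = fieldvalue' lines."""
    return field.startswith("rclmulti")


cmds = parse_metadatacmds("; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f")
```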
.TP .TP
.BI "maxfsoccuppc = " percentnumber .BI "cachedir = "dfn
Maximum file system occupation before we Top directory for Recoll data. Recoll data
stop indexing. The value is a percentage, corresponding to directories are normally located relative to the configuration directory
what the "Capacity" df output column shows. The default (e.g. ~/.recoll/xapiandb, ~/.recoll/mboxcache). If 'cachedir' is set, the
directories are stored under the specified value instead (e.g. if
cachedir is ~/.cache/recoll, the default dbdir would be
~/.cache/recoll/xapiandb). This affects dbdir, webcachedir,
mboxcachedir, aspellDicDir, which can still be individually specified to
override cachedir. Note that if you have multiple configurations, each
must have a different cachedir, there is no automatic computation of a
subpath under cachedir.
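
The way cachedir changes the default data locations can be sketched as follows. The function name and the example paths are illustrative assumptions, not Recoll code; the sketch only covers the default case where dbdir, webcachedir and the others are not individually set.

```python
import posixpath


def default_data_dir(name, confdir, cachedir=None):
    """Default location of a Recoll data directory (sketch).

    name     -- subdirectory name, e.g. 'xapiandb' or 'mboxcache'
    confdir  -- configuration directory
    cachedir -- value of the cachedir variable, or None if unset
    """
    base = cachedir if cachedir else confdir
    return posixpath.join(base, name)
```

For instance, default_data_dir("xapiandb", "/home/me/.recoll") yields a path under the configuration directory, while passing cachedir="/home/me/.cache/recoll" moves it under the cache directory.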
.TP
.BI "maxfsoccuppc = "int
Maximum file system occupation
over which we stop indexing. The value is a percentage,
corresponding to what the "Capacity" df output column shows. The default
value is 0, meaning no checking. value is 0, meaning no checking.
.TP .TP
.BI "mboxcachedir = " "directory path" .BI "xapiandb = "dfn
The directory where mbox message offsets cache files are held. This is Xapian database directory
normally $RECOLL_CONFDIR/mboxcache, but it may be useful to share a location. This will be created on first indexing. If the
directory between different configurations. value is not an absolute path, it will be interpreted as relative to
cachedir if set, or the configuration directory (-c argument or
$RECOLL_CONFDIR). If nothing is specified, the default is then
~/.recoll/xapiandb/
.TP .TP
.BI "mboxcacheminmbs = " "value in megabytes" .BI "idxstatusfile = "fn
The minimum mbox file size over which we cache the offsets. There is really no sense in caching offsets for small files. The default is 5 MB. Name of the scratch file where the indexer process updates its
status. Default: idxstatus.txt inside the configuration
directory.
.TP .TP
.BI "webcachedir = " "directory path" .BI "mboxcachedir = "dfn
This is only used by the Beagle web browser plugin indexing code, and Directory location for storing mbox message offsets cache
defines where the cache for visited pages will live. Default: files. This is normally 'mboxcache' under cachedir if set,
or else under the configuration directory, but it may be useful to share
a directory between different configurations.
.TP
.BI "mboxcacheminmbs = "int
Minimum mbox file size over which we cache the offsets. There is really no sense in caching offsets for small files. The
default is 5 MB.
.TP
.BI "webcachedir = "dfn
Directory where we store the archived web pages. This is only used by the web history indexing code.
Default: cachedir/webcache if cachedir is set, else
$RECOLL_CONFDIR/webcache $RECOLL_CONFDIR/webcache
.TP .TP
.BI "webcachemaxmbs = " "value in megabytes" .BI "webcachemaxmbs = "int
This is only used by the Beagle web browser plugin indexing code, and Maximum size in MB of the Web archive. This is only used by the web history indexing code.
defines the maximum size for the web page cache. Default: 40 MB. Default: 40 MB.
Reducing the size will not physically truncate the file.
.TP .TP
.BI "idxflushmb = " megabytes .BI "webqueuedir = "fn
Threshold (megabytes of new text data) The path to the Web indexing queue. This is
where we flush from memory to disk index. Setting this can hard-coded in the plugin as ~/.recollweb/ToIndex so there should be no
help control memory usage. A value of 0 means no explicit need or possibility to change it.
flushing, letting Xapian use its own default, which is
flushing every 10000 documents (or XAPIAN_FLUSH_THRESHOLD), meaning that
memory usage depends on average document size. The default value is 10.
.TP .TP
.BI "autodiacsens = " 0/1 .BI "aspellDicDir = "dfn
IF the index is not stripped, decide if we automatically trigger diacritics Aspell dictionary storage directory location. The
sensitivity if the search term has accented characters (not in aspell dictionary (aspdict.(lang).rws) is normally stored in the
unac_except_trans). Else you need to use the query language and the D directory specified by cachedir if set, or under the configuration
modifier to specify diacritics sensitivity. Default is no. directory.
.TP .TP
.BI "autocasesens = " 0/1 .BI "filtersdir = "dfn
IF the index is not stripped, decide if we automatically trigger character Directory location for executable input handlers. If
case sensitivity if the search term has upper-case characters in any but RECOLL_FILTERSDIR is set in the environment, we use it instead. Defaults
the first position. Else you need to use the query language and the C to $prefix/share/recoll/filters. Can be redefined for
modifier to specify character-case sensitivity. Default is yes. subdirectories.
.TP .TP
.BI "loglevel = " value .BI "iconsdir = "dfn
Verbosity level for recoll and recollindex. A value of 4 lists quite a lot of Directory location for icons. The only reason to
debug/information messages. 3 lists only errors. change this would be if you want to change the icons displayed in the
.B daemloglevel result list. Defaults to $prefix/share/recoll/images
can be used to specify a different value for the real-time indexing daemon.
.TP .TP
.BI "logfilename = " file .BI "idxflushmb = "int
Where should the messages go. 'stderr' can be used as a special value. Threshold (megabytes of new data) where we flush from memory to
.B daemlogfilename disk index. Setting this allows some control over memory
can be used to specify a different value for the real-time indexing daemon. usage by the indexer process. A value of 0 means no explicit flushing,
which lets Xapian perform its own thing, meaning flushing every
$XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted: as memory
usage depends on average document size, not only document count, the
Xapian approach is not very useful, and you should let Recoll manage
the flushes. The default value of idxflushmb is 10 MB, and may be a bit
low. If you are looking for maximum speed, you may want to experiment
with values between 20 and
80. In my experience, values beyond 100 are always counterproductive. If
you find otherwise, please drop me a note.
.TP .TP
.BI "mondelaypatterns = " "list of patterns" .BI "filtermaxseconds = "int
This allows specify wildcard path patterns (processed with fnmatch(3) with Maximum external filter execution time in
0 flag), to match files which change too often and for which a delay should seconds. Default 1200 (20mn). Set to 0 for no limit. This
be observed before re-indexing. This is a space-separated list, each entry is mainly to avoid infinite loops in postscript files
being a pattern and a time in seconds, separated by a colon. You can use (loop.ps)
double quotes if a path entry contains white space. Example:
.sp
mondelaypatterns = *.log:20 "this one has spaces*:10"
.TP .TP
.BI "monixinterval = " "value in seconds .BI "filtermaxmbytes = "int
Minimum interval (seconds) for processing the indexing queue. The real time Maximum virtual memory space for filter processes
monitor does not process each event when it comes in, but will wait this (setrlimit(RLIMIT_AS)), in megabytes. Note that this
time for the queue to accumulate to diminish overhead and in order to includes any mapped libs (there is no reliable Linux way to limit the
aggregate multiple events to the same file. Default 30 S. data space only), so we need to be a bit generous here. Anything over
2000 will be ignored on 32-bit machines.
.TP .TP
.BI "monauxinterval = " "value in seconds .BI "thrQSizes = "string
Period (in seconds) at which the real time monitor will regenerate the Stage input queues configuration. There are three
auxiliary databases (spelling, stemming) if needed. The default is one internal queues in the indexing pipeline stages (file data extraction,
hour. terms generation, index update). This parameter defines the queue depths
for each stage (three integer values). If a value of -1 is given for a
given stage, no queue is used, and the thread will go on performing the
next stage. In practice, deep queues have not been shown to increase
performance. Default: a value of 0 for the first queue tells Recoll to
perform autoconfiguration based on the detected number of CPUs (no need
for the two other values in this case). Use thrQSizes = -1 -1 -1 to
disable multithreading entirely.
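
The special values of thrQSizes can be summarized with a small Python sketch. The function name and the returned labels are hypothetical; the sketch only classifies a value according to the rules stated above and does not reproduce Recoll's actual configuration code.

```python
def threading_mode(thr_qsizes):
    """Classify a thrQSizes value (sketch).

    Returns 'auto' when the first queue depth is 0, 'single-threaded'
    when all three depths are -1, and 'manual' otherwise.
    """
    depths = [int(v) for v in thr_qsizes.split()]
    if depths and depths[0] == 0:
        # Autoconfiguration: the two other values are not needed.
        return "auto"
    if len(depths) == 3 and all(d == -1 for d in depths):
        return "single-threaded"
    return "manual"
```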
.TP .TP
.BI "monioniceclass, monioniceclassdata" .BI "thrTCounts = "string
These allow defining the ionice class and data used by the indexer (default Number of threads used for each indexing stage. The
class 3, no data). three stages are: file data extraction, terms generation, index
update. The use of the counts is also controlled by some special values
in thrQSizes: if the first queue depth is 0, all counts are ignored
(autoconfigured); if a value of -1 is used for a queue depth, the
corresponding thread count is ignored. It makes no sense to use a value
other than 1 for the last stage because updating the Xapian index is
necessarily single-threaded (and protected by a mutex).
.TP .TP
.BI "filtermaxseconds = " "value in seconds" .BI "loglevel = "int
Maximum filter execution time, after which it is aborted. Some postscript Log file verbosity 1-6. A value of 2 will print
programs just loop... only errors and warnings. 3 will print information like document updates,
4 is quite verbose and 6 very verbose.
.TP .TP
.BI "filtersdir = " directory .BI "logfilename = "fn
A directory to search for the external filter scripts used to index some Log file destination. Use 'stderr' (default) to write to the
types of files. The value should not be changed, except if you want to console.
modify one of the default scripts. The value can be redefined for any
subdirectory.
.TP .TP
.BI "iconsdir = " directory .BI "idxloglevel = "int
The name of the directory where Override loglevel for the indexer.
.B recoll
result list icons are stored. You can change this if you want different
images.
.TP .TP
.BI "idxabsmlen = " value .BI "idxlogfilename = "fn
Recoll stores an abstract for each indexed file inside the database. The Override logfilename for the indexer.
text can come from an actual 'abstract' section in the document or will .TP
just be the beginning of the document. It is stored in the index so that it .BI "daemloglevel = "int
can be displayed inside the result lists without decoding the original Override loglevel for the indexer in real time
file. The mode. The default is to use the idx... values if set, else
.I idxabsmlen the log... values.
parameter defines the size of the stored abstract. The default value is 250 .TP
bytes. The search interface gives you the choice to display this stored .BI "daemlogfilename = "fn
Override logfilename for the indexer in real time
mode. The default is to use the idx... values if set, else
the log... values.
.TP
.BI "idxrundir = "dfn
Indexing process current directory. The input
handlers sometimes leave temporary files in the current directory, so it
makes sense to have recollindex chdir to some temporary directory. If the
value is empty, the current directory is not changed. If the
value is (literal) tmp, we use the temporary directory as set by the
environment (RECOLL_TMPDIR else TMPDIR else /tmp). If the value is an
absolute path to a directory, we go there.
.TP
.BI "checkneedretryindexscript = "fn
Script used to heuristically check if we need to retry indexing
files which previously failed. The default script checks
the modified dates on /usr/bin and /usr/local/bin. A relative path will
be looked up in the filters dirs, then in the path. Use an absolute path
to do otherwise.
.TP
.BI "recollhelperpath = "string
Additional places to search for helper executables. This is only used on Windows for now.
.TP
.BI "idxabsmlen = "int
Length of abstracts we store while indexing. Recoll stores an abstract for each indexed file.
The text can come from an actual 'abstract' section in the
document or will just be the beginning of the document. It is stored in
the index so that it can be displayed inside the result lists without
decoding the original file. The idxabsmlen parameter
defines the size of the stored abstract. The default value is 250
bytes. The search interface gives you the choice to display this stored
text or a synthetic abstract built by extracting text around the search text or a synthetic abstract built by extracting text around the search
terms. If you always prefer the synthetic abstract, you can reduce this terms. If you always prefer the synthetic abstract, you can reduce this
value and save a little space. value and save a little space.
.TP .TP
.BI "aspellLanguage = " lang .BI "idxmetastoredlen = "int
Language definitions to use when creating the aspell dictionary. The value Truncation length of stored metadata fields. This
must match a set of aspell language definition files. You can type "aspell does not affect indexing (the whole field is processed anyway), just the
config" to see where these are installed (look for data-dir). The default amount of data stored in the index for the purpose of displaying fields
if the variable is not set is to use your desktop national language inside result lists or previews. The default value is 150 bytes which
environment to guess the value. may be too low if you have custom fields.
.TP .TP
.BI "noaspell = " boolean .BI "aspellLanguage = "string
If this is set, the aspell dictionary generation is turned off. Useful for Language definitions to use when creating the aspell
cases where you don't need the functionality or when it is unusable because dictionary. The value must match a set of aspell language
aspell crashes during dictionary generation. definition files. You can type "aspell dicts" to see a list. The default
if this is not set is to use the NLS environment to guess the
value.
.TP .TP
.BI "mhmboxquirks = " flags .BI "aspellAddCreateParam = "string
This allows definining location-related quirks for the mailbox Additional option and parameter to aspell dictionary creation
handler. Currently only the tbird flag is defined, and it should be set for command. Some aspell packages may need an additional option
directories which hold Thunderbird data, as their folder format is weird. (e.g. on Debian Jessie: --local-data-dir=/usr/lib/aspell). See Debian bug
772415.
.TP
.BI "aspellKeepStderr = "bool
Set this to have a look at aspell dictionary creation
errors. There are always many, so this is mostly for
debugging.
.TP
.BI "noaspell = "bool
Disable aspell use. The aspell dictionary generation
takes time, and some combinations of aspell version, language, and local
terms, result in aspell crashing, so it sometimes makes sense to just
disable the thing.
.TP
.BI "monauxinterval = "int
Auxiliary database update interval. The real time
indexer only updates the auxiliary databases (stemdb, aspell)
periodically, because it would be too costly to do it for every document
change. The default period is one hour.
.TP
.BI "monixinterval = "int
Minimum interval (seconds) between processings of the indexing
queue. The real time indexer does not process each event
when it comes in, but lets the queue accumulate, to diminish overhead and
to aggregate multiple events affecting the same file. Default 30
seconds.
.TP
.BI "mondelaypatterns = "string
Timing parameters for the real time indexing. Definitions for files which get a longer delay before reindexing
is allowed. This is for fast-changing files, that should only be
reindexed once in a while. A list of wildcardPattern:seconds pairs. The
patterns are matched with fnmatch(pattern, path, 0). You can quote entries
containing white space with double quotes (quote the whole entry, not the
pattern). The default is empty.
Example: mondelaypatterns = *.log:20 "*with spaces.*:30"
.TP
.BI "monioniceclass = "int
ionice class for the real time indexing process. On platforms where this is supported. The default value is
3.
.TP
.BI "monioniceclassdata = "string
ionice class parameter for the real time indexing process. On platforms where this is supported. The default is
empty.
.TP
.BI "autodiacsens = "bool
auto-trigger diacritics sensitivity (raw index only). If the index is not stripped, decide if we automatically trigger
diacritics sensitivity if the search term has accented characters (not in
unac_except_trans). Else you need to use the query language and the "D"
modifier to specify diacritics sensitivity. Default is no.
.TP
.BI "autocasesens = "bool
auto-trigger case sensitivity (raw index only). If
the index is not stripped (see indexStripChars), decide if we
automatically trigger character case sensitivity if the search term has
upper-case characters in any but the first position. Else you need to use
the query language and the "C" modifier to specify character-case
sensitivity. Default is yes.
.TP
.BI "maxTermExpand = "int
Maximum query expansion count
for a single term (e.g.: when using wildcards). This only
affects queries, not indexing. We used to not limit this at all (except
for filenames where the limit was too low at 1000), but it is
unreasonable with a big index. Default 10000.
.TP
.BI "maxXapianClauses = "int
Maximum number of clauses
we add to a single Xapian query. This only affects queries,
not indexing. In some cases, the result of term expansion can be
multiplicative, and we want to avoid eating all the memory. Default
50000.
.TP
.BI "snippetMaxPosWalk = "int
Maximum number of positions we walk while populating a snippet for
the result list. The default of 1,000,000 may be
insufficient for very big documents, the consequence would be snippets
with possibly meaning-altering missing words.
.TP
.BI "pdfocr = "bool
Attempt OCR of PDF files with no text content if both tesseract and
pdftoppm are installed. The default is off because OCR is so
very slow.
.TP
.BI "pdfattach = "bool
Enable PDF attachment extraction by executing pdftk (if
available). This is
normally disabled, because it does slow down PDF indexing a bit even if
not one attachment is ever found.
.TP
.BI "mhmboxquirks = "string
Enable thunderbird/mozilla-seamonkey mbox format quirks. Set this for the directory where the email mbox files are
stored.
.SH SEE ALSO .SH SEE ALSO
.PP .PP


@ -25,7 +25,7 @@ webh:
 make -C webhelp
 usermanual.html: usermanual.xml
-xsltproc ${commonoptions} \
+xsltproc --xinclude ${commonoptions} \
 -o tmpfile.html "${XSLDIR}/html/docbook.xsl" $<
 -tidy -indent tmpfile.html > usermanual.html
 rm -f tmpfile.html


@ -0,0 +1,588 @@
<?xml version="1.0"?>
<sect2 id="RCL.INSTALL.CONFIG.RECOLLCONF">
<title>Recoll main configuration file, recoll.conf </title>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.WHATDOCS">
<title>Parameters affecting what documents we index </title>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TOPDIRS">
<term><varname>topdirs</varname></term>
<listitem><para>Space-separated list of files or
directories to recursively index. Defaults to ~ (indexes
$HOME). You can use symbolic links in the list, they will be followed,
independently of the value of the followLinks variable.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES">
<term><varname>skippedNames</varname></term>
<listitem><para>Files and directories which should be ignored.
White space separated list of wildcard patterns (simple ones, not paths,
must contain no / ), which will be tested against file and directory
names. The list in the default configuration does not exclude hidden
directories (names beginning with a dot), which means that it may index
quite a few things that you do not want. On the other hand, email user
agents like Thunderbird usually store messages in hidden directories, and
you probably want this indexed. One possible solution is to have '.*' in
'skippedNames', and add things like '~/.thunderbird' '~/.evolution' to
'topdirs'. Not even the file names are indexed for patterns in this
list, see the 'noContentSuffixes' variable for an alternative approach
which indexes the file names. Can be redefined for any
subtree.</para></listitem></varlistentry>
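<!-- Illustrative recoll.conf sketch of the approach described in the
     skippedNames entry above. The pattern list is an assumption chosen for
     the example, not the shipped default: hidden names are skipped
     everywhere, and the mail stores are indexed by naming them in topdirs.

     topdirs = ~ ~/.thunderbird ~/.evolution
     skippedNames = .* *~ CVS tmp
-->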
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES">
<term><varname>noContentSuffixes</varname></term>
<listitem><para>List of name endings (not necessarily dot-separated suffixes) for
which we don't try MIME type identification, and don't uncompress or
index content. Only the names will be indexed. This
complements the now obsoleted recoll_noindex list from the mimemap file,
which will go away in a future release (the move from mimemap to
recoll.conf allows editing the list through the GUI). This is different
from skippedNames because these are name ending matches only (not
wildcard patterns), and the file name itself gets indexed normally. This
can be redefined for subdirectories.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHS">
<term><varname>skippedPaths</varname></term>
<listitem><para>Paths we should not go into. Space-separated list of
wildcard expressions for filesystem paths. Can contain files and
directories. The database and configuration directories will
automatically be added. The expressions are matched using 'fnmatch(3)'
with the FNM_PATHNAME flag set by default. This means that '/' characters
must be matched explicitly. You can set 'skippedPathsFnmPathname' to 0
to disable the use of FNM_PATHNAME (meaning that '/*/dir3' will match
'/dir1/dir2/dir3'). The default value contains the usual mount point for
removable media to remind you that it is a bad idea to have Recoll work
on these (esp. with the monitor: media gets indexed on mount, all data
gets erased on unmount). Explicitly adding '/media/xxx' to the topdirs
will override this.</para></listitem></varlistentry>
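<!-- Illustrative recoll.conf sketch for the skippedPaths entry above. The
     paths are hypothetical. With FNM_PATHNAME disabled, a '*' may also match
     '/' characters, so the single pattern below would cover
     '/dir1/dir2/dir3' as described in the text.

     skippedPaths = /media /*/dir3
     skippedPathsFnmPathname = 0
-->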
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHSFNMPATHNAME">
<term><varname>skippedPathsFnmPathname</varname></term>
<listitem><para>Set to 0 to
override use of FNM_PATHNAME for matching skipped
paths. </para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DAEMSKIPPEDPATHS">
<term><varname>daemSkippedPaths</varname></term>
<listitem><para>skippedPaths equivalent specific to
real time indexing. This enables having parts of the tree
which are initially indexed but not monitored. If daemSkippedPaths is
not set, the daemon uses skippedPaths.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ZIPSKIPPEDNAMES">
<term><varname>zipSkippedNames</varname></term>
<listitem><para>Space-separated list of wildcard expressions for names that should
be ignored inside zip archives. This is used directly by
the zip handler, and has a function similar to skippedNames, but works
independently. Can be redefined for subdirectories. Supported by recoll
1.20 and newer. See
https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FOLLOWLINKS">
<term><varname>followLinks</varname></term>
<listitem><para>Follow symbolic links during
indexing. The default is to ignore symbolic links to avoid
multiple indexing of linked files. No effort is made to avoid duplication
when this option is set to true. This option can be set individually for
each of the 'topdirs' members by using sections. It can not be changed
below the 'topdirs' level. Links in the 'topdirs' list itself are always
followed.</para></listitem></varlistentry>
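<!-- Illustrative recoll.conf sketch for the followLinks entry above, setting
     the option separately for each topdirs member with a section. The
     directory names are hypothetical.

     topdirs = /home/me/docs /home/me/shared
     followLinks = 0

     [/home/me/shared]
     followLinks = 1
-->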
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXEDMIMETYPES">
<term><varname>indexedmimetypes</varname></term>
<listitem><para>Restrictive list of
indexed mime types. Normally not set (in which case all
supported types are indexed). If it is set,
only the types from the list will have their contents indexed. The names
will be indexed anyway if indexallfilenames is set (default). MIME
type names should be taken from the mimemap file. Can be redefined for
subtrees.</para></listitem></varlistentry>
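<!-- Illustrative recoll.conf sketch for the indexedmimetypes entry above,
     restricting content indexing to two types inside one subtree. The
     directory is hypothetical, and the type names are assumed to be present
     in the mimemap file. File names of other types are still indexed while
     indexallfilenames is set.

     [/home/me/papers]
     indexedmimetypes = application/pdf text/html
-->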
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.EXCLUDEDMIMETYPES">
<term><varname>excludedmimetypes</varname></term>
<listitem><para>List of excluded MIME
types. Lets you exclude some types from indexing. Can be
redefined for subtrees.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.COMPRESSEDFILEMAXKBS">
<term><varname>compressedfilemaxkbs</varname></term>
<listitem><para>Size limit for compressed
files. We need to decompress these in a
temporary directory for identification, which can be wasteful in some
cases. Limit the waste. Negative means no limit. 0 results in no
processing of any compressed file. Default 50 MB.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TEXTFILEMAXMBS">
<term><varname>textfilemaxmbs</varname></term>
<listitem><para>Size limit for text
files. Mostly for skipping monster
logs. Default 20 MB.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXALLFILENAMES">
<term><varname>indexallfilenames</varname></term>
<listitem><para>Index the file names of
unprocessed files. Index the names of files whose contents
we don't index because of an excluded or unsupported MIME
type.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.USESYSTEMFILECOMMAND">
<term><varname>usesystemfilecommand</varname></term>
<listitem><para>Use a system command
for file MIME type guessing as a final step in file type
identification. This is generally useful, but will usually
cause the indexing of many bogus 'text' files. See 'systemfilecommand'
for the command used.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SYSTEMFILECOMMAND">
<term><varname>systemfilecommand</varname></term>
<listitem><para>Command used to guess
MIME types if the internal methods fail. This should be a
"file -i" workalike. The file path will be added as a last parameter to
the command line. 'xdg-mime' works better than the traditional 'file'
command, and is now the configured default (with a hard-coded fallback to
'file').</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PROCESSWEBQUEUE">
<term><varname>processwebqueue</varname></term>
<listitem><para>Decide if we process the
Web queue. The queue is a directory where the Recoll Web
browser plugins create the copies of visited pages.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TEXTFILEPAGEKBS">
<term><varname>textfilepagekbs</varname></term>
<listitem><para>Page size for text
files. If this is set, text/plain files will be divided
into documents of approximately this size. Will reduce memory usage at
index time and help with loading data in the preview window at query
time. Particularly useful with very big files, such as application or
system logs. Also see textfilemaxmbs and
compressedfilemaxkbs.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MEMBERMAXKBS">
<term><varname>membermaxkbs</varname></term>
<listitem><para>Size limit for archive
members. This is passed to the filters in the environment
as RECOLL_FILTER_MAXMEMBERKB.</para></listitem></varlistentry>
</sect3>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.TERMS">
<title>Parameters affecting how we generate terms </title>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTRIPCHARS">
<term><varname>indexStripChars</varname></term>
<listitem><para>Decide if we store
character case and diacritics in the index. If we do,
searches sensitive to case and diacritics can be performed, but the index
will be bigger, and some marginal weirdness may sometimes occur. The
default is a stripped index. When using multiple indexes for a search,
this parameter must be defined identically for all. Changing the value
implies an index reset.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NONUMBERS">
<term><varname>nonumbers</varname></term>
<listitem><para>Decides if terms will be
generated for numbers. For example "123", "1.5e6",
192.168.1.4, would not be indexed if nonumbers is set ("value123" would
still be). Numbers are often quite interesting to search for, and this
should probably not be set except for special situations, i.e., scientific
documents with huge amounts of numbers in them, where setting nonumbers
will reduce the index size. This can only be set for a whole index, not
for a subtree.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DEHYPHENATE">
<term><varname>dehyphenate</varname></term>
<listitem><para>Determines if we index
'coworker' also when the input is 'co-worker'. This is new
in version 1.22, and on by default. Setting the variable to off allows
restoring the previous behaviour.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK">
<term><varname>nocjk</varname></term>
<listitem><para>Decides if specific East Asian
(Chinese Korean Japanese) characters/word splitting is turned
off. This will save a small amount of CPU if you have no CJK
documents. If your document base does include such text but you are not
interested in searching it, setting nocjk may be a
significant time and space saver.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.CJKNGRAMLEN">
<term><varname>cjkngramlen</varname></term>
<listitem><para>This lets you adjust the size of
n-grams used for indexing CJK text. The default value of 2 is
probably appropriate in most cases. A value of 3 would allow more precision
and efficiency on longer words, but the index will be approximately twice
as large.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTEMMINGLANGUAGES">
<term><varname>indexstemminglanguages</varname></term>
<listitem><para>Languages for which to create stemming expansion
data. Stemmer names can be found by executing 'recollindex
-l', or this can also be set from a list in the GUI.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DEFAULTCHARSET">
<term><varname>defaultcharset</varname></term>
<listitem><para>Default character
set. This is used for files which do not contain a
character set definition (e.g.: text/plain). Values found inside files,
e.g. a 'charset' tag in HTML documents, will override it. If this is not
set, the default character set is the one defined by the NLS environment
($LC_ALL, $LC_CTYPE, $LANG), or ultimately iso-8859-1 (cp-1252 in fact).
If for some reason you want a general default which does not match your
LANG and is not 8859-1, use this variable. This can be redefined for any
sub-directory.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.UNAC_EXCEPT_TRANS">
<term><varname>unac_except_trans</varname></term>
<listitem><para>A list of characters,
encoded in UTF-8, which should be handled specially
when converting text to unaccented lowercase. For
example, in Swedish, the letter a with diaeresis has full alphabet
citizenship and should not be turned into an a.
Each element in the space-separated list has the special character as
first element and the translation following. The handling of both the
lowercase and upper-case versions of a character should be specified, as
membership in the list will turn off both standard accent and case
processing. The value is global and affects both indexing and querying.
Examples:
Swedish:
unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå
. German:
unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
In French, you probably want to decompose oe and ae and nobody would type
a German ß
unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
. The default for all until someone protests follows. These decompositions
are not performed by unac, but it is unlikely that someone would type the
composed forms in a search.
unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAILDEFCHARSET">
<term><varname>maildefcharset</varname></term>
<listitem><para>Overrides the default
character set for email messages which don't specify
one. This is mainly useful for readpst (libpst) dumps,
which are utf-8 but do not say so.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.LOCALFIELDS">
<term><varname>localfields</varname></term>
<listitem><para>Set fields on all files
(usually of a specific fs area). Syntax is the usual:
name = value ; attr1 = val1 ; [...]
value is empty so this needs an initial semi-colon. This is useful, e.g.,
for setting the rclaptg field for application selection inside
mimeview.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TESTMODIFUSEMTIME">
<term><varname>testmodifusemtime</varname></term>
<listitem><para>Use mtime instead of
ctime to test if a file has been modified. The time is used
in addition to the size, which is always used.
Setting this can reduce re-indexing on systems where extended attributes
are used (by some other application), but not indexed, because changing
extended attributes only affects ctime.
Notes:
- This may prevent detection of change in some marginal file rename cases
(the target would need to have the same size and mtime).
- You should probably also set noxattrfields to 1 in this case, except if
you still prefer to perform xattr indexing, for example if the local
file update pattern makes it of value (as in general, there is a risk
for pure extended attributes updates without file modification to go
undetected). Perform a full index reset after changing this.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOXATTRFIELDS">
<term><varname>noxattrfields</varname></term>
<listitem><para>Disable extended attributes
conversion to metadata fields. This probably needs to be
set if testmodifusemtime is set.</para></listitem></varlistentry>
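<!-- Illustrative recoll.conf sketch combining the two entries above, for a
     system where another application updates extended attributes that you
     do not want to index. As the text notes, a full index reset is needed
     after changing testmodifusemtime.

     testmodifusemtime = 1
     noxattrfields = 1
-->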
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.METADATACMDS">
<term><varname>metadatacmds</varname></term>
<listitem><para>Define commands to
gather external metadata, e.g. tmsu tags.
There can be several entries, separated by semi-colons, each defining
which field name the data goes into and the command to use. Don't forget the
initial semi-colon. All the field names must be different. You can use
aliases in the "fields" file if necessary.
As a not too pretty hack conceded to convenience, any field name
beginning with "rclmulti" will be taken as an indication that the command
returns multiple field values inside a text blob formatted as a recoll
configuration file ("fieldname = fieldvalue" lines). The rclmultixx name
will be ignored, and field names and values will be parsed from the data.
Example: metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f
</para></listitem></varlistentry>
</sect3>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.STORE">
<title>Parameters affecting where and how we store things </title>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.CACHEDIR">
<term><varname>cachedir</varname></term>
<listitem><para>Top directory for Recoll data. Recoll data
directories are normally located relative to the configuration directory
(e.g. ~/.recoll/xapiandb, ~/.recoll/mboxcache). If 'cachedir' is set, the
directories are stored under the specified value instead (e.g. if
cachedir is ~/.cache/recoll, the default dbdir would be
~/.cache/recoll/xapiandb). This affects dbdir, webcachedir,
mboxcachedir, aspellDicDir, which can still be individually specified to
override cachedir. Note that if you have multiple configurations, each
must have a different cachedir, there is no automatic computation of a
subpath under cachedir.</para></listitem></varlistentry>
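<!-- Illustrative recoll.conf sketch for the cachedir entry above. With this
     value and no explicit dbdir, the index would live under
     ~/.cache/recoll/xapiandb. The second directory name is hypothetical and
     shows that another configuration needs its own, different value.

     In the first configuration:
     cachedir = ~/.cache/recoll

     In a second configuration:
     cachedir = ~/.cache/recoll-work
-->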
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXFSOCCUPPC">
<term><varname>maxfsoccuppc</varname></term>
<listitem><para>Maximum file system occupation
over which we stop indexing. The value is a percentage,
corresponding to what the "Capacity" df output column shows. The default
value is 0, meaning no checking.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB">
<term><varname>xapiandb</varname></term>
<listitem><para>Xapian database directory
location. This will be created on first indexing. If the
value is not an absolute path, it will be interpreted as relative to
cachedir if set, or the configuration directory (-c argument or
$RECOLL_CONFDIR). If nothing is specified, the default is then
~/.recoll/xapiandb/</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXSTATUSFILE">
<term><varname>idxstatusfile</varname></term>
<listitem><para>Name of the scratch file where the indexer process updates its
status. Default: idxstatus.txt inside the configuration
directory.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MBOXCACHEDIR">
<term><varname>mboxcachedir</varname></term>
<listitem><para>Directory location for storing mbox message offsets cache
files. This is normally 'mboxcache' under cachedir if set,
or else under the configuration directory, but it may be useful to share
a directory between different configurations.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MBOXCACHEMINMBS">
<term><varname>mboxcacheminmbs</varname></term>
<listitem><para>Minimum mbox file size over which we cache the offsets. There is really no sense in caching offsets for small files. The
default is 5 MB.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBCACHEDIR">
<term><varname>webcachedir</varname></term>
<listitem><para>Directory where we store the archived web pages. This is only used by the web history indexing code.
Default: cachedir/webcache if cachedir is set, else
$RECOLL_CONFDIR/webcache</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBCACHEMAXMBS">
<term><varname>webcachemaxmbs</varname></term>
<listitem><para>Maximum size in MB of the Web archive. This is only used by the web history indexing code.
Default: 40 MB.
Reducing the size will not physically truncate the file.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBQUEUEDIR">
<term><varname>webqueuedir</varname></term>
<listitem><para>The path to the Web indexing queue. This is
hard-coded in the plugin as ~/.recollweb/ToIndex so there should be no
need or possibility to change it.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLDICDIR">
<term><varname>aspellDicDir</varname></term>
<listitem><para>Aspell dictionary storage directory location. The
aspell dictionary (aspdict.(lang).rws) is normally stored in the
directory specified by cachedir if set, or under the configuration
directory.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FILTERSDIR">
<term><varname>filtersdir</varname></term>
<listitem><para>Directory location for executable input handlers. If
RECOLL_FILTERSDIR is set in the environment, we use it instead. Defaults
to $prefix/share/recoll/filters. Can be redefined for
subdirectories.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ICONSDIR">
<term><varname>iconsdir</varname></term>
<listitem><para>Directory location for icons. The only reason to
change this would be if you want to change the icons displayed in the
result list. Defaults to $prefix/share/recoll/images</para></listitem></varlistentry>
</sect3>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.PERFS">
<title>Parameters affecting indexing performance and resource usage </title>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXFLUSHMB">
<term><varname>idxflushmb</varname></term>
<listitem><para>Threshold (megabytes of new data) where we flush from memory to
disk index. Setting this allows some control over memory
usage by the indexer process. A value of 0 means no explicit flushing,
which lets Xapian perform its own thing, meaning flushing every
$XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted: as memory
usage depends on average document size, not only document count, the
Xapian approach is not very useful, and you should let Recoll manage
the flushes. The default value of idxflushmb is 10 MB, and may be a bit
low. If you are looking for maximum speed, you may want to experiment
with values between 20 and
80. In my experience, values beyond 100 are always counterproductive. If
you find otherwise, please drop me a note.</para></listitem></varlistentry>
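<!-- Illustrative recoll.conf sketch for the idxflushmb entry above. The
     value is only a starting point picked from the 20 to 80 range suggested
     in the text. It is not a measured optimum.

     idxflushmb = 50
-->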
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FILTERMAXSECONDS">
<term><varname>filtermaxseconds</varname></term>
<listitem><para>Maximum external filter execution time in
seconds. Default 1200 (20 min). Set to 0 for no limit. This
is mainly to avoid infinite loops in postscript files
(loop.ps)</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FILTERMAXMBYTES">
<term><varname>filtermaxmbytes</varname></term>
<listitem><para>Maximum virtual memory space for filter processes
(setrlimit(RLIMIT_AS)), in megabytes. Note that this
includes any mapped libs (there is no reliable Linux way to limit the
data space only), so we need to be a bit generous here. Anything over
2000 will be ignored on 32-bit machines.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.THRQSIZES">
<term><varname>thrQSizes</varname></term>
<listitem><para>Stage input queues configuration. There are three
internal queues in the indexing pipeline stages (file data extraction,
terms generation, index update). This parameter defines the queue depths
for each stage (three integer values). If a value of -1 is given for a
given stage, no queue is used, and the thread will go on performing the
next stage. In practice, deep queues have not been shown to increase
performance. Default: a value of 0 for the first queue tells Recoll to
perform autoconfiguration based on the detected number of CPUs (no need
for the two other values in this case). Use thrQSizes = -1 -1 -1 to
disable multithreading entirely.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.THRTCOUNTS">
<term><varname>thrTCounts</varname></term>
<listitem><para>Number of threads used for each indexing stage. The
three stages are: file data extraction, terms generation, index
update. The use of the counts is also controlled by some special values
in thrQSizes: if the first queue depth is 0, all counts are ignored
(autoconfigured); if a value of -1 is used for a queue depth, the
corresponding thread count is ignored. It makes no sense to use a value
other than 1 for the last stage because updating the Xapian index is
necessarily single-threaded (and protected by a mutex).</para></listitem></varlistentry>
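<!-- Illustrative recoll.conf sketch for the thrQSizes and thrTCounts entries
     above. The numbers are assumptions chosen for the example: shallow
     queues for the three stages (file data extraction, terms generation,
     index update), several extraction threads, and a single thread for the
     index update stage.

     thrQSizes = 2 2 2
     thrTCounts = 4 2 1

     Alternative, to disable multithreading entirely as described in the
     text:

     thrQSizes = -1 -1 -1
-->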
</sect3>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.MISC">
<title>Miscellaneous parameters </title>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.LOGLEVEL">
<term><varname>loglevel</varname></term>
<listitem><para>Log file verbosity 1-6. A value of 2 will print
only errors and warnings. 3 will print information like document updates,
4 is quite verbose and 6 very verbose.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.LOGFILENAME">
<term><varname>logfilename</varname></term>
<listitem><para>Log file destination. Use 'stderr' (default) to write to the
console. </para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXLOGLEVEL">
<term><varname>idxloglevel</varname></term>
<listitem><para>Override loglevel for the indexer. </para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXLOGFILENAME">
<term><varname>idxlogfilename</varname></term>
<listitem><para>Override logfilename for the indexer. </para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DAEMLOGLEVEL">
<term><varname>daemloglevel</varname></term>
<listitem><para>Override loglevel for the indexer in real time
mode. The default is to use the idx... values if set, else
the log... values.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DAEMLOGFILENAME">
<term><varname>daemlogfilename</varname></term>
<listitem><para>Override logfilename for the indexer in real time
mode. The default is to use the idx... values if set, else
the log... values.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXRUNDIR">
<term><varname>idxrundir</varname></term>
<listitem><para>Indexing process current directory. The input
handlers sometimes leave temporary files in the current directory, so it
makes sense to have recollindex chdir to some temporary directory. If the
value is empty, the current directory is not changed. If the
value is (literal) tmp, we use the temporary directory as set by the
environment (RECOLL_TMPDIR else TMPDIR else /tmp). If the value is an
absolute path to a directory, we go there.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.CHECKNEEDRETRYINDEXSCRIPT">
<term><varname>checkneedretryindexscript</varname></term>
<listitem><para>Script used to heuristically check if we need to retry indexing
files which previously failed. The default script checks
the modified dates on /usr/bin and /usr/local/bin. A relative path will
be looked up in the filters dirs, then in the path. Use an absolute path
to do otherwise.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.RECOLLHELPERPATH">
<term><varname>recollhelperpath</varname></term>
<listitem><para>Additional places to search for helper executables. This is only used on Windows for now.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXABSMLEN">
<term><varname>idxabsmlen</varname></term>
<listitem><para>Length of abstracts we store while indexing. Recoll stores an abstract for each indexed file.
The text can come from an actual 'abstract' section in the
document or will just be the beginning of the document. It is stored in
the index so that it can be displayed inside the result lists without
decoding the original file. The idxabsmlen parameter
defines the size of the stored abstract. The default value is 250
bytes. The search interface gives you the choice to display this stored
text or a synthetic abstract built by extracting text around the search
terms. If you always prefer the synthetic abstract, you can reduce this
value and save a little space.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXMETASTOREDLEN">
<term><varname>idxmetastoredlen</varname></term>
<listitem><para>Truncation length of stored metadata fields. This
does not affect indexing (the whole field is processed anyway), just the
amount of data stored in the index for the purpose of displaying fields
inside result lists or previews. The default value is 150 bytes which
may be too low if you have custom fields.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLLANGUAGE">
<term><varname>aspellLanguage</varname></term>
<listitem><para>Language definitions to use when creating the aspell
dictionary. The value must match a set of aspell language
definition files. You can type "aspell dicts" to see a list. The default
if this is not set is to use the NLS environment to guess the
value.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLADDCREATEPARAM">
<term><varname>aspellAddCreateParam</varname></term>
<listitem><para>Additional option and parameter to aspell dictionary creation
command. Some aspell packages may need an additional option
(e.g. on Debian Jessie: --local-data-dir=/usr/lib/aspell). See Debian bug
772415.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLKEEPSTDERR">
<term><varname>aspellKeepStderr</varname></term>
<listitem><para>Set this to have a look at aspell dictionary creation
errors. There are always many, so this is mostly for
debugging.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOASPELL">
<term><varname>noaspell</varname></term>
<listitem><para>Disable aspell use. The aspell dictionary generation
takes time, and some combinations of aspell version, language, and local
terms, result in aspell crashing, so it sometimes makes sense to just
disable the thing.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONAUXINTERVAL">
<term><varname>monauxinterval</varname></term>
<listitem><para>Auxiliary database update interval. The real time
indexer only updates the auxiliary databases (stemdb, aspell)
periodically, because it would be too costly to do it for every document
change. The default period is one hour.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONIXINTERVAL">
<term><varname>monixinterval</varname></term>
<listitem><para>Minimum interval (seconds) between processings of the indexing
queue. The real time indexer does not process each event
when it comes in, but lets the queue accumulate, to diminish overhead and
to aggregate multiple events affecting the same file. Default: 30
seconds.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONDELAYPATTERNS">
<term><varname>mondelaypatterns</varname></term>
<listitem><para>Timing parameters for the real time indexing. Definitions for files which get a longer delay before reindexing
is allowed. This is for fast-changing files, that should only be
reindexed once in a while. A list of wildcardPattern:seconds pairs. The
patterns are matched with fnmatch(pattern, path, 0). You can quote entries
containing white space with double quotes (quote the whole entry, not the
pattern). The default is empty.
Example: mondelaypatterns = *.log:20 "*with spaces.*:30"</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONIONICECLASS">
<term><varname>monioniceclass</varname></term>
<listitem><para>ionice class for the real time indexing process. On platforms where this is supported. The default value is
3.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONIONICECLASSDATA">
<term><varname>monioniceclassdata</varname></term>
<listitem><para>ionice class parameter for the real time indexing process. On platforms where this is supported. The default is
empty.</para></listitem></varlistentry>
</sect3>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.QUERY">
<title>Query-time parameters (no impact on the index) </title>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.AUTODIACSENS">
<term><varname>autodiacsens</varname></term>
<listitem><para>auto-trigger diacritics sensitivity (raw index only). If the index is not stripped, decide if we automatically trigger
diacritics sensitivity if the search term has accented characters (not in
unac_except_trans). Else you need to use the query language and the "D"
modifier to specify diacritics sensitivity. Default is no.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.AUTOCASESENS">
<term><varname>autocasesens</varname></term>
<listitem><para>auto-trigger case sensitivity (raw index only). If
the index is not stripped (see indexStripChars), decide if we
automatically trigger character case sensitivity if the search term has
upper-case characters in any but the first position. Else you need to use
the query language and the "C" modifier to specify character-case
sensitivity. Default is yes.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMEXPAND">
<term><varname>maxTermExpand</varname></term>
<listitem><para>Maximum query expansion count
for a single term (e.g.: when using wildcards). This only
affects queries, not indexing. We used to not limit this at all (except
for filenames where the limit was too low at 1000), but it is
unreasonable with a big index. Default 10000.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXXAPIANCLAUSES">
<term><varname>maxXapianClauses</varname></term>
<listitem><para>Maximum number of clauses
we add to a single Xapian query. This only affects queries,
not indexing. In some cases, the result of term expansion can be
multiplicative, and we want to avoid eating all the memory. Default
50000.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SNIPPETMAXPOSWALK">
<term><varname>snippetMaxPosWalk</varname></term>
<listitem><para>Maximum number of positions we walk while populating a snippet for
the result list. The default of 1,000,000 may be
insufficient for very big documents; the consequence would be snippets
with possibly meaning-altering missing words.</para></listitem></varlistentry>
</sect3>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.PDF">
<title>Parameters for the PDF input script </title>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCR">
<term><varname>pdfocr</varname></term>
<listitem><para>Attempt OCR of PDF files with no text content if both tesseract and
pdftoppm are installed. The default is off because OCR is so
very slow.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH">
<term><varname>pdfattach</varname></term>
<listitem><para>Enable PDF attachment extraction by executing pdftk (if
available). This is
normally disabled, because it does slow down PDF indexing a bit even if
not one attachment is ever found.</para></listitem></varlistentry>
</sect3>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.SPECLOCATIONS">
<title>Parameters set for specific locations </title>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MHMBOXQUIRKS">
<term><varname>mhmboxquirks</varname></term>
<listitem><para>Enable thunderbird/mozilla-seamonkey mbox format quirks. Set this for the directory where the email mbox files are
stored.</para></listitem></varlistentry>
</sect3>
</sect2>

File diff suppressed because it is too large

View file

@ -5651,880 +5651,10 @@ thesame = "some string with spaces"
</sect2> </sect2>
<sect2 id="RCL.INSTALL.CONFIG.RECOLLCONF"> <!-- <sect2 id="RCL.INSTALL.CONFIG.RECOLLCONF"> -->
<title>The main configuration file, recoll.conf</title> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
href="recoll.conf.xml" />
<para><filename>recoll.conf</filename> is the main
configuration file. It defines things like
what to index (top directories and things to ignore), and the
default character set to use for document types which do not
specify it internally.</para>
<para>The default configuration will index your home
directory. If this is not appropriate, start
<command>recoll</command> to create a blank
configuration, click <guimenu>Cancel</guimenu>, and edit
the configuration file before restarting the command. This
will start the initial indexing, which may take some time.</para>
<para>Most of the following parameters can be changed from the
<guilabel>Index Configuration</guilabel> menu in the
<command>recoll</command> interface. Some can only be set by
editing the configuration file.</para>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.FILES">
<title>Parameters affecting what documents we index:</title>
<variablelist>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TOPDIRS">
<term><varname>topdirs</varname></term>
<listitem><para>Specifies the list of directories or files to
index (recursively for directories). You can use symbolic links
as elements of this list. See the
<varname>followLinks</varname> option about following symbolic links
found under the top elements (not followed by default).</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>skippedNames</varname></term>
<listitem>
<para>A space-separated list of wildcard patterns for
names of files or directories that should be completely
ignored. The list defined in the default file is: </para>
<programlisting>
skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
*~ .beagle .git .hg .bzr loop.ps .xsession-errors \
.recoll* xapiandb recollrc recoll.conf
</programlisting>
<para>The list can be redefined at any sub-directory in the
indexed area.</para>
<para>The top-level directories are not affected by this
list (that is, a directory in <varname>topdirs</varname>
might match and would still be indexed).</para>
<para>The list in the default configuration does not
exclude hidden directories (names beginning with a
dot), which means that it may index quite a few things
that you do not want. On the other hand, email user
agents like <application>thunderbird</application>
usually store messages in hidden directories, and you
probably want this indexed. One possible solution is to
have <filename>.*</filename> in
<varname>skippedNames</varname>, and add things like
<filename>~/.thunderbird</filename> or
<filename>~/.evolution</filename> in
<varname>topdirs</varname>.</para>
<para>Not even the file names are indexed for patterns
in this list. See the
<varname>noContentSuffixes</varname> variable for an alternative
approach which indexes the file names.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>noContentSuffixes</varname></term>
<listitem><para>This is a list of file name endings (not
wildcard expressions, nor dot-delimited suffixes). Only the
names of matching files will be indexed (no attempt at MIME
type identification, no decompression, no content
indexing). This can be redefined for
subdirectories, and edited from the GUI. The default value is:
<programlisting>
noContentSuffixes = .md5 .map \
.o .lib .dll .a .sys .exe .com \
.mpp .mpt .vsd \
.img .img.gz .img.bz2 .img.xz .image .image.gz .image.bz2 .image.xz \
.dat .bak .rdf .log.gz .log .db .msf .pid \
,v ~ #
</programlisting>
</para></listitem>
</varlistentry>
<varlistentry><term><varname>skippedPaths</varname> and
<varname>daemSkippedPaths</varname> </term>
<listitem>
<para>A space-separated list of patterns for
<emphasis>paths</emphasis> of files or directories that should be skipped.
There is no default in the sample configuration file,
but the code always adds the configuration and database
directories in there.</para>
<para><varname>skippedPaths</varname> is used both by
batch and real time
indexing. <varname>daemSkippedPaths</varname> can be
used to specify things that should be indexed at
startup, but not monitored.</para>
<para>Example of use for skipping text files only in a
specific directory:</para>
<programlisting>
skippedPaths = ~/somedir/*.txt
</programlisting>
</listitem>
</varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHSFNMPATHNAME">
<term><varname>skippedPathsFnmPathname</varname></term>
<listitem><para>The values in the
<varname>*skippedPaths</varname> variables are matched by
default with <literal>fnmatch(3)</literal>, with the
FNM_PATHNAME flag. This means that '/'
characters must be matched explicitly. You can set
<varname>skippedPathsFnmPathname</varname> to 0 to disable
the use of FNM_PATHNAME (meaning that /*/dir3 will match
/dir1/dir2/dir3).</para>
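<para>As an illustration only (the directory names are hypothetical,
taken from the example above), the following sketch would let the
pattern match <filename>/dir1/dir2/dir3</filename>, because
<literal>*</literal> is then allowed to match '/' characters:</para>
<programlisting>
skippedPaths = /*/dir3
skippedPathsFnmPathname = 0
</programlisting>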
</listitem>
</varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ZIPSKIPPEDNAMES">
<term><varname>zipSkippedNames</varname></term>
<listitem><para>A space-separated list of patterns for
names of files or directories that should be ignored
inside zip archives. This is used directly by the zip
handler, and has a function similar to skippedNames, but
works independently. Can be redefined for filesystem
subdirectories. For versions up to 1.19, you will need
to update the Zip handler and install a supplementary
Python module. The details are
described <ulink url="https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members">on
the &RCL; wiki</ulink>.
</para></listitem>
</varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FOLLOWLINKS">
<term><varname>followLinks</varname></term>
<listitem><para>Specifies if the indexer should follow
symbolic links while walking the file tree. The default is
to ignore symbolic links to avoid multiple indexing of
linked files. No effort is made to avoid duplication when
this option is set to true. This option can be set
individually for each of the <varname>topdirs</varname>
members by using sections. It cannot be changed below the
<varname>topdirs</varname> level.</para>
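<para>A minimal sketch of a per-<varname>topdirs</varname> setting,
assuming two hypothetical top directories and the bracketed section
syntax used for subtree-specific values in this file. Links would be
followed only under the second directory:</para>
<programlisting>
topdirs = /home/me/docs /home/me/shared

[/home/me/shared]
followLinks = 1
</programlisting>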
</listitem>
</varlistentry>
<varlistentry><term><varname>indexedmimetypes</varname></term>
<listitem><para>&RCL; normally indexes any file which it
knows how to read. This list lets you restrict the indexed
MIME types to what you specify. If the variable is
unspecified or the list empty (the default), all supported
types are processed. Can be redefined for subdirectories.</para>
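<para>For example, a configuration restricted to HTML and PDF
documents might look like the following (the choice of types is only
an illustration):</para>
<programlisting>
indexedmimetypes = text/html application/pdf
</programlisting>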
</listitem>
</varlistentry>
<varlistentry><term><varname>excludedmimetypes</varname></term>
<listitem><para> This list lets you exclude some MIME types from
indexing. Can be redefined for subdirectories.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>compressedfilemaxkbs</varname></term>
<listitem><para>Size limit for compressed (.gz or .bz2)
files. These need to be decompressed in a temporary
directory for identification, which can be very wasteful
if 'uninteresting' big compressed files are present.
Negative means no limit, 0 means no processing of any
compressed file. Defaults to -1.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>textfilemaxmbs</varname></term>
<listitem><para>Maximum size for text files. Very big text
files are often uninteresting logs. Set to -1 to disable
(default 20MB).</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>textfilepagekbs</varname></term>
<listitem><para>If set to other than -1, text files will be
indexed as multiple documents of the given page size. This may
be useful if you do want to index very big text files as it
will both reduce memory usage at index time and help with
loading data to the preview window. A size of a few megabytes
would seem reasonable (default: 1MB).</para>
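<para>As a sketch, assuming you do want to index large text files,
the size limit could be raised and the page size set to roughly 2 MB
(the values are illustrative, not recommendations):</para>
<programlisting>
textfilemaxmbs = 100
textfilepagekbs = 2000
</programlisting>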
</listitem>
</varlistentry>
<varlistentry><term><varname>membermaxkbs</varname></term>
<listitem><para>This defines the maximum size in kilobytes for
an archive member (zip, tar or rar at the moment). Bigger
entries will be skipped.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>indexallfilenames</varname></term>
<listitem><para>&RCL; indexes file names in a special
section of the database to allow specific file name
searches using wildcards. This parameter decides if
file name indexing is performed only for files with MIME
types that would qualify them for full text indexing, or
for all files inside the selected subtrees, independently of
MIME type.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>usesystemfilecommand</varname></term>
<listitem><para>Decide if we execute a system command
(<command>file</command> <option>-i</option> by default)
as a final step for determining the MIME type for a file
(the main procedure uses suffix associations as defined in
the <filename>mimemap</filename> file). This can be useful
for files with suffix-less names, but it will also cause
the indexing of many bogus "text" files.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>systemfilecommand</varname></term>
<listitem><para>Command to use for MIME type
determination if <literal>usesystemfilecommand</literal> is
set. Recent versions of <command>xdg-mime</command> sometimes
work better than <command>file</command>.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>processwebqueue</varname></term>
<listitem><para>If this is set, process the directory where
Web browser plugins copy visited pages for indexing.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>webqueuedir</varname></term>
<listitem><para>The path to the web indexing queue. This is
hard-coded in the Firefox plugin as
<filename>~/.recollweb/ToIndex</filename> so there should be no
need to change it.</para>
</listitem>
</varlistentry>
</variablelist>
</sect3>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.TERMS">
<title>Parameters affecting how we generate terms:</title>
<para>Changing some of these parameters will imply a full
reindex. Also, when using multiple indexes, it may not make sense
to search indexes that don't share the values for these parameters,
because they usually affect both search and index operations.</para>
<variablelist>
<varlistentry><term><varname>indexStripChars</varname></term>
<listitem><para>Decide if we strip characters of diacritics and
convert them to lower-case before terms are indexed. If we
don't, searches sensitive to case and diacritics can be
performed, but the index will be bigger, and some marginal
weirdness may sometimes occur. The default is a stripped
index (<literal>indexStripChars = 1</literal>) for
now. When using multiple indexes for a search,
this parameter must be defined identically for
all. Changing the value implies an index reset.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>maxTermExpand</varname></term>
<listitem><para>Maximum expansion count for a single term (e.g.:
when using wildcards). The default of 10000 is reasonable and
will avoid queries that appear frozen while the engine is
walking the term list.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>maxXapianClauses</varname></term>
<listitem><para>Maximum number of elementary clauses we can add
to a single Xapian query. In some cases, the result of term
expansion can be multiplicative, and we want to avoid using
excessive memory. The default of 100 000 should be both
high enough in most cases and compatible with current
typical hardware configurations.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>nonumbers</varname></term>
<listitem><para>If this is set to true, no terms will be generated
for numbers. For example "123", "1.5e6", 192.168.1.4, would not
be indexed ("value123" would still be). Numbers are often quite
interesting to search for, and this should probably not be set
except for special situations, such as scientific documents with huge
amounts of numbers in them. This can only be set for a whole
index, not for a subtree.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>dehyphenate</varname></term>
<listitem><para>Determines if, given an input of
<literal>co-worker</literal>, we add a term for
<literal>coworker</literal>. This possibility is new in version
1.22, and on by default. Setting the variable to off allows
restoring the previous behaviour.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>nocjk</varname></term>
<listitem><para>If this is set to true, specific East Asian
(Chinese Korean Japanese) characters/word splitting is
turned off. This will save a small amount of cpu if you
have no CJK documents. If your document base does include
such text but you are not interested in searching it,
setting <varname>nocjk</varname> may be a significant time
and space saver.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>cjkngramlen</varname></term>
<listitem><para>This lets you adjust the size of n-grams
used for indexing CJK text. The default value of 2 is
probably appropriate in most cases. A value of 3 would
allow more precision and efficiency on longer words, but
the index will be approximately twice as large.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>indexstemminglanguages</varname></term>
<listitem><para>A list of languages for which the stem
expansion databases will be built. See <citerefentry>
<refentrytitle>recollindex</refentrytitle>
<manvolnum>1</manvolnum> </citerefentry> or use the
<command>recollindex</command> <option>-l</option> command
for possible values. You can add a stem expansion database
for a different language by using
<command>recollindex</command> <option>-s</option>, but it
will be deleted during the next indexing. Only languages
listed in the configuration file are permanent.</para>
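<para>A short sketch, assuming that <literal>english</literal> and
<literal>french</literal> are among the values reported by
<command>recollindex</command> <option>-l</option> on your
system:</para>
<programlisting>
indexstemminglanguages = english french
</programlisting>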
</listitem>
</varlistentry>
<varlistentry><term><varname>defaultcharset</varname></term>
<listitem><para>The name of the character set used for
files that do not contain a character set definition (e.g.
plain text files). This can be redefined for any
sub-directory. If it is not set at all, the character set
used is the one defined by the NLS environment (
<envar>LC_ALL</envar>, <envar>LC_CTYPE</envar>,
<envar>LANG</envar>), or <literal>iso8859-1</literal>
if nothing is set.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>unac_except_trans</varname></term>
<listitem><para>This is a list of characters, encoded in UTF-8,
which should be handled specially when converting text to
unaccented lowercase. For example, in Swedish, the letter
<literal>a with diaeresis</literal> has full alphabet
citizenship and should not be turned into an
<literal>a</literal>. Each element in the space-separated list
has the special character as first element and the translation
following. The handling of both the lowercase and upper-case
versions of a character should be specified, as membership in
the list will turn off both standard accent and case
processing. Example for Swedish:</para>
<programlisting>
unac_except_trans = åå Åå ää Ää öö Öö
</programlisting>
<para>Note that the translation is not limited to a single
character, you could very well have something like
<literal>üue</literal> in the list.</para>
<para>The default value set for
<literal>unac_except_trans</literal> can't be listed here
because I have trouble with SGML and UTF-8, but it only
contains ligature decompositions: German ss, oe, ae, fi,
fl.</para>
<para>This parameter can't be defined for subdirectories, it
is global, because there is no way to do otherwise when
querying. If you have document sets which would need different
values, you will have to index and query them separately.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>maildefcharset</varname></term>
<listitem><para>This can be used to define the default
character set specifically for email messages which don't
specify it. This is mainly useful for readpst (libpst) dumps,
which are utf-8 but do not say so.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>localfields</varname></term>
<listitem><para>This allows setting fields for all documents
under a given directory. Typical usage would be to set an
"rclaptg" field, to be used in <filename>mimeview</filename> to
select a specific viewer. If several fields are to be set, they
should be separated with a semi-colon (';') character, which there
is currently no way to escape. Also note the initial semi-colon.
Example:
<literal>localfields= ;rclaptg=gnus;other = val</literal>, then
select a specific viewer with
<literal>mimetype|tag=...</literal> in
<filename>mimeview</filename>.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>testmodifusemtime</varname></term>
<listitem><para>If true, use mtime instead of default ctime to
determine if a file has been modified (in addition to
size, which is always used). Setting this can reduce
re-indexing on systems where extended attributes are
modified (by some other application), but not indexed
(changing extended attributes only affects
ctime). Notes:
<itemizedlist>
<listitem><para>This may prevent detection of change
in some marginal file rename cases (the target would
need to have the same size and
mtime).</para></listitem>
<listitem><para>You should probably also set
noxattrfields to 1 in this case, except if you still
prefer to perform xattr indexing, for example if the
local file update pattern makes it of value (as in
general, there is a risk for pure extended attributes
updates without file modification to go
undetected).</para></listitem>
</itemizedlist>
Perform a full index reset after changing the value of
this parameter.
</para></listitem>
</varlistentry>
<varlistentry><term><varname>noxattrfields</varname></term>
<listitem><para>Recoll versions 1.19 and later
automatically translate file extended attributes into
document fields (to be processed according to the
parameters from the <filename>fields</filename>
file). Setting this variable to 1 will disable the
behaviour.</para></listitem>
</varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.METADATACMDS">
<term><varname>metadatacmds</varname></term>
<listitem><para>This allows executing external commands
for each file and storing the output in &RCL; document
fields. This could be used for example to index
external tag data. The value is a list of field names
and commands, don't forget an initial
semi-colon. Example:
<programlisting>
[/some/area/of/the/fs]
metadatacmds = ; tags = tmsu tags %f; otherfield = somecmd -xx %f
</programlisting>
</para> <para>As a specially disgusting hack brought by
&RCL; 1.19.7, if a "field name" begins
with <literal>rclmulti</literal>, the data returned by
the command is expected to contain multiple field
values, in configuration file format. This allows
setting several fields by executing a single
command. Example:
<programlisting>
metadatacmds = ; rclmulti1 = somecmd %f
</programlisting>
If <literal>somecmd</literal> returns data in the form
of:
<programlisting>
field1 = value1
field2 = value for field2
</programlisting>
<literal>field1</literal>
and <literal>field2</literal> will be set inside the
document metadata.</para>
</listitem>
</varlistentry>
</variablelist>
</sect3>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.STORAGE">
<title>Parameters affecting where and how we store things:</title>
<variablelist>
<varlistentry><term><varname>cachedir</varname></term>
<listitem>
<para>When not explicitly specified, the &RCL; data directories
are stored relative to the configuration directory. If
<literal>cachedir</literal> is set, the directories are stored
under the specified value instead (e.g. if
<literal>cachedir</literal> is set to
<filename>~/.cache/recoll</filename>, the default
<literal>dbdir</literal> would be
<filename>~/.cache/recoll/xapiandb</filename> instead of
<filename>~/.recoll/xapiandb</filename> ). This affects the
default values for <literal>dbdir</literal>,
<literal>webcachedir</literal>,
<literal>mboxcachedir</literal>, and
<literal>aspellDicDir</literal>, which can still be
individually specified to override
<literal>cachedir</literal>. Note that if you have multiple
configurations, each must have a different
<literal>cachedir</literal>.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>dbdir</varname></term>
<listitem><para>The name of the Xapian data directory. It
will be created if needed when the index is
initialized. If this is not an absolute path, it will be
interpreted relative to the configuration directory. The
value can have embedded spaces but leading or trailing
spaces will be trimmed. You cannot use quotes here.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>idxstatusfile</varname></term>
<listitem><para>The name of the scratch file where the indexer
process updates its status. Default:
<filename>idxstatus.txt</filename> inside the configuration
directory.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>maxfsoccuppc</varname></term>
<listitem><para>Maximum file system occupation before we
stop indexing. The value is a percentage, corresponding to
what the "Capacity" df output column shows. The default
value is 0, meaning no checking. </para>
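<para>For example, the following sketch would stop indexing once
the file system is 90% full. The threshold is an arbitrary
illustration, not a recommended value:</para>
<programlisting>
# Illustrative threshold: compare with the "Capacity" column of df
maxfsoccuppc = 90
</programlisting>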
</listitem>
</varlistentry>
<varlistentry><term><varname>mboxcachedir</varname></term>
<listitem><para>The directory where mbox message offsets cache
files are held. This is normally $RECOLL_CONFDIR/mboxcache, but
it may be useful to share a directory between different
configurations.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>mboxcacheminmbs</varname></term>
<listitem><para>The minimum mbox file size above which we
cache the offsets. There is really no sense in caching
offsets for small files. The default is 5 MB.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>webcachedir</varname></term>
<listitem><para>This is only used by the web browser
plugin indexing code, and defines where the cache for visited
pages will live. Default:
<filename>$RECOLL_CONFDIR/webcache</filename></para>
</listitem>
</varlistentry>
<varlistentry><term><varname>webcachemaxmbs</varname></term>
<listitem><para>This is only used by the web browser
plugin indexing code, and defines the maximum size for the web
page cache. Default: 40 MB. Quite unfortunately, this is only
taken into account when creating the cache file. You need to
delete the file for a change to be taken into account.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>idxflushmb</varname></term>
<listitem><para>Threshold (megabytes of new text data) at which we
flush from memory to the disk index. Setting this can help control
memory usage. A value of 0 means no explicit flushing, letting
Xapian use its own default, which is to flush every 10000 (or
XAPIAN_FLUSH_THRESHOLD) documents. This gives little control
over memory usage, which also depends on the average document
size. The default value is 10, which is probably a bit low. If
your system usually has free memory, you can try higher values
between 20 and 80. In my experience, values beyond 100 are
always counterproductive.</para>
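<para>As a sketch, a machine which usually has free memory
could try a value in the middle of the suggested range. The exact
number is an assumption and should be adjusted after measuring:</para>
<programlisting>
# Illustrative value: flush to disk after about 50 MB of new text data
idxflushmb = 50
</programlisting>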
</listitem>
</varlistentry>
</variablelist>
</sect3>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXTHREADS">
<title>Parameters affecting multithread processing</title>
<para>The &RCL; indexing process
<command>recollindex</command> can use multiple threads to
speed up indexing on multiprocessor systems. The work done
to index files is divided into several stages, and some of the
stages can be executed by multiple threads. The stages are:
<orderedlist>
<listitem>File system walking: this is always performed by
the main thread.</listitem>
<listitem>File conversion and data extraction.</listitem>
<listitem>Text processing (splitting, stemming,
etc.)</listitem>
<listitem>&XAP; index update.</listitem>
</orderedlist>
</para>
<para>You can also read a
<ulink url="http://www.recoll.org/idxthreads/threadingRecoll.html">
longer document</ulink> about the transformation of
&RCL; indexing to multithreading.</para>
<para>The threads configuration is controlled by two
configuration file parameters.</para>
<variablelist>
<varlistentry><term><varname>thrQSizes</varname></term>
<listitem><para>This variable defines the job input queues
configuration. There are three possible queues for
stages 2, 3 and 4, and this parameter should give the
queue depth for each stage (three integer values). If
a value of -1 is used for a given stage, no queue is
used, and the thread will go on performing the next
stage. In practice, deep queues have not been shown to
increase performance. A value of 0 for the first queue
tells &RCL; to perform autoconfiguration (no need for
the two other values in this case) - this is the
default configuration.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>thrTCounts</varname></term>
<listitem><para>This defines the number of threads used
for each stage. If a value of -1 is used for one of
the queue depths, the corresponding thread count is
ignored. It makes no sense to use a value other than 1
for the last stage because updating the &XAP; index is
necessarily single-threaded (and protected by a
mutex).</para>
</listitem>
</varlistentry>
</variablelist>
<para>The following example would use three queues (of depth 2),
and 4 threads for converting source documents, 2 for
processing their text, and one to update the index. This
proved to be the best configuration on the test system
(quad-processor with multiple disks).
<programlisting>
thrQSizes = 2 2 2
thrTCounts = 4 2 1
</programlisting>
</para>
<para>The following example would use a single queue, and the
complete processing for each document would be performed by
a single thread (several documents will still be processed
in parallel in most cases). The threads will use mutual
exclusion when entering the index update stage. In practice,
the performance would be close to the previous case in
general, but worse in certain cases (e.g. a Zip archive
would be processed purely sequentially), so the previous
approach is preferred. YMMV... The last two values for
<literal>thrTCounts</literal> are ignored.
<programlisting>
thrQSizes = 2 -1 -1
thrTCounts = 6 1 1
</programlisting>
</para>
<para>The following example would disable
multithreading. Indexing will be performed by a single
thread.
<programlisting>
thrQSizes = -1 -1 -1
</programlisting>
</para>
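<para>As described for <varname>thrQSizes</varname>, a value of
0 for the first queue requests autoconfiguration, in which case
the two other values are not needed. This is the default, so
the following line only makes the choice explicit:
<programlisting>
thrQSizes = 0
</programlisting>
</para>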
</sect3>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.MISC">
<title>Miscellaneous parameters:</title>
<variablelist>
<varlistentry><term><varname>autodiacsens</varname></term>
<listitem><para>If the index is not stripped, decide whether we
automatically trigger diacritics sensitivity when the search
term has accented characters (not in
<literal>unac_except_trans</literal>). Otherwise, you need to use
the query language and the <literal>D</literal> modifier to
specify diacritics sensitivity. Default is no.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>autocasesens</varname></term>
<listitem><para>If the index is not stripped, decide whether we
automatically trigger character case sensitivity when the
search term has upper-case characters in any but the first
position. Otherwise, you need to use the query language and the
<literal>C</literal> modifier to specify character-case
sensitivity. Default is yes.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>loglevel,daemloglevel</varname></term>
<listitem><para>Verbosity level for recoll and
recollindex. A value of 4 lists quite a lot of
debug/information messages. 2 only lists errors. The
<literal>daem</literal> version is specific to the indexing monitor
daemon.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>logfilename,
daemlogfilename</varname></term>
<listitem><para>Where the messages should go. 'stderr' can
be used as a special value, and is the default. The
<literal>daem</literal> version is specific to the indexing monitor
daemon.</para>
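<para>A possible combination, given here only as a sketch, is
verbose logging to a file for the interactive programs and
errors only for the monitor daemon. The file paths below are
hypothetical:</para>
<programlisting>
# Hypothetical paths, adjust to your system
loglevel = 4
logfilename = /tmp/recoll.log
daemloglevel = 2
daemlogfilename = /tmp/recollmonitor.log
</programlisting>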
</listitem>
</varlistentry>
<varlistentry><term><varname>checkneedretryindexscript</varname></term>
<listitem><para>This defines the name for a command
executed by <command>recollindex</command> when starting
indexing. If the exit status of the command is 0,
<command>recollindex</command> tries again to index all files
which previously could not be indexed because of data
extraction errors. The default value is a script which
checks if any of the common <filename>bin</filename>
directories have changed (indicating that a helper program
may have been installed).</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>mondelaypatterns</varname></term>
<listitem><para>This allows specifying wildcard path patterns
(processed with fnmatch(3) with flags set to 0), to match files which
change too often and for which a delay should be observed before
re-indexing. This is a space-separated list, each entry being a
pattern and a time in seconds, separated by a colon. You can
use double quotes if a path entry contains white
space. Example:</para>
<programlisting>
mondelaypatterns = *.log:20 "this one has spaces*:10"
</programlisting>
</listitem>
</varlistentry>
<varlistentry><term><varname>monixinterval</varname></term>
<listitem><para>Minimum interval (seconds) for processing the
indexing queue. The real time monitor does not process each
event when it comes in, but will wait this time for the queue
to accumulate, both to reduce overhead and to aggregate
multiple events for the same file. Default: 30 seconds.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>monauxinterval</varname></term>
<listitem><para>Period (in seconds) at which the real time
monitor will regenerate the auxiliary databases (spelling,
stemming) if needed. The default is one hour.</para>
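<para>For instance, a monitor which should work less often
than the defaults could be sketched as follows. Both values are
in seconds and are arbitrary illustrations:</para>
<programlisting>
# Illustrative values: process the event queue at most once a minute,
# regenerate the auxiliary databases at most every two hours
monixinterval = 60
monauxinterval = 7200
</programlisting>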
</listitem>
</varlistentry>
<varlistentry><term><varname>monioniceclass, monioniceclassdata
</varname></term><listitem><para>These allow defining the
<application>ionice</application> class and data used by the
indexer (default class 3, no data).</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>filtermaxseconds</varname></term>
<listitem><para>Maximum handler execution time (in seconds), after which it
is aborted. Some PostScript programs just loop...</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>filtermaxmbytes</varname></term>
<listitem><para>&RCL; 1.20.7 and later. Maximum handler memory
utilisation. This uses setrlimit(RLIMIT_AS) on most systems
(total virtual memory space size limit). Some programs may start
with 500 MBytes of mapped shared libraries, so take this into
account when choosing a value. The default is a liberal
2000MB.</para>
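<para>As a sketch, stricter limits than the defaults could be
set as follows. The numbers are assumptions chosen for
illustration, and a memory limit that is too low will make
handlers fail:</para>
<programlisting>
# Illustrative limits: abort a handler after 10 minutes,
# or when its address space exceeds 1000 MB
filtermaxseconds = 600
filtermaxmbytes = 1000
</programlisting>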
</listitem>
</varlistentry>
<varlistentry><term><varname>filtersdir</varname></term>
<listitem><para>A directory to search for the external
input handler scripts used to index some types of files. The
value should not be changed, except if you want to modify
one of the default scripts. The value can be redefined for
any sub-directory. </para>
</listitem>
</varlistentry>
<varlistentry><term><varname>iconsdir</varname></term>
<listitem><para>The name of the directory where
<command>recoll</command> result list icons are
stored. You can change this if you want different
images.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>idxabsmlen</varname></term>
<listitem><para>&RCL; stores an abstract for each indexed
file inside the database. The text can come from an actual
'abstract' section in the document or will just be the
beginning of the document. It is stored in the index so
that it can be displayed inside the result lists without
decoding the original
file. The <varname>idxabsmlen</varname> parameter defines
the size of the stored abstract. The default value is 250 bytes.
The search interface gives you the choice to display this
stored text or a synthetic abstract built by extracting
text around the search terms. If you always
prefer the synthetic abstract, you can reduce this value
and save a little space.
</para>
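<para>For example, if you always display synthetic abstracts,
a smaller stored abstract could be configured as follows (the
value is illustrative only):</para>
<programlisting>
# Illustrative value: store at most 100 bytes of abstract per document
idxabsmlen = 100
</programlisting>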
</listitem>
</varlistentry>
<varlistentry><term><varname>idxmetastoredlen</varname></term>
<listitem><para>Maximum stored length for metadata
fields. This does not affect indexing (the whole field is
processed anyway), just the amount of data stored in the
index for the purpose of displaying fields inside result
lists or previews. The default value is 150 bytes which
may be too low if you have custom fields.</para>
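<para>If you have long custom fields which should be shown in
full inside result lists, a larger limit can be set. The value
below is an arbitrary example:</para>
<programlisting>
# Illustrative value: keep up to 500 bytes of each metadata field
idxmetastoredlen = 500
</programlisting>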
</listitem>
</varlistentry>
<varlistentry><term><varname>aspellLanguage</varname></term>
<listitem><para>Language definitions to use when creating
the aspell dictionary. The value must match a set of
aspell language definition files. You can type "aspell
config" to see where these are installed (look for
data-dir). The default if the variable is not set is to
use your desktop national language environment to guess
the value.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>noaspell</varname></term>
<listitem><para>If this is set, the aspell dictionary
generation is turned off. Useful for cases where you don't
need the functionality or when it is unusable because
aspell crashes during dictionary generation.</para>
</listitem>
</varlistentry>
<varlistentry><term><varname>mhmboxquirks</varname></term>
<listitem><para>This allows defining location-related quirks
for the mailbox handler. Currently only the
<literal>tbird</literal> flag is defined, and it should be set
for directories which hold
<application>Thunderbird</application> data, as their folder
format is weird. Example:
<programlisting>[/path/to/my/mozilla/mail]
mhmboxquirks = tbird</programlisting>
It should be noted that later &RCL;
versions have improved automatic detection of
<application>Thunderbird</application> folders, so that this
should not be needed at all in most cases.</para>
</listitem>
</varlistentry>
</variablelist>
</sect3>
</sect2>
<sect2 id="RCL.INSTALL.CONFIG.FIELDS"> <sect2 id="RCL.INSTALL.CONFIG.FIELDS">
<title>The fields file</title> <title>The fields file</title>


@ -1,4 +1,4 @@
# <filetitle>Recoll default main configuration file</filetitle> # <filetitle>Recoll main configuration file, recoll.conf</filetitle>
# The XML tags in the comments are used to help produce the documentation # The XML tags in the comments are used to help produce the documentation
# from the sample/reference file, and not at all at run time, where # from the sample/reference file, and not at all at run time, where
@ -11,7 +11,8 @@
# Most of the important values in this file can be set from the GUI # Most of the important values in this file can be set from the GUI
# configuration menus, which may be an easier approach than direct editing. # configuration menus, which may be an easier approach than direct editing.
# <grouptitle>Parameters affecting what documents we index</grouptitle> # <grouptitle id="WHATDOCS">Parameters affecting what documents we
# index</grouptitle>
# <var name="topdirs" type="string"><brief>Space-separated list of files or # <var name="topdirs" type="string"><brief>Space-separated list of files or
# directories to recursively index.</brief><descr>Default to ~ (indexes # directories to recursively index.</brief><descr>Default to ~ (indexes
@ -19,34 +20,37 @@
# independantly of the value of the followLinks variable.</descr></var> # independantly of the value of the followLinks variable.</descr></var>
topdirs = ~ topdirs = ~
# <var name="skippedNames" type="string"><brief>Wildcard expressions for # <var name="skippedNames" type="string">
# names of files and directories that we should ignore.</brief> #
# <descr> White space separated list of wildcard patterns (simple # <brief>Files and directories which should be ignored.</brief> <descr>
# ones, not paths, must contain no / ), which will be tested against file # White space separated list of wildcard patterns (simple ones, not paths,
# and directory names. The list in the default configuration does not # must contain no / ), which will be tested against file and directory
# exclude hidden directories (names beginning with a dot), which means that # names. The list in the default configuration does not exclude hidden
# it may index quite a few things that you do not want. On the other hand, # directories (names beginning with a dot), which means that it may index
# email user agents like Thunderbird usually store messages in hidden # quite a few things that you do not want. On the other hand, email user
# directories, and you probably want this indexed. One possible solution is # agents like Thunderbird usually store messages in hidden directories, and
# to have '.*' in 'skippedNames', and add things like '~/.thunderbird' # you probably want this indexed. One possible solution is to have '.*' in
# '~/.evolution' to 'topdirs'. Not even the file names are indexed for # 'skippedNames', and add things like '~/.thunderbird' '~/.evolution' to
# patterns in this list, see the 'noContentSuffixes' variable for an # 'topdirs'. Not even the file names are indexed for patterns in this
# alternative approach which indexes the file names. Can be redefined for # list, see the 'noContentSuffixes' variable for an alternative approach
# any subtree.</descr></var> # which indexes the file names. Can be redefined for any
# subtree.</descr></var>
skippedNames = #* bin CVS Cache cache* .cache caughtspam tmp \ skippedNames = #* bin CVS Cache cache* .cache caughtspam tmp \
.thumbnails .svn \ .thumbnails .svn \
*~ .beagle .git .hg .bzr loop.ps .xsession-errors \ *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
.recoll* xapiandb recollrc recoll.conf .recoll* xapiandb recollrc recoll.conf
# <var name="noContentSuffixes" type="string"><brief>List of name endings (not # <var name="noContentSuffixes" type="string">
# necessarily dot-separated suffixes) for which we don't try MIME type #
# identification, and don't uncompress or index content.</brief><descr>Only # <brief>List of name endings (not necessarily dot-separated suffixes) for
# the names will be indexed. This complements the now obsoleted mimemap # which we don't try MIME type identification, and don't uncompress or
# recoll_noindex list, which will go away in a future release (the move # index content.</brief><descr>Only the names will be indexed. This
# from mimemap to recoll.conf allows editing the list through the # complements the now obsoleted recoll_noindex list from the mimemap file,
# GUI). This is different from skippedNames because these are name ending # which will go away in a future release (the move from mimemap to
# matches only (not wildcard patterns), and the file name itself gets # recoll.conf allows editing the list through the GUI). This is different
# indexed normally. This can be redefined for subdirectories.</descr></var> # from skippedNames because these are name ending matches only (not
# wildcard patterns), and the file name itself gets indexed normally. This
# can be redefined for subdirectories.</descr></var>
noContentSuffixes = .md5 .map \ noContentSuffixes = .md5 .map \
.o .lib .dll .a .sys .exe .com \ .o .lib .dll .a .sys .exe .com \
.mpp .mpt .vsd \ .mpp .mpt .vsd \
@ -54,20 +58,20 @@ noContentSuffixes = .md5 .map \
.dat .bak .rdf .log.gz .log .db .msf .pid \ .dat .bak .rdf .log.gz .log .db .msf .pid \
,v ~ # ,v ~ #
# <var name="skippedPaths" type="string"><brief>Space-separated list of # <var name="skippedPaths" type="string">
# wildcard expressions for paths we shouldn't go into.</brief><descr>Can #
# contain files and directories. The database and configuration directories # <brief>Paths we should not go into.</brief><descr>Space-separated list of
# will automatically be added. The expressions are matched 'fnmatch(3)' # wildcard expressions for filesystem paths. Can contain files and
# directories. The database and configuration directories will
# automatically be added. The expressions are matched using 'fnmatch(3)'
# with the FNM_PATHNAME flag set by default. This means that '/' characters # with the FNM_PATHNAME flag set by default. This means that '/' characters
# must be matched explicitely. You can set 'skippedPathsFnmPathname' to 0 # must be matched explicitely. You can set 'skippedPathsFnmPathname' to 0
# to disable the use of FNM_PATHNAME (meaning that '/*/dir3' will match # to disable the use of FNM_PATHNAME (meaning that '/*/dir3' will match
# '/dir1/dir2/dir3'). The default contains the usual mount point for # '/dir1/dir2/dir3'). The default value contains the usual mount point for
# removable media by default to remind people that it is a bad idea to # removable media to remind you that it is a bad idea to have Recoll work
# naively have recoll work on these (esp. with the monitor: media gets # on these (esp. with the monitor: media gets indexed on mount, all data
# indexed on mount, all data gets erased on unmount). Typically the # gets erased on unmount). Explicitely adding '/media/xxx' to the topdirs
# presence of '/media' is mostly a reminder, it would only have effect for # will override this.</descr></var>
# someone who is indexing '/'. Explicitely adding '/media/xxx' to the
# topdirs will override this.</descr></var>
skippedPaths = /media skippedPaths = /media
# <var name="skippedPathsFnmPathname" type="bool"><brief>Set to 0 to # <var name="skippedPathsFnmPathname" type="bool"><brief>Set to 0 to
@ -75,19 +79,22 @@ skippedPaths = /media
# paths.</brief><descr></descr></var> # paths.</brief><descr></descr></var>
#skippedPathsFnmPathname = 1 #skippedPathsFnmPathname = 1
# <var name="daemSkippedPaths"><brief>skippedPaths equivalent specific to # <var name="daemSkippedPaths" type="string">
#
# <brief>skippedPaths equivalent specific to
# real time indexing.</brief><descr>This enables having parts of the tree # real time indexing.</brief><descr>This enables having parts of the tree
# which are initially indexed but not monitored. If daemSkippedPaths is # which are initially indexed but not monitored. If daemSkippedPaths is
# not set, the daemon uses skippedPaths.</descr></var> # not set, the daemon uses skippedPaths.</descr></var>
#daemSkippedPaths = #daemSkippedPaths =
# <var name="zipSkippedNames" type="string"><brief>Space-separated list of # <var name="zipSkippedNames" type="string">
# wildcard expresions for names that should be ignored #
# inside zip archives.</brief><descr>This is used directly by the zip # <brief>Space-separated list of wildcard expressions for names that should
# handler, and has a function similar to skippedNames, but # be ignored inside zip archives.</brief><descr>This is used directly by
# works independantly. Can be redefined for subdirectories. Supported by # the zip handler, and has a function similar to skippedNames, but works
# recoll 1.20 and newer. See # independantly. Can be redefined for subdirectories. Supported by recoll
# 1.20 and newer. See
# https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members # https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members
# </descr></var> # </descr></var>
#zipSkippedNames = #zipSkippedNames =
@ -119,12 +126,12 @@ skippedPaths = /media
# files.</brief><descr>We need to decompress these in a # files.</brief><descr>We need to decompress these in a
# temporary directory for identification, which can be wasteful in some # temporary directory for identification, which can be wasteful in some
# cases. Limit the waste. Negative means no limit. 0 results in no # cases. Limit the waste. Negative means no limit. 0 results in no
# processing of any compressed file.</descr></var> # processing of any compressed file. Default 50 MB.</descr></var>
compressedfilemaxkbs = 50000 compressedfilemaxkbs = 50000
# <var name="textfilemaxmbs" type="int"><brief>Size limit for text # <var name="textfilemaxmbs" type="int"><brief>Size limit for text
# files.</brief><descr>Mostly for skipping monster # files.</brief><descr>Mostly for skipping monster
# logs.</descr></var> # logs. Default 20 MB.</descr></var>
textfilemaxmbs = 20 textfilemaxmbs = 20
# <var name="indexallfilenames" type="bool"><brief>Index the file names of # <var name="indexallfilenames" type="bool"><brief>Index the file names of
@ -158,7 +165,8 @@ processwebqueue = 0
# into documents of approximately this size. Will reduce memory usage at # into documents of approximately this size. Will reduce memory usage at
# index time and help with loading data in the preview window at query # index time and help with loading data in the preview window at query
# time. Particularly useful with very big files, such as application or # time. Particularly useful with very big files, such as application or
# system logs.</descr></var> # system logs. Also see textfilemaxmbs and
# compressedfilemaxkbs.</descr></var>
textfilepagekbs = 1000 textfilepagekbs = 1000
# <var name="membermaxkbs" type="int"><brief>Size limit for archive # <var name="membermaxkbs" type="int"><brief>Size limit for archive
@ -168,7 +176,8 @@ membermaxkbs = 50000
# <grouptitle>Parameters affecting how we generate terms</grouptitle> # <grouptitle id="TERMS">Parameters affecting how we generate
# terms</grouptitle>
# Changing some of these parameters will imply a full # Changing some of these parameters will imply a full
# reindex. Also, when using multiple indexes, it may not make sense # reindex. Also, when using multiple indexes, it may not make sense
@ -201,9 +210,9 @@ indexStripChars = 1
# restoring the previous behaviour.</descr></var> # restoring the previous behaviour.</descr></var>
#dehyphenate = 1 #dehyphenate = 1
# <var name="nocjk" type="bool"><brief>Decides if specific east asian # <var name="nocjk" type="bool"><brief>Decides if specific East Asian
# (Chinese Korean Japanese) characters/word splitting is turned # (Chinese Korean Japanese) characters/word splitting is turned
# off.</brief><descr>This will save a small amount of cpu if you have no CJK # off.</brief><descr>This will save a small amount of CPU if you have no CJK
# documents. If your document base does include such text but you are not # documents. If your document base does include such text but you are not
# interested in searching it, setting nocjk may be a # interested in searching it, setting nocjk may be a
# significant time and space saver.</descr></var> # significant time and space saver.</descr></var>
@ -216,10 +225,11 @@ indexStripChars = 1
# as large.</descr></var> # as large.</descr></var>
#cjkngramlen = 2 #cjkngramlen = 2
# <var name="indexstemminglanguages" type="string"><brief>Languages for # <var name="indexstemminglanguages" type="string">
# which to create stemming expansion data.</brief><descr>Stemmer names can #
# be found on http://www.xapian.org, or by executing 'recollindex -l', or # <brief>Languages for which to create stemming expansion
# this can also be set from a list in the GUI</descr></var> # data.</brief><descr>Stemmer names can be found by executing 'recollindex
# -l', or this can also be set from a list in the GUI.</descr></var>
indexstemminglanguages = english indexstemminglanguages = english
# <var name="defaultcharset" type="string"><brief>Default character # <var name="defaultcharset" type="string"><brief>Default character
@ -246,14 +256,14 @@ indexstemminglanguages = english
# Examples: # Examples:
# Swedish: # Swedish:
# unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl åå Åå # unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl åå Åå
# German: # . German:
# unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl # unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl
# In French, you probably want to decompose oe and ae and nobody would type # In French, you probably want to decompose oe and ae and nobody would type
# a German ß # a German ß
# unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl # unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl
# Reasonable default for all until someone protests. These decompositions # . The default for all until someone protests follows. These decompositions
# are not performed by unac, but I cant imagine someone typing the composed # are not performed by unac, but it is unlikely that someone would type the
# forms in a search. # composed forms in a search.
# unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl</descr></var> # unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl</descr></var>
unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl
@ -274,7 +284,7 @@ unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl
# <var name="testmodifusemtime" type="bool"><brief>Use mtime instead of # <var name="testmodifusemtime" type="bool"><brief>Use mtime instead of
# ctime to test if a file has been modified.</brief><descr>The time is used # ctime to test if a file has been modified.</brief><descr>The time is used
# in in addition to the size, which is always used. # in addition to the size, which is always used.
# Setting this can reduce re-indexing on systems where extended attributes # Setting this can reduce re-indexing on systems where extended attributes
# are used (by some other application), but not indexed, because changing # are used (by some other application), but not indexed, because changing
# extended attributes only affects ctime. # extended attributes only affects ctime.
@ -305,6 +315,7 @@ noxattrfields = 0
# returns multiple field values inside a text blob formatted as a recoll # returns multiple field values inside a text blob formatted as a recoll
# configuration file ("fieldname = fieldvalue" lines). The rclmultixx name # configuration file ("fieldname = fieldvalue" lines). The rclmultixx name
# will be ignored, and field names and values will be parsed from the data. # will be ignored, and field names and values will be parsed from the data.
# Example: metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f
# </descr></var> # </descr></var>
#[/some/area/of/the/fs] #[/some/area/of/the/fs]
#metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f #metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f
@ -312,24 +323,27 @@ noxattrfields = 0
# <grouptitle>Parameters affecting where and how we store things</grouptitle> # <grouptitle id="STORE">Parameters affecting where and how we store
# things</grouptitle>
# <var name="cachedir" type="dfn"><brief>Top directory for Recoll # <var name="cachedir" type="dfn">
# data</brief><descr>Recoll data directories are normally located relative #
# to the configuration directory (e.g. ~/.recoll/xapiandb, # <brief>Top directory for Recoll data.</brief><descr>Recoll data
# ~/.recoll/mboxcache). If 'cachedir' is set, the directories are stored under # directories are normally located relative to the configuration directory
# the specified value instead (e.g. if cachedir is ~/.cache/recoll, the # (e.g. ~/.recoll/xapiandb, ~/.recoll/mboxcache). If 'cachedir' is set, the
# default dbdir would be ~/.cache/recoll/xapiandb). This affects dbdir, # directories are stored under the specified value instead (e.g. if
# webcachedir, mboxcachedir, aspellDicDir, which can still be individually # cachedir is ~/.cache/recoll, the default dbdir would be
# specified to override cachedir. Note that if you have multiple # ~/.cache/recoll/xapiandb). This affects dbdir, webcachedir,
# configurations, each must have a different cachedir, there is no # mboxcachedir, aspellDicDir, which can still be individually specified to
# automatic computation of a subpath under cachedir.</descr></var> # override cachedir. Note that if you have multiple configurations, each
# must have a different cachedir, there is no automatic computation of a
# subpath under cachedir.</descr></var>
#cachedir = ~/.cache/recoll #cachedir = ~/.cache/recoll
# <var name="maxfsoccuppc" type="int"><brief>Maximum file system occupation # <var name="maxfsoccuppc" type="int"><brief>Maximum file system occupation
# over which we stop indexing.</brief><descr>The value is a percentage, # over which we stop indexing.</brief><descr>The value is a percentage,
# corresponding to what the "Capacity" df output column shows. The default # corresponding to what the "Capacity" df output column shows. The default
# value is 0, meaning no checking.</descr></brief> # value is 0, meaning no checking.</descr></var>
maxfsoccuppc = 0 maxfsoccuppc = 0
# <var name="xapiandb" type="dfn"><brief>Xapian database directory # <var name="xapiandb" type="dfn"><brief>Xapian database directory
@ -340,9 +354,11 @@ maxfsoccuppc = 0
# ~/.recoll/xapiandb/</descr></var> # ~/.recoll/xapiandb/</descr></var>
dbdir = xapiandb dbdir = xapiandb
# <var name="idxstatusfile" type="fn"><brief>Name of the scratch file where # <var name="idxstatusfile" type="fn">
# the indexer process updates its status. Default: #
# idxstatus.txt inside the configuration directory # <brief>Name of the scratch file where the indexer process updates its
# status.</brief><descr>Default: idxstatus.txt inside the configuration
# directory.</descr></var>
#idxstatusfile = idxstatus.txt #idxstatusfile = idxstatus.txt
# <var name="mboxcachedir" type="dfn"> # <var name="mboxcachedir" type="dfn">
@ -371,9 +387,9 @@ webcachedir = webcache
# <var name="webcachemaxmbs" type="int"> # <var name="webcachemaxmbs" type="int">
# <brief>Maximum size in MB of the Web archive.</brief> # <brief>Maximum size in MB of the Web archive.</brief>
# <descr>This is only used by the web history indexing code. # <descr>This is only used by the web history indexing code.
# Default: 100 MB. # Default: 40 MB.
# Reducing the size will not physically truncate the file.</descr></var> # Reducing the size will not physically truncate the file.</descr></var>
webcachemaxmbs = 100 webcachemaxmbs = 40
# <var name="webqueuedir" type="fn"> # <var name="webqueuedir" type="fn">
# #
@ -405,21 +421,21 @@ webcachemaxmbs = 100
# result list. Defaults to $prefix/share/recoll/images</descr></var> # result list. Defaults to $prefix/share/recoll/images</descr></var>
#iconsdir = /path/to/my/icons #iconsdir = /path/to/my/icons
# <grouptitle>Parameters affecting indexing performance and resource # <grouptitle id="PERFS">Parameters affecting indexing performance and
# usage</grouptitle> # resource usage</grouptitle>
# <var name="idxflushmb" type="int"> # <var name="idxflushmb" type="int">
# #
# <brief>Threshold (megabytes of new data) where we flush from memory to disk # <brief>Threshold (megabytes of new data) where we flush from memory to
# index.</brief> # disk index.</brief> <descr>Setting this allows some control over memory
# <descr>Setting this allows some control over memory usage by the indexer # usage by the indexer process. A value of 0 means no explicit flushing,
# process. A value of 0 means no explicit flushing, which lets Xapian # which lets Xapian perform its own thing, meaning flushing every
# perform its own thing, meaning flushing every XAPIAN_FLUSH_THRESHOLD # $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted: as memory
# documents created, modified or deleted. XAPIAN_FLUSH_THRESHOLD is an # usage depends on average document size, not only document count, the
# environment variable. As memory usage depends on average document size, # Xapian approach is not very useful, and you should let Recoll manage
# not only document count, this is not very useful. # the flushes. The default value of idxflushmb is 10 MB, and may be a bit
# The default value of 10 MB may be a bit low. If you are looking for # low. If you are looking for maximum speed, you may want to experiment
# maximum speed, you may want to experiment with values between 20 and # with values between 20 and
# 80. In my experience, values beyond 100 are always counterproductive. If # 80. In my experience, values beyond 100 are always counterproductive. If
# you find otherwise, please drop me a note.</descr></var> # you find otherwise, please drop me a note.</descr></var>
idxflushmb = 10 idxflushmb = 10
@ -449,7 +465,7 @@ filtermaxmbytes = 2000
# for each stage (three integer values). If a value of -1 is given for a # for each stage (three integer values). If a value of -1 is given for a
# given stage, no queue is used, and the thread will go on performing the # given stage, no queue is used, and the thread will go on performing the
# next stage. In practise, deep queues have not been shown to increase # next stage. In practise, deep queues have not been shown to increase
# performance. Default: a value of 0 for the first queue tells &RCL; to # performance. Default: a value of 0 for the first queue tells Recoll to
# perform autoconfiguration based on the detected number of CPUs (no need # perform autoconfiguration based on the detected number of CPUs (no need
# for the two other values in this case). Use thrQSizes = -1 -1 -1 to # for the two other values in this case). Use thrQSizes = -1 -1 -1 to
# disable multithreading entirely.</descr></var> # disable multithreading entirely.</descr></var>
@ -463,23 +479,23 @@ thrQSizes = 0
# in thrQSizes: if the first queue depth is 0, all counts are ignored # in thrQSizes: if the first queue depth is 0, all counts are ignored
# (autoconfigured); if a value of -1 is used for a queue depth, the # (autoconfigured); if a value of -1 is used for a queue depth, the
# corresponding thread count is ignored. It makes no sense to use a value # corresponding thread count is ignored. It makes no sense to use a value
# other than 1 for the last stage because updating the &XAP; index is # other than 1 for the last stage because updating the Xapian index is
# necessarily single-threaded (and protected by a mutex).</descr></var> # necessarily single-threaded (and protected by a mutex).</descr></var>
#thrTCounts = 4 2 1 #thrTCounts = 4 2 1
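The interaction between thrQSizes and thrTCounts described above can be summarized in a few lines. This is a sketch of the documented rules only, with a hypothetical function name and return format; it says nothing about how recollindex actually schedules its threads.

```python
def interpret_thread_config(thr_q_sizes, thr_t_counts):
    """Combine the two settings according to the documented rules.

    Returns "auto" when the first queue depth is 0 (all counts ignored),
    otherwise one entry per stage: None when the queue depth is -1
    (no queue, thread count ignored), else a (depth, threads) tuple.
    """
    sizes = [int(v) for v in thr_q_sizes.split()]
    if sizes and sizes[0] == 0:
        return "auto"
    counts = [int(v) for v in thr_t_counts.split()]
    plan = []
    for depth, count in zip(sizes, counts):
        plan.append(None if depth == -1 else (depth, count))
    return plan


if __name__ == "__main__":
    print(interpret_thread_config("0", "4 2 1"))
    print(interpret_thread_config("-1 -1 -1", "4 2 1"))
    print(interpret_thread_config("2 2 2", "4 2 1"))
```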
# <grouptitle>Miscellaneous parameters</grouptitle> # <grouptitle id="MISC">Miscellaneous parameters</grouptitle>
# <var name="loglevel" type="int"> # <var name="loglevel" type="int">
# #
# <brief>Debug log verbosity 1-6</brief> <descr>2 is errors/warnings # <brief>Log file verbosity 1-6.</brief> <descr>A value of 2 will print
# only. 3 information like document updates, 4 is quite verbose and 6 very # only errors and warnings. 3 will print information like document updates,
# verbose.</descr></var> # 4 is quite verbose and 6 very verbose.</descr></var>
loglevel = 3 loglevel = 3
# <var name="logfilename" type="fn"> # <var name="logfilename" type="fn">
# #
# <brief>Debug log destination. Use 'stderr' (default) to write to the # <brief>Log file destination. Use 'stderr' (default) to write to the
# console.</brief><descr></descr></var> # console.</brief><descr></descr></var>
logfilename = stderr logfilename = stderr
@ -511,12 +527,11 @@ logfilename = stderr
# #
# <brief>Indexing process current directory.</brief> <descr>The input # <brief>Indexing process current directory.</brief> <descr>The input
# handlers sometimes leave temporary files in the current directory, so it # handlers sometimes leave temporary files in the current directory, so it
# makes sense to have recollindex chdir to some temporary directory. Three # makes sense to have recollindex chdir to some temporary directory. If the
# possible types of values: # value is empty, the current directory is not changed. If the
# - (literal) tmp : go to temp dir as set by environment (RECOLL_TMPDIR else # value is (literal) tmp, we use the temporary directory as set by the
# TMPDIR else /tmp) # environment (RECOLL_TMPDIR else TMPDIR else /tmp). If the value is an
# - Empty: stay where started # absolute path to a directory, we go there.</descr></var>
# - Absolute path value: go there.</descr></var>
idxrundir = tmp idxrundir = tmp
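The three kinds of idxrundir values can be sketched as follows. The function name `resolve_idxrundir` and the choice to raise on a relative path are assumptions made for illustration (the description does not say what happens for a relative value); POSIX path conventions are assumed.

```python
import os


def resolve_idxrundir(value, environ=None):
    """Return the directory to chdir to, or None to stay where started."""
    env = os.environ if environ is None else environ
    value = value.strip()
    if not value:
        # Empty value: the current directory is not changed.
        return None
    if value == "tmp":
        # Literal tmp: RECOLL_TMPDIR, else TMPDIR, else /tmp.
        return env.get("RECOLL_TMPDIR") or env.get("TMPDIR") or "/tmp"
    if os.path.isabs(value):
        return value
    # Behaviour for relative paths is not documented; rejecting is an assumption.
    raise ValueError("idxrundir must be empty, 'tmp' or an absolute path")


if __name__ == "__main__":
    print(resolve_idxrundir("tmp"))
```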
# <var name="checkneedretryindexscript" type="fn"> # <var name="checkneedretryindexscript" type="fn">
@ -525,7 +540,7 @@ idxrundir = tmp
# files which previously failed. </brief> <descr>The default script checks # files which previously failed. </brief> <descr>The default script checks
# the modified dates on /usr/bin and /usr/local/bin. A relative path will # the modified dates on /usr/bin and /usr/local/bin. A relative path will
# be looked up in the filters dirs, then in the path. Use an absolute path # be looked up in the filters dirs, then in the path. Use an absolute path
# to do otherwise.</descr> # to do otherwise.</descr></var>
checkneedretryindexscript = rclcheckneedretry.sh checkneedretryindexscript = rclcheckneedretry.sh
# <var name="recollhelperpath" type="string"> # <var name="recollhelperpath" type="string">
@ -569,9 +584,10 @@ checkneedretryindexscript = rclcheckneedretry.sh
# <var name="aspellAddCreateParam" type="string"> # <var name="aspellAddCreateParam" type="string">
# #
# <brief>Additional parameter to aspell dictionary creation # <brief>Additional option and parameter to aspell dictionary creation
# command.</brief><descr>Some aspell packages may need an additional option # command.</brief><descr>Some aspell packages may need an additional option
# (e.g. on Debian Jessie). See Debian bug 772415.</descr></var> # (e.g. on Debian Jessie: --local-data-dir=/usr/lib/aspell). See Debian bug
# 772415.</descr></var>
#aspellAddCreateParam = --local-data-dir=/usr/lib/aspell #aspellAddCreateParam = --local-data-dir=/usr/lib/aspell
# <var name="aspellKeepStderr" type="bool"> # <var name="aspellKeepStderr" type="bool">
@ -589,18 +605,21 @@ checkneedretryindexscript = rclcheckneedretry.sh
# disable the thing.</descr></var> # disable the thing.</descr></var>
#noaspell = 1 #noaspell = 1
# <var name="monixinterval" type="int"> # <var name="monauxinterval" type="int">
# #
# <brief>Seconds between auxiliary databases updates (stemdb, # <brief>Auxiliary database update interval.</brief><descr>The real time
# aspell).</brief><descr>The default is one hour.</descr></var> # indexer only updates the auxiliary databases (stemdb, aspell)
# periodically, because it would be too costly to do it for every document
# change. The default period is one hour.</descr></var>
#monauxinterval = 3600 #monauxinterval = 3600
# <var name="monixinterval" type="int"> # <var name="monixinterval" type="int">
# #
# <brief>Minimum interval (seconds) between processings of the indexing # <brief>Minimum interval (seconds) between processings of the indexing
# queue.</brief> <descr>The real time monitor does not process each event # queue.</brief><descr>The real time indexer does not process each event
# when it comes in, but lets the queue accumulate, to diminish overhead and # when it comes in, but lets the queue accumulate, to diminish overhead and
# to aggregate multiple events to the same file. Default 30 S.</descr></var> # to aggregate multiple events affecting the same file. Default 30
# S.</descr></var>
#monixinterval = 30 #monixinterval = 30
# <var name="mondelaypatterns" type="string"> # <var name="mondelaypatterns" type="string">
@ -611,14 +630,14 @@ checkneedretryindexscript = rclcheckneedretry.sh
# reindexed once in a while. A list of wildcardPattern:seconds pairs. The # reindexed once in a while. A list of wildcardPattern:seconds pairs. The
# patterns are matched with fnmatch(pattern, path, 0) You can quote entries # patterns are matched with fnmatch(pattern, path, 0) You can quote entries
# containing white space with double quotes (quote the whole entry, not the # containing white space with double quotes (quote the whole entry, not the
# pattern). The default is empty. Example:mondelaypatterns = *.log:20 # pattern). The default is empty.
# "*with spaces.*:30"</descr></brief> # Example: mondelaypatterns = *.log:20 "*with spaces.*:30"</descr></var>
#mondelaypatterns = *.log:20 "*with spaces.*:30" #mondelaypatterns = *.log:20 "*with spaces.*:30"
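To make the quoting and pattern:seconds syntax concrete, here is a rough sketch that parses a mondelaypatterns value and looks up the delay for a path. It uses Python's `shlex` and `fnmatch` modules as stand-ins for Recoll's own parsing and for the C fnmatch(pattern, path, 0) call, so edge cases may differ. The function names are hypothetical, and returning the first matching entry is an assumption.

```python
import fnmatch
import shlex


def parse_mondelaypatterns(value):
    """Split the value into (pattern, seconds) pairs.

    Double quotes enclose the whole entry, so the split happens first
    and the seconds are taken from after the last colon.
    """
    entries = []
    for entry in shlex.split(value):
        pattern, _, seconds = entry.rpartition(":")
        entries.append((pattern, int(seconds)))
    return entries


def delay_for(path, entries):
    """Return the delay of the first matching pattern, or None."""
    for pattern, seconds in entries:
        # Note the argument order: Python takes (name, pattern).
        if fnmatch.fnmatchcase(path, pattern):
            return seconds
    return None


if __name__ == "__main__":
    parsed = parse_mondelaypatterns('*.log:20 "*with spaces.*:30"')
    print(parsed)
    print(delay_for("/var/log/app.log", parsed))
```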
# <var name="monioniceclass" type="int"> # <var name="monioniceclass" type="int">
# #
# <brief>ionice class for the real time indexing process</brief> # <brief>ionice class for the real time indexing process</brief>
# <descr>On platforms where this is supported, the default value is # <descr>On platforms where this is supported. The default value is
# 3.</descr></var> # 3.</descr></var>
# monioniceclass = 3 # monioniceclass = 3
@ -631,11 +650,12 @@ checkneedretryindexscript = rclcheckneedretry.sh
# <grouptitle>Query-time parameters (no impact on the index)</grouptitle> # <grouptitle id="QUERY">Query-time parameters (no impact on the
# index)</grouptitle>
# <var name="autodiacsens" type="bool"> # <var name="autodiacsens" type="bool">
# #
# <brief>auto-trigger diacritics sensitivity (raw index only)</brief> # <brief>auto-trigger diacritics sensitivity (raw index only).</brief>
# <descr>IF the index is not stripped, decide if we automatically trigger # <descr>IF the index is not stripped, decide if we automatically trigger
# diacritics sensitivity if the search term has accented characters (not in # diacritics sensitivity if the search term has accented characters (not in
# unac_except_trans). Else you need to use the query language and the "D" # unac_except_trans). Else you need to use the query language and the "D"
@ -644,7 +664,7 @@ autodiacsens = 0
# <var name="autocasesens" type="bool"> # <var name="autocasesens" type="bool">
# #
# <brief>auto-trigger case sensitivity (raw index only)</brief> <descr>IF # <brief>auto-trigger case sensitivity (raw index only).</brief><descr>IF
# the index is not stripped (see indexStripChars), decide if we # the index is not stripped (see indexStripChars), decide if we
# automatically trigger character case sensitivity if the search term has # automatically trigger character case sensitivity if the search term has
# upper-case characters in any but the first position. Else you need to use # upper-case characters in any but the first position. Else you need to use
@ -668,14 +688,14 @@ maxXapianClauses = 50000
# <var name="snippetMaxPosWalk" type="int"> # <var name="snippetMaxPosWalk" type="int">
# #
# <brief>Maximum number of positions we walk while populating a snippet for the # <brief>Maximum number of positions we walk while populating a snippet for
# result list.</brief><descr>The default of 1,000,000 may be insufficient # the result list.</brief><descr>The default of 1,000,000 may be
# for big documents, the consequence would be snippets with possibly # insufficient for very big documents, the consequence would be snippets
# meaning-altering missing words.</descr></var> # with possibly meaning-altering missing words.</descr></var>
snippetMaxPosWalk = 1000000 snippetMaxPosWalk = 1000000
# <grouptitle>Parameters for the PDF input script</grouptitle> # <grouptitle id="PDF">Parameters for the PDF input script</grouptitle>
# <var name="pdfocr" type="bool"> # <var name="pdfocr" type="bool">
# #
@ -693,7 +713,8 @@ snippetMaxPosWalk = 1000000
#pdfattach = 0 #pdfattach = 0
# <grouptitle>Parameters set for specific locations</grouptitle> # <grouptitle id="SPECLOCATIONS">Parameters set for specific
# locations</grouptitle>
# You could specify different parameters for a subdirectory like this: # You could specify different parameters for a subdirectory like this:
#[~/hungariandocs/plain] #[~/hungariandocs/plain]