Use structured comments in recoll.conf and use them to generate the docbook and man page texts

Date: 2016-05-26 18:20:09 +02:00
commit 8200bb78d2
parent a1a2bbf952
6 changed files with 2376 additions and 2267 deletions
--- a/src/doc/man/recoll.conf.5
+++ b/src/doc/man/recoll.conf.5
@@ -54,315 +54,565 @@ Where values are lists, white space is used for separation, and elements with
 embedded spaces can be quoted with double-quotes.
 .SH OPTIONS
 .TP
-.BI "topdirs = "  directories
+.BI "topdirs = "string
-Specifies the list of directories to index (recursively). 
+Space-separated list of files or
 directories to recursively index. Defaults to ~ (indexes
 $HOME). You can use symbolic links in the list; they will be followed
 independently of the value of the followLinks variable.
 .TP
-.BI "skippedNames = " patterns
+.BI "skippedNames = "string
-A space-separated list of patterns for names of files or directories that
+Files and directories which should be ignored. 
-should be completely ignored. The list defined in the default file is:
+Whitespace-separated list of wildcard patterns (simple ones, not paths:
-.sp
+they must contain no '/'), which will be tested against file and directory
-.nf
+names.  The list in the default configuration does not exclude hidden
-*~ #* bin CVS  Cache caughtspam  tmp
+directories (names beginning with a dot), which means that it may index
 quite a few things that you do not want. On the other hand, email user
 agents like Thunderbird usually store messages in hidden directories, and
 you probably want this indexed. One possible solution is to have '.*' in
 'skippedNames', and add things like '~/.thunderbird' '~/.evolution' to
 'topdirs'.  Not even the file names are indexed for patterns in this
 list, see the 'noContentSuffixes' variable for an alternative approach
 which indexes the file names. Can be redefined for any
 subtree.
 .TP
 .BI "noContentSuffixes = "string
 List of name endings (not necessarily dot-separated suffixes) for
 which we don't try MIME type identification, and don't uncompress or
 index content. Only the names will be indexed. This
 complements the now obsoleted recoll_noindex list from the mimemap file,
 which will go away in a future release (the move from mimemap to
 recoll.conf allows editing the list through the GUI). This is different
 from skippedNames because these are name ending matches only (not
 wildcard patterns), and the file name itself gets indexed normally. This
 can be redefined for subdirectories.
 .TP
 .BI "skippedPaths = "string
 Paths we should not go into. Space-separated list of
 wildcard expressions for filesystem paths. Can contain files and
 directories. The database and configuration directories will
 automatically be added. The expressions are matched using 'fnmatch(3)'
 with the FNM_PATHNAME flag set by default. This means that '/' characters
 must be matched explicitly. You can set 'skippedPathsFnmPathname' to 0
 to disable the use of FNM_PATHNAME (meaning that '/*/dir3' will match
 '/dir1/dir2/dir3').  The default value contains the usual mount point for
 removable media to remind you that it is a bad idea to have Recoll work
 on these (esp. with the monitor: media gets indexed on mount, all data
 gets erased on unmount).  Explicitly adding '/media/xxx' to the topdirs
 will override this.
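The effect of FNM_PATHNAME on skippedPaths patterns can be shown with a small Python sketch. This is not Recoll's code, which calls fnmatch(3) directly. The helper names are hypothetical. Only '*' and '?' are translated, and bracket expressions are matched literally. The FNM_LEADING_DIR behaviour mentioned in the older text is not modelled.

```python
import re


def wildcard_to_regex(pattern: str, pathname: bool = True) -> str:
    """Translate '*' and '?' wildcards into a regular expression.

    With pathname=True the wildcards never match '/', which mimics
    fnmatch(3) called with FNM_PATHNAME. Hypothetical helper.
    """
    any_char = "[^/]" if pathname else "."
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(any_char + "*")
        elif ch == "?":
            parts.append(any_char)
        else:
            # Everything else, including '[' and ']', is matched literally here.
            parts.append(re.escape(ch))
    return "".join(parts)


def path_is_skipped(path: str, patterns: list, pathname: bool = True) -> bool:
    """Return True if the whole path matches any of the patterns."""
    return any(
        re.fullmatch(wildcard_to_regex(p, pathname), path) is not None
        for p in patterns
    )


# '/*/dir3' spans a single path element only when FNM_PATHNAME is on.
print(path_is_skipped("/dir1/dir2/dir3", ["/*/dir3"]))                  # False
print(path_is_skipped("/dir1/dir2/dir3", ["/*/dir3"], pathname=False))  # True
```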
 .TP
 .BI "skippedPathsFnmPathname = "bool
 Set to 0 to
 override use of FNM_PATHNAME for matching skipped
 paths. 
 .TP
 .BI "daemSkippedPaths = "string
 skippedPaths equivalent specific to
 real time indexing. This enables having parts of the tree
 which are initially indexed but not monitored. If daemSkippedPaths is
 not set, the daemon uses skippedPaths.
 .TP
 .BI "zipSkippedNames = "string
 Space-separated list of wildcard expressions for names that should
 be ignored inside zip archives. This is used directly by
 the zip handler, and has a function similar to skippedNames, but works
 independently. Can be redefined for subdirectories. Supported by recoll
 1.20 and newer. See
 https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members
 .fi
 The list can be redefined for subdirectories, but is only actually changed
 for the top level ones in 
 .I topdirs
 .TP
-.BI "skippedPaths = " patterns
+.BI "followLinks = "bool
-A space-separated list of patterns for paths the indexer should not descend
+Follow symbolic links during
-into. Together with topdirs, this allows pruning the indexed tree to one's
+indexing. The default is to ignore symbolic links to avoid
-content.
+multiple indexing of linked files. No effort is made to avoid duplication
-.B daemSkippedPaths 
+when this option is set to true. This option can be set individually for
-can be used to define a specific value for the real time indexing monitor.
+each of the 'topdirs' members by using sections. It cannot be changed
 below the 'topdirs' level. Links in the 'topdirs' list itself are always
 followed.
 .TP
-.BI "skippedPathsFnmPathname = " 0/1
+.BI "indexedmimetypes = "string
-The values in the *skippedPaths variables are matched by default with
+Restrictive list of
-fnmatch(3), with the FNM_PATHNAME and FNM_LEADING_DIR flags. This means
+indexed mime types. Normally not set (in which case all
-that '/' characters must be matched explicitly. You can set
+supported types are indexed). If it is set,
-skippedPathsFnmPathname to 0 to disable the use of FNM_PATHNAME (meaning
+only the types from the list will have their contents indexed. The names
-that /*/dir3 will match /dir1/dir2/dir3). 
+will be indexed anyway if indexallfilenames is set (default). MIME
 type names should be taken from the mimemap file. Can be redefined for
 subtrees.
 .TP
-.BI "followLinks = " boolean
+.BI "excludedmimetypes = "string
-Specifies if the indexer should follow
+List of excluded MIME
-symbolic links while walking the file tree. The default is
+types. Lets you exclude some types from indexing. Can be
-to ignore symbolic links to avoid multiple indexing of
+redefined for subtrees.
 linked files. No effort is made to avoid duplication when
 this option is set to true. This option can be set
 individually for each of the 
 .I topdirs
 members by using sections. It can not be changed below the
 .I topdirs
 level.
 .TP
-.BI "indexedmimetypes = " list
+.BI "compressedfilemaxkbs = "int
-Recoll normally indexes any file which it knows how to read. This list lets
+Size limit for compressed
-you restrict the indexed mime types to what you specify. If the variable is
+files. We need to decompress these in a
-unspecified or the list empty (the default), all supported types are
+temporary directory for identification, which can be wasteful in some
-processed.
+cases. Limit the waste. Negative means no limit. 0 results in no
 processing of any compressed file. Default 50 MB.
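compressedfilemaxkbs has three cases: negative, zero, and positive. The decision function below summarizes them. It is a hypothetical Python sketch, not Recoll's implementation. Accepting a file whose size is exactly at the limit is an assumption.

```python
def should_decompress(size_kb: int, compressedfilemaxkbs: int) -> bool:
    """Decide whether a compressed file gets decompressed for identification."""
    if compressedfilemaxkbs < 0:
        return True   # negative: no limit
    if compressedfilemaxkbs == 0:
        return False  # zero: never process compressed files
    # Assumption: a file exactly at the limit is still processed.
    return size_kb <= compressedfilemaxkbs
```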
 .TP
-.BI "compressedfilemaxkbs = " value
+.BI "textfilemaxmbs = "int
-Size limit for compressed (.gz or .bz2) files. These need to be
+Size limit for text
-decompressed in a temporary directory for identification, which can be very
+files. Mostly for skipping monster
-wasteful if 'uninteresting' big compressed files are present.  Negative
+logs. Default 20 MB.
 means no limit, 0 means no processing of any compressed file. Defaults 
 to \-1.
 .TP
-.BI "textfilemaxmbs = " value
+.BI "indexallfilenames = "bool
-Maximum size for text files. Very big text files are often uninteresting
+Index the file names of
-logs. Set to \-1 to disable (default 20MB). 
+unprocessed files. Index the names of files the contents of
 which we don't index because of an excluded or unsupported MIME
 type.
 .TP
-.BI "textfilepagekbs = " value
+.BI "usesystemfilecommand = "bool
-If this is set to other than \-1, text files will be indexed as multiple
+Use a system command
-documents of the given page size. This may be useful if you do want to
+for file MIME type guessing as a final step in file type
-index very big text files as it will both reduce memory usage at index time
+identification. This is generally useful, but will usually
-and help with loading data to the preview window. A size of a few megabytes
+cause the indexing of many bogus 'text' files. See 'systemfilecommand'
-would seem reasonable (default: 1000 : 1MB).
+for the command used.
 .TP
-.BI "membermaxkbs = " "value in kilobytes"
+.BI "systemfilecommand = "string
-This defines the maximum size for an archive member (zip, tar or rar at
+Command used to guess
-the moment). Bigger entries will be skipped. Current default: 50000 (50 MB).
+MIME types if the internal methods fail. This should be a
 "file -i" workalike.  The file path will be added as a last parameter to
 the command line. 'xdg-mime' works better than the traditional 'file'
 command, and is now the configured default (with a hard-coded fallback to
 'file').
 .TP
-.BI "indexallfilenames = " boolean
+.BI "processwebqueue = "bool
-Recoll indexes file names into a special section of the database to allow
+Decide if we process the
-specific file names searches using wild cards. This parameter decides if
+Web queue. The queue is a directory where the Recoll Web
-file name indexing is performed only for files with mime types that would
+browser plugins create the copies of visited pages.
 qualify them for full text indexing, or for all files inside
 the selected subtrees, independent of mime type.
 .TP
-.BI "usesystemfilecommand = " boolean
+.BI "textfilepagekbs = "int
-Decide if we use the 
+Page size for text
-.B "file \-i"
+files. If this is set, text/plain files will be divided
-system command as a final step for determining the mime type for a file
+into documents of approximately this size. Will reduce memory usage at
-(the main procedure uses suffix associations as defined in the 
+index time and help with loading data in the preview window at query
-.B mimemap 
+time. Particularly useful with very big files, such as application or
-file). This can be useful for files with suffixless names, but it will
+system logs. Also see textfilemaxmbs and
-also cause the indexing of many bogus "text" files.
+compressedfilemaxkbs.
 .TP
-.BI "processbeaglequeue = " 0/1
+.BI "membermaxkbs = "int
-If this is set, process the directory where Beagle Web browser plugins copy
+Size limit for archive
-visited pages for indexing. Of course, Beagle MUST NOT be running, else
+members. This is passed to the filters in the environment
-things will behave strangely. 
+as RECOLL_FILTER_MAXMEMBERKB.
 .TP
-.BI "beaglequeuedir = " directory path
+.BI "indexStripChars = "bool
-The path to the Beagle indexing queue. This is hard-coded in the Beagle
+Decide if we store
-plugin as ~/.beagle/ToIndex so there should be no need to change it. 
+character case and diacritics in the index. If we do,
-.TP 
+searches sensitive to case and diacritics can be performed, but the index
-.BI "indexStripChars = " 0/1
+will be bigger, and some marginal weirdness may sometimes occur. The
-Decide if we strip characters of diacritics and convert them to lower-case
+default is a stripped index. When using multiple indexes for a search,
 before terms are indexed. If we don't, searches sensitive to case and
 diacritics can be performed, but the index will be bigger, and some
 marginal weirdness may sometimes occur. The default is a stripped index
 (indexStripChars = 1) for now. When using multiple indexes for a search,
 this parameter must be defined identically for all. Changing the value
 implies an index reset.
 .TP
-.BI "maxTermExpand = " value
+.BI "nonumbers = "bool
-Maximum expansion count for a single term (e.g.: when using wildcards). The
+Decides if terms will be
-default of 10000 is reasonable and will avoid queries that appear frozen
+generated for numbers. For example "123", "1.5e6",
-while the engine is walking the term list. 
+192.168.1.4, would not be indexed if nonumbers is set ("value123" would
 still be). Numbers are often quite interesting to search for, and this
 should probably not be set except for special situations, i.e., scientific
 documents with huge amounts of numbers in them, where setting nonumbers
 will reduce the index size. This can only be set for a whole index, not
 for a subtree.
 .TP
-.BI "maxXapianClauses = " value
+.BI "dehyphenate = "bool
-Maximum number of elementary clauses we can add to a single Xapian
+Determines if we index
-query. In some cases, the result of term expansion can be multiplicative,
+'coworker' also when the input is 'co-worker'. This is new
-and we want to avoid using excessive memory. The default of 100 000 should
+in version 1.22, and on by default. Setting the variable to off allows
-be both high enough in most cases and compatible with current typical
+restoring the previous behaviour.
 hardware configurations. 
 .TP
-.BI "nonumbers = " 0/1
+.BI "nocjk = "bool
-If this set to true, no terms will be generated for numbers. For example
+Decides if specific East Asian
-"123", "1.5e6", 192.168.1.4, would not be indexed ("value123" would still
+(Chinese Korean Japanese) characters/word splitting is turned
-be). Numbers are often quite interesting to search for, and this should
+off. This will save a small amount of CPU if you have no CJK
-probably not be set except for special situations, ie, scientific documents
+documents. If your document base does include such text but you are not
-with huge amounts of numbers in them. This can only be set for a whole
+interested in searching it, setting nocjk may be a
-index, not for a subtree. 
+significant time and space saver.
 .TP
-.BI "nocjk = " boolean
+.BI "cjkngramlen = "int
-If this set to true, specific east asian (Chinese Korean Japanese)
+This lets you adjust the size of
-characters/word splitting is turned off. This will save a small amount of
+n-grams used for indexing CJK text. The default value of 2 is
-cpu if you have no CJK documents. If your document base does include such
+probably appropriate in most cases. A value of 3 would allow more precision
-text but you are not interested in searching it, setting
+and efficiency on longer words, but the index will be approximately twice
-.I nocjk
+as large.
 may be a significant time and space saver.
 .TP
-.BI "cjkngramlen = " value
+.BI "indexstemminglanguages = "string
-This lets you adjust the size of n-grams used for indexing CJK text. The
+Languages for which to create stemming expansion
-default value of 2 is probably appropriate in most cases. A value of 3
+data. Stemmer names can be found by executing 'recollindex
-would allow more precision and efficiency on longer words, but the index
+-l', or this can also be set from a list in the GUI.
 will be approximately twice as large.
 .TP
-.BI "indexstemminglanguages = " languages
+.BI "defaultcharset = "string
-A list of languages for which the stem expansion databases will be
+Default character
-built. See recollindex(1) for possible values.
+set. This is used for files which do not contain a
 character set definition (e.g.: text/plain). Values found inside files,
 e.g. a 'charset' tag in HTML documents, will override it. If this is not
 set, the default character set is the one defined by the NLS environment
 ($LC_ALL, $LC_CTYPE, $LANG), or ultimately iso-8859-1 (cp-1252 in fact).
 If for some reason you want a general default which does not match your
 LANG and is not 8859-1, use this variable. This can be redefined for any
 sub-directory.
 .TP
-.BI "defaultcharset = " charset
+.BI "unac_except_trans = "string
-The name of the character set used for files that do not contain a
+A list of characters,
-character set definition (ie: plain text files). This can be redefined for
+encoded in UTF-8, which should be handled specially
-any subdirectory.
+when converting text to unaccented lowercase. For
 example, in Swedish, the letter a with diaeresis has full alphabet
 citizenship and should not be turned into an a.
 Each element in the space-separated list has the special character as
 first element and the translation following. The handling of both the
 lowercase and upper-case versions of a character should be specified, as
 membership in the list will turn off both standard accent and case
 processing. The value is global and affects both indexing and querying.
 Examples:
 Swedish:
 unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå
 German:
 unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
 In French, you probably want to decompose oe and ae and nobody would type
 a German ß:
 unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
 The default for all, until someone protests, follows. These decompositions
 are not performed by unac, but it is unlikely that someone would type the
 composed forms in a search.
 unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
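The element format puts the special character first, followed by its translation. The sketch below shows how such a list could be parsed and applied. The parser and the folding function are hypothetical. The fallback path is Unicode decomposition, then dropping combining marks, then lowercasing. It only approximates what unac does.

```python
import unicodedata


def parse_unac_except_trans(value: str) -> dict:
    """Map each special character to its translation (rest of the element)."""
    table = {}
    for element in value.split():
        table[element[0]] = element[1:]
    return table


def fold(text: str, table: dict) -> str:
    """Convert text to unaccented lowercase, honouring the exception table."""
    out = []
    for ch in text:
        if ch in table:
            # Listed characters bypass both accent and case processing.
            out.append(table[ch])
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            base = "".join(c for c in decomposed if not unicodedata.combining(c))
            out.append(base.lower())
    return "".join(out)


swedish = parse_unac_except_trans("ää Ää öö Öö åå Åå ßss")
print(fold("Åsa Éclair", swedish))  # åsa eclair
print(fold("Straße", swedish))      # strasse
```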
 .TP
-.BI "unac_except_trans = " "list of utf-8 groups"
+.BI "maildefcharset = "string
-This is a list of characters, encoded in UTF-8, which should be handled
+Overrides the default
-specially when converting text to unaccented lowercase. For example, in
+character set for email messages which don't specify
-Swedish, the letter "a with diaeresis" has full alphabet citizenship and
+one. This is mainly useful for readpst (libpst) dumps,
-should not be turned into an a. 
+which are utf-8 but do not say so.
 .br
 Each element in the space-separated list has the special character as first
 element and the translation following. The handling of both the lowercase
 and upper-case versions of a character should be specified, as membership
 in the list will turn off both standard accent and case processing.
 .br
 Note that the translation is not limited to a single character.
 .br
 This parameter cannot be redefined for subdirectories, it is global,
 because there is no way to do otherwise when querying. If you have document
 sets which would need different values, you will have to index and query
 them separately.
 .TP
-.BI "maildefcharset = " character set name
+.BI "localfields = "string
-This can be used to define the default character set specifically for email
+Set fields on all files
-messages which don't specify it. This is mainly useful for readpst (libpst)
+(usually of a specific fs area). Syntax is the usual:
-dumps, which are utf-8 but do not say so. 
+name = value ; attr1 = val1 ; [...]
 value is empty so this needs an initial semi-colon. This is useful, e.g.,
 for setting the rclaptg field for application selection inside
 mimeview.
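In the localfields syntax, an empty leading value is followed by semicolon-separated name = value pairs. The parser below shows one possible reading of that syntax. It is hypothetical and ignores whatever quoting or escaping Recoll's own configuration parser supports.

```python
def parse_localfields(value: str) -> dict:
    """Parse '; name1 = val1 ; name2 = val2' into a field dictionary."""
    fields = {}
    # The first segment is the (empty) main value, hence the leading ';'.
    for segment in value.split(";")[1:]:
        name, sep, val = segment.partition("=")
        if not sep:
            continue  # skip segments without '='
        fields[name.strip()] = val.strip()
    return fields


print(parse_localfields("; rclaptg = gnus ; other = val"))
# {'rclaptg': 'gnus', 'other': 'val'}
```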
 .TP
-.BI "localfields = " "fieldname = value:..."
+.BI "testmodifusemtime = "bool
-This allows setting fields for all documents under a given
+Use mtime instead of
-directory. Typical usage would be to set an "rclaptg" field, to be used in
+ctime to test if a file has been modified. The time is used
-mimeview to select a specific viewer. If several fields are to be set, they
+in addition to the size, which is always used.
-should be separated with a colon (':') character (which there is currently
+Setting this can reduce re-indexing on systems where extended attributes
-no way to escape). Ie: localfields= rclaptg=gnus:other = val, then select
+are used (by some other application), but not indexed, because changing
-specifier viewer with mimetype|tag=... in mimeview. 
+extended attributes only affects ctime.
 Notes:
 - This may prevent detection of change in some marginal file rename cases
 (the target would need to have the same size and mtime).
 - You should probably also set noxattrfields to 1 in this case, except if
 you still prefer to perform xattr indexing, for example if the local
 file update pattern makes it of value (as in general, there is a risk
 for pure extended attributes updates without file modification to go
 undetected). Perform a full index reset after changing this.
 .TP
-.BI "dbdir = " directory
+.BI "noxattrfields = "bool
-The name of the Xapian database directory. It will be created if needed
+Disable extended attributes
-when the database is initialized. If this is not an absolute pathname, it
+conversion to metadata fields. This probably needs to be
-will be taken relative to the configuration directory.
+set if testmodifusemtime is set.
 .TP
-.BI "idxstatusfile = " "file path"
+.BI "metadatacmds = "string
-The name of the scratch file where the indexer process updates its
+Define commands to
-status. Default: idxstatus.txt inside the configuration directory. 
+gather external metadata, e.g. tmsu tags. 
 There can be several entries, separated by semi-colons, each defining
 which field name the data goes into and the command to use. Don't forget the
 initial semi-colon. All the field names must be different. You can use
 aliases in the "field" file if necessary.
 As a not too pretty hack conceded to convenience, any field name
 beginning with "rclmulti" will be taken as an indication that the command
 returns multiple field values inside a text blob formatted as a recoll
 configuration file ("fieldname = fieldvalue" lines). The rclmultixx name
 will be ignored, and field names and values will be parsed from the data.
 Example: metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f
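The metadatacmds value uses the same semicolon syntax. The hypothetical sketch below splits it into one command line per field. It substitutes the document path for a standalone %f argument, which is an assumption based on the example. It also parses the "fieldname = fieldvalue" lines that an rclmulti command is described as returning. The commands are not executed, and a command containing ';' would be split incorrectly.

```python
import shlex


def parse_metadatacmds(value: str, path: str) -> dict:
    """Return {field name: argv list}, replacing '%f' arguments with path."""
    commands = {}
    for segment in value.split(";")[1:]:
        name, sep, cmd = segment.partition("=")
        if not sep:
            continue
        argv = [path if arg == "%f" else arg for arg in shlex.split(cmd)]
        commands[name.strip()] = argv
    return commands


def parse_rclmulti_output(blob: str) -> dict:
    """Parse 'fieldname = fieldvalue' lines returned by an rclmulti command."""
    fields = {}
    for line in blob.splitlines():
        name, sep, val = line.partition("=")
        if sep and name.strip():
            fields[name.strip()] = val.strip()
    return fields


cmds = parse_metadatacmds("; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f",
                          "/home/me/doc.pdf")
print(cmds["tags"])  # ['tmsu', 'tags', '/home/me/doc.pdf']
print(parse_rclmulti_output("author = Jane\nproject = recoll\n"))
# {'author': 'Jane', 'project': 'recoll'}
```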
 .TP
-.BI "maxfsoccuppc = " percentnumber
+.BI "cachedir = "dfn
-Maximum file system occupation before we
+Top directory for Recoll data. Recoll data
-stop indexing. The value is a percentage, corresponding to
+directories are normally located relative to the configuration directory
-what the "Capacity" df output column shows.  The default
+(e.g. ~/.recoll/xapiandb, ~/.recoll/mboxcache). If 'cachedir' is set, the
 directories are stored under the specified value instead (e.g. if
 cachedir is ~/.cache/recoll, the default dbdir would be
 ~/.cache/recoll/xapiandb).  This affects dbdir, webcachedir,
 mboxcachedir, aspellDicDir, which can still be individually specified to
 override cachedir.  Note that if you have multiple configurations, each
 must have a different cachedir, there is no automatic computation of a
 subpath under cachedir.
 .TP
 .BI "maxfsoccuppc = "int
 Maximum file system occupation
 over which we stop indexing. The value is a percentage,
 corresponding to what the "Capacity" df output column shows. The default
 value is 0, meaning no checking.
 .TP
-.BI "mboxcachedir = " "directory path"
+.BI "xapiandb = "dfn
-The directory where mbox message offsets cache files are held. This is
+Xapian database directory
-normally $RECOLL_CONFDIR/mboxcache, but it may be useful to share a
+location. This will be created on first indexing. If the
-directory between different configurations. 
+value is not an absolute path, it will be interpreted as relative to
 cachedir if set, or the configuration directory (-c argument or
 $RECOLL_CONFDIR).  If nothing is specified, the default is then
 ~/.recoll/xapiandb/
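The lookup order for cachedir and xapiandb works like this. An absolute value is used as-is. A relative value is resolved against cachedir if that is set, otherwise against the configuration directory. The function below expresses this order. Its name and arguments are hypothetical, and tilde expansion is left out.

```python
import os


def resolve_data_dir(value, default_name, cachedir, confdir):
    """Resolve a Recoll data directory setting such as xapiandb."""
    name = value if value else default_name
    if os.path.isabs(name):
        return name
    base = cachedir if cachedir else confdir
    return os.path.join(base, name)


print(resolve_data_dir(None, "xapiandb", None, "/home/me/.recoll"))
# /home/me/.recoll/xapiandb
print(resolve_data_dir(None, "xapiandb", "/home/me/.cache/recoll", "/home/me/.recoll"))
# /home/me/.cache/recoll/xapiandb
```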
 .TP
-.BI "mboxcacheminmbs = " "value in megabytes"
+.BI "idxstatusfile = "fn
-The minimum mbox file size over which we cache the offsets. There is really no sense in caching offsets for small files. The default is 5 MB.
+Name of the scratch file where the indexer process updates its
 status. Default: idxstatus.txt inside the configuration
 directory.
 .TP
-.BI "webcachedir = " "directory path"
+.BI "mboxcachedir = "dfn
-This is only used by the Beagle web browser plugin indexing code, and
+Directory location for storing mbox message offsets cache
-defines where the cache for visited pages will live. Default:
+files. This is normally 'mboxcache' under cachedir if set,
 or else under the configuration directory, but it may be useful to share
 a directory between different configurations.
 .TP
 .BI "mboxcacheminmbs = "int
 Minimum mbox file size over which we cache the offsets. There is really no sense in caching offsets for small files. The
 default is 5 MB.
 .TP
 .BI "webcachedir = "dfn
 Directory where we store the archived web pages. This is only used by the web history indexing code.
 Default: cachedir/webcache if cachedir is set, else
 $RECOLL_CONFDIR/webcache
 .TP
-.BI "webcachemaxmbs = " "value in megabytes"
+.BI "webcachemaxmbs = "int
-This is only used by the Beagle web browser plugin indexing code, and
+Maximum size in MB of the Web archive. This is only used by the web history indexing code.
-defines the maximum size for the web page cache. Default: 40 MB. 
+Default: 40 MB.
 Reducing the size will not physically truncate the file.
 .TP
-.BI "idxflushmb = " megabytes
+.BI "webqueuedir = "fn
-Threshold (megabytes of new text data)
+The path to the Web indexing queue. This is
-where we flush from memory to disk index. Setting this can
+hard-coded in the plugin as ~/.recollweb/ToIndex so there should be no
-help control memory usage. A value of 0 means no explicit
+need or possibility to change it.
 flushing, letting Xapian use its own default, which is
 flushing every 10000 documents (or XAPIAN_FLUSH_THRESHOLD), meaning that
 memory usage depends on average document size. The default value is 10.
 .TP
-.BI "autodiacsens = " 0/1
+.BI "aspellDicDir = "dfn
-IF the index is not stripped, decide if we automatically trigger diacritics
+Aspell dictionary storage directory location. The
-sensitivity if the search term has accented characters (not in
+aspell dictionary (aspdict.(lang).rws) is normally stored in the
-unac_except_trans). Else you need to use the query language and the D
+directory specified by cachedir if set, or under the configuration
-modifier to specify diacritics sensitivity. Default is no. 
+directory.
 .TP
-.BI "autocasesens = " 0/1
+.BI "filtersdir = "dfn
-IF the index is not stripped, decide if we automatically trigger character
+Directory location for executable input handlers. If
-case sensitivity if the search term has upper-case characters in any but
+RECOLL_FILTERSDIR is set in the environment, we use it instead. Defaults
-the first position. Else you need to use the query language and the C
+to $prefix/share/recoll/filters. Can be redefined for
-modifier to specify character-case sensitivity. Default is yes. 
+subdirectories.
 .TP
-.BI "loglevel = " value
+.BI "iconsdir = "dfn
-Verbosity level for recoll and recollindex. A value of 4 lists quite a lot of
+Directory location for icons. The only reason to
-debug/information messages. 3 lists only errors. 
+change this would be if you want to change the icons displayed in the
-.B daemloglevel
+result list. Defaults to $prefix/share/recoll/images.
 can be used to specify a different value for the real-time indexing daemon.
 .TP
-.BI "logfilename = " file
+.BI "idxflushmb = "int
-Where should the messages go. 'stderr' can be used as a special value.
+Threshold (megabytes of new data) where we flush from memory to
-.B daemlogfilename
+disk index. Setting this allows some control over memory
-can be used to specify a different value for the real-time indexing daemon.
+usage by the indexer process. A value of 0 means no explicit flushing,
 which lets Xapian perform its own thing, meaning flushing every
 $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted: as memory
 usage depends on average document size, not only document count, the
 Xapian approach is not very useful, and you should let Recoll manage
 the flushes.  The default value of idxflushmb is 10 MB, and may be a bit
 low. If you are looking for maximum speed, you may want to experiment
 with values between 20 and
 80. In my experience, values beyond 100 are always counterproductive. If
 you find otherwise, please drop me a note.
 .TP
-.BI "mondelaypatterns = " "list of patterns"
+.BI "filtermaxseconds = "int
-This allows specify wildcard path patterns (processed with fnmatch(3) with
+Maximum external filter execution time in
-0 flag), to match files which change too often and for which a delay should
+seconds. Default 1200 (20 min). Set to 0 for no limit. This
-be observed before re-indexing. This is a space-separated list, each entry
+is mainly to avoid infinite loops in postscript files
-being a pattern and a time in seconds, separated by a colon. You can use
+(loop.ps).
 double quotes if a path entry contains white space. Example: 
 .sp
 mondelaypatterns = *.log:20 "this one has spaces*:10"
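The sketch below shows how a mondelaypatterns value could be split and matched. Python's fnmatch.fnmatchcase stands in for fnmatch(3) with a 0 flag, so '*' also matches '/'. Two behaviours are assumptions: using the first matching pattern, and returning 0 for unmatched files.

```python
import fnmatch
import shlex


def parse_mondelaypatterns(value: str) -> list:
    """Split 'pattern:seconds' entries; double quotes protect spaces."""
    entries = []
    for item in shlex.split(value):
        pattern, _, seconds = item.rpartition(":")
        entries.append((pattern, int(seconds)))
    return entries


def reindex_delay(path: str, entries: list) -> int:
    """Return the delay of the first matching pattern, else 0 (assumed)."""
    for pattern, seconds in entries:
        if fnmatch.fnmatchcase(path, pattern):
            return seconds
    return 0


entries = parse_mondelaypatterns('*.log:20 "this one has spaces*:10"')
print(entries)  # [('*.log', 20), ('this one has spaces*', 10)]
print(reindex_delay("/var/log/syslog.log", entries))  # 20
```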
 .TP
-.BI "monixinterval = " "value in seconds
+.BI "filtermaxmbytes = "int
-Minimum interval (seconds) for processing the indexing queue. The real time
+Maximum virtual memory space for filter processes
-monitor does not process each event when it comes in, but will wait this
+(setrlimit(RLIMIT_AS)), in megabytes. Note that this
-time for the queue to accumulate to diminish overhead and in order to
+includes any mapped libs (there is no reliable Linux way to limit the
-aggregate multiple events to the same file. Default 30 S. 
+data space only), so we need to be a bit generous here. Anything over
 2000 will be ignored on 32-bit machines.
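The sketch below runs a filter under the filtermaxseconds and filtermaxmbytes limits. It uses Python's subprocess module and resource.setrlimit with RLIMIT_AS, so it is Unix only. It illustrates the idea and is not the mechanism Recoll itself uses. Two details are assumptions: the function name, and treating a missing or zero memory value as "no cap".

```python
import resource
import subprocess
import sys


def run_filter(argv, max_seconds=1200, max_mbytes=None):
    """Run an external filter; return its stdout, or None on timeout."""

    def limit_memory():
        # Runs in the child before exec: cap the virtual address space.
        if max_mbytes:
            _, hard = resource.getrlimit(resource.RLIMIT_AS)
            soft = max_mbytes * 1024 * 1024
            if hard != resource.RLIM_INFINITY:
                soft = min(soft, hard)
            resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    try:
        done = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=max_seconds or None,  # 0 means no time limit
            preexec_fn=limit_memory,
        )
    except subprocess.TimeoutExpired:
        return None  # subprocess.run kills the child before raising
    return done.stdout


print(run_filter([sys.executable, "-c", "print('ok')"], max_mbytes=4096))
```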
 .TP
-.BI "monauxinterval = " "value in seconds
+.BI "thrQSizes = "string
-Period (in seconds) at which the real time monitor will regenerate the
+Stage input queues configuration. There are three
-auxiliary databases (spelling, stemming) if needed. The default is one
+internal queues in the indexing pipeline stages (file data extraction,
-hour. 
+terms generation, index update). This parameter defines the queue depths
 for each stage (three integer values). If a value of -1 is given for a
 given stage, no queue is used, and the thread will go on performing the
 next stage. In practice, deep queues have not been shown to increase
 performance. Default: a value of 0 for the first queue tells Recoll to
 perform autoconfiguration based on the detected number of CPUs (no need
 for the two other values in this case).  Use thrQSizes = -1 -1 -1 to
 disable multithreading entirely.
 .TP
-.BI "monioniceclass, monioniceclassdata"
+.BI "thrTCounts = "string
-These allow defining the ionice class and data used by the indexer (default
+Number of threads used for each indexing stage. The
-class 3, no data). 
+three stages are: file data extraction, terms generation, index
 update. The use of the counts is also controlled by some special values
 in thrQSizes: if the first queue depth is 0, all counts are ignored
 (autoconfigured); if a value of -1 is used for a queue depth, the
 corresponding thread count is ignored. It makes no sense to use a value
 other than 1 for the last stage because updating the Xapian index is
 necessarily single-threaded (and protected by a mutex).
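The interplay between thrQSizes and thrTCounts can be restated as a small Python function that only reports the resulting layout. It is a hypothetical illustration. It does not reproduce Recoll's CPU-based autoconfiguration, which it reports as None.

```python
STAGES = ("file data extraction", "terms generation", "index update")


def describe_pipeline(qsizes, tcounts):
    """Explain a thrQSizes / thrTCounts pair; None means autoconfiguration."""
    if len(qsizes) != 3 or len(tcounts) != 3:
        raise ValueError("expected three values for each parameter")
    if qsizes[0] == 0:
        return None  # thread counts are ignored, layout derived from CPU count
    plan = []
    for stage, depth, count in zip(STAGES, qsizes, tcounts):
        if depth == -1:
            # No queue: the upstream thread carries on with this stage.
            plan.append(f"{stage}: no queue, done by the upstream thread")
        else:
            plan.append(f"{stage}: queue depth {depth}, {count} thread(s)")
    return plan


for line in describe_pipeline([-1, -1, -1], [1, 1, 1]):
    print(line)
```

With -1 everywhere, every stage reports "no queue". This matches the description of thrQSizes = -1 -1 -1 as disabling multithreading.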
 .TP
-.BI "filtermaxseconds = " "value in seconds"
+.BI "loglevel = "int
-Maximum filter execution time, after which it is aborted. Some postscript
+Log file verbosity 1-6. A value of 2 will print
-programs just loop... 
+only errors and warnings. 3 will print information like document updates,
 4 is quite verbose and 6 very verbose.
 .TP
-.BI "filtersdir = " directory
+.BI "logfilename = "fn
-A directory to search for the external filter scripts used to index some
+Log file destination. Use 'stderr' (default) to write to the
-types of files. The value should not be changed, except if you want to
+console. 
 modify one of the default scripts. The value can be redefined for any
 subdirectory. 
 .TP
-.BI "iconsdir = " directory
+.BI "idxloglevel = "int
-The name of the directory where 
+Override loglevel for the indexer. 
 .B recoll
 result list icons are stored. You can change this if you want different
 images.
 .TP
-.BI "idxabsmlen = " value
+.BI "idxlogfilename = "fn
-Recoll stores an abstract for each indexed file inside the database. The
+Override logfilename for the indexer. 
-text can come from an actual 'abstract' section in the document or will
+.TP
-just be the beginning of the document. It is stored in the index so that it
+.BI "daemloglevel = "int
-can be displayed inside the result lists without decoding the original
+Override loglevel for the indexer in real time
-file. The
+mode. The default is to use the idx... values if set, else
-.I idxabsmlen
+the log... values.
-parameter defines the size of the stored abstract. The default value is 250
+.TP
-bytes.  The search interface gives you the choice to display this stored
+.BI "daemlogfilename = "fn
 Override logfilename for the indexer in real time
 mode. The default is to use the idx... values if set, else
 the log... values.
 .TP
 .BI "idxrundir = "dfn
 Indexing process current directory. The input
 handlers sometimes leave temporary files in the current directory, so it
 makes sense to have recollindex chdir to some temporary directory. If the
 value is empty, the current directory is not changed. If the
 value is (literal) tmp, we use the temporary directory as set by the
 environment (RECOLL_TMPDIR else TMPDIR else /tmp). If the value is an
 absolute path to a directory, we go there.
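A minimal Python sketch of the idxrundir resolution rules described above. The helper name resolve_idxrundir is hypothetical and not part of Recoll. Two further points are assumptions: an empty environment variable is treated as unset, and a value matching none of the three documented cases is taken to mean no chdir.

```python
import os
from typing import Mapping, Optional


def resolve_idxrundir(value: str, env: Mapping[str, str]) -> Optional[str]:
    """Return the directory the indexer would chdir to, or None for no chdir.

    Hypothetical helper illustrating the documented idxrundir rules.
    """
    value = value.strip()
    if not value:
        # Empty value: the current directory is not changed.
        return None
    if value == "tmp":
        # Literal "tmp": RECOLL_TMPDIR, else TMPDIR, else /tmp.
        # Assumption: an empty variable counts as unset.
        return env.get("RECOLL_TMPDIR") or env.get("TMPDIR") or "/tmp"
    if os.path.isabs(value):
        # Absolute path to a directory: go there.
        return value
    # Assumption: other values are not covered by the documentation.
    return None
```

In use, `resolve_idxrundir("tmp", os.environ)` would return the first of RECOLL_TMPDIR or TMPDIR that is set, falling back to /tmp.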
 .TP
 .BI "checkneedretryindexscript = "fn
 Script used to heuristically check if we need to retry indexing
 files which previously failed.  The default script checks
 the modified dates on /usr/bin and /usr/local/bin. A relative path will
 be looked up in the filters dirs, then in the path. Use an absolute path
 to do otherwise.
 .TP
 .BI "recollhelperpath = "string
 Additional places to search for helper executables. This is only used on Windows for now.
 .TP
 .BI "idxabsmlen = "int
 Length of abstracts we store while indexing. Recoll stores an abstract for each indexed file.
 The text can come from an actual 'abstract' section in the
 document or will just be the beginning of the document. It is stored in
 the index so that it can be displayed inside the result lists without
 decoding the original file. The idxabsmlen parameter
 defines the size of the stored abstract. The default value is 250
 bytes. The search interface gives you the choice to display this stored
 text or a synthetic abstract built by extracting text around the search
 terms. If you always prefer the synthetic abstract, you can reduce this
 value and save a little space.
 .TP
-.BI "aspellLanguage = " lang
+.BI "idxmetastoredlen = "int
-Language definitions to use when creating the aspell dictionary.  The value
+Truncation length of stored metadata fields. This
-must match a set of aspell language definition files. You can type "aspell
+does not affect indexing (the whole field is processed anyway), just the
-config" to see where these are installed (look for data-dir). The default
+amount of data stored in the index for the purpose of displaying fields
-if the variable is not set is to use your desktop national language
+inside result lists or previews. The default value is 150 bytes which
-environment to guess the value.
+may be too low if you have custom fields.
 .TP
-.BI "noaspell = " boolean
+.BI "aspellLanguage = "string
-If this is set, the aspell dictionary generation is turned off. Useful for
+Language definitions to use when creating the aspell
-cases where you don't need the functionality or when it is unusable because
+dictionary. The value must match a set of aspell language
-aspell crashes during dictionary generation.
+definition files. You can type "aspell dicts" to see a list. The default
 if this is not set is to use the NLS environment to guess the
 value.
 .TP
-.BI "mhmboxquirks = " flags
+.BI "aspellAddCreateParam = "string
-This allows definining location-related quirks for the mailbox
+Additional option and parameter to aspell dictionary creation
-handler. Currently only the tbird flag is defined, and it should be set for
+command. Some aspell packages may need an additional option
-directories which hold Thunderbird data, as their folder format is weird. 
+(e.g. on Debian Jessie: --local-data-dir=/usr/lib/aspell). See Debian bug
 772415.
 .TP
 .BI "aspellKeepStderr = "bool
 Set this to have a look at aspell dictionary creation
 errors. There are always many, so this is mostly for
 debugging.
 .TP
 .BI "noaspell = "bool
 Disable aspell use. The aspell dictionary generation
 takes time, and some combinations of aspell version, language, and local
 terms result in aspell crashing, so it sometimes makes sense to just
 disable the thing.
 .TP
 .BI "monauxinterval = "int
 Auxiliary database update interval. The real time
 indexer only updates the auxiliary databases (stemdb, aspell)
 periodically, because it would be too costly to do it for every document
 change. The default period is one hour.
 .TP
 .BI "monixinterval = "int
 Minimum interval (seconds) between processing runs of the indexing
 queue. The real time indexer does not process each event
 when it comes in, but lets the queue accumulate, to diminish overhead and
 to aggregate multiple events affecting the same file. Default 30
 seconds.
 .TP
 .BI "mondelaypatterns = "string
 Timing parameters for the real time indexing. Definitions for files which get a longer delay before reindexing
 is allowed. This is for fast-changing files, that should only be
 reindexed once in a while. A list of wildcardPattern:seconds pairs. The
 patterns are matched with fnmatch(pattern, path, 0). You can quote entries
 containing white space with double quotes (quote the whole entry, not the
 pattern). The default is empty.
 Example: mondelaypatterns = *.log:20 "*with spaces.*:30"
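The following Python sketch parses a mondelaypatterns value and looks up the delay for a path. It uses the example value shown above. The function names are hypothetical. Three behaviours are assumptions: splitting entries with shlex (which also treats backslashes as escapes), splitting each entry on its last colon, and letting the first matching pattern win.

```python
import fnmatch
import shlex


def parse_delay_patterns(value):
    """Parse space-separated 'pattern:seconds' entries into (pattern, int) pairs."""
    pairs = []
    for entry in shlex.split(value):  # shlex removes the surrounding double quotes
        pattern, _, seconds = entry.rpartition(":")
        pairs.append((pattern, int(seconds)))
    return pairs


def reindex_delay(path, pairs, default=0):
    """Return the delay of the first pattern matching the full path."""
    for pattern, seconds in pairs:
        # fnmatchcase() is case-sensitive and lets '*' match '/',
        # which mirrors fnmatch(pattern, path, 0) in C.
        if fnmatch.fnmatchcase(path, pattern):
            return seconds
    return default
```

With the example value, a path ending in .log gets a 20 second delay, and any path containing "with spaces." gets 30 seconds.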
 .TP
 .BI "monioniceclass = "int
 ionice class for the real time indexing process, on platforms where this is supported. The default value is
 3.
 .TP
 .BI "monioniceclassdata = "string
 ionice class parameter for the real time indexing process, on platforms where this is supported. The default is
 empty.
 .TP
 .BI "autodiacsens = "bool
 auto-trigger diacritics sensitivity (raw index only). If the index is not stripped, decide if we automatically trigger
 diacritics sensitivity if the search term has accented characters (not in
 unac_except_trans). Else you need to use the query language and the "D"
 modifier to specify diacritics sensitivity. Default is no.
 .TP
 .BI "autocasesens = "bool
 auto-trigger case sensitivity (raw index only). If
 the index is not stripped (see indexStripChars), decide if we
 automatically trigger character case sensitivity if the search term has
 upper-case characters in any but the first position. Else you need to use
 the query language and the "C" modifier to specify character-case
 sensitivity. Default is yes.
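A rough Python sketch of the two auto-trigger tests described above (autodiacsens and autocasesens). The function names are hypothetical. Using Unicode NFD decomposition to detect "accented" characters is an assumption that only approximates Recoll's unac-based processing.

```python
import unicodedata


def triggers_case_sensitivity(term):
    """autocasesens: an upper-case character in any but the first position."""
    return any(c.isupper() for c in term[1:])


def triggers_diacritics_sensitivity(term, except_chars=""):
    """autodiacsens: an accented character not listed in unac_except_trans."""
    for c in term:
        if c in except_chars:
            continue  # exception characters do not count as accented
        decomposed = unicodedata.normalize("NFD", c)
        if any(unicodedata.combining(d) for d in decomposed):
            return True
    return False
```

Under these rules, "Paris" stays case-insensitive (only its first letter is upper-case) while "iPhone" triggers case sensitivity.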
 .TP
 .BI "maxTermExpand = "int
 Maximum query expansion count
 for a single term (e.g.: when using wildcards). This only
 affects queries, not indexing. We used to not limit this at all (except
 for filenames where the limit was too low at 1000), but it is
 unreasonable with a big index. Default 10000.
 .TP
 .BI "maxXapianClauses = "int
 Maximum number of clauses
 we add to a single Xapian query. This only affects queries,
 not indexing. In some cases, the result of term expansion can be
 multiplicative, and we want to avoid eating all the memory. Default
 50000.
 .TP
 .BI "snippetMaxPosWalk = "int
 Maximum number of positions we walk while populating a snippet for
 the result list. The default of 1,000,000 may be
 insufficient for very big documents; the consequence would be snippets
 with possibly meaning-altering missing words.
 .TP
 .BI "pdfocr = "bool
 Attempt OCR of PDF files with no text content if both tesseract and
 pdftoppm are installed. The default is off because OCR is so
 very slow.
 .TP
 .BI "pdfattach = "bool
 Enable PDF attachment extraction by executing pdftk (if
 available). This is
 normally disabled, because it does slow down PDF indexing a bit even if
 not one attachment is ever found.
 .TP
 .BI "mhmboxquirks = "string
 Enable Thunderbird/Mozilla-SeaMonkey mbox format quirks. Set this for the directory where the email mbox files are
 stored.
 .SH SEE ALSO
 .PP 
--- a/src/doc/user/Makefile
+++ b/src/doc/user/Makefile
@ -25,7 +25,7 @@ webh:
 	make -C webhelp
 usermanual.html: usermanual.xml
-	xsltproc ${commonoptions} \
+	xsltproc --xinclude ${commonoptions} \
            -o tmpfile.html "${XSLDIR}/html/docbook.xsl" $<
 	-tidy -indent tmpfile.html > usermanual.html
 	rm -f tmpfile.html
--- a/src/doc/user/recoll.conf.xml
+++ b/src/doc/user/recoll.conf.xml
@ -0,0 +1,588 @@
 <?xml version="1.0"?>
 <sect2 id="RCL.INSTALL.CONFIG.RECOLLCONF">
 <title>Recoll main configuration file, recoll.conf </title>
 <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.WHATDOCS">
 <title>Parameters affecting what documents we index </title>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TOPDIRS">
 <term><varname>topdirs</varname></term>
 <listitem><para>Space-separated list of files or
 directories to recursively index. Defaults to ~ (indexes
 $HOME). You can use symbolic links in the list; they will be followed,
 independently of the value of the followLinks variable.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES">
 <term><varname>skippedNames</varname></term>
 <listitem><para>Files and directories which should be ignored. 
 White space separated list of wildcard patterns (simple ones, not paths,
 must contain no / ), which will be tested against file and directory
 names.  The list in the default configuration does not exclude hidden
 directories (names beginning with a dot), which means that it may index
 quite a few things that you do not want. On the other hand, email user
 agents like Thunderbird usually store messages in hidden directories, and
 you probably want this indexed. One possible solution is to have '.*' in
 'skippedNames', and add things like '~/.thunderbird' '~/.evolution' to
 'topdirs'.  Not even the file names are indexed for patterns in this
 list, see the 'noContentSuffixes' variable for an alternative approach
 which indexes the file names. Can be redefined for any
 subtree.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES">
 <term><varname>noContentSuffixes</varname></term>
 <listitem><para>List of name endings (not necessarily dot-separated suffixes) for
 which we don't try MIME type identification, and don't uncompress or
 index content. Only the names will be indexed. This
 complements the now obsoleted recoll_noindex list from the mimemap file,
 which will go away in a future release (the move from mimemap to
 recoll.conf allows editing the list through the GUI). This is different
 from skippedNames because these are name ending matches only (not
 wildcard patterns), and the file name itself gets indexed normally. This
 can be redefined for subdirectories.</para></listitem></varlistentry>
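A minimal Python sketch of the difference between skippedNames and noContentSuffixes. skippedNames holds wildcard patterns tested against the simple name. noContentSuffixes holds plain name endings, for which the name is still indexed. The function name, the case-sensitive matching, and checking skippedNames first are all assumptions.

```python
import fnmatch


def classify(name, skipped_names, no_content_suffixes):
    """Return 'skip', 'name-only' or 'full' for a file or directory name."""
    # skippedNames: wildcard patterns tested against the simple name (never
    # a path). A match means the entry is ignored entirely, name included.
    if any(fnmatch.fnmatchcase(name, pat) for pat in skipped_names):
        return "skip"
    # noContentSuffixes: plain name endings, not wildcards. The name is
    # indexed, but the content is neither identified nor processed.
    if any(name.endswith(suffix) for suffix in no_content_suffixes):
        return "name-only"
    return "full"
```

With '.*' in skippedNames, a hidden directory such as .thunderbird is skipped entirely, which is why the text above suggests adding it back explicitly to topdirs.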
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHS">
 <term><varname>skippedPaths</varname></term>
 <listitem><para>Paths we should not go into. Space-separated list of
 wildcard expressions for filesystem paths. Can contain files and
 directories. The database and configuration directories will
 automatically be added. The expressions are matched using 'fnmatch(3)'
 with the FNM_PATHNAME flag set by default. This means that '/' characters
 must be matched explicitly. You can set 'skippedPathsFnmPathname' to 0
 to disable the use of FNM_PATHNAME (meaning that '/*/dir3' will match
 '/dir1/dir2/dir3').  The default value contains the usual mount point for
 removable media to remind you that it is a bad idea to have Recoll work
 on these (especially with the monitor: media gets indexed on mount, all data
 gets erased on unmount). Explicitly adding '/media/xxx' to the topdirs
 will override this.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHSFNMPATHNAME">
 <term><varname>skippedPathsFnmPathname</varname></term>
 <listitem><para>Set to 0 to
 override use of FNM_PATHNAME for matching skipped
 paths. </para></listitem></varlistentry>
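Python's fnmatch module has no FNM_PATHNAME flag. The sketch below approximates that flag by matching the pattern component by component, so a wildcard can never span a '/'. This is an illustration of the skippedPaths and skippedPathsFnmPathname semantics, not Recoll's code. The function name is hypothetical, and bracket expressions containing '/' are not handled.

```python
import fnmatch


def match_path(pattern, path, fnm_pathname=True):
    """Approximate C fnmatch(3) on a path, with or without FNM_PATHNAME."""
    if not fnm_pathname:
        # Without FNM_PATHNAME, '*' may also match '/' characters.
        return fnmatch.fnmatchcase(path, pattern)
    # With FNM_PATHNAME, wildcards never match '/': both strings must have
    # the same number of components, and each pair must match.
    pat_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pat_parts) != len(path_parts):
        return False
    return all(fnmatch.fnmatchcase(name, pat)
               for pat, name in zip(pat_parts, path_parts))
```

This reproduces the example in the text: '/*/dir3' does not match '/dir1/dir2/dir3' by default, but does when skippedPathsFnmPathname is 0.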
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DAEMSKIPPEDPATHS">
 <term><varname>daemSkippedPaths</varname></term>
 <listitem><para>skippedPaths equivalent specific to
 real time indexing. This enables having parts of the tree
 which are initially indexed but not monitored. If daemSkippedPaths is
 not set, the daemon uses skippedPaths.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ZIPSKIPPEDNAMES">
 <term><varname>zipSkippedNames</varname></term>
 <listitem><para>Space-separated list of wildcard expressions for names that should
 be ignored inside zip archives. This is used directly by
 the zip handler, and has a function similar to skippedNames, but works
 independently. Can be redefined for subdirectories. Supported by Recoll
 1.20 and newer. See
 https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members
 </para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FOLLOWLINKS">
 <term><varname>followLinks</varname></term>
 <listitem><para>Follow symbolic links during
 indexing. The default is to ignore symbolic links to avoid
 multiple indexing of linked files. No effort is made to avoid duplication
 when this option is set to true. This option can be set individually for
 each of the 'topdirs' members by using sections. It can not be changed
 below the 'topdirs' level. Links in the 'topdirs' list itself are always
 followed.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXEDMIMETYPES">
 <term><varname>indexedmimetypes</varname></term>
 <listitem><para>Restrictive list of
 indexed MIME types. Normally not set (in which case all
 supported types are indexed). If it is set,
 only the types from the list will have their contents indexed. The names
 will be indexed anyway if indexallfilenames is set (default). MIME
 type names should be taken from the mimemap file. Can be redefined for
 subtrees.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.EXCLUDEDMIMETYPES">
 <term><varname>excludedmimetypes</varname></term>
 <listitem><para>List of excluded MIME
 types. Lets you exclude some types from indexing. Can be
 redefined for subtrees.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.COMPRESSEDFILEMAXKBS">
 <term><varname>compressedfilemaxkbs</varname></term>
 <listitem><para>Size limit for compressed
 files. We need to decompress these in a
 temporary directory for identification, which can be wasteful in some
 cases. Limit the waste. Negative means no limit. 0 results in no
 processing of any compressed file. Default 50 MB.</para></listitem></varlistentry>
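A small Python sketch of the special values of compressedfilemaxkbs: negative means no limit, and 0 means no compressed file is processed. The function name is hypothetical. Taking 1 KB as 1024 bytes, and accepting a file whose size equals the limit, are assumptions.

```python
def should_decompress(size_bytes, maxkbs):
    """Decide whether a compressed file may be decompressed for identification."""
    if maxkbs == 0:
        return False  # 0: no processing of any compressed file
    if maxkbs > 0 and size_bytes > maxkbs * 1024:
        return False  # positive value: size limit in KB was exceeded
    return True       # negative value: no limit
```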
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TEXTFILEMAXMBS">
 <term><varname>textfilemaxmbs</varname></term>
 <listitem><para>Size limit for text
 files. Mostly for skipping monster
 logs. Default 20 MB.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXALLFILENAMES">
 <term><varname>indexallfilenames</varname></term>
 <listitem><para>Index the file names of
 unprocessed files. Index the names of files the contents of
 which we don't index because of an excluded or unsupported MIME
 type.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.USESYSTEMFILECOMMAND">
 <term><varname>usesystemfilecommand</varname></term>
 <listitem><para>Use a system command
 for file MIME type guessing as a final step in file type
 identification. This is generally useful, but will usually
 cause the indexing of many bogus 'text' files. See 'systemfilecommand'
 for the command used.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SYSTEMFILECOMMAND">
 <term><varname>systemfilecommand</varname></term>
 <listitem><para>Command used to guess
 MIME types if the internal methods fail. This should be a
 "file -i" workalike.  The file path will be added as a last parameter to
 the command line. 'xdg-mime' works better than the traditional 'file'
 command, and is now the configured default (with a hard-coded fallback to
 'file').</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PROCESSWEBQUEUE">
 <term><varname>processwebqueue</varname></term>
 <listitem><para>Decide if we process the
 Web queue. The queue is a directory where the Recoll Web
 browser plugins create the copies of visited pages.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TEXTFILEPAGEKBS">
 <term><varname>textfilepagekbs</varname></term>
 <listitem><para>Page size for text
 files. If this is set, text/plain files will be divided
 into documents of approximately this size. Will reduce memory usage at
 index time and help with loading data in the preview window at query
 time. Particularly useful with very big files, such as application or
 system logs. Also see textfilemaxmbs and
 compressedfilemaxkbs.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MEMBERMAXKBS">
 <term><varname>membermaxkbs</varname></term>
 <listitem><para>Size limit for archive
 members. This is passed to the filters in the environment
 as RECOLL_FILTER_MAXMEMBERKB.</para></listitem></varlistentry>
 </sect3>
 <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.TERMS">
 <title>Parameters affecting how we generate terms </title>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTRIPCHARS">
 <term><varname>indexStripChars</varname></term>
 <listitem><para>Decide if we store
 character case and diacritics in the index. If we do,
 searches sensitive to case and diacritics can be performed, but the index
 will be bigger, and some marginal weirdness may sometimes occur. The
 default is a stripped index. When using multiple indexes for a search,
 this parameter must be defined identically for all. Changing the value
 implies an index reset.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NONUMBERS">
 <term><varname>nonumbers</varname></term>
 <listitem><para>Decides if terms will be
 generated for numbers. For example "123", "1.5e6",
 192.168.1.4, would not be indexed if nonumbers is set ("value123" would
 still be). Numbers are often quite interesting to search for, and this
 should probably not be set except for special situations, e.g. scientific
 documents with huge amounts of numbers in them, where setting nonumbers
 will reduce the index size. This can only be set for a whole index, not
 for a subtree.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DEHYPHENATE">
 <term><varname>dehyphenate</varname></term>
 <listitem><para>Determines if we index
 'coworker' also when the input is 'co-worker'. This is new
 in version 1.22, and on by default. Setting the variable to off allows
 restoring the previous behaviour.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK">
 <term><varname>nocjk</varname></term>
 <listitem><para>Decides if specific East Asian
 (Chinese Korean Japanese) characters/word splitting is turned
 off. This will save a small amount of CPU if you have no CJK
 documents. If your document base does include such text but you are not
 interested in searching it, setting nocjk may be a
 significant time and space saver.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.CJKNGRAMLEN">
 <term><varname>cjkngramlen</varname></term>
 <listitem><para>This lets you adjust the size of
 n-grams used for indexing CJK text. The default value of 2 is
 probably appropriate in most cases. A value of 3 would allow more precision
 and efficiency on longer words, but the index will be approximately twice
 as large.</para></listitem></varlistentry>
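To show what the n-gram length changes, here is a generic Python sketch of overlapping character n-grams over a run of CJK characters. It is an illustration of the technique only, not Recoll's text splitter. The function name is hypothetical, and returning a run shorter than n as a single term is an assumption.

```python
def char_ngrams(run, n=2):
    """Overlapping character n-grams for one run of CJK characters."""
    if n >= len(run):
        # Assumption: a run no longer than n yields a single term.
        return [run] if run else []
    return [run[i:i + n] for i in range(len(run) - n + 1)]
```

With n=2, a 4-character run yields three bigrams. With n=3 it yields two trigrams, each of which identifies a longer character sequence.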
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTEMMINGLANGUAGES">
 <term><varname>indexstemminglanguages</varname></term>
 <listitem><para>Languages for which to create stemming expansion
 data. Stemmer names can be found by executing 'recollindex
 -l', or this can also be set from a list in the GUI.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DEFAULTCHARSET">
 <term><varname>defaultcharset</varname></term>
 <listitem><para>Default character
 set. This is used for files which do not contain a
 character set definition (e.g.: text/plain). Values found inside files,
 e.g. a 'charset' tag in HTML documents, will override it. If this is not
 set, the default character set is the one defined by the NLS environment
 ($LC_ALL, $LC_CTYPE, $LANG), or ultimately iso-8859-1 (cp-1252 in fact).
 If for some reason you want a general default which does not match your
 LANG and is not 8859-1, use this variable. This can be redefined for any
 sub-directory.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.UNAC_EXCEPT_TRANS">
 <term><varname>unac_except_trans</varname></term>
 <listitem><para>A list of characters,
 encoded in UTF-8, which should be handled specially
 when converting text to unaccented lowercase. For
 example, in Swedish, the letter a with diaeresis has full alphabet
 citizenship and should not be turned into an a.
 Each element in the space-separated list has the special character as
 first element and the translation following. The handling of both the
 lowercase and upper-case versions of a character should be specified, as
 membership in the list will turn off both standard accent and case
 processing. The value is global and affects both indexing and querying.
 Examples:
 Swedish:
 unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå
 German:
 unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
 French (you probably want to decompose oe and ae, and nobody would type
 a German ß):
 unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
 The default for all languages, until someone protests, follows. These decompositions
 are not performed by unac, but it is unlikely that someone would type the
 composed forms in a search.
 unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl</para></listitem></varlistentry>
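A Python sketch of how an unac_except_trans list can be read and applied. Each space-separated element is a special character followed by its translation. Characters in the list bypass both accent stripping and case folding, and everything else is unaccented and lowercased. The function names are hypothetical, and NFD decomposition stands in for the unac library, which is an assumption.

```python
import unicodedata


def parse_except_trans(value):
    """Map each special character (first char of an element) to its translation."""
    table = {}
    for element in value.split():
        table[element[0]] = element[1:]
    return table


def unac_fold(text, table):
    """Unaccent and lowercase text, except for the characters in table."""
    out = []
    for c in text:
        if c in table:
            # Listed character: no standard accent or case processing.
            out.append(table[c])
            continue
        decomposed = unicodedata.normalize("NFD", c)
        base = "".join(d for d in decomposed if not unicodedata.combining(d))
        out.append(base.lower())
    return "".join(out)
```

With the German list, "Ärger" folds to "ärger" rather than "arger". This is also why both the upper-case and lower-case forms must be listed.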
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAILDEFCHARSET">
 <term><varname>maildefcharset</varname></term>
 <listitem><para>Overrides the default
 character set for email messages which don't specify
 one. This is mainly useful for readpst (libpst) dumps,
 which are utf-8 but do not say so.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.LOCALFIELDS">
 <term><varname>localfields</varname></term>
 <listitem><para>Set fields on all files
 (usually of a specific fs area). Syntax is the usual:
 name = value ; attr1 = val1 ; [...]
 The main value is empty, so the definition needs an initial semicolon. This is useful, e.g.,
 for setting the rclaptg field for application selection inside
 mimeview.</para></listitem></varlistentry>
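A minimal Python sketch of the localfields syntax: an empty main value, then semicolon-separated name = value pairs. The function name is hypothetical. Recoll uses its own parser, and this sketch does not handle quoting or semicolons inside values.

```python
def parse_localfields(value):
    """Parse '; name1 = val1 ; name2 = val2' into a dict of fields."""
    parts = [p.strip() for p in value.split(";")]
    fields = {}
    for part in parts[1:]:  # parts[0] is the (empty) main value
        if not part:
            continue
        name, _, val = part.partition("=")
        fields[name.strip()] = val.strip()
    return fields
```

Without the leading semicolon, the first pair would be taken as the main value and silently dropped, which is why the initial semicolon matters.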
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TESTMODIFUSEMTIME">
 <term><varname>testmodifusemtime</varname></term>
 <listitem><para>Use mtime instead of
 ctime to test if a file has been modified. The time is used
 in addition to the size, which is always used.
 Setting this can reduce re-indexing on systems where extended attributes
 are used (by some other application), but not indexed, because changing
 extended attributes only affects ctime.
 Notes:
 - This may prevent detection of change in some marginal file rename cases
 (the target would need to have the same size and mtime).
 - You should probably also set noxattrfields to 1 in this case, except if
 you still prefer to perform xattr indexing, for example if the local
 file update pattern makes it of value (as in general, there is a risk
 for pure extended attributes updates without file modification to go
 undetected). Perform a full index reset after changing this.
 </para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOXATTRFIELDS">
 <term><varname>noxattrfields</varname></term>
 <listitem><para>Disable extended attributes
 conversion to metadata fields. This probably needs to be
 set if testmodifusemtime is set.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.METADATACMDS">
 <term><varname>metadatacmds</varname></term>
 <listitem><para>Define commands to
 gather external metadata, e.g. tmsu tags. 
 There can be several entries, separated by semi-colons, each defining
 which field name the data goes into and the command to use. Don't forget the
 initial semi-colon. All the field names must be different. You can use
 aliases in the "fields" file if necessary.
 As a not too pretty hack conceded to convenience, any field name
 beginning with "rclmulti" will be taken as an indication that the command
 returns multiple field values inside a text blob formatted as a recoll
 configuration file ("fieldname = fieldvalue" lines). The rclmultixx name
 will be ignored, and field names and values will be parsed from the data.
 Example: metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f
 </para></listitem></varlistentry>
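This Python sketch covers the metadatacmds example above. It parses the value into (field, command) pairs, substitutes the document path for %f, and reads the "fieldname = fieldvalue" output expected from rclmulti* commands. All function names are hypothetical. Splitting commands with shlex and substituting %f only when it is a whole argument are assumptions.

```python
import shlex


def parse_metadatacmds(value):
    """Return (field, argv) pairs from '; field = command ; ...'."""
    cmds = []
    for part in value.split(";")[1:]:  # skip the empty main value
        part = part.strip()
        if not part:
            continue
        field, _, command = part.partition("=")
        cmds.append((field.strip(), shlex.split(command)))
    return cmds


def expand_command(argv, path):
    """Substitute the document path for a %f argument."""
    return [path if arg == "%f" else arg for arg in argv]


def is_multi_field(field):
    """rclmulti* fields return several 'name = value' lines, not one value."""
    return field.startswith("rclmulti")


def parse_multi_output(text):
    """Parse 'fieldname = fieldvalue' lines produced by an rclmulti command."""
    fields = {}
    for line in text.splitlines():
        name, sep, val = line.partition("=")
        if sep:
            fields[name.strip()] = val.strip()
    return fields
```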
 </sect3>
 <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.STORE">
 <title>Parameters affecting where and how we store things </title>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.CACHEDIR">
 <term><varname>cachedir</varname></term>
 <listitem><para>Top directory for Recoll data. Recoll data
 directories are normally located relative to the configuration directory
 (e.g. ~/.recoll/xapiandb, ~/.recoll/mboxcache). If 'cachedir' is set, the
 directories are stored under the specified value instead (e.g. if
 cachedir is ~/.cache/recoll, the default dbdir would be
 ~/.cache/recoll/xapiandb).  This affects dbdir, webcachedir,
 mboxcachedir, aspellDicDir, which can still be individually specified to
 override cachedir.  Note that if you have multiple configurations, each
 must have a different cachedir, there is no automatic computation of a
 subpath under cachedir.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXFSOCCUPPC">
 <term><varname>maxfsoccuppc</varname></term>
 <listitem><para>Maximum file system occupation
 over which we stop indexing. The value is a percentage,
 corresponding to what the "Capacity" df output column shows. The default
 value is 0, meaning no checking.</para></listitem></varlistentry>
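A Python sketch of the maxfsoccuppc check. The df "Capacity" percentage is used / (used + available), rounded up. shutil.disk_usage() supplies used and free (available) bytes. The function names are hypothetical, and the strict "over" comparison follows the wording above rather than verified Recoll behaviour.

```python
import math
import shutil


def occupation_percent(used, avail):
    """Percentage as shown in df's Capacity column (rounded up)."""
    if used + avail == 0:
        return 0
    return math.ceil(100 * used / (used + avail))


def over_limit(path, maxfsoccuppc):
    """True if indexing should stop because the file system is too full."""
    if maxfsoccuppc == 0:
        return False  # 0 means no checking
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return occupation_percent(usage.used, usage.free) > maxfsoccuppc
```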
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB">
 <term><varname>xapiandb</varname></term>
 <listitem><para>Xapian database directory
 location. This will be created on first indexing. If the
 value is not an absolute path, it will be interpreted as relative to
 cachedir if set, or the configuration directory (-c argument or
 $RECOLL_CONFDIR).  If nothing is specified, the default is then
 ~/.recoll/xapiandb/</para></listitem></varlistentry>
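A Python sketch of how the database directory location could be resolved from the rules above. An absolute value is used as is. A relative value is taken relative to cachedir if set, else relative to the configuration directory. The function name is hypothetical, and so are a default relative value of "xapiandb" and tilde expansion.

```python
import os


def resolve_dbdir(value, confdir, cachedir=""):
    """Resolve the Xapian database directory location."""
    value = os.path.expanduser(value or "xapiandb")  # assumed default
    if os.path.isabs(value):
        return value
    base = cachedir if cachedir else confdir
    return os.path.join(os.path.expanduser(base), value)
```

With the configuration directory at ~/.recoll and no cachedir, this yields the ~/.recoll/xapiandb default.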
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXSTATUSFILE">
 <term><varname>idxstatusfile</varname></term>
 <listitem><para>Name of the scratch file where the indexer process updates its
 status. Default: idxstatus.txt inside the configuration
 directory.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MBOXCACHEDIR">
 <term><varname>mboxcachedir</varname></term>
 <listitem><para>Directory location for storing mbox message offsets cache
 files. This is normally 'mboxcache' under cachedir if set,
 or else under the configuration directory, but it may be useful to share
 a directory between different configurations.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MBOXCACHEMINMBS">
 <term><varname>mboxcacheminmbs</varname></term>
 <listitem><para>Minimum mbox file size over which we cache the offsets. There is really no sense in caching offsets for small files. The
 default is 5 MB.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBCACHEDIR">
 <term><varname>webcachedir</varname></term>
 <listitem><para>Directory where we store the archived web pages. This is only used by the web history indexing code.
 Default: cachedir/webcache if cachedir is set, else
 $RECOLL_CONFDIR/webcache</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBCACHEMAXMBS">
 <term><varname>webcachemaxmbs</varname></term>
 <listitem><para>Maximum size in MB of the Web archive. This is only used by the web history indexing code.
 Default: 40 MB.
 Reducing the size will not physically truncate the file.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBQUEUEDIR">
 <term><varname>webqueuedir</varname></term>
 <listitem><para>The path to the Web indexing queue. This is
 hard-coded in the plugin as ~/.recollweb/ToIndex so there should be no
 need or possibility to change it.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLDICDIR">
 <term><varname>aspellDicDir</varname></term>
 <listitem><para>Aspell dictionary storage directory location. The
 aspell dictionary (aspdict.(lang).rws) is normally stored in the
 directory specified by cachedir if set, or under the configuration
 directory.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FILTERSDIR">
 <term><varname>filtersdir</varname></term>
 <listitem><para>Directory location for executable input handlers. If
 RECOLL_FILTERSDIR is set in the environment, we use it instead. Defaults
 to $prefix/share/recoll/filters. Can be redefined for
 subdirectories.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ICONSDIR">
 <term><varname>iconsdir</varname></term>
 <listitem><para>Directory location for icons. The only reason to
 change this would be if you want to change the icons displayed in the
 result list. Defaults to $prefix/share/recoll/images</para></listitem></varlistentry>
 </sect3>
 <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.PERFS">
 <title>Parameters affecting indexing performance and resource usage </title>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXFLUSHMB">
 <term><varname>idxflushmb</varname></term>
 <listitem><para>Threshold (megabytes of new data) where we flush from memory to
 disk index. Setting this allows some control over memory
 usage by the indexer process. A value of 0 means no explicit flushing,
 which lets Xapian perform its own thing, meaning flushing every
 $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted: as memory
 usage depends on average document size, not only document count, the
 Xapian approach is not very useful, and you should let Recoll manage
 the flushes.  The default value of idxflushmb is 10 MB, and may be a bit
 low. If you are looking for maximum speed, you may want to experiment
 with values between 20 and
 80. In my experience, values beyond 100 are always counterproductive. If
 you find otherwise, please drop me a note.</para></listitem></varlistentry>
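A Python sketch of size-based flushing as described for idxflushmb. New data is accumulated, and a flush is signalled once the threshold is crossed. A value of 0 leaves flushing to Xapian. The class name is hypothetical, and 1 MB = 1024 * 1024 bytes is an assumption. In a real indexer, a True result would be the point to call something like Xapian's WritableDatabase::commit().

```python
class FlushTracker:
    """Signal a flush after idxflushmb megabytes of new document data."""

    def __init__(self, idxflushmb=10):
        # Assumption: 1 MB = 1024 * 1024 bytes.
        self.threshold = idxflushmb * 1024 * 1024
        self.pending = 0

    def add_document(self, data_bytes):
        """Account for a new document; return True if a flush is due."""
        if self.threshold == 0:
            return False  # 0: no explicit flushing, Xapian decides
        self.pending += data_bytes
        if self.pending >= self.threshold:
            self.pending = 0  # counter restarts after each flush
            return True
        return False
```

Counting bytes rather than documents follows the reasoning in the text: memory use depends on document size, not just document count.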
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FILTERMAXSECONDS">
 <term><varname>filtermaxseconds</varname></term>
 <listitem><para>Maximum external filter execution time in
 seconds. Default 1200 (20 minutes). Set to 0 for no limit. This
 is mainly to avoid infinite loops in postscript files
 (loop.ps)</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FILTERMAXMBYTES">
 <term><varname>filtermaxmbytes</varname></term>
 <listitem><para>Maximum virtual memory space for filter processes
 (setrlimit(RLIMIT_AS)), in megabytes. Note that this
 includes any mapped libs (there is no reliable Linux way to limit the
 data space only), so we need to be a bit generous here. Anything over
 2000 will be ignored on 32-bit machines.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.THRQSIZES">
 <term><varname>thrQSizes</varname></term>
 <listitem><para>Stage input queues configuration. There are three
 internal queues in the indexing pipeline stages (file data extraction,
 terms generation, index update). This parameter defines the queue depths
 for each stage (three integer values). If a value of -1 is used for a
 given stage, no queue is used, and the thread will go on performing the
 next stage. In practise, deep queues have not been shown to increase
 performance. Default: a value of 0 for the first queue tells Recoll to
 perform autoconfiguration based on the detected number of CPUs (no need
 for the two other values in this case).  Use thrQSizes = -1 -1 -1 to
 disable multithreading entirely.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.THRTCOUNTS">
 <term><varname>thrTCounts</varname></term>
 <listitem><para>Number of threads used for each indexing stage. The
 three stages are: file data extraction, terms generation, index
 update. The use of the counts is also controlled by some special values
 in thrQSizes: if the first queue depth is 0, all counts are ignored
 (autoconfigured); if a value of -1 is used for a queue depth, the
 corresponding thread count is ignored. It makes no sense to use a value
 other than 1 for the last stage because updating the Xapian index is
 necessarily single-threaded (and protected by a mutex).</para></listitem></varlistentry>
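 <para>A manual configuration might look like the following sketch. The
 values are only illustrative and would need tuning for a given machine: the
 first queue depth is not 0, so the thread counts are used, and the last
 stage keeps a single thread since the index update is single-threaded
 anyway.</para>
 <programlisting>
 thrQSizes = 2 2 2
 thrTCounts = 4 2 1
 </programlisting>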
 </sect3>
 <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.MISC">
 <title>Miscellaneous parameters </title>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.LOGLEVEL">
 <term><varname>loglevel</varname></term>
 <listitem><para>Log file verbosity 1-6. A value of 2 will print
 only errors and warnings, 3 will also print information such as document updates,
 4 is quite verbose and 6 very verbose.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.LOGFILENAME">
 <term><varname>logfilename</varname></term>
 <listitem><para>Log file destination. Use 'stderr' (default) to write to the
 console. </para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXLOGLEVEL">
 <term><varname>idxloglevel</varname></term>
 <listitem><para>Override loglevel for the indexer. </para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXLOGFILENAME">
 <term><varname>idxlogfilename</varname></term>
 <listitem><para>Override logfilename for the indexer. </para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DAEMLOGLEVEL">
 <term><varname>daemloglevel</varname></term>
 <listitem><para>Override loglevel for the indexer in real time
 mode. The default is to use the idx... values if set, else
 the log... values.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DAEMLOGFILENAME">
 <term><varname>daemlogfilename</varname></term>
 <listitem><para>Override logfilename for the indexer in real time
 mode. The default is to use the idx... values if set, else
 the log... values.</para></listitem></varlistentry>
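 <para>As an illustration (the log file path is only an example), the
 following keeps general logging to errors and warnings on the console,
 while the indexer writes more verbose messages to a file. Since no daem...
 values are set, the real time indexer would use the idx... values.</para>
 <programlisting>
 loglevel = 2
 logfilename = stderr
 idxloglevel = 4
 idxlogfilename = /tmp/recollindex.log
 </programlisting>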
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXRUNDIR">
 <term><varname>idxrundir</varname></term>
 <listitem><para>Indexing process current directory. The input
 handlers sometimes leave temporary files in the current directory, so it
 makes sense to have recollindex chdir to some temporary directory. If the
 value is empty, the current directory is not changed. If the
 value is (literal) tmp, we use the temporary directory as set by the
 environment (RECOLL_TMPDIR else TMPDIR else /tmp). If the value is an
 absolute path to a directory, we go there.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.CHECKNEEDRETRYINDEXSCRIPT">
 <term><varname>checkneedretryindexscript</varname></term>
 <listitem><para>Script used to heuristically check if we need to retry indexing
 files which previously failed. The default script checks
 the modification dates of /usr/bin and /usr/local/bin. A relative path will
 be looked up in the filters directories, then in the PATH. Use an absolute path
 to do otherwise.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.RECOLLHELPERPATH">
 <term><varname>recollhelperpath</varname></term>
 <listitem><para>Additional places to search for helper executables. This is only used on Windows for now.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXABSMLEN">
 <term><varname>idxabsmlen</varname></term>
 <listitem><para>Length of abstracts we store while indexing. Recoll stores an abstract for each indexed file.
 The text can come from an actual 'abstract' section in the
 document or will just be the beginning of the document. It is stored in
 the index so that it can be displayed inside the result lists without
 decoding the original file. The idxabsmlen parameter
 defines the size of the stored abstract. The default value is 250
 bytes. The search interface gives you the choice to display this stored
 text or a synthetic abstract built by extracting text around the search
 terms. If you always prefer the synthetic abstract, you can reduce this
 value and save a little space.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXMETASTOREDLEN">
 <term><varname>idxmetastoredlen</varname></term>
 <listitem><para>Truncation length of stored metadata fields. This
 does not affect indexing (the whole field is processed anyway), just the
 amount of data stored in the index for the purpose of displaying fields
 inside result lists or previews. The default value is 150 bytes which
 may be too low if you have custom fields.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLLANGUAGE">
 <term><varname>aspellLanguage</varname></term>
 <listitem><para>Language definitions to use when creating the aspell
 dictionary. The value must match a set of aspell language
 definition files. You can type "aspell dicts" to see a list. If this is
 not set, the NLS environment is used to guess the
 value.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLADDCREATEPARAM">
 <term><varname>aspellAddCreateParam</varname></term>
 <listitem><para>Additional option and parameter for the aspell dictionary creation
 command. Some aspell packages may need an additional option
 (e.g. on Debian Jessie: --local-data-dir=/usr/lib/aspell). See Debian bug
 772415.</para></listitem></varlistentry>
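 <para>For example, on a Debian Jessie system with mostly English documents,
 the two aspell parameters might be set as below. The language code is an
 assumption: check the output of "aspell dicts" for the values available on
 your system.</para>
 <programlisting>
 aspellLanguage = en
 aspellAddCreateParam = --local-data-dir=/usr/lib/aspell
 </programlisting>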
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLKEEPSTDERR">
 <term><varname>aspellKeepStderr</varname></term>
 <listitem><para>Set this to have a look at aspell dictionary creation
 errors. There are always many, so this is mostly for
 debugging.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOASPELL">
 <term><varname>noaspell</varname></term>
 <listitem><para>Disable aspell use. The aspell dictionary generation
 takes time, and some combinations of aspell version, language, and local
 terms result in aspell crashing, so it sometimes makes sense to just
 disable it.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONAUXINTERVAL">
 <term><varname>monauxinterval</varname></term>
 <listitem><para>Auxiliary database update interval. The real time
 indexer only updates the auxiliary databases (stemdb, aspell)
 periodically, because it would be too costly to do it for every document
 change. The default period is one hour.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONIXINTERVAL">
 <term><varname>monixinterval</varname></term>
 <listitem><para>Minimum interval (seconds) between processing runs of the indexing
 queue. The real time indexer does not process each event
 when it comes in, but lets the queue accumulate, to diminish overhead and
 to aggregate multiple events affecting the same file. Default 30
 seconds.</para></listitem></varlistentry>
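 <para>For example, to let events accumulate for one minute instead of the
 default 30 seconds (an illustrative value, trading some indexing latency
 for less overhead):</para>
 <programlisting>
 monixinterval = 60
 </programlisting>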
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONDELAYPATTERNS">
 <term><varname>mondelaypatterns</varname></term>
 <listitem><para>Timing parameters for the real time indexing. Definitions for files which get a longer delay before reindexing
 is allowed. This is for fast-changing files, that should only be
 reindexed once in a while. A list of wildcardPattern:seconds pairs. The
 patterns are matched with fnmatch(pattern, path, 0). You can quote entries
 containing white space with double quotes (quote the whole entry, not the
 pattern). The default is empty.
 Example: mondelaypatterns = *.log:20 "*with spaces.*:30"</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONIONICECLASS">
 <term><varname>monioniceclass</varname></term>
 <listitem><para>ionice class for the real time indexing process, on platforms where this is supported. The default value is
 3.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONIONICECLASSDATA">
 <term><varname>monioniceclassdata</varname></term>
 <listitem><para>ionice class parameter for the real time indexing process, on platforms where this is supported. The default is
 empty.</para></listitem></varlistentry>
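 <para>As a sketch, assuming the two values are passed on to ionice as its
 class and class data: the following would run the real time indexer in the
 best-effort class (2) at its lowest priority level (7).</para>
 <programlisting>
 monioniceclass = 2
 monioniceclassdata = 7
 </programlisting>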
 </sect3>
 <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.QUERY">
 <title>Query-time parameters (no impact on the index) </title>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.AUTODIACSENS">
 <term><varname>autodiacsens</varname></term>
 <listitem><para>Auto-trigger diacritics sensitivity (raw index only). If the index is not stripped (see indexStripChars), decide if we automatically trigger
 diacritics sensitivity if the search term has accented characters (not in
 unac_except_trans). Else you need to use the query language and the "D"
 modifier to specify diacritics sensitivity. Default is no.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.AUTOCASESENS">
 <term><varname>autocasesens</varname></term>
 <listitem><para>Auto-trigger case sensitivity (raw index only). If
 the index is not stripped (see indexStripChars), decide if we
 automatically trigger character case sensitivity if the search term has
 upper-case characters in any but the first position. Else you need to use
 the query language and the "C" modifier to specify character-case
 sensitivity. Default is yes.</para></listitem></varlistentry>
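 <para>For example, the following reverses both defaults. Automatic
 sensitivity requires an unstripped index, and changing indexStripChars
 implies a full reindex.</para>
 <programlisting>
 # Keep case and diacritics in the index (requires a full reindex)
 indexStripChars = 0
 autodiacsens = 1
 autocasesens = 0
 </programlisting>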
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMEXPAND">
 <term><varname>maxTermExpand</varname></term>
 <listitem><para>Maximum query expansion count
 for a single term (e.g. when using wildcards). This only
 affects queries, not indexing. We used to not limit this at all (except
 for filenames where the limit was too low at 1000), but it is
 unreasonable with a big index. Default 10000.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXXAPIANCLAUSES">
 <term><varname>maxXapianClauses</varname></term>
 <listitem><para>Maximum number of clauses
 we add to a single Xapian query. This only affects queries,
 not indexing. In some cases, the result of term expansion can be
 multiplicative, and we want to avoid eating all the memory. Default
 50000.</para></listitem></varlistentry>
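 <para>With a very large index, both limits can be raised together. The
 values below are only illustrative; higher limits let a single query use
 more memory and time.</para>
 <programlisting>
 maxTermExpand = 20000
 maxXapianClauses = 100000
 </programlisting>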
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SNIPPETMAXPOSWALK">
 <term><varname>snippetMaxPosWalk</varname></term>
 <listitem><para>Maximum number of positions we walk while populating a snippet for
 the result list. The default of 1,000,000 may be
 insufficient for very big documents; the consequence would be snippets
 with missing words, possibly altering the meaning.</para></listitem></varlistentry>
 </sect3>
 <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.PDF">
 <title>Parameters for the PDF input script </title>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCR">
 <term><varname>pdfocr</varname></term>
 <listitem><para>Attempt OCR of PDF files with no text content if both tesseract and
 pdftoppm are installed. The default is off because OCR is
 very slow.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH">
 <term><varname>pdfattach</varname></term>
 <listitem><para>Enable PDF attachment extraction by executing pdftk (if
 available). This is
 normally disabled because it slows down PDF indexing a bit, even if
 no attachment is ever found.</para></listitem></varlistentry>
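 <para>Both options are off by default. Enabling them could look like this,
 assuming tesseract and pdftoppm (for OCR) and pdftk (for attachments) are
 installed:</para>
 <programlisting>
 pdfocr = 1
 pdfattach = 1
 </programlisting>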
 </sect3>
 <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.SPECLOCATIONS">
 <title>Parameters set for specific locations </title>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MHMBOXQUIRKS">
 <term><varname>mhmboxquirks</varname></term>
 <listitem><para>Enable Thunderbird/Mozilla-SeaMonkey mbox format quirks. Set this for the directory where the email mbox files are
 stored.</para></listitem></varlistentry>
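 <para>Because this parameter applies to a specific location, it would
 normally be set in a subtree section. A sketch, assuming the mail folders
 live under ~/.thunderbird and that the mbox handler checks for the value
 tbird:</para>
 <programlisting>
 [~/.thunderbird]
 mhmboxquirks = tbird
 </programlisting>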
 </sect3>
 </sect2>
--- a/src/doc/user/usermanual.html
+++ b/src/doc/user/usermanual.html
--- a/src/doc/user/usermanual.xml
+++ b/src/doc/user/usermanual.xml
@ -5651,880 +5651,10 @@ thesame = "some string with spaces"
 	  </sect2>
-      <sect2 id="RCL.INSTALL.CONFIG.RECOLLCONF">
+      <!-- <sect2 id="RCL.INSTALL.CONFIG.RECOLLCONF"> -->
-        <title>The main configuration file, recoll.conf</title>
+      <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
        href="recoll.conf.xml" /> 
        <para><filename>recoll.conf</filename> is the main
         configuration file. It defines things like
         what to index (top directories and things to ignore), and the
         default character set to use for document types which do not
         specify it internally.</para>
        <para>The default configuration will index your home
         directory. If this is not appropriate, start
         <command>recoll</command> to create a blank 
         configuration, click <guimenu>Cancel</guimenu>, and edit
         the configuration file before restarting the command. This
         will start the initial indexing, which may take some time.</para>
        <para>Most of the following parameters can be changed from the
        <guilabel>Index Configuration</guilabel> menu in the
        <command>recoll</command> interface. Some can only be set by
        editing the configuration file.</para>
        <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.FILES">
          <title>Parameters affecting what documents we index:</title>
        <variablelist>
          <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TOPDIRS">
            <term><varname>topdirs</varname></term>
            <listitem><para>Specifies the list of directories or files to
            index (recursively for directories). You can use symbolic links
            as elements of this list. See the
            <varname>followLinks</varname> option about following symbolic links
            found under the top elements (not followed by default).</para>
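             <para>For example, to index three directories, one of
             them with a space in its name (the names are only
             illustrative; elements with embedded spaces are quoted
             with double quotes):</para>
             <programlisting>
 topdirs = ~/Documents ~/mail "/data/shared docs"
             </programlisting>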
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>skippedNames</varname></term>
            <listitem>
              <para>A space-separated list of wildcard patterns for
               names of files or directories that should be completely
               ignored. The list defined in the default file is: </para>
 <programlisting>
 skippedNames = #* bin CVS  Cache cache* caughtspam  tmp .thumbnails .svn \
 	       *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
 	       .recoll* xapiandb recollrc recoll.conf 
 </programlisting>
              <para>The list can be redefined at any sub-directory in the
 		indexed area.</para>
              <para>The top-level directories are not affected by this
                list (that is, a directory in <varname>topdirs</varname>
                might match and would still be indexed).</para>
                <para>The list in the default configuration does not
                exclude hidden directories (names beginning with a
                dot), which means that it may index quite a few things
                that you do not want. On the other hand, email user
                agents like <application>thunderbird</application>
                usually store messages in hidden directories, and you
                probably want this indexed. One possible solution is to
                have <filename>.*</filename> in
                <varname>skippedNames</varname>, and add things like
                <filename>~/.thunderbird</filename> or
                <filename>~/.evolution</filename> in
                <varname>topdirs</varname>.</para> 
                <para>Not even the file names are indexed for patterns
                in this list. See the
                <varname>noContentSuffixes</varname> variable for an alternative
                approach which indexes the file names.</para>
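                <para>The approach suggested above could be written as
                follows. This is only a sketch: setting
                <varname>skippedNames</varname> replaces the default
                list, so the default patterns you want to keep must be
                repeated (only some of them are shown here).</para>
                <programlisting>
 skippedNames = .* #* bin CVS Cache cache* caughtspam tmp *~
 topdirs = ~ ~/.thunderbird ~/.evolution
                </programlisting>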
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>noContentSuffixes</varname></term>
          <listitem><para>This is a list of file name endings (not
          wildcard expressions, nor dot-delimited suffixes). Only the
          names of matching files will be indexed (no attempt at MIME
          type identification, no decompression, no content
          indexing). This can be redefined for
          subdirectories, and edited from the GUI. The default value is:
 <programlisting>
 noContentSuffixes = .md5 .map \
       .o .lib .dll .a .sys .exe .com \
       .mpp .mpt .vsd \
 	   .img .img.gz .img.bz2 .img.xz .image .image.gz .image.bz2 .image.xz \
       .dat .bak .rdf .log.gz .log .db .msf .pid \
       ,v ~ #
 </programlisting>
          </para></listitem>
          </varlistentry>
          <varlistentry><term><varname>skippedPaths</varname> and
             <varname>daemSkippedPaths</varname> </term>
            <listitem>
              <para>A space-separated list of patterns for
               <emphasis>paths</emphasis> of files or directories that should be skipped.
               There is no default in the sample configuration file,
               but the code always adds the configuration and database
               directories in there.</para>
              <para><varname>skippedPaths</varname> is used both by
              batch and real time
              indexing. <varname>daemSkippedPaths</varname> can be
              used to specify things that should be indexed at
              startup, but not monitored.</para>
              <para>Example of use for skipping text files only in a
              specific directory:</para>
              <programlisting>
 skippedPaths = ~/somedir/*.txt
              </programlisting>
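             <para>Similarly, a directory whose contents change too
             often to be worth monitoring can still be indexed at
             startup (the path is illustrative):</para>
             <programlisting>
 daemSkippedPaths = ~/Downloads
             </programlisting>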
            </listitem>
          </varlistentry>
          <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHSFNMPATHNAME">
            <term><varname>skippedPathsFnmPathname</varname></term>
                <listitem><para>The values in the
                <varname>*skippedPaths</varname> variables are matched by
                default with <literal>fnmatch(3)</literal>, with the
                FNM_PATHNAME flag. This means that '/'
                characters must be matched explicitly. You can set
                <varname>skippedPathsFnmPathname</varname> to 0 to disable
                the use of FNM_PATHNAME (meaning that /*/dir3 will match
                /dir1/dir2/dir3).</para>
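                <para>For example, with the following settings, the
                pattern skips <filename>/dir1/dir2/dir3</filename> as
                well as <filename>/dir1/dir3</filename>, because
                without FNM_PATHNAME the <literal>*</literal> also
                matches '/' characters:</para>
                <programlisting>
 skippedPathsFnmPathname = 0
 skippedPaths = /*/dir3
                </programlisting>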
            </listitem>
          </varlistentry>
 	  <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ZIPSKIPPEDNAMES">
 	    <term><varname>zipSkippedNames</varname></term>
 	    <listitem><para>A space-separated list of patterns for
               names of files or directories that should be ignored
               inside zip archives. This is used directly by the zip
               handler, and has a function similar to skippedNames, but
               works independently. Can be redefined for filesystem
               subdirectories. For versions up to 1.19, you will need
               to update the Zip handler and install a supplementary
               Python module. The details are
               described <ulink url="https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members">on
 		  the &RCL; wiki</ulink>.
 	    </para></listitem>
 	  </varlistentry>
          <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FOLLOWLINKS">
            <term><varname>followLinks</varname></term>
            <listitem><para>Specifies if the indexer should follow
            symbolic links while walking the file tree. The default is
            to ignore symbolic links to avoid multiple indexing of
            linked files. No effort is made to avoid duplication when
            this option is set to true. This option can be set
            individually for each of the <varname>topdirs</varname>
            members by using sections. It cannot be changed below the
            <varname>topdirs</varname> level.</para>
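            <para>For example, to follow symbolic links only inside one
            of the top directories (the directory names are
            illustrative):</para>
            <programlisting>
 topdirs = ~/Documents ~/linkfarm
 [~/linkfarm]
 followLinks = 1
            </programlisting>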
            </listitem> 
          </varlistentry>
          <varlistentry><term><varname>indexedmimetypes</varname></term>
            <listitem><para>&RCL; normally indexes any file which it
            knows how to read. This list lets you restrict the indexed
            MIME types to what you specify. If the variable is
            unspecified or the list empty (the default), all supported
            types are processed. Can be redefined for subdirectories.</para>
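            <para>For example, to restrict indexing to plain text and
            PDF files:</para>
            <programlisting>
 indexedmimetypes = text/plain application/pdf
            </programlisting>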
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>excludedmimetypes</varname></term>
            <listitem><para> This list lets you exclude some MIME types from
            indexing. Can be redefined for subdirectories.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>compressedfilemaxkbs</varname></term>
            <listitem><para>Size limit for compressed (.gz or .bz2)
            files. These need to be decompressed in a temporary
            directory for identification, which can be very wasteful
            if 'uninteresting' big compressed files are present.
            Negative means no limit, 0 means no processing of any
            compressed file. Defaults to -1.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>textfilemaxmbs</varname></term>
            <listitem><para>Maximum size for text files. Very big text
            files are often uninteresting logs. Set to -1 to disable
            (default 20MB).</para>  
            </listitem>
           </varlistentry>
          <varlistentry><term><varname>textfilepagekbs</varname></term>
            <listitem><para>If set to other than -1, text files will be
            indexed as multiple documents of the given page size. This may
            be useful if you do want to index very big text files as it
            will both reduce memory usage at index time and help with
            loading data to the preview window. A size of a few megabytes
            would seem reasonable (default: 1MB).</para>
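            <para>For example, to accept text files up to 100 MB but
            index them as pages of roughly 2 MB each (illustrative
            values; the page size is in kilobytes, as the variable name
            indicates):</para>
            <programlisting>
 textfilemaxmbs = 100
 textfilepagekbs = 2000
            </programlisting>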
            </listitem>
           </varlistentry>
          <varlistentry><term><varname>membermaxkbs</varname></term>
            <listitem><para>This defines the maximum size in kilobytes for
            an archive member (zip, tar or rar at the moment). Bigger
            entries will be skipped.</para>
              </listitem>
            </varlistentry>
          <varlistentry><term><varname>indexallfilenames</varname></term>
            <listitem><para>&RCL; indexes file names in a special
            section of the database to allow specific file name
            searches using wildcards. This parameter decides if 
            file name indexing is performed only for files with MIME
            types that would qualify them for full text indexing, or
            for all files inside the selected subtrees, independently of
            MIME type.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>usesystemfilecommand</varname></term>
            <listitem><para>Decide if we execute a system command 
            (<command>file</command> <option>-i</option> by default)
            as a final step for determining the MIME type for a file
            (the main procedure uses suffix associations as defined in
            the <filename>mimemap</filename> file). This can be useful
            for files with suffix-less names, but it will also cause
            the indexing of many bogus "text" files.</para>
            </listitem> 
 	  </varlistentry>
          <varlistentry><term><varname>systemfilecommand</varname></term>
            <listitem><para>Command to use for MIME type
            determination if <literal>usesystemfilecommand</literal> is
            set. Recent versions of <command>xdg-mime</command> sometimes
            work better than <command>file</command>.</para>
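            <para>A sketch using <command>xdg-mime</command>, assuming
            that the file path is appended to the command when it is
            run:</para>
            <programlisting>
 usesystemfilecommand = 1
 systemfilecommand = xdg-mime query filetype
            </programlisting>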
            </listitem> 
 	  </varlistentry>
          <varlistentry><term><varname>processwebqueue</varname></term>
            <listitem><para>If this is set, process the directory where
            Web browser plugins copy visited pages for indexing.</para>
            </listitem>
           </varlistentry>
          <varlistentry><term><varname>webqueuedir</varname></term>
            <listitem><para>The path to the web indexing queue. This is
            hard-coded in the Firefox plugin as
            <filename>~/.recollweb/ToIndex</filename> so there should be no
            need to change it.</para> 
            </listitem>
           </varlistentry>
        </variablelist>
       </sect3>
       <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.TERMS">
 	<title>Parameters affecting how we generate terms:</title>
        <para>Changing some of these parameters will imply a full
          reindex. Also, when using multiple indexes, it may not make sense
          to search indexes that don't share the values for these parameters,
          because they usually affect both search and index operations.</para>
        <variablelist>
          <varlistentry><term><varname>indexStripChars</varname></term>
            <listitem><para>Decide if we strip characters of diacritics and
                convert them to lower-case before terms are indexed. If we
                don't, searches sensitive to case and diacritics can be
                performed, but the index will be bigger, and some marginal
                weirdness may sometimes occur. The default is a stripped
                index (<literal>indexStripChars = 1</literal>) for
                now. When using multiple indexes for a search,
                this parameter must be defined identically for
                all. Changing the value implies an index reset.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>maxTermExpand</varname></term>
            <listitem><para>Maximum expansion count for a single term (e.g.:
                when using wildcards). The default of 10000 is reasonable and
                will avoid queries that appear frozen while the engine is
                walking the term list.</para>
            </listitem>
         </varlistentry>
          <varlistentry><term><varname>maxXapianClauses</varname></term>
            <listitem><para>Maximum number of elementary clauses we can add
                to a single Xapian query. In some cases, the result of term
                expansion can be multiplicative, and we want to avoid using
                excessive memory. The default of 100 000 should be both
                high enough in most cases and compatible with current
                typical hardware configurations.</para>
            </listitem>
         </varlistentry>
          <varlistentry><term><varname>nonumbers</varname></term>
            <listitem><para>If this is set to true, no terms will be generated
            for numbers. For example "123", "1.5e6", or "192.168.1.4" would not
            be indexed ("value123" would still be). Numbers are often quite
            interesting to search for, and this should probably not be set
            except for special situations, e.g. scientific documents with huge
            amounts of numbers in them. This can only be set for a whole
            index, not for a subtree.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>dehyphenate</varname></term>
            <listitem><para>Determines if, given an input of
            <literal>co-worker</literal>, we add a term for
            <literal>coworker</literal>. This possibility is new in version
            1.22, and on by default. Setting the variable to off allows
            restoring the previous behaviour.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>nocjk</varname></term>
            <listitem><para>If this is set to true, specific East Asian
            (Chinese, Korean, Japanese) character/word splitting is
            turned off. This will save a small amount of CPU if you
            have no CJK documents. If your document base does include
            such text but you are not interested in searching it,
            setting <varname>nocjk</varname> may be a significant time
            and space saver.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>cjkngramlen</varname></term>
            <listitem><para>This lets you adjust the size of n-grams
            used for indexing CJK text. The default value of 2 is
            probably appropriate in most cases. A value of 3 would
            allow more precision and efficiency on longer words, but
            the index will be approximately twice as large.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>indexstemminglanguages</varname></term>
            <listitem><para>A list of languages for which the stem
            expansion databases will be built. See <citerefentry>
            <refentrytitle>recollindex</refentrytitle>
            <manvolnum>1</manvolnum> </citerefentry> or use the
            <command>recollindex</command> <option>-l</option> command
            for possible values. You can add a stem expansion database
            for a different language by using
            <command>recollindex</command> <option>-s</option>, but it
            will be deleted during the next indexing. Only languages
            listed in the configuration file are permanent.</para>
            </listitem> 
          </varlistentry>
          <varlistentry><term><varname>defaultcharset</varname></term>
            <listitem><para>The name of the character set used for
            files that do not contain a character set definition (e.g.
            plain text files). This can be redefined for any
            sub-directory. If it is not set at all, the character set
            used is the one defined by the NLS environment
 	    (<envar>LC_ALL</envar>, <envar>LC_CTYPE</envar>, 
 	    <envar>LANG</envar>), or <literal>iso8859-1</literal> 
 	    if nothing is set.</para> 
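            <para>Since the value can be redefined per sub-directory, a
            global default can be combined with a different character
            set for one tree (the directory name is illustrative):</para>
            <programlisting>
 defaultcharset = UTF-8
 [~/old-documents]
 defaultcharset = iso8859-1
            </programlisting>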
 	   </listitem>
         </varlistentry>
          <varlistentry><term><varname>unac_except_trans</varname></term>
            <listitem><para>This is a list of characters, encoded in UTF-8,
            which should be handled specially when converting text to
            unaccented lowercase.  For example, in Swedish, the letter
            <literal>a with diaeresis</literal> has full alphabet
            citizenship and should not be turned into an
            <literal>a</literal>. Each element in the space-separated list
            has the special character as first element and the translation
            following. The handling of both the lowercase and upper-case
            versions of a character should be specified, as membership in
            the list will turn off both standard accent and case
            processing. Example for Swedish:</para>
                <programlisting>
 unac_except_trans =  åå Åå ää Ää öö Öö
            </programlisting>
            <para>Note that the translation is not limited to a single
            character, you could very well have something like
            <literal>üue</literal> in the list.</para>
             <para>The default value set for
             <literal>unac_except_trans</literal> can't be listed here
             because I have trouble with SGML and UTF-8, but it only
             contains ligature decompositions: German ss, oe, ae, fi,
             fl.</para>
             <para>This parameter can't be defined for subdirectories, it
             is global, because there is no way to do otherwise when
             querying. If you have document sets which would need different
             values, you will have to index and query them separately.</para> 
              </listitem>
            </varlistentry>
          <varlistentry><term><varname>maildefcharset</varname></term>
            <listitem><para>This can be used to define the default
 		character set specifically for email messages which don't
 		specify it. This is mainly useful for readpst (libpst) dumps,
 		which are UTF-8 but do not say so.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>localfields</varname></term>
            <listitem><para>This allows setting fields for all documents
            under a given directory. Typical usage would be to set an
            "rclaptg" field, to be used in <filename>mimeview</filename> to
            select a specific viewer. If several fields are to be set, they
            should be separated with a semicolon (';') character; there
            is currently no way to escape a semicolon. Also note the
            initial semicolon. Example:
 		<literal>localfields= ;rclaptg=gnus;other = val</literal>. You
 		can then select the specific viewer with
 		<literal>mimetype|tag=...</literal> in
 		<filename>mimeview</filename>.</para>  
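            <para>To illustrate the value format only, here is a
            hypothetical Python parser that is not taken from &RCL;. The
            initial semicolon produces an empty first element, which is
            skipped, and each remaining element is split at its first
            equal sign. Whether &RCL; trims spaces around names and values
            in exactly this way is an assumption.</para>
            <programlisting>
def parse_fields(value):
    """Parse a ';name=value;name2 = value2' string into a dict."""
    fields = {}
    for elt in value.split(";"):
        if "=" not in elt:
            # Skip elements without '=', such as the empty one
            # produced by the initial semicolon.
            continue
        name, val = elt.split("=", 1)
        fields[name.strip()] = val.strip()
    return fields


print(parse_fields(" ;rclaptg=gnus;other = val"))
# prints {'rclaptg': 'gnus', 'other': 'val'}
            </programlisting>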
            </listitem>
           </varlistentry>
          <varlistentry><term><varname>testmodifusemtime</varname></term>
            <listitem><para>If true, use mtime instead of default ctime to
              determine if a file has been modified (in addition to
              size, which is always used). Setting this can reduce
              re-indexing on systems where extended attributes are
              modified (by some other application), but not indexed
              (changing extended attributes only affects
              ctime). Notes:
              <itemizedlist>
                <listitem><para>This may prevent detection of change
                in some marginal file rename cases (the target would
                need to have the same size and
                mtime).</para></listitem>
                <listitem><para>You should probably also set
                <varname>noxattrfields</varname> to 1 in this case,
                unless you still want extended attributes to be indexed,
                for example if the local file update pattern makes it
                worthwhile (in general, there is a risk that pure
                extended attribute updates, without file modification,
                will go undetected).</para></listitem>
              </itemizedlist>
                Perform a full index reset after changing the value of
                this parameter.
            </para></listitem>
          </varlistentry>
          <varlistentry><term><varname>noxattrfields</varname></term>
            <listitem><para>Recoll versions 1.19 and later
                automatically translate file extended attributes into
                document fields (to be processed according to the
                parameters from the <filename>fields</filename>
                file). Setting this variable to 1 will disable the
                behaviour.</para></listitem>
          </varlistentry>
          <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.METADATACMDS">
            <term><varname>metadatacmds</varname></term>
            <listitem><para>This allows executing external commands
                for each file and storing the output in &RCL; document
                fields. This could be used for example to index
                external tag data. The value is a list of field names
                and commands, separated by semicolons. Don't forget the
                initial semicolon. Example:
                <programlisting>
 [/some/area/of/the/fs]
 metadatacmds = ; tags = tmsu tags %f; otherfield = somecmd -xx %f
                </programlisting>
              </para> <para>As a specially disgusting hack brought by
                &RCL; 1.19.7, if a "field name" begins
                with <literal>rclmulti</literal>, the data returned by
                the command is expected to contain multiple field
                values, in configuration file format. This allows
                setting several fields by executing a single
                command. Example:
                <programlisting>
 metadatacmds = ; rclmulti1 = somecmd %f
                </programlisting>
                If <literal>somecmd</literal> returns data in the form
                of:
                <programlisting>
 field1 = value1
 field2 = value for field2
                </programlisting>
                <literal>field1</literal>
                and <literal>field2</literal> will be set inside the
                document metadata.</para>
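              <para>The Python sketch below models the data flow described
                above under several assumptions. The function names are
                hypothetical. Each command is run through
                <command>/bin/sh</command>, with <literal>%f</literal>
                replaced by the shell-quoted file path. The
                <literal>rclmulti</literal> output is read by a simple
                "name = value" line parser. &RCL;'s actual substitution
                and parsing rules may differ.</para>
              <programlisting>
import shlex
import subprocess


def parse_cmds(value):
    """Split ';field = command;...' into (field, command) pairs."""
    pairs = []
    for elt in value.split(";"):
        if "=" in elt:
            field, cmd = elt.split("=", 1)
            pairs.append((field.strip(), cmd.strip()))
    return pairs


def parse_multi(output):
    """Parse configuration-style 'name = value' lines into a dict."""
    fields = {}
    for line in output.splitlines():
        if "=" in line:
            name, val = line.split("=", 1)
            fields[name.strip()] = val.strip()
    return fields


def metadata_for(path, value):
    """Run each command on path and collect the resulting fields."""
    fields = {}
    for field, cmd in parse_cmds(value):
        cmd = cmd.replace("%f", shlex.quote(path))
        out = subprocess.run(cmd, shell=True, capture_output=True,
                             text=True).stdout
        if field.startswith("rclmulti"):
            # The output holds several fields in configuration file format.
            fields.update(parse_multi(out))
        else:
            fields[field] = out.strip()
    return fields


cmds = ("; rclmulti1 = printf 'field1 = value1\\nfield2 = value for field2\\n'"
        "; name = basename %f")
print(metadata_for("/tmp/some file.txt", cmds))
# prints {'field1': 'value1', 'field2': 'value for field2', 'name': 'some file.txt'}
              </programlisting>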
            </listitem>
          </varlistentry>
        </variablelist>
       </sect3>
       <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.STORAGE">
 	<title>Parameters affecting where and how we store things:</title>
          <variablelist>
          <varlistentry><term><varname>cachedir</varname></term>
            <listitem>
            <para>When not explicitly specified, the &RCL; data directories
            are stored relative to the configuration directory. If
            <literal>cachedir</literal> is set, the directories are stored
            under the specified value instead (e.g. if
            <literal>cachedir</literal> is set to
            <filename>~/.cache/recoll</filename>, the default
            <literal>dbdir</literal> would be
            <filename>~/.cache/recoll/xapiandb</filename> instead of
            <filename>~/.recoll/xapiandb</filename>). This affects the
            default values for <literal>dbdir</literal>,
            <literal>webcachedir</literal>,
            <literal>mboxcachedir</literal>, and
            <literal>aspellDicDir</literal>, which can still be
            individually specified to override
            <literal>cachedir</literal>. Note that if you have multiple
            configurations, each must have a different
            <literal>cachedir</literal>.</para>
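            <para>The Python sketch below summarizes the default-location
            rule. It only covers the three directories whose default names
            appear in this section: <filename>xapiandb</filename>,
            <filename>webcache</filename> and
            <filename>mboxcache</filename>. The function is hypothetical,
            and explicitly set values, which override these defaults, are
            not modelled.</para>
            <programlisting>
import os

# Default subdirectory names, as given in this section.
DEFAULT_NAMES = {
    "dbdir": "xapiandb",
    "webcachedir": "webcache",
    "mboxcachedir": "mboxcache",
}


def default_dir(param, confdir, cachedir=None):
    """Default location of a data directory parameter."""
    # Data directories live under cachedir if it is set, else under
    # the configuration directory.
    base = os.path.expanduser(cachedir if cachedir else confdir)
    return os.path.join(base, DEFAULT_NAMES[param])


print(default_dir("dbdir", "/home/me/.recoll"))
print(default_dir("dbdir", "/home/me/.recoll", "/home/me/.cache/recoll"))
            </programlisting>
            <para>The first call prints
            <filename>/home/me/.recoll/xapiandb</filename> and the second
            prints <filename>/home/me/.cache/recoll/xapiandb</filename>.</para>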
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>dbdir</varname></term>
            <listitem><para>The name of the Xapian data directory. It
            will be created if needed when the index is
            initialized. If this is not an absolute path, it will be
            interpreted relative to the configuration directory. The
            value can contain embedded spaces, but leading or trailing
            spaces will be trimmed. You cannot use quotes here.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>idxstatusfile</varname></term>
            <listitem><para>The name of the scratch file where the indexer
                process updates its status. Default:
            <filename>idxstatus.txt</filename> inside the configuration
            directory.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>maxfsoccuppc</varname></term>
            <listitem><para>Maximum file system occupation before we
            stop indexing. The value is a percentage, corresponding to
            what the "Capacity" df output column shows.  The default
            value is 0, meaning no checking. </para>
            </listitem>
          </varlistentry>
 	  <varlistentry><term><varname>mboxcachedir</varname></term>
 	    <listitem><para>The directory where mbox message offsets cache
 	    files are held. This is normally $RECOLL_CONFDIR/mboxcache, but
 	    it may be useful to share a directory between different
 	    configurations.</para>
 	    </listitem>
 	  </varlistentry>
 	  <varlistentry><term><varname>mboxcacheminmbs</varname></term>
 	    <listitem><para>The minimum mbox file size over which we
 		cache the offsets. There is really no sense in caching
 		offsets for small files. The default is 5 MB.</para>
 	    </listitem>
 	   </varlistentry>
          <varlistentry><term><varname>webcachedir</varname></term>
            <listitem><para>This is only used by the web browser
            plugin indexing code, and defines where the cache for visited
            pages will live. Default:
            <filename>$RECOLL_CONFDIR/webcache</filename></para> 
            </listitem>
           </varlistentry>
          <varlistentry><term><varname>webcachemaxmbs</varname></term>
            <listitem><para>This is only used by the web browser
            plugin indexing code, and defines the maximum size for the web
            page cache. Default: 40 MB. Unfortunately, this is only
            taken into account when the cache file is created: you need
            to delete the file for a change to take effect.</para> 
            </listitem>
           </varlistentry>
          <varlistentry><term><varname>idxflushmb</varname></term>
            <listitem><para>Threshold (megabytes of new text data) at
            which we flush from memory to the disk index. Setting this
            can help control memory usage. A value of 0 means no explicit
            flushing, letting Xapian use its own default, which is to
            flush every 10000 (or XAPIAN_FLUSH_THRESHOLD) documents. This
            gives little control over memory usage, which also depends
            on the average document size. The default value is 10, which
            is probably a bit low. If your system usually has free memory,
            you can try higher values between 20 and 80. In my experience,
            values beyond 100 are always counterproductive.</para> 
            </listitem>
          </varlistentry>
        </variablelist>
       </sect3>
       <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXTHREADS">
 	<title>Parameters affecting multithread processing</title>
        <para>The &RCL; indexing process 
          <command>recollindex</command> can use multiple threads to
          speed up indexing on multiprocessor systems. The work done
          to index files is divided into several stages, and some of the
          stages can be executed by multiple threads. The stages are:
          <orderedlist>
            <listitem>File system walking: this is always performed by
              the main thread.</listitem>
            <listitem>File conversion and data extraction.</listitem>
            <listitem>Text processing (splitting, stemming,
            etc.)</listitem>
            <listitem>&XAP; index update.</listitem>
          </orderedlist>
        </para>
        <para>You can also read a 
          <ulink url="http://www.recoll.org/idxthreads/threadingRecoll.html">
            longer document</ulink> about the transformation of
          &RCL; indexing to multithreading.</para>
        <para>The threads configuration is controlled by two
          configuration file parameters.</para>
 	 <variablelist>
          <varlistentry><term><varname>thrQSizes</varname></term>
            <listitem><para>This variable defines the job input queues
                configuration. There are three possible queues for
                stages 2, 3 and 4, and this parameter should give the
                queue depth for each stage (three integer values). If
                a value of -1 is used for a given stage, no queue is
                used, and the thread will go on performing the next
                stage. In practice, deep queues have not been shown to
                increase performance. A value of 0 for the first queue
                tells &RCL; to perform autoconfiguration (no need for
                the two other values in this case) - this is the
                default configuration.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>thrTCounts</varname></term>
            <listitem><para>This defines the number of threads used
                for each stage. If a value of -1 is used for one of
                the queue depths, the corresponding thread count is
                ignored. It makes no sense to use a value other than 1
                for the last stage because updating the &XAP; index is
                necessarily single-threaded (and protected by a
                mutex).</para>
            </listitem>
          </varlistentry>
         </variablelist>
         <para>The following example would use three queues (of depth 2),
         and 4 threads for converting source documents, 2 for
         processing their text, and one to update the index. This was
         tested to be the best configuration on the test system
         (four-processor machine with multiple disks).
 <programlisting>
 thrQSizes = 2 2 2
 thrTCounts =  4 2 1
 </programlisting>
         </para>
         <para>The following example would use a single queue, and the
           complete processing for each document would be performed by
           a single thread (several documents will still be processed
           in parallel in most cases). The threads will use mutual
           exclusion when entering the index update stage. In practice,
           performance would generally be close to the previous case,
           but worse in certain cases (e.g. a Zip archive would be
           processed purely sequentially), so the previous approach is
           preferred. YMMV...  The last two values for
           thrTCounts are ignored.
 <programlisting>
 thrQSizes = 2 -1 -1
 thrTCounts =  6 1 1
 </programlisting>
         </para>
         <para>The following example would disable
           multithreading. Indexing will be performed by a single
           thread.
 <programlisting>
 thrQSizes = -1 -1 -1
 </programlisting>
         </para>
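        <para>To make the interaction of the two parameters concrete, here
          is a hypothetical Python model. It is not &RCL; code. It turns
          <varname>thrQSizes</varname> and <varname>thrTCounts</varname>
          into groups of stages, each group run by its own set of threads,
          following the rules given above. Stage 1 always runs in the main
          thread. A queue depth of -1 merges a stage into the preceding
          group. A 0 for the first queue requests autoconfiguration, which
          is not modelled here.</para>
<programlisting>
def pipeline(qsizes, tcounts):
    """Return (stages, threads, queue_depth) groups for stages 1-4."""
    if qsizes[0] == 0:
        return None  # autoconfiguration: recollindex chooses the layout
    # File system walking (stage 1) always runs in the main thread.
    groups = [([1], 1, None)]
    for stage, depth, count in zip((2, 3, 4), qsizes, tcounts):
        if depth == -1:
            # No queue: the previous group's threads go on with this stage,
            # and this stage's thread count is ignored.
            groups[-1][0].append(stage)
        else:
            groups.append(([stage], count, depth))
    return groups


print(pipeline([2, 2, 2], [4, 2, 1]))
print(pipeline([2, -1, -1], [6, 1, 1]))
print(pipeline([-1, -1, -1], [1, 1, 1]))
</programlisting>
        <para>With the three example configurations, this model yields
          four separate stage groups, then a single group of 6 threads
          performing stages 2 to 4, and finally everything in the main
          thread.</para>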
       </sect3>
       <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.MISC">
 	<title>Miscellaneous parameters:</title>
 	 <variablelist>
           <varlistentry><term><varname>autodiacsens</varname></term>
            <listitem><para>If the index is not stripped, decide whether
                to automatically trigger diacritics sensitivity when the
                search term has accented characters (not in
                <literal>unac_except_trans</literal>). Otherwise you need
                to use the query language and the <literal>D</literal>
                modifier to specify diacritics sensitivity. Default is
                no.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>autocasesens</varname></term>
            <listitem><para>If the index is not stripped, decide whether
                to automatically trigger character case sensitivity when
                the search term has upper-case characters in any but the
                first position. Otherwise you need to use the query
                language and the <literal>C</literal> modifier to specify
                character-case sensitivity. Default is yes.</para>
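            <para>The two automatic triggers (<varname>autodiacsens</varname>
                and <varname>autocasesens</varname>) can be pictured with
                the following hypothetical Python helpers. They are not
                &RCL; code. Accent detection is approximated here by
                looking for combining marks after Unicode canonical
                decomposition.</para>
            <programlisting>
import unicodedata


def has_accents(term, except_chars=""):
    """True if term has an accented character not in except_chars."""
    for ch in term:
        if ch in except_chars:
            # Characters listed in unac_except_trans do not count.
            continue
        if any(unicodedata.combining(c)
               for c in unicodedata.normalize("NFD", ch)):
            return True
    return False


def has_inner_upper(term):
    """True if term has an upper-case character past the first position."""
    return any(c.isupper() for c in term[1:])


print(has_inner_upper("Paris"), has_inner_upper("macOS"))  # False True
print(has_accents("café"), has_accents("åsa", "åÅ"))  # True False
            </programlisting>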
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>loglevel,daemloglevel</varname></term>
            <listitem><para>Verbosity level for recoll and
            recollindex. A value of 4 lists quite a lot of
            debug/information messages. 2 only lists errors. The
            <literal>daem</literal> version is specific to the indexing monitor
            daemon.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>logfilename,
 		daemlogfilename</varname></term> 
            <listitem><para>Where the messages should go. 'stderr' can
            be used as a special value, and is the default. The
            <literal>daem</literal> version is specific to the indexing monitor
            daemon.</para>
            </listitem>
          </varlistentry>
           <varlistentry><term><varname>checkneedretryindexscript</varname></term>
           <listitem><para>This defines the name for a command
           executed by <command>recollindex</command> when starting
           indexing. If the exit status of the command is 0,
           <command>recollindex</command> retries indexing all files
           which previously could not be indexed because of data
           extraction errors. The default value is a script which
           checks if any of the common <filename>bin</filename>
           directories have changed (indicating that a helper program
           may have been installed).</para>
           </listitem>
           </varlistentry>
          <varlistentry><term><varname>mondelaypatterns</varname></term>
            <listitem><para>This allows specifying wildcard path patterns
            (processed with fnmatch(3) with 0 flag), to match files which
            change too often and for which a delay should be observed before
            re-indexing. This is a space-separated list, each entry being a
            pattern and a time in seconds, separated by a colon. You can
            use double quotes if a path entry contains white
            space. Example:</para>  
              <programlisting>
 mondelaypatterns = *.log:20 "this one has spaces*:10"
              </programlisting>
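            <para>The following Python sketch illustrates how such a value
            could be interpreted. The function names are hypothetical.
            <literal>shlex.split</literal> stands in for the double-quote
            handling, and <literal>fnmatch.fnmatchcase</literal> is a close
            but not exact stand-in for fnmatch(3) with a 0 flag (for
            example, it does not treat backslash as an escape). Using the
            first matching pattern is an assumption.</para>
            <programlisting>
import fnmatch
import shlex


def parse_delays(value):
    """Parse 'pattern:seconds' entries; double quotes protect spaces."""
    delays = []
    for entry in shlex.split(value):
        pattern, seconds = entry.rsplit(":", 1)
        delays.append((pattern, int(seconds)))
    return delays


def delay_for(path, delays):
    """Delay in seconds for the first matching pattern, 0 if none."""
    for pattern, seconds in delays:
        # Case-sensitive, and '*' also matches '/', since no
        # FNM_PATHNAME-like flag is used.
        if fnmatch.fnmatchcase(path, pattern):
            return seconds
    return 0


delays = parse_delays('*.log:20 "this one has spaces*:10"')
print(delays)  # [('*.log', 20), ('this one has spaces*', 10)]
print(delay_for("/var/log/syslog.log", delays))  # 20
            </programlisting>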
            </listitem>
           </varlistentry>
          <varlistentry><term><varname>monixinterval</varname></term>
            <listitem><para>Minimum interval (seconds) for processing the
            indexing queue. The real time monitor does not process each
            event when it comes in, but waits this long for the queue
            to accumulate, to reduce overhead and to aggregate
            multiple events affecting the same file. Default 30 s.</para>
            </listitem>
           </varlistentry>
          <varlistentry><term><varname>monauxinterval</varname></term>
            <listitem><para>Period (in seconds) at which the real time
            monitor will regenerate the auxiliary databases (spelling,
            stemming) if needed. The default is one hour.</para>
              </listitem>
           </varlistentry>
           <varlistentry><term><varname>monioniceclass, monioniceclassdata
           </varname></term><listitem><para>These allow defining the
           <application>ionice</application> class and data used by the
           indexer (default class 3, no data).</para>
         </listitem>
           </varlistentry>
           <varlistentry><term><varname>filtermaxseconds</varname></term>
           <listitem><para>Maximum handler execution time, after which it
           is aborted. Some postscript programs just loop...</para> 
           </listitem>
           </varlistentry>
           <varlistentry><term><varname>filtermaxmbytes</varname></term>
           <listitem><para>&RCL; 1.20.7 and later. Maximum handler memory
           utilisation. This uses setrlimit(RLIMIT_AS) on most systems
           (total virtual memory space size limit). Some programs may start
           with 500 MBytes of mapped shared libraries, so take this into
           account when choosing a value. The default is a liberal
           2000MB.</para>
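           <para>The Python sketch below shows the same operating system
           mechanisms applied to an external command: an address-space
           limit set with <literal>setrlimit(RLIMIT_AS)</literal> in the
           child process, plus an execution timeout comparable to
           <varname>filtermaxseconds</varname>. It is not how &RCL; itself
           is written. The function name is hypothetical, and the
           <literal>resource</literal> module is only available on
           Unix-like systems.</para>
           <programlisting>
import resource
import subprocess


def run_handler(argv, max_seconds, max_mbytes):
    """Run an external handler with time and address-space limits."""
    limit = max_mbytes * 1024 * 1024

    def set_limit():
        # Runs in the child just before exec: cap total virtual memory.
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

    # If the handler runs longer than max_seconds, subprocess.run kills
    # it and raises subprocess.TimeoutExpired.
    return subprocess.run(argv, capture_output=True, text=True,
                          timeout=max_seconds, preexec_fn=set_limit)


result = run_handler(["echo", "hello"], 10, 2000)
print(result.stdout, end="")  # hello
           </programlisting>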
           </listitem>
           </varlistentry>
          <varlistentry><term><varname>filtersdir</varname></term>
            <listitem><para>A directory to search for the external
            input handler scripts used to index some types of files. The
            value should not be changed, except if you want to modify
            one of the default scripts. The value can be redefined for
            any sub-directory. </para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>iconsdir</varname></term>
            <listitem><para>The name of the directory where
            <command>recoll</command> result list icons are
            stored. You can change this if you want different
            images.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>idxabsmlen</varname></term>
            <listitem><para>&RCL; stores an abstract for each indexed
            file inside the database. The text can come from an actual
            'abstract' section in the document or will just be the
            beginning of the document. It is stored in the index so
            that it can be displayed inside the result lists without
            decoding the original
            file. The <varname>idxabsmlen</varname> parameter defines
            the size of the stored abstract. The default value is 250 bytes.
            The search interface gives you the choice to display this
            stored text or a synthetic abstract built by extracting
            text around the search terms. If you always
            prefer the synthetic abstract, you can reduce this value
            and save a little space.
            </para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>idxmetastoredlen</varname></term>
            <listitem><para>Maximum stored length for metadata
                fields. This does not affect indexing (the whole field is
                processed anyway), just the amount of data stored in the
                index for the purpose of displaying fields inside result
                lists or previews. The default value is 150 bytes which
                may be too low if you have custom fields.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>aspellLanguage</varname></term>
            <listitem><para>Language definitions to use when creating
            the aspell dictionary.  The value must match a set of
            aspell language definition files. You can type "aspell
            config" to see where these are installed (look for
            data-dir). The default if the variable is not set is to
            use your desktop national language environment to guess
            the value.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>noaspell</varname></term>
            <listitem><para>If this is set, the aspell dictionary
            generation is turned off. Useful for cases where you don't
            need the functionality or when it is unusable because
            aspell crashes during dictionary generation.</para>
            </listitem>
          </varlistentry>
          <varlistentry><term><varname>mhmboxquirks</varname></term>
            <listitem><para>This allows defining location-related quirks
            for the mailbox handler. Currently only the
            <literal>tbird</literal> flag is defined, and it should be set
            for directories which hold
            <application>Thunderbird</application> data, as their folder
            format is weird. Example: 
            <programlisting>[/path/to/my/mozilla/mail] 
 mhmboxquirks = tbird</programlisting>
             It should be noted that later &RCL;
             versions have improved automatic detection of
             <application>Thunderbird</application> folders, so that this
             should not be needed at all in most cases.</para>
              </listitem>
            </varlistentry>
        </variablelist>
       </sect3>
      </sect2>
      <sect2 id="RCL.INSTALL.CONFIG.FIELDS">
 	<title>The fields file</title>
--- a/src/sampleconf/recoll.conf
+++ b/src/sampleconf/recoll.conf
@ -1,4 +1,4 @@
-# <filetitle>Recoll default main configuration file</filetitle>
+# <filetitle>Recoll main configuration file, recoll.conf</filetitle>
 # The XML tags in the comments are used to help produce the documentation
 # from the sample/reference file, and not at all at run time, where
@ -11,7 +11,8 @@
 # Most of the important values in this file can be set from the GUI
 # configuration menus, which may be an easier approach than direct editing.
-# <grouptitle>Parameters affecting what documents we index</grouptitle>
+# <grouptitle id="WHATDOCS">Parameters affecting what documents we
 # index</grouptitle> 
 # <var name="topdirs" type="string"><brief>Space-separated list of files or
 # directories to recursively index.</brief><descr>Default to ~ (indexes
@ -19,34 +20,37 @@
 # independently of the value of the followLinks variable.</descr></var>
 topdirs = ~
-# <var name="skippedNames" type="string"><brief>Wildcard expressions for
+# <var name="skippedNames" type="string">
-# names of files and directories that we should ignore.</brief>
+#
-# <descr> White space separated list of wildcard patterns (simple
+# <brief>Files and directories which should be ignored.</brief> <descr>
-# ones, not paths, must contain no / ), which will be tested against file
+# White space separated list of wildcard patterns (simple ones, not paths,
-# and directory names.  The list in the default configuration does not
+# must contain no / ), which will be tested against file and directory
-# exclude hidden directories (names beginning with a dot), which means that
+# names.  The list in the default configuration does not exclude hidden
-# it may index quite a few things that you do not want. On the other hand,
+# directories (names beginning with a dot), which means that it may index
-# email user agents like Thunderbird usually store messages in hidden
+# quite a few things that you do not want. On the other hand, email user
-# directories, and you probably want this indexed. One possible solution is
+# agents like Thunderbird usually store messages in hidden directories, and
-# to have '.*' in 'skippedNames', and add things like '~/.thunderbird'
+# you probably want this indexed. One possible solution is to have '.*' in
-# '~/.evolution' to 'topdirs'.  Not even the file names are indexed for
+# 'skippedNames', and add things like '~/.thunderbird' '~/.evolution' to
-# patterns in this list, see the 'noContentSuffixes' variable for an
+# 'topdirs'.  Not even the file names are indexed for patterns in this
-# alternative approach which indexes the file names. Can be redefined for
+# list, see the 'noContentSuffixes' variable for an alternative approach
-# any subtree.</descr></var>
+# which indexes the file names. Can be redefined for any
 # subtree.</descr></var>
 skippedNames = #* bin CVS  Cache cache* .cache caughtspam tmp \
     .thumbnails .svn \
     *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
     .recoll* xapiandb recollrc recoll.conf
-# <var name="noContentSuffixes" type="string"><brief>List of name endings (not
+# <var name="noContentSuffixes" type="string">
-# necessarily dot-separated suffixes) for which we don't try MIME type
+#
-# identification, and don't uncompress or index content.</brief><descr>Only
+# <brief>List of name endings (not necessarily dot-separated suffixes) for
-# the names will be indexed. This complements the now obsoleted mimemap
+# which we don't try MIME type identification, and don't uncompress or
-# recoll_noindex list, which will go away in a future release (the move
+# index content.</brief><descr>Only the names will be indexed. This
-# from mimemap to recoll.conf allows editing the list through the
+# complements the now obsoleted recoll_noindex list from the mimemap file,
-# GUI). This is different from skippedNames because these are name ending
+# which will go away in a future release (the move from mimemap to
-# matches only (not wildcard patterns), and the file name itself gets
+# recoll.conf allows editing the list through the GUI). This is different
-# indexed normally. This can be redefined for subdirectories.</descr></var>
+# from skippedNames because these are name ending matches only (not
 # wildcard patterns), and the file name itself gets indexed normally. This
 # can be redefined for subdirectories.</descr></var>
 noContentSuffixes = .md5 .map \
       .o .lib .dll .a .sys .exe .com \
       .mpp .mpt .vsd \
@ -54,20 +58,20 @@ noContentSuffixes = .md5 .map \
       .dat .bak .rdf .log.gz .log .db .msf .pid \
       ,v ~ #
-# <var name="skippedPaths" type="string"><brief>Space-separated list of
+# <var name="skippedPaths" type="string">
-# wildcard expressions for paths we shouldn't go into.</brief><descr>Can
+#
-# contain files and directories. The database and configuration directories
+# <brief>Paths we should not go into.</brief><descr>Space-separated list of
-# will automatically be added.  The expressions are matched 'fnmatch(3)'
+# wildcard expressions for filesystem paths. Can contain files and
 # directories. The database and configuration directories will
 # automatically be added. The expressions are matched using 'fnmatch(3)'
 # with the FNM_PATHNAME flag set by default. This means that '/' characters
 # must be matched explicitly. You can set 'skippedPathsFnmPathname' to 0
 # to disable the use of FNM_PATHNAME (meaning that '/*/dir3' will match
-# '/dir1/dir2/dir3').  The default contains the usual mount point for
+# '/dir1/dir2/dir3').  The default value contains the usual mount point for
-# removable media by default to remind people that it is a bad idea to
+# removable media to remind you that it is a bad idea to have Recoll work
-# naively have recoll work on these (esp. with the monitor: media gets
+# on these (esp. with the monitor: media gets indexed on mount, all data
-# indexed on mount, all data gets erased on unmount). Typically the
+# gets erased on unmount).  Explicitly adding '/media/xxx' to the topdirs
-# presence of '/media' is mostly a reminder, it would only have effect for
+# will override this.</descr></var>
 # someone who is indexing '/'.  Explicitly adding '/media/xxx' to the
 # topdirs will override this.</descr></var>
 skippedPaths = /media
 # <var name="skippedPathsFnmPathname" type="bool"><brief>Set to 0 to
@ -75,19 +79,22 @@ skippedPaths = /media
 # paths.</brief><descr></descr></var> 
 #skippedPathsFnmPathname = 1
-# <var name="daemSkippedPaths"><brief>skippedPaths equivalent specific to
+# <var name="daemSkippedPaths" type="string">
 #
 # <brief>skippedPaths equivalent specific to
 # real time indexing.</brief><descr>This enables having parts of the tree
 # which are initially indexed but not monitored. If daemSkippedPaths is
 # not set, the daemon uses skippedPaths.</descr></var>
 #daemSkippedPaths = 
-# <var name="zipSkippedNames" type="string"><brief>Space-separated list of
+# <var name="zipSkippedNames" type="string">
-# wildcard expresions for names that should be ignored
+#
-# inside zip archives.</brief><descr>This is used directly by the zip
+# <brief>Space-separated list of wildcard expressions for names that should
-# handler, and has a function similar to skippedNames, but
+# be ignored inside zip archives.</brief><descr>This is used directly by
-# works independantly. Can be redefined for subdirectories. Supported by
+# the zip handler, and has a function similar to skippedNames, but works
-# recoll 1.20 and newer. See
+# independently. Can be redefined for subdirectories. Supported by recoll
 # 1.20 and newer. See
 # https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members
 # </descr></var>
 #zipSkippedNames =
@ -119,12 +126,12 @@ skippedPaths = /media
 # files.</brief><descr>We need to decompress these in a
 # temporary directory for identification, which can be wasteful in some
 # cases. Limit the waste. Negative means no limit. 0 results in no
-# processing of any compressed file.</descr></var>
+# processing of any compressed file. Default 50 MB.</descr></var>
 compressedfilemaxkbs = 50000
 # <var name="textfilemaxmbs" type="int"><brief>Size limit for text
 # files.</brief><descr>Mostly for skipping monster
-# logs.</descr></var> 
+# logs. Default 20 MB.</descr></var> 
 textfilemaxmbs = 20
 # <var name="indexallfilenames" type="bool"><brief>Index the file names of
@ -158,7 +165,8 @@ processwebqueue = 0
 # into documents of approximately this size. Will reduce memory usage at
 # index time and help with loading data in the preview window at query
 # time. Particularly useful with very big files, such as application or
-# system logs.</descr></var>
+# system logs. Also see textfilemaxmbs and
 # compressedfilemaxkbs.</descr></var>
 textfilepagekbs = 1000
 # <var name="membermaxkbs" type="int"><brief>Size limit for archive
@ -168,7 +176,8 @@ membermaxkbs = 50000
-# <grouptitle>Parameters affecting how we generate terms</grouptitle>
+# <grouptitle id="TERMS">Parameters affecting how we generate
 # terms</grouptitle> 
 # Changing some of these parameters will imply a full
 # reindex. Also, when using multiple indexes, it may not make sense
@ -201,9 +210,9 @@ indexStripChars = 1
 # restoring the previous behaviour.</descr></var>
 #dehyphenate = 1
-# <var name="nocjk" type="bool"><brief>Decides if specific east asian
+# <var name="nocjk" type="bool"><brief>Decides if specific East Asian
 # (Chinese Korean Japanese) characters/word splitting is turned
-# off.</brief><descr>This will save a small amount of cpu if you have no CJK
+# off.</brief><descr>This will save a small amount of CPU if you have no CJK
 # documents. If your document base does include such text but you are not
 # interested in searching it, setting nocjk may be a
 # significant time and space saver.</descr></var>
@ -216,10 +225,11 @@ indexStripChars = 1
 # as large.</descr></var>
 #cjkngramlen = 2
-# <var name="indexstemminglanguages" type="string"><brief>Languages for
+# <var name="indexstemminglanguages" type="string">
-# which to create stemming expansion data.</brief><descr>Stemmer names can
+#
-# be found on http://www.xapian.org, or by executing 'recollindex -l', or
+# <brief>Languages for which to create stemming expansion
-# this can also be set from a list in the GUI</descr></var>
+# data.</brief><descr>Stemmer names can be found by executing 'recollindex
 # -l', or this can also be set from a list in the GUI.</descr></var>
 indexstemminglanguages = english 
 # <var name="defaultcharset" type="string"><brief>Default character
@ -246,14 +256,14 @@ indexstemminglanguages = english
 # Examples: 
 # Swedish:
 # unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl åå Åå
-# German:
+# . German:
 # unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
 # In French, you probably want to decompose oe and ae and nobody would type
 # a German ß
 # unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
-# Reasonable default for all until someone protests. These decompositions
+# . The default for all languages follows (until someone protests). These decompositions
-# are not performed by unac, but I cant imagine someone typing the composed
+# are not performed by unac, but it is unlikely that someone would type the
-# forms in a search.
+# composed forms in a search.
 # unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl</descr></var>
 unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
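As an illustration of how `unac_except_trans` entries could be read, here is a minimal Python sketch. It assumes, as the examples suggest, that the first character of each space-separated entry is the source and the rest is its replacement. Under that reading, `ää` protects `ä` from accent stripping and `ßss` decomposes `ß`. Entries such as `Ää` map to lowercase, so the sketch also folds case. The helper names are hypothetical, and the NFD-based stripping only approximates what unac does.

```python
import unicodedata


def parse_unac_except_trans(value: str) -> dict:
    """Map each source character to its translation (hypothetical helper).

    Assumption: in each entry the first character is the source and the
    remaining characters are its replacement ("ää" keeps "ä" unchanged).
    """
    table = {}
    for entry in value.split():
        table[entry[0]] = entry[1:]
    return table


def fold(text: str, exceptions: dict) -> str:
    """Strip diacritics and lowercase, except for listed characters."""
    out = []
    for ch in text:
        if ch in exceptions:
            out.append(exceptions[ch])
            continue
        # Canonical decomposition, then drop combining marks and fold case.
        # This only approximates the unac library.
        decomposed = unicodedata.normalize("NFD", ch)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        out.append(stripped.lower())
    return "".join(out)


swedish = parse_unac_except_trans("ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae åå Åå")
print(fold("Åsa smörgås Straße Café", swedish))  # åsa smörgås strasse cafe
```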
@ -274,7 +284,7 @@ unac_except_trans = ßss œoe Œoe æae Æae ﬀff ﬁfi ﬂfl
 # <var name="testmodifusemtime" type="bool"><brief>Use mtime instead of
 # ctime to test if a file has been modified.</brief><descr>The time is used
-# in in addition to the size, which is always used.
+# in addition to the size, which is always used.
 # Setting this can reduce re-indexing on systems where extended attributes
 # are used (by some other application), but not indexed, because changing
 # extended attributes only affects ctime.
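The following rough Python sketch shows the modification test described for `testmodifusemtime`. The size is always compared, together with either mtime or ctime. The function names and the stored `(size, time)` tuple are hypothetical. This is not Recoll's implementation.

```python
import os


def file_signature(path: str, use_mtime: bool) -> tuple:
    """Return the (size, timestamp) pair compared between indexing passes.

    use_mtime plays the role of testmodifusemtime: mtime when true,
    ctime otherwise. The size is always part of the signature.
    """
    st = os.stat(path)
    stamp = st.st_mtime_ns if use_mtime else st.st_ctime_ns
    return (st.st_size, stamp)


def needs_reindex(path: str, stored: tuple, use_mtime: bool) -> bool:
    # Any difference in size or in the selected timestamp means "modified".
    return file_signature(path, use_mtime) != stored
```

On Linux, changing an extended attribute (for example with `os.setxattr`) updates ctime but not mtime. This is why `use_mtime=True` can avoid reindexing files whose only change is to their extended attributes.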
@ -305,6 +315,7 @@ noxattrfields = 0
 # returns multiple field values inside a text blob formatted as a recoll
 # configuration file ("fieldname = fieldvalue" lines). The rclmultixx name
 # will be ignored, and field names and values will be parsed from the data.
 # Example: metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f
 # </descr></var>
 #[/some/area/of/the/fs]
 #metadatacmds = ; tags = tmsu tags %f; rclmulti1 = cmdOutputsConf %f
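The following minimal Python sketch interprets a `metadatacmds` value like the example above. It makes three assumptions:

- The value uses the `; name = command; name = command` layout shown in the example.
- A `%f` word is replaced with the file path.
- `rclmultixx` output consists of `fieldname = fieldvalue` lines.

The helpers are hypothetical and do not run the commands. Each argv could be passed to `subprocess.run(argv, capture_output=True, text=True)`.

```python
import shlex


def parse_metadatacmds(value: str, path: str) -> list:
    """Split a metadatacmds value into (field, argv) pairs.

    Assumption: "; name = command; name = command" layout, and only a
    whole "%f" word is replaced by the file path.
    """
    pairs = []
    for chunk in value.split(";"):
        if "=" not in chunk:
            continue  # skips the empty element before the leading ';'
        field, cmd = (s.strip() for s in chunk.split("=", 1))
        argv = [path if word == "%f" else word for word in shlex.split(cmd)]
        pairs.append((field, argv))
    return pairs


def fields_from_output(field: str, output: str) -> dict:
    """Turn one command's output into field values."""
    if not field.startswith("rclmulti"):
        return {field: output.strip()}
    # rclmultixx: the name is ignored, "fieldname = fieldvalue" lines are parsed.
    result = {}
    for line in output.splitlines():
        if "=" in line:
            name, val = line.split("=", 1)
            result[name.strip()] = val.strip()
    return result
```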
@ -312,24 +323,27 @@ noxattrfields = 0
-# <grouptitle>Parameters affecting where and how we store things</grouptitle>
+# <grouptitle id="STORE">Parameters affecting where and how we store
 # things</grouptitle> 
-# <var name="cachedir" type="dfn"><brief>Top directory for Recoll
+# <var name="cachedir" type="dfn">
-# data</brief><descr>Recoll data directories are normally located relative
+#
-# to the configuration directory (e.g. ~/.recoll/xapiandb,
+# <brief>Top directory for Recoll data.</brief><descr>Recoll data
-# ~/.recoll/mboxcache). If 'cachedir' is set, the directories are stored under
+# directories are normally located relative to the configuration directory
-# the specified value instead (e.g. if cachedir is ~/.cache/recoll, the
+# (e.g. ~/.recoll/xapiandb, ~/.recoll/mboxcache). If 'cachedir' is set, the
-# default dbdir would be ~/.cache/recoll/xapiandb).  This affects dbdir,
+# directories are stored under the specified value instead (e.g. if
-# webcachedir, mboxcachedir, aspellDicDir, which can still be individually
+# cachedir is ~/.cache/recoll, the default dbdir would be
-# specified to override cachedir.  Note that if you have multiple
+# ~/.cache/recoll/xapiandb).  This affects dbdir, webcachedir,
-# configurations, each must have a different cachedir, there is no
+# mboxcachedir, aspellDicDir, which can still be individually specified to
-# automatic computation of a subpath under cachedir.</descr></var>
+# override cachedir.  Note that if you have multiple configurations, each
 # must have a different cachedir, there is no automatic computation of a
 # subpath under cachedir.</descr></var>
 #cachedir = ~/.cache/recoll
 # <var name="maxfsoccuppc" type="int"><brief>Maximum file system occupation
 # over which we stop indexing.</brief><descr>The value is a percentage,
 # corresponding to what the "Capacity" df output column shows. The default
-# value is 0, meaning no checking.</descr></brief>
+# value is 0, meaning no checking.</descr></var>
 maxfsoccuppc = 0
 # <var name="xapiandb" type="dfn"><brief>Xapian database directory
@ -340,9 +354,11 @@ maxfsoccuppc = 0
 # ~/.recoll/xapiandb/</descr></var>
 dbdir = xapiandb
-# <var name="idxstatusfile" type="fn"><brief>Name of the scratch file where
+# <var name="idxstatusfile" type="fn">
-# the indexer process updates its status. Default:
+#
-# idxstatus.txt inside the configuration directory
+# <brief>Name of the scratch file where the indexer process updates its
 # status.</brief><descr>Default: idxstatus.txt inside the configuration
 # directory.</descr></var>
 #idxstatusfile = idxstatus.txt
 # <var name="mboxcachedir" type="dfn">
@ -371,9 +387,9 @@ webcachedir = webcache
 # <var name="webcachemaxmbs" type="int">
 # <brief>Maximum size in MB of the Web archive.</brief>
 # <descr>This is only used by the web history indexing code.
-# Default: 100 MB.
+# Default: 40 MB.
 # Reducing the size will not physically truncate the file.</descr></var>
-webcachemaxmbs = 100
+webcachemaxmbs = 40
 # <var name="webqueuedir" type="fn">
 #
@ -405,21 +421,21 @@ webcachemaxmbs = 100
 # result list. Defaults to $prefix/share/recoll/images</descr></var>
 #iconsdir = /path/to/my/icons
-# <grouptitle>Parameters affecting indexing performance and resource
+# <grouptitle id="PERFS">Parameters affecting indexing performance and
-# usage</grouptitle> 
+# resource usage</grouptitle> 
 # <var name="idxflushmb" type="int">
 #
-# <brief>Threshold (megabytes of new data) where we flush from memory to disk
+# <brief>Threshold (megabytes of new data) where we flush from memory to
-# index.</brief>
+# disk index.</brief> <descr>Setting this allows some control over memory
-# <descr>Setting this allows some control over memory usage by the indexer
+# usage by the indexer process. A value of 0 means no explicit flushing,
-# process. A value of 0 means no explicit flushing, which lets Xapian
+# which lets Xapian perform its own thing, meaning flushing every
-# perform its own thing, meaning flushing every XAPIAN_FLUSH_THRESHOLD
+# $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted: as memory
-# documents created, modified or deleted. XAPIAN_FLUSH_THRESHOLD is an
+# usage depends on average document size, not only document count, the
-# environment variable. As memory usage depends on average document size,
+# Xapian approach is not very useful, and you should let Recoll manage
-# not only document count, this is not very useful.
+# the flushes.  The default value of idxflushmb is 10 MB, and may be a bit
-# The default value of 10 MB may be a bit low. If you are looking for
+# low. If you are looking for maximum speed, you may want to experiment
-# maximum speed, you may want to experiment with values between 20 and
+# with values between 20 and
 # 80. In my experience, values beyond 100 are always counterproductive. If
 # you find otherwise, please drop me a note.</descr></var>
 idxflushmb = 10
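`idxflushmb` flushes on accumulated data volume rather than on document count. A small Python class can model that reasoning. This is a toy sketch that assumes the UTF-8 size of the indexed text approximates "new data". The class name is hypothetical, and real memory use also depends on Xapian internals.

```python
class FlushingIndexer:
    """Toy model of idxflushmb: flush once enough new text has accumulated.

    flush_mb == 0 means no explicit flushing (left to Xapian).
    """

    def __init__(self, flush_mb: int):
        self.threshold = flush_mb * 1024 * 1024
        self.pending = 0
        self.flushes = 0

    def add_document(self, text: str) -> None:
        self.pending += len(text.encode("utf-8"))
        if self.threshold > 0 and self.pending >= self.threshold:
            self.flush()

    def flush(self) -> None:
        # A real indexer would commit the pending Xapian changes here.
        self.flushes += 1
        self.pending = 0
```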
@ -449,7 +465,7 @@ filtermaxmbytes = 2000
 # for each stage (three integer values). If a value of -1 is given for a
 # given stage, no queue is used, and the thread will go on performing the
 # next stage. In practice, deep queues have not been shown to increase
-# performance. Default: a value of 0 for the first queue tells &RCL; to
+# performance. Default: a value of 0 for the first queue tells Recoll to
 # perform autoconfiguration based on the detected number of CPUs (no need
 # for the two other values in this case).  Use thrQSizes = -1 -1 -1 to
 # disable multithreading entirely.</descr></var>
@ -463,23 +479,23 @@ thrQSizes = 0
 # in thrQSizes: if the first queue depth is 0, all counts are ignored
 # (autoconfigured); if a value of -1 is used for a queue depth, the
 # corresponding thread count is ignored. It makes no sense to use a value
-# other than 1 for the last stage because updating the &XAP; index is
+# other than 1 for the last stage because updating the Xapian index is
 # necessarily single-threaded (and protected by a mutex).</descr></var>
 #thrTCounts = 4 2 1
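The hypothetical Python helper below shows how `thrQSizes` and `thrTCounts` interact, following the rules in the two descriptions above. A first queue depth of 0 means autoconfiguration. A depth of -1 removes the queue, so the corresponding thread count is ignored. Capping the last stage at one thread is a choice of this sketch, reflecting that the Xapian update stage is single-threaded.

```python
def plan_pipeline(qsizes: list, tcounts: list):
    """Interpret thrQSizes/thrTCounts (sketch, not Recoll's code).

    Returns "auto" when the first queue depth is 0, else one
    (queue_depth, thread_count) tuple per stage. A depth of -1 means no
    queue: the upstream thread runs that stage itself, so its thread
    count is ignored (reported here as (None, 0)).
    """
    if qsizes[0] == 0:
        return "auto"
    plan = []
    last = len(qsizes) - 1
    for i, (depth, count) in enumerate(zip(qsizes, tcounts)):
        if depth == -1:
            plan.append((None, 0))
        elif i == last:
            # The index update stage is single-threaded (mutex protected).
            plan.append((depth, 1))
        else:
            plan.append((depth, count))
    return plan
```

With `thrQSizes = -1 -1 -1` every stage comes out as `(None, 0)`, meaning one thread does all the work. This matches "disable multithreading entirely".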
-# <grouptitle>Miscellaneous parameters</grouptitle>
+# <grouptitle id="MISC">Miscellaneous parameters</grouptitle>
 # <var name="loglevel" type="int">
 #
-# <brief>Debug log verbosity 1-6</brief> <descr>2 is errors/warnings
+# <brief>Log file verbosity 1-6.</brief> <descr>A value of 2 will print
-# only. 3 information like document updates, 4 is quite verbose and 6 very
+# only errors and warnings. 3 will print information like document updates,
-# verbose.</descr></var>
+# 4 is quite verbose and 6 very verbose.</descr></var>
 loglevel = 3
 # <var name="logfilename" type="fn">
 #
-# <brief>Debug log destination. Use 'stderr' (default) to write to the
+# <brief>Log file destination. Use 'stderr' (default) to write to the
 # console.</brief><descr></descr></var>
 logfilename = stderr
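This commit generates the DocBook and man page texts from these structured comments, so a minimal Python sketch of the extraction step may help. It is not the generator the commit actually uses. It assumes each entry is a run of `#` lines holding `<var name="..." type="...">`, `<brief>`, `<descr>` and `</var>`. It emits troff items in the `.TP`/`.BI` style of recoll.conf.5 and ignores `<grouptitle>` elements.

```python
import re

VAR_RE = re.compile(
    r'<var name="(?P<name>[^"]+)" type="(?P<type>[^"]+)">\s*'
    r"<brief>(?P<brief>.*?)</brief>\s*"
    r"<descr>(?P<descr>.*?)</descr>\s*</var>",
    re.DOTALL,
)


def comment_text(conf: str) -> str:
    """Join the comment lines of a recoll.conf text, dropping the '#'."""
    lines = []
    for line in conf.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            lines.append(stripped[1:].strip())
    return "\n".join(lines)


def to_man(conf: str) -> str:
    """Emit one troff .TP item per <var> entry."""
    items = []
    for m in VAR_RE.finditer(comment_text(conf)):
        # Collapse the wrapped comment lines into one paragraph.
        body = " ".join((m["brief"] + " " + m["descr"]).split())
        items.append(f'.TP\n.BI "{m["name"]} = "{m["type"]}\n{body}')
    return "\n".join(items)
```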
@ -511,12 +527,11 @@ logfilename = stderr
 #
 # <brief>Indexing process current directory.</brief> <descr>The input
 # handlers sometimes leave temporary files in the current directory, so it
-# makes sense to have recollindex chdir to some temporary directory. Three
+# makes sense to have recollindex chdir to some temporary directory. If the
-# possible types of values:
+# value is empty, the current directory is not changed. If the
-#  - (literal) tmp : go to temp dir as set by environment (RECOLL_TMPDIR else
+# value is (literal) tmp, we use the temporary directory as set by the
-#    TMPDIR else /tmp)
+# environment (RECOLL_TMPDIR else TMPDIR else /tmp). If the value is an
-#  - Empty: stay where started
+# absolute path to a directory, we go there.</descr></var>
-#  - Absolute path value: go there.</descr></var>
 idxrundir = tmp
 # <var name="checkneedretryindexscript" type="fn">
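The three cases for `idxrundir` map onto a short Python function. This sketch rests on three assumptions:

- The function name is hypothetical.
- An environment variable that is set but empty is treated as unset.
- Relative paths, which the description does not cover, are rejected.

```python
import os
from typing import Optional


def resolve_idxrundir(value: str) -> Optional[str]:
    """Return the directory recollindex should chdir to, or None to stay put."""
    value = value.strip()
    if not value:
        # Empty value: the current directory is not changed.
        return None
    if value == "tmp":
        # Literal "tmp": RECOLL_TMPDIR, else TMPDIR, else /tmp.
        # (Set-but-empty variables are treated as unset in this sketch.)
        return os.environ.get("RECOLL_TMPDIR") or os.environ.get("TMPDIR") or "/tmp"
    if os.path.isabs(value):
        # Absolute path: go there.
        return value
    # Relative paths are not described; this sketch rejects them.
    raise ValueError(f"unexpected idxrundir value: {value!r}")
```

A caller would then do `os.chdir(d)` when the result `d` is not `None`.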
@ -525,7 +540,7 @@ idxrundir = tmp
 # files which previously failed. </brief> <descr>The default script checks
 # the modified dates on /usr/bin and /usr/local/bin. A relative path will
 # be looked up in the filters dirs, then in the path. Use an absolute path
-# to do otherwise.</descr>
+# to do otherwise.</descr></var>
 checkneedretryindexscript = rclcheckneedretry.sh
 # <var name="recollhelperpath" type="string">
@ -569,9 +584,10 @@ checkneedretryindexscript = rclcheckneedretry.sh
 # <var name="aspellAddCreateParam" type="string">
 #
-# <brief>Additional parameter to aspell dictionary creation
+# <brief>Additional option and parameter to aspell dictionary creation
 # command.</brief><descr>Some aspell packages may need an additional option
-# (e.g. on Debian Jessie). See Debian bug 772415.</descr></var>
+# (e.g. on Debian Jessie: --local-data-dir=/usr/lib/aspell). See Debian bug
 # 772415.</descr></var>
 #aspellAddCreateParam = --local-data-dir=/usr/lib/aspell
 # <var name="aspellKeepStderr" type="bool">
@ -589,18 +605,21 @@ checkneedretryindexscript = rclcheckneedretry.sh
 # disable the thing.</descr></var>
 #noaspell = 1
-# <var name="monixinterval" type="int">
+# <var name="monauxinterval" type="int">
 #
-# <brief>Seconds between auxiliary databases updates (stemdb,
+# <brief>Auxiliary database update interval.</brief><descr>The real time
-# aspell).</brief><descr>The default is one hour.</descr></var>
+# indexer only updates the auxiliary databases (stemdb, aspell)
 # periodically, because it would be too costly to do it for every document
 # change. The default period is one hour.</descr></var>
 #monauxinterval = 3600
 # <var name="monixinterval" type="int">
 # 
 # <brief>Minimum interval (seconds) between processings of the indexing
-# queue.</brief> <descr>The real time monitor does not process each event
+# queue.</brief><descr>The real time indexer does not process each event
 # when it comes in, but lets the queue accumulate, to diminish overhead and
-# to aggregate multiple events to the same file. Default 30 S.</descr></var>
+# to aggregate multiple events affecting the same file. Default 30
 # S.</descr></var>
 #monixinterval = 30
 # <var name="mondelaypatterns" type="string">
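The batching behavior of `monixinterval` can be sketched in Python as a queue keyed by path that is drained at most once per interval. The class is hypothetical and simplifies event merging to "last event wins". A real monitor has to handle sequences such as create-then-delete more carefully.

```python
import time
from typing import Optional


class EventQueue:
    """Accumulate file-change events and process them in batches.

    Events for the same path are merged (last one wins, a simplification),
    and the queue is drained at most once per interval.
    """

    def __init__(self, interval_s: float = 30.0):
        self.interval = interval_s
        self.pending = {}
        self.last_run = time.monotonic()

    def add(self, path: str, event: str) -> None:
        # A later event for the same path replaces the earlier one.
        self.pending[path] = event

    def maybe_process(self, now: Optional[float] = None) -> dict:
        now = time.monotonic() if now is None else now
        if now - self.last_run < self.interval:
            return {}
        batch, self.pending = self.pending, {}
        self.last_run = now
        return batch
```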
@ -611,14 +630,14 @@ checkneedretryindexscript = rclcheckneedretry.sh
 # reindexed once in a while. A list of wildcardPattern:seconds pairs. The
 # patterns are matched with fnmatch(pattern, path, 0). You can quote entries
 # containing white space with double quotes (quote the whole entry, not the
-# pattern). The default is empty.  Example:mondelaypatterns = *.log:20
+# pattern). The default is empty.
-# "*with spaces.*:30"</descr></brief>
+# Example: mondelaypatterns = *.log:20 "*with spaces.*:30"</descr></var>
 #mondelaypatterns = *.log:20  "*with spaces.*:30"
 # <var name="monioniceclass" type="int">
 #
 # <brief>ionice class for the real time indexing process</brief>
-# <descr>On platforms where this is supported, the default value is
+# <descr>Only used on platforms which support it. The default value is
 # 3.</descr></var> 
 # monioniceclass = 3
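The following Python sketch parses and applies `mondelaypatterns`. It assumes three things:

- `shlex.split` handles the double-quoted entries well enough.
- The seconds value follows the last colon.
- The first matching pattern wins.

`fnmatch.fnmatchcase` approximates C `fnmatch(pattern, path, 0)`: with no flags, `*` also matches `/`, though backslash handling differs. The helper names are hypothetical.

```python
import fnmatch
import shlex


def parse_mondelaypatterns(value: str) -> list:
    """Parse 'pattern:seconds' entries; quoted entries may contain spaces."""
    result = []
    for entry in shlex.split(value):
        # Assumption: the delay follows the last ':' in the entry.
        pattern, _, seconds = entry.rpartition(":")
        result.append((pattern, int(seconds)))
    return result


def delay_for(path: str, patterns: list) -> int:
    """Return the delay of the first matching pattern, 0 if none matches."""
    for pattern, seconds in patterns:
        # Close to C fnmatch(pattern, path, 0): no FNM_PATHNAME, so '*'
        # also matches '/', and matching is case-sensitive.
        if fnmatch.fnmatchcase(path, pattern):
            return seconds
    return 0
```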
@ -631,11 +650,12 @@ checkneedretryindexscript = rclcheckneedretry.sh
-# <grouptitle>Query-time parameters (no impact on the index)</grouptitle>
+# <grouptitle id="QUERY">Query-time parameters (no impact on the
 # index)</grouptitle> 
 # <var name="autodiacsens" type="bool">
 #
-# <brief>auto-trigger diacritics sensitivity (raw index only)</brief>
+# <brief>auto-trigger diacritics sensitivity (raw index only).</brief>
 # <descr>If the index is not stripped, decide if we automatically trigger
 # diacritics sensitivity if the search term has accented characters (not in
 # unac_except_trans). Else you need to use the query language and the "D"
@ -644,7 +664,7 @@ autodiacsens = 0
 # <var name="autocasesens" type="bool">
 #
-# <brief>auto-trigger case sensitivity (raw index only)</brief> <descr>IF
+# <brief>auto-trigger case sensitivity (raw index only).</brief><descr>If
 # the index is not stripped (see indexStripChars), decide if we
 # automatically trigger character case sensitivity if the search term has
 # upper-case characters in any but the first position. Else you need to use
@ -668,14 +688,14 @@ maxXapianClauses = 50000
 # <var name="snippetMaxPosWalk" type="int">
 #
-# <brief>Maximum number of positions we walk while populating a snippet for the
+# <brief>Maximum number of positions we walk while populating a snippet for
-# result list.</brief><descr>The default of 1,000,000 may be insufficient
+# the result list.</brief><descr>The default of 1,000,000 may be
-# for big documents, the consequence would be snippets with possibly
+# insufficient for very big documents; the consequence would be snippets
-# meaning-altering missing words.</descr></var>
+# with possibly meaning-altering missing words.</descr></var>
 snippetMaxPosWalk = 1000000
-# <grouptitle>Parameters for the PDF input script</grouptitle>
+# <grouptitle id="PDF">Parameters for the PDF input script</grouptitle>
 # <var name="pdfocr" type="bool">
 #
@ -693,7 +713,8 @@ snippetMaxPosWalk = 1000000
 #pdfattach = 0
-# <grouptitle>Parameters set for specific locations</grouptitle>
+# <grouptitle id="SPECLOCATIONS">Parameters set for specific
 # locations</grouptitle> 
 # You could specify different parameters for a subdirectory like this:
 #[~/hungariandocs/plain]