topdirs
Specifies the list of directories or files to index (recursively for directories). You can use symbolic links as elements of this list. See the followLinks option about following symbolic links found under the top elements (not followed by default).
skippedNames
A space-separated list of wildcard patterns for names of files or directories that should be completely ignored. The list defined in the default file is:
skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
               *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
               .recoll* xapiandb recollrc recoll.conf
The list can be redefined at any sub-directory in the indexed area.
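As a rough model of how a single name is checked against this list, the Python sketch below tests a bare file or directory name against each wildcard with fnmatch.fnmatchcase. It assumes case-sensitive shell-style matching on the name only, without the directory part, which is how fnmatch(3) behaves without flags. It is an illustration, not Recoll's code, and the helper name is_skipped_name is hypothetical.

```python
from fnmatch import fnmatchcase

# Default skippedNames value quoted above, with the line continuations removed.
DEFAULT_SKIPPED_NAMES = (
    "#* bin CVS Cache cache* caughtspam tmp .thumbnails .svn "
    "*~ .beagle .git .hg .bzr loop.ps .xsession-errors "
    ".recoll* xapiandb recollrc recoll.conf"
).split()


def is_skipped_name(name, patterns=DEFAULT_SKIPPED_NAMES):
    """Return True if a bare file or directory name matches any pattern.

    Hypothetical helper: assumes case-sensitive wildcard matching on the
    name only (no directory part), as fnmatch(3) does without flags.
    """
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for name in (".git", "draft.txt~", "cache2", "Documents", ".thunderbird"):
    print(f"{name!r}: {'skipped' if is_skipped_name(name) else 'kept'}")
```

With the default list, ".thunderbird" is kept. Adding ".*" to the patterns would make it match as well. This is why the hidden-directory advice that follows relies on listing such directories in topdirs, which this list does not affect.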
The top-level directories are not affected by this list (that is, a directory in topdirs might match and would still be indexed).
The list in the default configuration does not exclude hidden directories (names beginning with a dot), which means that it may index quite a few things that you do not want. On the other hand, email user agents like Thunderbird usually store messages in hidden directories, and you probably want these indexed. One possible solution is to have .* in skippedNames, and add things like ~/.thunderbird or ~/.evolution in topdirs.
Not even the file names are indexed for patterns in this list. See the noContentSuffixes variable for an alternative approach which indexes the file names.
noContentSuffixes
This is a list of file name endings (not wildcard expressions, nor dot-delimited suffixes). Only the names of matching files will be indexed (no attempt at MIME type identification, no decompression, no content indexing). This can be redefined for subdirectories, and edited from the GUI. The default value is:
noContentSuffixes = .md5 .map \
        .o .lib .dll .a .sys .exe .com \
        .mpp .mpt .vsd \
        .img .img.gz .img.bz2 .img.xz .image .image.gz .image.bz2 .image.xz \
        .dat .bak .rdf .log.gz .log .db .msf .pid \
        ,v ~ #
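Because the entries are plain endings, ",v" catches RCS files such as notes.txt,v and "~" catches editor backups, with no dot involved. The Python sketch below models that rule with str.endswith. The function name name_only_indexing is hypothetical, and the case-sensitive comparison is an assumption.

```python
# Default noContentSuffixes value quoted above, with the line continuations removed.
DEFAULT_NO_CONTENT_SUFFIXES = (
    ".md5 .map .o .lib .dll .a .sys .exe .com .mpp .mpt .vsd "
    ".img .img.gz .img.bz2 .img.xz .image .image.gz .image.bz2 .image.xz "
    ".dat .bak .rdf .log.gz .log .db .msf .pid ,v ~ #"
).split()


def name_only_indexing(file_name, endings=DEFAULT_NO_CONTENT_SUFFIXES):
    """Return True if only the file name should be indexed (no content).

    Hypothetical helper: a plain, case-sensitive string-ending test.
    No wildcards, and an ending need not start at a dot.
    """
    return file_name.endswith(tuple(endings))


for name in ("disk.img.xz", "notes.txt,v", "notes~", "syslog", "report.pdf"):
    print(name, name_only_indexing(name))
```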
skippedPaths and daemSkippedPaths
A space-separated list of patterns for paths of files or directories that should be skipped. There is no default in the sample configuration file, but the code always adds the configuration and database directories in there.
skippedPaths is used both by batch and real-time indexing. daemSkippedPaths can be used to specify things that should be indexed at startup, but not monitored.
Example of use for skipping text files only in a specific directory:
skippedPaths = ~/somedir/*.txt
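Why does this pattern only affect files directly inside somedir? Answering that depends on the fnmatch(3) FNM_PATHNAME behavior described under skippedPathsFnmPathname. The Python sketch below emulates matching with and without that flag by translating the wildcard into a regular expression. It is a simplification, not Recoll's implementation. It handles only * and ? (no bracket expressions or escapes) and assumes ~ has already been expanded. The home directory /home/me and the function name wildcard_match are made up for illustration.

```python
import re


def wildcard_match(pattern, path, pathname=True):
    """Match a path against a shell wildcard, fnmatch(3)-style.

    Simplified sketch: only '*' and '?' are special, every other character
    is literal. With pathname=True (like FNM_PATHNAME) the wildcards never
    match '/', so slashes must appear literally in the pattern.
    """
    one_char = "[^/]" if pathname else "."
    regex = []
    for ch in pattern:
        if ch == "*":
            regex.append(one_char + "*")
        elif ch == "?":
            regex.append(one_char)
        else:
            regex.append(re.escape(ch))
    return re.fullmatch("".join(regex), path, flags=re.DOTALL) is not None


pattern = "/home/me/somedir/*.txt"  # hypothetical expansion of ~/somedir/*.txt
for path in ("/home/me/somedir/a.txt", "/home/me/somedir/sub/b.txt"):
    print(path, wildcard_match(pattern, path), wildcard_match(pattern, path, pathname=False))
```

With the default (FNM_PATHNAME), the file in the sub subdirectory does not match and is not skipped. With skippedPathsFnmPathname set to 0, it would match and be skipped too.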
skippedPathsFnmPathname
The values in the *skippedPaths variables are matched by default with fnmatch(3), with the FNM_PATHNAME flag. This means that '/' characters must be matched explicitly. You can set skippedPathsFnmPathname to 0 to disable the use of FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3).
zipSkippedNames
A space-separated list of patterns for names of files or directories that should be ignored inside zip archives. This is used directly by the zip handler, and has a function similar to skippedNames, but works independently. Can be redefined for filesystem subdirectories. For versions up to 1.19, you will need to update the Zip handler and install a supplementary Python module. The details are described on the Recoll wiki.
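The following Python sketch suggests what this filtering might look like for the members of a zip archive, using the standard zipfile module. Treating a member as skipped when any component of its stored path matches a pattern is an assumption, as is case-sensitive matching. The helper kept_zip_members is hypothetical and does not reproduce the Recoll zip handler.

```python
import io
import zipfile
from fnmatch import fnmatchcase
from typing import List


def kept_zip_members(zip_bytes: bytes, zip_skipped_names: List[str]) -> List[str]:
    """Return the member names that survive the skip list.

    Hypothetical model: a member is dropped if any component of its path
    (a file or directory name) matches one of the wildcard patterns.
    """
    kept = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.namelist():
            components = [part for part in member.split("/") if part]
            if not any(fnmatchcase(part, pattern)
                       for part in components for pattern in zip_skipped_names):
                kept.append(member)
    return kept


# Build a small archive in memory to try the sketch.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docs/readme.txt", "hello")
    archive.writestr("docs/.svn/entries", "x")
    archive.writestr("build/app.o", "x")
print(kept_zip_members(buffer.getvalue(), [".svn", "*.o"]))
```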
followLinks
Specifies if the indexer should follow symbolic links while walking the file tree. The default is to ignore symbolic links to avoid multiple indexing of linked files. No effort is made to avoid duplication when this option is set to true. This option can be set individually for each of the topdirs members by using sections. It cannot be changed below the topdirs level.
indexedmimetypes
Recoll normally indexes any file which it knows how to read. This list lets you restrict the indexed MIME types to what you specify. If the variable is unspecified or the list empty (the default), all supported types are processed. Can be redefined for subdirectories.
excludedmimetypes
This list lets you exclude some MIME types from indexing. Can be redefined for subdirectories.
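The Python sketch below shows one plausible way the two MIME type lists could combine for a given file. An empty indexedmimetypes list admits every type, and excludedmimetypes then removes types. Applying both checks in this order is an assumption, and the function mime_type_allowed is hypothetical. It also ignores whether Recoll actually supports a type.

```python
def mime_type_allowed(mime_type, indexed_types=(), excluded_types=()):
    """Hypothetical combination of indexedmimetypes and excludedmimetypes.

    An empty indexed list means "no restriction"; the excluded list is
    then applied on top. Support for the type itself is not checked here.
    """
    if indexed_types and mime_type not in indexed_types:
        return False
    return mime_type not in excluded_types


print(mime_type_allowed("text/plain"))
print(mime_type_allowed("text/plain", indexed_types=["application/pdf"]))
print(mime_type_allowed("image/jpeg", excluded_types=["image/jpeg"]))
```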
compressedfilemaxkbs
Size limit, in kilobytes, for compressed (.gz or .bz2) files. These need to be decompressed in a temporary directory for identification, which can be very wasteful if 'uninteresting' big compressed files are present. Negative means no limit, 0 means no processing of any compressed file. Defaults to -1.
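The three cases described above (negative, zero, positive) can be expressed as a small Python decision function. This is a sketch: the name should_decompress is hypothetical, and two details are assumptions. It takes 1 KB as 1024 bytes, and it treats a file exactly at the limit as still processed.

```python
def should_decompress(compressed_size_bytes, compressedfilemaxkbs=-1):
    """Apply the compressedfilemaxkbs rule to one compressed file.

    Negative: no limit. Zero: never process compressed files.
    Positive: process files up to that many kilobytes (1 KB = 1024
    bytes and an inclusive limit are assumptions of this sketch).
    """
    if compressedfilemaxkbs < 0:
        return True
    if compressedfilemaxkbs == 0:
        return False
    return compressed_size_bytes <= compressedfilemaxkbs * 1024


print(should_decompress(10**9))               # default -1: no limit
print(should_decompress(10, 0))               # 0: nothing is processed
print(should_decompress(2000 * 1024, 1000))   # above a 1000 KB limit
```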
textfilemaxmbs
Maximum size for text files. Very big text files are often uninteresting logs. Set to -1 to disable (default 20MB).
textfilepagekbs
If set to other than -1, text files will be indexed as multiple documents of the given page size. This may be useful if you do want to index very big text files as it will both reduce memory usage at index time and help with loading data to the preview window. A size of a few megabytes would seem reasonable (default: 1MB).
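The two text-file settings, textfilemaxmbs and textfilepagekbs, can be modeled together in Python. The size limit decides whether the file is skipped, and the page size decides how many separate documents it becomes. This sketch rests on several assumptions. Sizes are 1024-based, and pages are cut at raw byte offsets, whereas the real indexer may choose boundaries differently, for example at line ends. The sketch also treats any non-positive page size as disabled, although the text only defines -1 that way. The function plan_text_file is hypothetical.

```python
from typing import List, Optional


def plan_text_file(data: bytes, textfilemaxmbs: int = 20,
                   textfilepagekbs: int = -1) -> Optional[List[bytes]]:
    """Return the chunks to index as separate documents, or None to skip.

    Hypothetical model: -1 disables the size limit; a non-positive page
    size disables paging (only -1 is documented). Pages are fixed-size
    byte slices in this sketch.
    """
    if textfilemaxmbs != -1 and len(data) > textfilemaxmbs * 1024 * 1024:
        return None  # too big, likely an uninteresting log: skipped
    if textfilepagekbs <= 0:
        return [data]  # paging disabled: the whole file is one document
    page = textfilepagekbs * 1024
    return [data[i:i + page] for i in range(0, len(data), page)]


data = b"x" * (2500 * 1024)
pages = plan_text_file(data, textfilepagekbs=1000)
print([len(p) // 1024 for p in pages])  # page sizes in KB
```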
membermaxkbs
This defines the maximum size in kilobytes for an archive member (zip, tar or rar at the moment). Bigger entries will be skipped.
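For zip archives, the Python sketch below lists the members that would stay under such a limit, using the standard zipfile module's ZipInfo.file_size. Several points are assumptions of this illustration and are not stated by the text. The limit is taken to apply to the uncompressed size, and 1 KB is taken as 1024 bytes. Tar and rar archives are not covered. The helper members_within_limit is hypothetical.

```python
import io
import zipfile


def members_within_limit(zip_bytes, membermaxkbs):
    """List zip members whose uncompressed size is at most membermaxkbs KB.

    Assumptions: the limit applies to the uncompressed size (file_size)
    and 1 KB = 1024 bytes. Bigger members would be skipped.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [info.filename for info in archive.infolist()
                if info.file_size <= membermaxkbs * 1024]


# Build a small archive in memory to try the sketch.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("small.txt", "a" * 1000)
    archive.writestr("big.txt", "a" * 50_000)
print(members_within_limit(buffer.getvalue(), membermaxkbs=10))
```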
indexallfilenames
Recoll indexes file names in a special section of the database to allow specific file names searches using wild cards. This parameter decides if file name indexing is performed only for files with MIME types that would qualify them for full text indexing, or for all files inside the selected subtrees, independently of MIME type.
usesystemfilecommand
Decide if we execute a system command (file -i by default) as a final step for determining the MIME type for a file (the main procedure uses suffix associations as defined in the mimemap file). This can be useful for files with suffix-less names, but it will also cause the indexing of many bogus "text" files.
systemfilecommand
Command to use for MIME type determination if usesystemfilecommand is set. Recent versions of xdg-mime sometimes work better than file.
processwebqueue
If this is set, process the directory where Web browser plugins copy visited pages for indexing.
webqueuedir
The path to the web indexing queue. This is hard-coded in the Firefox plugin as ~/.recollweb/ToIndex, so there should be no need to change it.