diff --git a/src/INSTALL b/src/INSTALL index 3dca1c9f..3968a261 100644 --- a/src/INSTALL +++ b/src/INSTALL @@ -2,3 +2,1070 @@ More documentation can be found in the doc/ directory or at http://www.recoll.org + Link: home: Recoll user manual + Link: up: Recoll user manual + Link: prev: 4.3. API + Link: next: 5.2. Supporting packages + + Chapter 5. Installation and configuration + Prev Next + + ---------------------------------------------------------------------- + +Chapter 5. Installation and configuration + +5.1. Installing a binary copy + + There are three types of binary Recoll installations: + + o Through your system's normal software distribution framework (e.g. + Debian/Ubuntu apt, FreeBSD ports, etc.). + + o From a package downloaded from the Recoll web site. + + o From a prebuilt tree downloaded from the Recoll web site. + + In all cases, the strict software dependencies (e.g. on Xapian or iconv) + will be automatically satisfied; you should not have to worry about them. + + You will only have to check or install supporting applications for the + file types that you want to index beyond those that are natively processed + by Recoll (text, HTML, email files, and a few others). + + You may also want to have a look at the configuration section (but this + may not be necessary for a quick test with default parameters). Most + parameters can be more conveniently set from the GUI. + + 5.1.1. Installing through a package system + + If you use a BSD-type port system or a prebuilt package (DEB, RPM, + manually or through the system software configuration utility), just + follow the usual procedure for your system. + + 5.1.2. Installing a prebuilt Recoll + + The unpackaged binary versions on the Recoll web site are just compressed + tar files of a build tree, where only the useful parts were kept + (executables and sample configuration). 
+ + The executable binary files are built with a static link to libxapian and + libiconv, to make installation easier (no dependencies). + + After extracting the tar file, you can proceed with installation as if you + had built the package from source (that is, just type make install). The + binary trees are built for installation to /usr/local. + + ---------------------------------------------------------------------- + + Prev Next + 4.3. API Home 5.2. Supporting packages + Link: home: Recoll user manual + Link: up: Chapter 5. Installation and configuration + Link: prev: Chapter 5. Installation and configuration + Link: next: 5.3. Building from source + + 5.2. Supporting packages + Prev Chapter 5. Installation and configuration Next + + ---------------------------------------------------------------------- + +5.2. Supporting packages + + Recoll uses external applications to index some file types. You need to + install them for the file types that you wish to have indexed (these are + run-time optional dependencies. None is needed for building or running + Recoll except for indexing their specific file type). + + After an indexing pass, the commands that were found missing can be + displayed from the recoll File menu. The list is stored in the missing + text file inside the configuration directory. + + A list of common file types which need external commands follows. Many of + the filters need the iconv command, which is not always listed as a + dependency. + + Please note that, due to the relatively dynamic nature of this + information, the most up-to-date version is now kept on the Recoll helper + applications page along with links to the home pages or best + source/patches pages, and misc tips. The list below is not updated often + and may be quite stale. + + For many Linux distributions, most of the commands listed can be installed + from the package repositories. 
However, the packages are sometimes + outdated, or not the best version for Recoll, so you should take a look at + the Recoll helper applications page if a file type is important to you. + + As of Recoll release 1.14, a number of XML-based formats that were handled + by ad hoc filter code now use the xsltproc command, which usually comes + with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg. + + Now for the list: + + o Openoffice files need unzip and xsltproc. + + o PDF files need pdftotext which is part of the Xpdf or Poppler + packages. + + o Postscript files need pstotext. The original version has an issue with + shell characters in file names, which is corrected in recent packages. + See the Recoll helper applications page for more detail. + + o MS Word needs antiword. It is also useful to have wvWare installed as + it may be used as a fallback for some files which antiword does not + handle. + + o MS Excel and PowerPoint need catdoc. + + o MS Open XML (docx) needs xsltproc. + + o Wordperfect files need wpd2html from the libwpd (or libwpd-tools on + Ubuntu) package. + + o RTF files need unrtf, which, in its standard version, has much trouble + with non-western character sets. Check the Recoll helper applications + page. + + o TeX files need untex or detex. Check the Recoll helper applications + page for sources if it's not packaged for your distribution. + + o dvi files need dvips. + + o djvu files need djvutxt and djvused from the DjVuLibre package. + + o Audio files: Recoll releases before 1.13 used the id3info command from + the id3lib package to extract mp3 tag information, metaflac (standard + flac tools) for flac files, and ogginfo (vorbis tools) for ogg files. + Releases 1.14 and later use a single Python filter based on mutagen + for all audio file types. + + o Pictures: Recoll uses the Exiftool Perl package to extract tag + information. Most image file formats are supported. 
Note that there + may not be much interest in indexing the technical tags (image size, + aperture, etc.). This is only of interest if you store personal tags + or textual descriptions inside the image files. + + o chm: files in Microsoft help format need Python and the pychm module + (which needs chmlib). + + o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar + module. icalendar is not needed for newer versions, which use internal + code. + + o Zip archives need Python (and the standard zipfile module). + + o Rar archives need Python, the rarfile Python module and the unrar + utility. + + o Midi karaoke files need Python and the Midi module. + + o Konqueror webarchive format with Python (uses the Tarfile module). + + o mimehtml web archive format (support based on the email filter, which + introduces some mild weirdness, but still usable). + + Text, HTML, email folders, and Scribus files are processed internally. Lyx + is used to index Lyx files. Many filters need iconv and the standard sed + and awk. + + ---------------------------------------------------------------------- + + Prev Up Next + Chapter 5. Installation and configuration Home 5.3. Building from source + Link: home: Recoll user manual + Link: up: Chapter 5. Installation and configuration + Link: prev: 5.2. Supporting packages + Link: next: 5.4. Configuration overview + + 5.3. Building from source + Prev Chapter 5. Installation and configuration Next + + ---------------------------------------------------------------------- + +5.3. Building from source + + 5.3.1. Prerequisites + + C++ compiler. Up to Recoll version 1.13.04, its absence can manifest + itself by strange messages about a missing iconv_open. + + Development files for Xapian core. + + Important + + If you are building Xapian for an older CPU (before Pentium 4 or Athlon + 64), you need to add the --disable-sse flag to the configure command. Otherwise + all Xapian applications will crash with an illegal instruction error. 
+ + Development files for Qt . + + Development files for X11 and zlib. + + Check the Recoll download page for up to date version information. + + You will most probably be able to find a binary package for Qt for your + system. You may have to compile Xapian but this is not difficult (if you + are using FreeBSD, there is a port). + + You may also need libiconv. Recoll currently uses version 1.9 (this should + not be critical). On Linux systems, the iconv interface is part of libc + and you should not need to do anything special. + + 5.3.2. Building + + Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris, most + versions after 2005 should be ok, maybe some older ones too (Solaris 8 is + ok). If you build on another system, and need to modify things, I would + very much welcome patches. + + Depending on the Qt 3 configuration on your system, you may have to set + the QTDIR and QMAKESPECS variables in your environment: + + o QTDIR should point to the directory above the one that holds the qt + include files (ie: if qt.h is /usr/local/qt/include/qt.h, QTDIR should + be /usr/local/qt). + + o QMAKESPECS should be set to the name of one of the Qt mkspecs + sub-directories (ie: linux-g++). + + On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS + is not needed because there is a default link in mkspecs/. + + Neither QTDIR nor QMAKESPECS should be needed with Qt 4, configuration + details are entirely determined by qmake (which is quite often installed + as qmake-qt4). + + Configure options: + + o --without-aspell will disable the code for phonetic matching of search + terms. + + o --with-fam or --with-inotify will enable the code for real time + indexing. Inotify support is enabled by default on recent Linux + systems. + + o --disable-webkit is available from version 1.17 to implement the + result list with a Qt QTextBrowser instead of a WebKit widget if you + do not or can't depend on the latter. 
+ + o --enable-xattr will enable code to fetch data from file extended + attributes. This is only useful if some application stores data + there, and also needs some simple configuration (see comments in the + fields configuration file). + + o --enable-camelcase will enable splitting camelCase words. This is not + enabled by default as it has the unfortunate side-effect of making + some phrase searches quite confusing: e.g., "MySQL manual" would be + matched by "MySQL manual" and "my sql manual" but not "mysql manual" + (only inside phrase searches). + + o --with-file-command Specify the version of the 'file' command to use + (e.g.: --with-file-command=/usr/local/bin/file). Can be useful to enable + the gnu version on systems where the native one is bad. + + o --disable-qtgui Disable the Qt interface. Will allow building the + indexer and the command line search program in absence of a Qt + environment. + + o --disable-x11mon Disable X11 connection monitoring inside recollindex. + Together with --disable-qtgui, this allows building recoll without Qt + and X11. + + o Of course the usual autoconf configure options, like --prefix, apply. + + Normal procedure: + + cd recoll-xxx + configure + make + (practices usual hardship-repelling invocations) + + + There is little auto-configuration. The configure script will mainly link + one of the system-specific files in the mk directory to mk/sysconf. If + your system is not known yet, it will tell you as much, and you may want + to manually copy and modify one of the existing files (the new file name + should be the output of uname -s). + + 5.3.3. Installation + + Either type make install or execute recollinstall prefix, in the root of + the source tree. This will copy the commands to prefix/bin and the sample + configuration files, scripts and other shared data to prefix/share/recoll. 
+ + If the installation prefix given to recollinstall is different from either + the system default or the value which was specified when executing + configure (as in configure --prefix /some/path), you will have to set the + RECOLL_DATADIR environment variable to indicate where the shared data is + to be found (ie for (ba)sh: export + RECOLL_DATADIR=/some/path/share/recoll). + + You can then proceed to configuration. + + ---------------------------------------------------------------------- + + Prev Up Next + 5.2. Supporting packages Home 5.4. Configuration overview + Link: home: Recoll user manual + Link: up: Chapter 5. Installation and configuration + Link: prev: 5.3. Building from source + + 5.4. Configuration overview + Prev Chapter 5. Installation and configuration + + ---------------------------------------------------------------------- + +5.4. Configuration overview + + Most of the parameters specific to the recoll GUI are set through the + Preferences menu and stored in the standard Qt place + ($HOME/.config/Recoll.org/recoll.conf). You probably do not want to edit + this by hand. + + Recoll indexing options are set inside text configuration files located in + a configuration directory. There can be several such directories, each of + which define the parameters for one index. + + The configuration files can be edited by hand or through the Index + configuration dialog (Preferences menu). The GUI tool will try to respect + your formatting and comments as much as possible, so it is quite possible + to use both ways. + + The most accurate documentation for the configuration parameters is given + by comments inside the default files, and we will just give a general + overview here. + + For each index, there are two sets of configuration files. System-wide + configuration files are kept in a directory named like + /usr/[local/]share/recoll/examples, and define default values, shared by + all indexes. 
For each index, a parallel set of files defines the + customized parameters. + + The default location of the configuration is the .recoll directory in your + home. Most people will only use this directory. + + This location can be changed, or others can be added with the + RECOLL_CONFDIR environment variable or the -c option parameter to recoll + and recollindex. + + If the .recoll directory does not exist when recoll or recollindex is + started, it will be created with a set of empty configuration files. + recoll will give you a chance to edit the configuration file before + starting indexing. recollindex will proceed immediately. To avoid + mistakes, the automatic directory creation will only occur for the default + location, not if -c or RECOLL_CONFDIR were used (in the latter cases, you + will have to create the directory). + + All configuration files share the same format. For example, a short + extract of the main configuration file might look as follows: + + # Space-separated list of directories to index. + topdirs = ~/docs /usr/share/doc + + [~/somedirectory-with-utf8-txt-files] + defaultcharset = utf-8 + + + There are three kinds of lines: + + o Comment (starts with #) or empty. + + o Parameter assignment (name = value). + + o Section definition ([somedirname]). + + Depending on the type of configuration file, section definitions either + separate groups of parameters or allow redefining some parameters for a + directory sub-tree. They stay in effect until another section definition, + or the end of file, is encountered. Some of the parameters used for + indexing are looked up hierarchically from the current directory location + upwards. Not all parameters can be meaningfully redefined; this is + specified for each in the next section. + + When found at the beginning of a file path, the tilde character (~) is + expanded to the name of the user's home directory, as a shell would do. + + White space is used for separation inside lists. 
List elements with + embedded spaces can be quoted using double-quotes. + + Encoding issues. Most of the configuration parameters are plain ASCII. Two + particular sets of values may cause encoding issues: + + o File path parameters may contain non-ascii characters and should use + the exact same byte values as found in the file system directory. + Usually, this means that the configuration file should use the system + default locale encoding. + + o The unac_except_trans parameter should be encoded in UTF-8. If your + system locale is not UTF-8, and you need to also specify non-ascii + file paths, this poses a difficulty because common text editors cannot + handle multiple encodings in a single file. In this relatively + unlikely case, you can edit the configuration file as two separate + text files with appropriate encodings, and concatenate them to create + the complete configuration. + + 5.4.1. Main configuration file + + recoll.conf is the main configuration file. It defines things like what to + index (top directories and things to ignore), and the default character + set to use for document types which do not specify it internally. + + The default configuration will index your home directory. If this is not + appropriate, start recoll to create a blank configuration, click Cancel, + and edit the configuration file before restarting the command. This will + start the initial indexing, which may take some time. + + Most of the following parameters can be changed from the Index + Configuration menu in the recoll interface. Some can only be set by + editing the configuration file. + + 5.4.1.1. Parameters affecting what documents we index: + + topdirs + + Specifies the list of directories or files to index (recursively + for directories). You can use symbolic links as elements of this + list. See the followLinks option about following symbolic links + found under the top elements (not followed by default). 
+ + skippedNames + + A space-separated list of patterns for names of files or + directories that should be completely ignored. The list defined in + the default file is: + + skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \ + *~ .beagle .git .hg .bzr loop.ps .xsession-errors \ + .recoll* xapiandb recollrc recoll.conf + + The list can be redefined at any sub-directory in the indexed + area. + + The top-level directories are not affected by this list (that is, + a directory in topdirs might match and would still be indexed). + + The list in the default configuration does not exclude hidden + directories (names beginning with a dot), which means that it may + index quite a few things that you do not want. On the other hand, + email user agents like Thunderbird usually store messages in + hidden directories, and you probably want this indexed. One + possible solution is to have .* in skippedNames, and add things + like ~/.thunderbird or ~/.evolution in topdirs. + + Not even the file names are indexed for patterns in this list. See + the recoll_noindex variable in mimemap for an alternative approach + which indexes the file names. + + skippedPaths and daemSkippedPaths + + A space-separated list of patterns for paths of files or + directories that should be skipped. There is no default in the + sample configuration file, but the code always adds the + configuration and database directories to it. + + skippedPaths is used both by batch and real time indexing. + daemSkippedPaths can be used to specify things that should be + indexed at startup, but not monitored. + + Example of use for skipping text files only in a specific + directory: + + skippedPaths = ~/somedir/*.txt + + + skippedPathsFnmPathname + + The values in the *skippedPaths variables are matched by default + with fnmatch(3), with the FNM_PATHNAME and FNM_LEADING_DIR flags. + This means that '/' characters must be matched explicitly. 
You + can set skippedPathsFnmPathname to 0 to disable the use of + FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3). + + followLinks + + Specifies if the indexer should follow symbolic links while + walking the file tree. The default is to ignore symbolic links to + avoid multiple indexing of linked files. No effort is made to + avoid duplication when this option is set to true. This option can + be set individually for each of the topdirs members by using + sections. It can not be changed below the topdirs level. + + indexedmimetypes + + Recoll normally indexes any file which it knows how to read. This + list lets you restrict the indexed mime types to what you specify. + If the variable is unspecified or the list empty (the default), + all supported types are processed. + + compressedfilemaxkbs + + Size limit for compressed (.gz or .bz2) files. These need to be + decompressed in a temporary directory for identification, which + can be very wasteful if 'uninteresting' big compressed files are + present. Negative means no limit, 0 means no processing of any + compressed file. Defaults to -1. + + textfilemaxmbs + + Maximum size for text files. Very big text files are often + uninteresting logs. Set to -1 to disable (default 20MB). + + textfilepagekbs + + If set to other than -1, text files will be indexed as multiple + documents of the given page size. This may be useful if you do + want to index very big text files as it will both reduce memory + usage at index time and help with loading data to the preview + window. A size of a few megabytes would seem reasonable (default: + 1MB). + + membermaxkbs + + This defines the maximum size in kilobytes for an archive member + (zip, tar or rar at the moment). Bigger entries will be skipped. + + indexallfilenames + + Recoll indexes file names in a special section of the database to + allow specific file names searches using wild cards. 
This + parameter decides if file name indexing is performed only for + files with mime types that would qualify them for full text + indexing, or for all files inside the selected subtrees, + independently of mime type. + + usesystemfilecommand + + Decide if we use the file -i system command as a final step for + determining the mime type for a file (the main procedure uses + suffix associations as defined in the mimemap file). This can be + useful for files with suffix-less names, but it will also cause + the indexing of many bogus "text" files. + + processwebqueue + + If this is set, process the directory where Web browser plugins + copy visited pages for indexing. + + webqueuedir + + The path to the web indexing queue. This is hard-coded in the + Firefox plugin as ~/.recollweb/ToIndex so there should be no need + to change it. + + 5.4.1.2. Parameters affecting how we generate terms: + + Changing some of these parameters will imply a full reindex. Also, when + using multiple indexes, it may not make sense to search indexes that don't + share the values for these parameters, because they usually affect both + search and index operations. + + indexStripChars + + Decide if we strip characters of diacritics and convert them to + lower-case before terms are indexed. If we don't, searches + sensitive to case and diacritics can be performed, but the index + will be bigger, and some marginal weirdness may sometimes occur. + The default is a stripped index (indexStripChars = 1) for now. + When using multiple indexes for a search, this parameter must be + defined identically for all. Changing the value implies an index + reset. + + maxTermExpand + + Maximum expansion count for a single term (e.g.: when using + wildcards). The default of 10000 is reasonable and will avoid + queries that appear frozen while the engine is walking the term + list. + + maxXapianClauses + + Maximum number of elementary clauses we can add to a single Xapian + query. 
In some cases, the result of term expansion can be + multiplicative, and we want to avoid using excessive memory. The + default of 100 000 should be both high enough in most cases and + compatible with current typical hardware configurations. + + nonumbers + + If this is set to true, no terms will be generated for numbers. For + example "123", "1.5e6", 192.168.1.4, would not be indexed + ("value123" would still be). Numbers are often quite interesting + to search for, and this should probably not be set except for + special situations, e.g., scientific documents with huge amounts of + numbers in them. This can only be set for a whole index, not for a + subtree. + + nocjk + + If this is set to true, specific East Asian (Chinese, Korean, Japanese) + characters/word splitting is turned off. This will save a small + amount of cpu if you have no CJK documents. If your document base + does include such text but you are not interested in searching it, + setting nocjk may be a significant time and space saver. + + cjkngramlen + + This lets you adjust the size of n-grams used for indexing CJK + text. The default value of 2 is probably appropriate in most + cases. A value of 3 would allow more precision and efficiency on + longer words, but the index will be approximately twice as large. + + indexstemminglanguages + + A list of languages for which the stem expansion databases will be + built. See recollindex(1) or use the recollindex -l command for + possible values. You can add a stem expansion database for a + different language by using recollindex -s, but it will be deleted + during the next indexing. Only languages listed in the + configuration file are permanent. + + defaultcharset + + The name of the character set used for files that do not contain a + character set definition (e.g.: plain text files). This can be + redefined for any sub-directory. 
If it is not set at all, the + character set used is the one defined by the nls environment ( + LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set. + + unac_except_trans + + This is a list of characters, encoded in UTF-8, which should be + handled specially when converting text to unaccented lowercase. + For example, in Swedish, the letter a with diaeresis has full + alphabet citizenship and should not be turned into an a. Each + element in the space-separated list has the special character as + first element and the translation following. The handling of both + the lowercase and upper-case versions of a character should be + specified, as membership in the list will turn off both standard + accent and case processing. Example for Swedish: + + unac_except_trans = åå Åå ää Ää öö Öö + + + Note that the translation is not limited to a single character, + you could very well have something like üue in the list. + + The default value set for unac_except_trans can't be listed here + because I have trouble with SGML and UTF-8, but it only contains + ligature decompositions: German ss, oe, ae, fi, fl. + + This parameter can't be defined for subdirectories, it is global, + because there is no way to do otherwise when querying. If you have + document sets which would need different values, you will have to + index and query them separately. + + maildefcharset + + This can be used to define the default character set specifically + for email messages which don't specify it. This is mainly useful + for readpst (libpst) dumps, which are utf-8 but do not say so. + + localfields + + This allows setting fields for all documents under a given + directory. Typical usage would be to set an "rclaptg" field, to be + used in mimeview to select a specific viewer. If several fields + are to be set, they should be separated with a colon (':') + character (which there is currently no way to escape). 
For example: + localfields= rclaptg=gnus:other = val, then select the specific + viewer with mimetype|tag=... in mimeview. + + 5.4.1.3. Parameters affecting where and how we store things: + + dbdir + + The name of the Xapian data directory. It will be created if + needed when the index is initialized. If this is not an absolute + path, it will be interpreted relative to the configuration + directory. The value can have embedded spaces but leading or + trailing spaces will be trimmed. You cannot use quotes here. + + idxstatusfile + + The name of the scratch file where the indexer process updates its + status. Default: idxstatus.txt inside the configuration directory. + + maxfsoccuppc + + Maximum file system occupation before we stop indexing. The value + is a percentage, corresponding to what the "Capacity" df output + column shows. The default value is 0, meaning no checking. + + mboxcachedir + + The directory where mbox message offsets cache files are held. + This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful + to share a directory between different configurations. + + mboxcacheminmbs + + The minimum mbox file size over which we cache the offsets. There + is really no sense in caching offsets for small files. The default + is 5 MB. + + webcachedir + + This is only used by the web browser plugin indexing code, and + defines where the cache for visited pages will live. Default: + $RECOLL_CONFDIR/webcache + + webcachemaxmbs + + This is only used by the web browser plugin indexing code, and + defines the maximum size for the web page cache. Default: 40 MB. + + idxflushmb + + Threshold (megabytes of new text data) where we flush from memory + to disk index. Setting this can help control memory usage. A value + of 0 means no explicit flushing, letting Xapian use its own + default, which is flushing every 10000 (or XAPIAN_FLUSH_THRESHOLD) + documents, which gives little memory usage control, as memory + usage also depends on average document size. 
The default value is + 10, and it is probably a bit low. If your system usually has free + memory, you can try higher values between 20 and 80. In my + experience, values beyond 100 are always counterproductive. + + 5.4.1.4. Miscellaneous parameters: + + autodiacsens + + If the index is not stripped, decide if we automatically trigger + diacritics sensitivity if the search term has accented characters + (not in unac_except_trans). Else you need to use the query + language and the D modifier to specify diacritics sensitivity. + Default is no. + + autocasesens + + If the index is not stripped, decide if we automatically trigger + character case sensitivity if the search term has upper-case + characters in any but the first position. Else you need to use the + query language and the C modifier to specify character-case + sensitivity. Default is yes. + + loglevel, daemloglevel + + Verbosity level for recoll and recollindex. A value of 4 lists + quite a lot of debug/information messages. 2 only lists errors. + The daem version is specific to the indexing monitor daemon. + + logfilename, daemlogfilename + + Where the messages should go. 'stderr' can be used as a special + value, and is the default. The daem version is specific to the + indexing monitor daemon. + + mondelaypatterns + + This allows specifying wildcard path patterns (processed with + fnmatch(3) with the flags argument set to 0), to match files which change too often and + for which a delay should be observed before re-indexing. This is a + space-separated list, each entry being a pattern and a time in + seconds, separated by a colon. You can use double quotes if a path + entry contains white space. Example: + + mondelaypatterns = *.log:20 "this one has spaces*:10" + + + monixinterval + + Minimum interval (seconds) for processing the indexing queue. 
The + real time monitor does not process each event when it comes in, + but will wait this time for the queue to accumulate to diminish + overhead and in order to aggregate multiple events to the same + file. Default: 30 seconds. + + monauxinterval + + Period (in seconds) at which the real time monitor will regenerate + the auxiliary databases (spelling, stemming) if needed. The + default is one hour. + + monioniceclass, monioniceclassdata + + These allow defining the ionice class and data used by the indexer + (default class 3, no data). + + filtermaxseconds + + Maximum filter execution time, after which it is aborted. Some + postscript programs just loop... + + filtersdir + + A directory to search for the external filter scripts used to + index some types of files. The value should not be changed, except + if you want to modify one of the default scripts. The value can be + redefined for any sub-directory. + + iconsdir + + The name of the directory where recoll result list icons are + stored. You can change this if you want different images. + + idxabsmlen + + Recoll stores an abstract for each indexed file inside the + database. The text can come from an actual 'abstract' section in + the document or will just be the beginning of the document. It is + stored in the index so that it can be displayed inside the result + lists without decoding the original file. The idxabsmlen parameter + defines the size of the stored abstract. The default value is 250 + bytes. The search interface gives you the choice to display this + stored text or a synthetic abstract built by extracting text + around the search terms. If you always prefer the synthetic + abstract, you can reduce this value and save a little space. + + aspellLanguage + + Language definitions to use when creating the aspell dictionary. + The value must match a set of aspell language definition files. + You can type "aspell config" to see where these are installed + (look for data-dir). 
The default, if the variable is not set, is to + use your desktop national language environment to guess the value. + + noaspell + + If this is set, the aspell dictionary generation is turned off. + Useful for cases where you don't need the functionality or when it + is unusable because aspell crashes during dictionary generation. + + mhmboxquirks + + This allows defining location-related quirks for the mailbox + handler. Currently only the tbird flag is defined, and it should + be set for directories which hold Thunderbird data, as their + folder format is weird. + + 5.4.2. The fields file + + This file contains information about dynamic fields handling in Recoll. + Some very basic fields have hard-wired behaviour, and, mostly, you should + not change the original data inside the fields file. But you can create + custom fields fitting your data and handle them just as if they were + native ones. + + The fields file has several sections, each of which defines an aspect of + fields processing. Quite often, you'll have to modify several sections to + obtain the desired behaviour. + + We will only give a short description here; you should refer to the + comments inside the file for more detailed information. + + Field names should be lowercase alphabetic ASCII. + + [prefixes] + + A field becomes indexed (searchable) by having a prefix defined in + this section. + + [stored] + + A field becomes stored (displayable inside results) by having its + name listed in this section (typically with an empty value). + + [aliases] + + This section defines lists of synonyms for the canonical names + used inside the [prefixes] and [stored] sections. + + filter-specific sections + + Some filters may need specific configuration for handling fields. + Only the email message filter currently has such a section (named + [mail]). It allows indexing arbitrary email headers in addition to + the ones indexed by default. Other such sections may appear in the + future.
+ + Here follows a small example of a personal fields file. This would extract + a specific email header and use it as a searchable field, with data + displayable inside result lists. (Side note: as the email filter does no + decoding on the values, only plain ASCII headers can be indexed, and only + the first occurrence will be used for headers that occur several times). + + [prefixes] + # Index mailmytag contents (with the given prefix) + mailmytag = XMTAG + + [stored] + # Store mailmytag inside the document data record (so that it can be + # displayed - as %(mailmytag) - in result lists). + mailmytag = + + [mail] + # Extract the X-My-Tag mail header, and use it internally with the + # mailmytag field name + x-my-tag = mailmytag + + 5.4.3. The mimemap file + + mimemap specifies the file name extension to mime type mappings. + + For file names without an extension, or with an unknown one, the system's + file -i command will be executed to determine the mime type (this can be + switched off inside the main configuration file). + + The mappings can be specified on a per-subtree basis, which may be useful + in some cases. Example: gaim logs have a .txt extension but should be + handled specially, which is possible because they are usually all located + in one place. + + mimemap also has a recoll_noindex variable which is a list of suffixes. + Matching files will be skipped (which avoids unnecessary decompressions or + file executions). This is partially redundant with skippedNames in the + main configuration file, with a few differences: it will not affect + directories, it cannot be made dependent on the file-system location (it + is a configuration-wide parameter), and the file names will still be + indexed (not even the file names are indexed for patterns in + skippedNames). recoll_noindex is used mostly for things known to be + unindexable by a given Recoll version. Having it there avoids cluttering + the more user-oriented and locally customized skippedNames.
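The decision order described above for mimemap can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Recoll's actual implementation: the names SUFFIX_TO_MIME, NOINDEX_SUFFIXES and classify are hypothetical, the tables are made-up stand-ins for mimemap contents, and per-subtree mappings are left out.

```python
import os

# Hypothetical stand-in for the suffix -> mime type mappings of a mimemap.
SUFFIX_TO_MIME = {
    ".txt": "text/plain",
    ".html": "text/html",
}

# Hypothetical stand-in for the recoll_noindex suffix list.
NOINDEX_SUFFIXES = [".tar.gz", ".iso"]


def classify(path):
    """Return ("skip", None), ("mime", type) or ("unknown", None)."""
    name = os.path.basename(path)
    # Suffixes in the noindex list are checked first, so that no
    # decompression or external command is needed for these files.
    for suffix in NOINDEX_SUFFIXES:
        if name.endswith(suffix):
            return ("skip", None)
    # Plain extension lookup.
    ext = os.path.splitext(name)[1]
    if ext in SUFFIX_TO_MIME:
        return ("mime", SUFFIX_TO_MIME[ext])
    # No extension, or an unknown one: this is the case where the
    # manual says "file -i" would be run to determine the type.
    return ("unknown", None)
```

In this sketch a file named notes.txt maps to text/plain, backup.tar.gz is skipped, and README falls through to the "unknown" case.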
+ + 5.4.4. The mimeconf file + + mimeconf specifies how the different mime types are handled for indexing, + and which icons are displayed in the recoll result lists. + + Changing the parameters in the [index] section is probably not a good idea + except if you are a Recoll developer. + + The [icons] section allows you to change the icons which are displayed by + recoll in the result lists (the values are the basenames of the PNG images + inside the iconsdir directory, which is specified in recoll.conf). + + 5.4.5. The mimeview file + + mimeview specifies which programs are started when you click on an Open + link in a result list. For example, HTML is normally displayed using + firefox, but you may prefer Konqueror, or your openoffice.org program + might be named ooffice instead of openoffice. + + Changes to this file can be made by direct editing, or through the recoll + GUI preferences dialog. + + If Use desktop preferences to choose document editor is checked in the + Recoll GUI preferences, all mimeview entries will be ignored except the + one labelled application/x-all (which is set to use xdg-open by default). + + In this case, the xallexcepts top level variable defines a list of mime + type exceptions which will be processed according to the local entries + instead of being passed to the desktop. This is so that specific Recoll + options such as a page number or a search string can be passed to + applications that support them, such as the evince viewer. + + As for the other configuration files, the normal usage is to have a + mimeview inside your own configuration directory, with just the + non-default entries, which will override those from the central + configuration file. + + All viewer definition entries must be placed under a [view] section. + + The keys in the file are normally mime types. You can add an application + tag to specialize the choice for an area of the filesystem (using a + localfields specification in mimeconf).
The syntax for the key is + mimetype|tag + + The nouncompforviewmts entry (placed at the top level, outside of the + [view] section) holds a list of mime types that should not be + uncompressed before starting the viewer (if they are found compressed, + for example mydoc.doc.gz). + + The right side of each assignment holds a command to be executed for + opening the file. The following substitutions are performed: + + o %D. Document date. + + o %f. File name. This may be the name of a temporary file if it was + necessary to create one (for example to extract a subdocument from a + container). + + o %F. Original file name. Same as %f except if a temporary file is used. + + o %i. Internal path, for subdocuments of containers. The format depends + on the container type. If this appears in the command line, Recoll + will not create a temporary file to extract the subdocument, expecting + the called application (possibly a script) to be able to handle it. + + o %M. Mime type. + + o %p. Page index. Only significant for a subset of document types, + currently only PDF, Postscript and DVI files. Can be used to start the + editor at the right page for a match or snippet. + + o %s. Search term. The value will only be set for documents with indexed + page numbers (for example PDF). The value will be one of the matched + search terms. It would allow pre-setting the value in the "Find" entry + inside Evince for example, for easy highlighting of the term. + + o %U, %u. URL. + + In addition to the predefined values above, all strings like %(fieldname) + will be replaced by the value of the field named fieldname for the + document. This could be used in combination with field customisation to + help with opening the document. + + 5.4.6. Examples of configuration adjustments + + 5.4.6.1.
Adding an external viewer for a non-indexed type + + Imagine that you have some kind of file which does not have indexable + content, but for which you would like to have a functional Open link in + the result list (when found by file name). The file names end in .blob and + can be displayed by application blobviewer. + + You need two entries in the configuration files for this to work: + + o In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the + following line: + + .blob = application/x-blobapp + + Note that the mime type is made up here, and you could call it + diesel/oil just the same. + + o In $RECOLL_CONFDIR/mimeview under the [view] section, add: + + application/x-blobapp = blobviewer %f + + We are supposing here that blobviewer wants a file name parameter; you + would use %u if it liked URLs better. + + If you just wanted to change the application used by Recoll to display a + mime type which it already knows, you would just need to edit mimeview. + The entries you add in your personal file override those in the central + configuration, which you do not need to alter. mimeview can also be + modified from the GUI. + + 5.4.6.2. Adding indexing support for a new file type + + Let us now imagine that the above .blob files actually contain indexable + text and that you know how to extract it with a command line program. + Getting Recoll to index the files is easy. You need to perform the above + alteration, and also to add data to the mimeconf file (typically in + ~/.recoll/mimeconf): + + o Under the [index] section, add the following line (more about the + rclblob indexing script later): + + application/x-blobapp = exec rclblob + + o Under the [icons] section, you should choose an icon to be displayed + for the files inside the result lists. Icons are normally 64x64 pixels + PNG files which live in /usr/[local/]share/recoll/images.
+ + o Under the [categories] section, you should add the mime type where it + makes sense (you can also create a category). Categories may be used + for filtering in advanced search. + + The rclblob filter should be an executable program or script which exists + inside /usr/[local/]share/recoll/filters. It will be given a file name as + argument and should output the text or HTML contents on the standard + output. + + The filter programming section describes in more detail how to write a + filter.
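As an illustration of the filter contract described above (a file name as argument, text on the standard output), here is a minimal Python sketch of what an rclblob script could look like. Everything about the .blob format is an assumption made for the example: the sketch simply pretends that the indexable text is whatever printable ASCII runs of four or more characters appear in the file. The names PRINTABLE_RUN, extract_text and main are hypothetical and not part of Recoll.

```python
#!/usr/bin/env python3
"""Hypothetical rclblob filter: file name in, plain text on stdout."""
import re
import sys

# Assumption for this sketch: useful text in a .blob file is any run of
# at least 4 printable ASCII characters (similar to the strings utility).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_text(data):
    """Return the printable runs found in data, one per line."""
    runs = PRINTABLE_RUN.findall(data)
    return "\n".join(run.decode("ascii") for run in runs)


def main(path):
    """Read the file named by path and write its text to stdout."""
    with open(path, "rb") as blob:
        sys.stdout.write(extract_text(blob.read()) + "\n")
    return 0


# Only act when called the way the manual describes: exactly one
# file name argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

A real filter would replace extract_text with whatever actually decodes the format, and would need to be executable and installed in the filters directory mentioned above.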