diff --git a/src/INSTALL b/src/INSTALL index b17e58e6..3dca1c9f 100644 --- a/src/INSTALL +++ b/src/INSTALL @@ -2,1076 +2,3 @@ More documentation can be found in the doc/ directory or at http://www.recoll.org - Link: HOME - Link: PREVIOUS - Link: NEXT - - Recoll user manual - Prev Next - - -------------------------------------------------------------------------- - - Chapter 5. Installation and configuration - - Table of Contents - - 5.1. Installing a binary copy - - 5.2. Supporting packages - - 5.3. Building from source - - 5.4. Configuration overview - - 5.1. Installing a binary copy - - There are three types of binary Recoll installations: - - * Through your system normal software distribution framework (ie, - Debian/Ubuntu apt, FreeBSD ports, etc.). - - * From a package downloaded from the Recoll web site. - - * From a prebuilt tree downloaded from the Recoll web site. - - In all cases, the strict software dependancies (ie on Xapian or iconv) - will be automatically satisfied, you should not have to worry about them. - - You will only have to check or install supporting applications for the - file types that you want to index beyond those that are natively processed - by Recoll (text, HTML, email files, and a few others). - - You should also maybe have a look at the configuration section (but this - may not be necessary for a quick test with default parameters). Most - parameters can be more conveniently set from the GUI interface. - -5.1.1. Installing through a package system - - If you use a BSD-type port system or a prebuilt package (DEB, RPM, - manually or through the system software configuration utility), just - follow the usual procedure for your system. - -5.1.2. Installing a prebuilt Recoll - - The unpackaged binary versions on the Recoll web site are just compressed - tar files of a build tree, where only the useful parts were kept - (executables and sample configuration). 
- - The executable binary files are built with a static link to libxapian and - libiconv, to make installation easier (no dependencies). - - After extracting the tar file, you can proceed with installation as if you - had built the package from source (that is, just type make install). The - binary trees are built for installation to /usr/local. - - -------------------------------------------------------------------------- - - Prev Home Next - API Supporting packages - Link: HOME - Link: UP - Link: PREVIOUS - Link: NEXT - - Recoll user manual - Prev Chapter 5. Installation and configuration Next - - -------------------------------------------------------------------------- - - 5.2. Supporting packages - - Recoll uses external applications to index some file types. You need to - install them for the file types that you wish to have indexed (these are - run-time optional dependencies. None is needed for building or running - Recoll except for indexing their specific file type). - - After an indexing pass, the commands that were found missing can be - displayed from the recoll File menu. The list is stored in the missing - text file inside the configuration directory. - - A list of common file types which need external commands follows. Many of - the filters need the iconv command, which is not always listed as a - dependancy. - - Please note that, due to the relatively dynamic nature of this - information, the most up to date version is now kept on the Recoll helper - applications page along with links to the home pages or best - source/patches pages, and misc tips. The list below is not updated often - and may be quite stale. - - For many Linux distributions, most of the commands listed can be installed - from the package repositories. However, the packages are sometimes - outdated, or not the best version for Recoll, so you should take a look at - the Recoll helper applications page if a file type is important to you. 
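As a rough illustration of what "run-time optional" means here, the following Python sketch reports which helper commands cannot be found on the PATH. The command names are a small sample taken from the list in this section. The function name missing_helpers is hypothetical, and this is not how Recoll itself builds its list of missing commands.

```python
import shutil

# A sample of the helper commands named in this section (not exhaustive).
HELPERS = ["pdftotext", "antiword", "catdoc", "unrtf", "xsltproc", "unzip"]


def missing_helpers(commands, which=shutil.which):
    """Return the commands that cannot be found on the PATH.

    The lookup function is a parameter so that the check can be
    exercised without depending on what is installed locally.
    """
    return [cmd for cmd in commands if which(cmd) is None]


if __name__ == "__main__":
    for cmd in missing_helpers(HELPERS):
        print("missing helper:", cmd)
```

A missing command only matters if you want the corresponding file type indexed.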
- - As of Recoll release 1.14, a number of XML-based formats that were handled - by ad hoc filter code now use the xsltproc command, which usually comes - with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg. - - Now for the list: - - * Openoffice files need unzip and xsltproc. - - * PDF files need pdftotext which is part of the Xpdf or Poppler - packages. - - * Postscript files need pstotext. The original version has an issue with - shell character in file names, which is corrected in recent packages. - See the the Recoll helper applications page for more detail. - - * MS Word needs antiword. It is also useful to have wvWare installed as - it may be be used as a fallback for some files which antiword does not - handle. - - * MS Excel and PowerPoint need catdoc. - - * MS Open XML (docx) needs xsltproc. - - * Wordperfect files need wpd2html from the libwpd (or libwpd-tools on - Ubuntu) package. - - * RTF files need unrtf, which, in its standard version, has much trouble - with non-western character sets. Check the Recoll helper applications - page. - - * TeX files need untex or detex. Check the Recoll helper applications - page for sources if it's not packaged for your distribution. - - * dvi files need dvips. - - * djvu files need djvutxt and djvused from the DjVuLibre package. - - * Audio files: Recoll releases before 1.13 used the id3info command from - the id3lib package to extract mp3 tag information, metaflac (standard - flac tools) for flac files, and ogginfo (vorbis tools) for ogg files. - Releases 1.14 and later use a single Python filter based on mutagen - for all audio file types. - - * Pictures: Recoll uses the Exiftool Perl package to extract tag - information. Most image file formats are supported. Note that there - may not be much interest in indexing the technical tags (image size, - aperture, etc.). This is only of interest if you store personal tags - or textual descriptions inside the image files. 
- - * chm: files in microsoft help format need Python and the pychm module - (which needs chmlib). - - * ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar - module. icalendar is not needed for newer versions, which use internal - code. - - * Zip archives need Python (and the standard zipfile module). - - * Rar archives need Python, the rarfile Python module and the unrar - utility. - - * Midi karaoke files need Python and the Midi module - - * Konqueror webarchive format with Python (uses the Tarfile module). - - * mimehtml web archive format (support based on the email filter, which - introduces some mild weirdness, but still usable). - - Text, HTML, email folders, and Scribus files are processed internally. Lyx - is used to index Lyx files. Many filters need iconv and the standard sed - and awk. - - -------------------------------------------------------------------------- - - Prev Home Next - Installation and configuration Up Building from source - Link: HOME - Link: UP - Link: PREVIOUS - Link: NEXT - - Recoll user manual - Prev Chapter 5. Installation and configuration Next - - -------------------------------------------------------------------------- - - 5.3. Building from source - -5.3.1. Prerequisites - - C++ compiler. Up to Recoll version 1.13.04, its absence can manifest - itself by strange messages about a missing iconv_open. - - Development files for Xapian core. - - Important: If you are building Xapian for an older CPU (before Pentium 4 - or Athlon 64), you need to add the --disable-sse flag to the configure - command. Else all Xapian application will crash with an illegal - instruction error. - - Development files for Qt . - - Development files for X11 and zlib. - - Check the Recoll download page for up to date version information. - - You will most probably be able to find a binary package for Qt for your - system. You may have to compile Xapian but this is not difficult (if you - are using FreeBSD, there is a port). 
- - You may also need libiconv. Recoll currently uses version 1.9 (this should - not be critical). On Linux systems, the iconv interface is part of libc - and you should not need to do anything special. - -5.3.2. Building - - Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris, most - versions after 2005 should be ok, maybe some older ones too (Solaris 8 is - ok). If you build on another system, and need to modify things, I would - very much welcome patches. - - Depending on the Qt 3 configuration on your system, you may have to set - the QTDIR and QMAKESPECS variables in your environment: - - * QTDIR should point to the directory above the one that holds the qt - include files (ie: if qt.h is /usr/local/qt/include/qt.h, QTDIR should - be /usr/local/qt). - - * QMAKESPECS should be set to the name of one of the Qt mkspecs - sub-directories (ie: linux-g++). - - On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS - is not needed because there is a default link in mkspecs/. - - Neither QTDIR nor QMAKESPECS should be needed with Qt 4, configuration - details are entirely determined by qmake (which is quite often installed - as qmake-qt4). - - Configure options: - - * --without-aspell will disable the code for phonetic matching of search - terms. - - * --with-fam or --with-inotify will enable the code for real time - indexing. Inotify support is enabled by default on recent Linux - systems. - - * --disable-webkit is available from version 1.17 to implement the - result list with a Qt QTextBrowser instead of a WebKit widget if you - do not or can't depend on the latter. - - * --enable-xattr will enable code to fetch data from file extended - attributes. This is only useful is some application stores data in - there, and also needs some simple configuration (see comments in the - fields configuration file). - - * --enable-camelcase will enable splitting camelCase words. 
This is not - enabled by default as it has the unfortunate side-effect of making - some phrase searches quite confusing: ie, "MySQL manual" would be - matched by "MySQL manual" and "my sql manual" but not "mysql manual" - (only inside phrase searches). - - * --with-file-command Specify the version of the 'file' command to use - (ie: --with-file-command=/usr/local/bin/file). Can be useful to enable - the gnu version on systems where the native one is bad. - - * --disable-qtgui Disable the Qt interface. Will allow building the - indexer and the command line search program in absence of a Qt - environment. - - * --disable-x11mon Disable X11 connection monitoring inside recollindex. - Together with --disable-qtgui, this allows building recoll without Qt - and X11. - - * Of course the usual autoconf configure options, like --prefix apply. - - Normal procedure: - - cd recoll-xxx - configure - make - (practices usual hardship-repelling invocations) - - - There is little auto-configuration. The configure script will mainly link - one of the system-specific files in the mk directory to mk/sysconf. If - your system is not known yet, it will tell you as much, and you may want - to manually copy and modify one of the existing files (the new file name - should be the output of uname -s). - -5.3.3. Installation - - Either type make install or execute recollinstall prefix, in the root of - the source tree. This will copy the commands to prefix/bin and the sample - configuration files, scripts and other shared data to prefix/share/recoll. - - If the installation prefix given to recollinstall is different from either - the system default or the value which was specified when executing - configure (as in configure --prefix /some/path), you will have to set the - RECOLL_DATADIR environment variable to indicate where the shared data is - to be found (ie for (ba)sh: export - RECOLL_DATADIR=/some/path/share/recoll). - - You can then proceed to configuration. 
- - -------------------------------------------------------------------------- - - Prev Home Next - Supporting packages Up Configuration overview - Link: HOME - Link: UP - Link: PREVIOUS - - Recoll user manual - Prev Chapter 5. Installation and configuration - - -------------------------------------------------------------------------- - - 5.4. Configuration overview - - Most of the parameters specific to the recoll GUI are set through the - Preferences menu and stored in the standard Qt place - ($HOME/.config/Recoll.org/recoll.conf). You probably do not want to edit - this by hand. - - Recoll indexing options are set inside text configuration files located in - a configuration directory. There can be several such directories, each of - which define the parameters for one index. - - The configuration files can be edited by hand or through the Index - configuration dialog (Preferences menu). The GUI tool will try to respect - your formatting and comments as much as possible, so it is quite possible - to use both ways. - - The most accurate documentation for the configuration parameters is given - by comments inside the default files, and we will just give a general - overview here. - - For each index, there are two sets of configuration files. System-wide - configuration files are kept in a directory named like - /usr/[local/]share/recoll/examples, and define default values, shared by - all indexes. For each index, a parallel set of files defines the - customized parameters. - - The default location of the configuration is the .recoll directory in your - home. Most people will only use this directory. - - This location can be changed, or others can be added with the - RECOLL_CONFDIR environment variable or the -c option parameter to recoll - and recollindex. - - If the .recoll directory does not exist when recoll or recollindex are - started, it will be created with a set of empty configuration files. 
- recoll will give you a chance to edit the configuration file before - starting indexing. recollindex will proceed immediately. To avoid - mistakes, the automatic directory creation will only occur for the default - location, not if -c or RECOLL_CONFDIR were used (in the latter cases, you - will have to create the directory). - - All configuration files share the same format. For example, a short - extract of the main configuration file might look as follows: - - # Space-separated list of directories to index. - topdirs = ~/docs /usr/share/doc - - [~/somedirectory-with-utf8-txt-files] - defaultcharset = utf-8 - - - There are three kinds of lines: - - * Comment (starts with #) or empty. - - * Parameter affectation (name = value). - - * Section definition ([somedirname]). - - Depending on the type of configuration file, section definitions either - separate groups of parameters or allow redefining some parameters for a - directory sub-tree. They stay in effect until another section definition, - or the end of file, is encountered. Some of the parameters used for - indexing are looked up hierarchically from the current directory location - upwards. Not all parameters can be meaningfully redefined, this is - specified for each in the next section. - - When found at the beginning of a file path, the tilde character (~) is - expanded to the name of the user's home directory, as a shell would do. - - White space is used for separation inside lists. List elements with - embedded spaces can be quoted using double-quotes. - - Encoding issues. Most of the configuration parameters are plain ASCII. Two - particular sets of values may cause encoding issues: - - * File path parameters may contain non-ascii characters and should use - the exact same byte values as found in the file system directory. - Usually, this means that the configuration file should use the system - default locale encoding. - - * The unac_except_trans parameter should be encoded in UTF-8. 
If your - system locale is not UTF-8, and you need to also specify non-ascii - file paths, this poses a difficulty because common text editors cannot - handle multiple encodings in a single file. In this relatively - unlikely case, you can edit the configuration file as two separate - text files with appropriate encodings, and concatenate them to create - the complete configuration. - -5.4.1. Main configuration file - - recoll.conf is the main configuration file. It defines things like what to - index (top directories and things to ignore), and the default character - set to use for document types which do not specify it internally. - - The default configuration will index your home directory. If this is not - appropriate, start recoll to create a blank configuration, click Cancel, - and edit the configuration file before restarting the command. This will - start the initial indexing, which may take some time. - - Most of the following parameters can be changed from the Index - Configuration menu in the recoll interface. Some can only be set by - editing the configuration file. - - 5.4.1.1. Parameters affecting what documents we index: - - topdirs - - Specifies the list of directories or files to index (recursively - for directories). You can use symbolic links as elements of this - list. See the followLinks option about following symbolic links - found under the top elements (not followed by default). - - skippedNames - - A space-separated list of patterns for names of files or - directories that should be completely ignored. The list defined in - the default file is: - - skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \ - *~ .beagle .git .hg .bzr loop.ps .xsession-errors \ - .recoll* xapiandb recollrc recoll.conf - - The list can be redefined at any sub-directory in the indexed - area. - - The top-level directories are not affected by this list (that is, - a directory in topdirs might match and would still be indexed). 
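To make the effect of the default skippedNames list concrete, here is a minimal Python sketch that tests base names against it. It assumes the patterns are ordinary shell-style wildcards and uses Python's fnmatch module as a stand-in; Recoll's own matching code may differ in detail. The helper name is_skipped is hypothetical.

```python
from fnmatch import fnmatchcase

# Default skippedNames list as quoted above.
SKIPPED_NAMES = (
    "#* bin CVS Cache cache* caughtspam tmp .thumbnails .svn "
    "*~ .beagle .git .hg .bzr loop.ps .xsession-errors "
    ".recoll* xapiandb recollrc recoll.conf"
).split()


def is_skipped(name, patterns=SKIPPED_NAMES):
    """Return True if a file or directory base name matches any pattern."""
    return any(fnmatchcase(name, pat) for pat in patterns)


if __name__ == "__main__":
    for name in ("notes.txt", "notes.txt~", ".git", ".thunderbird"):
        print(name, "skipped" if is_skipped(name) else "kept")
```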
- - The list in the default configuration does not exclude hidden - directories (names beginning with a dot), which means that it may - index quite a few things that you do not want. On the other hand, - email user agents like thunderbird usually store messages in - hidden directories, and you probably want this indexed. One - possible solution is to have .* in skippedNames, and add things - like ~/.thunderbird or ~/.evolution in topdirs. - - Not even the file names are indexed for patterns in this list. See - the recoll_noindex variable in mimemap for an alternative approach - which indexes the file names. - - skippedPaths and daemSkippedPaths - - A space-separated list of patterns for paths of files or - directories that should be skipped. There is no default in the - sample configuration file, but the code always adds the - configuration and database directories in there. - - skippedPaths is used both by batch and real time indexing. - daemSkippedPaths can be used to specify things that should be - indexed at startup, but not monitored. - - Example of use for skipping text files only in a specific - directory: - - skippedPaths = ~/somedir/..txt - - - skippedPathsFnmPathname - - The values in the *skippedPaths variables are matched by default - with fnmatch(3), with the FNM_PATHNAME and FNM_LEADING_DIR flags. - This means that '/' characters must be matched explicitely. You - can set skippedPathsFnmPathname to 0 to disable the use of - FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3). - - followLinks - - Specifies if the indexer should follow symbolic links while - walking the file tree. The default is to ignore symbolic links to - avoid multiple indexing of linked files. No effort is made to - avoid duplication when this option is set to true. This option can - be set individually for each of the topdirs members by using - sections. It can not be changed below the topdirs level. 
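Returning to the skippedPathsFnmPathname entry above, the difference made by FNM_PATHNAME can be sketched in Python. Python's fnmatch module has no flags, so this is an imitation built on regular expressions, not a call to fnmatch(3). It handles only the * and ? wildcards (no bracket expressions), and the function name path_match is hypothetical.

```python
import re


def path_match(path, pattern, pathname=True):
    """Glob match supporting only '*' and '?'.

    With pathname=True, wildcards never match '/', imitating
    FNM_PATHNAME. A match on a leading directory part of the path
    also counts, imitating FNM_LEADING_DIR.
    """
    star, qmark = ("[^/]*", "[^/]") if pathname else (".*", ".")
    regex = "".join(
        star if ch == "*" else qmark if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    # Accept an exact match, or a match followed by '/' and anything.
    full = regex + r"(?:/.*)?"
    return re.fullmatch(full, path, flags=re.DOTALL) is not None


if __name__ == "__main__":
    print(path_match("/dir1/dir2/dir3", "/*/dir3"))
    print(path_match("/dir1/dir2/dir3", "/*/dir3", pathname=False))
```

With pathname=True the pattern /*/dir3 does not match /dir1/dir2/dir3, because * stops at the first slash. With pathname=False it does, which is the behaviour the text describes for skippedPathsFnmPathname set to 0.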
- - indexedmimetypes - - Recoll normally indexes any file which it knows how to read. This - list lets you restrict the indexed mime types to what you specify. - If the variable is unspecified or the list empty (the default), - all supported types are processed. - - compressedfilemaxkbs - - Size limit for compressed (.gz or .bz2) files. These need to be - decompressed in a temporary directory for identification, which - can be very wasteful if 'uninteresting' big compressed files are - present. Negative means no limit, 0 means no processing of any - compressed file. Defaults to -1. - - textfilemaxmbs - - Maximum size for text files. Very big text files are often - uninteresting logs. Set to -1 to disable (default 20MB). - - textfilepagekbs - - If set to other than -1, text files will be indexed as multiple - documents of the given page size. This may be useful if you do - want to index very big text files as it will both reduce memory - usage at index time and help with loading data to the preview - window. A size of a few megabytes would seem reasonable (default: - 1MB). - - membermaxkbs - - This defines the maximum size in kilobytes for an archive member - (zip, tar or rar at the moment). Bigger entries will be skipped. - - indexallfilenames - - Recoll indexes file names in a special section of the database to - allow specific file names searches using wild cards. This - parameter decides if file name indexing is performed only for - files with mime types that would qualify them for full text - indexing, or for all files inside the selected subtrees, - independently of mime type. - - usesystemfilecommand - - Decide if we use the file -i system command as a final step for - determining the mime type for a file (the main procedure uses - suffix associations as defined in the mimemap file). This can be - useful for files with suffix-less names, but it will also cause - the indexing of many bogus "text" files. 
- - processbeaglequeue - - If this is set, process the directory where Beagle Web browser - plugins copy visited pages for indexing. Of course, Beagle MUST - NOT be running, else things will behave strangely. - - beaglequeuedir - - The path to the Beagle indexing queue. This is hard-coded in the - Beagle plugin as ~/.beagle/ToIndex so there should be no need to - change it. - - 5.4.1.2. Parameters affecting how we generate terms: - - Changing some of these parameters will imply a full reindex. Also, when - using multiple indexes, it may not make sense to search indexes that don't - share the values for these parameters, because they usually affect both - search and index operations. - - indexStripChars - - Decide if we strip characters of diacritics and convert them to - lower-case before terms are indexed. If we don't, searches - sensitive to case and diacritics can be performed, but the index - will be bigger, and some marginal weirdness may sometimes occur. - The default is a stripped index (indexStripChars = 1) for now. - When using multiple indexes for a search, this parameter must be - defined identically for all. Changing the value implies an index - reset. - - maxTermExpand - - Maximum expansion count for a single term (e.g.: when using - wildcards). The default of 10000 is reasonable and will avoid - queries that appear frozen while the engine is walking the term - list. - - maxXapianClauses - - Maximum number of elementary clauses we can add to a single Xapian - query. In some cases, the result of term expansion can be - multiplicative, and we want to avoid using excessive memory. The - default of 100 000 should be both high enough in most cases and - compatible with current typical hardware configurations. - - nonumbers - - If this set to true, no terms will be generated for numbers. For - example "123", "1.5e6", 192.168.1.4, would not be indexed - ("value123" would still be). 
Numbers are often quite interesting - to search for, and this should probably not be set except for - special situations, ie, scientific documents with huge amounts of - numbers in them. This can only be set for a whole index, not for a - subtree. - - nocjk - - If this set to true, specific east asian (Chinese Korean Japanese) - characters/word splitting is turned off. This will save a small - amount of cpu if you have no CJK documents. If your document base - does include such text but you are not interested in searching it, - setting nocjk may be a significant time and space saver. - - cjkngramlen - - This lets you adjust the size of n-grams used for indexing CJK - text. The default value of 2 is probably appropriate in most - cases. A value of 3 would allow more precision and efficiency on - longer words, but the index will be approximately twice as large. - - indexstemminglanguages - - A list of languages for which the stem expansion databases will be - built. See recollindex(1) or use the recollindex -l command for - possible values. You can add a stem expansion database for a - different language by using recollindex -s, but it will be deleted - during the next indexing. Only languages listed in the - configuration file are permanent. - - defaultcharset - - The name of the character set used for files that do not contain a - character set definition (ie: plain text files). This can be - redefined for any sub-directory. If it is not set at all, the - character set used is the one defined by the nls environment ( - LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set. - - unac_except_trans - - This is a list of characters, encoded in UTF-8, which should be - handled specially when converting text to unaccented lowercase. - For example, in Swedish, the letter a with diaeresis has full - alphabet citizenship and should not be turned into an a. Each - element in the space-separated list has the special character as - first element and the translation following. 
The handling of both - the lowercase and upper-case versions of a character should be - specified, as appartenance to the list will turn-off both standard - accent and case processing. Example for Swedish: - - unac_except_trans = aaaa AAaa a:a: A:a: o:o: O:o: - - - Note that the translation is not limited to a single character, - you could very well have something like u:ue in the list. - - The default value set for unac_except_trans can't be listed here - because I have trouble with SGML and UTF-8, but it only contains - ligature decompositions: german ss, oe, ae, fi, fl. - - This parameter can't be defined for subdirectories, it is global, - because there is no way to do otherwise when querying. If you have - document sets which would need different values, you will have to - index and query them separately. - - maildefcharset - - This can be used to define the default character set specifically - for email messages which don't specify it. This is mainly useful - for readpst (libpst) dumps, which are utf-8 but do not say so. - - localfields - - This allows setting fields for all documents under a given - directory. Typical usage would be to set an "rclaptg" field, to be - used in mimeview to select a specific viewer. If several fields - are to be set, they should be separated with a colon (':') - character (which there is currently no way to escape). Ie: - localfields= rclaptg=gnus:other = val, then select specifier - viewer with mimetype|tag=... in mimeview. - - 5.4.1.3. Parameters affecting where and how we store things: - - dbdir - - The name of the Xapian data directory. It will be created if - needed when the index is initialized. If this is not an absolute - path, it will be interpreted relative to the configuration - directory. The value can have embedded spaces but starting or - trailing spaces will be trimmed. You cannot use quotes here. - - idxstatusfile - - The name of the scratch file where the indexer process updates its - status. 
Default: idxstatus.txt inside the configuration directory. - - maxfsoccuppc - - Maximum file system occupation before we stop indexing. The value - is a percentage, corresponding to what the "Capacity" df output - column shows. The default value is 0, meaning no checking. - - mboxcachedir - - The directory where mbox message offsets cache files are held. - This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful - to share a directory between different configurations. - - mboxcacheminmbs - - The minimum mbox file size over which we cache the offsets. There - is really no sense in caching offsets for small files. The default - is 5 MB. - - webcachedir - - This is only used by the Beagle web browser plugin indexing code, - and defines where the cache for visited pages will live. Default: - $RECOLL_CONFDIR/webcache - - webcachemaxmbs - - This is only used by the Beagle web browser plugin indexing code, - and defines the maximum size for the web page cache. Default: 40 - MB. - - idxflushmb - - Threshold (megabytes of new text data) where we flush from memory - to disk index. Setting this can help control memory usage. A value - of 0 means no explicit flushing, letting Xapian use its own - default, which is flushing every 10000 (or XAPIAN_FLUSH_THRESHOLD) - documents, which gives little memory usage control, as memory - usage depends on average document size. The default value is 10. - - 5.4.1.4. Miscellaneous parameters: - - autodiacsens - - IF the index is not stripped, decide if we automatically trigger - diacritics sensitivity if the search term has accented characters - (not in unac_except_trans). Else you need to use the query - language and the D modifier to specify diacritics sensitivity. - Default is no. - - autocasesens - - IF the index is not stripped, decide if we automatically trigger - character case sensitivity if the search term has upper-case - characters in any but the first position. 
Else you need to use the - query language and the C modifier to specify character-case - sensitivity. Default is yes. - - loglevel,daemloglevel - - Verbosity level for recoll and recollindex. A value of 4 lists - quite a lot of debug/information messages. 2 only lists errors. - The daemversion is specific to the indexing monitor daemon. - - logfilename, daemlogfilename - - Where the messages should go. 'stderr' can be used as a special - value, and is the default. The daemversion is specific to the - indexing monitor daemon. - - mondelaypatterns - - This allows specify wildcard path patterns (processed with - fnmatch(3) with 0 flag), to match files which change too often and - for which a delay should be observed before re-indexing. This is a - space-separated list, each entry being a pattern and a time in - seconds, separated by a colon. You can use double quotes if a path - entry contains white space. Example: - - mondelaypatterns = *.log:20 "this one has spaces*:10" - - - monixinterval - - Minimum interval (seconds) for processing the indexing queue. The - real time monitor does not process each event when it comes in, - but will wait this time for the queue to accumulate to diminish - overhead and in order to aggregate multiple events to the same - file. Default 30 S. - - monauxinterval - - Period (in seconds) at which the real time monitor will regenerate - the auxiliary databases (spelling, stemming) if needed. The - default is one hour. - - monioniceclass, monioniceclassdata - - These allow defining the ionice class and data used by the indexer - (default class 3, no data). - - filtermaxseconds - - Maximum filter execution time, after which it is aborted. Some - postscript programs just loop... - - filtersdir - - A directory to search for the external filter scripts used to - index some types of files. The value should not be changed, except - if you want to modify one of the default scripts. The value can be - redefined for any sub-directory. 
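The mondelaypatterns value format described earlier in this section (space-separated entries, optional double quotes, pattern and delay separated by a colon) can be split apart as in the following Python sketch. It illustrates the format only and is not Recoll's parser. The function name parse_mondelaypatterns is hypothetical, and taking the delay from after the last colon is an assumption of this sketch.

```python
import shlex


def parse_mondelaypatterns(value):
    """Split a mondelaypatterns value into (pattern, seconds) pairs."""
    pairs = []
    # shlex.split honours double quotes around entries with spaces.
    for entry in shlex.split(value):
        # Assumption: the delay is whatever follows the last colon.
        pattern, _, seconds = entry.rpartition(":")
        pairs.append((pattern, int(seconds)))
    return pairs


if __name__ == "__main__":
    example = '*.log:20 "this one has spaces*:10"'
    for pattern, seconds in parse_mondelaypatterns(example):
        print(pattern, seconds)
```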
- - iconsdir - - The name of the directory where recoll result list icons are - stored. You can change this if you want different images. - - idxabsmlen - - Recoll stores an abstract for each indexed file inside the - database. The text can come from an actual 'abstract' section in - the document or will just be the beginning of the document. It is - stored in the index so that it can be displayed inside the result - lists without decoding the original file. The idxabsmlen parameter - defines the size of the stored abstract. The default value is 250 - bytes. The search interface gives you the choice to display this - stored text or a synthetic abstract built by extracting text - around the search terms. If you always prefer the synthetic - abstract, you can reduce this value and save a little space. - - aspellLanguage - - Language definitions to use when creating the aspell dictionary. - The value must match a set of aspell language definition files. - You can type "aspell config" to see where these are installed - (look for data-dir). The default if the variable is not set is to - use your desktop national language environment to guess the value. - - noaspell - - If this is set, the aspell dictionary generation is turned off. - Useful for cases where you don't need the functionality or when it - is unusable because aspell crashes during dictionary generation. - - mhmboxquirks - - This allows definining location-related quirks for the mailbox - handler. Currently only the tbird flag is defined, and it should - be set for directories which hold Thunderbird data, as their - folder format is weird. - -5.4.2. The fields file - - This file contains information about dynamic fields handling in Recoll. - Some very basic fields have hard-wired behaviour, and, mostly, you should - not change the original data inside the fields file. But you can create - custom fields fitting your data and handle them just like they were native - ones. 
- - The fields file has several sections, which each define an aspect of - fields processing. Quite often, you'll have to modify several sections to - obtain the desired behaviour. - - We will only give a short description here, you should refer to the - comments inside the file for more detailed information. - - Field names should be lowercase alphabetic ASCII. - - [prefixes] - - A field becomes indexed (searchable) by having a prefix defined in - this section. - - [stored] - - A field becomes stored (displayable inside results) by having its - name listed in this section (typically with an empty value). - - [aliases] - - This section defines lists of synonyms for the canonical names - used inside the [prefixes] and [stored] sections - - filter-specific sections - - Some filters may need specific configuration for handling fields. - Only the email message filter currently has such a section (named - [mail]). It allows indexing arbitrary email headers in addition to - the ones indexed by default. Other such sections may appear in the - future. - - Here follows a small example of a personal fields file. This would extract - a specific email header and use it as a searchable field, with data - displayable inside result lists. (Side note: as the email filter does no - decoding on the values, only plain ascii headers can be indexed, and only - the first occurrence will be used for headers that occur several times). - - [prefixes] - # Index mailmytag contents (with the given prefix) - mailmytag = XMTAG - - [stored] - # Store mailmytag inside the document data record (so that it can be - # displayed - as %(mailmytag) - in result lists). - mailmytag = - - [mail] - # Extract the X-My-Tag mail header, and use it internally with the - # mailmytag field name - x-my-tag = mailmytag - -5.4.3. The mimemap file - - mimemap specifies the file name extension to mime type mappings. 
- - For file names without an extension, or with an unknown one, the system's - file -i command will be executed to determine the mime type (this can be - switched off inside the main configuration file). - - The mappings can be specified on a per-subtree basis, which may be useful - in some cases. Example: gaim logs have a .txt extension but should be - handled specially, which is possible because they are usually all located - in one place. - - mimemap also has a recoll_noindex variable which is a list of suffixes. - Matching files will be skipped (which avoids unnecessary decompressions or - file executions). This is partially redundant with skippedNames in the - main configuration file, with a few differences: it will not affect - directories, it cannot be made dependant on the file-system location (it - is a configuration-wide parameter), and the file names will still be - indexed (not even the file names are indexed for patterns in skippedNames. - recoll_noindex is used mostly for things known to be unindexable by a - given Recoll version. Having it there avoids cluttering the more - user-oriented and locally customized skippedNames. - -5.4.4. The mimeconf file - - mimeconf specifies how the different mime types are handled for indexing, - and which icons are displayed in the recoll result lists. - - Changing the parameters in the [index] section is probably not a good idea - except if you are a Recoll developer. - - The [icons] section allows you to change the icons which are displayed by - recoll in the result lists (the values are the basenames of the png images - inside the iconsdir directory (specified in recoll.conf). - -5.4.5. The mimeview file - - mimeview specifies which programs are started when you click on an Open - link in a result list. Ie: HTML is normally displayed using firefox, but - you may prefer Konqueror, your openoffice.org program might be named - oofice instead of openoffice etc. 
- - Changes to this file can be done by direct editing, or through the recoll - GUI preferences dialog. - - If Use desktop preferences to choose document editor is checked in the - Recoll GUI preferences, all mimeview entries will be ignored except the - one labelled application/x-all (which is set to use xdg-open by default). - - In this case, the xallexcepts top level variable defines a list of mime - type exceptions which will be processed according to the local entries - instead of being passed to the desktop. This is so that specific Recoll - options such as a page number or a search string can be passed to - applications that support them, such as the evince viewer. - - As for the other configuration files, the normal usage is to have a - mimeview inside your own configuration directory, with just the - non-default entries, which will override those from the central - configuration file. - - All viewer definition entries must be placed under a [view] section. - - The keys in the file are normally mime types. You can add an application - tag to specialize the choice for an area of the filesystem (using a - localfields specification in mimeconf). The syntax for the key is - mimetype|tag - - The nouncompforviewmts entry, (placed at the top level, outside of the - [view] section), holds a list of mime types that should not be - uncompressed before starting the viewer (if they are found compressed, ie: - mydoc.doc.gz). - - The right side of each assignment holds a command to be executed for - opening the file. The following substitutions are performed: - - * %D. Document date - - * %f. File name. This may be the name of a temporary file if it was - necessary to create one (ie: to extract a subdocument from a - container). - - * %F. Original file name. Same as %f except if a temporary file is used. - - * %i. Internal path, for subdocuments of containers. The format depends - on the container type. 
If this appears in the command line, Recoll - will not create a temporary file to extract the subdocument, expecting - the called application (possibly a script) to be able to handle it. - - * %M. Mime type - - * %p. Page index. Only significant for a subset of document types, - currently only PDF, Postscript and DVI files. Can be used to start the - editor at the right page for a match or snippet. - - * %s. Search term. The value will only be set for documents with indexed - page numbers (ie: PDF). The value will be one of the matched search - terms. It would allow pre-setting the value in the "Find" entry inside - Evince for example, for easy highlighting of the term. - - * %U, %u. Url. - - In addition to the predefined values above, all strings like %(fieldname) - will be replaced by the value of the field named fieldname for the - document. This could be used in combination with field customisation to - help with opening the document. - -5.4.6. Examples of configuration adjustments - - 5.4.6.1. Adding an external viewer for an non-indexed type - - Imagine that you have some kind of file which does not have indexable - content, but for which you would like to have a functional Open link in - the result list (when found by file name). The file names end in .blob and - can be displayed by application blobviewer. - - You need two entries in the configuration files for this to work: - - * In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the - following line: - - .blob = application/x-blobapp - - Note that the mime type is made up here, and you could call it - diesel/oil just the same. - * In $RECOLL_CONFDIR/mimeview under the [view] section, add: - - application/x-blobapp = blobviewer %f - - We are supposing that blobviewer wants a file name parameter here, you - would use %u if it liked URLs better. - - If you just wanted to change the application used by Recoll to display a - mime type which it already knows, you would just need to edit mimeview. 
- The entries you add in your personal file override those in the central - configuration, which you do not need to alter. mimeview can also be - modified from the Gui. - - 5.4.6.2. Adding indexing support for a new file type - - Let us now imagine that the above .blob files actually contain indexable - text and that you know how to extract it with a command line program. - Getting Recoll to index the files is easy. You need to perform the above - alteration, and also to add data to the mimeconf file (typically in - ~/.recoll/mimeconf): - - * Under the [index] section, add the following line (more about the - rclblob indexing script later): - - application/x-blobapp = exec rclblob - - * Under the [icons] section, you should choose an icon to be displayed - for the files inside the result lists. Icons are normally 64x64 pixels - PNG files which live in /usr/[local/]share/recoll/images. - - * Under the [categories] section, you should add the mime type where it - makes sense (you can also create a category). Categories may be used - for filtering in advanced search. - - The rclblob filter should be an executable program or script which exists - inside /usr/[local/]share/recoll/filters. It will be given a file name as - argument and should output the text or html contents on the standard - output. - - The filter programming section describes in more detail how to write a - filter. - - -------------------------------------------------------------------------- - - Prev Home - Building from source Up diff --git a/src/README b/src/README index 98a5b9a4..7c658afe 100644 --- a/src/README +++ b/src/README @@ -14,8 +14,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or installation and use of the Recoll application. It currently describes Recoll 1.18. 
- [ Split HTML / Single HTML ] - ---------------------------------------------------------------------- Table of Contents @@ -54,7 +52,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or 2.3.3. The index configuration GUI - 2.4. Using Beagle WEB browser plugins + 2.4. Index WEB visited page history 2.5. Periodic indexing @@ -77,22 +75,24 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or 3.1.3. The result table - 3.1.4. The preview window + 3.1.4. Displaying thumbnails - 3.1.5. Complex/advanced search + 3.1.5. The preview window - 3.1.6. The term explorer tool + 3.1.6. Complex/advanced search - 3.1.7. Multiple indexes + 3.1.7. The term explorer tool - 3.1.8. Document history + 3.1.8. Multiple indexes - 3.1.9. Sorting search results and collapsing + 3.1.9. Document history + + 3.1.10. Sorting search results and collapsing duplicates - 3.1.10. Search tips, shortcuts + 3.1.11. Search tips, shortcuts - 3.1.11. Customizing the search interface + 3.1.12. Customizing the search interface 3.2. Searching with the KDE KIO slave @@ -126,11 +126,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or 4.1.1. Simple filters - 4.1.2. Telling Recoll about the filter + 4.1.2. "Multiple" filters - 4.1.3. Filter HTML output + 4.1.3. Telling Recoll about the filter - 4.1.4. Page numbers + 4.1.4. Filter HTML output + + 4.1.5. Page numbers 4.2. Field data processing @@ -172,9 +174,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or 5.4.6. Examples of configuration adjustments - ---------------------------------------------------------------------- - - Chapter 1. Introduction +Chapter 1. Introduction 1.1. Giving it a try @@ -192,8 +192,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or applications for document types that need them (for example antiword for Microsoft Word files). 
- ---------------------------------------------------------------------- - 1.2. Full text search Recoll is a full text search application. Full text search applications @@ -228,8 +226,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or explorer) which will let you explore the set of index terms along different modes. - ---------------------------------------------------------------------- - 1.3. Recoll overview Recoll uses the Xapian information retrieval library as its storage and @@ -311,9 +307,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Python programming interface, a KDE KIO slave module, and a Ubuntu Unity Lens module. - ---------------------------------------------------------------------- - - Chapter 2. Indexing +Chapter 2. Indexing 2.1. Introduction @@ -327,17 +321,15 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or The following sections give an overview of different aspects of the indexing processes and configuration, with links to detailed sections. - ---------------------------------------------------------------------- - 2.1.1. Indexing modes Recoll indexing can be performed along two different modes: - * Periodic (or batch) indexing: indexing takes place at discrete times, + o Periodic (or batch) indexing: indexing takes place at discrete times, by executing the recollindex command. The typical usage is to have a nightly indexing run programmed into your cron file. - * Real time indexing: indexing takes place as soon as a file is created + o Real time indexing: indexing takes place as soon as a file is created or changed. recollindex runs as a daemon and uses a file system alteration monitor such as inotify, Fam or Gamin to detect file changes. @@ -349,9 +341,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or significant system resources. 
The choice of method and the parameters used can be configured from the - recoll GUI: Preferences->Indexing schedule - - ---------------------------------------------------------------------- + recoll GUI: Preferences -> Indexing schedule 2.1.2. Configurations, multiple indexes @@ -382,8 +372,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or each other. When multiple indexes need to be used for a single search, some parameters should be consistent among the configurations. - ---------------------------------------------------------------------- - 2.1.3. Document types Recoll knows about quite a few different document types. The parameters @@ -404,12 +392,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or applications for preprocessing. The list is in the installation section. After every indexing operation, Recoll updates a list of commands that would be needed for indexing existing files types. This list can be - displayed by selecting the menu option File->Show Missing Helpers in the + displayed by selecting the menu option File -> Show Missing Helpers in the recoll GUI. It is stored in the missing text file inside the configuration directory. - ---------------------------------------------------------------------- - 2.1.4. Recovery In the rare case where the index becomes corrupted (which can signal @@ -419,15 +405,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or recollindex with the -z option, which will reset the database before indexing. - ---------------------------------------------------------------------- - 2.2. Index storage The default location for the index data is the xapiandb subdirectory of the Recoll configuration directory, typically $HOME/.recoll/xapiandb/. 
This can be changed via two different methods (with different purposes): - * You can specify a different configuration directory by setting the + o You can specify a different configuration directory by setting the RECOLL_CONFDIR environment variable, or using the -c option to the Recoll commands. This method would typically be used to index different areas of the file system to different indexes. For example, @@ -445,7 +429,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or allows you to tailor multiple configurations and indexes to handle whatever subset of the available data you wish to make searchable. - * For a given configuration directory, you can specify a non-default + o For a given configuration directory, you can specify a non-default storage location for the index by setting the dbdir parameter in the configuration file (see the configuration section). This method would mainly be of use if you wanted to keep the configuration directory in @@ -468,8 +452,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or completely rebuilt by an index run (as long as the original documents exist), and it can always be destroyed safely. - ---------------------------------------------------------------------- - 2.2.1. Xapian index formats Xapian versions usually support several formats for index storage. A given @@ -486,8 +468,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or you will have to delete all files inside the index directory (typically ~/.recoll/xapiandb) before starting the indexing. - ---------------------------------------------------------------------- - 2.2.2. Security aspects The Recoll index does not hold copies of the indexed documents. 
But it @@ -504,8 +484,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or need for your index, set the directory and files access modes appropriately, and also maybe adjust the umask used during index updates. - ---------------------------------------------------------------------- - 2.3. Index configuration Variables set inside the Recoll configuration files control which areas of @@ -534,8 +512,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or depending on the treatment of character case and diacritics. The next section describes the two types in more detail. - ---------------------------------------------------------------------- - 2.3.1. Multiple indexes Multiple Recoll indexes can be created by using several configuration @@ -575,8 +551,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or are other constraints. Most of the relevant parameters are described in the linked section. - ---------------------------------------------------------------------- - 2.3.2. Index case and diacritics sensitivity As of Recoll version 1.18 you have a choice of building an index with @@ -608,17 +582,15 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or probably slightly slower, and the feature is still young, so that a certain amount of weirdness cannot be excluded. - ---------------------------------------------------------------------- - 2.3.3. The index configuration GUI Most parameters for a given index configuration can be set from a recoll GUI running on this configuration (either as default, or by setting RECOLL_CONFDIR or the -c option.) - The interface is started from the Preferences->Index Configuration menu + The interface is started from the Preferences -> Index Configuration menu entry. 
It is divided in four tabs, Global parameters, Local parameters, - Beagle web history (which is explained in the next section) and Search + Web history (which is explained in the next section) and Search parameters. The Global parameters tab allows setting global variables, like the lists @@ -643,34 +615,28 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or use it on hand-edited files, which you might nevertheless want to backup first... - ---------------------------------------------------------------------- +2.4. Index WEB visited page history -2.4. Using Beagle WEB browser plugins + With the help of a Firefox extension, Recoll can index the Internet pages + that you visit. The extension was initially designed for the Beagle + indexer, but it has recently be renamed and better adapted to Recoll. - Beagle is (was?) a concurrent desktop indexer, built on Lucene and the - Mono project (C#), for which a number of add-on browser plugins were - written. These work by copying visited web pages to an indexing queue - directory, which the indexer then processes. Especially, there is a - Firefox extension. - - If, for any reason, you so happen to prefer Recoll to Beagle, you can - still use the Firefox plugin, which is written in Javascript and - completely independant of C#, Beagle, Lucene..., and set Recoll to process - the Beagle queue directory. This supposes that Beagle is not running, else - both programs will fight for the same files. + The extension works by copying visited WEB pages to an indexing queue + directory, which Recoll then processes, indexing the data, storing it into + a local cache, then removing the file from the queue. This feature can be enabled in the GUI Index configuration panel, or by - editing the configuration file (set processbeaglequeue to 1). + editing the configuration file (set processwebqueue to 1). 
- There are more recent instructions about how to find and install the - Firefox extension on the Recoll wiki. + A current pointer to the extension can be found, along with up-to-date + instructions, on the Recoll wiki. - Unfortunately, it seems that the plugin does not work anymore with recent - Firefox versions (tried with 10.0). This is not the trival installation - version check issue, explicit manual indexing requests still work, but - automatic indexing on page load does not. - - ---------------------------------------------------------------------- + A copy of the indexed WEB pages is retained by Recoll in a local cache + (from which previews can be fetched). The cache size can be adjusted from + the Index configuration / Web history panel. Once the maximum size is + reached, old pages are purged - both from the cache and the index - to + make room for new ones, so you need to explicitly archive in some other + place the pages that you want to keep indefinitely. 2.5. Periodic indexing @@ -689,7 +655,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or The recollindex indexing process can be interrupted by sending an interrupt (Ctrl-C, SIGINT) or terminate (SIGTERM) signal. Some time may elapse before the process exits, because it needs to properly flush and - close the index. This can also be done from the recoll GUI File->Stop + close the index. This can also be done from the recoll GUI File -> Stop Indexing menu entry. After such an interruption, the index will be somewhat inconsistent @@ -723,15 +689,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or method to build the file list to be fed to recollindex -if. Trivial example: - find . -name indexable.txt -print | recollindex -if + find . -name indexable.txt -print | recollindex -if recollindex -i will not descend into subdirectories specified as parameters, but just add them as index entries. 
It is up to the external file selection method to build the complete file list. - ---------------------------------------------------------------------- - 2.5.2. Using cron to automate indexing The most common way to set up indexing is to have a cron task execute it @@ -745,7 +709,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or 1 15 su mylogin -c "recollindex recollindex > /tmp/rcltraceme 2>&1" As of version 1.17 the Recoll GUI has dialogs to manage crontab entries - for recollindex. You can reach them from the Preferences->Indexing + for recollindex. You can reach them from the Preferences -> Indexing Schedule menu. They only work with the good old cron, and do not give access to all features of cron scheduling. @@ -758,8 +722,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Especially the PATH variable may be of concern. Please check the crontab manual pages about possible issues. - ---------------------------------------------------------------------- - 2.6. Real time indexing Real time monitoring/indexing is performed by starting the recollindex -m @@ -788,6 +750,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or fvwm + The indexing daemon gets started, then the window manager, for which the session waits. @@ -818,8 +781,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or it if your system is short on resources. Periodic indexing is adequate in most cases. - ---------------------------------------------------------------------- - 2.6.1. Slowing down the reindexing rate for fast changing files When using the real time monitor, it may happen that some files need to be @@ -830,9 +791,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or which a file, specified by a wildcard pattern, cannot be reindexed. See the mondelaypatterns parameter in the configuration section. 
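   As an illustration only, a recoll.conf entry of the following shape
   delays reindexing for matching files. Each entry is a wildcard pattern
   and a delay in seconds, separated by a colon; entries are separated by
   spaces, and double quotes protect an entry containing white space. The
   patterns and delays shown are examples and should be adapted:

       # Wait 20 s before reindexing log files, 10 s for the quoted pattern
       mondelaypatterns = *.log:20 "this one has spaces*:10"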
- ---------------------------------------------------------------------- - - Chapter 3. Searching +Chapter 3. Searching 3.1. Searching with the Qt graphical user interface @@ -841,10 +800,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or recoll has two search modes: - * Simple search (the default, on the main screen) has a single entry + o Simple search (the default, on the main screen) has a single entry field where you can enter multiple words. - * Advanced search (a panel accessed through the Tools menu or the + o Advanced search (a panel accessed through the Tools menu or the toolbox bar icon) has multiple entry fields, which you may use to build a logical condition, with additional filtering on file type, location in the file system, modification date, and size. @@ -860,8 +819,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or white space in this case (they would typically be printed without white space). - ---------------------------------------------------------------------- - 3.1.1. Simple search 1. Start the recoll program. @@ -890,16 +847,16 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or more efficiently on a small subset of the index (allowing wild cards on the left of terms without excessive penalty). Things to know: - * White space in the entry should match white space in the file name, + o White space in the entry should match white space in the file name, and is not treated specially. - * The search is insensitive to character case and accents, independantly + o The search is insensitive to character case and accents, independently of the type of index. - * An entry without any wild card character and not capitalized will be + o An entry without any wild card character and not capitalized will be prepended and appended with '*' (ie: etc -> *etc*, but Etc -> etc).
- * If you have a big index (many files), excessively generic fragments + o If you have a big index (many files), excessively generic fragments may result in inefficient searches. You can search for exact phrases (adjacent words in a given order) by @@ -930,9 +887,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or search. This is what most differentiates this mode from the Query Language mode, where you have to care about the syntax. - You can use the Tools->Advanced search dialog for more complex searches. - - ---------------------------------------------------------------------- + You can use the Tools -> Advanced search dialog for more complex searches. 3.1.2. The default result list @@ -951,12 +906,26 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or documents side by side. (You can also browse successive results in a single preview window by typing Shift+ArrowUp/Down in the window). - Clicking the Open link will attempt to start an external viewer. The - viewer for each document type can be configured through the user - preferences dialog, or by editing the mimeview configuration file. You can - also check the Use desktop preferences option in the GUI preferences - dialog to use the desktop defaults for all documents. This is probably the - best option if you are using a well configured Gnome or KDE desktop. + Clicking the Open link will start an external viewer for the document. By + default, Recoll lets the desktop choose the appropriate application for + most document types (there is a short list of exceptions, see further). If + you prefer to completely customize the choice of applications, you can + uncheck the Use desktop preferences option in the GUI preferences dialog, + and click the Choose editor applications button to adjust the predefined + Recoll choices. The tool accepts multiple selections of mime types (e.g. + to set up the editor for the dozens of office file types). 
+ + Even when Use desktop preferences is checked, there is a small list of + exceptions, for mime types where the Recoll choice should override the + desktop one. These are applications which are well integrated with Recoll, + especially evince for viewing PDF and Postscript files because of its + support for opening the document at a specific page and passing a search + string as an argument. Of course, you can edit the list (in the GUI + preferences) if you would prefer to lose the functionality and use the + standard desktop tool. + + You may also change the choice of applications by editing the mimeview + configuration file if you find this more convenient. The Preview and Open edit links may not be present for all entries, meaning that Recoll has no configured way to preview a given file type @@ -979,31 +948,39 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or the preferences). Use the arrow buttons in the toolbar or the links at the bottom of the page to browse the results. - ---------------------------------------------------------------------- + 3.1.2.1. No results: the spelling suggestions - 3.1.2.1. The result list right-click menu + When a search yields no result, and if the aspell dictionary is + configured, Recoll will try to check for misspellings among the query + terms, and will propose lists of replacements. Clicking on one of the + suggestions will replace the word and restart the search. You can hold any + of the modifier keys (Ctrl, Shift, etc.) while clicking if you would + rather stay on the suggestion screen because several terms need + replacement. + + 3.1.2.2. The result list right-click menu Apart from the preview and edit links, you can display a pop-up menu by right-clicking over a paragraph in the result list. 
This menu has the following entries: - * Preview + o Preview - * Open + o Open - * Copy File Name + o Copy File Name - * Copy Url + o Copy Url - * Save to File + o Save to File - * Find similar + o Find similar - * Preview Parent document + o Preview Parent document - * Open Parent document + o Open Parent document - * Open Snippets Window + o Open Snippets Window The Preview and Open entries do the same thing as the corresponding links. @@ -1038,8 +1015,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or start the native viewer on the appropriate page. If the viewer supports it, its search function will also be primed with one of the search terms. - ---------------------------------------------------------------------- - 3.1.3. The result table In Recoll 1.15 and newer, the results can be displayed in spreadsheet-like @@ -1065,9 +1040,25 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or links for starting a preview or a native application, and an equivalent right-click menu. Typing Esc (the Escape key) will unfreeze the display. - ---------------------------------------------------------------------- + 3.1.4. Displaying thumbnails - 3.1.4. The preview window + The default format for the result list entries and the detail area of the + result table display an icon for each result document. The icon is either + a generic one determined from the MIME type, or a thumbnail of the + document appearance. Thumbnails are only displayed if found in the + standard freedesktop location, where they would typically have been + created by a file manager. + + Recoll has no capability to create thumbnails. A relatively simple trick + is to use the Open parent document/folder entry in the result list popup + menu. This should open a file manager window on the containing directory, + which should in turn create the thumbnails (depending on your settings). + Restarting the search should then display the thumbnails. 
+ + There are also some pointers about thumbnail generation on the Recoll + wiki. + + 3.1.5. The preview window The preview window opens when you first click a Preview link inside the result list. @@ -1100,9 +1091,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or You can print the current preview window contents by typing Ctrl-P (Ctrl + P) in the window text. - ---------------------------------------------------------------------- - - 3.1.4.1. Searching inside the preview + 3.1.5.1. Searching inside the preview The preview window has an internal search capability, mostly controlled by the panel at the bottom of the window, which works in two modes: as a @@ -1135,9 +1124,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or caused by stemming or wildcards). The search will revert to the text mode as soon as you edit the entry area. - ---------------------------------------------------------------------- - - 3.1.5. Complex/advanced search + 3.1.6. Complex/advanced search The advanced search dialog helps you build more complex queries without memorizing the search language constructs. It can be opened through the @@ -1158,25 +1145,23 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Click on the Show query details link at the top of the result page to see the query expansion. - ---------------------------------------------------------------------- - - 3.1.5.1. Advanced search: the "find" tab + 3.1.6.1. Advanced search: the "find" tab This part of the dialog lets you construct a query by combining multiple clauses of different types. Each entry field is configurable for the following modes: - * All terms. + o All terms. - * Any term. + o Any term. - * None of the terms. + o None of the terms. - * Phrase (exact terms in order within an adjustable window). + o Phrase (exact terms in order within an adjustable window).
- * Proximity (terms in any order within an adjustable window). + o Proximity (terms in any order within an adjustable window). - * Filename search. + o Filename search. Additional entry fields can be created by clicking the Add clause button. @@ -1200,23 +1185,21 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or search for quick fox with the default slack will match the latter, and also a fox is a cunning and quick animal. - ---------------------------------------------------------------------- - - 3.1.5.2. Avanced search: the "filter" tab + 3.1.6.2. Avanced search: the "filter" tab This part of the dialog has several sections which allow filtering the results of a search according to a number of criteria - * The first section allows filtering by dates of last modification. You + o The first section allows filtering by dates of last modification. You can specify both a minimum and a maximum date. The initial values are set according to the oldest and newest documents found in the index. - * The next section allows filtering the results by file size. There are + o The next section allows filtering the results by file size. There are two entries for minimum and maximum size. Enter decimal numbers. You can use suffix multipliers: k/K, m/M, g/G, t/T for 1E3, 1E6, 1E9, 1E12 respectively. - * The next section allows filtering the results by their mime types, or + o The next section allows filtering the results by their mime types, or mime categories (ie: media/text/message/etc.). You can transfer the types between two boxes, to define which will be @@ -1226,7 +1209,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or file type filter will not be activated at program start-up, but the lists will be in the restored state). - * The bottom section allows restricting the search results to a sub-tree + o The bottom section allows restricting the search results to a sub-tree of the indexed area. 
You can use the Invert checkbox to search for files not in the sub-tree instead. If you use directory filtering often and on big subsets of the file system, you may think of setting @@ -1236,20 +1219,16 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or dirA/dirB would match either /dir1/dirA/dirB/myfile1 or /dir2/dirA/dirB/someother/myfile2. - ---------------------------------------------------------------------- - - 3.1.5.3. Avanced search history + 3.1.6.3. Avanced search history The advanced search tool memorizes the last 100 searches performed. You can walk the saved searches by using the up and down arrow keys while the keyboard focus belongs to the advanced search dialog. The complex search history can be erased, along with the one for simple - search, by selecting the File->Erase Search History menu entry. + search, by selecting the File -> Erase Search History menu entry. - ---------------------------------------------------------------------- - - 3.1.6. The term explorer tool + 3.1.7. The term explorer tool Recoll automatically manages the expansion of search terms to their derivatives (ie: plural/singular, verb inflections). But there are other @@ -1302,9 +1281,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or simple search entry field. You can also cut/paste between the result list and any entry field (the end of lines will be taken care of). - ---------------------------------------------------------------------- - - 3.1.7. Multiple indexes + 3.1.8. Multiple indexes See the section describing the use of multiple indexes for generalities. Only the aspects concerning the recoll GUI are described here. @@ -1345,9 +1322,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or A change was made in the same update so that recoll will automatically deactivate unreachable indexes when starting up. 
- ---------------------------------------------------------------------- - - 3.1.8. Document history + 3.1.9. Document history Documents that you actually view (with the internal preview or an external tool) are entered into the document history, which is remembered. @@ -1358,9 +1333,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or You can erase the document history by using the Erase document history entry in the File menu. - ---------------------------------------------------------------------- - - 3.1.9. Sorting search results and collapsing duplicates + 3.1.10. Sorting search results and collapsing duplicates The documents in a result list are normally sorted in order of relevance. It is possible to specify a different sort order, either by using the @@ -1382,11 +1355,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or not be a duplicate of the text only). Duplicates hiding is controlled by an entry in the GUI configuration dialog, and is off by default. - ---------------------------------------------------------------------- + 3.1.11. Search tips, shortcuts - 3.1.10. Search tips, shortcuts - - 3.1.10.1. Terms and search expansion + 3.1.11.1. Terms and search expansion Term completion. Typing Esc Space in the simple search entry field while entering a word will either complete the current word if its beginning @@ -1423,9 +1394,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or file name search which will only look for file names, and may be faster than the generic search especially when using wildcards. - ---------------------------------------------------------------------- - - 3.1.10.2. Working with phrases and proximity + 3.1.11.2. Working with phrases and proximity Phrases and Proximity searches. A phrase can be looked for by enclosing it in double quotes. 
Example: "user manual" will look only for occurrences of @@ -1455,9 +1424,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or the advanced search panel control, or the o query language modifier). Literal occurences of the word will be matched normally. - ---------------------------------------------------------------------- - - 3.1.10.3. Others + 3.1.11.3. Others Using fields. You can use the query language and field specifications to only search certain parts of documents. This can be especially helpful @@ -1501,9 +1468,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Quitting. Entering Ctrl-Q almost anywhere will close the application. - ---------------------------------------------------------------------- - - 3.1.11. Customizing the search interface + 3.1.12. Customizing the search interface You can customize some aspects of the search interface by using the GUI configuration entry in the Preferences menu. @@ -1512,31 +1477,31 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or the parameters used for searching and returning results, and what indexes are searched. - User interface parameters: + User interface parameters: - * Highlight color for query terms: Terms from the user query are + o Highlight color for query terms: Terms from the user query are highlighted in the result list samples and the preview window. The color can be chosen here. Any Qt color string should work (ie red, #ff0000). The default is blue. - * Style sheet: The name of a Qt style sheet text file which is applied + o Style sheet: The name of a Qt style sheet text file which is applied to the whole Recoll application on startup. The default value is empty, but there is a skeleton style sheet (recoll.qss) inside the /usr/share/recoll/examples directory. Using a style sheet, you can change most recoll graphical parameters: colors, fonts, etc. See the sample file for a few simple examples. 
- * Maximum text size highlighted for preview: inserting highlights on + o Maximum text size highlighted for preview: inserting highlights on search terms inside the text before inserting it in the preview window involves quite a lot of processing, and can be disabled over the given text size to speed up loading. - * Prefer HTML to plain text for preview: if set, Recoll will display HTML + o Prefer HTML to plain text for preview: if set, Recoll will display HTML as such inside the preview window. If this causes problems with the Qt HTML display, you can uncheck it to display the plain text version instead. - * Plain text to HTML line style: when displaying plain text inside the + o Plain text to HTML line style: when displaying plain text inside the preview window, Recoll tries to preserve some of the original text line breaks and indentation. It can either use PRE HTML tags, which will preserve the indentation well but will force horizontal scrolling @@ -1546,71 +1511,71 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or third option has been available in recent releases and is probably now the best one: use PRE tags with line wrapping. - * Use desktop preferences to choose document editor: if this is checked, + o Use desktop preferences to choose document editor: if this is checked, the xdg-open utility will be used to open files when you click the Open link in the result list, instead of the application defined in mimeview. xdg-open will in turn use your desktop preferences to choose an appropriate application. - * Exceptions: when using the desktop preferences for opening documents, + o Exceptions: when using the desktop preferences for opening documents, these are mime types that will still be opened according to Recoll preferences. This is useful for passing parameters like page numbers or search strings to applications that support them (e.g. evince). This cannot be done with xdg-open which only supports passing one parameter.
- * Choose editor applications: this will let you choose the command + o Choose editor applications: this will let you choose the command started by the Open links inside the result list, for specific document types. - * Display category filter as toolbar... this will let you choose if the + o Display category filter as toolbar... this will let you choose if the document categories are displayed as a list or a set of buttons. - * Auto-start simple search on white space entry: if this is checked, a + o Auto-start simple search on white space entry: if this is checked, a search will be executed each time you enter a space in the simple search input field. This lets you look at the result list as you enter new terms. This is off by default, you may like it or not... - * Start with advanced search dialog open: if you use this dialog + o Start with advanced search dialog open: if you use this dialog frequently, checking this entry will get it to open when recoll starts. - * Remember sort activation state: if set, Recoll will remember the sort + o Remember sort activation state: if set, Recoll will remember the sort tool state between invocations. It normally starts with sorting disabled. - Result list parameters: + Result list parameters: - * Number of results in a result page + o Number of results in a result page - * Result list font: There is quite a lot of information shown in the + o Result list font: There is quite a lot of information shown in the result list, and you may want to customize the font and/or font size. The rest of the fonts used by Recoll are determined by your generic Qt config (try the qtconfig command). - * Edit result list paragraph format string: allows you to change the + o Edit result list paragraph format string: allows you to change the presentation of each result list entry. See the result list customisation section.
- * Edit result page HTML header insert: allows you to define text + o Edit result page HTML header insert: allows you to define text inserted at the end of the result page HTML header. More detail in the result list customisation section. - * Date format: allows specifying the format used for displaying dates + o Date format: allows specifying the format used for displaying dates inside the result list. This should be specified as an strftime() string (man strftime). - * Abstract snippet separator: for synthetic abstracts built from index + o Abstract snippet separator: for synthetic abstracts built from index data, which are usually made of several snippets from different parts of the document, this defines the snippet separator, an ellipsis by default. - Search parameters: + Search parameters: - * Hide duplicate results: decides if result list entries are shown for + o Hide duplicate results: decides if result list entries are shown for identical documents found in different places. - * Stemming language: stemming obviously depends on the document's + o Stemming language: stemming obviously depends on the document's language. This listbox will let you choose among the stemming databases which were built during indexing (this is set in the main configuration file), or later added with recollindex -s (See the @@ -1618,31 +1583,31 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or will be deleted at the next indexing pass unless they are also added in the configuration file. - * Automatically add phrase to simple searches: a phrase will be + o Automatically add phrase to simple searches: a phrase will be automatically built and added to simple searches when looking for Any terms. This will give a relevance boost to the results where the search terms appear as a phrase (consecutive and in order).
- * Autophrase term frequency threshold percentage: very frequent terms + o Autophrase term frequency threshold percentage: very frequent terms should not be included in automatic phrase searches for performance reasons. The parameter defines the cutoff percentage (percentage of the documents where the term appears). - * Replace abstracts from documents: this decides if we should synthesize + o Replace abstracts from documents: this decides if we should synthesize and display an abstract in place of an explicit abstract found within the document itself. - * Dynamically build abstracts: this decides if Recoll tries to build + o Dynamically build abstracts: this decides if Recoll tries to build document abstracts (lists of snippets) when displaying the result list. Abstracts are constructed by taking context from the document information, around the search terms. - * Synthetic abstract size: adjust to taste... + o Synthetic abstract size: adjust to taste... - * Synthetic abstract context words: how many words should be displayed + o Synthetic abstract context words: how many words should be displayed around each term occurrence. - * Query language magic file name suffixes: a list of words which + o Query language magic file name suffixes: a list of words which automatically get turned into ext:xxx file name suffix clauses when starting a query language query (ie: doc xls xlsx...). This will save some typing for people who use file types a lot when querying. @@ -1662,16 +1627,14 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or alternative indexer may also need to implement a way of purging the index from stale data, - ---------------------------------------------------------------------- - - 3.1.11.1. The result list format + 3.1.12.1. 
The result list format The result list presentation can be exhaustively customized by adjusting two elements: - * The paragraph format + o The paragraph format - * HTML code inside the header section + o HTML code inside the header section These can be edited from the Result list tab of the GUI configuration. @@ -1688,39 +1651,37 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or examples on the page about customising the result list on the Recoll web site. - ---------------------------------------------------------------------- - - 3.1.11.1.1. The paragraph format + The paragraph format This is an arbitrary HTML string where the following printf-like % substitutions will be performed: - * %A. Abstract + o %A. Abstract - * %D. Date + o %D. Date - * %I. Icon image name. This is normally determined from the mime type. + o %I. Icon image name. This is normally determined from the mime type. The associations are defined inside the mimeconf configuration file. If a thumbnail for the file is found at the standard Freedesktop location, this will be displayed instead. - * %K. Keywords (if any) + o %K. Keywords (if any) - * %L. Precooked Preview, Edit, and possibly Snippets links + o %L. Precooked Preview, Edit, and possibly Snippets links - * %M. Mime type + o %M. Mime type - * %N. result Number inside the result page + o %N. result Number inside the result page - * %R. Relevance percentage + o %R. Relevance percentage - * %S. Size information + o %S. Size information - * %T. Title or Filename if not set. + o %T. Title or Filename if not set. - * %t. Title or Filename if not set. + o %t. Title or Filename if not set. - * %U. Url + o %U. Url The format of the Preview, Edit, and Snippets links is , and where docnum (%N) expands to the document @@ -1765,8 +1726,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or It is also possible to define the value of the snippet separator inside the abstract section. 
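As an illustration of the printf-like % substitutions used by the paragraph format, here is a minimal sketch in Python. It is not Recoll's own code: the function name render_paragraph, the sample format string and the sample values are all hypothetical, and leaving unknown letters untouched is an assumption made for the example.

```python
import re


def render_paragraph(fmt, values):
    """Expand %X substitutions in a result paragraph format string.

    fmt    -- an HTML string containing %A, %D, %M, ... placeholders
    values -- dict mapping the placeholder letter to its text
    Letters missing from values are left as-is (an assumption of this
    sketch, not a documented Recoll behaviour).
    """
    def substitute(match):
        letter = match.group(1)
        # Fall back to the original "%X" text when no value is known.
        return values.get(letter, match.group(0))

    return re.sub(r"%([A-Za-z])", substitute, fmt)


# Hypothetical data for one result entry.
sample_values = {
    "T": "Recoll user manual",
    "M": "text/html",
    "S": "12 KB",
    "A": "Installation and configuration",
}
sample_format = "<p><b>%T</b> (%M, %S)<br>%A</p>"

print(render_paragraph(sample_format, sample_values))
```

The real format string is edited from the Result list tab of the GUI configuration; the sketch only shows how each % letter is replaced by the corresponding piece of document data.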
- ---------------------------------------------------------------------- - 3.2. Searching with the KDE KIO slave 3.2.1. What's this @@ -1794,8 +1753,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or kio-recoll module, so check before diving into the build process, maybe it's already out there ready for one-click installation. - ---------------------------------------------------------------------- - 3.2.2. Searchable documents As a sample application, the Recoll KIO slave could allow preparing a set @@ -1817,18 +1774,17 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or .... - ---------------------------------------------------------------------- 3.3. Searching on the command line There are several ways to obtain search results as a text stream, without a graphical interface: - * By passing option -t to the recoll program. + o By passing option -t to the recoll program. - * By using the recollq program. + o By using the recollq program. - * By writing a custom Python program, using the Recoll Python API. + o By writing a custom Python program, using the Recoll Python API. The first two methods work in the same way and accept/need the same arguments (except for the additional -t to recoll). The query to be @@ -1886,8 +1842,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or text/html [file:///Users/uncrypted-dockes/projets/pagepers/index.html] [psxtcl/writemime/recoll]... text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree.... - ---------------------------------------------------------------------- - 3.4. The query language The query language processor is activated in the GUI simple search entry @@ -1919,7 +1873,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or element). Example: Eugenie, author:balzac, dc:title:grandet The colon, if present, means "contains". 
Xesam defines other relations, - which are mostly supported for now (except in special cases, described + which are mostly unsupported for now (except in special cases, described further down). All elements in the search entry are normally combined with an implicit @@ -1941,23 +1895,23 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Recoll currently manages the following default fields: - * title, subject or caption are synonyms which specify data to be + o title, subject or caption are synonyms which specify data to be searched for in the document title or subject. - * author or from for searching the documents originators. + o author or from for searching the documents originators. - * recipient or to for searching the documents recipients. + o recipient or to for searching the documents recipients. - * keyword for searching the document-specified keywords (few documents + o keyword for searching the document-specified keywords (few documents actually have any). - * filename for the document's file name. + o filename for the document's file name. - * ext specifies the file name extension (Ex: ext:html) + o ext specifies the file name extension (Ex: ext:html) The field syntax also supports a few field-like, but special, criteria: - * dir for filtering the results on file location (Ex: + o dir for filtering the results on file location (Ex: dir:/home/me/somedir). -dir also works to find results not in the specified directory (release >= 1.15.8). A tilde inside the value will be expanded to the home directory. Wildcards will not be expanded. You @@ -1987,13 +1941,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or You need to use double-quotes around the path value if it contains space characters. - * size for filtering the results on file size. Example: size<10000. You + o size for filtering the results on file size. Example: size<10000. You can use <, > or = as operators. 
You can specify a range like the following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be used as (decimal) multipliers. Ex: size>1k to search for files bigger than 1000 bytes. - * date for searching or filtering on dates. The syntax for the argument + o date for searching or filtering on dates. The syntax for the argument is based on the ISO8601 standard for dates and time intervals. Only dates are supported, no times. The general syntax is 2 elements separated by a / character. Each element can be a date or a period of @@ -2004,22 +1958,22 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or missing element is interpreted as the lowest or highest date in the index. Examples: - * 2001-03-01/2002-05-01 the basic syntax for an interval of dates. + o 2001-03-01/2002-05-01 the basic syntax for an interval of dates. - * 2001-03-01/P1Y2M the same specified with a period. + o 2001-03-01/P1Y2M the same specified with a period. - * 2001/ from the beginning of 2001 to the latest date in the index. + o 2001/ from the beginning of 2001 to the latest date in the index. - * 2001 the whole year of 2001 + o 2001 the whole year of 2001 - * P2D/ means 2 days ago up to now if there are no documents with + o P2D/ means 2 days ago up to now if there are no documents with dates in the future. - * /2003 all documents from 2003 or older. + o /2003 all documents from 2003 or older. Periods can also be specified with small letters (ie: p2y). - * mime or format for specifying the mime type. This one is quite special + o mime or format for specifying the mime type. This one is quite special because you can specify several values which will be OR'ed (the normal default for the language is AND). Ex: mime:text/plain mime:text/html. Specifying an explicit boolean operator before a mime specification is @@ -2028,7 +1982,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or wildcards in the value (mime:text/*). 
Note that mime is the ONLY field with an OR default. You do need to use OR with ext terms for example. - * type or rclcat for specifying the category (as in + o type or rclcat for specifying the category (as in text/media/presentation/etc.). The classification of mime types in categories is defined in the Recoll configuration (mimeconf), and can be modified or extended. The default category names are those which @@ -2046,8 +2000,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or configuration, so that the exact field search possibilities may be different for you if someone took care of the customisation. - ---------------------------------------------------------------------- - 3.4.1. Modifiers Some characters are recognized as search modifiers when found immediately @@ -2055,26 +2007,24 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or term"modifierchars. The actual "phrase" can be a single term of course. Supported modifiers: - * l can be used to turn off stemming (mostly makes sense with p because + o l can be used to turn off stemming (mostly makes sense with p because stemming is off by default for phrases). - * o can be used to specify a "slack" for phrase and proximity searches: + o o can be used to specify a "slack" for phrase and proximity searches: the number of additional terms that may be found between the specified ones. If o is followed by an integer number, this is the slack, else the default is 10. - * p can be used to turn the default phrase search into a proximity one + o p can be used to turn the default phrase search into a proximity one (unordered). Example:"order any in"p - * C will turn on case sensitivity (if the index supports it). + o C will turn on case sensitivity (if the index supports it). - * D will turn on diacritics sensitivity (if the index supports it). + o D will turn on diacritics sensitivity (if the index supports it). 
- * A weight can be specified for a query element by specifying a decimal + o A weight can be specified for a query element by specifying a decimal value at the start of the modifiers. Example: "Important"2.5. - ---------------------------------------------------------------------- - 3.5. Search case and diacritics sensitivity For Recoll versions 1.18 and later, and when working with a raw index (not @@ -2125,8 +2075,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or When either case or diacritics sensitivity is activated, stem expansion is turned off. Having both does not make much sense. - ---------------------------------------------------------------------- - 3.6. Anchored searches and wildcards Some special characters are interpreted by Recoll in search strings to @@ -2135,8 +2083,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or if the match is found at or near the beginning of the document or one of its fields. - ---------------------------------------------------------------------- - 3.6.1. More about wildcards All words entered in Recoll search fields will be processed for wildcard @@ -2144,25 +2090,25 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or The wildcard characters are: - * * which matches 0 or more characters. + o * which matches 0 or more characters. - * ? which matches a single character. + o ? which matches a single character. - * [] which allows defining sets of characters to be matched (ex: [abc] + o [] which allows defining sets of characters to be matched (ex: [abc] matches a single character which may be 'a' or 'b' or 'c', [0-9] matches any digit). You should be aware of a few things before using wildcards.
- * Using a wildcard character at the beginning of a word can make for a + o Using a wildcard character at the beginning of a word can make for a slow search because Recoll will have to scan the whole index term list to find the matches. - * When working with a raw index (preserving character case and + o When working with a raw index (preserving character case and diacritics), the literal part of a wildcard expression will be matched exactly for case and diacritics. - * Using a * at the end of a word can produce more matches than you would + o Using a * at the end of a word can produce more matches than you would think, and strange search results. You can use the term explorer tool to check what completions exist for a given term. You can also see exactly what search was performed by clicking on the link at the top @@ -2170,8 +2116,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or expansion will produce better results than an ending * (stem expansion is turned off when any wildcard character appears in the term). - ---------------------------------------------------------------------- - 3.6.2. Anchored searches Two characters are used to specify that a search hit should occur at the @@ -2201,24 +2145,20 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or matches inside the abstract or the list of authors (which occur at the top of the document). - ---------------------------------------------------------------------- - 3.7. Desktop integration Being independent of the desktop type has its drawbacks: Recoll desktop integration is minimal. However, there are a few tools available: - * The KDE KIO Slave was described in a previous section. + o The KDE KIO Slave was described in a previous section. - * If you use a recent version of Ubuntu Linux, you may find the Ubuntu + o If you use a recent version of Ubuntu Linux, you may find the Ubuntu Unity Lens module useful.
- * There is also an independently developed Krunner plugin. + o There is also an independently developed Krunner plugin. Here follow a few other things that may help. - ---------------------------------------------------------------------- - 3.7.1. Hotkeying recoll It is surprisingly convenient to be able to show or hide the Recoll GUI @@ -2226,8 +2166,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or the libwnck window manager interface library, which will allow you to do just this. The detailed instructions are on this wiki page. - ---------------------------------------------------------------------- - 3.7.2. The KDE Kicker Recoll applet This is probably obsolete now. Anyway: @@ -2251,9 +2189,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or a new recoll GUI instance every time (even if it is already running). You may find it useful anyway. - ---------------------------------------------------------------------- - - Chapter 4. Programming interface +Chapter 4. Programming interface Recoll has an Application Programming Interface, usable both for indexing and searching, currently accessible from the Python language. @@ -2264,36 +2200,60 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or The processing of metadata attributes for documents (fields) is highly configurable. - ---------------------------------------------------------------------- - 4.1. Writing a document filter - Recoll filters are executable programs which translate from a specific - format (ie: openoffice, acrobat, etc.) to the Recoll indexing input - format, which may be text/plain or text/html. + Recoll filters cooperate to translate from the multitude of input document + formats (simple ones such as opendocument or acrobat, or compound ones such as + Zip or Email) into the final Recoll indexing input format, which may be + text/plain or text/html. Most filters are executable programs or scripts.
+ A few filters are coded in C++ and live inside recollindex. This latter + kind will not be described here. - As of Recoll 1.13, there are two kinds of filters: + There are currently (as of 1.18, and since 1.13) two kinds of external executable + filters: - * Simple filters (the old ones) run once and exit. They can be bare - programs like antiword, or shell-scripts using other programs. They - are very simple to write, because they just need to output the - converted to the standard output. + o Simple filters (exec filters) run once and exit. They can be bare + programs like antiword, or scripts using other programs. They are very + simple to write, because they just need to print the converted + document to the standard output. Their output can be text/plain or + text/html. - * Multiple filters, new in 1.13, run as long as their master process - (ie: recollindex) is active. They can process multiple files (sparing - the process startup time which can be very significant), or multiple - documents per file (ie: for zip or chm files). They communicate with + o Multiple filters (execm filters) run as long as their master process + (recollindex) is active. They can process multiple files (sparing the + process startup time which can be very significant), or multiple + documents per file (e.g. for zip or chm files). They communicate with the indexer through a simple protocol, but are nevertheless a bit more - complicated than the older kind. Most of these new filters are written - in Python, using a common module to handle the protocol. + complicated than the older kind. Most of the new filters are written in + Python, using a common module to handle the protocol. There is an + exception, rclimg, which is written in Perl. The subdocuments output by + these filters can be directly indexable (text or HTML), or they can be + other simple or compound documents that will need to be processed by + another filter. - The following will just describe the simple filters.
If you can program - and want to write one of the other kind, it shouldn't be too difficult to - make sense of one of the existing modules. For example, look at rclzip - which uses Zip file paths as internal identifiers (ipath), and rclinfo, - which uses an integer index. + In both cases, filters deal with regular file system files, and can + process either a single document, or a linear list of documents in each + file. Recoll is responsible for performing up-to-date checks, dealing with + more complex embedding, and handling other upper-level issues. - ---------------------------------------------------------------------- + In the extreme case of a simple filter returning a document in text/plain + format, no metadata can be transferred from the filter to the indexer. + Generic metadata, like document size or modification date, will be + gathered and stored by the indexer. + + Filters that produce text/html format can return an arbitrary amount of + metadata inside HTML meta tags. These will be processed according to the + directives found in the fields configuration file. + + The filters that can handle multiple documents per file return a single + piece of data to identify each document inside the file. This piece of + data, called an ipath element, will be sent back by Recoll to extract the + document at query time, for previewing, or for creating a temporary file + to be opened by a viewer. + + The following section describes the simple filters, and the next one gives + a few explanations about the execm ones. You could conceivably write a + simple filter with only the elements in the manual. This will not be the + case for the other ones, for which you will have to look at the code. 4.1.1. Simple filters @@ -2327,9 +2287,39 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Don't forget to make your filter executable before testing ! - ---------------------------------------------------------------------- + 4.1.2.
"Multiple" filters - 4.1.2. Telling Recoll about the filter + If you can program and want to write an execm filter, it should not be too + difficult to make sense of one of the existing modules. For example, look + at rclzip which uses Zip file paths as identifiers (ipath), and rclics, + which uses an integer index. Also have a look at the comments inside the + internfile/mh_execm.h file and possibly at the corresponding module. + + execm filters sometimes need to make a choice for the nature of the ipath + elements that they use in communication with the indexer. Here are a few + guidelines: + + o Use ASCII or UTF-8 (if the identifier is an integer print it, for + example, like printf %d would do). + + o If at all possible, the data should make some kind of sense when + printed to a log file to help with debugging. + + o Recoll uses a colon (:) as a separator to store a complex path + internally (for deeper embedding). Colons inside the ipath elements + output by a filter will be escaped, but would be a bad choice as a + filter-specific separator (mostly, again, for debugging issues). + + In any case, the main goal is that it should be easy for the filter to + extract the target document, given the file name and the ipath element. + + execm filters will also produce a document with a null ipath element. + Depending on the type of document, this may have some associated data + (e.g. the body of an email message), or none (typical for an archive + file). If it is empty, this document will be useful anyway for some + operations, as the parent of the actual data documents. + + 4.1.3. 
Telling Recoll about the filter There are two elements that link a file to the filter which should process it: the association of file to mime type and the association of a mime @@ -2360,23 +2350,21 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or The fragment specifies that: - * application/msword files are processed by executing the antiword + o application/msword files are processed by executing the antiword program, which outputs text/plain encoded in utf-8. - * application/ogg files are processed by the rclogg script, with default + o application/ogg files are processed by the rclogg script, with default output type (text/html, with encoding specified in the header, or utf-8 by default). - * text/rtf is processed by unrtf, which outputs text/html. The + o text/rtf is processed by unrtf, which outputs text/html. The iso-8859-1 encoding is specified because it is not the utf-8 default, and not output by unrtf in the HTML header section. - * application/x-chm is processed by a persistant filter. This is + o application/x-chm is processed by a persistant filter. This is determined by the execm keyword. - ---------------------------------------------------------------------- - - 4.1.3. Filter HTML output + 4.1.4. Filter HTML output The output HTML could be very minimal like the following example: @@ -2407,17 +2395,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or See the following section for details about configuring how field data is processed by the indexer. - ---------------------------------------------------------------------- - - 4.1.4. Page numbers + 4.1.5. Page numbers The indexer will interpret ^L characters in the filter output as indicating page breaks, and will record them. At query time, this allows starting a viewer on the right page for a hit or a snippet. Currently, only the PDF, Postscript and DVI filters generate page breaks. 
- ---------------------------------------------------------------------- - 4.2. Field data processing Fields are named pieces of information in or about documents, like title, @@ -2435,11 +2419,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Fields can be: - * indexed, meaning that their terms are separately stored in inverted + o indexed, meaning that their terms are separately stored in inverted lists (with a specific prefix), and that a field-specific search is possible. - * stored, meaning that their value is recorded in the index data record + o stored, meaning that their value is recorded in the index data record for the document, and can be returned and displayed with search results. @@ -2448,24 +2432,24 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or The sequence of events for field processing is as follows: - * During indexing, recollindex scans all meta fields in HTML documents + o During indexing, recollindex scans all meta fields in HTML documents (most document types are transformed into HTML at some point). It compares the name for each element to the configuration defining what should be done with fields (the fields file) - * If the name for the meta element matches one for a field that should + o If the name for the meta element matches one for a field that should be indexed, the contents are processed and the terms are entered into the index with the prefix defined in the fields file. - * If the name for the meta element matches one for a field that should + o If the name for the meta element matches one for a field that should be stored, the content of the element is stored with the document data record, from which it can be extracted and displayed at query time. 
- * At query time, if a field search is performed, the index prefix is + o At query time, if a field search is performed, the index prefix is computed and the match is only performed against appropriately prefixed terms in the index. - * At query time, the field can be displayed inside the result list by + o At query time, the field can be displayed inside the result list by using the appropriate directive in the definition of the result list paragraph format. All fields are displayed on the fields screen of the preview window (which you can reach through the right-click menu). @@ -2479,8 +2463,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or could add a page count field to pdf documents for displaying inside result lists. - ---------------------------------------------------------------------- - 4.3. API 4.3.1. Interface elements @@ -2522,8 +2504,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or during indexing. The main indexer documents would also probably be a problem for the external indexer purge operation. - ---------------------------------------------------------------------- - 4.3.2. Python interface 4.3.2.1. Introduction @@ -2552,8 +2532,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or python setup.py build python setup.py install - ---------------------------------------------------------------------- - 4.3.2.2. Interface manual NAME @@ -2674,7 +2652,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or | | | execute(...) - | execute(query_string, stemming=1|0) + | execute(query_string, stemming=1|0, stemlang="stemming language") | | Starts a search for query_string, a Recoll search language string | (mostly Xesam-compatible). 
@@ -2740,8 +2718,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or extra_dbs is a list of external databases (xapian directories) writable decides if we can index new data through this connection - ---------------------------------------------------------------------- - 4.3.2.3. Example code The following sample would query the index with a user language string. @@ -2749,6 +2725,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or examples. #!/usr/bin/env python + import recoll db = recoll.connect() @@ -2769,20 +2746,20 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or print - ---------------------------------------------------------------------- - Chapter 5. Installation and configuration + +Chapter 5. Installation and configuration 5.1. Installing a binary copy There are three types of binary Recoll installations: - * Through your system normal software distribution framework (ie, + o Through your system normal software distribution framework (ie, Debian/Ubuntu apt, FreeBSD ports, etc.). - * From a package downloaded from the Recoll web site. + o From a package downloaded from the Recoll web site. - * From a prebuilt tree downloaded from the Recoll web site. + o From a prebuilt tree downloaded from the Recoll web site. In all cases, the strict software dependancies (ie on Xapian or iconv) will be automatically satisfied, you should not have to worry about them. @@ -2795,16 +2772,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or may not be necessary for a quick test with default parameters). Most parameters can be more conveniently set from the GUI interface. - ---------------------------------------------------------------------- - 5.1.1. 
Installing through a package system If you use a BSD-type port system or a prebuilt package (DEB, RPM, manually or through the system software configuration utility), just follow the usual procedure for your system. - ---------------------------------------------------------------------- - 5.1.2. Installing a prebuilt Recoll The unpackaged binary versions on the Recoll web site are just compressed @@ -2818,8 +2791,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or had built the package from source (that is, just type make install). The binary trees are built for installation to /usr/local. - ---------------------------------------------------------------------- - 5.2. Supporting packages Recoll uses external applications to index some file types. You need to @@ -2852,74 +2823,72 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Now for the list: - * Openoffice files need unzip and xsltproc. + o Openoffice files need unzip and xsltproc. - * PDF files need pdftotext which is part of the Xpdf or Poppler + o PDF files need pdftotext which is part of the Xpdf or Poppler packages. - * Postscript files need pstotext. The original version has an issue with + o Postscript files need pstotext. The original version has an issue with shell character in file names, which is corrected in recent packages. See the the Recoll helper applications page for more detail. - * MS Word needs antiword. It is also useful to have wvWare installed as + o MS Word needs antiword. It is also useful to have wvWare installed as it may be be used as a fallback for some files which antiword does not handle. - * MS Excel and PowerPoint need catdoc. + o MS Excel and PowerPoint need catdoc. - * MS Open XML (docx) needs xsltproc. + o MS Open XML (docx) needs xsltproc. - * Wordperfect files need wpd2html from the libwpd (or libwpd-tools on + o Wordperfect files need wpd2html from the libwpd (or libwpd-tools on Ubuntu) package. 
- * RTF files need unrtf, which, in its standard version, has much trouble + o RTF files need unrtf, which, in its standard version, has much trouble with non-western character sets. Check the Recoll helper applications page. - * TeX files need untex or detex. Check the Recoll helper applications + o TeX files need untex or detex. Check the Recoll helper applications page for sources if it's not packaged for your distribution. - * dvi files need dvips. + o dvi files need dvips. - * djvu files need djvutxt and djvused from the DjVuLibre package. + o djvu files need djvutxt and djvused from the DjVuLibre package. - * Audio files: Recoll releases before 1.13 used the id3info command from + o Audio files: Recoll releases before 1.13 used the id3info command from the id3lib package to extract mp3 tag information, metaflac (standard flac tools) for flac files, and ogginfo (vorbis tools) for ogg files. Releases 1.14 and later use a single Python filter based on mutagen for all audio file types. - * Pictures: Recoll uses the Exiftool Perl package to extract tag + o Pictures: Recoll uses the Exiftool Perl package to extract tag information. Most image file formats are supported. Note that there may not be much interest in indexing the technical tags (image size, aperture, etc.). This is only of interest if you store personal tags or textual descriptions inside the image files. - * chm: files in microsoft help format need Python and the pychm module + o chm: files in microsoft help format need Python and the pychm module (which needs chmlib). - * ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar + o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar module. icalendar is not needed for newer versions, which use internal code. - * Zip archives need Python (and the standard zipfile module). + o Zip archives need Python (and the standard zipfile module). 
- * Rar archives need Python, the rarfile Python module and the unrar + o Rar archives need Python, the rarfile Python module and the unrar utility. - * Midi karaoke files need Python and the Midi module + o Midi karaoke files need Python and the Midi module - * Konqueror webarchive format with Python (uses the Tarfile module). + o Konqueror webarchive format with Python (uses the Tarfile module). - * mimehtml web archive format (support based on the email filter, which + o mimehtml web archive format (support based on the email filter, which introduces some mild weirdness, but still usable). Text, HTML, email folders, and Scribus files are processed internally. Lyx is used to index Lyx files. Many filters need iconv and the standard sed and awk. - ---------------------------------------------------------------------- - 5.3. Building from source 5.3.1. Prerequisites @@ -2929,10 +2898,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Development files for Xapian core. - Important: If you are building Xapian for an older CPU (before Pentium 4 - or Athlon 64), you need to add the --disable-sse flag to the configure - command. Else all Xapian application will crash with an illegal - instruction error. + Important + + If you are building Xapian for an older CPU (before Pentium 4 or Athlon + 64), you need to add the --disable-sse flag to the configure command. Else + all Xapian application will crash with an illegal instruction error. Development files for Qt . @@ -2948,8 +2918,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or not be critical). On Linux systems, the iconv interface is part of libc and you should not need to do anything special. - ---------------------------------------------------------------------- - 5.3.2. 
Building Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris, most @@ -2960,11 +2928,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Depending on the Qt 3 configuration on your system, you may have to set the QTDIR and QMAKESPECS variables in your environment: - * QTDIR should point to the directory above the one that holds the qt + o QTDIR should point to the directory above the one that holds the qt include files (ie: if qt.h is /usr/local/qt/include/qt.h, QTDIR should be /usr/local/qt). - * QMAKESPECS should be set to the name of one of the Qt mkspecs + o QMAKESPECS should be set to the name of one of the Qt mkspecs sub-directories (ie: linux-g++). On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS @@ -2974,43 +2942,43 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or details are entirely determined by qmake (which is quite often installed as qmake-qt4). - Configure options: + Configure options: - * --without-aspell will disable the code for phonetic matching of search + o --without-aspell will disable the code for phonetic matching of search terms. - * --with-fam or --with-inotify will enable the code for real time + o --with-fam or --with-inotify will enable the code for real time indexing. Inotify support is enabled by default on recent Linux systems. - * --disable-webkit is available from version 1.17 to implement the + o --disable-webkit is available from version 1.17 to implement the result list with a Qt QTextBrowser instead of a WebKit widget if you do not or can't depend on the latter. - * --enable-xattr will enable code to fetch data from file extended + o --enable-xattr will enable code to fetch data from file extended attributes. This is only useful is some application stores data in there, and also needs some simple configuration (see comments in the fields configuration file). - * --enable-camelcase will enable splitting camelCase words. 
This is not + o --enable-camelcase will enable splitting camelCase words. This is not enabled by default as it has the unfortunate side-effect of making some phrase searches quite confusing: ie, "MySQL manual" would be matched by "MySQL manual" and "my sql manual" but not "mysql manual" (only inside phrase searches). - * --with-file-command Specify the version of the 'file' command to use + o --with-file-command Specify the version of the 'file' command to use (ie: --with-file-command=/usr/local/bin/file). Can be useful to enable the gnu version on systems where the native one is bad. - * --disable-qtgui Disable the Qt interface. Will allow building the + o --disable-qtgui Disable the Qt interface. Will allow building the indexer and the command line search program in absence of a Qt environment. - * --disable-x11mon Disable X11 connection monitoring inside recollindex. + o --disable-x11mon Disable X11 connection monitoring inside recollindex. Together with --disable-qtgui, this allows building recoll without Qt and X11. - * Of course the usual autoconf configure options, like --prefix apply. + o Of course the usual autoconf configure options, like --prefix apply. Normal procedure: @@ -3026,8 +2994,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or to manually copy and modify one of the existing files (the new file name should be the output of uname -s). - ---------------------------------------------------------------------- - 5.3.3. Installation Either type make install or execute recollinstall prefix, in the root of @@ -3043,8 +3009,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or You can then proceed to configuration. - ---------------------------------------------------------------------- - 5.4. 
Configuration overview Most of the parameters specific to the recoll GUI are set through the @@ -3098,11 +3062,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or There are three kinds of lines: - * Comment (starts with #) or empty. + o Comment (starts with #) or empty. - * Parameter affectation (name = value). + o Parameter affectation (name = value). - * Section definition ([somedirname]). + o Section definition ([somedirname]). Depending on the type of configuration file, section definitions either separate groups of parameters or allow redefining some parameters for a @@ -3121,12 +3085,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Encoding issues. Most of the configuration parameters are plain ASCII. Two particular sets of values may cause encoding issues: - * File path parameters may contain non-ascii characters and should use + o File path parameters may contain non-ascii characters and should use the exact same byte values as found in the file system directory. Usually, this means that the configuration file should use the system default locale encoding. - * The unac_except_trans parameter should be encoded in UTF-8. If your + o The unac_except_trans parameter should be encoded in UTF-8. If your system locale is not UTF-8, and you need to also specify non-ascii file paths, this poses a difficulty because common text editors cannot handle multiple encodings in a single file. In this relatively @@ -3134,8 +3098,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or text files with appropriate encodings, and concatenate them to create the complete configuration. - ---------------------------------------------------------------------- - 5.4.1. Main configuration file recoll.conf is the main configuration file. 
It defines things like what to @@ -3151,8 +3113,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Configuration menu in the recoll interface. Some can only be set by editing the configuration file. - ---------------------------------------------------------------------- - 5.4.1.1. Parameters affecting what documents we index: topdirs @@ -3204,7 +3164,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Example of use for skipping text files only in a specific directory: - skippedPaths = ~/somedir/..txt + skippedPaths = ~/somedir/*.txt skippedPathsFnmPathname @@ -3275,19 +3235,16 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or useful for files with suffix-less names, but it will also cause the indexing of many bogus "text" files. - processbeaglequeue + processwebqueue - If this is set, process the directory where Beagle Web browser - plugins copy visited pages for indexing. Of course, Beagle MUST - NOT be running, else things will behave strangely. + If this is set, process the directory where Web browser plugins + copy visited pages for indexing. - beaglequeuedir + webqueuedir - The path to the Beagle indexing queue. This is hard-coded in the - Beagle plugin as ~/.beagle/ToIndex so there should be no need to - change it. - - ---------------------------------------------------------------------- + The path to the web indexing queue. This is hard-coded in the + Firefox plugin as ~/.recollweb/ToIndex so there should be no need + to change it. 5.4.1.2. Parameters affecting how we generate terms: @@ -3407,8 +3364,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or localfields= rclaptg=gnus:other = val, then select specifier viewer with mimetype|tag=... in mimeview. - ---------------------------------------------------------------------- - 5.4.1.3. 
Parameters affecting where and how we store things: dbdir @@ -3444,15 +3399,14 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or webcachedir - This is only used by the Beagle web browser plugin indexing code, - and defines where the cache for visited pages will live. Default: + This is only used by the web browser plugin indexing code, and + defines where the cache for visited pages will live. Default: $RECOLL_CONFDIR/webcache webcachemaxmbs - This is only used by the Beagle web browser plugin indexing code, - and defines the maximum size for the web page cache. Default: 40 - MB. + This is only used by the web browser plugin indexing code, and + defines the maximum size for the web page cache. Default: 40 MB. idxflushmb @@ -3461,9 +3415,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or of 0 means no explicit flushing, letting Xapian use its own default, which is flushing every 10000 (or XAPIAN_FLUSH_THRESHOLD) documents, which gives little memory usage control, as memory - usage depends on average document size. The default value is 10. - - ---------------------------------------------------------------------- + usage also depends on average document size. The default value is + 10, and it is probably a bit low. If your system usually has free + memory, you can try higher values between 20 and 80. In my + experience, values beyond 100 are always counterproductive. 5.4.1.4. Miscellaneous parameters: @@ -3577,8 +3532,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or be set for directories which hold Thunderbird data, as their folder format is weird. - ---------------------------------------------------------------------- - 5.4.2. The fields file This file contains information about dynamic fields handling in Recoll. 
@@ -3639,8 +3592,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or # mailmytag field name x-my-tag = mailmytag - ---------------------------------------------------------------------- - 5.4.3. The mimemap file mimemap specifies the file name extension to mime type mappings. @@ -3665,8 +3616,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or given Recoll version. Having it there avoids cluttering the more user-oriented and locally customized skippedNames. - ---------------------------------------------------------------------- - 5.4.4. The mimeconf file mimeconf specifies how the different mime types are handled for indexing, @@ -3679,8 +3628,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or recoll in the result lists (the values are the basenames of the png images inside the iconsdir directory (specified in recoll.conf). - ---------------------------------------------------------------------- - 5.4.5. The mimeview file mimeview specifies which programs are started when you click on an Open @@ -3721,39 +3668,37 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or The right side of each assignment holds a command to be executed for opening the file. The following substitutions are performed: - * %D. Document date + o %D. Document date - * %f. File name. This may be the name of a temporary file if it was + o %f. File name. This may be the name of a temporary file if it was necessary to create one (ie: to extract a subdocument from a container). - * %F. Original file name. Same as %f except if a temporary file is used. + o %F. Original file name. Same as %f except if a temporary file is used. - * %i. Internal path, for subdocuments of containers. The format depends + o %i. Internal path, for subdocuments of containers. The format depends on the container type. 
If this appears in the command line, Recoll will not create a temporary file to extract the subdocument, expecting the called application (possibly a script) to be able to handle it. - * %M. Mime type + o %M. Mime type - * %p. Page index. Only significant for a subset of document types, + o %p. Page index. Only significant for a subset of document types, currently only PDF, Postscript and DVI files. Can be used to start the editor at the right page for a match or snippet. - * %s. Search term. The value will only be set for documents with indexed + o %s. Search term. The value will only be set for documents with indexed page numbers (ie: PDF). The value will be one of the matched search terms. It would allow pre-setting the value in the "Find" entry inside Evince for example, for easy highlighting of the term. - * %U, %u. Url. + o %U, %u. Url. In addition to the predefined values above, all strings like %(fieldname) will be replaced by the value of the field named fieldname for the document. This could be used in combination with field customisation to help with opening the document. - ---------------------------------------------------------------------- - 5.4.6. Examples of configuration adjustments 5.4.6.1. Adding an external viewer for an non-indexed type @@ -3765,14 +3710,15 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or You need two entries in the configuration files for this to work: - * In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the + o In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the following line: .blob = application/x-blobapp Note that the mime type is made up here, and you could call it diesel/oil just the same. 
- * In $RECOLL_CONFDIR/mimeview under the [view] section, add: + + o In $RECOLL_CONFDIR/mimeview under the [view] section, add: application/x-blobapp = blobviewer %f @@ -3785,8 +3731,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or configuration, which you do not need to alter. mimeview can also be modified from the Gui. - ---------------------------------------------------------------------- - 5.4.6.2. Adding indexing support for a new file type Let us now imagine that the above .blob files actually contain indexable @@ -3795,16 +3739,16 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or alteration, and also to add data to the mimeconf file (typically in ~/.recoll/mimeconf): - * Under the [index] section, add the following line (more about the + o Under the [index] section, add the following line (more about the rclblob indexing script later): application/x-blobapp = exec rclblob - * Under the [icons] section, you should choose an icon to be displayed + o Under the [icons] section, you should choose an icon to be displayed for the files inside the result lists. Icons are normally 64x64 pixels PNG files which live in /usr/[local/]share/recoll/images. - * Under the [categories] section, you should add the mime type where it + o Under the [categories] section, you should add the mime type where it makes sense (you can also create a category). Categories may be used for filtering in advanced search. @@ -3815,5 +3759,3 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or The filter programming section describes in more detail how to write a filter. - - ----------------------------------------------------------------------
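As an illustration of the rclblob indexing script mentioned above, here is a minimal sketch of what a simple (exec) filter could look like in Python. Everything specific to the .blob format is made up for the example: it assumes that a .blob file is UTF-8 text whose first line is a title. It also assumes that the filter is called with the file name as its single argument, and it produces the default text/html output type with the encoding declared in the header. Treat it as a starting point under these assumptions, not as the way the filters shipped with Recoll are written.

```python
#!/usr/bin/env python3
"""Hypothetical rclblob: sketch of a simple (exec) filter for made-up .blob files."""
import html
import sys


def blob_to_html(raw: bytes) -> str:
    """Convert the bytes of a .blob file to a minimal HTML document.

    Assumption (not from the manual): a .blob file is UTF-8 text whose
    first line is the document title and the rest is the body.
    """
    text = raw.decode("utf-8", errors="replace")
    title, _, body = text.partition("\n")
    return (
        "<html><head>\n"
        f"<title>{html.escape(title.strip())}</title>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        "</head><body>\n"
        f"<pre>{html.escape(body)}</pre>\n"
        "</body></html>\n"
    )


def main(argv):
    """Read the file named in argv[1] and print its HTML form to stdout."""
    with open(argv[1], "rb") as f:
        converted = blob_to_html(f.read())
    # Write bytes so that the output is UTF-8 whatever the locale is.
    sys.stdout.buffer.write(converted.encode("utf-8"))
    return 0


# Only act as a filter when exactly one file name is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

Saved as rclblob somewhere in the PATH and made executable, the script can be tried by hand with rclblob somefile.blob before it is referenced from the [index] section. Escaping the title and body keeps stray < or & characters in the data from being read as markup by the indexer.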