release 3636
This commit is contained in:
parent
16b63b4e14
commit
b7511f6f17
2 changed files with 209 additions and 185 deletions
89
src/INSTALL
|
@ -81,7 +81,7 @@ Chapter 5. Installation and configuration
|
||||||
text file inside the configuration directory.
|
text file inside the configuration directory.
|
||||||
|
|
||||||
A list of common file types which need external commands follows. Many of
|
A list of common file types which need external commands follows. Many of
|
||||||
the filters need the iconv command, which is not always listed as a
|
the handlers need the iconv command, which is not always listed as a
|
||||||
dependency.
|
dependency.
|
||||||
|
|
||||||
Please note that, due to the relatively dynamic nature of this
|
Please note that, due to the relatively dynamic nature of this
|
||||||
|
@ -96,7 +96,7 @@ Chapter 5. Installation and configuration
|
||||||
http://www.recoll.org/features.html if a file type is important to you.
|
http://www.recoll.org/features.html if a file type is important to you.
|
||||||
|
|
||||||
As of Recoll release 1.14, a number of XML-based formats that were handled
|
As of Recoll release 1.14, a number of XML-based formats that were handled
|
||||||
by ad hoc filter code now use the xsltproc command, which usually comes
|
by ad hoc handler code now use the xsltproc command, which usually comes
|
||||||
with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
|
with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
|
||||||
|
|
||||||
Now for the list:
|
Now for the list:
|
||||||
|
@ -114,7 +114,7 @@ Chapter 5. Installation and configuration
|
||||||
it may be used as a fallback for some files which antiword does not
|
it may be used as a fallback for some files which antiword does not
|
||||||
handle.
|
handle.
|
||||||
|
|
||||||
o MS Excel and PowerPoint need catdoc.
|
o MS Excel and PowerPoint are processed by internal Python handlers.
|
||||||
|
|
||||||
o MS Open XML (docx) needs xsltproc.
|
o MS Open XML (docx) needs xsltproc.
|
||||||
|
|
||||||
|
@ -133,11 +133,8 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
o djvu files need djvutxt and djvused from the DjVuLibre package.
|
o djvu files need djvutxt and djvused from the DjVuLibre package.
|
||||||
|
|
||||||
o Audio files: Recoll releases before 1.13 used the id3info command from
|
o Audio files: Recoll releases 1.14 and later use a single Python
|
||||||
the id3lib package to extract mp3 tag information, metaflac (standard
|
handler based on mutagen for all audio file types.
|
||||||
flac tools) for flac files, and ogginfo (vorbis tools) for ogg files.
|
|
||||||
Releases 1.14 and later use a single Python filter based on mutagen
|
|
||||||
for all audio file types.
|
|
||||||
|
|
||||||
o Pictures: Recoll uses the Exiftool Perl package to extract tag
|
o Pictures: Recoll uses the Exiftool Perl package to extract tag
|
||||||
information. Most image file formats are supported. Note that there
|
information. Most image file formats are supported. Note that there
|
||||||
|
@ -145,7 +142,7 @@ Chapter 5. Installation and configuration
|
||||||
aperture, etc.). This is only of interest if you store personal tags
|
aperture, etc.). This is only of interest if you store personal tags
|
||||||
or textual descriptions inside the image files.
|
or textual descriptions inside the image files.
|
||||||
|
|
||||||
o chm: files in microsoft help format need Python and the pychm module
|
o chm: files in Microsoft help format need Python and the pychm module
|
||||||
(which needs chmlib).
|
(which needs chmlib).
|
||||||
|
|
||||||
o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
|
o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
|
||||||
|
@ -161,11 +158,11 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
o Konqueror webarchive format with Python (uses the Tarfile module).
|
o Konqueror webarchive format with Python (uses the Tarfile module).
|
||||||
|
|
||||||
o mimehtml web archive format (support based on the email filter, which
|
o Mimehtml web archive format (support based on the email handler, which
|
||||||
introduces some mild weirdness, but still usable).
|
introduces some mild weirdness, but still usable).
|
||||||
|
|
||||||
Text, HTML, email folders, and Scribus files are processed internally. Lyx
|
Text, HTML, email folders, and Scribus files are processed internally. Lyx
|
||||||
is used to index Lyx files. Many filters need iconv and the standard sed
|
is used to index Lyx files. Many handlers need iconv and the standard sed
|
||||||
and awk.
|
and awk.
|
||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
@ -515,10 +512,10 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
A space-separated list of patterns for names of files or
|
A space-separated list of patterns for names of files or
|
||||||
directories that should be ignored inside zip archives. This is
|
directories that should be ignored inside zip archives. This is
|
||||||
used directly by the zip filter, and has a function similar to
|
used directly by the zip handler, and has a function similar to
|
||||||
skippedNames, but works independently. Can be redefined for
|
skippedNames, but works independently. Can be redefined for
|
||||||
filesystem subdirectories. For versions up to 1.19, you will need
|
filesystem subdirectories. For versions up to 1.19, you will need
|
||||||
to update the Zip filter and install a supplementary Python
|
to update the Zip handler and install a supplementary Python
|
||||||
module. The details are described on the Recoll wiki.
|
module. The details are described on the Recoll wiki.
|
||||||
|
|
||||||
followLinks
|
followLinks
|
||||||
|
@ -533,11 +530,16 @@ Chapter 5. Installation and configuration
|
||||||
indexedmimetypes
|
indexedmimetypes
|
||||||
|
|
||||||
Recoll normally indexes any file which it knows how to read. This
|
Recoll normally indexes any file which it knows how to read. This
|
||||||
list lets you restrict the indexed mime types to what you specify.
|
list lets you restrict the indexed MIME types to what you specify.
|
||||||
If the variable is unspecified or the list empty (the default),
|
If the variable is unspecified or the list empty (the default),
|
||||||
all supported types are processed. Can be redefined for
|
all supported types are processed. Can be redefined for
|
||||||
subdirectories.
|
subdirectories.
|
||||||
|
|
||||||
|
excludedmimetypes
|
||||||
|
|
||||||
|
This list lets you exclude some MIME types from indexing. Can be
|
||||||
|
redefined for subdirectories.
|
||||||
|
|
||||||
compressedfilemaxkbs
|
compressedfilemaxkbs
|
||||||
|
|
||||||
Size limit for compressed (.gz or .bz2) files. These need to be
|
Size limit for compressed (.gz or .bz2) files. These need to be
|
||||||
|
@ -570,14 +572,14 @@ Chapter 5. Installation and configuration
|
||||||
Recoll indexes file names in a special section of the database to
|
Recoll indexes file names in a special section of the database to
|
||||||
allow specific file names searches using wild cards. This
|
allow specific file names searches using wild cards. This
|
||||||
parameter decides if file name indexing is performed only for
|
parameter decides if file name indexing is performed only for
|
||||||
files with mime types that would qualify them for full text
|
files with MIME types that would qualify them for full text
|
||||||
indexing, or for all files inside the selected subtrees,
|
indexing, or for all files inside the selected subtrees,
|
||||||
independently of mime type.
|
independently of MIME type.
|
||||||
|
|
||||||
usesystemfilecommand
|
usesystemfilecommand
|
||||||
|
|
||||||
Decide if we use the file -i system command as a final step for
|
Decide if we use the file -i system command as a final step for
|
||||||
determining the mime type for a file (the main procedure uses
|
determining the MIME type for a file (the main procedure uses
|
||||||
suffix associations as defined in the mimemap file). This can be
|
suffix associations as defined in the mimemap file). This can be
|
||||||
useful for files with suffix-less names, but it will also cause
|
useful for files with suffix-less names, but it will also cause
|
||||||
the indexing of many bogus "text" files.
|
the indexing of many bogus "text" files.
|
||||||
|
@ -790,6 +792,9 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
This is only used by the web browser plugin indexing code, and
|
This is only used by the web browser plugin indexing code, and
|
||||||
defines the maximum size for the web page cache. Default: 40 MB.
|
defines the maximum size for the web page cache. Default: 40 MB.
|
||||||
|
Quite unfortunately, this is only taken into account when creating
|
||||||
|
the cache file. You need to delete the file for a change to be
|
||||||
|
taken into account.
|
||||||
|
|
||||||
idxflushmb
|
idxflushmb
|
||||||
|
|
||||||
|
@ -929,15 +934,15 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
filtermaxseconds
|
filtermaxseconds
|
||||||
|
|
||||||
Maximum filter execution time, after which it is aborted. Some
|
Maximum handler execution time, after which it is aborted. Some
|
||||||
postscript programs just loop...
|
postscript programs just loop...
|
||||||
|
|
||||||
filtersdir
|
filtersdir
|
||||||
|
|
||||||
A directory to search for the external filter scripts used to
|
A directory to search for the external input handler scripts used
|
||||||
index some types of files. The value should not be changed, except
|
to index some types of files. The value should not be changed,
|
||||||
if you want to modify one of the default scripts. The value can be
|
except if you want to modify one of the default scripts. The value
|
||||||
redefined for any sub-directory.
|
can be redefined for any sub-directory.
|
||||||
|
|
||||||
iconsdir
|
iconsdir
|
||||||
|
|
||||||
|
@ -1018,17 +1023,17 @@ Chapter 5. Installation and configuration
|
||||||
This section defines lists of synonyms for the canonical names
|
This section defines lists of synonyms for the canonical names
|
||||||
used inside the [prefixes] and [stored] sections
|
used inside the [prefixes] and [stored] sections
|
||||||
|
|
||||||
filter-specific sections
|
handler-specific sections
|
||||||
|
|
||||||
Some filters may need specific configuration for handling fields.
|
Some input handlers may need specific configuration for handling
|
||||||
Only the email message filter currently has such a section (named
|
fields. Only the email message handler currently has such a
|
||||||
[mail]). It allows indexing arbitrary email headers in addition to
|
section (named [mail]). It allows indexing arbitrary email headers
|
||||||
the ones indexed by default. Other such sections may appear in the
|
in addition to the ones indexed by default. Other such sections
|
||||||
future.
|
may appear in the future.
|
||||||
|
|
||||||
Here follows a small example of a personal fields file. This would extract
|
Here follows a small example of a personal fields file. This would extract
|
||||||
a specific email header and use it as a searchable field, with data
|
a specific email header and use it as a searchable field, with data
|
||||||
displayable inside result lists. (Side note: as the email filter does no
|
displayable inside result lists. (Side note: as the email handler does no
|
||||||
decoding on the values, only plain ascii headers can be indexed, and only
|
decoding on the values, only plain ascii headers can be indexed, and only
|
||||||
the first occurrence will be used for headers that occur several times).
|
the first occurrence will be used for headers that occur several times).
|
||||||
|
|
||||||
|
@ -1060,10 +1065,10 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
5.4.3. The mimemap file
|
5.4.3. The mimemap file
|
||||||
|
|
||||||
mimemap specifies the file name extension to mime type mappings.
|
mimemap specifies the file name extension to MIME type mappings.
|
||||||
|
|
||||||
For file names without an extension, or with an unknown one, the system's
|
For file names without an extension, or with an unknown one, the system's
|
||||||
file -i command will be executed to determine the mime type (this can be
|
file -i command will be executed to determine the MIME type (this can be
|
||||||
switched off inside the main configuration file).
|
switched off inside the main configuration file).
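
The lookup order described above (suffix associations first, then the
system's file -i command as a last resort) can be sketched as follows. This
is only an illustration, not Recoll's actual code: SUFFIX_MAP and
guess_mime_type are hypothetical names, the table stands in for entries read
from mimemap, and the -b (brief) flag is added only to make the output of
file easier to parse.

```python
import os
import subprocess

# Hypothetical suffix table, standing in for entries read from mimemap.
SUFFIX_MAP = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".pdf": "application/pdf",
}


def guess_mime_type(path, suffix_map=SUFFIX_MAP, use_system_file_command=True):
    """Return a MIME type for path, or None if none can be determined."""
    # Step 1: suffix association, as defined in the mimemap file.
    suffix = os.path.splitext(path)[1]
    if suffix in suffix_map:
        return suffix_map[suffix]

    # Step 2: optional fallback to the system's "file -i" command.
    if not use_system_file_command:
        return None
    try:
        result = subprocess.run(
            ["file", "-b", "-i", path],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    # Typical brief output: "text/plain; charset=us-ascii"
    mime = result.stdout.split(";")[0].strip()
    return mime or None
```

Passing use_system_file_command=False mirrors switching the fallback off in
the main configuration file: names without a known suffix then yield no
MIME type at all.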
|
||||||
|
|
||||||
The mappings can be specified on a per-subtree basis, which may be useful
|
The mappings can be specified on a per-subtree basis, which may be useful
|
||||||
|
@ -1084,7 +1089,7 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
5.4.4. The mimeconf file
|
5.4.4. The mimeconf file
|
||||||
|
|
||||||
mimeconf specifies how the different mime types are handled for indexing,
|
mimeconf specifies how the different MIME types are handled for indexing,
|
||||||
and which icons are displayed in the recoll result lists.
|
and which icons are displayed in the recoll result lists.
|
||||||
|
|
||||||
Changing the parameters in the [index] section is probably not a good idea
|
Changing the parameters in the [index] section is probably not a good idea
|
||||||
|
@ -1108,7 +1113,7 @@ Chapter 5. Installation and configuration
|
||||||
Recoll GUI preferences, all mimeview entries will be ignored except the
|
Recoll GUI preferences, all mimeview entries will be ignored except the
|
||||||
one labelled application/x-all (which is set to use xdg-open by default).
|
one labelled application/x-all (which is set to use xdg-open by default).
|
||||||
|
|
||||||
In this case, the xallexcepts top level variable defines a list of mime
|
In this case, the xallexcepts top level variable defines a list of MIME
|
||||||
type exceptions which will be processed according to the local entries
|
type exceptions which will be processed according to the local entries
|
||||||
instead of being passed to the desktop. This is so that specific Recoll
|
instead of being passed to the desktop. This is so that specific Recoll
|
||||||
options such as a page number or a search string can be passed to
|
options such as a page number or a search string can be passed to
|
||||||
|
@ -1121,13 +1126,13 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
All viewer definition entries must be placed under a [view] section.
|
All viewer definition entries must be placed under a [view] section.
|
||||||
|
|
||||||
The keys in the file are normally mime types. You can add an application
|
The keys in the file are normally MIME types. You can add an application
|
||||||
tag to specialize the choice for an area of the filesystem (using a
|
tag to specialize the choice for an area of the filesystem (using a
|
||||||
localfields specification in mimeconf). The syntax for the key is
|
localfields specification in mimeconf). The syntax for the key is
|
||||||
mimetype|tag
|
mimetype|tag
|
||||||
|
|
||||||
The nouncompforviewmts entry, (placed at the top level, outside of the
|
The nouncompforviewmts entry, (placed at the top level, outside of the
|
||||||
[view] section), holds a list of mime types that should not be
|
[view] section), holds a list of MIME types that should not be
|
||||||
uncompressed before starting the viewer (if they are found compressed, ie:
|
uncompressed before starting the viewer (if they are found compressed, ie:
|
||||||
mydoc.doc.gz).
|
mydoc.doc.gz).
|
||||||
|
|
||||||
|
@ -1147,7 +1152,7 @@ Chapter 5. Installation and configuration
|
||||||
will not create a temporary file to extract the subdocument, expecting
|
will not create a temporary file to extract the subdocument, expecting
|
||||||
the called application (possibly a script) to be able to handle it.
|
the called application (possibly a script) to be able to handle it.
|
||||||
|
|
||||||
o %M. Mime type
|
o %M. MIME type
|
||||||
|
|
||||||
o %p. Page index. Only significant for a subset of document types,
|
o %p. Page index. Only significant for a subset of document types,
|
||||||
currently only PDF, Postscript and DVI files. Can be used to start the
|
currently only PDF, Postscript and DVI files. Can be used to start the
|
||||||
|
@ -1200,7 +1205,7 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
.blob = application/x-blobapp
|
.blob = application/x-blobapp
|
||||||
|
|
||||||
Note that the mime type is made up here, and you could call it
|
Note that the MIME type is made up here, and you could call it
|
||||||
diesel/oil just the same.
|
diesel/oil just the same.
|
||||||
|
|
||||||
o In $RECOLL_CONFDIR/mimeview under the [view] section, add:
|
o In $RECOLL_CONFDIR/mimeview under the [view] section, add:
|
||||||
|
@ -1211,7 +1216,7 @@ Chapter 5. Installation and configuration
|
||||||
would use %u if it liked URLs better.
|
would use %u if it liked URLs better.
|
||||||
|
|
||||||
If you just wanted to change the application used by Recoll to display a
|
If you just wanted to change the application used by Recoll to display a
|
||||||
mime type which it already knows, you would just need to edit mimeview.
|
MIME type which it already knows, you would just need to edit mimeview.
|
||||||
The entries you add in your personal file override those in the central
|
The entries you add in your personal file override those in the central
|
||||||
configuration, which you do not need to alter. mimeview can also be
|
configuration, which you do not need to alter. mimeview can also be
|
||||||
modified from the GUI.
|
modified from the GUI.
|
||||||
|
@ -1233,17 +1238,17 @@ Chapter 5. Installation and configuration
|
||||||
for the files inside the result lists. Icons are normally 64x64 pixels
|
for the files inside the result lists. Icons are normally 64x64 pixels
|
||||||
PNG files which live in /usr/[local/]share/recoll/images.
|
PNG files which live in /usr/[local/]share/recoll/images.
|
||||||
|
|
||||||
o Under the [categories] section, you should add the mime type where it
|
o Under the [categories] section, you should add the MIME type where it
|
||||||
makes sense (you can also create a category). Categories may be used
|
makes sense (you can also create a category). Categories may be used
|
||||||
for filtering in advanced search.
|
for filtering in advanced search.
|
||||||
|
|
||||||
The rclblob filter should be an executable program or script which exists
|
The rclblob handler should be an executable program or script which exists
|
||||||
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
||||||
argument and should output the text or html contents on the standard
|
argument and should output the text or html contents on the standard
|
||||||
output.
|
output.
|
||||||
|
|
||||||
The filter programming section describes in more detail how to write a
|
The filter programming section describes in more detail how to write an
|
||||||
filter.
|
input handler.
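
As a rough illustration of the contract described above (one file name as
argument, text or HTML contents on the standard output), a handler for the
made-up blob type could look like the sketch below. Everything here is an
assumption made for the example: the blob format is imaginary and is treated
as UTF-8 text, and blob_to_html and main are hypothetical names, not part of
Recoll.

```python
import html
import sys


def blob_to_html(data):
    """Wrap the (assumed UTF-8) contents of a blob file in minimal HTML."""
    text = data.decode("utf-8", errors="replace")
    return (
        "<html><head><title></title></head><body><pre>"
        + html.escape(text)
        + "</pre></body></html>\n"
    )


def main(argv):
    """Handler entry point: argv[1] is the file to convert."""
    if len(argv) != 2:
        print("usage: rclblob <file>", file=sys.stderr)
        return 1
    with open(argv[1], "rb") as f:
        sys.stdout.write(blob_to_html(f.read()))
    return 0


# A real script would end with: sys.exit(main(sys.argv))
```

The conversion is kept in its own function so that it can be checked without
running the script as a subprocess.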
|
||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
305
src/README
|
@ -134,15 +134,15 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||||
|
|
||||||
4. Programming interface
|
4. Programming interface
|
||||||
|
|
||||||
4.1. Writing a document filter
|
4.1. Writing a document input handler
|
||||||
|
|
||||||
4.1.1. Simple filters
|
4.1.1. Simple input handlers
|
||||||
|
|
||||||
4.1.2. "Multiple" filters
|
4.1.2. "Multiple" handlers
|
||||||
|
|
||||||
4.1.3. Telling Recoll about the filter
|
4.1.3. Telling Recoll about the handler
|
||||||
|
|
||||||
4.1.4. Filter HTML output
|
4.1.4. Input handler HTML output
|
||||||
|
|
||||||
4.1.5. Page numbers
|
4.1.5. Page numbers
|
||||||
|
|
||||||
|
@ -259,7 +259,7 @@ Chapter 1. Introduction
|
||||||
|
|
||||||
Recoll stores all internal data in Unicode UTF-8 format, and it can index
|
Recoll stores all internal data in Unicode UTF-8 format, and it can index
|
||||||
files with different character sets, encodings, and languages into the
|
files with different character sets, encodings, and languages into the
|
||||||
same index. It has input filters for many document types.
|
same index. It can process many document types.
|
||||||
|
|
||||||
Stemming is the process by which Recoll reduces words to their radicals so
|
Stemming is the process by which Recoll reduces words to their radicals so
|
||||||
that searching does not depend, for example, on a word being singular or
|
that searching does not depend, for example, on a word being singular or
|
||||||
|
@ -418,13 +418,13 @@ Chapter 2. Indexing
|
||||||
|
|
||||||
Excluding types can be done by adding wildcard name patterns to the
|
Excluding types can be done by adding wildcard name patterns to the
|
||||||
skippedNames list, which can be done from the GUI Index configuration
|
skippedNames list, which can be done from the GUI Index configuration
|
||||||
menu. It is also possible to exclude a mime type independantly of the file
|
menu. For versions 1.20 and later, you can alternatively set the
|
||||||
name by associating it with the rclnull filter. This can be done by
|
excludedmimetypes list in the configuration file. This can be redefined
|
||||||
editing the mimeconf configuration file.
|
for subdirectories.
|
||||||
|
|
||||||
In order to define a positive list, You need to edit the main
|
You can also define an exclusive list of MIME types to be indexed (no
|
||||||
configuration file (recoll.conf) and set the indexedmimetypes
|
others will be indexed), by setting the indexedmimetypes configuration
|
||||||
configuration variable. Example:
|
variable. Example:
|
||||||
|
|
||||||
indexedmimetypes = text/html application/pdf
|
indexedmimetypes = text/html application/pdf
|
||||||
|
|
||||||
|
@ -436,10 +436,11 @@ Chapter 2. Indexing
|
||||||
|
|
||||||
|
|
||||||
(When using sections like this, don't forget that they remain in effect
|
(When using sections like this, don't forget that they remain in effect
|
||||||
until the end of the file or another section indicator). There is no GUI
|
until the end of the file or another section indicator).
|
||||||
way to edit the parameter, because this option runs contrary to Recoll
|
|
||||||
main goal which is to help you find information, independantly of how it
|
excludedmimetypes or indexedmimetypes can be set either by editing the
|
||||||
may be stored.
|
main configuration file (recoll.conf), or from the GUI index configuration
|
||||||
|
tool.
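
The effect of the two lists can be modelled with a few lines of Python. This
is a sketch of the behaviour described in this section, not Recoll's
implementation: is_indexed is a hypothetical name, and the rule that an
exclusion wins when a type appears in both lists is an assumption, since the
text does not say how the two settings interact.

```python
def is_indexed(mime_type, indexedmimetypes="", excludedmimetypes=""):
    """Decide whether a supported MIME type would be indexed.

    Both parameters are space-separated lists, written the same way as in
    the "indexedmimetypes = text/html application/pdf" example.
    """
    indexed = indexedmimetypes.split()
    excluded = excludedmimetypes.split()
    # Assumption: an explicit exclusion takes precedence.
    if mime_type in excluded:
        return False
    # An empty or unset indexedmimetypes list means "all supported types".
    if indexed and mime_type not in indexed:
        return False
    return True
```

With indexedmimetypes set to "text/html application/pdf", a text/plain file
is skipped, while leaving both lists empty keeps every supported type.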
|
||||||
|
|
||||||
2.1.4. Recovery
|
2.1.4. Recovery
|
||||||
|
|
||||||
|
@ -702,7 +703,7 @@ Chapter 2. Indexing
|
||||||
|
|
||||||
mime_type
|
mime_type
|
||||||
|
|
||||||
If set, this overrides any other determination of the file mime
|
If set, this overrides any other determination of the file MIME
|
||||||
type.
|
type.
|
||||||
|
|
||||||
charset
|
charset
|
||||||
|
@ -1018,11 +1019,11 @@ Chapter 3. Searching
|
||||||
you prefer to completely customize the choice of applications, you can
|
you prefer to completely customize the choice of applications, you can
|
||||||
uncheck the Use desktop preferences option in the GUI preferences dialog,
|
uncheck the Use desktop preferences option in the GUI preferences dialog,
|
||||||
and click the Choose editor applications button to adjust the predefined
|
and click the Choose editor applications button to adjust the predefined
|
||||||
Recoll choices. The tool accepts multiple selections of mime types (e.g.
|
Recoll choices. The tool accepts multiple selections of MIME types (e.g.
|
||||||
to set up the editor for the dozens of office file types).
|
to set up the editor for the dozens of office file types).
|
||||||
|
|
||||||
Even when Use desktop preferences is checked, there is a small list of
|
Even when Use desktop preferences is checked, there is a small list of
|
||||||
exceptions, for mime types where the Recoll choice should override the
|
exceptions, for MIME types where the Recoll choice should override the
|
||||||
desktop one. These are applications which are well integrated with Recoll,
|
desktop one. These are applications which are well integrated with Recoll,
|
||||||
especially evince for viewing PDF and Postscript files because of its
|
especially evince for viewing PDF and Postscript files because of its
|
||||||
support for opening the document at a specific page and passing a search
|
support for opening the document at a specific page and passing a search
|
||||||
|
@ -1242,7 +1243,7 @@ Chapter 3. Searching
|
||||||
specifying multiple clauses which are combined to build the search.
|
specifying multiple clauses which are combined to build the search.
|
||||||
|
|
||||||
2. The second tab lets you filter the results according to file size, date of
|
2. The second tab lets you filter the results according to file size, date of
|
||||||
modification, mime type, or location.
|
modification, MIME type, or location.
|
||||||
|
|
||||||
Click on the Start Search button in the advanced search dialog, or type
|
Click on the Start Search button in the advanced search dialog, or type
|
||||||
Enter in any text field to start the search. The button in the main window
|
Enter in any text field to start the search. The button in the main window
|
||||||
|
@ -1305,8 +1306,8 @@ Chapter 3. Searching
|
||||||
can use suffix multipliers: k/K, m/M, g/G, t/T for 1E3, 1E6, 1E9, 1E12
|
can use suffix multipliers: k/K, m/M, g/G, t/T for 1E3, 1E6, 1E9, 1E12
|
||||||
respectively.
|
respectively.
|
||||||
|
|
||||||
o The next section allows filtering the results by their mime types, or
|
o The next section allows filtering the results by their MIME types, or
|
||||||
mime categories (ie: media/text/message/etc.).
|
MIME categories (ie: media/text/message/etc.).
|
||||||
|
|
||||||
You can transfer the types between two boxes, to define which will be
|
You can transfer the types between two boxes, to define which will be
|
||||||
included or excluded by the search.
|
included or excluded by the search.
|
||||||
|
@ -1647,7 +1648,7 @@ Chapter 3. Searching
|
||||||
an appropriate application.
|
an appropriate application.
|
||||||
|
|
||||||
o Exceptions: when using the desktop preferences for opening documents,
|
o Exceptions: when using the desktop preferences for opening documents,
|
||||||
these are mime types that will still be opened according to Recoll
|
these are MIME types that will still be opened according to Recoll
|
||||||
preferences. This is useful for passing parameters like page numbers
|
preferences. This is useful for passing parameters like page numbers
|
||||||
or search strings to applications that support them (e.g. evince).
|
or search strings to applications that support them (e.g. evince).
|
||||||
This cannot be done with xdg-open which only supports passing one
|
This cannot be done with xdg-open which only supports passing one
|
||||||
|
@ -1789,7 +1790,7 @@ Chapter 3. Searching
|
||||||
|
|
||||||
o %D. Date
|
o %D. Date
|
||||||
|
|
||||||
o %I. Icon image name. This is normally determined from the mime type.
|
o %I. Icon image name. This is normally determined from the MIME type.
|
||||||
The associations are defined inside the mimeconf configuration file.
|
The associations are defined inside the mimeconf configuration file.
|
||||||
If a thumbnail for the file is found at the standard Freedesktop
|
If a thumbnail for the file is found at the standard Freedesktop
|
||||||
location, this will be displayed instead.
|
location, this will be displayed instead.
|
||||||
|
@ -1798,7 +1799,7 @@ Chapter 3. Searching
|
||||||
|
|
||||||
o %L. Precooked Preview, Edit, and possibly Snippets links
|
o %L. Precooked Preview, Edit, and possibly Snippets links
|
||||||
|
|
||||||
o %M. Mime type
|
o %M. MIME type
|
||||||
|
|
||||||
o %N. result Number inside the result page
|
o %N. result Number inside the result page
|
||||||
|
|
||||||
|
@ -1824,7 +1825,7 @@ Chapter 3. Searching
|
||||||
stored by default, apart from the values above (only author and filename),
|
stored by default, apart from the values above (only author and filename),
|
||||||
so this feature will need some custom local configuration to be useful. An
|
so this feature will need some custom local configuration to be useful. An
|
||||||
example candidate would be the recipient field which is generated by the
|
example candidate would be the recipient field which is generated by the
|
||||||
message filters.
|
message input handlers.
|
||||||
|
|
||||||
The default value for the paragraph format string is:
|
The default value for the paragraph format string is:
|
||||||
|
|
||||||
|
@ -1949,6 +1950,8 @@ Chapter 3. Searching
|
||||||
-m : dump the whole document meta[] array for each result
|
-m : dump the whole document meta[] array for each result
|
||||||
-A : output the document abstracts
|
-A : output the document abstracts
|
||||||
-S fld : sort by field <fld>
|
-S fld : sort by field <fld>
|
||||||
|
-s stemlang : set stemming language to use (must exist in index...)
|
||||||
|
Use -s "" to turn off stem expansion
|
||||||
-D : sort descending
|
-D : sort descending
|
||||||
-i <dbdir> : additional index, several can be given
|
-i <dbdir> : additional index, several can be given
|
||||||
-e use url encoding (%xx) for urls
|
-e use url encoding (%xx) for urls
|
||||||
|
@ -2139,7 +2142,7 @@ Chapter 3. Searching
|
||||||
|
|
||||||
Periods can also be specified with small letters (ie: p2y).
|
Periods can also be specified with small letters (ie: p2y).
|
||||||
|
|
||||||
o mime or format for specifying the mime type. This one is quite special
|
o mime or format for specifying the MIME type. This one is quite special
|
||||||
because you can specify several values which will be OR'ed (the normal
|
because you can specify several values which will be OR'ed (the normal
|
||||||
default for the language is AND). Ex: mime:text/plain mime:text/html.
|
default for the language is AND). Ex: mime:text/plain mime:text/html.
|
||||||
Specifying an explicit boolean operator before a mime specification is
|
Specifying an explicit boolean operator before a mime specification is
|
||||||
|
@ -2149,11 +2152,11 @@ Chapter 3. Searching
|
||||||
with an OR default. You do need to use OR with ext terms for example.
|
with an OR default. You do need to use OR with ext terms for example.
|
||||||
|
|
||||||
o type or rclcat for specifying the category (as in
|
o type or rclcat for specifying the category (as in
|
||||||
text/media/presentation/etc.). The classification of mime types in
|
text/media/presentation/etc.). The classification of MIME types in
|
||||||
categories is defined in the Recoll configuration (mimeconf), and can
|
categories is defined in the Recoll configuration (mimeconf), and can
|
||||||
be modified or extended. The default category names are those which
|
be modified or extended. The default category names are those which
|
||||||
permit filtering results in the main GUI screen. Categories are OR'ed
|
permit filtering results in the main GUI screen. Categories are OR'ed
|
||||||
like mime types above. This can't be negated with - either.
|
like MIME types above. This can't be negated with - either.
|
||||||
|
|
||||||
Words inside phrases and capitalized words are not stem-expanded.
|
Words inside phrases and capitalized words are not stem-expanded.
|
||||||
Wildcards may be used anywhere inside a term. Specifying a wild-card on
|
Wildcards may be used anywhere inside a term. Specifying a wild-card on
|
||||||
|
@ -2161,9 +2164,9 @@ Chapter 3. Searching
|
||||||
one if the expansion is truncated because of excessive size). Also see
|
one if the expansion is truncated because of excessive size). Also see
|
||||||
More about wildcards.
|
More about wildcards.
|
||||||
|
|
||||||
The document filters used while indexing have the possibility to create
|
The document input handlers used while indexing have the possibility to
|
||||||
other fields with arbitrary names, and aliases may be defined in the
|
create other fields with arbitrary names, and aliases may be defined in
|
||||||
configuration, so that the exact field search possibilities may be
|
the configuration, so that the exact field search possibilities may be
|
||||||
different for you if someone took care of the customisation.
|
different for you if someone took care of the customisation.
|
||||||
|
|
||||||
3.5.1. Modifiers
|
3.5.1. Modifiers
|
||||||
|
@ -2378,81 +2381,91 @@ Chapter 4. Programming interface
|
||||||
Recoll has an Application Programming Interface, usable both for indexing
|
Recoll has an Application Programming Interface, usable both for indexing
|
||||||
and searching, currently accessible from the Python language.
|
and searching, currently accessible from the Python language.
|
||||||
|
|
||||||
Another less radical way to extend the application is to write filters for
|
Another less radical way to extend the application is to write input
|
||||||
new types of documents.
|
handlers for new types of documents.
|
||||||
|
|
||||||
The processing of metadata attributes for documents (fields) is highly
|
The processing of metadata attributes for documents (fields) is highly
|
||||||
configurable.
|
configurable.
|
||||||
|
|
||||||
4.1. Writing a document filter
|
4.1. Writing a document input handler
|
||||||
|
|
||||||
Recoll filters cooperate to translate from the multitude of input document
|
Terminology
|
||||||
formats, simple ones as opendocument, acrobat), or compound ones such as
|
|
||||||
Zip or Email, into the final Recoll indexing input format, which may be
|
The small programs or pieces of code which handle the processing of the
|
||||||
text/plain or text/html. Most filters are executable programs or scripts.
|
different document types for Recoll used to be called filters, which is
|
||||||
A few filters are coded in C++ and live inside recollindex. This latter
|
still reflected in the name of the directory which holds them and many
|
||||||
|
configuration variables. They were named this way because one of their
|
||||||
|
primary functions is to filter out the formatting directives and keep the
|
||||||
|
text content. However these modules may have other behaviours, and the
|
||||||
|
term input handler is now progressively substituted in the documentation.
|
||||||
|
The term filter is still used in many places though.
|
||||||
|
|
||||||
|
Recoll input handlers cooperate to translate from the multitude of input
|
||||||
|
document formats, simple ones such as opendocument or acrobat, or compound ones
|
||||||
|
such as Zip or Email, into the final Recoll indexing input format, which
|
||||||
|
is plain text. Most input handlers are executable programs or scripts. A
|
||||||
|
few handlers are coded in C++ and live inside recollindex. This latter
|
||||||
kind will not be described here.
|
kind will not be described here.
|
||||||
|
|
||||||
There are currently (1.18 and since 1.13) two kinds of external executable
|
There are currently (1.18 and since 1.13) two kinds of external executable
|
||||||
filters:
|
input handlers:
|
||||||
|
|
||||||
o Simple filters (exec filters) run once and exit. They can be bare
|
o Simple exec handlers run once and exit. They can be bare programs like
|
||||||
programs like antiword, or scripts using other programs. They are very
|
antiword, or scripts using other programs. They are very simple to
|
||||||
simple to write, because they just need to print the converted
|
write, because they just need to print the converted document to the
|
||||||
document to the standard output. Their output can be text/plain or
|
standard output. Their output can be plain text or HTML. HTML is
|
||||||
text/html.
|
usually preferred because it can store metadata fields and it allows
|
||||||
|
preserving some of the formatting for the GUI preview.
|
||||||
|
|
||||||
o Multiple filters (execm filters), run as long as their master process
|
o Multiple execm handlers can process multiple files (sparing the
|
||||||
(recollindex) is active. They can process multiple files (sparing the
|
|
||||||
process startup time which can be very significant), or multiple
|
process startup time which can be very significant), or multiple
|
||||||
documents per file (e.g.: for zip or chm files). They communicate with
|
documents per file (e.g.: for zip or chm files). They communicate with
|
||||||
the indexer through a simple protocol, but are nevertheless a bit more
|
the indexer through a simple protocol, but are nevertheless a bit more
|
||||||
complicated than the older kind. Most of new filters are written in
|
complicated than the older kind. Most new handlers are written in
|
||||||
Python, using a common module to handle the protocol. There is an
|
Python, using a common module to handle the protocol. There is an
|
||||||
exception, rclimg which is written in Perl. The subdocuments output by
|
exception, rclimg which is written in Perl. The subdocuments output by
|
||||||
these filters can be directly indexable (text or HTML), or they can be
|
these handlers can be directly indexable (text or HTML), or they can
|
||||||
other simple or compound documents that will need to be processed by
|
be other simple or compound documents that will need to be processed
|
||||||
another filter.
|
by another handler.
|
||||||
|
|
||||||
In both cases, filters deal with regular file system files, and can
|
In both cases, handlers deal with regular file system files, and can
|
||||||
process either a single document, or a linear list of documents in each
|
process either a single document, or a linear list of documents in each
|
||||||
file. Recoll is responsible for performing up-to-date checks, dealing with
|
file. Recoll is responsible for performing up-to-date checks, dealing with
|
||||||
more complex embedding and other upper level issues.
|
more complex embedding and other upper level issues.
|
||||||
|
|
||||||
In the extreme case of a simple filter returning a document in text/plain
|
A simple handler returning a document in text/plain format can transfer
|
||||||
format, no metadata can be transferred from the filter to the indexer.
|
no metadata to the indexer. Generic metadata, like document size or
|
||||||
Generic metadata, like document size or modification date, will be
|
modification date, will be gathered and stored by the indexer.
|
||||||
gathered and stored by the indexer.
|
|
||||||
|
|
||||||
Filters that produce text/html format can return an arbitrary amount of
|
Handlers that produce text/html format can return an arbitrary amount of
|
||||||
metadata inside HTML meta tags. These will be processed according to the
|
metadata inside HTML meta tags. These will be processed according to the
|
||||||
directives found in the fields configuration file.
|
directives found in the fields configuration file.
|
||||||
|
|
||||||
The filters that can handle multiple documents per file return a single
|
The handlers that can handle multiple documents per file return a single
|
||||||
piece of data to identify each document inside the file. This piece of
|
piece of data to identify each document inside the file. This piece of
|
||||||
data, called an ipath element will be sent back by Recoll to extract the
|
data, called an ipath element will be sent back by Recoll to extract the
|
||||||
document at query time, for previewing, or for creating a temporary file
|
document at query time, for previewing, or for creating a temporary file
|
||||||
to be opened by a viewer.
|
to be opened by a viewer.
|
||||||
|
|
||||||
The following section describes the simple filters, and the next one gives
|
The following section describes the simple handlers, and the next one
|
||||||
a few explanations about the execm ones. You could conceivably write a
|
gives a few explanations about the execm ones. You could conceivably write
|
||||||
simple filter with only the elements in the manual. This will not be the
|
a simple handler with only the elements in the manual. This will not be
|
||||||
case for the other ones, for which you will have to look at the code.
|
the case for the other ones, for which you will have to look at the code.
|
||||||
|
|
||||||
4.1.1. Simple filters
|
4.1.1. Simple input handlers
|
||||||
|
|
||||||
Recoll simple filters are usually shell-scripts, but this is in no way
|
Recoll simple handlers are usually shell-scripts, but this is in no way
|
||||||
necessary. Extracting the text from the native format is the difficult
|
necessary. Extracting the text from the native format is the difficult
|
||||||
part. Outputting the format expected by Recoll is trivial. Happily enough,
|
part. Outputting the format expected by Recoll is trivial. Happily enough,
|
||||||
most document formats have translators or text extractors which can be
|
most document formats have translators or text extractors which can be
|
||||||
called from the filter. In some cases the output of the translating
|
called from the handler. In some cases the output of the translating
|
||||||
program is completely appropriate, and no intermediate shell-script is
|
program is completely appropriate, and no intermediate shell-script is
|
||||||
needed.
|
needed.
|
||||||
|
|
||||||
Filters are called with a single argument which is the source file name.
|
Input handlers are called with a single argument which is the source file
|
||||||
They should output the result to stdout.
|
name. They should output the result to stdout.
|
||||||
|
|
||||||
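The calling convention above (one file name argument, result on stdout) can be sketched in Python. This is only an illustration, not one of the handlers shipped with Recoll: the function names to_html and main are hypothetical, and the sketch assumes the input is a text file which just needs wrapping in minimal HTML.

```python
import html
import sys


def to_html(text, title):
    """Wrap already extracted text in a minimal HTML document."""
    return (
        "<html><head>\n"
        f"<title>{html.escape(title)}</title>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        "</head><body>\n"
        f"<pre>{html.escape(text)}</pre>\n"
        "</body></html>\n"
    )


def main(argv):
    """Entry point: argv[1] is the source file name, output goes to stdout."""
    if len(argv) != 2:
        sys.stderr.write("usage: myhandler <file>\n")
        return 1
    # Assumption for this sketch: the source is text readable as UTF-8.
    with open(argv[1], "r", encoding="utf-8", errors="replace") as f:
        text = f.read()
    sys.stdout.write(to_html(text, argv[1]))
    return 0

# A real script would end with: sys.exit(main(sys.argv))
```

For a real format, the open() and read() step would be replaced by a call to whatever extractor understands the native format.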
When writing a filter, you should decide if it will output plain text or
|
When writing a handler, you should decide if it will output plain text or
|
||||||
HTML. Plain text is simpler, but you will not be able to add metadata or
|
HTML. Plain text is simpler, but you will not be able to add metadata or
|
||||||
vary the output character encoding (this will be defined in a
|
vary the output character encoding (this will be defined in a
|
||||||
configuration file). Additionally, some formatting may be easier to
|
configuration file). Additionally, some formatting may be easier to
|
||||||
|
@ -2461,25 +2474,26 @@ Chapter 4. Programming interface
|
||||||
field searches.
|
field searches.
|
||||||
|
|
||||||
The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
|
The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
|
||||||
the filter if the operation is for indexing or previewing. Some filters
|
the handler if the operation is for indexing or previewing. Some handlers
|
||||||
use this to output a slightly different format, for example stripping
|
use this to output a slightly different format, for example stripping
|
||||||
uninteresting repeated keywords (e.g. Subject: for email) when indexing.
|
uninteresting repeated keywords (e.g. Subject: for email) when indexing.
|
||||||
This is not essential.
|
This is not essential.
|
||||||
|
|
||||||
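A handler written in Python could test the variable as sketched below. The helper names are hypothetical, and treating an unset variable as "no" is an assumption of this sketch, not documented behaviour.

```python
import os


def is_for_preview(environ=None):
    """Return True if the handler was started for previewing."""
    environ = os.environ if environ is None else environ
    # Assumption: an unset variable is treated like "no" (indexing).
    return environ.get("RECOLL_FILTER_FORPREVIEW", "no") == "yes"


def format_headers(headers, for_preview):
    """Format (name, value) pairs, dropping the keywords when indexing."""
    if for_preview:
        return "\n".join(f"{name}: {value}" for name, value in headers)
    # When indexing, repeated keywords such as "Subject:" add nothing useful.
    return "\n".join(value for _, value in headers)
```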
You should look at one of the simple filters, for example rclps for a
|
You should look at one of the simple handlers, for example rclps for a
|
||||||
starting point.
|
starting point.
|
||||||
|
|
||||||
Don't forget to make your filter executable before testing!
|
Don't forget to make your handler executable before testing!
|
||||||
|
|
||||||
4.1.2. "Multiple" filters
|
4.1.2. "Multiple" handlers
|
||||||
|
|
||||||
If you can program and want to write an execm filter, it should not be too
|
If you can program and want to write an execm handler, it should not be
|
||||||
difficult to make sense of one of the existing modules. For example, look
|
too difficult to make sense of one of the existing modules. For example,
|
||||||
at rclzip which uses Zip file paths as identifiers (ipath), and rclics,
|
look at rclzip which uses Zip file paths as identifiers (ipath), and
|
||||||
which uses an integer index. Also have a look at the comments inside the
|
rclics, which uses an integer index. Also have a look at the comments
|
||||||
internfile/mh_execm.h file and possibly at the corresponding module.
|
inside the internfile/mh_execm.h file and possibly at the corresponding
|
||||||
|
module.
|
||||||
|
|
||||||
execm filters sometimes need to make a choice for the nature of the ipath
|
execm handlers sometimes need to make a choice for the nature of the ipath
|
||||||
elements that they use in communication with the indexer. Here are a few
|
elements that they use in communication with the indexer. Here are a few
|
||||||
guidelines:
|
guidelines:
|
||||||
|
|
||||||
|
@ -2491,34 +2505,34 @@ Chapter 4. Programming interface
|
||||||
|
|
||||||
o Recoll uses a colon (:) as a separator to store a complex path
|
o Recoll uses a colon (:) as a separator to store a complex path
|
||||||
internally (for deeper embedding). Colons inside the ipath elements
|
internally (for deeper embedding). Colons inside the ipath elements
|
||||||
output by a filter will be escaped, but would be a bad choice as a
|
output by a handler will be escaped, but would be a bad choice as a
|
||||||
filter-specific separator (mostly, again, for debugging issues).
|
handler-specific separator (mostly, again, for debugging issues).
|
||||||
|
|
||||||
In any case, the main goal is that it should be easy for the filter to
|
In any case, the main goal is that it should be easy for the handler to
|
||||||
extract the target document, given the file name and the ipath element.
|
extract the target document, given the file name and the ipath element.
|
||||||
|
|
||||||
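To make the goal concrete, here is a rough sketch of the ipath idea for Zip archives, using member paths as identifiers in the way described for rclzip. It is not the rclzip code and leaves out the execm protocol entirely; list_ipaths and extract_by_ipath are hypothetical names.

```python
import zipfile


def list_ipaths(zip_file):
    """Return one ipath element (the member path) per document in the archive."""
    with zipfile.ZipFile(zip_file) as zf:
        return [info.filename for info in zf.infolist() if not info.is_dir()]


def extract_by_ipath(zip_file, ipath):
    """Return the raw bytes of the document designated by the ipath element."""
    with zipfile.ZipFile(zip_file) as zf:
        return zf.read(ipath)
```

Because the ipath is the member path itself, extraction needs no extra bookkeeping: the file name and the ipath element are enough to find the document again.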
execm filters will also produce a document with a null ipath element.
|
execm handlers will also produce a document with a null ipath element.
|
||||||
Depending on the type of document, this may have some associated data
|
Depending on the type of document, this may have some associated data
|
||||||
(e.g. the body of an email message), or none (typical for an archive
|
(e.g. the body of an email message), or none (typical for an archive
|
||||||
file). If it is empty, this document will be useful anyway for some
|
file). If it is empty, this document will be useful anyway for some
|
||||||
operations, as the parent of the actual data documents.
|
operations, as the parent of the actual data documents.
|
||||||
|
|
||||||
4.1.3. Telling Recoll about the filter
|
4.1.3. Telling Recoll about the handler
|
||||||
|
|
||||||
There are two elements that link a file to the filter which should process
|
There are two elements that link a file to the handler which should
|
||||||
it: the association of file to mime type and the association of a mime
|
process it: the association of file to MIME type and the association of a
|
||||||
type with a filter.
|
MIME type with a handler.
|
||||||
|
|
||||||
The association of files to mime types is mostly based on name suffixes.
|
The association of files to MIME types is mostly based on name suffixes.
|
||||||
The types are defined inside the mimemap file. Example:
|
The types are defined inside the mimemap file. Example:
|
||||||
|
|
||||||
|
|
||||||
.doc = application/msword
|
.doc = application/msword
|
||||||
|
|
||||||
If no suffix association is found for the file name, Recoll will try to
|
If no suffix association is found for the file name, Recoll will try to
|
||||||
execute the file -i command to determine a mime type.
|
execute the file -i command to determine a MIME type.
|
||||||
|
|
||||||
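The two-step lookup can be sketched as follows. This only illustrates the logic and is not Recoll's implementation: the MIMEMAP dictionary stands in for the mimemap file, the function names are hypothetical, and the -b (brief output) flag is added to file -i here only to make the output easier to parse.

```python
import os
import subprocess

# Stand-in for the mimemap file, using the entry from the example above.
MIMEMAP = {".doc": "application/msword"}


def guess_with_file_command(path):
    """Ask the system file command for a MIME type, or return None on failure."""
    try:
        out = subprocess.run(
            ["file", "-b", "-i", path],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    # Typical output: "text/plain; charset=us-ascii"
    return out.split(";")[0].strip() or None


def mime_type_for(path, mimemap=MIMEMAP, fallback=guess_with_file_command):
    """Suffix association first, then the optional fallback command."""
    suffix = os.path.splitext(path)[1]
    if suffix in mimemap:
        return mimemap[suffix]
    return fallback(path) if fallback is not None else None
```

Passing fallback=None models a configuration where the file command is not used.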
The association of file types to filters is performed in the mimeconf
|
The association of file types to handlers is performed in the mimeconf
|
||||||
file. A sample will probably be of better help than a long explanation:
|
file. A sample will probably be of better help than a long explanation:
|
||||||
|
|
||||||
|
|
||||||
|
@ -2545,10 +2559,10 @@ Chapter 4. Programming interface
|
||||||
iso-8859-1 encoding is specified because it is not the utf-8 default,
|
iso-8859-1 encoding is specified because it is not the utf-8 default,
|
||||||
and not output by unrtf in the HTML header section.
|
and not output by unrtf in the HTML header section.
|
||||||
|
|
||||||
o application/x-chm is processed by a persistent filter. This is
|
o application/x-chm is processed by a persistent handler. This is
|
||||||
determined by the execm keyword.
|
determined by the execm keyword.
|
||||||
|
|
||||||
4.1.4. Filter HTML output
|
4.1.4. Input handler HTML output
|
||||||
|
|
||||||
The output HTML could be very minimal like the following example:
|
The output HTML could be very minimal like the following example:
|
||||||
|
|
||||||
|
@ -2600,8 +2614,8 @@ Chapter 4. Programming interface
|
||||||
<meta name="date" content="2013-02-24 17:50:00">
|
<meta name="date" content="2013-02-24 17:50:00">
|
||||||
|
|
||||||
|
|
||||||
Filters also have the possibility to "invent" field names. This should
|
Input handlers also have the possibility to "invent" field names. This
|
||||||
also be output as meta tags:
|
should also be output as meta tags:
|
||||||
|
|
||||||
<meta name="somefield" content="Some textual data" />
|
<meta name="somefield" content="Some textual data" />
|
||||||
|
|
||||||
|
@ -2617,10 +2631,10 @@ Chapter 4. Programming interface
|
||||||
|
|
||||||
4.1.5. Page numbers
|
4.1.5. Page numbers
|
||||||
|
|
||||||
The indexer will interpret ^L characters in the filter output as
|
The indexer will interpret ^L characters in the handler output as
|
||||||
indicating page breaks, and will record them. At query time, this allows
|
indicating page breaks, and will record them. At query time, this allows
|
||||||
starting a viewer on the right page for a hit or a snippet. Currently,
|
starting a viewer on the right page for a hit or a snippet. Currently,
|
||||||
only the PDF, Postscript and DVI filters generate page breaks.
|
only the PDF, Postscript and DVI handlers generate page breaks.
|
||||||
|
|
||||||
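As an illustration, a handler which extracts text page by page could mark the breaks as below. The second function only shows how a page number can be derived from the recorded breaks; it is an assumption about the principle, not a description of the indexer's code. Both function names are hypothetical.

```python
FORM_FEED = "\f"  # the ^L character, ASCII 0x0C


def join_pages(pages):
    """Join per-page text with form feeds, as a handler would output it."""
    return FORM_FEED.join(pages)


def page_of_offset(text, offset):
    """Return the 1-based number of the page containing the character offset."""
    return text.count(FORM_FEED, 0, offset) + 1
```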
4.2. Field data processing
|
4.2. Field data processing
|
||||||
|
|
||||||
|
@ -2628,14 +2642,14 @@ Chapter 4. Programming interface
|
||||||
author, abstract.
|
author, abstract.
|
||||||
|
|
||||||
The field values for documents can appear in several ways during indexing:
|
The field values for documents can appear in several ways during indexing:
|
||||||
either output by filters as meta fields in the HTML header section, or
|
either output by input handlers as meta fields in the HTML header section,
|
||||||
extracted from file extended attributes, or added as attributes of the Doc
|
or extracted from file extended attributes, or added as attributes of the
|
||||||
object when using the API, or again synthesized internally by Recoll.
|
Doc object when using the API, or again synthesized internally by Recoll.
|
||||||
|
|
||||||
The Recoll query language allows searching for text in a specific field.
|
The Recoll query language allows searching for text in a specific field.
|
||||||
|
|
||||||
Recoll defines a number of default fields. Additional ones can be output
|
Recoll defines a number of default fields. Additional ones can be output
|
||||||
by filters, and described in the fields configuration file.
|
by handlers, and described in the fields configuration file.
|
||||||
|
|
||||||
Fields can be:
|
Fields can be:
|
||||||
|
|
||||||
|
@ -2794,7 +2808,7 @@ Chapter 4. Programming interface
|
||||||
|
|
||||||
The Db class
|
The Db class
|
||||||
|
|
||||||
A Db object is created by a connect() function and holds a connection to a
|
A Db object is created by a connect() call and holds a connection to a
|
||||||
Recoll index.
|
Recoll index.
|
||||||
|
|
||||||
Methods
|
Methods
|
||||||
|
@ -3088,7 +3102,7 @@ Chapter 5. Installation and configuration
|
||||||
text file inside the configuration directory.
|
text file inside the configuration directory.
|
||||||
|
|
||||||
A list of common file types which need external commands follows. Many of
|
A list of common file types which need external commands follows. Many of
|
||||||
the filters need the iconv command, which is not always listed as a
|
the handlers need the iconv command, which is not always listed as a
|
||||||
dependency.
|
dependency.
|
||||||
|
|
||||||
Please note that, due to the relatively dynamic nature of this
|
Please note that, due to the relatively dynamic nature of this
|
||||||
|
@ -3103,7 +3117,7 @@ Chapter 5. Installation and configuration
|
||||||
http://www.recoll.org/features.html if a file type is important to you.
|
http://www.recoll.org/features.html if a file type is important to you.
|
||||||
|
|
||||||
As of Recoll release 1.14, a number of XML-based formats that were handled
|
As of Recoll release 1.14, a number of XML-based formats that were handled
|
||||||
by ad hoc filter code now use the xsltproc command, which usually comes
|
by ad hoc handler code now use the xsltproc command, which usually comes
|
||||||
with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
|
with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
|
||||||
|
|
||||||
Now for the list:
|
Now for the list:
|
||||||
|
@ -3121,7 +3135,7 @@ Chapter 5. Installation and configuration
|
||||||
it may be used as a fallback for some files which antiword does not
|
it may be used as a fallback for some files which antiword does not
|
||||||
handle.
|
handle.
|
||||||
|
|
||||||
o MS Excel and PowerPoint need catdoc.
|
o MS Excel and PowerPoint are processed by internal Python handlers.
|
||||||
|
|
||||||
o MS Open XML (docx) needs xsltproc.
|
o MS Open XML (docx) needs xsltproc.
|
||||||
|
|
||||||
|
@ -3140,11 +3154,8 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
o djvu files need djvutxt and djvused from the DjVuLibre package.
|
o djvu files need djvutxt and djvused from the DjVuLibre package.
|
||||||
|
|
||||||
o Audio files: Recoll releases before 1.13 used the id3info command from
|
o Audio files: Recoll releases 1.14 and later use a single Python
|
||||||
the id3lib package to extract mp3 tag information, metaflac (standard
|
handler based on mutagen for all audio file types.
|
||||||
flac tools) for flac files, and ogginfo (vorbis tools) for ogg files.
|
|
||||||
Releases 1.14 and later use a single Python filter based on mutagen
|
|
||||||
for all audio file types.
|
|
||||||
|
|
||||||
o Pictures: Recoll uses the Exiftool Perl package to extract tag
|
o Pictures: Recoll uses the Exiftool Perl package to extract tag
|
||||||
information. Most image file formats are supported. Note that there
|
information. Most image file formats are supported. Note that there
|
||||||
|
@ -3152,7 +3163,7 @@ Chapter 5. Installation and configuration
|
||||||
aperture, etc.). This is only of interest if you store personal tags
|
aperture, etc.). This is only of interest if you store personal tags
|
||||||
or textual descriptions inside the image files.
|
or textual descriptions inside the image files.
|
||||||
|
|
||||||
o chm: files in microsoft help format need Python and the pychm module
|
o chm: files in Microsoft help format need Python and the pychm module
|
||||||
(which needs chmlib).
|
(which needs chmlib).
|
||||||
|
|
||||||
o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
|
o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
|
||||||
|
@ -3168,11 +3179,11 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
o Konqueror webarchive format with Python (uses the Tarfile module).
|
o Konqueror webarchive format with Python (uses the Tarfile module).
|
||||||
|
|
||||||
o mimehtml web archive format (support based on the email filter, which
|
o Mimehtml web archive format (support based on the email handler, which
|
||||||
introduces some mild weirdness, but still usable).
|
introduces some mild weirdness, but still usable).
|
||||||
|
|
||||||
Text, HTML, email folders, and Scribus files are processed internally. Lyx
|
Text, HTML, email folders, and Scribus files are processed internally. Lyx
|
||||||
is used to index Lyx files. Many filters need iconv and the standard sed
|
is used to index Lyx files. Many handlers need iconv and the standard sed
|
||||||
and awk.
|
and awk.
|
||||||
|
|
||||||
5.3. Building from source
|
5.3. Building from source
|
||||||
|
@ -3495,10 +3506,10 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
A space-separated list of patterns for names of files or
|
A space-separated list of patterns for names of files or
|
||||||
directories that should be ignored inside zip archives. This is
|
directories that should be ignored inside zip archives. This is
|
||||||
used directly by the zip filter, and has a function similar to
|
used directly by the zip handler, and has a function similar to
|
||||||
skippedNames, but works independently. Can be redefined for
|
skippedNames, but works independently. Can be redefined for
|
||||||
filesystem subdirectories. For versions up to 1.19, you will need
|
filesystem subdirectories. For versions up to 1.19, you will need
|
||||||
to update the Zip filter and install a supplementary Python
|
to update the Zip handler and install a supplementary Python
|
||||||
module. The details are described on the Recoll wiki.
|
module. The details are described on the Recoll wiki.
|
||||||
|
|
||||||
followLinks
|
followLinks
|
||||||
|
@ -3513,11 +3524,16 @@ Chapter 5. Installation and configuration
|
||||||
indexedmimetypes
|
indexedmimetypes
|
||||||
|
|
||||||
Recoll normally indexes any file which it knows how to read. This
|
Recoll normally indexes any file which it knows how to read. This
|
||||||
list lets you restrict the indexed mime types to what you specify.
|
list lets you restrict the indexed MIME types to what you specify.
|
||||||
If the variable is unspecified or the list empty (the default),
|
If the variable is unspecified or the list empty (the default),
|
||||||
all supported types are processed. Can be redefined for
|
all supported types are processed. Can be redefined for
|
||||||
subdirectories.
|
subdirectories.
|
||||||
|
|
||||||
|
excludedmimetypes
|
||||||
|
|
||||||
|
This list lets you exclude some MIME types from indexing. Can be
|
||||||
|
redefined for subdirectories.
|
||||||
|
|
||||||
compressedfilemaxkbs
|
compressedfilemaxkbs
|
||||||
|
|
||||||
Size limit for compressed (.gz or .bz2) files. These need to be
|
Size limit for compressed (.gz or .bz2) files. These need to be
|
||||||
|
@ -3550,14 +3566,14 @@ Chapter 5. Installation and configuration
|
||||||
Recoll indexes file names in a special section of the database to
|
Recoll indexes file names in a special section of the database to
|
||||||
allow specific file names searches using wild cards. This
|
allow specific file names searches using wild cards. This
|
||||||
parameter decides if file name indexing is performed only for
|
parameter decides if file name indexing is performed only for
|
||||||
files with mime types that would qualify them for full text
|
files with MIME types that would qualify them for full text
|
||||||
indexing, or for all files inside the selected subtrees,
|
indexing, or for all files inside the selected subtrees,
|
||||||
independently of mime type.
|
independently of MIME type.
|
||||||
|
|
||||||
usesystemfilecommand
|
usesystemfilecommand
|
||||||
|
|
||||||
Decide if we use the file -i system command as a final step for
|
Decide if we use the file -i system command as a final step for
|
||||||
determining the mime type for a file (the main procedure uses
|
determining the MIME type for a file (the main procedure uses
|
||||||
suffix associations as defined in the mimemap file). This can be
|
suffix associations as defined in the mimemap file). This can be
|
||||||
useful for files with suffix-less names, but it will also cause
|
useful for files with suffix-less names, but it will also cause
|
||||||
the indexing of many bogus "text" files.
|
the indexing of many bogus "text" files.
|
||||||
|
@ -3770,6 +3786,9 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
This is only used by the web browser plugin indexing code, and
|
This is only used by the web browser plugin indexing code, and
|
||||||
defines the maximum size for the web page cache. Default: 40 MB.
|
defines the maximum size for the web page cache. Default: 40 MB.
|
||||||
|
Quite unfortunately, this is only taken into account when creating
|
||||||
|
the cache file. You need to delete the file for a change to be
|
||||||
|
taken into account.
|
||||||
|
|
||||||
idxflushmb
|
idxflushmb
|
||||||
|
|
||||||
|
@ -3909,15 +3928,15 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
filtermaxseconds
|
filtermaxseconds
|
||||||
|
|
||||||
Maximum filter execution time, after which it is aborted. Some
|
Maximum handler execution time, after which it is aborted. Some
|
||||||
postscript programs just loop...
|
postscript programs just loop...
|
||||||
|
|
||||||
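The effect of such a limit can be sketched with the Python standard library. This is not how recollindex is implemented; run_handler is a hypothetical helper which only shows the principle of aborting a handler that runs for too long.

```python
import subprocess


def run_handler(command, path, max_seconds):
    """Run an external handler on path, returning its output or None.

    command is the handler command line as a list; the source file name is
    appended as the single argument. None is returned if the handler fails
    or does not finish within max_seconds.
    """
    try:
        done = subprocess.run(
            command + [path], capture_output=True, timeout=max_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising.
        return None
    if done.returncode != 0:
        return None
    return done.stdout
```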
filtersdir
|
filtersdir
|
||||||
|
|
||||||
A directory to search for the external filter scripts used to
|
A directory to search for the external input handler scripts used
|
||||||
index some types of files. The value should not be changed, except
|
to index some types of files. The value should not be changed,
|
||||||
if you want to modify one of the default scripts. The value can be
|
except if you want to modify one of the default scripts. The value
|
||||||
redefined for any sub-directory.
|
can be redefined for any sub-directory.
|
||||||
|
|
||||||
iconsdir
|
iconsdir
|
||||||
|
|
||||||
|
@ -3998,17 +4017,17 @@ Chapter 5. Installation and configuration
|
||||||
This section defines lists of synonyms for the canonical names
|
This section defines lists of synonyms for the canonical names
|
||||||
used inside the [prefixes] and [stored] sections
|
used inside the [prefixes] and [stored] sections
|
||||||
|
|
||||||
filter-specific sections
|
handler-specific sections
|
||||||
|
|
||||||
Some filters may need specific configuration for handling fields.
|
Some input handlers may need specific configuration for handling
|
||||||
Only the email message filter currently has such a section (named
|
fields. Only the email message handler currently has such a
|
||||||
[mail]). It allows indexing arbitrary email headers in addition to
|
section (named [mail]). It allows indexing arbitrary email headers
|
||||||
the ones indexed by default. Other such sections may appear in the
|
in addition to the ones indexed by default. Other such sections
|
||||||
future.
|
may appear in the future.
|
||||||
|
|
||||||
Here follows a small example of a personal fields file. This would extract
|
Here follows a small example of a personal fields file. This would extract
|
||||||
a specific email header and use it as a searchable field, with data
|
a specific email header and use it as a searchable field, with data
|
||||||
displayable inside result lists. (Side note: as the email filter does no
|
displayable inside result lists. (Side note: as the email handler does no
|
||||||
decoding on the values, only plain ascii headers can be indexed, and only
|
decoding on the values, only plain ascii headers can be indexed, and only
|
||||||
the first occurrence will be used for headers that occur several times).
|
the first occurrence will be used for headers that occur several times).
|
||||||
|
|
||||||
|
@ -4040,10 +4059,10 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
5.4.3. The mimemap file
|
5.4.3. The mimemap file
|
||||||
|
|
||||||
mimemap specifies the file name extension to mime type mappings.
|
mimemap specifies the file name extension to MIME type mappings.
|
||||||
|
|
||||||
For file names without an extension, or with an unknown one, the system's
|
For file names without an extension, or with an unknown one, the system's
|
||||||
file -i command will be executed to determine the mime type (this can be
|
file -i command will be executed to determine the MIME type (this can be
|
||||||
switched off inside the main configuration file).
|
switched off inside the main configuration file).
|
||||||
|
|
||||||
The mappings can be specified on a per-subtree basis, which may be useful
|
The mappings can be specified on a per-subtree basis, which may be useful
|
||||||
|
@ -4064,7 +4083,7 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
5.4.4. The mimeconf file
|
5.4.4. The mimeconf file
|
||||||
|
|
||||||
mimeconf specifies how the different mime types are handled for indexing,
|
mimeconf specifies how the different MIME types are handled for indexing,
|
||||||
and which icons are displayed in the recoll result lists.
|
and which icons are displayed in the recoll result lists.
|
||||||
|
|
||||||
Changing the parameters in the [index] section is probably not a good idea
|
Changing the parameters in the [index] section is probably not a good idea
|
||||||
|
@ -4088,7 +4107,7 @@ Chapter 5. Installation and configuration
|
||||||
Recoll GUI preferences, all mimeview entries will be ignored except the
|
Recoll GUI preferences, all mimeview entries will be ignored except the
|
||||||
one labelled application/x-all (which is set to use xdg-open by default).
|
one labelled application/x-all (which is set to use xdg-open by default).
|
||||||
|
|
||||||
In this case, the xallexcepts top level variable defines a list of mime
|
In this case, the xallexcepts top level variable defines a list of MIME
|
||||||
type exceptions which will be processed according to the local entries
|
type exceptions which will be processed according to the local entries
|
||||||
instead of being passed to the desktop. This is so that specific Recoll
|
instead of being passed to the desktop. This is so that specific Recoll
|
||||||
options such as a page number or a search string can be passed to
|
options such as a page number or a search string can be passed to
|
||||||
|
@ -4101,13 +4120,13 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
All viewer definition entries must be placed under a [view] section.
|
All viewer definition entries must be placed under a [view] section.
|
||||||
|
|
||||||
The keys in the file are normally mime types. You can add an application
|
The keys in the file are normally MIME types. You can add an application
|
||||||
tag to specialize the choice for an area of the filesystem (using a
|
tag to specialize the choice for an area of the filesystem (using a
|
||||||
localfields specification in mimeconf). The syntax for the key is
|
localfields specification in mimeconf). The syntax for the key is
|
||||||
mimetype|tag
|
mimetype|tag
|
||||||
|
|
||||||
The nouncompforviewmts entry, (placed at the top level, outside of the
|
The nouncompforviewmts entry, (placed at the top level, outside of the
|
||||||
[view] section), holds a list of mime types that should not be
|
[view] section), holds a list of MIME types that should not be
|
||||||
uncompressed before starting the viewer (if they are found compressed, ie:
|
uncompressed before starting the viewer (if they are found compressed, ie:
|
||||||
mydoc.doc.gz).
|
mydoc.doc.gz).
|
||||||
|
|
||||||
|
@ -4127,7 +4146,7 @@ Chapter 5. Installation and configuration
|
||||||
will not create a temporary file to extract the subdocument, expecting
|
will not create a temporary file to extract the subdocument, expecting
|
||||||
the called application (possibly a script) to be able to handle it.
|
the called application (possibly a script) to be able to handle it.
|
||||||
|
|
||||||
o %M. Mime type
|
o %M. MIME type
|
||||||
|
|
||||||
o %p. Page index. Only significant for a subset of document types,
|
o %p. Page index. Only significant for a subset of document types,
|
||||||
currently only PDF, Postscript and DVI files. Can be used to start the
|
currently only PDF, Postscript and DVI files. Can be used to start the
|
||||||
|
@ -4180,7 +4199,7 @@ Chapter 5. Installation and configuration
|
||||||
|
|
||||||
.blob = application/x-blobapp
|
.blob = application/x-blobapp
|
||||||
|
|
||||||
Note that the mime type is made up here, and you could call it
|
Note that the MIME type is made up here, and you could call it
|
||||||
diesel/oil just the same.
|
diesel/oil just the same.
|
||||||
|
|
||||||
o In $RECOLL_CONFDIR/mimeview under the [view] section, add:
|
o In $RECOLL_CONFDIR/mimeview under the [view] section, add:
|
||||||
|
@ -4191,7 +4210,7 @@ Chapter 5. Installation and configuration
|
||||||
would use %u if it liked URLs better.
|
would use %u if it liked URLs better.
|
||||||
|
|
||||||
If you just wanted to change the application used by Recoll to display a
|
If you just wanted to change the application used by Recoll to display a
|
||||||
mime type which it already knows, you would just need to edit mimeview.
|
MIME type which it already knows, you would just need to edit mimeview.
|
||||||
The entries you add in your personal file override those in the central
|
The entries you add in your personal file override those in the central
|
||||||
configuration, which you do not need to alter. mimeview can also be
|
configuration, which you do not need to alter. mimeview can also be
|
||||||
modified from the Gui.
|
modified from the Gui.
|
||||||
|
@ -4213,14 +4232,14 @@ Chapter 5. Installation and configuration
|
||||||
for the files inside the result lists. Icons are normally 64x64 pixels
|
for the files inside the result lists. Icons are normally 64x64 pixels
|
||||||
PNG files which live in /usr/[local/]share/recoll/images.
|
PNG files which live in /usr/[local/]share/recoll/images.
|
||||||
|
|
||||||
o Under the [categories] section, you should add the mime type where it
|
o Under the [categories] section, you should add the MIME type where it
|
||||||
makes sense (you can also create a category). Categories may be used
|
makes sense (you can also create a category). Categories may be used
|
||||||
for filtering in advanced search.
|
for filtering in advanced search.
|
||||||
|
|
||||||
The rclblob filter should be an executable program or script which exists
|
The rclblob handler should be an executable program or script which exists
|
||||||
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
||||||
argument and should output the text or html contents on the standard
|
argument and should output the text or html contents on the standard
|
||||||
output.
|
output.
|
||||||
|
|
||||||
The filter programming section describes in more detail how to write a
|
The filter programming section describes in more detail how to write an
|
||||||
filter.
|
input handler.
|
||||||
|
|
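Note on the rclblob handler described in the last hunk: the contract stated there (an executable that receives a file name as its argument and writes the text or HTML contents to standard output) can be sketched as below. This is a minimal illustration, not Recoll's own code. The `.blob` format is made up in the text, so the sketch simply assumes the file is plain text; the function names `blob_to_html` and `main`, and the choice to emit a charset meta tag, are assumptions made for the example.

```python
#!/usr/bin/env python3
"""Hypothetical rclblob handler sketch: file name in, HTML on stdout."""
import html
import sys


def blob_to_html(data: bytes) -> str:
    """Wrap the decoded file contents in a minimal HTML document."""
    # Assumption: the made-up .blob format is plain text. A real handler
    # would parse its own format here instead of just decoding bytes.
    text = data.decode("utf-8", errors="replace")
    return (
        "<html><head>"
        # Assumption: declaring the charset helps the indexer decode the output.
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
        "</head><body><pre>"
        + html.escape(text)  # escape &, <, > and quotes so the text stays inert
        + "</pre></body></html>"
    )


def main(path: str) -> int:
    """Read the file named on the command line and print its HTML form."""
    with open(path, "rb") as f:
        sys.stdout.write(blob_to_html(f.read()))
    return 0


# Only run when invoked as a script with exactly one file name argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Saved as an executable named rclblob in the filters directory mentioned in the hunk, it would be invoked as `rclblob somefile.blob`. Keeping the conversion in a separate function makes it easy to check without touching the filesystem.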