release 2917

parent 1be563398f
commit 4aedf7dca8
2 changed files with 220 additions and 158 deletions
@@ -653,6 +653,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 Note that the translation is not limited to a single character,
 you could very well have something like u:ue in the list.
 
+The default value set for unac_except_trans can't be listed here
+because I have trouble with SGML and UTF-8, but it only contains
+ligature decompositions: german ss, oe, ae, fi, fl.
+
 This parameter can't be defined for subdirectories, it is global,
 because there is no way to do otherwise when querying. If you have
 document sets which would need different values, you will have to
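To make the exception mechanism concrete, here is a minimal Python sketch that applies an exception table before generic accent stripping. It assumes a `source:translation` notation modelled on the `u:ue` example (written here with a German umlaut); the table contents, the notation, the helper names, and the order of operations are illustrative assumptions, not the actual recoll.conf syntax or Recoll's code. It also shows why the parameter has to be global: a query term only finds an indexed term if both were folded with the same table.

```python
import unicodedata

# Hypothetical exception table in "source:translation" form, modelled on the
# "u:ue" example; this notation is an assumption, not the recoll.conf syntax.
EXCEPTIONS = "ü:ue ß:ss œ:oe æ:ae ﬁ:fi ﬂ:fl"


def parse_exceptions(spec: str) -> dict:
    """Turn 'src:dst src:dst ...' into a {src: dst} mapping."""
    table = {}
    for entry in spec.split():
        src, dst = entry.split(":", 1)
        table[src] = dst
    return table


def strip_accents(text: str) -> str:
    """Generic accent stripping: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def fold(text: str, exceptions: dict) -> str:
    """Lower-case, apply the exception translations, then strip other accents."""
    lowered = unicodedata.normalize("NFC", text).lower()
    translated = "".join(exceptions.get(c, c) for c in lowered)
    return strip_accents(translated)


table = parse_exceptions(EXCEPTIONS)
# The same table must be used for indexed terms and for query terms.
print(fold("Über", table), fold("Straße", table), fold("café", table))
```

Without the exception entry, `ü` would simply lose its diaeresis and fold to `u`; with it, `Über` and `Ueber` fold to the same term.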

374 src/README
@@ -48,9 +48,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 2.3. Index configuration
 
-2.3.1. Index case and diacritics sensitivity
+2.3.1. Multiple indexes
 
-2.3.2. The index configuration GUI
+2.3.2. Index case and diacritics sensitivity
 
+2.3.3. The index configuration GUI
+
 2.4. Using Beagle WEB browser plugins
 
@@ -81,7 +83,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 3.1.6. The term explorer tool
 
-3.1.7. Multiple databases
+3.1.7. Multiple indexes
 
 3.1.8. Document history
 
@@ -118,8 +120,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 3.7.2. The KDE Kicker Recoll applet
 
-3.8. Multiple databases
-
 4. Programming interface
 
 4.1. Writing a document filter
@@ -190,7 +190,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 Also be aware that you may need to install the appropriate supporting
 applications for document types that need them (for example antiword for
-ms-word files).
+Microsoft Word files).
 
 ----------------------------------------------------------------------
 
@@ -205,7 +205,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 You do not need to remember in what file or email message you stored a
 given piece of information. You just ask for related terms, and the tool
-will return a list of documents where those terms are prominent, in a
+will return a list of documents where these terms are prominent, in a
 similar way to Internet search engines.
 
 A search application tries to determine which documents are most relevant
@@ -255,8 +255,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 that searching does not depend, for example, on a word being singular or
 plural (floor, floors), or on a verb tense (flooring, floored). Because
 the mechanisms used for stemming depend on the specific grammatical rules
-for each language, there is a separate stemmer module for most common
-languages where stemming makes sense.
+for each language, there is a separate Xapian stemmer module for most
+common languages where stemming makes sense.
 
 Recoll stores the unstemmed versions of terms in the main index and uses
 auxiliary databases for term expansion (one for each stemming language),
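A rough Python sketch of the stem-expansion idea described above: the main index keeps unstemmed terms, and an auxiliary table maps each stem to the indexed terms that produce it, so a query term can be expanded to all its variants. `toy_stem` is a deliberately crude stand-in for a real Xapian stemmer, and all names here are hypothetical.

```python
from collections import defaultdict


def toy_stem(word: str) -> str:
    """Crude English suffix stripper, standing in for a real Xapian stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stem_db(index_terms):
    """Auxiliary table: stem -> set of unstemmed terms present in the main index."""
    db = defaultdict(set)
    for term in index_terms:
        db[toy_stem(term)].add(term)
    return db


def expand(query_term: str, stem_db) -> set:
    """All indexed variants sharing the query term's stem (the term itself if none)."""
    return set(stem_db.get(toy_stem(query_term), {query_term}))


main_index = ["floor", "floors", "flooring", "floored", "flood"]
stem_db = build_stem_db(main_index)
print(sorted(expand("floor", stem_db)))
```

Keeping one such table per stemming language, separate from the main index, is what lets the same unstemmed index serve several languages.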
@@ -271,21 +271,21 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 means that the stemmer will sometimes be applied to terms from other
 languages with potentially strange results. In practise, even if this
 introduces possibilities of confusion, this approach has been proven quite
-useful, and, awaiting the addition of an automatic language recognition
-module to Recoll, it is much less cumbersome than separating your
-documents according to what language they are written in.
+useful, and it is much less cumbersome than separating your documents
+according to what language they are written in.
 
-Before version 1.18, Recoll always stripped most accents and diacritics
-from terms, and converted them to lower case before storing them in the
-index. As a consequence, it was impossible to search for a particular
-capitalization of a term (US / us), or to discriminate two terms based on
-diacritics (sake / sake, mate / mate).
+Before version 1.18, Recoll stripped most accents and diacritics from
+terms, and converted them to lower case before either storing them in the
+index or searching for them. As a consequence, it was impossible to search
+for a particular capitalization of a term (US / us), or to discriminate
+two terms based on diacritics (sake / sake, mate / mate).
 
 As of version 1.18, Recoll can optionally store the raw terms, without
-accent stripping or case conversion. Expansions necessary for searches
-insensitive to case and/or diacritics are then performed when searching.
-This is described in more detail in the section about index case and
-diacritics sensitivity.
+accent stripping or case conversion. In this configuration, it is still
+possible (and most common) for a query to be insensitive to case and/or
+diacritics. Appropriate term expansions are performed before actually
+accessing the main index. This is described in more detail in the section
+about index case and diacritics sensitivity.
 
 Recoll has many parameters which define exactly what to index, and how to
 classify and decode the source documents. These are kept in configuration
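The query-time expansion against a raw index can be sketched in Python as follows: the query term and each raw index term are compared through a key that optionally drops case and diacritics, and the matching raw terms are what would then be looked up in the main index. This assumes a small in-memory term list with accents intact (the sample terms are made up) and is an illustration, not Recoll's implementation.

```python
import unicodedata


def _key(term: str, case_sensitive: bool, diac_sensitive: bool) -> str:
    """Comparison key: optionally fold case, optionally drop diacritics."""
    if not case_sensitive:
        term = term.lower()
    term = unicodedata.normalize("NFD", term)
    if not diac_sensitive:
        term = "".join(c for c in term if not unicodedata.combining(c))
    return term


def expand_query_term(query, raw_terms, case_sensitive=False, diac_sensitive=False):
    """Raw index terms a query term should match before the main index is accessed."""
    target = _key(query, case_sensitive, diac_sensitive)
    return {t for t in raw_terms if _key(t, case_sensitive, diac_sensitive) == target}


raw_terms = ["US", "us", "Us", "usa", "sake", "saké", "Saké"]
print(sorted(expand_query_term("us", raw_terms)))
```

With both flags off, the raw index behaves like the old stripped one; turning them on is what makes `US` / `us` searches distinguishable.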
@@ -297,7 +297,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 default configuration will index your home directory with default
 parameters and should be sufficient for giving Recoll a try, but you may
 want to adjust it later, which can be done either by editing the text
-files or by using configuration menus in the recoll GUI
+files or by using configuration menus in the recoll GUI. Some other
+parameters affecting only the recoll GUI are stored in the standard
+location defined by Qt.
 
 The indexing process is started automatically the first time you execute
 the recoll GUI. Indexing can also be performed by executing the
@@ -346,6 +348,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 small home directory). Monitoring a big file system tree can consume
 significant system resources.
 
+The choice of method and the parameters used can be configured from the
+recoll GUI: Preferences->Indexing schedule
+
 ----------------------------------------------------------------------
 
 2.1.2. Configurations, multiple indexes
@@ -389,8 +394,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 document. Some file types, like email folders or zip archives, can hold
 many individually indexed documents, which may themselves be compound
 ones. Such hierarchies can go quite deep, and Recoll can process, for
-example, an ms-word document stored as an attachment to an email message
-inside an email folder archived in a zip file...
+example, a LibreOffice document stored as an attachment to an email
+message inside an email folder archived in a zip file...
 
 Recoll indexing processes plain text, HTML, OpenDocument
 (Open/LibreOffice), email formats, and a few others internally.
@@ -438,15 +443,14 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 Using multiple configuration directories and configuration options
 allows you to tailor multiple configurations and indexes to handle
-whatever subset of the available data that you wish to make
-searchable.
+whatever subset of the available data you wish to make searchable.
 
-* You can also specify a different storage location for the index by
-setting the dbdir parameter in the configuration file (see the
-configuration section). This method would mainly be of use if you
-wanted to keep the configuration directory in its default location,
-but desired another location for the index, typically out of disk
-occupation concerns.
+* For a given configuration directory, you can specify a non-default
+storage location for the index by setting the dbdir parameter in the
+configuration file (see the configuration section). This method would
+mainly be of use if you wanted to keep the configuration directory in
+its default location, but desired another location for the index,
+typically out of disk occupation concerns.
 
 The size of the index is determined by the size of the set of documents,
 but the ratio can vary a lot. For a typical mixed set of documents, the
@@ -506,7 +510,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 Variables set inside the Recoll configuration files control which areas of
 the file system are indexed, and how files are processed. These variables
-can be set either by editing the text files or using the dialogs in the
+can be set either by editing the text files or by using the dialogs in the
 recoll GUI.
 
 The first time you start recoll, you will be asked whether or not you
@@ -526,9 +530,54 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 (ie: pdf, postscript, ms-word...) are described in the external packages
 section.
 
+As of Recoll 1.18 there are two incompatible types of Recoll indexes,
+depending on the treatment of character case and diacritics. The next
+section describes the two types in more detail.
+
 ----------------------------------------------------------------------
 
-2.3.1. Index case and diacritics sensitivity
+2.3.1. Multiple indexes
 
+Multiple Recoll indexes can be created by using several configuration
+directories which are usually set to index different areas of the file
+system. A specific index can be selected for updating or searching, using
+the RECOLL_CONFDIR environment variable or the -c option to recoll and
+recollindex.
+
+A typical usage scenario for the multiple index feature would be for a
+system administrator to set up a central index for shared data, that you
+choose to search or not in addition to your personal data. Of course,
+there are other possibilities. There are many cases where you know the
+subset of files that should be searched, and where narrowing the search
+can improve the results. You can achieve approximately the same effect
+with the directory filter in advanced search, but multiple indexes will
+have much better performance and may be worth the trouble.
+
+A recollindex program instance can only update one specific index.
+
+The main index (defined by RECOLL_CONFDIR or -c) is always active. If this
+is undesirable, you can set up your base configuration to index an empty
+directory.
+
+The different search interfaces (GUI, command line, ...) have different
+methods to define the set of indexes to be used, see the appropriate
+section.
+
+If a set of multiple indexes are to be used together for searches, some
+configuration parameters must be consistent among the set. These are
+parameters which need to be the same when indexing and searching. As the
+parameters come from the main configuration when searching, they need to
+be compatible with what was set when creating the other indexes (which
+came from their respective configuration directories).
+
+Most importantly, all indexes to be queried concurrently must have the
+same option concerning character case and diacritics stripping, but there
+are other constraints. Most of the relevant parameters are described in
+the linked section.
+
+----------------------------------------------------------------------
+
+2.3.2. Index case and diacritics sensitivity
+
 As of Recoll version 1.18 you have a choice of building an index with
 terms stripped of character case and diacritics, or one with raw terms.
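As a hedged illustration of the `-c` option and the `RECOLL_CONFDIR` variable mentioned above, this Python sketch only builds command lines and environments, with one `recollindex` run per configuration directory since an instance updates a single index. `~/.recoll` is the usual default configuration directory; `/srv/recoll-shared` and the helper names are hypothetical.

```python
import os


def recollindex_argv(confdir: str) -> list:
    """Update the index of one configuration directory, selected with -c."""
    return ["recollindex", "-c", confdir]


def recoll_argv(confdir: str) -> list:
    """Start the recoll GUI on a given configuration directory."""
    return ["recoll", "-c", confdir]


def env_with_confdir(confdir: str, base=None) -> dict:
    """Alternative to -c: select the configuration through RECOLL_CONFDIR."""
    env = dict(os.environ if base is None else base)
    env["RECOLL_CONFDIR"] = confdir
    return env


def update_commands(confdirs):
    """One recollindex invocation per index."""
    return [recollindex_argv(d) for d in confdirs]


# Hypothetical layout: a personal configuration plus a shared one.
confdirs = [os.path.expanduser("~/.recoll"), "/srv/recoll-shared"]
for argv in update_commands(confdirs):
    print(" ".join(argv))  # could be passed to subprocess.run(argv, check=True)
```

Building the argument lists separately from running them keeps the sketch safe to execute on a machine without Recoll installed.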
@@ -556,12 +605,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 As a cost for added capability, a raw index will be slightly bigger than a
 stripped one (around 10%). Also, searches will be more complex, so
-probably slightly slower, and the feature is still young, and a certain
-amount of weirdness cannot be excluded.
+probably slightly slower, and the feature is still young, so that a
+certain amount of weirdness cannot be excluded.
 
 ----------------------------------------------------------------------
 
-2.3.2. The index configuration GUI
+2.3.3. The index configuration GUI
 
 Most parameters for a given index configuration can be set from a recoll
 GUI running on this configuration (either as default, or by setting
@@ -797,8 +846,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 * Advanced search (a panel accessed through the Tools menu or the
 toolbox bar icon) has multiple entry fields, which you may use to
-build a logical condition, with additional filtering on file type and
-location in the file system.
+build a logical condition, with additional filtering on file type,
+location in the file system, modification date, and size.
 
 In most cases, you can enter the terms as you think them, even if they
 contain embedded punctuation or other non-textual characters. For example,
@@ -832,45 +881,36 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 The Query Language features are described in a separate section.
 
-File name will specifically look for file names. The entry will be split
-at white space characters, and each fragment will be separately expanded,
-then the search will be for file names matching all fragments (this is new
-in 1.15, older releases did an OR of the whole thing which did not make
-sense). Things to know:
-
-* The search is case- and accent-insensitive.
-
-* Fragments without any wild card character and not capitalized will be
-prepended and appended with '*' (ie: etc -> *etc*, but Etc -> etc). Of
-course it does not make sense to have multiple fragments if one of
-them is capitalized (as this one will require an exact match).
-
-* If you want to search for a pattern including white space, use double
-quotes (ie: "admin note*").
-
-* If you have a big index (many files), excessively generic fragments
-may result in inefficient searches.
-
-* As an example, inst recoll would match recollinstall.in (and quite a
-few others...).
-
-The point of having a separate file name search is that wild card
-expansion can be performed more efficiently on a relatively small subset
-of the index (allowing wild cards on the left of terms without excessive
-penality).
-
 All search modes allow wildcards inside terms (*, ?, []). You may want to
 have a look at the section about wildcards for more information about
 this.
 
+File name will specifically look for file names. The point of having a
+separate file name search is that wild card expansion can be performed
+more efficiently on a small subset of the index (allowing wild cards on
+the left of terms without excessive penality). Things to know:
+
+* White space in the entry should match white space in the file name,
+and is not treated specially.
+
+* The search is insensitive to character case and accents, independantly
+of the type of index.
+
+* An entry without any wild card character and not capitalized will be
+prepended and appended with '*' (ie: etc -> *etc*, but Etc -> etc).
+
+* If you have a big index (many files), excessively generic fragments
+may result in inefficient searches.
+
 You can search for exact phrases (adjacent words in a given order) by
 enclosing the input inside double quotes. Ex: "virtual reality".
 
-Character case has no influence on search, except that you can disable
-stem expansion for any term by capitalizing it. Ie: a search for floor
-will also normally look for flooring, floored, etc., but a search for
-Floor will only look for floor, in any character case. Stemming can also
-be disabled globally in the preferences.
+When using a stripped index, character case has no influence on search,
+except that you can disable stem expansion for any term by capitalizing
+it. Ie: a search for floor will also normally look for flooring, floored,
+etc., but a search for Floor will only look for floor, in any character
+case. Stemming can also be disabled globally in the preferences. When
+using a raw index, the rules are a bit more complicated.
 
 Recoll remembers the last few searches that you performed. You can use the
 simple search text entry widget (a combobox) to recall them (click on the
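The new file name rules (case and accent folding regardless of index type, white space matched literally, and a plain lower-case entry wrapped in `*`) can be sketched in Python with `fnmatch`. Treating "capitalized" as "first character is upper case" is an assumption, and the helper names are hypothetical; Recoll's own matching runs against its index, not against a single name as here.

```python
import fnmatch
import unicodedata

WILDCARDS = set("*?[")


def fold(s: str) -> str:
    """Case- and accent-insensitive form (file name search ignores both)."""
    decomposed = unicodedata.normalize("NFD", s.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def filename_pattern(entry: str) -> str:
    """No wildcard and not capitalized -> wrap in '*' (etc -> *etc*, Etc -> etc)."""
    if not any(c in WILDCARDS for c in entry) and not entry[:1].isupper():
        entry = f"*{entry}*"
    return fold(entry)


def matches(entry: str, file_name: str) -> bool:
    """White space in the entry is matched literally, like any other character."""
    return fnmatch.fnmatchcase(fold(file_name), filename_pattern(entry))


print(filename_pattern("etc"), filename_pattern("Etc"))
print(matches("admin note*", "Admin Notes.txt"))
```

Capitalizing the entry therefore acts as a request for an exact (but still case-insensitive) match.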
@@ -902,8 +942,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 By default, the document list is presented in order of relevance (how well
 the system estimates that the document matches the query). You can sort
 the result by ascending or descending date by using the vertical arrows in
-the toolbar (the old sort tool is gone after release 1.15, because the new
-result table has much better capability).
+the toolbar.
 
 Clicking on the Preview link for an entry will open an internal preview
 window for the document. Further Preview clicks for the same search will
@@ -1245,8 +1284,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 Note that in cases where Recoll does not know the beginning of the string
 to search for (ie a wildcard expression like *coll), the expansion can
 take quite a long time because the full index term list will have to be
-processed. The expansion is currently limited at 200 results for wildcards
-and regular expressions.
+processed. The expansion is currently limited at 10000 results for
+wildcards and regular expressions.
 
 Double-clicking on a term in the result list will insert it into the
 simple search entry field. You can also cut/paste between the result list
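Why an unknown beginning is expensive can be shown with a small Python sketch over a sorted term list: a literal prefix narrows the scan to a contiguous run of terms located with `bisect`, while `*coll` has no prefix and forces a walk over every term. The 10000 cap comes from the text above; the data structure and function names are assumptions for illustration, not Recoll's implementation.

```python
import bisect
import fnmatch

MAX_EXPANSION = 10000  # limit quoted in the text above


def literal_prefix(pattern: str) -> str:
    """Characters before the first wildcard metacharacter."""
    for i, c in enumerate(pattern):
        if c in "*?[":
            return pattern[:i]
    return pattern


def expand_wildcard(pattern, sorted_terms, limit=MAX_EXPANSION):
    """Index terms matching `pattern`, stopping after `limit` matches."""
    prefix = literal_prefix(pattern)
    # Only the run of terms starting with `prefix` can match. For '*coll' the
    # prefix is empty, so the loop below walks the whole term list.
    i = bisect.bisect_left(sorted_terms, prefix)
    found = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        if fnmatch.fnmatchcase(sorted_terms[i], pattern):
            found.append(sorted_terms[i])
            if len(found) >= limit:
                break
        i += 1
    return found


terms = sorted(["recoll", "recollindex", "record", "unrecoll", "xcoll"])
print(expand_wildcard("rec*", terms), expand_wildcard("*coll", terms))
```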
@@ -1254,7 +1293,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 ----------------------------------------------------------------------
 
-3.1.7. Multiple databases
+3.1.7. Multiple indexes
 
 See the section describing the use of multiple indexes for generalities.
 Only the aspects concerning the recoll GUI are described here.
@@ -1330,7 +1369,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 identity is based on an MD5 hash of the document container, not only of
 the text contents (so that ie, a text document with an image added will
 not be a duplicate of the text only). Duplicates hiding is controlled by
-an entry in the Query configuration dialog, and is off by default.
+an entry in the GUI configuration dialog, and is off by default.
 
 ----------------------------------------------------------------------
 
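A minimal Python sketch of duplicate hiding keyed on an MD5 hash of the whole container, as described above. The `(name, bytes)` result shape and the function names are hypothetical; the point is that hashing the container bytes, rather than the extracted text, makes a copy with an added image hash differently.

```python
import hashlib


def container_digest(data: bytes) -> str:
    """MD5 of the raw container bytes, not of the extracted text."""
    return hashlib.md5(data).hexdigest()


def hide_duplicates(results):
    """Keep the first result for each container digest.

    `results` is a list of (name, container_bytes) pairs, a hypothetical shape.
    """
    seen = set()
    kept = []
    for name, data in results:
        digest = container_digest(data)
        if digest not in seen:
            seen.add(digest)
            kept.append(name)
    return kept


print(hide_duplicates([("a.txt", b"hello"), ("backup/a.txt", b"hello"),
                       ("b.odt", b"hello+image")]))
```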
@@ -1451,7 +1490,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 3.1.11. Customizing the search interface
 
-You can customize some aspects of the search interface by using the Query
+You can customize some aspects of the search interface by using the GUI
 configuration entry in the Preferences menu.
 
 There are several tabs in the dialog, dealing with the interface itself,
@@ -1482,14 +1521,15 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 HTML display, you can uncheck it to display the plain text version
 instead.
 
-* Use <PRE> tags instead of <BR> to display plain text as HTML in
-preview: when displaying plain text inside the preview window, Recoll
-tries to preserve some of the original text line breaks and
-indentation. It can either use PRE HTML tags, which will well preserve
-the indentation but will force horizontal scrolling for long lines, or
-use BR tags to break at the original line breaks, which will let the
-editor introduce other line breaks according to the window width, but
-will lose some of the original indentation.
+* Plain text to HTML line style: when displaying plain text inside the
+preview window, Recoll tries to preserve some of the original text
+line breaks and indentation. It can either use PRE HTML tags, which
+will well preserve the indentation but will force horizontal scrolling
+for long lines, or use BR tags to break at the original line breaks,
+which will let the editor introduce other line breaks according to the
+window width, but will lose some of the original indentation. The
+third option has been available in recent releases and is probably now
+the best one: use PRE tags with line wrapping.
 
 * Use desktop preferences to choose document editor: if this is checked,
 the xdg-open utility will be used to open files when you click the
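The three line styles can be compared with a small Python sketch that renders the same plain text each way. Expressing "PRE tags with line wrapping" as the CSS property `white-space: pre-wrap` is an assumption about how wrapping is obtained, and `plain_to_html` is a hypothetical helper, not Recoll code.

```python
import html


def plain_to_html(text: str, style: str = "pre-wrap") -> str:
    """Render plain text as HTML using one of the three line styles."""
    escaped = html.escape(text)
    if style == "pre":
        # Keeps indentation; long lines force horizontal scrolling.
        return "<pre>" + escaped + "</pre>"
    if style == "br":
        # Breaks at the original line ends; the viewer may re-wrap, and runs
        # of leading spaces collapse, so indentation is lost.
        return "<br>\n".join(escaped.split("\n"))
    if style == "pre-wrap":
        # Assumed rendering of "PRE tags with line wrapping".
        return '<pre style="white-space: pre-wrap">' + escaped + "</pre>"
    raise ValueError("unknown style: " + style)


sample = "a < b\n  c"
for s in ("pre", "br", "pre-wrap"):
    print(plain_to_html(sample, s))
```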
@@ -1501,6 +1541,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 these are mime types that will still be opened according to Recoll
 preferences. This is useful for passing parameters like page numbers
 or search strings to applications that support them (e.g. evince).
+This cannot be done with xdg-open which only supports passing one
+parameter.
 
 * Choose editor applications this will let you choose the command
 started by the Open links inside the result list, for specific
@@ -1514,9 +1556,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 search input field. This lets you look at the result list as you enter
 new terms. This is off by default, you may like it or not...
 
-* Start with advanced search dialog open and Start with sort dialog
-open: If you use these dialogs all the time, checking these entries
-will get them to open when recoll starts.
+* Start with advanced search dialog open : If you use this dialog
+frequently, checking the entries will get it to open when recoll
+starts.
 
 * Remember sort activation state if set, Recoll will remember the sort
 tool stat between invocations. It normally starts with sorting
@ -1535,8 +1577,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||||
presentation of each result list entry. See the result list
|
presentation of each result list entry. See the result list
|
||||||
customisation section.
|
customisation section.
|
||||||
|
|
||||||
* Edit result page html header insert: allows you to define text
|
* Edit result page HTML header insert: allows you to define text
|
||||||
inserted at the end of the result page html header. More detail in the
|
inserted at the end of the result page HTML header. More detail in the
|
||||||
result list customisation section.
|
result list customisation section.
|
||||||
|
|
||||||
* Date format: allows specifying the format used for displaying dates
|
* Date format: allows specifying the format used for displaying dates
|
||||||
@@ -1576,10 +1618,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 the document itself.
 
 * Dynamically build abstracts: this decides if Recoll tries to build
-document abstracts when displaying the result list. Abstracts are
-constructed by taking context from the document information, around
-the search terms. This can slow down result list display significantly
-for big documents, and you may want to turn it off.
+document abstracts (lists of snippets) when displaying the result
+list. Abstracts are constructed by taking context from the document
+information around the search terms.
 
 * Synthetic abstract size: adjust to taste...
 
@@ -1615,9 +1656,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 * The paragraph format
 
-* Html code inside the header section
+* HTML code inside the header section
 
-These can be edited from the Result list tab of the Query configuration.
+These can be edited from the Result list tab of the GUI configuration.
 
 Newer versions of Recoll (from 1.17) use a WebKit HTML object by default
 (this may be disabled at build time), and total customisation is possible
@@ -1643,9 +1684,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 * %D. Date
 
-* %E. Precooked Snippets link (will only appear for documents indexed
-with page numbers)
-
 * %I. Icon image name. This is normally determined from the mime type.
 The associations are defined inside the mimeconf configuration file.
 If a thumbnail for the file is found at the standard Freedesktop
@@ -1653,7 +1691,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 * %K. Keywords (if any)
 
-* %L. Precooked Preview and Edit links
+* %L. Precooked Preview, Edit, and possibly Snippets links
 
 * %M. Mime type
 
@@ -1669,9 +1707,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 * %U. Url
 
-The format of the Preview and Edit links is <a href="P%N"> and <a
-href="E%N"> where docnum (%N) expands to the document number inside the
-result page).
+The format of the Preview, Edit, and Snippets links is <a href="P%N">, <a
+href="E%N"> and <a href="A%N"> where docnum (%N) expands to the document
+number inside the result page.
 
 In addition to the predefined values above, all strings like %(fieldname)
 will be replaced by the value of the field named fieldname for this
@@ -1842,7 +1880,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 used with the KIO slave or the command line search. It broadly has the
 same capabilities as the complex search interface in the GUI.
 
-The language is roughly based on the (seemingly defunct) Xesam user search
+The language is based on the (seemingly defunct) Xesam user search
 language specification.
 
 If the results of a query language search puzzle you and you doubt what
@@ -1862,17 +1900,19 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 the document).
 
 An element is composed of an optional field specification, and a value,
-separated by a colon. Example: Beatles, author:balzac, dc:title:grandet
+separated by a colon (the field separator is the last colon in the
+element). Example: Eugenie, author:balzac, dc:title:grandet
 
 The colon, if present, means "contains". Xesam defines other relations,
-which are not supported for now.
+which are now mostly supported (except in special cases, described
+further down).
 
 All elements in the search entry are normally combined with an implicit
 AND. It is possible to specify that elements be OR'ed instead, as in
 Beatles OR Lennon. The OR must be entered literally (capitals), and it has
 priority over the AND associations: word1 word2 OR word3 means word1 AND
-(word2 OR word3) not (word1 AND word2) OR word3. Do not enter explicit
-parenthesis, they are not supported for now.
+(word2 OR word3) not (word1 AND word2) OR word3. Explicit parentheses are
+not supported.
 
 An element preceded by a - specifies a term that should not appear. Pure
 negative queries are forbidden.
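The hunk above states two parsing rules. First, the field name ends at the last colon of an element. Second, OR binds more tightly than the implicit AND. The short Python sketch below illustrates both rules. It is a toy written only from those two rules: the function names are hypothetical, it ignores negation with - and phrases, and it is not Recoll's actual query parser.

```python
def split_element(element):
    """Split one query-language element into (field, value).

    The field separator is taken to be the last colon, so
    "dc:title:grandet" gives field "dc:title" and value "grandet".
    An element without a colon has no field (None).
    """
    field, sep, value = element.rpartition(":")
    if not sep:
        return None, element
    return field, value


def group_or(elements):
    """Group elements so that OR binds tighter than the implicit AND.

    Returns a list of OR-groups that are implicitly AND'ed together.
    """
    groups = []
    pending_or = False
    for element in elements:
        if element == "OR":  # only the literal, capitalised OR is an operator
            pending_or = True
            continue
        if pending_or and groups:
            groups[-1].append(element)  # extend the preceding OR-group
        else:
            groups.append([element])  # start a new AND'ed group
        pending_or = False
    return groups


print(group_or("word1 word2 OR word3".split()))  # [['word1'], ['word2', 'word3']]
print(split_element("dc:title:grandet"))         # ('dc:title', 'grandet')
```

The first result reads as word1 AND (word2 OR word3), which matches the precedence described in the text.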
@@ -2103,6 +2143,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 slow search because Recoll will have to scan the whole index term list
 to find the matches.
 
+* When working with a raw index (preserving character case and
+diacritics), the literal part of a wildcard expression will be matched
+exactly for case and diacritics.
+
 * Using a * at the end of a word can produce more matches than you would
 think, and strange search results. You can use the term explorer tool
 to check what completions exist for a given term. You can also see
@@ -2136,12 +2180,27 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 example, bla bla my unexpected term at the beginning of the text would be
 a match for "^my term"o5.
 
+Anchored searches can be very useful for searches inside somewhat
+structured documents like scientific articles, in case explicit metadata
+has not been supplied (a most frequent case), for example for looking for
+matches inside the abstract or the list of authors (which occur at the top
+of the document).
+
 ----------------------------------------------------------------------
 
 3.7. Desktop integration
 
 Being independant of the desktop type has its drawbacks: Recoll desktop
-integration is minimal. Here follow a few things that may help.
+integration is minimal. However, there are a few tools available:
 
+* The KDE KIO Slave was described in a previous section.
+
+* If you use a recent version of Ubuntu Linux, you may find the Ubuntu
+Unity Lens module useful.
+
+* There is also an independently developed Krunner plugin.
+
+Here follow a few other things that may help.
+
 ----------------------------------------------------------------------
 
@@ -2156,6 +2215,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 3.7.2. The KDE Kicker Recoll applet
 
+This is probably obsolete now. Anyway:
+
 The Recoll source tree contains the source code to the recoll_applet, a
 small application derived from the find_applet. This can be used to add a
 small Recoll launcher to the KDE panel.
@@ -2175,48 +2236,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 a new recoll GUI instance every time (even if it is already running). You
 may find it useful anyway.
 
------------------------------------------------------------------------
-
-3.8. Multiple databases
-
-Multiple Recoll databases or indexes can be created by using several
-configuration directories which are usually set to index different areas
-of the file system. A specific index can be selected for updating or
-searching, using the RECOLL_CONFDIR environment variable or the -c option
-to recoll and recollindex.
-
-A typical usage scenario for the multiple index feature would be for a
-system administrator to set up a central index for shared data, that you
-choose to search or not in addition to your personal data. Of course,
-there are other possibilities. There are many cases where you know the
-subset of files that should be searched, and where narrowing the search
-can improve the results. You can achieve approximately the same effect
-with the directory filter in advanced search, but multiple indexes will
-have much better performance and may be worth the trouble.
-
-A recollindex program instance can only update one specific index.
-
-The main index (defined by RECOLL_CONFDIR or -c) is always active. If this
-is undesirable, you can set up your base configuration to index an empty
-directory.
-
-The different search interfaces (GUI, command line, ...) have different
-methods to define the set of indexes to be used, see the appropriate
-section.
-
-If a set of multiple indexes are to be used together for searches, some
-configuration parameters must be consistent among the set. These are
-parameters which need to be the same when indexing and searching. As the
-parameters come from the main configuration when searching, they need to
-be compatible with what was set when creating the other indexes (which
-came from their respective configuration directories. Most of the relevant
-parameters are described in the following linked section.
-
 ----------------------------------------------------------------------
 
 Chapter 4. Programming interface
 
-Recoll has an Application programming Interface, usable both for indexing
+Recoll has an Application Programming Interface, usable both for indexing
 and searching, currently accessible from the Python language.
 
 Another less radical way to extend the application is to write filters for
@@ -2237,8 +2261,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 * Simple filters (the old ones) run once and exit. They can be bare
 programs like antiword, or shell-scripts using other programs. They
-are very simple to write, just having to write the text to the
-standard output.
+are very simple to write, because they just need to output the
+converted text to the standard output.
 
 * Multiple filters, new in 1.13, run as long as their master process
 (ie: recollindex) is active. They can process multiple files (sparing
@@ -2270,12 +2294,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 They should output the result to stdout.
 
 When writing a filter, you should decide if it will output plain text or
-html. Plain text is simpler, but you will not be able to add metadata or
+HTML. Plain text is simpler, but you will not be able to add metadata or
 vary the output character encoding (this will be defined in a
-configuration file). Additionally, some formatting may easier to preserve
-when previewing html. Actually the deciding factor is metadata: Recoll has
-a way to extract metadata from the html header and use it for field
-searches..
+configuration file). Additionally, some formatting may be easier to
+preserve when previewing HTML. Actually the deciding factor is metadata:
+Recoll has a way to extract metadata from the HTML header and use it for
+field searches.
 
 The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
 the filter if the operation is for indexing or previewing. Some filters
@@ -2351,7 +2375,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 transforming them into appropriate entities. "&" should be transformed
 into "&amp;", "<" should be transformed into "&lt;". This is not always
 properly done by translating programs which output HTML, and of course
-nerver by those which output plain text.
+never by those which output plain text.
 
 The character set needs to be specified in the header. It does not need to
 be UTF-8 (Recoll will take care of translating it), but it must be
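The hunk above gives an escaping rule for HTML filter output. The minimal Python sketch below applies that rule to the HTML a filter might write to standard output. The function name and the <pre> wrapper are assumptions made for illustration. Declaring the charset through a Content-Type meta element is also this sketch's own choice of how to "specify it in the header".

```python
import html


def text_to_html(text, charset="UTF-8"):
    """Wrap converted plain text in a minimal HTML document.

    html.escape(..., quote=False) turns "&" into "&amp;" and "<" into
    "&lt;" (it also turns ">" into "&gt;", which is harmless here).
    The character set is declared inside the header.
    """
    body = html.escape(text, quote=False)
    return (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" '
        f'content="text/html;charset={charset}">\n'
        "</head><body><pre>\n"
        f"{body}\n"
        "</pre></body></html>\n"
    )
```

A filter script could then call sys.stdout.write(text_to_html(converted_text)). Here converted_text is a placeholder for whatever the conversion step produced.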
@@ -2407,9 +2431,39 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 A field can be either or both indexed and stored. This and other aspects
 of fields handling is defined inside the fields configuration file.
 
+The sequence of events for field processing is as follows:
+
+* During indexing, recollindex scans all meta fields in HTML documents
+(most document types are transformed into HTML at some point). It
+compares the name for each element to the configuration defining what
+should be done with fields (the fields file).
+
+* If the name for the meta element matches one for a field that should
+be indexed, the contents are processed and the terms are entered into
+the index with the prefix defined in the fields file.
+
+* If the name for the meta element matches one for a field that should
+be stored, the content of the element is stored with the document data
+record, from which it can be extracted and displayed at query time.
+
+* At query time, if a field search is performed, the index prefix is
+computed and the match is only performed against appropriately
+prefixed terms in the index.
+
+* At query time, the field can be displayed inside the result list by
+using the appropriate directive in the definition of the result list
+paragraph format. All fields are displayed on the fields screen of the
+preview window (which you can reach through the right-click menu).
+This is independent of whether the search which produced the
+results used the field or not.
+
 You can find more information in the section about the fields file, or in
 comments inside the file.
 
+You can also have a look at the example on the Wiki, detailing how one
+could add a page count field to PDF documents for displaying inside result
+lists.
+
 ----------------------------------------------------------------------
 
 4.3. API
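The field-processing sequence added in the hunk above can be sketched in Python. The sketch collects the meta elements of an HTML document, emits prefixed terms for indexed fields, and keeps the values of stored fields. It also builds a prefixed term at query time. The INDEXED and STORED tables are hypothetical stand-ins for the fields file, and the prefixes are made up. Real indexing also does much more term processing than lowercasing and splitting on whitespace.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the fields file: meta names to index (with a
# made-up term prefix) and meta names to store. Not Recoll's real values.
INDEXED = {"author": "A", "keywords": "K"}
STORED = {"author", "abstract"}


class MetaCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content


def process_fields(html_doc):
    """Indexing side: return (prefixed terms, stored field values)."""
    collector = MetaCollector()
    collector.feed(html_doc)
    terms, stored = [], {}
    for name, content in collector.meta.items():
        if name in INDEXED:
            # Crude term splitting, for illustration only.
            terms.extend(INDEXED[name] + word.lower() for word in content.split())
        if name in STORED:
            stored[name] = content
    return terms, stored


def query_term(field, word):
    """Query side: a field search matches only terms carrying the prefix."""
    return INDEXED[field] + word.lower()


doc = ('<html><head><meta name="Author" content="Honore Balzac">'
       '<meta name="abstract" content="A novel"></head><body>x</body></html>')
terms, stored = process_fields(doc)
print(terms)                                    # ['Ahonore', 'Abalzac']
print(stored)                                   # {'author': 'Honore Balzac', 'abstract': 'A novel'}
print(query_term("author", "Balzac") in terms)  # True
```

The "abstract" field is stored but not indexed in this sketch. It could therefore be displayed with a result, but a field search on it would find nothing.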
@@ -2462,8 +2516,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 Recoll versions after 1.11 define a Python programming interface, both for
 searching and indexing.
 
-The Python interface is not built by default and can be found in the
-source package, under python/recoll.
+The Python interface can be found in the source package, under
+python/recoll.
 
 In order to build the module, you should first build or re-build the
 Recoll library using position-independant objects:
@@ -3313,6 +3367,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 Note that the translation is not limited to a single character,
 you could very well have something like u:ue in the list.
 
+The default value set for unac_except_trans can't be listed here
+because I have trouble with SGML and UTF-8, but it only contains
+ligature decompositions: german ss, oe, ae, fi, fl.
+
 This parameter can't be defined for subdirectories, it is global,
 because there is no way to do otherwise when querying. If you have
 document sets which would need different values, you will have to