diff --git a/.hgignore b/.hgignore index eb520974..d7b3277a 100644 --- a/.hgignore +++ b/.hgignore @@ -21,6 +21,8 @@ src/desktop/unity-lens-recoll/data/recoll.lens src/desktop/unity-lens-recoll/data/unity-lens-recoll.service src/doc/user/HTML.manifest src/doc/user/RCL.INDEXING.CONFIG.html +src/doc/user/RCL.INDEXING.EXTATTR.html +src/doc/user/RCL.INDEXING.EXTTAGS.html src/doc/user/RCL.INDEXING.MONITOR.html src/doc/user/RCL.INDEXING.PERIODIC.html src/doc/user/RCL.INDEXING.STORAGE.html diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml index facf7114..0cbcf8aa 100644 --- a/src/doc/user/usermanual.sgml +++ b/src/doc/user/usermanual.sgml @@ -690,7 +690,7 @@ recoll - Index WEB visited page history + Indexing WEB pages you visit With the help of a Firefox extension, &RCL; can index the Internet pages that you visit. The @@ -723,6 +723,58 @@ recoll + + Extended attributes data + + User extended attributes are named pieces of information + that most modern file systems can attach to any file. + + &RCL; versions 1.19 and later process extended attributes + as document fields by default. For older versions, this has to + be activated at build time. + + A + + freedesktop standard defines a few special + attributes, which are handled as such by &RCL;: + + + mime_type + If set, this overrides any other + determination of the file mime type. + + + charset + If set, this defines the file character set + (mostly useful for plain text files). + + + + + By default, other attributes are handled as &RCL; fields. + On Linux, the user prefix is removed from + the name. This can be configured more precisely inside + the + fields configuration file. + + + + + + Importing external tags + + During indexing, it is possible to import metadata for + each file by executing commands. For example, this could + extract user tag data for the file and store it in a field for + indexing. 
+ + See the + section + about the metadatacmds field in + the main configuration chapter for more detail. + + + Periodic indexing @@ -2301,21 +2353,20 @@ fvwm where docnum (%N) expands to the document number inside the result page). - In addition to the predefined values above, all strings like - %(fieldname) will be replaced by the value of - the field named fieldname for this - document. Only stored fields can be accessed in this way, the value - of indexed but not stored fields is not known at this point in the - search process (see field - configuration). There are currently very few fields stored - by default, apart from the values above (only - author and filename), so this - feature will need some custom local configuration to be useful. For - example, you could look at the fields for the document types of - interest (use the right-click menu inside the preview window), and - add what you want to the list of stored fields. A candidate example - would be the recipient field which is generated - by the message filters. + In addition to the predefined values above, all strings + like %(fieldname) will be replaced by the + value of the field named fieldname for this + document. Only stored fields can be accessed in this way; the + value of indexed but not stored fields is not known at this + point in the search process + (see field + configuration). There are currently very few fields + stored by default, apart from the values above + (only author + and filename), so this feature will need + some custom local configuration to be useful. An example + candidate would be the recipient field + which is generated by the message filters. The default value for the paragraph format string is: <meta name="somefield" content="Some textual data" /> + + + You can embed HTML markup inside the content of custom + fields to improve their display inside result lists. 
In this + case, add a (wildly non-standard) markup + attribute to tell &RCL; that the value is HTML and should not + be escaped for display. + + +<meta name="somefield" markup="html" content="Some <i>textual</i> data" /> See the following section for details about configuring @@ -3366,10 +3427,11 @@ application/x-chm = execm rclchm author, abstract. The field values for documents can appear in several ways - during indexing: either output by filters as - meta fields in the HTML header section, or - added as attributes of the Doc object when - using the API, or again synthetized internally by &RCL;. + during indexing: either output by filters + as meta fields in the HTML header section, or + extracted from file extended attributes, or added as attributes + of the Doc object when using the API, or + again synthesized internally by &RCL;. The &RCL; query language allows searching for text in a specific field. @@ -4661,7 +4723,25 @@ unac_except_trans = mimeview. + + + metadatacmds + This allows executing external commands + for each file and storing the output in a &RCL; + field. This could be used for example to index external + tag data. The value is a list of field names and commands; + don't forget the initial semicolon. Example: + +[/some/area/of/the/fs] +metadatacmds = ; tags = tmsu tags %f; otherfield = somecmd -xx %f + + + + + + + @@ -4976,6 +5056,24 @@ x-my-tag = mailmytag + + Extended attributes in the fields file + + &RCL; versions 1.19 and later process user extended + file attributes as document fields by default. + + Attributes are processed as fields of the same name, + after removing the user prefix on + Linux. + + The [xattrtofields] + section of the fields file allows + specifying translations from extended attribute names to + &RCL; field names. An empty translation disables use of the + corresponding attribute data. 
+ + + diff --git a/website/BUGS.html b/website/BUGS.html index abb73e35..2da9f5b6 100644 --- a/website/BUGS.html +++ b/website/BUGS.html @@ -57,34 +57,34 @@ case-insensitive search does not work for them (e.g.: searching for ds1820 will not find DS1820). -
  • On systems such as Debian Stable which use Evince version - 2.x (not 3.x) as PDF viewer, the default "Open" command for - PDF files will not work. You need to edit the command: - in Preferences->GUI configuration, - uncheck Use desktop preferences..., then - click Choose editor applications, and for - application/pdf, application/postscript and text/dvi, change - the --page-index option to --page-label.
  • +
  • On systems such as Debian Stable which use Evince version + 2.x (not 3.x) as PDF viewer, the default "Open" command for + PDF files will not work. You need to edit the command: + in Preferences->GUI configuration, + uncheck Use desktop preferences..., then + click Choose editor applications, and for + application/pdf, application/postscript and text/dvi, change + the --page-index option to --page-label.
  • -
  • It will sometimes happen that the result list paragraph - format stored in the Qt preferences file will get garbled, - causing result lists with no displayed paragraphs (the - counts and pages are ok, the results can be seen in table - mode, but not in list mode). The workaround is to go to -
    - Preferences->Query configuration->User interface -
    and erase the result paragraph format string - (^A DEL in the text area), this will reset the string to the - default value.
  • +
  • It will sometimes happen that the result list paragraph + format stored in the Qt preferences file will get garbled, + causing result lists with no displayed paragraphs (the + counts and pages are ok, the results can be seen in table + mode, but not in list mode). The workaround is to go to +
    + Preferences->Query configuration->User interface +
    and erase the result paragraph format string + (^A DEL in the text area), this will reset the string to the + default value.
  • -
  • Real time indexer: when running with gamin on FreeBSD, the - indexer can deadlock in the gamin dialog in some - cases.
  • +
  • Real time indexer: when running with gamin on FreeBSD, the + indexer can deadlock in the gamin dialog in some + cases.
  • -
  • After an upgrade, the recoll GUI sometimes crashes on - startup. This is fixed by removing (back it up just in case) - ~/.config/Recoll.org/recoll.conf, the QSettings storage for - recoll.
  • +
  • After an upgrade, the recoll GUI sometimes crashes on + startup. This is fixed by removing (back it up just in case) + ~/.config/Recoll.org/recoll.conf, the QSettings storage for + recoll.