doc and messages

2012-10-04 17:03:46 +02:00 · abe18946ed
commit abe18946ed
parent f54ac99973
3 changed files with 312 additions and 151 deletions
--- a/src/doc/user/usermanual.sgml
+++ b/src/doc/user/usermanual.sgml
@@ -139,28 +139,48 @@
      index. It has input filters for many document types.</para>
      <para>Stemming is the process by which &RCL; reduces words to
-      their radicals so that searching does not depend, for example,
+        their radicals so that searching does not depend, for example, on a
-      on a word being singular or plural (floor, floors), or on a verb
+        word being singular or plural (floor, floors), or on a verb tense
-      tense (flooring, floored). Because the mechanisms used for
+        (flooring, floored). Because the mechanisms used for stemming
-      stemming depend on the specific grammatical rules for each
+        depend on the specific grammatical rules for each language, there
-      language, there is a separate stemmer module for most common
+        is a separate stemmer module for most common languages where
-      languages where stemming makes sense. Storing documents written
+        stemming makes sense.</para>
-      in different languages in the same index is possible, and
+
-      commonly done. In this situation, you can specify several
+      <para>&RCL; stores the unstemmed versions of terms in the main index
-      stemming languages for the index. &RCL; stores the unstemmed
+        and uses auxiliary databases for term expansion (one for each
-      versions of terms in the main index and uses auxiliary databases
+        stemming language), which means that you can switch stemming
-      for term expansion (one for each stemming language), which means
+        languages between searches, or add a language without needing a
-      that you can switch stemming languages between searches, or add
+        full reindex.</para>
-      a language without needing a full reindex. &RCL; currently
+
-      makes no attempt at automatic language recognition, which means
+      <para>Storing documents written in different languages in the same
-      that the stemmer will sometimes be applied to terms from other
+        index is possible, and commonly done. In this situation, you can
-      languages with potentially strange results. In practise, even if
+        specify several stemming languages for the index. </para>
-      this introduces possibilities of confusion, this approach has
+
-      been proven quite useful, and, awaiting the addition of an
+      <para>&RCL; currently makes no attempt at automatic language
-      automatic language recognition module to &RCL;, it is much less
+        recognition, which means that the stemmer will sometimes be applied
-      cumbersome than separating your documents according to what
+        to terms from other languages with potentially strange results. In
        practice, even if this introduces possibilities of confusion, this
        approach has proven quite useful, and, awaiting the addition
        of an automatic language recognition module to &RCL;, it is much
        less cumbersome than separating your documents according to what
        language they are written in.</para>
      <para>Before version 1.18, &RCL; always stripped most accents and
        diacritics from terms, and converted them to lower case before
        storing them in the index. As a consequence, it was impossible to
        search for a particular capitalization of a term
        (<literal>US</literal> / <literal>us</literal>), or to
        discriminate two terms based on diacritics (<literal>sake</literal>
        / <literal>saké</literal>, <literal>mate</literal> /
        <literal>maté</literal>).</para>  
      <para>As of version 1.18, &RCL; can optionally store the raw terms,
        without accent stripping or case conversion. Expansions necessary
        for searches insensitive to case and/or diacritics are then
        performed when searching. This is described in more detail in the
        <link linkend="RCL.INDEXING.CONFIG.SENS">section about index case
        and diacritics sensitivity</link>.</para>
      <para>&RCL; has many parameters which define exactly what to
        index, and how to classify and decode the source
        documents. These are kept in <link
@@ -507,13 +527,45 @@ recoll
      <sect2 id="rcl.indexing.config.sens">
        <title>Index case and diacritics sensitivity</title>
-        <para>Index case sensitivity
+        <para>As of &RCL; version 1.18 you have a choice of building an
-          is controlled by the <i>indexStripChars</i> configuration
+          index with terms stripped of character case and diacritics, or
          one with raw terms. For a source term of
          <literal>Résumé</literal>, the former will store
          <literal>resume</literal>, the latter
          <literal>Résumé</literal>.</para>
        <para>Each type of index allows performing searches insensitive to
          case and diacritics: with a raw index, the user entry will be
          expanded to match all case and diacritics variations present in
          the index. With a stripped index, the search term will be stripped
          before searching.</para>
        <para>A raw index allows for another possibility which a stripped
          index cannot offer: using case and diacritics to discriminate
          between terms, returning different results when searching for
          <literal>US</literal> and <literal>us</literal> or
          <literal>resume</literal> and <literal>résumé</literal>.
          Read the <link linkend="rcl.search.casediac">section about search
          case and diacritics sensitivity</link> for more details.</para>
        <para>The type of index to be created is controlled by the
          <literal>indexStripChars</literal> configuration
          variable which can only be changed by editing the
          configuration file. Any change implies an index reset (not
-          automated by recoll), and all indexes in a search must be set
+          automated by &RCL;), and all indexes in a search must be set
-          in the same way (again, not checked by recoll). </para>
+          in the same way (again, not checked by &RCL;). </para>
        <para>If <literal>indexStripChars</literal> is not set, &RCL;
          1.18 creates a stripped index by default, for
          compatibility with previous versions.</para>
        <para>The added capability has a cost: a raw index will be slightly
          bigger than a stripped one (around 10%). Searches will also be
          more complex, so probably slightly slower. The feature is still
          young, and a certain amount of unexpected behaviour cannot be
          excluded.</para> 
      </sect2>
      <sect2 id="rcl.indexing.config.gui">
@@ -1011,7 +1063,7 @@ fvwm
       start an external viewer. The viewer for each document type can be
       configured through the user preferences dialog, or by editing the
       <filename>mimeview</filename> configuration file. You can also check
-       the <guilabel>Use desktop preferences</guilabel> option in the user
+       the <guilabel>Use desktop preferences</guilabel> option in the GUI
       preferences dialog to use the desktop defaults for all
       documents. This is probably the best option if you are using a well
       configured <application>Gnome</application> or 
@@ -1819,6 +1871,14 @@ fvwm
            application.</para>
           </listitem>
            <listitem><para><guilabel>Exceptions</guilabel>: when using the
            desktop preferences for opening documents, these are mime types
            that will still be opened according to &RCL; preferences. This
            is useful for passing parameters like page numbers or search
            strings to applications that support them
            (e.g. <application>evince</application>).</para> 
           </listitem>
            <listitem><para><guilabel>Choose editor applications</guilabel>:
            this will let you choose the command started by the
            <guilabel>Open</guilabel> links inside the result list, for
@@ -2369,31 +2429,44 @@ text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
          section</link>.</para> 
      <para>&RCL; currently manages the following default fields:</para>
      <itemizedlist>
        <listitem><para><literal>title</literal>,
            <literal>subject</literal> or <literal>caption</literal> are
            synonyms which specify data to be searched for in the
            document title or subject.</para>
           </listitem>
        <listitem><para><literal>author</literal> or
-        <literal>from</literal> for searching the documents originators.</para>
+            <literal>from</literal> for searching the document
            originators.</para>
           </listitem>
        <listitem><para><literal>recipient</literal> or
-        <literal>to</literal> for searching the documents recipients.</para>
+            <literal>to</literal> for searching the document
            recipients.</para>
           </listitem>
        <listitem><para><literal>keyword</literal> for searching the
-        document-specified keywords (few documents actually have any).</para>
+            document-specified keywords (few documents actually have
            any).</para>
           </listitem>
        <listitem><para><literal>filename</literal> for the document's
            file name.</para></listitem>
        <listitem><para><literal>ext</literal> specifies the file
            name extension (Ex: <literal>ext:html</literal>)</para>
           </listitem>
         </itemizedlist>
      <para>The field syntax also supports a few field-like, but
        special, criteria:</para>
      <itemizedlist>
        <listitem><para><literal>dir</literal> for filtering the
            results on file location (Ex:
            <literal>dir:/home/me/somedir</literal>). <literal>-dir</literal>
@@ -2434,6 +2507,7 @@ text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
            <literal>/</literal> is present but an element is missing, the
            missing element is interpreted as the lowest or highest date in the
            index. Examples:</para>
 	  <itemizedlist>
 	    <listitem><para><literal>2001-03-01/2002-05-01</literal> the
 	        basic syntax for an interval of dates.</para>
@@ -2491,8 +2565,9 @@ text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
        stem-expanded. Wildcards may be used anywhere inside a term.
        Specifying a wildcard on the left of a term can produce a very
        slow search (or even an incorrect one if the expansion is
-      truncated because of excessive size). Also see <link
+        truncated because of excessive size). Also see 
-      linkend="rcl.search.wildcards">More about wildcards</link>.</para>
+        <link linkend="rcl.search.wildcards">
          More about wildcards</link>.</para>
      <para>The document filters used while indexing have the
        possibility to create other fields with arbitrary names, and
@@ -2507,6 +2582,7 @@ text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
          immediately after the closing double quote of a phrase, as in
          <literal>"some term"modifierchars</literal>. The actual "phrase"
          can be a single term of course. Supported modifiers:
        <itemizedlist>
            <listitem><para><literal>l</literal> can be used to turn off
            stemming (mostly makes sense with <literal>p</literal> because
@@ -2525,6 +2601,12 @@ text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
            (unordered). Example: <literal>"order any in"p</literal></para>
            </listitem>
            <listitem><para><literal>C</literal> will turn on case
            sensitivity (if the index supports it).</para></listitem>
            <listitem><para><literal>D</literal> will turn on diacritics
                sensitivity (if the index supports it).</para></listitem>
            <listitem><para>A weight can be specified for a query element
            by specifying a decimal value at the start of the
            modifiers. Example: <literal>"Important"2.5</literal>.</para>
@@ -2537,6 +2619,78 @@ text/html       [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
    </sect1> <!-- rcl.search.lang -->
    <sect1 id="rcl.search.casediac">
      <title>Search case and diacritics sensitivity</title>
      <para>For &RCL; versions 1.18 and later, and <emphasis>when working
          with a raw index</emphasis> (not the default), searches can be
          made sensitive
        to character case and diacritics. How this happens is controlled by
        configuration variables and what search data is entered.</para>
      <para>The general default is that searches are insensitive to case
      and diacritics. An entry of <literal>resume</literal> will match any
      of <literal>Resume</literal>, <literal>RESUME</literal>,
      <literal>résumé</literal>, <literal>Résumé</literal> etc.</para>
      <para>Two configuration variables can automate switching on
        sensitivity:</para> 
      <variablelist>
        <varlistentry>
          <term>autodiacsens</term><listitem><para>If this is set, search
              sensitivity to diacritics will be turned on as soon as an
              accented character exists in a search term. When the variable
              is set to true, <literal>resume</literal> will start a
              diacritics-insensitive search, but <literal>résumé</literal>
              will be matched exactly. The default value is
              <emphasis>false</emphasis>.</para></listitem>
        </varlistentry>
        <varlistentry>
          <term>autocasesens</term><listitem><para>If this is set, search
              sensitivity to character case will be turned on as soon as an
              upper-case character exists in a search term <emphasis>except
              for the first one</emphasis>. When the variable is set to
              true, <literal>us</literal> or <literal>Us</literal> will
              start a case-insensitive search, but
              <literal>US</literal> will be matched exactly. The default
              value is <emphasis>true</emphasis> (contrary to
              <literal>autodiacsens</literal>).</para></listitem>
        </varlistentry>
      </variablelist>
      <para>As in the past, capitalizing the first letter of a word will
        turn off its stem expansion and have no effect on
        case-sensitivity.</para>
      <para>You can also explicitly activate case and diacritics
      sensitivity by using modifiers with the query
      language. <literal>C</literal> will make the term case-sensitive, and
      <literal>D</literal> will make it
      diacritics-sensitive. Examples:</para>
      <programlisting>
        "us"C
   </programlisting>
      <para>will search for the term <literal>us</literal> exactly
      (<literal>Us</literal> will not be a match).</para>
      <programlisting>
        "resume"D
      </programlisting>
      <para>will search for the term <literal>resume</literal> exactly
      (<literal>résumé</literal> will not be a match).</para>
      <para>When either case or diacritics sensitivity is activated, stem
        expansion is turned off, because combining the two does not make
        much sense.</para>
   </sect1>
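Review note: the automatic sensitivity rules documented in this hunk can be sketched in a few lines of C++. This is only an illustration of the rules as worded in the manual text, not Recoll's implementation. The names `Sensitivity`, `hasAccent`, `hasUpperAfterFirst` and `autoSensitivity` are hypothetical, and two simplifications are assumed: any non-ASCII code point counts as "accented", and only ASCII A-Z counts as upper case.

```cpp
#include <cstddef>
#include <string>

// Hypothetical types and names: this is not Recoll's code.
struct Sensitivity {
    bool caseSens;  // search must match character case exactly
    bool diacSens;  // search must match diacritics exactly
};

// Assumption: any code point outside ASCII counts as an accented character.
static bool hasAccent(const std::u32string& term) {
    for (char32_t c : term)
        if (c > 0x7F) return true;
    return false;
}

// Upper-case ASCII letter anywhere except in first position. An initial
// capital is skipped because it only turns off stem expansion.
static bool hasUpperAfterFirst(const std::u32string& term) {
    for (std::size_t i = 1; i < term.size(); ++i)
        if (term[i] >= U'A' && term[i] <= U'Z') return true;
    return false;
}

// Apply the two configuration switches to a single search term.
Sensitivity autoSensitivity(const std::u32string& term,
                            bool autocasesens, bool autodiacsens) {
    Sensitivity s;
    s.caseSens = autocasesens && hasUpperAfterFirst(term);
    s.diacSens = autodiacsens && hasAccent(term);
    return s;
}
```

With the documented defaults (`autocasesens` true, `autodiacsens` false), `us` and `Us` stay case-insensitive while `US` becomes case-sensitive, and `résumé` only becomes diacritics-sensitive once `autodiacsens` is switched on.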
    <sect1 id="rcl.search.anchorwild">
      <title>Anchored searches and wildcards</title>
@@ -2931,9 +3085,9 @@ application/x-chm = execm rclchm
        <para>The indexer will interpret <literal>^L</literal> characters
          in the filter output as indicating page breaks, and will record
          them. At query time, this allows starting a viewer on the right
-        page for a hit or a snippet. Currently, only the PDF filter
+          page for a hit or a snippet. Currently, only the PDF, PostScript
-        generates page breaks (thanks to
+          and DVI filters generate page breaks.</para>
-        <literal>pdftotext</literal>).</para>
+
      </sect2>
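Review note: the page-break convention described in this hunk (a `^L`, i.e. form feed, in the filter output marks a page boundary) makes it cheap to map a hit position to a page number. The helper below is a hypothetical sketch of that idea and is not taken from the Recoll sources; `pageForOffset` and its byte-offset interface are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Recoll: return the 1-based page number
// that contains byte offset `pos` in filter output where each form feed
// character ('\f', i.e. ^L) marks a page break.
std::size_t pageForOffset(const std::string& text, std::size_t pos) {
    std::size_t page = 1;
    // Count the page breaks that occur strictly before the offset.
    for (std::size_t i = 0; i < pos && i < text.size(); ++i)
        if (text[i] == '\f') ++page;
    return page;
}
```

A number computed along these lines is the kind of value a viewer command could receive for a hit or snippet, under the assumption that the viewer counts pages from 1.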
    </sect1>
@@ -4529,30 +4683,38 @@ x-my-tag = mailmytag
        <title>The mimeview file</title>
        <para><filename>mimeview</filename> specifies which programs
-        are started when you click on an <guilabel>Open</guilabel>
+          are started when you click on an <guilabel>Open</guilabel> link
-        link in a result list. Ie: HTML is normally displayed using
+          in a result list. For example, HTML is normally displayed using
         <application>firefox</application>, but you may prefer
         <application>Konqueror</application>, your
         <application>openoffice.org</application> 
         program might be named <command>ooffice</command> instead of
-         <command>openoffice</command> etc. 
+         <command>openoffice</command> etc.
         </para>
        <para>Changes to this file can be done by direct editing, or
-        through the <command>recoll</command> user preferences dialog.</para>
+        through the <command>recoll</command> GUI preferences dialog.</para>
        <para>If <guilabel>Use desktop preferences to choose document
-        editor</guilabel> is checked in the &RCL; GUI user preferences, all
+        editor</guilabel> is checked in the &RCL; GUI preferences, all
        <filename>mimeview</filename> entries will be ignored except the
        one labelled <literal>application/x-all</literal> (which is set to
        use <command>xdg-open</command> by default).</para>
        <para>In this case, the <literal>xallexcepts</literal> top level
          variable defines a list of mime type exceptions which
          will be processed according to the local entries instead of being
          passed to the desktop. This is so that specific &RCL; options
          such as a page number or a search string can be passed to
          applications that support them, such as the
          <application>evince</application> viewer.</para>
        <para>As for the other configuration files, the normal usage
          is to have a <filename>mimeview</filename> inside your own
          configuration directory, with just the non-default entries,
          which will override those from the central configuration
          file.</para>
-        <para>Please note that these entries must be placed under a
+
        <para>All viewer definition entries must be placed under a
          <literal>[view]</literal> section.</para>
 	<para>The keys in the file are normally mime types. You can add an
@@ -4602,8 +4764,8 @@ x-my-tag = mailmytag
          <listitem><formalpara><title>%p</title>
              <para>Page index. Only significant for a subset of document
-              types, currently only PDF files. Can be used to start the
+              types, currently only PDF, PostScript and DVI files. Can be
-              editor at the right page for a match or
+              used to start the editor at the right page for a match or
              snippet.</para></formalpara>
          </listitem>
--- a/src/qtgui/uiprefs.ui
+++ b/src/qtgui/uiprefs.ui
@@ -184,6 +184,9 @@
               <property name="text">
                <string>Exceptions</string>
               </property>
               <property name="toolTip">
 	        <string>Mime types that should not be passed to xdg-open even when "Use desktop preferences" is set.&lt;br&gt; Useful to pass page number and search string options to, e.g. evince.</string>
 		   </property>
              </widget>
             </item>
             <item>
--- a/src/utils/conftree.cpp
+++ b/src/utils/conftree.cpp
@@ -39,10 +39,6 @@
 using namespace std;
 #endif // NO_NAMESPACES
 #ifndef MIN
 #define MIN(A,B) ((A)<(B) ? (A) : (B))
 #endif
 #undef DEBUG
 #ifdef DEBUG
 #define LOGDEB(X) fprintf X
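Review note: because `LOGDEB(X)` is defined as `fprintf X`, every call site has to pass a complete, parenthesized `fprintf` argument list that starts with the output stream. The standalone sketch below shows the idiom. The `#else` branch and the `logSet` function are assumptions added for illustration (the hunk does not show what happens when `DEBUG` is undefined), and `DEBUG` is defined here only so that the macro expands to a real call.

```cpp
#include <cstdio>

// Defined here only to exercise the debug branch in this sketch.
#define DEBUG

#ifdef DEBUG
// X is pasted right after `fprintf`, so it must be a full argument list
// in its own parentheses, beginning with the stream.
#define LOGDEB(X) fprintf X
#else
// Assumption: compile the call away entirely when DEBUG is not defined.
#define LOGDEB(X)
#endif

// Hypothetical function used only to show a call site.
int logSet(const char* sk, const char* nm, const char* value) {
    // Expands to: fprintf(stderr, "set [%s]:[%s] -> [%s]\n", sk, nm, value);
    LOGDEB((stderr, "set [%s]:[%s] -> [%s]\n", sk, nm, value));
    return 1;
}
```

The double parentheses are what let a single-parameter macro carry a variable number of `printf`-style arguments without relying on variadic macros.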
@@ -276,7 +272,7 @@ int ConfSimple::set(const std::string &nm, const std::string &value,
 {
    if (status  != STATUS_RW)
 	return 0;
-    LOGDEB2(("ConfSimple::set [%s]:[%s] -> [%s]\n", sk.c_str(),
+    LOGDEB((stderr, "ConfSimple::set [%s]:[%s] -> [%s]\n", sk.c_str(),
 	     nm.c_str(), value.c_str()));
    if (!i_set(nm, value, sk))
 	return 0;