recoll/website/release-1.20.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <title>Recoll 1.20 series release notes</title>
    <meta name="Author" content="Jean-Francois Dockes">
    <meta name="Description"
          content="recoll is a simple full-text search system for unix and linux     based on the powerful and mature xapian engine">
    <meta name="Keywords" content="full text search, desktop search, unix, linux">
    <meta http-equiv="Content-language" content="en">
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    <meta name="robots" content="All,Index,Follow">
    <link type="text/css" rel="stylesheet" href="styles/style.css">
  </head>

  <body>

    <div class="rightlinks">
      <ul>
        <li><a href="index.html">Home</a></li>
        <li><a href="download.html">Downloads</a></li>
        <li><a href="doc.html">Documentation</a></li>
      </ul>
    </div>

    <div class="content">
      <h1>Release notes for Recoll 1.20.x</h1>

      <h2>Caveats</h2>

      <p><em>Installing over an older version</em>: 1.19 </p>

      <p>Installing 1.20 over an 1.19 index is possible, but there
        have been small changes in the way compound words (e.g. email
        addresses) are indexed, so it will be best to reset the
        index. Still, in a pinch, 1.20 search can mostly use an 1.19
        index.</p>

      <p>Case/diacritics sensitivity is off by default. It can be
        turned on <em>only</em> by editing
        recoll.conf (
        <a href="usermanual/usermanual.html#RCL.INDEXING.CONFIG.SENS">
          see the manual</a>). If you do so, you must then reset the
        index.</p>

      <p>Always reset the index if you do not know by which version it
        was created (you're not sure it's at least 1.18). The best method
        is to quit all Recoll programs and delete the index directory
        (<span class="literal">
          rm -rf ~/.recoll/xapiandb</span>), then start <code>recoll</code>
        or <code>recollindex</code>. <br>

        <span class="literal">recollindex -z</span> will do the same
        in most, but not all, cases. It's better to use
        the <tt>rm</tt> method, which will also ensure that no debris
        from older releases remain (e.g.: old stemming files which are
        not used any more).</p>


      <h2><a name="minor_releases">Minor releases at a glance</a></h2>
      <p>The rhythm of change in Recoll is slowing as the software is
        approaching maturity, so, in order to avoid stopping progress
        by excessive intervals between releases, the first versions
        of 1.20 will be allowed to contain some functional changes (as
        opposed to only bug fixes). There will be a freeze at some
        point.

        <h2>Changes in Recoll 1.20.0p1</h2>
        <ul>

          <li>An <em>Open With</em> entry was added to the result list
            and result table popup menus. This lets you choose an
            alternative application to open a document. The list of
            applications is built from the information inside
            the <span class="filename">
              /usr/share/applications</span> desktop files.</li>

          <li>A new way for specifying multiple terms to be searched
            inside a given field: it used to be that an entry lacking
            whitespace but splittable, like [term1,term2] was
            transformed into a phrase search, which made sense in some
            cases, but no so many. The code was changed so that
            [term1,term2] now means [term1&nbsp;AND&nbsp;term2], and
            [term1/term2] means [term1&nbsp;OR&nbsp;term2]. This is
            useful for field searches where you would previously be
            forced to repeat the field name for every term.
            [somefield:term1&nbsp;somefield:term2] can now be expressed as
            [somefield:term1,term2].
          </li>

          <li>We changed the way terms are generated from a compound
            string (e.g. an email address). Previously, for an address
            like <em>jfd@recoll.org</em>, only the simple terms and
            the terms anchored at the start were generated
            (<em>jfd</em>, <em>recoll</em>, <em>org</em>, <em>jfd@recoll</em>, <em>jfd@recoll.org</em>). The
            new text splitter generates all the other possible terms
            (here, <em>recoll.org</em> only), so that it is now
            possible to search for left-truncated versions of the
            compound, e.g., all emails from a given domain.</li>

          <li>Recoll now indexes <em>#hashtags</em> as such.</li>

          <li>It is now possible to configure the GUI in wide form
            factor by dragging the toolbars to one of the sides (their
            location is remembered between sessions), and moving the
            category filters to a menu (can be set in the
            "Preferences->GUI configuration" panel).</li>

          <li>We added the <em>indexedmimetypes</em> and
            <em>excludedmimetypes</em> variables to the configuration
            GUI, which was also compacted a bit. A bunch of
            ininteresting variables were also removed.</li>

          <li>When indexing, we no longer add the top container
            file name as a term for the contained sub-documents (if
            any). This made no sense in most cases, as it meant that
            you would get hits on all the sections from a chm or epub
            when the top file name matched the search, when you
            probably wanted only the parent document in this case.<br>
            However, the container file name was sometimes useful for
            filtering results, and it is still accessible, in a
            different way: the top container file name is added as a
            term to all the sub-documents, <em>only for searching with
            a prefix</em>. The field name
            is <span class="literal">containerfilename</span>, and no
            match on the subdocuments will occur if the field is not
            specified (this is different from
            previous <span class="literal">filename</span> processing,
            which was indexed as a general
            term. <span class="literal">containerfilename</span> is
            also set on files without sub-documents (e.g. a pdf).</li>

          <li>A new attribute, <span class="literal">pfxonly</span>,
            was created. This can be set on any metadata field inside
            the <span class="literal">[prefixes]</span> section of
            the <span class="filename">fields</span> file.</li>

          <li>A new <span class="literal">[queryaliases]</span>
            section was created in
            the <span class="filename">fields</span>, for definining
            field name aliases to be used only at query time (to avoid
            unwanted collection of data on random fields during
            indexing). The section is empty by default, but 2 obvious
            alias are in
            comment: <span class="literal">filename=fn</span>
            and <span class="literal">containerfilename=cfn</span>. Setting
            them in your personal file may save you some typing if you
            search on file names.</li>

          <li>You can now use both <em>-e</em> and  <em>-i</em> for
            erasing then updating the index for the given file
            arguments with the same <em>recollindex</em> command.</li>

          <li>We now allow access to the Xapian docid for Recoll
            documents in <span class="command">recollq</span> and
            Python API search results. This allows writing scripts
            which combine Recoll and pure Xapian operations. A sample
            Python program to find document duplicates, using MD5
            terms was added. See
            <span class="filename">src/python/samples/docdups.py</span></li>

          <li>The command used to identify the mime types of files
            when the internal method is <span class="literal">file
            -i</span> by default. It is now possible to customize this
            command by setting
            the <span class="literal">systemfilecommand</span> in the
            configuration. A suggested value would
            be <span class="filename">xdg-mime</span>, which sometimes
            works better than <span class="filename">file</span>.</li>

          <li>The result list has two new elements: %P substitution
            for printing the parent folder name, and an <tt>F</tt>
            link target which will open the parent folder in a
            file manager window.</li>

          <li><span class="filename">/media</span> was added to the default
            skippedPaths list mostly as a reminder that blindly
            processing these with the general indexer is a bad idea
            (use separate indexes instead).</li>

          <li><span class="command">recollq</span>
            and <span class="command">recoll&nbsp;-t</span> get a new
            option <span class="literal">-N</span>  to print field
            names between values when
            <span class="literal">-F</span> is used. In addition,
            <span class="literal">-F&nbsp;""</span> is taken as a
            directive to print all fields.</li>

          <li>Unicode <span class="literal">hyphen</span> (0x2010) is
            now translated to ASCII
            <span class="literal">minus</span>
            during indexing and searching. There is no good way to
            handle this character, given the varius misuses of minus
            and hyphen. This choice was deemed "less bad" than the
            previous one.</li>

        </ul>


    </div>
  </body>
</html>