use dblatex for producing the PDF doc. We could now go full XML

2013-03-30 17:14:40 +01:00 · 2013-03-30 17:14:40 +01:00 · ed91113eab
commit ed91113eab
parent fc848f48ff
3 changed files with 66 additions and 56 deletions
--- a/src/doc/user/00README.txt
+++ b/src/doc/user/00README.txt
@@ -1,9 +1,9 @@
 = Building the Recoll user manual
-The Recoll user manual usually used DocBook SGML and used the FreeBSD doc
-toolchain to produce the output formats. This had the advantage of an easy
-way to produce all formats including a PDF manual, but presented two
-problems:
+The Recoll user manual used to be written in DocBook SGML and used the
+FreeBSD doc toolchain to produce the output formats. This had the advantage
+of an easy way to produce all formats including a PDF manual, but presented
+two problems:
 - Dependency on the FreeBSD platform.
 - No support for UTF-8 (last I looked), only latin1.
@@ -17,21 +17,14 @@ made was to make the anchors explicitly upper-case because the SGML
 toolchain converts them to upper-case and the XML one does not, so the only
 way to have compatibility is to make them upper-case in the first place.
-We still have a problem for producing the PDF manual, because few
-straightforward approaches seem to exist:
- -  http://docbookpublishing.com which even has a programmatic version (cf:
-     http://docbookpublishing.com/api/), but needs a bit of
-     configuration.
- - FOP but this is Java and complicated.
-See also http://www.valdyas.org/linguistics/printing_unicode.html
-Does not look simple, but dates from 2002 and seems to imply that FOP is
-making progress.
-The current conclusion would seem to be that the SGML version should stay
-operational to give an easy way to make the PDF one on FreeBSD.
-But see also notes about dblatex on the asciidoc page. Actually asciidoc would
-be a candidate replacement for the source format.
-http://www.methods.co.nz/asciidoc/userguide.html
+We initially had a problem for producing the PDF manual, which motivated
+keeping the SGML version for producing the PDF with the FreeBSD SGML
+toolchain. This problem is now solved with dblatex, so that the SGML
+version now has little reason to persist and it will go away at some point
+in the future.
+Asciidoc would also be a candidate as the source format, because it can
+easily produce docbook, so the future will probably be:
+asciidoc->docbook-xml-> html
+                     -> pdf
--- a/src/doc/user/usermanual.sgml
+++ b/src/doc/user/usermanual.sgml
@@ -12,7 +12,7 @@
 <!ENTITY XAP "<application>Xapian</application>">
 <!ENTITY WIKI "http://bitbucket.org/medoc/recoll/wiki/">
 ]>
- 
+
 <book lang="en">
  <bookinfo>
@@ -1741,7 +1741,8 @@ fvwm
      <replaceable>*coll</replaceable>), the expansion can take quite
      a long time because the full index term list will have to be
      processed. The expansion is currently limited at 10000 results for
-      wildcards and regular expressions.</para>
+      wildcards and regular expressions. It is possible to change the
+      limit in the configuration file.</para>
      <para>Double-clicking on a term in the result list will insert
      it into the simple search entry field. You can also cut/paste
@@ -2504,7 +2505,7 @@ fvwm
      <command>konqueror</command>.</para>
      <para>This can be done by either explicitly inserting
-      <literal>&lt;a&nbsp;href="recoll:/..."&gt;</literal> links 
+      <literal><![CDATA[<a href="recoll://...">]]></literal> links 
      around some document areas, or automatically by adding a
      very small <application>javascript</application> program to the
      documents, like the following example, which would initiate a search by
@@ -3061,30 +3062,36 @@ dir:recoll dir:src -dir:utils -dir:common
        </listitem>
      </itemizedlist>
-      <para>You should be aware of a few things before using
+      <para>You should be aware of a few things when using
        wildcards.</para>
      <itemizedlist>
        <listitem><para>Using a wildcard character at the beginning of
-        a word can make for a slow search because &RCL; will have to
-        scan the whole index term list to find the matches.</para>
-        </listitem>
-          <listitem><para>When working with a raw index (preserving
-          character case and diacritics), the literal part of a wildcard
-          expression will be matched exactly for case and
-          diacritics.</para>
-          </listitem>
+            a word can make for a slow search because &RCL; will have to
+            scan the whole index term list to find the
+            matches. However, this is much less a problem for field
+            searches, and queries
+            like <replaceable>author:*@domain.com</replaceable> can
+            sometimes be very useful.</para></listitem>
+
+        <listitem><para>For &RCL; version 18 only, when working with a
+            raw index (preserving character case and diacritics), the
+            literal part of a wildcard expression will be matched
+            exactly for case and diacritics. This is not true any
+            more for versions 19 and later.</para></listitem>
        <listitem><para>Using a <literal>*</literal> at the end of a
-        word can produce more matches than you would think, and
-        strange search results. You can use the <link
-        linkend="RCL.SEARCH.GUI.TERMEXPLORER">term explorer</link> tool to
-        check what completions exist for a given term. You can also
-        see exactly what search was performed by clicking on the link
-        at the top of the result list. In general, for natural
-        language terms, stem expansion will produce better results
-        than an ending <literal>*</literal> (stem expansion is turned
-        off when any wildcard character appears in the term).</para>
-        </listitem>
+            word can produce more matches than you would think, and
+            strange search results. You can use the
+            <link linkend="RCL.SEARCH.GUI.TERMEXPLORER">term
+              explorer</link> tool to check what completions exist for
+            a given term. You can also see exactly what search was
+            performed by clicking on the link at the top of the result
+            list. In general, for natural language terms, stem
+            expansion will produce better results than an
+            ending <literal>*</literal> (stem expansion is turned off
+            when any wildcard character appears in the
+            term).</para></listitem>
      </itemizedlist>
    </sect2> <!-- wildchars -->
@@ -4423,7 +4430,7 @@ except:
        <ulink url="mailto:jfd@recoll.org">I would
        very much welcome patches</ulink>.</para>
-      <para>Depending on the <application>Qt&nbsp;3</application>
+      <para>Depending on the <application>Qt 3</application>
      configuration on your system, you may have to set the
      <envar>QTDIR</envar> and <envar>QMAKESPECS</envar>
      variables in your environment:</para>
@@ -4448,7 +4455,8 @@ except:
 	<para>Neither <envar>QTDIR</envar> nor 
 	<envar>QMAKESPECS</envar> should be needed with 
-        Qt&nbsp;4, configuration details are entirely determined by 
+        <application>Qt 4</application>,
+        configuration details are entirely determined by
 	<command>qmake</command> (which is quite often installed as 
 	<command>qmake-qt4</command>).</para> 
@@ -4769,7 +4777,7 @@ skippedNames = #* bin CVS  Cache cache* caughtspam  tmp .thumbnails .svn \
              <para>Example of use for skipping text files only in a
              specific directory:</para>
              <programlisting>
-skippedPaths = ~/somedir/&lowast;.txt
+skippedPaths = ~/somedir/*.txt
              </programlisting>
            </listitem>
          </varlistentry>
--- a/src/doc/user/xmlmake.sh
+++ b/src/doc/user/xmlmake.sh
@@ -1,16 +1,21 @@
 #!/bin/sh
 # A script to produce the Recoll manual with an xml toolchain.
+# Tools used:
+#  - xsltproc
+#  - The docbook-xsl stylesheets
+#  - dblatex for producing the PDF.
+#
 # Limitations:
 #   - Does not produce the links to the whole/chunked versions at the top
 #     of the document
-#   - The anchor names from the source text are converted to uppercase by
-#     the sgml toolchain. This does not happen with the xml toolchain,
-#     which means that external links like
-#     usermanual.html#RCL.CONFIG.INDEXING won't work because fragments are
-#     case-sensitive. This could be solved by converting all ids inside the
-#     source file to upper-case.
-#   - No simple way to produce pdf
+#   - The anchor names from the source text are converted to uppercase
+#     by the sgml toolchain. This does not happen with the xml
+#     toolchain, which means that external links like
+#     usermanual.html#RCL.CONFIG.INDEXING won't work because fragments
+#     are case-sensitive. This has been solved by converting all ids
+#     inside the source file to upper-case. DON'T REINTRODUCE
+#     lower-case IDS
 # Wherever docbook.xsl and chunk.xsl live
 # Fbsd
@@ -23,14 +28,15 @@ XSLDIR="/usr/share/xml/docbook/stylesheet/docbook-xsl/"
 dochunky=1
 test $# -eq 1 && dochunky=0
-# Remove the SGML header and uncomment the XML one + convert from iso-8859-1
-# to utf-8
+# Remove the SGML header and uncomment the XML one. Also used to iconv
+# from iso-8859-1 to UTF-8, but the SGML manual is now UTF-8 ? Would
+# that work with the sgml toolchain ??
+echo '<?xml version="1.0" encoding="UTF-8"?>' > usermanual.xml
 sed -e '\!//FreeBSD//DTD!d' \
    -e '\!DTD DocBook XML!s/<!--//' \
    -e '\!/docbookx.dtd!s/-->//' \
    < usermanual.sgml \
-    | iconv -f iso-8859-1 -t utf-8 \
-    > usermanual.xml
+    >> usermanual.xml
 # Options common to the single-file and chunked versions
 commonoptions="--stringparam section.autolabel 1 \
@@ -59,3 +65,6 @@ eval xsltproc $commonoptions \
 tidy -indent usermanual-xml.html > tmpfile 
 mv -f tmpfile usermanual-xml.html
+# And the pdf with dblatex
+dblatex usermanual.xml
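
The header rewrite done by xmlmake.sh can be tried on a small file to see what the three sed expressions do. The following is a minimal sketch, not the real build: the sample header is hypothetical (the actual DOCTYPE lines of usermanual.sgml are not shown in this commit), and the function name sgml_to_xml and the sample file names are made up for illustration. It assumes the SGML source keeps its XML DOCTYPE inside a comment, as the script's own comment suggests.

```shell
#!/bin/sh
# Sketch only: the sample header below is hypothetical, not the real
# usermanual.sgml header. sgml_to_xml is an illustrative name.
set -e

# $1: input SGML file, $2: output XML file
sgml_to_xml() {
    # Write the XML declaration first, then append the filtered source.
    echo '<?xml version="1.0" encoding="UTF-8"?>' > "$2"
    # 1st expression: delete the line holding the FreeBSD SGML DTD.
    # 2nd expression: strip the comment opener before the XML DOCTYPE.
    # 3rd expression: strip the comment closer after the DTD URL.
    sed -e '\!//FreeBSD//DTD!d' \
        -e '\!DTD DocBook XML!s/<!--//' \
        -e '\!/docbookx.dtd!s/-->//' \
        < "$1" >> "$2"
}

workdir=$(mktemp -d)
cat > "$workdir/sample.sgml" <<'EOF'
<!DOCTYPE book PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
<!--<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
"http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [-->
<!ENTITY RCL "<application>Recoll</application>">
]>
<book lang="en"><title>Sample</title></book>
EOF

sgml_to_xml "$workdir/sample.sgml" "$workdir/sample.xml"
cat "$workdir/sample.xml"
```

Writing the declaration with `>` and appending the sed output with `>>` is what lets the iconv step be dropped once the source is already UTF-8.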