From ed91113eabc917cbad693d02c0a0e14f09e9f72a Mon Sep 17 00:00:00 2001
From: Jean-Francois Dockes <jfd@recoll.org>
Date: Sat, 30 Mar 2013 17:14:40 +0100
Subject: [PATCH] use dblatex for producing the PDF doc. We could now go full
 XML

---
 src/doc/user/00README.txt    | 33 ++++++++------------
 src/doc/user/usermanual.sgml | 58 ++++++++++++++++++++----------------
 src/doc/user/xmlmake.sh      | 31 ++++++++++++-------
 3 files changed, 66 insertions(+), 56 deletions(-)
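
The header rewrite in xmlmake.sh (the `sed` pipeline after the "Remove the SGML header" comment) can be tried on its own. Below is a minimal sketch that wraps the same three `sed` expressions in a hypothetical `sgml_header_to_xml` function, together with the XML declaration the script now writes first. The sample DOCTYPE lines are an assumption about the shape of the usermanual.sgml header: the SGML DOCTYPE is active and the XML one is commented out. They are not a copy of the real header.

```shell
#!/bin/sh
# Hypothetical helper mirroring the header edits in xmlmake.sh:
#  - emit an XML declaration first;
#  - drop any line that names the FreeBSD SGML DTD;
#  - remove "<!--" on the line naming the DocBook XML DTD;
#  - remove "-->" on the line referencing docbookx.dtd.
# Reads the SGML document on stdin and writes the XML version on stdout.
sgml_header_to_xml() {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    sed -e '\!//FreeBSD//DTD!d' \
        -e '\!DTD DocBook XML!s/<!--//' \
        -e '\!/docbookx.dtd!s/-->//'
}

# Assumed sample header, for illustration only.
sgml_header_to_xml <<'EOF'
<!DOCTYPE book PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
<!-- <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
     "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [ -->
]>
<book lang="en"></book>
EOF
```

With this input, the FreeBSD DOCTYPE line is deleted and the OASIS XML DOCTYPE becomes active. Because the `\!...!` context addresses select lines by their content, the edits do not depend on the header's exact line numbers.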
diff --git a/src/doc/user/00README.txt b/src/doc/user/00README.txt
index 3a4e38c0..892a3b85 100644
--- a/src/doc/user/00README.txt
+++ b/src/doc/user/00README.txt
@@ -1,9 +1,9 @@
 = Building the Recoll user manual
 
-The Recoll user manual usually used DocBook SGML and used the FreeBSD doc
-toolchain to produce the output formats. This had the advantage of an easy
-way to produce all formats including a PDF manual, but presented two
-problems:
+The Recoll user manual used to be written in DocBook SGML and processed
+with the FreeBSD doc toolchain to produce the output formats. This had the
+advantage of providing an easy way to produce all formats, including a PDF
+manual, but presented two problems:
 
  - Dependancy on the FreeBSD platform.
  - No support for UTF-8 (last I looked), only latin1.
@@ -17,21 +17,14 @@ made was to make the anchors explicitly upper-case because the SGML
 toolchain converts them to upper-case and the XML one does not, so the only
 way to have compatibility is to make them upper-case in the first place.
 
-We still have a problem for producing the PDF manual, because few
-straightforward approaches seem to exist:
+We initially had a problem producing the PDF manual, which is why the SGML
+version was kept: it allowed producing the PDF with the FreeBSD SGML
+toolchain. This problem is now solved with dblatex, so the SGML version
+has little reason to persist and will go away at some point in the
+future.
 
- -  http://docbookpublishing.com qui a meme une version programmatique (cf:
-     http://docbookpublishing.com/api/), mais necessite un peu de
-     configuration. 
- - FOP but this is Java and complicated.
+Asciidoc would also be a candidate for the source format, because it can
+easily produce DocBook, so the future pipeline will probably be:
 
-See also http://www.valdyas.org/linguistics/printing_unicode.html 
-Does not look simple, but dates from 2002 and seems to imply that FOP is
-making progress.
-
-The current conclusion would seem to be that the SGML version should stay
-operational to give an easy way to make the PDF one on FreeBSD.
-
-But see also notes about dblatex on the asciidoc page. Actually asciidoc would 
-be a candidate replacement for the source format. 
-http://www.methods.co.nz/asciidoc/userguide.html
+asciidoc -> docbook-xml -> html
+                        -> pdf
diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml
index be5fcb21..43f13b64 100644
--- a/src/doc/user/usermanual.sgml
+++ b/src/doc/user/usermanual.sgml
@@ -12,7 +12,7 @@
 <!ENTITY XAP "<application>Xapian</application>">
 <!ENTITY WIKI "http://bitbucket.org/medoc/recoll/wiki/">
 ]>
- 
+
 <book lang="en">
   
   <bookinfo>
@@ -1741,7 +1741,8 @@ fvwm
       <replaceable>*coll</replaceable>), the expansion can take quite
       a long time because the full index term list will have to be
       processed. The expansion is currently limited at 10000 results for
-      wildcards and regular expressions.</para>
+      wildcards and regular expressions. It is possible to change the
+      limit in the configuration file.</para>
       
       <para>Double-clicking on a term in the result list will insert
       it into the simple search entry field. You can also cut/paste
@@ -2504,7 +2505,7 @@ fvwm
       <command>konqueror</command>.</para>
 
       <para>This can be done by either explicitly inserting
-      <literal>&lt;a&nbsp;href="recoll:/..."&gt;</literal> links 
+      <literal><![CDATA[<a href="recoll://...">]]></literal> links 
       around some document areas, or automatically by adding a
       very small <application>javascript</application> program to the
       documents, like the following example, which would initiate a search by
@@ -3061,30 +3062,36 @@ dir:recoll dir:src -dir:utils -dir:common
         </listitem>
       </itemizedlist>
 
-      <para>You should be aware of a few things before using
+      <para>You should be aware of a few things when using
         wildcards.</para>
 
       <itemizedlist>
         <listitem><para>Using a wildcard character at the beginning of
-        a word can make for a slow search because &RCL; will have to
-        scan the whole index term list to find the matches.</para>
-        </listitem>
-          <listitem><para>When working with a raw index (preserving
-          character case and diacritics), the literal part of a wildcard
-          expression will be matched exactly for case and
-          diacritics.</para>
-          </listitem>
+            a word can make for a slow search because &RCL; will have to
+            scan the whole index term list to find the
+            matches. However, this is much less of a problem for field
+            searches, and queries
+            like <replaceable>author:*@domain.com</replaceable> can
+            sometimes be very useful.</para></listitem>
+
+        <listitem><para>For &RCL; version 1.18 only, when working with a
+            raw index (preserving character case and diacritics), the
+            literal part of a wildcard expression will be matched
+            exactly for case and diacritics. This is no longer true
+            for versions 1.19 and later.</para></listitem>
+
         <listitem><para>Using a <literal>*</literal> at the end of a
-        word can produce more matches than you would think, and
-        strange search results. You can use the <link
-        linkend="RCL.SEARCH.GUI.TERMEXPLORER">term explorer</link> tool to
-        check what completions exist for a given term. You can also
-        see exactly what search was performed by clicking on the link
-        at the top of the result list. In general, for natural
-        language terms, stem expansion will produce better results
-        than an ending <literal>*</literal> (stem expansion is turned
-        off when any wildcard character appears in the term).</para>
-        </listitem>
+            word can produce more matches than you would think, and
+            strange search results. You can use the 
+            <link linkend="RCL.SEARCH.GUI.TERMEXPLORER">term 
+              explorer</link> tool to check what completions exist for
+            a given term. You can also see exactly what search was
+            performed by clicking on the link at the top of the result
+            list. In general, for natural language terms, stem
+            expansion will produce better results than an
+            ending <literal>*</literal> (stem expansion is turned off
+            when any wildcard character appears in the
+            term).</para></listitem> 
       </itemizedlist>
 
     </sect2> <!-- wildchars -->
@@ -4423,7 +4430,7 @@ except:
         <ulink url="mailto:jfd@recoll.org">I would
         very much welcome patches</ulink>.</para>
 
-      <para>Depending on the <application>Qt&nbsp;3</application>
+      <para>Depending on the <application>Qt 3</application>
       configuration on your system, you may have to set the
       <envar>QTDIR</envar> and <envar>QMAKESPECS</envar>
       variables in your environment:</para>
@@ -4448,7 +4455,8 @@ except:
 
 	<para>Neither <envar>QTDIR</envar> nor 
 	<envar>QMAKESPECS</envar> should be needed with 
-        Qt&nbsp;4, configuration details are entirely determined by 
+        <application>Qt 4</application>;
+        configuration details are entirely determined by
 	<command>qmake</command> (which is quite often installed as 
 	<command>qmake-qt4</command>).</para> 
 
@@ -4769,7 +4777,7 @@ skippedNames = #* bin CVS  Cache cache* caughtspam  tmp .thumbnails .svn \
               <para>Example of use for skipping text files only in a
               specific directory:</para>
               <programlisting>
-skippedPaths = ~/somedir/&lowast;.txt
+skippedPaths = ~/somedir/*.txt
               </programlisting>
             </listitem>
           </varlistentry>
diff --git a/src/doc/user/xmlmake.sh b/src/doc/user/xmlmake.sh
index 847823da..6e314149 100644
--- a/src/doc/user/xmlmake.sh
+++ b/src/doc/user/xmlmake.sh
@@ -1,16 +1,21 @@
 #!/bin/sh
 
 # A script to produce the Recoll manual with an xml toolchain.
+# Tools used:
+#  - xsltproc
+#  - The docbook-xsl stylesheets
+#  - dblatex for producing the PDF.
+#
 # Limitations:
 #   - Does not produce the links to the whole/chunked versions at the top
 #     of the document
-#   - The anchor names from the source text are converted to uppercase by
-#     the sgml toolchain. This does not happen with the xml toolchain,
-#     which means that external links like
-#     usermanual.html#RCL.CONFIG.INDEXING won't work because fragments are
-#     case-sensitive. This could be solved by converting all ids inside the
-#     source file to upper-case.
-#   - No simple way to produce pdf
+#   - The anchor names from the source text are converted to uppercase
+#     by the sgml toolchain. This does not happen with the xml
+#     toolchain, which means that external links like
+#     usermanual.html#RCL.CONFIG.INDEXING won't work because fragments
+#     are case-sensitive. This has been solved by converting all ids
+#     inside the source file to upper-case. DON'T REINTRODUCE
+#     LOWER-CASE IDS.
 
 # Wherever docbook.xsl and chunk.xsl live
 # Fbsd
@@ -23,14 +28,15 @@ XSLDIR="/usr/share/xml/docbook/stylesheet/docbook-xsl/"
 dochunky=1
 test $# -eq 1 && dochunky=0
 
-# Remove the SGML header and uncomment the XML one + convert from iso-8859-1
-# to utf-8
+# Remove the SGML header and uncomment the XML one. This used to also
+# convert from iso-8859-1 to UTF-8 with iconv, but the SGML manual is now
+# UTF-8 (open question: does that still work with the SGML toolchain?).
+echo '<?xml version="1.0" encoding="UTF-8"?>' > usermanual.xml
 sed -e '\!//FreeBSD//DTD!d' \
     -e '\!DTD DocBook XML!s/<!--//' \
     -e '\!/docbookx.dtd!s/-->//' \
     < usermanual.sgml \
-    | iconv -f iso-8859-1 -t utf-8 \
-    > usermanual.xml
+    >> usermanual.xml
 
 # Options common to the single-file and chunked versions
 commonoptions="--stringparam section.autolabel 1 \
@@ -59,3 +65,6 @@ eval xsltproc $commonoptions \
 
 tidy -indent usermanual-xml.html > tmpfile 
 mv -f tmpfile usermanual-xml.html
+
+# Finally, produce the PDF with dblatex
+dblatex usermanual.xml
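
The xmlmake.sh comment asks that lower-case ids not be reintroduced, because the XML toolchain keeps ids as written and URL fragments such as `#RCL.CONFIG.INDEXING` are case-sensitive. A small guard could enforce this before the XML build. Below is a minimal sketch. It assumes ids appear in usermanual.sgml as `id="..."` attributes preceded by whitespace, and it would miss an `id` at the very start of a line. `check_upper_ids` and the sample id `rcl.search.lower` are hypothetical and not part of the Recoll tree.

```shell
#!/bin/sh
# Hypothetical guard (not part of xmlmake.sh): print every id attribute
# containing a lower-case letter, with its line number.
# Returns 0 if all ids are upper-case, 1 if some are not, 2 on grep error.
check_upper_ids() {
    ids_status=0
    # -n: prefix line numbers; -o: print only the matching attribute.
    grep -n -o '[[:space:]]id="[^"]*[[:lower:]][^"]*"' "$1" || ids_status=$?
    case $ids_status in
        0) return 1 ;;  # matches printed: lower-case ids present
        1) return 0 ;;  # no match: every id is upper-case
        *) return 2 ;;  # grep error, e.g. the file cannot be read
    esac
}

# Demo on a throwaway file that contains one hypothetical lower-case id.
sample=$(mktemp)
printf '%s\n' '<sect2 id="RCL.SEARCH.GUI.TERMEXPLORER">' \
              '<sect2 id="rcl.search.lower">' > "$sample"
if check_upper_ids "$sample"; then
    echo "all ids are upper-case"
else
    echo "lower-case ids found (status $?)"
fi
rm -f "$sample"
```

In xmlmake.sh, a line such as `check_upper_ids usermanual.sgml || exit 1` placed before the `sed` step could stop the build early. `[[:lower:]]` is used instead of `[a-z]` so that the result does not depend on locale collation order.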