use dblatex for producing the PDF doc. We could now go full XML

2013-03-30 17:14:40 +01:00
commit ed91113eab
parent fc848f48ff
3 changed files with 66 additions and 56 deletions
--- a/src/doc/user/00README.txt
+++ b/src/doc/user/00README.txt
@@ -1,9 +1,9 @@
 = Building the Recoll user manual

-The Recoll user manual usually used DocBook SGML and used the FreeBSD doc
-toolchain to produce the output formats. This had the advantage of an easy
-way to produce all formats including a PDF manual, but presented two
-problems:
+The Recoll user manual used to be written in DocBook SGML and used the
+FreeBSD doc toolchain to produce the output formats. This had the advantage
+of an easy way to produce all formats including a PDF manual, but presented
+two problems:

 - Dependency on the FreeBSD platform.
 - No support for UTF-8 (last I looked), only latin1.
@@ -17,21 +17,14 @@ made was to make the anchors explicitly upper-case because the SGML
 toolchain converts them to upper-case and the XML one does not, so the only
 way to have compatibility is to make them upper-case in the first place.

-We still have a problem for producing the PDF manual, because few
-straightforward approaches seem to exist:
+We initially had a problem for producing the PDF manual, which motivated
+keeping the SGML version for producing the PDF with the FreeBSD SGML
+toolchain. This problem is now solved with dblatex, so that the SGML
+version now has little reason to persist and it will go away at some point
+in the future.

- -  http://docbookpublishing.com, which even has a programmatic version (cf:
-     http://docbookpublishing.com/api/), but requires a bit of
-     configuration.
- - FOP but this is Java and complicated.
+Asciidoc would also be a candidate as the source format, because it can
+easily produce docbook, so the future will probably be:

-See also http://www.valdyas.org/linguistics/printing_unicode.html 
-Does not look simple, but dates from 2002 and seems to imply that FOP is
-making progress.
-
-The current conclusion would seem to be that the SGML version should stay
-operational to give an easy way to make the PDF one on FreeBSD.
-
-But see also notes about dblatex on the asciidoc page. Actually asciidoc would 
-be a candidate replacement for the source format. 
-http://www.methods.co.nz/asciidoc/userguide.html
+asciidoc->docbook-xml-> html
+                     -> pdf
--- a/src/doc/user/usermanual.sgml
+++ b/src/doc/user/usermanual.sgml
@@ -1741,7 +1741,8 @@ fvwm
      <replaceable>*coll</replaceable>), the expansion can take quite
      a long time because the full index term list will have to be
      processed. The expansion is currently limited at 10000 results for
-      wildcards and regular expressions.</para>
+      wildcards and regular expressions. It is possible to change the
+      limit in the configuration file.</para>
      
      <para>Double-clicking on a term in the result list will insert
      it into the simple search entry field. You can also cut/paste
@@ -2504,7 +2505,7 @@ fvwm
      <command>konqueror</command>.</para>

      <para>This can be done by either explicitly inserting
-      <literal>&lt;a&nbsp;href="recoll:/..."&gt;</literal> links 
+      <literal><![CDATA[<a href="recoll://...">]]></literal> links 
      around some document areas, or automatically by adding a
      very small <application>javascript</application> program to the
      documents, like the following example, which would initiate a search by
@@ -3061,30 +3062,36 @@ dir:recoll dir:src -dir:utils -dir:common
        </listitem>
      </itemizedlist>

-      <para>You should be aware of a few things before using
+      <para>You should be aware of a few things when using
        wildcards.</para>

      <itemizedlist>
        <listitem><para>Using a wildcard character at the beginning of
            a word can make for a slow search because &RCL; will have to
-        scan the whole index term list to find the matches.</para>
-        </listitem>
-          <listitem><para>When working with a raw index (preserving
-          character case and diacritics), the literal part of a wildcard
-          expression will be matched exactly for case and
-          diacritics.</para>
-          </listitem>
+            scan the whole index term list to find the
+            matches. However, this is much less a problem for field
+            searches, and queries
+            like <replaceable>author:*@domain.com</replaceable> can
+            sometimes be very useful.</para></listitem>
+
+        <listitem><para>For &RCL; version 18 only, when working with a
+            raw index (preserving character case and diacritics), the
+            literal part of a wildcard expression will be matched
+            exactly for case and diacritics. This is not true any
+            more for versions 19 and later.</para></listitem>
+
        <listitem><para>Using a <literal>*</literal> at the end of a
            word can produce more matches than you would think, and
-        strange search results. You can use the <link
-        linkend="RCL.SEARCH.GUI.TERMEXPLORER">term explorer</link> tool to
-        check what completions exist for a given term. You can also
-        see exactly what search was performed by clicking on the link
-        at the top of the result list. In general, for natural
-        language terms, stem expansion will produce better results
-        than an ending <literal>*</literal> (stem expansion is turned
-        off when any wildcard character appears in the term).</para>
-        </listitem>
+            strange search results. You can use the 
+            <link linkend="RCL.SEARCH.GUI.TERMEXPLORER">term 
+              explorer</link> tool to check what completions exist for
+            a given term. You can also see exactly what search was
+            performed by clicking on the link at the top of the result
+            list. In general, for natural language terms, stem
+            expansion will produce better results than an
+            ending <literal>*</literal> (stem expansion is turned off
+            when any wildcard character appears in the
+            term).</para></listitem> 
      </itemizedlist>

    </sect2> <!-- wildchars -->
@@ -4423,7 +4430,7 @@ except:
        <ulink url="mailto:jfd@recoll.org">I would
        very much welcome patches</ulink>.</para>

-      <para>Depending on the <application>Qt&nbsp;3</application>
+      <para>Depending on the <application>Qt 3</application>
      configuration on your system, you may have to set the
      <envar>QTDIR</envar> and <envar>QMAKESPECS</envar>
      variables in your environment:</para>
@@ -4448,7 +4455,8 @@ except:

 	<para>Neither <envar>QTDIR</envar> nor 
 	<envar>QMAKESPECS</envar> should be needed with 
-        Qt&nbsp;4, configuration details are entirely determined by 
+        <application>Qt 4</application>,
+        configuration details are entirely determined by  
 	<command>qmake</command> (which is quite often installed as 
 	<command>qmake-qt4</command>).</para> 

@@ -4769,7 +4777,7 @@ skippedNames = #* bin CVS  Cache cache* caughtspam  tmp .thumbnails .svn \
              <para>Example of use for skipping text files only in a
              specific directory:</para>
              <programlisting>
-skippedPaths = ~/somedir/&lowast;.txt
+skippedPaths = ~/somedir/*.txt
              </programlisting>
            </listitem>
          </varlistentry>
--- a/src/doc/user/xmlmake.sh
+++ b/src/doc/user/xmlmake.sh
@@ -1,16 +1,21 @@
 #!/bin/sh

 # A script to produce the Recoll manual with an xml toolchain.
+# Tools used:
+#  - xsltproc
+#  - The docbook-xsl stylesheets
+#  - dblatex for producing the PDF.
+#
 # Limitations:
 #   - Does not produce the links to the whole/chunked versions at the top
 #     of the document
-#   - The anchor names from the source text are converted to uppercase by
-#     the sgml toolchain. This does not happen with the xml toolchain,
-#     which means that external links like
-#     usermanual.html#RCL.CONFIG.INDEXING won't work because fragments are
-#     case-sensitive. This could be solved by converting all ids inside the
-#     source file to upper-case.
-#   - No simple way to produce pdf
+#   - The anchor names from the source text are converted to uppercase
+#     by the sgml toolchain. This does not happen with the xml
+#     toolchain, which means that external links like
+#     usermanual.html#RCL.CONFIG.INDEXING won't work because fragments
+#     are case-sensitive. This has been solved by converting all ids
+#     inside the source file to upper-case. DON'T REINTRODUCE
+#     lower-case IDS

 # Wherever docbook.xsl and chunk.xsl live
 # Fbsd
@@ -23,14 +28,15 @@ XSLDIR="/usr/share/xml/docbook/stylesheet/docbook-xsl/"
 dochunky=1
 test $# -eq 1 && dochunky=0

-# Remove the SGML header and uncomment the XML one + convert from iso-8859-1
-# to utf-8
+# Remove the SGML header and uncomment the XML one. This step also used to
+# convert from iso-8859-1 to UTF-8 with iconv, but the SGML manual is now
+# UTF-8. Would that still work with the SGML toolchain?
+echo '<?xml version="1.0" encoding="UTF-8"?>' > usermanual.xml
 sed -e '\!//FreeBSD//DTD!d' \
    -e '\!DTD DocBook XML!s/<!--//' \
    -e '\!/docbookx.dtd!s/-->//' \
    < usermanual.sgml \
-    | iconv -f iso-8859-1 -t utf-8 \
-    > usermanual.xml
+    >> usermanual.xml

 # Options common to the single-file and chunked versions
 commonoptions="--stringparam section.autolabel 1 \
@@ -59,3 +65,6 @@ eval xsltproc $commonoptions \

 tidy -indent usermanual-xml.html > tmpfile 
 mv -f tmpfile usermanual-xml.html
+
+# And the pdf with dblatex
+dblatex usermanual.xml
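
Note on the header swap in xmlmake.sh: the sed step above can be tried in isolation on a tiny stand-in document. The sketch below is only an illustration under assumptions: the sample DOCTYPE lines, the function names sgml_to_xml and sample_sgml, and the RUN_DBLATEX switch are all hypothetical and are not part of the commit; the real usermanual.sgml header may be laid out differently. Only the three sed expressions and the plain "dblatex file.xml" call are taken from the script.

```shell
#!/bin/sh
# Sketch of the SGML-to-XML header swap, applied to a hypothetical sample.

# Read SGML on stdin, write XML on stdout: emit an XML declaration,
# delete the FreeBSD DTD line, and uncomment the DocBook XML doctype.
sgml_to_xml() {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    sed -e '\!//FreeBSD//DTD!d' \
        -e '\!DTD DocBook XML!s/<!--//' \
        -e '\!/docbookx.dtd!s/-->//'
}

# Hypothetical miniature of the manual source (assumed header layout).
sample_sgml() {
    cat <<'EOF'
<!DOCTYPE book PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
<!--<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [-->
<!ENTITY RCL "Recoll">
]>
<book><title>&RCL; sample</title>
<chapter><title>One</title><para>Text.</para></chapter></book>
EOF
}

# Show the converted document.
sample_sgml | sgml_to_xml

# Optional PDF step, off by default (RUN_DBLATEX is a made-up switch).
if [ "${RUN_DBLATEX:-0}" = 1 ] && command -v dblatex >/dev/null 2>&1; then
    sample_sgml | sgml_to_xml > sample.xml
    dblatex sample.xml
fi
```

Running it prints the converted sample, which makes it easy to check that the FreeBSD doctype is gone and the XML one is active before feeding a real file to xsltproc or dblatex.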