From ed91113eabc917cbad693d02c0a0e14f09e9f72a Mon Sep 17 00:00:00 2001
From: Jean-Francois Dockes
Date: Sat, 30 Mar 2013 17:14:40 +0100
Subject: [PATCH] use dblatex for producing the PDF doc. We could now go full XML

---
 src/doc/user/00README.txt    | 33 ++++++++------------
 src/doc/user/usermanual.sgml | 58 ++++++++++++++++++++----------------
 src/doc/user/xmlmake.sh      | 31 ++++++++++++-------
 3 files changed, 66 insertions(+), 56 deletions(-)

diff --git a/src/doc/user/00README.txt b/src/doc/user/00README.txt
index 3a4e38c0..892a3b85 100644
--- a/src/doc/user/00README.txt
+++ b/src/doc/user/00README.txt
@@ -1,9 +1,9 @@
 = Building the Recoll user manual
 
-The Recoll user manual usually used DocBook SGML and used the FreeBSD doc
-toolchain to produce the output formats. This had the advantage of an easy
-way to produce all formats including a PDF manual, but presented two
-problems:
+The Recoll user manual used to be written in DocBook SGML and used the
+FreeBSD doc toolchain to produce the output formats. This had the advantage
+of an easy way to produce all formats including a PDF manual, but presented
+two problems:
 
  - Dependency on the FreeBSD platform.
  - No support for UTF-8 (last I looked), only latin1.
@@ -17,21 +17,14 @@ made was to make the anchors explicitly upper-case because the SGML
 toolchain converts them to upper-case and the XML one does not, so the only
 way to have compatibility is to make them upper-case in the first place.
 
-We still have a problem for producing the PDF manual, because few
-straightforward approaches seem to exist:
+We initially had a problem for producing the PDF manual, which motivated
+keeping the SGML version for producing the PDF with the FreeBSD SGML
+toolchain. This problem is now solved with dblatex, so that the SGML
+version now has little reason to persist and it will go away at some point
+in the future.
- - http://docbookpublishing.com, which even has a programmatic version (cf:
-   http://docbookpublishing.com/api/), but requires some
-   configuration.
- - FOP but this is Java and complicated.
+Asciidoc would also be a candidate as the source format, because it can
+easily produce docbook, so the future will probably be:
 
-See also http://www.valdyas.org/linguistics/printing_unicode.html
-Does not look simple, but dates from 2002 and seems to imply that FOP is
-making progress.
-
-The current conclusion would seem to be that the SGML version should stay
-operational to give an easy way to make the PDF one on FreeBSD.
-
-But see also notes about dblatex on the asciidoc page. Actually asciidoc would
-be a candidate replacement for the source format.
-http://www.methods.co.nz/asciidoc/userguide.html
+asciidoc->docbook-xml-> html
+                     -> pdf
diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml
index be5fcb21..43f13b64 100644
--- a/src/doc/user/usermanual.sgml
+++ b/src/doc/user/usermanual.sgml
@@ -12,7 +12,7 @@
 Xapian">
 ]>
-
+
@@ -1741,7 +1741,8 @@ fvwm
  *coll), the expansion can take quite a long time because
  the full index term list will have to be processed. The
  expansion is currently limited at 10000 results for
- wildcards and regular expressions.
+ wildcards and regular expressions. It is possible to change the
+ limit in the configuration file.
 
  Double-clicking on a term in the result list will
  insert it into the simple search entry field. You can also cut/paste
@@ -2504,7 +2505,7 @@ fvwm
  konqueror. This can be done
  by either explicitly inserting
- <a href="recoll:/..."> links
+ ]]> links
  around some document areas, or automatically by adding a
  very small javascript program to the documents,
  like the following example, which would initiate a search by
@@ -3061,30 +3062,36 @@ dir:recoll dir:src -dir:utils -dir:common
- You should be aware of a few things before using
+ You should be aware of a few things when using
  wildcards.
  Using a wildcard character at the beginning of
- a word can make for a slow search because &RCL; will have to
- scan the whole index term list to find the matches.
-
- When working with a raw index (preserving
- character case and diacritics), the literal part of a wildcard
- expression will be matched exactly for case and
- diacritics.
-
+ a word can make for a slow search because &RCL; will have to
+ scan the whole index term list to find the
+ matches. However, this is much less a problem for field
+ searches, and queries
+ like author:*@domain.com can
+ sometimes be very useful.
+
+ For &RCL; version 18 only, when working with a
+ raw index (preserving character case and diacritics), the
+ literal part of a wildcard expression will be matched
+ exactly for case and diacritics. This is not true any
+ more for versions 19 and later.
+
  Using a * at the end of a
- word can produce more matches than you would think, and
- strange search results. You can use the term explorer tool to
- check what completions exist for a given term. You can also
- see exactly what search was performed by clicking on the link
- at the top of the result list. In general, for natural
- language terms, stem expansion will produce better results
- than an ending * (stem expansion is turned
- off when any wildcard character appears in the term).
-
+ word can produce more matches than you would think, and
+ strange search results. You can use the
+ term
+ explorer tool to check what completions exist for
+ a given term. You can also see exactly what search was
+ performed by clicking on the link at the top of the result
+ list. In general, for natural language terms, stem
+ expansion will produce better results than an
+ ending * (stem expansion is turned off
+ when any wildcard character appears in the
+ term).
@@ -4423,7 +4430,7 @@ except:
  I would very much welcome patches.
- Depending on the Qt 3
+ Depending on the Qt 3
  configuration on your system, you may have to set the
  QTDIR and QMAKESPECS
  variables in your environment:
@@ -4448,7 +4455,8 @@ except:
  Neither QTDIR nor QMAKESPECS
  should be needed with
- Qt 4, configuration details are entirely determined by
+ Qt 4,
+ configuration details are entirely determined by
  qmake (which is quite often installed as
  qmake-qt4).
@@ -4769,7 +4777,7 @@ skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
  Example of use for skipping text files only in a specific
  directory:
-skippedPaths = ~/somedir/∗.txt
+skippedPaths = ~/somedir/*.txt
diff --git a/src/doc/user/xmlmake.sh b/src/doc/user/xmlmake.sh
index 847823da..6e314149 100644
--- a/src/doc/user/xmlmake.sh
+++ b/src/doc/user/xmlmake.sh
@@ -1,16 +1,21 @@
 #!/bin/sh
 
 # A script to produce the Recoll manual with an xml toolchain.
+# Tools used:
+# - xsltproc
+# - The docbook-xsl stylesheets
+# - dblatex for producing the PDF.
+#
 # Limitations:
 # - Does not produce the links to the whole/chunked versions at the top
 #   of the document
-# - The anchor names from the source text are converted to uppercase by
-#   the sgml toolchain. This does not happen with the xml toolchain,
-#   which means that external links like
-#   usermanual.html#RCL.CONFIG.INDEXING won't work because fragments are
-#   case-sensitive. This could be solved by converting all ids inside the
-#   source file to upper-case.
-# - No simple way to produce pdf
+# - The anchor names from the source text are converted to uppercase
+#   by the sgml toolchain. This does not happen with the xml
+#   toolchain, which means that external links like
+#   usermanual.html#RCL.CONFIG.INDEXING won't work because fragments
+#   are case-sensitive. This has been solved by converting all ids
+#   inside the source file to upper-case. DON'T REINTRODUCE
+#   lower-case IDS
 
 # Wherever docbook.xsl and chunk.xsl live
 # Fbsd
@@ -23,14 +28,15 @@ XSLDIR="/usr/share/xml/docbook/stylesheet/docbook-xsl/"
 dochunky=1
 test $# -eq 1 && dochunky=0
 
-# Remove the SGML header and uncomment the XML one + convert from iso-8859-1
-# to utf-8
+# Remove the SGML header and uncomment the XML one. Also used to iconv
+# from iso-8859-1 to UTF-8, but the SGML manual is now UTF-8 ? Would
+# that work with the sgml toolchain ??
+echo '' > usermanual.xml
 sed -e '\!//FreeBSD//DTD!d' \
     -e '\!DTD DocBook XML!s///' \
     < usermanual.sgml \
-    | iconv -f iso-8859-1 -t utf-8 \
-    > usermanual.xml
+    >> usermanual.xml
 
 # Options common to the single-file and chunked versions
 commonoptions="--stringparam section.autolabel 1 \
@@ -59,3 +65,6 @@ eval xsltproc $commonoptions \
 
 tidy -indent usermanual-xml.html > tmpfile
 mv -f tmpfile usermanual-xml.html
+
+# And the pdf with dblatex
+dblatex usermanual.xml