doc
parent 4bb258caab
commit 22ad89b555
3 changed files with 144 additions and 172 deletions
|
@ -1,5 +1,37 @@
|
|||
# Wherever docbook.xsl and chunk.xsl live
|
||||
# Fbsd
|
||||
#XSLDIR="/usr/local/share/xsl/docbook/"
|
||||
# Mac
|
||||
#XSLDIR="/opt/local/share/xsl/docbook-xsl/"
|
||||
#Linux
|
||||
XSLDIR="/usr/share/xml/docbook/stylesheet/docbook-xsl/"
|
||||
|
||||
|
||||
# Options common to the single-file and chunked versions
|
||||
commonoptions=--stringparam section.autolabel 1 \
|
||||
--stringparam section.autolabel.max.depth 3 \
|
||||
--stringparam section.label.includes.component.label 1 \
|
||||
--stringparam autotoc.label.in.hyperlink 0 \
|
||||
--stringparam abstract.notitle.enabled 1 \
|
||||
--stringparam html.stylesheet docbook-xsl.css \
|
||||
--stringparam generate.toc "book toc,title,figure,table,example,equation"
|
||||
|
||||
|
||||
all: usermanual.html index.html usermanual.pdf
|
||||
|
||||
usermanual.html: usermanual.xml
|
||||
sh xmlmake.sh
|
||||
xsltproc ${commonoptions} \
|
||||
-o tmpfile.html "${XSLDIR}/html/docbook.xsl" usermanual.xml
|
||||
-tidy -indent tmpfile.html > usermanual.html
|
||||
|
||||
index.html: usermanual.xml
|
||||
xsltproc ${commonoptions} \
|
||||
--stringparam use.id.as.filename 1 \
|
||||
--stringparam root.filename index \
|
||||
"${XSLDIR}/html/chunk.xsl" usermanual.xml
|
||||
|
||||
usermanual.pdf: usermanual.xml
|
||||
dblatex usermanual.xml
|
||||
|
||||
clean:
|
||||
rm -f RCL.*.html usermanual.pdf usermanual.html index.html
|
||||
rm -f RCL.*.html usermanual.pdf usermanual.html index.html tmpfile.html
|
||||
|
|
|
@ -39,9 +39,6 @@
|
|||
<para>This document introduces full text search notions
|
||||
and describes the installation and use of the &RCL;
|
||||
application. It currently describes &RCL; &RCLVERSION;.</para>
|
||||
<!-- <para>[ <ulink url="index.html">Split HTML</ulink> /
|
||||
<ulink url="usermanual-xml.html">Single HTML</ulink> ]</para>
|
||||
-->
|
||||
</abstract>
|
||||
|
||||
|
||||
|
@ -141,7 +138,7 @@
|
|||
<para>&RCL; stores all internal data in <application>Unicode
|
||||
UTF-8</application> format, and it can index files with
|
||||
different character sets, encodings, and languages into the same
|
||||
index. It has input filters for many document types.</para>
|
||||
index. It can process many document types.</para>
|
||||
|
||||
<para>Stemming is the process by which &RCL; reduces words to
|
||||
their radicals so that searching does not depend, for example, on a
|
||||
|
@ -381,9 +378,9 @@
|
|||
patterns to the <literal>skippedNames</literal> list, which
|
||||
can be done from the GUI Index configuration menu. It is
|
||||
also possible to exclude a mime type independently of the
|
||||
file name by associating it with
|
||||
the <filename>rclnull</filename> filter. This can be done by
|
||||
editing the <link linkend="RCL.INSTALL.CONFIG.MIMECONF">
|
||||
file name by associating it with the
|
||||
<filename>rclnull</filename> input handler. This can be done
|
||||
by editing the <link linkend="RCL.INSTALL.CONFIG.MIMECONF">
|
||||
<filename>mimeconf</filename> configuration
|
||||
file</link>.</para>
|
||||
|
||||
|
@ -2463,7 +2460,7 @@ fs.inotify.max_user_watches=32768
|
|||
and <literal>filename</literal>), so this feature will need
|
||||
some custom local configuration to be useful. An example
|
||||
candidate would be the <literal>recipient</literal> field
|
||||
which is generated by the message filters.</para>
|
||||
which is generated by the message input handlers.</para>
|
||||
|
||||
<para>The default value for the paragraph format string is:
|
||||
<screen><![CDATA[
|
||||
|
@ -2961,7 +2958,7 @@ dir:recoll dir:src -dir:utils -dir:common
|
|||
<link linkend="RCL.SEARCH.WILDCARDS">
|
||||
More about wildcards</link>.</para>
|
||||
|
||||
<para>The document filters used while indexing have the
|
||||
<para>The document input handlers used while indexing have the
|
||||
possibility to create other fields with arbitrary names, and
|
||||
aliases may be defined in the configuration, so that the exact
|
||||
field search possibilities may be different for you if someone
|
||||
|
@ -3293,7 +3290,7 @@ dir:recoll dir:src -dir:utils -dir:common
|
|||
<application>Python</application> language.</para>
|
||||
|
||||
<para>Another less radical way to extend the application is to
|
||||
write filters for new types of documents.</para>
|
||||
write input handlers for new types of documents.</para>
|
||||
|
||||
<para>The processing of metadata attributes for documents
|
||||
(<literal>fields</literal>) is highly configurable.</para>
|
||||
|
@ -3301,69 +3298,77 @@ dir:recoll dir:src -dir:utils -dir:common
|
|||
|
||||
|
||||
<sect1 id="RCL.PROGRAM.FILTERS">
|
||||
<title>Writing a document filter</title>
|
||||
<title>Writing a document input handler</title>
|
||||
|
||||
<note><title>Terminology</title>The small programs or pieces
|
||||
of code which handle the processing of the different document
|
||||
types for &RCL; used to be called <literal>filters</literal>,
|
||||
which is still reflected in the name of the directory which
|
||||
holds them and many configuration variables. They were named
|
||||
this way because one of their primary functions is to filter
|
||||
out the formatting directives and keep the text
|
||||
content. However these modules may have other behaviours, and
|
||||
the term <literal>input handler</literal> is now progressively
|
||||
substituted in the documentation. <literal>filter</literal> is
|
||||
still used in many places though.</note>
|
||||
|
||||
<para>&RCL; filters cooperate to translate from the multitude
|
||||
<para>&RCL; input handlers cooperate to translate from the multitude
|
||||
of input document formats, simple ones (such
|
||||
as <application>opendocument</application>,
|
||||
<application>acrobat</application>), or compound ones such
|
||||
as <application>Zip</application>
|
||||
or <application>Email</application>, into the final &RCL;
|
||||
indexing input format, which may
|
||||
be <literal>text/plain</literal>
|
||||
or <literal>text/html</literal>. Most filters are executable
|
||||
programs or scripts. A few filters are coded in C++ and live
|
||||
indexing input format, which is plain text.
|
||||
Most input handlers are executable
|
||||
programs or scripts. A few handlers are coded in C++ and live
|
||||
inside <command>recollindex</command>. This latter kind will not
|
||||
be described here.</para>
|
||||
|
||||
<para>There are currently (1.18 and since 1.13) two kinds of
|
||||
external executable filters:
|
||||
external executable input handlers:
|
||||
<itemizedlist>
|
||||
<listitem><para>Simple filters (<literal>exec</literal>
|
||||
filters) run once and
|
||||
exit. They can be bare programs
|
||||
like <application>antiword</application>, or scripts
|
||||
using other programs. They are very simple to write,
|
||||
because they just need to print the converted document
|
||||
to the standard output. Their output can
|
||||
be <literal>text/plain</literal>
|
||||
or <literal>text/html</literal>.</para>
|
||||
<listitem><para>Simple <literal>exec</literal> handlers
|
||||
run once and exit. They can be bare programs like
|
||||
<command>antiword</command>, or scripts using other
|
||||
programs. They are very simple to write, because they just
|
||||
need to print the converted document to the standard
|
||||
output. Their output can be plain text or HTML. HTML is
|
||||
usually preferred because it can store metadata fields and
|
||||
it allows preserving some of the formatting for the GUI
|
||||
preview.</para>
|
||||
</listitem>
|
||||
<listitem><para>Multiple filters (<literal>execm</literal>
|
||||
filters), run as long as
|
||||
their master process (<command>recollindex</command>) is
|
||||
active. They can process multiple files (sparing the
|
||||
process startup time which can be very significant),
|
||||
or multiple documents per file (e.g.: for zip or chm
|
||||
files). They communicate with the indexer through a
|
||||
simple protocol, but are nevertheless a bit more
|
||||
complicated than the older kind. Most of new
|
||||
filters are written
|
||||
in <application>Python</application>, using a common
|
||||
module to handle the protocol. There is an
|
||||
exception, <command>rclimg</command> which is written
|
||||
in Perl. The subdocuments output by these filters can
|
||||
be directly indexable (text or HTML), or they can be
|
||||
other simple or compound documents that will need to
|
||||
be processed by another filter.</para>
|
||||
<listitem><para>Multiple <literal>execm</literal> handlers
|
||||
can process multiple files (sparing the process startup
|
||||
time which can be very significant), or multiple documents
|
||||
per file (e.g.: for <application>zip</application> or
|
||||
<application>chm</application> files). They communicate
|
||||
with the indexer through a simple protocol, but are
|
||||
nevertheless a bit more complicated than the older
|
||||
kind. Most new handlers are written in
|
||||
<application>Python</application>, using a common module
|
||||
to handle the protocol. There is an exception,
|
||||
<command>rclimg</command> which is written in Perl. The
|
||||
subdocuments output by these handlers can be directly
|
||||
indexable (text or HTML), or they can be other simple or
|
||||
compound documents that will need to be processed by
|
||||
another handler.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
|
||||
<para>In both cases, filters deal with regular file system
|
||||
<para>In both cases, handlers deal with regular file system
|
||||
files, and can process either a single document, or a
|
||||
linear list of documents in each file. &RCL; is responsible
|
||||
for performing up-to-date checks and dealing with more complex
|
||||
embedding and other upper level issues.</para>
|
||||
|
||||
<para>In the extreme case of a simple filter returning a
|
||||
document in <literal>text/plain</literal> format, no
|
||||
metadata can be transferred from the filter to the
|
||||
indexer. Generic metadata, like document size or
|
||||
modification date, will be gathered and stored by the
|
||||
indexer.</para>
|
||||
<para>A simple handler returning a
|
||||
document in <literal>text/plain</literal> format can transfer
|
||||
no metadata to the indexer. Generic metadata, like document
|
||||
size or modification date, will be gathered and stored by
|
||||
the indexer.</para>
|
||||
|
||||
<para>Filters that produce <literal>text/html</literal>
|
||||
<para>Handlers that produce <literal>text/html</literal>
|
||||
format can return an arbitrary amount of metadata inside HTML
|
||||
<literal>meta</literal> tags. These will be processed
|
||||
according to the directives found in
|
||||
|
@ -3371,7 +3376,7 @@ dir:recoll dir:src -dir:utils -dir:common
|
|||
<filename>fields</filename> configuration
|
||||
file</link>.</para>
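<para>As an illustration of the paragraph above, here is a minimal sketch of how a handler might build HTML output carrying metadata in <literal>meta</literal> tags. The function name <literal>render_html</literal>, the example field names and the <literal>Content-Type</literal> header line are assumptions made for this example, not part of the &RCL; distribution.</para>

```python
from html import escape


def render_html(body_text, fields):
    """Return a small HTML document with one meta tag per field.

    The field names in `fields` are hypothetical; each one would need
    to be described in the fields configuration file to be useful.
    """
    lines = [
        "<html><head>",
        # Assumed header: declares the encoding of the handler output.
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">',
    ]
    for name, value in fields.items():
        # Escape quotes and angle brackets so values cannot break the markup.
        lines.append('<meta name="{}" content="{}">'.format(
            escape(name, quote=True), escape(value, quote=True)))
    lines += ["</head><body>",
              "<pre>" + escape(body_text) + "</pre>",
              "</body></html>"]
    return "\n".join(lines) + "\n"
```

<para>A handler would print the returned string to the standard output. Escaping the values matters because a stray quote in a title or author name would otherwise truncate the attribute.</para>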
|
||||
|
||||
<para>The filters that can handle multiple documents per file
|
||||
<para>The handlers that can handle multiple documents per file
|
||||
return a single piece of data to identify each document inside
|
||||
the file. This piece of data, called
|
||||
an <literal>ipath element</literal> will be sent back by
|
||||
|
@ -3380,27 +3385,27 @@ dir:recoll dir:src -dir:utils -dir:common
|
|||
viewer.</para>
|
||||
|
||||
<para>The following section describes the simple
|
||||
filters, and the next one gives a few explanations about
|
||||
handlers, and the next one gives a few explanations about
|
||||
the <literal>execm</literal> ones. You could conceivably
|
||||
write a simple filter with only the elements in the
|
||||
write a simple handler with only the elements in the
|
||||
manual. This will not be the case for the other ones, for
|
||||
which you will have to look at the code.</para>
|
||||
|
||||
<sect2 id="RCL.PROGRAM.FILTERS.SIMPLE">
|
||||
<title>Simple filters</title>
|
||||
<title>Simple input handlers</title>
|
||||
|
||||
<para>&RCL; simple filters are usually shell-scripts, but this is in
|
||||
<para>&RCL; simple handlers are usually shell-scripts, but this is in
|
||||
no way necessary. Extracting the text from the native format is the
|
||||
difficult part. Outputting the format expected by &RCL; is
|
||||
trivial. Happily enough, most document formats have translators or
|
||||
text extractors which can be called from the filter. In some cases
|
||||
text extractors which can be called from the handler. In some cases
|
||||
the output of the translating program is completely appropriate,
|
||||
and no intermediate shell-script is needed.</para>
|
||||
|
||||
<para>Filters are called with a single argument which is the
|
||||
<para>Input handlers are called with a single argument which is the
|
||||
source file name. They should output the result to stdout.</para>
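<para>The calling convention described above can be sketched in a few lines of Python. This is only an outline under stated assumptions: <literal>convert</literal> is a hypothetical stand-in for the real extraction step (here the input is assumed to be UTF-8 text already), and the output is plain text.</para>

```python
import sys


def convert(path):
    """Hypothetical extraction step: the file is assumed to be text already."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return f.read()


def main(argv):
    # A simple handler receives exactly one argument: the source file name.
    if len(argv) != 2:
        sys.stderr.write("usage: handler <file>\n")
        return 1
    # The converted document goes to the standard output.
    sys.stdout.write(convert(argv[1]))
    return 0
```

<para>A real script would end with <literal>sys.exit(main(sys.argv))</literal> and replace <literal>convert</literal> with a call to the appropriate translator or text extractor.</para>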
|
||||
|
||||
<para>When writing a filter, you should decide if it will output
|
||||
<para>When writing a handler, you should decide if it will output
|
||||
plain text or HTML. Plain text is simpler, but you will not be able
|
||||
to add metadata or vary the output character encoding (this will be
|
||||
defined in a configuration file). Additionally, some formatting may
|
||||
|
@ -3411,25 +3416,25 @@ dir:recoll dir:src -dir:utils -dir:common
|
|||
|
||||
<para>The <envar>RECOLL_FILTER_FORPREVIEW</envar> environment
|
||||
variable (values <literal>yes</literal>, <literal>no</literal>)
|
||||
tells the filter if the operation is for indexing or
|
||||
previewing. Some filters use this to output a slightly different
|
||||
tells the handler if the operation is for indexing or
|
||||
previewing. Some handlers use this to output a slightly different
|
||||
format, for example stripping uninteresting repeated keywords (such as
|
||||
<literal>Subject:</literal> for email) when indexing. This is not
|
||||
essential.</para>
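<para>A handler written in Python could test the variable as sketched below. Treating an unset variable as <literal>no</literal>, and the list-of-pairs layout used for the headers, are assumptions made for this example.</para>

```python
import os


def for_preview(environ=os.environ):
    """Return True when the handler is run to build a preview.

    An unset variable is treated as "no" (an assumption of this sketch).
    """
    return environ.get("RECOLL_FILTER_FORPREVIEW", "no") == "yes"


def format_headers(headers, preview):
    """Keep header labels for previews, drop them when indexing.

    `headers` is a list of (label, value) pairs; this layout is
    hypothetical and only serves the example.
    """
    if preview:
        return "\n".join("{}: {}".format(k, v) for k, v in headers)
    # When indexing, repeated labels such as "Subject" add no search value.
    return "\n".join(v for _, v in headers)
```

<para>For instance, <literal>format_headers([("Subject", "Hi")], for_preview())</literal> keeps the <literal>Subject:</literal> label only when previewing.</para>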
|
||||
|
||||
<para>You should look at one of the simple filters, for example
|
||||
<para>You should look at one of the simple handlers, for example
|
||||
<command>rclps</command> for a starting point.</para>
|
||||
|
||||
<para>Don't forget to make your filter executable before
|
||||
<para>Don't forget to make your handler executable before
|
||||
testing!</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
<sect2 id="RCL.PROGRAM.FILTERS.MULTIPLE">
|
||||
<title>"Multiple" filters</title>
|
||||
<title>"Multiple" handlers</title>
|
||||
|
||||
<para>If you can program and want to write
|
||||
an <literal>execm</literal> filter, it should not be too
|
||||
an <literal>execm</literal> handler, it should not be too
|
||||
difficult to make sense of one of the existing modules. For
|
||||
example, look at <command>rclzip</command> which uses Zip
|
||||
file paths as identifiers (<literal>ipath</literal>),
|
||||
|
@ -3438,7 +3443,7 @@ dir:recoll dir:src -dir:utils -dir:common
|
|||
the <filename>internfile/mh_execm.h</filename> file and
|
||||
possibly at the corresponding module.</para>
|
||||
|
||||
<para><literal>execm</literal> filters sometimes need to make
|
||||
<para><literal>execm</literal> handlers sometimes need to make
|
||||
a choice for the nature of the <literal>ipath</literal>
|
||||
elements that they use in communication with the
|
||||
indexer. Here are a few guidelines:
|
||||
|
@ -3453,16 +3458,16 @@ dir:recoll dir:src -dir:utils -dir:common
|
|||
separator to store a complex path internally (for
|
||||
deeper embedding). Colons inside
|
||||
the <literal>ipath</literal> elements output by a
|
||||
filter will be escaped, but would be a bad choice as a
|
||||
filter-specific separator (mostly, again, for
|
||||
handler will be escaped, but would be a bad choice as a
|
||||
handler-specific separator (mostly, again, for
|
||||
debugging issues).</para></listitem>
|
||||
</itemizedlist>
|
||||
In any case, the main goal is that it should
|
||||
be easy for the filter to extract the target document, given
|
||||
be easy for the handler to extract the target document, given
|
||||
the file name and the <literal>ipath</literal>
|
||||
element.</para>
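<para>To make the guideline concrete, the sketch below uses archive member paths as <literal>ipath</literal> elements, in the spirit of <command>rclzip</command>. It does not implement the <literal>execm</literal> protocol and is not the code of the actual handler; the function names are hypothetical.</para>

```python
import zipfile


def list_ipaths(archive):
    """Return one ipath per regular member: here, the member path itself."""
    with zipfile.ZipFile(archive) as zf:
        return [info.filename for info in zf.infolist() if not info.is_dir()]


def extract(archive, ipath):
    """Return the bytes of the document identified by `ipath`."""
    with zipfile.ZipFile(archive) as zf:
        # The ipath maps directly to a member name, so no lookup table
        # or counter has to be rebuilt to find the target document.
        return zf.read(ipath)
```

<para>Because the identifier is the member path, extraction needs only the file name and the <literal>ipath</literal> element, and the value stays readable when debugging.</para>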
|
||||
|
||||
<para><literal>execm</literal> filters will also produce
|
||||
<para><literal>execm</literal> handlers will also produce
|
||||
a document with a null <literal>ipath</literal>
|
||||
element. Depending on the type of document, this may have
|
||||
some associated data (e.g. the body of an email message), or
|
||||
|
@ -3472,11 +3477,11 @@ dir:recoll dir:src -dir:utils -dir:common
|
|||
</sect2>
|
||||
|
||||
<sect2 id="RCL.PROGRAM.FILTERS.ASSOCIATION">
|
||||
<title>Telling &RCL; about the filter</title>
|
||||
<title>Telling &RCL; about the handler</title>
|
||||
|
||||
<para>There are two elements that link a file to the filter which
|
||||
<para>There are two elements that link a file to the handler which
|
||||
should process it: the association of file to mime type and the
|
||||
association of a mime type with a filter.</para>
|
||||
association of a mime type with a handler.</para>
|
||||
|
||||
<para>The association of files to mime types is mostly based on
|
||||
name suffixes. The types are defined inside the
|
||||
|
@ -3490,7 +3495,7 @@ dir:recoll dir:src -dir:utils -dir:common
|
|||
to execute the <command>file -i</command> command to determine a
|
||||
mime type.</para>
|
||||
|
||||
<para>The association of file types to filters is performed in
|
||||
<para>The association of file types to handlers is performed in
|
||||
the <link linkend="RCL.INSTALL.CONFIG.MIMECONF">
|
||||
<filename>mimeconf</filename> file</link>. A sample will probably be
|
||||
of better help than a long explanation:</para>
|
||||
|
@ -3532,7 +3537,7 @@ application/x-chm = execm rclchm
|
|||
<command>unrtf</command> in the HTML header section.</para>
|
||||
</listitem>
|
||||
<listitem><para><literal>application/x-chm</literal> is processed
|
||||
by a persistant filter. This is determined by the
|
||||
by a persistent handler. This is determined by the
|
||||
<literal>execm</literal> keyword.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
@ -3541,7 +3546,7 @@ application/x-chm = execm rclchm
|
|||
</sect2>
|
||||
|
||||
<sect2 id="RCL.PROGRAM.FILTERS.HTML">
|
||||
<title>Filter HTML output</title>
|
||||
<title>Input handler HTML output</title>
|
||||
|
||||
<para>The output HTML could be very minimal like the following
|
||||
example:
|
||||
|
@ -3607,7 +3612,7 @@ or
|
|||
</programlisting>
|
||||
</para>
|
||||
|
||||
<para>Filters also have the possibility to "invent" field
|
||||
<para>Input handlers also have the possibility to "invent" field
|
||||
names. This should also be output as meta tags:</para>
|
||||
|
||||
<programlisting>
|
||||
|
@ -3634,10 +3639,10 @@ or
|
|||
<title>Page numbers</title>
|
||||
|
||||
<para>The indexer will interpret <literal>^L</literal> characters
|
||||
in the filter output as indicating page breaks, and will record
|
||||
in the handler output as indicating page breaks, and will record
|
||||
them. At query time, this allows starting a viewer on the right
|
||||
page for a hit or a snippet. Currently, only the PDF, Postscript
|
||||
and DVI filters generate page breaks.</para>
|
||||
and DVI handlers generate page breaks.</para>
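<para>The <literal>^L</literal> character is the ASCII form feed (0x0C). A handler that knows its page boundaries could emit it as sketched below. The helper <literal>page_of_offset</literal> only illustrates how a position in the text maps back to a page; it is not a description of how the indexer stores page breaks.</para>

```python
FORM_FEED = "\f"  # the ^L character, ASCII 0x0C


def join_pages(pages):
    """Concatenate page texts, separating them with form feeds."""
    return FORM_FEED.join(pages)


def page_of_offset(text, offset):
    """Return the 1-based page number containing character `offset`."""
    # Each form feed found before the offset starts a new page.
    return text.count(FORM_FEED, 0, offset) + 1
```

<para>With three pages joined this way, any offset located after the second form feed is reported as being on page 3.</para>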
|
||||
|
||||
</sect2>
|
||||
|
||||
|
@ -3651,7 +3656,7 @@ or
|
|||
<literal>author</literal>, <literal>abstract</literal>.</para>
|
||||
|
||||
<para>The field values for documents can appear in several ways
|
||||
during indexing: either output by filters
|
||||
during indexing: either output by input handlers
|
||||
as <literal>meta</literal> fields in the HTML header section, or
|
||||
extracted from file extended attributes, or added as attributes
|
||||
of the <literal>Doc</literal> object when using the API, or
|
||||
|
@ -3661,7 +3666,7 @@ or
|
|||
specific field.</para>
|
||||
|
||||
<para>&RCL; defines a number of default fields. Additional
|
||||
ones can be output by filters, and described in the
|
||||
ones can be output by handlers, and described in the
|
||||
<filename>fields</filename> configuration file.</para>
|
||||
|
||||
<para>Fields can be:</para>
|
||||
|
@ -3903,7 +3908,7 @@ or
|
|||
<title>The Db class</title>
|
||||
|
||||
<para>A Db object is created by
|
||||
a <literal>connect()</literal> function and holds a
|
||||
a <literal>connect()</literal> call and holds a
|
||||
connection to a Recoll index.</para>
|
||||
<variablelist>
|
||||
<title>Methods</title>
|
||||
|
@ -4381,7 +4386,7 @@ except:
|
|||
directory.</para>
|
||||
|
||||
<para>A list of common file types which need external
|
||||
commands follows. Many of the filters need the
|
||||
commands follows. Many of the handlers need the
|
||||
<command>iconv</command> command, which is not always listed as a
|
||||
dependency.</para>
|
||||
|
||||
|
@ -4398,7 +4403,7 @@ except:
|
|||
type is important to you.</para>
|
||||
|
||||
<para>As of &RCL; release 1.14, a number of XML-based formats that
|
||||
were handled by ad hoc filter code now use the
|
||||
were handled by ad hoc handler code now use the
|
||||
<command>xsltproc</command> command, which usually comes with
|
||||
<application>libxslt</application>. These are: abiword, fb2
|
||||
(ebooks), kword, openoffice, svg.</para>
|
||||
|
@ -4425,8 +4430,8 @@ except:
|
|||
be used as a fallback for some files which
|
||||
<command>antiword</command> does not handle.</para></listitem>
|
||||
|
||||
<listitem><para>MS Excel and PowerPoint need <command>
|
||||
catdoc</command>.</para></listitem>
|
||||
<listitem><para>MS Excel and PowerPoint are processed by
|
||||
internal <command>Python</command> handlers.</para></listitem>
|
||||
|
||||
<listitem><para>MS Open XML (docx) needs <command>
|
||||
xsltproc</command>.</para></listitem>
|
||||
|
@ -4451,15 +4456,10 @@ except:
|
|||
<command>djvused</command> from the
|
||||
<application>DjVuLibre</application> package.</para></listitem>
|
||||
|
||||
<listitem><para>Audio files: &RCL; releases before 1.13
|
||||
used the <command>id3info</command> command from the <application>
|
||||
id3lib</application> package to extract mp3 tag information,
|
||||
<command>metaflac</command> (standard flac tools) for flac files,
|
||||
and <command>ogginfo</command> (vorbis tools) for ogg
|
||||
files. Releases 1.14 and later use a single
|
||||
<application>Python</application> filter based
|
||||
on <application>mutagen</application> for all audio file
|
||||
types.</para>
|
||||
<listitem><para>Audio files: &RCL; releases 1.14 and later use
|
||||
a single <application>Python</application> handler based
|
||||
on <application>mutagen</application> for all audio file
|
||||
types.</para>
|
||||
</listitem>
|
||||
|
||||
<listitem><para>Pictures: &RCL; uses the
|
||||
|
@ -4471,7 +4471,7 @@ except:
|
|||
store personal tags or textual descriptions inside the image
|
||||
files.</para></listitem>
|
||||
|
||||
<listitem><para>chm: files in microsoft help format need Python and
|
||||
<listitem><para>chm: files in Microsoft help format need Python and
|
||||
the <application>pychm</application> module (which needs
|
||||
<application>chmlib</application>).</para></listitem>
|
||||
|
||||
|
@ -4498,15 +4498,15 @@ except:
|
|||
<listitem><para>Konqueror webarchive format with Python (uses the
|
||||
Tarfile module).</para></listitem>
|
||||
|
||||
<listitem><para>mimehtml web archive format (support based on the email
|
||||
filter, which introduces some mild weirdness, but still
|
||||
usable).</para></listitem>
|
||||
<listitem><para>Mimehtml web archive format (support based on
|
||||
the email handler, which introduces some mild weirdness, but
|
||||
still usable).</para></listitem>
|
||||
|
||||
</itemizedlist>
|
||||
|
||||
<para>Text, HTML, email folders, and Scribus files are
|
||||
processed internally. <application>Lyx</application> is used to
|
||||
index Lyx files. Many filters need <command>iconv</command> and the
|
||||
index Lyx files. Many handlers need <command>iconv</command> and the
|
||||
standard <command>sed</command> and <command>awk</command>.
|
||||
</para>
|
||||
|
||||
|
@ -4994,10 +4994,10 @@ skippedPaths = ~/somedir/*.txt
|
|||
<listitem><para>A space-separated list of patterns for
|
||||
names of files or directories that should be ignored
|
||||
inside zip archives. This is used directly by the zip
|
||||
filter, and has a function similar to skippedNames, but
|
||||
handler, and has a function similar to skippedNames, but
|
||||
works independently. Can be redefined for filesystem
|
||||
subdirectories. For versions up to 1.19, you will need
|
||||
to update the Zip filter and install a supplementary
|
||||
to update the Zip handler and install a supplementary
|
||||
Python module. The details are
|
||||
described <ulink url="https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members">on
|
||||
the &RCL; wiki</ulink>.
|
||||
|
@ -5552,13 +5552,13 @@ mondelaypatterns = *.log:20 "this one has spaces*:10"
|
|||
</varlistentry>
|
||||
|
||||
<varlistentry><term><varname>filtermaxseconds</varname></term>
|
||||
<listitem><para>Maximum filter execution time, after which it
|
||||
<listitem><para>Maximum handler execution time, after which it
|
||||
is aborted. Some postscript programs just loop...</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry><term><varname>filtersdir</varname></term>
|
||||
<listitem><para>A directory to search for the external
|
||||
filter scripts used to index some types of files. The
|
||||
input handler scripts used to index some types of files. The
|
||||
value should not be changed, except if you want to modify
|
||||
one of the default scripts. The value can be redefined for
|
||||
any sub-directory. </para>
|
||||
|
@ -5678,9 +5678,9 @@ mondelaypatterns = *.log:20 "this one has spaces*:10"
|
|||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term>filter-specific sections</term>
|
||||
<listitem><para>Some filters may need specific
|
||||
configuration for handling fields. Only the email message filter
|
||||
<term>handler-specific sections</term>
|
||||
<listitem><para>Some input handlers may need specific
|
||||
configuration for handling fields. Only the email message handler
|
||||
currently has such a section (named
|
||||
<literal>[mail]</literal>). It allows indexing arbitrary email
|
||||
headers in addition to the ones indexed by default. Other such
|
||||
|
@ -5694,7 +5694,7 @@ mondelaypatterns = *.log:20 "this one has spaces*:10"
|
|||
<filename>fields</filename>
|
||||
file. This would extract a specific email header and
|
||||
use it as a searchable field, with data displayable inside result
|
||||
lists. (Side note: as the email filter does no decoding on the values,
|
||||
lists. (Side note: as the email handler does no decoding on the values,
|
||||
only plain ASCII headers can be indexed, and only the
|
||||
first occurrence will be used for headers that occur several times).
|
||||
|
||||
|
@ -6007,7 +6007,7 @@ application/x-blobapp = exec rclblob
|
|||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>The <replaceable>rclblob</replaceable> filter should
|
||||
<para>The <replaceable>rclblob</replaceable> handler should
|
||||
be an executable program or script which exists inside
|
||||
<filename>/usr/[local/]share/recoll/filters</filename>. It
|
||||
will be given a file name as argument and should output the
|
||||
|
@ -6015,7 +6015,7 @@ application/x-blobapp = exec rclblob
|
|||
|
||||
<para>The <link linkend="RCL.PROGRAM.FILTERS">filter
|
||||
programming</link> section describes in more detail how
|
||||
to write a filter.</para>
|
||||
to write an input handler.</para>
|
||||
|
||||
</sect3>
|
||||
|
||||
|
|
|
@ -1,60 +0,0 @@
|
|||
#!/bin/sh
|
||||
|
||||
# A script to produce the Recoll manual with an xml toolchain.
|
||||
# Tools used:
|
||||
# - xsltproc
|
||||
# - The docbook-xsl stylesheets
|
||||
# - dblatex for producing the PDF.
|
||||
#
|
||||
# Limitations:
|
||||
# - Does not produce the links to the whole/chunked versions at the top
|
||||
# of the document
|
||||
# - The anchor names from the source text are converted to uppercase
|
||||
# by the sgml toolchain. This does not happen with the xml
|
||||
# toolchain, which means that external links like
|
||||
# usermanual.html#RCL.CONFIG.INDEXING won't work because fragments
|
||||
# are case-sensitive. This has been solved by converting all ids
|
||||
# inside the source file to upper-case. DON'T REINTRODUCE
|
||||
# lower-case IDS
|
||||
|
||||
# Wherever docbook.xsl and chunk.xsl live
|
||||
# Fbsd
|
||||
#XSLDIR="/usr/local/share/xsl/docbook/"
|
||||
# Mac
|
||||
#XSLDIR="/opt/local/share/xsl/docbook-xsl/"
|
||||
#Linux
|
||||
XSLDIR="/usr/share/xml/docbook/stylesheet/docbook-xsl/"
|
||||
|
||||
dochunky=1
|
||||
test $# -eq 1 && dochunky=0
|
||||
|
||||
# Options common to the single-file and chunked versions
|
||||
commonoptions="--stringparam section.autolabel 1 \
|
||||
--stringparam section.autolabel.max.depth 3 \
|
||||
--stringparam section.label.includes.component.label 1 \
|
||||
--stringparam autotoc.label.in.hyperlink 0 \
|
||||
--stringparam abstract.notitle.enabled 1 \
|
||||
--stringparam html.stylesheet docbook-xsl.css \
|
||||
--stringparam generate.toc \"book toc,title,figure,table,example,equation\" \
|
||||
"
|
||||
|
||||
# Do the chunky thing
|
||||
if test $dochunky -ne 0 ; then
|
||||
eval xsltproc $commonoptions \
|
||||
--stringparam use.id.as.filename 1 \
|
||||
--stringparam root.filename index \
|
||||
"$XSLDIR/html/chunk.xsl" \
|
||||
usermanual.xml
|
||||
fi
|
||||
|
||||
# Produce the single file version
|
||||
eval xsltproc $commonoptions \
|
||||
-o usermanual.html \
|
||||
"$XSLDIR/html/docbook.xsl" \
|
||||
usermanual.xml
|
||||
|
||||
tidy -indent usermanual.html > tmpfile
|
||||
mv -f tmpfile usermanual.html
|
||||
|
||||
# And the pdf with dblatex
|
||||
dblatex usermanual.xml
|