Jean-Francois Dockes 2012-11-02 09:47:39 +01:00
parent 4b63268ac5
commit 1cd3c0a659


@@ -31,11 +31,13 @@
* with a simple question/response protocol.
*
* The data is exchanged in TLV fashion, in a way that should be
* usable in most script languages. The basic unit has one line with a
* data type and a count, followed by the data. A 'message' ends with
* one empty line. A possible exchange:
* usable in most script languages. The basic unit of data has one line
* with a data type and a count (both ASCII), followed by the data. A
* 'message' is made of one or several units or tags and ends with one empty
* line.
*
* From recollindex (the message begins before 'Filename'):
* Example from recollindex (the message begins before 'Filename' and has
* 'Filename' and 'Ipath' tags):
*
Filename: 24
/my/home/mail/somefolderIpath: 2
@@ -44,7 +46,7 @@ Filename: 24
<Message ends here: because of the empty line after '22'
*
* Example answer:
* Example answer, with 'Mimetype' and 'Data' tags
*
Mimetype: 10
text/plainData: 10
@@ -55,32 +57,42 @@ text/plainData: 10
*
* This format is both extensible and reasonably easy to parse.
* While it's more fitted for python or perl on the script side, it
* should even be sort of usable from the shell (ie: use dd to read
* should even be sort of usable from the shell (e.g.: use dd to read
* the counted data). Most alternatives would need data encoding in
* some cases.
*
* Higher level dialog:
* The c++ program is the master and sends request messages to the script. The
* requests have the following fields:
* The C++ program is the master and sends request messages to the script.
* Both sides of the communication should be prepared to receive and discard
* unknown tags.
* The messages normally have the following tags:
* - Filename: the file to process. This can be empty meaning that we
* are requesting the next document in the current file.
* - Ipath: this will be present only if we are requesting a specific
* subdocument inside a container file (typically for preview, at query
* time). Absent during indexing (ipaths are generated and sent back from
* the script
* the script)
* - Mimetype: this is the mime type for the (possibly container) file.
* Can be useful to filters which handle multiple types, like rclaudio.
* Can be useful to filters which handle multiple types, like rclaudio.
*
* The script answers with messages having the following fields:
* - Document: translated document data (typically, but not always, html)
* - Document: translated document data.
* - Ipath: ipath for the returned document. Can be used at query time to
* extract a specific subdocument for preview. Not present or empty for
* non-container files.
* - Mimetype: mime type for the returned data (ie: text/html, text/plain)
* extract a specific subdocument for preview. Not present or empty for
* non-container files and for the "self" document of a container.
* - Mimetype: mime type for the returned data.
* This is optional. For multi-document filters, if mimetype is
* not present in the answer, the ipath must be a file-name-like
* string which will be used to divine the mime type (this is used
* typically with archives like Zip or Tar). If this fails,
* the document will be handled as unknown type and the contents won't
* be indexed. When neither ipath nor mimetype are present the default
* is to attempt to treat the document as HTML.
* - Charset: for document types for which it makes sense, and if the filter
* has the information.
* - Eofnow: empty field: no document is returned and we're at eof.
* - Eofnext: empty field: file ends after the doc returned by this message.
* - SubdocError: no subdoc returned by this request, but file goes on.
* (the indexer (1.14) treats this as a file-fatal error anyway).
* - FileError: error, stop for this file.
*/
class MimeHandlerExecMultiple : public MimeHandlerExec {
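
The framing described in the comment above (a "Name: count" header line, then exactly that much data, with an empty line ending the message) can be sketched from the script side roughly as follows. This is only an illustration, not Recoll's actual filter code: the names read_message and write_message are hypothetical, and the sketch assumes that the count is a number of bytes, that nothing separates a unit's data from the next header line, and that a missing terminator means end of input.

```python
import io


def read_message(stream):
    """Read one message from a binary stream.

    Returns a dict mapping tag name (str) to data (bytes), or None if the
    stream ends before the empty terminating line is seen.
    """
    params = {}
    while True:
        line = stream.readline()
        if not line:
            # End of input before the empty terminating line.
            return None
        line = line.rstrip(b"\n")
        if line == b"":
            # Empty line: the message is complete.
            return params
        # Header line: ASCII tag name, a colon, then an ASCII count.
        name, _, count = line.partition(b":")
        size = int(count.decode("ascii").strip())
        data = stream.read(size)
        if len(data) != size:
            raise EOFError("truncated data for tag %r" % name)
        params[name.decode("ascii").strip()] = data


def write_message(stream, params):
    """Write a dict of tag name (str) -> data (bytes) as one message."""
    for name, data in params.items():
        stream.write(("%s: %d\n" % (name, len(data))).encode("ascii"))
        stream.write(data)
    # A single newline after the last unit is the empty terminating line.
    stream.write(b"\n")


# Round trip using the request shown in the comment's example.
request = io.BytesIO(b"Filename: 24\n/my/home/mail/somefolderIpath: 2\n22\n")
message = read_message(request)

answer = io.BytesIO()
write_message(answer, {"Mimetype": b"text/plain"})
```

Because every tag ends up in the returned dict, a caller that looks up only the tags it knows about discards unknown ones without extra work, which fits the requirement that both sides tolerate unknown tags.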