comments

2012-11-02 09:47:39 +01:00 · 1cd3c0a659
commit 1cd3c0a659
parent 4b63268ac5
1 changed file with 27 additions and 15 deletions
--- a/src/internfile/mh_execm.h
+++ b/src/internfile/mh_execm.h
@@ -31,11 +31,13 @@
 * with a simple question/response protocol.
 *
 * The data is exchanged in TLV fashion, in a way that should be
- * usable in most script languages. The basic unit has one line with a
- * data type and a count, followed by the data. A 'message' ends with
- * one empty line. A possible exchange:
+ * usable in most script languages. The basic unit of data has one line 
+ * with a data type and a count (both ASCII), followed by the data. A
+ * 'message' is made of one or several units or tags and ends with one empty
+ * line. 
 * 
- * From recollindex (the message begins before 'Filename'):
+ * Example from recollindex (the message begins before 'Filename' and has
+ * 'Filename' and 'Ipath' tags):
 * 
 Filename: 24
 /my/home/mail/somefolderIpath: 2
@@ -44,7 +46,7 @@ Filename: 24
 <Message ends here: because of the empty line after '22'

 * 
- * Example answer:
+ * Example answer, with 'Mimetype' and 'Data' tags
 * 
 Mimetype: 10
 text/plainData: 10
@@ -55,32 +57,42 @@ text/plainData: 10
 *        
 * This format is both extensible and reasonably easy to parse. 
 * While it's more fitted for python or perl on the script side, it
- * should even be sort of usable from the shell (ie: use dd to read
+ * should even be sort of usable from the shell (e.g.: use dd to read
 * the counted data). Most alternatives would need data encoding in
 * some cases.
 *
 * Higher level dialog:
- * The c++ program is the master and sends request messages to the script. The
- * requests have the following fields:
+ * The C++ program is the master and sends request messages to the script. 
+ * Both sides of the communication should be prepared to receive and discard 
+ * unknown tags.
+ * The messages normally have the following tags:
 *  - Filename: the file to process. This can be empty meaning that we 
 *      are requesting the next document in the current file.
 *  - Ipath: this will be present only if we are requesting a specific 
 *      subdocument inside a container file (typically for preview, at query 
 *      time). Absent during indexing (ipaths are generated and sent back from
- *      the script
+ *      the script)
 *  - Mimetype: this is the mime type for the (possibly container) file. 
- *      Can be useful to filters which handle multiple types, like rclaudio.
+ *    Can be useful to filters which handle multiple types, like rclaudio.
 *      
 * The script answers with messages having the following fields:
- *   - Document: translated document data (typically, but not always, html)
+ *   - Document: translated document data.
 *   - Ipath: ipath for the returned document. Can be used at query time to
- *       extract a specific subdocument for preview. Not present or empty for 
- *       non-container files.
- *   - Mimetype: mime type for the returned data (ie: text/html, text/plain)
+ *     extract a specific subdocument for preview. Not present or empty for 
+ *     non-container files and for the "self" document of a container.
+ *   - Mimetype: mime type for the returned data.
+ *     This is optional. For multi-document filters, if mimetype is
+ *     not present in the answer, the ipath must be a file-name-like
+ *     string which will be used to divine the mime type (this is used
+ *     typically with archives like Zip or Tar). If this fails,
+ *     the document will be handled as unknown type and the contents won't 
+ *     be indexed. When neither ipath nor mimetype are present the default 
+ *     is to attempt to treat the document as HTML.
+ *   - Charset: for document types for which it makes sense, and if the filter
+ *     has the information.
 *   - Eofnow: empty field: no document is returned and we're at eof.
 *   - Eofnext: empty field: file ends after the doc returned by this message.
 *   - SubdocError: no subdoc returned by this request, but file goes on.
- *      (the indexer (1.14) treats this as a file-fatal error anyway).
 *   - FileError: error, stop for this file.
 */
 class MimeHandlerExecMultiple : public MimeHandlerExec {
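
The framing described in the updated comment can be sketched in a few lines of C++. This is only an illustration of the wire format as the comment states it (a "Name: count" line, then exactly count bytes of data, with one empty line closing the message), not the code used by MimeHandlerExecMultiple. The helper names encodeMessage and decodeMessage and the idea of passing a set of known tags are assumptions made for the example; the tag names come from the comment. Unknown tags are read and discarded, as the comment asks of both sides.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One unit of the protocol: a tag name and its counted data.
using Unit = std::pair<std::string, std::string>;

// Hypothetical helper: serialize units as "Name: <bytecount>\n<data>" and
// close the message with one empty line.
std::string encodeMessage(const std::vector<Unit>& units) {
    std::string out;
    for (const auto& u : units) {
        out += u.first + ": " + std::to_string(u.second.size()) + "\n";
        out += u.second;  // raw data, no trailing newline
    }
    out += "\n";  // the empty line that ends the message
    return out;
}

// Hypothetical helper: parse one message from buf, starting at pos.
// Units whose tag is not in `known` are skipped over and not stored.
// Returns false if the input is truncated or malformed.
bool decodeMessage(const std::string& buf, std::size_t& pos,
                   const std::set<std::string>& known,
                   std::map<std::string, std::string>& fields) {
    fields.clear();
    for (;;) {
        std::size_t eol = buf.find('\n', pos);
        if (eol == std::string::npos) return false;  // incomplete header line
        std::string line = buf.substr(pos, eol - pos);
        pos = eol + 1;
        if (line.empty()) return true;  // empty line: end of message

        std::size_t colon = line.find(':');
        if (colon == std::string::npos) return false;
        std::string name = line.substr(0, colon);

        std::size_t count = 0;
        try {
            count = std::stoul(line.substr(colon + 1));  // skips the space
        } catch (const std::exception&) {
            return false;  // count is not a number
        }
        if (count > buf.size() - pos) return false;  // data truncated

        if (known.count(name)) fields[name] = buf.substr(pos, count);
        pos += count;  // next header starts right after the data
    }
}
```

Encoding the request from the example, {"Filename", "/my/home/mail/somefolder"} and {"Ipath", "22"}, gives the same bytes as shown in the comment: the Ipath header follows the 24 bytes of file name directly, and the empty line after "22" ends the message. Because the data is counted, it needs no escaping even when it contains newlines.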