The rclextract module

Index queries do not provide document content: only a partial and imprecise reconstruction is performed, to show the snippets text. To access the actual document data, the data extraction part of the indexing process must be performed again (subdocument access and format translation), which is not trivial in general. The rclextract module currently provides a single class, which can be used to access the data content for result documents.

Classes

The Extractor class

Methods

Extractor(doc)

An Extractor object is built from a Doc object returned by a query.

Extractor.textextract(ipath)

Extracts the document identified by ipath and returns a Doc object. The doc.text field holds the document text, converted to either text/plain or text/html, as indicated by doc.mimetype. Typical use:

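from recoll import rclextract

# query is assumed to be an already executed recoll query
qdoc = query.fetchone()
# Extractor is defined in the rclextract module
extractor = rclextract.Extractor(qdoc)
doc = extractor.textextract(qdoc.ipath)
# use doc.text, e.g. for previewing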
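
Because doc.text can hold either plain text or HTML, code that wants a plain-text preview has to branch on doc.mimetype. The following is a minimal sketch of one way to do this with the standard library only. The names preview_text and _TextCollector are hypothetical helpers, not part of the Recoll API, and the function assumes only that its argument has text and mimetype attributes holding strings, as described for the Doc returned by textextract().

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collect text nodes, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside script/style elements.
        if self._skip_depth == 0:
            self.chunks.append(data)


def preview_text(doc, max_chars=200):
    """Return a whitespace-normalized plain-text preview of doc.text."""
    text = doc.text
    if doc.mimetype == "text/html":
        parser = _TextCollector()
        parser.feed(text)
        parser.close()
        text = " ".join(parser.chunks)
    # Collapse runs of whitespace, then truncate to the preview length.
    return " ".join(text.split())[:max_chars]
```

With the objects from the example above, this could be called as preview_text(doc). Stripping tags this way is deliberately crude; an application that can render HTML would probably display the text/html output directly instead.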

Extractor.idoctofile(ipath, targetmtype, outfile='')

Extracts the document identified by ipath into an output file. The file name can be given explicitly with outfile; otherwise a temporary file is created, which the caller is responsible for deleting. Typical use:

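from recoll import rclextract

# query is assumed to be an already executed recoll query
qdoc = query.fetchone()
extractor = rclextract.Extractor(qdoc)
# no outfile given: a temporary file is created, delete it when done
filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)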
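
Since a temporary file created by idoctofile() has to be deleted by the caller, it can be convenient to tie the cleanup to a with block. The sketch below shows one possible approach. The extracted_file name is a hypothetical helper, not part of the Recoll API, and it assumes only that the extractor argument has an idoctofile(ipath, targetmtype) method returning the path of the created file.

```python
import os
from contextlib import contextmanager


@contextmanager
def extracted_file(extractor, ipath, mimetype):
    """Yield the path of a temporary extracted file, then delete it."""
    # No outfile is passed, so the extractor creates a temporary file
    # that this caller is responsible for removing.
    filename = extractor.idoctofile(ipath, mimetype)
    try:
        yield filename
    finally:
        try:
            os.remove(filename)
        except FileNotFoundError:
            # The file was already removed inside the with block.
            pass
```

Usage would then look like: with extracted_file(extractor, qdoc.ipath, qdoc.mimetype) as path: followed by the code that reads the file. The file is removed when the block exits, even if an exception is raised.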
