Usage¶
To generate a topic hierarchy for a corpus, run the bacalhau
script with the appropriate arguments. These are documented in the
script; use bacalhau -h
to see them.
Handling new document formats¶
If the corpus files are not TEI XML, an implementation of the
bacalhau.document.Document
class must be written. The name of this
class (with complete package path; for example,
bacalhau.tei_document.TEIDocument
) is passed to the bacalhau
script with --document
option.
Corpora with documents of more than a single type are not supported.