
File Converters

Use File Converters to extract text from files in different formats and cast it into the unified Document format.

Position in a Pipeline: At the very beginning of an indexing Pipeline
Input: Filename
Output: Documents
Classes: PDFToTextConverter, DocxToTextConverter, AzureConverter, ImageToTextConverter, MarkdownConverter
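
The contract summarized above (a filename goes in, Documents come out) can be sketched in plain Python. This is only an illustration built on the standard library, not Haystack's implementation: SimpleDocument and PlainTextConverter are hypothetical names, and the content/meta fields are an assumed, simplified stand-in for the Document format.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a unified document: text plus metadata."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


class PlainTextConverter:
    """Hypothetical converter: takes a file path, returns a list of documents."""

    def convert(
        self, file_path: Path, meta: Optional[Dict[str, Any]] = None
    ) -> List[SimpleDocument]:
        text = Path(file_path).read_text(encoding="utf-8")
        # Record the source file name so later steps can trace the text back to it.
        doc_meta = {"name": Path(file_path).name, **(meta or {})}
        return [SimpleDocument(content=text, meta=doc_meta)]
```

A converter for another format would keep the same shape and swap only the text-extraction step, which is what lets the rest of an indexing pipeline stay format-agnostic.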

Tutorial: To see an example of file converters in a pipeline, see our advanced indexing tutorial.

Usage


Haystack also provides a convert_files_to_dicts() utility function that converts all .txt and .pdf files in a given directory.

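from haystack.utils import convert_files_to_dicts
# Placeholder path: point doc_dir at the directory that holds your .txt or .pdf files.
doc_dir = "data/my_docs"
docs = convert_files_to_dicts(dir_path=doc_dir)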
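
To make directory-level conversion more concrete, here is a minimal standard-library sketch. It is not the implementation of convert_files_to_dicts(): the function name text_files_to_dicts is hypothetical, and the content/meta keys, the recursive search, and the restriction to .txt files are assumptions made for illustration.

```python
from pathlib import Path
from typing import Any, Dict, List


def text_files_to_dicts(dir_path: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: read every .txt file under dir_path into a dict."""
    docs = []
    # sorted() keeps the output order stable across platforms.
    for path in sorted(Path(dir_path).glob("**/*.txt")):
        text = path.read_text(encoding="utf-8")
        docs.append({"content": text, "meta": {"name": path.name}})
    return docs
```

Files with other extensions are skipped here; handling them would mean dispatching on the file suffix to a format-specific extraction step.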