Pipeline Nodes

Nodes are the core components that process and route incoming text. Some perform steps like preprocessing, retrieving or summarizing text, while others route queries through different branches of a Pipeline. Nodes are chained together using a Pipeline, and they function like building blocks that can be easily swapped for one another. A Node takes the output of the previous Node (or Nodes) as input.

Usage

All Nodes are designed to be useable within a Pipeline. When a Node has been added to a Pipeline, calling Pipeline.run() will in turn call each Node's run() method in the predefined sequence. For more information on this, see the Pipelines page.
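The sequencing described above can be pictured with a small toy model. This is only a sketch: `ToyPipeline`, `Lowercase` and `Truncate` are hypothetical classes written for illustration and are not Haystack's implementation. They only show the idea that running the pipeline calls each node's run() in the order the nodes were added, passing each output on as the next input.

```python
# Toy model only: these classes are hypothetical and are NOT Haystack's implementation.
class Lowercase:
    """Hypothetical node that lowercases its input text."""

    def run(self, text):
        return text.lower()


class Truncate:
    """Hypothetical node that keeps only the first max_words words."""

    def __init__(self, max_words):
        self.max_words = max_words

    def run(self, text):
        return " ".join(text.split()[: self.max_words])


class ToyPipeline:
    """Calls each node's run() in the order the nodes were added."""

    def __init__(self):
        self.nodes = []

    def add_node(self, component, name):
        self.nodes.append((name, component))

    def run(self, text):
        # The output of one node becomes the input of the next.
        for _name, component in self.nodes:
            text = component.run(text)
        return text


pipe = ToyPipeline()
pipe.add_node(Lowercase(), name="Lowercase")
pipe.add_node(Truncate(max_words=3), name="Truncate")
result = pipe.run("Nodes Are Chained Together In A Pipeline")
print(result)  # nodes are chained
```

Because every node exposes the same run() entry point, swapping one node for another does not change how the pipeline drives them.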

Alternatively, you can also call the Nodes outside of the Pipeline. See each individual Node's documentation page to learn more about its available methods.
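As a rough illustration of standalone use, the sketch below defines a hypothetical `ToyKeywordRetriever` (not a Haystack class) that offers both a direct method, `retrieve()`, and a pipeline-style `run()` that wraps it. The method names and the `(output, edge_name)` return shape are assumptions modeled on the decision node example on this page; check each Node's own documentation for its real methods.

```python
# Toy model only: ToyKeywordRetriever is hypothetical and is NOT a Haystack class.
class ToyKeywordRetriever:
    def __init__(self, documents):
        self.documents = documents

    def retrieve(self, query, top_k=2):
        """Direct method: rank documents by the number of shared words."""
        terms = set(query.lower().split())
        scored = []
        for doc in self.documents:
            overlap = len(terms & set(doc.lower().split()))
            if overlap > 0:
                scored.append((overlap, doc))
        # Highest overlap first; ties keep their original order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:top_k]]

    def run(self, query, top_k=2):
        """Pipeline-style entry point that wraps the direct method."""
        return {"documents": self.retrieve(query, top_k=top_k)}, "output_1"


docs = [
    "Einstein worked on relativity",
    "Pipelines chain nodes together",
    "Nodes process and route text",
]
retriever = ToyKeywordRetriever(docs)

# Called outside of any pipeline:
hits = retriever.retrieve("nodes in pipelines", top_k=2)
print(hits)
```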

Available Nodes

Node	Classes	Description
FileConverters	PDFToTextConverter, DocxToTextConverter, AzureConverter, ImageToTextConverter, MarkdownConverter	Extracts text from files in different formats and converts it into Documents
Crawler	Crawler	Scrapes websites and returns text
PreProcessor	PreProcessor	Performs cleaning and splitting on Documents
Retriever	BM25Retriever, ElasticsearchRetriever, DensePassageRetriever, TableTextRetriever, EmbeddingRetriever, TfidfRetriever, ElasticsearchFilterOnlyRetriever	Looks into a coupled Document Store and fetches Documents that are relevant to a given Query
Reader	FARMReader, TransformersReader	Finds an answer to a question by selecting a text span in the provided Documents
Generator	RAGenerator, Seq2SeqGenerator	Generates an answer to a question by reading through the provided documents and composing an answer word-by-word
Summarizer	TransformersSummarizer	Creates a shorter overview of a given Document
Translator	TransformersTranslator	Translates text from one language into another
Ranker	SentenceTransformersRanker	Reorders a set of Documents based on their relevance to the Query
Query Classifier	TransformersQueryClassifier, SklearnQueryClassifier	Distinguishes between queries that are keywords, questions or statements and routes accordingly
Question Generator	QuestionGenerator	Takes a Document as input and generates questions that it believes can be answered by the Document
Document Classifier	TransformersDocumentClassifier	Performs classification on Documents and attaches the result as metadata
Entity Extractor	EntityExtractor	Extracts predefined entities out of a piece of text
Route Documents	RouteDocuments	Routes documents based on their content type or a metadata field
Join Documents	JoinDocuments	Takes Documents from multiple Nodes and joins them to form one list of Documents
Join Answers	JoinAnswers	Takes Answers from two or more Reader or Generator nodes and joins them to produce a single list of Answers
Docs2Answers	Docs2Answers	Converts retrieved Documents into the predicted Answers format

Decision Nodes

You can add decision nodes, after which only one "branch" is executed. This allows you, for example, to classify an incoming query and route it to different modules depending on the result. For a ready-made example of a decision node, have a look at the page about the QueryClassifier.

If you'd like to define your own, you'll need to create a class that looks something like this:

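    # Import paths assume Haystack 1.x
    from haystack import Pipeline
    from haystack.nodes import BaseComponent, JoinDocuments


    class QueryClassifier(BaseComponent):
        # A decision node with two outgoing edges: "output_1" and "output_2"
        outgoing_edges = 2

        def run(self, query):
            # Queries containing a question mark go to the first branch
            if "?" in query:
                return {}, "output_1"
            # Everything else goes to the second branch
            else:
                return {}, "output_2"


    # es_retriever, dpr_retriever and reader are assumed to be initialized elsewhere
    pipe = Pipeline()
    pipe.add_node(component=QueryClassifier(), name="QueryClassifier", inputs=["Query"])
    pipe.add_node(component=es_retriever, name="ESRetriever", inputs=["QueryClassifier.output_1"])
    pipe.add_node(component=dpr_retriever, name="DPRRetriever", inputs=["QueryClassifier.output_2"])
    # Only one retriever branch runs per query; JoinDocuments merges whichever produced output
    pipe.add_node(component=JoinDocuments(join_mode="concatenate"), name="JoinResults",
                  inputs=["ESRetriever", "DPRRetriever"])
    pipe.add_node(component=reader, name="QAReader", inputs=["JoinResults"])
    res = pipe.run(query="What did Einstein work on?", params={"ESRetriever": {"top_k": 1}, "DPRRetriever": {"top_k": 3}})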
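
To make the "only one branch is executed" behaviour concrete without any installed components, here is a minimal library-free sketch. All names in it (`classify`, `run_with_routing`, `question_branch`, `keyword_branch`) are hypothetical and only mimic the routing idea; this is not how Haystack implements it internally.

```python
# Toy model only: hypothetical names, NOT Haystack's implementation.
def classify(query):
    """Return the name of the outgoing edge for this query."""
    return "output_1" if "?" in query else "output_2"


def run_with_routing(query, branches):
    """Run only the branch attached to the edge chosen by classify()."""
    edge = classify(query)
    return edge, branches[edge](query)


calls = []  # records which branches actually ran


def question_branch(query):
    calls.append("question_branch")
    return f"question: {query}"


def keyword_branch(query):
    calls.append("keyword_branch")
    return f"keywords: {query}"


branches = {"output_1": question_branch, "output_2": keyword_branch}
edge, result = run_with_routing("What did Einstein work on?", branches)
print(edge, calls)
```

After this call, `calls` contains only the branch that was selected; the other branch was never invoked.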
