Summarizer

The Summarizer produces a short overview of a long Document. You can also use it to get a quick glimpse of the Documents your Retriever returns.

You can use any summarization model from Hugging Face Transformers by passing its model name. By default, the Google Pegasus model is loaded.
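As a minimal sketch of choosing a model by name, the wrapper below passes a model name to TransformersSummarizer and falls back to google/pegasus-xsum when none is given. The function name load_summarizer is hypothetical and not part of Haystack. The import is done inside the function so the snippet can be loaded without Haystack installed.

```python
def load_summarizer(model_name_or_path: str = "google/pegasus-xsum"):
    """Return a TransformersSummarizer for the given Hugging Face model name.

    `load_summarizer` is an illustrative helper, not a Haystack API.
    """
    # Imported lazily: Haystack is only required when the function is called.
    from haystack.nodes import TransformersSummarizer

    return TransformersSummarizer(model_name_or_path=model_name_or_path)


# Example call (downloads the model weights on first use):
# summarizer = load_summarizer("facebook/bart-large-cnn")
```

Any other summarization checkpoint name can be passed the same way; facebook/bart-large-cnn is used here only as one publicly available example.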


Position in a Pipeline	After preprocessing in an indexing Pipeline or after the Retriever in a querying Pipeline
Input	Documents
Output	Documents
Classes	TransformersSummarizer

Usage

To initialize and run a stand-alone Summarizer:

from haystack.nodes import TransformersSummarizer
from haystack import Document

# Adjacent string literals are concatenated, so no indentation leaks into the text.
docs = [
    Document(
        "PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. "
        "The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by "
        "the shutoffs which were expected to last through at least midday tomorrow."
    )
]

summarizer = TransformersSummarizer(model_name_or_path="google/pegasus-xsum")
# generate_single_summary=True returns one summary covering all input documents.
summary = summarizer.predict(documents=docs, generate_single_summary=True)

The value of summary contains the generated summary under text and the original document text under meta["context"]:

[
    {
        "text": "California's largest electricity provider has turned off power to hundreds of thousands of customers.",
        "meta": {
            "context": "PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions."
        },
        ...
    }
]
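The structure above can be unpacked with a few lines of Python. This is a minimal sketch that assumes the results are plain dictionaries with a text key and a meta["context"] key, as in the output shown; summaries_with_context is a hypothetical helper name, not part of Haystack. If your Haystack version returns Document objects instead, read the corresponding attributes rather than dictionary keys.

```python
from typing import Dict, List, Tuple


def summaries_with_context(results: List[Dict]) -> List[Tuple[str, str]]:
    """Pair each summary with the original text stored under meta["context"]."""
    pairs = []
    for item in results:
        summary = item["text"]
        # Fall back to an empty string if no context was recorded.
        context = item.get("meta", {}).get("context", "")
        pairs.append((summary, context))
    return pairs


# Sample data shaped like the Summarizer output shown above.
sample = [
    {
        "text": "California's largest electricity provider has turned off power to hundreds of thousands of customers.",
        "meta": {
            "context": "PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions."
        },
    }
]

for summary_text, context_text in summaries_with_context(sample):
    print("Summary:", summary_text)
    print("Source: ", context_text)
```

Keeping the summary and its source text side by side makes it easy to spot-check whether a summary reflects the Document it came from.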

To use a Summarizer in a pipeline:

from haystack import Pipeline

# Assumes `retriever` and `summarizer` have already been initialized.
p = Pipeline()
p.add_node(component=retriever, name="ESRetriever1", inputs=["Query"])
p.add_node(component=summarizer, name="Summarizer", inputs=["ESRetriever1"])
res = p.run(query="What did Einstein work on?")
