

Intelligent question-answering and search for user interviews, powered by GPT-3.


How it works

Summ starts with a corpus of user interview transcripts. These can be in any text format, such as exports from

We pass these through a pipeline (Import -> Split -> Classify | Factify | Structure | Summarize -> Embed) to build a model that can answer questions across your entire dataset. Vector embeddings are persisted to Pinecone.
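Here is a toy sketch in plain Python that makes the shape of that pipeline concrete. Everything in it is hypothetical: the Chunk type, the placeholder enrichers, and the 2-d "embedding" stand in for summ's LLM-backed steps. It only shows the flow. Raw transcripts are imported and split into chunks. The independent Classify / Factify / Structure / Summarize steps then annotate each chunk, and the result is embedded.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for one piece of a transcript moving through the pipeline.
@dataclass
class Chunk:
    text: str
    meta: dict[str, object] = field(default_factory=dict)

def import_docs(raw: list[str]) -> list[str]:
    # Import: normalise surrounding whitespace in each transcript.
    return [doc.strip() for doc in raw]

def split(doc: str) -> list[Chunk]:
    # Split: one chunk per blank-line-separated paragraph.
    return [Chunk(p.strip()) for p in doc.split("\n\n") if p.strip()]

# Classify | Factify | Structure | Summarize: independent steps that each
# annotate a chunk. The placeholder logic stands in for LLM calls.
ENRICHERS: list[Callable[[Chunk], object]] = [
    lambda c: c.meta.setdefault("tags", ["source_interview"]),
    lambda c: c.meta.setdefault("facts", [c.text.split(".")[0]]),
    lambda c: c.meta.setdefault("structured", {"words": len(c.text.split())}),
    lambda c: c.meta.setdefault("summary", c.text[:40]),
]

def embed(chunk: Chunk) -> list[float]:
    # Embed: a toy 2-d vector; a real pipeline would call an embedding model.
    return [float(len(chunk.text)), float(len(chunk.text.split()))]

def run_pipeline(raw: list[str]) -> list[tuple[Chunk, list[float]]]:
    results = []
    for doc in import_docs(raw):
        for chunk in split(doc):
            for enrich in ENRICHERS:
                enrich(chunk)
            results.append((chunk, embed(chunk)))
    return results
```

The `|` between the middle stages is read here as "independent steps applied to the same chunk". That reading is an assumption drawn from the notation.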

Finally, we support flexible querying using a recursive question-answering scheme.
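A recursive question-answering scheme can be sketched in three steps. First, break a question into sub-questions. Next, answer those, recursing up to a depth limit. Finally, answer the original question using the sub-answers as context. This is a generic illustration, not summ's actual query code. The decompose and answer callables are hypothetical hooks; summ would back them with LLM calls.

```python
from typing import Callable

# Hypothetical hooks: summ would back these with LLM calls.
Decompose = Callable[[str], list[str]]    # question -> sub-questions
Answer = Callable[[str, list[str]], str]  # question + context -> answer

def recursive_answer(question: str, decompose: Decompose, answer: Answer,
                     depth: int = 0, max_depth: int = 2) -> str:
    # Only split further while under the depth limit.
    subs = decompose(question) if depth < max_depth else []
    # Answer each sub-question recursively.
    context = [recursive_answer(q, decompose, answer, depth + 1, max_depth) for q in subs]
    # Combine: answer the original question using the sub-answers.
    return answer(question, context)
```

The depth limit bounds the number of model calls. Without it, a decomposer that always finds sub-questions would never terminate.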

Check out this blog post for more details.


You'll need an instance of Redis Stack running.

$ brew install --cask redis-stack/redis-stack/redis-stack-server
$ brew install yasyf/summ/redis-stack
$ brew services start yasyf/summ/redis-stack
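You can check that Redis is up before running summ. The sketch below uses only the standard library and sends an inline PING command, looking for a +PONG reply. It assumes Redis Stack is listening on Redis's default port, 6379, without authentication. The function name redis_ping is our own and is not part of summ.

```python
import socket

def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 1.0) -> bool:
    """Return True if a Redis server at host:port answers PING with PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Redis accepts "inline" commands: plain text terminated by CRLF.
            sock.sendall(b"PING\r\n")
            return sock.recv(64).startswith(b"+PONG")
    except OSError:
        # Connection refused, timed out, or reset: treat as not running.
        return False
```

A False result from `redis_ping()` means the server is either not running or not reachable on that host and port.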

You'll also need to set three environment variables: OPENAI_API_KEY, PINECONE_API_KEY, and PINECONE_ENVIRONMENT.


The easiest installation uses pip:

$ pip install summ

If you prefer to use brew:

$ brew install yasyf/summ/summ

N.B. summ requires Python 3.10+.


You can confirm that summ installed properly by running the built-in example.

$ summ-example


This quickstart is taken straight from the example.


First, create a new project with:

$ summ init /path/to/project
$ cd /path/to/project


The class MyClasses in implementation/ sets out one category of tags: audio source.

from enum import StrEnum, auto
from summ.classify import Classes

class MyClasses(Classes, StrEnum):
    # SOURCE: where the audio came from
    SOURCE_PODCAST = auto()
    SOURCE_RADIO = auto()
    SOURCE_INTERVIEW = auto()


The classifiers in implementation/ use a few class attributes to define a prompt for each category of tags. It is usually sufficient to provide CATEGORY, VARS, and EXAMPLES; you may also optionally specify a PREFIX or SUFFIX for the prompt.

from textwrap import dedent
from typing_extensions import override
from summ.classify.classifier import Classifier, Document
from .classes import MyClasses

class TypeClassifier(Classifier, classes=MyClasses):
    CATEGORY = "SOURCE"
    # Human-readable labels for each prompt variable
    VARS = {
        "opening": "Opening Remarks",
        "source": "Audio Source",
    }
    # Few-shot examples mapping an opening line to its source tag
    EXAMPLES = [
        {
            "opening": "Welcome to radio one!",
            "source": MyClasses.SOURCE_RADIO,
        },
        {
            "opening": "This is the latest episode of the Science podcast.",
            "source": MyClasses.SOURCE_PODCAST,
        },
        {
            "opening": "We're sitting down today with Ben.",
            "source": MyClasses.SOURCE_INTERVIEW,
        },
    ]
    SUFFIX = f"If someone is being interviewed, the class is always {MyClasses.SOURCE_INTERVIEW}, even if the medium matches a different class."

    @override
    def classify(self, docs: list[Document]) -> dict[str, str]:
        # Use the first chunk's text as the opening remarks
        return {"opening": docs[0].page_content}
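The hypothetical build_prompt function below illustrates how those attributes might combine into a few-shot prompt. Its layout is an assumption for illustration: an instruction, the examples, the suffix, then the new item with the output variable left blank. summ's actual template may differ. Only the roles of CATEGORY, VARS, EXAMPLES, PREFIX, and SUFFIX come from the classifier above.

```python
def build_prompt(
    category: str,
    vars_: dict[str, str],
    examples: list[dict[str, str]],
    inputs: dict[str, str],
    prefix: str = "",
    suffix: str = "",
) -> str:
    def render(row: dict[str, str]) -> str:
        # One "Label: value" line per variable present in the row.
        return "\n".join(f"{label}: {row[key]}" for key, label in vars_.items() if key in row)

    # Assumption: the output variable is the one the inputs do not supply.
    target = next(key for key in vars_ if key not in inputs)
    parts = [prefix] if prefix else []
    parts.append(f"Classify each item by {category}.")
    parts.extend(render(example) for example in examples)
    if suffix:
        parts.append(suffix)
    # Leave the output label blank for the model to complete.
    parts.append(f"{render(inputs)}\n{vars_[target]}:")
    return "\n\n".join(parts)
```

In this layout, the dict returned by classify() supplies the inputs (here, the opening remarks). The model then fills in the Audio Source line.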


Finally, in implementation/, we:

  1. Ensure our classifiers are imported.
  2. Construct a Summ object for our index.
  3. Construct a Pipeline from a Path to our training data, with a custom splitter which specifies the import format.
  4. Pass the Summ and Pipeline objects to summ.CLI, which creates a command line interface for us.

from pathlib import Path

from summ import Pipeline, Summ
from summ.cli import CLI
from summ.splitter.otter import OtterSplitter

# Step 1: import your classifier module(s) here so that they are loaded.

if __name__ == "__main__":
    summ = Summ(index="cronutt-facts")

    # Training data lives in ../interviews relative to this file
    path = Path(__file__).parent.parent / "interviews"
    pipe = Pipeline.default(path, summ.index)
    pipe.splitter = OtterSplitter(
        speakers_to_exclude=[
            "Cindy Buckmaster",
            "Michelle Greenfield",
        ]
    )

    CLI.run(summ, pipe)



To run the Terminal UI, simply do:

$ python -m implementation

You can also run the steps non-interactively, as shown below.


Now, to populate our model, we can do:

$ python -m implementation populate
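Conceptually, populating runs the pipeline and stores one embedding per chunk so that later queries can retrieve the most relevant chunks. The toy sketch below shows that idea in memory. ToyIndex stands in for the Pinecone index summ actually uses, and the bag-of-words embed function is a placeholder for a real embedding model. Both are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter[str]:
    # Placeholder embedding: lower-cased word counts, punctuation stripped.
    return Counter(word.strip(".,?!").lower() for word in text.split())

def cosine(a: Counter[str], b: Counter[str]) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyIndex:
    """In-memory stand-in for a vector index such as Pinecone."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, Counter[str]]] = []

    def upsert(self, chunks: list[str]) -> None:
        # "Populate": store each chunk alongside its embedding.
        self.rows.extend((chunk, embed(chunk)) for chunk in chunks)

    def search(self, query: str, k: int = 1) -> list[str]:
        # Rank stored chunks by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

A hosted index does the same job at scale and keeps the vectors across runs, which is why summ persists its embeddings to Pinecone.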


And to query it:

$ python -m implementation query "What kind of animal is Cronutt?"
Cronutt is a California sea lion, a species of marine mammal.
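The three invocations above suggest a command-line shape that can be sketched with argparse: no arguments for the Terminal UI, populate, and query with a question. This is a hypothetical stand-in that shows the dispatch logic. It is not summ.CLI's implementation.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m implementation")
    # Subcommands are optional, so running with no arguments is allowed.
    sub = parser.add_subparsers(dest="command")
    sub.add_parser("populate", help="run the pipeline and store embeddings")
    query = sub.add_parser("query", help="ask a question of the populated model")
    query.add_argument("question")
    return parser

def dispatch(argv: list[str]) -> str:
    # Returns a label for the action chosen; a real CLI would run it instead.
    args = build_parser().parse_args(argv)
    if args.command is None:
        return "interactive"
    if args.command == "populate":
        return "populate"
    return f"query: {args.question}"
```

Making the subcommand optional is what lets a bare `python -m implementation` fall through to the interactive UI.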