Using Retrieval-Augmented Generation (RAG) pipelines#

Overview#

Retrieval-augmented generation (RAG) is a popular method for generating natural language answers to questions using LLMs and indexed data. It retrieves data relevant to a query and then sends it, along with a prompt, to an LLM to synthesize an answer. Sycamore implements RAG using an OpenSearch Search Pipeline to orchestrate interactions with LLMs.

[Diagram: flow of the RAG Search Processor]

The diagram above shows the flow of the RAG Search Processor:

  1. The results from a hybrid search query are retrieved as the search context.

  2. The previous interactions from conversational memory are retrieved as the conversation context.

  3. The processor constructs a prompt for the LLM from the search context, conversation context, and prompt template. It sends this prompt to the LLM and receives a response.

  4. The response, the original question, and additional metadata are saved in conversational memory as an interaction.

  5. The generative response and the list of hybrid search results are returned to the application.

If a conversation ID isn't supplied, the processor neither retrieves the conversation context nor adds an interaction to conversational memory.
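To continue an existing conversation, pass its ID in generative_qa_parameters. A minimal sketch of the ext block, assuming a conversation created through the conversational memory APIs (the parameter is named conversation_id here; newer OpenSearch versions call it memory_id):

"ext": {
  "generative_qa_parameters": {
    "llm_question": "Who wrote the book of love?",
    "conversation_id": "<conversation id>"
  }
}

The rest of the query is the same hybrid query shown below.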

Using the RAG pipeline#

Sycamore has a default RAG pipeline named hybrid_rag_pipeline, which uses OpenAI's GPT-3.5-turbo as the LLM by default. Sycamore is compatible with the OpenSearch query API, and you need to provide the ID of the embedding model to use in the query. To use the pipeline, specify it in your search query and add the generative_qa_parameters extension:

GET <index_name>/_search?search_pipeline=hybrid_rag_pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text_representation": "Who wrote the book of love"
          }
        },
        {
          "neural": {
            "embedding": {
              "query_text": "Who wrote the book of love",
              "model_id": "<embedding model id>",
              "k": 100
            }
          }
        }
      ]
    }
  },
  "ext": {
    "generative_qa_parameters": {
    "llm_question": "Who wrote the book of love?"
    }
  }
}

The resulting generative answer from the RAG pipeline is returned in the ext section of the response.
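The response shape (hits omitted) looks roughly like this, with the answer under retrieval_augmented_generation; the exact layout can vary by OpenSearch version:

{
  "hits": { ... },
  "ext": {
    "retrieval_augmented_generation": {
      "answer": "<generated answer>"
    }
  }
}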

You can choose a different OpenAI LLM by adding the llm_model parameter to override the default setting. For example, to change it to GPT-4:

GET <index_name>/_search?search_pipeline=hybrid_rag_pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text_representation": "Who wrote the book of love"
          }
        },
        {
          "neural": {
            "embedding": {
              "query_text": "Who wrote the book of love",
              "model_id": "<embedding model id>",
              "k": 100
            }
          }
        }
      ]
    }
  },
  "ext": {
    "generative_qa_parameters": {
      "llm_question": "Who wrote the book of love?",
      "llm_model": "gpt-4"
    }
  }
}

Customize the RAG pipeline#

To create a RAG pipeline, you must first have a remote LLM wrapper deployed with ml-commons. A minimal sketch of that setup is below, using the ml-commons connector and model APIs; the connector body is illustrative for OpenAI's chat completions endpoint, and the exact fields vary by OpenSearch version and LLM provider:
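POST /_plugins/_ml/connectors/_create
{
  "name": "OpenAI chat connector",
  "description": "Connector to the OpenAI chat completions API",
  "version": 1,
  "protocol": "http",
  "parameters": {
    "endpoint": "api.openai.com",
    "model": "gpt-4"
  },
  "credential": {
    "openAI_key": "<your OpenAI API key>"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/v1/chat/completions",
      "headers": {
        "Authorization": "Bearer ${credential.openAI_key}"
      },
      "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": ${parameters.messages} }"
    }
  ]
}

POST /_plugins/_ml/models/_register
{
  "name": "openai-gpt-4",
  "function_name": "remote",
  "description": "OpenAI GPT-4 remote model",
  "connector_id": "<connector id returned by the previous call>"
}

POST /_plugins/_ml/models/<remote_model_id>/_deploy

Then, for example, to create a RAG pipeline called my_rag_pipeline using OpenAI GPT-4: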

PUT /_search/pipeline/my_rag_pipeline
{
  "description": "Retrieval Augmented Generation Pipeline",
  "response_processors": [
    {
      "retrieval_augmented_generation": {
        "tag": "openai_pipeline_demo",
        "model_id": "<remote_model_id>",
        "context_field_list": [
          "text_representation"
        ],
        "llm_model": "gpt-4"
      }
    }
  ]
}
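Here, context_field_list specifies which document fields are passed to the LLM as context; text_representation is the field Sycamore uses to hold a document's text.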

To use this pipeline, reference it in your OpenSearch query:

GET <index_name>/_search?search_pipeline=my_rag_pipeline
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "text_representation": "Who wrote the book of love"
          }
        },
        {
          "neural": {
            "embedding": {
              "query_text": "Who wrote the book of love",
              "model_id": "<embedding model id>",
              "k": 100
            }
          }
        }
      ]
    }
  },
  "ext": {
    "generative_qa_parameters": {
      "llm_question": "Who wrote the book of love?"
    }
  }
}

For more information, visit the OpenSearch documentation.