
OpenSearch is an open-source, flexible, scalable full-text search engine based on a 2021 fork of Elasticsearch. Its built-in functionality and clear structure make it easy to build hybrid search applications.

Configuration for OpenSearch

Please see OpenSearch's installation page for more in-depth information on installing, configuring, and running OpenSearch. This section covers only the setup required to run a simple demo app.

For local development and testing, we recommend running OpenSearch with Docker Compose. The compose.yml file below runs a single OpenSearch node. OpenSearch has an associated low-level Python library, opensearch-py, that makes querying easier.

version: '3'
services:
  opensearch:
    image: opensearchproject/opensearch:2.10.0
    container_name: opensearch
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true # Disable JVM heap memory swapping
    ulimits:
      memlock:
        soft: -1 # Set memlock to unlimited (no soft or hard limit)
        hard: -1
    ports:
      - 9200:9200 # REST API

With this file in place, you can start OpenSearch by running docker compose up.
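
Once the container is up, it can be useful to confirm that the node is accepting requests before writing to it. The following is a minimal sketch using only the Python standard library and the OpenSearch cluster health endpoint. It assumes the REST API is reachable over plain HTTP at localhost:9200 with no authentication; if the security plugin is enabled in your setup, the URL scheme and credentials will differ. The names HEALTH_URL, is_ready, and fetch_health are illustrative, not part of Sycamore or opensearch-py.

```python
import json
import urllib.request

# Assumption: the compose file above publishes the REST API on localhost:9200
# over plain HTTP. Switch to https and add credentials if security is enabled.
HEALTH_URL = "http://localhost:9200/_cluster/health"


def is_ready(health: dict) -> bool:
    """Return True if the reported cluster status allows reads and writes."""
    # A single-node cluster commonly reports "yellow" because replica shards
    # have nowhere to go; "red" means at least one primary shard is unassigned.
    return health.get("status") in ("green", "yellow")


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the JSON body returned by the cluster health endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling is_ready(fetch_health()) after docker compose up should return True once the node has finished starting. A connection error raised by fetch_health usually means the container is still booting.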

Writing to OpenSearch

To write a DocSet to an OpenSearch index from Sycamore, use the docset.write.opensearch(...) function. The OpenSearch writer takes the following arguments:

  • os_client_args: Keyword parameters that are passed to the opensearch-py OpenSearch client constructor.

  • index_name: The name of the OpenSearch index into which to load this DocSet.

  • index_settings: Settings and mappings to pass when creating a new index. Specified as a Python dict corresponding to the JSON parameters accepted by the OpenSearch CreateIndex API; see the CreateIndex API documentation for details.

  • execute: (optional, default=True) Whether to execute this Sycamore pipeline immediately, or return a DocSet so that more transforms can be added.

To write a DocSet to the OpenSearch index run by the Docker Compose file above, we can write the following:

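index_name = "test_index-other"

os_client_args = {
    "hosts": [{"host": "localhost", "port": 9200}],
    "http_auth": ("user", "password"),
}

index_settings = {
    "body": {
        "settings": {
            "index.knn": True,
        },
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": 384,
                    "method": {"name": "hnsw", "engine": "faiss"},
                },
            },
        },
    },
}

# docset is an existing Sycamore DocSet
docset.write.opensearch(
    os_client_args=os_client_args,
    index_name=index_name,
    index_settings=index_settings,
)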

More information can be found in the API documentation. A demo of the writer can also be found in the demo notebook.
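
The dimension given in index_settings has to match the length of the vectors written to the embedding field (384 in the example above). The following is a minimal sketch of a helper that builds the same settings for an arbitrary dimension, which may be convenient when switching embedding models. knn_index_settings is a hypothetical name introduced here for illustration; it is not part of Sycamore or opensearch-py.

```python
def knn_index_settings(dimension: int, field: str = "embedding") -> dict:
    """Build index settings with one k-NN vector field (hypothetical helper)."""
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {
        "body": {
            # Enable k-NN search on the index
            "settings": {"index.knn": True},
            "mappings": {
                "properties": {
                    field: {
                        "type": "knn_vector",
                        # Must equal the length of the stored embedding vectors
                        "dimension": dimension,
                        "method": {"name": "hnsw", "engine": "faiss"},
                    }
                }
            },
        }
    }


# Reproduces the settings from the example above
index_settings = knn_index_settings(384)
```

The resulting dict can be passed as the index_settings argument of the writer in the same way as the hand-written one.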

Reading from OpenSearch

In addition to the os_client_args and index_name arguments described above, the OpenSearch reader accepts an optional query parameter: a dictionary written in the OpenSearch query DSL (see the OpenSearch query DSL documentation for details). If query is not specified, the reader returns a full scan of all documents in the index.

ctx = sycamore.init()
docset = ctx.read.opensearch(os_client_args=os_client_args, index_name=index_name, query={"query": {"term": {"_id": "SAMPLE-DOC-ID"}}})
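
Because the query is an ordinary Python dictionary, it can be built programmatically rather than written out by hand. The following is a minimal sketch of three small builders for common query DSL clauses (term, match, and match_all). The function names are hypothetical, and the field name "text_representation" in the usage lines is an assumption about the index mapping; substitute a field that exists in your index.

```python
def term_query(field: str, value: str) -> dict:
    """Exact-match query on a single field (hypothetical helper)."""
    return {"query": {"term": {field: value}}}


def match_query(field: str, text: str) -> dict:
    """Full-text query against an analyzed text field (hypothetical helper)."""
    return {"query": {"match": {field: text}}}


def match_all_query() -> dict:
    """Query that matches every document in the index (hypothetical helper)."""
    return {"query": {"match_all": {}}}


# Same query as the example above, looking up one document by its ID
by_id = term_query("_id", "SAMPLE-DOC-ID")

# Assumed field name; replace with a text field from your own mapping
by_text = match_query("text_representation", "hybrid search")
```

Any of these dictionaries can be passed as the query argument of the reader.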

More information can be found in the API documentation.