Installing and Getting Started With Sycamore

Install Library

We recommend installing the Sycamore library using pip:

pip install sycamore-ai

Connectors for vector databases can be installed via extras. For example,

pip install "sycamore-ai[opensearch]"

will install Sycamore with OpenSearch support. You can find a list of supported connectors here.

By default, Sycamore works with Aryn DocParse to process documents. To run inference locally, install the local-inference extra as follows:

pip install "sycamore-ai[local-inference]"
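
To confirm that the install succeeded in the active environment, a quick check like the following may help. It is a minimal sketch that uses only the Python standard library; the helper name installed_version is illustrative and is not part of Sycamore.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "sycamore-ai" is the distribution name used in the pip commands above.
    version = installed_version("sycamore-ai")
    if version is None:
        print("sycamore-ai is not installed in this environment")
    else:
        print(f"sycamore-ai {version} is installed")
```

If the check reports that the package is missing, the most likely cause is that pip installed it into a different Python environment than the one running the script.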

Next, set the API keys for the services you plan to use, such as Aryn DocParse for processing documents (sign up here for free) or OpenAI to use GPT models with Sycamore's LLM-based transforms.
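
Before running a pipeline, it can be useful to check that the keys are visible to Python. The sketch below assumes the keys are supplied through environment variables. OPENAI_API_KEY is the variable the OpenAI client reads; ARYN_API_KEY is an assumed name for the DocParse key, so confirm it against the Aryn documentation. The helper missing_keys is illustrative and is not part of Sycamore.

```python
import os

# Assumed variable names: OPENAI_API_KEY is read by the OpenAI client;
# ARYN_API_KEY is an assumption for Aryn DocParse and should be verified.
REQUIRED_KEYS = ("ARYN_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All API keys are set")
```

Only the keys for the services you actually use need to be set; for example, a pipeline without LLM-based transforms would not need the OpenAI key.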

Now that you have installed Sycamore, you can see it in action using the example Jupyter notebooks. Many of these examples load a vector database in the last step of the processing pipeline, but you can edit the notebook to write the data to a different target database or out to a file. Visit the Sycamore GitHub for the sample notebooks.

Here are a few good notebooks to start with:

  • An intermediate ETL tutorial notebook walking through an ETL flow with chunking (using DocParse), LLM-based data enrichment, data cleaning, and loading a Pinecone hybrid search index

  • A notebook showing a simple processing job using DocParse to chunk PDFs, two LLM-based entity extraction transforms, and loading an OpenSearch hybrid index (vector + keyword)

  • A notebook that visually shows the bounding boxes created by DocParse.

  • A more advanced Sycamore pipeline that chunks PDFs using DocParse, performs schema extraction and population using LLM transforms, cleans the data using Python, and loads an OpenSearch hybrid index (vector + keyword)

  • A notebook showing how to load a Pinecone vector database. There are other example notebooks showing sample code for loading other targets here.