Installing and Getting Started With Sycamore
Install Library
We recommend installing the Sycamore library using pip:
pip install sycamore-ai
Connectors for vector databases can be installed via extras. For example,
pip install sycamore-ai[opensearch]
will install Sycamore with OpenSearch support. You can find a list of supported connectors here.
By default, Sycamore works with Aryn DocParse to process documents. To run inference locally, install the local-inference extra as follows:
pip install sycamore-ai[local-inference]
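To confirm that the installation succeeded, one option is to query the installed package metadata from Python. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Sycamore.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "sycamore-ai" is the distribution name used in the pip commands above.
    version = installed_version("sycamore-ai")
    if version is None:
        print("sycamore-ai is not installed in this environment")
    else:
        print(f"sycamore-ai {version} is installed")
```

If the script reports that the package is missing, check that pip and the Python interpreter you are running belong to the same environment. Some shells, such as zsh, also treat square brackets as glob patterns, so quoting the requirement (for example `pip install "sycamore-ai[opensearch]"`) can avoid a "no matches found" error.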
Next, set the API keys for the services you plan to use, such as Aryn DocParse for processing documents (sign up here for free) or OpenAI to use GPT models with Sycamore's LLM-based transforms.
Now that you have installed Sycamore, you can see it in action using the example Jupyter notebooks. Many of these examples load a vector database in the last step of the processing pipeline, but you can edit a notebook to write the data to a different target database or to a file. Visit the Sycamore GitHub repository for the sample notebooks.
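API keys are commonly supplied through environment variables. The sketch below checks that the expected variables are present before a notebook is run. The variable names `ARYN_API_KEY` and `OPENAI_API_KEY` are assumptions based on common convention, and the helper `missing_keys` is hypothetical; confirm the exact names in each service's documentation.

```python
import os
from typing import Iterable, List, Mapping

# Assumed variable names; verify them against each service's documentation.
DEFAULT_KEYS = ("ARYN_API_KEY", "OPENAI_API_KEY")


def missing_keys(
    env: Mapping[str, str], required: Iterable[str] = DEFAULT_KEYS
) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys(os.environ)
    if missing:
        print("Set these environment variables first: " + ", ".join(missing))
    else:
        print("All expected API keys are set")
```

Keeping keys in the environment, rather than in notebook cells, makes it less likely that they are committed to version control by accident.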
Here are a few good notebooks to start with:
An intermediate ETL tutorial notebook walking through an ETL flow with chunking (using DocParse), LLM-based data enrichment, data cleaning, and loading a Pinecone hybrid search index
A notebook showing a simple processing job using DocParse to chunk PDFs, two LLM-based entity extraction transforms, and loading an OpenSearch hybrid index (vector + keyword)
A notebook that visually shows the bounding boxes created by DocParse.
A more advanced Sycamore pipeline that chunks PDFs using DocParse, performs schema extraction and population using LLM transforms, cleans data using Python, and loads an OpenSearch hybrid index (vector + keyword)
A notebook showing how to load a Pinecone vector database. There are other example notebooks showing sample code for loading other targets here.
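For orientation before opening the notebooks, the first stages of such a pipeline might look roughly like the sketch below. It is an illustration under assumptions, not code taken from the notebooks: the function name `partition_pdfs` and the directory path are hypothetical, the default partitioner is assumed to find the Aryn API key in the environment, and the exact call signatures should be checked against the Sycamore API reference for your installed version.

```python
def partition_pdfs(pdf_dir: str, limit: int = 5):
    """Read PDFs from pdf_dir, chunk them with DocParse, and return a few chunks.

    Hypothetical helper for illustration only; the notebooks go further by
    adding enrichment, embedding, and write steps.
    """
    # Imports are deferred so this file can be loaded without Sycamore installed.
    import sycamore
    from sycamore.transforms.partition import ArynPartitioner

    context = sycamore.init()
    docset = (
        context.read.binary(paths=[pdf_dir], binary_format="pdf")
        # Assumes the Aryn API key is available to the partitioner.
        .partition(partitioner=ArynPartitioner())
        # Turn each document's elements into separate chunk-level documents.
        .explode()
    )
    # Materialize only a handful of results for a quick inspection.
    return docset.take(limit)


if __name__ == "__main__":
    # "./pdfs" is a placeholder path; point it at a directory of PDF files.
    for doc in partition_pdfs("./pdfs"):
        print(doc)
```

The example stops before any write step. To load a vector database or write to a file, follow the corresponding notebook for the target you choose.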