Loading Pinecone with Sycamore

This tutorial shows how to build an ETL pipeline with Sycamore that loads a Pinecone vector database. It walks through an intermediate-level ETL flow: partitioning, extraction, cleaning, chunking, embedding, and loading. You will need three API keys: an Aryn Partitioning Service API key, an OpenAI API key (for LLM-powered data enrichment and for creating vector embeddings), and a Pinecone API key (for creating and using a vector index). At the time of writing, all of these services offer a free trial or a free tier.
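Because the pipeline depends on three separate services, it can help to confirm that all three keys are available before running anything. The following is a minimal sketch, not part of Sycamore: the helper `missing_keys` is hypothetical, and the environment variable names `ARYN_API_KEY`, `OPENAI_API_KEY`, and `PINECONE_API_KEY` are assumptions that you may need to adjust to match how your notebook reads its credentials.

```python
import os
from typing import Mapping

# Assumed variable names; change them if your setup stores the keys differently.
REQUIRED_KEYS = ("ARYN_API_KEY", "OPENAI_API_KEY", "PINECONE_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required key names that are unset or blank in env."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


# Report which keys still need to be set, without printing their values.
missing = missing_keys()
if missing:
    print("Set these before running the pipeline: " + ", ".join(missing))
else:
    print("All three API keys are set.")
```

Running this check first tends to surface a missing key immediately, rather than partway through the partitioning or embedding steps.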

Run this tutorial in a Colab notebook or locally with Jupyter.

Once your data is loaded into Pinecone, you can use Pinecone's query features for semantic search, or a framework such as LangChain for RAG. The Pinecone Writer example notebook includes sample LangChain code at the end.
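To make the idea of semantic search concrete, the sketch below ranks stored vectors by cosine similarity to a query vector, which is conceptually what a vector index does when it is configured with a cosine metric. It is a toy, in-memory illustration only: it does not call Pinecone, the names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical, and the three-dimensional vectors stand in for real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], index: dict[str, Sequence[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "index": chunk ids mapped to made-up embedding vectors.
toy_index = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.7, 0.7, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}

# The query points mostly along the first axis, so chunk-a ranks first.
results = top_k([1.0, 0.1, 0.0], toy_index, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

In a real deployment the query vector would come from the same embedding model used during loading, and the ranking would be performed by the vector database rather than in application code.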