Build a Semantic Search Engine over a PDF Document Using LangChain
In this tutorial, we'll build a semantic search engine over a PDF document, running entirely locally with LangChain and llama-cpp-python.
This guide introduces core LangChain concepts like document loaders, embeddings, vector stores, and retrievers, which together allow us to fetch and reason over data in AI applications, and are especially useful for retrieval-augmented generation (RAG).
You can find the full code as a Colab notebook in my GitHub repository.
In this guide, we'll cover the following:
Load and chunk text from PDFs
Create embeddings to represent document meaning
Store and search documents in a vector store
Use retrievers to integrate semantic search into LLM workflows
Use PromptTemplate and RetrievalQA Chain for Q&A
Let's start by setting up the environment.
Install the dependencies using pip or conda:
pip install langchain langchain_community "langchain-chroma>=0.1.2" llama-cpp-python pdfplumber sentence-transformers numpy
We'll also need to download a Llama model from Hugging Face or any other local repository we prefer. To run it locally, we use llama-cpp-python, which lets us run Llama models on our machine without calling external APIs.
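For example, we can pull a quantized GGUF model with the huggingface_hub client (the repository and file name below are just one example; any GGUF chat model you prefer will work):
from huggingface_hub import hf_hub_download
# Download a quantized Llama GGUF file into the local Hugging Face cache
# (example repo/file; swap in whichever GGUF model you prefer)
model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_K_M.gguf",
)
print(model_path)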
Step 1: Extract Text from the PDF
We will first extract the text from the PDF document. For this task, we'll use pdfplumber (through LangChain's PDFPlumberLoader), as it handles complex PDFs well, including those with tables and multi-column layouts.
from langchain_community.document_loaders import PDFPlumberLoader
file_path = "nke-10k-2023.pdf"
loader = PDFPlumberLoader(file_path)
docs = loader.load()  # one Document per PDF page
print(len(docs), docs[0].page_content[:200], docs[0].metadata)
Step 2: Split Text into Chunks with LangChain’s RecursiveCharacterTextSplitter
To improve retrieval, we can split the document into smaller chunks. This helps ensure that the search process captures meaningful context in smaller portions.
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000, chunk_overlap=200, add_start_index=True
)
# Split the loaded documents into chunks
all_splits = text_splitter.split_documents(docs)
print(f"Total chunks: {len(all_splits)}")
Step 3: Generate Embeddings Using SentenceTransformer
We’ll now use SentenceTransformers to generate an embedding for each chunk. SentenceTransformers provides a simple interface for loading pre-trained models and generating sentence-level embeddings.
from sentence_transformers import SentenceTransformer
# Load a pretrained Sentence Transformer model
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
# Generate an embedding vector for each chunk
chunk_texts = [chunk.page_content for chunk in all_splits]
embedding_vectors = embedding_model.encode(chunk_texts, convert_to_numpy=True)
print(f"Generated {len(embedding_vectors)} embeddings.")
Step 4: Store Embeddings in Chroma (Vector Store)
Next, we will use Chroma, an open-source vector store, to store and index the embeddings. Chroma makes it easy to work with embeddings and provides efficient search capabilities.
import chromadb
# Initialize a persistent Chroma client and create a collection for the embeddings
persist_directory = "chroma_db"
client = chromadb.PersistentClient(path=persist_directory)
collection = client.get_or_create_collection(name="document_embeddings")
# Add the chunk texts, embeddings, and metadata to Chroma
collection.add(
    documents=chunk_texts,
    embeddings=embedding_vectors.tolist(),
    metadatas=[{"start_index": chunk.metadata["start_index"]} for chunk in all_splits],
    ids=[str(i) for i in range(len(chunk_texts))]
)
print("Embeddings stored in the Chroma vector store.")
Step 5: Search Using Similarity Search with Scores and Retriever Integration
The similarity_search, similarity_search_with_score, and as_retriever methods used below come from LangChain's Chroma vector-store wrapper (the langchain-chroma package we installed), so we first wrap the collection created above in a Chroma vector store, passing the same embedding model so that queries are encoded consistently with the stored chunks.
The similarity_search method finds the most relevant documents for a query and returns the closest match(es) based on vector similarity.
The similarity_search_with_score method returns the documents along with their similarity scores, providing a ranking of relevance. This is helpful if we want to assess how closely each document matches the query.
The as_retriever method converts the vector store into a retriever, making it easier to integrate into more advanced workflows, such as a question-answering system built around an LLM (Large Language Model). The retriever searches by similarity and lets us adjust how many results to return (search_kwargs={"k": 3} means the top 3 results).
from langchain_chroma import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
# Wrap the existing Chroma collection in a LangChain vector store,
# reusing the same embedding model so queries are encoded like the chunks
embedding_function = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store = Chroma(
    client=client,
    collection_name="document_embeddings",
    embedding_function=embedding_function,
)
# Use similarity search with the vector store
results = vector_store.similarity_search("How many distribution centers does Nike have in the US?")
print(results[0].page_content)  # Print the content of the most relevant document
# Use similarity search with scores
doc, score = vector_store.similarity_search_with_score("Nike revenue in 2023")[0]
print(f"Score: {score}\nText: {doc.page_content}")
# Convert the vector store into a retriever
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 3})
# Use the retriever to fetch relevant documents for a query
docs = retriever.invoke("When was Nike incorporated?")
print(docs)  # Display the retrieved documents
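As a side note, as_retriever supports other search strategies too; for example, maximal marginal relevance (MMR) trades off relevance against diversity among the returned chunks. A minimal sketch with illustrative settings:
# Optional: an MMR retriever that balances relevance with diversity
mmr_retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "fetch_k": 10},
)
print(mmr_retriever.invoke("Nike's international markets"))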
Step 6: Use PromptTemplate and RetrievalQA Chain for Q&A
The PromptTemplate lets us create dynamic prompts for the LLM based on the retrieved context. This ensures the LLM is given relevant information (context) to answer questions accurately.
Finally, we use the RetrievalQA chain, which takes the LLM and the retriever to combine document retrieval and question-answering in one seamless process.
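The chain below needs a local LLM. One way to get one, assuming a GGUF model was downloaded during setup (the file name here is only an example), is LangChain's LlamaCpp wrapper around llama-cpp-python:
from langchain_community.llms import LlamaCpp
# Load the local GGUF model with llama-cpp-python
# (model_path and parameters are examples; point them at your own model)
llm = LlamaCpp(
    model_path="llama-2-7b-chat.Q4_K_M.gguf",
    n_ctx=4096,       # context window size
    temperature=0.1,  # keep answers factual
    verbose=False,
)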
from langchain_core.prompts import PromptTemplate
from langchain.chains import RetrievalQA
# Create a prompt template for the Q&A system
template = """
You are a helpful assistant that answers questions based on the provided document context.
Context:
{context}
Question:
{question}
Answer in a concise, fact-based manner:
"""
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)
# Initialize the RetrievalQA chain (default "stuff" chain type)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,  # the local LlamaCpp model initialized above
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
)
# Perform a question-answering task
response = qa_chain.invoke({"query": "Summarize Nike's financial highlights for 2023."})
print(response["result"])
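If we also want to see which chunks each answer was grounded in, RetrievalQA can return the retrieved documents alongside the result:
# Rebuild the chain so it also returns the source chunks used for each answer
qa_chain_with_sources = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True,
)
output = qa_chain_with_sources.invoke({"query": "When was Nike incorporated?"})
print(output["result"])
for source in output["source_documents"]:
    print(source.metadata, source.page_content[:100])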
This approach offers a simple and effective way to build a local semantic search engine over PDFs, using SentenceTransformers for embedding creation, Chroma for storing and querying embeddings, and LangChain for combining the document-retrieval and question-answering components.
With this design, you can easily add semantic search, perform question answering, and scale to other kinds of documents or models as needed.

