
The Complete LangChain Guide 2026: Building AI Applications with LLMs - Chains, Agents, RAG, and Production Practices

25 min read · By DevToolBox Team

TL;DR: LangChain is the leading open-source framework for building LLM applications, providing model abstractions, prompt templates, chain composition, retrieval-augmented generation (RAG), agent tool calling, and conversation memory. Combined with LangSmith (tracing and debugging), LangGraph (complex workflows), and LangServe (API deployment), it covers the full AI application development cycle from prototype to production.
Key Takeaways
  • The LangChain ecosystem has four main components: LangChain (core), LangSmith (observability), LangGraph (state graphs), LangServe (deployment)
  • LCEL (LangChain Expression Language) composes Runnables with the pipe operator |, with streaming, batching, and async support
  • The RAG pipeline has five steps: load documents → split → embed → store in a vector store → retrieve and generate
  • Agents use the ReAct pattern to let the LLM reason about which tools to pick, with support for custom tools and multi-step reasoning
  • LangGraph handles complex workflows that need cycles, branching, or human intervention
  • LangSmith provides end-to-end tracing and is an essential tool for debugging in production

What Is LangChain?

LangChain is an open-source framework for building applications powered by large language models (LLMs). It abstracts LLM calls, prompt management, external data retrieval, tool use, and conversation memory into composable modules, letting developers focus on application logic rather than low-level integrations. LangChain supports Python and JavaScript and integrates with 100+ LLM providers and tools.

The LangChain Ecosystem

# LangChain Ecosystem Architecture
#
# +-------------------+     +-------------------+
# |    LangChain      |     |    LangGraph      |
# |  (Core Framework) |     | (Stateful Graphs) |
# |  Models, Prompts, |     | Cycles, Branching |
# |  Chains, Agents,  |<--->| Multi-Agent,      |
# |  Memory, RAG      |     | Human-in-the-Loop |
# +-------------------+     +-------------------+
#          |                          |
#          v                          v
# +-------------------+     +-------------------+
# |    LangSmith      |     |    LangServe      |
# |  (Observability)  |     |   (Deployment)    |
# |  Tracing, Evals,  |     |  FastAPI, REST,   |
# |  Monitoring, Debug|     |  Playground, Docs |
# +-------------------+     +-------------------+

Installation and Setup

LangChain uses a modular package structure: the core package and the individual integration packages are installed separately.

# Core installation
pip install langchain langchain-core

# LLM provider integrations
pip install langchain-openai        # ChatGPT / GPT-4
pip install langchain-anthropic     # Claude
pip install langchain-google-genai  # Gemini
pip install langchain-community     # 100+ community integrations

# Vector stores & embeddings
pip install langchain-chroma        # ChromaDB
pip install langchain-pinecone      # Pinecone
pip install faiss-cpu               # FAISS (CPU)

# Additional ecosystem tools
pip install langgraph               # Stateful workflows
pip install langserve               # API deployment
pip install langsmith               # Tracing & evaluation

# Environment variables
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="ls__..."   # LangSmith

Core Concepts

Chat Models

Chat models are LangChain's foundational component, providing a unified interface for calling OpenAI, Anthropic, Ollama, and other LLMs. All models implement the same BaseChatModel interface and support streaming output, async calls, and structured output.

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_community.chat_models import ChatOllama
from langchain_core.messages import HumanMessage, SystemMessage

# OpenAI GPT-4o
llm_openai = ChatOpenAI(model="gpt-4o", temperature=0.7)

# Anthropic Claude
llm_claude = ChatAnthropic(model="claude-sonnet-4-20250514", temperature=0)

# Local model via Ollama
llm_local = ChatOllama(model="llama3.1:8b")

# All models share the same interface
messages = [
    SystemMessage(content="You are a helpful coding assistant."),
    HumanMessage(content="Write a Python function to merge two sorted lists."),
]
response = llm_openai.invoke(messages)
print(response.content)

# Streaming output
for chunk in llm_openai.stream(messages):
    print(chunk.content, end="", flush=True)

Prompt Templates and Output Parsing

Prompt templates parameterize reusable prompts; output parsers turn raw LLM text into structured data. Together with LCEL pipes, they form type-safe LLM call chains.

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from pydantic import BaseModel, Field

# Simple prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a {role} expert."),
    ("human", "{question}"),
])

# String output parser (most common)
chain = prompt | llm_openai | StrOutputParser()
result = chain.invoke({"role": "Python", "question": "Explain decorators"})

# Structured output with Pydantic
class CodeReview(BaseModel):
    issues: list[str] = Field(description="List of code issues found")
    score: int = Field(description="Code quality score 1-10")
    suggestion: str = Field(description="Main improvement suggestion")

structured_llm = llm_openai.with_structured_output(CodeReview)
review = structured_llm.invoke("Review this code: def f(x): return x+1")
print(review.score, review.issues)

LCEL: The LangChain Expression Language

LCEL is LangChain's declarative composition language: the pipe operator | chains Runnable components together. Each component implements invoke, stream, batch, ainvoke, and related methods, so LCEL chains get streaming output, concurrent batching, and LangSmith tracing automatically.

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

# Basic LCEL chain: prompt | model | parser
chain = (
    ChatPromptTemplate.from_template("Summarize this text: {text}")
    | llm_openai
    | StrOutputParser()
)

# Invoke, stream, or batch
result = chain.invoke({"text": "LangChain is a framework..."})
for chunk in chain.stream({"text": "LangChain is a framework..."}):
    print(chunk, end="")
texts = ["First document text...", "Second document text..."]
results = chain.batch([{"text": t} for t in texts])  # runs concurrently

# Parallel branches with RunnableParallel
analysis = RunnableParallel(
    summary=ChatPromptTemplate.from_template("Summarize: {text}") | llm_openai | StrOutputParser(),
    sentiment=ChatPromptTemplate.from_template("Sentiment of: {text}") | llm_openai | StrOutputParser(),
    keywords=ChatPromptTemplate.from_template("Keywords from: {text}") | llm_openai | StrOutputParser(),
)
output = analysis.invoke({"text": "Great product, fast shipping!"})
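The pipe composition above is easier to demystify with a tiny pure-Python sketch: `a | b` works because `__or__` builds a composed pipeline. This is only an illustration of the idea, not the real Runnable API; the `Step` class and the toy components are invented here.

```python
# Minimal sketch of LCEL-style piping in plain Python (NOT the real Runnable API).
class Step:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def __or__(self, other):
        # `self | other` returns a new Step that runs self, then other
        return Step(lambda x: other.invoke(self.invoke(x)))

# Three toy "components": prompt formatting, a stand-in model, a parser
prompt = Step(lambda d: f"Summarize: {d['text']}")
model = Step(lambda p: p.upper())   # pretend this is an LLM call
parser = Step(lambda s: s.strip())

chain = prompt | model | parser
print(chain.invoke({"text": "hello"}))  # SUMMARIZE: HELLO
```

The real LCEL operator additionally propagates streaming, batching, and tracing through the composed chain, but the composition mechanism is the same shape.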

RAG: Retrieval-Augmented Generation

RAG lets an LLM answer questions grounded in your private data. The pipeline has an indexing phase (offline) and a querying phase (online), covering five steps: document loading, splitting, embedding, storage, and retrieval plus generation.

# RAG Pipeline Architecture
#
# INDEXING (offline):
#   Documents --> Loader --> Splitter --> Embeddings --> Vector Store
#   (PDF/Web/DB)  (chunks)  (RecursiveChar) (OpenAI)    (Chroma/FAISS)
#
# QUERYING (online):
#   User Query --> Embedding --> Similarity Search --> Context + Query --> LLM --> Answer
#                                (top-k chunks)       (prompt template)

from langchain_community.document_loaders import (
    PyPDFLoader, WebBaseLoader, TextLoader
)
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Step 1: Load documents
loader = PyPDFLoader("company_handbook.pdf")
docs = loader.load()  # List[Document]

# Step 2: Split into chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_documents(docs)

# Step 3-4: Embed and store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="./db")
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# Step 5: Retrieve and generate
rag_prompt = ChatPromptTemplate.from_template(
    """Answer based on the following context only.
Context: {context}
Question: {question}
Answer:"""
)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm_openai
    | StrOutputParser()
)

answer = rag_chain.invoke("What is the vacation policy?")
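To make the retrieval step less of a black box, here is a toy version of what similarity search does conceptually: cosine-score the query embedding against stored vectors and keep the top k. The tiny 2-dimensional "embeddings" below are made up for illustration; real embeddings have hundreds of dimensions, and vector stores use approximate indexes rather than a full scan.

```python
# Toy illustration of the similarity-search step in a RAG pipeline.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, index, k=2):
    # index: list of (chunk_text, embedding) pairs
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:k]]

index = [
    ("vacation policy: 20 days", [0.9, 0.1]),
    ("expense reports",          [0.1, 0.9]),
    ("holiday schedule",         [0.8, 0.3]),
]
print(top_k([1.0, 0.0], index, k=2))
# ['vacation policy: 20 days', 'holiday schedule']
```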

Agents and Tools

Agents turn the LLM into a reasoning engine that decides which tools to call and in what order. LangChain uses the ReAct (Reasoning + Acting) pattern: the LLM thinks, picks a tool, executes it, observes the result, and keeps reasoning until it reaches a final answer.

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# Define custom tools with the @tool decorator
@tool
def search_web(query: str) -> str:
    """Search the web for current information."""
    # Replace with actual search API (Tavily, SerpAPI, etc.)
    return f"Search results for: {query}"

@tool
def calculate(expression: str) -> str:
    """Evaluate a math expression. Input should be a valid Python expression."""
    try:
        # NOTE: eval() is unsafe on untrusted input; use a proper math parser in production
        return str(eval(expression))
    except Exception as e:
        return f"Error: {e}"

@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    # Replace with actual weather API
    return f"Weather in {city}: 22C, sunny"

# Create a ReAct agent
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [search_web, calculate, get_weather]
agent = create_react_agent(llm, tools)

# Run the agent
result = agent.invoke(
    {"messages": [("human", "What is the weather in Tokyo and convert 72F to Celsius?")]}
)
print(result["messages"][-1].content)

# Agent reasoning flow (ReAct pattern):
# Thought: I need to get Tokyo weather and convert 72F to C
# Action: get_weather("Tokyo")
# Observation: Weather in Tokyo: 22C, sunny
# Thought: Now convert 72F to Celsius: (72 - 32) * 5/9
# Action: calculate("(72 - 32) * 5 / 9")
# Observation: 22.22
# Final Answer: Tokyo is 22C and sunny. 72F = 22.22C

Conversation Memory

LangChain offers several memory types for maintaining conversation context. The right choice depends on context-window limits, cost, and whether you need semantic retrieval. With LCEL and LangGraph, message-history management is recommended over the legacy Memory classes.

from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# Modern approach: RunnableWithMessageHistory
store = {}  # session_id -> ChatMessageHistory

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

chain_with_history = RunnableWithMessageHistory(
    chain,  # your LCEL chain; its prompt needs a MessagesPlaceholder("history")
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
)

# Each session maintains its own conversation history
config = {"configurable": {"session_id": "user-123"}}
chain_with_history.invoke({"question": "My name is Alice"}, config=config)
chain_with_history.invoke({"question": "What is my name?"}, config=config)
# -> "Your name is Alice"

# Memory types comparison:
# | Type                           | Strategy           | Best For             |
# |--------------------------------|--------------------|----------------------|
# | ConversationBufferMemory       | Store everything   | Short conversations  |
# | ConversationBufferWindowMemory | Keep last K turns  | Cost-sensitive apps  |
# | ConversationSummaryMemory      | LLM summarizes     | Long conversations   |
# | ConversationTokenBufferMemory  | Truncate by tokens | Fixed-budget calls   |
# | VectorStoreRetrieverMemory     | Semantic retrieval | Large knowledge base |
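The "keep last K turns" strategy from the comparison is simple to picture in plain Python. This is a conceptual sketch of the trimming idea, not the LangChain implementation:

```python
# Sketch of windowed memory: before each model call, trim history
# to the most recent K exchanges (one exchange = a human + ai pair).
def window(history, k=2):
    # history: list of (role, text) tuples
    return history[-2 * k:]

history = [
    ("human", "Hi"), ("ai", "Hello!"),
    ("human", "My name is Alice"), ("ai", "Nice to meet you, Alice"),
    ("human", "What's my name?"), ("ai", "Alice"),
]
print(window(history, k=2))  # keeps only the last two exchanges
```

Summary memory and token-buffer memory follow the same pre-call trimming pattern, just with an LLM summary or a token count as the trimming criterion.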

LangGraph: Complex Workflows and State Machines

LangGraph models AI workflows as stateful graphs: nodes are functions, edges are transition conditions. It supports cycles (agent iteration), conditional branching, persistent checkpoints, and human-in-the-loop steps, making it a good fit for multi-step, multi-actor AI applications.

from langgraph.graph import StateGraph, START, END
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages

# Define the state schema
class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
    next_step: str

# Define node functions
def classifier(state: AgentState) -> AgentState:
    """Classify the user query into a category."""
    last_msg = state["messages"][-1].content
    # LLM classifies the query
    category = llm.invoke(f"Classify this query: {last_msg}")
    return {"next_step": category.content}

def handle_technical(state: AgentState) -> AgentState:
    response = llm.invoke(state["messages"] + [("system", "You are a tech expert.")])
    return {"messages": [response]}

def handle_general(state: AgentState) -> AgentState:
    response = llm.invoke(state["messages"] + [("system", "You are a helpful assistant.")])
    return {"messages": [response]}

def route(state: AgentState) -> str:
    if "technical" in state["next_step"].lower():
        return "technical"
    return "general"

# Build the graph
graph = StateGraph(AgentState)
graph.add_node("classifier", classifier)
graph.add_node("technical", handle_technical)
graph.add_node("general", handle_general)

graph.add_edge(START, "classifier")
graph.add_conditional_edges("classifier", route, {
    "technical": "technical",
    "general": "general",
})
graph.add_edge("technical", END)
graph.add_edge("general", END)

# Compile and run
app = graph.compile()
result = app.invoke({"messages": [("human", "How do I fix a segfault in C?")]})

LangSmith: Tracing, Evaluation, and Debugging

LangSmith is LangChain's companion observability platform, providing end-to-end tracing, automated evaluation, dataset management, and prompt versioning. Enabling tracing takes nothing more than setting environment variables; all LangChain calls are then reported automatically.

# Enable LangSmith tracing (set in environment)
# export LANGCHAIN_TRACING_V2="true"
# export LANGCHAIN_API_KEY="ls__..."
# export LANGCHAIN_PROJECT="my-project"

# All LangChain calls are now traced automatically!
# View traces at https://smith.langchain.com

# Programmatic evaluation with LangSmith
from langsmith import Client
from langsmith.evaluation import evaluate

client = Client()

# Create a dataset for evaluation
dataset = client.create_dataset("qa-test")
client.create_examples(
    inputs=[
        {"question": "What is LangChain?"},
        {"question": "How does RAG work?"},
    ],
    outputs=[
        {"answer": "LangChain is an LLM framework"},
        {"answer": "RAG combines retrieval with generation"},
    ],
    dataset_id=dataset.id,
)

# Define your target function and evaluator
def predict(inputs: dict) -> dict:
    return {"answer": chain.invoke(inputs["question"])}

# Run evaluation
results = evaluate(
    predict,
    data="qa-test",
    evaluators=["qa"],  # built-in QA correctness evaluator
    experiment_prefix="v1",
)

Deployment: LangServe + FastAPI

LangServe deploys a LangChain Runnable as a REST API, auto-generating OpenAPI docs and an interactive Playground, with streaming responses supported. It is built on FastAPI, so it integrates seamlessly with existing Python web infrastructure.

# server.py — Deploy a chain as an API
from fastapi import FastAPI
from langserve import add_routes
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

app = FastAPI(title="My LLM API", version="1.0")

# Define your chain
chain = (
    ChatPromptTemplate.from_template("Translate to {language}: {text}")
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)

# Add routes (creates /translate/invoke, /translate/stream, etc.)
add_routes(app, chain, path="/translate")

# Run: uvicorn server:app --host 0.0.0.0 --port 8000
# Playground: http://localhost:8000/translate/playground
# Docs: http://localhost:8000/docs

# --- Client-side usage ---
from langserve import RemoteRunnable

remote_chain = RemoteRunnable("http://localhost:8000/translate")
result = remote_chain.invoke({"language": "French", "text": "Hello world"})

# Streaming from client
for chunk in remote_chain.stream({"language": "Spanish", "text": "Good morning"}):
    print(chunk, end="")

Advanced RAG Techniques

A basic RAG pipeline works well in simple scenarios, but production systems need finer-grained techniques to improve retrieval quality and answer accuracy. Below are five key optimization strategies.

MultiQuery Retriever

An LLM rewrites the user's question into several queries from different angles, and the retrieval results are merged to improve recall.

from langchain.retrievers.multi_query import MultiQueryRetriever

# Generates 3 variations of the query for broader retrieval
multi_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0.3),
)
# "What is the refund policy?" generates:
#   - "How do I get a refund?"
#   - "What are the return and refund rules?"
#   - "Refund process and eligibility requirements"
docs = multi_retriever.invoke("What is the refund policy?")

Contextual Compression and Reranking

After retrieval, compress and rerank the documents so that only the passages most relevant to the query reach the LLM, cutting down on noise.

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_community.document_compressors import CohereRerank

# Option 1: LLM-based extraction (keeps only relevant sentences)
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 10}),
)

# Option 2: Cohere Re-rank (fast, production-grade)
reranker = CohereRerank(model="rerank-v3.5", top_n=4)
rerank_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20}),
)

Parent Document Retriever and Hybrid Search

The parent document retriever indexes small chunks but returns their larger parent chunks, fixing the lost-context problem of splitting too finely. Hybrid search combines vector similarity with BM25 keyword retrieval, capturing both semantic and exact matches.

from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Parent Document Retriever: index small, return big
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
parent_retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=InMemoryStore(),
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
parent_retriever.add_documents(docs)

# Hybrid Search: combine vector + BM25 keyword search
bm25 = BM25Retriever.from_documents(chunks, k=4)
vector_ret = vectorstore.as_retriever(search_kwargs={"k": 4})
hybrid = EnsembleRetriever(
    retrievers=[bm25, vector_ret],
    weights=[0.4, 0.6],  # keyword 40%, semantic 60%
)
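EnsembleRetriever merges the ranked lists from its retrievers using reciprocal rank fusion (RRF). Here is a simplified sketch of weighted RRF; the exact scoring details in LangChain's implementation may differ:

```python
# Weighted Reciprocal Rank Fusion: each retriever contributes
# weight / (c + rank) per document; documents found by multiple
# retrievers accumulate score and rise to the top.
def rrf(rankings, weights, c=60):
    # rankings: list of ranked doc-id lists, one per retriever
    scores = {}
    for ranked, w in zip(rankings, weights):
        for rank, doc in enumerate(ranked):
            scores[doc] = scores.get(doc, 0.0) + w / (c + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["doc3", "doc1", "doc5"]   # keyword ranking
vector_hits = ["doc1", "doc2", "doc3"]   # semantic ranking
print(rrf([bm25_hits, vector_hits], weights=[0.4, 0.6]))
# ['doc1', 'doc3', 'doc2', 'doc5']
```

Note how doc1 wins: it ranks high in both lists, so its fused score beats doc3, which tops only the keyword ranking.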

Production Architecture Reference

Taking a LangChain app to production means thinking about caching, rate limiting, monitoring, and fault tolerance. Below is a typical production architecture and the key configuration pieces.

# Production Architecture
#
# +----------+     +-------------+     +--------------+     +----------+
# |  Client  | --> |  FastAPI /  | --> |  LangChain   | --> |  LLM API |
# | (Web/App)|     |  LangServe  |     |  Chain/Agent |     | (OpenAI) |
# +----------+     +-------------+     +--------------+     +----------+
#                        |                    |                    |
#                        v                    v                    v
#                  +----------+        +-----------+        +----------+
#                  |  Redis   |        | LangSmith |        |  Vector  |
#                  |  Cache   |        |  Tracing  |        |   Store  |
#                  +----------+        +-----------+        +----------+

# Caching: avoid repeated LLM calls for identical inputs
from langchain_core.globals import set_llm_cache
from langchain_community.cache import RedisCache
import redis

set_llm_cache(RedisCache(redis_=redis.Redis(host="localhost", port=6379)))

# Fallback chain: switch providers on failure
main_chain = prompt | ChatOpenAI(model="gpt-4o")
fallback_chain = prompt | ChatAnthropic(model="claude-sonnet-4-20250514")
robust_chain = main_chain.with_fallbacks([fallback_chain])

# Concurrency control for batch calls (use .with_retry() for retries with backoff)
from langchain_core.runnables import RunnableConfig
config = RunnableConfig(max_concurrency=5)
results = await chain.abatch(inputs, config=config)  # inside an async function

# Token counting callback for cost monitoring
from langchain_community.callbacks import get_openai_callback
with get_openai_callback() as cb:
    result = chain.invoke({"question": "What is LangChain?"})
    print(f"Tokens: {cb.total_tokens}, Cost: ${cb.total_cost:.4f}")
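Conceptually, with_fallbacks is just "try the next option on error." Here is a plain-Python sketch of that control flow (the function names below are invented for illustration; this is not LangChain's code):

```python
# Fallback control flow: try each provider in order, return the
# first success, raise only if everything fails.
def with_fallbacks(primary, fallbacks):
    def run(x):
        for fn in [primary, *fallbacks]:
            try:
                return fn(x)
            except Exception:
                continue  # try the next provider
        raise RuntimeError("all providers failed")
    return run

def flaky_main(x):
    raise TimeoutError("provider down")   # simulate an outage

def backup(x):
    return f"backup answer for: {x}"

robust = with_fallbacks(flaky_main, [backup])
print(robust("hello"))  # backup answer for: hello
```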

LangChain vs LlamaIndex vs Haystack

The three major LLM frameworks have different strengths. Choose based on your core need: LangChain for general-purpose agents and chain composition, LlamaIndex for RAG and data indexing, Haystack for enterprise-grade NLP pipelines.

| Dimension      | LangChain                                | LlamaIndex                                   | Haystack                         |
|----------------|------------------------------------------|----------------------------------------------|----------------------------------|
| Core focus     | General-purpose LLM app framework        | Data indexing and RAG                        | Production NLP pipelines         |
| Agent support  | Strong: ReAct, tool calling, LangGraph   | Moderate: basic agent patterns               | Moderate: agent components       |
| RAG            | Good: flexible but hand-assembled        | Excellent: turnkey indexes and query engines | Good: pipeline-style RAG         |
| Observability  | LangSmith                                | LlamaTrace / Arize                           | Haystack Tracing                 |
| Learning curve | Medium-high: many concepts, fast-moving  | Medium: RAG focus keeps it approachable      | Medium: clear Pipeline model     |
| Best for       | Diverse AI apps, agents, complex chains  | Knowledge-base Q&A, document search          | Enterprise search, NLP pipelines |

Best Practices and Common Pitfalls

  • Start simple: validate the idea with a single LLM call before adding the complexity of chains, agents, and RAG
  • Always enable LangSmith tracing: turn it on during development; it dramatically lowers the cost of debugging LLM apps
  • Optimize prompts before code: 80% of the improvement in LLM apps comes from better prompts, not more elaborate architecture
  • Tune RAG chunk sizes: chunks that are too large add noise, while chunks that are too small lose context. 500-1500 characters with 10-20% overlap is a good starting point
  • Use structured output: prefer with_structured_output() over hand-parsing LLM text; it is more reliable and type-safe
  • Implement fallbacks: use chain.with_fallbacks([fallback_chain]) to handle LLM provider outages
  • Don't overuse agents: if the flow is deterministic, use a chain or a LangGraph graph instead. Agent nondeterminism raises both debugging difficulty and cost
  • Version-control your prompts: manage prompt versions with LangSmith Hub or your code repository, enabling A/B testing and rollback
  • Evaluation-driven development: build an automated evaluation dataset and run it after every change to prevent regressions
  • Watch token usage: agent loops can consume a lot of tokens. Set a max_iterations cap and track cost with callbacks
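The chunk-size guidance above is easier to reason about with a toy splitter that shows how size and overlap interact. This is a deliberate simplification: the real RecursiveCharacterTextSplitter also respects separators such as paragraphs and sentences instead of cutting at fixed offsets.

```python
# Toy fixed-width splitter: each chunk starts (chunk_size - overlap)
# characters after the previous one, so adjacent chunks share
# `overlap` characters of context.
def split(text, chunk_size=10, overlap=3):
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split("abcdefghijklmnopqrst", chunk_size=10, overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrst']
```

The shared "hij" between the first two chunks is the overlap at work: a sentence straddling a chunk boundary still appears whole in at least one chunk.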

Common Pitfalls

# Pitfall 1: Not handling rate limits
from langchain_core.rate_limiters import InMemoryRateLimiter
rate_limiter = InMemoryRateLimiter(requests_per_second=1, max_bucket_size=10)
llm = ChatOpenAI(model="gpt-4o", rate_limiter=rate_limiter)

# Pitfall 2: Ignoring document metadata in RAG
# Always include metadata for filtering and source attribution
chunks = splitter.split_documents(docs)
for chunk in chunks:
    chunk.metadata["source"] = "handbook_v2"
    chunk.metadata["date"] = "2026-01"

# Pitfall 3: Not using async for I/O-bound workloads
import asyncio
async def process_queries(queries):
    tasks = [chain.ainvoke({"question": q}) for q in queries]
    return await asyncio.gather(*tasks)

# Pitfall 4: Forgetting to set temperature=0 for deterministic tasks
# Use temperature=0 for extraction, classification, structured output
# Use temperature=0.7-1.0 for creative tasks
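The InMemoryRateLimiter from Pitfall 1 follows the classic token-bucket strategy. Here is a minimal sketch of that strategy in plain Python (a conceptual illustration, not LangChain's implementation):

```python
# Token bucket: tokens refill continuously at `rate` per second up to
# `capacity`; each request spends one token or is rejected.
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        # refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1, capacity=2)
print(bucket.try_acquire(), bucket.try_acquire(), bucket.try_acquire())
# True True False  (burst of 2 allowed, third call must wait)
```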

Frequently Asked Questions

What is LangChain and what is it used for?

LangChain is an open-source framework for building applications powered by large language models (LLMs). It provides composable abstractions for prompt management, chain orchestration, retrieval-augmented generation (RAG), agents with tool use, and memory. LangChain supports Python and JavaScript and integrates with 100+ LLM providers, vector stores, and external tools.

How do I install LangChain in Python?

Install LangChain with pip: pip install langchain langchain-openai langchain-community. For specific integrations, install additional packages like langchain-anthropic for Claude, langchain-google-genai for Gemini, or langchain-chroma for ChromaDB vector store. LangChain uses a modular package structure where each integration is a separate package.

What is the difference between LangChain, LangSmith, LangGraph, and LangServe?

LangChain is the core framework for building LLM applications with chains and agents. LangSmith is a platform for tracing, evaluating, monitoring, and debugging LLM applications. LangGraph extends LangChain for building stateful, multi-actor applications with graph-based workflows and cycles. LangServe deploys LangChain runnables as production REST APIs using FastAPI.

What is RAG (Retrieval-Augmented Generation) in LangChain?

RAG in LangChain is a pipeline that combines document retrieval with LLM generation. The pipeline consists of: document loading (PDF, web pages, databases), text splitting (RecursiveCharacterTextSplitter), embedding generation (OpenAI, HuggingFace), vector storage (Chroma, Pinecone, FAISS), retrieval (similarity search), and LLM generation with retrieved context. This enables LLMs to answer questions about your private data.

How do LangChain agents work?

LangChain agents use LLMs as reasoning engines to decide which tools to call and in what order. The agent receives a query, reasons about which tool to use (following patterns like ReAct), executes the tool, observes the result, and iterates until it has a final answer. You can create custom tools with the @tool decorator and use built-in tools for web search, code execution, APIs, and more.

What memory types does LangChain support?

LangChain supports several memory types: ConversationBufferMemory (stores full conversation history), ConversationBufferWindowMemory (keeps last K exchanges), ConversationSummaryMemory (LLM-generated summary of conversation), ConversationTokenBufferMemory (truncates by token count), and VectorStoreRetrieverMemory (stores memories in a vector database for semantic retrieval). Choose based on your context window and cost constraints.

What is LangGraph and when should I use it?

LangGraph is a library for building stateful, multi-step AI workflows as graphs. Use it when you need cycles (agent loops), branching logic, human-in-the-loop patterns, persistent state across steps, or multi-agent collaboration. It extends LangChain with StateGraph for defining nodes (functions) and edges (transitions), with built-in support for checkpointing and streaming.

How does LangChain compare to LlamaIndex and Haystack?

LangChain is a general-purpose framework best for diverse LLM applications, agents, and complex chains. LlamaIndex specializes in data indexing and RAG, offering superior document handling and query engines for knowledge-intensive applications. Haystack focuses on production NLP pipelines with strong enterprise features. Choose LangChain for flexibility, LlamaIndex for RAG-heavy workloads, or Haystack for production NLP systems.
