Sampling
Sampling can be used to control the volume of traces collected by Langfuse. Sampling is handled client-side.
You can configure the sample rate by setting the LANGFUSE_SAMPLE_RATE environment variable or by passing the sample_rate/sampleRate constructor parameter. The value must be between 0 and 1.
The default value is 1, meaning that all traces are collected; a value of 0.2 means that only 20% of traces are collected. The SDK samples at the trace level: if a trace is sampled, all observations and scores within that trace are collected as well, and if it is not sampled, none of them are sent.
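Conceptually, a client-side, trace-level sampling decision can be made deterministic by deriving it from the trace ID, so every observation in the same trace reaches the same verdict. The sketch below is a simplified illustration of this idea, not the SDK's actual implementation:

```python
import hashlib

def should_sample(trace_id: str, sample_rate: float) -> bool:
    """Deterministically decide whether a trace is kept.

    Hashing the trace ID maps it to a uniform value in [0, 1);
    comparing against the sample rate keeps roughly that fraction
    of traces, and every call with the same trace ID agrees.
    """
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Interpret the first 8 bytes as a value in [0, 1)
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate

# Same trace ID -> same decision, so all observations in a trace
# are kept or dropped together.
assert should_sample("trace-abc", 0.5) == should_sample("trace-abc", 0.5)
```

Because the decision depends only on the trace ID, no coordination between spans is needed: any component that knows the trace ID can independently arrive at the same keep/drop verdict.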
With the Python SDK, you can configure sampling when initializing the client:
```python
from langfuse import Langfuse, get_client
import os

# Method 1: Set environment variable
os.environ["LANGFUSE_SAMPLE_RATE"] = "0.5"  # As string in env var
langfuse = get_client()

# Method 2: Initialize with constructor parameter, then get the client
Langfuse(sample_rate=0.5)  # 50% of traces will be sampled
langfuse = get_client()
```

When using the @observe() decorator:
```python
from langfuse import observe, Langfuse, get_client

# Initialize the client with sampling
Langfuse(sample_rate=0.3)  # 30% of traces will be sampled

@observe()
def process_data():
    # Only ~30% of calls to this function will generate traces
    # The decision is made at the trace level (first span)
    pass
```

If a trace is not sampled, none of its observations (spans or generations) or associated scores will be sent to Langfuse, which can significantly reduce data volume for high-traffic applications.
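The trace-level propagation described above can be illustrated with a minimal sketch (hypothetical Trace class, not the Langfuse SDK): the keep/drop decision is made once when the trace starts, and all child observations inherit it rather than re-rolling:

```python
import random

class Trace:
    """Toy trace: the sampling decision is made once, at the root."""

    def __init__(self, name: str, sample_rate: float):
        self.name = name
        self.sampled = random.random() < sample_rate  # decided once
        self.observations = []

    def span(self, name: str):
        # Child observations never re-roll the decision; they
        # inherit the trace-level verdict.
        if self.sampled:
            self.observations.append(name)

    def export(self):
        # An unsampled trace sends nothing: no spans, no scores.
        if not self.sampled:
            return None
        return {"name": self.name, "observations": self.observations}

trace = Trace("checkout", sample_rate=1.0)
trace.span("llm-call")
trace.span("score")
```

Making the decision at the root keeps traces whole: you never see a generation without its parent trace, or a score pointing at spans that were dropped.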
Langfuse respects OpenTelemetry's sampling decisions. You can configure a sampler in your OTEL SDK to control which traces are sent to Langfuse. This is useful for managing costs and reducing noise in high-volume applications.
Here is an example of how to configure a TraceIdRatioBasedSampler to send only 20% of traces:
```typescript
import { NodeSDK } from "@opentelemetry/sdk-node";
import { LangfuseSpanProcessor } from "@langfuse/otel";
import { TraceIdRatioBasedSampler } from "@opentelemetry/sdk-trace-base";

const sdk = new NodeSDK({
  // Sample 20% of all traces
  sampler: new TraceIdRatioBasedSampler(0.2),
  spanProcessors: [new LangfuseSpanProcessor()],
});
```

See JS/TS SDK docs for more details.
The same applies when using the OpenAI integration (JS/TS): configure a sampler in your OTEL SDK, for example a TraceIdRatioBasedSampler that sends only 20% of traces:
```typescript
import { NodeSDK } from "@opentelemetry/sdk-node";
import { LangfuseSpanProcessor } from "@langfuse/otel";
import { TraceIdRatioBasedSampler } from "@opentelemetry/sdk-trace-base";

const sdk = new NodeSDK({
  // Sample 20% of all traces
  sampler: new TraceIdRatioBasedSampler(0.2),
  spanProcessors: [new LangfuseSpanProcessor()],
});
```

Initialize the OpenAI integration as usual:
```typescript
import OpenAI from "openai";
import { observeOpenAI } from "@langfuse/openai";

const openai = observeOpenAI(new OpenAI());
```

See OpenAI Integration (JS/TS) for more details.
Langfuse respects OpenTelemetry's sampling decisions for Langchain (JS/TS) as well. Configure sampling in your OTEL SDK first, then initialize the callback handler.
```typescript
import { NodeSDK } from "@opentelemetry/sdk-node";
import { LangfuseSpanProcessor } from "@langfuse/otel";
import { TraceIdRatioBasedSampler } from "@opentelemetry/sdk-trace-base";

const sdk = new NodeSDK({
  // Sample 20% of all traces
  sampler: new TraceIdRatioBasedSampler(0.2),
  spanProcessors: [new LangfuseSpanProcessor()],
});
```

After setting up tracing and sampling, initialize the Langchain callback handler as usual:
```typescript
import { CallbackHandler } from "langfuse-langchain";

const handler = new CallbackHandler();
```

See Langchain Integration (JS/TS) for more details.
When using the Vercel AI SDK integration, pass the sample rate to the LangfuseExporter:
```typescript
import { registerOTel } from "@vercel/otel";
import { LangfuseExporter } from "langfuse-vercel";

export function register() {
  registerOTel({
    serviceName: "langfuse-vercel-ai-nextjs-example",
    traceExporter: new LangfuseExporter({ sampleRate: 0.5 }),
  });
}
```