
How to Retrieve Experiment Scores?

Terminology Note: "Experiment" and "dataset run" are used interchangeably throughout Langfuse. We are moving toward deprecating the term "dataset run" in favor of "experiment", but both terms currently refer to the same concept.

Langfuse supports two types of experiment scores:

  1. Experiment-level scores: Overall metrics for the entire experiment run (e.g., precision, recall, F1-scores). These scores are immutable and represent aggregate performance. Learn more about run-level scores.
  2. Experiment-item-level scores: Scores for individual items within an experiment (e.g., per-generated-output evaluations).

Via API/SDK

Experiment-Level Scores

Support coming soon: Fetching experiment-level scores via the Langfuse SDK or the scores API using the datasetRunId parameter is planned. See the Scores Data Model for details on score properties.

Experiment-Item-Level Scores

Current Workaround: The method below is a workaround for retrieving experiment-item-level scores. For a better developer experience, we recommend using the Experiment Runner SDK (shown at the end of this page), which provides direct access to all scores in context. We may also add a dedicated API route for experiment scores/metrics in the near future.

To retrieve experiment-item-level scores programmatically:

Step 1: Fetch the experiment run

Get the experiment run details including all trace IDs:

Python:

from langfuse import Langfuse
from urllib.parse import quote

langfuse = Langfuse()

dataset_name = "your-dataset-name"
run_name = "your-run-name"

# URL encode names if they contain special characters
encoded_dataset_name = quote(dataset_name, safe="")
encoded_run_name = quote(run_name, safe="")

# Fetch experiment run
run = langfuse.get_run(
    dataset_name=encoded_dataset_name,
    run_name=encoded_run_name
)

# Extract trace IDs
trace_ids = [item["trace_id"] for item in run["dataset_run_items"]]

JS/TS:

import { LangfuseClient } from "@langfuse/client";

const langfuse = new LangfuseClient();

const datasetName = "your-dataset-name";
const runName = "your-run-name";

// URL encode names if they contain special characters
const encodedDatasetName = encodeURIComponent(datasetName);
const encodedRunName = encodeURIComponent(runName);

// Fetch experiment run; must use encoded names for fetching runs
const run = await langfuse.dataset.getRun({
  datasetName: encodedDatasetName,
  runName: encodedRunName
});

// Extract trace IDs
const traceIds = run.datasetRunItems.map(item => item.traceId);

API:

GET https://api.langfuse.com/api/public/datasets/{datasetName}/runs/{runName}

API Reference
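
If you prefer to call the REST endpoint above directly, a minimal Python sketch using requests might look like this. It assumes Basic auth with your Langfuse public key as username and secret key as password, read from the standard LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST environment variables; the response field names mirror the JS/TS example above.

import os
import requests
from urllib.parse import quote

host = os.environ["LANGFUSE_HOST"]  # your Langfuse Cloud or self-hosted base URL
auth = (os.environ["LANGFUSE_PUBLIC_KEY"], os.environ["LANGFUSE_SECRET_KEY"])

# URL encode names if they contain special characters
dataset_name = quote("your-dataset-name", safe="")
run_name = quote("your-run-name", safe="")

# Fetch the experiment run, including its run items
resp = requests.get(f"{host}/api/public/datasets/{dataset_name}/runs/{run_name}", auth=auth)
resp.raise_for_status()
run = resp.json()

# Extract trace IDs from the run items
trace_ids = [item["traceId"] for item in run["datasetRunItems"]]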

Step 2: Fetch scores for each trace

Use the trace IDs to retrieve scores for each experiment item:

Python:

# Fetch trace details including scores
for trace_id in trace_ids:
    trace = langfuse.get_trace(trace_id)
    scores = trace["scores"]

    print(f"Trace {trace_id}: {scores}")

JS/TS:

// Fetch trace details including scores
for (const traceId of traceIds) {
  const trace = await langfuse.trace.get(traceId);
  const scores = trace.scores;

  console.log(`Trace ${traceId}:`, scores);
}

API:

GET https://api.langfuse.com/api/public/traces/{traceId}

API Reference
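
Once you have the per-item scores, you will often want to aggregate them across the whole experiment, for example to compute the mean value per score name. Below is a minimal Python sketch that reuses trace_ids and langfuse.get_trace from the example above; it assumes each score exposes name and value fields as described in the Scores Data Model, and only averages numeric values.

from collections import defaultdict
from statistics import mean

# Group numeric score values by score name across all experiment items
values_by_name = defaultdict(list)

for trace_id in trace_ids:
    trace = langfuse.get_trace(trace_id)
    for score in trace["scores"]:
        if isinstance(score.get("value"), (int, float)):
            values_by_name[score["name"]].append(score["value"])

# Print the mean of each score across the experiment
for name, values in values_by_name.items():
    print(f"{name}: mean={mean(values):.3f} (n={len(values)})")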

Via Experiment Runner SDK (Recommended)

For a better developer experience, use the Experiment Runner SDK, which provides built-in access to all experiment scores and results:

Python:

from langfuse import get_client

langfuse = get_client()

# Run experiment with automatic score collection
result = langfuse.run_experiment(
    name="my-experiment",
    data=my_dataset,
    task=my_task,
    evaluators=[my_evaluator]  # optional
)

# Access all scores directly
print(result.format())  # includes all scores in formatted output

JS/TS:

import { LangfuseClient } from "@langfuse/client";

const langfuse = new LangfuseClient();

// Run experiment with automatic score collection
const result = await langfuse.experiment.run({
  name: "my-experiment",
  data: myDataset,
  task: myTask,
  evaluators: [myEvaluator]  // optional
});

// Access all scores directly
console.log(await result.format());  // includes all scores in formatted output
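
The data, task, and evaluators used above are placeholders. As a rough illustration, a task and an evaluator could be sketched roughly as follows; note that the exact calling convention (keyword arguments such as item, input, output, expected_output, and the accepted return shape) is an assumption here, so check the Experiment Runner SDK reference for the authoritative signatures.

# Hypothetical sketch of a task and an evaluator for run_experiment.
# Parameter names and the dict return shape are assumptions; verify against the SDK docs.

def my_task(*, item, **kwargs):
    # Run your application logic for one dataset item and return its output
    return f"Answer to: {item.input}"

def my_evaluator(*, input=None, output=None, expected_output=None, **kwargs):
    # Return a named score for this item
    return {
        "name": "exact_match",
        "value": 1.0 if output == expected_output else 0.0,
    }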
