Observability is the ability to understand the internal state of a system by examining its outputs. For software, those outputs are telemetry data: traces, metrics, and logs.
OpenLLMetry is an open-source project that lets you easily monitor and debug the execution of your LLM app. Tracing is done in a non-intrusive way, built on top of OpenTelemetry.
How to use OpenLLMetry:
Let us first install the SDK:
pip install traceloop-sdk
Then we can initialize OpenLLMetry and use it with any LLM call:
import os

from openai import OpenAI
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

# Initializing the SDK auto-instruments supported libraries (here, OpenAI).
Traceloop.init(app_name="joke_generation_service")

@workflow(name="joke_creation")
def create_joke():
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
    )
    return completion.choices[0].message.content
Because it emits standard OpenTelemetry data, OpenLLMetry can export logs and traces to any destination that supports OpenTelemetry formats, such as Sentry, New Relic, and Dynatrace. It can also instrument everything that OpenTelemetry already instruments, such as your DB, API calls, and more. On top of that, the project ships a set of custom extensions that instrument calls to LLM providers like OpenAI or Anthropic, vector DBs like Chroma, Pinecone, Qdrant, or Weaviate, and frameworks such as LangChain and LlamaIndex.
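As a sketch, switching destinations typically needs no code changes: the SDK reads standard environment variables pointing at an OTLP-compatible endpoint. The endpoint URL and header value below are placeholders, not real credentials:

```shell
# Point the SDK at any OTLP-compatible backend (placeholder values).
export TRACELOOP_BASE_URL="https://otlp.example.com:4318"
export TRACELOOP_HEADERS="Authorization=Bearer%20<your-api-key>"
```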

OpenLLMetry also introduced Semantic Conventions for LLMs to OpenTelemetry. Semantic Conventions define a common set of (semantic) attributes that give meaning to data when collecting, producing, and consuming it. Among other things, they specify span names and kinds, metric instruments and units, as well as attribute names, types, meanings, and valid values. This allows easier correlation and consumption of data.
Semantic conventions for generative AI events
These conventions define semantics for events, traces, and spans. For example, for events:
Event: gen_ai.system.message
Event: gen_ai.user.message
Event: gen_ai.assistant.message
Event: gen_ai.tool.message
Event: gen_ai.choice
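As an illustrative sketch of what these events carry, the helper below builds a gen_ai.user.message event. The helper function itself is hypothetical (not part of any SDK); the event name and the gen_ai.system attribute follow the conventions listed above:

```python
# Hypothetical helper: build a gen_ai.user.message event following the
# GenAI semantic conventions (event name plus a gen_ai.system attribute).
def user_message_event(system: str, content: str) -> dict:
    return {
        "name": "gen_ai.user.message",
        "attributes": {"gen_ai.system": system},  # e.g. "openai", "anthropic"
        "body": {"role": "user", "content": content},
    }

event = user_message_event("openai", "Tell me a joke about opentelemetry")
print(event["name"])  # gen_ai.user.message
```

Because every instrumented provider emits the same event names and attribute keys, a backend can render OpenAI and Anthropic conversations with one code path.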
Generative AI Client Metrics
Metric: gen_ai.client.token.usage
Metric: gen_ai.client.operation.duration
Generative AI Model Server Metrics
Metric: gen_ai.server.request.duration
Metric: gen_ai.server.time_per_output_token
Metric: gen_ai.server.time_to_first_token
Reference: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-metrics.md
Using these semantic conventions, it becomes easier to monitor LLM applications and measure their performance in a consistent, backend-agnostic way.