Observability is the ability to understand the internal state of a system by examining its outputs. For software, those outputs are telemetry data: traces, metrics, and logs.
OpenLLMetry is an open-source project that lets you easily monitor and debug the execution of your LLM app. Tracing is done in a non-intrusive way, built on top of OpenTelemetry.
How to use OpenLLMetry:
Let us first install the SDK:
pip install traceloop-sdk
Then we can initialize OpenLLMetry and use it with any LLM call:
import os

from openai import OpenAI
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

# Initializing the SDK auto-instruments supported libraries (here, OpenAI).
Traceloop.init(app_name="joke_generation_service")

@workflow(name="joke_creation")
def create_joke():
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Tell me a joke about opentelemetry"}],
    )
    return completion.choices[0].message.content
Because it emits standard OpenTelemetry data, OpenLLMetry can export logs and traces to any destination that supports OpenTelemetry formats, such as Sentry, New Relic, and Dynatrace. It can also instrument everything that OpenTelemetry already instruments, such as your DB, API calls, and more. On top of that, the project ships a set of custom extensions that instrument calls to LLM providers like OpenAI or Anthropic, vector DBs like Chroma, Pinecone, Qdrant, or Weaviate, and frameworks such as LangChain and LlamaIndex.
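As a sketch, switching destinations typically needs no code changes: the SDK reads standard environment variables pointing at an OTLP-compatible endpoint. The endpoint URL and header value below are placeholders, not real credentials:

```shell
# Point the SDK at any OTLP-compatible backend (placeholder values).
export TRACELOOP_BASE_URL="https://otlp.example.com:4318"
export TRACELOOP_HEADERS="Authorization=Bearer%20<your-api-key>"
```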

OpenLLMetry also introduced Semantic Conventions for LLMs to OpenTelemetry. Semantic Conventions define a common set of (semantic) attributes that give meaning to data when collecting, producing, and consuming it. Among other things, they specify span names and kinds, metric instruments and units, as well as attribute names, types, meanings, and valid values. This allows easier correlation and consumption of data.
Semantic conventions for generative AI events
These conventions define semantics for events, traces, and spans. For example, for events:
Event: gen_ai.system.message
Event: gen_ai.user.message
Event: gen_ai.assistant.message
Event: gen_ai.tool.message
Event: gen_ai.choice
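As an illustrative sketch of what these events carry, the helper below builds a gen_ai.user.message event. The helper function itself is hypothetical (not part of any SDK); the event name and the gen_ai.system attribute follow the conventions listed above:

```python
# Hypothetical helper: build a gen_ai.user.message event following the
# GenAI semantic conventions (event name plus a gen_ai.system attribute).
def user_message_event(system: str, content: str) -> dict:
    return {
        "name": "gen_ai.user.message",
        "attributes": {"gen_ai.system": system},  # e.g. "openai", "anthropic"
        "body": {"role": "user", "content": content},
    }

event = user_message_event("openai", "Tell me a joke about opentelemetry")
print(event["name"])  # gen_ai.user.message
```

Because every instrumented provider emits the same event names and attribute keys, a backend can render OpenAI and Anthropic conversations with one code path.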
Generative AI Client Metrics
Metric: gen_ai.client.token.usage
Metric: gen_ai.client.operation.duration
Generative AI Model Server Metrics
Metric: gen_ai.server.request.duration
Metric: gen_ai.server.time_per_output_token
Metric: gen_ai.server.time_to_first_token
Reference: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/gen-ai-metrics.md
Using these semantic conventions, it becomes easier to monitor LLM applications and measure their performance in a consistent, backend-agnostic way.