When Tooling Freedom Turns into Chaos
At Teku, we work with some of the world’s most complex software engineering teams. They move fast, deploy independently, and have the freedom to choose the tools they need to meet business demands. This is a great setup for innovation, but over time it creates technology sprawl: a perfect storm of unpredictable and excessive costs, fragmented visibility, and little cohesion between monitoring tools.
Recently we worked with a global ticketing platform provider. Teams were deploying individual logging platforms (Sumo Logic, Elastic, Splunk, AWS OpenSearch), each with its own ingestion method, structure, and cost profile. Distributed tracing and metric collection were spread across multiple tools, and synthetic checks were either written in Python and run through cron, or built on AWS CloudWatch Synthetics canaries with the resulting metrics going nowhere! Does this sound familiar?
Finding the Right North Star: Consolidation + Standardization
Solving the observability challenge isn’t just about reducing costs; it’s about regaining control and rethinking the stack from the ground up. Keep it simple: centralize everything into Splunk, standardize collection using OpenTelemetry, and unlock operational insight across all your digital platforms without sacrificing developer velocity.
Here’s how we break it down.
Step 1: Modernize Logging
The first challenge is achieving a unified logging platform. It’s essential to capacity- and cost-plan for scale, and in large environments this can only be achieved with workload-based licensing. Once you have a platform built for growth, it’s time to consolidate. This should start as close to the source as possible, by implementing standard logging patterns and a standard collection method. Many options exist: Splunk UF, Fluentd, Fluent Bit, Vector, Logstash, and more. While Teku has experience with all of them, these days we opt for OpenTelemetry. It’s agnostic and open source, and it is not only supported but recommended by most major observability providers.
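To make that concrete, here is a minimal sketch of what standardized collection can look like with the OpenTelemetry Collector (the contrib distribution, which ships the filelog receiver and splunk_hec exporter). The endpoint, index, and token values are placeholders for illustration:

```yaml
# Minimal OpenTelemetry Collector config: tail container logs and forward
# them to Splunk over HTTP Event Collector (HEC). Endpoint, token, and
# index below are placeholders, not real values.
receivers:
  filelog:
    include: [/var/log/containers/*.log]

exporters:
  splunk_hec:
    endpoint: "https://splunk.example.com:8088/services/collector"
    token: "${env:SPLUNK_HEC_TOKEN}"   # HEC token injected via environment
    index: "k8s_logs"
    sourcetype: "otel:container"

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [splunk_hec]
```

The same collector, with a different receiver or exporter swapped in, covers almost every source we migrate, which is exactly why a single standard beats five bespoke agents.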
As we dive into the code, migrating from Elastic, OpenSearch, Sumo Logic, and the rest, the clear path forward is Helm. It simplifies deployment, requiring only a single-line code change for the majority of migrations. Finally, by using OpenTelemetry you’re set up nicely for the next phases of observability: metrics and traces.
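As a sketch, assuming the community opentelemetry-collector Helm chart (chart versions vary, so treat the exact keys as illustrative), a values.yaml can carry the whole migration; the release name and endpoint are placeholders:

```yaml
# Deploy with the community chart, e.g.:
#   helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
#   helm upgrade --install logs open-telemetry/opentelemetry-collector -f values.yaml
mode: daemonset
image:
  repository: otel/opentelemetry-collector-contrib  # contrib image includes splunk_hec
presets:
  logsCollection:
    enabled: true        # tails container logs via the filelog receiver
  kubernetesAttributes:
    enabled: true        # adds k8s.namespace.name, pod name, etc. as resource attributes
config:
  exporters:
    splunk_hec:
      endpoint: "https://splunk.example.com:8088/services/collector"  # placeholder
      token: "${env:SPLUNK_HEC_TOKEN}"
  service:
    pipelines:
      logs:
        exporters: [splunk_hec]   # the single-line swap away from the legacy backend
```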
Step 2: Kubernetes + OpenTelemetry FTW
By deploying OpenTelemetry DaemonSets across all Kubernetes clusters and leveraging resource attributes and variables, onboarding logs becomes the most basic of merge requests: add the namespace to the whitelist and away we go. OTel can then route the logs to the correct index and assign the correct sourcetypes.
This makes onboarding new products frictionless. Developers no longer need to build custom log exporters—just follow the standard and go.
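Here is a sketch of what that whitelist and routing can look like using the contrib filter and transform processors. The namespaces and index names are illustrative, and the com.splunk.* attributes follow the convention the splunk_hec exporter reads when setting HEC metadata per event:

```yaml
processors:
  # Only ship logs for namespaces that have been onboarded; adding a new
  # product is a one-line change to this pattern (illustrative namespaces).
  filter/onboarded:
    logs:
      include:
        match_type: regexp
        resource_attributes:
          - key: k8s.namespace.name
            value: ^(checkout|payments|search)$
  # Derive the Splunk index and sourcetype from Kubernetes metadata, so
  # routing needs no per-team exporter code.
  transform/splunk_metadata:
    log_statements:
      - context: resource
        statements:
          - set(attributes["com.splunk.index"], attributes["k8s.namespace.name"])
          - set(attributes["com.splunk.sourcetype"], Concat(["kube:", attributes["k8s.namespace.name"]], ""))

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [filter/onboarded, transform/splunk_metadata]
      exporters: [splunk_hec]
```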
Step 3: Rethink Tracing and Synthetics
With disparate platforms providing distributed tracing and metrics, neither developers, product teams, nor executives are deriving the value that tool sprawl promised (or that you are paying for!). With the OpenTelemetry components in place, consolidating tools and migrating all services to Splunk APM will not only improve performance insights and service correlation, it will drastically reduce your tracing costs.
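In practice, that migration can be as small as pointing existing OTel SDKs at the collector and adding a traces pipeline. A minimal sketch, assuming a Splunk Observability Cloud realm (REALM) and an ingest token stored in SPLUNK_ACCESS_TOKEN, both placeholders:

```yaml
# Accept OTLP traces from instrumented services and forward them to
# Splunk APM via the Observability Cloud ingest endpoint.
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlphttp/splunk_apm:
    traces_endpoint: "https://ingest.REALM.signalfx.com/v2/trace/otlp"  # replace REALM
    headers:
      X-SF-Token: "${env:SPLUNK_ACCESS_TOKEN}"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/splunk_apm]
```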
Using the flexibility of Splunk’s “Pooled Model” pricing for Observability, we can leverage any surplus license capacity to further increase visibility: enabling Real User Monitoring for critical-path products and moving synthetic checks back to a centralized product. The result? Better integration, fewer tools, and zero incremental spend on RUM and synthetic checks.
Step 4: Track the Impact
By aligning around Splunk and OpenTelemetry, you have all the tools to see an immediate ROI. Remember the global ticketing platform provider? They saw tangible results:
- Observability costs dropped by 49% (and tracking toward 60% in 2026)
- Ingestion capacity increased 10x+
- Onboarding time for new services went from days to minutes
- Developer experience improved dramatically
Final Thoughts
Observability isn’t just a tooling problem. It’s a systems problem—a strategy problem. Fragmented telemetry leads to slower incident response, higher costs, and frustrated developers.
Struggling with observability sprawl? We’re a Splunk Premier Partner and would love to help! Get in touch to explore how we can transform your observability from a tangled mess into a powerful, cost-effective platform for scale.