Forem: Pranay Prateek

SigNoz : Open-source alternative to DataDog

Pranay Prateek — Sat, 06 Mar 2021 14:38:09 +0000

More and more companies are now shifting to a cloud-native & microservices-based architecture. Having an application monitoring tool is critical in this world because you can’t just log into a machine and figure out what’s going wrong.

We have spent the last couple of years learning about application monitoring & observability, and asking what key features an observability tool should have to enable fast resolution of issues.

In our opinion, a good observability tool should have:

Out of the box application metrics
A way to go from metrics to traces to find out why an issue is happening
Seamless flow between metrics, traces & logs — the three pillars of observability
Filtering of traces based on different tags and filters
Ability to set dynamic thresholds for alerts
Transparency in pricing

User experience not great in current open-source tools

We found that though there are open-source tools like Prometheus & Jaeger, they don’t provide a great user experience like SaaS products do. It takes lots of time and effort to get them working, figure out the long-term storage, etc. And if you want metrics and traces in one place, it’s not possible, as Prometheus metrics & Jaeger traces have different formats.

SaaS tools like DataDog and NewRelic do a much better job at many of these aspects:

They are easy to set up & get started with
They provide out-of-the-box application metrics
They provide correlation between metrics & traces

But they have the following issues:

Crazy node-based pricing which doesn’t make sense in today’s microservices architecture. Any node which is live for more than 8 hours in a month is charged, so it is unsuitable for spiky workloads
Very costly. They charge $5 per 100 custom metrics
They are cloud-only, so not suitable for companies which have concerns about sending data outside their infra
For any small feature, you are dependent on their roadmap. We think this is an unnecessary restriction for a product which is used by developers. A product used by developers should be extensible

To fill this gap we built SigNoz, an open-source alternative to DataDog.

Some of the key features which make us vastly superior to current open-source products:

Out of the box application metrics

Get p90 and p99 latencies, RPS, error rates and top endpoints for a service out of the box.
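
To make these numbers concrete, here is a minimal sketch of how such service-level metrics could be derived from raw request spans. It is not SigNoz's implementation: the Span fields, the nearest-rank percentile rule and the choice to count status codes of 500 and above as errors are all assumptions made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical span record; the field names are illustrative only.
    endpoint: str
    duration_ms: float
    status_code: int


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def service_metrics(spans, window_seconds):
    """Summarise the spans seen for one service during a time window."""
    if not spans:
        raise ValueError("need at least one span")
    durations = [s.duration_ms for s in spans]
    # Assumption: a 5xx status code counts as an error.
    errors = sum(1 for s in spans if s.status_code >= 500)
    return {
        "p90_ms": percentile(durations, 90),
        "p99_ms": percentile(durations, 99),
        "rps": len(spans) / window_seconds,
        "error_rate": errors / len(spans),
        "top_endpoints": Counter(s.endpoint for s in spans).most_common(3),
    }
```

Calling service_metrics(spans, window_seconds=60) on one minute of spans returns a dictionary with the latency percentiles, request rate, error rate and the three busiest endpoints.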

Seamless flow between metrics & traces

Found something suspicious in a metric? Just click that point in the graph & get details of the traces which may be causing the issue. Seamless, intuitive.

Filtering based on tags

For example, you can find the latency experienced by customers who have customer_type set as premium.

Custom aggregates on filtered traces

Create custom metrics from filtered traces to find metrics for any type of request. Want to find the p99 latency of customer_type: premium requests that are seeing status_code:400? Just set the filters, and you have the graph. Boom!
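
As a rough sketch of what a filtered aggregate computes, the snippet below filters spans by tag values and then takes the p99 of what is left. The dictionary layout of a span and the helper names matches and p99_for are hypothetical, chosen only for this example; they are not SigNoz's query API.

```python
import math


def matches(span, filters):
    """True when every filter key/value pair is present in the span's tags."""
    return all(span["tags"].get(key) == value for key, value in filters.items())


def p99_for(spans, filters):
    """p99 latency (nearest rank) over the spans that match all filters, or None if none match."""
    durations = sorted(s["duration_ms"] for s in spans if matches(s, filters))
    if not durations:
        return None
    rank = math.ceil(99 * len(durations) / 100)
    return durations[rank - 1]


# Hypothetical sample data: each span carries a duration and a tag dictionary.
spans = [
    {"duration_ms": 10, "tags": {"customer_type": "premium", "status_code": 400}},
    {"duration_ms": 30, "tags": {"customer_type": "premium", "status_code": 400}},
    {"duration_ms": 900, "tags": {"customer_type": "free", "status_code": 400}},
    {"duration_ms": 20, "tags": {"customer_type": "premium", "status_code": 200}},
]

premium_400_p99 = p99_for(spans, {"customer_type": "premium", "status_code": 400})
```

Adding or removing a key in the filters dictionary narrows or widens the set of traces the aggregate is computed over.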

Transparent usage Data

You can drill down into how many events each application is sending and at what granularity, so that you can adjust your sampling rate as needed and not get a shock at the end of the month (often the case with SaaS vendors).

Detailed Flamegraphs

A detailed flamegraph helps you find the exact cause of an issue and which of the underlying requests is causing the problem. Is it a SQL query gone rogue, or is a Redis operation causing the issue?

Check out our GitHub repo & give it a try. We would love any feedback on what you like or what doesn’t make sense. We are also active on Slack, so give us a shout there and we would be happy to answer any questions or help you set things up.

Monitoring and Observability related questions? (What should I write about?)

Pranay Prateek — Sun, 10 May 2020 12:04:18 +0000

Hello!

I'm the co-founder of SigNoz, a lightweight application monitoring tool, and I live and breathe monitoring & observability :)

I'd like to contribute to the Dev community, answering questions relating to monitoring & observability.

What questions do you all have? / What should I write about?

Pranay

Ask DEV: LightWeight APM for Kubernetes using OpenTelemetry?

Pranay Prateek — Thu, 02 Apr 2020 06:52:48 +0000

After going through monitoring and tracing solutions in Prometheus, DataDog, NewRelic, and other players like LightStep, HoneyComb, Instana, etc, I still don't see a product that is simple and easy to use for people who don't need to do the heavyweight RCA.

DataDog still remains the only option for companies that spend less than 2000 USD per month on APM solutions, but it seems very complex to me. Another option is shifting to OSS tools such as Prometheus, OpenTracing and OpenTelemetry. But then you need to spend a lot of time learning PromQL, setting up HA and storage, and building Grafana dashboards.

Vendors that do tracing don't seem to sample data. They keep every trace so that metrics can be computed over traces and RCA stays possible, which comes at a huge storage cost (the pricing plans of these vendors can make a small company sweat). Sending data when my application is running fine seems to add little value for the cost.

I see a gap for a product that addresses the low ticket-size users (< $2000 spend per month on APM) of all APM players with the plans below, and that is based on OSS tools like Prometheus/OpenTracing/OpenTelemetry:

Plan 1 - 40% of the cost of other vendors (only metrics) - Converting OpenTracing instrumentation to useful Prometheus metrics, as in chapter 11 of Mastering Distributed Tracing. Fairly detailed metrics from an APM perspective, like RPS + latencies + slowest queries of Redis, Mongo, MySQL, etc., plus metrics aggregated by application endpoint.

Plan 2 - 60% of the cost of other vendors (metrics + sampled traces) - Tail-based sampling triggered by anomalies found in the metrics gathered in Plan 1. This will send only the traces needed for debugging the anomaly and thus will be a huge cost saver.

Plan 3 - 100% of the cost by other vendors (100% of traces) - Full-fledged enterprise plan sending full traces for RCA and better debugging.
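
To illustrate the idea behind Plan 2, here is a minimal sketch of anomaly-driven tail-based sampling. The decision is made only after a trace has completed, and the trace is kept when it has an error or is much slower than a baseline of recent latencies. The "mean plus k standard deviations" rule, the function names and the trace fields are assumptions for illustration, not a description of any vendor's sampler.

```python
from statistics import mean, pstdev


def anomaly_threshold(baseline_ms, k=3.0):
    """Mean plus k standard deviations of recent latencies (one simple anomaly rule)."""
    return mean(baseline_ms) + k * pstdev(baseline_ms)


def tail_sample(traces, baseline_ms, k=3.0):
    """Keep only completed traces that look anomalous; everything else is dropped."""
    limit = anomaly_threshold(baseline_ms, k)
    return [
        trace for trace in traces
        if trace["has_error"] or trace["duration_ms"] > limit
    ]


# Hypothetical data: baseline latencies from the Plan 1 metrics, plus three finished traces.
baseline = [90, 110, 90, 110]
traces = [
    {"id": "a", "duration_ms": 120, "has_error": False},  # normal, dropped
    {"id": "b", "duration_ms": 500, "has_error": False},  # slow, kept
    {"id": "c", "duration_ms": 80, "has_error": True},    # errored, kept
]
kept_ids = [trace["id"] for trace in tail_sample(traces, baseline)]
```

Because healthy traces are discarded before they are shipped, storage and ingestion costs scale with the number of anomalies rather than with total traffic.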

My current understanding is all the APM players are focussing on higher ticket customers (Plan 3) right now.

Wondering if a solution that is lightweight and cost-effective (< 2000 USD per month) and built natively for Kubernetes would be interesting to you? What are the features you would like to see there?

Folks, what are some conferences in DevOps/SRE space that you look forward to?

Pranay Prateek — Mon, 10 Feb 2020 13:12:49 +0000

This is what I have till now. Please add any suggestions in comments