At-scale Infrastructure Monitoring

Unified AIOps-powered monitoring

Problem: Fragmented monitoring forces customers to juggle disconnected legacy Azure tools, causing delayed issue detection, longer downtime, and alert overload. These disruptions erode customer trust and satisfaction, directly impacting revenue and business growth.

Solution: An unified AIOps-powered monitoring of all health performance and health into distributed infrastructure components to automatically detect issues, surface root causes and reduce manual effort.

Impact: Actionable insights guided design improvements. With real-time visibility, intelligent alerts, and AI-driven root cause analysis, this solution consolidates signals and automates workflows—enabling faster issue detection, lowers MTTR, and reduced operational overhead to boost system reliability and customer satisfaction. Insights: Compete analysis | Hypotheses | Raw notes

Process: I collaborated with customers, multiple team leads across engineering, product management to co-create and validate the proposed design. Read more...

By developing the information architecture and data visualization strategy. This approach connected related signals across metrics, time, and resources—highlighting patterns like how a CPU spike ties to disk or network issues using timelines, filters, and visual cues. My work balanced strong teamwork with individual initiative, making it easier for users to quickly identify root causes and resolve issues faster. [hide]

Next work