
Industry
DevOps, ITOps
Target group
SREs, IT operators
Client
IBM
Position
Senior UX designer
Reducing operational complexity at enterprise scale
When companies rely on complex IT systems, things can go wrong, just like a car breaking down or a phone suddenly freezing. Large enterprises have IT operations (ITOps) teams responsible for keeping everything running, but they deal with massive amounts of data and alerts. Finding the root cause of an issue quickly is a major challenge.
What I did on the project
I contributed to this initiative as a senior designer: shaping experience strategy, aligning stakeholders around a shared vision, and ensuring design decisions translated into measurable operational impact across complex enterprise environments.

Research and alignment
We kicked off the project by defining the key challenges IT operations teams faced, using stakeholder workshops, user research, and data analysis. We identified three core challenges:
01
Overwhelming volumes of alerts with limited prioritization
02
Manual and fragmented troubleshooting workflows
03
Lack of clear system intelligence to guide operator decisions
The broader design problem was not just improving screens, but redefining how operators understand, trust, and act on AI‑driven insights.

Design goals
Leadership & Collaboration
In this initiative, I held design responsibility across a complex, distributed organization:
01
Led design direction while collaborating with ~25 engineers across 3 autonomous development teams
02
Partnered with multiple product managers across different product areas, teams, and time zones, and assigned work to a junior designer
03
Created new patterns for the Carbon Design System (I was a member of the Carbon Design System guild)
04
Facilitated async and live design reviews to maintain quality and consistency across geographies
05
Acted as the primary design counterpart to engineering and product leadership, helping resolve prioritization, scope, and trade-offs
These goals guided all design decisions and ensured alignment with engineering, product, and leadership priorities.
Implementation details
What We Shipped / Solution

Alert prioritization, insight surfacing and probability score creation
We redesigned how alerts were grouped and contextualized, shifting focus from volume to relevance. AI‑driven correlations helped operators understand root causes instead of reacting to noise.


Incident investigation workflow
A unified investigation experience allowed users to move seamlessly from detection to diagnosis, reducing context switching and manual effort.

Teams can define step-by-step runbooks, trigger automated actions when specific alerts occur, and collaborate on creating, reviewing, and deploying those workflows — reducing manual effort, inconsistency, and time to resolution.



Research & Iteration
We conducted usability testing and iterative validation with IT operators across enterprise environments:
01
Tested how clearly the interface conveyed confidence in the recommendations, patterns, and risk scores created by ML
02
Refined terminology and visual hierarchy to match real operational language
03
Iterated on workflows to reduce time spent navigating between tools
Insights continuously informed prioritization and design refinement.
Product page
I also worked with the marketing team on the product page and the go-to-market effort.

Result
We successfully onboarded customers to the new UI, resulting in an increase in usage.
Impact
Key outcomes included:
01
~25% reduction in Mean Time to Resolution (MTTR)
02
Improved operator confidence in ML‑driven insights
03
Reduced alert fatigue and manual investigation effort
04
Stronger alignment between product experience and enterprise operational needs
05
Design patterns from this project were included in the Carbon Design System
The solution influenced the broader engineering workflow and informed future platform-level enhancements.
Reflection
This project reinforced the role of design leadership in complex, AI‑driven enterprise products. Success required not only strong UX execution, but the ability to align stakeholders, translate ambiguity into direction, and design systems that scale with both technology and organizational needs.