elvis

@omarsar0· 298.4K followers

BuilderRank #816

Original Tweet

NEW Research from Google.

Integration test failures are painful because the signal is buried in messy logs. Massive output, heterogeneous systems, low signal-to-noise ratio, and unclear root causes.

This paper introduces Auto-Diagnose, an LLM-based tool deployed inside Google's Critique code review system. Auto-Diagnose analyzes failure logs, summarizes the most relevant lines, and suggests the root cause in the developer workflow where the failure is already being reviewed.

The deployment numbers are notable. In a manual evaluation of 71 real-world failures, Auto-Diagnose reached 90.14% root-cause diagnosis accuracy. After Google-wide deployment, it was used across 52,635 distinct failing tests. User feedback marked it "Not helpful" in only 5.8% of cases, and it ranked #14 in helpfulness among 370 Critique tools.

Paper: https://arxiv.org/abs/2604.12108

Learn to build effective AI agents in our academy: https://academy.dair.ai/
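As a quick sanity check on the figures quoted above, the percentages can be turned back into counts. This is a minimal sketch: the helper name `implied_count` is hypothetical, and the count of 64 correct diagnoses is inferred from 90.14% of 71, not stated in the post.

```python
def implied_count(percent: float, total: int) -> int:
    """Return the whole-number count whose share of `total` is closest to `percent`."""
    return round(percent / 100 * total)


# Figures quoted in the post.
total_failures = 71
reported_accuracy = 90.14  # percent

# Inferred, not quoted: how many of the 71 failures were diagnosed correctly.
correct = implied_count(reported_accuracy, total_failures)
accuracy = 100 * correct / total_failures

# Rank 14 out of 370 tools, expressed as a share of all tools.
rank_share = 100 * 14 / 370

print(f"implied correct diagnoses: {correct} of {total_failures}")
print(f"recomputed accuracy: {accuracy:.2f}%")
print(f"rank 14 of 370 falls within the top {rank_share:.2f}% of tools")
```

The recomputed accuracy matches the reported 90.14%, which is consistent with 64 of the 71 evaluated failures being diagnosed correctly.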

AI Classification

Whether our pipeline considers this post AI-relevant

AI Relevant

Attachment Summary

Source: image, article

The attachment is the first page of the arXiv preprint "LLM-Based Automated Diagnosis Of Integration Test Failures At Google" by Celal Ziftci, Spencer Greene, Ray Liu, and Livio Dalloro, listed with their Google New York email addresses and affiliations. The visible text includes: the Abstract, which introduces Auto-Diagnose as an LLM-based tool integrated into Google's Critique code review system and reports 90.14% root-cause accuracy on a 71-failure evaluation and a deployment covering 52,635 tests; the start of Section 1 (Introduction) on software testing and log analysis challenges; CCS Concepts; the Keywords (Software, Testing, Debugging, Diagnosis, Productivity, LLM); a Preprint Notice for ICSE 2026; and a vertical left-margin stamp reading arXiv:2604.12108v1 [cs.SE] 13 Apr 2026.

The paper details Google's use of large language models to automatically diagnose integration test failures. The summary presents this as a way to reduce manual debugging time at scale in Google's complex CI/CD pipelines, which could in turn support faster release cycles and higher reliability.

Current Stats

14.5K Views

157 Likes

36 Retweets

10 Replies

141 Bookmarks

0 Quotes

Story: Google Deploys Auto-Diagnose LLM Tool for Integration Test Failure Diagnosis

Engagement Timeline (130 snapshots)

Time	Views	Likes	Bookmarks	RTs	Replies
11:00 AM UTC	+55	+1	—	—	—
10:50 AM UTC	+59	—	—	—	—
10:40 AM UTC	+78	+2	—	+1	—
10:30 AM UTC	+48	—	—	—	—
10:20 AM UTC	+61	+1	+2	—	—
10:10 AM UTC	+52	—	—	—	—
10:00 AM UTC	+75	+1	—	—	+1
9:50 AM UTC	+6	—	—	—	—
9:40 AM UTC	+131	+1	—	—	—
9:30 AM UTC	+52	+2	—	—	—
