Shadcn/ui Components Break When You Stream LLM Markdown Into Them Directly
You've wired up your LLM streaming endpoint. Tokens are arriving. You're piping them into a shadcn/ui Card with a markdown renderer. The demo looks great — text streaming in, smooth appearance, everything working.
Then you start noticing things. The code blocks flash on every token. The card bounces in height as content builds. The syntax highlighting runs repeatedly during streaming and then settles into the final state. Users on slower connections see layout shifts that make the UI feel unstable. A table that appears mid-stream breaks the container width until the closing pipe character arrives.
None of these are bugs you can grep for. They're emergent behaviors that appear at the intersection of how LLMs stream tokens, how markdown parsers work, and how React and shadcn/ui manage component state. This article walks through each failure mode and the layered architecture that addresses them.
The Mental Model Mismatch
The root cause of most streaming-LLM-to-shadcn/ui problems is a mental model mismatch. Markdown parsers are built for complete documents. LLM streaming produces a character stream. The two are fundamentally incompatible without an intermediate layer.
When you stream `**bold text**` from an LLM, the tokens arrive in order: `**`, `bo`, `ld`, ` `, `tex`, `t**`. At each step, a standard markdown parser sees a different document:
- `**`: An unclosed bold tag, invalid markdown
- `**bo`: Still invalid
- `**bold text`: Almost valid but unclosed
- `**bold text**`: Finally valid
If you're re-parsing and re-rendering on every token (the naive implementation), you get:
- A flicker from invalid → valid state as the parser handles partial syntax
- Re-triggered animations on every parse cycle
- React diff overhead from structural DOM changes mid-stream
Shadcn/ui components compound this because they're built for stable rendered content. A Card component with a CardContent that changes structure on every token creates expensive re-renders. These are easy to miss in development, where the machine is fast and test prompts are short, but they become noticeable in production on slower devices and longer responses.
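The prefix states listed above can be reproduced in a few lines of TypeScript. This is a minimal sketch: `prefixesOf` and `hasUnclosedStrong` are hypothetical helper names, and the check only counts `**` markers, ignoring the escapes and code spans a real parser would handle.

```typescript
// Build the document a parser would see after each token arrives.
function prefixesOf(tokens: string[]): string[] {
  const prefixes: string[] = [];
  let accumulated = "";
  for (const token of tokens) {
    accumulated += token;
    prefixes.push(accumulated);
  }
  return prefixes;
}

// Naive check (assumption): an odd number of "**" markers means bold is still open.
function hasUnclosedStrong(text: string): boolean {
  const markerCount = text.split("**").length - 1;
  return markerCount % 2 === 1;
}

const seenByParser = prefixesOf(["**", "bo", "ld", " ", "tex", "t**"]);
const openStates = seenByParser.map((prefix) => hasUnclosedStrong(prefix));
// Only the final prefix has balanced markers; every earlier one is still open.
```

Every intermediate prefix is a document the parser has to make some decision about, which is where the flicker comes from.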

The Five Failure Modes
1. The Markdown Re-Parse Problem
Every incoming token triggers a full re-parse of the accumulated content. For a 500-token response, that's 500 parse operations, many of which produce structurally different ASTs as syntax becomes complete.
The result: React sees a different component tree on each token. Even if the visual output appears similar, React reconciliation has to diff and patch the DOM repeatedly, causing subtle flickering and unnecessary work.
The fix: Buffer tokens and parse in batches. Instead of parsing on every token, accumulate tokens in a string buffer and re-parse at most once every 50ms. That caps the work at 20 parses per second no matter how fast tokens arrive, and 50ms is short enough that the stream still feels responsive.
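A batching buffer along these lines could look like the sketch below. It is framework-agnostic and makes a few assumptions: `createTokenBuffer`, its parameter names, and the injectable `schedule` function are all hypothetical, chosen so the timing logic can be exercised without real timers.

```typescript
type Scheduler = (callback: () => void, delayMs: number) => void;

// Accumulates tokens and calls onFlush with the full text at most once per interval.
function createTokenBuffer(
  onFlush: (accumulated: string) => void,
  intervalMs: number = 50,
  schedule: Scheduler = (callback, delayMs) => { setTimeout(callback, delayMs); }
) {
  let accumulated = "";
  let dirty = false;        // true when tokens arrived since the last flush
  let timerPending = false; // true while a scheduled flush is outstanding

  const emit = (): void => {
    if (!dirty) return;     // nothing new, skip the re-parse
    dirty = false;
    onFlush(accumulated);
  };

  return {
    // Called for every incoming token; schedules at most one flush per interval.
    push(token: string): void {
      accumulated += token;
      dirty = true;
      if (!timerPending) {
        timerPending = true;
        schedule(() => {
          timerPending = false;
          emit();
        }, intervalMs);
      }
    },
    // Called when the stream ends; flushes immediately so the tail is not delayed.
    end(): void {
      emit();
    },
  };
}
```

In a React component, `onFlush` would typically be a state setter, so the markdown renderer only sees a new string once per interval rather than once per token.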
2. Animation Re-Triggering
Shadcn/ui components use CSS transitions and animations for appearance effects. If a component re-mounts or its key changes during streaming, animations re-trigger from the start. A card with a fade-in animation will repeatedly fade in as the structure changes during parsing.
The fix: Stabilize component keys during streaming. Use a streaming-stable wrapper that maintains the same React key throughout the stream, only allowing structural component changes after the stream completes.
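One way to keep keys stable is to derive them from a block's position within the message rather than from its content, since content changes on every flush. The sketch below assumes the accumulated markdown has already been split into top-level blocks; `keyBlocksByPosition` and the key format are illustrative, not part of React or shadcn/ui.

```typescript
interface KeyedBlock {
  key: string;
  content: string;
}

// Keys depend only on message id and block index, so a block that is still
// growing keeps the same key (and the same mounted component) across flushes.
function keyBlocksByPosition(messageId: string, blocks: string[]): KeyedBlock[] {
  return blocks.map((content, index) => ({
    key: messageId + ":" + index,
    content,
  }));
}

// Two consecutive snapshots of the same streaming message.
const earlierSnapshot = keyBlocksByPosition("msg-1", ["# Title", "First para"]);
const laterSnapshot = keyBlocksByPosition("msg-1", ["# Title", "First paragraph, now longer", "Next"]);
```

Index-based keys are usually discouraged for lists that reorder, but a streamed response only appends, so earlier blocks never change position.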
3. Layout Shift During Streaming
As content builds, container sizes change. A shadcn/ui Card that starts with one line of text and grows to twenty lines shifts the layout of surrounding content on every meaningful height change. On slower connections where each token arrives noticeably, users experience a bouncing interface.
The fix: Reserve height upfront using a skeleton or minimum-height constraint during streaming. A streaming-aware wrapper can show a height-stabilized skeleton during the first 500ms, then transition to actual content. This matches how the best production AI chat UIs handle the initial uncertainty about response length.
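The minimum-height constraint can be expressed as a small pure function. This is a sketch under assumptions: `nextReservedHeight` is a hypothetical helper, the 96px floor is an arbitrary placeholder, and the measured height is expected to come from whatever measurement mechanism the app already uses.

```typescript
// Returns the min-height (in px) to apply to the streaming container.
// While streaming, the reservation never drops below the floor or below
// the tallest height seen so far, so the card only grows and never bounces.
// After the stream completes, 0 is returned to release the reservation.
function nextReservedHeight(
  previousPx: number,
  measuredPx: number,
  isStreaming: boolean,
  floorPx: number = 96 // assumed placeholder value, tune per layout
): number {
  if (!isStreaming) return 0;
  return Math.max(floorPx, previousPx, measuredPx);
}

const reservedAtStart = nextReservedHeight(0, 24, true);                    // one line of text so far
const reservedAfterGrowth = nextReservedHeight(reservedAtStart, 140, true); // content grew past the floor
const reservedAfterDip = nextReservedHeight(reservedAfterGrowth, 120, true); // a re-parse made content shorter
const reservedWhenDone = nextReservedHeight(reservedAfterDip, 120, false);
```

The result would be applied as an inline `minHeight` style on the wrapper around the Card content.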
4. Code Block Highlighting Cascade
Syntax highlighting is particularly expensive during streaming. Libraries like Prism or Highlight.js run full tokenization passes on every update. A streaming code block that updates 200 times during delivery runs 200 highlighting passes.
Worse: partial code blocks are often invalid in the highlighting grammar, causing the highlighter to produce error states or fall back to plain text, then re-render to the highlighted state once the block is complete.
The fix: Defer highlighting until code blocks are complete. During streaming, render code blocks as plain <pre><code> elements. Detect when a code block is complete (closing triple backtick received) and then run syntax highlighting once. This eliminates the cascade entirely.
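Detecting block completion can be done with a line-based scan of the accumulated text. The sketch below is deliberately simplified: `splitFencedSegments` is a hypothetical helper that only recognizes triple-backtick fences at the start of a line and does not handle tilde fences, longer fences, or fences nested inside lists.

```typescript
const FENCE = "\u0060\u0060\u0060"; // three backticks

interface FenceSegment {
  kind: "text" | "code";
  lang: string;    // info string after the opening fence, "" for text
  content: string;
  closed: boolean; // false only for a code block whose closing fence has not arrived
}

function splitFencedSegments(markdown: string): FenceSegment[] {
  const segments: FenceSegment[] = [];
  let textLines: string[] = [];
  let code: { lang: string; lines: string[] } | null = null;

  const flushText = (): void => {
    if (textLines.length > 0) {
      segments.push({ kind: "text", lang: "", content: textLines.join("\n"), closed: true });
      textLines = [];
    }
  };

  for (const line of markdown.split("\n")) {
    const trimmed = line.trim();
    const isFence = trimmed.indexOf(FENCE) === 0;
    if (code === null) {
      if (isFence) {
        flushText();
        code = { lang: trimmed.slice(FENCE.length).trim(), lines: [] };
      } else {
        textLines.push(line);
      }
    } else if (isFence) {
      // Closing fence received: this block is safe to highlight once.
      segments.push({ kind: "code", lang: code.lang, content: code.lines.join("\n"), closed: true });
      code = null;
    } else {
      code.lines.push(line);
    }
  }

  flushText();
  if (code !== null) {
    // Stream is still inside a code block: render as plain text for now.
    segments.push({ kind: "code", lang: code.lang, content: code.lines.join("\n"), closed: false });
  }
  return segments;
}
```

A renderer can then pass segments with `closed: true` to the highlighter and render the rest as plain `<pre><code>`, so each block is highlighted exactly once.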
5. XSS and Content Safety
LLM output piped directly into a markdown renderer that produces raw HTML creates an XSS surface. If the LLM generates a markdown link with a javascript: URL or produces an <img> tag with an onerror handler, a naive renderer will execute it.
Shadcn/ui doesn't sanitize markdown content — that's not its job. The application layer is responsible for content safety.
The fix: Run LLM markdown output through DOMPurify or a similar sanitizer before rendering. For code blocks specifically, ensure the syntax highlighter escapes output rather than rendering raw HTML.
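To illustrate the `javascript:` URL case specifically, here is a sketch of a protocol allowlist for link targets. It is not a replacement for a full sanitizer such as DOMPurify, which covers far more than URLs; `isSafeLinkHref`, the allowlist contents, and the placeholder base URL are assumptions made for the example.

```typescript
// Protocols allowed in rendered links (assumed list, adjust to the app's needs).
const SAFE_LINK_PROTOCOLS = ["http:", "https:", "mailto:"];

// Returns true only when the href resolves to an allowlisted protocol.
// Relative URLs are resolved against a placeholder base so they parse as https.
function isSafeLinkHref(href: string): boolean {
  try {
    const parsed = new URL(href, "https://example.invalid/");
    return SAFE_LINK_PROTOCOLS.indexOf(parsed.protocol) !== -1;
  } catch {
    // Unparseable URLs are rejected rather than passed through.
    return false;
  }
}
```

Parsing with `URL` instead of matching on string prefixes matters here, because the URL parser normalizes case and strips leading whitespace before the scheme is compared.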
The Layered Architecture That Works
After working through each failure mode, the production architecture that handles all of them looks like this:
Token Stream
↓
Streaming Buffer (50ms batching)
↓
Incremental Parser (partial-markdown aware)
↓
Component-Aware Renderer (stable keys, deferred highlighting)
↓
Interaction-Aware Layer (disable interactions during stream)
↓
shadcn/ui Components (Card, CodeBlock, etc.)
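Streaming Buffer: Accumulates tokens for 50ms before triggering a re-parse. The parse count then depends on stream duration rather than token count: a 500-token response that streams in 5 seconds triggers at most 100 parses instead of 500.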
Incremental Parser: Uses a streaming-aware markdown parser (like streaming-markdown or llm-ui) that handles partial syntax without producing error states. Produces valid output at each frame even for incomplete markdown.
Component-Aware Renderer: Maintains stable React keys during streaming. Defers syntax highlighting until code blocks are complete. Uses React memo to prevent unnecessary re-renders of already-complete sections.
Interaction-Aware Layer: Disables copy buttons, expand/collapse, and other interactions during streaming. Enables them after the stream completes. Prevents broken interaction states when content is still building.
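The interaction-aware layer can be as small as a function that maps stream status to props. A minimal sketch, assuming a four-state status model; `StreamStatus`, `interactionPropsFor`, and the prop names other than the standard `aria-busy` attribute are hypothetical.

```typescript
type StreamStatus = "idle" | "streaming" | "complete" | "error";

interface InteractionProps {
  copyDisabled: boolean;
  collapseDisabled: boolean;
  "aria-busy": boolean; // tells assistive technology the region is still updating
}

// Interactions are locked only while tokens are still arriving.
function interactionPropsFor(status: StreamStatus): InteractionProps {
  const streaming = status === "streaming";
  return {
    copyDisabled: streaming,
    collapseDisabled: streaming,
    "aria-busy": streaming,
  };
}
```

Deriving all three flags from a single status value keeps them from drifting apart, for example a copy button that is enabled while the content underneath it is still changing.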

Libraries Worth Knowing in 2026
llm-ui: Purpose-built for rendering LLM output in React. Handles streaming markdown, code blocks, and custom components with proper buffer management. Opinionated but solves the standard problems.
streaming-markdown: Lightweight streaming markdown parser that produces valid output for partial input. Good if you need more control than llm-ui provides.
Vercel AI SDK: Provides useChat and useCompletion hooks that handle SSE connection management and token buffering. Pairs well with a custom streaming renderer.
react-markdown with remark-gfm: The standard markdown rendering stack, but requires wrapping with a streaming buffer to be used correctly with LLM streams.
The Production Pattern
For a shadcn/ui-based chat interface, the production implementation looks roughly like:
- Use Vercel AI SDK's `useChat` hook for connection management and token accumulation
- Buffer the accumulated content and re-parse at 50ms intervals
- Use `react-markdown` with `rehype-highlight`, deferring highlighting until code blocks are closed
- Wrap in a streaming-stable container with fixed minimum height
- Disable shadcn/ui interactive elements (copy buttons, etc.) during streaming
- Sanitize output before rendering, and run sanitization on the final output before it enters permanent chat history storage
The key insight: the LLM stream and the rendered output aren't the same thing. There needs to be an explicit layer between them that handles the translation from "character stream" to "React component tree" with buffering, stability, and safety handled explicitly.
This is the architecture that separates demo-quality streaming UIs from production-quality ones. The demo works when the network is fast and the response is short. The production system works across all the conditions your users will actually experience.





