Needle in a haystack: measuring the impact of two nginx RCEs

Two critical CVEs, 35,633 configs scraped from GitHub, and a question: does anyone actually write nginx configs that trigger these bugs?

May 29, 2026

We had a lot of fun hacking nginx earlier this year. We know from experience that finding a real RCE in nginx is hard, especially one that triggers in a default or commonly-used configuration.

So when F5 disclosed CVE-2026-42945 (better known as nginx-rift) and CVE-2026-9256 (possibly nginx-poolslip), two critical heap buffer overflows in the nginx rewrite engine, the natural question was: how many real-world configurations are actually vulnerable?

To answer that, we built ngxray, a static vulnerability scanner for nginx configs, and pointed it at GitHub.

The bugs

Both CVEs are heap buffer overflows in nginx's rewrite-phase script engine. They're distinct bugs, but they share a root cause: the engine sizes a buffer in one pass and fills it in another. A heap overflow arises when certain directive combinations cause the two passes to disagree on how much space is needed.

CVE-2026-42945: the stale flag

When a rewrite replacement contains ?, the script engine compiles a call to ngx_http_script_start_args_code, which sets e->is_args = 1. This flag tells the capture-copy function to URI-escape data: + becomes %2B, a 3x size increase.

When the rewrite finishes, regex_end_code resets e->quote but, before the fix, did not reset e->is_args:

e->quote = 0;
// e->is_args = 0;  <-- missing before the fix

If the rewrite carries no flag (none of last, break, redirect, or permanent), the engine continues to the next directive with the stale flag still set.

This creates three distinct overflow scenarios, depending on what comes after the flagless rewrite.

The set case. A subsequent set $var $1 invokes ngx_http_script_complex_value_code(). This function creates a zeroed sub-engine for the length pass:

ngx_memzero(&le, sizeof(ngx_http_script_engine_t));  // le.is_args = 0

It measures the buffer at raw capture length. But the copy pass runs through the main engine e where e->is_args = 1, so ngx_http_script_copy_capture_code applies ngx_escape_uri and writes up to 3x more than the buffer holds.

location ~ ^/api/(.*)$ {
    rewrite ^/api/(.*)$ /internal?migrated=true;
    set $original_endpoint $1;    # $1 copied with stale is_args=1
}

This is the variant described in the original nginx-rift report.
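The size/copy disagreement in the set case can be modeled in a few lines of Python. This is a toy sketch, not nginx code: the Engine class, escape_args, and the helper names are hypothetical stand-ins, and only + is treated as escapable.

```python
# Toy model of the two-pass mismatch in the set case; not nginx code.
# Assumption: only '+' is escapable, expanding to "%2B".


def escape_args(data: str) -> str:
    """Stand-in for ngx_escape_uri: expand each '+' to '%2B'."""
    return data.replace("+", "%2B")


class Engine:
    """Stand-in for ngx_http_script_engine_t; only is_args is modeled."""

    def __init__(self, is_args: bool = False) -> None:
        self.is_args = is_args


def capture_len(engine: Engine, capture: str) -> int:
    """Length pass: bytes the engine believes the capture needs."""
    return len(escape_args(capture)) if engine.is_args else len(capture)


def capture_copy(engine: Engine, capture: str) -> str:
    """Copy pass: bytes actually written for the capture."""
    return escape_args(capture) if engine.is_args else capture


def set_from_capture(main: Engine, capture: str) -> tuple:
    """Model `set $var $1`: size with a zeroed sub-engine, copy with main."""
    le = Engine()                        # like ngx_memzero(&le, ...): is_args = 0
    allocated = capture_len(le, capture)
    written = len(capture_copy(main, capture))
    return allocated, written


main = Engine()
main.is_args = True                      # left set by a flagless rewrite with '?'
allocated, written = set_from_capture(main, "+" * 24)
print(allocated, written, written - allocated)
```

With a clean main engine (is_args = 0) the two passes agree and the difference is zero, which is why the ordering of rewrite and set matters.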

The if case. The mechanism here is identical to the previous case, albeit with a different syntax. Both funnel the captured argument (e.g. $1) through ngx_http_rewrite_value(). The set handler calls it on the assigned value, and the if-condition handler calls it on the right-hand side of the comparison.

When that argument contains a variable, the function emits a ngx_http_script_complex_value_code, with its zeroed length sub-engine and stale-is_args copy pass. This is the exact vulnerable code path discussed in the set case.

location ~ ^/api/(.*)$ {
    rewrite ^/api/(.*)$ /internal?migrated=true;
    if ($request_method = $1) {    # $1 on the right-hand side hits the same bug
        return 204;
    }
}

Not all if operators are affected. The = and != comparisons send the right-hand side through ngx_http_rewrite_value(), the same path set uses, as do the -f/-d/-e file tests when applied to a capture. The regex operators (~, ~*, !~, !~*) instead compile it as a regular-expression pattern, a different code path that never builds the mismatched buffer. So if ($uri ~* $1) is safe, while if ($request_method = $1) is not.
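The operator split can be written down as a small classifier. This is a sketch under assumptions: the function and constant names are hypothetical, the condition is assumed to arrive already split into tokens, and the negated file tests (!-f, !-d, !-e) are assumed to behave like their positive forms, which the analysis above does not state.

```python
import re

# Operators described above as routing their operand through
# ngx_http_rewrite_value(). The negated file tests are an assumption.
COMPARISONS = {"=", "!="}
FILE_TESTS = {"-f", "-d", "-e", "!-f", "!-d", "!-e"}
REGEX_OPERATORS = {"~", "~*", "!~", "!~*"}


def references_capture(value: str) -> bool:
    """True if the value mentions a positional capture such as $1."""
    return re.search(r"\$[1-9]", value) is not None


def if_condition_may_overflow(tokens) -> bool:
    """Classify the tokens inside `if (...)`, e.g. ["$request_method", "=", "$1"]."""
    if len(tokens) == 3:
        _, operator, right = tokens
        if operator in COMPARISONS:
            return references_capture(right)
        if operator in REGEX_OPERATORS:
            return False                 # compiled as a pattern, different code path
    if len(tokens) == 2 and tokens[0] in FILE_TESTS:
        return references_capture(tokens[1])
    return False


print(if_condition_may_overflow(["$request_method", "=", "$1"]))
print(if_condition_may_overflow(["$uri", "~*", "$1"]))
```

The classifier only answers whether the operand takes the affected code path. A preceding flagless rewrite with ? is still required for an actual overflow.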

As with the set case, the if must appear after the rewrite in source order. If it runs first, is_args is still 0 and nothing overflows.

One thing worth noting: if{} blocks in nginx's rewrite module compile into the same code array as the parent location. A rewrite inside an if{} block and a set outside it still execute in the same engine run. The is_args flag leaks across the if boundary.

The rewrite-chain case. The stale flag can also overflow inside a second rewrite's own replacement. The first rewrite (with ? and no flag) sets e->is_args = 1 and continues. The second rewrite enters regex_start_code, which before the hardening fix did not reset is_args.

When the second rewrite has no named variables in its replacement (only $1, $2, etc.), regex_start_code takes a fast path for the length calculation. This fast path doesn't use a sub-engine at all. It computes the buffer size inline, adding each capture's raw byte count directly. Because is_args was not reset at the top of the function, the stale flag from the first rewrite is still alive on the main engine e.

The copy pass then calls ngx_http_script_copy_capture_code for each $N. That function checks e->is_args, sees it's 1, and applies ngx_escape_uri. The length pass measured raw bytes, but the copy pass writes escaped bytes. This results in the same mismatch as the set case, just inside a different code path.

location / {
    rewrite ^/(.*)$ /stage/$1?x=1;               # sets is_args, no flag
    rewrite ^/stage/(.*)$ /destination/$1 break;  # $1 sized raw, copied escaped
}

This variant is harder to trigger in practice because the URI produced by the first rewrite must actually match the second rewrite's regex. If the first rewrites to /index.php and the second expects ^/admin/(.*), they'll never chain.

In all three cases, the request must contain bytes that expand under URI escaping (like + becoming %2B) in the captured portion. The escaping is gated on e->request->quoted_uri || e->request->plus_in_uri. Without escapable characters, the size/copy mismatch is zero and no overflow occurs.

CVE-2026-9256: the budget undercount

This one lives in the fast path of regex_start_code, which handles rewrites where the replacement has no named variables. Before the fix, the length calculation budgeted escape space once over the entire URI:

e->buf.len += 2 * ngx_escape_uri(NULL, r->uri.data, r->uri.len,
                                  NGX_ESCAPE_ARGS);

Then it added each capture's raw byte count. But when capture groups are nested, like ^/((.*))$, $1 and $2 cover the same URI bytes. The copy pass escapes those bytes once per $N reference, exceeding the budget.

rewrite ^/((.*))$ http://backend/$1$2 redirect;

The rewrite must trigger URI escaping (redirect, permanent, http://..., or ? in the replacement), and the replacement must reference positional captures whose groups contain each other.
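The undercount is easy to see with a little arithmetic. The sketch below is a simplified model, not nginx code: it assumes only + is escapable, ignores the literal parts of the replacement (they cost the same in both passes), and uses hypothetical function names.

```python
import re

ESCAPABLE = {"+"}  # assumption: toy set, each escapable byte grows by 2 bytes


def count_escapes(data: str) -> int:
    """Stand-in for ngx_escape_uri(NULL, ...): number of bytes needing escape."""
    return sum(ch in ESCAPABLE for ch in data)


def budget_before_fix(uri: str, referenced) -> int:
    """Escape space budgeted once over the whole URI, plus raw capture lengths."""
    return 2 * count_escapes(uri) + sum(len(c) for c in referenced)


def bytes_written(referenced) -> int:
    """The copy pass escapes every referenced capture independently."""
    return sum(len(c) + 2 * count_escapes(c) for c in referenced)


def simulate(regex: str, uri: str, refs) -> tuple:
    """Return (budget, written) for a replacement referencing groups `refs`."""
    match = re.match(regex, uri)
    referenced = [match.group(n) for n in refs]
    return budget_before_fix(uri, referenced), bytes_written(referenced)


nested = simulate(r"^/((.*))$", "/" + "+" * 8, [1, 2])
disjoint = simulate(r"^/(.*)/(.*)$", "/++/++", [1, 2])
print(nested, disjoint)
```

In this model the nested pattern budgets 32 bytes and writes 48, while the disjoint pattern budgets and writes 12. The gap only opens when two referenced groups cover the same URI bytes.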

Scraping GitHub

Unfortunately, GitHub doesn't have a "give me all nginx configs" button. nginx configurations can be found not just in .conf files, but also inside Dockerfiles, shell heredocs, Jinja2 templates, ERB, Puppet manifests, Kubernetes ConfigMaps, Helm values, and Markdown documentation. A naive search for filename:nginx.conf misses most of the surface area.

Our collector runs over 100 distinct GitHub Code Search queries:

Direct configs: language:Nginx, filenames like nginx.conf and default.conf, paths under conf.d/ and sites-available/
Template formats: .j2, .erb, .tmpl, .mustache
Embedded configs: Dockerfiles with COPY or heredocs writing to /etc/nginx, Kubernetes YAML with nginx ConfigMap data
Documentation: Markdown and RST with fenced nginx code blocks

Each query is paginated up to GitHub's 10-page limit. Results are deduplicated by content hash. When the collector encounters a Dockerfile, it follows COPY sources back into the same repository to fetch the referenced config files. We made every part of the run resumable, because GitHub's rate limits mean you'll hit a wall eventually.
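Deduplicating by content hash could look roughly like the sketch below. The choice of SHA-256 and the function name are assumptions, since the collector's actual hashing scheme isn't described here.

```python
import hashlib


def dedupe_by_content(files):
    """Keep the first (path, content) pair seen for each distinct content hash.

    `files` is an iterable of (path, bytes) pairs; SHA-256 is an assumed choice.
    """
    seen = set()
    unique = []
    for path, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            continue                     # same bytes already collected elsewhere
        seen.add(digest)
        unique.append((path, content))
    return unique


downloads = [
    ("repo-a/nginx.conf", b"server { listen 80; }\n"),
    ("repo-b/conf.d/default.conf", b"server { listen 80; }\n"),
    ("repo-c/nginx.conf", b"server { listen 8080; }\n"),
]
print([path for path, _ in dedupe_by_content(downloads)])
```

Hashing the content rather than the path matters because the same boilerplate config is copied across many repositories under different filenames.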

The raw downloads then pass through an extraction pipeline that separates the nginx config from the wrapper content surrounding it and strips out anything the parser can't handle, such as Jinja template syntax.

What comes out the other end are clean .conf files that an nginx parser can actually tokenize. The final corpus: 35,633 parseable nginx configurations from thousands of GitHub repositories.

Parsing with nginx's own tokenizer

The parser/ directory in ngxray contains a standalone C program that compiles nginx's actual tokenizer (ngx_conf_read_token and ngx_conf_parse from src/core/ngx_conf_file.c) against a patched handler. We patched ngx_conf_handler() to log and output the parsed syntax tree:

ngx_int_t
conf_handler(ngx_conf_t *cf, ngx_int_t last)
{
    // Records every directive into a JSON syntax tree
    // instead of dispatching to nginx modules
    node = conf_node_create(tree, cf);
    conf_node_append(tree->current, node);
    ...
}

By reusing nginx's tokenizer, we avoid reinventing the wheel and ensure the scanner reads each config the same way nginx does.

The rule engine

The scanner loads vulnerability signatures from JSON rule files. Each rule specifies which directives to match, structural constraints, and semantic checks specific to the vulnerability.

For CVE-2026-42945, max_args: 2 enforces the no-flag requirement. A flagged rewrite has 3 args (regex, replacement, flag), so any rewrite with more than 2 args is safe. ordered: true ensures the rewrite appears before the set in source order.
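To make the two constraints concrete, here is a rough Python sketch of the set variant. It is not ngxray's rule engine: the function name and the flattened (name, args) input format are hypothetical, and it ignores if{} nesting.

```python
import re


def find_stale_flag_sets(directives):
    """Return (rewrite_index, set_index) pairs matching the set variant.

    `directives` is a list of (name, args) tuples in source order, a
    hypothetical flattened view of one location block.
    """
    hits = []
    armed_by = None                      # index of the first flagless '?' rewrite
    for index, (name, args) in enumerate(directives):
        if name == "set" and armed_by is not None:
            # ordered: only a set that comes after the rewrite counts
            if len(args) == 2 and re.search(r"\$[1-9]", args[1]):
                hits.append((armed_by, index))
        elif name == "rewrite" and armed_by is None:
            # max_args: 2 means regex and replacement only, so no flag
            if len(args) == 2 and "?" in args[1]:
                armed_by = index
    return hits


vulnerable = [
    ("rewrite", ["^/api/(.*)$", "/internal?migrated=true"]),
    ("set", ["$original_endpoint", "$1"]),
]
print(find_stale_flag_sets(vulnerable))
```

Swapping the two directives, or adding a flag such as last to the rewrite, makes the function return an empty list.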

For CVE-2026-9256, the overlapping_refs check does actual PCRE parsing. It maps each $N reference in the replacement back to its capture group's position in the regex, then checks whether any two referenced groups physically contain each other. not_regex: "\\$[a-zA-Z_]" ensures no named variables appear, which would force the slow path.
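A simplified version of that containment check might look like the following. This is a sketch, not the scanner's implementation: it uses a hand-rolled parenthesis walk instead of a real PCRE parser, treats every group starting with (? as non-capturing (so named groups are ignored), and all function names are hypothetical.

```python
import re


def capture_group_spans(pattern: str) -> dict:
    """Map capture-group number to the (open, close) offsets of its parentheses."""
    spans = {}
    stack = []                           # (group number or None, open offset)
    next_group = 1
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":                   # skip the escaped character
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            if pattern.startswith("(?", i):
                stack.append((None, i))  # simplification: not a numbered group
            else:
                stack.append((next_group, i))
                next_group += 1
        elif ch == ")" and stack:
            number, start = stack.pop()
            if number is not None:
                spans[number] = (start, i)
        i += 1
    return spans


def has_overlapping_refs(pattern: str, replacement: str) -> bool:
    """True if two $N references point at groups where one contains the other."""
    if re.search(r"\$[a-zA-Z_]", replacement):
        return False                     # named variable: slow path, rule does not apply
    spans = capture_group_spans(pattern)
    refs = sorted({int(n) for n in re.findall(r"\$(\d)", replacement)} & set(spans))
    for pos, a in enumerate(refs):
        for b in refs[pos + 1:]:
            (a0, a1), (b0, b1) = spans[a], spans[b]
            if (a0 < b0 and b1 < a1) or (b0 < a0 and a1 < b1):
                return True
    return False


print(has_overlapping_refs("^/((.*))$", "http://backend/$1$2"))
print(has_overlapping_refs("^/(.*)/(.*)$", "http://backend/$1$2"))
```

Referencing only one of two nested groups is not flagged, since a single reference is escaped once and stays within the budget.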

We wrote rules covering both CVEs: three variants of CVE-2026-42945 (the set, if, and rewrite-chain cases) and CVE-2026-9256. Each rule carries embedded test cases that the scanner validates on every run with python3 scan.py --test.

Results

The scanner flagged configs across several dozen repositories. The majority turned out to be PoC reproductions, scanner test fixtures, and tutorial snippets.

After triage, the hits fell into four buckets:

One real vulnerable config. point/cassea, a PHP MVC framework, ships an nginx vhost config with a language-routing rewrite chain. Here's the relevant section of the location / block:

set $controller index;
rewrite '^([^\.?&]*[^/])([?&#].*)?$' $1/$2;
rewrite '^/([a-z]{2})(/.*)$' $2?__lang=$1;          # <-- sets is_args, no flag
rewrite '^(.*)/([?&#].*)?$' $1/index.xml$2;

if ($uri ~* '^/([^/\.]{3,})(/.*)$') {
    set $controller $1;                               # <-- $1 copied with stale is_args
}

The language rewrite (the third line of the snippet) strips a two-letter prefix like /en/... and appends ?__lang=en. It has no flag, so the script engine continues with e->is_args = 1. The if block below it extracts a controller name from the rewritten URI. The set $controller $1 inside that if runs through ngx_http_script_complex_value_code with the stale flag.

The question is whether $1 inside the if can contain escapable characters. The if regex is '^/([^/\.]{3,})(/.*)$', where the first capture group matches three or more characters that aren't / or .. That includes +.

A request to /en/++++++++++++++++++++++++/whatever passes through the language rewrite (stripping /en), producing /++++++++++++++++++++++++/whatever?__lang=en. The if regex then matches, capturing ++++++++++++++++++++++++ into $1. The set sizes the buffer at 24 raw bytes, but the copy pass escapes each + to %2B, writing 72 bytes.

We built a minimal reproduction and ran it in Docker against nginx compiled with AddressSanitizer:

==1==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x511000001b48
SUMMARY: AddressSanitizer: heap-buffer-overflow src/core/ngx_string.c:1689 in ngx_escape_uri

The project itself is abandoned: a PHP5 framework last updated in 2011, 3 stars, zero forks, homepage offline. As far as we can tell, nobody is running this specific config. But the pattern it uses, language prefix stripping via flagless rewrite with ?, is a legitimate design that someone could independently arrive at.

Documentation and tutorials. A handful of repos contained the vulnerable pattern inside Markdown exercise files and blog posts. Anyone who copies these snippets into a real config inherits the bug. One recurring example is an image-processing tutorial:

rewrite ^/images/([a-z]{2})/([a-z0-9]{5})/(.*)\.(png|jpg|gif)$ /data?file=$3.$4;
set $image_file $3;

Two Chinese-language nginx tutorial repos had this pattern. We confirmed it crashes with a request to /images/en/ab12c/+++...+++.jpg, where $3 captures the plus signs and the stale is_args does the rest.

PoC and lab environments. About a dozen repos were intentional CVE reproductions: nginx-rift-private-lab, CVE-2026-42945, cve-2026-42945-nginx32-lab, and so on. These all use the standard /api/(.*) trigger from the original advisory. They're doing exactly what they're supposed to do.

Scanner test fixtures. Four repos were test cases for other nginx linting tools, with files named vulnerable.conf and bad.conf.

The chain variant

The rewrite-chain variant deserves separate mention, because it shows how the triage pipeline works.

The scanner produced 29 raw matches. Then the filters kicked in:

| Stage                              | Count |
|------------------------------------|-------|
| Raw chain-rule matches             | 29    |
| After `$scheme://` redirect filter | 28    |
| After literal-prefix filter        | 7     |
| After manual review                | 0     |

The $scheme:// filter catches rewrites where the replacement starts with http:// or $scheme. These are implicit redirects, so nginx returns a 3xx and stops processing. No chaining occurs.

The literal-prefix filter compares the first rewrite's output URI against the second rewrite's regex: if the first rewrites to /index.php and the second requires ^/admin/ads/edit/, they can't chain.
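Both filters can be approximated in a few lines. The sketch below is a conservative stand-in with hypothetical function names, not the scanner's code: it gives up on any regex containing | and only returns False when chaining is provably impossible. It also treats https:// as a redirect prefix, which follows nginx's rewrite documentation rather than the filter description above.

```python
def is_implicit_redirect(replacement: str) -> bool:
    """Replacements starting with a scheme make nginx return a redirect."""
    return replacement.startswith(("http://", "https://", "$scheme"))


def literal_prefix(regex: str) -> str:
    """Leading literal text of a '^'-anchored regex, or '' if none is certain."""
    if not regex.startswith("^") or "|" in regex:
        return ""                        # unanchored or alternation: stay conservative
    prefix = []
    i = 1
    while i < len(regex):
        ch = regex[i]
        if ch == "\\" and i + 1 < len(regex) and not regex[i + 1].isalnum():
            prefix.append(regex[i + 1])  # escaped punctuation such as \. is literal
            i += 2
            continue
        if ch in "\\.[](){}*+?$^":
            break
        prefix.append(ch)
        i += 1
    if i < len(regex) and regex[i] in "*?{" and prefix:
        prefix.pop()                     # the last character may be optional
    return "".join(prefix)


def can_chain(first_replacement: str, second_regex: str) -> bool:
    """False only when the first rewrite's output cannot match the second regex."""
    if is_implicit_redirect(first_replacement):
        return False
    target = first_replacement.split("?", 1)[0]   # rewrite regexes see the URI only
    static = target.split("$", 1)[0]              # stop at the first variable
    prefix = literal_prefix(second_regex)
    n = min(len(static), len(prefix))
    return static[:n] == prefix[:n]


print(can_chain("/index.php", "^/admin/ads/edit/"))
print(can_chain("/stage/$1?x=1", "^/stage/(.*)$"))
print(can_chain("/journo", "^/([a-z]+)/rss$"))
```

A second regex that begins with a capture group yields a literal prefix of just /, so the function cannot rule the pair out and the finding falls through to manual review.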

The remaining 7 findings all had second regexes starting with a capture group, which the scanner can't rule out statically. Manual review killed all of them. One config rewrites to /journo but the second regex requires ^/([a-zA-Z0-9]+-...)/rss$, and /journo has no - or /rss suffix. Another rewrites to /index.php but the second regex is ^/@(\w+)/(following|followers), and /index.php doesn't start with /@.

What this means

We are living through the first AI Bugmageddon, and it has produced a lot of noise alongside real findings. We've contributed to some of that noise ourselves, so we are not in a position to judge anyone. But that's exactly why this kind of triage matters: defenders need to know which CVEs apply to their infrastructure and which ones they can deprioritize.

In this instance, the bugs are real and exploitable, but their real-world impact is likely low. Both CVEs rely on config patterns that almost never appear in production: CVE-2026-42945 requires a flagless rewrite with ? followed by a set, if, or second rewrite that references positional captures; CVE-2026-9256 requires nested capture groups where the replacement references multiple overlapping groups. Out of 35,633 configs, the only vulnerable config written for actual deployment was in an abandoned project; the remaining hits were tutorials, PoCs, and test fixtures.

The caveat is that GitHub skews toward examples, tutorials, and small projects. Complex rewrite chains for language routing or URL migration tend to live in private infrastructure repos and configuration management systems that never touch public GitHub. The point/cassea pattern, language prefix stripping via a flagless ? rewrite, is a reasonable multilingual design that any organization could independently arrive at.

That said, these are still unauthenticated heap overflows. One vulnerable config in production is enough to cause denial of service or worse.

Try it

ngxray is open source. Point it at your configs:

git clone https://github.com/califio/ngxray && cd ngxray
git submodule update --init && make
python3 scan.py /etc/nginx/

If you're running nginx < 1.31.1, check your rewrite directives. Look for flagless rewrites with ? in the replacement followed by set or if using $1-$9. Look for rewrite regexes with nested capture groups whose $N references overlap.

Or just run the scanner.

Calif
