
A ClickHouse segfault in query planning (and the 5-line fix)

TL;DR

A dev added isNotNull() to a filter on the right side of an ANY LEFT JOIN, and ClickHouse started segfaulting during join optimization with allow_experimental_analyzer=1. The root cause was operator[] default-constructing ColumnWithTypeAndName with type == nullptr. The fix was to use find() and a typed stub for missing arguments, plus a regression test.

We had just migrated a KPI dashboard from Postgres to ClickHouse, the kind of route that gets hammered all day and always manages to be both business-critical and full of filters. The payoff was immediate: queries that used to be “fine if nobody touches the filters” suddenly felt effortless.

About 30 minutes into the day, a dev sent me a Slack message that sounded innocuous: they had added a WHERE condition and the query no longer worked. From the tRPC backend it surfaced as ECONNRESET and a socket hang up. That is usually the part where you argue with a client library, stare at timeouts, and then discover you were pointing at the wrong host.

This time, the client was innocent. ClickHouse was crashing and getting restarted by Docker.

At first, container stdout and stderr did not look helpful, so I went digging into the ClickHouse logs. The first useful line I found was the one you never want to see from a database server: a segmentation fault. Even worse, it was happening during query planning and optimization, before execution had even started.

That is the point where “my query is wrong” stops being interesting and “the engine should never do this” becomes the only thing that matters. We all joke about blaming caches and cosmic rays, but this time the crash was very real.

Setup

This happened with:

  • ClickHouse 25.12 in Docker on bare metal
  • Native protocol on port 9000 from the TypeScript client
  • allow_experimental_analyzer = 1 enabled intentionally because we needed a feature behind it
  • A single dashboard route in dev that reliably triggered the crash

The repro uses HTTP on port 8123 for convenience, even though the crash also happens over the native protocol.

I later rebuilt ClickHouse from source at 26.1.1.1 to get a clean stack trace with file and line numbers, and used gdb with gef to capture the backtrace. The code quality really stood out here: after about 37 minutes of compiling Rust, C, and C++ code, the build finished on the first try with zero warnings.

Minimal reproduction

I reduced the original query down to something I could paste into an issue without bringing our whole schema along. The goal was to keep only the ingredients that mattered: an ANY LEFT JOIN, a filter referencing the right side, and an isNotNull(...) call with the experimental analyzer enabled.

Start ClickHouse:

docker run --rm -p 8123:8123 -p 9000:9000 \
  -e CLICKHOUSE_USER=default \
  -e CLICKHOUSE_PASSWORD=default \
  clickhouse/clickhouse-server:25.12

Create a minimal schema and data set:

CREATE TABLE Session
(
  id String,
  site Enum8('STORE_A' = 1, 'STORE_B' = 2),
  device Enum8('DESKTOP' = 1, 'MOBILE' = 2)
)
ENGINE = MergeTree
ORDER BY id;

CREATE TABLE AddedToCart
(
  sessionId String,
  order Int32,
  top Nullable(Int32),
  screenHeight Nullable(Int32),
  screenWidth Nullable(Int32),
  isPromotion UInt8,
  date DateTime64(3)
)
ENGINE = MergeTree
ORDER BY (sessionId, date);

INSERT INTO Session (id, site, device) VALUES
  ('s1', 'STORE_A', 'DESKTOP'),
  ('s2', 'STORE_B', 'MOBILE');

INSERT INTO AddedToCart (sessionId, order, top, screenHeight, screenWidth, isPromotion, date) VALUES
  ('s1', 1, 100, 400, 1024, 1, parseDateTime64BestEffort('2026-01-19T12:00:00.000Z', 3)),
  ('s2', 2, 100, 400, 1024, 1, parseDateTime64BestEffort('2026-01-19T12:00:01.000Z', 3));

Enable the experimental analyzer and run the query:

SET allow_experimental_analyzer = 1;

SELECT
  s.site AS site,
  if(
    (a.order IS NULL) OR (a.order <= 0) OR (a.order > 30),
    NULL,
    accurateCastOrNull(a.order, 'Int32')
  ) AS page_level,
  count() AS count
FROM AddedToCart AS a
ANY LEFT JOIN Session AS s ON a.sessionId = s.id
WHERE (a.top IS NOT NULL)
  AND (a.screenHeight IS NOT NULL)
  AND (a.screenHeight > 0)
  AND (a.isPromotion = _CAST(1, 'UInt8'))
  AND (s.device = 'DESKTOP')
  AND isNotNull(s.site)
GROUP BY site, page_level
ORDER BY site ASC, page_level ASC
FORMAT JSONEachRow;

A few sanity checks made the trigger conditions obvious:

  • If I replaced ANY LEFT JOIN with ANY INNER JOIN, the crash disappeared.
  • If I removed isNotNull(s.site), the crash disappeared.
  • If I disabled allow_experimental_analyzer, the crash disappeared.

So the crash was not random and it was not about data size. It was a specific interaction between join optimization, partial evaluation, and a type-aware function (isNotNull) running at planning time.

What the stack trace pointed to

Once I rebuilt ClickHouse locally (26.1.1.1) to get symbols, the trace consistently passed through this chain:

QueryPlanOptimizations::tryConvertAnyOuterJoinToInnerJoin
  -> filterResultForNotMatchedRows (src/Processors/QueryPlan/Optimizations/Utils.cpp)
  -> ActionsDAG::evaluatePartialResult (src/Interpreters/ActionsDAG.cpp)
  -> executeActionForPartialResult
  -> FunctionIsNotNull::getConstantResultForNonConstArguments (src/Functions/isNotNull.cpp:61)
  -> elem.type->onlyNull() deref when type == nullptr

In other words, ClickHouse was crashing while trying to optimize the query plan, not while executing the join itself. The optimizer was evaluating parts of the filter expression to decide whether an outer join could be simplified, and it ended up calling into isNotNull() with an argument that was not in a valid state.

That last part is the whole story, but the analyzer flag decides whether this path runs at all.

Why it only happens with the analyzer

In FunctionIsNotNull::getConstantResultForNonConstArguments, ClickHouse short-circuits when the old analyzer is used:

if (!use_analyzer)
    return nullptr;

use_analyzer comes from allow_experimental_analyzer (renamed to enable_analyzer, with the alias preserved), so the crash path only executes when the new analyzer is enabled.

Why ClickHouse evaluates filters during join optimization

ClickHouse represents filters and projections as an expression graph, an ActionsDAG (directed acyclic graph). During planning, the optimizer runs a pass that tries to turn outer joins into inner joins when it is safe to do so.

The reasoning is sound: a LEFT JOIN keeps left rows even when the right side does not match, but if your WHERE clause references the right side in a way that removes “not matched” rows anyway, then the outer join is doing extra work without changing the result. If the optimizer can prove those rows would be filtered out, it can replace the outer join with an inner join.

Proving that requires evaluating the filter under the “not matched rows” scenario, which means some right-side expressions are unknown. The evaluator is designed to tolerate that. It can partially fold constants where possible and leave the rest as unknown.
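
To make "fold constants where possible and leave the rest unknown" concrete, here is a toy three-valued fold over an AND of predicates, written as a standalone C++ sketch. Partial and foldAnd are invented names for illustration; this is not the ActionsDAG code, just the shape of the idea.

#include <optional>
#include <vector>

// Three-valued result of evaluating one predicate at planning time:
// true / false if it can be decided without data, std::nullopt if unknown.
using Partial = std::optional<bool>;

// Fold an AND of predicates the way a tolerant partial evaluator would:
// a single known-false child decides the result, unknown children are tolerated.
Partial foldAnd(const std::vector<Partial> & children)
{
    bool has_unknown = false;
    for (const auto & child : children)
    {
        if (child.has_value() && !*child)
            return false;            // one definite false decides the conjunction
        if (!child.has_value())
            has_unknown = true;      // unknown, but still well-formed
    }
    return has_unknown ? Partial{} : Partial{true};
}

This is the shape of answer the optimizer wants for not-matched rows: a definite false means the outer join can safely become an inner join, while an unknown simply means the conversion is skipped.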

The crash happened because “unknown” accidentally became “invalid”.

The root cause: operator[] inserted a poisoned default

Inside ActionsDAG::evaluatePartialResult, there is a map from DAG nodes to partially evaluated columns: node_to_column, a std::unordered_map<const Node *, ColumnWithTypeAndName>.

ColumnWithTypeAndName is defined as:

struct ColumnWithTypeAndName
{
    ColumnPtr column; // may be nullptr for headers
    DataTypePtr type; // expected to be set
    String name;
};

The comment explicitly allows column == nullptr to represent headers, but it never suggests type can be null. Default construction (ColumnWithTypeAndName()) leaves both column and type null.

Not every child node has a value at partial evaluation time, especially when the expression references the right side of an outer join and the optimizer is simulating “not matched” rows.

That is intentional. filterResultForNotMatchedRows runs partial evaluation with:

allow_unknown_function_arguments = true
throw_on_error = false

Some right-side columns are expected to be unknown, as long as their type metadata survives.

The original code built function arguments like this:

arguments[i] = node_to_column[node->children[i]];

That looks like a read, but std::unordered_map::operator[] is not a read. If the key is missing, it inserts a new entry with a default-constructed value.

That means missing children became column == nullptr, type == nullptr, name == "".
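
Here is the operator[] trap in isolation, with a stand-in struct instead of ColumnWithTypeAndName (nothing below is ClickHouse code): a lookup through operator[] on a missing key silently inserts a default-constructed value with null pointer members, while find() leaves the map untouched.

#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// A stand-in for ColumnWithTypeAndName: two smart pointers and a name.
struct Value
{
    std::shared_ptr<int> column;   // null by default
    std::shared_ptr<int> type;     // null by default
    std::string name;
};

int main()
{
    std::unordered_map<std::string, Value> map;

    Value v = map["missing"];           // looks like a read, but inserts {nullptr, nullptr, ""}
    assert(map.size() == 1);
    assert(v.type == nullptr);          // the "poisoned" state from the bug

    auto it = map.find("also_missing"); // a real read: nothing is inserted
    assert(it == map.end());
    assert(map.size() == 1);
    return 0;
}

That insert-on-read is exactly what turned a missing child node into an argument with a null type inside evaluatePartialResult.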

So a missing argument (which was supposed to be “unknown but still well-formed”) became an argument whose type pointer was literally null. That is a broken state for most of the function evaluation machinery, and it becomes fatal as soon as a function wants to inspect the argument type.

isNotNull() does exactly that during constant folding:

if (elem.type->onlyNull()) ...
if (canContainNull(*elem.type)) ...

So it eventually dereferenced the null type pointer in FunctionIsNotNull::getConstantResultForNonConstArguments.
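
To see why the function needs the type at all, here is a hedged paraphrase of the folding decision, not the actual ClickHouse source; Nullability and foldIsNotNull are invented names, and the real code works on the argument's DataTypePtr instead.

#include <optional>

// Simplified nullability classification of an argument's type.
enum class Nullability { OnlyNull, NeverNull, MayBeNull };

// Returns a constant result if it can be decided from the type alone,
// std::nullopt if the answer depends on the actual data.
std::optional<bool> foldIsNotNull(const Nullability * arg_type)
{
    if (arg_type == nullptr)
        return std::nullopt;  // the real code assumes a non-null type and dereferences it

    switch (*arg_type)
    {
        case Nullability::OnlyNull:  return false;  // isNotNull of an always-NULL value is 0
        case Nullability::NeverNull: return true;   // a non-nullable value is never NULL
        case Nullability::MayBeNull: return std::nullopt;
    }
    return std::nullopt;
}

The real implementation asks the same question through elem.type->onlyNull() and canContainNull(*elem.type), both of which assume elem.type is set.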

This is the kind of bug that feels almost unfair because nothing in SQL is “wrong”. The query is valid. The optimizer is doing something reasonable. The crash comes from a subtle C++ detail: using operator[] on a map in a context where missing keys are expected.

The fix: represent “unknown” as typed, not as null

I did not try to redefine ClickHouse’s partial evaluation rules. The pass explicitly allows unknown arguments, and that is necessary for the optimization to exist at all.

The fix was to stop turning “missing” into “default-constructed invalid” by removing operator[] usage and using find() instead. When a child was missing from the map, I created a typed stub using the node metadata: {nullptr, child->result_type, child->result_name}.

That keeps the column pointer null (the value is still unknown), but it preserves the type so type-based functions can reason about it without crashing.

The change is small and localized:

const auto * child = node->children[i];

if (auto it = node_to_column.find(child); it != node_to_column.end())
    arguments[i] = it->second;
else
    arguments[i] = ColumnWithTypeAndName{nullptr, child->result_type, child->result_name};

This lines up with existing semantics: ColumnWithTypeAndName already treats column == nullptr as a valid “header,” and the fix keeps that header typed, which is the invariant the expression pipeline expects.

With this in place, isNotNull() can fold constants safely even when some arguments are missing, and the optimizer can decide whether the outer join conversion is valid without taking the entire server down.

Regression test

I added a regression test that mirrors the reproduction steps: create the two tables, insert the small data set, enable allow_experimental_analyzer, and run the query. The important assertion is simply that the query executes and the server remains alive.

I did not try to exhaustively test every join variant here. The intent was to lock in the exact scenario that produced a segfault and make it impossible to reintroduce silently.

Impact

The patch does not change query semantics and it does not introduce measurable overhead in practice. It replaces an accidental insertion into an unordered map with a lookup and a small stub object, which is negligible compared to planning and optimization work.

More importantly, it moves the engine from “a valid query can crash the server” back to the expected behavior, which is that a query either runs or fails with a normal error.

Takeaways

If you are using ClickHouse with allow_experimental_analyzer = 1, keep in mind that optimizer paths can be exercised more aggressively, and that is often where edge cases hide. If you see an ECONNRESET from a single query and nothing else looks wrong, it is worth checking whether the server actually survived the request before you start blaming the client.

Also, in C++, map[key] is a surprisingly expensive way to learn a life lesson.
