A ClickHouse segfault in query planning (and the 5-line fix)

tl;dr summary

A dev added an isNotNull() filter on a right-side column of an ANY LEFT JOIN, and ClickHouse started segfaulting during join optimization with allow_experimental_analyzer = 1. The root cause was operator[] default-constructing a ColumnWithTypeAndName with type == nullptr. The fix was to use find() and a typed stub for missing arguments, plus a regression test.

We had migrated a KPI dashboard from Postgres to ClickHouse. About 30 minutes in, a dev pinged me: they added a WHERE clause and the query crashed. The backend saw ECONNRESET. ClickHouse was segfaulting and being restarted by Docker.

We joke about blaming caches, but this time the crash was real. I rebuilt ClickHouse from source at 26.1.1.1 to get debug symbols and traced the crash with gdb and gef. The code quality impressed me: after ~37 minutes of compiling Rust, C, and C++ code, the build finished on the first try with zero warnings.

Setup

This happened with:

  • ClickHouse 25.12 in Docker on bare metal
  • Native protocol on port 9000 from the TypeScript client
  • allow_experimental_analyzer = 1 enabled intentionally because we needed a feature behind it
  • A single dashboard route in dev that reliably triggered the crash

The repro uses HTTP on port 8123 for convenience, even though the crash also happens over the native protocol.

Minimal reproduction

Start ClickHouse:

docker run --rm -p 8123:8123 -p 9000:9000 \
  -e CLICKHOUSE_USER=default \
  -e CLICKHOUSE_PASSWORD=default \
  clickhouse/clickhouse-server:25.12

Create a minimal schema and data set:

CREATE TABLE Session
(
  id String,
  site Enum8('STORE_A' = 1, 'STORE_B' = 2),
  device Enum8('DESKTOP' = 1, 'MOBILE' = 2)
)
ENGINE = MergeTree
ORDER BY id;

CREATE TABLE AddedToCart
(
  sessionId String,
  order Int32,
  top Nullable(Int32),
  screenHeight Nullable(Int32),
  screenWidth Nullable(Int32),
  isPromotion UInt8,
  date DateTime64(3)
)
ENGINE = MergeTree
ORDER BY (sessionId, date);

INSERT INTO Session (id, site, device) VALUES
  ('s1', 'STORE_A', 'DESKTOP'),
  ('s2', 'STORE_B', 'MOBILE');

INSERT INTO AddedToCart (sessionId, order, top, screenHeight, screenWidth, isPromotion, date) VALUES
  ('s1', 1, 100, 400, 1024, 1, parseDateTime64BestEffort('2026-01-19T12:00:00.000Z', 3)),
  ('s2', 2, 100, 400, 1024, 1, parseDateTime64BestEffort('2026-01-19T12:00:01.000Z', 3));

Enable the experimental analyzer and run the query:

SET allow_experimental_analyzer = 1;

SELECT
  s.site AS site,
  if(
    (a.order IS NULL) OR (a.order <= 0) OR (a.order > 30),
    NULL,
    accurateCastOrNull(a.order, 'Int32')
  ) AS page_level,
  count() AS count
FROM AddedToCart AS a
ANY LEFT JOIN Session AS s ON a.sessionId = s.id
WHERE (a.top IS NOT NULL)
  AND (a.screenHeight IS NOT NULL)
  AND (a.screenHeight > 0)
  AND (a.isPromotion = _CAST(1, 'UInt8'))
  AND (s.device = 'DESKTOP')
  AND isNotNull(s.site)
GROUP BY site, page_level
ORDER BY site ASC, page_level ASC
FORMAT JSONEachRow;

Quick checks made the trigger conditions obvious: swapping to ANY INNER JOIN, removing isNotNull(s.site), or disabling allow_experimental_analyzer made the crash disappear.

What the stack trace pointed to

QueryPlanOptimizations::tryConvertAnyOuterJoinToInnerJoin
  -> filterResultForNotMatchedRows (src/Processors/QueryPlan/Optimizations/Utils.cpp)
  -> ActionsDAG::evaluatePartialResult (src/Interpreters/ActionsDAG.cpp)
  -> executeActionForPartialResult
  -> FunctionIsNotNull::getConstantResultForNonConstArguments (src/Functions/isNotNull.cpp:61)
  -> elem.type->onlyNull() deref when type == nullptr

So this was a crash during planning, not execution. The optimizer was evaluating the filter to see whether it rejects every row that an outer join would NULL-extend; when it does, the ANY LEFT JOIN can safely be converted to an inner join. While doing that evaluation it called isNotNull() with a broken argument.

Why it only happens with the analyzer

The join-conversion optimization bails out early when the new analyzer is off:

if (!use_analyzer)
    return nullptr;

use_analyzer comes from the allow_experimental_analyzer setting (since renamed to enable_analyzer, with the old name kept as an alias), so the crash path only runs when the new analyzer is enabled.

The root cause: operator[] inserted a poisoned default

ActionsDAG::evaluatePartialResult stores partial results in a std::unordered_map<const Node *, ColumnWithTypeAndName>, keyed by DAG node. The mapped struct is:

struct ColumnWithTypeAndName
{
    ColumnPtr column; // may be nullptr for headers
    DataTypePtr type; // expected to be set
    String name;
};

filterResultForNotMatchedRows intentionally runs partial evaluation with:

allow_unknown_function_arguments = true
throw_on_error = false

Some right-side columns are expected to be unknown, as long as their types survive. But arguments were built like this:

arguments[i] = node_to_column[node->children[i]];

operator[] inserts a default-constructed value when a key is missing, so missing children became column == nullptr, type == nullptr, name == "". That is invalid, not just unknown.
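
To see the footgun in isolation, here is a minimal standalone C++ sketch; Elem and node_to_column are simplified stand-ins, not the real ClickHouse types. It shows that std::unordered_map::operator[] value-initializes an entry for a missing key, grows the map as a side effect, and hands back an element whose smart-pointer members are all null.

#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Simplified stand-in for ClickHouse's ColumnWithTypeAndName.
struct Elem
{
    std::shared_ptr<void> column;    // nullptr is legal ("header" semantics)
    std::shared_ptr<const int> type; // downstream code assumes this is set
    std::string name;
};

int main()
{
    std::unordered_map<int, Elem> node_to_column;

    // Key 42 was never computed. operator[] silently inserts a
    // value-initialized Elem and returns a reference to it.
    Elem arg = node_to_column[42];

    assert(node_to_column.size() == 1); // the map grew as a side effect
    assert(arg.type == nullptr);        // and the "argument" is poisoned

    // Anything that now does arg.type->... dereferences a null pointer,
    // which is exactly what FunctionIsNotNull hit.
    return 0;
}

find(), by contrast, has no insertion side effect: it returns end() for a missing key and forces the caller to choose an explicit fallback.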

isNotNull() then dereferences elem.type during constant folding:

if (elem.type->onlyNull()) ...
if (canContainNull(*elem.type)) ...

So the segfault was just a null type pointer.

The fix: represent “unknown” as typed, not as null

Replace operator[] with find() and provide a typed stub:

const auto * child = node->children[i];

if (auto it = node_to_column.find(child); it != node_to_column.end())
    arguments[i] = it->second;
else
    arguments[i] = ColumnWithTypeAndName{nullptr, child->result_type, child->result_name};

This keeps the header semantics (column == nullptr) but preserves type and name, which is what the expression pipeline expects.
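
For illustration only, the same pattern in the toy sketch from the root-cause section above: argumentFor is a hypothetical helper, and child_key, child_type, and child_name stand in for the child node's identity, result_type, and result_name.

// Hypothetical helper, reusing the Elem stub from the sketch above: a missing
// child now degrades to a typed, named header instead of a fully-null element.
Elem argumentFor(const std::unordered_map<int, Elem> & node_to_column,
                 int child_key,
                 std::shared_ptr<const int> child_type,
                 const std::string & child_name)
{
    if (auto it = node_to_column.find(child_key); it != node_to_column.end())
        return it->second;

    // Header semantics: no data (column == nullptr), but type and name survive,
    // so constant folding such as isNotNull() can still reason about the argument.
    return Elem{nullptr, std::move(child_type), child_name};
}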

Regression test

I added a regression test that mirrors the reproduction steps and asserts the query runs without killing the server.

Impact

The patch does not change query semantics and does not add measurable overhead. It just prevents a valid query from crashing the server during planning.

Takeaways

If you are running with allow_experimental_analyzer = 1, remember that it enables more aggressive optimizer paths, and edge cases surface there first. If a query causes ECONNRESET, check whether the server survived the request before blaming the client.
