decompress-binary


Extract the inner ELF from a ClickHouse self-extracting `clickhouse` binary, including when its architecture differs from the host (e.g. to load an aarch64 CI core dump on an x86 workstation). Use when gdb/lldb needs real symbols from a downloaded CI/release binary, or when self-extraction by running the binary is not possible because of an architecture mismatch.

By ClickHouse, updated 6/3/2026

name: decompress-binary
description: Extract the inner ELF from a ClickHouse self-extracting clickhouse binary, including when its architecture differs from the host (e.g. to load an aarch64 CI core dump on an x86 workstation). Use when gdb/lldb needs real symbols from a downloaded CI/release binary, or when self-extraction by running the binary is not possible because of an architecture mismatch.
argument-hint: [output.elf]
disable-model-invocation: false
allowed-tools: Bash(python3:), Bash(zstd:), Bash(llvm-objdump:), Bash(llvm-nm:), Bash(file:), Bash(curl:), Bash(ls:), Bash(stat:), Read, Write

Decompress a ClickHouse Binary (Cross-Architecture)

ClickHouse release and CI clickhouse binaries are self-extracting: a small decompressor stub, followed by the zstd-compressed real ELF and a trailer.

The normal way to decompress is to run the binary once: it extracts the inner ELF in place and re-execs it. That works on the binary's own architecture, and also on a foreign architecture if qemu user-mode emulation for it is installed (e.g. qemu-aarch64 to run an aarch64 binary on x86). When qemu for the target is not available, you cannot run the binary at all and must extract the payload offline.

This skill extracts the inner ELF without executing anything, on any host.

When to use

  • You downloaded a CI/release clickhouse and gdb/lldb shows no real symbols (it only sees the decompressor stub's tiny symbol table).
  • The binary's architecture differs from the host (cannot self-extract by running).
  • You need the inner ELF to load a core dump (see ci/decrypt-cores.md for the matching core-dump decryption procedure).
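To decide which case applies, you can compare the stub's ELF e_machine field with the host. The stub is built for the same architecture as the packed binary. Below is a minimal Python sketch. The names elf_machine, arch_differs_from_host and HOST_MACHINE are hypothetical, and the table only covers x86-64 and aarch64. A True result means the binary cannot run natively, although it may still self-extract under qemu user-mode emulation.

```python
import platform
import struct

EM_X86_64 = 62     # ELF e_machine value for x86-64
EM_AARCH64 = 183   # ELF e_machine value for AArch64

# Hypothetical table mapping platform.machine() spellings to e_machine (two arches only).
HOST_MACHINE = {"x86_64": EM_X86_64, "amd64": EM_X86_64,
                "aarch64": EM_AARCH64, "arm64": EM_AARCH64}


def elf_machine(header: bytes) -> int:
    """Return e_machine from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    order = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little-endian, 2 = big-endian
    return struct.unpack_from(order + "H", header, 18)[0]  # e_machine is at offset 18


def arch_differs_from_host(path: str) -> bool:
    """True if the binary targets another machine than this host (or the host is unknown)."""
    with open(path, "rb") as f:
        machine = elf_machine(f.read(20))
    return HOST_MACHINE.get(platform.machine().lower()) != machine
```

Usage: `arch_differs_from_host("clickhouse.sfx")`. An unrecognised host architecture is treated as "differs", which errs toward offline extraction.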

Format

See utils/self-extracting-executable/types.h:

[ decompressor ELF ]
[ compressed file blobs ]
[ FileData[] ]                 # one per packed file, each followed by its name
[ MetaData (16 bytes) @ EOF ]

MetaData { uint64 number_of_files; uint64 start_of_files_data; }
FileData { uint64 start, end, name_length, uncompressed_size, umask; bool exec; }

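MetaData occupies the last 16 bytes of the file, and start_of_files_data is the file offset of the FileData array. Each 48-byte FileData record (five uint64 fields and a bool, padded to 8-byte alignment) is immediately followed by name_length bytes of file name. The compressed bytes for a file are input[start:end], where input is the whole self-extracting binary. They are zstd-compressed, possibly as multiple frames. The packed clickhouse ELF is the entry with exec = true.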
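Below is a minimal Python sketch of reading this trailer and streaming out the exec entry. It is not the repository's extract_self_extracting.py. It assumes little-endian fields (the decompressor converts them with le64toh) and the 48-byte FileData layout described above. The names read_entries, find_exec_entry and extract_entry are hypothetical, as are the MAX_FILES and MAX_NAME_LEN plausibility limits. Decompression is delegated to the zstd command-line tool, which decodes concatenated frames.

```python
import struct
import subprocess

# Assumed little-endian layouts matching the structs above.
METADATA = struct.Struct("<QQ")     # number_of_files, start_of_files_data
FILEDATA = struct.Struct("<5Q?7x")  # start, end, name_length, uncompressed_size, umask, exec, padding
assert METADATA.size == 16 and FILEDATA.size == 48

MAX_FILES = 64       # hypothetical plausibility limit
MAX_NAME_LEN = 4096  # hypothetical plausibility limit


def read_entries(path):
    """Parse MetaData and the FileData array; raise ValueError on garbage."""
    with open(path, "rb") as f:
        size = f.seek(0, 2)
        if size < METADATA.size:
            raise ValueError("file too small to hold MetaData")
        f.seek(size - METADATA.size)
        count, offset = METADATA.unpack(f.read(METADATA.size))
        # Bytes appended past the real EOF (e.g. a broken download) land here.
        if not (0 < count <= MAX_FILES) or offset >= size:
            raise ValueError(f"implausible trailer: number_of_files={count}, start_of_files_data={offset}")
        f.seek(offset)
        entries = []
        for _ in range(count):
            raw = f.read(FILEDATA.size)
            if len(raw) < FILEDATA.size:
                raise ValueError("FileData array runs past end of file")
            start, end, name_len, usize, _umask, is_exec = FILEDATA.unpack(raw)
            if name_len > MAX_NAME_LEN or not (start <= end <= size):
                raise ValueError(f"implausible FileData: {start}..{end}, name_length={name_len}")
            name = f.read(name_len).rstrip(b"\0").decode("utf-8", "replace")
            entries.append({"name": name, "start": start, "end": end,
                            "uncompressed_size": usize, "exec": is_exec})
    return entries


def find_exec_entry(entries):
    """The packed clickhouse ELF is the entry flagged exec = true."""
    for entry in entries:
        if entry["exec"]:
            return entry
    raise ValueError("no entry with exec = true")


def extract_entry(path, entry, out_path, cmd=("zstd", "-d", "-c"), chunk=1 << 20):
    """Stream input[start:end] into a decompressor whose stdout goes to out_path."""
    with open(path, "rb") as src, open(out_path, "wb") as dst:
        proc = subprocess.Popen(list(cmd), stdin=subprocess.PIPE, stdout=dst)
        src.seek(entry["start"])
        remaining = entry["end"] - entry["start"]
        try:
            while remaining > 0:
                buf = src.read(min(chunk, remaining))
                if not buf:
                    break
                proc.stdin.write(buf)
                remaining -= len(buf)
        except BrokenPipeError:
            pass  # the decompressor exited early; its status is checked below
        finally:
            try:
                proc.stdin.close()
            except BrokenPipeError:
                pass
        if proc.wait() != 0:
            raise RuntimeError(f"{cmd[0]} exited with status {proc.returncode}")
    if remaining:
        raise RuntimeError("compressed range was not fully written")
```

Streaming keeps memory use flat, which matters because the payload can be several GB. Usage: `extract_entry("clickhouse.sfx", find_exec_entry(read_entries("clickhouse.sfx")), "clickhouse.elf")`.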

Steps

  1. Download the binary from the build job for the exact commit, for example:

    curl -sSf "https://clickhouse-builds.s3.amazonaws.com/PRs/<pr>/<sha>/build_<arch>_<sanitizer>/clickhouse" -o clickhouse.sfx
    

    Find the precise URL in the build job's artifact_report_build_*.json, or via .claude/tools/fetch_ci_report.js "<pr-url>". Run the download in the foreground and let it finish. A killed or resumed curl can leave extra bytes past the real end of the file, so the trailer is no longer in the last 16 bytes. Afterwards, check that the file size matches the server's Content-Length.

  2. Extract the inner ELF:

    python3 .claude/skills/decompress-binary/extract_self_extracting.py clickhouse.sfx clickhouse.elf
    
  3. Verify it is the right build and has symbols:

    file clickhouse.elf                                          # ELF ..., not stripped, with debug_info
    llvm-objdump -s -j .note.gnu.build-id clickhouse.elf | tail  # must match the core's build id
    

    The build id must equal the one in the crash report / core. A mismatched binary yields unusable backtraces.

  4. Use it with the core dump:

    gdb clickhouse.elf core.<pid>      # or: lldb clickhouse.elf -c core.<pid>
    

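    gdb and lldb can read foreign-architecture cores for backtraces and memory inspection, because nothing from the target is executed. gdb must be built with support for the target architecture (for example the gdb-multiarch package on Debian and Ubuntu). lldb usually supports all common targets out of the box.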
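The checks in steps 1 and 3 can be scripted. Below is a minimal Python sketch with two assumptions. First, you saved the response headers with `curl -sI <url>`. Second, you use the text printed by `file clickhouse.elf`, which includes a `BuildID[...]=<hex>` field for ELF files that carry a GNU build-id note. All function names are hypothetical. The expected build id is whatever the crash report shows.

```python
import os
import re


def content_length(headers: str) -> int:
    """Content-Length from `curl -sI <url>` output; the last one wins after redirects."""
    values = re.findall(r"(?im)^content-length:\s*(\d+)\s*$", headers)
    if not values:
        raise ValueError("no Content-Length header found")
    return int(values[-1])


def download_complete(path: str, headers: str) -> bool:
    """Step 1: the local file must be exactly as long as the server says."""
    return os.path.getsize(path) == content_length(headers)


def build_id_from_file_output(text: str) -> str:
    """Step 3: pull the hex build id out of a `file clickhouse.elf` line."""
    match = re.search(r"BuildID\[[^\]]+\]=([0-9a-fA-F]+)", text)
    if match is None:
        raise ValueError("no BuildID in `file` output")
    return match.group(1).lower()


def same_build(file_output: str, expected_build_id: str) -> bool:
    """Compare with the crash report's build id, ignoring case and surrounding whitespace."""
    return build_id_from_file_output(file_output) == expected_build_id.strip().lower()
```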

Notes

  • A truncated or corrupted download is the most common failure: if the script reports an implausible number_of_files, re-download cleanly and check the size.
  • The inner ELF is large (several GB for sanitizer builds, unstripped). Make sure there is enough disk.
  • Shortcut when you can run the binary: if the host matches the binary's architecture, or qemu user-mode emulation for it is installed, just run ./clickhouse once to self-extract in place. This skill is for the case where neither is possible.
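You can check free disk space before extracting. One option is to use the uncompressed_size field of the exec FileData entry as the expected output size. The following is a minimal Python sketch; check_free_space is a hypothetical helper and the 10% margin is an arbitrary assumption.

```python
import os
import shutil


def check_free_space(out_path: str, needed_bytes: int, margin: float = 1.1) -> None:
    """Raise OSError if out_path's directory has less than needed_bytes * margin free."""
    directory = os.path.dirname(os.path.abspath(out_path))
    free = shutil.disk_usage(directory).free
    wanted = int(needed_bytes * margin)
    if free < wanted:
        raise OSError(f"{directory}: need ~{wanted / 2**30:.1f} GiB, only {free / 2**30:.1f} GiB free")
```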
Install via CLI
npx skills add https://github.com/ClickHouse/ClickHouse --skill decompress-binary