name: project-xplus
description: Use when working in the Project-XPlus / Project-X / Project-XS codebase, especially Cuper, TAPA, U55C, Vitis/Vivado, bitstream generation, tmux hardware builds, 395bitstream synchronization, demo bitstream testing, docs/bitstream_summaries version records, source.diff, testing.md, or code_reading_guide.md.
metadata:
  short-description: Project-XPlus Cuper/TAPA/U55C workflow
Project-XPlus
This skill is only an entry point. The repository documents are the source of truth.
First Steps
- Find the repo root. Prefer the current working directory if it contains Project-XPlus; otherwise check:
/home/pyx/ProjectFS/Project-X/Project-XPlus
/home/pyx/project-x/Project-XPlus
- For narrow read-only questions, load only the relevant repository docs/sections. Before changing build scripts, kernels, bitstreams, reports, or version docs, read these files:
Project-XPlus/docs/codex/coding.md
Project-XPlus/docs/codex/testing.md
Project-XPlus/395bitstream/README.md
- If the task touches implementation naming, also read:
Project-XPlus/docs/design/implementation_versions_zh.md
- Run:
git status --short
tmux ls 2>/dev/null || true
If a hardware build is already in vpl, impl, or routing, say that source edits affect only the next build.
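The first steps above can be sketched as a small shell helper. This is a minimal sketch, not a script from the repository: `find_project_xplus` is a hypothetical name, and it assumes "repo root" means the `Project-XPlus` directory itself, found either under the current directory or at one of the two fallback paths listed above.

```shell
# Hypothetical helper: print the Project-XPlus directory, or fail if none is found.
find_project_xplus() {
  local candidate
  # Prefer the current working directory if it contains Project-XPlus.
  if [ -d "$PWD/Project-XPlus" ]; then
    printf '%s\n' "$PWD/Project-XPlus"
    return 0
  fi
  # With no arguments, fall back to the two locations named in this document.
  if [ "$#" -eq 0 ]; then
    set -- /home/pyx/ProjectFS/Project-X/Project-XPlus \
           /home/pyx/project-x/Project-XPlus
  fi
  for candidate in "$@"; do
    if [ -d "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Show working-tree state and any running tmux sessions before touching sources.
if project_dir=$(find_project_xplus); then
  (cd "$project_dir" && git status --short)
  tmux ls 2>/dev/null || true
fi
```

Passing explicit paths as arguments replaces the default fallbacks, which is convenient on machines where the checkout lives elsewhere.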
Workflow Rules
- Follow docs/codex/coding.md as the entry point. It links to detailed workflow docs under docs/codex/workflows/.
- Follow docs/codex/testing.md for demo testing and required datasets. New demo testing is demo-only by default; do not rerun the standard bitstreams unless the user explicitly asks, the standard bitstream/host changed, or old records are insufficient to interpret a mismatch.
- When updating the HTML report for a TAPA full-PCG demo, keep current diagnostic sections such as TAPA PCG 分段时间 (TAPA PCG stage times) and Init 与 1iter 差值 (Init vs. 1iter difference) on the latest demo-only measurements. Put standard/previous-demo/current-demo comparisons in a separate comparison block. In that comparison block, use standalone TAPA Cuper SpMV as the SpMV standard baseline; use TAPA full-PCG standard only for the 1iter comparison. If a demo changes stage semantics, label raw counters by their real meaning, e.g. iter recv, and compare derived metrics such as AP path = iter recv + dot_p_ap only when the formula is written in the HTML. See docs/codex/workflows/reports.md before editing HTML views.
- For a single-SpMV demo, record new data only in the SpMV/demo-only sections. Leave PCG diagnostic and 1iter data unchanged, but label those sections as not run in this round (本轮未跑 PCG,无 init/1iter 过程).
- For bitstream/build/TAPA/report/version-record work, read the matching docs/codex/workflows/*.md file before editing.
- Keep version records in docs/bitstream_summaries/<version>/.
- For code-changing demo candidates, maintain README.md, changes.md, testing.md, and, when useful, code_reading_guide.md. Update the official source.diff only after demo-only board testing confirms a performance improvement, or when the user explicitly asks to preserve a functional-boundary fix; do not overwrite the last effective source.diff for failed or slower demos.
- Store synchronized candidate bitstreams in 395bitstream/ with a -demo suffix until the user explicitly approves promotion or asks to archive them. Project-XPlus now has five Cuper mainlines. 395bitstream/ may keep up to three demo slots: one cuper-tapa-spmv single-SpMV candidate, one cuper-tapa-pcg full-PCG candidate, and one cuper-tapa-jacobi Jacobi-iteration candidate. New demos overwrite only the same-mainline demo slot; archived demos move to bitstream_archive/.
- Cuper Jacobi candidates use names like cuper-tapa-jacobi-u55c-YYYYMMDD-demo.xclbin and must not be confused with Jacobi-preconditioned PCG. There is no Jacobi standard xclbin yet.
- Do not replace standard bitstreams without archiving the old standard and updating 395bitstream/README.md.
- Before starting any TARGET=hw bitstream build, run the matching software-level validation first (sw_emu, TAPA software simulation, or a documented host/local smoke fallback). After launching a tmux hardware build, watch until the safe checkpoint in docs/codex/workflows/builds.md: XO generated and patched, then Vitis link has entered VPL/synthesis/implementation for full xclbin builds; for XO-only tasks, watch until the XO target is up-to-date.
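The demo-slot naming rule above can be sketched as a helper that builds the candidate file name for one slot. This is only an illustration: `demo_xclbin_name` is a hypothetical function, and while the `cuper-tapa-jacobi-u55c-YYYYMMDD-demo.xclbin` pattern comes from this document, applying the same `-u55c-<date>-demo.xclbin` shape to the spmv and pcg slots is an assumption.

```shell
# Hypothetical helper: print the 395bitstream/ path for a demo candidate.
# Usage: demo_xclbin_name <spmv|pcg|jacobi> [YYYYMMDD]
demo_xclbin_name() {
  local mainline=$1
  local date_stamp=${2:-$(date +%Y%m%d)}   # default to today's date
  case "$mainline" in
    spmv|pcg|jacobi) ;;                    # the three demo slots named above
    *)
      echo "unknown demo slot: $mainline" >&2
      return 1
      ;;
  esac
  # Assumed pattern; only the jacobi form is confirmed by this document.
  printf '395bitstream/cuper-tapa-%s-u55c-%s-demo.xclbin\n' \
    "$mainline" "$date_stamp"
}
```

Because each slot maps to exactly one name per date, writing a new candidate to this path touches only the same-mainline slot, which matches the overwrite rule stated above.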
Current Goal
- Current measured boundary: the one-shot CuperPcgSpmv(...) single-SpMV demo now returns through full thermal2 and is close to full/native Cuper(...) on shared successful points. Treat it as the single-SpMV regression baseline and boundary check, not the primary optimization target.
- DLC/Cuper-jacobi-iteration is the fifth Cuper mainline (cuper-tapa-jacobi), not Jacobi-preconditioned PCG. Its top is CuperJacobiIteration(...): host splits A=D+R, the vector loader feeds -x_old, the Cuper service produces -R*x_old, and the update stage writes x_next=(b+(-R*x_old))*diag_inv back into the single X buffer. It is wired into the root Makefile through cuper-jacobi-* targets and currently reports [jacobi-stage-cycles]/[jacobi-stage-ms] timing debug from Metrics[4..7]. Current records are software/TAPA simulation plus one mmap-only micro-probe xclbin artifact: 395bitstream/cuper-tapa-jacobi-u55c-20260613-demo.xclbin (UUID 380f9de1-e5c1-66ab-b888-db99d2ef3523, SHA256 7f0ff7e5b7999d77174105ea5cf0d44629a0b9a43521c8efdc29a70ace5d77f1). That artifact is CuperJacobiMmapProbeOnly(...), not the full Jacobi graph. It is timing-clean (routed WNS 0.003 ns, TNS 0) and maps Status/Metrics/Debug to HBM[24]/HBM[25]/HBM[26]. The previous full-graph entry mmap probe xclbin timed out at Finish() on thermal2_n16/thermal2_n1024, with Status[8..11], Metrics[8..11], and Debug[48..51] all zero. There is still no Jacobi standard xclbin. The next debug boundary is board testing the current mmap-only demo with the native XRT runner cuper_jacobi_mmap_probe_xrt.
- The active optimization target has moved to full CuperPcg(...) PCG control and vector update paths. Prioritize detail/pcg_controller.hpp, dot_p_ap, update_xr, update_p, P_spmv/AP_spmv consumption, controller HBM access patterns, stage timers, and service drain/stop overhead. The 2026-05-29 data shows raw SpMV/AP receive is no longer the dominant 1iter cost.
- Keep CuperPcgSpmv(...) as a Cuper-compatible one-shot graph. Do not reintroduce Pcg_Single* command/stop/writer-done control shells into the single-SpMV demo.
- To claim a full-PCG performance improvement, modify the full CuperPcg(...) path and run full-PCG software or hardware validation. A single-SpMV demo result alone is only a SpMV regression/boundary result.
- For ongoing notes, keep single-SpMV baseline records in:
docs/bitstream_summaries/2026-05-28-cuper-tapa-spmv-single-optimization/
- Keep full-PCG controller/update optimization records in:
docs/bitstream_summaries/2026-05-27-cuper-tapa-pcg-spmv-near-native-cuper/
- Keep Cuper Jacobi iteration records in:
docs/bitstream_summaries/2026-06-10-cuper-tapa-jacobi-iteration/
DLC/Cuper-jacobi-iteration/docs/testing.md
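The Jacobi update described under Current Goal can be written out step by step. This is a sketch under two assumptions that the text above does not spell out: diag_inv holds the elementwise reciprocals of the diagonal entries of D, and the `*` in the update stage is an elementwise product.

$$
\begin{aligned}
A &= D + R, \qquad A x = b \\
D\,x &= b - R\,x \\
x_{\text{next}} &= D^{-1}\bigl(b - R\,x_{\text{old}}\bigr)
  = \bigl(b + (-R\,x_{\text{old}})\bigr) \odot \text{diag\_inv},
  \qquad \text{diag\_inv}_i = \frac{1}{D_{ii}}
\end{aligned}
$$

Since the vector loader feeds -x_old, the service output R(-x_old) = -R x_old already carries the sign, so under these assumptions the update stage needs only one add and one multiply per element.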
Common Commands
Host smoke:
make cuper-tapa-pcg-fpga-host
make run-cuper-pcg-tapa-fpga DATASET=data/generated/cgsolver/n512 MAX_ITERS=1 DIFF_TOL=1e-3
Cuper Jacobi smoke:
make cuper-jacobi-build-host
MAX_ITERS=1 make cuper-jacobi-run-sw MATRIX=DLC/Cuper-jacobi-iteration/data/matrices/cant.mtx
MAX_ITERS=1 make cuper-jacobi-run-sw MATRIX=data/suitesparse/Schmid/csr/thermal2_n65536
make cuper-jacobi-regression-sw MODE=quick
Hardware builds should run in tmux and keep the shell open after completion. Use existing Makefile tmux targets when available.
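Where no Makefile tmux target exists, the "run in tmux and keep the shell open" rule could be sketched as below. `start_tmux_build` and the `DRY_RUN` variable are hypothetical names, not part of the repository; the sketch relies only on `tmux new-session -d -s <name> <shell-command>` and on `exec bash` to leave an interactive shell in the pane after the build command ends.

```shell
# Hypothetical wrapper: run a build command in a detached tmux session and
# keep a shell open afterwards so the output stays reachable.
# Usage: start_tmux_build <session-name> <command> [args...]
start_tmux_build() {
  local session=$1
  shift
  # Append "exec bash" so the pane survives after the build command exits.
  local script="$*; exec bash"
  if [ -n "${DRY_RUN:-}" ]; then
    # Dry run: print the tmux invocation instead of executing it.
    printf 'tmux new-session -d -s %s "%s"\n' "$session" "$script"
    return 0
  fi
  tmux new-session -d -s "$session" "$script"
}

# Example (placeholder target name, not a real Makefile target):
#   DRY_RUN=1 start_tmux_build xplus-hw make some-hw-target TARGET=hw
```

Running with `DRY_RUN=1` first makes it easy to confirm the session name and command before committing to a long hardware build.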
Before finalizing code/docs:
git diff --check
git status --short