train-sft


Run supervised fine-tuning with veRL

By Jinyeop3110, updated 2/25/2026

name: train-sft
description: Run supervised fine-tuning with veRL

Run supervised fine-tuning for protein-LLM:

  1. Verify environment:

    source /home/yeopjin/orcd/pool/init_protein_llm.sh
    python -c "import torch, verl, flash_attn; print(f'GPUs: {torch.cuda.device_count()}')"
    
  2. Check configuration:

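    • Read configs/sft_config.yaml (if it exists)
    • Verify the LoRA settings: rank r=8, applied to the attention key/value (k/v) projection matrices
    • Confirm the ESM-3 model is frozen (none of its parameters require gradients)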
  3. Pre-flight checks:

    • GPU memory is available: nvidia-smi
    • Data exists: ls ./data/pdb_2021aug02_sample/
    • Wandb is configured (output directory set): echo $WANDB_DIR
  4. Run training:

    python scripts/train_sft.py --config configs/sft_config.yaml
    
  5. Monitor:

    • Check the wandb dashboard for loss curves
    • Watch the logs for CUDA out-of-memory (OOM) errors
    • Verify that gradient norms stay stable (no sudden spikes or NaNs)
  6. Post-training validation:

    • Run evaluation on the validation set
    • Confirm checkpoints were saved: ls outputs/checkpoints/
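The one-liner in step 1 stops at the first failed import. A minimal sketch that reports every missing package at once, using only the standard library's `importlib.util.find_spec`; the helper name `missing_modules` is hypothetical and not part of the repo:

```python
import importlib.util

# Packages imported by the step-1 one-liner; extend the tuple as needed.
REQUIRED = ("torch", "verl", "flash_attn")


def missing_modules(names=REQUIRED):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Usage after sourcing the init script:
#   missing = missing_modules()
#   if missing:
#       raise SystemExit(f"Missing packages: {', '.join(missing)}")
```

`find_spec` only locates a package without executing it, so a package that is installed but fails at import time (for example, a `flash_attn` build that does not match the local CUDA) still needs the original one-liner to surface.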
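The configuration checks in step 2 can be scripted. A minimal sketch, assuming configs/sft_config.yaml uses the hypothetical keys `lora.r`, `lora.target_modules`, and `esm.freeze`, and that the k/v projections are named `k_proj`/`v_proj` (common in Llama-style models, but not confirmed for this repo); adapt the key and module names to the real file:

```python
from typing import Any, List, Mapping


def validate_sft_config(cfg: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means all checks passed.

    NOTE: lora.r, lora.target_modules and esm.freeze are hypothetical key names.
    """
    problems = []
    lora = cfg.get("lora", {})
    if lora.get("r") != 8:
        problems.append(f"expected lora.r == 8, got {lora.get('r')!r}")
    targets = set(lora.get("target_modules", []))
    if targets != {"k_proj", "v_proj"}:  # hypothetical k/v module names
        problems.append(f"expected LoRA on k/v projections only, got {sorted(targets)}")
    if cfg.get("esm", {}).get("freeze") is not True:
        problems.append("esm.freeze is not true (ESM-3 should be frozen)")
    return problems


def load_and_validate(path: str = "configs/sft_config.yaml") -> List[str]:
    import yaml  # PyYAML, imported lazily so validate_sft_config has no dependency

    with open(path) as f:
        return validate_sft_config(yaml.safe_load(f) or {})
```

Keeping the validation on a plain dict makes it easy to reuse against whatever loader the training script actually uses.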
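For a sense of scale for the r=8 k/v setting in step 2: LoRA on a weight W of shape d_out × d_in adds two low-rank factors, A (r × d_in) and B (d_out × r), i.e. r·(d_in + d_out) trainable parameters per adapted matrix. A small calculator; the layer count and dimensions in the usage line are hypothetical, not those of the repo's base model:

```python
def lora_params_per_matrix(d_in: int, d_out: int, r: int = 8) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)


def lora_kv_params(num_layers: int, hidden: int, kv_dim: int, r: int = 8) -> int:
    """LoRA parameters when adapting one k and one v projection (hidden -> kv_dim) per layer."""
    return num_layers * 2 * lora_params_per_matrix(hidden, kv_dim, r)
```

For a hypothetical 32-layer model with hidden size 4096 and 1024-dimensional k/v projections (grouped-query attention), `lora_kv_params(32, 4096, 1024)` gives 32 × 2 × 8 × (4096 + 1024) = 2,621,440 trainable parameters. Bias terms and any other unfrozen modules are not counted.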
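To confirm that ESM-3 is frozen (step 2), one option is to list the parameters that still require gradients. A sketch that works on any `torch.nn.Module` through `named_parameters()`; the attribute holding ESM-3 inside the protein-LLM model (written as `model.esm` in the usage comment) is a hypothetical name:

```python
from typing import List


def trainable_param_names(module) -> List[str]:
    """Names of parameters with requires_grad=True (works for any torch.nn.Module)."""
    return [name for name, param in module.named_parameters() if param.requires_grad]


def assert_frozen(module, label: str = "ESM-3") -> None:
    """Raise AssertionError if any parameter of `module` would receive gradients."""
    names = trainable_param_names(module)
    if names:
        raise AssertionError(f"{label} has {len(names)} trainable tensors, e.g. {names[:3]}")


# Usage (hypothetical attribute name):
#   assert_frozen(model.esm)
#   print(len(trainable_param_names(model)))  # should cover only the LoRA adapters
```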
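The pre-flight checks in step 3 can be bundled into one function. A sketch that uses the documented `nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits` query (one MiB value per GPU); the `preflight` helper and its 20,000 MiB `min_free_mib` threshold are hypothetical placeholders, not values from the repo:

```python
import os
import subprocess
from pathlib import Path
from typing import List, Optional

NVIDIA_SMI_CMD = ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"]


def parse_free_mib(csv_text: str) -> List[int]:
    """Parse one free-memory value (MiB) per GPU from the nvidia-smi query output."""
    return [int(line) for line in csv_text.splitlines() if line.strip()]


def preflight(data_dir: str = "./data/pdb_2021aug02_sample",
              min_free_mib: int = 20_000,  # placeholder threshold
              smi_output: Optional[str] = None) -> List[str]:
    """Run the step-3 checks and return a list of problems (empty means OK)."""
    problems = []
    if smi_output is None:  # query the real GPUs; pass a string to test offline
        smi_output = subprocess.run(NVIDIA_SMI_CMD, capture_output=True,
                                    text=True, check=True).stdout
    free = parse_free_mib(smi_output)
    if not free:
        problems.append("nvidia-smi reported no GPUs")
    problems += [f"GPU {i}: only {mib} MiB free" for i, mib in enumerate(free)
                 if mib < min_free_mib]
    data = Path(data_dir)
    if not data.is_dir() or not any(data.iterdir()):
        problems.append(f"data directory {data_dir} is missing or empty")
    if not os.environ.get("WANDB_DIR"):
        problems.append("WANDB_DIR is not set")
    return problems
```

The `smi_output` parameter exists so the parsing logic can be exercised on machines without GPUs.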
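For the gradient-norm check in step 5, one simple heuristic is to flag non-finite values and values far above the recent median. A sketch, assuming the per-step gradient norms are available as a list of floats (for example, exported from the wandb run history); the `window` and `spike_factor` defaults are illustrative, not values from the repo:

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def grad_norm_alerts(norms: Sequence[float], window: int = 50,
                     spike_factor: float = 10.0) -> List[Tuple[int, str]]:
    """Return (step, reason) pairs for NaN/inf norms and spikes above the recent median."""
    alerts = []
    for step, g in enumerate(norms):
        if not math.isfinite(g):
            alerts.append((step, "non-finite"))
            continue
        # Finite norms from the preceding `window` steps serve as the baseline.
        recent = [x for x in norms[max(0, step - window):step] if math.isfinite(x)]
        if recent and g > spike_factor * median(recent):
            alerts.append((step, "spike"))
    return alerts
```

A median baseline is used instead of a mean so that a single earlier spike does not inflate the threshold for later steps.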
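To go one step beyond `ls outputs/checkpoints/` in step 6, a sketch that picks the most recent checkpoint by step number. It assumes checkpoint folders end in a step count (e.g. `global_step_500`); that naming is an assumption, so check what scripts/train_sft.py actually writes:

```python
import re
from pathlib import Path
from typing import Optional

# Trailing integer in a directory name, e.g. "global_step_500" -> 500 (assumed naming).
_STEP_RE = re.compile(r"(\d+)$")


def latest_checkpoint(root: str = "outputs/checkpoints") -> Optional[Path]:
    """Return the checkpoint subdirectory with the highest step number, or None."""
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    best_step, best_dir = -1, None
    for child in root_path.iterdir():
        match = _STEP_RE.search(child.name)
        if child.is_dir() and match:
            step = int(match.group(1))
            if step > best_step:
                best_step, best_dir = step, child
    return best_dir
```

Comparing steps as integers avoids the lexical-sort trap where `global_step_200` would sort after `global_step_1000`.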
Install via CLI
npx skills add https://github.com/Jinyeop3110/Post_Training_Protein_LLM --skill train-sft
Repository details: branch main, path SKILL.md