Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer
Yanjun Zhao*, Sizhe Dang*,
Haishan Ye,
Guang Dai,
Yi Qian,
Ivor W. Tsang
ICLR 2025, github
HiZOO is a diagonal-Hessian-informed zeroth-order optimizer and the first work to leverage the
diagonal Hessian to enhance zeroth-order optimization for fine-tuning LLMs.
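As a rough illustration of the idea, here is a minimal NumPy sketch of a Hessian-informed zeroth-order step: a two-point finite-difference gradient estimate whose probe direction is rescaled by a running diagonal-Hessian estimate. The function names, the EMA-style Hessian update, and the hyper-parameters are assumptions for the sketch, not the paper's exact formulas.

```python
import numpy as np

def hizoo_style_step(loss_fn, theta, sigma_diag, lr=1e-3, mu=1e-3, beta=0.1):
    """One illustrative Hessian-informed zeroth-order update (sketch only)."""
    z = np.random.randn(*theta.shape)                  # Gaussian probe direction
    u = z / np.sqrt(sigma_diag)                        # precondition probe with the diagonal Hessian estimate
    loss_plus = loss_fn(theta + mu * u)
    loss_minus = loss_fn(theta - mu * u)
    loss_zero = loss_fn(theta)
    g_hat = (loss_plus - loss_minus) / (2 * mu) * u    # two-point gradient estimate along u
    # crude running estimate of the diagonal Hessian from the same probe losses (illustrative rule)
    h_probe = np.abs(loss_plus + loss_minus - 2 * loss_zero) / mu**2 * z * z
    sigma_diag = (1 - beta) * sigma_diag + beta * np.maximum(h_probe, 1e-8)
    theta = theta - lr * g_hat                         # descend along the preconditioned estimate
    return theta, sigma_diag

# toy usage on a quadratic loss
theta, sigma = np.zeros(10), np.ones(10)
theta, sigma = hizoo_style_step(lambda w: float(np.sum((w - 1.0) ** 2)), theta, sigma)
```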
|
A Differentiable Sparse Vector Quantization (SVQ) for Spatio-Temporal
Forecasting
Chao Chen*,
Tian Zhou*,
Yanjun Zhao,
Liang Sun,
Qian Yi,
Rong Jin
KDD 2025, github
SVQ leverages sparse regression for succinct representation, which is theoretically
and practically favored over classical clustering-based vector quantization methods.
|
GCformer: An Efficient Framework for Accurate and Scalable Long-Term Multivariate Time Series Forecasting
Yanjun Zhao*,
Ziqing Ma*,
Tian Zhou*,
Liang Sun,
Mengni Ye, Qian Yi
CIKM 2023, github
GCformer combines a structured global convolutional branch for processing long input sequences with a local Transformer-based branch for
capturing short, recent signals.
|
SABER: Switchable and Balanced Training for Efficient LLM Reasoning
Kai Zhao*,
Yanjun Zhao*, Jiaming Song, Shien He,
Lusheng Zhang, Qiang Zhang, Tianjiao Li
arXiv, 2025
We propose SABER (Switchable and Balanced Training for Efficient LLM Reasoning), a reinforcement learning framework
that endows LLMs with user-controllable, token-budgeted reasoning.
|
FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed
Sizhe Dang*, Yangyang Guo*,
Yanjun Zhao*,
Haishan Ye, Xiaodong Zheng, Guang Dai, Ivor Tsang
arXiv, 2025
FZOO reduces the total forward passes needed for convergence by employing batched one-sided estimates that adapt step sizes
based on the standard deviation of batch losses, while accelerating per-batch computation through Rademacher random vector perturbations.
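A minimal NumPy sketch of the flavor of such an update, assuming a generic loss function: Rademacher probes, one-sided differences against a single base loss, and a step size scaled by the spread of the perturbed-batch losses. The names, the averaging, and the exact scaling rule are illustrative assumptions, not FZOO's precise procedure.

```python
import numpy as np

def fzoo_style_step(loss_fn, theta, mu=1e-3, n_probes=8, eta=1e-2):
    """Illustrative one-sided batched zeroth-order update (sketch only)."""
    base_loss = loss_fn(theta)                                    # single unperturbed forward pass
    probes = [np.sign(np.random.randn(*theta.shape))              # Rademacher (+/-1) directions
              for _ in range(n_probes)]
    losses = np.array([loss_fn(theta + mu * u) for u in probes])  # one-sided perturbed losses
    g_hat = sum((l - base_loss) / mu * u                          # average one-sided estimates
                for l, u in zip(losses, probes)) / n_probes
    step = eta / (losses.std() + 1e-8)                            # step size adapted to the batch-loss spread
    return theta - step * g_hat
```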
|
Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization for Enhanced Time Series Forecasting
Yanjun Zhao*,
Tian Zhou*,
Chao Chen,
Liang Sun,
Qian Yi,
Rong Jin
arXiv, 2024, github
Sparse-VQ works together with Reverse Instance Normalization (RevIN) to reduce the impact of noise
and capture sufficient statistics for forecasting, serving as an alternative
to the feed-forward network (FFN) layer in the Transformer architecture.
|