What makes a task vector actually work for faster inference?

In-context learning (ICL) lets LLMs adapt to new tasks from a handful of demonstrations, but every demonstration lengthens the prompt and slows inference. Task vectors address this by compressing the demonstrations into hidden states, yet existing methods are evaluated only on whether they work, not why. The authors introduce a metric that directly measures how closely a task vector's predictions match the output distribution of in-context learning, then use it to design Linear Task Vector (LTV), which minimizes this gap via closed-form regression. Across eight benchmarks and five LLMs, LTV improves accuracy by 9.2% while cutting latency, and task vectors from larger models even boost smaller models' performance by 6.4%.
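
To make "closed-form regression" concrete, here is a minimal sketch of one way such a fit could look. It is not the authors' implementation. It assumes the regression maps hidden states computed without demonstrations onto hidden states computed with them, and it uses ridge regularization for numerical stability. The summary specifies neither choice. The function names, the `ridge` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_linear_map(zero_shot_states, icl_states, ridge=1e-3):
    """Closed-form ridge regression from zero-shot to ICL hidden states.

    Hypothetical helper: the choice of inputs, targets, and ridge penalty
    is an assumption, not something stated in the summary above.
    """
    X = np.asarray(zero_shot_states, dtype=np.float64)
    Y = np.asarray(icl_states, dtype=np.float64)
    # Append a constant column so the learned map includes a bias term.
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    gram = X_aug.T @ X_aug + ridge * np.eye(X_aug.shape[1])
    # Solve (X^T X + ridge * I) W = X^T Y instead of forming an inverse.
    return np.linalg.solve(gram, X_aug.T @ Y)


def apply_linear_map(weights, zero_shot_states):
    """Apply the fitted map to new zero-shot hidden states."""
    X = np.asarray(zero_shot_states, dtype=np.float64)
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    return X_aug @ weights


# Synthetic stand-ins for real hidden states (illustration only).
rng = np.random.default_rng(0)
hidden_dim = 8
zero_shot = rng.normal(size=(64, hidden_dim))
true_map = rng.normal(size=(hidden_dim, hidden_dim))
icl = zero_shot @ true_map + 0.5

weights = fit_linear_map(zero_shot, icl, ridge=1e-8)
approx = apply_linear_map(weights, zero_shot)
max_error = float(np.max(np.abs(approx - icl)))
```

Because the solution comes from a single linear solve, the fit needs no gradient-based training loop, which is the usual appeal of a closed-form approach. In this noise-free synthetic setup the fitted map recovers the target states almost exactly.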