Self-Steering Optimization
SSO lets you align models without human-annotated preference data
Paper: Aligning Large Language Models via Self-Steering Optimization (23 Pages)
Researchers from the Chinese Academy of Sciences, Alibaba, and the University of Chinese Academy of Sciences set out to develop a mechanism for aligning LLMs with human preferences.
Hmm.. What’s the background?
Traditional alignment methods, such as Reinforcement Learning from Human Feedback (RLHF), rely heavily on manually annotated data, which is expensive and time-consuming to acquire. Furthermore, as LLMs grow in capability, human annotators may struggle to provide adequate supervision.
To address these challenges, researchers have focused on automated alignment, seeking to minimize human intervention in the alignment process. This approach aims to create scalable alignment systems by developing techniques for generating high-quality preference signals that can effectively replace human-annotated data.
Ok, So what is proposed in the research paper?
Self-Steering Optimization (SSO) is an automated alignment algorithm that generates preference signals based on predefined principles. SSO has demonstrated significant performance improvements across multiple benchmarks:
Subjective Benchmarks:
MT-Bench: SSO achieved an average improvement of nearly 0.5 points on this benchmark, which comprises 80 questions scored by GPT-4
AlpacaEval 2.0: SSO showed an average improvement of nearly 8% on this benchmark, which includes 805 questions evaluated by comparing answers to reference responses using GPT-4o
Objective Benchmarks:
Mathematical Reasoning: SSO exhibited benefits in mathematical reasoning tasks, likely due to the Logicality and Helpfulness preference features used in training
It outperforms baselines that use annotated data, highlighting its potential as a scalable and efficient alternative to human-annotated data. Additionally, SSO has been successfully applied to both Supervised Fine-Tuning (SFT) models and aligned (Instruct) models, indicating its versatility.
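At its core, SSO generates on-policy preference pairs by steering the model with contrastive principles. The sketch below shows that general idea only; the `generate` stub, the `make_preference_pair` helper, and the principle texts are illustrative placeholders, not the paper's implementation:

```python
# Minimal sketch of principle-based preference-signal generation in the
# spirit of SSO. The model call is stubbed; in practice it would sample
# from the policy LLM being aligned.

def generate(prompt: str, principle: str) -> str:
    """Stub for an on-policy LLM call steered by a principle."""
    # A real implementation would sample from the policy model with the
    # principle injected into the system prompt.
    return f"[response to '{prompt}' following: {principle}]"

def make_preference_pair(prompt: str, principle: str, anti_principle: str):
    """Produce a (chosen, rejected) pair from contrastive principles.

    Because both responses come from the current policy, the signals stay
    close to the model's own distribution as training progresses -- one
    motivation for on-policy signal generation.
    """
    chosen = generate(prompt, principle)         # steered toward the principle
    rejected = generate(prompt, anti_principle)  # steered against it
    return chosen, rejected

chosen, rejected = make_preference_pair(
    "Explain overfitting.",
    principle="Be logical and helpful.",
    anti_principle="Be vague and unhelpful.",
)
```

The resulting (chosen, rejected) pairs can then feed a standard preference-optimization objective in place of human-annotated comparisons.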
What’s next?
The researchers acknowledge that SSO has some limitations and identify several areas for future development:
Enhanced W and G Functions: The current weight function (W) and self-steering loss (G) used in SSO are relatively simple, leaving room for more sophisticated designs
Fully Automated SSO: The principles used in SSO are currently defined by hand; automating their generation would remove this remaining human input
Extension to Other Paradigms: While SSO is currently built on principle-based automated alignment, the researchers believe it can be extended to other automated alignment paradigms
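To make the W and G discussion concrete, here is one way a weight function can enter a preference-optimization objective. This is a generic weighted, DPO-style pairwise loss, not the paper's exact W or G; the constant weight and the log-probability inputs are placeholders:

```python
import math

def pairwise_term(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style pairwise loss term: -log sigmoid(beta * margin).

    The margin compares how much the policy prefers the chosen response
    over the rejected one, relative to a reference model.
    """
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

def weighted_loss(pairs, weight_fn, beta=0.1):
    """Average pairwise loss with each pair scaled by a weight.

    `weight_fn` plays the role SSO assigns to W: down-weighting pairs
    whose preference signal is unreliable. The uniform weight used in the
    example call below is a placeholder for a learned or heuristic score.
    """
    total = 0.0
    for p in pairs:
        w = weight_fn(p)  # e.g., confidence that chosen truly beats rejected
        total += w * pairwise_term(*p, beta=beta)
    return total / len(pairs)

# Each tuple: (logp_chosen, logp_rejected, ref_chosen, ref_rejected)
pairs = [(-1.0, -2.0, -1.2, -1.8), (-0.5, -3.0, -0.6, -2.5)]
loss = weighted_loss(pairs, weight_fn=lambda p: 1.0)
```

With a uniform weight this reduces to the ordinary pairwise objective; a nontrivial W would instead modulate how strongly each generated pair steers the update.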
Learned something new? Consider sharing it!