Search results for "SFT"
2026-04-23
04:54

Perplexity Discloses Web Search Agent Post-Training Method; Qwen3.5-Based Model Outperforms GPT-5.4 on Accuracy and Cost

Perplexity applies SFT followed by RL to Qwen3.5 models, leveraging a multi-hop QA dataset and rubric checks to boost search accuracy and efficiency, and achieves best-in-class FRAMES performance. Abstract: Perplexity's post-training workflow for web-search agents combines supervised fine-tuning (SFT), which enforces instruction-following and language consistency, with online reinforcement learning (RL) via the GRPO algorithm. The RL stage uses a proprietary multi-hop verifiable QA dataset plus rubric-based conversational data to prevent drift from SFT behavior, applying reward gating and within-group efficiency penalties. In evaluation, Qwen3.5-397B-SFT-RL achieves top FRAMES performance: 57.3% accuracy with a single tool call and 73.9% with four calls, at $0.02 per query, outperforming GPT-5.4 and Claude Sonnet 4.6 on these metrics. Pricing is API-based and excludes caching.
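The summary names three RL mechanisms: GRPO's group-relative advantages, reward gating on rubric checks, and a within-group efficiency penalty. Perplexity's code is not public, so the Python sketch below only illustrates how those pieces might compose in one group of rollouts; the function name, dict keys, reward values, and penalty form are all assumptions, not Perplexity's implementation.

```python
import numpy as np

def grpo_advantages(rollouts, efficiency_weight=0.1):
    """Group-relative advantages with reward gating and a within-group
    efficiency penalty. A minimal sketch, assuming a binary correctness
    reward; all names and values here are illustrative, not Perplexity's.

    rollouts: list of dicts with 'correct' (bool, verifiable QA check),
    'passes_rubric' (bool, e.g. language/format consistency), and
    'tool_calls' (int).
    """
    # Reward gating: a rollout earns the correctness reward only if it
    # also passes the rubric checks, so RL cannot trade away
    # instruction-following for raw accuracy.
    rewards = np.array([
        1.0 if (r["correct"] and r["passes_rubric"]) else 0.0
        for r in rollouts
    ])

    # Within-group efficiency penalty: among rewarded rollouts, subtract
    # a penalty proportional to how many more tool calls a rollout used
    # than the cheapest rewarded rollout in the same group.
    rewarded = rewards > 0
    if rewarded.any():
        calls = np.array([r["tool_calls"] for r in rollouts], dtype=float)
        min_calls = calls[rewarded].min()
        rewards[rewarded] -= efficiency_weight * (calls[rewarded] - min_calls)

    # GRPO: normalize shaped rewards within the group to get advantages.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

group = [
    {"correct": True,  "passes_rubric": True,  "tool_calls": 1},
    {"correct": True,  "passes_rubric": True,  "tool_calls": 4},
    {"correct": True,  "passes_rubric": False, "tool_calls": 1},  # gated out
    {"correct": False, "passes_rubric": True,  "tool_calls": 2},
]
print(grpo_advantages(group))  # cheapest correct rollout gets the largest advantage
```

Under this shaping, a correct answer reached with fewer tool calls outranks an equally correct but costlier one, which is consistent with the accuracy-per-tool-call framing of the FRAMES numbers above.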
05:38

Prime Intellect Launches the INTELLECT-3 Model

Decentralized AI protocol Prime Intellect has launched INTELLECT-3, a 106B-parameter mixture-of-experts (MoE) model based on the GLM 4.5 Air Base model and trained using SFT and RL. Prime Intellect closed a $15 million funding round in March of this year.