The human baseline (🏆) serves as the upper bound for model performance: approximately 84% accuracy on single-hop questions and 83% on multi-hop questions.

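| Rank | Model | Params (B) | Single-hop Acc | Multi-hop Acc | Submission Date |
|------|-------|------------|----------------|---------------|-----------------|
| 🏆 | Human Baseline (Upper Bound) | - | ~84% | ~83% | - |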

Submit Your Results

Ready to evaluate your model on TAU? Contact the authors to discuss your submission and add your model to the leaderboard.

Primary Author: even.dlion8@gmail.com

Supervisor: hungyilee@ntu.edu.tw