Leaderboard - TAU: Taiwan Audio Understanding Benchmark

The human baseline (🏆) serves as the reference upper bound for model performance: approximately 84% accuracy on single-hop questions and 83% on multi-hop questions.

Rank	Model	Params (B)	Single-hop Acc	Multi-hop Acc	Submission Date
🏆	Human Baseline Upper Bound	-	~84%	~83%	-

Submit Your Results

Ready to evaluate your model on TAU? Contact the authors to discuss your submission and add your model to the leaderboard.

Primary Author: even.dlion8@gmail.com

Supervisor: hungyilee@ntu.edu.tw