TAU Benchmark Leaderboard
Compare model performance on Taiwan Audio Understanding tasks
The human baseline (🏆) represents the upper bound for model performance, with approximately 84% accuracy on single-hop and 83% on multi-hop questions.
| Rank | Model | Params (B) | Single-hop Acc | Multi-hop Acc | Submission Date |
|---|---|---|---|---|---|
| 🏆 | Human Baseline (Upper Bound) | - | ~84% | ~83% | - |
Submit Your Results
Ready to evaluate your model on TAU? Contact the authors to discuss your submission and add your model to the leaderboard.
Primary Author: even.dlion8@gmail.com
Supervisor: hungyilee@ntu.edu.tw