berkeley-function-calling-leaderboard
| version | 1.0.0 |
| license | |
| usage | unrestricted |
| languages | eng |
| format | json |
| channel | |
| sampling rate | |
| bit depth | |
| duration | 0 days 00:00:00 |
| files | 5251, duration distribution: each file is 0.0 s |
| repository | audb-public |
Description
The Berkeley Function Calling Leaderboard is a live leaderboard that evaluates how well different LLMs call functions (also referred to as tools). We built this dataset from our learnings to be representative of most users' function-calling use cases, for example in agents or as part of enterprise workflows. To this end, the evaluation dataset spans diverse categories and multiple programming languages (the tables include Java, JavaScript, and SQL subsets). Check out the leaderboard at gorilla.cs.berkeley.edu/leaderboard.html; further information is available at https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard.
Example
json/bfcl-v3-exec-simple/sample-89.json
[
  {
    "role": "system",
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "calculate_nutritional_needs",
          "description": "Calculates the nutritional needs of a person based on their weight, height, age, gender, activity level, and goal.",
          "parameters": {
            "type": "object",
            "properties": {
              "weight": {
                "type": "number",
                "description": "The weight of the person in kilograms."
              },
              "height": {
                "type": "number",
                "description": "The height of the person in centimeters."
              },
              "age": {
                "type": "number",
                "description": "The age of the person in years."
              },
              "gender": {
                "type": "string",
                "description": "The gender of the person. Possible options [male, female, other]."
              },
              "activity_level": {
                "type": "integer",
                "description": "The activity level of the person. Possible options [1,2,3,4,5]."
              },
              "goal": {
                "type": "string",
                "description": "The goal of the person. Possible options [lose, gain, maintain]."
              }
            },
            "required": [
              "weight",
              "height",
              "age",
              "gender",
              "activity_level",
              "goal"
            ]
          }
        }
      }
    ]
  },
  {
    "role": "human",
    "text": "I have an 80-year-old female client who is 170 cm tall, weighs 59 kg, and is quite active with an activity level of 4. She's looking to reduce her weight. Could you calculate her daily nutritional needs based on these details?"
  },
  {
    "role": "assistant",
    "tool_calls": [
      {
        "type": "function",
        "function": {
          "name": "calculate_nutritional_needs",
          "arguments": {
            "weight": 59,
            "height": 170,
            "age": 80,
            "gender": "female",
            "activity_level": 4,
            "goal": "lose"
          }
        }
      }
    ],
    "meta": {
      "source": "truth"
    }
  }
]
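The sample above pairs a tool declaration (in the `system` turn) with a ground-truth call (in the `assistant` turn). As a rough sketch of how such a record could be checked, the Python below verifies that each call names a declared function, supplies every required argument, and uses values matching the declared types. The helper `check_tool_calls` and the `JSON_TYPES` mapping are hypothetical names introduced for illustration and are not part of the dataset or of any BFCL tooling. The sketch also assumes that `arguments` is a JSON object, as in this sample; other files may differ. The embedded `SAMPLE` is a trimmed copy of the record above with the descriptions and the user turn's text omitted.

```python
import json

# Hypothetical mapping from the type names used in the tool declaration
# to Python types; not part of the dataset itself.
JSON_TYPES = {
    "number": (int, float),
    "integer": (int,),
    "string": (str,),
    "boolean": (bool,),
    "object": (dict,),
    "array": (list,),
}


def check_tool_calls(turns):
    """Return a list of problems found in the tool calls of one sample."""
    # Collect the parameter schema of every declared function.
    declared = {}
    for turn in turns:
        for tool in turn.get("tools", []):
            fn = tool["function"]
            declared[fn["name"]] = fn.get("parameters", {})

    problems = []
    for turn in turns:
        for call in turn.get("tool_calls", []):
            name = call["function"]["name"]
            args = call["function"]["arguments"]  # assumed to be a dict
            if name not in declared:
                problems.append(f"unknown function: {name}")
                continue
            schema = declared[name]
            for key in schema.get("required", []):
                if key not in args:
                    problems.append(f"{name}: missing required argument {key}")
            for key, value in args.items():
                spec = schema.get("properties", {}).get(key)
                if spec is None:
                    problems.append(f"{name}: undeclared argument {key}")
                    continue
                expected = JSON_TYPES.get(spec.get("type"))
                if expected is None:
                    continue  # unknown or missing type: nothing to check
                ok = isinstance(value, expected)
                # bool is a subclass of int in Python, so reject it explicitly
                # unless the declaration asks for a boolean.
                if isinstance(value, bool) and spec["type"] != "boolean":
                    ok = False
                if not ok:
                    problems.append(
                        f"{name}: argument {key} should be {spec['type']}"
                    )
    return problems


# Trimmed copy of the sample above (descriptions and user text omitted).
SAMPLE = """
[
  {"role": "system", "tools": [{"type": "function", "function": {
    "name": "calculate_nutritional_needs",
    "parameters": {"type": "object", "properties": {
      "weight": {"type": "number"}, "height": {"type": "number"},
      "age": {"type": "number"}, "gender": {"type": "string"},
      "activity_level": {"type": "integer"}, "goal": {"type": "string"}},
      "required": ["weight", "height", "age", "gender",
                   "activity_level", "goal"]}}}]},
  {"role": "human", "text": "..."},
  {"role": "assistant", "tool_calls": [{"type": "function", "function": {
    "name": "calculate_nutritional_needs",
    "arguments": {"weight": 59, "height": 170, "age": 80,
                  "gender": "female", "activity_level": 4,
                  "goal": "lose"}}}],
   "meta": {"source": "truth"}}
]
"""

turns = json.loads(SAMPLE)
print(check_tool_calls(turns))  # prints [] because the call matches the schema
```

The check is deliberately shallow: it does not enforce the "Possible options" listed in the parameter descriptions, since those appear only as free text rather than as a machine-readable constraint.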
Tables
| ID | Type | Columns |
|---|---|---|
| bfcl-v3-chatable | filewise | topic, turns |
| bfcl-v3-exec-multiple | filewise | topic, turns |
| bfcl-v3-exec-parallel | filewise | topic, turns |
| bfcl-v3-exec-parallel-multiple | filewise | topic, turns |
| bfcl-v3-exec-simple | filewise | topic, turns |
| bfcl-v3-irrelevance | filewise | topic, turns |
| bfcl-v3-java | filewise | topic, turns |
| bfcl-v3-javascript | filewise | topic, turns |
| bfcl-v3-live-irrelevance | filewise | topic, turns |
| bfcl-v3-live-multiple | filewise | topic, turns |
| bfcl-v3-live-parallel | filewise | topic, turns |
| bfcl-v3-live-parallel-multiple | filewise | topic, turns |
| bfcl-v3-live-relevance | filewise | topic, turns |
| bfcl-v3-live-simple | filewise | topic, turns |
| bfcl-v3-multi-turn-base | filewise | topic, turns |
| bfcl-v3-multi-turn-composite | filewise | topic, turns |
| bfcl-v3-multi-turn-long-context | filewise | topic, turns |
| bfcl-v3-multi-turn-miss-func | filewise | topic, turns |
| bfcl-v3-multi-turn-miss-param | filewise | topic, turns |
| bfcl-v3-multiple | filewise | topic, turns |
| bfcl-v3-parallel | filewise | topic, turns |
| bfcl-v3-parallel-multiple | filewise | topic, turns |
| bfcl-v3-rest | filewise | topic, turns |
| bfcl-v3-simple | filewise | topic, turns |
| bfcl-v3-sql | filewise | topic, turns |
Schemes
| ID | Dtype |
|---|---|
| topic | str |
| turns | int |