Task | Datasets | NLP | #Cat. | #Vid. | #Frames | #Tracks | #Boxes |
---|---|---|---|---|---|---|---|
SOT | OTB2013 [1] | ✖ | 10 | 51 | 29K | 51 | 29K |
SOT | VOT2017 [2] | ✖ | 24 | 60 | 21K | 60 | 21K |
SOT | TrackingNet [3] | ✖ | 21 | 31K | 14M | 31K | 14M |
SOT | LaSOT [4] | ✔ | 70 | 1.4K | 3.52M | 1.4K | 3.52M |
SOT | TNL2K [5] | ✔ | - | 2K | 1.24M | 2K | 1.24M |
GSOT | GOT-10k [6] | ✖ | 563 | 10K | 1.5M | 10K | 1.5M |
GSOT | Fish [7] | ✖ | 1 | 1.6K | 527.2K | 8.25K | 516K |
MOT | MOT17 [8] | ✖ | 1 | 14 | 11.2K | 1.3K | 0.3M |
MOT | MOT20 [9] | ✖ | 1 | 8 | 13.41K | 3.45K | 1.65M |
MOT | Omni-MOT [10] | ✖ | 1 | - | 14M+ | 250K | 110M |
MOT | DanceTrack [11] | ✖ | 1 | 100 | 105K | 990 | - |
MOT | TAO [12] | ✖ | 833 | 2.9K | 2.6M | 17.2K | 333K |
MOT | SportsMOT [13] | ✖ | 1 | 240 | 150K | 3.4K | 1.62M |
MOT | Refer-KITTI [14] | ✔ | 2 | 18 | 6.65K | 637 | 28.72K |
GMOT | AnimalTrack [15] | ✖ | 10 | 58 | 24.7K | 1.92K | 429K |
GMOT | GMOT-40 [16] | ✖ | 10 | 40 | 9K | 2.02K | 256K |
GMOT | Refer-GMOT40 (Ours) | ✔ | 10 | 40 | 9K | 2.02K | 256K |
GMOT | Refer-Animal (Ours) | ✔ | 10 | 58 | 24.7K | 1.92K | 429K |

In our research, we extend two existing generic multiple object tracking (GMOT) datasets, GMOT-40 and AnimalTrack, with textual descriptions. The extended datasets are named 'Refer-GMOT40' and 'Refer-Animal'.
'Refer-GMOT40' includes 40 videos covering 10 different types of real-world objects, with each type having 4 video sequences. 'Refer-Animal' contains 26 videos focusing on 10 common types of animals.
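Each video in these datasets is annotated with two kinds of labels: a text label and a track label.
For the text label: a JSON record giving the class name, class synonyms, definition, included and excluded attributes, caption, and the path to the track file.
For the track label: each line contains 9 elements, separated by commas:
<frame>, <id>, <bb_left>, <bb_top>, <bb_width>, <bb_height>, <conf>, <x>, <y>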
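
The 9-element track label line described above can be read with a few lines of Python. This is a minimal sketch, not the authors' loader: the `TrackBox` class and the `parse_track_line` and `group_by_track` helpers are hypothetical names introduced here, and it assumes `<frame>` and `<id>` are integers while the remaining fields are numeric.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrackBox:
    """One track label line (hypothetical container, field names follow the format above)."""
    frame: int
    track_id: int
    bb_left: float
    bb_top: float
    bb_width: float
    bb_height: float
    conf: float
    x: float
    y: float


def parse_track_line(line: str) -> TrackBox:
    """Parse one comma-separated line with exactly 9 elements."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 9:
        raise ValueError(f"expected 9 elements, got {len(fields)}: {line!r}")
    # Assumption: frame and id are integer-valued, everything else is numeric.
    frame, track_id = int(float(fields[0])), int(float(fields[1]))
    rest = [float(value) for value in fields[2:]]
    return TrackBox(frame, track_id, *rest)


def group_by_track(lines: List[str]) -> Dict[int, List[TrackBox]]:
    """Group boxes by object ID and sort each track by frame number."""
    tracks: Dict[int, List[TrackBox]] = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        box = parse_track_line(line)
        tracks.setdefault(box.track_id, []).append(box)
    for boxes in tracks.values():
        boxes.sort(key=lambda box: box.frame)
    return tracks
```

Grouping by `<id>` is one convenient view for following a single object through time; grouping by `<frame>` would work the same way if per-frame access is needed.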
The annotations are formatted in JSON, and we provide examples to illustrate how they are structured. This data, prepared by 4 annotators, will be shared publicly.
Text label for referring with specific attributes:

    {
      "video": "",
      "label": {
        "class_name": "",
        "class_synonyms": [],
        "definition": "",
        "include_attributes": [],
        "exclude_attributes": [],
        "caption": "",
        "track_path": ""
      }
    }

Track label for associating objects' IDs through time:

    1, 1, xl, yt, w, h, 1, 1, 1
    1, 2, xl, yt, w, h, 1, 1, 1
    1, 3, xl, yt, w, h, 1, 1, 1
    2, 1, xl, yt, w, h, 1, 1, 1
    2, 2, xl, yt, w, h, 1, 1, 1
    2, 3, xl, yt, w, h, 1, 1, 1
    3, 1, xl, yt, w, h, 1, 1, 1
    3, 2, xl, yt, w, h, 1, 1, 1
    3, 3, xl, yt, w, h, 1, 1, 1

Example text labels for the videos "airplane-1" and "car-1":

    {
      "video": "airplane-1",
      "label": {
        "class_name": "helicopter",
        "class_synonyms": ["airplane", "aircraft", "jet", "plane"],
        "definition": "a vehicle designed for flight in the air",
        "include_attributes": ["black", "flying"],
        "exclude_attributes": [],
        "caption": "Track all black flying helicopters",
        "track_path": "airplane_01.txt"
      }
    }

    {
      "video": "car-1",
      "label": {
        "class_name": "car",
        "class_synonyms": ["vehicle", "automobile", "auto", "transport", "transportation"],
        "definition": "mechanical device designed for transportation, powered by an engine or motor, equipped by four wheels",
        "include_attributes": ["white headlight", "oncoming traffic"],
        "exclude_attributes": ["red taillight", "opposite traffic"],
        "caption": "Track white headlight cars while excluding red taillight cars",
        "track_path": "car_01.txt"
      }
    }
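
To illustrate how a text label like the ones above might be consumed, the following sketch loads one record and checks a set of attributes against it. It assumes the record is stored as strict JSON with quoted keys. The helper names `load_text_label` and `matches_label` are hypothetical, and the matching rule (an object is referred to when it has every include attribute and no exclude attribute) is an interpretation of the captions for illustration, not a rule stated by the dataset.

```python
import json
from typing import Set

# Fields of the "label" object, taken from the text label template.
REQUIRED_LABEL_KEYS = {
    "class_name", "class_synonyms", "definition",
    "include_attributes", "exclude_attributes", "caption", "track_path",
}


def load_text_label(raw: str) -> dict:
    """Parse one text label record and check that the expected fields exist."""
    record = json.loads(raw)
    missing = REQUIRED_LABEL_KEYS - set(record["label"])
    if missing:
        raise KeyError(f"label is missing fields: {sorted(missing)}")
    return record


def matches_label(object_attributes: Set[str], label: dict) -> bool:
    """Assumed rule: all include attributes present, no exclude attribute present."""
    include = set(label["include_attributes"])
    exclude = set(label["exclude_attributes"])
    return include <= object_attributes and not (exclude & object_attributes)


# The "car-1" example from this document, written as strict JSON.
CAR_RECORD = """
{
  "video": "car-1",
  "label": {
    "class_name": "car",
    "class_synonyms": ["vehicle", "automobile", "auto", "transport", "transportation"],
    "definition": "mechanical device designed for transportation, powered by an engine or motor, equipped by four wheels",
    "include_attributes": ["white headlight", "oncoming traffic"],
    "exclude_attributes": ["red taillight", "opposite traffic"],
    "caption": "Track white headlight cars while excluding red taillight cars",
    "track_path": "car_01.txt"
  }
}
"""

record = load_text_label(CAR_RECORD)
label = record["label"]
print(label["caption"], "->", label["track_path"])
```

Per-object attribute sets such as `{"white headlight", "oncoming traffic"}` are not part of the annotation files; in practice they would come from whatever model or heuristic describes each detected object.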