We conduct extensive experiments to empirically validate the performance of our proposed TP-GMOT on the GMOT problem, covering both detection with CS-OD and association with MAC-SORT. Our strategy helps bridge the gap between human intention and computer understanding, providing flexibility in tracking objects with distinctive characteristics that follow the input text.
Input text prompt | Tracked objects |
---|---|
"Track red color car while excluding yellow, blue, black, white color car" | Red cars |
"Track yellow color car while excluding red, blue, black, white color car" | Yellow cars |
"Track dark blue color car while excluding red, yellow, black, white color car" | Dark blue cars |
"Track metal shell airplane" | Airplanes |
"Track flying helicopter" | Helicopters |
We present a detailed comparison in the six tables below. The first two tables highlight the differences between our TP-GMOT approach and the existing one-shot GMOT on the Refer-GMOT dataset. In the subsequent four tables, we demonstrate the effectiveness and generalizability of TP-GMOT by comparing it with other state-of-the-art fully supervised MOT methods on the Refer-GMOT40, Refer-Animal, DanceTrack, and MOT20 datasets. We also conduct an ablation study on the parameter θ (theta), which measures the similarity between two vectors and plays a crucial role in balancing motion and appearance during tracking with MAC-SORT.
Trackers | Detectors | # shot | HOTA↑ | MOTA↑ | IDF1↑ |
---|---|---|---|---|---|
SORT | OS-OD | one-shot | 30.05 | 20.83 | 33.90 |
SORT | CS-OD (Ours) | zero-shot | 56.43 | 66.72 | 67.51 |
DeepSORT | OS-OD | one-shot | 27.82 | 17.96 | 30.37 |
DeepSORT | CS-OD (Ours) | zero-shot | 50.54 | 60.21 | 57.93 |
ByteTrack | OS-OD | one-shot | 29.88 | 20.29 | 34.70 |
ByteTrack | CS-OD (Ours) | zero-shot | 55.87 | 64.79 | 69.79 |
MOTRv2 | OS-OD | one-shot | 23.76 | 13.87 | 25.17 |
MOTRv2 | CS-OD (Ours) | zero-shot | 32.93 | 18.70 | 33.48 |
OC-SORT | OS-OD | one-shot | 29.00 | 19.96 | 32.85 |
OC-SORT | CS-OD (Ours) | zero-shot | 56.06 | 63.69 | 68.85 |
Deep-OCSORT | OS-OD | one-shot | 30.37 | 21.10 | 34.74 |
Deep-OCSORT | CS-OD (Ours) | zero-shot | 55.74 | 65.53 | 66.54 |
Average gains by CS-OD across all trackers | | | +22.78↑ | +37.61↑ | +28.73↑ |
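The average-gain row can be re-derived from the per-tracker scores as the mean, over the six trackers, of the zero-shot score minus the one-shot score. The following is a minimal sketch of that arithmetic; the dictionary layout and the function name `average_gains` are illustrative assumptions and not part of TP-GMOT.

```python
from statistics import mean

METRICS = ("HOTA", "MOTA", "IDF1")

# (HOTA, MOTA, IDF1) per tracker, copied from the table above.
ONE_SHOT = {  # OS-OD detector
    "SORT": (30.05, 20.83, 33.90),
    "DeepSORT": (27.82, 17.96, 30.37),
    "ByteTrack": (29.88, 20.29, 34.70),
    "MOTRv2": (23.76, 13.87, 25.17),
    "OC-SORT": (29.00, 19.96, 32.85),
    "Deep-OCSORT": (30.37, 21.10, 34.74),
}
ZERO_SHOT = {  # CS-OD detector
    "SORT": (56.43, 66.72, 67.51),
    "DeepSORT": (50.54, 60.21, 57.93),
    "ByteTrack": (55.87, 64.79, 69.79),
    "MOTRv2": (32.93, 18.70, 33.48),
    "OC-SORT": (56.06, 63.69, 68.85),
    "Deep-OCSORT": (55.74, 65.53, 66.54),
}


def average_gains(baseline, ours):
    """Mean per-metric difference (ours - baseline) over all trackers."""
    return {
        name: mean(ours[t][i] - baseline[t][i] for t in baseline)
        for i, name in enumerate(METRICS)
    }


if __name__ == "__main__":
    for name, gain in average_gains(ONE_SHOT, ZERO_SHOT).items():
        print(f"{name}: +{gain:.2f}")
```

The computed means agree with the reported gains up to rounding in the second decimal place.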
Trackers | HOTA↑ | MOTA↑ | IDF1↑ |
---|---|---|---|
SORT | 56.43 | 66.72 | 67.51 |
DeepSORT | 50.54 | 60.21 | 57.93 |
ByteTrack | 55.87 | 64.79 | 69.79 |
OC-SORT | 56.06 | 63.69 | 68.85 |
Deep-OCSORT | 55.74 | 65.53 | 66.54 |
MOTRv2 | 32.93 | 18.70 | 33.48 |
MAC-SORT (Ours) | 58.58 | 67.77 | 71.70 |
Trackers | Detectors | Category Agnostic | HOTA↑ | MOTA↑ | IDF1↑ |
---|---|---|---|---|---|
SORT | FRCNN | × | 42.80 | 55.60 | 49.20 |
DeepSORT | FRCNN | × | 32.80 | 41.40 | 35.20 |
ByteTrack | YOLOX | × | 40.10 | 38.50 | 51.20 |
TransTrack | YOLOX | × | 45.40 | 48.30 | 53.40 |
QDTrack | YOLOX | × | 47.00 | 55.70 | 56.30 |
OC-SORT | YOLOX | × | 56.93 | 65.02 | 67.48 |
Deep-OCSORT | YOLOX | × | 57.24 | 68.05 | 62.01 |
MOTRv2 | YOLOX | × | 52.07 | 57.08 | 62.07 |
MAC-SORT (Ours) | YOLOX | × | 57.86 | 68.32 | 63.01 |
MAC-SORT (Ours) | CS-OD (Ours) | ✔ | 57.29 | 66.46 | 68.37 |
Trackers | Detectors | Category Agnostic | HOTA↑ | MOTA↑ | IDF1↑ |
---|---|---|---|---|---|
SORT | YOLOX | × | 47.80 | 88.20 | 48.30 |
DeepSORT | YOLOX | × | 45.80 | 87.10 | 46.80 |
MOTDT | YOLOX | × | 39.20 | 84.30 | 39.60 |
ByteTrack | YOLOX | × | 47.10 | 88.20 | 51.90 |
OC-SORT | YOLOX | × | 52.10 | 87.30 | 51.60 |
Deep-OCSORT | YOLOX | × | 58.53 | -- | 59.06 |
Deep-OCSORT† | YOLOX | × | 49.36 | 84.82 | 48.89 |
MAC-SORT (Ours) | YOLOX | × | 53.78 | 86.85 | 54.06 |
MAC-SORT (Ours) | CS-OD (Ours) | ✔ | 48.75 | 81.74 | 48.02 |
MAC-SORT on MOT20-testset with the MOT task. We compare our MAC-SORT with other SORT-based MOT methods. Because ByteTrack and OC-SORT use different thresholds for the testset sequences and an offline interpolation procedure, we also report scores with these disabled, denoted ByteTrack† and OC-SORT†. Because Deep OC-SORT uses separate weights for the YOLOX object detector, we also report scores obtained by retraining YOLOX on MOT20-trainset, denoted Deep OC-SORT†.
Trackers | HOTA↑ | MOTA↑ | IDF1↑ |
---|---|---|---|
MeMOT | 54.1 | 63.7 | 66.1 |
FairMOT | 54.6 | 61.8 | 67.3 |
GSDT | 53.6 | 67.1 | 67.5 |
CSTrack | 54.0 | 66.6 | 68.6 |
ByteTrack | 61.3 | 77.8 | 75.2 |
OC-SORT | 62.4 | 75.7 | 76.3 |
Deep-OCSORT | 63.9 | 75.6 | 79.2 |
ByteTrack† | 60.4 | 74.2 | 74.5 |
OC-SORT† | 60.5 | 73.1 | 74.4 |
Deep OC-SORT† | 59.6 | 75.3 | 75.2 |
MAC-SORT (Ours) | 62.6 | 75.2 | 76.9 |
Standard MOT metrics: HOTA, MOTA. ID metrics: IDF1, IDR.

θ | HOTA↑ | MOTA↑ | IDF1↑ | IDR↑ |
---|---|---|---|---|
22.5° | 57.54 | 66.82 | 69.29 | 64.16 |
45° (default) | 59.26 | 68.03 | 70.86 | 68.39 |
67.5° | 58.06 | 67.37 | 70.46 | 65.24 |
80° | 58.21 | 65.81 | 70.13 | 68.30 |
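Since θ is described as a measure of similarity between two vectors and is swept in degrees, a small sketch of the underlying angle computation may help. This section does not specify which vectors MAC-SORT compares or how the angle enters the motion-appearance balance, so the functions below (`angle_between_deg`, `within_theta`) are hypothetical helpers that only illustrate computing an angle and comparing it with a threshold such as the default 45°.

```python
import math


def angle_between_deg(u, v):
    """Angle in degrees between two non-zero vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def within_theta(u, v, theta_deg=45.0):
    """Hypothetical check: is the angle between u and v at most theta_deg?"""
    return angle_between_deg(u, v) <= theta_deg


if __name__ == "__main__":
    print(angle_between_deg((1.0, 0.0), (1.0, 1.0)))
    print(within_theta((1.0, 0.0), (0.0, 1.0)))
```

A smaller angle means the two vectors point in more similar directions, so the threshold values in the ablation (22.5°, 45°, 67.5°, 80°) range from a strict to a permissive similarity requirement under this reading.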