We conduct extensive experiments to empirically demonstrate the performance of our proposed TP-GMOT framework on the GMOT problem, covering both detection with CS-OD and association with MAC-SORT. Our strategy helps bridge the gap between human intention and computer understanding, providing the flexibility to track objects with distinctive characteristics that follow the input text prompts.
[Figure: qualitative example for the text prompt "Track red color car while excluding yellow, blue, black, white color car".]
Quantitative Results
We present a detailed comparison in the six tables below. The first two tables highlight the differences between our TP-GMOT approach and the existing one-shot GMOT approach on the Refer-GMOT40 dataset. The subsequent tables demonstrate the effectiveness and generalization of TP-GMOT by comparing it with other state-of-the-art fully supervised MOT methods on the Refer-Animal, DanceTrack and MOT20 datasets. Finally, we conduct an ablation study on the parameter θ (theta), which measures the similarity between two vectors and plays a crucial role in balancing motion and appearance during tracking with MAC-SORT.
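This section does not give the formula for W_amw, so the snippet below is only a hypothetical illustration of how a cosine-similarity threshold θ could shift weight between a motion cost and an appearance cost. The rule (use the mean pairwise similarity of the detection embeddings in a frame), the function names and the default weights w_high/w_low are our assumptions, not the definition used in MAC-SORT.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def appearance_weight(embeddings: List[Sequence[float]], theta: float,
                      w_high: float = 0.5, w_low: float = 0.1) -> float:
    """HYPOTHETICAL rule, not MAC-SORT's W_amw: if the detections in a frame
    look alike on average (mean pairwise cosine similarity above theta),
    appearance is a weak cue, so give it the lower weight."""
    n = len(embeddings)
    if n < 2:
        return w_high  # nothing to compare against
    sims = [cosine_similarity(embeddings[i], embeddings[j])
            for i in range(n) for j in range(i + 1, n)]
    mean_sim = sum(sims) / len(sims)
    return w_low if mean_sim > theta else w_high


def fused_cost(motion_cost: float, appearance_cost: float, w_app: float) -> float:
    """Convex combination of a motion cost and an appearance cost."""
    return (1.0 - w_app) * motion_cost + w_app * appearance_cost


# Three near-identical embeddings, e.g. generic objects of the same category.
look_alike = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05]]
w = appearance_weight(look_alike, theta=0.7)
print(w, fused_cost(motion_cost=0.2, appearance_cost=0.8, w_app=w))
```

The design choice illustrated here is that, in generic multi-object tracking, many targets share the same appearance, so a high similarity between embedding vectors is a signal to lean on motion instead; raising θ makes the sketch less eager to down-weight appearance.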
GMOT tracking comparison on the Refer-GMOT40 dataset between our CS-OD and the existing SOTA OS-OD, each paired with various SOTA trackers.
| Trackers | Detectors | # shot | HOTA↑ | MOTA↑ | IDF1↑ |
|---|---|---|---|---|---|
| SORT | OS-OD | one-shot | 30.05 | 20.83 | 33.90 |
| SORT | CS-OD (Ours) | zero-shot | 56.43 | 66.72 | 67.51 |
| DeepSORT | OS-OD | one-shot | 27.82 | 17.96 | 30.37 |
| DeepSORT | CS-OD (Ours) | zero-shot | 50.54 | 60.21 | 57.93 |
| ByteTrack | OS-OD | one-shot | 29.88 | 20.29 | 34.70 |
| ByteTrack | CS-OD (Ours) | zero-shot | 55.87 | 64.79 | 69.79 |
| MOTRv2 | OS-OD | one-shot | 23.76 | 13.87 | 25.17 |
| MOTRv2 | CS-OD (Ours) | zero-shot | 32.93 | 18.70 | 33.48 |
| OC-SORT | OS-OD | one-shot | 29.00 | 19.96 | 32.85 |
| OC-SORT | CS-OD (Ours) | zero-shot | 56.06 | 63.69 | 68.85 |
| Deep-OCSORT | OS-OD | one-shot | 30.37 | 21.10 | 34.74 |
| Deep-OCSORT | CS-OD (Ours) | zero-shot | 55.74 | 65.53 | 66.54 |
| Average gains by CS-OD across all trackers | | | +22.78↑ | +37.61↑ | +28.73↑ |
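The average-gain row is the mean, over the six trackers, of the CS-OD score minus the OS-OD score for each metric. The snippet below recomputes it from the table values as a quick arithmetic check; the dictionary layout is simply our choice for holding the numbers.

```python
# Refer-GMOT40 scores (HOTA, MOTA, IDF1) copied from the comparison table.
SCORES = {
    "SORT":        {"OS-OD": (30.05, 20.83, 33.90), "CS-OD": (56.43, 66.72, 67.51)},
    "DeepSORT":    {"OS-OD": (27.82, 17.96, 30.37), "CS-OD": (50.54, 60.21, 57.93)},
    "ByteTrack":   {"OS-OD": (29.88, 20.29, 34.70), "CS-OD": (55.87, 64.79, 69.79)},
    "MOTRv2":      {"OS-OD": (23.76, 13.87, 25.17), "CS-OD": (32.93, 18.70, 33.48)},
    "OC-SORT":     {"OS-OD": (29.00, 19.96, 32.85), "CS-OD": (56.06, 63.69, 68.85)},
    "Deep-OCSORT": {"OS-OD": (30.37, 21.10, 34.74), "CS-OD": (55.74, 65.53, 66.54)},
}


def average_gains(scores):
    """Mean of per-tracker (CS-OD minus OS-OD) differences, one value per metric."""
    n_metrics = 3
    totals = [0.0] * n_metrics
    for rows in scores.values():
        for k in range(n_metrics):
            totals[k] += rows["CS-OD"][k] - rows["OS-OD"][k]
    return [t / len(scores) for t in totals]


hota, mota, idf1 = average_gains(SCORES)
print(f"HOTA +{hota:.3f}, MOTA +{mota:.3f}, IDF1 +{idf1:.3f}")
```

Rounded to two decimals, these match the last row of the table; note that MOTRv2 contributes by far the smallest gain to each average.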
GMOT tracking comparison on the Refer-GMOT40 dataset between our MAC-SORT and various SOTA trackers.
| Trackers | HOTA↑ | MOTA↑ | IDF1↑ |
|---|---|---|---|
| SORT | 56.43 | 66.72 | 67.51 |
| DeepSORT | 50.54 | 60.21 | 57.93 |
| ByteTrack | 55.87 | 64.79 | 69.79 |
| OC-SORT | 56.06 | 63.69 | 68.85 |
| Deep-OCSORT | 55.74 | 65.53 | 66.54 |
| MOTRv2 | 32.93 | 18.70 | 33.48 |
| MAC-SORT (Ours) | 58.58 | 67.77 | 71.70 |
GMOT tracking comparison on Refer-Animal between our MAC-SORT and CS-OD and existing fully supervised MOT methods. These methods use Faster R-CNN (FRCNN) or YOLOX detectors trained on the AnimalTrack-trainset. Note that these fully supervised methods are limited in their ability to handle category-agnostic tracking.
| Trackers | Detectors | Category Agnostic | HOTA↑ | MOTA↑ | IDF1↑ |
|---|---|---|---|---|---|
| SORT | FRCNN | × | 42.80 | 55.60 | 49.20 |
| DeepSORT | FRCNN | × | 32.80 | 41.40 | 35.20 |
| ByteTrack | YOLOX | × | 40.10 | 38.50 | 51.20 |
| TransTrack | YOLOX | × | 45.40 | 48.30 | 53.40 |
| QDTrack | YOLOX | × | 47.00 | 55.70 | 56.30 |
| OC-SORT | YOLOX | × | 56.93 | 65.02 | 67.48 |
| Deep-OCSORT | YOLOX | × | 57.24 | 68.05 | 62.01 |
| MOTRv2 | YOLOX | × | 52.07 | 57.08 | 62.07 |
| MAC-SORT (Ours) | YOLOX | × | 57.86 | 68.32 | 63.01 |
| MAC-SORT (Ours) | CS-OD (Ours) | ✔ | 57.29 | 66.46 | 68.37 |
Ablation study on the generalizability of the TP-GMOT framework on DanceTrack-valset for the MOT task. We compare our MAC-SORT and CS-OD with other fully supervised MOT methods, which use YOLOX trained on DanceTrack-trainset as their object detector. Deep-OCSORT denotes the results reported in the original paper, whereas Deep-OCSORT† denotes the results we reproduced on our machine with the same object detector, using the best settings suggested by the authors. Note that these existing fully supervised methods are limited in their ability to handle category-agnostic tracking.
| Trackers | Detectors | Category Agnostic | HOTA↑ | MOTA↑ | IDF1↑ |
|---|---|---|---|---|---|
| SORT | YOLOX | × | 47.80 | 88.20 | 48.30 |
| DeepSORT | YOLOX | × | 45.80 | 87.10 | 46.80 |
| MOTDT | YOLOX | × | 39.20 | 84.30 | 39.60 |
| ByteTrack | YOLOX | × | 47.10 | 88.20 | 51.90 |
| OC-SORT | YOLOX | × | 52.10 | 87.30 | 51.60 |
| Deep-OCSORT | YOLOX | × | 58.53 | -- | 59.06 |
| Deep-OCSORT† | YOLOX | × | 49.36 | 84.82 | 48.89 |
| MAC-SORT (Ours) | YOLOX | × | 53.78 | 86.85 | 54.06 |
| MAC-SORT (Ours) | CS-OD (Ours) | ✔ | 48.75 | 81.74 | 48.02 |
Ablation study on the effectiveness of MAC-SORT on MOT20-testset for the MOT task. We compare our MAC-SORT with other SORT-based MOT methods. Because ByteTrack and OC-SORT use per-sequence thresholds on the test set and an offline interpolation procedure, we also report their scores with these disabled, denoted ByteTrack† and OC-SORT†. Because Deep OC-SORT uses separate weights for its YOLOX object detector, we also report its scores with YOLOX retrained on MOT20-trainset, denoted Deep OC-SORT†.
| Trackers | HOTA↑ | MOTA↑ | IDF1↑ |
|---|---|---|---|
| MeMOT | 54.1 | 63.7 | 66.1 |
| FairMOT | 54.6 | 61.8 | 67.3 |
| GSDT | 53.6 | 67.1 | 67.5 |
| CSTrack | 54.0 | 66.6 | 68.6 |
| ByteTrack | 61.3 | 77.8 | 75.2 |
| OC-SORT | 62.4 | 75.7 | 76.3 |
| Deep-OCSORT | 63.9 | 75.6 | 79.2 |
| ByteTrack† | 60.4 | 74.2 | 74.5 |
| OC-SORT† | 60.5 | 73.1 | 74.4 |
| Deep OC-SORT† | 59.6 | 75.3 | 75.2 |
| MAC-SORT (Ours) | 62.6 | 75.2 | 76.9 |
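The offline interpolation disabled for the † rows of the MOT20 comparison fills in frames where a track was temporarily lost, using boxes observed later in the sequence, which is why it is only possible offline. A minimal sketch of linear box interpolation follows, assuming a track is stored as a mapping from frame index to an (x, y, w, h) box; the function name and the max_gap default are hypothetical and are not taken from the ByteTrack or OC-SORT code.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def interpolate_track(track: Dict[int, Box], max_gap: int = 20) -> Dict[int, Box]:
    """Fill missing frames inside one track by linear interpolation.

    Gaps longer than max_gap frames (hypothetical default) are left unfilled.
    """
    frames = sorted(track)
    filled = dict(track)
    for f0, f1 in zip(frames, frames[1:]):
        gap = f1 - f0
        if gap <= 1 or gap > max_gap:
            continue  # consecutive frames, or a gap too long to trust
        b0, b1 = track[f0], track[f1]
        for f in range(f0 + 1, f1):
            t = (f - f0) / gap  # fraction of the way from frame f0 to frame f1
            filled[f] = tuple(a + t * (b - a) for a, b in zip(b0, b1))
    return filled


track = {1: (0.0, 0.0, 10.0, 10.0), 4: (30.0, 0.0, 10.0, 10.0)}
print(interpolate_track(track))  # frames 2 and 3 are filled in
```

Because such post-processing can recover boxes that the online tracker missed, disabling it gives a fairer comparison with trackers that report purely online results.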
Ablation study of θ in computing W_amw on the Refer-GMOT40 dataset.