Comparative Analysis of Z-GMOT

Qualitative Results

We conduct extensive experiments to empirically evaluate the performance of our proposed Open-GMOT, covering both detection with CS-OD and association with MAC-SORT in the GMOT problem. Our strategy helps bridge the gap between human intention and computer understanding, providing the flexibility to track objects whose distinctive characteristics follow the input text.

"Track only red color car"

Quantitative Results

We conduct extensive experiments to empirically evaluate the performance of our proposed Z-GMOT, covering both detection with Open-CSOD and association with MAC-SORT in the GMOT problem. Our strategy helps bridge the gap between human intention and computer understanding, providing the flexibility to track objects whose distinctive characteristics follow the input text.
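
The tables in this section report HOTA, MOTA and IDF1. As a reminder of what the latter two measure, the following minimal sketch computes them from error counts using their standard definitions: MOTA = 1 - (FN + FP + IDSW) / GT and IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The function names and the counts in the usage lines are hypothetical and are not taken from our experiments. HOTA is omitted because it additionally averages over localization thresholds.

```python
def mota(fn: int, fp: int, idsw: int, gt: int) -> float:
    """MOTA as a percentage: 1 - (FN + FP + IDSW) / GT.

    fn, fp, idsw: false negatives, false positives and identity switches
    summed over all frames; gt: total number of ground-truth boxes.
    """
    if gt <= 0:
        raise ValueError("gt must be positive")
    return 100.0 * (1.0 - (fn + fp + idsw) / gt)


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 as a percentage: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    if denom == 0:
        raise ValueError("at least one count must be non-zero")
    return 100.0 * (2 * idtp) / denom


# Hypothetical counts, for illustration only.
example_mota = mota(fn=200, fp=100, idsw=50, gt=1000)
example_idf1 = idf1(idtp=700, idfp=150, idfn=300)
print(f"MOTA = {example_mota:.2f}, IDF1 = {example_idf1:.2f}")
```

Note that MOTA becomes negative when the total number of errors exceeds the number of ground-truth boxes, whereas IDF1 always lies between 0 and 100.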

Table 2: Tracking comparison on the Refer-GMOT40 dataset between our iGLIP and the SOTA OS-OD on various trackers. For each tracker, the best scores are highlighted in bold.
Trackers                               Detectors     #-Shot     HOTA↑  MOTA↑  IDF1↑
SORT [Bewley et al., 2016]             OS-OD         one-shot   30.05  20.83  33.90
                                       iGLIP (Ours)  zero-shot  54.21  62.90  64.34
DeepSORT [Wojke et al., 2017]          OS-OD         one-shot   27.82  17.96  30.37
                                       iGLIP (Ours)  zero-shot  50.45  58.99  57.55
ByteTrack [Zhang et al., 2022c]        OS-OD         one-shot   29.89  20.30  34.70
                                       iGLIP (Ours)  zero-shot  53.69  61.49  66.21
OC-SORT [Cao et al., 2023]             OS-OD         one-shot   30.35  20.60  34.37
                                       iGLIP (Ours)  zero-shot  56.51  62.76  67.40
Deep-OCSORT [Maggiolino et al., 2023]  OS-OD         one-shot   30.37  21.10  35.12
                                       iGLIP (Ours)  zero-shot  55.89  64.02  66.52
MOTRv2 [Zhang et al., 2023]            OS-OD         one-shot   23.75  13.87  25.17
                                       iGLIP (Ours)  zero-shot  31.32  18.54  31.28

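
To make the size of the detector effect in Table 2 explicit, the short sketch below computes the absolute HOTA gain of iGLIP over OS-OD for each tracker. The scores are copied from Table 2; the dictionary layout and the helper name `hota_gains` are illustrative choices rather than part of our pipeline.

```python
# HOTA scores copied from Table 2 as (OS-OD one-shot, iGLIP zero-shot).
hota = {
    "SORT": (30.05, 54.21),
    "DeepSORT": (27.82, 50.45),
    "ByteTrack": (29.89, 53.69),
    "OC-SORT": (30.35, 56.51),
    "Deep-OCSORT": (30.37, 55.89),
    "MOTRv2": (23.75, 31.32),
}


def hota_gains(scores):
    """Return the absolute HOTA gain (iGLIP minus OS-OD) per tracker."""
    return {name: round(ours - base, 2) for name, (base, ours) in scores.items()}


gains = hota_gains(hota)
for name, gain in gains.items():
    print(f"{name:12s} +{gain:.2f} HOTA")
```

The gain is largest for OC-SORT (26.16 points) and smallest for MOTRv2 (7.57 points); for the remaining trackers it lies between 22.63 and 25.52 points.
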
Table 3: Tracking comparison on the Refer-GMOT40 dataset between our MA-SORT and other trackers. Our proposed iGLIP is used as the object detector. The best scores are highlighted in bold.
Trackers                          HOTA↑  MOTA↑  IDF1↑
SORT [bewley2016simple]           54.21  62.90  64.34
DeepSORT [wojke2017simple]        50.45  58.99  57.55
ByteTrack [zhang2021bytetrack]    53.69  61.49  66.21
OC-SORT [cao2023observation]      56.51  62.76  67.40
Deep-OCSORT [maggiolino2023deep]  55.89  64.02  66.52
MOTRv2 [zhang2023motrv2]          31.32  18.54  31.28
MA-SORT (Ours)                    56.75  64.62  68.17

Table 4: Tracking comparison on Refer-Animal between our Z-GMOT and existing fully-supervised MOT methods. The best scores are highlighted in bold.
Tracker         Detector               Train  HOTA   MOTA   IDF1
SORT            FRCNN [ren2015faster]         42.80  55.60  49.20
DeepSORT        FRCNN [ren2015faster]         32.80  41.40  35.20
ByteTrack       YOLOX [yolox2021]             40.10  38.50  51.20
TransTrack      YOLOX [yolox2021]             45.40  48.30  53.40
QDTrack         YOLOX [yolox2021]             47.00  55.70  56.30
MA-SORT (Ours)  YOLOX [yolox2021]             57.86  68.32  63.01
MA-SORT (Ours)  iGLIP (Z-GMOT) (Ours)         53.28  57.64  58.43

Table 5: Ablation study on the generalizability of Z-GMOT on the DanceTrack validation set for the MOT task.
Trackers                        Detectors            Train  HOTA↑  MOTA↑  IDF1↑
SORT [bewley2016simple]         YOLOX [yolox2021]           47.80  88.20  48.30
DeepSORT [wojke2017simple]      YOLOX [yolox2021]           45.80  87.10  46.80
MOTDT [Chen2018RealTimeMP]      YOLOX [yolox2021]           39.20  84.30  39.60
ByteTrack [zhang2021bytetrack]  YOLOX [yolox2021]           47.10  88.20  51.90
OC-SORT [cao2023observation]    YOLOX [yolox2021]           52.10  87.30  51.60
MA-SORT (Ours)                  YOLOX [yolox2021]           53.44  87.31  53.78
MA-SORT (Ours)                  iGLIP Z-GMOT (Ours)         47.57  83.11  46.58

Table 6: Ablation study on the effectiveness of MA-SORT on the MOT20 test set for the MOT task. Because ByteTrack and OC-SORT (gray) use different thresholds for test-set sequences and an offline interpolation procedure, we also report their scores with these disabled (the second ByteTrack and OC-SORT rows). The best scores are highlighted in bold.
Trackers                               HOTA↑  MOTA↑  IDF1↑
MeMOT (Cai et al., 2022a)              54.1   63.7   66.1
FairMOT (Zhang et al., 2021)           54.6   61.8   67.3
TransTrack (Sun et al., 2020a)         48.9   65.0   59.4
TrackFormer (Meinhardt et al., 2022b)  54.7   68.6   65.7
ReMOT (Fan Yang and Nakamura, 2021)    61.2   77.4   73.1
GSDT (Wang et al., 2020)               53.6   67.1   67.5
CSTrack (Chao Liang and Zou, 2022)     54.0   66.6   68.6
TransMOT (Peng Chu and Liu, 2023)      -      77.4   75.2
ByteTrack (Zhang et al., 2022c)        61.3   77.8   75.2
OC-SORT (Cao et al., 2023)             62.4   75.7   76.3
ByteTrack (Zhang et al., 2022c)        60.4   74.2   74.5
OC-SORT (Cao et al., 2023)             60.5   73.1   74.4
MA-SORT (Ours)                         61.4   77.6   75.5