
Object tracking

AXIS Scene Metadata uses standard Multi-Object Tracking (MOT) concepts to produce a consistent stream of detections, object states, and tracks.

In an object tracking system, sensors generate measurements or detections of objects in the environment. A tracker associates detections over time to maintain identities and estimate each object’s state.

Computer vision is a common source of measurements: an AI model or algorithm detects and classifies objects per frame. A tracker then maintains identity continuity and updates the state across frames.
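The association step described above can be sketched as a greedy nearest-neighbor match between known track positions and the detections of a new frame. This is an illustrative sketch only, not the association method used by AXIS Scene Metadata: the function name `associate`, the 2D point representation, and the `max_distance` gate are all assumptions made for the example.

```python
from math import hypot


def associate(tracks, detections, max_distance=50.0):
    """Greedily match track positions to detection positions.

    tracks: dict mapping a track id to its last known (x, y) position.
    detections: list of (x, y) positions from the current frame.
    Returns (matches, unmatched), where matches maps a track id to a
    detection index and unmatched lists the unused detection indices.
    """
    # Collect every track/detection pair that lies within the gate.
    candidates = []
    for track_id, (tx, ty) in tracks.items():
        for det_index, (dx, dy) in enumerate(detections):
            distance = hypot(tx - dx, ty - dy)
            if distance <= max_distance:
                candidates.append((distance, track_id, det_index))

    # Accept the closest pairs first; each track and each detection
    # can be used at most once.
    matches = {}
    used_detections = set()
    for distance, track_id, det_index in sorted(candidates):
        if track_id in matches or det_index in used_detections:
            continue
        matches[track_id] = det_index
        used_detections.add(det_index)

    unmatched = [i for i in range(len(detections)) if i not in used_detections]
    return matches, unmatched
```

Unmatched detections are candidates for new tracks, and tracks without a match keep their identity until the tracker decides that the evidence is insufficient. Production trackers often replace the greedy pass with a globally optimal assignment and use appearance as well as distance.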

An object's state represents the current attributes of a tracked object. The object state typically includes position, classification, and appearance. The exact attributes depend on the tracker configuration and capabilities. The tracker continuously updates the state based on new observations and model predictions.
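As a rough illustration of how a state can combine a model prediction with a new observation, the sketch below uses a constant-velocity motion model and a fixed blending gain. The class `ObjectState`, its field names, and the `gain` parameter are hypothetical and do not reflect the ADF or ONVIF schema; a real tracker would typically use a proper filter, such as a Kalman filter, instead of a fixed gain.

```python
from dataclasses import dataclass


@dataclass
class ObjectState:
    """Hypothetical state: position, velocity, and classification."""
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0
    class_label: str = "unknown"
    class_score: float = 0.0

    def predict(self, dt):
        """Predict the position after dt seconds at constant velocity."""
        return (self.x + self.vx * dt, self.y + self.vy * dt)

    def update(self, observed_x, observed_y, dt, gain=0.5):
        """Blend the predicted position with a new observation.

        gain=0 trusts the prediction only, gain=1 trusts the observation only.
        """
        predicted_x, predicted_y = self.predict(dt)
        new_x = predicted_x + gain * (observed_x - predicted_x)
        new_y = predicted_y + gain * (observed_y - predicted_y)
        # Re-estimate the velocity from the displacement over this step.
        self.vx = (new_x - self.x) / dt
        self.vy = (new_y - self.y) / dt
        self.x, self.y = new_x, new_y
```

For example, a state at `x=0` moving at `vx=10` predicts `x=10` after one second. If the detection reports `x=12`, a gain of 0.5 places the updated position halfway between the two, at `x=11`.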

A time-ordered sequence of states for a single object forms a track. A track is created on the first reliable detection, updated on each associated detection, and terminated when the tracker no longer has sufficient evidence to maintain it. A track is represented by the Object Track id in ADF and by tt:Object in ONVIF.
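The track lifecycle (create, update, terminate) can be sketched with a small manager class. The thresholds used here are assumptions chosen for the example: `min_score` stands in for "reliable detection" and `max_missed` stands in for "insufficient evidence". The names `Track` and `TrackManager` are hypothetical and are not part of ADF or ONVIF.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    """A track: an id plus the time-ordered states of one object."""
    track_id: int
    states: list = field(default_factory=list)
    missed: int = 0
    active: bool = True


class TrackManager:
    def __init__(self, min_score=0.5, max_missed=3):
        self.min_score = min_score    # assumed reliability threshold
        self.max_missed = max_missed  # assumed tolerance for missed frames
        self.tracks = {}
        self._next_id = 1

    def create(self, state, score):
        """Start a track only if the detection is reliable enough."""
        if score < self.min_score:
            return None
        track = Track(self._next_id, [state])
        self.tracks[track.track_id] = track
        self._next_id += 1
        return track

    def update(self, track_id, state):
        """Append the state of an associated detection; reset the miss count."""
        track = self.tracks[track_id]
        track.states.append(state)
        track.missed = 0

    def mark_missed(self, track_id):
        """Record a frame without an associated detection."""
        track = self.tracks[track_id]
        track.missed += 1
        if track.missed > self.max_missed:
            track.active = False  # terminated: evidence is insufficient
```

The track id stays the same for the whole lifetime of the track, which is what lets a consumer group states from different frames into one object.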

Tracking is not limited to video; radar and other sensors can also be used. For radar, tracking involves maintaining multiple object hypotheses across consecutive scans.

Combining outputs from multiple sensors and tracking algorithms (sensor fusion) improves robustness and accuracy by leveraging complementary signals at the detection or track level.
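One common way to fuse two estimates of the same quantity at track level is inverse-variance weighting, sketched below. This is a generic technique shown for illustration and is not necessarily how AXIS Scene Metadata performs fusion. The function name `fuse_estimates` and the example numbers are made up, and the formula assumes that the two sensors' errors are independent.

```python
def fuse_estimates(value_a, variance_a, value_b, variance_b):
    """Fuse two independent estimates of one quantity.

    Each estimate is weighted by the inverse of its variance, so the
    more certain sensor contributes more to the result.
    Returns (fused_value, fused_variance).
    """
    weight_a = 1.0 / variance_a
    weight_b = 1.0 / variance_b
    fused_value = (weight_a * value_a + weight_b * value_b) / (weight_a + weight_b)
    fused_variance = 1.0 / (weight_a + weight_b)
    return fused_value, fused_variance


# Hypothetical example: a video track and a radar track report the
# distance to the same object with equal uncertainty.
distance, variance = fuse_estimates(10.0, 4.0, 12.0, 4.0)
```

With equal variances the fused value is the midpoint of the two inputs, and the fused variance is lower than that of either input, which is the sense in which fusion improves accuracy.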