Modern intelligent transportation systems require computer vision pipelines capable of operating reliably under dynamic urban conditions. At Greenwave TechLabs, the Visual Guard subsystem was developed as the visual intelligence layer within our multi-modal emergency vehicle preemption architecture.

The purpose of the Visual Guard is simple: detect emergency vehicles in real time and assist traffic systems in dynamically prioritizing emergency movement.

Stage 1: Input Acquisition

The system continuously captures live video streams from roadside cameras positioned near intersections. These frames represent the real-time traffic environment surrounding the signal infrastructure.

Stage 2: Frame Preprocessing

Before inference, frames are resized to the YOLO input dimensions, normalized, and adjusted for brightness or noise when necessary. Our implementation uses standardized frame sizes of 416×416 or 640×640, depending on inference requirements.
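The resize and normalize steps can be sketched as follows. This is a minimal illustration, not the production pipeline: the function names, the aspect-preserving "letterbox" padding, the gray pad value of 114, and the nearest-neighbour resize (a NumPy-only stand-in for a proper image library) are all assumptions made for the example. Brightness and noise adjustment are omitted.

```python
import numpy as np

def letterbox(frame: np.ndarray, size: int = 640, pad_value: int = 114):
    """Resize an HxWx3 uint8 frame to size x size, preserving aspect ratio."""
    h, w = frame.shape[:2]
    scale = min(size / h, size / w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # Nearest-neighbour resize via index lookup (stand-in for a real resizer).
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = frame[rows][:, cols]
    # Pad to a square canvas and centre the resized image on it.
    canvas = np.full((size, size, 3), pad_value, dtype=np.uint8)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, scale, (left, top)

def preprocess(frame: np.ndarray, size: int = 640):
    """Return a float32 tensor of shape (1, 3, size, size) with values in [0, 1]."""
    canvas, scale, offset = letterbox(frame, size)
    tensor = canvas.astype(np.float32) / 255.0          # normalize pixel values
    tensor = np.transpose(tensor, (2, 0, 1))[None, ...]  # HWC -> CHW, add batch
    return tensor, scale, offset
```

The returned scale and offset let later stages map detected boxes back to the original camera frame.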

Stage 3: YOLO Inference

The processed frames are passed into a YOLOv5s object detection model. The model performs object localization, confidence scoring, and class prediction in a single forward pass, specifically identifying ambulances, fire trucks, and emergency responders within live traffic scenes.
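To show how localization, confidence, and class come out of one pass, here is a rough sketch of decoding raw prediction rows. It assumes the row layout commonly produced by YOLOv5 exports, `[cx, cy, w, h, objectness, class scores...]`, with the final confidence taken as objectness times the best class score. The class list and its order are hypothetical; the article does not specify them.

```python
import numpy as np

# Hypothetical label order, for illustration only.
CLASS_NAMES = ["ambulance", "fire_truck", "emergency_responder"]

def decode_predictions(raw: np.ndarray):
    """Split (N, 5 + num_classes) rows into corner boxes, confidences, class ids."""
    xywh, objectness, class_scores = raw[:, :4], raw[:, 4], raw[:, 5:]
    class_ids = class_scores.argmax(axis=1)
    # Final confidence = objectness * best class score.
    conf = objectness * class_scores[np.arange(len(raw)), class_ids]
    # Convert centre/size boxes to (x1, y1, x2, y2) corners.
    boxes = np.empty_like(xywh, dtype=np.float64)
    boxes[:, 0] = xywh[:, 0] - xywh[:, 2] / 2
    boxes[:, 1] = xywh[:, 1] - xywh[:, 3] / 2
    boxes[:, 2] = xywh[:, 0] + xywh[:, 2] / 2
    boxes[:, 3] = xywh[:, 1] + xywh[:, 3] / 2
    return boxes, conf, class_ids
```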

Stage 4: Post-Processing

After inference, low-confidence detections are filtered, overlapping boxes are removed using Non-Max Suppression (NMS), and object tracking logic estimates movement direction. This reduces false positives and improves real-world stability.
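The filtering and suppression steps above can be sketched as follows. This is a minimal, class-agnostic greedy NMS in NumPy; the threshold defaults (0.5 confidence, 0.45 IoU) and the function names are assumptions for the example, and the direction estimate is a simple centroid displacement rather than the tracking logic actually deployed.

```python
import numpy as np

def iou_one_to_many(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def filter_and_nms(boxes, scores, conf_thres=0.5, iou_thres=0.45):
    """Return indices of boxes kept after confidence filtering and greedy NMS."""
    candidates = np.flatnonzero(scores >= conf_thres)      # drop low confidence
    order = candidates[np.argsort(-scores[candidates])]    # highest score first
    kept = []
    while order.size:
        best, rest = order[0], order[1:]
        kept.append(int(best))
        if rest.size == 0:
            break
        # Discard remaining boxes that overlap the kept box too much.
        order = rest[iou_one_to_many(boxes[best], boxes[rest]) <= iou_thres]
    return kept

def movement_direction(prev_box, curr_box):
    """Centroid displacement (dx, dy) in pixels between two consecutive frames."""
    px, py = (prev_box[0] + prev_box[2]) / 2, (prev_box[1] + prev_box[3]) / 2
    cx, cy = (curr_box[0] + curr_box[2]) / 2, (curr_box[1] + curr_box[3]) / 2
    return cx - px, cy - py
```

A positive `dx` means the object moved right in the image; mapping that to an approach direction depends on camera placement.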

Stage 5: Decision Integration

If a detection's confidence exceeds the configured threshold, the Visual Guard generates an emergency detection event and forwards it to the central decision logic. The multi-modal fusion system evaluates the result and coordinates the final preemption decision with acoustic detection, secure LoRa authentication, and signal synchronization logic.
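As an illustration of the event-generation step, the sketch below emits an event only when the strongest detection exceeds a threshold. The event fields, the `make_event` name, the camera identifier, and the 0.6 default are hypothetical; the article does not describe the real event schema or the configured threshold.

```python
import time
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class EmergencyDetectionEvent:
    """Hypothetical payload sent to the central decision logic."""
    camera_id: str
    label: str
    confidence: float
    timestamp: float

def make_event(
    camera_id: str,
    detections: List[Tuple[str, float]],
    threshold: float = 0.6,
    now: Optional[float] = None,
) -> Optional[EmergencyDetectionEvent]:
    """Return an event for the strongest detection, or None if it is too weak."""
    if not detections:
        return None
    label, confidence = max(detections, key=lambda d: d[1])
    if confidence <= threshold:
        return None  # below the configured limit: no event is raised
    stamp = time.time() if now is None else now
    return EmergencyDetectionEvent(camera_id, label, confidence, stamp)
```

The visual event alone does not trigger preemption here; in the architecture described, the fusion layer still combines it with the acoustic and LoRa signals.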

Real-World Validation

Results achieved under varied urban conditions:
  • 0.895 mAP detection performance
  • 0.92 ambulance precision
  • Approximately 58 ms average visual inference latency
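For readers who want to reproduce figures of this kind on their own data, here is a small sketch of how per-class precision and average latency are typically computed. The helper names are assumptions, and `infer` stands for any callable that runs the model on one frame; this is not the evaluation harness used for the numbers above.

```python
import time
from statistics import mean
from typing import Callable, Iterable

def precision(true_positives: int, false_positives: int) -> float:
    """Precision = TP / (TP + FP); defined as 0.0 when nothing was predicted."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0

def mean_latency_ms(infer: Callable[[object], object], frames: Iterable[object]) -> float:
    """Average wall-clock time per call to infer, in milliseconds."""
    samples = []
    for frame in frames:
        start = time.perf_counter()
        infer(frame)
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)
```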

The architecture was specifically designed to remain scalable, low-cost, edge-compatible, and deployable across smart intersections. Reliable deployment requires redundancy, environmental resilience, low-latency inference, and integration with secure communication systems. The Visual Guard was built around exactly those principles.