DragonFly+: An FPGA-based quad-camera visual SLAM system for autonomous vehicles

Speaker:  Shaoshan Liu – Fremont, CA, United States
Topic(s):  Architecture, Embedded Systems and Electronics, Robotics


PerceptIn’s DragonFly system uses computer-vision-based sensor fusion to achieve reliable localization. Specifically, DragonFly integrates four 720p cameras into one hardware module, with one pair of cameras facing the front of the vehicle and the other pair facing the rear. Each pair works like a pair of human eyes, recovering spatial information about the environment from left and right two-dimensional images. Together, the two pairs provide a 360-degree panoramic view of the environment. The design is intended to keep visual odometry from failing: at any moment the system can extract 360-degree spatial information from the environment, and consecutive frames always share enough overlapping spatial regions.
To achieve affordability and reliability, PerceptIn set four basic requirements for the DragonFly system design. It must be modular, with an independent hardware module for computer-vision-based localization and map generation. It must be SLAM-ready, with hardware synchronization of the four cameras and the IMU. It must be low power, with a total power budget of less than 10 W. It must be high performance, processing four-way 720p YUV images at more than 30 fps. Even at 30 fps, this design generates more than 100 MB of raw image data per second, which places a heavy load on the computing system. Initial profiling showed that the image processing frontend (e.g., image feature extraction) accounts for more than 80% of the processing time.
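The raw data rate quoted above can be checked with simple arithmetic. The sketch below assumes 720p means 1280 x 720 pixels and tries three common pixel layouts, since the abstract does not say which YUV format the cameras output. The function name and the table of layouts are illustrative, not part of DragonFly.

```python
# Raw image bandwidth for a four-camera 720p rig (illustrative arithmetic only).
WIDTH, HEIGHT = 1280, 720   # 720p frame size
CAMERAS = 4                 # two stereo pairs, as described in the abstract
FPS = 30                    # lower bound on the required frame rate

# Bytes per pixel for common layouts. ASSUMPTION: the abstract does not
# state which YUV format is used, so several candidates are listed.
BYTES_PER_PIXEL = {
    "Y only (luma)": 1.0,
    "YUV 4:2:0": 1.5,
    "YUV 4:2:2": 2.0,
}


def raw_rate_mb_per_s(bytes_per_pixel, width=WIDTH, height=HEIGHT,
                      cameras=CAMERAS, fps=FPS):
    """Return the raw data rate in MB/s, with 1 MB = 10**6 bytes."""
    return width * height * bytes_per_pixel * cameras * fps / 1e6


if __name__ == "__main__":
    for name, bpp in BYTES_PER_PIXEL.items():
        print(f"{name}: {raw_rate_mb_per_s(bpp):.1f} MB/s")
```

Under these assumptions every layout, including luma-only frames, exceeds 100 MB/s, which is consistent with the figure stated in the abstract.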
To meet these design goals, PerceptIn designed and implemented DragonFly+, an FPGA-based real-time localization module. The DragonFly+ system includes hardware synchronization among the four image channels and the IMU; a direct I/O architecture to reduce off-chip memory communication; and a fully pipelined architecture to accelerate the image processing frontend of the localization system. In addition, it employs parallel and multiplexing processing techniques to balance bandwidth against hardware resource consumption.
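The profiling figure quoted earlier (the frontend takes more than 80% of processing time) suggests why the frontend is the natural target for acceleration. A minimal sketch using Amdahl's law follows. The 0.8 fraction is the lower bound from the abstract, and the speedup factors are hypothetical values chosen for illustration, not measured DragonFly+ results.

```python
def overall_speedup(frontend_fraction, frontend_speedup):
    """Amdahl's law: whole-pipeline speedup when only the frontend is accelerated.

    frontend_fraction: share of total time spent in the frontend (0..1).
    frontend_speedup:  factor by which the frontend alone gets faster (> 0).
    """
    if not 0.0 <= frontend_fraction <= 1.0:
        raise ValueError("frontend_fraction must be between 0 and 1")
    if frontend_speedup <= 0:
        raise ValueError("frontend_speedup must be positive")
    remaining = 1.0 - frontend_fraction  # part that is not accelerated
    return 1.0 / (remaining + frontend_fraction / frontend_speedup)


if __name__ == "__main__":
    FRONTEND_FRACTION = 0.8  # lower bound quoted in the abstract
    # HYPOTHETICAL frontend speedups, for illustration only.
    for s in (2, 10, 100):
        print(f"frontend x{s}: overall x{overall_speedup(FRONTEND_FRACTION, s):.2f}")
```

With a frontend share of exactly 0.8, the overall gain approaches but never exceeds 5x no matter how fast the frontend becomes. A larger frontend share, which the abstract's "more than 80%" allows, raises that ceiling.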

About this Lecture

Number of Slides:  40
Duration:  45 minutes
Languages Available:  English
