Performance of EfficientPose Method with Reduced BiFPN Layer for 6D Pose Estimation
DOI: https://doi.org/10.11594/nstp.2025.4741
Keywords: Smart technologies, position of objects, 3D space, EfficientPose
Abstract
Smart technologies, such as self-driving cars, autopilot aircraft, and autonomous robots, are devices with the intelligence to control their steering systems automatically. In this study, we propose an approach for predicting the pose of objects in 3D space so that autonomous smart devices can respond to the objects around them more accurately and in more diverse ways, for example by slowing down, accelerating, avoiding, approaching, changing direction, or picking up objects. One of the state-of-the-art methods for this problem is EfficientPose, a deep learning approach with an EfficientNet backbone and a BiFPN (Bidirectional Feature Pyramid Network) for feature fusion. In this study, we experiment with the EfficientPose method by reducing the number of BiFPN layers, motivated by the method's large computational cost. Reducing the number of BiFPN layers is expected to make the model more efficient (lower computational cost), although it may also reduce the method's effectiveness. The experimental results show that reducing the number of BiFPN layers decreases the effectiveness of EfficientPose by 7.81%, but improves its efficiency: the number of parameters drops by 12.36%, execution time drops by 30.63%, and FPS increases by 44.15%. These results provide useful guidance for developing more efficient 6D pose estimation methods that use EfficientPose as their base framework; targeted modifications or additions to certain components may improve effectiveness further.
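To make the layer-reduction idea concrete, the sketch below stacks a configurable number of simplified BiFPN-style fusion layers and compares parameter counts at two depths. This is a minimal illustration of the trade-off described in the abstract, not the authors' implementation: the channel width, number of pyramid levels, and fusion scheme are assumptions chosen for brevity, not the EfficientPose configuration.

```python
# Minimal sketch (not the authors' code): a stack of simplified BiFPN-style
# fusion layers whose depth is configurable, showing how fewer repeats shrink
# the parameter count. Widths, pyramid depth, and fusion are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleBiFPNLayer(nn.Module):
    """One simplified bidirectional fusion pass over a feature pyramid."""

    def __init__(self, channels: int, num_levels: int = 5):
        super().__init__()
        # One depthwise-separable conv per pyramid level, as in BiFPN.
        self.convs = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
                nn.Conv2d(channels, channels, 1),
                nn.BatchNorm2d(channels),
            )
            for _ in range(num_levels)
        )
        # Learnable fusion weights ("fast normalized fusion" in the BiFPN paper).
        self.fusion_weights = nn.Parameter(torch.ones(num_levels, 2))

    def forward(self, feats):
        out = []
        for i, f in enumerate(feats):
            # Fuse each level with its coarser neighbor, resized to match.
            coarser = feats[i + 1] if i + 1 < len(feats) else feats[i]
            coarser = F.interpolate(coarser, size=f.shape[-2:])
            w = F.relu(self.fusion_weights[i])
            w = w / (w.sum() + 1e-4)
            out.append(self.convs[i](w[0] * f + w[1] * coarser))
        return out

def count_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

# Compare a 3-layer stack against a reduced 2-layer stack.
feats = [torch.randn(1, 64, s, s) for s in (64, 32, 16, 8, 4)]
for depth in (3, 2):
    net = nn.Sequential(*[SimpleBiFPNLayer(64) for _ in range(depth)])
    net(feats)  # sanity check: the stack runs end to end
    print(f"{depth} BiFPN layers: {count_params(net):,} parameters")
```

As a rough sanity check on the abstract's efficiency figures, a 30.63% reduction in execution time implies a throughput factor of 1/(1 − 0.3063) ≈ 1.44, which is consistent with the reported 44.15% increase in FPS.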
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with these proceedings agree to the following terms:
- Authors retain copyright and grant the Nusantara Science and Technology Proceedings the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of its authorship and initial publication in these proceedings.
- Authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the proceedings' published version of the work (e.g., posting it to an institutional repository or publishing it in a book), with an acknowledgement of its initial publication in these proceedings.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their websites) prior to and during the submission process, as this can lead to productive exchanges as well as earlier and greater citation of published work (see The Effect of Open Access).