Real-Time Audio Translation and Audio-Based Construction Monitoring

Description

Environmental sound recognition has attracted extensive research over the past decade. Real-time audio translation and audio-based construction monitoring is a state-of-the-art technology that applies machine learning methods to the sounds produced by work activities and equipment operations on a dynamic construction site in order to convey information about work progress, performance, safety, and project status. The classification of non-speech sounds is known as Automatic Sound Recognition (ASR). In an ASR system, sound recognition is performed automatically using signal processing and machine learning techniques. ASR-related applications have been developed for audio-based surveillance, sound event recognition, and environmental sound recognition.

The sound-based approach has been broadly adopted for managing and monitoring public traffic safety and private driving safety. For private driving, sound signals are classified into normal (start, idling, and driving sounds) and abnormal (breakdown sounds) categories to diagnose the condition of a car and alert the driver in advance. Car crashes and tire skidding are two leading types of road accidents. An Early Recognition (ER) system has been proposed that exploits the Doppler shifts that human actions cause in audio signals to recognize drivers’ inattentive behaviors (fetching forward, picking up dropped items, turning back, and eating or drinking) using the audio devices on smartphones (Xu et al., 2017). Driving speed is also a critical factor in highway safety, and audio analysis has been applied to measure the proportion of time during which drivers stay within the speed limit (Kubera et al., 2016). Rani and Sreenivas (2015) introduced a GPS system integrated with a simple voice recognition subsystem that helps prevent car theft and accidents through remote control.

This technology employs audio sensors to capture on-site sound data and classification algorithms such as the Gaussian Mixture Model (GMM), Hidden Markov Model (HMM) (Peng et al., 2009), Support Vector Machine (SVM) (Cheng et al., 2016; Xu et al., 2013; Zhang et al., 2018; Sherafat et al., 2019), and k-nearest neighbors (KNN) (Xie et al., 2019), together with Principal Component Analysis (PCA) (Xu et al., 2013) and neural network models such as DNN and CNN (Maccagno et al., 2019; Yu et al., 2019), to analyze the captured data and reveal useful information.

For public traffic safety, sound-based monitoring of the metro station environment has been investigated. Mel-frequency cepstral coefficients (MFCC) and MPEG-7 features (spectrum flatness, waveform, fundamental frequency) are extracted from audio data, and a Hidden Markov Model (HMM) is applied in two stages to recognize abnormal sounds (gunshot, scream, explosion) and improve public security in metro stations (Ntalampiras et al., 2009). For railway safety, train fault diagnosis is based on the audio signal recorded while the train is running; features are extracted with wavelet packets and recognized with a neural network (Zhao et al., 2017). In addition, GMM and SVM have been compared for detecting shout events in a real-life railway environment, and both achieved promising results (Rouas et al., 2006).

Sound data have also been used for health and safety monitoring. Sounds generated by body falls and distress speech expressions are used to detect emergency incidents for patients at home who wear attached equipment (Doukas et al., 2009). The same idea is applied to monitor the elderly at home remotely and provide them with first assistance, using a personal computer, a sound card, and a microphone. The accuracy of such a system can be further improved by adding a real-time SNR (signal-to-noise ratio) estimator that allows the GMM models to be adapted (Istrate et al., 2008). Moreover, Virone et al. (2003) developed a health information system for remotely monitoring the health status of the elderly at home. They used multichannel audio acquisition equipment to collect real-time environmental sounds and run a home health monitoring system that detects abnormal noise. Wu et al. (2009) also designed a robot that detects abnormal events for aging people at home by integrating video with audio information, as well as a full sound tele-monitoring system for the same purpose. An audio-based monitoring method has also been applied to workers’ hearing protection devices (HPDs) in heavy industry by detecting sound events, such as those generated by breathing, recorded in the ear canal; this approach could be carried over to the construction field (Martin and Voix, 2017).
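The real-time SNR estimate mentioned above can be illustrated with a short sketch. This is not the estimator of Istrate et al. (2008); it is a minimal illustration that assumes the quietest frames of a recording approximate the noise floor. The function names, the frame length, and the `noise_fraction` parameter are hypothetical.

```python
import math


def frame_powers(samples, frame_len):
    """Mean power of each non-overlapping frame (a trailing partial frame is dropped)."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


def estimate_snr_db(samples, frame_len=256, noise_fraction=0.2):
    """Rough SNR in dB; the quietest frames stand in for noise (an assumption)."""
    powers = sorted(frame_powers(samples, frame_len))
    if not powers:
        raise ValueError("signal is shorter than one frame")
    n_noise = max(1, int(len(powers) * noise_fraction))
    noise_power = sum(powers[:n_noise]) / n_noise
    total_power = sum(powers) / len(powers)
    # Subtract the noise estimate from the total to approximate signal power.
    signal_power = max(total_power - noise_power, 1e-12)
    return 10.0 * math.log10(signal_power / max(noise_power, 1e-12))
```

A monitoring system could recompute this value on a sliding window and switch to a classifier trained for the matching noise condition; how that adaptation is done is outside the scope of this sketch.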

Sound capturing devices have been used for both outdoor and indoor safety monitoring. For outdoor monitoring, audio surveillance systems are widely applied to event detection. An audio-based method is capable of detecting impulsive sounds such as glass breaks, human screams, and gunshots (Dufaux et al., 2000). Lei and Valdez (2013) proposed a special sound detection system embedded in emergency phones in public areas, which removes the need for a user to scramble to reach a phone and press a button, a step they regarded as time consuming. The system uses a dynamic time warping algorithm that accurately detects the sound of a gunshot. Recently, several studies have investigated audio-based applications for improving construction processes and management [43], [44]. One study [45] explored the potential application of speech recognition technology for bridge inspection and identified key factors influencing the results. The authors performed a comprehensive review of the applications of sound recognition for construction safety and proposed a framework for safety monitoring with an attribute-based approach that separately defined the hazards of each activity [45]. Cho et al. [46] investigated the classification of overlapped sound data. Using a support vector machine and a frequency-domain approach based on spectrograms, they were able to monitor construction activity from sound. In addition, Wei et al. [47] designed a noise hazard prediction framework that integrates a wearable audio sensor and BIM data.
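Dynamic time warping, named above in connection with gunshot detection, aligns two sequences that may unfold at different speeds and returns the cost of the best alignment. The sketch below is the generic textbook formulation, not the implementation of Lei and Valdez (2013). The function names, the one-dimensional feature sequences, and the threshold are illustrative assumptions.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping cost between two sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] and b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],       # a advances, b repeats
                                 cost[i][j - 1],       # b advances, a repeats
                                 cost[i - 1][j - 1])   # both advance
    return cost[n][m]


def matches_template(features, template, threshold):
    """Flag a match when the warped distance to the template is small enough."""
    return dtw_distance(features, template) <= threshold


# Hypothetical amplitude envelopes: an impulsive template, a time-stretched
# copy of it, and a steady background sound.
template = [0.0, 0.9, 0.4, 0.1, 0.0]
stretched = [0.0, 0.0, 0.9, 0.9, 0.4, 0.1, 0.0]
steady = [0.3] * 7
```

Because the stretched envelope only repeats template values, it aligns with the template at zero cost, while the steady envelope does not. In practice the sequences would hold multi-dimensional features per frame rather than single amplitudes.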

Several recent studies have proposed or examined new approaches and technologies that enhance construction safety management and monitoring. Li and Becerik-Gerber (2014) introduced an algorithm in a BIM-centered environment designed to locate occupants trapped in building fires. A system based on IoT (Internet of Things) technology, introduced by Ding and Zhou, was designed for early warning and prevention of accidents and for overall improvement of safety management on underground construction sites. Cheung et al. (2018) created a system that integrates BIM and a wireless sensor network (WSN) to monitor a construction area visually through a spatial, color-coded interface and to remove dangerous gases automatically. Marzouk and Abdelaty (2014) also employed WSN and BIM to monitor thermal conditions in subways by tracking indoor temperatures and particulate matter (PM) levels.

Figure 1: Different types of audio sensors deployed for capturing sound data (Bello et al., 2019; Cheng et al., 2016)

Figure 2. ZCC and LEF Feature of Enhanced Excavating Sound (Xie et al., 2019)
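The caption does not define ZCC or LEF. Assuming they refer to a zero-crossing measure and a low-energy-frame measure, two common time-domain audio features, the sketch below shows one way such features can be computed. The function names, the frame length, and the 0.5 ratio are illustrative assumptions and are not taken from Xie et al. (2019).

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for x, y in zip(frame, frame[1:]) if (x >= 0) != (y >= 0))
    return crossings / (len(frame) - 1)


def low_energy_fraction(samples, frame_len, ratio=0.5):
    """Share of frames whose RMS falls below `ratio` times the mean frame RMS."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append((sum(x * x for x in frame) / frame_len) ** 0.5)
    if not rms:
        return 0.0
    mean_rms = sum(rms) / len(rms)
    return sum(1 for r in rms if r < ratio * mean_rms) / len(rms)
```

Both features are cheap to compute, which fits the lightweight processing that audio-based monitoring aims for.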

Figure 3. Example of CNN-based sound classification by a log-mel spectrogram extracted from a fragment along with its derivative. Backhoe JD50D Compact, Compactor Ingersoll Rand, Concrete Mixer, Excavator Cat 320E, Excavator Hitachi 50U. (Maccagno et al., 2019)
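The derivative channel mentioned in the caption can be illustrated with the regression-based delta commonly used for audio features. This is a minimal sketch under stated assumptions: the input is a list of log-mel frames produced by some spectrogram front end (not shown), the `delta` function and its `width` parameter are hypothetical names, and the exact derivative used by Maccagno et al. (2019) may differ.

```python
def delta(frames, width=2):
    """Regression-based time derivative of a feature sequence (frames x bins)."""
    if not frames:
        return []
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                # Edge frames are repeated when the window runs past the ends.
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


# Toy input: bin 0 rises by 1 per frame, bin 1 is constant.
frames = [[float(t), 2.0] for t in range(6)]
deltas = delta(frames)
```

Stacking `frames` and `deltas` as two channels gives an image-like input of the kind a CNN classifier can consume.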

Benefits & Barriers

Lighter data-processing load than image-based monitoring.
No limitation on tracking at night (no minimum level of illumination is required for monitoring).
No limitation on the range of angles for capturing data.
Sound data and suitable sound classifiers must be regenerated for each construction project because different equipment, work environments, and background noise produce heterogeneous sounds.
Background noise in the collected data limits the ability to identify multiple simultaneous sounds and to improve accuracy.

Business Value Proposition

The construction industry has long lacked real-time project management and performance monitoring systems (Navon, 2005), and the absence of appropriate technologies for acquiring field data has been a primary obstacle to real-time data collection and analysis. Real-time audio translation and audio-based monitoring can support automated real-time surveillance of construction sites because sound data is lightweight and easier to process than image data. The system also allows multiple construction sites to be monitored remotely at the same time, eliminating the need for skilled professionals at each site for progress monitoring and safety surveillance. Because the system is still under development, its accuracy is low: background noise on construction sites often leads to misclassification. With further advancement, the system is expected to overcome this drawback and to regenerate its classifier automatically for each project.

Other Industries

Real-time audio translation and audio-based monitoring can also be used in areas other than construction monitoring. Several research studies have applied the system in other settings.

Petroleum Industry – The real-time audio translation system can be used to detect and promptly mitigate hazards caused by leakage in oil and gas pipelines. Wan and Mita (2008) proposed a sound-based hazard detection system that relies on audio sensors placed at fixed intervals along important pipelines that run underground in modern cities. The objective was to protect these pipelines from damage by road cutters and other heavy civil equipment commonly used to repair and reconstruct roadways, thereby avoiding the catastrophic consequences of unintentional damage (Wan and Mita, 2008).

Department of Environmental Protection – A real-time audio translation system is currently used in the SONYC (Sounds of New York City) project, which aims to monitor, analyze, and mitigate urban noise pollution, a growing concern in modern cities. The project uses an integrated cyber-physical system that identifies the source of a sound in real time, enabling adequate and timely mitigation (Mydlarz et al., 2019).

Figure 4: Remote Visualization Interface (Bello et al., 2019)

Other applications of the real-time audio translation system include safety monitoring in public transport, intruder detection in wildlife areas, and monitoring of elderly people, also known as medical tele-monitoring (Sharan and Moir, 2016).

News and References

Bello, J. P., Silva, C., Nov, O., Dubois, R. L., Arora, A., Salamon, J., Mydlarz, C., and Doraiswamy, H. (2019). “SONYC: A system for monitoring, analyzing, and mitigating urban noise pollution.” in Communications of the ACM, (Vol. 62, No. 2). (Feb, 2019)

Brilakis, I., Fathi, H., & Rashidi, A. (2011). Progressive 3D reconstruction of infrastructure with videogrammetry. Automation in Construction, 20(7), 884-895. (2011)

Cheng, C. F., Rashidi, A., Davenport, M. A., & Anderson, D. (2016). “Audio signal processing for activity recognition of construction heavy equipment,” in  ISARC. Proceedings of the International Symposium on Automation and Robotics in Construction (Vol. 33, p. 1). Vilnius Gediminas Technical University, Department of Construction Economics & Property. (Jan., 2016)

Mydlarz, C., Sharma, M., Lockerman, Y., Steers, B., Silva, C., and Bello, J. P. (2019). “The life of a New York City noise sensor network.” In Sensors, 19, 1415. (Mar. 22, 2019)

Navon, R. (2005). Automated project performance control of construction projects. Automation in construction, 14(4), 467-476. (2005)

Park, J.W., Marks, E., Cho, Y.K, Suryanto, W., (2015). Performance Test of Wireless Technologies for Personnel and Equipment Proximity Sensing in Work Zones, Journal of Construction Engineering and Management, ASCE, 1(142). (2015)

Peng, Y. T., Lin, C. Y., Sun, M. T., & Tsai, K. C. (2009). “Healthcare audio event classification using hidden markov models and hierarchical hidden markov models,” in Multimedia and Expo, 2009. ICME 2009. IEEE International Conference on (pp. 1218-1221). (June, 2009)

Su, Y., Zhang, K., Wang, J., and Madani, K. (2019). “Environment sound classification using a two-stream CNN based on decision-level fusion.” Sensors, 19(7), 1733. (Apr. 11, 2019)

Teizer, J., Lao, D., & Sofer, M. (2007). Rapid automated monitoring of construction site activities using ultra-wideband. In Proceedings of the 24th International Symposium on Automation and Robotics in Construction, Kochi, Kerala, India (pp. 19-21). (Sep., 2007)

Turkan, Y., Bosche, F., Haas, C. T., & Haas, R. (2012). Automated progress tracking using 4D schedule and 3D sensing technologies. Automation in Construction, 22, 414-421. (2012)

Wan, C. and Mita, A. (2008). “An automated pipeline monitoring system based on PCA and SVM,” in World Academy of Science, Engineering, and Technology, 45. (2008)

Xu, X., Gao, H., Yu, J., Chen, Y., Zhu, Y., Xue, G., and Li, M. (2017). “ER: Early recognition of inattentive driving leveraging audio devices on smartphones,” in INFOCOM 2017-IEEE Conference on Computer Communications, IEEE, 2017. https://doi.org/10.1109/INFOCOM.2017.8057022

Xie, Y., Lee, Y. C., Shariatfar, M., Zhang, Z. D., Rashidi, A., & Lee, H. W. (2019). Historical Accident and Injury Database-Driven Audio-Based Autonomous Construction Safety Surveillance. In Computing in Civil Engineering 2019: Data, Sensing, and Analytics (pp. 105-113). Reston, VA: American Society of Civil Engineers.

Sherafat, B., Rashidi, A., Lee, Y. C., & Ahn, C. R. (2019). Automated Activity Recognition of Construction Equipment Using a Data Fusion Approach. Proceedings of the Computing in Civil Engineering.

Sherafat, B., Rashidi, A., Lee, Y. C., & Ahn, C. R. (2019). A Hybrid Kinematic-Acoustic System for Automated Activity Detection of Construction Equipment. Sensors, 19(19), 4286. https://doi.org/10.3390/s19194286

Xie, Y., Lee, Y. C., Huther da Costa, T., Park, J., Jui, J. H., Choi, J. W., & Zhang, Z. (2019). Construction Data-Driven Dynamic Sound Data Training and Hardware Requirements for Autonomous Audio-based Site Monitoring. In ISARC. Proceedings of the International Symposium on Automation and Robotics in Construction (Vol. 36, pp. 1011-1017). IAARC Publications.

Maccagno, A., Mastropietro, A., Mazziotta, U., Scarpiniti, M., Lee, Y. C., & Uncini, A. (2019). A CNN Approach for Audio Classification in Construction Sites.

Zhang, T., Lee, Y. C., Scarpiniti, M., & Uncini, A. (2018). A Supervised Machine Learning-Based Sound Identification for Construction Activity Monitoring and Performance Evaluation. In Construction Research Congress 2018 (pp. 358-366).

Kubera, A. Wieczorkowska, T. Slowik, A. Kuranc and K. Skrzypiec, "Audio-based speed change classification for vehicles," International Workshop on New Frontiers in Mining Complex Patterns, 2016. https://doi.org/10.1007/978-3-319-61461-8_4
J. Rani and T. H. Sreenivas, "Remote vehicle tracking system through voice recognition app using smart phone," International Journal of Science, Engineering and Computer Technology, vol. 5, no. 6, pp. 200, 2015. https://www.semanticscholar.org/paper/Remote-Vehicle-Tracking-System-through-Voice-App-Rani/d8e936a3372387fe8c581ac806915c086f29a06d, Last accessed on January 30, 2020
Ntalampiras, I. Potamitis and N. Fakotakis, "On acoustic surveillance of hazardous situations," In Acoustics, Speech and Signal Processing, 2009. https://doi.org/10.1109/ICASSP.2009.4959546
Zhao, M. Bai and J. Yang, "Train safety detection technology based on audio analysis," In Proceedings of the 2017 2nd International Conference on Communication and Information Systems, 2017. https://doi.org/10.1145/3158233.3159363
L. Rouas, J. Louradour and S. Ambellouis, "Audio events detection in public transport vehicle," In Intelligent Transportation Systems Conference, 2006. https://doi.org/10.1109/ITSC.2006.1706829
Doukas, L. Athanasiou, K. Fakos and I. Maglogiannis, "Advanced sound and distress speech expression classification for human status awareness in assistive environments," The Journal on Information Technology in Healthcare, vol. 7, no. 2, pp. 111-117, 2009. https://www.researchgate.net/publication/228660037_Advanced_sound_and_distress_speech_expression_classification_for_human_status_awareness_in_assistive_environments, Last accessed on January 30, 2020
Istrate, M. Vacher and J. F. Serignat, "Embedded implementation of distress situation identification through sound analysis," The Journal on Information Technology in Healthcare, vol. 6, no. 3, pp. 204-211, 2008. https://www.researchgate.net/publication/228711376_Embedded_Implementation_of_Distress_Situation_Identification_through_Sound_Analysis, Last accessed on January 30, 2020
Virone, D. Istrate, M. Vacher, N. Noury, J. F. Serignat and J. Demongeot, "First steps in data fusion between a multichannel audio acquisition and an information system for home healthcare," Engineering in Medicine and Biology Society, 2003. https://doi.org/10.1109/IEMBS.2003.1279557
Wu, H. Gong, P. Chen, Z. Zhong and Y. Xu, "Surveillance robot utilizing video and audio information," Journal of Intelligent and Robotic Systems, vol. 55, no. 4-5, pp. 403-421, 2009. https://doi.org/10.1007/s10846-008-9297-3
Istrate, E. Castelli, M. Vacher, L. Besacier and J. F. Serignat, "Information extraction from sound for medical telemonitoring," Transactions on Information Technology in Biomedicine, 2006. https://doi.org/10.1109/TITB.2005.859889
Martin and J. Voix, "In-ear audio wearable: measurement of heart and breathing rates for health and safety monitoring," IEEE Transactions on Biomedical Engineering, 2017. https://doi.org/10.1109/TBME.2017.2720463
Dufaux, L. Besacier, M. Ansorge and F. Pellandini, "Automatic sound detection and recognition for noisy environment," Signal Processing Conference, 2000. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7075558&isnumber=7065137, Last accessed on January 30, 2020
Lei and O. Valdez, "Special sound detection for emergency phones," Fuzzy Systems and Knowledge Discovery, 2013. https://doi.org/10.1109/FSKD.2013.6816306
Park, Y. Kim, E. T. Matson, H. Jang, C. Lee and W. Park, "An intuitive interaction system for fire safety using a speech recognition technology," Automation, Robotics and Applications, 2015. https://doi.org/10.1109/ICARA.2015.7081179
Sabillon, A. Rashidi, B. Samanta and M. A. Davenport, "Audio-based bayesian model for productivity estimation of cyclic construction activities," Computing in Civil Engineering, vol. 34, no. 1, 2020. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000863
Esmaeili, M. R. Hallowell and B. Rajagopalan, "Attribute-based safety risk assessment. I: Analysis at the fundamental level," Journal of Construction Engineering and Management, vol. 141, no. 8, pp. 04015021, 2015. https://doi.org/10.1061/(ASCE)CO.1943-7862.0000980
Cho, Y. C. Lee and T. Zhang, "Sound recognition techniques for multi-layered construction activities and events," Computing in Civil Engineering, 2017. https://doi.org/10.1061/9780784480847.041

W Wei, C Wang and Y. C. Lee, "BIM-based construction noise hazard prediction and visualization for occupational safety and health awareness improvement," Computing in Civil Engineering, 2017. https://doi.org/10.1061/9780784480823.032

Li, B. Becerik-Gerber, B. Krishnamachari and L. Soibelman, "A BIM centered indoor localization algorithm to support building fire emergency response operations," Automation in Construction, vol. 42, pp. 78-89, 2014. https://doi.org/10.1016/j.autcon.2014.02.019
Marzouk and A. Abdelaty, "Monitoring thermal comfort in subways using building information modeling," Energy and Buildings, vol. 84, pp. 252-257, 2014. https://doi.org/10.1016/j.enbuild.2014.08.006
Liu, B. Cui, Y. Liu and D. Zhong, "Automatic control and real-time monitoring system for earth–rock dam material truck watering," Automation in Construction, vol. 30, pp. 70-80, 2013. https://doi.org/10.1016/j.autcon.2012.11.007
Awolusi, E. Marks and M. Hallowell, "Wearable technology for personalized construction safety monitoring and trending: Review of applicable devices," Automation in Construction, vol. 85, pp. 96-106, 2018. https://doi.org/10.1016/j.autcon.2017.10.010
F. Cheung, T. H. Lin and Y. C. Lin, "A real-time construction safety monitoring system for hazardous gas integrating wireless sensor network and building information modeling technologies," Sensors, vol. 18, no. 2, pp. 436, 2018. https://doi.org/10.3390/s18020436

Properties

Summary
Incremental or Game Changing	Increm / GC
Learn	50%
Watch
Pilot
Use