ESP-RTC: Real-Time Audio-and-Video Solution

Sep 15, 2022

ESP-RTC achieves stable, smooth, ultra-low-latency voice-and-video transmission in real time, providing an ideal solution for users who want to build low-cost and low-power audio-and-video products.

Espressif Systems (SSE: 688018.SH) is pleased to announce the release of ESP-RTC (ESP Real-Time Communication), an audio-and-video communication solution that achieves stable, smooth, ultra-low-latency voice-and-video transmission in real time.

ESP-RTC is built around Espressif’s ESP32-S3-Korvo-2 multimedia development board, which is equipped with the ESP32-S3 AI SoC and a dual-microphone array for near- and far-field voice wake-up and speech recognition. The board also integrates cameras, microSD cards, LCDs and other peripherals, and can process MJPEG video streams. This makes it an ideal development board for users who wish to build low-cost, low-power audio-and-video products.
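An MJPEG stream is a sequence of independent JPEG images, so a receiver can recover individual frames by scanning for the JPEG start-of-image (0xFFD8) and end-of-image (0xFFD9) markers. The Python sketch below illustrates that idea. `split_mjpeg` is a hypothetical helper, not part of ESP-ADF or any Espressif API, and the naive marker scan assumes frames carry no embedded thumbnails (whose own markers would confuse it).

```python
SOI = b"\xff\xd8"  # JPEG Start Of Image marker
EOI = b"\xff\xd9"  # JPEG End Of Image marker


def split_mjpeg(stream: bytes) -> list:
    """Split a raw MJPEG byte stream into individual JPEG frames.

    Naive marker scan (hypothetical helper): assumes each frame is a plain
    JPEG without an embedded thumbnail carrying its own SOI/EOI markers.
    """
    frames = []
    pos = 0
    while True:
        start = stream.find(SOI, pos)
        if start < 0:
            break  # no further frame starts in this buffer
        end = stream.find(EOI, start + 2)
        if end < 0:
            break  # incomplete trailing frame: wait for more data
        frames.append(stream[start:end + 2])  # include the EOI marker
        pos = end + 2
    return frames
```

Bytes before the first marker are skipped, and a trailing frame with no end marker yet is left for the next call, which suits data arriving in chunks from a camera or socket.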

The ESP-RTC solution implements real-time audio-and-video transmission on top of Espressif’s self-developed SIP (Session Initiation Protocol) stack, which includes a transport layer, a transaction layer and a session layer. ESP-RTC’s signalling module supports UDP, TCP and TLS, while its media transmission module supports RTP (over UDP), RTCP and SRTP, as well as TURN and other NAT traversal protocols. Notably, the transmission module also includes countermeasure algorithms, such as a jitter buffer and packet loss concealment (PLC), which mitigate packet loss, jitter, congestion and delay on weak networks and help keep real-time audio-and-video communication smooth.
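A jitter buffer holds incoming RTP packets briefly, reorders them by their 16-bit sequence number and releases them to the decoder at a steady pace; when a packet never arrives, PLC produces something to play in its place. The following Python sketch shows the general technique under simplifying assumptions: the `JitterBuffer` class and `playout` helper are hypothetical, the first packet received anchors the playout order, and "concealment" simply repeats the previous frame. It is not Espressif’s implementation, which the announcement does not describe.

```python
from typing import Dict, List, Optional

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16-bit and wrap around


class JitterBuffer:
    """Minimal reordering jitter buffer keyed by RTP sequence number (sketch)."""

    def __init__(self, depth: int = 2) -> None:
        self.depth = depth  # packets to collect before playout should start
        self._packets: Dict[int, bytes] = {}
        self._next_seq: Optional[int] = None

    def push(self, seq: int, payload: bytes) -> None:
        if self._next_seq is None:
            self._next_seq = seq  # first packet seen anchors the playout order
        # Forward distance from the playout point, modulo 2^16. Anything in the
        # "negative" half has already missed its playout slot and is dropped.
        if (seq - self._next_seq) % SEQ_MOD >= SEQ_MOD // 2:
            return
        self._packets[seq] = payload

    def ready(self) -> bool:
        return len(self._packets) >= self.depth

    def pop(self) -> Optional[bytes]:
        """Return the next payload in order, or None if that packet was lost."""
        if self._next_seq is None:
            return None
        payload = self._packets.pop(self._next_seq, None)
        self._next_seq = (self._next_seq + 1) % SEQ_MOD
        return payload


def playout(buf: JitterBuffer, ticks: int) -> List[bytes]:
    """Drain `ticks` frames, filling gaps with naive PLC (repeat last frame)."""
    out: List[bytes] = []
    last = b""
    for _ in range(ticks):
        frame = buf.pop()
        if frame is None:
            frame = last  # packet loss concealment: replay the previous frame
        out.append(frame)
        last = frame
    return out
```

For example, if packets 65535, 2 and 1 arrive in that order (so 1 and 2 are reordered and 0 is lost across the wrap-around), `playout(buf, 4)` returns the frames in sequence order with the missing slot filled by a repeat of frame 65535. A packet that arrives after its slot has been played is discarded.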

The ESP-RTC solution also includes an RTSP (Real Time Streaming Protocol) stack, whose media transmission module supports both RTP over UDP and RTP over TCP. ESP-RTC can act as an RTSP server, serving on-demand streams to players such as VLC, FFmpeg, PotPlayer and KMPlayer, or as an RTSP client that works with EasyDarwin, an easy-to-use, open-source streaming platform framework.
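RTP over TCP is usually carried as RTSP interleaved binary data (RFC 2326, section 10.12): each RTP or RTCP packet is prefixed with a `$` byte, a one-byte channel identifier and a two-byte big-endian length. The Python sketch below frames and parses that format. The function names are hypothetical, and the parser assumes the buffer holds only interleaved frames, whereas a real client must also handle RTSP text responses arriving on the same connection.

```python
import struct
from typing import Iterator, Tuple

# '$' magic byte, 1-byte channel id, 2-byte length, network (big-endian) order
HEADER = "!cBH"
HEADER_LEN = struct.calcsize(HEADER)  # 4 bytes


def frame_interleaved(channel: int, packet: bytes) -> bytes:
    """Wrap an RTP/RTCP packet for RTSP interleaved transport (RTP over TCP)."""
    return struct.pack(HEADER, b"$", channel, len(packet)) + packet


def parse_interleaved(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (channel, packet) pairs from a buffer of interleaved frames."""
    pos = 0
    while pos + HEADER_LEN <= len(data):
        magic, channel, length = struct.unpack_from(HEADER, data, pos)
        if magic != b"$":
            raise ValueError("not an interleaved frame at offset %d" % pos)
        end = pos + HEADER_LEN + length
        if end > len(data):
            break  # partial frame: wait for more bytes from the socket
        yield channel, data[pos + HEADER_LEN:end]
        pos = end
```

The channel numbers are negotiated in the RTSP `Transport` header (for example `interleaved=0-1`), with the even channel typically carrying RTP and the odd one RTCP.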

Using Espressif’s self-developed acoustic echo cancellation (AEC), background noise suppression (BNS) and automatic gain control (AGC) algorithms, ESP-RTC reduces sound interference in audio calls, keeping voice communication clear and stable. ESP-RTC also uses Espressif’s chip-level codec algorithm to provide a clear picture in video calls. Furthermore, ESP-RTC takes advantage of the AI computing power of Espressif’s ESP32-S3 SoC to achieve high-performance voice wake-up, voice recognition and image recognition. This makes ESP-RTC suitable for applications such as smart speakers, video door-intercom systems, smart-home control panels, pet monitors, car monitors and children’s toys.
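Automatic gain control keeps speech at a consistent level by measuring each frame’s loudness and scaling it towards a target. The sketch below is a generic, textbook-style AGC in Python, not Espressif’s algorithm. The parameters `target_rms`, `max_gain` and `smoothing` are hypothetical, samples are assumed to be floats in [-1, 1], and a production AGC would add separate attack/release times and a noise gate.

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def agc(frames: Sequence[Sequence[float]], target_rms: float = 0.1,
        max_gain: float = 10.0, smoothing: float = 0.5) -> List[List[float]]:
    """Scale each frame so its RMS level converges towards target_rms."""
    gain = 1.0
    out: List[List[float]] = []
    for frame in frames:
        rms = frame_rms(frame)
        # Gain needed to hit the target, capped so near-silence is not boosted
        # without limit; on fully silent frames keep the current gain.
        desired = min(max_gain, target_rms / rms) if rms > 0 else gain
        # Move only part of the way each frame to avoid audible gain jumps.
        gain += smoothing * (desired - gain)
        # Apply the gain and hard-clip to the valid sample range.
        out.append([max(-1.0, min(1.0, s * gain)) for s in frame])
    return out
```

Fed a steady quiet signal (RMS 0.01) or a steady loud one (RMS 0.5), the output level settles near the 0.1 target within a few dozen frames; the `smoothing` factor trades convergence speed against audible pumping.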

The solution supports open-source servers such as FreeSWITCH and FreePBX, and can also connect to mature cloud SFU (Selective Forwarding Unit) servers to enable group conference calls. Additionally, developers can quickly build audio-and-video communication applications with Espressif’s open-source ESP-IDF (IoT Development Framework) and ESP-ADF (Audio Development Framework).
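An SFU lets each participant upload a single stream, which the server forwards to the other participants without decoding or mixing it. The toy Python model below shows that fan-out and compares per-client stream counts with a peer-to-peer mesh. `ToySFU` and `streams_per_client` are hypothetical names, and the sketch forwards every stream, whereas a real SFU also selects which streams or quality layers each subscriber receives.

```python
from typing import Dict, List, Tuple


class ToySFU:
    """Forwards each publisher's packets unchanged to every other participant."""

    def __init__(self) -> None:
        # Per-participant queue of (sender, packet) pairs awaiting delivery.
        self.inboxes: Dict[str, List[Tuple[str, bytes]]] = {}

    def join(self, participant: str) -> None:
        self.inboxes[participant] = []

    def publish(self, sender: str, packet: bytes) -> None:
        # No decoding or mixing: the same bytes are fanned out to all others.
        for participant, inbox in self.inboxes.items():
            if participant != sender:  # never echo a stream back to its sender
                inbox.append((sender, packet))


def streams_per_client(n: int) -> Dict[str, int]:
    """Streams each client sends and receives in an n-party call."""
    return {
        "sfu_up": 1,        # one upload to the SFU
        "sfu_down": n - 1,  # one forwarded stream per other participant
        "mesh_up": n - 1,   # full mesh: a separate copy to every peer
        "mesh_down": n - 1,
    }
```

With four participants, each client uploads one stream through the SFU instead of three in a mesh, which is why an SFU helps group calls on bandwidth-constrained devices.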


To find out more, visit Espressif or Macnica.

Also, stay up to date with the most recent machine vision and image processing news right here on MVPro Media.
