DECtalk ESPress Firmware — Build Process & Architecture

This document describes the build system, source layout, and internal architecture of the DECtalk ESPress firmware.

The firmware build supports both ESP32-S3 and ESP32-C6 targets. The separate HARDWARE.md guide documents the physical perfboard build shown in its photographs, which is based on an ESP32-C6 board.

For the DECtalk component build process (dapi source compilation, dictionary cross-compilation, porting notes), see the component BUILD.md.

Prerequisites

| Requirement | Version | Notes |
|---|---|---|
| ESP-IDF | v6.0+ (tested with v6.0) | The sdkconfig.defaults header references ESP-IDF 6.0 |
| Python | 3.8+ | Required by ESP-IDF tools |
| Host C compiler | cc or gcc | Used at build time to compile the dictionary compiler, which runs on the host |
| CMake | 3.22+ | Bundled with ESP-IDF; the top-level CMakeLists.txt requires 3.22 |
| Ninja | any | Bundled with ESP-IDF |

Installing ESP-IDF

Linux / macOS

# Install system dependencies (Ubuntu/Debian)
sudo apt-get install git wget flex bison gperf python3 python3-pip \
    python3-venv cmake ninja-build ccache libffi-dev libssl-dev \
    dfu-util libusb-1.0-0

# Clone ESP-IDF at the tested release (pass another stable tag to -b if needed;
# cloning the tag directly keeps the submodules in sync with it)
mkdir -p ~/esp && cd ~/esp
git clone -b v6.0 --recursive https://github.com/espressif/esp-idf.git
cd esp-idf

# Install toolchains (ESP32-S3 + ESP32-C6 targets, comma-separated)
./install.sh esp32s3,esp32c6

# Activate the environment (run in every new shell, or add to .bashrc)
. ~/esp/esp-idf/export.sh

Windows

Use the ESP-IDF Windows Installer, which bundles Git, Python, CMake, Ninja, and the Xtensa/RISC-V toolchains.

Cloning the Repository

The upstream DECtalk source tree is included as a Git submodule at components/dectalk/dectalk. You must initialise it when you clone:

git clone --recursive https://github.com/lllucius/DECtalk_ESPress.git
cd DECtalk_ESPress

If you already cloned without --recursive, pull the submodule manually:

git submodule update --init --recursive

Directory Layout

DECtalk_ESPress/
├── CMakeLists.txt                  # Top-level ESP-IDF project file
├── sdkconfig.defaults              # Default Kconfig values (target, flash, PSRAM, TinyUSB…)
├── sdkconfig.defaults.esp32c6      # Target-specific overrides for ESP32-C6
├── sdkconfig.devel                 # Optional development overrides (diagnostics, PSRAM, debugging)
├── partitions.csv                  # Custom partition table
├── BUILD.md                        # ← this file (firmware build & architecture)
├── README.md                       # Firmware overview and quick-start
│
├── components/
│   └── dectalk/                    # ESP-IDF component wrapping the upstream dapi library
│       ├── CMakeLists.txt          # Compiles all dapi sources + local stubs; builds dictionary
│       ├── Kconfig.projbuild       # menuconfig: language, dict storage, source path (DECtalk menu)
│       ├── README.md               # Component overview, Kconfig settings, dictionary modes
│       ├── BUILD.md                # Component build process, dapi compilation, porting notes
│       ├── project_include.cmake   # Registers custom partition subtypes; manages partition CSV
│       ├── include/
│       │   ├── config.h            # Maps Kconfig DECTALK_DICT_ROOT → DECTALK_INSTALL_PREFIX
│       │   └── sys/
│       │       ├── ipc.h           # Minimal IPC_CREAT / IPC_RMID stubs
│       │       ├── mman.h          # mmap/munmap prototypes + MAP_FAILED constant
│       │       └── shm.h           # shmget/shmat/shmdt/shmctl prototypes
│       └── src/
│           ├── libc_stubs.c        # shmget/shmat/shmdt/shmctl, nanosleep, readlink, dirname
│           └── loaddict_wrappers.c # __wrap_load_dictionary / __wrap_unload_dictionary
│
├── main/                           # Main application component
│   ├── CMakeLists.txt              # Registers main sources; depends on dectalk, driver, pthread…
│   ├── Kconfig.projbuild           # menuconfig: audio, tuning, CDC/JTAG transport, fw commands, diagnostics
│   ├── idf_component.yml           # IDF component manager dep: espressif/esp_tinyusb ≥ 2.0.0
│   ├── dtesp.c                     # Entry point (app_main), I2S init, threads, ESPress protocol
│   ├── dtesp.h                     # Protocol constants, DLE encode/decode, public API
│   ├── dtesp_audio.c               # I2S output initialisation + TLV320DAC3100 codec init
│   ├── dtesp_audio.h               # Audio subsystem public API
│   ├── dtesp_transport.h           # Transport vtable (dtesp_transport_t) shared by all transports
│   ├── dtesp_jobs.h                # Job queue types (SPEAK_TEXT, ACTION, FLUSH)
│   ├── dtesp_job_pool.c            # Pre-allocated job pool allocator
│   ├── dtesp_job_pool.h            # Job pool public API
│   ├── custom_commands.c           # [:fw …] tokeniser and job-list builder
│   ├── custom_commands.h           # Tokeniser public API and job-list types
│   ├── custom_actions.c            # [:fw …] sub-command handlers (gpio, voice, rate, tone, …)
│   ├── custom_actions.h            # Action handler public API
│   ├── fw_settings.c               # NVS-backed codec settings (volume, profile, autoswitch)
│   ├── fw_settings.h               # Firmware settings public API
│   ├── tlv320.c                    # TI TLV320DAC3100 codec driver (Adafruit breakout)
│   ├── tlv320.h                    # Codec driver public API
│   ├── volume_knob.c               # Optional ADC-backed volume potentiometer support
│   ├── volume_knob.h               # Volume knob public API / no-op wrappers
│   ├── usb_cdc_transport.c         # ESP32-S3 USB CDC-ACM transport layer (TinyUSB wrapper)
│   ├── usb_cdc_transport.h         # ESP32-S3 transport API
│   ├── jtag_serial_transport.c     # ESP32-C6 USB Serial/JTAG transport layer
│   ├── jtag_serial_transport.h     # ESP32-C6 transport API
│   ├── diag_mem.c                  # Optional heap/stack diagnostics task
│   └── diag_mem.h                  # Diagnostics API
│
├── host/                           # Python host-side tools
│   ├── README.md                   # Host tools documentation
│   ├── dtesp_serial.py             # DECtalkESPressSerial class (serial protocol API)
│   └── dtesp_gui_qt.py             # Qt (PySide6/PyQt6) GUI for voice control, status, pause/resume
│
└── tests/                          # Host-native unit tests (no ESP-IDF required)
    ├── Makefile                    # Build and run: `make -C tests test`
    └── test_custom_commands.c      # Tests for the [:fw …] custom command parser

How the Build Works

1. Project Bootstrapping

CMakeLists.txt is a minimal ESP-IDF project file:

cmake_minimum_required(VERSION 3.22)
include($ENV{IDF_PATH}/tools/cmake/project.cmake)
project(dtesp)

ESP-IDF discovers the components/dectalk/ and main/ components automatically.

2. Component: dectalk (the TTS library)

The DECtalk component handles source resolution, language selection, dictionary cross-compilation, and dapi library compilation. For full details see the component BUILD.md.

3. Component: main (Firmware Application)

The main/ component contains the application logic:

| File | Role |
|---|---|
| dtesp.c | Entry point (app_main), I2S initialisation, thread creation, ESPress protocol loop, speech task, TTS callback |
| dtesp_audio.c | I2S output initialisation and, when selected, TLV320DAC3100 codec configuration |
| dtesp_job_pool.c | Pre-allocated pool allocator for job objects, with heap fallback |
| custom_commands.c | Tokenises incoming text for [:fw …] tokens and builds ordered job lists |
| custom_actions.c | Sub-command handlers for [:fw gpio], [:fw voice], [:fw rate], [:fw tone], codec controls, and TLV320 DSP commands such as bass, treble, eq, drc, spkgain, and mute |
| fw_settings.c | NVS-backed mirror of codec settings (volume, profile, autoswitch); loaded at startup, persisted by [:fw save] |
| tlv320.c | Driver for the TI TLV320DAC3100 stereo DAC / headphone amplifier (Adafruit breakout); compiled only when DTESP_DAC_TLV320 is selected |
| volume_knob.c | Optional ADC-sampled analog volume knob with smoothing, hysteresis, and soft-takeover against firmware volume changes |
| usb_cdc_transport.c | ESP32-S3 TinyUSB CDC-ACM driver: RX stream buffer, DTR-based connection tracking, reconnection detection |
| jtag_serial_transport.c | ESP32-C6 USB Serial/JTAG driver: buffered RX/TX, reconnect detection, RTS-reset suppression |
| diag_mem.c | Optional diagnostics task, enabled from idf.py menuconfig, that logs stack high-water marks and heap statistics every 10 s |

Dependencies declared in main/CMakeLists.txt include dectalk, driver, and pthread, among others.

External dependency via main/idf_component.yml: espressif/esp_tinyusb (≥ 2.0.0).

4. Partition Table

partitions.csv defines a custom layout:

| Name | Type | SubType | Size | Purpose |
|---|---|---|---|---|
| nvs | data | nvs | 24 KB | Non-volatile storage |
| phy_init | data | phy | 4 KB | PHY calibration data |
| factory | app | factory | 2 MB | Application firmware |

Additional partitions can be added for dictionary storage; see the component README for the dedicated-partition dictionary mode.

The project_include.cmake file registers the custom udict subtype (0x40) with ESP-IDF and manages the dynamic extension of the partition table.

5. sdkconfig.defaults

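These are the minimal settings required for the project. ESP-IDF applies them when the sdkconfig file is created or regenerated, for example on the first build, after deleting sdkconfig, or after idf.py set-target. Note that idf.py fullclean removes only the build directory and keeps sdkconfig: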

| Setting | Value | Rationale |
|---|---|---|
| CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ_240 | y | Maximum CPU clock for synthesis performance |
| CONFIG_ESP_TASK_WDT_EN | n | Task watchdog disabled (speech synthesis is CPU-intensive) |
| CONFIG_IDF_TARGET | esp32s3 | Default target SoC |
| CONFIG_PARTITION_TABLE_CUSTOM | y | Use the project's partitions.csv |
| CONFIG_PTHREAD_TASK_STACK_SIZE_DEFAULT | 8192 | Default pthread stack (8 KB) |
| CONFIG_TINYUSB_CDC_ENABLED | y | Enable TinyUSB CDC-ACM for the ESP32-S3 host protocol |

When building for ESP32-C6, ESP-IDF also loads sdkconfig.defaults.esp32c6 after idf.py set-target esp32c6. That file currently overrides:

| Setting | Value | Rationale |
|---|---|---|
| CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ_160 | y | Maximum CPU clock on the ESP32-C6 |
| CONFIG_TINYUSB_CDC_ENABLED | n | ESP32-C6 host communications use USB Serial/JTAG instead of TinyUSB CDC |

6. sdkconfig.devel (Development Overrides)

sdkconfig.devel contains additional settings useful during development and debugging. These are not applied automatically — you must explicitly combine them with sdkconfig.defaults (see Combining sdkconfig Files below).

| Setting | Value | Rationale |
|---|---|---|
| CONFIG_COMPILER_STACK_CHECK_MODE_STRONG | y | Strong stack-smashing detection |
| CONFIG_DTESP_ENABLE_DIAG_MEM | y | Enable heap/stack diagnostics task |
| CONFIG_DTESP_LOG_LEVEL_VERBOSE | y | Verbose ESP_LOG output |
| CONFIG_ESPTOOLPY_FLASHSIZE_8MB | y | 8 MB flash for firmware + dictionary |
| CONFIG_ESPTOOLPY_HEADER_FLASHSIZE_UPDATE | y | Auto-update flash size in binary header |
| CONFIG_ESP_SYSTEM_PANIC_PRINT_HALT | y | Print backtrace and halt on panic |
| CONFIG_FREERTOS_USE_TRACE_FACILITY | y | Enable FreeRTOS task trace (for diagnostics) |
| CONFIG_HEAP_ABORT_WHEN_ALLOCATION_FAILS | y | Hard-fail on OOM for easier debugging |
| CONFIG_HEAP_POISONING_COMPREHENSIVE | y | Full heap poisoning for corruption detection |
| CONFIG_SPIRAM | y | Enable PSRAM (ESP32-S3 only; the ESP32-C6 has no PSRAM) |
| CONFIG_SPIRAM_MODE_OCTAL | y | Octal SPI PSRAM (requires a module fitted with octal PSRAM) |

7. Combining sdkconfig Files

ESP-IDF can merge multiple defaults files at configuration time using the -D SDKCONFIG_DEFAULTS CMake variable. This is useful for layering the development overrides on top of the base defaults:

# Create (or recreate) sdkconfig with both base and devel settings:
idf.py -D SDKCONFIG_DEFAULTS="sdkconfig.defaults;sdkconfig.devel" build

Settings in later files override earlier ones, so sdkconfig.devel values take precedence over sdkconfig.defaults.

When do you need to do this? Only when the sdkconfig file needs to be created or recreated: for example after deleting sdkconfig, after idf.py set-target, or when cloning the project for the first time. idf.py fullclean on its own is not enough, because it deletes only the build directory; an existing sdkconfig keeps its values, and those take precedence over every defaults file. Once sdkconfig exists, subsequent idf.py build commands reuse it and you do not need to pass -D SDKCONFIG_DEFAULTS again. You can also make further changes interactively with idf.py menuconfig at any time.


Firmware Architecture

Thread Model

app_main() creates two pthreads and then returns, which lets ESP-IDF delete the default main FreeRTOS task and reclaim its stack:

| Thread | Core | Stack | Role |
|---|---|---|---|
| speech_thread | CPU 1 (configurable) | default (8 KB) | Dequeues text from speech_queue, calls TextToSpeechSpeak() + TextToSpeechSync() |
| main_thread | any | 12 KB (configurable) | Runs the ESPress protocol loop: host-transport reads, DLE state machine, flow control |

CPU pinning matters because speech synthesis in TextToSpeechSpeak() is compute-intensive and does not voluntarily yield to the scheduler. On the dual-core ESP32-S3, pinning it to CPU 1 keeps CPU 0 free so the IDLE0 task can still run and service the Task Watchdog Timer. The watchdog is disabled in the defaults, so this is a defensive measure. The ESP32-C6 has a single high-performance core, so this separation is not available there.

Data Flow

  Host (PC)                         ESP32-S3 / ESP32-C6
  ─────────                         ───────────────────
  Serial terminal / GUI             USB CDC-ACM / USB Serial/JTAG
       │                                │
       │  ASCII text, control chars     │
       │  DLE command sequences         │
       ├───────────────────────────────►│
       │                                ▼
       │                         main_thread (protocol loop)
       │                           ├── DLE state machine
       │                           ├── Control char handlers
       │                           ├── Text accumulation buffer
       │                           └── XON/XOFF flow control
       │                                │
       │                                │ strdup'd text chunks
       │                                ▼
       │                         speech_queue (FreeRTOS queue)
       │                                │
       │                                ▼
       │                         speech_thread
       │                           ├── TextToSpeechSpeak()
       │                           ├── TextToSpeechSync()
       │                           └── Flush / drain
       │                                │
       │                                │ TTS_MSG_BUFFER callback
       │                                ▼
       │                         dtesp_tts_callback()
       │                           ├── Audio samples → I2S DMA
       │                           └── Index markers → DLE INDEX
       │                                │
       │  DLE STATUS, INDEX, XON/XOFF   │
       │◄───────────────────────────────┤
       │                                │
       │                                ▼
       │                         I2S peripheral → DAC → speaker

TTS In-Memory Mode

The firmware uses TextToSpeechOpenInMemory() with three rotating audio buffers (16 KB each, 8192 16-bit samples per buffer). Each buffer also has 8 index-mark slots. When a buffer is filled, dtesp_tts_callback() is invoked with TTS_MSG_BUFFER:

  1. Any embedded index marks are extracted and sent to the host as DLE INDEX sequences.
  2. If speech is paused (SO received), the audio samples are zeroed.
  3. The samples are written to the I2S DMA ring buffer via i2s_channel_write().
  4. The buffer is reset and re-queued with TextToSpeechAddBuffer().

Host Serial Transport

The main/ component selects the host transport at build time based on IDF_TARGET:

ESP32-S3: USB CDC-ACM

Why TinyUSB CDC instead of the built-in USB Serial/JTAG? The ESP32-S3's onboard USB Serial/JTAG peripheral automatically reboots the chip whenever the host toggles DTR. Because the ESPress protocol relies on DTR transitions to detect host connect/disconnect events, using the built-in Serial/JTAG would cause the device to reboot every time a host application opens the port. TinyUSB's CDC-ACM device avoids this by giving the firmware full control over how DTR line-state changes are handled — no reboot, just a protocol-state reset.

usb_cdc_transport.c wraps the espressif/esp_tinyusb CDC-ACM interface and provides an RX stream buffer, DTR-based connection tracking, and reconnection detection.

ESP32-C6: USB Serial/JTAG

jtag_serial_transport.c uses ESP-IDF's usb_serial_jtag driver and provides buffered RX/TX, reconnect detection, and RTS-reset suppression.

Dictionary Loading

Dictionary loading is handled by the DECtalk component. See the component README for dictionary storage modes and the component BUILD.md for the dictionary build pipeline and __wrap_load_dictionary() implementation.

Flow Control

The protocol loop implements two-tier XON/XOFF flow control:

  1. Text buffer level — XOFF at 2/3 full, XON at 1/3 full.
  2. Speech queue depth — XOFF at 3/4 full, XON at 1/4 full.

XOFF is sent as soon as either level exceeds its high-water mark (aggressive). XON is sent only once both levels have fallen below their low-water marks (conservative). The gap between the two marks provides hysteresis that prevents rapid XON/XOFF oscillation.

Idle Flush

If no new characters arrive for CONFIG_DTESP_TEXT_IDLE_TIMEOUT_MS (default 200 ms), any buffered text is automatically flushed to the speech queue. This handles the case where the host sends text without a trailing CR.


Build Commands Reference

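# Full clean build (removes the build directory; sdkconfig is kept)
idf.py fullclean && idf.py build

# Full clean build with development overrides
# (delete sdkconfig so the combined defaults are actually applied)
rm -f sdkconfig && idf.py fullclean && idf.py -D SDKCONFIG_DEFAULTS="sdkconfig.defaults;sdkconfig.devel" build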

# Build only
idf.py build

# Flash (replace /dev/ttyUSB0 with your UART port)
idf.py -p /dev/ttyUSB0 flash

# Flash dictionary partition separately (when using partition mode)
idf.py -p /dev/ttyUSB0 udict-flash

# Monitor console output (UART0)
idf.py -p /dev/ttyUSB0 monitor

# Open menuconfig
idf.py menuconfig

# Set target (only needed once, or when switching targets; regenerates sdkconfig from the defaults)
idf.py set-target esp32s3

# Build for ESP32-C6 instead
idf.py set-target esp32c6
idf.py build

Changing Language

idf.py menuconfig
# Navigate to: DECtalk → DECtalk Language
# Select desired language, save, exit

idf.py build
idf.py -p /dev/ttyUSB0 flash

See the component README for supported languages and compile definitions.

Changing Dictionary Storage Mode

idf.py menuconfig
# Navigate to: DECtalk → Dictionary location
# Choose: Embedded in firmware / Dedicated partition / File system

See the component README for detailed descriptions of each storage mode and its sub-options.


Porting Notes

For details on how the upstream dapi library was adapted for ESP32 (compile definitions, header shims, libc stubs, linker wrapping, warning suppression), see the component BUILD.md.