Release esp-va-sdk-v1.2

espressif · Dec 7, 2019 · 8d23e83
1 parent 1a3661f
commit 8d23e83

Showing 86 changed files with 2,035 additions and 1,040 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,4 +1,40 @@
 ## ChangeLog
+
+### 1.2-RC1 - 2019-11-13
+
+**Enhancements**
+
+* Support for BT A2DP sink with Alexa.
+* Support for setting device configuration via companion app. Below are the configuration options settable/gettable via the app:
+    * Assistant's language
+    * Device name (visible over local network after provisioning)
+    * Device volume
+    * Alexa WW detection tone (start tone)
+    * Query end tone (end tone)
+* Support for displaying WiFi's authentication mode (open or secure) in the app during provisioning.
+* Support for streaming binary/octet-stream media content type.
+* Support for 5 linear LED patterns for Alexa events.
+
+**API Changes**
+
+* Removed `va_playback` from alexa_config_t. It is now being handled internally.
+* `media_hal.c` is made common to all boards.
+* `va_dsp_init` api now requires two callback parameters `va_dsp_recognize_cb_t` and `va_dsp_record_cb_t`.
+* Alexa device's "Product ID" can now also be specified from menuconfig.
+
+**Bug Fixes**
+
+* Long duration stability improvements.
+* Fixed a memory leak of 48 bytes after each NVS operation.
+* Fixed occasional WDT exceptions during OTA.
+* Updated certificate for Dialogflow and GVA.
+* Using custom 128-bit UUIDs for BLE services and characteristics instead of standard 16-bit UUIDs.
+
+**Known Issues**
+
+* Enabling BT A2DP sink requires flash size > 4MB.
+* Exhaustion of internal memory when BT A2DP sink is enabled may lead to a crash. This is applicable for boards running Wakeword detection on the host (ESP32).
+
 ### 1.0-RC2 - 2019-08-13
 
 **Enhancements**
@@ -7,7 +43,7 @@
 * Memory optimisations to improve the overall functionality and stability.
 * Added an API to change the locale for amazon_alexa. Also added a cli for the same.
 * Added support for sign-in and sign-out via the app.
-* Added basic support for OTA. The APIs still need to be implemented by the application. (refer to examples/amazon_alexa/main/cloud_agent.h)
+* Added basic support for OTA. The APIs still need to be implemented by the application. (refer to examples/amazon_alexa/main/app_cloud_agent.h)
 * Support for Gaana (India) and Hungama (India) music streaming services.
 * Provisioning app for iOS has also been added. The existing Android app has been updated.
 * Added error message in addition to error LEDs when the wake word is detected and the device is having trouble processing it.
@@ -20,7 +56,7 @@
 * Authentication components have been moved from alexa.h to auth_delegate.h. Refer to the respective files for the changes.
 * `media_hal_data.c` is now made common and is not a part of `board_support_pkgs/<board_name>/esp_codec/` anymore.
   * This complete logic is now moved to `components/media_hal/`. Please take a look at (media_hal_playback.h)[components/media_hal/].
-  * Audio board must initialize `media_hal` using `media_hal_init_playback` with config. For example, for (lyrat_board)[board_support_pkgs/lyrat/audio_boar/audio_board_lyrat/audio_board_lyrat.c].
+  * audio board must initialize `media_hal` using `media_hal_init_playback` with config. For example, for (lyrat_board)[board_support_pkgs/lyrat/audio_boar/audio_board_lyrat/audio_board_lyrat.c].
 * APIs for tone have been changed to support the above media_hal change.
 
 **Bug Fixes**
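
The 1.2-RC1 API change to `va_dsp_init` listed in the changelog means the application now supplies the recognize and record handlers itself. A minimal caller-side sketch, under stated assumptions: the callback signatures are inferred from how this commit invokes them in `va_dsp.c` (not taken from the SDK headers), and `va_dsp_init_stub`, `app_recognize_cb`, `app_record_cb`, `simulate_tap_to_talk` and the `initiator` enum are hypothetical stand-ins so the example compiles on a host without ESP-IDF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed shapes, inferred from the call sites in va_dsp.c; the real
 * typedefs live in the SDK headers and may differ. */
enum initiator { WAKEWORD, TAP };
typedef int (*va_dsp_recognize_cb_t)(int ww_length, enum initiator init_type);
typedef int (*va_dsp_record_cb_t)(uint8_t *data, int len);

/* Hypothetical application state, used only to observe the callbacks. */
static int recognize_calls;
static int bytes_recorded;

/* Application handler: asked to start a dialog. Returns 0 on success. */
static int app_recognize_cb(int ww_length, enum initiator init_type)
{
    (void)ww_length;
    (void)init_type;
    recognize_calls++;
    return 0;
}

/* Application handler: receives each captured audio chunk. */
static int app_record_cb(uint8_t *data, int len)
{
    (void)data;
    bytes_recorded += len;
    return 0;
}

/* Host-side stand-in for va_dsp_init(): stores both callbacks. */
static va_dsp_recognize_cb_t stored_recognize_cb;
static va_dsp_record_cb_t stored_record_cb;

static void va_dsp_init_stub(va_dsp_recognize_cb_t recognize_cb,
                             va_dsp_record_cb_t record_cb)
{
    assert(recognize_cb != NULL && record_cb != NULL);
    stored_recognize_cb = recognize_cb;
    stored_record_cb = record_cb;
}

/* Simulates one tap-to-talk event followed by one audio chunk. */
static int simulate_tap_to_talk(void)
{
    uint8_t chunk[320] = {0};
    if (stored_recognize_cb(0, TAP) != 0) {
        return -1;
    }
    return stored_record_cb(chunk, (int)sizeof(chunk));
}
```

In a real application the two handlers would presumably forward to the voice-assistant library rather than update counters; the point here is only the registration order: callbacks first, events afterwards.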

diff --git a/LICENSE b/LICENSE
@@ -1,5 +1,3 @@
-For DSPG's DBMD5P DSP firmware please refer to the license at board_support_pkgs/lyratd_dspg/dspg_fw/docs/license.pdf
-For rest of the components please refer to the below license.
 
 ESPRESSIF MIT License
 

diff --git a/README-Getting-Started.md b/README-Getting-Started.md
@@ -24,19 +24,22 @@ $ export IDF_PATH=/path/to/esp-idf
 # Set audio_board path. e.g. For LyraT board:
 $ export AUDIO_BOARD_PATH=/path/to/esp-va-sdk/board_support_pkgs/lyrat/audio_board/audio_board_lyrat/
 
-$ make -j 8 flash monitor
+$ make -j 8 flash monitor [ALEXA_BT=1]
 ```
-NOTE:
-> The google_voice_assistant and google_dialogflow applications only support Tap-to-talk whereas the amazon_alexa application supports both, "Alexa" wakeword and tap-to-talk.
 * Once you have the firmware flashed, visit the following pages for interacting with the device:
-    * [Alexa](examples/amazon_alexa/README-Alexa.md)
-    * [Google Voice Assistant](examples/google_voice_assistant/README-GVA.md)
-    * [DialogFlow](examples/google_dialogflow/README-Dialogflow.md)
+   * [Alexa](examples/amazon_alexa/README-Alexa.md)
+   * [Google Voice Assistant](examples/google_voice_assistant/README-GVA.md)
+   * [DialogFlow](examples/google_dialogflow/README-Dialogflow.md)
 
+## Enabling BT A2DP Sink support (Only for Alexa)
+* In order to enable BT A2DP sink feature, please pass `ALEXA_BT=1` as command-line argument to make.
 
 # Upgrading from Previous Release
 Please skip this section if you are using the SDK for the first time.
 
+## Upgrading to 1.2-RC1
+* The new firmware requires newer Android and iOS apps for provisioning and local control. Please update the apps from the respective app stores.
+
 ## Upgrading to 1.0-RC2
 * The partition table has been changed. If you face any issue, try doing 'make erase_flash' and then flash again.
 * Authentication sequence has been changed for amazon_alexa. Refer to app_main.c in the amazon_alexa application.

diff --git a/README.md b/README.md
@@ -2,9 +2,7 @@
 
 The ESP-Voice-Assistant SDK provides an implementation of Amazon's Alexa Voice Service, Google Voice Assistant and Google's conversational interface (aka Dialogflow) for ESP32 microcontroller. This facilitates the developers directly run these voice-assistants on an ESP32. The SDK will run on hardware boards that have Microphone/Speaker interfaced with the ESP32.
 
-## License
-* For LyratD-DSPG board based on DSPG's DBMD5P DSP please read the licensing terms [here](board_support_pkgs/lyratd_dspg/dspg_fw/docs/license.pdf) for the DSP fimrware
-* For rest of the ESP-VA-SDK components please refer to the licensing terms [here](LICENSE)
+Please refer to [Changelog](CHANGELOG.md) to track release changes and known-issues.
 
 ### About the SDK
 
@@ -35,10 +33,9 @@ The SDK contains pre-built libraries for Amazon Alexa, Google Voice Assistant (G
 The SDK supports the following hardware platforms:
 * [ESP32-LyraT](https://www.espressif.com/en/products/hardware/esp32-lyrat)
 * [ESP32-LyraTD-MSC](https://www.espressif.com/en/products/hardware/esp32-lyratd-msc)
-* [ESP32-LyraTD-DSPG](https://www.espressif.com/sites/default/files/documentation/ESP32-LyraTD-DSPG_User_Guide__en.pdf)
 
 The following list of acoustic front-ends is also supported. Please contact Espressif to enable acccess to these solutions.
-* DSPG DBMD5P [GitHub support is for evaluation purpose only and uses Espressif's WakeWord Engine].
+* DSPG DBMD5P
 * Intel s1000
 * Synaptics CX20921
 

diff --git a/board_support_pkgs/lyrat/dsp_driver/lyrat_driver/components/va_dsp/va_dsp.c b/board_support_pkgs/lyrat/dsp_driver/lyrat_driver/components/va_dsp/va_dsp.c
@@ -12,8 +12,7 @@
 #include <esp_log.h>
 #include <media_hal.h>
 #include <voice_assistant.h>
-#include <speech_recognizer.h>
-#include <va_mem_utils.h>
+#include <esp_audio_mem.h>
 #include <va_button.h>
 #include <va_nvs_utils.h>
 #include <va_dsp.h>
@@ -32,22 +31,32 @@ enum va_dsp_state {
     STOPPED,
     MUTED,
 };
-static enum va_dsp_state dsp_state;
-static QueueHandle_t cmd_queue;
-static uint8_t audio_buf[AUDIO_BUF_SIZE];
-static bool va_dsp_booted = false;
+
 static int8_t dsp_mute_en;
 
+static struct va_dsp_data_t {
+    va_dsp_record_cb_t va_dsp_record_cb;
+    va_dsp_recognize_cb_t va_dsp_recognize_cb;
+    enum va_dsp_state dsp_state;
+    QueueHandle_t cmd_queue;
+    uint8_t audio_buf[AUDIO_BUF_SIZE];
+    bool va_dsp_booted;
+} va_dsp_data = {
+    .va_dsp_record_cb = NULL,
+    .va_dsp_recognize_cb = NULL,
+    .va_dsp_booted = false,
+};
+
 static inline void _va_dsp_stop_streaming()
 {
     lyrat_stop_capture();
-    dsp_state = STOPPED;
+    va_dsp_data.dsp_state = STOPPED;
 }
 
 static inline void _va_dsp_start_streaming()
 {
     lyrat_start_capture();
-    dsp_state = STREAMING;
+    va_dsp_data.dsp_state = STREAMING;
 }
 
 static inline int _va_dsp_stream_audio(uint8_t *buffer, int size, int wait)
@@ -57,39 +66,39 @@ static inline int _va_dsp_stream_audio(uint8_t *buffer, int size, int wait)
 
 static inline void _va_dsp_mute_mic()
 {
-    if (dsp_state == STREAMING) {
+    if (va_dsp_data.dsp_state == STREAMING) {
         lyrat_stop_capture();
     }
     lyrat_mic_mute();
-    dsp_state = MUTED;
+    va_dsp_data.dsp_state = MUTED;
 }
 
 static inline void _va_dsp_unmute_mic()
 {
     lyrat_mic_unmute();
-    dsp_state = STOPPED;
+    va_dsp_data.dsp_state = STOPPED;
 }
 
 static void va_dsp_thread(void *arg)
 {
     struct dsp_event_data event_data;
     while(1) {
-        xQueueReceive(cmd_queue, &event_data, portMAX_DELAY);
-        switch (dsp_state) {
+        xQueueReceive(va_dsp_data.cmd_queue, &event_data, portMAX_DELAY);
+        switch (va_dsp_data.dsp_state) {
             case STREAMING:
                 switch (event_data.event) {
                     case TAP_TO_TALK:
                         /* Stop the streaming */
                         _va_dsp_stop_streaming();
                         break;
                     case GET_AUDIO: {
-                        int read_len = _va_dsp_stream_audio(audio_buf, AUDIO_BUF_SIZE, portMAX_DELAY);
+                        int read_len = _va_dsp_stream_audio(va_dsp_data.audio_buf, AUDIO_BUF_SIZE, portMAX_DELAY);
                         if (read_len > 0) {
-                            speech_recognizer_record(audio_buf, read_len);
+                            va_dsp_data.va_dsp_record_cb(va_dsp_data.audio_buf, read_len);
                             struct dsp_event_data new_event = {
                                 .event = GET_AUDIO
                             };
-                            xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
+                            xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
                         } else {
                             _va_dsp_stop_streaming();
                         }
@@ -117,33 +126,33 @@ static void va_dsp_thread(void *arg)
                             /*XXX: Should we close the stream here?*/
                             break;
                         }
-                        if (speech_recognizer_recognize(phrase_length, WAKEWORD) == 0) {
+                        if (va_dsp_data.va_dsp_recognize_cb(phrase_length, WAKEWORD) == 0) {
                             struct dsp_event_data new_event = {
                                 .event = GET_AUDIO
                             };
-                            xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
-                            dsp_state = STREAMING;
+                            xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
+                            va_dsp_data.dsp_state = STREAMING;
                         } else {
                             printf("%s: Error starting a new dialog..stopping capture\n", TAG);
                             _va_dsp_stop_streaming();
                         }
                         break;
                     }
                     case TAP_TO_TALK:
-                        if (speech_recognizer_recognize(0, TAP) == 0) {
+                        if (va_dsp_data.va_dsp_recognize_cb(0, TAP) == 0) {
                             _va_dsp_start_streaming();
                             struct dsp_event_data new_event = {
                                 .event = GET_AUDIO
                             };
-                            xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
+                            xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
                         }
                         break;
                     case START_MIC:
                         _va_dsp_start_streaming();
                         struct dsp_event_data new_event = {
                             .event = GET_AUDIO
                         };
-                        xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
+                        xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
                         break;
                     case MUTE:
                         _va_dsp_mute_mic();
@@ -174,7 +183,7 @@ static void va_dsp_thread(void *arg)
                 break;
 
             default:
-                printf("%s: Unknown state %d with Event %d\n", TAG, dsp_state, event_data.event);
+                printf("%s: Unknown state %d with Event %d\n", TAG, va_dsp_data.dsp_state, event_data.event);
                 break;
         }
     }
@@ -186,7 +195,7 @@ int va_app_speech_stop()
     struct dsp_event_data new_event = {
         .event = STOP_MIC
     };
-    xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
+    xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
     return 0;
 }
 
@@ -196,20 +205,20 @@ int va_app_speech_start()
     struct dsp_event_data new_event = {
         .event = START_MIC
     };
-    xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
+    xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
     return 0;
 }
 
 int va_dsp_tap_to_talk_start()
 {
-    if (va_dsp_booted == false) {
+    if (va_dsp_data.va_dsp_booted == false) {
         return -1;
     }
     printf("%s: Sending start for tap to talk command\n", TAG);
     struct dsp_event_data new_event = {
         .event = TAP_TO_TALK
     };
-    xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
+    xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
     return ESP_OK;
 }
 
@@ -220,10 +229,10 @@ int va_app_playback_starting()
 
 void va_dsp_reset()
 {
-    if (va_dsp_booted == true) {
+    if (va_dsp_data.va_dsp_booted == true) {
         struct dsp_event_data new_event;
         new_event.event = MUTE;
-        xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
+        xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
     }
 }
 
@@ -235,23 +244,26 @@ void va_dsp_mic_mute(bool mute)
     else
         new_event.event = UNMUTE;
     va_nvs_set_i8(DSP_NVS_KEY, mute);
-    xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
+    xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
 }
 
-void va_dsp_init(void)
+void va_dsp_init(va_dsp_recognize_cb_t va_dsp_recognize_cb, va_dsp_record_cb_t va_dsp_record_cb)
 {
+    va_dsp_data.va_dsp_record_cb = va_dsp_record_cb;
+    va_dsp_data.va_dsp_recognize_cb = va_dsp_recognize_cb;
+
     lyrat_init();
     TaskHandle_t xHandle = NULL;
-    StackType_t *task_stack = (StackType_t *)va_mem_alloc(STACK_SIZE, VA_MEM_INTERNAL);
+    StackType_t *task_stack = (StackType_t *) heap_caps_calloc(1, STACK_SIZE, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);
     static StaticTask_t task_buf;
 
-    cmd_queue = xQueueCreate(10, sizeof(struct dsp_event_data));
-    if (!cmd_queue) {
+    va_dsp_data.cmd_queue = xQueueCreate(10, sizeof(struct dsp_event_data));
+    if (!va_dsp_data.cmd_queue) {
         ESP_LOGE(TAG, "Error creating va_dsp queue");
         return;
     }
 
-    dsp_state = STOPPED;
+    va_dsp_data.dsp_state = STOPPED;
     if (va_nvs_get_i8(DSP_NVS_KEY, &dsp_mute_en) == ESP_OK) {
         if (dsp_mute_en) {
             va_dsp_mic_mute(dsp_mute_en);
@@ -266,5 +278,5 @@ void va_dsp_init(void)
     }
 
     va_boot_dsp_signal();
-    va_dsp_booted = true;
+    va_dsp_data.va_dsp_booted = true;
 }
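
The refactored `va_dsp.c` dispatches through `va_dsp_data.va_dsp_record_cb` and `va_dsp_data.va_dsp_recognize_cb` without checking them, so a NULL argument to `va_dsp_init` would be called on the first event. One possible guard is sketched below, assuming the record callback has the signature inferred from its call site. `dsp_ctx`, `dsp_ctx_record`, `count_bytes` and the 64-byte buffer size are hypothetical, and the real module may deliberately rely on callers always passing valid callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AUDIO_BUF_SIZE 64 /* placeholder size for this host-side sketch */

typedef int (*record_cb_t)(uint8_t *data, int len); /* assumed signature */

/* Module state gathered in one static struct, mirroring the commit's layout.
 * Members omitted from the initializer (audio_buf) are zero-initialized. */
static struct dsp_ctx {
    record_cb_t record_cb;
    uint8_t audio_buf[AUDIO_BUF_SIZE];
    bool booted;
} dsp_ctx = {
    .record_cb = NULL,
    .booted = false,
};

/* Forwards a chunk to the registered callback.
 * Returns -1 if no callback is set or len is out of range. */
static int dsp_ctx_record(int len)
{
    if (dsp_ctx.record_cb == NULL || len < 0 || len > AUDIO_BUF_SIZE) {
        return -1;
    }
    return dsp_ctx.record_cb(dsp_ctx.audio_buf, len);
}

/* Hypothetical callback that just reports how many bytes it was given. */
static int count_bytes(uint8_t *data, int len)
{
    (void)data;
    return len;
}

/* Usage: exercises the guard before and after registration. */
static void dsp_ctx_demo(void)
{
    assert(dsp_ctx_record(16) == -1); /* nothing registered yet */
    dsp_ctx.record_cb = count_bytes;
    assert(dsp_ctx_record(16) == 16); /* forwarded to the callback */
}
```

Checking once in the init function and refusing to start the task would be an equally reasonable design; the per-call check shown here simply keeps the sketch small.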