Commit

Release esp-va-sdk-v1.2
avsheth committed Dec 7, 2019
1 parent 1a3661f commit 8d23e83
Showing 86 changed files with 2,035 additions and 1,040 deletions.
40 changes: 38 additions & 2 deletions CHANGELOG.md
@@ -1,4 +1,40 @@
## ChangeLog

### 1.2-RC1 - 2019-11-13

**Enhancements**

* Support for BT A2DP sink with Alexa.
* Support for setting device configuration via companion app. Below are the configuration options settable/gettable via the app:
    * Assistant's language
    * Device name (visible over local network after provisioning)
    * Device volume
    * Alexa WW detection tone (start tone)
    * Query end tone (end tone)
* Support for displaying WiFi's authentication mode (open or secure) in the app during provisioning.
* Support for streaming binary/octet-stream media content type.
* Support for 5 linear LED patterns for Alexa events.

**API Changes**

* Removed `va_playback` from alexa_config_t. It is now being handled internally.
* `media_hal.c` is made common to all boards.
* `va_dsp_init` api now requires two callback parameters `va_dsp_recognize_cb_t` and `va_dsp_record_cb_t`.
* Alexa device's "Product ID" can now also be specified from menuconfig.
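
The `va_dsp_init` change listed above means the application now hands both callbacks to the DSP layer at start-up. The sketch below illustrates the shape of such a call; it is not the SDK's implementation. The callback typedefs are assumptions inferred from the call sites in this commit (`recognize(phrase_length, WAKEWORD)` compared against `0`, and `record(buffer, length)`), so the SDK's real definitions may differ. `enum initiator`, `mock_va_dsp_init`, `app_recognize`, `app_record`, and `demo_register_and_stream` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed shapes, inferred from the call sites in this commit.
 * The SDK's real typedefs may differ. */
enum initiator { WAKEWORD, TAP };   /* hypothetical stand-in */
typedef int (*va_dsp_recognize_cb_t)(int phrase_length, enum initiator type);
typedef int (*va_dsp_record_cb_t)(uint8_t *data, int len);

/* Hypothetical stand-in for va_dsp_init: it only stores the two callbacks. */
static va_dsp_recognize_cb_t stored_recognize_cb;
static va_dsp_record_cb_t stored_record_cb;

static void mock_va_dsp_init(va_dsp_recognize_cb_t recognize_cb,
                             va_dsp_record_cb_t record_cb)
{
    /* Both callbacks are required, mirroring the changelog entry. */
    assert(recognize_cb != NULL && record_cb != NULL);
    stored_recognize_cb = recognize_cb;
    stored_record_cb = record_cb;
}

/* Application-side callbacks (hypothetical). */
static int recorded_bytes;

static int app_recognize(int phrase_length, enum initiator type)
{
    (void)phrase_length;
    (void)type;
    return 0;   /* 0 is treated as "dialog started", as in the diff's == 0 check */
}

static int app_record(uint8_t *data, int len)
{
    (void)data;
    recorded_bytes += len;   /* a real application would forward the audio */
    return 0;
}

/* Registers the callbacks, simulates one tap-to-talk chunk, and returns
 * the number of bytes the record callback received. */
int demo_register_and_stream(void)
{
    uint8_t buf[16] = {0};

    recorded_bytes = 0;
    mock_va_dsp_init(app_recognize, app_record);
    if (stored_recognize_cb(0, TAP) == 0) {
        stored_record_cb(buf, (int)sizeof(buf));
    }
    return recorded_bytes;
}
```

Passing the callbacks in, rather than including `speech_recognizer.h` directly, keeps the board-specific DSP file independent of the speech recognizer it feeds.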

**Bug Fixes**

* Long duration stability improvements.
* Fixed a memory leak of 48 bytes after each NVS operation.
* Fixed occasional WDT exceptions during OTA.
* Updated certificate for Dialogflow and GVA.
* Using custom 128-bit UUIDs for BLE services and characteristics instead of standard 16-bit UUIDs.

**Known Issues**

* Enabling BT A2DP sink requires flash size > 4MB.
* Exhaustion of internal memory when BT A2DP sink is enabled may lead to a crash. This is applicable for boards running Wakeword detection on the host (ESP32).

### 1.0-RC2 - 2019-08-13

**Enhancements**
@@ -7,7 +43,7 @@
* Memory optimisations to improve the overall functionality and stability.
* Added an API to change the locale for amazon_alexa. Also added a cli for the same.
* Added support for sign-in and sign-out via the app.
* Added basic support for OTA. The APIs still need to be implemented by the application. (refer to examples/amazon_alexa/main/cloud_agent.h)
* Added basic support for OTA. The APIs still need to be implemented by the application. (refer to examples/amazon_alexa/main/app_cloud_agent.h)
* Support for Gaana (India) and Hungama (India) music streaming services.
* Provisioning app for iOS has also been added. The existing Android app has been updated.
* Added error message in addition to error LEDs when the wake word is detected and the device is having trouble processing it.
@@ -20,7 +56,7 @@
* Authentication components have been moved from alexa.h to auth_delegate.h. Refer to the respective files for the changes.
* `media_hal_data.c` is now made common and is not a part of `board_support_pkgs/<board_name>/esp_codec/` anymore.
* This complete logic is now moved to `components/media_hal/`. Please take a look at (media_hal_playback.h)[components/media_hal/].
* Audio board must initialize `media_hal` using `media_hal_init_playback` with config. For example, for (lyrat_board)[board_support_pkgs/lyrat/audio_boar/audio_board_lyrat/audio_board_lyrat.c].
* audio board must initialize `media_hal` using `media_hal_init_playback` with config. For example, for (lyrat_board)[board_support_pkgs/lyrat/audio_boar/audio_board_lyrat/audio_board_lyrat.c].
* APIs for tone have been changed to support the above media_hal change.

**Bug Fixes**
2 changes: 0 additions & 2 deletions LICENSE
@@ -1,5 +1,3 @@
For DSPG's DBMD5P DSP firmware please refer to the license at board_support_pkgs/lyratd_dspg/dspg_fw/docs/license.pdf
For rest of the components please refer to the below license.

ESPRESSIF MIT License

15 changes: 9 additions & 6 deletions README-Getting-Started.md
@@ -24,19 +24,22 @@ $ export IDF_PATH=/path/to/esp-idf
# Set audio_board path. e.g. For LyraT board:
$ export AUDIO_BOARD_PATH=/path/to/esp-va-sdk/board_support_pkgs/lyrat/audio_board/audio_board_lyrat/
$ make -j 8 flash monitor
$ make -j 8 flash monitor [ALEXA_BT=1]
```
NOTE:
> The google_voice_assistant and google_dialogflow applications only support Tap-to-talk whereas the amazon_alexa application supports both, "Alexa" wakeword and tap-to-talk.
* Once you have the firmware flashed, visit the following pages for interacting with the device:
* [Alexa](examples/amazon_alexa/README-Alexa.md)
* [Google Voice Assistant](examples/google_voice_assistant/README-GVA.md)
* [DialogFlow](examples/google_dialogflow/README-Dialogflow.md)
* [Alexa](examples/amazon_alexa/README-Alexa.md)
* [Google Voice Assistant](examples/google_voice_assistant/README-GVA.md)
* [DialogFlow](examples/google_dialogflow/README-Dialogflow.md)

## Enabling BT A2DP Sink support (Only for Alexa)
* To enable the BT A2DP sink feature, pass `ALEXA_BT=1` as a command-line argument to `make`.

# Upgrading from Previous Release
Please skip this section if you are using the SDK for the first time.

## Upgrading to 1.2-RC1
* The new firmware requires newer Android and iOS apps for provisioning and local control. Please update the apps from the respective app stores.

## Upgrading to 1.0-RC2
* The partition table has been changed. If you face any issue, try doing 'make erase_flash' and then flash again.
* Authentication sequence has been changed for amazon_alexa. Refer to app_main.c in the amazon_alexa application.
7 changes: 2 additions & 5 deletions README.md
@@ -2,9 +2,7 @@

The ESP-Voice-Assistant SDK provides an implementation of Amazon's Alexa Voice Service, Google Voice Assistant and Google's conversational interface (aka Dialogflow) for the ESP32 microcontroller. This lets developers run these voice assistants directly on an ESP32. The SDK runs on hardware boards that have a microphone and speaker interfaced with the ESP32.

## License
* For LyratD-DSPG board based on DSPG's DBMD5P DSP please read the licensing terms [here](board_support_pkgs/lyratd_dspg/dspg_fw/docs/license.pdf) for the DSP firmware
* For rest of the ESP-VA-SDK components please refer to the licensing terms [here](LICENSE)
Please refer to [Changelog](CHANGELOG.md) to track release changes and known-issues.

### About the SDK

@@ -35,10 +33,9 @@ The SDK contains pre-built libraries for Amazon Alexa, Google Voice Assistant (G
The SDK supports the following hardware platforms:
* [ESP32-LyraT](https://www.espressif.com/en/products/hardware/esp32-lyrat)
* [ESP32-LyraTD-MSC](https://www.espressif.com/en/products/hardware/esp32-lyratd-msc)
* [ESP32-LyraTD-DSPG](https://www.espressif.com/sites/default/files/documentation/ESP32-LyraTD-DSPG_User_Guide__en.pdf)

The following acoustic front-ends are also supported. Please contact Espressif to enable access to these solutions.
* DSPG DBMD5P [GitHub support is for evaluation purpose only and uses Espressif's WakeWord Engine].
* DSPG DBMD5P
* Intel s1000
* Synaptics CX20921

@@ -12,8 +12,7 @@
#include <esp_log.h>
#include <media_hal.h>
#include <voice_assistant.h>
#include <speech_recognizer.h>
#include <va_mem_utils.h>
#include <esp_audio_mem.h>
#include <va_button.h>
#include <va_nvs_utils.h>
#include <va_dsp.h>
@@ -32,22 +31,32 @@ enum va_dsp_state {
STOPPED,
MUTED,
};
static enum va_dsp_state dsp_state;
static QueueHandle_t cmd_queue;
static uint8_t audio_buf[AUDIO_BUF_SIZE];
static bool va_dsp_booted = false;

static int8_t dsp_mute_en;

static struct va_dsp_data_t {
va_dsp_record_cb_t va_dsp_record_cb;
va_dsp_recognize_cb_t va_dsp_recognize_cb;
enum va_dsp_state dsp_state;
QueueHandle_t cmd_queue;
uint8_t audio_buf[AUDIO_BUF_SIZE];
bool va_dsp_booted;
} va_dsp_data = {
.va_dsp_record_cb = NULL,
.va_dsp_recognize_cb = NULL,
.va_dsp_booted = false,
};

static inline void _va_dsp_stop_streaming()
{
lyrat_stop_capture();
dsp_state = STOPPED;
va_dsp_data.dsp_state = STOPPED;
}

static inline void _va_dsp_start_streaming()
{
lyrat_start_capture();
dsp_state = STREAMING;
va_dsp_data.dsp_state = STREAMING;
}

static inline int _va_dsp_stream_audio(uint8_t *buffer, int size, int wait)
@@ -57,39 +66,39 @@ static inline int _va_dsp_stream_audio(uint8_t *buffer, int size, int wait)

static inline void _va_dsp_mute_mic()
{
if (dsp_state == STREAMING) {
if (va_dsp_data.dsp_state == STREAMING) {
lyrat_stop_capture();
}
lyrat_mic_mute();
dsp_state = MUTED;
va_dsp_data.dsp_state = MUTED;
}

static inline void _va_dsp_unmute_mic()
{
lyrat_mic_unmute();
dsp_state = STOPPED;
va_dsp_data.dsp_state = STOPPED;
}

static void va_dsp_thread(void *arg)
{
struct dsp_event_data event_data;
while(1) {
xQueueReceive(cmd_queue, &event_data, portMAX_DELAY);
switch (dsp_state) {
xQueueReceive(va_dsp_data.cmd_queue, &event_data, portMAX_DELAY);
switch (va_dsp_data.dsp_state) {
case STREAMING:
switch (event_data.event) {
case TAP_TO_TALK:
/* Stop the streaming */
_va_dsp_stop_streaming();
break;
case GET_AUDIO: {
int read_len = _va_dsp_stream_audio(audio_buf, AUDIO_BUF_SIZE, portMAX_DELAY);
int read_len = _va_dsp_stream_audio(va_dsp_data.audio_buf, AUDIO_BUF_SIZE, portMAX_DELAY);
if (read_len > 0) {
speech_recognizer_record(audio_buf, read_len);
va_dsp_data.va_dsp_record_cb(va_dsp_data.audio_buf, read_len);
struct dsp_event_data new_event = {
.event = GET_AUDIO
};
xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
} else {
_va_dsp_stop_streaming();
}
@@ -117,33 +126,33 @@ static void va_dsp_thread(void *arg)
/*XXX: Should we close the stream here?*/
break;
}
if (speech_recognizer_recognize(phrase_length, WAKEWORD) == 0) {
if (va_dsp_data.va_dsp_recognize_cb(phrase_length, WAKEWORD) == 0) {
struct dsp_event_data new_event = {
.event = GET_AUDIO
};
xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
dsp_state = STREAMING;
xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
va_dsp_data.dsp_state = STREAMING;
} else {
printf("%s: Error starting a new dialog..stopping capture\n", TAG);
_va_dsp_stop_streaming();
}
break;
}
case TAP_TO_TALK:
if (speech_recognizer_recognize(0, TAP) == 0) {
if (va_dsp_data.va_dsp_recognize_cb(0, TAP) == 0) {
_va_dsp_start_streaming();
struct dsp_event_data new_event = {
.event = GET_AUDIO
};
xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
}
break;
case START_MIC:
_va_dsp_start_streaming();
struct dsp_event_data new_event = {
.event = GET_AUDIO
};
xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
break;
case MUTE:
_va_dsp_mute_mic();
@@ -174,7 +183,7 @@ static void va_dsp_thread(void *arg)
break;

default:
printf("%s: Unknown state %d with Event %d\n", TAG, dsp_state, event_data.event);
printf("%s: Unknown state %d with Event %d\n", TAG, va_dsp_data.dsp_state, event_data.event);
break;
}
}
@@ -186,7 +195,7 @@ int va_app_speech_stop()
struct dsp_event_data new_event = {
.event = STOP_MIC
};
xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
return 0;
}

@@ -196,20 +205,20 @@ int va_app_speech_start()
struct dsp_event_data new_event = {
.event = START_MIC
};
xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
return 0;
}

int va_dsp_tap_to_talk_start()
{
if (va_dsp_booted == false) {
if (va_dsp_data.va_dsp_booted == false) {
return -1;
}
printf("%s: Sending start for tap to talk command\n", TAG);
struct dsp_event_data new_event = {
.event = TAP_TO_TALK
};
xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
return ESP_OK;
}

@@ -220,10 +229,10 @@ int va_app_playback_starting()

void va_dsp_reset()
{
if (va_dsp_booted == true) {
if (va_dsp_data.va_dsp_booted == true) {
struct dsp_event_data new_event;
new_event.event = MUTE;
xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
}
}

@@ -235,23 +244,26 @@ void va_dsp_mic_mute(bool mute)
else
new_event.event = UNMUTE;
va_nvs_set_i8(DSP_NVS_KEY, mute);
xQueueSend(cmd_queue, &new_event, portMAX_DELAY);
xQueueSend(va_dsp_data.cmd_queue, &new_event, portMAX_DELAY);
}

void va_dsp_init(void)
void va_dsp_init(va_dsp_recognize_cb_t va_dsp_recognize_cb, va_dsp_record_cb_t va_dsp_record_cb)
{
va_dsp_data.va_dsp_record_cb = va_dsp_record_cb;
va_dsp_data.va_dsp_recognize_cb = va_dsp_recognize_cb;

lyrat_init();
TaskHandle_t xHandle = NULL;
StackType_t *task_stack = (StackType_t *)va_mem_alloc(STACK_SIZE, VA_MEM_INTERNAL);
StackType_t *task_stack = (StackType_t *) heap_caps_calloc(1, STACK_SIZE, MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);
static StaticTask_t task_buf;

cmd_queue = xQueueCreate(10, sizeof(struct dsp_event_data));
if (!cmd_queue) {
va_dsp_data.cmd_queue = xQueueCreate(10, sizeof(struct dsp_event_data));
if (!va_dsp_data.cmd_queue) {
ESP_LOGE(TAG, "Error creating va_dsp queue");
return;
}

dsp_state = STOPPED;
va_dsp_data.dsp_state = STOPPED;
if (va_nvs_get_i8(DSP_NVS_KEY, &dsp_mute_en) == ESP_OK) {
if (dsp_mute_en) {
va_dsp_mic_mute(dsp_mute_en);
@@ -266,5 +278,5 @@ void va_dsp_init(void)
}

va_boot_dsp_signal();
va_dsp_booted = true;
va_dsp_data.va_dsp_booted = true;
}
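
In the lines visible in this diff, both callback pointers are initialised to `NULL` and later invoked from the DSP thread without a check, so a `NULL` argument passed to `va_dsp_init` would be dereferenced when the first event arrives. One defensive option, shown here as a minimal sketch and not as the SDK's behaviour, is to route the call through a small guard. The typedef, `struct dsp_data`, `dispatch_record`, `count_bytes`, and `demo_guarded_dispatch` are all hypothetical; the callback signature is an assumption inferred from the `va_dsp_record_cb(audio_buf, read_len)` call above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed callback shape, inferred from the call site in the diff. */
typedef int (*record_cb_t)(uint8_t *data, int len);

/* Hypothetical, trimmed-down version of the consolidated state struct. */
struct dsp_data {
    record_cb_t record_cb;
};

/* Returns the callback's result, or -1 when no callback is registered. */
int dispatch_record(const struct dsp_data *d, uint8_t *buf, int len)
{
    assert(len >= 0);   /* a negative length is a caller bug */
    if (d == NULL || d->record_cb == NULL) {
        return -1;
    }
    return d->record_cb(buf, len);
}

/* Example callback: reports how many bytes it was given. */
static int count_bytes(uint8_t *data, int len)
{
    (void)data;
    return len;
}

/* Exercises both paths; returns 0 when the guard behaves as described. */
int demo_guarded_dispatch(void)
{
    uint8_t buf[8] = {0};
    struct dsp_data unset = { .record_cb = NULL };
    struct dsp_data set = { .record_cb = count_bytes };

    if (dispatch_record(&unset, buf, 8) != -1) {
        return 1;   /* guard failed to reject a missing callback */
    }
    if (dispatch_record(&set, buf, 8) != 8) {
        return 2;   /* registered callback was not invoked correctly */
    }
    return 0;
}
```

An alternative with the same effect is to validate the arguments once in the init function and return early, which keeps the per-chunk path free of extra branches.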