CPU heavy consumption #84

david-oliveira-br · 2022-10-24T04:07:06Z

Hello guys, during some simple local tests I noticed my CPU usage hitting 100% while running the examples (mic or audio file). I spent some hours reviewing the API in an attempt to find a possible bottleneck but haven't found anything relevant yet. Do you have any recommendations or thoughts about it? Thanks in advance.

laar789 · 2023-03-27T21:32:34Z

Hi. I just noticed this, and you are totally right. The moment we run client.start_stream_transcription, one CPU core gets completely occupied, and that will be a problem. @david-oliveira-br, have you found any solution to this?

mikedavidson-evertz · 2023-03-30T15:28:53Z

Hello! I noticed this same issue as well, where my CPU would spike to 100%. Interestingly enough, this only happens when building my code using python:3.9.16-slim-bullseye, ubuntu:20.04 or ubuntu:22.04 Docker images.

When I use python:3.9-alpine for my docker image, the CPU inside the container sits at around 1 - 5%!

Stepping through the transcribe sdk code and monitoring my CPU usage, I found the exact spot where my CPU spikes. The method that triggers the spike is here:

https://github.com/awslabs/amazon-transcribe-streaming-sdk/blob/develop/amazon_transcribe/client.py#L174

response = await self._session_manager.make_request(
    signed_request.uri,
    method=signed_request.method,
    headers=signed_request.headers.as_list(),
    body=signed_request.body,
)

and more specifically, when stepping through that request call above, the spike happens when the stream is activated here:
https://github.com/awslabs/amazon-transcribe-streaming-sdk/blob/develop/amazon_transcribe/httpsession.py#L56

def _set_stream(self, stream: http.HttpClientStream):
    if self._stream is not None:
        raise HTTPException("Stream already set on AwsCrtHttpResponse object")
    self._stream = stream
    self._stream.completion_future.add_done_callback(self._on_complete)
    self._stream.activate()  # <- this call triggers the spike

self._stream.activate() calls into the awscrt lib here and this is where the spike happens:
https://github.com/awslabs/aws-crt-python/blob/main/awscrt/http.py#L286

def activate(self):
    """Begin sending the request.

    The HTTP stream does nothing until this is called. Call activate() when you
    are ready for its callbacks and events to fire.
    """
    _awscrt.http_client_stream_activate(self)  # <-- 100% CPU SPIKE HAPPENS HERE

C code for the awscrt http_client_stream_activate python bindings above:

https://github.com/awslabs/aws-crt-python/blob/58de212a9288e64cdb5f698f782abf4281ba8bf6/source/http_stream.c#L301

Also, this spike occurs before I even begin transcribing any audio! It happens the moment this stream is activated.

Do you guys have any idea what's causing this? Thanks.

mikedavidson-evertz · 2023-03-30T20:33:16Z

Steps to recreate this issue

If you're on Ubuntu 20.04 or 22.04, you can run this code directly using python3.9 and monitor your CPU usage with top.

Note: Change the region in the transcribe client to the region you want to test with.

from amazon_transcribe.client import TranscribeStreamingClient

import asyncio


async def start_stream():
    transcribe_client = TranscribeStreamingClient(region="us-east-1")
    transcribe_stream = await transcribe_client.start_stream_transcription(
        language_code="en-US",
        media_sample_rate_hz=16000,
        media_encoding="pcm",
        language_model_name=None,
        vocabulary_name=None,
        vocab_filter_method=None,
        vocab_filter_name=None,
        show_speaker_label=None,
        enable_channel_identification=None,
        number_of_channels=None,
        enable_partial_results_stabilization=None,
        partial_results_stability=None,
        session_id=None,
    )

    # put a breakpoint here and look at your CPU usage.
    print("put breakpoint here")

    # Idle without burning CPU so we can monitor usage; a bare
    # `while True: pass` would itself pin one core at 100%.
    while True:
        await asyncio.sleep(1)


def main():
    asyncio.run(start_stream())


if __name__ == "__main__":
    main()
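
As a cross-check on top, the process can also measure its own CPU use. The following is a minimal sketch using only the standard library; `cpu_fraction` and `busy_wait` are hypothetical helper names for illustration, not part of the SDK. It relies on `time.process_time()` being process-wide, so CPU burned by native threads in the same process should be counted too.

```python
import time


def cpu_fraction(run, *args):
    """Call run(*args) and return process CPU seconds / wall-clock seconds.

    About 0.0 means the process was idle, about 1.0 means one core was
    fully busy. Values above 1.0 are possible with several busy threads.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    run(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def busy_wait(seconds):
    """Spin until the deadline passes (deliberately wasteful, for comparison)."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


if __name__ == "__main__":
    print(f"sleeping:  {cpu_fraction(time.sleep, 0.5):.0%}")
    print(f"busy loop: {cpu_fraction(busy_wait, 0.5):.0%}")
```

In the repro, the same two clocks could be sampled before and after an idle wait that follows start_stream_transcription; a ratio near 1.0 while the Python code is only sleeping would point at a thread outside the Python code.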

top CPU output: (screenshot not preserved)

Build the python code using docker

If you're not using ubuntu, you can build the code using Docker. Put that python code above in a main.py and the Dockerfile below in the same directory:

Note: Fill in your AWS creds in the dockerfile so it can authenticate with transcribe.

FROM ubuntu:20.04

ENV AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
ENV AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
ENV AWS_SESSION_TOKEN=<AWS_SESSION_TOKEN>
      
ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y software-properties-common && \
    add-apt-repository -y ppa:deadsnakes/ppa

RUN apt-get install --no-install-recommends -y \
    python3.9=3.9.16-1+focal1 \
    python3-pip \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

RUN python3.9 -m pip install amazon-transcribe==0.6.1

WORKDIR /transcribe_high_cpu_test

COPY main.py ./

CMD ["python3.9", "main.py"]

Now build the image and run the image while monitoring your CPU usage.
Run these commands in the same directory as the python/Dockerfile.

docker build -t transcribe-cpu-usage-test .

docker run transcribe-cpu-usage-test:latest

gregcscott · 2024-11-13T18:39:13Z

Hi - curious if there will be any progress on resolving this issue? We're encountering this with our current version of python (3.7.10) and centos (7:0.3)

mark-sinclair · 2024-11-20T22:39:42Z

Experiencing a similar issue: when I activate more than 2 asynchronous streams, they seem to freeze on _awscrt.http_client_stream_activate(self) before anything happens.

I'm on macOS and Python 3.9.

Has anyone found a solution?
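
One way to tell a slow start from a real freeze is to put a timeout around each stream start. The sketch below is an assumption-laden illustration: `start_with_timeout`, `fake_start_ok` and `fake_start_hang` are hypothetical names, and the fake coroutines stand in for the real start_stream_transcription call so the example runs without AWS credentials.

```python
import asyncio


async def start_with_timeout(start, timeout_s):
    """Await start() and give up after timeout_s seconds.

    Returns (result, None) on success or (None, "timeout") if it stalled.
    """
    try:
        result = await asyncio.wait_for(start(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return None, "timeout"
    return result, None


async def fake_start_ok():
    # Stand-in for a stream start that completes quickly.
    await asyncio.sleep(0.01)
    return "stream"


async def fake_start_hang():
    # Stand-in for a stream start that never completes.
    await asyncio.sleep(3600)


async def demo():
    # Start several "streams" concurrently, each with its own timeout.
    return await asyncio.gather(
        start_with_timeout(fake_start_ok, 0.5),
        start_with_timeout(fake_start_hang, 0.5),
        start_with_timeout(fake_start_ok, 0.5),
    )


if __name__ == "__main__":
    for result, error in asyncio.run(demo()):
        print(result, error)
```

Note that the timeout can only fire if the stall happens at an await. If the native call blocks the thread that runs the event loop, no timeout will trigger, which would itself suggest the freeze is inside the synchronous call.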

mark-sinclair · 2024-11-20T23:06:36Z

UPDATE

Seems to work if I upgrade from Python 3.9 to 3.12 -- perhaps because this allows awscrt to move to a newer version in my conda envs.

Python 3.9
awscrt 0.20.12 py39h6e6cb0c_4 conda-forge

Python 3.12
awscrt 0.23.1 py312h6d9cc1d_0 conda-forge
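
Since the behaviour seems to vary with OS, Python and awscrt version, it may help to attach the same facts to every report. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`; `installed_version` and `environment_report` are hypothetical helper names, and the libc field may come back empty on platforms where Python cannot detect it.

```python
import platform
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(dist_names=("amazon-transcribe", "awscrt")):
    """Collect the details that differ between the reports in this thread."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        # libc_ver() returns empty strings when it cannot detect the libc.
        "libc": " ".join(platform.libc_ver()).strip() or "unknown",
    }
    for name in dist_names:
        report[name] = installed_version(name) or "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```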

dc65 · 2024-12-02T03:33:34Z

Same issue on Windows 10. When you run the Quick Start sample program (with a longer .wav file), one core goes to 100%. Using Process Explorer you can see that the offending thread has a Start Address _awscrt.pyd!!PyInit__awscrt+0xce650. I am using awscrt 0.16, Python 3.12.7. I tried a pip upgrade but I get "amazon-transcribe 0.6.2 requires awscrt~=0.16.0". It seems to have upgraded anyway, and the problem remains. There must be some busy looping going on somewhere - the code should mostly be just waiting for AWS to respond not busy looping. Anyone have a solution?
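
For context on the pip message: `~=0.16.0` is a "compatible release" requirement, which under PEP 440 means `>=0.16.0` and `==0.16.*`. The sketch below is a simplified illustration of that rule for plain dotted numeric versions only (no pre-releases or local versions); `parse_release` and `satisfies_compatible_release` are hypothetical names, and real tools use a full version parser instead.

```python
def parse_release(version):
    """Turn '0.16.3' into (0, 16, 3). Only plain numeric releases are handled."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(version, base):
    """Return True if `version` satisfies `~=base`.

    `~=X.Y.Z` means: at least X.Y.Z, and the same X.Y prefix.
    """
    v, b = parse_release(version), parse_release(base)
    if len(b) < 2:
        raise ValueError("~= needs at least two release segments")
    # Pad with zeros so that '0.16' compares equal to '0.16.0'.
    width = max(len(v), len(b))
    v_padded = v + (0,) * (width - len(v))
    b_padded = b + (0,) * (width - len(b))
    prefix = b[:-1]
    return v_padded >= b_padded and v_padded[: len(prefix)] == prefix


if __name__ == "__main__":
    for candidate in ("0.16.3", "0.20.12", "0.23.1"):
        print(candidate, satisfies_compatible_release(candidate, "0.16.0"))
```

Under this rule the awscrt 0.20.12 and 0.23.1 builds mentioned earlier in the thread fall outside `~=0.16.0`, which is consistent with pip reporting a conflict after the upgrade.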
