triton-inference-server · matthewkotila · Sep 4, 2024 · Sep 4, 2024
diff --git a/Popular_Models_Guide/Llama2/trtllm_guide.md b/Popular_Models_Guide/Llama2/trtllm_guide.md
@@ -345,6 +345,7 @@ You can read more about Gen-AI Perf [here](https://docs.nvidia.com/deeplearning/
 To use Gen-AI Perf, run the following command in the same Triton docker container:
 ```bash
 genai-perf \
+  profile \
   -m ensemble \
   --service-kind triton \
   --backend tensorrtllm \
@@ -380,4 +381,4 @@ Request throughput (per sec): 0.61
 
 ## References
 
-For more examples feel free to refer to [End to end workflow to run llama.](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md)
+For more examples feel free to refer to [End to end workflow to run llama.](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md)