Lab 2 - Run Prompt flow with Phi-3-mini on Apple Silicon

What's Prompt flow

Prompt flow is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation and prototyping through testing and evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.

With prompt flow, you will be able to:

  • Create flows that link LLMs, prompts, Python code and other tools together in an executable workflow.

  • Debug and iterate on your flows, especially the interaction with LLMs, with ease.

  • Evaluate your flows and calculate quality and performance metrics on larger datasets.

  • Integrate testing and evaluation into your CI/CD system to ensure the quality of your flow.

  • Deploy your flows to the serving platform of your choice, or integrate them into your app's code base easily.

  • (Optional but highly recommended) Collaborate with your team by leveraging the cloud version of Prompt flow in Azure AI.

Building code-generation flows on Apple Silicon

Note: If you have not completed the environment installation, please visit Lab 0 - Installations.

  1. Open the Prompt flow extension in Visual Studio Code and create an empty flow project.

(Screenshot: creating an empty flow project)

  2. Add input and output parameters, and add a Python code node to the flow.

(Screenshot: the flow with inputs, outputs, and a Python node)

You can refer to this structure (flow.dag.yaml) to construct your flow:

inputs:
  prompt:
    type: string
    default: Write python code for Fibonacci series. Please use markdown as output
outputs:
  result:
    type: string
    reference: ${gen_code_by_phi3.output}
nodes:
- name: gen_code_by_phi3
  type: python
  source:
    type: code
    path: gen_code_by_phi3.py
  inputs:
    prompt: ${inputs.prompt}

  3. Quantize Phi-3-mini

We want the SLM to run well on local devices, so we generally quantize the model or reduce its precision (INT4, FP16 rather than FP32):

python -m mlx_lm.convert --hf-path microsoft/Phi-3-mini-4k-instruct

Note: the default output folder is mlx_model
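The command above produces FP16 weights by default. As a sketch, you can also quantize to INT4 and choose the output folder explicitly with mlx_lm.convert's -q and --mlx-path options; the folder name ./mlx_model_phi3_mini here is simply the path that the Python code below assumes:

python -m mlx_lm.convert --hf-path microsoft/Phi-3-mini-4k-instruct -q --mlx-path ./mlx_model_phi3_mini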

  4. Add code in gen_code_by_phi3.py (the script referenced by flow.dag.yaml above)
from promptflow import tool

from mlx_lm import load, generate


# The inputs section will change based on the arguments of the tool function after you save the code.
# Adding types to the arguments and the return value helps the system show the types properly.
# Please update the function name/signature as needed.
@tool
def my_python_tool(prompt: str) -> str:

    # Path to the converted MLX model; it must match the output folder of mlx_lm.convert.
    model_id = './mlx_model_phi3_mini'

    # Note: loading the model on every call keeps the example simple, but it is slow;
    # in a real flow you would load it once and reuse it.
    model, tokenizer = load(model_id)

    # Wrap the user prompt in the Phi-3 chat template, e.g.:
    # <|user|>\nWrite python code for Fibonacci series. Please use markdown as output<|end|>\n<|assistant|>
    response = generate(
        model,
        tokenizer,
        prompt="<|user|>\n" + prompt + "<|end|>\n<|assistant|>",
        max_tokens=2048,
        verbose=True,
    )

    return response
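Before wiring the node into a served flow, you can sanity-check the function directly. A minimal sketch, assuming the model has already been converted into ./mlx_model_phi3_mini:

# Quick local smoke test for the tool function above (run it from the flow folder).
if __name__ == "__main__":
    print(my_python_tool("Write python code for Fibonacci series. Please use markdown as output"))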
  5. You can test the flow from Debug or Run to check whether the generated code is okay; a CLI alternative is sketched after the screenshot below.

(Screenshot: running the flow in Visual Studio Code)
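If you prefer the terminal, a minimal sketch using the Prompt flow CLI (run from the flow folder):

pf flow test --flow . --inputs prompt="Write python code for Fibonacci series. Please use markdown as output"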

  6. Run the flow as a development API in the terminal

pf flow serve --source ./ --port 8080 --host localhost   

You can test it with Postman or Thunder Client.
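As a sketch, you can also call the served flow from Python; this assumes Prompt flow's default /score endpoint and the result output defined in flow.dag.yaml above:

import requests

# Post a prompt to the locally served flow and print the generated code.
resp = requests.post(
    "http://localhost:8080/score",
    json={"prompt": "Write python code for Fibonacci series. Please use markdown as output"},
)
resp.raise_for_status()
print(resp.json()["result"])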

Note

  1. The first run takes a long time. It is recommended to download the Phi-3 model with the Hugging Face CLI beforehand (see the sketch after this list).

  2. Considering the limited compute and memory of local devices, it is recommended to use Phi-3-mini-4k-instruct.

  3. The INT4 conversion is done with MLX's quantization. If you re-run the conversion, delete the previous output folder first.
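A sketch of the model download mentioned in note 1, assuming the huggingface_hub CLI is installed (the files land in the local Hugging Face cache):

huggingface-cli download microsoft/Phi-3-mini-4k-instruct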

Resources

  1. Learn Prompt flow: https://microsoft.github.io/promptflow/

  2. Learn Intel NPU Acceleration: https://github.com/intel/intel-npu-acceleration-library

  3. Sample code: download the Local NPU Agent Sample Code