In the context of Phi-3-mini, inference refers to the process of using the model to make predictions or generate outputs from input data. The sections below describe Phi-3-mini and its inference options in more detail.

Phi-3-mini is part of the Phi-3 series of models released by Microsoft. These models are designed to redefine what's possible with Small Language Models (SLMs).

Here are some key points about Phi-3-mini and its inference capabilities:

Phi-3-mini Overview:

  • Phi-3-mini has a parameter size of 3.8 billion.
  • It can run not only on traditional computing devices but also on edge devices such as mobile devices and IoT devices.
  • The release of Phi-3-mini enables individuals and enterprises to deploy SLMs on different hardware devices, especially in resource-constrained environments.
  • It is available in multiple model formats, including the traditional PyTorch format, a quantized version in the GGUF format, and an ONNX-based quantized version (see the loading sketch below).
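
For the PyTorch route, a minimal sketch using the Hugging Face transformers library is shown below; the model id microsoft/Phi-3-mini-4k-instruct, the dtype, and the generation settings are illustrative assumptions rather than fixed requirements.

```python
# Minimal sketch: load Phi-3-mini in its PyTorch format via Hugging Face transformers.
# The model id and generation settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision to fit smaller GPUs
    device_map="auto",           # place layers on available devices
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain what inference means in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
# Strip the prompt tokens and print only the generated continuation.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```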

Accessing Phi-3-mini:

To access Phi-3-mini, you can use Semantic Kernel in a Copilot application. Semantic Kernel is generally compatible with Azure OpenAI Service, open-source models on Hugging Face, and local models. You can also use Ollama or LlamaEdge to call quantized models. Ollama allows individual users to call different quantized models, while LlamaEdge provides cross-platform availability for GGUF models.
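
As one concrete way to call a locally served model, the sketch below posts a prompt to Ollama's local REST endpoint; it assumes the model has already been pulled as phi3 and that the server is listening on Ollama's default port 11434.

```python
# Minimal sketch: call a locally served Phi-3 model through Ollama's REST API.
# Assumes `ollama run phi3` (or `ollama pull phi3`) has already been executed
# and the server is listening on the default port 11434.
import json
import urllib.request

payload = {
    "model": "phi3",
    "prompt": "Write a one-sentence summary of what a Small Language Model is.",
    "stream": False,  # return the full response as a single JSON object
}
request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read())["response"])
```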

Quantized Models:

Many users prefer quantized models for local inference. For example, you can run `ollama run phi3` directly, or configure the model offline with a Modelfile that specifies the GGUF file path and the prompt format, as shown below.
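
A minimal Modelfile might look like the following sketch; the GGUF file name and parameter values are placeholders, and the template follows the Phi-3 chat format.

```
# Modelfile sketch: point Ollama at a local GGUF file and set the Phi-3 prompt format.
# The file name below is a placeholder for your downloaded checkpoint.
FROM ./Phi-3-mini-4k-instruct-q4.gguf

TEMPLATE """<|user|>
{{ .Prompt }}<|end|>
<|assistant|>"""

PARAMETER stop "<|end|>"
PARAMETER num_ctx 4096
```

You can then register and run the local model with `ollama create phi3-local -f Modelfile` followed by `ollama run phi3-local`.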

Generative AI Possibilities:

SLMs like Phi-3-mini open up new possibilities for generative AI. Inference is just the first step; these models can support a variety of tasks in resource-constrained, latency-bound, and cost-constrained scenarios.

Unlocking Generative AI with Phi-3-mini: A Guide to Inference and Deployment

Learn how to use Semantic Kernel, Ollama/LlamaEdge, and ONNX Runtime to access and run inference with Phi-3-mini models, and explore the possibilities of generative AI in various application scenarios.
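
For the ONNX Runtime route, a rough sketch with the onnxruntime-genai package is shown below; the local model folder is a placeholder, and the API shape follows the early releases of the package, so newer versions may differ slightly.

```python
# Minimal sketch: run the ONNX-quantized Phi-3-mini with the onnxruntime-genai package.
# The model folder path is a placeholder, and the calls below follow the early
# onnxruntime-genai releases; newer versions may expose a slightly different API.
import onnxruntime_genai as og

model = og.Model("./Phi-3-mini-4k-instruct-onnx/cpu-int4")  # assumed local path
tokenizer = og.Tokenizer(model)

# Phi-3 chat prompt format.
prompt = "<|user|>\nWhat is Phi-3-mini?<|end|>\n<|assistant|>"
input_tokens = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = input_tokens

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```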

In summary, Phi-3-mini supports inference through all of these routes, allowing developers to pick the model format that fits their hardware and to apply generative AI in a wide range of application scenarios.