
intel / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

8,689 stars
1,403 forks
1,484 issues
Languages: Python, Shell, Dockerfile

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing intel/ipex-llm in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind-ai.vercel.app/repo/intel/ipex-llm)

Repository Summary (README)


THIS PROJECT IS ARCHIVED

Intel will not provide or guarantee development of or support for this project, including but not limited to, maintenance, bug fixes, new releases or updates.
Patches to this project are no longer accepted by Intel.
This project has been identified as having known security issues.

💫 Intel® LLM Library for PyTorch*

<p> <b>< English</b> | <a href='./README.zh-CN.md'>中文</a> > </p>

IPEX-LLM is an LLM acceleration library for Intel GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max), NPU and CPU [1].
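As a minimal sketch of the Python (HuggingFace-style) API — assuming a working ipex-llm installation and an Intel GPU exposed as the `xpu` device; the model ID, prompt, and generation settings below are placeholders, not the only supported options:

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in replacement for the HF class

# Load a HuggingFace checkpoint and quantize it to INT4 on the fly (model ID is illustrative).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf",
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model = model.half().to("xpu")                           # move to the Intel GPU
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf",
                                          trust_remote_code=True)

with torch.inference_mode():
    input_ids = tokenizer.encode("What is an iGPU?", return_tensors="pt").to("xpu")
    output = model.generate(input_ids, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```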


Latest Update 🔥

  • [2025/05] You can now run DeepSeek V3/R1 671B and Qwen3MoE 235B models with just 1 or 2 Intel Arc GPUs (such as the A770 or B580) using FlashMoE in ipex-llm.
  • [2025/04] We released ipex-llm 2.2.0, which includes Ollama Portable Zip and llama.cpp Portable Zip.

    ⚠️ Warning (for llama.cpp Portable Zip)
    mmap-based model loading in llama.cpp may leak data via side-channels in multi-tenant or shared-host environments.
    To disable mmap, add:

    --no-mmap
    
  • [2025/04] We added support for PyTorch 2.6 on Intel GPU.
  • [2025/03] We added support for the Gemma3 model in the latest llama.cpp Portable Zip.
  • [2025/03] We can now run DeepSeek-R1-671B-Q4_K_M with 1 or 2 Arc A770 GPUs on Xeon using the latest llama.cpp Portable Zip.
  • [2025/02] We added support for llama.cpp Portable Zip on Intel GPU (both Windows and Linux) and NPU (Windows only).
  • [2025/02] We added support for Ollama Portable Zip to run Ollama directly on Intel GPU for both Windows and Linux (without the need for manual installation).
  • [2025/02] We added support for running vLLM 0.6.6 on Intel Arc GPUs.
  • [2025/01] We added the guide for running ipex-llm on Intel Arc B580 GPU.
  • [2025/01] We added support for running Ollama 0.5.4 on Intel GPU.
  • [2024/12] We added both Python and C++ support for Intel Core Ultra NPU (including 100H, 200V, 200K and 200H series).
<details><summary>More updates</summary> <br/>
  • [2024/11] We added support for running vLLM 0.6.2 on Intel Arc GPUs.
  • [2024/07] We added support for running Microsoft's GraphRAG using local LLM on Intel GPU; see the quickstart guide here.
  • [2024/07] We added extensive support for Large Multimodal Models, including StableDiffusion, Phi-3-Vision, Qwen-VL, and more.
  • [2024/07] We added FP6 support on Intel GPU.
  • [2024/06] We added experimental NPU support for Intel Core Ultra processors; see the examples here.
  • [2024/06] We added extensive support of pipeline parallel inference, which makes it easy to run large-sized LLM using 2 or more Intel GPUs (such as Arc).
  • [2024/06] We added support for running RAGFlow with ipex-llm on Intel GPU.
  • [2024/05] ipex-llm now supports Axolotl for LLM finetuning on Intel GPU; see the quickstart here.
  • [2024/05] You can now easily run ipex-llm inference, serving and finetuning using the Docker images.
  • [2024/05] You can now install ipex-llm on Windows using just "one command".
  • [2024/04] You can now run Open WebUI on Intel GPU using ipex-llm; see the quickstart here.
  • [2024/04] You can now run Llama 3 on Intel GPU using llama.cpp and ollama with ipex-llm; see the quickstart here.
  • [2024/04] ipex-llm now supports Llama 3 on both Intel GPU and CPU.
  • [2024/04] ipex-llm now provides C++ interface, which can be used as an accelerated backend for running llama.cpp and ollama on Intel GPU.
  • [2024/03] bigdl-llm has now become ipex-llm (see the migration guide here); you may find the original BigDL project here.
  • [2024/02] ipex-llm now supports directly loading model from ModelScope (魔搭).
  • [2024/02] ipex-llm added initial INT2 support (based on llama.cpp IQ2 mechanism), which makes it possible to run large-sized LLM (e.g., Mixtral-8x7B) on Intel GPU with 16GB VRAM.
  • [2024/02] Users can now use ipex-llm through Text-Generation-WebUI GUI.
  • [2024/02] ipex-llm now supports Self-Speculative Decoding, which in practice brings ~30% speedup for FP16 and BF16 inference latency on Intel GPU and CPU respectively.
  • [2024/02] ipex-llm now supports a comprehensive list of LLM finetuning on Intel GPU (including LoRA, QLoRA, DPO, QA-LoRA and ReLoRA).
  • [2024/01] Using ipex-llm QLoRA, we managed to finetune LLaMA2-7B in 21 minutes and LLaMA2-70B in 3.14 hours on 8 Intel Max 1550 GPUs for Stanford-Alpaca (see the blog here).
  • [2023/12] ipex-llm now supports ReLoRA (see "ReLoRA: High-Rank Training Through Low-Rank Updates").
  • [2023/12] ipex-llm now supports Mixtral-8x7B on both Intel GPU and CPU.
  • [2023/12] ipex-llm now supports QA-LoRA (see "QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models").
  • [2023/12] ipex-llm now supports FP8 and FP4 inference on Intel GPU.
  • [2023/11] Initial support for directly loading GGUF, AWQ and GPTQ models into ipex-llm is available.
  • [2023/11] ipex-llm now supports vLLM continuous batching on both Intel GPU and CPU.
  • [2023/10] ipex-llm now supports QLoRA finetuning on both Intel GPU and CPU.
  • [2023/10] ipex-llm now supports FastChat serving on both Intel CPU and GPU.
  • [2023/09] ipex-llm now supports Intel GPU (including iGPU, Arc, Flex and MAX).
  • [2023/09] ipex-llm tutorial is released.
</details>

ipex-llm Demo

See demos of running local LLMs on Intel Core Ultra iGPU, Intel Core Ultra NPU, single-card Arc GPU, or multi-card Arc GPUs using ipex-llm below.

<table width="100%"> <tr> <td align="center" colspan="1"><strong>Intel Core Ultra iGPU</strong></td> <td align="center" colspan="1"><strong>Intel Core Ultra NPU</strong></td> <td align="center" colspan="1"><strong>2-Card Intel Arc dGPUs</strong></td> <td align="center" colspan="1"><strong>Intel Xeon + Arc dGPU</strong></td> </tr> <tr> <td> <a href="https://llm-assets.readthedocs.io/en/latest/_images/mtl_mistral-7B_q4_k_m_ollama.gif" target="_blank"> <img src="https://llm-assets.readthedocs.io/en/latest/_images/mtl_mistral-7B_q4_k_m_ollama.gif" width=100%; /> </a> </td> <td> <a href="https://llm-assets.readthedocs.io/en/latest/_images/npu_llama3.2-3B.gif" target="_blank"> <img src="https://llm-assets.readthedocs.io/en/latest/_images/npu_llama3.2-3B.gif" width=100%; /> </a> </td> <td> <a href="https://llm-assets.readthedocs.io/en/latest/_images/2arc_DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gif" target="_blank"> <img src="https://llm-assets.readthedocs.io/en/latest/_images/2arc_DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gif" width=100%; /> </a> </td> <td> <a href="https://llm-assets.readthedocs.io/en/latest/_images/FlashMoE-Qwen3-235B.gif" target="_blank"> <img src="https://llm-assets.readthedocs.io/en/latest/_images/FlashMoE-Qwen3-235B.gif" width=100%; /> </a> </td> </tr> <tr> <td align="center" width="25%"> <a href="docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md">Ollama <br> (Mistral-7B, Q4_K) </a> </td> <td align="center" width="25%"> <a href="docs/mddocs/Quickstart/npu_quickstart.md">HuggingFace <br> (Llama3.2-3B, SYM_INT4)</a> </td> <td align="center" width="25%"> <a href="docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md">llama.cpp <br> (DeepSeek-R1-Distill-Qwen-32B, Q4_K)</a> </td> <td align="center" width="25%"> <a href="docs/mddocs/Quickstart/flashmoe_quickstart.md">FlashMoE <br> (Qwen3MoE-235B, Q4_K) </a> </td> </tr> </table> <!-- See the demo of running [*Text-Generation-WebUI*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html), [*local RAG using LangChain-Chatchat*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/chatchat_quickstart.html), [*llama.cpp*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llamacpp_portable_zip_gpu_quickstart.md) and [*Ollama*](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/ollama_quickstart.html) *(on either Intel Core Ultra laptop or Arc GPU)* with `ipex-llm` below. 
<table width="100%"> <tr> <td align="center" colspan="2"><strong>Intel Core Ultra Laptop</strong></td> <td align="center" colspan="2"><strong>Intel Arc GPU</strong></td> </tr> <tr> <td> <video src="https://private-user-images.githubusercontent.com/1931082/319632616-895d56cd-e74b-4da1-b4d1-2157df341424.mp4?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTIyNDE4MjUsIm5iZiI6MTcxMjI0MTUyNSwicGF0aCI6Ii8xOTMxMDgyLzMxOTYzMjYxNi04OTVkNTZjZC1lNzRiLTRkYTEtYjRkMS0yMTU3ZGYzNDE0MjQubXA0P1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDQwNCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA0MDRUMTQzODQ1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9Y2JmYzkxYWFhMGYyN2MxYTkxOTI3MGQ2NTFkZDY4ZjFjYjg3NmZhY2VkMzVhZTU2OGEyYjhjNzI5YTFhOGNhNSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.Ga8mmCAO62DFCNzU1fdoyC_4MzqhDHzjZedzmi_2L-I" width=100% controls /> </td> <td> <video src="https://private-user-images.githubusercontent.com/1931082/319625142-68da379e-59c6-4308-88e8-c17e40baba7b.mp4?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTIyNDA2MzQsIm5iZiI6MTcxMjI0MDMzNCwicGF0aCI6Ii8xOTMxMDgyLzMxOTYyNTE0Mi02OGRhMzc5ZS01OWM2LTQzMDgtODhlOC1jMTdlNDBiYWJhN2IubXA0P1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDQwNCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA0MDRUMTQxODU0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9NzYwOWI4MmQxZjFhMjJlNGNhZTA3MGUyZDE4OTA0N2Q2YjQ4NTcwN2M2MTY1ODAwZmE3OTIzOWI0Y2U3YzYwNyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.g0bYAj3J8IJci7pLzoJI6QDalyzXzMYtQkDY7aqZMc4" width=100% controls /> </td> <td> <video src="https://private-user-images.githubusercontent.com/1931082/319625685-ff13b099-bcda-48f1-b11b-05421e7d386d.mp4?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTIyNDA4MTcsIm5iZiI6MTcxMjI0MDUxNywicGF0aCI6Ii8xOTMxMDgyLzMxOTYyNTY4NS1mZjEzYjA5OS1iY2RhLTQ4ZjEtYjExYi0wNTQyMWU3ZDM4NmQubXA0P1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDQwNCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA0MDRUMTQyMTU3WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MWQ3MmEwZGRkNGVlY2RkNjAzMTliODM1NDEzODU3NWQ0ZGE4MjYyOGEyZjdkMjBiZjI0MjllYTU4ODQ4YzM0NCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.OFxex8Yj6WyqJKMi6B1Q19KkmbYqYCg1rD49wUwxdXQ" width=100% controls /> </td> <td> <video 
src="https://private-user-images.githubusercontent.com/1931082/325939544-2fc0ad5e-9ac7-4f95-b7b9-7885a8738443.mp4?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTQxMjYwODAsIm5iZiI6MTcxNDEyNTc4MCwicGF0aCI6Ii8xOTMxMDgyLzMyNTkzOTU0NC0yZmMwYWQ1ZS05YWM3LTRmOTUtYjdiOS03ODg1YTg3Mzg0NDMubXA0P1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDQyNiUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA0MjZUMTAwMzAwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9YjZlZDE4YjFjZWJkMzQ4NmY3ZjNlMmRiYWUzMDYxMTI3YzcxYjRiYjgwNmE2NDliMjMwOTI0NWJhMDQ1NDY1YyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.WfA2qwr8EP9W7a3oOYcKqaqsEKDlAkF254zbmn9dVv0" width=100% controls /> </td> </tr> <tr> <td align="center" width="25%"> <a href="https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html">Text-Generation-WebUI</a> </td> <td align="center" width="25%"> <a href="https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/chatchat_quickstart.html">Local RAG using LangChain-Chatchat</a> </td> <td align="center" width="25%"> <a href="https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llamacpp_portable_zip_gpu_quickstart.md">llama.cpp</a> </td> <td align="center" width="25%"> <a href="https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/ollama_portable_zip_quickstart.md">Ollama</a> </td> </tr> </table> -->

ipex-llm Performance

See the Token Generation Speed on Intel Core Ultra and Intel Arc GPU below [1] (and refer to [2][3][4] for more details).

<table width="100%"> <tr> <td> <a href="https://llm-assets.readthedocs.io/en/latest/_images/MTL_perf.jpg" target="_blank"> <img src="https://llm-assets.readthedocs.io/en/latest/_images/MTL_perf.jpg" width=100%; /> </a> </td> <td> <a href="https://llm-assets.readthedocs.io/en/latest/_images/Arc_perf.jpg" target="_blank"> <img src="https://llm-assets.readthedocs.io/en/latest/_images/Arc_perf.jpg" width=100%; /> </a> </td> </tr> </table>

You may follow the Benchmarking Guide to run the ipex-llm performance benchmark yourself.
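For a quick sanity check before running the full benchmark suite, a rough decode-throughput number can be measured by hand. The sketch below assumes a model and tokenizer loaded as in the earlier sketch, and it ignores the warm-up runs and prefill/decode split that the Benchmarking Guide accounts for:

```python
import time
import torch

def rough_tokens_per_second(model, tokenizer, prompt, max_new_tokens=128):
    """Crude decode throughput: generated tokens divided by wall-clock generate() time."""
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
    with torch.inference_mode():
        start = time.perf_counter()
        output = model.generate(input_ids, max_new_tokens=max_new_tokens)
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            torch.xpu.synchronize()  # make sure the GPU has finished before stopping the clock
        elapsed = time.perf_counter() - start
    return (output.shape[1] - input_ids.shape[1]) / elapsed
```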

Model Accuracy

Please see the perplexity results below (tested on the Wikitext dataset using the script here).

| Perplexity | sym_int4 | q4_k | fp6 | fp8_e5m2 | fp8_e4m3 | fp16 |
|---|---|---|---|---|---|---|
| Llama-2-7B-chat-hf | 6.364 | 6.218 | 6.092 | 6.180 | 6.098 | 6.096 |
| Mistral-7B-Instruct-v0.2 | 5.365 | 5.320 | 5.270 | 5.273 | 5.246 | 5.244 |
| Baichuan2-7B-chat | 6.734 | 6.727 | 6.527 | 6.539 | 6.488 | 6.508 |
| Qwen1.5-7B-chat | 8.865 | 8.816 | 8.557 | 8.846 | 8.530 | 8.607 |
| Llama-3.1-8B-Instruct | 6.705 | 6.566 | 6.338 | 6.383 | 6.325 | 6.267 |
| gemma-2-9b-it | 7.541 | 7.412 | 7.269 | 7.380 | 7.268 | 7.270 |
| Baichuan2-13B-Chat | 6.313 | 6.160 | 6.070 | 6.145 | 6.086 | 6.031 |
| Llama-2-13b-chat-hf | 5.449 | 5.422 | 5.341 | 5.384 | 5.332 | 5.329 |
| Qwen1.5-14B-Chat | 7.529 | 7.520 | 7.367 | 7.504 | 7.297 | 7.334 |
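For reference, numbers like these come from a standard sliding-window perplexity computation over the test split. The repository ships its own script for this; the illustrative sketch below (using HuggingFace `transformers` and `datasets`, not the repo's exact code) shows roughly how such values are derived:

```python
import torch
from datasets import load_dataset

def wikitext2_perplexity(model, tokenizer, max_length=2048, stride=512):
    """Sliding-window perplexity on WikiText-2 (illustrative sketch only)."""
    text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    seq_len = input_ids.size(1)
    nll_sum, n_tokens, prev_end = 0.0, 0, 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        trg_len = end - prev_end                       # tokens newly scored in this window
        ids = input_ids[:, begin:end].to(model.device)
        targets = ids.clone()
        targets[:, :-trg_len] = -100                   # ignore the overlapping context tokens
        with torch.inference_mode():
            loss = model(ids, labels=targets).loss     # mean NLL over scored tokens
        n_scored = (targets != -100).sum().item() - 1  # labels are shifted by one inside the model
        nll_sum += loss.item() * n_scored
        n_tokens += n_scored
        prev_end = end
        if end == seq_len:
            break
    return float(torch.exp(torch.tensor(nll_sum / n_tokens)))
```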

ipex-llm Quickstart

Use

  • Ollama: running Ollama on Intel GPU without the need for manual installation (see the Python sketch after this list)
  • llama.cpp: running llama.cpp on Intel GPU without the need for manual installation
  • Arc B580: running ipex-llm on Intel Arc B580 GPU for Ollama, llama.cpp, PyTorch, HuggingFace, etc.
  • NPU: running ipex-llm on Intel NPU via the Python, C++, or llama.cpp API
  • PyTorch/HuggingFace: running PyTorch, HuggingFace, LangChain, LlamaIndex, etc. (using Python interface of ipex-llm) on Intel GPU for Windows and Linux
  • vLLM: running ipex-llm in vLLM on both Intel GPU and CPU
  • FastChat: running ipex-llm in FastChat serving on both Intel GPU and CPU
  • Serving on multiple Intel GPUs: running ipex-llm serving on multiple Intel GPUs by leveraging DeepSpeed AutoTP and FastAPI
  • Text-Generation-WebUI: running ipex-llm in oobabooga WebUI
  • Axolotl: running ipex-llm in Axolotl for LLM finetuning
  • Benchmarking: running (latency and throughput) benchmarks for ipex-llm on Intel CPU and GPU
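Once one of the portable Ollama builds above is running, any standard Ollama client can talk to it. For example, a minimal Python sketch against Ollama's default local endpoint (the model tag is a placeholder and must already be pulled):

```python
import requests

# Assumes an ipex-llm Ollama instance is listening on Ollama's default port 11434.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen2.5:7b", "prompt": "Why is the sky blue?", "stream": False},
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```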

Docker

  • GPU Inference in C++: running llama.cpp, ollama, etc., with ipex-llm on Intel GPU
  • GPU Inference in Python: running HuggingFace transformers, LangChain, LlamaIndex, ModelScope, etc. with ipex-llm on Intel GPU
  • vLLM on GPU: running vLLM serving with ipex-llm on Intel GPU
  • vLLM on CPU: running vLLM serving with ipex-llm on Intel CPU
  • FastChat on GPU: running FastChat serving with ipex-llm on Intel GPU
  • VSCode on GPU: running and developing ipex-llm applications in Python using VSCode on Intel GPU

Applications

  • GraphRAG: running Microsoft's GraphRAG using local LLM with ipex-llm
  • RAGFlow: running RAGFlow (an open-source RAG engine) with ipex-llm
  • LangChain-Chatchat: running LangChain-Chatchat (Knowledge Base QA using RAG pipeline) with ipex-llm
  • Coding copilot: running Continue (coding copilot in VSCode) with ipex-llm
  • Open WebUI: running Open WebUI with ipex-llm
  • PrivateGPT: running PrivateGPT to interact with documents with ipex-llm
  • Dify platform: running ipex-llm in Dify (a production-ready LLM app development platform)

Install

Code Examples

API Doc

FAQ

Verified Models

Over 70 models have been optimized/verified on ipex-llm, including LLaMA/LLaMA2, Mistral, Mixtral, Gemma, LLaVA, Whisper, ChatGLM2/ChatGLM3, Baichuan/Baichuan2, Qwen/Qwen-1.5, InternLM and more; see the list below.

| Model | CPU Example | GPU Example | NPU Example |
|---|---|---|---|
| LLaMA | link1, link2 | link | |
| LLaMA 2 | link1, link2 | link | Python link, C++ link |
| LLaMA 3 | link | link | Python link, C++ link |
| LLaMA 3.1 | link | link | |
| LLaMA 3.2 | | link | Python link, C++ link |
| LLaMA 3.2-Vision | | link | |
| ChatGLM | link | | |
| ChatGLM2 | link | link | |
| ChatGLM3 | link | link | |
| GLM-4 | link | link | |
| GLM-4V | link | link | |
| GLM-Edge | | link | Python link |
| GLM-Edge-V | | link | |
| Mistral | link | link | |
| Mixtral | link | link | |
| Falcon | link | link | |
| MPT | link | link | |
| Dolly-v1 | link | link | |
| Dolly-v2 | link | link | |
| Replit Code | link | link | |
| RedPajama | link1, link2 | | |
| Phoenix | link1, link2 | | |
| StarCoder | link1, link2 | link | |
| Baichuan | link | link | |
| Baichuan2 | link | link | Python link |
| InternLM | link | link | |
| InternVL2 | | link | |
| Qwen | link | link | |
| Qwen1.5 | link | link | |
| Qwen2 | link | link | Python link, C++ link |
| Qwen2.5 | | link | Python link, C++ link |
| Qwen-VL | link | link | |
| Qwen2-VL | | link | |
| Qwen2-Audio | | link | |
| Aquila | link | link | |
| Aquila2 | link | link | |
| MOSS | link | | |
| Whisper | link | link | |
| Phi-1_5 | link | link | |
| Flan-t5 | link | link | |
| LLaVA | link | link | |
| CodeLlama | link | link | |
| Skywork | link | | |
| InternLM-XComposer | link | | |
| WizardCoder-Python | link | | |
| CodeShell | link | | |
| Fuyu | link | | |
| Distil-Whisper | link | link | |
| Yi | link | link | |
| BlueLM | link | link | |
| Mamba | link | link | |
| SOLAR | link | link | |
| Phixtral | link | link | |
| InternLM2 | link | link | |
| RWKV4 | link | | |
| RWKV5 | link | | |
| Bark | link | link | |
| SpeechT5 | | link | |
| DeepSeek-MoE | link | | |
| Ziya-Coding-34B-v1.0 | link | | |
| Phi-2 | link | link | |
| Phi-3 | link | link | |
| Phi-3-vision | link | link | |
| Yuan2 | link | link | |
| Gemma | link | link | |
| Gemma2 | | link | |
| DeciLM-7B | link | link | |
| Deepseek | link | link | |
| StableLM | link | link | |
| CodeGemma | link | link | |
| Command-R/cohere | link | link | |
| CodeGeeX2 | link | link | |
| MiniCPM | link | link | Python link, C++ link |
| MiniCPM3 | | link | |
| MiniCPM-V | | link | |
| MiniCPM-V-2 | link | link | |
| MiniCPM-Llama3-V-2_5 | | link | Python link |
| MiniCPM-V-2_6 | link | link | Python link |
| MiniCPM-o-2_6 | | link | |
| Janus-Pro | | link | |
| Moonlight | | link | |
| StableDiffusion | | link | |
| Bce-Embedding-Base-V1 | | | Python link |
| Speech_Paraformer-Large | | | Python link |
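For the models listed above, a common pattern is to quantize a checkpoint once and reuse the low-bit weights across runs. A minimal sketch, assuming the `save_low_bit`/`load_low_bit` helpers described in the ipex-llm Python API docs (the model ID and output directory are placeholders):

```python
from ipex_llm.transformers import AutoModelForCausalLM

# One-time conversion: quantize the original checkpoint to INT4 and persist it.
model = AutoModelForCausalLM.from_pretrained("MODEL_ID_OR_LOCAL_PATH",
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model.save_low_bit("./model-int4")

# Subsequent runs: load the already-quantized weights directly (no FP16 download/convert step).
model = AutoModelForCausalLM.load_low_bit("./model-int4", trust_remote_code=True)
model = model.half().to("xpu")  # move to an Intel GPU exposed as the `xpu` device
```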

Get Support

Footnotes

  1. Performance varies by use, configuration and other factors. ipex-llm may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.