25 Aug 15:47

XprobeBot

8fa1630

v0.2.2

What's new in 0.2.2 (2023-08-25)

These are the changes in inference v0.2.2.

New features

FEAT: Support Llama-2 PyTorch model by @jiayini1119 in #387
FEAT: code-llama by @UranusSeven in #402

Enhancements

ENH: Update max_tokens to 32k by @Bojun-Feng in #386

Bug fixes

BUG: last token is duplicated by @UranusSeven in #398

Documentation

DOC: readme enhancements by @aresnow1 in #390

Others

fix chatglm params by @Bojun-Feng in #400

Full Changelog: v0.2.1...v0.2.2

Contributors

Bojun-Feng, jiayini1119, and 2 other contributors

23 Aug 11:55

XprobeBot

v0.2.1

d199011

What's new in 0.2.1 (2023-08-23)

These are the changes in inference v0.2.1.

New features

FEAT: Adding support for stream returning completion results by @takatost in #368

Enhancements

ENH: including default context length into model family by @Bojun-Feng in #374

Bug fixes

BUG: PyTorch generate config max_new_tokens not compatible with RESTful API by @Bojun-Feng in #373
BUG: llm class match by @UranusSeven in #383
BUG: return chat model handle by @UranusSeven in #382
BUG: xinference cache dir doesn't exist by @UranusSeven in #380

New Contributors

@takatost made their first contribution in #368

Full Changelog: v0.2.0...v0.2.1

Contributors

takatost, Bojun-Feng, and UranusSeven

19 Aug 15:49

XprobeBot

v0.2.0

7ed7a02

What's new in 0.2.0 (2023-08-19)

These are the changes in inference v0.2.0.

New features

FEAT: Support Starchat-Beta and StarCoderPlus with PyTorch by @RayJi01 in #333
FEAT: Support Ctransformers by @RayJi01 in #289
FEAT: internlm by @UranusSeven in #352
FEAT: Support Vicuna-v1.5 and Vicuna-v1.5-16k by @RayJi01 in #343
FEAT: wizardmath by @UranusSeven in #351
FEAT: support generate/chat/create_embedding/register/unregister/registrations method in cmdline by @pangyoki in #363

Enhancements

ENH: Use Llama 2 chat for inference in LangChain QA demo by @jiayini1119 in #324
ENH: cache from URI by @UranusSeven in #350
ENH: Update System Prompt for llama-2-chat by @Bojun-Feng in #359
ENH: RESTful client supports custom model APIs by @jiayini1119 in #360
BLD: fix readthedocs by @UranusSeven in #340
BLD: fix readthedocs by @UranusSeven in #342

Bug fixes

BUG: Chatglm max_length doesn't work by @Bojun-Feng in #349
BUG: builtin stop_token_ids changes by @UranusSeven in #353
BUG: custom model related bugs by @UranusSeven in #364

Documentation

DOC: framework by @UranusSeven in #332
DOC: models by @UranusSeven in #338
DOC: fix README.md by @UranusSeven in #354
DOC: update builtin models by @UranusSeven in #365

Others

FEAT: Add Model Dashboard by @Bojun-Feng in #334
Revert "FEAT : Add Model Dashboard" by @UranusSeven in #362

Full Changelog: v0.1.3...v0.2.0

Contributors

pangyoki, RayJi01, and 3 other contributors

09 Aug 10:46

XprobeBot

v0.1.3

4d2f61c

What's new in 0.1.3 (2023-08-09)

These are the changes in inference v0.1.3.

Enhancements

ENH: accelerate 4-bit quantization for pytorch model by @pangyoki in #284
ENH: remove chatglmcpp from deps by @UranusSeven in #329
ENH: auto detect device in pytorch model by @pangyoki in #322
ENH: Include model revision by @RayJi01 in #320

Bug fixes

BUG: fix mps and cuda device detection for pytorch model by @pangyoki in #331
BUG: Fix grammar mistake in examples by @Bojun-Feng in #336
BUG: Fix log level on subprocess by @RayJi01 in #335

Documentation

DOC: fix doc warnings by @UranusSeven in #314
DOC: add ja_JP and update po files by @UranusSeven in #315
DOC: custom models by @UranusSeven in #325

Others

OTHER: add chinese podcast demo by @RayJi01 in #237

Full Changelog: v0.1.2...v0.1.3

Contributors

pangyoki, RayJi01, and 2 other contributors

04 Aug 10:36

XprobeBot

v0.1.2

98765f2

What's new in 0.1.2 (2023-08-04)

These are the changes in inference v0.1.2.

New features

FEAT: custom model by @UranusSeven in #290

Enhancements

ENH: select q4_0 as default quantization method for ggmlv3 model in benchmark by @pangyoki in #293
ENH: disable gradio telemetry by @UranusSeven in #299

Bug fixes

BUG: llm_family.json encoding by @UranusSeven in #297
BUG: handle ChatGLM ggml specific case for RESTful API by @jiayini1119 in #309
BUG: handle Qwen update by @UranusSeven in #307

Others

DEMO: LangChain QA System with Xinference LLMs and Milvus Vector DB by @jiayini1119 in #304
Chore: update issue template by @UranusSeven in #300
Chore: remove codecov by @UranusSeven in #308

Full Changelog: v0.1.1...v0.1.2

Contributors

pangyoki, jiayini1119, and UranusSeven

03 Aug 15:47

XprobeBot

v0.1.1

b21d927

What's new in 0.1.1 (2023-08-03)

These are the changes in inference v0.1.1.

New features

FEAT: add opt-125m pytorch model and add ut by @pangyoki in #263
FEAT: support falcon 40b pytorch model by @pangyoki in #278
FEAT: pytorch model embeddings by @jiayini1119 in #282
FEAT: support falcon-instruct 7b and 40b pytorch model by @jiayini1119 in #287
FEAT: support chatglm/chatglm2/chatglm2-32k pytorch model by @pangyoki in #283
FEAT: support qwen 7b by @UranusSeven in #294

Enhancements

ENH: Support Environment Variable by @RayJi01 in #285
REF: split supervisor and worker by @UranusSeven in #279

Bug fixes

BUG: fix import torch error even if user doesn't want to launch torch model by @pangyoki in #274
BUG: empty legacy model dir by @UranusSeven in #276

Tests

TST: add benchmark script by @pangyoki in #281

Documentation

DOC: Update README_ja_JP.md by @eltociear in #269
DOC: add docstring to client methods by @RayJi01 in #247

Full Changelog: v0.1.0...v0.1.1

Contributors

eltociear, pangyoki, and 3 other contributors

28 Jul 13:13

XprobeBot

v0.1.0

37ca23a

What's new in 0.1.0 (2023-07-28)

These are the changes in inference v0.1.0.

New features

FEAT: support fp4 and int8 quantization for pytorch model by @pangyoki in #238
FEAT: support llama-2-chat-70b ggml by @UranusSeven in #257

Enhancements

ENH: skip 4-bit quantization for non-linux or non-cuda local deployment by @UranusSeven in #264
ENH: handle legacy cache by @UranusSeven in #266
REF: model family by @UranusSeven in #251

Bug fixes

BUG: fix restful stop parameters by @RayJi01 in #241
BUG: download integrity hot fix by @RayJi01 in #242
BUG: disable baichuan-chat and baichuan-base on macos by @pangyoki in #250
BUG: delete tqdm_class in snapshot_download by @pangyoki in #258
BUG: ChatGLM Parameter Switch by @Bojun-Feng in #262
BUG: refresh related fields when format changes by @UranusSeven in #265
BUG: Show downloading progress in gradio by @aresnow1 in #267
BUG: LLM json not included by @UranusSeven in #268

Tests

TST: Update ChatGLM Tests by @Bojun-Feng in #259

Documentation

DOC: Update installation part in readme by @aresnow1 in #253
DOC: update readme for pytorch model by @pangyoki in #207

Full Changelog: v0.0.6...v0.1.0

Contributors

pangyoki, RayJi01, and 3 other contributors

24 Jul 07:05

XprobeBot

v0.0.6

b753f98

What's new in 0.0.6 (2023-07-24)

These are the changes in inference v0.0.6.

Enhancements

ENH: download integrity by @RayJi01 in #180

Bug fixes

BUG: baichuan-chat and baichuan-base don't support MacOS by @pangyoki in #202
BUG: fix pytorch model generate bug when stream is True by @pangyoki in #210
BUG: solve the problem that pytorch model still occupies memory after terminating the model by @pangyoki in #219
BUG: fix baichuan-chat configure by @pangyoki in #217
BUG: Update requirements of gradio by @aresnow1 in #216
BUG: chat stopwords by @UranusSeven in #222
BUG: disable vicuna pytorch model by @pangyoki in #225
BUG: Set default embedding to be True by @jiayini1119 in #236

Documentation

DOC: Add notes for metal GPU acceleration by @aresnow1 in #213
DOC: Add Japanese README by @eltociear in #228
DOC: Adding Examples to documentation by @RayJi01 in #196

New Contributors

@eltociear made their first contribution in #228

Full Changelog: v0.0.5...v0.0.6

Contributors

eltociear, pangyoki, and 4 other contributors

19 Jul 11:32

XprobeBot

v0.0.5

c9d42c6

What's new in 0.0.5 (2023-07-19)

These are the changes in inference v0.0.5.

New features

FEAT: support pytorch models by @pangyoki in #157
FEAT: support vicuna-v1.3 33B by @Bojun-Feng in #192
FEAT: support baichuan-chat pytorch model by @pangyoki in #190
FEAT: pytorch model support MPS backend by @pangyoki in #198
FEAT: Embedding by @jiayini1119 in #194
FEAT: LLaMA-2 by @UranusSeven in #203

Enhancements

ENH: Implement RESTful API stream generate by @jiayini1119 in #171
ENH: set default device to mps on MacOS by @pangyoki in #205
ENH: Set default mlock to true and mmap to false by @RayJi01 in #206
ENH: add Gradio ChatInterface chatbot to example by @Bojun-Feng in #208

Bug fixes

BUG: fix pytorch int8 by @pangyoki in #197
BUG: RuntimeError when launching model using kwargs whose value is of type int by @jiayini1119 in #209
BUG: Fix some gradio issues by @aresnow1 in #200

Documentation

DOC: sphinx init by @UranusSeven in #189
DOC: chinese readme by @UranusSeven in #191

Full Changelog: v0.0.4...v0.0.5

Contributors

pangyoki, RayJi01, and 4 other contributors

14 Jul 12:15

XprobeBot

v0.0.4

80003e1

What's new in 0.0.4 (2023-07-14)

These are the changes in inference v0.0.4.

New features

FEAT: implement chat and generate in RESTful client by @jiayini1119 in #161
FEAT: support wizard-v1.1 by @UranusSeven in #183

Bug fixes

BUG: fix example chat by @UranusSeven in #165

Documentation

DOC: add logo; make words more concise by @onesuper in #158

Others

OTHER: AI podcast example by @RayJi01 in #160

Full Changelog: v0.0.3...v0.0.4

Contributors

onesuper, RayJi01, and 2 other contributors

Releases: xorbitsai/inference
