DOC: Add doc for ocr (xorbitsai#2492)
Co-authored-by: qinxuye <[email protected]>
codingl2k1 and qinxuye authored Oct 30, 2024
1 parent 9a5aeb0 commit bd599b2
Showing 5 changed files with 83 additions and 12 deletions.
40 changes: 28 additions & 12 deletions doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/image.po
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-08-09 19:13+0800\n"
"POT-Creation-Date: 2024-10-30 07:49+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -17,7 +17,7 @@ msgstr ""
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.14.0\n"
"Generated-By: Babel 2.16.0\n"

#: ../../source/models/model_abilities/image.rst:5
msgid "Images"
@@ -143,34 +143,39 @@ msgid ""
" move a model component onto the GPU when it needs to be executed, while "
"keeping the remaining components on the CPU."
msgstr ""
"``--cpu_offload True``:指定 ``True`` 会在推理过程中将模型的组件卸载到 CPU 上以节省内存,"
"这会导致推理延迟略有增加。模型卸载仅会在需要执行时将模型组件移动到 GPU 上,同时保持其余组件在 CPU 上"
"``--cpu_offload True``:指定 ``True`` 会在推理过程中将模型的组件卸载到 "
"CPU 上以节省内存,这会导致推理延迟略有增加。模型卸载仅会在需要执行时将"
"模型组件移动到 GPU 上,同时保持其余组件在 CPU 上"

#: ../../source/models/model_abilities/image.rst:117
msgid ""
"``--quantize_text_encoder <text encoder layer>``: We leveraged the "
"``bitsandbytes`` library to load and quantize the T5-XXL text encoder to "
"8-bit precision. This allows you to keep using all text encoders "
"while only slightly impacting performance."
msgstr "``--quantize_text_encoder <text encoder layer>``:我们利用 ``bitsandbytes`` 库"
"加载并量化 T5-XXL 文本编码器至8位精度。这使得你能够在仅轻微影响性能的情况下继续使用全部文本编码器。"
"8-bit precision. This allows you to keep using all text encoders while "
"only slightly impacting performance."
msgstr ""
"``--quantize_text_encoder <text encoder layer>``:我们利用 ``bitsandbytes"
"`` 库加载并量化 T5-XXL 文本编码器至8位精度。这使得你能够在仅轻微影响性能"
"的情况下继续使用全部文本编码器。"

#: ../../source/models/model_abilities/image.rst:120
msgid ""
"``--text_encoder_3 None``, for sd3-medium, removing the memory-intensive "
"4.7B parameter T5-XXL text encoder during inference can significantly "
"decrease the memory requirements with only a slight loss in performance."
msgstr ""
"``--text_encoder_3 None``,对于 sd3-medium,"
"移除在推理过程中内存密集型的47亿参数T5-XXL文本编码器可以显著降低内存需求,而仅造成性能上的轻微损失。"
"``--text_encoder_3 None``,对于 sd3-medium,移除在推理过程中内存密集型的"
"47亿参数T5-XXL文本编码器可以显著降低内存需求,而仅造成性能上的轻微损失。"

#: ../../source/models/model_abilities/image.rst:124
msgid ""
"If you are trying to run large image models like sd3-medium or FLUX.1 "
"series on a GPU card that has less than 24GB of memory, you may encounter "
"OOM when launching or during inference. Try the solutions below."
msgstr "如果你试图在显存小于24GB的GPU上运行像sd3-medium或FLUX.1系列这样的大型图像模型,"
"你在启动或推理过程中可能会遇到显存溢出(OOM)的问题。尝试以下解决方案。"
msgstr ""
"如果你试图在显存小于24GB的GPU上运行像sd3-medium或FLUX.1系列这样的大型图像"
"模型,你在启动或推理过程中可能会遇到显存溢出(OOM)的问题。尝试以下"
"解决方案。"

#: ../../source/models/model_abilities/image.rst:128
msgid "For FLUX.1 series, try to apply quantization."
@@ -200,4 +205,15 @@ msgstr ""
msgid "Learn from a Stable Diffusion ControlNet example"
msgstr "学习一个 Stable Diffusion ControlNet 的示例"

#: ../../source/models/model_abilities/image.rst:160
msgid "OCR"
msgstr ""

#: ../../source/models/model_abilities/image.rst:162
msgid "The OCR API accepts image bytes and returns the OCR text."
msgstr "OCR API 接受图像字节并返回 OCR 文本。"

#: ../../source/models/model_abilities/image.rst:164
msgid "We can try the OCR API out either via cURL or Xinference's Python client:"
msgstr "可以通过 cURL 或 Xinference 的 Python 客户端来尝试 OCR API。"

19 changes: 19 additions & 0 deletions doc/source/models/builtin/image/got-ocr2_0.rst
@@ -0,0 +1,19 @@
.. _models_builtin_got-ocr2_0:

==========
GOT-OCR2_0
==========

- **Model Name:** GOT-OCR2_0
- **Model Family:** ocr
- **Abilities:** ocr
- **Available ControlNet:** None

Specifications
^^^^^^^^^^^^^^

- **Model ID:** stepfun-ai/GOT-OCR2_0

Execute the following command to launch the model::

xinference launch --model-name GOT-OCR2_0 --model-type image
2 changes: 2 additions & 0 deletions doc/source/models/builtin/image/index.rst
Original file line number Diff line number Diff line change
@@ -15,6 +15,8 @@ The following is a list of built-in image models in Xinference:

flux.1-schnell

got-ocr2_0

kolors

sd-turbo
31 changes: 31 additions & 0 deletions doc/source/models/model_abilities/image.rst
@@ -156,3 +156,34 @@ You can find more examples of Images API in the tutorial notebook:

Learn from a Stable Diffusion ControlNet example

OCR
--------------------

The OCR API accepts image bytes and returns the OCR text.

We can try the OCR API out either via cURL or Xinference's Python client:

.. tabs::

.. code-tab:: bash cURL

curl -X 'POST' \
'http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/images/ocr' \
-F model=<MODEL_UID> \
-F [email protected]


.. code-tab:: python Xinference Python Client

from xinference.client import Client

client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")

model = client.get_model("<MODEL_UID>")
with open("xxx.jpg", "rb") as f:
model.ocr(f.read())


.. code-tab:: text output

<OCR result string>
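For reference, here is a minimal sketch of what the cURL request above sends over the wire, built with only the Python standard library. The `model` and `image` field names follow the cURL example; the helper name and the placeholder values are hypothetical, and in practice you would simply use cURL or the Xinference Python client as shown.

```python
import uuid

def build_ocr_multipart(model_uid: str, image_name: str, image_bytes: bytes):
    """Encode the `model` and `image` form fields as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model_uid}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{image_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + image_bytes + tail, f"multipart/form-data; boundary={boundary}"

# POST `body` to http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/images/ocr
# with the Content-Type header set to `content_type`.
body, content_type = build_ocr_multipart("<MODEL_UID>", "xxx.jpg", b"<image bytes>")
```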
3 changes: 3 additions & 0 deletions xinference/model/image/ocr/got_ocr2.py
@@ -71,6 +71,9 @@ def ocr(
logger.info("Got OCR 2.0 kwargs: %s", kwargs)
if "ocr_type" not in kwargs:
kwargs["ocr_type"] = "ocr"
if image.mode == "RGBA" or image.mode == "CMYK":
# convert to RGB
image = image.convert("RGB")
assert self._model is not None
# This chat API limits the max new tokens inside.
return self._model.chat(self._tokenizer, image, gradio_input=True, **kwargs)
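The conversion in the diff above exists because the underlying chat API expects 3-channel RGB input: images with an alpha channel (RGBA) or in the CMYK print color space are converted via Pillow's `Image.convert("RGB")`, while other modes pass through unchanged. A hypothetical helper capturing the same decision:

```python
# Hypothetical helper mirroring the mode check added in got_ocr2.py:
# only RGBA and CMYK inputs are normalized to RGB; every other PIL
# mode (e.g. "L" grayscale, plain "RGB") is left as-is.
def target_mode(mode: str) -> str:
    """Return the PIL image mode an input should have before OCR."""
    return "RGB" if mode in ("RGBA", "CMYK") else mode
```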
