DOC: Add doc for ocr (xorbitsai#2492)
Co-authored-by: qinxuye <[email protected]>
codingl2k1 and qinxuye authored Oct 30, 2024
1 parent 9a5aeb0 commit bd599b2
Showing 5 changed files with 83 additions and 12 deletions.
40 changes: 28 additions & 12 deletions doc/source/locale/zh_CN/LC_MESSAGES/models/model_abilities/image.po
@@ -8,7 +8,7 @@ msgid ""
msgstr ""
"Project-Id-Version: Xinference \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-08-09 19:13+0800\n"
"POT-Creation-Date: 2024-10-30 07:49+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: zh_CN\n"
@@ -17,7 +17,7 @@ msgstr ""
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.14.0\n"
"Generated-By: Babel 2.16.0\n"

#: ../../source/models/model_abilities/image.rst:5
msgid "Images"
@@ -143,34 +143,39 @@ msgid ""
" move a model component onto the GPU when it needs to be executed, while "
"keeping the remaining components on the CPU."
msgstr ""
"``--cpu_offload True``:指定 ``True`` 会在推理过程中将模型的组件卸载到 CPU 上以节省内存,"
"这会导致推理延迟略有增加。模型卸载仅会在需要执行时将模型组件移动到 GPU 上,同时保持其余组件在 CPU 上"
"``--cpu_offload True``:指定 ``True`` 会在推理过程中将模型的组件卸载到 "
"CPU 上以节省内存,这会导致推理延迟略有增加。模型卸载仅会在需要执行时将"
"模型组件移动到 GPU 上,同时保持其余组件在 CPU 上"

#: ../../source/models/model_abilities/image.rst:117
msgid ""
"``--quantize_text_encoder <text encoder layer>``: We leveraged the "
"``bitsandbytes`` library to load and quantize the T5-XXL text encoder to "
"8-bit precision. This allows you to keep using all text encoders "
"while only slightly impacting performance."
msgstr "``--quantize_text_encoder <text encoder layer>``:我们利用 ``bitsandbytes`` 库"
"加载并量化 T5-XXL 文本编码器至8位精度。这使得你能够在仅轻微影响性能的情况下继续使用全部文本编码器。"
"8-bit precision. This allows you to keep using all text encoders while "
"only slightly impacting performance."
msgstr ""
"``--quantize_text_encoder <text encoder layer>``:我们利用 ``bitsandbytes"
"`` 库加载并量化 T5-XXL 文本编码器至8位精度。这使得你能够在仅轻微影响性能"
"的情况下继续使用全部文本编码器。"

#: ../../source/models/model_abilities/image.rst:120
msgid ""
"``--text_encoder_3 None``, for sd3-medium, removing the memory-intensive "
"4.7B parameter T5-XXL text encoder during inference can significantly "
"decrease the memory requirements with only a slight loss in performance."
msgstr ""
"``--text_encoder_3 None``,对于 sd3-medium,"
"移除在推理过程中内存密集型的47亿参数T5-XXL文本编码器可以显著降低内存需求,而仅造成性能上的轻微损失。"
"``--text_encoder_3 None``,对于 sd3-medium,移除在推理过程中内存密集型的"
"47亿参数T5-XXL文本编码器可以显著降低内存需求,而仅造成性能上的轻微损失。"

#: ../../source/models/model_abilities/image.rst:124
msgid ""
"If you are trying to run large image models like sd3-medium or FLUX.1 "
"series on a GPU card that has less than 24GB of memory, you may encounter "
"OOM when launching or during inference. Try the solutions below."
msgstr "如果你试图在显存小于24GB的GPU上运行像sd3-medium或FLUX.1系列这样的大型图像模型,"
"你在启动或推理过程中可能会遇到显存溢出(OOM)的问题。尝试以下解决方案。"
msgstr ""
"如果你试图在显存小于24GB的GPU上运行像sd3-medium或FLUX.1系列这样的大型图像"
"模型,你在启动或推理过程中可能会遇到显存溢出(OOM)的问题。尝试以下"
"解决方案。"

#: ../../source/models/model_abilities/image.rst:128
msgid "For FLUX.1 series, try to apply quantization."
@@ -200,4 +205,15 @@ msgstr ""
msgid "Learn from a Stable Diffusion ControlNet example"
msgstr "学习一个 Stable Diffusion ControlNet 的示例"

#: ../../source/models/model_abilities/image.rst:160
msgid "OCR"
msgstr ""

#: ../../source/models/model_abilities/image.rst:162
msgid "The OCR API accepts image bytes and returns the OCR text."
msgstr "OCR API 接受图像字节并返回 OCR 文本。"

#: ../../source/models/model_abilities/image.rst:164
msgid "We can try the OCR API out either via cURL or Xinference's Python client:"
msgstr "可以通过 cURL 或 Xinference 的 Python 客户端来尝试 OCR API。"

19 changes: 19 additions & 0 deletions doc/source/models/builtin/image/got-ocr2_0.rst
@@ -0,0 +1,19 @@
.. _models_builtin_got-ocr2_0:

==========
GOT-OCR2_0
==========

- **Model Name:** GOT-OCR2_0
- **Model Family:** ocr
- **Abilities:** ocr
- **Available ControlNet:** None

Specifications
^^^^^^^^^^^^^^

- **Model ID:** stepfun-ai/GOT-OCR2_0

Execute the following command to launch the model::

xinference launch --model-name GOT-OCR2_0 --model-type image
2 changes: 2 additions & 0 deletions doc/source/models/builtin/image/index.rst
Original file line number Diff line number Diff line change
@@ -15,6 +15,8 @@ The following is a list of built-in image models in Xinference:

flux.1-schnell

got-ocr2_0

kolors

sd-turbo
31 changes: 31 additions & 0 deletions doc/source/models/model_abilities/image.rst
@@ -156,3 +156,34 @@ You can find more examples of Images API in the tutorial notebook:

Learn from a Stable Diffusion ControlNet example

OCR
--------------------

The OCR API accepts image bytes and returns the OCR text.

We can try the OCR API out either via cURL or Xinference's Python client:

.. tabs::

.. code-tab:: bash cURL

curl -X 'POST' \
'http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/images/ocr' \
-F model=<MODEL_UID> \
-F [email protected]


.. code-tab:: python Xinference Python Client

from xinference.client import Client

client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")

model = client.get_model("<MODEL_UID>")
with open("xxx.jpg", "rb") as f:
model.ocr(f.read())


.. code-tab:: text output

<OCR result string>
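For reference, here is a minimal sketch of what the cURL request above sends over the wire, built with only the Python standard library. The `model` and `image` field names follow the cURL example; the helper name and the placeholder values are hypothetical, and in practice you would simply use cURL or the Xinference Python client as shown.

```python
import uuid

def build_ocr_multipart(model_uid: str, image_name: str, image_bytes: bytes):
    """Encode the `model` and `image` form fields as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model_uid}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{image_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + image_bytes + tail, f"multipart/form-data; boundary={boundary}"

# POST `body` to http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/images/ocr
# with the Content-Type header set to `content_type`.
body, content_type = build_ocr_multipart("<MODEL_UID>", "xxx.jpg", b"<image bytes>")
```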
3 changes: 3 additions & 0 deletions xinference/model/image/ocr/got_ocr2.py
@@ -71,6 +71,9 @@ def ocr(
logger.info("Got OCR 2.0 kwargs: %s", kwargs)
if "ocr_type" not in kwargs:
kwargs["ocr_type"] = "ocr"
if image.mode == "RGBA" or image.mode == "CMYK":
# convert to RGB
image = image.convert("RGB")
assert self._model is not None
# This chat API limits the max new tokens inside.
return self._model.chat(self._tokenizer, image, gradio_input=True, **kwargs)
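The conversion in the diff above exists because the underlying chat API expects 3-channel RGB input: images with an alpha channel (RGBA) or in the CMYK print color space are converted via Pillow's `Image.convert("RGB")`, while other modes pass through unchanged. A hypothetical helper capturing the same decision:

```python
# Hypothetical helper mirroring the mode check added in got_ocr2.py:
# only RGBA and CMYK inputs are normalized to RGB; every other PIL
# mode (e.g. "L" grayscale, plain "RGB") is left as-is.
def target_mode(mode: str) -> str:
    """Return the PIL image mode an input should have before OCR."""
    return "RGB" if mode in ("RGBA", "CMYK") else mode
```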
