VLM Engines Reference
vlm4ocr.vlm_engines.BasicVLMConfig
Bases: VLMConfig
The basic VLM configuration for most non-reasoning models.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
preprocess_messages
This method preprocesses the input messages before passing them to the VLM.
Parameters:
messages : List[Dict[str,str]] a list of dict with role and content. role must be one of {"system", "user", "assistant"}
Returns:
messages : List[Dict[str,str]] a list of dict with role and content. role must be one of {"system", "user", "assistant"}
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
postprocess_response
postprocess_response(
response: Union[
str, Dict[str, str], Generator[str, None, None]
],
) -> Union[
Dict[str, str], Generator[Dict[str, str], None, None]
]
This method postprocesses the VLM response after it is generated.
Parameters:
response : Union[str, Dict[str,str], Generator[str, None, None]] the VLM response. Can be a string, a dict, or a generator.
Returns:
response : Union[Dict[str,str], Generator[Dict[str, str], None, None]]
the postprocessed VLM response. If the input is a generator, the output is a generator that yields dicts {"type": "response", "data": ...}.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
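A minimal usage sketch (the message contents are placeholders; only the constructor and preprocess_messages are documented here):
from vlm4ocr.vlm_engines import BasicVLMConfig

config = BasicVLMConfig()  # default configuration for non-reasoning models
messages = [
    {"role": "system", "content": "You are an OCR assistant."},   # placeholder prompt
    {"role": "user", "content": "Transcribe the attached image."},  # placeholder prompt
]
messages = config.preprocess_messages(messages)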
vlm4ocr.vlm_engines.ReasoningVLMConfig
Bases: VLMConfig
The general configuration for reasoning vision models.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
preprocess_messages
This method preprocesses the input messages before passing them to the VLM.
Parameters:
messages : List[Dict[str,str]] a list of dict with role and content. role must be one of {"system", "user", "assistant"}
Returns:
messages : List[Dict[str,str]] a list of dict with role and content. role must be one of {"system", "user", "assistant"}
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
postprocess_response
postprocess_response(
response: Union[
str, Dict[str, str], Generator[str, None, None]
],
) -> Union[
Dict[str, str], Generator[Dict[str, str], None, None]
]
This method postprocesses the VLM response after it is generated.
1. If the input is a string, the reasoning and response are extracted based on the thinking tokens.
2. If the input is a dict, it should contain the keys "reasoning" and "response". This is for inference engines that already parse reasoning and response.
3. If the input is a generator:
a. if a chunk is a dict, it should contain the keys "type" and "data". This is for inference engines that already parse reasoning and response.
b. if a chunk is a string, dicts with the keys "type" and "data" are yielded based on the thinking tokens.
Parameters:
response : Union[str, Dict[str,str], Generator[str, None, None]] the VLM response. Can be a string, a dict, or a generator.
Returns:
response : Union[Dict[str,str], Generator[Dict[str, str], None, None]]
the postprocessed VLM response as a dict {"reasoning": ..., "response": ...}. If the input is a generator, the output is a generator that yields dicts {"type": ..., "data": ...}.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
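A sketch of the string branch described above. The <think>...</think> delimiters are an assumed example of thinking tokens, not confirmed by this page:
from vlm4ocr.vlm_engines import ReasoningVLMConfig

config = ReasoningVLMConfig()
# A raw completion that interleaves reasoning and the final answer.
raw = "<think>The image shows an invoice header.</think>Invoice #123"
result = config.postprocess_response(raw)
# Expected shape per the docstring above: {"reasoning": ..., "response": ...}
print(result["reasoning"], result["response"])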
vlm4ocr.vlm_engines.OpenAIReasoningVLMConfig
Bases: ReasoningVLMConfig
The OpenAI "o" series configuration.
1. The reasoning effort is set to "low" by default.
2. The temperature parameter is not supported and will be ignored.
3. The system prompt is not supported and will be concatenated to the next user prompt.
Parameters:
reasoning_effort : str, Optional the reasoning effort. Must be one of {"low", "medium", "high"}. Default is "low".
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
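For example, to raise the reasoning effort from the default:
from vlm4ocr.vlm_engines import OpenAIReasoningVLMConfig

# reasoning_effort must be one of {"low", "medium", "high"}; default is "low".
config = OpenAIReasoningVLMConfig(reasoning_effort="medium")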
preprocess_messages
Concatenate system prompts to the next user prompt.
Parameters:
messages : List[Dict[str,str]] a list of dict with role and content. role must be one of {"system", "user", "assistant"}
Returns:
messages : List[Dict[str,str]] a list of dict with role and content. role must be one of {"system", "user", "assistant"}
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
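A sketch of the documented transformation: the system prompt is folded into the next user prompt, so no "system" message remains (the exact concatenation format is an assumption):
config = OpenAIReasoningVLMConfig()
messages = [
    {"role": "system", "content": "You are an OCR assistant."},  # placeholder prompt
    {"role": "user", "content": "Transcribe this page."},        # placeholder prompt
]
merged = config.preprocess_messages(messages)
# merged contains no "system" message; its content has been concatenated
# into the next "user" message (the exact separator is an assumption).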
vlm4ocr.vlm_engines.OpenAICompatibleVLMEngine
OpenAICompatibleVLMEngine(
model: str,
api_key: str,
base_url: str,
config: VLMConfig = None,
**kwrs
)
Bases: VLMEngine
General OpenAI-compatible server inference engine. https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html
For parameters and documentation, refer to https://platform.openai.com/docs/api-reference/introduction
Parameters:
model : str model name as shown on the OpenAI-compatible server.
api_key : str the API key for the server.
base_url : str the base url for the server.
config : VLMConfig the VLM configuration.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
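A minimal instantiation sketch against a locally hosted OpenAI-compatible server (the model name, key, and URL are placeholders):
from vlm4ocr.vlm_engines import BasicVLMConfig, OpenAICompatibleVLMEngine

engine = OpenAICompatibleVLMEngine(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # placeholder: use the name your server reports
    api_key="EMPTY",                      # placeholder key
    base_url="http://localhost:8000/v1",  # placeholder server URL
    config=BasicVLMConfig(),
)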
chat
chat(
messages: List[Dict[str, str]],
verbose: bool = False,
stream: bool = False,
messages_logger: MessagesLogger = None,
) -> Union[
Dict[str, str], Generator[Dict[str, str], None, None]
]
This method inputs chat messages and outputs VLM generated text.
Parameters:
messages : List[Dict[str,str]] a list of dict with role and content. role must be one of {"system", "user", "assistant"}
verbose : bool, Optional if True, VLM generated text will be printed in the terminal in real-time.
stream : bool, Optional if True, returns a generator that yields the output in real-time.
messages_logger : MessagesLogger, Optional the message logger that logs the chat messages.
Returns:
response : Union[Dict[str,str], Generator[Dict[str, str], None, None]]
a dict {"reasoning": ..., "response": ...}. If stream=True, a generator that yields dicts {"type": ..., "data": ...} in real-time.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
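A usage sketch for both the blocking and streaming paths (engine is an instance constructed as above; the message contents are placeholders):
messages = [
    {"role": "system", "content": "You are an OCR assistant."},  # placeholder prompt
    {"role": "user", "content": "Transcribe this page."},        # placeholder prompt
]

# Blocking call: returns a dict {"reasoning": ..., "response": ...}
result = engine.chat(messages)
print(result["response"])

# Streaming call: yields dicts {"type": ..., "data": ...} in real-time
for chunk in engine.chat(messages, stream=True):
    if chunk["type"] == "response":
        print(chunk["data"], end="")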
chat_async
async
chat_async(
messages: List[Dict[str, str]],
messages_logger: MessagesLogger = None,
) -> Dict[str, str]
Async version of chat method. Streaming is not supported.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
get_ocr_messages
get_ocr_messages(
system_prompt: str,
user_prompt: str,
image: Image,
format: str = "png",
detail: str = "high",
few_shot_examples: List[FewShotExample] = None,
) -> List[Dict[str, str]]
This method inputs an image and returns the corresponding chat messages for the inference engine.
Parameters:
system_prompt : str the system prompt.
user_prompt : str the user prompt.
image : Image.Image the image for OCR.
format : str, Optional the image format. Default is "png".
detail : str, Optional the detail level of the image. Default is "high".
few_shot_examples : List[FewShotExample], Optional list of few-shot examples.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
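A sketch of building OCR messages from a PIL image and sending them through chat (the file path and prompt strings are placeholders):
from PIL import Image

image = Image.open("page_1.png")  # placeholder path
ocr_messages = engine.get_ocr_messages(
    system_prompt="You are an OCR assistant.",         # placeholder prompt
    user_prompt="Transcribe this image to markdown.",  # placeholder prompt
    image=image,
    format="png",
    detail="high",
)
result = engine.chat(ocr_messages)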
vlm4ocr.vlm_engines.VLLMVLMEngine
VLLMVLMEngine(
model: str,
api_key: str = "",
base_url: str = "http://localhost:8000/v1",
config: VLMConfig = None,
**kwrs
)
Bases: OpenAICompatibleVLMEngine
vLLM OpenAI compatible server inference engine. https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html
For parameters and documentation, refer to https://platform.openai.com/docs/api-reference/introduction
Parameters:
model : str model name as shown on the vLLM server.
api_key : str, Optional the API key for the vLLM server.
base_url : str, Optional the base url for the vLLM server.
config : VLMConfig the VLM configuration.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
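A sketch assuming a vLLM server already running on the default local port (the model name is a placeholder):
from vlm4ocr.vlm_engines import VLLMVLMEngine

engine = VLLMVLMEngine(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # placeholder: use the name shown by your server
)
# api_key defaults to "" and base_url to http://localhost:8000/v1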
vlm4ocr.vlm_engines.OpenRouterVLMEngine
OpenRouterVLMEngine(
model: str,
api_key: str = None,
base_url: str = "https://openrouter.ai/api/v1",
config: VLMConfig = None,
**kwrs
)
Bases: OpenAICompatibleVLMEngine
OpenRouter OpenAI-compatible server inference engine.
Parameters:
model : str model name as shown on OpenRouter.
api_key : str, Optional the API key for OpenRouter. If None, will use the key in os.environ['OPENROUTER_API_KEY'].
base_url : str, Optional the base url for the OpenRouter API.
config : VLMConfig the VLM configuration.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
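A sketch relying on the documented OPENROUTER_API_KEY fallback (the key and model name are placeholders):
import os
from vlm4ocr.vlm_engines import OpenRouterVLMEngine

os.environ["OPENROUTER_API_KEY"] = "sk-or-..."  # placeholder key
engine = OpenRouterVLMEngine(model="qwen/qwen2.5-vl-72b-instruct")  # placeholder model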
vlm4ocr.vlm_engines.OllamaVLMEngine
OllamaVLMEngine(
model_name: str,
num_ctx: int = 8192,
keep_alive: int = 300,
config: VLMConfig = None,
**kwrs
)
Bases: VLMEngine
The Ollama inference engine.
Parameters:
model_name : str the model name exactly as shown in >> ollama ls
num_ctx : int, Optional context length that the VLM will evaluate. Default is 8192.
keep_alive : int, Optional seconds to hold the VLM after the last API call. Default is 300.
config : VLMConfig the VLM configuration.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
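A sketch with a model pulled locally (the model tag is a placeholder; use a tag from >> ollama ls):
from vlm4ocr.vlm_engines import OllamaVLMEngine

engine = OllamaVLMEngine(
    model_name="llama3.2-vision:11b",  # placeholder tag
    num_ctx=8192,
    keep_alive=300,
)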
chat
chat(
messages: List[Dict[str, str]],
verbose: bool = False,
stream: bool = False,
messages_logger: MessagesLogger = None,
) -> Union[
Dict[str, str], Generator[Dict[str, str], None, None]
]
This method inputs chat messages and outputs VLM generated text.
Parameters:
messages : List[Dict[str,str]] a list of dict with role and content. role must be one of {"system", "user", "assistant"}
verbose : bool, Optional if True, VLM generated text will be printed in the terminal in real-time.
stream : bool, Optional if True, returns a generator that yields the output in real-time.
messages_logger : MessagesLogger, Optional the message logger that logs the chat messages.
Returns:
response : Union[Dict[str,str], Generator[Dict[str, str], None, None]]
a dict {"reasoning": ..., "response": ...}. If stream=True, a generator that yields dicts {"type": ..., "data": ...} in real-time.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
chat_async
async
chat_async(
messages: List[Dict[str, str]],
messages_logger: MessagesLogger = None,
) -> Dict[str, str]
Async version of chat method. Streaming is not supported.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
get_ocr_messages
get_ocr_messages(
system_prompt: str,
user_prompt: str,
image: Image,
few_shot_examples: List[FewShotExample] = None,
) -> List[Dict[str, str]]
This method inputs an image and returns the corresponding chat messages for the inference engine.
Parameters:
system_prompt : str the system prompt.
user_prompt : str the user prompt.
image : Image.Image the image for OCR.
few_shot_examples : List[FewShotExample], Optional list of few-shot examples.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
vlm4ocr.vlm_engines.OpenAIVLMEngine
Bases: VLMEngine
The OpenAI API inference engine. Supports OpenAI models and OpenAI-compatible servers:
- vLLM OpenAI compatible server (https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html)
For parameters and documentation, refer to https://platform.openai.com/docs/api-reference/introduction
Parameters:
model_name : str model name as described in https://platform.openai.com/docs/models
config : VLMConfig, Optional the VLM configuration. Must be a child class of VLMConfig.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
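A sketch using an OpenAI model. Only model_name and config are documented here; authentication via the OPENAI_API_KEY environment variable is an assumption based on the OpenAI SDK default:
import os
from vlm4ocr.vlm_engines import BasicVLMConfig, OpenAIVLMEngine

os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder; assumed to be read by the SDK
engine = OpenAIVLMEngine(model_name="gpt-4o", config=BasicVLMConfig())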
chat
chat(
messages: List[Dict[str, str]],
verbose: bool = False,
stream: bool = False,
messages_logger: MessagesLogger = None,
) -> Union[
Dict[str, str], Generator[Dict[str, str], None, None]
]
This method inputs chat messages and outputs VLM generated text.
Parameters:
messages : List[Dict[str,str]] a list of dict with role and content. role must be one of {"system", "user", "assistant"}
verbose : bool, Optional if True, VLM generated text will be printed in the terminal in real-time.
stream : bool, Optional if True, returns a generator that yields the output in real-time.
messages_logger : MessagesLogger, Optional the message logger that logs the chat messages.
Returns:
response : Union[Dict[str,str], Generator[Dict[str, str], None, None]]
a dict {"reasoning": ..., "response": ...}. If stream=True, a generator that yields dicts {"type": ..., "data": ...} in real-time.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
chat_async
async
chat_async(
messages: List[Dict[str, str]],
messages_logger: MessagesLogger = None,
) -> Dict[str, str]
Async version of chat method. Streaming is not supported.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
get_ocr_messages
get_ocr_messages(
system_prompt: str,
user_prompt: str,
image: Image,
format: str = "png",
detail: str = "high",
few_shot_examples: List[FewShotExample] = None,
) -> List[Dict[str, str]]
This method inputs an image and returns the corresponding chat messages for the inference engine.
Parameters:
system_prompt : str the system prompt.
user_prompt : str the user prompt.
image : Image.Image the image for OCR.
format : str, Optional the image format. Default is "png".
detail : str, Optional the detail level of the image. Default is "high".
few_shot_examples : List[FewShotExample], Optional list of few-shot examples. Each example is a dict with keys "image" (PIL.Image.Image) and "text" (str).
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
vlm4ocr.vlm_engines.AzureOpenAIVLMEngine
Bases: OpenAIVLMEngine
The Azure OpenAI API inference engine. For parameters and documentation, refer to:
- https://azure.microsoft.com/en-us/products/ai-services/openai-service
- https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart
Parameters:
model : str model name as described in https://platform.openai.com/docs/models
api_version : str the Azure OpenAI API version.
config : VLMConfig the VLM configuration.
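A sketch under the assumption that the engine picks up Azure credentials the way the Azure OpenAI SDK does by default (only model and api_version are documented here; the environment variables and their values are assumptions):
import os
from vlm4ocr.vlm_engines import AzureOpenAIVLMEngine

# Assumed: the underlying Azure SDK reads endpoint and key from these env vars.
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://my-resource.openai.azure.com"  # placeholder
os.environ["AZURE_OPENAI_API_KEY"] = "..."  # placeholder key
engine = AzureOpenAIVLMEngine(model="gpt-4o", api_version="2024-06-01")  # placeholder values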