VLM Engines Reference
vlm4ocr.vlm_engines.BasicVLMConfig
Bases: LLMConfig
The basic LLM configuration for most non-reasoning models.
Source code in llm_inference_engine/llm_configs.py
preprocess_messages
This method preprocesses the input messages before passing them to the LLM.
Parameters:
messages : List[Dict[str,str]] a list of dict with role and content. role must be one of {"system", "user", "assistant"}
Returns:
messages : List[Dict[str,str]] a list of dict with role and content. role must be one of {"system", "user", "assistant"}
Source code in llm_inference_engine/llm_configs.py
postprocess_response
postprocess_response(
response: Union[
str,
Dict[str, Any],
Generator[str, None, None],
AsyncGenerator[str, None],
],
) -> Union[
Dict[str, Any],
Generator[Dict[str, Any], None, None],
AsyncGenerator[Dict[str, Any], None],
]
This method postprocesses the LLM response after it is generated.
Parameters:
response : Union[str, Dict[str, Any], Generator[str, None, None], AsyncGenerator[str, None]] the LLM response. Can be a string, a generator, or an async generator.
Returns: Union[Dict[str, Any], Generator[Dict[str, Any], None, None], AsyncGenerator[Dict[str, Any], None]]
the postprocessed LLM response.
if input is a string, the output will be a dict {"response":
Source code in llm_inference_engine/llm_configs.py
vlm4ocr.vlm_engines.ReasoningVLMConfig
Bases: LLMConfig
The general LLM configuration for reasoning models.
Source code in llm_inference_engine/llm_configs.py
preprocess_messages
This method preprocesses the input messages before passing them to the LLM.
Parameters:
messages : List[Dict[str,str]] a list of dict with role and content. role must be one of {"system", "user", "assistant"}
Returns:
messages : List[Dict[str,str]] a list of dict with role and content. role must be one of {"system", "user", "assistant"}
Source code in llm_inference_engine/llm_configs.py
postprocess_response
postprocess_response(
response: Union[
str,
Dict[str, str],
Generator[str, None, None],
AsyncGenerator[str, None],
],
) -> Union[
Dict[str, str],
Generator[Dict[str, str], None, None],
AsyncGenerator[Dict[str, str], None],
]
This method postprocesses the LLM response after it is generated. 1. If input is a string, it will extract the reasoning and response based on the thinking tokens. 2. If input is a dict, it should contain keys "reasoning", "response", or "tool_calls". This is for inference engines that already parse reasoning, response, and tool calls. 3. If input is a generator, a. if the chunk is a dict, it should contain keys "type" and "data". This is for inference engines that already parse reasoning, response, and tool calls. b. if the chunk is a string, it will yield dicts with keys "type" and "data" based on the thinking tokens.
Parameters:
response : Union[str, Generator[str, None, None]] the LLM response. Can be a string or a generator.
Returns:
response : Union[str, Generator[str, None, None]]
the postprocessed LLM response as a dict {"reasoning":
Source code in llm_inference_engine/llm_configs.py
153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 | |
vlm4ocr.vlm_engines.OpenAIReasoningVLMConfig
Bases: ReasoningLLMConfig
The OpenAI "o" series configuration. 1. The reasoning effort as one of {"low", "medium", "high"}. For models that do not support setting reasoning effort (e.g., o1-mini, o1-preview), set to None. 2. The temperature parameter is not supported and will be ignored. 3. The system prompt is not supported and will be concatenated to the next user prompt.
Parameters:
reasoning_effort : str, Optional the reasoning effort. Must be one of {"low", "medium", "high"}. Default is "low".
Source code in llm_inference_engine/llm_configs.py
preprocess_messages
Concatenate system prompts to the next user prompt.
Parameters:
messages : List[Dict[str,str]] a list of dict with role and content. role must be one of {"system", "user", "assistant"}
Returns:
messages : List[Dict[str,str]] a list of dict with role and content. role must be one of {"system", "user", "assistant"}
Source code in llm_inference_engine/llm_configs.py
vlm4ocr.vlm_engines.OpenAICompatibleVLMEngine
OpenAICompatibleVLMEngine(
model: str,
api_key: str,
base_url: str,
config: LLMConfig = None,
max_concurrent_requests: int = None,
max_requests_per_minute: int = None,
**kwrs
)
Bases: OpenAICompatibleInferenceEngine, VLMEngine
Source code in llm_inference_engine/engines.py
get_ocr_messages
get_ocr_messages(
system_prompt: str,
user_prompt: str,
image: Image,
format: str = "png",
detail: str = "high",
few_shot_examples: List[FewShotExample] = None,
) -> List[Dict[str, str]]
This method inputs an image and returns the correesponding chat messages for the inference engine.
Parameters:
system_prompt : str the system prompt. user_prompt : str the user prompt. image : Image.Image the image for OCR. format : str, Optional the image format. detail : str, Optional the detail level of the image. Default is "high". few_shot_examples : List[FewShotExample], Optional list of few-shot examples.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
vlm4ocr.vlm_engines.VLLMVLMEngine
VLLMVLMEngine(
model: str,
api_key: str = "",
base_url: str = "http://localhost:8000/v1",
config: LLMConfig = None,
max_concurrent_requests: int = None,
max_requests_per_minute: int = None,
**kwrs
)
Bases: VLLMInferenceEngine, OpenAICompatibleVLMEngine
vLLM OpenAI compatible server inference engine. https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html
For parameters and documentation, refer to https://platform.openai.com/docs/api-reference/introduction
Parameters:
model_name : str model name as shown in the vLLM server api_key : str, Optional the API key for the vLLM server. base_url : str, Optional the base url for the vLLM server. config : LLMConfig the LLM configuration.
Source code in llm_inference_engine/engines.py
vlm4ocr.vlm_engines.OpenRouterVLMEngine
OpenRouterVLMEngine(
model: str,
api_key: str = None,
base_url: str = "https://openrouter.ai/api/v1",
config: LLMConfig = None,
max_concurrent_requests: int = None,
max_requests_per_minute: int = None,
**kwrs
)
Bases: OpenRouterInferenceEngine, OpenAICompatibleVLMEngine
OpenRouter OpenAI-compatible server inference engine.
Parameters:
model_name : str model name as shown in the vLLM server api_key : str, Optional the API key for the vLLM server. If None, will use the key in os.environ['OPENROUTER_API_KEY']. base_url : str, Optional the base url for the vLLM server. config : LLMConfig the LLM configuration.
Source code in llm_inference_engine/engines.py
vlm4ocr.vlm_engines.OllamaVLMEngine
OllamaVLMEngine(
model_name: str,
num_ctx: int = 4096,
keep_alive: int = 300,
config: LLMConfig = None,
max_concurrent_requests: int = None,
max_requests_per_minute: int = None,
**kwrs
)
Bases: OllamaInferenceEngine, VLMEngine
Source code in llm_inference_engine/engines.py
get_ocr_messages
get_ocr_messages(
system_prompt: str,
user_prompt: str,
image: Image,
few_shot_examples: List[FewShotExample] = None,
) -> List[Dict[str, str]]
This method inputs an image and returns the correesponding chat messages for the inference engine.
Parameters:
system_prompt : str the system prompt. user_prompt : str the user prompt. image : Image.Image the image for OCR. few_shot_examples : List[FewShotExample], Optional list of few-shot examples.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
vlm4ocr.vlm_engines.OpenAIVLMEngine
OpenAIVLMEngine(
model: str,
config: LLMConfig = None,
max_concurrent_requests: int = None,
max_requests_per_minute: int = None,
**kwrs
)
Bases: OpenAIInferenceEngine, VLMEngine
Source code in llm_inference_engine/engines.py
get_ocr_messages
get_ocr_messages(
system_prompt: str,
user_prompt: str,
image: Image,
format: str = "png",
detail: str = "high",
few_shot_examples: List[FewShotExample] = None,
) -> List[Dict[str, str]]
This method inputs an image and returns the correesponding chat messages for the inference engine.
Parameters:
system_prompt : str the system prompt. user_prompt : str the user prompt. image : Image.Image the image for OCR. format : str, Optional the image format. detail : str, Optional the detail level of the image. Default is "high". few_shot_examples : List[FewShotExample], Optional list of few-shot examples. Each example is a dict with keys "image" (PIL.Image.Image) and "text" (str).
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
vlm4ocr.vlm_engines.AzureOpenAIVLMEngine
AzureOpenAIVLMEngine(
model: str,
api_version: str,
config: LLMConfig = None,
max_concurrent_requests: int = None,
max_requests_per_minute: int = None,
**kwrs
)
Bases: AzureOpenAIInferenceEngine, OpenAIVLMEngine
The Azure OpenAI API inference engine. For parameters and documentation, refer to - https://azure.microsoft.com/en-us/products/ai-services/openai-service - https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart
Parameters:
model : str model name as described in https://platform.openai.com/docs/models api_version : str the Azure OpenAI API version config : LLMConfig the LLM configuration.