VLM Engines Reference
vlm4ocr.vlm_engines.OllamaVLMEngine
OllamaVLMEngine(
model_name: str,
num_ctx: int = 8192,
keep_alive: int = 300,
config: VLMConfig = None,
**kwrs
)
Bases: VLMEngine
The Ollama inference engine.
Parameters:
model_name : str
    the model name, exactly as shown by ollama ls.
num_ctx : int, Optional
    context length that the model will evaluate.
keep_alive : int, Optional
    seconds to keep the model loaded after the last API call.
config : VLMConfig, Optional
    the VLM configuration.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
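Example (a minimal sketch; assumes a local Ollama server with a vision-capable model pulled, and that the model tag below is replaced with one listed by ollama ls):

```python
from vlm4ocr.vlm_engines import OllamaVLMEngine

# The model name must match an entry from `ollama ls`; this tag is a placeholder.
engine = OllamaVLMEngine(
    model_name="llama3.2-vision:11b",
    num_ctx=8192,    # context length the model will evaluate
    keep_alive=300,  # keep the model loaded for 300 seconds after the last call
)
```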
chat
chat(
messages: List[Dict[str, str]],
verbose: bool = False,
stream: bool = False,
) -> Union[str, Generator[str, None, None]]
This method inputs chat messages and outputs VLM generated text.
Parameters:
messages : List[Dict[str, str]]
    a list of dicts with "role" and "content" keys. role must be one of {"system", "user", "assistant"}.
verbose : bool, Optional
    if True, VLM generated text is printed to the terminal in real time.
stream : bool, Optional
    if True, returns a generator that yields the output in real time.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
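Example (a sketch built from the signature above, continuing with the engine constructed earlier; the prompts are placeholders):

```python
messages = [
    {"role": "system", "content": "You are an OCR assistant."},
    {"role": "user", "content": "Transcribe the attached document image as plain text."},
]

# verbose=True prints the generated text to the terminal as it is produced.
text = engine.chat(messages, verbose=True)
print(text)
```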
chat_async
async
Async version of chat method. Streaming is not supported.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
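Example (an async sketch; assumes chat_async accepts the same messages argument as chat, and reuses the engine and messages from the examples above):

```python
import asyncio

async def main():
    # Streaming is not supported; the full response is returned at once.
    text = await engine.chat_async(messages)
    print(text)

asyncio.run(main())
```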
get_ocr_messages
This method inputs an image and returns the corresponding chat messages for the inference engine.
Parameters:
system_prompt : str
    the system prompt.
user_prompt : str
    the user prompt.
image : Image.Image
    the image for OCR.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
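Example (a sketch that builds OCR messages and sends them through chat; assumes the returned messages can be passed directly to chat, and that the file path and prompts are placeholders):

```python
from PIL import Image

image = Image.open("scanned_page.png")  # placeholder path
ocr_messages = engine.get_ocr_messages(
    system_prompt="You are an OCR assistant.",
    user_prompt="Transcribe this page as markdown.",
    image=image,
)
text = engine.chat(ocr_messages)
```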
vlm4ocr.vlm_engines.OpenAIVLMEngine
Bases: VLMEngine
The OpenAI API inference engine. Supports OpenAI models and OpenAI-compatible servers, such as the vLLM OpenAI-compatible server (https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html).
For parameters and documentation, refer to https://platform.openai.com/docs/api-reference/introduction
Parameters:
model_name : str
    model name as described in https://platform.openai.com/docs/models
config : VLMConfig, Optional
    the VLM configuration. Must be a child class of VLMConfig.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
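Example (an instantiation sketch; assumes the API key is provided the way the underlying OpenAI client expects, typically via the OPENAI_API_KEY environment variable; how to point the engine at a vLLM OpenAI-compatible server is an assumption here, not a documented parameter):

```python
from vlm4ocr.vlm_engines import OpenAIVLMEngine

# Against the OpenAI API (assumes OPENAI_API_KEY is set in the environment).
engine = OpenAIVLMEngine(model_name="gpt-4o")

# Against a vLLM OpenAI-compatible server, the client needs a base URL and a dummy key;
# whether these are passed as extra keyword arguments is an assumption -- check the
# constructor's source for the exact mechanism.
# engine = OpenAIVLMEngine(model_name="Qwen/Qwen2-VL-7B-Instruct",
#                          base_url="http://localhost:8000/v1", api_key="EMPTY")
```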
chat
chat(
messages: List[Dict[str, str]],
verbose: bool = False,
stream: bool = False,
) -> Union[str, Generator[str, None, None]]
This method inputs chat messages and outputs VLM generated text.
Parameters:
messages : List[Dict[str, str]]
    a list of dicts with "role" and "content" keys. role must be one of {"system", "user", "assistant"}.
verbose : bool, Optional
    if True, VLM generated text is printed to the terminal in real time.
stream : bool, Optional
    if True, returns a generator that yields the output in real time.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
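Example (a streaming sketch based on the signature above; with stream=True the method returns a generator that yields text chunks):

```python
messages = [
    {"role": "system", "content": "You are an OCR assistant."},
    {"role": "user", "content": "Summarize the attached document."},
]

# Consume the generator to print chunks as they arrive.
for chunk in engine.chat(messages, stream=True):
    print(chunk, end="", flush=True)
```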
chat_async
async
Async version of chat method. Streaming is not supported.
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
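Example (since the async variant does not stream, a common pattern is to fan out several requests concurrently; this sketch assumes chat_async takes the same messages argument as chat):

```python
import asyncio

async def transcribe_all(engine, message_batches):
    # One chat_async call per message list, run concurrently.
    return await asyncio.gather(
        *(engine.chat_async(messages) for messages in message_batches)
    )

# results = asyncio.run(transcribe_all(engine, [messages_page_1, messages_page_2]))
```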
get_ocr_messages
get_ocr_messages(
system_prompt: str,
user_prompt: str,
image: Image,
format: str = "png",
detail: str = "high",
) -> List[Dict[str, str]]
This method inputs an image and returns the corresponding chat messages for the inference engine.
Parameters:
system_prompt : str
    the system prompt.
user_prompt : str
    the user prompt.
image : Image.Image
    the image for OCR.
format : str, Optional
    the image format. Default is "png".
detail : str, Optional
    the detail level of the image. Default is "high".
Source code in packages/vlm4ocr/vlm4ocr/vlm_engines.py
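Example (a sketch combining get_ocr_messages with chat; the file path and prompts are placeholders):

```python
from PIL import Image

image = Image.open("invoice.png")  # placeholder path
ocr_messages = engine.get_ocr_messages(
    system_prompt="You are an OCR assistant.",
    user_prompt="Extract all text from this invoice as markdown.",
    image=image,
    format="png",    # image encoding sent to the API
    detail="high",   # image detail level passed to the API
)
text = engine.chat(ocr_messages)
```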
vlm4ocr.vlm_engines.AzureOpenAIVLMEngine
Bases: OpenAIVLMEngine
The Azure OpenAI API inference engine. For parameters and documentation, refer to:
- https://azure.microsoft.com/en-us/products/ai-services/openai-service
- https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart
Parameters:
model : str
    model name as described in https://platform.openai.com/docs/models
api_version : str
    the Azure OpenAI API version.
config : VLMConfig
    the VLM configuration.
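Example (an instantiation sketch; assumes the Azure endpoint and key are supplied the way the underlying Azure OpenAI client expects, typically via the AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY environment variables; the deployment name and API version are placeholders). Once constructed, the engine is used exactly like OpenAIVLMEngine, which it subclasses:

```python
from vlm4ocr.vlm_engines import AzureOpenAIVLMEngine

# Assumes AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY are set in the environment.
engine = AzureOpenAIVLMEngine(
    model="gpt-4o",            # your Azure deployment / model name (placeholder)
    api_version="2024-02-01",  # placeholder API version
)
```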