CLI Reference

usage: vlm4ocr [-h] --input_path INPUT_PATH
               [--output_mode {markdown,HTML,text,JSON,bbox}]
               [--output_path OUTPUT_PATH] [--skip_existing]
               [--rotate_correction {tesseract,vlm}]
               [--max_dimension_pixels MAX_DIMENSION_PIXELS] --vlm_engine
               {openai,azure_openai,ollama,openai_compatible,vllm,sglang,openrouter}
               --model MODEL [--max_new_tokens MAX_NEW_TOKENS]
               [--temperature TEMPERATURE] [--top_p TOP_P]
               [--presence_penalty PRESENCE_PENALTY] [--extra_body EXTRA_BODY]
               [--api_key API_KEY] [--base_url BASE_URL]
               [--azure_api_key AZURE_API_KEY]
               [--azure_endpoint AZURE_ENDPOINT]
               [--azure_api_version AZURE_API_VERSION]
               [--ollama_host OLLAMA_HOST] [--ollama_num_ctx OLLAMA_NUM_CTX]
               [--ollama_keep_alive OLLAMA_KEEP_ALIVE]
               [--user_prompt USER_PROMPT]
               [--user_prompt_file USER_PROMPT_FILE]
               [--system_prompt SYSTEM_PROMPT]
               [--system_prompt_file SYSTEM_PROMPT_FILE]
               [--concurrent_batch_size CONCURRENT_BATCH_SIZE]
               [--max_file_load MAX_FILE_LOAD] [--log] [--debug]

VLM4OCR: Perform OCR on images, PDFs, or TIFF files using Vision Language
Models. Processing is concurrent by default.

options:
  -h, --help            show this help message and exit

Input/Output Options:
  --input_path INPUT_PATH
                        Path to a single input file or a directory of files.
                        Required.
  --output_mode {markdown,HTML,text,JSON,bbox}
                        Output format. 'JSON' requires --user_prompt or
                        --user_prompt_file to define the JSON structure.
                        'bbox' writes per-doc <basename>_ocr.json plus per-
                        page <basename>_ocr_page<N>.png annotated images.
                        (default: markdown)
  --output_path OUTPUT_PATH
                        Optional: Path to save OCR results. If input_path is a
                        directory of multiple files, this should be an output
                        directory. If input is a single file, this can be a
                        full file path or a directory. If not provided,
                        results are saved to the current working directory (or
                        a sub-directory for logs if --log is used). (default:
                        None)
  --skip_existing       Skip processing files that already have OCR results in
                        the output directory. (default: False)

Image Processing Parameters:
  --rotate_correction {tesseract,vlm}
                        Rotation correction method for input images.
                        'tesseract' requires Tesseract OCR to be installed.
                        'vlm' prompts the configured VLM engine. Omit to
                        disable. (default: None)
  --max_dimension_pixels MAX_DIMENSION_PIXELS
                        Maximum dimension (width or height) in pixels for
                        input images. Images larger than this will be resized
                        to fit within this limit while maintaining aspect
                        ratio. (default: 4000)
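
The resize rule for --max_dimension_pixels can be sketched as follows. This is an illustration of "fit within the limit while maintaining aspect ratio", not the tool's actual code: the helper name `fit_within` is hypothetical, and rounding to the nearest pixel is an assumption.

```python
def fit_within(width: int, height: int, max_dimension: int = 4000) -> tuple[int, int]:
    """Return (width, height) scaled so the longer side is at most max_dimension."""
    longest = max(width, height)
    if longest <= max_dimension:
        # Already within the limit: dimensions are returned unchanged.
        return width, height
    # Scale both sides by the same factor to preserve the aspect ratio.
    # Rounding to the nearest pixel is an assumption; each side stays >= 1 px.
    new_width = max(1, round(width * max_dimension / longest))
    new_height = max(1, round(height * max_dimension / longest))
    return new_width, new_height


print(fit_within(6000, 3000))  # (4000, 2000)
print(fit_within(1200, 800))   # (1200, 800)
```

Under these assumptions, a 6000 x 3000 scan is halved on neither axis independently; both sides shrink by the same factor of 2/3.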

VLM Engine Options:
  --vlm_engine {openai,azure_openai,ollama,openai_compatible,vllm,sglang,openrouter}
                        VLM engine to use. Required.
  --model MODEL         Model identifier for the VLM engine. Required.
  --max_new_tokens MAX_NEW_TOKENS
                        Maximum number of new tokens the VLM may generate.
                        (default: 4096)
  --temperature TEMPERATURE
                        Sampling temperature. (default: None)
  --top_p TOP_P         Top-p (nucleus) sampling value. (default: None)
  --presence_penalty PRESENCE_PENALTY
                        Presence penalty. (default: None)
  --extra_body EXTRA_BODY
                        Extra body parameters as a JSON string (e.g.
                        '{"chat_template_kwargs": {"enable_thinking":
                        false}}'). (default: None)
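
Because --extra_body takes a JSON string, hand-quoting it in a shell is easy to get wrong. One way to avoid that is to build the argument list in Python and let `json.dumps` produce the string. The sketch below only assembles the command and does not run it; `build_command`, the file name `scan.pdf`, and the model name `my-model` are placeholders, not part of the CLI.

```python
import json
import shlex


def build_command(input_path, engine, model, extra_body=None):
    """Assemble a vlm4ocr argument list using only flags from the usage text."""
    argv = [
        "vlm4ocr",
        "--input_path", input_path,
        "--vlm_engine", engine,
        "--model", model,
    ]
    if extra_body is not None:
        # json.dumps yields valid JSON (e.g. Python False becomes JSON false).
        argv += ["--extra_body", json.dumps(extra_body)]
    return argv


argv = build_command(
    "scan.pdf",   # placeholder input file
    "vllm",
    "my-model",   # placeholder model identifier
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run` would skip shell quoting entirely; the `shlex.join` output is only for display or copy-paste.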

OpenAI & OpenAI-Compatible Options:
  --api_key API_KEY     API key. (default: None)
  --base_url BASE_URL   Base URL for OpenAI-compatible services. (default:
                        None)

Azure OpenAI Options:
  --azure_api_key AZURE_API_KEY
                        Azure API key. (default: None)
  --azure_endpoint AZURE_ENDPOINT
                        Azure endpoint URL. (default: None)
  --azure_api_version AZURE_API_VERSION
                        Azure API version. (default: None)

Ollama Options:
  --ollama_host OLLAMA_HOST
                        Ollama host URL. (default: http://localhost:11434)
  --ollama_num_ctx OLLAMA_NUM_CTX
                        Context length for Ollama. (default: 4096)
  --ollama_keep_alive OLLAMA_KEEP_ALIVE
                        Ollama keep_alive duration in seconds. (default: 300)

OCR Engine Parameters:
  --user_prompt USER_PROMPT
                        Custom user prompt (inline string). For longer prompts
                        use --user_prompt_file. If both are given,
                        --user_prompt wins with a warning. (default: None)
  --user_prompt_file USER_PROMPT_FILE
                        Path to a text file containing the user prompt.
                        (default: None)
  --system_prompt SYSTEM_PROMPT
                        Custom system prompt (inline string). For longer
                        prompts use --system_prompt_file. If both are given,
                        --system_prompt wins with a warning. (default: None)
  --system_prompt_file SYSTEM_PROMPT_FILE
                        Path to a text file containing the system prompt.
                        (default: None)
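
The precedence described for the prompt options (the inline value wins over the file, with a warning) might look like the sketch below. The function name `resolve_prompt`, the warning text, and the UTF-8 encoding are assumptions made for illustration; the tool's own implementation may differ.

```python
import warnings
from pathlib import Path
from typing import Optional


def resolve_prompt(inline: Optional[str], file_path: Optional[str],
                   name: str = "user_prompt") -> Optional[str]:
    """Pick the prompt text from an inline string or a file, inline first."""
    if inline is not None:
        if file_path is not None:
            # Both were supplied: the inline prompt wins and the file is ignored.
            warnings.warn(f"Both --{name} and --{name}_file given; using --{name}.")
        return inline
    if file_path is not None:
        # Reading as UTF-8 is an assumption, not documented behavior.
        return Path(file_path).read_text(encoding="utf-8")
    return None  # neither option was supplied
```

The same helper would cover --system_prompt and --system_prompt_file by passing `name="system_prompt"`.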

Processing Options:
  --concurrent_batch_size CONCURRENT_BATCH_SIZE
                        Number of images/pages to process concurrently. Set to
                        1 for sequential processing of VLM calls. (default: 4)
  --max_file_load MAX_FILE_LOAD
                        Number of input files to pre-load. Set to -1 for
                        automatic config: 2 * concurrent_batch_size. (default:
                        -1)
  --log                 Enable writing logs to a timestamped file in the
                        output directory. (default: False)
  --debug               Enable debug-level logging on the console (and in the
                        log file if --log is active). (default: False)
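
As a worked example of the --max_file_load rule above: with the defaults (-1 and a batch size of 4), the automatic setting comes out to 8 pre-loaded files. The helper name `effective_max_file_load` is hypothetical and only restates the documented formula.

```python
def effective_max_file_load(max_file_load: int = -1,
                            concurrent_batch_size: int = 4) -> int:
    """Resolve -1 to the documented automatic value, 2 * concurrent_batch_size."""
    if max_file_load == -1:
        return 2 * concurrent_batch_size
    return max_file_load  # an explicit value is used as given


print(effective_max_file_load())        # 8
print(effective_max_file_load(-1, 10))  # 20
print(effective_max_file_load(3, 10))   # 3
```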