CLI Reference

usage: vlm4ocr [-h] --input_path INPUT_PATH
               [--output_mode {markdown,HTML,text}]
               [--output_path OUTPUT_PATH] [--skip_existing]
               [--rotate_correction]
               [--max_dimension_pixels MAX_DIMENSION_PIXELS] --vlm_engine
               {openai,azure_openai,ollama,openai_compatible} --model MODEL
               [--max_new_tokens MAX_NEW_TOKENS] [--temperature TEMPERATURE]
               [--api_key API_KEY] [--base_url BASE_URL]
               [--azure_api_key AZURE_API_KEY]
               [--azure_endpoint AZURE_ENDPOINT]
               [--azure_api_version AZURE_API_VERSION]
               [--ollama_host OLLAMA_HOST] [--ollama_num_ctx OLLAMA_NUM_CTX]
               [--ollama_keep_alive OLLAMA_KEEP_ALIVE]
               [--user_prompt USER_PROMPT]
               [--concurrent_batch_size CONCURRENT_BATCH_SIZE]
               [--max_file_load MAX_FILE_LOAD] [--log] [--debug]

VLM4OCR: Perform OCR on images, PDFs, or TIFF files using Vision Language
Models. Processing is concurrent by default.

options:
  -h, --help            show this help message and exit

Input/Output Options:
  --input_path INPUT_PATH
                        Path to a single input file or a directory of files.
                        (default: None)
  --output_mode {markdown,HTML,text}
                        Output format. (default: markdown)
  --output_path OUTPUT_PATH
                        Optional: Path to save OCR results. If input_path is a
                        directory of multiple files, this should be an output
                        directory. If input is a single file, this can be a
                        full file path or a directory. If not provided,
                        results are saved to the current working directory (or
                        a sub-directory for logs if --log is used). (default:
                        None)
  --skip_existing       Skip processing files that already have OCR results in
                        the output directory. (default: False)

Image Processing Parameters:
  --rotate_correction   Enable automatic rotation correction for input images.
                        This requires Tesseract OCR to be installed and
                        configured correctly. (default: False)
  --max_dimension_pixels MAX_DIMENSION_PIXELS
                        Maximum dimension (width or height) in pixels for
                        input images. Images larger than this will be resized
                        to fit within this limit while maintaining aspect
                        ratio. (default: 4000)
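
The resizing rule described for --max_dimension_pixels can be illustrated with a short sketch. It only mirrors the arithmetic stated in the help text (shrink so the longer side fits the limit, keep the aspect ratio). The function name `fit_within` is hypothetical, and the rounding choice is an assumption; the tool's own implementation may differ.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_dimension: int = 4000) -> Tuple[int, int]:
    """Return a size whose longer side is at most max_dimension, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_dimension:
        # Already within the limit: the size is returned unchanged.
        return width, height
    # Scale both sides by the same factor. Rounding to the nearest pixel is an
    # assumption made for this sketch, not something the help text specifies.
    new_width = max(1, round(width * max_dimension / longest))
    new_height = max(1, round(height * max_dimension / longest))
    return new_width, new_height


print(fit_within(6000, 3000))  # (4000, 2000)
print(fit_within(1200, 800))   # (1200, 800)
```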

VLM Engine Options:
  --vlm_engine {openai,azure_openai,ollama,openai_compatible}
                        VLM engine. (default: None)
  --model MODEL         Model identifier for the VLM engine. (default: None)
  --max_new_tokens MAX_NEW_TOKENS
                        Max new tokens for VLM. (default: 4096)
  --temperature TEMPERATURE
                        Sampling temperature. (default: 0.0)

OpenAI & OpenAI-Compatible Options:
  --api_key API_KEY     API key. (default: None)
  --base_url BASE_URL   Base URL for OpenAI-compatible services. (default:
                        None)

Azure OpenAI Options:
  --azure_api_key AZURE_API_KEY
                        Azure API key. (default: None)
  --azure_endpoint AZURE_ENDPOINT
                        Azure endpoint URL. (default: None)
  --azure_api_version AZURE_API_VERSION
                        Azure API version. (default: None)

Ollama Options:
  --ollama_host OLLAMA_HOST
                        Ollama host URL. (default: http://localhost:11434)
  --ollama_num_ctx OLLAMA_NUM_CTX
                        Context length for Ollama. (default: 4096)
  --ollama_keep_alive OLLAMA_KEEP_ALIVE
                        Ollama keep_alive seconds. (default: 300)
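
As an illustration of how the required flags combine with the Ollama options above, the following sketch assembles an argument list from Python. It uses only flags listed in this reference. The helper name `build_ollama_command`, the model name `my-vision-model`, and the `./scans` and `./ocr_out` paths are placeholders, not values taken from the tool.

```python
import shlex
from typing import List, Optional


def build_ollama_command(
    input_path: str,
    model: str,
    output_path: Optional[str] = None,
    host: str = "http://localhost:11434",
    num_ctx: int = 4096,
    keep_alive: int = 300,
) -> List[str]:
    """Build a vlm4ocr argument list for the ollama engine (illustrative helper)."""
    cmd = [
        "vlm4ocr",
        "--input_path", input_path,
        "--vlm_engine", "ollama",
        "--model", model,
        "--ollama_host", host,
        "--ollama_num_ctx", str(num_ctx),
        "--ollama_keep_alive", str(keep_alive),
    ]
    if output_path is not None:
        # --output_path is optional; without it, results go to the working directory.
        cmd += ["--output_path", output_path]
    return cmd


# Placeholder model name and paths, chosen only for this example.
cmd = build_ollama_command("./scans", "my-vision-model", output_path="./ocr_out")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once vlm4ocr is installed and an Ollama server is reachable at the given host.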

OCR Engine Parameters:
  --user_prompt USER_PROMPT
                        Custom user prompt. (default: None)

Processing Options:
  --concurrent_batch_size CONCURRENT_BATCH_SIZE
                        Number of images/pages to process concurrently. Set to
                        1 for sequential processing of VLM calls. (default: 4)
  --max_file_load MAX_FILE_LOAD
                        Number of input files to pre-load. Set to -1 for
                        automatic config: 2 * concurrent_batch_size. (default:
                        -1)
  --log                 Enable writing logs to a timestamped file in the
                        output directory. (default: False)
  --debug               Enable debug level logging for console (and file if
                        --log is active). (default: False)
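
The automatic setting for --max_file_load follows the rule stated above: -1 resolves to 2 * concurrent_batch_size. A minimal sketch of that arithmetic is shown here. The function name `resolve_max_file_load` is hypothetical, and the handling of values other than -1 (passed through unchanged) is an assumption of this sketch.

```python
def resolve_max_file_load(max_file_load: int = -1, concurrent_batch_size: int = 4) -> int:
    """Resolve the pre-load count, treating -1 as 'automatic'."""
    if max_file_load == -1:
        # Automatic config as described in the help text.
        return 2 * concurrent_batch_size
    # Any other value is returned as given (assumption: no further validation).
    return max_file_load


print(resolve_max_file_load())                          # 8, with the documented defaults
print(resolve_max_file_load(concurrent_batch_size=8))   # 16
print(resolve_max_file_load(max_file_load=3))           # 3
```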