Engines API
llm_ie.engines.InferenceEngine
This is an abstract class that defines the interface for LLM inference engines. Child classes that inherit from it can be used in extractors. They must implement the chat() method.
Parameters:
config : LLMConfig
    the LLM configuration. Must be a child class of LLMConfig.
Source code in package/llm-ie/src/llm_ie/engines.py
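As a hedged sketch of what a subclass might look like (the class below is hypothetical, and the base class may define other members not listed on this page), an engine only needs to provide chat():

from typing import Dict, Generator, List, Union

from llm_ie.engines import InferenceEngine

class EchoInferenceEngine(InferenceEngine):
    """Toy engine that echoes the last user message; for illustration only."""

    def chat(
        self,
        messages: List[Dict[str, str]],
        verbose: bool = False,
        stream: bool = False,
    ) -> Union[str, Generator[Dict[str, str], None, None]]:
        # A real engine would forward `messages` to an LLM backend here.
        reply = messages[-1]["content"]
        if verbose:
            print(reply)
        return reply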
chat
abstractmethod
chat(
messages: List[Dict[str, str]],
verbose: bool = False,
stream: bool = False,
) -> Union[str, Generator[Dict[str, str], None, None]]
This method takes chat messages as input and returns LLM-generated text.
Parameters:
messages : List[Dict[str, str]]
    a list of dicts with "role" and "content" keys. role must be one of {"system", "user", "assistant"}.
verbose : bool, Optional
    if True, the LLM-generated text is printed to the terminal in real time.
stream : bool, Optional
    if True, returns a generator that yields the output in real time.
Source code in package/llm-ie/src/llm_ie/engines.py
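A minimal usage sketch, assuming `engine` is any concrete subclass (for example, OllamaInferenceEngine):

messages = [
    {"role": "system", "content": "You are an information extraction assistant."},
    {"role": "user", "content": "List the medications mentioned in the note."},
]

# Blocking call: verbose=True prints the generated text to the terminal as it
# arrives, and the full response is still returned as a single string.
response = engine.chat(messages, verbose=True)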
llm_ie.engines.OllamaInferenceEngine
OllamaInferenceEngine(
model_name: str,
num_ctx: int = 4096,
keep_alive: int = 300,
config: LLMConfig = None,
**kwrs
)
Bases: InferenceEngine
The Ollama inference engine.
Parameters:
model_name : str
    the model name exactly as shown in >> ollama ls
num_ctx : int, Optional
    context length that the LLM will evaluate.
keep_alive : int, Optional
    seconds to keep the model loaded after the last API call.
config : LLMConfig
    the LLM configuration.
Source code in package/llm-ie/src/llm_ie/engines.py
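A minimal sketch, assuming Ollama is running locally and the model tag below (illustrative) has already been pulled:

from llm_ie.engines import OllamaInferenceEngine

engine = OllamaInferenceEngine(
    model_name="llama3.1:8b",  # must match `ollama ls` exactly; illustrative tag
    num_ctx=8192,              # context length the model will evaluate
    keep_alive=300,            # keep the model loaded for 300 s after the last call
)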
chat
chat(
messages: List[Dict[str, str]],
verbose: bool = False,
stream: bool = False,
) -> Union[str, Generator[Dict[str, str], None, None]]
This method takes chat messages as input and returns LLM-generated text.
Parameters:
messages : List[Dict[str, str]]
    a list of dicts with "role" and "content" keys. role must be one of {"system", "user", "assistant"}.
verbose : bool, Optional
    if True, the LLM-generated text is printed to the terminal in real time.
stream : bool, Optional
    if True, returns a generator that yields the output in real time.
Source code in package/llm-ie/src/llm_ie/engines.py
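With stream=True the call returns a generator instead of a string. Reusing `engine` and `messages` from the sketches above (the keys of each yielded dict are not documented on this page, so the sketch just prints each chunk as it arrives):

for chunk in engine.chat(messages, stream=True):
    # Each item is a Dict[str, str]; inspect it to see which keys carry the text.
    print(chunk)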
chat_async
async
Async version of the chat() method. Streaming is not supported.
Source code in package/llm-ie/src/llm_ie/engines.py
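A sketch of calling the async variant, reusing `engine` and `messages` from above and assuming chat_async() accepts the same messages list as chat():

import asyncio

async def main():
    # Streaming is not supported, so the full generated text is returned at once.
    text = await engine.chat_async(messages)
    print(text)

asyncio.run(main())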
llm_ie.engines.OpenAIInferenceEngine
Bases: InferenceEngine
The OpenAI API inference engine. Supports OpenAI models and OpenAI-compatible servers:
- vLLM OpenAI-compatible server (https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html)
- Llama.cpp OpenAI-compatible server (https://llama-cpp-python.readthedocs.io/en/latest/server/)
For parameters and documentation, refer to https://platform.openai.com/docs/api-reference/introduction
Parameters:
model_name : str
    model name as described in https://platform.openai.com/docs/models
Source code in package/llm-ie/src/llm_ie/engines.py
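A minimal sketch. Constructor parameters beyond model_name are not listed on this page; the assumption below is that the underlying OpenAI client reads its key from the OPENAI_API_KEY environment variable, as the official SDK does by default:

import os

from llm_ie.engines import OpenAIInferenceEngine

os.environ.setdefault("OPENAI_API_KEY", "sk-...")  # placeholder key

engine = OpenAIInferenceEngine(model_name="gpt-4o-mini")  # illustrative model name

For an OpenAI-compatible server such as vLLM or Llama.cpp, point the client at that server's base URL instead; see the server documentation linked above.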
chat
chat(
messages: List[Dict[str, str]],
verbose: bool = False,
stream: bool = False,
) -> Union[str, Generator[Dict[str, str], None, None]]
This method takes chat messages as input and returns LLM-generated text.
Parameters:
messages : List[Dict[str, str]]
    a list of dicts with "role" and "content" keys. role must be one of {"system", "user", "assistant"}.
verbose : bool, Optional
    if True, the LLM-generated text is printed to the terminal in real time.
stream : bool, Optional
    if True, returns a generator that yields the output in real time.
Source code in package/llm-ie/src/llm_ie/engines.py
chat_async
async
Async version of the chat() method. Streaming is not supported.
Source code in package/llm-ie/src/llm_ie/engines.py
llm_ie.engines.AzureOpenAIInferenceEngine
Bases: OpenAIInferenceEngine
The Azure OpenAI API inference engine. For parameters and documentation, refer to:
- https://azure.microsoft.com/en-us/products/ai-services/openai-service
- https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart
Parameters:
model : str
    model name as described in https://platform.openai.com/docs/models
api_version : str
    the Azure OpenAI API version
config : LLMConfig
    the LLM configuration.
Source code in package/llm-ie/src/llm_ie/engines.py
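A minimal sketch, assuming the Azure endpoint and key are supplied through the standard AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY environment variables (values below are illustrative):

from llm_ie.engines import AzureOpenAIInferenceEngine

engine = AzureOpenAIInferenceEngine(
    model="gpt-4o-mini",               # deployment/model name; illustrative
    api_version="2024-02-15-preview",  # Azure OpenAI API version; illustrative
)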
llm_ie.engines.HuggingFaceHubInferenceEngine
HuggingFaceHubInferenceEngine(
model: str = None,
token: Union[str, bool] = None,
base_url: str = None,
api_key: str = None,
config: LLMConfig = None,
**kwrs
)
Bases: InferenceEngine
The Huggingface_hub InferenceClient inference engine. For parameters and documentation, refer to https://huggingface.co/docs/huggingface_hub/en/package_reference/inference_client
Parameters:
model : str
    the model name exactly as shown in the Hugging Face repo
token : str, Optional
    the Hugging Face token. If None, the token in os.environ['HF_TOKEN'] is used.
base_url : str, Optional
    the base URL for the LLM server. If None, the default Hugging Face Hub URL is used.
api_key : str, Optional
    the API key for the LLM server.
config : LLMConfig
    the LLM configuration.
Source code in package/llm-ie/src/llm_ie/engines.py
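A minimal sketch; the repo id is illustrative, and the token falls back to os.environ['HF_TOKEN'] when not given:

from llm_ie.engines import HuggingFaceHubInferenceEngine

engine = HuggingFaceHubInferenceEngine(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative Hugging Face repo id
)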
chat
chat(
messages: List[Dict[str, str]],
verbose: bool = False,
stream: bool = False,
) -> Union[str, Generator[Dict[str, str], None, None]]
This method takes chat messages as input and returns LLM-generated text.
Parameters:
messages : List[Dict[str, str]]
    a list of dicts with "role" and "content" keys. role must be one of {"system", "user", "assistant"}.
verbose : bool, Optional
    if True, the LLM-generated text is printed to the terminal in real time.
stream : bool, Optional
    if True, returns a generator that yields the output in real time.
Source code in package/llm-ie/src/llm_ie/engines.py
chat_async
async
Async version of the chat() method. Streaming is not supported.
Source code in package/llm-ie/src/llm_ie/engines.py
llm_ie.engines.LiteLLMInferenceEngine
LiteLLMInferenceEngine(
model: str = None,
base_url: str = None,
api_key: str = None,
config: LLMConfig = None,
)
Bases: InferenceEngine
The LiteLLM inference engine. For parameters and documentation, refer to https://github.com/BerriAI/litellm?tab=readme-ov-file
Parameters:
model : str
    the model name
base_url : str, Optional
    the base URL for the LLM server
api_key : str, Optional
    the API key for the LLM server
config : LLMConfig
    the LLM configuration.
Source code in package/llm-ie/src/llm_ie/engines.py
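A minimal sketch. LiteLLM routes requests by provider-prefixed model names (for example "openai/..." or "ollama/..."); the values below are illustrative:

from llm_ie.engines import LiteLLMInferenceEngine

engine = LiteLLMInferenceEngine(
    model="openai/gpt-4o-mini",  # provider-prefixed model name; illustrative
    api_key="sk-...",            # placeholder
)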
chat
chat(
messages: List[Dict[str, str]],
verbose: bool = False,
stream: bool = False,
) -> Union[str, Generator[Dict[str, str], None, None]]
This method takes chat messages as input and returns LLM-generated text.
Parameters:
messages : List[Dict[str, str]]
    a list of dicts with "role" and "content" keys. role must be one of {"system", "user", "assistant"}.
verbose : bool, Optional
    if True, the LLM-generated text is printed to the terminal in real time.
stream : bool, Optional
    if True, returns a generator that yields the output in real time.
Source code in package/llm-ie/src/llm_ie/engines.py
chat_async
async
Async version of the chat() method. Streaming is not supported.
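Because chat_async() returns a coroutine, several requests can be issued concurrently with asyncio.gather. A hedged sketch, reusing the engine from the sketch above; the helper and prompt are hypothetical, and chat_async() is assumed to take the same messages list as chat():

import asyncio

async def extract_all(engine, notes):
    # One request per note, issued concurrently.
    tasks = [
        engine.chat_async([{"role": "user", "content": f"Extract diagnoses from: {note}"}])
        for note in notes
    ]
    # Results come back in the same order as `notes`.
    return await asyncio.gather(*tasks)

# results = asyncio.run(extract_all(engine, ["note one ...", "note two ..."]))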