1 change: 1 addition & 0 deletions .bazelrc
@@ -264,6 +264,7 @@ test:linux --test_env LD_LIBRARY_PATH=/opt/opencv/lib/:/opt/intel/openvino/runti
test:linux --test_env OPENVINO_TOKENIZERS_PATH_GENAI=/opt/intel/openvino/runtime/lib/intel64/libopenvino_tokenizers.so
test:linux --test_env PYTHONPATH=/opt/intel/openvino/python:/ovms/bazel-bin/src/python/binding
test:linux --test_env no_proxy=localhost
test:linux --test_env "OVMS_MEDIA_URL_ALLOW_REDIRECTS=1"

# Bazelrc imports ############################################################################################################################
# file below should contain sth like
14 changes: 7 additions & 7 deletions demos/continuous_batching/vlm/README.md
@@ -30,7 +30,7 @@ Select deployment option depending on how you prepared models in the previous step
Running this command starts the container with CPU only target device:
```bash
mkdir -p models
docker run -d -u $(id -u):$(id -g) --rm -p 8000:8000 -v $(pwd)/models:/models:rw openvino/model_server:latest --rest_port 8000 --source_model OpenVINO/InternVL2-2B-int4-ov --model_repository_path /models --model_name OpenGVLab/InternVL2-2B --task text_generation --pipeline_type VLM
docker run -d -u $(id -u):$(id -g) --rm -p 8000:8000 -v $(pwd)/models:/models:rw openvino/model_server:latest --rest_port 8000 --source_model OpenVINO/InternVL2-2B-int4-ov --model_repository_path /models --model_name OpenGVLab/InternVL2-2B --task text_generation --pipeline_type VLM --allowed_media_domains raw.githubusercontent.com
```
**GPU**

@@ -39,7 +39,7 @@ to `docker run` command, use the image with GPU support.
It can be applied using the commands below:
```bash
mkdir -p models
docker run -d -u $(id -u):$(id -g) --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v $(pwd)/models:/models:rw openvino/model_server:latest-gpu --rest_port 8000 --source_model OpenVINO/InternVL2-2B-int4-ov --model_repository_path models --model_name OpenGVLab/InternVL2-2B --task text_generation --target_device GPU --pipeline_type VLM
docker run -d -u $(id -u):$(id -g) --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v $(pwd)/models:/models:rw openvino/model_server:latest-gpu --rest_port 8000 --source_model OpenVINO/InternVL2-2B-int4-ov --model_repository_path models --model_name OpenGVLab/InternVL2-2B --task text_generation --target_device GPU --pipeline_type VLM --allowed_media_domains raw.githubusercontent.com
```
:::

@@ -49,11 +49,11 @@ If you run on GPU make sure to have appropriate drivers installed, so the device

```bat
mkdir models
ovms --rest_port 8000 --source_model OpenVINO/InternVL2-2B-int4-ov --model_repository_path models --model_name OpenGVLab/InternVL2-2B --task text_generation --pipeline_type VLM --target_device CPU
ovms --rest_port 8000 --source_model OpenVINO/InternVL2-2B-int4-ov --model_repository_path models --model_name OpenGVLab/InternVL2-2B --task text_generation --pipeline_type VLM --target_device CPU --allowed_media_domains raw.githubusercontent.com
```
or
```bat
ovms --rest_port 8000 --source_model OpenVINO/InternVL2-2B-int4-ov --model_repository_path models --model_name OpenGVLab/InternVL2-2B --task text_generation --pipeline_type VLM --target_device GPU
ovms --rest_port 8000 --source_model OpenVINO/InternVL2-2B-int4-ov --model_repository_path models --model_name OpenGVLab/InternVL2-2B --task text_generation --pipeline_type VLM --target_device GPU --allowed_media_domains raw.githubusercontent.com
```
:::

@@ -140,15 +140,15 @@ Select deployment option depending on how you prepared models in the previous step

Running this command starts the container with CPU only target device:
```bash
docker run -d --rm -p 8000:8000 -v $(pwd)/models:/models:ro openvino/model_server:latest --rest_port 8000 --model_name OpenGVLab/InternVL2-2B --model_path /models/OpenGVLab/InternVL2-2B
docker run -d --rm -p 8000:8000 -v $(pwd)/models:/models:ro openvino/model_server:latest --rest_port 8000 --model_name OpenGVLab/InternVL2-2B --model_path /models/OpenGVLab/InternVL2-2B --allowed_media_domains raw.githubusercontent.com
```
**GPU**

In case you want to use GPU device to run the generation, add extra docker parameters `--device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1)`
to `docker run` command, use the image with GPU support. Export the models with precision matching the GPU capacity and adjust pipeline configuration.
It can be applied using the commands below:
```bash
docker run -d --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v $(pwd)/models:/models:ro openvino/model_server:latest-gpu --rest_port 8000 --model_name OpenGVLab/InternVL2-2B --model_path /models/OpenGVLab/InternVL2-2B
docker run -d --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v $(pwd)/models:/models:ro openvino/model_server:latest-gpu --rest_port 8000 --model_name OpenGVLab/InternVL2-2B --model_path /models/OpenGVLab/InternVL2-2B --allowed_media_domains raw.githubusercontent.com
```
:::

@@ -200,7 +200,7 @@ Let's send a request with text and an image in the messages context.
![zebra](../../../demos/common/static/images/zebra.jpeg)

:::{dropdown} **Unary call with curl using image url**

**Note**: Using URLs in requests requires the `--allowed_media_domains` parameter described [here](parameters.md)

```bash
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" -d "{ \"model\": \"OpenGVLab/InternVL2-2B\", \"messages\":[{\"role\": \"user\", \"content\": [{\"type\": \"text\", \"text\": \"Describe what is on the picture.\"},{\"type\": \"image_url\", \"image_url\": {\"url\": \"http://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/3/demos/common/static/images/zebra.jpeg\"}}]}], \"max_completion_tokens\": 100}"
1 change: 1 addition & 0 deletions docs/model_server_rest_api_chat.md
@@ -112,6 +112,7 @@ curl http://localhost/v3/chat/completions \
"max_completion_tokens": 128
}'
```
**Note**: Using URLs in requests requires the `--allowed_media_domains` parameter described [here](parameters.md)
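For illustration, a minimal sketch of starting a server so that image URLs from `raw.githubusercontent.com` are accepted (the model name and mount paths are placeholders borrowed from the VLM demo, not requirements of this API):

```bash
# Hypothetical deployment: only images hosted on raw.githubusercontent.com
# may be referenced via image_url in chat completion requests
docker run -d --rm -p 8000:8000 -v $(pwd)/models:/models:ro openvino/model_server:latest \
  --rest_port 8000 --model_name OpenGVLab/InternVL2-2B --model_path /models/OpenGVLab/InternVL2-2B \
  --allowed_media_domains raw.githubusercontent.com
```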

3) Image from local filesystem:
```
3 changes: 2 additions & 1 deletion docs/parameters.md
@@ -20,7 +20,6 @@
| `"low_latency_transformation"` | `bool` | If set to true, model server will apply [low latency transformation](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-request/stateful-models/obtaining-stateful-openvino-model.html#lowlatency2-transformation) on model load. |
| `"metrics_enable"` | `bool` | Flag enabling [metrics](metrics.md) endpoint on rest_port. |
| `"metrics_list"` | `string` | Comma separated list of [metrics](metrics.md). If unset, only default metrics will be enabled.|
| `"allowed_local_media_path"` | `string` | Path to the directory containing images to include in requests. If unset, local filesystem images in requests are not supported.|

> **Note** : Specifying config_path is mutually exclusive with putting model parameters in the CLI ([serving multiple models](./starting_server.md)).

@@ -57,6 +56,8 @@ Configuration options for the server are defined only via command-line options a
| `allowed_methods` | `string` (default: *) | Comma-separated list of allowed methods in CORS requests. |
| `allowed_origins` | `string` (default: *) | Comma-separated list of allowed origins in CORS requests. |
| `api_key_file` | `string` | Path to the text file with the API key for generative endpoints `/v3/`. The value of the first line is used. If not specified, the server uses the environment variable API_KEY. If neither is set, requests will not require authorization.|
| `allowed_local_media_path` | `string` | Path to the directory containing images to include in requests. If unset, local filesystem images in requests are not supported.|
| `allowed_media_domains` | `string` | Comma-separated list of media domains from which URLs can be used as input for LLMs. Set to "all" to disable this restriction. If unset, URLs in requests are not supported.|
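For illustration, a minimal sketch combining both media options (the model name, paths, and domains are placeholders, not defaults):

```bash
# Hypothetical invocation: allow local images from /data/images and
# remote images only from the listed domains
ovms --rest_port 8000 --model_name my_vlm --model_path /models/my_vlm \
  --allowed_local_media_path /data/images \
  --allowed_media_domains raw.githubusercontent.com,upload.wikimedia.org
```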

## Config management mode options

2 changes: 2 additions & 0 deletions docs/security_considerations.md
@@ -26,6 +26,8 @@ See also:
Generative endpoints starting with `/v3` might be restricted with authorization via an API key. It can be set during server initialization with the `api_key_file` parameter or the `API_KEY` environment variable.
The `api_key_file` should contain a path to the file holding the API key value; the first line of the file is used. If neither the `api_key_file` parameter nor the `API_KEY` variable is set, the server will not require any authorization. The client should send the API key inside the `Authorization` header as `Bearer <api_key>`.

OVMS supports multimodal models with image inputs provided as URLs. However, to prevent Server-Side Request Forgery (SSRF) attacks, all URLs are rejected by default. To allow fetching images from specific domains, use the `--allowed_media_domains` parameter described [here](parameters.md). Also consider setting `OVMS_MEDIA_URL_ALLOW_REDIRECTS=1` to allow HTTP redirects; they are disabled by default to prevent bypassing the domain restrictions.
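For illustration, a minimal sketch of such a deployment (the model and domain are placeholders mirroring the VLM demo):

```bash
# Hypothetical deployment: image URLs restricted to a single domain, with
# HTTP redirects explicitly re-enabled via the environment variable
docker run -d --rm -p 8000:8000 \
  -e OVMS_MEDIA_URL_ALLOW_REDIRECTS=1 \
  -v $(pwd)/models:/models:ro openvino/model_server:latest \
  --rest_port 8000 --model_name OpenGVLab/InternVL2-2B \
  --model_path /models/OpenGVLab/InternVL2-2B \
  --allowed_media_domains raw.githubusercontent.com
```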

---

OpenVINO Model Server has a set of mechanisms preventing denial of service attacks from the client applications. They include the following:
32 changes: 32 additions & 0 deletions src/capi_frontend/capi.cpp
@@ -574,6 +574,38 @@ DLL_PUBLIC OVMS_Status* OVMS_ServerSettingsSetLogPath(OVMS_ServerSettings* settings,
return nullptr;
}

DLL_PUBLIC OVMS_Status* OVMS_ServerSettingsSetAllowedLocalMediaPath(OVMS_ServerSettings* settings,

> **Review comment (Collaborator):** OVMS_API_VERSION_MINOR?

const char* allowed_local_media_path) {
if (settings == nullptr) {
return reinterpret_cast<OVMS_Status*>(new Status(StatusCode::NONEXISTENT_PTR, "server settings"));
}
if (allowed_local_media_path == nullptr) {
return reinterpret_cast<OVMS_Status*>(new Status(StatusCode::NONEXISTENT_PTR, "allowed local media path"));
}
ovms::ServerSettingsImpl* serverSettings = reinterpret_cast<ovms::ServerSettingsImpl*>(settings);
serverSettings->allowedLocalMediaPath = allowed_local_media_path;
return nullptr;
}

DLL_PUBLIC OVMS_Status* OVMS_ServerSettingsSetAllowedMediaDomains(OVMS_ServerSettings* settings,
const char* allowed_media_domains) {
if (settings == nullptr) {
return reinterpret_cast<OVMS_Status*>(new Status(StatusCode::NONEXISTENT_PTR, "server settings"));
}
if (allowed_media_domains == nullptr) {
return reinterpret_cast<OVMS_Status*>(new Status(StatusCode::NONEXISTENT_PTR, "allowed media domains"));
}
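// Split the comma-separated domain list into individual entries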
std::vector<std::string> domains;
std::string domain;
std::istringstream ss(allowed_media_domains);
while (std::getline(ss, domain, ',')) {
domains.push_back(domain);
}
ovms::ServerSettingsImpl* serverSettings = reinterpret_cast<ovms::ServerSettingsImpl*>(settings);
serverSettings->allowedMediaDomains = domains;
return nullptr;
}

DLL_PUBLIC OVMS_Status* OVMS_ModelsSettingsSetConfigPath(OVMS_ModelsSettings* settings,
const char* config_path) {
if (settings == nullptr) {
1 change: 1 addition & 0 deletions src/capi_frontend/server_settings.hpp
@@ -187,6 +187,7 @@ struct ServerSettingsImpl {
std::string metricsList;
std::string cpuExtensionLibraryPath;
std::optional<std::string> allowedLocalMediaPath;
std::optional<std::vector<std::string>> allowedMediaDomains;
std::string logLevel = "INFO";
std::string logPath;
bool allowCredentials = false;
7 changes: 7 additions & 0 deletions src/cli_parser.cpp
@@ -147,6 +147,10 @@ std::variant<bool, std::pair<int, std::string>> CLIParser::parse(int argc, char*
"A path to shared library containing custom CPU layer implementation. Default: empty.",
cxxopts::value<std::string>()->default_value(""),
"CPU_EXTENSION")
("allowed_media_domains",
"Comma separated list of media domains from which URLs can be used as input for LLMs. Set to \"all\" to disable this restriction.",
cxxopts::value<std::vector<std::string>>(),
"ALLOWED_MEDIA_DOMAINS")
("allowed_local_media_path",
"Path to directory that contains multimedia files that can be used as input for LLMs.",
cxxopts::value<std::string>(),
@@ -502,6 +506,9 @@ void CLIParser::prepareServer(ServerSettingsImpl& serverSettings) {
if (result->count("cpu_extension")) {
serverSettings.cpuExtensionLibraryPath = result->operator[]("cpu_extension").as<std::string>();
}
if (result->count("allowed_media_domains")) {
serverSettings.allowedMediaDomains = result->operator[]("allowed_media_domains").as<std::vector<std::string>>();
}
if (result->count("allowed_local_media_path")) {
serverSettings.allowedLocalMediaPath = result->operator[]("allowed_local_media_path").as<std::string>();
}
52 changes: 45 additions & 7 deletions src/llm/apis/openai_completions.cpp
@@ -21,6 +21,7 @@
#include "src/port/rapidjson_stringbuffer.hpp"
#include "src/port/rapidjson_writer.hpp"
#include <set>
#include <cstdlib>
#include <cstring>

#include "openai_json_response.hpp"

@@ -100,7 +101,6 @@ static size_t appendChunkCallback(void* downloadedChunk, size_t size, size_t nme
if (status == CURLE_OK) { \
status = setopt; \
}

static absl::Status downloadImage(const char* url, std::string& image, const int64_t& sizeLimit) {
CURL* curl_handle = curl_easy_init();
if (!curl_handle) {
@@ -113,7 +113,11 @@ static absl::Status downloadImage(const char* url, std::string& image, const int64_t& sizeLimit) {
CURL_SETOPT(curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, appendChunkCallback))
CURL_SETOPT(curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, &image))
CURL_SETOPT(curl_easy_setopt(curl_handle, CURLOPT_SSL_OPTIONS, CURLSSLOPT_NATIVE_CA))
CURL_SETOPT(curl_easy_setopt(curl_handle, CURLOPT_FOLLOWLOCATION, 1L))
const char* envAllowRedirects = std::getenv("OVMS_MEDIA_URL_ALLOW_REDIRECTS");
if (envAllowRedirects != nullptr && (std::strcmp(envAllowRedirects, "1") == 0)) {
SPDLOG_LOGGER_TRACE(llm_calculator_logger, "URL redirects allowed");
CURL_SETOPT(curl_easy_setopt(curl_handle, CURLOPT_FOLLOWLOCATION, 1L))
}
CURL_SETOPT(curl_easy_setopt(curl_handle, CURLOPT_MAXFILESIZE, sizeLimit))

if (status != CURLE_OK) {
@@ -131,6 +135,37 @@ static absl::Status downloadImage(const char* url, std::string& image, const int64_t& sizeLimit) {
return absl::OkStatus();
}

static bool isDomainAllowed(const std::vector<std::string>& allowedDomains, const char* url) {
if (allowedDomains.size() == 1 && allowedDomains[0] == "all") {
return true;
}
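// Parse the URL with libcurl's URL API and compare its hostname against the allow-list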
CURLUcode rc;
CURLU* parsedUrl = curl_url();

> **Review comment (Collaborator):** Are you sure it's a valid pointer at this point?

rc = curl_url_set(parsedUrl, CURLUPART_URL, url, 0);
if (rc) {
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Parsing url {} failed", url);
curl_url_cleanup(parsedUrl);
return false;
}
char* host;
rc = curl_url_get(parsedUrl, CURLUPART_HOST, &host, 0);
if (rc) {
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Parsing url {} hostname failed", url);

> **Review comment (Collaborator):** Leak, missing `curl_url_cleanup(parsedUrl);`? Use guards or free everything in good order?

curl_url_cleanup(parsedUrl);
return false;
}
bool allowed = false;
for (const auto& allowedDomain : allowedDomains) {
if (allowedDomain.compare(host) == 0) {
allowed = true;

> **Review comment (Copilot AI, Jan 23, 2026):** The loop continues iterating even after finding a match. Add a `break` statement after setting `allowed = true` to exit early when a match is found.

break;
}
}
curl_free(host);
curl_url_cleanup(parsedUrl);
return allowed;
}

absl::Status OpenAIChatCompletionsHandler::ensureArgumentsInToolCalls(Value& messageObj, bool& jsonChanged) {
auto& allocator = doc.GetAllocator();
auto toolCallsIt = messageObj.FindMember("tool_calls");
@@ -159,7 +194,7 @@ absl::Status OpenAIChatCompletionsHandler::ensureArgumentsInToolCalls(Value& messageObj, bool& jsonChanged) {
return absl::OkStatus();
}

absl::Status OpenAIChatCompletionsHandler::parseMessages(std::optional<std::string> allowedLocalMediaPath) {
absl::Status OpenAIChatCompletionsHandler::parseMessages(std::optional<std::string> allowedLocalMediaPath, std::optional<std::vector<std::string>> allowedMediaDomains) {
auto it = doc.FindMember("messages");
if (it == doc.MemberEnd())
return absl::InvalidArgumentError("Messages missing in request");
@@ -237,6 +272,9 @@ absl::Status OpenAIChatCompletionsHandler::parseMessages(std::optional<std::string> allowedLocalMediaPath, std::optional<std::vector<std::string>> allowedMediaDomains) {
} else if (std::regex_match(url.c_str(), std::regex("^(http|https|ftp|sftp|)://(.*)"))) {
SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Loading image using curl");
int64_t sizeLimit = 20000000; // restrict single image size to 20MB
if (!allowedMediaDomains.has_value() || !isDomainAllowed(allowedMediaDomains.value(), url.c_str())) {
return absl::InvalidArgumentError("Given url does not match any allowed domain from allowed_media_domains");
}
auto status = downloadImage(url.c_str(), decoded, sizeLimit);
if (status != absl::OkStatus()) {
return status;
@@ -469,9 +507,9 @@ std::string convertOpenAIResponseFormatToStructuralTagStringFormat(const rapidjs
return buffer.GetString();
}

absl::Status OpenAIChatCompletionsHandler::parseChatCompletionsPart(std::optional<uint32_t> maxTokensLimit, std::optional<std::string> allowedLocalMediaPath) {
absl::Status OpenAIChatCompletionsHandler::parseChatCompletionsPart(std::optional<uint32_t> maxTokensLimit, std::optional<std::string> allowedLocalMediaPath, std::optional<std::vector<std::string>> allowedMediaDomains) {
// messages: [{role: content}, {role: content}, ...]; required
auto status = parseMessages(allowedLocalMediaPath);
auto status = parseMessages(allowedLocalMediaPath, allowedMediaDomains);
if (status != absl::OkStatus()) {
return status;
}
@@ -791,14 +829,14 @@ void OpenAIChatCompletionsHandler::incrementProcessedTokens(size_t numTokens) {
usage.completionTokens += numTokens;
}

absl::Status OpenAIChatCompletionsHandler::parseRequest(std::optional<uint32_t> maxTokensLimit, uint32_t bestOfLimit, std::optional<uint32_t> maxModelLength, std::optional<std::string> allowedLocalMediaPath) {
absl::Status OpenAIChatCompletionsHandler::parseRequest(std::optional<uint32_t> maxTokensLimit, uint32_t bestOfLimit, std::optional<uint32_t> maxModelLength, std::optional<std::string> allowedLocalMediaPath, std::optional<std::vector<std::string>> allowedMediaDomains) {
absl::Status status = parseCommonPart(maxTokensLimit, bestOfLimit, maxModelLength);
if (status != absl::OkStatus())
return status;
if (endpoint == Endpoint::COMPLETIONS)
status = parseCompletionsPart();
else
status = parseChatCompletionsPart(maxTokensLimit, allowedLocalMediaPath);
status = parseChatCompletionsPart(maxTokensLimit, allowedLocalMediaPath, allowedMediaDomains);

return status;
}
6 changes: 3 additions & 3 deletions src/llm/apis/openai_completions.hpp
@@ -72,7 +72,7 @@ class OpenAIChatCompletionsHandler {
std::unique_ptr<OutputParser> outputParser = nullptr;

absl::Status parseCompletionsPart();
absl::Status parseChatCompletionsPart(std::optional<uint32_t> maxTokensLimit, std::optional<std::string> allowedLocalMediaPath);
absl::Status parseChatCompletionsPart(std::optional<uint32_t> maxTokensLimit, std::optional<std::string> allowedLocalMediaPath, std::optional<std::vector<std::string>> allowedMediaDomains);
absl::Status parseCommonPart(std::optional<uint32_t> maxTokensLimit, uint32_t bestOfLimit, std::optional<uint32_t> maxModelLength);

ParsedOutput parseOutputIfNeeded(const std::vector<int64_t>& generatedIds);
@@ -112,8 +112,8 @@

void incrementProcessedTokens(size_t numTokens = 1);

absl::Status parseRequest(std::optional<uint32_t> maxTokensLimit, uint32_t bestOfLimit, std::optional<uint32_t> maxModelLength, std::optional<std::string> allowedLocalMediaPath = std::nullopt);
absl::Status parseMessages(std::optional<std::string> allowedLocalMediaPath = std::nullopt);
absl::Status parseRequest(std::optional<uint32_t> maxTokensLimit, uint32_t bestOfLimit, std::optional<uint32_t> maxModelLength, std::optional<std::string> allowedLocalMediaPath = std::nullopt, std::optional<std::vector<std::string>> allowedMediaDomains = std::nullopt);
absl::Status parseMessages(std::optional<std::string> allowedLocalMediaPath = std::nullopt, std::optional<std::vector<std::string>> allowedMediaDomains = std::nullopt);
absl::Status parseTools();
const bool areToolsAvailable() const;
