Skip to content

refactor(iast): prevent out-of-bounds read in slice aspect with negative step#18129

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 2 commits into
mainfrom
kowalski/fix-iast-prevent-out-of-bounds-read-in-slice-aspect-with-negative-step
May 19, 2026
Merged

refactor(iast): prevent out-of-bounds read in slice aspect with negative step#18129
gh-worker-dd-mergequeue-cf854d[bot] merged 2 commits into
mainfrom
kowalski/fix-iast-prevent-out-of-bounds-read-in-slice-aspect-with-negative-step

Conversation

@KowalskiThomas
Copy link
Copy Markdown
Contributor

@KowalskiThomas KowalskiThomas commented May 18, 2026

Description

build_index_range_map executes a loop:

for (auto i = start_int; i < stop_int; i += step_int) {
  index_range_map_result.emplace_back(index_range_map[i]);
}

... where start_int, stop_int, and step_int are all signed longs read directly from caller-controlled PyLong objects (via PyLong_AsLong) with no normalization through PySlice_AdjustIndices. When the application code uses a negative-step slice on a tainted string (e.g. tainted[::-1]), the AST visitor at ddtrace/appsec/_iast/_ast/visitor.py rewrites the expression to slice_aspect(s, None, None, -1).
Inside build_index_range_map, start_int=0, stop_int=length_text, step_int=-1. The first iteration uses i=0 (valid); the second iteration uses i=-1, then -2, etc.

Each index_range_map[i] call invokes std::vector<TaintRangePtr>::operator[](size_t), which converts the negative long to (size_t)-1 = SIZE_MAX, reading out of bounds. The result is then emplace_back'ed into the result vector and used to build a shared_ptr<TaintRange> by dereferencing the obtained pointer.

Reproducer that crashes pre-fix/

Details
#!/usr/bin/env python3
"""
Reproducer for: out-of-bounds read in slice aspect with negative step.

With the buggy code, build_index_range_map looped:

    for (auto i = start_int; i < stop_int; i += step_int)

With step=-1: i = 0, -1, -2, ... all satisfy i < stop_int, so the loop never
terminates and reads past the beginning of the vector (UB / crash).

Run with:
    DD_IAST_ENABLED=1 python repro.py
"""
import os

os.environ.setdefault("DD_IAST_ENABLED", "1")
from ddtrace.appsec._iast._taint_tracking import OriginType
from ddtrace.appsec._iast._taint_tracking import initialize_native_state
from ddtrace.appsec._iast._taint_tracking._taint_objects import taint_pyobject
from ddtrace.appsec._iast._taint_tracking._taint_objects_base import get_tainted_ranges
from tests.appsec.iast.iast_utils import (
    _end_iast_context_and_oce,
    _iast_patched_module,
    _start_iast_context_and_oce,
)

# initialize_native_state allocates the C++ TaintEngineContext singleton;
# without it start_request_context() always returns None.
initialize_native_state()

# The module must be loaded patched so that slicing goes through the aspect.
mod = _iast_patched_module("benchmarks.bm.iast_fixtures.str_methods")


def main() -> None:
    _start_iast_context_and_oce()
    try:
        s: str = taint_pyobject(
            pyobject="hello",
            source_name="input",
            source_value="hello",
            source_origin=OriginType.PARAMETER,
        )

        # All three cases use a negative step → triggered the OOB read.
        # do_slice(text, first, second, third) maps to text[first:second:third];
        # passing None uses the default (same as omitting the index).
        cases: list[tuple[str, object, object, int]] = [
            ("s[::-1]",   None, None, -1),
            ("s[::-2]",   None, None, -2),
            ("s[4:0:-1]", 4,    0,    -1),
        ]

        for label, first, second, step in cases:
            print(f"trying {label} ...", flush=True)
            result = mod.do_slice(s, first, second, step)
            ranges = get_tainted_ranges(result)
            print(f"  result={result!r}, tainted_ranges={ranges}")
            assert len(ranges) == 1, f"expected 1 tainted range, got {ranges}"

        print("OK - all slices returned correct tainted ranges")
    finally:
        _end_iast_context_and_oce()


if __name__ == "__main__":
    main()

@KowalskiThomas KowalskiThomas added the changelog/no-changelog A changelog entry is not required for this PR. label May 18, 2026
@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented May 18, 2026

Benchmarks

Benchmark execution time: 2026-05-18 15:21:27

Comparing candidate commit 12c71cd in PR branch kowalski/fix-iast-prevent-out-of-bounds-read-in-slice-aspect-with-negative-step with baseline commit 7ec0ac7 in branch main.

Found 0 performance improvements and 1 performance regressions! Performance is the same for 79 metrics, 0 unstable metrics.

scenario:iastaspectsospath-ospathbasename_aspect

  • 🟥 execution_time [+103.570µs; +111.154µs] or [+24.383%; +26.169%]

@KowalskiThomas KowalskiThomas force-pushed the kowalski/fix-iast-prevent-out-of-bounds-read-in-slice-aspect-with-negative-step branch from 63fa06a to 36f686a Compare May 18, 2026 09:13
@cit-pr-commenter-54b7da
Copy link
Copy Markdown

cit-pr-commenter-54b7da Bot commented May 18, 2026

Codeowners resolved as

ddtrace/appsec/_iast/_taint_tracking/aspects/aspect_slice.cpp           @DataDog/asm-python
tests/appsec/iast/aspects/test_slice_aspect_fixtures.py                 @DataDog/asm-python

@KowalskiThomas KowalskiThomas changed the title fix(iast): prevent out-of-bounds read in slice aspect with negative step refactor(iast): prevent out-of-bounds read in slice aspect with negative step May 18, 2026
@KowalskiThomas KowalskiThomas added the ASM Application Security Monitoring label May 18, 2026
@KowalskiThomas KowalskiThomas marked this pull request as ready for review May 18, 2026 13:19
@KowalskiThomas KowalskiThomas requested a review from a team as a code owner May 18, 2026 13:19
@KowalskiThomas KowalskiThomas requested a review from avara1986 May 18, 2026 15:09
Copy link
Copy Markdown
Member

@avara1986 avara1986 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot for your findings and for improving the code 😉

@avara1986
Copy link
Copy Markdown
Member

/merge

@gh-worker-devflow-routing-ef8351
Copy link
Copy Markdown

gh-worker-devflow-routing-ef8351 Bot commented May 19, 2026

View all feedbacks in Devflow UI.

2026-05-19 06:57:50 UTC ℹ️ Start processing command /merge


2026-05-19 06:57:55 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 54m (p90).


2026-05-19 07:43:58 UTC ℹ️ MergeQueue: This merge request was merged

@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit 72b2cd8 into main May 19, 2026
585 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the kowalski/fix-iast-prevent-out-of-bounds-read-in-slice-aspect-with-negative-step branch May 19, 2026 07:43
vlad-scherbich pushed a commit that referenced this pull request May 20, 2026
…ive step (#18129)

## Description

`build_index_range_map` executes a loop:

```cpp
for (auto i = start_int; i < stop_int; i += step_int) {
  index_range_map_result.emplace_back(index_range_map[i]);
}
```

... where `start_int`, `stop_int`, and `step_int` are all signed longs read directly from caller-controlled `PyLong` objects (via `PyLong_AsLong`) with no normalization through `PySlice_AdjustIndices`. When the application code uses a negative-step slice on a tainted string (e.g. `tainted[::-1]`), the AST visitor at `ddtrace/appsec/_iast/_ast/visitor.py` rewrites the expression to `slice_aspect(s, None, None, -1)`. 
Inside `build_index_range_map`, `start_int=0`, `stop_int=length_text`, `step_int=-1`. The first iteration uses  `i=0` (valid); the second iteration uses `i=-1`, then `-2`, etc. 

Each `index_range_map[i]` call invokes `std::vector<TaintRangePtr>::operator[](size_t)`, which converts the negative long to `(size_t)-1 = SIZE_MAX`, reading  out of bounds. The result is then `emplace_back`'ed into the result vector and used to build a `shared_ptr<TaintRange>` by dereferencing the obtained pointer.

Reproducer that crashes pre-fix/

<details>

```py
#!/usr/bin/env python3
"""
Reproducer for: out-of-bounds read in slice aspect with negative step.

With the buggy code, build_index_range_map looped:

    for (auto i = start_int; i < stop_int; i += step_int)

With step=-1: i = 0, -1, -2, ... all satisfy i < stop_int, so the loop never
terminates and reads past the beginning of the vector (UB / crash).

Run with:
    DD_IAST_ENABLED=1 python repro.py
"""
import os

os.environ.setdefault("DD_IAST_ENABLED", "1")
from ddtrace.appsec._iast._taint_tracking import OriginType
from ddtrace.appsec._iast._taint_tracking import initialize_native_state
from ddtrace.appsec._iast._taint_tracking._taint_objects import taint_pyobject
from ddtrace.appsec._iast._taint_tracking._taint_objects_base import get_tainted_ranges
from tests.appsec.iast.iast_utils import (
    _end_iast_context_and_oce,
    _iast_patched_module,
    _start_iast_context_and_oce,
)

# initialize_native_state allocates the C++ TaintEngineContext singleton;
# without it start_request_context() always returns None.
initialize_native_state()

# The module must be loaded patched so that slicing goes through the aspect.
mod = _iast_patched_module("benchmarks.bm.iast_fixtures.str_methods")


def main() -> None:
    _start_iast_context_and_oce()
    try:
        s: str = taint_pyobject(
            pyobject="hello",
            source_name="input",
            source_value="hello",
            source_origin=OriginType.PARAMETER,
        )

        # All three cases use a negative step → triggered the OOB read.
        # do_slice(text, first, second, third) maps to text[first:second:third];
        # passing None uses the default (same as omitting the index).
        cases: list[tuple[str, object, object, int]] = [
            ("s[::-1]",   None, None, -1),
            ("s[::-2]",   None, None, -2),
            ("s[4:0:-1]", 4,    0,    -1),
        ]

        for label, first, second, step in cases:
            print(f"trying {label} ...", flush=True)
            result = mod.do_slice(s, first, second, step)
            ranges = get_tainted_ranges(result)
            print(f"  result={result!r}, tainted_ranges={ranges}")
            assert len(ranges) == 1, f"expected 1 tainted range, got {ranges}"

        print("OK - all slices returned correct tainted ranges")
    finally:
        _end_iast_context_and_oce()


if __name__ == "__main__":
    main()
```

</details>

Co-authored-by: alberto.vara <alberto.vara@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ASM Application Security Monitoring changelog/no-changelog A changelog entry is not required for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants