Skip to content

Misbehaviour of debug.getlocal() with enabled JIT #1425

@Buristan

Description

@Buristan

The regression was introduced in the commit a32aead "Handle on-trace OOM errors from helper functions." (but was well-hidden, though). Since the following commit, the pcall() emits an additional snapshot, but the pcall() arguments are momentarily purged from the snapshot since they are not used. Hence, in several cases debug.getlocal() can't find the corresponding slot for the previous frame.

In the example below, it happens due to the Lua stack reallocation on the trace start. The source stack lacks the needed slot. Hence, the VM takes unreachable code flow.

-- XXX: simplify `jit.dump()` output.
local type = type
local pcall = pcall
local math_modf = math.modf
local debug_getlocal = debug.getlocal

jit.opt.start('hotloop=1')

local checkers = {}

-- Called twice for the pseudo-type that aliases base Lua type via
-- checkers map.
local function checks(expected_type)
  -- Value expected to be `checks_tab()` or `checks_obj()`
  -- argument. It is always a table.
  local _, value = debug_getlocal(2, 1)
  -- Simple stitching function. Additional arguments are needed to
  -- occupy the corresponding slot.
  math_modf(0, nil, nil)
  -- Start trace now.
  while true do
    -- Base type?
    if type(value) == expected_type then
      return true
    end
    -- Pseudo types fallbacks to the map.
    local checker = checkers[expected_type]
    if checker(value) == true then
      return true
    end
    break
  end
  error('Unreachable path taken')
end

-- Need to be pcalled.
local function checks_tab(_)
  checks('table')
end

local function checks_tab_p(map)
  return pcall(checks_tab, map)
end

local function checks_obj(_)
  checks('obj')
end

checkers['obj'] = checks_tab_p

checks_obj({})
-- Forcify stack reallocation on trace in `checks()`. The source
-- stack lacks the needed slot.
coroutine.wrap(function()
  checks_obj({})
end)()

Results:

LUA_PATH="src/?.lua;;" src/luajit -jdump=is /tmp/t.lua
---- TRACE 1 start t.lua:21
---- TRACE 1 IR
....        SNAP   #0   [ ---- ---- ---- ---- ---- ]
0001    fun SLOAD  #0    R
0002 >  fun EQ     0001  t.lua:13
0003 >  tab SLOAD  #4    T
0004 >  str SLOAD  #2    T
....        SNAP   #1   [ t.lua:13 ---- ---- ---- ---- ]
0005 >  str NE     0004  "table"
....        SNAP   #2   [ t.lua:13 ---- ---- ---- ---- ]
0006 >  i64 UREFO  t.lua:13  #3  
0007    p64 SUB    0006  0000
0008 >  p64 UGT    0007  +40 
0009 >  tab ULOAD  0006
0010    p64 HREF   0009  0004
0011 >  fun HLOAD  0010
0012 >  fun EQ     0011  t.lua:41
....        SNAP   #3   [ t.lua:13 ---- ---- ---- ---- 0011 pcall ftsz|t.lua:37 ftsz|---- t.lua:13 ftsz|"table" trace: 0x7efdf1ac4828 +0x1.59020051d848p-1028 contpc debug.getlocal ftsz|+2   +1   ]
                                                                       expected 0003 ^^^^
---- TRACE 1 stop -> stitch

One way to fix this is to add the check that we are not going to record a FUNC* BC (IINM, needsnap is set only for pcall(), xpcall() on these BCs) before the snapshot purge:

diff --git a/src/lj_record.c b/src/lj_record.c
index 5c0f00d9..adb47b78 100644
--- a/src/lj_record.c
+++ b/src/lj_record.c
@@ -2276,7 +2276,7 @@ void lj_record_ins(jit_State *J)
   /* Need snapshot before recording next bytecode (e.g. after a store). */
   if (J->needsnap) {
     J->needsnap = 0;
-    if (J->pt) lj_snap_purge(J);
+    if (J->pt && bc_op(*J->pc) < BC_FUNCF) lj_snap_purge(J);
     lj_snap_add(J);
     J->mergesnap = 1;
   }

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions