I indexed some .cpp files as described in #90 (comment): I added a `- doc_path: ...` entry to the config and ran `llmsearch index update`, but without restarting the `llmsearch interact webapp` process.
When I then query something via the web UI, I get:
```
2025-05-30 00:18:38.969 | DEBUG | __main__:<module>:246 - CONFIG FILE: /home/ubuntu/llm-search/configs/niklas-config-1.yaml
2025-05-30 00:18:38.975 | DEBUG | llmsearch.ranking:get_relevant_documents:105 - Evaluating query: What's the name of the API endpoint that generates thumbnails?
2025-05-30 00:18:38.975 | INFO | llmsearch.ranking:get_relevant_documents:107 - Adding query prefix for retrieval: query:
2025-05-30 00:18:38.975 | INFO | llmsearch.splade:query:248 - SPLADE search will search over all documents of chunk size: 1024. Number of docs: 2865
────────────────────────── Traceback (most recent call last) ───────────────────────────
/home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/exec_code.py:121 in exec_func_with_error_handling
/home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/script_runner.py:645 in code_to_exec
/home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/webapp.py:342 in <module>
  339 │ │ │ │ conv_history_rewrite_query
  340 │ │ │ )
  341 │ │
❱ 342 │ │ output = generate_response(
  343 │ │ │ question=text,
  344 │ │ │ use_hyde=st.session_state["llm_bundle"].hyde_enabled,
  345 │ │ │ use_multiquery=st.session_state["llm_bundle"].multiquery_enabled,
/home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/caching/cache_utils.py:219 in __call__
/home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/caching/cache_utils.py:261 in _get_or_create_cached_value
/home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/caching/cache_utils.py:320 in _handle_cache_miss
/home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/webapp.py:175 in generate_response
  172 ):
  173 │ # _config and _bundle are under scored so paratemeters aren't hashed
  174 │
❱ 175 │ output = get_and_parse_response(
  176 │ │ query=question, config=_config, llm_bundle=_bundle, label=label_filter
  177 │ )
  178 │ return output
/home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/process.py:66 in get_and_parse_response
   63 │ │ offset_max_chars = 0
   64 │
   65 │ semantic_search_config = config.semantic_search
❱  66 │ most_relevant_docs, score = get_relevant_documents(
   67 │ │ original_query, queries, llm_bundle, semantic_search_config, label=lab
   68 │ │ offset_max_chars = offset_max_chars
   69 │ )
/home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/ranking.py:109 in get_relevant_documents
  106 │ │ │ if config.query_prefix:
  107 │ │ │ │ logger.info(f"Adding query prefix for retrieval: {config.query
  108 │ │ │ │ query = config.query_prefix + query
❱ 109 │ │ │ sparse_search_docs_ids, sparse_scores = sparse_retriever.query(
  110 │ │ │ │ search=query, n=config.max_k, label=label, chunk_size=chunk_si
  111 │ │ │ )
  112
/home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/splade.py:253 in query
  250 │ │ │ )
  251 │ │
  252 │ │ # print(indices)
❱ 253 │ │ embeddings = self._embeddings[indices]  # type: ignore
  254 │ │ ids = self._ids[indices]  # type: ignore
  255 │ │ l2_norm_matrix = scipy.sparse.linalg.norm(embeddings, axis=1)
  256
/home/ubuntu/.venv/lib/python3.12/site-packages/scipy/sparse/_index.py:30 in __getitem__
   27 │ This class provides common dispatching and validation logic for indexing.
   28 │ """
   29 │ def __getitem__(self, key):
❱  30 │ │ index, new_shape = self._validate_indices(key)
   31 │ │
   32 │ │ # 1D array
   33 │ │ if len(index) == 1:
/home/ubuntu/.venv/lib/python3.12/site-packages/scipy/sparse/_index.py:288 in _validate_indices
  285 │ │ │ │ index_ndim = tmp_ndim
  286 │ │ │ else:  # dense array
  287 │ │ │ │ N = self._shape[index_ndim]
❱ 288 │ │ │ │ idx = self._asindices(idx, N)
  289 │ │ │ │ index.append(idx)
  290 │ │ │ │ array_indices.append(index_ndim)
  291 │ │ │ │ index_ndim += 1
/home/ubuntu/.venv/lib/python3.12/site-packages/scipy/sparse/_index.py:332 in _asindices
  329 │ │ # Check bounds
  330 │ │ max_indx = x.max()
  331 │ │ if max_indx >= length:
❱ 332 │ │ │ raise IndexError('index (%d) out of range' % max_indx)
  333 │ │
  334 │ │ min_indx = x.min()
  335 │ │ if min_indx < 0:
────────────────────────────────────────────────────────────────────────────────────────
IndexError: index (2864) out of range
```
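Note that the log reports 2865 documents, while the failing lookup asks for row 2864 of a sparse matrix that evidently has fewer rows, so the embeddings cached in the running webapp appear to be out of sync with the updated id list. The failure is easy to reproduce in isolation (shapes below are illustrative, not llmsearch's actual data):

```python
import numpy as np
import scipy.sparse

# Sparse embeddings matrix cached from before `index update`: 2864 rows.
embeddings = scipy.sparse.random(2864, 100, density=0.01, format="csr")

# After the on-disk index grows to 2865 docs, a retrieved row index can
# point one past the end of the stale cached matrix.
indices = np.array([10, 2864])  # row 2864 is out of range for 2864 rows

try:
    _ = embeddings[indices]
except IndexError as e:
    print(e)
```

Any retriever that keeps its ids and its embeddings from different snapshots of the index will eventually hand scipy an out-of-range row index like this.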
The problem seems fixed when I restart `llmsearch interact webapp` AND run `llmsearch index create ...` instead of `llmsearch index update ...`.
Is that expected?
If so, a more descriptive error than a bare `IndexError` would be nice, telling me that the whole webapp has to be restarted after changing the config.
Then again, when I add another entry for another programming language, the `IndexError` persists.
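For what it's worth, the kind of guard that would turn the bare `IndexError` into an actionable message might look like this (a sketch only; `select_embeddings` and the message text are hypothetical, not llmsearch's actual code):

```python
import numpy as np
import scipy.sparse

def select_embeddings(embeddings, indices):
    """Index rows of a cached sparse embeddings matrix, failing with a
    descriptive error instead of a bare IndexError. Hypothetical helper."""
    indices = np.asarray(indices)
    n_rows = embeddings.shape[0]
    if indices.size and indices.max() >= n_rows:
        raise RuntimeError(
            f"Retrieved document index {int(indices.max())} exceeds the "
            f"{n_rows} cached embeddings. The index on disk was likely "
            "rebuilt; restart the webapp (or recreate the index) to reload it."
        )
    return embeddings[indices]

emb = scipy.sparse.eye(5, format="csr")
print(select_embeddings(emb, [0, 2]).shape)  # (2, 5)
```

Raising a descriptive error at this boundary would point users at the stale-cache cause instead of a scipy-internals traceback.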