
IndexError: index (2864) out of range when not re-creating index or restarting webapp after config change #131

@nh2

I indexed some .cpp files as described in #90 (comment): I added a - doc_path: ... entry to my config and ran llmsearch index update, but did not restart the llmsearch interact webapp ... process.
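For reference, the change was nothing more than one additional doc_path entry in the documents list of the YAML config; schematically it looked roughly like this (the paths and the surrounding key name are placeholders, not my real config):

```yaml
# Schematic only; paths and the surrounding key name are placeholders.
document_settings:
  - doc_path: /home/ubuntu/existing-docs   # entry that was already indexed
  - doc_path: /home/ubuntu/cpp-sources     # newly added entry with the .cpp files
```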

When I then query something via the web UI, I get:

2025-05-30 00:18:38.969 | DEBUG    | __main__:<module>:246 - CONFIG FILE: /home/ubuntu/llm-search/configs/niklas-config-1.yaml
2025-05-30 00:18:38.975 | DEBUG    | llmsearch.ranking:get_relevant_documents:105 - Evaluating query: What's the name of the API endpoint that generates thumbnails?
2025-05-30 00:18:38.975 | INFO     | llmsearch.ranking:get_relevant_documents:107 - Adding query prefix for retrieval: query: 
2025-05-30 00:18:38.975 | INFO     | llmsearch.splade:query:248 - SPLADE search will search over all documents of chunk size: 1024. Number of docs: 2865
────────────────────────── Traceback (most recent call last) ───────────────────────────
  /home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/exec_  
  code.py:121 in exec_func_with_error_handling                                          
                                                                                        
  /home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/scriptrunner/scrip  
  t_runner.py:645 in code_to_exec                                                       
                                                                                        
  /home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/webapp.py:342 in <module>   
                                                                                        
    339 │   │   │   │   conv_history_rewrite_query                                      
    340 │   │   │   )                                                                   
    341 │   │                                                                           
  ❱ 342 │   │   output = generate_response(                                             
    343 │   │   │   question=text,                                                      
    344 │   │   │   use_hyde=st.session_state["llm_bundle"].hyde_enabled,               
    345 │   │   │   use_multiquery=st.session_state["llm_bundle"].multiquery_enabled,   
                                                                                        
  /home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/caching/cache_util  
  s.py:219 in __call__                                                                  
                                                                                        
  /home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/caching/cache_util  
  s.py:261 in _get_or_create_cached_value                                               
                                                                                        
  /home/ubuntu/.venv/lib/python3.12/site-packages/streamlit/runtime/caching/cache_util  
  s.py:320 in _handle_cache_miss                                                        
                                                                                        
  /home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/webapp.py:175 in            
  generate_response                                                                     
                                                                                        
    172 ):                                                                              
    173 │   # _config and _bundle are under scored so paratemeters aren't hashed        
    174 │                                                                               
  ❱ 175 │   output = get_and_parse_response(                                            
    176 │   │   query=question, config=_config, llm_bundle=_bundle, label=label_filter  
    177 │   )                                                                           
    178 │   return output                                                               
                                                                                        
  /home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/process.py:66 in            
  get_and_parse_response                                                                
                                                                                        
     63 │   │   offset_max_chars = 0                                                    
     64 │                                                                               
     65 │   semantic_search_config = config.semantic_search                             
  ❱  66 │   most_relevant_docs, score = get_relevant_documents(                         
     67 │   │   original_query, queries, llm_bundle, semantic_search_config, label=lab  
     68 │   │   offset_max_chars = offset_max_chars                                     
     69 │   )                                                                           
                                                                                        
  /home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/ranking.py:109 in           
  get_relevant_documents                                                                
                                                                                        
    106 │   │   │   if config.query_prefix:                                             
    107 │   │   │   │   logger.info(f"Adding query prefix for retrieval: {config.query  
    108 │   │   │   │   query = config.query_prefix + query                             
  ❱ 109 │   │   │   sparse_search_docs_ids, sparse_scores = sparse_retriever.query(     
    110 │   │   │   │   search=query, n=config.max_k, label=label, chunk_size=chunk_si  
    111 │   │   │   )                                                                   
    112                                                                                 
                                                                                        
  /home/ubuntu/.venv/lib/python3.12/site-packages/llmsearch/splade.py:253 in query      
                                                                                        
    250 │   │   │   )                                                                   
    251 │   │                                                                           
    252 │   │   # print(indices)                                                        
  ❱ 253 │   │   embeddings = self._embeddings[indices]  # type: ignore                  
    254 │   │   ids = self._ids[indices]  # type: ignore                                
    255 │   │   l2_norm_matrix = scipy.sparse.linalg.norm(embeddings, axis=1)           
    256                                                                                 
                                                                                        
  /home/ubuntu/.venv/lib/python3.12/site-packages/scipy/sparse/_index.py:30 in          
  __getitem__                                                                           
                                                                                        
     27 │   This class provides common dispatching and validation logic for indexing.   
     28 │   """                                                                         
     29 │   def __getitem__(self, key):                                                 
  ❱  30 │   │   index, new_shape = self._validate_indices(key)                          
     31 │   │                                                                           
     32 │   │   # 1D array                                                              
     33 │   │   if len(index) == 1:                                                     
                                                                                        
  /home/ubuntu/.venv/lib/python3.12/site-packages/scipy/sparse/_index.py:288 in         
  _validate_indices                                                                     
                                                                                        
    285 │   │   │   │   index_ndim = tmp_ndim                                           
    286 │   │   │   else:  # dense array                                                
    287 │   │   │   │   N = self._shape[index_ndim]                                     
  ❱ 288 │   │   │   │   idx = self._asindices(idx, N)                                   
    289 │   │   │   │   index.append(idx)                                               
    290 │   │   │   │   array_indices.append(index_ndim)                                
    291 │   │   │   │   index_ndim += 1                                                 
                                                                                        
  /home/ubuntu/.venv/lib/python3.12/site-packages/scipy/sparse/_index.py:332 in         
  _asindices                                                                            
                                                                                        
    329 │   │   # Check bounds                                                          
    330 │   │   max_indx = x.max()                                                      
    331 │   │   if max_indx >= length:                                                  
  ❱ 332 │   │   │   raise IndexError('index (%d) out of range' % max_indx)              
    333 │   │                                                                           
    334 │   │   min_indx = x.min()                                                      
    335 │   │   if min_indx < 0:                                                        
────────────────────────────────────────────────────────────────────────────────────────
IndexError: index (2864) out of range

The error goes away when I restart llmsearch interact webapp AND run llmsearch index create ... instead of llmsearch index update ...

Is that expected?

If so, it would be nice to get a clearer error than a bare IndexError, telling me that I have to restart the whole webapp after changing the config.

But then again, if I add yet another doc_path entry for another programming language, the IndexError persists.
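For what it's worth, my guess at the mechanism (purely a guess, not taken from the llmsearch code): the running webapp still holds the SPLADE embeddings matrix from before the update, while the document/ID list it searches over now has 2865 entries, so a hit on the newest document asks for row 2864 of a matrix that only has rows 0..2863. A standalone sketch of that kind of mismatch, with sizes matching my log:

```python
# Standalone sketch of the suspected mismatch (hypothetical, not llmsearch code):
# a stale sparse embeddings matrix indexed with hits from a refreshed, larger index.
import numpy as np
import scipy.sparse

n_before, n_after, dim = 2864, 2865, 16

stale_embeddings = scipy.sparse.random(n_before, dim, density=0.1, format="csr")
refreshed_ids = np.arange(n_after)      # 2865 docs after llmsearch index update

top_hits = refreshed_ids[-1:]           # best match happens to be the newly added doc
rows = stale_embeddings[top_hits]       # raises IndexError: index (2864) out of range
```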
