mps-cli-py: complete binary (.mpb) persistency implementation #54

Open
Prithvi686 wants to merge 21 commits into main from feature/E3AARCHAI-23018_enhance_mps_cli_py_with_support_for_binary_persistency

Prithvi686 commented Feb 12, 2026

Added full binary (.mpb) model persistency support

MPS stores models in three formats: XML (.mps), file-per-root (.model directories), and binary (.mpb). The first two were already supported. This PR adds complete support for the binary format.

Newly Added:
SModelBuilderBinaryPersistency.py: Top-level .mpb parser that parses the header, registry, model properties, and node tree.

binary/registry.py: Registry section parser that populates 'index_2_concept', 'index_2_property', 'index_2_reference_role', 'index_2_child_role_in_parent', and 'concept_id_2_concept'.

binary/nodes.py: Node tree parser containing the methods 'read_children', 'read_node', '_read_reference', and '_read_node_id'.

binary/node_id_utils.py: NodeIdEncodingUtils class that encodes/decodes MPS node IDs between raw long and Base64-variant strings
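
For reviewers unfamiliar with the node ID encoding, here is a minimal sketch of a round trip between a signed 64-bit value and a base-64 digit string. It is not the code in 'binary/node_id_utils.py': the function names and the 64-character alphabet below are assumptions made for illustration, and the alphabet MPS actually uses may differ.

```python
# Hypothetical alphabet for illustration only; not confirmed against MPS.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_"
_MASK64 = (1 << 64) - 1


def encode_node_id(value: int) -> str:
    """Encode a signed 64-bit id as base-64 digits, most significant first."""
    unsigned = value & _MASK64  # view a negative Java long as unsigned
    if unsigned == 0:
        return ALPHABET[0]
    digits = []
    while unsigned:
        digits.append(ALPHABET[unsigned & 0x3F])  # lowest 6 bits
        unsigned >>= 6
    return "".join(reversed(digits))


def decode_node_id(text: str) -> int:
    """Inverse of encode_node_id; raises ValueError on unknown characters."""
    unsigned = 0
    for ch in text:
        unsigned = (unsigned << 6) | ALPHABET.index(ch)
    if unsigned > _MASK64:
        raise ValueError("value does not fit into 64 bits")
    # Fold the unsigned value back into the signed 64-bit range.
    return unsigned - (1 << 64) if unsigned >= (1 << 63) else unsigned
```

Masking to 64 bits before encoding keeps negative IDs representable, which matters because Java longs are signed.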

Modified:
SSolutionBuilder.py: 'build_all()' now processes '.mpb' files in parallel via 'ProcessPoolExecutor'; the 'USE_CACHE', 'CACHE_LOAD_FN', and 'CACHE_SAVE_FN' hooks were also added.

SSolutionsRepositoryBuilder.py: Four performance optimisations:

  • Before a JAR is extracted, its zip central directory is read to check for any '.msd' file; JARs without one are skipped.
  • The remaining JARs (those with a '.msd' file) are extracted and parsed concurrently via a 'ThreadPoolExecutor'. JAR extraction is I/O-bound, so threads are the appropriate primitive here and avoid per-process spawn overhead.
  • Within each solution, '.mpb' files are parsed by worker processes through a 'ProcessPoolExecutor' rather than by threads, because parsing is CPU-bound (binary decoding, string table lookups). The 'MPB_PARALLEL_THRESHOLD = 4' guard skips the pool for small batches and parses them serially. Solutions in a single JAR share one pool, so the pool-creation cost is paid at most once per JAR file.
  • After the first parse, each 'SModel' is saved to the '~/.mps_cli_cache/' directory, keyed by 'md5(path, time, size)'. On subsequent runs, files whose path, modification time, and size are unchanged are loaded from this directory. The cache is invalidated automatically when a file changes, so no manual invalidation is needed. The tests always set 'USE_CACHE = False' to ensure fresh parses.
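
To illustrate the cache keying described in the last bullet, here is a rough sketch. It assumes pickle as the on-disk format and uses hypothetical names ('cache_key', 'load_or_parse'); the actual hooks in the PR are 'CACHE_LOAD_FN' and 'CACHE_SAVE_FN', whose signatures are not shown here.

```python
import hashlib
import os
import pickle
from pathlib import Path


def cache_key(path):
    """Derive a key from the path, modification time and size of a file."""
    st = os.stat(path)
    raw = f"{os.path.abspath(path)}|{st.st_mtime_ns}|{st.st_size}"
    # md5 serves as a fingerprint here, not as a security measure.
    return hashlib.md5(raw.encode("utf-8"), usedforsecurity=False).hexdigest()


def load_or_parse(path, parse_fn, cache_dir):
    """Return the cached result for an unchanged file, else parse and store it."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / (cache_key(path) + ".pkl")  # pickle format is an assumption
    if entry.exists():
        with entry.open("rb") as fh:
            return pickle.load(fh)
    model = parse_fn(path)
    with entry.open("wb") as fh:
        pickle.dump(model, fh)
    return model
```

Because the key changes whenever the modification time or size changes, a stale entry is simply never looked up again, which is why no explicit invalidation step is needed.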

demo.py: parses a plugins directory or test project, prints a structured summary of all solutions/models/nodes, runs a verification pass, and writes output to a timestamped log file
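
The serial-versus-pool decision described above for '.mpb' parsing can be sketched roughly as follows. This is an illustration, not the code in 'build_all()': the function name 'parse_models' and the 'executor_cls' parameter are hypothetical, and whether the real guard compares with '<' or '<=' is an assumption.

```python
from concurrent.futures import ProcessPoolExecutor

MPB_PARALLEL_THRESHOLD = 4  # value quoted in the PR description


def parse_models(paths, parse_fn, executor_cls=ProcessPoolExecutor):
    """Parse small batches serially and larger ones in a worker pool."""
    paths = list(paths)
    if len(paths) < MPB_PARALLEL_THRESHOLD:
        # Small batch: starting worker processes would cost more than it saves.
        return [parse_fn(p) for p in paths]
    with executor_cls() as pool:
        # map() preserves input order, so results line up with paths.
        return list(pool.map(parse_fn, paths))
```

With the default 'ProcessPoolExecutor', 'parse_fn' has to be a picklable module-level function, and on platforms that spawn worker processes the calling script needs an 'if __name__ == "__main__":' guard.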

What is parsed and stored in SModel:
Every .mpb file becomes one 'SModel' containing:

  • 'uuid': a Java-style UUID string of the form 'r:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx', with the UUID part formatted as by 'UUID.toString()' in Java
  • 'name': the fully qualified model name (e.g. 'jetbrains.mps.vcs.diff')
  • 'root_nodes': a list of 'SNode' objects, each carrying:
    • 'uuid': Base64-variant encoded node ID
    • 'concept': 'SConcept' with the fully qualified concept name resolved from the registry
    • 'role_in_parent': the containment link name this node fills in its parent
    • 'parent': back-reference to the containing 'SNode' ('None' for root nodes)
    • 'properties': 'dict[str, str]' of all string-valued property name and value pairs
    • 'references': 'dict[str, SNodeRef]' mapping reference role name to 'SNodeRef', which carries 'model_uuid', 'node_uuid', and 'resolve_info'
    • 'children': ordered list of child 'SNode' objects (recursive)

9 new test files (~75 new test methods) have been added to verify all the parsing scenarios.

ratiud and others added 11 commits December 23, 2025 11:10
- Refactored binary persistency implementation to
  separate constants, low-level reader utilities.
- Fixed model header parsing to correctly handle
  model-reference kind vs model-id kind according
  to MPS binary persistency format.
- Correctly reconstruct model UUID with 'r:' prefix.
- Updated low-level test expectations to reflect fully-
  qualified model names.
… persistency and added low-level tests covering imports
… read_reference

- Integrated node loading into
  SModelBuilderBinaryPersistency
- Added root_nodes structure
- Extended tests to validate full model tree parsing
…rchitecture

- Removed registry dict usage
- Integrated index_2_* maps from base builder
- Construct real SConcept, SProperty, SNode instances
- Unified binary builder structure with XML persistency
- Updated tests to validate object-based model structure
- Implement full binary (.mpb) model parsing
- Load model header, registry, used languages and imports
- Build concept/property/reference/containment index maps
- Parse node tree including containment roles and properties
- Added support for reference kind validation and resolve_info
- Applied node id encoding during parsing
- Added repository-level completeness and resolution tests
Prithvi686 marked this pull request as draft February 12, 2026 08:49
…ode ids and also corrected existing test case failure
1) Fixed the wrong field order in nodes.py so that node info in .mpb files is parsed correctly.

2) Corrected the parser to handle all the structural variants encountered across real plugin .mpb files with the V3 stream format (0x00000500) and .mpb files that use a DEPENDENCY_V1 byte.

3) Implemented complete binary persistency for real plugin .mpb files; extensive Python tests are still pending.

4) Corrected a few issues with parsing the model uuid and implemented logic to build models in parallel rather than one by one, using separate processes via the concurrent.futures.ProcessPoolExecutor API to get around Python's speed limits.

5) Improved parser performance by first peeking into each JAR file to determine whether .msd files are present and only then extracting the JAR to parse its .mpb files. This reduced the parse execution time from ~248 seconds to ~9 seconds.
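
The JAR peek from point 5 can be sketched with the standard 'zipfile' module, which reads only the archive's central directory when listing member names. The helper names below are hypothetical and not taken from 'SSolutionsRepositoryBuilder.py'.

```python
import zipfile


def jar_contains_msd(jar_path):
    """Check the member names of a JAR for a '.msd' file without extracting it."""
    with zipfile.ZipFile(jar_path) as jar:
        # namelist() comes from the central directory; no member is decompressed.
        return any(name.endswith(".msd") for name in jar.namelist())


def jars_worth_extracting(jar_paths):
    """Keep only the JARs that contain at least one '.msd' file."""
    return [path for path in jar_paths if jar_contains_msd(path)]
```

Only the paths returned by 'jars_worth_extracting' would then be handed to the extraction and parsing step.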
Prithvi686 force-pushed the feature/E3AARCHAI-23018_enhance_mps_cli_py_with_support_for_binary_persistency branch from 04d648d to 9efb3d5 March 19, 2026 17:25
…rs, imports, node trees, libraries, references, registry entries, language extraction, registry parsing performance and fixed two failing tests and a few mini cleanups
…lderBinaryPersistency to correctly format uuid's
…iles in parallel is already handled by SSolutionBuilder
Prithvi686 marked this pull request as ready for review March 25, 2026 18:50