Fix weights by tonywu1999 · Pull Request #184 · Vitek-Lab/MSstats

tonywu1999 · 2026-02-19T02:22:05Z

TBD to Merge

Motivation and Context

In group comparison models, when observations have associated variance estimates, inverse-variance weighting can be applied to give more weight to observations with lower variance (higher precision). Previously, weights were passed directly as inverse-variance values, which could inadvertently affect the scale of variance components. This PR normalizes the weights to sum to the sample size so that sample size calculation between MSstats and MSstats+ are comparable.

Solution

The weighting logic in .fitModelForGroupComparison has been updated to normalize inverse-variance weights. When a Variance column is present and contains non-NA values, the weights are now computed as: weight_input = (1/Variance) / mean(1/Variance), ensuring weights sum to the sample size

Detailed Changes

R/utils_groupcomparison.R (lines 142-147):
- Modified weight calculation logic in .fitModelForGroupComparison function
- Changed from passing weights as-is to computing normalized inverse-variance weights internally
- Introduced intermediate variable unnormalized_weight_input = 1/input$Variance for clarity
- Applied normalization: weight_input = unnormalized_weight_input / mean(unnormalized_weight_input)
- Preserves existing behavior for cases without variance information (weight_input = NULL)

Unit Tests

inst/tinytest/test_utils_groupComparison_model.R:
- Test verifies that weights are applied to the fitted model when Variance column is provided
- Currently expects unnormalized weights 1/Variance, requiring update to verify normalized weights calculation

Coding Guidelines Violations

The existing test in test_utils_groupComparison_model.R (lines 16-17) does not account for the weight normalization introduced in this PR. The test assertion expect_equal(result$full_fit$weights, expected_weights) where expected_weights = 1/test_input_with_variance$Variance will fail because the actual weights are normalized by their mean. The test should be updated to check for expected_weights / mean(expected_weights) to properly validate the normalized weighting behavior.

coderabbitai · 2026-02-19T02:22:15Z

📝 Walkthrough

Walkthrough

This change modifies the weighting logic in .fitModelForGroupComparison when a Variance column is present. Instead of simple inverse-variance weighting, it now applies normalized inverse-variance weighting by dividing inverse variances by their mean, ensuring weights average to 1.

Changes

Cohort / File(s)	Summary
Internal weighting logic modification `R/utils_groupcomparison.R`	Updated inverse-variance weighting to normalize weights by their mean. Replaces single-step calculation with two-step approach to produce weights that average to 1.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A hop through variance weights so fair,
Normalized now with balanced care,
No more wild swings from here to there,
Mean of one keeps all affairs square,
The bunny approves this weighting affair! 🌟

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The pull request description is empty and does not follow the required template. All mandatory sections (Motivation and Context, Changes, Testing, and Checklist) are missing.	Add a complete pull request description following the template: include motivation/context for the weight normalization change, a detailed bullet-point list of modifications, description of testing performed, and complete the checklist items.
Title check	❓ Inconclusive	The title 'Fix weights' is vague and generic; it refers to a real change (weight normalization) but lacks specificity about what was fixed or how.	Use a more descriptive title such as 'Normalize inverse-variance weights to have mean of 1' to clearly convey the specific change made.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix-weights

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-02-19T02:23:00Z

Failed to generate code suggestions for PR

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@R/utils_groupcomparison.R`:
- Around line 142-147: The current normalization of unnormalized_weight_input
uses mean(unnormalized_weight_input) which returns NA if any Variance is NA;
change the normalization to mean(unnormalized_weight_input, na.rm = TRUE) and
then guard against remaining NA entries by replacing NA weights with 0 (e.g.
after computing weight_input, do weight_input[is.na(weight_input)] <- 0) so
downstream lm()/lmer() calls don't receive all-NA weights; update the code that
references "Variance", "unnormalized_weight_input", and "weight_input"
accordingly and add/adjust tests to cover the partial-NA Variance case.

coderabbitai · 2026-02-19T22:46:10Z

R/utils_groupcomparison.R

    if ("Variance" %in% colnames(input) & !all(is.na(input$Variance))){
-        weight_input = 1/input$Variance
+        unnormalized_weight_input = 1/input$Variance
+        weight_input = unnormalized_weight_input / mean(unnormalized_weight_input)
    } else {
        weight_input = NULL
    }


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n R/utils_groupcomparison.R | head -200

Repository: Vitek-Lab/MSstats

Length of output: 8797

🏁 Script executed:

# Search for where Variance column is created/assigned rg -n "Variance\s*=" --type r -B 3 -A 1 | head -100

Repository: Vitek-Lab/MSstats

Length of output: 1999

🏁 Script executed:

# Read the calculateProteinVariance function cat -n R/utils_feature_selection.R | sed -n '246,300p'

Repository: Vitek-Lab/MSstats

Length of output: 2304

🏁 Script executed:

# Search for where Variance is assigned to ProteinLevelData or similar rg -n "ProteinLevelData.*Variance|Variance.*ProteinLevel" --type r -B 2 -A 2

Repository: Vitek-Lab/MSstats

Length of output: 1616

🏁 Script executed:

# Search more broadly for Variance creation in dataProcess pipeline rg -n "var_resid|Variance" --type r R/ | grep -E "(var_resid|Variance\s*=|Variance\$)" | head -50

Repository: Vitek-Lab/MSstats

Length of output: 745

🏁 Script executed:

# Search for where var_resid is renamed to Variance rg -n "var_resid.*Variance|Variance.*var_resid|setnames.*[Vv]ariance" --type r -B 1 -A 1

Repository: Vitek-Lab/MSstats

Length of output: 309

🏁 Script executed:

# Look for where var_resid might be renamed to Variance by checking column selection/renaming rg -n "Variance|var_resid" --type r R/dataProcess.R | head -50

Repository: Vitek-Lab/MSstats

Length of output: 43

🏁 Script executed:

# Find all R files and see what's the main entry point fd "\.R$" R/ | head -20

Repository: Vitek-Lab/MSstats

Length of output: 515

🏁 Script executed:

# Check dataProcess.R for Variance column handling cat -n R/dataProcess.R | grep -i "variance\|var_resid" -B 2 -A 2

Repository: Vitek-Lab/MSstats

Length of output: 3315

🏁 Script executed:

# Look for the actual output structure - search for ProteinLevelData creation rg -n "ProteinLevelData\s*=" --type r R/dataProcess.R -A 5 | head -80

Repository: Vitek-Lab/MSstats

Length of output: 43

🏁 Script executed:

# Let's check the groupComparison.R file to see how the data flows into the function cat -n R/groupComparison.R | head -150

Repository: Vitek-Lab/MSstats

Length of output: 10509

🏁 Script executed:

# Test R behavior with mean() and partial NAs python3 << 'EOF' import subprocess result = subprocess.run(['Rscript', '-e', ''' # Test mean with partial NAs x = c(1, 2, NA, 4, 5) print("Vector with partial NAs:") print(x) print("mean(x) with na.rm=FALSE (default):") print(mean(x)) print("mean(x) with na.rm=TRUE:") print(mean(x, na.rm=TRUE)) # Test division by NA result y = 1/c(0.1, 0.2, NA, 0.3) print("1/c(0.1, 0.2, NA, 0.3):") print(y) print("mean of that:") print(mean(y)) print("Dividing y by mean(y):") print(y / mean(y)) '''], capture_output=True, text=True) print(result.stdout) if result.stderr: print("STDERR:", result.stderr) EOF

Repository: Vitek-Lab/MSstats

Length of output: 616

🏁 Script executed:

# Search for test cases that might show Variance with NA values rg -n "Variance.*NA|NA.*Variance" --type r -B 2 -A 2 | head -80

Repository: Vitek-Lab/MSstats

Length of output: 577

🏁 Script executed:

# Search for tests that explicitly test Variance with NAs rg -n "all.*is.na.*Variance|is.na.*Variance" --type r -B 3 -A 3

Repository: Vitek-Lab/MSstats

Length of output: 634

🏁 Script executed:

# Look for test files that might test the weight calculation logic cat -n inst/tinytest/test_utils_groupComparison_model.R

Repository: Vitek-Lab/MSstats

Length of output: 960

🏁 Script executed:

# Check if the test at line 17 is actually passing or if there's a mismatch # The test expects unwormalized weights but the code normalizes them cat -n inst/tinytest/test_utils_groupComparison_model.R | sed -n '14,18p'

Repository: Vitek-Lab/MSstats

Length of output: 392

Missing na.rm = TRUE in mean() silently zeros out all weights when any Variance is NA

The guard at line 142 only rejects the case where all variances are NA. When some (but not all) rows have NA variance, the branch is entered, unnormalized_weight_input contains some NA elements, and mean(unnormalized_weight_input) returns NA (R's default is na.rm = FALSE). Every element of weight_input then becomes NA, and the all-NA weight vector is forwarded to every lm() / lmer() call below — effectively dropping all observations and producing a failed or degenerate model.

🐛 Proposed fix

if ("Variance" %in% colnames(input) & !all(is.na(input$Variance))){ unnormalized_weight_input = 1/input$Variance - weight_input = unnormalized_weight_input / mean(unnormalized_weight_input) + weight_input = unnormalized_weight_input / mean(unnormalized_weight_input, na.rm = TRUE) } else { weight_input = NULL }

Note: The existing test (inst/tinytest/test_utils_groupComparison_model.R:17) expects unnormalized weights but does not cover the partial-NA case, which is where this bug surfaces. The test should be updated to verify normalization and to include a case with partial NAs in the Variance column.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if ("Variance" %in% colnames(input) & !all(is.na(input$Variance))){

weight_input = 1/input$Variance

unnormalized_weight_input = 1/input$Variance

weight_input = unnormalized_weight_input / mean(unnormalized_weight_input)

} else {

weight_input = NULL

}

if ("Variance" %in% colnames(input) & !all(is.na(input$Variance))){

unnormalized_weight_input = 1/input$Variance

weight_input = unnormalized_weight_input / mean(unnormalized_weight_input, na.rm = TRUE)

} else {

weight_input = NULL

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@R/utils_groupcomparison.R` around lines 142 - 147, The current normalization of unnormalized_weight_input uses mean(unnormalized_weight_input) which returns NA if any Variance is NA; change the normalization to mean(unnormalized_weight_input, na.rm = TRUE) and then guard against remaining NA entries by replacing NA weights with 0 (e.g. after computing weight_input, do weight_input[is.na(weight_input)] <- 0) so downstream lm()/lmer() calls don't receive all-NA weights; update the code that references "Variance", "unnormalized_weight_input", and "weight_input" accordingly and add/adjust tests to cover the partial-NA Variance case.

tonywu1999 added 2 commits February 18, 2026 20:49

normalize weights to 1

9c82180

unnromalized variable name

40da5cf

tonywu1999 marked this pull request as ready for review February 19, 2026 22:41

coderabbitai bot reviewed Feb 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix weights#184

Fix weights#184
tonywu1999 wants to merge 2 commits intodevelfrom
fix-weights

tonywu1999 commented Feb 19, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Feb 19, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

github-actions bot commented Feb 19, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tonywu1999 commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation and Context

Solution

Detailed Changes

Unit Tests

Coding Guidelines Violations

Uh oh!

coderabbitai bot commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

github-actions bot commented Feb 19, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tonywu1999 commented Feb 19, 2026 •

edited

Loading

coderabbitai bot commented Feb 19, 2026 •

edited

Loading