Group-based Splitting for Benchmarks + Batch-layer Evaluation + GPU Support #356
BKHMSI wants to merge 4 commits into brain-score:main
Conversation
Hi @BKHMSI, did you intend to remove Pereira2018.243sentences-linear?
Yes, I did intend to change all benchmarks to use ridge regression instead of linear. |
Let's keep the original ones for reference (at least in code; they don't have to be displayed on the website).
Re-added the linear metrics for all benchmarks and fixed the ceiling for ridge. Note that for Pereira2018 we need to cache ceilings for the new metrics.
Hi all, please note a few things about the state of the language repo:
Hi @BKHMSI, thanks for the PR. I had a chance to take a look at it and noticed a few things that require attention:
I've attempted to address all of these issues in #361. The most significant differences are:
If #361 looks good to you, please let me know; otherwise, I hope it can be of benefit to you.
@BKHMSI I tried running ... I think it was because ... Did you ever update the ...?
Hi @KartikP, thanks for looking into the PR. Yes, I had the same issue with ... @mschrimpf, any ideas on this?
Also, @mschrimpf, do you recommend taking the mean of evaluated layers as the default final score? @KartikP changed it to the following: ... I think we can stay with the ...
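For concreteness, here is a minimal sketch of what a mean-over-layers default aggregation could look like. The layer names and score values are made up for illustration and are not from the PR:

```python
import numpy as np

# Hypothetical per-layer scores from evaluating several user-selected
# layers on one benchmark (names and values are illustrative only).
layer_scores = {
    "transformer.h.4": 0.31,
    "transformer.h.8": 0.42,
    "transformer.h.11": 0.38,
}

def aggregate_layer_scores(scores: dict) -> float:
    """Collapse per-layer scores into a single final score via the mean."""
    return float(np.mean(list(scores.values())))

final_score = aggregate_layer_scores(layer_scores)
print(final_score)
```

An alternative default would be taking the best (max) layer rather than the mean; which one is appropriate is exactly the open question in this thread.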
The way I handled it locally was to essentially filter NaN/Inf values before passing to curve_fit, and then also add a catch for ...
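A minimal sketch of that workaround, assuming the fit in question is done with scipy.optimize.curve_fit (the linear model form and function name here are illustrative, not the benchmark's actual fitting function):

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_safely(x, y):
    """Fit y = a*x + b after dropping non-finite points; return None on failure.

    Sketch of the locally-applied workaround: mask NaN/Inf before fitting,
    and catch the RuntimeError that curve_fit raises on non-convergence.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.isfinite(x) & np.isfinite(y)
    if mask.sum() < 2:  # not enough finite points to fit two parameters
        return None
    try:
        params, _ = curve_fit(lambda v, a, b: a * v + b, x[mask], y[mask])
        return params
    except RuntimeError:  # curve_fit failed to converge
        return None

params = fit_safely([0.0, 1.0, 2.0, np.nan], [1.0, 3.0, 5.0, np.inf])
```

Here the NaN/Inf points are dropped and the fit recovers a ≈ 2, b ≈ 1 from the remaining finite data.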
I maintain that the benchmark should not know anything about layers. The benchmark's job is to compare existing data with data from a new subject (which can be a model); there should be zero insight into the exact model implementation. So I really don't like the benchmark iterating over ... Could we interpret this as regions instead?
Doesn't the original implementation already filter NaN values as well? (language/brainscore_language/benchmarks/pereira2018/ceiling_packaging.py, lines 211-212 at 3232495)
@mschrimpf the idea was to evaluate multiple user-selected layers at once instead of doing multiple forward passes / evaluations for each layer a user wants to evaluate. So I implemented it from an efficiency perspective. It can return a list of ...
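To illustrate the efficiency argument, here is a toy sketch of batch-layer evaluation: the model runs once and records activations for every requested layer, instead of one forward pass per layer. The model class and layer names are hypothetical stand-ins, not the actual brainscore_language implementation:

```python
import numpy as np

class TinyModel:
    """Stand-in for a real network: a chain of random tanh-linear layers."""
    def __init__(self, n_layers=4, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = [rng.standard_normal((dim, dim)) for _ in range(n_layers)]

    def forward(self, x, record_layers):
        """One forward pass; capture activations for layers in `record_layers`."""
        recorded = {}
        for i, w in enumerate(self.weights):
            x = np.tanh(x @ w)
            name = f"layer.{i}"
            if name in record_layers:
                recorded[name] = x
        return recorded

model = TinyModel()
x = np.ones(8)
# All requested layers are captured in a single pass; per-layer scores
# (returned as a list of Scores) can then be computed downstream.
acts = model.forward(x, record_layers={"layer.1", "layer.3"})
print(sorted(acts))
```

The design trade-off debated above is where this batching lives: doing it inside the benchmark leaks model internals (layer names) into the benchmark, whereas doing it on the model/subject side keeps the benchmark agnostic.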