removed local syntaxgym benchmark json files.
fixing npz_obj value in gpt2_precomputed.py

Hi, I'll figure out this discrepancy today. Currently the Hugging Face implementation yields 91.67%, but there may well be a bug in that implementation.

```python
>>> import datasets
>>> import evaluate
>>> ds = datasets.load_dataset("cpllab/syntaxgym", "npz_obj")
>>> metric = evaluate.load("cpllab/syntaxgym")
>>> result = metric.compute(dataset=ds["test"], model_id="distilgpt2")
>>> result["npz_obj"].accuracy
0.9166666666666666
```

I also get 91.67% when manually invoking the benchmark:

```python
>>> from brainscore_language.model_helpers.huggingface import HuggingfaceSubject
>>> from brainscore_language.benchmarks.syntaxgym.benchmark import SyntaxGymSingleTSE
>>> b = SyntaxGymSingleTSE("npz_obj", "npz_obj")
>>> m = HuggingfaceSubject("distilgpt2", {})
>>> score = b(m)
>>> score
<xarray.Score ()>
array(0.91666667)
Attributes:
    raw:  [ True True True True True True True True True True T...
```

Same result with `SyntaxGymTSE`:

```python
In [13]: bs = SyntaxGymTSE({"npz_obj": "npz_obj"})

In [14]: s2 = bs(m)
/home/jon/miniconda3/envs/brainscore/lib/python3.10/site-packages/brainscore_core/metrics/__init__.py:94: UserWarning: failed to merge raw values: 'numpy.ndarray' object has no attribute 'rename'
  warnings.warn("failed to merge raw values: " + str(e))

In [15]: s2
Out[15]:
<xarray.Score ()>
array(0.91666667)
Attributes:
    raw:         [ True True True True True True True True True True...
    sub_scores:  <xarray.Score (sub_benchmark: 1)>\narray([0.91666667])\nCoor...
```

Well, this is interesting. Going through the top-level `score` function gives 83.33% instead:

```python
In [1]: from brainscore_language import score

In [3]: ret2 = score("distilgpt2", "syntaxgym-npz_obj")
...

In [4]: ret2
Out[4]:
<xarray.Score ()>
array(0.83333333)
Attributes:
    raw:                   [ True False True True True True True False ...
    model_identifier:      distilgpt2
    benchmark_identifier:  syntaxgym-npz_obj
```
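One way to localize a discrepancy like this is to compare the per-item `raw` vectors of the two runs instead of only the aggregate number. This is a minimal sketch, assuming the score is the mean of the boolean `raw` attribute (the logs are consistent with that but do not state it) and that both runs see the same items in the same order. `accuracy` and `differing_items` are hypothetical helpers; since `raw` is listed under `Attributes`, it would presumably be read from e.g. `ret2.attrs["raw"]`.

```python
import numpy as np


def accuracy(raw) -> float:
    """Fraction of items whose prediction held.

    Assumption: the benchmark score is the mean of the per-item booleans in `raw`.
    """
    return float(np.mean(np.asarray(raw, dtype=bool)))


def differing_items(raw_a, raw_b) -> list[int]:
    """Positions at which two per-item result vectors disagree.

    Only meaningful if both runs scored the same items in the same order.
    """
    a = np.asarray(raw_a, dtype=bool)
    b = np.asarray(raw_b, dtype=bool)
    if a.shape != b.shape:
        # Different item counts suggest the two runs used different test suites.
        raise ValueError(f"item counts differ: {a.shape} vs {b.shape}")
    return np.flatnonzero(a != b).tolist()


# Toy vectors for illustration only (not the actual npz_obj results).
run_a = [True, True, True, True]
run_b = [True, False, True, True]
print(accuracy(run_a), accuracy(run_b))  # 1.0 0.75
print(differing_items(run_a, run_b))     # [1]
```

A shape mismatch is itself informative: it suggests the two code paths are not scoring the same test suite.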
The numerical mismatch is due to differing test suite content between that in the GitHub mirror (referenced in

I fixed the reference in

@jimn2 I'm fine with removing the local mirror of the test suite JSONs, but this breaks the functionality in
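To see exactly how a local copy of a suite differs from the upstream one, a comparison along these lines could help. It is a sketch that assumes the usual SyntaxGym suite JSON layout (a top-level `items` list whose entries carry `item_number` and `conditions`, each condition holding `regions` with `region_number` and `content`); `load_suite`, `item_sentences` and `diff_suites` are hypothetical helpers, not part of brainscore_language.

```python
import json
from pathlib import Path


def load_suite(path) -> dict:
    """Read a test suite JSON file from disk."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def item_sentences(suite: dict) -> dict:
    """Map item_number -> {condition_name: sentence}.

    Assumes the SyntaxGym suite layout: suite["items"][i]["conditions"][j]["regions"],
    where each region has "region_number" and "content".
    """
    out = {}
    for item in suite["items"]:
        conditions = {}
        for cond in item["conditions"]:
            regions = sorted(cond["regions"], key=lambda r: r["region_number"])
            # Join non-empty regions into one string so whole sentences can be compared.
            conditions[cond["condition_name"]] = " ".join(
                r["content"] for r in regions if r["content"]
            )
        out[item["item_number"]] = conditions
    return out


def diff_suites(old: dict, new: dict) -> dict:
    """Report items missing from either suite or whose sentences changed."""
    a, b = item_sentences(old), item_sentences(new)
    return {
        "only_in_old": sorted(a.keys() - b.keys()),
        "only_in_new": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


def _toy_suite(sentences: dict) -> dict:
    """Build a one-condition, one-region suite for illustration only."""
    return {"items": [
        {"item_number": n,
         "conditions": [{"condition_name": "c",
                         "regions": [{"region_number": 1, "content": s}]}]}
        for n, s in sentences.items()
    ]}


old_suite = _toy_suite({1: "The dog barked", 2: "A cat slept"})
new_suite = _toy_suite({1: "The dog barks", 3: "A bird sang"})
print(diff_suites(old_suite, new_suite))
# {'only_in_old': [2], 'only_in_new': [3], 'changed': [1]}
```

With real files this would be `diff_suites(load_suite(local_path), load_suite(upstream_path))`, where both paths are placeholders for the local mirror and the upstream suite.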