fixing-sg-details by jimn2 · Pull Request #191 · brain-score/language

jimn2 · 2023-06-06T19:05:36Z

Removed the local SyntaxGym benchmark JSON files.

Fixed the npz_obj value in gpt2_precomputed.py.

hans · 2023-06-07T15:45:29Z

Hi, I'll figure out this discrepancy today. Currently the Huggingface implementation yields 91.67%, but there may well be a bug in that implementation.

>>> import datasets
>>> import evaluate

>>> ds = datasets.load_dataset("cpllab/syntaxgym", "npz_obj")
>>> metric = evaluate.load("cpllab/syntaxgym")
>>> result = metric.compute(dataset=ds["test"], model_id="distilgpt2")
>>> result["npz_obj"].accuracy
0.9166666666666666

hans · 2023-06-07T15:51:26Z

I also get 91.67% when manually invoking npz_obj through Brain Score.

>>> from brainscore_language.model_helpers.huggingface import HuggingfaceSubject
>>> from brainscore_language.benchmarks.syntaxgym.benchmark import SyntaxGymSingleTSE

>>> b = SyntaxGymSingleTSE("npz_obj", "npz_obj")
>>> m = HuggingfaceSubject("distilgpt2", {})
>>> score = b(m)
>>> score
<xarray.Score ()>
array(0.91666667)
Attributes:
    raw:      [ True  True  True  True  True  True  True  True  True  True  T...

Same result with SyntaxGymTSE.

In [13]: bs = SyntaxGymTSE({"npz_obj": "npz_obj"})

In [14]: s2 = bs(m)
/home/jon/miniconda3/envs/brainscore/lib/python3.10/site-packages/brainscore_core/metrics/__init__.py:94: UserWarning: failed to merge raw values: 'numpy.ndarray' object has no attribute 'rename'
  warnings.warn("failed to merge raw values: " + str(e))

In [15]: s2
Out[15]: 
<xarray.Score ()>
array(0.91666667)
Attributes:
    raw:         [ True  True  True  True  True  True  True  True  True  True...
    sub_scores:  <xarray.Score (sub_benchmark: 1)>\narray([0.91666667])\nCoor...

hans · 2023-06-07T16:47:13Z

Well, this is interesting. score returns a different value (0.8333) than the manual invocation above (0.9167).

In [1]: from brainscore_language import score
In [3]: ret2 = score("distilgpt2", "syntaxgym-npz_obj")
...

In [4]: ret2
Out[4]: 
<xarray.Score ()>
array(0.83333333)
Attributes:
    raw:                   [ True False  True  True  True  True  True False  ...
    model_identifier:      distilgpt2
    benchmark_identifier:  syntaxgym-npz_obj
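To make the two numbers concrete: assuming the scalar score is simply the mean of the per-item booleans in raw, one extra failing item is enough to move the result from 0.9167 to 0.8333. The sketch below uses a hypothetical 12-item suite; the real item count and the full raw vectors are truncated in the output above, so these vectors are illustrative only.

```python
# Hypothetical per-item outcomes (True = prediction satisfied).
# These are NOT the benchmark's actual raw values, which are truncated above.
raw_manual = [True] * 11 + [False]                                  # 1 failing item
raw_via_score = [True, False] + [True] * 5 + [False] + [True] * 4   # 2 failing items


def accuracy(raw):
    """Fraction of items whose prediction holds (assumed scoring rule)."""
    return sum(raw) / len(raw)


acc_manual = accuracy(raw_manual)
acc_via_score = accuracy(raw_via_score)
print(round(acc_manual, 4), round(acc_via_score, 4))
```

Under that assumption, comparing the two raw vectors item by item is the quickest way to see which items flipped.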


hans · 2023-06-07T17:00:18Z

The numerical mismatch comes from differing test suite content: the copy in the GitHub mirror (referenced in test_suites.json) is not identical to the local files we were previously referencing.
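A content difference like this can be checked by diffing two copies of a suite item by item. The following is a minimal sketch, assuming each suite is a JSON object with an "items" list whose entries carry an "item_number" key; the helper name differing_items and the toy data are hypothetical and not part of this repository.

```python
def differing_items(suite_a: dict, suite_b: dict) -> list:
    """Return the item numbers whose content differs between two suite dicts."""
    # Index both suites by item number (assumed key: "item_number").
    items_a = {item["item_number"]: item for item in suite_a["items"]}
    items_b = {item["item_number"]: item for item in suite_b["items"]}
    # An item counts as differing if it is missing on one side or not equal.
    numbers = sorted(set(items_a) | set(items_b))
    return [n for n in numbers if items_a.get(n) != items_b.get(n)]


# Toy stand-ins for a local copy and a mirrored copy of the same suite.
local_copy = {"items": [
    {"item_number": 1, "conditions": ["a", "b"]},
    {"item_number": 2, "conditions": ["c", "d"]},
]}
mirror_copy = {"items": [
    {"item_number": 1, "conditions": ["a", "b"]},
    {"item_number": 2, "conditions": ["c", "CHANGED"]},
]}

changed = differing_items(local_copy, mirror_copy)
print(changed)
```

With real files, the two dicts would come from json.load on the local file and on the downloaded mirror copy.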

hans · 2023-06-07T17:05:57Z

I fixed the reference in test_suites.json in a8e1b83. The accuracy now comes out to 0.9167.

hans · 2023-06-07T17:09:06Z

@jimn2 I'm fine with removing the local mirror of the test suite JSONs, but this breaks the functionality in SyntaxGymSingleTSE._load_suite. Can you update this to reference test_suites.json, and update the relevant outdated documentation at the top of the module?
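One possible shape for a registry-based loader, as a rough sketch only: it assumes test_suites.json is a flat JSON object mapping suite names to URLs, which may not match the real file layout. The function names, the example URL, and that schema are all hypothetical; this is not the existing _load_suite implementation.

```python
import json
from pathlib import Path
from urllib.request import urlopen


def resolve_suite_ref(suite_name: str, registry: dict) -> str:
    """Look up the reference (assumed to be a URL) for a suite name."""
    try:
        return registry[suite_name]
    except KeyError:
        raise ValueError(f"unknown test suite: {suite_name!r}") from None


def load_suite(suite_name: str, registry_path: str) -> dict:
    """Read the registry file, then fetch and parse the referenced suite JSON."""
    registry = json.loads(Path(registry_path).read_text())
    with urlopen(resolve_suite_ref(suite_name, registry)) as response:
        return json.load(response)


# In-memory registry with a placeholder URL (hypothetical schema and value).
example_registry = {"npz_obj": "https://example.org/suites/npz_obj.json"}
print(resolve_suite_ref("npz_obj", example_registry))
```

Keeping the lookup separate from the download makes the name resolution testable without network access.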

jimn2 added 2 commits June 6, 2023 15:04

fixing-sg-details

8df9c68

removed local syntaxgym benchmark json files.

Update gpt2_precomputed.py

2229100

fixing npz_obj value in gpt2_precomputed.py

jimn2 requested review from hans, kvfairchild and mschrimpf June 6, 2023 19:33

fix wrong identifier for npz_obj in registry. does not fix numerical mismatch

a514a68

fix syntaxgym test suite reference commit

a8e1b83

revert reference accuracy for npz_obj

26abb32

mschrimpf approved these changes Jun 8, 2023

jimn2 wants to merge 5 commits into main from jimn2/fixing-sg-details

3 participants
