Some questions about training dataset #27

@MikeDean2367

Great work!

I executed the following command and obtained the data file named wikipedia_links_aligned_spans.json in the folder ~/.cache/refined/datasets.

python3 src/refined/training/train/train.py --experiment_name test

I have two questions regarding this file:

  • Is wikipedia_links_aligned_spans.json the training data?
  • If so, which fields are used for training? I found three fields in wikipedia_links_aligned_spans.json: hyperlinks_clean, hyperlinks, and predicted_spans. I'm not familiar with these three fields, and I'm unsure how to obtain the training data from them.
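
For anyone who wants to look at the same fields, here is a minimal sketch that lists the top-level keys of the first record. It assumes the file stores one JSON object per line (JSON Lines), which is not confirmed in this issue; the helper name peek_fields is hypothetical and is not part of the refined package.

```python
import json
from pathlib import Path


def peek_fields(path, n_records=1):
    """Return the sorted top-level keys of the first n_records non-empty lines.

    Assumption: each non-empty line of the file is one JSON object.
    """
    keys = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if len(keys) >= n_records:
                break
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            keys.append(sorted(record.keys()))
    return keys


if __name__ == "__main__":
    # Path taken from the description above.
    data_file = Path.home() / ".cache/refined/datasets/wikipedia_links_aligned_spans.json"
    if data_file.exists():
        print(peek_fields(data_file))
```

Only the first lines are read, so the sketch stays cheap even if the file is large. If the file turns out to be a single JSON document rather than one object per line, json.loads on the first line will raise an error and the reading step would need to change.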

Thanks!
