Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR removes the dependency on the torchtext library by transplanting its vocabulary functionality into the local codebase under lighthouse/common/vocab. The change eliminates an external dependency while maintaining the same vocabulary functionality needed for text processing.
- Replaces
torchtext.vocabimports with locallighthouse.common.vocabmodule - Creates a complete local implementation of vocabulary classes (
Vocab,GloVe,FastText, etc.) - Updates installation instructions and CI workflows to remove torchtext dependency
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 22 comments.
Show a summary per file
| File | Description |
|---|---|
| training/dataset.py | Updates import to use local vocab module instead of torchtext |
| training/cg_detr_dataset.py | Updates import to use local vocab module instead of torchtext |
| lighthouse/feature_extractor/text_encoders/glove.py | Updates import to use local vocab module instead of torchtext |
| lighthouse/common/vocab/vocab.py | Implements Vocab class with torch.jit compatibility |
| lighthouse/common/vocab/vectors.py | Implements vector classes (GloVe, FastText, CharNGram) for embeddings |
| lighthouse/common/vocab/init.py | Package initialization exposing vocab classes |
| lighthouse/common/vocab.py | Complete vocabulary implementation with all classes |
| mypy.ini | Removes torchtext from mypy ignore list |
| README.md | Removes torchtext from installation instructions |
| .github/workflows/pytest.yml | Removes torchtext from CI dependencies |
| .github/workflows/mypy_ruff.yml | Removes torchtext from CI dependencies |
lighthouse/common/vocab.py
Outdated
| from .utils import reporthook | ||
|
|
There was a problem hiding this comment.
Import error: The module '.utils' is being imported but there's no evidence of a utils.py file in the lighthouse/common directory. The reporthook function is defined within the same file at line 263, making this import unnecessary and likely to cause a runtime error.
| from .utils import reporthook |
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), |
There was a problem hiding this comment.
Type mismatch: The dim parameter is passed as a string "300" but the GloVe constructor expects an integer. This should be dim=300 without quotes.
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | |
| "glove.42B.300d": partial(GloVe, name="42B", dim=300), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim=300), |
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), |
There was a problem hiding this comment.
Type mismatch: The dim parameter is passed as a string "300" but the GloVe constructor expects an integer. This should be dim=300 without quotes.
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | |
| "glove.42B.300d": partial(GloVe, name="42B", dim=300), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim=300), |
| "fasttext.simple.300d": partial(FastText, language="simple"), | ||
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | ||
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), |
There was a problem hiding this comment.
Type mismatch: The dim parameter is passed as a string "25" but the GloVe constructor expects an integer. This should be dim=25 without quotes.
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25), |
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | ||
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | ||
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | ||
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | ||
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | ||
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | ||
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | ||
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | ||
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), |
There was a problem hiding this comment.
Type mismatch: The dim parameter is passed as a string "50" but the GloVe constructor expects an integer. This should be dim=50 without quotes.
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), | |
| "glove.42B.300d": partial(GloVe, name="42B", dim=300), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim=300), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim=50), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim=100), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim=200), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim=50), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim=100), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim=200), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim=300), |
lighthouse/common/vocab.py
Outdated
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | ||
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | ||
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | ||
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | ||
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | ||
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | ||
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | ||
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | ||
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), |
There was a problem hiding this comment.
Type mismatch: The dim parameter is passed as a string "50" but the GloVe constructor expects an integer. This should be dim=50 without quotes.
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), | |
| "glove.42B.300d": partial(GloVe, name="42B", dim=300), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim=300), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim=50), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim=100), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim=200), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim=50), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim=100), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim=200), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim=300), |
lighthouse/common/vocab.py
Outdated
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | ||
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | ||
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | ||
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | ||
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | ||
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | ||
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | ||
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | ||
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), |
There was a problem hiding this comment.
Type mismatch: The dim parameter is passed as a string "100" but the GloVe constructor expects an integer. This should be dim=100 without quotes.
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), | |
| "glove.42B.300d": partial(GloVe, name="42B", dim=300), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim=300), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim=50), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim=100), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim=200), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim=50), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim=100), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim=200), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim=300), |
lighthouse/common/vocab.py
Outdated
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | ||
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | ||
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | ||
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | ||
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | ||
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | ||
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | ||
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | ||
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), |
There was a problem hiding this comment.
Type mismatch: The dim parameter is passed as a string "200" but the GloVe constructor expects an integer. This should be dim=200 without quotes.
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), | |
| "glove.42B.300d": partial(GloVe, name="42B", dim=300), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim=300), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim=50), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim=100), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim=200), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim=50), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim=100), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim=200), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim=300), |
lighthouse/common/vocab.py
Outdated
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | ||
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | ||
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | ||
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | ||
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | ||
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | ||
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | ||
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | ||
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | ||
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), |
There was a problem hiding this comment.
Type mismatch: The dim parameter is passed as a string "300" but the GloVe constructor expects an integer. This should be dim=300 without quotes.
| "glove.42B.300d": partial(GloVe, name="42B", dim="300"), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim="300"), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim="25"), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim="50"), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim="100"), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim="200"), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim="50"), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim="100"), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim="200"), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim="300"), | |
| "glove.42B.300d": partial(GloVe, name="42B", dim=300), | |
| "glove.840B.300d": partial(GloVe, name="840B", dim=300), | |
| "glove.twitter.27B.25d": partial(GloVe, name="twitter.27B", dim=25), | |
| "glove.twitter.27B.50d": partial(GloVe, name="twitter.27B", dim=50), | |
| "glove.twitter.27B.100d": partial(GloVe, name="twitter.27B", dim=100), | |
| "glove.twitter.27B.200d": partial(GloVe, name="twitter.27B", dim=200), | |
| "glove.6B.50d": partial(GloVe, name="6B", dim=50), | |
| "glove.6B.100d": partial(GloVe, name="6B", dim=100), | |
| "glove.6B.200d": partial(GloVe, name="6B", dim=200), | |
| "glove.6B.300d": partial(GloVe, name="6B", dim=300), |
lighthouse/common/vocab.py
Outdated
| Examples: | ||
| >>> examples = ['chip', 'baby', 'Beautiful'] | ||
| >>> vec = text.vocab.GloVe(name='6B', dim=50) | ||
| >>> ret = vec.get_vecs_by_tokens(tokens, lower_case_backup=True) |
There was a problem hiding this comment.
Variable name inconsistency in docstring: The parameter name is 'examples' in the previous line but 'tokens' is used here. Should be 'ret = vec.get_vecs_by_tokens(examples, lower_case_backup=True)' for consistency.
| >>> ret = vec.get_vecs_by_tokens(tokens, lower_case_backup=True) | |
| >>> ret = vec.get_vecs_by_tokens(examples, lower_case_backup=True) |
|
Suggestions by copilot are trivial and I ignore them. |
|
@h-munakata LGTM. |
This PR is related to #68 .
I transplanted torchtext's vocab into
lighthouse/common/vocab.This change effects to the following scripts:
lighthouse/feature_extractor/text_encoders/glove.pytraining/dataset.pytraining/cg_detr_dataset.pyI confirmed that training runs correctly.