Contributor

@calderjo calderjo commented Dec 5, 2025

we're bumping our image to Python 3.12,

which required the following:
remove numpy-mkl: unfortunately, we were not able to find or install a compatible version, so we opted to remove it, based on a previous conversation we had on this topic. we will instead use the default numpy installed in Colab.

remove cuml installation hack:
thankfully, we are able to use the version pre-installed in the base image without build errors.

unpin packages that were pinned for Learn:
Learn is no longer dependent on this build, so we can freely unpin many packages
-seaborn, scikit-learn, matplotlib, geopandas, TPOT, shapely, tfdf, ydf, etc.

remove incompatible packages:
some of these are no longer supported and cause build issues
-pydegensac, pymc3, eli5, etc.

remove preinstalled packages:
where applicable, we removed packages that are already installed in the Colab base image
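
For illustration only, here is a minimal sketch of how one could check which requested packages a base image already ships. It is not the method used in this PR. The helper names (`normalize`, `installed_names`, `split_preinstalled`) and the example package list are hypothetical; only the standard-library `importlib.metadata` API is assumed.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """Normalize a distribution name (lowercase; runs of '-', '_', '.' become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_names() -> set[str]:
    """Return normalized names of all distributions visible to this interpreter."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken or missing metadata
            names.add(normalize(name))
    return names


def split_preinstalled(requested, installed=None):
    """Split `requested` into (already_installed, still_needed).

    `installed` can be passed explicitly (handy for testing); by default the
    current environment is inspected.
    """
    if installed is None:
        installed = installed_names()
    else:
        installed = {normalize(n) for n in installed}
    already, needed = [], []
    for name in requested:
        (already if normalize(name) in installed else needed).append(name)
    return already, needed


if __name__ == "__main__":
    # Hypothetical package list, not the actual contents of the Dockerfile.
    already, needed = split_preinstalled(["seaborn", "scikit-learn", "eli5"])
    print("already installed:", already)
    print("still needed:", needed)
```

Run inside the base image, anything reported as already installed is a candidate for removal from the install list, subject to checking that the preinstalled version is acceptable.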

https://b.corp.google.com/issues/468103319

@calderjo calderjo changed the title from "Py312" to "Upgrade to Python 3.12" Dec 11, 2025
Dockerfile.tmpl Outdated
# b/456239669: remove huggingface-hub pin when pytorch-lighting and transformer are compatible
# b/315753846: Unpin translate package, currently conflicts with adk 1.17.0
RUN uv pip install --system --force-reinstall --no-deps torchtune gensim "scipy<=1.15.3" "huggingface-hub==0.36.0" "google-cloud-translate==3.12.1"
# b/(xxxxx): Unpin Pandas once cuml/cudf are compatible, version 3.0 causes issues
Contributor Author

@calderjo calderjo Dec 11, 2025

placeholder: we'll add bugs in one go once we stabilize changes.

Contributor

Before or after merging this PR?

Contributor Author

i used to wait until we had a passing sgd (which we used to test Learn) before filing all bugs,
since we could end up needing to unpin or re-pin packages.

given we don't have a Learn dependency anymore, i'll probably add them before merging.

i'm still looking into removing/unpinning more packages aside from these in a different PR; mostly, i like waiting a bit before filing a bunch of bugs.

Contributor

@djherbis djherbis left a comment

Why do we need the massive assets added?

Contributor Author

calderjo commented Dec 11, 2025

@djherbis our "tests/test_keras_nlp.py" test requires them.
it loads the "bert_tiny_en_uncased" model and does a few things with it.

given our tests run offline, we use a mock server and provide the files.

the older version uses bert_tiny_en_uncased version 2; this new one requests version 3.

we'll need to remove version 2 in a follow-up.

if we want to remove the test due to size, we can; otherwise this is needed.
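
For readers unfamiliar with the pattern, here is a minimal sketch of serving checked-in model files from a local mock server so a test can run offline. It is an assumption-laden illustration, not the repo's actual test harness: the helper names (`start_file_server`, `fetch`, `demo`), the directory layout, and the file contents are hypothetical, and only Python's standard library is used.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request


def start_file_server(root):
    """Serve files under `root` on a free localhost port; return (server, base_url)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(root)
    )
    # Port 0 asks the OS for any free port.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}"


def fetch(url):
    """Download `url`, ignoring any proxy settings so localhost is hit directly."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as resp:
        return resp.read()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: one directory per model version.
        asset = pathlib.Path(tmp) / "bert_tiny_en_uncased" / "3" / "config.json"
        asset.parent.mkdir(parents=True)
        asset.write_text('{"placeholder": true}')
        server, base_url = start_file_server(tmp)
        try:
            return fetch(f"{base_url}/bert_tiny_en_uncased/3/config.json")
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    print(demo())
```

With this kind of setup, a request for a version whose files are not present in the served directory gets a 404, which is why a change in the requested model version means new files have to be provided.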

@calderjo calderjo requested a review from djherbis December 11, 2025 22:28
Contributor

rosbo commented Dec 11, 2025

Any reason for not removing version 2 of bert_tiny_en_uncased now that we are using version 3 in this PR?

https://github.com/Kaggle/docker-python/tree/main/tests/data/kagglehub/models/keras/bert/keras/bert_tiny_en_uncased/2

"nvidia-nvjitlink-cu12==12.5.82"
RUN uv pip install --system --force-reinstall "pynvjitlink-cu12==0.5.2"

# b/385145217 Latest Colab lacks mkl numpy, install it.
Contributor

In the PR description, you mention we agreed on removing mkl numpy, do you have a link to this discussion?

Mostly for future reference and my own curiosity.

Contributor Author

created a high-level bug for the Python 3.12 upgrade and posted it there. i tend to avoid posting internal links here (other than bugs): https://b.corp.google.com/issues/468103319

removing mkl numpy is an option for unblocking numpy 2.0, given that we've been stuck on 1.26.4 for a while.
now the new base image (py3.12) requires 2.2.2.

we currently install mkl numpy from this index, but it doesn't provide anything past 1.26.4: https://pypi.anaconda.org/intel/simple/numpy/

newer versions can be found in this repo, but unfortunately it doesn't have a 2.2.2 version, which this new base image really needs (as in, many packages break without it): https://software.repos.intel.com/python/pypi
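
As a rough illustration of the check described above, the sketch below takes a list of wheel filenames (as one might copy from a package index listing) and reports the newest version on offer and whether a required version is present. The filenames are made up for the example and the helpers are hypothetical; the version parsing only handles plain dotted releases such as `1.26.4`, not pre-releases or local version tags.

```python
def wheel_version(filename: str) -> str:
    """Extract the version field from a wheel filename (name-version-...whl)."""
    return filename.split("-")[1]


def version_key(version: str) -> tuple[int, ...]:
    """Turn a plain release version like '1.26.4' into a sortable tuple."""
    return tuple(int(part) for part in version.split("."))


def newest_available(filenames):
    """Return the highest version found among the given wheel filenames."""
    return max((wheel_version(f) for f in filenames), key=version_key)


def provides(filenames, required: str) -> bool:
    """Return True if any wheel in `filenames` has exactly the required version."""
    wanted = version_key(required)
    return any(version_key(wheel_version(f)) == wanted for f in filenames)


# Made-up filenames for illustration; not copied from any real index.
EXAMPLE_WHEELS = [
    "numpy-1.25.2-cp311-cp311-manylinux_2_17_x86_64.whl",
    "numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.whl",
]

if __name__ == "__main__":
    print("newest:", newest_available(EXAMPLE_WHEELS))
    print("has 2.2.2:", provides(EXAMPLE_WHEELS, "2.2.2"))
```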

Contributor

@rosbo rosbo left a comment

Thanks for the clarification and good work!

@djherbis
Contributor

@djherbis our "tests/test_keras_nlp.py" test requires them. it loads the "bert_tiny_en_uncased" model and does a few things with it.

given our tests run offline, we use a mock server and provide the files.

the older version uses bert_tiny_en_uncased version 2; this new one requests version 3.

we'll need to remove version 2 in a follow-up.

if we want to remove the test due to size, we can; otherwise this is needed.

It's fine for now I guess; it just generally doesn't feel great to commit a lot of extra data into the repo.
Maybe eventually our tests can pull in the extra data another way.

@calderjo calderjo merged commit 4ae73d1 into main Dec 12, 2025
3 checks passed
@calderjo calderjo deleted the py312 branch December 12, 2025 20:27

3 participants