Add Notebook-scoped packages for command submits or notebook job run #1321
Conversation
Signed-off-by: Federico Manuel Gomez Peter <federico.gomez@payclip.com>
tejassp-db
left a comment
Have you taken this into account? The Databricks docs state: "Starting with Databricks Runtime 13.0 %pip commands do not automatically restart the Python process. If you install a new package or update an existing package, you may need to use dbutils.library.restartPython() to see the new packages."
- https://docs.databricks.com/aws/en/libraries/notebooks-python-libraries#manage-libraries-with-pip-commands
- https://docs.databricks.com/aws/en/libraries/restart-python-process
Also, have you considered the recommendation on driver node sizes for notebook-scoped libraries?
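For reference, the pattern those docs describe is an install immediately followed by a restart in the notebook (the package name here is just a placeholder):

%pip install some-package
dbutils.library.restartPython()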
sd-db
left a comment
Thanks for the PR. I took a look at the code changes (didn't review the tests yet) and added comments.
logger.debug("Submitting Python model using the Command API.")

# Prepare code with notebook-scoped package installation if needed
code_to_execute = self._prepare_code_with_packages(compiled_code)
We should ensure we only call this path if notebook_scoped_libraries is set.
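A minimal sketch of that guard, using the names already in this diff:

if parsed_model.config.notebook_scoped_libraries:
    code_to_execute = self._prepare_code_with_packages(compiled_code)
else:
    code_to_execute = compiled_code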
    parsed_model.config.packages if parsed_model.config.notebook_scoped_libraries else []
)

def _prepare_code_with_packages(self, compiled_code: str) -> str:
Instead of duplicating this, it would be better to extract it into a reusable function. Maybe something like:
def prepare_code_with_packages(compiled_code: str, packages: list[str], separator: str = "\n\n") -> str:
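One possible body for that helper, folding in the restartPython() suggestion below; this is a sketch of the suggested refactor, not the final implementation:

def prepare_code_with_packages(compiled_code: str, packages: list[str], separator: str = "\n\n") -> str:
    # No packages configured: return the model code unchanged
    if not packages:
        return compiled_code
    # Build the notebook-scoped install, then restart Python so the
    # newly installed packages are visible to the model code
    pip_install_cmd = "%pip install " + " ".join(packages)
    restart_cmd = "dbutils.library.restartPython()"
    return separator.join([pip_install_cmd, restart_cmd, compiled_code])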
# Only add packages to cluster-level libraries if not using notebook-scoped
if not notebook_scoped_libraries:
    for package in packages:
        if index_url:
Let us look to add support for index_url in notebook-scoped packages as well. Since this is supported today, it would be confusing if a model silently broke when a user who has set index_url switches to notebook-scoped packages.
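A sketch of how the install string could carry the index URL through pip's --index-url flag; the helper name and its wiring into the submitter are hypothetical:

def build_pip_install_cmd(packages: list[str], index_url: str | None = None) -> str:
    # Mirror the cluster-level library behavior by passing the custom index to pip
    parts = ["%pip install", "-q"]
    if index_url:
        parts.append(f"--index-url {index_url}")
    parts.extend(packages)
    return " ".join(parts)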
logger.debug(f"Adding notebook-scoped package installation: {pip_install_cmd}")

# Prepend the pip install command to the compiled code
return f"{pip_install_cmd}\n\n# COMMAND ----------\n{compiled_code}"
As recommended by @tejassp-db, let us add dbutils.library.restartPython() after the pip_install_cmd
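A minimal sketch of how that could look in the return statement, keeping the existing cell separator; the exact cell layout may need tweaking:

return f"{pip_install_cmd}\ndbutils.library.restartPython()\n\n# COMMAND ----------\n{compiled_code}"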
    return compiled_code

# Build the %pip install command for notebook-scoped packages
pip_install_cmd = "%pip install " + " ".join(self.packages)
Maybe we can try %pip install -q {packages} to reduce noise in the notebook output.
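For illustration, with hypothetical package names the generated first cell would then read:

%pip install -q pandas requests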
I didn't take into account the recommendation to restart the Python process, thanks for that. I will refactor this draft PR to make it more production-ready (avoiding duplicated code) and address it. As for how large the driver node should be, this solution is agnostic of cluster size; it depends on the team that previously defined the cluster (whether an all-purpose cluster, a job cluster, or anything else).
Resolves #1320
Description
Adds a feature to install packages in a notebook-scoped environment for the PythonCommandSubmitter and PythonNotebookUploader classes.
Execution Test
I made the changes and tested them in our environment, verifying that the compiled code now has the package installation prepended.
Checklist
I have updated CHANGELOG.md and added information about my change to the "dbt-databricks next" section.