
Conversation


@fedemgp commented Feb 5, 2026

Resolves #1320

Description

Adds a feature to install packages in a notebook-scoped environment for the PythonCommandSubmitter and PythonNotebookUploader classes.
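For context, a sketch of how a Python model could opt in; the packages and notebook_scoped_libraries config names come from the diff below, while the model body and the package pin are purely illustrative:

def model(dbt, session):
    # Hypothetical example config; exact option names and values may differ.
    dbt.config(
        materialized="table",
        packages=["pandas==2.2.2"],        # installed per notebook instead of on the cluster
        notebook_scoped_libraries=True,    # the new option added by this PR
    )
    return session.createDataFrame([(1, "a")], ["id", "val"])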

Execution Test

I made the changes and tested them in our environment, verifying that the compiled code now has the prepended package installation.

(Screenshot: compiled code showing the prepended package installation)

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

Signed-off-by: Federico Manuel Gomez Peter <federico.gomez@payclip.com>

@tejassp-db (Collaborator) left a comment


Have you taken this into account? The Databricks docs state: "Starting with Databricks Runtime 13.0 %pip commands do not automatically restart the Python process. If you install a new package or update an existing package, you may need to use dbutils.library.restartPython() to see the new packages."

Also, have you considered the recommendation on driver node sizes for notebook-scoped libraries?


@sd-db (Collaborator) left a comment


Thanks for the PR. I took a look at the code changes (I haven't reviewed the tests yet) and added comments.

logger.debug("Submitting Python model using the Command API.")

# Prepare code with notebook-scoped package installation if needed
code_to_execute = self._prepare_code_with_packages(compiled_code)


We should ensure we only call this path if notebook_scoped_libraries is set.
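For illustration, a minimal sketch of that guard (the notebook_scoped_libraries attribute is assumed here, not the actual implementation):

# Sketch: only rewrite the compiled code when the model opts in.
if self.notebook_scoped_libraries:  # hypothetical attribute
    code_to_execute = self._prepare_code_with_packages(compiled_code)
else:
    code_to_execute = compiled_code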

parsed_model.config.packages if parsed_model.config.notebook_scoped_libraries else []
)

def _prepare_code_with_packages(self, compiled_code: str) -> str:


Instead of duplicating this logic, it would be better to extract it into a reusable helper function. Maybe something like:

def prepare_code_with_packages(compiled_code: str, packages: list[str], separator: str = "\n\n") -> str:
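A rough sketch of what such a helper might look like, with the separator parameter letting the notebook path pass its # COMMAND ---------- cell delimiter (illustrative only, not the final implementation):

def prepare_code_with_packages(
    compiled_code: str, packages: list[str], separator: str = "\n\n"
) -> str:
    # Sketch: leave the code untouched when there is nothing to install.
    if not packages:
        return compiled_code
    pip_install_cmd = "%pip install " + " ".join(packages)
    return separator.join([pip_install_cmd, compiled_code])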

# Only add packages to cluster-level libraries if not using notebook-scoped
if not notebook_scoped_libraries:
    for package in packages:
        if index_url:


Let's look at adding support for index_url in notebook-scoped packages as well. It is supported today, so it would be surprising if a model silently broke when a user who has set index_url switches to notebook-scoped packages.
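A hedged sketch of carrying index_url into the notebook-scoped install command (variable names assumed from the surrounding code; --index-url is a standard pip flag):

# Sketch: forward index_url to the %pip install command as well.
pip_install_cmd = "%pip install"
if index_url:
    pip_install_cmd += f" --index-url {index_url}"
pip_install_cmd += " " + " ".join(packages)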

logger.debug(f"Adding notebook-scoped package installation: {pip_install_cmd}")

# Prepend the pip install command to the compiled code
return f"{pip_install_cmd}\n\n# COMMAND ----------\n{compiled_code}"


As recommended by @tejassp-db, let us add dbutils.library.restartPython() after the pip_install_cmd.
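For example, a sketch of that change, keeping the existing # COMMAND ---------- convention and giving the restart its own cell (illustrative; worth verifying where the restart call should live):

# Sketch: restart the Python process so newly installed packages are importable.
return (
    f"{pip_install_cmd}\n\n"
    f"# COMMAND ----------\n"
    f"dbutils.library.restartPython()\n\n"
    f"# COMMAND ----------\n"
    f"{compiled_code}"
)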

return compiled_code

# Build the %pip install command for notebook-scoped packages
pip_install_cmd = "%pip install " + " ".join(self.packages)


Maybe we can try %pip install -q {packages} to reduce noise in the notebook output.
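That would be a small change to the line above, e.g.:

# Sketch: -q quiets pip's progress output in the notebook.
pip_install_cmd = "%pip install -q " + " ".join(self.packages)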


@fedemgp (Author) commented Feb 10, 2026

> Have you taken this into account? The Databricks docs state: "Starting with Databricks Runtime 13.0 %pip commands do not automatically restart the Python process. If you install a new package or update an existing package, you may need to use dbutils.library.restartPython() to see the new packages."
>
> Also, have you considered the recommendation on driver node sizes for notebook-scoped libraries?

I didn't take into account the recommendation to restart the Python process, thanks for that. I will refactor this draft PR to make it more production-ready (avoiding duplicated code) and take that into account.

Regarding how large the driver node should be, this solution is agnostic about cluster size. That depends on the team that previously defined the cluster (whether it is an all-purpose cluster, a job cluster, or anything else).

Linked issue: Notebook-scoped libraries feature not enable in dbt-databricks