diff --git a/.github/notebook_lists/vanilla_notebooks.txt b/.github/notebook_lists/vanilla_notebooks.txt index dfdae91..ae130f4 100644 --- a/.github/notebook_lists/vanilla_notebooks.txt +++ b/.github/notebook_lists/vanilla_notebooks.txt @@ -3,3 +3,4 @@ recipes/Guard-Rails/HAP.ipynb recipes/Auto_Documentation/Auto_Documentation.ipynb recipes/Text_to_Shell_Exec/Text_to_Shell_Exec.ipynb recipes/Text_to_SQL/Text_to_SQL.ipynb +recipes/generate_sql_and_execute/generate_sql_and_exec.ipynb diff --git a/.github/workflows/vanilla_workflow.yaml b/.github/workflows/vanilla_workflow.yaml index 680c1f1..7277e90 100644 --- a/.github/workflows/vanilla_workflow.yaml +++ b/.github/workflows/vanilla_workflow.yaml @@ -11,6 +11,7 @@ on: - 'recipes/Text_to_Shell/Text_to_Shell.ipynb' - 'recipes/Text_to_Shell_Exec/Text_to_Shell_Exec.ipynb' - 'recipes/Text_to_SQL/Text_to_SQL.ipynb' + - 'recipes/generate_sql_and_execute/generate_sql_and_exec.ipynb' pull_request: branches: - main @@ -21,7 +22,7 @@ on: - 'recipes/Text_to_Shell/Text_to_Shell.ipynb' - 'recipes/Text_to_Shell_Exec/Text_to_Shell_Exec.ipynb' - 'recipes/Text_to_SQL/Text_to_SQL.ipynb' - + - 'recipes/generate_sql_and_execute/generate_sql_and_exec.ipynb' jobs: test-vanilla-notebooks: diff --git a/recipes/generate_sql_and_execute/generate_sql_and_exec.ipynb b/recipes/generate_sql_and_execute/generate_sql_and_exec.ipynb new file mode 100644 index 0000000..e74dc04 --- /dev/null +++ b/recipes/generate_sql_and_execute/generate_sql_and_exec.ipynb @@ -0,0 +1,454 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "24c38e47-9379-4e35-a045-03b50948cca7", + "metadata": {}, + "source": [ + "# Generate SQL using IBM Granite LLM and execute it on IBM DB2 on cloud." 
+ ] + }, + { + "cell_type": "markdown", + "id": "d504f020-73d1-4d77-ac2a-9dc00be5af3a", + "metadata": {}, + "source": [ + "In this notebook, we use an IBM Granite model to generate SQL for a given table schema and execute that SQL on DB2 on IBM Cloud.\n", + "\n", + "The cookbook follows these steps.\n", + "\n", + "1. #### Set up DB2 on IBM Cloud\n", + "2. #### Initialize DB with tables and populate the tables with data\n", + "3. #### Populate the tables with data by executing the following SQL in the DB2 UI\n", + "4. #### Get the user ID, password and deployment ID from the service credentials of the IBM DB2 instance\n", + "5. #### Get the DB2 REST endpoint from IBM DB2 on Cloud > dashboard > Administration section\n", + "6. #### Install IBM Granite Utils and import LangChain utils.\n", + "7. #### Set up IBM Granite LLM from Replicate\n", + "8. #### Develop the prompt to generate SQL for a given query and a table structure\n", + "9. #### Invoke the Granite model with the prompt to get the SQL and apply a regex to extract only the SQL in case of additional text in the generated response\n", + "10. #### Get the HTTP connection to IBM DB2 on Cloud and get the auth token\n", + "11. #### Execute the generated SQL on DB2 on IBM Cloud and get the SQL ID of the execution using the sql_jobs API endpoint\n", + "12. #### Get the output of the SQL execution using the SQL ID through the sql_jobs API" + ] + }, + { + "cell_type": "markdown", + "id": "b9b43834-a88d-450b-9f54-d914f4e2ddac", + "metadata": {}, + "source": [ + "### Prerequisites\n", + "For this recipe we need\n", + "a Replicate account that hosts IBM Granite models, and\n", + "an IBM Cloud account with IBM DB2 deployed in it.\n" + ] + }, + { + "cell_type": "markdown", + "id": "cab4a127-85cd-4adf-8eb3-e10846ced3ff", + "metadata": {}, + "source": [ + "Once the prerequisites are set up, this cookbook passes a prompt to the Granite model and gets the response.
The generated SQL is then passed to IBM DB2 through REST API endpoints and executed on IBM DB2 on Cloud. The output of the SQL is then displayed to the user." + ] + }, + { + "cell_type": "markdown", + "id": "d4d4e736-9d7c-4bbd-a823-d25e8f42c60e", + "metadata": {}, + "source": [ + "### Set up DB2 on IBM Cloud" + ] + }, + { + "cell_type": "markdown", + "id": "3da9a461-a9bd-4005-ae0c-042655e45f9d", + "metadata": {}, + "source": [ + "Visit https://cloud.ibm.com/docs/Db2onCloud?topic=Db2onCloud-getting-started and set up IBM Db2 on Cloud" + ] + }, + { + "cell_type": "markdown", + "id": "96a8a2c6-e7b6-40e1-a1de-3a5b6448e72f", + "metadata": {}, + "source": [ + "### Initialize DB with tables and populate the tables with data" + ] + }, + { + "cell_type": "markdown", + "id": "ae6dbb61-b2b4-4d8e-8a4e-da0c1ee3fd39", + "metadata": {}, + "source": [ + "After setting up IBM DB2 on Cloud, go to the SQL section of the IBM DB2 UI\n", + "and execute the following DDL SQL." + ] + }, + { + "cell_type": "markdown", + "id": "1ce6871c-9a5d-4a2c-b1a4-3a08720e898e", + "metadata": {}, + "source": [ + "CREATE TABLE User (\n", + " user_id INT NOT NULL PRIMARY KEY,\n", + " username VARCHAR(50) NOT NULL,\n", + " email VARCHAR(100) NOT NULL,\n", + " password_hash VARCHAR(100) NOT NULL,\n", + " full_name VARCHAR(100),\n", + " address VARCHAR(255),\n", + " phone_number VARCHAR(20)\n", + ");\n", + "\n", + "CREATE TABLE Product (\n", + " product_id INT NOT NULL PRIMARY KEY,\n", + " name VARCHAR(100) NOT NULL,\n", + " description VARCHAR(500),\n", + " price DECIMAL(10, 2) NOT NULL,\n", + " category VARCHAR(50),\n", + " image_url VARCHAR(255),\n", + " available_quantity INT\n", + ");\n", + "\n", + "CREATE TABLE Store (\n", + " store_id INT NOT NULL PRIMARY KEY,\n", + " name VARCHAR(100) NOT NULL,\n", + " location VARCHAR(255),\n", + " capacity INT,\n", + " available_capacity INT\n", + ");\n", + "\n", + "CREATE TABLE Order (\n", + " order_id INT NOT NULL PRIMARY KEY,\n", + " user_id INT REFERENCES User(user_id),\n", 
+ " order_date TIMESTAMP,\n", + " status VARCHAR(50),\n", + " total_amount DECIMAL(10, 2),\n", + " store_id INT REFERENCES Store(store_id)\n", + ");\n", + "\n", + "CREATE TABLE OrderItem (\n", + " order_item_id INT NOT NULL PRIMARY KEY,\n", + " order_id INT REFERENCES Order(order_id),\n", + " product_id INT REFERENCES Product(product_id),\n", + " quantity INT,\n", + " subtotal DECIMAL(10, 2)\n", + ");" + ] + }, + { + "cell_type": "markdown", + "id": "1f57fad0-986f-4f7a-8f09-247f275792e8", + "metadata": {}, + "source": [ + "#### Populate the tables with data by executing the following SQL in the DB2 UI" + ] + }, + { + "cell_type": "markdown", + "id": "d2fe952d-5aa9-4411-bcd3-48febb85555b", + "metadata": {}, + "source": [ + "INSERT INTO User (user_id, username, email, password_hash, full_name, address, phone_number)\n", + "VALUES\n", + " (1, 'john_doe', 'john@example.com', 'hashed_password', 'John Doe', '123 Main St', '123-456-7890'),\n", + " (2, 'jane_smith', 'jane@example.com', 'hashed_password', 'Jane Smith', '456 Elm St', '987-654-3210');\n", + "\n", + "INSERT INTO Product (product_id, name, description, price, category, image_url, available_quantity)\n", + "VALUES\n", + " (1, 'Toy Car', 'Remote control toy car', 19.99, 'Toys', 'car.jpg', 50),\n", + " (2, 'Action Figure', 'Superhero action figure', 12.99, 'Toys', 'action_figure.jpg', 30),\n", + " (3, 'Laptop', 'High-performance laptop', 899.99, 'Electronics', 'laptop.jpg', 10),\n", + " (4, 'Smartphone', 'Latest smartphone model', 699.99, 'Electronics', 'smartphone.jpg', 15);\n", + "\n", + "INSERT INTO Store (store_id, name, location, capacity, available_capacity)\n", + "VALUES\n", + " (1, 'Center A', 'Location A', 100, 80),\n", + " (2, 'Center B', 'Location B', 150, 120);\n", + "\n", + "INSERT INTO Order (order_id, user_id, order_date, status, total_amount, store_id)\n", + "VALUES\n", + " (1, 1, '2023-08-01', 'Pending', 32.98, 1),\n", + " (2, 2, '2023-08-02', 'Shipped', 19.99, 2);\n", + "\n", + "INSERT INTO 
OrderItem (order_item_id, order_id, product_id, quantity, subtotal)\n", + "VALUES\n", + " (1, 1, 1, 2, 39.98),\n", + " (2, 1, 2, 1, 12.99),\n", + " (3, 2, 1, 1, 19.99);\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "ffc7787a-d76c-42e8-b8ef-d3b4cb23c50c", + "metadata": {}, + "source": [ + "### Get the user ID, password and deployment ID from the service credentials of the IBM DB2 instance" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "6b0590a6-3abc-4158-9990-de3b1e3eb0f7", + "metadata": {}, + "source": [ + "From the db2.authentication key of the service credentials, get the user ID and password\n", + "\n", + "\"db2\": {\n", + " \"authentication\": {\n", + " \"method\": \"direct\",\n", + " \"password\": \"maDVHz4oUqmVE_mod\",\n", + " \"username\": \"tcq27st\"\n", + " },\n", + "\n", + "Also, get the deployment ID from\n", + "\"instance_administration_api\": {\n", + " \"deployment_id\": \"crn:v1:bluemix:public:dashdb-for-transactions:us-south:a/f665a69257a9fbe8b8bf0f77bc176a0\",\n", + " \"instance_id\": \"crn:v1:bluemix:public:\",\n", + " \"root\": \"https://api.db2.cloud.ibm.com/v5/ibm\"\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8523f756-2bbc-4bfe-a0cf-2a418b55e2cf", + "metadata": {}, + "outputs": [], + "source": [ + "db_username = \"username-from-service-credentials\"\n", + "db_password = \"password-from-service-credentials\"" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "a8a9ef9b-a2d4-47cc-a7aa-2732e6cb7293", + "metadata": {}, + "source": [ + "#### Get the DB2 REST endpoint from IBM DB2 on Cloud > dashboard > Administration section" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "185aeb6f-ee05-4e9f-ba97-abb62aec33f4", + "metadata": {}, + "outputs": [], + "source": [ + "db2_rest_endpoint = \"rest-api-host-name-from-IBM-DB2-Administration-section\"\n", + "db2_deployment_id = \"deployment-id-from-service-credentials\"" + ] + }, + { + 
"cell_type": "markdown", + "id": "ab0a77e9-c5fc-482d-81bf-5cee55edded8", + "metadata": {}, + "source": [ + "### Install IBM Granite Utils and import LangChain utils." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dfb239c1-1105-43fd-bda6-1850c267c08f", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install git+https://github.com/ibm-granite-community/granite-kitchen" + ] + }, + { + "cell_type": "markdown", + "id": "c3ff1454-a51b-4fab-b36e-ec9048ca7042", + "metadata": {}, + "source": [ + "### Set up IBM Granite LLM from Replicate" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2a7e0b9f-48e9-4852-a4f2-45d950bce62e", + "metadata": {}, + "outputs": [], + "source": [ + "from ibm_granite_community.notebook_utils import get_env_var\n", + "from langchain_community.llms import Replicate\n", + "\n", + "model_id = \"ibm-granite/granite-8b-code-instruct-128k\"\n", + "\n", + "model = Replicate(\n", + " model=model_id,\n", + " replicate_api_token=get_env_var('REPLICATE_API_TOKEN'),\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "id": "05c26b40-774a-461f-bbc0-81fdb7215fb8", + "metadata": {}, + "source": [ + "### Develop the prompt to generate SQL for a given query and a table structure" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "90c4cf32-05d6-4df0-b7e0-3b5204cc8b37", + "metadata": {}, + "outputs": [], + "source": [ + "user_text = \"display the order items of orders that are marked 'Pending'\"\n", + "prompt = f\"Write an executable SQL Query to {user_text} using the tables enclosed in ~~\\n~~Table: Product\\nColumns: product_id, name, description, price, category, image_url, available_quantity\\nTable: Order\\nColumns: order_id, user_id, order_date, status, total_amount, store_id\\nTable: OrderItem\\nColumns: order_item_id, order_id, product_id, quantity, subtotal\\n~~.\"\n", + "print(f\"Prompt to model : {prompt}\")" + ] + }, + { + "cell_type": "markdown", + "id": 
"2a98b09c-06b1-4eb3-96e8-87b1b1ba83cd", + "metadata": {}, + "source": [ + "#### Invoke the Granite model with the prompt to get the SQL and apply a regex to extract only the SQL in case of additional text in the generated response" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7130f937-7d12-425e-8dfc-4bc1cc9cd840", + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "replicate_response = model.invoke(prompt)\n", + "print(f\"Granite response from Replicate: {replicate_response}\\n\")\n", + "sqlMatch = re.search('SELECT[\\s\\S]*?;', replicate_response, re.IGNORECASE)\n", + "if sqlMatch is None:\n", + "    raise ValueError(\"No SELECT statement ending in ';' found in the model response\")\n", + "sqlString = sqlMatch.group()\n", + "print(f\"Extracted SQL from LLM response\\n\\n{sqlString}\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "3dcdec95-a9a0-400a-86d6-acb85c85af69", + "metadata": {}, + "source": [ + "### Get the HTTP connection to IBM DB2 on Cloud and get the auth token" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b71c7582-7137-491b-911e-673e55bcef7a", + "metadata": {}, + "outputs": [], + "source": [ + "import http.client\n", + "import json\n", + "conn = http.client.HTTPSConnection(db2_rest_endpoint)\n", + "\n", + "payload = {\"userid\":db_username,\"password\":db_password}\n", + "headers = {\n", + " 'content-type': \"application/json\",\n", + " 'x-deployment-id': db2_deployment_id\n", + " }\n", + "has_auth=False\n", + "try:\n", + " conn.request(\"POST\", \"/dbapi/v4/auth/tokens\", json.dumps(payload), headers)\n", + " res = conn.getresponse()\n", + " data = res.read()\n", + " auth_token = data.decode(\"utf-8\")\n", + " auth_resp = json.loads(auth_token)\n", + " print(auth_resp['token'])\n", + " print(\"Got Auth Token\")\n", + " has_auth=True\n", + "except Exception as e:\n", + " print(f\"An exception occurred: {e}\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "3e216b51-ec53-4afd-b4bb-7c6cb7b798c3", + "metadata": {}, + "source": [ + "### Execute the generated SQL on DB2 on IBM Cloud and get the SQL 
ID of the execution using the sql_jobs API endpoint" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c8c0d3bf-959b-4694-b8ce-6e00e9c3847d", + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "if has_auth:\n", + " payload = {\"commands\":sqlString,\"limit\":10,\"separator\":\";\",\"stop_on_error\" : \"no\"}\n", + " headers = {\n", + " 'content-type': \"application/json\",\n", + " 'authorization': f\"Bearer {auth_resp['token']}\",\n", + " 'x-deployment-id': db2_deployment_id\n", + " }\n", + " \n", + " conn.request(\"POST\", \"/dbapi/v4/sql_jobs\", json.dumps(payload), headers)\n", + " \n", + " res = conn.getresponse()\n", + " data = res.read()\n", + " json_data = json.loads(data)\n", + " sql_id = json_data['id']\n", + " print(\"Executed SQL on DB..\\n\")\n", + " print(json_data)\n", + " print()\n", + " print(f\"SQL ID : {sql_id}\")" + ] + }, + { + "cell_type": "markdown", + "id": "94b3e3a4-1ded-4cfb-843f-6ee10de86829", + "metadata": {}, + "source": [ + "### Get the output of the SQL execution using the SQL ID through the sql_jobs API" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fcbb0839-9c57-4896-b10c-3eb997a79bb4", + "metadata": {}, + "outputs": [], + "source": [ + "if has_auth:\n", + " conn.request(\"GET\", f\"/dbapi/v4/sql_jobs/{sql_id}\", headers=headers)\n", + " res = conn.getresponse()\n", + " data = res.read()\n", + " \n", + " json_data = json.loads(data)\n", + " # print(data.decode(\"utf-8\"))\n", + " print(json_data['results'][0]['columns'])\n", + " for row in json_data['results'][0]['rows']:\n", + " print(row)\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.10" + } + 
}, + "nbformat": 4, + "nbformat_minor": 5 +}