Refactoring Frontend #289

Open
wants to merge 20 commits into base: legion-0.2
Commits (20)
bc1dcc7
Refactors Gradio Frontend into multiple Classes
sebastianGruenwald Nov 5, 2024
cea94f1
Unifies all imports to sql_migration_assistant.
sebastianGruenwald Nov 5, 2024
f4820ed
Optimizes Imports and reformats files
sebastianGruenwald Nov 5, 2024
39813af
Makes Module installable. Tested in notebook
sebastianGruenwald Nov 6, 2024
7f31ad1
Makes Module installable. Tested in notebook
sebastianGruenwald Nov 6, 2024
d6a0cc8
Refactoring
sebastianGruenwald Nov 19, 2024
1748700
changes config path from absolute to relative
sebastianGruenwald Nov 21, 2024
daefae2
adds option to configure profile in commandline
sebastianGruenwald Nov 21, 2024
7b97c24
adds configs to installable files to remove dependency to Workspace p…
sebastianGruenwald Nov 21, 2024
5f91084
adds configs to installable files to remove dependency to Workspace p…
sebastianGruenwald Nov 21, 2024
027c15f
adds configs to installable files to remove dependency to Workspace p…
sebastianGruenwald Nov 21, 2024
2e25ae5
bug fixes
sebastianGruenwald Nov 21, 2024
3e45275
Formatting
sebastianGruenwald Nov 21, 2024
eadeaca
Merge branch 'main' into legion/refactoring/refactor_frontend
sebastianGruenwald Nov 21, 2024
0d72473
Merges main
sebastianGruenwald Nov 21, 2024
88aa9c5
Changed formatting to black
sebastianGruenwald Nov 26, 2024
107f951
removed unnecessary comments
sebastianGruenwald Nov 26, 2024
9be7b9d
removed unnecessary class
sebastianGruenwald Nov 26, 2024
a448adb
Fix for save prompt
sebastianGruenwald Nov 26, 2024
d7a959e
Removed unnecessary code
sebastianGruenwald Nov 26, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -130,6 +130,7 @@ ipython_config.py
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
poetry.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
4 changes: 2 additions & 2 deletions cli.py
@@ -10,9 +10,9 @@ def ip_access_list_analyzer(**args):
import ip_access_list_analyzer.ip_acl_analyzer as analyzer
analyzer.main(args)

def sql_migration_assistant(**args):
def sql_migration_assistant(**kwargs):
from sql_migration_assistant import hello
hello()
hello(**kwargs)

MAPPING = {
"ip-access-list-analyzer": ip_access_list_analyzer,
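The switch from `**args` to `**kwargs` in `cli.py` matters because the dispatcher forwards parsed CLI flags into each tool's entry point. A minimal sketch of that dispatch pattern, using stub functions rather than the repo's actual entry points:

```python
# Stub entry points standing in for the real tools wired up in cli.py.
def ip_access_list_analyzer(**kwargs):
    return {"tool": "ip-access-list-analyzer", "kwargs": kwargs}


def sql_migration_assistant(**kwargs):
    # With **kwargs forwarded (rather than swallowed), flags such as
    # profile="DEV" reach the tool's entry point.
    return {"tool": "sql-migration-assistant", "kwargs": kwargs}


MAPPING = {
    "ip-access-list-analyzer": ip_access_list_analyzer,
    "sql-migration-assistant": sql_migration_assistant,
}


def dispatch(tool, **kwargs):
    # Look up the requested tool and pass the CLI flags straight through.
    return MAPPING[tool](**kwargs)


result = dispatch("sql-migration-assistant", profile="DEV")
print(result["kwargs"])  # {'profile': 'DEV'}
```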
2 changes: 2 additions & 0 deletions sql-migration-assistant/.gitignore
@@ -0,0 +1,2 @@
.databrickscfg
.databricks
@@ -15,45 +15,44 @@ tags:
# Project Legion - SQL Migration Assistant

Legion is a Databricks field project to accelerate migrations onto Databricks by leveraging the platform’s generative AI
capabilities. It uses an LLM for code conversion and intent summarisation, presented to users in a front-end web
application.

Legion provides a chatbot interface to users for translating input code (for example T-SQL to Databricks SQL) and
summarising the intent and business purpose of the code. This intent is then embedded for serving in a Vector Search
index for finding similar pieces of code. This presents an opportunity for increased collaboration (find out who is
working on similar projects), rationalisation (identify duplicates based on intent) and discoverability (semantic
search).

Legion is a solution accelerator - it is *not* a fully baked solution. This is something for you, the customer, to take
on and own. This allows you to present a project to upskill your employees, leverage GenAI for a real use case,
customise the application to your needs, and entirely own the IP.

## Installation Videos


https://github.com/user-attachments/assets/e665bcf4-265f-4a47-81eb-60845a72c798

https://github.com/user-attachments/assets/fa622f96-a78c-40b8-9eb9-f6671c4d7b47

https://github.com/user-attachments/assets/1a58a1b5-2dcf-4624-b93f-214735162584



Setting Legion up is a simple and automated process. Ensure you have the
[Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/index.html) installed and configured with the correct
workspace.

Once the Databricks CLI has been installed and configured, run the following command to install the Databricks Labs
Sandbox and the SQL Migration Assistant.

```bash
databricks labs install sandbox && databricks labs sandbox sql-migration-assistant
```

### What Legion needs - during setup above you will create or choose existing resources for the following:

- A no-isolation shared cluster to host the front-end application.
- A catalog and schema in Unity Catalog.
- A table to store the code intent statements and their embeddings.
- A vector search endpoint and an embedding model: see docs
  https://docs.databricks.com/en/generative-ai/vector-search.html#how-to-set-up-vector-search
- A chat LLM. Pay Per Token is recommended where available, but the setup will also allow for creation of
  a provisioned throughput endpoint.
- A PAT stored in a secret scope chosen by you, under the key `sql-migration-pat`.
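The PAT requirement above can be satisfied ahead of time with the Databricks CLI. A sketch of one possible way to do it; the scope name `legion` is an assumption (any scope you choose works), only the key name `sql-migration-pat` is fixed, and exact flags may differ by CLI version:

```shell
# Create a secret scope (the name "legion" is illustrative).
databricks secrets create-scope legion

# Store a personal access token under the key the assistant expects.
# The CLI prompts for the secret value, so the PAT never lands in shell history.
databricks secrets put-secret legion sql-migration-pat
```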
@@ -13,7 +13,6 @@
import os
import sys


sys.path.insert(0, os.path.abspath("../../python"))
sys.path.append(os.path.abspath("./_theme"))
# -- Project information -----------------------------------------------------
@@ -1,28 +1,22 @@
# Databricks notebook source
# DBTITLE 1,get params
import json

from pyspark.sql.types import (
ArrayType,
StructType,
StructField,
StringType,
MapType,
IntegerType,
TimestampType,
)
import pyspark.sql.functions as f
from pyspark.sql.functions import udf, pandas_udf

agent_configs = json.loads(dbutils.widgets.get("agent_configs"))
app_configs = json.loads(dbutils.widgets.get("app_configs"))


# COMMAND ----------

checkpoint_dir = app_configs["VOLUME_NAME_CHECKPOINT_PATH"]
volume_path = app_configs["VOLUME_NAME_INPUT_PATH"]


# COMMAND ----------

bronze_raw_code = f'{app_configs["CATALOG"]}.{app_configs["SCHEMA"]}.bronze_raw_code'
@@ -70,7 +64,6 @@
"""
)


silver_llm_responses = (
f'{app_configs["CATALOG"]}.{app_configs["SCHEMA"]}.silver_llm_responses'
)
@@ -87,7 +80,6 @@
"""
)


gold_table = (
f'{app_configs["CATALOG"]}.{app_configs["SCHEMA"]}.gold_transformed_notebooks'
)
@@ -104,7 +96,6 @@
"""
)


# COMMAND ----------

# DBTITLE 1,convert agent_configs input string to a dataframe
@@ -1,19 +1,14 @@
# Databricks notebook source
import json

import pyspark.sql.functions as f
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole
import json
import os
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import (
ArrayType,
StructType,
StructField,
StringType,
MapType,
IntegerType,
TimestampType,
)
import pyspark.sql.functions as f
from pyspark.sql.functions import udf, pandas_udf

# COMMAND ----------

@@ -1,10 +1,11 @@
# Databricks notebook source
import base64
import json

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ImportFormat, Language
from pyspark.sql import functions as f
from pyspark.sql.types import *
import json

# COMMAND ----------

@@ -34,6 +35,7 @@
prompt_id = dbutils.jobs.taskValues.get(taskKey="ingest_to_holding", key="promptID")
output_volume_path = app_configs["VOLUME_NAME_OUTPUT_PATH"]


# COMMAND ----------


@@ -110,7 +112,6 @@ def write_notebook_code(llm_responses):

gold_df.display()


# COMMAND ----------

temp_table_name = "gold_temp"
11 changes: 11 additions & 0 deletions sql-migration-assistant/requirements.txt
@@ -0,0 +1,11 @@
databricks-sdk==0.30.0
pyyaml
databricks-labs-blueprint==0.8.2
databricks-labs-lsql==0.9.0
gradio==5.5.0
aiohttp==3.10.5
fastapi
pydantic==2.8.2
dbtunnel==0.14.6
mlflow
openai
@@ -7,17 +7,13 @@
# MAGIC If you want to share the app with users outside of Databricks, for example so non-technical SMEs can contribute to LLM prompt development, the notebook needs to run on a no-isolation shared cluster.

# COMMAND ----------
pip install databricks-sdk -U -q
%pip install .

# COMMAND ----------
pip install gradio==4.27.0 pyyaml aiohttp==3.10.5 databricks-labs-blueprint==0.8.2 databricks-labs-lsql==0.9.0 -q
dbutils.library.restartPython()

# COMMAND ----------
pip install fastapi==0.112.2 pydantic==2.8.2 dbtunnel==0.14.6 openai -q

# COMMAND ----------
dbutils.library.restartPython()
from sql_migration_assistant.utils.runindatabricks import run_app

# COMMAND ----------
from utils.runindatabricks import run_app
run_app()
26 changes: 26 additions & 0 deletions sql-migration-assistant/setup.py
@@ -0,0 +1,26 @@
from setuptools import setup, find_packages


# Read the requirements.txt file
def load_requirements(filename="requirements.txt"):
with open(filename, "r") as file:
return file.read().splitlines()


setup(
name="sql_migration_assistant",
version="0.1",
packages=find_packages(where="src"), # Specify src as the package directory
package_dir={"": "src"},
include_package_data=True, # Include files specified in MANIFEST.in
package_data={
"sql_migration_assistant": ["config.yml"], # Include YAML file
},
classifiers=[
"Programming Language :: Python :: 3",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
],
install_requires=load_requirements(),
python_requires=">=3.10",
)
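The `load_requirements` helper in the new `setup.py` simply splits `requirements.txt` into lines. A quick sketch of that behaviour against a temporary file (not the repo's real requirements file); note that blank lines would come through as empty strings, so filtering them would be a small hardening:

```python
import os
import tempfile


def load_requirements(filename="requirements.txt"):
    # Same pattern as in setup.py: one requirement per line, newlines stripped.
    with open(filename, "r") as file:
        return file.read().splitlines()


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "requirements.txt")
    with open(path, "w") as fh:
        fh.write("databricks-sdk==0.30.0\npyyaml\n")
    reqs = load_requirements(path)

print(reqs)  # ['databricks-sdk==0.30.0', 'pyyaml']
```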
@@ -1,12 +1,18 @@
from sql_migration_assistant.utils.initialsetup import SetUpMigrationAssistant
from databricks.sdk import WorkspaceClient
from databricks.labs.blueprint.tui import Prompts
import yaml
from pathlib import Path

import yaml
from databricks.labs.blueprint.tui import Prompts
from databricks.sdk import WorkspaceClient

from sql_migration_assistant.utils.initialsetup import SetUpMigrationAssistant


def hello():
w = WorkspaceClient(product="sql_migration_assistant", product_version="0.0.1")
def hello(**kwargs):
w = WorkspaceClient(
product="sql_migration_assistant",
product_version="0.0.1",
profile=kwargs.get("profile"),
)
p = Prompts()
setter_upper = SetUpMigrationAssistant()
setter_upper.check_cloud(w)
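The new `hello(**kwargs)` signature lets the CLI thread a profile option through to `WorkspaceClient`. A small sketch of the pattern, with a plain dict standing in for the real client: `kwargs.get("profile")` yields `None` when no profile was supplied, so the SDK can fall back to its default credential resolution.

```python
def build_client_kwargs(**kwargs):
    # Mirrors the hello(**kwargs) change: the profile is read with
    # kwargs.get, so it is None when no profile option was given. This
    # dict is a stand-in for the arguments passed to WorkspaceClient.
    return {
        "product": "sql_migration_assistant",
        "product_version": "0.0.1",
        "profile": kwargs.get("profile"),
    }


print(build_client_kwargs(profile="DEV")["profile"])  # DEV
print(build_client_kwargs()["profile"])               # None
```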
@@ -1,8 +1,5 @@
import gradio as gr

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole


class LLMCalls:
def __init__(self, openai_client, foundation_llm_name):
@@ -44,7 +41,7 @@ def call_llm(self, messages, max_tokens, temperature):
def llm_translate(self, system_prompt, input_code, max_tokens, temperature):
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": input_code}
{"role": "user", "content": input_code},
]

# call the LLM end point.
@@ -58,7 +55,7 @@ def llm_intent(self, system_prompt, input_code, max_tokens, temperature):
def llm_intent(self, system_prompt, input_code, max_tokens, temperature):
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": input_code}
{"role": "user", "content": input_code},
]

# call the LLM end point.
@@ -1,4 +1,6 @@
import gradio as gr


class PromptHelper:
def __init__(self, see, catalog, schema, prompt_table):
self.see = see
@@ -1,5 +1,5 @@
from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.core import StatementExecutionExt
from databricks.sdk import WorkspaceClient


class SimilarCode:
16 changes: 16 additions & 0 deletions sql-migration-assistant/src/sql_migration_assistant/config.py
@@ -0,0 +1,16 @@
import os

FOUNDATION_MODEL_NAME = os.environ.get("SERVED_FOUNDATION_MODEL_NAME")
SQL_WAREHOUSE_ID = os.environ.get("DATABRICKS_WAREHOUSE_ID")
VECTOR_SEARCH_ENDPOINT_NAME = os.environ.get("VECTOR_SEARCH_ENDPOINT_NAME")
VS_INDEX_NAME = os.environ.get("VS_INDEX_NAME")
CODE_INTENT_TABLE_NAME = os.environ.get("CODE_INTENT_TABLE_NAME")
CATALOG = os.environ.get("CATALOG")
SCHEMA = os.environ.get("SCHEMA")
VOLUME_NAME = os.environ.get("VOLUME_NAME")
DATABRICKS_HOST = os.environ.get("DATABRICKS_HOST")
TRANSFORMATION_JOB_ID = os.environ.get("TRANSFORMATION_JOB_ID")
WORKSPACE_LOCATION = os.environ.get("WORKSPACE_LOCATION")
VOLUME_NAME_INPUT_PATH = os.environ.get("VOLUME_NAME_INPUT_PATH")
PROMPT_HISTORY_TABLE_NAME = os.environ.get("PROMPT_HISTORY_TABLE_NAME")
DATABRICKS_TOKEN = os.environ.get("DATABRICKS_TOKEN")
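Every value in the new `config.py` comes from `os.environ.get`, which silently returns `None` for unset variables. A hedged sketch of a fail-fast variant (the `REQUIRED` subset here is illustrative, not the module's actual contract):

```python
import os

# Illustrative subset of the variables config.py reads; not the
# module's actual required set.
REQUIRED = ["CATALOG", "SCHEMA", "DATABRICKS_HOST"]


def load_config(required=REQUIRED):
    # os.environ.get returns None for unset variables, so a missing value
    # would only surface later as an odd failure; collecting the gaps up
    # front gives a clearer error message.
    config = {name: os.environ.get(name) for name in required}
    missing = [name for name, value in config.items() if value is None]
    if missing:
        raise RuntimeError(f"Missing environment variables: {missing}")
    return config


os.environ["CATALOG"] = "main"
os.environ["SCHEMA"] = "default"
os.environ["DATABRICKS_HOST"] = "https://example.cloud.databricks.com"
print(load_config()["CATALOG"])  # main
```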