Delta Lake Explorer is a Streamlit application that allows users to explore Delta Lake tables on Azure Data Lake Storage using DuckDB. The application provides a code editor for writing SQL queries, a sidebar for configuring settings, and a result viewer for displaying query results.
- Code Editor: Write and execute SQL queries.
- Query Parsing: Automatically parse and transform queries to use delta_scan.
- Query Timing: Display the time taken to execute queries.
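To make the query-parsing feature concrete, here is a minimal sketch of how a catalog.schema.table reference could be rewritten into a delta_scan call. It is not the project's actual implementation: the function name to_delta_scan, the regular expression, and the assumption that the table path is the root path followed by catalog/schema/table are all illustrative.

```python
import re

# Hypothetical pattern: exactly three dot-separated identifiers after FROM or JOIN.
TABLE_REF = re.compile(
    r"\b(FROM|JOIN)\s+([A-Za-z_]\w*)\.([A-Za-z_]\w*)\.([A-Za-z_]\w*)(?![\w.])",
    re.IGNORECASE,
)


def to_delta_scan(query: str, root_path: str) -> str:
    """Rewrite catalog.schema.table references into delta_scan('abfss://...') calls."""

    def replace(match: re.Match) -> str:
        keyword, catalog, schema, table = match.groups()
        # Skip the root path when it is empty (catalog at the storage root).
        segments = [s for s in (root_path.strip("/"), catalog, schema, table) if s]
        path = "/".join(segments)
        return f"{keyword} delta_scan('abfss://{path}')"

    return TABLE_REF.sub(replace, query)


print(to_delta_scan("SELECT * FROM catalog.schema.table;", "container/path/to"))
```

A regular expression keeps the sketch short, but it would also rewrite matching text inside string literals and it ignores quoted identifiers. A real SQL parser would be the more robust choice.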
- Clone the repository:

      git clone https://github.com/mrjsj/delta-lake-explorer.git
      cd delta-lake-explorer

- Create a virtual environment:

      python -m venv .venv
      source .venv/bin/activate  # On Windows, use .venv\Scripts\activate

- Install the required packages:

      pip install -r requirements.txt

- Rename .streamlit/secrets-template.toml to .streamlit/secrets.toml.
- Fill in the following values in .streamlit/secrets.toml:
  - STORAGE_ACCOUNT_NAME: The name of your Azure storage account.
  - DELTA_LAKE_ROOT_PATH: The root path up to, but not including, the Delta Lake catalog. It consists of the container name followed by the path to the directory that contains the catalog. E.g., if the full Delta table path is abfss://container/path/to/catalog/layer/table, then the root path is container/path/to. If the Delta Lake catalog is at the root of the storage account, then the root path is an empty string.
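If you already know the full path of one Delta table, the root path can be derived by dropping the last three segments (catalog, layer, table). The helper below is a hypothetical illustration of that rule, not part of the application. The function name and the error handling are assumptions.

```python
def root_path_from_table_path(full_path: str) -> str:
    """Return the part of an abfss:// table path that precedes catalog/layer/table."""
    prefix = "abfss://"
    if not full_path.startswith(prefix):
        raise ValueError("expected a path starting with abfss://")
    # Split into path segments, ignoring empty ones caused by stray slashes.
    segments = [s for s in full_path[len(prefix):].split("/") if s]
    if len(segments) < 3:
        raise ValueError("expected at least catalog/layer/table")
    # Everything before the final three segments is the root path.
    return "/".join(segments[:-3])


print(root_path_from_table_path("abfss://container/path/to/catalog/layer/table"))
# prints: container/path/to
```

When the catalog sits directly at the root, there is nothing before the last three segments, so the function returns an empty string.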
- Choose a way to authenticate to Azure. You can use either a service principal or an Azure CLI login. In either case, make sure that the service principal or your personal user has at least the Storage Blob Data Reader role on the storage account.
  - If you choose a service principal, fill in the following values in .streamlit/secrets.toml:
    - AZURE_TENANT_ID: The tenant ID of your Azure AD.
    - AZURE_CLIENT_ID: The client ID of your service principal.
    - AZURE_CLIENT_SECRET: The client secret of your service principal.
  - If you choose Azure CLI login, run az login before running the application.
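The choice between the two authentication modes can be sketched as a function that inspects the configured secrets and builds a matching DuckDB CREATE SECRET statement. This is an assumption about how such a choice might be wired up, not a description of the application's code. The secret name azure_auth and the function names are hypothetical, and the provider options follow the DuckDB azure extension syntax as commonly documented, so verify them against the current DuckDB documentation.

```python
SERVICE_PRINCIPAL_KEYS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def _quote(value: str) -> str:
    """Wrap a value in single quotes, doubling embedded quotes for SQL."""
    return "'" + value.replace("'", "''") + "'"


def azure_secret_sql(secrets: dict) -> str:
    """Build a CREATE SECRET statement for whichever auth mode is configured."""
    account = _quote(secrets["STORAGE_ACCOUNT_NAME"])
    if all(secrets.get(key) for key in SERVICE_PRINCIPAL_KEYS):
        # All three service principal values are present: use them.
        options = [
            "TYPE azure",
            "PROVIDER service_principal",
            f"TENANT_ID {_quote(secrets['AZURE_TENANT_ID'])}",
            f"CLIENT_ID {_quote(secrets['AZURE_CLIENT_ID'])}",
            f"CLIENT_SECRET {_quote(secrets['AZURE_CLIENT_SECRET'])}",
            f"ACCOUNT_NAME {account}",
        ]
    else:
        # Otherwise fall back to the credentials cached by `az login`.
        options = [
            "TYPE azure",
            "PROVIDER credential_chain",
            "CHAIN 'cli'",
            f"ACCOUNT_NAME {account}",
        ]
    return "CREATE SECRET azure_auth (" + ", ".join(options) + ");"


statement = azure_secret_sql({"STORAGE_ACCOUNT_NAME": "mystorageaccount"})
```

The generated statement contains the client secret in plain text when a service principal is used, so it should be executed directly rather than logged or displayed.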
Run the Streamlit application:

    streamlit run main.py

Query using DuckDB syntax. Tables must be referenced as catalog.schema.table, e.g.:

    SELECT * FROM catalog.schema.table;

For more information on DuckDB syntax, see the DuckDB documentation.
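The execution time shown alongside results could be measured with a small wrapper like the one below. This is a sketch only: the helper timed and the stand-in fake_query are hypothetical, and in a real setup the callable would wrap something like a DuckDB connection's execute(sql).fetchall().

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(run: Callable[[], T]) -> Tuple[T, float]:
    """Call run() and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = run()
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_query() -> list:
    """Stand-in for executing a SQL query and fetching its rows."""
    return [("row", 1)]


rows, seconds = timed(fake_query)
print(f"{len(rows)} row(s) in {seconds:.3f} s")
```

time.perf_counter is used because it is a monotonic, high-resolution clock intended for measuring short durations.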
This project is licensed under the MIT License. See the LICENSE file for details.
