Posts

Showing posts with the label dataeng

Deploying Streamlit App on Azure App Service (without Docker) using Azure DevOps

First create a web app on Azure App Services. I recommend using Python >= 3.10 to prevent issues (e.g. with Python 3.9 the app didn't load properly). Because we don't use Docker, we select 'Code' for Publish and 'Linux' for Operating System. Once we have the app, we're ready to deploy it with an Azure DevOps pipeline.

1. Archive the code into a zip

stages:
  - stage: Build
    displayName: Build
    dependsOn: []
    jobs:
      - job: Build
        displayName: Build the function app
        steps:
          - task: UsePythonVersion@0
            displayName: "Setting python version to 3.10 as required by functions"
            inputs:
              versionSpec: '3.10'
              architecture: 'x64'
          - task: ArchiveFiles@2
            displayName: "Archive files"
            inputs:
              rootFolderOrFile: "$(System.DefaultWorkingDirectory)"
              includeRootFolder: false
              archiveFile: "$(System.DefaultWorkingDirector…
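To make the setup concrete, here is a minimal sketch of the kind of app the pipeline zips and ships. This is a hypothetical app.py, not code from the post:

    # app.py - hypothetical minimal Streamlit app; the post's actual code is not shown
    import streamlit as st

    st.title("Hello from Azure App Service")

    name = st.text_input("Your name")
    if name:
        st.write(f"Hello, {name}!")

One deployment detail worth knowing: on a Linux App Service code deploy, Streamlit does not start by itself, so a startup command along the lines of python -m streamlit run app.py --server.port 8000 --server.address 0.0.0.0 is typically configured so the app binds to the port App Service expects.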

Data Quality Monitoring Tools

Data quality monitoring tools are essential for ensuring the accuracy and reliability of your data. With so many options on the market, it can be challenging to know which one to choose. In this post, we will compare five popular but different data quality monitoring tools: Soda, Great Expectations (GE), Re_data, Monte Carlo, and LightUp.

Open Source
Soda, GE, and Re_data are all open-source tools, while Monte Carlo and LightUp are not. All the tools are based on Python, except for Monte Carlo, which doesn't specify its base.

Data Sources - In Memory
Soda uses Spark for in-memory data sources, while GE uses pandas and Spark. Re_data doesn't specify, and Monte Carlo and LightUp don't support in-memory data sources.

Data Sources - Database/Lake
Soda supports athena, redshift, bigquery, postgresql, and snowflake, while GE supports athena, bigquery, mssql, mysql, postgresql, redshift, snowflake, sqlite, and trino. Re_data supports dbt, and Monte Carlo supports snowflake, redshift, b…
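To give a feel for how the open-source options are used, here is a minimal sketch of a null check with GE's legacy pandas API. The DataFrame and expectation are illustrative, not from the post, and newer GE releases expose a different entry point:

    import great_expectations as ge
    import pandas as pd

    # Illustrative data: one order_id is missing on purpose.
    df = pd.DataFrame({"order_id": [1, 2, None], "amount": [10.0, 5.0, 7.5]})

    # Wrap the frame so the expect_* methods become available (legacy pandas API).
    gdf = ge.from_pandas(df)

    # Assert that order_id has no missing values; success is False for this data.
    result = gdf.expect_column_values_to_not_be_null("order_id")
    print(result.success)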

Azure ML vs Databricks for deploying machine learning models

Azure Machine Learning (Azure ML) and Databricks Machine Learning (Databricks ML) are two popular cloud-based platforms for data scientists. Both offer a range of tools and services for building and deploying machine learning models at scale. In this blog post, we'll compare Azure ML and Databricks ML, examining their features and capabilities, and highlighting their differences.

Experimentation

Azure ML
The Python API allows you to easily create experiments that you can then track from the UI. You can do interactive runs from a notebook. Logging metrics in these experiments still relies on the MLflow client.

Databricks ML
Creating experiments is also easy with the MLflow API and the Databricks UI. Tracking metrics is really nice with the MLflow API (so nice that Azure ML also uses this client for its model tracking).

Winner
They are pretty evenly matched here, although the fact that Azure ML uses MLflow (a Databricks product) maybe gives the edge to Databricks.

Model Ve…
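Since both platforms standardize on MLflow for experiment tracking, a minimal logging sketch looks the same on either. The experiment name, parameter, and metric values below are made up for illustration:

    import mlflow

    # On Databricks the experiment name is a workspace path (e.g. /Users/<you>/...);
    # on Azure ML a plain name works.
    mlflow.set_experiment("demo-experiment")

    with mlflow.start_run():
        mlflow.log_param("max_depth", 5)   # hyperparameter logged for this run
        mlflow.log_metric("auc", 0.87)     # metric that shows up in the experiment UI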