Notes on running MLflow 3's GenAI Agent with llama.cpp
I tried running MLflow 3's GenAI Agent against llama.cpp, so here are my notes on the steps.
GenAI Agent with MLflow 3 | MLflow
https://mlflow.org/docs/latest/genai/mlflow-3/genai-agent
ggml-org/llama.cpp | GitHub
https://github.com/ggml-org/llama.cpp
Installing llama.cpp and running the API server
Installing llama.cpp
Install llama.cpp as described on the following page.
llama.cpp | ggml-org/llama.cpp | GitHub
https://github.com/ggml-org/llama.cpp/blob/master/docs/install.md
On macOS with Homebrew, it can be installed with:
$ brew install llama.cpp
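As a quick sanity check (assuming the Homebrew formula puts the llama-server binary on the PATH), the build version should be printable with:
$ llama-server --version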
Running the API server
An OpenAI-compatible API server can be started with the following command.
$ llama-server -hf ggml-org/gemma-3-1b-it-GGUF --port 1337
# This pulls ggml-org/gemma-3-1b-it-GGUF from the Hugging Face repository and runs it.
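Once the server is running, it can be exercised directly with curl against the OpenAI-compatible chat completions endpoint. The model name below is just a placeholder; as far as I can tell, a single-model llama-server serves whatever model it loaded regardless of this field.
$ curl http://127.0.0.1:1337/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "gemma-3", "messages": [{"role": "user", "content": "Hello"}]}'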
Installing and running MLflow
MLflow can be installed with the following commands.
$ mkdir mlflow3-server && cd $_
$ python -m venv .venv
$ . .venv/bin/activate
$ pip install --upgrade 'mlflow>=3.0.0rc0' --pre
The MLflow server can be started with the following command.
$ mlflow server --host 127.0.0.1 --port 8080
Open http://localhost:8080/ in a browser and confirm that the MLflow UI is displayed.
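The same check can also be done without a browser; the tracking server should expose a /health endpoint that returns OK (an assumption worth verifying against your MLflow version):
$ curl http://127.0.0.1:8080/health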
Running the test code
Run the experiment following the steps on the page below.
GenAI Agent with MLflow 3 | MLflow
https://mlflow.org/docs/latest/genai/mlflow-3/genai-agent
Installing dependencies
Install the packages the test code depends on.
$ mkdir mlflow3-study && cd $_
$ python -m venv .venv
$ . .venv/bin/activate
$ pip install --upgrade 'mlflow>=3.0.0rc0' --pre
$ pip install langchain
$ pip install langchain-openai
Registering the prompt
Run a script like the following to register the prompt.
add-prompt.py
import mlflow
mlflow.set_tracking_uri("http://127.0.0.1:8080")
mlflow.genai.register_prompt(
    name="chatbot_prompt",
    template="You are a chatbot that can answer questions about IT. Answer this question: {{question}}",
    commit_message="Initial version of chatbot",
)
$ python add-prompt.py
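To double-check the registration, a small sketch like the following loads the prompt back from the registry and renders it. The file name and the format() call filling the {{question}} variable are my assumptions; the official example only uses load_prompt() and to_single_brace_format().
check-prompt.py
import mlflow
mlflow.set_tracking_uri("http://127.0.0.1:8080")
# Load version 1 of the prompt registered by add-prompt.py
prompt = mlflow.genai.load_prompt(name_or_uri="chatbot_prompt", version=1)
# Render the template with {{question}} filled in
# (format() on the returned prompt object is an assumption)
print(prompt.format(question="What is MLflow Tracking?"))
$ python check-prompt.py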
Running the experiment
Run a script like the following to execute the experiment.
experiment.py
import os
import mlflow
import pandas as pd
from langchain.schema.output_parser import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
mlflow.set_tracking_uri("http://127.0.0.1:8080")
mlflow.set_experiment("my-genai-experiment")
mlflow.set_active_model(name="langchain_model")
mlflow.langchain.autolog()
system_prompt = mlflow.genai.load_prompt(name_or_uri="chatbot_prompt", version=1)
prompt = ChatPromptTemplate.from_template(system_prompt.to_single_brace_format())
chain = prompt | ChatOpenAI(model="gemma-3") | StrOutputParser()
questions = [
    "What is MLflow Tracking and how does it work?",
    "What is Unity Catalog?",
    "What are user-defined functions (UDFs)?",
]
outputs = []
for question in questions:
    outputs.append(chain.invoke({"question": question}))
eval_df = pd.DataFrame(
    {
        "messages": questions,
        "expected_response": [
"""MLflow Tracking is a key component of the MLflow platform designed to record and manage machine learning experiments. It enables data scientists and engineers to log parameters, code versions, metrics, and artifacts in a systematic way, facilitating experiment tracking and reproducibility.\n\nHow It Works:\n\nAt the heart of MLflow Tracking is the concept of a run, which is an execution of a machine learning code. Each run can log the following:\n\nParameters: Input variables or hyperparameters used in the model (e.g., learning rate, number of trees). Metrics: Quantitative measures to evaluate the model's performance (e.g., accuracy, loss). Artifacts: Output files like models, datasets, or images generated during the run. Source Code: The version of the code or Git commit hash used. These logs are stored in a tracking server, which can be set up locally or on a remote server. The tracking server uses a backend storage (like a database or file system) to keep a record of all runs and their associated data.\n\n Users interact with MLflow Tracking through its APIs available in multiple languages (Python, R, Java, etc.). By invoking these APIs in the code, you can start and end runs, and log data as the experiment progresses. Additionally, MLflow offers autologging capabilities for popular machine learning libraries, automatically capturing relevant parameters and metrics without manual code changes.\n\nThe logged data can be visualized using the MLflow UI, a web-based interface that displays all experiments and runs. This UI allows you to compare runs side-by-side, filter results, and analyze performance metrics over time. It aids in identifying the best models and understanding the impact of different parameters.\n\nBy providing a structured way to record experiments, MLflow Tracking enhances collaboration among team members, ensures transparency, and makes it easier to reproduce results. It integrates seamlessly with other MLflow components like Projects and Model Registry, offering a comprehensive solution for managing the machine learning lifecycle.""",
"""Unity Catalog is a feature in Databricks that allows you to create a centralized inventory of your data assets, such as tables, views, and functions, and share them across different teams and projects. It enables easy discovery, collaboration, and reuse of data assets within your organization.\n\nWith Unity Catalog, you can:\n\n1. Create a single source of truth for your data assets: Unity Catalog acts as a central repository of all your data assets, making it easier to find and access the data you need.\n2. Improve collaboration: By providing a shared inventory of data assets, Unity Catalog enables data scientists, engineers, and other stakeholders to collaborate more effectively.\n3. Foster reuse of data assets: Unity Catalog encourages the reuse of existing data assets, reducing the need to create new assets from scratch and improving overall efficiency.\n4. Enhance data governance: Unity Catalog provides a clear view of data assets, enabling better data governance and compliance.\n\nUnity Catalog is particularly useful in large organizations where data is scattered across different teams, projects, and environments. It helps create a unified view of data assets, making it easier to work with data across different teams and projects.""",
"""User-defined functions (UDFs) in the context of Databricks and Apache Spark are custom functions that you can create to perform specific tasks on your data. These functions are written in a programming language such as Python, Java, Scala, or SQL, and can be used to extend the built-in functionality of Spark.\n\nUDFs can be used to perform complex data transformations, data cleaning, or to apply custom business logic to your data. Once defined, UDFs can be invoked in SQL queries or in DataFrame transformations, allowing you to reuse your custom logic across multiple queries and applications.\n\nTo use UDFs in Databricks, you first need to define them in a supported programming language, and then register them with the SparkSession. Once registered, UDFs can be used in SQL queries or DataFrame transformations like any other built-in function.\n\nHere\'s an example of how to define and register a UDF in Python:\n\n```python\nfrom pyspark.sql.functions import udf\nfrom pyspark.sql.types import IntegerType\n\n# Define the UDF function\ndef multiply_by_two(value):\n return value * 2\n\n# Register the UDF with the SparkSession\nmultiply_udf = udf(multiply_by_two, IntegerType())\n\n# Use the UDF in a DataFrame transformation\ndata = spark.range(10)\nresult = data.withColumn("multiplied", multiply_udf(data.id))\nresult.show()\n```\n\nIn this example, we define a UDF called `multiply_by_two` that multiplies a given value by two. We then register this UDF with the SparkSession using the `udf` function, and use it in a DataFrame transformation to multiply the `id` column of a DataFrame by two.""",
        ],
        "predictions": outputs,
    }
)
with mlflow.start_run() as evaluation_run:
    eval_dataset = mlflow.data.from_pandas(
        df=eval_df,
        name="eval_dataset",
        targets="expected_response",
        predictions="predictions",
    )
    mlflow.log_input(dataset=eval_dataset)
    result = mlflow.evaluate(
        data=eval_dataset,
        extra_metrics=[
            mlflow.metrics.genai.answer_correctness("openai:/gemma-3"),
            mlflow.metrics.genai.answer_relevance("openai:/gemma-3"),
        ],
        evaluator_config={"col_mapping": {"inputs": "messages"}},
    )
    print(result.tables["eval_results_table"])
$ export OPENAI_API_KEY=DUMMY
$ export OPENAI_API_BASE=http://127.0.0.1:1337
$ python experiment.py
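For reference, instead of setting the environment variables, the base URL and a dummy key could presumably be passed straight to ChatOpenAI; this is just a sketch assuming langchain-openai's base_url and api_key parameters, not what the MLflow example does.
from langchain_openai import ChatOpenAI
# Hypothetical variant of the chain's LLM in experiment.py:
# point the client at the local llama-server directly instead of via env vars.
llm = ChatOpenAI(
    model="gemma-3",
    base_url="http://127.0.0.1:1337",  # same value as OPENAI_API_BASE above
    api_key="DUMMY",  # llama-server does not require a real key unless started with --api-key
)
Note that the environment variables are still the simpler route here, since the answer_correctness/answer_relevance judges are addressed via the openai:/gemma-3 URI and presumably read OPENAI_API_BASE/OPENAI_API_KEY themselves.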
Checking the results in the MLflow UI
Open http://localhost:8080 in a browser.
Selecting "Prompts" in the header shows the registered prompt.
Selecting "Experiments" in the header and then the experiment name in the left-hand menu shows the list of runs for that experiment.
Selecting a run from that list shows its metrics.
Switching to the Traces tab shows the exchanges with the LLM.
That's all.