Rename parameter overview to README.md
# LLM Benchmark v2: Parameter Overview

## CLI Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `ANZAHL` | `int` (positional) | — | Number of models to test (e.g. `4`) |
| `--backend` | choice | `vllm` | Backend preset: `vllm`, `ollama`, `lmstudio` |
| `--url` | string | `None` | Custom base URL; overrides `--backend` (e.g. `http://localhost:9000/v1`) |
| `--model` | string | `None` | Explicit model name; skips auto-detect (e.g. `gemma4:31b`) |
| `--results-dir` | string | `results/` | Output directory |

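The table above maps directly onto an `argparse` definition. The following is a hypothetical reconstruction, not the actual `benchmark_v2.py` parser: the names, choices, and defaults come from the table, while the help texts and `build_parser` itself are assumptions.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of the CLI from the parameter table above;
    # the real benchmark_v2.py parser may differ beyond names/defaults.
    p = argparse.ArgumentParser(prog="benchmark_v2.py")
    p.add_argument("ANZAHL", type=int,
                   help="number of models to test (e.g. 4)")
    p.add_argument("--backend", choices=["vllm", "ollama", "lmstudio"],
                   default="vllm", help="backend preset")
    p.add_argument("--url", default=None,
                   help="custom base URL, overrides --backend")
    p.add_argument("--model", default=None,
                   help="explicit model name, skips auto-detect")
    p.add_argument("--results-dir", default="results/",
                   help="output directory")
    return p
```

Note that `--results-dir` becomes `args.results_dir` after parsing, per the usual `argparse` dash-to-underscore conversion.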
## Backend Presets

| Name | URL |
|------|-----|
| `vllm` | `http://localhost:8000/v1` |
| `ollama` | `http://localhost:11434/v1` |
| `lmstudio` | `http://localhost:1234/v1` |

## Internal Constants

| Constant | Value | Description |
|----------|-------|-------------|
| `DEFAULT_TIMEOUT` | `300.0` s | HTTP timeout per request |
| `MAX_RETRIES` | `3` | Retries on errors (429, 5xx, timeout) |

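A retry loop using these constants might look like the sketch below. Only `DEFAULT_TIMEOUT`, `MAX_RETRIES`, and the retryable error classes (429, 5xx, timeout) are from the README; the `with_retries` helper, the `(status, body)` request shape, and the exponential backoff schedule are assumptions.

```python
import time

DEFAULT_TIMEOUT = 300.0  # HTTP timeout per request, in seconds
MAX_RETRIES = 3          # retries on 429, 5xx, or timeout

# Assumed set of retryable HTTP statuses covering "429, 5xx".
RETRYABLE = {429, 500, 502, 503, 504}

def with_retries(request_fn, max_retries=MAX_RETRIES, sleep=time.sleep):
    """Call request_fn() -> (status, body); retry retryable statuses.

    Hypothetical helper -- benchmark_v2.py's actual HTTP client is not
    shown in this README. Backoff between attempts: 1 s, 2 s, 4 s.
    """
    for attempt in range(max_retries + 1):
        status, body = request_fn()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_retries:
            sleep(2 ** attempt)
    return status, body  # still failing after max_retries retries
```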
## Prompt Blocks

| ID | Block | Title |
|----|-------|-------|
| A1 | Code | Sorting function with missing keys |
| A2 | Code | CSV debugging |
| A3 | Code | HTTP API client |
| B1 | Business | MoE explanation for business customers |
| B2 | Business | Rejection email |
| B3 | Business | revDSG arguments |

## Measured Metrics (per Run)

| Metric | Description |
|--------|-------------|
| `ttft_s` | Time to first token (seconds) |
| `thinking_time_s` | Duration of the `<think>` block (0 if no thinking) |
| `total_time_s` | Total runtime |
| `total_tokens` | Number of generated tokens |
| `tokens_per_sec` | Throughput (tok/s) |

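How these fields relate to each other can be sketched from raw timestamps. The field names match the table; `compute_metrics` and its timestamp arguments are hypothetical.

```python
def compute_metrics(t_start, t_first_token, t_end,
                    total_tokens, thinking_time_s=0.0):
    # Derive the README's metric fields from raw timestamps (seconds).
    total_time_s = t_end - t_start
    return {
        "ttft_s": t_first_token - t_start,
        "thinking_time_s": thinking_time_s,  # 0 if no <think> block
        "total_time_s": total_time_s,
        "total_tokens": total_tokens,
        "tokens_per_sec": total_tokens / total_time_s if total_time_s else 0.0,
    }
```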
## Example Invocations

```bash
# 4 models with vllm (default)
python benchmark_v2.py 4

# 2 models with ollama
python benchmark_v2.py 2 --backend ollama

# 1 model with a custom URL and a fixed model name
python benchmark_v2.py 1 --url http://localhost:9000/v1 --model gemma4:31b

# Custom output directory
python benchmark_v2.py 2 --results-dir /tmp/bench
```

## Output Files

| File/Path | Contents |
|-----------|----------|
| `results/<model>.json` | Metrics for all runs (without raw responses) |
| `results/<model>/<prompt_id>.txt` | Raw response per prompt |
| `results/benchmark_v2_<timestamp>.md` | Markdown report with summary and details |