# OrcaSlicer UI Automation Protocol (v1.0.0)

OrcaSlicer ships an **opt-in, localhost-only JSON-RPC server** that lets external
scripts introspect, drive, and screenshot the running OrcaSlicer GUI. It is built
for end-to-end testing and automation: a script can enumerate the live widget
tree, click buttons, type text, send keyboard shortcuts, wait for UI state, query
high-level application state, and capture both window and 3D-viewport images.

This document is the protocol reference. It describes activation, the transport,
the JSON-RPC envelope, every method, the unified node shape, the target/locator
model, error codes, the set of instrumented automation ids, ImGui specifics,
platform caveats, a quick-start snippet, and planned future work.

---

## 1. Overview & activation

The automation server is **OFF by default**. It is controlled by two command-line
flags:

| Flag | Meaning |
|---|---|
| `--automation-server` | Enables the automation server. |
| `--automation-server-port=PORT` | Overrides the listening port. Optional; the default is **13619**. |

Example:

```bash
OrcaSlicer --automation-server --automation-server-port=13619 model.stl
```

The server binds to **`127.0.0.1` only** (the loopback interface). It is never
exposed on an external network interface.

**Security note (v1):** there is **no authentication token** in v1. The localhost
bind is the *only* security boundary. While the server is enabled, any process
running on the machine can connect to the port and drive the GUI, including by
injecting mouse and keyboard input. The feature is intended for testing and
automation environments, not for production or shared/multi-user machines.
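Scripted test runs usually start OrcaSlicer and then wait until the server is reachable. The sketch below uses only the Python standard library and is not part of OrcaSlicer.

It makes a few assumptions:

- `build_launch_command` and `wait_for_server` are hypothetical helper names.
- The executable name `OrcaSlicer` comes from the example above, and the real binary path will differ per platform and install.
- Readiness is detected by polling the plain-text health page that the server serves at `GET /` (see the Transport section).

```python
import time
import urllib.request

DEFAULT_PORT = 13619  # documented default of --automation-server-port


def build_launch_command(executable, port=None, model=None):
    """Return an argv list that starts OrcaSlicer with the automation server on.

    `executable` is an assumption: the OrcaSlicer binary path on this machine.
    """
    argv = [executable, "--automation-server"]
    if port is not None:
        argv.append(f"--automation-server-port={port}")
    if model is not None:
        argv.append(model)  # optional model file, as in the CLI example
    return argv


def wait_for_server(port=DEFAULT_PORT, timeout_s=60.0, poll_s=0.5):
    """Poll the health page (GET /) until it answers or the timeout expires.

    Returns the health text, e.g. "OrcaSlicer automation server v1.0.0".
    """
    url = f"http://127.0.0.1:{port}/"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=max(poll_s, 1.0)) as resp:
                return resp.read().decode("utf-8")
        except OSError:
            pass  # not listening yet (URLError and socket errors are OSErrors)
        time.sleep(poll_s)
    raise TimeoutError(f"no automation server answered on 127.0.0.1:{port}")
```

A harness could start the app with `subprocess.Popen(build_launch_command("OrcaSlicer", port=13619, model="model.stl"))`. It would then call `wait_for_server(13619)` before sending any JSON-RPC request. Polling the health page, rather than sleeping for a fixed time, keeps startup robust on slow machines.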

When the server is enabled, OrcaSlicer emits a `warning`-level log line at startup
so that the active input-injection surface is obvious in the logs, for example:

```
UI automation server ENABLED ... input injection is active
```

---

## 2. Transport

The server speaks **HTTP/1.1** over the loopback TCP socket:

| Request | Response |
|---|---|
| `POST /jsonrpc` with a JSON-RPC 2.0 request body | A JSON-RPC 2.0 response with `Content-Type: application/json`. |
| `GET /` | A plain-text health page, `OrcaSlicer automation server v1.0.0` (`Content-Type: text/plain`). |
| Anything else | HTTP `404 Not Found`. |

The v1 server is **single-client and serialized**: it handles one request at a
time on its own dedicated I/O thread. Connections are not kept alive; the server
answers each request and then closes the socket. Clients should therefore issue
requests sequentially, opening a new connection for each one.

---

## 3. JSON-RPC envelope

The protocol follows **JSON-RPC 2.0**.

**Request:**

```json
{ "jsonrpc": "2.0", "id": <id>, "method": "<method>", "params": { ... } }
```

- `params` may be omitted; the server treats a missing `params` as an empty object.

**Success response:**

```json
{ "jsonrpc": "2.0", "id": <id>, "result": { ... } }
```

**Error response:**

```json
{ "jsonrpc": "2.0", "id": <id>, "error": { "code": <code>, "message": "<message>" } }
```

The response echoes the request `id`. When the request has no `id`, or when the
request body cannot be parsed as JSON, the response `id` is `null`.

---

## 4. Methods

There are 11 methods. The `capabilities` array returned by `automation.version`
names the other 10 callable feature methods (every method except
`automation.version` itself).

### `automation.version`

Returns the server version, the protocol version, and the list of supported
feature methods. Takes no parameters.
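A minimal Python client for the transport and envelope described above might look like the sketch below. It is an illustration under assumptions, not an official client.

- `make_request`, `parse_response`, `call`, and `AutomationError` are hypothetical names.
- It opens one connection per request, matching the v1 server's serialized, no-keep-alive behavior.
- It checks for an `error` member before comparing ids, because parse failures come back with `id` set to `null`.

```python
import itertools
import json
import urllib.request

_ids = itertools.count(1)  # client-side request ids (hypothetical scheme)


class AutomationError(Exception):
    """Raised when the server answers with a JSON-RPC error object."""

    def __init__(self, code, message):
        super().__init__(f"[{code}] {message}")
        self.code = code
        self.message = message


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request body; missing params become an empty object."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method,
            "params": params if params is not None else {}}


def parse_response(body, expected_id):
    """Return the `result` member, or raise AutomationError for an `error` reply."""
    reply = json.loads(body)
    if "error" in reply:  # checked first: the id may be null here
        err = reply["error"]
        raise AutomationError(err.get("code"), err.get("message", ""))
    if reply.get("id") != expected_id:
        raise ValueError(f"response id {reply.get('id')!r} != {expected_id!r}")
    return reply["result"]


def call(method, params=None, port=13619, timeout_s=30.0):
    """POST a single request to /jsonrpc and return its result."""
    request = make_request(method, params)
    http_req = urllib.request.Request(
        f"http://127.0.0.1:{port}/jsonrpc",
        data=json.dumps(request).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(http_req, timeout=timeout_s) as resp:
        return parse_response(resp.read(), request["id"])
```

With this client, `call("automation.version")["capabilities"]` would return the 10 feature method names. A `widget.get` call whose target matches nothing would surface as an `AutomationError` whose `code` is `1001`.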

**Result:**

```json
{
  "version": "1.0.0",
  "protocol": "2.0",
  "capabilities": [
    "tree.dump", "tree.find", "widget.get", "input.click", "input.type",
    "input.key", "sync.wait_for", "app.state", "screenshot.window",
    "screenshot.viewport3d"
  ]
}
```

### `tree.dump`

Snapshot the live UI tree as a single root node with nested children.

**Params (all optional):**

| Param | Type | Default | Meaning |
|---|---|---|---|
| `root` | string (id or path) | full tree | Root the dump at the node with this id or path. |
| `max_depth` | int | `-1` | Maximum depth to descend; `-1` means unlimited. |
| `visible_only` | bool | `false` | When true, omit nodes that are not visible. |
| `include_imgui` | bool | `true` | When true, include ImGui items. |

**Result:** the serialized root [node](#5-unified-node-shape), with `children`
included.

### `tree.find`

Find all nodes matching a [target predicate](#6-target--locator).

**Params:** a target predicate, i.e. any combination of `name`, `class`, `label`,
`value`, and `backend` (all supplied fields must match). For this method the
params object *is* the predicate; it is **not** wrapped in a `target` key.

**Result:** a **flat JSON array** of matching nodes. Nodes in this array are
returned **without** their `children`; use `widget.get` or `tree.dump` to descend.

### `widget.get`

Fetch a single node by [target](#6-target--locator).

**Params:**

| Param | Type | Required | Meaning |
|---|---|---|---|
| `target` | object | yes | Target spec (id, path, or predicate). |

**Result:** a single [node](#5-unified-node-shape), with its `children` included.

**Errors:** `1001` if the target is **not found** *or* **ambiguous** (more than one
match).

### `input.click`

Click a resolved, actionable node.

**Params:**

| Param | Type | Default | Meaning |
|---|---|---|---|
| `target` | object | required | Target spec; must resolve to exactly one node. |
| `button` | string | `"left"` | `"left"`, `"right"`, or `"middle"`. |
| `double` | bool | `false` | Double-click when true. |
| `modifiers` | array of string | `[]` | Modifiers held during the click: any of `"ctrl"`, `"shift"`, `"alt"`, `"cmd"` (`"meta"` is accepted as an alias of `"cmd"`). |

**Result:** `{ "ok": true }`.

**Errors:** `1001` if the target is not found or ambiguous; `1002` if it is
disabled or hidden (not actionable). Before injecting the click, the server raises
and focuses the target's top-level window.

### `input.type`

Type text into the currently focused control.

**Params:**

| Param | Type | Required | Meaning |
|---|---|---|---|
| `text` | string | yes | The text to type. |
| `target` | object | no | If given, this node is clicked first (to focus it) before typing. |

**Result:** `{ "ok": true }`.

**Errors:** if `target` is supplied, the same actionability errors as
`input.click` apply (`1001` / `1002`).

### `input.key`

Send a key chord (a key plus optional modifiers) to the focused window.

**Params:**

| Param | Type | Required | Meaning |
|---|---|---|---|
| `keys` | string or array | yes | Either a `"+"`-joined string such as `"ctrl+s"`, or an array such as `["ctrl", "s"]`. The last token is the key; any earlier tokens are modifiers. |

**Result:** `{ "ok": true }`.

**Key names must be lowercase.** Recognized key names include `"enter"`, `"tab"`,
`"esc"`, `"space"`, `"delete"`, `"backspace"`, `"f5"` (and other function keys),
and single characters (e.g. `"s"`, `"a"`). Recognized modifiers are `"ctrl"`,
`"shift"`, `"alt"`, and `"cmd"` (with `"meta"` as an alias for `"cmd"`).
**Unrecognized or uppercase key names are silently ignored:** no error is
returned, and the key simply does not fire. Use lowercase names exclusively.

### `sync.wait_for`

Poll the UI until a target node reaches a desired state, or time out. This is the
preferred way to synchronize with asynchronous UI changes, in place of fragile
fixed sleeps.
Internally, it repeatedly refreshes and dumps the tree, re-resolves the target,
and evaluates the requested state until the state is satisfied or the timeout
expires.

**Params:**

| Param | Type | Default | Meaning |
|---|---|---|---|
| `target` | object | required | Target spec. |
| `state` | string | required | One of `"exists"`, `"visible"`, `"enabled"`, `"value"`. |
| `value` | string | none | Required when `state` is `"value"`: the expected value to match. |
| `timeout_ms` | int | `5000` | Maximum time to wait, in milliseconds. |
| `poll_ms` | int | `100` | Poll interval, in milliseconds (minimum 1). |

State semantics:

- `exists`: the target resolves to a node.
- `visible`: the node exists and is visible.
- `enabled`: the node exists and is **both enabled and visible**.
- `value`: the node has a value, and that value equals the supplied `value`.

**Result:** `{ "ok": true, "elapsed_ms": <number> }`.

**Errors:** `1003` on timeout (the state was not reached within `timeout_ms`).

### `app.state`

Return a high-level snapshot of the application state. Takes no parameters.

**Result:**

```json
{
  "active_tab": "<string>",
  "project_loaded": <bool>,
  "slicing": <bool>,
  "slice_progress": <number>,
  "foreground": <bool>,
  "modal_dialog": "<string>"
}
```

| Field | Meaning |
|---|---|
| `active_tab` | The active top-level tab/page. |
| `project_loaded` | Whether a project/model is currently loaded. |
| `slicing` | Whether slicing is currently in progress. |
| `slice_progress` | Slicing progress (`-1` when unknown). |
| `foreground` | Whether the main window is in the foreground. |
| `modal_dialog` | Present only while a modal dialog is active, and identifies it; omitted otherwise. |

### `screenshot.window`

Capture a window's own GDI/native surface as a PNG.

**Params:**

| Param | Type | Default | Meaning |
|---|---|---|---|
| `target` | object | main frame | If given, capture this window; otherwise capture the main frame. |

**Result:** `{ "png_base64": "<base64>", "width": <int>, "height": <int> }`.
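A client still has to turn the `png_base64` field into an image file. The sketch below is a minimal way to do that under stated assumptions.

- The payload is taken to be standard (RFC 4648) base64, which this document does not state explicitly.
- `decode_screenshot` and `save_screenshot` are hypothetical helpers.
- As a cheap sanity check, it compares the reported `width`/`height` with the dimensions stored in the PNG's IHDR header.

The same helpers should apply to `screenshot.viewport3d`, whose result has the same shape.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(result):
    """Decode a screenshot.* result into PNG bytes and check the reported size.

    Assumes `png_base64` is standard base64 (an assumption, not documented).
    """
    png = base64.b64decode(result["png_base64"])
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG")
    # After the 8-byte signature comes the IHDR chunk: a 4-byte length, the
    # bytes b"IHDR", then width and height as big-endian uint32 (bytes 16..24).
    if len(png) < 24 or png[12:16] != b"IHDR":
        raise ValueError("PNG is missing its IHDR chunk")
    width, height = struct.unpack(">II", png[16:24])
    if (width, height) != (result["width"], result["height"]):
        raise ValueError(f"IHDR says {width}x{height}, result says "
                         f"{result['width']}x{result['height']}")
    return png


def save_screenshot(result, path):
    """Write the decoded PNG to `path` and return its (width, height)."""
    png = decode_screenshot(result)
    with open(path, "wb") as fh:
        fh.write(png)
    return result["width"], result["height"]
```

If the size check fails, the capture and the reported metadata disagree. That is worth flagging in a test run rather than silently saving the image.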

**Errors:** `1005` if the screenshot fails; `1001` if a supplied `target` is not
found or is ambiguous.

**Limitation:** `screenshot.window` captures only the window's **own GDI
surface**. It does **not** capture the OpenGL 3D viewport, and it may miss some
native child controls. To capture the 3D scene, use
[`screenshot.viewport3d`](#screenshotviewport3d).

### `screenshot.viewport3d`

Render a 3D plate offscreen (the active plate by default) and return it as a PNG.
This is the way to capture the 3D scene that `screenshot.window` cannot.

**Params (all optional):**

| Param | Type | Default | Meaning |
|---|---|---|---|
| `plate` | int | active plate | Index of the plate to render. |
| `width` | int | `800` | Output width in pixels. |
| `height` | int | `600` | Output height in pixels. |

**Result:** `{ "png_base64": "<base64>", "width": <int>, "height": <int> }`.

**Errors:** `1005` on failure.

---

## 5. Unified node shape

Both wx widgets and ImGui items are reported with the same node schema:

```json
{
  "backend": "wx" | "imgui",
  "id": "<string>",
  "path": "<string>",
  "class": "<string>",
  "label": "<string>",
  "rect": { "x": <number>, "y": <number>, "w": <number>, "h": <number> },
  "enabled": <bool>,
  "visible": <bool>,
  "value": "<string>",
  "children": [ <node>, ... ]
}
```

| Field | Meaning |
|---|---|
| `backend` | `"wx"` for native wxWidgets controls, `"imgui"` for immediate-mode ImGui items. |
| `id` | The automation id when one is set, otherwise a derived id. For ImGui items, the `path` doubles as the `id`. |
| `path` | Positional path, e.g. `"MainFrame/Panel[2]/Button[0]"`. For ImGui items: `"ImGui//