# OrcaSlicer UI Automation Protocol (v1.0.0)

OrcaSlicer ships an **opt-in, localhost-only JSON-RPC server** that lets external scripts introspect, drive, and screenshot the running OrcaSlicer GUI. It is built for end-to-end testing and automation: a script can enumerate the live widget tree, click buttons, type text, send keyboard shortcuts, wait for UI state, query high-level application state, and capture window images (the on-screen capture includes the 3D viewport).

This document is the protocol reference. It describes activation, the transport, the JSON-RPC envelope, every method, the unified node shape, the target/locator model, error codes, the set of instrumented automation ids, ImGui specifics, platform caveats, a quick-start snippet, and planned future work.

---

## 1. Overview & activation

The automation server is **OFF by default**. It is controlled by two command-line flags:

| Flag | Meaning |
|---|---|
| `--automation-server` | Enable the automation server. |
| `--automation-server-port=PORT` | Override the listening port. Optional; default is **13619**. |

Example:

```bash
OrcaSlicer --automation-server --automation-server-port=13619 model.stl
```

The server binds to **`127.0.0.1` only** (the loopback interface). It is never exposed on an external network interface.

**Security note (v1):** there is **no authentication token** in v1. The localhost bind is the *only* security boundary. While the server is enabled, any process able to run code on the machine can connect to the port and drive the GUI, including injecting mouse and keyboard input. The feature is intended for testing and automation environments, not for production or shared/multi-user machines.

When the server is enabled, OrcaSlicer emits a `warning`-level log line at startup to make the active input-injection surface obvious in logs, for example:

```
UI automation server ENABLED ... input injection is active
```

---

## 2. Transport

The server speaks **HTTP/1.1** over the loopback TCP socket:

| Request | Response |
|---|---|
| `POST /jsonrpc` with a JSON-RPC 2.0 request body | A JSON-RPC 2.0 response with `Content-Type: application/json`. |
| `GET /` | A plain-text health page: `OrcaSlicer automation server v1.0.0` (`Content-Type: text/plain`). |
| Anything else | HTTP `404 Not Found`. |

The server is **single-client / serialized** in v1: it handles one request at a time on its own dedicated I/O thread. Connections are not kept alive; each request is answered and the socket is closed. Clients should issue requests sequentially.

---

## 3. JSON-RPC envelope

The protocol follows **JSON-RPC 2.0**.

**Request:**

```json
{ "jsonrpc": "2.0", "id": , "method": "", "params": { ... } }
```

- `params` may be omitted; the server treats a missing `params` as an empty object.

**Success response:**

```json
{ "jsonrpc": "2.0", "id": , "result": { ... } }
```

**Error response:**

```json
{ "jsonrpc": "2.0", "id": , "error": { "code": , "message": "" } }
```

The request `id` is echoed back in the response. When the request has no `id`, or when the request body cannot be parsed as JSON, the response `id` is `null`.

---

## 4. Methods

There are 10 methods. The `capabilities` array returned by `automation.version` lists the 9 callable feature methods (every method except `automation.version` itself).

### `automation.version`

Returns server identity and the list of supported methods. Takes no parameters.

**Result:**

```json
{
  "version": "1.0.0",
  "protocol": "2.0",
  "capabilities": [
    "tree.dump", "tree.find", "widget.get",
    "input.click", "input.type", "input.key",
    "sync.wait_for", "app.state", "screenshot.window"
  ]
}
```

### `tree.dump`

Snapshot the live UI tree as a single root node with nested children.

**Params (all optional):**

| Param | Type | Default | Meaning |
|---|---|---|---|
| `root` | string (id or path) | full tree | Root the dump at the node with this id/path. |
| `max_depth` | int | `-1` | Maximum depth to descend. `-1` = unlimited. |
| `visible_only` | bool | `false` | When true, omit non-visible nodes. |
| `include_imgui` | bool | `true` | When true, include ImGui items. |

**Result:** the serialized root [node](#5-unified-node-shape), with `children` included.

### `tree.find`

Find all nodes matching a [target predicate](#6-target--locator).

**Params:** a target predicate: any combination of `name`, `class`, `label`, `value`, `backend` (provided fields are ANDed). The params object is the target itself (it is *not* wrapped in a `target` key for this method).

**Result:** a **flat JSON array** of matching nodes. The nodes in this array are returned **without** their `children` (use `widget.get`/`tree.dump` to descend).

### `widget.get`

Fetch a single node by [target](#6-target--locator).

**Params:**

| Param | Type | Required | Meaning |
|---|---|---|---|
| `target` | object | yes | Target spec (id / path / predicate). |

**Result:** a single [node](#5-unified-node-shape), with its `children` included.

**Errors:** `1001` if the target is **not found** *or* **ambiguous** (more than one match).

### `input.click`

Click a resolved, actionable node.

**Params:**

| Param | Type | Default | Meaning |
|---|---|---|---|
| `target` | object | required | Target spec; must resolve to exactly one node. |
| `button` | string | `"left"` | `"left"`, `"right"`, or `"middle"`. |
| `double` | bool | `false` | Double-click when true. |
| `modifiers` | array of string | `[]` | Held modifiers: any of `"ctrl"`, `"shift"`, `"alt"`, `"cmd"` (`"meta"` is accepted as an alias of `"cmd"`). |

**Result:** `{ "ok": true }`.

**Errors:** `1001` not found / ambiguous; `1002` if the target is disabled or hidden (not actionable).

The click path raises and focuses the target's top-level window before injecting the click.

### `input.type`

Type text into the currently focused control.
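As a rough sketch of how a client might assemble requests for `input.click` and `input.type`, the helpers below build (but do not send) JSON-RPC bodies. The helper names (`rpc_request`, `click`, `type_text`) and the example target `{"label": "Layer height"}` are illustrative assumptions, not part of the protocol; only the method names, parameter names, and allowed values come from this document.

```python
import json
from itertools import count

_ids = count(1)  # simple increasing request ids (a client-side choice)

BUTTONS = {"left", "right", "middle"}
MODIFIERS = {"ctrl", "shift", "alt", "cmd", "meta"}  # "meta" is an alias of "cmd"


def rpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request body as a plain dict."""
    body = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params:
        # A missing "params" is treated by the server as an empty object.
        body["params"] = params
    return body


def click(target, button="left", double=False, modifiers=()):
    """Build an input.click request, rejecting values the doc does not list."""
    if button not in BUTTONS:
        raise ValueError(f"unknown button: {button!r}")
    unknown = [m for m in modifiers if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifiers: {unknown!r}")
    return rpc_request("input.click", {
        "target": target,
        "button": button,
        "double": double,
        "modifiers": list(modifiers),
    })


def type_text(text, target=None):
    """Build an input.type request; target is optional and clicked first."""
    params = {"text": text}
    if target is not None:
        params["target"] = target
    return rpc_request("input.type", params)


if __name__ == "__main__":
    # Hypothetical target predicate; real labels depend on the running UI.
    request = type_text("0.2", target={"label": "Layer height"})
    print(json.dumps(request, indent=2))
```

The resulting body would be sent as a `POST /jsonrpc` to the loopback port (13619 by default). Validating `button` and `modifiers` on the client side is a design choice of this sketch, so that typos fail before a request is issued.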
**Params:**

| Param | Type | Required | Meaning |
|---|---|---|---|
| `text` | string | yes | The text to type. |
| `target` | object | no | If given, this node is clicked first (to focus it) before typing. |

**Result:** `{ "ok": true }`.

**Errors:** if `target` is supplied, the same actionability errors as `input.click` apply (`1001` / `1002`).

### `input.key`

Send a key chord (a key plus optional modifiers) to the focused window.

**Params:**

| Param | Type | Required | Meaning |
|---|---|---|---|
| `keys` | string or array | yes | Either a `"+"`-joined string like `"ctrl+s"`, or an array like `["ctrl", "s"]`. The last token is the key; earlier tokens are modifiers. |

**Result:** `{ "ok": true }`.

**Key names must be lowercase.** Recognized key names include `"enter"`, `"tab"`, `"esc"`, `"space"`, `"delete"`, `"backspace"`, `"f5"` (and other function keys), and single characters (e.g. `"s"`, `"a"`). Recognized modifiers are `"ctrl"`, `"shift"`, `"alt"`, `"cmd"` (with `"meta"` as an alias for `"cmd"`).

**Unrecognized or uppercase key names are silently ignored**: no error is returned, and the key simply does not fire. Use lowercase names exclusively.

### `sync.wait_for`

Poll the UI until a target node reaches a desired state, or time out. This is the preferred way to synchronize with asynchronous UI changes (it replaces fragile fixed sleeps). Internally it repeatedly refreshes and dumps the tree, re-resolves the target, and evaluates the requested state until it is satisfied.

**Params:**

| Param | Type | Default | Meaning |
|---|---|---|---|
| `target` | object | required | Target spec. |
| `state` | string | required | One of `"exists"`, `"visible"`, `"enabled"`, `"value"`. |
| `value` | string | (none) | Required when `state` is `"value"`; the expected value to match. |
| `timeout_ms` | int | `5000` | Maximum time to wait, in milliseconds. |
| `poll_ms` | int | `100` | Poll interval, in milliseconds (minimum 1). |

State semantics:

- `exists`: the target resolves to a node.
- `visible`: the node exists and is visible.
- `enabled`: the node exists and is **both enabled and visible**.
- `value`: the node has a value and that value equals the supplied `value`.

**Result:** `{ "ok": true, "elapsed_ms": }`.

**Errors:** `1003` on timeout (the state was not reached within `timeout_ms`).

### `app.state`

Return a high-level application-state snapshot. Takes no parameters.

**Result:**

```json
{
  "active_tab": "",
  "project_loaded": ,
  "slicing": ,
  "slice_progress": ,
  "foreground": ,
  "modal_dialog": ""
}
```

| Field | Meaning |
|---|---|
| `active_tab` | The active top-level tab/page. |
| `project_loaded` | Whether a project/model is currently loaded. |
| `slicing` | Whether slicing is currently in progress. |
| `slice_progress` | Slicing progress (`-1` when unknown). |
| `foreground` | Whether the main window is in the foreground. |
| `modal_dialog` | Present only when a modal dialog is active; identifies it. Omitted otherwise. |

### `screenshot.window`

Capture a window as a PNG, exactly as it appears on screen.

**Params:**

| Param | Type | Default | Meaning |
|---|---|---|---|
| `target` | object | main frame | If given, capture this window; otherwise capture the main frame. |

**Result:** `{ "png_base64": "", "width": , "height": }`.

**Errors:** `1005` on screenshot failure; `1001` if a supplied `target` is not found or ambiguous.

**How it works:** the window's on-screen rectangle is read back from the DWM-composited desktop framebuffer (`wxScreenDC`), so the capture includes every native child control, the OpenGL 3D viewport, and ImGui overlays. It is a faithful image of what the user sees. (Capturing the parent window's own client DC instead would clip out child HWNDs and the GL surface, leaving them black; that is why this method reads from the screen.)

**Caveats:**

- The window must be **visible and unobscured**. Because the source is the on-screen framebuffer, any overlapping window occludes the captured region. The backend raises the target window before capturing.
- **HiDPI:** the reported `width`/`height` come from the window's logical client size, while the screen framebuffer is in physical pixels. On per-monitor-DPI displays the two can differ; the capture may be cropped or scaled relative to the logical size.
- Because the capture is the live on-screen image, the 3D content reflects the **current view**: the model in the 3D editor, or the gcode toolpaths in Preview after a slice. There is no separate offscreen 3D-render method; the window capture already includes whatever the GL canvas is showing.

---

## 5. Unified node shape

Both wx widgets and ImGui items are reported with the same node schema:

```json
{
  "backend": "wx" | "imgui",
  "id": "",
  "path": "",
  "class": "",
  "label": "",
  "rect": { "x": , "y": , "w": , "h": },
  "enabled": ,
  "visible": ,
  "value": "",
  "children": [ , ... ]
}
```

| Field | Meaning |
|---|---|
| `backend` | `"wx"` for native wxWidgets controls, `"imgui"` for immediate-mode ImGui items. |
| `id` | The automation id when one is set, otherwise a derived id. For ImGui items the `path` doubles as the `id`. |
| `path` | Positional path, e.g. `"MainFrame/Panel[2]/Button[0]"`. For ImGui items: `"ImGui//