OrcaSlicer/doc/automation.md
SoftFever 892b33bac5 refactor(automation): drop screenshot.viewport3d, keep only screenshot.window
The on-screen window capture is composited from the desktop framebuffer, so it
already includes the GL 3D viewport as currently shown (model in the editor,
toolpaths in Preview). The offscreen render_thumbnail path only ever drew the
model GLVolumeCollection — never the gcode toolpaths — and produced a blank image
after slicing because the app switches to the Preview panel. Rather than maintain a
second, more limited capture method, remove it entirely.

Removes the JSON-RPC method, IUiBackend/WxUiBackend implementation, dispatcher
route + capability entry, the now-dead opt_int/thumbnail_to_wximage helpers and
ThumbnailData include, the mock override + unit test, and the Python
screenshot_3d client method. Docs updated accordingly.
2026-06-03 18:05:23 +08:00

OrcaSlicer UI Automation Protocol (v1.0.0)

OrcaSlicer ships an opt-in, localhost-only JSON-RPC server that lets external scripts introspect, drive, and screenshot the running OrcaSlicer GUI. It is built for end-to-end testing and automation: a script can enumerate the live widget tree, click buttons, type text, send keyboard shortcuts, wait for UI state, query high-level application state, and capture window images (the on-screen capture includes the 3D viewport).

This document is the protocol reference. It describes activation, the transport, the JSON-RPC envelope, every method, the unified node shape, the target/locator model, error codes, the set of instrumented automation ids, ImGui specifics, platform caveats, a quick-start snippet, and planned future work.


1. Overview & activation

The automation server is OFF by default. It is enabled with two command-line flags:

| Flag | Meaning |
| --- | --- |
| --automation-server | Enable the automation server. |
| --automation-server-port=PORT | Override the listening port. Optional; default is 13619. |

Example:

OrcaSlicer --automation-server --automation-server-port=13619 model.stl
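
From a test harness, the same command line can be assembled programmatically. The following is a minimal sketch only: the helper names automation_args and launch are hypothetical, and the executable path depends on your installation.

```python
import subprocess
from typing import List, Optional


def automation_args(executable: str,
                    port: Optional[int] = None,
                    model: Optional[str] = None) -> List[str]:
    """Build the argument list for an automation-enabled launch.

    Only the two flags documented above are used; when no port is given
    the server falls back to its documented default (13619).
    """
    args = [executable, "--automation-server"]
    if port is not None:
        args.append(f"--automation-server-port={port}")
    if model is not None:
        args.append(model)
    return args


def launch(executable: str,
           port: Optional[int] = None,
           model: Optional[str] = None) -> subprocess.Popen:
    """Start OrcaSlicer as a child process (hypothetical helper)."""
    return subprocess.Popen(automation_args(executable, port, model))
```

Keeping the argument construction separate from the process launch makes the flag handling easy to check without a display.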

The server binds to 127.0.0.1 only (the loopback interface). It is never exposed on an external network interface.

Security note (v1): there is no authentication token in v1. The localhost bind is the only security boundary. Any process able to run code on the machine can connect to the port and drive the GUI — including injecting mouse and keyboard input — while the server is enabled. The feature is intended for testing and automation environments, not for production or shared/multi-user machines.

When the server is enabled, OrcaSlicer emits a warning-level log line at startup to make the active input-injection surface obvious in logs, for example:

UI automation server ENABLED ... input injection is active

2. Transport

The server speaks HTTP/1.1 over the loopback TCP socket:

| Request | Response |
| --- | --- |
| POST /jsonrpc with a JSON-RPC 2.0 request body | A JSON-RPC 2.0 response with Content-Type: application/json. |
| GET / | A plain-text health page: OrcaSlicer automation server v1.0.0 (Content-Type: text/plain). |
| Anything else | HTTP 404 Not Found. |

The server is single-client / serialized in v1: it handles one request at a time on its own dedicated I/O thread. Connections are not kept alive; each request is answered and the socket is closed. Clients should issue requests sequentially.
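
A client needs nothing beyond an HTTP POST. The sketch below uses only the Python standard library; build_request and call are hypothetical names, and sending a Content-Type request header is an assumption (the reference above only specifies the response header).

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_request(method: str,
                  params: Optional[Dict[str, Any]] = None,
                  request_id: int = 1,
                  port: int = 13619) -> urllib.request.Request:
    """Create the POST /jsonrpc request for one JSON-RPC call."""
    body: Dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        body["params"] = params
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/jsonrpc",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the server accepts (or ignores) this request header.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(method: str,
         params: Optional[Dict[str, Any]] = None,
         request_id: int = 1,
         port: int = 13619,
         timeout: float = 10.0) -> Dict[str, Any]:
    """Send one request and return the decoded JSON-RPC response.

    Each call opens a fresh connection, which matches the documented
    "answer and close" behaviour; issue calls one after another.
    """
    req = build_request(method, params, request_id, port)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```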


3. JSON-RPC envelope

The protocol follows JSON-RPC 2.0.

Request:

{ "jsonrpc": "2.0", "id": <id>, "method": "<method>", "params": { ... } }
  • params may be omitted; the server treats a missing params as an empty object.

Success response:

{ "jsonrpc": "2.0", "id": <id>, "result": { ... } }

Error response:

{ "jsonrpc": "2.0", "id": <id>, "error": { "code": <int>, "message": "<string>" } }

The request id is echoed back in the response. When the request has no id, or when the request body cannot be parsed as JSON, the response id is null.
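
On the client side, the envelope rules above might be handled as in the following sketch. RpcError and unwrap are hypothetical names, not part of the reference client. The error branch is checked before the id because an error response can legitimately carry a null id.

```python
from typing import Any, Dict


class RpcError(Exception):
    """Raised when the server returns a JSON-RPC error object."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"[{code}] {message}")
        self.code = code
        self.message = message


def unwrap(response: Dict[str, Any], expected_id: Any) -> Any:
    """Return the result of a response, or raise on error / id mismatch."""
    if response.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 response")
    if "error" in response:
        # The id may be null here (unparseable body or missing request id).
        err = response["error"]
        raise RpcError(err["code"], err["message"])
    if response.get("id") != expected_id:
        raise ValueError(f"unexpected response id {response.get('id')!r}")
    return response["result"]
```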


4. Methods

There are 10 methods. The capabilities advertised by automation.version list the 9 callable feature methods (every method except automation.version itself).

automation.version

Returns server identity and the list of supported methods. Takes no parameters.

Result:

{
  "version": "1.0.0",
  "protocol": "2.0",
  "capabilities": [
    "tree.dump", "tree.find", "widget.get", "input.click", "input.type",
    "input.key", "sync.wait_for", "app.state", "screenshot.window"
  ]
}
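
A script can use the capabilities list to fail early when the server lacks a method it depends on. A minimal sketch, assuming the result shape shown above (missing_capabilities is a hypothetical helper):

```python
from typing import Any, Dict, Iterable, List


def missing_capabilities(version_result: Dict[str, Any],
                         required: Iterable[str]) -> List[str]:
    """Return the required methods the server does not advertise, sorted."""
    advertised = set(version_result.get("capabilities", []))
    return sorted(set(required) - advertised)
```

For example, a script written against an older build could call missing_capabilities(result, ["screenshot.window", "screenshot.viewport3d"]) and report the second method as unavailable instead of failing mid-run.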

tree.dump

Snapshot the live UI tree as a single root node with nested children.

Params (all optional):

| Param | Type | Default | Meaning |
| --- | --- | --- | --- |
| root | string (id or path) | full tree | Root the dump at the node with this id/path. |
| max_depth | int | -1 | Maximum depth to descend. -1 = unlimited. |
| visible_only | bool | false | When true, omit non-visible nodes. |
| include_imgui | bool | true | When true, include ImGui items. |

Result: the serialized root node, with children included.
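
Because the result is a single nested node, a short recursive walk is usually enough to inspect it. The sketch below assumes only the children key described in the node schema; walk and outline are hypothetical helper names.

```python
from typing import Any, Dict, Iterator, List, Tuple

Node = Dict[str, Any]


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, node) pairs in depth-first, document order."""
    yield depth, node
    # Nodes without children (for example ImGui items) simply omit the key.
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


def outline(root: Node) -> List[str]:
    """Render the tree as indented 'class [id]' lines for quick inspection."""
    return [
        f"{'  ' * depth}{node.get('class', '?')} [{node.get('id', '')}]"
        for depth, node in walk(root)
    ]
```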

tree.find

Find all nodes matching a target predicate.

Params: a target predicate — any combination of name, class, label, value, backend (provided fields are ANDed). The params object is the target itself (it is not wrapped in a target key for this method).

Result: a flat JSON array of matching nodes. The nodes in this array are returned without their children (use widget.get/tree.dump to descend).

widget.get

Fetch a single node by target.

Params:

| Param | Type | Required | Meaning |
| --- | --- | --- | --- |
| target | object | yes | Target spec (id / path / predicate). |

Result: a single node, with its children included.

Errors: 1001 if the target is not found or ambiguous (more than one match).

input.click

Click a resolved, actionable node.

Params:

| Param | Type | Default | Meaning |
| --- | --- | --- | --- |
| target | object | required | Target spec; must resolve to exactly one node. |
| button | string | "left" | "left", "right", or "middle". |
| double | bool | false | Double-click when true. |
| modifiers | array of string | [] | Held modifiers: any of "ctrl", "shift", "alt", "cmd" ("meta" is accepted as an alias of "cmd"). |

Result: { "ok": true }.

Errors: 1001 not found / ambiguous; 1002 if the target is disabled or hidden (not actionable). The click path raises and focuses the target's top-level window before injecting the click.
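
A client may want to validate click parameters before sending them, so that typos surface locally rather than as a server-side error. This is a sketch under the parameter table above; click_params is a hypothetical helper and the server's own validation may differ.

```python
from typing import Any, Dict, Iterable

BUTTONS = ("left", "right", "middle")
MODIFIERS = ("ctrl", "shift", "alt", "cmd", "meta")  # "meta" is an alias of "cmd"


def click_params(target: Dict[str, Any],
                 button: str = "left",
                 double: bool = False,
                 modifiers: Iterable[str] = ()) -> Dict[str, Any]:
    """Build the params object for input.click, rejecting unknown values."""
    mods = list(modifiers)
    if button not in BUTTONS:
        raise ValueError(f"unknown button: {button!r}")
    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifiers: {unknown}")
    return {
        "target": dict(target),
        "button": button,
        "double": bool(double),
        "modifiers": mods,
    }
```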

input.type

Type text into the currently focused control.

Params:

| Param | Type | Required | Meaning |
| --- | --- | --- | --- |
| text | string | yes | The text to type. |
| target | object | no | If given, this node is clicked first (to focus it) before typing. |

Result: { "ok": true }.

Errors: if target is supplied, the same actionability errors as input.click apply (1001 / 1002).

input.key

Send a key chord (a key plus optional modifiers) to the focused window.

Params:

| Param | Type | Required | Meaning |
| --- | --- | --- | --- |
| keys | string or array | yes | Either a "+"-joined string like "ctrl+s", or an array like ["ctrl", "s"]. The last token is the key; earlier tokens are modifiers. |

Result: { "ok": true }.

Key names must be lowercase. Recognized key names include "enter", "tab", "esc", "space", "delete", "backspace", "f5" (and other function keys), and single characters (e.g. "s", "a"). Recognized modifiers are "ctrl", "shift", "alt", "cmd" (with "meta" as an alias for "cmd"). Unrecognized or uppercase key names are silently ignored — no error is returned, the key simply does not fire. Use lowercase names exclusively.
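
Since a bad key name fails silently, it can help to normalize chords on the client before sending them. The sketch below lowercases tokens and checks modifiers against the documented set; normalize_keys is a hypothetical helper, and it does not validate the final key name because the full list of recognized keys is not enumerated here.

```python
from typing import List, Sequence, Union

MODIFIERS = {"ctrl", "shift", "alt", "cmd", "meta"}


def normalize_keys(keys: Union[str, Sequence[str]]) -> List[str]:
    """Return a lowercase token list: modifiers first, key last."""
    tokens = keys.split("+") if isinstance(keys, str) else list(keys)
    tokens = [t.strip().lower() for t in tokens]
    if not tokens or any(not t for t in tokens):
        # Also triggers for chords containing a literal "+" in string form;
        # pass such chords as an array instead.
        raise ValueError(f"empty token in key chord: {keys!r}")
    *mods, _key = tokens
    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifiers: {unknown}")
    return tokens
```

Sending the returned list as the keys parameter avoids the silent no-op that an uppercase name such as "Ctrl+S" would otherwise cause.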

sync.wait_for

Poll the UI until a target node reaches a desired state, or time out. This is the preferred way to synchronize with asynchronous UI changes (it replaces fragile fixed sleeps). Internally it repeatedly refreshes and dumps the tree, re-resolves the target, and evaluates the requested state until it is satisfied.

Params:

| Param | Type | Default | Meaning |
| --- | --- | --- | --- |
| target | object | required | Target spec. |
| state | string | required | One of "exists", "visible", "enabled", "value". |
| value | string | (none) | Required when state is "value"; the expected value to match. |
| timeout_ms | int | 5000 | Maximum time to wait, in milliseconds. |
| poll_ms | int | 100 | Poll interval, in milliseconds (minimum 1). |

State semantics:

  • exists — the target resolves to a node.
  • visible — the node exists and is visible.
  • enabled — the node exists and is both enabled and visible.
  • value — the node has a value and that value equals the supplied value.

Result: { "ok": true, "elapsed_ms": <int> }.

Errors: 1003 on timeout (the state was not reached within timeout_ms).
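
The state semantics above can be restated as a small predicate over a resolved node. This is an illustrative client-side mirror, not the server's implementation; state_satisfied is a hypothetical name and node is None when the target did not resolve.

```python
from typing import Any, Dict, Optional


def state_satisfied(node: Optional[Dict[str, Any]],
                    state: str,
                    value: Optional[str] = None) -> bool:
    """Evaluate one of the four wait states against a node (or None)."""
    if state not in ("exists", "visible", "enabled", "value"):
        raise ValueError(f"unknown state: {state!r}")
    if node is None:
        return False
    if state == "exists":
        return True
    if state == "visible":
        return bool(node.get("visible"))
    if state == "enabled":
        # "enabled" implies visible as well.
        return bool(node.get("enabled")) and bool(node.get("visible"))
    # state == "value": the key must be present and equal to the expected value.
    return "value" in node and node["value"] == value
```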

app.state

Return a high-level application-state snapshot. Takes no parameters.

Result:

{
  "active_tab": "<string>",
  "project_loaded": <bool>,
  "slicing": <bool>,
  "slice_progress": <int>,
  "foreground": <bool>,
  "modal_dialog": "<string>"
}

| Field | Meaning |
| --- | --- |
| active_tab | The active top-level tab/page. |
| project_loaded | Whether a project/model is currently loaded. |
| slicing | Whether slicing is currently in progress. |
| slice_progress | Slicing progress (-1 when unknown). |
| foreground | Whether the main window is in the foreground. |
| modal_dialog | Present only when a modal dialog is active; identifies it. Omitted otherwise. |

screenshot.window

Capture a window as a PNG, exactly as it appears on screen.

Params:

| Param | Type | Default | Meaning |
| --- | --- | --- | --- |
| target | object | main frame | If given, capture this window; otherwise capture the main frame. |

Result: { "png_base64": "<base64 PNG>", "width": <int>, "height": <int> }.

Errors: 1005 on screenshot failure; 1001 if a supplied target is not found or ambiguous.

How it works: the window's on-screen rectangle is read back from the DWM-composited desktop framebuffer (wxScreenDC), so the capture includes every native child control, the OpenGL 3D viewport, and ImGui overlays — it is a faithful image of what the user sees. (Capturing the parent window's own client DC instead would clip out child HWNDs and the GL surface, leaving them black; that is why this method reads from the screen.)

Caveats:

  • The window must be visible and unobscured. Because the source is the on-screen framebuffer, any overlapping window occludes the captured region. The backend raises the target window before capturing.
  • HiDPI: the reported width/height come from the window's logical client size, while the screen framebuffer is in physical pixels. On per-monitor-DPI displays the two can differ; the capture may be cropped or scaled relative to the logical size.
  • Because the capture is the live on-screen image, the 3D content reflects the current view: the model in the 3D editor, or the gcode toolpaths in Preview after a slice. There is no separate offscreen 3D-render method — the window capture already includes whatever the GL canvas is showing.
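
To make the HiDPI caveat concrete, a client can decode the PNG and compare the pixel size stored in its IHDR header with the reported width/height. The sketch below relies only on the standard PNG layout (8-byte signature, then the IHDR chunk); decode_screenshot is a hypothetical helper and it does not validate chunk CRCs.

```python
import base64
import struct
from typing import Any, Dict, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(result: Dict[str, Any]) -> Tuple[bytes, Tuple[int, int], bool]:
    """Return (png_bytes, (pixel_width, pixel_height), sizes_match).

    sizes_match is False when the image's real pixel size differs from
    the width/height reported in the result, e.g. on scaled displays.
    """
    png = base64.b64decode(result["png_base64"])
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("payload is not a PNG")
    # Bytes 8..12 hold the chunk length, 12..16 the chunk type.
    if png[12:16] != b"IHDR":
        raise ValueError("PNG does not start with an IHDR chunk")
    # IHDR data begins with width and height as big-endian uint32.
    width, height = struct.unpack(">II", png[16:24])
    matches = (width, height) == (result["width"], result["height"])
    return png, (width, height), matches
```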

5. Unified node shape

Both wx widgets and ImGui items are reported with the same node schema:

{
  "backend": "wx" | "imgui",
  "id": "<string>",
  "path": "<string>",
  "class": "<string>",
  "label": "<string>",
  "rect": { "x": <int>, "y": <int>, "w": <int>, "h": <int> },
  "enabled": <bool>,
  "visible": <bool>,
  "value": "<string>",
  "children": [ <node>, ... ]
}

| Field | Meaning |
| --- | --- |
| backend | "wx" for native wxWidgets controls, "imgui" for immediate-mode ImGui items. |
| id | The automation id when one is set, otherwise a derived id. For ImGui items the path doubles as the id. |
| path | Positional path, e.g. "MainFrame/Panel[2]/Button[0]". For ImGui items: "ImGui/<window>/<label>". |
| class | wx class name, or the ImGui item type. |
| label | The control's label/caption. May include an ImGui ##-id suffix for ImGui items. |
| rect | Bounding rectangle in screen coordinates. |
| enabled | Whether the control is enabled. |
| visible | Whether the control is visible. |
| value | The control's value (text/choice/check/slider, etc.). Omitted entirely when the control has no applicable value. |
| children | Child nodes. wx only, and present only when children are included (e.g. tree.dump, widget.get). ImGui items are flat (no children) and are listed under their window. |

Notes:

  • The value key is omitted (not null) when the control has no value.
  • children is present only for wx nodes when children are requested; ImGui nodes never carry children.

6. Target / locator

Most methods accept a target object that identifies one or more nodes. A target may specify:

| Field | Meaning |
| --- | --- |
| id | Exact automation id. |
| path | Exact positional path. |
| name | Predicate: matches either the node's id or its label. |
| class | Predicate: exact class name. |
| label | Predicate: exact label. |
| value | Predicate: node has a value and it equals this string. |
| backend | Predicate: "wx" or "imgui". |

Resolution order: id → path → predicate.

  • If id is present, only id is used (exact match).
  • Else if path is present, only path is used (exact match).
  • Else the predicate fields (name, class, label, value, backend) are used, and all provided predicate fields are ANDed together.

Single-target methods (input.click, input.type with a target, widget.get, and single-target screenshot.window) require a unique match. If the target resolves to zero matches or more than one match, the call fails with error 1001 (not found / ambiguous). tree.find is the exception: it returns all matches as an array and never errors on ambiguity.
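
The resolution order and the unique-match rule can be sketched over a flat list of nodes as follows. This is a client-side illustration of the rules in this section, not the server's code: matches and resolve_unique are hypothetical names, and how the server treats an empty target is not specified here (this sketch would match every node).

```python
from typing import Any, Dict, List

Node = Dict[str, Any]
PREDICATE_FIELDS = ("name", "class", "label", "value", "backend")


def matches(node: Node, target: Dict[str, Any]) -> bool:
    """Apply id, then path, then ANDed predicate fields."""
    if "id" in target:                       # id wins; other fields are ignored
        return node.get("id") == target["id"]
    if "path" in target:                     # then path, also exclusive
        return node.get("path") == target["path"]
    for field in PREDICATE_FIELDS:
        if field not in target:
            continue
        want = target[field]
        if field == "name":                  # name matches id OR label
            if want not in (node.get("id"), node.get("label")):
                return False
        elif field == "value":               # node must actually carry a value
            if "value" not in node or node["value"] != want:
                return False
        elif node.get(field) != want:
            return False
    return True


def resolve_unique(nodes: List[Node], target: Dict[str, Any]) -> Node:
    """Return the single matching node, or raise (mirrors error 1001)."""
    found = [n for n in nodes if matches(n, target)]
    if len(found) != 1:
        raise LookupError(f"1001: expected exactly one match, got {len(found)}")
    return found[0]
```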


7. Error codes

Standard JSON-RPC codes:

| Code | Meaning |
| --- | --- |
| -32700 | Parse error: the request body was not valid JSON. |
| -32600 | Invalid request: missing/invalid method. |
| -32601 | Method not found: unknown method name. |
| -32602 | Invalid params: missing/invalid parameters for the method. |

Application-specific codes:

| Code | Meaning |
| --- | --- |
| 1001 | Widget/target not found or ambiguous (more than one match). |
| 1002 | Not actionable: the target is disabled or hidden. |
| 1003 | Wait timeout: sync.wait_for did not reach the requested state in time. |
| 1004 | GUI thread busy / timeout: a backend call could not be marshaled onto the GUI thread in time (wedged GUI). |
| 1005 | Screenshot failed. |
| 1006 | Disabled. |
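
Client code tends to read better with named codes than with bare integers. A minimal sketch follows; the enum member names are invented here for illustration, and only the numeric values come from the tables above.

```python
from enum import IntEnum


class ErrorCode(IntEnum):
    """Numeric codes from this section; member names are illustrative."""
    PARSE_ERROR = -32700
    INVALID_REQUEST = -32600
    METHOD_NOT_FOUND = -32601
    INVALID_PARAMS = -32602
    NOT_FOUND_OR_AMBIGUOUS = 1001
    NOT_ACTIONABLE = 1002
    WAIT_TIMEOUT = 1003
    GUI_BUSY = 1004
    SCREENSHOT_FAILED = 1005
    DISABLED = 1006


def describe(code: int) -> str:
    """Return the symbolic name for a code, or a marker for unknown codes."""
    try:
        return ErrorCode(code).name
    except ValueError:
        return f"UNKNOWN({code})"
```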

8. Automation-id naming conventions & instrumented ids

Stable automation ids follow these prefix conventions:

| Prefix | Used for |
| --- | --- |
| btn_ | Buttons |
| combo_ | Preset combo boxes |
| tab_ | Tabs |
| canvas_ | Canvases |
| dlg_ | Dialog buttons |

Instrumented ids (as-built in v1)

The following controls currently carry stable automation ids:

| id | Control | Note |
| --- | --- | --- |
| btn_slice | Slice-plate button | |
| btn_export | Print / Export button | Multi-purpose: the action (Print plate / Export G-code / Send) depends on the current mode. |
| tab_device | Device / Monitor tab (MonitorPanel) | |
| combo_printer | Printer preset combo (sidebar) | |
| combo_filament | Filament preset combo (sidebar) | First filament row only; extra multi-material rows are not instrumented. |
| canvas_3d | 3D editor GL canvas | |

Controls NOT instrumented in v1

Several controls are intentionally not instrumented in v1 because they have no stable wxWindow target to attach an id to:

  • combo_process — process settings are not a sidebar combo box in the current OrcaSlicer layout, so there is no combo control to instrument.
  • btn_add — the add/import-object control is a GLToolbar item rendered inside the GL canvas, not a wxWindow.
  • tab_prepare / tab_preview — the Prepare and Preview notebook pages are both backed by the same window, and the per-tab buttons are private; there is no distinct stable window to target.

For controls that are not instrumented, scripts should fall back to class / label / path lookup (for wx controls) or ImGui-item lookup (for ImGui controls).


9. ImGui notes

ImGui is immediate-mode: an item is addressable only while it is being drawn in the current frame. The automation backend records ImGui items each frame, and a refresh_ui is forced before every read or action so that the latest frame's items are captured.

Consequences and conventions:

  • Use sync.wait_for to wait for a transient gizmo or panel item to appear before acting on it.
  • ImGui items are reported with backend: "imgui", a path of the form ImGui/<window>/<label>, and that path doubles as the item's id in v1.
  • ImGui items are flat — they have no children and are listed under their window.
  • Labels may include ImGui ##-id suffixes (the part after ## that ImGui uses to disambiguate identically labeled widgets).
  • Raw ImGui:: gizmos that are not routed through the instrumented ImGuiWrapper widgets (for example some Emboss / SVG / Text gizmo controls) are only covered at the window level in v1; their individual sub-items are not enumerated.
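
The ImGui path and label conventions above can be taken apart with a few string operations. This sketch assumes window names contain no "/" (an assumption, not something this reference states); parse_imgui_path and visible_label are hypothetical helpers.

```python
from typing import Tuple


def parse_imgui_path(path: str) -> Tuple[str, str]:
    """Split "ImGui/<window>/<label>" into (window, label).

    Only the first two separators are split on, so a label that itself
    contains "/" is kept intact.
    """
    parts = path.split("/", 2)
    if len(parts) != 3 or parts[0] != "ImGui":
        raise ValueError(f"not an ImGui path: {path!r}")
    return parts[1], parts[2]


def visible_label(label: str) -> str:
    """Drop the ##-id suffix, leaving the text the user actually sees."""
    return label.split("##", 1)[0]
```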

10. Platform & display caveats

  • Input requires a focused, visible window. OS-level input injection uses wxUIActionSimulator, which requires a focused, visible window. The click path raises and focuses the target's top-level window first.
  • Linux CI needs a display. There must be an X display available; wrap test runs with xvfb-run (for example, xvfb-run -a python example_slice.py ...).
  • Input is asynchronous. Do not rely on fixed sleeps. Use sync.wait_for — for example, wait for btn_export to become enabled after slicing completes — rather than sleeping for a guessed duration.
  • screenshot.window reads the screen. It captures the on-screen, DWM-composited framebuffer, so the target window must be visible and unobscured, and the result is in physical pixels (see HiDPI caveat under screenshot.window). The capture includes the GL 3D viewport as currently shown (model or toolpaths).
  • Single-client / serialized. v1 handles one request at a time; issue requests sequentially from a single client.
  • GUI-thread marshaling. Every backend call is marshaled onto the GUI thread with a timeout. A wedged or unresponsive GUI returns error 1004.

11. Quick start

Using the reference client in tools/automation/orca_automation.py:

from orca_automation import OrcaClient

orca = OrcaClient(port=13619)
print(orca.version())                       # {'version': '1.0.0', ...}

orca.click({"id": "btn_slice"})             # start slicing the plate
orca.wait_for({"id": "btn_export"},         # wait until slicing finishes
              state="enabled", timeout_ms=180000)

png = orca.screenshot()                     # on-screen capture (incl. 3D view)
with open("window.png", "wb") as f:
    f.write(png)

For a full, runnable end-to-end example — launching OrcaSlicer with the automation flags, loading a model, slicing, waiting for completion, and saving a window PNG — see tools/automation/example_slice.py.


12. Future work

Planned enhancements beyond v1:

  • Authentication token plus a Preferences toggle to enable/disable the server from the GUI.
  • WebSocket push events for real-time UI/state notifications (instead of polling).
  • Per-item ImGui gizmo instrumentation so individual gizmo sub-controls (Emboss / SVG / Text, etc.) are addressable, not just at the window level.
  • More widget ids — the process combo, the add/import button, and the Prepare/Preview tabs once they expose stable windows.
  • An MCP wrapper to expose the automation surface to model-context tooling.

Verification (v1)

This section records the final regression gate for the v1 feature: confirmation that the protocol core is covered by unit tests, that the existing test suites are unaffected, and that the disabled path (automation OFF, the default) is a true no-op — zero new threads, zero socket binds, zero allocations, and zero behavior change.

Unit-suite results (Release, Windows / MSVC, Ninja Multi-Config)

| Suite | Result |
| --- | --- |
| automation (protocol core) | 32 / 32 passed |
| libslic3r (most affected by the additive PrintConfig.cpp CLI options) | 99 / 99 passed |
| fff_print | 14 / 14 passed |
| libnest2d | 14 / 14 passed |
| sla_print | 21 / 21 passed |
| slic3rutils | 3 / 5 passed; 2 pre-existing [OrcaCloudServiceAgent] SEGFAULTs, unrelated to automation (see note) |

Note: the two slic3rutils failures are the test cases named "Orca cloud flat/nested session resolves display name consistently". They exercise Slic3r::OrcaCloudServiceAgent, which the automation branch does not touch (verified via git diff --stat main...HEAD: no change to src/slic3r/Utils/OrcaCloudServiceAgent.* or tests/slic3rutils/*). They are pre-existing and not a regression introduced by this feature.

Static disabled-path audit (the core regression guarantee)

Verified by code reading that with no --automation-server flag:

  • Flag defaults off. m_automation_port defaults to 0 (src/slic3r/GUI/GUI_App.hpp:249); is_automation_enabled() returns m_automation_port > 0 (GUI_App.hpp:386) → false by default.
  • No server / thread / socket. post_init() calls start_automation_server() only when init_params->automation_port > 0 (src/slic3r/GUI/GUI_App.cpp:737-740), and start_automation_server() itself early-returns when m_automation_port <= 0 (GUI_App.cpp:7097). The backend / dispatcher / beast server objects are constructed nowhere else → no orca_automation thread and no localhost bind when the flag is absent.
  • Recording hooks short-circuit. ImGuiWrapper::automation_record_last_item has as its first statement if (!wxGetApp().is_automation_enabled()) return; (src/slic3r/GUI/ImGuiWrapper.cpp:576-577) — a single bool check, no ImGuiItemRecord allocation and no ImGuiItemTable access on the disabled path. In ImGuiWrapper::render() the window-enumeration loop and swap_frame() are fully wrapped in if (wxGetApp().is_automation_enabled()) (ImGuiWrapper.cpp:599-611); when off, render() is its original ImGui::Render() + render_draw_data() plus one bool check.
  • Instrumentation is inert. The ~7 set_automation_id(...) calls (MainFrame.cpp:1330,1389,1841,1842; Plater.cpp:1772,2172,5068) only store a pointer into a static registry and bind a wxEVT_DESTROY pruning handler (src/slic3r/GUI/Automation/AutomationRegistry.cpp:24-36). The registry is read only via window_for_automation_id / automation_id_of, which are called solely by the backend while the server is running → harmless when off.
  • CLI options are purely additive. automation_server (coBool, default false) and automation_server_port (coInt, default 13619) are new add() entries appended after enable_timelapse (src/libslic3r/PrintConfig.cpp:10794-10805); no existing option is changed. GUI_InitParams::automation_port defaults to 0 (src/slic3r/GUI/GUI_Init.hpp:37) and is set only when --automation-server is supplied (src/OrcaSlicer.cpp:1345-1348).

Conclusion: with automation OFF (the default), the feature starts no server, thread, or socket and changes no behavior. Apart from the inert automation-id registry entries noted above, the only added cost on any hot path is a single boolean comparison.

Deferred manual runtime checks (require a display / Xvfb)

These need a live GUI and cannot be run headlessly in CI; they are the manual acceptance steps:

  1. Launch without --automation-server → curl http://127.0.0.1:13619/ fails to connect (no listener); no orca_automation thread exists.
  2. Launch with --automation-server --automation-server-port=13619 → GET / returns the health text; POST /jsonrpc {"method":"automation.version"} returns version / protocol / capabilities; widget.get {"target":{"id":"btn_slice"}} returns a node with a sensible screen rect.
  3. Interactive sanity: open a gizmo / move sliders with automation OFF → no visual or behavior change.
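
The listener checks in the first two steps can be scripted with a plain TCP connect. The sketch below only tests whether something accepts connections on the port; is_listening is a hypothetical helper and it does not confirm that the listener is OrcaSlicer.

```python
import socket


def is_listening(port: int = 13619,
                 host: str = "127.0.0.1",
                 timeout: float = 1.0) -> bool:
    """Return True when a TCP connect to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

With the flag absent, is_listening() is expected to return False; with --automation-server it should return True once startup has finished.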

See tools/automation/example_slice.py for the runnable end-to-end path.