A comprehensive iOS automation portal that provides HTTP API access to iOS device UI state extraction and automated interactions.
The Droidrun iOS Portal is a specialized iOS application that runs UI tests to expose device automation capabilities through a RESTful HTTP API. It consists of two main components:
- Portal App (droidrun-ios-portal): A minimal SwiftUI application that serves as the host
- Portal Server (droidrun-ios-portalUITests): XCTest-based HTTP server providing automation APIs
The portal uses the iOS XCTest framework and its XCUITest capabilities to:
- Extract UI state information (accessibility trees, screenshots)
- Perform automated interactions (taps, swipes, text input)
- Launch and manage applications
- Handle device-level inputs
- DroidrunPortalServer: XCTest class that runs an HTTP server on port 6643, or the next available port up to 6652
- DroidrunPortalHandler: HTTP route handler defining the REST API endpoints
- DroidrunPortalTools: Core automation engine implementing device interactions
- AccessibilityTree: UI state extraction and compression utilities
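Because the server may fall back from port 6643 to any port up to 6652, a client running next to a simulator may need to look for it. The following is a minimal sketch of such a scan; the function names are hypothetical, and an open TCP port only suggests a listener, it does not prove that the listener is the Portal.

```python
import socket
from typing import Callable, Optional

FIRST_PORT = 6643  # default Portal port
LAST_PORT = 6652   # last fallback port


def candidate_ports() -> range:
    """Ports the Portal may bind to, in the order it tries them."""
    return range(FIRST_PORT, LAST_PORT + 1)


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections and timeouts both mean "nothing usable here".
        return False


def find_listening_port(
    host: str = "127.0.0.1",
    is_open: Callable[[str, int], bool] = port_is_open,
) -> Optional[int]:
    """Return the first candidate port that accepts connections, or None."""
    for port in candidate_ports():
        if is_open(host, port):
            return port
    return None
```

On a physical device the port in the Xcode log is a device-side port, so the local port is whichever one was forwarded with iproxy rather than the result of this scan.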
Mobilerun does not start iOS Portal automatically. Start this XCTest server first, then point Mobilerun or any HTTP client at the local Portal URL.
- Xcode installed and opened at least once.
- An iOS simulator, or a connected and unlocked physical iPhone/iPad.
- For physical devices, iproxy from libimobiledevice.
If Xcode signing fails on a physical device, set your Apple Developer Team through a local command-line override or an Xcode user setting. Do not commit local signing changes to the shared project files.
Find the device UDID:
xcrun xctrace list devices
Start the Portal UI test with either the script or Xcode.
To use the script:
./device.sh <device-udid>
To use Xcode, open the project:
open droidrun-ios-portal.xcodeproj
In Xcode:
- Sign in with your Apple Developer account if needed.
- Select your physical iPhone or iPad as the run destination.
- Check Signing & Capabilities for the app and UI-test targets.
- Run Product > Test.
For either option, keep the Xcode test session running. The Portal is ready when the log shows:
Portal server listening on port 6643
In another terminal, forward the device port. The explicit device form is recommended:
iproxy -u <device-udid> -s 127.0.0.1 6643:6643
When only one iPhone is connected, the short form also works:
iproxy 6643 6643
If the Xcode log says Portal server listening on port 6644, forward local port 6643 to that device port instead:
iproxy -u <device-udid> -s 127.0.0.1 6643:6644
To use a simulator instead, list the available simulators:
xcrun simctl list devices available
Start the Portal:
./simulator.sh "<simulator-name>"
The simulator runs on the Mac, so iproxy is not needed.
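For physical devices, the forwarding command depends on the port reported in the Xcode log. As a rough sketch, the port can be read from the log line and turned into the explicit iproxy invocation shown earlier. The helper names are hypothetical, and the log format is assumed to match the "Portal server listening on port N" line quoted in this README.

```python
import re
from typing import Optional

# Log line format as quoted in this README.
LOG_PATTERN = re.compile(r"Portal server listening on port (\d+)")


def device_port_from_log(line: str) -> Optional[int]:
    """Extract the device-side port from a Portal log line, if present."""
    match = LOG_PATTERN.search(line)
    return int(match.group(1)) if match else None


def iproxy_command(udid: str, device_port: int, local_port: int = 6643) -> list[str]:
    """Build the explicit-device iproxy command as an argument list."""
    return [
        "iproxy",
        "-u", udid,
        "-s", "127.0.0.1",
        f"{local_port}:{device_port}",
    ]
```

The returned list can be passed to `subprocess.Popen`, which keeps the forwarder running alongside the client and lets it be terminated when testing is complete.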
Use these checks before running Mobilerun or another client:
curl -fsS http://127.0.0.1:6643/device/date
curl -fsS 'http://127.0.0.1:6643/state?timeout=4' -o state.json
curl -fsS http://127.0.0.1:6643/vision/screenshot -o screenshot.png
Stop the XCTest run and iproxy when testing is complete.
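The XCTest server can take a moment to start, so a client may want to repeat the /device/date check until it succeeds. The sketch below is one way to do that with the standard library; the function names, retry count, and delay are illustrative assumptions rather than anything the Portal prescribes.

```python
import time
import urllib.request
from typing import Callable


def portal_is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Roughly mirror `curl -fsS <base_url>/device/date`."""
    try:
        with urllib.request.urlopen(f"{base_url}/device/date", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def wait_until_ready(
    check: Callable[[], bool],
    attempts: int = 10,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

A typical call would be `wait_until_ready(lambda: portal_is_healthy("http://127.0.0.1:6643"))`. The `sleep` parameter exists only so the loop can be exercised without real delays.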
Returns the current device date. Mobilerun uses this endpoint as a lightweight Portal health check.
Response:
{
"date": "2026-06-08T20:31:46.766Z"
}
Returns the current app state, screen bounds, and compressed accessibility tree.
Response:
{
"a11y_tree": "Compressed accessibility tree string",
"phone_state": {
"currentApp": "Settings",
"packageName": "com.apple.Preferences",
"keyboardVisible": false,
"isEditable": false,
"focusedElement": null
},
"device_context": {
"screen_bounds": {
"width": 440,
"height": 956
}
}
}
Captures a screenshot of the current screen.
Response: PNG image data (Content-Type: image/png)
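The /state and screenshot responses described above can be checked on the client before they are handed to other code. This is a minimal sketch: the `PortalState` class and helper names are hypothetical, the field names are taken from the example response, and `focusedElement` is kept as-is because only a `null` value is shown for it.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


@dataclass(frozen=True)
class PortalState:
    current_app: str
    package_name: str
    keyboard_visible: bool
    is_editable: bool
    focused_element: Optional[object]  # shape not documented; passed through
    screen_width: int                  # in points
    screen_height: int                 # in points
    a11y_tree: str


def parse_state(raw: str) -> PortalState:
    """Parse a /state response body; raises KeyError on missing fields."""
    data = json.loads(raw)
    phone = data["phone_state"]
    bounds = data["device_context"]["screen_bounds"]
    return PortalState(
        current_app=phone["currentApp"],
        package_name=phone["packageName"],
        keyboard_visible=phone["keyboardVisible"],
        is_editable=phone["isEditable"],
        focused_element=phone.get("focusedElement"),
        screen_width=bounds["width"],
        screen_height=bounds["height"],
        a11y_tree=data["a11y_tree"],
    )


def is_png(data: bytes) -> bool:
    """Return True if the bytes begin with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)
```

Failing loudly on a missing field is a deliberate choice here: a partial state is usually a sign that the Portal or the app under test is not in the expected condition.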
Launches an application by bundle identifier.
Request Body:
{
"bundleIdentifier": "com.apple.Preferences"
}
Response:
{
"message": "opened com.apple.Preferences"
}
Performs a tap, double tap, or long press at an iOS rect.
Request Body:
{
"rect": "{{100,200},{50,50}}",
"count": 1,
"longPress": false
}
Response:
{
"message": "tapped element"
}
Performs a swipe between explicit start and end coordinates.
Request Body:
{
"x1": 100,
"y1": 700,
"x2": 100,
"y2": 200,
"durationMs": 300
}
Response:
{
"message": "swiped"
}
Navigates back when the current app exposes a supported back affordance.
Response:
{
"message": "navigated back"
}
Enters text into the focused element, or taps rect first when provided.
Request Body:
{
"rect": "{{100,200},{50,50}}",
"text": "Hello World",
"clear": false
}
rect and clear are optional. text is required.
Response:
{
"message": "entered text"
}
Presses supported device hardware keys.
Request Body:
{
"key": 1
}
Supported keys:
- 1: Home button
- 2: Volume up, physical devices only
- 3: Volume down, physical devices only
- 4: Action button, iOS 17+ and supported hardware only
- 5: Camera button, iOS 18+ and supported hardware only
Response:
{
"message": "pressed key"
}
- Accessibility Tree: Compressed representation of the UI hierarchy with memory addresses removed
- Screenshots: PNG format screen captures
- App State: Current application context and keyboard status
- App Launching: Launch any installed app by bundle identifier
- Touch Interactions: Single taps, double taps, long presses
- Gesture Recognition: Swipe gestures between explicit start and end coordinates
- Text Input: Automated typing with keyboard handling
- Hardware Keys: Device button presses
- App Management: Automatic app switching and state management
- Keyboard Detection: Intelligent keyboard presence detection
- Focus Management: Ensures proper element focus for text input
- Error Handling: Comprehensive error reporting and validation
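The request bodies for these interactions are small JSON objects, so a client can build them with a few helpers. The sketch below uses only the field names and key codes shown in the request examples. The helper names are hypothetical, and the defaults simply mirror the example values rather than confirmed server defaults. Endpoint paths for tap, swipe, text, and key requests are not listed here, so the helpers return bodies only.

```python
from enum import IntEnum
from typing import Any, Optional


class HardwareKey(IntEnum):
    HOME = 1
    VOLUME_UP = 2     # physical devices only
    VOLUME_DOWN = 3   # physical devices only
    ACTION = 4        # iOS 17+ and supported hardware only
    CAMERA = 5        # iOS 18+ and supported hardware only


def format_rect(x: int, y: int, width: int, height: int) -> str:
    """Format a rect in points as "{{x,y},{width,height}}"."""
    return "{{" + f"{x},{y}" + "},{" + f"{width},{height}" + "}}"


def tap_body(rect: str, count: int = 1, long_press: bool = False) -> dict[str, Any]:
    """Body for a tap (count=1), double tap (count=2), or long press."""
    return {"rect": rect, "count": count, "longPress": long_press}


def swipe_body(x1: int, y1: int, x2: int, y2: int, duration_ms: int = 300) -> dict[str, Any]:
    """Body for a swipe from (x1, y1) to (x2, y2)."""
    return {"x1": x1, "y1": y1, "x2": x2, "y2": y2, "durationMs": duration_ms}


def type_body(text: str, rect: Optional[str] = None, clear: Optional[bool] = None) -> dict[str, Any]:
    """Body for text entry; `text` is required, `rect` and `clear` are optional."""
    if not isinstance(text, str):
        raise ValueError("text is required and must be a string")
    body: dict[str, Any] = {"text": text}
    if rect is not None:
        body["rect"] = rect
    if clear is not None:
        body["clear"] = clear
    return body


def key_body(key: HardwareKey) -> dict[str, Any]:
    """Body for a hardware key press."""
    return {"key": int(key)}
```

Omitting optional fields instead of sending `null` keeps each request close to the documented examples.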
The Portal is designed for automation clients that can:
- send HTTP requests to the Portal endpoints
- read accessibility tree state for UI understanding
- combine /state and screenshots for visual verification
- issue one action at a time and observe again after each action
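The "one action at a time, then observe again" pattern can be sketched as a small loop. This is an illustration only: `run_loop` and its callables are hypothetical names, and how a client observes, decides, and acts is left to the caller. For example, `observe` might fetch /state and `act` might post a request body to the Portal.

```python
from typing import Any, Callable, Optional

State = dict[str, Any]
Action = dict[str, Any]


def run_loop(
    observe: Callable[[], State],
    decide: Callable[[State], Optional[Action]],
    act: Callable[[Action], None],
    max_steps: int = 20,
) -> int:
    """Observe, decide, act; repeat until `decide` returns None.

    Returns the number of actions performed. `max_steps` bounds the loop so
    a client cannot spin forever on a screen it does not understand.
    """
    steps = 0
    for _ in range(max_steps):
        state = observe()       # always re-read state before deciding
        action = decide(state)
        if action is None:      # goal reached, or nothing sensible to do
            break
        act(action)
        steps += 1
    return steps
```

Re-observing after every action matters because each tap or swipe can change the accessibility tree, so rects taken from an earlier state may no longer be valid.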
import requests

base_url = "http://127.0.0.1:6643"

# Health check: returns the current device date.
print(requests.get(f"{base_url}/device/date", timeout=10).json())

# Read the current app state and accessibility tree.
state = requests.get(f"{base_url}/state", timeout=10).json()
print(state["phone_state"])

# Save a PNG screenshot of the current screen.
screenshot = requests.get(f"{base_url}/vision/screenshot", timeout=10)
with open("screenshot.png", "wb") as f:
    f.write(screenshot.content)

# Launch Settings by bundle identifier.
requests.post(
    f"{base_url}/inputs/launch",
    json={"bundleIdentifier": "com.apple.Preferences"},
    timeout=10,
)
After the Portal health checks succeed, install Mobilerun in a Python >=3.11,<3.14 environment and point it at the local Portal URL:
uv pip install mobilerun
mobilerun device ui --ios --device http://127.0.0.1:6643
mobilerun device screenshot --ios --device http://127.0.0.1:6643
mobilerun device press home --ios --device http://127.0.0.1:6643
mobilerun device start com.apple.Preferences --ios --device http://127.0.0.1:6643
Run an LLM-backed task with any configured Mobilerun provider:
mobilerun run "Open Settings and tell me iOS version that is currently installed" \
--ios \
--device http://127.0.0.1:6643 \
--provider <provider> \
--model <model>
Example Gemini API-key run:
export GOOGLE_API_KEY="$(tr -d '\n' < "<google-key-file>")"
export GEMINI_API_KEY="$GOOGLE_API_KEY"
mobilerun run "Open Settings and tell me iOS version that is currently installed" \
--ios \
--device http://127.0.0.1:6643 \
--provider GoogleGenAI \
--model gemini-3.1-flash-lite
For screenshot-backed reasoning, enable vision:
mobilerun run "Open Settings and tell me iOS version that is currently installed" \
--ios \
--device http://127.0.0.1:6643 \
--provider GoogleGenAI \
--model gemini-3.1-flash-lite \
--vision
- FlyingFox: HTTP server framework for Swift
- XCTest: iOS testing framework for UI automation
- SwiftUI: User interface framework
- Port: 6643, or the next available port up to 6652
- Protocol: HTTP/1.1
- Content Types: JSON, PNG images
- Threading: Async/await support
- Uses iOS coordinate system (points, not pixels)
- Rectangle format: "{{x,y},{width,height}}"
- Swipe coordinates use explicit start and end points
- Requires iOS testing environment to run
- Limited to apps accessible through XCUITest
- Network access required for remote operation
- Some system-level interactions may be restricted
- The portal provides full device automation access
- Should only be used in controlled testing environments
- Network access should be restricted to trusted clients
- Consider implementing authentication for production use
This project is part of the larger Droidrun automation framework. Contributions should focus on:
- Enhanced UI state extraction
- Additional gesture support
- Improved error handling
- Performance optimizations
This project is licensed under the MIT License - see the LICENSE file for details.
Note: This is the iOS portal component of the Droidrun framework. For complete automation workflows, integrate with the corresponding agent component that orchestrates automation tasks using this portal's API.