Commit 5b8d068

chore: sync config from main
1 parent e68d0bd commit 5b8d068

118 files changed

Lines changed: 28202 additions & 0 deletions

Lines changed: 239 additions & 0 deletions
@@ -0,0 +1,239 @@
1+
---
2+
name: ops-inspector
3+
description: AIOps-style one-click inspection skill for CloudBase resources. Use this skill when users need to diagnose errors, check resource health, inspect logs, or run a comprehensive health check across cloud functions, CloudRun services, databases, and other CloudBase resources.
4+
version: 2.16.1
5+
alwaysApply: false
6+
---
7+
8+
## Standalone Install Note
9+
10+
If only the current skill is installed in this environment, start from the CloudBase main entry and use the published `cloudbase/references/...` paths for sibling skills.
11+
12+
- CloudBase main entry: `https://cnb.cool/tencent/cloud/cloudbase/cloudbase-skills/-/git/raw/main/skills/cloudbase/SKILL.md`
13+
- Current skill raw source: `https://cnb.cool/tencent/cloud/cloudbase/cloudbase-skills/-/git/raw/main/skills/cloudbase/references/ops-inspector/SKILL.md`
14+
15+
Keep local `references/...` paths for files that ship with the current skill directory. When this file points to a sibling skill such as `cloud-functions` or `cloudrun-development`, use the standalone fallback URL shown next to that reference.
16+
17+
## Activation Contract
18+
19+
### Use this first when
20+
21+
- The user wants to check the health or status of CloudBase resources (cloud functions, CloudRun, databases, storage, etc.).
22+
- The user reports errors, failures, or abnormal behavior and wants a quick diagnosis.
23+
- The user asks for an "inspection", "health check", "巡检" (inspection), "诊断" (diagnosis), or "troubleshooting" of their CloudBase environment.
24+
- The user wants to review recent error logs across services.
25+
26+
### Read before writing code if
27+
28+
- The inspection reveals code-level issues in cloud functions or CloudRun services — then read the relevant implementation skill before suggesting fixes.
29+
- The user wants to fix a problem found during inspection rather than just diagnose it.
30+
31+
### Then also read
32+
33+
- Cloud function issues -> `../cloud-functions/SKILL.md` (standalone fallback: `https://cnb.cool/tencent/cloud/cloudbase/cloudbase-skills/-/git/raw/main/skills/cloudbase/references/cloud-functions/SKILL.md`)
34+
- CloudRun issues -> `../cloudrun-development/SKILL.md` (standalone fallback: `https://cnb.cool/tencent/cloud/cloudbase/cloudbase-skills/-/git/raw/main/skills/cloudbase/references/cloudrun-development/SKILL.md`)
35+
- Database issues -> `../relational-database-tool/SKILL.md` (standalone fallback: `https://cnb.cool/tencent/cloud/cloudbase/cloudbase-skills/-/git/raw/main/skills/cloudbase/references/relational-database-tool/SKILL.md`) or `../no-sql-web-sdk/SKILL.md` (standalone fallback: `https://cnb.cool/tencent/cloud/cloudbase/cloudbase-skills/-/git/raw/main/skills/cloudbase/references/no-sql-web-sdk/SKILL.md`)
36+
- Platform overview -> `../cloudbase-platform/SKILL.md` (standalone fallback: `https://cnb.cool/tencent/cloud/cloudbase/cloudbase-skills/-/git/raw/main/skills/cloudbase/references/cloudbase-platform/SKILL.md`)
37+
38+
### Do NOT use for
39+
40+
- Deploying new resources or writing application code. This skill is read-only and diagnostic.
41+
- Replacing proper monitoring/alerting infrastructure. It provides point-in-time inspection, not continuous monitoring.
42+
- Directly fixing problems — it diagnoses and recommends; actual fixes should use the appropriate implementation skill.
43+
44+
### Common mistakes / gotchas
45+
46+
- Running a full inspection without first confirming the environment is bound (`auth` tool must show logged-in and env-bound state).
47+
- Ignoring CLS log service status — if CLS is not enabled, `queryLogs` will fail; always check first with `queryLogs(action="checkLogService")`.
48+
- Searching logs without a time range — this can return excessive or irrelevant results. Always scope searches to a relevant time window.
49+
- Treating a single error log as the root cause without correlating across resources. A function error may stem from a database or config issue.
50+
51+
### Minimal checklist
52+
53+
- [ ] Environment is bound and accessible (`envQuery(action="info")`)
54+
- [ ] CLS log service is enabled (`queryLogs(action="checkLogService")`)
55+
- [ ] All target resources are listed before diving into details
56+
- [ ] Time range is specified for any log searches
57+
- [ ] Findings are summarized with severity levels and actionable recommendations
58+
59+
---
60+
61+
## How to use this skill (for a coding agent)
62+
63+
### Inspection Modes
64+
65+
The skill supports two modes based on user intent:
66+
67+
| Mode | When to use | Scope |
68+
|------|-------------|-------|
69+
| **Full inspection** | User asks for a general health check / 巡检 (inspection) / 全面检查 (full check) | All resource types in the environment |
70+
| **Targeted inspection** | User reports a specific error or asks about a specific resource | One resource type or a specific resource |
71+
72+
### Full Inspection Workflow
73+
74+
Follow these steps in order for a comprehensive environment health check:
75+
76+
**Step 1 — Environment Check**
77+
78+
```
79+
envQuery(action="info")
80+
```
81+
82+
Confirm the environment is accessible. Record the `envId` for console link generation.
83+
84+
**Step 2 — Log Service Status**
85+
86+
```
87+
queryLogs(action="checkLogService")
88+
```
89+
90+
If CLS is not enabled, note this as a **warning** — log-based diagnosis will be unavailable. Recommend enabling CLS in the console: `https://tcb.cloud.tencent.com/dev?envId=${envId}#/devops/log`
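
As a minimal sketch of the console link generation mentioned in Step 1, the helper below builds links from a recorded `envId`. The base URL and hash routes are the ones this skill lists; the function name `console_links` and the dictionary keys are illustrative assumptions, not part of any CloudBase API.

```python
from urllib.parse import quote

CONSOLE_BASE = "https://tcb.cloud.tencent.com/dev"

# Hash routes taken from the console links listed in this skill.
CONSOLE_ROUTES = {
    "functions": "#/scf",
    "cloudrun": "#/platform-run",
    "logs": "#/devops/log",
}


def console_links(env_id: str) -> dict:
    """Return one console URL per section for the given environment ID.

    `console_links` is a hypothetical helper name used for illustration.
    """
    if not env_id:
        raise ValueError("env_id must be a non-empty string")
    encoded = quote(env_id, safe="")  # percent-encode anything unsafe in a query value
    return {
        section: f"{CONSOLE_BASE}?envId={encoded}{route}"
        for section, route in CONSOLE_ROUTES.items()
    }
```

For example, `console_links("demo-env-123")["logs"]` gives the URL to recommend when CLS is not enabled.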
91+
92+
**Step 3 — Cloud Functions Inspection**
93+
94+
```
95+
queryFunctions(action="listFunctions")
96+
```
97+
98+
For each function, check:
99+
- **Status**: Is the function in an active/deployed state?
100+
- **Recent errors**: `queryFunctions(action="listFunctionLogs", functionName="<name>", startTime="<recent>")`
101+
- **Common issues**:
102+
  - Timeout errors (execution exceeded limit)
103+
  - Memory limit exceeded
104+
  - Runtime errors (unhandled exceptions)
105+
  - Cold start frequency
106+
107+
**Step 4 — CloudRun Services Inspection**
108+
109+
```
110+
queryCloudRun(action="list")
111+
```
112+
113+
For each service, check:
114+
- **Status**: Is the service running?
115+
- **Detail**: `queryCloudRun(action="detail", detailServerName="<name>")`
116+
- **Common issues**:
117+
  - Service not running (scaled to zero or crashed)
118+
  - Image pull failures
119+
  - OOMKilled events
120+
  - Health check failures
121+
122+
**Step 5 — Error Log Aggregation** (if CLS is enabled)
123+
124+
```
125+
queryLogs(action="searchLogs", queryString="ERROR", service="tcb", startTime="<24h-ago>", limit=50)
126+
queryLogs(action="searchLogs", queryString="ERROR", service="tcbr", startTime="<24h-ago>", limit=50)
127+
```
128+
129+
Look for patterns:
130+
- Repeated error messages (same error many times)
131+
- Cascading failures (errors in multiple services around the same time)
132+
- Timeout patterns
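
One way to surface repeated error messages is to normalize volatile fragments (IDs, numbers) and count what remains. The sketch below assumes the log messages have already been extracted as plain strings; the shape of the `queryLogs` response is not specified here, so that extraction step is left out. The names `normalize` and `top_error_patterns` are hypothetical.

```python
import re
from collections import Counter

# Volatile fragments are replaced with placeholders so that repeats of the
# same error collapse into one pattern. UUIDs go first, then bare digit runs.
_VOLATILE = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I), "<uuid>"),
    (re.compile(r"\d+"), "<n>"),
]


def normalize(message: str) -> str:
    """Strip request-specific details from a single log message."""
    for pattern, token in _VOLATILE:
        message = pattern.sub(token, message)
    return message.strip()


def top_error_patterns(messages, limit=5):
    """Return the `limit` most frequent normalized messages with their counts."""
    counts = Counter(normalize(m) for m in messages)
    return counts.most_common(limit)
```

Two timeouts with different durations then count as one pattern, which makes a repeated error easier to spot than it is in the raw log list.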
133+
134+
**Step 6 — Summary Report**
135+
136+
Generate a structured report:
137+
138+
```markdown
139+
# CloudBase Resource Inspection Report
140+
141+
**Environment**: ${envId}
142+
**Inspection Time**: ${timestamp}
143+
144+
## Overall Health: ✅ Healthy / ⚠️ Warnings Found / ❌ Issues Found
145+
146+
### Cloud Functions
147+
| Function | Status | Recent Errors | Severity |
148+
|----------|--------|---------------|----------|
149+
| ... | ... | ... | ... |
150+
151+
### CloudRun Services
152+
| Service | Status | Issues | Severity |
153+
|---------|--------|--------|----------|
154+
| ... | ... | ... | ... |
155+
156+
### Error Log Summary
157+
- Total errors in last 24h: N
158+
- Top error patterns: ...
159+
160+
## Recommendations
161+
1. ...
162+
2. ...
163+
164+
## Console Links
165+
- Cloud Functions: https://tcb.cloud.tencent.com/dev?envId=${envId}#/scf
166+
- CloudRun: https://tcb.cloud.tencent.com/dev?envId=${envId}#/platform-run
167+
- Logs: https://tcb.cloud.tencent.com/dev?envId=${envId}#/devops/log
168+
```
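
The template above can be filled programmatically. The following is a minimal sketch that renders the header, the two resource tables, and the recommendations; the Error Log Summary and Console Links sections are omitted for brevity. `Row` and `render_report` are hypothetical names, and the status strings passed in are whatever the inspection collected.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One table row: a function or a CloudRun service (hypothetical type)."""
    name: str
    status: str
    issues: str
    severity: str


def _table(first_col, issue_col, rows):
    lines = [
        f"| {first_col} | Status | {issue_col} | Severity |",
        "|---|---|---|---|",
    ]
    lines += [f"| {r.name} | {r.status} | {r.issues} | {r.severity} |" for r in rows]
    return "\n".join(lines)


def render_report(env_id, timestamp, overall, functions, services, recommendations):
    """Render the inspection report as a Markdown string."""
    parts = [
        "# CloudBase Resource Inspection Report",
        "",
        f"**Environment**: {env_id}",
        f"**Inspection Time**: {timestamp}",
        "",
        f"## Overall Health: {overall}",
        "",
        "### Cloud Functions",
        _table("Function", "Recent Errors", functions),
        "",
        "### CloudRun Services",
        _table("Service", "Issues", services),
        "",
        "## Recommendations",
    ]
    # Number the recommendations in the order given (highest priority first).
    parts += [f"{i}. {text}" for i, text in enumerate(recommendations, start=1)]
    return "\n".join(parts)
```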
169+
170+
### Targeted Inspection Workflow
171+
172+
When the user specifies a resource type or a specific resource:
173+
174+
1. **Cloud function errors**: `queryFunctions(action="listFunctionLogs", functionName="<name>")` then `queryLogs(action="searchLogs", queryString="* AND functionName:<name> AND level:ERROR", ...)`
175+
2. **CloudRun errors**: `queryCloudRun(action="detail", detailServerName="<name>")` then `queryLogs(action="searchLogs", queryString="ERROR", service="tcbr", ...)`
176+
3. **Database issues**: Call `querySqlDatabase(action="getContext")` for MySQL or `readNoSqlDatabaseStructure(action="listCollections")` for NoSQL, depending on the database type
177+
4. **General error search**: `queryLogs(action="searchLogs", queryString="<error-keyword>", ...)`
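
The four cases above can be expressed as data: an ordered list of tool names and arguments for the agent to execute. This sketch only builds the plan and does not call any MCP tool; `targeted_plan` and the `kind` labels are hypothetical, while the tool names and arguments are the ones used in this skill. Time-range arguments (`startTime`, `endTime`) are left for the caller to add.

```python
def targeted_plan(kind, name=None, keyword="ERROR"):
    """Return the ordered (tool, arguments) pairs for a targeted inspection.

    `kind` is one of "function", "cloudrun", "mysql", "nosql", "search"
    (labels chosen for this sketch).
    """
    if kind in ("function", "cloudrun") and not name:
        raise ValueError(f"a resource name is required for kind={kind!r}")

    if kind == "function":
        return [
            ("queryFunctions", {"action": "listFunctionLogs", "functionName": name}),
            ("queryLogs", {
                "action": "searchLogs",
                "queryString": f"* AND functionName:{name} AND level:ERROR",
            }),
        ]
    if kind == "cloudrun":
        return [
            ("queryCloudRun", {"action": "detail", "detailServerName": name}),
            ("queryLogs", {"action": "searchLogs", "queryString": "ERROR", "service": "tcbr"}),
        ]
    if kind == "mysql":
        return [("querySqlDatabase", {"action": "getContext"})]
    if kind == "nosql":
        return [("readNoSqlDatabaseStructure", {"action": "listCollections"})]
    if kind == "search":
        return [("queryLogs", {"action": "searchLogs", "queryString": keyword})]
    raise ValueError(f"unknown inspection kind: {kind!r}")
```

Keeping the plan separate from execution makes it easy to show the user what will be queried before any call is made.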
178+
179+
### AIOps Methodology
180+
181+
This skill follows AIOps principles for intelligent inspection:
182+
183+
1. **Data Collection**: Gather logs and resource states via MCP tools
184+
2. **Pattern Recognition**: Identify recurring errors, anomaly patterns, and correlations across services
185+
3. **Root Cause Hypothesis**: Based on error patterns, suggest likely root causes (e.g., a function timeout may be caused by a database query bottleneck)
186+
4. **Actionable Recommendations**: Provide specific, prioritized remediation steps with links to relevant skills and console pages
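
To illustrate the correlation part of pattern recognition, the sketch below groups error events into fixed time buckets and reports the buckets in which more than one service logged errors. It assumes each event has already been reduced to a timezone-aware timestamp and a service label such as `tcb` or `tcbr`; the function name and the 5-minute default are illustrative choices.

```python
from collections import defaultdict
from datetime import datetime, timezone


def cascading_windows(events, window_seconds=300, min_services=2):
    """Find time buckets where several services reported errors together.

    `events` is an iterable of (timestamp, service) pairs with
    timezone-aware datetimes. Returns (bucket_start, services) pairs.
    """
    buckets = defaultdict(set)
    for ts, service in events:
        # Integer division maps each timestamp to the start of its bucket.
        buckets[int(ts.timestamp()) // window_seconds].add(service)

    return [
        (datetime.fromtimestamp(bucket * window_seconds, tz=timezone.utc), sorted(services))
        for bucket, services in sorted(buckets.items())
        if len(services) >= min_services
    ]
```

Fixed buckets are simple but can split two related errors that fall on either side of a bucket boundary, so a flagged window is a hint for a root cause hypothesis rather than proof.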
187+
188+
### Severity Levels
189+
190+
| Level | Icon | Meaning |
191+
|-------|------|---------|
192+
| Critical | ❌ | Service is down or data is at risk; requires immediate action |
193+
| Warning | ⚠️ | Errors detected but service is still partially functional; investigate soon |
194+
| Info | ℹ️ | No errors found; informational status only |
195+
| Healthy | ✅ | Resource is operating normally |
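
A small sketch of how these levels might be assigned and combined. The classification rule here is a simplifying assumption based only on the meanings in the table (not running means Critical, any errors means Warning); a real inspection would weigh more signals, for example a CloudRun service that is intentionally scaled to zero.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels, ordered so that a larger value is worse."""
    HEALTHY = 0
    INFO = 1
    WARNING = 2
    CRITICAL = 3


def classify(is_running: bool, error_count: int) -> Severity:
    """Assign a level to one resource (illustrative rule, see lead-in)."""
    if not is_running:
        return Severity.CRITICAL
    if error_count > 0:
        return Severity.WARNING
    return Severity.HEALTHY


def worst(severities) -> Severity:
    """Overall level for a set of resources: the worst individual level."""
    return max(severities, default=Severity.HEALTHY)
```

Using an ordered enum lets the overall health of the environment fall out of a single `max` over the per-resource levels.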
196+
197+
### Preferred Tool Map
198+
199+
| Operation | MCP Tool Call |
200+
|-----------|---------------|
201+
| Check environment | `envQuery(action="info")` |
202+
| Check CLS status | `queryLogs(action="checkLogService")` |
203+
| List cloud functions | `queryFunctions(action="listFunctions")` |
204+
| Get function detail | `queryFunctions(action="getFunctionDetail", functionName="<name>")` |
205+
| Get function logs | `queryFunctions(action="listFunctionLogs", functionName="<name>", startTime="<time>", endTime="<time>")` |
206+
| Get function log detail | `queryFunctions(action="getFunctionLogDetail", requestId="<id>")` |
207+
| List CloudRun services | `queryCloudRun(action="list")` |
208+
| Get CloudRun detail | `queryCloudRun(action="detail", detailServerName="<name>")` |
209+
| Search CLS logs | `queryLogs(action="searchLogs", queryString="<query>", service="tcb\|tcbr", startTime="<time>", endTime="<time>")` |
210+
| Check NoSQL structure | `readNoSqlDatabaseStructure(action="listCollections")` |
211+
| Check MySQL status | `querySqlDatabase(action="getContext")` |
212+
213+
### Common CLS Query Patterns
214+
215+
| Scenario | queryString |
216+
|----------|-------------|
217+
| All errors | `ERROR` |
218+
| Function timeout | `timeout OR 超时` |
219+
| Function OOM | `OOM OR "out of memory" OR 内存超限` |
220+
| CloudRun crash | `crash OR OOMKilled OR Error` |
221+
| Specific function errors | `functionName:<name> AND level:ERROR` |
222+
| 5xx HTTP errors | `statusCode:>499` |
223+
| Cold start issues | `coldStart OR 冷启动` |
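
The patterns in this table follow two shapes: keyword alternatives joined with `OR`, and `field:value` filters joined with `AND`. The helpers below are a sketch for composing such strings; the names are hypothetical, and quoting multi-word keywords as phrases is an assumption about CLS search syntax that should be checked against the CLS documentation.

```python
def any_of(*keywords: str) -> str:
    """Join keywords with OR, quoting any keyword that contains whitespace."""
    return " OR ".join(f'"{k}"' if " " in k else k for k in keywords)


def all_fields(**fields: str) -> str:
    """Join field:value filters with AND, in the order they were given."""
    return " AND ".join(f"{name}:{value}" for name, value in fields.items())
```

For instance, `all_fields(functionName="hello", level="ERROR")` produces the "Specific function errors" pattern for a function named `hello`.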
224+
225+
### Time Range Guidance
226+
227+
- **Quick check**: Last 1 hour (`startTime` = 1 hour ago)
228+
- **Standard inspection**: Last 24 hours
229+
- **Trend analysis**: Last 7 days
230+
- **Specific incident**: Narrow to the reported time window
231+
232+
Always pass `startTime`/`endTime` as `YYYY-MM-DD HH:mm:ss` timestamps (ISO 8601 date and time, separated by a space), e.g., `"2025-01-15 00:00:00"`.
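
A minimal sketch for computing these windows in the timestamp format shown above. The window names and the `time_range` helper are illustrative; which time zone the tools expect is not stated here, so the sketch uses whatever `now` the caller supplies and falls back to local time.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Window lengths from the guidance above (names are illustrative).
WINDOWS = {
    "quick": timedelta(hours=1),
    "standard": timedelta(hours=24),
    "trend": timedelta(days=7),
}


def time_range(window: str, now=None):
    """Return (startTime, endTime) strings for a named inspection window."""
    end = now or datetime.now()  # local time unless the caller passes `now`
    start = end - WINDOWS[window]
    return start.strftime(TIME_FORMAT), end.strftime(TIME_FORMAT)
```

Passing `now` explicitly keeps every query in one inspection aligned to the same end time.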
233+
234+
## Related Skills
235+
236+
- `cloud-functions` — Cloud function development, deployment, and debugging
237+
- `cloudrun-development` — CloudRun backend deployment and management
238+
- `cloudbase-platform` — General platform knowledge and console navigation
239+
- `relational-database-tool` — MySQL database management and diagnostics
