1 | | -You are tasked with evaluating the quality of quantitative analysis performed by an AI quant agent. Assess each analysis on a scale from 0 to 1, where 0 represents critically flawed analysis and 1 represents exemplary analysis. Your evaluation should be objective, consistent, and based on the criteria outlined below. |
2 | | - |
3 | | -Scoring Criteria (Each weighted equally) |
4 | | - |
5 | | -1. Methodological Rigor |
6 | | -Data Quality Assessment: Did the analyst properly evaluate data quality, outliers, and missing values? |
7 | | -Model Selection: Was the chosen model appropriate for the problem and data characteristics? |
8 | | -Statistical Validity: Were statistical tests properly applied and interpreted? |
9 | | -Assumptions: Were model assumptions explicitly stated and verified? |
10 | | -Robustness Checks: Were appropriate sensitivity analyses or robustness checks performed? |
11 | | - |
12 | | -2. Technical Execution |
13 | | -Implementation Accuracy: Was the analysis implemented without technical errors? |
14 | | -Computational Efficiency: Were appropriate algorithms and computational approaches used? |
15 | | -Feature Engineering: Were variables appropriately transformed, normalized, or engineered? |
16 | | -Cross-Validation: Were proper validation techniques employed to avoid overfitting? |
17 | | -Reproducibility: Is the analysis reproducible with the provided code and data? |
18 | | - |
19 | | -3. Analytical Depth |
20 | | -Complexity Handling: Did the analysis appropriately address complex relationships in the data? |
21 | | -Alternative Hypotheses: Were alternative explanations considered and tested? |
22 | | -Contextual Understanding: Did the analysis reflect domain knowledge and business context? |
23 | | -Causal Reasoning: Were causal claims properly supported or appropriately avoided? |
24 | | -Comparative Analysis: Was the approach benchmarked against relevant alternatives? |
25 | | - |
26 | | -4. Interpretation & Communication |
27 | | -Results Clarity: Were results presented clearly and accurately? |
28 | | -Uncertainty Communication: Was uncertainty properly quantified and communicated? |
29 | | -Visual Representation: Were visualizations effective and accurately represented the data? |
30 | | -Limitations Acknowledgment: Were limitations of the analysis explicitly discussed? |
31 | | -Actionable Insights: Did the analysis lead to clear, actionable recommendations? |
32 | | - |
33 | | -5. Business Impact & Relevance |
34 | | -Problem Alignment: Did the analysis directly address the business question? |
35 | | -Decision Support: Did the analysis effectively support decision-making? |
36 | | -Value Quantification: Was the potential business value or impact quantified? |
37 | | -Implementation Feasibility: Were recommendations practical and implementable? |
38 | | -Strategic Consideration: Did the analysis consider broader strategic implications? |
| 1 | +You are tasked with evaluating the quality of responses from BitQuant, an AI quant agent specialized in crypto/DeFi analytics. Assess each response on a scale from 0 to 10 for each criterion, where 0 represents poor quality and 10 represents excellent quality. Your evaluation should be objective and consistent. |
| 2 | + |
| 3 | +Scoring Criteria (Each weighted equally - 10 points each, maximum total score: 50) |
| 4 | + |
| 5 | +1. Tool Usage & Data Accuracy |
| 6 | +- Did the agent use appropriate tools for the query? |
| 7 | +- Was the data accurate and up-to-date? |
| 8 | +- Were API calls handled properly (no errors, appropriate fallbacks)? |
| 9 | +- For simple queries: Was the right tool used efficiently? |
| 10 | +- For complex queries: Were multiple tools used appropriately? |
| 11 | + |
| 12 | +2. Crypto/DeFi Knowledge |
| 13 | +- Did the response show understanding of crypto/DeFi concepts? |
| 14 | +- Were protocols, tokens, and metrics explained correctly? |
| 15 | +- Did the analysis consider relevant market factors? |
| 16 | +- Was the terminology used appropriately? |
| 17 | + |
| 18 | +3. Response Quality |
| 19 | +- Did the response directly answer the user's question? |
| 20 | +- Was the information presented clearly and concisely? |
| 21 | +- Were numbers and data formatted properly? |
| 22 | +- Did the response include relevant context when needed? |
| 23 | + |
| 24 | +4. User Experience |
| 25 | +- Was the response helpful and actionable? |
| 26 | +- Were pool IDs, token IDs, or wallet addresses formatted correctly for interaction? |
| 27 | +- Did the response match the expected tone (authoritative, data-driven)? |
| 28 | +- Was the response complete without requiring follow-up questions? |
| 29 | + |
| 30 | +5. Technical Execution |
| 31 | +- Were calculations performed correctly? |
| 32 | +- Was data processing accurate? |
| 33 | +- Did the response handle edge cases appropriately? |
| 34 | +- Was the analysis reproducible with the same inputs? |
39 | 35 |
40 | 36 | Final Scoring Calculation: |
41 | 37 |
42 | | -Score each of the 5 main criteria on a scale of 0 to 10. |
43 | | -Calculate the final score as the sum of the score for each criteria (so maximum final score is 50). |
| 38 | +Score each of the 5 criteria on a scale of 0 to 10. |
| 39 | +Calculate the final score as the sum of all criteria scores (maximum: 50). |
44 | 40 |
45 | | -Explain your scoring and evaluation method and return the final score as a JSON like: ```json{"score":35}``` |
| 41 | +Provide a brief explanation of your scoring and return the final score as JSON: ```json{"score":35}``` |
46 | 42 |
47 | 43 | ======= |
48 | 44 |
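The "Final Scoring Calculation" in the revised prompt (five criteria, each scored 0 to 10, summed to a maximum of 50, returned as `{"score": N}`) could be checked on the consumer side with something like the sketch below. This is only an illustration under stated assumptions: the criterion keys and the `final_score` helper are hypothetical names chosen here and are not part of the diff.

```python
import json

# Hypothetical keys, one per criterion named in the revised prompt.
CRITERIA = (
    "tool_usage_data_accuracy",
    "crypto_defi_knowledge",
    "response_quality",
    "user_experience",
    "technical_execution",
)


def final_score(scores: dict) -> str:
    """Sum the five per-criterion scores and return the JSON the prompt asks for."""
    missing = [name for name in CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for name in CRITERIA:
        # Each criterion is scored on a 0 to 10 scale.
        if not 0 <= scores[name] <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {scores[name]}")
    total = sum(scores[name] for name in CRITERIA)  # maximum is 50
    # Compact separators reproduce the {"score":35} shape shown in the prompt.
    return json.dumps({"score": total}, separators=(",", ":"))


example = {
    "tool_usage_data_accuracy": 8,
    "crypto_defi_knowledge": 7,
    "response_quality": 6,
    "user_experience": 7,
    "technical_execution": 7,
}
print(final_score(example))
```

Validating the range before summing makes an evaluator reply that still uses the old 0 to 1 scale, or that skips a criterion, easier to spot: fractional or missing values stand out instead of silently producing a low total.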