-
Notifications
You must be signed in to change notification settings - Fork 14
feat: Implement summarization by AI ✨ #2866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| className={cx(styles['viewer-ai-summarise-btn'])} | ||
| /> | ||
| } | ||
| aria-label={t('Viewer.summriseWithAi')} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
summariseWithAi
| * @param {number} options.pdfPageCount - Number of pages if the file is a PDF | ||
| * @returns {boolean} Whether the file is compatible with summary | ||
| */ | ||
| export const isFileSummaryCompatible = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is a good candidate for unit tests.
| } | ||
| } | ||
| `} | ||
| </style> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we do something else than this?
|
Code is globally very clean and understandable 👍👍 |
cb9b3cd to
b51cc92
Compare
02b71c1 to
0e55302
Compare
packages/cozy-viewer/package.json
Outdated
| "react-markdown": "^4.0.8", | ||
| "react-pdf": "^5.7.2" | ||
| "react-pdf": "^5.7.2", | ||
| "react-router-dom": "^7.9.6" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I forgot:
- add it in devDeps ("react-router-dom": "^6.14.2") and in peerDeps ("react-router-dom": ">=6.14.2") instead of adding it here in deps to avoid multiple version of react-router-dom
- in this case, the commit become a breaking change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add "BREAKING CHANGE: explanation" to the commit message before merging it so it does a major version.
Something like BREAKING CHANGE: You need to add "react-router-dom" >= 6.14.2 as dep in your app. You won't need to do anything when updating cozy-viewer in cozy-drive because it is already here but with this we keep a clean versioning.
97a3e8e to
6908315
Compare
packages/cozy-viewer/src/helpers.js
Outdated
|
|
||
| if (config.options) { | ||
| const { pageLimit } = config.options | ||
| const { pdfPageCount } = options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature will not be restricted to PDF, we can have file summary on text files, office, etc
Furthermore, I assume you made the drive.summary flag an array to enable specific config per mime ?
Which is nice, but maybe a bit overkill at this point, I think we can keep it simple and just set a generic page count
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is already here we can keep it no? But agree that pdfPageCount could be renamed (if we know page count of office file?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not because it's here that it's good to keep it 😄
I think it's a good idea to keep a list of supported mimetype anywayn and we can just use a common default page number
| role: 'user', | ||
| content: `Please provide a concise summary of the following document:\n\n${textContent}` | ||
| } | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be eventually externalized, but in the meantime, could you isolate the prompt in a separated file, like prompts.js ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the prompt itself, some improvements suggestion:
const systemPrompt = `
You are a concise and reliable text summarizer.
Your goal:
- Produce a clear and accurate summary of the provided content
- Keep the original meaning and key information only
- Remove redundancy, examples, anecdotes, and minor details
Writing rules:
- Keep the same language as the input
- Be concise and use simple phrasing
- Do not add new information
- Do not guess what is not explicitly stated
Output:
- A single coherent paragraph unless otherwise specified
- Do not add any extra information or interpret anything beyond the explicit task
`
const userPrompt = `Summarize the following content:\n\n${textContent}`There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to keep it in a separated file from now
11ac343 to
9299208
Compare
9299208 to
b1006f1
Compare
b1006f1 to
912fcfc
Compare
8adefdb to
faeddc3
Compare
| const textContent = await extractText(client, fileBlob, { | ||
| name: file.name, | ||
| mime: file.mime || file.class | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add a safeguard in case there is no page info, assuming 1 token is ~4 characters.
const roughTokensEstimation = (text) => {
// Assuming 1 token ~ 4 characters
return Math.ceil(text.length / 4)
}
...
if (roughTokensEstimation(text) > summaryConfig.maxTokens) {
// bye bye
}
270e044 to
e3cf773
Compare
dcfc9aa to
588918a
Compare
| roughTokensEstimation(textContent) > summaryConfig.maxTokens | ||
| ) { | ||
| throw new Error( | ||
| `Text content exceeds maximum token limit (${summaryConfig.maxTokens} tokens)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this error happens, the user will have a generic error displayed, right? I think this specific error should be displayed to the user, so he can understand that the summary is not possible for this file, and not just a temporary issue where he can retry later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
BREAKING CHANGE: You need to add react-router-dom >= 6.14.2 as dep in your app ➕
588918a to
bcb82c4
Compare

Need:
linagora/twake-drive#3625
#2865
cozy/cozy-client#1655
BREAKING CHANGE: You need to add
react-router-dom>= 6.14.2 as dep in your app