
fix: Fix the issue where global search matches special characters #272

Closed
pengfeixx wants to merge 1 commit into linuxdeepin:master from pengfeixx:fix-339101

Conversation

Contributor

@pengfeixx pengfeixx commented Jan 22, 2026

Fix the issue where global search matches special characters

Log: Fix the issue where global search matches special characters

Summary by Sourcery

Improve pinyin-based filename search to correctly handle special characters in keywords and filenames.

Bug Fixes:

  • Prevent pinyin global search from incorrectly matching or ignoring special characters and their fullwidth/halfwidth variants in filenames.

Enhancements:

  • Normalize pinyin search keywords by stripping non-letter characters before building search queries.
  • Add special-character equivalence handling between ASCII and Unicode fullwidth punctuation when matching filenames for pinyin searches.
  • Filter pinyin search results whose filenames do not contain the special characters present in the search keyword.

Tests:

  • Add unit tests for pinyin detection, pinyin keyword normalization, character equivalence, and special-character matching behavior in file search.

@deepin-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pengfeixx

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


sourcery-ai Bot commented Jan 22, 2026

Reviewer's Guide

This PR tightens pinyin detection and normalizes pinyin search keywords to strip non‑letter characters, and adds a special-character matching layer (including half/full-width equivalence) so global filename search only matches files whose punctuation actually corresponds to the search keyword while still allowing relaxed matches when the keyword has no punctuation.

Sequence diagram for filename search with pinyin normalization and special-character matching

sequenceDiagram
    actor User
    participant SearchUI
    participant FileNameWorkerPrivate
    participant FileSearchUtils
    participant SearchFactory
    participant SearchBackend

    User->>SearchUI: enter keyword
    SearchUI->>FileNameWorkerPrivate: startSearch(keyword, searchInfo)

    loop buildSearchQuery
        FileNameWorkerPrivate->>FileNameWorkerPrivate: createSearchQuery()
        FileNameWorkerPrivate->>FileSearchUtils: hasWildcard(keyword)
        FileSearchUtils-->>FileNameWorkerPrivate: bool
        alt queryType is Simple
            FileNameWorkerPrivate->>FileSearchUtils: isPinyin(keyword)
            FileSearchUtils-->>FileNameWorkerPrivate: bool
            alt keyword_is_pinyin
                FileNameWorkerPrivate->>FileSearchUtils: normalizePinyinKeyword(keyword)
                FileSearchUtils-->>FileNameWorkerPrivate: normalizedKeyword
                FileNameWorkerPrivate->>SearchFactory: createQuery(normalizedKeyword, Simple)
            else keyword_not_pinyin
                FileNameWorkerPrivate->>SearchFactory: createQuery(keyword, Simple)
            end
        else queryType is Wildcard
            FileNameWorkerPrivate->>SearchFactory: createQuery(keyword, Wildcard)
        end
        SearchFactory-->>FileNameWorkerPrivate: SearchQuery
    end

    FileNameWorkerPrivate->>SearchBackend: execute(SearchQuery)
    SearchBackend-->>FileNameWorkerPrivate: candidateFileName

    loop for each candidateFileName
        FileNameWorkerPrivate->>FileSearchUtils: isPinyin(searchInfo.keyword)
        FileSearchUtils-->>FileNameWorkerPrivate: bool
        alt keyword_is_pinyin
            FileNameWorkerPrivate->>FileSearchUtils: matchSpecialChars(candidateFileName, searchInfo.keyword)
            FileSearchUtils->>FileSearchUtils: extract searchSpecialChars
            FileSearchUtils->>FileSearchUtils: extract fileSpecialChars
            FileSearchUtils->>FileSearchUtils: charsEquivalent(searchChar, fileChar)
            FileSearchUtils-->>FileNameWorkerPrivate: bool
            alt special_chars_match
                FileNameWorkerPrivate->>FileNameWorkerPrivate: appendSearchResult(candidateFileName)
            else special_chars_mismatch
                FileNameWorkerPrivate-->>FileNameWorkerPrivate: skip candidateFileName
            end
        else keyword_not_pinyin
            FileNameWorkerPrivate->>FileNameWorkerPrivate: appendSearchResult(candidateFileName)
        end
    end

    FileNameWorkerPrivate-->>SearchUI: finalResults
    SearchUI-->>User: display results
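
The query-building branch in the diagram above can be sketched roughly as follows. This is a Qt-free illustration, not the daemon's code: `Query`, `QueryType`, `lettersOnly`, `looksLikePinyin`, and `buildQuery` are hypothetical names, the wildcard test assumes `*` and `?` are the wildcard characters, and the pinyin check is a placeholder because the real `isPinyin` delegates to `isPinyinAcronymSequence`, which is not reproduced here.

```cpp
#include <string>

// Hypothetical stand-ins for SearchQuery and its Type enum.
enum class QueryType { Simple, Wildcard };

struct Query {
    QueryType type;
    std::string keyword;
};

// Assumption: a wildcard keyword is one that contains '*' or '?'.
inline bool hasWildcard(const std::string &s) {
    return s.find_first_of("*?") != std::string::npos;
}

// Keep ASCII letters only, mirroring what normalizePinyinKeyword is described to do.
inline std::string lettersOnly(const std::string &s) {
    std::string out;
    for (char c : s) {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
            out.push_back(c);
        }
    }
    return out;
}

// Placeholder for isPinyin: any keyword with at least one letter counts here.
inline bool looksLikePinyin(const std::string &s) {
    return !lettersOnly(s).empty();
}

// Wildcard keywords pass through untouched; pinyin keywords are normalized first.
inline Query buildQuery(const std::string &keyword) {
    if (hasWildcard(keyword)) {
        return {QueryType::Wildcard, keyword};
    }
    if (looksLikePinyin(keyword)) {
        return {QueryType::Simple, lettersOnly(keyword)};
    }
    return {QueryType::Simple, keyword};
}
```

With these assumptions, `buildQuery("nh!")` yields a Simple query for `nh`, while `buildQuery("a*b")` stays a Wildcard query with the keyword unchanged.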

Class diagram for updated FileSearchUtils and FileNameWorkerPrivate

classDiagram
    class FileSearchUtils {
        +static QVariantHash tailerData(QFileInfo info)
        +static QStringList buildDFMSearchFileTypes(QList_Group groupList)
        +static bool isPinyin(QString str)
        +static QString normalizePinyinKeyword(QString str)
        +static bool matchSpecialChars(QString fileName, QString searchKeyword)
        +static bool charsEquivalent(QChar ch1, QChar ch2)
        +static bool hasWildcard(QString str)
    }

    class FileNameWorkerPrivate {
        -SearchInfo m_searchInfo
        -QStringList m_tmpSearchResults
        -QMutex m_mutex
        +bool appendSearchResult(QString fileName)
        +SearchQuery createSearchQuery()
        +SearchOptions createSearchOptions()
    }

    class SearchFactory {
        +static SearchQuery createQuery(QString keyword, SearchQuery_Type type)
    }

    class SearchQuery {
        <<enumeration>> Type
        Type Simple
        Type Wildcard
        +Type queryType
    }

    class SearchInfo {
        +QString keyword
        +QStringList boolKeywords
        +QStringList typeKeywords
    }

    FileNameWorkerPrivate --> FileSearchUtils : uses
    FileNameWorkerPrivate --> SearchFactory : creates_queries
    FileNameWorkerPrivate --> SearchInfo : reads
    SearchFactory --> SearchQuery : creates

File-Level Changes

Change: Adjust pinyin detection to ignore non-letter characters before checking if a keyword is a pinyin acronym sequence.
Details:
  • Modify isPinyin to build a letters-only substring from the input keyword.
  • Return false from isPinyin when the letters-only substring is empty before calling isPinyinAcronymSequence on it.
Files:
  • src/dde-grand-search-daemon/searcher/file/filesearchutils.cpp

Change: Introduce helpers to normalize pinyin keywords and to compare and match special characters between filenames and search keywords.
Details:
  • Add normalizePinyinKeyword to strip non-letter characters and log the normalized pinyin keyword.
  • Add charsEquivalent to treat identical characters as equal and map Unicode full-width punctuation to their half-width forms (and vice versa).
  • Add matchSpecialChars to extract non-letter characters from the keyword and non-alphanumeric characters from the filename base, then ensure each keyword special character has an equivalent in the filename using charsEquivalent, with debug logging on match/mismatch.
  • Declare the new helper functions in FileSearchUtils header.
Files:
  • src/dde-grand-search-daemon/searcher/file/filesearchutils.cpp
  • src/dde-grand-search-daemon/searcher/file/filesearchutils.h

Change: Integrate pinyin and special-character handling into filename search query construction and result filtering.
Details:
  • Before adding a search result, filter out files when the search keyword is pinyin and matchSpecialChars indicates a special-character mismatch, logging filtered files.
  • When building simple queries without boolean keywords, normalize pinyin keywords via normalizePinyinKeyword before creating the SearchQuery.
  • When building queries with boolean keywords, if the query is non-wildcard and the keyword is pinyin, normalize it before creating the SearchQuery.
Files:
  • src/dde-grand-search-daemon/searcher/file/filenameworker.cpp

Change: Add unit tests to validate pinyin detection, normalization, character equivalence, and special-character matching behavior.
Details:
  • Stub isPinyinAcronymSequence in tests to simulate pinyin detection over pure letter sequences.
  • Test isPinyin behavior with pure pinyin, pinyin mixed with punctuation, empty strings, and punctuation-only strings.
  • Test normalizePinyinKeyword for preserving pure pinyin, stripping various special characters, and handling empty and punctuation-only inputs.
  • Test charsEquivalent for direct equality, full-width/half-width punctuation and symbol equivalence, non-equivalent characters, and exclusion of some CJK punctuation such as '。'.
  • Test matchSpecialChars for cases with and without special characters in the keyword and filename, including full-width/half-width equivalence, negative cases, multiple special characters, and mixed-width punctuation.
Files:
  • tests/libgrand-search-daemon/searcher/file/ut_filesearchutils.cpp
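
The helpers listed above can be approximated without Qt as in the sketch below. It is an illustration under stated assumptions rather than the PR's implementation: characters are `char32_t` code points instead of `QChar`, the letter and digit tests are ASCII-only (the PR is described as using the broader Qt letter check), `baseName` is a hypothetical stand-in for `QFileInfo::completeBaseName()`, and full-width forms are assumed to be the range U+FF01 to U+FF5E, which sits at a fixed offset of 0xFEE0 above ASCII.

```cpp
#include <string>

// ASCII-only classification (assumption; keeps the sketch Qt-free).
inline bool isAsciiLetter(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z');
}
inline bool isAsciiDigit(char32_t c) {
    return c >= U'0' && c <= U'9';
}

// normalizePinyinKeyword analogue: keep letters, drop everything else.
inline std::u32string lettersOnly(const std::u32string &keyword) {
    std::u32string out;
    for (char32_t c : keyword) {
        if (isAsciiLetter(c)) out.push_back(c);
    }
    return out;
}

// Fold full-width forms U+FF01..U+FF5E onto ASCII U+0021..U+007E.
inline char32_t toHalfWidth(char32_t c) {
    return (c >= 0xFF01 && c <= 0xFF5E) ? static_cast<char32_t>(c - 0xFEE0) : c;
}

// Two characters are equivalent if they agree after width folding.
inline bool charsEquivalent(char32_t a, char32_t b) {
    return toHalfWidth(a) == toHalfWidth(b);
}

// Hypothetical base-name helper: strip the directory and the last extension.
inline std::u32string baseName(const std::u32string &path) {
    const auto slash = path.rfind(U'/');
    const std::u32string name =
        (slash == std::u32string::npos) ? path : path.substr(slash + 1);
    const auto dot = name.rfind(U'.');
    return (dot == std::u32string::npos) ? name : name.substr(0, dot);
}

// Every non-letter keyword character needs an equivalent non-alphanumeric
// character in the file's base name, as the change description states.
inline bool matchSpecialChars(const std::u32string &fileName,
                              const std::u32string &keyword) {
    const std::u32string base = baseName(fileName);
    for (char32_t k : keyword) {
        if (isAsciiLetter(k)) continue;
        bool found = false;
        for (char32_t f : base) {
            if (!isAsciiLetter(f) && !isAsciiDigit(f) && charsEquivalent(k, f)) {
                found = true;
                break;
            }
        }
        if (!found) return false;
    }
    return true;
}
```

Under these assumptions a keyword `nh!` accepts a file whose base name contains either `!` or the full-width `!`, rejects `nh.txt`, and a keyword with no special characters accepts every candidate.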


@sourcery-ai sourcery-ai Bot left a comment


Hey - I've left some high level feedback:

  • In matchSpecialChars, digits from the search keyword are currently treated as special characters (using !ch.isLetter()), but digits are explicitly excluded on the filename side (!ch.isLetter() && !ch.isDigit()), which will cause cases like "cs(1)" to fail unexpectedly; consider aligning the conditions so digits are consistently ignored in both places.
  • The logic to strip non-letter characters from a keyword is duplicated in isPinyin and normalizePinyinKeyword; consider refactoring to have isPinyin call normalizePinyinKeyword (or a shared helper) to avoid divergence in future changes.
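
The digit asymmetry described in the first point can be reproduced with a small Qt-free sketch. The function `specialsMatch` and its `ignoreKeywordDigits` flag are hypothetical and exist only to contrast the two behaviors; the letter and digit tests are ASCII-only for brevity.

```cpp
#include <string>

inline bool isLetter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
inline bool isDigit(char c) {
    return c >= '0' && c <= '9';
}

// With ignoreKeywordDigits == false, keyword digits count as special characters
// while filename digits do not, which is the asymmetry the review points out.
inline bool specialsMatch(const std::string &keyword,
                          const std::string &baseName,
                          bool ignoreKeywordDigits) {
    for (char k : keyword) {
        if (isLetter(k) || (ignoreKeywordDigits && isDigit(k))) continue;
        bool found = false;
        for (char f : baseName) {
            // Filename side: only non-alphanumeric characters are candidates.
            if (!isLetter(f) && !isDigit(f) && f == k) {
                found = true;
                break;
            }
        }
        if (!found) return false;
    }
    return true;
}
```

For the keyword `cs(1)` against a base name `cs(1)`, the asymmetric variant returns false because the `1` from the keyword is never found among the filename's special characters; skipping digits on the keyword side as well makes the same pair match.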

@deepin-ci-robot

deepin pr auto review

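This code mainly enhances filename search, in particular pinyin search and special-character matching. Detailed review comments follow.

1. Syntax and logic

Strengths:

  • The logic is clear, and each new function has a well-defined responsibility.
  • normalizePinyinKeyword correctly extracts the letters from the string.
  • charsEquivalent correctly handles the equivalence between full-width and half-width characters.
  • matchSpecialChars correctly implements the special-character matching logic.

Suggestions:

  1. In matchSpecialChars, consider handling space characters explicitly:
// In matchSpecialChars, once searchSpecialChars has been collected.
// The second operand is assumed to be the full-width (ideographic) space U+3000.
if (searchSpecialChars.contains(QChar(' ')) || searchSpecialChars.contains(QChar(0x3000))) {
    // Either ignore spaces or handle them specially
}
  2. In charsEquivalent, consider adding equivalences for more special characters, for example:
// The right-hand counterparts are illustrative assumptions
static const QMap<QChar, QChar> specialEquivalents = {
    {QChar('"'), QChar(0x201C)},   // double quote and left curly double quote
    {QChar('\''), QChar(0x2018)},  // single quote and left curly single quote
    {QChar('`'), QChar(0xFF40)},   // backtick and full-width grave accent
    // Add more equivalences here
};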

2. Code quality

Strengths:

  • The code is well structured and function names follow conventions.
  • Detailed unit tests were added and cover a variety of cases.
  • Standard Qt types and functions are used.

Suggestions:

  1. normalizePinyinKeyword could be simplified with Qt's string-handling functions:
QString FileSearchUtils::normalizePinyinKeyword(const QString &str)
{
    // Keep letters only; punctuation, digits, and whitespace are dropped
    QString result;
    for (const QChar &ch : str) {
        if (ch.isLetter()) {
            result.append(ch);
        }
    }
    return result;
}
  2. The loop structure of matchSpecialChars could be optimized:
#include <QFileInfo>
#include <QSet>

bool FileSearchUtils::matchSpecialChars(const QString &fileName, const QString &searchKeyword)
{
    // Collect the special characters in the search keyword
    QSet<QChar> searchSpecialChars;
    for (const QChar &ch : searchKeyword) {
        if (!ch.isLetter() && !ch.isDigit()) {
            searchSpecialChars.insert(ch);
        }
    }

    // A keyword without special characters matches every file
    if (searchSpecialChars.isEmpty()) {
        return true;
    }

    // Collect the special characters in the file's base name
    QSet<QChar> fileSpecialChars;
    QFileInfo fileInfo(fileName);
    QString baseName = fileInfo.completeBaseName();
    for (const QChar &ch : baseName) {
        if (!ch.isLetter() && !ch.isDigit()) {
            fileSpecialChars.insert(ch);
        }
    }

    // Each keyword special character must have an equivalent in the file name
    for (const QChar &searchChar : searchSpecialChars) {
        bool found = false;
        for (const QChar &fileChar : fileSpecialChars) {
            if (charsEquivalent(searchChar, fileChar)) {
                found = true;
                break;
            }
        }
        if (!found) {
            qCDebug(logDaemon) << "Special char mismatch - File:" << fileName
                               << "Search:" << searchKeyword
                               << "Missing char:" << searchChar;
            return false;
        }
    }

    return true;
}

3. Performance

Suggestions:

  1. matchSpecialChars uses nested loops, which may hurt performance on long strings. Consider storing the character equivalences in a hash table:
#include <QHash>

static QHash<QChar, QChar> createCharEquivalentMap()
{
    QHash<QChar, QChar> map;
    // Full-width <-> half-width: U+FF01..U+FF5E sit 0xFEE0 above U+0021..U+007E
    for (ushort u = 0xFF01; u <= 0xFF5E; ++u) {
        map.insert(QChar(u), QChar(u - 0xFEE0));
        map.insert(QChar(u - 0xFEE0), QChar(u));
    }
    return map;
}

static const QHash<QChar, QChar> charEquivalentMap = createCharEquivalentMap();

bool FileSearchUtils::charsEquivalent(const QChar &ch1, const QChar &ch2)
{
    if (ch1 == ch2) {
        return true;
    }

    // Look the key up explicitly so an unmapped ch1 never compares equal to a null ch2
    const auto it = charEquivalentMap.constFind(ch1);
    return it != charEquivalentMap.constEnd() && it.value() == ch2;
}
  2. normalizePinyinKeyword could use QString::remove:
QString FileSearchUtils::normalizePinyinKeyword(const QString &str)
{
    QString result = str;
    // Iterate backwards so removals do not shift the indices still to be visited
    for (int i = result.size() - 1; i >= 0; --i) {
        if (!result.at(i).isLetter()) {
            result.remove(i, 1);
        }
    }
    return result;
}

4. Safety

Suggestions:

  1. matchSpecialChars should check whether the file name is empty:
bool FileSearchUtils::matchSpecialChars(const QString &fileName, const QString &searchKeyword)
{
    if (fileName.isEmpty()) {
        return false;
    }
    // Rest of the code...
}
  2. charsEquivalent should check that the Unicode conversion cannot produce an invalid character:
bool FileSearchUtils::charsEquivalent(const QChar &ch1, const QChar &ch2)
{
    if (ch1 == ch2) {
        return true;
    }

    ushort u1 = ch1.unicode();
    ushort u2 = ch2.unicode();

    if (u1 >= 0xFF01 && u1 <= 0xFF5E) {
        ushort converted = u1 - 0xFEE0;
        if (converted >= 0x0021 && converted <= 0x007E) {  // Check that the converted character is in the valid range
            if (QChar(converted) == ch2) {
                return true;
            }
        }
    }

    if (u2 >= 0xFF01 && u2 <= 0xFF5E) {
        ushort converted = u2 - 0xFEE0;
        if (converted >= 0x0021 && converted <= 0x007E) {  // Check that the converted character is in the valid range
            if (QChar(converted) == ch1) {
                return true;
            }
        }
    }

    return false;
}
  3. normalizePinyinKeyword should account for very large input strings, which could cause memory problems:
QString FileSearchUtils::normalizePinyinKeyword(const QString &str)
{
    if (str.size() > 1000) {  // Set a reasonable upper bound
        // The keyword is rejected, not truncated: an empty string is returned
        qCWarning(logDaemon) << "Keyword too long, returning empty keyword:" << str.size();
        return QString();
    }

    QString result;
    result.reserve(str.size());  // Preallocate memory
    for (const QChar &ch : str) {
        if (ch.isLetter()) {
            result.append(ch);
        }
    }
    return result;
}

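Summary

Overall the code quality is good, and it implements pinyin search and special-character matching. The main directions for improvement are:

  1. Optimize performance, especially in character matching.
  2. Improve safety by adding boundary checks.
  3. Extend functionality to support more special-character equivalences.
  4. Improve the code structure with more efficient algorithms and data structures.

Before merging, thorough performance testing is recommended, especially with large numbers of files and long strings.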

Contributor

@Johnson-zs Johnson-zs left a comment


Searching for nh! most likely returns nothing because the dfm-search library treats the ! as Lucene query syntax when processing the keyword.
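
To illustrate the point about Lucene syntax: in Lucene's classic query parser, `!` is an operator, and special characters are normally escaped with a backslash when they should be matched literally. The sketch below shows that general technique in plain C++. The function name `escapeLuceneSpecials` is hypothetical, and nothing here is taken from dfm-search; how that library actually handles the character is not shown in this conversation.

```cpp
#include <string>

// Prefix every character that the Lucene classic query parser treats as
// syntax with a backslash so that it is searched for literally.
inline std::string escapeLuceneSpecials(const std::string &term) {
    static const std::string specials = "\\+-!():^[]\"{}~*?|&/";
    std::string out;
    out.reserve(term.size() * 2);
    for (char c : term) {
        if (specials.find(c) != std::string::npos) {
            out.push_back('\\');
        }
        out.push_back(c);
    }
    return out;
}
```

With this helper, the keyword `nh!` becomes `nh\!`, while a keyword without special characters is returned unchanged.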

@Johnson-zs
Contributor

resolved by linuxdeepin/util-dfm#252

@Johnson-zs Johnson-zs closed this Jan 26, 2026