Skip to content

Fixed link building#133

Open
patrick-sil wants to merge 3 commits into
useblocks:mainfrom
patrick-sil:main
Open

Fixed link building#133
patrick-sil wants to merge 3 commits into
useblocks:mainfrom
patrick-sil:main

Conversation

@patrick-sil
Copy link
Copy Markdown

This PR is supposed to resolve #123

However, let me add to the description of the issue above that it also applies to internal references inside the PDF, hence, currently :ref: or :numref: are not working for sub-headlines in any case.

The suggested changes here, should fix the issue for both the HTML and the PDF.

What does my solution do:

  1. Introduces a new function which
  2. Does check on whether the link actually contains a #
  3. If so, it does only take the part of the link of the right-hand side of the last, i.e., right most #
  4. Return the now fixed link or the original link (also works for cases where only one # is present in the link, as it basically doen't modify the link in such cases)

If anyone has a better solution, which I assume are possible, as I'm not all that familiar with this project, please suggest them.

It would be nice, if this could be resolved soon, thank you :)

Copy link
Copy Markdown
Collaborator

@patdhlk patdhlk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patrick-sil I left a few comments in the files.

Concept is right, needs another pass. Thanks for the contribution.

And one more thing that can be done in a follow up: Taking the last # fragment is a blunt instrument. The bug is that Sphinx singlehtml emits #document-path#real-anchor. You know the shape of the bad input. It'd be more precise to strip a #document-* prefix rather than blindly taking the rightmost #. Right now if someone has a legitimate # in a fragment identifier (unlikely but not impossible), this silently eats it.

Comment thread sphinx_simplepdf/builders/simplepdf.py Outdated
Comment on lines +172 to +186
"""
Fixes a link that refers to a sub-headline.

For instance the broken link:
#headline#sub-headline#sub-sub-headline
Will be fixed to:
#sub-sub-headline

If the link is not a sub-headline, the link isn't modified

Args:
link_to_fix: The link to fix

Returns: If no modification is needed, link_to_fix otherwise the fixed link
"""
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring is floating above the function — it's a bare string expression, not a docstring. Move it inside. But honestly, the function is three lines; the 14-line comment block is doing more harm than good. If you need a comment, one line is fine.

Comment thread sphinx_simplepdf/builders/simplepdf.py Outdated
pos = link_to_fix.rfind("#")
if pos != -1:
# In case we only find one #, this will still work as it won't affect the link
link_to_fix = link_to_fix [pos::]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: link_to_fix [pos::] — space before bracket, double colon. Should be link_to_fix[pos:]

But more importantly: this is doing too much work to express a simple idea. The whole function is:

def _fix_href(self, href):
      """Strip everything before the last '#'."""
      i = href.rfind("#")
      return href[i:] if i != -1 else href

Or even just inline it — href.rsplit("#", 1)[-1] prefixed with # gets you there in one expression.

The real problem though: you only call this on toc sidebar links. The issue report says body cross-references are also broken (:py:obj:, :ref:). Those aren't in .sphinxsidebarwrapper. So this fixes the table of contents but leaves the actual document links broken if I understand it correctly.

If you're going to fix this, fix it everywhere. Find all <a class="reference internal"> in the whole soup, not just the sidebar.

Also — the name fix_link tells me nothing. Every function in this class "fixes" something. What does this one actually do? _strip_document_prefix or _normalize_fragment would tell the next reader what's happening without reading the body.

And it should have a leading underscore. It's internal.

@patrick-sil
Copy link
Copy Markdown
Author

Concept is right, needs another pass. Thanks for the contribution.

Happy to help :)

And one more thing that can be done in a follow up: Taking the last # fragment is a blunt instrument. The bug is that Sphinx singlehtml emits #document-path#real-anchor. You know the shape of the bad input. It'd be more precise to strip a #document-* prefix rather than blindly taking the rightmost #. Right now if someone has a legitimate # in a fragment identifier (unlikely but not impossible), this silently eats it.

Hmm yes, correct. However, how would one even safely figure out where to split the string?

Refer to the example here based on what the document we are building:
#document-Requirements/Software/Req_15#Req_15
But now assume someone uses # in paths/file-names:
#document-Requirements/Software#1/Req_#15#Req_#15
A simple solution for this problem would be something like

new_link = link.rsplit("/", 1)[-1]
new_link = new_link[int(len(new_link)/2):]

Which only works, if the anchor is the same as the document name.

With e.g.
#document-Chapters/Installation_and_Configuration/Installation_Guide#workstation-clients
That wouldn't work and assuming the worst case e.g.
#document-Chapters/Installation_and_Configuration/Installation_Guide#1#workstation-clients#2
Now I don't even see any trivial solution at all, as we neither can assume the last # is the one we need or that after the last / that there are no other #.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Link fixer fails to process cross-references in document body, causing malformed double-hash hrefs

2 participants