Fix issue to md by Cydonia01 · Pull Request #181 · bouncmpe/bouncmpe.github.io

Cydonia01 · 2025-12-15T20:24:39Z

Issue #144 - Check Image and Content Validity in Issue-to-Md Feature is tried to be solved in this PR.

Implemented image format validation. The supported formats are JPG, JPEG, PNG, GIF, WebP, SVG.
The user is expected to upload an image to the image field. Also, if I understood correctly, the user can upload images to description and content fields, too. Their validation are also implemented.
When the uploaded file is invalid, the comment is posted on the issue.

Below are the issue tests that is conducted on my forked repository:

Test 23: A text file in the Short Description (EN) field. It fails and posts a comment.
Test 24: A text file in the Image field. It fails and posts a comment.
Test 25: In the beginning, an invalid image is uploaded. Then edited (removed txt file link) but still creates a PR without an image.
Test 26: Valid images both in the Image field and Short Description (EN) field.

Below are the Test PRs of the valid Issues.

Test 25 PR: It renders a news entry without an image in the website. It looks like below:

Test 26 PR: It renders correctly. It looks like below (zoomed out):

Test 25 (right) and 26 (left) look like below in the news grid.

Copilot

Pull request overview

This PR implements image format validation for the issue-to-markdown automation workflow to address issue #144. The changes add comprehensive validation to ensure that only valid image files (JPG, JPEG, PNG, GIF, WebP, SVG) are uploaded to the image field and embedded in content fields.

Key Changes:

Added image format validation with support for 6 image formats (JPG, JPEG, PNG, GIF, WebP, SVG)
Implemented validation for both dedicated image fields and images embedded in content/description fields
Added automated error reporting via issue comments when invalid files are detected
Enhanced workflow to handle validation failures gracefully with proper error messaging

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 13 comments.

File	Description
`.github/workflows/issue-to-pr.yml`	Added error handling with `continue-on-error` and conditional PR creation based on validation success
`.github/issue-to-md/scripts/issue_to_md.py`	Implemented image validation functions, refactored main logic into helper functions, and added issue commenting for validation errors

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-16T00:07:07Z

+def _download_and_save_image(url: str, issue_number: int) -> tuple[str, list[str]]:
+    """Download image and save to uploads directory."""
+    errors = []
+    try:
+        dprint("Downloading image:", url)
+        resp = requests.get(url, timeout=20)
+        resp.raise_for_status()
+        ctype = (resp.headers.get("Content-Type") or "").split(";")[0].strip()
+        dprint(f"Content-Type: {ctype}")
+
+        ext = CONTENT_TYPE_TO_EXT.get(ctype) or mimetypes.guess_extension(ctype) or Path(url).suffix or ".png"
+        dprint(f"Determined extension: {ext}")
+
+        if ext.lower() not in ALLOWED_IMAGE_FORMATS:
+            error_msg = (
+                f"Invalid image type. Server returned content type '{ctype}' "
+                f"(extension: '{ext}'). Allowed formats: {', '.join(sorted(ALLOWED_IMAGE_FORMATS))}"
+            )
+            errors.append(error_msg)
+            dprint(error_msg)
+            return "", errors
+
+        fname = f"{issue_number}_{datetime.now().strftime('%Y%m%d%H%M%S')}{ext}"
+        ensure_dirs(UPLOADS_DIR)
+        path = UPLOADS_DIR / fname
+        path.write_bytes(resp.content)
+        rel = f"uploads/{fname}"
+        dprint("Saved image to", path, "→", rel)
+        return rel, []
+    except Exception as e:
+        error_msg = f"Failed to download image from {url}: {str(e)}"
+        errors.append(error_msg)
+        dprint(error_msg)
+        return "", errors


The function _download_and_save_image attempts to download images synchronously with a 20-second timeout, but there's no retry logic or handling for transient network failures. If a valid image URL temporarily fails (e.g., network hiccup, rate limiting), the entire workflow fails. Consider adding retry logic with exponential backoff for more robust handling of temporary failures.

Copilot · 2025-12-16T00:07:08Z

+        ctype = (resp.headers.get("Content-Type") or "").split(";")[0].strip()
+        dprint(f"Content-Type: {ctype}")
+
+        ext = CONTENT_TYPE_TO_EXT.get(ctype) or mimetypes.guess_extension(ctype) or Path(url).suffix or ".png"


The extension determination at line 289 has a potential issue where mimetypes.guess_extension() can return extensions not in ALLOWED_IMAGE_FORMATS. For example, for image/jpeg, it might return .jpe or .jfif instead of .jpg. This would then fail validation at line 292. The code should normalize the guessed extension or add all possible JPEG extensions to ALLOWED_IMAGE_FORMATS. Consider mapping the result of mimetypes.guess_extension() through CONTENT_TYPE_TO_EXT first, or normalizing extensions like .jpe and .jfif to .jpg.

Copilot · 2025-12-16T00:07:08Z

+def find_all_images(text: str) -> list[str]:
+    """
+    Find all image URLs in markdown or HTML text.
+    Returns list of URLs.
+    """
+    if not text:
+        return []
+    urls = []
+    # Find markdown images
+    for match in IMG_MD_RE.finditer(text):
+        urls.append(match.group(1))
+    # Find HTML images
+    for match in IMG_SRC_RE.finditer(text):
+        urls.append(match.group(1))
+    return urls
+


The find_all_images function is defined but never used anywhere in the code. This creates dead code that should either be removed or utilized. If it was intended for future use, consider whether it's needed for the current implementation.

Suggested change

def find_all_images(text: str) -> list[str]:

"""

Find all image URLs in markdown or HTML text.

Returns list of URLs.

"""

if not text:

return []

urls = []

# Find markdown images

for match in IMG_MD_RE.finditer(text):

urls.append(match.group(1))

# Find HTML images

for match in IMG_SRC_RE.finditer(text):

urls.append(match.group(1))

return urls

Copilot · 2025-12-16T00:07:08Z

+    if not text:
+        return []
+    # GitHub upload URLs pattern
+    file_url_pattern = re.compile(r'https?://(?:github\.com/[^/]+/[^/]+/files/\d+/[^)\s]+|github\.com/user-attachments/[^)\s]+)')


The regex pattern in find_uploaded_files has an issue with URL matching. It uses [^)\s]+ which stops at whitespace or closing parenthesis, but the regex is not enclosed in any delimiters. This could incorrectly match partial URLs or fail to match complete URLs. Additionally, URLs in markdown are typically in the format ![alt](url) or [text](url), so stopping at ) could truncate the URL. Consider using a more robust pattern like r'https?://(?:github\.com/[^/]+/[^/]+/files/\d+/[^\s)]+|github\.com/user-attachments/[^\s)]+)' and ensure proper testing with various URL formats.

Suggested change

file_url_pattern = re.compile(r'https?://(?:github\.com/[^/]+/[^/]+/files/\d+/[^)\s]+|github\.com/user-attachments/[^)\s]+)')

file_url_pattern = re.compile(

r'https?://(?:github\.com/[^/]+/[^/]+/files/\d+/[^\s)]+|github\.com/user-attachments/[^\s)]+)'

)

Copilot · 2025-12-16T00:07:08Z

+            errors.append(
+                f"Invalid file type '{ext}' uploaded to image field. "
+                f"Only image files are allowed: {', '.join(sorted(ALLOWED_IMAGE_FORMATS))}"


The error message at line 268 mentions "uploaded to image field" but this function is also called for content fields (short_description_en, short_description_tr, full_content_en, full_content_tr) as seen in _validate_content_fields. This creates a misleading error message. The error message should be more generic or the function should accept a field name parameter to provide accurate context.

Copilot · 2025-12-16T00:07:10Z

+    Returns (is_valid, error_message).
+    """
+    # Extract file extension from URL
+    path = Path(url.split("?")[0])  # Remove query params


The URL parsing in validate_image_format at line 218 uses url.split("?")[0] to remove query parameters, but URLs can also contain fragment identifiers (e.g., #section). If a URL contains a fragment like image.jpg#ref, the fragment will remain in the path and could interfere with extension detection. Consider using urlparse from urllib.parse for more robust URL parsing, or split on both ? and #.

Copilot · 2025-12-16T00:07:10Z

+        dprint(error_msg)
+        return "", errors
+
+def download_image_if_present(markdown_or_html: str, validate_only: bool = False) -> tuple[str, list[str]]:


The validate_only parameter in download_image_if_present is defined but only checked at line 351-352, after all validation steps have already occurred. This means validation always happens regardless of this flag, and the parameter only controls whether downloading occurs. The parameter name is misleading - consider renaming it to skip_download or removing it entirely since it's always passed as False in the single call site (line 498).

Copilot · 2025-12-16T00:07:10Z

+        if ext and ext not in ALLOWED_IMAGE_FORMATS:
+            errors.append(
+                f"Invalid file type '{ext}' uploaded to image field. "
+                f"Only image files are allowed: {', '.join(sorted(ALLOWED_IMAGE_FORMATS))}"
+            )
+            dprint(f"Rejected non-image file: {ext}")


The validation logic in _validate_file_uploads (lines 257-272) and the dedicated image field validation could reject valid image URLs that lack extensions. For instance, many CDN URLs don't have file extensions in the path (e.g., https://example.com/image?id=123). While line 223 returns True for URLs without extensions, the _validate_file_uploads function at line 266 uses if ext which means it will skip files without extensions rather than validating them. This creates an inconsistency where some paths validate extension-less URLs while others skip them. Consider making the behavior consistent across all validation paths.

Suggested change

if ext and ext not in ALLOWED_IMAGE_FORMATS:

errors.append(

f"Invalid file type '{ext}' uploaded to image field. "

f"Only image files are allowed: {', '.join(sorted(ALLOWED_IMAGE_FORMATS))}"

)

dprint(f"Rejected non-image file: {ext}")

if ext:

if ext not in ALLOWED_IMAGE_FORMATS:

errors.append(

f"Invalid file type '{ext}' uploaded to image field. "

f"Only image files are allowed: {', '.join(sorted(ALLOWED_IMAGE_FORMATS))}"

)

dprint(f"Rejected non-image file: {ext}")

Copilot · 2025-12-16T00:07:11Z

+            dprint(error_msg)
+            return "", errors
+
+        fname = f"{issue_number}_{datetime.now().strftime('%Y%m%d%H%M%S')}{ext}"


The filename generation at line 301 uses datetime.now() without timezone information, which could lead to issues in distributed systems or when dealing with issues from different timezones. The file uses IST_TZ (Istanbul timezone) elsewhere in the code. Consider using datetime.now(IST_TZ) for consistency, or use UTC with datetime.now(timezone.utc) for better practices in file naming.

Suggested change

fname = f"{issue_number}_{datetime.now().strftime('%Y%m%d%H%M%S')}{ext}"

fname = f"{issue_number}_{datetime.now(IST_TZ).strftime('%Y%m%d%H%M%S')}{ext}"

gokceuludogan

Looks good overall 👍

I am not sure we should allow images in fields other than the dedicated image field, especially in the short description. My preference would be to support images only in the dedicated image field and, optionally, the full content body. Allowing images elsewhere increases the risk of layout issues and makes validation harder.

Regarding image constraints: since images are rendered in multiple contexts with different aspect ratios (featured/hero view: ~2.1:1, news card view: ~4:3), it would be helpful to validate image dimensions and clearly warn users.

I have already revised the issue template accordingly (see #180), but adding proper validation would be ideal, especially given the related issue in #176.

doganulus · 2025-12-18T16:39:32Z

Having body images or not is not a crucial problem. They can stay. But honestly we don't create news content that requires multiple images. So I want to downplay this full-page news feature more and more.

Cydonia01 · 2025-12-18T21:51:27Z

I added smart image cropping. See Test 34 and Test 35. The resulting merges will look like below in the website:

The description fields had already had an option to upload an image when I started the implementation. So, I only added validation to them in the previous commit.

gokceuludogan

The current status looks sufficient to me.

Having body images or not is not a crucial problem. They can stay. But honestly we don't create news content that requires multiple images. So I want to downplay this full-page news feature more and more.

I am not entirely sure what your longer-term plan is here, but for validating the current functionality, this seems more than adequate. How do you think we should move forward from this point? @doganulus

gokceuludogan · 2025-12-29T19:44:06Z

I see that you have approved the PR, but it hasn't been merged yet, and I am not authorized to do so.
Could you please confirm whether there's anything left to address, or proceed with merging? @doganulus

Cydonia01 added 2 commits December 16, 2025 02:21

add image validation and comment on error

25047da

refactor: fix cognitive complexity and add missing helper function

2764dec

Cydonia01 force-pushed the fix-issue-to-md branch from 97b763c to 2764dec Compare December 15, 2025 23:22

Cydonia01 marked this pull request as ready for review December 15, 2025 23:58

Copilot AI review requested due to automatic review settings December 15, 2025 23:58

Copilot started reviewing on behalf of Cydonia01 December 15, 2025 23:59 View session

Copilot AI reviewed Dec 16, 2025

View reviewed changes

gokceuludogan self-requested a review December 17, 2025 13:26

gokceuludogan requested changes Dec 17, 2025

View reviewed changes

gokceuludogan requested a review from doganulus December 17, 2025 13:37

added smart image cropping

7cf9afe

gokceuludogan approved these changes Dec 25, 2025

View reviewed changes

doganulus approved these changes Dec 28, 2025

View reviewed changes

doganulus merged commit 364a87c into bouncmpe:main Dec 30, 2025

	fname = f"{issue_number}_{datetime.now().strftime('%Y%m%d%H%M%S')}{ext}"
	fname = f"{issue_number}_{datetime.now(IST_TZ).strftime('%Y%m%d%H%M%S')}{ext}"

Conversation

Cydonia01 commented Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

gokceuludogan left a comment

Choose a reason for hiding this comment

Uh oh!

doganulus commented Dec 18, 2025

Uh oh!

Cydonia01 commented Dec 18, 2025

Uh oh!

gokceuludogan left a comment

Choose a reason for hiding this comment

Uh oh!

gokceuludogan commented Dec 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Cydonia01 commented Dec 15, 2025 •

edited

Loading