Skip to content

Fix issue to md#181

Merged
doganulus merged 3 commits into
bouncmpe:mainfrom
Cydonia01:fix-issue-to-md
Dec 30, 2025
Merged

Fix issue to md#181
doganulus merged 3 commits into
bouncmpe:mainfrom
Cydonia01:fix-issue-to-md

Conversation

@Cydonia01

@Cydonia01 Cydonia01 commented Dec 15, 2025

Copy link
Copy Markdown
Contributor

Issue #144 - Check Image and Content Validity in Issue-to-Md Feature is tried to be solved in this PR.

  • Implemented image format validation. The supported formats are JPG, JPEG, PNG, GIF, WebP, SVG.
  • The user is expected to upload an image to the image field. Also, if I understood correctly, the user can upload images to description and content fields, too. Their validation are also implemented.
  • When the uploaded file is invalid, the comment is posted on the issue.

Below are the issue tests that is conducted on my forked repository:

  • Test 23: A text file in the Short Description (EN) field. It fails and posts a comment.
  • Test 24: A text file in the Image field. It fails and posts a comment.
  • Test 25: In the beginning, an invalid image is uploaded. Then edited (removed txt file link) but still creates a PR without an image.
  • Test 26: Valid images both in the Image field and Short Description (EN) field.

Below are the Test PRs of the valid Issues.

  • Test 25 PR: It renders a news entry without an image in the website. It looks like below:
Ekran görüntüsü 2025-12-16 025426
  • Test 26 PR: It renders correctly. It looks like below (zoomed out):
Ekran görüntüsü 2025-12-16 025411

Test 25 (right) and 26 (left) look like below in the news grid.
Ekran görüntüsü 2025-12-16 025349

@Cydonia01 Cydonia01 marked this pull request as ready for review December 15, 2025 23:58
Copilot AI review requested due to automatic review settings December 15, 2025 23:58

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements image format validation for the issue-to-markdown automation workflow to address issue #144. The changes add comprehensive validation to ensure that only valid image files (JPG, JPEG, PNG, GIF, WebP, SVG) are uploaded to the image field and embedded in content fields.

Key Changes:

  • Added image format validation with support for 6 image formats (JPG, JPEG, PNG, GIF, WebP, SVG)
  • Implemented validation for both dedicated image fields and images embedded in content/description fields
  • Added automated error reporting via issue comments when invalid files are detected
  • Enhanced workflow to handle validation failures gracefully with proper error messaging

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 13 comments.

File Description
.github/workflows/issue-to-pr.yml Added error handling with continue-on-error and conditional PR creation based on validation success
.github/issue-to-md/scripts/issue_to_md.py Implemented image validation functions, refactored main logic into helper functions, and added issue commenting for validation errors

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +279 to +312
def _download_and_save_image(url: str, issue_number: int) -> tuple[str, list[str]]:
"""Download image and save to uploads directory."""
errors = []
try:
dprint("Downloading image:", url)
resp = requests.get(url, timeout=20)
resp.raise_for_status()
ctype = (resp.headers.get("Content-Type") or "").split(";")[0].strip()
dprint(f"Content-Type: {ctype}")

ext = CONTENT_TYPE_TO_EXT.get(ctype) or mimetypes.guess_extension(ctype) or Path(url).suffix or ".png"
dprint(f"Determined extension: {ext}")

if ext.lower() not in ALLOWED_IMAGE_FORMATS:
error_msg = (
f"Invalid image type. Server returned content type '{ctype}' "
f"(extension: '{ext}'). Allowed formats: {', '.join(sorted(ALLOWED_IMAGE_FORMATS))}"
)
errors.append(error_msg)
dprint(error_msg)
return "", errors

fname = f"{issue_number}_{datetime.now().strftime('%Y%m%d%H%M%S')}{ext}"
ensure_dirs(UPLOADS_DIR)
path = UPLOADS_DIR / fname
path.write_bytes(resp.content)
rel = f"uploads/{fname}"
dprint("Saved image to", path, "→", rel)
return rel, []
except Exception as e:
error_msg = f"Failed to download image from {url}: {str(e)}"
errors.append(error_msg)
dprint(error_msg)
return "", errors

Copilot AI Dec 16, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function _download_and_save_image attempts to download images synchronously with a 20-second timeout, but there's no retry logic or handling for transient network failures. If a valid image URL temporarily fails (e.g., network hiccup, rate limiting), the entire workflow fails. Consider adding retry logic with exponential backoff for more robust handling of temporary failures.

Copilot uses AI. Check for mistakes.
ctype = (resp.headers.get("Content-Type") or "").split(";")[0].strip()
dprint(f"Content-Type: {ctype}")

ext = CONTENT_TYPE_TO_EXT.get(ctype) or mimetypes.guess_extension(ctype) or Path(url).suffix or ".png"

Copilot AI Dec 16, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extension determination at line 289 has a potential issue where mimetypes.guess_extension() can return extensions not in ALLOWED_IMAGE_FORMATS. For example, for image/jpeg, it might return .jpe or .jfif instead of .jpg. This would then fail validation at line 292. The code should normalize the guessed extension or add all possible JPEG extensions to ALLOWED_IMAGE_FORMATS. Consider mapping the result of mimetypes.guess_extension() through CONTENT_TYPE_TO_EXT first, or normalizing extensions like .jpe and .jfif to .jpg.

Copilot uses AI. Check for mistakes.
Comment on lines +230 to +245
def find_all_images(text: str) -> list[str]:
"""
Find all image URLs in markdown or HTML text.
Returns list of URLs.
"""
if not text:
return []
urls = []
# Find markdown images
for match in IMG_MD_RE.finditer(text):
urls.append(match.group(1))
# Find HTML images
for match in IMG_SRC_RE.finditer(text):
urls.append(match.group(1))
return urls

Copilot AI Dec 16, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The find_all_images function is defined but never used anywhere in the code. This creates dead code that should either be removed or utilized. If it was intended for future use, consider whether it's needed for the current implementation.

Suggested change
def find_all_images(text: str) -> list[str]:
"""
Find all image URLs in markdown or HTML text.
Returns list of URLs.
"""
if not text:
return []
urls = []
# Find markdown images
for match in IMG_MD_RE.finditer(text):
urls.append(match.group(1))
# Find HTML images
for match in IMG_SRC_RE.finditer(text):
urls.append(match.group(1))
return urls

Copilot uses AI. Check for mistakes.
if not text:
return []
# GitHub upload URLs pattern
file_url_pattern = re.compile(r'https?://(?:github\.com/[^/]+/[^/]+/files/\d+/[^)\s]+|github\.com/user-attachments/[^)\s]+)')

Copilot AI Dec 16, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex pattern in find_uploaded_files has an issue with URL matching. It uses [^)\s]+ which stops at whitespace or closing parenthesis, but the regex is not enclosed in any delimiters. This could incorrectly match partial URLs or fail to match complete URLs. Additionally, URLs in markdown are typically in the format ![alt](url) or [text](url), so stopping at ) could truncate the URL. Consider using a more robust pattern like r'https?://(?:github\.com/[^/]+/[^/]+/files/\d+/[^\s)]+|github\.com/user-attachments/[^\s)]+)' and ensure proper testing with various URL formats.

Suggested change
file_url_pattern = re.compile(r'https?://(?:github\.com/[^/]+/[^/]+/files/\d+/[^)\s]+|github\.com/user-attachments/[^)\s]+)')
file_url_pattern = re.compile(
r'https?://(?:github\.com/[^/]+/[^/]+/files/\d+/[^\s)]+|github\.com/user-attachments/[^\s)]+)'
)

Copilot uses AI. Check for mistakes.
Comment on lines +267 to +269
errors.append(
f"Invalid file type '{ext}' uploaded to image field. "
f"Only image files are allowed: {', '.join(sorted(ALLOWED_IMAGE_FORMATS))}"

Copilot AI Dec 16, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message at line 268 mentions "uploaded to image field" but this function is also called for content fields (short_description_en, short_description_tr, full_content_en, full_content_tr) as seen in _validate_content_fields. This creates a misleading error message. The error message should be more generic or the function should accept a field name parameter to provide accurate context.

Copilot uses AI. Check for mistakes.
Returns (is_valid, error_message).
"""
# Extract file extension from URL
path = Path(url.split("?")[0]) # Remove query params

Copilot AI Dec 16, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL parsing in validate_image_format at line 218 uses url.split("?")[0] to remove query parameters, but URLs can also contain fragment identifiers (e.g., #section). If a URL contains a fragment like image.jpg#ref, the fragment will remain in the path and could interfere with extension detection. Consider using urlparse from urllib.parse for more robust URL parsing, or split on both ? and #.

Copilot uses AI. Check for mistakes.
Comment thread .github/issue-to-md/scripts/issue_to_md.py
dprint(error_msg)
return "", errors

def download_image_if_present(markdown_or_html: str, validate_only: bool = False) -> tuple[str, list[str]]:

Copilot AI Dec 16, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validate_only parameter in download_image_if_present is defined but only checked at line 351-352, after all validation steps have already occurred. This means validation always happens regardless of this flag, and the parameter only controls whether downloading occurs. The parameter name is misleading - consider renaming it to skip_download or removing it entirely since it's always passed as False in the single call site (line 498).

Copilot uses AI. Check for mistakes.
Comment on lines +266 to +271
if ext and ext not in ALLOWED_IMAGE_FORMATS:
errors.append(
f"Invalid file type '{ext}' uploaded to image field. "
f"Only image files are allowed: {', '.join(sorted(ALLOWED_IMAGE_FORMATS))}"
)
dprint(f"Rejected non-image file: {ext}")

Copilot AI Dec 16, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation logic in _validate_file_uploads (lines 257-272) and the dedicated image field validation could reject valid image URLs that lack extensions. For instance, many CDN URLs don't have file extensions in the path (e.g., https://example.com/image?id=123). While line 223 returns True for URLs without extensions, the _validate_file_uploads function at line 266 uses if ext which means it will skip files without extensions rather than validating them. This creates an inconsistency where some paths validate extension-less URLs while others skip them. Consider making the behavior consistent across all validation paths.

Suggested change
if ext and ext not in ALLOWED_IMAGE_FORMATS:
errors.append(
f"Invalid file type '{ext}' uploaded to image field. "
f"Only image files are allowed: {', '.join(sorted(ALLOWED_IMAGE_FORMATS))}"
)
dprint(f"Rejected non-image file: {ext}")
if ext:
if ext not in ALLOWED_IMAGE_FORMATS:
errors.append(
f"Invalid file type '{ext}' uploaded to image field. "
f"Only image files are allowed: {', '.join(sorted(ALLOWED_IMAGE_FORMATS))}"
)
dprint(f"Rejected non-image file: {ext}")

Copilot uses AI. Check for mistakes.
dprint(error_msg)
return "", errors

fname = f"{issue_number}_{datetime.now().strftime('%Y%m%d%H%M%S')}{ext}"

Copilot AI Dec 16, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filename generation at line 301 uses datetime.now() without timezone information, which could lead to issues in distributed systems or when dealing with issues from different timezones. The file uses IST_TZ (Istanbul timezone) elsewhere in the code. Consider using datetime.now(IST_TZ) for consistency, or use UTC with datetime.now(timezone.utc) for better practices in file naming.

Suggested change
fname = f"{issue_number}_{datetime.now().strftime('%Y%m%d%H%M%S')}{ext}"
fname = f"{issue_number}_{datetime.now(IST_TZ).strftime('%Y%m%d%H%M%S')}{ext}"

Copilot uses AI. Check for mistakes.
@gokceuludogan gokceuludogan self-requested a review December 17, 2025 13:26

@gokceuludogan gokceuludogan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall 👍

I am not sure we should allow images in fields other than the dedicated image field, especially in the short description. My preference would be to support images only in the dedicated image field and, optionally, the full content body. Allowing images elsewhere increases the risk of layout issues and makes validation harder.

Regarding image constraints: since images are rendered in multiple contexts with different aspect ratios (featured/hero view: ~2.1:1, news card view: ~4:3), it would be helpful to validate image dimensions and clearly warn users.

I have already revised the issue template accordingly (see #180), but adding proper validation would be ideal, especially given the related issue in #176.

@doganulus

Copy link
Copy Markdown
Member

Having body images or not is not a crucial problem. They can stay. But honestly we don't create news content that requires multiple images. So I want to downplay this full-page news feature more and more.

@Cydonia01

Copy link
Copy Markdown
Contributor Author

I added smart image cropping. See Test 34 and Test 35. The resulting merges will look like below in the website:

Ekran görüntüsü 2025-12-19 002045 Ekran görüntüsü 2025-12-19 002048 Ekran görüntüsü 2025-12-19 002053

The description fields had already had an option to upload an image when I started the implementation. So, I only added validation to them in the previous commit.

@gokceuludogan gokceuludogan left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current status looks sufficient to me.

Having body images or not is not a crucial problem. They can stay. But honestly we don't create news content that requires multiple images. So I want to downplay this full-page news feature more and more.

I am not entirely sure what your longer-term plan is here, but for validating the current functionality, this seems more than adequate. How do you think we should move forward from this point? @doganulus

@gokceuludogan

Copy link
Copy Markdown
Contributor

I see that you have approved the PR, but it hasn't been merged yet, and I am not authorized to do so.
Could you please confirm whether there's anything left to address, or proceed with merging? @doganulus

@doganulus doganulus merged commit 364a87c into bouncmpe:main Dec 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants