80 commits
21f6c96
Revert "Revert "Add Dockerfile parser based on ANTLR (#6203)""
timtebeek Jan 9, 2026
9179a8e
Rename classes and methods as suggested
timtebeek Jan 9, 2026
7a01663
Further renames
timtebeek Jan 9, 2026
68f085a
Regenerate ANTLR sources
timtebeek Jan 9, 2026
19e5d6a
Add a small utility to parse dockerfiles in bulk
timtebeek Jan 10, 2026
d8d4320
Add a search recipe for base images
timtebeek Jan 10, 2026
745dcc5
Add search recipe for exposed ports
timtebeek Jan 10, 2026
9575f84
Add recipe to add or update Label
timtebeek Jan 10, 2026
73f7d7d
Add search recipes with a security focus
timtebeek Jan 10, 2026
618fa2d
Add better search result messages to other find recipes
timtebeek Jan 10, 2026
501d661
Fix JSON array whitespace preservation in exec form
timtebeek Jan 10, 2026
9157c64
Fix LABEL old format and VOLUME/SHELL JSON array whitespace
timtebeek Jan 10, 2026
3d62966
Add ENV_VAR support to flag values in grammar
timtebeek Jan 10, 2026
c9330e7
Fix flag value ENV_VAR parsing and organize tests
timtebeek Jan 10, 2026
956d5c8
Fix heredoc lexer rule ordering and add defensive checks
timtebeek Jan 10, 2026
8989b58
Add JSON array (exec form) support for COPY and ADD instructions
timtebeek Jan 10, 2026
cba861c
Reorder data table columns to put stageName second
timtebeek Jan 10, 2026
e7b893e
Add keyword support in shell form for --shell, --user flags
timtebeek Jan 10, 2026
5624b34
Add ENV_VAR support to VOLUME and EXPOSE instructions
timtebeek Jan 10, 2026
da8fa0d
Fix heredoc with interpreter parsing (e.g., RUN <<EOT bash)
timtebeek Jan 10, 2026
b9e1cd8
Minimize the LocalDockerParser
timtebeek Jan 10, 2026
562ff16
Remove JSON lexer mode to allow [ ] in shell commands
timtebeek Jan 10, 2026
748d34f
Allow MAINTAINER keyword as LABEL key
timtebeek Jan 10, 2026
141504d
Allow any characters in single-quoted strings
timtebeek Jan 10, 2026
4fcd6da
Allow instruction keywords as LABEL keys in equals format
timtebeek Jan 10, 2026
95ea6de
Add command substitution $() and backtick support
timtebeek Jan 10, 2026
5aa63b6
Add support for special shell variables ($!, $$, $?, etc.)
timtebeek Jan 10, 2026
905fe88
Use the `ForwardingErrorListener` as intended
timtebeek Jan 10, 2026
6c9d9fa
Add Windows escape directive and backtick line continuation support
timtebeek Jan 10, 2026
55f9665
Update headers after regeneration
timtebeek Jan 10, 2026
5f9b88f
Allow line continuation inside single quoted strings
timtebeek Jan 10, 2026
2005c95
Also allow keywords like `add` and `run` in shell
timtebeek Jan 10, 2026
3e57394
Track new lines to avoid accidental consumption
timtebeek Jan 10, 2026
ef8bd49
Change how LocalDockerParserTest verifies results
timtebeek Jan 10, 2026
9f10dab
Handle `flagValueToken` with `=`
timtebeek Jan 10, 2026
d4d15f9
Change how LocalDockerParserTest formats output
timtebeek Jan 10, 2026
e124f27
Add structured Port type for EXPOSE instruction
timtebeek Jan 10, 2026
fe709ba
Drop `DockerExposedPorts.Row.range`
timtebeek Jan 10, 2026
0d5a9ab
Find docker base images that use global args
timtebeek Jan 10, 2026
231a90b
Create a `Docker.Literal` with optional quotes
timtebeek Jan 10, 2026
66fa913
Rename method for consistency
timtebeek Jan 10, 2026
a4b55be
Change the type of Docker.Arg.name to Docker.Literal
timtebeek Jan 10, 2026
cd6d4af
Change the type of Docker.Env.EnvPair.key to Docker.Literal
timtebeek Jan 10, 2026
3580249
Change the type of Docker.From.As.name to Docker.Literal
timtebeek Jan 10, 2026
ee22dc4
Limit what files are picked up as docker files
timtebeek Jan 10, 2026
7b8ff8a
Use `List<Literal> arguments` in ShellForm and ExecForm
timtebeek Jan 10, 2026
5ee9e39
Remove unused imports
timtebeek Jan 10, 2026
9bccc60
Show we can parse `RUN if [ -d /app ]; then \`
timtebeek Jan 10, 2026
3c1aad4
Remove the intermediate `CommandLine`: Use `CommandForm` directly
timtebeek Jan 10, 2026
e6e051c
Verify single line conditional as well
timtebeek Jan 10, 2026
986568a
Change ShellForm.arguments into singular Literal argument
timtebeek Jan 10, 2026
66d5b5a
No need for an `isNone` field
timtebeek Jan 10, 2026
0d1c484
Add tests for healthcheck with flags and continuation
timtebeek Jan 10, 2026
5e572e3
Ensure we capture CMD after HEALTHCHECK flags
timtebeek Jan 10, 2026
235f2b8
Verify source and destination after COPY flags
timtebeek Jan 10, 2026
e59645e
Apply best practices
timtebeek Jan 10, 2026
20effac
Polish AddOrUpdateLabel already
timtebeek Jan 10, 2026
80502e2
Separate visitors for the update and add concerns
timtebeek Jan 11, 2026
59a9f62
Rework how healthcheck options are parsed
timtebeek Jan 11, 2026
63eda0f
AS is only a keyword after FROM (for stage aliasing)
timtebeek Jan 11, 2026
9d62cf1
Add failing tests based on feedback
timtebeek Jan 12, 2026
d25b135
Fix most issues already; disable multiple heredocs test
timtebeek Jan 12, 2026
12fd573
Remove unused argument
timtebeek Jan 12, 2026
af1bd91
Rework how flags are parsed
timtebeek Jan 12, 2026
5c9fb85
Merge branch 'main' into dockerfile-parser
timtebeek Jan 12, 2026
42ad359
Comment lines need not use a continuation char
timtebeek Jan 12, 2026
71c072f
Validate that there's no non-whitespace characters in Space
timtebeek Jan 12, 2026
90d2ac0
Parse and print the commas separately in ExecForm.arguments
timtebeek Jan 12, 2026
8d8280c
For now `allowNonWhitespaceInWhitespace` with continuation chars
timtebeek Jan 12, 2026
a0e6bcb
Tolerate one more failure for now
timtebeek Jan 12, 2026
d336300
Comments in quoted strings do not require continuation
timtebeek Jan 12, 2026
bf4fef3
Add support for multiple heredocs
timtebeek Jan 13, 2026
79f9812
Consolidate heredoc mapping and handling
timtebeek Jan 13, 2026
d6ff40b
Apply best practice of not assigning before return
timtebeek Jan 13, 2026
edb8591
Add tests for the mapping in Copy for execForm
timtebeek Jan 13, 2026
2c857d2
Introduce `CopyShellForm` for ADD/COPY
timtebeek Jan 13, 2026
0b967c0
Only use a single field in ADD and COPY
timtebeek Jan 13, 2026
9bd1e0b
Minor tweaks to parsing visitor and printer
timtebeek Jan 13, 2026
45ff424
Inline assignment before return
timtebeek Jan 13, 2026
57828d3
Add recipes.csv and category.yml
timtebeek Jan 13, 2026
1 change: 1 addition & 0 deletions IDE.properties.tmp
@@ -27,6 +27,7 @@ rewrite-java-25

# Other language modules

rewrite-docker
rewrite-gradle
rewrite-groovy
rewrite-hcl
35 changes: 35 additions & 0 deletions rewrite-docker/build.gradle.kts
@@ -0,0 +1,35 @@
plugins {
    id("org.openrewrite.build.language-library")
}

val antlrGeneration by configurations.creating {
    extendsFrom(configurations.implementation.get())
}

tasks.register<JavaExec>("generateAntlrSources") {
    mainClass.set("org.antlr.v4.Tool")

    args = listOf(
        "-o", "src/main/java/org/openrewrite/docker/internal/grammar",
        "-package", "org.openrewrite.docker.internal.grammar",
        "-visitor"
    ) + fileTree("src/main/antlr").matching { include("**/*.g4") }.map { it.path }

    classpath = antlrGeneration

    finalizedBy("licenseFormat")
}

dependencies {
    implementation(project(":rewrite-core"))
    implementation("org.antlr:antlr4-runtime:4.13.2")
    implementation("io.micrometer:micrometer-core:1.9.+")

    antlrGeneration("org.antlr:antlr4:4.13.2") {
        exclude(group = "com.ibm.icu", module = "icu4j")
    }

    compileOnly(project(":rewrite-test"))

    testImplementation(project(":rewrite-test"))
}
218 changes: 218 additions & 0 deletions rewrite-docker/src/main/antlr/DockerLexer.g4
@@ -0,0 +1,218 @@
// $antlr-format alignTrailingComments true, columnLimit 150, maxEmptyLinesToKeep 1, reflowComments false, useTab false
// $antlr-format allowShortRulesOnASingleLine true, allowShortBlocksOnASingleLine true, minEmptyLines 0, alignSemicolons ownLine
// $antlr-format alignColons trailing, singleLineOverrulesHangingColon true, alignLexerCommands true, alignLabels true, alignTrailers true

lexer grammar DockerLexer;

@lexer::header
{import java.util.LinkedList;
import java.util.Queue;}

@lexer::members
{
    // Use a queue (FIFO) for heredoc markers so they are matched in order of declaration
    private Queue<String> heredocIdentifiers = new LinkedList<String>();
    private boolean heredocIdentifierCaptured = false;
    // Track if we're at the start of a logical line (where instructions can appear)
    private boolean atLineStart = true;
    // Track if we're after FROM to recognize AS as a keyword (for stage aliasing)
    private boolean afterFrom = false;
    // Track if we're after HEALTHCHECK to recognize CMD/NONE as keywords
    private boolean afterHealthcheck = false;
}

options {
    caseInsensitive = true;
}

// Parser directives (must be at the beginning of file)
// After a parser directive, we're at line start (it consumes the newline)
PARSER_DIRECTIVE : '#' WS_CHAR* [A-Z_]+ WS_CHAR* '=' WS_CHAR* ~[\r\n]* NEWLINE_CHAR { atLineStart = true; };
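A minimal plain-Java sketch of how the directives this rule captures could be read from the top of a file to pick the escape character. `ParserDirectives`, `read` and `escapeChar` are hypothetical helpers, not part of rewrite-docker. The stop-at-first-non-directive loop follows Docker's documented behaviour that directives are only honoured before the first comment, blank line, or instruction; the lexer rule itself does not check position.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper mirroring PARSER_DIRECTIVE; not part of rewrite-docker.
final class ParserDirectives {
    // Same shape as the lexer rule: '#', optional blanks, name, '=', value up to end of line
    private static final Pattern DIRECTIVE =
            Pattern.compile("^#[ \\t]*([A-Za-z_]+)[ \\t]*=[ \\t]*(.*)$");

    private ParserDirectives() {
    }

    // Collects directives from the top of the file, stopping at the first line that is not one
    static Map<String, String> read(String dockerfile) {
        Map<String, String> directives = new LinkedHashMap<>();
        for (String line : dockerfile.split("\\r?\\n")) {
            Matcher m = DIRECTIVE.matcher(line);
            if (!m.matches()) {
                break; // a comment, blank line or instruction ends the directive block
            }
            directives.put(m.group(1).toLowerCase(Locale.ROOT), m.group(2).trim());
        }
        return directives;
    }

    // Backslash by default, backtick when the file starts with "# escape=`"
    static char escapeChar(String dockerfile) {
        return "`".equals(read(dockerfile).get("escape")) ? '`' : '\\';
    }
}
```

A directive that appears after the first instruction is ignored by this sketch, as it is by Docker.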

// Comments (after parser directives) - HIDDEN in main mode
COMMENT : '#' ~[\r\n]* -> channel(HIDDEN);

// Instructions (case-insensitive)
// Instructions are only recognized at line start. Otherwise they become UNQUOTED_TEXT.
// This eliminates ambiguity between instruction keywords and shell command text.
FROM : 'FROM' { if (!atLineStart) setType(UNQUOTED_TEXT); else afterFrom = true; atLineStart = false; };
RUN : 'RUN' { if (!atLineStart) setType(UNQUOTED_TEXT); atLineStart = false; };
// CMD is a keyword at line start (CMD instruction) or after HEALTHCHECK
CMD : 'CMD' { if (!atLineStart && !afterHealthcheck) setType(UNQUOTED_TEXT); atLineStart = false; afterHealthcheck = false; };
// NONE is only a keyword after HEALTHCHECK
NONE : 'NONE' { if (!afterHealthcheck) setType(UNQUOTED_TEXT); atLineStart = false; afterHealthcheck = false; };
LABEL : 'LABEL' { if (!atLineStart) setType(UNQUOTED_TEXT); atLineStart = false; };
EXPOSE : 'EXPOSE' { if (!atLineStart) setType(UNQUOTED_TEXT); atLineStart = false; };
ENV : 'ENV' { if (!atLineStart) setType(UNQUOTED_TEXT); atLineStart = false; };
ADD : 'ADD' { if (!atLineStart) setType(UNQUOTED_TEXT); atLineStart = false; };
COPY : 'COPY' { if (!atLineStart) setType(UNQUOTED_TEXT); atLineStart = false; };
ENTRYPOINT : 'ENTRYPOINT' { if (!atLineStart) setType(UNQUOTED_TEXT); atLineStart = false; };
VOLUME : 'VOLUME' { if (!atLineStart) setType(UNQUOTED_TEXT); atLineStart = false; };
USER : 'USER' { if (!atLineStart) setType(UNQUOTED_TEXT); atLineStart = false; };
WORKDIR : 'WORKDIR' { if (!atLineStart) setType(UNQUOTED_TEXT); atLineStart = false; };
ARG : 'ARG' { if (!atLineStart) setType(UNQUOTED_TEXT); atLineStart = false; };
// ONBUILD is special: it keeps atLineStart true so the following instruction is recognized
ONBUILD : 'ONBUILD' { if (!atLineStart) setType(UNQUOTED_TEXT); /* atLineStart stays true */ };
STOPSIGNAL : 'STOPSIGNAL' { if (!atLineStart) setType(UNQUOTED_TEXT); atLineStart = false; };
// HEALTHCHECK is special: it keeps atLineStart true and sets afterHealthcheck so CMD/NONE are recognized after flags
HEALTHCHECK : 'HEALTHCHECK' { if (!atLineStart) setType(UNQUOTED_TEXT); else afterHealthcheck = true; /* atLineStart stays true */ };
SHELL : 'SHELL' { if (!atLineStart) setType(UNQUOTED_TEXT); atLineStart = false; };
MAINTAINER : 'MAINTAINER' { if (!atLineStart) setType(UNQUOTED_TEXT); atLineStart = false; };

// AS is only a keyword after FROM (for stage aliasing)
AS : 'AS' { if (!afterFrom) setType(UNQUOTED_TEXT); atLineStart = false; afterFrom = false; };
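To make the interplay of `atLineStart`, `afterFrom` and `afterHealthcheck` easier to follow, here is a simplified, hypothetical simulation (`KeywordClassifier` is not in the PR). It works on whitespace-separated words of one logical line rather than on real tokens, and labels each word `KEYWORD` or `OTHER` by applying the same flag updates as the actions above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical simulation of the lexer's keyword flags; not part of rewrite-docker.
final class KeywordClassifier {
    // Keywords recognized only at line start (CMD, NONE and AS are handled separately)
    private static final Set<String> INSTRUCTIONS = Set.of(
            "FROM", "RUN", "LABEL", "EXPOSE", "ENV", "ADD", "COPY", "ENTRYPOINT", "VOLUME", "USER",
            "WORKDIR", "ARG", "ONBUILD", "STOPSIGNAL", "HEALTHCHECK", "SHELL", "MAINTAINER");

    private KeywordClassifier() {
    }

    static List<String> classify(String logicalLine) {
        boolean atLineStart = true;
        boolean afterFrom = false;
        boolean afterHealthcheck = false;
        List<String> kinds = new ArrayList<>();
        for (String word : logicalLine.trim().split("\\s+")) {
            String w = word.toUpperCase(Locale.ROOT);
            boolean keyword;
            if (w.equals("AS")) {
                keyword = afterFrom; // AS rule: keyword only after FROM
                atLineStart = false;
                afterFrom = false;
            } else if (w.equals("CMD") || w.equals("NONE")) {
                // CMD rule: line start or after HEALTHCHECK; NONE rule: only after HEALTHCHECK
                keyword = afterHealthcheck || (w.equals("CMD") && atLineStart);
                atLineStart = false;
                afterHealthcheck = false;
            } else if (INSTRUCTIONS.contains(w)) {
                keyword = atLineStart;
                if (keyword && w.equals("FROM")) {
                    afterFrom = true;
                }
                if (keyword && w.equals("HEALTHCHECK")) {
                    afterHealthcheck = true;
                }
                // ONBUILD and HEALTHCHECK leave atLineStart untouched so a keyword can follow
                if (!w.equals("ONBUILD") && !w.equals("HEALTHCHECK")) {
                    atLineStart = false;
                }
            } else {
                keyword = false;
                // UNQUOTED_TEXT / FLAG actions: only clear atLineStart when not after HEALTHCHECK
                if (!afterHealthcheck) {
                    atLineStart = false;
                }
            }
            kinds.add(keyword ? "KEYWORD" : "OTHER");
        }
        return kinds;
    }
}
```

For example, `FROM ubuntu AS build` yields `KEYWORD, OTHER, KEYWORD, OTHER`, while the `as` and `from` in `RUN echo as from` are both `OTHER`.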

// Heredoc start - captures <<EOF or <<-EOF including the identifier and switches to HEREDOC_PREAMBLE mode
HEREDOC_START : '<<' '-'? [A-Z_][A-Z0-9_]* {
    // Extract and store the heredoc marker identifier in FIFO order
    String text = getText();
    int prefixLen = text.charAt(2) == '-' ? 3 : 2;
    String marker = text.substring(prefixLen);
    heredocIdentifiers.add(marker);
    heredocIdentifierCaptured = true;
    atLineStart = false;
} -> pushMode(HEREDOC_PREAMBLE);
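The marker extraction in the action above can be exercised outside ANTLR. This is a hypothetical sketch (`HeredocMarkers` is not part of rewrite-docker): `markerOf` repeats the action's prefix arithmetic, and `markersIn` is an assumed regex-based stand-in for how `HEREDOC_START` and `HP_HEREDOC_START` together queue every marker on one line in declaration order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers mirroring HEREDOC_START / HP_HEREDOC_START; not part of rewrite-docker.
final class HeredocMarkers {
    // '<<' '-'? [A-Z_][A-Z0-9_]*, case-insensitive like the lexer
    private static final Pattern START =
            Pattern.compile("<<-?([A-Z_][A-Z0-9_]*)", Pattern.CASE_INSENSITIVE);

    private HeredocMarkers() {
    }

    // Same arithmetic as the lexer action: skip "<<-" (3 chars) or "<<" (2 chars)
    static String markerOf(String token) {
        int prefixLen = token.charAt(2) == '-' ? 3 : 2;
        return token.substring(prefixLen);
    }

    // All markers declared on one line, in the FIFO order the lexer queues them
    static List<String> markersIn(String line) {
        List<String> markers = new ArrayList<>();
        Matcher m = START.matcher(line);
        while (m.find()) {
            markers.add(markerOf(m.group()));
        }
        return markers;
    }
}
```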

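// Line continuation - HIDDEN in main mode
// Supports both backslash (Linux) and backtick (Windows with # escape=`)
// A CRLF pair is consumed as a unit so its '\n' is not lexed as a separate NEWLINE, which would reset atLineStart
LINE_CONTINUATION : ('\\' | '`') [ \t]* ('\r'? '\n' | '\r') -> channel(HIDDEN);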
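As a rough illustration of what this hidden token achieves, the hypothetical sketch below (`Continuations` is not part of rewrite-docker) joins physical lines that end in the escape character, optionally followed by blanks, into logical lines. Unlike the lexer rule, which accepts both `\` and `` ` `` unconditionally, it honours only the configured escape character, as Docker does; it also does not skip comment lines inside a continued instruction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of line-continuation joining; not part of rewrite-docker.
final class Continuations {
    private Continuations() {
    }

    // Joins physical lines ending in `escape` (optionally followed by blanks) into logical lines
    static List<String> logicalLines(String text, char escape) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String physical : text.split("\\r?\\n")) {
            // Mirror [ \t]* between the escape character and the newline
            String trimmed = physical.replaceAll("[ \\t]+$", "");
            if (!trimmed.isEmpty() && trimmed.charAt(trimmed.length() - 1) == escape) {
                current.append(trimmed, 0, trimmed.length() - 1); // drop the escape character
            } else {
                current.append(physical);
                result.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            result.add(current.toString()); // input ended in the middle of a continuation
        }
        return result;
    }
}
```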

// JSON array delimiters (for exec form) - no mode switching, handled in parser
LBRACKET : '[' { atLineStart = false; };
RBRACKET : ']' { atLineStart = false; };
COMMA : ',' { atLineStart = false; };

// Assignment (used in ENV, ARG, LABEL, etc.)
EQUALS : '=' { if (!afterHealthcheck) atLineStart = false; };

// Flag with optional value: --name or --name=value
// Captures the entire flag as a single token, stopping at whitespace
// This avoids the greedy flagValue+ parsing issue while keeping shell commands working
FLAG : '--' [a-zA-Z] [a-zA-Z0-9_-]* ('=' ~[ \t\r\n]+)? { if (!afterHealthcheck) atLineStart = false; };

// Standalone -- (double dash without flag name) - used in shell commands
DASH_DASH : '--' { if (!afterHealthcheck) atLineStart = false; };
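The split between `FLAG` and `DASH_DASH` relies on longest match: `--` alone cannot satisfy `FLAG`, which needs a letter after the dashes. A hypothetical helper (`Flags` is not part of rewrite-docker) that applies the same pattern and separates the name from the optional value might look like this:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of rewrite-docker.
final class Flags {
    // Mirrors FLAG: '--' [a-zA-Z] [a-zA-Z0-9_-]* ('=' ~[ \t\r\n]+)?
    private static final Pattern FLAG =
            Pattern.compile("--([A-Za-z][A-Za-z0-9_-]*)(?:=([^ \\t\\r\\n]+))?");

    static final class Flag {
        final String name;
        final Optional<String> value;

        Flag(String name, Optional<String> value) {
            this.name = name;
            this.value = value;
        }
    }

    private Flags() {
    }

    // Empty for "--" (a DASH_DASH token) or anything else that is not a FLAG token
    static Optional<Flag> parse(String token) {
        Matcher m = FLAG.matcher(token);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new Flag(m.group(1), Optional.ofNullable(m.group(2))));
    }
}
```

Because the value runs to the next whitespace, a compound value such as `--mount=type=cache,target=/root/.cache` stays a single flag.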

// A single character allowed inside UNQUOTED_TEXT after its first character.
// The restrictions on how UNQUOTED_TEXT may start (not with -- or <<) are in UNQUOTED_TEXT itself;
// < is excluded here as well so that HEREDOC_START (<<) can match mid-word
fragment UNQUOTED_CHAR : ~[ \t\r\n\\"'$[\]=<];
fragment ESCAPED_CHAR : '\\' .;

// String literals
// Double-quoted strings support escape sequences, line continuation, and bare newlines
// Backtick followed by whitespace+newline is continuation; standalone backtick is regular char
// Bare newlines are allowed (e.g., comment lines inside PowerShell strings don't need trailing backtick)
DOUBLE_QUOTED_STRING : '"' ( ESCAPE_SEQUENCE | INLINE_CONTINUATION | '`' | [\r\n] | ~["\\\r\n`] )* '"' { if (!afterHealthcheck) atLineStart = false; };
// Single-quoted strings in shell are literal - no escape processing inside
// But they DO support line continuation (backslash or backtick followed by newline)
// Bare newlines are also allowed for multi-line strings
SINGLE_QUOTED_STRING : '\'' ( INLINE_CONTINUATION | [\r\n] | ~['\r\n] )* '\'' { if (!afterHealthcheck) atLineStart = false; };

// Inline line continuation (inside strings) - backtick or backslash followed by newline
fragment INLINE_CONTINUATION : ('\\' | '`') [ \t]* [\r\n]+;

fragment ESCAPE_SEQUENCE
    : '\\' ~[\r\n] // Backslash followed by any char except newline (includes \n, \t, \\, \", Windows paths like \P)
    ;

fragment HEX_DIGIT : [0-9A-F];

// Environment variable reference
ENV_VAR : ('$' '{' [A-Z_][A-Z0-9_]* ( ':-' | ':+' | ':' )? ~[}]* '}' | '$' [A-Z_][A-Z0-9_]*) { atLineStart = false; };

// Special shell variables ($!, $$, $?, $#, $@, $*, $0-$9)
SPECIAL_VAR : '$' [!$?#@*0-9] { atLineStart = false; };
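For a quick check of which references these two rules pick up, the hypothetical sketch below (`VariableRefs` is not part of rewrite-docker) combines the `ENV_VAR` alternatives and `SPECIAL_VAR` into one Java regex and lists matches in order. A `$(` command substitution matches neither pattern, which is why it gets a rule of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of rewrite-docker.
final class VariableRefs {
    // ENV_VAR (${NAME...} or $NAME), then SPECIAL_VAR ($!, $$, $?, $#, $@, $*, $0-$9)
    private static final Pattern REFERENCE = Pattern.compile(
            "\\$\\{[A-Za-z_][A-Za-z0-9_]*(?::-|:\\+|:)?[^}]*\\}"
                    + "|\\$[A-Za-z_][A-Za-z0-9_]*"
                    + "|\\$[!$?#@*0-9]");

    private VariableRefs() {
    }

    // Every variable reference in the text, in order of appearance
    static List<String> references(String text) {
        List<String> refs = new ArrayList<>();
        Matcher m = REFERENCE.matcher(text);
        while (m.find()) {
            refs.add(m.group());
        }
        return refs;
    }
}
```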

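// Command substitution $(command) or $((arithmetic))
// Nesting is handled by recursion: every '(' must be closed by a matching ')', at any depth
// (quotes are not treated specially, so a ')' inside a quoted string still closes a level)
COMMAND_SUBST : '$(' COMMAND_SUBST_INNER* ')' { atLineStart = false; };
fragment COMMAND_SUBST_INNER : COMMAND_SUBST | '(' COMMAND_SUBST_INNER* ')' | ~[()];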
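The nesting that `COMMAND_SUBST` is meant to handle can be checked imperatively with a depth counter. This hypothetical sketch (`CommandSubstitution` is not part of rewrite-docker) finds where a `$(` closes; like the lexer rule, it does not treat quotes specially.

```java
// Hypothetical helper; not part of rewrite-docker.
final class CommandSubstitution {
    private CommandSubstitution() {
    }

    // Index just past the ')' that closes the "$(" at `start`, or -1 if it never closes
    static int endOf(String text, int start) {
        if (!text.startsWith("$(", start)) {
            throw new IllegalArgumentException("no $( at index " + start);
        }
        int depth = 0;
        for (int i = start + 1; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth == 0) {
                    return i + 1; // back at the level of the opening "$("
                }
            }
        }
        return -1;
    }
}
```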

// Backtick command substitution `command`
// First char after backtick must NOT be whitespace/newline (which would be line continuation)
// Content cannot span newlines (backtick command substitution doesn't support that)
BACKTICK_SUBST : '`' ~[ \t\r\n`] ~[`\r\n]* '`' { atLineStart = false; };

// Unquoted text (arguments, file paths, etc.)
// This should be after more specific tokens
// Note: comma is NOT excluded here - it's only special in JSON arrays
// We structure this to not match text starting with -- (so DASH_DASH can match first)
// Also exclude < from starting char to allow HEREDOC_START (<<) to match
UNQUOTED_TEXT
    : ( ~[-< \t\r\n\\"'$[\]=] ( UNQUOTED_CHAR | ESCAPED_CHAR )*    // Start with non-hyphen, non-<, non-space
      | '-' ~[- \t\r\n\\"'$[\]=<] ( UNQUOTED_CHAR | ESCAPED_CHAR )* // Single hyphen followed by non-hyphen, non-space
      | '-'                                                         // Just a hyphen by itself
      | '<' ~[< \t\r\n\\"'$[\]=] ( UNQUOTED_CHAR | ESCAPED_CHAR )*  // Single < followed by non-<
      | '<'                                                         // Just a < by itself
      | ESCAPED_CHAR ( UNQUOTED_CHAR | ESCAPED_CHAR )*              // Start with escaped char (e.g., \; in find -exec)
      ) { if (!afterHealthcheck) atLineStart = false; }
    ;

// Whitespace - HIDDEN in main mode
WS : WS_CHAR+ -> channel(HIDDEN);

fragment WS_CHAR : [ \t];

// Newlines - HIDDEN in main mode, reset state for next line
NEWLINE : NEWLINE_CHAR+ { atLineStart = true; afterFrom = false; afterHealthcheck = false; } -> channel(HIDDEN);

fragment NEWLINE_CHAR : [\r\n];

// ----------------------------------------------------------------------------------------------
// HEREDOC_PREAMBLE mode - for parsing shell command preamble after heredoc marker(s)
// The heredoc identifier (e.g., EOF) is already captured in HEREDOC_START
// This mode handles the shell command text including additional heredoc markers for multi-heredoc.
// ----------------------------------------------------------------------------------------------
mode HEREDOC_PREAMBLE;

// Line continuation in preamble - stay in HEREDOC_PREAMBLE mode
HP_LINE_CONTINUATION : ('\\' | '`') [ \t]* '\r'? '\n' -> channel(HIDDEN);

// Newline without continuation - transition to HEREDOC mode for body content
HP_NEWLINE : '\n' -> type(NEWLINE), mode(HEREDOC);

HP_WS : [ \t\r\u000C]+ -> channel(HIDDEN);
HP_COMMENT : '/*' .*? '*/' -> channel(HIDDEN);
HP_LINE_COMMENT : ('//' | '#') ~[\r\n]* '\r'? -> channel(HIDDEN);

// Additional heredoc marker in preamble (for multi-heredoc support)
HP_HEREDOC_START : '<<' '-'? [A-Z_][A-Z0-9_]* {
    // Extract and store the heredoc marker identifier in FIFO order
    String text = getText();
    int prefixLen = text.charAt(2) == '-' ? 3 : 2;
    String marker = text.substring(prefixLen);
    heredocIdentifiers.add(marker);
} -> type(HEREDOC_START);

// Any text on the heredoc line after the marker (destination paths, interpreter names, shell commands, etc.)
// Exclude < to allow HP_HEREDOC_START to match <<
// Exclude \ and ` to allow HP_LINE_CONTINUATION to match
HP_UNQUOTED_TEXT : ( ~[<\\` \t\r\n]+
                   | '<' ~[< \t\r\n] ~[ \t\r\n]* // single < followed by non-< char
                   | '<'                        // standalone <
                   ) -> type(UNQUOTED_TEXT);

// ----------------------------------------------------------------------------------------------
// HEREDOC mode - for parsing heredoc content
// Supports multiple heredocs by only popping mode when all markers have been matched.
// ----------------------------------------------------------------------------------------------
mode HEREDOC;

H_NEWLINE : '\n' -> type(NEWLINE);

// Match heredoc content lines - emit as HEREDOC_CONTENT unless it's an ending identifier
// For multi-heredoc, we only popMode when the queue is empty (all markers matched in FIFO order)
HEREDOC_CONTENT : ~[\n]+
{
    // Ignore a trailing '\r' (CRLF line endings) when comparing against the terminator
    String line = getText();
    if (line.endsWith("\r")) {
        line = line.substring(0, line.length() - 1);
    }
    if (!heredocIdentifiers.isEmpty() && line.equals(heredocIdentifiers.peek())) {
        setType(UNQUOTED_TEXT);
        heredocIdentifiers.poll(); // Remove from front of queue (FIFO)
        // Only pop mode when all heredoc markers have been matched
        if (heredocIdentifiers.isEmpty()) {
            popMode();
            atLineStart = true; // After heredoc ends, next line is at line start
        }
    }
};
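The FIFO behaviour of `HEREDOC_CONTENT` can be modelled on already-split lines. In this hypothetical sketch (`HeredocBodies` is not part of rewrite-docker), each line is compared only with the oldest open marker, so a line equal to a later marker is still body content; once every marker is closed, scanning stops where the lexer would `popMode()`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of HEREDOC mode; not part of rewrite-docker.
final class HeredocBodies {
    private HeredocBodies() {
    }

    // One body per marker, in declaration order, taken from the lines after the instruction
    static List<List<String>> bodies(List<String> markers, List<String> lines) {
        Deque<String> pending = new ArrayDeque<>(markers); // FIFO, like heredocIdentifiers
        List<List<String>> bodies = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : lines) {
            if (pending.isEmpty()) {
                break; // the lexer has already popped back to the default mode
            }
            if (line.equals(pending.peek())) {
                pending.poll(); // terminator for the oldest open heredoc
                bodies.add(current);
                current = new ArrayList<>();
            } else {
                current.add(line); // HEREDOC_CONTENT
            }
        }
        if (!pending.isEmpty()) {
            throw new IllegalStateException("unterminated heredoc: " + pending.peek());
        }
        return bodies;
    }
}
```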
