Extend HostToDomainGraph to fold host-level graphs stripping the www. prefix#30
@sebastian-nagel thanks for the comments, just to be sure, in both the pointed parts, the variable
Sorry, yes, of course. Also EffectiveTldFinder needs the unreversed form. That makes the code even simpler.
Yes. AFAIK the variable
…ing an option value
sebastian-nagel
left a comment
Thanks, @lfoppiano!
Sorry, there was a conceptual misunderstanding: host-without-www is an aggregation level of its own, on the same level as registered domain and private domain.
The three nodes
0 com.example
1 com.example.www
2 com.example.xyz
are folded to (without counts of "subdomains"):
0 com.example 2
1 com.example.xyz 1
I'll update the script src/script/host2domaingraph.sh to work with the option --aggregation-level requiring an option value.
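The folding of the three nodes above can be sketched roughly as follows. This is only an illustration, not the code of this PR: the class `FoldWwwSketch` and its methods `stripWww` and `fold` are hypothetical names, and it assumes host names are given in reversed form (`com.example.www`) and that each output node carries the number of input hosts folded into it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FoldWwwSketch {

    /**
     * Strip a trailing ".www" label from a reversed host name,
     * e.g. com.example.www -> com.example. Hypothetical helper.
     */
    static String stripWww(String reversedHost) {
        String suffix = ".www";
        if (reversedHost.endsWith(suffix) && reversedHost.length() > suffix.length()) {
            return reversedHost.substring(0, reversedHost.length() - suffix.length());
        }
        return reversedHost;
    }

    /**
     * Fold hosts to their www-less form and count how many input hosts
     * map to each output node. Insertion order is preserved so that
     * node IDs can be assigned in input order.
     */
    static Map<String, Integer> fold(Iterable<String> reversedHosts) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String host : reversedHosts) {
            counts.merge(stripWww(host), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> folded =
                fold(List.of("com.example", "com.example.www", "com.example.xyz"));
        int id = 0;
        for (Map.Entry<String, Integer> e : folded.entrySet()) {
            // columns: node ID, folded name, number of hosts
            System.out.println(id++ + "\t" + e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Note that `com.example.xyz` stays a node of its own and is not counted under `com.example`, which is the difference to folding to the registered domain.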
System.err.println(" -c\tcount hosts per domain (additional column in <nodes_out>");
System.err.println(" --private-domains\tconvert to private domains (include suffixes from the");
System.err.println(" --private-domains\t(deprecated - use --aggregation-level)");
System.err.println(" \tconvert to private domains (include suffixes from the");
Tabs in the whitespace. If the terminal shows tabs with length 8, then the help might look skewed:
--private-domains (deprecated - use --aggregation-level)
convert to private domains (include suffixes from the
PRIVATE domains subdivision of the public suffix list,
see https://github.com/publicsuffix/list/wiki/Format#divisions)
For testing, you could run:
./src/script/host2domaingraph.sh -h 2>&1 | expand -8
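One way to make the help text independent of the terminal's tab width is to pad the option column with spaces instead of tabs. The following is a minimal sketch under that assumption, not a proposal for the exact code: the class `HelpAlignSketch`, the method `helpLine` and the column width of 22 are made up for illustration.

```java
public class HelpAlignSketch {

    /**
     * Format one help line: two spaces of indentation, the option name
     * left-aligned and padded to 22 characters, then the description.
     * An empty option yields a continuation line in the same column.
     */
    static String helpLine(String option, String description) {
        return String.format("  %-22s%s", option, description);
    }

    public static void main(String[] args) {
        System.err.println(helpLine("--private-domains", "(deprecated - use --aggregation-level)"));
        System.err.println(helpLine("", "convert to private domains (include suffixes from the"));
        System.err.println(helpLine("", "PRIVATE domains subdivision of the public suffix list,"));
        System.err.println(helpLine("", "see https://github.com/publicsuffix/list/wiki/Format#divisions)"));
    }
}
```

With space padding, the descriptions start in the same column regardless of how the terminal expands tabs, so the `expand -8` check would no longer be needed.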
If I have understood correctly, I've made all the required changes.
…ateDomain=true AND stripwww=true, the CLI ensure that this condition never happens
sebastian-nagel
left a comment
Thanks! Looks good to me. Tested with a small host-level web graph and all three aggregation levels.
Please squash the commits into a smaller number of meaningful commits.
As specified in #29