Jsoup Crawler #43

@xiayank

I have a question about using the jsoup API to select a target element.
Here is the HTML:
[image: screenshot of the HTML]
I want to get the href attribute value of the <a> tag, which is under the <div class="bxc-grid__image bxc-grid__image--light">.
I tried using

Elements elements = doc.select("div[class=bxc-grid__image   bxc-grid__image--light]");

to locate the div, and it works. I then followed the select API documentation for E > F (an F that is a direct child of E), so the CSS selector would be li[class=sub-categories__list__item]>a. However, running it throws an exception.

Does anyone know how to locate the <a> tag?

Thanks in advance!
Jsoup select API
URL OF ORIGINAL PAGE

Here is the exception log:

Exception in thread "main" java.lang.IllegalArgumentException: String must not be empty
	at org.jsoup.helper.Validate.notEmpty(Validate.java:92)
	at org.jsoup.nodes.Attribute.setKey(Attribute.java:51)
	at org.jsoup.parser.ParseSettings.normalizeAttributes(ParseSettings.java:54)
	at org.jsoup.parser.HtmlTreeBuilder.insert(HtmlTreeBuilder.java:185)
	at org.jsoup.parser.HtmlTreeBuilderState$7.process(HtmlTreeBuilderState.java:553)
	at org.jsoup.parser.HtmlTreeBuilder.process(HtmlTreeBuilder.java:113)
	at org.jsoup.parser.TreeBuilder.runParser(TreeBuilder.java:50)
	at org.jsoup.parser.TreeBuilder.parse(TreeBuilder.java:43)
	at org.jsoup.parser.HtmlTreeBuilder.parse(HtmlTreeBuilder.java:56)
	at org.jsoup.parser.Parser.parseInput(Parser.java:32)
	at org.jsoup.helper.DataUtil.parseByteData(DataUtil.java:135)
	at org.jsoup.helper.HttpConnection$Response.parse(HttpConnection.java:747)
	at org.jsoup.helper.HttpConnection.get(HttpConnection.java:250)
	at test.main(test.java:26)
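
For reference, here is a minimal sketch of the E > F child selector described above. It assumes the <a> is a direct child of the div; since the real markup is only in the screenshot, SAMPLE_HTML, the class name ChildLinkSketch and the helper childLinkHrefs are all made up for illustration. The stack trace above passes through HttpConnection.get and the HTML parser rather than a select call, so this sketch parses an inline string and only illustrates the selector syntax.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

class ChildLinkSketch {

    // Hypothetical markup standing in for the screenshot; the real page may differ.
    static final String SAMPLE_HTML =
            "<div class=\"bxc-grid__image bxc-grid__image--light\">"
          + "<a href=\"/example/path\"><img src=\"x.png\"></a>"
          + "</div>";

    // Collects the href of every <a> that is a direct child of the target div.
    static List<String> childLinkHrefs(Document doc) {
        // Class selectors (.a.b) match each class token regardless of order or
        // whitespace, unlike an exact [class=...] string comparison.
        Elements links = doc.select("div.bxc-grid__image.bxc-grid__image--light > a");
        List<String> hrefs = new ArrayList<>();
        for (Element link : links) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        System.out.println(childLinkHrefs(doc));
    }
}
```

If the <a> is nested more deeply than a direct child, replacing " > a" with " a" (the descendant combinator) would match it at any depth.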
