How to enable self closing tags in 1.20.1 #2322
Replies: 4 comments 2 replies
-
Hi Eugen,

Sorry, I should have included a sample in one of those. Thanks for raising the point! Here's a simple example:

TagSet tags = TagSet.Html();
Tag div = tags.valueOf("div", Parser.NamespaceHtml);
div.set(Tag.SelfClose);
String html = "<div /><p>One</p>";
Document doc = Jsoup.parse(html, Parser.htmlParser().tagSet(tags));
System.out.println(doc.body());

Gives us:

<body>
<div></div>
<p>One</p>
</body>

Compared to the default (to-spec) HTML parse, which will not honor that self-close:

<body>
<div>
<p>One</p>
</div>
</body>

The essential idea is that a TagSet contains all of the Tag objects that the parser will use. Each Tag in the set can be updated with properties like self-closing, data, block, etc. Those properties will variously impact the parse and/or the serialization.

Note that in the case of self-closing tags in HTML, the setting specifically applies to how we parse. The output does not self-close but has an explicit closer. This is so that the output is still valid HTML.

If there are other properties that would be useful to add (or cases that the parser / serializer is not yet honoring) I'd be happy to take a look.
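For anyone who wants to paste this into a project, here is a minimal, self-contained sketch of the same comparison. The class and method names (SelfCloseDemo, parseDefault, parseWithSelfClosingDiv) are made up for illustration and are not part of jsoup. It assumes jsoup 1.20.1 or later on the classpath and only uses the calls shown in the example above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;
import org.jsoup.parser.Tag;
import org.jsoup.parser.TagSet;

public class SelfCloseDemo {
    // Default, to-spec HTML parse: "<div />" is treated as an opening <div>.
    static Document parseDefault(String html) {
        return Jsoup.parse(html);
    }

    // Parse with a TagSet in which <div> is allowed to self-close.
    static Document parseWithSelfClosingDiv(String html) {
        TagSet tags = TagSet.Html();
        Tag div = tags.valueOf("div", Parser.NamespaceHtml);
        div.set(Tag.SelfClose);
        return Jsoup.parse(html, Parser.htmlParser().tagSet(tags));
    }

    public static void main(String[] args) {
        String html = "<div /><p>One</p>";

        Document standard = parseDefault(html);
        Document custom = parseWithSelfClosingDiv(html);

        // Count the direct children of <body> in each parse:
        // in the default parse <p> is nested inside <div>,
        // in the custom parse <div> and <p> are siblings.
        System.out.println("default: " + standard.body().children().size());
        System.out.println("custom:  " + custom.body().children().size());
    }
}
```

Counting the children of body is just a quick way to check the tree shape without comparing serialized HTML.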
-
We're hitting this now over on Apache Tika. We were surprised by the behavior here. When we mark script as self-closing, as in the example above, it works. I'm grepping our regression corpus (samples of Common Crawl) to find the most common self-closing tags to add. Do you have any recommendations? Would anyone else be interested in our list?
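If you end up with a list of tags to allow, a small helper can apply the same setting to each of them. This is only a sketch: parserAllowingSelfClose is a hypothetical helper name, and "div" and "span" are placeholder tag names, not the list from our corpus. It assumes jsoup 1.20.1 or later and reuses the TagSet calls from the example above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;
import org.jsoup.parser.Tag;
import org.jsoup.parser.TagSet;

public class SelfClosingTags {
    // Hypothetical helper: returns an HTML parser that honors "<name />"
    // for each of the given tag names.
    static Parser parserAllowingSelfClose(String... tagNames) {
        TagSet tags = TagSet.Html();
        for (String name : tagNames) {
            Tag tag = tags.valueOf(name, Parser.NamespaceHtml);
            tag.set(Tag.SelfClose);
        }
        return Parser.htmlParser().tagSet(tags);
    }

    public static void main(String[] args) {
        // Placeholder tag names; substitute the tags you actually need.
        Parser parser = parserAllowingSelfClose("div", "span");
        Document doc = Jsoup.parse("<span /><div /><p>One</p>", parser);
        System.out.println(doc.body());
    }
}
```

Building a fresh TagSet inside the helper keeps the change local to the parser it returns, so other parses in the same application keep the default behavior.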
-
Thank you @jhy for a way to handle unknown tags in >= 1.21.1.
-
Note, for those of you like us that were previously using |
-
I have seen the announcement in https://github.com/jhy/jsoup/releases/tag/jsoup-1.20.1 that self-closing tags are no longer honored by default and that it is possible to enable this again, but the hint on how to do so was nothing I could work with.
I checked the corresponding issues like
But I cannot seem to find any hint, test, or example there, and I could not find anything on https://jsoup.org/ either.
Could you provide a small example or some more hints for us? That would be awesome!