Really nice work. I appreciate the example with JSDOM as that’s exactly how I use readability, and this looks like a nice drop-in replacement.

Question: How did you validate this? You say it works better than readability but I don’t see any tests or datasets in the repo to evaluate accuracy or coverage. Would it be possible to share that as well?

▲ kepano a month ago | parent [-]

Currently I am relying on manual testing and user feedback, but yes, I'd like to add tests.

Defuddle works quite differently from Readability. Readability tends to be overly conservative and often removes useful content, because it tests blocks to find the beginning and end of the "main" content.

Defuddle can run multiple passes: if a pass returns no content, it tries again and expands its results. It also uses a greater variety of techniques to clean the content, for example using a page's mobile styles to detect content that can be hidden.
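
To illustrate the multi-pass idea described above, here is a minimal sketch. It is not Defuddle's actual code: `Extractor`, `extractWithFallback`, and the toy `strict` and `loose` passes are hypothetical names invented for this example, and the real library works on a DOM rather than on regexes over strings.

```typescript
// Hypothetical types and names for illustration; not Defuddle's API.
type Extractor = (html: string) => string;

function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Try each pass in order, from strictest to most permissive, and return
// the first result that contains at least `minWords` words. If no pass
// reaches the threshold, return the longest result seen.
function extractWithFallback(
  html: string,
  passes: Extractor[],
  minWords = 1
): string {
  let best = "";
  for (const pass of passes) {
    const result = pass(html);
    if (countWords(result) >= minWords) return result;
    if (countWords(result) > countWords(best)) best = result;
  }
  return best;
}

// Toy strict pass: only accept text inside an <article> element.
const strict: Extractor = (html) => {
  const match = html.match(/<article>([\s\S]*?)<\/article>/);
  return match ? match[1].trim() : "";
};

// Toy permissive pass: strip all tags and collapse whitespace.
const loose: Extractor = (html) =>
  html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();

const withArticle = extractWithFallback(
  "<article>main text</article><footer>x</footer>",
  [strict, loose]
);
const withoutArticle = extractWithFallback("<div>hello world</div>", [
  strict,
  loose,
]);
console.log(withArticle, "|", withoutArticle);
```

The strict pass wins when it finds something, and the permissive pass only runs when the strict one comes back empty, which is the "expand the results if nothing was returned" behavior in miniature.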

Lastly, Defuddle not only extracts the content but also standardizes the output (which Readability doesn't do). For example, footnotes and code blocks are each meant to come out in a single format, whereas Readability keeps the original DOM intact.
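
As a rough illustration of what "standardizing" code blocks could mean, the sketch below maps a few common ways of marking a code block's language onto one canonical class name. This is an assumption-laden toy: `canonicalCodeClass` is a hypothetical helper, and the patterns are widely used conventions, not a list taken from Defuddle.

```typescript
// Hypothetical helper; not part of Defuddle or Readability.
// Maps several common class-name conventions to a single `language-xxx` form.
function canonicalCodeClass(className: string): string | null {
  const patterns: RegExp[] = [
    /\blanguage-([\w+#-]+)/, // e.g. "language-js"
    /\blang-([\w+#-]+)/, // e.g. "lang-js"
    /\bhighlight-source-([\w+#-]+)/, // e.g. "highlight highlight-source-js"
    /\bbrush:\s*([\w+#-]+)/, // e.g. "brush: js"
  ];
  for (const pattern of patterns) {
    const match = className.match(pattern);
    if (match) return `language-${match[1].toLowerCase()}`;
  }
  // No recognizable language marker.
  return null;
}

console.log(canonicalCodeClass("highlight highlight-source-js"));
console.log(canonicalCodeClass("brush: Python"));
```

Whatever markup the source page used, downstream consumers then only have to handle one form.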

▲ honodk123 a month ago | parent [-]

This looks great! I would love to give Defuddle a try as a Readability replacement. However, for my use case I want to run it in a Chrome extension background script (service worker). I have not been able to get Defuddle to work there, while Readability does (when combined with linkedom).

So basically, while this works:

`import { parseHTML } from 'linkedom'; ... private extractArticleWithReadability(html: string) { const { document } = parseHTML(html); const reader = new Readability(document); return reader.parse(); }`

This does not:

`import { parseHTML } from 'linkedom'; ... private async extractArticleWithDefuddle(html: string) { const { document } = parseHTML(html); const result = new Defuddle(document); result.parse(); return result; }`

I get errors like:

- Error in findExtractor: TypeError: Failed to construct 'URL': Invalid URL
- Defuddle: Error evaluating media queries: TypeError: undefined is not iterable (cannot read property Symbol(Symbol.iterator))
- Defuddle Error processing document: TypeError: b.getComputedStyle is not a function

Is there a way to run Defuddle in a Chrome extension background script/service worker? Or do you have any plans to add support for that?