Snacktory – Yet another Readability clone. This time in Java.
For Jetslide I needed a Readability clone in Java. There were already some tools, but I wanted additional and different features, so I adapted the existing goose and jReadability projects and added some features of my own. Check out the detection quality at Jetslide, and fork it to improve it: as of today, Snacktory is free software!
Copied from the README:
- better article text detection than jReadability
- only Java deps
- more tests
- similar article text detection, although better detection for non-English sites (German, Japanese, …)
- snacktory does not depend on the word count in its text detection, so that CJK languages are supported
- no external services required to run the core tests => faster tests
- better charset detection
- with caching support
- skipping some known file types
- only the detection of the top image and the top text is supported at the moment
- some tests that previously passed no longer pass, but a bunch of other useful sites were added (stackoverflow, facebook, other languages …)
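Why a word-count heuristic breaks down for CJK text: Japanese and Chinese prose usually contains no spaces between words, so splitting on whitespace counts an entire paragraph as a single "word". The sketch below is an illustration only, not Snacktory's actual detection code. The class name `TextLengthDemo` and the choice of counting letter code points as a script-independent length measure are assumptions made for this example.

```java
// Illustrative sketch, not Snacktory code: compares a whitespace word count
// with a letter count to show why word counts mislead on CJK text.
public class TextLengthDemo {

    // Naive word count: split on runs of whitespace.
    static int whitespaceWordCount(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    // Script-independent alternative (an assumption for this demo):
    // count the Unicode code points that are letters.
    static long letterCount(String text) {
        return text.codePoints().filter(Character::isLetter).count();
    }

    public static void main(String[] args) {
        String english = "Snacktory extracts the main article text";
        String japanese = "日本語の文には空白がない";
        System.out.println(whitespaceWordCount(english));  // 6
        System.out.println(whitespaceWordCount(japanese)); // 1: no spaces at all
        System.out.println(letterCount(japanese));         // many letters despite "1 word"
    }
}
```

A threshold such as "a paragraph needs at least N words" would therefore reject most Japanese paragraphs, while a length measure that ignores whitespace treats both scripts more evenly.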
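```java
import de.jetwick.snacktory.HtmlFetcher;
import de.jetwick.snacktory.JResult;

// url, resolveTimeout, count and minutes are placeholders defined elsewhere
HtmlFetcher fetcher = new HtmlFetcher();
// Optionally set a cache, e.g. take the map implementation from Google Collections:
// fetcher.setCache(new MapMaker().concurrencyLevel(20)
//         .maximumSize(count).expireAfterWrite(minutes, TimeUnit.MINUTES).makeMap());
JResult res = fetcher.fetchAndExtract(url, resolveTimeout, true);
String text = res.getText();         // main article text
String title = res.getTitle();       // article title
String imageUrl = res.getImageUrl(); // URL of the detected top image
```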
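The commented-out cache line above relies on `MapMaker` from Google Collections. To illustrate the underlying idea of a size-bounded cache of fetched pages using only the JDK, one option is an access-ordered `LinkedHashMap` that evicts the least recently used entry once a maximum size is exceeded. This is a hypothetical sketch: the class `SimpleLruCache` is not part of Snacktory, it has no time-based expiry (unlike `expireAfterWrite`), it is not thread-safe, and it is not claimed to be accepted by `HtmlFetcher.setCache` as-is.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Snacktory: a size-bounded LRU map.
public class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public SimpleLruCache(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Invoked after each put; drop the least recently used entry when over capacity
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        SimpleLruCache<String, String> cache = new SimpleLruCache<>(2);
        cache.put("http://a.example", "text A");
        cache.put("http://b.example", "text B");
        cache.get("http://a.example");           // touch A, so B becomes the eldest entry
        cache.put("http://c.example", "text C"); // exceeds capacity, evicts B
        System.out.println(cache.keySet());      // [http://a.example, http://c.example]
    }
}
```

For concurrent use you could wrap it with `Collections.synchronizedMap(...)`, although a library map such as the one built by `MapMaker` also gives you expiry and finer-grained concurrency.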
From http://karussell.wordpress.com/2011/07/12/snacktory-yet-another-readability-clone-this-time-in-java/