Snacktory – Yet another Readability clone. This time in Java.
For Jetslide I needed a Java clone of Readability. There are already some tools, but I wanted more and different features, so I adapted the existing Goose and jReadability and added some stuff. Check out the detection quality at Jetslide and fork it to improve it. As of today, snacktory is free software!
Copied from the README:
- better article text detection than jReadability
- only Java deps
- more tests
- article text detection similar to Goose, but better for non-English sites (German, Japanese, …)
- snacktory does not depend on the word count in its text detection, so it also supports CJK languages
- no external services required to run the core tests => faster tests
- better charset detection
- with caching support
- skipping some known filetypes
- only the detection of the top image and the top text is supported at the moment
- some tests that passed in Goose do not pass here, but a bunch of other useful sites were added to the tests (Stack Overflow, Facebook, other languages …)
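To see why a word-count heuristic is a problem for CJK text, here is a minimal sketch. It is not snacktory's code: `TextMeasure`, `wordCount` and `charCount` are hypothetical names used only for illustration. Splitting on whitespace finds a single "word" in a Japanese sentence, while counting code points gives a usable length for any script.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of snacktory: compares two ways of measuring text.
final class TextMeasure {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Counts whitespace-separated tokens. Unreliable for CJK text,
    // which normally has no spaces between words.
    static int wordCount(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : WHITESPACE.split(trimmed).length;
    }

    // Counts Unicode code points, which works regardless of script.
    static int charCount(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String english = "This is a short sentence";
        // Japanese for "this is a short sentence" (kore wa mijikai bunshou desu),
        // written with Unicode escapes to stay independent of the source encoding.
        String japanese = "\u3053\u308c\u306f\u77ed\u3044\u6587\u7ae0\u3067\u3059";

        System.out.println(wordCount(english) + " words, " + charCount(english) + " chars");
        System.out.println(wordCount(japanese) + " words, " + charCount(japanese) + " chars");
    }
}
```

A detector that requires a minimum number of words per paragraph would discard the Japanese sentence, whereas a length-based measure treats both sentences as real content.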
Usage:
HtmlFetcher fetcher = new HtmlFetcher();
// Set a cache, e.g. take the map implementation from Google Collections:
// fetcher.setCache(new MapMaker().concurrencyLevel(20)
//     .maximumSize(count).expireAfterWrite(minutes, TimeUnit.MINUTES).makeMap());
JResult res = fetcher.fetchAndExtract(url, resolveTimeout, true);
res.getText();
res.getTitle();
res.getImageUrl();
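The commented-out cache line relies on `MapMaker` from Google Collections. If you would rather stay with the JDK, a small size-bounded map can play a similar role. The sketch below rests on some assumptions: `LruCache` and `synchronizedCache` are hypothetical names, the map evicts by least-recent access only (no time-based expiry, unlike the `expireAfterWrite` setting above), and whether `setCache` accepts a plain `Map` depends on the snacktory version you use.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-bounded cache built on the JDK only; not part of snacktory.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order goes from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // drops the least recently accessed entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    // LinkedHashMap is not thread-safe, so wrap it for use from several fetch threads.
    static <K, V> Map<K, V> synchronizedCache(int maxEntries) {
        return Collections.synchronizedMap(new LruCache<K, V>(maxEntries));
    }
}
```

A cache created with `LruCache.synchronizedCache(1000)` keeps at most 1000 entries and drops the least recently used one when a new entry arrives. The synchronized wrapper is coarser than `MapMaker`'s concurrency levels, which is the trade-off for having no extra dependency.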