XML Sitemap Generation in Java
Want to make sure search engines pick up your content? Build an XML sitemap to give them a hand. This approach uses Java to generate an XML sitemap.
XML sitemaps are a great way to expose your site's content to search engines, especially when you do not have an internal or external linking structure built out yet. An XML sitemap, in its simplest form, is a directory of every unique URL your website contains. This gives Google and other search engines a one-stop shop for all pages they should index. Each sitemap is limited to 50,000 URLs and 50MB uncompressed (the size limit was 10MB in older versions of the protocol), but this limitation can be circumvented with sitemap indexes that link to multiple sitemaps. Sitemaps can also include additional metadata, such as how frequently a page changes or when it was last updated. After you design a site with HTML/CSS templates, make sure you include sitemaps so the pages get indexed more quickly.
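To make the format concrete, here is a minimal sketch of what a sitemap document looks like when rendered by hand with only the JDK. The class name MinimalSitemap and its methods are hypothetical and are not part of SitemapGen4J; the sketch emits only the required loc element and leaves out the optional metadata.

```java
import java.util.List;

// Hypothetical helper, not part of any library used in this article.
final class MinimalSitemap {

    // Escape the five characters the sitemap protocol requires to be entity-encoded.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Render a <urlset> document with one <url><loc> entry per URL.
    static String render(List<String> urls) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String url : urls) {
            sb.append("  <url><loc>").append(escapeXml(url)).append("</loc></url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }
}
```

Calling MinimalSitemap.render with a list of absolute URLs returns the full document as a string. A library is still the better choice for real use, since it also handles validation and the optional fields.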
XML Sitemap With Java
The SitemapGen4J library provides a nice object model for building a sitemap from your URLs. Most likely, you will need to write code that can generate all possible URLs for your website. An alternative is to build a generic crawler that can build a sitemap for any website. Here, building the URLs by hand is not too difficult, so we create one method per page type. We keep each page type in its own sitemap because we plan on making a sitemap index later.
public class StubbornJavaSitemapGenerator {
    private static final String HOST = "https://www.stubbornjava.com";
    private static final InMemorySitemap sitemap = InMemorySitemap.fromSupplier(StubbornJavaSitemapGenerator::generateSitemap);

    public static InMemorySitemap getSitemap() {
        return sitemap;
    }

    private static Map<String, List<String>> generateSitemap() {
        Map<String, List<String>> index = Maps.newHashMap();
        try {
            index.put("posts", genPosts());
            index.put("guides", genGuides());
            index.put("recommendations", genRecommendations());
            index.put("tags", genTags());
            index.put("libraries", genLibraries());
            return index;
        } catch (MalformedURLException ex) {
            throw new RuntimeException(ex);
        }
    }

    private static List<String> genPosts() throws MalformedURLException {
        WebSitemapGenerator wsg = new WebSitemapGenerator(HOST);
        List<String> slugs = Posts.getAllSlugs();
        for (String slug: slugs) {
            String url = HttpUrl.parse(HOST)
                                .newBuilder()
                                .addPathSegment("posts")
                                .addPathSegment(slug)
                                .build()
                                .toString();
            wsg.addUrl(url);
        }
        return wsg.writeAsStrings();
    }

    private static List<String> genGuides() throws MalformedURLException {
        WebSitemapGenerator wsg = new WebSitemapGenerator(HOST);
        List<GuideTitle> guides = Guides.findTitles();
        for (GuideTitle guide : guides) {
            String url = HttpUrl.parse(HOST)
                                .newBuilder()
                                .addPathSegment("guides")
                                .addPathSegment(guide.getSlug())
                                .build()
                                .toString();
            wsg.addUrl(url);
        }
        return wsg.writeAsStrings();
    }

    private static List<String> genRecommendations() throws MalformedURLException {
        WebSitemapGenerator wsg = new WebSitemapGenerator(HOST);
        List<String> recommendations = Lists.newArrayList(
            "java-libraries"
            , "best-selling-html-css-themes-and-website-templates"
        );
        for (String recommendation : recommendations) {
            String url = HttpUrl.parse(HOST)
                                .newBuilder()
                                .addPathSegment(recommendation)
                                .build()
                                .toString();
            wsg.addUrl(url);
        }
        return wsg.writeAsStrings();
    }

    private static List<String> genTags() throws MalformedURLException {
        WebSitemapGenerator wsg = new WebSitemapGenerator(HOST);
        List<Tag> tags = Tags.getTags();
        for (Tag tag : tags) {
            String url = HttpUrl.parse(HOST)
                                .newBuilder()
                                .addPathSegment("tags")
                                .addPathSegment(tag.getName())
                                .addEncodedPathSegment("posts")
                                .build()
                                .toString();
            wsg.addUrl(url);
        }
        return wsg.writeAsStrings();
    }

    private static List<String> genLibraries() throws MalformedURLException {
        WebSitemapGenerator wsg = new WebSitemapGenerator(HOST);
        List<JavaLib> libraries = Seq.of(JavaLib.values()).toList();
        for (JavaLib lib : libraries) {
            String url = HttpUrl.parse(HOST)
                                .newBuilder()
                                .addPathSegment("java-libraries")
                                .addPathSegment(lib.getName())
                                .build()
                                .toString();
            wsg.addUrl(url);
        }
        return wsg.writeAsStrings();
    }

    public static void main(String[] args) {
        generateSitemap();
    }
}
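Each gen method above returns a List<String> rather than a single string, which leaves room for a page type whose URLs do not fit in one sitemap. The sketch below illustrates the underlying idea of splitting a URL list at the 50k-URL limit using only the JDK. It is an illustration of the limit, not SitemapGen4J's implementation, and the names SitemapChunks and chunk are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the per-sitemap URL limit.
final class SitemapChunks {

    // Maximum number of URLs allowed in a single sitemap.
    static final int MAX_URLS_PER_SITEMAP = 50_000;

    // Split items into consecutive sublists of at most maxPerChunk entries.
    // An empty input produces an empty list of chunks.
    static <T> List<List<T>> chunk(List<T> items, int maxPerChunk) {
        if (maxPerChunk <= 0) {
            throw new IllegalArgumentException("maxPerChunk must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxPerChunk) {
            int end = Math.min(start + maxPerChunk, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }
}
```

With 120,001 URLs and the default limit, this yields three chunks of 50,000, 50,000, and 20,001 entries, each of which would become its own sitemap document.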
XML Sitemap Index
SitemapGen4J was built to write sitemaps to files on disk, but I want to keep ours in memory since it is fairly small. Unfortunately, it looks like exposing the internal object model or additional rendering features was an afterthought. There is an in-memory option for the individual sitemaps (writeAsStrings), but not for the index. The better long-term fix would be to contribute an implementation or write a fully custom sitemap generator; for now, we build our own internal mapping from sitemap name to sitemap XML. Because each sitemap is capped in size and URL count, a single page type may produce more than one sitemap, which is why an index is needed.
public class InMemorySitemap {
    private final Supplier<Map<String, String>> indexSupplier;

    private InMemorySitemap(Supplier<Map<String, String>> indexSupplier) {
        this.indexSupplier = indexSupplier;
    }

    public String getIndex(String sitemapName) {
        return indexSupplier.get().get(sitemapName);
    }

    public List<String> getIndexNames() {
        return Seq.seq(indexSupplier.get().keySet())
                  .sorted()
                  .toList();
    }

    // Cache the sitemap for the lifetime of the JVM
    public static InMemorySitemap fromSupplier(Supplier<Map<String, List<String>>> supplier) {
        Supplier<Map<String, String>> sup = mapSupplier(supplier);
        Supplier<Map<String, String>> memoized = Suppliers.memoize(sup::get);
        return new InMemorySitemap(memoized);
    }

    // Cache the sitemap but refresh after the given duration.
    public static InMemorySitemap fromSupplierWithExpiration(
            Supplier<Map<String, List<String>>> supplier,
            long duration,
            TimeUnit unit) {
        Supplier<Map<String, String>> sup = mapSupplier(supplier);
        Supplier<Map<String, String>> memoized = Suppliers.memoizeWithExpiration(sup::get, duration, unit);
        return new InMemorySitemap(memoized);
    }

    private static Supplier<Map<String, String>> mapSupplier(Supplier<Map<String, List<String>>> supplier) {
        return () -> {
            Map<String, List<String>> originalMap = supplier.get();
            Map<String, String> newIndex = Maps.newHashMap();
            for (Entry<String, List<String>> entry : originalMap.entrySet()) {
                for (int i = 0; i < entry.getValue().size(); i++) {
                    newIndex.put(entry.getKey() + "-" + i + ".xml", entry.getValue().get(i));
                }
            }
            return newIndex;
        };
    }
}
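The class above stops at a name-to-XML mapping and never renders an index document. If you did want a real sitemap index, a minimal sketch of rendering one from the sorted names could look like the following. SitemapIndexSketch is a hypothetical class, not something SitemapGen4J provides, and it assumes the sitemaps are served under a /sitemaps/ path and that names such as posts-0.xml contain no characters that need XML escaping.

```java
import java.util.List;

// Hypothetical renderer for a sitemap index document.
final class SitemapIndexSketch {

    // baseUrl is assumed to be scheme plus host with no trailing slash,
    // e.g. "https://www.stubbornjava.com".
    // sitemapNames are assumed to be XML-safe names like "posts-0.xml".
    static String render(String baseUrl, List<String> sitemapNames) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String name : sitemapNames) {
            // Each <sitemap> entry points at one child sitemap by absolute URL.
            sb.append("  <sitemap><loc>")
              .append(baseUrl).append("/sitemaps/").append(name)
              .append("</loc></sitemap>\n");
        }
        sb.append("</sitemapindex>\n");
        return sb.toString();
    }
}
```

Passing the result of getIndexNames() as sitemapNames would produce one entry per generated sitemap, which could then be served from a single well-known URL.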
XML Sitemap Routes
With an internal representation of the sitemap, we now need to expose it in our Undertow web server. A cool feature of the RoutingHandler is that it allows you to combine two RoutingHandlers with the addAll method.
public class SitemapRoutes {
    private final InMemorySitemap sitemap;

    private SitemapRoutes(InMemorySitemap sitemap) {
        this.sitemap = sitemap;
    }

    public void getSitemap(HttpServerExchange exchange) {
        String sitemapName = Exchange.pathParams().pathParam(exchange, "sitemap").orElse(null);
        String content = sitemap.getIndex(sitemapName);
        if (null == content) {
            exchange.setStatusCode(404);
            Exchange.body().sendText(exchange, String.format("Sitemap %s doesn't exist", sitemapName));
            return;
        }
        Exchange.body().sendXml(exchange, content);
    }

    /*
     * Routing Handlers can be reused and combined with each other
     * using the RoutingHandler.addAll() method.
     */
    public static RoutingHandler router(InMemorySitemap sitemap) {
        SitemapRoutes routes = new SitemapRoutes(sitemap);
        RoutingHandler router = new RoutingHandler()
            .get("/sitemaps/{sitemap}", timed("getSitemap", routes::getSitemap))
        ;
        return router;
    }
}
Exposing the Sitemap
Ideally, you would expose a single sitemap index file that references all of the others. Since we had to hack around this a bit, another option is to list all of the sitemap files in our robots.txt.
public static void robots(HttpServerExchange exchange) {
    String host = Exchange.urls().host(exchange).toString();
    List<String> sitemaps = StubbornJavaSitemapGenerator.getSitemap().getIndexNames();
    Response response = Response.fromExchange(exchange)
                                .with("sitemaps", sitemaps)
                                .with("host", host);
    Exchange.body().sendText(exchange, Templating.instance().renderTemplate("templates/src/pages/robots.txt", response));
}
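The robots.txt template itself is not shown here, so the following is only a sketch of the kind of output it might produce, written with plain string building instead of a template engine. The RobotsTxtSketch class is hypothetical, the allow-all User-agent rule is an assumption, and host is assumed to be scheme plus host with no trailing slash. The Sitemap directive expects a fully qualified URL, so each name is combined with the host and the /sitemaps/ route.

```java
import java.util.List;

// Hypothetical stand-in for the robots.txt template, which the article does not show.
final class RobotsTxtSketch {

    static String render(String host, List<String> sitemapNames) {
        StringBuilder sb = new StringBuilder();
        // Assumption: allow every crawler to fetch everything.
        sb.append("User-agent: *\n");
        sb.append("Disallow:\n");
        for (String name : sitemapNames) {
            // One Sitemap line per generated sitemap, as an absolute URL.
            sb.append("Sitemap: ").append(host).append("/sitemaps/").append(name).append('\n');
        }
        return sb.toString();
    }
}
```

Because every sitemap is listed individually, crawlers can discover all of them from robots.txt without a sitemap index.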
Published at DZone with permission of Bill O'Neil. See the original article here.