
XML Sitemap Generation in Java

Want to make sure search engines pick up your content? Build an XML sitemap to give them a hand. This approach uses Java to generate an XML sitemap.

by Bill O'Neil · Jul. 27, 17 · Tutorial

XML sitemaps are a great way to expose your site's content to search engines, especially when you do not have an internal or external linking structure built out yet. An XML sitemap, in its simplest form, is a directory of every unique URL your website contains. This gives Google and other search engines a one-stop shop for all the pages they should index. Each sitemap file is limited in size (10MB under the original protocol, since raised to 50MB uncompressed) and to 50,000 URLs, but you can work around this with a sitemap index that links to multiple sitemaps. Sitemaps can also include additional metadata, such as how frequently a page changes or when it was last updated. After you design a site with HTML/CSS templates, make sure you include sitemaps so that the pages get indexed more quickly.
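
To make the format concrete, here is a minimal sketch of what a sitemap with that optional metadata looks like when rendered by hand with nothing but the JDK. The MinimalSitemap class, its Entry type, and the example URL are hypothetical names made up for illustration; the rest of this article uses SitemapGen4J instead of hand-written rendering.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

public class MinimalSitemap {
    // One sitemap entry: the URL plus the two optional metadata fields.
    public static final class Entry {
        final String loc;
        final LocalDate lastmod;   // may be null
        final String changefreq;   // may be null, e.g. "daily" or "weekly"

        public Entry(String loc, LocalDate lastmod, String changefreq) {
            this.loc = loc;
            this.lastmod = lastmod;
            this.changefreq = changefreq;
        }
    }

    // The sitemap protocol requires these five characters to be entity-escaped.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    public static String render(List<Entry> entries) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (Entry e : entries) {
            sb.append("  <url>\n");
            sb.append("    <loc>").append(escape(e.loc)).append("</loc>\n");
            if (e.lastmod != null) {
                // LocalDate.toString() yields the ISO date form, e.g. 2017-07-27
                sb.append("    <lastmod>").append(e.lastmod).append("</lastmod>\n");
            }
            if (e.changefreq != null) {
                sb.append("    <changefreq>").append(escape(e.changefreq)).append("</changefreq>\n");
            }
            sb.append("  </url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(Arrays.asList(
            new Entry("https://www.stubbornjava.com/posts/example-post",
                      LocalDate.of(2017, 7, 27), "weekly"))));
    }
}
```

Only the loc element is required for each URL; lastmod and changefreq are hints that search engines are free to ignore.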

XML Sitemap With Java

The SitemapGen4J library provides a nice object model for building sitemaps, but you will most likely still need to write the code that generates every URL on your website. An alternative is to build a generic crawler that can produce a sitemap for any website. Generating our custom URLs is not too difficult, so we create one method per page type. We keep the page types separate because we plan on making a sitemap index later.

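import java.net.MalformedURLException;
import java.util.List;
import java.util.Map;

import com.google.common.collect.Lists;
import com.google.common.collect.Maps;
import com.redfin.sitemapgenerator.WebSitemapGenerator;
import org.jooq.lambda.Seq;

import okhttp3.HttpUrl;

// Posts, Guides, GuideTitle, Tags, Tag, JavaLib, and InMemorySitemap are the
// site's own classes; their imports are omitted here.
public class StubbornJavaSitemapGenerator {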
    private static final String HOST = "https://www.stubbornjava.com";

    private static final InMemorySitemap sitemap = InMemorySitemap.fromSupplier(StubbornJavaSitemapGenerator::generateSitemap);
    public static InMemorySitemap getSitemap() {
        return sitemap;
    }

    private static Map<String, List<String>> generateSitemap() {
        Map<String, List<String>> index = Maps.newHashMap();
        try {
            index.put("posts", genPosts());
            index.put("guides", genGuides());
            index.put("recommendations", genRecommendations());
            index.put("tags", genTags());
            index.put("libraries", genLibraries());
            return index;
        } catch (MalformedURLException ex) {
            throw new RuntimeException(ex);
        }
    }

    private static List<String> genPosts() throws MalformedURLException {
        WebSitemapGenerator wsg = new WebSitemapGenerator(HOST);
        List<String> slugs = Posts.getAllSlugs();
        for (String slug: slugs) {
            String url = HttpUrl.parse(HOST)
                                .newBuilder()
                                .addPathSegment("posts")
                                .addPathSegment(slug)
                                .build()
                                .toString();
            wsg.addUrl(url);
        }
        return wsg.writeAsStrings();
    }

    private static List<String> genGuides() throws MalformedURLException {
        WebSitemapGenerator wsg = new WebSitemapGenerator(HOST);
        List<GuideTitle> guides = Guides.findTitles();
        for (GuideTitle guide : guides) {
            String url = HttpUrl.parse(HOST)
                                .newBuilder()
                                .addPathSegment("guides")
                                .addPathSegment(guide.getSlug())
                                .build()
                                .toString();
            wsg.addUrl(url);
        }
        return wsg.writeAsStrings();
    }

    private static List<String> genRecommendations() throws MalformedURLException {
        WebSitemapGenerator wsg = new WebSitemapGenerator(HOST);
        List<String> recommendations = Lists.newArrayList(
            "java-libraries"
            , "best-selling-html-css-themes-and-website-templates"
        );
        for (String recommendation : recommendations) {
            String url = HttpUrl.parse(HOST)
                                .newBuilder()
                                .addPathSegment(recommendation)
                                .build()
                                .toString();
            wsg.addUrl(url);
        }
        return wsg.writeAsStrings();
    }

    private static List<String> genTags() throws MalformedURLException {
        WebSitemapGenerator wsg = new WebSitemapGenerator(HOST);
        List<Tag> tags = Tags.getTags();
        for (Tag tag : tags) {
            String url = HttpUrl.parse(HOST)
                                .newBuilder()
                                .addPathSegment("tags")
                                .addPathSegment(tag.getName())
                                .addEncodedPathSegment("posts")
                                .build()
                                .toString();
            wsg.addUrl(url);
        }
        return wsg.writeAsStrings();
    }

    private static List<String> genLibraries() throws MalformedURLException {
        WebSitemapGenerator wsg = new WebSitemapGenerator(HOST);
        List<JavaLib> libraries = Seq.of(JavaLib.values()).toList();
        for (JavaLib lib : libraries) {
            String url = HttpUrl.parse(HOST)
                                .newBuilder()
                                .addPathSegment("java-libraries")
                                .addPathSegment(lib.getName())
                                .build()
                                .toString();
            wsg.addUrl(url);
        }
        return wsg.writeAsStrings();
    }

    public static void main(String[] args) {
        generateSitemap();
    }
}
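
Notice that each gen method returns a List<String> rather than a single String. Presumably that is because one page type can need more than one sitemap file once it passes the per-file URL limit. The sketch below illustrates that kind of splitting with plain JDK collections. It is not SitemapGen4J's implementation, and SitemapChunker and chunk are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class SitemapChunker {
    // The sitemap protocol allows at most 50,000 URLs per sitemap file.
    public static final int MAX_URLS_PER_SITEMAP = 50_000;

    // Split items into consecutive chunks of at most maxPerChunk elements.
    public static <T> List<List<T>> chunk(List<T> items, int maxPerChunk) {
        if (maxPerChunk <= 0) {
            throw new IllegalArgumentException("maxPerChunk must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxPerChunk) {
            int end = Math.min(start + maxPerChunk, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 120_000; i++) {
            urls.add("https://www.stubbornjava.com/posts/post-" + i);
        }
        List<List<String>> chunks = chunk(urls, MAX_URLS_PER_SITEMAP);
        System.out.println(chunks.size()); // prints 3
    }
}
```

With 120,000 URLs this yields two full files of 50,000 URLs and a third with the remaining 20,000, and each of those files then needs its own entry in a sitemap index.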

XML Sitemap Index

SitemapGen4J was built to write sitemaps to files on disk, but ours is small enough that I just want to keep it in memory. Unfortunately, it looks like exposing the internal object model and additional rendering options was an afterthought: there is an in-memory override for the individual sitemaps, but not for the index. The better long-term fix would be to contribute an implementation or to write a fully custom sitemap generator. For now, we build our own internal mapping from sitemap file names to their XML content. Because each sitemap is capped in file size and at 50,000 URLs, a page type can span several files, which is why an index is needed.

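import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

import com.google.common.base.Suppliers;
import com.google.common.collect.Maps;
import org.jooq.lambda.Seq;

// Assumes Guava 21+, where Guava's Supplier extends java.util.function.Supplier,
// so the memoized suppliers below can be stored as java.util.function.Supplier.
public class InMemorySitemap {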
    private final Supplier<Map<String, String>> indexSupplier;
    private InMemorySitemap(Supplier<Map<String, String>> indexSupplier) {
        this.indexSupplier = indexSupplier;
    }

    public String getIndex(String sitemapName) {
        return indexSupplier.get().get(sitemapName);
    }

    public List<String> getIndexNames() {
        return Seq.seq(indexSupplier.get().keySet())
                  .sorted()
                  .toList();
    }

    // Cache the sitemap for the lifetime of the JVM
    public static InMemorySitemap fromSupplier(Supplier<Map<String, List<String>>> supplier) {
        Supplier<Map<String, String>> sup = mapSupplier(supplier);
        Supplier<Map<String, String>> memoized = Suppliers.memoize(sup::get);
        return new InMemorySitemap(memoized);
    }

    // Cache the sitemap but refresh after the given duration.
    public static InMemorySitemap fromSupplierWithExpiration(
            Supplier<Map<String, List<String>>> supplier,
            long duration,
            TimeUnit unit) {
        Supplier<Map<String, String>> sup = mapSupplier(supplier);
        Supplier<Map<String, String>> memoized = Suppliers.memoizeWithExpiration(sup::get, duration, unit);
        return new InMemorySitemap(memoized);
    }

    private static Supplier<Map<String, String>> mapSupplier(Supplier<Map<String, List<String>>> supplier) {
        return () -> {
            Map<String, List<String>> originalMap = supplier.get();
            Map<String, String> newIndex = Maps.newHashMap();
            for (Entry<String, List<String>> entry : originalMap.entrySet()) {
                for (int i = 0; i < entry.getValue().size(); i++) {
                    newIndex.put(entry.getKey() + "-" + i + ".xml", entry.getValue().get(i));
                }
            }
            return newIndex;
        };
    }
}
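
Since the library has no in-memory override for the index, one way to fill the gap would be to render the index XML ourselves from the names returned by getIndexNames(). The following is only a sketch under that assumption: SitemapIndexSketch and renderIndex are hypothetical names, and the /sitemaps/ path prefix is assumed to match the route the individual sitemaps are served from.

```java
import java.util.Arrays;
import java.util.List;

public class SitemapIndexSketch {
    // Render a sitemap index that points at each individual sitemap file.
    // Assumes host has no trailing slash and that the names are plain file
    // names such as "posts-0.xml" that need no XML escaping.
    public static String renderIndex(String host, List<String> sitemapNames) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String name : sitemapNames) {
            sb.append("  <sitemap>\n");
            sb.append("    <loc>").append(host).append("/sitemaps/").append(name).append("</loc>\n");
            sb.append("  </sitemap>\n");
        }
        sb.append("</sitemapindex>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("guides-0.xml", "posts-0.xml", "tags-0.xml");
        System.out.print(renderIndex("https://www.stubbornjava.com", names));
    }
}
```

The names follow the key scheme built in mapSupplier above: the page type, a dash, the position of the file within that page type, and the .xml extension.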

XML Sitemap Routes

With an internal representation of the sitemap, we now need to expose it in our Undertow web server. A cool feature of the RoutingHandler is that it allows you to combine two RoutingHandlers with the addAll method, so the sitemap routes can live in their own class and be merged into the main router.

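import io.undertow.server.HttpServerExchange;
import io.undertow.server.RoutingHandler;

// Exchange and the statically imported timed(...) helper are the site's own
// code; their imports are omitted here.
public class SitemapRoutes {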
    private final InMemorySitemap sitemap;
    private SitemapRoutes(InMemorySitemap sitemap) {
        this.sitemap = sitemap;
    }

    public void getSitemap(HttpServerExchange exchange) {
        String sitemapName = Exchange.pathParams().pathParam(exchange, "sitemap").orElse(null);
        String content = sitemap.getIndex(sitemapName);
        if (null == content) {
            exchange.setStatusCode(404);
            Exchange.body().sendText(exchange, String.format("Sitemap %s doesn't exist", sitemapName));
            return;
        }
        Exchange.body().sendXml(exchange, content);
    }

    /*
     * Routing Handlers can be reused and combined with each other
     * using the RoutingHandler.addAll() method.
     */
    public static RoutingHandler router(InMemorySitemap sitemap) {
        SitemapRoutes routes = new SitemapRoutes(sitemap);
        RoutingHandler router = new RoutingHandler()
            .get("/sitemaps/{sitemap}", timed("getSitemap", routes::getSitemap))
        ;
        return router;
    }
}

Exposing the Sitemap

Ideally, you would expose a single sitemap index file that references all of the others. Since we had to hack around the missing in-memory index, another option is to list every sitemap file in our robots.txt.

public static void robots(HttpServerExchange exchange) {
    String host = Exchange.urls().host(exchange).toString();
    List<String> sitemaps = StubbornJavaSitemapGenerator.getSitemap().getIndexNames();
    Response response = Response.fromExchange(exchange)
                                .with("sitemaps", sitemaps)
                                .with("host", host);
    Exchange.body().sendText(exchange, Templating.instance().renderTemplate("templates/src/pages/robots.txt", response));
}
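
The robots.txt template itself is not shown here, so the following is only a sketch of what the rendered output might look like, built with plain string concatenation instead of the templating engine. RobotsTxtSketch and render are hypothetical names, the allow-all rule is an assumption, and the /sitemaps/ prefix is assumed to match the sitemap route.

```java
import java.util.Arrays;
import java.util.List;

public class RobotsTxtSketch {
    // Build a robots.txt body that allows all crawlers and lists every sitemap.
    // Each Sitemap line must carry a full URL, so the host is prepended.
    public static String render(String host, List<String> sitemapNames) {
        StringBuilder sb = new StringBuilder();
        sb.append("User-agent: *\n");
        sb.append("Disallow:\n"); // an empty Disallow blocks nothing (assumed policy)
        sb.append("\n");
        for (String name : sitemapNames) {
            sb.append("Sitemap: ").append(host).append("/sitemaps/").append(name).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("guides-0.xml", "posts-0.xml");
        System.out.print(render("https://www.stubbornjava.com", names));
    }
}
```

Crawlers that honor the Sitemap directive can then discover every file without needing an index.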

Published at DZone with permission of Bill O'Neil. See the original article here.
