XML Sitemap Generation in Java

Want to make sure search engines pick up your content? Build an XML sitemap to give them a hand. This approach uses Java to generate an XML sitemap.

By Bill O'Neil · Jul. 27, 17 · Tutorial

XML sitemaps are a great way to expose your site's content to search engines, especially when you do not have an internal or external linking structure built out yet. An XML sitemap, in its simplest form, is a directory of every unique URL your website contains. This gives Google and other search engines a one-stop shop for all the pages they should index. Each sitemap file is limited to 50,000 URLs and 50MB uncompressed (the size cap was 10MB in older versions of the protocol), but this limitation can be worked around with sitemap indexes that link to multiple sitemaps. Sitemaps can also include additional metadata, such as how frequently a page changes or when it was last updated. After you design a site with HTML/CSS templates, make sure you include sitemaps so the pages get indexed more quickly.
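
To make the format concrete, here is a minimal sketch of what a single sitemap file contains, built with nothing but the JDK. The class `SitemapXmlSketch`, its `Entry` type, and the example.com URLs are hypothetical and do not come from the article; the element names (`urlset`, `url`, `loc`, `lastmod`, `changefreq`) are the ones defined by the sitemaps.org protocol.

```java
import java.util.List;

final class SitemapXmlSketch {
    // One sitemap entry: the URL plus optional metadata (null means "omit the element").
    static final class Entry {
        final String loc;
        final String lastmod;     // W3C date format, e.g. "2017-07-27"
        final String changefreq;  // e.g. "daily", "weekly", "monthly"

        Entry(String loc, String lastmod, String changefreq) {
            this.loc = loc;
            this.lastmod = lastmod;
            this.changefreq = changefreq;
        }
    }

    // Entity-escape the characters that are not allowed raw inside XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Render the entries as one <urlset> document.
    static String render(List<Entry> entries) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (Entry e : entries) {
            xml.append("  <url>\n");
            xml.append("    <loc>").append(escape(e.loc)).append("</loc>\n");
            if (e.lastmod != null) {
                xml.append("    <lastmod>").append(e.lastmod).append("</lastmod>\n");
            }
            if (e.changefreq != null) {
                xml.append("    <changefreq>").append(e.changefreq).append("</changefreq>\n");
            }
            xml.append("  </url>\n");
        }
        xml.append("</urlset>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(List.of(
            new Entry("https://www.example.com/posts/hello-world", "2017-07-27", "monthly"),
            new Entry("https://www.example.com/tags/java?page=1&sort=new", null, null))));
    }
}
```

In practice a library such as SitemapGen4J produces this XML for you; the sketch is only meant to show how little structure the format has.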

XML Sitemap With Java

The SitemapGen4J library provides a nice object model for building a sitemap from a set of URLs. Most likely, you will need to write code that generates every URL your website serves. An alternative is to build a generic crawler that can produce a sitemap for any website. Generating the URLs ourselves is not difficult, so we create one method per page type. We keep the page types separate because we plan on making a sitemap index later.

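import java.net.MalformedURLException;
import java.util.List;
import java.util.Map;

import org.jooq.lambda.Seq;

import com.google.common.collect.Lists;
import com.google.common.collect.Maps;
import com.redfin.sitemapgenerator.WebSitemapGenerator;

import okhttp3.HttpUrl;

// Posts, Guides, GuideTitle, Tags, Tag, and JavaLib are project-specific classes not shown here.
public class StubbornJavaSitemapGenerator {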
    private static final String HOST = "https://www.stubbornjava.com";

    private static final InMemorySitemap sitemap = InMemorySitemap.fromSupplier(StubbornJavaSitemapGenerator::generateSitemap);
    public static InMemorySitemap getSitemap() {
        return sitemap;
    }

    private static Map<String, List<String>> generateSitemap() {
        Map<String, List<String>> index = Maps.newHashMap();
        try {
            index.put("posts", genPosts());
            index.put("guides", genGuides());
            index.put("recommendations", genRecommendations());
            index.put("tags", genTags());
            index.put("libraries", genLibraries());
            return index;
        } catch (MalformedURLException ex) {
            throw new RuntimeException(ex);
        }
    }

    private static List<String> genPosts() throws MalformedURLException {
        WebSitemapGenerator wsg = new WebSitemapGenerator(HOST);
        List<String> slugs = Posts.getAllSlugs();
        for (String slug: slugs) {
            String url = HttpUrl.parse(HOST)
                                .newBuilder()
                                .addPathSegment("posts")
                                .addPathSegment(slug)
                                .build()
                                .toString();
            wsg.addUrl(url);
        }
        return wsg.writeAsStrings();
    }

    private static List<String> genGuides() throws MalformedURLException {
        WebSitemapGenerator wsg = new WebSitemapGenerator(HOST);
        List<GuideTitle> guides = Guides.findTitles();
        for (GuideTitle guide : guides) {
            String url = HttpUrl.parse(HOST)
                                .newBuilder()
                                .addPathSegment("guides")
                                .addPathSegment(guide.getSlug())
                                .build()
                                .toString();
            wsg.addUrl(url);
        }
        return wsg.writeAsStrings();
    }

    private static List<String> genRecommendations() throws MalformedURLException {
        WebSitemapGenerator wsg = new WebSitemapGenerator(HOST);
        List<String> recommendations = Lists.newArrayList(
            "java-libraries"
            , "best-selling-html-css-themes-and-website-templates"
        );
        for (String recommendation : recommendations) {
            String url = HttpUrl.parse(HOST)
                                .newBuilder()
                                .addPathSegment(recommendation)
                                .build()
                                .toString();
            wsg.addUrl(url);
        }
        return wsg.writeAsStrings();
    }

    private static List<String> genTags() throws MalformedURLException {
        WebSitemapGenerator wsg = new WebSitemapGenerator(HOST);
        List<Tag> tags = Tags.getTags();
        for (Tag tag : tags) {
            String url = HttpUrl.parse(HOST)
                                .newBuilder()
                                .addPathSegment("tags")
                                .addPathSegment(tag.getName())
                                .addEncodedPathSegment("posts")
                                .build()
                                .toString();
            wsg.addUrl(url);
        }
        return wsg.writeAsStrings();
    }

    private static List<String> genLibraries() throws MalformedURLException {
        WebSitemapGenerator wsg = new WebSitemapGenerator(HOST);
        List<JavaLib> libraries = Seq.of(JavaLib.values()).toList();
        for (JavaLib lib : libraries) {
            String url = HttpUrl.parse(HOST)
                                .newBuilder()
                                .addPathSegment("java-libraries")
                                .addPathSegment(lib.getName())
                                .build()
                                .toString();
            wsg.addUrl(url);
        }
        return wsg.writeAsStrings();
    }

    public static void main(String[] args) {
        generateSitemap();
    }
}
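
Each generator method returns a `List<String>` rather than a single `String` because one page type may need more than one sitemap file once it grows past the per-file limit. The library takes care of that splitting; the following is only a minimal sketch of the idea in plain Java. The names `SitemapChunkSketch`, `partition`, and `MAX_URLS_PER_SITEMAP` are hypothetical and are not part of SitemapGen4J.

```java
import java.util.ArrayList;
import java.util.List;

final class SitemapChunkSketch {
    // Protocol limit on the number of URLs in one sitemap file.
    static final int MAX_URLS_PER_SITEMAP = 50_000;

    // Split urls into consecutive chunks of at most maxPerChunk elements.
    static List<List<String>> partition(List<String> urls, int maxPerChunk) {
        if (maxPerChunk <= 0) {
            throw new IllegalArgumentException("maxPerChunk must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < urls.size(); start += maxPerChunk) {
            int end = Math.min(start + maxPerChunk, urls.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(urls.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 120_000; i++) {
            urls.add("https://www.example.com/posts/post-" + i);
        }
        List<List<String>> chunks = partition(urls, MAX_URLS_PER_SITEMAP);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println("posts-" + i + ".xml holds " + chunks.get(i).size() + " URLs");
        }
    }
}
```

Each chunk would then be rendered as its own sitemap document, which is the shape the rest of the article works with.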

XML Sitemap Index

SitemapGen4J was built to write sitemaps to files on disk, but ours is small enough that I just want to keep it in memory. Unfortunately, exposing the internal object model or additional rendering options looks like an afterthought: there is a way to render the individual sitemaps as strings, but not the index. The proper fix would be to contribute an implementation or write a fully custom sitemap generator. For now, we build our own internal mapping from sitemap name to sitemap XML. Because each sitemap file is capped in size and URL count, one page type can produce several files, which is why an index is needed.

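import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.concurrent.TimeUnit;
// Assumption: the JDK Supplier is used here. Guava's own Supplier extends it as of Guava 21,
// so the memoized suppliers below can be assigned to this type.
import java.util.function.Supplier;

import org.jooq.lambda.Seq;

import com.google.common.base.Suppliers;
import com.google.common.collect.Maps;

public class InMemorySitemap {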
    private final Supplier<Map<String, String>> indexSupplier;
    private InMemorySitemap(Supplier<Map<String, String>> indexSupplier) {
        this.indexSupplier = indexSupplier;
    }

    public String getIndex(String sitemapName) {
        return indexSupplier.get().get(sitemapName);
    }

    public List<String> getIndexNames() {
        return Seq.seq(indexSupplier.get().keySet())
                  .sorted()
                  .toList();
    }

    // Cache the sitemap for the lifetime of the JVM
    public static InMemorySitemap fromSupplier(Supplier<Map<String, List<String>>> supplier) {
        Supplier<Map<String, String>> sup = mapSupplier(supplier);
        Supplier<Map<String, String>> memoized = Suppliers.memoize(sup::get);
        return new InMemorySitemap(memoized);
    }

    // Cache the sitemap but refresh after the given duration.
    public static InMemorySitemap fromSupplierWithExpiration(
            Supplier<Map<String, List<String>>> supplier,
            long duration,
            TimeUnit unit) {
        Supplier<Map<String, String>> sup = mapSupplier(supplier);
        Supplier<Map<String, String>> memoized = Suppliers.memoizeWithExpiration(sup::get, duration, unit);
        return new InMemorySitemap(memoized);
    }

    private static Supplier<Map<String, String>> mapSupplier(Supplier<Map<String, List<String>>> supplier) {
        return () -> {
            Map<String, List<String>> originalMap = supplier.get();
            Map<String, String> newIndex = Maps.newHashMap();
            for (Entry<String, List<String>> entry : originalMap.entrySet()) {
                for (int i = 0; i < entry.getValue().size(); i++) {
                    newIndex.put(entry.getKey() + "-" + i + ".xml", entry.getValue().get(i));
                }
            }
            return newIndex;
        };
    }
}
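
Since the library does not render the index in memory, one option is to render it by hand from the names this class already tracks. A minimal sketch, assuming the names look like the keys built in `mapSupplier` (for example `posts-0.xml`) and that the files are served under `/sitemaps/`. The class `SitemapIndexSketch` and the method `renderIndex` are hypothetical; they are not part of SitemapGen4J or of the code above.

```java
import java.util.List;

final class SitemapIndexSketch {
    // Render a <sitemapindex> document that points at each named sitemap file.
    // Assumes host and names contain no characters that need XML escaping.
    static String renderIndex(String host, List<String> sitemapNames) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String name : sitemapNames) {
            xml.append("  <sitemap><loc>")
               .append(host).append("/sitemaps/").append(name)
               .append("</loc></sitemap>\n");
        }
        xml.append("</sitemapindex>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.print(renderIndex(
            "https://www.example.com",
            List.of("guides-0.xml", "posts-0.xml", "tags-0.xml")));
    }
}
```

The result could be cached the same way as the individual sitemaps and served from one extra route.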

XML Sitemap Routes

With an internal representation of the sitemap in place, we now need to expose it through our Undertow web server. A nice feature of RoutingHandler is that two RoutingHandlers can be combined with the addAll method, so the sitemap routes can live in their own class.

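import io.undertow.server.HttpServerExchange;
import io.undertow.server.RoutingHandler;

// Exchange and timed(...) are project-specific helpers not shown here.
public class SitemapRoutes {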
    private final InMemorySitemap sitemap;
    private SitemapRoutes(InMemorySitemap sitemap) {
        this.sitemap = sitemap;
    }

    public void getSitemap(HttpServerExchange exchange) {
        String sitemapName = Exchange.pathParams().pathParam(exchange, "sitemap").orElse(null);
        String content = sitemap.getIndex(sitemapName);
        if (null == content) {
            exchange.setStatusCode(404);
            Exchange.body().sendText(exchange, String.format("Sitemap %s doesn't exist", sitemapName));
            return;
        }
        Exchange.body().sendXml(exchange, content);
    }

    /*
     * Routing Handlers can be reused and combined with each other
     * using the RoutingHandler.addAll() method.
     */
    public static RoutingHandler router(InMemorySitemap sitemap) {
        SitemapRoutes routes = new SitemapRoutes(sitemap);
        RoutingHandler router = new RoutingHandler()
            .get("/sitemaps/{sitemap}", timed("getSitemap", routes::getSitemap))
        ;
        return router;
    }
}

Exposing the Sitemap

Ideally, you would expose a single sitemap index file that references all of the others. Since we had to hack around the missing index, another option is to list every sitemap file in robots.txt.

public static void robots(HttpServerExchange exchange) {
    String host = Exchange.urls().host(exchange).toString();
    List<String> sitemaps = StubbornJavaSitemapGenerator.getSitemap().getIndexNames();
    Response response = Response.fromExchange(exchange)
                                .with("sitemaps", sitemaps)
                                .with("host", host);
    Exchange.body().sendText(exchange, Templating.instance().renderTemplate("templates/src/pages/robots.txt", response));
}
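
The robots.txt template itself is not shown, so the following is only a sketch of the output it would need to produce, written in plain Java without a template engine. The class `RobotsTxtSketch`, the allow-everything `User-agent` rule, and the `/sitemaps/` path prefix are assumptions made for illustration; the `Sitemap:` directive is the standard way to advertise a sitemap URL in robots.txt.

```java
import java.util.List;

final class RobotsTxtSketch {
    // Build a robots.txt body that allows all crawlers and lists every sitemap file.
    static String render(String host, List<String> sitemapNames) {
        StringBuilder sb = new StringBuilder();
        sb.append("User-agent: *\n");
        sb.append("Disallow:\n\n"); // an empty Disallow means nothing is blocked
        for (String name : sitemapNames) {
            sb.append("Sitemap: ")
              .append(host).append("/sitemaps/").append(name)
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(
            "https://www.example.com",
            List.of("guides-0.xml", "posts-0.xml")));
    }
}
```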

Published at DZone with permission of Bill O'Neil.
