The IDC projects that 4.2M new Salesforce-related jobs will be created between 2019 and 2024. A portion of these roles are Salesforce developer positions. But what is the job of a Salesforce developer, and how do professionals transition into these roles? These questions, and the workforce evolutions behind them, are close to my personal experience. I had a conversation with Alba Rivas to discuss these topics, as they're part of her journey as well. In this post, we'll discuss her history with the platform, and she'll outline what a Salesforce developer job looks like today. Alba will also give you tips and tricks to successfully get started yourself.

Alba began her path to Salesforce by studying telecommunications engineering, and she has been developing software since she graduated. Early in her career, she worked on applications primarily built with Python and Java. But in 2014, Alba was contacted by a company that built apps on the Salesforce platform to sell through Salesforce's app marketplace, AppExchange. The company was looking for back-end developers. The native back-end language used in Salesforce is Apex, an object-oriented, domain-specific language similar to Java or C++. Because there weren't many Salesforce developers in the market yet, the company was looking for developers with object-oriented programming skills. Alba told me that she was initially reluctant to take the job; she was skeptical of a platform that had created its own programming languages. Ultimately, she decided to accept the offer and give it a chance.

The Upsides of a Software as a Service Platform

When Alba and I discussed the early days of her Salesforce developer work, she said she felt the benefits of developing on a SaaS (Software as a Service) platform from the very beginning. When developing apps with the Salesforce platform, Alba pointed out, you don't have to take care of hosting your app. Developers don't need to create and attach databases, start and stop web servers, or balance incoming requests. The Salesforce platform does all that for you. Becoming a Salesforce developer gave Alba the opportunity to invest her time in creating apps, which is what she told me she loves doing.

Salesforce executes in a multi-tenant environment, meaning that the underlying resources are shared with other customers. The platform has well-defined limits that prevent one environment's code from hoarding the resources. These limits, Alba found, served as positive motivation in her development work. Developing in an environment with clear limits forces developers like Alba to write optimal code. She told me she has always been an advocate of clean code, design patterns, and performance. So, learning how to code with Apex was fun for Alba, like doing a sudoku in which everything fits together nicely.

Building Faster With Configuration and Builders

Later in our chat, Alba and I talked about how there are two main ways to build apps on the Salesforce platform: configuration and coding. Thanks to this powerful combination, building apps on the Salesforce platform is extremely fast. When configuring an app, the objects, the tables where they're stored, and the basic user interfaces and APIs for the app are all configured through a web application. No code is needed. This initial configuration produces an app that is basic but could be used immediately in production. To implement custom logic and UI for your app, you can configure it further.
The tools for these configurations are referred to as "builders." Lightning App Builder lets Alba and other Salesforce developers assemble pages with web components. These can be out-of-the-box components, components downloaded from the AppExchange, or custom components built by the developers themselves. For custom business logic and business processes, Alba likes to use Flow Builder. Flows are units of logic built with a block-based programming interface. Flows can be configured to execute when a record is modified or through other entry points. For instance, Alba talked about how they can be invoked from Apex and how they are also exposed for execution from a REST API call. Flows can also incorporate UI to create complex guided visual experiences with no code. These are called "Screen Flows." These builders vastly speed up the app creation process.

At this point in our conversation, Alba acknowledged that traditional coders might be wary of what she was saying. But she encouraged me (and you) to think of builders as a booster to speed up development. She pointed out that you can quickly build basic, less interesting functionality with no code, freeing up developers to concentrate on solving more complex and challenging problems with code. I asked Alba how much time a Salesforce developer invests in coding alone. She said it depends on many factors, primarily the structure of the company you work for. In Alba's case, she has always dedicated 95% of her time to code, as there haven't been other people in her company whose role it was to do the configuration. Having said that, she acknowledged this is not always the case.

Transferable Skills for Programmatic Development

When Alba started with Salesforce development, the supported coding languages were Apex, to implement business processes, and Visualforce (a server-side rendering framework with its own custom markup), to create pages. Now, Alba and her team use Lightning Web Components (LWC), a framework based on the Web Components standards, to create custom user interfaces. With LWC, Alba creates components using modern, standard JavaScript. This means the skills she learned previously are transferable to other JavaScript-based technologies. She pointed out that on top of this transferability, LWC is open source, and developers can use it outside of the Salesforce platform. Components can be used in Lightning App Builder and published on the AppExchange for other customers to use. We also talked about how the door has been opened to other back-end programming languages (beyond Apex) thanks to Salesforce Functions. With this technology, developers can invoke, from different Salesforce entry points, functions written in industry-standard languages like Node.js or Java. These functions run elastically, using exactly the resources they need.

Improved Developer Experience

I asked Alba about the most noticeable Salesforce developer improvements she has seen over the years. According to her, it boils down to the developer experience. When she started coding with Salesforce, it was normal to code in a limited, web-based IDE called the Developer Console. (Developers could also use the Force.com ANT-based migration tool to deploy their changes to Salesforce.) While the migration tool brought Alba the familiarity of ANT, it had limited capabilities for tracking changes and performing conflict resolution.
Alba told me the launch of the Salesforce CLI in 2016 fixed a lot of the earlier limitations. In addition to change tracking and improved conflict resolution, the Salesforce CLI introduced features to manage Salesforce environments, run tests, load sample data, and much more. By using the Salesforce CLI, Alba said, she became much more productive. It also helped her team transform their application lifecycle management, as the Salesforce CLI allows developers to implement modern CI/CD flows much more easily. Salesforce has also invested in the Salesforce extension pack for Visual Studio Code. Alba listed some of the extension's benefits, which include autocompletion, direct access to the help feature, tools for debugging and testing, and access to many of the Salesforce CLI features.

Tons of Free Learning Resources

Since I know many developers are anxious about their time to productivity when switching roles, I asked Alba about this. She said that in her first months as a Salesforce developer, she invested some time into learning, but that honestly, the learning curve was not too steep. There was great documentation, developer guides, and other training materials that she could find online. This onboarding experience has vastly improved thanks to Trailhead, Salesforce's fun, free learning platform in which developers can learn by following interactive tutorials, all while getting up to speed lightning fast. It contains more than 1,000 learning modules and hands-on projects.

The Invaluable Salesforce Community

Thinking about Salesforce's size, I asked Alba whether she felt it was a benefit or a drawback. Alba described it as an appealing, differentiating factor: the community around Salesforce is huge. So, if you have a question, you can ask it in a public community, such as the online Trailblazer Community, the Developer Forums, or Salesforce Stack Exchange, and lots of folks will be happy to quickly help you. Alba also pointed out that developers can join a Trailblazer Community group near where they live and connect with people at online and in-person events. She said that Salesforce community members normally have a spirit of cohesion; they look out for each other. Being involved in the community was key in her own learning path, and it also helped Alba join the developer relations team at Salesforce, where she works as a developer advocate today.

Next Steps

If you want to get started with Salesforce development, Alba recommends starting with the Developer Beginner trail. In addition to Trailhead, Salesforce publishes lots of development resources to read, watch, and listen to. Alba also suggested this developer's blog, the Salesforce podcast, and their YouTube channel (subscribe if you haven't yet!). Alba specifically encouraged new Salesforce developers to take a look at the Beginner's Mind and Quick Start video series. Want to take a look at some sample code? Salesforce has sample apps with patterns and best practices to help you ramp up your development skills. I hope you enjoyed reading this post as much as I enjoyed connecting with Alba. Best of luck in following her path toward becoming a Salesforce developer. Have a really great day!
I first dabbled in Kotlin soon after its 1.0 release in 2016. For lack of paying gigs in which to use it, I started my own open-source project and released the first alpha over the Christmas holidays. I'm now firmly in love with the language. But I'm not here to promote my pet project. I want to talk about the emotional value of the tools we use, the joys and annoyances beyond mere utility.

Some will tell you that there's nothing you can do in Kotlin that you can't do just as well with Java. There's no compelling reason to switch. Kotlin is just a different tool doing the same thing. Software is a product of the mind, not of your keyboard. You're not baking an artisanal loaf of bread, where ingredients and the right oven matter as much as your craftsmanship. Tools only support your creativity. They don't create anything. I agree that we mustn't get hung up on our tools, but they are important. Both the hardware and the software we use to create our code matter a great deal. I'll argue that we pick these tools not only for their usefulness but also for the joy of working with them. And don't forget snob appeal. Kotlin can be evaluated on all three of these motivations. Let's take a detour outside the world of software to illustrate.

I'm an amateur photographer who spends way too much on gear, knowing full well that it doesn't improve my work. I'm the butt of Ken Rockwell's amusing rant "Your Camera Doesn't Matter": only amateurs believe that it does. Get a decent starter kit and then go out to interesting places and take lots of pictures, is his advice. Better yet, take classes to get professional feedback on your work. Spend your budget on that instead of on fancy gear. In two words: Leica shmeica. He's right. Cameras and gear facilitate your creativity at best, but a backpack full of them can weigh you down. A photographer creates with their eyes and imagination. You only need the camera to register the result of that process. Your iPhone Pro has superior resolution and sharpness over Henri Cartier-Bresson's single-lens compact camera (a Leica, by the way) that he used in the 1940s for The Decisive Moment. But your pics won't come anywhere near his greatness. I didn't take Ken's advice to heart, of course. My outlay on photo gear still exceeds what I spent on training over the years by a factor of five. Who cares, it's my hobby budget. I don't need to need something in order to desire it.

I'm drawing the parallel with photography because the high-tech component makes it an attractive analogy with programming, but it's hardly the same thing. Photography is the art of recognizing a great subject and composition when you see it, not mastering your kit. Anyone can click a button, and all good cameras are interchangeable. Do you care which one Steve McCurry used for his famous photo of the Afghan girl? I don't. On the other hand, the programmer's relationship to their tools, especially the language, is a much more intimate one, like the musician's with their instrument. Both take thousands of hours of hard practice. An accomplished violinist can't just pick up a cello. Similarly, you don't migrate from Haskell to C# like you switch from Nikon to Canon. The latter is closer to swapping a Windows laptop for a Mac: far less of a big deal. If, like musicians, we interact with our tools eight hours a day, they must be great, not just good. Quality hardware for the programmer should be a no-brainer. It infuriates me how many companies still don't get this.
Everybody should be given the best setup that money can buy when it costs less than a senior dev's monthly rate. There's a joy that comes from working with a superior tool. Working with junk is as annoying as working with premium tools is delightful. The mechanical DAS keyboard I'm writing this on isn't faster, but it's still the best 150 euros I ever spent on office equipment. Thirdly, there is snob appeal and pride of ownership. If utility and quality were all that mattered, nobody would buy luxury brands. Fashion would not exist. A limousine doesn't get you any faster from A to B, except perhaps on the social ladder. Spending 7,000 on a Leica compact as an amateur is extravagant, but you can flaunt its prestigious red dot and imagine yourself a pro. If I were filthy rich, I'd get one. I would also buy a Steinway grand and love it more than it's appropriate to love an inanimate object.

Let's look at the parallels with the most important tool the programmer has in their belt: the language. Programming is creating a new virtual machine inside the Matryoshka doll of other virtual machines that make up a running application. As for plain utility, each modern language is Turing complete and can do the job, but nobody can reasonably argue that this makes them all equally useful for every job. To not overcomplicate the argument, I'll stay within the JVM ecosystem. There is no coding job that you could implement in any of the JVM languages (Java, Kotlin, Scala, Groovy, Ceylon, Frege, etc.) that would be impossible to emulate in any of the others, but they differ greatly in their syntax and idioms. That, and their snob appeal. Yes, programmers turn up their noses at competing tools, maybe more secretly than openly, but they do. I spent two years on a Scala project and attended the Scala World conference. Scala's advanced syntactical constructs (higher-kinded types, multiple argument lists) have been known to give rise to much my-language-is-better-than-yours snootiness. Don't get me wrong: it's impressively powerful, but it has a steep learning curve. It may be free to download, but when time is money, it's expensive to master. It's the Leica of programming languages, and for some, that's exactly the appeal: learning something hard and then boasting that it's not hard at all. It's a long-standing Linux tradition.

Kotlin has no such snob appeal. It was conceived to be a more developer-friendly Java, to radically upgrade the old syntax in ways that would never be possible in Java itself, due to the non-negotiable requirement for every new compiler to support source code written in 1999. If mere utility and snob appeal don't apply, then the argument left to favor Kotlin over Java must be the positive experience of working with it. While coding my Kotlin project, I was also studying for the OCP-17 Java exam. This proved a revealing exercise in comparative language analysis. Some features simply delight. Kotlin's built-in null safety is wonderful, a killer feature. Don't tell me you don't need it because you're such a clean coder. That betrays a naive denial of other people's sloppiness. Most stuff you write interacts with their code. Do you plan on educating them too? Other (Java) features simply keep annoying me. Evolution forbids breaking change, because each new baby must be viable and produce offspring for incremental change to occur.
Likewise, the more you work with Kotlin, the more certain architectural decisions in the Java syntax stick out like ugly quirks that nature (i.e., Gosling and Goetz) can't correct without breaking the legacy. Many things in Java feel awkward and ugly for that very reason. Nobody would design a language with fixed-length arrays syntactically distinct from other collection types. Arrays can hold primitive value types (numbers and booleans) directly, which you need to (un)box into objects for lists, sets, and maps. You wouldn't make those arrays mutable and covariant. I give you a bag of apples, you call it a bag of fruit, insert a banana, and give it back to me. Mayhem! The delight of working with a language that doesn't have these design quirks is as strong as the annoyance over a language that does. I make no excuses for the fact that my reaction is more emotional than rational.

To conclude, I don't want to denigrate what the Java designers have achieved over the years. They're smarter than me, and they didn't have a crystal ball. In twenty years, the Kotlin team may well find out that they painted themselves into a corner over some design decision they took in 2023. Who knows. I'll be retired and expect to be coding only for pleasure, not to impress anyone, or even be useful.
Looking to improve your unit and integration tests? I made a short video giving you an overview of 7 libraries that I regularly use when writing any sort of tests in Java, namely:

- AssertJ
- Awaitility
- Mockito
- Wiser
- Memoryfilesystem
- WireMock
- Testcontainers

What's in the Video?

The video gives a short overview of how to use the tools mentioned above and how they work. In order of appearance:

AssertJ

JUnit comes with its own set of assertions (i.e., assertEquals) that work for simple use cases but are quite cumbersome to work with in more realistic scenarios. AssertJ is a small library giving you a great set of fluent assertions that you can use as a direct replacement for the default assertions. Not only do they work on core Java classes, but you can also use them to write assertions against XML or JSON files, as well as database tables!

Java
// basic assertions
assertThat(frodo.getName()).isEqualTo("Frodo");
assertThat(frodo).isNotEqualTo(sauron);

// chaining string-specific assertions
assertThat(frodo.getName()).startsWith("Fro")
    .endsWith("do")
    .isEqualToIgnoringCase("frodo");

(Note: Source Code from AssertJ)

Awaitility

Testing asynchronous workflows is always a pain. As soon as you want to make sure that, for example, a message broker received or sent a specific message, you'll run into race condition problems because your local test code executes faster than any asynchronous code ever would. Awaitility to the rescue: it is a small library that lets you write polling assertions, in a synchronous manner!

Java
@Test
public void updatesCustomerStatus() {
    // Publish an asynchronous message to a broker (e.g. RabbitMQ):
    messageBroker.publishMessage(updateCustomerStatusMessage);
    // Awaitility lets you wait until the asynchronous operation completes:
    await().atMost(5, SECONDS).until(customerStatusIsUpdated());
    ...
}

(Note: Source Code from Awaitility)

Mockito

There comes a time in unit testing when you want to make sure to replace parts of your functionality with mocks. Mockito is a battle-tested library to do just that. You can create mocks, configure them, and write a variety of assertions against those mocks. To top it off, Mockito also integrates nicely with a huge array of third-party libraries, from JUnit to Spring Boot.

Java
// mock creation
List mockedList = mock(List.class);
// or even simpler with Mockito 4.10.0+
// List mockedList = mock();

// using the mock object - it does not throw any "unexpected interaction" exception
mockedList.add("one");
mockedList.clear();

// selective, explicit, highly readable verification
verify(mockedList).add("one");
verify(mockedList).clear();

(Note: Source Code from Mockito)

Wiser

Keeping your code as close to production as possible and not just using mocks for everything is a viable strategy. When you want to send emails, for example, you neither need to completely mock out your email code nor actually send them out via Gmail or Amazon SES. Instead, you can boot up a small, embedded Java SMTP server called Wiser.

Java
Wiser wiser = new Wiser();
wiser.setPort(2500); // Default is 25
wiser.start();

Now you can use Java's SMTP API to send emails to Wiser and also ask Wiser to show you what messages it received.

Java
for (WiserMessage message : wiser.getMessages()) {
    String envelopeSender = message.getEnvelopeSender();
    String envelopeReceiver = message.getEnvelopeReceiver();
    MimeMessage mess = message.getMimeMessage();
    // now do something fun!
}

(Note: Source Code from Wiser on GitHub)

Memoryfilesystem

If you write a system that heavily relies on files, the question has always been: "How do you test that?" File system access is somewhat slow, and also brittle, especially if you have your developers working on different operating systems. Memoryfilesystem to the rescue! It lets you write tests against a file system that lives completely in memory, but can still simulate OS-specific semantics, from Windows to macOS and Linux.

Java
try (FileSystem fileSystem = MemoryFileSystemBuilder.newEmpty().build()) {
    Path p = fileSystem.getPath("p");
    System.out.println(Files.exists(p));
}

(Note: Source Code from Memoryfilesystem on GitHub)

WireMock

How do you handle flaky third-party REST services or APIs in your tests? Easy! Use WireMock. It lets you create full-blown mocks of any third-party API out there, with a very simple DSL. You can not only specify the specific responses your mocked API will return, but even go so far as to inject random delays and other unspecified behavior into your server, or do some chaos-monkey engineering.

Java
// The static DSL will be automatically configured for you
stubFor(get("/static-dsl").willReturn(ok()));

// Instance DSL can be obtained from the runtime info parameter
WireMock wireMock = wmRuntimeInfo.getWireMock();
wireMock.register(get("/instance-dsl").willReturn(ok()));

// Info such as port numbers is also available
int port = wmRuntimeInfo.getHttpPort();

(Note: Source Code from WireMock)

Testcontainers

Using mocks or embedded replacements for databases, mail servers, or message queues is all nice and dandy, but nothing beats using the real thing. In comes Testcontainers: a small library that allows you to boot up and shut down any Docker container (and thus software) that you need for your tests. This means your test environment can be as close as possible to your production environment.

Java
@Testcontainers
class MixedLifecycleTests {

    // will be shared between test methods
    @Container
    private static final MySQLContainer MY_SQL_CONTAINER = new MySQLContainer();

    // will be started before and stopped after each test method
    @Container
    private PostgreSQLContainer postgresqlContainer = new PostgreSQLContainer()
            .withDatabaseName("foo")
            .withUsername("foo")
            .withPassword("secret");
}

(Note: Source Code from Testcontainers)

Enjoy the video!
Jakarta EE is a widely adopted and probably the most popular framework for enterprise-grade Java software development. With the industry-wide adoption of microservices-based architectures, its popularity is skyrocketing, and in recent years it has become the preferred framework for developing professional enterprise software applications and services in Java. Jakarta EE applications have traditionally been deployed in run-times or application servers like Wildfly, GlassFish, Payara, JBoss EAP, WebLogic, WebSphere, and others, which have sometimes been criticized for their perceived heaviness and high costs. With the advent and ubiquity of the cloud, these constraints are becoming less restrictive, especially thanks to serverless technology, which provides increased flexibility at standard, low costs. This article demonstrates how to lighten Jakarta EE run-times, servers, and applications by deploying them on AWS serverless infrastructure.

Overview of AWS Fargate

As documented in the User Guide, AWS Fargate is a serverless paradigm used in conjunction with AWS ECS (Elastic Container Service) to run containerized applications. In a nutshell, this concept allows us to:

- Package applications in containers
- Specify the host operating system, the CPU architecture and capacity, the memory requirements, the network, and the security policies
- Execute the whole resulting stack in the cloud

Running containers with AWS ECS requires handling a so-called launch type (i.e., an abstraction layer) defining the way to execute standalone tasks and services. There are several launch types that might be defined for AWS ECS-based containers, and Fargate is one of them. It represents the serverless way to host AWS ECS workloads and consists of components like clusters, tasks, and services, as explained in the AWS Fargate User Guide. The figure below, extracted from the AWS Fargate documentation, emphasizes its general architecture.

As the figure shows, in order to deploy serverless applications running as ECS containers, we need a fairly complex infrastructure consisting of:

- A VPC (Virtual Private Cloud)
- An ECR (Elastic Container Registry)
- An ECS cluster
- A Fargate launch type per ECS cluster node
- One or more tasks per Fargate launch type
- An ENI (Elastic Network Interface) per task

Now, if we want to deploy Jakarta EE applications in the AWS serverless cloud as ECS-based containers, we need to:

- Package the application as a WAR.
- Create a Docker image containing the Jakarta EE-compliant run-time or application server with the WAR deployed.
- Register this Docker image with the ECR service.
- Define a task to run the Docker container built from the previously defined image.

The AWS console allows us to perform all these operations in a user-friendly way; nevertheless, the process is quite time-consuming and laborious. Using AWS CloudFormation, or even the AWS CLI, we could automate it, of course, but the good news is that we have a much better alternative, as explained below.

Overview of AWS Copilot

AWS Copilot is a CLI (Command Line Interface) tool that provides application-first, high-level commands to simplify modeling, creating, releasing, and managing production-ready containerized applications on Amazon ECS from a local development environment.
The figure below shows its software architecture. Using AWS Copilot, developers can easily manage the required AWS infrastructure from their local machine by executing simple commands, which result in the creation of deployment pipelines fulfilling all the required resources enumerated above. In addition, AWS Copilot can also create extra resources like subnets, security groups, load balancers, and others. Here is how.

Deploying Payara 6 Applications on AWS Fargate

Installing AWS Copilot is as easy as downloading and unzipping an archive, as the documentation guides you through. Once installed, run the command below to check whether everything works:

Shell
~$ copilot --version
copilot version: v1.24.0

The first thing to do in order to deploy a Jakarta EE application is to develop and package it. A very simple way to do that for test purposes is by using the Maven archetype jakartaee10-basic-archetype, as shown below:

Shell
mvn -B archetype:generate \
  -DarchetypeGroupId=fr.simplex-software.archetypes \
  -DarchetypeArtifactId=jakartaee10-basic-archetype \
  -DarchetypeVersion=1.0-SNAPSHOT \
  -DgroupId=com.exemple \
  -DartifactId=test

This Maven archetype generates a simple but complete Jakarta EE 10 project with all the required dependencies and artifacts to be deployed on Payara 6. It also generates all the required components to perform integration tests of the exposed JAX-RS API (for more information on this archetype, please see here). Among other generated artifacts, the following Dockerfile will be of real help in our AWS Fargate cluster setup:

Dockerfile
FROM payara/server-full:6.2022.1
COPY ./target/test.war $DEPLOY_DIR

Now that we have our test Jakarta EE application, as well as the Dockerfile required to run Payara Server 6 with this application deployed, let's use AWS Copilot to start the process of serverless infrastructure creation. Simply run the following command:

Shell
$ copilot init
Note: It's best to run this command in the root of your Git repository.
Welcome to the Copilot CLI! We're going to walk you through some questions
to help you get set up with a containerized application on AWS. An application
is a collection of containerized services that operate together.

Application name: jakarta-ee-10-app
Workload type: Load Balanced Web Service
Service name: lb-ws
Dockerfile: test/Dockerfile
parse EXPOSE: no EXPOSE statements in Dockerfile test/Dockerfile
Port: 8080
Ok great, we'll set up a Load Balanced Web Service named lb-ws in application jakarta-ee-10-app listening on port 8080.

* Proposing infrastructure changes for stack jakarta-ee-10-app-infrastructure-roles
- Creating the infrastructure for stack jakarta-ee-10-app-infrastructure-roles [create complete] [76.2s]
  - A StackSet admin role assumed by CloudFormation to manage regional stacks [create complete] [34.0s]
  - An IAM role assumed by the admin role to create ECR repositories, KMS keys, and S3 buckets [create complete] [33.3s]
* The directory copilot will hold service manifests for application jakarta-ee-10-app.
* Wrote the manifest for service lb-ws at copilot/lb-ws/manifest.yml
Your manifest contains configurations like your container size and port (:8080).
- Update regional resources with stack set "jakarta-ee-10-app-infrastructure" [succeeded] [0.0s]

All right, you're all set for local development.
Deploy: No

No problem, you can deploy your service later:
- Run `copilot env init` to create your environment.
- Run `copilot deploy` to deploy your service.
- Be a part of the Copilot community !
  Ask or answer a question, submit a feature request...
  Visit https://aws.github.io/copilot-cli/community/get-involved/ to see how!

The process of serverless infrastructure creation conducted by AWS Copilot is based on a dialog during which the utility asks questions and accepts your answers. The first question concerns the name of the serverless application to be deployed. We choose to name it jakarta-ee-10-app. In the next step, AWS Copilot asks for the workload type of the new service to be deployed and proposes a list of such workload types, from which we need to select Load Balanced Web Service. The name of this new service is lb-ws. Next, AWS Copilot looks for Dockerfiles in the local workspace and displays a list from which you have to either choose one, create a new one, or use an existing image, in which case you need to provide its location (i.e., a DockerHub URL). We choose the Dockerfile we created previously, when we ran the Maven archetype. It only remains for us to define the TCP port number that the newly created service will use for HTTP communication. By default, AWS Copilot proposes TCP port 80, but we override it with 8080.

Now all the required information is collected, and the process of infrastructure generation may start. This process consists of creating two CloudFormation stacks, as follows:

- A first CloudFormation stack containing the definition of the required IAM security roles
- A second CloudFormation stack containing the definition of a template whose execution creates a new ECS cluster

In order to check the result of the AWS Copilot initialization phase, you can connect to your AWS console and go to the CloudFormation service, where you will see something similar to this:

As you can see, the two mentioned CloudFormation stacks appear on the screenshot above, and you can click on them to inspect the details. We have just finished the initialization phase of our serverless infrastructure creation driven by AWS Copilot. Now, let's create our development environment:

Shell
$ copilot env init
Environment name: dev
Credential source: [profile default]
Default environment configuration? Yes, use default.
* Manifest file for environment dev already exists at copilot/environments/dev/manifest.yml, skipping writing it.
- Update regional resources with stack set "jakarta-ee-10-app-infrastructure" [succeeded] [0.0s]
- Update regional resources with stack set "jakarta-ee-10-app-infrastructure" [succeeded] [128.3s]
  - Update resources in region "eu-west-3" [create complete] [128.2s]
    - ECR container image repository for "lb-ws" [create complete] [2.2s]
    - KMS key to encrypt pipeline artifacts between stages [create complete] [121.6s]
    - S3 Bucket to store local artifacts [create in progress] [99.9s]
* Proposing infrastructure changes for the jakarta-ee-10-app-dev environment.
- Creating the infrastructure for the jakarta-ee-10-app-dev environment. [create complete] [65.8s]
  - An IAM Role for AWS CloudFormation to manage resources [create complete] [25.8s]
  - An IAM Role to describe resources in your environment [create complete] [27.0s]
* Provisioned bootstrap resources for environment dev in region eu-west-3 under application jakarta-ee-10-app.

Recommended follow-up actions:
- Update your manifest copilot/environments/dev/manifest.yml to change the defaults.
- Run `copilot env deploy --name dev` to deploy your environment.
AWS Copilot starts by asking what name we want to give to our development environment and continues by proposing to use either the current user's default credentials or temporary credentials created for the purpose. We choose the first alternative. Then, AWS Copilot creates a new stack set, named jakarta-ee-10-app-infrastructure, containing the following infrastructure elements:

- An ECR repository to register the Docker image resulting from the build of the Dockerfile selected during the previous step
- A new KMS (Key Management Service) key, used to encrypt the artifacts belonging to our development environment
- An S3 (Simple Storage Service) bucket, used to store the artifacts belonging to our development environment
- A new dedicated IAM role for CloudFormation to manage resources
- A new dedicated IAM role to describe the resources

This operation may take a significant time, depending on your bandwidth, and once finished, the development environment, named jakarta-ee-10-app-dev, is created. You can see its details in the AWS console, as shown below.

Notice that the environment creation can also be performed as an additional operation of the first initialization step. As a matter of fact, the copilot init command, as shown above, ends by asking whether you want to create a test environment. Answering yes to this question allows you to proceed immediately with test environment creation and initialization. For pedagogical reasons, here we preferred to separate these two actions. The next phase is the deployment of our development environment:

Shell
$ copilot env deploy
Only found one environment, defaulting to: dev
* Proposing infrastructure changes for the jakarta-ee-10-app-dev environment.
- Creating the infrastructure for the jakarta-ee-10-app-dev environment. [update complete] [74.2s]
  - An ECS cluster to group your services [create complete] [2.3s]
  - A security group to allow your containers to talk to each other [create complete] [0.0s]
  - An Internet Gateway to connect to the public internet [create complete] [15.5s]
  - Private subnet 1 for resources with no internet access [create complete] [5.4s]
  - Private subnet 2 for resources with no internet access [create complete] [2.6s]
  - A custom route table that directs network traffic for the public subnets [create complete] [11.5s]
  - Public subnet 1 for resources that can access the internet [create complete] [2.6s]
  - Public subnet 2 for resources that can access the internet [create complete] [2.6s]
  - A private DNS namespace for discovering services within the environment [create complete] [44.7s]
  - A Virtual Private Cloud to control networking of your AWS resources [create complete] [12.7s]

The CloudFormation template created during the previous step is now executed, and it results in the creation and initialization of the following infrastructure elements:

- A new ECS cluster, grouping all the required stateless artifacts
- An IAM security group to allow communication between containers
- An Internet Gateway, so that the new service is publicly accessible
- Two private and two public subnets
- A new routing table with the rules required to allow traffic between public and private subnets
- A private Route 53 (DNS) namespace
- A new VPC (Virtual Private Cloud) controlling all the AWS resources created during this step

Take some time to navigate through your AWS console pages and inspect the infrastructure that AWS Copilot has created for you. As you can see, it's an extensive one, and it would have been laborious and time-consuming to create it manually. The sharp-eyed reader has certainly noticed that creating and deploying an environment, like our development one, doesn't deploy any service to it. In order to do that, we need to proceed with our last step: the service deployment. Simply run the command below:

Shell
$ copilot deploy
Only found one workload, defaulting to: lb-ws
Only found one environment, defaulting to: dev
Sending build context to Docker daemon 13.67MB
Step 1/2 : FROM payara/server-full:6.2022.1
 ---> ada23f507bd2
Step 2/2 : COPY ./target/test.war $DEPLOY_DIR
 ---> Using cache
 ---> f1b0fe950252
Successfully built f1b0fe950252
Successfully tagged 495913029085.dkr.ecr.eu-west-3.amazonaws.com/jakarta-ee-10-app/lb-ws:latest
WARNING! Your password will be stored unencrypted in /home/nicolas/.docker/config.json.
Configure a credential helper to remove this warning.
See https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
Using default tag: latest
The push refers to repository [495913029085.dkr.ecr.eu-west-3.amazonaws.com/jakarta-ee-10-app/lb-ws]
d163b73cdee1: Pushed
a9c744ad76a8: Pushed
4b2bb262595b: Pushed
b1ed0705067c: Pushed
b9e6d039a9a4: Pushed
99413601f258: Pushed
d864802c5436: Pushed
c3f11d77a5de: Pushed
latest: digest: sha256:cf8a116279780e963e134d991ee252c5399df041e2ef7fc51b5d876bc5c3dc51 size: 2004
* Proposing infrastructure changes for stack jakarta-ee-10-app-dev-lb-ws
- Creating the infrastructure for stack jakarta-ee-10-app-dev-lb-ws [create complete] [327.9s]
  - Service discovery for your services to communicate within the VPC [create complete] [2.5s]
  - Update your environment's shared resources [update complete] [144.9s]
    - A security group for your load balancer allowing HTTP traffic [create complete] [3.8s]
    - An Application Load Balancer to distribute public traffic to your services [create complete] [124.5s]
    - A load balancer listener to route HTTP traffic [create complete] [1.3s]
  - An IAM role to update your environment stack [create complete] [25.3s]
  - An IAM Role for the Fargate agent to make AWS API calls on your behalf [create complete] [25.3s]
  - A HTTP listener rule for forwarding HTTP traffic [create complete] [3.8s]
  - A custom resource assigning priority for HTTP listener rules [create complete] [3.5s]
  - A CloudWatch log group to hold your service logs [create complete] [0.0s]
  - An IAM Role to describe load balancer rules for assigning a priority [create complete] [25.3s]
  - An ECS service to run and maintain your tasks in the environment cluster [create complete] [119.7s]
    Deployments
    Revision Rollout Desired Running Failed Pending
    PRIMARY  1       [completed] 1   1       0      0
  - A target group to connect the load balancer to your service [create complete] [0.0s]
  - An ECS task definition to group your containers and run them on ECS [create complete] [0.0s]
  - An IAM role to control permissions for the containers in your tasks [create complete] [25.3s]
* Deployed service lb-ws.

Recommended follow-up action:
- You can access your service at http://jakar-Publi-H9B68022ZC03-1756944902.eu-west-3.elb.amazonaws.com over the internet.

The listing above shows the creation of all the resources required to produce our serverless infrastructure containing Payara Server 6, together with the test Jakarta EE 10 application deployed into it. This infrastructure consists of a CloudFormation stack named jakarta-ee-10-app-dev-lb-ws containing, among others, security groups, listeners, IAM roles, dedicated CloudWatch log groups, and, most importantly, an ECS task definition with a Fargate launch type that runs the Payara Server 6 platform. This makes our test application, and its exposed JAX-RS API, available at the associated public URL. You can test it by simply running the curl utility:

Shell
curl http://jakar-Publi-H9B68022ZC03-1756944902.eu-west-3.elb.amazonaws.com/test/api/myresource
Got it !

Here we have appended our JAX-RS API's relative path to the public URL displayed by AWS Copilot. You may perform the same test using your preferred browser. Also, if you prefer to run the provided integration test, you may slightly adapt it by amending the service URL. Don't hesitate to go to your AWS console to inspect in detail the serverless infrastructure created by AWS Copilot.
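For reference, the resource answering this request presumably looks something like the following minimal JAX-RS sketch. This is a hypothetical reconstruction (the class and method names are assumptions, since the archetype's generated code isn't shown in this article), but any Jakarta EE 10 resource of this shape would produce the same response:

Java
package com.exemple;

import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.core.MediaType;

// Served at /test/api/myresource: "test" is the WAR context root, and the
// "api" prefix would come from an Application subclass annotated with
// @ApplicationPath("api").
@Path("myresource")
public class MyResource {

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    public String getIt() {
        return "Got it !";
    }
}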
And once finished, don't forget to clean up your workspace by running the command below, which deletes the application's CloudFormation stacks (including jakarta-ee-10-app-dev-lb-ws) with all their associated resources:

Shell
$ copilot app delete
Sure? Yes
* Delete stack jakarta-ee-10-app-dev-lb-ws
- Update regional resources with stack set "jakarta-ee-10-app-infrastructure" [succeeded] [12.4s]
  - Update resources in region "eu-west-3" [update complete] [9.8s]
* Deleted service lb-ws from application jakarta-ee-10-app.
* Retained IAM roles for the "dev" environment
* Delete environment stack jakarta-ee-10-app-dev
* Deleted environment "dev" from application "jakarta-ee-10-app".
* Cleaned up deployment resources.
* Deleted regional resources for application "jakarta-ee-10-app"
* Delete application roles stack jakarta-ee-10-app-infrastructure-roles
* Deleted application configuration.
* Deleted local .workspace file.

Enjoy!
At some point, you begin to lead. This is very different from managing. The difference can be summed up with the phrase "Leaders make their own problems." I'll explain that in a bit, but first, let me tell you a story.

When I First Realized I Was Missing Something

When I first became a director, I had a conversation with another director named Jim.

Jim: "So, you've been in this role for a month now. Have you figured out your vision for your organization?"
Me: [What is he talking about?]

I realized I was used to taking direction from others, and I had become a sort of "execution-focused" manager. I didn't have any idea what my organization needed. I had no idea what the future should look like. I was managing my organization, not leading it.

Interrogate Yourself

I brought this up with an executive coach I was working with at the time. He encouraged me to set time aside every week to really think deeply about my area of the product. At first, it was hard to preserve the time I set aside. In my busy schedule, it seemed luxurious to save a couple of hours just for thinking, but it quickly became some of the most valuable time I spent each week. I would spend time each week thinking about my organization, the product, and how things were going. I researched other products that competed with mine. I spent time using the product. I thought critically about our technical and product failings. I ended up developing a better sense of what possible futures I could create. I started to see possibilities that had been invisible to me. I started to learn how to lead.

Mental Models: Try and Catch Them All!

At around this time, my manager challenged me to start using more mental models in how I assessed my organization. He suggested looking at the situation from the point of view of:

- A systems thinker (this was my default)
- An architect
- A product manager
- A designer

Most leaders use their own experience as their main mental model. Being able to add new mental models to your toolkit makes you more flexible. Each mental model can alter what you think the future should look like, and what the present should look like. It can also inform the actions you should take to make things happen. As I started developing a habit of looking at things through other mental models, I began to see blind spots in my previous ways of looking at the world.

Interrogate Yourself With What, Exactly?

Each week, I sat down and looked at an evolving set of questions. I would go through the list and write answers to the questions on it. Here is my list of questions:

Review High-Level Objectives
- What are the results that are needed? What needs to be true in a year? X needs to be Y by Z. What would need to change for that to happen? What are the implications of that need?
- How are the teams performing? On time, hitting the mark, right characteristics? How are they performing as a team?
- Where are things on the path to fail? What's the worst thing that could happen?
- Is there anything I think my boss is making a mistake on?
- Can all of my reports take over my job in 1-2 years?
- What seems like it's not working right now?
- What's going well that could be turned into a repeatable process?
- What should I worry about?
- What do my managers need? What does my org need? What do I need?

Review Progress Against Objectives
- How on track is everything?
- How long should the rope be for all of my objectives? Agree to them with other people. For example: when should we expect reliability to improve?
- Review team health.
- Review progress against goals.
- How are team members doing against their goals?

Identify Next Actions
- What actions should be taken?

I found writing down the answers to these questions to be tremendously helpful. If you're an external processor, you might find it more useful to talk with others about these questions. For me, I did a bit of both. I found a few other people that I would talk about these things with. But I found the private thinking time valuable; it gave me more material to test with others in conversation.

Leaders Make Their Own Problems

A leader is a person who asserts a future for a group of people. If you want to lead, you have to learn to make your own problems. And hopefully, you'll find some of the processes I've used to develop that future useful. Do you have your own methods? Please share them with me!

Thank You

Jim Ruppert helped me see that I didn't have a vision for my organization. Robert Goldmann helped me learn the perspective I needed to lead better. He helped me develop some of the questions on this list. Alex Kroman challenged me to consider things through multiple mental models.
Database sharding is the process of dividing data into smaller pieces called "shards." Sharding is typically introduced when there's a need to scale writes. During the lifespan of a successful application, the database server will hit the maximum number of writes it can perform, either at the processing or the capacity level. Slicing the data into multiple shards, each one on its own database server, reduces the stress on each individual node, effectively increasing the write capacity of the overall database. Distributed SQL is the new way to scale relational databases with a sharding-like strategy that's fully automated and transparent to applications. Distributed SQL databases are designed from the ground up to scale almost linearly. In this article, you'll learn the basics of distributed SQL and how to get started.

Disadvantages of Database Sharding

Sharding introduces a number of challenges:

- Data partitioning: Deciding how to partition data across multiple shards can be a challenge, as it requires finding a balance between data proximity and even distribution of data to avoid hotspots.
- Failure handling: If a key node fails and there are not enough shards to carry the load, how do you get the data onto a new node without downtime?
- Query complexity: Application code is coupled to the data-sharding logic, and queries that require data from multiple nodes need to be re-joined.
- Data consistency: Ensuring data consistency across multiple shards can be a challenge, as it requires coordinating updates across shards. This can be particularly difficult when updates are made concurrently, as it may be necessary to resolve conflicts between different writes.
- Elastic scalability: As the volume of data or the number of queries increases, it may be necessary to add additional shards to the database. This can be a complex process with unavoidable downtime, requiring manual processes to relocate data evenly across all shards.

Some of these disadvantages can be alleviated by adopting polyglot persistence (using different databases for different workloads), database storage engines with native sharding capabilities, or database proxies. However, while helping with some of the challenges of database sharding, these tools have limitations and introduce complexity that requires constant management.

What Is Distributed SQL?

Distributed SQL refers to a new generation of relational databases. In simple terms, a distributed SQL database is a relational database with transparent sharding that looks like a single logical database to applications. Distributed SQL databases are implemented as a shared-nothing architecture with a storage engine that scales both reads and writes while maintaining true ACID compliance and high availability. Distributed SQL databases have the scalability features of NoSQL databases, which gained popularity in the 2000s, but don't sacrifice consistency. They keep the benefits of relational databases and add cloud compatibility with multi-region resilience. A different but related term is NewSQL (coined by Matthew Aslett in 2011). This term also describes scalable and performant relational databases; however, NewSQL databases don't necessarily include horizontal scalability.

How Does Distributed SQL Work?

To understand how distributed SQL works, let's take the case of MariaDB Xpand, a distributed SQL database compatible with the open-source MariaDB database.
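Before diving into Xpand, here is a minimal sketch that makes the "query complexity" coupling above concrete: the kind of hash-based routing logic that manual sharding forces into application code. All names here (ShardRouter, the JDBC URLs) are hypothetical, for illustration only; a distributed SQL database makes this entire layer unnecessary:

Java
import java.util.List;

// Hypothetical application-side shard router: with manual sharding, every
// query must first be mapped to the database server that holds the row.
public class ShardRouter {

    private final List<String> shardJdbcUrls; // one JDBC URL per shard

    public ShardRouter(List<String> shardJdbcUrls) {
        this.shardJdbcUrls = shardJdbcUrls;
    }

    // Route a row to a shard by hashing its sharding key (e.g., a customer id).
    public String shardFor(String shardingKey) {
        int bucket = Math.floorMod(shardingKey.hashCode(), shardJdbcUrls.size());
        return shardJdbcUrls.get(bucket);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of(
                "jdbc:mariadb://shard-0:3306/app",
                "jdbc:mariadb://shard-1:3306/app",
                "jdbc:mariadb://shard-2:3306/app"));
        System.out.println(router.shardFor("customer-42")); // always the same shard for this key
    }
}

Note the pain points hiding in these few lines: queries that span many keys must fan out to every shard and be re-joined in application code, and adding or removing a shard changes the bucket count, forcing manual relocation of existing rows.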
Xpand works by slicing the data and indexes among nodes and automatically performing tasks such as data rebalancing and distributed query execution. Queries are executed in parallel to minimize lag. Data is automatically replicated to make sure that there's no single point of failure. When a node fails, Xpand rebalances the data among the surviving nodes. The same happens when a new node is added. A component called the rebalancer ensures that there are no hotspots (a challenge with manual database sharding), which occur when one node has to handle too many transactions while other nodes remain idle at times.

Let's study an example. Suppose we have a database instance with some_table and a number of rows. We can divide the data into three chunks (shards) and then move each chunk of data into a separate database instance. This is what manual database sharding looks like. Distributed SQL does this automatically for you. In the case of Xpand, each shard is called a slice. Rows are sliced using a hash of a subset of the table's columns. Not only is the data sliced; indexes are also sliced and distributed among the nodes (database instances). Moreover, to maintain high availability, slices are replicated on other nodes (the number of replicas per node is configurable). This also happens automatically.

When a new node is added to the cluster or when a node fails, Xpand automatically rebalances the data without the need for manual intervention. Here's what happens when a node is added to the previous cluster: some rows are moved to the new node to increase the overall system capacity. Keep in mind that, although not shown in the diagram, indexes as well as replicas are also relocated and updated accordingly. A slightly more complete view (with a slightly different relocation of data) of the previous cluster is shown in this diagram.

This architecture allows for nearly linear scalability. There's no need for manual intervention at the application level. To the application, the cluster looks like a single logical database. The application simply connects to the database through a load balancer (MariaDB MaxScale). When the application sends a write operation (for example, INSERT or UPDATE), the hash is calculated and the operation is sent to the correct slice. Multiple writes are sent in parallel to multiple nodes.

When Not To Use Distributed SQL

Sharding a database improves performance but also introduces additional overhead at the communication level between nodes. This can lead to slower performance if the database is not configured correctly or if the query router is not optimized. Distributed SQL might not be the best alternative in applications with less than 10K queries per second or 5K transactions per second. Also, if your database consists mostly of many small tables, then a monolithic database might perform better.

Getting Started With Distributed SQL

Since a distributed SQL database looks to an application as if it were one logical database, getting started is straightforward. All you need is the following:

- An SQL client like DBeaver, DbGate, DataGrip, or any SQL client extension for your IDE
- A distributed SQL database

Docker makes the second part easy. For example, MariaDB publishes the mariadb/xpand-single Docker image that allows you to spin up a single-node Xpand database for evaluation, testing, and development.
To start an Xpand container, run the following command:

Shell
docker run --name xpand \
  -d \
  -p 3306:3306 \
  --ulimit memlock=-1 \
  mariadb/xpand-single \
  --user "user" \
  --passwd "password"

See the Docker image documentation for details.

Note: At the time of writing this article, the mariadb/xpand-single Docker image is not available for ARM architectures. On these architectures (for example, Apple machines with M1 processors), use UTM to create a virtual machine (VM) and install, for example, Debian. Assign a hostname and use SSH to connect to the VM to install Docker and create the MariaDB Xpand container.

Connecting to the Database

Connecting to an Xpand database is the same as connecting to a MariaDB Community or Enterprise server. If you have the mariadb CLI tool installed, simply execute the following:

Shell
mariadb -h 127.0.0.1 -u user -p

You can connect to the database using a GUI for SQL databases like DBeaver, DataGrip, or an SQL extension for your IDE (like this one for VS Code). We are going to use a free and open-source SQL client called DbGate. You can download DbGate and run it as a desktop application, or, since you are using Docker, you can deploy it as a web application that you can access from anywhere via a web browser (similar to the popular phpMyAdmin). Simply run the following command:

Shell
docker run -d --name dbgate -p 3000:3000 dbgate/dbgate

Once the container starts, point your browser to http://localhost:3000/. Fill in the connection details, click on Test, and confirm that the connection is successful. Click on Save, and create a new database by right-clicking on the connection in the left panel and selecting Create database. Try creating tables or importing an SQL script. If you just want to try something out, the Nation and Sakila example databases are good choices.

Connecting From Java, JavaScript, Python, and C++

To connect to Xpand from applications, you can use the MariaDB Connectors. There are many possible combinations of programming languages and persistence frameworks. Covering this is outside the scope of this article, but if you just want to get started and see something in action, take a look at this quick start page with code examples for Java, JavaScript, Python, and C++.

The True Power of Distributed SQL

In this article, we learned how to spin up a single-node Xpand for development and testing purposes, as opposed to production workloads. However, the true power of a distributed SQL database is its capability to scale not only reads (as in classic database sharding) but also writes, by simply adding more nodes and letting the rebalancer optimally relocate the data. Although it is possible to deploy Xpand in a multi-node topology, the easiest way to use it in production is through SkySQL. If you want to learn more about distributed SQL and MariaDB Xpand, here's a list of useful resources:

- MariaDB Xpand for distributed SQL (video animation)
- MariaDB Xpand documentation
- Taking Distributed SQL to the Next Level with Columnar Indexing (talk)
- Getting Started With Distributed SQL (refcard)
“There are three kinds of variance: invariance, covariance, and contravariance…” It looks pretty scary already, doesn’t it? If we search Wikipedia, we will find covariance and contravariance in category theory and linear algebra. Some of you who learned these subjects in university might be having dreadful flashbacks, because they can be complex stuff. Because these terms look so scary, people avoid learning about this topic in the context of programming languages. From my experience, many middle-level and sometimes even senior-level Java and Kotlin developers fail to understand type variance. This leads to poor design of internal APIs: to create convenient APIs using generics, you need to understand type variance; otherwise, you either don’t use generics at all or use them incorrectly.

It is all about creating better APIs. If we compare a program to a building, then its internal API is the foundation of the building. If your internal API is convenient, your code is more robust and maintainable. So let’s fill this gap in our knowledge. The best way to explain this topic is from a historical and evolutionary perspective. I will start with ancient and primitive concepts such as arrays, which appeared in early Java versions, move on through the Java Collections API, and finally arrive at Kotlin, which has advanced support for type variance. Going from simple to more complex examples, you’ll see how language features have evolved and what problems were solved by introducing them. After reading this article, no mysteries will remain about Java’s “? extends” and “? super,” or Kotlin’s “in” and “out.”

For illustration purposes, I’ll be using the same type hierarchy everywhere: a base class called Person, a subclass called Employee, and another subclass called Manager. Each Employee is a Person, each Manager is a Person, and each Manager is an Employee, but not necessarily vice versa: some Persons are not Employees. In Java and Kotlin, this means you can assign an expression of type Manager to a variable of type Employee and so on, but not vice versa.

We will also consider a lot of code examples, and for all of them, we’re interested in only four kinds of possible outcomes. We will use emojis to identify them:

The code won’t compile.
The code will compile and run, but there will be a runtime exception.
The code will compile and run normally.
Heap pollution will occur.

Heap pollution is a situation where a variable of a certain type contains an object of the wrong type. For example, a variable declared as a String refers to an instance of a Manager or Employee. Yes, it is what it looks like: a flaw in the language’s type system. In general, this should not happen, but it sometimes does happen, both in Java and in Kotlin, and I’ll show you an example of heap pollution as well.

Covariance of Reified Java Arrays

Arrays have been present in Java for more than twenty-five years, starting from Java 1.0, and in a way, we can consider arrays a prototype for generics. For example, when we have a Manager type, we can build an array Manager[], and by getting elements of this array, we are getting values of the Manager type. The types of the values we read from an array are straightforward, but what about assigning values to the array’s elements? Can we assign a Manager as an element of Employee[]? And what about a Person? All of the possible combinations are represented in the table below.
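Before examining the table, here is the running hierarchy in code (a minimal sketch; the article presents it only as a diagram):

Java
// Every Manager is an Employee, and every Employee is a Person,
// but not vice versa.
class Person { }
class Employee extends Person { }
class Manager extends Employee { }

With these declarations, Employee e = new Manager() compiles, while Manager m = new Person() does not.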
Have a look and try to figure out what is going on:

The result of assigning a value to an element of a Java array

The rightmost column is green because in Java, null can be assigned (and returned) everywhere. In the lower-left corner, we have cases that won’t compile, which also makes sense: you cannot assign a Person to an Employee or Manager without an explicit type cast, and thus you cannot set a Person as an element of an array of employees or managers. That’s the main idea of type checking! Everything is understandable so far, but what about the rest of the combinations? We would expect that assigning an Employee to an element of Employee[], Person[], or Object[] will cause no problems, just like assigning it to a variable of type Employee, Person, or Object. What do these exclamation marks mean? A runtime exception? Why? What is this exception, and what can go wrong? I will explain this soon. Meanwhile, let’s consider another question: can we assign a Java array of a given type to an array of another type? That is, can we assign Employee[] to Person[]? And vice versa? All the possible combinations are given in the following table:

Can we assign a Java array of a given type to an array of another type?

We could remove the square brackets, and this would give us a table of possible assignments of simple objects: Employee is assignable to Person, but Person is not assignable to Employee. Since each Manager is an Employee, an array of managers is an array of employees, right? At this point, we can already say that arrays in Java are covariant in their element types, but we will get back to strict terms soon. The following UML diagram is valid:

Covariance

Now have a look at the code below to see how it behaves:

Java
Manager[] managers = new Manager[10];
Person[] persons = managers; //this should compile and run
persons[0] = new Person(); //line 1 ??
Manager m = managers[0]; //line 2 ?!

Nothing special happens in the beginning. Since a Manager is a Person, the assignment is possible. But since arrays, just like any objects, are reference types in Java, both the managers and persons variables keep a reference to the same object. On line 1, we are trying to insert a Person into this array. Note: the compiler’s type checking cannot prevent us from doing this. But if this line were allowed to execute, then, on line 2, we should expect a catastrophic error: an array of Managers would contain someone who is not a Manager—in other words, heap pollution. But Java won’t let you do it here. Experienced Java developers might know that an ArrayStoreException will occur on line 1. To prevent heap pollution, an array object “knows” the type of its elements at runtime, and each time we assign a value, a runtime check is performed. This explains the exclamation marks in one of the previous tables: writing a non-null value to any Java array may, generally speaking, lead to an ArrayStoreException if the actual type of the array is a subtype of the declared type of the array variable. The ability of a container to “know” the type of its elements is called “reification.” So now we know that arrays in Java are covariant and reified.

To sum up, we may say that:

The need for array reification and runtime checks (and possible runtime exceptions) comes from the covariance of arrays (the fact that a Manager[] array can be assigned to Person[]).
Covariance is safe when we read values but can lead to problems when we write values.
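Here is a self-contained version of the snippet you can compile and run to observe this behavior (assuming the Person/Employee/Manager classes sketched earlier):

Java
public class ArrayCovarianceDemo {
    public static void main(String[] args) {
        Manager[] managers = new Manager[10];
        Person[] persons = managers;       // allowed: Java arrays are covariant

        try {
            persons[0] = new Person();     // compiles, but fails at runtime
        } catch (ArrayStoreException e) {
            // the array "knows" it is really a Manager[] (reification)
            System.out.println("Rejected: " + e);
        }

        persons[0] = new Manager();        // fine: the runtime check passes
        Manager m = managers[0];           // safe read, no heap pollution
        System.out.println("Stored: " + m);
    }
}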
Note: the problem is so serious that Java even abandons the main objective of a statically-typed language here—having all type checking done at compile time—and behaves more like a dynamically-typed language (e.g., Python) in this scenario. You might ask: “Was covariance the right choice for Java arrays? What if we just prohibited the assignment of arrays of different types?” In this case, it would have been impossible to assign Manager[] to Person[], we would have known the array element type at compile time, and there would have been no need to resort to runtime checking. The property of a type being assignable only to variables of strictly the same type is called invariance, and we will encounter it in Java and Kotlin generics very soon. But imagine the problems that the invariance of arrays would have caused in Java. Imagine we have a method that accepts a Person[] as its argument and calculates, for example, the average age of the given people:

Java
Double calculateAverageAge(Person[] people)

Now we have a variable of type Manager[]. Managers are people, but can we pass this variable as an argument to calculateAverageAge? In Java, we can, because of the covariance of arrays. If arrays were invariant, we would have to create a new array of type Person[], copy all the values from Manager[] to this array, and only then call the method. The memory and CPU overhead would have been enormous. This is why invariance is impractical in APIs, and this is the real reason why Java arrays are covariant (although covariance implies difficulties with value assignments). The example of Java arrays shows the full range of problems associated with type variance. Java and Kotlin generics tried to address these problems.

Invariance of Java and Kotlin Mutable Lists

I believe you are familiar with the concept of generics. In Java and Kotlin, given that list is not empty, we will have the following return types of list.get(0):

type of list → type of list.get(0)
List<Person> → Person
List<?> → Object
List<*> → Any?

The difference between Java and Kotlin is in the last two lines. Both Java and Kotlin have a notion of an “unknown” type parameter: both List<?> in Java and List<*> in Kotlin denote “a List of elements of some type, and we don’t know/don’t care what the type is.” In Java, everything is nullable; thus, the Object returned by list.get(...) can be null. In Kotlin, we have to care about nullability; thus, the get method of List<*> returns Any?.

Now, let’s build the same tables we previously built for Java arrays. First, let’s consider the assignment of elements. Here we will find a huge difference between the Java and Kotlin Collections APIs (and as we will discover very soon, this difference is tightly related to the difference between type variance in Java and Kotlin). In Java, every List has methods for its modification (add, remove, and so on). The difference between mutable and immutable collections in Java is visible only at runtime: we may get an UnsupportedOperationException if we try to change an immutable list. In Kotlin, mutability is visible at compile time. The List interface itself does not have any modification methods, and if we want mutability, we need to use MutableList. In other respects, List<..> in Java and MutableList<..> in Kotlin are nearly the same. Here are the results of the list.add(…) method in Java and Kotlin:

What is the result of the list.add(…) method in Java and Kotlin?
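The table itself is an image in the original article, but its gist for Java can be reproduced in code (Kotlin’s MutableList<*> behaves the same way, except that even null is rejected):

Java
import java.util.ArrayList;
import java.util.List;

public class ListAddDemo {
    public static void main(String[] args) {
        List<Employee> employees = new ArrayList<>();
        employees.add(new Manager());    // OK: a Manager is an Employee
        // employees.add(new Person());  // won't compile: a Person is not an Employee

        List<?> unknown = employees;     // "a list of some unknown type"
        // unknown.add(new Employee());  // won't compile: the element type is unknown
        unknown.add(null);               // OK: in Java, null is assignable to any type
    }
}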
Why we cannot add a null to MutableList<*> is understandable: the “star” may stand for any type, both nullable and non-nullable. Since we don’t know anything about the actual type and its nullability, we cannot allow adding nullable values to MutableList<*>. Note: we don’t have anything similar to ArrayStoreException here, although the table looks similar to the one we built for arrays. Now, let’s try to figure out when we can assign Java and Kotlin lists to each other. All the possible combinations are presented here:

Can we assign these lists to each other?

The rightmost green column means that List<?>/MutableList<*> are universally assignable: since we “don’t care” about the actual type parameter, we can assign anything. In the rest of the diagram, we see the green diagonal, which means that MutableList<...> can be assigned only to a MutableList parameterized with the same type. In other words, List<T> in Java and MutableList<T> in Kotlin are invariant in their type parameters. This cuts off the possibility of inserting elements of the wrong type already at compile time:

Java
List<Manager> managers = new ArrayList<>();
List<Person> persons = managers; //won't compile
persons.add(new Person()); //no runtime check is possible

Two concerns may arise at this point:

As we know from the Java arrays example, invariance is bad for building APIs. What if we need a method that processes List<Person> but can be called with List<Manager>, without having to copy the whole list element by element?
Why not implement everything the same way as for arrays?

The answer to the first concern is declaration-site and use-site variance, which we are going to consider soon. The answer to the second question is that, unlike arrays, which are reified, generics in Java and Kotlin are type-erased, which means they carry no information about their type parameters at runtime, so a runtime type check is impossible. Let’s dive deeper into type erasure now.

Type Erasure, Generics/Arrays Incompatibility, and Heap Pollution

One of the reasons why the Java platform implements generics via type erasure is purely historical. Generics appeared in Java 5, when the Java platform was already quite mature. Java keeps backward compatibility at the source code and bytecode level, which means that very old source code can be compiled with modern Java versions, and very old compiled libraries can be used in modern applications by placing them on the classpath. To facilitate the upgrade to Java 5, the decision was made to implement generics as a language feature, not a platform feature. This means that at runtime, the JVM doesn’t know anything about generics and their type parameters.
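A quick way to observe type erasure yourself, using nothing but standard reflection:

Java
import java.util.ArrayList;

public class ErasureDemo {
    public static void main(String[] args) {
        ArrayList<String> strings = new ArrayList<>();
        ArrayList<Integer> integers = new ArrayList<>();

        // Both lists share exactly the same runtime class:
        // the type parameters were erased during compilation.
        System.out.println(strings.getClass() == integers.getClass()); // true
        System.out.println(strings.getClass().getName()); // java.util.ArrayList
    }
}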
For example, a simple Pair<T> class is compiled to bytecode in the following way (the type parameter T is “erased” and replaced with Object):

Generic Type (source):

Java
class Pair<T> {
    private T first;
    private T second;
    Pair(T first, T second) { this.first = first; this.second = second; }
    T getFirst() { return first; }
    T getSecond() { return second; }
    void setFirst(T newValue) { first = newValue; }
    void setSecond(T newValue) { second = newValue; }
}

Raw Type (compiled):

Java
class Pair {
    private Object first;
    private Object second;
    Pair(Object first, Object second) { this.first = first; this.second = second; }
    Object getFirst() { return first; }
    Object getSecond() { return second; }
    void setFirst(Object newValue) { first = newValue; }
    void setSecond(Object newValue) { second = newValue; }
}

Or, if we use bounded types in the generic type definition, the type parameter is replaced with the boundary type:

Generic Type (source):

Java
class Pair<T extends Employee> {
    private T first;
    private T second;
    Pair(T first, T second) { this.first = first; this.second = second; }
    T getFirst() { return first; }
    T getSecond() { return second; }
    void setFirst(T newValue) { first = newValue; }
    void setSecond(T newValue) { second = newValue; }
}

Raw Type (compiled):

Java
class Pair {
    private Employee first;
    private Employee second;
    Pair(Employee first, Employee second) { this.first = first; this.second = second; }
    Employee getFirst() { return first; }
    Employee getSecond() { return second; }
    void setFirst(Employee newValue) { first = newValue; }
    void setSecond(Employee newValue) { second = newValue; }
}

This implies many strict and sometimes counterintuitive limitations on how we can use generics in Java and Kotlin. If you want to know more details (e.g., about bounded types or what “bridge methods” are), you can refer to my lecture on Java generics titled Mainor 2022: Java Generics. But the most important restriction is the following: neither in Java nor in Kotlin can we determine the type parameter at runtime. These code snippets won’t compile:

Java
if (a instanceof Pair<String>) ...

Kotlin
if (a is Pair<String>) ...

But these will compile and run successfully, although we would probably like to know more about a:

Java
if (a instanceof Pair<?>) ...

Kotlin
if (a is Pair<*>) ...

An important implication of this is the incompatibility of Java arrays and generics. For example, the following line won’t compile in Java, with the error “generic array creation:”

Java
List<String>[] a = new ArrayList<String>[10];

As we know, Java arrays need to keep the full type information at runtime, while all the information that would be available in this case is that it is an array of ArrayLists of something unknown (the “String” type parameter will be erased). Interestingly, we can overcome this protection and create an array of generics in Java (either via a type cast or a varargs (variable arguments) parameter), and then easily cause heap pollution with it. But let’s consider another example. It doesn’t involve Java arrays, and thus it is possible both in Java and Kotlin (here, Pair is assumed to be a simple variant of the class above with public mutable fields a and b):

Java
Pair<Integer> intPair = new Pair<>(42, 0);
Pair<?> pair = intPair;
Pair<String> stringPair = (Pair<String>) pair;
stringPair.b = "foo";
System.out.println(intPair.a * intPair.b);

Kotlin
var intPair = Pair<Int>(42, 0)
var pair: Pair<*> = intPair
var stringPair: Pair<String> = pair as Pair<String>
stringPair.b = "foo"
println(intPair.a * intPair.b)

An example of heap pollution. A chimera appears!
First, we create a pair of integers. Then we “forget” its type at compile time and, through an explicit type cast, we cast it to a pair of Strings. Note: we cannot cast intPair to stringPair directly: Integer cannot be cast to String, and the compiler will warn us about it. But we can do it via Pair<?> / Pair<*>: although there will be a warning about an unchecked cast, the compiler won’t prohibit the cast in this scenario (one can imagine a Pair<String> cast to Pair<?> and then explicitly cast back to Pair<String>). Then something weird happens: we assign a String to the second component of our object, and this code compiles and runs. It compiles because the compiler “thinks” that b has the type String. It runs because at runtime there are no checks, and the type of b is Object. After the execution of this line, we have a “chimera” object: its first variable is an Integer, its second variable is a String, and it’s neither a Pair<String> nor a Pair<Integer>. We’ve broken the type safety of Java and Kotlin and caused heap pollution.

To sum up:

Because of type erasure, it’s impossible to type-check objects passed to generics at runtime.
It’s unsafe to store type-erased generics in Java’s native reified arrays.
Both Java and Kotlin permit heap pollution: a situation where a variable of some type refers to an object that is not of that type.

Use Site Covariance

Imagine we are facing the following practical task: we are implementing a class MyList<E>, and we want it to have the ability to add elements from other lists via the addAllFrom method and the ability to add its elements to another list via addAllTo. Since we have the usual Manager – Employee – Person inheritance chain, these must be the valid and invalid options:

Java
MyList<Manager> managers = ...
MyList<Employee> employees = ...
//Valid options, we want these to be compilable!
employees.addAllFrom(managers);
managers.addAllTo(employees);
//Invalid options, we don't want these to be compilable!
managers.addAllFrom(employees);
employees.addAllTo(managers);

A naive approach (one that, unfortunately, I’ve seen many times in real-life projects) is to use type parameters straightforwardly:

Java
class MyList<E> implements Iterable<E> {
    void add(E item) { ... }
    //Don't do this :-(
    void addAllFrom(MyList<E> list) { for (E item : list) this.add(item); }
    void addAllTo(MyList<E> list) { for (E item : this) list.add(item); }
    ...
}

Now, when we try to write the following code, it will not compile:

Java
MyList<Manager> managers = ...;
MyList<Employee> employees = ...;
employees.addAllFrom(managers);
managers.addAllTo(employees);

I often see people struggling with this: they tried to introduce generic classes in their code, but these classes were unusable. Now we know why this happens: it is due to the invariance of MyList. We have figured out that, due to the lack of runtime type checking, type invariance is the best that can be done for the type safety of Java’s List / Kotlin’s MutableList. Both Java and Kotlin offer a solution for this problem: to create convenient APIs, we need to use wildcard types in Java or type projections in Kotlin. Let’s look at Java first:

Java
class MyList<E> implements Iterable<E> {
    void addAllFrom(MyList<? extends E> list) {
        for (E item : list) add(item);
    }
}

MyList<Manager> managers = ...;
MyList<Employee> employees = ...;
employees.addAllFrom(managers);

MyList<? extends E> means: “a list of any type will do, as long as this type is a subtype of E.” When we iterate over this list, the items can be safely cast to E. And since our list is a list of E, we can safely add these elements to our list. The program will compile and run. In Kotlin, this looks very similar, but instead of “? extends E,” we use “out E:”

Kotlin
class MyList<E> : Iterable<E> {
    fun addAllFrom(list: MyList<out E>) {
        for (item in list) add(item)
    }
}

val managers: MyList<Manager> = ...
val employees: MyList<Employee> = ...
employees.addAllFrom(managers)

By declaring <? extends E> or <out E>, we are making the type of the argument covariant. But to avoid heap pollution, this implies certain limitations on what can be done with a variable declared with wildcard types/type projections. One of my favourite questions for a Java technical interview is: given a variable declared as List<? extends E> list in Java, what can be done with this variable? Of course, we can use list.get(...), and the return type will be E. On the other hand, if we have a variable E element, we cannot use list.add(element): such code won’t compile. Why? We know that the list is a list of elements of some type which is a subtype of E. But we don’t know which subtype. For example, if E is Person, then ? extends E might be Employee or Manager. We cannot blindly append a Person to such a list, then. An interesting exception: list.add(null) will compile and run. This happens because null in Java is assignable to a variable of any type, and thus it is safe to add it to any list.

We can also use an “unbounded wildcard” in Java, which is just a question mark in angle brackets, as in Foo<?>. The rules for it are as follows:

If Foo<T extends Bound>, then Foo<?> is the same as Foo<? extends Bound>. We can read elements, but only as Bound (or Object, if no Bound is given).
If we’re using intersection types Foo<T extends Bound1 & Bound2>, any of the bound types will do.
We can put only null values.

What about covariant types in Kotlin? Unlike in Java, nullability now plays a role. If we have a function parameter with the type MyList<out E?>:

We can read values typed E?.
We cannot add anything. Even null won’t do because, although we have the nullable E?, out means any subtype, and in Kotlin, a non-nullable type is a subtype of a nullable type. So the actual type of the list elements might be non-nullable, and this is why Kotlin won’t let you add even a null to such a list.

Use Site Contravariance

We’ve been talking about covariance so far. Covariant types are good for reading values and bad for writing. What about contravariance? Before figuring out where it might be needed, let’s have a look at the following diagram: unlike in covariant types, subtyping works the other way around in contravariant ones, and this makes them good for writing values but bad for reading. The classical example of a use case for contravariance is Predicate<E>, a functional type that takes E as an argument and returns a boolean value. The wider the type of E in a predicate, the more “powerful” it is. For example, Predicate<Person> can substitute for Predicate<Employee> (because an Employee is a Person), and thus it can be considered its subtype. Of course, everything is invariant in Java and Kotlin by default, and this is why we need to use another kind of wildcard type and type projection. The addAllTo method of our MyList class can be implemented the following way:
Java
class MyList<E> implements Iterable<E> {
    void addAllTo(MyList<? super E> list) {
        for (E item : this) list.add(item);
    }
}

MyList<Employee> employees = ...;
MyList<Person> people = ...;
employees.addAllTo(people);

MyList<? super E> means “a list of any type will do, as long as this type is E or a supertype of E, up to Object.” When we iterate over our list, our items, which have type E, can be safely cast to this unknown type and can be safely added to another list. The program will compile and run. In Kotlin, it looks the same, but we use MyList<in E> instead of MyList<? super E>:

Kotlin
class MyList<E> : Iterable<E> {
    fun addAllTo(list: MyList<in E>) {
        for (item in this) list.add(item)
    }
}

val employees: MyList<Employee> = ...
val people: MyList<Person> = ...
employees.addAllTo(people)

What Can Be Done With an Object Typed List<? super E> in Java?

When we have an element of type E, we can successfully add it to this list. The same works for null: null can be added everywhere in Java. We can call the get(..) method on such a list, but we can read its values only as Objects. Indeed, <? super E> means that the actual type parameter is unknown and can be anything up to Object, so Object is the only safe assumption about the type of list.get(..).

And what about Kotlin? Again, nullability plays a role. If we have a parameter list: MyList<in E>, then:

We can add elements of type E to the list.
We cannot add nulls (but we can if the variable is declared as MyList<in E?>).
The type of its elements (e.g., the type of list.first()) is Any? – mind the question mark. In Kotlin, Any? is the universal supertype, while Any is a subtype of Any?. If a type is contravariant, it can always potentially hold nulls.

PECS: The Mnemonic Rule for Java

Now we know that covariance is for reading (and writing to a covariantly-typed object is generally prohibited), and contravariance is for writing (and although we can read values from contravariantly-typed objects, all the type information is lost). Joshua Bloch, in his famous “Effective Java” book, proposes the following mnemonic rule for Java programmers:

PECS: Producer — Extends, Consumer — Super

This rule makes it simple to reason about the correct wildcard types in your API. If, for example, an argument of our method is a Function, we should always (no exceptions here) declare it this way:

Java
void myMethod(Function<? super T, ? extends R> arg)

The T parameter in Function is the type of the input, i.e., something that is being consumed, and thus we use ? super for it. The R parameter is the result, something that is produced, and thus we use ? extends. This trick allows us to use any compatible Function as an argument: any Function that can process T or its supertype will do, as well as any Function that yields R or any of its subtypes. In the standard Java library API, we can see a lot of examples of wildcard types, all of them following the PECS rule. For example, the method that finds the maximum element in a Collection given a Comparator is defined like this:

Java
public static <T> T max(Collection<? extends T> coll, Comparator<? super T> comp)

This allows us to conveniently use the following parameters: Collections.max(List<Integer>, Comparator<Number>) (if we can compare any Numbers, then we can compare Integers), or Collections.max(List<String>, Comparator<Object>) (if we can compare Objects, then we can compare Strings).
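Another standard-library example that follows PECS is Collections.copy, which reads from a producer and writes to a consumer. A simplified re-implementation (the method name copyInto is ours) could look like this:

Java
import java.util.ArrayList;
import java.util.List;

public class Pecs {
    // Same shape as java.util.Collections.copy: the source produces
    // elements (? extends), the destination consumes them (? super).
    static <T> void copyInto(List<? super T> dest, List<? extends T> src) {
        for (int i = 0; i < src.size(); i++) {
            dest.set(i, src.get(i)); // read as T, write as T: both are type-safe
        }
    }

    public static void main(String[] args) {
        List<Manager> managers = List.of(new Manager(), new Manager());
        List<Person> people = new ArrayList<>(List.of(new Person(), new Person()));
        copyInto(people, managers); // T is inferred as Manager: Person super Manager
        System.out.println(people.size()); // still 2, but both elements are now Managers
    }
}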
In Kotlin, it is easy to remember that producers always use the “out” keyword and consumers use “in.” Although Kotlin’s syntax is more concise and the “in”/“out” keywords make it clearer which type is used for a producer and which for a consumer, it is still very useful to understand that “out” actually means a subtype, while “in” means a supertype.

Declaration Site Variance in Kotlin

Now we’re going to consider a feature that Kotlin has and Java doesn’t: declaration site variance. Let’s have a look at Kotlin’s immutable List. When we check the assignability of Kotlin’s List, we find that it looks similar to Java arrays. In other words, Kotlin’s List is itself covariant:

Can we assign these immutable lists to each other?

Covariance for Kotlin’s List doesn’t imply any of the problems related to Java’s covariant arrays, since you cannot add or modify anything. When we are just reading values, we can safely cast a Manager to an Employee. That’s why a Kotlin function that requires List<Person> as its parameter will happily accept, say, List<Manager>, even if that parameter does not use type projections. There is no similar functionality in Java. When we compare the declaration of the List interface in Java and Kotlin, we see the difference:

Java
public interface List<E> extends Collection<E> {...}

Kotlin
public interface List<out E> : Collection<E> {...}

The keyword “out” in the type declaration makes the List interface in Kotlin a covariant type everywhere. Of course, you cannot make just any type covariant in Kotlin: only those that do not use the type parameter as an argument of a public method (using E as a return type is OK). So it’s a good idea to declare all your immutable classes as covariant in Kotlin. In our MyList example, we might also want to introduce an immutable pair like this:

Kotlin
class MyImmutablePair<out E>(val a: E, val b: E)

In this class, we can only declare methods that return something of type E, but no public methods that have E-typed arguments. Note: constructor parameters and private methods with E-typed arguments are OK. Now, if we want to add a method that takes values from a MyImmutablePair, we don’t need to bother about use-site variance:

Kotlin
class MyList<E> : Iterable<E> {
    //Don't bother about use-site type variance!
    fun addAllFrom(pair: MyImmutablePair<E>) { add(pair.a); add(pair.b) }
    ...
}

val twoManagers: MyImmutablePair<Manager> = ...
employees.addAllFrom(twoManagers)

The same applies to contravariance, of course. We might want to define a contravariant class MyConsumer this way:

Kotlin
class MyConsumer<in E> {
    fun consume(p: E) { ... }
}

As soon as we define a type as contravariant, the following limitations emerge: we can define methods that have E-typed arguments, but we cannot expose anything of type E. We can have private class variables of type E, and even private methods that return E. The addAllTo method, which dumps all the values to the given consumer, now doesn’t need to use type projections. The following code will compile and run:

Kotlin
class MyList<E> : Iterable<E> {
    //Don't bother about use-site type variance!
    fun addAllTo(consumer: MyConsumer<E>) {
        for (item in this) consumer.consume(item)
    }
    ...
}

val employees: MyList<Employee> = ...
val personConsumer: MyConsumer<Person> = ...
employees.addAllTo(personConsumer)

One thing that’s worth mentioning is how declaration-site variance influences the star projection Foo<*>.
If we have an object typed Foo<*>, does it matter whether Foo is declared invariant, covariant, or contravariant when we want to do something with this object?

If the original declaration is Foo<T : TUpper> (invariant), then you can read values as TUpper, and you cannot write anything (not even null), because we don’t know the exact type.
If Foo<out T : TUpper> is covariant, you can still read values as TUpper, and you cannot write anything, simply because there are no public methods for writing in such a class.
If Foo<in T : TUpper> is contravariant, then you cannot read anything (because there are no such public methods), and you still cannot write anything (because you have “forgotten” the exact type).

So a contravariant Foo<*> variable is the most useless thing in Kotlin.

Kotlin Is Better for the Creation of Fluent APIs

When we consider switching between languages, the most important question is: what can the new language provide that cannot be achieved with the old one? More concise syntax is nice, but if everything a new language offers is just syntactic sugar, then maybe it is not worth switching from familiar tools and ecosystems. Regarding type variance in Kotlin vs. Java, the question is: does declaration site variance provide options that are impossible in Java with wildcard types? In my opinion, the answer is definitely yes: declaration site variance is not just about getting rid of “? extends” and “? super” everywhere. Here’s a real-life example of the problems that arise when we design APIs for data stream processing frameworks (in particular, this example relates to the Apache Kafka Streams API). The key classes of such frameworks are abstractions of data streams, like KStream<K>, which are semantically covariant: a stream of Employees can safely be considered a stream of Persons if all we are interested in are the Person’s properties. Now imagine that in the library code we have a class that accepts a function capable of transforming one stream into another:

Java
class Processor<E> {
    void withFunction(Function<? super KStream<E>, ? extends KStream<E>> chain) {...}
}

In the user’s code, these functions may look like this:

Java
KStream<Employee> transformA(KStream<Employee> s) {...}
KStream<Manager> transformB(KStream<Person> s) {...}

As you can see, both of these functions can work as a transformer from KStream<Employee> to KStream<Employee>. But if we try to use them as method references passed to the withFunction method, only the first one will do:

Java
Processor<Employee> processor = ...
//Compiles
processor.withFunction(this::transformA);
//Won't compile with "KStream<Employee> is not convertible to KStream<Person>"
processor.withFunction(this::transformB);

The problem cannot be fixed by just adding more “? extends.” If we define the class this way:

Java
class Processor<E> {
    //A mind-blowing number of question marks
    void withFunction(Function<? super KStream<? super E>, ? extends KStream<? extends E>> chain) {...}
}

then both lines

Java
processor.withFunction(this::transformA);
processor.withFunction(this::transformB);

will fail to compile with something like “KStream<capture of ? super Employee> is not convertible to KStream<Employee>.” Type inference in Java is not “wise” enough to support such complex recursive definitions.
Meanwhile, in Kotlin, if we declare the class KStream<out E> as covariant, this is easily possible:

Kotlin
/* LIBRARY CODE */
class KStream<out E>

class Processor<E> {
    fun withFunction(chain: (KStream<E>) -> KStream<E>) {}
}

/* USER'S CODE */
fun transformA(s: KStream<Employee>): KStream<Employee> { ... }
fun transformB(s: KStream<Person>): KStream<Manager> { ... }

val processor: Processor<Employee> = Processor()
processor.withFunction(::transformA)
processor.withFunction(::transformB)

All the lines will compile and run as intended (and the syntax is more concise, too). Kotlin has a clear win in this scenario.

Conclusion

To sum up, here are the properties of the different kinds of type variance.

Covariance is:

? extends in Java, out in Kotlin
safe reading; unsafe or impossible writing
described by the following diagram: when A is a supertype of B, the matrix of possible assignments fills the lower-left corner.

Contravariance is:

? super in Java, in in Kotlin
safe writing; reading with type information lost, or impossible
described by the following diagram: when A is a supertype of B, the matrix of possible assignments fills the upper-right corner.

Invariance is:

assumed in Java and Kotlin by default
safe writing and reading
such that, when A is a supertype of B, the matrix of possible assignments fills only the diagonal.

To create good APIs, understanding type variance is necessary. Kotlin offers great enhancements over Java generics, making the usage of ready-made generic types even more straightforward. But to create your own generic types in Kotlin, it’s even more important to understand the principles of type variance. I hope that it’s now clear how type variance works and how it can be used in your APIs. Thanks for reading.
Introduction

Language models are an essential part of natural language processing (NLP), a field of artificial intelligence (AI) that focuses on enabling computers to understand and generate human language. ChatGPT and GPT-3 are two popular language models developed by OpenAI, a leading AI research institute. In this blog post, we will explore the features and capabilities of these two models and discuss how they differ from each other.

ChatGPT

Overview of ChatGPT

ChatGPT is a state-of-the-art conversational language model that has been trained on a large amount of text data from various sources, including social media, books, and news articles. The model is capable of generating human-like responses to text input, making it suitable for tasks such as chatbots and conversational AI systems.

Features and Capabilities of ChatGPT

ChatGPT has several key features and capabilities that make it a powerful language model for NLP tasks. Some of these include:

Human-like responses: ChatGPT has been trained to generate responses that are similar to how a human would respond in a given situation. This allows it to engage in natural, human-like conversations with users.
Contextual awareness: ChatGPT is able to maintain context and track the flow of a conversation, allowing it to provide appropriate responses even in complex or multi-turn conversations.
Large training data: ChatGPT has been trained on a large amount of text data, which has allowed it to learn a wide range of language patterns and styles. This makes it capable of generating diverse and nuanced responses.

How ChatGPT Differs From Other Language Models

ChatGPT differs from other language models in several ways. First, it is specifically designed for conversational tasks, whereas many other language models are more general-purpose and can be used for a wide range of language-related tasks. Second, ChatGPT is trained on a large amount of text data from various sources, including social media and news articles, which gives it a wider range of language patterns and styles compared to models trained on more limited data sets. Finally, ChatGPT has been specifically designed to generate human-like responses, making it more suitable for tasks that require natural, human-like conversations.

GPT-3 (Generative Pre-Trained Transformer 3)

Overview of GPT-3

GPT-3 is a large-scale language model developed by OpenAI. The model is trained on a massive amount of text data from various sources, including books, articles, and websites. It is capable of generating human-like responses to text input and can be used for a wide range of language-related tasks.

Features and Capabilities of GPT-3

GPT-3 has several key features and capabilities that make it a powerful language model for NLP tasks. Some of these include:

Large training data: GPT-3 has been trained on a massive amount of text data, which has allowed it to learn a wide range of language patterns and styles. This makes it capable of generating diverse and nuanced responses.
Multiple tasks: GPT-3 can be used for a wide range of language-related tasks, including translation, summarization, and text generation. This makes it a versatile model that can be applied to a variety of applications.

How GPT-3 Differs From Other Language Models

GPT-3 differs from other language models in several ways. First, it is one of the largest and most powerful language models currently available, with 175 billion parameters.
This allows it to learn a wide range of language patterns and styles and to generate highly accurate responses. Second, GPT-3 is trained on a massive amount of text data from various sources, which gives it a broader range of language patterns and styles compared to models trained on more limited data sets. Finally, GPT-3 is capable of multiple tasks, making it a versatile model that can be applied to a variety of applications.

Comparison of ChatGPT and GPT-3

Similarities Between the Two Models

Both ChatGPT and GPT-3 are language models developed by OpenAI and trained on large amounts of text data from various sources. Both models are capable of generating human-like responses to text input, both are suitable for conversational applications, and both are considered state-of-the-art.

Differences Between the Two Models

There are several key differences between ChatGPT and GPT-3. First, ChatGPT is specifically designed for conversational tasks, whereas GPT-3 is a more general-purpose model that can be used for a wide range of language-related tasks, such as translation, summarization, and text generation. Second, ChatGPT is trained on a smaller amount of data compared to GPT-3, which may limit the diversity of its responses; its strength lies instead in the natural, dialogue-oriented responses it has been tuned to produce. Finally, GPT-3 is significantly larger and more powerful than ChatGPT, with 175 billion parameters compared to only 1.5 billion for ChatGPT.

In terms of when to use each model, ChatGPT is best suited for tasks that require natural, human-like conversations, such as chatbots and conversational AI systems. GPT-3, on the other hand, is best suited for tasks that require a general-purpose language model, such as text generation and translation.

Final Words

In conclusion, understanding the differences between ChatGPT and GPT-3 is important for natural language processing tasks. While both models are highly advanced and capable of generating human-like responses, they have different strengths and are best suited for different types of tasks. By understanding these differences, users can make informed decisions about which model to use for their specific NLP needs.
In this post, you will learn how to deploy a Go Lambda function and trigger it in response to events sent to a topic in an MSK Serverless cluster. The following topics are covered:

How to use the franz-go Go Kafka client to connect to MSK Serverless using IAM authentication
How to write a Go Lambda function to process data in an MSK topic
How to create the infrastructure: VPC, subnets, MSK cluster, Cloud9, etc.
How to configure Lambda and Cloud9 to access MSK using IAM roles and fine-grained permissions

MSK Serverless is a cluster type for Amazon MSK that makes it possible for you to run Apache Kafka without having to manage and scale cluster capacity. It automatically provisions and scales capacity while managing the partitions in your topic, so you can stream data without thinking about right-sizing or scaling clusters. Consider using a serverless cluster if your applications need on-demand streaming capacity that scales up and down automatically. - MSK Serverless Developer Guide

Prerequisites

You will need an AWS account, the AWS CLI installed, and a recent version of Go (1.18 or above). Clone this GitHub repository and change into the right directory:

git clone https://github.com/abhirockzz/lambda-msk-serverless-trigger-golang
cd lambda-msk-serverless-trigger-golang

Infrastructure Setup

AWS CloudFormation is a service that helps you model and set up your AWS resources so that you can spend less time managing those resources and more time focusing on your applications that run in AWS. You create a template that describes all the AWS resources that you want (like Amazon EC2 instances or Amazon RDS DB instances), and CloudFormation takes care of provisioning and configuring those resources for you. You don't need to individually create and configure AWS resources and figure out what's dependent on what; CloudFormation handles that. - AWS CloudFormation User Guide

Create VPC and Other Resources

Use a CloudFormation template for this:

aws cloudformation create-stack --stack-name msk-vpc-stack --template-body file://template.yaml

Wait for the stack creation to complete before proceeding to the other steps.

Create MSK Serverless Cluster

Use the AWS Console to create the cluster. Configure the VPC and private subnets created in the previous step.

Create an AWS Cloud9 Instance

Make sure it is in the same VPC as the MSK Serverless cluster and choose the public subnet that you created earlier.

Configure MSK Cluster Security Group

After the Cloud9 instance is created, edit the MSK cluster security group to allow access from the Cloud9 instance.

Configure Cloud9 To Send Data to the MSK Serverless Cluster

The code that we run from Cloud9 is going to produce data to the MSK Serverless cluster, so we need to ensure that it has the right privileges. For this, we need to create an IAM role and attach the required permissions policy:

aws iam create-role --role-name Cloud9MSKRole --assume-role-policy-document file://ec2-trust-policy.json

Before creating the policy, update the msk-producer-policy.json file to reflect the required details, including the MSK cluster ARN, etc.:

aws iam put-role-policy --role-name Cloud9MSKRole --policy-name MSKProducerPolicy --policy-document file://msk-producer-policy.json

Attach the IAM role to the Cloud9 EC2 instance.

Send Data to the MSK Serverless Cluster Using the Producer Application

Log into the Cloud9 instance and run the producer application (it is a Docker image) from a terminal.
export MSK_BROKER=<enter the MSK Serverless endpoint>
export MSK_TOPIC=test-topic
docker run -p 8080:8080 -e MSK_BROKER=$MSK_BROKER -e MSK_TOPIC=$MSK_TOPIC public.ecr.aws/l0r2y6t0/msk-producer-app

The application exposes a REST API endpoint that you can use to send data to MSK:

curl -i -X POST -d 'test event 1' http://localhost:8080

This will create the specified topic (since it was missing to begin with) and also send the data to MSK. Now that the cluster and producer application are ready, we can move on to the consumer. Instead of creating a traditional consumer, we will deploy a Lambda function that will be automatically invoked in response to data being sent to the topic in MSK.

Configure and Deploy the Lambda Function

Create the Lambda Execution IAM Role and Attach the Policy

A Lambda function's execution role is an AWS Identity and Access Management (IAM) role that grants the function permission to access AWS services and resources. When you invoke your function, Lambda automatically provides your function with temporary credentials by assuming this role. You don't have to call sts:AssumeRole in your function code.

aws iam create-role --role-name LambdaMSKRole --assume-role-policy-document file://lambda-trust-policy.json

aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaMSKExecutionRole --role-name LambdaMSKRole

Before creating the policy, update the msk-consumer-policy.json file to reflect the required details, including the MSK cluster ARN, etc.:

aws iam put-role-policy --role-name LambdaMSKRole --policy-name MSKConsumerPolicy --policy-document file://msk-consumer-policy.json

Build and Deploy the Go Function and Create a Zip File

Build and zip the function code:

GOOS=linux go build -o app
zip func.zip app

Deploy to Lambda:

export LAMBDA_ROLE_ARN=<enter the ARN of the LambdaMSKRole created above, e.g., arn:aws:iam::<your AWS account ID>:role/LambdaMSKRole>

aws lambda create-function \
    --function-name msk-consumer-function \
    --runtime go1.x \
    --zip-file fileb://func.zip \
    --handler app \
    --role $LAMBDA_ROLE_ARN

Lambda VPC Configuration

Make sure you choose the same VPC and private subnets as the MSK cluster. Also, select the same security group ID as MSK (for convenience). If you select a different one, make sure to update the MSK security group to add an inbound rule (for port 9098), just like you did for the Cloud9 instance in an earlier step.

Configure the MSK Trigger for the Function

When Amazon MSK is used as an event source, Lambda internally polls for new messages from the event source and then synchronously invokes the target Lambda function. Lambda reads the messages in batches and provides these to your function as an event payload. The maximum batch size is configurable (the default is 100 messages). Lambda reads the messages sequentially for each partition. After Lambda processes each batch, it commits the offsets of the messages in that batch. If your function returns an error for any of the messages in a batch, Lambda retries the whole batch of messages until processing succeeds or the messages expire. Lambda sends the batch of messages in the event parameter when it invokes your function. The event payload contains an array of messages. Each array item contains details of the Amazon MSK topic and partition identifier, together with a timestamp and a base64-encoded message. Make sure to choose the right MSK Serverless cluster and enter the correct topic name.
Verify the Integration

Go back to the Cloud9 terminal and send more data using the producer application. I used a handy JSON utility called jo (sudo yum install jo):

APP_URL=http://localhost:8080
for i in {1..5}; do jo email=user${i}@foo.com name=user${i} | curl -i -X POST -d @- $APP_URL; done

In the Lambda function logs, you should see the messages that you sent.

Conclusion

You were able to set up, configure, and deploy a Go Lambda function and trigger it in response to events sent to a topic in an MSK Serverless cluster!
This article is a follow-up to Engineering Manager: Resolving Intrapersonal Conflicts and Engineering Manager: Resolving Interpersonal Situations. In the previous articles, we talked about interpersonal conflicts: conflicts between two or more people who do not agree on topics such as what actions to take or what the priorities are. Today we will talk about organizational conflicts, common scenarios, and some tips on how to manage them in a professional environment.

Organizational

This type of conflict is associated with misalignment or disagreement between members of the organization and the organization's cultural values, strategic decisions, goals, or methodologies. Usually, this kind of conflict starts when people believe that their values, professional development, goals, or attitude are not aligned with the organization or with the management layer of their areas.

Career Path

Most people care about their professional career path; together with salary level and affinity with their manager, it is one of the main reasons people change companies. People want to be promoted for the recognition, for the chance to provide value across the company, and for the salary increase. The problem is that career paths and promotions do not depend only on the individual; there are other factors, such as:

How many positions are available
Whether the company prioritizes bringing in outside talent or growing the existing team (sometimes it is necessary to bring in outside talent to bring in new ideas)
The company's financial status

When the promotion implies a change of role, even if all the technical requirements are met, some soft skills may be the limiting factor, and soft skills are not always easy to measure. When people want to be promoted to global contributor roles, soft skills are often more important than technical capabilities.

Impacts

When people meet all the requirements for promotion to a new role but the promotion does not occur, three things usually happen, one after the other:

People become demotivated.
They share this demotivation with part of the team.
They leave the company.

Tips

Transparency: We must always be honest and transparent with our team. Never promise a promotion based on merit alone, because there are factors that we do not control.
No promises: We can never promise a promotion; we can only manage the promotion process.
If it is not possible: Sometimes promotions are not possible in the short term. We have to share this message in a constructive way and try to look for alternatives that can satisfy the person for a period of time.

Cultural Values

When people join a new company or team, they bring the cultural values from their last experience. In their day-to-day work, they will probably face situations that generate intrapersonal and interpersonal conflicts because they are not used to resolving them in the same way. There are two common scenarios:

Fear of making errors: when they make a mistake (I make mistakes every day), it implies a situation of internal confrontation between being honest with their team and trying to solve it quietly.
They try to apply their previous cultural values in terms of best practices, responsibilities, or communication, which do not have to be the ones we want in our company.

Impacts

In a company's normal growth, people usually need a few months to adapt to the team, and this should be part of the onboarding process. The higher risk appears when we are hiring many people in a situation that requires fast growth, because we are generally not as meticulous in the onboarding process.
New joiners can spread a culture across the company that is the opposite of the one the company promotes.

Tips

Always remember that the people who join the company bring with them a culture that may be very different, and we have to work with them to align them with the values we want to promote.
The cultural values, tips, and common scenarios must be included in the onboarding process and also in the follow-up plan.
People automate behaviors, so when they start at a new company, they will probably unconsciously apply methodologies or behaviors learned in their previous roles. Teams need to be aware of this because, during the onboarding process, they will face situations that they will have to manage.
Our teams have to empathize with new people and at the same time mentor them. It is important to assign a mentor who can spend time with new people and support them in their new environment.
During very rapid growth, it is necessary to assess these risks and establish a team growth strategy. There are several strategies that we will discuss in another article.

Burnout

Burnout syndrome was defined by Herbert Freudenberger as a "state of mental and physical exhaustion caused by one's professional life." It is, of course, an intrapersonal conflict, but one caused by many other conflicts during a person's professional career. It is very different from impostor syndrome, mainly because burnout does not provide any positive aspects to the organization or the person. People who are experiencing burnout are emotionally exhausted, interpret any situation negatively, have low self-esteem, and have generally lost the feeling of fellowship. These are some of the main factors that contribute to employee burnout:

Deadlines/time pressures: working in a constant environment of pressure to achieve goals with unmanageable workloads
Lack of role definition: not being clear about the responsibilities and goals of people's roles generates a constant situation of uncertainty about what is expected of them. It usually leads to a feeling of unfairness.
The relationship with the management layer

Impacts

Burnout has a negative impact on every area of life:

Health: studies have shown how damaging it is to overall health, not only mental health.
Team: burned-out people promote negative dynamics, do not empathize with others, and therefore tend to generate conflicts.
Performance: being exhausted means that they cannot focus on their objectives.

Tips

There is no single factor; people generally get into this situation because of mismanagement by their manager and the organization:

Marathon, not a sprint: People cannot be constantly working as if everything were a sprint. We have to be very careful not to create a culture of continuous excessive effort.
Conciliation: Promote a goal-based and flexible culture. People have children and emergencies, and we have to support them when they need this flexibility. The important thing is not how many hours they work, but achieving the goals as a team.
Responsibilities: Do not give people responsibilities that they cannot manage, whether because they lack the skills or because they are in a complex personal situation at the moment.
Support with conflicts: We need to support them in resolving conflict situations, not look the other way.
Take a break: Provide mental and cognitive breaks for people who are under higher stress, giving them more time for self-learning and making temporary rotations to lower-stress initiatives.
Burnout is the consequence of many other unmanaged conflicts and develops progressively over a long period of time. There are many indicators that we can detect during one-to-ones or informal talks.

Conclusions

These types of conflicts are the main causes of people leaving the company, but there are cases in which, as managers, we cannot do anything in the short or medium term about the root cause of these conflict situations. We have to prevent people from burning out by providing the best possible workplace and all the necessary support to our teams. We must provide constant feedback to the organization on these types of conflicts and promote a culture of transparency, honesty, and continuous feedback. These are the foundations of constant improvement and the way to reduce these kinds of conflicts. We must also be honest with ourselves and with people when the cultural values of the two parties do not fit; perhaps the best thing to do is to seek new challenges in another team or company. Sometimes, as managers, we become obsessed with retaining people on the team; I think we need to replace that goal with finding the best environment for them.