Enterprise AI
In recent years, artificial intelligence has become less of a buzzword and more of an adopted process across the enterprise. With that comes a growing need to increase operational efficiency as customer demands rise. AI platforms have become increasingly sophisticated, creating a need to establish guidelines and ownership. In DZone's 2022 Enterprise AI Trend Report, we explore MLOps, explainability, and how to select the best AI platform for your business. We also share a tutorial on how to create a machine learning service using Spring Boot, and how to deploy AI with an event-driven platform. The goal of this Trend Report is to better inform the developer audience about practical tools and design paradigms, new technologies, and the overall operational impact of AI within the business. This is a technology space that's constantly shifting and evolving. As part of our December 2022 re-launch, we've added new articles pertaining to knowledge graphs, a solutions directory for popular AI tools, and more.
Continuous Integration Patterns and Anti-Patterns
Getting Started With CI/CD Pipeline Security
This is the last part of the debugging series. To learn the rest, you'll need to get the book "Practical Debugging at Scale: Cloud Native Debugging in Kubernetes and Production" or the course. One of the most frequently asked questions I receive is: can we do these things in VS Code? The answer is, unfortunately, no. But I elaborate on the debugging capabilities of VS Code in this video: "16 Missing Features in the VS Code Debugger" on YouTube. I'll do a blog post that covers that next week. Below is the last video in the series:

Transcript

Welcome back to the ninth part of debugging at scale, where we really get to know the quality of your code. Remote debugging doesn't always involve a remote machine; we often need it when debugging in Kubernetes or Docker. We'll delve more into that later, but for now, we'll discuss the basic mechanics: how to connect, how to make the connection slightly less vulnerable to attack, and then the problems of remote debugging.

The Connection and Command Line

We'll start with the connection. We first need to run the process that we'll connect to remotely. To do that, we run a command similar to the one shown below. Notice that this is a simplified version; in many cases, the arguments should be embedded in configuration files. When you inspect your Maven or Gradle files, you might see many of the arguments listed here. This is how these things work under the hood.

Let's go over the command and break it down piece by piece to make sure we understand it correctly. The first part is the launch of the Java command line, which is pretty obvious. We need quotes in bash since there's a star at the end of the argument, and bash wants to expand it; without the quotes, the command won't work properly. The agentlib option loads the native library that wires directly into the virtual machine, and JDWP is the Java Debug Wire Protocol. This is the underlying networking protocol used to communicate between the debugger and the running process. It's a high-level protocol, which means it can be implemented on top of various transports. Typically, it's implemented over TCP sockets, but it's the same protocol used to debug devices directly. You don't need to know too much about JDWP, but the concept is simple: you send commands and can query the system. That's what the IDE does for you. When you add a breakpoint, the IDE sends a JDWP command to add a breakpoint at the given location. When the breakpoint is hit, JDWP sends an event back to the IDE, and the IDE can then query the details about the current environment: stack, variables, etc.

In this case, we transfer the details via a server socket. We could instead use dt_shmem, which stands for shared memory, as the wire transport. This is faster and useful for processes that have access to a shared memory area. The transport is actually pluggable, and you can build your own JDWP transport. That usually isn't useful, but it speaks to the power and flexibility of the API. We can optionally suspend the virtual machine on launch if we want to debug something right from the start. I've set this to no, which means the VM will start running right away. If you set it to yes, with the letter "y", the VM will pause on launch and wait for the JDWP connection. Finally, there is the address and port we are listening on. In this case, I allow anyone to connect on port 5005. I can limit this to localhost only by replacing the star character, which is probably the better approach, although it won't make the protocol fully secure.
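The transcript refers to a command shown on screen that isn't reproduced in the text. A representative version, assuming the standard dt_socket transport and the PrimeMain class mentioned below, would look like this:

java "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005" PrimeMain

The quotes keep bash from expanding the star in the address, server=y makes the JVM listen for an attaching debugger, and suspend=n lets the application start running before a debugger connects.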
This is the rest of the command: the class we're running. Typically, you would have something more substantial here; in this case, I'm just running the PrimeMain class. To start debugging, we need to edit the run configuration in IntelliJ.

Connecting from IntelliJ/IDEA

Next, we need to locate a configuration for remote debugging. Once I select that, we can add it. Notice it's pre-configured with the defaults, such as port 5005. I give the new run configuration a name, and we're ready to debug the app. There are many options to tune here, but we don't need any of them. Also, check out this area right here. Seem familiar? That's the exact line we discussed before: the IDE is showing us how to set up the command line for the remote process, which lets us verify that we entered everything correctly. We now have a new remote debug run configuration. We can switch to a different configuration from the same location, but when we want to do remote debugging, we need to toggle it here. Next, we press the debug button to run this configuration, and we are instantly connected to the running process. Once that is done, this feels and acts like any debugger instance launched from within the IDE: I can set a breakpoint, step over, inspect variables, etc. So why do it? In some cases, running the server locally in the IDE is impractical. A good example would be debugging a container on your own machine; that might not be trivial.

Security Implications of JDWP

Calling JDWP insecure is inaccurate. That would be like putting your house keys and home address, wrapped in nice gift wrapping with an itemized list of your valuables sorted by value, in front of your house. This is an open door. An open door isn't a security vulnerability. It's an open door! JDWP is very insecure when used remotely. Locally, on your own machine, it isn't a problem, but it has almost no security protections, and there's no real solution for that. There is, however, a very partial workaround: tunneling it over SSH. This is relatively trivial. Use a command like the SSH example shown below to open a tunnel between the remote machine and your local machine. To both sides, it will seem like local debugging, so the example I showed before (connecting to a localhost server) works perfectly with the remote host, as SSH moves all the packets back and forth securely. We can't SSH into a Kubernetes container, but we can port forward, which is almost identical. We can use something like the kubectl command shown below to forward the port from the given pod to the local machine and vice versa. It's the same idea as SSH tunneling, but appropriate to the Kubernetes world.

Dangers of Remote Debugging

In this final section, I want to talk about the dangers of remote debugging in production. Breakpoints break; that seems obvious, since that's what they're there to do. But if we run on a server, we can block it completely by mistake. We can use tracepoints instead. As I said, they're great, but they are no replacement for breakpoints, and an accidental click in the gutter can literally stop your server in its tracks. JDWP effectively allows remote code execution. It lets you access all the bytecode of the app, which is effectively the same as giving access to your full server source code, and it lets attackers do almost anything since it wasn't designed with security in mind. We also need to relaunch the application with debugging enabled, which means killing the running process and starting it over again, disconnecting existing users, etc. That isn't great.
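For reference, the two commands mentioned in the security discussion above typically look like the following; the user, host, pod name, and ports are placeholders, and your values will differ:

ssh -L 5005:localhost:5005 user@remote-host

kubectl port-forward my-pod 5005:5005

The first opens an SSH tunnel so that pointing the debugger at localhost:5005 on your machine reaches port 5005 on the remote host; the second does the equivalent for a Kubernetes pod.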
Some operations in the debugger require more than one step in terms of the protocol. As a result, you could send a request to the debugger, lose your connection, and leave the debugger stuck in a problematic state. This is an inherent limitation of the JDWP protocol and can't be worked around in a standard debugger. The problem is that even unintentional actions can take down a server: a simple conditional breakpoint that invokes a method as part of the condition can demolish server performance or crash it outright. As mentioned above, JDWP effectively allows remote code execution and exposes all the bytecode of the app, which is effectively the same as giving access to your full source code. Imagine placing a breakpoint where the user password is passed to authentication. If JDWP is open on your server, a member of your team might use that, and you will never know; there's no tracking at all. 60% of security hacks happen from within the organization. If your company does remote debugging, it has no way of knowing whether an employee used it to manipulate application state or siphon user details. This can violate various rules and regulations since it might expose personal user data, and remote debugging into production can trigger liability risks. I discuss some of the solutions for these problems, both in low-level tooling and in higher-level observability solutions, in the book and in the full course.

Final Word

With this, we've finished the first part of the course. If you want to check out the full course, go to "debugagent.com" to learn more. The next video covers the strategies for debugging and the science of debugging. If you have any questions, please use the comments section below. Thank you!
Have you ever needed to make some HTTP requests in your Jenkins pipeline? How would you do that? The first thing that comes to mind is the simple curl command. It's a powerful tool, and it works fine for simple cases like a typical GET request. But what about more complicated cases? What if you need to execute a POST with a huge JSON body and many headers? The curl command becomes obscure and hard to maintain, and escaping quotes and special symbols so that it works inside the pipeline is hell. The last time I wrestled with this, I looked for an alternative and found one: a powerful plugin, HTTP Request.

To use the HTTP Request Plugin, you will need to install it in your Jenkins. This can be done through the Jenkins plugin manager, which allows you to browse and install plugins from the Jenkins interface. Once the plugin is installed, you can use it in pipelines with the keyword httpRequest.

Main features:
Request types GET, HEAD, POST, PUT, PATCH, DELETE
Expected response code or string content
Basic/Form authentication
Connection timeout
Add custom headers

I've prepared a few examples to compare and contrast both approaches. These examples illustrate the key differences between executing HTTP requests in the pipeline and demonstrate how the HTTP Request Plugin may be better suited for specific situations. By examining these examples, you can better understand each tool's strengths and weaknesses and determine which is the best fit for your needs.

GET Request

CURL

stage('Execute Request') {
    steps {
        script {
            sh "curl https://dummyjson.com/products"
        }
    }
}

Plugin

stage('Execute Request') {
    steps {
        httpRequest "https://dummyjson.com/products"
    }
}

Custom Headers

CURL

stage('Execute Request') {
    steps {
        script {
            sh "curl --location --request GET 'https://dummyjson.com/auth/products/1' \\\n" +
               "--header 'Content-Type: application/json' \\\n" +
               "--header 'Authorization: Bearer YOUR_TOKEN' \\\n" +
               "--data-raw ''"
        }
    }
}

Plugin

stage('Execute Request') {
    steps {
        httpRequest customHeaders: [[name: 'Content-Type', value: 'application/json'],
                                    [name: 'Authorization', value: 'Bearer YOUR_TOKEN']],
                    url: 'https://dummyjson.com/auth/products/1'
    }
}

POST Request With Complex Payload

CURL

stage('Execute Request') {
    steps {
        script {
            sh "curl --location --request POST 'https://dummyjson.com/carts/add' \\\n" +
               "--header 'Content-Type: application/json' \\\n" +
               "--data-raw '{\n" +
               "    \"userId\": \"1\",\n" +
               "    \"products\": [\n" +
               "        {\n" +
               "            \"id\": \"23\",\n" +
               "            \"quantity\": 5\n" +
               "        },\n" +
               "        {\n" +
               "            \"id\": \"45\",\n" +
               "            \"quantity\": 19\n" +
               "        }\n" +
               "    ]\n" +
               "}'"
        }
    }
}

Plugin

stage('Execute Request') {
    steps {
        httpRequest contentType: 'APPLICATION_JSON',
                    httpMode: 'POST',
                    requestBody: '''{
                        "userId": "1",
                        "products": [
                            {"id": "23", "quantity": 5},
                            {"id": "45", "quantity": 19}
                        ]
                    }''',
                    url: 'https://dummyjson.com/carts/add'
    }
}

Response Code and Content Validation

CURL

stage('Execute Request') {
    steps {
        script {
            def (String response, String code) = sh(script: "curl -s -w '\\n%{response_code}' https://dummyjson.com/products", returnStdout: true)
                .trim()
                .tokenize("\n")
            if (code != "200") {
                throw new Exception("Status code doesn't match. Received: $code")
            }
            if (!response.contains("userId")) {
                throw new Exception("Very strange message\n" + response)
            }
        }
    }
}

Plugin

stage('Execute Request') {
    steps {
        httpRequest url: "https://dummyjson.com/products",
                    validResponseCodes: '200',
                    validResponseContent: 'userId'
    }
}

If the validation rules are not met, the plugin fails the build with messages like the following:

hudson.AbortException: Fail: Status code 500 is not in the accepted range: 200
hudson.AbortException: Fail: Response doesn't contain expected content 'userId'

Conclusion

The HTTP Request Plugin is a valuable tool for Jenkins users. It allows you to easily send HTTP requests as part of your Jenkins jobs, enabling you to automate tasks and integrate Jenkins with other tools and services. The plugin's user-friendly interface and additional functionality for handling responses make it easy to use and powerful. If you are looking to expand the capabilities of your Jenkins instance and streamline your workflow, the Jenkins HTTP Request Plugin is definitely worth considering. Feel free to read the official documentation and visit the GitHub repository.
This tutorial shows how to optimize pattern-matching requests in PostgreSQL and YugabyteDB by applying several indexing strategies. Since YugabyteDB is built on PostgreSQL source code, the strategies work for both single-server PostgreSQL deployments and multi-node YugabyteDB clusters.

Loading Sample Data

Let's begin by building out an example. Assume you have a table that stores user profiles, and one of the columns is the user's first name. We want to find all the users whose name starts with An and is of arbitrary length. For that, we can use a pattern-matching query like WHERE firstname LIKE 'An%'. First, let's create a table with sample data.

1. Start by connecting to your PostgreSQL or YugabyteDB instance using your preferred method. As an option, you can do this in Docker within a minute.

2. Create a table that will store the names and how frequently each name was used for newborns:

create table firstname (
    name text,
    frequency int
);

3. Download the sample data file with 32,000+ first names.

4. Load the sample data into the table:

\copy firstname FROM '{path_to_file}/us_first_names.csv' csv header;

If you search for all the names that start with An, the databases will do a full table scan of the 32,000+ records and return the 665 records that satisfy the search criteria:

explain analyze select name, frequency from firstname where name like 'An%';

Seq Scan on firstname (rows=665 loops=1)
  Filter: (name ~~ 'An%'::text)
  Rows Removed by Filter: 32287

Note: all execution plans are shortened for clarity and provided for PostgreSQL only. YugabyteDB's execution plans can look, or be formatted, differently because its storage engine is distributed and uses LSM trees instead of the B-trees used in PostgreSQL.

The full table scan doesn't look like a problem now. But imagine that new users keep signing up for your service, making the table bigger and bigger. You have thousands of users today, but in three months you might have tens of thousands of active customers, and in a year or two the number can grow exponentially. To keep the latency of this pattern-matching query predictable and low, regardless of the data volume, you need a proper database index.

Search With Default Index

Let's try to improve the search by creating an index with default settings for the firstname column:

1. Create the index and update statistics for PostgreSQL. YugabyteDB doesn't have (or need) the vacuum analyze command, since its distributed storage works differently:

create index name_idx on firstname(name);
vacuum analyze; /* PostgreSQL only. */

2. Confirm the index is selected for a simple query using the equality operator:

explain analyze select name, frequency from firstname where name like 'Andrew';

Index Scan using name_idx on firstname (rows=2 loops=1)
  Index Cond: (name = 'Andrew'::text)
  Filter: (name ~~ 'Andrew'::text)

3. Finally, search for names that start with An:

explain analyze select name, frequency from firstname where name like 'An%';

Seq Scan on firstname (rows=665 loops=1)
  Filter: (name ~~ 'An%'::text)
  Rows Removed by Filter: 32287

As the final execution plan shows, the current solution failed to use the index on my macOS laptop, and it's highly likely that the index won't be used on your machine either. Let's figure out why.

Search With "C" Collation

PostgreSQL and YugabyteDB rely on collations when it comes to indexing character string data types.
A collation is a set of rules that define how to compare and sort character strings (refer to this article or video for more details on the topic). Let's talk about PostgreSQL first. By default, PostgreSQL uses the locale-specific collation rules of your operating system. Usually, those rules don't work for queries involving the LIKE or similar operators. My default collation on PostgreSQL is en_US.utf8, and it's not suitable for pattern-matching requests:

show lc_collate;

 lc_collate
------------
 en_US.utf8

A suggested solution is to use the "C" collation, which sorts and compares character strings strictly byte by byte. You can specify this collation for the whole database, on a per-column level, or while creating an index. Let's use it for the index in PostgreSQL:

1. Drop the current index with the default operator class:

drop index name_idx;

2. Create a new index using the "C" collation:

create index name_idx on firstname(name collate "C");
vacuum analyze;

3. Search for all the names that start with An:

explain analyze select name, frequency from firstname where name like 'An%';

Bitmap Heap Scan on firstname (rows=665 loops=1)
  Filter: (name ~~ 'An%'::text)
  Heap Blocks: exact=88
  -> Bitmap Index Scan on name_idx (rows=665 loops=1)
       Index Cond: ((name >= 'An'::text) AND (name < 'Ao'::text))

Success! PostgreSQL used the index and found the 665 names that satisfy the search pattern.

In YugabyteDB, the situation is different. YugabyteDB already uses the "C" collation by default, so there is nothing to change here:

show lc_collate;

 lc_collate
------------
 C
(1 row)

However, the index was created using the HASH algorithm, which is not suitable for pattern-matching requests:

\d+ firstname;

                 Table "public.firstname"
  Column   |  Type   | Collation | Nullable | Default | Storage  | Stats target | Description
-----------+---------+-----------+----------+---------+----------+--------------+-------------
 name      | text    |           |          |         | extended |              |
 frequency | integer |           |          |         | plain    |              |
Indexes:
    "name_idx" lsm (name HASH)

To get the index selected for pattern-matching requests in YugabyteDB, we need to build a range index:

1. Drop the current index:

drop index name_idx;

2. Create a new index by specifying the ASC operator (ascending order). The operator enables range indexing of the data:

create index name_idx on firstname(name ASC);

3. Repeat the previous pattern-matching search:

explain analyze select name, frequency from firstname where name like 'An%';

Index Scan using name_idx on firstname (rows=665 loops=1)
  Index Cond: ((name >= 'An'::text) AND (name < 'Ao'::text))
  Filter: (name ~~ 'An%'::text)

Job done! Now our pattern-matching request uses the index in both PostgreSQL and YugabyteDB. One thing to keep in mind that applies to both databases: index access is used only for the characters preceding the first wildcard. Characters that come after the wildcard are used for further sequential filtering over the result set prepared during the index traversal.
For instance, let's say that you want to find all the names that start with An and end with a:

explain analyze select name, frequency from firstname where name like 'An%a';

Bitmap Heap Scan on firstname (rows=220 loops=1)
  Filter: (name ~~ 'An%a'::text)
  Rows Removed by Filter: 445
  Heap Blocks: exact=85
  -> Bitmap Index Scan on name_idx (rows=665 loops=1)
       Index Cond: ((name ~>=~ 'An'::text) AND (name ~<~ 'Ao'::text))

The index is used for the characters before the wildcard (see the line Index Cond: ((name ~>=~ 'An'::text) AND (name ~<~ 'Ao'::text))), and the prepared result set is then filtered sequentially (see the line Filter: (name ~~ 'An%a'::text)).

Note on the pattern_ops Operator Classes

A quick note on the text_pattern_ops operator class (or varchar_pattern_ops), which is still suggested by many Internet resources as a go-to solution for pattern-matching requests in PostgreSQL. This operator class can also be used for fast lookups via the index:

1. Create a new index specifying the text_pattern_ops class:

/* In PostgreSQL */
create index name_idx on firstname(name text_pattern_ops);
vacuum analyze;

/* In YugabyteDB */
create index name_idx on firstname(name text_pattern_ops ASC);

2. Execute the query with the LIKE operator:

explain analyze select name, frequency from firstname where name like 'An%';

Bitmap Heap Scan on firstname (rows=665 loops=1)
  Filter: (name ~~ 'An%'::text)
  Heap Blocks: exact=85
  -> Bitmap Index Scan on name_idx (rows=665 loops=1)
       Index Cond: ((name ~>=~ 'An'::text) AND (name ~<~ 'Ao'::text))

As you can see, text_pattern_ops does work, but many consider it a legacy solution. One limitation is that this operator class doesn't support queries involving ordinary <, <=, >, or >= comparisons. You can learn more about the topic here.

Case-Insensitive Search

What if you need to do a case-insensitive search using the ILIKE operator or another method? Then the previously discussed solution using the "C" collation (as well as the text_pattern_ops class) won't be sufficient. As before, let's find all the names that start with An (or an, AN, aN), this time using the case-insensitive ILIKE operator:

explain analyze select name, frequency from firstname where name ilike 'an%';

Seq Scan on firstname (rows=665 loops=1)
  Filter: (name ~~* 'an%'::text)
  Rows Removed by Filter: 32287

The query returns the same 665 first names but doesn't use the existing index. There are several solutions that support case-insensitive index lookups. One of them is trigrams, backed by GiST or GIN indexes. PostgreSQL (and, by inheritance, YugabyteDB) supports trigram-based lookups via the pg_trgm extension. Let's install it and use the extension for case-insensitive pattern-matching queries:

1. Drop the current index:

drop index name_idx;

2. Enable the pg_trgm extension:

create extension pg_trgm;

3. Create a new GIN index with the trigram operator class:

create index name_idx on firstname using gin(name gin_trgm_ops);
vacuum analyze; /* PostgreSQL only. */

4. Perform a case-insensitive search for all the names that start with An/an/aN/AN:

explain analyze select name, frequency from firstname where name ilike 'an%';

Bitmap Heap Scan on firstname (rows=665 loops=1)
  Recheck Cond: (name ~~* 'an%'::text)
  Heap Blocks: exact=85
  -> Bitmap Index Scan on name_idx (rows=665 loops=1)
       Index Cond: (name ~~* 'an%'::text)

This time both PostgreSQL and YugabyteDB used the index lookup. Finally, as a bonus, a trigram-based index lets you run pattern-matching queries that start with a wildcard.
For instance, if you need to find all the names that end with an, just use the %an expression, and the databases will still pick the index:

explain analyze select name, frequency from firstname where name ilike '%an';

Bitmap Heap Scan on firstname (rows=1530 loops=1)
  Recheck Cond: (name ~~* '%an'::text)
  Heap Blocks: exact=175
  -> Bitmap Index Scan on name_idx (rows=1530 loops=1)
       Index Cond: (name ~~* '%an'::text)

Summary

As you see from the steps in this example, database indexing capabilities are very rich. You can create an optimal index for every business operation: you just need to study which types of indexes a database supports and match those capabilities to your use cases. If you'd like to complete this tutorial from start to finish and see how the databases behave, use these Docker commands to get PostgreSQL and YugabyteDB ready on your personal machine in under a minute.
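The linked Docker commands are not reproduced in this excerpt. As one illustration (not the tutorial's exact commands), a throwaway single-node PostgreSQL container can be started like this; the container name and password are placeholders, and the YugabyteDB quickstart documentation provides an equivalent docker run command for a single-node cluster:

docker run --name postgres-pattern-demo -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres

You can then connect with psql -h localhost -U postgres and follow the steps above.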
TL;DR: Agile Transformation With ChatGPT or McBoston?

This article is another excursion into the nascent yet fascinating technology of generative AI and LLMs and the future of knowledge work. I was interested in learning more about a typical daily challenge many Agile practitioners face: How shall we successfully pursue an Agile transformation? Shall we outsource the effort to one of the big consultancies, lovingly dubbed McBoston? Or shall we embark on an Agile transformation with ChatGPT providing some guidance? If the technology can pass a Wharton MBA exam, maybe it deserves some attention. We thought that AI might initially come after simple office jobs. I am no longer sure about that. Maybe ChatGPT's successor will start at the top of the food chain.

How to Create ChatGPT Prompts

Generally, my ChatGPT prompts comprise several elements: I define in what capacity ChatGPT shall interact with the prompts, for example, as a Scrum Master. Then I sketch the situation and suggest what knowledge ChatGPT shall consider when answering. As a third step, I point to helpful practices. Finally, I describe the scenario and place my first request. Once the initial answer is available, I follow up with additional questions to guide the bot in a more suitable direction or to learn about the rationale behind the LLM's response. I created all prompts and scenarios on January 19, 2023, based on the ChatGPT Jan 9 version.

The General Scenario and Initial Suggestions for an Agile Transformation

Prompt: I want you to act as a change leader in a traditional organization: I will provide you with a scenario, and you will suggest ways of overcoming the problems using your knowledge of change management, communication, leadership, collaboration, stakeholder management, and conflict resolution. Your organization used to be the technology leader in your market, but that was 20 years ago. Since then, your market share has been continuously dropping, and earnings have fallen. Your share price has fallen by 50%. Your organization wants to regain its former market dominance by becoming an Agile organization. What would you suggest to the leadership of that organization to help the change effort?

ChatGPT: “To help the change effort in this scenario, I would suggest the following: Clearly communicate the vision and goals of the change to all stakeholders. This will help them understand the importance of the change and their role in it. Build a strong and diverse change management team to lead the effort. This team should include members from different departments and levels within the organization. Engage key stakeholders early on in the process to gather input and buy-in. This will help ensure that their needs and concerns are taken into account during the change. Develop a detailed plan for the change, including specific milestones and timelines. This will help keep the effort on track and ensure that progress is being made. Provide regular communication and updates on the progress of the change to all stakeholders. This will help keep them informed and engaged. Leverage the expertise of the consulting firm but also empower employees to take ownership of the change by providing them with the necessary training and resources. Be prepared to address and resolve any conflicts that may arise during the change process through effective conflict resolution strategies. Continuously evaluate and adjust the change effort as needed to ensure it stays aligned with the organization’s goals and objectives.
Encourage collaboration and teamwork across the organization to build a culture of agility and speed. Finally, celebrate the successes along the way and acknowledge the efforts of employees to keep morale high.” Comment: This answer is classic consultant mumbo-jumbo: “We have done this before, and we know what to do when and how. We will create a winning concept and support changing top-down. Just ensure the minions comply and don’t ask too many questions.” Follow-up question: Your answer has too much focus on traditional project management. Make the approach more Agile. ChatGPT: “You’re right, my previous response focused primarily on traditional project management methods. To approach this change effort with an Agile mindset, I would suggest the following: Emphasize on creating a culture of continuous improvement and experimentation, where employees are encouraged to take risks and learn from failures. Use Agile methodologies such as Scrum or Kanban to manage the change effort. This will allow for more flexibility and adaptability in the process. Encourage cross-functional teams that are self-organizing and empowered to make decisions. This will help to ensure that the change effort is aligned with the organization’s goals and objectives. Foster a culture of transparency and open communication. This will allow for regular feedback and adjustment of the change effort as needed. Encourage experimentation and learning by providing time and resources for pilots, prototypes, and other small-scale testing. Use data and metrics to track progress and adjust the change effort as needed. Continuously involve stakeholders, including customers and partners, in the change effort to gather feedback and ensure alignment with their needs. Be prepared to pivot or change course as needed, based on feedback and results. Celebrate successes and failures alike, as they are opportunities to learn and improve. Finally, empower employees to take ownership of their work and the change effort. Provide them with the autonomy, resources, and support they need to succeed.” Comment: Now we are talking. These suggestions resemble the top ten of the critical challenges any Agile transformation faces. Let’s figure out whether a guided Agile transformation with ChatGPT gets the priorities right. Follow-up question: Which elements from your previous answer do you consider mission-critical to succeeding in an Agile transformation? Rank the list from most to least critical. ChatGPT: I would rank the elements from my previous answer as follows in terms of most to least critical to succeeding in an Agile transformation: Emphasizing a culture of continuous improvement and experimentation, where employees are encouraged to take risks and learn from failures. This is the foundation for an Agile mindset and essential for the transformation to succeed. Encouraging cross-functional teams that are self-organizing and empowered to make decisions. This is critical for ensuring alignment with the organization’s goals and objectives and for creating a sense of ownership among employees. Fostering a culture of transparency and open communication. This is essential for ensuring regular feedback and adjustment of the change effort as needed. Using Agile methodologies such as Scrum or Kanban to manage the change effort. This is important for allowing for more flexibility and adaptability in the process. Encouraging experimentation and learning by providing time and resources for pilots, prototypes, and other small-scale testing. 
This is important for ensuring that the change effort is constantly evolving and improving. Using data and metrics to track progress and adjust the change effort as needed. This is important for ensuring that the change effort is on track and making progress. Continuously involving stakeholders, including customers and partners, in the change effort to gather feedback and ensure alignment with their needs. This is important for ensuring that the change effort is aligned with the organization’s goals and objectives. Being prepared to pivot or change course as needed, based on feedback and results. This is important for being able to respond quickly to changes in the market or within the organization. Celebrating successes and failures alike, as they are opportunities to learn and improve. This is important for keeping morale high and fostering a culture of continuous improvement. Empowering employees to take ownership of their work and the change effort. Providing them with the autonomy, resources, and support they need to succeed. This is important for creating a sense of ownership among employees and for ensuring that the change effort is aligned with the organization’s goals and objectives.” Comment: Interestingly, ChatGPT lists “empowering employees to take ownership of their work and the change effort” at the bottom. I would move this issue way up in the list. Maybe an Agile transformation with ChatGPT is not as promising as I thought. Follow-up question: Why is “empowering employees to take ownership of their work and the change effort” only #10 on your list? What is the rationale behind putting it at the bottom of the list? In other words: why do you consider the other nine factors more important? ChatGPT: I apologize if my previous ranking was not clear. In an Agile transformation, all the elements listed are crucial and interrelated, and it is difficult to prioritize one over the other. However, empowering employees to take ownership of their work and the change effort is essential for the success of the transformation. Here are a few reasons why I considered the other nine factors to be more critical in my previous response: Emphasizing a culture of continuous improvement and experimentation is the foundation for an Agile mindset and essential for the transformation to succeed. Without this mindset, employees may be resistant to change and the Agile methodologies will not be fully embraced. Encouraging cross-functional teams that are self-organizing and empowered to make decisions is critical for ensuring alignment with the organization’s goals and objectives and for creating a sense of ownership among employees. This is important to ensure that the teams have the autonomy to make decisions that align with the overall objectives and are able to take ownership of the change effort. Fostering a culture of transparency and open communication is essential for ensuring regular feedback and adjustment of the change effort as needed. This is important for ensuring that the change effort is on track and making progress. Using Agile methodologies such as Scrum or Kanban to manage the change effort is important for allowing for more flexibility and adaptability in the process. Without this, the change effort may become too rigid and inflexible, making it difficult to adapt to changes in the market or within the organization. 
Encouraging experimentation and learning by providing time and resources for pilots, prototypes, and other small-scale testing is important for ensuring that the change effort is constantly evolving and improving. Using data and metrics to track progress and adjust the change effort as needed is important for ensuring that the change effort is on track and making progress. Continuously involving stakeholders, including customers and partners, in the change effort to gather feedback and ensure alignment with their needs is important for ensuring that the change effort is aligned with the organization’s goals and objectives. Being prepared to pivot or change course as needed, based on feedback and results is important for being able to respond quickly to changes in the market or within the organization. Celebrating successes and failures alike, as they are opportunities to learn and improve is important for keeping morale high and fostering a culture of continuous improvement. All these elements are critical for the success of an Agile transformation. Empowering employees to take ownership of their work and the change effort is a crucial aspect of the change effort, but it is also dependent on the other elements being in place and being successful.” Comment: Our Agile transformation with ChatGPT is back in the game. First, the ChatGPT acknowledges that ranking the success factors is tricky. However, to support its ranking of critical factors, the LLM now offers an explanation and points to the dependencies between all aspects. Typical Problems and Anti-Patterns of Agile Transformations Prompt: What are typical problems and anti-patterns that you observe during Agile transformations? List these critical issues for each of the previous ten elements and suggestion how to avoid them in the spirit of agility. Note: ChatGPT also created the HTML for the following list. ChatGPT: “Here are typical problems and anti-patterns that I observe during Agile transformations and suggestions on how to avoid them: Emphasizing a culture of continuous improvement and experimentation: Problem: Employees may be resistant to change and may not fully embrace the Agile mindset. Anti-pattern: Managers may impose Agile practices without involving employees and without explaining the reasons for the change. Suggestion: Communicate the reasons for the change, involve employees in the process, provide training and coaching on Agile methodologies, and create a safe environment for experimentation and learning. Encouraging cross-functional teams that are self-organizing and empowered to make decisions: Problem: Teams may lack the autonomy to make decisions that align with the overall objectives or may not take ownership of the change effort. Anti-pattern: Managers may micromanage teams or make decisions on their behalf. Suggestion: Empower teams to make decisions that align with the overall objectives, provide training and coaching on Agile methodologies, and create a culture of trust and transparency. Fostering a culture of transparency and open communication: Problem: Communication may be siloed, and regular feedback may not be provided. Anti-pattern: Managers may not provide regular feedback or may not be transparent about the progress of the change effort. Suggestion: Encourage open communication, provide regular feedback, and create a culture of transparency and trust. 
Using Agile methodologies such as Scrum or Kanban to manage the change effort: Problem: Agile methodologies may not be fully embraced or may be applied incorrectly. Anti-pattern: Managers may impose Agile methodologies without involving employees or providing training and coaching. Suggestion: Provide training and coaching on Agile methodologies, involve employees in the process, and continuously evaluate and adjust the Agile methodologies as needed. Encouraging experimentation and learning by providing time and resources for pilots, prototypes, and other small-scale testing: Problem: Experiments may not be conducted or may not be incorporated into the change effort. Anti-pattern: Managers may not provide time and resources for experimentation or may not incorporate the results into the change effort. Suggestion: Encourage experimentation, provide time and resources for pilots, prototypes, and other small-scale testing, and incorporate the results into the change effort. Using data and metrics to track progress and adjust the change effort as needed: Problem: Data and metrics may not be used or may not be used effectively to track progress and adjust the change effort. Anti-pattern: Managers may not use data and metrics or may not use them effectively. Suggestion: Use data and metrics to track progress and adjust the change effort as needed, provide training and coaching on data analysis, and involve employees in the process. Continuously involving stakeholders, including customers and partners, in the change effort to gather feedback and ensure alignment with their needs: Problem: Stakeholders may not be involved in the change effort, or their needs may not be taken into account. Anti-pattern: Managers may not involve stakeholders or may not take their needs into account. Suggestion: Involve stakeholders early on in the process, gather input and buy-in, and take their needs into account during the change effort. Being prepared to pivot or change course as needed, based on feedback and results: Problem: The change effort may become too rigid and inflexible, making it difficult to adapt to changes in the market or within the organization. Anti-pattern: Managers may not be prepared to pivot or change course as needed, and may not be responsive to feedback and results. Suggestion: Continuously evaluate and adjust the change effort as needed, based on feedback and results, and be prepared to pivot or change course as needed. Celebrating successes and failures alike, as they are opportunities to learn and improve: Problem: Morale may be low and employees may not be motivated to participate in the change effort. Anti-pattern: Managers may not celebrate successes or may not acknowledge failures. Suggestion: Celebrate successes and acknowledge failures, as they are opportunities to learn and improve, and keep morale high. Empowering employees to take ownership of their work and the change effort: Problem: Employees may not take ownership of their work or the change effort. Anti-pattern: Managers may not empower employees or may not provide them with the autonomy, resources, and support they need to succeed. Suggestion: Empower employees to take ownership of their work and the change effort, provide them with the autonomy, resources, and support they need to succeed, and create a culture of trust and transparency.” Comment: Our Agile transformation with ChatGPT is picking up speed. 
Now, the bot adds insight to all previously identified success factors: It identifies a likely problem, why the problem may exist, and a first set of ideas on how to remedy it. (Admittedly, sometimes, the text sounds repetitive.) The suggested solutions remain at the meta-level, given the information we provided to ChatGPT. ChatGPT Acting as a Consultant, Creating a Concept for an Agile Transformation Prompt: I want you to act as a consultant for business agility for a large, traditional organization. Based on your previous list of critical success factors, I want you to propose a detailed concept of an Agile transformation. Your proposal shall address: All the people that need to be involved in the transformation A timeline when what shall happen A concept of when and whom to educate in Agile practices and concepts in the organization A way to identify possible change agents to support the transformation effort bottom-up Suggestions on how to handle resistance to change, particularly from middle managers A rough estimation of the budget is required A suggestion on how to finance the transformation Create the concept in the spirit of the Agile Manifesto or the Scrum Guide. ChatGPT: “As a consultant for business agility, I would propose the following detailed concept of an Agile transformation for a large, traditional organization: Involvement: The Scrum framework is used to involve a cross-functional, self-organizing team, including employees at all levels, customers, and partners. This allows the team to align their work with the organization’s goals and objectives and ensures that all stakeholders are engaged and invested in the process. Timeline: The transformation should be an ongoing effort, where the team will continuously inspect and adapt to deliver valuable results. The team will establish a Sprint cadence, usually two weeks, in which they will plan, execute, and review their work. This approach allows for more flexibility and adaptability in the process. Education: Education on Agile practices and concepts should be provided to the team, and the team will continuously educate themselves and others in the organization on Agile principles and values. Change agents: The team should be empowered to drive the change, they will be responsible for the transformation and will have the autonomy to make decisions that align with the overall objectives. The team should also have a clear understanding of the vision of the organization, and they will be able to act as change agents within the organization. Resistance to change: The team will continuously gather feedback from all stakeholders, including those who may resist change. They will use this feedback to adjust their work and to address any concerns or resistance. Budget: The budget for the transformation should be allocated for the team, for example for training, coaching, and implementation. The team should be able to adjust their budget as needed and should be able to measure their value delivered. Financing: The financing of the transformation should be done through the allocation of budget to the team, and the team should be able to measure their value delivered. In summary, the Agile transformation should be an ongoing effort, where the team will continuously inspect and adapt to deliver valuable results. The team should be empowered to drive the change, and continuously gather feedback from all stakeholders to address any concerns or resistance. 
The budget for the transformation should be allocated for the team, and the team should be able to measure their value delivered.” Comment: Some of these responses are a reasonable start, although they are in classic consultant speak. However, the answers regarding covering resistance to change, budget, and funding do not meet my expectations for an Agile transformation with ChatGPT. Follow-up question: Can you recommend outsourcing an Agile transformation to a big consultancy? ChatGPT: “Outsourcing an Agile transformation to a big consultancy can have its benefits, such as access to specialized expertise and resources, as well as a proven track record of successful transformations. However, it can also have its drawbacks, such as a lack of understanding of the organization’s unique culture and constraints, and a lack of ownership and commitment from the organization’s employees. Here are a few things to consider when outsourcing an Agile transformation to a big consultancy: Align with the organization’s goals and objectives: It is important to ensure that the consultancy’s approach aligns with the organization’s goals and objectives and that the consultancy is able to deliver the results that the organization is seeking. Involve employees in the process: To ensure that the transformation is successful, it is important to involve employees at all levels in the process, regardless of whether it is outsourced or not. This will ensure that the change effort is aligned with the organization’s goals and objectives and that employees are engaged and invested in the process. Ensure a good fit: It is important to choose a consultancy that has a good understanding of the organization’s culture and constraints, and that is able to work well with the organization’s employees. A good fit will ensure that the transformation is successful and that the organization’s employees are engaged and invested in the process. Keep the ownership: Even if the consultancy is providing the expertise and resources, it’s important that the ownership of the change stays within the organization. This will ensure that the employees are able to take ownership of the change and that they are invested in the process. Be prepared for resistance: Change is hard, and it is important to be prepared for resistance, regardless of whether the change is outsourced or not. This can include resistance from middle managers and other employees who may be hesitant to embrace change. It is important to address any resistance head-on and to provide support and resources to help employees through the transition. This can include communication and training on the benefits of the change, as well as addressing concerns and providing support during the transition period. Comment: These answers weigh the pros and cons of outsourcing an Agile transformation to an external consultancy. Regarding the previous set of responses, this one is more substantial. Follow-up question: Why do you believe outside consultants with no intimate knowledge of the organization can support an Agile transformation? ChatGPT: While outside consultants may not have intimate knowledge of the organization, they can bring a fresh perspective and specialized expertise to the Agile transformation effort. They can provide valuable insights into best practices and industry trends, and can help the organization identify and address any issues that may be hindering the transformation. 
Additionally, outside consultants can provide training and coaching on Agile methodologies and principles, which can be beneficial for employees who may not have experience with Agile practices. They can also provide support and guidance to the organization’s leadership team, helping them to navigate the complexities of the transformation and to make informed decisions. It’s important to note that, while outside consultants can provide valuable expertise and support, the success of the Agile transformation ultimately depends on the organization’s leadership and employees. Therefore, it is important to involve the employees in the process, to ensure that they understand the reasons for the change and are invested in the process. In summary, while outside consultants may not have intimate knowledge of the organization, they can bring specialized expertise, fresh perspective, and support to the Agile transformation effort. However, the success of the transformation ultimately depends on the organization’s leadership and employees, who should be involved in the process and understand the reasons for the change.” Comment: This response could be from a sales prospect of a consultancy specializing in Agile transformations. Unfortunately, I miss information on how often outsourced change projects fail and why it would be different regarding an Agile transformation. Follow-up question: Is there more I should know? ChatGPT: “There are a few additional things to consider when outsourcing an Agile transformation to a big consultancy: Communication and transparency: It is important to establish clear communication channels between the organization and the consultancy, to ensure that everyone is on the same page and that progress is being made. This includes regular status updates, progress reports, and regular meetings between the organization and the consultancy. Sustainability: The transformation should be sustainable, meaning that the organization should be able to continue the change after the consultancy is gone. The consultancy should provide training, coaching, and support to the organization’s employees, so they can continue to improve and evolve after the consultancy leaves. Tailored approach: The consultancy should be able to tailor their approach to the organization’s needs, and not to force a one-size-fits-all solution. They should be able to adapt to the organization’s culture, constraints, and goals. Measuring success: It’s important to establish metrics to measure the success of the transformation, such as customer satisfaction, employee engagement, and delivery of value. This will help the organization to understand the impact of the transformation and to make data-driven decisions. In summary, outsourcing an Agile transformation to a big consultancy can bring specialized expertise and support to the organization, but it’s important to consider factors such as clear communication, sustainability, tailored approach, measuring success, and employee involvement to ensure the success of the transformation.” Comment: This response is more substantial than the previous one. ChatGPT points to several critical success factors that apply whenever external consultancies join a change project. Maybe, an Agile transformation with ChatGPT guiding would have a (slim) chance of success. Follow-up question: Is it worth spending thousands of dollars on junior consultants from a big consultancy with little practical experience in Agile transformations and no knowledge of your organization? 
ChatGPT: “It is not always necessary to spend thousands of dollars on senior consultants from a big consultancy for an Agile transformation. While they may have more experience and expertise, it’s important to consider the specific needs of your organization and whether junior consultants with less experience but more knowledge about your organization can be as effective. It is important to note that the success of an Agile transformation depends on the ability of the consultants to work with the organization’s employees and understand the organization’s culture and constraints, regardless of their experience level. Junior consultants who have little practical experience in Agile transformations but have knowledge of the organization and its specific needs can be just as effective in leading the transformation. Additionally, it’s important to consider the cost-benefit ratio of hiring a big consultancy firm, and evaluate if the cost of hiring them outweighs the benefits they can bring. Organizations can also consider other options such as hiring an internal Agile coach or finding a smaller, local consultancy that is more specialized in the specific industry. In summary, it’s important to consider the specific needs of the organization and the cost-benefit ratio when deciding whether to hire a big consultancy firm for an Agile transformation. Junior consultants with knowledge of the organization and its specific needs can be just as effective in leading the transformation as senior consultants with more experience.” Comment: This is a bold statement: junior consultants, inexperienced in Agile practices, can successfully settle an Agile transformation if they only know the organization well. I have my doubts in this regard. Conclusion: An Agile Transformation With ChatGPT Given that ChatGPT’s underlying LLM was not trained beyond 2021 and given the niche topic, some of the answers have their weak moments, while others reflect the challenges of an Agile transformation with ChatGPT well. I am eager to learn what will be possible with a newer LLM that also allows training by practitioners in specialized knowledge areas. We thought that AI might initially come after simple office jobs. I am no longer sure about that. Maybe, ChatGPT’s successor will start at the top of the food chain. Would you consider embarking on an Agile transformation with ChatGPT instead of McBoston? Please share your experience with us in the comments.
Why It's Hard to Extract Information From PDF Files

PDF, or Portable Document Format, is a popular file format widely used for documents such as invoices, purchase orders, and other business documents. However, extracting information from PDFs can be a challenging task for developers. One reason is that the format is not structured. Unlike HTML, which has a specific format for tables and headers that developers can easily identify, PDFs do not have a consistent layout for information. This makes it harder for developers to know where to find the specific information they need. Another reason is that there is no standard layout for information: each system generates invoices and purchase orders differently, so developers must often write custom code to extract information from each individual document. This can be a time-consuming and error-prone process. Additionally, PDFs can contain both text and images, making it difficult to programmatically extract information from the document. OCR (optical character recognition) can be used to extract text from images, but this adds complexity to the process and may result in errors if the OCR software is not accurate.

Existing Solutions

Existing solutions for extracting information from PDFs include:

Using regex to match patterns in text after converting the PDF to plain text. Examples include invoice2data and traprange-invoice. However, this method requires knowledge of the format of the data fields.

AI-based cloud services that utilize machine learning to extract structured data from PDFs. Examples include pdftables and docparser, but these are not open-source friendly.

Yet Another Solution for PDF Data Extraction: Using OpenAI

One solution is to use OpenAI's natural language processing capabilities to understand the content of the document. However, OpenAI is not able to work with PDF or image formats directly, so the first step is to convert the PDF to text while retaining the relative positions of the text items. One way to achieve this is to use the PDFLayoutTextStripper library, which uses PDFBox to read through all text items in the PDF file and organize them into lines, keeping the relative positions the same as in the original PDF file. This matters because, for example, if an amount in an invoice's items table ends up in the same column as the quantity, querying for the total amount and total quantity will return incorrect values.

Here is an example of the output from the stripper:

*PO-003847945* Page.........................: 1 of 1 Address...........: Aeeee Consumer Good Co.(QSC) Purchase Order P.O.Box 1234 Dooo, PO-003847945 ABC TL-00074 Telephone........: USR\S.Morato 5/10/2020 3:40 PM Fax...................: 100225 Aaaaaa Eeeeee Date...................................: 5/10/2020 Expected DeliveryDate...: 5/10/2020 Phone........: Attention Information Fax.............: Vendor : TL-00074 AAAA BBBB CCCCCAAI W.L.L. Payment Terms Current month plus 60 days Discount Barcode Item number Description Quantity Unit Unit price Amount Discount 5449000165336 304100 CRET ZERO 350ML PET 5.00 PACK24 54.00 270.00 0.00 0.00 350 5449000105394 300742 CEEOCE EOE SOFT DRINKS 1.25LTR 5.00 PACK6 27.00 135.00 0.00 0.00 1.25 (truncated...)
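As a rough sketch of this conversion step (assuming PDFBox 2.x and the PDFLayoutTextStripper class from the library mentioned above; the package name and file path may differ in your setup and are placeholders here):

import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import io.github.jonathanlink.PDFLayoutTextStripper; // package name may vary by version

public class PdfToLayoutText {
    public static void main(String[] args) throws Exception {
        // Load the purchase order PDF with PDFBox (2.x API).
        try (PDDocument document = PDDocument.load(new File("po.pdf"))) {
            // PDFLayoutTextStripper extends PDFBox's PDFTextStripper and keeps the
            // relative positions of text items, producing output like the sample above.
            String text = new PDFLayoutTextStripper().getText(document);
            System.out.println(text);
        }
    }
}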
Once the PDF has been converted to text, the next step is to call the OpenAI API and pass the text along with queries such as "Extract fields: 'PO Number', 'Total Amount'". The response will be in JSON format, and GSON can be used to parse it and extract the final results. This two-step process of converting the PDF to text and then using OpenAI's natural language processing capabilities can be an effective solution for extracting information from PDF files. The query is as simple as follows, with %s replaced by PO text content: private static final String QUERY = """ Want to extract fields: "PO Number", "Total Amount" and "Delivery Address". Return result in JSON format without any explanation. The PO content is as follows: %s """; The query consists of two components: Specifying the desired fields. Formatting the field values as JSON data for easy retrieval from API response. And here is the example response from OpenAI: { "object": "text_completion", "model": "text-davinci-003", "choices": [ { "text": "\\n{\\n \\"PO Number\\": \\"PO-003847945\\",\\n \\"Total Amount\\": \\"1,485.00\\",\\n \\"Delivery Address\\": \\"Peera Consumer Good Co.(QSC), P.O.Box 3371, Dohe, QAT\\"\\n}", "index": 0, "logprobs": null, "finish_reason": "stop" } ], // ... some more fields } Decoding the text field's JSON string yields the following desired fields: { "PO Number": "PO-003847945", "Total Amount": "1,485.00", "Delivery Address": "Peera Consumer Good Co.(QSC), P.O.Box 3371, Dohe, QAT" } Run Sample Code Prerequisites: Java 16+ Maven Steps: Create an OpenAI account. Log in and generate an API key. Replace OPENAI_API_KEY in Main.java with your key. Update SAMPLE_PDF_FILE if needed. Execute the code and view the results from the output.
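For readers who want to see roughly what that OpenAI call looks like outside of the sample project, here is a minimal sketch using java.net.http.HttpClient and Gson against the completions endpoint with the text-davinci-003 model mentioned above; the token limit and method name are illustrative assumptions, not part of the original sample.
Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import com.google.gson.Gson;
import com.google.gson.JsonObject;

public class OpenAiFieldExtractor {

    // Same query as above; %s is replaced with the PO text produced by the stripper
    private static final String QUERY = """
            Want to extract fields: "PO Number", "Total Amount" and "Delivery Address".
            Return result in JSON format without any explanation.
            The PO content is as follows:
            %s
            """;

    public static String extractFields(String poText, String apiKey) throws Exception {
        Gson gson = new Gson();

        // Build the completion request body
        JsonObject body = new JsonObject();
        body.addProperty("model", "text-davinci-003");
        body.addProperty("prompt", QUERY.formatted(poText));
        body.addProperty("max_tokens", 256);   // assumed limit, adjust as needed
        body.addProperty("temperature", 0);    // deterministic output suits extraction

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.openai.com/v1/completions"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(gson.toJson(body)))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The completion text lives in choices[0].text and should itself be a JSON object
        JsonObject root = gson.fromJson(response.body(), JsonObject.class);
        return root.getAsJsonArray("choices")
                .get(0).getAsJsonObject()
                .get("text").getAsString()
                .trim();
    }
}
Parsing the returned string with Gson a second time yields the "PO Number", "Total Amount", and "Delivery Address" values shown above.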
Recently, we learned of a breach of CircleCI. They strongly suggest that all their customers: Immediately rotate any and all secrets stored in CircleCI. Review internal logs for their systems for unauthorized access from December 21, 2022, through January 4, 2023, or the date you rotated the secrets in CircleCI. The CircleCI team also invalidated all Project API tokens and informed users that they would need to be replaced. This moment can serve as an inflection point, a chance for any team relying on CI/CD infrastructure to review their pipeline security. While the steps the CircleCI team laid out are solid, they are reactions to an event. There are some steps DevOps teams can take to become more proactive. Identity Management and Access Control in CI/CD If you are new to this topic, a continuous integration/continuous delivery (CI/CD) pipeline is a way to deliver applications through a consistent automated process. Developers push their code to repository hosts like GitHub, GitLab, Azure DevOps, or BitBucket, which starts the automated processes of building the application, testing it, and finally deploying that code to production. While CI/CD pipelines allow teams to work faster and with fewer chances for manual errors, they also present a new attack surface for malicious actors to exploit. One of the best ways to ensure attackers cannot exploit your CI/CD pipelines is by locking down who can access which tools and resources. If someone is not explicitly granted access privileges, they should not be able to gain access. This can include passwords, access keys, and other access control mechanisms and should be implemented as granularly as possible. While it might be convenient to give all members of the DevOps team, developers and engineers alike, access to everything, this is a very bad idea, as it significantly increases the chances that a single breached account will allow attackers full access to all related systems. We encourage everyone to enforce single sign-on (SSO) or use role-based access controls (RBAC) wherever possible. These can help you tightly scope access, granting the right people access to only the relevant systems. Secure Non-Human Access We have discussed human access, but it is also critical to properly scope and manage the access of third-party services and tools. When deploying any element of your CI/CD pipeline, you should know exactly what systems and services will send requests and where those responses will go. If malicious code does find its way into your pipeline, make sure it cannot phone home or communicate with unauthorized users. When relying on container-based systems, ensure you employ authenticators to verify machine identity. An authenticator verifies a container's attributes and only grants the scoped access level if approved. One final note on non-human access: you should eliminate any unused assets as soon as possible. This means destroying containers and VMs once they are no longer needed and blocking access to third-party tools if they are not being actively used. Don't Expose Secrets in Your CI/CD Pipeline One of the first things attackers look for within any system is a way to gain further access, namely hardcoded credentials, also known as secrets. While it might seem obvious that you should not store your secrets in plaintext, this happens more often than you might expect.
Given the complexity of CI/CD pipelines and the rate at which they evolve, it is often the easiest and quickest solution developers turn to when rushing to meet deadlines. It is also easy to overlook how secrets are presented in your pipeline logs, where they can show up if not properly redacted. Fortunately, most CI/CD providers offer built-in solutions that allow you to store credentials and then programmatically call them when needed. Alternatively, most solutions offer paths to integrate key managers like Hashicorp Vault into your pipelines. A critical step towards keeping plaintext credentials away from your pipelines is implementing secrets detection as early as possible. The best case is when developers catch hardcoded secrets before they can commit them, which they can do on their local machines with pre-commit or pre-push hooks, on the server side with pre-receive hooks, or in a CI environment. Automate Secrets Rotation Another important element of secrets management is controlling the lifetime of each credential. The longer a secret remains valid, the more likely it can be found and used by an attacker. One of the biggest reasons many credentials live for as long as they do is the pain caused by manually rotating keys and the fear of disrupting production systems. That is where automatic key rotation saves the day. Most platform providers like AWS, Google Cloud, and Azure DevOps either offer automated key rotation or provide straightforward ways to script it. This can be executed manually or on a fixed schedule. Automating key rotation can help everyone rest easy in an emergency, as it allows the invalidation and replacement process to be tested outside of an incident. When automating secrets creation, remember to scope those credentials tightly to their task. Always follow the principle of least privilege, limiting access to only the necessary data and systems to complete the job at hand. Actively Monitor for Suspicious Behavior Looking for suspicious activity after an event can help you identify what went wrong or who accessed what systems during an attack. But you don't need to wait for something bad to happen to leverage your logs. When most people think about actively monitoring CI/CD, it is with the intention of watching for resource consumption or availability. However, you can also configure alerts for suspicious activity, such as the unexpected addition of a privileged role or attempts to connect to an unauthorized service. Honeytokens for Intrusion Detection Another way to protect your CI/CD pipelines is to set traps that intruders can trigger throughout your environment. These are commonly referred to as honeytokens. Honeytokens are credentials that look like real secrets but are not valid against any service; when used, they trigger an alert that reports details about the user who tried them, such as their IP address. One way to consistently deploy honeytokens is to make them a part of your automated environment builds, leveraging tools like Terraform and our open-source project ggcanary. Canary tokens can be created and deployed in your CI/CD pipelines, as well as code repositories or project management systems. Your CI/CD Security Is a Shared Responsibility One of the main things users expect from their service providers is that they will keep their platform secure. The reality is that malicious attackers will continue to try to penetrate and exploit any and all targets, including CI/CD providers.
Ultimately, securing your pipelines is a joint effort of all the players involved, including your team. While it's unfortunate that CircleCI was breached and so many customers were affected, we also want to make this a call to action for anyone relying on a CI/CD pipeline to review their security practices. This article lays out just a few of the ways you can secure your CI/CD pipeline. For more information on CI/CD security, we recommend reviewing the OWASP Top 10 CI/CD Security Risks.
For more than three decades – practically an eon in tech years – Excel has greased the wheels of businesses and aspiring individuals around the world. In the US alone, more than 700,000 companies are estimated to use Excel, and worldwide the number of unique Excel users hovers at just over a billion – about 1/8th of the global population. However, with a fast-developing era of subscription-based, point-and-click enterprise SaaS data solutions well underway, it’s becoming clear that using Excel in the “traditional” sense is less obviously efficient than it once was. Clicking on the Excel desktop icon, selecting “New Workbook” and formatting a unique document from a blank page or template is just enough of a resource commitment to put you behind the eight ball by today’s lightning-fast project delivery standards. The mundane, predictable nature of Excel’s usage has, for well over a decade, begged for and received the same form of intervention which much of the restless tech community constantly marches toward: automation. Thankfully, automating the creation and manipulation of standard Excel documents requires simple programmatic intervention. Just about any operation you can perform within Excel documents manually can be performed programmatically with access to the right libraries. Once those doors are opened, the process of customizing Excel documents and integrating data within them at scale becomes instantly more efficient (this approach reduces the natural element of human input error that we expect from manual data entry and spreadsheet formatting). If we know, for example, that a monthly report spreadsheet only requires up-to-date information from a consistent set of data sources, it’s ideal if we can configure those sources to write information directly to the file without a human being acting as the middleman. We don’t have to look far to find examples of modern, commonly used Excel editing libraries – the extremely popular Pandas library in Python, for example, natively supports programmatic Excel editing with a variety of useful functions. Since not all programming languages (like Java, for example) natively support Excel manipulation the way Python does, leveraging an external API is one great way to achieve Excel automation. Especially for applications processing Excel files at scale, external API calls can deliver the required libraries for Excel file manipulation along with a welcome boost to overall efficiency by offloading the processing power for that task to a separate server. Demonstration In the remainder of this article, I’ll demonstrate several free-to-use API solutions which are collectively designed for programmatically creating and editing Excel files securely and with ease. All require just a few lines of complementary, ready-to-run Java code (supplied below) along with a free-tier API key to complete your API call. The API solutions demonstrated in this article include the following: Begin editing a document: This API initiates document upload to begin editing operations on existing Excel documents, returning a secure editing URL (which will cache for 30 minutes and cannot be used to directly access the document). Create a blank Excel XLSX spreadsheet: This API returns the encoding for a blank Excel XLSX format file, along with its own editing URL. Insert a new worksheet into an Excel XLSX spreadsheet: This API creates and inserts a new worksheet using the temporary editing URL from either an existing or programmatically generated XLSX file. 
Set and update cell contents in an Excel XLSX spreadsheet, worksheet by cell identifier: This API allows content to be placed and formatted within specified cells in an Excel worksheet using a temporary editing URL. Finish editing a document: This API finalizes document editing by processing the temporary editing URL and providing the encoding for the final, edited Excel file in return. To summarize the above, you can securely edit existing Excel documents programmatically using the temporary editing URL generated via the “Begin Editing a Document” API call. To create a new Excel document from scratch, you can call the “Create a Blank Excel XLSX Spreadsheet” API instead, which also generates its own version of a temporary editing URL. Once a temporary editing URL is generated in either scenario, you can subsequently insert new worksheets into your document or write information to specific cells by passing the URL through each API’s input request parameters. For both pre-existing and programmatically generated Excel documents, once the editing process is complete, the temporary editing URL must be processed in the “Finish Editing” stage to return the updated XLSX file encoding. This final step is necessary to ensure your document’s security; the temporary editing URL cannot be used to access the document directly, and it will only stay cached for 30 minutes. Below, I will provide code to structure each individual API call in Java. To begin, however, you’ll first need to follow a few steps to install the Java SDK. To install with Maven, first, add a reference to the repository in pom.xml: XML <repositories> <repository> <id>jitpack.io</id> <url>https://jitpack.io</url> </repository> </repositories> And then add a reference to the dependency in pom.xml: XML <dependencies> <dependency> <groupId>com.github.Cloudmersive</groupId> <artifactId>Cloudmersive.APIClient.Java</artifactId> <version>v4.25</version> </dependency> </dependencies> Alternatively, to install with Gradle, add it to your root build.gradle at the end of repositories: allprojects { repositories { ... maven { url 'https://jitpack.io' } } } And then add the dependency in build.gradle: dependencies { implementation 'com.github.Cloudmersive:Cloudmersive.APIClient.Java:v4.25' } With installation complete, you can turn your attention to structuring each individual API call. 1. Begin Editing a Document To begin the process of uploading and editing an existing Excel document based on its file path, first, call the “Begin Editing” API using the following code examples. This will return a temporary editing URL: Java // Import classes: //import com.cloudmersive.client.invoker.ApiClient; //import com.cloudmersive.client.invoker.ApiException; //import com.cloudmersive.client.invoker.Configuration; //import com.cloudmersive.client.invoker.auth.*; //import com.cloudmersive.client.EditDocumentApi; ApiClient defaultClient = Configuration.getDefaultApiClient(); // Configure API key authorization: Apikey ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey"); Apikey.setApiKey("YOUR API KEY"); // Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null) //Apikey.setApiKeyPrefix("Token"); EditDocumentApi apiInstance = new EditDocumentApi(); File inputFile = new File("/path/to/inputfile"); // File | Input file to perform the operation on.
try { String result = apiInstance.editDocumentBeginEditing(inputFile); System.out.println(result); } catch (ApiException e) { System.err.println("Exception when calling EditDocumentApi#editDocumentBeginEditing"); e.printStackTrace(); } 2. Create a Blank Excel XLSX Spreadsheet To begin your programmatic editing process with a brand-new file, use the following code instead. This will automatically generate a blank XLSX file and return a temporary URL: Java // Import classes: //import com.cloudmersive.client.invoker.ApiClient; //import com.cloudmersive.client.invoker.ApiException; //import com.cloudmersive.client.invoker.Configuration; //import com.cloudmersive.client.invoker.auth.*; //import com.cloudmersive.client.EditDocumentApi; ApiClient defaultClient = Configuration.getDefaultApiClient(); // Configure API key authorization: Apikey ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey"); Apikey.setApiKey("YOUR API KEY"); // Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null) //Apikey.setApiKeyPrefix("Token"); EditDocumentApi apiInstance = new EditDocumentApi(); CreateBlankSpreadsheetRequest input = new CreateBlankSpreadsheetRequest(); // CreateBlankSpreadsheetRequest | Document input request try { CreateBlankSpreadsheetResponse result = apiInstance.editDocumentXlsxCreateBlankSpreadsheet(input); System.out.println(result); } catch (ApiException e) { System.err.println("Exception when calling EditDocumentApi#editDocumentXlsxCreateBlankSpreadsheet"); e.printStackTrace(); } 3. Insert a New Worksheet Into an Excel Spreadsheet To insert a new worksheet into your Excel document, use the code below to make your API call. You will need to pass the temporary editing URL through the InputFileUrl request parameter, and specify the path and name of your new worksheet: Java // Import classes: //import com.cloudmersive.client.invoker.ApiClient; //import com.cloudmersive.client.invoker.ApiException; //import com.cloudmersive.client.invoker.Configuration; //import com.cloudmersive.client.invoker.auth.*; //import com.cloudmersive.client.EditDocumentApi; ApiClient defaultClient = Configuration.getDefaultApiClient(); // Configure API key authorization: Apikey ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey"); Apikey.setApiKey("YOUR API KEY"); // Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null) //Apikey.setApiKeyPrefix("Token"); EditDocumentApi apiInstance = new EditDocumentApi(); InsertXlsxWorksheetRequest input = new InsertXlsxWorksheetRequest(); // InsertXlsxWorksheetRequest | Document input request try { InsertXlsxWorksheetResponse result = apiInstance.editDocumentXlsxInsertWorksheet(input); System.out.println(result); } catch (ApiException e) { System.err.println("Exception when calling EditDocumentApi#editDocumentXlsxInsertWorksheet"); e.printStackTrace(); } 4. Set, Update Cell Contents in an Excel Spreadsheet, Worksheet by Cell Identifier To go about setting cell contents in your Excel file, use the code below, and once again include the original file’s temporary editing URL when you make your API call. The request parameters for this API include the option to specify worksheet, cell identifier, cell value, and more (enumerated in an example request body below). Within the cell value specification, you can include text contents in your specified cell(s), format styles based on style index, and/or configure formulas for that specific cell. 
Java // Import classes: //import com.cloudmersive.client.invoker.ApiClient; //import com.cloudmersive.client.invoker.ApiException; //import com.cloudmersive.client.invoker.Configuration; //import com.cloudmersive.client.invoker.auth.*; //import com.cloudmersive.client.EditDocumentApi; ApiClient defaultClient = Configuration.getDefaultApiClient(); // Configure API key authorization: Apikey ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey"); Apikey.setApiKey("YOUR API KEY"); // Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null) //Apikey.setApiKeyPrefix("Token"); EditDocumentApi apiInstance = new EditDocumentApi(); SetXlsxCellByIdentifierRequest input = new SetXlsxCellByIdentifierRequest(); // SetXlsxCellByIdentifierRequest | Document input request try { SetXlsxCellByIdentifierResponse result = apiInstance.editDocumentXlsxSetCellByIdentifier(input); System.out.println(result); } catch (ApiException e) { System.err.println("Exception when calling EditDocumentApi#editDocumentXlsxSetCellByIdentifier"); e.printStackTrace(); } Please refer to the below example to properly structure your input request: JSON { "InputFileBytes": "string", "InputFileUrl": "string", "WorksheetToUpdate": { "Path": "string", "WorksheetName": "string" }, "CellIdentifier": "string", "CellValue": { "Path": "string", "TextValue": "string", "CellIdentifier": "string", "StyleIndex": 0, "Formula": "string" } } 5. Finish Editing a Document, Download Result From Document Editing When your editing process is complete, call the “finish editing” operation using the code provided below. You only need your file’s temporary editing URL for this operation, and it will return proper XLSX file encoding: Java // Import classes: //import com.cloudmersive.client.invoker.ApiClient; //import com.cloudmersive.client.invoker.ApiException; //import com.cloudmersive.client.invoker.Configuration; //import com.cloudmersive.client.invoker.auth.*; //import com.cloudmersive.client.EditDocumentApi; ApiClient defaultClient = Configuration.getDefaultApiClient(); // Configure API key authorization: Apikey ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey"); Apikey.setApiKey("YOUR API KEY"); // Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null) //Apikey.setApiKeyPrefix("Token"); EditDocumentApi apiInstance = new EditDocumentApi(); FinishEditingRequest reqConfig = new FinishEditingRequest(); // FinishEditingRequest | Cloudmersive Document URL to complete editing on try { byte[] result = apiInstance.editDocumentFinishEditing(reqConfig); System.out.println(result); } catch (ApiException e) { System.err.println("Exception when calling EditDocumentApi#editDocumentFinishEditing"); e.printStackTrace(); } Once you’ve completed your “Finish Editing” API call, you can use the output XLSX file encoding to generate your new file. For your reference, the API solutions provided above are only a few of many document editing API iterations available through the Cloudmersive Convert API. You may also programmatically rename worksheets, get specific columns from a worksheet, clear contents, get macro information, and much more using additional APIs (starting and ending with the “Begin Editing” and “Finish Editing” API calls).
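Coming back to the final step: the byte[] returned by the “Finish Editing” call is the complete XLSX file, so saving it to disk is straightforward. Here is a minimal sketch, with the output path being an assumption for illustration:
Java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveWorkbook {
    // 'xlsxBytes' is the byte[] returned by editDocumentFinishEditing above
    public static void save(byte[] xlsxBytes) throws IOException {
        Path output = Path.of("edited-workbook.xlsx");  // placeholder output path
        Files.write(output, xlsxBytes);
        System.out.println("Saved edited spreadsheet to " + output.toAbsolutePath());
    }
}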
As much as the JPA Criteria builder is expressive, JPA queries are often equally verbose, and the API itself can be unintuitive to use, especially for newcomers. In the Quarkus ecosystem, Panache is a partial remedy for these problems when using Hibernate. Still, I find myself juggling Panache’s helper methods, preconfigured enums, and raw strings when composing anything but the simplest of queries. You could claim I am just inexperienced and impatient or, instead, acknowledge that the perfect API is frictionless to use for everyone. Thus, the user experience of writing JPA queries can be further improved in that direction. Introduction One of the remaining shortcomings is that raw strings are inherently not type-safe, meaning my IDE denies me the helping hand of code completion and wishes me good luck at best. On the upside, Quarkus facilitates application relaunches in a split second to issue quick verdicts on my code. And nothing beats the heartfelt joy and genuine surprise when I have composed a working query on the fifth, rather than the tenth, attempt... With this in mind, we built the open-source library JPAstreamer to make the process of writing Hibernate queries more intuitive and less time-consuming while leaving your existing codebase intact. It achieves this goal by allowing queries to be expressed as standard Java Streams. Upon execution, JPAstreamer translates the stream pipeline to an HQL query for efficient execution and avoids materializing anything but the relevant results. Let me take an example—in some random database exists a table called Person represented in a Hibernate application by the following standard Entity: Java @Entity @Table(name = "person") public class Person { @Id @GeneratedValue(strategy = GenerationType.IDENTITY) @Column(name = "person_id", nullable = false, updatable = false) private Integer personId; @Column(name = "first_name", nullable = false, columnDefinition = "varchar(45)") private String firstName; @Column(name = "last_name", nullable = false, columnDefinition = "varchar(45)") private String lastName; @Column(name = "created_at", nullable = false, updatable = false) private LocalDateTime createdAt; // Getters for all fields will follow from here } To fetch the Person with an Id of 1 using JPAstreamer, all you need is the following: Java @ApplicationScoped public class PersonRepository { @PersistenceUnit EntityManagerFactory entityManagerFactory; private final JPAStreamer jpaStreamer; public PersonRepository(EntityManagerFactory entityManagerFactory) { jpaStreamer = JPAStreamer.of(entityManagerFactory); <1> } public Optional<Person> getPersonById(int id) { return this.jpaStreamer.from(Person.class) <2> .filter(Person$.personId.equal(id)) <3> .findAny(); } } <1> Initialize JPAstreamer in one line; the underlying JPA provider handles the DB configuration. <2> The stream source is set to be the Person table. <3> The filter operation is treated as an SQL WHERE clause and the condition is expressed type-safely with JPAstreamer predicates (more on this to follow). Despite it looking as if JPAstreamer operates on all Person objects, the pipeline is optimized to a single query, in this case: Plain Text select person0_.person_id as person_id1_0_, person0_.first_name as first_na2_0_, person0_.last_name as last_nam3_0_, person0_.created_at as created_4_0_ from person person0_ where person0_.person_id=1 Thus, only the Person matching the search criteria is ever materialized.
Next, we can look at a more complex example in which I am searching for Persons with a first name ending with an “A” and a last name that starts with “B.” The matches are sorted primarily by first name and secondarily by last name. I further decide to apply an offset of 5, excluding the first five results, and to limit the total results to 10. Here is the stream pipeline to achieve this task: Java List<Person> list = jpaStreamer.stream(Person.class) .filter(Person$.firstName.endsWith("A").and(Person$.lastName.startsWith("B"))) <1> .sorted(Person$.firstName.comparator().thenComparing(Person$.lastName.comparator())) <2> .skip(5) <3> .limit(10) <4> .collect(Collectors.toList()); <1> Filters can be combined with the and/or operators. <2> Easily sort on one or more properties. <3> Skip the first 5 Persons. <4> Return at most 10 Persons. In the context of queries, the stream operators filter, sort, limit, and skip all have a natural mapping that makes the resulting query expressive and intuitive to read while remaining compact. This query is translated by JPAstreamer to the following HQL statement: Plain Text select person0_.person_id as person_id1_0_, person0_.first_name as first_na2_0_, person0_.last_name as last_nam3_0_, person0_.created_at as created_4_0_ from person person0_ where (person0_.first_name like ?) and (person0_.last_name like ?) order by person0_.first_name asc, person0_.last_name asc limit ?, ? How JPAstreamer Works Okay, it looks simple. But how does it work? JPAstreamer uses an annotation processor to form a meta-model at compile time. It inspects any classes marked with the standard JPA annotation @Entity, and for every entity Foo.class, a corresponding Foo$.class is created. The generated classes represent entity attributes as Fields used to form predicates of the form User$.firstName.startsWith("A") that can be interpreted by JPAstreamer’s query optimizer. It is worth repeating that JPAstreamer does not alter or disturb the existing codebase but merely extends the API to handle Java stream queries. Installing the JPAstreamer Extension JPAstreamer is installed as any other Quarkus extension, using a Maven dependency: XML <dependency> <groupId>io.quarkiverse.jpastreamer</groupId> <artifactId>quarkus-jpastreamer</artifactId> <version>1.0.0</version> </dependency> After the dependency is added, rebuild your Quarkus application to trigger JPAstreamer’s annotation processor. The installation is complete once the generated fields reside in /target/generated-sources; you’ll recognize them by the trailing $ in the class names, e.g., Person$.class. Note: JPAstreamer requires an underlying JPA provider, such as Hibernate. For this reason, JPAstreamer needs no additional configuration as the database integration is taken care of by the JPA provider. JPAstreamer and Panache Any Panache fan will note that JPAstreamer shares some of its objectives with Panache, in simplifying many common queries. Still, JPAstreamer distinguishes itself by instilling more confidence in the queries with its type-safe stream interface. However, no one is forced to pick sides, as Panache and JPAstreamer work seamlessly alongside each other. Note: Here is an example Quarkus application that uses both JPAstreamer and Panache. At the time of writing, JPAstreamer does not have support for Panache’s Active Record Pattern, as it relies on standard JPA Entities to generate its meta-model. This will likely change in the near future.
Summary JPA in general, and Hibernate in particular, have greatly simplified application database access, but their APIs sometimes force unnecessary complexity. With JPAstreamer, you can utilize JPA while keeping your codebase clean and maintainable.
ChatGPT can help you generate API tests in REST Assured! ChatGPT is a type of AI that can comprehend and reply to human input, like a highly intelligent robot you can have a conversation with. Just open an account for free and start chatting with it as you would with a friend. Let's begin by asking ChatGPT to generate an API specification so that we can show how it can also create the tests later on. Me to ChatGPT: "Create an OpenAPI specification for a tracking API of a new shipping company called FoxShipping." It generates a sample OpenAPI spec for our new FoxShipping tracking API: YAML openapi: 3.0.0 info: title: FoxShipping Tracking API version: 1.0.0 description: This API allows customers to track their shipments with FoxShipping. contact: name: FoxShipping Support email: support@foxshipping.com url: https://www.foxshipping.com servers: - url: https://api.foxshipping.com/tracking paths: /: post: summary: Track a shipment with FoxShipping operationId: trackShipment requestBody: required: true content: application/json: schema: type: object properties: trackingNumber: type: string example: 1234567890 apiKey: type: string example: YOUR_API_KEY responses: 200: description: Successful tracking request content: application/json: schema: type: object properties: trackingNumber: type: string example: 1234567890 status: type: string example: "Delivered" estimatedDelivery: type: string format: date-time example: "2022-01-01T12:00:00Z" 401: description: Unauthorized request 404: description: Tracking number not found Now, we can ask it to create the API tests for it. Me to ChatGPT: "Write an API test for FoxShipping Tracking API using RestAssured." Java import io.restassured.RestAssured; import io.restassured.response.Response; import org.junit.BeforeClass; import org.junit.Test; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertTrue; public class FoxShippingTrackingAPITest { private static final String API_KEY = "YOUR_API_KEY"; @BeforeClass public static void setup() { RestAssured.baseURI = "https://api.foxshipping.com/tracking"; } @Test public void testTrackShipment() { String trackingNumber = "1234567890"; String requestBody = "{\"trackingNumber\":\"" + trackingNumber + "\",\"apiKey\":\"" + API_KEY + "\"}"; Response response = RestAssured.given(). contentType("application/json"). body(requestBody). when(). post(); int statusCode = response.getStatusCode(); assertEquals(200, statusCode); String responseBody = response.getBody().asString(); assertTrue(responseBody.contains(trackingNumber)); assertTrue(responseBody.contains("status")); assertTrue(responseBody.contains("estimatedDelivery")); } } Looks decent, but we need to cover the error cases as well. We follow up. Me to ChatGPT: "Please also add the tests for the error scenarios." import io.restassured.RestAssured; import io.restassured.response.Response; import org.junit.BeforeClass; import org.junit.Test; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertTrue; public class FoxShippingTrackingAPITest { private static final String API_KEY = "YOUR_API_KEY"; @BeforeClass public static void setup() { RestAssured.baseURI = "https://api.foxshipping.com/tracking"; } @Test public void testTrackShipment() { String trackingNumber = "1234567890"; String requestBody = "{\"trackingNumber\":\"" + trackingNumber + "\",\"apiKey\":\"" + API_KEY + "\"}"; Response response = RestAssured.given(). contentType("application/json"). body(requestBody). when().
post(); int statusCode = response.getStatusCode(); assertEquals(200, statusCode); String responseBody = response.getBody().asString(); assertTrue(responseBody.contains(trackingNumber)); assertTrue(responseBody.contains("status")); assertTrue(responseBody.contains("estimatedDelivery")); } @Test public void testUnauthorizedRequest() { String trackingNumber = "1234567890"; String requestBody = "{\"trackingNumber\":\"" + trackingNumber + "\",\"apiKey\":\"INVALID_API_KEY\"}"; Response response = RestAssured.given(). contentType("application/json"). body(requestBody). when(). post(); int statusCode = response.getStatusCode(); assertEquals(401, statusCode); String responseBody = response.getBody().asString(); assertTrue(responseBody.contains("Unauthorized request")); } @Test public void testTrackingNumberNotFound() { String trackingNumber = "INVALID_TRACKING_NUMBER"; String requestBody = "{\"trackingNumber\":\"" + trackingNumber + "\",\"apiKey\":\"" + API_KEY + "\"}"; Response response = RestAssured.given(). contentType("application/json"). body(requestBody). when(). post(); int statusCode = response.getStatusCode(); assertEquals(404, statusCode); String responseBody = response.getBody().asString(); assertTrue(responseBody.contains("Tracking number not found")); } } With that, we now have 3 test cases that cover all 3 scenarios outlined in the FoxShipping Tracking service OpenAPI specification. These serve as a solid foundation for you to use as a starting point and build upon by adding your own customizations and additional test cases. What other topics related to using ChatGPT in testing would you like to learn more about? Please leave your suggestions in the comments below and I will write about them. Feel free to follow me on DZone and on LinkedIn for more content like this. P.S. I generated the image for this article you have seen in the header using DALL-E 2 from OpenAI by asking it for "an AI entity pair programming with a human developer." It's pretty good! Please note that it is not recommended to paste your company's private proprietary code into ChatGPT for security reasons unless you have a good reason and permission to do so! This article's purpose is to demonstrate the power of the new AI technology that might disrupt the testing landscape. If you are looking for specific test-generation recipes for your team, feel free to reach out to me directly.
In the cloud-native era, we often hear that "security is job zero," which means it's even more important than any number one priority. Modern infrastructure and methodologies bring us enormous benefits, but, at the same time, since there are more moving parts, there are more things to worry about: How do you control access to your infrastructure? How do you control access between services? Who can access what? Many of these questions come down to policies: a bunch of security rules, criteria, and conditions. Examples: Who can access this resource? Which subnet is egress traffic allowed from? Which clusters must a workload be deployed to? Which protocols are not allowed for servers reachable from the Internet? Which registries can binaries be downloaded from? Which OS capabilities can a container execute with? At which times of day can the system be accessed? All organizations have policies since they encode important knowledge about how to comply with legal requirements, work within technical constraints, avoid repeating mistakes, and so on. Since policies are so important today, let's dive deeper into how to best handle them in the cloud-native era. Why Policy-as-Code? Policies are based on written or unwritten rules that permeate an organization's culture. So, for example, there might be a written rule in our organization explicitly saying: For servers accessible from the Internet on a public subnet, it's not a good practice to expose a port using the non-secure "HTTP" protocol. How do we enforce it? If we create infrastructure manually, the four-eyes principle may help: always have a second person alongside when doing something critical. If we do Infrastructure as Code and create our infrastructure automatically with tools like Terraform, a code review could help. However, the traditional policy enforcement process has a few significant drawbacks: You can't guarantee this policy will never be broken. People can't be aware of all the policies at all times, and it's not practical to manually check against a list of policies. For code reviews, even senior engineers will not likely catch all potential issues every single time. Even if we've got the best teams in the world enforcing policies with no exceptions, it's difficult, if not impossible, to scale. Modern organizations are more likely to be agile, which means the number of employees, services, and teams continues to grow. There is no way to physically staff a security team to protect all of those assets using traditional techniques. Policies could be (and will be) breached sooner or later because of human error. It's not a question of "if" but "when." And that's precisely why most organizations (if not all) do regular security checks and compliance reviews before a major release, for example. We violate policies first and then create ex post facto fixes. I know, this doesn't sound right. What's the proper way of managing and enforcing policies, then? You've probably already guessed the answer, and you are right. Read on. What Is Policy-as-Code (PaC)? As business, teams, and maturity progress, we'll want to shift from manual policy definition to something more manageable and repeatable at the enterprise scale. How do we do that? First, we can learn from successful experiments in managing systems at scale: Infrastructure-as-Code (IaC): treat the content that defines your environments and infrastructure as source code.
DevOps: the combination of people, process, and automation to achieve "continuous everything," continuously delivering value to end users. Policy-as-Code (PaC) is born from these ideas. Policy as code uses code to define and manage policies, which are rules and conditions. Policies are defined, updated, shared, and enforced using code and leveraging Source Code Management (SCM) tools. By keeping policy definitions in source code control, whenever a change is made, it can be tested, validated, and then executed. The goal of PaC is not to detect policy violations but to prevent them. This leverages the DevOps automation capabilities instead of relying on manual processes, allowing teams to move more quickly and reducing the potential for mistakes due to human error. Policy-as-Code vs. Infrastructure-as-Code The "as code" movement isn't new anymore; it aims at "continuous everything." The concept of PaC may sound similar to Infrastructure as Code (IaC), but while IaC focuses on infrastructure and provisioning, PaC improves security operations, compliance management, data management, and beyond. PaC can be integrated with IaC to automatically enforce infrastructural policies. Now that we've got the PaC vs. IaC question sorted out, let's look at the tools for implementing PaC. Introduction to Open Policy Agent (OPA) The Open Policy Agent (OPA, pronounced "oh-pa") is a Cloud Native Computing Foundation incubating project. It is an open-source, general-purpose policy engine that aims to provide a common framework for applying policy-as-code to any domain. OPA provides a high-level declarative language (Rego, pronounced "ray-go," purpose-built for policies) that lets you specify policy as code. As a result, you can define, implement, and enforce policies in microservices, Kubernetes, CI/CD pipelines, API gateways, and more. In short, OPA works in a way that decouples decision-making from policy enforcement. When a policy decision needs to be made, you query OPA with structured data (e.g., JSON) as input, then OPA returns the decision (see the policy decoupling diagram). OK, less talk, more work: show me the code. Simple Demo: Open Policy Agent Example Prerequisite To get started, download an OPA binary for your platform from GitHub releases: On macOS (64-bit): curl -L -o opa https://openpolicyagent.org/downloads/v0.46.1/opa_darwin_amd64 chmod 755 ./opa Tested on an M1 Mac; it works as well. Spec Let's start with a simple example to achieve Attribute-Based Access Control (ABAC) for a fictional Payroll microservice. The rule is simple: you can only access your salary information or your subordinates', not anyone else's. So, if you are bob, and john is your subordinate, then you can access the following: /getSalary/bob /getSalary/john But accessing /getSalary/alice as user bob would not be possible. Input Data and Rego File Let's say we have the structured input data (input.json file): { "user": "bob", "method": "GET", "path": ["getSalary", "bob"], "managers": { "bob": ["john"] } } And let's create a Rego file.
Here we won't bother too much with the syntax of Rego, but the comments should give you a good understanding of what this piece of code does: File example.rego: package example default allow = false # default: not allow allow = true { # allow if: input.method == "GET" # method is GET input.path = ["getSalary", person] input.user == person # input user is the person } allow = true { # allow if: input.method == "GET" # method is GET input.path = ["getSalary", person] managers := input.managers[input.user][_] contains(managers, person) # input user is the person's manager } Run The following should evaluate to true: ./opa eval -i input.json -d example.rego "data.example.allow" If we change the path in the input.json file to "path": ["getSalary", "john"], it still evaluates to true, since the second rule allows a manager to check their subordinates' salary. However, if we change the path in the input.json file to "path": ["getSalary", "alice"], it would evaluate to false. Here we go. Now we have a simple working solution of ABAC for microservices! Policy as Code Integrations The example above is very simple and only useful to grasp the basics of how OPA works. But OPA is much more powerful and can be integrated with many of today's mainstream tools and platforms, like: Kubernetes Envoy AWS CloudFormation Docker Terraform Kafka Ceph And more. To quickly demonstrate OPA's capabilities, imagine Terraform code defining an auto-scaling group and a server on AWS, plus a Rego policy that calculates a score based on the Terraform plan and returns a decision according to that score. It's super easy to automate the process: run terraform plan -out tfplan to create the Terraform plan, terraform show -json tfplan | jq > tfplan.json to convert the plan into JSON format, and opa exec --decision terraform/analysis/authz --bundle policy/ tfplan.json to get the result.
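Beyond the CLI, OPA is commonly run as a lightweight server (for example, ./opa run --server example.rego) and queried over its REST API, which is how the decision decoupling described earlier typically looks from a microservice. Below is a minimal sketch in Java, assuming OPA is listening on its default port 8181 with the example.rego policy above loaded:
Java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OpaClient {
    public static void main(String[] args) throws Exception {
        // Same document as input.json, wrapped in the "input" envelope the OPA REST API expects
        String input = """
                {
                  "input": {
                    "user": "bob",
                    "method": "GET",
                    "path": ["getSalary", "john"],
                    "managers": { "bob": ["john"] }
                  }
                }
                """;

        // Ask OPA for the decision at data.example.allow
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8181/v1/data/example/allow"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(input))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Expected response: {"result": true}, since john is bob's subordinate
        System.out.println(response.body());
    }
}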