In today's text, I will describe and explain the OpenID Connect Flows. The authentication processes are described in the OpenID Connect specification. Because OpenID Connect is built on top of OAuth, some of the concepts below carry the same meaning as they do in OAuth.

What Is an OpenID Connect Flow?

A Flow is the OpenID Connect counterpart of the OAuth Grant Type: a process of obtaining an Access Token. It describes the exact sequence of steps involved in handling a particular request and, as a result, affects how the applications involved in handling that request communicate with one another. Everything is more or less similar to the Grant Types known from OAuth. However, the abstract protocol works slightly differently in OpenID Connect:

1. The RP (Client) sends a request to the OpenID Provider (OP).
2. The OP authenticates the End-User and obtains authorization.
3. The OP responds with an ID Token and usually an Access Token.
4. The RP can send a request with the Access Token to the UserInfo Endpoint.
5. The UserInfo Endpoint returns Claims about the End-User.

As for the abbreviations and concepts used in the description above:

Claim: a piece of information asserted about an Entity.
RP (Relying Party): an OAuth 2.0 Client that requires End-User Authentication and Claims from an OpenID Provider.
OP (OpenID Provider): an OAuth 2.0 Authorization Server that is capable of authenticating the End-User and that provides Claims to a Relying Party about the Authentication event and the End-User.
UserInfo Endpoint: a protected Resource. When presented with an Access Token by the Client, it returns the information about the End-User that was authorized by the corresponding Authorization Grant. The UserInfo Endpoint URL MUST use HTTPS and MAY contain port, path, and query parameter components.

OpenID Connect Flows

While OAuth is an authorization protocol, OpenID Connect is an authentication protocol.
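The abstract protocol above can be made concrete with a short, hedged sketch of step 1: the RP redirecting the End-User to the OP. This is illustrative Python only; the Provider URL and client values are hypothetical placeholders, and a production RP would use a certified OIDC client library instead of hand-rolling requests.

```python
from urllib.parse import urlencode

# Hypothetical Authorization Endpoint -- a real RP discovers this from the
# OP's metadata document.
AUTHORIZATION_ENDPOINT = "https://op.example.com/authorize"

def build_authentication_request(client_id, redirect_uri, state, nonce,
                                 response_type="code", scope="openid profile"):
    """Build the Authentication Request URL for the chosen Flow.

    The response_type value selects the Flow, e.g. "code" for the
    Authorization Code Flow.
    """
    params = {
        "response_type": response_type,
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,   # the "openid" scope value is what makes this OIDC
        "state": state,   # opaque value echoed back to mitigate CSRF
        "nonce": nonce,   # binds the resulting ID Token to this request
    }
    return AUTHORIZATION_ENDPOINT + "?" + urlencode(params)
```

Calling `build_authentication_request("my-client", "https://rp.example.com/cb", "st123", "n456")` yields the URL the RP redirects the browser to; the OP then authenticates the End-User and continues with steps 2 and 3.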
OpenID Connect defines a proper authentication layer on top of OAuth, in place of the ad hoc pseudo-authentication mechanisms that plain OAuth made possible. In the current OpenID Connect specification, we can find three Flows:

Authorization Code Flow
Implicit Flow
Hybrid Flow

The value of the response_type parameter in the Authorization Request determines the Flow for the current process. The values map to Flows as follows:

code: Authorization Code Flow
id_token: Implicit Flow
id_token token: Implicit Flow
code id_token: Hybrid Flow
code token: Hybrid Flow
code id_token token: Hybrid Flow

The biggest difference between the Flows is the "place" where we get our Access Tokens. In the Authorization Code Flow, we get them from the Token Endpoint. In the Implicit Flow, we get Access Tokens from the Authentication Response. In the Hybrid Flow, we can choose the source of our tokens. The OpenID Connect specification also contains a comparison table that is very useful when picking the Flow you want to use: its Property column lists a set of features (for example, whether all tokens are returned from the Authorization Endpoint, whether tokens are revealed to the User Agent, and whether Refresh Tokens are possible), and the remaining columns specify whether a particular Flow supports each feature. Additionally, unlike OAuth, there have been no major changes here: no Flows were deprecated, and all three remain part of the specification.

Flows Lexicon

Authorization Code Flow

In this Flow, the Client obtains an Authorization Code from the Authorization Endpoint and then exchanges it at the Token Endpoint for an ID Token and an Access Token. Because the exchange happens directly between the Client and the Authorization Server, no token details are exposed to malicious applications that may have access to the User Agent. Furthermore, the Client can be authenticated before the code is exchanged for tokens. Therefore, this Flow is best suited for Clients that can securely maintain a Client Secret between themselves and the Authorization Server. All tokens are returned from the Token Endpoint when using the Authorization Code Flow.

Implicit Flow

Opposite to the Authorization Code Flow, here we get our tokens from the Authorization Endpoint. The Access and ID Tokens are returned directly to the Client, exposing them to any application with access to the End-User's Agent.
Thanks to this direct return, this Flow was designed for Clients implemented in a browser. Moreover, the Authorization Server does not perform Client Authentication, and the Token Endpoint is not used.

Hybrid Flow

This is the most complex of the three Flows. Here, the Access Token can be returned from the Authorization Endpoint, from the Token Endpoint, or from both of them at the same time. Interestingly, when tokens are returned from both endpoints, they are not guaranteed to be the same. Because this Flow combines the two previously mentioned Flows, it inherits parts of their behavior, with some differences, mostly in how the ID Token is handled and validated.

Summary

The OpenID Connect specification describes fewer procedures than OAuth, but it specifies in more detail how the Flows should work. I hope this humble lexicon of OIDC Flows will come in handy for you at some point. Thank you for your time.
I have a confession to make. As a software developer, I never thought much about security. It was about as important to me as the daily weather on Pluto. Admittedly, my work was mostly deployed in lab environments, but I never did much to look for CVEs, attack paths, and other security vulnerabilities. As a developer, I made some presumptions:

The latest Linux images I use (Ubuntu, AlmaLinux, openSUSE, Alpine) are secure by default.
If deploying to AWS or another public cloud, the cloud provider took care of the security I cared about.
The infrastructure-as-code I used to provision systems was simple and therefore solid.
I've been safe from attacks for years, so the risk was low.

My presumptions were wrong on all counts.

Linux Images Aren't All That Secure Out of the Box

When I worked for Chef, the software-defined infrastructure folks, I used a tool called InSpec to quickly analyze systems running the latest (and updated) versions of Linux OS via SSH. I used publicly available benchmarks, like this Linux one from DevSec, to scan for problems. I was stunned by the results. They showed 60 of the 177 vulnerability tests failed on my simple mail server running Ubuntu:

$ inspec exec https://github.com/dev-sec/linux-baseline -t ssh://myuser@mail.tiny.lab -i ~/.ssh/id_rsa
…
Profile Summary: 28 successful controls, 29 control failures, 1 control skipped
Test Summary: 115 successful, 60 failures, 2 skipped

The errors include routing problems, lax file and directory permissions, and CPU and file-system vulnerabilities. So much for base Linux images being secure out of the box! I now realize these base images (whether I use .iso files or Docker images) are designed to provide modest protections, but not all. And the older the Linux distro version, the worse it gets. I found this to be true regardless of whether I tested systems in my home lab or instances running in AWS.
The base images are not automatically secure, and my sample was only testing using a Linux baseline profile. Running more specific tests, like the github.com/dev-sec/ssh-baseline profile, showed 44 successful tests and 60 failures!

My Cloud Instances Were Even Less Secure

Similar problems revealed themselves when I scanned instances in AWS using InSpec and Tenable's Nessus and Cloud Security tools. Nessus uses an agent installed on each instance to grab and report on vulnerability data, and Cloud Security uses an agentless approach that looks at all instances running in my VPCs. The latter revealed misconfigurations and vulnerabilities in my instances and in the cloud environment itself, such as IAM role problems, subnet weaknesses, public IP addresses, and risks associated with using my default VPC for instances. Again, I was stunned by the sheer number of vulnerabilities and misconfigurations I had in my running systems, including many problems marked "critical." Perhaps most troubling was the fact that these security risks were the result of my oft-used base Terraform code settings. Admittedly, I'm not an AWS or Terraform expert, but I figured I knew my way around pretty well and was following safe practices. In a word, the reality was more like "sorta."

Committing to Fixing My Errant Ways

The Tenable tools I used offer instructions on how to fix most vulnerabilities, and I immediately went into my AWS dashboard to make my Security Groups and IAM roles more secure. I took a fresh look at the code I use to provision systems, too, to make sure I was following good security practices. With those changes in place, I went on to set up an AWS site-to-site VPN, which allows me to access my instances using private AWS IP addresses, not public ones. That way, I can access the nodes without ever having to attach risky public IPs. Since my nodes are primarily for testing, and not public-facing, this was a reasonable added bit of security.
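To make the Security Group fixes above more concrete, here is a hedged sketch of the kind of check that scanners like Terrascan automate: flagging ingress rules left open to the whole internet. The rule dictionaries loosely mimic the shape of AWS security group rules, and the sensitive-port list is an illustrative assumption, not a real AWS or Terraform integration.

```python
# Illustrative port list: SSH, RDP, MySQL, PostgreSQL.
SENSITIVE_PORTS = (22, 3389, 3306, 5432)

def risky_ingress_rules(rules):
    """Return rules that expose a sensitive port to 0.0.0.0/0 (the internet)."""
    risky = []
    for rule in rules:
        open_to_world = "0.0.0.0/0" in rule.get("cidr_blocks", [])
        hits_sensitive_port = any(
            rule["from_port"] <= port <= rule["to_port"] for port in SENSITIVE_PORTS
        )
        if open_to_world and hits_sensitive_port:
            risky.append(rule)
    return risky
```

For example, a rule opening port 22 to 0.0.0.0/0 is flagged, while 443 open to the world or 22 restricted to an internal CIDR is not.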
To be clear, most of my work is in development and testing, and nothing I stand up contains proprietary data. Still, I was careless, perhaps even more so because I was mostly just "testing" and not running production systems. The problem with that presumption, particularly for the things I spun up in the cloud, is that poor cloud settings left open ways for hackers to get access to my AWS infrastructure, not just the instances themselves. AWS covers a lot of the security bases, yes, but my actions (or inactions) quickly and easily put me at risk. Going forward, I've hardened my Terraform code by adding much more specific definitions for my VPCs, subnets, roles, and other cloud features. I use tools to regularly check for and apply patches to my instances, even if they're running the latest OS version. I'm more deliberate in how I create and destroy machines, and I've started relying more on HashiCorp Vault for secrets management and on Tenable and Terrascan for regular code and instance scanning. Though I'm not much of a New Year's resolution guy, taking security more seriously is one commitment I plan to keep in 2023 and beyond.
This is an article from DZone's 2022 Enterprise Application Security Trend Report. For more: Read the Report.

According to a 2020 Gartner report, it is estimated that by 2023, 75 percent of cybersecurity incidents will result from inadequate management of identities and excessive privileges. To a large extent, this is attributable to the increased number of identities used by modern cloud infrastructures. Applications run as microservices in fully virtualized environments that consist of dynamically orchestrated clusters of multiple containers in the cloud. The security requirements in such environments are significantly different compared to monolithic applications running on-premises. First, the concept of the perimeter does not exist in the cloud. Second, organizations are now handling thousands of dynamically created workloads and identities. Applying traditional IAM tools to manage the dynamic nature of these identities is not adequate. Using static, long-lived, and often excessive access permissions enables attackers to perform lateral movement. To address these issues, a security model is needed that better satisfies today's application security and identity requirements. Zero-trust security is a proactive security model that uses continuous verification and adaptive security controls to protect endpoints and access to applications as well as the data that flows between them. Zero trust replaces the outdated assumption that everything running inside an organization's network can be implicitly trusted. This security model has proven to minimize the attack surface, offer threat protection against internal and external attackers, reduce the lateral movement of attackers, increase operational efficiency, and help support continuous compliance with regulations such as PCI-DSS and the White House's 2021 Cybersecurity Executive Order. Since its inception, zero trust has evolved and expanded, touching almost every corner of the enterprise.
This article will provide an overview of how the zero-trust principles can be applied in a microservices environment and what security controls should be implemented on the back end.

Zero-Trust Principles

Zero trust is primarily based on the concepts of "never trust, always verify" and "assume everything is hostile by default." It is driven by three core principles: assume breach, verify explicitly, and the principle of least privilege.

Assume Breach

Always assume that cyber attacks will happen, that the security controls have been compromised, and that the network has been infiltrated. This requires using redundant and layered security controls, constant monitoring, and collection of telemetry to detect anomalies and respond in real time.

Verify Explicitly

No network traffic, component, action, or user is inherently trusted within a zero-trust security model, regardless of location, source, or identity. Trust only to the extent that you verify the identity, authenticity, permissions, data classification, and so on.

Principle of Least Privilege

Always grant the fewest privileges possible. Give access only for the time it is needed, and remove it when it is no longer needed. Least-privilege access is essential to reduce the attack surface, limit the "blast radius," and minimize an attacker's opportunity to move laterally within an environment in case of compromise.

Zero-Trust Security in a Microservices Environment

When a microservice is compromised, it may maliciously influence other services. By applying the principles of zero trust to a microservices environment, the trust between services, components, and networks is eliminated or minimized.

Identity and Access Management

Identity and access management is the backbone of zero trust, which requires strong authentication and authorization of end-user identities, services, functions, workloads, and devices.
To enable authentication and authorization, we must first ensure that each workload is automatically assigned a cryptographically secure identity that is validated on every request. Importantly, ensure that there is an automated mechanism to reliably distribute the services' certificates and secrets, revoke them in case of compromise, and rotate them frequently. Use a cloud-neutral identity for workloads, such as SPIFFE for authentication, and OPA for unified authorization across the stack.

Secure Service-to-Service Communications

In zero trust, it is fundamental to treat the network as adversarial. Thus, all communication between services, APIs, and storage layers must be encrypted. The standard way of protecting data in transit is to use HTTPS and strict mTLS everywhere. Similarly, a strong authentication mechanism should be enforced across all microservices. It must be understood that not every service that can be authenticated should be authorized. Authorization must be based on the authentication context and on access control policies, and it should be performed at the edge of each microservice, not at the network edge. To achieve this, use a service mesh, like Istio or Linkerd, for:

Automatic certificate management
Traffic interception
Secure service-to-service communication without application code changes
Micro-segmentation (via authorization policies)

This reduces the blast radius of an attack and prevents attackers from pivoting from one compromised service into other parts of the infrastructure. In a container orchestration environment, such as Kubernetes, define network policies for egress and ingress isolation at a granular level. Enforce zero trust for all traffic (east-west and north-south) by specifying network policies and service-to-service RBAC policies that limit access per cluster and per source, following the need-to-know principle.

Secure Access to Resources

External entities must not access the microservices environment directly.
Instead, use an API gateway as a single entry point to the microservices deployment. To pass the user context or the identity of the caller, implement a pattern such as the phantom token pattern (API Security in Action, part 11.6.1) or the passport pattern: validate the external access token and user context at the edge, then generate a new short-lived token that represents the external entity's identity, is cryptographically signed by the trusted issuer, and is propagated to the back-end microservices. Ensure that the new token's scope of access is as limited as the scope of the external entity's identity. Most importantly, assume that access tokens can be stolen, and create access tokens with a short lifespan on a resource-by-resource basis. Use a service mesh to verify the validity of the access tokens at each microservice's edge. In all cases, access to resources should be granted using fine-grained role-based access controls with the least privileges.

Figure 1: Data in-transit, data at-rest, and data in-use encryption

Data Security

It is essential to ensure that all data is classified according to its secrecy and confidentiality. Create a data registry to know which microservice handles what data. Then, implement multiple layers of data encryption, depending on the data classification. Do not trust only the encryption of external components (including databases and messaging systems like Kafka). Use application-level encryption (ALE) to transfer personally identifiable information (PII) and highly confidential data between microservices. To mitigate the risk of unauthorized data modification, perform data integrity checksums throughout the data lifecycle.

Infrastructure Security

Adopting an immutable infrastructure has become standard: use Infrastructure as Code to provision components upfront and never change them after deployment. Do not trust the storage mediums (persistent or temporary), and do not store any sensitive data or secrets in an unencrypted form.
All secrets, certificates, and API keys should be securely stored in access-controlled, centralized key vaults. Zero trust always assumes that the network is compromised. To contain a possible compromise and prevent lateral spreading through the rest of the network, implement network micro-segmentation, create software-defined perimeters in each segment, and place microservices in each segment according to their functionality, business domain, and data classification. Communication between segments should be well-defined and controlled through API gateways. Consider adopting a cell-based architecture for inter-segment communication.

Container and Cluster Security

Zero trust requires the explicit verification of container images, containers, and cluster nodes. Thus, use only container images that are signed by trusted issuers and come from trusted registries. Allow images to be used only if they have been scanned in the DevSecOps pipeline and have no vulnerabilities. To reduce the risk of privilege escalation, run the Docker daemon and all containers without root privileges; one standard way is to run Docker in rootless mode. Logically isolate high-risk applications and workloads that run in the same cluster, following the principle of least privilege.

Runtime Security

Consider running security-sensitive microservices on confidential virtual machines in hardware-based trusted execution environments with encrypted memory. To reduce the risk of rogue or compromised nodes in the cluster, verify the integrity of nodes, VMs, and containers by running them on instances enabled with Secure Boot and a Virtual Trusted Platform Module. Also, running containers in read-only mode preserves filesystem integrity and prevents attackers from making modifications. Finally, we can reduce our trust in the runtime by adopting a RASP solution that inspects all code executed by the runtime and dynamically stops the execution of malicious code.
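Tying several of the controls above together (short-lived tokens minted at the gateway and verified at every microservice edge), here is a hedged Python sketch. It uses a shared HMAC secret purely to stay self-contained; a real deployment would use asymmetrically signed JWTs, so services only need the issuer's public key, and the secret would be fetched from a vault rather than hard-coded.

```python
import base64
import hashlib
import hmac
import json
import time

# Illustrative only: this stands in for key material fetched from a key vault.
SECRET = b"demo-secret-from-a-vault"

def issue_internal_token(subject, scopes, ttl_seconds=60):
    """Mint a short-lived internal token after validating the external one."""
    payload = json.dumps(
        {"sub": subject, "scopes": scopes, "exp": time.time() + ttl_seconds}
    ).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return "{}.{}".format(
        base64.urlsafe_b64encode(payload).decode(),
        base64.urlsafe_b64encode(sig).decode(),
    )

def verify_at_service_edge(token, required_scope):
    """Each microservice checks signature, expiry, and scope on every request."""
    payload_b64, _, sig_b64 = token.partition(".")
    payload = base64.urlsafe_b64decode(payload_b64.encode())
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(base64.urlsafe_b64decode(sig_b64.encode()), expected):
        return False
    claims = json.loads(payload)
    return claims["exp"] > time.time() and required_scope in claims["scopes"]
```

The short TTL limits the window in which a stolen token is useful, and verifying at every service edge (rather than only at the gateway) is what keeps the model zero trust.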
Figure 2: Zero-trust runtime via confidential computing and RASP
Image adapted from "Application enclave support with Intel SGX based confidential computing nodes on AKS," Microsoft Azure Documentation

Conclusion

Implementing a zero-trust architecture is a critical defense-in-depth strategy and has become a mandatory security model in modern IT infrastructures. It is important to understand that implementing a zero-trust architecture does not mean zero security incidents. The goal is to continually layer security controls to increase the cost of attacks. As we introduce more friction into the cyber-attack kill chain, the attacker's value proposition is reduced, and potential attacks are disrupted. The key to a successful implementation of a zero-trust architecture is to follow the guidance of whitepapers such as NIST's "Planning for a Zero Trust Architecture" and the U.S. Office of Management and Budget's "Moving the U.S. Government Towards Zero Trust Cybersecurity Principles." In this article, we provided an overview of how to apply the core principles of the zero-trust model in a microservices environment, and we examined the critical areas and the zero-trust security goals that need to be achieved. The highly distributed and heterogeneous nature of a microservices deployment, with its complex communication patterns, increases the number of components and the volume of data exposed on the network. This presents a broader attack surface compared to a traditional deployment of a monolithic application. Because the security of a system is only as good as its weakest link, applying the zero-trust core principles to proactively secure all layers and components of a microservices deployment is fundamental to a modern, reliable, and mature cybersecurity strategy. With a proper zero-trust strategy for microservices, the risk of compromised clusters, lateral movement, and data breaches can in most cases be substantially reduced.
Zero trust is a necessary evolution to security; however, its implementation should not be a destination. It is a continuous journey and an organization-wide commitment. Since its inception, zero trust has become a widely deployed security model and a business-critical cybersecurity priority. Microsoft's 2021 Zero Trust Adoption Report confirms that point on page 11, indicating that 76 percent of organizations have started adopting a zero-trust strategy. The industry is rapidly adopting zero trust across the whole infrastructure and not just on end-user access.
In the previous article in this series, "The Everything Guide to Data Collection in DevSecOps," we discussed the importance of data collection. In this article, we'll explore the role of monitoring in observability, especially as it relates to security, performance, and reliability. Monitoring is essential for detecting issues and outliers as they happen in production, and it allows DevSecOps teams to identify and address issues before they cause serious damage. Monitoring for performance degradations or suspicious activity can trigger alerts and automatic responses to isolate potential problems or attacks. In this article, we'll look at monitoring in detail, provide several use cases and best practices, and discuss how monitoring can specifically improve security, performance, and reliability through observability.

What Is the Role of Monitoring in Observability?

In an observable system, we collect data from logs, metrics, and distributed traces. For very small systems, you can manually browse and search the logs, visualize the metrics as charts, and trace how traffic flows through the system in order to identify problems. At scale, this is not enough: you need monitoring, an automated process that keeps an eye on this data and alerts you appropriately. (For a more detailed treatment of the difference between monitoring and observability, you can check out this resource.) In an enterprise, you need automated ways to filter, aggregate, enrich, and analyze all this data. Enterprises also need automated ways to take action when something unusual is detected. The automated response can notify the responsible team or even take remediating action directly. In other fields, like medicine, monitoring the vital signs of patients is a key activity that saves lives. Monitoring software systems is very similar; we even use the same methodology when performing health checks and discussing the health of different components.
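The "automated process that keeps an eye on this data" can be sketched in a few lines. The metric names, thresholds, and alert format below are illustrative assumptions, not any particular monitoring product's API.

```python
# A minimal sketch of automated monitoring: compare collected metrics against
# allowed ranges and emit alerts, instead of relying on manual chart-gazing.
def evaluate_health(metrics, allowed_ranges):
    """Return an alert message for every metric outside its allowed range."""
    alerts = []
    for name, value in metrics.items():
        low, high = allowed_ranges.get(name, (float("-inf"), float("inf")))
        if not low <= value <= high:
            alerts.append("ALERT: {}={} outside [{}, {}]".format(name, value, low, high))
    return alerts
```

With `metrics = {"cpu_percent": 97, "error_rate": 0.001}` and `allowed_ranges = {"cpu_percent": (0, 85), "error_rate": (0, 0.01)}`, only the CPU metric triggers an alert.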
Enough theory; let's look at some concrete examples of monitoring.

Use Cases of Monitoring for Observability

Here are some typical use cases that take advantage of monitoring:

Web applications are a major part of many large-scale distributed systems and are key to the success of digital-first businesses. Monitoring Kubernetes, containerized applications, or simply the web server logs for an excessive number of error codes (such as 4xx or 5xx) can help a team address performance and reliability issues before they become significant problems.

At the infrastructure level, it is important to monitor the CPU, memory, and storage of your servers. Like most enterprises, you're likely using autoscaling so your system can allocate more capacity. Platform logs capture when there are changes to resources, such as when they are provisioned, deprovisioned, or reconfigured. Monitoring these resource metrics and logs can help you ensure you're working within quotas and limits, and it can help your organization when it comes to resource planning and budgeting.

Datastores are at the heart of most large-scale systems. If your data is lost, corrupt, or unavailable, then you have a serious situation on your hands. To keep track of your data, you need to monitor database connections, query duration metrics, disk space, backups, and error rates. You should also understand your datastores and set alerts when you observe values that are outside the expected range, such as slow queries, a high rate of errors, or low disk space. You can also set up logging for your databases to capture connections, queries, and changes to fields or tables. Monitoring your database logs can help you detect not only where you can improve your performance and reliability, but also security issues, if malicious (or unintentional) operations are being performed.
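As a hedged sketch of the datastore use case above, here is what a simple check over database query records might look like. The record fields (`duration_ms`, `ok`) and the thresholds are assumptions for illustration, not any real database's log schema.

```python
# Illustrative: scan query-log records for slow queries and a high error rate.
def scan_query_log(records, slow_ms=500, max_error_rate=0.05):
    """Return findings worth alerting on for one monitoring window."""
    findings = []
    slow = [r for r in records if r["duration_ms"] > slow_ms]
    if slow:
        findings.append("{} slow queries (> {} ms)".format(len(slow), slow_ms))
    if records:
        error_rate = sum(1 for r in records if not r["ok"]) / len(records)
        if error_rate > max_error_rate:
            findings.append(
                "error rate {:.0%} above {:.0%}".format(error_rate, max_error_rate)
            )
    return findings
```

An empty result means the window looked healthy; anything returned would feed the alerting pipeline.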
Note that monitoring is much more involved than setting a simple condition (such as "more than five INSERT queries to the orders database within two minutes") and firing an alert when that condition is met. Seasonality may be at play, with usage patterns that cause spikes at certain times of the day, week, or year. Effective monitoring that detects unexpected behavior takes context into account and can recognize trends based on past data. This type of monitoring can be immensely effective, especially when implemented with a tool that combines observability, monitoring, and security at scale, as in this case study from Sumo Logic and Infor, where Infor was able to save 5,000 hours of time spent on incidents.

How Does Monitoring Contribute Specifically To Improving Performance and Reliability?

Monitoring improves the performance and reliability of a system by detecting problems early, before degradation sets in. Performance problems often become availability and reliability problems. This is especially true in the presence of timeouts. For example, suppose an application times out after 60 seconds. Due to a recent performance issue, many requests suddenly take more than 60 seconds to process. All these requests will now fail, and the application becomes unreliable. A common best practice for addressing this is to monitor the four golden signals of any component in the critical path of high-priority services and applications: latency, traffic, errors, and saturation.

Latency

How long does it take to process a request? Note that the latency of successful requests may differ from that of failed requests. Any significant increase in latency may indicate degrading system performance. On the other hand, any significant decrease might be a sign that some processing is being skipped. Either way, monitoring will bring attention to the possible issue.

Traffic

Monitoring traffic gives you an understanding of the overall load on each component.
Traffic can be measured in different ways for different components. For example:

A REST API: the number of requests
A backend service: the depth of a queue
A data-crunching component: the total bytes of data processed

An increase in traffic may be due to organic business growth, which is a good thing. However, it may also point to a problem in an upstream system that generates unusually more traffic than before.

Errors

An increase in the error rate of any component directly impacts the reliability and utility of the system. In addition, if failed operations are automatically retried, this can lead to an increase in traffic, which may subsequently lead to performance problems.

Saturation

Out of the resources available, how much is the service or application using? This is what saturation monitoring tells you. For example, if a disk is full, then a service that writes logs to that disk will fail on every subsequent request. At a higher level, if a Kubernetes cluster doesn't have available space on its nodes, then new pods will be pending and not scheduled, which can lead to latency issues. As you may have noticed, the four golden signals are related to one another: problems often manifest across multiple signals.

How Does Monitoring Contribute Specifically To Improving Security?

While any system health problem can directly or indirectly impact security, there are some direct threats that monitoring can help detect and mitigate. Any anomaly, such as excessive CPU usage or a high volume of requests, may be an attacker trying to cause segmentation faults, perform illegal cryptomining, or launch a DDoS attack on the system. An unusual number of packets hitting unusual ports might be a port-knocking attack. A high number of 401 (authentication) errors with valid usernames and invalid passwords might be a dictionary attack. A high number of 403 (forbidden access) errors may be a privilege escalation attempt by an attacker using a compromised account.
Invalid payloads to public APIs resulting in an increase in 400 errors might be an attacker trying to maliciously crash your public-facing web applications. A download of large amounts of data, or of any sensitive data outside of business hours, might be an exfiltration attack by a compromised employee or rogue insider.

Best Practices for Monitoring to Improve Performance and Security

A system is made of multiple components, but it is more than the sum of its parts. At a basic level, you should monitor every component of your system (at least on the critical paths) for the four golden signals. What does this mean in practice?

Observing the key metrics
Establishing the metric ranges for normal operation
Setting alerts when components deviate from the acceptable range

You should also pay close attention to external dependencies. For example, if you run in the cloud or integrate with a third-party service provider, then you should monitor the public endpoints that you depend on and set alerts to detect problems. If a third party is down or its performance is degraded, this can cause a cascading failure in your system. It's not possible to have 100% reliable components. However, monitoring can help you create a reliable system from non-reliable components by detecting problems with components, both internal and external, and either replacing them or gracefully degrading service. For example, if you are running your system in a multi-zone configuration and there is a problem in one zone, then monitoring can detect this and trigger re-routing (manual or automatic) of all traffic to other zones. For security, the four signals may serve as auxiliary indicators of a compromise too: for example, a spike in your endpoint device or cloud workload CPUs, or an increase in the number of failed login attempts. However, security monitoring must be very deliberate, since you are dealing with malicious adversaries.
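As a concrete instance of the security signals listed above, here is a hedged sketch that flags sources producing an unusual number of 401 or 403 responses in one window. The event shape and threshold are illustrative assumptions; real detection would also account for seasonality and context, as discussed earlier.

```python
from collections import Counter

def flag_auth_anomalies(events, per_source_threshold=20):
    """events: (source_ip, http_status) pairs from authentication logs.

    Flags sources with many 401s (possible dictionary attack) or 403s
    (possible privilege escalation via a compromised account).
    """
    failures = Counter(
        (ip, status) for ip, status in events if status in (401, 403)
    )
    return sorted(
        "{}: {}x {}".format(ip, count, status)
        for (ip, status), count in failures.items()
        if count >= per_source_threshold
    )
```

A source producing 25 consecutive 401s would be flagged, while normal traffic and a handful of scattered failures would not.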
You must define the attack surface of each component and the entire system and ensure the information you are collecting is sufficient to detect issues. For example, to detect data exfiltration, you can monitor the IP addresses and the amount of data sent outside your internal network by different applications and services. If you don't have that data, you will be blind to that attack methodology.

Implementing a Monitoring Strategy

Once you set up your data collection, you can follow the steps below to roll out a robust and effective monitoring strategy.

1. Identify Critical Assets

You have already performed a comprehensive inventory of all your assets as part of data collection. Now, your task is to identify the critical assets that must be monitored closely to prevent and mitigate disasters. It is easy to say, "just monitor everything," but there are costs to consider with monitoring. Monitoring and raising alerts for your staging and development environments or experimental services can put a lot of unnecessary stress on your engineers. Frequent 3 AM alerts for insignificant issues will cause alert fatigue, crippling your team's drive to address an issue when it really matters.

2. Assign an Owner for Every Critical Asset

Once you identify the critical assets, you need a clear owner for each one. The owner can be a person or a team. In the case of a person, be sure to also identify a fallback. It's also important to maintain asset ownership as people join and leave the organization or move to other roles and teams.

3. Define Alerts for Critical Assets

Ultimately, your monitoring strategy will live or die based on how you define alerts for assets that are unhealthy or potentially compromised. You need to understand what's normal for each asset. If you're monitoring metrics, then defining "normal" means associating an attribute (such as CPU utilization) with a range of values (such as "50%-80%").
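As a minimal sketch of this idea (the asset names and ranges below are invented for illustration), a metric reading can be checked against its normal band like this:

```python
# Hypothetical normal bands per (asset, metric); a real system would load
# these from configuration and adjust them over time.
NORMAL_RANGES = {
    ("api-server", "cpu_percent"): (50.0, 80.0),
    ("api-server", "error_rate"): (0.0, 0.02),
}

def check_metric(asset, metric, value):
    """Return an alert message if `value` is outside the normal band, else None."""
    low, high = NORMAL_RANGES[(asset, metric)]
    if low <= value <= high:
        return None
    return f"ALERT: {asset} {metric}={value} outside normal range [{low}, {high}]"
```

The returned alert string would then be routed to the asset owner defined in step 2.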
The normal band can change dynamically with the business and can vary at different times and in different locations. In some cases, you may just have ceilings or floors. By defining normal ranges, you can create alerts to notify an asset owner when their asset is operating outside of the normal range. If you're monitoring logs, then alerts are usually defined based on the result of certain log queries (such as "the number of 404 errors logged across all API services in the last five minutes") satisfying or failing a condition (such as "is fewer than 10"). Log management and analytics tools can help.

4. Define Runbooks for Every Alert

When a critical alert fires, what do you do? What you don't want to do is try to figure out your strategy on the spot, while customers are tweeting about your company's unreliable products and management is panicking. A runbook is a recipe of easy-to-follow steps that you prepare and test ahead of time to help you collect additional information (for example, which dashboards to look at and what command-line scripts to run to diagnose the root cause) and take mitigation actions (for example, deploying the previous version of the application). Your runbook should help you quickly pinpoint the problem to a specific issue and identify the best person to handle it.

5. Set Up an On-Call Process

You have owners, alerts, and runbooks. Often, however, the alerts are not specific enough to map directly to the owners. The best practice is to assign on-call engineers to different areas of the business. The on-call engineer will receive the alert, follow the runbook, look at the dashboards, and try to understand the root cause. If they can't understand or fix the problem, they will escalate it to the owner. Keep in mind that this process can be complicated; often, a problem occurs due to a chain of failures that requires multiple stakeholders to collaborate to solve the issue.

6. Move Towards Self-Healing

Runbooks are great, but maintaining complex runbooks and training on-call engineers to follow them takes a lot of effort. And in the end, your remediation process still depends on a slow and error-prone human. If your runbook is not up to date, following it can worsen the crisis. Theoretically, a runbook can be executed programmatically. If the runbook says, "when alert X fires, process Y should restart," then a script or program can receive a notification of alert X and restart process Y. The same program can monitor process Y post-restart, ensure everything is fine, and eventually generate a report of the incident, all without waking up the on-call engineer. If the self-healing action fails, then the on-call engineer can be contacted.

7. Establish a Post-Mortem Process

Self-healing is awesome; however, an ounce of prevention is worth a pound of cure, so it's best to prevent problems in the first place. Every incident is an opportunity to learn and possibly prevent a whole class of problems. For example, if multiple incidents happen because buggy code makes its way to production, then a lesson from incident post-mortems could be to improve testing in staging. If the response of the on-call engineer to an alert was too slow or the runbook was out of date, then this may suggest that the team should invest in some self-healing practices.

Conclusion

Monitoring is a crucial part of observability in general and observability for security in particular. It's impractical at a large scale for humans to "just look every now and then" at various dashboards and graphs to detect problems. You need an entire set of incident response practices that include identifying owners, setting up alerts, writing runbooks, automating runbooks, and setting up on-call and post-mortem processes. Have a really great day!
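As a closing illustration of step 6, the "alert X fires, restart process Y" pattern could be sketched as a tiny dispatcher. The alert name, the systemctl command, and the escalation hook are all invented for illustration, not a real framework:

```python
# Illustrative only: a minimal alert-to-remediation dispatcher. If no
# automated runbook exists or the remediation fails, escalate to a human.
import subprocess

RUNBOOK_ACTIONS = {
    # alert name -> shell command that performs the remediation
    "worker-heartbeat-missing": ["systemctl", "restart", "worker.service"],
}

def handle_alert(alert_name, run=subprocess.run, escalate=print):
    """Try the automated remediation; escalate to the on-call if it fails."""
    action = RUNBOOK_ACTIONS.get(alert_name)
    if action is None:
        escalate(f"No automated runbook for {alert_name}; paging on-call")
        return False
    result = run(action)
    if result.returncode != 0:
        escalate(f"Self-healing failed for {alert_name}; paging on-call")
        return False
    return True
```

The `run` and `escalate` parameters are injected so the dispatcher can be tested without actually restarting services or paging anyone.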
This is an article from DZone's 2022 Enterprise Application Security Trend Report. For more: Read the Report

Building secure mobile applications is a difficult process, especially in the cloud. We must consider that mobile platforms, like iOS and Android, have completely different architectures and quality guidelines. Also, we need to take care of our cloud architecture on the back end. In this article, we will have a look at the top six security vulnerabilities, OWASP's best practices for building/testing iOS and Android applications, and guidelines for iOS and Android. Last but not least, we will explore an example of DevSecOps for mobile applications.

Top Three Attack Examples

To understand the importance of security for mobile apps, let's first look at three of the most prominent hacks of mobile apps that led to huge financial and marketing issues for the affected companies.

ParkMobile Breach

In the cyber attack on the ParkMobile app in 2021, hackers managed to steal 21 million user accounts. According to Security7, hackers managed to steal telephone numbers, license plate numbers, and email addresses. It seems that all of the unencrypted data was stolen. However, passwords and credit card data were encrypted, so the hackers didn't manage to decrypt them, as the encryption keys weren't stolen.

Juspay Data Leak

Juspay, a payment operator that provides services for Uber, Amazon, Swiggy, and Flipkart, was hacked through their mobile app in August 2020. The hacker stole 35 million records, including credit card data, fingerprints, and masked card data.

Walgreens Mobile App Leak

In 2020, Walgreens' mobile app contained malware that watched personal messages and info. It resulted in a lot of user data being compromised, including names, prescription numbers, and addresses.
Top Six OWASP Security Vulnerability Types in iOS and Android

Before we jump into iOS and Android guidelines and OWASP Testing Guides, let's look at the top six OWASP vulnerability types:

- Authentication issues, insecure communication – A mobile application has unencrypted UI forms, algorithms, and protocols to authenticate. An attacker uses a fake app or malware to scan and observe the application transport layer. Also, weak passwords, using geolocation to authenticate users, or using persistent authentication may lead to sensitive data leaks.

- Reverse engineering – This vulnerability allows an attacker to analyze and de-obfuscate the targeted application. This may lead to the leakage of sensitive data that is hard coded in application configuration variables or constants. In addition, attackers may find URLs and configs for the back-end servers.

- Data storage security vulnerability – This vulnerability allows attackers to steal data from data storage. We partially link it with "improper platform usage." To prevent data leakage, we should use only encrypted data storage, avoid storing sensitive data (passwords, card numbers) on the device, encrypt data transfer, and use only encrypted storage OS features (e.g., iOS Keychain). We can reference CWE-922 of the mobile vulnerability registry.

- Improper platform usage – This type of attack relies on the issue of developers not using (or improperly using) security features that are included in the operating system. Security features include Face ID, iOS Keychain, and Touch ID. For example, developers may use insecure local storage instead of iOS Keychain to store sensitive data.

- Code tampering – Code tampering is when an attacker downloads an app and makes code changes. For example, they create fake registrations or payment forms and then upload apps back to the market or create cloned ones.
It can also be a fake app (such as free mobile cleaning tools or free games in app stores) that can modify the code of another app. Usually, banking apps are among the targeted scenarios, and Mobile ZeuS or Trojan-Spy can be used to steal mobile TAN codes.

In my opinion, this is a list of the most important vulnerability types. However, OWASP provides a list of 10, and it also provides standards and testing guides. We will cover these in the next section.

OWASP Mobile Application Security Fundamentals

OWASP mobile application security fundamentals consist of several sources and contain the OWASP Mobile Application Security Verification Standard (MASVS), the OWASP Mobile Application Security Testing Guide (MASTG), and the Mobile Security Checklist. Below in Figure 1, you will see the fundamentals of mobile application security in detail:

Figure 1: OWASP mobile app security fundamentals

Let's have a more detailed look at the mobile app checklist.

Mobile Application Security Checklist

The Mobile Application Security Checklist is a part of the MASTG. It is a set of rules/checks that a dev team should include when securing a mobile app. It contains more than 100 rows and is organized by the following categories:

- Architecture, Design, and Threat Modeling Requirements
- Data Storage and Privacy Requirements
- Cryptography Requirements
- Authentication and Session Management Requirements
- Network Communication Requirements
- Platform Interaction Requirements
- Code Quality and Build Setting Requirements
- Resilience Requirements

Each rule (or check) has an identification code and description. All rules have priority marks: "L1" or "L2" indicates the verification level at which the rule/check should be implemented, and "R" marks the resilience requirements that protect against reverse engineering and tampering. Download the full example on OWASP's website.

Next, let's focus on guidelines for specific platforms, with attention to the most popular ones: iOS and Android.
Secure Mobile Apps in iOS and Android: Guidelines

As we have already partially touched on some iOS security APIs, we will continue discussing them with the addition of Android. In the first section below, I've gathered guidelines and best practices for iOS API security features.

Apple App Sandbox, Data Protection API, and Keychain

The Apple App Sandbox provides an API to isolate an app and prevent access to the main system or other apps. It's based on UNIX user permissions and ensures that apps get executed as the less-privileged "mobile" user. Also, it includes address space layout randomization (ASLR) and ARM's Execute Never (XN) feature, which prevent memory-related security bugs and stop malicious code from being executed.

The Data Protection API allows an app to encrypt and decrypt its files, and it may solve several security issues like authentication and reverse engineering. Each file has four available protection levels, and by default, a file is protected until the first user authentication. However, we should increase the level to provide the highest protection.

Last but not least, the Keychain. It provides secure, hardware-accelerated data storage. iOS provides this API to store certificates and passwords with the highest level of security. For each item in the keychain, we can define specific access policies. In particular, we can require Face ID or Touch ID and make an item accessible only as long as the biometric enrollment hasn't changed since the item was added to the keychain.

Figure 2: Keychain API

Android: Encrypted Key-Value Storage, File Encryption, and Cryptographic APIs

Like iOS, Android has many similar features to store data securely. The first one is key-value storage. It allows storing data using SharedPreferences and setting a scope of visibility for items in the storage. We need to keep in mind that stored values are not encrypted by default. Therefore, malware may have access to the data. If we need to encrypt data manually, we benefit from using the Cryptographic API.
We can generate a secure key with KeyGenerator, store it in the Android Keystore, and use it to encrypt and decrypt values. To work securely with files and external storage, Android has the Cryptography Support Library. It supports a lot of cryptographic algorithms to encrypt/decrypt files.

HTTPS, SSL Pinning, and Push Notifications

A secured communication layer is the next big milestone toward a secured app. First, we need to ensure that we are using HTTPS. iOS has a feature called App Transport Security (ATS) that blocks insecure connections by default, so all connections must use HTTPS/TLS. In addition, the SSL pinning technique helps to prevent man-in-the-middle attacks. The system already validates that a server certificate was signed by a trusted root certificate authority; to use pinning, the app should run additional trust validation of server certificates against known, pinned copies.

Push notifications are another part that should be secured. We should use the Apple Push Notification service (APNs) and the UNNotificationServiceExtension. This will allow us to use placeholders for sensitive mobile app data and send encrypted messages. Also, consider using Apple's CryptoKit. It is a new API introduced in iOS 13 that provides the following features:

- Hashing data
- Authenticating data using message authentication codes
- Performing key agreement
- Creating and verifying signatures

Android has similar options. It allows only HTTPS to transport encrypted data with TLS. And it is the same story for SSL pinning: to prevent man-in-the-middle attacks, we can perform additional trust validations of the server certificates.

Secure Mobile Apps in Azure and AWS

To build secure applications, Azure has services such as the Azure App Center.
It allows for the building and distribution of mobile apps and provides a lot of security options:

- Data transit encryption – Supports HTTPS using TLS 1.2 by default; data is also encrypted at rest
- Code security – Provides multiple tools to analyze code dependencies to detect security vulnerabilities
- Authentication – Contains features like the Microsoft Authentication Library (MSAL), which supports multiple authorization grants and associated token flows

Alongside Azure, AWS has some powerful services to consider when building a secure mobile app. Take AWS Cognito as an example. It is a user identity service with options to develop unique identities for users. It supports:

- Secure app authentication
- Enabling developers to include user sign-up
- Easy sign-in and access control focused on web and mobile apps

AWS also has a unique service named AWS Device Farm. It provides not only automated testing and simulation environments, but also contains features to validate app dependencies and run security checks. Now let's move on to an example of building a DevOps process with security features.

An Example of DevSecOps for Mobile Applications

In this section, I've created an example of a DevSecOps scenario to deliver secure mobile applications (see Figure 3). This process can be reused within the most popular CI/CD platforms and cloud providers:

- Git operation steps – Contains standard commit/push operations when the source control triggers a build.
- Run static analyses and code linting steps – Validates the code styles, usability, data flow issues, and security issues (e.g., Xcode Static Analyzer).
- Dependency validation step – Provides extensive validation checks through the library tree used in the app. This validation step may reveal a fake, malicious library that can manipulate code or even steal personal user data.
- Application log validation step – Checks if logs contain sensitive data like environment passwords, test tokens, or authorization data.
After the dev/test process, the application package may contain some sensitive data that developers did not notice while debugging the app. (This step can be run after deployment to the dev/test environment as well.)

- QA steps – Deploy the app to the dev/test environment for the QA team to test, then promote and deploy the app to marketplace validation.

Figure 3: Common DevSecOps process of a secure mobile app

Conclusion

In this article, I've provided a short guide on secure mobile applications. We discovered that the OWASP community maintains major security fundamentals that can be used as a strong base for building a new app or refactoring an existing one. Knowing cloud services and examples of DevSecOps allows us to start building secure mobile apps with minimum effort and makes it harder for an attacker to compromise our app. Also, we went through iOS and Android security features and security APIs, and discovered how to use them properly.
A wave of cyber incidents in recent years, such as the SolarWinds supply chain attack, the Accellion data breach, and the Exchange Server and Log4j vulnerabilities, has exposed the "fragility" of modern businesses and the challenges in information security. The outcome of a cyber event can range from a minor business operations disruption to "crippling financial and legal costs" (Alan P., GCST). Despite how increasingly sophisticated threat actors have become, the typical attack scenarios remain the same, and thus analysts and responders can adequately address these events using previous tactics.

Effective incident response (IR) playbooks are the best antidote to unpredictable attacks. A practical IR playbook is key to staying organized and keeping risk to a minimum. According to Jyotsana Gupta, an "incident response is a process that allows you to respond quickly and effectively to a cybersecurity breach" (Wire19). IR playbooks enable analysts to respond to an incident consistently, ensure correct procedures are followed, and provide organizations with a roadmap to determine where processes can be automated and enhanced to improve critical response time.

What goes into creating an IR playbook? We will discuss how to design an IR playbook for an organization, covering topics such as assembling your IR team, identifying critical systems, creating notification procedures, and conducting post-incident review. For those eager to design a large-scale IR playbook, NIST and SANS are the two most popular incident response frameworks for granular IR planning approaches. While they differ in categorizing the incident response phases, both follow the same basic process.

How to Build an Incident Response Playbook

What's your plan? While Indiana Jones might say, "I don't know, I'm making this up as I go," good luck trying that line on customers and regulators.
To begin drafting the IR playbook, some companies require engagement from critical business units to form an incident response (IR) team, while other, larger organizations may already have an established, dedicated computer security incident response team (CSIRT). The core of the IR or CSIRT team will usually be IT or cybersecurity staff, and the ideal sponsor is from the C-suite, such as the CISO, who will help empower the team with resources to act swiftly and drive accountability across the organization. Other members may be drawn from the IT operations staff, the vulnerability and risk management team, security engineers and architects, and intelligence analysts. Extended partners may include other capabilities, such as PR, HR, and legal.

Figure 1: Hierarchy of a CSIRT/IR team

Identify the Crown Jewels

The next step in the IR playbook is to identify the "crown jewels" of the organization: the critical systems, services, and operations that, if impacted by a cyber event, would disrupt business operations and cause a loss of revenue. Similarly, the type of data collected, how it is transmitted and stored, and who should have access to it must be mapped to ensure data security. Identifying and mapping critical systems can be accomplished through penetration tests, risk assessments, and threat modeling.

A risk assessment is often the first tool to identify potential attack vectors and prioritize security events. However, to achieve a proactive stance, organizations are increasingly leveraging threat intelligence and modeling to identify and address vulnerabilities and security gaps early on, before a known attack occurs. The primary goal is to identify weaknesses or vulnerabilities in assets to reduce the attack surface and close all the security gaps.

This guide will focus on web application security as our attack scenario. Why web application security?
Applications have become the backbone of most organizations, making web applications, websites, and web-based services a prime target for malicious threat actors attempting to exploit vulnerable application code. As Gupta states, web applications are attractive targets for the following reasons:

- Complexity – Web applications have inherently complex source code, making it likely that an app contains unpatched vulnerabilities or is open to code manipulation.
- High-value rewards – Attackers can manipulate source code to access valuable data, including personal and sensitive business information.
- Easy execution – Attacking web applications is usually straightforward, with automation and large-scale attacks targeting multiple sites simultaneously (Wire19).

Understand the Threats

The next step is creating notification procedures led by the IR or CSIRT team. In the attack scenario (Figure 2), the IR team or CSIRT will often begin with basic security procedures for securing web applications, such as deploying and configuring a web application firewall (WAF) or configuring access control policies. The security team might use WAF rules to block active exploits and prevent further damage.

Figure 2: Sample attack scenario response workflow

Web application configurations allow and deny access by creating policy rules in the web access layer. A baseline example is focusing on URL category filtering by creating policies that:

- Allow users to access all of the organization's approved social networking sites, such as LinkedIn, and block other sites, such as TikTok.
- Allow users to access personal email accounts but prevent sending email attachments.

When WAFs and configurations are in place, the web application is monitored for suspicious activity. The communication aspect of the IR playbook comes into play when careful logging and monitoring at the application level detect suspicious activity, such as repeated access attempts or unexpected user accounts being created.
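The URL category policies just described can be modeled as a small, ordered rule list. The categories, rules, and default-deny behavior here are invented for illustration; real WAFs and secure web gateways have far richer matchers:

```python
# Toy web-access-layer policy: rules are evaluated top-down, first match
# wins, and anything unmatched is denied by default.
POLICY_RULES = [
    ("approved-social", "allow"),   # e.g., LinkedIn
    ("social", "deny"),             # e.g., TikTok
    ("webmail", "allow"),           # personal email is allowed...
]

ATTACHMENT_DENY = {"webmail"}       # ...but sending attachments is blocked

def evaluate(category, is_attachment=False):
    """Return "allow" or "deny" for a request in the given URL category."""
    if is_attachment and category in ATTACHMENT_DENY:
        return "deny"
    for rule_category, action in POLICY_RULES:
        if category == rule_category:
            return action
    return "deny"
```

Keeping the rules as data, rather than hard-coded logic, mirrors how real web access policies are maintained and reviewed.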
Suspicious activity is often detected using a security information and event management (SIEM) solution or through regular security testing using a web vulnerability scanner (Alan P., GCST). After detecting a security incident, the IR team should "triage it," meaning they should determine the appropriate action to limit short-term consequences and stop minor incidents from growing into large-scale attacks (Alan P., GCST). But what happens when the cyber event cannot be contained and the evidence of a data breach is quickly mounting?

According to Alan P., "global cyberattacks have served as a reminder that plugging security holes is often the easiest part." Threat actors such as advanced persistent threats (APTs) don't conduct hit-and-run attacks. These attacks "infiltrate target systems to maintain a stealthy and persistent presence. Eliminating the entry point is the start of a long and arduous process" for the IR or CSIRT team (Alan P., GCST).

Determine Communication Channels

When outages in programs or degradation of system performance occur, communication must be clear and effective. Communication will directly impact your team's responsiveness to threats. While the CSIRT team is focused on analysis, containment, and remediation, proper incident response communications to leadership and read-in members are critical to success. Managers should be informed of the situation and understand the implications so they can give their team the next steps to respond to an incident. Many users will have questions, such as whether they should keep working or turn off their machines. Worse are users unplugging machines in the belief that it will stop the issue from spreading (David Landsberger, CompTIA).

The next phase is to manage staff and customers. Gupta notes: After a data breach, there might be a legal obligation to notify data owners (i.e., per the GDPR).
The jurisdiction and data class will determine whether you must report the incident to the relevant authorities and provide updates when new information is available (Wire19). A few examples of how to communicate an incident to staff and customers include:

- An email for internal incident response
- A status page
- A message via chat channels within your team (e.g., Slack)
- Outward-facing communication with clients and customers through public relations teams

Enable Postmortems

Molly Star (Lightstep Blog) states: "Incident response management doesn't end when the incident is resolved. Your playbook should detail your postmortem process, from documentation and discussion to action items." Postmortem follow-up is essential because the organization needs to review how it can improve the security of its systems and, in other words, "harden" its perimeter. The first step is to identify the root cause of the cyber event via the logs on your firewalls/WAFs or through the SIEM. By identifying the root cause, remediation can begin to prevent the same or similar cyber events. No solution is perfect, and even if an organization is doing everything right, it is best to adopt an assume-breach mentality.

Figure 3: Summary of incident response process

Conclusion

Data breaches are inarguable evidence of a failure to secure systems, and it's on the organization to respond quickly and effectively. Having a tested IR playbook in hand, continually practiced through tabletop exercises, can help your team cross the finish line while minimizing material impact to the company. Incidents can have a lasting impact, even if adequately handled. A developer-driven, security-first mindset is worth considering. It has a positive ripple effect on the business if developers and security teams can work together and learn lessons from previous incidents.
This guide is a wake-up call to implement holistic information security practices that have both developers and security teams moving toward the same goal as a collective. Dedicating time to threat model with developers to build secure code upfront is a small resource investment that has the potential to yield significant gains for the business in the long term. After all, it can shrink the attack surface, reducing impact should an incident occur.

References:

- "An Incident Response Plan for Your Website" by Jyotsana Gupta (Wire19)
- "What Is an Incident Response Plan and How to Create One" by David Landsberger (CompTIA)
- "Incident Response Steps in Web Application Security" by Alan P. (GCST)
- "Building an Incident Response Playbook" by Molly Star (Lightstep Blog)
AWS Identity and Access Management (IAM) is a service that enables you to manage users and user permissions for your AWS account. With IAM, you can create and manage users, groups, and policies that control access to Amazon EC2 instances, Amazon S3 buckets, and other AWS resources. This article will discuss the basics of AWS IAM: what it is, how it works, and how you can use it to secure your AWS account.

What Is IAM Used For?

IAM is used to manage the security of user access to AWS resources. It is responsible for managing user life cycles, meaning creating accounts, assigning roles, granting access, deleting accounts, enforcing policy, and more. With IAM solutions in place, organizations can enable secure access and authentication of user accounts while minimizing the risk of unauthorized access. You can manage users and groups, assign permissions, and control user access to your AWS resources. For example, you could create a group of users with permission to view Amazon S3 buckets but not modify them, or create a user that only has permission to manage EC2 instances.

How Does IAM Work?

AWS IAM provides access control through the use of policies. Policies are documents that define who has access to what resources and what actions they can take on those resources. For example, you could create a policy that allows only certain users to view S3 buckets or modify EC2 instances. Once you've created your policies, you assign them to users or groups of users. Then, when an AWS user attempts to access a resource, IAM evaluates the user's permissions against the policies assigned to them and either grants or denies access accordingly.

AWS IAM Components

AWS IAM consists of four core components: users, groups, roles, and policies.

Users

Users are individual identities within your AWS account that can be granted access to your AWS resources. You can assign users specific permissions with policies or assign them to groups so they inherit the group's permissions.
This means you can give different levels of access to certain services and control what types of actions each user is able to perform.

Groups

Groups are collections of users that share the same set of permissions. When you assign a policy to a group, all members of the group will receive those same permissions. AWS IAM groups provide a secure and consistent way for teams with varying needs and roles to access cloud resources without needing multiple administrative logins.

Policies

Policies define what actions a user or service may take on AWS resources. They are written in JSON and contain one or more statements that control who has access, what actions they may take, and which resources they can access. Policies are assigned to users or groups and govern how they interact with AWS resources, such as Amazon S3 buckets and EC2 instances. Below you can find an example of JSON policy syntax from the IAM documentation:

{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Action": "s3:ListBucket",
    "Resource": "arn:aws:s3:::example_bucket"
  }
}

Roles

Roles are similar to groups. They also have associated policies, but roles are not tied to a particular user or group. They can be used to grant limited access to applications and users, allowing for greater security and control over resources. For example, an IAM role can be assumed by an IAM user, and this role will determine what parts of the AWS environment they have access to, such as EC2 instances or S3 buckets. Each IAM role also includes a set of permission rules that further limit what user activities can be performed within that role's scope.

Using AWS IAM

The AWS IAM console is the main interface for managing users, groups, and policies. From here, you can create new users and groups, assign policies to them, manage existing user permissions, and view access logs. You can also use the AWS CLI or APIs to manage your IAM resources from the command line or programmatically.
This allows you to integrate IAM into automated processes, such as setting up EC2 instances or deploying applications. The console provides a graphical user interface for managing IAM components, while the CLI is used for more complex tasks like creating custom policies.

Features of AWS Identity and Access Management

AWS IAM provides a number of features to help you manage your users and resources. Here are some of the key features:

- Multi-factor authentication (MFA): MFA can be used to increase security by requiring users to provide additional forms of identification, such as FIDO security keys, hardware TOTP tokens, or time-based one-time passwords generated by a virtual authenticator app.
- Access control lists (ACLs): ACLs can be used to restrict access to specific resources or actions on those resources. For example, you can create an ACL that only allows certain users to view S3 buckets but not modify them.
- Identity federation: Identity federation enables users from other systems, such as Active Directory, to log in with their existing credentials. This simplifies user management and reduces the burden of maintaining separate accounts for each system.
- Identity and access auditing: IAM provides audit logs that track user activities such as login attempts, policy changes, and resource accesses. These logs can be used to monitor user activity and detect potential security issues.

AWS IAM is an essential part of any AWS account. It provides a secure way to manage users and resources and control who has access to what. With IAM, you can create policies that define user permissions, assign them to users or groups, and use MFA and ACLs for additional security. The audit logging features allow you to monitor user activity and detect potential issues. In addition, AWS IAM is an important tool for ensuring your AWS account remains secure and compliant with industry standards.
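The time-based one-time passwords mentioned under MFA are what a virtual authenticator app displays, generated with the TOTP algorithm from RFC 6238. To show how little is involved, here is a minimal Python sketch (a simplified illustration, not a production implementation):

```python
import hashlib
import hmac
import struct
import time

def totp(secret: bytes, for_time=None, step: int = 30, digits: int = 6) -> str:
    """Minimal RFC 6238 TOTP: HMAC-SHA1 over a 30-second time counter."""
    # The moving factor is the number of `step`-second intervals since the epoch.
    counter = int((time.time() if for_time is None else for_time) // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): take 4 bytes at an offset chosen by the last nibble.
    offset = digest[-1] & 0x0F
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % (10 ** digits)).zfill(digits)

# RFC 6238 test vector: ASCII secret "12345678901234567890" at T=59 seconds.
print(totp(b"12345678901234567890", for_time=59))  # "287082"
```

Both the server and the authenticator app compute this value from a shared secret; the code is valid only for the current 30-second window, which is what makes it a useful second factor.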
Conclusion

This article has provided an overview of the basics of AWS IAM: what it is, how it works, and how you can use it to secure your AWS account. To learn more about IAM, including creating users and groups, assigning permissions with policies, and managing user access logs, be sure to check out the official Amazon documentation on Identity and Access Management.
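As a closing illustration, the grant-or-deny decision described earlier can be approximated in a few lines. This is a toy evaluator over policies shaped like the JSON example above, not the real IAM engine (which also handles wildcards, conditions, cross-policy deny precedence, and more):

```python
def is_allowed(policies, action, resource):
    """Toy IAM-style evaluation: deny by default, an explicit Deny always wins,
    otherwise at least one matching Allow grants access."""
    allowed = False
    for policy in policies:
        statements = policy["Statement"]
        if isinstance(statements, dict):  # a single statement may be a bare object
            statements = [statements]
        for stmt in statements:
            if stmt["Action"] == action and stmt["Resource"] == resource:
                if stmt["Effect"] == "Deny":
                    return False  # explicit deny overrides any allow
                if stmt["Effect"] == "Allow":
                    allowed = True
    return allowed

policy = {
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": "s3:ListBucket",
        "Resource": "arn:aws:s3:::example_bucket",
    },
}
print(is_allowed([policy], "s3:ListBucket", "arn:aws:s3:::example_bucket"))  # True
print(is_allowed([policy], "s3:PutObject", "arn:aws:s3:::example_bucket"))   # False
```

The key property mirrored here is IAM's default-deny stance: a request is rejected unless a policy explicitly allows it, and an explicit deny can never be overridden.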
This topic has come up a few times this year in question period: arguments that quality bugs and security bugs "have equal value," that security testing and QA are "the same thing," that security testing should "just be performed by QA," and that "there's no specific skillset" required to do security testing versus QA. This article will explain why I fundamentally disagree with all of those statements.

First, some definitions. A software bug is an error, flaw, failure, or fault in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways. A security bug is specifically a bug that causes a vulnerability: a weakness that can be exploited by a threat actor, such as an attacker, to perform unauthorized actions within a computer system. QA looks for software bugs of any kind; security testers look for vulnerabilities. This is the main difference — their goals. Just as all women are human beings but not all human beings are women, all security bugs are defects, but not all defects are security bugs.

Now let's dissect each of the claims above.

1. Quality Bugs and Security Bugs "Have Equal Value"

A security bug that leads to a low-risk vulnerability does not have "the same value" as a non-security bug that makes the system crash over and over. Likewise, a security bug that could lead to a data breach, or worse, is not equivalent to fonts not matching from page to page. I am of the opinion that security bugs are more likely to cause catastrophic business harm than regular bugs, because once your system has fallen under the control of a malicious actor, creativity is the only limit. Malicious actors never cease to amaze me with the damage they can do.

[Photo: Someone is wearing camouflage. — #MSIgniteTheTour, Toronto, 2019]

2.
Security Testing and QA Are "The Same Thing"

The goals of security testing and quality assurance testing are different, which I feel makes them obviously different (if they were the same, why would they not be called the same thing?). However, I want to dig deeper into this idea.

I often say "security is a part of quality" because I believe this to be true. You cannot have a high-quality product that is insecure; it is an oxymoron. If an application is fast, beautiful, and does everything the client asked for, but someone breaks into it the first day that it is released, I don't think you will find anyone willing to call it a high-quality application.

There are many different types of testing:

- Unit testing — small, automatable tests that verify small units of code (functions/subroutines) do the one thing they are supposed to do.
- Integration testing — tests between different components to ensure they work well together. Larger than unit tests, but less intense than end-to-end tests.
- End-to-end testing — ensuring the flow of your application from start to finish is as expected.
- User acceptance testing (UAT) — manual and/or automated testing of client requirements (often used interchangeably with "QA").
- User experience (UX) testing — verifying that the application or product is easy to use and understand from a user perspective.
- Regression testing — verifying that new changes have not broken anything that was already tested; a "retesting" of all previously released functionality.
- Stress/performance/load testing — verifying your application can handle large amounts of usage or traffic while continuing to perform well, generally performed using software tools (although these three have slight differences, they are generally lumped together).
- Security testing — a mix of manual and automated testing, using one or more tools, with the aim of finding vulnerabilities within applications.
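To ground the first item in the list: a unit test exercises one small function in isolation. A minimal sketch in Python, where `normalize_username` is a hypothetical unit under test:

```python
import unittest

def normalize_username(name: str) -> str:
    # Hypothetical unit under test: trims whitespace and lowercases the input.
    return name.strip().lower()

class TestNormalizeUsername(unittest.TestCase):
    def test_strips_whitespace_and_lowercases(self):
        self.assertEqual(normalize_username("  Alice "), "alice")

    def test_leaves_normalized_input_unchanged(self):
        self.assertEqual(normalize_username("bob"), "bob")

if __name__ == "__main__":
    # exit=False so the script can be embedded in larger runs without terminating them.
    unittest.main(argv=["unit-test-demo"], exit=False)
```

Each of the other testing types scales this idea up, from pairs of components to entire user journeys, which is part of why each one calls for its own tooling and skillset.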
There are more types of testing, but I think you get the point. Some or all of these types of testing can be used to verify that a product is of high quality, and security is just one part. Therefore, security testing and QA are not "the same thing."

3. Security Testing "Should Be Performed by QA"

Each of the types of testing listed above requires a different skillset. All of them require patience, attention to detail, basic technical skills, and the ability to document what you have found in a way that the software developers will understand and be able to fix. That is where the similarities end. Each of these types of testing requires different experience, knowledge, and tools, often meaning you need to hire different resources to perform the different tasks. Also, we can't concentrate on everything at once and still do a great job at each one. Although theoretically you could find one person who is both skilled and experienced in all of these areas, such people are rare, and they would likely be costly to employ as a full-time resource. This is one reason that people hired for general software testing are not often also tasked with security testing. Another reason is that people who have the experience and skills to perform thorough and complete security testing are currently a rarity; there is a skill shortage, whereas as an industry we are lucky to have quite a number of skilled QA professionals, making them easier to hire and staff. Lastly, the time, training, and experience it takes to become a security tester are harder to acquire than for a general software tester. Training in security testing is expensive and difficult to find, the skill generally takes longer to learn than other types of testing, and there are fewer opportunities to get into the industry compared to QA. Thus, it is more difficult to become a security tester than a general tester.
Scarce resources, high demand, and expensive training mean it costs significantly more to hire security testers than general software testers. All of these facts lead up to the reality that it is cost-prohibitive to staff your QA team with professionals who are skilled and experienced in both QA and security testing. Doing so would also create a single point of failure for testing in your organization, which will not save you money in the long run. Another point on this topic: those who work in the security industry are likely to have a preference for their area of focus, security, and may be unwilling to perform other types of work outside their area of concentration. People who specialize generally want to work within their specialization whenever possible, and security testing is a specialization.

4. "There's No Specific Skillset" Required to Do Security Testing Versus QA

First of all, I feel this statement is insulting to QA testers, as though they do not have a specific skillset that makes them good at what they do. I don't believe that to be true. I suspect that when people make this argument, it is out of frustration with our industry, because I honestly cannot fathom someone thinking that security testing does not require specific experience, training, or skills; otherwise, there would be no skills shortage and it would not be a high-paying job. Security testing is a specialization within the field of testing, just as there are specializations within any field, and by definition it requires additional knowledge and training to build the skillset needed to do the job. I do not intend to downplay the value of QA testing; I am only explaining that quality assurance is different from ensuring that a product is secure. I should also say that I feel hacking is sometimes glorified on television, in the media, and in our industry as a whole, in a way that isn't logical to me.
Security testing is very important, but I do not believe that hackers are superior to other professionals who work in IT. In fact, I choose to focus my career on AppSec, DevSecOps, and other types of defense because I truly believe that it is more important that we write secure code than we "hack all the things." Security is so much more than just security testing (ethical hacking); it is secure design, secure coding, threat modelling, and so on. I feel comments like this (#4) are not based on facts, but feelings, and it’s difficult to debate with someone when that is the case. It is okay if we disagree on this topic. Debate is good and healthy, and I would love to hear your feelings, thoughts and ideas in the comments. At this point I’d like to remind you all that security is everybody’s job. Not only is it everyone’s responsibility to do their job in the most secure way they know how, but having many different people look at something with security in mind can help us find new and different problems that may have otherwise been missed.
Network security refers to the technologies, processes, and policies used to protect networks, network traffic, and network-accessible assets from cyberattacks, unauthorized access, and data loss. Organizations of all sizes need network security to protect their critical assets and infrastructure. Modern network security takes a layered approach to protect the many edges of the network and the network perimeter. Any element of the network could be an entry point for attackers—endpoint devices, data paths, applications, or users. Because organizations face numerous potential threats, it is common to deploy multiple network security controls designed to address different types of threats at different layers of the network and infrastructure. This is called a defense-in-depth security approach.

Top 5 Network Security Risks in 2023

Supply Chain Attacks

Supply chain attacks exploit relationships between organizations and external parties. Here are a few ways an attacker could exploit this trust relationship:

- Third-party access: Companies often allow vendors and other external parties to access their IT environments and systems. If an attacker gains access to a trusted partner's network, they can exploit the partner's legitimate access to corporate systems.
- Trusted external software: All companies use third-party software and make it available on their networks. If an attacker can inject malicious code into third-party software or its updates, the malware can access trusted and sensitive data or systems in an organization's environment. This was the method used in the global-scale SolarWinds hack.
- Third-party code: Almost all applications contain third-party and open-source code and libraries. This external code could contain exploitable vulnerabilities or malicious functions that could be abused by an attacker. If your organization's applications rely on vulnerable or malicious code, they are open to attacks and exploits.
A high-profile example of a third-party code exploit was the Log4j vulnerability.

Ransomware

Ransomware is a type of malicious software (malware) designed to lock data on a targeted computer and display a ransom note. Typically, ransomware programs use encryption to lock data and demand payment in cryptocurrency in return for a decryption key. Cybercriminals often go to the dark web to buy ransomware kits: software tools that enable attackers to generate ransomware with certain functionalities and distribute it to demand ransom from victims. Another option for acquiring ransomware is Ransomware as a Service (RaaS), which delivers affordable ransomware programs that require little or no technical expertise to operate, making it easier for cybercriminals to launch attacks quickly and with minimal effort.

Types of Ransomware

There are many types of ransomware available to cybercriminals, each working differently. Here are common types:

- Scareware: This type imitates tech support or security software. Its victims might receive pop-up notifications claiming there is malware on their system, which typically continue to pop up until the victim responds.
- Encrypting ransomware: This ransomware encrypts the victim's data, demanding payment to decrypt the files. However, victims might not get their data back even if they negotiate or comply with the demand.
- Master boot record ransomware: This type encrypts the entire hard drive, not just the user's files, making it impossible to access the operating system.
- Mobile ransomware: Attackers deploy mobile ransomware to steal data from phones or encrypt it, demanding a ransom in return for unlocking the device or returning the data.

API Attacks

An API attack is the malicious use or compromise of an application programming interface (API). API security comprises the practices and technologies that prevent attackers from exploiting and abusing APIs.
Hackers target APIs because they are at the heart of modern web applications and microservices architectures. Examples of API attacks include:

- Injection attacks: This type of attack occurs when an API does not properly validate its inputs and allows attackers to submit malicious code as part of API requests. SQL injection (SQLi) and cross-site scripting (XSS) are the most prominent examples, but there are others. Most types of injection attacks, traditionally aimed at websites and databases, can also be used against APIs.
- DoS/DDoS attacks: In a denial-of-service (DoS) or distributed denial-of-service (DDoS) attack, an attacker attempts to make the API unavailable to its users. Rate limiting can help mitigate small-scale DoS attacks, but large-scale DDoS attacks can leverage millions of computers and can only be addressed with cloud-scale anti-DDoS technology.
- Data exposure: APIs frequently process and transmit sensitive data, including credit card information, passwords, session tokens, or personally identifiable information (PII). Data can be compromised if the API handles it incorrectly, if the API can easily be tricked into providing data to unauthorized users, or if attackers manage to compromise the API server.

Social Engineering Attacks

Social engineering attacks employ various psychological manipulation techniques, such as trickery and coercion, to make a target perform a certain action. Here are common social engineering tactics:

- Phishing: Phishing is an attempt to trick a recipient into taking an action that benefits the attacker. Attackers send phishing messages using various platforms, such as email, corporate communications apps, and social media. These messages might trick their target into opening a malicious attachment, revealing sensitive information like login credentials, or clicking a malicious link.
- Spear phishing: A phishing attack that targets a specific person or group, using information about the target to make the phishing message seem more believable. For instance, a spear phishing email to finance personnel might claim to send an unpaid invoice from one of the targeted company's legitimate suppliers.
- Smishing: These phishing attacks use SMS text messages, taking advantage of common characteristics of SMS, like link-shortening services, to trick victims into clicking malicious links.
- Vishing: Performed over the phone, vishing occurs when an attacker attempts to convince the victim to perform a certain action or reveal sensitive data, like login credentials or credit card information.

MitM Attacks

An MitM attack, or man-in-the-middle attack, is a type of network attack in which an attacker intercepts a data transfer or conversation between two parties. A successful attacker can impersonate one of the parties. By intercepting the communication, an attacker can steal data or alter the data transmitted between participants, for example by inserting a malicious link, and both parties are unaware of the manipulation until it is too late. Common targets for MitM attacks include users of financial applications, e-commerce websites, and other systems that require authentication. There are many ways to carry out an MitM attack. Attackers can compromise a public free Wi-Fi hotspot; when users connect to such hotspots, attackers have full visibility into their activity. Attackers can also use IP spoofing, ARP spoofing, or DNS spoofing to redirect users to a malicious website, or to redirect user-submitted data to the attacker instead of its intended destination.
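Of the attack classes covered in this article, injection is the easiest to demonstrate concretely. The sketch below (Python with an in-memory SQLite table of hypothetical users) shows how a classic SQLi payload abuses string-built queries, and how parameterized queries neutralize it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "user")])

def find_role_unsafe(name):
    # VULNERABLE: attacker-controlled input is concatenated into the SQL text.
    return conn.execute(f"SELECT role FROM users WHERE name = '{name}'").fetchall()

def find_role_safe(name):
    # SAFE: the driver binds the value as data; it is never parsed as SQL.
    return conn.execute("SELECT role FROM users WHERE name = ?", (name,)).fetchall()

payload = "' OR '1'='1"
print(find_role_unsafe(payload))  # every row leaks: [('admin',), ('user',)]
print(find_role_safe(payload))    # no such user: []
```

The same principle, never splicing untrusted input into executable syntax, is the core defense against the whole injection family, whether the target is a website, a database, or an API.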
Conclusion

In this article, I explained the basics of network security and covered five network security risks:

- Ransomware: a type of malicious software (malware) designed to lock data on a targeted computer and display a ransom note.
- API attacks: the malicious use or compromise of an application programming interface.
- Social engineering attacks: attacks that employ various psychological manipulation techniques to make a target perform a certain action.
- Supply chain attacks: attacks that exploit relationships between organizations and external parties.
- MitM attacks: network attacks in which an attacker intercepts a data transfer or conversation between two parties.

I hope this will be useful as you begin taking the appropriate measures against these attacks.
Two-factor authentication (2FA) is a great way to improve the security of user accounts in an application. It helps protect against common issues with passwords, like users picking easily guessable passwords or reusing the same password across multiple sites. There are different ways to implement two-factor authentication, including SMS, an authenticator application, and WebAuthn. SMS is the most widely used and won't be going away, so it falls on us as developers to build the best SMS 2FA experience we can for our users. The WebOTP API is one way we can reduce friction in the login experience and even provide some protection against phishing.

What Is the WebOTP API?

The WebOTP API is an extension of the Credential Management API. The Credential Management API started by giving us the ability to store and access credentials in a browser's password manager, but now encompasses WebAuthn and two-factor authentication. The WebOTP API allows us to request permission from the user to read a 2FA code out of an incoming SMS message. When you implement the WebOTP API, the second step of the login process can go from an awkward process of reading and copying a number of digits from an SMS to a single button press. A great improvement, I think you'll agree.

How Does It Work?

To implement WebOTP, you will need to do two things:

1. Update the message you send with the WebOTP format.
2. Add some JavaScript to the login page to request permission to read the message.

The SMS Message

To have the WebOTP API recognize a message as an incoming 2FA code, you need to add a line to the end of the message that you send. That line must include an @ symbol followed by the domain for the site that your user will be logging in to, then a space, the # symbol, and then the code itself.
If your user is logging in on example.com and the code you are sending them is 123456, then the message needs to look like this:

```
Your code to log in to the application is 123456
@example.com #123456
```

The domain ties the message to the website the user should be logging in to. This helps protect against phishing; WebOTP can't be used to request the code from an SMS if the domain the user is logging in to doesn't match the domain in the message. Obviously, it can't stop a user from copying a code across from a message, but it might give them pause if they come to expect this behavior.

The JavaScript

Once you have your messages set up in the right format, you need some JavaScript on your second-factor page that will trigger the WebOTP API, ask the user for permission to access the message, and collect the code. The most minimal version of this code looks like this:

```javascript
if ('OTPCredential' in window) {
  navigator.credentials.get({
    otp: { transport: ['sms'] }
  }).then((otp) => {
    submitOTP(otp.code);
  });
}
```

We ask the navigator.credentials object to get a one-time password (OTP) from the SMS transport. If the browser detects an incoming message with the right domain and a code in it, the user will be prompted for access. If the user approves, the promise resolves with an otp object which has a code property. You can then submit that code to the form and complete the user's login process.
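Server-side, composing an SMS in this format is a one-liner. A Python sketch (the function name and the human-readable text are just illustrative; only the final `@domain #code` line matters to WebOTP):

```python
def webotp_sms(domain: str, code: str) -> str:
    # The WebOTP-bound line must be the LAST line of the message:
    # "@" + domain, a space, then "#" + code.
    return f"Your code to log in to the application is {code}\n@{domain} #{code}"

print(webotp_sms("example.com", "123456"))
# Your code to log in to the application is 123456
# @example.com #123456
```

Whatever SMS provider you use, the point is the same: keep the bound line last, and keep the domain identical to the origin the login page is served from, or the browser will not offer the code.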
A more complete version of the code that handles things like finding the input and form, canceling the request if the form is submitted, and submitting the form if the request is successful, looks like this:

```javascript
if ('OTPCredential' in window) {
  window.addEventListener('DOMContentLoaded', e => {
    const input = document.querySelector('input[autocomplete="one-time-code"]');
    if (!input) return;
    const ac = new AbortController();
    const form = input.closest('form');
    if (form) {
      form.addEventListener('submit', e => ac.abort());
    }
    navigator.credentials.get({
      otp: { transport: ['sms'] },
      signal: ac.signal
    }).then(otp => {
      input.value = otp.code;
      if (form) {
        form.submit();
      }
    }).catch(err => {
      console.error(err);
    });
  });
}
```

This will work for many sites, but copying and pasting code isn't the best way to share it, so I came up with something a bit easier.

Declarative WebOTP With Web Components

On Safari, you can get similar behavior to the WebOTP API by adding one attribute to the <input> element for the OTP code. Setting autocomplete="one-time-code" will trigger Safari to offer the code from the SMS via autocomplete. Inspired by this, I wanted to make WebOTP just as easy. So, I published a web component, the <web-otp-input> component, that handles the entire process. You can see all the code and how to use it on GitHub.
For a quick example, you can add the component to your page as an ES module:

```html
<script type="module" src="https://unpkg.com/@philnash/web-otp-input"></script>
```

Or install it to your project from npm:

```shell
npm install @philnash/web-otp-input
```

And import it into your application:

```javascript
import { WebOTPInput } from "@philnash/web-otp-input";
```

You can then wrap the <web-otp-input> around your existing <input> within a <form>, like this:

```html
<form action="/verification" method="POST">
  <div>
    <label for="otp">Enter your code:</label>
    <web-otp-input>
      <input type="text" autocomplete="one-time-code" inputmode="numeric" id="otp" name="otp" />
    </web-otp-input>
  </div>
  <button type="submit">Submit</button>
</form>
```

Then the WebOTP experience will happen automatically for anyone on a browser that supports it, without writing any additional JavaScript.

WebOTP: A Better Experience

The WebOTP API makes two-factor authentication with SMS a better experience. For browsers that support it, entering the code that is sent as a second factor becomes a breeze for users. There are even circumstances where it works for desktop browsers too. For a user with Chrome on the desktop and Chrome on Android, signed into their Google account on both, signing in on the desktop will cause a notification on the mobile device asking to approve sending the code to the desktop. Approving it on the mobile device transfers the code to the desktop browser. You don't even have to write more code to handle this; all you need is the JavaScript in this article. If you are building two-factor authentication or phone verification, consider implementing the WebOTP API as well, to make the process easier for your users.