Understanding TCP/IP Network Stack & Writing Network Apps
We cannot imagine Internet services without TCP/IP. All the Internet services we have developed and used at NHN are built on TCP/IP. Understanding how data is transferred over the network will help you improve performance through tuning, troubleshoot problems, and adopt new technology. This article describes the overall operation of the network stack by following the data flow and control flow in Linux and in the hardware layer.

Key Characteristics of TCP/IP

How should a network protocol be designed to transmit data quickly while preserving the data order and avoiding data loss? TCP/IP was designed with these goals in mind. The following key characteristics of TCP/IP are needed to understand the stack.

TCP and IP
Technically, TCP and IP belong to different layers, so it would be more correct to describe them separately. Here, however, we describe them as one.

1. Connection-oriented
First, a connection is made between two endpoints (local and remote), and then data is transferred. Here, the "TCP connection identifier" is the combination of the addresses of the two endpoints, in the form (local IP address, local port number, remote IP address, remote port number).

2. Bidirectional Byte Stream
Bidirectional data communication is carried out as a byte stream.

3. In-order Delivery
A receiver receives data in the order in which the sender sent it. For that, the data must carry its order. To mark the order, a 32-bit integer sequence number is used.

4. Reliability through ACK
When a sender does not receive an ACK (acknowledgement) from the receiver after sending data, the sender TCP re-sends the data. Therefore, the sender TCP buffers data that the receiver has not yet acknowledged.

5. Flow Control
A sender sends only as much data as the receiver can accept. The receiver tells the sender the maximum number of bytes it can receive (its unused buffer size, the receive window). The sender sends no more bytes than the receiver's receive window allows.

6. Congestion Control
The congestion window is used separately from the receive window to prevent network congestion by limiting the volume of data flowing in the network. As with the receive window, the sender sends no more bytes than its congestion window allows. The congestion window is computed with one of a variety of algorithms such as TCP Vegas, Westwood, BIC, and CUBIC. Unlike flow control, congestion control is implemented by the sender only.

Data Transmission

As its name indicates, a network stack has many layers. Figure 1 shows the layer types.

Figure 1: Operation Process by Each Layer of TCP/IP Network Stack for Data Transmission.

There are several layers, and they are broadly classified into three areas:

  • User area
  • Kernel area
  • Device area

Tasks in the user area and the kernel area are performed by the CPU. The user area and the kernel area are called the "host" to distinguish them from the device area. Here, the device is the Network Interface Card (NIC) that sends and receives packets. This is a more accurate term than the commonly used "LAN card".

Let's take a look at the user area. First, the application creates the data to send (the "User data" box in Figure 1) and then calls the write() system call to send it. Assume that the socket (fd in Figure 1) has already been created. When the system call is made, execution switches to the kernel area.

POSIX-series operating systems, including Linux and Unix, expose the socket to the application as a file descriptor. In a POSIX-series operating system, the socket is a kind of file. The file layer performs a simple check and calls the socket function by using the socket structure connected to the file structure.

The kernel socket has two buffers: the send socket buffer for sending and the receive socket buffer for receiving.
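The two socket buffers described above can be observed from user space. The following is a minimal sketch, assuming a Linux or other POSIX system; the helper name tcp_socket_buffer_size is hypothetical, while socket(), getsockopt(), SO_SNDBUF, and SO_RCVBUF are the standard socket API.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a TCP socket and return the size in bytes of
 * one of its kernel buffers. Pass SO_SNDBUF for the send socket buffer or
 * SO_RCVBUF for the receive socket buffer. Returns -1 on error.
 */
int tcp_socket_buffer_size(int optname)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int size = 0;
    socklen_t len = sizeof(size);
    int rc = getsockopt(fd, SOL_SOCKET, optname, &size, &len);

    close(fd);
    return rc == 0 ? size : -1;
}
```

Calling the helper once with SO_SNDBUF and once with SO_RCVBUF reports the current size of each buffer. The values depend on the system configuration, so no particular numbers should be expected.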
When the write system call is made, the data in the user area is copied to kernel memory and then added to the end of the send socket buffer. This keeps the data in order. In Figure 1, the light-gray box refers to the data in the socket buffer. Then, TCP is called.

A TCP Control Block (TCB) structure is connected to the socket. The TCB includes the data required for processing the TCP connection: the connection state (LISTEN, ESTABLISHED, TIME_WAIT), receive window, congestion window, sequence number, retransmit timer, and so on.

If the current TCP state allows data transmission, a new TCP segment (in other words, a packet) is created. If data transmission is impossible because of flow control or a similar reason, the system call ends here and the mode returns to user mode (in other words, control is passed back to the application).

A TCP segment has two parts, as shown in Figure 2: the TCP header and the payload.

Figure 2: TCP Frame Structure.

The payload carries data from the send socket buffer that has not yet been acknowledged. The maximum length of the payload is the minimum of the receive window, the congestion window, and the maximum segment size (MSS).

Then, the TCP checksum is computed. This checksum computation includes pseudo header information (IP addresses, segment length, and protocol number). One or more packets can be transmitted, depending on the TCP state.

In fact, since current network stacks use checksum offload, the TCP checksum is computed by the NIC, not by the kernel. However, for convenience we assume here that the kernel computes the TCP checksum.

The created TCP segment goes down to the IP layer. The IP layer adds an IP header to the TCP segment and performs IP routing. IP routing is the procedure of finding the next hop IP on the way to the destination IP.

After the IP layer has computed and added the IP header checksum, it sends the data to the Ethernet layer.
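The IP header checksum mentioned above is the 16-bit one's-complement Internet checksum (RFC 1071). The following is a minimal user-space sketch, not the kernel's optimized routine; the function name inet_checksum is chosen for illustration. The TCP checksum uses the same arithmetic, applied to the pseudo header followed by the TCP segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum (RFC 1071) over len bytes.
 * The bytes are read as big-endian 16-bit words; an odd trailing byte is
 * padded with a zero byte. The checksum field inside the data must be set
 * to zero before computing a checksum to store in it.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    /* Add up all complete 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);

    /* Pad a trailing odd byte with zero on the right. */
    if (len & 1)
        sum += (uint32_t)(data[len - 1] << 8);

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A receiver can validate a header by running the same function over the header with the checksum field left in place; a result of 0 means the header is intact.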
The Ethernet layer looks up the MAC address of the next hop IP by using the Address Resolution Protocol (ARP). It then adds the Ethernet header to the packet. With the Ethernet header added, the host packet is complete.

The transmit interface (NIC) is known as a result of IP routing. That interface is used to transmit the packet toward the next hop IP. Therefore, the driver of the transmit NIC is called.

At this point, if a packet capture program such as tcpdump or Wireshark is running, the kernel copies the packet data into the memory buffer that the program uses. In the same way, received packets are captured directly at the driver level. Generally, the traffic shaper function is also implemented to run at this layer.

The driver requests packet transmission according to the driver-NIC communication protocol defined by the NIC manufacturer.

After receiving the packet transmission request, the NIC copies the packet from main memory to its own memory and then sends it onto the network line. In doing so, it complies with the Ethernet standard by adding the IFG (Inter-Frame Gap), preamble, and CRC to the packet. The IFG and preamble are used to mark the start of the packet (in networking terms, framing), and the CRC is used to protect the data (the same purpose as the TCP and IP checksums). Packet transmission starts according to the physical speed of the Ethernet link and the condition of Ethernet flow control. It is like getting the floor and speaking in a conference room.

After the NIC has sent a packet, it generates an interrupt on the host CPU. Every interrupt has its own interrupt number, and the OS uses that number to find the driver that should handle the interrupt. The driver registers a function to handle the interrupt (an interrupt handler) when the driver is started. The OS calls the interrupt handler, and the interrupt handler returns the transmitted packet to the OS.
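The CRC that the NIC appends to each frame is computed in hardware. As an illustration of the arithmetic only, the following sketch computes the standard CRC-32 that uses the IEEE 802.3 polynomial (0xEDB88320 in its bit-reflected form). The function name crc32_ieee is chosen for illustration, and the bit-by-bit loop is written for clarity, not speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * CRC-32 with the IEEE 802.3 polynomial, processed least significant bit
 * first, with an initial value of 0xFFFFFFFF and a final inversion.
 */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

For the nine ASCII bytes "123456789", this function returns 0xCBF43926, the commonly published check value for this CRC. Production code typically uses a lookup table or a dedicated CPU instruction instead of the inner bit loop.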
So far we have discussed how data is transmitted through the kernel and the device when an application performs a write. However, the kernel can also transmit a packet by calling TCP directly, without a write request from the application. For example, when an ACK is received and the receive window expands, the kernel creates a TCP segment containing the data left in the socket buffer and sends it to the receiver.

Data Receiving

Now, let's take a look at how data is received. Data receiving is the procedure by which the network stack handles an incoming packet. Figure 3 shows how the network stack handles a received packet.

Figure 3: Operation Process by Each Layer of TCP/IP Network Stack for Handling Data Received.

First, the NIC writes the packet into its own memory. It checks whether the packet is valid by performing the CRC check and then sends the packet to a memory buffer on the host. This buffer is memory that the driver has requested from the kernel in advance and that has been allocated for receiving packets. After the buffer has been allocated, the driver tells the NIC its memory address and size. If the NIC receives a packet when no host memory buffer has been allocated by the driver, the NIC may drop the packet.

After sending the packet to the host memory buffer, the NIC sends an interrupt to the host OS.

Then, the driver checks whether it can handle the new packet. Up to this point, the driver-NIC communication protocol defined by the manufacturer is used.

When the driver sends a packet to the upper layer, the packet must be wrapped in the packet structure that the OS uses, so that the OS can understand it. For example, sk_buff in Linux, mbuf in BSD-series kernels, and NET_BUFFER_LIST in Microsoft Windows are the packet structures of the corresponding OS. The driver sends the wrapped packets to the upper layer.

The Ethernet layer checks whether the packet is valid and then de-multiplexes the upper protocol (network protocol).
To de-multiplex, the Ethernet layer uses the ethertype value of the Ethernet header. The IPv4 ethertype value is 0x0800. It removes the Ethernet header and then sends the packet to the IP layer.

The IP layer also checks whether the packet is valid. In other words, it checks the IP header checksum. It then performs IP routing logically to determine whether the local system should handle the packet or whether the packet should be sent on to another system. If the packet must be handled by the local system, the IP layer de-multiplexes the upper protocol (transport protocol) by referring to the proto value of the IP header. The TCP proto value is 6. It removes the IP header and then sends the packet to the TCP layer.

Like the lower layers, the TCP layer checks whether the packet is valid. It also checks the TCP checksum. As mentioned before, since current network stacks use checksum offload, the TCP checksum is computed by the NIC, not by the kernel.

Then it searches for the TCP control block to which the packet belongs. The 4-tuple (source IP, source port, destination IP, destination port) of the packet is used as the identifier. After finding the connection, it runs the protocol to handle the packet. If it has received new data, it adds the data to the receive socket buffer. Depending on the TCP state, it can send a new TCP packet (for example, an ACK packet). At this point, TCP/IP handling of the received packet is complete.

The size of the receive socket buffer determines the TCP receive window. Up to a certain point, TCP throughput increases as the receive window grows. In the past, the socket buffer size was adjusted in the application or in the OS configuration. The latest network stacks can adjust the receive socket buffer size, that is, the receive window, automatically.

When the application calls the read system call, execution switches to the kernel area and the data in the socket buffer is copied to memory in the user area. The copied data is removed from the socket buffer. Then TCP is called.
TCP increases the receive window because there is new space in the socket buffer, and it sends a packet if the protocol state requires one. If no packet is transferred, the system call ends.

Network Stack Development Direction

The functions of the network stack layers described so far are the most basic ones. The network stack of the early 1990s had little more than these functions. The latest network stacks, however, have many more functions and are more complex, as their implementations have become more sophisticated.

The functions of the latest network stack can be classified by purpose as follows.

Packet Processing Procedure Manipulation
This covers functions like Netfilter (firewall, NAT) and traffic control. By inserting user-controllable code into the basic processing flow, the stack can behave differently according to the user configuration.

Protocol Performance
This aims to improve the throughput, latency, and stability that the TCP protocol can achieve within a given network environment. Various congestion control algorithms and additional TCP functions such as SACK are typical examples. Protocol improvements are out of scope and will not be discussed here.

Packet Processing Efficiency
This aims to increase the maximum number of packets that can be processed per second by reducing the CPU cycles, memory usage, and memory accesses that a system consumes to process packets. There have also been several attempts to reduce latency within the system. These attempts include stack parallel processing, header prediction, zero-copy, single-copy, checksum offload, TSO, LRO, RSS, and so on.

Control Flow in the Stack

Now, we will take a more detailed look at the internal flow of the Linux network stack. Like other kernel subsystems, the network stack is basically event-driven: it reacts when an event occurs. Therefore, there is no separate thread to execute the stack.
Figure 1 and Figure 3 showed simplified diagrams of the control flow. Figure 4 below illustrates the control flow more exactly.

Figure 4: Control Flow in the Stack.

At Flow (1) in Figure 4, an application makes a system call to execute (use) TCP. For example, it calls the read system call or the write system call, which then executes TCP. In this flow, however, no packet is transmitted.

Flow (2) is the same as Flow (1), except that packet transmission is required after TCP has been executed. TCP creates a packet and sends it down to the driver. A queue sits in front of the driver. The packet enters the queue first, and the queue implementation decides when to send the packet to the driver. This is the queue discipline (qdisc) of Linux. The function of Linux traffic control is to manipulate the qdisc. The default qdisc is a simple First-In-First-Out (FIFO) queue. By using another qdisc, operators can achieve various effects such as artificial packet loss, packet delay, and transmission rate limits. At Flow (1) and Flow (2), the process thread of the application also executes the driver.

Flow (3) shows the case in which a timer used by TCP has expired. For example, when the TIME_WAIT timer has expired, TCP is called to delete the connection.

Like Flow (3), Flow (4) is a case in which a timer used by TCP has expired, but here the result of executing TCP is a packet that must be transmitted. For example, when the retransmit timer has expired, the packet for which no ACK has been received is transmitted again.

Flow (3) and Flow (4) show the procedure of executing the timer softirq that processes the timer interrupt.

When the NIC driver receives an interrupt, it frees the transmitted packet. In most cases, execution of the driver ends here. Flow (5) is the case in which packets have accumulated in the transmit queue. The driver requests a softirq, and the softirq handler runs the transmit queue to send the accumulated packets to the driver.
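The idea of a FIFO queue in front of the driver can be sketched in user space as a bounded ring buffer. This is a toy model, not the kernel's qdisc code: the names toy_fifo, toy_fifo_enqueue, and toy_fifo_dequeue are hypothetical, integer packet ids stand in for sk_buff pointers, and rejecting a packet when the queue is full is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QLEN 4  /* hypothetical queue limit, in packets */

struct toy_fifo {
    int pkts[TOY_QLEN];  /* packet ids stand in for sk_buff pointers */
    size_t head;         /* index of the oldest queued packet */
    size_t count;        /* number of packets currently queued */
};

/* Queue a packet at the tail. Returns 0, or -1 if the queue is full. */
int toy_fifo_enqueue(struct toy_fifo *q, int pkt)
{
    if (q->count == TOY_QLEN)
        return -1;
    q->pkts[(q->head + q->count) % TOY_QLEN] = pkt;
    q->count++;
    return 0;
}

/* Take the oldest packet for the driver. Returns 0, or -1 if empty. */
int toy_fifo_dequeue(struct toy_fifo *q, int *pkt)
{
    if (q->count == 0)
        return -1;
    *pkt = q->pkts[q->head];
    q->head = (q->head + 1) % TOY_QLEN;
    q->count--;
    return 0;
}
```

A zero-initialized struct toy_fifo is an empty queue. Packets come out in exactly the order in which they were queued, which is the property that other qdiscs deliberately change in order to delay, drop, or rate-limit traffic.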
When the NIC driver receives an interrupt and finds a newly received packet, it requests a softirq. The softirq that processes received packets calls the driver and passes the received packet to the upper layer. In Linux, processing received packets in this way is called New API (NAPI). It is similar to polling, because the driver does not push the packet directly to the upper layer; instead, the upper layer fetches the packet. The actual code is called NAPI poll, or simply poll.

Flow (6) shows the case in which TCP execution completes, and Flow (7) shows the case that requires additional packet transmission. Flows (5), (6), and (7) are all executed by the softirq that processes the NIC interrupt.

How to Process Interrupt and Received Packet

Interrupt processing is complex; however, you need to understand it in order to understand the performance issues related to processing received packets. Figure 5 shows the procedure of processing an interrupt.

Figure 5: Processing Interrupt, softirq, and Received Packet.

Assume that CPU 0 is executing an application program (user program). At this time, the NIC receives a packet and generates an interrupt for CPU 0. The CPU then executes the kernel interrupt (irq) handler. This handler refers to the interrupt number and then calls the driver interrupt handler. The driver frees the transmitted packet and then calls the napi_schedule() function to process the received packet. This function requests the softirq (software interrupt).

After the driver interrupt handler has finished, control is passed back to the kernel handler. The kernel handler executes the interrupt handler for the softirq.

After the interrupt context has been executed, the softirq context is executed. The interrupt context and the softirq context are executed by the same thread. However, they use different stacks. In addition, the interrupt context blocks hardware interrupts, whereas the softirq context allows them.
The softirq handler that processes received packets is the net_rx_action() function. This function calls the poll() function of the driver. The poll() function calls the netif_receive_skb() function, which sends the received packets one by one to the upper layer. After the softirq has been processed, the application resumes execution from the point at which it was interrupted.

Therefore, the CPU that has received the interrupt processes the received packets from start to finish. In Linux, BSD, and Microsoft Windows, the processing procedure is basically the same.

When you check server CPU utilization, you can sometimes see that only one of the server's CPUs is working hard on softirq. This phenomenon is caused by the way received packets are processed, as explained so far. To solve the problem, multi-queue NICs, RSS, and RPS have been developed.

Data Structure

The following are some key data structures. Take a look at them before reviewing the code.

sk_buff structure

First, there is the sk_buff structure, or skb structure, which represents a packet. Figure 6 shows part of the sk_buff structure. As functions have been added, it has become more complicated. However, the basic functions are what anyone would expect.

Figure 6: Packet Structure sk_buff.

Including Packet Data and meta data
The structure either contains the packet data directly or refers to it through a pointer. In Figure 6, part of the packet (from Ethernet to buffer) is referenced through the data pointer, and the additional data (frags) refers to the actual pages.

Necessary information such as the header and payload length is saved in the metadata area. For example, in Figure 6, mac_header, network_header, and transport_header hold pointer data that points to the starting position of the Ethernet header, IP header, and TCP header, respectively. This makes TCP protocol processing easy.
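The role of the header positions kept in the metadata can be pictured with a toy buffer. The following is a user-space sketch, not the kernel's sk_buff: the names toy_skb, toy_skb_init, toy_skb_put, and toy_skb_add_headers are hypothetical, the header sizes assume no IP or TCP options, and offsets are used in place of pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BUF_SIZE 256
#define TOY_ETH_HLEN 14  /* Ethernet header */
#define TOY_IP_HLEN  20  /* IPv4 header without options */
#define TOY_TCP_HLEN 20  /* TCP header without options */

struct toy_skb {
    uint8_t buf[TOY_BUF_SIZE];
    size_t data;              /* offset of the first valid byte */
    size_t len;               /* number of valid bytes starting at data */
    size_t mac_header;        /* offsets of the header start positions */
    size_t network_header;
    size_t transport_header;
};

/* Start with an empty packet and headroom bytes reserved for headers. */
void toy_skb_init(struct toy_skb *skb, size_t headroom)
{
    memset(skb, 0, sizeof(*skb));
    skb->data = headroom;
}

/* Append payload bytes at the tail. Returns 0, or -1 if it does not fit. */
int toy_skb_put(struct toy_skb *skb, const void *src, size_t n)
{
    if (skb->data + skb->len + n > TOY_BUF_SIZE)
        return -1;
    memcpy(skb->buf + skb->data + skb->len, src, n);
    skb->len += n;
    return 0;
}

/* Claim n bytes of headroom in front of the current data. */
static int toy_skb_push(struct toy_skb *skb, size_t n)
{
    if (n > skb->data)
        return -1;
    skb->data -= n;
    skb->len += n;
    return 0;
}

/* Reserve TCP, IP, and Ethernet headers and record where each one starts. */
int toy_skb_add_headers(struct toy_skb *skb)
{
    if (toy_skb_push(skb, TOY_TCP_HLEN) != 0)
        return -1;
    skb->transport_header = skb->data;
    if (toy_skb_push(skb, TOY_IP_HLEN) != 0)
        return -1;
    skb->network_header = skb->data;
    if (toy_skb_push(skb, TOY_ETH_HLEN) != 0)
        return -1;
    skb->mac_header = skb->data;
    return 0;
}
```

Because the payload is never moved, adding a header costs only an offset update, and any layer can find the header it needs from the recorded positions without parsing the packet again.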
How to Add or Delete a Header
Headers are added or removed as the packet moves down or up through the layers of the network stack. Pointers are used to do this efficiently. For example, to remove the Ethernet header, it is enough to advance the data pointer.

How to Combine and Divide Packet
A linked list is used to perform tasks efficiently, such as adding packet payload data to or deleting it from the socket buffer or a packet chain. The next pointer and the prev pointer are used for this purpose.

Quick Allocation and Free
Because a structure is allocated whenever a packet is created, a fast allocator is used. For example, if data is transmitted at the speed of 10-Gigabit Ethernet, more than one million packets per second must be created and deleted.

TCP Control Block

Second, there is a structure that represents the TCP connection. Earlier, it was referred to abstractly as a TCP control block. Linux uses tcp_sock for this structure. In Figure 7, you can see the relationship among the file, the socket, and the tcp_sock.

Figure 7: TCP Connection Structure.

When a system call occurs, the kernel looks up the file that corresponds to the file descriptor used by the application that made the system call. In Unix-series operating systems, sockets, ordinary files, and devices of the general storage file system are all abstracted as files. Therefore, the file structure holds only the minimum information. For a socket, a separate socket structure saves the socket-related information, and the file refers to the socket through a pointer. The socket in turn refers to the tcp_sock. The tcp_sock is layered into sock, inet_sock, and so on, to support protocols other than TCP. It may be considered a kind of polymorphism.

All state information used by the TCP protocol is saved in the tcp_sock. For example, the sequence number, receive window, congestion control state, and retransmit timer are saved in the tcp_sock.

The send socket buffer and the receive socket buffer are sk_buff lists, and the tcp_sock contains them.
The dst_entry, which is the IP routing result, is referenced in order to avoid routing too frequently. The dst_entry allows easy lookup of the ARP result, that is, the destination MAC address. The dst_entry is part of the routing table. The structure of the routing table is so complex that it will not be discussed in this document. The NIC to be used for packet transmission is found by using the dst_entry. The NIC is represented by the net_device structure.

Therefore, by looking up just the file, it is easy to find, by following pointers, all the structures (from the file to the driver) required to process the TCP connection. The total size of these structures is the memory used by one TCP connection. This is a few KB (excluding the packet data). As more functions have been added, the memory usage has gradually increased.

Finally, let's look at the TCP connection lookup table. It is a hash table used to find the TCP connection to which a received packet belongs. The hash value is calculated from the 4-tuple (source IP, source port, destination IP, destination port) of the packet by using the Jenkins hash algorithm. Reportedly, the hash function was selected with defense against attacks on the hash table in mind.

Following Code: How to Transmit Data

We will check the key tasks performed by the stack by following the actual Linux kernel source code. Here, we will observe two frequently used paths.

First, this is the path used to transmit data when an application calls the write system call.

    SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf, ...)
    {
        struct file *file;
        [...]
        file = fget_light(fd, &fput_needed);
        [...]
    ===>
        ret = filp->f_op->aio_write(&kiocb, &iov, 1, kiocb.ki_pos);

    struct file_operations {
        [...]
        ssize_t (*aio_read) (struct kiocb *, const struct iovec *, ...)
        ssize_t (*aio_write) (struct kiocb *, const struct iovec *, ...)
        [...]
    };

    static const struct file_operations socket_file_ops = {
        [...]
        .aio_read = sock_aio_read,
        .aio_write = sock_aio_write,
        [...]
    };

When the application calls the write system call, the kernel executes the write() function of the file layer. First, the actual file structure for the file descriptor fd is fetched. Then aio_write is called. This is a function pointer. In the file structure, you will see a pointer to the file_operations structure. This structure is generally called a function table and contains function pointers such as aio_read and aio_write. The actual table for the socket is socket_file_ops. The aio_write function used by the socket is sock_aio_write. The function table serves a purpose similar to a Java interface. The kernel generally uses it for code abstraction or refactoring.

    static ssize_t sock_aio_write(struct kiocb *iocb, const struct iovec *iov, ...)
    {
        [...]
        struct socket *sock = file->private_data;
        [...]
    ===>
        return sock->ops->sendmsg(iocb, sock, msg, size);

    struct socket {
        [...]
        struct file *file;
        struct sock *sk;
        const struct proto_ops *ops;
    };

    const struct proto_ops inet_stream_ops = {
        .family = PF_INET,
        [...]
        .connect = inet_stream_connect,
        .accept = inet_accept,
        .listen = inet_listen,
        .sendmsg = tcp_sendmsg,
        .recvmsg = inet_recvmsg,
        [...]
    };

    struct proto_ops {
        [...]
        int (*connect) (struct socket *sock, ...)
        int (*accept) (struct socket *sock, ...)
        int (*listen) (struct socket *sock, int len);
        int (*sendmsg) (struct kiocb *iocb, struct socket *sock, ...)
        int (*recvmsg) (struct kiocb *iocb, struct socket *sock, ...)
        [...]
    };

The sock_aio_write() function gets the socket structure from the file and then calls sendmsg. This is also a function pointer. The socket structure includes the proto_ops function table. The proto_ops implemented by IPv4 TCP is inet_stream_ops, and its sendmsg is implemented by tcp_sendmsg.

    int tcp_sendmsg(struct kiocb *iocb, struct socket *sock,
                    struct msghdr *msg, size_t size)
    {
        struct sock *sk = sock->sk;
        struct iovec *iov;
        struct tcp_sock *tp = tcp_sk(sk);
        struct sk_buff *skb;
        [...]
        mss_now = tcp_send_mss(sk, &size_goal, flags);

        /* Ok commence sending. */
        iovlen = msg->msg_iovlen;
        iov = msg->msg_iov;
        copied = 0;
        [...]
        while (--iovlen >= 0) {
            int seglen = iov->iov_len;
            unsigned char __user *from = iov->iov_base;

            iov++;
            while (seglen > 0) {
                int copy = 0;
                int max = size_goal;
                [...]
                skb = sk_stream_alloc_skb(sk,
                                          select_size(sk, sg),
                                          sk->sk_allocation);
                if (!skb)
                    goto wait_for_memory;
                /*
                 * Check whether we can use HW checksum.
                 */
                if (sk->sk_route_caps & NETIF_F_ALL_CSUM)
                    skb->ip_summed = CHECKSUM_PARTIAL;
                [...]
                skb_entail(sk, skb);
                [...]
                /* Where to copy to? */
                if (skb_tailroom(skb) > 0) {
                    /* We have some space in skb head. Superb! */
                    if (copy > skb_tailroom(skb))
                        copy = skb_tailroom(skb);
                    if ((err = skb_add_data(skb, from, copy)) != 0)
                        goto do_fault;
                [...]
        if (copied)
            tcp_push(sk, flags, mss_now, tp->nonagle);
        [...]
    }

The tcp_sendmsg function gets the tcp_sock (that is, the TCP control block) from the socket and copies the data that the application has asked to transmit into the send socket buffer. When copying data to an sk_buff, how many bytes should one sk_buff hold? One sk_buff holds MSS (tcp_send_mss) bytes, which helps the code that actually creates packets. The Maximum Segment Size (MSS) is the maximum payload size that one TCP packet can carry. When TSO and GSO are used, one sk_buff can hold more data than the MSS. This is not discussed in this document.

The sk_stream_alloc_skb function creates a new sk_buff, and skb_entail adds the new sk_buff to the tail of the send socket buffer. The skb_add_data function copies the actual application data into the data buffer of the sk_buff. All the data is copied by repeating this procedure (creating an sk_buff and adding it to the send socket buffer) several times. As a result, sk_buffs of MSS size sit in the send socket buffer as a list. Finally, tcp_push is called to turn the data that can be transmitted now into packets, and the packets are sent.
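The effect of cutting application data into MSS-sized sk_buffs can be sketched with plain arithmetic. The following helpers are hypothetical illustrations, not kernel functions, and they ignore TSO/GSO and the case in which the kernel appends to a partly filled sk_buff that is already in the queue.

```c
#include <assert.h>
#include <stddef.h>

/* Number of MSS-sized buffers needed to hold len bytes of application data. */
size_t segments_needed(size_t len, size_t mss)
{
    if (mss == 0)
        return 0;
    return (len + mss - 1) / mss;  /* round up */
}

/*
 * Payload length of the buffer at position index (starting from 0).
 * Every buffer is full except possibly the last one. Returns 0 when the
 * index lies beyond the data.
 */
size_t segment_len(size_t len, size_t mss, size_t index)
{
    size_t offset = index * mss;

    if (mss == 0 || offset >= len)
        return 0;
    return (len - offset < mss) ? len - offset : mss;
}
```

For example, with an MSS of 1460 bytes, a 4000-byte write needs three buffers holding 1460, 1460, and 1080 bytes.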
    static inline void tcp_push(struct sock *sk, int flags, int mss_now, ...)
    [...]
    ===>
    static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, ...)
    {
        struct tcp_sock *tp = tcp_sk(sk);
        struct sk_buff *skb;
        [...]
        while ((skb = tcp_send_head(sk))) {
            [...]
            cwnd_quota = tcp_cwnd_test(tp, skb);
            if (!cwnd_quota)
                break;

            if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now)))
                break;
            [...]
            if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
                break;

            /* Advance the send_head. This one is sent out.
             * This call will increment packets_out.
             */
            tcp_event_new_data_sent(sk, skb);
            [...]

The tcp_push function transmits, in sequence, as many of the sk_buffs in the send socket buffer as TCP allows. First, tcp_send_head is called to get the first sk_buff in the socket buffer, and then tcp_cwnd_test and tcp_snd_wnd_test are performed to check whether the congestion window and the receive window of the receiving TCP allow new packets to be transmitted. Then the tcp_transmit_skb function is called to create a packet.

    static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
                                int clone_it, gfp_t gfp_mask)
    {
        const struct inet_connection_sock *icsk = inet_csk(sk);
        struct inet_sock *inet;
        struct tcp_sock *tp;
        [...]
        if (likely(clone_it)) {
            if (unlikely(skb_cloned(skb)))
                skb = pskb_copy(skb, gfp_mask);
            else
                skb = skb_clone(skb, gfp_mask);
            if (unlikely(!skb))
                return -ENOBUFS;
        }
        [...]
        skb_push(skb, tcp_header_size);
        skb_reset_transport_header(skb);
        skb_set_owner_w(skb, sk);

        /* Build TCP header and checksum it. */
        th = tcp_hdr(skb);
        th->source = inet->inet_sport;
        th->dest = inet->inet_dport;
        th->seq = htonl(tcb->seq);
        th->ack_seq = htonl(tp->rcv_nxt);
        [...]
        icsk->icsk_af_ops->send_check(sk, skb);
        [...]
        err = icsk->icsk_af_ops->queue_xmit(skb);
        if (likely(err <= 0))
            return err;

        tcp_enter_cwr(sk, 1);

        return net_xmit_eval(err);
    }

The tcp_transmit_skb function creates a copy of the given sk_buff (pskb_copy).
This copy does not duplicate the application data, only the metadata. The function then calls skb_push to reserve the header area and records the header field values. The send_check function computes the TCP checksum. With checksum offload, the checksum over the payload data is not computed here. Finally, queue_xmit is called to send the packet to the IP layer. For IPv4, queue_xmit is implemented by the ip_queue_xmit function.

    int ip_queue_xmit(struct sk_buff *skb)
    [...]
        rt = (struct rtable *)__sk_dst_check(sk, 0);
    [...]
        /* OK, we know where to send it, allocate and build IP header. */
        skb_push(skb, sizeof(struct iphdr) + (opt ? opt->optlen : 0));
        skb_reset_network_header(skb);
        iph = ip_hdr(skb);
        *((__be16 *)iph) = htons((4 << 12) | (5 << 8) | (inet->tos & 0xff));
        if (ip_dont_fragment(sk, &rt->dst) && !skb->local_df)
            iph->frag_off = htons(IP_DF);
        else
            iph->frag_off = 0;
        iph->ttl = ip_select_ttl(inet, &rt->dst);
        iph->protocol = sk->sk_protocol;
        iph->saddr = rt->rt_src;
        iph->daddr = rt->rt_dst;
    [...]
        res = ip_local_out(skb);
    [...]
    ===>
    int __ip_local_out(struct sk_buff *skb)
    [...]
        ip_send_check(iph);
        return nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT, skb, NULL,
                       skb_dst(skb)->dev, dst_output);
    [...]
    ===>
    int ip_output(struct sk_buff *skb)
    {
        struct net_device *dev = skb_dst(skb)->dev;
        [...]
        skb->dev = dev;
        skb->protocol = htons(ETH_P_IP);

        return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, skb, NULL, dev,
                            ip_finish_output,
    [...]
    ===>
    static int ip_finish_output(struct sk_buff *skb)
    [...]
        if (skb->len > ip_skb_dst_mtu(skb) && !skb_is_gso(skb))
            return ip_fragment(skb, ip_finish_output2);
        else
            return ip_finish_output2(skb);

The ip_queue_xmit function performs the tasks required by the IP layer. The __sk_dst_check function checks whether the cached route is valid. If there is no cached route or the cached route is invalid, IP routing is performed. Then skb_push is called to reserve the IP header area, and the IP header field values are recorded.
After that, following the chain of function calls, ip_send_check computes the IP header checksum and the netfilter function is called. The ip_finish_output function creates IP fragments when IP fragmentation is needed. Normally, no fragmentation occurs when TCP is used. Therefore, ip_finish_output2 is called, and it adds the Ethernet header. At this point, the packet is complete.

    int dev_queue_xmit(struct sk_buff *skb)
    [...]
    ===>
    static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, ...)
    [...]
        if (...) {
            ....
        } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
                   qdisc_run_begin(q)) {
            [...]
            if (sch_direct_xmit(skb, q, dev, txq, root_lock)) {
    [...]
    ===>
    int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q, ...)
    [...]
        HARD_TX_LOCK(dev, txq, smp_processor_id());
        if (!netif_tx_queue_frozen_or_stopped(txq))
            ret = dev_hard_start_xmit(skb, dev, txq);

        HARD_TX_UNLOCK(dev, txq);
    [...]
    }

    int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, ...)
    [...]
        if (!list_empty(&ptype_all))
            dev_queue_xmit_nit(skb, dev);
    [...]
        rc = ops->ndo_start_xmit(skb, dev);
    [...]
    }

The completed packet is transmitted through the dev_queue_xmit function. First, the packet passes through the qdisc. If the default qdisc is used and the queue is empty, the sch_direct_xmit function is called to send the packet directly down to the driver, skipping the queue. The dev_hard_start_xmit function calls the actual driver. Before the driver is called, the device TX is locked. This prevents several threads from accessing the device simultaneously. Because the kernel locks the device TX, the driver transmission code does not need an additional lock. This is closely related to parallel processing, which is a topic for another time.

The ndo_start_xmit function calls the driver code. Just before it, you will see ptype_all and dev_queue_xmit_nit. The ptype_all list includes modules such as packet capture. If a capture program is running, the packet is copied through ptype_all to that program.
Therefore, the packet that tcpdump shows is the packet handed to the driver. When checksum offload or TSO is used, the NIC manipulates the packet, so the packet tcpdump shows differs from the packet actually transmitted on the wire. After completing packet transmission, the driver interrupt handler returns the sk_buff.

Following Code: How to Receive Data

The general execution path is to receive a packet and then add the data to the receive socket buffer. After the driver interrupt handler has run, start by following the napi poll handler.

    static void net_rx_action(struct softirq_action *h)
    {
        struct softnet_data *sd = &__get_cpu_var(softnet_data);
        unsigned long time_limit = jiffies + 2;
        int budget = netdev_budget;
        void *have;
        local_irq_disable();
        while (!list_empty(&sd->poll_list)) {
            struct napi_struct *n;
    [...]
            n = list_first_entry(&sd->poll_list, struct napi_struct, poll_list);
            if (test_bit(NAPI_STATE_SCHED, &n->state)) {
                work = n->poll(n, weight);
                trace_napi_poll(n);
            }
    [...]
    }
    int netif_receive_skb(struct sk_buff *skb)
    [...] ===>
    static int __netif_receive_skb(struct sk_buff *skb)
    {
        struct packet_type *ptype, *pt_prev;
    [...]
        __be16 type;
    [...]
        list_for_each_entry_rcu(ptype, &ptype_all, list) {
            if (!ptype->dev || ptype->dev == skb->dev) {
                if (pt_prev)
                    ret = deliver_skb(skb, pt_prev, orig_dev);
                pt_prev = ptype;
            }
        }
    [...]
        type = skb->protocol;
        list_for_each_entry_rcu(ptype,
                &ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
            if (ptype->type == type &&
                (ptype->dev == null_or_dev || ptype->dev == skb->dev ||
                 ptype->dev == orig_dev)) {
                if (pt_prev)
                    ret = deliver_skb(skb, pt_prev, orig_dev);
                pt_prev = ptype;
            }
        }
        if (pt_prev) {
            ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
    static struct packet_type ip_packet_type __read_mostly = {
        .type = cpu_to_be16(ETH_P_IP),
        .func = ip_rcv,
    [...]
    };

As mentioned before, the net_rx_action function is the softirq handler that receives a packet.
First, the driver that has requested the napi poll is retrieved from the poll_list and the poll handler of the driver is called. The driver wraps the received packet in an sk_buff and then calls netif_receive_skb.

When there is a module that requests all packets, netif_receive_skb sends the packets to that module. As with packet transmission, the packets are passed to the modules registered in the ptype_all list. The packets are captured here.

Then, the packets are passed to the upper layer based on the packet type. The Ethernet header includes a 2-byte ethertype field whose value indicates the packet type. The driver records the value in the sk_buff (skb->protocol). Each protocol has its own packet_type structure and registers a pointer to the structure in the ptype_base hash table. IPv4 uses ip_packet_type. Its type field value is the IPv4 ethertype (ETH_P_IP). Therefore, an IPv4 packet calls the ip_rcv function.

    int ip_rcv(struct sk_buff *skb, struct net_device *dev, ...)
    {
        struct iphdr *iph;
        u32 len;
    [...]
        iph = ip_hdr(skb);
    [...]
        if (iph->ihl < 5 || iph->version != 4)
            goto inhdr_error;
        if (!pskb_may_pull(skb, iph->ihl*4))
            goto inhdr_error;
        iph = ip_hdr(skb);
        if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
            goto inhdr_error;
        len = ntohs(iph->tot_len);
        if (skb->len < len) {
            IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INTRUNCATEDPKTS);
            goto drop;
        } else if (len < (iph->ihl*4))
            goto inhdr_error;
    [...]
        return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, skb, dev, NULL,
                       ip_rcv_finish);
    [...] ===>
    int ip_local_deliver(struct sk_buff *skb)
    [...]
        if (ip_hdr(skb)->frag_off & htons(IP_MF | IP_OFFSET)) {
            if (ip_defrag(skb, IP_DEFRAG_LOCAL_DELIVER))
                return 0;
        }
        return NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_IN, skb, skb->dev, NULL,
                       ip_local_deliver_finish);
    [...] ===>
    static int ip_local_deliver_finish(struct sk_buff *skb)
    [...]
        __skb_pull(skb, ip_hdrlen(skb));
    [...]
        int protocol = ip_hdr(skb)->protocol;
        int hash, raw;
        const struct net_protocol *ipprot;
    [...]
        hash = protocol & (MAX_INET_PROTOS - 1);
        ipprot = rcu_dereference(inet_protos[hash]);
        if (ipprot != NULL) {
    [...]
            ret = ipprot->handler(skb);
    [...] ===>
    static const struct net_protocol tcp_protocol = {
        .handler = tcp_v4_rcv,
    [...]
    };

The ip_rcv function executes the tasks required by the IP layer. It validates the packet, checking items such as the length and the header checksum. After passing through the netfilter code, it runs the ip_local_deliver function, which reassembles IP fragments if required. Then, it calls ip_local_deliver_finish through the netfilter code. The ip_local_deliver_finish function removes the IP header by using __skb_pull and then searches for the upper protocol whose value is identical to the protocol value in the IP header. Similar to ptype_base, each transport protocol registers its own net_protocol structure in inet_protos. IPv4 TCP uses tcp_protocol and calls tcp_v4_rcv, which has been registered as its handler.

When packets come into the TCP layer, the packet processing flow varies depending on the TCP state and the packet type. Here, we will look at the processing procedure when the expected next data packet has been received in the ESTABLISHED state of the TCP connection. This path is frequently executed by a server receiving data when there is no packet loss or out-of-order delivery.

    int tcp_v4_rcv(struct sk_buff *skb)
    {
        const struct iphdr *iph;
        struct tcphdr *th;
        struct sock *sk;
    [...]
        th = tcp_hdr(skb);
        if (th->doff < sizeof(struct tcphdr) / 4)
            goto bad_packet;
        if (!pskb_may_pull(skb, th->doff * 4))
            goto discard_it;
    [...]
        th = tcp_hdr(skb);
        iph = ip_hdr(skb);
        TCP_SKB_CB(skb)->seq = ntohl(th->seq);
        TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin +
                                    skb->len - th->doff * 4);
        TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq);
        TCP_SKB_CB(skb)->when = 0;
        TCP_SKB_CB(skb)->flags = iph->tos;
        TCP_SKB_CB(skb)->sacked = 0;
        sk = __inet_lookup_skb(&tcp_hashinfo, skb, th->source, th->dest);
    [...]
        ret = tcp_v4_do_rcv(sk, skb);

First, the tcp_v4_rcv function validates the received packet. When the data offset is smaller than the minimum TCP header size (th->doff < sizeof(struct tcphdr) / 4), it is a header error. Then __inet_lookup_skb is called to look up the connection to which the packet belongs in the TCP connection hash table. From the sock structure that is found, all required structures such as tcp_sock and socket can be obtained.

    int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
    [...]
        if (sk->sk_state == TCP_ESTABLISHED) { /* Fast path */
            sock_rps_save_rxhash(sk, skb->rxhash);
            if (tcp_rcv_established(sk, skb, tcp_hdr(skb), skb->len)) {
    [...] ===>
    int tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
    [...]
        /*
         * Header prediction.
         */
        if ((tcp_flag_word(th) & TCP_HP_BITS) == tp->pred_flags &&
            TCP_SKB_CB(skb)->seq == tp->rcv_nxt &&
            !after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) {
    [...]
            if ((int)skb->truesize > sk->sk_forward_alloc)
                goto step5;
            NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPHITS);
            /* Bulk data transfer: receiver */
            __skb_pull(skb, tcp_header_len);
            __skb_queue_tail(&sk->sk_receive_queue, skb);
            skb_set_owner_r(skb, sk);
            tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
    [...]
            if (!copied_early || tp->rcv_nxt != tp->rcv_wup)
                __tcp_ack_snd_check(sk, 0);
    [...]
    step5:
        if (th->ack && tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
            goto discard;
        tcp_rcv_rtt_measure_ts(sk, skb);
        /* Process urgent data. */
        tcp_urg(sk, skb, th);
        /* step 7: process the segment text */
        tcp_data_queue(sk, skb);
        tcp_data_snd_check(sk);
        tcp_ack_snd_check(sk);
        return 0;
    [...]
    }

The actual protocol is executed from the tcp_v4_do_rcv function. If the TCP connection is in the ESTABLISHED state, tcp_rcv_established is called. Processing of the ESTABLISHED state is handled separately and optimized since it is the most common state. tcp_rcv_established first executes the header prediction code. Header prediction is itself a quick check that detects the common case.
The common case here is that there is no data to transmit and the received data packet is the one expected next, i.e., its sequence number is the sequence number that the receiving TCP expects. In this case, the procedure is completed by adding the data to the socket buffer and then transmitting an ACK.

Further on, you will see the statement comparing truesize with sk_forward_alloc. It checks whether there is free space in the receive socket buffer to add the new packet data. If there is, header prediction is a "hit" (prediction succeeded). Then __skb_pull is called to remove the TCP header. After that, __skb_queue_tail is called to add the packet to the receive socket buffer. Finally, __tcp_ack_snd_check is called to transmit an ACK if necessary. In this way, packet processing is completed.

If there is not enough free space, the slow path is executed. The tcp_data_queue function newly allocates buffer space and adds the data packet to the socket buffer. At this time, the receive socket buffer size is automatically increased if possible. Unlike the fast path, tcp_data_snd_check is called to transmit a new data packet if possible. Finally, tcp_ack_snd_check is called to create and transmit the ACK packet if necessary.

The amount of code executed by the two paths is not much. This is accomplished by optimizing the common case. In other words, the uncommon case will be processed significantly more slowly. Out-of-order delivery is one of the uncommon cases.

How to Communicate between Driver and NIC

Communication between a driver and the NIC is the bottom of the stack and most people do not care about it. However, the NIC is executing more and more tasks to solve performance issues. Understanding the basic operation scheme will help you understand these additional technologies.

A driver and the NIC communicate asynchronously.
First, a driver requests packet transmission (call) and the CPU performs another task without waiting for the response. Then the NIC sends the packets and notifies the CPU, and the driver returns the packets that have been sent (returns the result). Like packet transmission, packet receiving is asynchronous. First, a driver requests packet receiving (call) and the CPU performs another task. Then, the NIC receives packets and notifies the CPU, and the driver processes the received packets (returns the result).

Therefore, a space to save the request and the response is necessary. In most cases, the NIC uses a ring structure. The ring is similar to a common queue structure with a fixed number of entries, where one entry holds one request or one response. The entries are used sequentially, in turn. The name "ring" is used because the fixed entries are reused in turn.

Following the packet transmission procedure shown in Figure 8, you will see how the ring is used.

Figure 8: Driver-NIC Communication: How to Transmit Packet.

The driver receives packets from the upper layer and creates the send descriptor that the NIC can understand. By default, the send descriptor includes the packet size and the memory address. As the NIC needs the physical address to access the memory, the driver must translate the virtual address of the packets to the physical address. Then, it adds the send descriptor to the TX ring (1). The TX ring is the send descriptor ring.

Next, it notifies the NIC of the new request (2). The driver directly writes the data to a specific NIC memory address. This data transmission method, in which the CPU directly sends data to the device, is called Programmed I/O (PIO).

The notified NIC gets the send descriptor of the TX ring from the host memory (3). Since the device directly accesses the memory without intervention of the CPU, the access is called Direct Memory Access (DMA).
After getting the send descriptor, the NIC determines the packet address and size and then gets the actual packets from the host memory (4). With checksum offload, the NIC computes the checksum as it reads the packet data from the memory, so little overhead is incurred.

The NIC sends the packets (5) and then writes the number of packets sent to the host memory (6). Then, it sends an interrupt (7). The driver reads the number of packets sent and then returns the packets that have been sent so far.

In Figure 9, you will see the procedure of receiving packets.

Figure 9: Driver-NIC Communication: How to Receive Packets.

First, the driver allocates the host memory buffer for receiving packets and then creates the receive descriptor. By default, the receive descriptor includes the buffer size and the memory address. Like the send descriptor, the receive descriptor stores the physical address that DMA uses. Then, it adds the receive descriptor to the RX ring (1). The descriptor is the receive request, and the RX ring is the receive request ring.

Through PIO, the driver notifies the NIC that there is a new descriptor (2). The NIC gets the new descriptor from the RX ring and saves the size and location of the buffer included in the descriptor to the NIC memory (3).

After the packets have been received (4), the NIC sends the packets to the host memory buffer (5). If checksum offload is available, the NIC computes the checksum at this time. The actual size of the received packets, the checksum result, and any other information are saved in a separate ring (the receive return ring) (6). The receive return ring holds the result of processing the receive request, i.e., the response. Then the NIC sends an interrupt (7). The driver gets packet information from the receive return ring and processes the received packets. If necessary, it allocates new memory buffers and repeats Step (1) and Step (2).
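The ring described above can be sketched as a fixed-size array with one index for the producer (the driver posting descriptors) and one for the consumer (the NIC). This is only an illustration in Python under simplifying assumptions: the class and field names are hypothetical, a counter is used to tell a full ring from an empty one, and PIO, DMA and interrupts are not modeled.

```python
class DescriptorRing:
    """Fixed number of slots that are reused in turn."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0    # next slot the driver fills
        self.tail = 0    # next slot the NIC consumes
        self.count = 0   # descriptors currently outstanding

    def post(self, descriptor):
        """Driver side: add a descriptor. Returns False when the ring is full."""
        if self.count == len(self.slots):
            return False
        self.slots[self.head] = descriptor
        self.head = (self.head + 1) % len(self.slots)   # wrap around
        self.count += 1
        return True

    def consume(self):
        """NIC side: take the oldest descriptor, or None if the ring is empty."""
        if self.count == 0:
            return None
        descriptor = self.slots[self.tail]
        self.slots[self.tail] = None
        self.tail = (self.tail + 1) % len(self.slots)   # wrap around
        self.count -= 1
        return descriptor


tx_ring = DescriptorRing(size=2)
tx_ring.post({"addr": 0x1000, "len": 1500})   # step (1): driver adds a descriptor
tx_ring.post({"addr": 0x2000, "len": 600})
print(tx_ring.post({"addr": 0x3000, "len": 64}))   # ring is full
print(tx_ring.consume())                           # step (3): NIC fetches the oldest
```

When post returns False the driver has to hold the packet back, which is the point where a full TX ring starts to push packets back into the transmit queue.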
To tune the stack, most people say that the ring and interrupt settings should be adjusted. When the TX ring is large, many send requests can be made at once. When the RX ring is large, many packets can be received at once. A large ring is useful for workloads with large bursts of packet transmission or receiving. In most cases, the NIC uses a timer to reduce the number of interrupts, since processing interrupts can impose a large overhead on the CPU. To avoid flooding the host system with too many interrupts, interrupts are collected and sent at regular intervals (interrupt coalescing) while sending and receiving packets.

Stack Buffer and Flow Control

Flow control is executed in several stages in the stack. Figure 10 shows the buffers used to transmit data. First, an application creates data and adds it to the send socket buffer. If there is no free space in the buffer, the system call fails or the application thread blocks. Therefore, the application data rate flowing into the kernel must be controlled by using the socket buffer size limit.

Figure 10: Buffers Related to Packet Transmission.

TCP creates packets and sends them to the driver through the transmit queue (qdisc). It is a typical FIFO queue, and the maximum length of the queue is the value of txqueuelen, which can be checked by executing the ifconfig command. Generally, it is thousands of packets.

The TX ring is between the driver and the NIC. As mentioned before, it can be regarded as a transmission request queue. If there is no free space in the queue, no transmission request is made and the packets accumulate in the transmit queue. If too many packets accumulate, packets are dropped.

The NIC saves the packets to transmit in its internal buffer. The packet rate out of this buffer is limited by the physical rate (e.g., a 1 Gb/s NIC cannot offer 10 Gb/s performance).
With Ethernet flow control, packet transmission is also stopped if there is no free space in the buffer of the receiving NIC.

When the packet rate from the kernel is faster than the packet rate out of the NIC, packets accumulate in the buffer of the NIC. If there is no free space in that buffer, processing of transmission requests from the TX ring stops. More and more requests accumulate in the TX ring until there is no free space in the queue. The driver can no longer make transmission requests, and the packets accumulate in the transmit queue. In this way, backpressure propagates from the bottom to the top through many buffers.

Figure 11 shows the buffers that received packets pass through. The packets are saved in the receive buffer of the NIC. From the viewpoint of flow control, the RX ring between the driver and the NIC can be regarded as a packet buffer. The driver gets packets coming into the RX ring and then sends them to the upper layer. There is no buffer between the driver and the upper layer, since the NIC drivers used by server systems use NAPI by default. Therefore, the upper layer can be considered to get packets directly from the RX ring. The payload data of the packets is saved in the receive socket buffer. The application gets the data from the socket buffer later.

Figure 11: Buffers Related to Packet Receiving.

A driver that does not support NAPI saves packets in the backlog queue. Later, the NAPI handler gets the packets. Therefore, the backlog queue can be considered a buffer between the upper layer and the driver.

If the packet processing rate of the kernel is slower than the packet flow rate into the NIC, the RX ring fills up, and the buffer in the NIC fills up, too. When Ethernet flow control is used, the NIC sends a request to stop transmission to the transmitting NIC or drops the packets.
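The transmit-side backpressure described above (NIC buffer, then TX ring, then transmit queue) can be illustrated with a toy simulation of three bounded queues. This is a sketch under stated assumptions, not a model of the kernel: the stage capacities, the tick-based loop and all names are invented for the example, and a packet is dropped only when the topmost queue is full.

```python
from collections import deque


class BoundedQueue:
    def __init__(self, name, capacity):
        self.name, self.capacity, self.items = name, capacity, deque()

    def full(self):
        return len(self.items) >= self.capacity


def drain(stages, nic_rate):
    """NIC emits up to nic_rate packets, then each stage refills from above."""
    last = stages[-1]
    sent = [last.items.popleft() for _ in range(min(nic_rate, len(last.items)))]
    # Walk from the bottom pair upward so freed space propagates to the top.
    for up, down in reversed(list(zip(stages, stages[1:]))):
        while up.items and not down.full():
            down.items.append(up.items.popleft())
    return sent


def simulate(ticks, offered_per_tick, nic_rate):
    stages = [BoundedQueue("transmit queue", 2),
              BoundedQueue("TX ring", 2),
              BoundedQueue("NIC buffer", 2)]
    sent, dropped, next_id = [], 0, 0
    for _ in range(ticks):
        for _ in range(offered_per_tick):
            if stages[0].full():
                dropped += 1                 # no room at the top: drop
            else:
                stages[0].items.append(next_id)
            next_id += 1
        sent.extend(drain(stages, nic_rate))
    return sent, dropped, stages


sent, dropped, stages = simulate(ticks=10, offered_per_tick=2, nic_rate=1)
print("sent:", sent, "dropped:", dropped)
print({s.name: len(s.items) for s in stages})
```

Because the kernel side offers two packets per tick while the NIC emits one, the lower queues fill first and drops only start once the fullness has propagated up to the transmit queue.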
No packets are dropped due to lack of space in the receive socket buffer, because TCP supports end-to-end flow control. However, since UDP does not support flow control, packets are dropped due to lack of space in the socket buffer when the application is slow.

The sizes of the TX ring and the RX ring used by the driver in Figure 10 and Figure 11 are the ring sizes shown by ethtool. For most workloads in which throughput is important, it helps to increase the ring size and the socket buffer size. Increasing these sizes reduces the possibility of failures caused by lack of buffer space while receiving and transmitting many packets at a fast rate.

Conclusion

Initially, I planned to explain only the things that would help you develop network programs, execute performance tests, and perform troubleshooting. In spite of my initial plan, the amount of description in this document is not small. I hope this document will help you develop network applications and monitor their performance. The TCP/IP protocol itself is very complicated and has many exceptions. However, you don't need to understand every line of the TCP/IP-related code of the OS to understand performance and analyze the phenomena. Just understanding its context will be very helpful.

With the continuous advancement of system performance and of the OS network stack implementation, the latest servers can offer 10-20 Gb/s TCP throughput without any problem. These days, there are so many performance-related technologies, such as TSO, LRO, RSS, GSO, GRO, UFO, XPS, IOAT, DDIO, and TOE, that they read like alphabet soup and can be confusing.

In the next article, I will explain the network stack from the performance perspective and discuss the problems and effects of these technologies.

By Hyeongyeop Kim, Senior Engineer at Performance Engineering Lab, NHN Corporation.
February 27, 2013
by Esen Sagynov
· 13,720 Views · 1 Like
Solving RPM installation conflicts
This post comes from Ignacio Nin at the MySQL Performance Blog.

Lately we’ve had many reports of the RPM packages for CentOS 5 (mostly) and CentOS 6 having issues when installing different combinations of our products, particularly with Percona Toolkit. Examples of bugs related to these issues are lp:1031427 and lp:1051874.

These problems arise when trying to install a package from the distribution that is linked against the version of libmysqlclient.so shipped by the distribution (libmysqlclient.so.15 for CentOS 5, libmysqlclient.so.16 for CentOS 6) alongside a version of Percona Server that depends on another, usually more recent, version of libmysqlclient.so.

Bug lp:1031427 is an example of this, and shows how the packages would conflict when trying to install libmysqlclient.so. For example, when installing php-mysql alongside PS 5.5 in CentOS 6:

    # yum -q install Percona-Server-server-55 php-mysql
    Installing:
     Percona-Server-server-55   x86_64   5.5.29-rel29.4.401.rhel6   percona   15 M
     php-mysql                  x86_64   5.3.3-14.el6_3             updates   79 k
    Installing for dependencies:
     Percona-Server-client-55   x86_64   5.5.29-rel29.4.401.rhel6   percona   7.0 M
     Percona-Server-shared-51   x86_64   5.1.67-rel14.3.506.rhel6   percona   2.8 M
     Percona-Server-shared-55   x86_64   5.5.29-rel29.4.401.rhel6   percona   787 k
    Transaction Summary
    ================================================================================
    Install       5 Package(s)
    Is this ok [y/N]: y
    Transaction Check Error:
      file /usr/lib64/libmysqlclient.so conflicts between attempted installs of Percona-Server-shared-51-5.1.67-rel14.3.506.rhel6.x86_64 and Percona-Server-shared-55-5.5.29-rel29.4.401.rhel6.x86_64
      file /usr/lib64/libmysqlclient_r.so conflicts between attempted installs of Percona-Server-shared-51-5.1.67-rel14.3.506.rhel6.x86_64 and Percona-Server-shared-55-5.5.29-rel29.4.401.rhel6.x86_64

The traditional solution for this situation was to provide a special package, Percona-Server-shared-compat (modeled after upstream’s MySQL-shared-compat), which would contain ALL versions of libmysqlclient.so.* together and wouldn’t conflict. Some of you are probably familiar with this approach.

    # yum -q install Percona-Server-server-55 Percona-Server-shared-compat php-mysql
    Installing:
     Percona-Server-server-55       x86_64   5.5.29-rel29.4.401.rhel6   percona   15 M
     Percona-Server-shared-compat   x86_64   5.5.29-rel29.4.401.rhel6   percona   3.4 M
     php-mysql                      x86_64   5.3.3-14.el6_3             updates   79 k
    Installing for dependencies:
     Percona-Server-client-55       x86_64   5.5.29-rel29.4.401.rhel6   percona   7.0 M
     Percona-Server-shared-55       x86_64   5.5.29-rel29.4.401.rhel6   percona   787 k
    Transaction Summary
    ================================================================================
    Install       5 Package(s)

Notice how PS-shared-compat installs alongside the -shared package, providing the older libmysqlclient.so.16 required by php-mysql. However, this has proved non-intuitive and problematic, since the shared-compat package wouldn’t get selected unless explicitly installed, and many of our users would rather have it “just work” without requiring additional knowledge of what the particular workaround was.

We’re now trying a solution in which our -shared packages no longer conflict over libmysqlclient.so, so they can be installed side by side, modelled after the mysql-libs packages provided by CentOS/Red Hat. So even if the user wants to install PS 5.5 alongside packages that depend on 5.1/5.0, the -shared packages will work together.
For example, installing 5.5 and postfix in CentOS:

    # yum -q install Percona-Server-server-55 postfix
    Installing:
     Percona-Server-server-55   x86_64   5.5.29-rel29.4.402.rhel5   percona-testing   19 M
     postfix                    x86_64   2:2.3.3-6.el5              base              3.8 M
    Installing for dependencies:
     Percona-SQL-shared-50      x86_64   5.0.92-b23.89.rhel5        percona-testing   1.8 M
     Percona-Server-client-55   x86_64   5.5.29-rel29.4.402.rhel5   percona-testing   9.1 M
     Percona-Server-shared-55   x86_64   5.5.29-rel29.4.402.rhel5   percona-testing   993 k

… and this will install without problems.

Additionally, this has the advantage of allowing an upgrade from 5.1 to 5.5 without uninstalling any software that depended on the old version.

    # rpm -qa | grep ^Percona
    Percona-Server-client-51-5.1.67-rel14.3.507.rhel6.x86_64
    Percona-Server-shared-51-5.1.67-rel14.3.507.rhel6.x86_64
    Percona-Server-server-51-5.1.67-rel14.3.507.rhel6.x86_64

In this case only Percona-Server-client-51 and Percona-Server-server-51 need be removed, allowing any package that depends on Percona-Server-shared-51 (providing libmysqlclient.so.16) to remain installed. After the server and client packages are uninstalled, you can install PS 5.5 without conflict.

The current package candidates for versions 5.0.92 (which required an update), 5.1.67-14.3 and 5.5.29-29.4 can be tested from the percona-testing repository. We encourage you to try these out and send us your feedback and/or file any bugs you find. Installation instructions for Percona Testing repositories.

We’re aiming to include these fixes in our next releases of 5.1 and 5.5. Percona Toolkit users in particular will enjoy this update since it’ll mean no more trouble when installing it from repository!
February 25, 2013
by Peter Zaitsev
· 7,802 Views
Spring-Test-MVC Junit Testing Spring Security Layer with Method Level Security
For people in a hurry, get the code from GitHub.

In continuation of my earlier blog on spring-test-mvc JUnit testing of the Spring Security layer with InMemoryDaoImpl, in this blog I will discuss how to achieve method-level access control. Please follow the steps in that blog to set up spring-test-mvc and run the test case below.

    mvn test -Dtest=com.example.springsecurity.web.controllers.SecurityControllerTest

The JUnit test case looks as below:

    @RunWith(SpringJUnit4ClassRunner.class)
    @ContextConfiguration(loader = WebContextLoader.class, value = {
            "classpath:/META-INF/spring/services.xml",
            "classpath:/META-INF/spring/security.xml",
            "classpath:/META-INF/spring/mvc-config.xml" })
    public class SecurityControllerTest {

        @Autowired
        CalendarService calendarService;

        // Logged-in user reading their own events: allowed.
        @Test
        public void testMyEvents() throws Exception {
            Authentication auth = new UsernamePasswordAuthenticationToken("[email protected]", "user1");
            SecurityContext securityContext = SecurityContextHolder.getContext();
            securityContext.setAuthentication(auth);
            calendarService.findForUser(0);
            SecurityContextHolder.clearContext();
        }

        // No authentication in the security context: rejected.
        @Test(expected = AuthenticationCredentialsNotFoundException.class)
        public void testForbiddenEvents() throws Exception {
            calendarService.findForUser(0);
        }

        // Logged-in user reading another user's events: rejected.
        @Test(expected = AccessDeniedException.class)
        public void testWrongUserEvents() throws Exception {
            Authentication auth = new UsernamePasswordAuthenticationToken("[email protected]", "user2");
            SecurityContext securityContext = SecurityContextHolder.getContext();
            securityContext.setAuthentication(auth);
            calendarService.findForUser(0);
            SecurityContextHolder.clearContext();
        }
    }

As you can see, if the user has not logged in, or if the user tries to access another user's information, an exception is thrown.
The interface access control is as below:

    public interface CalendarService {
        @PreAuthorize("hasRole('ROLE_ADMIN') or principal.id == #userId")
        List findForUser(int userId);
    }

Because @PreAuthorize is placed on the interface, any class that implements the interface gets this access control. I hope this blog helps you.
February 21, 2013
by Krishna Prasad
· 23,486 Views
Building SOLID Databases: Dependency Inversion and Robust DB Interfaces
Dependency inversion is the idea that interfaces should depend on abstractions, not on specifics. According to Wikipedia, the principle states:

A. High-level modules should not depend on low-level modules. Both should depend on abstractions.
B. Abstractions should not depend upon details. Details should depend upon abstractions.

Of course the second part of this principle is impossible if read literally. You can't have an abstraction until you know what details are to be covered, and so the abstraction and details are co-dependent. If the covered details change sufficiently, the abstraction will become either leaky or inadequate, and so it is worth seeing these as intertwined to some extent.

The focus on abstraction is helpful because it suggests that the interface contract should be designed in such a way that neither side really has to understand any internal details of the other in order to make things work. Both sides depend on well-encapsulated APIs and neither side has to worry about what the other side is really doing. This is what is meant by details depending on abstractions rather than the other way around.

This concept is quite applicable beyond object oriented programming because it covers a very basic aspect of API contract design, namely how well an API should encapsulate behavior. The principle was first formulated in its current form in the object oriented programming paradigm but is generally applicable elsewhere.

SQL as an Abstraction Layer, or Why RDBMS are Still King

There are plenty of reasons to dislike SQL, such as the fact that nulls are semantically ambiguous. As a basic disclaimer, I am not holding SQL up to be a paragon of programming languages or even db interfaces, but I think it is important to discuss what SQL does right in this regard.

SQL is generally understood to be a declarative language which approximates relational mathematics for database access purposes.
With SQL, you specify what you want returned, not how to get it, and the planner determines the best way to get it. SQL is thus an interface language rather than a programming language per se. With SQL, you can worry about the logical structure, leaving the implementation details to the db engine.

SQL queries are basically very high level specifications of operations, not detailed descriptions of how to do something efficiently. Even update and insert statements (which are by nature more imperative than select statements) leave the underlying implementation entirely to the database management system.

I think that this, along with many concessions the language has made to real-world requirements (such as bags instead of sets and the addition of ordering to bags), largely accounts for the success of this language. SQL, in essence, encapsulates a database behind a mature mathematical, declarative model in the same way that JSON and REST do (in a much less comprehensive way) in many NoSQL db's. In essence SQL provides encapsulation, interface, and abstraction in a very full-featured way and this is why it has been so successful.

SQL Abstraction as Imperfect

One obvious problem with treating SQL as an abstraction layer in its own right is that one is frequently unable to write details in a way that is clearly separate from the interface. Often storage tables are hit directly, and therefore there is little separation between logical detail and logical interface, and so this can break down when database complexity reaches a certain size. Approaches to managing this problem include using stored procedures or user defined functions, and using views to encapsulate storage tables.

Stored Procedures and User Defined Functions Done Wrong

Of the above methods, stored procedures and functional interfaces frequently have bad reputations because of bad experiences that many people have had with them.
These include developers pushing too much logic into stored procedures, and the fact that defining functional interfaces in this way usually produces a very tight binding between database code and application code, often leading to maintainability problems.

The first case is quite obvious, and includes the all-too-frequent case of trying to send emails directly from stored procedures (always a bad idea). This mistake leads to certain types of problems, including the fact that ACID-compliant operations may be mixed with non-ACID-compliant ones, leading to cases where a transaction can only be partially rolled back. Oops, we didn't actually record the order as shipped, but we told the customer it was... MySQL users will also note this is an argument against mixing transactional and nontransactional backend table types in the same db. However, that problem is outside the scope of this post. Additionally, MySQL is not well suited for many applications against a single set of db relations.

The second problem, though, is more insidious. In the way stored procedures and user defined functions are traditionally used, the application has to be deeply aware of the interface to the database, but the rollout for these aspects is different, leading to the possibility of service interruptions and a need to very carefully and closely time the rollout of db changes with application changes. As more applications use the database, this becomes harder and the chance of something being overlooked becomes greater.

For this reason the idea that all operations must go through a set of stored procedures is a decision fraught with hazard as the database and application environment evolves. Typically it is easier to manage backwards-compatibility in schemas than it is in functions, and so a key question is how many opportunities you have to create new bugs when a new column is added.
There are, of course, more hazards which I have dealt with before, but the point is that stored procedures are potentially harmful, and a major part of the reason is that they usually form a fairly brittle contract with the application layer. In a traditional stored procedure, adding a column to be stored will require changing the number of variables in the stored procedure's argument list, the queries to access it, and each application's call to that stored procedure. In this way, they provide (in the absence of other help) at best a leaky abstraction layer around the database details. This is the sort of problem that dependency inversion helps to avoid.

Stored Procedures and User Defined Functions Done Right

Not all stored procedures are done wrong. In the LedgerSMB project we have at least partially solved the abstraction/brittleness issue by looking to web services for inspiration. Our approach provides an additional mapping layer and dynamic query generation around a stored procedure interface. By using a service locator pattern, and overloading the system tables in PostgreSQL as the service registry, we solve the problem of brittleness.

Our approach of course is not perfect and it is not the only possibility. One shortcoming of our approach is that the invocation of the service locator is relatively spartan. We intend to allow more options there in the future. However, one thing I have noticed is that there are far fewer places where bugs can hide, and therefore faster and more robust development takes place. Additionally, a focus on clarity of code in stored procedures has eliminated a number of important performance bottlenecks, and it limits the number of places to which a given change propagates.

Other Important Options in PostgreSQL

Stored procedures are not the only abstraction mechanisms available from PostgreSQL.
In addition to views, there are also other interesting ways of using functions to accomplish this without insisting that all access goes through stored procedures. These methods can also be freely mixed to produce very powerful, intelligent database systems.

Such options include custom types, written in C, along with custom operators, functions and the like. These would then be stored in columns, and SQL can be used to provide an abstraction layer around the types. In this way SQL becomes the abstraction and the C programs become the details. A future post will cover the use of ip4r in network management with PostgreSQL db's as an example of what can be done here.

Additionally, things like triggers and notifications can be used to ensure that appropriate changes trigger other changes in the same transaction or, upon transaction commit, hand off control to other programs in subsequent transactions (allowing for independent processing and error control for things like sending emails).

Recommendations

Rather than specific recommendations, the overall point here is to look at the database itself as an application running in an application server (the RDBMS) and to design it as an application with an appropriate API. There are many ways to do this, from writing components in C and using SQL as an abstraction mechanism to writing things in SQL and using stored procedures as a mechanism. One could even write code in SQL and still use SQL as an abstraction mechanism.

The key point, however, is to be aware of the need for discoverable abstraction, a need which to date things like ORMs and stored procedures often fill very imperfectly. A well designed db with appropriate abstraction in its interfaces should be able to be seen as an application in its own right, engineered as such, and capable of serving multiple client apps through a robust and discoverable API.
As with all things, it starts by recognizing the problems and putting solutions as priorities from the design stage onward.
February 19, 2013
by Chris Travers
· 5,223 Views
XML->JSON->HashMap
Yes, it has been a long time since I posted. I was just trying to see how an XML document can be converted to JSON and then to a HashMap. The scenario is purely hypothetical.

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import net.sf.json.JSON;
import net.sf.json.xml.XMLSerializer;

import org.apache.commons.io.IOUtils;
import org.codehaus.jackson.JsonGenerationException;
import org.codehaus.jackson.map.JsonMappingException;
import org.codehaus.jackson.map.ObjectMapper;
import org.codehaus.jackson.type.TypeReference;

public class XML2JSONConvertor {

    public static void main(String[] args) throws Exception {
        // Read the whole XML file into a string
        InputStream is = new FileInputStream(new File(
                "e:\\jagannathan\\personal\\java-projects\\secondtest.xml"));
        String xml = IOUtils.toString(is);

        // XML -> JSON (json-lib)
        XMLSerializer xmlSerializer = new XMLSerializer();
        JSON json = xmlSerializer.read(xml);
        System.out.println(json.toString(2));

        printJSON(json.toString(2));
    }

    public static void printJSON(String jsonString) {
        ObjectMapper mapper = new ObjectMapper();
        try {
            // JSON -> Map (Jackson)
            Map<String, Object> jsonInMap = mapper.readValue(jsonString,
                    new TypeReference<Map<String, Object>>() {
                    });
            List<String> keys = new ArrayList<String>(jsonInMap.keySet());
            for (String key : keys) {
                System.out.println(key + ": " + jsonInMap.get(key));
            }
        } catch (JsonGenerationException e) {
            e.printStackTrace();
        } catch (JsonMappingException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Dependencies

• groupId net.sf.json-lib, artifactId json-lib, version 2.4, classifier jdk15
• groupId commons-io, artifactId commons-io, version 2.3, scope compile
• groupId xom, artifactId xom, version 1.2.5
• groupId org.codehaus.jackson, artifactId jackson-mapper-asl, version 1.9.0

The Input XML

Jags Inc Jagan Male 24-jul Satya Male 24-apr

The output

7 Feb, 2013 7:20:50 PM net.sf.json.xml.XMLSerializer getType
INFO: Using default type string
{
  "name": "Jags Inc",
  "employees": [
    {
      "name": "Jagan",
      "sex": "Male",
      "dob": "24-jul"
    },
    {
      "name": "Satya",
      "sex": "Male",
      "dob": "24-apr"
    }
  ]
}
name: Jags Inc
employees: [{name=Jagan, sex=Male, dob=24-jul}, {name=Satya, sex=Male, dob=24-apr}]
February 18, 2013
by Jagannathan Asokan
· 33,494 Views
JDBC Realm and Form Based Authentication with GlassFish 3.1.2.2 and Primefaces 3.4
One of the most popular posts on my blog is the short tutorial about the JDBC Security Realm and form based Authentication on GlassFish with Primefaces. After I received some comments that it no longer works with the latest GlassFish 3.1.2.2, I thought it might be time to revisit it and present an updated version. Here we go:

Preparation

As in the original tutorial, I am going to rely on a few things. Make sure to have a recent NetBeans 7.3 beta2 (which includes GlassFish 3.1.2.2) and the MySQL Community Server (5.5.x) installed. You should have verified that everything is up and running and that you can start both GlassFish and the MySQL Server.

Some Basics

A GlassFish authentication realm, also called a security policy domain or security domain, is a scope over which the GlassFish Server defines and enforces a common security policy. GlassFish Server is preconfigured with the file, certificate, and administration realms. In addition, you can set up LDAP, JDBC, digest, Oracle Solaris, or custom realms. An application can specify which realm to use in its deployment descriptor. If you want to store the user credentials for your application in a database, your first choice is the JDBC realm.

Prepare the Database

Fire up NetBeans and switch to the Services tab. Right click the "Databases" node and select "Register MySQL Server". Fill in the details of your installation and click "ok". Right click the new MySQL node and select "connect". Now you see all the already available databases. Right click again and select "Create Database". Enter "jdbcrealm" as the new database name. Remark: We're not going to do all that with a separate database user. That is highly recommended, but I am using the root user in this example. If you have a user, you can also grant full access to it here. Click "ok". You get automatically connected to the newly created database. Expand the bold node and right click on "Tables".
Select "Execute Command" or enter the table details via the wizard.

CREATE TABLE USERS (
  `USERID` VARCHAR(255) NOT NULL,
  `PASSWORD` VARCHAR(255) NOT NULL,
  PRIMARY KEY (`USERID`)
);

CREATE TABLE USERS_GROUPS (
  `GROUPID` VARCHAR(20) NOT NULL,
  `USERID` VARCHAR(255) NOT NULL,
  PRIMARY KEY (`GROUPID`)
);

That is all for now with the database. Move on to the next paragraph.

Let GlassFish know about MySQL

The first thing to do is to get the latest MySQL Connector/J from the MySQL website, which is 5.1.22 at the time of writing. Extract the mysql-connector-java-5.1.22-bin.jar file and drop it into your domain folder (e.g. glassfish\domains\domain1\lib). Done. Now it is finally time to create a project.

Basic Project Setup

Start a new maven based web application project. Choose "New Project" > "Maven" > "Web Application" and hit next. Now enter a name (e.g. secureapp) and all the needed maven coordinates and hit next. Choose your configured GlassFish 3+ Server. Select Java EE 6 Web as your EE version and hit "Finish".

Now we need to add some more configuration to our GlassFish domain. Right click on the newly created project and select "New > Other > GlassFish > JDBC Connection Pool". Enter a name for the new connection pool (e.g. SecurityConnectionPool) and underneath the checkbox "Extract from Existing Connection:" select your registered MySQL connection. Click next, review the connection pool properties and click finish. The newly created Server Resources folder now shows your sun-resources.xml file. Follow the steps and create a "New > Other > GlassFish > JDBC Resource" pointing to the created SecurityConnectionPool (e.g. jdbc/securityDatasource). You will find the configured things under "Other Sources / setup" in a file called glassfish-resources.xml. It gets deployed to your server together with your application, so you don't have to care about configuring everything with the GlassFish admin console.

Additionally we still need Primefaces.
Right click on your project, select "Properties", change to the "Frameworks" category and add "JavaServer Faces". Switch to the Components tab and select "PrimeFaces". Finish by clicking "OK". You can validate that this worked by opening the pom.xml and checking for the Primefaces dependency. 3.4 should be there. Feel free to change the version to the latest 3.4.2.

Final GlassFish Configuration

Now it is time to fire up GlassFish and do the realm configuration. In NetBeans switch to the "Services" tab again and right click on the "GlassFish 3+" node. Select "Start" and watch the Output window for a successful start. Right click again and select "View Domain Admin Console", which should open your default browser pointing you to http://localhost:4848/. Select "Configurations > server-config > Security > Realms" and click "New..." on top of the table. Enter a name (e.g. JDBCRealm) and select com.sun.enterprise.security.auth.realm.jdbc.JDBCRealm from the drop down. Fill in the following values into the text fields, using the table and column names created in the database step:

JAAS Context: jdbcRealm
JNDI: jdbc/securityDatasource
User Table: USERS
User Name Column: USERID
Password Column: PASSWORD
Group Table: USERS_GROUPS
Group Name Column: GROUPID

Leave all the other defaults/blanks and select "OK" in the upper right corner. You are presented with a fancy JavaScript warning window which tells you to _not_ leave the Digest Algorithm field empty. I filed a bug about it. The field defaults to SHA-256, which is different from GlassFish versions prior to 3.1, which used MD5 here. The older version of this tutorial didn't use a digest algorithm at all ("none"). This was meant to make things easier but isn't considered good practice at all. So, let's stick to SHA-256 even for development, please.

Secure your application

Done with configuring your environment. Now we have to actually secure the application. The first part is to think about the resources to protect. Jump to your Web Pages folder and create two more folders, one named "admin" and another called "users".
The idea behind this is to have two separate folders which can be accessed by users belonging to the appropriate groups. Now we have to create some pages. Open the Web Pages/index.xhtml and replace everything between the h:body tags with the following:

Select where you want to go:

Now add a new index.xhtml to both the users and admin folders. Make them do something like this:

Hello Admin|User

On to the login.xhtml. Create it with the following content in the root of your Web Pages folder.

Username: Password:

As you can see, we have the basic Primefaces p:panel component, which contains a simple html form that points to the predefined action j_security_check. This is where all the magic happens. You also have to include two input fields for username and password with the predefined names j_username and j_password.

Now we are going to create the loginerror.xhtml, which is displayed if the user did not enter the right credentials (use the same DOCTYPE and header as seen in the above example).

Sorry, you made an Error. Please try again: Login

The only magic here is the href link of the Login anchor. We need to get the correct request context, and this can be done by accessing the faces context. If a user without the appropriate rights tries to access a folder, he is presented with a 403 access denied error page. If you would like to customize it, you need to create the page and add the following lines to your web.xml:

<error-page>
    <error-code>403</error-code>
    <location>/faces/403.xhtml</location>
</error-page>

That snippet defines that all requests that are not authorized should go to the 403 page. If you have the web.xml open already, let's start securing your application. We need to add a security constraint for any protected resource. Security constraints are among the least understood features for web developers, even though they are critical for the security of Java EE Web applications. Specifying a combination of URL patterns, HTTP methods, roles and transport constraints can be daunting to a programmer or administrator.
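It is important to realize that for any combination that was intended to be secure but was not specified via security constraints, the web container will allow those requests. Security Constraints consist of Web Resource Collections (URL patterns, HTTP methods), an Authorization Constraint (role names) and User Data Constraints (whether the web request needs to be received over a protected transport such as TLS).

<security-constraint>
    <display-name>Admin Pages</display-name>
    <web-resource-collection>
        <web-resource-name>Protected Admin Area</web-resource-name>
        <url-pattern>/faces/admin/*</url-pattern>
        <http-method>GET</http-method>
        <http-method>POST</http-method>
        <http-method>HEAD</http-method>
        <http-method>PUT</http-method>
        <http-method>OPTIONS</http-method>
        <http-method>TRACE</http-method>
        <http-method>DELETE</http-method>
    </web-resource-collection>
    <auth-constraint>
        <role-name>admin</role-name>
    </auth-constraint>
    <user-data-constraint>
        <transport-guarantee>NONE</transport-guarantee>
    </user-data-constraint>
</security-constraint>
<security-constraint>
    <display-name>All Access</display-name>
    <web-resource-collection>
        <web-resource-name>None Protected User Area</web-resource-name>
        <url-pattern>/faces/users/*</url-pattern>
        <http-method>GET</http-method>
        <http-method>POST</http-method>
        <http-method>HEAD</http-method>
        <http-method>PUT</http-method>
        <http-method>OPTIONS</http-method>
        <http-method>TRACE</http-method>
        <http-method>DELETE</http-method>
    </web-resource-collection>
    <user-data-constraint>
        <transport-guarantee>NONE</transport-guarantee>
    </user-data-constraint>
</security-constraint>

Once the constraints are in place, you have to define how the container should challenge the user. A web container can authenticate a web client/user using either HTTP BASIC, HTTP DIGEST, HTTPS CLIENT or FORM based authentication schemes. In this case we are using FORM based authentication and define the JDBCRealm:

<login-config>
    <auth-method>FORM</auth-method>
    <realm-name>JDBCRealm</realm-name>
    <form-login-config>
        <form-login-page>/faces/login.xhtml</form-login-page>
        <form-error-page>/faces/loginerror.xhtml</form-error-page>
    </form-login-config>
</login-config>

The realm name has to be the name that you assigned to the security realm before. Close the web.xml and open the GlassFish deployment descriptor (glassfish-web.xml, called sun-web.xml in older versions) to map the application role names to the actual groups that are in the database. This abstraction feels weird, but there are reasons for it. It was introduced to allow application roles to be mapped to different group names in enterprises. I have never seen this used extensively, but the feature is there and you have to configure it. Other appservers assume that role names and group names match if no mapping is present. GlassFish doesn't. Therefore you have to put the following into the glassfish-web.xml. You can create it via a right click on your project's WEB-INF folder, selecting "New > Other > GlassFish > GlassFish Descriptor".

<security-role-mapping>
    <role-name>admin</role-name>
    <group-name>admin</group-name>
</security-role-mapping>

That was it _basically_ ... everything you need is in place. The only thing that is missing are the users in the database.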
It is still empty, so we need to add a test user.

Adding a Test-User to the Database

Again we start by right clicking on the jdbcrealm database on the "Services" tab in NetBeans. Select "Execute Command" and insert the following:

INSERT INTO USERS VALUES ('admin', '8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918');
INSERT INTO USERS_GROUPS VALUES ('admin', 'admin');

You can now log in with user: admin and password: admin and access the secured area. Sample code to generate the hash could look like this:

try {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    String text = "admin";
    md.update(text.getBytes("UTF-8")); // Change this to "UTF-16" if needed
    byte[] digest = md.digest();
    BigInteger bigInt = new BigInteger(1, digest);
    // Pad to 64 hex characters; BigInteger.toString(16) would drop leading zeros
    String output = String.format("%064x", bigInt);
    System.out.println(output);
} catch (NoSuchAlgorithmException | UnsupportedEncodingException ex) {
    Logger.getLogger(PasswordTest.class.getName()).log(Level.SEVERE, null, ex);
}

Have fun securing your apps and keep the questions coming! In case you need it, the complete source code is on https://github.com/myfear/JDBCRealmExample
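
As a self-contained variant of the hash sample above, the following sketch wraps the same SHA-256 step in a reusable method so that new test users can be added without computing hashes by hand. The class and method names (PasswordHasher, sha256Hex) are made up for illustration and are not part of the tutorial's project. It assumes the realm compares hex-encoded digests, as the inserted admin row suggests.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the original example project.
class PasswordHasher {

    /** Returns the SHA-256 digest of the given password as 64 lowercase hex characters. */
    static String sha256Hex(String password) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(password.getBytes(StandardCharsets.UTF_8));
            // %064x left-pads with zeros so digests with leading zero bytes keep their full length
            return String.format("%064x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform, so this is not expected
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        // Prints the value used for the admin user in the INSERT statement above
        System.out.println(sha256Hex("admin"));
    }
}
```

Running it for "admin" should reproduce the hash stored in the USERS table, which is a quick way to check that the encoding matches before adding further users.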
January 29, 2013
by Markus Eisele
· 39,391 Views · 1 Like
OAuth 2.0 Bearer Token Profile Vs MAC Token Profile
Almost all the implementations I see today are based on the OAuth 2.0 Bearer Token Profile. Of course, it is an RFC proposed standard today. The OAuth 2.0 Bearer Token profile brings a simplified scheme for authentication. The specification describes how to use bearer tokens in HTTP requests to access OAuth 2.0 protected resources. Any party in possession of a bearer token (a "bearer") can use it to get access to the associated resources (without demonstrating possession of a cryptographic key). To prevent misuse, bearer tokens need to be protected from disclosure in storage and in transport.

Before digging into the OAuth 2.0 MAC profile, let's have a quick high-level overview of the OAuth 2.0 message flow. OAuth 2.0 has mainly three phases:

1. Requesting an Authorization Grant.
2. Exchanging the Authorization Grant for an Access Token.
3. Accessing the resources with the Access Token.

Where does the token type come into action? The OAuth 2.0 core specification does not mandate any token type. At the same time, the token requester (the client) cannot decide at any point which token type it needs. It's purely up to the Authorization Server to decide which token type is returned in the Access Token response. So, the token type comes into action in phase 2, when the Authorization Server returns the OAuth 2.0 Access Token.

The access token type provides the client with the information required to successfully utilize the access token to make a protected resource request (along with type-specific attributes). The client must not use an access token if it does not understand the token type. Each access token type definition specifies the additional attributes (if any) sent to the client together with the "access_token" response parameter. It also defines the HTTP authentication method used to include the access token when making a protected resource request. For example, the following is what you get as the Access Token response, irrespective of which grant type you use.
HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
Cache-Control: no-store
Pragma: no-cache

{
  "access_token":"mF_9.B5f-4.1JqM",
  "token_type":"Bearer",
  "expires_in":3600,
  "refresh_token":"tGzv3JOkF0XG5Qx2TlKWIA"
}

The above is for Bearer; the following is for MAC.

HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: no-store

{
  "access_token":"SlAV32hkKG",
  "token_type":"mac",
  "expires_in":3600,
  "refresh_token":"8xLOxBtZp8",
  "mac_key":"adijq39jdlaska9asud",
  "mac_algorithm":"hmac-sha-256"
}

Here you can see that the MAC Access Token response has two additional attributes: mac_key and mac_algorithm. Let me repeat this: "Each access token type definition specifies the additional attributes (if any) sent to the client together with the "access_token" response parameter".

The MAC Token Profile defines the HTTP MAC access authentication scheme, providing a method for making authenticated HTTP requests with partial cryptographic verification of the request, covering the HTTP method, request URI, and host. In the above response, access_token is the MAC key identifier. Unlike Bearer, the MAC token profile never passes its top secret over the wire when accessing resources.

The access_token, or MAC key identifier, is a string identifying the MAC key used to calculate the request MAC. The string is usually opaque to the client. The server typically assigns a specific scope and lifetime to each set of MAC credentials. The identifier may denote a unique value used to retrieve the authorization information (e.g. from a database), or self-contain the authorization information in a verifiable manner (i.e. a string consisting of some data and a signature).

The mac_key is a shared symmetric secret used as the MAC algorithm key. The server will not reissue a previously issued MAC key and MAC key identifier combination.

Now let's see what happens in phase 3. The following shows what the Authorization HTTP header looks like when a Bearer Token is used.
Authorization: Bearer mF_9.B5f-4.1JqM

This adds very low overhead on the client side. The client simply needs to pass the exact access_token it got from the Authorization Server in phase 2. Under the MAC token profile, this is how it looks:

Authorization: MAC id="h480djs93hd8", ts="1336363200", nonce="dj83hs9s", mac="bhCQXTVyfj5cmA9uKkPFx1zeOXM="

This needs a bit more attention.

id is the MAC key identifier, i.e. the access_token from phase 2.

ts is the request timestamp. The value is a positive integer set by the client when making each request to the number of seconds elapsed from a fixed point in time (e.g. January 1, 1970 00:00:00 GMT).

nonce is a unique string generated by the client. The value is unique across all requests with the same timestamp and MAC key identifier combination.

mac is the request MAC, which the client calculates using the MAC algorithm and the MAC key.

This is how you derive the normalized string used to generate the HMAC. The normalized request string is a consistent, reproducible concatenation of several of the HTTP request elements into a single string. By normalizing the request into a reproducible string, the client and server can both calculate the request MAC over the exact same value. The string is constructed by concatenating together, in order, the following HTTP request elements, each followed by a new line character (%x0A):

1. The timestamp value calculated for the request.
2. The nonce value generated for the request.
3. The HTTP request method in upper case. For example: "HEAD", "GET", "POST", etc.
4. The HTTP request-URI as defined by [RFC2616] section 5.1.2.
5. The hostname included in the HTTP request using the "Host" request header field, in lower case.
6. The port as included in the HTTP request using the "Host" request header field. If the header field does not include a port, the default value for the scheme MUST be used (e.g. 80 for HTTP and 443 for HTTPS).
7. The value of the "ext" "Authorization" request header field attribute if one was included in the request (this is optional), otherwise an empty string.

Each element is followed by a new line character (%x0A), including the last element and even when an element value is an empty string.

Whether you use Bearer or MAC, the end user or the resource owner is identified using the access_token. Authorization, throttling, monitoring or any other quality of service operations can be carried out against the access_token irrespective of which token profile you use.
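
The normalization steps listed above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not a reference implementation of the MAC profile: the class and method names are made up, the MAC key is assumed to be used as its UTF-8 bytes, and the result is assumed to be Base64-encoded (which is what the mac value in the header example looks like). The timestamp, nonce and mac_key are taken from the examples above; the request URI and host are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.Locale;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper illustrating the MAC token profile steps described above.
class MacTokenSigner {

    /** Concatenates the seven request elements, each followed by a newline (%x0A). */
    static String normalizedRequest(String ts, String nonce, String method,
                                    String requestUri, String host, int port, String ext) {
        String[] elements = {
            ts,
            nonce,
            method.toUpperCase(Locale.ROOT),   // method in upper case
            requestUri,
            host.toLowerCase(Locale.ROOT),     // host name in lower case
            Integer.toString(port),
            ext == null ? "" : ext             // empty string when no "ext" attribute is sent
        };
        StringBuilder sb = new StringBuilder();
        for (String element : elements) {
            sb.append(element).append('\n');   // the last element is also followed by a newline
        }
        return sb.toString();
    }

    /** Calculates the request MAC for mac_algorithm "hmac-sha-256" (assumption: Base64 output). */
    static String requestMac(String macKey, String normalizedRequest) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(macKey.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] raw = mac.doFinal(normalizedRequest.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not calculate the request MAC", e);
        }
    }

    /** Builds the Authorization header value in the format shown above. */
    static String authorizationHeader(String id, String ts, String nonce, String mac) {
        return "MAC id=\"" + id + "\", ts=\"" + ts + "\", nonce=\"" + nonce + "\", mac=\"" + mac + "\"";
    }

    public static void main(String[] args) {
        // URI and host below are hypothetical; the other values come from the examples above
        String normalized = normalizedRequest("1336363200", "dj83hs9s", "GET",
                "/resource/1?b=1&a=2", "example.com", 80, null);
        String mac = requestMac("adijq39jdlaska9asud", normalized);
        System.out.println("Authorization: "
                + authorizationHeader("SlAV32hkKG", "1336363200", "dj83hs9s", mac));
    }
}
```

The server repeats the same normalization with the values it received and the mac_key it stored for the key identifier, and compares the result with the mac attribute, so the secret itself never appears in the resource request.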
January 24, 2013
by Prabath Siriwardena
· 37,124 Views
How to Publish Maven Site Docs to BitBucket or GitHub Pages
In this post we will utilize GitHub and/or BitBucket's static web page hosting capabilities to publish our project's Maven 3 Site Documentation. Each of the two SCM providers offers a slightly different solution for hosting static pages. The approach spelled out in this post would also be a viable solution to "back up" your site documentation in a supported SCM like Git or SVN. This solution does not directly cover site documentation deployment via the maven-site-plugin and the Wagon library (scp, WebDAV or FTP).

There is one main project hosted on GitHub that I have posted with the full solution. The project URL is https://github.com/mike-ensor/clickconcepts-master-pom/. The POM has been pushed to Maven Central and will continue to be updated and maintained. Its coordinates are com.clickconcepts.project:master-site-pom:0.16.

GitHub Pages

GitHub hosts static pages by using a special branch, "gh-pages", available to each GitHub project. This special branch can host any HTML and local resources like JavaScript, images and CSS. There is no server side development. To navigate to your static pages, the URL structure is as follows:

http://<username>.github.com/<projectname>

An example for the project I am using in this blog post is http://mike-ensor.github.com/clickconcepts-master-pom/, where the first URL segment (mike-ensor) is the username and the second URL segment (clickconcepts-master-pom) is the project.

GitHub does allow you to create a base static hosted site for your username by creating a repository named <username>.github.com. The contents would be all of your HTML and associated static resources. This is not required to post documentation for your project, unlike the BitBucket solution.

There is a GitHub Site plugin that publishes site documentation via GitHub's object API, but this is outside the scope of this blog post because it does not provide a single solution for GitHub and BitBucket projects using Maven 3.
BitBucket

BitBucket provides a similar service to GitHub in that it hosts static HTML pages and their associated static resources. However, there is one large difference in how those pages are stored. Unlike GitHub, BitBucket requires you to create a new repository with a name fitting the convention. The files will be located on the master branch and each project will need to be a directory off of the root.

mikeensor.bitbucket.org/
    /some-project
        +index.html
        +...
        /css
        /img
    /some-other-project
        +index.html
        +...
        /css
        /img
    index.html
    .git
    .gitignore

The naming convention is as follows:

<username>.bitbucket.org

An example of a BitBucket static pages repository for me would be http://mikeensor.bitbucket.org/. The structure does not require that you create an index.html page at the root of the project, but it would be advisable in order to avoid 404s.

Generating Site Documentation

Maven provides the ability to post documentation for your project by using the maven-site-plugin. This plugin is difficult to use due to the many configuration options that oftentimes are not well documented. There are many blog posts that can help you write your documentation, including my post on maven site documentation. I did not mention how to use "xdoc", "apt" or other templating technologies to create documentation pages, but not to fear, I have provided this in my GitHub project.

Putting it all Together

The Maven SCM Publish plugin (http://maven.apache.org/plugins/maven-scm-publish-plugin/) publishes site documentation to a supported SCM. In our case, we are going to use Git through BitBucket or GitHub. The Maven SCM Publish plugin does allow you to publish multi-module site documentation through the various properties, but that process is a bit painful and the scope of this blog post is to cover single/mono module projects. Take a moment to look at the POM file located in the clickconcepts-master-pom project.
This master POM is rather comprehensive and the site documentation is only one portion of the project, but we will focus on the site documentation. There are a few things to point out here. First, the scm-publish plugin and its idiosyncrasies. In order to create the site documentation, the "site" plugin must first be run. This is accomplished by running site:site. The plugin will generate the documentation into the "target/site" folder by default. The SCM Publish Plugin, by default, looks for the site documents in "target/staging", which is controlled by the content parameter. As you can see, there is a mismatch between folders.

NOTE: My first approach was to run the site:stage command, which is supposed to put the site documents into the "target/staging" folder. This is not entirely correct: the site plugin combines with the distributionManagement.site.url property to stage the documents, but there is very strange behavior and it is not documented well.

In order to get the site plugin's site documents and the SCM Publish Plugin's location to match up, use the content property and set it to the location of the Site Plugin output (siteOutputDirectory). If you are using GitHub, there is no modification to the siteOutputDirectory needed. However, if you are using BitBucket, you will need to modify the property to add a directory layer into the site documentation generation (see above for differences between GitHub and BitBucket pages). The second property will tell the SCM Publish Plugin to look at the root "site" folder so that when the files are copied into the repository, the project folder will be the containing folder. The property values will look like:

${project.build.directory}/site/${project.artifactId}
${project.build.directory}/site

Next we will take a look at the custom properties defined in the master POM and used by the SCM Publish Plugin above.
Each project that uses the Master POM will need to define several properties, which are used within the plugins during the site publishing. Fill in the values with your own settings.

BitBucket

...
master
scm:git:git@bitbucket.org:mikeensor/mikeensor.bitbucket.org.git
${project.build.directory}/site/${project.artifactId}
${project.build.directory}/site
${changelog.bitbucket.fileUri}
${changelog.revision.bitbucket.fileUri}
...

GitHub

...
gh-pages
scm:git:git@github.com:mikeensor/clickconcepts-master-pom.git
${changelog.github.fileUri}
${changelog.revision.github.fileUri}
...

NOTE: the changelog parameters are required to use the Master POM and are not directly related to publishing site docs to GitHub or BitBucket.

How to Generate

If you are using the Master POM (or have abstracted out the Site Plugin and the SCM Publish Plugin), then generating and publishing the documentation is simple:

mvn clean site:site scm-publish:publish-scm

To test the process without committing and pushing:

mvn clean site:site scm-publish:publish-scm -Dscmpublish.dryRun=true

Gotchas

In the SCM Publish Plugin documentation's "tips", they recommend creating a location to place the repository so that the repo is not cloned each time. There is a risk here: if there is a git repository already in the folder, the plugin will overwrite the repository with the new site documentation. I discovered this by publishing two different projects and having my root repository wiped out by documentation from the second project. There are ways to mitigate this by adding in another folder layer, but make sure you test often!

Another tip is to use -Dscmpublish.dryRun=true to test out the site documentation process without making the SCM commit and push.

Project and Documentation URLs

Here is a list of the fully working projects used to create this blog post:

Master POM with Site and SCM Publish plugins - https://github.com/mike-ensor/clickconcepts-master-pom. Documentation URL: http://mike-ensor.github.com/clickconcepts-master-pom/

Child Project using Master Pom - http://mikeensor.bitbucket.org/fest-expected-exception. Documentation URL: http://mikeensor.bitbucket.org/fest-expected-exception/
January 23, 2013
by Mike Ensor
· 13,414 Views
Spring Data JDBC Generic DAO Implementation: Most Lightweight ORM Ever
I am thrilled to announce the first version of my Spring Data JDBC repository project. The purpose of this open source library is to provide a generic, lightweight and easy to use DAO implementation for relational databases, based on JdbcTemplate from the Spring framework and compatible with the Spring Data umbrella of projects.

Design objectives

• Lightweight, fast and low-overhead. Only a handful of classes, no XML, annotations, reflection
• This is not a full-blown ORM. No relationship handling, lazy loading, dirty checking, caching
• CRUD implemented in seconds
• For small applications where JPA is overkill
• Use when simplicity is needed or when future migration e.g. to JPA is considered
• Minimalistic support for database dialect differences (e.g. transparent paging of results)

Features

Each DAO provides built-in support for:

• Mapping to/from domain objects through the RowMapper abstraction
• Generated and user-defined primary keys
• Extracting generated key
• Compound (multi-column) primary keys
• Immutable domain objects
• Paging (requesting subset of results)
• Sorting over several columns (database agnostic)
• Optional support for many-to-one relationships
• Supported databases (continuously tested): MySQL, PostgreSQL, H2, HSQLDB, Derby ...and most likely most of the others
• Easily extendable to other database dialects via the SqlGenerator class
• Easy retrieval of records by ID

API

Compatible with the Spring Data PagingAndSortingRepository abstraction, all these methods are implemented for you:

public interface PagingAndSortingRepository<T, ID extends Serializable> extends CrudRepository<T, ID> {
    T save(T entity);
    Iterable<T> save(Iterable<? extends T> entities);
    T findOne(ID id);
    boolean exists(ID id);
    Iterable<T> findAll();
    long count();
    void delete(ID id);
    void delete(T entity);
    void delete(Iterable<? extends T> entities);
    void deleteAll();
    Iterable<T> findAll(Sort sort);
    Page<T> findAll(Pageable pageable);
}

Pageable and Sort parameters are also fully supported, which means you get paging and sorting by arbitrary properties for free.
For example, say you have a userRepository extending the PagingAndSortingRepository interface (implemented for you by the library) and you request page 5 (page indexes are zero-based) of the USERS table, 10 rows per page, after applying some sorting:

    Page<User> page = userRepository.findAll(
        new PageRequest(
            5, 10,
            new Sort(
                new Order(DESC, "reputation"),
                new Order(ASC, "user_name")
            )
        )
    );

The Spring Data JDBC repository library will translate this call into (PostgreSQL syntax):

    SELECT * FROM USERS ORDER BY reputation DESC, user_name ASC LIMIT 10 OFFSET 50

...or even (Derby syntax):

    SELECT * FROM (
        SELECT ROW_NUMBER() OVER () AS ROW_NUM, t.*
        FROM (
            SELECT * FROM USERS ORDER BY reputation DESC, user_name ASC
        ) AS t
    ) AS a
    WHERE ROW_NUM BETWEEN 51 AND 60

No matter which database you use, you'll get a Page<User> object in return (you still have to provide a RowMapper yourself to translate from ResultSet to domain object). If you don't know the Spring Data project yet, Page is a wonderful abstraction, not only encapsulating a List of results, but also providing metadata such as the total number of records, which page we are currently on, etc.

Reasons to use
  • You consider migration to JPA or even some NoSQL database in the future. Since your code will rely only on methods defined in PagingAndSortingRepository and CrudRepository from the Spring Data Commons umbrella project, you are free to switch from the JdbcRepository implementation (from this project) to JpaRepository, MongoRepository, GemfireRepository or GraphRepository. They all implement the same common API. Of course, don't expect that switching from JDBC to JPA or MongoDB will be as simple as switching imported JAR dependencies, but at least you minimize the impact by using the same DAO API.
  • You need a fast, simple JDBC wrapper library. JPA or even MyBatis is overkill
  • You want to have full control over generated SQL if needed
  • You want to work with objects, but don't need lazy loading, relationship handling, multi-level caching, dirty checking...
  • You need CRUD and not much more
  • You want to be DRY
  • You are already using Spring or maybe even JdbcTemplate, but still feel like there is too much manual work
  • You have very few database tables

Getting started
For more examples and working code, don't forget to examine the project tests.

Prerequisites
Maven coordinates: groupId com.blogspot.nurkiewicz, artifactId jdbcrepository, version 0.1

Unfortunately the project is not yet in the Maven central repository. For the time being you can install the library in your local repository by cloning it:

    $ git clone git://github.com/nurkiewicz/spring-data-jdbc-repository.git
    $ cd spring-data-jdbc-repository
    $ git checkout 0.1
    $ mvn javadoc:jar source:jar install

To get started, your project must have a DataSource bean present and transaction management enabled. Here is a minimal MySQL configuration:

    @EnableTransactionManagement
    @Configuration
    public class MinimalConfig {

        @Bean
        public PlatformTransactionManager transactionManager() {
            return new DataSourceTransactionManager(dataSource());
        }

        @Bean
        public DataSource dataSource() {
            MysqlConnectionPoolDataSource ds = new MysqlConnectionPoolDataSource();
            ds.setUser("user");
            ds.setPassword("secret");
            ds.setDatabaseName("db_name");
            return ds;
        }
    }

Entity with auto-generated key
Say you have the following database table with an auto-generated key (MySQL syntax):

    CREATE TABLE COMMENTS (
        id INT AUTO_INCREMENT,
        user_name varchar(256),
        contents varchar(1000),
        created_time TIMESTAMP NOT NULL,
        PRIMARY KEY (id)
    );

First you need to create the domain object Comment mapping to that table (just like in any other ORM):

    public class Comment implements Persistable<Integer> {

        private Integer id;
        private String userName;
        private String contents;
        private Date createdTime;

        @Override
        public Integer getId() {
            return id;
        }

        @Override
        public boolean isNew() {
            return id == null;
        }

        //getters/setters/constructors/...
    }

Apart from the standard Java boilerplate, notice that the class implements Persistable<Integer>, where Integer is the type of the primary key.
Persistable is an interface coming from the Spring Data project and it's the only requirement we place on your domain object. Finally we are ready to create our CommentRepository DAO:

    @Repository
    public class CommentRepository extends JdbcRepository<Comment, Integer> {

        public CommentRepository() {
            super(ROW_MAPPER, ROW_UNMAPPER, "COMMENTS");
        }

        public static final RowMapper<Comment> ROW_MAPPER = //see below

        private static final RowUnmapper<Comment> ROW_UNMAPPER = //see below

        @Override
        protected Comment postCreate(Comment entity, Number generatedId) {
            entity.setId(generatedId.intValue());
            return entity;
        }
    }

First of all we use the @Repository annotation to mark the DAO bean. It enables persistence exception translation. Beans annotated this way are also discovered by CLASSPATH scanning. As you can see, we extend JdbcRepository, which is the central class of this library, providing implementations of all PagingAndSortingRepository methods. Its constructor has three required dependencies: RowMapper, RowUnmapper and table name. You may also provide the ID column name, otherwise the default "id" is used.

If you ever used JdbcTemplate from Spring, you should be familiar with the RowMapper interface. We need to somehow extract columns from a ResultSet into an object. After all, we don't want to work with raw JDBC results. It's quite straightforward:

    public static final RowMapper<Comment> ROW_MAPPER = new RowMapper<Comment>() {
        @Override
        public Comment mapRow(ResultSet rs, int rowNum) throws SQLException {
            return new Comment(
                rs.getInt("id"),
                rs.getString("user_name"),
                rs.getString("contents"),
                rs.getTimestamp("created_time")
            );
        }
    };

RowUnmapper comes from this library and it's essentially the opposite of RowMapper: it takes an object and turns it into a Map.
This map is later used by the library to construct SQL INSERT / UPDATE queries:

    private static final RowUnmapper<Comment> ROW_UNMAPPER = new RowUnmapper<Comment>() {
        @Override
        public Map<String, Object> mapColumns(Comment comment) {
            Map<String, Object> mapping = new LinkedHashMap<String, Object>();
            mapping.put("id", comment.getId());
            mapping.put("user_name", comment.getUserName());
            mapping.put("contents", comment.getContents());
            mapping.put("created_time", new java.sql.Timestamp(comment.getCreatedTime().getTime()));
            return mapping;
        }
    };

If you never update your database table (just reading some reference data inserted elsewhere) you may skip the RowUnmapper parameter or use MissingRowUnmapper.

The last piece of the puzzle is the postCreate() callback method, which is called after an object was inserted. You can use it to retrieve the generated primary key and update your domain object (or return a new one if your domain objects are immutable). If you don't need it, just don't override postCreate(). Check out JdbcRepositoryGeneratedKeyTest for working code based on this example.

By now you might have a feeling that, compared to JPA or Hibernate, there is quite a lot of manual work. However, various JPA implementations and other ORM frameworks are notorious for introducing significant overhead and a steep learning curve. This tiny library intentionally leaves some responsibilities to the user in order to avoid complex mappings, reflection, annotations... all the implicitness that is not always desired. This project does not intend to replace mature and stable ORM frameworks. Instead it tries to fill a niche between raw JDBC and ORM where simplicity and low overhead are key features.

Entity with manually assigned key
In this example we'll see how entities with user-defined primary keys are handled.
Let's start from the database model:

    CREATE TABLE USERS (
        user_name varchar(255),
        date_of_birth TIMESTAMP NOT NULL,
        enabled BIT(1) NOT NULL,
        PRIMARY KEY (user_name)
    );

...and the User domain model:

    public class User implements Persistable<String> {

        private transient boolean persisted;

        private String userName;
        private Date dateOfBirth;
        private boolean enabled;

        @Override
        public String getId() {
            return userName;
        }

        @Override
        public boolean isNew() {
            return !persisted;
        }

        public User withPersisted(boolean persisted) {
            this.persisted = persisted;
            return this;
        }

        //getters/setters/constructors/...
    }

Notice that a special transient persisted flag was added. The contract of CrudRepository.save() from the Spring Data project requires that an entity knows whether it was already saved or not (the isNew() method), because there are no separate create() and update() methods. Implementing isNew() is simple for auto-generated keys (see Comment above), but in this case we need an extra transient field. If you hate this workaround and you only insert data and never update, you'll get away with returning true all the time from isNew().

And finally our DAO, the UserRepository bean:

    @Repository
    public class UserRepository extends JdbcRepository<User, String> {

        public UserRepository() {
            super(ROW_MAPPER, ROW_UNMAPPER, "USERS", "user_name");
        }

        public static final RowMapper<User> ROW_MAPPER = //...

        public static final RowUnmapper<User> ROW_UNMAPPER = //...

        @Override
        protected User postUpdate(User entity) {
            return entity.withPersisted(true);
        }

        @Override
        protected User postCreate(User entity, Number generatedId) {
            return entity.withPersisted(true);
        }
    }

The "USERS" and "user_name" parameters designate the table name and the primary key column name. I'll leave out the details of the mapper and unmapper (see the source code). But please notice the postUpdate() and postCreate() methods. They ensure that once an object was persisted, the persisted flag is set, so that subsequent calls to save() will update the existing entity rather than trying to reinsert it.
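To make the role of the persisted flag concrete, here is a minimal, self-contained sketch of how a save() method can choose between insert and update based on isNew(). It does not use the library or Spring at all: ManualKeyUser and SaveDispatchSketch are hypothetical names invented for this illustration, and an in-memory map stands in for the USERS table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the User entity above (no Spring types involved).
final class ManualKeyUser {
    private boolean persisted;          // plays the role of the transient flag
    private final String userName;

    ManualKeyUser(String userName) {
        this.userName = userName;
    }

    String getId() {
        return userName;
    }

    boolean isNew() {
        return !persisted;
    }

    ManualKeyUser withPersisted(boolean persisted) {
        this.persisted = persisted;
        return this;
    }
}

// Hypothetical sketch of the insert-or-update decision; NOT the library's code.
final class SaveDispatchSketch {
    final Map<String, ManualKeyUser> table = new LinkedHashMap<>();
    int inserts;
    int updates;

    ManualKeyUser save(ManualKeyUser user) {
        if (user.isNew()) {
            inserts++;                  // a real DAO would issue an INSERT here
        } else {
            updates++;                  // a real DAO would issue an UPDATE here
        }
        table.put(user.getId(), user);
        // Same idea as postCreate()/postUpdate(): mark the entity as stored.
        return user.withPersisted(true);
    }
}
```

Saving the same instance twice results in one insert followed by one update, which is the behaviour the postCreate() and postUpdate() callbacks are meant to guarantee.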
Check out JdbcRepositoryManualKeyTest for working code based on this example.

Compound primary key
We also support compound primary keys (primary keys consisting of several columns). Take this table as an example:

    CREATE TABLE BOARDING_PASS (
        flight_no VARCHAR(8) NOT NULL,
        seq_no INT NOT NULL,
        passenger VARCHAR(1000),
        seat CHAR(3),
        PRIMARY KEY (flight_no, seq_no)
    );

I would like you to notice the type of the primary key in Persistable:

    public class BoardingPass implements Persistable<Object[]> {

        private transient boolean persisted;

        private String flightNo;
        private int seqNo;
        private String passenger;
        private String seat;

        @Override
        public Object[] getId() {
            return pk(flightNo, seqNo);
        }

        @Override
        public boolean isNew() {
            return !persisted;
        }

        //getters/setters/constructors/...
    }

Unfortunately we don't support small value classes encapsulating all ID values in one object (like JPA does with @IdClass), so you have to live with an Object[] array. Defining the DAO class is similar to what we've already seen:

    public class BoardingPassRepository extends JdbcRepository<BoardingPass, Object[]> {

        public BoardingPassRepository() {
            this("BOARDING_PASS");
        }

        public BoardingPassRepository(String tableName) {
            super(MAPPER, UNMAPPER, new TableDescription(tableName, null, "flight_no", "seq_no"));
        }

        public static final RowMapper<BoardingPass> MAPPER = //...

        public static final RowUnmapper<BoardingPass> UNMAPPER = //...
    }

Two things to notice: we extend JdbcRepository<BoardingPass, Object[]> and we provide two ID column names, just as expected: "flight_no", "seq_no". We query such a DAO by providing both flight_no and seq_no values (necessarily in that order) wrapped in an Object[]:

    BoardingPass pass = repository.findOne(new Object[] {"FOO-1022", 42});

No doubt this is cumbersome in practice, so we provide a tiny helper method which you can statically import:

    import static com.blogspot.nurkiewicz.jdbcrepository.JdbcRepository.pk;
    //...
    BoardingPass foundFlight = repository.findOne(pk("FOO-1022", 42));

Check out JdbcRepositoryCompoundPkTest for working code based on this example.

Transactions
This library is completely orthogonal to transaction management. Every method of each repository requires a running transaction and it's up to you to set it up. Typically you would place @Transactional on the service layer (calling DAO beans). I don't recommend placing @Transactional over every DAO bean.

Caching
The Spring Data JDBC repository library does not provide any caching abstraction or support. However, adding a @Cacheable layer on top of your DAOs or services using the caching abstraction in Spring is quite straightforward. See also: @Cacheable overhead in Spring.

Contributions
...are always welcome. Don't hesitate to submit bug reports and pull requests. The biggest missing feature now is support for MSSQL and Oracle databases. It would be terrific if someone could have a look at it.

Testing
This library is continuously tested using Travis. The test suite consists of 265 tests (53 distinct tests, each run against 5 different databases: MySQL, PostgreSQL, H2, HSQLDB and Derby). When filing bug reports or submitting new features, please try to include supporting test cases. Each pull request is automatically tested on a separate branch.

Building
After forking the official repository, building is as simple as running:

    $ mvn install

You'll notice plenty of exceptions during JUnit test execution. This is normal. Some of the tests run against MySQL and PostgreSQL, which are available only on the Travis CI server. When these database servers are unavailable, the whole test is simply skipped:

    Results :
    Tests run: 265, Failures: 0, Errors: 0, Skipped: 106

Exception stack traces come from the root AbstractIntegrationTest.

Design
The library consists of only a handful of classes. JdbcRepository is the most important class; it implements all PagingAndSortingRepository methods. Each user repository has to extend this class.
Also each such repository must at least implement RowMapper and RowUnmapper (the latter only if you want to modify table data). SQL generation is delegated to SqlGenerator. PostgreSqlGenerator and DerbySqlGenerator are provided for databases that don't work with the standard generator.

License
This project is released under version 2.0 of the Apache License (same as the Spring framework).
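As a small worked example of the paging translation described in the API section, the sketch below shows only the arithmetic that turns a page index and page size into a LIMIT/OFFSET clause or a ROW_NUMBER range. It assumes the zero-based page index that Spring Data's PageRequest uses; PagingMath and its method names are hypothetical and are not part of the library or of its SqlGenerator.

```java
// Hypothetical helper for illustration only; not the library's SqlGenerator.
final class PagingMath {

    private PagingMath() {
    }

    /** Number of rows to skip for a zero-based page index. */
    static long offset(int page, int size) {
        return (long) page * size;
    }

    /** LIMIT/OFFSET style clause (PostgreSQL, MySQL, H2 and similar). */
    static String limitOffsetClause(int page, int size) {
        return "LIMIT " + size + " OFFSET " + offset(page, size);
    }

    /** ROW_NUMBER() style range, 1-based and inclusive on both ends. */
    static String rowNumberClause(int page, int size) {
        long first = offset(page, size) + 1;
        long last = offset(page, size) + size;
        return "ROW_NUM BETWEEN " + first + " AND " + last;
    }

    public static void main(String[] args) {
        System.out.println(limitOffsetClause(5, 10)); // LIMIT 10 OFFSET 50
        System.out.println(rowNumberClause(5, 10));   // ROW_NUM BETWEEN 51 AND 60
    }
}
```

For page index 5 with 10 rows per page, both forms select rows 51 to 60 of the sorted result.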
January 22, 2013
by Tomasz Nurkiewicz
· 76,619 Views · 2 Likes
article thumbnail
9 Software Security Design Principles
The term security has many meanings based on the context and perspective in which it is used. Security from the perspective of software/system development is the continuous process of maintaining confidentiality, integrity, and availability of a system, sub-system, and system data. This definition at a very high level can be restated as the following: Computer security is a continuous process dealing with confidentiality, integrity, and availability on multiple layers of a system.

Key Aspects of Software Security
  • Integrity
  • Confidentiality
  • Availability

Integrity within a system is the concept of ensuring that only authorized users can manipulate information, and only through authorized methods and procedures. An example of this can be seen in a simple lead management application. If the business decided to allow each sales member to update only their own leads, while sales managers can update all leads in the system, then an integrity violation would occur if a sales member attempted to update someone else's lead, because that lead was not entered by that sales member. This violates the business rule that leads can only be updated by the originating sales member.

Confidentiality within a system is the concept of preventing unauthorized access to specific information or tools. In a perfect world, even the existence of confidential information or tools would be unknown to all those who do not have access. When this concept is applied within the context of an application, only the authorized information and tools will be available. If we look at the sales lead management system again, leads can only be updated by originating sales members. Given this rule, we can say that all sales leads are confidential between the system and the sales person who entered the lead into the system.
The other sales team members would not need to know about the leads, let alone need to access them.

Availability within a system is the concept of authorized users being able to access the system. A real world example can be seen again in the lead management system. If that system was hosted on a web server, then IP restriction could be put in place to limit access to the system based on the requesting IP address. If in this example all of the sales members were accessing the system from the 192.168.1.23 IP address, then removing access from all other IPs would be needed to ensure that improper access to the system is prevented while approved users can access the system from an authorized location. In essence, if the requesting user is not coming from an authorized IP address, then the system will appear unavailable to them. This is one way of controlling where a system is accessed from.

Through the years several design principles have been identified as being beneficial when integrating security aspects into a system. These principles, in various combinations, allow a system to achieve the previously defined aspects of security based on generic architectural models.

Security Design Principles
  • Least Privilege
  • Fail-Safe Defaults
  • Economy of Mechanism
  • Complete Mediation
  • Open Design
  • Separation of Privilege
  • Least Common Mechanism
  • Psychological Acceptability
  • Defense in Depth

Least Privilege Design Principle
The Least Privilege design principle requires a minimalistic approach to granting user access rights to specific information and tools. Additionally, access rights should be time based, so that access to resources is limited to the time needed to complete the necessary tasks. Granting access beyond this scope allows unnecessary access and creates the potential for data to be updated outside of the approved context. Assigning access rights in this way limits system-damaging attacks from users, whether they are intentional or not.
This principle attempts to limit data changes and prevents potential damage from occurring by accident or error by reducing the number of potential interactions with a resource.

Fail-Safe Defaults Design Principle
The Fail-Safe Defaults design principle pertains to allowing access to resources based on granted access rather than on access exclusion. Under this principle, a resource can be accessed only if explicit access has been granted to a user. By default users do not have access to any resources until access has been granted. This approach prevents unauthorized users from gaining access to a resource until access is given.

Economy of Mechanism Design Principle
The Economy of Mechanism design principle requires that systems be designed to be as simple and small as possible. Design and implementation errors can result in unauthorized access to resources that would not be noticed during normal use, and a smaller, simpler design is easier to inspect for such errors.

Complete Mediation Design Principle
The Complete Mediation design principle states that every access to every resource must be validated for authorization.

Open Design Design Principle
The Open Design design principle is the concept that the security of a system and its algorithms should not depend on secrecy of its design or implementation.

Separation of Privilege Design Principle
The Separation of Privilege design principle requires that every approved resource access attempt be granted based on more than a single condition. For example, a user should be validated both for active status and for access to the specific resource.

Least Common Mechanism Design Principle
The Least Common Mechanism design principle declares that mechanisms used to access resources should not be shared.
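The Fail-Safe Defaults and Separation of Privilege principles can be illustrated with the lead management example used earlier. The following is only a minimal sketch under stated assumptions: LeadAccessSketch, its method names, and the in-memory sets are hypothetical and stand in for whatever user store and permission model a real system would use.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical access check for the sales lead example; not a real framework API.
final class LeadAccessSketch {

    private final Set<String> activeUsers = new HashSet<>();
    private final Map<String, Set<String>> grantedLeads = new HashMap<>();

    void activate(String user) {
        activeUsers.add(user);
    }

    void grant(String user, String leadId) {
        grantedLeads.computeIfAbsent(user, k -> new HashSet<>()).add(leadId);
    }

    /**
     * Fail-safe default: nothing is allowed unless it was explicitly granted.
     * Separation of privilege: two independent conditions must both hold.
     */
    boolean mayUpdate(String user, String leadId) {
        boolean active = activeUsers.contains(user);
        boolean granted = grantedLeads
                .getOrDefault(user, Collections.<String>emptySet())
                .contains(leadId);
        return active && granted;
    }
}
```

Calling mayUpdate() on every update request, rather than caching an earlier decision, is also how the Complete Mediation principle would show up in such a design.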
Psychological Acceptability Design Principle
The Psychological Acceptability design principle states that security mechanisms should not make resources more difficult to access than if the security mechanisms were not present.

Defense in Depth Design Principle
The Defense in Depth design principle is the concept that layering resource access authorization verification in a system reduces the chance of a successful attack. This layered approach to resource authorization requires unauthorized users to circumvent each authorization check to gain access to a resource.

When designing a system that must meet a security quality attribute, architects need to consider the scope of the security needs and the minimum required security qualities. Not every system will need all of the basic security design principles; most will use one or more in combination, based on the company's and the architect's threshold for system security, because security adds an additional layer to the overall system and can affect performance. That is why a definition of minimum acceptable security is needed when a system is designed: this quality attribute has to be weighed against the other system quality attributes so that the system adheres to all of them according to their priorities.

Resources:
Barnum, Sean. Gegick, Michael. (2005). Least Privilege. Retrieved on August 28, 2011 from https://buildsecurityin.us-cert.gov/bsi/articles/knowledge/principles/351-BSI.html
Saltzer, Jerry. (2011). BASIC PRINCIPLES OF INFORMATION PROTECTION. Retrieved on August 28, 2011 from http://web.mit.edu/Saltzer/www/publications/protection/Basic.html
Barnum, Sean. Gegick, Michael. (2005). Defense in Depth. Retrieved on August 28, 2011 from https://buildsecurityin.us-cert.gov/bsi/articles/knowledge/principles/347-BSI.html
Bertino, Elisa. (2005). Design Principles for Security.
Retrieved on August 28, 2011 from http://homes.cerias.purdue.edu/~bhargav/cs526/security-9.pdf
January 15, 2013
by Todd Merritt
· 99,281 Views · 1 Like
article thumbnail
How to Un-install a Plugin From Eclipse / STS?
It is easy to do - a few button clicks (generally) - but the button location is damn unintuitive. So, this is what you have got to do:
  • Go to the "Help" menu item.
  • Click on the "About ..." button. (Why on earth should I click that when I am trying to un-install a plugin? By the way, the menu item just above "About ..." is "Install New Software ...". Would it have been too much pain to have a "Manage plugins" and / or "Un-install plugins" item right underneath it?)
  • A form opens up. At the bottom of it there is a button labelled "Installation details". Click that. (Again, why on earth would anyone think "Installation details" would have anything to do with un-installing stuff? I would have expected only a static display of what is already installed.)
  • Another multi-tabbed form opens up, which shows all the installed plugins. (Is anyone keeping count of the number of windows opened already? This is the 3rd window by now, including the parent editor window.)
  • If you select any of the installed plugins, an "Uninstall" button becomes available. Click that and you should be able to un-install; after a restart everything should be fine.

My interest in software and IT has always been much more than a 9 to 5 job (and I am sure there is a huge population for which that holds equally true). I have always wanted software to be efficient and beautiful apart from doing its job. However, it took an excellent session on usability (which I joined only with casual curiosity but left with renewed interest in the subject and admiration for David Travis, who delivered the course) to get me to start looking at all software from a user's perspective. And I was surprised by what I found and how it changed my coding. I have been using Eclipse and STS for years now (nearing a decade) and I absolutely love this software. However, when you start looking at it as a "user" and not only as a developer, there are quite a few opportunities for usability improvement that meet the eye.
This article - apart from helping folks looking to un-install plugins in Eclipse - is also intended for the folks who design Eclipse: just a humble request to consider this as a usability improvement.
January 1, 2013
by Partha Bhattacharjee
· 16,194 Views
article thumbnail
How to Override Java Security Configuration per JVM Instance
Lately I encountered a configuration tweak I was not aware of. The problem: I had a single Java installation on a Linux machine from which I had to start two JVM instances, each using a different set of JCE providers. A reminder: the JVM loads its security configuration, including the JCE providers list, from a master security properties file within the JRE folder (JRE_HOME/lib/security/java.security). The location of that file is fixed in the JVM and cannot be modified. Going over the documentation (not very helpful, I must admit) and the code (more helpful, look for Security.java) revealed the secret.

security.overridePropertiesFile
It all starts within the default java.security file provided with the JVM. Looking at it, we will find the following (somewhere around the middle of the file):

    #
    # Determines whether this properties file can be appended to
    # or overridden on the command line via -Djava.security.properties
    #
    security.overridePropertiesFile=true

If overridePropertiesFile is not equal to true we can stop here - the rest of this article is irrelevant (unless we have the option to change it, but I didn't have that). Luckily for me, by default it is equal to true.

java.security.properties
The next step, the interesting one, is to override or append configuration to the default java.security file per JVM execution.
This is done by setting the 'java.security.properties' system property to point to a properties file as part of the JVM invocation. It is important to notice that the reference to the file can come in one of two flavors:
  • Overriding the entire file provided by the JVM - if the first character of the java.security.properties value is the equals sign, the default configuration file will be entirely ignored and only the values in the file we are pointing to will be effective.
  • Appending to and overriding values of the default file - any other first character in the property's value (that is, the first character of the alternate configuration file path) means that the alternate file will be loaded and appended to the default one. If the alternate file contains properties which are already in the default configuration file, the alternate file will override those properties.

Here are two examples:

    #
    # Completely override the default java.security file content
    # (notice the *two* equal signs)
    #
    java -Djava.security.properties==/etc/sysconfig/jvm1.java.security

    #
    # Append or override parts of the default java.security file
    # (notice the *single* equal sign)
    #
    java -Djava.security.properties=/etc/sysconfig/jvm.java.security

Be Careful
As important as this configuration option is, we must not forget its security implications. We should always make sure that no one can tamper with the value of the property and that no one who shouldn't be allowed to can tamper with the content of the alternate file.
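One way to check which configuration a given JVM instance actually picked up is to print the effective security properties and the registered providers from inside that JVM. The sketch below is an illustration, not part of the original setup: the class name SecurityConfigDump is hypothetical, while java.security.Security.getProperty and Security.getProviders are standard JDK calls. The output depends entirely on the JVM and on the properties file in use, so none is shown here.

```java
import java.security.Provider;
import java.security.Security;

// Hypothetical diagnostic class: prints the effective security configuration.
final class SecurityConfigDump {

    private SecurityConfigDump() {
    }

    /** Names of the registered security providers, one per line, in preference order. */
    static String describeProviders() {
        StringBuilder sb = new StringBuilder();
        for (Provider provider : Security.getProviders()) {
            sb.append(provider.getName()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The system property as passed on the command line (null if not set).
        System.out.println("java.security.properties = "
                + System.getProperty("java.security.properties"));
        // The first provider entry as seen after defaults and overrides were merged.
        System.out.println("security.provider.1 = "
                + Security.getProperty("security.provider.1"));
        System.out.print(describeProviders());
    }
}
```

Running this class once with each of the two command lines shown above should make the difference between appending to and replacing the default file visible in the provider list.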
December 3, 2012
by Eyal Lupu
· 74,008 Views
article thumbnail
Spring AOP in Security - Controlling Creation of UI Components via Aspects
The following post shows how, in one of the projects that I took part in, we used Spring's AOP to introduce some security-related functionality. The concept was that in order for the user to see some UI components, he needed to have a certain level of security privileges. If that requirement was not met, the UIComponent was not presented.

Apart from the Java sources, the project also contained the aopApplicationContext.xml file. Let's take a look at the most interesting lines of this Spring application context. First we have all the required schemas - I don't think that this needs to be explained in more depth. Then we have the element which enables the @AspectJ support (in Spring XML configuration this is aop:aspectj-autoproxy). Next, we turn on Spring configuration via annotations. Then we deliberately exclude aspects from being initialized as beans by Spring itself. Why? Because we want to create the aspect ourselves and provide factory-method="aspectOf". By doing so our aspect will be included in the autowiring process of our beans - thus all the fields annotated with the @Autowired annotation will get the beans injected. Now let's move on to the code:

UserServiceImpl.java

    package pl.grzejszczak.marcin.aop.service;

    import org.springframework.stereotype.Service;

    import pl.grzejszczak.marcin.aop.type.Role;
    import pl.grzejszczak.marcin.aop.user.UserHolder;

    @Service
    public class UserServiceImpl implements UserService {

        private UserHolder userHolder;

        @Override
        public UserHolder getCurrentUser() {
            return userHolder;
        }

        @Override
        public void setCurrentUser(UserHolder userHolder) {
            this.userHolder = userHolder;
        }

        @Override
        public Role getUserRole() {
            if (userHolder == null) {
                return null;
            }
            return userHolder.getUserRole();
        }
    }

The class UserServiceImpl imitates a service that would get the current user information from the database or from the current application context.
UserHolder.java

    package pl.grzejszczak.marcin.aop.user;

    import pl.grzejszczak.marcin.aop.type.Role;

    public class UserHolder {

        private Role userRole;

        public UserHolder(Role userRole) {
            this.userRole = userRole;
        }

        public Role getUserRole() {
            return userRole;
        }

        public void setUserRole(Role userRole) {
            this.userRole = userRole;
        }
    }

This is a simple holder class that holds information about the current user's Role.

Role.java

    package pl.grzejszczak.marcin.aop.type;

    public enum Role {
        ADMIN("ADM"), WRITER("WRT"), GUEST("GST");

        private String name;

        private Role(String name) {
            this.name = name;
        }

        public static Role getRoleByName(String name) {
            for (Role role : Role.values()) {
                if (role.name.equals(name)) {
                    return role;
                }
            }
            throw new IllegalArgumentException("No such role exists [" + name + "]");
        }

        public String getName() {
            return this.name;
        }

        @Override
        public String toString() {
            return name;
        }
    }

Role is an enum that defines the role of a person: Admin, Writer or Guest.

UIComponent.java

    package pl.grzejszczak.marcin.aop.ui;

    public abstract class UIComponent {

        protected String componentName;

        protected String getComponentName() {
            return componentName;
        }
    }

An abstraction over concrete implementations of some UI components.

SomeComponentForAdminAndGuest.java

    package pl.grzejszczak.marcin.aop.ui;

    import pl.grzejszczak.marcin.aop.annotation.SecurityAnnotation;
    import pl.grzejszczak.marcin.aop.type.Role;

    @SecurityAnnotation(allowedRole = { Role.ADMIN, Role.GUEST })
    public class SomeComponentForAdminAndGuest extends UIComponent {

        public SomeComponentForAdminAndGuest() {
            this.componentName = "SomeComponentForAdmin";
        }

        public static UIComponent getComponent() {
            return new SomeComponentForAdminAndGuest();
        }
    }

This component is an example of a UI component extension that can be seen only by users who have the role of Admin or Guest.
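The role-restricted component above only works if the annotation on the class can be read at run time. As a minimal, Spring-free sketch of that mechanism, the example below declares a runtime-retained annotation and checks it through reflection. All names here (DemoRole, AllowedRoles, AdminAndGuestPanel, OpenPanel, RoleCheckSketch) are hypothetical stand-ins chosen for this illustration, not the article's classes, and treating an unannotated class as unrestricted is an assumption.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Arrays;

// Hypothetical role enum for this sketch.
enum DemoRole { ADMIN, WRITER, GUEST }

// RUNTIME retention is what makes the annotation visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@interface AllowedRoles {
    DemoRole[] value();
}

@AllowedRoles({ DemoRole.ADMIN, DemoRole.GUEST })
class AdminAndGuestPanel {
}

// No annotation: treated as unrestricted in this sketch.
class OpenPanel {
}

final class RoleCheckSketch {

    private RoleCheckSketch() {
    }

    /** Returns true if a user with the given role may have the component created. */
    static boolean mayCreate(Class<?> componentClass, DemoRole userRole) {
        AllowedRoles rule = componentClass.getAnnotation(AllowedRoles.class);
        if (rule == null) {
            return true; // assumption: unannotated components are open to everyone
        }
        return Arrays.asList(rule.value()).contains(userRole);
    }
}
```

An aspect around a factory method can call a check like mayCreate() with the class passed to the factory and return null instead of proceeding when the check fails.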
SecurityAnnotation.java package pl.grzejszczak.marcin.aop.annotation; import java.lang.annotation.Retention; import java.lang.annotation.RetentionPolicy; import pl.grzejszczak.marcin.aop.type.Role; @Retention(RetentionPolicy.RUNTIME) public @interface SecurityAnnotation { Role[] allowedRole(); } Annotation that defines a roles that can have this component created. UIFactoryImpl.java package pl.grzejszczak.marcin.aop.ui; import org.apache.commons.lang.NullArgumentException; import org.springframework.stereotype.Component; @Component public class UIFactoryImpl implements UIFactory { @Override public UIComponent createComponent(Class componentClass) throws Exception { if (componentClass == null) { throw new NullArgumentException("Provide class for the component"); } return (UIComponent) Class.forName(componentClass.getName()).newInstance(); } } A factory class that given the class of an object that extends UIComponent returns a new instance of the given UIComponent. SecurityInterceptor.java package pl.grzejszczak.marcin.aop.interceptor; import java.lang.annotation.Annotation; import java.lang.reflect.AnnotatedElement; import java.util.Arrays; import java.util.List; import org.aspectj.lang.ProceedingJoinPoint; import org.aspectj.lang.annotation.Around; import org.aspectj.lang.annotation.Aspect; import org.aspectj.lang.annotation.Pointcut; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.beans.factory.annotation.Autowired; import pl.grzejszczak.marcin.aop.annotation.SecurityAnnotation; import pl.grzejszczak.marcin.aop.service.UserService; import pl.grzejszczak.marcin.aop.type.Role; import pl.grzejszczak.marcin.aop.ui.UIComponent; @Aspect public class SecurityInterceptor { private static final Logger LOGGER = LoggerFactory.getLogger(SecurityInterceptor.class); public SecurityInterceptor() { LOGGER.debug("Security Interceptor created"); } @Autowired private UserService userService; 
    @Pointcut("execution(pl.grzejszczak.marcin.aop.ui.UIComponent pl.grzejszczak.marcin.aop.ui.UIFactory.createComponent(..))")
    private void getComponent(ProceedingJoinPoint thisJoinPoint) {
    }

    @Around("getComponent(thisJoinPoint)")
    public UIComponent checkSecurity(ProceedingJoinPoint thisJoinPoint) throws Throwable {
        LOGGER.info("Intercepting creation of a component");
        Object[] arguments = thisJoinPoint.getArgs();
        if (arguments.length == 0) {
            return null;
        }
        Annotation annotation = checkTheAnnotation(arguments);
        boolean securityAnnotationPresent = (annotation != null);
        if (securityAnnotationPresent) {
            boolean userHasRole = verifyRole(annotation);
            if (!userHasRole) {
                LOGGER.info("Current user doesn't have permission to have this component created");
                return null;
            }
        }
        LOGGER.info("Current user has required permissions for creating a component");
        return (UIComponent) thisJoinPoint.proceed();
    }

    /**
     * Based on the method's argument, check if the class is annotated with
     * {@link SecurityAnnotation}
     *
     * @param arguments
     * @return
     */
    private Annotation checkTheAnnotation(Object[] arguments) {
        Object concreteClass = arguments[0];
        LOGGER.info("Argument's class - [{}]", new Object[] { arguments });
        AnnotatedElement annotatedElement = (AnnotatedElement) concreteClass;
        Annotation annotation = annotatedElement.getAnnotation(SecurityAnnotation.class);
        LOGGER.info("Annotation present - [{}]", new Object[] { annotation });
        return annotation;
    }

    /**
     * Verifies whether the current user has sufficient privileges to
     * have the component built
     *
     * @param annotation
     * @return
     */
    private boolean verifyRole(Annotation annotation) {
        LOGGER.info("Security annotation is present so checking if the user can use it");
        SecurityAnnotation annotationRule = (SecurityAnnotation) annotation;
        List<Role> requiredRolesList = Arrays.asList(annotationRule.allowedRole());
        Role userRole = userService.getUserRole();
        return requiredRolesList.contains(userRole);
    }
}

This aspect is applied at a pointcut that matches execution of the createComponent method of the UIFactory interface. The Around advice first checks what argument has been passed to createComponent (for example SomeComponentForAdminAndGuest.class). Next it checks whether this class is annotated with SecurityAnnotation and, if so, which Roles are required to have the component created. Then it checks whether the current user (looked up through UserService from the Role held by UserHolder) has one of the required roles. If that is the case, thisJoinPoint.proceed() is called, which in effect returns an object of the class that extends UIComponent. Otherwise the advice returns null and the component is never created.

Now let's test it - here comes the SpringJUnit4ClassRunner

AopTest.java

package pl.grzejszczak.marcin.aop;

import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

import pl.grzejszczak.marcin.aop.service.UserService;
import pl.grzejszczak.marcin.aop.type.Role;
import pl.grzejszczak.marcin.aop.ui.SomeComponentForAdmin;
import pl.grzejszczak.marcin.aop.ui.SomeComponentForAdminAndGuest;
import pl.grzejszczak.marcin.aop.ui.SomeComponentForGuest;
import pl.grzejszczak.marcin.aop.ui.SomeComponentForWriter;
import pl.grzejszczak.marcin.aop.ui.UIFactory;
import pl.grzejszczak.marcin.aop.user.UserHolder;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = { "classpath:aopApplicationContext.xml" })
public class AopTest {

    @Autowired
    private UIFactory uiFactory;

    @Autowired
    private UserService userService;

    @Test
    public void adminTest() throws Exception {
        userService.setCurrentUser(new UserHolder(Role.ADMIN));
        Assert.assertNotNull(uiFactory.createComponent(SomeComponentForAdmin.class));
        Assert.assertNotNull(uiFactory.createComponent(SomeComponentForAdminAndGuest.class));
        Assert.assertNull(uiFactory.createComponent(SomeComponentForGuest.class));
        Assert.assertNull(uiFactory.createComponent(SomeComponentForWriter.class));
    }
}

And the logs:

pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:26 Security Interceptor created
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:38 Intercepting creation of a component
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:48 Argument's class - [[class pl.grzejszczak.marcin.aop.ui.SomeComponentForAdmin]]
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:54 Annotation present - [@pl.grzejszczak.marcin.aop.annotation.SecurityAnnotation(allowedRole=[ADM])]
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:57 Security annotation is present so checking if the user can use it
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:70 Current user has required permissions for creating a component
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:38 Intercepting creation of a component
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:48 Argument's class - [[class pl.grzejszczak.marcin.aop.ui.SomeComponentForAdminAndGuest]]
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:54 Annotation present - [@pl.grzejszczak.marcin.aop.annotation.SecurityAnnotation(allowedRole=[ADM, GST])]
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:57 Security annotation is present so checking if the user can use it
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:70 Current user has required permissions for creating a component
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:38 Intercepting creation of a component
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:48 Argument's class - [[class pl.grzejszczak.marcin.aop.ui.SomeComponentForGuest]]
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:54 Annotation present - [@pl.grzejszczak.marcin.aop.annotation.SecurityAnnotation(allowedRole=[GST])]
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:57 Security annotation is present so checking if the user can use it
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:66 Current user doesn't have permission to have this component created
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:38 Intercepting creation of a component
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:48 Argument's class - [[class pl.grzejszczak.marcin.aop.ui.SomeComponentForWriter]]
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:54 Annotation present - [@pl.grzejszczak.marcin.aop.annotation.SecurityAnnotation(allowedRole=[WRT])]
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:57 Security annotation is present so checking if the user can use it
pl.grzejszczak.marcin.aop.interceptor.SecurityInterceptor:66 Current user doesn't have permission to have this component created

The unit test shows that for the Admin role only the first two components get created, whereas nulls are returned for the other two (because the user doesn't have the required rights). That is how we used Spring's AOP in our project to create a simple framework that checks whether the user can have a given component created. Thanks to this, once the aspects are in place nobody has to remember to write security-related code for each component, since the interceptor does it for them. If you have any suggestions related to this post, please feel free to comment :)
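The core of the check that the interceptor performs can be tried without Spring or AspectJ. The following is only a minimal sketch under that assumption: the names RoleCheckSketch, Secured, AdminAndGuestPanel and OpenPanel are hypothetical stand-ins and not part of the original project. It shows the same two steps, reading a runtime-retained annotation from the class and comparing its allowed roles with the user's role.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Arrays;

// Hypothetical, Spring-free sketch of the role check done inside the advice.
class RoleCheckSketch {

    enum Role { ADMIN, WRITER, GUEST }

    // RUNTIME retention is required, otherwise reflection cannot see the annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Secured {
        Role[] allowedRole();
    }

    @Secured(allowedRole = { Role.ADMIN, Role.GUEST })
    static class AdminAndGuestPanel { }

    // No annotation: mirrors the article's behavior of letting creation proceed.
    static class OpenPanel { }

    /** Returns true when the class is not annotated or lists the user's role. */
    static boolean isAllowed(Class<?> componentClass, Role userRole) {
        Secured rule = componentClass.getAnnotation(Secured.class);
        if (rule == null) {
            return true;
        }
        return Arrays.asList(rule.allowedRole()).contains(userRole);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(AdminAndGuestPanel.class, Role.ADMIN));  // true
        System.out.println(isAllowed(AdminAndGuestPanel.class, Role.WRITER)); // false
        System.out.println(isAllowed(OpenPanel.class, Role.WRITER));          // true
    }
}
```

In the article's setup the same decision is taken inside the Around advice, which returns null instead of calling proceed() when the check fails.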
November 7, 2012
by Marcin Grzejszczak
· 25,724 Views · 2 Likes
Gradle Goodness: Exclude Transitive Dependency from All Configurations
We can easily exclude transitive dependencies from specific configurations. To exclude them from all configurations, we can use Groovy's spread-dot operator and invoke the exclude() method on each configuration. We can pass the group, the module, or both as arguments to the exclude() method. The following part of a build file shows how we can exclude a dependency from all configurations:

...
configurations {
    all*.exclude group: 'xml-apis', module: 'xmlParserAPIs'
}

// Equivalent to:
configurations {
    all.collect { configuration ->
        configuration.exclude group: 'xml-apis', module: 'xmlParserAPIs'
    }
}
...
November 1, 2012
by Hubert Klein Ikkink
· 18,300 Views
What's up with the JUnit and Hamcrest Dependencies?
It's great that JUnit recognizes the usefulness of Hamcrest, because I use the two together a lot. However, I find the way JUnit packages its dependencies odd, and it can cause class loading problems if you are not careful. Let's take a closer look. If you look at junit:junit:4.10 on Maven Central, you will see that it has this dependency graph:

+- junit:junit:jar:4.10:test
|  \- org.hamcrest:hamcrest-core:jar:1.1:test

This is fine, except that the contents of hamcrest-core-1.1.jar are also embedded inside junit-4.10.jar! But why? I suppose it's a convenience for folks who use Ant, since it saves them one jar in their lib folder, but it's not very Maven friendly. You can also expect class loading trouble if you want to upgrade Hamcrest or use extra Hamcrest modules. If you have used Hamcrest long enough, you know that most of its goodies are in a second module named hamcrest-library, which JUnit did not package. JUnit did, however, choose to include some JUnit+Hamcrest extensions of its own. Duplicated classes across jars are a real source of trouble, so JUnit provides a separate module, junit-dep, that doesn't embed the Hamcrest core classes and helps you avoid this issue. So if you have a Maven project, you should use this instead:

<dependency>
  <groupId>junit</groupId>
  <artifactId>junit-dep</artifactId>
  <version>4.10</version>
  <scope>test</scope>
  <exclusions>
    <exclusion>
      <groupId>org.hamcrest</groupId>
      <artifactId>hamcrest-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.hamcrest</groupId>
  <artifactId>hamcrest-library</artifactId>
  <version>1.2.1</version>
  <scope>test</scope>
</dependency>

Notice how I have to exclude hamcrest-core from junit-dep. This is needed if you want a hamcrest-library version higher than the Hamcrest version JUnit comes with, which is 1.1. Interestingly, the order of dependencies in the POM matters when Maven automatically resolves conflicting versions: it just picks the first one found and ignores the rest. So you can shorten the above and drop the exclusion if, and only if, you place Hamcrest before JUnit, like this:

<dependency>
  <groupId>org.hamcrest</groupId>
  <artifactId>hamcrest-library</artifactId>
  <version>1.2.1</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit-dep</artifactId>
  <version>4.10</version>
  <scope>test</scope>
</dependency>

This should make Maven use the following dependencies:

+- org.hamcrest:hamcrest-library:jar:1.2.1:test
|  \- org.hamcrest:hamcrest-core:jar:1.2.1:test
+- junit:junit-dep:jar:4.10:test

However, I think using the exclusion tag gives you a more stable build that does not rely on Maven's implicit ordering rule, and it avoids an easy mistake for Maven beginners. I still wish JUnit would do a better job at packaging and remove the duplicated classes from its jar. I personally think it would be more useful for JUnit to depend on hamcrest-library as well, instead of just the hamcrest-core jar. What do you think?
October 17, 2012
by Zemian Deng
· 36,031 Views
Resolve Circular Dependency in Spring Autowiring
I would consider this post as best practice for using Spring in enterprise application development.
September 27, 2012
by Gal Levinsky
· 127,438 Views · 24 Likes
Fixing Common Java Security Code Violations in Sonar
This article aims to show you how to quickly fix the most common Java security code violations. It assumes that you are familiar with the concept of code rules and violations and how Sonar reports on them. However, if you haven't heard these terms before, you might take a look at Sonar Concepts or the forthcoming book about Sonar for a more detailed explanation. In short, during Sonar analysis your project is scanned by many tools to ensure that the source code conforms to the rules you've defined in your quality profile. Whenever a rule is violated, a violation is raised. With Sonar you can track these violations in the violations drilldown view or in the source code editor. There are hundreds of rules, categorized by importance. I'll try to cover as many as I can in future posts, but for now let's take a look at some common security rules and violations. We are going to examine two pairs of rules (all of them are ranked as critical in Sonar).

1. Array is Stored Directly (PMD) and Method returns internal array (PMD)

These violations appear when an internal array is stored or returned directly from a method. The following example illustrates a simple class that violates these rules.

public class CalendarYear {
    private String[] months;

    public String[] getMonths() {
        return months;
    }

    public void setMonths(String[] months) {
        this.months = months;
    }
}

To eliminate them you have to clone the array before storing or returning it, as shown in the following class implementation, so that no one can modify or get hold of your class's original data, only a copy of it.

public class CalendarYear {
    private String[] months;

    public String[] getMonths() {
        return months.clone();
    }

    public void setMonths(String[] months) {
        this.months = months.clone();
    }
}

2. Nonconstant string passed to execute method on an SQL statement (findbugs) and A prepared statement is generated from a nonconstant String (findbugs)

Both rules are related to database access through JDBC. Generally there are two ways to execute SQL commands via a JDBC connection: Statement and PreparedStatement. There is a lot of discussion about their pros and cons, but it's out of the scope of this post. Let's see how the first violation is raised by the following source code snippet.

Statement stmt = conn.createStatement();
String sqlCommand = "Select * FROM customers WHERE name = '" + custName + "'";
stmt.execute(sqlCommand);

You've already noticed that the sqlCommand parameter passed to the execute method is created dynamically at run time, which this rule does not accept. A similar situation causes the second violation.

String sqlCommand = "insert into customers (id, name) values (?, ?)";
PreparedStatement stmt = conn.prepareStatement(sqlCommand);

You can overcome these problems in three different ways. You can use either StringBuilder or the String.format method to build the string values. If applicable, you can define the SQL commands as constants in the class declaration, but this only works when the SQL command does not need to change at run time. Note that StringBuilder and String.format only change how the string is assembled; binding custName as a PreparedStatement parameter is what actually keeps user input out of the SQL text. Let's rewrite the first code snippet using StringBuilder

Statement stmt = conn.createStatement();
stmt.execute(new StringBuilder("Select * FROM customers WHERE name = '")
    .append(custName)
    .append("'").toString());

and using String.format

Statement stmt = conn.createStatement();
String sqlCommand = String.format("Select * from customers where name = '%s'", custName);
stmt.execute(sqlCommand);

For the second example you can just declare the SQL command as follows

private static final String SQLCOMMAND = "insert into customers (id, name) values (?, ?)";

There are more security rules, such as the blocker Hardcoded constant database password, but I assume that nobody still hardcodes passwords in source code files. In following articles I'm going to show you how to adhere to performance and bad practice rules. Until then, I'm waiting for your comments or suggestions.
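To see what cloning the array in rule 1 buys you, here is a small sketch built around the article's CalendarYear class. The wrapper class DefensiveCopyDemo, the empty-array initializer and the sample month names are assumptions added for illustration. It mutates both the array passed to the setter and the array returned by the getter, then shows that the internal state is unchanged.

```java
import java.util.Arrays;

// Hypothetical demo wrapper; CalendarYear follows the article's fixed version.
class DefensiveCopyDemo {

    static class CalendarYear {
        // Initialized to an empty array (an assumption) so getMonths() never hits null.
        private String[] months = new String[0];

        String[] getMonths() {
            return months.clone();       // callers receive a copy
        }

        void setMonths(String[] months) {
            this.months = months.clone(); // we keep our own copy
        }
    }

    public static void main(String[] args) {
        String[] input = { "Jan", "Feb" };
        CalendarYear year = new CalendarYear();
        year.setMonths(input);

        input[0] = "changed";             // caller mutates its own array
        year.getMonths()[1] = "changed";  // caller mutates the returned copy

        System.out.println(Arrays.toString(year.getMonths())); // [Jan, Feb]
    }
}
```

With the original getter and setter, both mutations would have been visible inside CalendarYear, which is exactly what the two PMD rules warn about.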
September 26, 2012
by Patroklos Papapetrou
· 27,098 Views
Building a Simple TCP Proxy Server with node.js
Today we're going to build a simple TCP proxy server. The scenario: we've got one host (the client) that establishes a TCP connection to another one (the remote).

client --> remote

We want to set up a proxy server in the middle, so that the client establishes the connection with the proxy and the proxy forwards it to the remote, relaying the remote's response back as well. With node.js it is really simple to perform this kind of network operation.

client --> proxy --> remote

var net = require('net');

var LOCAL_PORT = 6512;
var REMOTE_PORT = 6512;
var REMOTE_ADDR = "192.168.1.25";

var server = net.createServer(function (socket) {
    socket.on('data', function (msg) {
        console.log(' ** START **');
        console.log('<< From client to proxy ', msg.toString());
        var serviceSocket = new net.Socket();
        serviceSocket.connect(parseInt(REMOTE_PORT), REMOTE_ADDR, function () {
            console.log('>> From proxy to remote', msg.toString());
            serviceSocket.write(msg);
        });
        serviceSocket.on("data", function (data) {
            console.log('<< From remote to proxy', data.toString());
            socket.write(data);
            console.log('>> From proxy to client', data.toString());
        });
    });
});

server.listen(LOCAL_PORT);
console.log("TCP server accepting connection on port: " + LOCAL_PORT);

Simple, isn't it? The source code is on GitHub.
September 20, 2012
by Gonzalo Ayuso
· 23,606 Views
The Difference Between 'Hadoop DFS' and 'Hadoop FS'
While exploring HDFS, I came across these two syntaxes for querying HDFS:

> hadoop dfs
> hadoop fs

Initially I couldn't differentiate between the two, and kept wondering why we have two different syntaxes for a common purpose. I found a number of people online with the same question; their thoughts are below.

Per Chris's explanation, it seems like there's no difference between the two syntaxes if we look at the definitions of the two commands (hadoop fs and hadoop dfs) in $HADOOP_HOME/bin/hadoop:

...
elif [ "$COMMAND" = "datanode" ] ; then
  CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_DATANODE_OPTS"
elif [ "$COMMAND" = "fs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfs" ] ; then
  CLASS=org.apache.hadoop.fs.FsShell
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfsadmin" ] ; then
  CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
...

That was his reasoning. Unconvinced, I kept looking for a more persuasive answer, and these excerpts made more sense to me:

FS relates to a generic file system which can point to any file system, such as local, HDFS etc., whereas dfs is very specific to HDFS. So when we use FS, it can perform operations from/to the local or the Hadoop distributed file system, while a DFS operation always relates to HDFS.

Below are two excerpts from the Hadoop documentation that describe these two as different shells.

FS Shell

The FileSystem (FS) shell is invoked by bin/hadoop fs. All the FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost). Most of the commands in FS shell behave like corresponding Unix commands.

DFShell

The HDFS shell is invoked by bin/hadoop dfs. All the HDFS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenode:namenodeport/parent/child or simply as /parent/child (given that your configuration is set to point to namenode:namenodeport). Most of the commands in HDFS shell behave like corresponding Unix commands.

So, based on the above, we can conclude that it all depends on the scheme configuration. When these two commands are given an absolute URI (i.e. scheme://a/b), their behavior should be identical. Any difference in behavior comes only from the default scheme each one assumes when none is given: file for fs and hdfs for dfs.
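The scheme://authority/path rule quoted above can be illustrated with plain java.net.URI. This is only a sketch of the resolution idea, not Hadoop's own code: the class FsUriSketch and the helper qualify are hypothetical names, and the default filesystem URI is passed in by hand instead of being read from the Hadoop configuration.

```java
import java.net.URI;

// Hypothetical sketch: how a path argument could be qualified against a
// default filesystem URI. Uses java.net.URI only, not Hadoop classes.
class FsUriSketch {

    /** Keeps arguments that already carry a scheme, otherwise applies the default. */
    static URI qualify(String arg, URI defaultFs) {
        URI uri = URI.create(arg);
        return uri.getScheme() != null ? uri : defaultFs.resolve(uri);
    }

    public static void main(String[] args) {
        URI hdfsDefault = URI.create("hdfs://namenodehost/");

        // No scheme given: scheme and authority come from the default.
        System.out.println(qualify("/parent/child", hdfsDefault));
        // prints hdfs://namenodehost/parent/child

        // Explicit scheme: the default is ignored.
        URI local = qualify("file:///tmp/data", hdfsDefault);
        System.out.println(local.getScheme() + " " + local.getPath());
        // prints file /tmp/data
    }
}
```

Under this reading, two commands that differ only in their default would behave identically whenever the argument is a fully qualified URI.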
September 14, 2012
by Abhishek Jain
· 45,338 Views
Advanced Dependency Injection With Guice
The more I use dependency injection (DI) in my code, the more it alters the way I see both my design and implementation. Injection is so convenient and powerful that you end up wanting to make sure you use it as often as you can. And as it turns out, you can use it in many, many places.

Let's briefly cover the most obvious scenarios where DI, and more specifically Guice, is a good fit: objects created either at class loading time or very early in your application. These two cases are covered by either direct injection or by providers, which allow you to start building some of your object graph before you can inject more objects. I won't go into too much detail about these two use cases since they are explained in pretty much any Guice tutorial you can find on the net.

Once the injector has created your graph of objects, you are pretty much back to normal, instantiating your "runtime objects" (the objects you create during the lifetime of your application) the usual way, most likely with "new" or factories. However, you will quickly notice that while you need some runtime information to create these objects, other parts of them could be injected. Let's take the following example: we have a GeoService interface that provides various geolocation functions, such as telling you if two addresses are close to each other:

public interface GeoService {
    /**
     * @return true if the two addresses are within the given number of
     * miles of each other.
     */
    boolean isNear(Address address1, Address address2, int miles);
}

Then you have a Person class which uses this service and also needs a name and an address to be instantiated:

public class Person {
    // Fields omitted

    public Person(String name, Address address, GeoService gs) {
        this.name = name;
        this.address = address;
        this.geoService = gs;
    }

    public boolean livesNear(Person otherPerson) {
        return geoService.isNear(address, otherPerson.getAddress(), 2 /* miles */);
    }
}

Something odd should jump out at you right away with this class: while name and address are part of the identity of a Person object, the presence of the GeoService instance in it feels wrong. The service is a singleton that is created on start up, so it is a perfect candidate for injection, but how can I create a Person object when some of its information is supplied by Guice and the rest by myself? Guice gives you a very elegant and flexible way to implement this scenario with "assisted injection". The first step is to define a factory for our objects that represents exactly how we want to create them:

public interface PersonFactory {
    Person create(String name, Address address);
}

Since only name and address participate in the identity of our Person objects, these are the only parameters we need to construct our objects. The other parameters should be supplied by Guice, so we modify our Person constructor to let Guice know:

@Inject
public Person(@Assisted String name, @Assisted Address address, GeoService geoService) {
    this.name = name;
    this.address = address;
    this.geoService = geoService;
}

In this code, I have added an @Inject annotation on the constructor and an @Assisted annotation on each parameter that I will be providing. Guice will take care of injecting the rest. Finally, we connect the factory to its objects when creating the module:

Module module1 = new FactoryModuleBuilder()
    .build(PersonFactory.class);

The important part here is to realize that we will never instantiate PersonFactory: Guice will. From now on, all we need to do whenever we want to instantiate a Person object is to ask Guice to hand us a factory:

@Inject
private PersonFactory personFactory;

// ...
Person p = personFactory.create("Bob", new Address("1 Ocean st"));

If you want to find out more, take a look at the main documentation for assisted injection, which explains how to support overloaded constructors and also how to create different kinds of objects within the same factory.

Wrapping up

Let's take a look at what we did. First, we started with a suspicious looking constructor:

public Person(String name, Address address, GeoService s) {

This constructor is suspicious because it accepts parameters that do not participate in the identity of the object (you won't use the GeoService parameter when calculating the hash code of a Person object). We replaced this constructor with a factory that only accepts identity fields:

public interface PersonFactory {
    Person create(String name, Address address);
}

and we let Guice's assisted injection take care of creating a fully formed object for us. This observation leads us to the Identity Constructor rule: if a constructor accepts parameters that are not used to define the identity of the objects, consider injecting these parameters. Once you start looking at your objects with this rule in mind, you will be surprised to find out how many of them can benefit from assisted injection.
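To make the division of labor concrete, here is a hand-written equivalent of what the factory achieves, with no Guice involved. It is only a sketch under simplifying assumptions: Address is reduced to a plain String, the class AssistedInjectionSketch and the helper factoryFor are hypothetical, and Guice's generated factory is certainly implemented differently. The point is that the caller supplies the identity fields while the factory closes over the service.

```java
// Hypothetical, Guice-free sketch of the idea behind assisted injection.
class AssistedInjectionSketch {

    interface GeoService {
        boolean isNear(String address1, String address2, int miles);
    }

    static class Person {
        private final String name;
        private final String address;        // simplified: the article uses an Address type
        private final GeoService geoService; // not part of the person's identity

        Person(String name, String address, GeoService geoService) {
            this.name = name;
            this.address = address;
            this.geoService = geoService;
        }

        boolean livesNear(Person other) {
            return geoService.isNear(address, other.address, 2 /* miles */);
        }
    }

    // Callers see only the identity fields.
    interface PersonFactory {
        Person create(String name, String address);
    }

    // Stand-in for the container: captures the service once, hands out a factory.
    static PersonFactory factoryFor(GeoService geoService) {
        return (name, address) -> new Person(name, address, geoService);
    }

    public static void main(String[] args) {
        // Toy service: two addresses are "near" only when they are equal.
        GeoService sameAddressOnly = (a1, a2, miles) -> a1.equals(a2);
        PersonFactory factory = factoryFor(sameAddressOnly);

        Person bob = factory.create("Bob", "1 Ocean st");
        Person alice = factory.create("Alice", "1 Ocean st");
        Person carol = factory.create("Carol", "9 Hill rd");

        System.out.println(bob.livesNear(alice)); // true
        System.out.println(bob.livesNear(carol)); // false
    }
}
```

With assisted injection you declare only the PersonFactory interface and the annotations, and the container plays the role of factoryFor.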
August 23, 2012
by Cedric Beust
· 36,592 Views · 2 Likes