IoT Resources

The Latest IoT Topics

Internet of Things (IoT) Reference Architecture

To converge Internet of Things devices with corporate IT solutions, teams require a reference architecture for the Internet of Things (IoT). The reference architecture must include devices, server-side capabilities, and the cloud architecture required to interact with and manage the devices. A reference architecture should provide architects and developers of IoT projects with an effective starting point that addresses major IoT project and system requirements.

A high-level IoT reference architecture may include the following layers (see Figure 1):

- External communications: web/portal, dashboard, APIs
- Event processing and analytics (including data storage)
- Aggregation / bus layer: ESB and message broker
- Device communications
- Devices

Cross-cutting layers include:

- Device and application management
- Identity and access management

A more detailed architecture component description can be found in the IoT reference architecture white paper.

June 18, 2014

by Chris Haddad

New report looks at the role of Chambers of Commerce

The business world is increasingly complex. The past few IBM CEO surveys have highlighted the growing importance of being able to manage this complexity, and of doing so in a collaborative way. This shifting zeitgeist was reflected in a series of seminars hosted by Xincus, and prompted the launch of a two-phase study into how Chambers of Commerce can evolve within this new landscape.

The study, consisting of in-depth one-on-one interviews and a nationwide online survey, aimed to better understand both how Chambers can adapt and what changes would be required to do so. The findings from this research are now available in a new paper called Chamber 2.0: Digital – Connected – Global. The paper outlines both the main challenges currently facing Chambers and the steps they can take to thrive in such an environment.

Among the main findings of the research was a fundamental desire to change and modernize, with a strategic positioning and business model that would allow Chambers to flourish. There was also a strong desire to work more effectively with partners, both inside and outside of the Chamber network, sharing both resources and insights.

The report concludes with a road map that combines these findings from within the network with best practice from the wider business world. The road map consists of five broad stages, each containing more detailed steps Chambers can take to prepare for the modern world:

1. Become a one-stop shop for members, including positioning the Chamber brand for the modern world as centers for Business, Innovation, and Economic Development, with a new and modernized approach to business that sees an adaptive and responsive leadership style as essential to a revitalized business model.
2. Offer new value, with a new emphasis on virtual services to reflect modern ways of working. Chambers will become a solution hub that connects and matches members, with co-working spaces connecting the physical and virtual worlds.
3. Collaborate beyond borders, by building an extensive Chamber alliance network that allows Chambers to become specialized regional hubs while tapping into the collective wisdom of the entire network, as well as offering "health-club" type e-memberships to professionals, academics, entrepreneurs, and "free agent" millennials alike.
4. Nurture new economic development, by facilitating entrepreneurial collaboration between members and stakeholders, connecting the right people with the right resources, and helping to forge an innovation economy, a thriving business community, and jobs.
5. Foster global innovation ecosystems, by tying all of these communities together to form a hyperconnected ecosystem with Chambers at its heart, thus empowering the next wave of new economic development around the world.

The report makes clear that whilst change is desired, the network remains positive that the right developments will occur. With Chambers striving to maintain their position at the heart of the business community, this report will go some way towards helping them achieve that goal.

You can get your copy of the report here.

May 22, 2014

by Adi Gaskell

ActiveMQ - Network of Brokers Explained

Objective

This seven-part blog series shows how to create a network of ActiveMQ brokers in order to achieve high availability and scalability.

Why network of brokers?

The ActiveMQ message broker is a core component of the messaging infrastructure in an enterprise. It needs to be highly available and dynamically scalable to facilitate communication between dynamic, heterogeneous, distributed applications that have varying capacity needs.

Scaling enterprise applications on commodity hardware is all the rage nowadays. ActiveMQ caters to that very well by being able to create a network of brokers to share the load.

Applications running across geographically distributed data centers often need to coordinate messages. Running message producers and consumers across geographic regions or data centers can be architected better using a network of brokers.

ActiveMQ uses transport connectors to communicate with message producers and consumers. To facilitate broker-to-broker communication, however, ActiveMQ uses network connectors.

A network connector is a bridge between two brokers that allows on-demand message forwarding. In other words, if Broker B1 initiates a network connector to Broker B2, then the messages on a channel (queue/topic) on B1 get forwarded to B2 if there is at least one consumer on B2 for the same channel. If the network connector is configured to be duplex, messages also get forwarded from B2 to B1 on demand.

This is very interesting because it is now possible for brokers to communicate with each other dynamically.
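The on-demand forwarding rule described above can be illustrated with a toy model. The sketch below is plain Python and does not use ActiveMQ at all; the `Broker` and `NetworkConnector` classes and their methods are hypothetical names invented for illustration. It makes one simplifying assumption that is stated in the comments: forwarding happens whenever the remote side has a consumer, ignoring how a real broker balances local and networked consumers.

```python
from collections import defaultdict, deque


class Broker:
    """Toy stand-in for a broker (hypothetical, not the ActiveMQ API)."""

    def __init__(self, name):
        self.name = name
        self.queues = defaultdict(deque)    # destination -> pending messages
        self.consumers = defaultdict(int)   # destination -> local consumer count
        self.delivered = defaultdict(list)  # destination -> messages consumed here

    def send(self, destination, message):
        self.queues[destination].append(message)

    def add_consumer(self, destination):
        self.consumers[destination] += 1

    def dispatch_local(self):
        """Hand pending messages to local consumers, if there are any."""
        for destination, pending in self.queues.items():
            if self.consumers[destination] > 0:
                while pending:
                    self.delivered[destination].append(pending.popleft())


class NetworkConnector:
    """Bridge initiated by `source` towards `target`; optionally duplex."""

    def __init__(self, source, target, duplex=False):
        self.source = source
        self.target = target
        self.duplex = duplex

    def forward(self):
        self._forward(self.source, self.target)
        if self.duplex:
            self._forward(self.target, self.source)

    @staticmethod
    def _forward(src, dst):
        # Simplification: forward only when the remote broker has demand,
        # i.e. at least one consumer on the same destination name.
        for destination, pending in src.queues.items():
            if dst.consumers[destination] > 0:
                while pending:
                    dst.send(destination, pending.popleft())


if __name__ == "__main__":
    b1, b2 = Broker("broker-1"), Broker("broker-2")
    bridge = NetworkConnector(b1, b2)
    for i in range(100):
        b1.send("foo.bar", f"message-{i}")
    bridge.forward()                      # no consumer on broker-2 yet
    print(len(b1.queues["foo.bar"]))      # messages still wait on broker-1
    b2.add_consumer("foo.bar")
    bridge.forward()
    b2.dispatch_local()
    print(len(b2.delivered["foo.bar"]))   # now consumed on broker-2
```

The destination name is the only thing that links the two sides, which is why the name used by the consumer has to match the producer's exactly.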
In this seven-part blog series, we will look into the following topics to gain an understanding of this very powerful ActiveMQ feature:

- Network Connector Basics - Part 1
- Duplex network connectors - Part 2
- Load balancing consumers on local/remote brokers - Part 3
- Load-balance consumers/subscribers on remote brokers
  - Queue: Load balance remote concurrent consumers - Part 4
  - Topic: Load Balance Durable Subscriptions on Remote Brokers - Part 5
- Store/Forward messages and consumer failover - Part 6
  - How to prevent stuck messages
- Virtual Destinations - Part 7

To give credit where it is due, the following resources have helped me in creating this blog post series:

- Advanced Messaging with ActiveMQ by Dejan Bosanac [Slides 32-36]
- Understanding ActiveMQ Broker Networks by Jakub Korab

Prerequisites

- ActiveMQ 5.8.0: to create broker instances
- Apache Ant: to run the ActiveMQ sample producer and consumers for the demo

We will use multiple ActiveMQ broker instances on the same machine for ease of demonstration.

Network Connector Basics - Part 1

The following diagram shows how a network connector functions. It bridges two brokers and, when established by Broker-1 to Broker-2, forwards messages from Broker-1 to Broker-2 on demand. A network connector can be duplex, so messages can also be forwarded in the opposite direction, from Broker-2 to Broker-1, once there is a consumer on Broker-1 for a channel that exists in Broker-2. More on this in Part 2.

Setup network connector between broker-1 and broker-2

Create two broker instances, say broker-1 and broker-2:

    Ashwinis-MacBook-Pro:bin akuntamukkala$ pwd
    /Users/akuntamukkala/apache-activemq-5.8.0/bin
    Ashwinis-MacBook-Pro:bin akuntamukkala$ ./activemq-admin create ../bridge-demo/broker-1
    Ashwinis-MacBook-Pro:bin akuntamukkala$ ./activemq-admin create ../bridge-demo/broker-2

Since we will be running both brokers on the same machine, let's configure broker-2 so that there are no port conflicts.
Edit /Users/akuntamukkala/apache-activemq-5.8.0/bridge-demo/broker-2/conf/activemq.xml:

- Change the transport connector port from 61616 to 61626
- Change the AMQP port from 5672 to 6672 (we won't be using it for this blog)

Edit /Users/akuntamukkala/apache-activemq-5.8.0/bridge-demo/broker-2/conf/jetty.xml:

- Change the web console port from 8161 to 9161

Configure Network Connector from broker-1 to broker-2

Add the following XML snippet to /Users/akuntamukkala/apache-activemq-5.8.0/bridge-demo/broker-1/conf/activemq.xml

The above XML snippet configures two network connectors: "T:broker1->broker2" (only topics, as queues are excluded) and "Q:broker1->broker2" (only queues, as topics are excluded). This allows for a nice separation between the network connectors used for topics and queues. The name can be arbitrary, although I prefer the form [type]:[source broker]->[destination broker]. The URI attribute specifies how to connect to broker-2.

Start broker-2:

    Ashwinis-MacBook-Pro:bin akuntamukkala$ pwd
    /Users/akuntamukkala/apache-activemq-5.8.0/bridge-demo/broker-2/bin
    Ashwinis-MacBook-Pro:bin akuntamukkala$ ./broker-2 console

Start broker-1:

    Ashwinis-MacBook-Pro:bin akuntamukkala$ pwd
    /Users/akuntamukkala/apache-activemq-5.8.0/bridge-demo/broker-1/bin
    Ashwinis-MacBook-Pro:bin akuntamukkala$ ./broker-1 console

Logs on broker-1 show two network connectors being established with broker-2:

    INFO | Establishing network connection from vm://broker-1?async=false&network=true to tcp://localhost:61626
    INFO | Connector vm://broker-1 Started
    INFO | Establishing network connection from vm://broker-1?async=false&network=true to tcp://localhost:61626
    INFO | Network connection between vm://broker-1#24 and tcp://localhost/127.0.0.1:61626@52132(broker-2) has been established.
    INFO | Network connection between vm://broker-1#26 and tcp://localhost/127.0.0.1:61626@52133(broker-2) has been established.
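Once both brokers are running, it can help to confirm that the four ports configured above are actually accepting connections before moving on. The following is a minimal sketch using only the Python standard library; the helper names `is_listening` and `report` are made up for this example, and the port numbers are simply the ones chosen in the steps above.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the configuration steps above
PORTS = {
    "broker-1 transport": 61616,
    "broker-2 transport": 61626,
    "broker-1 web console": 8161,
    "broker-2 web console": 9161,
}


def report(host="localhost"):
    """Map each label to whether its port currently accepts connections."""
    return {label: is_listening(host, port) for label, port in PORTS.items()}


if __name__ == "__main__":
    for label, up in report().items():
        print(f"{label}: {'listening' if up else 'not reachable'}")
```

A port that is "not reachable" for broker-2 usually points back at the activemq.xml or jetty.xml edits, since both brokers start from the same default ports.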
The web console on broker-1 at http://localhost:8161/admin/connections.jsp shows the two network connectors established to broker-2.

The same page on broker-2 does not show any network connectors, since no network connectors were initiated by broker-2.

Let's see this in action

Let's produce 100 persistent messages on a queue called "foo.bar" on broker-1.

    Ashwinis-MacBook-Pro:example akuntamukkala$ pwd
    /Users/akuntamukkala/apache-activemq-5.8.0/example
    Ashwinis-MacBook-Pro:example akuntamukkala$ ant producer -Durl=tcp://localhost:61616 -Dtopic=false -Ddurable=true -Dsubject=foo.bar -Dmax=100

The broker-1 web console at http://localhost:8161/admin/queues.jsp shows that 100 messages have been enqueued in queue "foo.bar".

Let's start a consumer on a queue called "foo.bar" on broker-2. The important thing to note here is that the destination name "foo.bar" must match exactly.

    Ashwinis-MacBook-Pro:example akuntamukkala$ ant consumer -Durl=tcp://localhost:61626 -Dtopic=false -Dsubject=foo.bar

We find that all 100 messages from broker-1's foo.bar queue get forwarded to broker-2's foo.bar queue consumer.

broker-1 admin console at http://localhost:8161/admin/queues.jsp

The broker-2 admin console at http://localhost:9161/admin/queues.jsp shows that the consumer we started has consumed all 100 messages, which were forwarded on demand from broker-1.

broker-2 consumer details on foo.bar queue

The broker-1 admin console shows that all 100 messages have been dequeued [forwarded to broker-2 via the network connector].

The broker-1 consumer details on the "foo.bar" queue show that the consumer is created on demand: [name of connector]_[destination broker]_inbound_[source broker]

Thus we have seen the basics of the network connector in ActiveMQ.

As always, please feel free to comment about anything that can be improved. Your inputs are welcome!

Stay tuned for Part 2.

March 12, 2014

by Ashwini Kuntamukkala

Resource Pooling, Virtualization, Fabric, and the Cloud

One of the five essential attributes of cloud computing (see The 5-3-2 Principle of Cloud Computing) is resource pooling, an important differentiator separating the thought process of traditional IT from that of a service-based, cloud computing approach. From a service provider’s viewpoint, resource pooling in the context of cloud computing denotes a set of strategies and a methodical way of managing resources. For a user, resource pooling institutes an abstraction for presenting and consuming resources in a consistent and transparent fashion.

This article presents these key concepts derived from resource pooling:

- Resource Pools
- Virtualization in the Context of Cloud Computing
- Standardization, Automation, and Optimization
- Fabric
- Cloud
- Closing Thoughts

Resource Pools

Ultimately, data center resources can be logically placed into three categories: compute, networks, and storage. For many, this grouping may appear trivial. It is, however, a foundation upon which some cloud computing methodologies are developed, products designed, and solutions formulated.

Compute

This is a collection of all CPU capabilities. Essentially all data center servers, whether supporting or actually running a workload, are part of this compute group. The compute pool represents the total capacity for executing code and running instances. To construct a compute pool, first inventory all servers and identify virtualization candidates, then implement server virtualization. It is never too early to introduce a system management solution to facilitate these processes, which in my view is a strategic investment and a critical component of all cloud initiatives.

Networks

The physical and logical artifacts put in place to connect, segment, and isolate resources from layer three and below are gathered in the network pool. Networking makes resources visible and hence possibly manageable.
In the age of instant gratification, networks and mobility are redefining the security and system administration boundaries, and play a direct and impactful role in user productivity and customer satisfaction. Networking in cloud computing is more than just remote access; it is empowerment for a user to self-serve and consume resources anytime, anywhere, with any device. BYOD and consumerization of IT are various expressions of these concepts.

Storage

This has long been a very specialized and sometimes mysterious part of IT. An enterprise storage solution is frequently characterized as a high-cost item with a significant financial and contractual commitment, specialized hardware, proprietary API and software, a dependency on direct vendor support, etc. In cloud computing, storage has become even more noticeable, since the ability to grow and shrink based on demand, i.e. elasticity, requires an enterprise-level, massive, reliable, and resilient storage solution at a global scale. While enterprise IT is consolidating resources and transforming the existing establishment into a cloud computing environment, how to leverage existing storage devices from various vendors and integrate them with next-generation storage solutions is among the highest priorities for modernizing a data center.

Virtualization in the Context of Cloud Computing

In the last decade, virtualization has proved its value and accelerated the realization of cloud computing. Back then, virtualization was mainly server virtualization, which in an over-simplified statement means hosting multiple server instances on the same hardware while each instance runs transparently and in isolation, as if each consumes the entire hardware and is the only instance running. Many customer expectations, business needs, and methodologies have since evolved.
Now, we should validate virtualization in the context of cloud computing to fully address the innovations rapidly changing how IT conducts business and delivers services. As discussed below, in the context of cloud computing, consumable resources are delivered in some virtualized form. Various virtualization layers collectively construct and form the so-called fabric.

Server Virtualization

The concept of server virtualization remains: running multiple server instances on the same hardware while each instance runs transparently and in isolation, as if each instance is the only instance running and consuming the entire server hardware.

In addition to virtualizing and consolidating servers, server virtualization also signifies the practice of standardizing server deployment by switching away from physical boxes to VMs. Server virtualization is for packaging, delivering, and consuming a compute pool.

There are a few important considerations when virtualizing servers. IT needs the ability to identify and manage bare metal such that the entire resource life cycle, from commissioning to decommissioning, can be standardized and automated. To fundamentally reduce support and training costs while increasing productivity, a consistent platform with tools applicable across physical, virtual, on-premises, and off-premises deployments is essential. The last thing IT wants is one set of tools for physical resources and another for virtualized ones, one set of tools for on-premises deployment and another for resources deployed to a service provider, and one set of tools for development and another for deploying applications. The requirement is one methodology for all, one skill set for all, and one set of tools for all. This advantage is obvious when developing applications and deploying Windows Server 2012 R2 on premises or off premises to Windows Azure.
The Active Directory security model can work across sites, System Center can manage resources deployed off premises to Windows Azure, and Visual Studio can publish applications across platforms. Windows infrastructure architecture, security, and deployment models are all directly applicable.

Network Virtualization

An idea similar to server virtualization applies here. Network virtualization is the ability to run multiple networks on the same network device while each network runs transparently and in isolation, as if each network is the only network running and consuming the entire network hardware.

Conceptually, since each network instance runs in isolation, one tenant’s 192.168.x network is not aware of another tenant’s identical 192.168.x network running on the same network device. Network virtualization provides the translation between physical network characteristics and the representation and identity of a resource in a virtualized network. Consequently, above the network virtualization layer, various tenants running in isolation can have identical network configurations.

A great example of network virtualization is Windows Azure virtual networking. At any given time, there can be multiple Windows Azure subscribers all allocating the same 192.168.x address space with an identical subnet scheme (192.168.1.x/16) for deploying VMs. The VMs belonging to one subscriber will, however, not be aware of or visible to those deployed by others, despite the fact that the network configuration, IP scheme, and IP address assignments may all be identical. Network virtualization in Windows Azure isolates one subscriber from the others such that each subscriber operates as if the subscription is the only one employing a 192.168.x address space.

Storage Virtualization

I believe this is where the next wave of drastic IT cost reduction, following server virtualization, will happen.
Historically, storage has been a high-cost item in any IT budget in every aspect, including hardware, software, staffing, maintenance, SLA, etc. Since the introduction of Windows Server 2012, there is a clear direction in which storage virtualization is built into the OS and is becoming a commodity. New capabilities like Storage Pool, Hyper-V over SMB, Scale-Out File Share, etc., are now part of the Windows Server OS and are making storage virtualization part of server administration routines, easily manageable with tools and utilities like PowerShell, which is familiar to many IT professionals.

The concept of storage virtualization remains consistent with the idea of logically separating a computing object from its hardware, in this case the storage capacity. Storage virtualization is the ability to integrate multiple and heterogeneous storage devices, aggregate their storage capacities, and present and manage them as one logical storage device with a continuous storage space. JBOD is a technology to realize this concept.

Standardization, Automation and Optimization

Each of the three resource pools has an abstraction to logically present itself with characteristics and work patterns. A compute pool is a collection of physical (virtualization and infrastructure) hosts and VMs. A virtualization host hosts VMs that run workloads deployed by service owners and consumed by authorized users. A network pool encompasses network resources including physical devices, logical switches, address spaces, and site configurations. Network virtualization, as enabled and defined in configurations, can identify and translate a logical/virtual IP address into a physical one, such that tenants on the same network hardware can implement an identical network scheme without concern. A storage pool is based on storage virtualization, which is the concept of presenting an aggregated storage capacity as one continuous storage space, as if provided by one logical storage device.
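The idea of presenting several devices as one continuous storage space can be sketched in a few lines. The example below is a toy model in Python, not any Windows Server or vendor API; the `StoragePool` class, the device names, and the capacities are all hypothetical. It concatenates device capacities and translates a logical offset into a (device, offset-within-device) pair, which is the essence of the aggregation described above.

```python
from bisect import bisect_right
from itertools import accumulate


class StoragePool:
    """Toy model: heterogeneous devices presented as one contiguous space."""

    def __init__(self, devices):
        # devices: list of (name, capacity) pairs, in concatenation order
        self._names = [name for name, _ in devices]
        capacities = [capacity for _, capacity in devices]
        # Logical offset at which each device begins, e.g. [0, 100, 150]
        self._starts = [0] + list(accumulate(capacities))[:-1]
        self.capacity = sum(capacities)

    def locate(self, logical_offset):
        """Translate a logical offset into (device name, offset on device)."""
        if not 0 <= logical_offset < self.capacity:
            raise ValueError(f"offset {logical_offset} is outside the pool")
        index = bisect_right(self._starts, logical_offset) - 1
        return self._names[index], logical_offset - self._starts[index]


if __name__ == "__main__":
    # Hypothetical devices with capacities in arbitrary units
    pool = StoragePool([("sata-0", 100), ("sas-1", 50), ("ssd-2", 200)])
    print(pool.capacity)      # total of all devices
    print(pool.locate(120))   # falls on the second device
```

A consumer of the pool only ever sees `capacity` and logical offsets; which physical device serves a given offset stays hidden behind `locate`.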
In other words, the three resource pools are wrapped with server virtualization, network virtualization, and storage virtualization, respectively. Each virtualization presents a set of methodologies from which work patterns are derived and common practices are developed. These virtualization layers provide opportunities to standardize, automate, and optimize deployments, and considerably facilitate the adoption of cloud computing.

Standardization

Virtualizing resources decouples the dependency between instances and the underlying hardware. This offers an opportunity to simplify and standardize the logical representation of a resource. For instance, a VM is defined and deployed with a VM template that provides a level of consistency with a standardized configuration.

Automation

Once VM characteristics are identified and standardized, we can generate an instance by providing only instance-based information or information that depends on run-time, such as the VM machine name, which must be validated at run-time to prevent duplicate names. Requiring only minimal information at deployment can significantly simplify and streamline operations for automation. And with automation, resources can then be deployed, instantiated, relocated, taken off-line, brought back online, or removed rapidly and automatically based on set criteria. Standardization and automation are essential mechanisms so that a workload can be scaled on demand, i.e., become elastic.

Optimization

Standardization provides a set of common criteria. Automation executes operations based on set criteria with volume, consistency, and expediency. With standardization and automation, instances can be instantiated with consistency, efficiency, and predictability. In other words, resources can be operated in bulk with consistency and predictability. The next logical step is then to optimize the usage based on SLA.
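The step from standardization to automation described above can be made concrete with a small sketch. It is written in plain Python and is not tied to System Center or any other product; `VMTemplate`, `Deployer`, and the template values are hypothetical. The template carries the standardized configuration, and deployment only needs the instance-specific name, which is validated at run-time against duplicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMTemplate:
    """Standardized configuration shared by every instance (hypothetical)."""
    name: str
    cpus: int
    memory_gb: int
    disk_gb: int


class Deployer:
    """Toy deployer: the only per-instance input is the VM name."""

    def __init__(self):
        self._instances = {}  # vm name -> template it was deployed from

    def deploy(self, template, vm_name):
        # Run-time validation: machine names must be unique
        if vm_name in self._instances:
            raise ValueError(f"VM name already in use: {vm_name}")
        self._instances[vm_name] = template
        return vm_name

    def scale_out(self, template, prefix, count):
        """Deploy `count` more VMs, generating unused names automatically."""
        names, n = [], 0
        while len(names) < count:
            n += 1
            candidate = f"{prefix}-{n}"
            if candidate not in self._instances:
                names.append(self.deploy(template, candidate))
        return names

    def remove(self, vm_name):
        del self._instances[vm_name]

    def instances(self):
        return dict(self._instances)


if __name__ == "__main__":
    web = VMTemplate(name="web-small", cpus=2, memory_gb=4, disk_gb=40)
    deployer = Deployer()
    deployer.deploy(web, "web-1")
    print(deployer.scale_out(web, "web", 3))  # skips the name already taken
```

Because every instance comes from the same template, scaling out or removing instances reduces to bookkeeping on names, which is what makes the bulk operations in the Optimization step predictable.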
The presented progression is what resource pooling and virtualization can provide and facilitate. These methodologies are now built into products and solutions. Windows Server 2012 R2 and System Center 2012 and later integrate server virtualization, network virtualization, and storage virtualization into one consistent solution platform with standardization, automation, and optimization for building and managing clouds.

Fabric

This is a significant abstraction in cloud computing. Fabric implies accessibility and discoverability, and denotes the ability to discover, identify, and manage a resource. Conceptually, fabric is an umbrella term encompassing all the underlying infrastructure supporting a cloud computing environment. At the same time, a fabric controller represents the system management solution that manages, i.e. owns, fabric.

In cloud architecture, fabric consists of the three resource pools: compute, networks, and storage. Compute provides the computing capabilities, executes code, and runs instances. Networks glue the resources together based on requirements. Storage is where VMs, configurations, data, and resources are kept. Fabric shields the user from the physical complexities of the three resource pools, which are presented with server virtualization, network virtualization, and storage virtualization. All operations are eventually directed by the fabric controller of a data center.

Above fabric, there are logical views of consumable resources including VMs, virtual networks, and logical storage drives. By deploying VMs, configuring virtual networks, or acquiring storage, a user consumes resources. Under fabric, there are virtualization and infrastructure hosts, Active Directory, DNS, clusters, load balancers, address pools, network sites, library shares, storage arrays, topology, racks, cables, etc., all under the fabric controller’s command to collectively present and support fabric.
For a service provider, building a cloud computing environment is essentially establishing a fabric controller and constructing fabric: namely, instituting a comprehensive management solution, building the three resource pools, and integrating server virtualization, network virtualization, and storage virtualization to form fabric. From a user’s point of view, how and where a resource is physically provided is not a concern, but the accessibility, readiness, scalability, and fulfillment of SLA are.

Cloud

This is a well-defined term, and we should not be confused about it (see NIST SP 800-145 and the 5-3-2 Principle of Cloud Computing). We need to be very clear on what a cloud must exhibit (the five essential attributes), how to consume it (with SaaS, PaaS, or IaaS), and the model a service is deployed in (like private cloud, public cloud, and hybrid cloud). Cloud is a concept, a state, a set of capabilities such that a business can be delivered as a service, i.e. available on demand.

The architecture of a cloud computing environment is presented with three resource pools: compute, networks, and storage. Each is an abstraction provided by a virtualization layer. Server virtualization presents a compute pool with VMs that supply the computing power, i.e. CPUs, to execute code and run instances. Network virtualization offers a network pool and is the mechanism that allows multiple tenants with identical network configurations to share the same virtualization host while connecting, segmenting, and isolating network traffic with virtual NICs, logical switches, address space, network sites, IP pools, etc. Storage virtualization provides a logical storage device whose capacity appears continuous and aggregated, with a pool of storage devices behind the scenes.
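To make the tenant isolation provided by network virtualization more tangible, here is a toy translation table in Python. It is only a sketch of the concept, not how Windows Azure or Hyper-V implement it; the `VirtualNetworkLayer` class, the tenant names, and all addresses are hypothetical. The key point is that lookups are keyed by tenant as well as by virtual IP, so two tenants can use the same 192.168.x address without seeing each other.

```python
import ipaddress


class VirtualNetworkLayer:
    """Toy translation table: (tenant, virtual IP) -> physical address."""

    def __init__(self):
        self._table = {}

    def attach(self, tenant, virtual_ip, physical_ip):
        key = (tenant, ipaddress.ip_address(virtual_ip))
        if key in self._table:
            raise ValueError(f"{virtual_ip} is already in use by {tenant}")
        self._table[key] = ipaddress.ip_address(physical_ip)

    def resolve(self, tenant, virtual_ip):
        """Translate a tenant's virtual IP into the physical address."""
        return self._table[(tenant, ipaddress.ip_address(virtual_ip))]

    def visible_to(self, tenant):
        """Virtual addresses a tenant can see: only its own."""
        return sorted(vip for (owner, vip) in self._table if owner == tenant)


if __name__ == "__main__":
    layer = VirtualNetworkLayer()
    # Two hypothetical tenants choose the identical virtual address
    layer.attach("tenant-a", "192.168.1.10", "10.0.0.5")
    layer.attach("tenant-b", "192.168.1.10", "10.0.7.9")
    print(layer.resolve("tenant-a", "192.168.1.10"))
    print(layer.resolve("tenant-b", "192.168.1.10"))
```

Keying the table by tenant is the design choice that lets identical network schemes coexist: a conflict can only arise inside a single tenant's own address space.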
The three resource pools together constitute the fabric (of a cloud), while the three virtualization layers collectively form the abstraction, such that while the underlying physical infrastructure may be intricate, the user experience above fabric remains logical and consistent. Deploying a VM, configuring a virtual network, or acquiring storage is transparent with virtualization, regardless of where the VM actually resides, how the virtual network is physically wired, or which devices in the aggregate provide the requested storage.

Closing Thoughts

Cloud is a very consumer-focused approach. It is about a customer’s ability and control, based on SLA, in getting resources when needed and at scale and, equally important, releasing resources when they are no longer required. It is not about products and technologies. It is about servicing, consuming, and strengthening the bottom line.

August 12, 2013

by Yung Chou

OLAP Operation in R

OLAP (Online Analytical Processing) is a very common way to analyze raw transaction data by aggregating along different combinations of dimensions. This is a well-established field in Business Intelligence / Reporting. In this post, I will highlight the key ideas in OLAP operations and illustrate how to perform them in R.

Facts and Dimensions

The core part of OLAP is a so-called "multi-dimensional data model", which contains two types of tables: "Fact" tables and "Dimension" tables.

- A Fact table contains records that each describe an instance of a transaction. Each transaction record contains categorical attributes (which describe contextual aspects of the transaction, such as space, time, user) as well as numeric attributes (called "measures", which describe quantitative aspects of the transaction, such as number of items sold, dollar amount).
- A Dimension table contains records that further elaborate the contextual attributes, such as user profile data, location details, etc.

In a typical setting of a multi-dimensional model:

- Each fact table contains foreign keys that reference the primary keys of multiple dimension tables. In its simplest form, this is called a STAR schema.
- Dimension tables can contain foreign keys that reference other dimension tables. This provides a sophisticated, detailed breakdown of the contextual aspects. This is also called a SNOWFLAKE schema.
- Although it is not a hard rule, a Fact table tends to be independent of other Fact tables and usually doesn't contain references to them. However, different Fact tables usually share the same set of dimension tables. This is also called a GALAXY schema.
- It is a hard rule, however, that a Dimension table NEVER points to / references a Fact table.

A simple STAR schema is shown in the following diagram.

Each dimension can also be hierarchical so that the analysis can be done at different degrees of granularity.
For example, the time dimension can be broken down into days, weeks, months, quarters, and years; similarly, the location dimension can be broken down into countries, states, cities, etc.

Here we first create a sales fact table that records each sales transaction.

    # Set up the dimension tables
    state_table <-
      data.frame(key=c("CA", "NY", "WA", "ON", "QU"),
                 name=c("California", "New York", "Washington",
                        "Ontario", "Quebec"),
                 country=c("USA", "USA", "USA", "Canada", "Canada"))

    month_table <-
      data.frame(key=1:12,
                 desc=c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
                 quarter=c("Q1","Q1","Q1","Q2","Q2","Q2",
                           "Q3","Q3","Q3","Q4","Q4","Q4"))

    prod_table <-
      data.frame(key=c("Printer", "Tablet", "Laptop"),
                 price=c(225, 570, 1120))

    # Function to generate the Sales table
    gen_sales <- function(no_of_recs) {

      # Generate transaction data randomly
      loc <- sample(state_table$key, no_of_recs,
                    replace=T, prob=c(2,2,1,1,1))
      time_month <- sample(month_table$key, no_of_recs, replace=T)
      time_year <- sample(c(2012, 2013), no_of_recs, replace=T)
      prod <- sample(prod_table$key, no_of_recs,
                     replace=T, prob=c(1, 3, 2))
      unit <- sample(c(1,2), no_of_recs, replace=T, prob=c(10, 3))

      # Look up each price by product name. Indexing prod_table[prod,]
      # would select rows by factor code (or by row name), not by key.
      amount <- unit*prod_table$price[match(prod, prod_table$key)]

      sales <- data.frame(month=time_month,
                          year=time_year,
                          loc=loc,
                          prod=prod,
                          unit=unit,
                          amount=amount)

      # Sort the records by time order
      sales <- sales[order(sales$year, sales$month),]
      row.names(sales) <- NULL
      return(sales)
    }

    # Now create the sales fact table
    sales_fact <- gen_sales(500)

    # Look at a few records
    head(sales_fact)

      month year loc   prod unit amount
    1     1 2012  NY Laptop    1    225
    2     1 2012  CA Laptop    2    450
    3     1 2012  ON Tablet    2   2240
    4     1 2012  NY Tablet    1   1120
    5     1 2012  NY Tablet    2   2240
    6     1 2012  CA Laptop    1    225

Note that the sample output shown in this post pairs Laptop with 225, Printer with 570, and Tablet with 1120, because the run that produced it looked prices up by factor position rather than by product name. With the lookup by name used above, amounts follow prod_table directly.

Multi-dimensional Cube

Now, we turn this fact table into a hypercube with multiple dimensions. Each cell in the cube represents an aggregate value for a unique combination of each dimension.
    # Build up a cube
    revenue_cube <-
      tapply(sales_fact$amount,
             sales_fact[,c("prod", "month", "year", "loc")],
             FUN=function(x){return(sum(x))})

    # Showing the cells of the cube
    revenue_cube

    , , year = 2012, loc = CA

           month
    prod        1     2     3     4     5     6     7     8     9    10    11    12
    Laptop   1350   225   900   675   675    NA   675  1350    NA  1575   900  1350
    Printer    NA  2280    NA    NA  1140   570   570   570    NA   570  1710    NA
    Tablet   2240  4480 12320  3360  2240  4480  3360  3360  5600  2240  2240  3360

    , , year = 2013, loc = CA

           month
    prod        1     2     3     4     5     6     7     8     9    10    11    12
    Laptop    225   225   450   675   225   900   900   450   675   225   675  1125
    Printer    NA  1140    NA  1140   570    NA    NA   570    NA  1140  1710  1710
    Tablet   3360  3360  1120  4480  2240  1120  7840  3360  3360  1120  5600  4480

    , , year = 2012, loc = NY

           month
    prod        1     2     3     4     5     6     7     8     9    10    11    12
    Laptop    450   450    NA    NA   675   450   675    NA   225   225    NA   450
    Printer    NA  2280    NA  2850   570    NA    NA  1710  1140    NA   570    NA
    Tablet   3360 13440  2240  2240  2240  5600  5600  3360  4480  3360  4480  3360

    , , year = 2013, loc = NY

    .....

    dimnames(revenue_cube)

    $prod
    [1] "Laptop"  "Printer" "Tablet"

    $month
     [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12"

    $year
    [1] "2012" "2013"

    $loc
    [1] "CA" "NY" "ON" "QU" "WA"

OLAP Operations

Here are some common OLAP operations:

- Slice
- Dice
- Rollup
- Drilldown
- Pivot

"Slice" is about fixing certain dimensions to analyze the remaining dimensions. For example, we can focus on the sales happening in "2012", "Jan", or we can focus on the sales happening in "2012", "Jan", "Tablet".

    # Slice
    # cube data in Jan, 2012
    revenue_cube[, "1", "2012",]

           loc
    prod       CA    NY    ON    QU    WA
    Laptop   1350   450    NA   225   225
    Printer    NA    NA    NA  1140    NA
    Tablet   2240  3360  5600  1120  2240

    # Tablet data in Jan, 2012
    revenue_cube["Tablet", "1", "2012",]

        CA    NY    ON    QU    WA
      2240  3360  5600  1120  2240

"Dice" is about limiting each dimension to a certain range of values, while keeping the number of dimensions the same in the resulting cube. For example, we can focus on sales happening in [Jan/Feb/Mar, Laptop/Tablet, CA/NY].
    revenue_cube[c("Tablet","Laptop"),
                 c("1","2","3"),
                 ,
                 c("CA","NY")]
    , , year = 2012, loc = CA

            month
    prod        1    2     3
      Tablet 2240 4480 12320
      Laptop 1350  225   900

    , , year = 2013, loc = CA

            month
    prod        1    2    3
      Tablet 3360 3360 1120
      Laptop  225  225  450

    , , year = 2012, loc = NY

            month
    prod        1     2    3
      Tablet 3360 13440 2240
      Laptop  450   450   NA

    , , year = 2013, loc = NY

            month
    prod        1    2    3
      Tablet 3360 4480 6720
      Laptop  450   NA  225

"Rollup" is about applying an aggregation function to collapse a number of dimensions. For example, we want to focus on the annual revenue for each product and collapse the location dimension (i.e. we don't care where we sold our product).

    apply(revenue_cube, c("year", "prod"),
          FUN=function(x) {return(sum(x, na.rm=TRUE))})
          prod
    year   Laptop Printer Tablet
      2012  22275   31350 179200
      2013  25200   33060 166880

"Drilldown" is the reverse of "rollup": it applies the aggregation function at a finer level of granularity. For example, we want to focus on the annual and monthly revenue for each product and collapse the location dimension (i.e. we don't care where we sold our product).

    apply(revenue_cube, c("year", "month", "prod"),
          FUN=function(x) {return(sum(x, na.rm=TRUE))})
    , , prod = Laptop

          month
    year      1    2    3    4    5    6    7    8    9   10   11   12
      2012 2250 2475 1575 1575 2250 1800 1575 1800  900 2250 1350 2475
      2013 2250  900 1575 1575 2250 2475 2025 1800 2025 2250 3825 2250

    , , prod = Printer

          month
    year      1    2    3    4    5    6    7    8    9   10   11   12
      2012 1140 5700  570 3990 4560 2850 1140 2850 2850 1710 3420  570
      2013 1140 4560 3420 4560 2850 1140  570 3420 1140 3420 3990 2850

    , , prod = Tablet

          month
    year       1     2     3     4     5     6     7     8     9    10    11    12
      2012 14560 23520 17920 12320 10080 14560 13440 15680 25760 12320 11200  7840
      2013  8960 11200 10080  7840 14560 10080 29120 15680 15680  8960 12320 22400

"Pivot" is about analyzing the combination of a pair of selected dimensions. For example, we want to analyze the revenue by year and month. Or we want to analyze the revenue by product and location.
    apply(revenue_cube, c("year", "month"),
          FUN=function(x) {return(sum(x, na.rm=TRUE))})
          month
    year       1     2     3     4     5     6     7     8     9    10    11    12
      2012 17950 31695 20065 17885 16890 19210 16155 20330 29510 16280 15970 10885
      2013 12350 16660 15075 13975 19660 13695 31715 20900 18845 14630 20135 27500

    apply(revenue_cube, c("prod", "loc"),
          FUN=function(x) {return(sum(x, na.rm=TRUE))})
             loc
    prod         CA     NY    ON    QU    WA
      Laptop  16425   9450  7650  7425  6525
      Printer 15390  19950  7980 10830 10260
      Tablet  90720 117600 45920 34720 57120

I hope this gives you a taste of the richness of the data processing model in R. However, R does all of its processing in RAM, so your data must be small enough to fit into the local memory of a single machine.
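The rollup idea is not specific to R. As a rough illustration only, here is the same "keep year and product, collapse everything else" aggregation sketched in Java with streams. The `RollupSketch` and `Sale` classes and the four sample rows are made up for this example and are not taken from the generated data above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class RollupSketch {
    // Hypothetical fact record: one row of a sales fact table.
    static final class Sale {
        final int year;
        final String prod;
        final String loc;
        final int amount;

        Sale(int year, String prod, String loc, int amount) {
            this.year = year;
            this.prod = prod;
            this.loc = loc;
            this.amount = amount;
        }
    }

    // Roll up: group by (year, prod) and sum the amounts, which collapses
    // the location dimension (and any other dimension not in the key).
    static Map<String, Integer> revenueByYearAndProduct(List<Sale> facts) {
        return facts.stream().collect(Collectors.groupingBy(
                s -> s.year + "/" + s.prod,          // composite key
                TreeMap::new,                        // sorted for stable printing
                Collectors.summingInt(s -> s.amount)));
    }

    public static void main(String[] args) {
        List<Sale> facts = Arrays.asList(
                new Sale(2012, "Tablet", "CA", 570),
                new Sale(2012, "Tablet", "NY", 1140),
                new Sale(2012, "Laptop", "CA", 1120),
                new Sale(2013, "Tablet", "ON", 570));
        // Prints {2012/Laptop=1120, 2012/Tablet=1710, 2013/Tablet=570}
        System.out.println(revenueByYearAndProduct(facts));
    }
}
```

A drilldown in this sketch would simply use a finer key, for example year, month and product.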

July 30, 2013

by Ricky Ho

· 18,022 Views · 3 Likes

Java: Testing a Socket is Listening on All Network Interfaces/Wildcard Interface

I previously wrote a blog post describing how I’ve been trying to learn more about network sockets, in which I created some server sockets and connected to them using netcat. The next step was to do the same thing in Java, and I started out by writing a server socket which echoed any messages sent by the client:

    // Imports used across the three classes below
    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.PrintWriter;
    import java.net.InetAddress;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.Scanner;

    public class EchoServer {
        public static void main(String[] args) throws IOException {
            int port = 4444;
            // Bind to 127.0.0.1 only
            ServerSocket serverSocket = new ServerSocket(port, 50, InetAddress.getByAddress(new byte[] {0x7f,0x00,0x00,0x01}));
            System.err.println("Started server on port " + port);

            while (true) {
                Socket clientSocket = serverSocket.accept();
                System.err.println("Accepted connection from client: " + clientSocket.getRemoteSocketAddress());

                In in = new In(clientSocket);
                Out out = new Out(clientSocket);

                // Echo each line back until the client disconnects
                String s;
                while ((s = in.readLine()) != null) {
                    out.println(s);
                }

                System.err.println("Closing connection with client: " + clientSocket.getInetAddress());
                out.close();
                in.close();
                clientSocket.close();
            }
        }
    }

    public final class In {
        private Scanner scanner;

        public In(java.net.Socket socket) {
            try {
                InputStream is = socket.getInputStream();
                scanner = new Scanner(new BufferedInputStream(is), "UTF-8");
            } catch (IOException ioe) {
                System.err.println("Could not open " + socket);
            }
        }

        // Returns null once there is nothing more to read
        public String readLine() {
            String line;
            try {
                line = scanner.nextLine();
            } catch (Exception e) {
                line = null;
            }
            return line;
        }

        public void close() {
            scanner.close();
        }
    }

    public class Out {
        private PrintWriter out;

        public Out(Socket socket) {
            try {
                out = new PrintWriter(socket.getOutputStream(), true);
            } catch (IOException ioe) {
                ioe.printStackTrace();
            }
        }

        public void close() {
            out.close();
        }

        public void println(Object x) {
            out.println(x);
            out.flush();
        }
    }

I ran the main method of the class, which creates a server socket on port 4444 listening on the 127.0.0.1 interface, and we can connect to it using netcat like so:

    $ nc -v 127.0.0.1 4444
    Connection to 127.0.0.1 4444 port [tcp/krb524] succeeded!
    hello
    hello

The output in my IntelliJ console looked like this:

    Started server on port 4444
    Accepted connection from client: /127.0.0.1:63222
    Closing connection with client: /127.0.0.1

Using netcat is fine, but what I actually wanted to do was write some test code which would check that the server socket on port 4444 was accessible via all interfaces, i.e. bound to 0.0.0.0. There are some quite nice classes in Java which make this very easy to do, and wiring those together I ended up with the following client code:

    import java.io.IOException;
    import java.net.ConnectException;
    import java.net.InetAddress;
    import java.net.NetworkInterface;
    import java.net.Socket;
    import java.util.Collections;
    import java.util.Enumeration;

    public class EchoClient {
        public static void main(String[] args) throws IOException {
            Enumeration<NetworkInterface> nets = NetworkInterface.getNetworkInterfaces();

            // Try to connect to port 4444 on every address of every interface
            for (NetworkInterface networkInterface : Collections.list(nets)) {
                for (InetAddress inetAddress : Collections.list(networkInterface.getInetAddresses())) {
                    Socket socket = null;
                    try {
                        socket = new Socket(inetAddress, 4444);
                        System.out.println(String.format("Connected using %s [%s]", networkInterface.getDisplayName(), inetAddress));
                    } catch (ConnectException ex) {
                        System.out.println(String.format("Failed to connect using %s [%s]", networkInterface.getDisplayName(), inetAddress));
                    } finally {
                        if (socket != null) {
                            socket.close();
                        }
                    }
                }
            }
        }
    }

If we run the main method of that class we’ll see the following output (on my machine at least!):

    Failed to connect using en0 [/fe80:0:0:0:9afe:94ff:fe4f:ee50%4]
    Failed to connect using en0 [/192.168.1.89]
    Failed to connect using lo0 [/0:0:0:0:0:0:0:1]
    Failed to connect using lo0 [/fe80:0:0:0:0:0:0:1%1]
    Connected using lo0 [/127.0.0.1]

Interestingly, we can’t even connect via the loopback interface using IPv6, which is perhaps not that surprising in retrospect given we bound using an IPv4 address.
If we tweak the second line of EchoServer from:

    ServerSocket serverSocket = new ServerSocket(port, 50, InetAddress.getByAddress(new byte[] {0x7f,0x00,0x00,0x01}));

to

    ServerSocket serverSocket = new ServerSocket(port, 50, InetAddress.getByAddress(new byte[] {0x00,0x00,0x00,0x00}));

and restart the server before re-running the client, we can now connect through all interfaces:

    Connected using en0 [/fe80:0:0:0:9afe:94ff:fe4f:ee50%4]
    Connected using en0 [/192.168.1.89]
    Connected using lo0 [/0:0:0:0:0:0:0:1]
    Connected using lo0 [/fe80:0:0:0:0:0:0:1%1]
    Connected using lo0 [/127.0.0.1]

We can then wrap the EchoClient code into our testing framework to assert that we can connect via all the interfaces.
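One possible way to turn the client into something a test can assert on is to return the addresses that failed instead of printing them. The sketch below is only an illustration: `InterfaceProbe`, `canConnect`, and `unreachableAddresses` are hypothetical names, and the connect timeout is an added assumption so that a filtered address cannot hang the test.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.NetworkInterface;
import java.net.Socket;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

final class InterfaceProbe {
    // True if a TCP connection to address:port succeeds within the timeout.
    static boolean canConnect(InetAddress address, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(address, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Collects every local address on which the port could not be reached.
    // An empty result means the port is reachable via all interfaces.
    static List<InetAddress> unreachableAddresses(int port, int timeoutMillis)
            throws SocketException {
        List<InetAddress> failed = new ArrayList<>();
        Enumeration<NetworkInterface> nets = NetworkInterface.getNetworkInterfaces();
        if (nets == null) {
            return failed; // no interfaces reported
        }
        for (NetworkInterface nif : Collections.list(nets)) {
            for (InetAddress address : Collections.list(nif.getInetAddresses())) {
                if (!canConnect(address, port, timeoutMillis)) {
                    failed.add(address);
                }
            }
        }
        return failed;
    }
}
```

A test could then start the echo server and assert that `InterfaceProbe.unreachableAddresses(4444, 1000)` is empty, using whatever assertion style the testing framework provides.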

July 17, 2013

by Mark Needham

· 12,951 Views

Build an Arduino Motor/Stepper/Servo Shield – Part 1: Servos

This post starts a small (or larger?) series of tutorials using the Arduino Motor/Stepper/Servo Shield with the FRDM-KL25Z board. That motor shield is probably one of the most versatile on the market, and features 2 servo and 4 motor connectors for DC or stepper motors. That makes it a great shield for any robotic project.

(Figure: Arduino Motor Stepper Servo Shield with FRDM-KL25Z)

The series starts with a tutorial on how to drive two servo motors. And if this is not what you are expecting to do with this shield, then you can vote and tell me what you want to see instead on this motor shield.

OEM or Original?

The original Arduino Motor/Stepper/Servo Shield is available from Adafruit Industries and costs less than $20. I’m using an OEM version, see this link. The functionality is the same, except that the OEM version only runs with motors up to 16 VDC, while the original shield is for motors up to 25 VDC.

(Figure: Motor Stepper Servo Shield Details)

The board has two STMicroelectronics L293D motor H-bridge ICs which can drive up to 4 DC motors (or up to 2 stepper motors) with 0.6 A per bridge (1.2 A peak). The 74HCT595N (my board has the SN74HC595 from Texas Instruments) is a shift register used for the H-bridges to reduce the number of pins needed (more about this in a later post). A terminal block with jumper provides power to the DC/stepper motors. The 5 VDC for the servos is taken from the FRDM board. The FRDM-KL25Z can only give a few hundred mA on the 5V Arduino header. That works for small servos, but I recommend cutting the 5V supply to the servos and using a dedicated 5V (or 6V) supply for them.

Outline

In this tutorial, I’m creating a project with CodeWarrior for MCU10.4 for the FRDM-KL25Z board, and then adding support for two servo motors.

Processor Expert Components

This tutorial uses additional Processor Expert components which are not part of the CodeWarrior distribution.
The following other components are used:
- Wait: allows waiting for a given time
- Servo: high level driver for hobby servo motors

Make sure you have the latest and greatest components loaded from GitHub. Instructions on how to download and install the additional components can be found here.

Creating CodeWarrior Project

To create a new project in CodeWarrior:
- File > New > Bareboard Project, give a project name
- Specify the device to be used: MKL25Z128
- OpenSDA as connection
- I/O support can be set to ‘No I/O’
- Processor Expert as rapid application development option

This creates the starting point for my project:

(Figure: New Servo Project Created)

Servo Motor

Servo motors are used in RC (radio control) or (hobby) robotics.

(Figure: Typical Servo Motor (Hitec HS-303))

The motor has 3 connectors:
- GND (black)
- Power (red), typically 5V, but can be 6V or even higher
- PWM (white or yellow), signal for position information

The PWM signal typically has a frequency of 50 Hz (20 ms period), with a duty (high duration) between 1 ms and 2 ms. The screenshot below shows such a 50 Hz signal with 1.5 ms duty cycle (servo middle position):

(Figure: Servo Signal)

Many servos go below 1 ms and beyond 2 ms, e.g. many Hitec servos have a range of 0.9…2.1 ms. Check the data sheet of your servos for details. If you do not have a data sheet, then you might just experiment with different values.

With a pulse of 1 ms to 2 ms within a 20 ms period, the signal is never high for more than 10% of the period, and the adjustable range between 1 ms and 2 ms covers only 5% of it. This means that with a PWM resolution of only 8 bits, just about 13 of the 256 steps fall inside the positioning range. As such, an 8-bit PWM signal does not give me fine-tuned servo positioning.

The duration of the duty cycle (1..2 ms) is translated into a motor position. Typically the servo has a built-in closed-loop control with a microcontroller and a potentiometer. I have found that it is not important to have an *exact* 50 Hz PWM frequency.
You need to experiment with your servo to see if it works as well with a lower or higher frequency, or with a non-fixed frequency (e.g. if you do a software PWM). Many servos build an average of the duty cycle, so you might need to send several pulses until the servo reacts to a changed value.

Servo Processor Expert Component

I’m using here my own ‘Servo’ component which offers the following capabilities:
- PWM configuration (duty and period)
- Min/max and initialization values
- Methods to change the duty cycle
- Optional command line shell support: you can type in commands and control the servo. This is useful for testing or calibration.
- Optional ‘timed’ moving, so you can move the servo faster or slower to the new position in an interrupt driven way

Of course it is possible to use servos without any special components.

From the Components view, I add the Servo component. To add it to my project, I can double-click on it or use the ‘+’ icon in that view:

(Figure: Servo Component in Components Library View)

In case the Processor Expert views are not shown, use the menu Processor Expert > Show Views.

This will add a new ‘Servo’ component to the project:

(Figure: Servo Component Added)

But it shows errors, as the PWM and pin settings need to be configured first.

PWM Configuration

On the Arduino Motor/Stepper/Servo Shield the two servo motor headers are connected to PWM1B and PWM1A (see schematic):

(Figure: Servo Header on Board (source: DK Electronics shield schematic))

Following the signals, this ends up at the following pins on the KL25Z:
- Servo 1 => PWM1B => Arduino header D10 => FRDM-KL25Z D10 => KL25Z pin 73 => PTD0/SPI0_PCS0/TPM0_CH0
- Servo 2 => PWM1A => Arduino header D9 => FRDM-KL25Z D9 => KL25Z pin 78 => ADC0_SE6b/PTD5/SPI1_SCK/UART2_TX/TPM0_CH5

From the pin names on the Kinetis (TPM0_CH0 and TPM0_CH5) I can see that this would be the same timer (TPM0), but with different channel numbers (CH0 and CH5).
For my first servo, Processor Expert has created a ‘TimerUnit_LDD’ for me, which I will be able to share (more on this later). The TimerUnit_LDD implements the ‘Logical Device Driver’ for my PWM:

(Figure: TimerUnit_LDD)

So I select the PWM component inside the Servo component and configure it for TPM0_C0V and the pin PTD0/SPI0_PCS0/TPM0_CH0 with low initial polarity. The period of 20 ms (50 Hz) and starting pulse width of 1.5 ms (mid-point) should already be pre-configured:

(Figure: Servo1 PWM Configuration)

I recommend giving it a pin signal name (I used ‘Servo1’).

That I need to set the ‘initial polarity’ to low is a bug of Processor Expert in my view: the device supports an initial ‘high’ polarity, but somehow this is not implemented? What it means is that the polarity of the PWM signal is now inverted: a ‘high’ duty cycle will mean that the signal is low. We need to ‘revert’ the logic later in the Servo component.

Because of the inverted PWM logic, I need to set the ‘Inverted PWM’ attribute in the Servo component:

(Figure: Inverted PWM)

The other settings of the Servo component we can keep ‘as is’ for now. The ‘Min Pos PWM’ and ‘Max Pos PWM’ define the range of the PWM duty cycle which we will use later for the servo position.

Adding Second Servo

As with the first servo, I add the second servo from the Components Library view. As I already have a TimerUnit_LDD present in my system, Processor Expert asks me if I want to re-use the existing one or to create a new component:

(Figure: Shared Component Dialog)

As explained above, I can use the same timer (just a different pin/channel), so I have my existing component selected and press OK.

As above, I configure the timer channel and pin with low initial polarity:

(Figure: Servo2 PWM Configuration)

And I should not forget to enable the inverted logic:

(Figure: Inverted PWM for Servo2)

Test Application

Time to try things out. For this I create a simple demo application which changes the position of both servos.
First I add the Wait component to the project from the Components Library:

(Figure: Added Wait Component)

As I have all my Processor Expert components configured, I can generate the code:

(Figure: Generating Processor Expert Code)

Next I add a new header file application.h to my project. For this I select the ‘Sources’ folder of my project and use the New > Header File context menu to add my new header file:

(Figure: New application.h)

In that header file application.h I add a prototype for my application ‘run’ routine:

(Figure: Added app_run Prototype)

From the main() in processorexpert.c, I call that function (not forgetting to include the header file):

(Figure: Calling app_run from main)

In the same way I add a new source file application.c:

(Figure: New application.c)

To test my servos, I’m using the setpos() method, which accepts an 8-bit (0 to 255) value for the position. To slow things down a bit, I’m waiting a few milliseconds between the different positions:

    #include "application.h"
    #include "wait1.h"
    #include "servo1.h"
    #include "servo2.h"

    void app_run(void) {
      uint16_t pos; /* 16-bit so that the pos<=255 test can become false */

      for(;;) {
        /* sweep both servos through the full 8-bit position range */
        for(pos=0;pos<=255;pos++) {
          servo1_setpos(pos);
          servo2_setpos(pos);
          wait1_waitms(50);
        }
      }
    }

Save all files, and we should be ready to try it out on the board.

Build, Download and Run

That’s it! Time to build the project (menu Project > Build Project), to download it with the debugger (menu Run > Debug), and to start the application. If everything is going right, then the two servos will slowly turn in one direction until the end position, and then return back to the starting position.

Summary

Using hobby servo motors with the FRDM-KL25Z, CodeWarrior, Processor Expert and the additional components plus the Arduino Motor/Stepper/Servo Shield is very easy in my view. I hope this post is useful to start your own experiments with hobby servo motors to bring any robotic project to the next level. I have here on GitHub a project which features what is explained in this post, but with a lot more components, bells and whistles.
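The pulse timing from the servo motor section can be checked with a little arithmetic. The sketch below is written in Java purely for illustration and is not the firmware code: `ServoPulse` and its methods are hypothetical names, and the linear mapping from an 8-bit position to a 1 ms to 2 ms pulse is an assumption (the real Servo component uses its configured ‘min pos pwm’ and ‘max pos pwm’ values).

```java
final class ServoPulse {
    static final int PERIOD_US = 20_000;    // 50 Hz frame
    static final int MIN_PULSE_US = 1_000;  // assumed pulse at position 0
    static final int MAX_PULSE_US = 2_000;  // assumed pulse at position 255

    // Linearly maps an 8-bit position (0..255) to a pulse width in microseconds.
    static int pulseWidthMicros(int pos) {
        if (pos < 0 || pos > 255) {
            throw new IllegalArgumentException("pos must be 0..255: " + pos);
        }
        return MIN_PULSE_US + (MAX_PULSE_US - MIN_PULSE_US) * pos / 255;
    }

    // Number of whole timer counts that fall inside the 1 ms wide positioning
    // window when a counter with the given bit width spans the whole period.
    static int usableSteps(int resolutionBits) {
        int countsPerPeriod = 1 << resolutionBits;
        return countsPerPeriod * (MAX_PULSE_US - MIN_PULSE_US) / PERIOD_US;
    }

    public static void main(String[] args) {
        System.out.println(pulseWidthMicros(0));    // 1000
        System.out.println(pulseWidthMicros(255));  // 2000
        System.out.println(usableSteps(8));         // 12
        System.out.println(usableSteps(16));        // 3276
    }
}
```

Under these assumptions an 8-bit timer leaves about a dozen distinct positions, while a 16-bit timer leaves a few thousand, which is why the timer resolution matters for smooth positioning.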

June 2, 2013

by Erich Styger

· 17,898 Views · 7 Likes

Coalition or Council: Which One Are You?

I have been thinking about institutions that strive for change. Sometimes we call them communities or organizations, sometimes we call them alliances or parties. But whatever their nature, these institutions are usually led and managed by a small group of people. I see two kinds of leading groups: coalitions and councils. coalition A temporary alliance of distinct parties, persons, or states for joint action council A group elected or appointed as an advisory or legislative body Coalitions A coalition is a self-selecting team. The persons seek each other out because they want to be active agents for change, and by working together they can be more successful in achieving a common goal. In his change management books John Kotter referred to them as guiding coalitions. They are not elected. They are not appointed. They select each other because they want to. And they can even work undercover, because their goal is to influence, not to govern. The allied powers in World War II were a coalition. The Google founders were a coalition. The originators of the Stoos Network were a coalition. Councils A council is a group of representatives. These people also want to be active agents for change. But, their primary concern is to have buy-in from the larger group of people they are representing within the institute (community, organization, or party). The concept of democracy has led to many different versions of these councils. Sometimes we call them a government. Sometimes a committee. And everything has to be out in the open, because if it’s not, we call them cronies. Their goal is primarily to govern or advise the institute. The United Nations has a council. My former students society had a council. And many workplaces have management teams acting as councils. And you? If you have a group of people who all desire change, do you lead with a coalition or with a council? This is the big problem with some alliances and consortiums for change. 
They have directors who try to be both. It is a recipe for disaster. Maybe the best institutions have both: a coalition and a council. (image from Veni Markovski)

April 21, 2013

by Jurgen Appelo

· 7,108 Views

Weekend Project: Send sensor data from Arduino to MongoDB

Arduino is an open-source electronics platform that can sense and interact with its environment through a variety of sensor types. It’s great for hardware prototyping and one-off projects. I just got an Arduino board from our friends at SendGrid, who also gave me a little tutorial in the art of Arduino hacking. Inspired by the tutorial and armed with this new board, I bought a passive infrared (PIR) motion sensor from my local Radio Shack. Now I was ready to play; in particular, I wanted to be able to collect that continuous stream of hardware sensor data into a MongoDB database for logging, trend analysis, system event correlation, etc. To this end, I created the demo project “mongodb-motion”, which I’ve made public on GitHub. In the “mongodb-motion” GitHub repo, you will find an Arduino project that writes motion sensor data to a cloud MongoDB database at MongoLab and sends alerts via email based on certain criteria. I built this demo using Node.js and the MongoLab REST API. Below, I’ll go through exactly what hardware you need to make your own “mongodb-motion” project a success, and how the code actually works.

What You Need

The hardware used in this demo includes an Arduino UNO R3 and a Parallax PIR motion sensor.

How the Code Works

You can use a variety of motion sensors with the Arduino. In this particular experiment, I used a PIR motion sensor. The PIR motion sensor behaves like a switch, with ‘down’ events emitted on motion detection and ‘up’ events a few seconds after motion ceases to be detected. On the receiving side, I used Johnny-Five, an appropriately named Node.js package that accepts sensor events and sends messages to the Arduino board. With the two ends set, I’ll move on to the project’s configuration file. In this demo, I’ve included a configuration file, config-sample.js, where credentials for the MongoLab REST API and for the email SMTP server can be added. In my case, I used the SendGrid SMTP service.
The configuration file also has two callbacks that determine when an email is emitted, one for each type of event: “detect” and “ceased”. I’ve used this feature to automatically send an email alert if an event timestamp is between 7:00pm and 8:00am, ostensibly when my office should be motionless… I’m out there watching you, office! Once you’ve customized this config-sample.js file, be sure to rename it to config.js in order for it to be usable. If you inspect the project code, you’ll notice that the MongoLab REST API is called in the logMsg() function, using an https.request. Building this little demo has given me some new ideas for hardware hacking the cloud. I hope you give it a try too. Thanks to the Arduino, Node.js and JavaScript communities, and special thanks to Rick Waldron for Johnny-Five, SendGrid for the UNO board, and a big shout out to @swiftalphaone for the Waza tutorial.
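The alert window described above wraps past midnight, which is easy to get wrong with a plain "after start and before end" test. The project's callbacks live in its JavaScript config file; the sketch below only illustrates the time logic in Java, and `AlertWindow` and `isWithin` are hypothetical names that do not come from the repo.

```java
import java.time.LocalTime;

final class AlertWindow {
    // True if t lies in [start, end). When start is not before end, the
    // window is treated as wrapping past midnight, e.g. 19:00 to 08:00.
    static boolean isWithin(LocalTime t, LocalTime start, LocalTime end) {
        if (start.isBefore(end)) {
            return !t.isBefore(start) && t.isBefore(end);
        }
        return !t.isBefore(start) || t.isBefore(end);
    }

    public static void main(String[] args) {
        LocalTime start = LocalTime.of(19, 0);
        LocalTime end = LocalTime.of(8, 0);
        System.out.println(isWithin(LocalTime.of(23, 30), start, end)); // true
        System.out.println(isWithin(LocalTime.of(12, 0), start, end));  // false
    }
}
```

Treating the end of the window as exclusive is a design choice made here so that 8:00am itself counts as office hours.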

April 3, 2013

by Ben Wen

· 17,832 Views

Simulate Network Latency, Packet Loss, and Low Bandwidth on Mac OSX

Sometimes while testing you may want to be able to simulate network latency, packet loss, or low bandwidth. I have done this with Linux and tc/netem as well as with Shunra on Windows, but I had never done it on Mac OSX. It turns out that Mac OSX includes ‘dummynet’ from FreeBSD, which has the capability to do this WAN simulation. Here is a quick example:
- Inject 250ms latency and 10% packet loss on connections between my workstation and my development web server (10.0.0.1)
- Simulate maximum bandwidth of 1Mbps

    # Create 2 pipes and assign traffic to and from our webserver to each:
    $ sudo ipfw add pipe 1 ip from any to 10.0.0.1
    $ sudo ipfw add pipe 2 ip from 10.0.0.1 to any

    # Configure the pipes we just created:
    $ sudo ipfw pipe 1 config delay 250ms bw 1Mbit/s plr 0.1
    $ sudo ipfw pipe 2 config delay 250ms bw 1Mbit/s plr 0.1

A quick test:

    $ ping 10.0.0.1
    PING 10.0.0.1 (10.0.0.1): 56 data bytes
    64 bytes from 10.0.0.1: icmp_seq=0 ttl=63 time=515.939 ms
    64 bytes from 10.0.0.1: icmp_seq=1 ttl=63 time=519.864 ms
    64 bytes from 10.0.0.1: icmp_seq=2 ttl=63 time=521.785 ms
    Request timeout for icmp_seq 3
    64 bytes from 10.0.0.1: icmp_seq=4 ttl=63 time=524.461 ms

Disable:

    $ sudo ipfw list | grep pipe
    01900 pipe 1 ip from any to 10.13.1.133 out
    02000 pipe 2 ip from 10.13.1.133 to any in
    $ sudo ipfw delete 01900
    $ sudo ipfw delete 02000

    # or, flush all ipfw rules, not just our pipes
    $ sudo ipfw -q flush

Notice that the round-trip on the ping is ~500ms. That is because we applied a 250ms latency to both pipes, incoming and outgoing traffic. Our example was very simple, but you can get quite complex since “pipes” are applied to traffic using standard ipfw firewall rules. For example, you could specify different latency based on port, host, network, etc. Packet loss is configured with the “plr” option. Valid values are 0 – 1. In our example above we used 0.1, which equals 10% packet loss.
This is a very handy way for developers on Macs to test their applications in a variety of network environments. And you get it for FREE. On Windows you need to buy a commercial tool to achieve this (at least that was true the last time I looked, in 2008).
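The figures seen in the ping test follow from simple arithmetic on the two pipes. The sketch below works them out in Java; `LinkEstimate` and its methods are hypothetical names, and it assumes that loss on the outgoing and incoming pipes is independent and that the pipe delay dominates everything else.

```java
final class LinkEstimate {
    // A ping crosses the outgoing pipe once and the incoming pipe once.
    static double roundTripMillis(double outDelayMs, double inDelayMs) {
        return outDelayMs + inDelayMs;
    }

    // Probability that a ping gets no reply: it is lost if either the
    // request or the reply is dropped (assuming independent drops).
    static double pingLossProbability(double outPlr, double inPlr) {
        return 1.0 - (1.0 - outPlr) * (1.0 - inPlr);
    }

    public static void main(String[] args) {
        System.out.println(roundTripMillis(250, 250));                     // 500.0
        System.out.printf("%.2f%n", pingLossProbability(0.1, 0.1));       // 0.19
    }
}
```

So with 10% loss configured on each pipe, roughly one ping in five should time out over a long run, not one in ten.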

December 15, 2012

by Joe Miller

· 17,097 Views

CentOS Minimal Installation Network Configuration

By default, a CentOS minimal install does not come with a pre-configured network. Here’s how to make it work:

    $ ping google.com
    ping: unknown host google.com

To fix this we’ll need to edit the setup for the ethernet interface. Let’s start with editing this file:

    $ vim /etc/sysconfig/network-scripts/ifcfg-eth0

    IPADDR=x.x.x.x
    BOOTPROTO=none
    NETMASK=255.255.255.0
    GATEWAY=y.y.y.y
    DNS1=y.y.y.y
    DNS2=y.y.y.y
    USERCTL=yes
    HWADDR='your mac address'

where x.x.x.x is your static IP, and y.y.y.y is your router IP.

If you’re not sure what your MAC address is, run this command:

    $ ifconfig eth0 | grep -o -E '([[:xdigit:]]{1,2}:){5}[[:xdigit:]]{1,2}'

Now edit the network config and make sure it contains the line below:

    $ vi /etc/sysconfig/network

Add the line (with no spaces around the equals sign):

    GATEWAY=y.y.y.y

Now restart the network service:

    $ /etc/init.d/network restart

Now ping the router:

    $ ping y.y.y.y
    Request timeout for icmp_seq 0
    64 bytes from y.y.y.y: icmp_seq=1 ttl=56 time=1.792 ms
    Request timeout for icmp_seq 1
    64 bytes from y.y.y.y: icmp_seq=3 ttl=56 time=1.790 ms
    64 bytes from y.y.y.y: icmp_seq=4 ttl=56 time=1.762 ms

Looks good, now let’s see if we can see anything outside.

    $ ping google.com
    PING google.com (173.194.67.138) 56(84) bytes of data.
    64 bytes from wi-in-f138.1e100.net (173.194.67.138): icmp_seq=1 ttl=49 time=7.88 ms
    64 bytes from wi-in-f138.1e100.net (173.194.67.138): icmp_seq=2 ttl=49 time=7.35 ms
    64 bytes from wi-in-f138.1e100.net (173.194.67.138): icmp_seq=3 ttl=49 time=7.13 ms

Now you can connect to the internet and get all the packages you need.

November 26, 2012

by Kasia Gogolek

· 90,039 Views · 1 Like

How to do a presentation in China? Some of my experiences

So the culture is different from Western culture; we all know that! I am certainly not an expert on China, but after living in China for almost 2 years, knowing some of the language, working in a Chinese company, seeing presentations every week, and also visiting over 30 Western and Chinese companies located in China, I think I have some insights about how you should organize your presentation in China. Since I recently went to Shanghai for a research exchange with Jiaotong University, I had to give a presentation to introduce my institute and myself. So here you can find my rather uncommon presentation and some remarks on why some slides were designed the way they are. http://www.rene-pickhardt.de/wp-content/uploads/2012/11/ApexLabIntroductionOfWeST.pdf

Guanxi – your relations

First of all, I think it is really important to understand that in China everything is related to your relations (http://en.wikipedia.org/wiki/Guanxi). A Chinese business card will always name a few of your best and strongest contacts. This is more important than your address, for example. When a conference starts, people exchange name cards before they sit down and discuss. This principle of Guanxi is also reflected in the style in which presentations are made. Here are some basic rules:
- Show pictures of people you worked together with
- Show pictures of groups while you organized events
- Show pictures of the panels that run events
- Show your partners (for business, not only clients but also people you are buying from or working together with in general)

My way of respecting these principles:
- I first showed a group picture of our institute!
- For almost every project where I could get hold of them, I also showed pictures of the people who are responsible for the project
- I did not only show the European research projects our university is in, but listed all the different partners and showed their logos

Family

The second thing is that in China the concept of family is very important.
I would say, as a rule of thumb, that if you want to do business with someone in China and you haven’t been introduced to their family, things will not go the way you might expect. For this reason I included some slides with a world map zooming in on the place where I was born, where I studied, and where my parents still live!

Localizing

When I chose a world map, I did not only take one in Chinese, but I also took one where China was centered. In my contact data I also put Chinese social networks. Remember that Twitter, Facebook and many other sites are blocked in China. So if you really want to communicate with Chinese people, why not get a QQ number or a Weibo account?

Design of the slides

You have seen this at conferences many times: Chinese people just put a heck of a lot of stuff on a slide. I strongly believe this is due to the fact that reading and recognizing Chinese characters is much faster than reading Western characters. So if your presentation is in Chinese, don’t be afraid to stuff your slides with information. I have seen many talks by Chinese people who were literally reading word by word what was written on the slides. Whereas in Western countries this is considered bad practice, in China it is all right.

Language

Speaking of language: of course, if you know some Chinese, it shows respect if you at least try to include some. I split my presentation into 2 parts, one in Chinese and one in English.

Have an interesting take away message

In my case I included the fact that we have open PhD positions and scholarships, that our institute is really international, and that the working language is English. Of course I also included some slides about my past and current research like Graphity and Typology.

During the presentation

In China it is not rude at all if one’s cellphone rings and one has more important stuff to do.
You as the presenter should switch off your phone, but you should not be disturbed or annoyed if people in the audience receive phone calls and leave the room to take care of that business. This is very common in China. I am sure there are many more rules on how to hold a presentation in China, and maybe I even made some mistakes in my presentation, but at least I have the feeling that the reaction was quite positive. So if you have questions, suggestions or feedback, feel free to drop me a line. I am more than happy to discuss cultural topics!

November 11, 2012

by René Pickhardt

· 17,390 Views

Smart Batching

How often have we all heard that “batching” will increase latency? As someone with a passion for low-latency systems this surprises me. In my experience, when batching is done correctly, not only does it increase throughput, it can also reduce average latency and keep it consistent.

Well then, how can batching magically reduce latency? It comes down to which algorithm and data structures are employed. In a distributed environment we often have to batch up messages/events into network packets to achieve greater throughput. We also employ similar techniques when buffering writes to storage to reduce the number of IOPS. That storage could be a file system backed by a block device or a relational database. Most IO devices can only handle a modest number of IO operations per second, so it is best to fill those operations efficiently. Many approaches to batching involve waiting for a timeout to occur, and this will by its very nature increase latency. The batch can also get filled before the timeout occurs, making the latency even more unpredictable.

Figure 1.

Figure 1 above depicts decoupling the access to an IO device, and therefore the contention for access to it, by introducing a queue-like structure to stage the messages/events to be sent and a thread doing the batching for writing to the device.

The Algorithm

An approach to batching uses the following algorithm in Java pseudo code:

    import java.nio.ByteBuffer;
    import java.util.Queue;

    public final class NetworkBatcher implements Runnable {
        private final NetworkFacade network;
        private final Queue<Message> queue;
        private final ByteBuffer buffer;

        public NetworkBatcher(final NetworkFacade network, final int maxPacketSize, final Queue<Message> queue) {
            this.network = network;
            this.buffer = ByteBuffer.allocate(maxPacketSize);
            this.queue = queue;
        }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                // Wait until at least one message is available.
                while (null == queue.peek()) {
                    employWaitStrategy(); // block, spin, yield, etc.
                }

                // Drain whatever has arrived, sending each time the buffer fills.
                Message msg;
                while (null != (msg = queue.poll())) {
                    if (msg.size() > buffer.remaining()) {
                        sendBuffer();
                    }
                    buffer.put(msg.getBytes());
                }

                // Queue is empty: send the remainder straight away, no timeout.
                sendBuffer();
            }
        }

        private void sendBuffer() {
            buffer.flip();
            network.send(buffer);
            buffer.clear();
        }
    }

Basically, wait for data to become available and as soon as it is, send it right away. While sending a previous message or waiting on new messages, a burst of traffic may arrive which can all be sent in a batch, up to the size of the buffer, to the underlying resource.

This approach can use ConcurrentLinkedQueue, which provides low latency and avoids locks. However, because that queue is unbounded, it creates no back pressure to stall producing/publishing threads when they outpace the batcher, so the queue can grow out of control. I’ve often had to wrap ConcurrentLinkedQueue to track its size and thus create back pressure. In my experience this size tracking can add 50% to the processing cost of using the queue.

This algorithm respects the single writer principle and can often be employed when writing to a network or storage device, and thus avoid lock contention in third party API libraries. By avoiding the contention we avoid the J-Curve latency profile normally associated with contention on resources, due to the queuing effect on locks. With this algorithm, as load increases, latency stays constant until the underlying device is saturated with traffic, resulting in a more "bathtub" profile than the J-Curve.

Let’s take a worked example of handling 10 messages that arrive as a burst of traffic. In most systems traffic comes in bursts and is seldom uniformly spaced out in time. One approach assumes no batching, with the threads writing to the device API directly as in Figure 1 above. The other uses a lock-free data structure to collect the messages, plus a single thread consuming messages in a loop as per the algorithm above.
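The size-tracking wrapper mentioned above could be sketched roughly as follows. This is a minimal illustration rather than the code I actually used: the class name BoundedConcurrentQueue, the capacity parameter and the choice to return false when full are all assumptions made for the example. A producer that gets false back can spin, yield or block, which is what creates the back pressure.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper (not from the original article): bounds a
// ConcurrentLinkedQueue by tracking its size in a separate atomic counter.
final class BoundedConcurrentQueue<E> {
    private final ConcurrentLinkedQueue<E> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger size = new AtomicInteger();
    private final int capacity;

    BoundedConcurrentQueue(final int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Returns false when the queue is full so the producer can back off.
    boolean offer(final E element) {
        int current;
        do {
            current = size.get();
            if (current >= capacity) {
                return false; // full: this is where back pressure is applied
            }
            // Reserve a slot before enqueueing so the bound is never exceeded.
        } while (!size.compareAndSet(current, current + 1));
        queue.offer(element);
        return true;
    }

    // Returns null when empty, mirroring Queue.poll().
    E poll() {
        final E element = queue.poll();
        if (element != null) {
            size.decrementAndGet(); // release the slot only after removal
        }
        return element;
    }

    int size() {
        return size.get();
    }
}
```

The extra compare-and-set on every offer and the decrement on every poll are the kind of overhead behind the size-tracking cost described above; a structure that is bounded by design avoids paying it separately.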
For the example let’s assume it takes 100µs to write a single buffer to the network device as a synchronous operation and have it acknowledged. The buffer will ideally be less than the MTU of the network in size when latency is critical. Many network sub-systems are asynchronous and support pipelining but we will make the above assumption to clarify the example. If the network operation is using a protocol like HTTP under REST or Web Services then this assumption matches the underlying implementation.

                    Best (µs)    Average (µs)    Worst (µs)    Packets Sent
    Serial          100          500             1,000         10
    Smart Batching  100          150             200           1-2

The absolute lowest latency will be achieved if a message is sent from the thread originating the data directly to the resource, if the resource is un-contended. The table above shows what happens when contention occurs and a queuing effect kicks in. With the serial approach 10 individual packets will have to be sent and these typically need to queue on a lock managing access to the resource, therefore they get processed sequentially. The above figures assume the locking strategy works perfectly with no perceivable overhead, which is unlikely in a real application.

For the batching solution it is likely all 10 messages will be picked up in the first batch if the concurrent queue is efficient, thus giving the best case latency scenario. In the worst case only one message is sent in the first batch with the other nine following in the next. Therefore in the worst case scenario one message has a latency of 100µs and the following 9 have a latency of 200µs, thus giving a worst case average of 190µs, which is significantly better than the serial approach.

This is one good example of when the simplest solution is just a bit too simple because of the contention. The batching solution helps achieve consistent low latency under burst conditions and is best for throughput.
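As a rough check on the arithmetic in this example, the two cases can be computed directly. The sketch below assumes the same 100µs synchronous send and a burst in which all messages arrive at once; the class and method names are made up for illustration.

```java
// Hypothetical helper for the worked example: all names are illustrative.
final class BurstLatency {
    static final int SEND_MICROS = 100; // assumed cost of one synchronous send

    // Serial case: sends are queued behind a lock, so the i-th message
    // (1-based) completes only after i sends have finished.
    static double serialAverage(final int messages) {
        long total = 0;
        for (int i = 1; i <= messages; i++) {
            total += (long) i * SEND_MICROS;
        }
        return (double) total / messages;
    }

    // Batching worst case: the first message goes out alone and the rest
    // follow together in the second send.
    static double batchedWorstCaseAverage(final int messages) {
        final long total = SEND_MICROS + (long) (messages - 1) * 2 * SEND_MICROS;
        return (double) total / messages;
    }
}
```

For 10 messages, batchedWorstCaseAverage gives the 190µs quoted above. Under this strictly sequential model serialAverage comes out at 550µs, a little above the 500µs shown in the table, so that table entry is probably best read as an approximation; the comparison between the two approaches is unaffected.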
It also has a nice effect across the network on the receiving end, in that the receiver has to process fewer packets, which makes the communication more efficient at both ends.

Most hardware handles data in buffers up to a fixed size for efficiency. For a storage device this will typically be a 4KB block. For networks this will be the MTU, which is typically 1500 bytes for Ethernet. When batching, it is best to understand the underlying hardware and write batches in the ideal buffer size to be optimally efficient. However, keep in mind that some devices need to wrap the data in an envelope, e.g. the Ethernet and IP headers for network packets, so the buffer needs to allow for this.

There will always be an increased latency from a thread switch and the cost of exchange via the data structure. However, there are a number of very good non-blocking structures available using lock-free techniques. For the Disruptor this type of exchange can be achieved in as little as 50-100ns, thus making the choice of taking the smart batching approach a no-brainer for low-latency or high-throughput distributed systems.

This technique can be employed for many problems and not just IO. The core of the Disruptor uses this technique to help rebalance the system when the publishers burst and outpace the EventProcessors. The algorithm can be seen inside the BatchEventProcessor.

Note: For this algorithm to work the queueing structure must handle the contention better than the underlying resource. Many queue implementations are extremely poor at managing contention. Use science and measure before coming to a conclusion.

Batching with the Disruptor

The code below shows the same algorithm in action using the Disruptor's EventHandler mechanism. In my experience, this is a very effective technique for handling any IO device efficiently and keeping latency low when dealing with load or burst traffic.
    import com.lmax.disruptor.EventHandler;
    import java.nio.ByteBuffer;

    public final class NetworkBatchHandler implements EventHandler<Message> {
        private final NetworkFacade network;
        private final ByteBuffer buffer;

        public NetworkBatchHandler(final NetworkFacade network, final int maxPacketSize) {
            this.network = network;
            this.buffer = ByteBuffer.allocate(maxPacketSize);
        }

        @Override
        public void onEvent(final Message msg, final long sequence, final boolean endOfBatch) throws Exception {
            // Flush first if this message would not fit in the current packet.
            if (msg.size() > buffer.remaining()) {
                sendBuffer();
            }
            buffer.put(msg.getBytes());

            // The Disruptor signals when no more events are immediately available.
            if (endOfBatch) {
                sendBuffer();
            }
        }

        private void sendBuffer() {
            buffer.flip();
            network.send(buffer);
            buffer.clear();
        }
    }

The endOfBatch parameter greatly simplifies the handling of the batch compared to the double loop in the algorithm above.

I have simplified the examples to illustrate the algorithm. Clearly error handling and other edge conditions need to be considered.

Separation of IO from Work Processing

There is another very good reason to separate the IO from the threads doing the work processing. Handing off the IO to another thread means the worker thread, or threads, can continue processing without blocking, in a nice cache-friendly manner. I've found this to be critical in achieving high-performance throughput.

If the underlying IO device or resource becomes briefly saturated then the messages can be queued for the batcher thread, allowing the work processing threads to continue. The batching thread then feeds the messages to the IO device in the most efficient way possible, allowing the data structure to handle the burst and, if full, apply the necessary back pressure, thus providing a good separation of concerns in the workflow.

Conclusion

So there you have it. Smart Batching can be employed in concert with the appropriate data structures to achieve consistent low latency and maximum throughput.

From http://mechanical-sympathy.blogspot.com/2011/10/smart-batching.html

October 26, 2011

by Martin Thompson

· 11,645 Views · 1 Like

Brute forcing a bin packing problem

Even a basic planning problem, such as bin packing, can be notoriously hard to solve and scale. One might consider the brute force algorithm. Let's take a look at how that algorithm works out on the cloud balance example of Drools Planner:

Given a set of servers with different hardware (CPU, memory and network bandwidth) and given a set of processes with different hardware requirements, assign each process to 1 server and minimize the total cost of the active servers.

The brute force algorithm is simple: try every possible assignment of processes to servers. For example, if we have 6 processes (P0, P1, P2, P3, P4, P5) and 2 servers (S0, S1), we'd try these solutions:

P0->S0, P1->S0, P2->S0, P3->S0, P4->S0, P5->S0
P0->S0, P1->S0, P2->S0, P3->S0, P4->S0, P5->S1
P0->S0, P1->S0, P2->S0, P3->S0, P4->S1, P5->S0
P0->S0, P1->S0, P2->S0, P3->S0, P4->S1, P5->S1
...
P0->S1, P1->S1, P2->S1, P3->S1, P4->S1, P5->S1

On my machine, it takes 15ms to calculate the score of these 2^6 combinations. When I scale out to 9 processes and 3 servers, which is 3^9 combinations, it takes 1582ms. So it scales like this:

Notice that although the number of processes has not even doubled, the running time multiplied by 100! For comparison, I've added the running time of the First Fit algorithm.

And it gets worse: for 12 processes and 4 servers, which is 4^12 combinations, it takes more than 17 minutes.

What if we want to distribute 3000 processes over 1000 servers? With this kind of scalability, it will take too long. In fact, the brute force algorithm is useless. Luckily, Drools Planner implements several other optimization algorithms, which can handle such loads. If you want to know more about them, take a look at the Drools Planner manual or come to my talk at JUDCon London (31 Oct - 1 Nov).

This article was originally posted on the Drools & jBPM blog.
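To make the growth concrete, here is a small sketch of the enumeration described above. It is not Drools Planner code: the class BruteForceAssignments and its methods are hypothetical, and the scoring step is left as a comment. It simply counts in base "number of servers", which visits the assignments in the same order as the list shown earlier.

```java
import java.math.BigInteger;

// Hypothetical illustration of brute-force enumeration; not Drools Planner code.
final class BruteForceAssignments {

    // Size of the search space: each process can go on any server,
    // so there are servers^processes candidate solutions.
    static BigInteger combinations(final int processes, final int servers) {
        return BigInteger.valueOf(servers).pow(processes);
    }

    // Visits every assignment and returns how many were visited.
    // assignment[p] holds the server index chosen for process p.
    static long enumerate(final int processes, final int servers) {
        final int[] assignment = new int[processes];
        long visited = 0;
        while (true) {
            visited++; // a real solver would calculate the score of `assignment` here

            // Advance like an odometer: the last process changes fastest.
            int p = processes - 1;
            while (p >= 0 && ++assignment[p] == servers) {
                assignment[p] = 0;
                p--;
            }
            if (p < 0) {
                return visited; // wrapped around: every combination has been seen
            }
        }
    }
}
```

Calling enumerate(6, 2) visits 64 assignments and enumerate(9, 3) visits 19,683, while combinations(12, 4) is already 16,777,216. For 3000 processes and 1000 servers, combinations returns a number with 9,001 decimal digits, which is why enumeration stops being an option.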

September 26, 2011

by Geoffrey De Smet

· 9,977 Views

Map Reduce and Stream Processing

The Hadoop Map/Reduce model is very good at processing large amounts of data in parallel. It provides a general partitioning mechanism (based on the key of the data) to distribute the aggregation workload across different machines. Basically, map/reduce algorithm design is all about how to select the right key for the record at each stage of processing.

However, the "time dimension" has very different characteristics compared to other dimensional attributes of data, especially when real-time data processing is concerned. It presents a different set of challenges to the batch-oriented Map/Reduce model. Real-time processing demands very low response latency, which means there isn't much data accumulated along the "time" dimension for processing. Data collected from multiple sources may not have all arrived at the point of aggregation. In the standard Map/Reduce model, the reduce phase cannot start until the map phase is completed, and all the intermediate data is persisted to disk before being downloaded to the reducer. All of this adds significant latency to the processing.

Here is a more detailed description of this high latency characteristic of Hadoop.

Although Hadoop Map/Reduce is designed for batch-oriented workloads, certain applications, such as fraud detection, ad display and network monitoring, require real-time responses while processing large amounts of data, and their developers have started to look at various ways of tweaking Hadoop to fit a more real-time processing environment. Here I try to look at some techniques for performing low-latency parallel processing based on the Map/Reduce model.

General stream processing model

In this model, data is produced at various OLTP systems, which update the transaction data store and also asynchronously send additional data for analytic processing. The analytic processing writes its output to a decision model, which feeds information back to the OLTP systems for real-time decision making.
Notice the "asynchronous nature" of the analytic processing, which is decoupled from the OLTP system. This way the OLTP system won't be slowed down waiting for the analytic processing to complete. Nevertheless, we still need to perform the analytic processing ASAP, otherwise the decision model will not be very useful because it won't reflect the current picture of the world. What latency is tolerable is application specific.

Micro-batch in Map/Reduce

One approach is to cut the data into small batches based on a time window (e.g. every hour) and submit the data collected in each batch to the Map/Reduce job. A staging mechanism is needed so that the OLTP application can continue independently of the analytic processing. A job scheduler is used to regulate the producer and consumer so each of them can proceed independently.

Continuous Map/Reduce

Here let's imagine some possible modifications of the Map/Reduce execution model to cater for real-time stream processing. I am not going to worry about backward compatibility with Hadoop, which is the approach that the Hadoop Online Prototype (HOP) is taking.

Long running

The first modification is to make the mapper and reducer long-running. Therefore, we cannot wait for the end of the map phase before starting the reduce phase, as the map phase never ends. This implies the mapper pushes the data to the reducer as soon as it completes its processing and lets the reducer sort the data. A downside of this approach is that it offers no opportunity to run the combine() function on the map side to reduce bandwidth utilization. It also shifts more workload to the reducer, which now needs to do the sorting.

Notice there is a tradeoff between latency and optimization. Optimization requires more data to be accumulated at the source (i.e. the mapper) so local consolidation (i.e. combine) can be performed. Unfortunately, low latency requires the data to be sent ASAP so not much accumulation can be done.
HOP suggests an adaptive flow control mechanism such that data is pushed out to the reducer ASAP until the reducer is overloaded and pushes back (using some sort of flow control protocol). The mapper will then buffer the processed messages and perform combine() before sending them to the reducer. This approach automatically shifts the aggregation workload back and forth between the reducer and the mapper.

Time Window: Slice and Range

There is a "time slice" concept and a "time range" concept. "Slice" defines a time window over which results are accumulated before the reduce processing is executed. This is also the minimum amount of data that the mapper should accumulate before sending to the reducer.

"Range" defines the time window over which results are aggregated. It can be a landmark window, which has a well-defined starting point, or a jumping window (consider a moving landmark scenario). It can also be a sliding window, where a fixed-size window ending at the current time is aggregated.

After receiving a specific time slice from every mapper, the reducer can start the aggregation processing and combine the result with the previous aggregation result. The slice can be dynamically adjusted based on the amount of data sent from the mapper.

Incremental processing

Notice that the reducer needs to compute the aggregated slice value after receiving all records of the same slice from all mappers. After that it calls the user-defined merge() function to merge the slice value with the range value. In case the range needs to be refreshed (e.g. on reaching a jumping window boundary), the init() function will be called to get a refreshed range value. If the range value needs to be updated (when a certain slice value falls outside a sliding range), the unmerge() function will be invoked.

Here is an example of how we keep track of the average hit rate (i.e. total hits per hour) within a 24-hour sliding window with an update every hour (i.e. a one-hour slice).
    # Called at each hit record
    map(k1, hitRecord) {
      site = hitRecord.site
      # Look up the current slice for this key
      slice = lookupSlice(site)
      if (now - slice.time > 60.minutes) {
        # Notify the reducer that the whole slice for this site has been sent
        advance(site, slice)
        slice = lookupSlice(site)
      }
      emitIntermediate(site, slice, 1)
    }

    combine(site, slice, countList) {
      hitCount = 0
      for count in countList {
        hitCount += count
      }
      # Send the message to the downstream node
      emitIntermediate(site, slice, hitCount)
    }

    # Called when the reducer has received the full slice from all mappers
    reduce(site, slice, countList) {
      hitCount = 0
      for count in countList {
        hitCount += count
      }
      sv = SliceValue.new
      sv.hitCount = hitCount
      return sv
    }

    # Called at each jumping window boundary
    init(slice) {
      rangeValue = RangeValue.new
      rangeValue.hitCount = 0
      return rangeValue
    }

    # Called after each reduce()
    merge(rangeValue, slice, sliceValue) {
      rangeValue.hitCount += sliceValue.hitCount
    }

    # Called when a slice falls out of the sliding window
    unmerge(rangeValue, slice, sliceValue) {
      rangeValue.hitCount -= sliceValue.hitCount
    }
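The merge()/unmerge() bookkeeping in the pseudocode above can be tried out in isolation with a few lines of plain Java. This is only a sketch of the sliding-window arithmetic for a single site, with no Hadoop or HOP APIs involved; the class SlidingHitWindow and its method names are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-site sketch of the range value kept by the reducer.
final class SlidingHitWindow {
    private final Deque<Long> sliceCounts = new ArrayDeque<>();
    private final int maxSlices;  // e.g. 24 one-hour slices for a 24-hour window
    private long rangeHitCount;   // the running "range value"

    SlidingHitWindow(final int maxSlices) {
        if (maxSlices <= 0) {
            throw new IllegalArgumentException("maxSlices must be positive");
        }
        this.maxSlices = maxSlices;
    }

    // Called once per completed slice with the hit count produced by reduce().
    void addSlice(final long sliceHitCount) {
        sliceCounts.addLast(sliceHitCount);
        rangeHitCount += sliceHitCount;                 // merge()
        if (sliceCounts.size() > maxSlices) {
            rangeHitCount -= sliceCounts.removeFirst(); // unmerge() the oldest slice
        }
    }

    long totalHits() {
        return rangeHitCount;
    }

    // Average hits per slice over the slices currently inside the window.
    double averageHitsPerSlice() {
        return sliceCounts.isEmpty() ? 0.0 : (double) rangeHitCount / sliceCounts.size();
    }
}
```

With a window of 3 slices, adding 10, 20 and 30 gives a total of 60; adding 40 then drops the oldest slice and the total becomes 90. Each update touches only the newest and the oldest slice, so the cost of refreshing the range value does not depend on the window length.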

November 23, 2010

by Ricky Ho

· 17,286 Views · 1 Like

Leader vs Ruler

When I was trying to search for "leaders vs. rulers" on Google, I found many references to governments, royalty, and the military, throughout history. But the strange thing is that none of the articles seemed to distinguish between leaders and rulers. As if leaders and rulers are the same kind of people. They are not.

Leaders

Last week I was reading the book Tribes, by Seth Godin. In his book Seth says that never in history has it been so easy for anyone to be a leader. These days, with the use of social media, each of us is able to attract our own followers. And on Twitter, this is exactly what we're doing (quite literally). Seth explains that a crowd becomes a tribe when it has a leader that the people are following out of their own free will. And the interesting thing is that people can follow different leaders for different causes.

In software projects it is the same. Some people can take the lead on an architectural level, while some have the lead on a functional level. Still others may be the first ones to turn to when people need advice about tools or processes. A complex system does not need a single leader. In fact, I believe a cross-functional team functions best when it has multiple leaders, each with his own area(s) of interest.

Rulers

In social systems the rulers are of an entirely different breed. While leaders use the power of attraction to convince people what to do, rulers use the power of authority to tell people what to do. Ruling people's lives is the very purpose of the ruler's job. With ruling comes law-making, enforcement and sanctioning, also called the trias politica (legislature, executive, judiciary).

Unfortunately, rulers have gotten a bit of a bad name over the centuries. (Much of it deserved, by the way.) But ruling isn't all that bad. Laws, enforcement and sanctions are necessary evils, and in many social systems rulers can peacefully co-exist with leaders.
For example: in any football (or soccer) match you will find leaders (one in each team) and rulers (the referees). They all play their parts in making the game work for everyone.

Are managers rulers?

There's no doubt in my mind that managers are rulers. They are (usually) the only ones with the authority to hire and fire people, and to place them in (or remove them from) teams or departments. They are able to tell people what software to use, what clothes to wear, and how much to pay for a space in the parking lot.

Are managers leaders?

This is a more interesting question. Lots of management books have been trying hard to turn managers into leaders. The last one I read was Good to Great, by Jim Collins. In his book Jim listed a 5-level hierarchy:

Level 5 Executive: Builds enduring greatness through a paradoxical blend of personal humility and professional will.
Level 4 Effective Leader: Catalyzes commitment to and vigorous pursuit of a clear and compelling vision, stimulating higher performance standards.
Level 3 Manager: Organizes people and resources toward the effective and efficient pursuit of pre-determined objectives.
Level 2 Contributing Team Member: Contributes individual capabilities to the achievement of group objectives and works effectively with others in a group setting.
Level 1 Highly Capable Individual: Makes productive contributions through talent, knowledge, skills, and good work habits.

The problem I have with Jim's hierarchy is that it suggests a linear progression to "higher" levels (where a leader is on a "higher" level than a manager). This doesn't fit with my observations of how social networks operate. In a software project, or any other social network, there can be many leaders, each with his or her own goals and desires.
Some are taking initiatives for better architectures, some are leading the way to better user interface design, and some are guiding their followers towards better customer service, better processes, better software tools, or better coffee.

To be a leader is not the next step for managers. It is the manager's job to give room to leaders.

There are thousands of leaders on Twitter, and they all have their own huge numbers of followers. But who are the managers of Twitter? Only Evan Williams, Biz Stone and Jack Dorsey are. It's their platform. It's their game. They are the referees, making the laws, enforcing them, and sanctioning, while thousands of leaders and tribes are running around trying to score.

Sure, it's ok when managers are trying to be leaders. Nothing wrong with that. Evan, Biz and Jack have a large number of followers themselves too. But they don't have the largest tribes. Managers are on top of things, but they are not on top.

Rulers don't need to have the largest tribes themselves. Being a great ruler is hard enough already. If you think you need to be a great leader too, you're just making it hard for yourself. Referees contribute to great football/soccer games by being great rulers. They don't attempt to lead. It's not their job. They are in charge, but they are not the ones with the biggest egos.

In his presentation Step Back from Chaos Jonathan Whitty shows that managers are often not the hubs in a social network. It's the informal leaders in a network through which most of the communication flows. It's the managers' job to make sure that leadership is cultivated, and that the emerging leaders are following the rules.

So, you can be a leader, or you can be a ruler. And if you're exceptionally talented, perhaps you can be both. Which one will you be?

April 28, 2009

by Jurgen Appelo

· 6,653 Views