IoT Resources

The Latest IoT Topics

MotionSense Not-E-FYE: Motion Activated Pushbullet Notification to Smartphone

Create a motion-activated Pushbullet Notification to your smartphone.

June 28, 2015

by Anupam Datta

· 812,865 Views · 1 Like

Alpnames breaks 500,000 domain names under management

Alpnames becomes the second largest registrar for new domain endings Wednesday 24th June 2015, Gibraltar: Alpnames smashed through the half million mark last night in terms of domain under management. As an ICANN accredited registrar specialising in new generic Top Level Domains, Alpnames manages nearly 8% of all new gTLD names registered to date making it the second most popular new gTLD registrar after the industry giant GoDaddy and recently overtaking the first and longest standing registrar Network Solutions (part of web.com). “We are delighted with the extraordinary growth of Alpnames” said Damon Barnard, Chief Operations Officer at Alpnames. “Our decision to focus on bringing competitively priced new domain extensions to the attention of Internet users, who might be frustrated at the lack of affordable domain names under .COM, is clearly the right one” There is currently a massive change happening in the Internet Name space meaning that domain names can now end in almost any series of letters such as .science, .party or .club. ICANN, the body that regulates names on the internet, relaxed the rules on internet domain extensions in 2013 in order to promote competition, innovation and consumer choice. Currently, over 5.75 million domain names have been registered in nearly 640 new domain extensions. These so called new generic Top Level Domains (gTLDs) will form the backbone of the next generation of the Internet, opening up the web to a whole host of new possibilities. “Alpnames is proud to be a leading player in the new era of domain names” continued Barnard. “We intend to continue bringing new domain names to Internet users at affordable prices; names that help define communities of interest online and assist Internet users in navigating to websites that provide the content or interaction that they are looking for.” With more than 500,000 web addresses under management, Alpnames is one of the fastest growing domain name registrars in history.

June 24, 2015

by Fran Cator

· 1,013 Views

Business leaders today are becoming more aware of the many ways social knowledge networks can bring value to an organization. Communication is enhanced, knowledge is organized and accessible, products and customer service are improved, and business culture is strengthened through team collaboration. However, what many business leaders are failing to remember is that finding the right technology is only the first step in implementing a successful social knowledge network. The technology must be paired with smart planning and a well-thought-out structure. Elizabeth Lupfer, social collaboration and HR expert, claims, “A social knowledge network is one part technology and two parts process. The technology is only an enabler, and may only be worth 20 percent of the total value of the intranet.” The other 80 percent of value is created through effective structure and long-lasting employee engagement. We have researched and created a 10-step social knowledge network success plan that if implemented properly can maximize the value of any social knowledge network. Have a plan. Taking the time to create a social knowledge network implementation plan for your organization ensures managers and employees are motivated and committed to maximizing the value of the network. The network will only become as valuable as managers and employees plan to make it. Setting expectations for productivity and engagement ahead of time is important. Give employees a detailed explanation of how the social knowledge network connects to the organizational purpose. Communicate the value and purpose of the social knowledge network. Employees must clearly understand how the social knowledge network will contribute to the organizational purpose, long term and short term goals of the organization, and their individual role. Set up the organization and structure of the intranet at the beginning. Understand the priorities of each department, region, and team within your organization and how they connect. Who communicates with whom the most? What knowledge is of most value to each team? Your social knowledge network structure should enhance easy collaboration between individuals who value the information being shared. Thinking about the structure of your social knowledge network in advance will allow for easier search capabilities in the network and less time wasted sifting through information that is not relevant. Define roles and expectations around engagement for employees. Collaboration can be encouraged by asking employees to share projects and knowledge on the social knowledge network so that all team members are on the same page and no question is asked twice. Create and announce objectives with time-specific deadlines. It is important to create quantitative and time-specific objectives for your social knowledge network, such as “60 percent of users will have contributed some knowledge to our social knowledge network by June 30th.” Creating measurable engagement expectations will encourage employees to actively work towards the objective and increase network participation toward a common goal. Assign key social knowledge network ambassadors to start and maintain collaboration. Have people in your organization serve as the “ambassadors” of the social knowledge network. The ambassadors will be the network’s support, taking care of employee questions and concerns about the social knowledge network and asking the correct individuals for answers when they don’t know the answer themselves. Get the CEO and top level management involved. CEO and top-level involvement in a social knowledge network is very valuable, as employees pay special attention to what leaders have to share. Top level management should post openly about company activities to keep employees inspired about the broader picture of the organization. Management contributions should push brand, company culture, and core values. Industry and competitor news, trends, and expectations should be shared by management as well. Determine and reward achievement of Key Performance Indicators. Financial and non-financial benefits are important ways to encourage employee engagement in your social knowledge network. Gift cards, time off, or simple acknowledgment given to active users are good ways to inspire employees to participate more. Rewards given to the most active team or department will encourage group involvement and can be a great way to spark participation. Upload company documents and shared files. Ensure that your social knowledge network is the main place to access information, forms, presentations, and documents of all kinds. Keep all documents updated and organized so that they are accurate and searchable within the platform. Onboarding information, required customer forms, sales presentations, marketing tools, company calendars, and price lists should all be easily accessible on your social knowledge network. This will immediately spark network participation because these documents are needed by many people in your organization who will now access them through your social knowledge network. Be social! Although your social knowledge network should be a valuable business tool, there are many benefits of adding social, non-business related activity. Social activity on your network will create increased openness, allowing employees to get to know each other better and become more comfortable with sharing thoughts and opinions. Adding personal employee profile contributions, weekly peer-to-peer shout-outs for special accomplishments, employee group activities outside of work, or even posting something humorous once a week can help your social knowledge network become more open and fun, drawing employees back to check and use the network more often. Finding the right social knowledge network software for your organization is only the start. Research from the 2014 Social Business Global Executive Study shows only 17 percent of respondents see their organization as having mature social business practices. Following these steps will help you successfully integrate your social knowledge network into the flow of work. Once your organization becomes a true social business, you’ll see measurable value in improved innovation, talent management, and operations. If you are interested in learning more about how a knowledge base can break down silos within your organization, check out our white paper, Proving the Value: Getting Internal Buy-In for a Knowledge Base. Like this post? Click here to subscribe to our blog and receive the latest content on social learning, customer support, sales enablement, or all three.

June 24, 2015

by Bloomfire Marketing

· 1,046 Views

8 Key Findings About IoT Development

IoT is really hot, but can also be a bit confusing. Read about these 8 development key findings.

June 24, 2015

by Burke Holland

· 1,911 Views

ParStream to Present Requirements of an Analytics Platform for IoT at the TDWI Munich Conference 2015

COLOGNE, Germany – June 22, 2015 – ParStream, the IoT analytics company, today announced its participation at the TDWI Munich Conference 2015, one of the largest gatherings of expert Business Intelligence, Big Data and data warehousing leaders and educators in Europe. The conference will take place June 22-24, 2015 at the MOC Order and Event Center in Munich, Germany. Albert Aschauer, Sales Director DACH at ParStream, will present on requirements for an analytics platform for the Internet of Things (IoT) based on real-world use cases from the renewable energy and telecommunications industries. Big Data, fast data, edge analytics and real-time insights are driving new technology innovation to meet the demand for getting more value from IoT data. Additional details on the speaking session are below. What: “Requirements of an Analytics Platform for the Internet of Things” When: Monday, June 22, 2015 at 11:35 a.m. CEST Who: Albert Aschauer, Sales Director DACH at ParStream Where: MOC Munich, Germany – Room F112 To schedule a one-on-one meeting with Albert Aschauer and ParStream at TDWI Munich Conference 2015, send an email to events(at)parstream(dot)com.

June 22, 2015

by Fran Cator

· 1,121 Views

Make Your IoT Gateway WiFi-Aware Using Camel and Kura

The common scenario for the mobile IoT Gateways is to cache collected data locally on the device storage and synchronizing the data with the data center.

May 9, 2015

by Henryk Konsek

· 8,156 Views

Internet of Things MQTT Quality of Service Levels

At Red Hat's Virtual Event, Building Data-driven Solutions for the Internet of Things, Kenneth Peeples spoke about connecting to the IoT with the MQTT protocol.

April 20, 2015

by Kenneth Peeples

· 12,318 Views

Using Apache Kafka for Integration and Data Processing Pipelines with Spring

written by josh long on the spring blog applications generated more and more data than ever before and a huge part of the challenge - before it can even be analyzed - is accommodating the load in the first place. apache’s kafka meets this challenge. it was originally designed by linkedin and subsequently open-sourced in 2011. the project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. the design is heavily influenced by transaction logs. it is a messaging system, similar to traditional messaging systems like rabbitmq, activemq, mqseries, but it’s ideal for log aggregation, persistent messaging, fast (_hundreds_ of megabytes per second!) reads and writes, and can accommodate numerous clients. naturally, this makes it perfect for cloud-scale architectures! kafka powers many large production systems . linkedin uses it for activity data and operational metrics to power the linkedin news feed, and linkedin today, as well as offline analytics going into hadoop. twitter uses it as part of their stream-processing infrastructure. kafka powers online-to-online and online-to-offline messaging at foursquare. it is used to integrate foursquare monitoring and production systems with hadoop-based offline infrastructures. square uses kafka as a bus to move all system events through square’s various data centers. this includes metrics, logs, custom events, and so on. on the consumer side, it outputs into splunk, graphite, or esper-like real-time alerting. netflix uses it for 300-600bn messages per day. it’s also used by airbnb, mozilla, goldman sachs, tumblr, yahoo, paypal, coursera, urban airship, hotels.com, and a seemingly endless list of other big-web stars. clearly, it’s earning its keep in some powerful systems! installing apache kafka there are many different ways to get apache kafka installed. if you’re on osx, and you’re using homebrew, it can be as simple as brew install kafka . you can also download the latest distribution from apache . i downloaded kafka_2.10-0.8.2.1.tgz , unzipped it, and then within you’ll find there’s a distribution of apache zookeeper as well as kafka, so nothing else is required. i installed apache kafka in my $home directory, under another directory, bin , then i created an environment variable, kafka_home , that points to $home/bin/kafka . start apache zookeeper first, specifying where the configuration properties file it requires is: $kafka_home/bin/zookeeper-server-start.sh $kafka_home/config/zookeeper.properties the apache kafka distribution comes with default configuration files for both zookeeper and kafka, which makes getting started easy. you will in more advanced use cases need to customize these files. then start apache kafka. it too requires a configuration file, like this: $kafka_home/bin/kafka-server-start.sh $kafka_home/config/server.properties the server.properties file contains, among other things, default values for where to connect to apache zookeeper ( zookeeper.connect ), how much data should be sent across sockets, how many partitions there are by default, and the broker id ( broker.id - which must be unique across a cluster). there are other scripts in the same directory that can be used to send and receive dummy data, very handy in establishing that everything’s up and running! now that apache kafka is up and running, let’s look at working with apache kafka from our application. some high level concepts.. a kafka broker cluster consists of one or more servers where each may have one or more broker processes running. apache kafka is designed to be highly available; there are no master nodes. all nodes are interchangeable. data is replicated from one node to another to ensure that it is still available in the event of a failure. in kafka, a topic is a category, similar to a jms destination or both an amqp exchange and queue. topics are partitioned, and the choice of which of a topic’s partition a message should be sent to is made by the message producer. each message in the partition is assigned a unique sequenced id, its offset . more partitions allow greater parallelism for consumption, but this will also result in more files across the brokers. producers send messages to apache kafka broker topics and specify the partition to use for every message they produce. message production may be synchronous or asynchronous. producers also specify what sort of replication guarantees they want. consumers listen for messages on topics and process the feed of published messages. as you’d expect if you’ve used other messaging systems, this is usually (and usefully!) asynchronous. like spring xd and numerous other distributed system, apache kafka uses apache zookeeper to coordinate cluster information. apache zookeeper provides a shared hierarchical namespace (called znodes ) that nodes can share to understand cluster topology and availability (yet another reason that spring cloud has forthcoming support for it..). zookeeper is very present in your interactions with apache kafka. apache kafka has, for example, two different apis for acting as a consumer. the higher level api is simpler to get started with and it handles all the nuances of handling partitioning and so on. it will need a reference to a zookeeper instance to keep the coordination state. let’s turn now turn to using apache kafka with spring. using apache kafka with spring integration the recently released apache kafka 1.1 spring integration adapter is very powerful, and provides inbound adapters for working with both the lower level apache kafka api as well as the higher level api. the adapter, currently, is xml-configuration first, though work is already underway on a spring integration java configuration dsl for the adapter and milestones are available. we’ll look at both here, now. to make all these examples work, i added the libs-milestone-local maven repository and used the following dependencies: org.apache.kafka:kafka_2.10:0.8.1.1 org.springframework.boot:spring-boot-starter-integration:1.2.3.release org.springframework.boot:spring-boot-starter:1.2.3.release org.springframework.integration:spring-integration-kafka:1.1.1.release org.springframework.integration:spring-integration-java-dsl:1.1.0.m1 using the spring integration apache kafka with the spring integration xml dsl first, let’s look at how to use the spring integration outbound adapter to send message instances from a spring integration flow to an external apache kafka instance. the example is fairly straightforward: a spring integration channel named inputtokafka acts as a conduit that forwards message messages to the outbound adapter, kafkaoutboundchanneladapter . the adapter itself can take its configuration from the defaults specified in the kafka:producer-context element or it from the adapter-local configuration overrides. there may be one or many configurations in a given kafka:producer-context element. here’s the java code from a spring boot application to trigger message sends using the outbound adapter by sending messages into the incoming inputtokafka messagechannel . package xml; import org.apache.commons.logging.log; import org.apache.commons.logging.logfactory; import org.springframework.beans.factory.annotation.qualifier; import org.springframework.boot.commandlinerunner; import org.springframework.boot.springapplication; import org.springframework.boot.autoconfigure.springbootapplication; import org.springframework.context.annotation.bean; import org.springframework.context.annotation.dependson; import org.springframework.context.annotation.importresource; import org.springframework.integration.config.enableintegration; import org.springframework.messaging.messagechannel; import org.springframework.messaging.support.genericmessage; @springbootapplication @enableintegration @importresource("/xml/outbound-kafka-integration.xml") public class demoapplication { private log log = logfactory.getlog(getclass()); @bean @dependson("kafkaoutboundchanneladapter") commandlinerunner kickoff(@qualifier("inputtokafka") messagechannel in) { return args -> { for (int i = 0; i < 1000; i++) { in.send(new genericmessage<>("#" + i)); log.info("sending message #" + i); } }; } public static void main(string args[]) { springapplication.run(demoapplication.class, args); } } using the new apache kafka spring integration java configuration dsl shortly after the spring integration 1.1 release, spring integration rockstar artem bilan got to work on adding a spring integration java configuration dsl analog and the result is a thing of beauty! it’s not yet ga (you need to add the libs-milestone repository for now), but i encourage you to try it out and kick the tires. it’s working well for me and the spring integration team are always keen on getting early feedback whenever possible! here’s an example that demonstrates both sending messages and consuming them from two different integrationflow s. the producer is similar to the example xml above. new in this example is the polling consumer. it is batch-centric, and will pull down all the messages it sees at a fixed interval. in our code, the message received will be a map that contains as its keys the topic and as its value another map with the partition id and the batch (in this case, of 10 records), of records read. there is a messagelistenercontainer -based alternative that processes messages as they come. package jc; import org.apache.commons.logging.log; import org.apache.commons.logging.logfactory; import org.springframework.beans.factory.annotation.autowired; import org.springframework.beans.factory.annotation.qualifier; import org.springframework.beans.factory.annotation.value; import org.springframework.boot.commandlinerunner; import org.springframework.boot.springapplication; import org.springframework.boot.autoconfigure.springbootapplication; import org.springframework.context.annotation.bean; import org.springframework.context.annotation.configuration; import org.springframework.context.annotation.dependson; import org.springframework.integration.integrationmessageheaderaccessor; import org.springframework.integration.config.enableintegration; import org.springframework.integration.dsl.integrationflow; import org.springframework.integration.dsl.integrationflows; import org.springframework.integration.dsl.sourcepollingchanneladapterspec; import org.springframework.integration.dsl.kafka.kafka; import org.springframework.integration.dsl.kafka.kafkahighlevelconsumermessagesourcespec; import org.springframework.integration.dsl.kafka.kafkaproducermessagehandlerspec; import org.springframework.integration.dsl.support.consumer; import org.springframework.integration.kafka.support.zookeeperconnect; import org.springframework.messaging.messagechannel; import org.springframework.messaging.support.genericmessage; import org.springframework.stereotype.component; import java.util.list; import java.util.map; /** * demonstrates using the spring integration apache kafka java configuration dsl. * thanks to spring integration ninja artem bilan * for getting the java configuration dsl working so quickly! * * @author josh long */ @enableintegration @springbootapplication public class demoapplication { public static final string test_topic_id = "event-stream"; @component public static class kafkaconfig { @value("${kafka.topic:" + test_topic_id + "}") private string topic; @value("${kafka.address:localhost:9092}") private string brokeraddress; @value("${zookeeper.address:localhost:2181}") private string zookeeperaddress; kafkaconfig() { } public kafkaconfig(string t, string b, string zk) { this.topic = t; this.brokeraddress = b; this.zookeeperaddress = zk; } public string gettopic() { return topic; } public string getbrokeraddress() { return brokeraddress; } public string getzookeeperaddress() { return zookeeperaddress; } } @configuration public static class producerconfiguration { @autowired private kafkaconfig kafkaconfig; private static final string outbound_id = "outbound"; private log log = logfactory.getlog(getclass()); @bean @dependson(outbound_id) commandlinerunner kickoff( @qualifier(outbound_id + ".input") messagechannel in) { return args -> { for (int i = 0; i < 1000; i++) { in.send(new genericmessage<>("#" + i)); log.info("sending message #" + i); } }; } @bean(name = outbound_id) integrationflow producer() { log.info("starting producer flow.."); return flowdefinition -> { consumer spec = (kafkaproducermessagehandlerspec.producermetadataspec metadata)-> metadata.async(true) .batchnummessages(10) .valueclasstype(string.class) .valueencoder(string::getbytes); kafkaproducermessagehandlerspec messagehandlerspec = kafka.outboundchanneladapter( props -> props.put("queue.buffering.max.ms", "15000")) .messagekey(m -> m.getheaders().get(integrationmessageheaderaccessor.sequence_number)) .addproducer(this.kafkaconfig.gettopic(), this.kafkaconfig.getbrokeraddress(), spec); flowdefinition .handle(messagehandlerspec); }; } } @configuration public static class consumerconfiguration { @autowired private kafkaconfig kafkaconfig; private log log = logfactory.getlog(getclass()); @bean integrationflow consumer() { log.info("starting consumer.."); kafkahighlevelconsumermessagesourcespec messagesourcespec = kafka.inboundchanneladapter( new zookeeperconnect(this.kafkaconfig.getzookeeperaddress())) .consumerproperties(props -> props.put("auto.offset.reset", "smallest") .put("auto.commit.interval.ms", "100")) .addconsumer("mygroup", metadata -> metadata.consumertimeout(100) .topicstreammap(m -> m.put(this.kafkaconfig.gettopic(), 1)) .maxmessages(10) .valuedecoder(string::new)); consumer endpointconfigurer = e -> e.poller(p -> p.fixeddelay(100)); return integrationflows .from(messagesourcespec, endpointconfigurer) .>>handle((payload, headers) -> { payload.entryset().foreach(e -> log.info(e.getkey() + '=' + e.getvalue())); return null; }) .get(); } } public static void main(string[] args) { springapplication.run(demoapplication.class, args); } } the example makes heavy use of java 8 lambdas. the producer spends a bit of time establishing how many messages will be sent in a single send operation, how keys and values are encoded (kafka only knows about byte[] arrays, after all) and whether messages should be sent synchronously or asynchronously. in the next line, we configure the outbound adapter itself and then define an integrationflow such that all messages get sent out via the kafka outbound adapter. the consumer spends a bit of time establishing which zookeeper instance to connect to, how many messages to receive (10) in a batch, etc. once the message batches are recieved, they’re handed to the handle method where i’ve passed in a lambda that’ll enumerate the payload’s body and print it out. nothing fancy. using apache kafka with spring xd apache kafka is a message bus and it can be very powerful when used as an integration bus. however, it really comes into its own because it’s fast enough and scalable enough that it can be used to route big-data through processing pipelines. and if you’re doing data processing, you really want spring xd ! spring xd makes it dead simple to use apache kafka (as the support is built on the apache kafka spring integration adapter!) in complex stream-processing pipelines. apache kafka is exposed as a spring xd source - where data comes from - and a sink - where data goes to. spring xd exposes a super convenient dsl for creating bash -like pipes-and-filter flows. spring xd is a centralized runtime that manages, scales, and monitors data processing jobs. it builds on top of spring integration, spring batch, spring data and spring for hadoop to be a one-stop data-processing shop. spring xd jobs read data from sources , run them through processing components that may count, filter, enrich or transform the data, and then write them to sinks. spring integration and spring xd ninja marius bogoevici , who did a lot of the recent work in the spring integration and spring xd implementation of apache kafka, put together a really nice example demonstrating how to get a full working spring xd and kafka flow working . the readme walks you through getting apache kafka, spring xd and the requisite topics all setup. the essence, however, is when you use the spring xd shell and the shell dsl to compose a stream. spring xd components are named components that are pre-configured but have lots of parameters that you can override with --.. arguments via the xd shell and dsl. (that dsl, by the way, is written by the amazing andy clement of spring expression language fame!) here’s an example that configures a stream to read data from an apache kafka source and then write the message a component called log , which is a sink. log , in this case, could be syslogd, splunk, hdfs, etc. xd> stream create kafka-source-test --definition "kafka --zkconnect=localhost:2181 --topic=event-stream | log"--deploy and that’s it! naturally, this is just a tase of spring xd, but hopefully you’ll agree the possibilities are tantalizing. deploying a kafka server with lattice and docker it’s easy to get an example kafka installation all setup using lattice , a distributed runtime that supports, among other container formats, the very popular docker image format. there’s a docker image provided by spotify that sets up a collocated zookeeper and kafka image . you can easily deploy this to a lattice cluster, as follows: ltc create --run-as-root m-kafka spotify/kafka from there, you can easily scale the apache kafka instances and even more easily still consume apache kafka from your cloud-based services. next steps you can find the code for this blog on my github account . we’ve only scratched the surface! if you want to learn more (and why wouldn’t you?), then be sure to check out marius bogoevici and dr. mark pollack’s upcoming webinar on reactive data-pipelines using spring xd and apache kafka where they’ll demonstrate how easy it can be to use rxjava, spring xd and apache kafka!

April 18, 2015

by Pieter Humphrey

· 29,145 Views

Big Data Processing in Spark

In the traditional 3-tier architecture, data processing is performed by the application server where the data itself is stored in the database server. Application server and database server are typically two different machine. Therefore, the processing cycle proceeds as follows Application server send a query to the database server to retrieve the necessary data Application server perform processing on the received data Application server will save the changed data to the database server In the traditional data processing paradigm, we move data to the code. It can be depicted as follows ... Then big data phenomenon arrives. Because the data volume is huge, it cannot be hold by a single database server. Big data is typically partitioned and stored across many physical DB server machines. On the other hand, application servers need to be added to increase the processing power of big data. However, as we increase the number of App servers and DB servers for storing and processing the big data, more data need to be transfer back and forth across the network during the processing cycle, up to a point where the network becomes a major bottleneck. Moving Code to Data To overcome the network bottleneck, we need a new computing paradigm. Instead of moving data to the code, we move the code to the data and perform the processing at where the data is stored. Notice the change of the program structure The program execution starts at a driver, which orchestrate the execution happening remotely across many worker servers within a cluster. Data is no longer transferred to the driver program, the driver program holds a data reference in its variable rather than the data itself. The data reference is basically an id to locate the corresponding data residing in the database server Code is shipped from the program to the database server, where the execution is happening, and data is modified at the database server without leaving the server machine. Finally the program request a save of the modified data. Since the modified data resides in the database server, no data transfer happens over the network. By moving the code to the data, the volume of data transfer over network is significantly reduced. This is an important paradigm shift for big data processing. In the following session, I will use Apache Spark to illustrate how this big data processing paradigm is implemented. RDD Resilient Distributed Dataset (RDD) is how Spark implements the data reference concept. RDD is a logical reference of a dataset which is partitioned across many server machines in the cluster. To make a clear distinction between data reference and data itself, a Spark program is organized as a sequence of execution steps, which can either be a "transformation" or an "action". Programming Model A typical program is organized as follows From an environment variable "context", create some initial data reference RDD objects Transform initial RDD objects to create more RDD objects. Transformation is expressed in terms of functional programming where a code block is shipped from the driver program to multiple remote worker server, which hold a partition of the RDD. Variable appears inside the code block can either be an item of the RDD or a local variable inside the driver program which get serialized over to the worker machine. After the code (and the copy of the serialized variables) is received by the remote worker server, it will be executed there by feeding the items of RDD residing in that partition. Notice that the result of a transformation is a brand new RDD (the original RDD is not mutated) Finally, the RDD object (the data reference) will need to be materialized. This is achieved through an "action", which will dump the RDD into a storage, or return its value data to the driver program. Here is a word count example # Get initial RDD from the context file = spark.textFile("hdfs://...") # Three consecutive transformation of the RDD counts = file.flatMap(lambda line: line.split(" ")) .map(lambda word: (word, 1)) .reduceByKey(lambda a, b: a + b) # Materialize the RDD using an action counts.saveAsTextFile("hdfs://...") When the driver program starts its execution, it builds up a graph where nodes are RDD and edges are transformation steps. However, no execution is happening at the cluster until an action is encountered. At that point, the driver program will ship the execution graph as well as the code block to the cluster, where every worker server will get a copy. The execution graph is a DAG. Each DAG is a atomic unit of execution. Each source node (no incoming edge) is an external data source or driver memory Each intermediate node is a RDD Each sink node (no outgoing edge) is an external data source or driver memory Green edge connecting to RDD represents a transformation. Red edge connecting to a sink node represents an action Data Shuffling Although we ship the code to worker server where the data processing happens, data movement cannot be completely eliminated. For example, if the processing requires data residing in different partitions to be grouped first, then we need to shuffle data among worker server. Spark carefully distinguish "transformation" operation in two types. "Narrow transformation" refers to the processing where the processing logic depends only on data that is already residing in the partition and data shuffling is unnecessary. Examples of narrow transformation includes filter(), sample(), map(), flatMap() .... etc. "Wide transformation" refers to the processing where the processing logic depends on data residing in multiple partitions and therefore data shuffling is needed to bring them together in one place. Example of wide transformation includes groupByKey(), reduceByKey() ... etc. Joining two RDD can also affect the amount of data being shuffled. Spark provides two ways to join data. In a shuffle join implementation, data of two RDD with the same key will be redistributed to the same partition. In other words, each of the items in each RDD will be shuffled across worker servers. Beside shuffle join, Spark provides another alternative call broadcast join. In this case, one of the RDD will be broadcasted and copied over to every partition. Imagine the situation when one of the RDD is significantly smaller relative to the other, then broadcast join will reduce the network traffic because only the small RDD need to be copied to all worker servers while the large RDD doesn't need to be shuffled at all. In some cases, transformation can be re-ordered to reduce the amount of data shuffling. Below is an example of a JOIN between two huge RDDs followed by a filtering. Plan1 is a naive implementation which follows the given order. It first join the two huge RDD and then apply the filter on the join result. This ends up causing a big data shuffling because the two RDD is huge, even though the result after filtering is small. Plan2 offers a smarter way by using the "push-down-predicate" technique where we first apply the filtering in both RDDs before joining them. Since the filtering will reduce the number of items of each RDD significantly, the join processing will be much cheaper. Execution Planning As explain above, data shuffling incur the most significant cost in the overall data processing flow. Spark provides a mechanism that generate an execute plan from the DAG that minimize the amount of data shuffling. Analyze the DAG to determine the order of transformation. Notice that we starts from the action (terminal node) and trace back to all dependent RDDs. To minimize data shuffling, we group the narrow transformation together in a "stage" where all transformation tasks can be performed within the partition and no data shuffling is needed. The transformations becomes tasks that are chained together within a stage Wide transformation sits at the boundary of two stages, which requires data to be shuffled to a different worker server. When a stage finishes its execution, it persist the data into different files (one per partition) of the local disks. Worker nodes of the subsequent stage will come to pickup these files and this is where data shuffling happens Below is an example how the execution planning turns the DAG into an execution plan involving stages and tasks. Reliability and Fault Resiliency Since the DAG defines a deterministic transformation steps between different partitions of data within each RDD RDD, fault recovery is very straightforward. Whenever a worker server crashes during the execution of a stage, another worker server can simply re-execute the stage from the beginning by pulling the input data from its parent stage that has the output data stored in local files. In case the result of the parent stage is not accessible (e.g. the worker server lost the file), the parent stage need to be re-executed as well. Imagine this is a lineage of transformation steps, and any failure of a step will trigger a restart of execution from its last step. Since the DAG itself is an atomic unit of execution, all the RDD values will be forgotten after the DAG finishes its execution. Therefore, after the driver program finishes an action (which execute a DAG to its completion), all the RDD value will be forgotten and if the program access the RDD again in subsequent statement, the RDD needs to be recomputed again from its dependents. To reduce this repetitive processing, Spark provide a caching mechanism to remember RDDs in worker server memory (or local disk). Once the execution planner finds the RDD is already cache in memory, it will use the RDD right away without tracing back to its parent RDDs. This way, we prune the DAG once we reach an RDD that is in the cache. Overall speaking, Apache Spark provides a powerful framework for big data processing. By the caching mechanism that holds previous computation result in memory, Spark out-performs Hadoop significantly because it doesn't need to persist all the data into disk for each round of parallel processing. Although it is still very new, I think Spark will take off as the main stream approach to process big data.

March 13, 2015

by Ricky Ho

· 23,522 Views · 2 Likes

The 100% Utilization Myth

Many organizations in which I've coached are concerned when the people on their teams aren't being "utilized" at 100% of their capacity. If someone is at work for 8 hours per day, minus an hour for lunch and breaks, that's 7 hours of potential capacity. Some organizations are progressive enough to see that the organization's overhead of administrative activities lowers that value to 6 hours per day. By extension, a team's capacity is simply a multiple of the number of team members and the number of hours available to be utilized, i.e. a team of 5 has 30 person-hours per day. So, anything less than 30 person-hours per day spent on tasks means that a team isn't being utilized 100%. Which is bad, apparently. A Thought Exercise If you're reading this, you have a computer and it's connected to a network. Your computer might be a small one that makes phone calls or a large one that can also handle the complex scenery of games at 60 frames per second without breaking a sweat. Regardless, it's a computer. Thanks to John von Neumann, almost all computers today have the same basic architecture, the heart of which is the Central Processing Unit or CPU. When the CPU on your computer, regardless of its size, reaches 100% utilization, what happens? Your computer slows down. Significantly. The CPU is saturated with instructions and does its best to process them, but it has a finite limit on how quickly it can do that. There's a finite limit to the speed at which the instructions and the data they use can be passed around among the internal components of the CPU. That processor just can't go any faster. To a person using the computer, it appears to be locked up. To see this in action, try running a modern game like Faster Than Light on a 1st generation iPad. The older processor can't keep up with the computing demand from a game targeted towards more a powerful CPU. The communications network to which your computer is connected might be wireless or have a "hard" connection, but behind the scenes the public internet is dominated by TCP/IP and Ethernet. What happens when the network to which your computer is connected is running at 100% capacity? The network slows down. Significantly. The equipment moving the packets of data about will experience numerous collisions and will have to send requests back to your computer to resend data. The equipment will also begin to simply "drop" packets because it can't process them quickly enough. As far as you're concerned the network has become either very slow or has simply stopped. To see this in real life, look at what happens to a web site when it's the subject of a Denial of Service (DoS) attack, which effectively saturates that site's network with data requests. The site either crashes completely or appears to be unresponsive because it can't handle the traffic. Knowledge work such as software development requires a huge amount of thinking (processing) and a huge amount of collaboration (communication). The human brain is a processor, and like the von Neumann architecture has subcomponents (lobes), working memory, etc. What happens when our processor hits 100% utilization? We simply can't process all the inputs fast enough and our ability to process information slows down dramatically. We even reach a state where the decision-making centre in our brain shuts itself down. Coupled with a decreasing ability to process information and make decisions, we experience increasing levels of anxiety. Now, let's look at teams. They represent a communications network, with the people being the nodes in the network. When the team reaches 100% utilization, the individual processors (the brains of the people!) begin to slow as the ability to process information diminishes and anxiety increases. This has the effect of slowing the team's ability to communicate and operate effectively. When one or multiple people reach the point of shutting down, the network collapses. Why does anyone think that 100% utilization is a good thing? In manufacturing, 100% utilization of the capacity of your workers is actually desirable. The thinking aspects have already been done in the design and engineering of the item being created, and the assembly process is infinitely repeatable until a new item comes along. This approach to the process was taken from the manufacturing world and applied to the development of software, with the thinking that assembly was done by the programmers after all the design and engineering had taken place. Well, that simply isn't true. The "assembly" of software isn't programming, it's the compilation and packing into something deployable. Writing software is a confluence of design and engineering that has creative and technical aspects. It's not an assembly line, and probably won't ever be within my lifetime. As long ago as the late 1950's, management guru Peter Drucker realized that there were people who "think for a living". He coined the term knowledge worker to describe that type of work. Developing software is a classic knowledge worker role and therefore has different rules for productivity. As Drucker said in his 1999 book Management Challenges of the 21st Century, Productivity of the knowledge worker is not — at least not primarily — a matter of the quantity of output. Quality is at least as important. If we're pushing individuals and teams to 100% capacity, the quality of work and therefore the productivity of the team will be reduced. In 2001, Tom deMarco released Slack: Getting Past Burnout, Busywork and the Myth of Total Efficiency, an entire book describing the problems created by trying to achieve 100% utilization. I highly recommend the book, and also recommend you give your manager a copy. :) Here are some selected quotes that are pertinent to reducing the load on our processor and communication network: Very successful companies have never struck me as particularly busy; in fact, they are, as a group, rather laid back. Energy is evident in the workplace, but it's not the energy tinged with fear that comes from being slightly behind on everything. The principal resource needed for invention is slack. When companies can't invent, it's usually because their people are too damn busy. People under time pressure don't think faster. - Tim Lister How Can We Deal With This? First and foremost, stop trying to do so damned much! This is truly a slow down to go faster type of solution. For a Team If you're an agile team using iterations or sprints, pull less work into a sprint even if your velocity says you should be able to pull more. The extra time can be used to increase the quality of the work that you do complete, which is in itself a motivating factor for knowledge workers. Increased quality means decreased rework, which means the team as a whole delivers faster over the long term. Similarly, the reduced anxiety and stress on the team members increases their ability to think about what they're doing, meaning better and more innovative solutions to problems. Also, ensure that teams are focusing on activities that contribute directly to their work. Meetings tend to be the worst offenders of this, including the meetings defined within agile processes. Some suggestions: No meeting can be scheduled unless there is a specific question to be answered or specific information to be passed along. Reduce the default meeting length from 1 hour to 30 minutes in whatever calendar tool you use. Even better, reduce it to 20 minutes. Choose blocks of time that are deemed meeting free zones. No one, regardless of their position in the org chart can schedule a meeting with the team or the team members during that time. Minimize meetings where people are on a phone. My observation is that people who call into meetings tend to speak longer than when they are in the room with everyone else. I know that I do this, and I see it often in others. For an Individual Learn to say, "No." At the very least, learn to say, "Not yet." It's really difficult to do this, and I speak from personal experience. We want to help others, we want to be team players. However, if we don't say no, we take on too much, our processing and communication ability suffers and we end up disappointing those we were trying to help! Also, learn to take breaks. Yes, take breaks. The same concepts from Slack apply here, with brief diversions clearing our mind and allowing us to work better afterwards. At the extreme, if you feel like you can't keep your eyes open, take a nap! I highly recommend the Pomodoro Technique as at least a starting point for giving yourself some slack. Step away from your work for just a few minutes and return recharged. Conclusion We don't want the CPU in our computer to be working at 100% and we don't want the communications network to which it's connected to be at 100% capacity either. Our brains are processors and teams are a communications network, so why would we want those to be 100% busy all the time? In fact, ensuring that teams and the people on them are always busy is a provably wrong approach for software development. It has it's roots in a manufacturing mindset and developing software is knowledge work. Software development requires substantially different ways to make teams and the individual people on them as productive as possible. Finally, there is no magic number for what percentage of utilization is best. People are variable and 90% utilization for a person might be fine on a given day while %30 is all that the same person can handle on the next. Don't aim for a number, aim for an environment where people know that they don't have to appear busy. That will leave plenty of capacity for each person's processor and their team's communications network to run smoothly and deliver real value more effectively.

March 12, 2015

by Dave Rooney

· 22,669 Views · 2 Likes

Retry-After HTTP Header in Practice

Retry-After is a lesser known HTTP response header.

February 20, 2015

by Tomasz Nurkiewicz

· 16,888 Views

(C# code snippet) How to create USB web camera viewer and stream to remote locations

in this brief tutorial you will learn how to develop a camera viewer application in c# that allows you to display the image of your usb webcam and to stream the camera image to remote pcs and smartphones. instead of presenting a long article, i would rather show how to implement such application with a few lines of c# code by using the prewritten components of a c# camera library. prerequisites a visual c# wpf application created in visual studio the voipsdk.dll added to the references. (it can be found on the official website of this c# camera library .) a media player supporting rtsp streaming (e.g. vlc) installed on a remote pc first of all let’s build the gui. if you follow the content of the mainwindow.xaml file line-by-line, you will see how to create user all the necessary gui elements that allows the user to be able to connect to a usb camera and display its image, and to set the listen address (including 2 textboxes for the ip address and the port number) that makes rtsp streaming possible. (the following figure illustrates the gui that can be created by using this code snippet.) in the mainwindow.xaml.cs file you will see how to implement the camera viewer functionality and how to turn your application as a video server. to test your application run the program, click the connect button, then when the camera image is displayed, enter the ipv4 address of your pc as listening address, and specify ’554’ as a port number. thereafter open the vlc media player on an other pc or smartphone, and open the network media stream by entering the following network url: rtsp://192.168.115.1:554 (that is: rtsp://youripv4address/portnumber). the result can be seen below: i hope my code snippet was useful! happy programming! // mainwindow.xaml // mainwindow.xaml.cs using system; using system.collections.generic; using system.linq; using system.runtime.interopservices; using system.text; using system.threading.tasks; using system.windows; using system.windows.controls; using system.windows.data; using system.windows.documents; using system.windows.input; using system.windows.media; using system.windows.media.imaging; using system.windows.navigation; using system.windows.shapes; using ozeki.media.ipcamera; using ozeki.media.mediahandlers; using ozeki.media.mediahandlers.video; using ozeki.media.mjpegstreaming; using ozeki.media.video.controls; namespace basic_cameraviewer { /// /// interaction logic for mainwindow.xaml /// public partial class mainwindow : window { private videoviewerwpf _videoviewerwpf; private bitmapsourceprovider _provider; private iipcamera _ipcamera; private webcamera _webcamera; private mediaconnector _connector; private myserver _server; private ivideosender _videosender; public mainwindow() { initializecomponent(); _connector = new mediaconnector(); _provider = new bitmapsourceprovider(); _server = new myserver(); setvideoviewer(); } private void setvideoviewer() { _videoviewerwpf = new videoviewerwpf { horizontalalignment = horizontalalignment.stretch, verticalalignment = verticalalignment.stretch, background = brushes.black }; camerabox.children.add(_videoviewerwpf); _videoviewerwpf.setimageprovider(_provider); } #region usb camera connect/disconnect private void connectusbcamera_click(object sender, routedeventargs e) { _webcamera = webcamera.getdefaultdevice(); if (_webcamera == null) return; _connector.connect(_webcamera, _provider); _videosender = _webcamera; _webcamera.start(); _videoviewerwpf.start(); } private void disconnectusbcamera_click(object sender, routedeventargs e) { if (_webcamera == null) return; _videoviewerwpf.stop(); _webcamera.stop(); _webcamera.dispose(); _connector.disconnect(_webcamera, _provider); } #endregion private void guithread(action action) { dispatcher.begininvoke(action); } private void startserver_click(object sender, routedeventargs e) { var ipadress = ipaddresstext.text; var port = int.parse(porttext.text); _server.videosender = _videosender; _server.onclientcountchanged += server_onclientcountchanged; _server.start(); _server.setlistenaddress(ipadress, port); } void server_onclientcountchanged(object sender, eventargs e) { guithread(() => { connectedclientlist.items.clear(); foreach (var client in _server.connectedclients) connectedclientlist.items.add("end point: " + client.transportinfo.remoteendpoint); }); } private void stopserver_click(object sender, routedeventargs e) { _server.onclientcountchanged -= server_onclientcountchanged; _server.stop(); } } }

November 19, 2014

by Timothy Walker

· 27,631 Views

Simple SecurePasswordVault in Java

There are some instances when you want to store your passwords in files to be used by programs or scripts. But storing your passwords in plain text is not a good idea. Use the SecurePasswordVault to encrypt your passwords before storing and get it decrypted when you want to use it. You can use the SecurePasswordVault described here to store any number of encrypted passwords. Passwords are stored as key value pairs. Key - any name given by the user for the password Value - encrypted password SecurePasswordVault will create a file with the given name in the working directory if it doesn't exist. If a file exists then the information in that file will be read. Passwords are encrypted using the MAC address of the network card. SecurePasswordVault will use the first network card MAC which is not the loop back interface. So the encrypted file can only be decrypted with that particular MAC address. If you want to reset the pass word details, just delete the password file and run the SecurePasswordVault. You can download the sample code from the following GitHub repository https://github.com/jsdjayanga/secure_password com.wso2.devgov; import org.bouncycastle.util.encoders.Base64; import javax.crypto.*; import javax.crypto.spec.SecretKeySpec; import java.io.*; import java.net.NetworkInterface; import java.net.SocketException; import java.security.InvalidKeyException; import java.security.NoSuchAlgorithmException; import java.security.Security; import java.util.*; /** * Created by jayanga on 3/31/14. */ public class SecurePasswordVault { private static final int AES_KEY_LEN = 32; private static final int PASSWORD_LEN = 256; private static boolean initialized; private final String secureFile; private final byte[] networkHardwareHaddress; private Map secureDataMap; private List secureDataList; SecretKeySpec secretKey; public SecurePasswordVault(String filename, String[] secureData) throws IOException { Security.addProvider(new org.bouncycastle.jce.provider.BouncyCastleProvider()); initialized = false; secureFile = filename; networkHardwareHaddress = SecurePasswordVault.readNetworkHardwareAddress(); secureDataMap = new HashMap(); this.secureDataList = new ArrayList(secureData.length); Collections.addAll(secureDataList, secureData); byte[] key = new byte[AES_KEY_LEN]; Arrays.fill(key, (byte)0); for(int index = 0; index < networkHardwareHaddress.length; index++){ key[index] = networkHardwareHaddress[index]; } secretKey = new SecretKeySpec(key, "AES"); if (!isInitialized()){ readSecureData(secureDataList); persistSecureData(); } readSecureDataFromFile(); } private boolean isInitialized(){ if (initialized == true){ return true; }else{ File file = new File(secureFile); if (file.exists()){ initialized = true; return initialized; } } return false; } private static byte[] readNetworkHardwareAddress() throws SocketException { Enumeration networkInterfaceEnumeration = NetworkInterface.getNetworkInterfaces(); if (networkInterfaceEnumeration != null){ NetworkInterface networkInterface = null; while (networkInterfaceEnumeration.hasMoreElements()){ networkInterface = networkInterfaceEnumeration.nextElement(); if (!networkInterface.isLoopback()){ break; } } if (networkInterface == null){ networkInterface = networkInterfaceEnumeration.nextElement(); } byte[] hwaddr = networkInterface.getHardwareAddress(); return hwaddr; }else{ throw new RuntimeException("Cannot initialize. Failed to generate unique id."); } } private byte[] encrypt(String word) { byte[] password = new byte[PASSWORD_LEN]; Arrays.fill(password, (byte)0); byte[] pw = new byte[0]; try { pw = word.getBytes("UTF-8"); for(int index = 0; index < pw.length; index++){ password[index] = pw[index]; } byte[] cipherText = new byte[password.length]; Cipher cipher = null; try { cipher = Cipher.getInstance("AES/ECB/NoPadding"); try { cipher.init(Cipher.ENCRYPT_MODE, secretKey); int ctLen = 0; try { ctLen = cipher.update(password, 0, password.length, cipherText, 0); ctLen += cipher.doFinal(cipherText, ctLen); return cipherText; } catch (ShortBufferException e) { e.printStackTrace(); } catch (BadPaddingException e) { e.printStackTrace(); } catch (IllegalBlockSizeException e) { e.printStackTrace(); } } catch (InvalidKeyException e) { e.printStackTrace(); } } catch (NoSuchAlgorithmException e) { e.printStackTrace(); } catch (NoSuchPaddingException e) { e.printStackTrace(); } } catch (UnsupportedEncodingException e) { e.printStackTrace(); } return null; } private String decrypt(byte[] cipherText) { byte[] plainText = new byte[PASSWORD_LEN]; Cipher cipher = null; try { cipher = Cipher.getInstance("AES/ECB/NoPadding"); try { cipher.init(Cipher.DECRYPT_MODE, secretKey); int plainTextLen = 0; try { plainTextLen = cipher.update(cipherText, 0, PASSWORD_LEN, plainText, 0); try { plainTextLen += cipher.doFinal(plainText, plainTextLen); String password = new String(plainText); return password.trim(); } catch (IllegalBlockSizeException e) { e.printStackTrace(); } catch (BadPaddingException e) { e.printStackTrace(); } } catch (ShortBufferException e) { e.printStackTrace(); } } catch (InvalidKeyException e) { e.printStackTrace(); } } catch (NoSuchAlgorithmException e) { e.printStackTrace(); } catch (NoSuchPaddingException e) { e.printStackTrace(); } return null; } public void readSecureData(List secureDataList) throws IOException { BufferedReader bufferRead = new BufferedReader(new InputStreamReader(System.in)); for(int index = 0; index < secureDataList.size(); index++){ System.out.println("Please enter the value for :" + secureDataList.get(index)); String value = new String(Base64.encode(encrypt(bufferRead.readLine()))); secureDataMap.put(secureDataList.get(index), value); } } public String getSecureData(String key) { String value = secureDataMap.get(key); if (value != null){ return decrypt(Base64.decode(value.getBytes())); } throw new RuntimeException("Given key is unknown. [key=" + key + "]"); } private void readSecureDataFromFile() throws IOException { BufferedReader br = new BufferedReader(new FileReader(secureFile)); String line; while ((line = br.readLine()) != null){ int dividerPoint = line.indexOf("="); if (dividerPoint > 0){ secureDataMap.put(line.substring(0, dividerPoint), line.substring(dividerPoint + 1)); } } } private void persistSecureData() throws IOException { FileWriter fileWriter = new FileWriter(secureFile); for(String key : secureDataMap.keySet()){ fileWriter.append(key + "=" + secureDataMap.get(key) + "\n"); } fileWriter.close(); } }

October 5, 2014

by Jayanga Dissanayake

· 15,302 Views · 1 Like

Datacenter Resource Fragmentation

The concept of resource fragmentation is common in the IT world. In the simplest of contexts, resource fragmentation occurs when blocks of capacity (compute, storage, whatever) are allocated, freed, and ultimately re-allocated to create noncontiguous blocks. While the most familiar setting for fragmentation is memory allocation, the phenomenon plays itself out within the datacenter as well. But what does resource fragmentation look like in the datacenter? And more importantly, what is the remediation? The impacts of virtualization Server virtualization does for applications and compute what fragmentation and noncontiguous memory blocks did for storage. By creating virtual machines on servers, each with a customizable resource footprint, the once large contiguous blocks of compute capacity (each server) can be divided into much smaller subdivisions. And as applications take advantage of this architectural compute model, they become more distributed. The result of this is an application environment where individual components are distributed across multiple devices, effectively occupying a noncontiguous set of compute resources that must be unified via the network. It is not a stretch to say that for server virtualization to deliver against its promise of higher utilization, the network must act as the Great Uniter. Not just a virtual phenomenon While fragmentation is easily explained in a virtualized context, the phenomenon is certainly not only a virtual one. After creation, datacenters grow organically. In the best of times, they grow at very predictable, steady rates. More frequently, they grow in fits and spurts as business requirements heap new application demands on top of existing infrastructure. This growth model is made even more chaotic because of physical constraints. Rows are finite and have an end. If you want to rack up additional compute next to existing compute for a particular application, you might have to move a row over. But what about when that row itself is taken? Then maybe you move a couple rows over. Or a room over. Or maybe into another datacenter entirely. Physical locations are also constrained by how much space they have. Even if you have the will to expand, there might simply be no additional real estate to consume. So you build up, in which case the resources you need are now separated by a floor. Or maybe you build out and separate resources by a short distance across the campus. Or across the city. Or maybe even across the country. Sometimes it’s not even the physical space. With very large footprints, trying to pull enough power from the grid might be impossible. And then there are all the business continuity requirements that frequently lead to datacenter resource sprawl across physical locations. The point is that growth is rarely linear, and this means that physical resources cannot normally be guaranteed to be in close proximity. What started as a nicely groomed cluster of compute and storage turns into a set of noncontiguous resources spread out across whatever physical footprint your datacenter (or datacenters) occupies. Unifying contiguous resources There are, of course, ways to unify resources that suffer from this type of sprawl. In the best of cases, if all of your servers are equivalent, you can migrate VMs over time to achieve continuity. The orchestration of such a feat is nightmarish enough, forgetting for a moment the impact of all that activity and the risk it incurs. So if there is no datacenter equivalent for defragmentation, what do you do? The network ends up playing a unifying role. So long as resources are connected, they can work in concert to deliver some application workload. But not all networks are the same, and depending on the spread of resources, the type of network needed varies. Not all networks are the same If resources are contained now and forever in a fairly tight geographical space, then providing rack-to-rack or row-to-row connectivity is fairly straightforward. But what if the applications across those resources are more bandwidth hungry? You might need to consider cross-connect and offload solutions. How about if those applications are particularly latency-sensitive? You might favor completely flat architectures over more traditional two- and three-tier networks. If resources are not so easily contained, the network choices expand. If application workloads are distributed across different rooms in a datacenter, you have to consider the impact of room-to-room connectivity. Is that done through a WAN connection, in which case you take on yet another networking layer? Or do you use optical equipment to stretch an L2 domain across some physical distance, in which case you have to consider laying or leasing fiber? And even then, as distances grow from a few hundred meters to a few thousand kilometers, the considerations change again. Conditions will change Finally, the complexity only increases as you consider that all of this is a moving target. When your business is smaller, perhaps you can keep everything in one location. A few years down the road, maybe you outgrow your site or leasing terms change. Your company acquires another company, and you now have resource sprawl with a datacenter consolidation project on the horizon. Accounting for all of the potential outcomes is challenging. The best that you can do is create solid architectural building blocks that provide the most optionality for whatever outcomes exist. In that regard, planning for growth is about considering how that growth might materialize and including flexibility as one of the primary requirements around the underlying infrastructure. The bottom line As datacenters grow, application resources will become fragmented. The question is not whether you will have to deal with this but rather how quickly your infrastructure can adapt. Architecting with this explicitly in mind could mean the difference between natural evolution or the types of transformation initiatives that stop companies dead in their tracks every 3-5 years. [Today’s fun fact: Chewing gum while peeling onions will prevent you from crying. It doesn’t work as well in romantic comedies.]

October 3, 2014

by Mike Bushong

· 8,498 Views

How to Resolve Maven's ''Failure to Transfer'' Error

Learn how to resolve the ''failure to transfer'' error encountered in Maven in this quick tutorial.

September 24, 2014

by Jose Roy Javelosa

· 130,034 Views · 4 Likes

The Near Future of IoT

[This article was written by Sean Lorenz.] Pundits within the technology sphere have been calling 2014 the year of the Internet of Things (IoT). The market revenue potentials are forecasted into the trillions and it’s a Fortune 500 land grab with major companies moving quickly to stake their claims [1]. If this sounds a bit like pages from an American Wild West history book, a frontier analogy isn’t too far off. This is an exciting turning point in technology that—thanks to advances in plummeting sensor costs, wireless communication, and chip size reduction—will soon make today’s futuristic IoT concepts seem humorous in retrospect. While it’s difficult to see where the market is going, given the exponential rate of change in IoT technology, I have noticed several key trends emerging. As a fellow IoT prospector on the frontier, this is my account of the most evident trends as well as some educated predictions for the future. Trends 1. Business Value Over Technology Focus Like any promising new technology still in its infancy stage, the true innovation stems from tech-savvy researchers and tinkerers that build fascinating devices that sometimes have no consumer base–I’m looking at you, robotics market. We have all heard about the smart toothbrushes and smart egg trays coming to market and thought: “Interesting! I wouldn’t buy one, but… sure!” Perhaps the biggest trend is a shift from thinking, “let’s build it because we can” to “what business problem are we solving here?” IoT developers are getting wise to this mentality and building user-focused MVPs (Minimum Viable Products) that will begin hitting the market in late 2014 and early 2015. 2. Keeping It Real At my company, Xively, we often get asked what are the real use cases for the IoT. Many times our customers walk in the door with a vague idea of how connecting their product or service to the Internet would be potentially interesting, but need a little help with seeing how an IoT-enabled product can transform their business—internally and externally. The reason for this is that most of the exciting, transformative elements happen under the hood. Right now, the true “wow” moments in the industry are far from sexy: energy savings in enterprise complexes, CRM & ERP integration, service and support, supply chain efficiencies, product part failure and alert, and so on… you get the idea. Smart homes that respond to our every whim are really great ideas, but these products aren’t integral to our lives yet. Large manufacturing companies and enterprises are using the Internet of Things to manage internal operations and efficiency while also engaging their customers more fully with new IoT data sources aggregated in existing services like Salesforce1 or SAP. 3. Publish-Subscribe The IoT protocol wars are heating up, but allegiances aside, publish-subscribe messaging is what the bulk of implemented models use for connecting devices to the cloud. Pub-sub protocols such as MQTT, CoAP, and AMQP are attractive for connected product development thanks to their ease of scalability and many-to-one/one-to-many possibilities. Given the massive variance of the IoT market, there is bound to be more than one protocol that wins in the end; yet before we get to that point, there are plenty of bugs and vulnerabilities to patch across all of the thriving, open IoT protocols out there. 4. Security Panic! Hacked refrigerators, big box stores, and security cameras… oh my! There has been no shortage of concern for privacy, security, and compliance in the Internet of Things space. Like any news story, some of this attention is warranted and some overblown. Just like your pre-IoT old-fashioned Internet, creating specific application keys and advanced permissioning systems for hardware connecting to the cloud is essential. The amount of nodes at the edge connecting to services across the Internet will be far larger than anything we see now, but IoT platforms are already addressing these complex device lifecycle management issues that are crucial for protecting personal and enterprise information in a connected world. Near-Term Predictions Now lets hop in the DeLorean and look into the future. Rather than focus on five, ten, or twenty years into the future, let’s focus only on the next few years. Why? As I mentioned in the beginning, the IoT landscape changes on a day-to-day basis, so even a prediction looking forward six-months from now can be unreliable. This list contains no self-driving cars or sentient AIs. Instead, it makes some pretty sure bets for what to expect over the horizon. 1. A Household Name Usually the second question after “what’s your name?” at a dinner party is the inevitable “so what do you do?” Mentioning the Internet of Things to non-techies still draws blank stares and looks of confusion. Those looks are justified given the not-so-great marketing name of IoT and the myriad definitions trying to explain what it actually is. Whether it’s called the Internet of Things, Internet of Everything, or just the good ol’ Internet, the concept of connecting any and everything to the Internet will begin to make sense for everyday consumers. 2. Consumers Slow to Adopt Many IoT products are still just toys in many people’s minds. Startups are building products that address problems which most consumers don’t see as a problem yet. This isn’t to say the consumer IoT market will evaporate. It just means we need to get smarter about what customers actually want from smart devices. Today’s wearable products remind me of the Newton—Apple’s infamous PDA. The problem wasn’t the idea, but rather the timing. The Apple Newton seemed clunky, not very powerful, and low on the usability scale. Years later, the iPhone and iPad came along with a set of features and a form factor that customers were looking for. The same feels true of wearables right now—they may need a few more years to incubate before the general public gives two thumbs up. Other consumer IoT markets such as the smart home or driverless cars seem to be in the same situation as the wearables market, but this is changing quickly with major players like Apple and Google moving into these arenas. For example, in the home automation space, frameworks like Apple HomeKit will be essential for unifying disparate protocols and clouds into one application that can handle various products’ data, automating much of the technology and pushing it into the background. I am sure there is a brilliant developer learning Swift and building the first killer smart home app as we speak. 3. Analytics and Automation This prediction probably comes as no surprise, but it is worth stating. Most companies willing to foray into the IoT unknown are, for now, happy with connecting their devices to an external application or cloud service. Having a place to send the data is usually the first step in constructing an IoT system. So what do you do with all this data once you have it? Reporting tools for IoT are just starting to become available, but this is just the tip of the iceberg. The real magic lies in the ability to use exploratory and predictive algorithms to make actionable intelligence a reality. These insights are beneficial to both businesses understanding their customers and to the customers themselves. One could imagine closing the feedback loop between sensor, cloud, and actuator by adding some beautiful supervised machine learning code into the cloud platform at some point in the chain. There are currently a handful of analytics startups focusing on IoT specifically, but this market is about to explode from both platform and application perspectives. 4. IoT Startups Galore For any developers out there interested in the IoT with a real customer pain that needs solving, now is the time to get coding and building that pitch deck. With hardware back en vogue, venture capital funding of IoT-centric companies ison the rise [2]. Having been to a number of IoT events, the amount of enthusiasm by VC and angel investors is palpable. There’s a definite need for developers with great, connected product and service ideas; so, if you haven’t already, I strongly suggest putting on your favorite prospecting gear and exploring the untamed wild west of the Internet of Things. [1] https://internetofeverything.cisco.com/sites/default/files/docs/en/ioe_public_sector_vas_white%20paper_121913final.pdf [2] http://www.cbinsights.com/blog/internet-of-things-investing-snapshot 2014 Guide to Internet of Things The 2014 Guide to Internet of Things covers 39 different IoT SDKs, developer programs, and hardware options, plus: Key findings from our survey of over 2,000 developers "How to IoT Your Life: The Complete Shopping List" "The Scale of IoT" Infographic Glossary of common IoT terms Four in-depth articles from industry experts DOWNLOAD NOW

August 28, 2014

by Benjamin Ball

· 10,546 Views

Lambda Architecture Principles

"Lambda Architecture" (introduced by Nathan Marz) has gained a lot of traction recently. Fundamentally, it is a set of design patterns of dealing with Batch and Real time data processing workflow that fuel many organization's business operations. Although I don't realize any novice ideas has been introduced, it is the first time these principles are being outlined in such a clear and unambiguous manner. In this post, I'd like to summarize the key principles of the Lambda architecture, focus more in the underlying design principles and less in the choice of implementation technologies, which I may have a different favors from Nathan. One important distinction of Lambda architecture is that it has a clear separation between the batch processing pipeline (ie: Batch Layer) and the real-time processing pipeline (ie: Real-time Layer). Such separation provides a means to localize and isolate complexity for handling data update. To handle real-time query, Lambda architecture provide a mechanism (ie: Serving Layer) to merge/combine data from the Batch Layer and Real-time Layer and return the latest information to the user. Data Source Entry At the very beginning, data flows in Lambda architecture as follows ... Transaction data starts streaming in from OLTP system during business operations. Transaction data ingestion can be materialized in the form of records in OLTP systems, or text lines in App log files, or incoming API calls, or an event queue (e.g. Kafka) This transaction data stream is replicated and fed into both the Batch Layer and Realtime Layer Here is an overall architecture diagram for Lambda. Batch Layer For storing the ground truth, "Master dataset" is the most fundamental DB that captures all basic event happens. It stores data in the most "raw" form (and hence the finest granularity) that can be used to compute any perspective at any given point in time. As long as we can maintain the correctness of master dataset, every perspective of data view derived from it will be automatically correct. Given maintaining the correctness of master dataset is crucial, to avoid the complexity of maintenance, master dataset is "immutable". Specifically data can only be appended while update and delete are disallowed. By disallowing changes of existing data, it avoids the complexity of handling the conflicting concurrent update completely. Here is a conceptual schema of how the master dataset can be structured. The center green table represents the old, traditional-way of storing data in RDBMS. The surrounding blue tables illustrates the schema of how the master dataset can be structured, with some key highlights Data are partitioned by columns and stored in different tables. Columns that are closely related can be stored in the same table NULL values are not stored Each data record is associated with a time stamp since then the record is valid Notice that every piece of data is tagged with a time stamp at which the data is changed (or more precisely, a change record that represents the data modification is created). The latest state of an object can be retrieved by extracting the version of the object with the largest time stamp. Although master dataset stores data in the finest granularity and therefore can be used to compute result of any query, it usually take a long time to perform such computation if the processing starts with such raw form. To speed up the query processing, various data at intermediate form (called Batch View) that aligns closer to the query will be generated in a periodic manner. These batch views (instead of the original master dataset) will be used to serve the real-time query processing. To generate these batch views, the "Batch Layer" use a massively parallel, brute force approach to process the original master dataset. Notice that since data in master data set is timestamped, the data candidate can be identified simply from those that has the time stamp later than the last round of batch processing. Although less efficient, Lambda architecture advocates that at each round of batch view generation, the previous batch view should just be simply discarded and the new batch view is computed from master dataset. This simple-mind, compute-from-scratch approach has some good properties in stopping error propagation (since error cannot be accumulated), but the processing may not be optimized and may take a longer time to finish. This can increase the "staleness" of the batch view. Real time Layer As discussed above, generating the batch view requires scanning a large volume of master dataset that takes few hours. The batch view will therefore be stale for at least the processing time duration (ie: between the start and end of the Batch processing). But the maximum staleness can be up to the time period between the end of this Batch processing and the end of next Batch processing (ie: the batch cycle). The following diagram illustrate this staleness. Even the batch view is stale period, business operates as usual and transaction data will be streamed in continuously. To answer user's query with the latest, up-to-date information. The business transaction records need to be captured and merged into the real-time view. This is the responsibility of the Real-time Layer. To reduce the latency of latest information availability close to zero, the merge mechanism has to be done in an incremental manner such that no batching delaying the processing will be introduced. This requires the real time view update to be very different from the batch view update, which can tolerate a high latency. The end goal is that the latest information that is not captured in the Batch view will be made available in the Realtime view. The logic of doing the incremental merge on Realtime view is application specific. As a common use case, lets say we want to compute a set of summary statistics (e.g. mean, count, max, min, sum, standard deviation, percentile) of the transaction data since the last batch view update. To compute the sum, we can simply add the new transaction data to the existing sum and then write the new sum back to the real-time view. To compute the mean, we can multiply the existing count with existing mean, adding the transaction sum and then divide by the existing count plus one. To implement this logic, we need to READ data from the Realtime view, perform the merge and WRITE the data back to the Realtime view. This requires the Realtime serving DB (which host the Realtime view) to support both random READ and WRITE. Fortunately, since the realtime view only need to store the stale data up to one batch cycle, its scale is limited to some degree. Once the batch view update is completed, the real-time layer will discard the data from the real time serving DB that has time stamp earlier than the batch processing. This not only limit the data volume of Realtime serving DB, but also allows any data inconsistency (of the realtime view) to be clean up eventually. This drastically reduce the requirement of sophisticated multi-user, large scale DB. Many DB system support multiple user random read/write and can be used for this purpose. Serving Layer The serving layer is responsible to host the batch view (in the batch serving database) as well as hosting the real-time view (in the real-time serving database). Due to very different accessing pattern, the batch serving DB has a quite different characteristic from the real-time serving DB. As mentioned in above, while required to support efficient random read at large scale data volume, the batch serving DB doesn't need to support random write because data will only be bulk-loaded into the batch serving DB. On the other hand, the real-time serving DB will be incrementally (and continuously) updated by the real-time layer, and therefore need to support both random read and random write. To maintain the batch serving DB updated, the serving layer need to periodically check the batch layer progression to determine whether a later round of batch view generation is finished. If so, bulk load the batch view into the batch serving DB. After completing the bulk load, the batch serving DB has contained the latest version of batch view and some data in the real-time view is expired and therefore can be deleted. The serving layer will orchestrate these processes. This purge action is especially important to keep the size of the real-time serving DB small and hence can limit the complexity for handling real-time, concurrent read/write. To process a real-time query, the serving layer disseminates the incoming query into 2 different sub-queries and forward them to both the Batch serving DB and Realtime serving DB, apply application-specific logic to combine/merge their corresponding result and form a single response to the query. Since the data in the real-time view and batch view are different from a timestamp perspective, the combine/merge is typically done by concatenate the results together. In case of any conflict (same time stamp), the one from Batch view will overwrite the one from Realtime view. Final Thoughts By separating different responsibility into different layers, the Lambda architecture can leverage different optimization techniques specifically designed for different constraints. For example, the Batch Layer focuses in large scale data processing using simple, start-from-scratch approach and not worrying about the processing latency. On the other hand, the Real-time Layer covers where the Batch Layer left off and focus in low-latency merging of the latest information and no need to worry about large scale. Finally the Serving Layer is responsible to stitch together the Batch View and Realtime View to provide the final complete picture. The clear demarcation of responsibility also enable different technology stacks to be utilized at each layer and hence can tailor more closely to the organization's specific business need. Nevertheless, using a very different mechanism to update the Batch view (ie: start-from-scratch) and Realtime view (ie: incremental merge) requires two different algorithm implementation and code base to handle the same type of data. This can increase the code maintenance effort and can be considered to be the price to pay for bridging the fundamental gap between the "scalability" and "low latency" need. Nathan's Lambda architecture also introduce a set of candidate technologies which he has developed and used in his past projects (e.g. Hadoop for storing Master dataset, Hadoop for generating Batch view, ElephantDB for batch serving DB, Cassandra for realtime serving DB, STORM for generating Realtime view). The beauty of Lambda architecture is that the choice of technologies is completely decoupled so I intentionally do not describe any of their details in this post. On the other hand, I have my own favorite which is different and that will be covered in my future posts.

August 20, 2014

by Ricky Ho

· 12,226 Views

The Programming Challenges of IoT

Pragmatic developers can look at the Internet of Things in two ways: This is amazing. I can only begin to imagine how I can directly improve the world outside the set of networked computer boxes. This is terrifying. If something goes wrong, then it’s on me—and this time the system affected extends outside the set of networked computer boxes. IoT is amazing in the way it bridges physical and virtual environments, but even the phrase “Internet of Things” should give a developer pause. Computers are pretty smart. Things are stupid. IoT tries to put Things online and tries to make them into inter-networked computers. That’s pop-philosophy, but you want to develop in the real world. So what real-world challenges will you face when you shoot for the IoT moon? Two Types of Challenges It seems there are two types of programming challenges for the Internet of Things: Data and control (the comp-sci and networking stuff) Information and business logic (the info-sci and human-computer interaction stuff) For this article, we’re going to talk about the programming problems we can solve around IoT. We’ll start at the bottom (data and control) and work our way up to the big picture (information and business logic). Type 1: Data and Control Challenge 1.1: Power This one is pretty obvious. Many IoT devices are wireless, and no one has invented thumbnail fusion reactors yet. One solution is equally obvious: pick your algorithms carefully. If you can save cycles to perform a given task, then do it. Libraries for implementing power-optimized algorithms will presumably spring up in greater numbers, but even so, you may need to inject some heavy-duty comp-sci know-how into IoT app development. The second solution is more complex than the first. Higher-level developers will have to think more about Dynamic Power Management (DPM), which just means: shutting down devices when they don’t need to be on and starting them up when they do. Normally the operating system worries about this, but an IoT application that orchestrates wearables and phones, for example, will know things that each device’s OS won’t—and therefore will be able to switch things on and off more intelligently than each device’s individual OS. Another option is to write or customize an embedded OS. Challenge 1.2: Latency Latency on IoT sits in two places: at the source and in the pipes. The basic problem is a physical one. Thing-chips often have to be small, which means that the chip can only be as powerful as current transistor technology allows. Another problem is power. Many small devices transmit and receive data in discrete active/sleep cycles (think TDMA) in order to save bandwidth and power, but this increases latency inversely to power saved. Another tradeoff is that network topologies optimized for IoT can involve more hops over slower devices. Mesh networks, for example, are immune to the failure of a few nodes. Similarly, “fog” and “edge” computing paradigms relieve Internet infrastructure by doing as much as possible without hub-nodes. The downside is that each node (a) can’t do very much on its own and (b) can only talk to neighboring nodes. The problem in the pipes is a matter of network infrastructure. Simply: the more Things, the less available bandwidth. Infrastructure technology will get faster, but cell networks won’t catch up overnight. And Things, unlike fancier computers, are often supposed to transmit blindly—that is, without anyone necessarily asking them to. This means there’s a massive potential for wasted bandwidth. Challenge 1.3: Unreliability The third challenge flows from the first two. Devices are unreliable–“Things” even more so. The distributed and decentralized virtues of IoT bring their own reliability problems. Here are just a few: Ubiquitous devices are cheap, so they fail more often. Truly ad-hoc connectivity implies ephemeral SLA, so uptime and recovery time may be unclear. Loosely controlled devices may have better things to do than give you their data (or computing resources), so concurrency may grow very complex. Less-reliable hardware generates less-reliable information (‘does my outlying datapoint just signify device failure?’), so you may need to chew your data more thoroughly at the application level. In a sense, IoT decouples low-level (the sub-session layer) from high-level channel capacity, because the distribution of error-sources on IoT is more heavily weighted toward originating or remote nodes. This means more error-correcting for application developers. Type 2: Information and Business Logic Challenge 2.1: Vast & Thin Data Sensors on smartphones are already generating oceans of raw data. These sensors are pretty sophisticated. Every major mobile OS provides a unified, simple API to access clean sensor and geo data. But start grabbing this data and it’s not immediately clear what to do with it. Try to think of killer applications for barometric data—besides weather and elevation (with GPS)—off the top of your head. Raw sensor data is extremely thin. It doesn’t explain itself, and we haven’t yet produced a complete mapping from physical measurements to business logic—let alone software design. Even if you know what to do with sensor/geo data eventually, you may have to learn new algorithms and data structures to process immediately. Geo-graphs aren’t CS101 graph data structures (for one thing, edge length is a first-class citizen of geo-graphs). The size of data over IoT is itself a problem. Wireless sensors beget tons of data. All the problems (and opportunities) of Big Data cascade naturally from IoT. Massively distributed computing on IoT devices is an exciting thought, but the toolchain for splitting calculations over a thousand idle Fitbits just isn’t here yet. 2. Context-Sensitivity Consider the term “ubiquitous computing,” defined as: what happens when wirelessly connected sensors and actuators, placed more or less everywhere, allow software to interact with much larger swaths of the physical world than just hardware or bare metal. Put ubiquitous computing on the Internet, and IoT makes the software context much larger. This has implications at two basic levels. At a high computer-architectural level: IoT extends the concept of computing environment well outside the von Neumann machine and weakens the concept of peripheral I/O. In an IoT-world interface, sensors are input and actuators are output. As IoT devices process increasingly at the edge (within individual nodes), the devices that appear peripheral to other nodes are actually doing an awful lot of computation. At a high business-logic level: the more stuff outside the computer-box affects the program, the less predictable the program behavior becomes at runtime. The same bizarrely-birthed memory leak might slow down the UI in a smartphone context but contribute to a cascading electrical grid failure in an IoT context. This means that IoT demands more self-monitoring and self-repairing code. Two Types of Solutions Plenty of researchers are working on ambitious solutions to the programming challenges presented by IoT. Two of the more exciting examples include: Abstract Task Graph—a data-driven model that maps the network graph to an application graph [1] Computational REST—replaces content resources with computation resources [2] There are also a few more strategies you can use right now to solve some of the IoT programming challenges mentioned above. Reactive ProgrammingThis general purpose paradigm responds to all major application-level challenges and embraces opportunities presented by IoT. The four definitive attributes of a reactive application are: event-driven, scalable, resilient, and responsive [3]. These four are excellent guiding principles for IoT applications at a high, cross-stack level. Flow-based Programming and the Actor ModelBoth present strongly independent components where only messages can affect processes. Both are deeply amenable to concurrency (for example, shared state is discouraged), nondeterminism, and scaling without exponential complexity growth, because components are black boxes. FBP is a bit more pragmatic and restrictive while the actor model is less restrictive and a bit harder to implement. FBP has already been implemented in Javascript (NoFlo), and the actor model has been implemented in Java (Akka) [4][5][6]. What’s important to remember is that there are already tools and techniques that can help you build IoT applications. FBP, actors, and reactive programming all have key attributes for creating applications that leverage the strengths of IoT to overcome its challenges. [1] https://www.usenix.org/legacy/event/mobisys05/eesr05/tech/full_papers/bakshi/bakshi.pdf [2] http://isr.uci.edu/tech_reports/UCI-ISR-10-3.pdf [3] http://www.reactivemanifesto.org/ [4] http://jpaulmorrison.com/fbp/ [5] http://arxiv.org/ftp/arxiv/papers/1008/1008.1459.pdf [6] http://noflojs.org/ [7] http://akka.io/ 2014 Guide to Internet of Things The 2014 Guide to Internet of Things covers 39 different IoT SDKs, developer programs, and hardware options, plus: Key findings from our survey of over 2,000 developers "How to IoT Your Life: The Complete Shopping List" "The Scale of IoT" Infographic Glossary of common IoT terms Four in-depth articles from industry experts DOWNLOAD NOW

August 14, 2014

by John Esposito

· 16,383 Views

An Early Mover's Guide to the Internet of Things

[This article was written by Andreea Borca, developer of patient-empowering solutions for the healthcare industry, co-host of Farstuff: The IoT Podcast, and featured author in DZone's 2014 Guide to Internet of Things]. The creation of the Internet was a significant shift in the way people acquire information, interact with each other, and make decisions. Now, the Internet is expanding its reach to a range of devices that can gather and analyze physical data and react to that data in a variety of applications that we’ve never seen before. This “Internet of Things” marks another dynamic shift in the history of technology. This new stage in the Internet’s evolution is changing it from a tool that we actively need to engage with—deliberately using a browser to access it—to one that passively endows the world around us with a “mind” of its own. We are developing a world where things interact intelligently and cooperate to achieve goals without explicit guidance from human operators. Defining the Internet of Things First, we need to define the Internet of Things (also called “The Internet of Everything” by Cisco). A system falls under the Internet of Things definition if it meets the following criteria, known as the “3 Cs”: 1. It must Connect – to the physical world around itself collecting information, to other things in order to interact with them effectively, to the internet or a network, etc. 2. It must Compute – by processing the inputs it receives in some way and making them meaningful to other systems. 3. It must Communicate – with the network, with other things, and with the user if necessary (more often than not, as you’ll see, communicating to the user may be an unnecessary burden). Challenges for the Internet of Things Efficiency Devices within the Internet of Things (IoT) only need to do the bare minimum necessary to effectively work within the existing ecosystem. Many of the newest products rely heavily on the power of your smartphone to connect to the Internet and orchestrate devices, but there is also extensive pressure to reduce the size, energy consumption, and cost of the processing entities within IoT devices. In order to reduce power consumption and manage node outages, there is a concept of daisy-chaining across a network of devices into a more powerful central hub. This is known as mesh networking, and it’s becoming quite popular for IoT systems. Security, Privacy & the need to Share A core requirement of a well-functioning IoT device is to collect, transfer, and store data from a wide variety of sources. As more sensors arrive in cities and healthcare institutions, that increasingly connected information will unavoidably lead to more concern about security and privacy. The debate is still raging over balancing the clear benefits of new discoveries from processing Big Data with the strong personal fear of losing privacy. With IoT now in the picture, there is concern about devices that continuously and passively collect information on users. One recent clash over always-on sensors came with the release of Microsoft’s Xbox One Kinect console, which has a camera that is constantly pointed at your living room. Although the camera itself is not always on, the backlash over that possibility was fierce [1]. Finding this balance will quickly become a requirement for continued progress. Furthermore, the very nature of IoT and the connectivity network necessary for its success does make it particularly vulnerable in certain instances. Devices are especially vulnerable when connected over WiFi, because low tech sensor nodes with minimal computing power tend to be less secure, making them the ideal point of entry for infiltrators. Standards As with all new technologies, the battle over standards is always a struggle. Nest, the company that developed the most popular smart home thermostat, and its new owner, Google, are now making significant strides trying to establish the Nest platform as the foundation for all consumer-based IoT devices and their software counterparts [2]. Cisco, Qualcomm, IBM, Microsoft, and most other major players have a similar strategy for creating standard models for approaching the Internet of Things. The pressure to standardize is especially clear when new devices are appearing weekly. ZigBee already has extensive reach as an established standard for many household IoT devices. However, as a preferred codebase has yet to emerge as the standard of choice, it is recommended to connect with major standardization organizations like the IEEE, IETF, and the ZigBee Alliance [3]. Currently, the most common sensor networks use protocols such as Bluetooth Low Energy (BLE), RFID tags, ZigBee, and Wi-Fi. There are also iBeacons, which allow devices like smartphones to better identify their location and potential needs with NFC-powered micro-location and GPS technology. Opportunities for the Internet of Things There are numerous prospects to consider when looking to develop IoT products. Given the multi-trillion dollar projections for the future IoT economy, we should take a look at these emerging markets for IoT tech [4]. Consumers The consumer IoT space has bred a small but growing segment of followers that have invested early into “smart” tech. At this year’s CES, we saw everything from the Babolat Tennis Racket that becomes your personal tennis coach to the Kolibree Toothbrush that monitors your gum health while you brush. The fastest growing consumer IoT segment seems to be in smart home technology, with products such as self-managing refrigerators and resident-sensing door locks. Commercial Retailers have already proven adept at collecting a consumer’s shopping history. With the functionality of NFC-powered beacons, these retailers are eager to personalize your shopping experience in a whole new way. Essentially, each physical shopping trip can now be as littered with targeted ads as any typical online search, much like a scene from the 2002 sci-fi film Minority Report. Walk into a store and instantly the advertising screens on the wall change to address your particular demographic, income level, and shopping preferences. If you’ve connected your Google calendar to certain applications, these screens would show outfits targeting your next big event. Signs on clothing racks sense you coming near and change prices, fully leveraging a custom pricing model that would have economists drooling. And as you try on outfits, the smart mirror in the dressing room recommends accessories or comments on alternatives that might be a better fit for your body type. After all of these IoT events have helped you with your purchase, there’s no need to checkout. You’ve registered with the store and there’s a beacon at the exit that registers what you picked up and charges your card automatically as you leave. Healthcare With the recent U.S. mandate that all health records must be digital, there has been an explosion in the marketplace of new, patient-centered, smart health devices. The excitement of a healthcare revolution among top innovative companies, incubators, and startups predicts that this trend is not likely to taper off anytime soon. The key areas of focus so far have been: monitoring technologies like wearables (especially passive monitoring), function improving technologies, education, and notification technologies. Wearables are generally the first consumer touch point in the IoT health sphere. With the popularity of Fitbit pedometers and Withings scales, the market is starting to experiment with internal monitoring and potentially replacing some organs completely in the near future. A study at Boston University has had incredibly positive results creating an artificial pancreas for Type 1 diabetics by inserting an insulin and glucagon pump that responds when an attached glucometer goes below a certain level, just like an actual pancreas. Proteus, a promising startup out of San Francisco, has created an all-natural microchip in a pill that the patient swallows in order to monitor whether they are remembering to take their medication. The pill sends data to an armband that the user is wearing, which then can send notifications to family members regarding the patient’s status. The most impressive feature is the fact that these chips are powered by the energy in the patient’s digestive system. Cities, Infrastructure, and Industry The long-term vision of the future includes technology such as self-driving cars and city lights that alert police when there’s been an accident. In this stage of development, the majority of value is coming from technologies that monitor and collect data in urban settings. From an evolutionary perspective, the IoT city as a whole is still in what many would consider a learning phase. The main objective is to collect as much data as possible, make it available via open APIs, and encourage motivated data analysts to find opportunities for improvement in utility usage, environmental impacts, and service management for larger populations. This is one area where being an industrial country like the U.S. may actually impede the ability to progress as quickly as our less established counterparts. Third world countries that haven’t yet built a solid infrastructure allow for the creativity and flexibility to implement sophisticated solutions unfettered by generations of previous development. Silicon Valley powerhouses like Facebook and Google are actively engaged in projects to create a free global Wi-Fi network, and key locations in Africa have allowed them to experiment with these projects. Being an Early Mover In the very near future, as more and more things connect to the internet, internet connectivity from IoT devices will dwarf the amount of traditional web browsing. The core standards and assumptions that will drive this next revolution in computing technology are still being established and, as a result, building anything that can add value to this exploding industry (software, hardware, devices, sensors, beacons etc.) is a remarkable opportunity for the right developer. Right now is the time to start contributing to the development of these technologies if you want to be an early mover in IoT. 2014 Guide to Internet of Things The 2014 Guide to Internet of Things covers 39 different IoT SDKs, developer programs, and hardware options, plus: Key findings from our survey of over 2,000 developers "How to IoT Your Life: The Complete Shopping List" "The Scale of IoT" Infographic Glossary of common IoT terms Four in-depth articles from industry experts DOWNLOAD NOW

August 12, 2014

by Benjamin Ball

· 15,119 Views

How to Install Mono on a Raspberry Pi

This post exists to help with an MSDN Magazine article that I am authoring It provides some of the low-level details for the article How to install Mono and root certificates on a raspberry pi How to create an Azure mobile service How to create a Custom API inside Azure mobile services that the raspberry pi can call into How to create an Azure storage account MONO - HOW TO INSTALL ON A RASPBERRY PI Why Mono? How to install Mono on a raspberry pi Installing trusted root certificates on to the raspberry pi http://www.mono-project.com/Main_Page An open source, cross-platform, implementation of C# and the CLR that is binary compatible with Microsoft.NET Mono is a free and open source project led by Xamarin (formerly by Novell) that provides a .NET Framework-compatible set of tools including, among others, a C# compiler and a Common Language Runtime WHY MONO? Because it lets us write .net code compiled on Windows We can simply copy the binary files from Windows to Linux and run it as is From a raspberry pi device, it is possible to use a .net application to take a photo and upload it to Windows Azure storage HOW TO INSTALL ON A RASPBERRY PI RUNNING LINUX You will issue the following commands: pi@raspberrypi ~ $ sudo apt-get update pi@raspberrypi ~ $ sudo apt-get install mono-complete The first command makes sure all the local package index are up to date with the changes made in repositories. Second command installs the complete Mono tooling and runtime. MAKING SURE THAT YOUR MONO APPLICATIONS CAN MAKE A HTTPS REST-BASED CALLS This command downloads the trusted root certificates from the Mozilla LXR web site into the Mono certificate store. Once complete, the Raspberry PI will be capable of making web requests using HTTPS requests within Mono. pi@raspberrypi ~ $ mozroots --import --ask-remove --machine CREATING A NEW AZURE MOBILE SERVICES ACCOUNT The mobile services account is needed to host a Node.js application that provides shared access signatures to raspberry pi devices The shared access signature is needed by the raspberry pi, so that it can directly and securely upload photos to Azure storage STEPS TO CREATE AN AZURE MOBILE SERVICE The steps below will create an Azure mobile service The service will be used to host a Node.js application interacting with a raspberry pi devices We will provision a SQL database, although it will not be used initially FOLLOW THESE STEPS TO CREATE THE MOBILE SERVICE Login into the Azure Portal Select MOBILE SERVICES from the left menu pane at the Azure Portal. In the lower left corner select "+NEW" to create a new Azure Mobile Service. Make sure you've selected, "COMPUTE / MOBILE SERVICE / CREATE." You will now enter a url. We will call this service raspberrymobileservice. For the DATABASE, we will choose "Create a new SQL database instance." The REGION we chose is "West US." The BACKEND is "JavaScript." Click the "->" arrow to proceed to the next screen. In this screen you will "Specify database settings." The NAME of your database will based on the URL you entered previously. In this case, the database is called "raspberrymobileservice_db." You will need to choose a SERVER. We will choose "New SQL database server" from the drop-down list. You will need to provide a SERVER LOGIN NAME and a SERVER LOGIN PASSWORD. Take note of the login you provided as it will be needed later CREATING A CUSTOM API Azure mobile services allows you to create a custom API written in JavaScript that can be called from a raspberry pi device using REST This custom API is really just a Node.js application running in the server CREATING THE API TO RESPOND TO THE DEVICE TRYING TO UPLOAD PHOTOS Now that the service is established, we will turn our attention to creating an API that the device can call into to upload a photo. Login into the Azure Portal Your mobile service will take a few minutes to complete, and you should see the "Ready" flag as the "Status" for your service. Once it is ready you can drill into your service to customize its behavior. Just to the right of the service name, click the right arrow key "->" to drill into the service details. The top menu bar will offer many options, but we are interested in the one titled "API." The API allows you to create a series of node.JS API calls that a device can call into using rest-based approaches. Click on "API." from there, select "CREATE A CUSTOM API." You will be asked to provide an API name. Type in "photos" for the API name. Below you will see a series of drop-down combo boxes that relate to permission. We will keep the default value of "Anybody with the application key." This might not be the best option for all scenarios. You can read more about this here. http://msdn.microsoft.com/en-us/library/azure/jj193161.aspx. Click the checkmark to complete the process. The name of the AP you just created, "Photos," should be visible on the portal interface. To drill into the photos API click on the right arrow key "->". The right arrow key will be just to the right of the name of the API "Photos". At this point you should see a basic script that has been provided by default. We will overwrite this default script with our own script as described in the MSDN Magazine article. CREATING A STORAGE ACCOUNT TO STORE THE PHOTOS Navigate to the portal and create a storage account Create a container for the photos Obtain the: Storage Account Name (you will provide a name) Storage Account Access key (generated for you) Container Name (you will create) CREATING A STORAGE ACCOUNT We will need a storage account so that we can upload photos to it. The steps are well documented here: http://azure.microsoft.com/en-us/documentation/articles/storage-create-storage-account/ In our case we call the storage account raspberrystorage. This means that the URL that the device will use to upload photos is https://raspberrystorage.blob.core.windows.net/. As you complete these steps make sure that you choose the storage account location to be the same location as was used for your mobile services account. This avoids any unnecessary latency or bandwidth costs between data centers. Once the storage account is created, we will need to create a container within it. Photos or any blob for that matter, are always stored within a container. To create a container drill into your newly created storage account and select CONTAINERS from the top menu. From there, select CREATE A CONTAINER. The new container dialog box will ask for a name for your container. Take note of the name you provide. We are calling our container ?photocontainer.? When the raspberry pi device uploads photos to the storage account, it will target a specific container, such as the one we just created. You will next be asked to indicate ACCESS rights. To keep things simple we will select access rights of Public Blob. ENTERING APP SETTINGS Rather than hard-code storage account information inside your JavaScript/Node.js applications, you should consider using apps settings inside of the Azure mobile services portal This post also discusses it well: http://blogs.msdn.com/b/carlosfigueira/archive/2013/12/09/application-settings-in-azure-mobile-services.aspx ?The idea of application settings is a set of key-value pairs which can be set for the mobile service (either via the portal or via the command-line interface), and those values could be then read in the service runtime.? NAVIGATING TO APP SETTINGS Navigate to the Azure Mobile Services section of the portal. Drill into the specific service by hitting the arrow below Select from the Configure Menu at the top Scroll down to the very bottom to see app settings Note that we need to enter: - We need to get this from Azure Storage - PhotoContainerName - AccountName - AccountKey We get this information from the Azure Storage Section of the Portal. Note that you need to have provisioned a Storage Account to have this information. How to get the AccountKey with Azure Storage Services Now you can get the access keys HOW NODE.JS WILL ACCESS THE APP SETTINGS You will create a Node.js application inside of Azure Mobile Services See previous steps THE NODE.JS APPLICATION READING APP SETTINGS You will starting by going back to Azure Mobile Services and drill down into your newly minted service We called ours raspberrymobileservice Once you click API, you should see: Notice the app settings are being read on lines 12 to 14.

June 19, 2014

by Bruno Terkaly

· 16,798 Views