Big Data Resources

Security and Back-End Integration Top Mobile Challenges, Says Survey: Alpha Software

The mobile space is constantly evolving, and a recent survey published by Red Hat found that back-end integration and security were the two most rapidly changing areas of the mobile landscape.

December 11, 2015

by Amy Groden-Morrison

· 3,675 Views · 1 Like

Internet of Things: 4 Free Platforms to Build IoT Projects

If you're looking to get started with a new idea in IoT, check out these platforms.

December 5, 2015

by Francesco Azzola

· 88,433 Views · 12 Likes

The Internet of Things, Gateways, and Next Generation of APIs Speaker Session at APIStrat Austin

Check out these conference highlights concerning IoT development, API optimization, and data security from APIStrat Austin.

November 22, 2015

by Steven Willmott

· 7,802 Views · 6 Likes

The Future of Smart Farming With IoT and Open Source Farming

Smart farming is a concept quickly catching on in the agricultural business. Offering high-precision crop control, useful data collection, and automated farming techniques, a networked farm clearly has many advantages to offer.

November 5, 2015

by Michael Tharrington

· 52,269 Views · 11 Likes

Analytics with Apache Spark Tutorial Part 2: Spark SQL

This tutorial will show how to use Spark and Spark SQL with Cassandra.

November 3, 2015

by Rick Hightower

· 58,306 Views · 9 Likes

Reactive Trends on the JVM

Check out these Reactive trends on the JVM, including a look at what Reactive is, patterns, and event logging.

October 26, 2015

by Jonas Bonér

· 13,111 Views · 4 Likes

Why Java 8?

A preview of our new research guide: The DZone Guide to the Java Ecosystem, from MVB Trisha Gee.

October 23, 2015

by Trisha Gee

· 72,472 Views · 54 Likes

Recipe: rsyslog + Kafka + Logstash

Here's a similar follow-up to the previous rsyslog + Redis + Logstash recipe, this time with Kafka instead of Redis.

October 8, 2015

by Radu Gheorghe

· 15,533 Views · 8 Likes

Microservices and Kerberos Authentication

How to use Kerberos authentication with microservice architectures and API gateways.

October 6, 2015

by Jethro Bakker

· 19,025 Views · 7 Likes

The Limitations of the IoT and How the Web of Things Can Help

Understand the limitations of the Internet of Things and how the Web of Things can help build an application layer for the IoT.

September 28, 2015

by Dominique Guinard

· 27,347 Views · 6 Likes

Problems Solved by IoT

We spoke with 20 executives across the IoT space about the problems the Internet of Things is addressing.

September 24, 2015

by Tom Smith

· 37,222 Views · 5 Likes

Customer Journey Analytics and Data Science

Deciphering the “nuts-and-bolts” of individual customer journeys (and deducing intent) is core to improving customer experience and driving brand loyalty.

September 9, 2015

by Ravi Kalakota

· 8,532 Views · 1 Like

Too Big Data: Coping with Overplotting

Written by Tim Brock. Scatter plots are a wonderful way of showing (apparent) relationships in bivariate data. Patterns and clusters that you wouldn't see in a huge block of data in a table can become instantly visible on a page or screen. With all the hype around big data in recent years, it's easy to assume that having more data is always an advantage. But as we add more and more data points to a scatter plot, we can start to lose these patterns and clusters. This problem, a result of overplotting, is demonstrated in the animation below. The data in that animation is randomly generated from a pair of simple bivariate distributions. The distinction between the two distributions becomes less and less clear as we add more and more data. So what can we do about overplotting? One simple option is to make the data points smaller. (Note that this is a poor "solution" if many data points share exactly the same values.) We can also make them semi-transparent, and we can combine these two options. These refinements certainly help when we have ten thousand data points. However, by the time we've reached a million points, the two distributions have seemingly merged into one again. Making points smaller and more transparent might help things; nevertheless, at some point we may have to consider a change of visualization. We'll get on to that later. But first, let's try to supplement our visualization with some extra information: specifically, let's visualize the marginal distributions. We have several options. There's far too much data for a rug plot, but we can bin the data and show histograms, or we can use a smoother option, a kernel density plot. Finally, we could use the empirical cumulative distribution. This last option avoids any binning or smoothing, but the results are probably less intuitive. I'll go with the kernel density option here, but you might prefer a histogram.
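The kernel density option can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the animations. It assumes a Gaussian kernel and picks the bandwidth with a common normal-reference rule of thumb (1.06 * sigma * n^(-1/5)). The two-component sample in the usage section is hypothetical data standing in for "variable 2".

```python
import math
import random


def normal_reference_bandwidth(samples):
    """Rule-of-thumb bandwidth 1.06 * sigma * n^(-1/5) (assumes roughly normal data)."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return 1.06 * sd * n ** (-1 / 5)


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at any x."""
    n = len(samples)
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian "bump" centred on every sample, then normalise.
        return scale * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


if __name__ == "__main__":
    # Hypothetical marginal: a 70/30 mix of two normal components.
    rng = random.Random(0)
    variable2 = [rng.gauss(-2, 1) for _ in range(700)] + [rng.gauss(3, 1) for _ in range(300)]
    f = gaussian_kde(variable2, normal_reference_bandwidth(variable2))
    for x in (-2, 0.5, 3):
        print(f"density at {x:+.1f}: {f(x):.3f}")
```

Evaluating the density on a grid and drawing it along an axis gives the marginal curve. The rule-of-thumb bandwidth tends to oversmooth multimodal data, so it is worth trying smaller values when looking for twin peaks.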
The animated GIF below is the same as the GIF above, but with the smoothed marginal distributions added. I've left scales off to avoid clutter and because we're only really interested in rough judgements of relative height. Adding marginal distributions, particularly the distribution of variable 2, helps clarify that two different distributions are present in the bivariate data. The twin-peaked nature of variable 2 is evident whether there are a thousand data points or a million. The relative sizes of the two components are also clear. By contrast, the marginal distribution of variable 1 has only a single peak, despite coming from two distinct distributions. This should make it clear that adding marginal distributions is by no means a universal solution to overplotting in scatter plots. To reinforce this point, the animation below shows a completely different set of (generated) data points in a scatter plot with marginal distributions. The data again comes from a random sample of two different 2D distributions, but both marginal distributions of the complete dataset fail to highlight this separation. As before, when the number of data points is large, the distinction between the two clusters can't be seen from the scatter plot either. Returning to point size and opacity, what do we get if we make the data points very small and almost completely transparent? We can now clearly distinguish two clusters in each dataset. It's difficult to make out any fine detail, though. Since we've lost that fine detail anyway, it seems apt to question whether we really want to draw a million data points. It can be tediously slow, and impossible in certain contexts. 2D histograms are an alternative. By binning data we can reduce the number of points to plot and, if we pick an appropriate color scale, pick out some of the features that were lost in the clutter of the scatter plot.
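The binning step behind a 2D histogram is simple. Divide the plotting window into a grid and count how many points land in each cell, so that a million points collapse into, say, a 100 x 100 grid of counts that a color scale can then render. The Python sketch below assumes equal-width bins and drops points outside the window. In practice a plotting library's own 2D histogram function would normally do this for you.

```python
import random


def histogram2d(points, x_range, y_range, bins):
    """Count (x, y) points on a bins-by-bins grid; counts[row][col] is counts[y_bin][x_bin]."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    x_width = (x_max - x_min) / bins
    y_width = (y_max - y_min) / bins
    counts = [[0] * bins for _ in range(bins)]
    for x, y in points:
        # Ignore points that fall outside the plotting window.
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue
        # Clamp so that points exactly on the upper edge land in the last bin.
        col = min(int((x - x_min) / x_width), bins - 1)
        row = min(int((y - y_min) / y_width), bins - 1)
        counts[row][col] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: two overlapping bivariate normal clusters of different sizes.
    pts = [(rng.gauss(-1, 0.5), rng.gauss(1, 0.5)) for _ in range(60_000)]
    pts += [(rng.gauss(1, 0.5), rng.gauss(-1, 0.5)) for _ in range(40_000)]
    grid = histogram2d(pts, (-3, 3), (-3, 3), bins=6)
    for row in reversed(grid):  # print the top row first, like a plot
        print(" ".join(f"{c:6d}" for c in row))
```

Mapping each count to a color (for example black through green to white, as described below) turns this grid into the 2D histogram.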
After some experimenting I picked a color scale that ran from black through green to white at the high end. Note that this is (almost) the reverse of the effect created by overplotting in the scatter plots above. In both 2D histograms we can clearly see the two different clusters representing the two distributions from which the data is drawn. In the first case we can also see that there are more counts from the upper-left cluster than from the bottom-right cluster. That detail is lost in the scatter plot with a million data points, though it is more obvious from the marginal distributions. Conversely, in the case of the second dataset we can see that the "heights" of the two clusters are roughly comparable. 3D charts are overused, but here (see below) I think they actually work quite well in terms of providing a broad picture of where the data is and isn't concentrated. Feature occlusion is a problem with 3D charts, so if you're going to go down this route when exploring your own data, I highly recommend using software that allows user interaction through rotation and zooming. In summary, scatter plots are a simple and often effective way of visualizing bivariate data. If, however, your chart suffers from overplotting, try reducing point size and opacity. Failing that, a 2D histogram or even a 3D surface plot may be helpful. In the latter case, be wary of occlusion.
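Reducing opacity only postpones the problem. Under standard "over" alpha compositing, n identically colored points, each drawn with opacity alpha, cover a pixel to a fraction 1 - (1 - alpha)^n of the full point color. Any sufficiently dense region therefore still saturates. The Python sketch below assumes exactly that compositing model. It computes the coverage and the number of overlapping points needed to pass a threshold.

```python
def stacked_opacity(alpha, n):
    """Fraction of the point color reached after n identical points with opacity alpha
    are composited "over" one another (the background keeps weight (1 - alpha) ** n)."""
    return 1 - (1 - alpha) ** n


def points_to_saturate(alpha, threshold=0.99):
    """Smallest number of overlapping points whose combined coverage reaches `threshold`."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    n = 0
    while stacked_opacity(alpha, n) < threshold:
        n += 1
    return n


if __name__ == "__main__":
    for alpha in (0.5, 0.1, 0.01):
        print(f"alpha={alpha}: {points_to_saturate(alpha)} overlapping points reach 99% coverage")
```

Even at 1% opacity, 459 overlapping points are enough to push a pixel past 99% coverage. With a million points, dense regions can still merge into solid blobs.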

July 3, 2015

by Josh Anderson

· 13,608 Views

Crowdsourcing our way to better food hygiene

The last few years have seen a tremendous boom in the number of online sources relaying information about restaurant quality. Whether it’s review sites or more general social media, there is no shortage of feedback on how people have found a particular restaurant. I wrote a few years ago about a project from the University of Rochester that aimed to mine Twitter for mentions of eating out. The hope was to produce a detailed and comprehensive map of food hygiene standards across restaurants in New York. The system, called nEmesis, analyzed millions of tweets, hunting for people reporting an attack of food poisoning after visiting a restaurant. You might think, or at least hope, that this would be a relatively small number, but over a four-month period it found 480 such mentions in New York City alone, from a total of 23,000 restaurant visitors. What’s more, the data collected correlated well with public health data on those diners.
Crowdsourcing food hygiene
A recent Harvard-led project hopes to provide similar assistance to Boston’s food hygiene authorities by giving them more intelligent information on which to base their inspection checks. Rather than using Twitter for data, however, the Harvard project is turning to the review website Yelp. It has launched a Netflix-style competition to create an algorithm that can search through the ratings of restaurants in Boston and recommend which restaurants warrant a visit from the hygiene police. In the competition, organized by the data company DrivenData, the raw data is posted online and an army of data scientists is charged with solving the puzzle. The founders observed that whilst the collection of machine-readable data was now mandated by the government, a data literacy problem left much of that data dormant and unused.
Bringing data science to the masses
And so the competition was born, to try to make data science affordable for organizations with a clear social need but no budget for what are still very expensive skill sets. Of course, the food hygiene challenge is but one of the challenges on the DrivenData website. The venture has come a long way since its first challenge, which was to create a better algorithm for improving spending in schools. The organization tries to ensure that whatever winning entries emerge from the competitions receive support and help to grow and improve. The winner of that initial competition, for instance, eventually turned their algorithm into a software tool for schools to use. The eventual aim is to establish a community of data scientists who are happy to deploy their talents for socially worthwhile endeavors. “Our mindset has grown; we want to solve the big-picture data literacy and data capacity problems in the social and public sectors,” the creators say. “We think competitions are a great mechanism to do that right now, but our goal is to do more, to serve that community in other ways.” Suffice it to say, challenges have come a long way since their beginnings in the 18th century, when the UK government launched such a competition to help find longitude more easily. The likes of the X Prize have taken them to new heights, and it’s great to see organizations like DrivenData apply the concept to more manageable challenges. Of course, they aren’t the only organization seeking to make algorithms more accessible. I wrote last year about the Algorithmia social network, which aims to connect organizations that have lots of data with algorithms that are being under-utilized. The hope is that this matchup will create not just new insights but extra profits. Data science is undoubtedly a burgeoning field, and it’s one with a great many exciting developments in it. Original post

July 2, 2015

by Adi Gaskell

· 881 Views · 1 Like

Emerging Niches and Technologies in Mobile App Development

There has been a wide array of successful consumer apps such as Angry Birds, WhatsApp and Dropbox. After years in the publicity spotlight, these consumer app giants have finally understood the importance of offering enterprise-grade features, and in the last few years the focus has suddenly shifted to enterprise mobile apps. The list of emerging app niches and technologies seems endless: rapid development, tracking and monitoring apps, wearable apps, Internet of Things apps, and geolocation technologies like iBeacon and geofencing in business apps. Let us take a quick look at some of the most defining app niches and technologies of recent times.
Enterprise apps
Smartphones and mobile devices continue to fly off the shelves, and millions of apps keep the app stores brimming with energy, activity and competition. Even so, most consumer apps still fail to earn enough to survive beyond their first year. This has been the sorry storyline for consumer apps for years, so for some time developers have been shifting their focus towards enterprise apps. Moreover, businesses are now bent on going mobile and are keen to develop apps that make their business processes more productive. Although enterprise mobile apps have only just started to take off, this broad new niche has already shown great promise and could overtake consumer apps within little more than a year.
Rapid development
Enterprises are now focusing all-out on embracing mobile apps in their business processes, and the new demand for enterprise-grade apps has made rapid development cycles a necessity. When winning business comes down to a fast, user-focused mobile presence, fast-paced development naturally becomes the rule. This overwhelming demand for business apps and enterprise-grade software has made rapid development a key criterion. Shortening the development lifecycle has now become the major focus for most mobile app development companies around the world.
Mobile monitoring apps
The wide adoption of mobile devices and apps across all age groups in recent years has given rise to certain concerns. Major concerns centered on mobile devices include child safety, parents' worries about negative influences on their children, and employers' concerns about employee productivity and information security. App types growing increasingly popular to address these concerns in family and workplace environments include iOS or Android monitoring software, child phone tracker apps, mobile spy software and text message tracking apps.
Internet of Things (IoT) apps
The world around us is becoming connected to mobile devices, and the gadgets and devices around us are increasingly equipped with mobile control interfaces. This new horizon of interconnected devices is referred to as the Internet of Things, or IoT. An electric toaster can now be controlled from its app on a mobile device. Similarly, a music system can be turned on and off, tuned and given other commands through its mobile app. This new breed of apps is being called IoT apps.
Wearable apps
Smartphones and other smart mobile devices now play the central role in connecting all types of wearable smart devices. Most smartwatch apps are still essentially extensions of their mobile counterparts. But the smartwatch is slowly picking up as the next big device platform and the most common wearable. As a result, a new breed of apps is being developed that targets smartwatch and wearable users in addition to offering mobile versions. These new wearable devices range from smart jewelry to health trackers and fitness bands to optical head-mounted displays like Google Glass. They will be the target development platform for a vast majority of mobile app developers in the time to come.
More user-optimized mobile UI design
UI design is presently the most intensely focused area of mobile app development around the world.
Experiments and analysis aimed at making UIs better and more user-optimized are continuing. A wide variety of new techniques and design approaches are producing unprecedented levels of excellence in user experience. From motivational design to flat design to playful interfaces, we have come across quite a few dominant design trends and techniques.
Geo-location technologies
Contextual, user-specific push notifications are the new way to engage users with a mobile app and to generate revenue from it. Nothing does this better than knowing the user's location. When you know that a user is close to your retail shop, you can send them an offer to grab their attention and encourage a visit to your store. Knowing the user's location therefore translates into far better contextual, business-driven messaging and notifications. Several mobile-friendly geolocation technologies, such as iBeacon, geofencing and geomagnetic positioning, let you integrate location-based user engagement features into your app.
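A circular geofence, the simplest form of the idea, reduces to a distance check: notify the user when their position moves from outside to inside a radius around the store. The Python sketch below assumes positions arrive as (latitude, longitude) pairs in degrees. It uses the haversine great-circle formula with a mean Earth radius of 6,371 km. The store coordinates and 100 m radius are hypothetical. On a real device you would normally rely on the platform's own region-monitoring services rather than checking positions yourself like this.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def inside(position, centre, radius_m):
    return haversine_m(position, centre) <= radius_m


def entered_geofence(previous, current, centre, radius_m):
    """True only on the transition from outside to inside, so the user is notified once."""
    return not inside(previous, centre, radius_m) and inside(current, centre, radius_m)


if __name__ == "__main__":
    store = (51.5000, -0.1200)  # hypothetical store location
    walk = [(51.5100, -0.1200), (51.5030, -0.1200), (51.5005, -0.1200), (51.5001, -0.1200)]
    for prev, curr in zip(walk, walk[1:]):
        if entered_geofence(prev, curr, store, radius_m=100):
            print(f"Send offer: user is {haversine_m(curr, store):.0f} m from the store")
```

Triggering on the outside-to-inside transition, rather than on every position inside the fence, avoids sending the same offer repeatedly while the user lingers nearby.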

July 2, 2015

by Juned Ghanchi

· 3,848 Views

The Secret to More Efficient Data Science with Neo4j and R [OSCON Preview]

Written by Nicole White. It’s a sad but true fact: most data scientists spend 50-80% of their time cleaning and munging data and only a fraction of their time actually building predictive models. This is especially true in a traditional stack, where most of this data munging consists of writing line upon line of some flavor of SQL. That leaves little time for model-building code in statistical programming languages such as R. These long, cryptic SQL queries not only slow development but also prevent useful collaboration on analytics projects, as contributors struggle to understand each other’s SQL code. For example, in graduate school I was on a project team where we used Oracle to store Twitter data. The queries my classmates and I were writing were unmaintainable and impossible to understand unless the author was sitting next to you. No one worked on the same queries together because they were so unwieldy. This not only hindered our collaboration but also slowed our progress on the project. If we had been using an appropriate data store (like a graph database), we would have spent significantly less time pulling our hair out over the queries.
Why Today’s Data Is Different
This data-munging problem has persisted in the data science field because data is becoming increasingly social and highly connected. Forcing this kind of interconnected data into an inherently tabular SQL database, where relationships are only abstract, leads to complicated schemas and overly complex queries. Yet several NoSQL solutions, specifically in the graph database space, exist to store today’s highly connected data: that is, data where relationships matter. A lot of data analysis today is performed in the context of better understanding people’s behavior or needs, such as: How likely is this visitor to click on advertisement X? Which products should I recommend to this user? How are User A and User B connected? People, as we know, are inherently social, so most of these questions can be answered by understanding the connections between people. For example: User A is similar to User B, and we already know that User B likes this product, so let’s recommend this product to User A.
The Good News: Data-Munging No More
Data science doesn’t have to be 80% data munging. With the appropriate technology stack, a data scientist’s development process is seamless and short. It’s time to spend less time writing queries and more time building models. You can do that by combining the flexibility of an open-source NoSQL graph database with the maturity and breadth of R, an open-source statistical programming language. The combination of Neo4j’s ability to store highly connected, possibly unstructured data and R’s functional, ad hoc nature creates the ideal data analysis environment. You don’t have to spend an hour writing CREATE TABLE statements. You don’t have to spend all day on StackOverflow figuring out how to traverse a tree in SQL. Just Cypher and go.
Learn More at OSCON 2015
At my upcoming OSCON session we will walk through a project in which we analyze #OSCON Twitter data in a reproducible, low-effort workflow without writing a single line of SQL. For this highly connected dataset we will use Neo4j, an open-source graph database, to store and query the data. Along the way we will highlight the advantages of storing such data in a graph versus a relational schema. Finally, we will cover how to connect to Neo4j from an R environment to perform common data science tasks, such as analysis, prediction and visualization.
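The "User A is similar to User B" reasoning above is a short graph traversal. It goes from a user to the products they like, across to other users who like the same products, and on to the products those users like. In Neo4j that would be a Cypher pattern match. The sketch below shows only the traversal logic, in plain Python over an in-memory dictionary with hypothetical users and products. It scores each candidate by how many likes the two users share. It illustrates the idea only and is not the workflow from the OSCON session.

```python
from collections import Counter


def recommend(likes, user):
    """Rank products liked by users who share at least one liked product with `user`.

    `likes` maps each user to the set of products they like. A candidate's score is
    the sum, over similar users who like it, of the number of products shared with `user`.
    """
    mine = likes.get(user, set())
    scores = Counter()
    for other, theirs in likes.items():
        if other == user:
            continue
        shared = len(mine & theirs)
        if shared == 0:
            continue  # not connected to `user` through any product
        for product in theirs - mine:
            scores[product] += shared
    # Highest score first; ties broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda p: (-scores[p], p))


if __name__ == "__main__":
    likes = {  # hypothetical (user)-[:LIKES]->(product) edges
        "A": {"p1", "p2"},
        "B": {"p1", "p2", "p3"},
        "C": {"p2", "p4"},
        "D": {"p9"},
    }
    print(recommend(likes, "A"))
```

In a graph database, the same traversal follows stored relationships directly instead of joining tables, which is the advantage the article argues for.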

June 30, 2015

by Mark Needham

· 1,668 Views

JBoss BPM Suite Quick Guide: Import External Data Models to BPM Project

You are working on a big project, developing rules, events and processes at your enterprise for mission-critical business needs. Part of the requirements state that a certain business unit will provide its data model for you to leverage. This data model will not be designed in the JBoss BPM Suite Data Modeler, but you need access to it while working on your rules, events and processes from the business central dashboard. For this article we will use the JBoss BPM Travel Agency demo project as a reference; its current data model is built externally to JBoss BPM Suite business central. The external data model is called acme-data-model and is found in the project directory projects/acme-data-model. This data model is built during installation and provides you with an object data model as a Java Archive (JAR) file. That JAR is installed into the JBoss BPM Suite business central component by placing it in the following location:
jboss-eap-6.4/standalone/deployments/business-central.war/WEB-INF/lib/acmeDataModel-1.0.jar
Deploying the data model this way means that it is available to all projects you work on in JBoss BPM Suite business central, something that might not always be preferable. What we need is a way to deploy external data models into JBoss BPM Suite and then selectively add them to projects as needed. Within JBoss BPM Suite there is an Artifact Repository (Authoring --> Artifact repository) made just for this purpose. We can upload all our models through the business central dashboard UI and then pick and choose from the repository artifacts (your data model is one artifact) on a per-project basis. This gives you absolute control over the models that a project can access.
There are a few steps involved in changing the current installation of JBoss BPM Travel Agency, and we will take you through them here. The acmeDataModel-1.0.jar file will be removed from the previously mentioned business central component, uploaded into the Artifact Repository and added to the Special Trips Agency project. Here is how you can do it yourself:
1. Obtain and install the JBoss BPM Travel Agency demo project.
2. Remove the current data model from the global business central application:
$ rm ./target/jboss-eap-6.4/standalone/deployments/business-central.war/WEB-INF/lib/acmeDataModel-1.0.jar
3. Start the JBoss BPM Suite server after installation, as stated in the installation instructions.
4. Log in to JBoss BPM Suite at http://localhost:8080/business-central (user: erics, password: bpmsuite1!).
5. Go to AUTHORING --> ARTIFACT REPOSITORY.
6. Go to UPLOAD --> CHOOSE FILE... --> projects/acme-data-model/target/acmeDataModel-1.0.jar and click the UPLOAD button. This puts the external data model into the JBoss BPM Suite artifact repository.
7. Go to AUTHORING --> PROJECT AUTHORING --> OPEN PROJECT EDITOR.
8. In the project editor, select GENERAL PROJECT SETTINGS --> DEPENDENCIES.
9. In dependencies, select ADD FROM REPOSITORY and, in the pop-up, SELECT the acmeDataModel-1.0.jar entry.
This adds the external data model only to the Special Trips Agency project. It is not available to other projects unless they add the same dependency from the JBoss BPM Suite artifact repository. If you build and deploy the project and run it as described in the project instructions, you will find that the external data model is available. It is used by the various rules and process components that make up the JBoss BPM Travel Agency. As a closing note, this works exactly the same way for JBoss BRMS projects.

June 29, 2015

by Eric D. Schabell

· 3,187 Views · 1 Like

Spark Grows Up and Scales Out

Written by Craig Wentworth. Recent vendor announcements around the open source analytics computing engine Spark have met with a furor, and some commentary seems to be setting up a Spark versus Hadoop battle. To understand why, it’s worth taking a moment to recap what each actually is (and is not). As I covered in last year’s MWD report on Hadoop and its family of tools, when people talk about Apache Hadoop they’re often referring to a whole framework of tools designed to facilitate distributed parallel processing of large datasets. In Hadoop’s early days that processing was confined to MapReduce batch jobs. Hadoop 2 brought the YARN resource scheduler and opened Hadoop up to streaming, real-time querying and a wider array of analytical programming applications (beyond MapReduce). Spark has been designed to run on top of Hadoop’s Distributed File System (amongst other data platforms) as an alternative to MapReduce. It is tuned for real-time streaming data processing and fast interactive queries, and it covers multiple analytics genres out of the box (machine learning, time series, graph, SQL, streaming). It gets its speed advantage by caching data in memory rather than writing interim results to disk, as MapReduce does. That approach brings a need for higher-spec physical machines, compared with MapReduce’s tolerance for commodity hardware. So Spark isn’t about to replace Hadoop, but it may well supplant MapReduce (especially in growing real-time use cases). Those “Spark vs Hadoop” headlines are about as meaningful as one proclaiming “mushrooms vs pizza.” Yes, mushrooms might be a more suitable topping than, say, pepperoni (especially in a vegetarian use case), but they will still be deployed on the same dough-and-tomato-sauce pizza platform. Nobody’s about to suggest the mushroom should go it alone!
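The architectural difference described above can be illustrated with a toy word count: MapReduce writes interim results to disk, while Spark keeps them in memory. The Python below uses neither Hadoop nor Spark APIs. One version writes the intermediate (word, 1) pairs to a temporary file between the map and reduce stages; the other keeps them in a list in RAM. Both give the same answer. The in-memory version skips the disk round trip but needs enough memory to hold the intermediate data, which mirrors the hardware trade-off the article mentions.

```python
import json
import os
import tempfile
from collections import defaultdict


def map_phase(lines):
    # Emit (word, 1) pairs, as a word-count mapper would.
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reduce_phase(pairs):
    # Sum the counts for each word.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count_via_disk(lines):
    """MapReduce-style: write the intermediate pairs to a file, then read them back."""
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    try:
        with os.fdopen(fd, "w") as out:
            for pair in map_phase(lines):
                out.write(json.dumps(pair) + "\n")
        with open(path) as inp:
            return reduce_phase(json.loads(line) for line in inp)
    finally:
        os.remove(path)


def word_count_in_memory(lines):
    """Conceptually Spark-style: keep the intermediate pairs in RAM between stages."""
    cached = list(map_phase(lines))  # held in memory and reusable by later stages
    return reduce_phase(cached)


if __name__ == "__main__":
    docs = ["Spark runs on Hadoop", "Spark caches in memory"]
    print(word_count_via_disk(docs))
    print(word_count_in_memory(docs))
```

In real Spark, keeping data in memory across stages is something you request explicitly for datasets you reuse, rather than something that happens for every intermediate result.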
But behind the headlines and the hype is a story of enterprise adoption. Or at least it is a story of vendors anticipating that adoption and investing in ‘the weaponization of Spark’ as it faces the more exacting standards of security, scaling performance, consistency, etc. that come with mainstream enterprise deployment. Big names like IBM, Databricks (the company formed by the originators of Spark), and MapR made commitments in and around the Spark Summit earlier this month. MapR has announced three new Quick Start Solutions for its Hadoop distribution to help customers get started with Spark in real-time security log analytics, genome sequencing, and time series analytics. Databricks’ cloud-hosted Spark platform (formerly known as Databricks Cloud) has become generally available. And IBM announced a raft of measures designed to give Spark a significant shot in the arm. It is open sourcing its SystemML technology to bolster Spark’s machine learning capabilities and integrating Spark into its own analytics platforms. It is also investing in Spark training and education, committing 3,500 of its researchers and developers to work on Spark-related projects, and offering Spark as a service on its Bluemix developer cloud. Given the overlap with Databricks’ business model (offering development, certification, and support for Spark), IBM’s intentions are likely to tread on some toes before long. For now, at least, both companies are content to focus on the combined push benefiting the Spark community and its enterprise aspirations overall, though clearly IBM is betting that all this investment will buy it some influence over where Spark goes next. It’s worth bearing in mind that not all of Spark’s supporters champion it wholesale. Because of overlaps with their own preferred toolsets, the interested parties tend to care about particular parts of Spark, wide-ranging as it is.
For instance, although Spark supports many analytics genres, Cloudera focuses on its machine learning capabilities (as it has its own SQL-on-Hadoop tool in Impala). MapR and Hortonworks promote Drill and Hive, respectively, as their favoured route to SQL-on-Hadoop. IBM’s support is focused on Spark’s machine learning and in-memory capabilities (hence the SystemML open sourcing news). In the face of such strong vendor preferences, how long before some of Spark’s current features fall away? Or at least, how long before they start to show the effects of being starved of the care and feeding bestowed upon vendors’ favourite Spark components? The Spark community is in much the same place the Hadoop one was a while back. It’s showing great promise and suitability for key growth workloads (in Spark’s case, real-time IoT applications, for example). However, the product as it stands is too immature for many enterprise tastes. Cue enterprise software vendors stepping up to help Spark grow up fast. Their challenge, though, is to smooth out the edges without smothering what made it so interesting in the first place.

June 28, 2015

by Angela Ashenden

· 2,395 Views

Analyzing Application Workload Data with Apprenda and R

One of the most important kinds of data a Platform as a Service (PaaS) can leverage is its knowledge of the guest applications that run within its purview. A PaaS should know all sorts of things about guest applications: their architecture, dependencies, scale across infrastructure, and more. Data such as application resource utilization metrics (CPU, RAM, etc.) are key for things like data center capacity planning, policy enforcement, and application isolation in the enterprise. A PaaS such as Apprenda provides this information through a single centralized lens (in our case, a collection of RESTful APIs). That makes it easier than ever to run analytics on application metrics in the data center. Apprenda’s approach as a PaaS is to provide developers and platform operators with helpful information through platform extensibility and APIs. This is because there are plenty of tools in the data center that provide advanced analytic capabilities, so long as you can feed them the information they need. We integrate with tools like System Center, New Relic, and more all the time because these are the tools our customers have invested in, and they are great at what they do. Our job is not to reinvent these tools but instead to provide data. Apprenda captures information about applications such as their duration of deployment, resource policy (allocation of CPU and memory), actual utilization of resources, scale (number of instances), custom metadata, and more. All of this information can be fed into data center tools that help IT make important, data-driven decisions. In the land of DevOps, however, it is not uncommon for folks to use this data in creative and innovative ways. Often this means using the mechanism “du jour” to quickly and effectively grab, process, and manipulate data. That can be scripting (PowerShell), a programming language (R), or an entire runtime (Node.js).
As an example, let’s look at R, a powerful programming language centered on data mining and statistical analysis. It provides straightforward facilities for many types of data-analytics techniques and is extensible using community-maintained packages. In the simple example below, I use standard R functions plus three packages (easily installed using R’s install.packages() function): 1. jsonlite for parsing the JSON data that the Apprenda API returns. 2. httr for handling the HTTP requests necessary to authenticate and retrieve data. 3. plotrix for help rendering a plot of the retrieved data. From there it’s pretty straightforward. The first step is to authenticate with your Apprenda environment: I’ve now stored my Apprenda session token in a variable called ‘token.’ I’ll include that token as a header in my API call to get application data: GET() is a function provided by the httr package that simplifies making an HTTP request to the API. I’ve added the Apprenda session token to the HTTP headers for authentication. I’ve also included a query string parameter that returns all currently running application workloads on the platform. The data that is returned is parsed and stored in the variable (in R, a vector) called ‘r’, which now has 151 records, one for each application workload. Each record in ‘r’ has 15 variables (properties) that we can use to run analytics across the entire collection of results. For the purposes of illustration, I’m going to use the variable componentType, which represents Apprenda’s knowledge of the type of application workload that was deployed. There are seven self-explanatory types: UserInterface, PublicUserInterface, WindowsService, JavaWebApplication, LinuxService, WcfService, and Database.
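The R snippets for this step are not included in the text, so here is a rough Python analogue of the grouping it describes. It assumes the API response has already been parsed into a list of records that each carry a componentType field, as described above. The three-record sample payload is made up for illustration and is not real Apprenda output.

```python
import json
from collections import Counter


def workload_mix(records):
    """Return (componentType, count, percent) rows, most common type first."""
    counts = Counter(record["componentType"] for record in records)
    total = sum(counts.values())
    return [(ctype, n, round(100 * n / total, 1)) for ctype, n in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical, heavily trimmed payload; real records carry more properties.
    payload = """[
        {"componentType": "UserInterface"},
        {"componentType": "Database"},
        {"componentType": "UserInterface"}
    ]"""
    for row in workload_mix(json.loads(payload)):
        print(row)
```

The percentage column is what a pie chart of the workload mix would display.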
When the collection is then grouped by componentType, it becomes simple to plot a chart showing the distribution of workloads by component type. The chart is drawn with pie3D(), which comes from the plotrix package.

I’ve had conversations with IT folks who couldn’t describe the architectural makeup of their application portfolio in any level of detail, yet in this case we pulled the data in real time with one line of R. Admittedly, a pie chart is a pretty watered-down way to look at this information, but the point is that the data is available and can be grouped, filtered, manipulated, and analyzed very simply with R. For this example, I used the open-source edition of RStudio.

Some other useful information that could be gleaned from the platform’s APIs:

1. The average discrepancy between resource allocation and actual utilization per workload (helpful in capacity planning).
2. The longest-running application workload.
3. The most distributed applications (which could aid in scaling decisions).

There are many more. A PaaS such as Apprenda is, by nature, in a unique spot in the data center stack because it maintains knowledge of both infrastructure and applications. It also serves as a hub for data that, when analyzed creatively, provides new insights. These insights are an opportunity for enterprises to improve their practices, better serving developers and applications while operating more efficiently.
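Once the workload records are in memory, the additional questions listed above (allocation discrepancy, longest-running workload, most distributed applications) reduce to simple grouping and sorting. The Python sketch below assumes each record has already been mapped to a Workload object. The field names (allocated_mb, used_mb, deployed_at, instances) are hypothetical stand-ins for whatever the platform API actually returns. "Discrepancy" is interpreted here, as one reasonable choice, as allocated minus used memory.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Workload:
    # HYPOTHETICAL field names standing in for the resource policy,
    # utilization, deployment time, and scale the platform records;
    # map them to the real API properties yourself.
    app: str
    component_type: str
    allocated_mb: float
    used_mb: float
    deployed_at: datetime
    instances: int


def avg_allocation_gap(workloads: list[Workload]) -> float:
    """Mean of (allocated - used) memory across workloads, in MB."""
    if not workloads:
        raise ValueError("no workloads to analyze")
    return sum(w.allocated_mb - w.used_mb for w in workloads) / len(workloads)


def longest_running(workloads: list[Workload], now: datetime) -> tuple[Workload, timedelta]:
    """Workload with the earliest deployment time, plus how long it has run."""
    oldest = min(workloads, key=lambda w: w.deployed_at)
    return oldest, now - oldest.deployed_at


def most_distributed(workloads: list[Workload], top: int = 3) -> list[tuple[str, int]]:
    """Applications ranked by total instance count across their workloads."""
    per_app = defaultdict(int)
    for w in workloads:
        per_app[w.app] += w.instances
    return sorted(per_app.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Each function makes a single pass over the records, so the same in-memory collection can answer all three questions without further API calls.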

June 27, 2015

by Matthew Ammerman

· 3,230 Views

Web Data Mining Services Give Business Intelligence to Your Start-up!

The business sphere has become an extremely competitive arena. Dynamics change in the blink of an eye, and times have become highly unpredictable. Businesses today therefore need to be agile while being equipped with reliable, accurate, relevant, and actionable business intelligence. Every business venture has its fair share of ebbs and flows, and proving your capabilities and gaining a strong hold in the market is even more of a challenge when you have just taken your first steps. For startups, grasping the finest nuances of how to run a business, right from day one, is the most crucial part.

To sail smoothly through this enormously competitive space, startups need to perform above and beyond expectations from the very beginning. The initial barriers are much easier to overcome when your business is armed with the smallest details of the market. But how do you read the pulse of the market, you ask? Data extraction, or data mining, services are the answer. Data mining equips you with rich business intelligence that gives you firm control and empowers you to make informed business decisions, as well as to create more targeted, applicable, and growth-oriented business strategies. Data extraction services gather large volumes of data that are highly varied, precise, and relevant, which is especially useful for a new startup. A meticulous study of this data lets you analyze things in great detail, and arranging the scattered information into meaningful clusters helps you see the whole picture.

What are the different ways for startups to use web data mining effectively?

Web data mining is a broad set of techniques that can be employed for a variety of purposes, generating many kinds of important data from which to gain actionable insights.
In fact, for a startup, the most critical part is deciding where and how to use this powerful technique to get valuable information that can make a difference to the company’s overall future prospects. Let’s look at some of the avenues where you can apply web data extraction techniques effectively.

Digging out information on social rankings and backlinks

For any startup, the most crucial business process is analyzing its competitors, and this is one area where web data extraction is an instrumental enabler. Many startups have used data mining to find critically useful information about the social rankings of competing companies. Social ranking is an important factor because any ‘social actions’ on the internet are building blocks of opinion and, in this day and age, build a reputation. With this in mind, you can use web data extraction to dig out social rankings for content your competitors have created online. With thorough analysis, you can get a clear picture of the entire situation and arrive at concrete conclusions about what your competitors do well and what sells best.

Obtaining contact information

Building a strong network is the best bet for getting through a volatile market, especially when you are new to it. Whether with prospective or existing customers, industry peers, associates, or competitors, excellent networking built on open and transparent communication drives the success of your startup. To have such an effective communication and networking channel, you need a large, robust list of contact information that matches your exact requirements, and mining data from multiple web sources is an ideal way to build it.
In a short period of time, you can collect rich contact information that can be leveraged in a number of ways. You can form long-lasting business relationships or let potential customers know what you offer. This information gives your startup a boost and propels it to new levels of recognition.

Building a brand: promotion and advertising

For startups, the very first wave of promotion is key to building strong brand value in the market and ensuring long-term business success. It is during this initial phase that the first public perception of your company is created and public opinion starts to take shape. For this reason, you need to be precise with your marketing and promotion in these formative years. To achieve this, you need a strong, in-depth understanding of the audience you want to target, classified by factors such as age, gender, income, demographics, and preferences. Such a detailed understanding can be attained only when you have a large volume of social data about the target audience, and there is no better way to get it than web data extraction.

With such a powerful weapon in your arsenal, you can boost your startup and take it a long way with clever decisions and timely implementation. Web data extraction may be the most valuable tool a startup ever has. Used appropriately, it should give you a wealth of relevant business intelligence that helps you shine in this competitive market.

June 26, 2015

by Ritesh Sanghani

· 1,633 Views
