Data Resources

The Latest Data Topics

Learn R: How to Create Data Frames Using Existing Data Frames

In this article, we go over several commands developers and data scientists can use to create data frames using existing data frames.

June 27, 2015

by Ajitesh Kumar

· 253,445 Views · 2 Likes

From Design to Execution with JBoss BPM Suite & Signavio Process Editor

Occasionally we are asked about JBoss BPM Suite integration with other products and layers in an enterprise's architecture. We have published articles on how to achieve this in areas such as:

- Microservices integration
- Data integration

Articles are one thing, but seeing is believing, so we have done a few webinars to show you live how to tackle integration:

- Data integration webinar
- PEX webinar

Along with these articles we have always published demo projects that give you a closer look and a chance to get hands-on with these integration strategies:

- JBoss BPM Suite & JBoss Fuse Travel microservices story
- JBoss BPM Suite & JBoss Data Virtualization integration

Imported Signavio Process Editor mortgage workflow.

There is another integration story yet to be told, about how one can leverage other tooling together with JBoss BPM Suite. This article introduces one such company, Signavio, which provides the Signavio Process Editor so "...you can start modeling and engaging your organization in improving operational efficiency through the development of optimal models..."

The following demo project provides a working example of how you can model a mortgage process in Signavio Process Editor and then bring it into JBoss BPM Suite, where you can add implementation and integration details to finally execute the mortgage process end-to-end.

Demo project

As always we bring you not only a story, but a reusable demo project you can easily spin up yourself to explore how a JBoss BPM project would integrate with the model designed in Signavio Process Editor. The project is called the JBoss BPM Suite & Signavio Process Editor Integration Demo.

The project installs JBoss BPM Suite 6.1 with an example mortgage project with rules, process, forms and other artifacts. It also includes a copy of an exported Signavio Process Editor mortgage process that we then show how to import.

Final mortgage workflow project with implementation details and integration details completed. Ready to run!

This gives you the initial starting point after importing the Signavio process and the completely integrated final mortgage project, which you can run side-by-side. Setting up this project takes just a few simple steps, and you will be up and running in minutes.

Installation

1. Download and unzip.
2. Add products to installs directory.
3. Run 'init.sh' or 'init.bat' file. 'init.bat' must be run with Administrative privileges.
4. Start JBoss BPMS Server by running 'standalone.sh' or 'standalone.bat' in the /target/jboss-eap-6.1/bin directory.
5. Login to http://localhost:8080/business-central - login for admin, appraisor, broker, and manager roles (u:erics / p:bpmsuite1!)
6. Mortgage Loan demo pre-installed as project.
7. Using process designer, import the Signavio process that was exported to the file found in: support/MortgageDemoSignavio.bpmn

See the screenshots provided in the project for how this should look. Note that the JBoss BPM Suite process designer includes validation that reports tasks that are not yet specified. This is correct, because at this point you still need to start implementing the process tasks. You can examine the imported process and note that the various details captured during initial workshops have been put into the process details for each step in the workflow. After implementing these steps you will find the final process ready to run. You can now explore the final project by deploying it and starting a new instance.

We hope you enjoy this example project, and feel free to browse for more at JBoss Demo Central.

June 26, 2015

by Eric D. Schabell

CORE

· 1,917 Views · 1 Like

Web Data Mining Services Give Business Intelligence to Your Start-up!

The business sphere has become an extremely competitive arena. Dynamics change in a blink and times have become highly unpredictable, so businesses today need to be agile while being equipped with reliable, accurate, relevant and actionable business intelligence. Every business venture has its fair share of ebbs and tides. Proving your capabilities and achieving a strong hold in the market is even more of a challenge when you have just taken your first step in. For startups, learning the smallest nuances of how to run a business, right from day one, is the most crucial part. To sail smoothly through this enormously competitive space, startups need to perform above and beyond expectations from the very beginning.

The initial barriers can be overcome more easily when your business is armed with the smallest details of the market. But how do you catch the pulse of the market? Data extraction, or data mining, services are the answer. Data mining equips you with rich business intelligence that in turn gives you firm control of things and empowers you to make informed business decisions and create more targeted, applicable and growth-oriented business strategies.

Data extraction services gather a huge volume of data that is highly varied, precise, and relevant. Most importantly, it is very useful for your new startup. A meticulous study of this data allows you to analyze things in great detail, and arranging the scattered information into meaningful clusters helps you get the whole picture.

Which are the different ways for startups to effectively use web data mining?

Web data mining covers a wide array of techniques, which can be employed for a variety of purposes to generate various kinds of important data and gain actionable insights. In fact, for a startup, the most critical part is deciding where and how to use this powerful technique to get valuable information that can make a difference to the overall future prospects of the company. Let's check out some of the interesting avenues where you can apply impactful web data extraction techniques.

Digging information for social rankings and backlinks

For any startup, one of the most crucial business processes is analyzing its competitors. This is one area where web data extraction is an instrumental enabler. Many startups have effectively used data mining to fish out critically useful information related to the social rankings of competing companies. Social ranking is an equally important factor, since 'social actions' on the internet are the building blocks of opinions and reputations in this day and age. Keeping these things in mind, you can use web data extraction to dig out social rankings related to content created by your competitors online. With thorough analysis, you can get a very clear picture of the entire situation and arrive at concrete conclusions about what your competitors are doing well and what sells best.

Obtaining contact information

Building a strong network is the best bet for getting through a volatile market, specifically when you are a newcomer. Whether it is with prospective or existing customers, industry peers, associates, or competitors, excellent networking built on open and transparent communication is a driving force behind the success of your startup. To have such an effective communication and networking channel, you need a large, robust list of contact information that matches your exact requirements. Mining data from multiple web sources is a well-suited method to achieve this. In a short period of time you can collect rich contact information that can be leveraged in a number of ways. You can form long-lasting business relationships or let potential customers know what you offer; this information gives a thrust to your startup and propels it to new levels of recognition.

For building brand, promotion and advertisement

For startups, the very first wave of promotion is the key to building strong brand value in the market and ensuring long-term business success. It is during this initial phase that the first public perception of your company is created and public opinion starts taking shape. For this reason, you need to be precise with your marketing and promotion in these formative years. To achieve this, you need a strong, in-depth understanding of the audience you need to target. You need to classify your target audience based on factors like age, gender, income, demographics, and preferences. Such detailed understanding can be attained only when you have voluminous social data related to the targeted audience, and web data extraction is an effective way to obtain it.

With such a powerful weapon in your arsenal, you can boost your startup and take it a long way with clever decisions and timely implementation. Web data extraction may be the most valuable tool a startup can have. Used appropriately, it should give you plenty of relevant business intelligence, which should help you shine in this competitive market.

June 26, 2015

by Ritesh Sanghani

· 1,629 Views

Generating CSV-files on .NET

I have a project where I need to output some reports as CSV-files. I found a good library called CsvHelper on NuGet and it works perfectly for me. After some playing with it I was able to generate CSV-files that were shown correctly in Excel. Here is some sample code and also extensions that make it easier to work with DataTables.

Simple report

Here’s a simple fragment of code that illustrates how to use CsvHelper.

    using (var writer = new StreamWriter(Response.OutputStream))
    using (var csvWriter = new CsvWriter(writer))
    {
        csvWriter.Configuration.Delimiter = ";";

        // Header row with custom column titles
        csvWriter.WriteField("Task No");
        csvWriter.WriteField("Customer");
        csvWriter.WriteField("Title");
        csvWriter.WriteField("Manager");
        csvWriter.NextRecord();

        // One record per project
        foreach (var project in data)
        {
            csvWriter.WriteField(project.Code);
            csvWriter.WriteField(project.CustomerName);
            csvWriter.WriteField(project.Name);
            csvWriter.WriteField(project.ProjectManagerName);
            csvWriter.NextRecord();
        }
    }

Of course, you can use other methods to output a whole object or object list in one shot. Here I just needed custom headers that don’t match property names 1:1.

Generic helper for DataTable

Some of my data comes from the service layer as a DataTable. I don’t want to add new models or Data Transfer Objects (DTO) for no good reason, and DataTable is actually flexible enough if you need to add new fields to a report and you want to do it fast. As DataTables are not supported by default (yet?), I wrote simple extension methods that work on DataTable views. When called on a DataTable, the default view is selected automatically. The idea is that you can set a filter on the default data view and leave out the rows you don’t need. If you just want to show a DataTable on screen as a table, then check out my posting Simple view to display contents of DataTable.

    public static class CsvHelperExtensions
    {
        public static void WriteDataTable(this CsvWriter csvWriter, DataTable table)
        {
            WriteDataView(csvWriter, table.DefaultView);
        }

        public static void WriteDataView(this CsvWriter csvWriter, DataView view)
        {
            // Header row: column names of the underlying table
            foreach (DataColumn col in view.Table.Columns)
            {
                csvWriter.WriteField(col.ColumnName);
            }
            csvWriter.NextRecord();

            // Data rows: only the rows that pass the view's filter
            foreach (DataRowView row in view)
            {
                foreach (DataColumn col in view.Table.Columns)
                {
                    csvWriter.WriteField(row[col.ColumnName]);
                }
                csvWriter.NextRecord();
            }
        }
    }

And here is a simple MVC controller action that gets data as a DataTable and returns it as a CSV-file. The result is a CSV-file that opens correctly in Excel.

    [HttpPost]
    public void ExportIncomesReport()
    {
        var data = /* Get DataTable here */;

        Response.ContentType = "text/csv";
        Response.AddHeader("Content-disposition", "attachment;filename=IncomesReport.csv");

        // Write the UTF-8 preamble (byte order mark) before the CSV content
        var preamble = Encoding.UTF8.GetPreamble();
        Response.OutputStream.Write(preamble, 0, preamble.Length);

        using (var writer = new StreamWriter(Response.OutputStream))
        using (var csvWriter = new CsvWriter(writer))
        {
            csvWriter.Configuration.Delimiter = ";";
            csvWriter.WriteDataTable(data);
        }
    }

One thing to notice: with CsvHelper we have full control over the stream we write data to, and this way we can write more performant code.

The post Generating CSV-files on .NET appeared first on Gunnar Peipman - Programming Blog.
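For readers outside .NET, the same idea (a header row, semicolon delimiter, and a UTF-8 byte order mark at the start of the output) can be sketched with Python's standard library. This is only an illustrative analog, not CsvHelper and not the author's code; the function names `write_report_csv` and `report_to_bytes` and the sample row are hypothetical.

```python
import csv
import io


def write_report_csv(stream, headers, rows, delimiter=";"):
    """Write a header row followed by data rows to a text stream."""
    # Hypothetical helper: mirrors the WriteField/NextRecord loop above.
    writer = csv.writer(stream, delimiter=delimiter, lineterminator="\r\n")
    writer.writerow(headers)
    for row in rows:
        writer.writerow(row)


def report_to_bytes(headers, rows):
    """Return the report as UTF-8 bytes that start with a byte order mark."""
    buffer = io.StringIO(newline="")
    write_report_csv(buffer, headers, rows)
    # "utf-8-sig" prepends the UTF-8 BOM, playing the role of the
    # Encoding.UTF8.GetPreamble() call in the controller action.
    return buffer.getvalue().encode("utf-8-sig")


if __name__ == "__main__":
    payload = report_to_bytes(
        ["Task No", "Customer"],
        [["T-1", "Acme; Inc"]],  # made-up sample data
    )
    print(payload)
```

Fields that contain the delimiter are quoted automatically by the `csv` module, so a customer name with a semicolon in it stays in one column.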

June 26, 2015

by Gunnar Peipman

· 4,735 Views · 1 Like

Perforce and Go2Group Integrate Helix SCM Platform with ConnectALL ALM Router

New Integration Provides Seamless Connections Between Perforce Helix and Leading Application Lifecycle Management Systems WOKINGHAM, UK. (June 24, 2015) – Perforce Software, the leader in software configuration management (SCM) and collaboration, and Go2Group, an Atlassian Platinum and Enterprise Expert, today announced the Perforce ConnectALL Adapter. The new adapter for Go2Group’s ConnectALL ALM Router connects Perforce Helix to Application Lifecycle Management (ALM) systems supported by ConnectALL. The companies also announced that they have expanded their partnership, which first began in 2002. “Very few SCMs can handle binary data, and no other SCM solution supports large file formats that scale across globally distributed enterprises like Helix,” said Brett Taylor, president of Go2Group. “Our customers demand future-proof solutions, and with Perforce we know they don’t have to worry about outgrowing their systems—it will serve them well whether they’re a team of 50 or 50,000.” With the Perforce adapter, ConnectALL automatically synchronises data and workflow with other ALM systems and integrates ALM systems components within minutes. “We’re excited to be a part of the ConnectALL ecosystem of adapters and to enable companies to more easily design, configure, synchronise, manage, and monitor their integrations with Perforce,” said Dave Robertson, vice president of Channels at Perforce. “We’re glad to extend our partnership with Go2Group to new technologies and markets.” Go2Group is part of Perforce’s network of sales partners across Europe, the Middle East, Africa, Asia Pacific and India. Perforce partners serve customers in more than 100 countries worldwide. The Perforce ConnectALL Adapter is available for purchase from the Go2Group website.

June 24, 2015

by Fran Cator

· 980 Views

New Integrated Biometrics System Extends Enterprise Access Control to the Data Centre and Other High-Security Settings

Combining BioConnect with Digitus server cabinet access control makes biometric identity management more effective and easier

ENTERTECH SYSTEMS and Digitus Biometrics have just announced a new technology partnership between ENTERTECH’S BioConnect identity management platform and the Digitus db Bus and db Cabinet Sentry for server cabinet access control. The result is a new, fully-integrated solution called db BioConnect.

The new db BioConnect lets company data centres, co-location data centres and IT room customers simplify biometric implementation, enrollment and management for access control of perimeter doors, interior rooms, cages and now server racks, all integrated into one identity platform. This game-changing solution now extends enterprise access security platforms into the data centre, making investments in Access Control Management Systems far more economical and effective.

“Digitus Biometrics is the leader in providing biometric access control to the critical infrastructure market,” said Rob Douglas, ENTERTECH SYSTEMS CEO. “With Digitus technology integrated to BioConnect, all 15 of our certified access control partners will now be able to offer it to their customers. For the first time, end users will have access to an integrated biometric solution that secures access control from the data centre entrance to the server cabinet instead of having to deal with stand-alone systems.” The list of BioConnect certified access control partners can be found at www.bioconnect.com/partners.

ENTERTECH SYSTEMS’ BioConnect is the most advanced identity management platform on the global market today. Simple, secure and scalable, it provides Suprema biometric authentication across leading access control systems. As an application for security professionals, BioConnect helps enterprises successfully implement identity solutions by making deployment and use of biometrics easier than ever. BioConnect addresses deployment challenges by reducing costs, overcoming complexity and making it easier to on-board users. The platform provides seamless synchronization of data such as new cardholders, changes and deletions, and is tailor-made for enterprises where true identity is critical for secure access to physical facilities and software applications.

“Our biometric access control solutions are designed to meet the needs of a diverse range of clients,” said David Orischak, Digitus Biometrics CEO. “This technology integration to create db BioConnect will let us offer a single, centralised, highly advanced access control solution that's easy to deploy and use. An industry first, our customers will be able to use one centralised system facility-wide to secure a company’s critical infrastructure and data.”

db Bus ServerRack Access Control is designed for data centres needing to secure large volumes of server cabinets with only one small component per cabinet. A single db Bus controller allows the user to secure up to 32 racks with a single 48V power supply and may be infinitely scaled.

db Cabinet Sentry is designed for data centres with a structured cabling scheme. It is Digitus’ most compact, cost-effective and energy-efficient means of securing server cabinets. This extremely versatile unit can be deployed in networked or stand-alone environments, using power over Ethernet (PoE) or an external power supply.

db BioLock is a server cabinet lock that uses biometric identification with prints for up to 10 fingers per user.

Through this new technology partnership, these db products will be integrated with BioConnect to create the new integrated db BioConnect solution.

“Digitus’ use of leading Suprema biometric technology in their server cabinet access control solutions is a natural fit for ENTERTECH SYSTEMS as the operating partner for Suprema in the US, Canada, UK, Ireland and Puerto Rico,” added Douglas.

“The implications to the market are significant,” added Orischak. “The integrated db BioConnect solution can be used to manage any cabinet system where biometric access control is warranted – even SCIFs and locked areas housing sensitive assets such as pharmaceuticals, hazardous materials, intelligence archives, customer and patient records, as well as critical IP.”

For more information on the db BioConnect integrated solution, visit www.bioconnect.com/db.

June 24, 2015

by Fran Cator

· 1,367 Views

8 Key Findings About IoT Development

IoT is really hot, but it can also be a bit confusing. Read about these 8 key development findings.

June 24, 2015

by Burke Holland

· 1,910 Views

A look at New Relic Browser

While at Fluent Conf this year, I was walking by the New Relic booth when I noticed something interesting: a product called New Relic Browser. Back when I converted my blog to WordPress, I ran into a lot of problems. My server went down, WordPress crashed, it was a bit frustrating. (Much like how lemonade in a paper cut is a bit frustrating.) One of the tools I used to help diagnose my server was the New Relic server monitor. Outside of a few issues installing, I was really impressed with the level of detail the monitor provided. While it wasn’t the final solution for fixing my problem, it definitely helped me pinpoint what was sucking up all the RAM on my box, and helped me check my changes to ensure things were going well. Best of all, this was entirely free. I’ll give them huge props for offering such a powerful tool for no up-front money.

Because I had such a good experience with them on the server side, I thought I’d give their browser product a try as well. As you can guess, this tool is meant to help you gain insights into how well your web application is performing. I decided to try it on my blog, which, admittedly, is probably not the best use case for this product. WordPress isn’t something I need to hack up, and outside of the performance issues I had on the server side, I figured the client side was pretty much good enough. It certainly seemed good enough to me. But at the same time, my blog gets quite a bit of traffic, so I figured it would also provide a good set of data to dig into.

Setup is relatively simple. You begin by selecting a deployment method. I selected copy/paste as I figured that would be simpler. On the next page, I said I was not using APM, even though I guess I kinda was. I was trying to test this as someone who was not also using the server side product, so there may be things I missed out on. Typically when I try products like this, though, I try to keep things as simple as possible. The final step was copying a ginormous JavaScript string into my WordPress template.

So that was that. I copied in the code, cleared my WordPress cache, and then promptly forgot about it for a week or so. I then took a look back at my stats.

There’s a tremendous amount of information you get right on the front dashboard. First off, note the browser load times. I’m averaging 6.8 seconds or so, which is quite high and not what I expected. For over ten years I ran my blog with very precise knowledge of what I had going on within my templates. With WordPress, I’ve kinda gotten lazy about it and have given up being so deeply involved. This gives me a clue that maybe I need to take a closer look at my template and plugins and see if I need everything I’ve got. Also note the error graph. On average, 2% of my pages have JavaScript errors. The real question is how often those JavaScript errors impact the core thing people need to do on my site: read a blog post.

As I said, the dashboard is pretty packed, but let’s go deeper. First, the page views report. This report shows recent pages and which page requests are consuming the most load time. You can mouse over each line item for a detailed view. You can also switch the “sort by” to show average page load time. And yes, that twenty-plus-second item on top made me crap my pants. Honestly I’m not sure why that page averaged so high, as it is relatively simple, but it does give me something to dig into deeper.

The next link, session traces, is not what you may think. I had assumed this was a report of a ‘session’ for one visitor to my blog. Instead, it is a deep look at one particular web page. And when I say deep, I mean deep. The top level report for one session has a lot of detail in its chart. You can then scroll through the session and look at every particular darn thing in the one request. For example, I can look at what Google Analytics is doing.

The next report shows you the Ajax requests your web app is making. You get details on what is making requests as well as throughput and data size. I can say from experience that the data size chart could be really useful. Back when I first learned Ajax I made the mistake of not considering the size of my packets, and my applications suffered for it. This is a rather long page, so I’ve split the screen shot into two parts. I’m thinking that in a ‘real’ web app the Ajax report will be the number one place you’ll find nuggets of information.

Of similar importance is the JS errors report. Clicking on a particular item will give you details. If you click on the instance details, you can see the line number where the error was thrown. Earlier I found an issue with Gravatar. I didn’t think I was using Gravatar, but it turned out one plugin was making use of it and throwing an error. I modified the plugin and the error went away.

The browsers report gives you details about what types of browsers are hitting your site and how well they perform. I mentioned how I was a bit surprised by the page load times on my site. Well, in this report I can see which browsers are having the worst issues with page load. Look at that jump for IE and Opera! That’s fascinating to me. It doesn’t necessarily mean those are bad browsers, but it gives me an area to focus on if I were to start digging into my site performance. You can then go to the geo report to see how different areas of the world (and America) handle your site.

Along with reporting, you can also create alerts. You get a default alert policy out of the box and can define your own as well. This is fairly similar to what you get in the server product.

From what I can see, this is a really darn good tool, and as I said, I had great success with the server tool. So how much does it cost? Here are the price plans as of the time I wrote the post. 150 a month isn’t necessarily cheap, but heck, that’s my rate for development (yes, that’s what I charge by the hour), and considering how much data you get, the forensic information is easily worth it. The free (lite) version has fewer reports. If you go to their pricing page you can see what you don’t get at that tier, but note that you get 14 days of free access to the top tier to see if it is worth it.
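To make the page views report described earlier a little more concrete, here is a minimal sketch of the kind of aggregation such a report performs: average load time per page, sorted slowest first, plus the share of views with a JavaScript error. It is not New Relic's implementation or API; the function name `summarize_page_views`, the record layout, and the sample numbers are all assumptions made for illustration.

```python
from collections import defaultdict


def summarize_page_views(views):
    """Aggregate (path, load_seconds, had_js_error) records per page.

    Returns a list of dicts sorted by average load time, slowest first.
    """
    # Per path: [total load seconds, view count, views with a JS error]
    totals = defaultdict(lambda: [0.0, 0, 0])
    for path, load_seconds, had_js_error in views:
        entry = totals[path]
        entry[0] += load_seconds
        entry[1] += 1
        entry[2] += 1 if had_js_error else 0

    summary = [
        {
            "path": path,
            "views": count,
            "avg_load": total / count,
            "error_rate": errors / count,
        }
        for path, (total, count, errors) in totals.items()
    ]
    # Equivalent of switching "sort by" to average page load time.
    summary.sort(key=lambda row: row["avg_load"], reverse=True)
    return summary


if __name__ == "__main__":
    # Made-up sample data, not taken from the post.
    sample = [
        ("/post-a", 2.0, False),
        ("/post-a", 4.0, True),
        ("/post-b", 10.0, False),
    ]
    for row in summarize_page_views(sample):
        print(row)
```

Sorting by the average rather than the total is what surfaces a rarely visited but very slow page, like the twenty-plus-second item mentioned above.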

June 24, 2015

by Raymond Camden

· 851 Views

Information Builders Showcases Hot Business Intelligence Trends in "Summer Shorts" Webcast Series

London, UK – June 23, 2015 – Information Builders, a leader in business intelligence (BI) and analytics, information integrity, and integration solutions, today announced a new webcast series, “Summer Shorts,” designed to provide viewers quick overviews of the hottest topics in BI and analytics. Information Builders’ Summer Shorts will help enterprises rethink information strategies in a world transformed by the forces of mobile, social, cloud, advanced analytics, and big data.

In each session, an Information Builders expert will offer a fun, informative presentation on a different BI and analytics discipline. Viewers can join one or all of the sessions below to learn tips for leveraging emerging technologies for better BI.

8 July | 14:00 BST / 15:00 CET | The Art of Dashboard Design for Business Intelligence – What are your dashboards telling you and your customers? Peter O’Grady will walk through design theories, design and layout considerations, and form-factor awareness and responsive design. Be empowered to change your data visualisation strategies, practices, and processes.

22 July | 14:00 BST / 15:00 CET | Advanced Data Visualization – Data visualisation is red hot, and for good reason. Companies in all sectors are finding hidden insights with sophisticated data visualisation. In this webcast by Porter Thorndike, attendees will learn advanced tips for data analysis, visualisation plug-in architecture, polished finished examples, and visualisation-based InfoApps™ from Information Builders.

5 August | 14:00 BST / 15:00 CET | Social and Feedback Analysis – Join this social media analytics webcast to learn how to better understand customer sentiment and behavior. Dan Grady will discuss how to capitalise on the opportunities presented by social media, including integrating social data with enterprise data, improving customer engagement, and picking the right platform to consolidate and share this information.

19 August | 14:00 BST / 15:00 CET | 5 Hot Trends for Business Intelligence – Mobile, social, cloud, advanced analytics, and big data aren’t just big trends, they also raise big questions in BI and analytics. Chris Banks will describe in this webcast why BI is vital to making these trends work for companies. It will cover how to build once and responsibly deploy BI to mobile devices, how to expose relevant analytics to customers and partners, and best practices for harnessing big data.

June 23, 2015

by Fran Cator

· 1,103 Views

PostgreSQL Powers All New Apps for 77% of the Database's Users

Survey of open source PostgreSQL users found adoption continues to rise with 55% of users deploying it for mission-critical applications

Bedford, MA – June 23, 2015 – EnterpriseDB (EDB), the leading provider of enterprise-class Postgres products and database compatibility solutions, today announced the results of its “PostgreSQL Adoption Survey 2015,” a biennial survey of open source PostgreSQL users. Conducted by EnterpriseDB, the survey found PostgreSQL adoption continuing to rise, with 55% of users – up from 40% two years ago – deploying it for mission-critical applications and 77% of users dedicating all new application deployments to PostgreSQL.

These findings give voice to end users and confirm such industry indicators as increasing job listings and monthly rankings on DB-Engines that have pointed to rising interest in and demand for PostgreSQL, also called Postgres. The growing popularity of Postgres also comes as traditional software vendors suffer setbacks in the marketplace. Meanwhile, the enterprise-class performance, security and stability of Postgres, on par with traditional database vendors for most corporate workloads, have helped position Postgres among the solutions from the world’s largest vendors.

The opportunity to transform their data center economics has helped fuel downloads of Postgres as well. End users reported cutting costs with Postgres, with 41% reporting they had first-year cost savings of 50% or more. They’re using Postgres to build web 2.0 applications using unstructured data, as evidenced by the 64% of respondents who said they were working with JSON/JSONB and the 47% who said they were using Postgres for collaboration applications.

“Postgres is empowering organizations to transform the economics of IT. IT can invest in the customer engagement applications that differentiate their operations from their competition instead of continuing to pay the steep and rising licensing and support fees charged by traditional database vendors,” said Marc Linster, senior vice president of products and services of EnterpriseDB. “With the expanding adoption, EnterpriseDB has experienced dramatic growth year over year, providing the software, services and support that organizations need to be successful with Postgres.”

Database Migrations, Replacements

The findings also support statements in a recent Gartner report that reflect the widespread acceptance of open source databases. “By 2018, more than 70% of new in-house applications will be developed on an OSDBMS, and 50% of existing commercial RDBMS instances will have been converted or will be in process,” according to the April 2015 Gartner report, The State of Open-Source RDBMs, 2015.*

Among Postgres users, the survey findings show migrations are already under way, with 37% reporting they had migrated applications from Oracle or Microsoft SQL Server to Postgres. Many users were still planning further migrations, with 37% of PostgreSQL users saying they will gradually replace their legacy systems with Postgres, compared to 29% who said that in the 2013 survey. Further, end users predict their deployments of Postgres will expand significantly, with 32% saying they anticipate production deployments of Postgres to increase by at least 50% over the next year.

The survey, conducted by EnterpriseDB using an online tool in May 2015, queried registered users of PostgreSQL and drew 274 respondents worldwide from government organizations and companies ranging in size and industry.

*The State of Open-Source RDBMs, 2015, by Donald Feinberg and Merv Adrian, published on April 21, 2015.

June 23, 2015

by Fran Cator

· 1,003 Views

This Week In Modern Software: Inside Obama’s Geek Squad

[This article was written by Kevin Casey]

Welcome to This Week in Modern Software, or TWiMS, New Relic’s weekly roundup of the need-to-know news, stories, and events of interest surrounding software analytics, cloud computing, application monitoring, development methodologies, programming languages, and the myriad of other issues that influence modern software. This week, our top story goes inside President Obama’s secret team of tech geeks, 140 of them and counting:

TWiMS Top Story: Inside Obama’s Stealth Startup – Fast Company

What it’s about: If the President of the United States walked into the room and personally recruited you to rebuild the country’s technology infrastructure, could you turn him down? He’s serious, and that room is the Roosevelt Room in the West Wing of the White House, by the way. As Lisa Gelobter says: “What are you going to say to that?” Gelobter’s answer was “Yes” – she’s now chief digital officer for the US Department of Education, part of a 140-person-and-counting tech team that’s functioning something like an elite startup embedded inside the federal government. Its business? Only modernizing the technical infrastructure, applications, and processes of just about every federal agency.

Why you should care: What was once something of a tech desert, the federal government, is beginning to draw top private-sector talent inside the Beltway. The team, led by Mikey Dickerson (who helped lead the team that rescued Healthcare.gov) and former US CTO Todd Park, also includes the likes of former Googler Matthew Weaver, and it hopes to hit 500 people by the end of 2016, shortly before President Obama leaves office.
Its challenges are immense, from tackling government bureaucracy (to test just how entrenched the suits were, Weaver requested the official title “Rogue Leader” – and he got it) to the fact that its recruiting pitch includes the phrase: “You’ll have to take a pay cut.” But its mission is both noble and necessary, and the appeal of working on major problems with enormous public impacts appears to be working. Recommended reading.

Further reading: Mikey Dickerson’s 10 Tips for Dealing with Bureaucracy – New Relic Blog [Video]

Airbnb Open Sources Software to Lure Talent Amid ‘Insane’ Competition – CIO Journal

What it’s about: Airbnb added three new apps to its open source portfolio earlier this month, but the motivation wasn’t just trying to give employees the best business tools or contribute to the software community at large. Sure, that might have been part of the equation, but the rental booking site hopes open-sourcing some of its toolkit will help recruit the best software talent in the face of what director of engineering Mike Curtis calls “insane” competition in the Silicon Valley labor market.

Why you should care: In the software arms race, any little edge counts. Curtis tells CIO Journal that Airbnb will keep the proprietary stuff closely guarded, of course. But it will open source “generic” tools with wider industry use cases, such as its recently released Aerosolve machine-learning package and its Airpal cloud-based data querying tool. The latter, which works with Facebook’s open source PrestoDB, aims to simplify SQL queries to the point where you don’t need to be a big data wonk or business intelligence guru to run it. Indeed, one in three Airbnb employees has run a query on it in the year since it launched. Airbnb has contributed a dozen open source tools on its aptly named Nerds site (gotta love that!) to date, something the company hopes both contributes to the greater good and advertises its software innovation to potential hires.
Google Is Wielding Its Own Secret Weapon in the Cloud – The New York Times

What it’s about: In the cutthroat competition for public cloud business, Google may be its own best customer testimonial. In advance of this week’s Open Network Summit, the Times’ Bits blog looked at Google’s plan to not only unveil cloud customers such as HTC but reveal much more than ever before about its own infrastructure. Google did just that on Wednesday, offering a look inside its data center networking, including its massive-capacity, lightning-fast Jupiter network.

Why you should care: As major cloud players continue to zap prices with their shrink-rays, it’s increasingly clear that features and underlying platforms will distinguish one from the other when enterprise users make their pick. Google is taking a big step toward writing its own story in this regard, and the synopsis might read something like: “We’re pretty good at this stuff.” Its Jupiter fabrics deliver 1 petabit per second of bisection bandwidth, according to Google, or “enough for 100,000 servers to exchange information at 10Gb/s each, enough to read the entire scanned contents of the Library of Congress in less than 1/10th of a second.” If it sounds like a bit of bragging, well, yeah, it is. But it’s bragging with a purpose: Attracting devs who want access to the same technology without having to build it themselves. Google’s Amin Vahdat connected the dots in a blog post: “The same networks that power all of Google’s internal infrastructure and services also power Google Cloud Platform.”

Move Over, Meeker: Byron Deeter’s State of the Cloud Report – Bessemer Venture Partners

What it’s about: With a nod to Mary Meeker’s classic State of the Internet report, Bessemer Venture Partners’ Byron Deeter checks in with his 2015 State of the Cloud Report. Given cloud computing’s relative youth and rampant ascension, it’s no surprise the stats are staggering.
Here’s one to start: Cloud revenues have increased tenfold in the last six years, from a scant $5.6 billion in 2008 to more than $56 billion in 2014. And it’s going to double again in the next four years, according to BVP’s projections, to $127.5 billion in 2018.

Why you should care: Deeter’s full presentation is worth a weekend watch or read, but it’s the forward-looking slides that may be most compelling for software pros. Deeter notes both the immense risks and opportunities in cloud security, unveiling a 10-point security plan for cloud startups on slide 37. To underscore the security landscape, Deeter quotes an unnamed cloud CEO who says a DDoS attack that took down the firm’s API caused more customer churn in one day than in the rest of its history. Wow. He also addresses the exploding market for cloud services built specifically for developers including, yes, New Relic. And for mobile developers, slide 44 underscores something we’ve talked about before in this space: the real money’s in enterprise apps, and it’s still a largely untapped market. Click through the full slide deck here or watch video of Deeter’s presentation here.

Bandwidth: The Next Frontier of Cloud Computing – ZDNet

What it’s about: Is networking the next big thing in the everything-as-a-service age? It just might be, as firms like Pacnet vie to deliver networking capacity on a pay-for-what-you-use model that some industry folks say better suits cloud environments facing significant but uneven networking needs.

Why you should care: As author Drew Turney notes, there’s a common blind spot when it comes to cloud computing’s many shapes and sizes: Moving all that data from points A to Z, and everywhere in between, which can cause both performance problems and undue financial pressures.
The promise of Networking-as-a-Service (NaaS), industry execs tell Turney, is that it can provide more efficient, scalable networking for short-term usage bursts such as customer traffic spikes or large cloud backup-and-storage jobs, enabling companies to later dial down their capacity as needed. Combined with Software-Defined Networking (SDN), NaaS makes it possible to build intelligent applications that manage their own networking needs, which might be the most significant enterprise potential of NaaS, says Nuage Networks architect Marten Hauville.

Page Bloat: Average Web Page Now More Than 2MB – The Performance Beacon (SOASTA)

What it’s about: Do you need to put your website on a diet? Apparently so: The average Web page topped 2 MB as of May 2015, according to ongoing tracking at The Performance Beacon. That’s double the average page weight from just three years ago. The site projects average page weight will exceed 3 MB in late 2017.

Why you should care: Performance, performance, performance: Slow speeds are a killer in the modern software era. While author and SOASTA UX evangelist Tammy Everts rightly notes that page weight is not the only factor in Web optimization, we’re simply not paying it enough attention when designing and building Web pages. Images are the big culprit in the Web’s expanding waistline: they comprise nearly two-thirds of the average page’s weight, and video is a growing part of our Web diet, too. But other factors such as custom fonts play a role, adding weight even as the Web sheds previous performance hogs like Flash. The ideal weight? 1 MB, she says, which will save crucial seconds in load times. Sounds like it’s time to hit the virtual treadmill.

June 23, 2015

by Fredric Paul

· 1,098 Views

Rx-java subscribeOn and observeOn

If you have been confused by the Rx-java Observable operators subscribeOn and observeOn, one of the blog articles that helped me understand them is this one by Graham Lea. I wanted to recreate a very small part of that article here, so consider a service that emits values every 200 milliseconds:

    package obs.threads;

    import obs.Util;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import rx.Observable;

    public class GeneralService {
        private static final Logger logger = LoggerFactory.getLogger(GeneralService.class);

        public Observable<String> getData() {
            return Observable.create(s -> {
                logger.info("Start: Executing a Service");
                for (int i = 1; i <= 3; i++) {
                    Util.delay(200); // simulate slow work before each emission
                    logger.info("Emitting {}", "root " + i);
                    s.onNext("root " + i);
                }
                logger.info("End: Executing a Service");
                s.onCompleted();
            });
        }
    }

Now, if I were to subscribe to this service this way:

    @Test
    public void testThreadedObservable1() throws Exception {
        Observable<String> ob1 = aService.getData();
        CountDownLatch latch = new CountDownLatch(1);
        ob1.subscribe(s -> {
            Util.delay(500); // simulate a slow subscriber
            logger.info("Got {}", s);
        }, e -> logger.error(e.getMessage(), e), () -> latch.countDown());
        latch.await();
    }

all of the emissions and subscriptions will act on the main thread, and something along the following lines will be printed:

    20:53:29.380 [main] INFO o.t.GeneralService - Start: Executing a Service
    20:53:29.587 [main] INFO o.t.GeneralService - Emitting root 1
    20:53:30.093 [main] INFO o.t.ThreadedObsTest - Got root 1
    20:53:30.298 [main] INFO o.t.GeneralService - Emitting root 2
    20:53:30.800 [main] INFO o.t.ThreadedObsTest - Got root 2
    20:53:31.002 [main] INFO o.t.GeneralService - Emitting root 3
    20:53:31.507 [main] INFO o.t.ThreadedObsTest - Got root 3
    20:53:31.507 [main] INFO o.t.GeneralService - End: Executing a Service

By default the emissions are not asynchronous in nature.
So now, what is the behavior if subscribeOn is used?

    public class ThreadedObsTest {
        private GeneralService aService = new GeneralService();
        private static final Logger logger = LoggerFactory.getLogger(ThreadedObsTest.class);

        // Pool whose threads are named SubscribeOn-0, SubscribeOn-1, ...
        private ExecutorService executor1 = Executors.newFixedThreadPool(5,
                new ThreadFactoryBuilder().setNameFormat("SubscribeOn-%d").build());

        @Test
        public void testSubscribeOn() throws Exception {
            Observable<String> ob1 = aService.getData();
            CountDownLatch latch = new CountDownLatch(1);
            ob1.subscribeOn(Schedulers.from(executor1)).subscribe(s -> {
                Util.delay(500);
                logger.info("Got {}", s);
            }, e -> logger.error(e.getMessage(), e), () -> latch.countDown());
            latch.await();
        }
    }

Here I am using Guava's ThreadFactoryBuilder to give each thread in the thread pool a recognizable name pattern. If I were to execute this code, the output would be along these lines:

    20:56:47.117 [SubscribeOn-0] INFO o.t.GeneralService - Start: Executing a Service
    20:56:47.322 [SubscribeOn-0] INFO o.t.GeneralService - Emitting root 1
    20:56:47.828 [SubscribeOn-0] INFO o.t.ThreadedObsTest - Got root 1
    20:56:48.032 [SubscribeOn-0] INFO o.t.GeneralService - Emitting root 2
    20:56:48.535 [SubscribeOn-0] INFO o.t.ThreadedObsTest - Got root 2
    20:56:48.740 [SubscribeOn-0] INFO o.t.GeneralService - Emitting root 3
    20:56:49.245 [SubscribeOn-0] INFO o.t.ThreadedObsTest - Got root 3
    20:56:49.245 [SubscribeOn-0] INFO o.t.GeneralService - End: Executing a Service

Now the execution has moved away from the main thread, and both the emissions and the subscriptions are being processed on a thread borrowed from the thread pool.
And what happens if observeOn is used?

    public class ThreadedObsTest {
        private GeneralService aService = new GeneralService();
        private static final Logger logger = LoggerFactory.getLogger(ThreadedObsTest.class);

        // Pool whose threads are named ObserveOn-0, ObserveOn-1, ...
        private ExecutorService executor2 = Executors.newFixedThreadPool(5,
                new ThreadFactoryBuilder().setNameFormat("ObserveOn-%d").build());

        @Test
        public void testObserveOn() throws Exception {
            Observable<String> ob1 = aService.getData();
            CountDownLatch latch = new CountDownLatch(1);
            ob1.observeOn(Schedulers.from(executor2)).subscribe(s -> {
                Util.delay(500);
                logger.info("Got {}", s);
            }, e -> logger.error(e.getMessage(), e), () -> latch.countDown());
            latch.await();
        }
    }

The output is along these lines:

    21:03:08.655 [main] INFO o.t.GeneralService - Start: Executing a Service
    21:03:08.860 [main] INFO o.t.GeneralService - Emitting root 1
    21:03:09.067 [main] INFO o.t.GeneralService - Emitting root 2
    21:03:09.268 [main] INFO o.t.GeneralService - Emitting root 3
    21:03:09.269 [main] INFO o.t.GeneralService - End: Executing a Service
    21:03:09.366 [ObserveOn-1] INFO o.t.ThreadedObsTest - Got root 1
    21:03:09.872 [ObserveOn-1] INFO o.t.ThreadedObsTest - Got root 2
    21:03:10.376 [ObserveOn-1] INFO o.t.ThreadedObsTest - Got root 3

The emissions are now back on the main thread, but the subscriber callbacks are being processed on a thread from the pool. That is the difference: when subscribeOn is used, the emissions are performed on the specified Scheduler; when observeOn is used, the subscriber callbacks are performed on the specified Scheduler. And the output when both are specified is equally predictable.

In all cases I created a Scheduler backed by a thread pool with 5 threads, but only one of those threads was actually used, both for emitting values and for processing subscriptions. This is the normal behavior of an Observable, because a single Observable delivers its events sequentially.
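The thread hand-off described above can be imitated with nothing but JDK executors. The sketch below is an analogy, not RxJava: `emit`, `record`, `subscribeOnLike` and `observeOnLike` are hypothetical names invented for illustration, and the single-thread pools are named only to echo the log output above. It is meant to show where the producer loop and the consumer callback run in each case, under those assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

public class ThreadHopSketch {

    // Appends an event tagged with the name of the thread it happened on.
    static void record(List<String> log, String event) {
        log.add(event + " [" + Thread.currentThread().getName() + "]");
    }

    // Hypothetical stand-in for the service: pushes three values to the
    // consumer on whichever thread calls emit().
    static void emit(List<String> log, Consumer<String> consumer) {
        for (int i = 1; i <= 3; i++) {
            String value = "root " + i;
            record(log, "emit " + value);
            consumer.accept(value);
        }
    }

    // Analogue of subscribeOn: the whole emit loop is moved to the pool,
    // so the consumer callback runs there as well.
    static List<String> subscribeOnLike() {
        List<String> log = new CopyOnWriteArrayList<>();
        ExecutorService pool =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "SubscribeOn-0"));
        try {
            CompletableFuture
                    .runAsync(() -> emit(log, v -> record(log, "got " + v)), pool)
                    .join();
        } finally {
            pool.shutdown();
        }
        return log;
    }

    // Analogue of observeOn: emit stays on the calling thread, and each
    // value is handed to the pool for the consumer callback.
    static List<String> observeOnLike() {
        List<String> log = new CopyOnWriteArrayList<>();
        List<CompletableFuture<Void>> deliveries = new ArrayList<>();
        ExecutorService pool =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "ObserveOn-0"));
        try {
            emit(log, v -> deliveries.add(
                    CompletableFuture.runAsync(() -> record(log, "got " + v), pool)));
            deliveries.forEach(CompletableFuture::join); // wait for all callbacks
        } finally {
            pool.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        subscribeOnLike().forEach(System.out::println);
        System.out.println("----");
        observeOnLike().forEach(System.out::println);
    }
}
```

In the first listing every line carries the pool thread's name; in the second, the "emit" lines carry the caller's thread name and only the "got" lines carry the pool's. The interleaving of the two kinds of line in the second case can vary from run to run.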
If you want to make more efficient use of the thread pool, one approach may be to create multiple Observables. Say, for example, I have a service that returns pages of data this way:

    public Observable<Integer> getPages(int totalPages) {
        return Observable.create(new Observable.OnSubscribe<Integer>() {
            @Override
            public void call(Subscriber<? super Integer> subscriber) {
                logger.info("Getting pages");
                for (int i = 1; i <= totalPages; i++) {
                    subscriber.onNext(i);
                }
                subscriber.onCompleted();
            }
        });
    }

and another service that acts on each page of the data:

    public Observable<String> actOnAPage(int pageNum) {
        return Observable.create(s -> {
            Util.delay(200);
            logger.info("Acting on page {}", pageNum);
            s.onNext("Page " + pageNum);
            s.onCompleted();
        });
    }

A way to use a thread pool to process each page of data would be to chain it this way:

    getPages(5).flatMap(
            // each inner Observable is subscribed on its own pool thread
            page -> aService.actOnAPage(page).subscribeOn(Schedulers.from(executor1)))
        .subscribe(s -> {
            logger.info("Completed Processing page: {}", s);
        });

See how the subscribeOn is applied to each Observable acting on a page.
With this change, the output would look like this:

    21:15:45.572 [main] INFO o.t.ThreadedObsTest - Getting pages
    21:15:45.787 [SubscribeOn-1] INFO o.t.GeneralService - Acting on page 2
    21:15:45.787 [SubscribeOn-0] INFO o.t.GeneralService - Acting on page 1
    21:15:45.787 [SubscribeOn-4] INFO o.t.GeneralService - Acting on page 5
    21:15:45.787 [SubscribeOn-3] INFO o.t.GeneralService - Acting on page 4
    21:15:45.787 [SubscribeOn-2] INFO o.t.GeneralService - Acting on page 3
    21:15:45.789 [SubscribeOn-1] INFO o.t.ThreadedObsTest - Completed Processing page: Page 2
    21:15:45.790 [SubscribeOn-1] INFO o.t.ThreadedObsTest - Completed Processing page: Page 1
    21:15:45.790 [SubscribeOn-1] INFO o.t.ThreadedObsTest - Completed Processing page: Page 3
    21:15:45.790 [SubscribeOn-1] INFO o.t.ThreadedObsTest - Completed Processing page: Page 4
    21:15:45.791 [SubscribeOn-1] INFO o.t.ThreadedObsTest - Completed Processing page: Page 5

Now the threads in the thread pool are being used uniformly.
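The same fan-out idea, one unit of work per page with each unit scheduled on the pool, can be sketched with the JDK's CompletableFuture. This is an analogy rather than RxJava code: `PerPageFanOutSketch`, `processPages` and this plain-method version of `actOnAPage` are hypothetical stand-ins, and the 200 ms sleep only mimics the delay used in the article.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PerPageFanOutSketch {

    // Hypothetical stand-in for the per-page service call.
    static String actOnAPage(int pageNum) {
        try {
            Thread.sleep(200); // simulate slow work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "Page " + pageNum;
    }

    // Starts one task per page on the pool, then waits for all of them.
    static List<String> processPages(int totalPages, ExecutorService pool) {
        // First pass: submit every page so the tasks can run concurrently.
        List<CompletableFuture<String>> inFlight = IntStream.rangeClosed(1, totalPages)
                .mapToObj(page -> CompletableFuture.supplyAsync(() -> actOnAPage(page), pool))
                .collect(Collectors.toList());
        // Second pass: collect results in page order.
        return inFlight.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(5);
        try {
            System.out.println(processPages(5, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

One difference worth keeping in mind: this sketch returns results in page order because it joins the futures in order, whereas flatMap merges inner results as they arrive, which is why the article's log shows Page 2 completing before Page 1.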

June 23, 2015

by Biju Kunjummen

· 10,859 Views · 2 Likes

Opsmatic Expands Its "Single Source of Truth" Live State Monitoring Capabilities for Larger Enterprises

Opsmatic Inc., a company with a focus on creating tools to improve the effectiveness of development and operations teams, announced today the expansion of its live-state monitoring service to include Enterprise and On-Premises Editions. Opsmatic officially came out of stealth last month and continues to deliver on the promise of supporting the needs of DevOps teams, whether in the cloud or inside a corporate firewall. The Opsmatic live-state monitoring service is the only solution that provides a precise, real-time picture of the detailed configuration and changes that affect an enterprise’s computing infrastructure. The new offerings include all the features of the Professional Edition (announced last month), with the addition of a single sign-on and dedicated support in the Enterprise and On-Premises Editions. The On-Premises Edition is designed for customers with isolated infrastructure who require an internally deployed solution, while leveraging the same real-time visibility to quickly troubleshoot problems and reduce costly downtime. Opsmatic services are delivered through a purpose-built, intelligent data platform which enables customers to easily integrate events from other services for greater context (PagerDuty, Nagios, Zabbix), incorporate their own custom event data (deployment events, backups, etc.); and extend their live state host data to include custom configuration and state data from their own services to provide deeper, more complete insight. Alerts can also be posted to Slack, HipChat, DataDog and PagerDuty, to better support team collaboration and communication. “Our new single sign-on capability gives the entire technical team access to a central source of truth, reducing the number of configuration-related issues while increasing team velocity,” said Jim Stoneham, CEO, Opsmatic. 
“Our customers have said that incident triage that used to take them hours now takes minutes.”

Opsmatic live-state monitoring features include:

Real-time insight
The Opsmatic service monitors the state of every host in real-time, providing a current and accurate picture of the configuration, as well as an instantaneous feed of any changes happening to that host. Any variation, or “drift,” in configuration across host groups is also immediately visible to enable teams to fix minor issues before they escalate into downtime events, saving hours of detective work and remediation.

Infrastructure search and Assertions
Live-state inventory data can be instantly searched to find vulnerable packages, to identify every version of an open-source package that is deployed, or to find hosts that are running a specific service. Specific policies (“Assertions”) can be easily defined and run against live-state data to enforce dependencies, verify configuration, or instantly identify potential issues. Customers can also add internal service configuration and other types of data with a simple inject utility to provide deeper service-level checks.

Configuration management monitoring
Deep integrations with popular Configuration Management tools from Chef, Puppet, Ansible, SaltStack, as well as Docker, enable tracking and reporting of automation runs, file integrity monitoring, and detailed visibility into the key host attributes set by each run. In addition, Opsmatic reaches beyond the policies deployed through the CM tool to track the entire host and report on changes made outside automation runs.

Intelligent alerts
Using Assertions and saved searches, Opsmatic gives teams control over alert noise and the fatigue it can cause, enabling them to focus on the changes or conditions that really matter. Any alert can be classified by host or host group, and can be fed into specific chat channels (Slack, HipChat), or emailed to the right person on the team to investigate the issue.
Robust Software-as-a-Service
The Professional and Enterprise Editions are delivered as cloud-based services, supporting any cloud infrastructure, datacenter, or hybrid environment across a range of OS platforms. The services are hosted in a hardened data center with SAS 70 Type II and SSAE 16 certifications and 24x7 monitoring.

Availability
Opsmatic Enterprise Edition is sold as a monthly subscription, with billing based on usage: $7 per host, per month, based on the peak number of hosts being monitored each month. The Opsmatic On-Premises Edition is sold as a yearly contract by quote at [email protected]. The company offers an unlimited-use, 30-day free trial of Opsmatic Professional that is available at www.opsmatic.com/signup. No credit card is required for the free trial. General inquiries should be directed to [email protected].

About Opsmatic Inc.
Opsmatic provides real-time visibility of any change in the live state of computing infrastructure and intelligently alerts users before trouble begins. The SaaS service is built on an underlying data platform with a robust API, and is integrated with popular monitoring and code automation tools to give customers complete context and provide the shared visibility required by modern DevOps teams. Founded in 2013, the Opsmatic team comprises experienced development and operations professionals who were involved at the beginning of the DevOps movement at major web-scale companies. The company is backed by leading investors in the cloud technology space, including AME Cloud Ventures (Jerry Yang), Freestyle Ventures, Illuminate Ventures and Index Ventures. For more info, please visit opsmatic.com or follow @Opsmatic on Twitter.

June 23, 2015

by Jim Rossner

· 868 Views

Big Data TCO Lessons From Virtualization Technology Sprawl

The complexity of big data makes it a difficult concept for many to grasp, and utilizing it effectively is one of the biggest challenges businesses face today. There is little doubt that big data offers organizations a number of clear advantages, but applying them across the entire enterprise is a formidable, even daunting, obstacle for even the most technologically savvy companies. One department might be able to create its own business solutions through big data analytics, while another department might come up with answers of its own, but the lack of true coordination and collaboration remains a significant problem. Businesses aren’t without help in this area, however, because they’ve encountered similar problems before. Many companies have encountered issues such as virtualization technology sprawl, and the lessons learned from addressing that problem could prove to be exceptionally valuable when dealing with big data total cost of ownership (TCO).

To understand the problem and the solution, we must first look back at the rapid growth of virtualization technology, more specifically server virtualization. As businesses adopted virtualization, the mainframe systems soon diverged into multiple systems. The more popular virtualization became, the more projects were taken on and the more technologies diverged. Larger companies eventually sought technology specialists to work within their areas of expertise. The result of the use of these individual teams was virtualization technology sprawl, an inefficient development that eventually led to even higher operational costs. For all the benefits virtualization technology offered, many of them were outweighed by the increased demands and greater management complexity that came from technology sprawl.

Businesses were quick to come up with new solutions for the problem. The most common was to adopt a converged infrastructure.
This strategy directly addressed the higher operational costs that resulted from technology sprawl, basically breaking through the silos by taking multiple technologies and combining them into single stacks for computing, storage, and networking. This made the management of virtualization technology much easier since operational complexity was significantly reduced. In other words, management of this technology was kept at a reasonable size.

The same principle can apply to big data management across an entire organization. When it comes to management of big data and Hadoop security, it’s easy to get caught up in the immensity of it all. The fact that big data is so versatile and can be applied to so many different use cases also means it can apply to any number of different divisions within a company. This creates silos and a general desire to hold onto data sets. In other words, big data ends up in a sprawl of its own, becoming that much more unwieldy and complicated, which is a major problem for a technology that’s already so complex to begin with.

The lesson that every company should take away from the solution to virtualization technology sprawl is the breaking down of barriers to big data management. It all comes down to ready access to all the necessary data no matter what role an employee may have within a company. Businesses shouldn’t have to worry over the cost it takes to store and process data, since the insights gained from big data analytics are particularly valuable. Most importantly, it’s about keeping big data from getting too big, to the point where it becomes unmanageable and merely adds to the overall operating costs of a company. It’s true that big data introduces more complexity, but businesses that have learned how to store and process it efficiently, sometimes through big data platforms or cloud-based services, are in a more advantageous position than companies still dealing with technology sprawl.
The lessons learned from previous problems can indeed play a helpful role in solving the problems many experience today.

June 22, 2015

by Rick Delgado

· 1,969 Views

FusionExperience announces successful partnership with Cloud Consulting

London, UK – FusionExperience, the business and data solutions provider, today announces the success of its first salesforce.com partnership with Cloud Consulting Ltd. (CCL).

CCL was working with an international airline client to migrate a legacy charter and group booking application from one Salesforce.com instance to a new one. Very early on in the project CCL discovered that there were considerable elements of unsupported custom code and that these had to be redesigned and redeveloped. The airline took the opportunity at this stage to request changes and improve the application in line with its new business processes.

CCL worked with FusionExperience to migrate the application to the latest salesforce.com environment and re-architected the booking engine functionality and complex pricing algorithms using Apex and VisualForce. For business reasons the airline had a strict project deadline, and despite all the unknowns involved the project timescales were maintained and FusionExperience delivered on time and to budget.

The airline went live with the application on schedule without any post-production problems or warranty fixes required. It now has an up-to-date system that has achieved a game-changing transformation in the way it does business.

Robin James, Platform Evangelist for FusionExperience, said: “The ability to seamlessly work with our partners on salesforce.com projects enables rapid scaling of resources and capabilities. This ensures that the client is delighted by the results, yet unaware of the complex extended ecosystem that has been involved. This is facilitated by the fact that we all speak the same salesforce.com language.
Cloud Consulting is an ideal partner to work with in this way, as our delivery and technical strengths are well matched with their intimate client facing approach.” Tim Pullen, Managing Director of CCL added: “We already had a close relationship with FusionExperience and it was natural for us to turn to them for help with this suddenly extremely challenging project. The combination of cleaning, segmenting and splitting the data in Salesforce.com, extracting the system configuration and custom code and then creating a new system was tough enough to start but then having to redevelop the application from scratch took it to a new level. Right from the start Robin James and his team took everything in their stride and provided a level of comfort, reassurance, skill and professionalism that we’d never experienced before from other partners. Bear in mind that the old system had no user or technical documentation plus undocumented code and you begin to understand just how good the end result has been for the airline. Thank you Fusion!”

June 22, 2015

by Fran Cator

· 847 Views

Spring Data Couchbase: Handle Unknown Class

Spring Data Couchbase provides a transparent way to save and load Java classes to and from Couchbase. However, if a loaded class contains a property of an unknown class, you will receive:

    org.springframework.data.mapping.model.MappingException: No mapping metadata found for java.lang.Object

This may happen if, for example, different versions of your code save and load information. To handle the situation where we want to load an object that contains another object of an unknown class (in a map or list property), we should override the default MappingCouchbaseConverter. Let's see how we do this with Spring XML configuration: I replace my old XML: with the following XML: And create the following class:

    public class MyMappingCouchbaseConverter extends MappingCouchbaseConverter {

        public MyMappingCouchbaseConverter(
                final MappingContext<? extends CouchbasePersistentEntity<?>, CouchbasePersistentProperty> mappingContext) {
            super(mappingContext);
        }

        @Override
        protected <R> R read(final TypeInformation<R> type, final CouchbaseDocument source, final Object parent) {
            // A stored type that cannot be resolved falls back to Object; return null instead of failing.
            if (Object.class == typeMapper.readType(source, type).getType()) {
                return null;
            }
            return super.read(type, source, parent);
        }
    }

Now, if a loaded object contains a property of an unknown class, or an object of an unknown class in a list or map, this property or object will be replaced by null.
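The decision made in the overridden read method can be shown in isolation with plain JDK code. The sketch below is not Spring Data Couchbase code: `UnknownTypeFallbackSketch`, `resolveType` and `readProperty` are hypothetical names, the stored document is modelled as a simple map, and the `_class` key used for the type hint is an assumption made for illustration. It only demonstrates the rule "if the stored type cannot be resolved, yield null instead of failing".

```java
import java.util.HashMap;
import java.util.Map;

public class UnknownTypeFallbackSketch {

    // Resolves the stored type hint, falling back to Object.class when the
    // class is missing from the current classpath (for example, because it
    // was written by a different version of the code).
    static Class<?> resolveType(String storedClassName) {
        if (storedClassName == null) {
            return Object.class;
        }
        try {
            return Class.forName(storedClassName);
        } catch (ClassNotFoundException e) {
            return Object.class;
        }
    }

    // Mirrors the check in the converter: an unresolved type yields null,
    // anything else is read normally.
    static Object readProperty(Map<String, Object> storedDocument) {
        Class<?> type = resolveType((String) storedDocument.get("_class"));
        if (type == Object.class) {
            return null;
        }
        return storedDocument.get("value");
    }

    public static void main(String[] args) {
        Map<String, Object> known = new HashMap<>();
        known.put("_class", "java.lang.String");
        known.put("value", "hello");

        Map<String, Object> unknown = new HashMap<>();
        unknown.put("_class", "com.example.RemovedInV2"); // hypothetical missing class
        unknown.put("value", "payload");

        System.out.println(readProperty(known));
        System.out.println(readProperty(unknown));
    }
}
```

The trade-off is the same as in the converter above: loading no longer fails, but the unknown value is silently dropped, so callers need to tolerate null in that position.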

June 22, 2015

by Pavel Bernshtam

· 4,101 Views

Purple WiFi appoints Collin Tan As Regional Manager ASEAN

June 22, 2015: Purple WiFi, the cloud-based Social WiFi software company, today announced the appointment of Collin Tan as Regional Manager, ASEAN reporting to Allen Pan, VP Asia Pacific. He will be based in Singapore and will be responsible for all Purple WiFi’s business in the ASEAN region. He will be working to develop the distributors and reseller channels across countries in ASEAN, namely Singapore, Malaysia, Indonesia, Thailand, Philippines, Vietnam, Brunei, Cambodia, Laos and Myanmar and engage directly with key service providers in these countries. Collin was previously the Managing Director of Singapore start-up, 1Care Global Pte Ltd, providing after sales services, such as equipment protection and extended warranty. Under his leadership 1Care enjoyed tremendous growth with customers in the Asia Pacific region, which includes the world’s Top 2 PC manufacturers. Before joining 1Care Global, Collin spent 10 years with Intel Corporation, serving as Country Manager for Intel Singapore before he left in 2013. Previous roles with Intel include leading the regional OEM team for one of Intel’s largest MNC customers and Manager for the Field Applications Engineers Team based out of Taiwan. Purple WiFi is expanding globally following a $5m investment announced earlier this year. The investment was raised in order to accelerate product development and recruitment of a truly global sales team, which already has strongholds in Europe, Asia-Pacific and the Americas. The WiFi offering focuses on engaging, understanding and delivering value by allowing users to gain free access to a public WiFi network through their existing social media accounts or a short form. The user gets access to family friendly WiFi, while the benefit to the business hosting the service (such as a restaurant, hotel, retailer, museum, sports stadium or shopping mall) is valuable analytic insights into the profiles and movements of their customers and a sophisticated built-in marketing platform. 
Thousands of venues globally have been secured and deep technology partnerships established, most notably with Cisco, Cisco Meraki, BT and Verizon but also many others. Collin Tan, Regional Manager ASEAN, Purple WiFi, comments: “Purple WiFi provides the perfect solution for companies that wish to monetise their free WiFi, as well as enabling direct targeted marketing to users within its proximity. It also combines the four fastest growing technology sectors of Mobile, Cloud, Social Media and Analytics in a single product, making it extremely valuable as a service and technology organisation.” Allen Pan, VP Asia Pacific, Purple WiFi, comments: “Collin brings the perfect combination of experience and drive to the role and we’re excited to have him onboard. The market in ASEAN is growing quickly and Collin’s in-depth knowledge of the region will allow us to capitalise on the opportunities for Purple WiFi.”

June 22, 2015

by Fran Cator

· 1,040 Views

ParStream to Present Requirements of an Analytics Platform for IoT at the TDWI Munich Conference 2015

COLOGNE, Germany – June 22, 2015 – ParStream, the IoT analytics company, today announced its participation at the TDWI Munich Conference 2015, one of the largest gatherings of expert Business Intelligence, Big Data and data warehousing leaders and educators in Europe. The conference will take place June 22-24, 2015 at the MOC Order and Event Center in Munich, Germany. Albert Aschauer, Sales Director DACH at ParStream, will present on requirements for an analytics platform for the Internet of Things (IoT) based on real-world use cases from the renewable energy and telecommunications industries. Big Data, fast data, edge analytics and real-time insights are driving new technology innovation to meet the demand for getting more value from IoT data. Additional details on the speaking session are below.

What: “Requirements of an Analytics Platform for the Internet of Things”
When: Monday, June 22, 2015 at 11:35 a.m. CEST
Who: Albert Aschauer, Sales Director DACH at ParStream
Where: MOC Munich, Germany – Room F112

To schedule a one-on-one meeting with Albert Aschauer and ParStream at TDWI Munich Conference 2015, send an email to events(at)parstream(dot)com.

June 22, 2015

by Fran Cator

· 1,120 Views

Making litigation more affordable

Last year, data from the Citizens Advice Bureau revealed that 7 out of 10 potentially successful employment cases are not being pursued, with a good 50 percent of those being down to financial issues. Whilst it’s tempting to think that we are all equal in front of the law, there remains a distinct sense that we are anything but. It’s a major reason why companies such as Logikcull are trying to make the whole process easier and more efficient. It’s believed that the e-discovery process can contribute to around 70 percent of the costs of any legal proceeding, so reducing the time involved in that can be a huge cost saver.

Using the crowd

Other organizations are attempting to make the legal process more affordable by recruiting the crowd to help meet the legal costs involved. For instance, I wrote about LexShares towards the end of last year, which is a kind of crowd-based investment site. You can ‘invest’ in a particular case, thus giving the plaintiff funds to pursue their case. If the case is successful, the backer gets their money back plus a bit of the damages. If the case fails, then they lose their money. Another crowd-based venture launched in the UK recently. The site, called CrowdJustice, aims to provide funding to cases that would normally struggle to obtain it.

Supporting public interest cases

The site was founded by Julia Salasky, who previously worked for the UN, and aims to specialize in so-called public interest cases. “CrowdJustice allows communities to band together to access the courts to protect their communal assets – like their local hospital – or shared values – like human rights. Successive governments have made access to justice harder and more expensive but we are using the power of the crowd to try and stem the tide,” she says. She suggests that cuts to legal aid have made it harder for poorer people to access adequate legal protection, especially when it comes to challenging large institutions.
This is especially so when the end game doesn’t necessarily result in a large payout. This could include, for instance, the destruction of a local bird sanctuary or even much larger issues such as torture. Despite affecting huge numbers of people, it is often very difficult for communities to channel their energies towards fighting the case collectively. As such, these kinds of cases typically require a determined individual to pursue the cause on their own. The hope is that the CrowdJustice platform will make this considerably easier.

Whether it’s CrowdJustice or LexShares or Logikcull, there is certainly a wide range of projects aiming to change the legal industry for the better. It will be fascinating to watch them as they unfold and witness the impact they have.

Original post

June 22, 2015

by Adi Gaskell

· 1,010 Views

Thread Pools in NGINX Boost Performance 9x!

Introduction

[This article was written by Valentin Bartenev]

It’s well known that NGINX uses an asynchronous, event-driven approach to handling connections. This means that instead of creating another dedicated process or thread for each request (like servers with a traditional architecture), it handles multiple connections and requests in one worker process. To achieve this, NGINX works with sockets in a non-blocking mode and uses efficient methods such as epoll and kqueue.

Because the number of full-weight processes is small (usually only one per CPU core) and constant, much less memory is consumed and CPU cycles aren’t wasted on task switching. The advantages of such an approach are well-known through the example of NGINX itself. It successfully handles millions of simultaneous requests and scales very well.

[Figure: Each process consumes additional memory, and each switch between them consumes CPU cycles and trashes L-caches]

But the asynchronous, event-driven approach still has a problem. Or, as I like to think of it, an “enemy”. And the name of the enemy is: blocking. Unfortunately, many third-party modules use blocking calls, and users (and sometimes even the developers of the modules) aren’t aware of the drawbacks. Blocking operations can ruin NGINX performance and must be avoided at all costs.

Even in the current official NGINX code it’s not possible to avoid blocking operations in every case, and to solve this problem the new “thread pools” mechanism was implemented in NGINX version 1.7.11. What it is and how it is supposed to be used, we will cover later. Now let’s meet face to face with our enemy.

The Problem

First, for a better understanding of the problem, a few words about how NGINX works. In general, NGINX is an event handler, a controller that receives information from the kernel about all events occurring on connections and then gives commands to the operating system about what to do.
In fact, NGINX does all the hard work by orchestrating the operating system, while the operating system does the routine work of reading and sending bytes. So it’s very important for NGINX to respond fast and in a timely manner.

The events can be timeouts, notifications about sockets ready to read or to write, or notifications about an error that occurred. NGINX receives a bunch of events and then processes them one by one, doing the necessary actions. Thus all the processing is done in a simple loop over a queue in one thread. NGINX dequeues an event from the queue and then reacts to it by, for example, writing or reading a socket. In most cases, this is extremely quick (perhaps just requiring a few CPU cycles to copy some data into memory) and NGINX proceeds through all of the events in the queue in an instant.

[Figure: All processing is done in a simple loop by one thread]

But what will happen if some long and heavy operation has occurred? The whole cycle of event processing will get stuck waiting for this operation to finish. So, by saying “a blocking operation” we mean any operation that stops the cycle of handling events for a significant amount of time. Operations can be blocking for various reasons. For example, NGINX might be busy with lengthy, CPU-intensive processing, or it might have to wait to access a resource (such as a hard drive, or a mutex or library function call that gets responses from a database in a synchronous manner, etc.). The key point is that while processing such operations, the worker process cannot do anything else and cannot handle other events, even if there are more system resources available and some events in the queue could utilize those resources.

Imagine a salesperson in a store with a long queue in front of him. The first guy in the queue asks for something that is not in the store but is in the warehouse. The salesperson goes to the warehouse to deliver the goods.
Now the entire queue must wait a couple of hours for this delivery and everyone in the queue is unhappy. Can you imagine the reaction of the people? The waiting time of every person in the queue is increased by these hours, but the items they intend to buy might be right there in the shop.

Nearly the same situation happens with NGINX when it asks to read a file that isn’t cached in memory, but needs to be read from disk. Hard drives are slow (especially the spinning ones), and while the other requests waiting in the queue might not need access to the drive, they are forced to wait anyway. As a result, latencies increase and system resources are not fully utilized.

Some operating systems provide an asynchronous interface for reading and sending files and NGINX can use this interface (see the aio directive). A good example here is FreeBSD. Unfortunately, we can’t say the same about Linux. Although Linux provides a kind of asynchronous interface for reading files, it has a couple of significant drawbacks. One of them is alignment requirements for file access and buffers, but NGINX handles that well. But the second problem is worse. The asynchronous interface requires the O_DIRECT flag to be set on the file descriptor, which means that any access to the file will bypass the cache in memory and increase load on the hard disks. That definitely doesn’t make it optimal for many cases.

To solve this problem in particular, thread pools were introduced in NGINX 1.7.11. They are not included by default in NGINX Plus yet, but contact sales if you’d like to try a build of NGINX Plus R6 that has thread pools enabled.

Now let’s dive into what thread pools are about and how they work.

Thread Pools

Let’s return to our poor sales assistant who delivers goods from a faraway warehouse. But he has become smarter (or maybe he became smarter after being beaten by the crowd of angry clients?) and hired a delivery service.
Now when somebody asks for something from the faraway warehouse, instead of going to the warehouse himself, he just drops an order to a delivery service and they will handle the order while our sales assistant will continue serving other customers. Thus only those clients whose goods aren’t in the store are waiting for delivery, while others can be served immediately.

In terms of NGINX, the thread pool is performing the functions of the delivery service. It consists of a task queue and a number of threads that handle the queue. When a worker process needs to do a potentially long operation, instead of processing the operation by itself it puts a task in the pool’s queue, from which it can be taken and processed by any free thread.

It seems then we have another queue. Right. But in this case the queue is limited by a specific resource. We can’t read from a drive faster than the drive is capable of producing data. Now at least the drive doesn’t delay processing of other events and only the requests that need to access files are waiting.

The “reading from disk” operation is often used as the most common example of a blocking operation, but actually the thread pools implementation in NGINX can be used for any tasks that aren’t appropriate to process in the main working cycle. At the moment, offloading to thread pools is implemented only for two essential operations: the read() syscall on most operating systems and sendfile() on Linux. We will continue to test and benchmark the implementation, and we may offload other operations to the thread pools in future releases if there’s a clear benefit.

Benchmarking

It’s time to move from theory to practice. To demonstrate the effect of using thread pools we are going to perform a synthetic benchmark that simulates the worst mix of blocking and non-blocking operations. It requires a data set that is guaranteed not to fit in memory.
On a machine with 48 GB of RAM, we have generated 256 GB of random data in 4-MB files, and then have configured NGINX 1.9.0 to serve it. The configuration is pretty simple:

    worker_processes 16;

    events {
        accept_mutex off;
    }

    http {
        include mime.types;
        default_type application/octet-stream;

        access_log off;
        sendfile on;
        sendfile_max_chunk 512k;

        server {
            listen 8000;

            location / {
                root /storage;
            }
        }
    }

As you can see, to achieve better performance some tuning was done: logging and accept_mutex were disabled, sendfile was enabled, and sendfile_max_chunk was set. The last directive can reduce the maximum time spent in blocking sendfile() calls, since NGINX won’t try to send the whole file at once, but will do it in 512-KB chunks.

The machine has two Intel Xeon E5645 (12 cores, 24 HT-threads in total) processors and a 10-Gbps network interface. The disk subsystem is represented by four Western Digital WD1003FBYX hard drives arranged in a RAID10 array. All of this hardware is powered by Ubuntu Server 14.04.1 LTS.

The clients are represented by two machines with the same specifications. On one of these machines, wrk creates load using a Lua script. The script requests files from our server in a random order using 200 parallel connections, and each request is likely to result in a cache miss and a blocking read from disk. Let’s call this load the random load.

On the second client machine we will run another copy of wrk that will request the same file multiple times using 50 parallel connections. Since this file will be frequently accessed, it will remain in memory all the time. In normal circumstances, NGINX would serve these requests very quickly, but performance will fall if the worker processes are blocked by other requests. Let’s call this load the constant load.

The performance will be measured by monitoring throughput of the server machine using ifstat and by obtaining wrk results from the second client.
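A rough back-of-the-envelope calculation shows why the random load described above is dominated by cache misses. The Python sketch below is only an illustration: the function name is hypothetical, and it assumes (optimistically) that all 48 GB of RAM could act as page cache and that files are requested uniformly at random.

```python
def cache_miss_estimate(data_gb: float, ram_gb: float, file_mb: float):
    """Return (number of files, lower bound on the cache-miss ratio).

    Assumption: uniform random access and the whole RAM usable as page cache.
    """
    files = int(data_gb * 1024 / file_mb)
    # Even if every byte of RAM held file data, at most ram/data of the
    # data set could be cached, so the rest has to be read from disk.
    miss_ratio = max(0.0, 1.0 - ram_gb / data_gb)
    return files, miss_ratio


if __name__ == "__main__":
    files, miss = cache_miss_estimate(data_gb=256, ram_gb=48, file_mb=4)
    print(f"{files} files; at least {miss:.2%} of random requests miss the cache")
```

Under these assumptions roughly four out of five random requests have to go to the drives, which is consistent with the description of the random load as mostly blocking reads.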
Now, the first run without thread pools does not give us very exciting results:

    % ifstat -bi eth2
            eth2
     Kbps in  Kbps out
     5531.24  1.03e+06
     4855.23  812922.7
     5994.66  1.07e+06
     5476.27  981529.3
     6353.62  1.12e+06
     5166.17  892770.3
     5522.81  978540.8
     6208.10  985466.7
     6370.79  1.12e+06
     6123.33  1.07e+06

As you can see, with this configuration the server is able to produce about 1 Gbps of traffic in total. In the output from top, we can see that all of worker processes spend most of the time in blocking I/O (they are in a D state):

    top - 10:40:47 up 11 days, 1:32, 1 user, load average: 49.61, 45.77 62.89
    Tasks: 375 total, 2 running, 373 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 0.0 us, 0.3 sy, 0.0 ni, 67.7 id, 31.9 wa, 0.0 hi, 0.0 si, 0.0 st
    KiB Mem: 49453440 total, 49149308 used, 304132 free, 98780 buffers
    KiB Swap: 10474236 total, 20124 used, 10454112 free, 46903412 cached Mem

      PID USER  PR NI   VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
     4639 vbart 20  0  47180 28152  496 D  0.7  0.1 0:00.17 nginx
     4632 vbart 20  0  47180 28196  536 D  0.3  0.1 0:00.11 nginx
     4633 vbart 20  0  47180 28324  540 D  0.3  0.1 0:00.11 nginx
     4635 vbart 20  0  47180 28136  480 D  0.3  0.1 0:00.12 nginx
     4636 vbart 20  0  47180 28208  536 D  0.3  0.1 0:00.14 nginx
     4637 vbart 20  0  47180 28208  536 D  0.3  0.1 0:00.10 nginx
     4638 vbart 20  0  47180 28204  536 D  0.3  0.1 0:00.12 nginx
     4640 vbart 20  0  47180 28324  540 D  0.3  0.1 0:00.13 nginx
     4641 vbart 20  0  47180 28324  540 D  0.3  0.1 0:00.13 nginx
     4642 vbart 20  0  47180 28208  536 D  0.3  0.1 0:00.11 nginx
     4643 vbart 20  0  47180 28276  536 D  0.3  0.1 0:00.29 nginx
     4644 vbart 20  0  47180 28204  536 D  0.3  0.1 0:00.11 nginx
     4645 vbart 20  0  47180 28204  536 D  0.3  0.1 0:00.17 nginx
     4646 vbart 20  0  47180 28204  536 D  0.3  0.1 0:00.12 nginx
     4647 vbart 20  0  47180 28208  532 D  0.3  0.1 0:00.17 nginx
     4631 vbart 20  0  47180   756  252 S  0.0  0.1 0:00.00 nginx
     4634 vbart 20  0  47180 28208  536 D  0.0  0.1 0:00.11 nginx
     4648 vbart 20  0  25232  1956 1160 R  0.0  0.0 0:00.08 top
    25921 vbart 20  0 121956  2232 1056 S  0.0  0.0 0:01.97 sshd
    25923 vbart 20  0  40304  4160 2208 S  0.0  0.0 0:00.53 zsh

In this case the throughput is limited by the disk subsystem, while the CPU is idle most of the time. The results from wrk are also very low:

    Running 1m test @ http://192.0.2.1:8000/1/1/1
      12 threads and 50 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     7.42s     5.31s   24.41s    74.73%
        Req/Sec     0.15      0.36     1.00     84.62%
      488 requests in 1.01m, 2.01GB read
    Requests/sec:      8.08
    Transfer/sec:     34.07MB

And remember, this is for the file that should be served from memory! The excessively large latencies are because all the worker processes are busy with reading files from the drives to serve the random load created by 200 connections from the first client, and cannot handle our requests in good time.

It’s time to put our thread pools in play. For this we just add the aio threads directive to the location block:

    location / {
        root /storage;
        aio threads;
    }

and ask NGINX to reload its configuration.

After that we repeat the test:

    % ifstat -bi eth2
            eth2
     Kbps in  Kbps out
    60915.19  9.51e+06
    59978.89  9.51e+06
    60122.38  9.51e+06
    61179.06  9.51e+06
    61798.40  9.51e+06
    57072.97  9.50e+06
    56072.61  9.51e+06
    61279.63  9.51e+06
    61243.54  9.51e+06
    59632.50  9.50e+06

Now our server produces 9.5 Gbps, compared to ~1 Gbps without thread pools!

It probably could produce even more, but it has already reached the practical maximum network capacity, so in this test NGINX is limited by the network interface.
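The "9x" figure can be checked directly from the two ifstat runs above. The short Python sketch below averages the "Kbps out" samples copied from those runs and compares them; the variable and function names are illustrative only.

```python
from statistics import mean

# "Kbps out" samples copied from the two ifstat runs above.
without_pool = [1.03e6, 812922.7, 1.07e6, 981529.3, 1.12e6,
                892770.3, 978540.8, 985466.7, 1.12e6, 1.07e6]
with_pool = [9.51e6, 9.51e6, 9.51e6, 9.51e6, 9.51e6,
             9.50e6, 9.51e6, 9.51e6, 9.51e6, 9.50e6]


def gbps(samples_kbps):
    """Average a list of Kbps samples and convert the result to Gbps."""
    return mean(samples_kbps) / 1e6


speedup = gbps(with_pool) / gbps(without_pool)

if __name__ == "__main__":
    print(f"{gbps(without_pool):.2f} Gbps -> {gbps(with_pool):.2f} Gbps "
          f"(ratio {speedup:.2f})")
```

The averages come out at about 1 Gbps and about 9.5 Gbps, a ratio a little over nine, which matches the headline claim.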
The worker processes spend most of the time just sleeping and waiting for new events (they are in S state in top):

    top - 10:43:17 up 11 days, 1:35, 1 user, load average: 172.71, 93.84, 77.90
    Tasks: 376 total, 1 running, 375 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 0.2 us, 1.2 sy, 0.0 ni, 34.8 id, 61.5 wa, 0.0 hi, 2.3 si, 0.0 st
    KiB Mem: 49453440 total, 49096836 used, 356604 free, 97236 buffers
    KiB Swap: 10474236 total, 22860 used, 10451376 free, 46836580 cached Mem

      PID USER  PR NI   VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
     4654 vbart 20  0 309708 28844  596 S  9.0  0.1 0:08.65 nginx
     4660 vbart 20  0 309748 28920  596 S  6.6  0.1 0:14.82 nginx
     4658 vbart 20  0 309452 28424  520 S  4.3  0.1 0:01.40 nginx
     4663 vbart 20  0 309452 28476  572 S  4.3  0.1 0:01.32 nginx
     4667 vbart 20  0 309584 28712  588 S  3.7  0.1 0:05.19 nginx
     4656 vbart 20  0 309452 28476  572 S  3.3  0.1 0:01.84 nginx
     4664 vbart 20  0 309452 28428  524 S  3.3  0.1 0:01.29 nginx
     4652 vbart 20  0 309452 28476  572 S  3.0  0.1 0:01.46 nginx
     4662 vbart 20  0 309552 28700  596 S  2.7  0.1 0:05.92 nginx
     4661 vbart 20  0 309464 28636  596 S  2.3  0.1 0:01.59 nginx
     4653 vbart 20  0 309452 28476  572 S  1.7  0.1 0:01.70 nginx
     4666 vbart 20  0 309452 28428  524 S  1.3  0.1 0:01.63 nginx
     4657 vbart 20  0 309584 28696  592 S  1.0  0.1 0:00.64 nginx
     4655 vbart 20  0  30958 28476  572 S  0.7  0.1 0:02.81 nginx
     4659 vbart 20  0 309452 28468  564 S  0.3  0.1 0:01.20 nginx
     4665 vbart 20  0 309452 28476  572 S  0.3  0.1 0:00.71 nginx
     5180 vbart 20  0  25232  1952 1156 R  0.0  0.0 0:00.45 top
     4651 vbart 20  0  20032   752  252 S  0.0  0.0 0:00.00 nginx
    25921 vbart 20  0 121956  2176 1000 S  0.0  0.0 0:01.98 sshd
    25923 vbart 20  0  40304  3840 2208 S  0.0  0.0 0:00.54 zsh

There are still plenty of CPU resources.
The results of wrk:

    Running 1m test @ http://192.0.2.1:8000/1/1/1
      12 threads and 50 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   226.32ms  392.76ms   1.72s    93.48%
        Req/Sec    20.02     10.84    59.00     65.91%
      15045 requests in 1.00m, 58.86GB read
    Requests/sec:    250.57
    Transfer/sec:      0.98GB

The average time to serve a 4-MB file has been reduced from 7.42 seconds to 226.32 milliseconds (33 times less), and the number of requests per second has increased by 31 times (250 vs 8)!

The explanation is that our requests no longer wait in the events queue for processing while worker processes are blocked on reading, but are handled by free threads. As long as the disk subsystem is doing its job as best it can serving our random load from the first client machine, NGINX uses the rest of the CPU resources and network capacity to serve requests of the second client from memory.

Still Not a Silver Bullet

After all our fears about blocking operations and some exciting results, most of you are probably about to configure thread pools on your servers. Don’t hurry.

The truth is that fortunately most read and send file operations do not deal with slow hard drives. If you have enough RAM to store the data set, then the operating system will be clever enough to cache frequently used files in a so-called “page cache”.

The “page cache” works pretty well and allows NGINX to demonstrate great performance in almost all common use cases. Reading from the page cache is quite quick and no one can call such operations “blocking.” On the other hand, offloading to a thread pool has some overhead.

So if you have a reasonable amount of RAM and your working data set isn’t very big, then NGINX already works in the most optimal way without using thread pools.

Offloading read operations to the thread pool is a technique applicable to very specific tasks. It is most useful where the volume of frequently requested content doesn’t fit into the operating system’s VM cache.
This might be the case with, for instance, a heavily loaded NGINX-based streaming media server. This is the situation we’ve simulated in our benchmark.

It would be great if we could improve the offloading of read operations into thread pools. All we need is an efficient way to know if the needed file data is in memory or not, and only in the latter case should the reading operation be offloaded to a separate thread.

Turning back to our sales analogy, currently the salesman cannot know if the requested item is in the store and must either always pass all orders to the delivery service or always handle them himself.

The culprit is that operating systems are missing this feature. The first attempts to add it to Linux as the fincore() syscall were in 2010 but that didn’t happen. Later there were a number of attempts to implement it as a new preadv2() syscall with the RWF_NONBLOCK flag (see Non-blocking buffered file read operations and Asynchronous buffered read operations at LWN.net for details). The fate of all these patches is still unclear. The sad point here is that it seems the main reason why these patches haven’t been accepted yet to the kernel is continuous bikeshedding.

On the other hand, users of FreeBSD don’t need to worry at all. FreeBSD already has a sufficiently good asynchronous interface for reading files, which you should use instead of thread pools.

Configuring Thread Pools

So if you are sure that you can get some benefit out of using thread pools in your use case, then it’s time to dive deep into configuration.

The configuration is quite easy and flexible. The first thing you should have is NGINX version 1.7.11 or later, compiled with the --with-threads configuration parameter. In the simplest case, the configuration looks very plain. All you need is to include the aio threads directive in the http, server, or location context:

    aio threads;

This is the minimal possible configuration of thread pools.
In fact, it’s a short version of the following configuration:

    thread_pool default threads=32 max_queue=65536;
    aio threads=default;

It defines a thread pool called default with 32 working threads and a maximum length for the task queue of 65536 requests. If the task queue is overloaded, NGINX logs this error and rejects the request:

    thread pool "NAME" queue overflow: N tasks waiting

The error means it’s possible that the threads aren’t able to handle the work as quickly as it is added to the queue. You can try increasing the maximum queue size, but if that doesn’t help, then it indicates that your system is not capable of serving so many requests.

As you already noticed, with the thread_pool directive you can configure the number of threads, the maximum length of the queue, and the name of a specific thread pool. The last implies that you can configure several independent thread pools and use them in different places of your configuration file to serve different purposes:

    http {
        thread_pool one threads=128 max_queue=0;
        thread_pool two threads=32;

        server {
            location /one {
                aio threads=one;
            }

            location /two {
                aio threads=two;
            }
        }
        …
    }

If the max_queue parameter isn’t specified, the value 65536 is used by default. As shown, it’s possible to set max_queue to zero. In this case the thread pool will only be able to handle as many tasks as there are threads configured; no tasks will wait in the queue.

Now let’s imagine you have a server with three hard drives and you want this server to work as a «caching proxy» that caches all responses from your back ends. The expected amount of cached data far exceeds the available RAM. It’s actually a caching node for your personal CDN. Of course in this case the most important thing is to achieve maximum performance from the drives.

One of your options is to configure a RAID array. This approach has its pros and cons.
Now with NGINX you can take another one:

    # We assume that each of the hard drives is mounted on one of the directories:
    # /mnt/disk1, /mnt/disk2, or /mnt/disk3 accordingly
    proxy_cache_path /mnt/disk1 levels=1:2 keys_zone=cache_1:256m max_size=1024G
                     use_temp_path=off;
    proxy_cache_path /mnt/disk2 levels=1:2 keys_zone=cache_2:256m max_size=1024G
                     use_temp_path=off;
    proxy_cache_path /mnt/disk3 levels=1:2 keys_zone=cache_3:256m max_size=1024G
                     use_temp_path=off;

    thread_pool pool_1 threads=16;
    thread_pool pool_2 threads=16;
    thread_pool pool_3 threads=16;

    split_clients $request_uri $disk {
        33.3% 1;
        33.3% 2;
        *     3;
    }

    location / {
        proxy_pass http://backend;
        proxy_cache_key $request_uri;
        proxy_cache cache_$disk;
        aio threads=pool_$disk;
        sendfile on;
    }

In this configuration three independent caches are used, dedicated to each of the disks, and three independent thread pools are dedicated to the disks as well.

The split_clients module is used for load balancing between the caches (and as a result between the disks), which perfectly fits this task.

The use_temp_path=off parameter to the proxy_cache_path directive instructs NGINX to save temporary files into the same directories where the corresponding cache data is located. It is needed to avoid copying response data between the hard drives when updating our caches.

All this together allows us to get maximum performance out of the current disk subsystem, because NGINX through separate thread pools interacts with the drives in parallel and independently. Each of the drives is served by 16 independent threads with a dedicated task queue for reading and sending files.

I bet your clients like this custom-tailored approach. Be sure that your hard drives like it too.

This example is a good demonstration of how flexibly NGINX can be tuned specifically for your hardware. It’s like you are giving instructions to NGINX about the best way to interact with the machine and your data set.
And by fine-tuning NGINX in user space, you can ensure that your software, operating system, and hardware work together in the most optimal mode to utilize all the system resources as effectively as possible.

Conclusion

Summing up, thread pools are a great feature that pushes NGINX to new levels of performance by eliminating one of its well-known and long-time enemies – blocking – especially when we are speaking about really large volumes of content.

And there is even more to come. As previously mentioned, this brand-new interface potentially allows offloading of any long and blocking operation without any loss of performance. NGINX opens up new horizons in terms of having a mass of new modules and functionality. Lots of popular libraries still do not provide an asynchronous non-blocking interface, which previously made them incompatible with NGINX. We may spend a lot of time and resources on developing our own non-blocking prototype of some library, but will it always be worth the effort? Now, with thread pools on board, it is possible to use such libraries relatively easily, making such modules without an impact on performance.

Stay tuned.
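The central idea of the article, moving a blocking call off the event loop and onto a pool of threads, can be illustrated outside NGINX. What follows is a minimal Python sketch built on asyncio, not NGINX's implementation: `slow_read`, `count_ticks`, and `serve` are hypothetical names, and `time.sleep` merely stands in for a blocking disk read. A background "ticker" plays the role of the other events waiting in the queue.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def slow_read() -> str:
    """Stand-in for a blocking operation such as reading an uncached file."""
    time.sleep(0.2)
    return "data"


async def count_ticks(stop: asyncio.Event) -> int:
    """Count how often the event loop gets a chance to run other work."""
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(0.01)
        ticks += 1
    return ticks


async def serve(offload: bool) -> int:
    """Run one slow read, inline or in a thread pool; return the tick count."""
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    await asyncio.sleep(0)  # give the ticker a chance to start
    if offload:
        loop = asyncio.get_running_loop()
        with ThreadPoolExecutor(max_workers=2) as pool:
            # The loop stays free while a pool thread performs the read.
            await loop.run_in_executor(pool, slow_read)
    else:
        # Blocks the whole event loop, like a blocking call in a worker.
        slow_read()
    stop.set()
    return await ticker


if __name__ == "__main__":
    print("ticks while blocking inline:", asyncio.run(serve(offload=False)))
    print("ticks while offloading:", asyncio.run(serve(offload=True)))
```

When the read runs inline, the ticker is starved for the whole 0.2 seconds; when it is handed to the pool, the ticker keeps running many times in the same interval. This mirrors the benchmark, where the constant load was served promptly only after reads were moved to thread pools.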

June 22, 2015

by Patrick Nommensen

· 12,650 Views · 1 Like