Choosing the Best Web Hosting
In this post we give a brief, simple explanation of what a hosting server is and what it does. Web hosting is nothing more than a virtual disk that stays online 24 hours a day and stores your site's files, which is exactly why your site is up whenever someone types its domain, at any hour of the day or night. But a hosting server is much more than an online storage disk that keeps your site up. Few people know it, but email is tied to your hosting service; we say this because people rarely associate hosting with email. In short, whenever you sign up for hosting, sending and receiving email is also included in the package.

How do you choose excellent web hosting?

Now that we have an idea of what web hosting is, let's talk about which factors to consider before hiring a company to provide the service. We start by saying that price is not everything. Many people think that the more expensive and better known the hosting brand is, the safer they are. This is a very common mistake among new internet users. Before signing up, research the following criteria on Google: price, server stability, and speed and response time. We have already mentioned the first item, price. A good hosting plan costs between R$ 10.00 and R$ 30.00 on average, but as we said, this is our last concern, because the other factors that determine quality matter most.

Server Stability

The stability of a hosting server is directly tied to the amount of bandwidth made available to the server you are hosted on. In addition, if many other sites are hosted on the same server alongside yours, frequent outages will certainly occur and your site will go offline. That is why we count the number of sites sharing the server with yours among our criteria. Always choose hosting with at least 100 gigabytes of transfer traffic, unlimited if possible; with at least 100 GB you can rest easy about the stability of the service.

Speed and Response Time

This is the second most important criterion, for several reasons. First, if a site loads slowly, users will certainly look for another site offering the same services as yours, so always ask the hosting company about its response time. Average server response time is under 2 seconds. Another factor that affects speed is the server's location. If your company is in Brazil, try to buy hosting where you know where the server is located. Don't go buying hosting in China, because you will certainly have problems. The best option is a server in Brazil, where response time will certainly be lower, though there are many servers in the United States that can be fully trusted. Another important aspect of speed: if you plan to sell through Google's organic search and your response time is above 2 seconds, be aware that you will lose a lot of ranking in the search results. Nowadays, besides having a site well supported in SEO, we also need a hosting service that won't let us down.

As a final factor, pay attention to how many sites are hosted together on the shared server, because many sites still send a lot of spam, and that spam hurts the server's stability and speed. Always call the hosting company and ask whether there are suspicious sites on your shared server; if there are, choose another hosting company. If you have money to spare and want peace of mind, get dedicated hosting, where the server is yours alone, but only if your site is one of those that cannot go offline under any circumstances, because in many cases a few adjustments to the transfer rate solve the problem.
July 2, 2015
by Raphael Acheti
· 726 Views
Mapping complex JSON structures with JDK8 Nashorn
How can you map a complex JSON structure to another JSON structure in Java? I think there are a few possible solutions.

The first solution is to use a serialization framework like Jackson, GSON or smart-json. The mapping is a piece of awkward Java code with a lot of if-else conditions. The result is hard to test and hard to maintain. Schematically it looks like this:

    JSON -> Java objects -> Mapping -> Java objects -> JSON

A second approach is to use a templating framework (like Freemarker or Velocity) in combination with a serialization framework. The logic of the mapping moves to the template. Schematically it looks like this:

    JSON -> Java objects -> Apply template -> JSON

One of the issues with this approach is that the template must enforce that the result is a valid JSON structure. I have tried this approach and it is really hard to produce a valid JSON structure in all use cases.

You could also map your JSON to XML and create the mapping with an XSL transformation. Schematically it looks like this:

    JSON -> XML -> XSL transformation -> XML -> JSON

But the ideal schema looks like this:

    JSON -> Mapping -> JSON

With JDK 8 and the Nashorn JavaScript engine this becomes possible! This implementation provides JSON.parse() and JSON.stringify() by default.

Example JavaScript:

    function convert(val) {
        var json = JSON.stringify(val);
        var g = JSON.parse(json);
        var d = {
            chunkId: g.chunk.id,
            timestamp: g.chunk.timestamp
        };
        return JSON.stringify(d);
    }

Java code:

    import java.io.IOException;
    import java.io.InputStreamReader;
    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.ScriptException;
    import org.springframework.core.io.ClassPathResource;

    public class MyConverter {

        private ScriptEngineManager engineManager;
        private ScriptEngine engine;

        public MyConverter() throws IOException, ScriptException {
            // Load the mapping script from the classpath and evaluate it once.
            ClassPathResource resource = new ClassPathResource("/converter.js");
            InputStreamReader reader = new InputStreamReader(resource.getInputStream());
            engineManager = new ScriptEngineManager();
            engine = engineManager.getEngineByName("nashorn");
            engine.eval(reader);
        }

        public String convert(String val) throws ScriptException {
            // val is spliced into the script as a JavaScript object literal.
            return (String) engine.eval("convert(" + val + ")");
        }
    }

I think this is, at the moment, the best approach. Java 9 is expected to ship with native JSON support, so perhaps it will become easier in the future. More info can be found on my blog.
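As a hedged variation on the code above, the script function can be called through javax.script.Invocable instead of concatenating the JSON into the script source. This is only a sketch: the class name NashornMappingSketch, the inlined script and the sample input are assumptions for illustration, and the JavaScript function here takes the document as a string rather than as an object literal. Nashorn was removed from the JDK in version 15, so the sketch returns null when no "nashorn" engine is available.

```java
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

final class NashornMappingSketch {

    // Same field mapping as the article's convert(), but the function
    // receives the JSON document as a string (an assumption of this sketch).
    private static final String SCRIPT =
          "function convert(json) {"
        + "  var g = JSON.parse(json);"
        + "  var d = { chunkId: g.chunk.id, timestamp: g.chunk.timestamp };"
        + "  return JSON.stringify(d);"
        + "}";

    /** Returns the mapped JSON, or null when no "nashorn" engine is installed. */
    static String convert(String json) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");
        if (engine == null) {
            return null; // for example JDK 15+ without a standalone Nashorn
        }
        try {
            engine.eval(SCRIPT);
            // invokeFunction passes the Java String as an argument, so the
            // input is never parsed as script source.
            Object result = ((Invocable) engine).invokeFunction("convert", json);
            return String.valueOf(result);
        } catch (ScriptException | NoSuchMethodException e) {
            throw new IllegalStateException("mapping script failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "{\"chunk\":{\"id\":\"c-1\",\"timestamp\":\"2015-07-02\"}}";
        System.out.println(convert(input));
    }
}
```

Passing the document as an argument keeps untrusted input out of the evaluated script text, which is the main reason to prefer it over string concatenation.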
July 2, 2015
by Jethro Bakker
· 1,467 Views
Are crowds wise or mad?
Wharton’s Ethan Mollick is undoubtedly one of my favorite thinkers, and I’ve written about a number of his papers previously, whether it’s on the role of middle managers in innovation, or how successful crowdfunding has been at picking winners (compared to traditional venture capital). This apparent wisdom of crowds is something he has returned to for his latest paper, which looks at how successful crowds are versus experts in the funding of art.

The study measures the artistic judgment of the crowd versus a team of experts to see how closely they’re matched. The art in question was a collection of 120 theatrical ventures listed on Kickstarter. There have been a number of studies down the years that highlight how effective the ‘uneducated masses’ tend to be when compared to an educated elite, and this one was no exception.

“On average, we find a remarkable degree of convergence between the realized funding decisions by crowds and the evaluation of those same projects by experts,” Mollick says. “Projects that were funded by the crowds received consistently higher scores from experts … and were much more likely to have received funding from the experts.”

How important crowdfunding is for arts funding

The study was inspired by the finding that more money is raised for artistic ventures via Kickstarter than via the National Endowment for the Arts, which is the primary way the US government gives money to the arts. That obviously represents a sizable shift in how money is raised, so the authors were keen to explore what that meant. Were these new patrons ensuring the same quality of art? Does a greater range of art get funded?

The authors recruited a team of well-established experts from the art world and asked them to judge the projects funded on Kickstarter. The aim was to see if they would have funded those projects via more official channels. Interestingly, there was indeed a broad level of consensus between the experts and the crowd. The experts agreed with many of the projects that got funded, and where disagreement existed, it was usually that the experts would not have funded a particular project. So, in reality, the crowd was ensuring a wider and more diverse range of projects received funding.

What’s more, the crowd also seemed a good judge of potential success, with a strong track record of picking ‘winners’ in terms of commercial or artistic success. The study provides further insight into the potential for crowds to perform as well as, if not better than, supposed experts. Certainly food for thought.
July 2, 2015
by Adi Gaskell
· 942 Views · 1 Like
Using Liquibase Without a Database Connection
There are many, many different processes and requirements companies have for managing their database schemas. Some allow the application to directly manage them on startup, some require SQL scripts be executed by hand. Some have schemas that can differ across customers, some have only one database to deal with.

For people who prefer to execute SQL themselves, Liquibase has always supported an "updateSQL" mode which does not update the database but instead outputs what would be run. This allows developers and DBAs to know exactly what will be run and even make modifications as needed before actually executing the script. Before version 3.2, however, Liquibase required an active database connection for updateSQL. It used that connection to determine the SQL dialect to use and to query the DATABASECHANGELOG table to learn what changeSets have already been executed.

Controlling updateSql SQL Syntax

With version 3.2, Liquibase added a new "offline" mode. Instead of specifying a JDBC URL such as "jdbc:mysql://localhost/lbcat" you can use "offline:mysql" or "offline:postgresql", which lets Liquibase know what dialect to use. For finer dialect control, you can specify parameters like "offline:mysql?version=3.4&caseSensitive=false".

Available dialect parameters:

  • version: Standard X.Y.Z version of the database
  • productName: String description of the database, like the JDBC driver would return
  • catalog: String containing the name of the default top-level container ('database' in some databases, 'schema' in others)
  • caseSensitive: Boolean value specifying if the database is case sensitive or not

Tracking History With CSV

These parameters let Liquibase know what SQL to generate for each changeSet, but without an active database connection you cannot rely on the DATABASECHANGELOG table to track what changeSets have already been run. Instead, offline mode uses a CSV file which mimics the structure of the DATABASECHANGELOG table. By default, Liquibase will use a file called "databasechangelog.csv" in the working directory, but it can be specified with a "changeLogFile" parameter such as "offline:mssql?changeLogFile=path/to/file.csv".

It is up to you to ensure that the contents of the CSV file match what is in the database. Running updateSQL automatically appends to the CSV file under the assumption that you will apply the SQL to the database. Since the CSV file matches a particular database, it isn't something you normally would store or share under version control, because every database can (and probably will) be in a different state. If you do store the files in a central location, you will probably want to at least have a separate file for each database.

By default, the SQL generated by updateSQL in offline mode will still contain the standard DATABASECHANGELOG insert statements, so each database that you apply the SQL to will still have a correct DATABASECHANGELOG table. This means that you can switch between a direct-connection update and offline updateSQL as needed. It also means that you can extract the current contents of the DATABASECHANGELOG table to a CSV file and pass that file to the offline connection to ensure it has the right contents.

If you do not want the DATABASECHANGELOG table SQL included in updateSQL output, there is an "outputLiquibaseSql" parameter which can be passed in your offline URL.

Possible outputLiquibaseSql values:

  • "none" will output no DATABASECHANGELOG statements
  • "data_only" will output only INSERT INTO DATABASECHANGELOG statements
  • "all" will output CREATE TABLE DATABASECHANGELOG if the CSV file does not exist, as well as INSERT statements (default value)

Offline Snapshots

The new 3.4.0 release of Liquibase expands offline support with a new "snapshot" parameter which can be passed to the offline URL, pointing to a saved database structure. Liquibase will use the snapshot anywhere it would have normally needed to read the current database state. This allows you to use preconditions and perform diff and diffChangeLog operations without an active connection, and even between snapshots of the same database from different points in time.

To create a snapshot of your live databases, use the "--snapshotFormat=json" parameter on the "snapshot" command.

Command line example:

    $ liquibase --url=jdbc:mysql://localhost/lbcat snapshot --snapshotFormat=json > snapshot.json

or

    $ liquibase --url=jdbc:mysql://localhost/lbcat --outputFile=path/to/output.json snapshot --snapshotFormat=json

NOTE: currently only "json" is supported as a snapshotFormat.

You can then use that file with your offline URL and any snapshot operations will use it as the database state.

    liquibase --url=jdbc:mysql://localhost/lbcat --referenceUrl=offline:mysql?snapshot=path/to/snapshot.json diff

will compare the stored snapshot with the current database state.

    liquibase --url=offline:mysql?snapshot=path/to/snapshot.json --referenceUrl=offline:mysql?snapshot=path/to/older-snapshot.json diff

will compare two snapshots.

    liquibase --url=offline:mysql?snapshot=path/to/snapshot.json generateChangeLog

will generate a changelog based on what is in the snapshot.

    liquibase --url=jdbc:mysql://localhost/lbcat --referenceUrl=offline:mysql?snapshot=path/to/snapshot.json diffChangeLog

will generate a changelog based on what is new in the real database compared to what is in the snapshot.
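The offline URLs described above all follow the same shape: "offline:", a dialect name, then optional "key=value" parameters joined with "&". As a rough illustration only, the sketch below assembles such a string in Java. The class OfflineUrlSketch and its offlineUrl method are hypothetical helpers, not part of Liquibase, and the sketch assumes parameter values need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class OfflineUrlSketch {

    /**
     * Builds "offline:<dialect>?k1=v1&k2=v2" from the given parameters.
     * Hypothetical helper for illustration; values are inserted as-is.
     */
    static String offlineUrl(String dialect, Map<String, String> params) {
        if (params.isEmpty()) {
            return "offline:" + dialect;
        }
        StringJoiner query = new StringJoiner("&");
        params.forEach((key, value) -> query.add(key + "=" + value));
        return "offline:" + dialect + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("version", "3.4");
        params.put("caseSensitive", "false");
        System.out.println(offlineUrl("mysql", params));
    }
}
```

The resulting string would then be passed as the --url or --referenceUrl value, as in the command line examples above.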
July 2, 2015
by Nathan Voxland
· 10,831 Views
Turning a Static HTML Site into a WordPress Theme: Why, How & More
With the release of version 4.1 "Dinah", WordPress now powers over 60 million websites and is used by many well-known sites like Forbes, TechCrunch, GigaOM and CNN. With the rapid growth in WordPress's popularity in recent years, more and more people want to move their static HTML sites to WordPress. Running your site on WordPress is beneficial in many ways, easy content management being one of them. In this blog post, I'll first cover the reasons that lead people to adopt WordPress. After that, I'll take you through the process of converting an HTML site to WordPress. Finally, I'll explain what you should do after the migration. Let's start!

Why go from Static HTML to WordPress?

Below are some solid reasons why people move to WordPress:

#Easy to Use: First and foremost, WordPress is extremely user-friendly. Anyone with adequate knowledge of computers and the internet can set up and manage a WordPress site without hassle. Whether you are a professional developer or a non-techie, you can get up and running with WordPress in just five minutes. Everything from software installation to code modification to content publication is a breeze in WordPress.

#SEO Friendly: WordPress is built to welcome search engine spiders and crawlers, which helps attract organic traffic to your site. With a clean code structure and several search optimization tools, such as permalinks, blogroll and pingback, WordPress helps your site rank higher in search engine results. In addition, it lets you take advantage of third-party plug-ins for better SEO.

#Scalable and Flexible: As WordPress is an extremely customizable and highly expandable CMS, you can give your site any look and functionality you desire. It lets you choose from a wide range of themes so that you can create a website to your taste. There are also a myriad of plug-ins available to enhance WordPress' core functionality. Thus, the possibilities of what can be done with WordPress are endless.

#Cost and Time Effective: As WordPress is open-source software, it will not strain your bank account the way traditional websites do. Most WordPress themes and plug-ins are free to use. This means you don't need to spend a lot of time and money on a developer for minor changes to the design and functionality of your site. With WordPress, you can make them yourself.

#Strong Community Support: WordPress is backed by a large and ever-growing community, so if you need help with your website, there will always be someone to assist you. There is no need to call a developer every time you want to edit the code or content of your site. Just post your problems to the community and get them resolved by experts for free, often in minutes.

#Trouble-free Upgrades: Websites built with WordPress take less time to upgrade than static ones. In WordPress, using an FTP program such as FileZilla, you can take your website to a whole new level with a few mouse clicks. Unlike classic HTML websites, there is no need to mess with complex firewall settings or any other software.

#Multi-User Capability: As a multi-user platform, WordPress lets you control who can do what within your site. As the site owner, you can assign a specific role to each of your users, allowing them to perform a set of tasks. For example, you can set up your editor with a user account that is allowed only to add and edit content on your site. Try this with a static HTML site!

#Safe and Secure: Since its launch, WordPress has been updated more than 25 times. What do all these updates mean? Security! The WordPress team works continuously to make WordPress the world's most secure and reliable CMS. That's the reason a site built with WordPress is secure enough to deal with malicious intent.

How to Migrate from a Static HTML Site to WordPress?

If you're ready to switch to WordPress, below are four steps that will let you move your existing HTML website to WordPress efficiently and effectively.

#Analyze Your Existing HTML Site: This is the first step to follow before you convert your static HTML site to a WordPress theme. Check your site for irrelevant or outdated content and clean up whatever you find. Examine the existing navigation system and think about how it can be improved. Also, don't forget to dig into less visible elements such as the contact page, registration forms and email subscription. Analyzing your HTML site helps you decide what content, features and functionality should be migrated to WordPress. Consequently, you will have a clear idea of which plug-ins you need to install to get the same functionality on WordPress. Remember, migration is the perfect time to assess whether your site's content is worth keeping.

#Get to Know WordPress: Once you have analyzed your static HTML site, the next step is to familiarize yourself with WordPress. This can be done by installing WordPress on a local computer or with your web hosting provider. WordPress installation is quite easy, so I don't think you will face any trouble. Most web hosts offer a one-click quick install, and in case you do get stuck, contact your web host. After finishing the installation, learn how WordPress works and try to find out which plug-ins will be most helpful after the migration. Additionally, using the "Settings" menu in the WordPress Dashboard, choose your permalink structure and prevent search engines from indexing your site during migration.

#Do a Thorough Backup of Your HTML Site: Even if you have backed up your old static site many times, you must not skip this step. I strongly recommend that you take a complete backup of your static site once more in order to avoid any risk of data loss while migrating. Remember, backups take very little effort and time but are still absurdly often ignored. As a precaution, keep a tested backup saved in multiple locations (such as DVD, hard drive or hosting backup server) so that you can restore your site in case something goes wrong. I also suggest you do not tinker with your site in live mode, even if you feel that what you're doing is right.

#Migrate to WordPress from Static HTML: Let's come to migration, the most interesting and vital part of the entire HTML to WordPress conversion process. Conversion may seem a bit tedious, but it really isn't. It depends on your proficiency in WordPress, HTML, PHP, CSS and JavaScript. If you have a passing familiarity with all of them, you can do the conversion yourself. Otherwise, you may need a professional HTML to WordPress conversion service. Assuming you have sufficient coding knowledge and your site is small, the best option is to divide your existing HTML code into four sections (header, footer, sidebar and content) and then copy the content of each section into its respective PHP file. If your site is large, you can use an HTML to WordPress plug-in, like HTML Import 2, to speed up the conversion.

What to do after the migration?

Once the conversion is completed, you need to do a few things to give your WordPress site the final touch:

Install Necessary Plug-ins: To give your brand new WordPress site the same functionality as the HTML site, install the plug-ins you found useful.

Check and Fix Broken Links: Check your website for broken links (404 errors) and fix any you find as soon as possible. You can use Google Webmaster Tools for this task.

Set Up a Custom 404 Error Page: Add a custom 404 error page that takes visitors to important sections of your WordPress site when they try to access a URL that doesn't exist.

Redirect Links: To inform search engines that your website's content has moved to a new web address, set up 301 redirects. For this purpose, you can use the Simple 301 Redirect or Redirection plug-in.

Enable Search Engine Indexing: Go to "Settings > Reading" in your WordPress dashboard and check "Allow search engines to index this site" to get your site indexed by search engines.

Generate and Submit XML Sitemap: To help your site appear in search engine results as fast as possible, create an XML sitemap using a plug-in like Google XML Sitemaps or XML Sitemaps and submit it to Google.
July 2, 2015
by Ajeet Yadav
· 10,008 Views
JavaOne 2015 Java EE Track Committee: Johan Vos
This is the third in a series of interviews for you to meet some of the committee members for the JavaOne 2015 Java EE track. The committee plays the most important part in determining the content for JavaOne. These good folks really deserve recognition, as most of them devote many hours of their time to helping move JavaOne forward, often as volunteers. If JavaOne matters to you, these are folks you should know about. This interview is with Johan Vos.

Johan is a Java Champion, author, speaker, blogger, member of the BeJUG steering group, member of the Devoxx steering group and a JCP member. He is a fan of Java EE, GlassFish and JavaFX. He founded LodgON, a company offering Java based solutions for social networking software. In the interview he shares his experience and expectations for the Java EE track this year.

On this note, I would like to make sure you know that the JavaOne content catalog is already live with a few preliminary, fairly obvious selections we were able to make. None of the sessions accepted at this stage are from Oracle speakers on our track. The folks that we selected early for acceptance include David Blevins, Jonathan Gallimore, Mohammed Taman, Rafael Benevides and Antoine Sabot-Durand. They will be talking about Java EE Connectors (JCA), Java EE 7 real world adoption, CDI and DeltaSpike. I would encourage you to check out all the early selections in the catalog. We are working to finalize the full catalog shortly.

I hope to see you at JavaOne. Do stay tuned for more interviews with committee members and some key speakers on our track.
July 1, 2015
by Reza Rahman
· 1,182 Views
What Is an Automorphic Number in Java?
In mathematics, an automorphic number (sometimes referred to as a circular number) is a number whose square "ends" in the same digits as the number itself. For example, 5² = 25, 6² = 36, 76² = 5776, and 890625² = 793212890625, so 5, 6, 76 and 890625 are all automorphic numbers.

And the logic behind it:

    int n = 56;
    int d = 1;
    int i;
    // d becomes 10^(number of digits in n)
    for (i = n; i > 0; i = i / 10) {
        d = d * 10;
    }
    // n is automorphic when the last digits of n*n equal n
    if ((n * n) % d == n) {
        System.out.println(n + "\t" + "is Automorphic Number");
    } else {
        System.out.println(n + "\t" + "is not Automorphic Number");
    }

You can read the full article at Geek On Java - Hub for Java and Android.
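The same check can be wrapped in a reusable method. The sketch below is one possible packaging, not the original author's code: the class name AutomorphicSketch is made up for illustration, and it uses long with Math.multiplyExact because squaring a value such as 890625 does not fit in an int.

```java
final class AutomorphicSketch {

    /** True when n*n ends in the decimal digits of n; n must be non-negative. */
    static boolean isAutomorphic(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        // modulus = 10^(number of decimal digits in n)
        long modulus = 1;
        for (long rest = n; rest > 0; rest /= 10) {
            modulus *= 10;
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing
        long square = Math.multiplyExact(n, n);
        return square % modulus == n;
    }

    public static void main(String[] args) {
        long[] samples = {5, 6, 56, 76, 890625};
        for (long n : samples) {
            System.out.println(n + "\t" + (isAutomorphic(n) ? "is" : "is not")
                    + " an automorphic number");
        }
    }
}
```

With the sample values, 5, 6, 76 and 890625 pass the check, while 56 does not because 56² = 3136 ends in 36.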
July 1, 2015
by Das Nic
· 8,657 Views
DBmaestro is the first 3rd party vendor to release extension for Oracle SQL Developer 4.x
DBmaestro, the pioneer and leading provider of DevOps for database solutions, has announced that TeamWork’s extension for Oracle SQL Developer 4.1 has been released, just weeks after Oracle released its latest version. DBmaestro TeamWork is now the only tool with an external extension that supports version 4.x of Oracle’s database development tool. TeamWork, DBmaestro’s flagship product, enables agile development and continuous integration & delivery for the database. TeamWork supports streamlining of development process management and enforcing change policy practices. Many leading enterprises use DBmaestro to facilitate DevOps for their database by executing deployment automation, enhancing and reinforcing security, and mitigating risk. DBmaestro’s extension for SQL Developer 4.x helps Oracle developers and DBAs streamline database development, collaborate across database teams, and allows for agile database development in an efficient and reliable way. Upon the release of SQL Developer 4.0, Oracle required that all extensions be updated to be compatible with the new version’s API. This is a result of the drastic changes made to JDeveloper, on which SQL Developer is built. “The SQL Developer 4.1 extension represents an important achievement for DBmaestro,” said Yaniv Yehuda, co-founder and CTO of DBmaestro. “Oracle SQL Developer has over 4 million active users and is the de facto standard database IDE tool out there. Oracle has drastically changed their API on the 4.0 version, which presented a challenge to those seeking to update third-party extensions. This integration is a statement of our commitment to our customers, and we will continue to lead the way to achieve DevOps for the databases.”
June 30, 2015
by Jeremy Tess
· 895 Views
Instant Enterprise REST Accelerates the Software Driven Business
Software Driven Business is a consensus goal. But real challenges exist: the time, cost and complexity of building such apps is substantial. Business Agility – and strategic business advantage – is lost. We need another revolution – Instant Enterprise REST – that provides Business Agility using business-level specifications rather than low-level code, and delivers Enterprise-class scalability, integration, enforcement and extensibility. It’s now a reality with Instant Enterprise REST. Software Driven Business: Consensus Vision Businesses have seen the value in providing mobile and tablet apps that bring the business into the hands of customers and employees. They provide information at their finger tips – wherever they are. Industry Leaders like CA have pioneered the vision of a Software Driven Business. They argue persuasively that strategic business advantage lies in Time to Market and Time to Decision: “reveal the need for speed in the application economy. As companies transform into software-driven enterprises, bringing high-quality applications to market faster becomes one of the most critical differentiators.” The Business Agility Gap While there is consensus around this vision, there is a substantial gap in realizing the Software Driven Business. It centers around Agility – time to market. As CA argues, this drives strategic business advantage. This problem manifests both to Business Users and IT, although differently. You might have been party to a discussion like this: Business Users are frustrated about how long it takes to create systems, and revise them. They see problems that look nearly as simple as a spreadsheet take weeks… to months. How can it months for IT to build a system that takes days on a spreadsheet? IT is no less frustrated. They understand the deep technology it takes to build Enterprise-class systems: We’re working 90 hours a week. And falling behind. 
Gap Analysis For apps about critical corporate data, there’s general consensus that the time and cost for such systems are about evenly split between backends and front ends. And there’s nearly universal consensus that, independent of the UI technology, that RESTful APIs deliver the backend data. But the backend is far more than basic data access. A “SQL Pass-through” – simply restifying SQL data – does not meet Enterprise-class requirements to scale, integrate and enforce: Scale – APIs require Pagination to address large result sets, Nested Documents to reduce latency, Optimistic Locking to ensure concurrency. These are not provided in a simple SQL Pass-through – you must program them, by hand. Integrate – a wizard can produce an API from schema objects, but it cannot address multiple databases, or integrate non-SQL data sources such as ERP, other RESTful services, or NoSQL. Enforce – an API needs to enforce our security (down to the row level), and the integrity of the data. These are significant tasks, which are sadly often placed in client buttons where they cannot be shared. Providing these Enterprise class services takes significant time, expertise and expense. Business Agility is reduced. IT is essentially being forced to cover inadequate technology infrastructure. The Business Users are right: if the Business Specification is clear, then that ought to be enough: A clear business specification should be sufficient. Everything else is just friction. The vision of the Software Driven Business requires Business Driven Software that pre-supplies the infrastructure. We are not seeking 10 or 15%. We are looking for orders of magnitude. Our vision must be: We should be able to create RESTful APIs (mainly) from business specifications, not low level code. It should be no more difficult to create a system than it is toimagine it. Business-Driven Software: Instant Enterprise REST Business Driven Software is more than just a clever play on words. 
It’s a real implementation that delivers this vision, and we call it Instant Enterprise REST. It consists of 3 core technologies:

  • Enterprise Pattern Automation – creates APIs with Enterprise-class scalability built in (pagination, nested documents, optimistic locking, etc.)
  • Declarative – specify your API, integration and enforcement policies with spreadsheet-like rules in a simple point-and-click UI
  • Extensibility – enables the RESTful APIs to invoke your existing logic, inside or outside the JVM, via standard server-side JavaScript.

The combination of these 3 technologies enables you to create RESTful APIs for database backends – half your system – 10 times faster. Let’s briefly examine them below.

Technology 1: Enterprise Pattern Automation

There are well-known patterns in the data domain, describing data structure and access via SQL. There are also well-known patterns for managing SQL data in the context of RESTful services.

Well-known patterns can be automated. Let’s imagine a service (say, a server accessed via a browser) that automates these patterns, as described below, just by connecting the service to a database:

  • Schema Discovery – tables, views, stored procedures: the system creates a complete (default) API for each schema object. Note this includes Stored Procedures, which often represent a significant investment.
  • Enterprise Pattern Automation – the resultant API provides well-known services for Filter, Sort, Pagination, Optimistic Locking, handling Generated Keys and so forth.

So, the service has provided a default Enterprise-class API, instantly. Literally seconds into your project, you can test your running API:

Not enough, not done, but a great start.

Technology 2: Declarative

Declarative is the key (“what, not how”). It has had striking impacts on domains where there are well-understood underlying patterns. Max Tardiveau has put it well:

Whatever can be declarative, will be declarative.
For example, spreadsheets are declarative – and they gave birth to the PC industry. And SQL is declarative – itself an industry. Two game-changers.

So, the challenge is to apply the spirit of declarative to REST integration and enforcement. The stakes are high – success can deliver breathtaking agility.

Declarative Integration: Multi-Database Custom API, Point and Click

Enterprise Pattern Automation provides a good start, but the API is not rich. It is a flat, single-table API, really just “restified” SQL. What we really need is:

  • Nested Documents – returning multiple types (e.g., an Order, a list of Items, and a list of contact names) in a single call can reduce latency (vs. a separate call for each type). REST is perfect for this.
  • Multi-database APIs – a RESTful server provides the opportunity to integrate multiple databases in a single call, shielding clients from underlying complexity.

Nested Documents are easy: define them by simply selecting tables (via a User Interface or Command Line). Foreign Keys are used to default the joins. Add the ability to choose / alias columns, and we’re on the way to a pretty good API.

But what about databases that have no Foreign Keys? Or multi-database APIs? Leveraging the schema does not mean we are limited to it. All we need to do is:

  • Provide a means to define “Virtual” Foreign Keys for the service (i.e., stored outside the schema)
  • Extend this to Foreign Keys between databases

We now have a rich, multi-database API. Defined declaratively as shown below, no code required, running in minutes, ready for client development:

Declarative Enforcement: Integrity Logic, with spreadsheet-like rules

So now consider enforcement, specifically database integrity. A very significant portion of any project is the multi-table validations and computations that define how the data is processed. “Your code goes here” means, well, a lot of code.

We need a more powerful, more declarative, paradigm. In a spreadsheet, you assign expressions to cells.
Whenever the referenced data is changed, the cell is updated. Since cell references can chain, a series of simple expressions can solve remarkably complex problems.

What if we did the same for database data? We could assign derivation expressions to columns, and validation expressions to tables. Then, the API could “watch” for requests that change a referenced column, and (efficiently) recompute the calculated column. Just as in a spreadsheet, support for chaining and proper ordering is required and implicit.

To address multi-table logic, such expressions would need to be able to reference related tables. It’s only at this point that the logic becomes seriously powerful.

Let’s take an example. To check credit in a Customer / Purchaseorder / Lineitem application, we could define spreadsheet-like expressions such as:

There is actually a sub-branch of declarative that addresses this: Reactive Programming. Here it’s declarative, since you don’t need to code an Observer handler.

The result is that the logic above can be fully executable. There is no need to code Change Detection / Change Dependency – the logic is invoked and enforced automatically by the API in reaction to RESTful updates. SQL handling is also implicit, including underlying optimizations (caching, pruning, etc.).

The impact is massive – the 5 expressions above express the same logic as hundreds of lines of code. That’s a massive 40X more concise. Game changer. And quality goes up, since the rules are applied automatically.

Declarative Enforcement: Security, filter expressions for role/table

We can provide an analogous approach to security: define filter expressions for roles (like SalesRep), so that when a table is accessed by the role, the API adds the filter. That way, a user with that role sees only the rows for which they are authorized.

Technology 3: Standards-based Extensibility

Declarative is great, but you’re probably thinking “ok, but you can’t solve every problem declaratively”. And you’re dead right.
Business Value requires that we integrate a declarative approach with a procedural one that is familiar, standards-based, and enables us to integrate existing software.

Automatic JavaScript Object Model

The first phase of many projects is to build an ORM for natural programmatic access to data: JPA, Hibernate, Entity Framework. It’s not a small project, and it is cumbersome to maintain as changes occur.

In fact, the Object Model can be created directly from the schema. So, you’d have an object type for Purchaseorder, for Lineitem, and so forth. The model provides access to attributes and related data, and persistence services. You could then use it as shown below.

JavaScript seems like the best language choice: it is reasonable across technology bases (everybody uses JavaScript), and its dynamic nature eliminates code generation hassles.

JavaScript Events

In addition to accessors and persistence, the JavaScript objects are Logic Aware. That is, the save operation above executes any rules associated with OrderAudit (e.g., updated-by), and JavaScript Events. Here is a sample event for the PurchaseOrder object, where you access the JavaScript Object Model via the system-supplied row variable:

Extensible Logic

Auditing is a common pattern. It should be possible to solve this once in a generic manner, then re-use it (e.g., to audit employees, orders and so forth). So, Instant Enterprise REST should enable you to provide Extensible Logic – load your own JavaScript code, and invoke it. So, the code above could become:

    MyLibrary.auditFromTo(orderRow, "OrderAudit");

where auditFromTo creates an instance of OrderAudit, sets the foreign key, sets like-named attributes, and saves it.

Pluggable Authentication

Most organizations have existing data stores that identify users and their roles, such as Active Directory, LDAP, OAuth, etc. Security should integrate with such systems as a function of enforcing row/column access.
Standard deployment

Finally, the system should deploy in a familiar manner: available on the cloud, or as an on-premise virtual appliance or war file. Standards also enable integration with related critical infrastructure, such as API Management, ERP Systems, etc.

See a project in 3 minutes

To see how it all fits together, you can view this video to see a full project built: from concept, through initial implementation, and an iteration cycle. Actual project time was about half an hour.

Instant Enterprise REST: Business Agility

Instant Enterprise REST enables us to close the Agility Gap in realizing the Software Driven Business vision. We can now create important portions of our software in largely business terms, rather than technical terms. This offers major advantages:

  • Time to Market: spreadsheet-like rules are 40X more concise. Instant REST eliminates all the SQL / REST / JSON boilerplate.
  • Simplicity: team members can learn the basics of Espresso in days, and be as productive as rocket scientists using alternative technologies.
  • Leverage Expertise and Software: Espresso is built on standards like REST, JavaScript, and Event Oriented Programming. You can call out to existing software, and extend the rule types by identifying your own patterns and loading their implementations into Espresso.
  • Quality: at the defect level, automatic invocation and ordering eliminate large classes of bugs. At the architectural level, centralized enforcement factors logic out of the client buttons and into the server, where it can be shared, audited for compliance, etc.
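The spreadsheet-like credit-check rules described under Declarative Enforcement can be illustrated with a small sketch. This is not Espresso's rule syntax or API; the class names, attribute names, and the five rules themselves (three derivations, one filter condition, one validation) are assumptions chosen to match the Customer / Purchaseorder / Lineitem example. The sketch also recomputes derived values on read rather than tracking changes, which is the simplest way to show chained derivations, not how a production rule engine would do it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineItem:
    quantity: int
    unit_price: float

    @property
    def amount(self) -> float:
        # Hypothetical derivation rule: amount = quantity * unit_price
        return self.quantity * self.unit_price


@dataclass
class PurchaseOrder:
    items: List[LineItem] = field(default_factory=list)
    paid: bool = False

    @property
    def amount_total(self) -> float:
        # Hypothetical derivation rule: sum of the child line item amounts
        return sum(item.amount for item in self.items)


@dataclass
class Customer:
    credit_limit: float
    orders: List[PurchaseOrder] = field(default_factory=list)

    @property
    def balance(self) -> float:
        # Hypothetical derivation rule: sum of totals for unpaid orders only
        return sum(o.amount_total for o in self.orders if not o.paid)


def check_credit(customer: Customer) -> None:
    # Hypothetical validation rule: balance must not exceed the credit limit
    if customer.balance > customer.credit_limit:
        raise ValueError("credit limit exceeded")


def add_item(customer: Customer, order: PurchaseOrder, item: LineItem) -> None:
    # Apply the change, let the chained derivations pick it up,
    # and undo the change if the validation rejects it.
    order.items.append(item)
    try:
        check_credit(customer)
    except ValueError:
        order.items.pop()
        raise


if __name__ == "__main__":
    customer = Customer(credit_limit=100.0)
    order = PurchaseOrder()
    customer.orders.append(order)
    add_item(customer, order, LineItem(quantity=2, unit_price=30.0))
    print(customer.balance)
```

The point of the sketch is the chaining: changing a line item's quantity changes the order total, which changes the customer balance, which can trip the validation, without any of that propagation being coded by hand at the call site.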
June 30, 2015
by Val Huber (DZone Core)
· 1,364 Views
Using Parameterized Query to Avoid SQL Injection
Introduction

To explain why you should use a parameterized query rather than a concatenated inline query to avoid SQL injection, we first need to understand what SQL injection is.

What does SQL injection mean? It is when an end user submits crafted input in order to perform a CRUD operation, or to force the database to execute an unintended query. This can be harmful for the database. Harmful means data loss, or data being retrieved using invalid inputs. To see how it works, follow the steps below.

Step 1: Create a table named ‘user_login’ in any database.

    create table user_login
    (
        userid varchar(20),
        pwd varchar(20)
    )

Now save some user credentials into the table for login purposes, then select from the table to check them.

    insert into user_login values('rahul','bansal@123')
    insert into user_login values('bansal','rahul@123')

Step 2: Create a website named ‘website1’. Now I will create a login page named ‘default.aspx’ that validates the credentials against the ‘user_login’ table and, if the user is valid, redirects to the next page, named ‘home.aspx’. Add 2 textboxes, for user id and password respectively, and a button for login.

Add 2 namespaces in the .cs file of ‘default.aspx’.

    using System.Data.SqlClient;
    using System.Data;

Now add the following code to validate the credentials against the database in the click event of the login button.

    protected void btn_login_Click(object sender, EventArgs e)
    {
        string constr = System.Configuration.ConfigurationManager.ConnectionStrings["constr"].ConnectionString;
        SqlConnection con = new SqlConnection(constr);
        // Vulnerable on purpose: user input is concatenated straight into the SQL text.
        string sql = "select count(userid) from user_login where userid='" + txtuserid.Text + "' and pwd='" + txtpwd.Text + "'";
        SqlCommand cmd = new SqlCommand(sql, con);
        con.Open();
        object res = cmd.ExecuteScalar();
        con.Close();
        // Any matching row counts as a successful login.
        if (Convert.ToInt32(res) > 0)
            Response.Redirect("home.aspx");
        else
        {
            Response.Write("invalid credentials");
            return;
        }
    }

Add a new page named ‘home.aspx’, where any valid user will get a welcome message.

Step 3: Now run the ‘default’ page and log in with valid credentials.
It will redirect to the next page, ‘home.aspx’, for a valid user.

Note: Here I have not used the TextMode="Password" property on the password textbox, so that the password stays visible. I have also left out input validation to keep the example simple.

Problem: Now I will perform SQL injection: with invalid credentials the query will still execute successfully, and I will be redirected to ‘home.aspx’ as if I were a valid user. I will enter the following string in both textboxes:

    ' or '1'='1

Now run the page and log in with the above string in both textboxes. It will redirect to ‘home.aspx’ as for a valid user. See what happened. This is called SQL injection.

Reason: It happened because of that string. After it is filled into both textboxes, our SQL query becomes the following:

    select count(userid) from user_login where userid='' or '1'='1' and pwd='' or '1'='1'

Because the final or '1'='1' is true for every row, the condition matches the whole table, so the query returns a count of 2: there are 2 users in the ‘user_login’ table.

The same trick works in other ways. For example, fill the following string into the user id textbox only and you will again reach the next page as a valid user:

    ' or 1=1 --

It also gives a user count of 2, because the SQL query becomes the following:

    select count(userid) from user_login where userid='' or 1=1 --' and pwd=''

Note: The -- sign starts a comment in SQL, so the rest of the line after it is ignored.

It can be far more dangerous when an invalid user or attacker injects a script to drop all tables in the database, or to drop the whole database.

Solution: To resolve this issue you have to do 2 things:

  • Always use parameterized queries.
  • Validate input on both the client and the server.

Even if your input validation fails, a parameterized query will not execute any scripted value. Let’s see the example.
    protected void btn_login_Click(object sender, EventArgs e)
    {
        string constr = System.Configuration.ConfigurationManager.ConnectionStrings["constr"].ConnectionString;
        SqlConnection con = new SqlConnection(constr);
        // The SQL text contains only placeholders; user input never becomes part of it.
        string sql = "select count(userid) from user_login where userid=@userid and pwd=@pwd";
        SqlCommand cmd = new SqlCommand(sql, con);
        SqlParameter[] param = new SqlParameter[2];
        param[0] = new SqlParameter("@userid", txtuserid.Text);
        param[1] = new SqlParameter("@pwd", txtpwd.Text);
        cmd.Parameters.Add(param[0]);
        cmd.Parameters.Add(param[1]);
        con.Open();
        object res = cmd.ExecuteScalar();
        con.Close();
        if (Convert.ToInt32(res) > 0)
            Response.Redirect("home.aspx");
        else
        {
            Response.Write("invalid credentials");
            return;
        }
    }

Now run the page and try to log in with the SQL scripts used earlier: first with ' or '1'='1 and then with ' or 1=1 --

As you have seen, the parameterized query didn’t execute the SQL script. But why?

Reason: The parameter values are sent separately from the SQL text and are never merged into it, so the query is not vulnerable. Instead it looks for a user id or password that literally matches the entire string. In other words, the SQL engine treats each parameter as a value for its column, and not as part of the SQL to be executed.

Conclusion: Always use parameterized queries, and validate input on both the client and the server.
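The difference between the two login queries can be reproduced outside ASP.NET. The following is a minimal sketch in Python using the standard library's sqlite3 module and an in-memory database, so it is an analogue of the article's SQL Server example rather than the same stack. The helper names (make_demo_db, count_matches_unsafe, count_matches_safe) are made up for illustration; the table and sample rows are the ones from Step 1.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # In-memory database holding the article's sample table and rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table user_login (userid varchar(20), pwd varchar(20))")
    conn.executemany(
        "insert into user_login values (?, ?)",
        [("rahul", "bansal@123"), ("bansal", "rahul@123")],
    )
    return conn


def count_matches_unsafe(conn: sqlite3.Connection, userid: str, pwd: str) -> int:
    # Vulnerable: user input is concatenated into the SQL text.
    sql = (
        "select count(userid) from user_login "
        "where userid='" + userid + "' and pwd='" + pwd + "'"
    )
    return conn.execute(sql).fetchone()[0]


def count_matches_safe(conn: sqlite3.Connection, userid: str, pwd: str) -> int:
    # Parameterized: the values are bound separately and treated as data.
    sql = "select count(userid) from user_login where userid=? and pwd=?"
    return conn.execute(sql, (userid, pwd)).fetchone()[0]


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "' or '1'='1"
    print("unsafe, injected:", count_matches_unsafe(conn, payload, payload))
    print("safe, injected:  ", count_matches_safe(conn, payload, payload))
    print("safe, valid user:", count_matches_safe(conn, "rahul", "bansal@123"))
```

With the concatenated query, the injected string matches every row; with the bound parameters, the same string is compared literally against the userid and pwd columns and matches nothing, while a genuine credential pair still matches exactly one row.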
June 30, 2015
by Rahul Bansal
· 11,776 Views
The Secret to More Efficient Data Science with Neo4j and R [OSCON Preview]
It’s a sad but true fact: most data scientists spend 50-80% of their time cleaning and munging data and only a fraction of their time actually building predictive models. This is most often true in a traditional stack, where most of this data munging consists of writing lines upon lines of some flavor of SQL, leaving little time for model-building code in statistical programming languages such as R. These long, cryptic SQL queries not only slow development time but also prevent useful collaboration on analytics projects, as contributors struggle to understand each other’s SQL code.

For example, in graduate school, I was on a project team where we used Oracle to store Twitter data. The kinds of queries my classmates and I were writing were unmaintainable and impossible to understand unless the author was sitting next to you. No one worked on the same queries together because they were so unwieldy. This not only hindered our collaboration efforts but also slowed our progress on the project. If we had been using an appropriate data store (like a graph database) we would have spent significantly less time pulling our hair out over the queries.

Why Today’s Data Is Different

This data-munging problem has persisted in the data science field because data is becoming increasingly social and highly-connected. Forcing this kind of interconnected data into an inherently tabular SQL database, where relationships are only abstract, leads to complicated schemas and overly complex queries. Yet, several NoSQL solutions – specifically in the graph database space – exist to store today’s highly-connected data. That is, data where relationships matter.

A lot of data analysis today is performed in the context of better understanding people’s behavior or needs, such as:

  • How likely is this visitor to click on advertisement X?
  • Which products should I recommend to this user?
  • How are User A and User B connected?
People, as we know, are inherently social, so most of these questions can be answered by understanding the connections between people: User A is similar to User B, and we already know that User B likes this product, so let’s recommend this product to User A.

The Good News: Data-Munging No More

Data science doesn’t have to be 80% data munging. With the appropriate technology stack, a data scientist’s development process is seamless and short.

It’s time to spend less time writing queries and more time building models by combining the flexibility of an open-source, NoSQL graph database with the maturity and breadth of R – an open-source statistical programming language. The combination of Neo4j’s ability to store highly-connected, possibly-unstructured data and R’s functional, ad-hoc nature creates the ideal data analysis environment. You don’t have to spend an hour writing CREATE TABLE statements. You don’t have to spend all day on StackOverflow figuring out how to traverse a tree in SQL. Just Cypher and go.

Learn More at OSCON 2015

At my upcoming OSCON session we will walk through a project in which we analyze #OSCON Twitter data in a reproducible, low-effort workflow without writing a single line of SQL. For this highly-connected dataset we will use Neo4j, an open-source graph database, to store and query the data while highlighting the advantages of storing such data in a graph versus a relational schema. Finally, we will cover how to connect to Neo4j from an R environment for the purposes of performing common data science tasks, such as analysis, prediction and visualization.

Written by Nicole White
June 30, 2015
by Mark Needham
· 1,634 Views
Level Up Your Automated Tests
I presented a new talk at GOTO Chicago 2015 about how to change a team’s attitude towards writing automated tests. The talk covers the same case study as Groovy vs Java for Testing, adopting Spock in MongoDB, but this is a more process/agile/people perspective, not a technical look at the merits of one language over another.

Slides available below. As always, the slides are not super-useful out of context, but they do contain my conclusions (also note that due to a technology fail, my hand-drawn style is even more hand-drawn than usual).

Questions

I sadly did not have a lot of time for questions during the presentation, but thanks to the wonders of modern technology, I have a list of unanswered questions which I will attempt to address here.

Is testing to find out your system works? Or is it so you know when your system is broken?

Excellent question. I would expect that if you have a system that’s in production (which is probably the large majority of the projects we work on), we can assume the system is working, for some definition of working. Automated testing is particularly good at catching when your system stops doing the things you thought it was doing when you wrote the tests (which may, or may not, mean the system is genuinely “broken”). Regression testing is to find out when your system is no longer doing what you expect, and automated tests are really good for this.

But testing can also make sure you implement code that behaves the way you expect, especially if you write the tests first. Automated tests can be used to determine that your code is complete, according to some pre-agreed specification (in this case, the automated tests you wrote up front).

So I guess what I’m trying to say is, when you first write the tests you have tests that, when they pass, prove the system works (assuming your tests are testing the right things and/or not giving you false positives). Subsequent passes show that you haven’t broken anything.
At what level do “tests documenting code” actually become useful? And who is/should the documentation be targeted to? In the presentation, my case study is the MongoDB Java Driver. Our users were Java programmers, who were going to be coding using our driver. So in this example, it makes a lot of sense to document the code using a language that our users understood. We started with Java, and ended up using Groovy because it was also understandable for our users and a bit more succinct. On a previous project we had different types of tests. The unit and system tests documented what the expected behaviour was at the class or module level, and was aimed at developers in the team. The acceptance tests were written in Java, but in a friendly DSL-style way. These were usually written by a triad of tester, business analyst and developer, and documented to all these guys and girls what the top-level behaviour should be. Our audience here was fairly technical though, so there was no need to go to the extent of trying to write English-language-style tests, they were readable enough for a reasonably techy (but non-programmer) audience. These were not designed to be read by “the business” - us developers might use them to answer questions about the behaviour of the system, but they didn’t document it in a way that just anyone could understand. These are two different approaches for two different-sized team/organisations, with different users. So I guess in summary the answer is “it depends”. But at the very least, developers on your own team should be able to read your tests and understand what the expected behaviour of the code is. How do you become a team champion? I.e. get authority and acceptance that people listen to you? In my case, it was just by accident - I happened to care about the tests being green and also being useful, so I moaned at people until it happened. 
But it’s not just about nagging, you get more buy-in if other people see you doing the right things the right way, and it’s not too painful for them to follow your example. There are going to be things that you care about that you’ll never get other people to care about, and this will be different from team to team. You have two choices here - if you care that much, and it bothers you that much, you have to do it yourself (often on your own time, especially if your boss doesn’t buy into it). Or, you have to let it go - when it comes to quality, there are so many things you could care about that it might be more beneficial to drop one cause and pick another that you can get people to care about. For example, I wanted us to use assertThat instead of assertFalse (or true, or equals, or whatever). I tried to demo the advantages (as I saw them) of my approach to the team, and tried to push this in code reviews, but in the end the other developers weren’t sold on the benefits, and from my point of view the benefits weren’t big enough to force the issue. Those of us who cared, used assertThat. For the rest, I was just happy people were writing and maintaining tests. So, pick your battles. You’ll be surprised at how many people do get on board with things. I thought implementing checkstyle and setting draconian formatting standards was going to be a tough battle, but in the end people were just happy to have any standards, especially when they were enforced by the build. Do you report test, style, coverage, etc failures separately? Why? We didn’t fail on coverage. Enforcing a coverage percentage is a really good way to end up with crappy tests, like for getters/setters and constructors (by the way, if there’s enough logic in your constructor that it needs a test, You’re Doing It Wrong). 
Generally different types of failures are found by different tools, so for this reason alone they will be reported separately - for example, checkstyle will fail the build if the code doesn’t conform to our style standards, codenarc fails it for Groovy style failures, and Gradle will run the tests in a different task to these two.

What’s actually important, though, is time-to-failure. For checkstyle, for example, it will fail on something silly like curly braces in the wrong place. You want this to fail within seconds, so you can fix the silly mistake quickly. Ideally you’d have IntelliJ (perhaps) run your checks before the code even makes it into your CI environment. Compiler errors should, of course, fail things before you run a test, and short-running tests should fail before long-running tests. Basically, the easier it is to fix the problem, the sooner you want to know, I guess.

Our build was relatively small and not too complex, so actually we ran all our types of tests (integration and unit, both Groovy and Java) in a single task, because this turned out to be much quicker in Gradle (in our case) than splitting things up into a simple pipeline.

You might have a reason to report stuff separately, but for me it’s much more important to understand how fast I need to be aware of a particular type of failure.

Sometimes I find myself modifying code design and architecture to enable testing. How can I avoid damaging design?

This is a great question, and a common one too. The short answer is: in general writing code that’s easier to test leads to a cleaner design anyway (for example, dependency injection at the appropriate places). If you find you need to rip your design apart to test it, there’s a smell there somewhere - either your design isn’t following SOLID principles, or you’re trying to test the wrong things.

Of course, the common example here is testing private methods - how do you test these without exposing secrets?
I think for me, if it’s important enough to be tested it’s important enough to be exposed in some way - it might belong in some sort of util or helper (right now I’m not going to go into whether utils or helpers are, in themselves, a smell), in a smaller class that only provides this sort of functionality, or simply in a protected method. Or, if you’re testing with Groovy, you can access private methods anyway so this becomes a moot point (i.e. your testing framework may be limiting you).

In another story from LMAX, we found we had created methods just for testing. It seemed a bit wrong to have these methods only available for testing, but later on down the line, we needed access to many of these methods In Real Life (well, from our Admin app), so our testing had “found” a missing feature. When we came to implement it, it was pretty easy as we’d already done most of it for testing.

My co-workers often point to a lack of end-to-end testing as the reason why a lot of bugs get out to production even though they don’t have many unit tests or integration tests. What, in your experience, is a good balance between unit tests, integration tests and end-to-end testing?

Hmm, sounds to me like “lack of tests” is your problem! This is a big (and contentious!) topic. Martin Fowler has written about it, Google wrote something I completely disagree with (so I’m not even going to link to it, but you’ll find references in the links in this paragraph), and my ex-colleague Adrian talks about what we, at LMAX, meant by end-to-end tests. I hope that’s enough to get you started, there’s plenty more out there too.

How did you go about getting buy in from the team to use Spock?
I cover this in my other presentation on the topic - the short version is, I did a week-long spike to investigate whether Spock would make testing easier for us, showed the pros and cons to the whole team, and then led by example writing tests that (I thought) were more readable than what we had before and, probably most importantly, much easier to write than what we were previously doing. I basically got buy-in by showing how much easier it was for us to use the tool than even JUnit (which we were all familiar with). It did help that we were already using Gradle, so we already had a development dependency on Groovy. It also helped that adding Spock made no changes to the dependencies of the final Jar, which was very important. Over time, further buy-in (certainly from management) came when the new tests started catching more errors - usually regressions in our code or regressions in the server’s overnight builds. I don’t think it was Spock specifically that caught more problems - I think it was writing more tests, and better tests, that caught the issues. Can we do data driven style tests in frameworks like junit or cucumber? I don’t think you can in JUnit (although maybe there’s something out there). I believe someone told me you can do it in TestNG. Are there drawbacks to having tests that only run in ci? I.e I have Java 8 on my machine, but the test requires Java 7 Yes, definitely - the drawback is Time. You have to commit your code to a branch that is being checked by CI and wait for CI to finish before you find the error. In practice, we found very little that was different between Java 7 and 8, for example, but this is a valid concern (otherwise you wouldn’t be testing a complex matrix of dependencies at all). In our case, our Java 6 driver used Netty for async capabilities, as the stuff we were using from Java 7 wasn’t available. This was clearly a different code path that wasn’t tested by us locally as we were all running Java 8. 
Probably more importantly for us is we were testing against at least 3 different major versions of the server, which all supported different features (and had different APIs). I would often find I’d broken the tests for version 2.2 as I’d only been running it on 2.6, and had forgotten to either turn off the new tests for the old server versions, or didn’t realise the new functionality wouldn’t work there. So the main drawback is time - it takes a lot longer to find out about these errors. There are a few ways to get around this: Commit often!! And to a branch that’s actually going to be run by CI Make your build as fast as possible, so you get failures fast (you should be doing this anyway) You could set up virtual machines locally or somewhere cloudy to run these configurations before committing, but that sounds kinda painful (and to my mind defeats a lot of the point of CI). I set up Travis on my fork of the project, so I could have that running a different version of Java and MongoDB when I committed to my own fork - I’d be able to see some errors before they made it into the “real” project. If you can, you probably want these specific tests run first so they can fail fast. E.g. if you’re running a Java 6 & MongoDB 2.2 configuration on CI, run those tests that only work in that environment first. Would probably need some Gradle magic, and/or might need you to separate these into a different set of folders. The advantage of this approach though is if you set up some aliases on your local machine you could sanity check just these special cases before checking in. For example, I had aliases to start MongoDB versions/configurations from a single command, and to set JAVA_HOME to whichever version I wanted. Do you have any tips for unit tests that pass on dev machines but not on Jenkins because it’s not as powerful as our own machines? E.g. Synchronous calls timeout on the Jenkins builds intermittently. Erk! Yes, not uncommon. No, not really. 
We had our timeouts set longer than I would have liked to prevent these sorts of errors, and they still intermittently failed. You can also set some sort of retry on the test, and get your build system to re-run those that fail to see if they pass later. It’s kinda nasty though.

At LMAX they were able to take testing seriously enough to really invest in their testing architecture, and, of course, this is The Correct Answer. Just often very difficult to sell.

If you ask where are tests and dev asks if code is correct? And you say yes. Then dev asks why you’re delaying shipping value, how do you manage that?

These are my opinions:

  • Your code is not complete without tests that show me it’s complete.
  • Your code might do what you think it’s supposed to do right now, but given Shared Code Ownership, anyone can come in and change it at any time, so you want tests in place to make sure they don’t change it to break what you thought it did.
  • The tests are not so much to show it works right now, the tests are to show it continues to work in future.
  • Having automated tests will speed you up in future. You can refactor more safely, you can fix bugs and know almost immediately if you broke something, and you can read from the test what the author of the code thought the code should do, getting you up to speed faster.
  • You don’t know you’re shipping value without tests - you’re only shipping code (to be honest, you never know if you’re shipping value until much later on when you also analyse if people are even using the feature).

Testing almost never slows you down in the long run. Show me the bits of your code base which are poorly tested, and I bet I can show you the bits of your code base that frequently have bugs (either because the code is not really doing what the author thinks, or because subsequent changes break things in subtle ways).
If you say code is hard to understand and dev asks if you seriously don’t understand the code, how do you explain you mean easy to understand without thinking rather than ‘can I compile this in my head’?
I have zero problem with saying “I’m too stupid to understand this code, and I expect you’re much smarter than me for writing it. Can you please write it in a way so that a less smart person like myself won’t trample all over your beautiful code at a later date through lack of understanding?” By definition, code should be easy to understand by someone who’s not the author. If someone who is not the author says the code is hard to understand, then the code is hard to understand. This is not negotiable. This is what code reviews or pair programming should address.
What is effective nagging like? (Whether or not you get what you want)
Mmm, good question. Off the top of my head:
  • Don’t make the people who are the target of the nagging feel stupid - they’ll get defensive. If necessary, take the burden of “stupidity” on yourself. E.g. “I’m just not smart enough to be able to tell if this test is failing because the test is bad or because the code is bad. Can you walk me through it and help me fix it?”
  • Do at least your fair share of the work, if not more. When I wanted to get the code to a state where we could fail style errors, I fixed 99% of the problems, and delegated the handful of remaining ones that I just didn’t have the context to fix. In the face of three errors to fix each, the team could hardly say “no” after I’d fixed over 6000.
  • Explain why things need to be done. Developers are adults and don’t want to be treated like children. Give them a good reason and they’ll follow the rules. The few times I didn’t have good reasons, I could not get the team to do what I wanted.
  • Find carrots and sticks that work.
At LMAX, a short e-mail at the start of the day summarising the errors that had happened overnight, who seemed to be responsible, and whether they looked like real errors or intermittencies, was enough to get people to fix their problems [2] - they didn’t like to look bad, but they also had enough information to get right on it, they didn’t have to wade through all the build info. On occasion, when people were ignoring this, I’d turn up to work with bags of chocolate that I’d bought with my own money, offering chocolate bars to anyone who fixed up the tests. I was random with my carrot offerings so people didn’t game the system.
  • Give up if it’s not working. If you’ve tried to phrase the “why” in a number of ways, if you’ve tried to show examples of the benefits, if you’ve tried to work the results you want into a process, but it’s still not getting done, just accept the fact that this isn’t working for the team. Move on to something else, or find a new angle.
[1] I had a colleague at LMAX who was working with a hypothesis that All Private Methods Were Evil - they were clearly only sharable within a single class, so provided no reuse elsewhere, and if you have the same bit of code being called multiple times from within the same class (but it’s not valuable elsewhere) then maybe your design is wrong. I’m still pondering this specific hypothesis 4 years on, and I admit I see its pros and cons.
[2] This worked so well that this process was automated by one of the guys and turned into a tool called AutoTrish, which as far as I know is still used at LMAX. Dave Farley talks about it in some of his Continuous Delivery talks.
Resources
  • My talk that specifically looks at the advantages of Spock over JUnit, plus some Spock-specific resources.
  • I love Jay Fields’ book Working Effectively With Unit Tests - if I could have made the whole team read this before moving to Spock, we might have stuck with JUnit.
  • Go read everything Adrian Sutton has written about testing at LMAX.
If not everything, definitely Abstraction by DSL and Making End-to-End Tests Work.
  • If you can’t make it all the way through Dave Farley and Jez Humble’s excellent Continuous Delivery book, do take a look at one of Dave’s presentations on the subject, for example The Rationale for Continuous Delivery or The Process, Technology and Practice of Continuous Delivery - my own talk was around testing, but I’m working off the assumption that you’re at least running some sort of Continuous Integration, if not Continuous Delivery.
  • Martin Fowler has loads of interesting and useful articles on testing.
Abstract
What can you do to help developers a) write tests b) write meaningful tests and c) write readable tests? Trisha will talk about her experiences of working in a team that wanted to build quality into their new software version without a painful overhead - without a QA / Testing team, without putting in place any formal processes, without slavishly improving the coverage percentage. The team had been writing automated tests and running them in a continuous integration environment, but they were simply writing tests as another tick box to check, there to verify the developer had done what the developer had aimed to do. The team needed to move to a model where tests provided more than this. The tests needed to:
  • Demonstrate that the library code was meeting the requirements
  • Document in a readable fashion what those requirements were, and what should happen under non-happy-path situations
  • Provide enough coverage so a developer could confidently refactor the code
This talk will cover how the team selected a new testing framework (Spock, a framework written in Groovy that can be used to test JVM code) to aid with this effort, and how they evaluated whether this tool would meet the team’s needs. And now, two years after starting to use Spock, Trisha can talk about how both the tool and the shift in the focus of the purpose of tests has affected the quality of the code.
And, interestingly, the happiness of the developers.
June 29, 2015
by Trisha Gee
· 2,051 Views
Persistence and DAO Testing Made Simple (with Exparity-Stub and Hamcrest-Bean)
Persistence of model objects is a part of many Java projects, and a part which deserves, and often gets, high test coverage as one of the key layer integration points in the code. However, I've often felt the testing paradigms for this can be cumbersome, often involving a large amount of setup with an equivalent amount of validation. This can be tedious to both create and maintain. As a solution to this I've been testing persistence with a different pattern: by combining the exparity-stub and hamcrest-bean libraries you can thoroughly test model persistence in a few lines of test code, as in the snippet below.
    User user = aRandomInstanceOf(User.class);
    User saved = dao.save(user);
    assertThat(dao.getUserById(saved.getId()), theSameBeanAs(saved));
The test snippet above is small, but those few lines thoroughly test that all fields in a graph can be persisted and retrieved without loss, that any JPA or other mapping is valid, and that your queries are valid. For a complete example we'll work through testing a simple DAO for storing and retrieving User objects using the in-memory H2 database for simplicity. The same example will work for any persistence mechanism. Before we get started with an example, let's briefly outline what the libraries are and what they do.
The Exparity-Stub Library
The exparity-stub library provides a set of static methods for creating stubs of model objects, object graphs, collections, types, and primitive types. For our example we'll be creating random stubs because we want to completely fill the graph with junk data and check it can be written down. exparity-stub offers two approaches to this, the RandomBuilder or the BeanBuilder. The RandomBuilder provides a terser notation to create random objects with less code.
For example:
    User user = RandomBuilder.aRandomInstanceOf(User.class);
    List<User> users = RandomBuilder.aRandomListOf(User.class);
    String anyString = RandomBuilder.aRandomString();
Whereas the BeanBuilder provides a fluent interface with finer control for building individual objects and graphs, for example:
    User user = BeanBuilder.aRandomInstanceOf(User.class)
        .excludeProperty("Id")
        .build();
For this example I'm going to use the BeanBuilder so I can exclude the User.Id property from being populated by the random builder.
The Hamcrest-Bean Library
The hamcrest-bean library is an extension library to the Java Hamcrest library. It provides a set of matchers specifically for testing Java objects and object graphs and performs deep inspections of those objects. It supports exclusions and overrides to allow fine control, if required, of how matching of any property, path, or type is handled, for example:
    User expected = new User("Jane", "Doe");
    assertThat(new User("John", "Doe"), BeanMatchers.theSameAs(expected).excludeProperty("FirstName"));
A Sample Project
The sample project I'll work through is persistence of a simple User object with a child list of UserComment objects. This simple graph will be persisted to an H2 database, with Hibernate handling the Object-Relational Mapping (ORM) and Java Persistence API (JPA) annotations used to mark up the model.
The Model
Below are the two model classes; first the User class.
    package org.exparity.hamcrest.bean.sample.dao;

    import java.util.*;
    import javax.persistence.*;

    @Entity
    @Table
    public class User {

        @Id
        @GeneratedValue(strategy = GenerationType.SEQUENCE)
        private Long id;
        private Date createTs;
        private String username, firstName, surname;
        @OneToMany(cascade = CascadeType.ALL, fetch = FetchType.EAGER)
        private List<UserComment> comments = new ArrayList<>();

        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public Date getCreateTs() { return createTs; }
        public void setCreateTs(Date createTs) { this.createTs = createTs; }
        public String getUsername() { return username; }
        public void setUsername(String username) { this.username = username; }
        public String getFirstName() { return firstName; }
        public void setFirstName(String firstName) { this.firstName = firstName; }
        public String getSurname() { return surname; }
        public void setSurname(String surname) { this.surname = surname; }
        public List<UserComment> getComments() { return comments; }
        public void setComments(List<UserComment> comments) { this.comments = comments; }
    }
Followed by the UserComment class.
    package org.exparity.hamcrest.bean.sample.dao;

    import java.util.Date;
    import javax.persistence.*;

    @Table
    @Entity
    public class UserComment {

        private Long id;
        private Date timestamp;
        @Transient
        private String text;
        private String title;

        public Date getTimestamp() { return timestamp; }
        public void setTimestamp(Date timestamp) { this.timestamp = timestamp; }
        public String getText() { return text; }
        public void setText(String text) { this.text = text; }
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
    }
The Data Access Object (DAO)
Next up we write our DAO layer. I've excluded the UserDAO interface from this post but it is available in the sample project on GitHub. The full, if somewhat crude, implementation of the UserDAO is below.
    package org.exparity.hamcrest.bean.sample.dao;

    import org.hibernate.boot.registry.StandardServiceRegistryBuilder;
    import org.hibernate.cfg.Configuration;
    import org.hibernate.*;

    public class UserDAOHibernateImpl implements UserDAO {

        private final SessionFactory factory;

        public UserDAOHibernateImpl(final String resourceFile) {
            this.factory = new Configuration()
                .addAnnotatedClass(User.class)
                .addAnnotatedClass(UserComment.class)
                .buildSessionFactory(
                    new StandardServiceRegistryBuilder()
                        .loadProperties(resourceFile)
                        .build());
        }

        @Override
        public User save(final User user) {
            Session session = factory.getCurrentSession();
            Transaction txn = session.beginTransaction();
            try {
                session.save(user);
                txn.commit();
            } catch (final Exception e) {
                txn.rollback();
            }
            return user;
        }

        @Override
        public User getUserById(Long userId) {
            Session session = factory.getCurrentSession();
            Transaction txn = session.beginTransaction();
            try {
                return (User) session.get(User.class, userId);
            } finally {
                txn.rollback();
            }
        }
    }
Integration Test
And finally, onto our integration test. The hibernate.properties will create an instance of an in-memory database and create the necessary tables on instantiation of the DAO.
    hibernate.dialect=org.hibernate.dialect.H2Dialect
    hibernate.connection.username=sa
    hibernate.connection.password=
    hibernate.connection.driver_class=org.h2.Driver
    hibernate.connection.url=jdbc:h2:mem:test
    hibernate.current_session_context_class=thread
    hibernate.cache.provider_class=org.hibernate.cache.internal.NoCacheProvider
    hibernate.show_sql=true
    hibernate.hbm2ddl.auto=update
The integration test is below.
    package org.exparity.hamcrest.bean.sample.dao;

    import static org.exparity.hamcrest.BeanMatchers.theSameBeanAs;
    import static org.exparity.stub.bean.BeanBuilder.aRandomInstanceOf;
    import static org.hamcrest.MatcherAssert.assertThat;
    import static org.hamcrest.Matchers.*;

    import org.junit.Test;

    public class UserDAOHibernateImplTest {

        @Test
        public void canSaveAUser() {
            User user = aRandomInstanceOf(User.class).excludeProperty("Id").build();
            UserDAOHibernateImpl dao = new UserDAOHibernateImpl("hibernate.properties");
            User saved = dao.save(user);
            User loaded = dao.getUserById(saved.getId());
            assertThat(loaded, not(sameInstance(user)));
            assertThat(loaded, theSameBeanAs(user));
        }
    }
Let's break the test down step by step to see what each step is doing and why the test is put together this way.
1) Model Setup
    User user = aRandomInstanceOf(User.class).excludeProperty("Id").build();
Create a random instance of the User class and its associates using exparity-stub. The instance will be populated with random data with the exception of the Id property. I've excluded the Id property so that it is left null, to test that the id is being generated in the database.
2) DAO Setup
    UserDAOHibernateImpl dao = new UserDAOHibernateImpl("hibernate.properties");
Instantiate the DAO ready to be tested, passing in the property file to use for the test. The hibernate properties used will configure an in-memory instance of H2 and create the schema automatically.
3) Exercise the DAO
    User saved = dao.save(user);
    User loaded = dao.getUserById(saved.getId());
Save the random instance of the model set up in step (1) and then query the object back out again.
4) Verify the results
    assertThat(loaded, not(sameInstance(user)));
    assertThat(loaded, theSameBeanAs(user));
The first line verifies that the loaded User instance is not the same instance as the originally saved User. This prevents false positive results when the loaded instance is returned directly from a cache. The second line uses hamcrest-bean to perform a deep comparison of the loaded User instance against the original user instance.
Running the Test
The first run of the test yields an error; specifically a Hibernate AnnotationException because an @Id annotation has been missed on UserComment.
    org.hibernate.AnnotationException: No identifier specified for entity: org.exparity.hamcrest.bean.sample.dao.UserComment
        at org.hibernate.cfg.InheritanceState.determineDefaultAccessType(InheritanceState.java:277)
        at org.hibernate.cfg.InheritanceState.getElementsToProcess(InheritanceState.java:224)
        at org.hibernate.cfg.AnnotationBinder.bindClass(AnnotationBinder.java:775)
        at org.hibernate.cfg.Configuration$MetadataSourceQueue.processAnnotatedClassesQueue(Configuration.java:3845)
        at org.hibernate.cfg.Configuration$MetadataSourceQueue.processMetadata(Configuration.java:3799)
        at org.hibernate.cfg.Configuration.secondPassCompile(Configuration.java:1412)
        at org.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:1846)
        at org.exparity.hamcrest.bean.sample.dao.UserDAOHibernateImpl.<init>(UserDAOHibernateImpl.java:15)
        at org.exparity.hamcrest.bean.sample.dao.UserDAOHibernateImplTest.canSaveAUser(UserDAOHibernateImplTest.java:18)
A fix to the UserComment object and we can run the test again.
    @Table
    @Entity
    public class UserComment {

        @Id
        @GeneratedValue(strategy = GenerationType.SEQUENCE)
        private Long id;
        private Date timestamp;
        @Transient
        private String text;
        private String title;
        ...
After running the test again we get another failure. The presence of the @Transient annotation on the UserComment.text property is preventing the value being persisted.
    java.lang.AssertionError:
    Expected: the same as
         but: User.Comments[0].Text is null instead of "mDAWDJXbheIHbbHLR1NNVJqAki49RvaVwQtKD38r79u0y3MTDD"
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
        at org.exparity.hamcrest.bean.sample.dao.UserDAOHibernateImplTest.canSaveAUser(UserDAOHibernateImplTest.java:19)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
Another change to the UserComment object to remove the @Transient annotation and we can run the test again.
    @Table
    @Entity
    public class UserComment {

        @Id
        @GeneratedValue(strategy = GenerationType.SEQUENCE)
        private Long id;
        private Date timestamp;
        private String text;
        private String title;
        ...
After running the test again it all passes.
Try It Out
To try hamcrest-bean and exparity-stub out for yourself, include the dependencies in your Maven POM or other dependency manager.
    <dependency>
        <groupId>org.exparity</groupId>
        <artifactId>hamcrest-bean</artifactId>
        <version>1.0.10</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.exparity</groupId>
        <artifactId>exparity-stub</artifactId>
        <version>1.1.5</version>
        <scope>test</scope>
    </dependency>
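The shape of the round-trip test above (fill an object with throwaway data, save it, load it back, check it is a different instance with the same field values) does not depend on the two libraries. The sketch below shows that shape with the JDK only. Everything in it is a hypothetical stand-in: Person, InMemoryPersonDao, and sameFieldsAs are invented for illustration and are not part of exparity-stub, hamcrest-bean, or the sample project.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Random;

/** Hypothetical, JDK-only illustration of the save/load/compare pattern. */
final class RoundTripSketch {

    /** Stand-in model object with an id generated on save. */
    static final class Person {
        Long id;
        String username;
        String surname;

        Person copy() {
            Person p = new Person();
            p.id = id;
            p.username = username;
            p.surname = surname;
            return p;
        }

        /** Hand-written field-by-field comparison, the job hamcrest-bean does generically. */
        boolean sameFieldsAs(Person other) {
            return other != null
                && Objects.equals(id, other.id)
                && Objects.equals(username, other.username)
                && Objects.equals(surname, other.surname);
        }
    }

    /** Stand-in DAO that stores and returns copies, so loads never return the saved instance. */
    static final class InMemoryPersonDao {
        private final Map<Long, Person> rows = new HashMap<>();
        private long nextId = 1;

        Person save(Person person) {
            if (person.id == null) {
                person.id = nextId++; // mimic a database-generated id
            }
            rows.put(person.id, person.copy());
            return person;
        }

        Person getById(Long id) {
            Person row = rows.get(id);
            return row == null ? null : row.copy();
        }
    }

    /** Throwaway random text, the job exparity-stub does for whole object graphs. */
    static String randomString(Random random) {
        return Long.toString(random.nextLong(), 36);
    }

    public static void main(String[] args) {
        Random random = new Random();
        Person person = new Person();
        person.username = randomString(random);
        person.surname = randomString(random);

        InMemoryPersonDao dao = new InMemoryPersonDao();
        Person saved = dao.save(person);
        Person loaded = dao.getById(saved.id);

        if (loaded == person) {
            throw new AssertionError("loaded object is the saved instance, so nothing was really read back");
        }
        if (!loaded.sameFieldsAs(person)) {
            throw new AssertionError("a field was lost or changed in the round trip");
        }
        System.out.println("round trip preserved all fields");
    }
}
```

Writing copy() and sameFieldsAs() by hand is exactly the setup and validation overhead the two libraries are meant to remove, and a hand-written comparison is easy to forget to update when a field is added.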
June 29, 2015
by Stewart Bissett
· 3,215 Views
Generating JSON Schema from XSD with JAXB and Jackson
In this post, I demonstrate one approach for generating JSON Schema from an XML Schema (XSD). While providing an overview of an approach for creating JSON Schema from XML Schema, this post also demonstrates use of a JAXB implementation (xjc version 2.2.12-b150331.1824 bundled with JDK 9 [build 1.9.0-ea-b68]) and of a JSON/Java binding implementation (Jackson 2.5.4). The steps of this approach for generating JSON Schema from an XSD can be summarized as: Apply JAXB's xjc compiler to generate Java classes from XML Schema (XSD). Apply Jackson to generate JSON schema from JAXB-generated Java classes. Generating Java Classes from XSD with JAXB's xjc For purposes of this discussion, I'll be using the simple Food.xsd used in my previous blog post A JAXB Nuance: String Versus Enum from Enumerated Restricted XSD String. For convenience, I have reproduced that simple schema here without the XML comments specific to that earlier blog post: Food.xsd It is easy to use the xjc command line tool provided by the JDK-provided JAXB implementation to generate Java classes corresponding to this XSD. The next screen snapshot shows this process using the command: xjc -d jaxb .\Food.xsd This simple command generates Java classes corresponding to the provided Food.xsd and places those classes in the specified "jaxb" subdirectory. Generating JSON from JAXB-Generated Classes with Jackson With the JAXB-generated classes now available, Jackson can be applied to these classes to generate JSON from the Java classes. Jackson is described on its main portal page as "a multi-purpose Java library for processing" that is "inspired by the quality and variety of XML tooling available for the Java platform." The existence of Jackson and similar frameworks and libraries appears to be one of the reasons that Oracle has dropped the JEP 198 ("Light-Weight JSON API") from Java SE 9. 
[It's worth noting that Java EE 7 already has built-in JSON support with its implementation of JSR 353 ("Java API for JSON Processing"), which is not associated with JEP 198.] One of the first steps of applying Jackson to generating JSON from our JAXB-generated Java classes is to acquire and configure an instance of Jackson's ObjectMapper class. One approach for accomplishing this is shown in the next code listing. Acquiring and Configuring Jackson ObjectMapper for JAXB Serialization/Deserialization /** * Create instance of ObjectMapper with JAXB introspector * and default type factory. * * @return Instance of ObjectMapper with JAXB introspector * and default type factory. */ private ObjectMapper createJaxbObjectMapper() { final ObjectMapper mapper = new ObjectMapper(); final TypeFactory typeFactory = TypeFactory.defaultInstance(); final AnnotationIntrospector introspector = new JaxbAnnotationIntrospector(typeFactory); // make deserializer use JAXB annotations (only) mapper.getDeserializationConfig().with(introspector); // make serializer use JAXB annotations (only) mapper.getSerializationConfig().with(introspector); return mapper; } The above code listing demonstrates acquiring the Jackson ObjectMapper instance and configuring it to use a default type factory and a JAXB-oriented annotation introspector. With the Jackson ObjectMapper instantiated and appropriately configured, it's easy to use that ObjectMapper instance to generate JSON from the generated JAXB classes. One way to accomplish this using the deprecated Jackson class JsonSchema is demonstrated in the next code listing. Generating JSON from Java Classes with Deprecated com.fasterxml.jackson.databind.jsonschema.JsonSchema Class /** * Write JSON Schema to standard output based upon Java source * code in class whose fully qualified package and class name * have been provided. * * @param mapper Instance of ObjectMapper from which to * invoke JSON schema generation. 
* @param fullyQualifiedClassName Name of Java class upon * which JSON Schema will be extracted. */ private void writeToStandardOutputWithDeprecatedJsonSchema( final ObjectMapper mapper, final String fullyQualifiedClassName) { try { final JsonSchema jsonSchema = mapper.generateJsonSchema(Class.forName(fullyQualifiedClassName)); out.println(jsonSchema); } catch (ClassNotFoundException cnfEx) { err.println("Unable to find class " + fullyQualifiedClassName); } catch (JsonMappingException jsonEx) { err.println("Unable to map JSON: " + jsonEx); } } The code in the above listing acquires the class definition of the provided Java class (the highest level Food class generated by the JAXB xjc compiler in my example) and passes that reference to the JAXB-generated class to ObjectMapper's generateJsonSchema(Class) method. The deprecated JsonSchema class's toString() implementation is very useful and makes it easy to write out the JSON generated from the JAXB-generated classes. For purposes of this demonstration, I provide the demonstration driver as a main(String[]) function. That function and the entire class to this point (including methods shown above) is provided in the next code listing. JsonGenerationFromJaxbClasses.java, Version 1 package dustin.examples.jackson; import com.fasterxml.jackson.databind.AnnotationIntrospector; import com.fasterxml.jackson.databind.JsonMappingException; import com.fasterxml.jackson.databind.ObjectMapper; import com.fasterxml.jackson.databind.type.TypeFactory; import com.fasterxml.jackson.module.jaxb.JaxbAnnotationIntrospector; import com.fasterxml.jackson.databind.jsonschema.JsonSchema; import static java.lang.System.out; import static java.lang.System.err; /** * Generates JavaScript Object Notation (JSON) from Java classes * with Java API for XML Binding (JAXB) annotations. */ public class JsonGenerationFromJaxbClasses { /** * Create instance of ObjectMapper with JAXB introspector * and default type factory. 
* * @return Instance of ObjectMapper with JAXB introspector * and default type factory. */ private ObjectMapper createJaxbObjectMapper() { final ObjectMapper mapper = new ObjectMapper(); final TypeFactory typeFactory = TypeFactory.defaultInstance(); final AnnotationIntrospector introspector = new JaxbAnnotationIntrospector(typeFactory); // make deserializer use JAXB annotations (only) mapper.getDeserializationConfig().with(introspector); // make serializer use JAXB annotations (only) mapper.getSerializationConfig().with(introspector); return mapper; } /** * Write out JSON Schema based upon Java source code in * class whose fully qualified package and class name have * been provided. * * @param mapper Instance of ObjectMapper from which to * invoke JSON schema generation. * @param fullyQualifiedClassName Name of Java class upon * which JSON Schema will be extracted. */ private void writeToStandardOutputWithDeprecatedJsonSchema( final ObjectMapper mapper, final String fullyQualifiedClassName) { try { final JsonSchema jsonSchema = mapper.generateJsonSchema(Class.forName(fullyQualifiedClassName)); out.println(jsonSchema); } catch (ClassNotFoundException cnfEx) { err.println("Unable to find class " + fullyQualifiedClassName); } catch (JsonMappingException jsonEx) { err.println("Unable to map JSON: " + jsonEx); } } /** * Accepts the fully qualified (full package) name of a * Java class with JAXB annotations that will be used to * generate a JSON schema. * * @param arguments One argument expected: fully qualified * package and class name of Java class with JAXB * annotations. 
*/ public static void main(final String[] arguments) { if (arguments.length < 1) { err.println("Need to provide the fully qualified name of the highest-level Java class with JAXB annotations."); System.exit(-1); } final JsonGenerationFromJaxbClasses instance = new JsonGenerationFromJaxbClasses(); final String fullyQualifiedClassName = arguments[0]; final ObjectMapper objectMapper = instance.createJaxbObjectMapper(); instance.writeToStandardOutputWithDeprecatedJsonSchema(objectMapper, fullyQualifiedClassName); } } To run this relatively generic code against the Java classes generated by JAXB's xjc based upon Food.xsd, I need to provide the fully qualified package name and class name of the highest-level generated class. In this case, that's com.blogspot.marxsoftware.foodxml.Food (package name is based on the XSD's namespace because I did not explicitly override that when running xjc). When I run the above code with that fully qualified class name and with the JAXB classes and Jackson libraries on the classpath, I see the following JSON written to standard output. Generated JSON {"type":"object","properties":{"vegetable":{"type":"string","enum":["CARROT","SQUASH","SPINACH","CELERY"]},"fruit":{"type":"string"},"dessert":{"type":"string","enum":["PIE","CAKE","ICE_CREAM"]}}} Humans (which includes many developers) prefer prettier print than what was just shown for the generated JSON. We can tweak the implementation of the demonstration class's method writeToStandardOutputWithDeprecatedJsonSchema(ObjectMapper, String) as shown below to write out indented JSON that better reflects its hierarchical nature. This modified method is shown next. Modified writeToStandardOutputWithDeprecatedJsonSchema(ObjectMapper, String) to Write Indented JSON /** * Write out indented JSON Schema based upon Java source * code in class whose fully qualified package and class * name have been provided. * * @param mapper Instance of ObjectMapper from which to * invoke JSON schema generation. 
* @param fullyQualifiedClassName Name of Java class upon * which JSON Schema will be extracted. */ private void writeToStandardOutputWithDeprecatedJsonSchema( final ObjectMapper mapper, final String fullyQualifiedClassName) { try { final JsonSchema jsonSchema = mapper.generateJsonSchema(Class.forName(fullyQualifiedClassName)); out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(jsonSchema)); } catch (ClassNotFoundException cnfEx) { err.println("Unable to find class " + fullyQualifiedClassName); } catch (JsonMappingException jsonEx) { err.println("Unable to map JSON: " + jsonEx); } catch (JsonProcessingException jsonEx) { err.println("Unable to process JSON: " + jsonEx); } } When I run the demonstration class again with this modified method, the JSON output is more aesthetically pleasing: Generated JSON with Indentation Communicating Hierarchy { "type" : "object", "properties" : { "vegetable" : { "type" : "string", "enum" : [ "CARROT", "SQUASH", "SPINACH", "CELERY" ] }, "fruit" : { "type" : "string" }, "dessert" : { "type" : "string", "enum" : [ "PIE", "CAKE", "ICE_CREAM" ] } } } I have been using Jackson 2.5.4 in this post. The class com.fasterxml.jackson.databind.jsonschema.JsonSchema is deprecated in that version with the comment, "Since 2.2, we recommend use of external JSON Schema generator module." Given that, I now look at using the new preferred approach (Jackson JSON Schema Module approach). The most significant change is to use the JsonSchema class in the com.fasterxml.jackson.module.jsonSchema package rather than using the JsonSchema class in the com.fasterxml.jackson.databind.jsonschema package. The approaches for obtaining instances of these different versions of JsonSchema classes are also different. The next code listing demonstrates using the newer, preferred approach for generating JSON from Java classes. 
Using Jackson's Newer and Preferred com.fasterxml.jackson.module.jsonSchema.JsonSchema /** * Write out JSON Schema based upon Java source code in * class whose fully qualified package and class name have * been provided. This method uses the newer module JsonSchema * class that replaces the deprecated databind JsonSchema. * * @param fullyQualifiedClassName Name of Java class upon * which JSON Schema will be extracted. */ private void writeToStandardOutputWithModuleJsonSchema( final String fullyQualifiedClassName) { final SchemaFactoryWrapper visitor = new SchemaFactoryWrapper(); final ObjectMapper mapper = new ObjectMapper(); try { mapper.acceptJsonFormatVisitor(mapper.constructType(Class.forName(fullyQualifiedClassName)), visitor); final com.fasterxml.jackson.module.jsonSchema.JsonSchema jsonSchema = visitor.finalSchema(); out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(jsonSchema)); } catch (ClassNotFoundException cnfEx) { err.println("Unable to find class " + fullyQualifiedClassName); } catch (JsonMappingException jsonEx) { err.println("Unable to map JSON: " + jsonEx); } catch (JsonProcessingException jsonEx) { err.println("Unable to process JSON: " + jsonEx); } } The following table compares usage of the two Jackson JsonSchema classes side-by-side with the deprecated approach shown earlier on the left (adapted a bit for this comparison) and the recommended newer approach on the right. Both generate the same output for the same given Java class from which JSON is to be written. /** * Write out JSON Schema based upon Java source code in * class whose fully qualified package and class name have * been provided. This method uses the deprecated JsonSchema * class in the "databind.jsonschema" package * {@see com.fasterxml.jackson.databind.jsonschema}. * * @param fullyQualifiedClassName Name of Java class upon * which JSON Schema will be extracted. 
*/ private void writeToStandardOutputWithDeprecatedDatabindJsonSchema( final String fullyQualifiedClassName) { final ObjectMapper mapper = new ObjectMapper(); try { final com.fasterxml.jackson.databind.jsonschema.JsonSchema jsonSchema = mapper.generateJsonSchema(Class.forName(fullyQualifiedClassName)); out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(jsonSchema)); } catch (ClassNotFoundException cnfEx) { err.println("Unable to find class " + fullyQualifiedClassName); } catch (JsonMappingException jsonEx) { err.println("Unable to map JSON: " + jsonEx); } catch (JsonProcessingException jsonEx) { err.println("Unable to process JSON: " + jsonEx); } } /** * Write out JSON Schema based upon Java source code in * class whose fully qualified package and class name have * been provided. This method uses the newer module JsonSchema * class that replaces the deprecated databind JsonSchema. * * @param fullyQualifiedClassName Name of Java class upon * which JSON Schema will be extracted. */ private void writeToStandardOutputWithModuleJsonSchema( final String fullyQualifiedClassName) { final SchemaFactoryWrapper visitor = new SchemaFactoryWrapper(); final ObjectMapper mapper = new ObjectMapper(); try { mapper.acceptJsonFormatVisitor(mapper.constructType(Class.forName(fullyQualifiedClassName)), visitor); final com.fasterxml.jackson.module.jsonSchema.JsonSchema jsonSchema = visitor.finalSchema(); out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(jsonSchema)); } catch (ClassNotFoundException cnfEx) { err.println("Unable to find class " + fullyQualifiedClassName); } catch (JsonMappingException jsonEx) { err.println("Unable to map JSON: " + jsonEx); } catch (JsonProcessingException jsonEx) { err.println("Unable to process JSON: " + jsonEx); } } This blog post has shown two approaches using different versions of classes with name JsonSchema provided by Jackson to write JSON based on Java classes generated from an XSD with JAXB's xjc. 
The overall process demonstrated in this post is one approach for generating JSON Schema from XML Schema.
June 29, 2015
by Dustin Marx
· 96,540 Views · 6 Likes
R: Scraping the Release Dates of Github Projects
Continuing on from my blog post about scraping Neo4j’s release dates, I thought it’d be even more interesting to chart the release dates of some GitHub projects. In theory the release dates should be accessible through the GitHub API, but the few projects that I looked at weren’t returning any data, so I scraped the data together instead.

We’ll be using rvest again, and I first wrote the following function to extract the release versions and dates from a single page:

    library(dplyr)
    library(rvest)
    library(stringr)  # provides str_trim()

    process_page = function(releases, session) {
      rows = session %>% html_nodes("ul.release-timeline-tags li")

      for(row in rows) {
        date = row %>% html_node("span.date")
        version = row %>% html_node("div.tag-info a")

        # only keep rows that have both a date and a version link
        if(!is.null(version) && !is.null(date)) {
          date = date %>% html_text() %>% str_trim()
          version = version %>% html_text() %>% str_trim()
          releases = rbind(releases, data.frame(date = date, version = version))
        }
      }
      return(releases)
    }

Let’s try it out on the Cassandra release page and see what it comes back with:

    > r = process_page(data.frame(), html_session("https://github.com/apache/cassandra/releases"))
    > r
               date               version
    1  Jun 22, 2015       cassandra-2.1.7
    2  Jun 22, 2015      cassandra-2.0.16
    3   Jun 8, 2015       cassandra-2.1.6
    4   Jun 8, 2015   cassandra-2.2.0-rc1
    5  May 19, 2015 cassandra-2.2.0-beta1
    6  May 18, 2015      cassandra-2.0.15
    7  Apr 29, 2015       cassandra-2.1.5
    8   Apr 1, 2015      cassandra-2.0.14
    9   Apr 1, 2015       cassandra-2.1.4
    10 Mar 16, 2015      cassandra-2.0.13

That works pretty well but it’s only one page! To get all the pages we can use the follow_link function to follow the ‘Next’ link until there aren’t any more pages to process.
We end up with the following function to do this:

    find_all_releases = function(starting_page) {
      s = html_session(starting_page)
      releases = data.frame()

      next_page = TRUE
      while(next_page) {
        # follow_link raises an error when there is no 'Next' link left
        possibleError = tryCatch({
          releases = process_page(releases, s)
          s = s %>% follow_link("Next")
        }, error = function(e) { e })

        if(inherits(possibleError, "error")){
          next_page = FALSE
        }
      }
      return(releases)
    }

Let’s try it out starting from the Cassandra page:

    > cassandra = find_all_releases("https://github.com/apache/cassandra/releases")
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-2.0.13
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-2.0.10
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-2.0.8
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-1.2.13
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-2.0.0-rc1
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-1.2.3
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-1.2.0-beta2
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-1.0.10
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-1.0.6
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-1.0.0-rc2
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-0.7.7
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-0.7.4
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-0.7.0-rc3
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-0.6.4
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-0.5.0-rc3
    Navigating to https://github.com/apache/cassandra/releases?after=cassandra-0.4.0-final

    > cassandra %>% sample_n(10)
                date               version
    151 Mar 13, 2010   cassandra-0.5.0-rc2
    25   Jul 3, 2014      cassandra-1.2.18
    51  Jul 27, 2013       cassandra-1.2.8
    21  Aug 19, 2014   cassandra-2.1.0-rc6
    73  Sep 24, 2012 cassandra-1.2.0-beta1
    158 Mar 13, 2010   cassandra-0.4.0-rc2
    113 May 20, 2011     cassandra-0.7.6-2
    15  Oct 24, 2014       cassandra-2.1.1
    103 Sep 15, 2011 cassandra-1.0.0-beta1
    93  Nov 29, 2011       cassandra-1.0.4

I want to plot when the different releases happened in time. To do that we need to create an extra column containing the ‘release series’, which we can do with the following transformation:

    library(lubridate)  # provides mdy()

    # keep only the first two dot-separated parts, e.g. "2.1.6" -> "2.1"
    series = function(version) {
      parts = strsplit(as.character(version), "\\.")
      return(unlist(lapply(parts, function(p) paste(p %>% unlist %>% head(2), collapse = "."))))
    }

    bySeries = cassandra %>%
      mutate(date2 = mdy(date), series = series(version),
             short_version = gsub("cassandra-", "", version),
             short_series = series(short_version))

    > bySeries %>% sample_n(10)
                date               version      date2        series short_version short_series
    3    Jun 8, 2015       cassandra-2.1.6 2015-06-08 cassandra-2.1         2.1.6          2.1
    161 Mar 13, 2010 cassandra-0.4.0-beta1 2010-03-13 cassandra-0.4   0.4.0-beta1          0.4
    62  Feb 15, 2013      cassandra-1.1.10 2013-02-15 cassandra-1.1        1.1.10          1.1
    153 Mar 13, 2010 cassandra-0.5.0-beta2 2010-03-13 cassandra-0.5   0.5.0-beta2          0.5
    37   Feb 7, 2014       cassandra-2.0.5 2014-02-07 cassandra-2.0         2.0.5          2.0
    36   Feb 7, 2014      cassandra-1.2.15 2014-02-07 cassandra-1.2        1.2.15          1.2
    29   Jun 2, 2014   cassandra-2.1.0-rc1 2014-06-02 cassandra-2.1     2.1.0-rc1          2.1
    21  Aug 19, 2014   cassandra-2.1.0-rc6 2014-08-19 cassandra-2.1     2.1.0-rc6          2.1
    123 Feb 16, 2011       cassandra-0.7.2 2011-02-16 cassandra-0.7         0.7.2          0.7
    135  Nov 1, 2010 cassandra-0.7.0-beta3 2010-11-01 cassandra-0.7   0.7.0-beta3          0.7

Now let’s plot those releases and see what we get:

    library(ggplot2)

    ggplot(aes(x = date2, y = short_series),
           data = bySeries %>% filter(!grepl("beta|rc", short_version))) +
      geom_text(aes(label=short_version),hjust=0.5, vjust=0.5, size = 4, angle = 90) +
      theme_bw()

An interesting thing we can see from this visualisation is how much the various series of versions overlap.
Most of the time there are only two series of versions overlapping, but the 1.2, 2.0 and 2.1 series all overlap, which is unusual. In this chart we excluded all beta and RC versions. Let’s bring those back in and just show the most recent series (1.2 and 2.x):

    ggplot(aes(x = date2, y = short_series),
           data = bySeries %>% filter(grepl("2\\.[012]\\.|1\\.2\\.", short_version))) +
      geom_text(aes(label=short_version),hjust=0.5, vjust=0.5, size = 4, angle = 90) +
      theme_bw()

From this chart it’s clearer that the 2.0 and 2.1 series both have recent releases, so there will probably be three overlapping series once 2.2 is released as well. The chart is still a bit cluttered, although less than before. I’m not sure of a better way of visualising this type of data, so if you have any ideas do let me know!
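The ‘release series’ transformation above is simple enough to restate outside R. The following is a minimal sketch in Python, not part of the original post: the function names `release_series` and `is_prerelease` are made up for illustration, and the sample tags are the ten from the first page of results shown earlier.

```python
import re
from collections import defaultdict


def release_series(version: str, prefix: str = "cassandra-") -> str:
    """Return the 'major.minor' series of a tag such as 'cassandra-2.1.0-rc6'."""
    # Drop the project prefix, mirroring gsub("cassandra-", "", version).
    short_version = version[len(prefix):] if version.startswith(prefix) else version
    # Keep the first two dot-separated parts, mirroring head(2) in the R code.
    return ".".join(short_version.split(".")[:2])


def is_prerelease(version: str) -> bool:
    """True for beta and release-candidate tags (same pattern as grepl("beta|rc"))."""
    return re.search(r"beta|rc", version) is not None


# Tags taken from the first page of scraped results.
tags = [
    "cassandra-2.1.7", "cassandra-2.0.16", "cassandra-2.1.6",
    "cassandra-2.2.0-rc1", "cassandra-2.2.0-beta1", "cassandra-2.0.15",
    "cassandra-2.1.5", "cassandra-2.0.14", "cassandra-2.1.4",
    "cassandra-2.0.13",
]

# Group the final (non beta/RC) releases by series.
by_series = defaultdict(list)
for tag in tags:
    if not is_prerelease(tag):
        by_series[release_series(tag)].append(tag)

for series_name in sorted(by_series):
    print(series_name, by_series[series_name])
```

Grouping like this is only a stand-in for the `mutate` and `filter` steps; the plotting itself is left to ggplot as in the post.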
June 29, 2015
by Mark Needham
· 1,418 Views
How to Watch Next: Leap Second in Java
What is a Leap Second?

As some readers know, there will be a new leap second at the end of this month. Leap seconds are inserted into the standard UTC time scale using the label "60", either at the end of June or December (exceptionally also in March and September), at irregular intervals in order to compensate for the slightly increasing difference between an astronomical day and the pulse of the modern atomic clocks we now use for time-keeping.

Initial Situation

In 2012, when the last leap second happened, I asked myself how to watch it using Java-based software. The answer was simply: not possible. At least not possible with any kind of standard tool. As with most software today, Java is totally ignorant of leap seconds and pretends that they don't exist. There is no way to let Java print timestamps like "2015-06-30T23:59:60Z". Note that Java 8 with its new time library (JSR-310, java.time package) does not offer this feature either. For standard business purposes this approach is fine. From a scientific point of view, however, it is not satisfying.

Of course, these rare leap seconds still exist. When it comes to clock synchronization, even leap-second-ignorant software has to pay some attention if monotonicity is important. Any clock will sooner or later be synchronized such that leap seconds are taken into account, often by setting the clock one second back. NTP time servers will usually repeat the timestamp "2015-06-30T23:59:59Z" and send a so-called leap indicator flag in advance. This flag does not directly mark the current timestamp as a leap second but is only intended as an announcement. So even NTP tries to hide leap seconds in some way. And some NTP servers, like those from Google, apply internal smearing algorithms (slightly prolonging each second over a day) in order to avoid reporting any leap second at all.

Decision for a New Library

So I decided to develop a new time library named Time4J to solve this issue.
First I had to set up a new infrastructure around a built-in configurable leap second table, then a new class net.time4j.Moment capable of holding a leap second as internal state. An SNTP-based clock yielding Moment timestamps was developed and successfully tested during the last leap second in 2012. Since leap seconds are also reported by the well-known IANA TZDB (the standard timezone repository), I decided to develop a mechanism to apply the whole TZDB and its leap seconds to the class net.time4j.Moment. Finally I even set up a new format and parse engine from scratch to enable formatting and parsing of leap seconds in any timezone. A lot of work, but the existing Java software was not usable at all, not even as a starting point.

Recently I also developed a monotonic clock which makes it possible to watch a leap second offline. Keep in mind that no clock, not even NTP, is a reliable source for watching a leap second over a live connection. Fortunately Java offers at least System.nanoTime(), which accesses the monotonic clock of the operating system (if available). So I used this as the base of the new monotonic clock of Time4J, which can connect to an NTP time server before a leap second happens.

How Does Time4J Help?

    import net.time4j.Moment;
    import net.time4j.Month;
    import net.time4j.PlainDate;
    import net.time4j.SI;
    import net.time4j.SystemClock;
    import net.time4j.base.TimeSource;
    import net.time4j.clock.FixedClock;
    import net.time4j.format.expert.ChronoFormatter;
    import net.time4j.format.expert.PatternType;
    import net.time4j.tz.olson.AMERICA;

    import java.util.Locale;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.ScheduledThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class DZone {

        static TimeSource<Moment> clock;
        static ScheduledThreadPoolExecutor executor;
        static ScheduledFuture<?> future;

        public static void main(String[] args) throws Exception {
            // when is the next leap second?
            Moment ls = SystemClock.currentMoment().with(Moment.nextLeapSecond());

            if (ls == null) {
                // ooops, we are now too late and after last known leap second,
                // so let us set it directly
                ls = PlainDate.of(2015, Month.JUNE, 30).atTime(23, 59, 59)
                        .atUTC().plus(1, SI.SECONDS);
            }

            // move 5 seconds earlier
            ls = ls.minus(5, SI.SECONDS);

            // let's start our monotonic clock at determined fixed time
            clock = SystemClock.MONOTONIC.synchronizedWith(FixedClock.of(ls));

            // finally we observe our clock every second for 10 times
            executor = new ScheduledThreadPoolExecutor(15);
            future = executor.scheduleAtFixedRate(
                    new ClockTask(), 0, 1, TimeUnit.SECONDS);
        }

        static class ClockTask implements Runnable {
            private int attempt = 1;

            public void run() {
                Moment moment = clock.currentTime();
                String time = ChronoFormatter.ofMomentPattern(
                        "MMM/dd/uuuu hh:mm:ss a XXX",
                        PatternType.CLDR,
                        Locale.ENGLISH,
                        AMERICA.NEW_YORK
                ).format(moment);
                String flag = moment.isLeapSecond() ? " [leap second]" : "";
                System.out.println(time + flag);
                attempt++;
                if (attempt > 10) {
                    future.cancel(false);
                    // release the worker threads so that the JVM can exit
                    executor.shutdown();
                }
            }
        }
    }

    /* output:
    Jun/30/2015 07:59:55 PM -04:00
    Jun/30/2015 07:59:56 PM -04:00
    Jun/30/2015 07:59:57 PM -04:00
    Jun/30/2015 07:59:58 PM -04:00
    Jun/30/2015 07:59:59 PM -04:00
    Jun/30/2015 07:59:60 PM -04:00 [leap second]
    Jun/30/2015 08:00:00 PM -04:00
    Jun/30/2015 08:00:01 PM -04:00
    Jun/30/2015 08:00:02 PM -04:00
    Jun/30/2015 08:00:03 PM -04:00
    */

For a real-life scenario you can just replace the clock this way (SNTP example):

    SntpConnector sntp = new SntpConnector("ptbtime1.ptb.de");
    sntp.connect();
    clock = sntp;

Conclusion

Time4J does not try to hide reality, but it also gives you the freedom to decide whether you want to handle leap seconds or not. You can even switch off this feature completely by setting an appropriate system property. As a side effect, a library has been developed which does not need to be afraid of comparison with other existing libraries like JSR-310 (the java.time package in Java 8) or Joda-Time.
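To make the smearing idea mentioned earlier more concrete, here is a minimal sketch in Python of a linear smear that spreads one leap second evenly over a window. It is an illustration only: the function names and the 24-hour default window are assumptions chosen for this example, and it is not the algorithm of Google, NTP, or Time4J.

```python
def linear_smear_offset(elapsed: float, window: float = 86_400.0) -> float:
    """Part of the leap second (in seconds) absorbed after `elapsed` true
    seconds inside a smear window that lasts `window` true seconds."""
    if window <= 0:
        raise ValueError("window must be positive")
    # Clamp to [0, 1]: nothing absorbed before the window, all of it after.
    return min(max(elapsed / window, 0.0), 1.0)


def smeared_clock(elapsed: float, window: float = 86_400.0) -> float:
    """Seconds a smeared clock has advanced after `elapsed` true seconds."""
    return elapsed - linear_smear_offset(elapsed, window)


# Show how the hidden second accumulates over an assumed 24-hour window.
for hours in (0, 6, 12, 18, 24):
    elapsed = hours * 3600.0
    print(hours, linear_smear_offset(elapsed), smeared_clock(elapsed))
```

With this scheme every displayed second is stretched by the same tiny amount, so the smeared clock never shows the label "60". That is exactly the information a class like Moment is designed to keep.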
June 29, 2015
by Meno Hochschild
· 3,010 Views
Observer Design Patterns Automation Testing
In my articles from the series “Design Patterns in Automation Testing”, I show you how to integrate the most useful code design patterns into automation testing.
June 29, 2015
by Anton Angelov
· 2,910 Views
Mystery Curve
This afternoon I got a review copy of the book Creating Symmetry: The Artful Mathematics of Wallpaper Patterns. Here’s a striking curve from near the beginning of the book, one that the author calls the “mystery curve.” The curve is the plot of exp(it) - exp(6it)/2 + i exp(-14it)/3 with t running from 0 to 2π. Here’s Python code to draw the curve.

    import matplotlib.pyplot as plt
    from numpy import pi, exp, real, imag, linspace

    def f(t):
        return exp(1j*t) - exp(6j*t)/2 + 1j*exp(-14j*t)/3

    t = linspace(0, 2*pi, 1000)

    plt.plot(real(f(t)), imag(f(t)))

    # These two lines make the aspect ratio square
    fig = plt.gcf()
    fig.gca().set_aspect('equal')

    plt.show()

Maybe there’s a more direct way to plot curves in the complex plane rather than taking real and imaginary parts. Updated code for the aspect ratio per Janne’s suggestion in the comments.

Related posts: Several people have been making fun visualizations that generalize the example above. Brent Yorgey has written two posts, one choosing frequencies randomly and another that animates the path of a particle along the curve and shows how the frequency components each contribute to the motion. Mike Croucher developed a Jupyter notebook that lets you vary the frequency components with sliders. John Golden created visualizations in GeoGebra here and here. Jennifer Silverman showed how these curves are related to decorative patterns that were popular in the 1960s. She also created a coloring book and a video. Dan Anderson accused me of nerd sniping him and created this visualization.
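The generalizations mentioned above all treat the curve as a sum of frequency components. As a minimal sketch (not taken from the book or from any of the linked posts), the curve can be written as a list of (coefficient, frequency) pairs; the names `curve_point`, `sample_curve` and `MYSTERY_TERMS` are invented for this example, and only the standard library is used so that the plotting step is left out.

```python
import cmath
import math


def curve_point(t, terms):
    """Evaluate sum(c * exp(i*k*t)) for (coefficient, frequency) pairs."""
    return sum(c * cmath.exp(1j * k * t) for c, k in terms)


def sample_curve(terms, n=1000):
    """Return n + 1 points of the curve for t running from 0 to 2*pi."""
    return [curve_point(2 * math.pi * i / n, terms) for i in range(n + 1)]


# exp(it) - exp(6it)/2 + i*exp(-14it)/3 as (coefficient, frequency) pairs
MYSTERY_TERMS = [(1, 1), (-0.5, 6), (1j / 3, -14)]

points = sample_curve(MYSTERY_TERMS)
xs = [p.real for p in points]  # pass xs and ys to plt.plot to draw the curve
ys = [p.imag for p in points]
print(points[0])
```

Changing the pairs in the list is all it takes to try other frequencies. For the mystery curve itself, the frequencies 1, 6 and -14 all leave remainder 1 when divided by 5, so advancing t by 2π/5 rotates the point by a fifth of a turn, which is where the five-fold symmetry of the picture comes from.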
June 28, 2015
by John Cook
· 4,365 Views · 1 Like
Launching Missiles With Haskell
Haskell advocates are fond of saying that a Haskell function cannot launch missiles without you knowing it. Pure functions have no side effects, so they can only do what they purport to do. In a language that does not enforce functional purity, calling a function could have arbitrary side effects, including launching missiles. But this cannot happen in Haskell.

The difference between pure functional languages and traditional imperative languages is not quite that simple in practice.

Programming with pure functions is conceptually easy but can be awkward in practice. You could just pass each function the state of the world before the call, and it returns the state of the world after the call. It’s unrealistic to pass a program’s entire state as an argument each time, so you’d like to pass just that state that you need to, and have a convenient way of doing so. You’d also like the compiler to verify that you’re only passing around a limited slice of the world. That’s where monads come in.

Suppose you want a function to compute square roots and log its calls. Your square root function would have to take two arguments: the number to find the root of, and the state of the log before the function call. It would also return two values: the square root, and the updated log. This is a pain, and it makes function composition difficult.

Monads provide a sort of side-band for passing state around, things like our function call log. You’re still passing around the log, but you can do it implicitly using monads. This makes it easier to call and compose two functions that do logging. It also lets the compiler check that you’re passing around a log but not arbitrary state. A function that updates a log, for example, can affect the state of the log, but it can’t do anything else. It can’t launch missiles.

Once monads get large and complicated, it’s hard to know what side effects they hide. Maybe they can launch missiles after all. You can only be sure by studying the source code. Now how do you know that calling a C function, for example, doesn’t launch missiles? You study the source code. In that sense Haskell and C aren’t entirely different.

The Haskell compiler does give you assurances that a C compiler does not. But ultimately you have to study source code to know what a function does and does not do.
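The square root example above can be sketched in a few lines. The post itself contains no code, so what follows is an illustration in Python rather than Haskell: `logged_sqrt`, `sqrt_w` and `bind` are hypothetical names, and because Python does not enforce purity the sketch shows only the plumbing, not the compiler guarantee the post describes.

```python
import math
from typing import Callable, List, Tuple


# Explicit style: the log goes in as an argument and comes back as a result.
def logged_sqrt(x: float, log: List[str]) -> Tuple[float, List[str]]:
    return math.sqrt(x), log + [f"sqrt({x})"]


r1, log1 = logged_sqrt(16.0, [])
r2, log2 = logged_sqrt(r1, log1)  # the caller must thread the log by hand


# Side-band style: each function returns only the entries it adds, and a
# single helper (playing the role of monadic bind) does the threading.
def sqrt_w(x: float) -> Tuple[float, List[str]]:
    return math.sqrt(x), [f"sqrt({x})"]


def bind(
    m: Tuple[float, List[str]],
    f: Callable[[float], Tuple[float, List[str]]],
) -> Tuple[float, List[str]]:
    value, log = m
    new_value, new_entries = f(value)
    return new_value, log + new_entries


result, log = bind(bind((16.0, []), sqrt_w), sqrt_w)
print(result, log)
```

Both styles produce the same value and the same log; the second simply keeps the log out of each function's argument list, which is what makes composing the two calls less painful.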
June 28, 2015
by John Cook
· 11,964 Views · 1 Like
Spark Grows Up and Scales Out
Written by Craig Wentworth. To understand the furor that’s greeted recent vendor announcements around open source analytics computing engine Spark, and some commentary seemingly setting up a Spark versus Hadoop battle, it’s worth taking a moment to recap on what each actually is (and is not). As I covered in last year’s MWD report on Hadoop and its family of tools, when people talk about Apache Hadoop they’re often referring to a whole framework of tools designed to facilitate distributed parallel processing of large datasets. That processing was traditionally confined to MapReduce batch jobs in Hadoop’s early days, though Hadoop 2 brought the YARN resource scheduler and opened up Hadoop to streaming, real-time querying and a wider array of analytical programming applications (beyond MapReduce). Spark has been designed to run on top of Hadoop’s Distributed File System (amongst other data platforms) as an alternative to MapReduce – tuned for real-time streaming data processing and fast interactive queries, and with multi-genre analytics applicability (machine learning, time series, graph, SQL, streaming out-of-the-box). It gets that speed advantage by caching in-memory (rather than writing interim results to disk, as MapReduce does), but with that approach comes a need for higher-spec physical machines (compared with MapReduce’s tolerance for commodity hardware). So, Spark isn’t about to replace Hadoop -- but it may well supplant MapReduce (especially in growing real-time use cases). Those “Spark vs Hadoop” headlines are about as meaningful as one proclaiming “mushrooms vs pizza." Yes, mushroom might be a more suitable topping than, say, pepperoni (especially in a vegetarian use case), but it’ll still be deployed on the same dough and tomato sauce pizza platform. Nobody’s about to suggest the mushroom should go it alone! 
But what’s behind the headlines and the hype is a story of enterprise adoption – or at least vendors anticipating that adoption and investing in ‘the weaponization of Spark’ as it faces the more exacting standards of security, scaling performance, consistency, etc. which come with mainstream enterprise deployment. Big names like IBM, Databricks (the company formed by the originators of Spark), and MapR made commitments in and around the Spark Summit earlier this month. MapR has announced three new Quick Start Solutions for its Hadoop distribution to help customers get started with Spark in real-time security log analytics, genome sequencing, and time series analytics; Databricks’ cloud-hosted Spark platform (formerly known as Databricks Cloud) has become generally available; and IBM announced a raft of measures designed to give Spark a significant shot in the arm – it’s open sourcing its SystemML technology to bolster Spark’s machine learning capabilities, integrating Spark into its own analytics platforms, investing in Spark training and education, committing 3,500 of its researchers and developers to work on Spark-related projects, and offering Spark as a service on its Bluemix developer cloud. Given the overlap with Databricks’ business model (of offering development, certification, and support for Spark), IBM’s intentions are likely to tread on some toes before long – but for now, at least, both companies are content to focus on the combined push benefiting the Spark community and its enterprise aspirations overall (though clearly IBM’s betting on all this investment buying it some influence over where Spark goes next). It’s worth bearing in mind that not all its supporters champion Spark wholesale and all the interested parties tend to be interested in particular bits of Spark (as wide-ranging as it is) because of overlaps with their own preferred toolsets. 
For instance, although Spark supports many analytics genres, Cloudera focuses on its machine learning capabilities (as it has its own SQL-on-Hadoop tool in Impala), and MapR and Hortonworks also promote Drill and Hive as their favoured source of SQL-on-Hadoop. IBM’s support is focused on Spark’s machine learning and in-memory capabilities (hence the SystemML open sourcing news). In the face of such strong vendor preferences, how long before some of Spark’s current features fall away (or at least start to show the effects of being starved of as much care and feeding as is bestowed upon vendors’ favourite Spark components)? The Spark community is at much the same place the Hadoop one was at a while back – it’s showing great promise and suitability in key growth workloads (in Spark’s case, such as real-time IoT applications). However, the product as it stands is too immature for many enterprise tastes. Cue enterprise software vendors stepping up to help grow Spark up fast. Their challenge though is to smooth out the edges without smothering what made it so interesting in the first place.
June 28, 2015
by Angela Ashenden
· 2,348 Views