Data Engineering Resources

The Latest Data Engineering Topics

Azure Service Bus – As I Understand It: Part I (Overview)

Recently we started working on adding support for Azure Service Bus to Cloud Portam. Before this, I had no experience with the service. It has been around for quite some time and I always wanted to try it out, but one thing or another (oh, my stupid excuses :)!) prevented me from doing so. I learned a lot (and I am still learning) about this service while adding support for it to Cloud Portam, and this blog post describes what I learned. Please note that at the time of writing I have about a week of experience with this service in all, so it is quite possible that I am wrong about certain things. If that's the case, please let me know and I will fix them ASAP. Now that the tone is set, let's start!

Azure Service Bus Offering

The way I understand it, "Azure Service Bus" is a cloud-based messaging service that lets you connect virtually anything, be it applications, services or devices. The beauty of Service Bus is that these things need not be in the cloud. They can run anywhere, even inside firewalled networks!

Another thing I learned is that "Azure Service Bus" is essentially an umbrella service. At the time of writing, four distinct services are collectively offered under the "Service Bus" umbrella: Queues, Topics & Subscriptions, Relays and Notification Hubs. Each serves a different purpose, yet the common theme is that all of them provide rich messaging infrastructure. As an analogy, if you have used the Azure Storage service you may already know that it offers four distinct services: Blobs, Files, Queues and Tables. It is the same with Service Bus.

Queues

Queues are the simplest of these services and are comparable to the Azure Storage Queue service in the sense that they provide a unidirectional messaging infrastructure: a publisher publishes a message and a receiver receives it.
There can be many receivers ready to receive messages; however, a given message can be received by only one of them. No two receivers can receive the same message simultaneously. For an in-depth comparison of Service Bus Queues and Storage Queues, please see this link: https://msdn.microsoft.com/en-us/library/azure/hh767287.aspx.

Topics

Topics are like Queues in the sense that they also provide a unidirectional messaging infrastructure where a publisher publishes a message and receivers receive it. The key difference is that the same message can be received by multiple receivers (subscribers). Each subscriber can optionally specify filter criteria so that it receives only the messages matching those criteria.

To understand the difference between the two, let's consider an example. Say you run an e-commerce site, and on successful completion of an order you have two tasks: 1) send the customer an email about the order and 2) notify the warehouse. If you were using Queues, you would either create two queues, putting the email notification message in one and the warehouse notification message in the other, or build a workflow: you would send an order confirmation message to a queue, a receiver would take that message, send out an email and then put a warehouse notification message in the same queue (or another queue), and another receiver would then receive that message and notify the warehouse.

If you were using Topics, however, things would be much simpler logistically. You would publish just one message (the order confirmation) and have two subscribers: one responsible for sending the email confirmation and the other responsible for notifying the warehouse.

Relays

Unlike Queues and Topics, which provide a unidirectional flow of messages, a Relay provides a bidirectional flow. Using Relays, two disparate applications, services or devices can exchange messages.
Another key difference is that a Relay doesn't store messages the way Queues and Topics do. It just passes messages from the source to the destination.

Event Hubs

The Event Hubs service is meant for ingesting events and telemetry data into the cloud at massive scale (millions of events per second). Event Hubs are more important than ever given the push toward connected devices (the Internet of Things).

Azure Service Bus Tiers

Azure Service Bus is offered in two tiers (or SKUs, if you like): Basic and Standard. The tiers differ in the functionality they offer and in pricing. For example, Topics, Relays and Notification Hubs are offered only in the Standard tier, and even for Queues only a limited set of functionality is exposed in the Basic tier. For a list of the features offered in each tier, please see this link: http://azure.microsoft.com/en-in/pricing/details/service-bus/.

Summary

That's it for this post. In the next posts in this series, I will share what I learn about Queues and the other Service Bus services, so stay tuned! Again, if you think I have provided some incorrect information, please let me know and I will fix it ASAP.
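To make the Queue versus Topic distinction concrete, here is a minimal in-memory sketch in Java. It is not the Azure Service Bus SDK: the class names (OrderMessage, SimpleQueue, SimpleTopic) are hypothetical, and subscription filters are modelled as plain Java predicates rather than the filter rules Service Bus itself uses. It only illustrates the delivery semantics: a queued message goes to exactly one receiver, while a topic delivers each message to every subscription whose filter matches.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Illustrative in-memory model of Service Bus delivery semantics.
// NOT the Azure SDK: every class name here is hypothetical.
final class OrderMessage {
    final String orderId;
    final String country; // hypothetical property that a subscription filter can match on

    OrderMessage(String orderId, String country) {
        this.orderId = orderId;
        this.country = country;
    }
}

// Queue semantics: many receivers may call receive(), but each message
// is removed on delivery, so only one receiver ever gets it.
final class SimpleQueue {
    private final Deque<OrderMessage> messages = new ArrayDeque<>();

    void send(OrderMessage message) {
        messages.addLast(message);
    }

    // Returns the oldest message, or null when the queue is empty.
    OrderMessage receive() {
        return messages.pollFirst();
    }
}

// Topic semantics: publish() delivers the message to every subscription
// whose filter matches; each subscription then behaves like its own queue.
final class SimpleTopic {
    private final Map<String, Predicate<OrderMessage>> filters = new LinkedHashMap<>();
    private final Map<String, SimpleQueue> subscriptions = new LinkedHashMap<>();

    void subscribe(String name, Predicate<OrderMessage> filter) {
        filters.put(name, filter);
        subscriptions.put(name, new SimpleQueue());
    }

    void publish(OrderMessage message) {
        for (Map.Entry<String, Predicate<OrderMessage>> entry : filters.entrySet()) {
            if (entry.getValue().test(message)) {
                subscriptions.get(entry.getKey()).send(message);
            }
        }
    }

    SimpleQueue subscription(String name) {
        return subscriptions.get(name);
    }
}
```

In the order example, subscribing "email" and "warehouse" with filters that accept every message means one published order confirmation lands in both subscriptions, whereas with a single SimpleQueue only whichever receiver calls receive() first would see it.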

June 30, 2015

by Gaurav Mantri


Using Parameterized Queries to Avoid SQL Injection

Introduction

To explain why you should use a parameterized query instead of a concatenated inline query, we first need to understand SQL injection. What does SQL injection mean? It means an end user sends crafted input that changes the query the application runs, so that it performs an unintended CRUD operation or forces a wrong query to execute against the database. This can be harmful for the database: "harmful" here means data loss, or data being returned for invalid input. To see this in practice, follow the steps below.

Step 1: Create a table named 'user_login' in any database.

```sql
create table user_login
(
    userid varchar(20),
    pwd varchar(20)
)
```

Now save some user credentials into the table for login purposes and select from the table.

```sql
insert into user_login values ('rahul', 'bansal@123')
insert into user_login values ('bansal', 'rahul@123')
```

Step 2: Create a website named 'Website1'. Now I will create a login page named 'Default.aspx' that validates the credentials against the 'user_login' table and, if the user is valid, redirects to the next page, 'Home.aspx'. Add 2 textboxes for the user id and password respectively, and a button for login. Add 2 namespaces in the .cs file of 'Default.aspx':

```csharp
using System.Data.SqlClient;
using System.Data;
```

Now add the following code to the click event of the login button to validate the credentials against the database.

```csharp
protected void btn_Login_Click(object sender, EventArgs e)
{
    string constr = System.Configuration.ConfigurationManager.ConnectionStrings["constr"].ConnectionString;
    using (SqlConnection con = new SqlConnection(constr))
    {
        // VULNERABLE: user input is concatenated directly into the SQL text.
        string sql = "select count(userid) from user_login where userid='" + txtUserId.Text + "' and pwd='" + txtPwd.Text + "'";
        SqlCommand cmd = new SqlCommand(sql, con);
        con.Open();
        object res = cmd.ExecuteScalar();
        if (Convert.ToInt32(res) > 0)
        {
            Response.Redirect("Home.aspx");
        }
        else
        {
            Response.Write("Invalid credentials");
        }
    }
}
```

Add a new page named 'Home.aspx', where any valid user will get a welcome message.

Step 3: Now run the 'Default' page and log in with valid credentials.
It will redirect a valid user to the next page, 'Home.aspx'.

Note: I have deliberately not set the TextMode="Password" property on the password textbox, so that the password is visible, and I have not added any input validation, to keep the example simple.

Problem: Now I will perform SQL injection with invalid credentials. The query will still execute successfully, and I will be redirected to 'Home.aspx' as if I were a valid user. Enter the following string in both textboxes:

' or '1'='1

Now run the page and log in with the above string in both textboxes. It will redirect to the next page, 'Home.aspx', as if the user were valid. See what happened? This is called SQL injection in the hacking world.

Reason: This happened because of the string itself. After filling it in both textboxes, our SQL query became the following:

```sql
select count(userid) from user_login where userid='' or '1'='1' and pwd='' or '1'='1'
```

This returns the count of all user ids, which is 2, because there are 2 users in the 'user_login' table. It can be exploited in other ways too; for example, fill only the user id textbox with the following string and you will reach the next page as a valid user:

' or 1=1 --

This also returns a count of 2, because the SQL query becomes the following:

```sql
select count(userid) from user_login where userid='' or 1=1 --' and pwd=''
```

Note: -- starts a comment in SQL, so everything after it on the line (here, the password check) is ignored. It can be even more harmful when an attacker injects a script that drops all the tables in the database, or the whole database.

Solution: To resolve this issue you have to do 2 things:
1. Always use parameterized queries.
2. Validate input on both the client and the server side.

Even if your input validation fails, a parameterized query will not execute the injected SQL. Let's see the example.
```csharp
protected void btn_Login_Click(object sender, EventArgs e)
{
    string constr = System.Configuration.ConfigurationManager.ConnectionStrings["constr"].ConnectionString;
    using (SqlConnection con = new SqlConnection(constr))
    {
        // The SQL text contains only placeholders; user input is passed separately as parameter values.
        string sql = "select count(userid) from user_login where userid=@userid and pwd=@pwd";
        SqlCommand cmd = new SqlCommand(sql, con);
        SqlParameter[] param = new SqlParameter[2];
        param[0] = new SqlParameter("@userid", txtUserId.Text);
        param[1] = new SqlParameter("@pwd", txtPwd.Text);
        cmd.Parameters.Add(param[0]);
        cmd.Parameters.Add(param[1]);
        con.Open();
        object res = cmd.ExecuteScalar();
        if (Convert.ToInt32(res) > 0)
        {
            Response.Redirect("Home.aspx");
        }
        else
        {
            Response.Write("Invalid credentials");
        }
    }
}
```

Now run the page and try to log in with the same SQL scripts as before, first with ' or '1'='1 and then with ' or 1=1 --. As you can see, the parameterized query didn't execute the SQL script. But why?

Reason: A parameterized query is not vulnerable because the database looks for a user id or password that literally matches the entire string. In other words, "the SQL engine checks each parameter to ensure that it is correct for its column, and parameters are treated literally, not as part of the SQL to be executed".

Conclusion: Always use parameterized queries, and validate input on both the client and the server side.
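The same contrast can be sketched for readers on the JVM with plain JDBC. This is a minimal illustration, not part of the original ASP.NET sample: the LoginQueries class and its method names are hypothetical, and isValidLogin assumes an already-open java.sql.Connection to a database containing the user_login table from Step 1. concatenatedQuery rebuilds the vulnerable string so you can see exactly what the injected input produces, while isValidLogin uses a PreparedStatement with ? placeholders, JDBC's counterpart to the @userid/@pwd parameters above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper class illustrating the two approaches from the article.
final class LoginQueries {

    // VULNERABLE: mirrors the concatenated query from the first handler.
    // Only used here to show what SQL text the injected input produces.
    static String concatenatedQuery(String userId, String pwd) {
        return "select count(userid) from user_login where userid='" + userId
                + "' and pwd='" + pwd + "'";
    }

    // Parameterized version: the SQL text never changes with user input.
    static final String PARAMETERIZED_SQL =
            "select count(userid) from user_login where userid = ? and pwd = ?";

    static boolean isValidLogin(Connection con, String userId, String pwd) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(PARAMETERIZED_SQL)) {
            ps.setString(1, userId); // bound as a literal value, never parsed as SQL
            ps.setString(2, pwd);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() && rs.getInt(1) > 0;
            }
        }
    }
}
```

Calling concatenatedQuery("' or '1'='1", "' or '1'='1") yields exactly the injected query shown in the Reason section, while isValidLogin would simply look for a user whose id is literally ' or '1'='1 and find none.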

June 30, 2015

by Rahul Bansal


The Secret to More Efficient Data Science with Neo4j and R [OSCON Preview]

Written by Nicole White

It's a sad but true fact: most data scientists spend 50-80% of their time cleaning and munging data and only a fraction of their time actually building predictive models. This is especially true in a traditional stack, where most of the data munging consists of writing line upon line of some flavor of SQL, leaving little time for model-building code in statistical programming languages such as R. These long, cryptic SQL queries not only slow development but also prevent useful collaboration on analytics projects, as contributors struggle to understand each other's SQL code.

For example, in graduate school I was on a project team where we used Oracle to store Twitter data. The queries my classmates and I were writing were unmaintainable and impossible to understand unless the author was sitting next to you. No one worked on the same queries together because they were so unwieldy. This not only hindered our collaboration but also slowed our progress on the project. Had we been using an appropriate data store (like a graph database), we would have spent significantly less time pulling our hair out over the queries.

Why Today's Data Is Different

This data-munging problem has persisted in the data science field because data is becoming increasingly social and highly connected. Forcing this kind of interconnected data into an inherently tabular SQL database, where relationships are only abstract, leads to complicated schemas and overly complex queries. Yet several NoSQL solutions, specifically in the graph database space, exist to store today's highly connected data, that is, data where relationships matter.

A lot of data analysis today is performed to better understand people's behavior or needs, with questions such as:

How likely is this visitor to click on advertisement X?
Which products should I recommend to this user?
How are User A and User B connected?

People, as we know, are inherently social, so most of these questions can be answered by understanding the connections between people: User A is similar to User B, and we already know that User B likes this product, so let's recommend this product to User A.

The Good News: Data-Munging No More

Data science doesn't have to be 80% data munging. With the appropriate technology stack, a data scientist's development process is seamless and short. It's time to spend less time writing queries and more time building models, by combining the flexibility of an open-source NoSQL graph database with the maturity and breadth of R, an open-source statistical programming language. The combination of Neo4j's ability to store highly connected, possibly unstructured data and R's functional, ad-hoc nature creates an ideal data analysis environment. You don't have to spend an hour writing CREATE TABLE statements. You don't have to spend all day on Stack Overflow figuring out how to traverse a tree in SQL. Just write Cypher (Neo4j's query language) and go.

Learn More at OSCON 2015

At my upcoming OSCON session we will walk through a project in which we analyze #OSCON Twitter data in a reproducible, low-effort workflow without writing a single line of SQL. For this highly connected dataset we will use Neo4j, an open-source graph database, to store and query the data, highlighting the advantages of storing such data in a graph versus a relational schema. Finally, we will cover how to connect to Neo4j from an R environment to perform common data science tasks such as analysis, prediction and visualization.
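The recommendation reasoning above (User A is similar to User B, User B likes a product, so recommend it to User A) is a two-hop graph traversal. The Java sketch below models it with in-memory adjacency maps purely for illustration; it is not Neo4j's API, and the SimpleSocialGraph class and the SIMILAR_TO/LIKES relationship names are hypothetical. In Neo4j the same traversal would be written declaratively in Cypher, roughly as MATCH (a:User {name: 'A'})-[:SIMILAR_TO]-(b:User)-[:LIKES]->(p:Product) WHERE NOT (a)-[:LIKES]->(p) RETURN DISTINCT p, assuming such labels and relationship types exist in the graph.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for a graph: nodes are plain strings and the two
// relationship types (SIMILAR_TO, LIKES) are adjacency maps. Not Neo4j's API.
final class SimpleSocialGraph {
    private final Map<String, Set<String>> similarTo = new HashMap<>(); // user -> similar users
    private final Map<String, Set<String>> likes = new HashMap<>();     // user -> liked products

    // SIMILAR_TO is treated as undirected in this sketch.
    void addSimilar(String userA, String userB) {
        similarTo.computeIfAbsent(userA, k -> new HashSet<>()).add(userB);
        similarTo.computeIfAbsent(userB, k -> new HashSet<>()).add(userA);
    }

    void addLike(String user, String product) {
        likes.computeIfAbsent(user, k -> new HashSet<>()).add(product);
    }

    // Two-hop traversal: user -SIMILAR_TO- other -LIKES-> product,
    // skipping products the user already likes. Sorted for stable output.
    SortedSet<String> recommend(String user) {
        Set<String> alreadyLiked = likes.getOrDefault(user, Collections.emptySet());
        SortedSet<String> recommendations = new TreeSet<>();
        for (String other : similarTo.getOrDefault(user, Collections.emptySet())) {
            for (String product : likes.getOrDefault(other, Collections.emptySet())) {
                if (!alreadyLiked.contains(product)) {
                    recommendations.add(product);
                }
            }
        }
        return recommendations;
    }
}
```

The point of the comparison is the one the article makes: the relationships are first-class here, so the query is just "follow two edges", with no join tables involved.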

June 30, 2015

by Mark Needham


The Philosophy of the CUBA Platform

A huge amount has happened recently. Following the official launch of CUBA on the 1st of June, we have rolled out a new release, published our first article on a few Java sites and presented the platform at the Devoxx UK conference in London. But before the rush continues, it is an apt time to articulate the philosophy behind CUBA.

The first words associated with enterprise software development will probably be: slow, routine, complex and convoluted. Nothing exciting at all! A common approach to combat these challenges is raising the level of abstraction, so that developers operate with interfaces and tools that encapsulate internal mechanisms. This lets them focus on high-level business requirements without reinventing common processes for every project. Such a concept is typically implemented in frameworks or platforms.

The previous CUBA article explained why CUBA is more than just a bunch of well-known open-source frameworks comprehensively integrated together. In brief, it brings a declarative UI with data-aware visual components, out-of-the-box features ranging from a sophisticated security model to BPM, and awesome development tools to complement your chosen IDE. You can easily find more details on our Learn page, so instead of listing all of them I'll try to "raise the abstraction level" and explain the fundamental principles of CUBA.

Practical

The platform is a living organism, and its evolution is mostly driven by specific requests from developers. Of course, we constantly keep track of emerging technologies, but we are rather conservative and adopt them only when we see that they can bring tangible value to enterprise software development. As a result, CUBA is extremely practical: every part of it has been created to solve a real problem.

Integral

Apart from the obvious material features, the visual development environment provided by CUBA Studio greatly reduces the learning curve for newcomers and juniors.
It is even more important that the platform brings a unified structure to your applications. When you open a CUBA-based project, you will always know where to find a screen or a component inside it, where the business logic is located and how it is invoked. The ability to quickly understand and change code written by other developers is a benefit for continual enterprise development that cannot be overstated. An enterprise application's lifecycle may last tens of years, and your solution must constantly evolve with the business environment, regardless of any changes in your team. For this reason, the flexibility to rotate team members, or to scale the team up or down when needed, is a major concern for companies, especially those that outsource development or have distributed teams.

Open

One of the key principles of CUBA is openness. This starts with the full platform source code, which you have at hand when you work on a CUBA-based project. In addition, the platform is open in the sense that you can change almost any part of it to suit your needs. You don't need to fork it to customize some parts of the platform; you can extend and modify the platform functionality right in your project. To achieve this, we usually follow the open inheritance pattern, providing access to the platform internals. We understand that this can cause issues when the project is upgraded to a newer platform version. However, in our experience this is a far lesser evil than maintaining a fork or accepting the inability to adapt the tool for a particular task. We could instead provide a number of specific extension points, but then we would have to anticipate how application developers will use the platform, and such predictions always fail sooner or later. So instead we have made the whole platform extension-friendly: you can inherit and override platform Java code, including the object model, XML screen layouts and configuration parameters.
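As a rough illustration of the open inheritance idea, here is a minimal, hypothetical Java sketch. PlatformPriceCalculator and ProjectPriceCalculator are invented names, not CUBA classes: the "platform" class exposes an overridable protected hook instead of private or final members, and the project subclass overrides only that hook, inheriting everything else. How CUBA actually registers a project class in place of a platform class is not modelled here.

```java
// Hypothetical "platform" class. Instead of private/final members it exposes a
// protected hook, so a project can change behaviour by subclassing, not forking.
class PlatformPriceCalculator {

    // Extension point by convention: the platform default applies no discount.
    protected double discountRate(String customerType) {
        return 0.0;
    }

    public double finalPrice(double basePrice, String customerType) {
        return basePrice * (1.0 - discountRate(customerType));
    }
}

// Hypothetical project-level extension: overrides only the hook it needs and
// falls back to the platform behaviour for everything else.
class ProjectPriceCalculator extends PlatformPriceCalculator {

    @Override
    protected double discountRate(String customerType) {
        if ("wholesale".equals(customerType)) {
            return 0.10;
        }
        return super.discountRate(customerType);
    }
}
```

The upgrade trade-off mentioned above shows up even in this toy: if a later platform version renamed or changed the signature of discountRate, the project subclass would have to be updated, which is the cost accepted in exchange for not maintaining a fork.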
The same openness carries over, transitively, to CUBA-based projects: if you follow a few simple conventions, your application becomes open for extension, allowing you to adapt a single product for many customers.

Symbiotic

CUBA is not positioned as a "thing-in-itself". When a suitable and well-supported instrument already exists and we can integrate with it without sacrificing platform usability, we integrate with it. Examples of such integrations are the full-text search and BPM engines, JavaScript charts and the Google Maps API. At the same time, we had to implement our own report generator from scratch, because we could not find a suitable tool (technology- and license-wise).

CUBA Studio follows this principle too. It is a standalone web application and it doesn't replace your preferred IDE. You can use Studio and the IDE in parallel, switching between them to accomplish different tasks. The WYSIWYG approach implemented in Studio is great for designing the data model and screen layouts, while the classic Java IDE is best for writing code. You can change any part of your project right in the IDE, even things created by Studio. When you return to Studio, it will instantly parse all changes, allowing you to keep developing visually. As you can see, instead of competing with the power of Java IDEs, we follow a symbiotic approach. Moreover, to raise coding efficiency, we've developed plugins for the most popular IDEs.

When we integrate with a third-party framework, we always wrap it in a higher-level API. This makes it possible to replace the underlying implementation if needed, and makes the whole platform API more stable in the long term and less dependent on constant changes in the integrated third-party frameworks. However, we don't restrict direct use of the underlying frameworks and libraries. This makes sense when the CUBA API does not fit a particular use case.
For example, if you can't do something via the Generic UI, you can unwrap a visual component and get direct access to Vaadin (or Swing). The same applies to data access: if some operation is slow or not supported by the ORM, just write SQL and run it via JDBC or MyBatis. Of course, such "hacks" lead to more complex and less portable application code, but they are typically very rare compared to the use of the standard platform API. Knowing that this flexibility is there, and having a sense of "Yes, you can", gives developers a lot of confidence.

Wide use area

We recommend using CUBA if you need to create an application with anything from 5-10 screens upward, as long as they consist of standard components like fields, forms and tables. The benefit of using CUBA grows with the complexity of your application, independent of the domain. We have delivered complex projects in finance, manufacturing, logistics and other areas. A non-obvious but popular use case, for example, is using CUBA as the backend and admin UI while creating the end-user interface with another, lighter or more customizable web technology.

I hope you will see some use cases for the platform yourself. In the next articles we'll focus on "what's under the hood", giving a detailed overview of the different CUBA parts.
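The wrap-then-unwrap approach described in the Symbiotic section can be sketched in a few lines of Java. Everything here is hypothetical and only illustrates the pattern: ThirdPartyGrid stands in for an integrated framework component, DataTable is the higher-level API that application code depends on, and unwrap(Class) is the escape hatch that hands back the underlying object when the wrapper doesn't expose a feature. It is not CUBA's actual component API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical third-party component the platform integrates with.
final class ThirdPartyGrid {
    private final List<String> rows = new ArrayList<>();
    boolean zebraStriping; // vendor-specific feature the wrapper API does not expose

    void appendRow(String row) { rows.add(row); }
    int rowCount() { return rows.size(); }
    void setZebraStriping(boolean enabled) { zebraStriping = enabled; }
}

// Hypothetical higher-level API: application code depends only on this interface,
// so the underlying implementation can be replaced without touching callers.
interface DataTable {
    void addRow(String row);
    int size();

    // Escape hatch: expose the wrapped implementation when the API doesn't fit.
    <T> T unwrap(Class<T> internalType);
}

final class GridDataTable implements DataTable {
    private final ThirdPartyGrid grid = new ThirdPartyGrid();

    @Override public void addRow(String row) { grid.appendRow(row); }
    @Override public int size() { return grid.rowCount(); }

    @Override
    public <T> T unwrap(Class<T> internalType) {
        if (internalType.isInstance(grid)) {
            return internalType.cast(grid);
        }
        throw new IllegalArgumentException("Cannot unwrap to " + internalType.getName());
    }
}
```

Code that calls unwrap(ThirdPartyGrid.class) is tied to that specific implementation, which is exactly the portability cost the article attributes to such "hacks"; everything written against DataTable alone survives a swap of the underlying framework.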

June 29, 2015

by Aleksey Stukalov


Calix Announces Next Generation PON Solutions that Redefine the Gigabit Experience

As device-enabled subscribers demand more, Calix introduces multiple wavelength NG-PON2 solutions primed for next generation applications

ANAHEIM, CA - June 29, 2015 - Calix, Inc. (NYSE: CALX), the world leader in gigabit fiber deployments, today announced new cards for its E-Series portfolio that introduce both increased systems capacity and ITU/FSAN NG-PON2 support. By adding 10 gigabit per second (10 Gbps) time and wavelength division multiplexed (TWDM) PON NG-PON2 with both fixed and tunable wavelengths, Calix is paving the way for service providers to leverage next generation fiber solutions that redefine the broadband experience. These solutions, when combined with Calix Compass software applications and GigaCenter platforms, extend far beyond gigabit "speeds and feeds" to encompass superior Wi-Fi performance and compelling cloud-based applications, and lay the foundation for a superior gigabit and multi-gigabit experience for residential and business subscribers.

Service providers globally are seeing an explosion of cloud-connected devices. In North America alone, the average person is projected to have over 11 of these devices by 2019. As these devices proliferate and are used to stream and share rich multimedia content, they will place enormous pressure on the access infrastructure. By 2019, 80 percent of all internet traffic will be IP video, while global cloud traffic will nearly quadruple over the same period. (Statistics from Cisco VNI and Cisco GCI)

Service providers need solutions that can meet these emerging demands today while seamlessly adding new technologies for even higher scale and capacity in the future. Extending from the award-winning GigaCenter at the subscriber premises all the way to the cloud-based Compass software, Calix next generation fiber solutions arm service providers with technologies to address challenges and opportunities in subscriber support, analytics, and service enhancement.
Potential issues impacting the subscriber broadband experience, such as device bandwidth contention in the ultra HD video enabled home, management of the device-rich smart home, and sub-par Wi-Fi performance, can be remedied before they manifest themselves to subscribers.

Calix has a long history of serving fiber access customers with an architectural philosophy of a unified access infrastructure, supporting both business and residential services. Calix service provider customers have found the financial benefits of this architecture to be compelling, resulting in industry-leading efficiency in installed equipment costs and on-going operational savings. The Calix NG-PON2 strategy extends this leadership, expanding the capacity of fiber networks and allowing service providers to keep pace with the needs of business and residential customers alike over a common infrastructure.

"From being the vendor with the first commercial deployments of GPON over a decade ago, to our unique auto-detect technology and pay-as-you-grow modular fiber architecture, Calix has consistently led the way in fiber access innovation," said Michel Langlois, Calix senior vice president of systems products. "This leadership in innovation is why over 1000 service providers across the globe rely on Calix for fiber access solutions, including nearly 100 delivering a gigabit service experience to their residential subscribers. Next generation PON provides fertile ground for a new wave of innovations, and Calix is again leading the way with significant contributions to development of the NG-PON2 standard, including key submissions that will reduce deployment costs and technical complexity and assure 2.4 GPON coexistence. Demonstrations of both NG-PON2 with tunable TWDM wavelengths and fixed wavelength 10G PON will take place this fall."
Calix next generation PON solutions will extend across the entire access infrastructure, from the device-enabled subscriber premises to the data center or central office:

Subscriber Premises: The industry leading Calix premises portfolio will be expanded to include NG-PON2 technology, including both fixed and tunable optics, across the full range of subscriber applications, from business services to MDU and SFU residential applications with Carrier Class Wi-Fi. Compass software-as-a-service applications will also be expanded to support these technologies and bring a true multi-gigabit experience to subscribers.

Access Infrastructure: The award winning E7 portfolio will be enhanced by NG-PON2 cards that support both 10 Gbps fixed and TWDM wavelengths with pluggable optics. Optimized for areas of high bandwidth demand and congestion, these cards will support the delivery of multiple wavelengths of symmetrical 10 Gbps services.

Data Center / Central Office: The E7 portfolio will be augmented by new high capacity cards that tap into new levels of performance in both system switching and uplink capacity.

"Calix has a long history of new innovations in the fiber access industry, making key contributions to fiber access standards and making the fiber access business model work for service providers," said Teresa Mastrangelo, principal and founder of Broadbandtrends LLC. "Now as we move into the emerging era of next generation PON, it's no surprise to see Calix announcing a comprehensive portfolio addressing a range of applications. It is clear that NG-PON2 is going to be the PON technology that enables the multi-gigabit experience of the future, and Calix looks to be ready as this market emerges with a robust solutions oriented strategy."

The new Calix innovations, including its new next generation PON and recently announced G.fast solutions for MDU applications, will be highlighted at the Calix booth #413 this week at the FTTH Conference & Expo in Anaheim, California.

June 29, 2015

by Fran Cator


Persistence and DAO Testing Made Simple (with Exparity-Stub and Hamcrest-Bean)

Persistence of model objects is part of many Java projects, and a part that deserves, and often gets, high test coverage as one of the key integration points between layers in the code. However, I've often felt that the testing paradigms for this can be cumbersome, often involving a large amount of setup with an equivalent amount of validation. This can be tedious both to create and to maintain.

As a solution, I've been testing persistence with a different pattern: by combining the exparity-stub and hamcrest-bean libraries you can thoroughly test model persistence in a few lines of test code, as in the snippet below:

```java
User user = aRandomInstanceOf(User.class);
User saved = dao.save(user);
assertThat(dao.getUserById(saved.getId()), theSameBeanAs(saved));
```

The test snippet above is small, but in those few lines it thoroughly tests that all fields in a graph can be persisted and retrieved without loss, that any JPA or other mapping is valid, and that your queries are valid.

For a complete example we'll work through testing a simple DAO for storing and retrieving User objects, using the in-memory H2 database for simplicity. The same approach will work for any persistence mechanism. Before we get started, let's briefly outline what the libraries are and what they do.

The Exparity-Stub Library

The exparity-stub library provides a set of static methods for creating stubs of model objects, object graphs, collections, types and primitive types. For our example we'll be creating random stubs, because we want to completely fill the graph with junk data and check that it can be written down. exparity-stub offers two approaches to this, the RandomBuilder and the BeanBuilder. The RandomBuilder provides a terser notation for creating random objects with less code.
For example:

```java
User user = RandomBuilder.aRandomInstanceOf(User.class);
List<User> users = RandomBuilder.aRandomListOf(User.class);
String anyString = RandomBuilder.aRandomString();
```

The BeanBuilder, by contrast, provides a fluent interface with finer control for building individual objects and graphs, for example:

```java
User user = BeanBuilder.aRandomInstanceOf(User.class)
        .excludeProperty("Id")
        .build();
```

For this example I'm going to use the BeanBuilder so that I can exclude the User.Id property from being populated by the random builder.

The Hamcrest-Bean Library

The hamcrest-bean library is an extension to the Java Hamcrest library. It provides a set of matchers specifically for testing Java objects and object graphs, and performs deep inspection of those objects. It supports exclusions and overrides to allow fine control, if required, over how matching of any property, path or type is handled, for example:

```java
// Illustrative only: assumes a User(firstName, surname) constructor,
// which the sample model below does not declare.
User expected = new User("Jane", "Doe");
assertThat(new User("John", "Doe"), BeanMatchers.theSameAs(expected).excludeProperty("FirstName"));
```

A Sample Project

The sample project I'll work through persists a simple User object with a child list of UserComment objects. This simple graph will be persisted to an H2 database, with Hibernate handling the object-relational mapping (ORM) and Java Persistence API (JPA) annotations used to mark up the model.

The Model

Below are the two model classes, first the User class.
```java
package org.exparity.hamcrest.bean.sample.dao;

import java.util.*;
import javax.persistence.*;

@Entity
@Table
public class User {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;
    private Date createTs;
    private String username, firstName, surname;

    // Comments are saved and loaded together with their owning user.
    @OneToMany(cascade = CascadeType.ALL, fetch = FetchType.EAGER)
    private List<UserComment> comments = new ArrayList<>();

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public Date getCreateTs() { return createTs; }
    public void setCreateTs(Date createTs) { this.createTs = createTs; }
    public String getUsername() { return username; }
    public void setUsername(String username) { this.username = username; }
    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public String getSurname() { return surname; }
    public void setSurname(String surname) { this.surname = surname; }
    public List<UserComment> getComments() { return comments; }
    public void setComments(List<UserComment> comments) { this.comments = comments; }
}
```

Followed by the UserComment class.
package org.exparity.hamcrest.bean.sample.dao;

import java.util.Date;
import javax.persistence.*;

@Table
@Entity
public class UserComment {

    private Long id;
    private Date timestamp;
    @Transient
    private String text;
    private String title;

    public Date getTimestamp() { return timestamp; }
    public void setTimestamp(Date timestamp) { this.timestamp = timestamp; }
    public String getText() { return text; }
    public void setText(String text) { this.text = text; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
}

The Data Access Object (DAO)

Next up, we write our DAO layer. I've excluded the UserDAO interface from this post, but it is available in the sample project on GitHub. The full, if somewhat crude, implementation of the UserDAO is below.

package org.exparity.hamcrest.bean.sample.dao;

import org.hibernate.boot.registry.StandardServiceRegistryBuilder;
import org.hibernate.cfg.Configuration;
import org.hibernate.*;

public class UserDAOHibernateImpl implements UserDAO {

    private final SessionFactory factory;

    public UserDAOHibernateImpl(final String resourceFile) {
        this.factory = new Configuration()
                .addAnnotatedClass(User.class)
                .addAnnotatedClass(UserComment.class)
                .buildSessionFactory(
                        new StandardServiceRegistryBuilder()
                                .loadProperties(resourceFile)
                                .build());
    }

    @Override
    public User save(final User user) {
        Session session = factory.getCurrentSession();
        Transaction txn = session.beginTransaction();
        try {
            session.save(user);
            txn.commit();
        } catch (final RuntimeException e) {
            txn.rollback();
            throw e; // rethrow so that a failed save is not silently ignored
        }
        return user;
    }

    @Override
    public User getUserById(Long userId) {
        Session session = factory.getCurrentSession();
        Transaction txn = session.beginTransaction();
        try {
            return (User) session.get(User.class, userId);
        } finally {
            txn.rollback(); // read-only work, so there is nothing to commit
        }
    }
}

Integration Test

And finally, onto our integration test. The hibernate.properties file configures an in-memory H2 database and creates the necessary tables when the DAO is instantiated.
hibernate.dialect=org.hibernate.dialect.H2Dialect
hibernate.connection.username=sa
hibernate.connection.password=
hibernate.connection.driver_class=org.h2.Driver
hibernate.connection.url=jdbc:h2:mem:test
hibernate.current_session_context_class=thread
hibernate.cache.provider_class=org.hibernate.cache.internal.NoCacheProvider
hibernate.show_sql=true
hibernate.hbm2ddl.auto=update

The integration test is below.

package org.exparity.hamcrest.bean.sample.dao;

import static org.exparity.hamcrest.BeanMatchers.theSameBeanAs;
import static org.exparity.stub.bean.BeanBuilder.aRandomInstanceOf;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.*;

import org.junit.Test;

public class UserDAOHibernateImplTest {

    @Test
    public void canSaveAUser() {
        User user = aRandomInstanceOf(User.class).excludeProperty("Id").build();
        UserDAOHibernateImpl dao = new UserDAOHibernateImpl("hibernate.properties");
        User saved = dao.save(user);
        User loaded = dao.getUserById(saved.getId());
        assertThat(loaded, not(sameInstance(user)));
        assertThat(loaded, theSameBeanAs(user));
    }
}

Let's break the test down step by step to see what each step does and why the test is put together this way.

1) Model Setup

User user = aRandomInstanceOf(User.class).excludeProperty("Id").build();

Create a random instance of the User class and its associated objects using exparity-stub. The instance is populated with random data, except for the Id property. I've excluded the Id property so that it stays null, which lets the test check that the id is generated by the database.

2) DAO Setup

UserDAOHibernateImpl dao = new UserDAOHibernateImpl("hibernate.properties");

Instantiate the DAO under test, passing in the property file to use for the test. These Hibernate properties configure an in-memory instance of H2 and create the schema automatically.
3) Exercise the DAO

User saved = dao.save(user);
User loaded = dao.getUserById(saved.getId());

Save the random instance of the model set up in step (1), then query the object back out again.

4) Verify the results

assertThat(loaded, not(sameInstance(user)));
assertThat(loaded, theSameBeanAs(user));

The first line verifies that the loaded User instance is not the same instance as the originally saved User. This prevents a false positive when the loaded instance is returned directly from a cache. The second line uses hamcrest-bean to perform a deep comparison of the loaded User instance against the original user instance.

Running the Test

The first run of the test fails with a Hibernate exception, because the @Id annotation has been left off UserComment.

org.hibernate.AnnotationException: No identifier specified for entity: org.exparity.hamcrest.bean.sample.dao.UserComment
    at org.hibernate.cfg.InheritanceState.determineDefaultAccessType(InheritanceState.java:277)
    at org.hibernate.cfg.InheritanceState.getElementsToProcess(InheritanceState.java:224)
    at org.hibernate.cfg.AnnotationBinder.bindClass(AnnotationBinder.java:775)
    at org.hibernate.cfg.Configuration$MetadataSourceQueue.processAnnotatedClassesQueue(Configuration.java:3845)
    at org.hibernate.cfg.Configuration$MetadataSourceQueue.processMetadata(Configuration.java:3799)
    at org.hibernate.cfg.Configuration.secondPassCompile(Configuration.java:1412)
    at org.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:1846)
    at org.exparity.hamcrest.bean.sample.dao.UserDAOHibernateImpl.<init>(UserDAOHibernateImpl.java:15)
    at org.exparity.hamcrest.bean.sample.dao.UserDAOHibernateImplTest.canSaveAUser(UserDAOHibernateImplTest.java:18)

After a fix to the UserComment class, we can run the test again.

@Table
@Entity
public class UserComment {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;
    private Date timestamp;
    @Transient
    private String text;
    private String title;
    ...
Running the test again produces another failure. The @Transient annotation on the UserComment.text property prevents the value from being persisted.

java.lang.AssertionError:
Expected: the same as
     but: User.Comments[0].Text is null instead of "mDAWDJXbheIHbbHLR1NNVJqAki49RvaVwQtKD38r79u0y3MTDD"
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
    at org.exparity.hamcrest.bean.sample.dao.UserDAOHibernateImplTest.canSaveAUser(UserDAOHibernateImplTest.java:19)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)

One more change to UserComment removes the @Transient annotation, and we can run the test again.

@Table
@Entity
public class UserComment {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;
    private Date timestamp;
    private String text;
    private String title;
    ...

This time the test passes.

Try It Out

To try hamcrest-bean and exparity-stub yourself, add the dependencies to your Maven pom or other dependency manager.

<dependency>
    <groupId>org.exparity</groupId>
    <artifactId>hamcrest-bean</artifactId>
    <version>1.0.10</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.exparity</groupId>
    <artifactId>exparity-stub</artifactId>
    <version>1.1.5</version>
    <scope>test</scope>
</dependency>
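The failure message above (User.Comments[0].Text is null instead of ...) shows the idea behind a deep bean comparison. The matcher walks both object graphs property by property and reports every path whose values differ. The Java code below is a rough, hypothetical sketch of that idea. It is not how hamcrest-bean is implemented, and the names PropertyDiff and Person are made up for this example. It assumes Java 9 or later for Set.of.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Set;

// Hypothetical helper, NOT the hamcrest-bean implementation: a field-by-field comparison
// that recurses into nested objects and lists and skips excluded property names.
// Limitations: no cycle detection, no maps or arrays, property names derived from field names.
final class PropertyDiff {

    private PropertyDiff() {}

    /** Returns one message per difference; an empty list means the two graphs match. */
    static List<String> diff(Object expected, Object actual, Set<String> excluded) {
        List<String> differences = new ArrayList<>();
        String root = expected == null ? "root" : expected.getClass().getSimpleName();
        compare(expected, actual, root, excluded, differences);
        return differences;
    }

    private static void compare(Object expected, Object actual, String path,
                                Set<String> excluded, List<String> out) {
        if (expected == null || actual == null) {
            if (expected != actual) {
                out.add(path + " is " + actual + " instead of " + expected);
            }
            return;
        }
        // Lists are compared element by element, whatever List implementation each side uses.
        if (expected instanceof List && actual instanceof List) {
            List<?> e = (List<?>) expected;
            List<?> a = (List<?>) actual;
            if (e.size() != a.size()) {
                out.add(path + " has " + a.size() + " elements instead of " + e.size());
                return;
            }
            for (int i = 0; i < e.size(); i++) {
                compare(e.get(i), a.get(i), path + "[" + i + "]", excluded, out);
            }
            return;
        }
        if (isSimpleValue(expected)) {
            if (!expected.equals(actual)) {
                out.add(path + " is " + actual + " instead of " + expected);
            }
            return;
        }
        if (expected.getClass() != actual.getClass()) {
            out.add(path + " is a " + actual.getClass().getSimpleName()
                    + " instead of a " + expected.getClass().getSimpleName());
            return;
        }
        // Walk every instance field, including inherited ones, using "FirstName"-style names.
        for (Class<?> type = expected.getClass(); type != Object.class; type = type.getSuperclass()) {
            for (Field field : type.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue;
                }
                String name = field.getName();
                String property = Character.toUpperCase(name.charAt(0)) + name.substring(1);
                if (excluded.contains(property)) {
                    continue;
                }
                field.setAccessible(true);
                try {
                    compare(field.get(expected), field.get(actual), path + "." + property, excluded, out);
                } catch (IllegalAccessException ex) {
                    throw new IllegalStateException("Cannot read " + path + "." + property, ex);
                }
            }
        }
    }

    private static boolean isSimpleValue(Object value) {
        return value instanceof String || value instanceof Number || value instanceof Boolean
                || value instanceof Character || value instanceof Enum || value instanceof Date;
    }

    public static void main(String[] args) {
        Person expected = new Person("Jane", "Doe");
        Person actual = new Person("John", "Doe");
        System.out.println(diff(expected, actual, Set.of()));
        System.out.println(diff(expected, actual, Set.of("FirstName")));
    }
}

// Minimal example bean for the demo, standing in for the article's User class.
class Person {
    private final String firstName;
    private final String surname;
    private final List<String> nicknames = new ArrayList<>();

    Person(String firstName, String surname) {
        this.firstName = firstName;
        this.surname = surname;
    }
}
```

Excluding "FirstName" makes the two Person instances compare as equal. This loosely mirrors excludeProperty("FirstName") in the hamcrest-bean example earlier. A real matcher library also has to handle cycles, maps, arrays and type-specific comparison rules, and this sketch leaves all of those out.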

June 29, 2015

by Stewart Bissett

Value Added Networks: Commuters Ride First Class with Cellwize's Value-Based SON

Value-Driven SON® provides the right mobile network experience for the right customers at the right time. Cellwize, the innovative Self-Organizing Networks provider, has revealed how its Value-Driven SON® solution can give mobile operators the means to ensure better customer experience for selected audiences, thereby driving new business for operators. Cellwize’s Value-Driven SON shifts the value generated by the network to the end customer, moving from a Network-Centric SON to a Customer-Centric activity. The latest whitepaper from Cellwize, “Value-Driven SON – Putting the Customer at the Network’s Center”, is a must-read for forward-thinking operators who are introducing customer-focused metrics. As operators globally look to maximize user satisfaction, several are using metrics to measure quality of experience (QoE), net promoter scores (NPS) and data service revenues. In fact, some are starting to use the term Key Quality Indicator (KQI) to describe these metrics and distinguish them from the more operational KPIs. And, unlike KPIs, KQIs are generally at a customer level, either for selected customers or for a segment of customers. “We believe that SON should be driven by the business objectives of the operator and not by the immediate need to reduce complexity and costs in the network itself,” said Ofir Zemer, CEO of Cellwize. “And since operators’ business objectives aim to deliver superior value to various groups of end-users, Value-Driven SON allows mobile operators to turn this value into enhanced revenue. Value-Driven SON connects customer insight generated in and around the network with network optimization technology, and is driven by the operators’ business goals.
Ultimately, the network is there to serve a purpose, and the shift to the user will enable a quicker path for operators to deliver better performance and better user experience.” The whitepaper illustrates the benefits of a customer-centric approach with Value-Driven SON through two types of use cases: commuters and enterprise customers. The first depicts Value-Driven SON for Space & Time use cases. It prioritizes network resources to meet the demand of commuters along a specific route who experience poor service quality, such as dropped calls, session termination, high latency and low data throughput. The second explains how Value-Driven SON can prioritize and maximize network performance for a defined group of customers, e.g., enterprise customers. Download the whitepaper to find out more about “Value-Driven SON®: Putting the Customer at the Network’s Center”. Additional information: Cellwize elastic-SON® and Value-Driven SON®

June 29, 2015

by Fran Cator

The Cloudcast #198 - Architecting Cloud Foundry

Date: June 19, 2015
By: Aaron Delp and Brian Gracely

Description: Aaron and Brian talk to Chip Childers (@chipchilders, VP of Technology @CloudFoundryOrg) about the current status of Cloud Foundry projects, how Microsoft .NET will be integrated, IaaS vs. PaaS, and the CF.org thinking about overall interoperability.

Interested in the O'Reilly OSCON? Want to register for OSCON now? Use promo code 20CLOUD for 20% off. Details to win an OSCON pass coming soon! Check out the OSCON Schedule.

Free eBook from O'Reilly Media for Cloudcast Listeners! Check out an excerpt from the upcoming Docker Cookbook.

Topic 1 - From an overall project perspective, what grades would you give Cloud Foundry in terms of stability, core functionality, security, operations, etc.?
Topic 2 - You were previously involved (directly or indirectly) with CloudStack. As you talk to people in the marketplace, how does discussing IaaS differ from discussing PaaS?
Topic 3 - How much ability will you have to drive prioritization within sub-projects or new projects? (e.g. Security vs. new Languages vs. Interop, etc.)
Topic 4 - What’s the CF.org way of thinking about interoperability?
Topic 5 - What guidance are you giving the teams in terms of the expandability of Cloud Foundry? Architecturally, are there certain places you recommend over others?
Topic 6 - Is there a place for integrating SaaS applications (monitoring, logging, etc.) into Cloud Foundry?

June 29, 2015

by Brian Gracely

JBoss BPM Suite Quick Guide: Import External Data Models to BPM Project

You are working on a big project, developing rules, events and processes at your enterprise for mission-critical business needs. Part of the requirements state that a certain business unit will provide its data model for you to leverage. This data model will not be designed in the JBoss BPM Suite Data Modeler, but you need access to it while working on your rules, events and processes from the business central dashboard. For this article we will use the JBoss BPM Travel Agency demo project as a reference, with its current data model built externally to JBoss BPM Suite business central. The external data model is called the acme-data-model and is found in the projects/acme-data-model directory of the demo project.

This data model is built during installation and provides you with an object data model as a Java Archive (JAR) file. The JAR is installed into the JBoss BPM Suite business central component by placing it in the following location:

jboss-eap-6.4/standalone/deployments/business-central.war/WEB-INF/lib/acmeDataModel-1.0.jar

Authoring --> Artifact repository.

Deploying the data model this way makes it available to all projects you work on in JBoss BPM Suite business central, which might not always be desirable. What we need is a way to deploy external data models into JBoss BPM Suite and then selectively add them to projects as needed. JBoss BPM Suite has an Artifact Repository made for exactly this purpose. We can upload all our models through the business central dashboard UI and then pick and choose from the repository artifacts (your data model is one artifact) on a per-project basis. This gives you full control over the models that a project can access.

Choose external data model file.
A few steps are needed to change the current installation of JBoss BPM Travel Agency. The acmeDataModel-1.0.jar file is removed from the business central component mentioned above, uploaded into the Artifact Repository, and then added to the Special Trips Agency project. Here is how you can do it yourself:

- obtain and install the JBoss BPM Travel Agency demo project
- remove the current data model from the global business central application:

$ rm ./target/jboss-eap-6.4/standalone/deployments/business-central.war/WEB-INF/lib/acmeDataModel-1.0.jar

Upload external model jar file.

- start the JBoss BPM Suite server after installation, as stated in the installation instructions
- log in to JBoss BPM Suite at http://localhost:8080/business-central with:
  u: erics
  p: bpmsuite1!
- go to AUTHORING --> ARTIFACT REPOSITORY
- go to UPLOAD --> CHOOSE FILE... --> projects/acme-data-model/target/acmeDataModel-1.0.jar --> click the button to UPLOAD; this puts the external data model into the JBoss BPM Suite artifact repository

Select dependencies to add to project.

- go to AUTHORING --> PROJECT AUTHORING --> OPEN PROJECT EDITOR
- in the project editor, select GENERAL PROJECT SETTINGS --> DEPENDENCIES
- in dependencies, select ADD FROM REPOSITORY --> in the pop-up, SELECT the entry acmeDataModel-1.0.jar

This adds the external data model only to the Special Trips Agency project. Other projects cannot use it unless they add the same dependency from the JBoss BPM Suite artifact repository. Build and deploy the project, then run it as described in the project instructions. You will find that the external data model is available and is used by the various rules and process components that make up the JBoss BPM Travel Agency. As a closing note, this works exactly the same way for JBoss BRMS projects.

June 29, 2015

by Eric D. Schabell

Building an App with MongoDB: Creating a REST API Using the MEAN Stack Part 2

Written by Norberto Leite

In the first part of this blog series, we covered the basic mechanics of our application and undertook some data modeling. In this second part, we will create tests that validate the behavior of our application and then describe how to set up and run the application.

Write the tests first

Let’s begin by defining some small configuration libraries.

file name: test/config/test_config.js

Our server will run on port 8000 on localhost, which is fine for initial testing. Later, if we change the location or port number for a production system, we only need to edit this file.

To prepare for our test cases, we need a good test environment. The following code sets one up. First, we connect to the database.

file name: test/setup_tests.js

Next, we drop the user collection so that our database starts in a known state. Next, we drop the user feed entry collection. Next, we connect to Stormpath and delete all the users in our test application. Next, we close the database. Finally, we call async.series to ensure that all the functions run in the correct order.

Frisby was briefly mentioned earlier. We will use it to define our test cases, as follows.

file name: test/create_accounts_error_spec.js

We will start with the enroll route in the following code. Here we deliberately leave out the first name field, so we expect a status reply of 400 with a JSON error saying that the first name is not defined. Let’s “toss that frisby”:

In the following example, we test a password that has no lower-case letters. Stormpath returns an error for this, so we expect a status reply of 400.

In the following example, we test an invalid email address. The address we pass has no @ sign and no domain name, so we expect a status reply of 400.
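All three error cases above expect a 400 reply. The Java code below is a hypothetical illustration of the server-side checks those specs exercise. It returns 400 for a missing first name, for an email with no @ sign and domain, and for a password with no lower-case letter. Otherwise it returns 201. It is not the article's Node.js code. The field names (firstName, email, password) and the email pattern are assumptions. In the article, the password rule is enforced by Stormpath, not by the application.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical enrollment check mirroring the three error-case specs; not the article's
// Express/Stormpath code, and the request field names are assumptions.
final class EnrollValidator {

    // Deliberately loose: text before '@', then a domain containing at least one dot.
    private static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    private EnrollValidator() {}

    /** Returns the first problem found, or null when the request body looks valid. */
    static String firstError(Map<String, String> body) {
        String firstName = body.get("firstName");
        if (firstName == null || firstName.trim().isEmpty()) {
            return "first name is required";
        }
        String email = body.get("email");
        if (email == null || !EMAIL.matcher(email).matches()) {
            return "email address is invalid";
        }
        String password = body.get("password");
        if (password == null || password.chars().noneMatch(Character::isLowerCase)) {
            return "password needs at least one lower-case letter";
        }
        return null;
    }

    /** 400 Bad Request for any validation problem, 201 Created otherwise. */
    static int status(Map<String, String> body) {
        return firstError(body) == null ? 201 : 400;
    }

    public static void main(String[] args) {
        Map<String, String> noFirstName = Map.of("email", "jane@example.com", "password", "abcDEF123");
        System.out.println(status(noFirstName) + " " + firstError(noFirstName));
    }
}
```

Returning the first problem found keeps the sketch close to the specs, which each break exactly one rule and only check the status code and error.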
Now, let’s look at some examples of test cases that should work. Let’s start by defining 3 users.

file name: test/create_accounts_spec.js

In the following example, we send the array of the 3 users defined above and expect a success status of 201. The returned JSON document shows the user object that was created, so we can verify that it matches our test data.

Next, we test for a duplicate user. In the following example, we try to create a user whose email address already exists.

One important issue is that we don’t know in advance which API key Stormpath will return. So, we need to create a file dynamically that looks like the following. We can then use this file to define test cases that require us to authenticate a user.

file name: /tmp/readerTestCreds.js

To create the temporary file above, we need to connect to MongoDB and retrieve user information. The following code does this.

file name: tests/writeCreds.js

In the following code, the first line uses the temporary file that we created with the user information. We have also defined several feeds, such as Dilbert and the Eater Blog.

file name: tests/feed_spec.js

Previously, we defined some users, but none of them had subscribed to any feeds. In the following code we test feed subscription. Authentication is now required, and we provide it using .auth with the Stormpath API keys. Our first test checks for an empty feed list. The next test case subscribes our first test user to the Dilbert feed. The test case after that tries to subscribe our first test user to a feed they are already subscribed to. Next, we subscribe our test user to a new feed. The returned result should confirm that the user is now subscribed to 2 feeds. Finally, our second test user subscribes to a feed.

The REST API

Before we begin writing our REST API code, we need to define some utility libraries.
First, we need to define how our application will connect to the database. Putting this information into a file gives us the flexibility to use different database URLs for development or production systems.

file name: config/db.js

If we wanted to turn on database authentication, we could put that information in a file, as shown below. For obvious reasons, this file should not be checked into source code control.

file name: config/security.js

We can keep the Stormpath API and Secret keys in a properties file, as follows. This file also needs to be managed carefully.

file name: config/stormpath_apikey.properties

Express.js overview

In Express.js, we create an “application” (app). This application listens on a particular port for incoming HTTP requests. Each incoming request passes through a middleware chain. Each link in the chain is given a req (request) object and a res (response) object, which it uses to send back results. Each link can choose to do work, or pass the request to the next link. We add new middleware via app.use(). The main piece of middleware is our “router”, which looks at the URL and routes each URL/verb combination to a specific handler function.

Creating our application

Now we can finally see our application code. It is quite small, because we can put the handlers for the various routes into separate files.

file name: server.js

We define our own middleware at the end of the chain to handle bad URLs. Our server application is now listening on port 8000, and we print a message on the console to let the user know.

Defining our Mongoose data models

We use Mongoose to map objects on the Node.js side to documents inside MongoDB. Recall that earlier, we defined 4 collections:

Feed collection.
Feed entry collection.
User collection.
User feed-entry-mapping collection.

So we will now define schemas for these 4 collections. Let’s begin with the user schema.
Notice that we can also format the data, for example by converting strings to lowercase and removing leading or trailing whitespace with trim.

file name: app/routes.js

In the following code, we also tell Mongoose which indexes need to exist. Mongoose creates these indexes if they do not already exist in our MongoDB database. The unique constraint ensures that duplicates are not allowed. “email : 1” keeps email addresses in ascending order; with “email : -1” they would be in descending order.

We repeat the process for the other 3 collections. The following is an example of a compound index on 4 fields, each in ascending order.

Every request that comes in for GET, POST, PUT and DELETE needs to have the correct content type, which is application/json. Then the next link in the chain is called.

Now we need to define handlers for each URL/verb combination. The complete code is linked in the resources section, so we only show a few examples below. Note how easy Stormpath is to use. Also notice that we have defined /api/v1.0, so the client would actually call /api/v1.0/user/enroll, for example. If we later changed the API, say to 2.0, we could use /api/v2.0. That version would have its own router and code, so clients using the v1.0 API would keep working.

Starting the server and running tests

Finally, here is a summary of the steps needed to start the server and run the tests.

Ensure that the MongoDB instance is running:
mongod
Install the Node libraries:
npm install
Start the REST API server:
node server.js
Run the test cases:
node setup_tests.js
jasmine-node create_accounts_error_spec.js
jasmine-node create_accounts_spec.js
node write_creds.js
jasmine-node feed_spec.js

MongoDB University provides excellent free training. There is a course aimed specifically at Node.js developers, and the link is in the resources section below.
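The Express.js overview describes requests flowing through a chain of middleware links. Each link can do some work and then either respond or hand off to the next link. A router sits in the chain, and a catch-all at the end handles bad URLs. The Java code below models that flow, including the /api/v1.0 prefix. It is a hypothetical illustration only. MiddlewareChain, Request, Response and the other names are invented for this example and are not Express.js APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal model of an Express-style middleware chain; none of these
// types are part of Express.js or any Java library.
final class MiddlewareChain {

    static final class Request {
        final String method;
        final String path;

        Request(String method, String path) {
            this.method = method;
            this.path = path;
        }
    }

    static final class Response {
        int status;  // stays 0 until some link sends a response
        String body;

        void send(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    interface Next { void call(); }

    // One link in the chain: do some work, then either respond or call next.
    interface Middleware { void handle(Request req, Response res, Next next); }

    // A route handler always responds; it never passes the request on.
    interface Handler { void handle(Request req, Response res); }

    private final List<Middleware> links = new ArrayList<>();

    MiddlewareChain use(Middleware link) {
        links.add(link);
        return this;
    }

    Response dispatch(Request req) {
        Response res = new Response();
        run(0, req, res);
        return res;
    }

    // Invokes link number `index`, giving it a `next` that continues with the following link.
    private void run(int index, Request req, Response res) {
        if (index < links.size()) {
            links.get(index).handle(req, res, () -> run(index + 1, req, res));
        }
    }

    // Router link: matches "VERB /path" under a version prefix, otherwise passes the request on.
    static Middleware router(String prefix, Map<String, Handler> routes) {
        return (req, res, next) -> {
            if (req.path.startsWith(prefix)) {
                Handler handler = routes.get(req.method + " " + req.path.substring(prefix.length()));
                if (handler != null) {
                    handler.handle(req, res);
                    return;
                }
            }
            next.call();
        };
    }

    static MiddlewareChain demoApp() {
        Map<String, Handler> v1 = new HashMap<>();
        v1.put("POST /user/enroll", (req, res) -> res.send(201, "{\"status\":\"enrolled\"}"));

        return new MiddlewareChain()
                .use((req, res, next) -> {  // logging link: does some work, then passes on
                    System.out.println(req.method + " " + req.path);
                    next.call();
                })
                .use(router("/api/v1.0", v1))  // versioned API router
                .use((req, res, next) -> res.send(404, "{\"error\":\"not found\"}"));  // bad URLs
    }

    public static void main(String[] args) {
        MiddlewareChain app = demoApp();
        System.out.println(app.dispatch(new Request("POST", "/api/v1.0/user/enroll")).status);
        System.out.println(app.dispatch(new Request("POST", "/api/v2.0/user/enroll")).status);
    }
}
```

The versioned router answers a POST to /api/v1.0/user/enroll with 201. A request to /api/v2.0/user/enroll falls through to the catch-all link and gets 404. Adding a v2.0 API would mean registering a second router with its own prefix. That separation is what keeps v1.0 clients working.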
The resources section also contains links to good MongoDB data modeling resources.

Resources

HTTP status code definitions
Chad Tindel’s GitHub Repository
M101JS: MongoDB for Node.js Developers
Data Models
Data Modeling Considerations for MongoDB Applications

June 29, 2015

by Dana Groce

Stackato on the Microsoft Azure Cloud

The growth of Azure has been outstanding, with more than 90,000 new subscriptions every month. The pace of innovation is rapid too: over 500 new features and services have been added to the platform in the last 12 months. We're very excited to be part of this growth. As we announced yesterday, you can now access Stackato through Azure. We think it's a great way for Azure customers to get access to a Cloud Foundry and Docker based PaaS. With Azure, Microsoft provides an easy path to the cloud for its customers. All applications can be run on one cloud. Microsoft wants to dominate the cloud the same way it has dominated on-premise software. Rarely does a day go by without an article about Azure, whether it covers the recent announcement encouraging start-ups to use Azure with $120,000 worth of credits per year or Microsoft's commitment to open source. Azure gives its customers a growing collection of integrated services that make it easier and faster to build and manage enterprise, mobile, web and Internet of Things (IoT) apps. Enterprises face real complexities when building their cloud solution. A solid infrastructure is just the first step; companies also need the right platform to support the deployment and management of their cloud-native applications. The platform should give their developers the freedom to use the language best suited to the application. In addition, enterprises are on more than one cloud. They need the versatility to scale out or move their applications to whatever cloud is appropriate to meet end-user demand without any downtime. With Stackato, we help remove these complexities. We provide enterprises with a polyglot PaaS that supports the development of applications in virtually any language.
We like to describe Stackato as "infrastructure-agnostic". It allows companies to deploy their applications to any cloud (private, public or hybrid) without running new scripts or re-packaging the application for the new environment. The combination of Stackato and Azure gives enterprises the technology they need to streamline application delivery, drive innovation and meet the demands of their customers.

June 29, 2015

by Kathy Thomas

MotionSense Not-E-FYE: Motion Activated Pushbullet Notification to Smartphone

Create a motion-activated Pushbullet Notification to your smartphone.

June 28, 2015

by Anupam Datta

How to Facilitate Intentional Improvisation

At Bloomfire’s User Conference in May, I had the pleasure of listening to City of Austin’s Chief Innovation Officer Kerry O’Connor present on how government knowledge management is changing. The Innovation Office focuses on internal and public service innovation, as well as open government. O’Connor has worked in the public sector for many years – at the U.S. Department of State, the Office of Management Policy Rightsizing and Innovation, and several U.S. Embassies. She talked about seeing firsthand that the government is changing from a “need to know organization” to a “need to share organization.” O’Connor argues that disruption is inevitable and will come as either opportunity or threat – and there’s no script. “When there’s no script,” O’Connor says, “we have to be intentionally improvisational.” O’Connor defines innovation as any project that is new to you and has an uncertain outcome. She talked about how important knowledge is in supporting innovation. As the first person ever to fill this role, her goal for her first year in office was to set up an innovation infrastructure. This meant putting in place the processes, teams, skills, and information needed to create an environment that fosters innovation. O’Connor recommends that to facilitate intentional improvisation, you must first frame the problems you want to solve. Once you know the goal, look for innovation technology infrastructure that helps you manage contacts, relationships, projects, knowledge, ideas, and insights. We live in a world that is increasingly interconnected and disrupted, and O’Connor says that organizations are naturally becoming more networked, human-centered, and improvisational. She encouraged attendees to “use what you have; we must connect, coach, mentor, share, and experiment.” To ensure that citizens can interact with the knowledge that city employees have, the City of Austin created online public spaces.
These spaces, created on Bloomfire, give citizens the opportunity to join a conversation with employees about innovation, data, and city orientation. I was inspired by O’Connor’s presentation, and proud to live in a city that is so forward-thinking about how information is shared. It made me want to get more involved in finding ways to solve some of the problems Austin faces as a result of its rapid growth. As a result of her talk, I’m going to try to make it to this weekend’s ATX Hack for Change. If you would like to watch O’Connor’s entire presentation, you can access it on the Bloomfire Community.

June 28, 2015

by Bloomfire Marketing

Mystery Curve

This afternoon I got a review copy of the book Creating Symmetry: The Artful Mathematics of Wallpaper Patterns. Here’s a striking curve from near the beginning of the book, one that the author calls the “mystery curve.”

The curve is the plot of exp(it) - exp(6it)/2 + i exp(-14it)/3 with t running from 0 to 2π.

Here’s Python code to draw the curve.

import matplotlib.pyplot as plt
from numpy import pi, exp, real, imag, linspace

def f(t):
    return exp(1j*t) - exp(6j*t)/2 + 1j*exp(-14j*t)/3

t = linspace(0, 2*pi, 1000)

plt.plot(real(f(t)), imag(f(t)))

# these two lines make the aspect ratio square
fig = plt.gcf()
fig.gca().set_aspect('equal')

plt.show()

Maybe there’s a more direct way to plot curves in the complex plane than taking real and imaginary parts.

Updated code for the aspect ratio per Janne’s suggestion in the comments.

Related posts: Several people have been making fun visualizations that generalize the example above. Brent Yorgey has written two posts. One chooses frequencies randomly. The other animates the path of a particle along the curve and shows how each frequency component contributes to the motion. Mike Croucher developed a Jupyter notebook that lets you vary the frequency components with sliders. John Golden created visualizations in GeoGebra here and here. Jennifer Silverman showed how these curves are related to decorative patterns that were popular in the 1960s. She also created a coloring book and a video. Dan Anderson accused me of nerd sniping him and created this visualization.

June 28, 2015

by John Cook

Building an App with MongoDB: Creating a REST API Using the MEAN Stack Part 1

Written by Norberto Leite

Introduction

In this 2-part blog series, you will learn how to use MongoDB and the Mongoose Object Data Mapping (ODM) library with Express.js and Node.js. These technologies share a single language, JavaScript, which improves software performance and developer productivity. In this first part, we will describe the basic mechanics of our application and undertake data modeling. In the second part, we will create tests that validate the behavior of our application and then describe how to set up and run the application.

No prior experience with these technologies is assumed, and developers of all skill levels should benefit from this blog series. So, if you have no previous experience with MongoDB, JavaScript or building a REST API, don’t worry. We will cover these topics in enough detail to get you past the simplistic examples one tends to find online, including authentication, structuring code in multiple files, and writing test cases. Let’s begin by defining the MEAN stack.

What is the MEAN stack?

The MEAN stack can be summarized as follows:

M = MongoDB/Mongoose.js: the popular database, and an elegant ODM for Node.js.
E = Express.js: a lightweight web application framework.
A = Angular.js: a robust framework for creating HTML5 and JavaScript-rich web applications.
N = Node.js: a server-side JavaScript interpreter.

The MEAN stack is a modern replacement for the LAMP (Linux, Apache, MySQL, PHP/Python) stack that became the popular way of building web applications in the late 1990s.

In our application, we won’t be using Angular.js, as we are not building an HTML user interface. Instead, we are building a REST API. It has no user interface of its own but could serve as the basis for any kind of interface, such as a website, an Android application, or an iOS application. You might say we are building our REST API on the ME(a)N stack, but we have no idea how to pronounce that!

What is a REST API?
REST stands for Representational State Transfer. It is a lighter-weight alternative to SOAP and WSDL XML-based API protocols.

REST uses a client-server model, where the server is an HTTP server and the client sends HTTP verbs (GET, POST, PUT, DELETE), along with a URL and variable parameters that are URL-encoded. The URL describes the object to act upon, and the server replies with a result code and valid JavaScript Object Notation (JSON).

Because the server replies with JSON, the MEAN stack is particularly well suited to our application: all the components are in JavaScript, and MongoDB interacts well with JSON. We will see some JSON examples later, when we start defining our data models.

The CRUD acronym is often used to describe database operations. CRUD stands for CREATE, READ, UPDATE, and DELETE. These database operations map very nicely to the HTTP verbs, as follows:

POST: A client wants to insert or create an object.
GET: A client wants to read an object.
PUT: A client wants to update an object.
DELETE: A client wants to delete an object.

These operations will become clear later when we define our API.

Some of the common HTTP result codes that are often used inside REST APIs are as follows:

200 - “OK”.
201 - “Created” (used with POST).
400 - “Bad Request” (perhaps missing required parameters).
401 - “Unauthorized” (missing authentication parameters).
403 - “Forbidden” (you were authenticated but lack the required privileges).
404 - “Not Found”.

A complete description can be found in the RFC document, listed in the resources section at the end of this blog. We will use these result codes in our application and you will see some examples shortly.

Why Are We Starting with a REST API?

Developing a REST API enables us to create a foundation upon which we can build all other applications. As previously mentioned, these applications may be web-based or designed for specific platforms, such as Android or iOS.
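Returning briefly to the CRUD-to-HTTP mapping and the result codes listed above, here is a minimal plain Node.js sketch of how a server might pick a verb and a status code. It is an illustration only: `crudToHttpVerb`, `STATUS` and `statusForOutcome` are hypothetical names, not part of Express or of the application built in this series, and the order of checks in `statusForOutcome` is one reasonable choice rather than a rule.

```javascript
// Hypothetical helpers illustrating the CRUD <-> HTTP verb mapping
// and the common result codes described above. Not part of Express.
const crudToHttpVerb = Object.freeze({
  CREATE: 'POST',
  READ: 'GET',
  UPDATE: 'PUT',
  DELETE: 'DELETE',
});

// Common result codes used in REST APIs.
const STATUS = Object.freeze({
  OK: 200,
  CREATED: 201,
  BAD_REQUEST: 400,
  UNAUTHORIZED: 401,
  FORBIDDEN: 403,
  NOT_FOUND: 404,
});

// Pick a result code for an outcome. One plausible order of checks:
// authentication, then authorization, then input validation,
// then whether the target object exists (not relevant for CREATE).
function statusForOutcome({ operation, authenticated, authorized, valid, found }) {
  if (!authenticated) return STATUS.UNAUTHORIZED;
  if (!authorized) return STATUS.FORBIDDEN;
  if (!valid) return STATUS.BAD_REQUEST;
  if (!found && operation !== 'CREATE') return STATUS.NOT_FOUND;
  return operation === 'CREATE' ? STATUS.CREATED : STATUS.OK;
}

console.log(crudToHttpVerb.UPDATE); // PUT
console.log(statusForOutcome({
  operation: 'CREATE', authenticated: true, authorized: true, valid: true, found: false,
})); // 201
```

In a real Express application the same decisions would be expressed by calling the response object's status method inside each route handler.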
Today, there are also many companies building applications that do not use an HTTP or web interface, such as Uber, WhatsApp, Postmates, and Wash.io. A REST API also makes it easy to implement other interfaces or applications over time, turning the initial project from a single application into a powerful platform.

Creating our REST API

The application that we will be building will be an RSS aggregator, similar to Google Reader. Our application will have two main components:

The REST API
Feed Grabber (similar to Google Reader)

In this blog series we will focus on building the REST API, and we will not cover the intricacies of RSS feeds. However, code for the Feed Grabber is available in a GitHub repository, listed in the resources section of this blog.

Let’s now describe the process we will follow in building our API. We will begin by defining the data model for the following requirements:

Store user information in user accounts
Track RSS feeds that need to be monitored
Pull feed entries into the database
Track user feed subscriptions
Track which feed entry a user has already read

Users will need to be able to do the following:

Create an account
Subscribe/unsubscribe to feeds
Read feed entries
Mark feeds/entries as read or unread

Modeling Our Data

An in-depth discussion on data modeling in MongoDB is beyond the scope of this article, so see the references section for good resources on this topic. We will need 4 collections to manage this information:

Feed collection
Feed entry collection
User collection
User-feed-entry mapping collection

Let’s take a closer look at each of these collections.

Feed Collection

Let’s now look at some code. To model a feed collection, we can use the following JSON document:

If you are familiar with relational database technology, then you will know about databases, tables, rows and columns. In MongoDB, there is a mapping to most of these relational concepts. At the highest level, a MongoDB deployment supports one or more databases.
A database contains one or more collections, which are similar to tables in a relational database. Collections hold documents. Each document in a collection is, at the highest level, similar to a row in a relational table. However, documents do not follow a fixed schema with predefined columns of simple values. Instead, each document consists of one or more key-value pairs, where the value can be simple (e.g., a date) or more sophisticated (e.g., an array of address objects).

Our JSON document above is an example of one RSS feed for the Eater Blog, which tracks information about restaurants in New York City. We can see that there are a number of different fields, but the key ones that our client application may be interested in include the URL of the feed and the feed description. The description is important because, if we create a mobile application, it can show a nice summary of the feed. The remaining fields in our JSON document are for internal use.

A very important field is _id. In MongoDB, every document must have a field called _id. If you create a document without this field, MongoDB will create it for you at the point where you save the document. In MongoDB, this field is a primary key, and MongoDB guarantees that its value is unique within a collection.

Feed Entry Collection

After feeds, we want to track feed entries. Here is an example of a document in the feed entry collection:

Again, we can see that there is an _id field. There are also some other fields, such as description, title and summary. For the content field, note that we are using an array, and the array is also storing a document. MongoDB allows us to store sub-documents in this way, which can be very useful in situations where we want to hold all the information together. The entryID field uses the tag format to avoid duplicate feed entries. Notice also the feedID field, which is of type ObjectId: its value is the _id of the Eater Blog document described earlier.
This provides a referential model, similar to a foreign key in a relational database. So, if we wanted to see the feed document associated with this ObjectId, we could take the value 523b1153a2aa6a3233a913f8 and query the feed collection on _id, and it would return the Eater Blog document.

User Collection

Here is the document we could use to keep track of users:

A user has an email address, first name and last name. There are also sp_api_key_id and sp_api_key_secret fields; we will use these later with Stormpath, a user management API. The last field, called subs, is a subscription array. The subs field tells us which feeds this user is subscribed to.

User-Feed-Entry Mapping Collection

The last collection allows us to map users to feeds and to track which feed entries have been read. We use a Boolean (true/false) to mark an entry as read or unread.

Functional Requirements for the REST API

As previously mentioned, users need to be able to do the following:

Create an account.
Subscribe/unsubscribe to feeds.
Read feed entries.
Mark feeds/entries as read or unread.

Additionally, a user should be able to reset their password. The following table shows how these operations can be mapped to HTTP routes and verbs.
Route | Verb | Description | Variables
--- | --- | --- | ---
/user/enroll | POST | Register a new user | firstName, lastName, email, password
/user/resetPassword | PUT | Password reset | email
/feeds | GET | Get feed subscriptions for each user, with description and unread count |
/feeds/subscribe | PUT | Subscribe to a new feed | feedURL
/feeds/entries | GET | Get all entries for feeds the user is subscribed to |
/feeds/<feedid>/entries | GET | Get all entries for a specific feed |
/feeds/<feedid> | PUT | Mark all entries for a specific feed as read or unread | read = <true/false>
/feeds/<feedid>/entries/<entryid> | PUT | Mark a specific entry as either read or unread | read = <true/false>
/feeds/<feedid> | DELETE | Unsubscribe from this particular feed |

In a production environment, secure HTTP (HTTPS) would be the standard approach when sending sensitive details, such as passwords.

Real World Authentication with Stormpath

In robust real-world applications it is important to provide user authentication. We need a secure approach to manage users, passwords, and password resets.

There are a number of ways we could authenticate users for our application. One possibility is to use Node.js with the Passport authentication middleware, which could be useful if we wanted to authenticate with social media accounts, such as Facebook or Twitter. Another possibility is to use Stormpath. Stormpath provides User Management as a Service and supports authentication and authorization through API keys. Basically, Stormpath maintains a database of user details and passwords, and our application’s REST API calls the Stormpath REST API to perform user authentication.

The following diagram shows the flow of requests and responses using Stormpath.

In detail, Stormpath will provide a secret key for each “Application” that is defined with their service. For example, we could define an application as “Reader Production” or “Reader Test”.
This could be very useful while we are still developing and testing our application, as we may be frequently adding and deleting test users. Stormpath will also provide an API Key Properties file.

Stormpath also allows us to define password strength requirements for each application, such as:

Must have >= 8 characters.
Must include lowercase and uppercase.
Must include a number.
Must include a non-alphabetic character.

Stormpath keeps track of all of our users and assigns them API keys, which we can use for our REST API authentication. This greatly simplifies the task of building our application, as we don’t have to focus on writing code for authenticating users.

Node.js

Node.js is a runtime environment for server-side and network applications. Node.js uses JavaScript, and it is available for many different platforms, such as Linux, Microsoft Windows and Apple OS X. Node.js applications are built using many library modules, and there is a very rich ecosystem of libraries available, some of which we will use to build our application.

To start using Node.js, we need to define a package.json file describing our application and all of its library dependencies. The Node.js package manager (npm) installs copies of the libraries in a subdirectory called node_modules/ in the application directory. This isolates the library versions for each application and so avoids the code compatibility problems that could arise if the libraries were installed in a shared system location, such as /usr/lib. The command npm install will create the node_modules/ directory, with all of the required libraries. Here is our package.json file:

Our application is called reader-api. The main file is called server.js. Then we have a list of the dependent libraries and their versions. Some of these libraries are designed for parsing HTTP requests. The test harness we will use is called frisby, and jasmine-node is used to run frisby scripts.
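With the Node.js basics in place, the collections described earlier can be sketched in plain JavaScript to show how they relate. This is a minimal in-memory sketch, assuming arrays in place of MongoDB collections and plain strings in place of ObjectIds. Only `_id`, `feedID`, `subs`, the Eater Blog id and the read/unread Boolean come from the text; the `userID`, `feedEntryID` and `read` field names, the sample ids, and the helpers `findFeedForEntry` and `unreadCountsForUser` are hypothetical.

```javascript
// In-memory stand-ins for the four collections. Ids are plain strings
// here rather than MongoDB ObjectIds; most field names are hypothetical.
const feeds = [
  { _id: '523b1153a2aa6a3233a913f8', title: 'Eater Blog' },
];

const feedEntries = [
  { _id: 'entry1', feedID: '523b1153a2aa6a3233a913f8', title: 'First post' },
  { _id: 'entry2', feedID: '523b1153a2aa6a3233a913f8', title: 'Second post' },
];

const users = [
  { _id: 'user1', email: 'reader@example.com', subs: ['523b1153a2aa6a3233a913f8'] },
];

// One mapping document per (user, entry) pair, with a Boolean read flag.
const userFeedEntries = [
  { userID: 'user1', feedID: '523b1153a2aa6a3233a913f8', feedEntryID: 'entry1', read: true },
  { userID: 'user1', feedID: '523b1153a2aa6a3233a913f8', feedEntryID: 'entry2', read: false },
];

// Follow the feedID reference, much like a foreign key lookup.
// Against MongoDB this would be a query on the feed collection by _id.
function findFeedForEntry(entry) {
  return feeds.find((feed) => feed._id === entry.feedID) || null;
}

// Count unread entries per subscribed feed for one user: the kind of
// figure the GET /feeds route in the table above is described as returning.
function unreadCountsForUser(userID) {
  const user = users.find((u) => u._id === userID);
  if (!user) return {};
  const counts = {};
  for (const feedID of user.subs) {
    counts[feedID] = userFeedEntries.filter(
      (m) => m.userID === userID && m.feedID === feedID && !m.read
    ).length;
  }
  return counts;
}

console.log(findFeedForEntry(feedEntries[1]).title); // Eater Blog
console.log(unreadCountsForUser('user1'));
```

Keeping the read flag in a separate mapping collection, rather than inside each entry, lets many users share one copy of each feed entry while tracking their own read state.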
One library that is particularly important is async. If you have never used Node.js, it is important to understand that Node.js is designed to be asynchronous. So, any function that performs potentially blocking input/output (I/O), such as reading from a socket or querying a database, will take a callback function as its last parameter and then continue with the control flow, only returning to that callback function once the I/O operation has completed. Let’s look at the following simple example to demonstrate this.

In the above example, we may think that the output would be:

one
two

but in fact it might be:

two
one

because the line that prints “one” might run later, asynchronously, in the callback. We say “might” because if conditions are just right, “one” might print before “two”. This element of uncertainty in asynchronous programming is called non-deterministic execution. For many programming tasks, this is actually desirable and permits high performance, but clearly there are times when we want to execute functions in a particular order. The following example shows how we could use the async library to achieve the desired result of printing the numbers in the correct order:

In the above code, we are guaranteed that function two will only be called after function one has completed.

Wrapping Up Part 1

Now that we have seen the basic mechanics of Node.js and async function setup, we are ready to move on. Rather than move into creating the application, we will instead start by creating tests that validate the behavior of the application. This approach is called test-driven development and has two very good features:

It helps the developer really understand how data and functions are consumed, and it often exposes subtle needs, like the ability to return 2 or more things in an array instead of just one thing.
By writing tests before building the application, the paradigm becomes “broken / unimplemented until proven tested OK” instead of “assumed to be working until a test fails.” The former is a “safer” way to keep the code healthy.
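To make the ordering discussion above concrete, here is a minimal sketch in plain Node.js that needs no packages. It is a stand-in, not the article’s original code: `fakeQuery` is a hypothetical function that calls back after a random delay, so the first pair of calls may print “one” and “two” in either order. The `series` helper is a simplified, hand-rolled version of what the async library’s `async.series(tasks, callback)` does (each task receives a callback, and the next task starts only after the previous one has called it); it is not the async library itself.

```javascript
// Hypothetical stand-in for an I/O call such as a database query:
// it calls back with `label` after a random delay of 0-19 ms.
function fakeQuery(label, callback) {
  const delay = Math.floor(Math.random() * 20);
  setTimeout(() => callback(null, label), delay);
}

// Unsequenced: "one" and "two" may print in either order.
fakeQuery('one', (err, label) => console.log('unordered:', label));
fakeQuery('two', (err, label) => console.log('unordered:', label));

// Simplified helper in the spirit of async.series: runs each task only
// after the previous task has called back, collects the results, and
// stops at the first error.
function series(tasks, done) {
  const results = [];
  let index = 0;

  function next(err, value) {
    if (err) return done(err);
    if (index > 0) results.push(value); // result of the task that just finished
    if (index === tasks.length) return done(null, results);
    const task = tasks[index];
    index += 1;
    task(next);
  }

  next(null);
}

// Sequenced: "one" is always printed before "two".
series(
  [
    (cb) => fakeQuery('one', (err, label) => { console.log('ordered:', label); cb(err, label); }),
    (cb) => fakeQuery('two', (err, label) => { console.log('ordered:', label); cb(err, label); }),
  ],
  (err, results) => {
    if (err) throw err;
    console.log('results:', results); // results: [ 'one', 'two' ]
  }
);
```

In the real application you would use the async package’s own `series` (or another of its control-flow helpers) rather than writing one by hand; the sketch only shows why sequencing callbacks removes the non-determinism.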

June 28, 2015

by Dana Groce


Working with Merge and Identity Column -- A Practical Scenario

Introduction

We all know about identity columns and the MERGE statement, so we are not going to discuss the boring theory behind them. Instead, we will work through a practical scenario of merging records. I hope you enjoy it and find it informative.

The Scenario

We have a temporary table named #tbl_TempStudentRecords with the following columns:

SrlNo, StudentName, StudentClass, StudentSection

We have another temporary table named #tbl_TempStudrntMsrks with the following columns:

SrlNo, StubjectName, MarksObtain

We want to move this data into another set of tables. The first is tbl_StudentDetails:

StdRoll (PK, IDENTITY), StudentName, StudentClass, StudentSection

The second is tbl_StudentMarks:

IdNo (PK), StdRoll (FK, references [tbl_StudentDetails].[StdRoll]), SubjectName, MarksObtain

We can insert records into tbl_StudentDetails from #tbl_TempStudentRecords very easily. The main problem is the IDENTITY column [tbl_StudentDetails].[StdRoll]: when we insert records, its values are generated automatically. When we then try to insert records into tbl_StudentMarks from #tbl_TempStudrntMsrks, we have to provide the StdRoll values, because StdRoll is a foreign key referencing [tbl_StudentDetails].[StdRoll]. In other words, we need to know which generated StdRoll belongs to which original SrlNo.

Think about this scenario for a minute and you will see the problem. We have to solve it without using any loop, and without any DDL operation to change the structure of the base tables. We will use only set-based operations, for better performance.
How to Solve It

Step 1 [Create the Base Tables First]

Note that tbl_StudentMarks is dropped before tbl_StudentDetails: because of the foreign key, dropping the parent table first would fail when the script is re-run.

    -- Temporary staging tables
    IF OBJECT_ID('tempdb..#tbl_TempStudentRecords') IS NOT NULL
    BEGIN
        DROP TABLE #tbl_TempStudentRecords;
    END
    GO

    CREATE TABLE #tbl_TempStudentRecords
    (
        SrlNo          BIGINT      NOT NULL,
        StudentName    VARCHAR(50) NOT NULL,
        StudentClass   INT         NOT NULL,
        StudentSection CHAR(1)     NOT NULL
    );
    GO

    IF OBJECT_ID('tempdb..#tbl_TempStudrntMsrks') IS NOT NULL
    BEGIN
        DROP TABLE #tbl_TempStudrntMsrks;
    END
    GO

    CREATE TABLE #tbl_TempStudrntMsrks
    (
        SrlNo        BIGINT      NOT NULL,
        StubjectName VARCHAR(50) NOT NULL,
        MarksObtain  INT         NOT NULL
    );
    GO

    -- Drop the referencing (child) table before the referenced (parent) table
    IF OBJECT_ID(N'[dbo].[tbl_StudentMarks]', N'U') IS NOT NULL
    BEGIN
        DROP TABLE [dbo].[tbl_StudentMarks];
    END
    GO

    IF OBJECT_ID(N'[dbo].[tbl_StudentDetails]', N'U') IS NOT NULL
    BEGIN
        DROP TABLE [dbo].[tbl_StudentDetails];
    END
    GO

    CREATE TABLE [dbo].[tbl_StudentDetails]
    (
        StdRoll        BIGINT      NOT NULL IDENTITY(100,1) PRIMARY KEY,
        StudentName    VARCHAR(50) NOT NULL,
        StudentClass   INT         NOT NULL,
        StudentSection CHAR(1)     NOT NULL
    );
    GO

    CREATE TABLE [dbo].[tbl_StudentMarks]
    (
        IdNo        BIGINT      NOT NULL IDENTITY(1,1) PRIMARY KEY,
        StdRoll     BIGINT      NOT NULL,
        SubjectName VARCHAR(50) NOT NULL,
        MarksObtain INT         NOT NULL
    );
    GO

    ALTER TABLE [dbo].[tbl_StudentMarks]
        ADD CONSTRAINT FK_StdRoll_tbl_StudentMarks
        FOREIGN KEY (StdRoll) REFERENCES [dbo].[tbl_StudentDetails] (StdRoll);
    GO

Step 2 [Inserting Records in the Temp Tables]

    INSERT INTO #tbl_TempStudentRecords
        (SrlNo, StudentName, StudentClass, StudentSection)
    VALUES (1, 'Joydeep Das',   1, 'A'),
           (2, 'Preeti Sharma', 1, 'A'),
           (3, 'Deepasree Das', 1, 'A');

    INSERT INTO #tbl_TempStudrntMsrks
        (SrlNo, StubjectName, MarksObtain)
    VALUES (1, 'Bengali', 50), (1, 'English', 70), (1, 'Math', 80),
           (2, 'Bengali',  0), (2, 'English', 70), (2, 'Math', 80),
           (3, 'Bengali', 20), (3, 'English', 90), (3, 'Math', 95);

Step 3 [Now Solve It with the MERGE Statement]

The OUTPUT clause of a plain INSERT ... SELECT can only return columns of the inserted rows, so it cannot tell us which source SrlNo produced each new StdRoll. The OUTPUT clause of MERGE, however, may also reference columns from the source. So we use a MERGE whose join condition never matches: every source row falls into WHEN NOT MATCHED and is inserted, and OUTPUT records each (new StdRoll, original SrlNo) pair in a mapping table.

    BEGIN
        -- Maps each newly generated StdRoll to the SrlNo of its source row
        DECLARE @MappingTable TABLE ([NewRecordID] BIGINT, [OldRecordID] BIGINT);

        -- The join condition is always false, so every source row is inserted
        MERGE [dbo].[tbl_StudentDetails] AS target
        USING (SELECT [SrlNo] AS RecordID_Original,
                      [StudentName],
                      [StudentClass],
                      [StudentSection]
               FROM #tbl_TempStudentRecords) AS source
        ON (1 = 0)
        WHEN NOT MATCHED THEN
            INSERT ([StudentName], [StudentClass], [StudentSection])
            VALUES (source.[StudentName], source.[StudentClass], source.[StudentSection])
        OUTPUT inserted.[StdRoll], source.[RecordID_Original]
            INTO @MappingTable;

        -- Now the mapping table is ready and we can use it
        INSERT INTO [dbo].[tbl_StudentMarks] (StdRoll, SubjectName, MarksObtain)
        SELECT b.NewRecordID, a.StubjectName, a.MarksObtain
        FROM #tbl_TempStudrntMsrks AS a
        INNER JOIN @MappingTable AS b
            ON a.SrlNo = b.OldRecordID;
    END
    GO

Step 4 [Observation]

    SELECT * FROM [dbo].[tbl_StudentDetails];
    GO
    SELECT * FROM [dbo].[tbl_StudentMarks];
    GO

Result of tbl_StudentDetails:

    StdRoll  StudentName    StudentClass  StudentSection
    100      Joydeep Das    1             A
    101      Preeti Sharma  1             A
    102      Deepasree Das  1             A

Result of tbl_StudentMarks:

    IdNo  StdRoll  SubjectName  MarksObtain
    1     100      Bengali      50
    2     100      English      70
    3     100      Math         80
    4     101      Bengali      0
    5     101      English      70
    6     101      Math         80
    7     102      Bengali      20
    8     102      English      90
    9     102      Math         95

SQL Server does not guarantee the order in which MERGE inserts rows, so the exact StdRoll each student receives could differ from the result above; because the marks are joined through @MappingTable, they still end up attached to the right student.

June 28, 2015

by Joydeep Das


Spark Grows Up and Scales Out

Written by Craig Wentworth.

To understand the furor that’s greeted recent vendor announcements around the open source analytics computing engine Spark, and some commentary seemingly setting up a Spark versus Hadoop battle, it’s worth taking a moment to recap what each actually is (and is not).

As I covered in last year’s MWD report on Hadoop and its family of tools, when people talk about Apache Hadoop they’re often referring to a whole framework of tools designed to facilitate distributed parallel processing of large datasets. That processing was traditionally confined to MapReduce batch jobs in Hadoop’s early days, though Hadoop 2 brought the YARN resource scheduler and opened up Hadoop to streaming, real-time querying and a wider array of analytical programming applications (beyond MapReduce). Spark has been designed to run on top of Hadoop’s Distributed File System (amongst other data platforms) as an alternative to MapReduce – tuned for real-time streaming data processing and fast interactive queries, and with multi-genre analytics applicability (machine learning, time series, graph, SQL, streaming out-of-the-box). It gets that speed advantage by caching in memory (rather than writing interim results to disk, as MapReduce does), but with that approach comes a need for higher-spec physical machines (compared with MapReduce’s tolerance for commodity hardware).

So, Spark isn’t about to replace Hadoop, but it may well supplant MapReduce (especially in growing real-time use cases). Those “Spark vs Hadoop” headlines are about as meaningful as one proclaiming “mushrooms vs pizza.” Yes, mushroom might be a more suitable topping than, say, pepperoni (especially in a vegetarian use case), but it’ll still be deployed on the same dough and tomato sauce pizza platform. Nobody’s about to suggest the mushroom should go it alone!
But what’s behind the headlines and the hype is a story of enterprise adoption – or at least of vendors anticipating that adoption and investing in ‘the weaponization of Spark’ as it faces the more exacting standards of security, scaling performance, consistency, etc. that come with mainstream enterprise deployment. Big names like IBM, Databricks (the company formed by the originators of Spark), and MapR made commitments in and around the Spark Summit earlier this month. MapR has announced three new Quick Start Solutions for its Hadoop distribution to help customers get started with Spark in real-time security log analytics, genome sequencing, and time series analytics; Databricks’ cloud-hosted Spark platform (formerly known as Databricks Cloud) has become generally available; and IBM announced a raft of measures designed to give Spark a significant shot in the arm – it’s open sourcing its SystemML technology to bolster Spark’s machine learning capabilities, integrating Spark into its own analytics platforms, investing in Spark training and education, committing 3,500 of its researchers and developers to work on Spark-related projects, and offering Spark as a service on its Bluemix developer cloud. Given the overlap with Databricks’ business model (of offering development, certification, and support for Spark), IBM’s intentions are likely to tread on some toes before long – but for now, at least, both companies are content to focus on the combined push benefiting the Spark community and its enterprise aspirations overall (though clearly IBM’s betting on all this investment buying it some influence over where Spark goes next). It’s worth bearing in mind that not all of its supporters champion Spark wholesale: the interested parties tend to focus on particular parts of Spark (wide-ranging as it is) because of overlaps with their own preferred toolsets.
For instance, although Spark supports many analytics genres, Cloudera focuses on its machine learning capabilities (as it has its own SQL-on-Hadoop tool in Impala), and MapR and Hortonworks promote Drill and Hive, respectively, as their favoured SQL-on-Hadoop tools. IBM’s support is focused on Spark’s machine learning and in-memory capabilities (hence the SystemML open sourcing news). In the face of such strong vendor preferences, how long before some of Spark’s current features fall away (or at least start to show the effects of being starved of as much care and feeding as is bestowed upon vendors’ favourite Spark components)? The Spark community is in much the same place the Hadoop one was a while back – it’s showing great promise and suitability in key growth workloads (in Spark’s case, real-time IoT applications, for example). However, the product as it stands is too immature for many enterprise tastes. Cue enterprise software vendors stepping up to help Spark grow up fast. Their challenge, though, is to smooth out the edges without smothering what made it so interesting in the first place.

June 28, 2015

by Angela Ashenden


Cloud Strategy and Collaboration Software

We’re going back to the classics this month, with the latest enterprise collaboration news round-up focussing on cloud strategy, and on the considerations and benefits of implementing collaboration software.

Mashable shared an infographic it created in conjunction with Hewlett Packard, which compiles data and research suggesting that the use of hybrid and private cloud computing is on the rise. The article quotes statistics from RightScale, which states that 82% of enterprises already have a multi-cloud strategy; of these, 14% use multiple private clouds, 13% use multiple public clouds, and 55% use hybrid clouds. Mashable quotes Technology Business Research, which states that “there is continued migration of enterprise vendors in mature markets such as the U.S. to hybrid and private cloud platforms to provide software vendors an opportunity to generate adoption for management technologies, as customers require next-generation tools to manage heterogeneous IT infrastructures efficiently.”

In his article for eWEEK, Chris Preimesberger outlines 10 ways IT and business leaders must collaborate on cloud strategies. Chris explains that a decision to use cloud services is no longer simply down to the IT department. During the last nine years, he says, entire businesses have become necessarily immersed in IT strategies in order to harness the cloud for economics, innovation, operations and growth. He shares a slide show which provides advice on how technical and business leaders can collaborate to build a secure cloud strategy. The slide show states that usage trends indicate that private clouds are expected to grow at double the rate of public clouds, as a result of ongoing concerns about data security and privacy.

In his article for No Jitter, Gary Audin asks the question: cloud economics or flexibility? Gary explains that although the cost of cloud can be attractive, that might not be the real draw for enterprises.
He states that knowing what costs to consider as part of a cloud service implementation is vital to making the right decision about cloud. Gary points out that the benefits of the cloud are far more than simply a matter of cost. He explains that the cloud allows an enterprise to respond rapidly as it contends with change due to situations such as staff growth or reduction, market fluctuations, financial limitations, or new opportunities. Above all, Gary explains, the cloud delivers flexibility, and it is this that makes it the most attractive option for enterprises.

In his article for MSP Mentor, Michael Brown reveals the results of a recent report on cloud adoption in the enterprise. The report, by Skyhigh Networks, revealed that enterprise cloud adoption grew by 43% in 2014. Michael highlights findings on the file sharing front, revealing that 37% of employees were found to be uploading sensitive business data to consumer file sharing services.

Consumer file sharing services are one element of a growing trend towards BYOC (bring your own cloud, content and collaboration). Robert Bamforth explains that BYOC is an evolution of BYOD (bring your own device), which has posed a challenge to IT departments since the rise of the smartphone. Robert explains that BYOC is a new challenge for IT departments in controlling their organisation’s digital assets while liberating employee productivity and information sharing. Robert states that the BYOC conundrum should change as enterprise-strength security features and tools continue to evolve towards more consumer-like interfaces, which will make asking employees to use enterprise tools much easier. He gives some suggestions to help enterprises in the meantime: understand the appeal of consumer tools, make sure everyone understands security risks, forget trying to apply strong rules to trivial information, get a mobile-ready solution, look for and pre-plug data leaks, and above all don’t stop collaboration if it’s happening.
In his article for ZDNet, Dion Hinchcliffe reflects on the state of the digital collaboration industry. Far from maturing, Dion says, the collaboration tool space is busier than ever: evolving, branching out, and multiplying. But, he asks, are organizations able to adopt so many different ways of working together? Dion observes that instead of settling down, the collaboration software space is actually getting more interesting and varied, and he is seeing new technologies, such as applications that focus on optimizing collaboration for mobile devices or on team analytics. It’s now time for organizations to design a strong foundation for digital collaboration, says Dion, as the near future promises many key new innovations that must be considered and incorporated to stay competitive, both to customers and to the workforce.

When businesses do decide to adopt one or more digital collaboration platforms, Andre Bourque offers some helpful ways to measure ROI. Andre quotes a Mashable report which states that cloud collaboration drives creativity and engagement, leading to happier employees and a better company culture, but this is not an easily measurable metric. Andre explains that it’s hard to find definitive examples of ROI, as most are anecdotal or “in process”, and merely counting the user adoption rate of a collaborative platform is inadequate. Instead, Andre quotes Angela Ashenden, of MWD Advisors, who offers the following metrics to consider: reduced travel time and costs; creating new business opportunities and services; increased employee retention rates; cost savings across the organisation; and faster on-boarding for new users. Do you have any metrics that you find useful to measure ROI on your collaboration platform in your organisation?

June 27, 2015

by Highq Collaborate


How to Keep REST API Credentials Secure

If you are building mobile apps, then you are most likely connecting to some REST API. For example, if you want to resolve an address to latitude/longitude coordinates to display on a map, you might use the Google Geocoding API:

Google Geocoding API: https://maps.googleapis.com/maps/api/geocode/json?address=San Francisco,CA&key=AIzaSyDvFMYGjeR02RH

If you are invoking the API from the client, then the API key also has to be present on the client. But this is also the problem: it’s very easy to look at the app source in the browser and get access to the API key. If someone has access to your API key, they can send requests on your behalf (without you knowing) and use up your request quota.

Even if you are building a hybrid app, it’s still the same problem. A hybrid app is HTML/JavaScript inside a native wrapper, so it’s possible to download the app, unpackage it and gain access to API keys or any other sensitive information stored in the app. Even native apps are not immune to this. For example, an Android app is essentially a Java application, and a Java application can be decompiled to view something close to the original source.

The next image shows how to get access to an API key in the browser:

[Viewing app source in browser]

A good solution is to never expose the API key (or any other sensitive data) on the client. How do you do that? You keep the API key and any other sensitive information on the server.

Appery.io Secure Proxy

Appery.io Secure Proxy (part of Backend Services) enables app developers to keep sensitive app data on the server. Your API keys and any other such data are never exposed on the client.

Before using the Secure Proxy, you need to store the data on the server. To store the data, you are going to use the Appery.io Database. It’s as simple as creating a collection with two columns: the first column is the value name, and the second column is the actual value.
This is how the database looks when storing the API key for the Google Geocoding API:

[Saving API key in database]

As this key is stored on the server, no one but you has access to it. You can store other data as well, such as URLs, tokens or anything else that shouldn’t be exposed on the client.

The next step is to set up the proxy that will use the information stored in the database. This step is also very simple; this is how it looks:

[Secure proxy linked to a database]

You give the proxy a name and then link it to a database which stores your data. The above proxy is linked to the Secrets_db database, the Credentials collection, and the secretName and secretValue columns.

The last step is to link a REST API service to the proxy. In the service editor, you select the secure proxy you created:

[REST API service using secure proxy]

Then, in the Request tab, you reference the API key stored in the database (the name stored in the secretName column). Request parameter substitution will happen on the server, and that’s it. When the API service is invoked, the call will go through the secure proxy (server), where the API key will be substituted:

[API key is not exposed on the client]

For web apps, you can add an extra layer of security by specifying from which page URLs the proxy should accept requests:

[URL-based security]

The proxy will only accept requests from page URLs listed in the table. Another option to keep API keys private is to invoke the API from the server using Server Code; I will cover this in another post.

Setting up and using the Appery.io Secure Proxy is simple. It provides a very important feature by allowing you to keep sensitive and private data on the server, never exposing it on the client, and it adds an extra security layer to your app.
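The secure proxy idea is not specific to Appery.io. As a rough sketch of what happens on the server, here is a plain Node.js version of the two steps described above: substituting a stored secret into the outgoing request, and checking the calling page URL against an allow-list. Everything here is hypothetical and is not Appery.io’s API: the `{{secretName}}` placeholder syntax, the `secrets` map (standing in for the Credentials collection with its secretName/secretValue columns), the `GoogleGeocodingKey` name, the `GEOCODING_API_KEY` environment variable, the allowed page URL, and the function names. The sketch only builds the upstream URL; actually forwarding the request is left out. Note also that a `Referer` header can be forged by non-browser clients, so the page-URL check is an extra layer, not a replacement for keeping the key on the server.

```javascript
// Server-side only: secret name -> secret value. In practice this would be
// loaded from a database or environment variable, never sent to the client.
const secrets = new Map([
  ['GoogleGeocodingKey', process.env.GEOCODING_API_KEY || 'server-side-placeholder'],
]);

// Page URLs allowed to use the proxy (hypothetical example).
const allowedPages = ['https://myapp.example.com/index.html'];

// Replace {{name}} placeholders in a parameter value with the stored secret.
// Unknown names are rejected instead of being forwarded.
function substituteSecrets(value) {
  return value.replace(/\{\{(\w+)\}\}/g, (match, name) => {
    if (!secrets.has(name)) throw new Error(`Unknown secret: ${name}`);
    return secrets.get(name);
  });
}

// Build the upstream request for a client call. `referer` is the calling
// page URL; `params` are the query parameters the client sent.
function buildUpstreamRequest(upstreamBase, params, referer) {
  if (!allowedPages.includes(referer)) {
    return { status: 403 }; // caller's page is not on the allow-list
  }
  const url = new URL(upstreamBase);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, substituteSecrets(String(value)));
  }
  return { status: 200, url: url.toString() };
}

// The client only ever sends the placeholder, never the key itself.
const result = buildUpstreamRequest(
  'https://maps.googleapis.com/maps/api/geocode/json',
  { address: 'San Francisco,CA', key: '{{GoogleGeocodingKey}}' },
  'https://myapp.example.com/index.html'
);
console.log(result.status); // 200
```

Using the URL API’s search parameters rather than string concatenation means values such as the address are URL-encoded correctly before the request leaves the server.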

June 27, 2015

by Max Katz


Notes from Troy Hunt's Hack Yourself First Workshop

Troy Hunt (@troyhunt, blog) ran a great, very hands-on two-day workshop on web app security at NDC Oslo. Here are my notes.

## Highlights – resources

### Personal security and privacy

- https://www.entropay.com/ – a prepaid virtual Visa card
- mailinator.com – temporary email
- F-Secure VPN
- https://www.netsparker.com/ – scan a site for issues (insecure cookies, framework disclosure, SQL injection, …) (costs many thousands of dollars)

### Site security

- https://report-uri.io/ – get reports when CSP rules are violated; also displays a site's CSP headers in a human-friendly way
- https://securityheaders.io/ – check the quality of a site's security-related headers
- free SSL – http://www.startssl.com/, https://www.cloudflare.com/ (also provides a web app firewall and other protections); SSL quality check: https://www.ssllabs.com/ssltest/
- https://letsencrypt.org/ – free, automated, open Certificate Authority (Linux Foundation, Mozilla)

### Breaches etc.

- http://arstechnica.com/security/2015/06/hack-of-cloud-based-lastpass-exposes-encrypted-master-passwords/
- https://twitter.com/jmgosney – one of the people behind http://passwordscon.org; on the http://password-hashing.net experts panel; Team Hashcat. See http://arstechnica.com/security/2012/12/25-gpu-cluster-cracks-every-standard-windows-password-in-6-hours/

### To follow

- http://krebsonsecurity.com/
- http://www.troyhunt.com/
- https://www.schneier.com/
- https://twitter.com/mikko (of F-Secure) – also great [TED] talks
- Kevin Mitnick (jailed for hacking; Twitter, books)

### Books

- http://www.amazon.com/We-Are-Anonymous-LulzSec-Insurgency/dp/0316213527 – easy read, hard to put down
- http://www.amazon.com/Ghost-Wires-Adventures-Worlds-Wanted/dp/1441793755 – about Mitnick's hacking, social engineering, and life on the run
- http://www.amazon.com/Art-Intrusion-Exploits-Intruders-Deceivers/dp/0471782661/
- Mitnick: http://www.amazon.com/Art-Deception-Controlling-Element-Security/dp/076454280X/ – social engineering

### Other

- https://www.xssposed.org/
- See https://www.drupal.org/SA-CORE-2014-005
- https://www.youtube.com/watch?v=Qvhdz8yE_po – Havij example
- http://www.troyhunt.com/2013/07/everything-you-wanted-to-know-about-sql.html, http://www.troyhunt.com/2010/05/owasp-top-10-for-net-developers-part-1.html, http://www.troyhunt.com/2012/12/stored-procedures-and-orms-wont-save.html
- Google: find config files containing SQL Server `sa` access info: `inurl:ftp inurl:web.config filetype:config sa`
- https://scotthelme.co.uk/hardening-your-http-response-headers/ and https://securityheaders.io/
- https://developer.mozilla.org/en-US/docs/Web/Security/Public_Key_Pinning – prevent MITM
- Wappalyzer – Chrome plugin that displays detectable info about the server and client (jQuery, New Relic, IIS, Windows OS, …)
- http://www.troyhunt.com/2015/05/do-you-really-want-bank-grade-security.html
- http://www.troyhunt.com/2012/05/everything-you-ever-wanted-to-know.html
- Tool: https://github.com/gentilkiwi/mimikatz – extracts plaintext passwords, hashes, PIN codes and Kerberos tickets from memory on Windows

## Notes

- HackYourselfFirst.troyhunt.com – an example app with many vulnerabilities
- Note: maximizing your browser window will share info about your screen size, which might help to identify you
- haveibeenpwned.com – Troy's online DB of hacked accounts

Tips:

- check robots.txt to find out what might be worth accessing

Example issues:

- no HTTPS on the login page
- weak password requirements
- cookies without the Secure flag => sent over HTTP (incl. AuthCookie)
- password sent in clear text in the confirmation email
- user enumeration, e.g. an issue with AdultFriendFinder: enter someone's email at login to find out whether they have an account
- post illegal characters and get them displayed => injection
- no anti-automation (CAPTCHA) on login or the confirmation email; automatically creating 1M accounts => sending 1M emails => annoys people and likely hurts your spam reputation (=> harder to send emails)
- brute-force protection?

### XSS

- Reflected XSS: displaying unescaped user input
- Encoding context: HTML, JS, CSS, … have different escape sequences for the same character (e.g. <) – look at where they are mixed
- Check the encoding consistency – manual encoding, omitting some characters
- JS => load external resources, access cookies, manipulate the DOM
- Task: steal the authCookie via the search

### SQL injection

Error-based injection: when the DB helps us by telling us what is wrong -> use it to learn more and even to show some data.

- Ex.: http://hackyourselffirst.troyhunt.com/Make/10?orderby=supercarid <- supercarid is a column name
- orderby=(select * from userprofile) … learn about the DB structure, force an exception that shows the value, e.g. `(select top 1 cast(password as int) from userprofile)` => “Conversion failed when converting the nvarchar value ‘passw0rd …’”

Tips:

- think of SQL commands that disclose structure: sys.(tables, columns), system commands
- enumerate records by nesting queries: select the top X rows ascending, then the top 1 row from that descending
- write out how you think the query works / is being constructed internally
- cast things to invalid types to disclose values in error messages (or trigger an implicit cast, e.g. due to -1 …)

#### Defenses

- whitelist input data types (id=123 => only allow ints)
- enumerable values – check against an appropriate whitelist
- if the value is stored – who uses it, and how?
- making the query/insertion safe
- permissions: grant read-only permissions as much as possible; don't use an admin user from your web app

### Mobile apps

- Look at HTTP requests for sensitive data – credentials, account, …
- Apps may ignore certificate validation
- In your app: parameter tampering, auth bypass, direct object references
- Often weak: airlines, small-scale shops, fast food, …

Tips:

- certificate pinning – the app has the fingerprint of the server certificate hardcoded and doesn't trust even a “valid” MITM certificate (banks, Dropbox, …)

### CSRF

Cross-Site Request Forgery = make the user send a request => their auth cookie is included.

- An async Ajax request to another site is forbidden, but that doesn't apply to a normal POST

Protection: anti-forgery tokens

### Understanding framework disclosure

- http://www.shodanhq.com/ -> search for “drupal 7” -> pwn

How it is disclosed:

- headers
- familiar signs – jsessionid cookie for Java, …
- the default error and 404 responses may help to recognize the framework
- HTML code (reactid), “.do” for Struts
- implicit: order of headers (Apache vs. IIS), paths (capitalized?), response to an improper HTTP version/protocol
- => likely still possible to figure out the stack, but not possible to simply search for framework+version

### Session hijacking

Steal the authentication cookie => use it for illegitimate requests.

- Persisting auth/session over HTTP: cookie, URL (but the URL is insecure – it can be shared)
- Session/auth ID retrieval: insecure transport, referrer, stored in exceptions, XSS
- Factors limiting hijacking: short expiry, keyed to the client device / IP (but IPs may rotate, esp. on mobile devices => be very cautious)

## Day 2

### Cracking passwords

Password hashing:

- salt: so that 2 people choosing the same password will have different hashes => cracking costs # salts × # candidate passwords instead of just # candidate passwords

Cracking tips:

- character space
- dictionary: passw0rd, …
- mutations: manipulation and substitution of characters

Tip: use a password manager (1Password, LastPass, …).

GPU ~100× faster than CPU

#### Ex: Crack with hashcat

Common password dictionary + MD5-hashed passwords => crack:

`./hashcat-cli64.bin --hash-type=0 StratforHashes.txt hashkiller.com.dic` (a 23M-password dictionary) -> Recovered: 44,326/860,160 hashes [note: duplicates] in 4 min (speed 135.35k plains)

Q: What dictionary do we use? Do we apply any mutations to it?

### Account enumeration

= Does XY have an account?

- Multiple vectors (password reset, registering a new user with the same e-mail, …)
- Anti-automation: is there any?
- It may be inconsistent across vectors
- Does it matter? (depends on the privacy needs)
- How to “ask” the site, and how to identify positive and negative responses?
- Timing attacks: distinguish positive vs. negative responses by the difference in latency between the two

### HTTPS

Confidentiality, integrity, authenticity

Traffic hijacking: https://www.wifipineapple.com/ – a Wi-Fi hotspot with evil capabilities

- monitor probe requests (the phone looks for networks it knows), present yourself as one of those, and the phone connects automatically (if there is no encryption)
- Consider everything sent over HTTP to be compromised
- Look at HTTPS content embedded in untrusted pages (iframes, links) – e.g. a payment page embedded in an HTTP page

Links:

- HSTS Preload – tell Chrome and Firefox that your site should only ever be loaded over HTTPS – https://hstspreload.appspot.com/
- https://www.owasp.org/index.php/HTTP_Strict_Transport_Security header

### Content Security Policy header

- https://developer.chrome.com/extensions/contentSecurityPolicy
- See e.g. the headers of https://haveibeenpwned.com/
- without CSP, anything can be added to the page via a reflected XSS risk
- anything can be added to the DOM downstream (e.g. on a proxy)
- with CSP, the browser will only load resources you whitelist; any violations can be reported
- use e.g. https://report-uri.io/home/generate to create the policy, and the reports to watch for violations and fine-tune it

### SQL injection cont'd (yesterday: error-based)

#### Union-based SQLi

Modify the query to UNION in whatever other data you want and display it. Gets more data, faster, than error-based injection.

Ex.: http://hackyourselffirst.troyhunt.com/CarsByCylinders?Cylinders=V12 : V12 -> `V12' union select voteid, comments collate SQL_Latin1_General_CP1_CI_AS from vote--`

#### Blind Boolean (laborious)

Blind injection: we can't always rely on data being explicitly returned to the UI => ask a question, draw a conclusion about the data.
- Ex: http://hackyourselffirst.troyhunt.com/Supercar/Leaderboard?orderBy=PowerKw&asc=false -> orderBy => `case when (select count(*) from userprofile) > 1 then powerkw else topspeedkm end` (the order of the results reveals the answer)
- Extract an email: is the ASCII code of lowercase character #1 < the ASCII code of 'm'? (repeat, narrowing down each character)
- Automation: sqlmap

#### Time-based blind injection

When no useful output is returned, but yes/no responses differ significantly in how long they take, e.g. ask the DB to delay the OK response. MS SQL: `IF 'b' > 'a' WAITFOR DELAY '00:00:05'`

### Brute force attacks

- Are there any defences? Often not
- How are defences implemented?
  - block the requested resource
  - block the source IP
  - rate limit (by source IP)

### Automation

- penetration testing apps and services such as Netsparker, WhiteHatSec
- target identification: Shodan, Google dorks, random crawling
- think about the actions that adhere to a pattern – SQL injection, fuzzing (repeating a request while trying different values for fields – SQLi, …), directory enumeration
- automation can be used for good – test your site
- tip: make automated penetration testing (and perhaps static code analysis) part of your build pipeline
- Task: get the DB schema using sqlmap (see `python2.7 sqlmap.py --help`)

### Protection

- Intrusion Detection System (IDS) – e.g. Snort
- Web Application Firewall (WAF) – e.g. Cloudflare ($20/month)
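The SQL injection notes keep returning to the `orderby`/`orderBy` query-string parameter, and the Defenses list recommends whitelisting input data types and enumerable values. A column name cannot be sent as a bound query parameter, so a whitelist is the natural defense there, while ordinary values can be bound with placeholders. Below is a minimal sketch of both ideas, assuming Python's built-in `sqlite3` as a stand-in for the workshop's SQL Server database; the `supercar` table, its columns, `ORDER_BY_COLUMNS`, and `cars_by_make` are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical whitelist: accepted query-string values -> real column names.
ORDER_BY_COLUMNS = {
    "supercarid": "supercarid",
    "powerkw": "powerkw",
    "topspeedkm": "topspeedkm",
}


def cars_by_make(conn: sqlite3.Connection, make_id: str, order_by: str,
                 ascending: bool = True) -> List[Tuple]:
    # Whitelist the data type: the make id must be a plain integer.
    if not make_id.isdigit():
        raise ValueError("make id must be an integer")
    # Whitelist the enumerable value: column names cannot be bound as
    # parameters, so only known names are ever written into the SQL text.
    column = ORDER_BY_COLUMNS.get(order_by.lower())
    if column is None:
        raise ValueError(f"cannot order by {order_by!r}")
    direction = "ASC" if ascending else "DESC"
    sql = (
        "SELECT supercarid, makeid, powerkw, topspeedkm FROM supercar "
        f"WHERE makeid = ? ORDER BY {column} {direction}"
    )
    # The value is bound via the ? placeholder, never concatenated.
    return conn.execute(sql, (int(make_id),)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE supercar "
                 "(supercarid INTEGER, makeid INTEGER, powerkw INTEGER, topspeedkm INTEGER)")
    conn.executemany("INSERT INTO supercar VALUES (?, ?, ?, ?)",
                     [(1, 10, 500, 320), (2, 10, 450, 340), (3, 11, 600, 350)])
    print(cars_by_make(conn, "10", "PowerKw", ascending=False))
    try:
        cars_by_make(conn, "10", "(select * from userprofile)")
    except ValueError as err:
        print("rejected:", err)
```

With this shape, the error-based and blind payloads from the notes never reach the database: anything that is not a known column name or an integer is rejected before a query is built.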

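The Brute force attacks notes list rate limiting by source IP as one common defence. Here is a minimal sliding-window sketch of that idea in Python; the class name `PerIpRateLimiter` and its parameters are hypothetical, and state is kept in memory of a single process. As the session hijacking notes point out, IP addresses can rotate (and many users may share one), so a per-IP limit works best as one layer among several.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerIpRateLimiter:
    """Allow at most max_attempts per window_seconds for each source IP."""

    def __init__(self, max_attempts: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        attempts = self._attempts[ip]
        # Drop timestamps that have slid out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False  # over the limit: reject without recording the attempt
        attempts.append(now)
        return True


if __name__ == "__main__":
    limiter = PerIpRateLimiter(max_attempts=5, window_seconds=60)
    for i in range(7):
        print(i, limiter.allow("203.0.113.7"))
```

Rejected attempts are not recorded, so an IP regains access once its earlier attempts age out of the window.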
June 27, 2015

by Jakub Holý
