JavaScript Resources

The Latest JavaScript Topics

The UUID Discussion

UUID really start coming in handy is when you start synchronizing data across servers.

July 3, 2015

by Lieven Doclo

· 26,838 Views

Mapping complex JSON structures with JDK8 Nashorn

How can you map a complex JSON structure to another JSON structure in Java? I think there are a few possible solutions in Java. The first solution is to use a serialization framework like Jackson, GSON or smart-json. The mapping is a piece of awkward Java code with a lot of if-else conditions. The result is hard to test and hard to maintain. Schematic it looks like this: JSON -> Java objects -> Mapping -> Java objects -> JSON An second approach is to use a templating framwork (like Freemarker or Velocity) in combination with a serialization framwork. The logic of the mapping has moved to the template. Schematic it looks like this: JSON -> Java objects -> Apply template -> JSON One of the issues with this approach is that the template must enforce that the result is a valid JSON structure. I have tried this approach and it is really hard to produce a valid JSON structure in all use cases. You could also map your JSON to XML and create the mapping with an XSL transformations. Schematic it looks like this: JSON -> XML -> XSL transformation -> XML -> JSON But the ideal schema looks like this: JSON -> Mapping -> JSON With JDK 8 and the Nashorn Javascript engine this becomes possible! This implementation provides JSON.parse() and JSON.stringify() by default. Example Javascript: function convert(val) { var json = JSON.stringify(val); var g = JSON.parse(json); var d = { chunkId: g.chunk.id, timestamp: g.chunk.timestamp }; return JSON.stringify(d); } Java code: private ScriptEngineManager engineManager; private ScriptEngine engine; public MyConverter() { ClassPathResource resource = new ClassPathResource("/converter.js"); InputStreamReader reader = new InputStreamReader(resource.getInputStream()); engineManager = new ScriptEngineManager(); engine = engineManager.getEngineByName("nashorn"); engine.eval(reader); } public String convert(String val){ return (String) engine.eval("convert(" + source + ")"); } I think this is -at this moment- the best approach, Java 9 will ship with native JSON support. Perhaps it will become more easier in the future. More info can be found on my blog.

July 2, 2015

by Jethro Bakker

· 1,482 Views

Azure Service Bus – As I Understand It: Part II (Queues & Messages)

continuing from my previous post about azure service bus, in this post i will share my learning about queues & messages. the focus of this post will be about some of the undocumented things i found as we implemented support for queues and messages in cloud portam . queues as mentioned in my previous post, queues is the simplest of the azure service bus service and kind of compares with azure storage queue service in the sense that it provides a unidirectional messaging infrastructure where a publisher publishes a message and the message is received by a receiver. there can be many receivers ready to receive the messages however one receiver can only receive a message. no two receivers can receive a single message simultaneously. now some learning about queues. queue name a queue name can be up to 260 characters in length and can contain letters, numbers, periods (.), hyphens (-), and underscores (_) . a queue name is case-insensitive. queue size when creating a queue, you must define the size of the queue. queue size could be one of the following values: 1 gb, 2 gb, 3 gb, 4 gb or 5 gb . a queue size can’t be changed once the queue is created. however if you create a “ partition enabled queue ” then service bus creates 16 partitions thus your queue size is automatically multiplied by 16 and your queue size becomes 16 gb, 32 gb, 48 gb, 64 gb or 80 gb depending on the size you selected (this confused me initially :)). queue properties a service bus queue has many properties. some of the properties can only be set during queue creation time while some of the properties can only be set if you are using “standard” tier of service bus. (above are the screenshots from cloud portam for creating a queue) status indicates the status of a queue – active or disabled . once a queue is disabled, it cannot send or receive messages. max delivery count (maxdeliverycount) indicates the maximum number of times a message can be delivered . once this count has exceeded, message will either be removed from the queue or dead-lettered. the way i understand it is this property is used to manage poison messages. if a message is not processed successfully by receivers for “x” number of times, just move it somewhere else for further inspection or remove it. message time to live (messagettl) indicates a time span for which a message will live inside a queue . if the message is not processed by that time, it will either be removed or dead-lettered. one interesting thing i noticed is that if you’re using “standard” tier, a message could live forever in a queue however in “basic” tier, a message can only live for a maximum of 14 days . lock duration (lockduration) indicates number of seconds for which a message will be locked by a receiver once it receives it so that no other receiver can receive that message . it essentially gives the receiver time to process the message. once this elapses, message will be available to be received by another receiver. maximum value for lock duration can be 5 minutes / 300 seconds . enable partitioning (enablepartitioning) indicates if the queue should be partitioned across multiple message brokers . as mentioned above, service bus automatically creates 16 partitions if this is enabled. this also results in maximum size of the queue increase by a factor of 16. this property can only be set during queue creation time . enable deadlettering (enabledeadlettering) indicates if the messages in the queue should be moved to dead-letter sub queue once they expire. if this property is not set, then the messages will be removed from the queue once they expire. enable batching (enablebatchedoperations) indicates if server-side batched operations are supported. this is used to improve the throughput of a queue as service bus holds the messages for up to 20ms before writing/deleting them in a batch. enable message ordering (supportordering) indicates if the queue supports ordering. requires duplicate detection (requiresduplicatedetection) indicates if the queue requires duplicate detection. this property can only be set during queue creation time and is only available for “standard” tier. enable express (enableexpress) indicates if the queue is an express queue. an express queue holds a message in memory temporarily before writing it to persistent storage. this property can only be set during queue creation time and is only available for “standard” tier. requires session (requiressession) indicates if the queue supports the concept of session. this property can only be set during queue creation time and is only available for “standard” tier. auto delete queue this property specifies a time period after which an idle queue should be deleted automatically by service bus . minimum period allowed is 5 minutes. this can only be set for “standard” tier . duplicate detection history time window (duplicatedetectionhistorytimewindow) defines the duration of the duplicate detection history. this can only be set for “standard” tier . forward messages to queue/topic (forwardto) you can use this property to automatically forward messages from a queue to another queue or topic. when setting this property, the queue/topic must exist in the account. this can only be set for “standard” tier . forward dead-lettered messages to queue/topic (forwarddeadletteredmessagesto) you can use this property to automatically forward dead-lettered message to another queue or topic. when setting this property, the queue/topic must exist in the account. user metadata (usermetadata) you can use this property to define any custom metadata for a queue. following table summarizes property applicability by tier and whether they are editable or not. property tier editable? size basic, standard no status basic, standard yes max delivery count basic, standard yes message time to live basic, standard yes lock duration basic, standard yes enable partitioning basic, standard no enable deadlettering basic, standard yes enable batching basic, standard yes enable message ordering basic, standard yes requires duplicate detection standard no enable express standard no require session standard no auto delete queue standard yes duplicate detection history time window standard yes forward messages to queue/topic standard yes forward dead-lettered messages to queue/topic basic, standard yes user metadata basic, standard yes to learn more about these properties, please see this link: https://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.queuedescription.aspx . messages the way i see it, messages are the entities that contain information about the work a sender wants a receiver to do. as mentioned earlier, a sender sends a message to a queue and a receiver will receive the message. at any time, a message will be received by one and only one receiver. message processing there’re two ways by which a receiver will receive a message: peek and lock & receive and delete . peek and lock in peek and lock mode, the message is locked by the receiver for a duration specified by queue’s “ lock duration ” property or in other words under this mode a message is hidden from other receivers for a duration specified by lock duration. the receiver then would process the message and after that a receiver would mark the message as “ complete ” which essentially deletes the message from the queue. if the “lock duration” expires, other receivers will be able to fetch this message. receive and delete in receive and delete mode, once the message is received by a receiver it will be deleted from the queue automatically. if a receiver fails to process that message, then the message is lost forever. so unless you’re sure of receiver’s functionality that it will never fail or you don’t care if the message is processed successfully or not, use this mode cautiously. message composition a message in service bus consists of 3 things – message body, standard properties and custom properties. message body is the actual content of the message. there are some predefined properties of a message and those fall under standard properties. apart from that you can define custom properties on a message which are essentially a collection of name/value pairs. total size of a message is 256 kb. message properties now let’s take a look at some of the standard properties of a message that i found interesting. message id this is the identifier of a message. you can set it at the time of sending a message. because it is an identifier, one would assume that it needs to be unique but that’s not the case. different messages can have same message id. sequence number when a message is created, service bus assigns a number to a message. that number is stored in this property. please note that it is a read-only property. message time to live (message ttl) this is the time period for which a message will remain in the queue. if you recall, you can also define a default message time-to-live at queue level also. service bus actually picks the lower of the two values as message ttl. for example, if you have defined that a message will expire after 14 days at queue level but 5 minutes at the message level then the message will expire after 5 minutes. lock token whenever a message is received by a receiver in “ peek and lock ” mode, service bus returns a (lock) token that must be used to perform further operations (e.g. delete message or dead-letter message etc.) on that message. this token is valid for a duration specified by “ lock duration ” property. after the lock duration expires, the lock token becomes invalid and any attempt to use this token for performing any allowed operations will result in an error. once a lock token expires, a receiver must receive the message again. there are other properties as well which i have not included for the sake of brevity. for a complete list of properties, please see this link: https://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.brokeredmessage_properties.aspx . summary that’s it for this post. in the next posts in this series, i will share my learning about topics and other service bus services. so stay tuned for that! again, if you think that i have provided some incorrect information, please let me know and i will fix them asap.

July 2, 2015

by Gaurav Mantri

· 8,625 Views

Captains with Benefits

When it comes to teaching or learning, video streaming is something that still frightens people away. As a matter of fact that video chats and webinars have been around for a relatively long time, however; its still hard to encourage an individual or business to take part as such. And yet the benefits of CaptainLive can be substantial in both, short as well as long term. As we have already seen the benefits of video marketing therefore, we want to encourage you to use CaptainLive in order to take advantage of your potential whether it’s hidden in you or you are well aware of it. CaptainLive was launched in early 2015 with a mission to connect people in need of knowledge and skills with Captains with Benefits that are willing to share and give their expertise and mentor skills. CaptainLive’s integrated service now allows for text, video and audio conferencing. It’s been used by a variety of individuals with different backgrounds. At CaptainLive you can schedule an online live video stream with the experts in number topics ranging from counseling up to entertainment. Captains/Experts on the site charges from $5 USD up to $150 USD, most of which offer free 5 minute sessions with no obligation to book their session thereafter. Who knows you might end up registering as Captain yourself and start a part time business of your own to help others with your skills while making a healthy stream of income for yourself, it’s surely well worth your effort.

July 1, 2015

by Peter Watson

· 828 Views

Getting Started with D3.js

Originally authored by Elizabeth Engel You are thinking about including some nice charts and graphics into your current project? Maybe you heard about D3.js, as some people claim it the universal JavaScript visualization framework. Maybe you also heard about a steep learning curve. Let’s see if this is really true! First of all, what is D3.js? D3.js is an open source JavaScript framework written by Mike Bostock helping you to manipulate documents based on data. Okay, let’s first have a look at the syntax Therefore, lets look at the following hello world example. It will append an element saying ‘Hello World!’ to the content element. As you can see the syntax is very similar to frameworks like JQuery and obviously, it saves you a lot of code lines as it offers a nice fluent API. But let’s see how we can bind data to it: d3.select('#content') .selectAll('h1') .data(['Sarah', 'Robert', 'Maria', 'Marc']) .enter() .append('h1') .text(function(name) {return 'Hello ' + name + '!'}); What happens? The data function gets our names array as parameter and for each name we append an element with a personalized greeting message. For a second, we ignore the selectAll(‘h1′) and enter() method call, as we will explore them later. Looking into the browser we can see the following: Hello Sarah! Hello Robert! Hello Maria! Hello Marc! Not bad for a start! Inspecting the element in the browser, we see the following generated markup: [...] Hello Sarah! Hello Robert! Hello Maria! Hello Marc! [...] This already shows one enourmous advantage of D3.js: You acctually see the generated code and can spot errors easily. Now, let’s have a closer look at the data-document connection As mentioned in the beginning, D3.js helps you to manipulate documents based on data. Therefore, we only take care about handing the right data over to D3.js, so the framework can do the magic for us. To understand how D3.js handles data, we’ll first have a look at how data might change over time. Let’s take the document from our last example. Every name is one data entry. Easy. Now let’s assume new data comes in: As new data is coming in, the document needs to be updated. The entries of Robert and Maria need to be removed, Sarah and Marc can stay unchanged and Mike, Sam and Nora need a new entry each. Fortunately, using D3.js we don’t have to care about finding out which nodes need to be added and removed. D3.js will take care about it. It will also reuse old nodes to improve performance. This is one key benefit of D3.js. So how can we tell D3.js what to do when? To let D3.js update our data, we initially need a data join, so D3.js knows our data. Therefore, we select all existing nodes and connect them with our data. We can also hand over a function, so D3.js knows how to identify data nodes. As we initally don’t have nodes, the selectAll function will return an empty set. var textElements = svg.selectAll('h1').data(data, function(d) { return d; }); After the first iteration, the selectAll will hand over the existing nodes, in our case Sarah, Robert, Marc and Maria. So we can now update these existing nodes. For example, we can change their CSS class to grey: textElements.attr({'class': 'grey'}); Additionally, we can tell D3.js what to do with entering nodes, in our case Mike, Sam and Nora. For example, we can add an element for each of them and set the CSS class to green for each of them: textElements.enter().append('h1').attr({'class': 'green'}); As D3.js now updated the old nodes and added the new ones, we can define what will happen to both of them. In our cases this will affect the nodes of Mike, Sarah, Sam, Mark and Nora. For example, we can rotate them: textElements.attr({'transform', rotate(30 20,40)}); Furthermore, we can specify what D3.js will do to nodes like Robert and Maria, that are not contained in the data set any more. Let’s change their CSS class to red: textElements.exit().attr({'class': 'red'}); You can find the full example code to illustrate the data-document connection of D3.js as JSFiddle here: https://jsfiddle.net/q5sgh4rs/1/ But how to visualize data with D3.js? Now that we know about the basics of D3.js, let’s go to the most interesting part of D3.js: drawing graphics. To do so, we use SVG, which stands for scalable vector graphics. Maybe you already know it from other contexts. In a nutshell, it’s a XML-based vector image language supporting animation and interaction. Fortunately, we can just add SVG tags in our HTML and all common browsers will display it directly. This also facilitates debugging, as we can inspect generated elements in the browser. In the following, we see some basic SVG elements and their attributes: To get a better understanding of how SVG looks like, we’ll have a look at it as a basic example of SVG code, generating a rectangle, a line and a circle. To generate the same code using D3.js, we need to add an SVG to our content and then append the tree elements with their attributes like this: var svg = d3.select('#content').append('svg'); svg.append('rect').attr({x: 10, y: 15, width: 60, height: 20}); svg.append('line').attr({x1: 85, y1: 35, x2: 105, y2: 15}); svg.append('circle').attr({cx: 130, cy: 25, r: 6}); Of course, for static SVG code, we wouldn’t do this, but as we already saw, D3.js can fill attributes with our data. So we are now able to create charts! Let’s see how this works: This will draw our first bar chart for us! Have a look at it: https://jsfiddle.net/tLhomz11/2/ How to turn this basic bar chart into an amazing one? Now that we started drawing charts, we can make use of all the nice features D3.js offers. First of all, we will adjust the width of each bar to fill the available space by using a linear scale, so we don’t have to scale our values by hand. Therefore, we specify the range we want to get values in and the domain we have. In our case, the data is in between 0 and 200 and we would like to scale it to a range of 0 to 400, like this: var xScale = d3.scale.linear().range([0, 400]).domain([0,200]); If we now specify x values, we just use this function and get an eqivalent value in the right range. If we don’t know our maximum value for the domain, we can use the d3.max() function to calculate it based on the data set we want to display. To add an axis to our bar chart, we can use the following function and call it on our SVG. To get it in the right position, we need to transform it below the chart. [svg from above].call(d3.svg.axis().scale(xScale).orient("bottom")); Now, we can also add interaction and react to user input. For example, we can give an alert, if someone clicks one our chart: [svg from above].on("click", function () { alert("Houston, we get attention here!"); }) Adding a text node for each line, we get the following chart rendered in the browser: If you would like to play around with it, here is the code: https://jsfiddle.net/Loco5ddt/ If you would like to see even more D3.js code, using the same data to display a pie chart and adding an update button, look at the following one: https://jsfiddle.net/4eqzyquL/ Data import Finally, we can import our data in CSV, TSV or JSON format. To import a JSON file, for example, use the following code. Of course, you can also fetch your JSON via a server call instead of importing a static file. d3.json("data.json", function(data) { [access your data using the data variable] } What else does D3,js offer? Just to name a few, D3.js helps you with layouts, geometry, scales, ranges, data transformation, array and math functions, colors, time formating and scales, geography, as well as drag & drop. There are a lot of examples online: https://github.com/mbostock/d3/wiki/Gallery TL;DR + based on web standards + totally flexible + easy to debug + many, many examples online + Libaries build on D3.js (NVD3.js, C3.js or IVML) – a lot of code compared to other libraries – for standard charts too heavy Learning more As this blog post is based on a presentation held at the MunichJS Meetup, you can find the original slides here: http://slides.com/elisabethengel/d3js#/ The recording is available on youTube: https://www.youtube.com/watch?v=EYmJEsReewo For further information, have a look at: https://github.com/mbostock/d3/wiki https://github.com/mbostock/d3/wiki/API-Reference http://bost.ocks.org/mike/ Getting Started with D3 by Mike Dewar Interactive Data Visualization for the Web by Scott Murray (online for free) https://github.com/alignedleft/d3-book If you have any questions or suggestions, feel free to comment or write me: [email protected]

June 30, 2015

by Comsysto Gmbh

· 6,436 Views · 3 Likes

Writing a Download Server Part I: Always Stream, Never Keep Fully in Memory

Downloading various files (either text or binary) is a bread and butter of every enterprise application. PDF documents, attachments, media, executables, CSV, very large files, etc. Almost every application, sooner or later, will have to provide some form of download. Downloading is implemented in terms of HTTP, so it's important to fully embrace this protocol and take full advantage of it. Especially in Internet facing applications features like caching or user experience are worth considering. This series of articles provides a list of aspects that you might want to consider when implementing all sorts of download servers. Note that I avoid "best practices" term, these are just guidelines that I find useful but are not necessarily always applicable. One of the biggest scalability issues is loading whole file into memory before streaming it. Loading full file into byte[] to later return it e.g. from Spring MVC controller is unpredictable and doesn't scale. The amount of memory your server will consume depends linearly on number of concurrent connections times average file size - factors you don't really want to depend on so much. It's extremely easy to stream contents of a file directly from your server to the client byte-by-byte (with buffering), there are actually many techniques to achieve that. The easiest one is to copy bytes manually: @RequestMapping(method = GET) public void download(OutputStream output) throws IOException { try(final InputStream myFile = openFile()) { IOUtils.copy(myFile, output); } } Your InputStream doesn't even have to be buffered, IOUtils.copy() will take care of that. However this implementation is rather low-level and hard to unit test. Instead I suggest returning Resource: @RestController @RequestMapping("/download") public class DownloadController { private final FileStorage storage; @Autowired public DownloadController(FileStorage storage) { this.storage = storage; } @RequestMapping(method = GET, value = "/{uuid}") public Resource download(@PathVariable UUID uuid) { return storage .findFile(uuid) .map(this::prepareResponse) .orElseGet(this::notFound); } private Resource prepareResponse(FilePointer filePointer) { final InputStream inputStream = filePointer.open(); return new InputStreamResource(inputStream); } private Resource notFound() { throw new NotFoundException(); } } @ResponseStatus(value= HttpStatus.NOT_FOUND) public class NotFoundException extends RuntimeException { } Two abstractions were created to decouple Spring controller from file storage mechanism.FilePointer is a file descriptor, irrespective to where that file was taken. Currently we use one method from it: public interface FilePointer { InputStream open(); //more to come } open() allows reading the actual file, no matter where it comes from (file system, database BLOB, Amazon S3, etc.) We will gradually extend FilePointer to support more advanced features, like file size and MIME type. The process of finding and creatingFilePointers is governed by FileStorage abstraction: public interface FileStorage { Optional findFile(UUID uuid); } Streaming allows us to handle hundreds of concurrent requests without significant impact on memory and GC (only a small buffer is allocated in IOUtils). BTW I am using UUID to identify files rather than names or other form of sequence number. This makes it harder to guess individual resource names, thus more secure (obscure). More on that in next articles. Having this basic setup we can reliably serve lots of concurrent connections with minimal impact on memory. Remember that many components in Spring framework and other libraries (e.g. servlet filters) may buffer full response before returning it. Therefore it's really important to have an integration test trying to download huge file (in tens of GiB) and making sure the application doesn't crash. Writing a download server Part I: Always stream, never keep fully in memory Part II: headers: Last-Modified, ETag and If-None-Match Part III: headers: Content-length and Range Part IV: Implement HEAD operation (efficiently) Part V: Throttle download speed Part VI: Describe what you send (Content-type, et.al.) The sample application developed throughout these articles is available on GitHub.

June 24, 2015

by Tomasz Nurkiewicz

· 17,205 Views

Think about the last time something really aggravated you, whether it was a slow Internet connection, long store lines, or a rude cashier. Did you vow to go home and call an 800 number, punch through a bunch of option keys, and wait to talk to a customer service rep? Or did you take out your smart phone and hammer out your frustrations in 140 characters or less? Odds are you’re like the millions of consumers who express their grievances with friends, family, and colleagues on Twitter, Facebook, YouTube, or a host of other social sites. With more than 230 million people on Twitter and a billion or more on Facebook, companies now understand the importance of providing customer service over social media. According to a 2014 Forrester report, 62 percent of businesses believe they will lose ground if they don’t adopt social customer service technologies. Companies slow to embrace social media for customer service, also known as social care, are missing an opportunity to build their brands and customer loyalty. Ignoring customer problems on social media can spark a raging ﬁre of discontent. But by connecting with customers on social media, you can quickly respond and resolve issues in front of thousands of other prospective clients. A study by the International Customer Management Institute shows 61 percent of consumers who received social care were more satisﬁed with their support. And 58 percent said social care increased their customer loyalty. Are you ready to advance your customer support program to the social community or are you willing to sit on hold while your customers begin looking elsewhere? We’ve created an eBook, “Social Customer Care: How to Use Social Media to Improve Customer Support,” to explore the reasons for adding social care to your customer service programs. It also provides tips from industry experts on how to get there. Like this post? Click here to subscribe to our blog and receive the latest content on social learning, customer support, sales enablement, or all three.

June 22, 2015

by Bloomfire Marketing

· 1,000 Views · 1 Like

Regular Expressions Denial of the Service (ReDOS) Attacks: From the Exploitation to the Prevention

autors :michael hidalgo, dinis cruz introduction when it comes to web application security, one of the recommendations to write software that is resilient to attacks is to perform a correct input data validation. however, as mobile applications and apis (application programming interface) proliferates, the number of untrusted sources where data comes from goes up, and a potential attacker can take advantage of the lack of validations to compromise our applications. regular expressions provides a versatile mechanism to perform input data validation. developers use them to validate email addresses, zip codes, phone numbers and many other task that are easily implemented thought them. unfortunately most of the time software engineers don't fully understand how regular expressions works in the background and by choosing a wrong regular expression pattern they can introduce a risk in the application. in this article we are going to discuss about the so called regular expression denial of the service (redos) vulnerability and how we can identify this problems early in the software development life cycle (sdlc) stages by enforcing a culture focused on unit testing. hardware features for this article in order to provide information about execution time, performance, cpu utilisation and other facts, we are relying on virtual machine that uses windows 7 32-bit operating system, 5.22 gb ram. intel(r) core (tm) it-3820qm cpu @2.7 ghz. we are also using 4 cores. understanding the problem. the owasp foundation (2012) defines a regular regular expression denial of service attack as follows: "the regular expression denial of service (redos) is a denial of service attack, that exploits the fact that most regular expression implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size). an attacker can then cause a program using a regular expression to enter these extreme situations and then hang for a very long time." although a broad explanation about regular expression engines is out of the scope of this article,it is important to understand that, according to stubblebine,t (regular expressions pocket reference), a pattern matching consist of finding a section of text that is described (matched) by a regular expression. two main rules are used to match results: the earliest (leftmost) wins : the regular expression is applied to the input starting at the first character and moving toward the last. as soon as the regular expression engine finds a match,it returns. standard quantifiers are greedy : according to stubblebine, "quantifiers specify how many times something can be repeated. the standard quantifiers attempt to match as many times as possible. the process of giving up characters and trying less-greedy matches is called backtracking." for this article we are focused a regular expression engine called nondeterministic finite automaton (nfa).this engines usually compare each element of the regex to the input string, keeping track of positions where it chose between two options in the regex. if an option fails, the engine backtracks to the most recently saved position.(stubblebine,t 2007). it is important to note that this engine is also implemented in .net, java, python, php and ruby on rails. this article is focused on c# and therefore we are relying on the microsoft .net framework system.text.regularexpression classes which at the heart uses nfa engines. according to bryan sullivan "one important side effect of backtracking is that while the regex engine can fairly quickly confirm a positive match (that is, an input string does match a given regex), confirming a negative match (the input string does not match the regex) can take quite a bit longer. in fact, the engine must confirm that none of the possible “paths” through the input string match the regex, which means that all paths have to be tested. with a simple non-grouping regular expression, the time spent to confirm negative matches is not a huge problem." in order to illustrate the problem, let's use this regular expression (\w+\d+)+c which basically performs the following checks: between one and unlimited times, as many times as possible, giving back as needed. \w+ match any word character a-za-z0-9_ . \d+ match a digit 0-9 matches the character c literally (case sensitive) so matching values are 12c,1232323232c and !!!!cd4c and non matching values are for instance !!!!!c,aaaaaac and abababababc . the following unit test was created to verify both cases. const string regexpattern = @"(\w+\d+)+c"; public void testregularexpression() { var validinput = "1234567c"; var invalidinput = "aaaaaaac"; regex.ismatch(validinput, regexpattern).assert_is_true(); regex.ismatch(invalidinput, regexpattern).assert_is_false(); } execution time : 6 milliseconds now that we've verified that our regular expression works well, let's write a new unit test to understand the backtracking problem and the performance effects. note that the longer the string, the longer the time the regular expression engine will take to resolve it. we will generate 10 random strings, starting at the length of 15 characters, incrementing the length until get to 25 characters,and then we will see the execution times. const string regexpattern = @"(\w+\d+)+c"; [testmethod] public void isvalidinput() { var sw = new stopwatch(); int16 maxiterations = 25; for (var index = 15; index < maxiterations; index++) { sw.start(); //generating x random numbers using fluentsharp api var input = index.randomnumbers() + "!"; regex.ismatch(input, regexpattern).assert_false(); sw.stop(); sw.reset(); } } now let's take a look at the test results: random string character length elapsed time (ms) 360817709111694! 16 16ms 2639383945572745! 17 23ms 57994905459869261! 18 50ms 327218096525942566! 19 106ms 4700367489525396856! 20 207ms 24889747040739379138! 21 394ms 156014309536784168029! 22 795ms 8797112169446577775348! 23 1595ms 41494510101927739218368! 24 3200ms 112649159593822679584363! 25 6323ms by looking at this results we can understand that the execution time (total time to resolve the input text against the regular expression) goes up exponentially to the size of the input. we can also see that when we append a new character, the execution time almost duplicates. this is an important finding because shows how expensive this process is, if we do not have a correct input data validation we can introduce performance issues in our application. a real-life use-case and an appeal for a unit testing approach now that we have seen the problems we can face by selecting a wrong (evil) regular expression, let's discuss about a realistic scenario where we need to validate input data thought regular expressions. we strongly believe that unit testing techniques can not only help to write quality code but also we can use them to find vulnerabilities in the code we are writing. by writing unit test that performs security checks (like input data validation) a common task in web applications consist on request an email address to the user signing in our application. from a ux (user experience perspective) complaining browsers support friendly error messages when an input, that was supposed to be an email address, does not match with the requirements in terms of format. here is a ui validation when a input textbox (with the email type is set) and the value is not a valid email address. however relying on a ui validation is not longer enough. an eavesdropper can easily perform an http request without using a browser (namely by using a proxy to capture data in transit) and then send a payload that can compromise our application. in the following use case, we are using a backend validation for the email address by using a regular expression. we will show you the real power of regular expressions here, we are not only testing that the regular expression validates the input but also how it behaves when it receives any arbitrary input. we are using this evil regular expression to validate the email: ^( 0-9a-za-z @([0-9a-za-z][-\w][0-9a-za-z].)+[a-za-z]{2,9})$ . with the following test we are verifying that a valid email and invalid emails formats are correctly processed by the regular expression, which is the functional aspect from a development point of view. const string emailregex = @"^([0-9a-za-z]([-.\w]*[0-9a-za-z])*@([0-9a-za-z][-\w]*[0-9a-za-z]\.)+[a-za-z]{2,9})$"; [testmethod] public void validateemailaddress() { var validemailaddress = "[email protected]"; var invalidemailaddress = new string[] { "a", "abc.com", "1212", "aa.bb.cc", "aabcr@s" }; regex.ismatch(validemailaddress, emailregex).assert_is_true(); //looping throught invalid email address foreach (var email in invalidemailaddress) { regex.ismatch(email, emailregex).assert_is_false(); } } elapsed time: 6ms. so both cases are validate correctly. one could state that both scenarios supported by the unit test are enough to select this regular expression for our input data validations. however we can do a more extensive testing as you'll see. the exploit so far the previous regular expression selected to valid an email address seems to work well, we have added some unit test that verifies valid an invalid inputs. but how does it behaves when we send an arbitrary input?, from a variable length, do we face a denial of the service attack?. this kind of questions can be solved wit unit testing technique like this one: const string emailregex = @"^([0-9a-za-z]([-.\w]*[0-9a-za-z])*@([0-9a-za-z][-\w]*[0-9a-za-z]\.)+[a-za-z]{2,9})$"; [testmethod] public void validateemailaddress() { var validemailaddress = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!"; var watch = new stopwatch(); watch.start(); validemailaddress.regex(emailregex).assert_is_false(); watch.stop(); console.writeline("elapsed time {0}ms", watch.elapsedmilliseconds); watch.reset(); } **elapsed time : ~23 minutes (1423127 milliseconds).** results are disturbing. we can clearly see the performance problem introduced by evaluating the given input.it takes roughly 23 minutes to validate the input given the hardware characteristics described before. in the following images you will see the cpu behaviour when running this unit test. here is another cpu utlization: and this is another image from the cpu utilization while the test is running. fuzzing and unit testing: a perfect combination of techniques in the previous unit test we found that a given input string can lead to have denial of the service issue in our application. note that we didn't need an extreme large payload, in our scenario 34 characters can illustrate this problem or even less. when using any regular expression it is recomendable to always test it against unit testing to cover most of the possible ways a user (which can be a potential attacker) can send. here is where we can use fuzzing. tobias klein in his book a bug hunter's diary a guide tour throught the wilds of sofware security defines fuzzing as "a complete different approach to bug hunting is known as fuzzing. fuzzing is a dynamic-analysis technique that consist of testing an application by providing it with malformed or unexpected input. then klein continues adding that: "it isn't easy to identify the entry points of such complex applications, but complex software often tends to crash while processing malformed input data. page 05" mano paul in his book official (isc)2 guide to the csslp talking about fuzzing states that: "also known as fuzz testing or fault injection testing, fuzzing is a brute-force type of testing in which faults (random and pseudo-random input data) are injected into the software and it's behaviour is observed. it is a test whose results are indicative of the extended and effectiveness of the input validation.page 336". taking previous definitions into consideration, we are going to implement a new unit test that can allow us to generate random input data and test our regular expression. in this case, we are using this email regular expression "^[\w-.]{1,}\@([\w]{1,}.){1,}[a-z]{2,4}$"; and by doing an exhaustive testing we will see if we are not introducing a denial of the service problem. we want to make sure that the elapsed time to resolve if the random string matches the regular expression is evaluated in less than 3 seconds: const string emailregex = @"^[\w-\.]{1,}\@([\w]{1,}\.){1,}[a-z]{2,4}$"; //number of random strings to generte. const int maxiterations = 10000; [testmethod] public void fuzz_emailaddress() { //valid email should return true "[email protected]".regex(emailregex).assert_is_true(); //invalid email should return false "abce" .regex(emailregex).assert_is_false(); //testing maxiterations times for (int index = 0; index < maxiterations; index++) { //generating a random string var fuzzinput = (index * 5).randomstring(); var sw = new stopwatch(); sw.start(); fuzzinput.regex(emailregex).assert_is_false(); //elapsed time should be less than 3 seconds per input. sw.elapsed.seconds().assert_size_is_smaller_than(3); } } under the hardware features described before, this test passes. considering that we are using this computation (index * 5), the largest string generate is of 49995 character (which is 9999 *5). having said that we were able to test a large string against the regular expression and we confirmed that even thought it is quite large input value, the time involved to verify if it was or not a valid email, it was less than 3 seconds. now assuming that a check for the length of the email in the first place, it will guarantee that a malicious user can't inject a large payload in our application. countermeasures provided in microsoft .net 4.5 and upper if you are developing applications in microsoft .net 4.5 then you can take advantage of a new implementation on top of the ismatch method from the regex class . starting from .net 4.5 the ismatch method provides an overload that allows you to enter a timeout. note that this overload is not available in .net 4.0 . this new parameter is called matchtimeout and according to microsoft : "the matchtimeout parameter specifies how long a pattern matching method should try to find a match before it times out. setting a time-out interval prevents regular expressions that rely on excessive backtracking from appearing to stop responding when they process input that contains near matches. for more information, see best practices for regular expressions in the .net framework and backtracking in regular expressions . if no match is found in that time interval, the method throws a regexmatchtimeoutexception exception. matchtimeout overrides any default time-out value defined for the application domain in which the method executes." taken from here . we've written a new unit test where we're using a regular expression that we know can lead to denial of the service. in this case we'll test an email address that previously generated a significant side effect in the performance of the application. we'll see then how we can reduce the impact of this process by setting up a timeout. const string emailregexpattern = @"^([0-9a-za-z]([-.\w]*[0-9a-za-z])*@([0-9a-za-z][-\w]*[0-9a-za-z]\.)+[a-za-z]{2,9})$"; [testmethod] public void validateemailaddress() { var emailaddress = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!"; var watch = new stopwatch(); watch.start(); //timeout of 5 seconds try { regex.ismatch(emailaddress, emailregexpattern, regexoptions.ignorecase, timespan.fromseconds(5)); } catch (exception ex) { ex.message.assert_not_null(); ex.gettype().assert_is(typeof(regexmatchtimeoutexception)); } finally { watch.stop(); watch.elapsed.seconds().assert_size_is_smaller_than(5); watch.reset(); } } running this test in visual studio we can confirm it passes, which means that the backtracking mechanism is taking longer than 5 seconds to resolve. it will throw a regexmatchtimeoutexception exception indicating that it might take longer than 5 seconds to evaluate the input. ideally one would expect this process to take less than a second, however several conditions or requirements might lead to allow a timeout in seconds. note how this model provides a very needed defensive programming style where the software engineers make informed decisions on the code they write, in this case we can establish the next steps when our method times and that way we can decrease any denial of the service attack. final thoughts no one size fits all is so cliché that has to be true. we are not sure if the regular expressions you are currently using in your applications are vulnerable to this attack. what we can do for sure is to show you how you can take advantage of unit testing to write secure code. when we write code we want to make sure that each single line of code is covered by a unit testing, which at the end of the day will guarantee early detections of error. however if we can combine this exercise with the adoption and implementation of test that can also try to attack/compromise the application (and we are not talking about anything fancy) like sending random strings, using fuzzing techniques, using combination of characters, exceeding the expected length, we will be helping to write software that is resilient to attacks. as a recommendation always test your regular expressions agains uni test, make sure that they are resilient to the attack we have covered in this article and if you are able to identify those problematic patterns out there, do a contribution and report them so we are not introduce them in the software we write. references 1.cruz,dinis(2013) the email regex that (could had) dosed a site. 2.hollos,s. hollos,r (2013) finite automata and regular expressions problems and solutions. 3.kirrage,j. rathnayake , thielecke, h.: static analysis for regular expression denial-of-service attacks. university of birmingham, uk 4.klein, t. a bug hunter's diary a guided tour through the wilds of software security (2011). 5.the owasp foundation (2012) regular expression denial of service - redos. 6.stubblebine, t(2007) regular expression pocket reference, second edition. 7.sullivan, b (2010) regular expression denial of service attacks and defenses

June 7, 2015

by Michael Hidalgo

· 34,269 Views · 5 Likes

What are the Benefits of Node.js?

What is Node.js? Ryan Dahl, and other developers, at Joyent created Node.js. Node.js is an open source, cross-platform runtime environment for server-side and networking applications. It brings event-driven programming to web servers enabling development of fast web servers in Javascript. In an event-driven application, there is a main loop that listens for events, and then triggers a callback function when one of those events is detected. Node.js also provides a non-blocking I/O API that optimizes an application's throughput and scalability. In a non-blocking language, commands execute in parallel, and use callbacks to signal completion. In a blocking language, commands execute only after the previous command has completed. Node.js uses the Google V8 JavaScript engine to execute code, and a large percentage of the basic modules are written in JavaScript. Node.js contains a built-in library to allow applications to act as a Web server without software such as Apache HTTP Server or IIS.NPM is the pre-installed package manager for the Node.js server platform. It is used to install Node.js programs from the npm registry. The package manager allows publishing and sharing of open-source Node.js libraries by the community, and simplifies installation, updating and un-installation of libraries. What are some of the Benefits of Node.js? 1. Asynchronous I/O It's built to handle asynchronous I/O from the ground up and is a good match to a lot of common web- and network-development problems. In addition to fast JavaScript execution, the real magic behind Node.js is called the Event Loop. To scale to large volumes of clients, all I/O intensive operations in Node.js are performed asynchronously. 2. Javascript Node.js is Javascript. So the same language can be used on the backend and frontend. This means it breaks down the boundaries between front- and back-end development. 3. Community Driven In addition to it’s innate capabilities, Node.js has a thriving open source community which has produced many excellent modules to add additional capabilities to Node.js applications. One of the most famous is Socket.io, a module to manage persistent connections between client and server, enabling the server to push real-time updates to clients. Socket.io abstracts the technology used to maintain these connections away from the developer, automatically using the best technology available for a particular client (websockets if the browser supports it, JSONP or Ajax longpolling if not). References: https://blog.udemy.com/learn-node-js/ http://pettergraff.blogspot.com/2013/01/why-node.html

May 20, 2015

by Kenneth Peeples

· 47,258 Views · 1 Like

Swim Lane Diagrams in JavaScript

Learn about swim-lane diagrams to connect business processes and departments and apply the concept to the object-oriented world of javascript.

May 17, 2015

by Daniel Jebaraj

· 7,181 Views

Use RegEx to Test Password Strength in JavaScript

In this post, we learn how to combine JavaScript and RegEx to create scripts that can help us test our password strength.

May 16, 2015

by Nic Raboy

· 93,756 Views · 1 Like

JSF "Loading" JavaScript -- Brief Overview

What remains unchanged is the way that JavaScript enter in the scene via the or, as:

April 25, 2015

by Anghel Leonard

CORE

· 16,717 Views

Currency Format Validation and Parsing

In Java, formatting a number according to a locale-specific currency format is pretty simple. You use an instance of java.text.NumberFormat class by instantiating it through NumberFormat.getCurrencyInstance() and invoke one of the format() methods. Following is a code-snippet from https://docs.oracle.com/javase/tutorial/i18n/format/numberFormat.html static public void displayCurrency( Locale currentLocale) { Double currencyAmount = new Double(9876543.21); Currency currentCurrency = Currency.getInstance(currentLocale); NumberFormat currencyFormatter = NumberFormat.getCurrencyInstance(currentLocale); System.out.println( currentLocale.getDisplayName() + ", " + currentCurrency.getDisplayName() + ": " + currencyFormatter.format(currencyAmount)); } However, if an application allows users to enter an amount as a string using separators and currency symbols, there is quite a possibility that currency format may not have been followed according to the locale. For example, user can use a wrong thousand or decimal separator, or a wrong currency symbol altogether. In that case, the application should incorporate a mechanism to first ensure that format is followed and then parse that string to convert it into a proper number in order to perform several mathematical calculations in a currency format independent manner. It is important to note that parsing a string using java.text.NumberFormat#public Number parse(String source) method requires that the currency string must contain a value following the locale-specific pattern defined in java.text.DecimalFormat and symbols defined in java.text.DecimalFormatSymbols. If the pattern and/or symbols are not followed, program will throw java.text.ParseException. For example, currency pattern for it_CH = Italian (Switzerland) is ¤ #,##0.00 and grouping separator is ', hence SFr. 1'234.56 is valid while 1,234.56 SFr. is invalid In order to check the validity of a currency amount before actually parsing it, Apache Commons Validator project's org.apache.commons.validator.routines.CurrencyValidator comes in real handy. All you have to do is to construct its object and call public boolean isValid(String value,Locale locale) method to check whether locale-specific format is followed or not. Once you are done with validation and found that number is valid, you can then parse the number by calling the parse() method. Following code snippet shows how to validate and parse. Note that the validation is lenient and if currency symbol is not already present in the string, it appends appropriate currency symbol according to the pattern and then validate and parse it. /** * Converts given item price into a number based on given * currency code. * * It works generically for all currencies supported by Java * * * @param itemPrice currency amount * @param currencyCode 3-letter ISO country code * @return {@link String} containing stripped price or same as given price * if parsing failed or formatter couldn't be constructed * @author Muhammad Haris */ public static Double convertPrice(String itemPrice, String currencyCode) { Double itemPriceConverted = null; Locale currencyLocale = LocaleUtility .getLocaleAgainstCurrency(currencyCode); DecimalFormat currencyFormatter = getCurrencyFormatter(currencyLocale); if (currencyFormatter != null) { itemPrice = appendCurrencySymbol(itemPrice, currencyFormatter); try { Number number = currencyFormatter.parse(itemPrice); itemPriceConverted = number.doubleValue(); } catch (ParseException e) { LOG.error("Failed to parse currency: " + currencyCode + ", value: " + itemPrice + ". " + e.getMessage(), e); } } else { LOG.error("No appropriate formatter found for currency: " + currencyCode + ", value: " + itemPrice + ". "); } return itemPriceConverted; } /** * Gets currency formatter against given currency locale * * @param currencyCode * {@link String} containing 3 letter ISO currency code * @return {@link NumberFormat} object specialized for the currency or null * if it couldn't be composed * @author Muhammad Haris */ public static DecimalFormat getCurrencyFormatter(Locale currencyLocale) { if (currencyLocale != null) { return (DecimalFormat) NumberFormat .getCurrencyInstance(currencyLocale); } return null; } /** * Appends appropriate currency symbol to the given price using the pattern * defined in the given currency formatter * * @param itemPrice * {@link String} containing price of the item in locale specific * format * @param currencyFormatter * {@link DecimalFormat} object containing currency locale * specific formatting info * @author Muhammad Haris */ public static String appendCurrencySymbol(String itemPrice, DecimalFormat currencyFormatter) { String currencySymbol = currencyFormatter.getDecimalFormatSymbols() .getCurrencySymbol(); String pattern = currencyFormatter.toPattern(); if (!itemPrice.contains(currencySymbol)) { if (pattern.startsWith("¤ ")) { itemPrice = currencySymbol + " " + itemPrice; } else if (pattern.endsWith(" ¤")) { itemPrice = itemPrice + " " + currencySymbol; } else if (pattern.startsWith("¤")) { itemPrice = currencySymbol + itemPrice; } else if (pattern.endsWith("¤")) { itemPrice = itemPrice + currencySymbol; } } return itemPrice; }

April 10, 2015

by Muhammad Haris

· 15,012 Views

Using Oauth 2.0 in your Web Browser with AngularJS

I have a few popular Oauth related posts on my blog. I have one pertaining to Oauth 1.0a, and I have one on the topic of Oauth 2.0 for use in mobile application development. However, I get a lot of requests to show how to accomplish an Oauth 2.0 connection in a web browser using only JavaScript and AngularJS. We’re going to better explore the process flow behind Oauth 2.0 to establish a secure connection with a provider of our choice. In this particular example we’ll be using Imgur because I personally think it is a great service. Before we begin, it is important to note that this tutorial will only work with providers that offer the implicit grant type. Oauth Implicit Grant Type via OauthLib: The implicit grant type is used to obtain access tokens (it does not support the issuance of refresh tokens) and is optimized for public clients known to operate a particular redirection URI. These clients are typically implemented in a browser using a scripting language such as JavaScript. Unlike the authorization code grant type, in which the client makes separate requests for authorization and for an access token, the client receives the access token as the result of the authorization request. You’ll know the provider supports the implicit grant type when they make use of response_type=token rather than response_type=code. So there are going to be a few requirements to accomplish this in AngularJS: We are going to be using the AngularJS UI-Router library We are going to have a stand-alone index.html page with multiple templates We are going to have a stand-alone oauth_callback.html page with no AngularJS involvement With that said, let’s go ahead and create our project to look like the following: project root templates login.html secure.html js app.js index.html oauth_callback.html The templates/login.html page is where we will initialize the Oauth flow. After reaching the oauth_callback.html page we will redirect to the templates/secure.html page which requires a successful sign in. Crack open your index.html file and add the following code: Now it is time to add some very basic HTML to our templates/login.html and templates/secure.html pages: Login Login with Imgur Secure Web Page Access Token: {{accessToken} Not much left to do now. Open your js/app.js file and add the following AngularJS code: var example = angular.module("example", ['ui.router']); example.config(function($stateProvider, $urlRouterProvider) { $stateProvider .state('login', { url: '/login', templateUrl: 'templates/login.html', controller: 'LoginController' }) .state('secure', { url: '/secure', templateUrl: 'templates/secure.html', controller: 'SecureController' }); $urlRouterProvider.otherwise('/login'); }); example.controller("LoginController", function($scope) { $scope.login = function() { window.location.href = "https://api.imgur.com/oauth2/authorize?client_id=" + "CLIENT_ID_HERE" + "&response_type=token" } }); example.controller("SecureController", function($scope) { $scope.accessToken = JSON.parse(window.localStorage.getItem("imgur")).oauth.access_token; }); We are first going to focus on the login method of the LoginController. Go ahead and add the following, pretty much taken exactly from the Imgur documentation: $scope.login = function() { window.location.href = "https://api.imgur.com/oauth2/authorize?client_id=" + "CLIENT_ID_HERE" + "&response_type=token" } This long URL has the following components: Parameter Description client_id The application id found in your Imgur developer dashboard response_type Authorization grant or implicit grant type. In our case token for implicit grant The values will typically change per provider, but the parameters will usually remain the same. Now let’s dive into the callback portion. After the Imgur login flow, it is going to send you to http://localhost/oauth_callback.html because that is what we’ve decided to enter into the Imgur dashboard. Crack open your oauth_callback.html file and add the following source code: Redirecting... If you’re familiar with the ng-cordova-oauth library that I made, you’ll know much of this code was copied from it. Basically what we’re doing is grabbing the current URL and parsing out all the token parameters that Imgur has provided us. We are then going to construct an object with these parameters and serialize them into local storage. Finally we are going to redirect into the secure area of our application. In order to test this we need to be running our site from a domain or localhost. We cannot test this via a file:// URL. If you’re on a Mac or Linux machine, the simplest thing to do is run sudo python -m SimpleHTTPServer 80 since both these platforms ship with Python. This will run your web application as localhost on port 80. A video version of this article can be seen below.

April 7, 2015

by Nic Raboy

· 27,481 Views · 1 Like

Fork/Join Framework vs. Parallel Streams vs. ExecutorService: The Ultimate Fork/Join Benchmark

How does the Fork/Join framework act under different configurations? Just like the upcoming episode of Star Wars, there has been a lot of excitement mixed with criticism around Java 8 parallelism. The syntactic sugar of parallel streams brought some hype almost like the new lightsaber we’ve seen in the trailer. With many ways now to do parallelism in Java, we wanted to get a sense of the performance benefits and the dangers of parallel processing. After over 260 test runs, some new insights rose from the data and we wanted to share these with you in this post. Fork/Join Framework vs. Parallel Streams vs. ExecutorService: The Ultimate Fork/Join Benchmark http://t.co/CMNfYZe58Z pic.twitter.com/6WExlmbyo6 — Takipi (@takipid) January 20, 2015 ExecutorService vs. Fork/Join Framework vs. Parallel Streams A long time ago, in a galaxy far, far away.... I mean, some 10 years ago concurrency was available in Java only through 3rd party libraries. Then came Java 5 and introduced the java.util.concurrent library as part of the language, strongly influenced by Doug Lea. The ExecutorService became available and provided us a straightforward way to handle thread pools. Of course java.util.concurrent keeps evolving and in Java 7 the Fork/Join framework was introduced, building on top of the ExecutorService thread pools. With Java 8 streams, we’ve been provided an easy way to use Fork/Join that remains a bit enigmatic for many developers. Let’s find out how they compare to one another. We’ve taken 2 tasks, one CPU-intensive and the other IO-intensive, and tested 4 different scenarios with the same basic functionality. Another important factor is the number of threads we use for each implementation, so we tested that as well. The machine we used had 8 cores available so we had variations of 4, 8, 16 and 32 threads to get a sense of the general direction the results are going. For each of the tasks, we’ve also tried a single threaded solution, which you’ll not see in the graphs since, well, it took much much longer to execute. To learn more about exactly how the tests ran you can check out the groundwork section below. Now, let’s get to it. Indexing a 6GB file with 5.8M lines of text In this test, we’ve generated a huge text file, and created similar implementations for the indexing procedure. Here’s what the results looked like: ** Single threaded execution: 176,267msec, or almost 3 minutes. ** Notice the graph starts at 20000 milliseconds. 1. Fewer threads will leave CPUs unutilized, too many will add overhead The first thing you notice in the graph is the shape the results are starting to take - you can get an impression of how each implementation behaves from only these 4 data points. The tipping point here is between 8 and 16 threads, since some threads are blocking in file IO, and adding more threads than cores helped utilize them better. When 32 threads are in, performance got worse because of the additional overhead. 2. Parallel Streams are the best! Almost 1 second better than the runner up: using Fork/Join directly Syntactic sugar aside (lambdas! we didn’t mention lambdas), we’ve seen parallel streams perform better than the Fork/Join and the ExecutorService implementations. 6GB of text indexed in 24.33 seconds. You can trust Java here to deliver the best result. 3. But… Parallel Streams also performed the worst: The only variation that went over 30 seconds This is another reminder of how parallel streams can slow you down. Let’s say this happens on machines that already run multithreaded applications. With a smaller number of threads available, using Fork/Join directly could actually be better than going through parallel streams - a 5 second difference, which makes for about an 18% penalty when comparing these 2 together. 4. Don’t go for the default pool size with IO in the picture When using the default pool size for Parallel Streams, the same number of cores on the machine (which is 8 here), performed almost 2 seconds worse than the 16 threads version. That’s a 7% penalty for going with the default pool size. The reason this happens is related with blocking IO threads. There’s more waiting going on, so introducing more threads lets us get more out of the CPU cores involved while other threads wait to be scheduled instead of being idle. How do you change the default Fork/Join pool size for parallel streams? You can either change the common Fork/Join pool size using a JVM argument: [java] -Djava.util.concurrent.ForkJoinPool.common.parallelism=16 [/java] (All Fork/Join tasks are using a common static pool the size of the number of your cores by default. The benefit here is reducing resource usage by reclaiming the threads for other tasks during periods of no use.) Or... You can use this trick and run Parallel Streams within a custom Fork/Join pool. This overrides the default use of the common Fork/Join pool and lets you use a pool you’ve set up yourself. Pretty sneaky. In the tests, we’ve used the common pool. 5. Single threaded performance was 7.25x worse than the best result Parallelism provided a 7.25x improvement, and considering the machine had 8 cores, it got pretty close to the theoretic 8x prediction! We can attribute the rest to overhead. With that being said, even the slowest parallelism implementation we tested, which this time was parallel streams with 4 threads (30.24sec), performed 5.8x better than the single threaded solution (176.27sec). What happens when you take IO out of the equation? Checking if a number is prime For the next round of tests, we’ve eliminated IO altogether and examined how long it would take to determine if some really big number is prime or not. How big? 19 digits. 1,530,692,068,127,007,263, or in other words: one quintillion seventy nine quadrillion three hundred sixty four trillion thirty eight billion forty eight million three hundred five thousand thirty three. Argh, let me get some air. Anyhow, we haven’t used any optimization other than running to its square root, so we checked all even numbers even though our big number doesn’t divide by 2 just to make it process longer. Spoiler alert: it’s a prime, so each implementation ran the same number of calculations. Here’s how it turned out: ** Single threaded execution: 118,127msec, or almost 2 minutes. ** Notice the graph starts at 20000 milliseconds 1. Smaller differences between 8 and 16 threads Unlike the IO test, we don’t have IO calls here so the performance of 8 and 16 threads was mostly similar, except for the Fork/Join solution. We’ve actually ran a few more sets of tests to make sure we’re getting good results here because of this “anomaly” but it turned out very similar time after time. We’d be glad to hear your thoughts about this in the comment section below. 2. The best results are similar for all methods We see that all implementations share a similar best result of around 28 seconds. No matter which way we tried to approach it, the results came out the same. This doesn’t mean that we’re indifferent to which method to use. Check out the next insight. 3. Parallel streams handle the thread overload better than other implementations This is the more interesting part. With this test, we see again that the the top results for running 16 threads are coming from using parallel streams. Moreover, in this version, using parallel streams was a good call for all variations of thread numbers. 4. Single threaded performance was 4.2x worse than the best result In addition, the benefit of using parallelism when running computationally intensive tasks is almost 2 times worse than the IO test with file IO. This makes sense since it’s a CPU intensive test, unlike the previous one where we could get an extra benefit from cutting down the time our cores were waiting on threads stuck with IO. Conclusion I’d recommend going to the source to learn more about when to use parallel streams and applying careful judgement anytime you do parallelism in Java. The best path to take would be running similar tests to these in a staging environment where you can try and get a better sense of what you’re up against. The factors you have to be mindful of are of course the hardware you’re running on (and the hardware you’re testing on), and the total number of threads in your application. This includes the common Fork/Join pool and code other developers on your team are working on. So try to keep those in check and get a full view of your application before adding parallelism of your own. Groundwork To run this test we’ve used an EC2 c3.2xlarge instance with 8 vCPUs and 15GB of RAM. A vCPU means there’s hyperthreading in place so in fact we have here 4 physical cores that each act as if it were 2. As far as the OS scheduler is concerned, we have 8 cores here. To try and make it as fair as we could, each implementation ran 10 times and we’ve taken the average run time of runs 2 through 9. That’s 260 test runs, phew! Another thing that was important is the processing time. We’ve chosen tasks that would take well over 20 seconds to process so the differences will be easier to spot and less affected by external factors. What’s next? The raw results are available right here, and the code is on GitHub. Please feel free to tinker around with it and let us know what kind of results you’re getting. If you have any more interesting insights or explanations for the results that we’ve missed, we’d be happy to read them and add it to the post. Originally posted on Takipi's blog

April 1, 2015

by Chen Harel

· 16,764 Views

Get Client (Browser) timezone and maintain it in cookie

Recently, I came with requirement where we need to get browser timezone and maintain it so our Spring MVC application can use it. Our application need to convert date and time from server timezone to client timezone. Below is overall idea of implementation: Get Browser timezone by javascript. We can use opensource 'jstz.min.js' file for getting this. We can find this from ‘http://pellepim.bitbucket.org/jstz/’. We need to maintain this timezone. For same, we will store this timezone in cookie. This can be done by creating one jsp 'findTimeZonePage.jsp'. This page will store timezone in cookie and again redirect to original page. Every method of Spring MVC controller will check whether cookie is available, If not then it will redirect to findTimeZonePage.jsp. While doing this we will also pass current Url(will set in model) so that findTimeZonePage jsp can redirect to same page again. Code: 1. findTimeZonePage.jsp loading the page... 2. Add below Methods in Util class: public static TimeZonegetBrowserTimeZone(HttpServletRequest request){ Cookie[] cookieArray = request.getCookies(); if(cookieArray != null){ for(Cookie cookie : cookieArray){ if("CalenderAppTimeZone".equals(cookie.getName())){ String timeZoneId = cookie.getValue(); return TimeZone.getTimeZone(timeZoneId); } } } return null; } public static StringgetFullURL(HttpServletRequest request) { StringBuffer requestURL = request.getRequestURL(); String queryString = request.getQueryString(); if (queryString == null) { return requestURL.toString(); } else { return requestURL.append('?').append(queryString).toString(); } } 3. In each method of MVC Controller class, Add below code at start of method: TimeZone currentTimeZone = MyUtil.getBrowserTimeZone(request); if(currentTimeZone == null){ String url = MyUtil.getFullURL(request); System.out.println("Url="+url); model.addAttribute("redirectUrl", url); //Redirect to 'findTimeZone' for setting timezone. System.out.println("####Timezone is not set. Redirecting to findTimeZone.jsp for setting timezone."); return "findTimeZonePage"; } System.out.println("####Current TimeZone="+currentTimeZone.getID()); Hope this will help.

March 28, 2015

by Rajeshkumar Dave

· 12,879 Views

Walking Recursive Data Structures Using Java 8 Streams

The Streams API is a real gem in Java 8, and I keep finding more or less unexpected uses for them. I recently wrote about using them as ForkJoinPool facade. Here’s another interesting example: Walking recursive data structures. Without much ado, have a look at the code: class Tree { private int value; private List children = new LinkedList<>(); public Tree(int value, List children) { super(); this.value = value; this.children.addAll(children); } public Tree(int value, Tree... children) { this(value, asList(children)); } public int getValue() { return value; } public List getChildren() { return Collections.unmodifiableList(children); } public Stream flattened() { return Stream.concat( Stream.of(this), children.stream().flatMap(Tree::flattened)); } } It’s pretty boring, except for the few highlighted lines. Let’s say we want to be able to find elements matching some criteria in the tree or find particular element. One typical way to do it is a recursive function – but that has some complexity and is likely to need a mutable argument (e.g. a set where you can append matching elements). Another approach is iteration with a stack or a queue. They work fine, but take a few lines of code and aren’t so easy to generalize. Here’s what we can do with this flattened function: // Get all values in the tree: t.flattened().map(Tree::getValue).collect(toList()); // Get even values: t.flattened().map(Tree::getValue).filter(v -> v % 2 == 0).collect(toList()); // Sum of even values: t.flattened().map(Tree::getValue).filter(v -> v % 2 == 0).reduce((a, b) -> a + b); // Does it contain 13? t.flattened().anyMatch(t -> t.getValue() == 13); I think this solution is pretty slick and versatile. One line of code (here split to 3 for readability on blog) is enough to flatten the tree to a straightforward stream that can be searched, filtered and whatnot. It’s not perfect though: It is not lazy and flattened is called for each and every node in the tree every time. It probably could be improved using a Supplier. Anyway, it doesn’t matter for typical, reasonably small trees, especially in a business application on a very tall stack of libraries. But for very large trees, very frequent execution and tight time constraints the overhead might cause some trouble.

March 18, 2015

by Konrad Garus

· 25,172 Views · 1 Like

Java 8 Stream to Rx-Java Observable

I was recently looking at a way to convert a Java 8 Stream to Rx-JavaObservable. There is one api in Observable that appears to do this : public static final Observable from(java.lang.Iterable iterable) So now the question is how do we transform a Stream to an Iterable. Stream does not implement the Iterable interface, and there are good reasons for this. So to return an Iterable from a Stream, you can do the following: Iterable iterable = new Iterable() { @Override public Iterator iterator() { return aStream.iterator(); } }; Observable.from(iterable); Since Iterable is a Java 8 functional interface, this can be simplified to the following using Java 8 Lambda expressions!: Observable.from(aStream::iterator); First look it does appear cryptic, however if it is seen as a way to simplify the expanded form of Iterable then it slowly starts to make sense. Reference: This is entirely based on what I read on this Stackoverflow question.

March 12, 2015

by Biju Kunjummen

· 12,668 Views · 2 Likes

A JAXB Nuance: String Versus Enum from Enumerated Restricted XSD String

Although Java Architecture for XML Binding (JAXB) is fairly easy to use in nominal cases (especially since Java SE 6), it also presents numerous nuances. Some of the common nuances are due to the inability to exactlymatch (bind) XML Schema Definition (XSD) types to Java types. This post looks at one specific example of this that also demonstrates how different XSD constructs that enforce the same XML structure can lead to different Java types when the JAXB compiler generates the Java classes. The next code listing, for Food.xsd, defines a schema for food types. The XSD mandates that valid XML will have a root element called "Food" with three nested elements "Vegetable", "Fruit", and "Dessert". Although the approach used to specify the "Vegetable" and "Dessert" elements is different than the approach used to specify the "Fruit" element, both approaches result in similar "valid XML." The "Vegetable" and "Dessert" elements are declared directly as elements of the prescribed simpleTypes defined later in the XSD. The "Fruit" element is defined via reference (ref=) to another defined element that consists of a simpleType. Food.xsd Although Vegetable and Dessert elements are defined in the schema differently than Fruit, the resulting valid XML is the same. A valid XML file is shown next in the code listing for food1.xml. food1.xml Spinach Watermelon Pie At this point, I'll use a simple Groovy script to validate the above XML against the above XSD. The code for this Groovy XML validation script (validateXmlAgainstXsd.groovy) is shown next. validateXmlAgainstXsd.groovy #!/usr/bin/env groovy // validateXmlAgainstXsd.groovy // // Accepts paths/names of two files. The first is the XML file to be validated // and the second is the XSD against which to validate that XML. if (args.length < 2) { println "USAGE: groovy validateXmlAgainstXsd.groovy " System.exit(-1) } String xml = args[0] String xsd = args[1] import javax.xml.validation.Schema import javax.xml.validation.SchemaFactory import javax.xml.validation.Validator try { SchemaFactory schemaFactory = SchemaFactory.newInstance(javax.xml.XMLConstants.W3C_XML_SCHEMA_NS_URI) Schema schema = schemaFactory.newSchema(new File(xsd)) Validator validator = schema.newValidator() validator.validate(new javax.xml.transform.stream.StreamSource(xml)) } catch (Exception exception) { println "\nERROR: Unable to validate ${xml} against ${xsd} due to '${exception}'\n" System.exit(-1) } println "\nXML file ${xml} validated successfully against ${xsd}.\n" The next screen snapshot demonstrates running the above Groovy XML validation script against food1.xmland Food.xsd. The objective of this post so far has been to show how different approaches in an XSD can lead to the same XML being valid. Although these different XSD approaches prescribe the same valid XML, they lead to different Java class behavior when JAXB is used to generate classes based on the XSD. The next screen snapshot demonstrates running the JDK-provided JAXB xjc compiler against the Food.xsd to generate the Java classes. The output from the JAXB generation shown above indicates that Java classes were created for the "Vegetable" and "Dessert" elements but not for the "Fruit" element. This is because "Vegetable" and "Dessert" were defined differently than "Fruit" in the XSD. The next code listing is for the Food.java class generated by the xjc compiler. From this we can see that the generated Food.java class references specific generated Java types for Vegetable and Dessert, but references simply a generic Java String for Fruit. Food.java (generated by JAXB jxc compiler) // // This file was generated by the JavaTM Architecture for XML Binding(JAXB) Reference Implementation, v2.2.8-b130911.1802 // See http://java.sun.com/xml/jaxb // Any modifications to this file will be lost upon recompilation of the source schema. // Generated on: 2015.02.11 at 10:17:32 PM MST // package com.blogspot.marxsoftware.foodxml; import javax.xml.bind.annotation.XmlAccessType; import javax.xml.bind.annotation.XmlAccessorType; import javax.xml.bind.annotation.XmlElement; import javax.xml.bind.annotation.XmlRootElement; import javax.xml.bind.annotation.XmlSchemaType; import javax.xml.bind.annotation.XmlType; /** * Java class for anonymous complex type. * * The following schema fragment specifies the expected content contained within this class. * * * * * * * * * * * * * * * * */ @XmlAccessorType(XmlAccessType.FIELD) @XmlType(name = "", propOrder = { "vegetable", "fruit", "dessert" }) @XmlRootElement(name = "Food") public class Food { @XmlElement(name = "Vegetable", required = true) @XmlSchemaType(name = "string") protected Vegetable vegetable; @XmlElement(name = "Fruit", required = true) protected String fruit; @XmlElement(name = "Dessert", required = true) @XmlSchemaType(name = "string") protected Dessert dessert; /** * Gets the value of the vegetable property. * * @return * possible object is * {@link Vegetable } * */ public Vegetable getVegetable() { return vegetable; } /** * Sets the value of the vegetable property. * * @param value * allowed object is * {@link Vegetable } * */ public void setVegetable(Vegetable value) { this.vegetable = value; } /** * Gets the value of the fruit property. * * @return * possible object is * {@link String } * */ public String getFruit() { return fruit; } /** * Sets the value of the fruit property. * * @param value * allowed object is * {@link String } * */ public void setFruit(String value) { this.fruit = value; } /** * Gets the value of the dessert property. * * @return * possible object is * {@link Dessert } * */ public Dessert getDessert() { return dessert; } /** * Sets the value of the dessert property. * * @param value * allowed object is * {@link Dessert } * */ public void setDessert(Dessert value) { this.dessert = value; } } The advantage of having specific Vegetable and Dessert classes is the additional type safety they bring as compared to a general Java String. Both Vegetable.java and Dessert.java are actually enums because they come from enumerated values in the XSD. The two generated enums are shown in the next two code listings. Vegetable.java (generated with JAXB xjc compiler) // // This file was generated by the JavaTM Architecture for XML Binding(JAXB) Reference Implementation, v2.2.8-b130911.1802 // See http://java.sun.com/xml/jaxb // Any modifications to this file will be lost upon recompilation of the source schema. // Generated on: 2015.02.11 at 10:17:32 PM MST // package com.blogspot.marxsoftware.foodxml; import javax.xml.bind.annotation.XmlEnum; import javax.xml.bind.annotation.XmlEnumValue; import javax.xml.bind.annotation.XmlType; /** * Java class for Vegetable. * * The following schema fragment specifies the expected content contained within this class. * * * * * * * * * * * * */ @XmlType(name = "Vegetable") @XmlEnum public enum Vegetable { @XmlEnumValue("Carrot") CARROT("Carrot"), @XmlEnumValue("Squash") SQUASH("Squash"), @XmlEnumValue("Spinach") SPINACH("Spinach"), @XmlEnumValue("Celery") CELERY("Celery"); private final String value; Vegetable(String v) { value = v; } public String value() { return value; } public static Vegetable fromValue(String v) { for (Vegetable c: Vegetable.values()) { if (c.value.equals(v)) { return c; } } throw new IllegalArgumentException(v); } } Dessert.java (generated with JAXB xjc compiler) // // This file was generated by the JavaTM Architecture for XML Binding(JAXB) Reference Implementation, v2.2.8-b130911.1802 // See http://java.sun.com/xml/jaxb // Any modifications to this file will be lost upon recompilation of the source schema. // Generated on: 2015.02.11 at 10:17:32 PM MST // package com.blogspot.marxsoftware.foodxml; import javax.xml.bind.annotation.XmlEnum; import javax.xml.bind.annotation.XmlEnumValue; import javax.xml.bind.annotation.XmlType; /** * Java class for Dessert. * * The following schema fragment specifies the expected content contained within this class. * * * * * * * * * * * */ @XmlType(name = "Dessert") @XmlEnum public enum Dessert { @XmlEnumValue("Pie") PIE("Pie"), @XmlEnumValue("Cake") CAKE("Cake"), @XmlEnumValue("Ice Cream") ICE_CREAM("Ice Cream"); private final String value; Dessert(String v) { value = v; } public String value() { return value; } public static Dessert fromValue(String v) { for (Dessert c: Dessert.values()) { if (c.value.equals(v)) { return c; } } throw new IllegalArgumentException(v); } } Having enums generated for the XML elements ensures that only valid values for those elements can be represented in Java. Conclusion JAXB makes it relatively easy to map Java to XML, but because there is not a one-to-one mapping between Java and XML types, there can be some cases where the generated Java type for a particular XSD prescribed element is not obvious. This post has shown how two different approaches to building an XSD to enforce the same basic XML structure can lead to very different results in the Java classes generated with the JAXB xjccompiler. In the example shown in this post, declaring elements in the XSD directly on simpleTypes restricting XSD's string to a specific set of enumerated values is preferable to declaring elements as references to other elements wrapping a simpleType of restricted string enumerated values because of the type safety that is achieved when enums are generated rather than use of general Java Strings.

February 25, 2015

by Dustin Marx

· 21,489 Views · 1 Like

Redirecting All Kinds of stdout in Python

A common task in Python (especially while testing or debugging) is to redirect sys.stdout to a stream or a file while executing some piece of code. However, simply "redirecting stdout" is sometimes not as easy as one would expect; hence the slightly strange title of this post. In particular, things become interesting when you want C code running within your Python process (including, but not limited to, Python modules implemented as C extensions) to also have its stdout redirected according to your wish. This turns out to be tricky and leads us into the interesting world of file descriptors, buffers and system calls. But let's start with the basics. Pure Python The simplest case arises when the underlying Python code writes to stdout, whether by calling print, sys.stdout.write or some equivalent method. If the code you have does all its printing from Python, redirection is very easy. With Python 3.4 we even have a built-in tool in the standard library for this purpose - contextlib.redirect_stdout. Here's how to use it: from contextlib import redirect_stdout f = io.StringIO() with redirect_stdout(f): print('foobar') print(12) print('Got stdout: "{0}"'.format(f.getvalue())) When this code runs, the actual print calls within the with block don't emit anything to the screen, and you'll see their output captured by in the stream f. Incidentally, note how perfect the with statement is for this goal - everything within the block gets redirected; once the block is done, things are cleaned up for you and redirection stops. If you're stuck on an older and uncool Python, prior to 3.4 [1], what then? Well, redirect_stdout is really easy to implement on your own. I'll change its name slightly to avoid confusion: from contextlib import contextmanager @contextmanager def stdout_redirector(stream): old_stdout = sys.stdout sys.stdout = stream try: yield finally: sys.stdout = old_stdout So we're back in the game: f = io.StringIO() with stdout_redirector(f): print('foobar') print(12) print('Got stdout: "{0}"'.format(f.getvalue())) Redirecting C-level streams Now, let's take our shiny redirector for a more challenging ride: import ctypes libc = ctypes.CDLL(None) f = io.StringIO() with stdout_redirector(f): print('foobar') print(12) libc.puts(b'this comes from C') os.system('echo and this is from echo') print('Got stdout: "{0}"'.format(f.getvalue())) I'm using ctypes to directly invoke the C library's puts function [2]. This simulates what happens when C code called from within our Python code prints to stdout - the same would apply to a Python module using a C extension. Another addition is the os.system call to invoke a subprocess that also prints to stdout. What we get from this is: this comes from C and this is from echo Got stdout: "foobar 12 " Err... no good. The prints got redirected as expected, but the output from puts and echo flew right past our redirector and ended up in the terminal without being caught. What gives? To grasp why this didn't work, we have to first understand what sys.stdout actually is in Python. Detour - on file descriptors and streams This section dives into some internals of the operating system, the C library, and Python [3]. If you just want to know how to properly redirect printouts from C in Python, you can safely skip to the next section (though understanding how the redirection works will be difficult). Files are opened by the OS, which keeps a system-wide table of open files, some of which may point to the same underlying disk data (two processes can have the same file open at the same time, each reading from a different place, etc.) File descriptors are another abstraction, which is managed per-process. Each process has its own table of open file descriptors that point into the system-wide table. Here's a schematic, taken from The Linux Programming Interface: File descriptors allow sharing open files between processes (for example when creating child processes with fork). They're also useful for redirecting from one entry to another, which is relevant to this post. Suppose that we make file descriptor 5 a copy of file descriptor 4. Then all writes to 5 will behave in the same way as writes to 4. Coupled with the fact that the standard output is just another file descriptor on Unix (usually index 1), you can see where this is going. The full code is given in the next section. File descriptors are not the end of the story, however. You can read and write to them with the read and write system calls, but this is not the way things are typically done. The C runtime library provides a convenient abstraction around file descriptors - streams. These are exposed to the programmer as the opaque FILE structure with a set of functions that act on it (for example fprintf and fgets). FILE is a fairly complex structure, but the most important things to know about it is that it holds a file descriptor to which the actual system calls are directed, and it provides buffering, to ensure that the system call (which is expensive) is not called too often. Suppose you emit stuff to a binary file, a byte or two at a time. Unbuffered writes to the file descriptor with write would be quite expensive because each write invokes a system call. On the other hand, using fwrite is much cheaper because the typicall call to this function just copies your data into its internal buffer and advances a pointer. Only occasionally (depending on the buffer size and flags) will an actual write system call be issued. With this information in hand, it should be easy to understand what stdout actually is for a C program. stdout is a global FILE object kept for us by the C library, and it buffers output to file descriptor number 1. Calls to functions like printf and puts add data into this buffer. fflush forces its flushing to the file descriptor, and so on. But we're talking about Python here, not C. So how does Python translate calls to sys.stdout.write to actual output? Python uses its own abstraction over the underlying file descriptor - a file object. Moreover, in Python 3 this file object is further wrapper in an io.TextIOWrapper, because what we pass to print is a Unicode string, but the underlying write system calls accept binary data, so encoding has to happen en route. The important take-away from this is: Python and a C extension loaded by it (this is similarly relevant to C code invoked via ctypes) run in the same process, and share the underlying file descriptor for standard output. However, while Python has its own high-level wrapper around it - sys.stdout, the C code uses its own FILE object. Therefore, simply replacing sys.stdout cannot, in principle, affect output from C code. To make the replacement deeper, we have to touch something shared by the Python and C runtimes - the file descriptor. Redirecting with file descriptor duplication Without further ado, here is an improved stdout_redirector that also redirects output from C code [4]: from contextlib import contextmanager import ctypes import io import os, sys import tempfile libc = ctypes.CDLL(None) c_stdout = ctypes.c_void_p.in_dll(libc, 'stdout') @contextmanager def stdout_redirector(stream): # The original fd stdout points to. Usually 1 on POSIX systems. original_stdout_fd = sys.stdout.fileno() def _redirect_stdout(to_fd): """Redirect stdout to the given file descriptor.""" # Flush the C-level buffer stdout libc.fflush(c_stdout) # Flush and close sys.stdout - also closes the file descriptor (fd) sys.stdout.close() # Make original_stdout_fd point to the same file as to_fd os.dup2(to_fd, original_stdout_fd) # Create a new sys.stdout that points to the redirected fd sys.stdout = io.TextIOWrapper(os.fdopen(original_stdout_fd, 'wb')) # Save a copy of the original stdout fd in saved_stdout_fd saved_stdout_fd = os.dup(original_stdout_fd) try: # Create a temporary file and redirect stdout to it tfile = tempfile.TemporaryFile(mode='w+b') _redirect_stdout(tfile.fileno()) # Yield to caller, then redirect stdout back to the saved fd yield _redirect_stdout(saved_stdout_fd) # Copy contents of temporary file to the given stream tfile.flush() tfile.seek(0, io.SEEK_SET) stream.write(tfile.read()) finally: tfile.close() os.close(saved_stdout_fd) There are a lot of details here (such as managing the temporary file into which output is redirected) that may obscure the key approach: using dup and dup2 to manipulate file descriptors. These functions let us duplicate file descriptors and make any descriptor point at any file. I won't spend more time on them - go ahead and read their documentation, if you're interested. The detour section should provide enough background to understand it. Let's try this: f = io.BytesIO() with stdout_redirector(f): print('foobar') print(12) libc.puts(b'this comes from C') os.system('echo and this is from echo') print('Got stdout: "{0}"'.format(f.getvalue().decode('utf-8'))) Gives us: Got stdout: "and this is from echo this comes from C foobar 12 " Success! A few things to note: The output order may not be what we expected. This is due to buffering. If it's important to preserve order between different kinds of output (i.e. between C and Python), further work is required to disable buffering on all relevant streams. You may wonder why the output of echo was redirected at all? The answer is that file descriptors are inherited by subprocesses. Since we rigged fd 1 to point to our file instead of the standard output prior to forking to echo, this is where its output went. We use a BytesIO here. This is because on the lowest level, the file descriptors are binary. It may be possible to do the decoding when copying from the temporary file into the given stream, but that can hide problems. Python has its in-memory understanding of Unicode, but who knows what is the right encoding for data printed out from underlying C code? This is why this particular redirection approach leaves the decoding to the caller. The above also makes this code specific to Python 3. There's no magic involved, and porting to Python 2 is trivial, but some assumptions made here don't hold (such as sys.stdout being a io.TextIOWrapper). Redirecting the stdout of a child process We've just seen that the file descriptor duplication approach lets us grab the output from child processes as well. But it may not always be the most convenient way to achieve this task. In the general case, you typically use the subprocess module to launch child processes, and you may launch several such processes either in a pipe or separately. Some programs will even juggle multiple subprocesses launched this way in different threads. Moreover, while these subprocesses are running you may want to emit something to stdout and you don't want this output to be captured. So, managing the stdout file descriptor in the general case can be messy; it is also unnecessary, because there's a much simpler way. The subprocess module's swiss knife Popen class (which serve as the basis for much of the rest of the module) accepts a stdout parameter, which we can use to ask it to get access to the child's stdout: import subprocess echo_cmd = ['echo', 'this', 'comes', 'from', 'echo'] proc = subprocess.Popen(echo_cmd, stdout=subprocess.PIPE) output = proc.communicate()[0] print('Got stdout:', output) The subprocess.PIPE argument can be used to set up actual child process pipes (a la the shell), but in its simplest incarnation it captures the process's output. If you only launch a single child process at a time and are interested in its output, there's an even simpler way: output = subprocess.check_output(echo_cmd) print('Got stdout:', output) check_output will capture and return the child's standard output to you; it will also raise an exception if the child exist with a non-zero return code. Conclusion I hope I covered most of the common cases where "stdout redirection" is needed in Python. Naturally, all of the same applies to the other standard output stream - stderr. Also, I hope the background on file descriptors was sufficiently clear to explain the redirection code; squeezing this topic in such a short space is challenging. Let me know if any questions remain or if there's something I could have explained better. Finally, while it is conceptually simple, the code for the redirector is quite long; I'll be happy to hear if you find a shorter way to achieve the same effect. [1] Do not despair. As of February 2015, a sizable chunk of the worldwide Python programmers are in the same boat. [2] Note that bytes passed to puts. This being Python 3, we have to be careful since libc doesn't understand Python's unicode strings. [3] The following description focuses on Unix/POSIX systems; also, it's necessarily partial. Large book chapters have been written on this topic - I'm just trying to present some key concepts relevant to stream redirection. [4] The approach taken here is inspired by this Stack Overflow answer.

February 23, 2015

by Eli Bendersky

· 19,735 Views