DZone
The Latest Culture and Methodologies Topics

Bug Fixing: To Estimate, or Not to Estimate: That is The Question
According to Steve McConnell in Code Complete (data from 1975-1992), most bugs don't take long to fix. About 85% of errors can be fixed in less than a few hours. Some more can be fixed in a few hours to a few days. But the rest take longer, sometimes much longer, as I talked about in an earlier post. Given all of these factors and uncertainty, how do you estimate a bug fix? Or should you bother?

Block out some time for bug fixing

Some teams don't estimate bug fixes upfront. Instead they allocate a block of time, some kind of buffer for bug fixing, as a regular part of the team's work, especially if they are working in time boxes. Developers come back with an estimate only if it looks like the fix will require a substantial change: after they've dug into the code and found out that the fix isn't going to be easy, that it may require a redesign, or that it requires changes to complex or critical code that needs careful review and testing.

Use a rule of thumb placeholder for each bug fix

Another approach is to use a rough rule of thumb, a standard placeholder for every bug fix. Estimate ½ day of development work for each bug, for example. According to this post on Stack Overflow, the ½ day suggestion comes from Jeff Sutherland, one of the inventors of Scrum.

This placeholder should work for most bugs. If it takes a developer more than ½ day to come up with a fix, then they probably need help, and people need to know anyway. Pick a placeholder and use it for a while. If it seems too small or too big, change it. Iterate. You will always have bugs to fix. You might get better at fixing them over time, or they might get harder to find and fix once you've got past the obvious ones.

Or you could use the Capers Jones data referred to earlier on how long it takes to fix a bug, by type of bug. A day or half day works well on average, especially since most bugs are coding bugs (on average 3 hours) or data bugs (6.5 hours).
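To make the placeholder idea concrete, here is a minimal Python sketch that turns a bug backlog into a rough effort figure. It assumes an 8-hour working day and a half-day placeholder, and uses the per-type averages quoted above (3 hours for coding bugs, 6.5 hours for data bugs) where the type is known; design bugs and untriaged bugs simply fall back to the placeholder. The function and constant names are hypothetical, not taken from any tool.

```python
from typing import Iterable, Optional

HOURS_PER_DAY = 8.0                       # assumption: an 8-hour working day
PLACEHOLDER_HOURS = 0.5 * HOURS_PER_DAY   # the half-day rule of thumb

# Average fix times by bug type, as quoted from Capers Jones in the text.
# Any other type (design bugs, unknown) falls back to the placeholder.
AVERAGE_HOURS_BY_TYPE = {"coding": 3.0, "data": 6.5}


def estimate_backlog_hours(bug_types: Iterable[Optional[str]]) -> float:
    """Return a rough total effort, in hours, for a list of bugs.

    Each bug is represented only by its type (or None if untriaged).
    Known types use their average fix time; anything else gets the
    half-day placeholder.
    """
    return sum(AVERAGE_HOURS_BY_TYPE.get(t, PLACEHOLDER_HOURS) for t in bug_types)


if __name__ == "__main__":
    backlog = ["coding", "coding", "data", None, "design"]
    hours = estimate_backlog_hours(backlog)
    print(f"{hours} hours, roughly {hours / HOURS_PER_DAY:.1f} days")
```

The point is not precision: the placeholder is a single tunable constant, so "pick one, use it for a while, change it if it seems too small or too big" is a one-line edit.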
Even design bugs on average only take a little more than a day to resolve.

Collect some data, and use it

Steve McConnell, in Software Estimation: Demystifying the Black Art, says that it's always better to use data than to guess. He suggests collecting time data for as little as a few weeks or maybe a couple of months on how long it takes on average to fix a bug, and using this as a guide for estimating bug fixes going forward.

If you have enough defect data, you can be smarter about how to use it. If you are tracking bugs in a bug database like Jira, and if programmers are tracking how much time they spend on fixing each bug for billing or time accounting purposes (which you can also do in Jira), then you can mine the bug database for similar bugs and see how long they took to fix. Reviewing what other people did on earlier bugs may even give you some ideas on how to fix the bug you are working on. You can group different bugs into buckets (by size, such as small, medium, large and x-large, or by type) and then come up with an average estimate, and maybe even a best case, worst case and most likely case for each bucket.

Use benchmarks

For a maintenance team (a sustaining engineering or break/fix team responsible for software repairs only), you could use industry productivity benchmarks to project how many bugs your team can handle. Capers Jones in Estimating Software Costs says that the average programmer (in the US, in 2009) can fix 8-10 bugs per month (of course, if you're an above-average programmer working in Canada in 2012, you'll have to set these numbers much higher). Inexperienced programmers can be expected to fix 6 a month, while experienced developers using good tools can fix up to 20 per month.

If you're focusing on fixing security vulnerabilities reported by a pen tester or a scan, check out the remediation statistical data that Denim Group has started to collect, to get an idea of how long it might take to fix a SQL injection bug or an XSS vulnerability.
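The bucketing approach described above can be sketched in a few lines of Python. This assumes you can export (bucket, hours logged) pairs from your bug tracker; the export format and function name are hypothetical. For the "most likely" value it uses the median as a stand-in, and it adds the classic three-point (PERT) weighting, (best + 4 × likely + worst) / 6, which is one common way to combine best, worst and most likely cases rather than the only one.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple


def bucket_estimates(history: List[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Summarize historical fix times (in hours) per bucket.

    history holds (bucket, hours_logged) pairs, e.g. exported from a bug
    tracker. For each bucket we report the mean, a best case (fastest fix),
    a worst case (slowest fix), a "most likely" value (approximated here by
    the median), and a three-point PERT estimate (best + 4*likely + worst) / 6.
    """
    by_bucket: Dict[str, List[float]] = defaultdict(list)
    for bucket, hours in history:
        by_bucket[bucket].append(hours)

    summary: Dict[str, Dict[str, float]] = {}
    for bucket, times in by_bucket.items():
        best, worst, likely = min(times), max(times), median(times)
        summary[bucket] = {
            "mean": sum(times) / len(times),
            "best": best,
            "likely": likely,
            "worst": worst,
            "pert": (best + 4 * likely + worst) / 6,
        }
    return summary


if __name__ == "__main__":
    history = [("small", 1.0), ("small", 2.0), ("small", 3.0),
               ("large", 8.0), ("large", 16.0), ("large", 40.0)]
    for bucket, stats in bucket_estimates(history).items():
        print(bucket, {k: round(v, 1) for k, v in stats.items()})
```

In the made-up "large" bucket, one slow fix pulls the mean well above the median, which is a small-scale version of the outlier problem discussed in the conclusion.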
So, do you estimate bug fixes, or not?

Because you can't estimate how long it will take to fix a bug until you've figured out what's wrong, and most of the work in fixing a bug involves figuring out what's wrong, it doesn't make sense to try to do an in-depth estimate for each bug as it comes up. Using simple historical data, a benchmark, or even a rough placeholder as a rule of thumb all seem to work just as well. Whatever you do, do it in the simplest and most efficient way possible, don't waste time trying to get it perfect, and realize that you won't always be able to depend on it.

Remember the 10x rule: some outlier bugs can take up to 10x as long to find and fix as an average bug. And some bugs can't be found or fixed at all, or at least not with the information that you have today. When you're wrong (and sometimes you're going to be wrong), you can be really wrong, and even careful estimating isn't going to help. So stick with a simple, efficient approach, and be prepared when you hit a hard problem, because it's gonna happen.
October 12, 2012
by Jim Bird
· 23,022 Views
Choosing Static vs. Dynamic Languages for Your Startup
Everyone is thinking: why in the world would anyone pick static, when you can be dynamic? Usually the thought process is, "What language am I most proficient in that can do the job?" Totally not a bad way to go about it. But does this choice affect anything else? Testing? Speed of development? Robustness?

Dynamic vs. Static

Dynamic languages are languages that don't require variables to be declared before they are used, and that check types at runtime. Examples of dynamic languages are Python, Ruby, and PHP. So in dynamic languages the following is possible:

    num = 10

We have successfully assigned a value to a variable without declaring it beforehand. Simple enough; try doing this in Java (you can't). This can *increase* development speed, since you don't have to write boilerplate code.

This can be something of a double-edged sword: since types in dynamic languages are checked at runtime, there is no way to tell whether there is a bug in the code until it is run. I know you can test, but you can't test for everything. Here is an example, albeit trivial:

    def get_first_problem(problems):
        for problem in problems:
            problam = problem + 1  # typo: Python silently creates a new variable
            return problam

Now, if you are raging to some serious dubstep, it's easy enough to miss that small typo; you go "screw it, we'll do it live" and deploy to production. Python will simply create the new variable, and not a single thing will be said. Only you can stop bugs in production!

Static languages are languages in which variables must be declared before use and type checking is done at compile time. Examples of static languages include Java, C, and C++. So in static languages the following is enforced:

    static int awesomeNumber;
    awesomeNumber = 10;

Many argue this increases robustness and decreases the chance of runtime errors, since the compiler will catch those horrible, horrible mistakes you made throughout your code. Your methods' contracts are tighter; the downside is a crap ton of boilerplate code.
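To see why a dynamic language can't tell you about a bug until the code runs, consider this small Python sketch (the Order class and describe function are made up for illustration). The typo sits on a branch that ordinary test data never reaches, so the module loads cleanly and the common path works; a Java compiler would reject the equivalent misspelled field access before the program ever ran.

```python
class Order:
    def __init__(self, quantity: int) -> None:
        self.quantity = quantity


def describe(order: Order) -> str:
    if order.quantity > 100:
        # Typo on a rarely taken branch: there is no attribute 'quantty'.
        # Python accepts this definition; the AttributeError only appears
        # when a bulk order actually reaches this line at runtime.
        return f"bulk order of {order.quantty} units"
    return f"{order.quantity} units"


if __name__ == "__main__":
    print(describe(Order(3)))  # the common path works fine
    try:
        print(describe(Order(500)))
    except AttributeError as exc:
        print("found only at runtime:", exc)
```

A test suite that only ever uses small orders would pass, which is exactly the "you can't test for everything" risk.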
Weak and strong typing can often be confused with dynamic and static languages. Weakly typed languages can lead to philosophical questions like: what does the number 2 added to the word "two" give you? Things like this are possible with a weakly typed language (pseudocode):

    a = 2
    b = "2"
    concatenate(a, b)  // Returns "22"
    add(a, b)          // Returns 4

Strongly typed languages, by contrast, restrict which operations may occur between types. For example, Python is strongly typed, so adding a string and an integer results in a type error:

    >>> a = 10
    >>> b = 'ten'
    >>> a + b
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for +: 'int' and 'str'
    >>>

Conclusion

Regardless of where you land on this discussion, claiming one is better than the other would lead to a flame war, but there are places where each is strong. Dynamic languages are good for fast development cycles and prototyping, while static languages are better suited to longer development cycles where trivial bugs could be extremely costly (telecommunication systems, air traffic control). For example, if some giant company called Moo Corp. spent millions of dollars on QA and testing and a bug somehow got into the field, fixing it would mean another round of testing. When you are sitting in that chair the choice is clear: static languages FTW. It's a hard job, but someone has to milk the cows.

Test, test, and test. Just a little food for thought for when you are starting your next project: you never know what limitations you may be placing on yourself and your team. What do you consider when selecting a programming language for a project?
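For comparison with the weak-typing pseudocode, this is how the same two results look in Python, which is dynamically but strongly typed: instead of guessing whether you meant concatenation or arithmetic, it refuses a + b and makes you state the conversion.

```python
a = 2
b = "2"

# Python will not guess: a + b raises TypeError, so the conversion you
# mean has to be written out explicitly.
concatenated = str(a) + b   # "22", like concatenate(a, b)
added = a + int(b)          # 4, like add(a, b)

try:
    a + b
except TypeError as exc:
    print("refused:", exc)

print(concatenated, added)
```

Dynamic versus static is about *when* types are checked; weak versus strong is about *how much* implicit conversion the language allows, which is why Python lands on "dynamic" and "strong" at the same time.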
September 25, 2012
by Mahdi Yusuf
· 24,870 Views
Your First Hadoop MapReduce Job
Hadoop MapReduce is a YARN-based system for parallel processing of large data sets. In this article, learn to quickly start writing the simplest MapReduce job.
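The summary doesn't show the job itself, so as a hedged illustration here is word count, the customary first MapReduce example, simulated in a single Python process. It only mirrors the model (map emits key/value pairs, the framework groups them by key, reduce aggregates each group); it is not the Hadoop Java API, where the same roles are played by Mapper and Reducer classes running across a cluster.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "The lazy dog", "the end"]))
```

In a real cluster each map call runs near its slice of the input and each reduce call can run on a different machine; keeping the three phases as separate functions makes that split visible.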
September 12, 2012
by Amresh Singh
· 19,627 Views
Manual Test-Driven Development
Test-Driven Development is a code-level practice, based on running automated tests that are written before the production code they exercise. But practices can be applied only in the context where they were developed: when some premises are not present, it is difficult to apply TDD as-is.

Automated specification

For example, consider the premise of assertion automation: it is possible to write a (hopefully) small algorithm that is able to check the result of running production code and return true or false. When the problem is "Draw an antialiased circle on this blank canvas" (Carlo Pescio), it is not immediately clear how to define automated tests for this behavior. We could check that some pixels are still blank inside or outside the circle, that there is a bounded number of black pixels, or even that they are contiguous.

An opinion I've heard (which I try not to misrepresent) is that in these cases we only need to write some looser tests, checking only a few pixels of the circle. This process will give us a little feedback on the API of our Canvas or Circle object, but not much on the algorithm we are implementing inside it. Are we going in the right direction? Have new test cases been satisfied correctly without a large intervention on the existing code? Are we painting some unrelated pixels due to a hidden bug?

What I argue here instead is that we should change the nature of the feedback mechanism. Speaking in control theory terms: change the block that acquires the output and influences the input to our design process.

Develop in the browser

When I was developing a Couchapp, a kind of web application served directly from a CouchDB database, I was appalled by the difficulty of testing it.
While the production code was composed of ~100 lines, it was a complex mix of technologies: HTML and CSS code, client-side JavaScript for managing user events, and some server-side JavaScript for the "queries" (actually, in Couchapps the server side consists only of the database). Some of this logic could be tested automatically, like the result of queries over views. Yet much of it was related to the user interface, and as such required a large time investment to automate.

Instead of waking up my Selenium server and starting to manipulate a browser with code, I noticed that this UI was almost read-only; there were a few cases where a new document would have to be inserted, but a manual test of them was short and did not even require reloading the page. The whole application state was observable. Summing it up, I performed a frequent manual test that took a few seconds instead of trying to define complex and brittle automation logic for testing the UI.

Now that I've been introduced to a simple qualitative ROI model by Carlo Pescio's article, I would do the same, as the only logical conclusion, in every context where:

  • a large time investment is needed for automating tests;
  • it is possible to perform manual tests quickly.

A word of caution

TDD has many benefits (including catching regressions early), so I'm not prepared to give it up just because something is difficult to test. These are technical scenarios where I have successfully followed TDD by the book:

  • multithreaded and multiprocess code
  • applications distributed over multiple machines
  • computer vision (object recognition and tracking)
  • image manipulation code (via comparison testing)
  • development of browser bindings for Selenium

And even when the big picture is not easy to test first (as in the case of image manipulation), we can still benefit from test-driving the pieces of the solution. For example, in the computer vision case I wasn't able to write a test beforehand for tracking a car inside a movie.
But I was able to TDD the objects that the algorithmic solution to the problem called for: Patch, Area, Cluster, Movement, and so on. End-to-end TDD is not always cheap, but unit-level TDD often can be, if it treats testability as a relevant property (while regression testing, even at the end-to-end level, is always possible, in the worst case with record and replay).

End-to-end specifications

If we can't define automated assertions for our "big picture" problem, it doesn't mean that we cannot apply the TDD approach: we can substitute a manual step. Going back to the circle problem, I would define manual test cases on an inspection page viewed by a human. I've seen this done with layouts and multiple browsers to catch CSS rendering bugs, for example: it would be very difficult to check such screenshots automatically, as each browser renders pages a bit differently from the others.

The iterative process becomes:

1. Define a cheap manual test, automating the arrange and act phases but not the assertion.
2. Write only the code necessary to make it pass.
3. Refactor.

As long as the number of tests does not grow without limit and the manual check can be performed quickly, this approach does not slow you down compared to TDD by the book. You'll have to take care of regressions by other means, but at least you define a set of manual test cases.

Feedback!

TDD is an instrument of feedback: if feedback cannot be gathered in an automated way, we have to resort to manual checking of the specifications. Here are other examples of manual tools for generating feedback:

  • Read-Eval-Print Loops: you can experiment with existing classes and functions, and easily repeat steps thanks to the history.
  • the browser refresh button: the fastest way to transform a PSD into an HTML and CSS template.
  • the MongoDB console, for learning the database API; other kinds of consoles like Firebug and Chrome's, or Clojure's.
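Here is a minimal sketch of "automate the arrange and act phases but not the assertion" for the circle problem. The rasterizer is a hypothetical stand-in for the real Canvas/Circle code: it draws a filled disc and antialiases it by supersampling (each pixel's shade is the fraction of sub-pixel centres inside the disc), which is one simple technique among many. The script prints several cases as ASCII shades, and the "assertion" is a person looking at them side by side.

```python
from typing import List

SHADES = " .:-=+*#%@"  # from empty to fully covered


def disc_coverage(size: int, cx: float, cy: float, r: float,
                  samples: int = 4) -> List[List[float]]:
    """Arrange + act: rasterize a filled disc with supersampling.

    Each pixel is split into samples x samples sub-pixels; its value is
    the fraction of sub-pixel centres that fall inside the disc.
    """
    grid = []
    for py in range(size):
        row = []
        for px in range(size):
            inside = 0
            for sy in range(samples):
                for sx in range(samples):
                    x = px + (sx + 0.5) / samples
                    y = py + (sy + 0.5) / samples
                    if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                        inside += 1
            row.append(inside / (samples * samples))
        grid.append(row)
    return grid


def render(grid: List[List[float]]) -> str:
    """Turn coverage values into ASCII shades for a human to inspect."""
    top = len(SHADES) - 1
    return "\n".join(
        "".join(SHADES[round(v * top)] for v in row) for row in grid
    )


if __name__ == "__main__":
    # The inspection "page": a human checks each case by eye.
    for radius in (2.0, 4.5, 7.0):
        print(f"radius={radius}")
        print(render(disc_coverage(16, 8.0, 8.0, radius)))
        print()
```

Adding a new case (an off-centre disc, a radius that touches the border) costs one line, which keeps the manual check quick enough to run on every iteration.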
September 3, 2012
by Giorgio Sironi
· 10,232 Views
Build Flow Jenkins Plugin
With the advent of Continuous Integration and Continuous Delivery, our builds are split into different steps that form the deployment pipeline. Some of these steps are compiling and running fast tests, running slow tests, running automated acceptance tests, or releasing the application, to cite a few. Most of us are using Jenkins/Hudson to implement Continuous Integration/Delivery, and we manage job orchestration by combining Jenkins plugins like build pipeline, parameterized-build, join or downstream-ext. We have to configure all of them, which means scattering the orchestration configuration across multiple jobs and makes the system configuration very complex to maintain.

Build Flow enables us to define an upper-level flow item to manage job orchestration and link-up rules, using a dedicated DSL. Let's see a very simple example.

The first step is installing the plugin. Go to Jenkins -> Manage Jenkins -> Plugin Manager -> Available and look for the CloudBees Build Flow plugin. Then you can go to Jenkins -> New Job, and you will see a new kind of job called Build Flow. In this example we are going to name it build-all-yy.

Now you only have to program, using the flow DSL, how this job should orchestrate the other jobs. In the "Define build flow using flow DSL" text input you can specify the sequence of commands to execute. In the current example I have already created two jobs, one executing the clean compile goals (job name yy-compile) and the other one executing the javadoc goal (job name yy-javadoc). I know this deployment pipeline would not be realistic in a true environment, but for now it is enough.

We want the javadoc job to run after the project is compiled. To configure this we don't have to create any upstream or downstream actions; simply add the following lines to the DSL text area:

    build("yy-compile");
    build("yy-javadoc");

Save and execute the build-all-yy job, and both projects will be built sequentially.
Now suppose that we add a third job called yy-sonar, which runs the sonar goal to generate a Sonar code quality report. In this case it seems obvious that after the project is compiled, the javadoc and code quality jobs can run in parallel. So the script is changed to:

    build("yy-compile")
    parallel (
        {build("yy-javadoc")},
        {build("yy-sonar")}
    )

This plugin also supports more operations, like retry (similar to the behaviour of the retry-failed-job plugin) or guard-rescue, which works mostly like a try+finally block. You can also create parameterized builds, access the build execution, or print to the Jenkins console. The next example prints the build number of the yy-compile job execution:

    b = build("yy-compile")
    out.println b.build.number

And finally, you can also get a quick graphical overview of the execution in the Status section. It could certainly be improved, but for now it is acceptable and can be used without any problem.

The Build Flow plugin is in its early stages; in fact, it is only at version 0.4. But it will be a plugin to consider in the future, and I think it is good to know that it exists. Moreover, it is being developed by the CloudBees folks, which is a good guarantee that it will be fully supported by Jenkins.

We Keep Learning.
Alex.

Warning: in order to run parallel tasks with the plugin, anonymous users must have Read Job access (Jenkins -> Manage Jenkins -> Configure System). An issue has already been filed in Jira to fix this problem.
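To pin down what the DSL keywords mean, here is a rough Python analogy of their semantics. It is not the plugin's Groovy DSL: build() is a hypothetical stub that just records the job name instead of triggering Jenkins, and parallel(), retry() and guarded() are illustrative helpers named after the DSL concepts described above (run branches concurrently and wait for all, re-run a failing step, always run a rescue step as with try/finally).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
log: List[str] = []


def build(job: str) -> str:
    """Hypothetical stub standing in for "trigger a job and wait for it"."""
    log.append(job)
    return job


def parallel(*branches: Callable[[], str]) -> List[str]:
    """Run every branch concurrently and wait until all have finished."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch) for branch in branches]
        return [f.result() for f in futures]


def retry(attempts: int, step: Callable[[], T]) -> T:
    """Re-run a failing step, giving up after `attempts` tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == attempts:
                raise
    raise AssertionError("unreachable")


def guarded(body: Callable[[], T], rescue: Callable[[], object]) -> T:
    """guard/rescue analogy: rescue always runs, like a finally block."""
    try:
        return body()
    finally:
        rescue()


def flow() -> None:
    build("yy-compile")                      # sequential first step
    parallel(lambda: build("yy-javadoc"),    # these two may overlap
             lambda: build("yy-sonar"))


if __name__ == "__main__":
    flow()
    print(log)
```

The useful property to notice is the barrier: nothing after parallel(...) starts until both branches return, which is what lets a flow item replace scattered upstream/downstream settings.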
August 2, 2012
by Alex Soto
· 37,664 Views · 1 Like
Bringing Order to Your Jenkins Jobs
Once you’ve been working with Jenkins and uberSVN for a while, you may find yourself in a situation where you have several jobs that need to run in a specific order, for example:

  • Job 1 and Job 3 can run simultaneously. BUT
  • Job 2 should only start when Job 1 and Job 3 have finished running. AND
  • Job 4 should only start when Job 2 has finished.

How can you implement this complicated setup? This is where Jenkins’ ‘Advanced Project Options’ and build triggers come in handy. In this tutorial, we’ll walk through the different options for scheduling jobs using Jenkins and uberSVN, the free ALM platform for Apache Subversion.

Note: this tutorial assumes you have already created a job and configured it to automatically poll your Subversion repository.

1) Open the Jenkins tab of your uberSVN installation and select a job.
2) Click the ‘Configure’ option from the left-hand menu.
3) In the ‘Advanced Project Options’ tab, select the ‘Advanced…’ button.
4) This contains two options that are useful for ordering your jobs:

  • Block build when upstream project is building: blocks builds when a dependency is in the queue, or building. Note that these dependencies include both direct and transitive dependencies.
  • Block build when downstream project is building: blocks builds when a child of the project is in the queue, or building. This applies to both direct and transitive children.

If these options don’t meet your needs, you can explicitly name a project (or projects) that must be built before your job is allowed to run. To set this:

1) Scroll down to the ‘Build triggers’ tab on the configure page.
2) Select the ‘Build after other projects are built’ checkbox. This will bring up a text box where you can list any number of projects.

Used properly, the build triggers and advanced project options should allow you to organize your jobs into a schedule.

Tip: if you need even more control over your build schedule, there are plenty of scheduling plugins available.
To add plugins to Jenkins, simply:

1) Open the ‘Manage Jenkins’ screen.
2) Click the ‘Manage Plugins’ link.
3) Open the ‘Available’ tab and select the appropriate plugins from the list.
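Jenkins enforces the order through the triggers and blocking options above, but it can help to work out the stages on paper first. This Python sketch (not a Jenkins API; the function name is hypothetical) groups jobs into stages with Kahn's topological-sort algorithm: every job in a stage has all its prerequisites in earlier stages, so jobs within one stage can run simultaneously.

```python
from typing import Dict, List, Set


def build_stages(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group jobs into stages using Kahn's algorithm.

    deps maps each job to the set of jobs that must finish before it
    starts. Jobs in the same stage do not depend on one another and can
    run simultaneously; each stage starts once the previous one is done.
    """
    remaining = {job: set(before) for job, before in deps.items()}
    # Schedule jobs that only appear as prerequisites, too.
    for before in deps.values():
        for job in before:
            remaining.setdefault(job, set())

    stages: List[List[str]] = []
    while remaining:
        ready = sorted(job for job, before in remaining.items() if not before)
        if not ready:
            raise ValueError("circular dependency between jobs")
        stages.append(ready)
        for job in ready:
            del remaining[job]
        for before in remaining.values():
            before.difference_update(ready)
    return stages


if __name__ == "__main__":
    example = {
        "Job 1": set(),
        "Job 3": set(),
        "Job 2": {"Job 1", "Job 3"},
        "Job 4": {"Job 2"},
    }
    for number, stage in enumerate(build_stages(example), start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

For the example at the top of this tutorial, Job 2 would get ‘Build after other projects are built’ naming Job 1 and Job 3, and Job 4 would name Job 2.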
July 28, 2012
by Jessica Thornsby
· 21,051 Views
Set up a Nightly Build Process with Jenkins, SVN and Nexus
We wanted to set up a nightly integration build for our projects so that we could run unit and integration tests on the latest version of our applications and their underlying libraries. We have a number of libraries that are shared across multiple projects, and we wanted this build to run every night and use the latest versions of those libraries, even if our applications had a specific release version defined in their Maven POM file. In this way we would be alerted early if someone added a change to one of the dependency libraries that could potentially break an application when the developer upgraded the dependent library in a future version of the application.

The chart below illustrates the dependencies between our libraries and our applications.

Updating versions nightly

Both the crossdock-shared and messaging-shared libraries depend on the siestaframework library. The crossdock web service and crossdockmessaging applications both depend on the crossdock-shared and messaging-shared libraries. Because of the dependency structure, we wanted the siestaframework library built first. The crossdock-shared and messaging-shared libraries could be built in parallel, but we didn't want the builds for the crossdock web service and crossdockmessaging applications to begin until all the libraries had finished building.

We also wanted the nightly build to tag Subversion with the build date, as well as upload the artifact to our Nexus "nightly build" repository. The resulting artifact would look something like siestaframework-nightly-20120720.jar.

Also, as I mentioned, even though the crossdockmessaging app may specify in its POM file that it depends on version 5.0.4 of the siestaframework library, for the purposes of the nightly build we wanted it to use the freshly built siestaframework-nightly-20120720.jar version of the library.

The first problem to tackle was getting the current date into the project's version number. For this I started with the Jenkins ZenTimestamp plugin.
With this plugin the format of Jenkins' BUILD_ID timestamp can be changed. I used this to specify the format yyyyMMdd for the timestamp.

The next step was to get the timestamp into the version number of the project. I was able to accomplish this by using the Maven Versions plugin. One thing the Versions plugin can do is allow you to dynamically override the version number in the POM file at build time. The code snippet from the siestaframework POM file is below.

    <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>versions-maven-plugin</artifactId>
        <version>1.3.1</version>
    </plugin>

At this point the Jenkins job can be configured to invoke the "versions:set" goal, passing in the new version string to use. The ${BUILD_ID} Jenkins variable will have the newly formatted date string. This will produce an artifact with the name siestaframework-nightly-20120720.jar.

Uploading artifacts to a nightly repository

Since this job needed to upload the artifact to a different repository from the release repository defined in our project POM files, the "altDeploymentRepository" property was used to pass in the location of the nightly repository. The deployment portion of the siestaframework job specifies the location of the nightly repository, where ${lynden_nightly_repo} is a Jenkins variable containing the nightly repo URL.

Tagging Subversion

Finally, the Jenkins Subversion Tagging plugin was used to tag SVN if the project was successfully built. The plugin provides a post-build action for the job with the configuration section shown below.

Dynamically updating dependencies

Now that the main project is set up, the dependent projects are set up in a similar way, but they need to be configured to use siestaframework-nightly-20120720 as the dependency rather than whatever version they currently have specified in their POM file. This can be accomplished by changing the POM to use a property for the version number of the dependency.
For example, if the snippet below was the original POM file:

    <dependency>
        <groupId>com.lynden</groupId>
        <artifactId>siestaframework</artifactId>
        <version>5.0.1</version>
    </dependency>

changing it to the following would allow the siestaframework version to be set dynamically:

    <properties>
        <siesta.version>5.0.1</siesta.version>
    </properties>

    <dependency>
        <groupId>com.lynden</groupId>
        <artifactId>siestaframework</artifactId>
        <version>${siesta.version}</version>
    </dependency>

This version can then be overridden by the Jenkins job. The example below shows the Jenkins configuration for the crossdock-shared build.

Enforcing build order

The final step in this process is setting up a structure to enforce the build order of the projects. The dependencies are set up in such a way that siestaframework needs to be built first, and the crossdock-shared and messaging-shared libraries can be built concurrently once siestaframework finishes. The crossdock web service and crossdockmessaging application jobs can be run concurrently, too, but not until after both shared libraries have finished.

Setting up the crossdock-shared and messaging-shared jobs to be built after siestaframework finishes is pretty straightforward. In the Jenkins job configuration for both shared libraries, the following build trigger is added:

To satisfy the requirement that the apps build only after all libraries have built, I enlisted the help of the Join plugin. The Join plugin can be used to execute a job once all "downstream" jobs have completed. What does this mean exactly? Looking at the diagram below, the crossdock-shared and messaging-shared jobs are "downstream" from the siestaframework job. Once both of these jobs complete, a join trigger can be used to start other jobs.

In this case, rather than having the join trigger kick off the app jobs directly, I created a dummy join job. This way, as we add more application builds, we don't need to keep modifying the siestaframework job with each new application job. To illustrate the configuration, siestaframework has a new post-build action (below):

join-build is a Jenkins job I configured that does not do anything when executed.
Then our crossdock web service and crossdockmessaging applications define their builds to trigger as soon as join-build has completed.

In this way we are able to run builds each night that update to the latest versions of our dependencies, as well as tag SVN and archive the binaries to Nexus. I'd love to hear feedback from anyone who is handling nightly builds via Jenkins, and how they have handled the configuration and build issues.
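The article configures these Maven invocations in Jenkins job screens that aren't reproduced here, so this Python sketch only shows one plausible shape for them. Assumptions: the nightly version is "nightly-" plus a yyyyMMdd date (matching siestaframework-nightly-20120720.jar), the repository id "nightly" and the URL are placeholders that would need a matching server entry in Maven's settings, and the application goals (clean install) are a guess. versions:set with -DnewVersion, -DaltDeploymentRepository in id::layout::url form, and overriding a POM property with -D are standard Maven usage.

```python
from datetime import date
from typing import List


def nightly_version(build_day: date) -> str:
    """Version string matching siestaframework-nightly-20120720.jar."""
    return "nightly-" + build_day.strftime("%Y%m%d")


def library_commands(version: str, repo_url: str) -> List[List[str]]:
    """Commands for a shared library: stamp the version, then deploy it.

    versions:set rewrites the POM version; altDeploymentRepository
    (format id::layout::url) sends the artifact to the nightly repository
    instead of the one declared in the POM. "nightly" is a placeholder id.
    """
    return [
        ["mvn", "versions:set", f"-DnewVersion={version}"],
        ["mvn", "clean", "deploy",
         f"-DaltDeploymentRepository=nightly::default::{repo_url}"],
    ]


def application_command(siesta_version: str) -> List[str]:
    """Build a dependent project against the fresh library by overriding
    the ${siesta.version} property declared in its POM."""
    return ["mvn", "clean", "install", f"-Dsiesta.version={siesta_version}"]


if __name__ == "__main__":
    version = nightly_version(date(2012, 7, 20))
    for cmd in library_commands(version, "http://nexus.example.com/nightly"):
        print(" ".join(cmd))
    print(" ".join(application_command(version)))
```

In Jenkins the date part would come from ${BUILD_ID} as formatted by ZenTimestamp rather than from Python's clock; the sketch computes it directly only to stay self-contained.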
July 25, 2012
by Rob Terpilowski
· 22,837 Views
20 Subjects Every Software Engineer Should Know
Here are the most important subjects for software engineering, with brief explanations:

1. Object-oriented analysis & design: For better maintainability, reusability and faster development, object-oriented analysis and design (OOAD, the most widely accepted approach) and its SOLID principles are very important for software engineering.
2. Software quality factors: Software engineering depends on some very important quality factors. Understanding and applying them is crucial.
3. Data structures & algorithms: Basic data structures like arrays, lists, stacks, trees, maps and sets, along with useful algorithms, are vital for software development. Their logical structure should be known.
4. Big-O notation: Big-O notation describes how the running time (or memory use) of an algorithm or code section grows as its input grows. Understanding it is very important for comparing performance.
5. UML notation: UML is the universal and complete language for software design & analysis. If there is a lack of UML in a development process, it feels like there is no engineering.
6. Software processes and metrics: Software engineering is not a random process. It requires a high level of systematic work and some numbers to monitor it. So, processes and metrics are essential.
7. Design patterns: Design patterns are standard and most effective solutions for specific problems. If you don't want to reinvent the wheel, you should learn them.
8. Operating system basics: Learning OS basics is very important because all applications run on an OS. By learning them, we can have better vision, viewpoints and performance for our applications.
9. Computer organization basics: All applications, including the OS, require hardware for physical interaction. So, learning computer organization basics is again vital for better vision, viewpoints and performance.
10. Network basics: Networking is related to computer organization, the OS and the whole information transfer process. We will face it during software development in any case, so it is important to learn network basics.
11. Requirement analysis: Requirement analysis is the starting point and one of the most important parts of software engineering. Performing it correctly and practically needs experience, but it is essential.
12. Software testing: Testing is another important part of software engineering. Unit testing, its best practices, and techniques like black box, white box, mocking, TDD and integration testing are subjects that must be known.
13. Dependency management: Library (JAR, DLL etc.) management and widely known tools (Maven, Ant, Ivy etc.) are essential for large projects. Otherwise, antipatterns like Jar Hell are inevitable.
14. Continuous integration: Continuous integration makes testing large modules and components easier and automatic, and can also perform auto-versioning. Its aim and tools (like Hudson) should be known.
15. ORM (Object-relational mapping): ORM and its widely known implementation, the Hibernate framework, are an important technique for mapping objects to database tables. ORM reduces code length and maintenance time.
16. DI (Dependency Injection): DI, or IoC (Inversion of Control), and its widely known implementation, the Spring framework, make object creation and lifetime management easier in big enterprise applications.
17. Version control systems: VCS tools (SVN, TFS, CVS etc.) are very important, saving a lot of time in collaborative work and versioning. Their logical viewpoint and standard commands should be known.
18. Internationalization (i18n): i18n by extracting strings into external files is the best way of supporting multiple languages in our applications. Its practices in different IDEs and technologies must be known.
19. Architectural patterns: Understanding architectural design patterns (like MVC, MVP and MVVM) is essential for producing maintainable, clean, extendable and testable source code.
20. Writing clean code: Working code is not enough; it must also be readable and maintainable. So, code formatting and readable code development techniques need to be known and applied.
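As a small illustration of item 4, this Python sketch counts comparisons for linear search, which is O(n), and binary search on sorted data, which is O(log n). The functions are written out by hand for clarity; in practice Python's bisect module provides binary search.

```python
from typing import List, Tuple


def linear_search(items: List[int], target: int) -> Tuple[int, int]:
    """Scan left to right; returns (index or -1, comparisons made). O(n)."""
    comparisons = 0
    for i, value in enumerate(items):
        comparisons += 1
        if value == target:
            return i, comparisons
    return -1, comparisons


def binary_search(items: List[int], target: int) -> Tuple[int, int]:
    """Halve a sorted range each step; returns (index or -1, comparisons). O(log n)."""
    lo, hi = 0, len(items) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if items[mid] == target:
            return mid, comparisons
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, comparisons


if __name__ == "__main__":
    data = list(range(1_000_000))
    print("linear:", linear_search(data, 999_999))
    print("binary:", binary_search(data, 999_999))
```

With a million sorted items, the worst case for linear search is a million comparisons, while binary search never needs more than 20, since 2^20 exceeds one million.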
July 2, 2012
by Cagdas Basaraner
· 108,559 Views · 5 Likes
Reportlab: Mixing Fixed Content and Flowables
Recently I needed the ability to use Reportlab’s flowables, but place them in fixed locations. Some of you are probably wondering why I would want to do that. The nice thing about flowables, like the Paragraph, is that they’re easily styled. If I could bold something or center something AND put it in a fixed location, then that would rock! It took a lot of Googling and trial and error, but I finally got a decent template put together that I could use for mailings. In this article, I’m going to show you how to do this too.

Getting Started

You’ll need to make sure you have Reportlab or you’ll end up with a whole lot of nothing. You can go here to grab it. While you wait for it to download, you can continue reading this article or go do something else productive. Are you ready now? Then let’s get this show on the road!

Now we just need to come up with an example. Fortunately I was working on something at my job that I’ve been able to dummy up into the following silly and incomplete form letter.
Study the code closely, because you never know when there will be a test:

from reportlab.lib.pagesizes import letter
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import mm, inch
from reportlab.pdfgen import canvas
from reportlab.platypus import Image, Paragraph, Table

########################################################################
class LetterMaker(object):
    """Builds a form letter by drawing flowables at fixed positions on a Canvas."""

    #----------------------------------------------------------------------
    def __init__(self, pdf_file, org, seconds):
        self.c = canvas.Canvas(pdf_file, pagesize=letter)
        self.styles = getSampleStyleSheet()
        self.width, self.height = letter
        self.organization = org
        self.seconds = seconds

    #----------------------------------------------------------------------
    def createDocument(self):
        """Draw the return address, the logo and the body of the letter."""
        voffset = 65

        # create return address (the <br/> tags force the line breaks)
        address = """
        Jack Spratt<br/>
        222 Ioway Blvd, Suite 100<br/>
        Galls, TX 75081-4016
        """
        p = Paragraph(address, self.styles["Normal"])

        # add a logo and size it
        logo = Image("snakehead.jpg")
        logo.drawHeight = 2*inch
        logo.drawWidth = 2*inch
        ## logo.wrapOn(self.c, self.width, self.height)
        ## logo.drawOn(self.c, *self.coord(140, 60, mm))

        # put the address and the logo side by side in a table
        data = [[p, logo]]
        table = Table(data, colWidths=4*inch)
        table.setStyle([("VALIGN", (0,0), (0,0), "TOP")])
        table.wrapOn(self.c, self.width, self.height)
        table.drawOn(self.c, *self.coord(18, 60, mm))

        # insert body of letter
        ptext = "Dear Sir or Madam:"
        self.createParagraph(ptext, 20, voffset+35)

        ptext = """
        The document you are holding is a set of requirements for your next mission, should you
        choose to accept it. In any event, this document will self-destruct %s seconds after you
        read it. Yes, %s can tell when you're done...usually.
        """ % (self.seconds, self.organization)
        p = Paragraph(ptext, self.styles["Normal"])
        p.wrapOn(self.c, self.width-70, self.height)
        p.drawOn(self.c, *self.coord(20, voffset+48, mm))

    #----------------------------------------------------------------------
    def coord(self, x, y, unit=1):
        """
        # http://stackoverflow.com/questions/4726011/wrap-text-in-a-table-reportlab
        Helper method to help position flowables in Canvas objects:
        converts (x, y) measured from the top-left corner of the page
        into Reportlab's bottom-left-origin coordinates
        """
        x, y = x * unit, self.height - y * unit
        return x, y

    #----------------------------------------------------------------------
    def createParagraph(self, ptext, x, y, style=None):
        """Draw ptext as a Paragraph at (x, y) mm from the top-left corner."""
        if not style:
            style = self.styles["Normal"]
        p = Paragraph(ptext, style=style)
        p.wrapOn(self.c, self.width, self.height)
        p.drawOn(self.c, *self.coord(x, y, mm))

    #----------------------------------------------------------------------
    def savePDF(self):
        """Write the finished PDF to disk."""
        self.c.save()

#----------------------------------------------------------------------
if __name__ == "__main__":
    doc = LetterMaker("example.pdf", "The MVP", 10)
    doc.createDocument()
    doc.savePDF()

Now that you’ve seen the code, let’s spend a little time going over how it works. First off, we create a Canvas object that we can use within our LetterMaker class. We also create a stylesheet, which we can index like a dict, and set up a few other class variables. In the createDocument method, we create a Paragraph (an address) using some HTML-like tags to control the font and line breaking behavior. Then we create a logo and size it before putting both items into a Reportlab Table object. You’ll note that I’ve left in a couple of commented-out lines that show how to place the logo without the table. We use the coord method to help position the flowables. It converts a position measured from the top-left corner of the page into Reportlab’s coordinate system, whose origin is the bottom-left corner. I found it on StackOverflow and thought it was pretty handy. The body of the letter uses a little string substitution and puts the result into another Paragraph. We also use a stored offset to help us position things. I find that storing a couple of offsets for certain portions of the code is very helpful.
If you use them carefully, you can just change a couple of offsets to move the content around on the document rather than having to edit the position of each element. If you need to draw lines or shapes, you can do them in the usual way with your canvas object.
Wrapping Up
I hope this code will help you in your PDF creation endeavors. I have to admit that I’m posting it here as much for my own future benefit as for yours. I’m a little sad I had to strip out so much from it, but my organization wouldn’t like it very much if I posted the original. Regardless, you now have the tools to create some pretty fancy PDF documents with Python. Now you just have to get out there and do it!
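The coord helper and the stored-offset trick described in this article come down to simple arithmetic. The sketch below is pure Python and does not need Reportlab. It assumes the same unit definitions Reportlab uses (72 points per inch, letter = 8.5 x 11 inches). `letter_positions` is a hypothetical helper that shows how one vertical offset moves several elements together.

```python
# Units in points, mirroring Reportlab's own definitions (inch = 72 points).
INCH = 72.0
MM = INCH / 25.4
LETTER = (8.5 * INCH, 11 * INCH)  # (612.0, 792.0) points


def coord(x, y, unit=1.0, page_height=LETTER[1]):
    """Convert (x, y) measured from the top-left corner into
    Reportlab's bottom-left-origin coordinates, in points."""
    return x * unit, page_height - y * unit


def letter_positions(voffset):
    """Hypothetical layout: body elements are placed relative to one
    vertical offset (in mm), so changing voffset moves them all together."""
    return {
        "salutation": coord(20, voffset + 35, MM),
        "body": coord(20, voffset + 48, MM),
    }


print(coord(0, 0))  # (0.0, 792.0)
```

Raising `voffset` from 65 to 75 shifts both the salutation and the body 10 mm further down the page, without editing either position individually.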
June 29, 2012
by Mike Driscoll
· 19,871 Views
Continuous Delivery vs. Traditional Agile
In working with development teams at organizations which are adopting continuous delivery, I have found there can be friction over practices that many developers have come to consider the right way for agile teams to work. I believe the root of conflicts between what I’ve come to think of as traditional agile and CD is the approach to making software “ready for release”.
Evolution of software delivery
A usefully simplistic view of the evolution of ideas about making software ready for release is this:
• Waterfall believes a team should only start making its software ready for release when all of the functionality for the release has been developed (i.e. when it is “feature complete”).
• Agile introduces the idea that the team should get their software ready for release throughout development. Many variations of agile (which I refer to as “traditional agile” in this post) believe this should be done at periodic intervals.
• Continuous delivery is another subset of agile, in which the team keeps its software ready for release at all times during development. It is different from “traditional” agile in that it does not involve stopping and making a special effort to create a releasable build.
Continuous delivery is not about shorter cycles
Going from traditional agile development to continuous delivery is not about adopting a shorter cycle for making the software ready for release. Making releasable builds every night is still not continuous delivery. CD is about moving away from making the software ready as a separate activity, and instead developing in a way that means the software is always ready for release.
Ready for release does not mean actually releasing
A common misunderstanding is that continuous delivery means releasing into production very frequently. This confusion is made worse by the use of organizations that release software multiple times every day as poster children for CD.
Continuous delivery doesn’t require frequent releases; it only requires ensuring software could be released with very little effort at any point during development. (See Jez Humble’s article on continuous delivery vs. continuous deployment.) Although developing this capability opens opportunities which may encourage the organization to release more often, many teams find more than enough benefit from CD practices to justify using them even when releases are fairly infrequent.
Friction points between continuous delivery and traditional agile
As I mentioned, there are sometimes conflicts between continuous delivery and practices that development teams take for granted as being “proper” agile.
Friction point: software with unfinished work can still be releasable
One of these points of friction is the requirement that the codebase not include incomplete stories or bugfixes at the end of the iteration. I explored this in my previous post on iterations. This requirement comes from the idea that the end of the iteration is the point where the team stops and does the extra work needed to prepare the software for release. But when a team adopts continuous delivery, there is no additional work needed to make the software releasable. More to the point, the CD team ensures that their code could be released to production even when they have work in progress, using techniques such as feature toggles. This in turn means that the team can meet the requirement that they be ready for release at the end of the iteration even with unfinished stories. This can be a bit difficult for people to swallow. The team can certainly still require all work to be complete at the iteration boundary, but this starts to feel like an arbitrary constraint that breaks the team’s flow. Continuous delivery doesn’t require non-timeboxed iterations, but the two practices are complementary.
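Feature toggles are what let unfinished work sit in a releasable codebase. Below is a minimal sketch, assuming a simple in-memory toggle set. The `FeatureToggles` class, the `checkout_total_cents` function and the "bulk-discount" feature are all hypothetical; real teams usually read toggles from configuration or a toggle service.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class FeatureToggles:
    """Minimal in-memory toggle store (real systems often read toggles from config)."""
    enabled: Set[str] = field(default_factory=set)

    def is_on(self, name: str) -> bool:
        return name in self.enabled


def checkout_total_cents(prices_cents: List[int], toggles: FeatureToggles) -> int:
    total = sum(prices_cents)
    # Unfinished story: the hypothetical "bulk-discount" feature is merged into
    # mainline but stays dark until its toggle is switched on, so every build
    # remains releasable while the work is in progress.
    if toggles.is_on("bulk-discount") and len(prices_cents) >= 10:
        total = total * 9 // 10
    return total


cart = [1000] * 10
print(checkout_total_cents(cart, FeatureToggles()))                   # 10000
print(checkout_total_cents(cart, FeatureToggles({"bulk-discount"})))  # 9000
```

With the toggle off, production behaves exactly as before the unfinished story was merged. That is why the iteration boundary no longer has to be a "stop and tidy up" point.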
Friction point: snapshot/release builds
Many development teams divide software builds into two types, “snapshot” builds and “release” builds. This is not specific to agile, but has become strongly embedded in the Java world due to the rise of Maven, which puts the snapshot/release concept at the core of its design. This approach divides the development cycle into two phases, with snapshots being used while software is in development, and a release build being created only when the software is deemed ready for release.
This division of the release cycle clearly conflicts with the continuous delivery philosophy that software should always be ready for release. The way CD is typically implemented involves creating a build only once, and then promoting it through multiple stages of a pipeline for testing and validation activities. This doesn’t work if software is built in two different ways, as with Maven.
It’s entirely possible to use Maven with continuous delivery, for example by creating a release build for every build in the pipeline. However, this leads to friction with Maven tools and infrastructure that assume release builds are infrequent and intended for production deployment. For example, artefact repositories such as Nexus and Artifactory have housekeeping features to delete old snapshot builds, but don’t allow release builds to be deleted. So an active CD team, which may produce dozens of builds a day, can easily chew through gigabytes or even terabytes of disk space on the repository.
Friction point: heavier focus on testing deployability
A standard practice with continuous delivery is automatically deploying every build that passes basic continuous integration to an environment that emulates production as closely as possible, using the same deployment process and tooling. This is essential to proving whether the code is ready for release on every commit, but it is more rigorous than many development teams are used to having in their CI.
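The "build once, then promote" model described above can be sketched in a few lines of Python. The sketch makes three assumptions: the version scheme `1.0.<build number>` is hypothetical, the stage names are hypothetical, and each stage is modeled as a check on the artifact. Nothing here is Maven's or any repository manager's API. The key property is that every stage sees the same artifact, identified by the same checksum.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Artifact:
    version: str
    sha256: str


def build(source: bytes, build_number: int) -> Artifact:
    # Every pipeline build is a release candidate with its own version;
    # there is no separate "snapshot" flavour that must later be rebuilt.
    return Artifact(f"1.0.{build_number}", hashlib.sha256(source).hexdigest())


# Hypothetical stage names; real pipelines vary.
STAGES = ["commit", "acceptance", "production-like", "production"]


def promote(artifact: Artifact, checks: Dict[str, Callable[[Artifact], bool]]) -> List[str]:
    """Run each stage's check against the same artifact; stop at the first failure."""
    passed: List[str] = []
    for stage in STAGES:
        check = checks.get(stage, lambda a: True)
        if not check(artifact):
            break
        passed.append(stage)
    return passed


candidate = build(b"print('hello')", 42)
print(candidate.version)                                         # 1.0.42
print(promote(candidate, {"production-like": lambda a: False}))  # ['commit', 'acceptance']
```

Because every build gets a unique release-style version, a repository that never deletes release builds will keep all of them. This is the disk-space friction mentioned above.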
For example, pre-CD continuous integration might run automated functional tests against the application by deploying it to an embedded application server using a build tool like Ant or Maven. This is easier for developers to use and maintain, but is probably not how the application will be deployed in production. So a CD team will typically add an automated deployment to an environment that more fully replicates production, including separate web/app/data tiers and the deployment tooling that will be used in production. However, this more production-like deployment stage is more likely to fail due to its added complexity. It may also be more difficult for developers to maintain and fix, since it uses tooling more familiar to system administrators than to developers. This can be an opportunity to work more closely with the operations team to create a more reliable, easily supported deployment process. But implementing and stabilizing this process is likely to involve a steep learning curve, which may impact development productivity.
Is CD worth it?
Given these friction points, is moving from traditional agile to continuous delivery worthwhile, especially for a team that is unlikely to actually release into production more often than every iteration? Continuous delivery:
• decreases risk by uncovering deployment issues earlier,
• increases flexibility by giving the organization the option to release at any point with minimal added cost or risk,
• involves everyone who takes part in production releases, such as QA and operations, in making the full process more efficient. The entire organization must identify difficult areas of the process and find ways to fix them, through automation, better collaboration, and improved working practices,
• makes the organization more competent at releasing, because the release process is continuously rehearsed, so that releasing becomes autonomic, like breathing, rather than traumatic, like giving birth,
• improves the quality of the software, by forcing the team to fix problems as they are found rather than leaving things for later.
Dealing with the friction
The friction points I’ve described seem to come up fairly often when continuous delivery is being introduced. My hope is that understanding the source of this friction will be helpful in discussing it when it comes up, and in working through the issues. Some developers are initially uncomfortable with breaking with the “proper” way of doing things, or find a CD pipeline overly complex or difficult. If they understand the aims and value of these practices, hopefully they will be more open to giving them a chance. Once these practices become embedded and mature in an organization, team members often find it’s difficult to go back to the old ways of working.
Edit: I’ve rephrased the definition of the “traditional agile” approach to making software ready for release. This definition is not meant to apply to all agile practices. Rather, it applies to what seems to me to be a fairly mainstream belief that agile means stopping work to make the software releasable.
May 9, 2012
by Kief Morris
· 54,135 Views · 7 Likes
Lean Tools: the Last Responsible Moment
Options Thinking leads us to invest time and money in delaying decisions until the time when we know the most about them. The extreme application of the Decide as late as possible principle is the concept of the Last Responsible Moment: the optimal point of the trade-off between the time available for a decision and the need to complete a story or a task.
The last responsible moment is the instant at which the cost of delaying a decision surpasses the benefit of delay, or the moment when failing to make a decision eliminates an important alternative. For example, failing to provide a public HTTP API may make you lose an important customer, forcing you to publish unfinished work.
Tactics
Mary Poppendieck describes several tactics for delaying decisions until the last responsible moment:
• Share partial design information before it is frozen or released. Irreversible decisions, like freezing an API, are made later, after feedback has been gathered; at the same time, the rest of the team can start to work with it.
• Improve the response time for new stories. If you want to make a decision later, you still have to respect the deadline. The faster you are, the later you can make important decisions. The adjectives lean and agile usually connote lightweight approaches where decisions can be made later for maximum flexibility.
• Absorb changes by delaying commitments to particular implementations, tools, and libraries. Modularization, interfaces, configuration parameters and any kind of abstraction are welcome investments wherever there is the possibility of change in the future. By the way, the *no extra features* XP mantra recognizes that simple design, which minimizes duplication and moving parts, is the best response to the need for evolution.
Real Options
The Real Options concept (still the financial option metaphor) motivates Agile practices through their ability to improve our options for deciding at the last responsible moment.
For example, tests give us more options for a design by preserving its ability to change. Pairing gives us more options for who should develop a feature, as knowledge of that particular part of the code base is spread across the team instead of being concentrated in a few people. It's all about risk management. Delaying decisions lets us make them in conditions of less uncertainty, since over time we can only learn more about the domain and the project.
Criticism
Alistair Cockburn criticizes the concept of the last responsible moment for several reasons. First, characterizing it as a single instant is not close to reality. The costs and benefits of a decision are soft functions that vary continuously, so it's difficult to think of a precise moment when a decision must be made. In most cases, the *moment* spans days. Second, the concept is not actionable, in the sense that you don't know the point in time where it will take place until after it has passed. Knowing that there is a deadline for a decision is different from knowing it with absolute precision. Finally, Cockburn views it as simply not good advice, since trade-offs between costs and benefits should only apply to critical decisions. Examples are a database with a high cost or lock-in, or the hardware architecture of the application.
From the Extreme Programming point of view, it is correct to delay commitment to the last responsible moment, but not to overengineer a system to postpone every possible design decision. Choices like the programming language to write code in must be made at the start of the project; the set of classes and methods should be kept minimal as long as duplication is eliminated. After all, this is a series on tools, and it's up to us to pick the right tool in the right context. The last responsible moment makes sense for decisions which are costly to change. Everything that can be rolled back thanks to encapsulation and information hiding is already abstracted away enough.
In fact, iterative development is based on starting with a large set of assumptions and removing them one by one according to priority, evolving the code towards a more general picture. For example, I have no problem hardcoding business rules, database driver choices inside Repositories (but not credentials, of course), and web application routes. As long as I can go back to the code in the future, they are not final decisions. Instead, I try to reserve the delaying of commitment for published interfaces and HTTP APIs...
Conclusions
We have learned to try to postpone decisions which are not immediately required, and even to invest in finding ways to postpone some of them when they would ordinarily be made right away. The last responsible moment is a concept not to be taken literally. But when it is applied to difficult design and business decisions, it sets a goal: gather all the information needed to make a choice when the time comes. Don't worry about what you can still change: worry about what will be carved in stone, and delay the related decision as long as doing so does not damage you.
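The article defines the last responsible moment in two ways. It is the point where the cost of delaying a decision surpasses the benefit of delay. It is also the point where an important alternative would be lost. The Python sketch below turns both definitions into a day-by-day check. The cost and benefit curves are hypothetical toy functions. As Cockburn points out, real curves are soft and hard to estimate, so the result is only as good as those estimates.

```python
from typing import Callable, Optional


def last_responsible_moment(
    marginal_cost: Callable[[int], float],
    marginal_benefit: Callable[[int], float],
    horizon: int,
    option_deadline: Optional[int] = None,
) -> Optional[int]:
    """Return the first day on which waiting one more day costs more than it
    gains, or the day an important alternative would be lost, whichever is first."""
    for day in range(horizon):
        if option_deadline is not None and day >= option_deadline:
            return day
        if marginal_cost(day) > marginal_benefit(day):
            return day
    return None


def cost(day: int) -> float:
    # Hypothetical: each extra day of delay costs a little more than the last.
    return 1.0 * day


def benefit(day: int) -> float:
    # Hypothetical: what we learn by waiting shows diminishing returns.
    return 10.0 / (day + 1)


print(last_responsible_moment(cost, benefit, horizon=30))                     # 3
print(last_responsible_moment(cost, benefit, horizon=30, option_deadline=2))  # 2
```

The second call shows the other half of the definition: an expiring alternative can force the decision earlier than the cost/benefit crossover would.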
May 9, 2012
by Giorgio Sironi
· 20,352 Views
What the Heck is a Utility Tree?
I recently answered this question on Stack Overflow: “What is a utility tree and what is its purpose in the case of the Architecture Tradeoff Analysis Method (ATAM)?” I did answer the question there, but here’s a better explanation with lots of examples. It is based on the initial version of chapter 1 of SOA Patterns, which didn’t make it into the final version of the book.
There are two types of requirements for software projects: functional and non-functional requirements. Functional requirements are the requirements for what the solution must do (which are usually expressed as use cases or stories). The functional requirements are what the users (or systems) that interact with the system do with the system (fill in an order, update customer details, authorize a loan etc.).
Non-functional requirements are attributes the system is expected to have or manifest. These usually include requirements in areas such as performance, security, availability etc. A better name for non-functional requirements is “quality attributes”. Below are some formal definitions for quality attributes and related terms, from IEEE Standard 1061, “Standard for a Software Quality Metrics Methodology”:
• Quality attribute: a characteristic of software, or a generic term applying to quality factors, quality subfactors, or metric values.
• Quality factor: a management-oriented attribute of software that contributes to its quality.
• Quality subfactor: a decomposition of a quality factor or quality subfactor to its technical components.
• Metric value: a metric output or an element that is from the range of a metric.
• Software quality metric: a function whose inputs are software data and whose output is a single numerical value that can be interpreted as the degree to which software possesses a given attribute that affects its quality.
Most of the requirements that drive the design of a software architecture come from the system’s quality attributes.
The reason for this is that the effect of quality attributes is usually system-wide, which is exactly what software architecture is concerned with. For example, you wouldn’t want your system to have good performance only in the UI; you want the system to perform well no matter what. Note, however, that a few requirements might still come from functional requirements [1].
The question is: how do we find out what those requirements are? The answer to that is also in the software architecture definition. The source of quality attributes is the stakeholders. So what or who are these “stakeholders”? Well, a stakeholder is just about anyone who has a vested interest in the project. A typical system has a lot of stakeholders. They start with the (obvious) customer and the end users (those people in the customer organization or department who will actually use the software). They go on to include the operations personnel (IT, those who will have to keep the solution running), the development team, testers, maintainers and management. In some systems the stakeholders can even be the shareholders or the general public (imagine, for example, that you build a new dispatch system for a 911 center).
One of the architect’s roles is to analyze the quality attributes and define an architecture that will enable delivering all the functional requirements while supporting the quality attributes. As can be expected, quality attributes are sometimes in conflict with each other; the most obvious examples are performance vs. security, or flexibility vs. simplicity. The architect’s role is to strike a balance between the different quality attributes (and the stakeholders) to make sure the overall quality of the system is maximized.
Contextual solutions (e.g. patterns) can be devised to address specific quality attribute needs. However, saying that a system needs to have “good performance” or that it needs to be “testable” doesn’t really help us know what to do.
To discern which patterns apply to a specific quality attribute, we need a better understanding of quality attributes than the formal definition gives us, something more concrete. The way to get that concrete understanding of the effect of quality attributes is to use scenarios. Scenarios are short, “user story”-like pieces of prose that demonstrate how a quality attribute is manifested in the system using a functional situation.
Quality attribute scenarios originated as a way to evaluate software architecture. The Software Engineering Institute developed several evaluation methodologies, like the Architecture Tradeoff Analysis Method (Clements, Kazman and Klein, 2002). These methods build heavily on scenarios to contrast and compare how the different quality attributes are met by candidate architectures. ATAM (and similar evaluation methods like LAAAM, which is part of MSF 4.0) suggests building a “utility tree”, which represents the overall usefulness of the system. The scenarios serve as the leaves of the utility tree, and the architecture is evaluated by considering how it makes the scenarios possible.
I found that using scenarios and the utility tree approach early in the design of the architecture (see writings about SAF) can greatly enhance the quality of the architecture that is produced. When you examine the scenarios you can also prioritize them and better balance conflicting attributes. The scenarios can be used as an input to make sure the quality attributes are actually met. Furthermore, you can use the scenarios to help identify the strategies or patterns that make the scenarios possible within the system, and thus ensure the quality attributes are met.
We usually group scenarios into a “utility tree”, which is a representation of the total usefulness (“utility”) of a system. At the top level of the tree are the key quality attributes (performance, security etc.).
Each of the quality attributes has subcategories (e.g. performance is broken into latency, data loss etc.). Each subcategory is demonstrated by a scenario that we expect the system to manifest. The tree representation helps get the whole picture, but the important bits here are the scenarios, so let’s explore them some more.
Scenarios are expressed as statements that have 3 parts: a stimulus, a context and a response.
• The stimulus is the action taken (by the system, a user, another system or any other person).
• The response is how the system is expected to behave when the stimulus occurs.
• The context specifies the environment or conditions under which we expect to get the response.
Take, for example, the following scenario: “When you perform a database operation, under normal conditions, it should take less than 100 milliseconds.”
• “under normal conditions” is the context,
• “when you perform a database operation” is the stimulus,
• “it should take less than 100 milliseconds” is the response expected from the system.
Here are a couple of additional examples of quality attribute scenarios:
• Performance -> Latency -> Under normal conditions, a client consuming multiple services should have latency of less than 5 seconds.
• Security -> Authentication -> Under all conditions, any call to a service should be authenticated using X.509 certificates.
You can also check out this document for a few more scenario examples from a system I worked on in the past.
[1] Design has the ratios reversed, i.e. most of the requirements for design come from functional requirements, and a few requirements might come from the quality attributes.
Illustration by epsos.de
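A utility tree maps naturally onto a small data structure: quality attributes contain subcategories, and subcategories contain scenarios with a context, a stimulus and a response. The Python sketch below makes two assumptions that are not part of ATAM itself. The numeric `priority` field is a hypothetical way to rank scenarios, and the `UtilityTree` and `Scenario` classes are illustrative names. The two scenarios are the examples from the article.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Scenario:
    context: str
    stimulus: str
    response: str
    priority: int = 2  # hypothetical ranking: 1 = most important

    def __str__(self) -> str:
        return f"{self.context}, {self.stimulus}, {self.response}"


@dataclass
class UtilityTree:
    # quality attribute -> subcategory -> scenarios (the leaves of the tree)
    attributes: Dict[str, Dict[str, List[Scenario]]] = field(default_factory=dict)

    def add(self, attribute: str, subcategory: str, scenario: Scenario) -> None:
        self.attributes.setdefault(attribute, {}).setdefault(subcategory, []).append(scenario)

    def leaves(self) -> Iterator[Tuple[str, str, Scenario]]:
        for attribute, subcategories in self.attributes.items():
            for subcategory, scenarios in subcategories.items():
                for scenario in scenarios:
                    yield attribute, subcategory, scenario

    def prioritized(self) -> List[Tuple[str, str, Scenario]]:
        # sorted() is stable, so equal priorities keep insertion order.
        return sorted(self.leaves(), key=lambda leaf: leaf[2].priority)


tree = UtilityTree()
tree.add("Performance", "Latency", Scenario(
    "under normal conditions", "when a client consumes multiple services",
    "latency should be less than 5 seconds"))
tree.add("Security", "Authentication", Scenario(
    "under all conditions", "when any service is called",
    "the call should be authenticated using X.509 certificates", priority=1))

for attribute, subcategory, scenario in tree.prioritized():
    print(f"{attribute} -> {subcategory} -> {scenario}")
```

Keeping the three parts of each scenario as separate fields makes it easy to check that none is missing. This is where vague requirements like "good performance" usually fall apart.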
May 9, 2012
by Arnon Rotem-gal-oz
· 19,443 Views
Lean tools: Options thinking
We have now finished exploring the Lean tools for amplifying learning, like feedback, iterations and set-based development. We now enter the realm of the 3rd Lean principle, Decide as late as possible. This principle aims to postpone decisions as long as the delay does not impact the product, in order to gain more flexibility instead of becoming locked in to some initial design decisions. Software is easy to rebuild from source code, but its architecture is not always as malleable by default as non-technical people would think. Moreover, there are some changes which will always happen, like upgrades of libraries and operating systems. These add to changes in requirements or integration ports. The easiest decision to change is the one that has not been made yet.
Options Thinking
The first tool that helps in postponing decisions is Options Thinking: the introduction of mechanisms whose specific purpose is to enable delaying decisions. In the financial domain, an option is the right to buy a good at a certain price before a future date. This effectively moves the decision of buying shares or products into the future, as options can expire without being exercised. A simpler instance of Options Thinking cited by Mary Poppendieck is a hotel reservation: you invest a small sum of money (the reservation fee) to book a room. Exercising the option means actually going to the hotel, a decision which is made only when the time comes. Trains and airlines often use the same pricing model for seats (even without considering the rise of prices as a flight fills up). There are multiple types of tickets for each combination of flight and date. Some are basic and non-transferable or non-refundable; some are more costly and provide the option of changing the date or getting a partial or total refund.
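The ticket example can be made concrete with a back-of-the-envelope expected-cost comparison. The Python sketch below uses hypothetical prices and probabilities, and it is a deliberate simplification, not a pricing model for real financial options. Paying a premium for flexibility is worth it when the probability of change, multiplied by the cost of being locked in, exceeds the premium.

```python
def expected_cost(price: float, p_change: float, cost_if_change: float) -> float:
    """Expected total cost of a ticket, given the probability that plans change."""
    return price + p_change * cost_if_change


def break_even_probability(premium: float, cost_if_change: float) -> float:
    """Probability of change above which paying the premium for flexibility pays off."""
    return premium / cost_if_change


# Hypothetical prices: a basic ticket is lost if plans change (a new one must be
# bought), while a flexible ticket costs 30 more but can be changed for free.
basic = expected_cost(price=100, p_change=0.4, cost_if_change=100)
flexible = expected_cost(price=130, p_change=0.4, cost_if_change=0)
print(basic, flexible)                  # 140.0 130.0
print(break_even_probability(30, 100))  # 0.3
```

The same reasoning applies to software options: the extra design effort is the premium, and it pays off only if the decision it keeps open is reasonably likely to change.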
Agile
Mary Poppendieck adds the insight that Agile software development is a process that creates many options by introducing a very flexible plan and only prescribing more detailed actions after several inspect-and-adapt loops. It's not bad to delay a commitment until you know more about a problem: forced early decisions are the mark of waterfall (actually of the mainstream version of waterfall).
But options do not come for free. For example, to simplify a technical decision, XP suggests creating throwaway code. These spikes explore each potential solution. In a certain sense they are a waste of development time, as their final result is of low quality and usually thrown away. However, spikes produce knowledge about the solution that results in a better estimate for its full development or in its abandonment. The decision to adopt a technology, or which solution to adopt, is delayed until the end of a spike. But this option pays for itself quickly, as uncertainty is removed and decisions "get it right" with a higher probability.
Real world examples
Almost every application I have been involved with in the last two years has had the separation of a persistence layer as one of its goals: Active Record has been progressively abandoned in the PHP world in favor of Data Mappers like the Doctrine ORM and ODMs. As with all options that can be bought, this separation does not come for free. Development is a little slower when Repositories are objects that have to be designed instead of just a bunch of static calls to the Entity class like User::find() (although there are benefits of the Data Mapper approach that go beyond keeping options open). An isolated persistence layer, however, allows us to postpone fundamental decisions about which database to use. It's a rough time for many databases, as licenses change (MySQL) or new NoSQL solutions come out and evolve.
Every month of development where you're not tied to a specific database is a month where the hype goes down and we move towards more mature solutions. We can then choose among them with greater knowledge of the requirements of our data. Do we need relational database consistency? Or a schema-less store? Moreover, investing in persistence adapters separated from the core of the application lets us choose different databases for different bounded contexts of an application. For example, we might store views in a relational database and the primary database as a set of aggregates in Couch or Mongo.
Conclusion
I will never advocate investing in an option just for the sake of the technical challenge, nor claim that options come for free. But once you recognize that postponing a decision is valuable for the project, there should really be no issue in going out and buying it.
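Here is a language-neutral illustration of the isolated persistence layer described above. The Python sketch below has application code depend only on a repository interface. It is not Doctrine's API; `UserRepository`, `InMemoryUserRepository` and `rename_user` are hypothetical names. A relational or document-store adapter could later implement the same two methods without changing `rename_user`.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class User:
    id: int
    name: str


class UserRepository(Protocol):
    """The port the application depends on; no database is named here."""

    def find(self, user_id: int) -> Optional[User]: ...

    def save(self, user: User) -> None: ...


class InMemoryUserRepository:
    """One adapter; a relational or document-store adapter could replace it later."""

    def __init__(self) -> None:
        self._rows: Dict[int, User] = {}

    def find(self, user_id: int) -> Optional[User]:
        return self._rows.get(user_id)

    def save(self, user: User) -> None:
        self._rows[user.id] = user


def rename_user(repo: UserRepository, user_id: int, new_name: str) -> User:
    # Application logic talks only to the repository interface,
    # so the choice of database stays an open option.
    user = repo.find(user_id)
    if user is None:
        raise KeyError(user_id)
    user.name = new_name
    repo.save(user)
    return user


repo = InMemoryUserRepository()
repo.save(User(1, "alice"))
print(rename_user(repo, 1, "Alice"))  # User(id=1, name='Alice')
```

This is the cost mentioned above: an interface and an adapter must be designed instead of calling a static `User::find()`. In exchange, the database decision can wait.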
May 2, 2012
by Giorgio Sironi
· 10,358 Views
Amazon EMR Tutorial: Running a Hadoop MapReduce Job Using Custom JAR
See original post at https://muhammadkhojaye.blogspot.com/2012/04/how-to-run-amazon-elastic-mapreduce-job.html
Introduction
Amazon EMR is a web service which can be used to easily and efficiently process enormous amounts of data. It uses a hosted Hadoop framework running on the web-scale infrastructure of Amazon EC2 and Amazon S3. Amazon EMR removes most of the cumbersome details of Hadoop. It takes care of provisioning Hadoop, running the job flow, terminating the job flow, moving the data between Amazon EC2 and Amazon S3, and optimizing Hadoop. In this tutorial, we will develop a WordCount Java example using Hadoop and then execute our program on Amazon Elastic MapReduce.
Prerequisites
You must have valid AWS account credentials. You should also have a general familiarity with the Eclipse IDE before you begin, although you can use any other IDE of your choice.
Step 1 – Develop MapReduce WordCount Java Program
In this section, we are first going to develop a WordCount application. A WordCount program determines how many times different words appear in a set of files. In Eclipse (or whatever IDE you are using), create a simple Java project with the name "WordCount".
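Create a Java class named Map and override the map method as follows:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (word, 1) for every whitespace-separated token of an input line.
public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
```

Create a Java class named Reduce and override the reduce method as shown below:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the counts emitted for each word.
public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```

Create a Java class named WordCount and define the main method as below:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {
    // args[0] = input path, args[1] = output path (must not exist yet)
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Exit with a non-zero status if the job fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Export the WordCount program as a JAR using Eclipse and save it somewhere on disk. Make sure you specify WordCount as the main class when exporting the JAR file. Our JAR is ready!

Step 2 – Upload the WordCount JAR and Input Files to Amazon S3
Now we are going to upload the WordCount JAR to Amazon S3.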
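To see what the Map and Reduce classes compute without a cluster, the sketch below replays the same data flow in plain Java: map each line to (word, 1) pairs with StringTokenizer, group the pairs by word (the shuffle Hadoop performs between the two phases), then sum each group. LocalWordCount and its methods are hypothetical names for illustration; this is not a Hadoop API and it does not model partitioning or combiners.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hadoop-free illustration of the WordCount data flow.
class LocalWordCount {
    // Map phase: emit (word, 1) for every whitespace-separated token, like the Map class.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(Map.entry(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle: group the emitted values by key (Hadoop does this between map and reduce).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values for each word, like the Reduce class.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    static Map<String, Integer> count(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }
}
```

For example, counting the lines "the quick fox" and "the lazy dog the end" gives the=3 and a count of 1 for each of the other five words.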
First, go to https://console.aws.amazon.com/s3/home. Next, click "Create Bucket", give your bucket a name, and click the "Create" button. Select your new S3 bucket in the left-hand pane and upload the WordCount JAR and a sample input file whose words will be counted.

Step 3 – Running an Elastic MapReduce Job
Now that the JAR is uploaded to S3, all we need to do is create a new job flow. Let's execute the steps below. (For details on each step, I encourage readers to check out How to Create a Job Flow Using a Custom JAR.)
Sign in to the AWS Management Console and open the Amazon Elastic MapReduce console at https://console.aws.amazon.com/elasticmapreduce/.
Click Create New Job Flow.
On the DEFINE JOB FLOW page, enter the following details:
a) Job Flow Name = WordCountJob
b) Select Run your own application
c) Select Custom JAR in the drop-down list
d) Click Continue
On the SPECIFY PARAMETERS page, enter the following values and then click Continue:
JAR Location = bucketName/jarFileLocation
JAR Arguments = s3n://bucketName/inputFileLocation s3n://bucketName/outputpath
Note that the output path must be unique each time you execute the job: Hadoop creates a folder with the name specified here and fails if that folder already exists.
After launching the job, wait and monitor it as it runs through the Hadoop flow. You can also look for errors using the Debug button. The job should complete within 10 to 15 minutes (depending on the size of the input). Once it has completed, you can view the results in the S3 Browser panel, or download the files from S3 and analyze the outcome of the job.
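The two JAR arguments become args[0] and args[1] in WordCount.main, and the output path has to change on every run. One way to guarantee that, sketched below under the assumption that a UTC timestamp suffix is acceptable, is to derive the output folder from the launch time. EmrJobArguments and forRun are hypothetical names; the s3n:// scheme and the bucket placeholder follow the steps above.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper that builds the JAR arguments for one job run.
class EmrJobArguments {
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss");

    // Returns {input, output}; the output folder name embeds the run time (in UTC),
    // so repeated runs never collide with an existing folder.
    static String[] forRun(String bucket, String inputKey, ZonedDateTime runTime) {
        String input = "s3n://" + bucket + "/" + inputKey;
        String stamp = runTime.withZoneSameInstant(ZoneOffset.UTC).format(STAMP);
        String output = "s3n://" + bucket + "/output-" + stamp;
        return new String[] {input, output};
    }
}
```

Two runs launched at different seconds always produce different output folders, which satisfies the uniqueness requirement without any manual renaming.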
Amazon Elastic MapReduce Resources
Amazon Elastic MapReduce Documentation: http://aws.amazon.com/documentation/elasticmapreduce/
Amazon Elastic MapReduce Getting Started Guide: http://docs.amazonwebservices.com/ElasticMapReduce/latest/GettingStartedGuide/
Amazon Elastic MapReduce Developer Guide: http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/
Apache Hadoop: http://hadoop.apache.org/
See more at https://muhammadkhojaye.blogspot.com/2012/04/how-to-run-amazon-elastic-mapreduce-job.html
April 23, 2012
by Muhammad Ali Khojaye
· 59,029 Views
Face Detection using HTML5, Javascript, Webrtc, Websockets, Jetty and OpenCV
How to create a real-time face detection system using HTML5, JavaScript, and OpenCV, leveraging WebRTC for webcam access and WebSockets for client-server communication.
April 23, 2012
by Jos Dirksen
· 53,100 Views
Scheduling a Job Using The NCron Library
Introduction
NCron is a .NET scheduling framework: a .NET counterpart of cron, the time-based job scheduler found on Unix-like operating systems, and of Cron4j, a scheduling library for Java. NCron is lightweight and easy to use, with a gentle learning curve. It also has the advantage that you can use it from C#, VB.NET or any other .NET language. It takes your mind off the details of scheduling, so you can focus on implementing the business logic of your application or of the job to be scheduled; details such as threading and timers are taken care of.

NCron Library
You can download the NCron library from http://code.google.com/p/ncron/downloads/detail?name=ncron-2.1.zip. Add a reference to the NCron library in your project so that you can access the classes of the NCron scheduling framework.

Scheduling a Job
When creating a job to be scheduled with NCron, the job is wrapped in a class that must extend NCron.CronJob and override the void method Execute:

```csharp
public class MyJob : NCron.CronJob
{
    public override void Execute()
    {
        // The scheduled work: copy the file, overwriting the copy left by the previous run.
        System.IO.File.Copy(@"c:\output.out", @"f:\output.out", true);
    }
}
```

The work to be scheduled goes in the Execute method. Next, give NCron control over the job execution by calling the static method Bootstrap.Init() at the entry point of your application, for example in the Main method. Bootstrap.Init() is passed a static setup method, which I called JobSetup.
```csharp
using NCron.Fluent.Crontab;
using NCron.Fluent.Generics;
using NCron.Service;

namespace NcronExample
{
    public class Program
    {
        private static void Main(string[] args)
        {
            // Hand control of job execution over to NCron.
            Bootstrap.Init(args, JobSetup);
        }

        private static void JobSetup(SchedulingService schedulingService)
        {
            // Run MyJob every minute.
            schedulingService.At("* * * * *").Run<MyJob>();
        }
    }
}
```

The line inside the JobSetup method specifies when the job is run, and the argument of schedulingService.At() is a crontab expression, which I will discuss shortly. The SchedulingService class has a number of other methods of interest:

```csharp
service.Daily().Run<MyJob>();   // runs the scheduled job once every day
service.Hourly().Run<MyJob>();  // runs the scheduled job once every hour
service.Weekly().Run<MyJob>();  // runs the scheduled job once every week
```

Crontab Expression
A crontab expression is a string of five fields separated by spaces. When parsed, it produces the points in time described by the schedule. NCron parses crontab expressions with NCrontab (Crontab for .NET), an open source library for parsing crontab expressions. A regular crontab expression has the form * * * * *, where the first field is the minute (0-59), the second is the hour (0-23), the third is the day of the month (1-31), the fourth is the month (1-12) and the last is the day of the week (0-6, where 0 represents Sunday). An asterisk (wildcard) left in a field matches all valid values for that field. To run the scheduled job every minute, the expression is:

* * * * *

The expression below runs the job at 9:05 AM every day:

5 9 * * *

To run a job at ten minutes past every hour, Monday to Friday only, the expression is:

10 * * * 1,2,3,4,5

You can read more about crontab expressions at http://code.google.com/p/ncrontab/wiki/CrontabExamples

Deploying the Scheduled Job
After the application has been built, you can deploy the scheduled job as a service: open a command prompt, change to the directory containing the application's executable, and run the command:

ncronexample install

This installs the scheduled job as a service, and that is it!
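The five crontab fields described above can be made concrete with a small matcher that checks whether a given minute satisfies an expression. This is a simplified Java sketch, not NCrontab's implementation: CrontabMatcher is a hypothetical class, it supports only "*", single numbers and comma lists (no ranges or steps), and it requires every field to match, whereas classic cron treats a restricted day-of-month and day-of-week as alternatives.

```java
import java.time.LocalDateTime;

// Simplified five-field crontab matcher: "*", single numbers and comma lists only.
class CrontabMatcher {
    private final String[] fields;

    CrontabMatcher(String expression) {
        fields = expression.trim().split("\\s+");
        if (fields.length != 5) {
            throw new IllegalArgumentException("expected 5 fields: " + expression);
        }
    }

    // True if the given time satisfies all five fields (minute, hour, day, month, weekday).
    boolean matches(LocalDateTime t) {
        // java.time numbers Monday=1 .. Sunday=7; cron numbers Sunday=0 .. Saturday=6.
        int dayOfWeek = t.getDayOfWeek().getValue() % 7;
        return fieldMatches(fields[0], t.getMinute())
            && fieldMatches(fields[1], t.getHour())
            && fieldMatches(fields[2], t.getDayOfMonth())
            && fieldMatches(fields[3], t.getMonthValue())
            && fieldMatches(fields[4], dayOfWeek);
    }

    private static boolean fieldMatches(String field, int value) {
        if (field.equals("*")) {
            return true; // wildcard: every legal value
        }
        for (String part : field.split(",")) {
            if (Integer.parseInt(part) == value) {
                return true;
            }
        }
        return false;
    }
}
```

With this sketch, "5 9 * * *" matches 9:05 on any day but not 21:05, and "10 * * * 1,2,3,4,5" matches 14:10 on Monday 16 April 2012 but not on Saturday 21 April 2012.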
April 18, 2012
by Ayobami Adewole
· 17,476 Views
Quartz Scheduler Misfire Instructions Explained
Sometimes Quartz is not able to run your job at the time you wanted. There are three reasons for that: all worker threads were busy running other jobs (probably with higher priority); the scheduler itself was down; or the job was scheduled with a start time in the past (probably a coding error). You can increase the number of worker threads by customizing org.quartz.threadPool.threadCount in quartz.properties (the default is 10). But you cannot really do anything when the whole application/server/scheduler was down. The situation in which Quartz was incapable of firing a given trigger is called a misfire. Do you know what Quartz does when it happens? It turns out there are various strategies (called misfire instructions) Quartz can take, and there are also some defaults if you haven't thought about it. But to make your application robust and predictable (especially under heavy load or during maintenance), you should make sure your triggers and jobs are configured consciously. The available misfire instructions depend on the trigger chosen. Quartz also behaves differently depending on the trigger setup (the so-called smart policy). Although the misfire instructions are described in the documentation, I found it hard to understand what they really mean, so I created this small summary article. Before I dive into the details, there is one more configuration option to describe: org.quartz.jobStore.misfireThreshold (in milliseconds), defaulting to 60000 (a minute). It defines how late a trigger must be before it is considered misfired. With the default setup, if a trigger was supposed to fire 30 seconds ago, Quartz will happily just run it; such a delay is not considered a misfire. However, if the trigger is discovered 61 seconds after its scheduled time, the special misfire handler thread takes care of it, obeying the misfire instruction.
For test purposes we will set this parameter to 1000 (1 second) so that we can test misfiring quickly.

Simple trigger without repeating
In our first example we will see how misfiring is handled by simple triggers scheduled to run only once:

```scala
val trigger = newTrigger().
  startAt(DateUtils.addSeconds(new Date(), -10)).
  build()
```

The same trigger, but with an explicitly set misfire instruction handler:

```scala
val trigger = newTrigger().
  startAt(DateUtils.addSeconds(new Date(), -10)).
  withSchedule(
    simpleSchedule().
      withMisfireHandlingInstructionFireNow()  //MISFIRE_INSTRUCTION_FIRE_NOW
  ).
  build()
```

For testing purposes I am simply scheduling the trigger to fire 10 seconds in the past (so it is already 10 seconds late when it is created!). In the real world you would normally never schedule triggers like that. Instead, imagine the trigger was set correctly, but by the time it was due the scheduler was down or didn't have any free worker threads. So how will Quartz handle this extraordinary situation? In the first code snippet no misfire handling instruction is set (the so-called smart policy is used in that case). The second code snippet explicitly defines what behaviour we expect when a misfire occurs. The instructions and their meaning:

smart policy (default): see withMisfireHandlingInstructionFireNow.
withMisfireHandlingInstructionFireNow (MISFIRE_INSTRUCTION_FIRE_NOW): the job is executed immediately after the scheduler discovers the misfire. This is the smart policy. Example scenario: you have scheduled some system clean-up at 2 AM. Unfortunately the application was down for maintenance at that time and was brought back at 3 AM. So the trigger misfired, and the scheduler tries to save the situation by running it as soon as it can: at 3 AM.
withMisfireHandlingInstructionIgnoreMisfires (MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY; see QTZ-283): see withMisfireHandlingInstructionFireNow.
withMisfireHandlingInstructionNextWithExistingCount (MISFIRE_INSTRUCTION_RESCHEDULE_NEXT_WITH_EXISTING_COUNT): see withMisfireHandlingInstructionNextWithRemainingCount.
withMisfireHandlingInstructionNextWithRemainingCount (MISFIRE_INSTRUCTION_RESCHEDULE_NEXT_WITH_REMAINING_COUNT): does nothing; the misfired execution is ignored and there is no next execution. Use this instruction when you want to completely discard the misfired execution. Example scenario: the trigger was supposed to start recording a TV program. There is no point in starting the recording when the trigger misfired and is already 2 hours late.
withMisfireHandlingInstructionNowWithExistingCount (MISFIRE_INSTRUCTION_RESCHEDULE_NOW_WITH_EXISTING_REPEAT_COUNT): see withMisfireHandlingInstructionFireNow.
withMisfireHandlingInstructionNowWithRemainingCount (MISFIRE_INSTRUCTION_RESCHEDULE_NOW_WITH_REMAINING_REPEAT_COUNT): see withMisfireHandlingInstructionFireNow.

Simple trigger repeating a fixed number of times
This scenario is much more complicated. Imagine we have scheduled a job to repeat a fixed number of times:

```scala
val trigger = newTrigger().
  startAt(dateOf(9, 0, 0)).
  withSchedule(
    simpleSchedule().
      withRepeatCount(7).
      withIntervalInHours(1).
      withMisfireHandlingInstructionFireNow()  //or other
  ).
  build()
```

In this example the trigger is supposed to fire 8 times (the first execution plus 7 repetitions), every hour, beginning at 9 AM today (startAt(dateOf(9, 0, 0))). Thus the last execution should occur at 4 PM. However, assume that for some reason the scheduler was not able to run jobs at 9 and 10 AM and discovered that fact at 10:15 AM, i.e. 2 firings misfired. How will the scheduler behave in this situation?
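Before looking at the instructions, the planned schedule and the firings that count as misfired at 10:15 can be worked out in a few lines. This Java sketch is only a model of the scenario, not Quartz code: FixedCountSchedule is a hypothetical class, the threshold parameter plays the role of org.quartz.jobStore.misfireThreshold, and treating "later than the threshold" (strictly greater) as a misfire is an assumption about the boundary.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a simple trigger with a fixed repeat count.
class FixedCountSchedule {
    // All planned fire times: the first execution plus repeatCount repetitions.
    static List<LocalDateTime> plannedFireTimes(LocalDateTime start, int repeatCount, Duration interval) {
        List<LocalDateTime> times = new ArrayList<>();
        for (int i = 0; i <= repeatCount; i++) {
            times.add(start.plus(interval.multipliedBy(i)));
        }
        return times;
    }

    // Planned firings that are more than `threshold` late when the scheduler checks at `now`.
    static List<LocalDateTime> misfired(List<LocalDateTime> planned, LocalDateTime now, Duration threshold) {
        List<LocalDateTime> result = new ArrayList<>();
        for (LocalDateTime t : planned) {
            // Future firings give a negative duration and are never counted.
            if (Duration.between(t, now).compareTo(threshold) > 0) {
                result.add(t);
            }
        }
        return result;
    }
}
```

For a 9:00 start, 7 repetitions and a 1-hour interval, the sketch yields 8 planned firings ending at 16:00; checked at 10:15 with a 1-second threshold, the 9:00 and 10:00 firings are reported as misfired.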
smart policy (default): see withMisfireHandlingInstructionNowWithExistingCount.
withMisfireHandlingInstructionFireNow (MISFIRE_INSTRUCTION_FIRE_NOW): see withMisfireHandlingInstructionNowWithRemainingCount.
withMisfireHandlingInstructionIgnoreMisfires (MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY; see QTZ-283): fires all triggers that were missed as soon as possible and then goes back to the ordinary schedule. Example scenario: with this strategy the scheduler will fire the jobs scheduled at 9 and 10 AM immediately. Then it will wait until 11 AM and go back to the ordinary schedule. Note: when handling misfires it is equally important to realize that the actual job execution time might be well after the scheduled time. This means you cannot simply rely on the current system date; instead, use JobExecutionContext.getScheduledFireTime():

```scala
def execute(context: JobExecutionContext): Unit = {
  val date = context.getScheduledFireTime
  //...
}
```

withMisfireHandlingInstructionNextWithExistingCount (MISFIRE_INSTRUCTION_RESCHEDULE_NEXT_WITH_EXISTING_COUNT): the scheduler won't do anything immediately. Instead it will wait for the next scheduled time and run all triggers at the scheduled interval. See also withMisfireHandlingInstructionNextWithRemainingCount. Example scenario: at 10:15 the scheduler discovers 2 misfired executions. It waits until the next scheduled time (11 AM) and fires all 8 scheduled executions every hour, stopping at 6 PM (the trigger should have stopped at 4 PM).
withMisfireHandlingInstructionNextWithRemainingCount (MISFIRE_INSTRUCTION_RESCHEDULE_NEXT_WITH_REMAINING_COUNT): the scheduler discards misfired executions and waits for the next scheduled time. The total number of trigger executions will be less than configured. Example scenario: at 10:15 the two misfired executions are discarded. The scheduler waits for the next scheduled time (11 AM) and fires the remaining triggers up to 4 PM. Effectively it behaves as if the misfire never occurred.
withMisfireHandlingInstructionNowWithExistingCount (MISFIRE_INSTRUCTION_RESCHEDULE_NOW_WITH_EXISTING_REPEAT_COUNT): the first misfired trigger is executed immediately. Then the scheduler waits the desired interval and executes all remaining triggers. Effectively the first fire time of the misfired trigger is moved to the current time with no other changes. Example scenario: at 10:15 the scheduler runs the first misfired execution. Then it waits 1 hour and fires the second one at 11:15 AM. All 8 executions are performed, the last one at 5:15 PM.
withMisfireHandlingInstructionNowWithRemainingCount (MISFIRE_INSTRUCTION_RESCHEDULE_NOW_WITH_REMAINING_REPEAT_COUNT): the first misfired execution runs immediately. The remaining misfired executions are discarded. Triggers that did not misfire are executed at the desired interval. Example scenario: at 10:15 the scheduler runs the first misfired execution (from 9 AM). It discards the remaining misfired execution (the one from 10 AM) and waits 1 hour to execute six more triggers: 11:15, 12:15, … 4:15 PM.

Simple trigger repeating infinitely
In this scenario the trigger repeats an infinite number of times at a given interval:

```scala
val trigger = newTrigger().
  startAt(dateOf(9, 0, 0)).
  withSchedule(
    simpleSchedule().
      withRepeatCount(SimpleTrigger.REPEAT_INDEFINITELY).
      withIntervalInHours(1).
      withMisfireHandlingInstructionFireNow()  //or other
  ).
  build()
```

Once again the trigger should fire every hour, beginning at 9 AM today (startAt(dateOf(9, 0, 0))). However, the scheduler was not able to run jobs at 9 and 10 AM and discovered that fact at 10:15 AM, i.e. 2 firings misfired. This is a more general situation than a simple trigger running a fixed number of times.
smart policy (default): see withMisfireHandlingInstructionNextWithRemainingCount.
withMisfireHandlingInstructionFireNow (MISFIRE_INSTRUCTION_FIRE_NOW): see withMisfireHandlingInstructionNowWithRemainingCount.
withMisfireHandlingInstructionIgnoreMisfires (MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY; see QTZ-283): the scheduler will immediately run all misfired triggers, then continue on schedule. Example scenario: the triggers scheduled at 9 and 10 AM are executed immediately. Future invocations (the next one at 11 AM) are executed according to plan.
withMisfireHandlingInstructionNextWithExistingCount (MISFIRE_INSTRUCTION_RESCHEDULE_NEXT_WITH_EXISTING_COUNT): see withMisfireHandlingInstructionNextWithRemainingCount.
withMisfireHandlingInstructionNextWithRemainingCount (MISFIRE_INSTRUCTION_RESCHEDULE_NEXT_WITH_REMAINING_COUNT): does nothing; misfired executions are discarded. The scheduler then waits for the next scheduled interval and goes back to the schedule. Example scenario: the misfired executions at 9 and 10 AM are discarded. The first execution occurs at 11 AM.
withMisfireHandlingInstructionNowWithExistingCount (MISFIRE_INSTRUCTION_RESCHEDULE_NOW_WITH_EXISTING_REPEAT_COUNT): see withMisfireHandlingInstructionNowWithRemainingCount.
withMisfireHandlingInstructionNowWithRemainingCount (MISFIRE_INSTRUCTION_RESCHEDULE_NOW_WITH_REMAINING_REPEAT_COUNT): the first misfired execution is run immediately and the remaining ones are discarded. The next execution happens after the desired interval. Effectively the first execution time is moved to the current time. Example scenario: the scheduler fires the misfired trigger immediately at 10:15 AM, then waits an hour, runs the next execution at 11:15 AM and continues at 1-hour intervals.

CRON triggers
CRON triggers are the most popular ones among Quartz users. However, there are also two other available triggers: DailyTimeIntervalTrigger (e.g. fire every 25 minutes) and CalendarIntervalTrigger (e.g. fire every 5 months). They support triggering policies that are not possible with either CRON or simple triggers. However, they understand the same misfire handling instructions as the CRON trigger.

```scala
val trigger = newTrigger().
  withSchedule(
    cronSchedule("0 0 9-17 ? * MON-FRI").
      withMisfireHandlingInstructionFireAndProceed()  //or other
  ).
  build()
```

In this example the trigger should fire every hour between 9 AM and 5 PM, Monday to Friday. But once again the first two invocations were missed (so the trigger misfired), and this situation was discovered at 10:15 AM. Note that the available misfire instructions differ from those of simple triggers:

smart policy (default): see withMisfireHandlingInstructionFireAndProceed.
withMisfireHandlingInstructionIgnoreMisfires (MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY; see QTZ-283): all misfired executions are immediately executed, then the trigger runs back on schedule. Example scenario: the executions scheduled at 9 and 10 AM are executed immediately. The next scheduled execution (at 11 AM) runs on time.
withMisfireHandlingInstructionFireAndProceed (MISFIRE_INSTRUCTION_FIRE_ONCE_NOW): immediately executes the first misfired execution and discards the others (i.e. all misfired executions are merged together), then goes back to the schedule. No matter how many trigger executions were missed, only a single immediate execution is performed. Example scenario: the executions scheduled at 9 and 10 AM are merged and executed only once (in other words, the execution scheduled at 10 AM is discarded). The next scheduled execution (at 11 AM) runs on time.
withMisfireHandlingInstructionDoNothing (MISFIRE_INSTRUCTION_DO_NOTHING): all misfired executions are discarded; the scheduler simply waits for the next scheduled time. Example scenario: the executions scheduled at 9 and 10 AM are discarded, so basically nothing happens. The next scheduled execution (at 11 AM) runs on time.
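The three CRON instructions above can be compared side by side by computing the actual execution times for the 9 AM to 5 PM example. This Java sketch models the descriptions in the list, not Quartz's implementation: CronMisfireModel is a hypothetical class, it ignores misfireThreshold, and it treats every planned firing before the discovery time as misfired.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the CRON misfire instructions for a single day.
class CronMisfireModel {
    enum Policy { IGNORE_MISFIRES, FIRE_AND_PROCEED, DO_NOTHING }

    // Returns the actual execution times, given the planned firings and the moment
    // at which the scheduler discovers the misfires.
    static List<LocalTime> executions(List<LocalTime> planned, LocalTime discoveredAt, Policy policy) {
        List<LocalTime> result = new ArrayList<>();
        int missed = 0;
        for (LocalTime t : planned) {
            if (t.isBefore(discoveredAt)) {
                missed++;
            }
        }
        if (missed > 0) {
            switch (policy) {
                case IGNORE_MISFIRES: // every misfired execution runs immediately
                    for (int i = 0; i < missed; i++) {
                        result.add(discoveredAt);
                    }
                    break;
                case FIRE_AND_PROCEED: // all misfired executions merged into one
                    result.add(discoveredAt);
                    break;
                case DO_NOTHING: // misfired executions are discarded
                    break;
            }
        }
        // Back on schedule: every firing from the discovery time onwards runs on time.
        for (LocalTime t : planned) {
            if (!t.isBefore(discoveredAt)) {
                result.add(t);
            }
        }
        return result;
    }

    // Planned firings on the hour, e.g. hourly(9, 17) for "0 0 9-17 ? * MON-FRI" on a weekday.
    static List<LocalTime> hourly(int fromHour, int toHour) {
        List<LocalTime> times = new ArrayList<>();
        for (int h = fromHour; h <= toHour; h++) {
            times.add(LocalTime.of(h, 0));
        }
        return times;
    }
}
```

With misfires discovered at 10:15, the model gives 9 executions for IGNORE_MISFIRES (two at 10:15, then 11:00 to 17:00), 8 for FIRE_AND_PROCEED and 7 for DO_NOTHING, matching the example scenarios in the list.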
QTZ-283 note: QTZ-283 (MISFIRE_INSTRUCTION_IGNORE_MISFIRE_POLICY not working with JDBCJobStore): apparently there is a bug when JDBCJobStore is used, so keep an eye on that issue.

As you can see, the various triggers behave differently depending on their actual setup. Moreover, even though the so-called smart policy is provided, the decision is often based on business requirements. Essentially there are three major strategies: ignore, run immediately and continue, and discard and wait for the next firing. They all have different use cases:

Use ignore policies when you want to make sure all scheduled executions are triggered, even if it means multiple misfired triggers will fire. Think about a job that generates a report every hour based on the orders placed during the last hour. If the server was down for 8 hours, you still want those reports generated, as soon as you can. In this case the ignore policies will simply run all triggers scheduled during those 8 hours as fast as the scheduler can. They will be several hours late, but they will eventually be executed.

Use now* policies when jobs execute periodically and, upon a misfire, should run as soon as possible, but only once. Think of a job that cleans the /tmp directory every minute. If the scheduler was busy for 20 minutes and can finally run this job, you don't want to run it 20 times! Once is enough, but make sure it runs as soon as it can. Then it is back to your normal one-minute intervals.

Finally, next* policies are good when you want to make sure your job runs at particular points in time. For example, you need to fetch stock prices at quarter past every hour. They change rapidly, so if your job misfired and it is already 20 minutes past the hour, don't bother. You missed the correct time by 5 minutes and now you don't really care. It is better to have a gap than an inaccurate value. In this case Quartz will skip all misfired executions and simply wait for the next one.
April 13, 2012
by Tomasz Nurkiewicz
· 109,074 Views · 13 Likes
Configuring Quartz With JDBCJobStore in Spring
I am starting a little series about Quartz scheduler internals, tips and tricks; this is chapter 0: how to configure a persistent job store.
April 7, 2012
by Tomasz Nurkiewicz
· 37,704 Views
The Two Hand Rule for Meandering Stand Ups
When working with agile teams, the daily stand-up meeting provides a heartbeat to the day and an opportunity for team members to share information. Stand-up meetings work best when they are short and balance the inputs across all the people in the team. A common problem is stand-ups that start running too long.

When two hands are raised, it's a signal to move the stand-up meeting on.

The Two Hands Rule
Sometimes the conversations at stand-up can get too detailed or go on too long. For these situations we’ve introduced the “two hands” rule: if anyone thinks the current conversation has gone off topic, or is no longer effective, they raise a hand. Once a second person raises a hand, that’s a sign to stop the conversation and continue with the rest of the stand-up. Those speaking can continue the conversation after the stand-up has finished.

This approach makes it easy for people to share their view on the effectiveness of the conversation in a way that reduces the risk of causing offence. It also provides a way for the team to detect and correct its own behaviour.

I introduced this idea recently to a team who agreed to give it a try. In a stand-up a few days later, I was talking with a team member and didn’t realise that our conversation had become too detailed and gone on too long. I missed seeing that two other team members had put their hands up. It wasn’t until one of them spoke up that I noticed! This is one of the characteristics of difficult conversations: we often become blind to signs, easily spotted by others, that the conversation has become ineffective. By agreeing with the team to use the “two hands” rule, they helped me detect when they thought I’d become ineffective.

The technique can have some downsides, though. It can feel direct or confrontational, especially when people first experience it. It’s important to discuss any issues after the stand-up and to consider reviewing the practice in a retrospective.

I’d like to hear your thoughts. Have you had stand-up meetings that have taken too long? What approaches have you used? If you’ve tried something like the “two hands” rule, how did it go?
March 6, 2012
by Benjamin Mitchell
· 8,922 Views · 1 Like
Why Having "DevOps" in a Job Title Makes Sense
We’ve been trying to grow our team for a few months now, and the title we’re hiring for is Devops Engineer. One of the candidates our recruiters reached out to, let’s call him John, came back to us with a bunch of questions, including: How do you feel about hiring someone with a Devops title? It’s a very legitimate question: Devops is a cultural and professional movement, so how could it be a job title? What I argued in my reply to this fella is that Devops isn’t the job title, Devops Engineer is, and in this sense Devops is just a qualifier, and I strongly believe a very useful one. I really sympathise with those who are fighting hard to keep Devops real and to avoid the fate that some refer to as the sad commercialisation of Agile. My case for making Devops part of a job title isn’t a campaign to come up with a set of bullet points that define Devops as a job so that I can put it on a resume or build it into a product. My argument is that I want John, the guy I’m trying to hire, to be a certain kind of guy, and the best way I have to describe what I want is Devops Engineer. I’m looking for an operations guy, but I want him to be open to developers, to consider engineering and the company as a whole, to be focused on delivering value, and not to rathole into fights about technology or claim root access purely on principle. I want that guy to have great communication skills and the interest to explore what lies beyond his infrastructure, and to want to borrow as much good as he can find in other disciplines across the organisation. And then of course there is the practical part: the desire to automate and escape a boring manual routine, the familiarity with cloud that, willingly or not, has powered the movement, and even more specific things like configuration management. You may argue that this is just a good engineer, or what systems engineers are becoming; in other words, nothing new under the sun.
And you may be right, but job titles are in many ways just another way to communicate, to broadcast an intent and a need. So you know what I told John about hiring Devops Engineers? That I felt pretty damn proud about it. The true ones, not the ones slapping it on their CV to get a job, are fantastic engineers, and I can’t help but encourage them to start responding to that qualifier. Likewise, the companies and individuals seeking them out are likely the ones building great groups those people will want to be members of. Yes, the moment it becomes a keyword recruiters start to match against, we’re likely to see a surge of fakes trying to land a job, but that’s nothing new under the sun.
Signed, a Devops manager
Source: http://www.spikelab.org/devops-job-title/
March 5, 2012
by Spike Morelli
· 10,695 Views
