It's commonly said that everyone does Agile differently. In my experience, it's also common to do basically whatever you want and call it Agile. It can be useful to occasionally reset and examine what canonical Agile recommends.
It’s a common pattern for command line tools to have multiple subcommands that run off of a single executable. For example, git fetch origin and git commit --amend both use the same executable /usr/bin/git to run. Each subcommand has its own set of required and optional parameters. This pattern is fairly easy to implement in your own Python command-line utilities using argparse. Here is a script that pretends to be git and provides the above two commands and arguments. #!/usr/bin/env python import argparse import sys class FakeGit(object): def __init__(self): parser = argparse.ArgumentParser( description='Pretends to be git', usage='''git [] The most commonly used git commands are: commit Record changes to the repository fetch Download objects and refs from another repository ''') parser.add_argument('command', help='Subcommand to run') # parse_args defaults to [1:] for args, but you need to # exclude the rest of the args too, or validation will fail args = parser.parse_args(sys.argv[1:2]) if not hasattr(self, args.command): print 'Unrecognized command' parser.print_help() exit(1) # use dispatch pattern to invoke method with same name getattr(self, args.command)() def commit(self): parser = argparse.ArgumentParser( description='Record changes to the repository') # prefixing the argument with -- means it's optional parser.add_argument('--amend', action='store_true') # now that we're inside a subcommand, ignore the first # TWO argvs, ie the command (git) and the subcommand (commit) args = parser.parse_args(sys.argv[2:]) print 'Running git commit, amend=%s' % args.amend def fetch(self): parser = argparse.ArgumentParser( description='Download objects and refs from another repository') # NOT prefixing the argument with -- means it's not optional parser.add_argument('repository') args = parser.parse_args(sys.argv[2:]) print 'Running git fetch, repository=%s' % args.repository if __name__ == '__main__': FakeGit() The argparse library gives you all kinds of great stuff. You can run ./git.py --help and get the following: usage: git [] The most commonly used git commands are: commit Record changes to the repository fetch Download objects and refs from another repository Pretends to be git positional arguments: command Subcommand to run optional arguments: -h, --help show this help message and exit You can get help on a particular subcommand with ./git.py commit --help. usage: git.py [-h] [--amend] Record changes to the repository optional arguments: -h, --help show this help message and exit --amend Want bash completion on your awesome new command line utlity? Try argcomplete, a drop in bash completion for Python + argparse.
One of the great things about git is how fast it is. You can create a new branch, or switch to another branch, almost as fast as you can type the command. This tends to lower the impedance of branching. As a result, many individuals and teams will naturally converge on a process where they create many, many branches. If you’re like me, you may have 30 branches at any given time. This can make viewing all the branches unwieldy. Once I week or so, I would go on a branch deletion spree by manually copying and pasting multiple branch names into a git branch -D statement. The basic use case is that you want to delete any branches that are already merged into master. Here is a python script that automated just that. from subprocess import check_output import sys def get_merged_branches(): ''' a list of merged branches, not couting the current branch or master ''' raw_results = check_output('git branch --merged upstream/master', shell=True) return [b.strip() for b in raw_results.split('\n') if b.strip() and not b.startswith('*') and b.strip() != 'master'] def delete_branch(branch): return check_output('git branch -D %s' % branch, shell=True).strip() if __name__ == '__main__': dry_run = '--confirm' not in sys.argv for branch in get_merged_branches(): if dry_run: print branch else: print delete_branch(branch) if dry_run: print '*****************************************************************' print 'Did not actually delete anything yet, pass in --confirm to delete' print '*****************************************************************' To print the branches that would be deleted, just execute python delete_merged_branches.py. To actually delete the branches, execute python delete_merged_branches.py --confirm.
"this is my rifle. there are many like it, but this one is mine." - rifleman’s creed there are a thousand ways to grep over files. most developers i have observed keep a separate command line open just for searching. a few use an ide that has file search built in. personally, i use a couple of vim macros. in vim, you can execute a cross-file search with something like :vimgrep /dostuff()/j ../**/*.c . i don’t know about you, but the first time i saw that syntax my brain simply refused. instead, i have the following in my .vimrc file: " opens search results in a window w/ links and highlight the matches command! -nargs=+ grep execute 'silent grep! -i -r -n --exclude *.{json,pyc} . -e ' | copen | execute 'silent /' " shift-control-* greps for the word under the cursor :nmap g :grep =expand("") the first command is just a simple alias for the above mentioned native grep. like all custom commands, it must start with a capital letter (to differentiate it from native commands). you simply type :grep foobar and it will search in your current directory through all file extensions (except .json and .pyc -- you can add more to the blacklist). it also displays the results in a nice little buffer window, which you can navigate through with normal hjkl keys, and open matches in the main editor window. the second line is a key mapping that will grep for the word currently under the cursor. you can just navigate to a word and hit leader-g to issue the grep command.
Apache Hive is a high level SQL-like interface to Hadoop. It lets you execute mostly unadulterated SQL, like this: CREATE TABLE test_table(key string, stats map); The map column type is the only thing that doesn’t look like vanilla SQL here. Hive can actually use different backends for a given table. Map is used to interface with column oriented backends like HBase. Essentially, because we won’t know ahead of time all the column names that could be in the HBase table, Hive will just return them all as a key/value dictionary. There are then helpers to access individual columns by key, or even pivot the map into one key per logical row. As part of the Hadoop family, Hive is focused on bulk loading and processing. So it’s not a surprise that Hive does not support inserting raw values like the following SQL: INSERT INTO suppliers (supplier_id, supplier_name) VALUES (24553, 'IBM'); However, for unit testing Hive scripts, it would be nice to be able to insert a few records manually. Then you could run your map reduce HQL, and validate the output. Luckily, Hive can load CSV files, so it’s relatively easy to insert a handful or records that way. CREATE TABLE foobar(key string, stats map) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '|' MAP KEYS TERMINATED BY ':' ; LOAD DATA LOCAL INPATH '/tmp/foobar.csv' INTO TABLE foobar; This will load a CSV file with the following data, where c4ca4-0000001-79879483-000000000124 is the key, and comments and likesare columns in a map. c4ca4-0000001-79879483-000000000124,comments:0|likes:0 c4ca4-0000001-79879483-000000000124,comments:0|likes:0 Because I’ve been doing this quite a bit in my unit tests, I wrote a quick Python helper to dump a list of key/map tuples to a temporary CSV file, and then load it into Hive. This uses hiver to talk to Hive over thrift. import hiver from django.core.files.temp import NamedTemporaryFile def _hql(self, hql): client = hiver.connect(settings.HIVE_HOST, settings.HIVE_PORT) try: client.execute(hql) finally: client.shutdown() def insert(self, table_name, rows): ''' cannot insert single rows via hive, need to save to a temp file and bulk load that ''' csv_file = NamedTemporaryFile(delete=True) for row in rows: map_repr = '|'.join('%s:%s' % (key, value) for key, value in row[1].items()) csv_file.write(row[0] + "," + map_repr + "\n") csv_file.flush() try: _hql('DROP TABLE IF EXISTS %s' % table_name) _hql(""" CREATE TABLE %s ( key string, map ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '|' MAP KEYS TERMINATED BY ':' """ % (table_name)) _hql(""" LOAD DATA LOCAL INPATH '%s' INTO TABLE %s """ % (csv_file.name, table_name) finally: csv_file.close() You can call it like this: insert('test_table', [ ('c4ca4-0000001-79879483-000000000124', {'comments': 1, 'likes': 2}), ('c4ca4-0000001-79879483-000000000124', {'comments': 1, 'likes': 2}), ('c4ca4-0000001-79879496-000000000124', {'comments': 1, 'likes': 2}), ('b4aed-0000002-79879783-000000000768', {'comments': 1, 'likes': 2}), ('b4aed-0000002-79879783-000000000768', {'comments': 1, 'likes': 2}), ])
In my last Django project, we had a set of helper functions that we used a lot. The most used was helpers.first, which takes a query set and returns the first element, or None if the query set was empty. Instead of writing this: try: object = MyModel.objects.get(key=value) except model.DoesNotExist: object = None You can write this: def first(query): try: return query.all()[0] except: return None object = helpers.first(MyModel.objects.filter(key=value)) Note, that this is not identical. The get method will ensure that there is exactly one row in the database that matches the query. The helper.first() method will silently eat all but the first matching row. As long as you're aware of that, you might choose to use the second form in some cases, primarily for style reasons. But the syntax on the helper is a little verbose, plus you're constantly including helpers.py. Here is a version that makes this available as a method on the end of your query set chain. All you have to do is have your models inherit from this AbstractModel. class FirstQuerySet(models.query.QuerySet): def first(self): try: return self[0] except: return None class ManagerWithFirstQuery(models.Manager): def get_query_set(self): return FirstQuerySet(self.model) class AbstractModel(models.Model): objects = ManagerWithFirstQuery() class Meta: abstract = True class MyModel(AbstractModel): ... Now, you can do the following. object = MyModel.objects.filter(key=value).first()
Django handles database connections transparently in almost all cases. It will start a new connection when your request starts up, and commit it at the end of the request lifetime. Other times you need to dive in further and do your own granular transaction management. But for the most part, it's fully automatic. However, sometimes your use case may require that you close the current database connection and open a new one. While this is possible in Django, it's not well documented. Why would you want to do this? I my case, I was writing an automation test framework. Some of the automation tests make database calls through the Django ORM to setup records, clean up after the test, etc. Each test is executed in the same process space, via a thread pool. We found that if one of the early tests threw an unrecoverable database error, such as an IntegrityError due to violating a unique constraint, the database connection would be aborted. Subsequent tests that tried to use the database would raise a DatabaseError: Traceback (most recent call last): File /home/user/project/app/test.py, line 73, in tearDown MyModel.objects.all() File /usr/local/lib/python2.6/dist-packages/django/db/models/query.py, line 444, in delete collector.collect(del_query) File /usr/local/lib/python2.6/dist-packages/django/db/models/deletion.py, line 146, in collect reverse_dependency=reverse_dependency) File /usr/local/lib/python2.6/dist-packages/django/db/models/deletion.py, line 91, in add if not objs: File /usr/local/lib/python2.6/dist-packages/django/db/models/query.py, line 113, in __nonzero__ iter(self).next() File /usr/local/lib/python2.6/dist-packages/django/db/models/query.py, line 107, in _result_iter self._fill_cache() File /usr/local/lib/python2.6/dist-packages/django/db/models/query.py, line 772, in _fill_cache self._result_cache.append(self._iter.next()) File /usr/local/lib/python2.6/dist-packages/django/db/models/query.py, line 273, in iterator for row in compiler.results_iter(): File /usr/local/lib/python2.6/dist-packages/django/db/models/sql/compiler.py, line 680, in results_iter for rows in self.execute_sql(MULTI): File /usr/local/lib/python2.6/dist-packages/django/db/models/sql/compiler.py, line 735, in execute_sql cursor.execute(sql, params) File /usr/local/lib/python2.6/dist-packages/django/db/backends/postgresql_psycopg2/base.py, line 44, in execute return self.cursor.execute(query, args) DatabaseError: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. It turns out that it's relatively easy to reset the database connection. We just called the following function at the start of every test. Django is smart enough to re-initialize the connection the next time it's used, assuming that it's disconnected properly. def reset_database_connection(): from django import db db.close_connection()
In my Django applications, I tend to use custom middleware extensively for common tasks. I have middleware that logs page runtime, middleware that sets context that most views will end up needing anyway, and middleware that copies the HTTP_REFERRER header from an entry page into the session scope for use later in the session. At some point, I inadvertently created a middleware class invalidated the browser cache for certain views. Typically, just wrapping a view in @cache_control(max_age=3600) is enough to have the browser cache that view for an hour. But if you do something innocuous like evaluate request.user.is_authenticated() in a middleware class, then Django will set the Vary: Cookie header, invalidating the cache. In my case, what I really wanted was a decorator that I could attach to a view that would skip my custom middleware, like an exclude list. Of course, you could just attach your middleware explicitly to each view that needs it, but that's needless code repetition if a middleware should wrap almost all views. You could also change each of your middleware classes to exclude particular views by URL, but you might end up having to alter many different middleware classes with that logic. As another option, you can use the following decorator/middleware pair to short-circuit the middleware execution of any view, for any middleware defined in your settings file AFTER this one. """ Allows short-curcuiting of ALL remaining middleware by attaching the @shortcircuitmiddleware decorator as the TOP LEVEL decorator of a view. Example settings.py: MIDDLEWARE_CLASSES = ( 'django.middleware.common.CommonMiddleware', 'django.contrib.sessions.middleware.SessionMiddleware', 'django.middleware.csrf.CsrfViewMiddleware', 'django.contrib.auth.middleware.AuthenticationMiddleware', 'django.contrib.messages.middleware.MessageMiddleware', # THIS MIDDLEWARE 'myapp.middleware.shortcircuit.ShortCircuitMiddleware', # SOME OTHER MIDDLE WARE YOU WANT TO SKIP SOMETIMES 'myapp.middleware.package.MostOfTheTimeMiddleware', # MORE MIDDLEWARE YOU WANT TO SKIP SOMETIMES HERE ) Example view to exclude from MostOfTheTimeMiddleware (and any subsequent): @shortcircuitmiddleware def myview(request): ... """ def shortcircuitmiddleware(f): """ view decorator, the sole purpose to is 'rename' the function '_shortcircuitmiddleware' """ def _shortcircuitmiddleware(*args, **kwargs): return f(*args, **kwargs) return _shortcircuitmiddleware class ShortCircuitMiddleware(object): """ Middleware; looks for a view function named '_shortcircuitmiddleware' and short-circuits. Relies on the fact that if you return an HttpResponse from a view, it will short-circuit other middleware, see: https://docs.djangoproject.com/en/dev/topics/http/middleware/#process-request """ def process_view(self, request, view_func, view_args, view_kwargs): if view_func.func_name == "_shortcircuitmiddleware": return view_func(request, *view_args, **view_kwargs) return None Source: http://bitkickers.blogspot.com/2011/08/django-exclude-some-views-from.html