Merge.py

Snippets Manager · Jan. 05, 07 · Code Snippet
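Merge a number of text files, removing duplicate lines and sorting the result.

Usage: merge.py [-e charset] filenames destination
Example: merge.py folder/*.log list.txt merged.txt

The last argument is the destination file; every argument before it is a source file or a wildcard pattern. The optional -e sets the character encoding used for reading and writing (default: utf-8).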
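As a rough sketch of how a command line of this shape can be split with the standard getopt module: the last argument is treated as the destination, and everything before it is parsed for an optional -e charset followed by the source names. The helper name parse_args is hypothetical and exists only for this illustration.

```python
from getopt import getopt


def parse_args(args):
    """Split '[-e charset] filenames destination' into its three parts.

    parse_args is a hypothetical helper; args excludes the program name.
    """
    # Options and source names come before the final (destination) argument.
    options, sources = getopt(args[:-1], 'e:')
    charset = dict(options).get('-e', 'utf-8')  # default when -e is absent
    return charset, sources, args[-1]


print(parse_args(['-e', 'latin-1', 'a.log', 'list.txt', 'merged.txt']))
print(parse_args(['folder/*.log', 'merged.txt']))
```

Wildcard patterns such as folder/*.log come back unexpanded here; expanding them is a separate step.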

from getopt import getopt
from glob import glob
from sys import argv, exit


def main():
    if len(argv) < 3:
        exit('usage: %s [-e charset] filenames destination' % argv[0])
    # Everything except the last argument is options plus source names.
    options, patterns = getopt(argv[1:-1], 'e:')
    destination = argv[-1]
    charset = dict(options).get('-e', 'utf-8')
    # Expand wildcards here for shells that do not expand them.
    filenames = set()
    for name in patterns:
        if '*' in name:
            filenames.update(glob(name))
        else:
            filenames.add(name)
    # Collect the lines of every source; the set drops duplicates.
    lines = set()
    for name in filenames:
        with open(name, encoding=charset) as source:
            lines.update(line.rstrip('\n') for line in source)
    result = sorted(lines)
    # Text mode translates '\n' to the platform line separator on write.
    with open(destination, 'w', encoding=charset) as target:
        target.write('\n'.join(result))
    print('%s = %s (%d lines)' % (' + '.join(sorted(filenames)), destination, len(result)))


if __name__ == '__main__':
    main()
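The core of the script (read every source, drop duplicate lines, sort, write) can also be tried in isolation. The following is a minimal sketch for Python 3, not the author's code: the function name merge_files and the sample file names and contents are made up for the demonstration.

```python
import os
import tempfile


def merge_files(sources, destination, charset='utf-8'):
    """Write the sorted, de-duplicated lines of sources to destination.

    merge_files is a hypothetical name used only for this sketch.
    """
    unique = set()
    for name in sources:
        with open(name, encoding=charset) as handle:
            # Strip the trailing newline so identical lines compare equal.
            unique.update(line.rstrip('\n') for line in handle)
    merged = sorted(unique)
    with open(destination, 'w', encoding=charset) as handle:
        handle.write('\n'.join(merged))
    return merged


# Demo on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    first = os.path.join(folder, 'a.log')
    second = os.path.join(folder, 'b.log')
    target = os.path.join(folder, 'merged.txt')
    with open(first, 'w', encoding='utf-8') as handle:
        handle.write('pear\napple\n')
    with open(second, 'w', encoding='utf-8') as handle:
        handle.write('apple\nfig\n')
    demo = merge_files([first, second], target)
    print(demo)  # ['apple', 'fig', 'pear']
```

Holding all lines in a set keeps the code short, but it loads every source into memory at once, so this approach suits logs and word lists of moderate size.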
