An Introduction to Python Sets

Learn how to create sets with brace notation, set comprehensions, and the set() constructor, and how to use the most common set operations.

Python supports sets: collections of unique elements that provide operations for computing unions, intersections, and differences.

Introduction

A set is a collection of unique elements. A common use is to eliminate duplicate elements from a list. In addition, it supports set operations such as union, intersection, and difference.

Creating a Set

You can create a set with brace notation, with a set comprehension, or with the set() constructor.

Brace Construction

Creating a set looks similar to creating a dictionary; you enclose a bunch of items within braces.

s = {1, 2, 3, 3, 4}
print s
# prints
set([1, 2, 3, 4])

Notice that the set contains unique elements only even though we put duplicates into it.
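
One caveat with brace notation is worth a quick check: empty braces create a dictionary, not a set, so an empty set has to come from set(). The sketch below uses illustrative variable names and is written to run on either Python 2.7 or Python 3.

```python
empty_braces = {}    # this is an empty dict, not a set
empty_set = set()    # this is an empty set

print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set
print(len(empty_set))               # 0
```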

A set need not contain elements of the same type. You can mix and match element types as you like.

s = {1, 2, 'hello', 4, 'world'}
print s
# prints
set(['world', 2, 4, 'hello', 1])

Set Comprehension

Similar to dictionaries and lists, you can use set comprehension as in the following example of a set of squares.

a = {x*x for x in xrange(10)}
print a
# prints
set([0, 1, 4, 81, 64, 9, 16, 49, 25, 36])
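
A set comprehension can also carry a condition, just like a list comprehension. As a small sketch (the word list is made-up sample data), this collects the distinct lengths of the words that have more than three letters:

```python
words = ['apple', 'fig', 'kiwi', 'plum', 'banana', 'pear']  # hypothetical data

# Distinct word lengths, keeping only words longer than three letters.
lengths = {len(w) for w in words if len(w) > 3}

# sorted() is used only to get a predictable display order.
print(sorted(lengths))  # [4, 5, 6]
```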

Using the set() Constructor

Create a set from a list using the set() constructor.

a = [1, 2, 2, 3]
print set(a)
# prints
set([1, 2, 3])
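
If you need a list back after removing duplicates, keep in mind that a set does not remember the original order. The following is a minimal sketch with made-up sample data, not the only way to do it: the first variant accepts any order, while the second uses a set only to remember which elements have been seen.

```python
items = [3, 1, 3, 2, 1, 2, 3]  # hypothetical sample data

# Simple de-duplication: the order of the result is not guaranteed.
unique_any_order = list(set(items))

# Order-preserving de-duplication: track seen elements in a set.
seen = set()
unique_in_order = []
for item in items:
    if item not in seen:
        seen.add(item)
        unique_in_order.append(item)

print(sorted(unique_any_order))  # [1, 2, 3]
print(unique_in_order)           # [3, 1, 2]
```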

How about creating a set of characters comprising a string? This shortcut will work.

print set('abcd')
# prints
set(['a', 'c', 'b', 'd'])

Creating a set of unique random numbers:

import random
a = [random.randint(0, 10) for x in xrange(10)]
print a
print set(a)
# prints
[10, 2, 3, 3, 6, 6, 4, 9, 5, 0]
set([0, 2, 3, 4, 5, 6, 9, 10])

Methods of set

The following sections explain the most commonly used set methods and built-in operations.

Membership Testing

The boolean expressions elem in a and elem not in a test whether elem is a member of the set a.

a = {'apple', 'orange', 'banana', 'melon', 'mango'}
print a
print 'banana' in a
print 'papaya' in a
# prints
set(['melon', 'orange', 'mango', 'banana', 'apple'])
True
False
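
Membership tests are handy for filtering. In this sketch, the stop_words set and the sentence are invented for illustration; the list comprehension keeps only the words that are not in the set.

```python
stop_words = {'a', 'an', 'the', 'of'}  # hypothetical lookup set
sentence = 'the quick brown fox jumps over a lazy dog'

# Keep only the words that are not in the lookup set.
kept = [w for w in sentence.split() if w not in stop_words]

print(kept)
```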

Set Size

You can obtain the size of a set (the number of elements) using the len() function.

a = {'apple', 'orange', 'banana', 'melon', 'mango'}
print a
print 'size of a:', len(a)
# prints
set(['melon', 'orange', 'mango', 'banana', 'apple'])
size of a: 5
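
Comparing the size of a set with the size of the original sequence is a quick way to find out whether the sequence contains duplicates. The helper name has_duplicates below is hypothetical, and the sketch assumes every element is hashable.

```python
def has_duplicates(values):
    """Return True if any element of values occurs more than once."""
    values = list(values)
    # A set drops repeats, so a smaller size means something was repeated.
    return len(set(values)) != len(values)

print(has_duplicates([1, 2, 3]))     # False
print(has_duplicates([1, 2, 2, 3]))  # True
```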

Adding Elements to a Set

Use the add() method to add an element to the set. If the element is not yet in the set, it is added. If it is already present, the set is left unchanged and no error is raised.

a = [random.randint(0, 10) for x in xrange(10)]
print 'list =>', a
s = set(a)
print 'set =>', s
s.add(10)
print 'after add =>', s
# prints
list => [3, 4, 7, 2, 8, 0, 4, 1, 0, 4]
set => set([0, 1, 2, 3, 4, 7, 8])
after add => set([0, 1, 2, 3, 4, 7, 8, 10])

The add() method accepts only a single argument. To add multiple elements, call add() in a loop or use the update() method, which accepts any iterable.
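
As a short sketch of adding several elements, here is a loop over add() next to the update() method, which takes one or more iterables and adds each of their elements. The values are arbitrary.

```python
s = {1, 2, 3}

# Adding several elements with a loop over add().
for value in [3, 4, 5]:
    s.add(value)

# update() accepts one or more iterables and adds every element.
s.update([5, 6], (7, 8))

print(sorted(s))  # [1, 2, 3, 4, 5, 6, 7, 8]
```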

You cannot add a list to a set because a list is mutable and therefore cannot be hashed.

s.add(10)
print 'after add =>', s
s.add([21, 22])
print s
# prints
TypeError                                 Traceback (most recent call last)
 in ()
      5 s.add(10)
      6 print 'after add =>', s
----> 7 s.add([21, 22])
      8 print s

TypeError: unhashable type: 'list'

However, a tuple can be added since it is immutable and hence hashable, provided that all of its own elements are hashable.

s.add((21, 22))
print s
# prints
set([0, 3, 4, 5, 6, 7, (21, 22), 9, 10])
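
If the data you want to store arrives as a list, converting it to a tuple first is one way around the error. The sketch below uses made-up names and also shows that a tuple containing a list is still rejected.

```python
points = set()
pair = [21, 22]            # a list is unhashable, so points.add(pair) would fail

points.add(tuple(pair))    # converting to a tuple makes it hashable
points.add((21, 22))       # an equal tuple is a duplicate and is ignored

print(len(points))         # 1
print((21, 22) in points)  # True

# A tuple that contains a list is still unhashable.
try:
    points.add((1, [2, 3]))
    nested_ok = True
except TypeError as exc:
    nested_ok = False
    print('TypeError: %s' % exc)
```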

Removing Elements from a Set

Remove a single element from a set using remove().

a = [random.randint(0, 10) for x in xrange(10)]
print 'list =>', a
s = set(a)
print 'set =>', s
s.remove(10)
print 'after remove =>', s
# prints
list => [6, 6, 7, 6, 7, 5, 10, 3, 8, 3]
set => set([3, 5, 6, 7, 8, 10])
after remove => set([3, 5, 6, 7, 8])

A KeyError is raised if the element is not in the set. (Because the list is random, running the same code a few more times eventually produces a set without 10 in it.)

# prints
list => [0, 4, 4, 4, 6, 6, 9, 5, 9, 6]
set => set([0, 9, 4, 5, 6])

KeyError                                  Traceback (most recent call last)
 in ()
      3 s = set(a)
      4 print 'set =>', s
----> 5 s.remove(10)
      6 print 'after remove =>', s

KeyError: 10

Need to remove an element from a set without the pesky KeyError? Use discard().

print s
s.remove(0)
s.discard(20)
print s
# prints
set([0, 2, 3, 4, 8, 9])
set([2, 3, 4, 8, 9])
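
To make the difference between the two methods concrete, here is a small side-by-side sketch with arbitrary values. The removed flag is just an illustrative way of recording whether remove() succeeded.

```python
s = {2, 3, 4}

# remove() raises KeyError for a missing element, so guard it if needed.
try:
    s.remove(20)
    removed = True
except KeyError:
    removed = False

# discard() does nothing when the element is missing.
s.discard(20)
s.discard(3)

print(removed)    # False
print(sorted(s))  # [2, 4]
```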

Remove all elements from a set? Use clear().

print s
s.clear()
print s
# prints
set([2, 3, 4, 8, 9])
set([])

Set Operations

Let's now look at the mathematical set operations that Python sets support.

Disjoint Sets

A set is disjoint with another set if the two have no common elements. The method isdisjoint() returns True or False as appropriate.

print set([0, 3, 6]).isdisjoint(set([9, 10, 5, 7]))
# prints True

Another example:

print set([0, 1, 2, 3, 4]).isdisjoint(set([8, 1]))
# prints False
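
The argument to isdisjoint() does not have to be a set; any iterable works. This sketch, with arbitrary values, also checks the result against the equivalent test that the intersection is empty.

```python
a = {0, 3, 6}
other = [9, 10, 5, 7]  # a plain list, not a set

# isdisjoint() accepts any iterable, so no conversion is needed.
print(a.isdisjoint(other))       # True

# Two sets are disjoint exactly when their intersection is empty.
print(len(a & set(other)) == 0)  # True

print(a.isdisjoint([8, 6]))      # False, because 6 is in both
```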

Checking for Subset and Superset

Check whether all elements of a set are contained in another set using the issubset() method. You can also use the boolean form setA <= setB.

The form setA < setB checks whether setA is a proper subset of setB (that is, setB contains every element of setA plus at least one more).

Need to check for a superset? Use issuperset() or setA >= setB. Use setA > setB to check for a proper superset.

a = set([1, 3, 4, 5])
b = set([1, 3, 4, 5])
c = set([1, 3, 4, 5, 6, 7])
print 'a = ', a
print 'b = ', b
print 'c = ', c
print 'a <= b', a <= b
print 'a < b', a < b
print 'issubset', a.issubset(b)
print 'a < c', a < c
# prints
a =  set([1, 3, 4, 5])
b =  set([1, 3, 4, 5])
c =  set([1, 3, 4, 5, 6, 7])
a <= b True
a < b False
issubset True
a < c True
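
The superset checks mirror the subset ones. Here is a brief sketch that reuses the values of a and c from the example above.

```python
a = {1, 3, 4, 5}
c = {1, 3, 4, 5, 6, 7}

print(c.issuperset(a))  # True
print(c >= a)           # True
print(c > a)            # True: c has elements that a lacks
print(a >= a)           # True: every set is a superset of itself
print(a > a)            # False: but never a proper superset of itself
```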

Set Union

Compute the union of two or more sets using the union() method. A new set containing all elements of all sets is returned.

You can also use the pipe operator (|) as shown below.

a = set([1, 2, 3])
b = set([3, 4, 5, 6])
c = set(list('abcd'))
print a.union(b, c)
print a | b | c
# prints
set(['a', 1, 2, 3, 4, 5, 6, 'b', 'c', 'd'])
set(['a', 1, 2, 3, 4, 5, 6, 'b', 'c', 'd'])
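
One practical difference between the two forms: union() accepts any iterable, while the | operator requires both operands to be sets. The sketch below, with arbitrary values, shows both behaviors. The operator_ok flag is only there to record the outcome.

```python
a = {1, 2, 3}

# union() accepts any iterable, such as a list or a string.
combined = a.union([3, 4], 'ab')
print(len(combined))  # 6

# The | operator requires both operands to be sets.
try:
    a | [3, 4]
    operator_ok = True
except TypeError:
    operator_ok = False
print(operator_ok)  # False

# union() returns a new set; the original is unchanged.
print(sorted(a))  # [1, 2, 3]
```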

Set Intersection

How about identifying elements common to two or more sets? Use the intersection() method or the & operator.

print a & b
print a & b & c
# prints
set([3])
set([])
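
The method form is convenient when the inputs are not sets yet. As a sketch with made-up data, this finds the values common to several lists by converting only the first one to a set and passing the rest to intersection():

```python
lists = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]  # hypothetical data

# intersection() accepts any number of iterables as arguments.
common = set(lists[0]).intersection(*lists[1:])

print(sorted(common))  # [3, 4]
```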

Set Difference

Set difference returns a new set containing the elements of the first set that are not in any of the other sets. Use the difference() method or the - operator.

print a - b
print a - b - set([2])
# prints
set([1, 2])
set([1])
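
Unlike union and intersection, difference depends on the order of the operands. This short sketch reuses the values of a and b from above and also shows the difference() method, which accepts several iterables at once.

```python
a = {1, 2, 3}
b = {3, 4, 5, 6}

print(sorted(a - b))                 # [1, 2]
print(sorted(b - a))                 # [4, 5, 6]

# difference() removes the elements of every argument from a.
print(sorted(a.difference(b, [2])))  # [1]
```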

Iterating Over Sets

There are several ways of iterating over sets. The most common ones are presented here.

  • A set is an iterable and hence can be used in a for loop for iterating over the elements.
    a = set([random.randint(0, 10) for _ in xrange(10)])
    print a
    for x in a:
      print x
    # prints
    set([0, 2, 5, 7, 8, 9])
    0
    2
    5
    7
    8
    9
  • The built-in enumerate() function also works with sets and yields a tuple of loop index and element. Note that the loop index has no correlation to the set: a set has no concept of ordering, so the index is not an index into the set. It is just a loop counter.
    for i, v in enumerate(a):
      print i, v
    # prints
    0 0
    1 2
    2 5
    3 7
    4 8
    5 9
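
Since a set has no defined order, one common approach when you need predictable output is to iterate over sorted(a) instead of a itself. This is a minimal sketch with arbitrary values, and it assumes the elements can be compared with each other.

```python
a = {7, 0, 9, 2}

# sorted() returns a new list with the elements in ascending order.
ordered = sorted(a)

for x in ordered:
    print(x)

# With a sorted list, the enumerate() index is a position in that list.
for i, v in enumerate(ordered):
    print('%d %d' % (i, v))
```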

Conclusion

And that’s it for now with sets. We learned how to create sets using brace notation, set comprehensions, and the set() constructor. We then covered the most commonly used set methods and operations, and how to iterate over a set.

Published at DZone with permission of Jay Sridhar, DZone MVB. See the original article here.
