My goal for today’s post is to give a deeper understanding of what tuples are and how those distinctions make them useful, saving their many novel applications involving packing/unpacking for a maybe-someday follow-up post so that I can address it a a psudo-distinct concept. Then, for those interested in what I’ve been writing, I hope to follow that up with tutorials on sequence types generally, lists specifically, usages of map, reduce and filter, and then the itertools module. Apparently my enthusiasm for sequences has a ways to go before it plays itself out. I make no promise of sticking to that schedule, but if I do I hope to release each of the above a slightly accelerated pace compared to my usual posting speed.
That all being said, for those of you who came here to learn (or better yet, teach me something!), I’ll get started on the tutorial now.
What They Are
Python tuples belong to the group of sequential data types, a subset of containers which is comprised of strings, Unicode strings, lists, tuples, bytearrays, buffers, and xrange objects. Tuples, like lists, can contain heterogenous ordered data; and all of the standard sequence operations can be applied to them. The primary difference between lists and tuples is that tuples are immutable, which means that once created, they cannot be altered. New tuples can be created from old tuples, and sub-tuples can be extracted, but once a tuple has been instantiated it is left utterly unchanged until the garbage collector cleans it up.
There are two very similar syntaxes for creating tuples, either just a comma separated list of values, or a comma separated list of values enclosed by technically optional parenthesis. I say ‘technically optional’ because to the very best of my understanding to exclude the parenthesis is uncommon and unpopular. I personally favour parenthesis as well, as I feel they keep visual consistency with lists while also clearly indicating that the created object is a tuple.
# Although you can do this: no_parens_tuple = 1, 'b', 2 # Personally I lean towards parens_tuple = ('a', 2, 'c')
What I feel helps my case in favour of the usage of parens is that the syntax to create a single item tuple is the element with a comma after it, with or without parens. Using just the comma without the parens is easy to miss and makes the code easier to misunderstand. To further emphasize my point, a zero-element tuple can only be created by just a pair of parenthesis with nothing between them. Given that these two uses both make a strong case for using the parens (even as uncommon as they may be), for the sake of consistency they should be used for longer tuples as well.
readable_tuple = ('spectacular',) ambiguous_tuple = 'spectacular', empty_tuple = ()
Please note that the trailing comma is required for a single item tuple, even if parenthesis are included. Otherwise the parens are disregarded and the result is just the object intended for the tuple.
remembered = (10,) remembered (10,) forgot = (10) forgot 10
The gathering of values into a tuple is called ‘tuple packing.’ The converse also exists, called ‘tuple unpacking,’ and I’ll delve into it in part two of this series. It’s a concept that deserves individual addressing.
Tuples have a couple more interesting (or challenging) characteristics that are born of their immutable nature. Firstly, they are hashable, which means (amongst other things) that they can be used as keys in dictionaries. Were you to try using a list as a key in a dict it would fail, and with good reason; as soon as the list was altered its hash value would change and it would point to a different or nonexistent bucket of items in the dictionary. To be honest, there aren’t many cases where hashing a tuple is going to be essential, but there’s a pretty healthy number where it’s neat, intuitive, or more pythonic than the alternative.
Functionally, it sometimes helps to think of tuples as a subset of lists; they forgo all the non-standard-sequence functions (such as append() and index assignment) but in exchange are a faster and more lightweight data type. If performance is a concern and it’s an option, tuples are the way to go. Before further addressing when and why to use tuples, I’d like to illustrate a little further the distinctions between mutable lists and immutable tuples below:
person_tuple = ('James Woods', 45, 'Probably Hollywood', 'Actor') person_list = ['Luke Skywalker', 25, 'Far Far Away', 'Jedi Knight'] # Of course appending to a list works like you'd expect person_list.append('Yoda's School of the Jedi Arts') # But we can't do this with tuples person_tuple.append('School of Hard Knocks') Traceback (most recent call last): File '<stdin>', line 1, in <module> AttributeError: 'tuple' object has no attribute 'append' # Assigning new values doesn't work either person_tuple[3] = 'Celebrity' Traceback (most recent call last): File '<stdin>', line 1, in <module> TypeError: 'tuple' object does not support item assignment # Tuples can be used to derive new tuples: slice_tuple = person_tuple[1:3] # Using slices combined_tuple = person_tuple + person_tuple # Appending one to another copied_tuple = person_tuple[:] # Duplicating tuple tuple_val = person_tuple[0] # Index retrieval works as you'd expect
When To Use Them
Because lists and tuples are so similar, and lists grant additional flexibility, it’s easily to fall back on the argument that lists will do just fine, particularly because they’re so comfy and familiar to most people by the time they discover tuples. And to be honest, a great majority of the time using lists will be perfectly adequate, and if by the end of this tutorial you’re still confused by tuples and unsure of where they belong, go ahead and use that list, it’s okay. Chances are it’ll work just fine and no one will ever even think to question you. That being said, there are times that the advantages of tuples (such as their immutability, hashability, and performance gains) are significant enough to justify their usage. My goal here is to give you just a bit of a sense of when these cases might be.
You may note that many of my examples so far have contained heterogenous data constructed to represent a complex object (like a person). This is because unlike with lists we have a guaranteed contract of consistency with tuples, meaning that if I create a tuple with a string, a number, and then a bytearray, I can know with absolute certainty when I operate on that tuple later that the properties and their types will match the schema and values with which I created them earlier. Because there is no guarantee of order or data type with a list after it’s been created you lose that contract of consistency; and therefor I find that as a rule of thumb although lists techinically support heterogenous data, I try to keep them to sets of a single data type and use them in cases where I’m iterating over the elements and performing a uniform operation on all of them.
Tuples on the other hand are write-protected, almost like they have an implicit assert statement verifying that their data will be correctly formed. For this reason I use them typically for cases where I need a quick stand-in for an object, one that’s light-weight, easy reconfigurable at a later date, and not needed for use elsewhere. For example in a quick script to pull rows of data from a static source (database, csv, or other similar store) where you’re doing all the heavy lifting for one reason or another, it makes sense to handle rows as tuples because it wouldn’t make sense to modify, reorder, or resize those rows; it would create inconsistency of data that would be difficult to process. If our rows are tuples then when we use code like the following:
for row_num, name, email, first_pet, mothers_maiden_name in rows: print 'Processing %s (%d/%d)' % (name, row_num, len(rows)) recover_password(email, first_pet, mothers_maiden_name)
We know our code won’t fail because of an accidentally extended row or a swapped name and email (unless of course the original data was corrupt, which is a whole different story). Python itself is written around this concept, that tuples have an implied semantic and an index has meaning. A great example of this is the return value of python’s database API specification, where fetchmany() and fetchall() return a list (ordered sequence of homogenous data) of tuples representing rows (hetergenous set where order has specific meaning). Cool, eh?
Another of tuples’ advantages needs less longwinded addressing, their improved performance, as it is relatively self explanatory. More lightweight than lists, dictionaries, or instantiating custom objects, creating and iterating over tuples is simply faster. Of course, this isn’t carte blanche, wrapping a list in a one-element tuple won’t speed up processing of that list, but for tuples that contain only hashable data types (simple and immutable types) there’s a pretty decent speed boost. Rarely will you find yourself writing python where the difference between list and tuples performance is a make or break choice, but if you’re dealing with large amounts of performance sensitive code and data, sticking to tuples can be one of many choices you make to make sure it all runs at top speed.
Finally, the hashability of tuples is another simple but neat little feature. This, like the performance gains, is particular to tuples that contain hashable data, but can be applied in some pretty fun places. Generally, it makes any situation where you have a couple of shared properties between objects that you would like to group and access them by a lot easier. For example, lat/long coordinate pairs for people or residencies, or month_of_year, day_of_month pairs make it easy to group date-tracked data while neatly handling that pesky leap-year problem.
lat_long_tup = (49, -117) lat_long_list = [37, -122] # If I were storing a bunch of house locations, it'd be easy to group them by lat/long coordinates. # I do this by coordinates instead of by city because I used to live in a forest and no, # Vancouver isn't a close enough approximation. It's eight hours away! peoples_houses = {} peoples_houses[lat_long_tup] = [my_old_house, my_old_neighbour] #This will fail due to lists being unhashable peoples_houses[lat_long_list] = [my_new_house, my_old_neighbour] Traceback (most recent call last): File '', line 1, in TypeError: unhashable type: 'list'
Extra Reading
A shorter, better, and cooler description of tuples and how they compare to lists
An excellent tuple tutorial, and it’s illustrated!
Sequence types. The table in this section is an invaluable reference
Python tuples aren’t just constant lists