- 1. Welcome
- 2. Dedication
- 3. Preface
- 4. Introduction
- 5. Installation
- 6. First Steps
- 7. Basics
- 8. Operators and Expressions
- 9. Control Flow
- 10. Functions
- 11. Modules
- 12. Data Structures
- 13. Problem Solving
- 14. Object Oriented Programming
- 15. Input and Output
- 16. Exceptions
- 17. Standard Library
- 18. More
- 19. Appendix: FLOSS
- 20. Appendix: Colophon
- 21. Appendix: History Lesson
- 22. Appendix: Revision History
- 23. Translations
- 24. Translation Howto
9. Control Flow
In the programs we have seen till now, there has always been a series of statements faithfully
executed by Python in exact top-down order. What if you wanted to change the flow of how it works?
For example, you want the program to take some decisions and do different things depending on
different situations, such as printing 'Good Morning' or 'Good Evening' depending on the time of
the day?
As you might have guessed, this is achieved using control flow statements. There are three control
flow statements in Python - if, for and while.
9.1. The if statement
The if statement is used to check a condition: if the condition is true, we run a block of
statements (called the if-block), else we process another block of statements (called the
else-block). The else clause is optional.
Example (save as if.py):
number = 23
guess = int(raw_input('Enter an integer : '))
if guess == number:
# New block starts here
print 'Congratulations, you guessed it.'
print '(but you do not win any prizes!)'
# New block ends here
elif guess < number:
# Another block
print 'No, it is a little higher than that'
# You can do whatever you want in a block ...
else:
print 'No, it is a little lower than that'
# you must have guessed > number to reach here
print 'Done'
# This last statement is always executed,
# after the if statement is executed.
Output:
$ python if.py Enter an integer : 50 No, it is a little lower than that Done $ python if.py Enter an integer : 22 No, it is a little higher than that Done $ python if.py Enter an integer : 23 Congratulations, you guessed it. (but you do not win any prizes!) Done
How It Works
In this program, we take guesses from the user and check if it is the number that we have. We set
the variable number to any integer we want, say 23
. Then, we take the user’s guess using the
raw_input()
function. Functions are just reusable pieces of programs. We’ll read more about them
in the next chapter.
We supply a string to the built-in
raw_input
function which prints it to the screen and waits for
input from the user. Once we enter something and press enter key, the raw_input()
function
returns what we entered, as a string. We then convert this string to an integer using int
and
then store it in the variable guess
. Actually, the int
is a class but all you need to know
right now is that you can use it to convert a string to an integer (assuming the string contains a
valid integer in the text).
Next, we compare the guess of the user with the number we have chosen. If they are equal, we print
a success message. Notice that we use indentation levels to tell Python which statements belong to
which block. This is why indentation is so important in Python. I hope you are sticking to the
"consistent indentation" rule. Are you?
Notice how the
if
statement contains a colon at the end - we are indicating to Python that a
block of statements follows.
Then, we check if the guess is less than the number, and if so, we inform the user that they must
guess a little higher than that. What we have used here is the
elif
clause which actually
combines two related if else-if else
statements into one combined if-elif-else
statement. This
makes the program easier and reduces the amount of indentation required.
The
elif
and else
statements must also have a colon at the end of the logical line followed by
their corresponding block of statements (with proper indentation, of course)
You can have another
if
statement inside the if-block of an if
statement and so on - this is
called a nested if
statement.
Remember that the
elif
and else
parts are optional. A minimal valid if
statement is:if True:
print 'Yes, it is true'
After Python has finished executing the complete
if
statement along with the associated elif
and else
clauses, it moves on to the next statement in the block containing the if
statement. In this case, it is the main block (where execution of the program starts), and the next
statement is the print 'Done'
statement. After this, Python sees the ends of the program and
simply finishes up.
Even though this is a very simple program, I have been pointing out a lot of things that you should
notice. All these are pretty straightforward (and surprisingly simple for those of you from C/C++
backgrounds). You will need to become aware of all these things initially, but after some practice
you will become comfortable with them, and it will all feel 'natural' to you.
Note
|
Note for C/C++ Programmers
There is no switch statement in Python. You can use an if..elif..else statement to do the same
thing (and in some cases, use a dictionary to do it quickly)
|
9.2. The while Statement
The
while
statement allows you to repeatedly execute a block of statements as long as a condition
is true. A while
statement is an example of what is called a looping statement. A while
statement can have an optional else
clause.
Example (save as
while.py
):number = 23
running = True
while running:
guess = int(raw_input('Enter an integer : '))
if guess == number:
print 'Congratulations, you guessed it.'
# this causes the while loop to stop
running = False
elif guess < number:
print 'No, it is a little higher than that.'
else:
print 'No, it is a little lower than that.'
else:
print 'The while loop is over.'
# Do anything else you want to do here
print 'Done'
Output:
$ python while.py Enter an integer : 50 No, it is a little lower than that. Enter an integer : 22 No, it is a little higher than that. Enter an integer : 23 Congratulations, you guessed it. The while loop is over. Done
How It Works
In this program, we are still playing the guessing game, but the advantage is that the user is
allowed to keep guessing until he guesses correctly - there is no need to repeatedly run the
program for each guess, as we have done in the previous section. This aptly demonstrates the use of
the while
statement.
We move the
raw_input
and if
statements to inside the while
loop and set the variable
running
to True
before the while loop. First, we check if the variable running
is True
and
then proceed to execute the corresponding while-block. After this block is executed, the
condition is again checked which in this case is the running
variable. If it is true, we execute
the while-block again, else we continue to execute the optional else-block and then continue to the
next statement.
The
else
block is executed when the while
loop condition becomes False
- this may even be the
first time that the condition is checked. If there is an else
clause for a while
loop, it is
always executed unless you break out of the loop with a break
statement.
The
True
and False
are called Boolean types and you can consider them to be equivalent to the
value 1
and 0
respectively.
Note
|
Note for C/C++ Programmers
Remember that you can have an else clause for the while loop.
|
9.3. The for loop
The
for..in
statement is another looping statement which iterates over a sequence of objects
i.e. go through each item in a sequence. We will see more about sequences in detail in
later chapters. What you need to know right now is that a sequence is just an ordered collection of
items.
Example (save as
for.py
):for i in range(1, 5):
print i
else:
print 'The for loop is over'
Output:
$ python for.py 1 2 3 4 The for loop is over
How It Works
In this program, we are printing a sequence of numbers. We generate this sequence of numbers
using the built-in range
function.
What we do here is supply it two numbers and
range
returns a sequence of numbers starting from
the first number and up to the second number. For example, range(1,5)
gives the sequence [1, 2,
3, 4]
. By default, range
takes a step count of 1. If we supply a third number to range
, then
that becomes the step count. For example, range(1,5,2)
gives [1,3]
. Remember that the range
extends up to the second number i.e. it does not include the second number.
Note that
range()
generates a sequence of numbers, but it will generate only one number at a
time, when the for loop requests for the next item. If you want to see the full sequence of numbers
immediately, use list(range())
. Lists are explained in the data structures
chapter.
The
for
loop then iterates over this range - for i in range(1,5)
is equivalent to for i in [1,
2, 3, 4]
which is like assigning each number (or object) in the sequence to i, one at a time, and
then executing the block of statements for each value of i
. In this case, we just print the
value in the block of statements.
Remember that the
else
part is optional. When included, it is always executed once after the
for
loop is over unless a break statement is encountered.
Remember that the
for..in
loop works for any sequence. Here, we have a list of numbers generated
by the built-in range
function, but in general we can use any kind of sequence of any kind of
objects! We will explore this idea in detail in later chapters.
Note
|
Note for C/C++/Java/C# Programmers
The Python
for loop is radically different from the C/C++ for loop. C# programmers will note
that the for loop in Python is similar to the foreach loop in C#. Java programmers will note
that the same is similar to for (int i : IntArray) in Java 1.5.
In C/C, if you want to write `for (int i = 0; i < 5; i)
, then in Python you write just `for i
in range(0,5) . As you can see, the for loop is simpler, more expressive and less error prone in
Python. |
9.4. The break Statement
The
break
statement is used to break out of a loop statement i.e. stop the execution of a
looping statement, even if the loop condition has not become False
or the sequence of items has
not been completely iterated over.
An important note is that if you break out of a
for
or while
loop, any corresponding loop
else
block is not executed.
Example (save as
break.py
):while True:
s = raw_input('Enter something : ')
if s == 'quit':
break
print 'Length of the string is', len(s)
print 'Done'
Output:
$ python break.py Enter something : Programming is fun Length of the string is 18 Enter something : When the work is done Length of the string is 21 Enter something : if you wanna make your work also fun: Length of the string is 37 Enter something : use Python! Length of the string is 11 Enter something : quit Done
How It Works
In this program, we repeatedly take the user’s input and print the length of each input each
time. We are providing a special condition to stop the program by checking if the user input is
'quit'
. We stop the program by breaking out of the loop and reach the end of the program.
The length of the input string can be found out using the built-in
len
function.
Remember that the
break
statement can be used with the for
loop as well.9.5. The continue Statement
The
continue
statement is used to tell Python to skip the rest of the statements in the current
loop block and to continue to the next iteration of the loop.
Example (save as
continue.py
):while True:
s = raw_input('Enter something : ')
if s == 'quit':
break
if len(s) < 3:
print 'Too small'
continue
print 'Input is of sufficient length'
# Do other kinds of processing here...
Output:
$ python continue.py Enter something : a Too small Enter something : 12 Too small Enter something : abc Input is of sufficient length Enter something : quit
How It Works
In this program, we accept input from the user, but we process the input string only if it is at
least 3 characters long. So, we use the built-in len
function to get the length and if the length
is less than 3, we skip the rest of the statements in the block by using the continue
statement. Otherwise, the rest of the statements in the loop are executed, doing any kind of
processing we want to do here.
Note that the
continue
statement works with the for
loop as well.9.6. Summary
We have seen how to use the three control flow statements -
if
, while
and for
along with
their associated break
and continue
statements. These are some of the most commonly used parts
of Python and hence, becoming comfortable with them is essential.
Next, we will see how to create and use functions.
10. Functions
Functions are reusable pieces of programs. They allow you to give a name to a block of statements,
allowing you to run that block using the specified name anywhere in your program and any number of
times. This is known as calling the function. We have already used many built-in functions such
as
len
and range
.
The function concept is probably the most important building block of any non-trivial software
(in any programming language), so we will explore various aspects of functions in this chapter.
Functions are defined using the
def
keyword. After this keyword comes an identifier name for
the function, followed by a pair of parentheses which may enclose some names of variables, and by
the final colon that ends the line. Next follows the block of statements that are part of this
function. An example will show that this is actually very simple:
Example (save as
function1.py
):def say_hello():
# block belonging to the function
print 'hello world'
# End of function
say_hello() # call the function
say_hello() # call the function again
Output:
$ python function1.py hello world hello world
How It Works
We define a function called say_hello
using the syntax as explained above. This function takes no
parameters and hence there are no variables declared in the parentheses. Parameters to functions
are just input to the function so that we can pass in different values to it and get back
corresponding results.
Notice that we can call the same function twice which means we do not have to write the same code
again.
10.1. Function Parameters
A function can take parameters, which are values you supply to the function so that the function
can do something utilising those values. These parameters are just like variables except that the
values of these variables are defined when we call the function and are already assigned values
when the function runs.
Parameters are specified within the pair of parentheses in the function definition, separated by
commas. When we call the function, we supply the values in the same way. Note the terminology
used - the names given in the function definition are called parameters whereas the values you
supply in the function call are called arguments.
Example (save as
function_param.py
):def print_max(a, b):
if a > b:
print a, 'is maximum'
elif a == b:
print a, 'is equal to', b
else:
print b, 'is maximum'
# directly pass literal values
print_max(3, 4)
x = 5
y = 7
# pass variables as arguments
print_max(x, y)
Output:
$ python function_param.py 4 is maximum 7 is maximum
How It Works
Here, we define a function called print_max
that uses two parameters called a
and b
. We find
out the greater number using a simple if..else
statement and then print the bigger number.
The first time we call the function
print_max
, we directly supply the numbers as arguments. In
the second case, we call the function with variables as arguments. print_max(x, y)
causes the
value of argument x
to be assigned to parameter a
and the value of argument y
to be assigned
to parameter b
. The printMax function works the same way in both cases.10.2. Local Variables
When you declare variables inside a function definition, they are not related in any way to other
variables with the same names used outside the function - i.e. variable names are local to the
function. This is called the scope of the variable. All variables have the scope of the block
they are declared in starting from the point of definition of the name.
Example (save as
function_local.py
):x = 50
def func(x):
print 'x is', x
x = 2
print 'Changed local x to', x
func(x)
print 'x is still', x
Output:
$ python function_local.py x is 50 Changed local x to 2 x is still 50
How It Works
The first time that we print the value of the name x with the first line in the function’s
body, Python uses the value of the parameter declared in the main block, above the function
definition.
Next, we assign the value
2
to x
. The name x
is local to our function. So, when we change
the value of x
in the function, the x
defined in the main block remains unaffected.
With the last
print
statement, we display the value of x
as defined in the main block, thereby
confirming that it is actually unaffected by the local assignment within the previously called
function.10.3. The global statement
If you want to assign a value to a name defined at the top level of the program (i.e. not inside
any kind of scope such as functions or classes), then you have to tell Python that the name is not
local, but it is global. We do this using the
global
statement. It is impossible to assign a
value to a variable defined outside a function without the global
statement.
You can use the values of such variables defined outside the function (assuming there is no
variable with the same name within the function). However, this is not encouraged and should be
avoided since it becomes unclear to the reader of the program as to where that variable’s
definition is. Using the
global
statement makes it amply clear that the variable is defined in an
outermost block.
Example (save as
function_global.py
):x = 50
def func():
global x
print 'x is', x
x = 2
print 'Changed global x to', x
func()
print 'Value of x is', x
Output:
$ python function_global.py x is 50 Changed global x to 2 Value of x is 2
How It Works
The global
statement is used to declare that x
is a global variable - hence, when we assign a
value to x
inside the function, that change is reflected when we use the value of x
in the main
block.
You can specify more than one global variable using the same
global
statement e.g. global x, y,
z
.10.4. Default Argument Values
For some functions, you may want to make some parameters optional and use default values in case
the user does not want to provide values for them. This is done with the help of default argument
values. You can specify default argument values for parameters by appending to the parameter name
in the function definition the assignment operator (
=
) followed by the default value.
Note that the default argument value should be a constant. More precisely, the default argument
value should be immutable - this is explained in detail in later chapters. For now, just remember
this.
Example (save as
function_default.py
):def say(message, times=1):
print message * times
say('Hello')
say('World', 5)
Output:
$ python function_default.py Hello WorldWorldWorldWorldWorld
How It Works
The function named say
is used to print a string as many times as specified. If we don’t supply a
value, then by default, the string is printed just once. We achieve this by specifying a default
argument value of 1
to the parameter times
.
In the first usage of
say
, we supply only the string and it prints the string once. In the second
usage of say
, we supply both the string and an argument 5
stating that we want to say the
string message 5 times.
Caution
|
Only those parameters which are at the end of the parameter list can be given default argument
values i.e. you cannot have a parameter with a default argument value preceding a parameter without
a default argument value in the function’s parameter list.
This is because the values are assigned to the parameters by position. For example,
def func(a,
b=5) is valid, but def func(a=5, b) is not valid. |
10.5. Keyword Arguments
If you have some functions with many parameters and you want to specify only some of them, then you
can give values for such parameters by naming them - this is called keyword arguments - we use
the name (keyword) instead of the position (which we have been using all along) to specify the
arguments to the function.
There are two advantages - one, using the function is easier since we do not need to worry about
the order of the arguments. Two, we can give values to only those parameters to which we want to,
provided that the other parameters have default argument values.
Example (save as
function_keyword.py
):def func(a, b=5, c=10):
print 'a is', a, 'and b is', b, 'and c is', c
func(3, 7)
func(25, c=24)
func(c=50, a=100)
Output:
$ python function_keyword.py a is 3 and b is 7 and c is 10 a is 25 and b is 5 and c is 24 a is 100 and b is 5 and c is 50
How It Works
The function named func
has one parameter without a default argument value, followed by two
parameters with default argument values.
In the first usage,
func(3, 7)
, the parameter a
gets the value 3
, the parameter b
gets the
value 7
and c
gets the default value of 10
.
In the second usage
func(25, c=24)
, the variable a
gets the value of 25 due to the position of
the argument. Then, the parameter c
gets the value of 24
due to naming i.e. keyword
arguments. The variable b
gets the default value of 5
.
In the third usage
func(c=50, a=100)
, we use keyword arguments for all specified values. Notice
that we are specifying the value for parameter c
before that for a
even though a
is defined
before c
in the function definition.10.6. VarArgs parameters
Sometimes you might want to define a function that can take any number of parameters,
i.e. variable number of arguments, this can be achieved by using the stars (save as
function_varargs.py
):def total(initial=5, *numbers, **keywords):
count = initial
for number in numbers:
count += number
for key in keywords:
count += keywords[key]
return count
print total(10, 1, 2, 3, vegetables=50, fruits=100)
Output:
$ python function_varargs.py 166
How It Works
When we declare a starred parameter such as *param
, then all the positional arguments from that
point till the end are collected as a tuple called 'param'.
Similarly, when we declare a double-starred parameter such as
**param
, then all the keyword
arguments from that point till the end are collected as a dictionary called 'param'.
We will explore tuples and dictionaries in a later chapter.
10.7. The return statement
The
return
statement is used to return from a function i.e. break out of the function. We can
optionally return a value from the function as well.
Example (save as
function_return.py
):def maximum(x, y):
if x > y:
return x
elif x == y:
return 'The numbers are equal'
else:
return y
print maximum(2, 3)
Output:
$ python function_return.py 3
How It Works
The maximum
function returns the maximum of the parameters, in this case the numbers supplied to
the function. It uses a simple if..else
statement to find the greater value and then returns
that value.
Note that a
return
statement without a value is equivalent to return None
. None
is a special
type in Python that represents nothingness. For example, it is used to indicate that a variable has
no value if it has a value of None
.
Every function implicitly contains a
return None
statement at the end unless you have written
your own return
statement. You can see this by running print some_function()
where the function
some_function
does not use the return
statement such as:def some_function():
pass
The
pass
statement is used in Python to indicate an empty block of statements.
Tip
|
There is a built-in function called max that already implements the 'find maximum'
functionality, so use this built-in function whenever possible.
|
10.8. DocStrings
Python has a nifty feature called documentation strings, usually referred to by its shorter name
docstrings. DocStrings are an important tool that you should make use of since it helps to
document the program better and makes it easier to understand. Amazingly, we can even get the
docstring back from, say a function, when the program is actually running!
Example (save as
function_docstring.py
):def print_max(x, y):
'''Prints the maximum of two numbers.
The two values must be integers.'''
# convert to integers, if possible
x = int(x)
y = int(y)
if x > y:
print x, 'is maximum'
else:
print y, 'is maximum'
print_max(3, 5)
print print_max.__doc__
Output:
$ python function_docstring.py 5 is maximum Prints the maximum of two numbers. The two values must be integers.
How It Works
A string on the first logical line of a function is the docstring for that function. Note that
DocStrings also apply to modules and classes which we will learn about in the
respective chapters.
The convention followed for a docstring is a multi-line string where the first line starts with a
capital letter and ends with a dot. Then the second line is blank followed by any detailed
explanation starting from the third line. You are strongly advised to follow this convention for
all your docstrings for all your non-trivial functions.
We can access the docstring of the
print_max
function using the doc
(notice the double
underscores) attribute (name belonging to) of the function. Just remember that Python treats
everything as an object and this includes functions. We’ll learn more about objects in the
chapter on classes.
If you have used
help()
in Python, then you have already seen the
usage of docstrings! What it does is just fetch the doc
attribute of that function and displays it in a neat manner for
you. You can try it out on the function above - just include help(print_max)
in your
program. Remember to press the q
key to exit help
.
Automated tools can retrieve the documentation from your program in this manner. Therefore, I
strongly recommend that you use docstrings for any non-trivial function that you write. The
pydoc
command that comes with your Python distribution works similarly to help()
using
docstrings.10.9. Summary
We have seen so many aspects of functions but note that we still haven’t covered all aspects of
them. However, we have already covered most of what you’ll use regarding Python functions on an
everyday basis.
Next, we will see how to use as well as create Python modules.
11. Modules
You have seen how you can reuse code in your program by defining functions once. What if you wanted
to reuse a number of functions in other programs that you write? As you might have guessed, the
answer is modules.
There are various methods of writing modules, but the simplest way is to create a file with a
.py
extension that contains functions and variables.
Another method is to write the modules in the native language in which the Python interpreter
itself was written. For example, you can write modules in the C
programming language and when compiled, they can be used from your Python code when using the
standard Python interpreter.
A module can be imported by another program to make use of its functionality. This is how we can
use the Python standard library as well. First, we will see how to use the standard library
modules.
Example (save as
module_using_sys.py
):import sys
print('The command line arguments are:')
for i in sys.argv:
print i
print '\n\nThe PYTHONPATH is', sys.path, '\n'
Output:
$ python module_using_sys.py we are arguments The command line arguments are: module_using_sys.py we are arguments The PYTHONPATH is ['/tmp/py', # many entries here, not shown here '/Library/Python/2.7/site-packages', '/usr/local/lib/python2.7/site-packages']
How It Works
First, we import the sys
module using the import
statement. Basically, this translates to us
telling Python that we want to use this module. The sys
module contains functionality related to
the Python interpreter and its environment i.e. the system.
When Python executes the
import sys
statement, it looks for the sys
module. In this case, it is
one of the built-in modules, and hence Python knows where to find it.
If it was not a compiled module i.e. a module written in Python, then the Python interpreter will
search for it in the directories listed in its
sys.path
variable. If the module is found, then
the statements in the body of that module are run and the module is made available for you
to use. Note that the initialization is done only the first time that we import a module.
The
argv
variable in the sys
module is accessed using the dotted notation i.e. sys.argv
. It
clearly indicates that this name is part of the sys
module. Another advantage of this approach is
that the name does not clash with any argv
variable used in your program.
The
sys.argv
variable is a list of strings (lists are explained in detail in a
later chapter. Specifically, the sys.argv
contains the list of command line
arguments i.e. the arguments passed to your program using the command line.
If you are using an IDE to write and run these programs, look for a way to specify command line
arguments to the program in the menus.
Here, when we execute
python module_using_sys.py we are arguments
, we run the module
module_using_sys.py
with the python
command and the other things that follow are arguments
passed to the program. Python stores the command line arguments in the sys.argv
variable for us
to use.
Remember, the name of the script running is always the first argument in the
sys.argv
list. So,
in this case we will have 'module_using_sys.py'
as sys.argv[0]
, 'we'
as sys.argv[1]
,
'are'
as sys.argv[2]
and 'arguments'
as sys.argv[3]
. Notice that Python starts counting
from 0 and not 1.
The
sys.path
contains the list of directory names where modules are imported from. Observe that
the first string in sys.path
is empty - this empty string indicates that the current directory is
also part of the sys.path
which is same as the PYTHONPATH
environment variable. This means that
you can directly import modules located in the current directory. Otherwise, you will have to place
your module in one of the directories listed in sys.path
.
Note that the current directory is the directory from which the program is launched. Run
import
os; print os.getcwd()
to find out the current directory of your program.11.1. Byte-compiled .pyc files
Importing a module is a relatively costly affair, so Python does some tricks to make it faster. One
way is to create byte-compiled files with the extension
.pyc
which is an intermediate form that
Python transforms the program into (remember the introduction section on how Python
works?). This .pyc
file is useful when you import the module the next time from a different
program - it will be much faster since a portion of the processing required in importing a module
is already done. Also, these byte-compiled files are platform-independent.
Note
|
These .pyc files are usually created in the same directory as the corresponding .py
files. If Python does not have permission to write to files in that directory, then the .pyc
files will not be created.
|
11.2. The from … import statement
If you want to directly import the
argv
variable into your program (to avoid typing the sys.
everytime for it), then you can use the from sys import argv
statement.
In general, you should avoid using this statement and use the
import
statement instead since
your program will avoid name clashes and will be more readable.
Example:
from math import sqrt
print "Square root of 16 is", sqrt(16)
11.3. A module’s name
Every module has a name and statements in a module can find out the name of their module. This is
handy for the particular purpose of figuring out whether the module is being run standalone or
being imported. As mentioned previously, when a module is imported for the first time, the code it
contains gets executed. We can use this to make the module behave in different ways depending on
whether it is being used by itself or being imported from another module. This can be achieved
using the
name
attribute of the module.
Example (save as
module_using_name.py
):if __name__ == '__main__':
print 'This program is being run by itself'
else:
print 'I am being imported from another module'
Output:
$ python module_using_name.py This program is being run by itself $ python >>> import module_using_name I am being imported from another module >>>
How It Works
Every Python module has its name
defined. If this is 'main'
, that implies that the
module is being run standalone by the user and we can take appropriate actions.11.4. Making Your Own Modules
Creating your own modules is easy, you’ve been doing it all along! This is because every Python
program is also a module. You just have to make sure it has a
.py
extension. The following
example should make it clear.
Example (save as
mymodule.py
):def say_hi():
print 'Hi, this is mymodule speaking.'
__version__ = '0.1'
The above was a sample module. As you can see, there is nothing particularly special about it
compared to our usual Python program. We will next see how to use this module in our other Python
programs.
Remember that the module should be placed either in the same directory as the program from which we
import it, or in one of the directories listed in
sys.path
.
Another module (save as
mymodule_demo.py
):import mymodule
mymodule.say_hi()
print 'Version', mymodule.__version__
Output:
$ python mymodule_demo.py Hi, this is mymodule speaking. Version 0.1
How It Works
Notice that we use the same dotted notation to access members of the module. Python makes good
reuse of the same notation to give the distinctive 'Pythonic' feel to it so that we don’t have to
keep learning new ways to do things.
Here is a version utilising the
from..import
syntax (save as mymodule_demo2.py
):from mymodule import say_hi, __version__
say_hi()
print 'Version', __version__
The output of
mymodule_demo2.py
is same as the output of mymodule_demo.py
.
Notice that if there was already a
version
name declared in the module that imports mymodule,
there would be a clash. This is also likely because it is common practice for each module to
declare it’s version number using this name. Hence, it is always recommended to prefer the import
statement even though it might make your program a little longer.
You could also use:
from mymodule import *
This will import all public names such as
say_hi
but would not import version
because it
starts with double underscores.
Warning
|
Remember that you should avoid using import-star, i.e. from mymodule import * .
|
11.5. The dir function
You can use the built-in
dir
function to list the identifiers that an object defines. For
example, for a module, the identifiers include the functions, classes and variables defined in that
module.
When you supply a module name to the`dir()` function, it returns the list of the names defined in
that module. When no argument is applied to it, it returns the list of names defined in the current
module.
Example:
$ python >>> import sys # get names of attributes in sys module >>> dir(sys) ['__displayhook__', '__doc__', 'argv', 'builtin_module_names', 'version', 'version_info'] # only few entries shown here # get names of attributes for current module >>> dir() ['__builtins__', '__doc__', '__name__', '__package__'] # create a new variable 'a' >>> a = 5 >>> dir() ['__builtins__', '__doc__', '__name__', '__package__', 'a'] # delete/remove a name >>> del a >>> dir() ['__builtins__', '__doc__', '__name__', '__package__']
How It Works
First, we see the usage of dir
on the imported sys
module. We can see the huge list of
attributes that it contains.
Next, we use the
dir
function without passing parameters to it. By default, it returns the list
of attributes for the current module. Notice that the list of imported modules is also part of this
list.
In order to observe the
dir
in action, we define a new variable a
and assign it a value and
then check dir
and we observe that there is an additional value in the list of the same name. We
remove the variable/attribute of the current module using the del
statement and the change is
reflected again in the output of the dir
function.
A note on
del
- this statement is used to delete a variable/name and after the statement has
run, in this case del a
, you can no longer access the variable a
- it is as if it never existed
before at all.
Note that the
dir()
function works on any object. For example, run dir(print)
to learn
about the attributes of the print function, or dir(str)
for the attributes of the str class.
There is also a
vars()
function which can
potentially give you the attributes and their values, but it will not work for all cases.11.6. Packages
By now, you must have started observing the hierarchy of organizing your programs. Variables
usually go inside functions. Functions and global variables usually go inside modules. What if you
wanted to organize modules? That’s where packages come into the picture.
Packages are just folders of modules with a special
init.py
file that indicates to Python
that this folder is special because it contains Python modules.
Let’s say you want to create a package called 'world' with subpackages 'asia', 'africa', etc. and
these subpackages in turn contain modules like 'india', 'madagascar', etc.
This is how you would structure the folders:
- <some folder present in the sys.path>/ - world/ - __init__.py - asia/ - __init__.py - india/ - __init__.py - foo.py - africa/ - __init__.py - madagascar/ - __init__.py - bar.py
Packages are just a convenience to hierarchically organize modules. You will see many instances of
this in the standard library.
11.7. Summary
Just like functions are reusable parts of programs, modules are reusable programs. Packages are
another hierarchy to organize modules. The standard library that comes with Python is an example of
such a set of packages and modules.
We have seen how to use these modules and create our own modules.
Next, we will learn about some interesting concepts called data structures.
12. Data Structures
Data structures are basically just that - they are structures which can hold some data
together. In other words, they are used to store a collection of related data.
There are four built-in data structures in Python - list, tuple, dictionary and set. We will see
how to use each of them and how they make life easier for us.
12.1. List
A
list
is a data structure that holds an ordered collection of items i.e. you can store a
sequence of items in a list. This is easy to imagine if you can think of a shopping list where
you have a list of items to buy, except that you probably have each item on a separate line in your
shopping list whereas in Python you put commas in between them.
The list of items should be enclosed in square brackets so that Python understands that you are
specifying a list. Once you have created a list, you can add, remove or search for items in the
list. Since we can add and remove items, we say that a list is a mutable data type i.e. this type
can be altered.
12.2. Quick Introduction To Objects And Classes
Although I’ve been generally delaying the discussion of objects and classes till now, a little
explanation is needed right now so that you can understand lists better. We will explore this topic
in detail in a later chapter.
A list is an example of usage of objects and classes. When we use a variable
i
and assign a value
to it, say integer 5
to it, you can think of it as creating an object (i.e. instance) i
of
class (i.e. type) int
. In fact, you can read help(int)
to understand this better.
A class can also have methods i.e. functions defined for use with respect to that class only. You
can use these pieces of functionality only when you have an object of that class. For example,
Python provides an
append
method for the list
class which allows you to add an item to the end
of the list. For example, mylist.append('an item')
will add that string to the list
mylist
. Note the use of dotted notation for accessing methods of the objects.
A class can also have fields which are nothing but variables defined for use with respect to that
class only. You can use these variables/names only when you have an object of that class. Fields
are also accessed by the dotted notation, for example,
mylist.field
.
Example (save as
ds_using_list.py
):# This is my shopping list
shoplist = ['apple', 'mango', 'carrot', 'banana']
print 'I have', len(shoplist), 'items to purchase.'
print 'These items are:',
for item in shoplist:
print item,
print '\nI also have to buy rice.'
shoplist.append('rice')
print 'My shopping list is now', shoplist
print 'I will sort my list now'
shoplist.sort()
print 'Sorted shopping list is', shoplist
print 'The first item I will buy is', shoplist[0]
olditem = shoplist[0]
del shoplist[0]
print 'I bought the', olditem
print 'My shopping list is now', shoplist
Output:
$ python ds_using_list.py I have 4 items to purchase. These items are: apple mango carrot banana I also have to buy rice. My shopping list is now ['apple', 'mango', 'carrot', 'banana', 'rice'] I will sort my list now Sorted shopping list is ['apple', 'banana', 'carrot', 'mango', 'rice'] The first item I will buy is apple I bought the apple My shopping list is now ['banana', 'carrot', 'mango', 'rice']
How It Works
The variable shoplist
is a shopping list for someone who is going to the market. In shoplist
,
we only store strings of the names of the items to buy but you can add any kind of object to a
list including numbers and even other lists.
We have also used the
for..in
loop to iterate through the items of the list. By now, you must
have realised that a list is also a sequence. The speciality of sequences will be discussed in a
later section.
Notice the use of the trailing comma in the
print
statement to indicate that we want to end the
output with a space instead of the usual line break. Think of the comma as telling Python that we
have more items to print on the same line.
Next, we add an item to the list using the
append
method of the list object, as already discussed
before. Then, we check that the item has been indeed added to the list by printing the contents of
the list by simply passing the list to the print
statement which prints it neatly.
Then, we sort the list by using the
sort
method of the list. It is important to understand that
this method affects the list itself and does not return a modified list - this is different from
the way strings work. This is what we mean by saying that lists are mutable and that strings are
immutable.
Next, when we finish buying an item in the market, we want to remove it from the list. We achieve
this by using the
del
statement. Here, we mention which item of the list we want to remove and
the del
statement removes it from the list for us. We specify that we want to remove the first
item from the list and hence we use del shoplist[0]
(remember that Python starts counting from
0).
If you want to know all the methods defined by the list object, see
help(list)
for details.12.3. Tuple
Tuples are used to hold together multiple objects. Think of them as similar to lists, but without
the extensive functionality that the list class gives you. One major feature of tuples is that they
are immutable like strings i.e. you cannot modify tuples.
Tuples are defined by specifying items separated by commas within an optional pair of parentheses.
Tuples are usually used in cases where a statement or a user-defined function can safely assume
that the collection of values i.e. the tuple of values used will not change.
Example (save as
ds_using_tuple.py
):# I would recommend always using parentheses
# to indicate start and end of tuple
# even though parentheses are optional.
# Explicit is better than implicit.
zoo = ('python', 'elephant', 'penguin')
print 'Number of animals in the zoo is', len(zoo)
new_zoo = 'monkey', 'camel', zoo
print 'Number of cages in the new zoo is', len(new_zoo)
print 'All animals in new zoo are', new_zoo
print 'Animals brought from old zoo are', new_zoo[2]
print 'Last animal brought from old zoo is', new_zoo[2][2]
print 'Number of animals in the new zoo is', \
len(new_zoo)-1+len(new_zoo[2])
Output:
$ python ds_using_tuple.py Number of animals in the zoo is 3 Number of cages in the new zoo is 3 All animals in new zoo are ('monkey', 'camel', ('python', 'elephant', 'penguin')) Animals brought from old zoo are ('python', 'elephant', 'penguin') Last animal brought from old zoo is penguin Number of animals in the new zoo is 5
How It Works
The variable zoo
refers to a tuple of items. We see that the len
function can be used to get
the length of the tuple. This also indicates that a tuple is a sequence as well.
We are now shifting these animals to a new zoo since the old zoo is being closed. Therefore, the
new_zoo
tuple contains some animals which are already there along with the animals brought over
from the old zoo. Back to reality, note that a tuple within a tuple does not lose its identity.
We can access the items in the tuple by specifying the item’s position within a pair of square
brackets just like we did for lists. This is called the indexing operator. We access the third
item in
new_zoo
by specifying new_zoo[2]
and we access the third item within the third item in
the new_zoo
tuple by specifying new_zoo[2][2]
. This is pretty simple once you’ve understood the
idiom.
Note
|
Tuple with 0 or 1 items
An empty tuple is constructed by an empty pair of parentheses such as myempty = () . However, a
tuple with a single item is not so simple. You have to specify it using a comma following the first
(and only) item so that Python can differentiate between a tuple and a pair of parentheses
surrounding the object in an expression i.e. you have to specify singleton = (2 , ) if you mean
you want a tuple containing the item 2 .
|
Note
|
Note for Perl programmers
A list within a list does not lose its identity i.e. lists are not flattened as in Perl. The same
applies to a tuple within a tuple, or a tuple within a list, or a list within a tuple, etc. As far
as Python is concerned, they are just objects stored using another object, that’s all.
|
12.4. Dictionary
A dictionary is like an address-book where you can find the address or contact details of a person
by knowing only his/her name i.e. we associate keys (name) with values (details). Note that the
key must be unique just like you cannot find out the correct information if you have two persons
with the exact same name.
Note that you can use only immutable objects (like strings) for the keys of a dictionary but you
can use either immutable or mutable objects for the values of the dictionary. This basically
translates to say that you should use only simple objects for keys.
Pairs of keys and values are specified in a dictionary by using the notation
d = {key1 : value1,
key2 : value2 }
. Notice that the key-value pairs are separated by a colon and the pairs are
separated themselves by commas and all this is enclosed in a pair of curly braces.
Remember that key-value pairs in a dictionary are not ordered in any manner. If you want a
particular order, then you will have to sort them yourself before using it.
The dictionaries that you will be using are instances/objects of the
dict
class.
Example (save as
ds_using_dict.py
):# 'ab' is short for 'a'ddress'b'ook
ab = { 'Swaroop' : 'swaroop@swaroopch.com',
'Larry' : 'larry@wall.org',
'Matsumoto' : 'matz@ruby-lang.org',
'Spammer' : 'spammer@hotmail.com'
}
print "Swaroop's address is", ab['Swaroop']
# Deleting a key-value pair
del ab['Spammer']
print '\nThere are {} contacts in the address-book\n'.format(len(ab))
for name, address in ab.items():
print 'Contact {} at {}'.format(name, address)
# Adding a key-value pair
ab['Guido'] = 'guido@python.org'
if 'Guido' in ab:
print "\nGuido's address is", ab['Guido']
Output:
$ python ds_using_dict.py Swaroop's address is swaroop@swaroopch.com There are 3 contacts in the address-book Contact Swaroop at swaroop@swaroopch.com Contact Matsumoto at matz@ruby-lang.org Contact Larry at larry@wall.org Guido's address is guido@python.org
How It Works
We create the dictionary ab
using the notation already discussed. We then access key-value pairs
by specifying the key using the indexing operator as discussed in the context of lists and
tuples. Observe the simple syntax.
We can delete key-value pairs using our old friend - the
del
statement. We simply specify the
dictionary and the indexing operator for the key to be removed and pass it to the del
statement. There is no need to know the value corresponding to the key for this operation.
Next, we access each key-value pair of the dictionary using the
items
method of the dictionary
which returns a list of tuples where each tuple contains a pair of items - the key followed by the
value. We retrieve this pair and assign it to the variables name
and address
correspondingly
for each pair using the for..in
loop and then print these values in the for-block.
We can add new key-value pairs by simply using the indexing operator to access a key and assign
that value, as we have done for Guido in the above case.
We can check if a key-value pair exists using the
in
operator.
For the list of methods of the
dict
class, see help(dict)
.
Tip
|
Keyword Arguments and Dictionaries
If you have used keyword arguments in your functions, you have already used dictionaries! Just
think about it - the key-value pair is specified by you in the parameter list of the function
definition and when you access variables within your function, it is just a key access of a
dictionary (which is called the symbol table in compiler design terminology).
|
12.5. Sequence
Lists, tuples and strings are examples of sequences, but what are sequences and what is so special
about them?
The major features are membership tests, (i.e. the
in
and not in
expressions) and indexing
operations, which allow us to fetch a particular item in the sequence directly.
The three types of sequences mentioned above - lists, tuples and strings, also have a slicing
operation which allows us to retrieve a slice of the sequence i.e. a part of the sequence.
Example (save as
ds_seq.py
):shoplist = ['apple', 'mango', 'carrot', 'banana']
name = 'swaroop'
# Indexing or 'Subscription' operation #
print 'Item 0 is', shoplist[0]
print 'Item 1 is', shoplist[1]
print 'Item 2 is', shoplist[2]
print 'Item 3 is', shoplist[3]
print 'Item -1 is', shoplist[-1]
print 'Item -2 is', shoplist[-2]
print 'Character 0 is', name[0]
# Slicing on a list #
print 'Item 1 to 3 is', shoplist[1:3]
print 'Item 2 to end is', shoplist[2:]
print 'Item 1 to -1 is', shoplist[1:-1]
print 'Item start to end is', shoplist[:]
# Slicing on a string #
print 'characters 1 to 3 is', name[1:3]
print 'characters 2 to end is', name[2:]
print 'characters 1 to -1 is', name[1:-1]
print 'characters start to end is', name[:]
Output:
$ python ds_seq.py Item 0 is apple Item 1 is mango Item 2 is carrot Item 3 is banana Item -1 is banana Item -2 is carrot Character 0 is s Item 1 to 3 is ['mango', 'carrot'] Item 2 to end is ['carrot', 'banana'] Item 1 to -1 is ['mango', 'carrot'] Item start to end is ['apple', 'mango', 'carrot', 'banana'] characters 1 to 3 is wa characters 2 to end is aroop characters 1 to -1 is waroo characters start to end is swaroop
How It Works
First, we see how to use indexes to get individual items of a sequence. This is also referred to as
the subscription operation. Whenever you specify a number to a sequence within square brackets as
shown above, Python will fetch you the item corresponding to that position in the
sequence. Remember that Python starts counting numbers from 0. Hence, shoplist[0]
fetches the
first item and shoplist[3]
fetches the fourth item in the `shoplist`sequence.
The index can also be a negative number, in which case, the position is calculated from the end of
the sequence. Therefore,
shoplist[-1]
refers to the last item in the sequence and shoplist[-2]
fetches the second last item in the sequence.
The slicing operation is used by specifying the name of the sequence followed by an optional pair
of numbers separated by a colon within square brackets. Note that this is very similar to the
indexing operation you have been using till now. Remember the numbers are optional but the colon
isn’t.
The first number (before the colon) in the slicing operation refers to the position from where the
slice starts and the second number (after the colon) indicates where the slice will stop at. If the
first number is not specified, Python will start at the beginning of the sequence. If the second
number is left out, Python will stop at the end of the sequence. Note that the slice returned
starts at the start position and will end just before the end position i.e. the start position
is included but the end position is excluded from the sequence slice.
Thus,
shoplist[1:3]
returns a slice of the sequence starting at position 1, includes position 2
but stops at position 3 and therefore a slice of two items is returned. Similarly, shoplist[:]
returns a copy of the whole sequence.
You can also do slicing with negative positions. Negative numbers are used for positions from the
end of the sequence. For example,
shoplist[:-1]
will return a slice of the sequence which
excludes the last item of the sequence but contains everything else.
You can also provide a third argument for the slice, which is the step for the slicing (by
default, the step size is 1):
>>> shoplist = ['apple', 'mango', 'carrot', 'banana']
>>> shoplist[::1]
['apple', 'mango', 'carrot', 'banana']
>>> shoplist[::2]
['apple', 'carrot']
>>> shoplist[::3]
['apple', 'banana']
>>> shoplist[::-1]
['banana', 'carrot', 'mango', 'apple']
Notice that when the step is 2, we get the items with position 0, 2,… When the step size is 3, we
get the items with position 0, 3, etc.
Try various combinations of such slice specifications using the Python interpreter interactively
i.e. the prompt so that you can see the results immediately. The great thing about sequences is
that you can access tuples, lists and strings all in the same way!
12.6. Set
Sets are unordered collections of simple objects. These are used when the existence of an object
in a collection is more important than the order or how many times it occurs.
Using sets, you can test for membership, whether it is a subset of another set, find the
intersection between two sets, and so on.
>>> bri = set(['brazil', 'russia', 'india'])
>>> 'india' in bri
True
>>> 'usa' in bri
False
>>> bric = bri.copy()
>>> bric.add('china')
>>> bric.issuperset(bri)
True
>>> bri.remove('russia')
>>> bri & bric # OR bri.intersection(bric)
{'brazil', 'india'}
How It Works
The example is pretty much self-explanatory because it involves basic set theory mathematics taught
in school.12.7. References
When you create an object and assign it to a variable, the variable only refers to the object and
does not represent the object itself! That is, the variable name points to that part of your
computer’s memory where the object is stored. This is called binding the name to the object.
Generally, you don’t need to be worried about this, but there is a subtle effect due to references
which you need to be aware of:
Example (save as
ds_reference.py
):print 'Simple Assignment'
shoplist = ['apple', 'mango', 'carrot', 'banana']
# mylist is just another name pointing to the same object!
mylist = shoplist
# I purchased the first item, so I remove it from the list
del shoplist[0]
print 'shoplist is', shoplist
print 'mylist is', mylist
# Notice that both shoplist and mylist both print
# the same list without the 'apple' confirming that
# they point to the same object
print 'Copy by making a full slice'
# Make a copy by doing a full slice
mylist = shoplist[:]
# Remove first item
del mylist[0]
print 'shoplist is', shoplist
print 'mylist is', mylist
# Notice that now the two lists are different
Output:
$ python ds_reference.py Simple Assignment shoplist is ['mango', 'carrot', 'banana'] mylist is ['mango', 'carrot', 'banana'] Copy by making a full slice shoplist is ['mango', 'carrot', 'banana'] mylist is ['carrot', 'banana']
How It Works
Most of the explanation is available in the comments.
Remember that if you want to make a copy of a list or such kinds of sequences or complex objects
(not simple objects such as integers), then you have to use the slicing operation to make a
copy. If you just assign the variable name to another name, both of them will ''refer'' to the same
object and this could be trouble if you are not careful.
Note
|
Note for Perl programmers
Remember that an assignment statement for lists does not create a copy. You have to use slicing
operation to make a copy of the sequence.
|
12.8. More About Strings
We have already discussed strings in detail earlier. What more can there be to know? Well, did you
know that strings are also objects and have methods which do everything from checking part of a
string to stripping spaces!
The strings that you use in program are all objects of the class
str
. Some useful methods of
this class are demonstrated in the next example. For a complete list of such methods, see
help(str)
.
Example (save as
ds_str_methods.py
):# This is a string object
name = 'Swaroop'
if name.startswith('Swa'):
print 'Yes, the string starts with "Swa"'
if 'a' in name:
print 'Yes, it contains the string "a"'
if name.find('war') != -1:
print 'Yes, it contains the string "war"'
delimiter = '_*_'
mylist = ['Brazil', 'Russia', 'India', 'China']
print delimiter.join(mylist)
Output:
$ python ds_str_methods.py Yes, the string starts with "Swa" Yes, it contains the string "a" Yes, it contains the string "war" Brazil_*_Russia_*_India_*_China
How It Works
Here, we see a lot of the string methods in action. The startswith
method is used to find out
whether the string starts with the given string. The in
operator is used to check if a given
string is a part of the string.
The
find
method is used to locate the position of the given substring within the string; find
returns -1 if it is unsuccessful in finding the substring. The str
class also has a neat method
to join
the items of a sequence with the string acting as a delimiter between each item of the
sequence and returns a bigger string generated from this.12.9. Summary
We have explored the various built-in data structures of Python in detail. These data structures
will be essential for writing programs of reasonable size.
Now that we have a lot of the basics of Python in place, we will next see how to design and write a
real-world Python program.
13. Problem Solving
We have explored various parts of the Python language and now we will take a look at how all these
parts fit together, by designing and writing a program which does something useful. The idea is
to learn how to write a Python script on your own.
13.1. The Problem
The problem we want to solve is:
I want a program which creates a backup of all my important files.
Although, this is a simple problem, there is not enough information for us to get started with the
solution. A little more analysis is required. For example, how do we specify which files are to
be backed up? How are they stored? Where are they stored?
After analyzing the problem properly, we design our program. We make a list of things about how
our program should work. In this case, I have created the following list on how I want it to
work. If you do the design, you may not come up with the same kind of analysis since every person
has their own way of doing things, so that is perfectly okay.
-
The files and directories to be backed up are specified in a list.
-
The backup must be stored in a main backup directory.
-
The files are backed up into a zip file.
-
The name of the zip archive is the current date and time.
-
We use the standard
zip
command available by default in any standard GNU/Linux or Unix distribution. Note that you can use any archiving command you want as long as it has a command line interface.
Note
|
For Windows users
Windows users can install the zip command from
the GnuWin32 project page and add C:\Program
Files\GnuWin32\bin to your system PATH environment variable, similar to what we did
for recognizing the python command itself.
|
13.2. The Solution
As the design of our program is now reasonably stable, we can write the code which is an
implementation of our solution.
Save as
backup_ver1.py
:import os
import time
# 1. The files and directories to be backed up are
# specified in a list.
# Example on Windows:
# source = ['"C:\\My Documents"', 'C:\\Code']
# Example on Mac OS X and Linux:
source = ['/Users/swa/notes']
# Notice we had to use double quotes inside the string
# for names with spaces in it.
# 2. The backup must be stored in a
# main backup directory
# Example on Windows:
# target_dir = 'E:\\Backup'
# Example on Mac OS X and Linux:
target_dir = '/Users/swa/backup'
# Remember to change this to which folder you will be using
# 3. The files are backed up into a zip file.
# 4. The name of the zip archive is the current date and time
target = target_dir + os.sep + \
time.strftime('%Y%m%d%H%M%S') + '.zip'
# Create target directory if it is not present
if not os.path.exists(target_dir):
os.mkdir(target_dir) # make directory
# 5. We use the zip command to put the files in a zip archive
zip_command = "zip -r {0} {1}".format(target,
' '.join(source))
# Run the backup
print "Zip command is:"
print zip_command
print "Running:"
if os.system(zip_command) == 0:
print 'Successful backup to', target
else:
print 'Backup FAILED'
Output:
$ python backup_ver1.py Zip command is: zip -r /Users/swa/backup/20140328084844.zip /Users/swa/notes Running: adding: Users/swa/notes/ (stored 0%) adding: Users/swa/notes/blah1.txt (stored 0%) adding: Users/swa/notes/blah2.txt (stored 0%) adding: Users/swa/notes/blah3.txt (stored 0%) Successful backup to /Users/swa/backup/20140328084844.zip
Now, we are in the testing phase where we test that our program works properly. If it doesn’t
behave as expected, then we have to debug our program i.e. remove the bugs (errors) from the
program.
If the above program does not work for you, copy the line printed after the
Zip command is
line
in the output, paste it in the shell (on GNU/Linux and Mac OS X) / cmd
(on Windows), see what the
error is and try to fix it. Also check the zip command manual on what could be wrong. If this
command succeeds, then the problem might be in the Python program itself, so check if it exactly
matches the program written above.
How It Works
You will notice how we have converted our design into code in a step-by-step manner.
We make use of the
os
and time
modules by first importing them. Then, we specify the files and
directories to be backed up in the source
list. The target directory is where we store all the
backup files and this is specified in the target_dir
variable. The name of the zip archive that
we are going to create is the current date and time which we generate using the time.strftime()
function. It will also have the .zip
extension and will be stored in the target_dir
directory.
Notice the use of the
os.sep
variable - this gives the directory separator according to your
operating system i.e. it will be '/'
in GNU/Linux and Unix, it will be '\\'
in Windows and
':'
in Mac OS. Using os.sep
instead of these characters directly will make our program portable
and work across all of these systems.
The
time.strftime()
function takes a specification such as the one we have used in the above
program. The %Y
specification will be replaced by the year with the century. The %m
specification will be replaced by the month as a decimal number between 01
and 12
and
so on. The complete list of such specifications can be found in the
Python Reference Manual.
We create the name of the target zip file using the addition operator which concatenates the
strings i.e. it joins the two strings together and returns a new one. Then, we create a string
zip_command
which contains the command that we are going to execute. You can check if this
command works by running it in the shell (GNU/Linux terminal or DOS prompt).
The
zip
command that we are using has some options and parameters passed. The -r
option
specifies that the zip command should work recursively for directories i.e. it should include
all the subdirectories and files. The two options are combined and specified in a shortcut as
-qr
. The options are followed by the name of the zip archive to create followed by the list of
files and directories to backup. We convert the source
list into a string using the join
method
of strings which we have already seen how to
use.
Then, we finally run the command using the
os.system
function which runs the command as if it
was run from the system i.e. in the shell - it returns 0
if the command was successfully, else
it returns an error number.
Depending on the outcome of the command, we print the appropriate message that the backup has
failed or succeeded.
That’s it, we have created a script to take a backup of our important files!
Note
|
Note to Windows Users
Instead of double backslash escape sequences, you can also use raw strings. For example, use
'C:\\Documents' or r’C:\Documents' . However, do not use 'C:\Documents' since you end up
using an unknown escape sequence \D .
|
Now that we have a working backup script, we can use it whenever we want to take a backup of the
files. This is called the operation phase or the deployment phase of the software.
The above program works properly, but (usually) first programs do not work exactly as you
expect. For example, there might be problems if you have not designed the program properly or if
you have made a mistake when typing the code, etc. Appropriately, you will have to go back to the
design phase or you will have to debug your program.
13.3. Second Version
The first version of our script works. However, we can make some refinements to it so that it can
work better on a daily basis. This is called the maintenance phase of the software.
One of the refinements I felt was useful is a better file-naming mechanism - using the time as
the name of the file within a directory with the current date as a directory within the main
backup directory. The first advantage is that your backups are stored in a hierarchical manner and
therefore it is much easier to manage. The second advantage is that the filenames are much
shorter. The third advantage is that separate directories will help you check if you have made a
backup for each day since the directory would be created only if you have made a backup for
that day.
Save as
backup_ver2.py
:import os
import time
# 1. The files and directories to be backed up are
# specified in a list.
# Example on Windows:
# source = ['"C:\\My Documents"', 'C:\\Code']
# Example on Mac OS X and Linux:
source = ['/Users/swa/notes']
# Notice we had to use double quotes inside the string
# for names with spaces in it.
# 2. The backup must be stored in a
# main backup directory
# Example on Windows:
# target_dir = 'E:\\Backup'
# Example on Mac OS X and Linux:
target_dir = '/Users/swa/backup'
# Remember to change this to which folder you will be using
# Create target directory if it is not present
if not os.path.exists(target_dir):
os.mkdir(target_dir) # make directory
# 3. The files are backed up into a zip file.
# 4. The current day is the name of the subdirectory
# in the main directory.
today = target_dir + os.sep + time.strftime('%Y%m%d')
# The current time is the name of the zip archive.
now = time.strftime('%H%M%S')
# The name of the zip file
target = today + os.sep + now + '.zip'
# Create the subdirectory if it isn't already there
if not os.path.exists(today):
os.mkdir(today)
print 'Successfully created directory', today
# 5. We use the zip command to put the files in a zip archive
zip_command = "zip -r {0} {1}".format(target,
' '.join(source))
# Run the backup
print "Zip command is:"
print zip_command
print "Running:"
if os.system(zip_command) == 0:
print 'Successful backup to', target
else:
print 'Backup FAILED'
Output:
$ python backup_ver2.py Successfully created directory /Users/swa/backup/20140329 Zip command is: zip -r /Users/swa/backup/20140329/073201.zip /Users/swa/notes Running: adding: Users/swa/notes/ (stored 0%) adding: Users/swa/notes/blah1.txt (stored 0%) adding: Users/swa/notes/blah2.txt (stored 0%) adding: Users/swa/notes/blah3.txt (stored 0%) Successful backup to /Users/swa/backup/20140329/073201.zip
How It Works
Most of the program remains the same. The changes are that we check if there is a directory with
the current day as its name inside the main backup directory using the os.path.exists
function. If it doesn’t exist, we create it using the os.mkdir
function.13.4. Third Version
The second version works fine when I do many backups, but when there are lots of backups, I am
finding it hard to differentiate what the backups were for! For example, I might have made some
major changes to a program or presentation, then I want to associate what those changes are with
the name of the zip archive. This can be easily achieved by attaching a user-supplied comment to
the name of the zip archive.
Warning
|
The following program does not work, so do not be alarmed, please follow along because there’s a lesson in here. |
Save as
backup_ver3.py
:import os
import time
# 1. The files and directories to be backed up are
# specified in a list.
# Example on Windows:
# source = ['"C:\\My Documents"', 'C:\\Code']
# Example on Mac OS X and Linux:
source = ['/Users/swa/notes']
# Notice we had to use double quotes inside the string
# for names with spaces in it.
# 2. The backup must be stored in a
# main backup directory
# Example on Windows:
# target_dir = 'E:\\Backup'
# Example on Mac OS X and Linux:
target_dir = '/Users/swa/backup'
# Remember to change this to which folder you will be using
# Create target directory if it is not present
if not os.path.exists(target_dir):
os.mkdir(target_dir) # make directory
# 3. The files are backed up into a zip file.
# 4. The current day is the name of the subdirectory
# in the main directory.
today = target_dir + os.sep + time.strftime('%Y%m%d')
# The current time is the name of the zip archive.
now = time.strftime('%H%M%S')
# Take a comment from the user to
# create the name of the zip file
comment = raw_input('Enter a comment --> ')
# Check if a comment was entered
if len(comment) == 0:
target = today + os.sep + now + '.zip'
else:
target = today + os.sep + now + '_' +
comment.replace(' ', '_') + '.zip'
# Create the subdirectory if it isn't already there
if not os.path.exists(today):
os.mkdir(today)
print 'Successfully created directory', today
# 5. We use the zip command to put the files in a zip archive
zip_command = "zip -r {0} {1}".format(target,
' '.join(source))
# Run the backup
print "Zip command is:"
print zip_command
print "Running:"
if os.system(zip_command) == 0:
print 'Successful backup to', target
else:
print 'Backup FAILED'
Output:
$ python backup_ver3.py File "backup_ver3.py", line 39 target = today + os.sep + now + '_' + ^ SyntaxError: invalid syntax
How This (does not) Work
This program does not work! Python says there is a syntax error which means that the script does
not satisfy the structure that Python expects to see. When we observe the error given by Python, it
also tells us the place where it detected the error as well. So we start debugging our program
from that line.
On careful observation, we see that the single logical line has been split into two physical lines
but we have not specified that these two physical lines belong together. Basically, Python has
found the addition operator (
+
) without any operand in that logical line and hence it doesn’t
know how to continue. Remember that we can specify that the logical line continues in the next
physical line by the use of a backslash at the end of the physical line. So, we make this
correction to our program. This correction of the program when we find errors is called bug
fixing.13.5. Fourth Version
Save as
backup_ver4.py
:import os
import time
# 1. The files and directories to be backed up are
# specified in a list.
# Example on Windows:
# source = ['"C:\\My Documents"', 'C:\\Code']
# Example on Mac OS X and Linux:
source = ['/Users/swa/notes']
# Notice we had to use double quotes inside the string
# for names with spaces in it.
# 2. The backup must be stored in a
# main backup directory
# Example on Windows:
# target_dir = 'E:\\Backup'
# Example on Mac OS X and Linux:
target_dir = '/Users/swa/backup'
# Remember to change this to which folder you will be using
# Create target directory if it is not present
if not os.path.exists(target_dir):
os.mkdir(target_dir) # make directory
# 3. The files are backed up into a zip file.
# 4. The current day is the name of the subdirectory
# in the main directory.
today = target_dir + os.sep + time.strftime('%Y%m%d')
# The current time is the name of the zip archive.
now = time.strftime('%H%M%S')
# Take a comment from the user to
# create the name of the zip file
comment = raw_input('Enter a comment --> ')
# Check if a comment was entered
if len(comment) == 0:
target = today + os.sep + now + '.zip'
else:
target = today + os.sep + now + '_' + \
comment.replace(' ', '_') + '.zip'
# Create the subdirectory if it isn't already there
if not os.path.exists(today):
os.mkdir(today)
print 'Successfully created directory', today
# 5. We use the zip command to put the files in a zip archive
zip_command = "zip -r {0} {1}".format(target,
' '.join(source))
# Run the backup
print "Zip command is:"
print zip_command
print "Running:"
if os.system(zip_command) == 0:
print 'Successful backup to', target
else:
print 'Backup FAILED'
Output:
$ python backup_ver4.py Enter a comment --> added new examples Zip command is: zip -r /Users/swa/backup/20140329/074122_added_new_examples.zip /Users/swa/notes Running: adding: Users/swa/notes/ (stored 0%) adding: Users/swa/notes/blah1.txt (stored 0%) adding: Users/swa/notes/blah2.txt (stored 0%) adding: Users/swa/notes/blah3.txt (stored 0%) Successful backup to /Users/swa/backup/20140329/074122_added_new_examples.zip
How It Works
This program now works! Let us go through the actual enhancements that we had made in version 3. We
take in the user’s comments using the input
function and then check if the user actually entered
something by finding out the length of the input using the len
function. If the user has just
pressed enter
without entering anything (maybe it was just a routine backup or no special changes
were made), then we proceed as we have done before.
However, if a comment was supplied, then this is attached to the name of the zip archive just
before the
.zip
extension. Notice that we are replacing spaces in the comment with underscores -
this is because managing filenames without spaces is much easier.13.6. More Refinements
The fourth version is a satisfactorily working script for most users, but there is always room for
improvement. For example, you can include a verbosity level for the program where you can specify
a
-v
option to make your program become more talkative or a -q
to make it quiet.
Another possible enhancement would be to allow extra files and directories to be passed to the
script at the command line. We can get these names from the
sys.argv
list and we can add them to
our source
list using the extend
method provided by the list
class.
The most important refinement would be to not use the
os.system
way of creating archives and
instead using the zipfile or
tarfile built-in modules to create these
archives. They are part of the standard library and available already for you to use without
external dependencies on the zip program to be available on your computer.
However, I have been using the
os.system
way of creating a backup in the above examples purely
for pedagogical purposes, so that the example is simple enough to be understood by everybody but
real enough to be useful.
Can you try writing the fifth version that uses the
zipfile module instead of the
os.system
call?13.7. The Software Development Process
We have now gone through the various phases in the process of writing a software. These phases
can be summarised as follows:
-
What (Analysis)
-
How (Design)
-
Do It (Implementation)
-
Test (Testing and Debugging)
-
Use (Operation or Deployment)
-
Maintain (Refinement)
A recommended way of writing programs is the procedure we have
followed in creating the backup script: Do the analysis and
design. Start implementing with a simple version. Test and debug
it. Use it to ensure that it works as expected. Now, add any features that you want and continue to
repeat the Do It-Test-Use cycle as many times as required.
Remember:
Software is grown, not built.
13.8. Summary
We have seen how to create our own Python programs/scripts and the various stages involved in
writing such programs. You may find it useful to create your own program just like we did in this
chapter so that you become comfortable with Python as well as problem-solving.
Next, we will discuss object-oriented programming.
14. Object Oriented Programming
In all the programs we wrote till now, we have designed our program around functions i.e. blocks of
statements which manipulate data. This is called the procedure-oriented way of programming. There
is another way of organizing your program which is to combine data and functionality and wrap it
inside something called an object. This is called the object oriented programming paradigm. Most
of the time you can use procedural programming, but when writing large programs or have a problem
that is better suited to this method, you can use object oriented programming techniques.
Classes and objects are the two main aspects of object oriented programming. A class creates a
new type where objects are instances of the class. An analogy is that you can have variables
of type
int
which translates to saying that variables that store integers are variables which are
instances (objects) of the int
class.
Note
|
Note for Static Language Programmers
Note that even integers are treated as objects (of the
int class). This is unlike C++ and Java
(before version 1.5) where integers are primitive native types.
See
help(int) for more details on the class.
C# and Java 1.5 programmers will find this similar to the boxing and unboxing concept.
|
Objects can store data using ordinary variables that belong to the object. Variables that belong
to an object or class are referred to as fields. Objects can also have functionality by using
functions that belong to a class. Such functions are called methods of the class. This
terminology is important because it helps us to differentiate between functions and variables which
are independent and those which belong to a class or object. Collectively, the fields and methods
can be referred to as the attributes of that class.
Fields are of two types - they can belong to each instance/object of the class or they can belong
to the class itself. They are called instance variables and class variables respectively.
A class is created using the
class
keyword. The fields and methods of the class are listed in an
indented block.14.1. The self
Class methods have only one specific difference from ordinary
functions - they must have an extra first name that has to be added to
the beginning of the parameter list, but you do not give a value
for this parameter when you call the method, Python will provide
it. This particular variable refers to the object itself, and by convention, it is given the name
self
.
Although, you can give any name for this parameter, it is strongly recommended that you use the
name
self
- any other name is definitely frowned upon. There are many advantages to using a
standard name - any reader of your program will immediately recognize it and even specialized IDEs
(Integrated Development Environments) can help you if you use self
.
Note
|
Note for C++/Java/C# Programmers
The self in Python is equivalent to the this pointer in C++ and the this reference in Java
and C#.
|
You must be wondering how Python gives the value for
self
and why you don’t need to give a value
for it. An example will make this clear. Say you have a class called MyClass
and an instance of
this class called myobject
. When you call a method of this object as myobject.method(arg1,
arg2)
, this is automatically converted by Python into MyClass.method(myobject, arg1, arg2)
-
this is all the special self
is about.
This also means that if you have a method which takes no arguments, then you still have to have one
argument - the
self
.14.2. Classes
The simplest class possible is shown in the following example (save as
oop_simplestclass.py
).class Person:
pass # An empty block
p = Person()
print(p)
Output:
$ python oop_simplestclass.py <__main__.Person instance at 0x10171f518>
How It Works
We create a new class using the class
statement and the name of the class. This is followed by an
indented block of statements which form the body of the class. In this case, we have an empty block
which is indicated using the pass
statement.
Next, we create an object/instance of this class using the name of the class followed by a pair of
parentheses. (We will learn more about instantiation in the next section). For our
verification, we confirm the type of the variable by simply printing it. It tells us that we have
an instance of the
Person
class in the main
module.
Notice that the address of the computer memory where your object is stored is also printed. The
address will have a different value on your computer since Python can store the object wherever it
finds space.
14.3. Methods
We have already discussed that classes/objects can have methods just like functions except that we
have an extra
self
variable. We will now see an example (save as oop_method.py
).class Person:
def say_hi(self):
print('Hello, how are you?')
p = Person()
p.say_hi()
# The previous 2 lines can also be written as
# Person().say_hi()
Output:
$ python oop_method.py Hello, how are you?
How It Works
Here we see the self
in action. Notice that the say_hi
method takes no parameters but still has
the self
in the function definition.
14.4. The init
method
There are many method names which have special significance in Python classes. We will see the
significance of the
init
method now.
The
init
method is run as soon as an object of a class is instantiated. The method is useful
to do any initialization you want to do with your object. Notice the double underscores both at
the beginning and at the end of the name.
Example (save as
oop_init.py
):class Person:
def __init__(self, name):
self.name = name
def say_hi(self):
print 'Hello, my name is', self.name
p = Person('Swaroop')
p.say_hi()
# The previous 2 lines can also be written as
# Person('Swaroop').say_hi()
Output:
$ python oop_init.py Hello, my name is Swaroop
How It Works
Here, we define the init
method as taking a parameter name
(along with the usual self
).
Here, we just create a new field also called name
. Notice these are two different variables even
though they are both called 'name'. There is no problem because the dotted notation self.name
means that there is something called "name" that is part of the object called "self" and the other
name
is a local variable. Since we explicitly indicate which name we are referring to, there is
no confusion.
Most importantly, notice that we do not explicitly call the
init
method but pass the
arguments in the parentheses following the class name when creating a new instance of the
class. This is the special significance of this method.
Now, we are able to use the
self.name
field in our methods which is demonstrated in the sayHi
method.14.5. Class And Object Variables
We have already discussed the functionality part of classes and objects (i.e. methods), now let us
learn about the data part. The data part, i.e. fields, are nothing but ordinary variables that are
bound to the namespaces of the classes and objects. This means that these names are valid
within the context of these classes and objects only. That’s why they are called name spaces.
There are two types of fields - class variables and object variables which are classified
depending on whether the class or the object owns the variables respectively.
Class variables are shared - they can be accessed by all instances of that class. There is only
one copy of the class variable and when any one object makes a change to a class variable, that
change will be seen by all the other instances.
Object variables are owned by each individual object/instance of the class. In this case, each
object has its own copy of the field i.e. they are not shared and are not related in any way to the
field by the same name in a different instance. An example will make this easy to understand (save
as
oop_objvar.py
):class Robot:
"""Represents a robot, with a name."""
# A class variable, counting the number of robots
population = 0
def __init__(self, name):
"""Initializes the data."""
self.name = name
print "(Initializing {})".format(self.name)
# When this person is created, the robot
# adds to the population
Robot.population += 1
def die(self):
"""I am dying."""
print "{} is being destroyed!".format(self.name)
Robot.population -= 1
if Robot.population == 0:
print "{} was the last one.".format(self.name)
else:
print "There are still {:d} robots working.".format(
Robot.population)
def say_hi(self):
"""Greeting by the robot.
Yeah, they can do that."""
print "Greetings, my masters call me {}.".format(self.name)
@classmethod
def how_many(cls):
"""Prints the current population."""
print "We have {:d} robots.".format(cls.population)
droid1 = Robot("R2-D2")
droid1.say_hi()
Robot.how_many()
droid2 = Robot("C-3PO")
droid2.say_hi()
Robot.how_many()
print "\nRobots can do some work here.\n"
print "Robots have finished their work. So let's destroy them."
droid1.die()
droid2.die()
Robot.how_many()
Output:
$ python oop_objvar.py (Initializing R2-D2) Greetings, my masters call me R2-D2. We have 1 robots. (Initializing C-3PO) Greetings, my masters call me C-3PO. We have 2 robots. Robots can do some work here. Robots have finished their work. So let's destroy them. R2-D2 is being destroyed! There are still 1 robots working. C-3PO is being destroyed! C-3PO was the last one. We have 0 robots.
How It Works
This is a long example but helps demonstrate the nature of class and object variables. Here,
population
belongs to the Robot
class and hence is a class variable. The name
variable belongs
to the object (it is assigned using self
) and hence is an object variable.
Thus, we refer to the
population
class variable as Robot.population
and not as
self.population
. We refer to the object variable name
using self.name
notation in the methods
of that object. Remember this simple difference between class and object variables. Also note that
an object variable with the same name as a class variable will hide the class variable!
Instead of
Robot.population
, we could have also used self.__class__.population because every
object refers to it’s class via the self.__class__ attribute.
The
how_many
is actually a method that belongs to the class and not to the object. This means we
can define it as either a classmethod
or a staticmethod
depending on whether we need to know
which class we are part of. Since we refer to a class variable, let’s use classmethod
.
We have marked the
how_many
method as a class method using a decorator.
Decorators can be imagined to be a shortcut to calling a wrapper function, so applying the
@classmethod
decorator is same as calling:how_many = classmethod(how_many)
Observe that the
init
method is used to initialize the Robot
instance with a name. In this
method, we increase the population
count by 1 since we have one more robot being added. Also
observe that the values of self.name
is specific to each object which indicates the nature of
object variables.
Remember, that you must refer to the variables and methods of the same object using the
self
only. This is called an attribute reference.
In this program, we also see the use of docstrings for classes as well as methods. We can access
the class docstring at runtime using
Robot.doc
and the method docstring as
Robot.say_hi.doc
In the
die
method, we simply decrease the Robot.population
count by 1.
All class members are public. One exception: If you use data members with names using the double
underscore prefix such as
__privatevar
, Python uses name-mangling to effectively make it a
private variable.
Thus, the convention followed is that any variable that is to be used only within the class or
object should begin with an underscore and all other names are public and can be used by other
classes/objects. Remember that this is only a convention and is not enforced by Python (except for
the double underscore prefix).
Note
|
Note for C++/Java/C# Programmers
All class members (including the data members) are public and all the methods are virtual in
Python.
|
14.6. Inheritance
One of the major benefits of object oriented programming is reuse of code and one of the ways
this is achieved is through the inheritance mechanism. Inheritance can be best imagined as
implementing a type and subtype relationship between classes.
Suppose you want to write a program which has to keep track of the teachers and students in a
college. They have some common characteristics such as name, age and address. They also have
specific characteristics such as salary, courses and leaves for teachers and, marks and fees for
students.
You can create two independent classes for each type and process them but adding a new common
characteristic would mean adding to both of these independent classes. This quickly becomes
unwieldy.
A better way would be to create a common class called
SchoolMember
and then have the teacher and
student classes inherit from this class i.e. they will become sub-types of this type (class) and
then we can add specific characteristics to these sub-types.
There are many advantages to this approach. If we add/change any functionality in
SchoolMember
,
this is automatically reflected in the subtypes as well. For example, you can add a new ID card
field for both teachers and students by simply adding it to the SchoolMember class. However,
changes in the subtypes do not affect other subtypes. Another advantage is that if you can refer to
a teacher or student object as a SchoolMember
object which could be useful in some situations
such as counting of the number of school members. This is called polymorphism where a sub-type
can be substituted in any situation where a parent type is expected i.e. the object can be treated
as an instance of the parent class.
Also observe that we reuse the code of the parent class and we do not need to repeat it in the
different classes as we would have had to in case we had used independent classes.
The
SchoolMember
class in this situation is known as the base class or the superclass. The
Teacher
and Student
classes are called the derived classes or subclasses.
We will now see this example as a program (save as
oop_subclass.py
):class SchoolMember:
'''Represents any school member.'''
def __init__(self, name, age):
self.name = name
self.age = age
print '(Initialized SchoolMember: {})'.format(self.name)
def tell(self):
'''Tell my details.'''
print 'Name:"{}" Age:"{}"'.format(self.name, self.age),
class Teacher(SchoolMember):
'''Represents a teacher.'''
def __init__(self, name, age, salary):
SchoolMember.__init__(self, name, age)
self.salary = salary
print '(Initialized Teacher: {})'.format(self.name)
def tell(self):
SchoolMember.tell(self)
print 'Salary: "{:d}"'.format(self.salary)
class Student(SchoolMember):
'''Represents a student.'''
def __init__(self, name, age, marks):
SchoolMember.__init__(self, name, age)
self.marks = marks
print '(Initialized Student: {})'.format(self.name)
def tell(self):
SchoolMember.tell(self)
print 'Marks: "{:d}"'.format(self.marks)
t = Teacher('Mrs. Shrividya', 40, 30000)
s = Student('Swaroop', 25, 75)
# prints a blank line
print
members = [t, s]
for member in members:
# Works for both Teachers and Students
member.tell()
Output:
$ python oop_subclass.py (Initialized SchoolMember: Mrs. Shrividya) (Initialized Teacher: Mrs. Shrividya) (Initialized SchoolMember: Swaroop) (Initialized Student: Swaroop) Name:"Mrs. Shrividya" Age:"40" Salary: "30000" Name:"Swaroop" Age:"25" Marks: "75"
How It Works
To use inheritance, we specify the base class names in a tuple following the class name in the
class definition. Next, we observe that the init
method of the base class is explicitly
called using the self
variable so that we can initialize the base class part of the object. This
is very important to remember - Python does not automatically call the constructor of the base
class, you have to explicitly call it yourself.
We also observe that we can call methods of the base class by prefixing the class name to the
method call and then pass in the
self
variable along with any arguments.
Notice that we can treat instances of
Teacher
or Student
as just instances of the
SchoolMember
when we use the tell
method of the SchoolMember
class.
Also, observe that the
tell
method of the subtype is called and not the tell
method of the
SchoolMember
class. One way to understand this is that Python always starts looking for methods
in the actual type, which in this case it does. If it could not find the method, it starts looking
at the methods belonging to its base classes one by one in the order they are specified in the
tuple in the class definition.
A note on terminology - if more than one class is listed in the inheritance tuple, then it is
called multiple inheritance.
The trailing comma is used at the end of the
print
statement in the superclass’s tell()
method
to print a line and allow the next print to continue on the same line. This is a trick to make
print
not print a \n
(newline) symbol at the end of the printing.14.7. Summary
We have now explored the various aspects of classes and objects as well as the various
terminologies associated with it. We have also seen the benefits and pitfalls of object-oriented
programming. Python is highly object-oriented and understanding these concepts carefully will help
you a lot in the long run.
Next, we will learn how to deal with input/output and how to access files in Python.
15. Input and Output
There will be situations where your program has to interact with the user. For example, you would
want to take input from the user and then print some results back. We can achieve this using the
raw_input()
function and print
statement respectively.
For output, we can also use the various methods of the
str
(string) class. For example, you can
use the rjust
method to get a string which is right justified to a specified width. See
help(str)
for more details.
Another common type of input/output is dealing with files. The ability to create, read and write
files is essential to many programs and we will explore this aspect in this chapter.
15.1. Input from user
Save this program as
io_input.py
:def reverse(text):
return text[::-1]
def is_palindrome(text):
return text == reverse(text)
something = raw_input("Enter text: ")
if is_palindrome(something):
print "Yes, it is a palindrome"
else:
print "No, it is not a palindrome"
Output:
$ python io_input.py Enter text: sir No, it is not a palindrome $ python io_input.py Enter text: madam Yes, it is a palindrome $ python io_input.py Enter text: racecar Yes, it is a palindrome
How It Works
We use the slicing feature to reverse the text. We’ve already seen how we can make
slices from sequences using the seq[a:b]
code starting from position a
to position
b
. We can also provide a third argument that determines the step by which the slicing is
done. The default step is 1
because of which it returns a continuous part of the text. Giving a
negative step, i.e., -1
will return the text in reverse.
The
raw_input()
function takes a string as argument and displays it to the user. Then it waits
for the user to type something and press the return key. Once the user has entered and pressed the
return key, the raw_input()
function will then return that text the user has entered.
We take that text and reverse it. If the original text and reversed text are equal, then the text
is a palindrome.
15.1.1. Homework exercise
Checking whether a text is a palindrome should also ignore punctuation, spaces and case. For
example, "Rise to vote, sir." is also a palindrome but our current program doesn’t say it is. Can
you improve the above program to recognize this palindrome?
If you need a hint, the idea is that… [2]
15.2. Files
You can open and use files for reading or writing by creating an object of the
file
class and
using its read
, readline
or write
methods appropriately to read from or write to the
file. The ability to read or write to the file depends on the mode you have specified for the file
opening. Then finally, when you are finished with the file, you call the close
method to tell
Python that we are done using the file.
Example (save as
io_using_file.py
):poem = '''\
Programming is fun
When the work is done
if you wanna make your work also fun:
use Python!
'''
# Open for 'w'riting
f = open('poem.txt', 'w')
# Write text to file
f.write(poem)
# Close the file
f.close()
# If no mode is specified,
# 'r'ead mode is assumed by default
f = open('poem.txt')
while True:
line = f.readline()
# Zero length indicates EOF
if len(line) == 0:
break
# The `line` already has a newline
# at the end of each line
# since it is reading from a file.
print line,
# close the file
f.close()
Output:
$ python io_using_file.py Programming is fun When the work is done if you wanna make your work also fun: use Python!
How It Works
First, open a file by using the built-in open
function and specifying the name of the file and
the mode in which we want to open the file. The mode can be a read mode ('r'
), write mode ('w'
)
or append mode ('a'
). We can also specify whether we are reading, writing, or appending in text
mode ('t'
) or binary mode ('b'
). There are actually many more modes available and help(open)
will give you more details about them. By default, open()
considers the file to be a 't’ext file
and opens it in 'r’ead mode.
In our example, we first open the file in write text mode and use the
write
method of the file
object to write to the file and then we finally close
the file.
Next, we open the same file again for reading. We don’t need to specify a mode because 'read text
file' is the default mode. We read in each line of the file using the
readline
method in a
loop. This method returns a complete line including the newline character at the end of the
line. When an empty string is returned, it means that we have reached the end of the file and we
'break' out of the loop.
In the end, we finally
close
the file.
Now, check the contents of the
poem.txt
file to confirm that the program has indeed written to
and read from that file.15.3. Pickle
Python provides a standard module called
pickle
using which you can store any plain Python
object in a file and then get it back later. This is called storing the object persistently.
Example (save as
io_pickle.py
):import pickle
# The name of the file where we will store the object
shoplistfile = 'shoplist.data'
# The list of things to buy
shoplist = ['apple', 'mango', 'carrot']
# Write to the file
f = open(shoplistfile, 'wb')
# Dump the object to a file
pickle.dump(shoplist, f)
f.close()
# Destroy the shoplist variable
del shoplist
# Read back from the storage
f = open(shoplistfile, 'rb')
# Load the object from the file
storedlist = pickle.load(f)
print storedlist
Output:
$ python io_pickle.py ['apple', 'mango', 'carrot']
How It Works
To store an object in a file, we have to first open
the file in write binary mode and
then call the dump
function of the pickle
module. This process is called pickling.
Next, we retrieve the object using the
load
function of the pickle
module which returns the
object. This process is called unpickling.15.4. Unicode
So far, when we have been writing and using strings, or reading and writing to a file, we have used
simple English characters only. If we want to be able to read and write other non-English
languages, we need to use the
unicode
type, and it all starts with the character u
:>>> "hello world" 'hello world' >>> type("hello world") <type 'str'> >>> u"hello world" u'hello world' >>> type(u"hello world") <type 'unicode'>
We use the
unicode
type instead of strings
to make sure that we handle non-English languages in
our programs. However, when we read or write to a file or when we talk to other computers on the
Internet, we need to convert our unicode strings into a format that can be sent and received, and
that format is called "UTF-8". We can read and write in that format, using a simple keyword
argument to our standard open
function:# encoding=utf-8
import io
f = io.open("abc.txt", "wt", encoding="utf-8")
f.write(u"Imagine non-English language here")
f.close()
text = io.open("abc.txt", encoding="utf-8").read()
print text
How It Works
You can ignore the import
statement for now, we’ll explore that in detail in the modules
chapter.
Whenever we write a program that uses Unicode literals like we have used above, we have to make
sure that Python itself is told that our program uses UTF-8, and we have to put
# encoding=utf-8
comment at the top of our program.
We use
io.open
and provide the "encoding" and "decoding" argument to tell Python that we are
using unicode, and in fact, we have to pass in a string in the form of u""
to make it clear that
we are using Unicode strings.
You should learn more about this topic by reading:
15.5. Summary
We have discussed various types of input/output, about file handling, about the pickle module and
about Unicode.
Next, we will explore the concept of exceptions.
16. Exceptions
Exceptions occur when exceptional situations occur in your program. For example, what if you are
going to read a file and the file does not exist? Or what if you accidentally deleted it when the
program was running? Such situations are handled using exceptions.
Similarly, what if your program had some invalid statements? This is handled by Python which
raises its hands and tells you there is an error.
16.1. Errors
Consider a simple
print
function call. What if we misspelt print
as Print
? Note the
capitalization. In this case, Python raises a syntax error.>>> Print "Hello World" File "<stdin>", line 1 Print "Hello World" ^ SyntaxError: invalid syntax >>> print "Hello World" Hello World
Observe that a
SyntaxError
is raised and also the location where the error was detected is
printed. This is what an error handler for this error does.16.2. Exceptions
We will try to read input from the user. Press
ctrl-d
and see what happens.>>> s = raw_input('Enter something --> ') Enter something --> Traceback (most recent call last): File "<stdin>", line 1, in <module> EOFError
Python raises an error called
EOFError
which basically means it found an end of file symbol
(which is represented by ctrl-d
) when it did not expect to see it.16.3. Handling Exceptions
We can handle exceptions using the
try..except
statement. We basically put our usual statements
within the try-block and put all our error handlers in the except-block.
Example (save as
exceptions_handle.py
):try:
text = raw_input('Enter something --> ')
except EOFError:
print 'Why did you do an EOF on me?'
except KeyboardInterrupt:
print 'You cancelled the operation.'
else:
print 'You entered {}'.format(text)
Output:
# Press ctrl + d $ python exceptions_handle.py Enter something --> Why did you do an EOF on me? # Press ctrl + c $ python exceptions_handle.py Enter something --> ^CYou cancelled the operation. $ python exceptions_handle.py Enter something --> No exceptions You entered No exceptions
How It Works
We put all the statements that might raise exceptions/errors inside the try
block and then put
handlers for the appropriate errors/exceptions in the except
clause/block. The except
clause
can handle a single specified error or exception, or a parenthesized list of errors/exceptions. If
no names of errors or exceptions are supplied, it will handle all errors and exceptions.
Note that there has to be at least one
except
clause associated with every try
clause. Otherwise, what’s the point of having a try block?
If any error or exception is not handled, then the default Python handler is called which just
stops the execution of the program and prints an error message. We have already seen this in action
above.
You can also have an
else
clause associated with a try..except
block. The else
clause is
executed if no exception occurs.
In the next example, we will also see how to get the exception object so that we can retrieve
additional information.
16.4. Raising Exceptions
You can raise exceptions using the
raise
statement by providing the name of the error/exception
and the exception object that is to be thrown.
The error or exception that you can raise should be a class which directly or indirectly must be a
derived class of the
Exception
class.
Example (save as
exceptions_raise.py
):class ShortInputException(Exception):
'''A user-defined exception class.'''
def __init__(self, length, atleast):
Exception.__init__(self)
self.length = length
self.atleast = atleast
try:
text = raw_input('Enter something --> ')
if len(text) < 3:
raise ShortInputException(len(text), 3)
# Other work can continue as usual here
except EOFError:
print 'Why did you do an EOF on me?'
except ShortInputException as ex:
print ('ShortInputException: The input was ' + \
'{0} long, expected at least {1}')\
.format(ex.length, ex.atleast)
else:
print 'No exception was raised.'
Output:
$ python exceptions_raise.py Enter something --> a ShortInputException: The input was 1 long, expected at least 3 $ python exceptions_raise.py Enter something --> abc No exception was raised.
How It Works
Here, we are creating our own exception type. This new exception type is called
ShortInputException
. It has two fields - length
which is the length of the given input, and
atleast
which is the minimum length that the program was expecting.
In the
except
clause, we mention the class of error which will be stored as
the variable name
to hold the corresponding error/exception object. This is analogous to parameters and arguments in
a function call. Within this particular except
clause, we use the length
and atleast
fields of
the exception object to print an appropriate message to the user.16.5. Try … Finally
Suppose you are reading a file in your program. How do you ensure that the file object is closed
properly whether or not an exception was raised? This can be done using the
finally
block.
Save this program as
exceptions_finally.py
:import sys
import time
f = None
try:
f = open("poem.txt")
# Our usual file-reading idiom
while True:
line = f.readline()
if len(line) == 0:
break
print line,
sys.stdout.flush()
print "Press ctrl+c now"
# To make sure it runs for a while
time.sleep(2)
except IOError:
print "Could not find file poem.txt"
except KeyboardInterrupt:
print "!! You cancelled the reading from the file."
finally:
if f:
f.close()
print "(Cleaning up: Closed the file)"
Output:
$ python exceptions_finally.py Programming is fun Press ctrl+c now ^C!! You cancelled the reading from the file. (Cleaning up: Closed the file)
How It Works
We do the usual file-reading stuff, but we have arbitrarily introduced sleeping for 2 seconds after
printing each line using the time.sleep
function so that the program runs slowly (Python is very
fast by nature). When the program is still running, press ctrl + c
to interrupt/cancel the
program.
Observe that the
KeyboardInterrupt
exception is thrown and the program quits. However, before the
program exits, the finally clause is executed and the file object is always closed.
Note that we use
sys.stdout.flush()
after print
so that it prints to the screen immediately.16.6. The with statement
Acquiring a resource in the
try
block and subsequently releasing the resource in the finally
block is a common pattern. Hence, there is also a with
statement that enables this to be done in
a clean manner:
Save as
exceptions_using_with.py
:with open("poem.txt") as f:
for line in f:
print line,
How It Works
The output should be same as the previous example. The difference here is that we are using the
open
function with the with
statement - we leave the closing of the file to be done
automatically by with open
.
What happens behind the scenes is that there is a protocol used by the
with
statement. It fetches
the object returned by the open
statement, let’s call it "thefile" in this case.
It always calls the
thefile.enter
function before starting the block of code under it and
always calls thefile.exit
after finishing the block of code.
So the code that we would have written in a
finally
block should be taken care of automatically
by the exit
method. This is what helps us to avoid having to use explicit try..finally
statements repeatedly.
More discussion on this topic is beyond scope of this book, so please refer
PEP 343 for a comprehensive explanation.
16.7. Summary
We have discussed the usage of the
try..except
and try..finally
statements. We have seen how to
create our own exception types and how to raise exceptions as well.
Next, we will explore the Python Standard Library.
17. Standard Library
The Python Standard Library contains a huge number of useful modules and is part of every standard
Python installation. It is important to become familiar with the Python Standard Library since many
problems can be solved quickly if you are familiar with the range of things that these libraries
can do.
We will explore some of the commonly used modules in this library. You can find complete details
for all of the modules in the Python Standard Library in the
'Library Reference' section of the documentation that comes with
your Python installation.
Let us explore a few useful modules.
Caution
|
If you find the topics in this chapter too advanced, you may skip this chapter. However, I highly recommend coming back to this chapter when you are more comfortable with programming using Python. |
17.1. sys module
The
sys
module contains system-specific functionality. We have already seen that the sys.argv
list contains the command-line arguments.
Suppose we want to check the version of the Python software being used, the
sys
module gives us
that information.$ python >>> import sys >>> sys.version_info sys.version_info(major=2, minor=7, micro=6, releaselevel='final', serial=0) >>> sys.version_info.major == 2 True
How It Works
The sys
module has a version_info
tuple that gives us the version information. The first entry
is the major version. We can pull out this information to use it.17.2. logging module
What if you wanted to have some debugging messages or important messages to be stored somewhere so
that you can check whether your program has been running as you would expect it? How do you "store
somewhere" these messages? This can be achieved using the
logging
module.
Save as
stdlib_logging.py
:import os, platform, logging
if platform.platform().startswith('Windows'):
logging_file = os.path.join(os.getenv('HOMEDRIVE'),
os.getenv('HOMEPATH'),
'test.log')
else:
logging_file = os.path.join(os.getenv('HOME'),
'test.log')
print "Logging to", logging_file
logging.basicConfig(
level=logging.DEBUG,
format='%(asctime)s : %(levelname)s : %(message)s',
filename = logging_file,
filemode = 'w',
)
logging.debug("Start of the program")
logging.info("Doing something")
logging.warning("Dying now")
Output:
$ python stdlib_logging.py Logging to /Users/swa/test.log $ cat /Users/swa/test.log 2014-03-29 09:27:36,660 : DEBUG : Start of the program 2014-03-29 09:27:36,660 : INFO : Doing something 2014-03-29 09:27:36,660 : WARNING : Dying now
If you do not have the
cat
command, then you can just open the test.log
file in a text editor.
How It Works
We use three modules from the standard library - the os
module for interacting with the operating
system, the platform
module for information about the platform i.e. the operating system and the
logging
module to log information.
First, we check which operating system we are using by checking the string returned by
platform.platform()
(for more information, see import platform; help(platform)
). If it is
Windows, we figure out the home drive, the home folder and the filename where we want to store the
information. Putting these three parts together, we get the full location of the file. For other
platforms, we need to know just the home folder of the user and we get the full location of the
file.
We use the
os.path.join()
function to put these three parts of the location together. The reason
to use a special function rather than just adding the strings together is because this function
will ensure the full location matches the format expected by the operating system.
We configure the
logging
module to write all the messages in a particular format to the file we
have specified.
Finally, we can put messages that are either meant for debugging, information, warning or even
critical messages. Once the program has run, we can check this file and we will know what happened
in the program, even though no information was displayed to the user running the program.
17.3. Module of the Week Series
There is much more to be explored in the standard library such as
debugging,
handling command line options,
regular expressions and so
on.
The best way to further explore the standard library is to read Doug Hellmann’s excellent
Python Module of the Week series (also available as a
book) and reading the Python documentation.
17.4. Summary
We have explored some of the functionality of many modules in the Python Standard Library. It is
highly recommended to browse through the Python Standard Library
documentation to get an idea of all the modules that are available.
Next, we will cover various aspects of Python that will make our tour of Python more complete.
18. More
So far we have covered a majority of the various aspects of Python that you will use. In this
chapter, we will cover some more aspects that will make our knowledge of Python more well-rounded.
18.1. Passing tuples around
Ever wished you could return two different values from a function? You can. All you have to do is
use a tuple.
>>> def get_error_details(): ... return (2, 'details') ... >>> errnum, errstr = get_error_details() >>> errnum 2 >>> errstr 'details'
Notice that the usage of
a, b = <some expression>
interprets the result of the expression as a
tuple with two values.
This also means the fastest way to swap two variables in Python is:
>>> a = 5; b = 8 >>> a, b (5, 8) >>> a, b = b, a >>> a, b (8, 5)
18.2. Special Methods
There are certain methods such as the
init
and del
methods which have special
significance in classes.
Special methods are used to mimic certain behaviors of built-in types. For example, if you want to
use the
x[key]
indexing operation for your class (just like you use it for lists and tuples),
then all you have to do is implement the getitem()
method and your job is done. If you think
about it, this is what Python does for the list
class itself!
Some useful special methods are listed in the following table. If you
want to know about all the special methods,
see the manual.
init(self, …)
- This method is called just before the newly created object is returned for usage.
del(self)
- Called just before the object is destroyed (which has unpredictable timing, so avoid using this)
str(self)
- Called when we use the
print
statement or whenstr()
is used. lt(self, other)
- Called when the less than operator (<) is used. Similarly, there are special methods for all the operators (+, >, etc.)
getitem(self, key)
- Called when
x[key]
indexing operation is used. len(self)
- Called when the built-in
len()
function is used for the sequence object.
18.3. Single Statement Blocks
We have seen that each block of statements is set apart from the rest by its own indentation
level. Well, there is one caveat. If your block of statements contains only one single statement,
then you can specify it on the same line of, say, a conditional statement or looping statement. The
following example should make this clear:
>>> flag = True >>> if flag: print 'Yes' ... Yes
Notice that the single statement is used in-place and not as a separate block. Although, you can
use this for making your program smaller, I strongly recommend avoiding this short-cut method,
except for error checking, mainly because it will be much easier to add an extra statement if you
are using proper indentation.
18.4. Lambda Forms
A
lambda
statement is used to create new function objects. Essentially, the lambda
takes a
parameter followed by a single expression only which becomes the body of the function and the value
of this expression is returned by the new function.
Example (save as
more_lambda.py
):points = [ { 'x' : 2, 'y' : 3 },
{ 'x' : 4, 'y' : 1 } ]
points.sort(key=lambda i : i['y'])
print points
Output:
$ python more_lambda.py [{'y': 1, 'x': 4}, {'y': 3, 'x': 2}]
How It Works
Notice that the sort
method of a list
can take a key
parameter which determines how the list
is sorted (usually we know only about ascending or descending order). In our case, we want to do a
custom sort, and for that we need to write a function but instead of writing a separate def
block
for a function that will get used in only this one place, we use a lambda expression to create a
new function.18.5. List Comprehension
List comprehensions are used to derive a new list from an existing list. Suppose you have a list of
numbers and you want to get a corresponding list with all the numbers multiplied by 2 only when the
number itself is greater than 2. List comprehensions are ideal for such situations.
Example (save as
more_list_comprehension.py
):listone = [2, 3, 4]
listtwo = [2*i for i in listone if i > 2]
print listtwo
Output:
$ python more_list_comprehension.py [6, 8]
How It Works
Here, we derive a new list by specifying the manipulation to be done (2*i) when some condition
is satisfied (if i > 2
). Note that the original list remains unmodified.
The advantage of using list comprehensions is that it reduces the amount of boilerplate code
required when we use loops to process each element of a list and store it in a new list.
18.6. Receiving Tuples and Dictionaries in Functions
There is a special way of receiving parameters to a function as a tuple or a dictionary using the
* or ** prefix respectively. This is useful when taking variable number of arguments in the
function.
>>> def powersum(power, *args): ... '''Return the sum of each argument raised to the specified power.''' ... total = 0 ... for i in args: ... total += pow(i, power) ... return total ... >>> powersum(2, 3, 4) 25 >>> powersum(2, 10) 100
Because we have a * prefix on the
args
variable, all extra arguments passed to the function are
stored in args
as a tuple. If a ** prefix had been used instead, the extra parameters would be
considered to be key/value pairs of a dictionary.18.7. The assert statement
The
assert
statement is used to assert that something is true. For example, if you are very sure
that you will have at least one element in a list you are using and want to check this, and raise
an error if it is not true, then assert
statement is ideal in this situation. When the assert
statement fails, an AssertionError
is raised.>>> mylist = ['item'] >>> assert len(mylist) >= 1 >>> mylist.pop() 'item' >>> assert len(mylist) >= 1 Traceback (most recent call last): File "<stdin>", line 1, in <module> AssertionError
The
assert
statement should be used judiciously. Most of the time, it is better to catch
exceptions, either handle the problem or display an error message to the user and then quit.18.8. Decorators
Decorators are a shortcut to applying wrapper functions. This is helpful to "wrap" functionality
with the same code over and over again. For example, I created a
retry
decorator for myself that
I can just apply to any function and if any exception is thrown during a run, it is retried again,
till a maximum of 5 times and with a delay between each retry. This is especially useful for
situations where you are trying to make a network call to a remote computer:from time import sleep
from functools import wraps
import logging
logging.basicConfig()
log = logging.getLogger("retry")
def retry(f):
@wraps(f)
def wrapped_f(*args, **kwargs):
MAX_ATTEMPTS = 5
for attempt in range(1, MAX_ATTEMPTS + 1):
try:
return f(*args, **kwargs)
except:
log.exception("Attempt %s/%s failed : %s",
attempt,
MAX_ATTEMPTS,
(args, kwargs))
sleep(10 * attempt)
log.critical("All %s attempts failed : %s",
MAX_ATTEMPTS,
(args, kwargs))
return wrapped_f
counter = 0
@retry
def save_to_database(arg):
print "Write to a database or make a network call or etc."
print "This will be automatically retried if exception is thrown."
global counter
counter += 1
# This will throw an exception in the first call
# And will work fine in the second call (i.e. a retry)
if counter < 2:
raise ValueError(arg)
if __name__ == '__main__':
save_to_database("Some bad value")
Output:
$ python more_decorator.py Write to a database or make a network call or etc. This will be automatically retried if exception is thrown. ERROR:retry:Attempt 1/5 failed : (('Some bad value',), {}) Traceback (most recent call last): File "more_decorator.py", line 14, in wrapped_f return f(*args, **kwargs) File "more_decorator.py", line 39, in save_to_database raise ValueError(arg) ValueError: Some bad value Write to a database or make a network call or etc. This will be automatically retried if exception is thrown.
How It Works
No comments:
Post a Comment