Python Basics


  • Eclipse and PyDev
In preparation for installing Eclipse, we need a version of the Java JDK installed on the system. The current Java versions include 8, 9 and 10; however, version 8 is the safest choice on our systems, as other installations may depend on that version. Install the JDK package by:
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt update
$ sudo apt install oracle-java8-installer
You have to acknowledge the code license.

You can actually have multiple JDK versions on your system and switch between them by:
$ update-java-alternatives -l              # see the alternatives
$ sudo update-alternatives --config java   # choose one

  • Install Eclipse
Ubuntu provides Eclipse and PyDev as packages, but in the past I have not been able to get them to provide the full PyDev functionality, so I use the packages available from the eclipse home site:
Within the Eclipse downloads available from this site there are many choices. We will use the basic Java Developers version. You can download it here from the Computer Science server:
The archive extracts to the directory eclipse and we want it moved to
/usr/local/eclipse
Assuming it downloaded into the Downloads directory, you can achieve the desired outcome in this one liner:
$ sudo tar xzf ~/Downloads/eclipse-jee-oxygen-3a-linux-gtk-x86_64.tar.gz -C /usr/local/

Create a launcher and start it
Create the file (as root):
/usr/share/applications/eclipse.desktop

[Desktop Entry]
Type=Application
Version=1.0
Name=Eclipse
Exec=/usr/local/eclipse/eclipse %F
Icon=/usr/local/eclipse/icon.xpm
Terminal=false
Categories=GTK;Development;IDE;
StartupNotify=true
Eclipse should then appear in your Applications → Programming menu.

Start it up. It asks for a project directory, the standard one being:
eclipse-workspace
Always take this choice to match up with the usage in the documents.
Eclipse-based plugin installation
In Eclipse, plugins are available through one of these choices:
Help → Install New Software
Help → Eclipse Marketplace
In the former, you get a drop-down, entry box:
Work with:
to which you add a URL which summons up the relevant software. The Marketplace is easier because it will find the relevant software from a keyword search.
Install Python Pydev
For the sake of PyDev, first install this package:
$ sudo apt install python3-pip
Open Eclipse Marketplace and enter this into the Find box:
Find: pydev
One entry should come up:
PyDev - Python IDE for Eclipse.
Click Install and then follow through:
  1. Two software choices are selected. Click Confirm.
  2. Review Licenses: 
  3. I accept ... 
  4. Finish.
  5. Security warning: OK.
At the end, restart Eclipse.
PyDev Configuration
Go through Window → Preferences and select
PyDev → Interpreters → Python Interpreter
Click the New... button.

Add this information:
Interpreter Name:       python3
Interpreter Executable: /usr/bin/python3
Keying in the correct executable will cause immediate recognition of the python installation.

Click Apply to activate. Click OK to leave.

A Selection Needed popup appears. Click OK. Every time Python software is added, you must repeat the last step of this procedure, which reconstructs the PYTHONPATH environment variable so that the new software is recognized by PyDev.
Eclipse Color Themes
Controlling the background/foreground colors in Eclipse is unnecessarily tricky. It's easiest to choose a color theme; however, this feature must be installed. Open Eclipse Marketplace and enter this into the Find box:
Find: Color Theme
One entry should come up:
Eclipse Color Theme.
Click Install and then follow through.

Locate the Color Theme selection in
Window → Preferences → General → Appearance → Color Theme
Eclipse Font Selection
One item below the Color Theme selection is
Colors and Fonts
You probably don't need to change the colors from here, but you may want to change the font. In the Basic category, choose Text Font. Click the Edit button and modify the font selection and/or size.
Control-C Copying
I'm not exactly sure this is necessary; it's something I needed in the past. Just keep the idea in mind and come back here if necessary. I had to do this odd step to make Control-C successfully copy text in PyDev:
Window → Preferences → Java → Editor → Typing
Uncheck Update imports and restart Eclipse.
Python on Linux systems
Python is a standard part of Linux systems. It exists in two forms:
  • Python2: frozen in time with development stopped at 2.7
  • Python3: the modern version, not backwardly compatible
The compatibility issue means that Python2 programs are likely to not work using Python3, even for the most basic things like the print operation. You have to explicitly refer to python3 to use it. We'll stick with Python3; all the scripts in this document are written for it.

In contrast to Bash, Python is a complete programming language intended to work on any platform. Importable modules can be employed to extend Python's capabilities to handle any programming need. Python's most singular syntactic feature is that program blocks are understood solely through indentation, in contrast to ending tokens. Python bears some syntactic similarities to JavaScript, for example:
  • values have types, but variables do not
  • variables must be defined before being used
  • newline acts like a statement terminator
Python is a pure object-oriented language in that everything in Python is an object, including the scalar types.

Python syntax is very clean, effective and fairly minimal. In comparison, Perl (although I like this language) is bloated with syntax and operators on the order of Bash. Moreover, Perl's notion of object-oriented construction gives the impression of a "completely hacked add-on."
Tab settings for Python use in Linux Editors
In Eclipse PyDev a TAB is, by default, equivalent to using 4 spaces, so I recommend that you ensure that all the editors you use adhere to this specification. Some examples:
  • vim: Make this setting in ~/.vimrc:
    set tabstop=4 shiftwidth=4 expandtab
  • nano: Make these settings in ~/.nanorc:
    set tabsize 4
    set tabstospaces
  • gedit: Open Edit → Preferences → Editor and make these settings:
    • Set Tab width: 4
    • Insert spaces instead of tabs.
    • Enable automatic indentation.
  • geany: From the menu choose:
    • Document → Spaces
    • Document → Replace Tabs With Spaces
    • Editor → Preferences → Editor → Indentation width = 4

  • Python indentation
Dealing with Python's indentation takes some planning. In Python, the first line of a block defines the "indentation" for that block (possibly empty). All subsequent statements within that block must use the same indentation. Thus both of these are incorrect:
x = 3
  y = 2

x = 3
if x == 3:
    y = 2
  z = 4
In the latter case, a fix would align the "z = 4" with either the "if" or the "y = 2", giving two possible programs with different behaviors. Thus you can see why there cannot be a "Format the Source" feature in a Python Editor! In contrast, despite any visual defects, the following script is OK:
x = 3
if x == 3:
        y = 2
else:
  z = 4
A more serious issue is that correct indentation is not just visual. For example, if a tab is of size 4, but is not replaced by blank spaces, then this will be wrong:
x = 3
if x == 3:
    y = 2    # type TAB to statement
    z = 4    # type 4 spaces to statement
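The tab/space mismatch above can be checked without an editor. The following is a minimal sketch (the function and variable names are illustrative, not part of the course files) that hands two source strings to the built-in compile function: one indents the block with a real TAB followed by four spaces, the other uses four spaces throughout.

```python
# Two versions of the same block; "\t" is a real TAB character.
mixed_source = "x = 3\nif x == 3:\n\ty = 2\n    z = 4\n"
spaces_source = "x = 3\nif x == 3:\n    y = 2\n    z = 4\n"


def check_indentation(source):
    """Return 'ok' if the source compiles, else the name of the error class."""
    try:
        compile(source, "<demo>", "exec")
        return "ok"
    except IndentationError as err:  # TabError is a subclass of IndentationError
        return type(err).__name__


mixed_result = check_indentation(mixed_source)
spaces_result = check_indentation(spaces_source)
print("tab then spaces:", mixed_result)
print("spaces only:    ", spaces_result)
```

Both strings look identical in an editor whose tab width is 4, which is why expanding tabs to spaces in every editor is the safer setting.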

  • Running Python as an interpreter
One very important characteristic which makes it similar to Bash is the ability to run as an interpreter as well as a script language.
The interpreter is activated by calling "python3" with no arguments. This is the favorite manner to illustrate the behavior of python language features in documentation. The interpreter acts in "echo mode" in that the values of expressions are automatically printed without need of a print statement. You can use this as a simple on-line calculator. Here is a sample run:
$ python3
  ..........
>>> 2+3
5
>>> "hello world"
'hello world'
>>> print("hello world")
hello world
>>> x = 5
>>> x
5
>>> if x == 5:
...   print("yes:5")
... 
yes:5
>>> if x == 6:
...      print("yes:6")
... 
>>> quit()                   (or Ctrl-D)
One of the most basic differences between Python2 and Python3 is seen right here with the print usage:
Python2:  print "hello"
Python3:  print("hello")

Internal Documentation
Python itself provides much of the documentation available on what operations are available. For example, we can use the dir function with this interpreted content:
$ python3
>>> dir(int)            # member operations available to int, float, str types
>>> dir(float)
>>> dir(str)
>>> import os
>>> dir(os)             # importable entities from the os module
Going further, we have additional module documentation:
>>> print(os.__doc__)   # basic documentation
>>> help(os)            # extensive man-page documentation using a pager
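The same introspection works inside a script, not only at the interpreter prompt. As a small sketch (the variable names are made up for illustration), this filters the output of dir to the public operations of str and pulls the first line out of a few docstrings:

```python
import os

# Public (non-underscore) attributes of the str type.
public_str_ops = [name for name in dir(str) if not name.startswith("_")]
print(len(public_str_ops), "public str operations, e.g.", public_str_ops[:5])

# The first line of a docstring is usually a one-line summary.
first_doc_line = os.__doc__.strip().splitlines()[0]
print("os:", first_doc_line)
print("str.upper:", str.upper.__doc__.strip().splitlines()[0])
```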

Running a Python script
The most basic "hello world" python script, hello.py, is this one-liner:
print("Hello World")
The print function behaves like the Bash echo statement in that a newline is automatically appended. We can run this explicitly with the interpreter:
$ python3 hello.py
With preparation, we can also run this script by itself:
$ ./hello.py
What is necessary for this to happen are the following modifications:
  1. Add an initial "shebang" line. The favorite is this:
    #!/usr/bin/env python3
    print("Hello World")
    but we could just as well use this:
    #!/usr/bin/python3
    print("Hello World")
    The presumed advantage of the former is that /usr/bin/env will find the "preferred" python installation as the first occurrence in the system PATH.
  2. Make the script executable: $ chmod +x hello.py

  • Install Python Basics
Download the source archive python_basics.zip. Extract the archive into the eclipse-workspace directory:
$ unzip ~/Downloads/python_basics.zip -d ~/eclipse-workspace/
Install python_basics as a PyDev project in Eclipse. Right-click on the Package Explorer window:
  1. New → Project → PyDev → PyDev Project, then Next.
  2. Set Project Name to python_basics. Should see: Project location contains existing Python files. The created project will include them.
    Click Finish.
  3. Open Associated Perspective? Check Remember my decision
    Then click Yes.
The python files within a PyDev project can be executed through PyDev by selecting the file through right-click and choosing Run As → Python Run. Alternatively, simply navigate a terminal shell to the directory and use that to run scripts, because the built-in run feature doesn't work well if the script needs arguments.
Modules
PyDev refers to python files as modules. There is also the notion of a package, which we'll see later. If you right-click on the python_basics project line, you'll see these two choices in the New submenu.

The python import is a way of making available the code from other python modules to augment the functionality. One sees imports typically in one of two forms:
  • import my_module

  • from my_module import my_entity

Python refers to the entities imported as attributes.

Like Java's CLASSPATH, Python can refer to an external PYTHONPATH to find directories where modules are to be located. Python can also look in a sys.path variable for such directories.

On an import, python looks for either the compiled version my_module.pyc or the source version, my_module.py. If the source version has a later modification date (recently edited), then it compiles the .py file and replaces the older .pyc file. After this determination, the module my_module is included into the program in such a way that it is actually only included once despite possibly being referenced multiple times through other imports.

The difference between the two import styles above is how one would use my_entity:
  • In the former case, we have to preface it by the module name, e.g., my_module.my_entity(...)

  • In the latter case, we do not need the module name: my_entity(...)
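To make the two styles concrete, here is a short sketch that uses the standard math module in place of the placeholder my_module. It also checks the "included only once" remark by comparing against the module cache in sys.modules.

```python
import sys

import math              # first style: the module name is the prefix
from math import sqrt    # second style: the attribute is used directly

root_with_prefix = math.sqrt(16.0)
root_direct = sqrt(16.0)
print(root_with_prefix, root_direct)

# Both forms refer to the single module object cached in sys.modules.
same_function = sqrt is math.sqrt
same_module = sys.modules["math"] is math
print(same_function, same_module)
```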

Importing vs. Executing
Like Java and other languages, the same file types serve for executable programs as well as repositories for classes and other data which can be imported into other scripts. Here are two Python scripts which illustrate the difference between execution and importation. The main difference is that the special variable __name__ holds the module's own name when the file is imported, whereas for execution __name__ is "__main__". Recognizing this difference allows the module script to behave differently when executed versus imported.

hello_module.py
#!/usr/bin/env python3

def saySomething():
    print("calling saySomething function")

print("Activation module: " + __name__) 

if __name__ == "__main__":
    print(__file__ + " running as a script")

call_hello_module.py
#!/usr/bin/env python3

from hello_module import saySomething

saySomething()

Try running both:
$ python3 hello_module.py
Activation module: __main__
hello_module.py running as a script

$ python3 call_hello_module.py
Activation module: hello_module
calling saySomething function
As indicated, the "main" section is only activated when the module is run directly. When the module is imported, the main block is not executed.
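The same observation can be reproduced in a single self-contained script. This is only a sketch: the module name demo_name_module and its one-line body are hypothetical stand-ins for hello_module.py, written to a temporary directory so nothing is left behind. The standard runpy module is used here to simulate running the file as a script.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical one-line module that just records the value of __name__.
MODULE_SOURCE = "seen_name = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_name_module.py"
    path.write_text(MODULE_SOURCE)

    # Case 1: import it, as call_hello_module.py does with hello_module.
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        imported = importlib.import_module("demo_name_module")
    finally:
        sys.path.remove(tmp)
    name_when_imported = imported.seen_name

    # Case 2: execute it as a script, like "python3 demo_name_module.py".
    script_globals = runpy.run_path(str(path), run_name="__main__")
    name_when_run = script_globals["seen_name"]

print("imported:", name_when_imported)
print("executed:", name_when_run)
```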

Packages
A Python module is more-or-less what we consider to be a program or script. When imported it gives access to attributes defined within via a "dotted-access" format, like this:
my_module.attribute()
Functions are one sort of attribute which can be defined in a module.

A Python package represents a group of modules within a directory, or within subdirectories. Access to each module or sub-package also uses the "." construction.
In this example, we have the following directory structure:
mypack/
  __init__.py
  moduleA.py
  moduleB.py
The __init__.py files are required. They indicate that the directory is a Python package which provides access to submodules:
mypack.moduleA
mypack.moduleB
If all you want is access to these submodules, then __init__.py can be empty. If, however, you want the name mypack to act like a module, the code for it goes into the __init__.py module.

In this example all 3 files, including __init__.py, have identical code:

mypack/__init__.py
def saySomething():
    print("calling saySomething from " + __name__)
If you open __init__.py, you'll see that PyDev recognizes this file as something special and names the editor tab according to the directory, thereby avoiding confusion if there are more than one such file open.

The file which activates all of them is this:

call_mypack.py
#!/usr/bin/env python3

#import mypack        # not necessary with next two
import mypack.moduleA
import mypack.moduleB

mypack.saySomething()            # from __init__.py
mypack.moduleA.saySomething()
mypack.moduleB.saySomething()
The test run goes like this:
$ python3 call_mypack.py 
calling saySomething from mypack
calling saySomething from mypack.moduleA
calling saySomething from mypack.moduleB
What gets confusing in package usage is that the import statement must still refer to a module and the module, by default, uses the full path name. For example, these two are both wrong:
  1. This one gets flagged as an error because the name moduleA is undefined:
    import mypack.moduleA
    moduleA.saySomething()
    The module must use the full name mypack.moduleA unless an alias is given.
  2. Although syntactically correct, this one is flagged as a runtime error:
    import mypack
    mypack.moduleA.saySomething()
    We actually need to import moduleA per se, not the package.
You can, however, simplify the usage presentation like this with an alias:
import mypack.moduleA as moduleA

moduleA.saySomething()
or like this to get the attribute directly:
from mypack.moduleA import saySomething

saySomething()
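The rule that the submodule itself must be imported can be verified with a throwaway package. The sketch below is an illustration only: it builds a hypothetical package named demo_pack (not the course's mypack) in a temporary directory, with the same function body in __init__.py and moduleA.py, and checks whether moduleA is reachable before and after it is imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Same function in every file; it returns the message instead of printing it.
BODY = 'def saySomething():\n    return "calling saySomething from " + __name__\n'

with tempfile.TemporaryDirectory() as tmp:
    pack_dir = Path(tmp) / "demo_pack"
    pack_dir.mkdir()
    for file_name in ("__init__.py", "moduleA.py"):
        (pack_dir / file_name).write_text(BODY)

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        pack = importlib.import_module("demo_pack")             # import demo_pack
        visible_before = hasattr(pack, "moduleA")               # submodule not loaded yet
        module_a = importlib.import_module("demo_pack.moduleA")  # import demo_pack.moduleA
        visible_after = hasattr(pack, "moduleA")                # now bound on the package
        message = module_a.saySomething()
    finally:
        sys.path.remove(tmp)

print(visible_before, visible_after)
print(message)
```

Importing only the package runs __init__.py and nothing else, which is why the attribute lookup fails until the submodule has been imported somewhere.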
Subpackages
The package idea extends into subpackages, i.e., we could continue:
mypack/
  subpack/
    __init__.py
    moduleC.py
giving us access to the sub-package via the syntax:
mypack.subpack
and submodules of subpack by:
mypack.subpack.moduleC
Scalars
Python3 has numeric types: int, float, complex. Python2 has a separate long type as well. The float type has at least double precision (like double in C/Java). Python also has a bool type with constants True and False.
Python has a special object None which acts like null in JavaScript and other languages; however, x = None does not act at all as if x were undefined. Python has no explicit test for "definedness" of a variable; it is considered unusual in Python to test whether a variable is defined or not.
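A brief sketch of the distinction: a variable holding None is defined and can be tested, while a name that was never assigned raises NameError when read. The name never_assigned is a deliberately unused, hypothetical identifier.

```python
x = None
print(x is None)             # the usual test for None

try:
    print(never_assigned)    # hypothetical name, deliberately never assigned
    outcome = "defined"
except NameError:
    outcome = "NameError"
print(outcome)
```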
Python strings are created by single, double, triple-single, or triple-double quotes.
Try these little experiments. Start a Python3 interpreter, select, copy/paste + return:

type(1)
type(1.1)
type(True)
type(None)

x = 'a string'
y = "another 'string'"
z = '''yet another 
string'''
w = """a 'fourth' "example" \ of a string\n
string"""
x
y
z
w
print(x)
print(y)
print(z)
print(w)
All quotes are equal, except the triple versions permit embedded newlines (very useful). In Python, triple-quoted strings can serve to comment out regions, replacing the C-style /* ... */.

Substrings (and individual characters) use bracketing with ranges like x[2] and x[2:8]. String concatenation is done with the "+" operator, except that Python does not coerce numeric types to string; for example, these are errors:
"x" + 2
"y" + 3.3
You can use explicit casting to disambiguate "+". In general it is a good idea to use Python's excellent string format operation if you want to combine the values of variables into a single string. Here is a demo program:

scalars.py
#!/usr/bin/env python3

a = 'abcdefghijklm'
x = "7"
y = 5

print("a =", a)
print("a[2] =", a[2])
print("a[2:8] =", a[2:8])
print()
print("x =", x, "   : ", type(x))
print("y =", y, "   : ", type(y))
print('x + str(y) = ', x + str(y))
print('int(x) + y = ', int(x) + y)
print()
print('"|{}|{}|".format(x,y) =         ', "|{}|{}|".format(x,y))
print('"|{:05d}|{: >5}|".format(y,x) = ', "|{:05d}|{: >5}|".format(y,x))

import sys

print("------------------------------------------")

# for some reason, Python decided to make prompt-style printing difficult
print("type something: ", end="", flush=True)
# sys.stdout.write("type something: "); sys.stdout.flush()

line = sys.stdin.readline()
print("|{}|".format(line))
print("------------------------------------------")
print("|{}|".format(line.strip()))
Regarding Python's printf-like format operation, the {...} expressions represent argument insertion points. A literal "{" or "}" is obtained with "{{" or "}}", respectively. The minimal form is simply {}, but you can use a more complex version like this:
{position:format-info}
Without the position, arguments are taken in order (0,1,...), but we can take them out of order by making the position explicit. For example, consider this run:
>>> x = 'foo'; y = 33
>>> print("|{:05d}|{: >5}|".format(y,x))
|00033|  foo|
>>> print("|{1:05d}|{0: >5}|".format(x,y))
|00033|  foo|
>>> print("{}--{}".format('foo',33))
foo--33
>>> print("{1}--{0}".format(33,'foo'))
foo--33
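The run above does not exercise the doubled-brace escape or the reuse of a position, so here is a small additional sketch of both, along with left and right padding. The variable names are arbitrary.

```python
value = 5

# Doubled braces produce literal braces around the inserted value.
braces = "{{{}}}".format(value)

# An explicit position may be used more than once.
repeated = "{0}-{1}-{0}".format("a", "b")

# "<" pads on the right (left-justify), ">" pads on the left (right-justify).
padded = "|{:<6}|{:>6}|".format("l", "r")

print(braces)
print(repeated)
print(padded)
```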
Controls
Python's control structure syntax is quite different from C-style.
The tokens used to delimit sections are similar to those used in Bash:
if expression:
   ...
elif expression:
   ...
else:
   ...
The biggest difference is the indentation requirements. The if/elif/else tokens must be at the same indentation level and there is no "fi" equivalent. Some other points are these:
  • and, or, not are the boolean operators
  • the relational operators ==, !=, <, >, <=, >= all "do the right thing" based on the operand types;
  • relational operators with operands of mixed string/numeric should be avoided.
  • the "false" expressions are False (boolean), "" (empty string), None, the numeric zero values 0 and 0.0, and empty containers such as [], () and {}
  • the sub-blocks cannot be empty; a special statement, pass, can be used as a no-operation statement
Here is a demo program:

controls.py
#!/usr/bin/env python3

a = 0; b = ""; c = False; d = 0.0; e = None; f = "0"

print("a = ", a, type(a))
if a: print("true")
else: print("false")

print("b = ", b, type(b))
if b: print("true"); 
else: print("false")

print("c = ", c, type(c))
if c: print("true"); 
else: print("false")

print("d = ", d, type(d))
if d: print("true"); 
else: print("false")

print("e = ", e, type(e))
if e: print("true"); 
else: print("false")

print("f = ", f, type(f))
if f: print("true"); 
else: print("false")

print()

x = '123'; y = '57'
if x < y:
    print("true: {} < {} as strings".format(x,y))
else:
    print("false: {} < {} as strings".format(x,y))

u = int(x); v = int(y)
if u < v:
    print("true: {} < {} as ints".format(u,v))
else:
    print("false: {} < {} as ints".format(u,v))

print()

if '2' == 2:
    print('2(string)=2(int): yes')
else:
    print('2(string)=2(int): no')

a = 22; b = 33;
if a < 50 and b > 50:
    print("1")
elif a > 50 or b > 50:
    pass
else:
    print("3")
Lists
Like other programming languages, Python supplies data structures for lists. Usually we think of lists as indexed lists, i.e., the association of an element to an integer position. However, like Java, Python provides additional list-like structures through its collections module. Regarding indexed lists, there are two kinds:
  • lists: they use the syntax L=[a,b,c]. The syntax L[index] gives the element at position index. Out-of-range indexes cause errors. These support all "list-like" operations you can think of including insertion, deletion, reassignment of index values with the typical syntax L[index] = new_value.
  • tuples: they use syntax: (a, b, c). You can consider them as non-mutable lists in that the contents cannot be altered by reassignment, deletion, etc.
Some points about their usage:
  • The "+" operator acts as list concatenation.
  • The for construct is used to iterate over lists and tuples:
    for x in L:
        ...
  • Lists and tuples can be accessed by slices of the form L[a:b] where a:b gives the inclusive/exclusive index range. Either endpoint can be omitted, defaulting to the lowest/highest index value; e.g. L[:4] is L[0:4].
  • The slice notation is useful even with both endpoints empty, generating a copy of the list:
    M = L        # M is a reference to L
    M = L[:]     # M is a copy of L

Here is a demo program:

lists.py
#!/usr/bin/env python3

t = ( 44, 'x', 3.5, )
s = 1,2,3,55,66,77,88
l = [ 44, 'x', 3.5, 'str']
r = range(19,31)          # inclusive/exclusive on arguments
q = range(19,31,2)        # interval skip

print("t =", t, type(t), "len=", len(t))
print("s =", s, type(s), "len=", len(s))
print()
print("l =", l, type(l), "len=", len(l))
print()

print("r = ", r, type(r), "len=", len(r))
for i in r: print(i, " ", end="")         # iterate
print()
print("list(r)=", list(r))
print()

print("q =", q, type(q), len(q)) 
for i in q: print(i, " ", end="")        # iterate
print()
print("list(q)=", list(q))
print()

print("t[1] l[2] = ", t[1], l[2])
print()

print("--- concatenate")
print("s+t =", s + t)               # tuples
print("l+list(q) =", l + list(q))   # lists
print()

print("--- slices")
print("r[3:7] = ", r[3:7])    # range slices
print("s[2:5] =", s[2:5])     # tuple slice
print();

print("---- iterate")
print("-------------- tuple s")
for x in s: 
    print(x)
print("-------------- list l")
for x in l: 
    print(x)
print("-------------------------------------------")

x,y = t[:2]
print("multiple assignments from tuple/list: {},{}".format(x,y))

l[1:3] = ['hello']
print("assign into list: l =", l)
print()

print([str(x) for x in l])
print("__".join([str(x) for x in l]))
print();

print("l.pop() =", l.pop(), ", l = ", l)
l.append('there')
print("l.append('there'): l =",  l)
l.insert(0, 'test')
print("l.insert(0,'test'): l =",  l)
l.insert(2, 'again')
print("l.insert(2,'again'): l =",  l)
del l[1]
print("del l[1]: l =",  l)
del l[1:3]
print("del l[1:3]: l =",  l)
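Two points from the list discussion are not exercised by lists.py: the difference between a reference and a slice copy, and the immutability of tuples. A minimal sketch of both, with arbitrary variable names:

```python
L = [1, 2, 3]
M = L        # M is a reference to the same list object
C = L[:]     # C is a separate copy of the list

L.append(4)
print("M =", M)   # sees the change made through L
print("C =", C)   # unaffected

t = (1, 2, 3)
try:
    t[0] = 99     # tuples do not support item assignment
    tuple_result = "changed"
except TypeError:
    tuple_result = "TypeError"
print(tuple_result)
```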
Maps (Dictionaries)
In Python, a map (or associative list, or partial function) is referred to as a dictionary. As of Python 3.6, dictionaries are ordered in that iterative access to entries retains the entry order of the key/value pairs (the language guarantees this from Python 3.7 on). Thus, a Python dictionary is equivalent to the Java LinkedHashMap. Here is a demo program:

dicts.py
#!/usr/bin/env python3

grade = { "A": 4.0, "B": 3.1, "C": 2.0, "D": 1.0 }
print('grade =', grade)
print('grade["A"] =', grade["A"])
grade["B"] = 3.0
print('grade =', grade)
print('grade.keys() = ', grade.keys())
print('grade.values() = ', grade.values())
print()

print('--------- iterate over keys, values')
for key in grade.keys():
    print(key)
for value in grade.values():
    print(value)
print("--------- using items")
for key,value in grade.items():
    print("{}: {}".format(key,value))
print()

weight = dict()        # or weight = {}
weight['John'] = 180
weight['Ellen'] = 135
weight['Joe'] = 185
print('weight =', weight)
del weight['Joe']
print('weight =', weight)
weight.update( {'Dave':200,'Mary':140} )
print('weight =', weight)
print()

for key in ('Ellen','Jane','Joe','Mary'):
    if key in weight: 
        print('contains', key)
    else:
        print('not contains', key)
print()

print("---- age: with items as a view")
age = dict( John=50, Ellen=15, Marty=44 )  # using keyword args

entries = age.items()  # capture items

print("---- before: ", end="")
for x in entries: print(x, end="")   # display
print()

age['Paul'] = 18  # change age map
age['Ellen'] = 33

print("---- after:  ", end="")
for x in entries: print(x, end="")
print()
Key points about Python dicts are:
  • They can be initialized with the {} or the dict() operation with keyword args.
  • They can be augmented/modified with the [] construct in the form some_dict[key]=new_value, or with a group of changes using the update member function.
  • Iteration can be done over keys, values, or key/value pairs using the member functions keys(), values(), items(), respectively.
  • Existence is tested with the in boolean operator.
  • Key/value pairs are deleted using the del statement.
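The key points can be condensed into a few lines. This sketch, using a cut-down version of the grade map, shows in particular how insertion order behaves under reassignment, addition, and deletion.

```python
grade = {"A": 4.0, "B": 3.1, "C": 2.0}

grade["B"] = 3.0     # reassignment keeps B in its original position
grade["D"] = 1.0     # a new key goes to the end
del grade["A"]       # del removes the key/value pair

order = list(grade.keys())
has_a = "A" in grade
has_d = "D" in grade

print(order)
print(has_a, has_d)
```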
Regular expression operations
There are five principal operations which use regular expressions:
  1. match a string with a pattern, giving a boolean (yes/no) result
  2. extract portions of a string which match a pattern
  3. substitute portions of a string which match a pattern by replacement string
  4. split a string into an array by removing portions which match a pattern
  5. extract (grep) a subarray of array elements which match a pattern
In Python the re module holds all the functionality for regular expression usage. Some points are these:
  • the match functions can be called either from the module level, or from a compiled pattern
  • regular expressions in module-level operations are strings, conventionally written with an "r" (raw string) prefix so that backslashes are not interpreted by Python, e.g., r'\d+\s*a+'
  • matching is done using either match (from beginning only) or search (anywhere)
  • parenthesized subexpressions are accessed by variations of the .group member function
The first demo program has three parts: a matching-only comparison of match and search, an extraction illustration using the group member function, and replacement.

regex1.py
#!/usr/bin/env python3

import re


print('''\
--------------------------------------
match vs. search
--------------------------------------
''')

pattern = r'\w\d{2}'

tests = [ "ABCD474", "A474" ]

for str_to_match in tests:
    # "match" must match at beginning
    if re.match( pattern, str_to_match ):
        print("match: '{}' with pattern {}: yes" . format(str_to_match, pattern))
    else:
        print("match: '{}' with pattern {}: no" . format(str_to_match, pattern))

print("===============================")

for str_to_match in tests:
    # "search" can match anywhere
    if re.search( pattern, str_to_match ):
        print("search: '{}' with pattern {}: yes" . format(str_to_match, pattern))
    else:
        print("search: '{}' with pattern {}: no" . format(str_to_match, pattern))

print("""
--------------------------------------
extract
--------------------------------------
""")

pattern = r"(a+)\s*:\s*(\d+)\s*(\w+)"

str_to_match = " AaA: 272xy7-88";

match = re.search( pattern, str_to_match, re.IGNORECASE )  # @UndefinedVariable
if match:
    print("'{}' matches '{}'".format(str_to_match, pattern))
    print(match.group())
    print(match.group(1))
    print(match.group(2))
    print(match.group(3))
    print(match.groups())
    print(match.group(1,3))
else:
    print("no match")

print("""
--------------------------------------
replacement
--------------------------------------
""")

str_to_match = " 234aaAA  22bbbb  3cc  ";

print("string_to_match = '{}'".format(str_to_match))

#----------------------------

news = re.sub( string = str_to_match, pattern = r"\d+", repl="***" )
print("(0) news = '{}'".format(news))

#----------------------------

def repfunc1(match):
    return "[{}]".format(match.group(0))

news = re.sub( string = str_to_match, pattern = r"\d+", repl=repfunc1 )
print("(1) news = '{}'".format(news))

#----------------------------

def repfunc2(match):
    return "[{}]".format( int(match.group(0)) + 1 )

news = re.sub( string = str_to_match, pattern = r"\d+", repl=repfunc2 )
print("(2) news = '{}'".format(news))

#----------------------------

def repfunc3(match):
    return "[{}]{}".format(int(match.group(1)) + 1, "@" * len(match.group(2)))

news = re.sub( string = str_to_match, pattern = r"(\d+)(a+)", repl=repfunc3, 
               flags=re.IGNORECASE )  # @UndefinedVariable
print("(3) news = '{}'".format(news))
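The program above covers matching, extraction, and substitution. The remaining two operations from the list of five, split and grep, can be sketched with re.split and a list comprehension over re.search. The sample strings here are invented for the illustration.

```python
import re

# Split: remove the portions matching the pattern (commas with optional spaces).
record = "alpha , beta,gamma ,  delta42"
fields = re.split(r"\s*,\s*", record)
print(fields)

# Grep: keep only the list elements which match the pattern (contain a digit).
words = ["a1", "bb", "c22", "dd"]
with_digits = [w for w in words if re.search(r"\d", w)]
print(with_digits)
```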
Compiled regular expressions
Programming languages which support regular expression pattern matching do so with an internal compiled version of the pattern. In Python, this compilation is implicit for functions called at the module level (re.match, re.search, re.sub), but Python (like Java) gives the user an explicit object representing a compiled version.

As an example of explicit regular expression compilation, suppose the module-level call were:
match = re.search( r"d+", "a Bc Dd e ddd f DDD", re.IGNORECASE )
then the equivalent usage with explicit compilation is:
cp = re.compile( "d+", re.IGNORECASE )
match = cp.search( "a Bc Dd e ddd f DDD" )
The point is that using the compiled version is more efficient when the same regular expression is used multiple times in a program.
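As a quick check of the equivalence, this sketch runs the module-level call and the compiled call on the same text and compares the results. It also uses the compiled object's findall method, which is not otherwise discussed here, to reuse the pattern for collecting every match.

```python
import re

text = "a Bc Dd e ddd f DDD"

# Module-level call: the pattern is compiled implicitly.
module_level = re.search(r"d+", text, re.IGNORECASE).group()

# Explicit compilation: the flags belong to the compiled object.
cp = re.compile(r"d+", re.IGNORECASE)
compiled = cp.search(text).group()

# The same compiled object can be reused for further operations.
all_runs = cp.findall(text)

print(module_level, compiled)
print(all_runs)
```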

The second program repeats the first two parts of the first program, except using a compiled pattern. In particular, the first program treats pattern as a regular expression per se:
pattern = r"-a-regular-expression"
The second version treats pattern as an object compiled from a separate regular expression:
regex = r"-a-regular-expression"
pattern = re.compile(regex, ...)
Here is the program:

regex2.py
#!/usr/bin/env python3
# the point of this program is to compare it to regex1.py
# in which regular expressions are used directly, whereas here
# the regular expressions are compiled

import re

print("""\
--------------------------------------
match vs. search
--------------------------------------
""")

regex = r'\w\d{2}'
pattern = re.compile(regex)

tests = [ "ABCD474", "A474" ]

for str_to_match in tests:
    if pattern.match( str_to_match ):
        print("match: '{}' with pattern {}: yes" . format(str_to_match, regex))
    else:
        print("match: '{}' with pattern {}: no" . format(str_to_match, regex))


print("===============================")

for str_to_match in tests:
    if pattern.search( str_to_match ):
        print("search: '{}' with pattern {}: yes" . format(str_to_match, regex))
    else:
        print("search: '{}' with pattern {}: no" . format(str_to_match, regex))


print("""
--------------------------------------
extract
--------------------------------------
""")

regex = r"(a+)\s*:\s*(\d+)\s*(\w+)"
pattern = re.compile(regex, re.IGNORECASE)  # @UndefinedVariable

str_to_match = " AaA: 272xy7-88";

match = pattern.search( str_to_match )
if match:
    print("'{}' matches '{}'".format(str_to_match, regex))
    print(match.group())
    print(match.group(1))
    print(match.group(2))
    print(match.group(3))
    print(match.groups())
    print(match.group(1,3))
else:
    print("no match")
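For reference, here is a condensed sketch of the match-versus-search part that collects the matched text instead of printing yes/no. The results dictionary is an illustrative addition, not part of regex2.py.

```python
import re

pattern = re.compile(r'\w\d{2}')

results = {}
for s in ["ABCD474", "A474"]:
    m = pattern.match(s)     # anchored at the start of the string
    t = pattern.search(s)    # may match anywhere in the string
    results[s] = (m.group() if m else None, t.group() if t else None)

for s, (matched, searched) in results.items():
    print("{}: match -> {}, search -> {}".format(s, matched, searched))
```

For "ABCD474", match fails because the second character is not a digit, while search finds "D47" further along. For "A474", both find "A47".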
A third demo program illustrates using a compiled regular expression to match against all the lines of a file, which, in this case, is the previous "controls.py" program.

regex3.py
#!/usr/bin/env python3

import re

pattern = "[a-d].*(true|false)"

test_file = 'controls.py'

print("match lines in file '{}' to pattern '{}'".format(test_file, pattern))

# compiled pattern
cp = re.compile(pattern = pattern, flags = re.IGNORECASE)  # @UndefinedVariable

with open(test_file) as f:
    lines = f.readlines()

matching = []
for line in lines:
    line = line.rstrip("\n")
    if cp.search(line):
        matching.append(line)

print("\n----- MATCHING LINES -----")
print("\n".join(matching))
The lines which match the regular expression are put into a list which is printed out.
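The same filtering can be tried without a file. This is a minimal sketch in which sample_lines is a hypothetical stand-in for the file contents, and a list comprehension replaces the explicit loop.

```python
import re

cp = re.compile("[a-d].*(true|false)", re.IGNORECASE)

# Hypothetical sample lines standing in for the contents of a file
sample_lines = [
    "a = True",
    "x = 5",
    "flag = False",
    "zzz True",
]

# Keep only the lines in which the pattern is found somewhere.
matching = [line for line in sample_lines if cp.search(line)]
print(matching)
```

Only the first and third lines survive: each has a letter in the range a-d somewhere before "True" or "False".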
Functions and Classes
Some key points about Python functions:
  • they cannot be overloaded (there is no type signature), but they can have default arguments (parameters with defaults must come after all parameters without defaults)
  • arguments can be passed as keyword arguments of the form param=value
  • a variable that is only read inside a function refers to the global one; a variable that is assigned anywhere inside a function is local to it, unless there is an explicit global declaration
Here is a demo program:

funcs.py
#!/usr/bin/env python3

def F(a,b):
    print("*** F: a = {}, b = {}".format(a,b))

def G(a,b=100,c=200):
    print("*** G: a = {}, b = {}, c = {}".format(a,b,c))

F(3,5)
F(b=7, a=10)

G(5)
G(5, 10)
G(5, 10, 15)
G(5, c=20, b=30)
#==================================

x = 77

def F2():
    print("*** F2: x = {}".format(x))

def F3():
    x = 33
    print("*** F3: x = {}".format(x))

def F4():
    global x
    x = 99
    print("--> F4: x = {}".format(x))

print("x = {}".format(x))
F2()

F3()
print("x = {}".format(x))

F4()
print("x = {}".format(x))
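One consequence of the scope rule is worth a separate sketch: a function that both reads and assigns the same name, without a global declaration, fails at the read. The function names below are illustrative and not part of funcs.py.

```python
counter = 10

def read_only():
    # No assignment to counter in this function, so the global is read.
    return counter + 1

def assign_after_read():
    # The assignment below makes counter local for the whole function,
    # so the read on the next line raises UnboundLocalError.
    value = counter + 1
    counter = value
    return counter

def with_global():
    # The global declaration makes both the read and the write global.
    global counter
    counter = counter + 1
    return counter

print(read_only())
try:
    assign_after_read()
except UnboundLocalError as err:
    print("UnboundLocalError:", err)
print(with_global(), counter)
```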
In Python classes, the usual "this" found in other languages is replaced by "self". Furthermore, it must be used explicitly (as in PHP, say). Data members initialized in the class body are static (class attributes) and can also be referred to with "self". Python can employ both a static and a non-static usage of the same variable name: assigning through "self" creates a separate per-instance member. Here is a demo program:

classes.py
#!/usr/bin/env python3

g = 5

class Foo(object):
    s1 = 66         # static
    s2 = g          # static, init from global
    __h = g + 10    # static hidden

    def __init__(self, a=200):  # constructor
        self.mem1 = 100         # non-static member
        self.mem2 = self.s1     # init from static
        self.mem3 = g           # init from global
        self.__hmem = a         # hidden
        self.show()             # call show member function

    def show(self):
        print("  show: s1 =", self.s1)
        print("  show: __h =", self.__h)
        print("  show: mem1 =", self.mem1)
        print("  show: __hmem =", self.__hmem)

    def bar(self):
        self.s1 += 1000     # this separates dynamic and static occurrences
        self.__h += 1000    # same here
        self.__hmem += 1000
        self.mem1 += 1000

print('g =', g)
print('Foo.s1 =', Foo.s1)
print('Foo.s2 =', Foo.s2)
print("--------------------")

print('create foo:')
foo = Foo()    # instantiate, no "new"
print('foo.mem1 =', foo.mem1)
print('foo.mem2 =', foo.mem2)
print('foo.mem3 =', foo.mem3)

# try to access hidden members from outside the class
try:
    print(Foo.__h)
except Exception as err:
    print(err)

try:
    print(foo.__hmem)
except Exception as err:
    print(err)

print(vars(foo))
print("------------------------------------------------")
print('create foo1:')
foo1 = Foo(333)

print("foo1.bar()")
foo1.bar()

print("foo1.show()")
foo1.show()

# static occurrences are separate from dynamic ones for same member name

print('Foo.s1 =', Foo.s1)
print('foo1.s1 =', foo1.s1)
print(vars(foo1))

print("------------------------------------------------")

class Person(object): 
    pass

joe = Person()
joe.fname = "Joe"
joe.lname = "Jones"
joe.age = 33

print(vars(joe))
The other feature shown in this program is exception handling:
try:
    # Exception-generating code
except Exception as err:
    print(err)
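The static-versus-instance split and the hidden members can be seen in a smaller sketch. The Counter class and its members are hypothetical. The sketch relies on Python's name mangling, which stores a double-underscore class member under the name _ClassName__member.

```python
class Counter(object):
    total = 0        # class ("static") attribute
    __secret = 42    # stored as _Counter__secret by name mangling

    def bump(self):
        # Reads the class attribute, then creates an instance attribute.
        self.total += 1

c = Counter()
c.bump()

print(Counter.total, c.total)   # class value unchanged, instance value bumped
print(vars(c))                  # only the instance attribute appears here

# The hidden member is reachable under its mangled name ...
print(Counter._Counter__secret)

# ... but not under the name used inside the class.
try:
    print(Counter.__secret)
except AttributeError as err:
    print(err)
```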
Command-line arguments
The Python argparse module offers a complete solution to the problem of dealing with command-line arguments. See the argparse page in the official Python documentation.
According to argparse usage, command-line arguments are divided into positional (non-option) arguments, whose position matters, and option arguments, whose position does not matter (although their order may). Option arguments use either the single-dash short style such as "-v" or the double-dash long style such as "--verbose". In addition, argparse has a built-in mechanism that generates a help synopsis and prints usage information when options are used incorrectly.

The parser is created by:
parser = argparse.ArgumentParser()
You add argument descriptors to it one-by-one with
parser.add_argument(...)
The first parameter used dictates whether it is an optional argument (starts with a "-") or a positional argument (does not). When all argument descriptors are defined, you put them into effect by parsing the arguments with:
args = parser.parse_args()
Afterwards, the attributes of the args object provide the values (or, for flags, just an indication of presence) of the command-line arguments. The parameters available to parser.add_argument are varied and complex, with many capabilities of which our simple examples only scratch the surface.

args1.py
#!/usr/bin/env python3

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('arg1')
parser.add_argument('arg2', nargs='?', default='foo')
parser.add_argument('arg3', nargs='?')
parser.add_argument('-t', '--test', action='store_true')

args = parser.parse_args()

print(args)
print(args.arg1, args.arg2, args.arg3, args.test)
Try out these test usages:
$ ./args1.py -h
$ echo $?                   (success status on correct option usage)
$ ./args1.py
$ echo $?                   (failure status 2 on incorrect usage)
$ ./args1.py aaa
$ ./args1.py aaa bbb
$ ./args1.py aaa bbb ccc
$ ./args1.py -t aaa bbb
$ ./args1.py --test aaa bbb
$ ./args1.py aaa bbb -t
$ ./args1.py aaa -t bb       (options cannot be in the middle of positionals)
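The same parser can also be exercised without a shell, because parse_args accepts an explicit list of strings in place of the real command line. A minimal sketch follows; the status variable is illustrative. On a usage error argparse prints a message to stderr and raises SystemExit.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('arg1')
parser.add_argument('arg2', nargs='?', default='foo')
parser.add_argument('arg3', nargs='?')
parser.add_argument('-t', '--test', action='store_true')

# Equivalent to: ./args1.py aaa
print(parser.parse_args(['aaa']))

# Equivalent to: ./args1.py -t aaa bbb
print(parser.parse_args(['-t', 'aaa', 'bbb']))

# Equivalent to: ./args1.py   (the required arg1 is missing)
status = 0
try:
    parser.parse_args([])
except SystemExit as exit_info:
    status = exit_info.code
print("exit status:", status)
```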
Another example is the following which permits a variable number of positional arguments.

args2.py
#!/usr/bin/env python3

import argparse

synopsis = """This command takes an arbitrary
number of positional parameters with optional
file output
"""

parser = argparse.ArgumentParser(description = synopsis)

parser.add_argument('infile', nargs='*')
parser.add_argument('-o', metavar='outfile')

args = parser.parse_args()

print(args)
Try out these test usages:
$ ./args2.py -h                  (metavar shows up here)       
$ ./args2.py aaa
$ ./args2.py aaa bbb -o fff

$ ./args2.py -o fff aaa bbb ccc
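As a sketch of what these usages produce, the same two argument descriptors can be parsed from explicit lists. The description is omitted here for brevity. With no long option name given, the value of "-o" is stored in the attribute o.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('infile', nargs='*')
parser.add_argument('-o', metavar='outfile')

# Equivalent to: ./args2.py aaa bbb -o fff
with_output = parser.parse_args(['aaa', 'bbb', '-o', 'fff'])
print(with_output.infile, with_output.o)

# Equivalent to: ./args2.py   (no arguments at all)
empty = parser.parse_args([])
print(empty.infile, empty.o)
```

With nargs='*' the positional values always arrive as a list, which is empty when none are given.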
