Chapter 8 String manipulation

String manipulation is one of Python’s strong suits. Strings come with many built-in methods, and the re module (for regular expressions) extends that power manyfold.

Strings are sequences of characters, typically written in quotes. We can check whether a variable holds a string using type.

a = 'Les Miserable'

type(a)
<class 'str'>

Strings are a little funny. They look like a single thing, but they behave much like lists: a string is really a sequence of characters (although, unlike a list, it cannot be changed in place). So we can have

len(a)
13
a[:4]
'Les '
a[3:6]
' Mi'

The indexing and slicing rules are basically the same as for lists. To make this explicit, let’s consider the word ‘bare’. In terms of positions, we can write this out.

index       0   1   2   3
string      b   a   r   e
neg index  -4  -3  -2  -1
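
As a small illustration of the table above, the following sketch indexes the word from both ends. The variable names are made up for this example.

```python
word = 'bare'

# Positive indices count from the left, starting at 0
first = word[0]        # 'b'
last = word[3]         # 'e'

# Negative indices count from the right, starting at -1
also_last = word[-1]   # 'e'
also_first = word[-4]  # 'b'

# A slice can mix both kinds of index: from position 1 up to (not including) the last
middle = word[1:-1]    # 'ar'

print(first, last, also_last, also_first, middle)
```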

We can also slice strings (and lists, for that matter) at regular intervals by giving a step as the third part of the slice. So, going back to a,

a[::2]
'LsMsrbe'

slices every other character.
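
The step can be combined with a starting position, and it can be negative. Here is a short sketch of a few variations on the same string; the variable names are only for illustration.

```python
a = 'Les Miserable'

every_other = a[::2]     # start at index 0, take every second character
odd_positions = a[1::2]  # start at index 1 instead
reversed_a = a[::-1]     # a negative step walks the string backwards

print(every_other)
print(odd_positions)
print(reversed_a)
```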

Strings come with several methods to manipulate them natively.

'White Knight'.capitalize()
'White knight'
"It's just a flesh wound".count('u')
2
'Almond'.endswith('nd')
True
'White Knight'.lower()
'white knight'
'White Knight'.upper()
'WHITE KNIGHT'
'flesh wound'.replace('flesh','bullet')
'bullet wound'
' This is my song   '.strip()
'This is my song'
'Hello, hello, hello'.split(',')
['Hello', ' hello', ' hello']
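
Each of these methods returns a new string and leaves the original untouched, so calls can be chained one after another. A minimal sketch, using a made-up input string:

```python
raw = "   It's just a FLESH wound   "

# Each method returns a new string, so the calls can be chained left to right
cleaned = raw.strip().lower().replace('flesh', 'bullet')

print(repr(cleaned))
print(repr(raw))  # the original string is unchanged
```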

One of the most powerful string methods is join. It takes a list of strings and puts them together using a particular separator, which is the string that join is called on.

' '.join(['This','is','my','song'])
'This is my song'
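
Since join is the inverse of split, the two are often used together. The sketch below splits a string, tidies the pieces, and joins them with a different separator; it also shows that non-string items have to be converted first. The variable names are illustrative.

```python
line = 'Hello, hello, hello'

# split breaks the string at each comma; strip removes the leftover spaces
parts = [piece.strip() for piece in line.split(',')]

# join glues the cleaned pieces back together with a new separator
rejoined = ' | '.join(parts)

# join only accepts strings, so other types must be converted first
numbers = [1, 2, 3]
as_text = '-'.join(str(n) for n in numbers)

print(parts)
print(rejoined)
print(as_text)
```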

Also recall that we are allowed “string arithmetic”.

'g' + 'a' + 'f' + 'f' + 'e'
'gaffe'
'a '*5
'a a a a a '

8.0.1 String formatting

In older code, you will often see the format method, which fills the {} placeholders in a string with the values passed to it.

var = 'horse'
var2 = 'car'

s = 'Get off my {}!'

s.format(var)
'Get off my horse!'
s.format(var2)
'Get off my car!'

This is great for templates.

template_string = """
{country}, our native village
There was a {species} tree.
We used to sleep under it.
"""

print(template_string.format(country='India', species = 'banyan'))

India, our native village
There was a banyan tree.
We used to sleep under it.

print(template_string.format(country = 'Canada', species = 'maple'))

Canada, our native village
There was a maple tree.
We used to sleep under it.
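
When the values already live in dictionaries, the same template can be filled in a loop. This is a sketch that assumes each dictionary's keys match the placeholder names; the records list is hypothetical.

```python
template_string = """
{country}, our native village
There was a {species} tree.
We used to sleep under it.
"""

# Hypothetical records; each dict's keys match the placeholder names
records = [
    {'country': 'India', 'species': 'banyan'},
    {'country': 'Canada', 'species': 'maple'},
]

# ** unpacks each dict into keyword arguments for format
verses = [template_string.format(**record) for record in records]

for verse in verses:
    print(verse)
```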

Python 3.6 introduced f-strings, or formatted string literals. They are often easier to read than calls to format, and they are usually faster as well.

country = 'USA'
f"This is my {country}!"
'This is my USA!'
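
The braces in an f-string can hold any expression, and a format specification after a colon controls how the value is displayed. A brief sketch, where value is an arbitrary number chosen for illustration:

```python
country = 'USA'
value = 3.14159  # arbitrary number, used only to show rounding

# Any expression can go inside the braces
shout = f"This is my {country.lower()}!"

# A format spec after the colon controls how the value is displayed
rounded = f"{value:.2f}"     # two digits after the decimal point
padded = f"[{country:>6}]"   # right-aligned in a field six characters wide

print(shout)
print(rounded)
print(padded)
```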

8.1 Regular expressions

Regular expressions are powerful tools for string search and manipulation. They are available in pretty much every programming language in one form or another. I’ll provide a short and far from comprehensive introduction here. The website regex101.com is a good place to learn about and test your regular expressions.

8.1.1 Pattern matching

Syntax   Description
.        Matches any single character (except a newline, by default)
^        Matches at the beginning of a string
$        Matches at the end of a string
*        Matches 0 or more repetitions of the preceding element (a character, set, or group)
+        Matches 1 or more repetitions of the preceding element
?        Matches 0 or 1 repetitions of the preceding element
{m}      Matches exactly m repetitions of the preceding element
{m,n}    Matches from m to n repetitions of the preceding element
\        Escapes a special character so that it is matched literally
[ ]      A set of characters (e.g. [A-Z] will match any capital letter)
( )      Groups a sub-pattern and captures the text it matches
|        OR: matches the pattern on either side
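
To see a few of these symbols in action, here is a minimal sketch using re.findall and re.search from Python's re module. The sample strings are made up for illustration, and the patterns are written as raw strings (r'...') so that backslashes reach the regex engine unchanged.

```python
import re

text = 'Invoice 2041 was paid on 2023-05-17.'

# [0-9] is a set of digits; {4} asks for exactly four of them in a row
four_digits = re.findall(r'[0-9]{4}', text)

# ^ anchors the match at the start of the string
starts_with_invoice = re.search(r'^Invoice', text) is not None

# $ anchors at the end; \. escapes the dot so it matches a literal period
ends_with_period = re.search(r'\.$', text) is not None

# ? makes the preceding character optional
colours = re.findall(r'colou?r', 'color and colour')

# ( ) groups the alternatives separated by |; findall returns the captured group
pets = re.findall(r'(cat|dog)s', 'cats and dogs')

print(four_digits, starts_with_invoice, ends_with_period, colours, pets)
```

Note that when a pattern contains a single group, findall returns only the text captured by that group, which is why pets holds 'cat' and 'dog' without the trailing s.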