Chapter 8 String manipulation
String manipulation is one of Python’s strong suits. Strings come with a rich set of built-in methods, and the re
module (for regular expressions) multiplies that power manyfold.
Strings are objects that we typically write inside quotes. We can also check whether a variable holds a string by asking for its type.
<class 'str'>
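The code cell that produced this output is not shown. A minimal sketch, assuming a variable named a that holds the (hypothetical) title 'Les Miserable', which happens to be consistent with the outputs later in this section:

```python
# Assumed example string (hypothetical; not from the original code cell)
a = 'Les Miserable'

# type() reports the class of the object
print(type(a))             # <class 'str'>

# isinstance() is the usual way to test for a string inside a condition
print(isinstance(a, str))  # True
```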
Strings are a little funny. They look like a single thing, but they can act like lists: in a sense, a string is really a container of characters. So we can take its length, index it, and slice it:
13
'Les '
' Mi'
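The length, index, and slice calls above are not shown. A sketch that reproduces these three outputs, assuming a holds 'Les Miserable' (an assumption, but one that matches all three results):

```python
a = 'Les Miserable'  # assumed string, 13 characters long

print(len(a))         # 13: the number of characters
print(repr(a[:4]))    # 'Les ': positions 0 to 3
print(repr(a[3:6]))   # ' Mi': positions 3, 4 and 5 (the stop index is excluded)
```

repr() is used only so that the leading and trailing spaces are visible in the printed output.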
Indexing and slicing follow essentially the same rules as for lists. To make this explicit, consider the word ‘bare’ and write out its positions.
index | 0 | 1 | 2 | 3 |
string | b | a | r | e |
neg index | -4 | -3 | -2 | -1 |
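The position table translates directly into indexing and slicing calls. A short sketch using the same word:

```python
word = 'bare'

# Positive indices count from the left, starting at 0
print(word[0], word[3])    # b e
# Negative indices count from the right, starting at -1
print(word[-4], word[-1])  # b e
# Slices work as for lists: the start is included, the stop is excluded
print(word[1:3])           # ar
print(word[-3:])           # are
```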
We can also slice strings (and lists, for that matter) with a step, using the form start:stop:step. Going back to a, a step of 2 selects every other character:
'LsMsrbe'
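A minimal sketch of stepped slicing, again assuming a = 'Les Miserable' (hypothetical, but consistent with the output 'LsMsrbe'):

```python
a = 'Les Miserable'  # assumed string

# The third slice field is the step: start:stop:step
print(a[::2])    # LsMsrbe: every other character, starting at position 0
print(a[1::2])   # every other character, starting at position 1
print(a[::-1])   # elbaresiM seL: a negative step walks backwards
```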
Strings come with many built-in methods for manipulating them.
'White knight'
2
True
'white knight'
'WHITE KNIGHT'
'bullet wound'
'This is my song'
['Hello', ' hello', ' hello']
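The method calls behind these outputs are not shown. The sketch below produces the same outputs with common str methods; the input strings and the choice of methods are assumptions, not the original code:

```python
s = 'white knight'  # assumed input string

print(s.capitalize())          # White knight: first character upper-cased
print(s.count('i'))            # 2: non-overlapping occurrences of 'i'
print(s.startswith('white'))   # True
print('White Knight'.lower())  # white knight
print(s.upper())               # WHITE KNIGHT
print('bullet hole'.replace('hole', 'wound'))  # bullet wound
print('  This is my song  '.strip())           # This is my song: outer whitespace removed
print('Hello, hello, hello'.split(','))        # ['Hello', ' hello', ' hello']
```

Note that split(',') keeps the space that followed each comma; split(', ') would drop it.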
One of the most powerful string methods is join. It takes a list (or any iterable) of strings and glues them together, using the string it is called on as the separator.
'This is my song'
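A short sketch of join; the word list is an assumed input that reproduces the output above:

```python
words = ['This', 'is', 'my', 'song']

# The separator is the string join is called on; the list is the argument
sentence = ' '.join(words)
print(sentence)                       # This is my song

# Any separator works, including the empty string
print('-'.join(words))                # This-is-my-song
print(''.join(['b', 'a', 'r', 'e']))  # bare

# For a given separator, split undoes join
print(sentence.split(' ') == words)   # True
```

Every item must already be a string: ' '.join(['a', 1]) raises a TypeError, so numbers need converting first (for example with str()).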
Recall also that strings support a kind of “string arithmetic”: + concatenates strings and * repeats them.
'gaffe'
'a a a a a '
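A sketch of the two operators, with operands assumed so that they reproduce the outputs above:

```python
# + concatenates two strings
print('gaf' + 'fe')     # gaffe

# * repeats a string an integer number of times
print(repr('a ' * 5))   # 'a a a a a ' (note the trailing space)

# Other arithmetic operators are not defined for strings
try:
    'gaffe' - 'e'
except TypeError as err:
    print('TypeError:', err)
```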
8.0.1 String formatting
In older code, you will often see the str.format method, which fills {} placeholders in a string.
'Get off my horse!'
'Get off my car!'
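The calls behind these two outputs are not shown. A sketch that produces them, assuming a single template with one {} placeholder:

```python
template = 'Get off my {}!'  # assumed template string

# Each call fills the {} placeholder with its argument
print(template.format('horse'))  # Get off my horse!
print(template.format('car'))    # Get off my car!

# Placeholders can also be numbered (and reused) or named
print('{0} and {1}, then {0} again'.format('spam', 'eggs'))
print('{animal} says {sound}'.format(animal='cow', sound='moo'))
```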
This is great for templates, where named placeholders are filled in by keyword.
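template_string = """
{country}, our native village
There was a {species} tree.
We used to sleep under it.
"""
# The same template, filled in twice with different keyword arguments
print(template_string.format(country='India', species='banyan'))
print(template_string.format(country='Canada', species='maple'))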
India, our native village
There was a banyan tree.
We used to sleep under it.
Canada, our native village
There was a maple tree.
We used to sleep under it.
Python 3.6 introduced f-strings (formatted string literals). They are often easier to read than str.format and are generally faster.
'This is my USA!'
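A sketch of an f-string that reproduces this output, assuming a variable named country (the original cell is not shown):

```python
country = 'USA'  # assumed variable
print(f'This is my {country}!')   # This is my USA!

# Any expression can go inside the braces, plus an optional format spec after ':'
price = 3.14159
print(f'Price: {price:.2f}')      # Price: 3.14
print(f'{country.lower()} has {len(country)} letters')  # usa has 3 letters
```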
8.1 Regular expressions
Regular expressions are amazingly powerful tools for searching and manipulating strings. Some form of them is available in nearly every programming language. I’ll provide a short and far from comprehensive introduction here. The website regex101.com is a very good resource for learning and testing regular expressions.
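Before the pattern syntax, here is a sketch of three functions from Python’s re module that the examples in this section rely on. The sample text is an arbitrary choice; patterns are written as raw strings (r'...') so that backslashes reach the regex engine unchanged.

```python
import re

text = 'The rain in Spain stays mainly in the plain'

# re.search returns a Match object for the first match, or None if there is none
m = re.search(r'ain', text)
print(m.start(), m.group())    # 5 ain

# re.findall returns every non-overlapping match as a list of strings
print(re.findall(r'\w*ain\w*', text))  # ['rain', 'Spain', 'mainly', 'plain']

# re.sub replaces every match with the given string
print(re.sub(r'ain', 'AIN', text))     # The rAIN in SpAIN stays mAINly in the plAIN
```

\w (a word character) is not covered in the table that follows; it is one of several shorthand classes the re module supports.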
8.1.1 Pattern matching
Syntax | Description |
---|---|
. | Matches any one character (except a newline, by default) |
^ | Matches at the beginning of the string |
$ | Matches at the end of the string |
* | Matches 0 or more repetitions of the previous character (or group) |
+ | Matches 1 or more repetitions of the previous character (or group) |
? | Matches 0 or 1 repetitions of the previous character (or group) |
{m} | Matches exactly m repetitions of the previous character (or group) |
{m,n} | Matches from m to n repetitions of the previous character (or group) |
\ | Escape character (e.g. \. matches a literal dot) |
[ ] | A set of characters (e.g. [A-Z] will match any capital letter) |
( ) | Groups a pattern so it can be repeated or captured as a unit |
\| | OR (e.g. cat\|dog matches either word) |
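To see the table in action, here is a small sketch that tests each construct with re.search. The patterns and strings are illustrative choices, and \d (any digit) is an extra shorthand that the table does not list.

```python
import re

# (pattern, text) pairs; re.search looks for the pattern anywhere in the text
examples = [
    (r'b.re', 'bare'),            # .     any one character           -> True
    (r'^Les', 'Les Miserable'),   # ^     'Les' at the start          -> True
    (r'^Mis', 'Les Miserable'),   # ^     'Mis' is not at the start   -> False
    (r'able$', 'Les Miserable'),  # $     'able' at the end           -> True
    (r'ba*r', 'br'),              # *     zero 'a's is allowed        -> True
    (r'ba+r', 'br'),              # +     needs at least one 'a'      -> False
    (r'colou?r', 'color'),        # ?     the 'u' is optional         -> True
    (r'^\d{3}$', '123'),          # {m}   exactly three digits        -> True
    (r'^\d{2,3}$', '1234'),       # {m,n} two or three digits only    -> False
    (r'3\.14', '3.14'),           # \     escaped dot is a literal .  -> True
    (r'[A-Z]', 'bare'),           # [ ]   any capital letter          -> False
    (r'^(ha)+$', 'hahaha'),       # ( )   the group 'ha' repeated     -> True
    (r'cat|dog', 'hotdog'),       # |     either alternative          -> True
]

for pattern, text in examples:
    found = re.search(pattern, text) is not None
    print(f'{pattern:12} {text:15} {found}')

# Parentheses also capture what they matched, numbered from 1
m = re.search(r'(\d+)-(\d+)', 'pages 12-34')
print(m.group(1), m.group(2))  # 12 34
```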