In Python, a string is an immutable data type: once created, its value cannot be modified. This does not preclude processing string data, however. Python provides many tools for manipulating strings, all of which create new strings based on existing ones.
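As a minimal sketch of what immutability means in practice (the variable names here are illustrative and not from the original text), item assignment on a string raises a TypeError, so a changed value has to be built as a new string:

```python
greeting = "hello"

# Strings do not support item assignment, so this raises TypeError.
try:
    greeting[0] = "H"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Build a new string from pieces of the old one instead.
capitalized = "H" + greeting[1:]

print(assignment_failed)  # True
print(capitalized)        # Hello
print(greeting)           # hello (the original is unchanged)
```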
Common String Operators
Python operators can be used to perform basic string manipulations. Here are some common string operators in Python:
Concatenation (+): Unlike when it is applied to numeric data, the + operator concatenates two strings, combining them into a single string.
str1 = "Hello"
str2 = "Python"
result = str1 + ", " + str2 + "!"
print(result) # Output: Hello, Python!
## Hello, Python!
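One caveat worth sketching: the + operator only concatenates strings with other strings. The example below uses a hypothetical variable, count, to show that mixing in a number raises a TypeError unless the number is converted with str() first.

```python
count = 3

# Mixing str and int with + raises TypeError.
try:
    message = "Items: " + count
except TypeError:
    # Convert the number explicitly before concatenating.
    message = "Items: " + str(count)

print(message)  # Items: 3
```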
Repetition (*): The * operator is used to repeat a string a certain number of times.
str1 = "Hello "
result = str1 * 3
print(result) # Output: Hello Hello Hello
## Hello Hello Hello
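A small illustrative use of repetition, with made-up variable names, is drawing an underline that matches the length of a title. The sketch also shows that a count of zero or less yields an empty string.

```python
title = "Report"

# One dash per character of the title.
underline = "-" * len(title)

# A count of zero (or a negative count) gives an empty string.
empty = "abc" * 0

print(title)
print(underline)
print(repr(empty))  # ''
```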
Membership (in, not in): The in and not in operators check if a substring is present or absent in a string.
str1 = "Python"
print("Py" in str1) # Output: True
## True
print("Py" not in str1) # Output: False
## False
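Membership tests are case-sensitive. The following sketch (variable names are illustrative) shows one common way to make the check case-insensitive by lowercasing the string first.

```python
language = "Python"

# False: lowercase 'p' does not match uppercase 'P'.
exact_match = "py" in language

# True once the string is lowercased before the test.
folded_match = "py" in language.lower()

# The empty string is considered a substring of every string.
empty_match = "" in language

print(exact_match, folded_match, empty_match)  # False True True
```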
Comparison Operators: Comparison operators can also be used to compare strings based on lexicographical order.
str1 = "apple"
str2 = "app"
str3 = "apple pie"
str4 = "APPLE PIE"
str1 == str2 # Equality
## False
str1 != str2 # Inequality
## True
str3 > str4 # Greater than
## True
str3 >= str4 # Greater than or equal to
## True
str3 < str4 # Less than
## False
str3 <= str4 # Less than or equal to
## False
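The results above follow from how Python orders strings: character by character, using each character's Unicode code point. The sketch below uses the built-in ord() function to show why lowercase letters compare as greater than uppercase ones, and uses casefold() as one possible way to compare without regard to case.

```python
# ord() returns the Unicode code point of a single character.
print(ord("A"), ord("a"))  # 65 97

# Uppercase letters have smaller code points, so they sort first.
print("Zebra" < "apple")   # True

# A string that is a prefix of another sorts before it.
print("app" < "apple")     # True

# casefold() normalizes case for a case-insensitive comparison.
same_ignoring_case = "apple pie".casefold() == "APPLE PIE".casefold()
print(same_ignoring_case)  # True
```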
Common String Methods
In object-oriented programming languages like Python, a class serves as a blueprint that defines the structure and behavior of objects. For example, the character string is a built-in class in Python [1], equipped with a collection of methods, that is, functions that belong to instances of a specific class. These string methods are powerful tools for handling and manipulating strings.
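To make the idea of a method concrete, here is a small sketch with an illustrative variable, word. It shows that a string is an instance of the built-in str class and that calling a method on the instance is equivalent to calling the function on the class with the instance as its first argument.

```python
word = "python"

print(type(word))             # <class 'str'>
print(isinstance(word, str))  # True

# Calling the method on the instance...
from_instance = word.upper()

# ...is equivalent to calling the function on the class
# and passing the instance explicitly.
from_class = str.upper(word)

print(from_instance, from_class)  # PYTHON PYTHON
```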
string.split and string.rsplit
string.split(sep=None, maxsplit=-1)
string.rsplit(sep=None, maxsplit=-1)
The string.split method splits a string at each occurrence of the separator, working from the left, and returns a list of the pieces. The string.rsplit method does the same but works from the right. The two give different results only when maxsplit limits the number of splits.
marx = "Groucho and Harpo and Chico"
marx.split(" and ")
## ['Groucho', 'Harpo', 'Chico']
marx.rsplit(" and ")
## ['Groucho', 'Harpo', 'Chico']
Optionally, you can set the maximum number of splits with the maxsplit argument.
marx.split(sep=" and ", maxsplit=1)
## ['Groucho', 'Harpo and Chico']
marx.rsplit(sep=" and ", maxsplit=1)
## ['Groucho and Harpo', 'Chico']
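The default sep=None in the signatures above behaves differently from an explicit separator. As a sketch with an illustrative string: with no separator, runs of whitespace are treated as a single separator, whereas an explicit " " splits at every single space and can produce empty strings.

```python
# Note the two spaces between "Groucho" and "Harpo".
line = "Groucho  Harpo Chico"

# No separator: any run of whitespace counts as one separator.
default_split = line.split()

# Explicit separator: every single space splits, so the
# doubled space produces an empty string in the result.
space_split = line.split(" ")

print(default_split)  # ['Groucho', 'Harpo', 'Chico']
print(space_split)    # ['Groucho', '', 'Harpo', 'Chico']
```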
string.join
The string.join method concatenates the strings contained in an iterable (such as a list, a tuple, or even a string itself) into a single string, placing the string it is called on between consecutive items.
# Syntax for the method is separator.join(iterable)
marx = ["Chico", "Harpo", "Groucho"]
" and ".join(marx)
## 'Chico and Harpo and Groucho'
Note that if the iterable passed to the join method contains an item that is not a string, Python raises a TypeError.
my_tuple = ('1', '2', '3', 4)
", ".join(my_tuple)
## TypeError: sequence item 3: expected str instance, int found
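One common way around this error, sketched here with the same tuple, is to convert every item to a string before joining. Both a generator expression and map(str, ...) work.

```python
my_tuple = ('1', '2', '3', 4)

# Convert each item to str on the fly, then join.
joined = ", ".join(str(item) for item in my_tuple)
print(joined)  # 1, 2, 3, 4

# map(str, ...) is an equivalent spelling.
joined_with_map = ", ".join(map(str, my_tuple))
print(joined_with_map)  # 1, 2, 3, 4
```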
strip, lstrip, and rstrip
The strip, lstrip, and rstrip methods return a copy of the string with certain characters removed from its ends. The strip method removes the characters from both the left and right ends of the string, while lstrip and rstrip remove them from the left end and the right end, respectively.
my_palindrome = "madam"
my_palindrome.strip('m')
## 'ada'
my_palindrome.lstrip('m')
## 'adam'
my_palindrome.rstrip('m')
## 'mada'
The argument passed to these methods is treated as a set of characters to remove. Starting from the relevant end, Python removes every character that is in the set and stops at the first character that is not. The order of the characters in the argument therefore does not matter:
my_palindrome = "A man, a plan, a canal: Panama."
# Remove 'A', 'p', and 'c' from both ends until a character outside that set is encountered.
my_palindrome.strip("Apc")
## ' man, a plan, a canal: Panama.'
# Same set in a different order ('p', 'A', 'c'): the result is identical because order doesn't matter.
my_palindrome.strip("pAc")
## ' man, a plan, a canal: Panama.'
Here, I passed the character set "Apc", which contains 'A', 'p', and 'c'. Starting from the left, Python removes matching characters until it meets the first non-matching character, so only the leading 'A' is removed. It then does the same from the right; here the final '.' is not in the set, so nothing is removed from that end. Because the argument is treated as a set, passing "pAc" returns the same result.
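Because the argument is a set of characters rather than a literal prefix or suffix, the strip methods can remove more than intended. The sketch below uses a made-up URL to show the difference from str.removeprefix, which removes an exact prefix once (this method is assumed to be available, which requires Python 3.9 or later).

```python
url = "www.wonderful.com"

# lstrip treats "www." as the character set {'w', '.'} and keeps
# removing until it meets a character outside that set, so the
# 'w' of "wonderful" is removed as well.
stripped = url.lstrip("www.")
print(stripped)  # onderful.com

# removeprefix (Python 3.9+) removes the exact prefix once.
without_prefix = url.removeprefix("www.")
print(without_prefix)  # wonderful.com
```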
Again, strings are immutable, so the strip methods do not delete characters from the original object: they return another string with the characters removed.
my_palindrome
## 'A man, a plan, a canal: Panama.'
Replacing a String
The replace method returns a copy of the string in which occurrences of a given substring are replaced. Its arguments are the existing substring, the new substring, and, optionally, the maximum number of replacements to make. For example:
# Replace 'a' with 'A' three times from the left.
my_palindrome.replace('a', 'A', 3)
## 'A mAn, A plAn, a canal: Panama.'
The example above returns another string in which the first three occurrences of a are replaced with A. However, as mentioned earlier, a string is immutable, so the method does not actually change any characters in the original string.
# No replacement after applying the method
print(my_palindrome)
## A man, a plan, a canal: Panama.
A useful way to make the change stick is to reassign the variable to the result of the method. For example:
my_palindrome = my_palindrome.replace('a', 'A', 3)
print(my_palindrome)
## A mAn, A plAn, a canal: Panama.
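A few further behaviors of replace, sketched with a separate illustrative variable named sentence so that it does not depend on the reassignment above: omitting the count replaces every occurrence, a substring that is absent leaves the text unchanged, and calls can be chained because each one returns a new string.

```python
sentence = "A man, a plan, a canal: Panama."

# Without a count, every occurrence is replaced.
all_replaced = sentence.replace("a", "A")
print(all_replaced)  # A mAn, A plAn, A cAnAl: PAnAmA.

# If the substring is absent, the result equals the original.
unchanged = sentence.replace("z", "Z")
print(unchanged == sentence)  # True

# Calls can be chained because each one returns a new string.
no_punct = sentence.replace(",", "").replace(":", "").replace(".", "")
print(no_punct)  # A man a plan a canal Panama
```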
center, ljust, and rjust
In Python, the center(), ljust(), and rjust() methods are used to pad strings to a certain width. The center method centers the string within the specified width, while the ljust and rjust methods left- and right-justify the string, respectively.
The methods take two arguments: the desired width of the string and the character to use for padding. The default character for padding is a space. For example:
left = "Hello, World!".ljust(20, '>')
right = "Hello, World!".rjust(20, '<')
center = "Hello, World!".center(20)
len(left)
## 20
len(right)
## 20
len(center)
## 20
print(left)
## Hello, World!>>>>>>>
print(right)
## <<<<<<<Hello, World!
print('"', center, '"')
## "    Hello, World!     "
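As a sketch of how these methods combine, the example below lines up two columns of text using illustrative data and widths. It also shows that a width smaller than the string's length returns the string unchanged rather than truncating it.

```python
rows = [("Alabama", "Montgomery"), ("Alaska", "Juneau")]

# Left-justify the first column to 10 characters and
# right-justify the second to 12, padding with dots.
lines = [state.ljust(10, ".") + capital.rjust(12, ".")
         for state, capital in rows]

for line in lines:
    print(line)

# A width smaller than the string's length leaves it unchanged.
short = "Hello, World!".center(5)
print(short)  # Hello, World!
```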
expandtabs
The expandtabs() method in Python replaces each tab character in a string with enough spaces to reach the next tab stop. It can be useful when you work with HTML code. Tab stops occur every tabsize columns; the optional tabsize argument defaults to 8. For example, the following code replaces the tab character in the string "Hello\tworld!" with spaces:
"Hello\tworld!"
## 'Hello\tworld!'
"Hello\tworld!".expandtabs
## <built-in method expandtabs of str object at 0x000002B9B2E0E730>
"Hello\tworld!".expandtabs()
## 'Hello   world!'
"Hello\tworld!".expandtabs(8)
## 'Hello   world!'
"Hello\tworld!".expandtabs(1)
## 'Hello world!'
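The number of spaces inserted for each tab depends on the current column, not on a fixed count per tab. A minimal sketch with an illustrative string and tabsize=4, where tab stops fall at columns 0, 4, 8, and so on:

```python
row = "a\tbc\tdef"

# 'a' ends at column 1, so the first tab adds 3 spaces (to column 4).
# 'bc' ends at column 6, so the second tab adds 2 spaces (to column 8).
expanded = row.expandtabs(4)

print(repr(expanded))  # 'a   bc  def'
print(len(expanded))   # 11
```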
format
The format method offers an alternative approach to string formatting, distinct from f-strings. It is invoked on a string and uses placeholders ({}) to indicate where the values should be inserted. Here’s an example:
states = [
'Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado',
'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho',
'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine',
'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi',
'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey',
'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma',
'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota',
'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington',
'West Virginia', 'Wisconsin', 'Wyoming'
]
state_capitals = [
'Montgomery', 'Juneau', 'Phoenix', 'Little Rock', 'Sacramento', 'Denver',
'Hartford', 'Dover', 'Tallahassee', 'Atlanta', 'Honolulu', 'Boise',
'Springfield', 'Indianapolis', 'Des Moines', 'Topeka', 'Frankfort', 'Baton Rouge', 'Augusta',
'Annapolis', 'Boston', 'Lansing', 'St. Paul', 'Jackson',
'Jefferson City', 'Helena', 'Lincoln', 'Carson City', 'Concord', 'Trenton',
'Santa Fe', 'Albany', 'Raleigh', 'Bismarck', 'Columbus', 'Oklahoma City',
'Salem', 'Harrisburg', 'Providence', 'Columbia', 'Pierre', 'Nashville',
'Austin', 'Salt Lake City', 'Montpelier', 'Richmond', 'Olympia',
'Charleston', 'Madison', 'Cheyenne'
]
for i in range(50):
    formatted_string = "{city} is the capital of {state}".format(
        city=state_capitals[i], state=states[i]
    )
    print(formatted_string)
## Montgomery is the capital of Alabama
## Juneau is the capital of Alaska
## Phoenix is the capital of Arizona
## Little Rock is the capital of Arkansas
## Sacramento is the capital of California
## Denver is the capital of Colorado
## Hartford is the capital of Connecticut
## Dover is the capital of Delaware
## Tallahassee is the capital of Florida
## Atlanta is the capital of Georgia
## Honolulu is the capital of Hawaii
## Boise is the capital of Idaho
## Springfield is the capital of Illinois
## Indianapolis is the capital of Indiana
## Des Moines is the capital of Iowa
## Topeka is the capital of Kansas
## Frankfort is the capital of Kentucky
## Baton Rouge is the capital of Louisiana
## Augusta is the capital of Maine
## Annapolis is the capital of Maryland
## Boston is the capital of Massachusetts
## Lansing is the capital of Michigan
## St. Paul is the capital of Minnesota
## Jackson is the capital of Mississippi
## Jefferson City is the capital of Missouri
## Helena is the capital of Montana
## Lincoln is the capital of Nebraska
## Carson City is the capital of Nevada
## Concord is the capital of New Hampshire
## Trenton is the capital of New Jersey
## Santa Fe is the capital of New Mexico
## Albany is the capital of New York
## Raleigh is the capital of North Carolina
## Bismarck is the capital of North Dakota
## Columbus is the capital of Ohio
## Oklahoma City is the capital of Oklahoma
## Salem is the capital of Oregon
## Harrisburg is the capital of Pennsylvania
## Providence is the capital of Rhode Island
## Columbia is the capital of South Carolina
## Pierre is the capital of South Dakota
## Nashville is the capital of Tennessee
## Austin is the capital of Texas
## Salt Lake City is the capital of Utah
## Montpelier is the capital of Vermont
## Richmond is the capital of Virginia
## Olympia is the capital of Washington
## Charleston is the capital of West Virginia
## Madison is the capital of Wisconsin
## Cheyenne is the capital of Wyoming
The format() method can also take arguments by position as follows:
# arguments by position
for i in range(50):
    print('{0} is the capital of {1}'.format(state_capitals[i], states[i]))
## Montgomery is the capital of Alabama
## Juneau is the capital of Alaska
## Phoenix is the capital of Arizona
## Little Rock is the capital of Arkansas
## Sacramento is the capital of California
## Denver is the capital of Colorado
## Hartford is the capital of Connecticut
## Dover is the capital of Delaware
## Tallahassee is the capital of Florida
## Atlanta is the capital of Georgia
## Honolulu is the capital of Hawaii
## Boise is the capital of Idaho
## Springfield is the capital of Illinois
## Indianapolis is the capital of Indiana
## Des Moines is the capital of Iowa
## Topeka is the capital of Kansas
## Frankfort is the capital of Kentucky
## Baton Rouge is the capital of Louisiana
## Augusta is the capital of Maine
## Annapolis is the capital of Maryland
## Boston is the capital of Massachusetts
## Lansing is the capital of Michigan
## St. Paul is the capital of Minnesota
## Jackson is the capital of Mississippi
## Jefferson City is the capital of Missouri
## Helena is the capital of Montana
## Lincoln is the capital of Nebraska
## Carson City is the capital of Nevada
## Concord is the capital of New Hampshire
## Trenton is the capital of New Jersey
## Santa Fe is the capital of New Mexico
## Albany is the capital of New York
## Raleigh is the capital of North Carolina
## Bismarck is the capital of North Dakota
## Columbus is the capital of Ohio
## Oklahoma City is the capital of Oklahoma
## Salem is the capital of Oregon
## Harrisburg is the capital of Pennsylvania
## Providence is the capital of Rhode Island
## Columbia is the capital of South Carolina
## Pierre is the capital of South Dakota
## Nashville is the capital of Tennessee
## Austin is the capital of Texas
## Salt Lake City is the capital of Utah
## Montpelier is the capital of Vermont
## Richmond is the capital of Virginia
## Olympia is the capital of Washington
## Charleston is the capital of West Virginia
## Madison is the capital of Wisconsin
## Cheyenne is the capital of Wyoming
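For comparison, here is a sketch of the same idea written with zip and an f-string. The two short lists, sample_states and sample_capitals, are illustrative stand-ins for the full 50-item lists. The last lines show that format specifications, such as left-aligning a value in a fixed width, work with format() too.

```python
# Short stand-ins for the full lists of states and capitals.
sample_states = ["Alabama", "Alaska", "Arizona"]
sample_capitals = ["Montgomery", "Juneau", "Phoenix"]

sentences = []
# zip pairs the items, so no index variable is needed.
for city, state in zip(sample_capitals, sample_states):
    by_format = "{city} is the capital of {state}".format(city=city, state=state)
    by_fstring = f"{city} is the capital of {state}"
    # Both approaches build the same string.
    sentences.append((by_format, by_fstring))
    print(by_fstring)

# A format specification: left-align the value in 12 columns.
aligned = "{:<12}|".format("Juneau")
print(aligned)  # Juneau      |
```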
[1] The term class is not directly synonymous with data type, although there is a relationship between the two concepts. A data type refers to a categorization of data in terms of which operations can be performed on it and how that data is stored. In Python, classes are used to define data types, and objects created from these classes have associated data types.