String Manipulations in Python

In Python, a string is an immutable data type: its value cannot be modified once it is created. This does not rule out processing string data, however. Python provides many tools for manipulating strings, all of which create new strings based on existing ones.
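As a small illustration of immutability (the variable names are only for demonstration), assigning to a character position fails, while a string method hands back a new object and leaves the original untouched:

```python
greeting = "hello"

# Item assignment is not allowed on a str object.
try:
    greeting[0] = "H"
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Methods return a new string; the original is unchanged.
shouted = greeting.upper()

print(assignment_failed)
## True
print(greeting)
## hello
print(shouted)
## HELLO
```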

Common String Operators

Python operators can be used to perform basic string manipulations. Here are some common string operators in Python:

Concatenation (+): Unlike its use with numeric data, the + operator applied to two strings concatenates them, combining them into a single string.

str1 = "Hello"
str2 = "Python"
result = str1 + ", " + str2 + "!"
print(result) # Output: Hello, Python!
## Hello, Python!


Repetition (*): The * operator is used to repeat a string a certain number of times.

str1 = "Hello "
result = str1 * 3
print(result) # Output: Hello Hello Hello 
## Hello Hello Hello


Membership (in, not in): The in and not in operators check if a substring is present or absent in a string.

str1 = "Python"
print("Py" in str1) # Output: True
## True
print("Py" not in str1) # Output: False
## False


Comparison Operators: Comparison operators can also be used to compare strings based on lexicographical order.

str1 = "apple"
str2 = "app"
str3 = "apple pie"
str4 = "APPLE PIE"


str1 == str2 # Equality
## False
str1 != str2 # Inequality
## True
str3 > str4 # Greater than
## True
str3 >= str4 # Greater than or equal to
## True
str3 < str4 # Less than
## False
str3 <= str4 # Less than or equal to
## False
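These results follow from Python comparing strings character by character using Unicode code points, in which uppercase ASCII letters come before lowercase ones. A short sketch follows; using casefold for a case-insensitive comparison is one common approach, not the only one:

```python
str3 = "apple pie"
str4 = "APPLE PIE"

# The code points of the first differing characters decide the comparison.
print(ord("a"), ord("A"))
## 97 65

# A string that is a prefix of a longer one sorts first.
print("app" < "apple")
## True

# Normalizing case first gives a case-insensitive comparison.
print(str3.casefold() == str4.casefold())
## True
```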



Common String Methods

In object-oriented programming languages like Python, a class serves as a blueprint that defines the structure and behavior of objects. The character string, for example, is a built-in class in Python[1], equipped with a collection of methods, that is, functions that belong to instances of a specific class. These string methods are powerful tools for handling and manipulating strings.
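To make the relationship between the str class and its methods concrete, here is a minimal sketch; the variable name word is only illustrative:

```python
word = "python"

# Every string is an instance of the built-in str class.
print(type(word))
## <class 'str'>

# A method is called on the instance with dot notation...
print(word.upper())
## PYTHON

# ...which is equivalent to calling the function stored on the class.
print(str.upper(word))
## PYTHON
```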

string.split and string.rsplit

string.split(sep=None, maxsplit=-1)
string.rsplit(sep=None, maxsplit=-1)


The string.split method splits a string at each occurrence of the separator, working from the left, and returns a list of the pieces. The string.rsplit method does the same but works from the right. When maxsplit is not given, both return the same list.

marx = "Groucho and Harpo and Chico"

marx.split(" and ") 
## ['Groucho', 'Harpo', 'Chico']
marx.rsplit(" and ")
## ['Groucho', 'Harpo', 'Chico']


Optionally, you can limit the number of splits with the maxsplit argument. This is where the two methods differ:

marx.split(sep=" and ", maxsplit=1)
## ['Groucho', 'Harpo and Chico']
marx.rsplit(sep=" and ", maxsplit=1)
## ['Groucho and Harpo', 'Chico']
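Two further cases are worth sketching: the default sep=None, which splits on runs of whitespace, and maxsplit=1 for separating a key from a value that may itself contain the separator. The sample strings below are made up for illustration.

```python
# With no separator, consecutive whitespace counts as one delimiter
# and leading/trailing whitespace is ignored.
words = "  Groucho   Harpo\tChico ".split()
print(words)
## ['Groucho', 'Harpo', 'Chico']

# maxsplit=1 keeps everything after the first '=' together.
key, value = "url=https://example.com/?q=1".split("=", maxsplit=1)
print(key)
## url
print(value)
## https://example.com/?q=1

# rsplit with maxsplit=1 peels off the last component instead.
stem, extension = "archive.tar.gz".rsplit(".", maxsplit=1)
print(stem, extension)
## archive.tar gz
```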


string.join

The string.join method concatenates multiple character strings included in an iterable (such as a list, tuple, or even a string itself) into a single string.

# Syntax for the method is separator.join(iterable)
marx = ["Chico", "Harpo", "Groucho"]

" and ".join(marx)
## 'Chico and Harpo and Groucho'


Note that if the iterable passed to the join method contains an item that is not a string, Python raises a TypeError.

my_tuple = ('1', '2', '3', 4)
", ".join(my_tuple)
## TypeError: sequence item 3: expected str instance, int found
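One common way around this error is to convert each item to a string before joining. The sketch below uses map(str, ...); a generator expression would work equally well.

```python
my_tuple = ('1', '2', '3', 4)

# Convert every item to str first, then join.
joined = ", ".join(map(str, my_tuple))
print(joined)
## 1, 2, 3, 4

# A string is itself an iterable of characters.
print("-".join("abc"))
## a-b-c
```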


strip, lstrip, and rstrip

The strip, lstrip, and rstrip methods return a copy of the string with certain characters removed from its ends. The strip method removes them from both ends, while lstrip and rstrip remove them only from the left end and the right end, respectively.

my_palindrome = "madam"

my_palindrome.strip('m')
## 'ada'
my_palindrome.lstrip('m')
## 'adam'
my_palindrome.rstrip('m')
## 'mada'


The argument passed to the method is the set of characters you want to remove. Python removes characters that belong to the set until it reaches one that does not, so the order of the characters in the argument does not matter:

my_palindrome = "A man, a plan, a canal: Panama."

# Removing characters 'A', 'p', and 'c' from both ends until a character outside the set is encountered.

my_palindrome.strip("Apc")
## ' man, a plan, a canal: Panama.'
# The same set written in a different order ('p', 'A', 'c'): the order doesn't matter.
my_palindrome.strip("pAc")
## ' man, a plan, a canal: Panama.'


Here, I passed the character set "Apc", that is, the set containing 'A', 'p', and 'c'. Starting from the left, Python removes matching characters until it meets the first non-matching character, so only the leading 'A' is removed. It then does the same starting from the right, where the final '.' is not in the set, so nothing is removed. Because the argument is treated as a set, passing "pAc" returns the same result.

Again, strings are immutable, so applying a strip method does not delete characters from the original object: it returns another string with those characters removed.

my_palindrome
## 'A man, a plan, a canal: Panama.'
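Two related behaviors can be sketched here. Called without an argument, strip removes leading and trailing whitespace. And because the argument is a set of characters rather than a literal suffix, rstrip can remove more than intended; str.removesuffix (available from Python 3.9) removes an exact suffix instead. The file name is a made-up example.

```python
# No argument: whitespace (spaces, tabs, newlines) is stripped.
print("  madam \n".strip())
## madam

filename = "report.txt"

# '.', 't', and 'x' are treated as a set, so the final 't' of
# 'report' is stripped as well.
print(filename.rstrip(".txt"))
## repor

# removesuffix (Python 3.9+) removes the exact suffix only.
print(filename.removesuffix(".txt"))
## report
```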


Replacing a String

The replace method returns a copy of the string in which a specific substring has been replaced. Its arguments are the existing substring, the new substring, and, optionally, the maximum number of replacements to make. For example:

# Replace 'a' with 'A' three times from the left.
my_palindrome.replace('a', 'A', 3)
## 'A mAn, A plAn, a canal: Panama.'


The example above returns another string in which the first three occurrences of a have been replaced with A. However, as mentioned earlier, a string is immutable, so the method does not actually change any characters of the original string.

# No replacement after applying the method
print(my_palindrome)
## A man, a plan, a canal: Panama.


To keep the change, re-assign the variable to the string returned by the method. For example:

my_palindrome = my_palindrome.replace('a', 'A', 3)
print(my_palindrome)
## A mAn, A plAn, a canal: Panama.
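When the count argument is omitted, replace substitutes every occurrence, and because each call returns a new string, calls can be chained. A brief sketch using a fresh variable (phrase and cleaned are illustrative names):

```python
phrase = "A man, a plan, a canal: Panama."

# Without a count, every 'a' is replaced.
print(phrase.replace('a', 'A'))
## A mAn, A plAn, A cAnAl: PAnAmA.

# Chained calls: drop the punctuation, then the spaces.
cleaned = phrase.replace(',', '').replace(':', '').replace('.', '').replace(' ', '')
print(cleaned)
## AmanaplanacanalPanama

# Lowercased, the cleaned text reads the same in both directions.
print(cleaned.lower() == cleaned.lower()[::-1])
## True
```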


center, ljust, and rjust

In Python, the center(), ljust(), and rjust() methods pad a string to a given width. The center method centers the string within the specified width, while the ljust and rjust methods left- and right-justify the string, respectively.

The methods take two arguments: the desired width of the result and, optionally, the character to use for padding, which defaults to a space. For example:

left = "Hello, World!".ljust(20, '>')
right = "Hello, World!".rjust(20, '<')
center = "Hello, World!".center(20)

len(left)
## 20
len(right)
## 20
len(center)
## 20


print(left)
## Hello, World!>>>>>>>
print(right)
## <<<<<<<Hello, World!
print('"' + center + '"')  # concatenate so print adds no extra spaces
## "   Hello, World!    "
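A typical use of these methods is lining up columns of text output. The rows below are made-up sample data, and the column widths are arbitrary choices.

```python
rows = [("Apple", 3), ("Banana", 12), ("Cherry", 150)]

lines = []
for name, quantity in rows:
    # Left-justify the name in 10 columns, right-justify the number in 5.
    lines.append(name.ljust(10, '.') + str(quantity).rjust(5))

print("\n".join(lines))
## Apple.....    3
## Banana....   12
## Cherry....  150
```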


expandtabs

The expandtabs() method replaces each tab character in a string with one or more spaces, padding the text up to the next tab stop. Tab stops occur every tabsize columns, where the optional tabsize argument defaults to 8. It can be useful when you work with HTML code. For example, the following code expands the tab character in the string "Hello\tworld!" using different tab sizes:

"Hello\tworld!"
## 'Hello\tworld!'
"Hello\tworld!".expandtabs
## <built-in method expandtabs of str object at 0x000002B9B2E0E730>
"Hello\tworld!".expandtabs()
## 'Hello   world!'
"Hello\tworld!".expandtabs(8)
## 'Hello   world!'
"Hello\tworld!".expandtabs(1)
## 'Hello world!'
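The number of spaces inserted depends on the current column, not only on tabsize: each tab advances to the next column that is a multiple of tabsize. A short sketch with a made-up string:

```python
text = "a\tbc\tdef"

# With tabsize=4, tab stops fall at columns 4, 8, 12, ...
expanded = text.expandtabs(4)
print(repr(expanded))
## 'a   bc  def'

# 'a' ends at column 1, so the first tab becomes 3 spaces;
# 'bc' ends at column 6, so the second tab becomes 2 spaces.
print(len(expanded))
## 11
```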


format

The format method offers an alternative approach to string formatting, distinct from f-strings. It is invoked on a string and uses placeholders ({}) to indicate where the values should be inserted. Here’s an example:

states = [
    'Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado',
    'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho',
    'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine',
    'Maryland', 'Massachusetts', 'Michigan', 'Minnesota', 'Mississippi',
    'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey',
    'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma',
    'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota',
    'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington',
    'West Virginia', 'Wisconsin', 'Wyoming'
]

state_capitals = [
    'Montgomery', 'Juneau', 'Phoenix', 'Little Rock', 'Sacramento', 'Denver',
    'Hartford', 'Dover', 'Tallahassee', 'Atlanta', 'Honolulu', 'Boise',
    'Springfield', 'Indianapolis', 'Des Moines', 'Topeka', 'Frankfort', 'Baton Rouge', 'Augusta',
    'Annapolis', 'Boston', 'Lansing', 'St. Paul', 'Jackson',
    'Jefferson City', 'Helena', 'Lincoln', 'Carson City', 'Concord', 'Trenton',
    'Santa Fe', 'Albany', 'Raleigh', 'Bismarck', 'Columbus', 'Oklahoma City',
    'Salem', 'Harrisburg', 'Providence', 'Columbia', 'Pierre', 'Nashville',
    'Austin', 'Salt Lake City', 'Montpelier', 'Richmond', 'Olympia',
    'Charleston', 'Madison', 'Cheyenne'
]

for i in range(50):
    formatted_string = "{city} is the capital of {state}".format(city = state_capitals[i], 
                                                                 state = states[i])
    print(formatted_string)
## Montgomery is the capital of Alabama
## Juneau is the capital of Alaska
## Phoenix is the capital of Arizona
## Little Rock is the capital of Arkansas
## Sacramento is the capital of California
## Denver is the capital of Colorado
## Hartford is the capital of Connecticut
## Dover is the capital of Delaware
## Tallahassee is the capital of Florida
## Atlanta is the capital of Georgia
## Honolulu is the capital of Hawaii
## Boise is the capital of Idaho
## Springfield is the capital of Illinois
## Indianapolis is the capital of Indiana
## Des Moines is the capital of Iowa
## Topeka is the capital of Kansas
## Frankfort is the capital of Kentucky
## Baton Rouge is the capital of Louisiana
## Augusta is the capital of Maine
## Annapolis is the capital of Maryland
## Boston is the capital of Massachusetts
## Lansing is the capital of Michigan
## St. Paul is the capital of Minnesota
## Jackson is the capital of Mississippi
## Jefferson City is the capital of Missouri
## Helena is the capital of Montana
## Lincoln is the capital of Nebraska
## Carson City is the capital of Nevada
## Concord is the capital of New Hampshire
## Trenton is the capital of New Jersey
## Santa Fe is the capital of New Mexico
## Albany is the capital of New York
## Raleigh is the capital of North Carolina
## Bismarck is the capital of North Dakota
## Columbus is the capital of Ohio
## Oklahoma City is the capital of Oklahoma
## Salem is the capital of Oregon
## Harrisburg is the capital of Pennsylvania
## Providence is the capital of Rhode Island
## Columbia is the capital of South Carolina
## Pierre is the capital of South Dakota
## Nashville is the capital of Tennessee
## Austin is the capital of Texas
## Salt Lake City is the capital of Utah
## Montpelier is the capital of Vermont
## Richmond is the capital of Virginia
## Olympia is the capital of Washington
## Charleston is the capital of West Virginia
## Madison is the capital of Wisconsin
## Cheyenne is the capital of Wyoming


The format() method can also take arguments by position as follows:

# arguments by position
for i in range(50):
    print('{0} is the capital of {1}'.format(state_capitals[i], states[i]))
## Montgomery is the capital of Alabama
## Juneau is the capital of Alaska
## Phoenix is the capital of Arizona
## Little Rock is the capital of Arkansas
## Sacramento is the capital of California
## Denver is the capital of Colorado
## Hartford is the capital of Connecticut
## Dover is the capital of Delaware
## Tallahassee is the capital of Florida
## Atlanta is the capital of Georgia
## Honolulu is the capital of Hawaii
## Boise is the capital of Idaho
## Springfield is the capital of Illinois
## Indianapolis is the capital of Indiana
## Des Moines is the capital of Iowa
## Topeka is the capital of Kansas
## Frankfort is the capital of Kentucky
## Baton Rouge is the capital of Louisiana
## Augusta is the capital of Maine
## Annapolis is the capital of Maryland
## Boston is the capital of Massachusetts
## Lansing is the capital of Michigan
## St. Paul is the capital of Minnesota
## Jackson is the capital of Mississippi
## Jefferson City is the capital of Missouri
## Helena is the capital of Montana
## Lincoln is the capital of Nebraska
## Carson City is the capital of Nevada
## Concord is the capital of New Hampshire
## Trenton is the capital of New Jersey
## Santa Fe is the capital of New Mexico
## Albany is the capital of New York
## Raleigh is the capital of North Carolina
## Bismarck is the capital of North Dakota
## Columbus is the capital of Ohio
## Oklahoma City is the capital of Oklahoma
## Salem is the capital of Oregon
## Harrisburg is the capital of Pennsylvania
## Providence is the capital of Rhode Island
## Columbia is the capital of South Carolina
## Pierre is the capital of South Dakota
## Nashville is the capital of Tennessee
## Austin is the capital of Texas
## Salt Lake City is the capital of Utah
## Montpelier is the capital of Vermont
## Richmond is the capital of Virginia
## Olympia is the capital of Washington
## Charleston is the capital of West Virginia
## Madison is the capital of Wisconsin
## Cheyenne is the capital of Wyoming
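The same loop can be written without indexing by pairing the two lists with zip, and the placeholders accept format specifications such as alignment and width. The sketch below uses only the first three entries to stay short, and the width of 12 is an arbitrary choice. An equivalent f-string is shown for comparison.

```python
states = ['Alabama', 'Alaska', 'Arizona']
state_capitals = ['Montgomery', 'Juneau', 'Phoenix']

for city, state in zip(state_capitals, states):
    # '<12' left-aligns the city in a field 12 characters wide.
    print('{city:<12}| {state}'.format(city=city, state=state))
## Montgomery  | Alabama
## Juneau      | Alaska
## Phoenix     | Arizona

# The f-string form produces the same text.
city, state = 'Juneau', 'Alaska'
print(f'{city:<12}| {state}' == '{0:<12}| {1}'.format(city, state))
## True
```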

  1. The term class is not directly synonymous with data type, although there is a relationship between the two concepts. A data type refers to a categorization of data in terms of which operations can be performed on it and how that data is stored. In Python, classes are used to define data types, and objects created from these classes have associated data types.
