Split string with multiple delimiters in Python

Python's string split() method makes it easy to split a string into a list based on a delimiter. In some cases, though, you need to split on not just one but several delimiters. This quick 101 article introduces two convenient ways to achieve this in Python.

Split String With Two Delimiters in Python

Assume the following string.

text = "python is, an easy;language; to, learn."

For our example, we need to split it either by a semicolon followed by a space ("; ") or by a comma followed by a space (", "). Any semicolon or comma that is not followed by a space should be left untouched.
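To see why a single call to str.split() is not enough here, the short sketch below splits the example string on just one of the two delimiters. The variable name parts is only illustrative.

```python
text = "python is, an easy;language; to, learn."

# str.split() accepts a single separator string per call,
# so the "; " delimiter is ignored in this result.
parts = text.split(", ")
print(parts)
# ['python is', 'an easy;language; to', 'learn.']
```

The "; " in the middle piece is still there, which is the gap the regular expression approaches below fill.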

Regular Expressions

A delimiter is a sequence of one or more characters used to specify the boundary between separate, independent regions in plain text or other data streams. An example of a delimiter is the comma character, which acts as a field delimiter in a sequence of comma-separated values.

Use Basic Expression

Python's built-in re module provides a split() function we can use for this case.

Let's use a basic alternation, a|b (match a or b), to combine our delimiters into one pattern.

import re

text = "python is, an easy;language; to, learn."
print(re.split('; |, ', text))

Output:

['python is', 'an easy;language', 'to', 'learn.']

As the Wikipedia page on regular expressions notes, IEEE POSIX standardizes one regular expression syntax. Python's re module uses a Perl-style syntax instead, but the basic constructs are shared, so there are several other ways to write a regular expression that matches our use case.

Instead of separating our delimiters with the bar character (|), we can achieve the same result with a character set, written with square brackets ([]). Any single character listed inside the brackets can match at that position.

Therefore, when specifying the pattern, we can place the semicolon and comma inside square brackets and follow them with a space: "[;,] ". This matches any part of the string that consists of exactly one semicolon or comma followed by a space.

import re

text = "python is, an easy;language; to, learn."
print(re.split("[;,] ", text))
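As a quick check, the sketch below runs both patterns on the example string and compares the results. The variable names alternation and char_class are only illustrative.

```python
import re

text = "python is, an easy;language; to, learn."

# Same delimiters expressed two ways: alternation vs. character set.
alternation = re.split("; |, ", text)
char_class = re.split("[;,] ", text)

print(char_class)
# ['python is', 'an easy;language', 'to', 'learn.']
print(alternation == char_class)
# True
```

The character set form is shorter, but it only works when every delimiter differs by a single character; delimiters of different lengths still need alternation.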

Make It a Function

The basic expressions above are limited to a hardcoded set of separators. That becomes a hassle when the delimiters change, and it limits reuse in other parts of the code. It is better practice to make the code generic and reusable, so let's wrap the logic in a Python function.

import re

text = "python is, an easy;language; to, learn."
separators = "; ", ", "

def custom_split(delimiters, string, maxsplit=0):
    # Build the pattern dynamically: escape each delimiter, then join with |
    regular_exp = '|'.join(map(re.escape, delimiters))
    # Pass maxsplit by keyword; passing it positionally is deprecated in Python 3.13
    return re.split(regular_exp, string, maxsplit=maxsplit)

print(custom_split(separators, text))

re.escape() lets us build the pattern automatically: it escapes any characters in the delimiters that have a special meaning in a regular expression, so each delimiter is matched literally.
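The sketch below illustrates why the escaping matters, using delimiters that are all regular expression metacharacters. The function is repeated so the example is self-contained; the sample delimiters and input string are made up for illustration.

```python
import re

def custom_split(delimiters, string, maxsplit=0):
    # Escape each delimiter so it is matched literally, then join with |
    pattern = "|".join(map(re.escape, delimiters))
    return re.split(pattern, string, maxsplit=maxsplit)

# Hypothetical delimiters that are all special characters in a regex.
special = (".", "|", "+")
escaped_pattern = "|".join(map(re.escape, special))
print(escaped_pattern)
# \.|\||\+

all_parts = custom_split(special, "a.b|c+d")
print(all_parts)
# ['a', 'b', 'c', 'd']

# maxsplit limits the number of splits performed.
first_only = custom_split(special, "a.b|c+d", maxsplit=1)
print(first_only)
# ['a', 'b|c+d']
```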

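If you'd like to keep the original delimiters in the resulting pieces, you can change the regex to use a lookbehind assertion instead. The pattern then matches the empty position right after each delimiter, so nothing is consumed (splitting on empty matches requires Python 3.7 or later):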

>>> import re
>>> delimiters = "a", "...", "(c)"
>>> example = "stackoverflow (c) is awesome... isn't it?"
>>> regexPattern = '|'.join('(?<={})'.format(re.escape(delim)) for delim in delimiters)
>>> regexPattern
'(?<=a)|(?<=\\.\\.\\.)|(?<=\\(c\\))'
>>> re.split(regexPattern, example)
['sta', 'ckoverflow (c)', ' is a', 'wesome...', " isn't it?"]

Replace ?<= with ?= (a lookahead assertion) to attach each delimiter to the start of the piece that follows it, instead of the end of the piece that precedes it.
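A minimal sketch of the lookahead variant, reusing the same delimiters and example string; the names lookahead_pattern and pieces are only illustrative. As with the lookbehind version, it assumes Python 3.7 or later.

```python
import re

delimiters = ("a", "...", "(c)")
example = "stackoverflow (c) is awesome... isn't it?"

# (?=...) matches the empty position just before each delimiter,
# so the delimiter stays at the start of the following piece.
lookahead_pattern = "|".join("(?={})".format(re.escape(d)) for d in delimiters)
pieces = re.split(lookahead_pattern, example)
print(pieces)
# ['st', 'ackoverflow ', '(c) is ', 'awesome', "... isn't it?"]
```

Because no characters are consumed, joining the pieces back together reproduces the original string.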