Splitting a list into two distinct sublists


Let's say you have the following list, or string:

ababcdcd

How would you find the point between abab and cdcd, i.e., the point where the two substrings have no characters in common?

s = "ababcdcd"
for i in range(1, len(s)):
    if not set(s[:i]) & set(s[i:]):
        print(s[:i], s[i:])
        break

This solution converts the two substrings into sets and intersects them with &. When the intersection is empty, the substrings have no characters in common, and the loop stops at the first index where that happens.
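Because the check relies only on slicing and set, the same loop should also work on a list of hashable items. A minimal sketch, where the function name first_split is a hypothetical helper and not part of the original solution:

```python
def first_split(seq):
    """Return (left, right) at the first index where the halves share no items, else None."""
    for i in range(1, len(seq)):
        # An empty intersection means the two halves have nothing in common.
        if not set(seq[:i]) & set(seq[i:]):
            return seq[:i], seq[i:]
    return None  # no split point exists


print(first_split("ababcdcd"))             # ('abab', 'cdcd')
print(first_split([1, 2, 1, 2, 3, 4, 3]))  # ([1, 2, 1, 2], [3, 4, 3])
print(first_split("abca"))                 # None
```

Returning None when no split exists is a design choice made for this sketch; the original loop simply prints nothing in that case.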

The solution can also be expressed as a generator expression:

s = "ababcdcd"
gen = ((s[:i], s[i:]) for i in range(1, len(s)) if not set(s[:i]) & set(s[i:]))
print(next(gen))
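One caveat: next(gen) raises StopIteration when the generator yields nothing, which happens if no split point exists. A small sketch of passing a default to next instead; the input "abca" is only an illustrative example:

```python
s = "abca"  # 'a' appears at both ends, so no split point exists
gen = ((s[:i], s[i:]) for i in range(1, len(s)) if not set(s[:i]) & set(s[i:]))

# The second argument to next() is returned instead of raising StopIteration.
result = next(gen, None)
print(result)  # None
```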

Getting all distinct substrings

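Printing out all distinct substrings is also straightforward: collect every split index, bracket the list with 0 and len(s), and slice between consecutive indices: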

s = "ababcdcdefefef"
indices = [0] + [i for i in range(1, len(s)) if not set(s[:i]) & set(s[i:])] + [len(s)]
print([s[i:j] for i, j in zip(indices, indices[1:])])


['abab', 'cdcd', 'efefef']
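The comprehension above rebuilds two sets for every index, so the work grows roughly quadratically with the input length. For long inputs, a different technique may help: record the last position of each item, then close a part whenever the scan reaches the furthest last position seen so far. This is a sketch of that alternative, not the original method, and the name split_distinct is hypothetical:

```python
def split_distinct(seq):
    """Split seq into consecutive parts that share no items with each other."""
    # Last index at which each item occurs.
    last = {item: idx for idx, item in enumerate(seq)}
    parts, start, end = [], 0, 0
    for idx, item in enumerate(seq):
        # The current part must extend at least to the last occurrence of item.
        end = max(end, last[item])
        if idx == end:
            # Nothing seen so far occurs later, so the part can be closed here.
            parts.append(seq[start:idx + 1])
            start = idx + 1
    return parts


print(split_distinct("ababcdcdefefef"))  # ['abab', 'cdcd', 'efefef']
```

For non-empty input this should give the same parts as the set-based version; for an empty input it returns an empty list rather than a list containing one empty string.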

See also:

shifted zip