I'm looking for a string.contains or string.indexof method in Python.
I want to do:
if not somestring.contains("blah"):
    continue
if "blah".lower() not in somestring.lower():
Sure, "blah" is already lower case, but if you replaced it with something else (like a variable) it might not be.
Commented
Jul 3 at 19:25
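The case-insensitive check from the comment above could be wrapped in a small helper. This is only a sketch: the name contains_ignore_case is hypothetical, and using str.casefold() instead of lower() is an assumption that more aggressive Unicode folding is wanted. For plain ASCII text, calling lower() on both sides behaves the same way.

```python
def contains_ignore_case(haystack: str, needle: str) -> bool:
    """Return True if needle occurs in haystack, ignoring case."""
    # Fold both sides so neither the needle nor the haystack has to be
    # lower case already (the needle may come from a variable).
    return needle.casefold() in haystack.casefold()


print(contains_ignore_case("Some BLAH here", "blah"))   # True
print(contains_ignore_case("Some text here", "Blah"))   # False
```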
You can use str.find:
s = "This be a string"
if s.find("is") == -1:
    print("Not found")
else:
    print("Found")
The find() method should be used only if you need to know the position of sub. To check if sub is a substring or not, use the in operator. (c) Python reference
if ' is ' in s:
which will return False, as is (probably) expected.
Commented
Aug 9, 2010 at 3:22
\bis\b (word boundaries).
is inside "This be a string." That will evaluate to True because of the is in This. This is bad for programs that search for words, like swear filters (for example, a dumb word check for "ass" would also catch "grass").
Commented
Jun 19, 2022 at 18:44
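The word-boundary idea from the comments above (\bis\b) might look like the following with the re module. It is a rough sketch: contains_word is a hypothetical helper name, and it assumes the word starts and ends with a letter, digit or underscore, since \b only matches between a word character and a non-word character.

```python
import re


def contains_word(text: str, word: str) -> bool:
    """Return True if word appears in text as a whole word."""
    # re.escape keeps characters such as "." or "+" in word literal;
    # \b matches at a boundary between a word character and a non-word one.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text) is not None


s = "This be a string"
print("is" in s)                                    # True: matches inside "This"
print(contains_word(s, "is"))                       # False: no standalone "is"
print(contains_word("the grass is green", "ass"))   # False
print(contains_word("the grass is green", "is"))    # True
```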
Does Python have a string contains substring method?
99% of use cases will be covered using the keyword in, which returns True or False:
'substring' in any_string
For the use case of getting the index, use str.find (which returns -1 on failure, and has optional positional arguments):
start = 0
stop = len(any_string)
any_string.find('substring', start, stop)
or str.index (like find but raises ValueError on failure):
start = 100
end = 1000
any_string.index('substring', start, end)
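To make the difference between the two concrete, here is a small sketch with a made-up sample string (any_string is given an arbitrary value purely for illustration). It shows the return values of find, the effect of the optional start argument, and the exception raised by index.

```python
any_string = "a string with a substring inside"

# find returns the lowest index of the match, or -1 when there is none.
print(any_string.find("substring"))    # 16
print(any_string.find("missing"))      # -1

# The optional start/stop arguments restrict the search to a slice,
# but the returned index is still relative to the whole string.
print(any_string.find("string"))       # 2
print(any_string.find("string", 5))    # 19

# index behaves the same way on success but raises on failure.
try:
    any_string.index("missing")
except ValueError:
    print("index raised ValueError")
```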
Use the in
comparison operator because
>>> 'foo' in '**foo**'
True
The opposite (complement), which the original question asked for, is not in:
>>> 'foo' not in '**foo**' # returns False
False
This is semantically the same as not 'foo' in '**foo**', but it's much more readable and explicitly provided for in the language as a readability improvement.
__contains__
The "contains" method implements the behavior for in. This example,
str.__contains__('**foo**', 'foo')
returns True. You could also call this function from the instance of the superstring:
'**foo**'.__contains__('foo')
But don't. Methods that start with underscores are considered semantically non-public. The only reason to use this is when implementing or extending the in and not in functionality (e.g. if subclassing str):
class NoisyString(str):
    def __contains__(self, other):
        print(f'testing if "{other}" in "{self}"')
        return super(NoisyString, self).__contains__(other)

ns = NoisyString('a string with a substring inside')
and now:
>>> 'substring' in ns
testing if "substring" in "a string with a substring inside"
True
Don't use find and index to test for "contains"
Don't use the following string methods to test for "contains":
>>> '**foo**'.index('foo')
2
>>> '**foo**'.find('foo')
2
>>> '**oo**'.find('foo')
-1
>>> '**oo**'.index('foo')
Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    '**oo**'.index('foo')
ValueError: substring not found
Other languages may have no methods to directly test for substrings, and so you would have to use these types of methods, but with Python, it is much more efficient to use the in comparison operator.
Also, these are not drop-in replacements for in. You may have to handle the exception or -1 cases, and if they return 0 (because they found the substring at the beginning) the boolean interpretation is False instead of True.
If you really mean not any_string.startswith(substring) then say it.
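The truthiness trap described above can be seen in a few lines. The sample values below are made up for illustration; the point is only that 0 (match at the start) is falsy and -1 (no match) is truthy.

```python
any_string = "foobar"
substring = "foo"

# Wrong: find returns 0 for a match at the start, and 0 is falsy,
# while the "not found" value -1 is truthy.
print(bool(any_string.find(substring)))    # False, although "foo" is present
print(bool(any_string.find("xyz")))        # True, although "xyz" is absent

# Right: compare against -1 explicitly, or simply use in.
print(any_string.find(substring) != -1)    # True
print(substring in any_string)             # True

# If the position really matters, say so directly.
print(any_string.startswith(substring))    # True
```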
We can compare various ways of accomplishing the same goal.
import timeit

def in_(s, other):
    return other in s

def contains(s, other):
    return s.__contains__(other)

def find(s, other):
    return s.find(other) != -1

def index(s, other):
    try:
        s.index(other)
    except ValueError:
        return False
    else:
        return True

perf_dict = {
    'in:True': min(timeit.repeat(lambda: in_('superstring', 'str'))),
    'in:False': min(timeit.repeat(lambda: in_('superstring', 'not'))),
    '__contains__:True': min(timeit.repeat(lambda: contains('superstring', 'str'))),
    '__contains__:False': min(timeit.repeat(lambda: contains('superstring', 'not'))),
    'find:True': min(timeit.repeat(lambda: find('superstring', 'str'))),
    'find:False': min(timeit.repeat(lambda: find('superstring', 'not'))),
    'index:True': min(timeit.repeat(lambda: index('superstring', 'str'))),
    'index:False': min(timeit.repeat(lambda: index('superstring', 'not'))),
}
And now we see that using in is much faster than the others.
Less time to do an equivalent operation is better:
>>> perf_dict
{'in:True': 0.16450627865128808,
'in:False': 0.1609668098178645,
'__contains__:True': 0.24355481654697542,
'__contains__:False': 0.24382793854783813,
'find:True': 0.3067379407923454,
'find:False': 0.29860888058124146,
'index:True': 0.29647137792585454,
'index:False': 0.5502287584545229}
How can in be faster than __contains__ if in uses __contains__?
This is a fine follow-on question.
Let's disassemble functions with the methods of interest:
>>> from dis import dis
>>> dis(lambda: 'a' in 'b')
1 0 LOAD_CONST 1 ('a')
2 LOAD_CONST 2 ('b')
4 COMPARE_OP 6 (in)
6 RETURN_VALUE
>>> dis(lambda: 'b'.__contains__('a'))
1 0 LOAD_CONST 1 ('b')
2 LOAD_METHOD 0 (__contains__)
4 LOAD_CONST 2 ('a')
6 CALL_METHOD 1
8 RETURN_VALUE
So we see that the .__contains__ method has to be separately looked up and then called from the Python virtual machine - this should adequately explain the difference.
str.index and str.find? How else would you suggest someone find the index of a substring instead of just whether it exists or not? (or did you mean avoid using them in place of contains - so don't use s.find(ss) != -1 instead of ss in s?)
Commented
Jun 10, 2015 at 3:35
re module. I have not yet found a use for str.index or str.find myself in any code I have written yet.
str.count as well (string.count(something) != 0). shudder
in_ above - but with a stackframe around it, so it's slower than that: github.com/python/cpython/blob/3.7/Lib/operator.py#L153
if needle in haystack:
is the normal use, as @Michael says -- it relies on the in operator, more readable and faster than a method call.
If you truly need a method instead of an operator (e.g. to do some weird key= for a very peculiar sort...?), that would be 'haystack'.__contains__. But since your example is for use in an if, I guess you don't really mean what you say;-). It's not good form (nor readable, nor efficient) to use special methods directly -- they're meant to be used, instead, through the operators and builtins that delegate to them.
in Python strings and lists
Here are a few useful examples that speak for themselves concerning the in operator:
>>> "foo" in "foobar"
True
>>> "foo" in "Foobar"
False
>>> "foo" in "Foobar".lower()
True
>>> "foo".capitalize() in "Foobar"
True
>>> "foo" in ["bar", "foo", "foobar"]
True
>>> "foo" in ["fo", "o", "foobar"]
False
>>> ["foo" in a for a in ["fo", "o", "foobar"]]
[False, False, True]
Caveat. Lists are iterables, and the in operator acts on iterables, not just strings.
If you want to compare strings in a more fuzzy way to measure how "alike" they are, consider using the Levenshtein package.
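The Levenshtein package is third-party, so as a stand-in here is a rough sketch using difflib from the standard library. Note that this is a different technique: SequenceMatcher does not compute Levenshtein distance, it only gives a similarity ratio. The helper name similarity is hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 (1.0 means identical)."""
    # SequenceMatcher is not an edit-distance algorithm; it scores the
    # strings by the blocks of characters they have in common.
    return SequenceMatcher(None, a, b).ratio()


print(similarity("foobar", "foobar"))                                 # 1.0
print(similarity("foobar", "foobaz") > similarity("foobar", "xyz"))   # True
```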
If you are happy with "blah" in somestring but want it to be a function/method call, you can probably do this:
import operator
if not operator.contains(somestring, "blah"):
    continue
All operators in Python can be more or less found in the operator module, including in.
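One place where the function form can be handy is when a callable has to be passed around. The following is a sketch with made-up sample data; in_somestring and candidates are illustrative names. Keep in mind the argument order: operator.contains(a, b) means b in a.

```python
import operator
from functools import partial

somestring = "a string with blah inside"

# operator.contains(a, b) is the same as `b in a`: container first.
print(operator.contains(somestring, "blah"))    # True

# Binding the container gives a one-argument predicate for filter().
in_somestring = partial(operator.contains, somestring)
candidates = ["blah", "string", "missing"]
print(list(filter(in_somestring, candidates)))  # ['blah', 'string']
```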
So apparently there is nothing similar for vector-wise comparison. An obvious Python way to do so would be:
>>> names = ['bob', 'john', 'mike']
>>> any(st in 'bob and john' for st in names)
True
>>> any(st in 'mary and jane' for st in names)
False
in should not be used with lists because it does a linear scan of the elements and is slow by comparison. Use a set instead, especially if membership tests are to be done repeatedly.
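The set suggestion from the comment above, as a short sketch (the sample names are reused for illustration, name_set is a made-up variable). This only applies to exact element membership; a set cannot answer substring questions.

```python
names = ["bob", "john", "mike"]

# Building the set costs one pass over the list; each later lookup is a
# hash lookup on average instead of a linear scan.
name_set = set(names)

print("john" in names)      # True, found by scanning the list
print("john" in name_set)   # True, found by hashing
print("jo" in name_set)     # False: sets test whole elements, not substrings
```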
You can use str.count().
It will return the integer value of the number of times a substring appears in a string.
For example:
string = "Hello world"   # example value
string.count("bah")      # gives 0
string.count("Hello")    # gives 1
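A slightly fuller sketch of how count behaves, with a made-up sample string. Two things are worth knowing: matches are counted without overlap, and count always scans the whole string, whereas in can stop at the first match.

```python
string = "Hello, hello, aaaa"

print(string.count("Hello"))   # 1: the match is case-sensitive
print(string.count("ll"))      # 2
print(string.count("aa"))      # 2: "aaaa" holds two non-overlapping matches
print(string.count("bah"))     # 0

# For a plain yes/no answer, in is the more direct spelling.
print("bah" in string)         # False
```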
Here is your answer:
if "insert_char_or_string_here" in "insert_string_to_search_here":
    pass  # DOSTUFF
For checking if it is false:
if not "insert_char_or_string_here" in "insert_string_to_search_here":
    pass  # DOSTUFF
OR:
if "insert_char_or_string_here" not in "insert_string_to_search_here":
    pass  # DOSTUFF
You can use regular expressions to get the occurrences:
>>> import re
>>> print(re.findall(r'( |t)', to_search_in)) # searches for t or space
['t', ' ', 't', ' ', ' ']
in operator. But it's a fun solution. In case you insist on using re, re.match is better to use as boolean.
Commented
Aug 8, 2023 at 14:00
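If a regex is used as a boolean test anyway, it is worth knowing that re.match only matches at the beginning of the string, while re.search scans all of it. A small sketch with a made-up sample text (to_search_in is given an arbitrary value here):

```python
import re

to_search_in = "a test string"  # hypothetical sample text

# re.match only matches at the beginning of the string ...
print(bool(re.match("test", to_search_in)))     # False
# ... while re.search scans the whole string, like in does.
print(bool(re.search("test", to_search_in)))    # True

# re.escape treats the needle as literal text rather than as a pattern.
needle = "a.c"
print(bool(re.search(needle, "abc")))             # True: "." matches any character
print(bool(re.search(re.escape(needle), "abc")))  # False
print(needle in "abc")                            # False
```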