Questions tagged [string-matching]
String matching is the problem of finding occurrences of one string (“pattern”, “needle”) in another (“text”, “haystack”).
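As a minimal illustration of that definition, the sketch below lists every start position of a pattern in a text using Python's built-in `str.find`. The function name `find_all` is hypothetical and chosen only for this example.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text.

    Overlapping occurrences are included; an empty pattern yields no matches.
    """
    if not pattern:
        return []
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping matches are also found.
        start = text.find(pattern, start + 1)
    return positions


print(find_all("abracadabra", "abra"))
```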
string-matching
2,307
questions
0
votes
1
answer
51
views
Identifying Correct String Order in Pandas
I have a dataframe like the following, showing the relationship of different entities in each row.
Child | Parent | Ult_Parent | Full_Family
A032 | A001 | A039 | A001, A032, A039, A040, A041, A043, A043, A045, ...
0
votes
0
answers
34
views
Fuzzy Match 2 Large Pandas Dataframes
I have 2 pandas dataframes that both contain company names. I want to left join df1(~10k rows) with df2(~1.6m rows) on company names using a fuzzy match. My current function takes too long to run, so ...
2
votes
6
answers
114
views
Matching the start of a sequence in R
I have a series of strings in a vector and need to remove the matching starting pattern from each string. However, I don't know the pattern or how long it is.
stringa <- c("apple_tart", ...
1
vote
1
answer
48
views
Given a String count the possible Permutations that satisfy a condition. How to Optimize from O(N*N!)
Hi, I recently came across an interesting question and had a hard time trying to optimize it beyond O(N*N!).
Here is the question:
Given a string, return the number of possible combinations that satisfy ...
0
votes
0
answers
23
views
I'm on MATLAB analyzing synchronization data. How do I avoid populating my variables with these random characters?
I want to analyze synchronization data. I have the timing of note onsets in seconds of a 30s long audio file in an xlsx file. I also have the timestamps of a participant's taps in relation to a ...
1
vote
2
answers
134
views
How can I find all exact occurrences of a string, or close matches of it, in a longer string in Python?
Goal:
I'd like to find all exact occurrences of a string, or close matches of it, in a longer string in Python.
I'd also like to know the location of these occurrences in the longer string.
To define ...
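One possible way to approach a goal like this, sketched with the standard library only: slide a window the length of the query over the longer string and score each window with `difflib.SequenceMatcher`. The function name `find_close_matches`, the fixed window length, and the 0.8 default threshold are all assumptions made for illustration; the question's own definition of a "close match" is truncated above and is not reproduced here.

```python
from difflib import SequenceMatcher


def find_close_matches(text: str, pattern: str, threshold: float = 0.8):
    """Return (position, window, score) for every window of len(pattern)
    in text whose similarity ratio to pattern is at least threshold."""
    n = len(pattern)
    if n == 0:
        return []
    hits = []
    for i in range(len(text) - n + 1):
        window = text[i:i + n]
        # autojunk=False keeps the heuristic from discarding frequent characters.
        score = SequenceMatcher(None, pattern, window, autojunk=False).ratio()
        if score >= threshold:
            hits.append((i, window, score))
    return hits


for pos, window, score in find_close_matches("the quick brown fox", "quick"):
    print(pos, window, round(score, 2))
```

Because the window length is fixed, this sketch only tolerates substitutions well; matches that differ by inserted or deleted characters would need windows of varying length or an edit-distance based search.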
1
vote
1
answer
31
views
Why doesn't fuzzywuzzy's process.extractBests give a 100% score when the tested string 100% contains the query string?
I'm testing fuzzywuzzy's process.extractBests() as follows:
from fuzzywuzzy import process
# Define the query string
query = "Apple"
# Define the list of choices
choices = ["Apple"...
0
votes
0
answers
70
views
How to efficiently compute similarity scores for prefixes of a string with another string in C?
I'm working on a problem involving string matching where I need to compute the similarity scores for each prefix of a string C against another string S. The similarity score for a prefix P of C and S ...
0
votes
0
answers
33
views
Spotfire's "~=" not matching wildcard characters
Using Spotfire Analyst 14.0.3
I'm in the Data Canvas adding a filter via the "Add transformation" feature.
When I use the filter expression ...
[customdata_name]~='Binary Pump : 1 : ...
0
votes
1
answer
77
views
How to do fuzzy merge with 2 large pandas dataframes?
I have 2 pandas dataframes that both contain company names. I want to merge these 2 dataframes on company names using a fuzzy match. But the problem is 1 dataframe contains 5m rows and the other 1 ...
1
vote
0
answers
112
views
How to find best matching anchor texts from paragraph and list of titles?
I have a paragraph:
In today's world, keeping your personal information safe online is more important than ever. With cyber-attacks on the rise, having a strong cybersecurity strategy is essential.
...
2
votes
3
answers
142
views
How to Compare Hierarchy in 2 Pandas DataFrames? (New Sample Data Updated)
I have 2 dataframes that captured the hierarchy of the same dataset. Df1 is more complete compared to Df2, so I want to use Df1 as the standard to analyze if the hierarchy in Df2 is correct. However, ...
-1
votes
1
answer
62
views
Can I combine contain and startswith in order to match two columns from one dataframe to another's master column?
Master dataframe filled with a specific match's players and statistics.
34 columns and variable number of rows.
Column "Player" has full names
Player | Goals | Assists
Dominic Calvert-Lewin | 1 | ...
0
votes
0
answers
29
views
Aho-Corasick algorithm: possible to match non-adjacent keywords?
I need to match non-adjacent keywords in a large collection of texts. If there is a match it should return the match, else return "unknown". For the first trial run it will be several ...
0
votes
0
answers
67
views
How do I create a query to find a specific string in Firebase Firestore? [duplicate]
I am developing a Flutter app where, upon user input, the app needs to search within a PDF and return only the portion of text where the user-entered string appears. I'm using Firestore and have ...
2
votes
2
answers
397
views
polars: efficient way to apply function to filter column of strings
I have a column of long strings (like sentences) on which I want to do the following:
replace certain characters
create a list of the remaining strings
if a string is all text see whether it is in a ...
-1
votes
1
answer
77
views
How do I find the first # after an even number of "?
Reading a text file with the format:
e2c=["(vsim-86)" ,'kkk', "pppp",
"bbbbbb", #"old", "uio",
" sds # sds", #"old2",
" sds #...
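Taking the title literally, a minimal sketch could count double quotes while scanning a line and return the first `#` seen when that count is even. The function name `first_unquoted_hash` is hypothetical, and the sketch assumes only `"` delimits strings (single quotes and escaped quotes are not handled).

```python
def first_unquoted_hash(line: str) -> int:
    """Return the index of the first '#' preceded by an even number of
    double-quote characters, or -1 if there is none."""
    quotes_seen = 0
    for i, ch in enumerate(line):
        if ch == '"':
            quotes_seen += 1
        elif ch == "#" and quotes_seen % 2 == 0:
            # An even count means we are currently outside a quoted string.
            return i
    return -1


line = '" sds # sds", #"old2",'
idx = first_unquoted_hash(line)
print(idx, line[:idx].rstrip())
```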
0
votes
1
answer
53
views
Asymmetric partial matching of text strings between two dataframes
I have two dataframes:
df1 is based on survey responses and includes a non-restricted field for users to add their location in the UK (or refuse to do so) formatted as so (not real data):
Name
...
1
vote
0
answers
37
views
Is a Generalized Suffix Tree a good data structure to use for string searches on a dict of strings where partial matches should also be returned?
I have a dictionary of strings that I would like to perform string searches on in real time (web application with approx. 1500 total users).
Background: I have a data table that follows the structure
...
0
votes
0
answers
47
views
String Matching Function Not Matching Strings Despite Threshold Set to 0
I have implemented a string matching function in Python utilizing n-grams and similarity ratios. The function signature is as follows:
# concise version of the function
def match_strings(...
-2
votes
1
answer
51
views
Incorporating Phone Number Matching into Existing String based Name Matching Function
I have a Python function, match_strings, which is designed to match names from two different data sources. Here is the function definition:
def match_strings(strings1, strings2, ngram_n=2, ...
0
votes
0
answers
68
views
Jaccard vs Cosine similarity for addresses string comparison
I've seen a ton of questions on these 2 algorithms, but I can't make up my mind about which one I should use in my use case.
I need to compare 2 strings representing addresses and I need to know if 2 strings '...
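To make the comparison concrete, here is a minimal sketch of both measures computed over character bigrams. The tokenization (lowercased character bigrams), the helper names, and the sample addresses are assumptions for illustration; the question does not say how the strings are tokenized.

```python
from collections import Counter
from math import sqrt


def bigrams(s: str) -> list[str]:
    """Lowercased character bigrams of s."""
    s = s.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: shared distinct bigrams / all distinct bigrams."""
    sa, sb = set(bigrams(a)), set(bigrams(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine(a: str, b: str) -> float:
    """Cosine similarity between bigram count vectors."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    dot = sum(ca[g] * cb[g] for g in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


a, b = "12 Main Street", "12 Main St."
print(round(jaccard(a, b), 3), round(cosine(a, b), 3))
```

Jaccard ignores how often a bigram repeats, while cosine weighs repeated bigrams by their counts, so the two can rank the same pair of addresses differently.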
1
vote
1
answer
44
views
Is there a way to recode a vector of strings based on two key words or phrases that appear in every value into new vector with those two values?
As my question indicates, I would like to convert a vector of strings into a new vector containing one of two values that appear in every string. Here is an example of a very simple data frame I have:
data <-...
0
votes
1
answer
84
views
Filtering Range based on Multiple Criteria
I am trying to filter a list of properties based on multiple keywords (e.g. "Cool Interior," "Terrace/Patio"). Here's a basic interpretation:
The range I want to filter is on a ...
0
votes
0
answers
200
views
Google Sheets - Count if two cells have the same text
I'm trying to create a code to see if my predictions for games and the actual result of the games are the same. I was going to create a point value, like March Madness has, but I can't actually get ...
3
votes
1
answer
108
views
Aho-Corasick algorithm with C language
I have programmed an Aho-Corasick algorithm with a transition table that searches for a set of words in a text and displays the number of occurrences by using malloc(), but I am encountering this ...
1
vote
1
answer
111
views
module 'thefuzz' has no attribute 'partial_ratio' and other odd errors
Been trying to use thefuzz to compare two different lists, and got the above error, which doesn't seem right. I've commented everything else out in my code except the below two test lines and still ...
0
votes
1
answer
153
views
searching for matching words in pdf using page.search_for
I have a list of words which I am searching for in a PDF document using fitz in Python.
The code generally works for most of the words, except for a few like "efficiency".
My code is given below:
...
0
votes
0
answers
34
views
powershell ilike operator not returning true [duplicate]
PS C:\Users\Administrator> $string = "hello world"
PS C:\Users\Administrator> $string -ilike "hello"
False
The above is outputting False, and not True. Not sure what I am ...
0
votes
0
answers
82
views
Why is Rabin-Karp algo seemingly less efficient than brute force algo for string matching
I am looking at the efficiency of various algorithms: not just big-O efficiency, but practical efficiency. I was testing a Rabin-Karp algorithm I wrote against a brute force string comparison ...
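For reference, a minimal Rabin-Karp sketch with a polynomial rolling hash is shown below. This is not the asker's implementation (which is not included in the excerpt); the `base` and `mod` values are arbitrary assumptions.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return all start indices of pattern in text using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Verify on hash equality to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Drop text[i], shift, and append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits


print(rabin_karp("abracadabra", "abra"))
```

One plausible reason a version like this loses to brute force in practice: every position costs several multiplications and a modulo, whereas a brute-force comparison on typical text usually fails on the first character and moves on.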
0
votes
2
answers
75
views
Is there a way in R to join between two columns based on whether a string in column 1 is contained within the string in column 2?
I am trying to join several messy datasets together without using "fuzzy matching".
In the core dataset (example dataset1 below), I have simple names for companies. In the datasets I would ...
-1
votes
2
answers
64
views
Compare two columns (with merged phone numbers) if any phone number from first column exists in the second column
I need to compare two columns in a resulting data frame; the two columns come from separate sources.
Now, I would like to compare them and have a resulting (tag) column based on ...
1
vote
1
answer
333
views
Split full address to contain only street name
I have a table with address1, city, state, and postal code. However, some address1 values also contain city, state and postal code (separated by either comma or space or both). Example:
Address1: 9999 ...
-1
votes
3
answers
93
views
Having trouble with regex in Java 11
Trying to strip server name from: //some.server.name/path/to/a/dir (finishing with /path/to/a/dir)
I have tried 3 different regexes (hardcoded works), but the other two look like they should work but ...
0
votes
0
answers
24
views
SQL fulltext search using containstable returns false match
The problem I am facing right now is that the full-text search in SQL doesn't yield the results that I would be expecting. The containstable method returns a result that does not contain the provided ...
1
vote
1
answer
81
views
Lookup items of Col1 in Col2 and Comment the matching Percentage
My data frame:
data = {'Col1': ['Bad Homburg', 'Bischofferode', 'Essen', 'Grabfeld OT Rentwertshausen','Großkrotzenburg','Jesewitz/Weg','Kirchen (Sieg)','Laudenbach a. M.','Nachrodt-Wiblingwerde','...
0
votes
0
answers
48
views
How can I compare the order in which characters appear in excel?
The problem - I want to decide how similar two strings are based on the order in which the letters appear.
For instance, comparing the strings "Paul" and "JoPaul". JoPau has 2 ...
1
vote
0
answers
344
views
Create embeddings for string matching
I have 4 lists of company names. Let's take a company, Google. In list A, Google is written as Google Ltd; in the 2nd list, it is written as Google Inc (extended etc); the 3rd contains Beta Gogl (misspelled ...
1
vote
1
answer
257
views
Powershell Question How to Select Specific Characters in a File's Name?
I'm trying to create a Powershell script that looks for just files with the extension .dgn within a specific directory. Then if it has a character string of "_ch_" in the name of the file ...
0
votes
0
answers
23
views
VScode - regex find match in the middle and remove start and end [duplicate]
I want to replace all (start and end) of the string but the parameter in the middle (for example @ModelKey or @ProductNumber) from this Input
[MODEL_KEY] = IIF(@ModelKey IS NOT NULL, @ModelKey , [...
0
votes
1
answer
74
views
PHP extract a substring between two strings before a substring found
I have this string of escaped html code:
$html="
...
euro <strong>0,00</strong> sono relativi a Operazioni finanziarie di <strong>Importo Ridotto</...
0
votes
1
answer
97
views
How to get the matched groups in regex Python and save it as a new column
I have a dataframe and I want to find out whether there are any mentions of the firms that I'm looking for in the DocumentIdentifier column. It should probably be done through regex groups, but I'm not sure ...
0
votes
1
answer
150
views
find url in web page content using powershell
I need to search for https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip url from https://www.windwardstudios.com/version/version-downloads using powershell.
Thus I ...
2
votes
2
answers
70
views
How to split the rows in the array using match()?
I have a matrix containing arrays of rows.
let matrix=[['hello'],['world']];
I'm duplicating rows.
matrix=matrix.map(x=>String(x).repeat(2)).map(x=>x.match(new RegExp??))
I want to get
[['...
3
votes
4
answers
37
views
Return a data frame subset based on similar (not identical) elements in a vector?
I have a dataframe (dim 2914 x 6) where one column is a vector of animal groups and species abbreviations, e.g. "bird_F.pw", and I have a separate vector of a few species abbreviations, e.g. ...
-4
votes
1
answer
98
views
Feedback on my JavaScript search engine project. Not all matching results are printed, no errors displayed [closed]
[JSON data recipes][1]
'use strict';
const cakeRecipes = require("./cake-recipes.json");
console.log(cakeRecipes[0]);
// If you're ready to test: uncomment the code below.
// printRecipes(...
0
votes
1
answer
38
views
How is it practically possible to compute an automaton inside a function and then return it?
I'm trying to follow Cormen - Algorithms, 3rd edition. Specifically, Chapter VII, 32 "String Matching". In general, I find this book extremely hard to follow, due to the abundance of math-...
0
votes
1
answer
108
views
Understanding a Specific Detail in the KMP Pattern Matching Algorithm
I have a question about the KMP pattern matching algorithm. Below is a code snippet for calculating the next array:
int GetNext(char ch[], int length, int next[]) {
next[1] = 0;
int i = 1, j = ...
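For comparison with the truncated C snippet, here is a minimal sketch of the same idea in Python using the common 0-based prefix (failure) function. Note that the snippet above appears to use a 1-based `next` array, so index values will differ by convention; the function names below are hypothetical.

```python
def prefix_function(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        # Fall back through shorter borders until the next character fits.
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return all start indices of pattern in text."""
    if not pattern:
        return []
    fail = prefix_function(pattern)
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return hits


print(prefix_function("ababaca"))
print(kmp_search("abracadabra", "abra"))
```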
0
votes
0
answers
52
views
Question about the KMP pattern matching algorithm
I have a question about the KMP pattern matching algorithm. Below is a code snippet for calculating the next array:
int GetNext(char ch[], int length, int next[]) {
next[1] = 0;
int i = 1, j = ...
2
votes
2
answers
59
views
Search for a large block of lines across directory
I am pretty sure that a large section of JSON has been copied to about 80 files.
I have that section edited down into FILEA, it is 95 lines of text.
I want to grep -lr -F FILEA . EXCEPT, ...