Questions tagged [string-matching]

String matching is the problem of finding occurrences of one string (“pattern”, “needle”) in another (“text”, “haystack”).
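
As a rough illustration of the definition above, here is a minimal sketch of exact matching in Python. The helper name `find_all` is made up for this example and does not come from any question on this page; it relies only on the built-in `str.find` and reports overlapping occurrences too.

```python
def find_all(needle: str, haystack: str) -> list[int]:
    """Return the start index of every occurrence of needle in haystack.

    Overlapping occurrences are included. An empty needle yields no matches
    (an arbitrary choice made for this sketch).
    """
    if not needle:
        return []
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        # Advance by one character so overlapping matches are not skipped.
        start = haystack.find(needle, start + 1)
    return positions


print(find_all("ana", "bananas"))  # [1, 3]
```

Approximate ("fuzzy") matching, which many of the questions below ask about, needs a similarity measure instead of exact comparison and is not covered by this sketch.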

0 votes
1 answer
51 views

Identifying Correct String Order in Pandas

I have a dataframe like the following, where each row shows the relationship between different entities. Child Parent Ult_Parent Full_Family A032 A001 A039 A001, A032, A039, A040, A041, A043, A043, A045, ...
L H
  • 27
0 votes
0 answers
34 views

Fuzzy Match 2 Large Pandas Dataframes

I have 2 pandas dataframes that both contain company names. I want to left join df1(~10k rows) with df2(~1.6m rows) on company names using a fuzzy match. My current function takes too long to run, so ...
L H
  • 27
2 votes
6 answers
114 views

Matching the start of a sequence in R

I have a series of strings in a vector and need to remove the matching starting pattern from each string. However, I don't know the pattern or how long it is. stringa <- c("apple_tart", &...
Katie Helm
1 vote
1 answer
48 views

Given a String count the possible Permutations that satisfy a condition. How to Optimize from O(N*N!)

Hi, I recently came across an interesting question and had a hard time trying to optimize it beyond O(N*N!). Here is the question: Given a string, return the number of possible combinations that satisfy ...
Zi Ming
  • 13
0 votes
0 answers
23 views

I'm on MATLAB analyzing synchronization data. How do I avoid populating my variables with these random characters?

I want to analyze synchronization data. I have the timing of note onsets in seconds of a 30s long audio file in an xlsx file. I also have the timestamps of a participant's taps in relation to a ...
A. Chavez
1 vote
2 answers
134 views

How can I find all exact occurrences of a string, or close matches of it, in a longer string in Python?

Goal: I'd like to find all exact occurrences of a string, or close matches of it, in a longer string in Python. I'd also like to know the location of these occurrences in the longer string. To define ...
Franck Dernoncourt
1 vote
1 answer
31 views

Why doesn't fuzzywuzzy's process.extractBests give a 100% score when the tested string 100% contains the query string?

I'm testing fuzzywuzzy's process.extractBests() as follows: from fuzzywuzzy import process # Define the query string query = "Apple" # Define the list of choices choices = ["Apple&...
Franck Dernoncourt
0 votes
0 answers
70 views

How to efficiently compute similarity scores for prefixes of a string with another string in C?

I'm working on a problem involving string matching where I need to compute the similarity scores for each prefix of a string C against another string S. The similarity score for a prefix P of C and S ...
NatsumiStar
0 votes
0 answers
33 views

Spotfire's "~=" not matching wildcard characters

Using Spotfire Analyst 14.0.3, I'm in the Data Canvas adding a filter via the "Add transformation" feature. When I use the filter expression ... [customdata_name]~='Binary Pump : 1 : ...
RightmireM
  • 2,451
0 votes
1 answer
77 views

How to do fuzzy merge with 2 large pandas dataframes?

I have 2 pandas dataframes that both contain company names. I want to merge these 2 dataframes on company names using a fuzzy match. But the problem is 1 dataframe contains 5m rows and the other 1 ...
L H
  • 27
1 vote
0 answers
112 views

How to find best matching anchor texts from paragraph and list of titles?

I have a paragraph: In today's world, keeping your personal information safe online is more important than ever. With cyber-attacks on the rise, having a strong cybersecurity strategy is essential. ...
Manoj Kamble
2 votes
3 answers
142 views

How to Compare Hierarchy in 2 Pandas DataFrames? (New Sample Data Updated)

I have 2 dataframes that captured the hierarchy of the same dataset. Df1 is more complete compared to Df2, so I want to use Df1 as the standard to analyze if the hierarchy in Df2 is correct. However, ...
L H
  • 27
-1 votes
1 answer
62 views

Can I combine contains and startswith in order to match two columns from one dataframe to another's master column?

Master dataframe filled with a specific match's players and statistics. 34 columns and variable number of rows. Column "Player" has full names Player Goals Assists Dominic Calvert-Lewin 1 ...
filipakous
0 votes
0 answers
29 views

Aho-Corasick algorithm: possible to match non-adjacent keywords?

I need to match non-adjacent keywords in a large collection of texts. If there is a match it should return the match, else return "unknown". For the first trial run it will be several ...
Simone
  • 585
0 votes
0 answers
67 views

How do I create a query to find a specific string in Firebase Firestore? [duplicate]

I am developing a Flutter app where, upon user input, the app needs to search within a PDF and return only the portion of text where the user-entered string appears. I'm using Firestore and have ...
João Bosco
2 votes
2 answers
397 views

polars: efficient way to apply function to filter column of strings

I have a column of long strings (like sentences) on which I want to do the following: replace certain characters; create a list of the remaining strings; if a string is all text, see whether it is in a ...
MikeB2019x
  • 1,107
-1 votes
1 answer
77 views

How do I find the first # after an even number of "?

Reading a text file with the format: e2c=["(vsim-86)" ,'kkk', "pppp", "bbbbbb", #"old", "uio", " sds # sds", #"old2", " sds #...
taquionbcn
0 votes
1 answer
53 views

Asymmetric partial matching of text strings between two dataframes

I have two dataframes: df1 is based on survey responses and includes a non-restricted field for users to add their location in the UK (or refuse to do so) formatted as so (not real data): Name ...
Edward Blackburn
1 vote
0 answers
37 views

Is a Generalized Suffix Tree a good data structure to use for string searches on a dict of strings where partial matches should also be returned?

I have a dictionary of strings that I would like to perform string searches on in real time (web application with approx. 1500 total users). Background: I have a data table that follows the structure ...
zolo00
  • 21
0 votes
0 answers
47 views

String Matching Function Not Matching Strings Despite Threshold Set to 0

I have implemented a string matching function in Python utilizing n-grams and similarity ratios. The function signature is as follows: # concise version of the function def match_strings(...
NIDHI SHASTRY
-2 votes
1 answer
51 views

Incorporating Phone Number Matching into Existing String based Name Matching Function

I have a Python function, match_strings, which is designed to match names from two different data sources. Here is the function definition: def match_strings(strings1, strings2, ngram_n=2, ...
Rahul T
0 votes
0 answers
68 views

Jaccard vs Cosine similarity for addresses string comparison

I've seen a ton of questions on these 2 algorithms, but I can't make up my mind about which one I should use in my use case. I need to compare 2 strings representing addresses and I need to know if 2 strings '...
Akinn
  • 2,000
1 vote
1 answer
44 views

Is there a way to recode a vector of strings based on two key words or phrases that appear in every value into new vector with those two values?

As my question indicates, I would like to convert a vector of strings into a new vector containing one of two values that appear in every string. Here is an example of a very simple data frame I have: data <-...
jdenn0514
0 votes
1 answer
84 views

Filtering Range based on Multiple Criteria

I am trying to filter a list of properties based on multiple keywords (e.g. "Cool Interior," "Terrace/Patio"). Here's a basic interpretation: The range I want to filter is on a ...
John Lane
0 votes
0 answers
200 views

Google Sheets - Count if two cells have the same text

I'm trying to create a code to see if my predictions for games and the actual result of the games are the same. I was going to create a point value, like March Madness has, but I can't actually get ...
Dixon Gerber
3 votes
1 answer
108 views

Aho-Corasick algorithm with C language

I have programmed an Aho-Corasick algorithm with a transition table that searches for a set of words in a text and displays the number of occurrences by using malloc(), but I am encountering this ...
Zahra Chahi
1 vote
1 answer
111 views

module 'thefuzz' has no attribute 'partial_ratio' and other odd errors

Been trying to use thefuzz to compare two different lists, and got the above error, which doesn't seem right. I've commented everything else out in my code except the below two test lines and still ...
user2981194
0 votes
1 answer
153 views

Searching for matching words in a PDF using page.search_for

I have a list of words which I am searching for in a PDF document using fitz in Python. The code generally works for most of the words, except for a few like "efficiency". My code is given below: ...
vani
  • 29
0 votes
0 answers
34 views

powershell ilike operator not returning true [duplicate]

PS C:\Users\Administrator> $string = "hello world" PS C:\Users\Administrator> $string -ilike "hello" False The above is outputting False, and not True. Not sure what I am ...
ctappy
  • 177
0 votes
0 answers
82 views

Why is Rabin-Karp algo seemingly less efficient than brute force algo for string matching

I am looking at the efficiency of various algorithms: not just big-O efficiency, but practical efficiency. I was testing a Rabin-Karp algorithm I wrote against a brute force string comparison ...
Alex
  • 23
0 votes
2 answers
75 views

Is there a way in R to join between two columns based on whether a string in column 1 is contained within the string in column 2?

I am trying to join several messy datasets together without using "fuzzy matching". In the core dataset (example dataset1 below), I have simple names for companies. In the datasets I would ...
lyd-m
  • 3
-1 votes
2 answers
64 views

Compare two columns (with merged phone numbers) if any phone number from first column exists in the second column

I need to compare two columns in a resulting data frame; the two columns come from separate sources. I would like to compare them and have a resulting (tag) column based on ...
sebekkg
  • 17
1 vote
1 answer
333 views

Split full address to contain only street name

I have a table with address1, city, state, and postal code. However, some address1 values also contain the city, state and postal code (separated by either a comma or a space or both). Example: Address1: 9999 ...
shano
  • 23
-1 votes
3 answers
93 views

Having trouble with regex in Java 11

Trying to strip server name from: //some.server.name/path/to/a/dir (finishing with /path/to/a/dir) I have tried 3 different regexes (hardcoded works), but the other two look like they should work but ...
Andy Knipp
0 votes
0 answers
24 views

SQL fulltext search using containstable returns false match

The problem I am facing right now is that the full-text search in SQL doesn't yield the results that I would be expecting. The containstable method returns a result that does not contain the provided ...
DasMonopol
1 vote
1 answer
81 views

Lookup items of Col1 in Col2 and Comment the matching Percentage

My data frame: data = {'Col1': ['Bad Homburg', 'Bischofferode', 'Essen', 'Grabfeld OT Rentwertshausen','Großkrotzenburg','Jesewitz/Weg','Kirchen (Sieg)','Laudenbach a. M.','Nachrodt-Wiblingwerde','...
s_max
  • 25
0 votes
0 answers
48 views

How can I compare the order in which characters appear in excel?

The problem - I want to decide how similar two strings are based on the order in which the letters appear. For instance, comparing the strings "Paul" and "JoPaul". JoPau has 2 ...
Ne Mo
  • 230
1 vote
0 answers
344 views

Create embeddings for string matching

I have 4 lists of company names. Let's take a company, Google. In list A, Google is written as Google Ltd; in the 2nd list, it is written as Google Inc (extended etc); the 3rd contains Beta Gogl (misspelled ...
user3585510
1 vote
1 answer
257 views

Powershell Question How to Select Specific Characters in a File's Name?

I'm trying to create a PowerShell script that looks only for files with the extension .dgn within a specific directory. Then, if a file has the character string "_ch_" in its name ...
Grot
  • 55
0 votes
0 answers
23 views

VScode - regex find match in the middle and remove start and end [duplicate]

I want to replace all (start and end) of the string but the parameter in the middle (for example @ModelKey or @ProductNumber) from this Input [MODEL_KEY] = IIF(@ModelKey IS NOT NULL, @ModelKey , [...
surfmuggle
  • 5,776
0 votes
1 answer
74 views

PHP extract a substring between two strings before a substring found

I have this string of escaped html code: $html=" ... euro &lt;strong&gt;0,00&lt;/strong&gt; sono relativi a Operazioni finanziarie di &lt;strong&gt;Importo Ridotto&lt;/...
Jenemj
  • 35
0 votes
1 answer
97 views

How to get the matched groups in regex Python and save it as a new column

I have a dataframe and I want to find out whether there were any mentions of the firms that I'm looking for in the DocumentIdentifier column. It should probably be done through regex groups, but I'm not sure ...
Mostafa Bouzari
0 votes
1 answer
150 views

find url in web page content using powershell

I need to search for the URL https://cdn.windwardstudios.com/Archive/23.X/23.3.0/JavaRESTfulEngine-23.3.0.32.zip on https://www.windwardstudios.com/version/version-downloads using PowerShell. Thus I ...
Ashar
  • 3,420
2 votes
2 answers
70 views

How to split the rows in the array using match()?

I have a matrix containing arrays of rows. let matrix=[['hello'],['world']]; I'm duplicating rows. matrix=matrix.map(x=>String(x).repeat(2)).map(x=>x.match(new RegExp??)) I want to get [['...
Lelik
  • 27
3 votes
4 answers
37 views

Return a data frame subset based on similar (not identical) elements in a vector?

I have a dataframe (dim 2914 x 6) where one column is a vector of animal groups and species abbreviations, e.g. "bird_F.pw", and I have a separate vector of a few species abbreviations, e.g. ...
ElizaBeso000
-4 votes
1 answer
98 views

Feedback on my JavaScript search engine project. Not all accepting results are printed, no errors displayed [closed]

[JSON data recipes][1] 'use strict'; const cakeRecipes = require("./cake-recipes.json"); console.log(cakeRecipes[0]); // If you're ready to test: uncomment the code below. // printRecipes(...
Yass
  • 5
0 votes
1 answer
38 views

How is it practically possible to compute an automaton inside a function and then return it?

I'm trying to follow Cormen - Algorithms, 3rd edition. Specifically, Chapter VII, 32 "String Matching". In general, I find this book extremely hard to follow, due to the abundance of math-...
ScienceDiscoverer
0 votes
1 answer
108 views

Understanding a Specific Detail in the KMP Pattern Matching Algorithm

I have a question about the KMP pattern matching algorithm. Below is a code snippet for calculating the next array: int GetNext(char ch[], int length, int next[]) { next[1] = 0; int i = 1, j = ...
NewGreat H
0 votes
0 answers
52 views

Question about the KMP pattern matching algorithm

I have a question about the KMP pattern matching algorithm. Below is a code snippet for calculating the next array: int GetNext(char ch[], int length, int next[]) { next[1] = 0; int i = 1, j = ...
NewGreat H
2 votes
2 answers
59 views

Search for a large block of lines across directory

I have found that a large section of json I am pretty sure has been copied to about 80 files. I have that section edited down into FILEA, it is 95 lines of text. I want to grep -lr -F FILEA . EXCEPT, ...
Steve Hammond
