Questions tagged [johnsnowlabs-spark-nlp]
John Snow Labs’ Spark NLP is a natural language processing library built on top of Apache Spark ML pipelines.
108 questions

8 votes · 1 answer · 2k views
Do Spark-NLP pretrained pipelines only work on Linux systems?
I am trying to set up some simple code where I pass a dataframe and test it with the pretrained explain pipeline provided by the John Snow Labs Spark-NLP library.
I am using Jupyter notebooks from Anaconda ...
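A minimal sketch of the kind of local test this question describes, assuming matching pyspark and spark-nlp versions are installed; the pipeline name comes from the question, everything else is illustrative:

import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()  # starts a SparkSession with the spark-nlp jar attached

# Downloads the pipeline on first use and caches it locally; not a Linux-only feature
pipeline = PretrainedPipeline("explain_document_dl", lang="en")

df = spark.createDataFrame([("Spark NLP ships pretrained pipelines.",)], ["text"])
pipeline.transform(df).select("entities.result").show(truncate=False)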
8 votes · 2 answers · 4k views
Unable to download the pipeline provided by the spark-nlp library
I am unable to use the predefined pipeline "recognize_entities_dl" provided by the spark-nlp library.
I tried installing different versions of the pyspark and spark-nlp libraries.
import sparknlp
from ...
6 votes · 2 answers · 5k views
spark-nlp: DocumentAssembler initialization failing with 'java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class'
I am trying out the ContextAwareSpellChecker provided in https://medium.com/spark-nlp/applying-context-aware-spell-checking-in-spark-nlp-3c29c46963bc
The first component in the pipeline is a ...
6 votes · 1 answer · 2k views
How should we use setDictionary for the lemmatization annotator in Spark-NLP?
I have a requirement where I have to add a dictionary in the lemmatization step. While trying to use it in a pipeline and calling pipeline.fit() I get an ArrayIndexOutOfBoundsException.
What is the ...
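For reference, a hedged Python sketch of wiring a dictionary into the Lemmatizer; the file name and the "->" / tab delimiters follow the commonly documented AntBNC lemma format and are assumptions here; a mismatch between the delimiters and the actual file contents is one way to end up with an ArrayIndexOutOfBoundsException at fit() time:

import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, Lemmatizer

spark = sparknlp.start()
document = DocumentAssembler().setInputCol("text").setOutputCol("document")
token = Tokenizer().setInputCols(["document"]).setOutputCol("token")

# setDictionary(path, key_delimiter, value_delimiter); each dictionary line is assumed
# to look like "lemma -> form1<TAB>form2<TAB>..."
lemmatizer = (Lemmatizer()
    .setInputCols(["token"])
    .setOutputCol("lemma")
    .setDictionary("AntBNC_lemmas_ver_001.txt", "->", "\t"))  # hypothetical local file

pipeline = Pipeline(stages=[document, token, lemmatizer])
model = pipeline.fit(spark.createDataFrame([("The cats were running",)], ["text"]))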
5 votes · 3 answers · 9k views
After installing sparknlp, cannot import sparknlp
The following ran successfully on a Cloudera CDSW cluster gateway.
import pyspark
from pyspark.sql import SparkSession
spark = (SparkSession
.builder
.config("spark.jars....
4 votes · 1 answer · 3k views
How to load a spark-nlp pre-trained model from disk
From the spark-nlp Github page I downloaded a .zip file containing a pre-trained NerCRFModel. The zip contains three folders: embeddings, fields, and metadata.
How do I load that into a Scala ...
4 votes · 1 answer · 7k views
spark-nlp 'JavaPackage' object is not callable
I am using JupyterLab to run spark-nlp text analysis. At the moment I am just running the sample code:
import sparknlp
from pyspark.sql import SparkSession
from sparknlp.pretrained import ...
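For context, this error usually means the spark-nlp jar never reached the JVM, often because the jar version and the pip package version differ. A hedged sketch of starting the session explicitly; the version numbers are placeholders, not a recommendation:

import sparknlp
from pyspark.sql import SparkSession

spark = (SparkSession.builder
    .appName("spark-nlp-test")
    .master("local[*]")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryoserializer.buffer.max", "2000M")
    # the Maven coordinate must match the version installed with `pip install spark-nlp`
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.4.0")
    .getOrCreate())

print(sparknlp.version(), spark.version)  # both should line up with the jar above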
4 votes · 1 answer · 693 views
SparkNLP Sentiment Analysis in Java
I want to use SparkNLP to do sentiment analysis on column column1 of a Spark dataset, using the default trained model. This is my code:
DocumentAssembler docAssembler = (DocumentAssembler) new ...
4 votes · 1 answer · 5k views
PySpark: How to flatten a column with an array of dictionaries and embedded dictionaries (sparknlp annotator output)
I'm trying to extract the output from the sparknlp (using Pretrained Pipeline 'explain_document_dl'). I have spent a lot of time looking for ways (UDFs, explode, etc) but cannot get anywhere close to ...
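Two common flattening routes, sketched under the assumption that result is the DataFrame produced by the pipeline and that it has an entities annotation column:

from pyspark.sql import functions as F
from sparknlp.base import Finisher

# 1) Pull the plain strings out of the annotation structs and explode them
flat = result.select(F.explode(F.col("entities.result")).alias("entity"))

# 2) Or let Finisher convert annotation columns into simple string arrays
finisher = Finisher().setInputCols(["entities", "pos"]).setOutputCols(["entities_flat", "pos_flat"])
finisher.transform(result).select("entities_flat", "pos_flat").show(truncate=False)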
3 votes · 1 answer · 1k views
How to use the JohnSnowLabs NLP spell correction module NorvigSweetingModel?
I was going through the JohnSnowLabs SpellChecker here.
I found Norvig's algorithm implementation there, and the example section has just the following two lines:
import com.johnsnowlabs.nlp....
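For orientation, a hedged Python sketch of how the pretrained Norvig model is typically dropped into a pipeline; the column names are assumptions:

from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, NorvigSweetingModel

document = DocumentAssembler().setInputCol("text").setOutputCol("document")
token = Tokenizer().setInputCols(["document"]).setOutputCol("token")
spell = (NorvigSweetingModel.pretrained()   # downloads the default English spell-check model
    .setInputCols(["token"])
    .setOutputCol("checked"))

pipeline = Pipeline(stages=[document, token, spell])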
3 votes · 1 answer · 374 views
Where can I find a list of class labels for the pretrained SparkNLP NerDLModel?
I have been searching for a while but have had no luck finding out what NER labels are included in the pretrained NerDL (TensorFlow) model. I would think the training data could provide such information, but I do ...
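One thing worth checking, assuming a reasonably recent release: the model object itself can report its tag set through a getClasses() accessor (its availability in the installed version is an assumption):

from sparknlp.annotator import NerDLModel

ner = NerDLModel.pretrained("ner_dl", "en")
print(ner.getClasses())  # e.g. CoNLL-style tags such as B-PER, I-PER, B-ORG, O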
3 votes · 1 answer · 2k views
TypeError: 'JavaPackage' object is not callable | using Java 11 for Spark 3.3.0, sparknlp 4.0.1 and the sparknlp jar from spark-nlp-m1_2.12
I got the Spark NLP jar from https://jar-download.com/artifacts/com.johnsnowlabs.nlp/spark-nlp-m1_2.12/4.0.1/source-code
JAVA_HOME = C:\Program Files\Java\jdk-18.0.1.1
In the system variables and ...
3 votes · 1 answer · 270 views
Sentence similarity with SparkNLP only works on Google Dataproc with ONE sentence, FAILS when multiple sentences are provided
I deployed the following Colab Python code (see link below) to Dataproc on Google Cloud and it only works when the input_list is an array with one item; when the input_list has two items then the PySpark ...
2 votes · 1 answer · 537 views
Is it possible to use the library Spark-NLP with Spark Structured Streaming?
I want to perform tweet sentiment analysis on a stream of messages I get from a Kafka cluster that, in turn, gets the tweets from the Twitter API v2.
When I try to apply the pre-trained sentiment ...
2 votes · 2 answers · 320 views
Local data cannot be read in a Dataproc cluster when using SparkNLP
I am trying to build a Dataproc cluster with Spark NLP installed in it, then quickly test it by reading some CoNLL 2003 data. First, I used this codelab as inspiration to build my own smaller cluster (...
2 votes · 1 answer · 3k views
Spark-nlp pretrained model not loading on Windows
I am trying to install pretrained pipelines in spark-nlp on Windows 10 with Python.
The following is the code I have tried so far in a Jupyter notebook on the local system:
! java -version
# should ...
2 votes · 1 answer · 3k views
Spark-nlp: can't load pretrained recognize entity model from disk in pyspark
I have a spark cluster set up and would like to integrate spark-nlp to run named entity recognition. I need to access the model from disk rather than download it from the internet at runtime. I have ...
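A sketch of the offline route, assuming the pipeline zip has already been downloaded and extracted to a path every executor can see (the path below is a placeholder):

import sparknlp
from pyspark.ml import PipelineModel
from sparknlp.base import LightPipeline

spark = sparknlp.start()

# An extracted pretrained pipeline is a regular Spark ML PipelineModel on disk
model = PipelineModel.load("/models/recognize_entities_dl_en")  # hypothetical path
light = LightPipeline(model)
print(light.annotate("John Snow Labs is based in Delaware.")["entities"])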
2 votes · 1 answer · 233 views
Multilingual BERT in Spark NLP
I was wondering if pre-trained multilingual BERT is available in sparknlp?
As you know, BERT is pre-trained for 109 languages. I was wondering if all of these languages are in Spark NLP's BERT too?
Thanks
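For what it's worth, the models hub does publish a multilingual checkpoint; a sketch, assuming the "bert_multi_cased" model name with the "xx" multilingual language code:

from sparknlp.annotator import BertEmbeddings

bert = (BertEmbeddings.pretrained("bert_multi_cased", "xx")
    .setInputCols(["document", "token"])
    .setOutputCol("embeddings"))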
2 votes · 1 answer · 815 views
Can't get the John Snow OCR notebook to run on Databricks
So I am trying to follow this notebook and get it to work in a Databricks notebook: https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/ocr-spell/OcrSpellChecking.ipynb ; However, ...
2 votes · 1 answer · 429 views
Does John Snow Labs’ NLP library built on top of Apache Spark support Java?
John Snow Labs’ NLP library is built on top of Apache Spark and the Spark ML library.
All its examples are provided in Scala and Python. Does it support Java? If yes, where can I find the related guides? If ...
2 votes · 1 answer · 165 views
Wrong or missing inputCols annotators - spark-nlp
I'm new to NLP and started with the spark-nlp package for Python. I trained a simple NER model, which I saved and now want to use. However, I am facing the problem of wrong or missing inputCols, ...
2 votes · 1 answer · 899 views
Spark NLP is not working in PySpark: TypeError: 'JavaPackage' object is not callable
I'm trying to spark-submit a PySpark application but every time I try it throws this error when it tries to download a pre-trained model from Spark NLP:
TypeError: 'JavaPackage' object is not callable
...
2 votes · 1 answer · 744 views
BERT embeddings in SPARKNLP or BERT for token classification in huggingface [closed]
Currently I am working on productionizing an NER model on Spark. I have a current implementation that uses Hugging Face DistilBERT with the TokenClassification head, but as the performance is a bit ...
2 votes · 1 answer · 435 views
NLP analysis for some pyspark dataframe columns by numpy vectorization
I would like to do some NLP analysis for a string column in a pyspark dataframe.
df:
year month u_id rating_score p_id review
2010 09 tvwe 1 p_5 I do not like it because its size is not ...
2 votes · 1 answer · 131 views
How to extract embeddings generated from sparknlp WordEmbeddingsModel to feed an RNN model using keras and tensorflow
I have a text classification problem.
I'm particularly interested in this embedding model in sparknlp because I have a dataset from Wikipedia in the 'sq' language. I need to convert the sentences of my ...
2 votes · 0 answers · 45 views
Generate a summarizing word based on a set of words
I'm very new to NLP, so I have a theoretical question.
Let's say I have the following Spark dataframe:
+--+------------------------------------------+
|id| word_list|...
2 votes · 0 answers · 158 views
I am getting a TypeError: 'JavaPackage' object is not callable when trying to call DocumentAssembler() in Google Colab
While trying to call DocumentAssembler() in Google Colab, I am getting the above error. I have used '!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 2.4.5 -s 2.6.5' for ...
2 votes · 0 answers · 509 views
java.lang.VerifyError: Bad return type Reason: Type 'java/lang/Object' (current frame, stack[0]) is not assignable to 'org/tensorflow/Tensor'
I want to run sparknlp in Python. I am using apache-spark 3.2.1, spark-nlp==3.4.1 and pyspark==3.1.2. I am following this guide. I am able to get the Spark session using this code:
sc = pyspark....
2 votes · 1 answer · 894 views
How to load a SparkNLP offline model in Python
I need to use sparknlp to do lemmatization in Python. I want to use the pretrained pipeline, but I need to do it offline. What is the correct way to do this? I am not able to find any Python example....
2 votes · 1 answer · 1k views
Using pretrained models from sparknlp on Databricks
I am trying to follow the official examples from John Snow Labs but every time I get a TypeError: 'JavaPackage' object is not callable error. I followed all of the steps in the Databricks install ...
2 votes · 0 answers · 40 views
How to identify the main entity (category) if a query contains multiple categories
I want to extract the user's key intent by identifying the key category from the probable categories identified by some process.
E.g. Christmas tree ornament
The above query has 2 categories in it:
1) ...
2 votes · 0 answers · 250 views
Version compatibility issues with Scala, Spark, Spark NLP
I am new to 'Spark NLP' and I got stuck on version compatibility issues. That may seem silly, but I still request your help with this:
‘Spark NLP’ is built on top of Apache Spark 2.4.0 ...
1 vote · 1 answer · 547 views
How to use an NER model fine-tuned using Hugging Face transformers with Spark NLP on Databricks
I needed to train (fine-tune) an NER token classifier to recognize our custom tokens.
The easiest way to do that I found was:
Token Classification with W-NUT Emerging Entities
But now I encountered a ...
1 vote · 2 answers · 269 views
"Param poolingLayer does not exist" error while loading a BERT embedding model in spark-nlp
My NLP pipeline uses the pre-trained BERT embedding model "bert_base_uncased" from johnsnowlabs. But while loading this downloaded model I am getting the following exception.
Caused by: java.util....
1 vote · 1 answer · 523 views
Regex in Spark NLP Normalizer is not working correctly
I'm using the Spark NLP pipeline to preprocess my data. Instead of only removing punctuation, the normalizer also removes umlauts.
My code:
documentAssembler = DocumentAssembler() \
.setInputCol(&...
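A hedged sketch of a cleanup pattern that keeps Unicode letters (including umlauts) and digits instead of only ASCII letters; the column names are assumptions:

from sparknlp.annotator import Normalizer

normalizer = (Normalizer()
    .setInputCols(["token"])
    .setOutputCol("normalized")
    .setCleanupPatterns([r"[^\p{L}\p{Nd}]"])  # remove anything that is not a Unicode letter or digit
    .setLowercase(False))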
1 vote · 1 answer · 509 views
Cannot use SparkNLP pre-trained T5Transformer, executor fails with error "No Operation named [encoder_input_ids] in the Graph"
I downloaded the T5-small model from the SparkNLP website, and am using this code (almost entirely from the examples):
import com.johnsnowlabs.nlp.SparkNLP
import com.johnsnowlabs.nlp.annotators.seq2seq....
1 vote · 1 answer · 190 views
Palantir Foundry - run Spark-NLP library offline
I am trying to run the spark-nlp library offline in Palantir Foundry. We do not have egress configured to make http calls, so I'm attempting to use spark-nlp in offline mode by downloading ...
1 vote · 1 answer · 131 views
Remove repeated punctuation from a pyspark dataframe
I need to remove repeated punctuation and keep only the last occurrence.
For example: !!!! -> !
!!$$ -> !$
I have a dataset that looks like the one below
temp = spark.createDataFrame([...
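A plain PySpark sketch (no spark-nlp required): collapse each run of a repeated punctuation character with a regex backreference; the column name is an assumption:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
temp = spark.createDataFrame([("hello!!!!",), ("wow!!$$",)], ["text"])
cleaned = temp.withColumn("text_clean",
    F.regexp_replace("text", r"(\p{Punct})\1+", "$1"))
cleaned.show(truncate=False)  # hello!!!! -> hello!   and   wow!!$$ -> wow!$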
1 vote · 1 answer · 3k views
How to start a Spark session in a Vertex AI Workbench JupyterLab notebook?
Can you kindly show me how to start the Spark session in a Google Cloud Vertex AI Workbench JupyterLab notebook?
This is working fine in Google Colaboratory, by the way.
What is missing here?
# ...
1 vote · 1 answer · 1k views
Converting Spacy NER entity format to CoNLL 2003 format
I am working on an NER application where I have data annotated in the following format.
[('The F15 aircraft uses a lot of fuel', {'entities': [(4, 7, 'aircraft')]}),
('did you see the F16 landing?',...
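A hedged sketch of one possible conversion, using naive whitespace tokenization and IOB tags; a real conversion should reuse the tokenizer the downstream model expects, and full CoNLL 2003 rows also carry POS and chunk columns:

def spacy_to_conll(example):
    """Turn (text, {'entities': [(start, end, label), ...]}) into 'token tag' lines."""
    text, ann = example
    spans = ann.get("entities", [])
    lines, offset = [], 0
    for token in text.split():
        start = text.index(token, offset)
        end = start + len(token)
        offset = end
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:
                tag = ("B-" if start == s else "I-") + label.upper()
                break
        lines.append(f"{token} {tag}")
    return "\n".join(lines)

example = ("The F15 aircraft uses a lot of fuel", {"entities": [(4, 7, "aircraft")]})
print(spacy_to_conll(example))  # "F15" comes out as "F15 B-AIRCRAFT"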
1 vote · 1 answer · 157 views
SparkNLP PipelineModel which includes AnnotatorApproach in stages
In a SparkNLP PipelineModel all the stages have to be of type AnnotatorModel. But what if one of those AnnotatorModels requires a certain column in the dataset as input and this input column is the ...
1 vote · 1 answer · 1k views
How to install Spark NLP packages offline
How can I install Spark NLP packages offline, without an internet connection?
I've downloaded the package (recognize_entities_dl) and uploaded it to the cluster.
I've installed Spark NLP using pip ...
1 vote · 1 answer · 976 views
How do we extract named entities in Scala using any NLP library
I have a huge text file and I have to extract only named entities from this file. I am using the Scala language and a Databricks cluster for this.
val input = sc.textFile('....Mypath...').flatMap(line =&...
1 vote · 1 answer · 2k views
Persist BERT model on disk as a pickle file
I have managed to get the BERT model to work with the johnsnowlabs-spark-nlp library. I am able to save the "trained model" to disk as follows.
Fit Model
df_bert_trained = bert_pipeline.fit(textRDD)
...
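One note, offered as a sketch rather than a definitive fix: the fitted pipeline lives on the JVM, so Python pickling will not capture it; Spark's own writer is the usual route (the path is a placeholder and df_bert_trained is the fitted model from the question):

from pyspark.ml import PipelineModel

df_bert_trained.write().overwrite().save("/models/bert_pipeline")  # hypothetical path
reloaded = PipelineModel.load("/models/bert_pipeline")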
1 vote · 2 answers · 1k views
requirement failed: Wrong or missing inputCols annotators in johnsnowlabs.nlp
I'm using com.johnsnowlabs.nlp-2.2.2 with spark-2.4.4 to process some articles. In those articles, there are some very long words I'm not interested in and which slow down the POS tagging a lot.
I ...
1 vote · 1 answer · 757 views
Not able to use JohnSnowLabs pretrained model in Zeppelin
I want to use the JohnSnowLabs pretrained spell check module in my Zeppelin notebook. As mentioned here I have added com.johnsnowlabs.nlp:spark-nlp_2.11:1.7.3 to the Zeppelin dependency section as ...
1 vote · 0 answers · 85 views
spark-nlp: DocumentAssembler initialization failing with java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class (ESG)
I'm trying to use the John Snow ESG model.
And I keep getting the following error:
Line document_assembler = DocumentAssembler().setInputCol('text').setOutputCol('document')
Error java.lang....
1 vote · 0 answers · 52 views
I have a Spark DataFrame and I want to generate n-grams the way the Gensim bigram model does
I have a text dataframe (tweets). I am using Spark for high-volume data handling and I want to generate bigrams in the same way as Gensim's bigram models do. I have been using Spark NLP for ...
1 vote · 0 answers · 40 views
SparkNLP messages disabled
When I run the code "spark = sparknlp.start()", it always prints a message like the following to the terminal, which is very annoying.
23/07/13 14:23:33 WARN Utils: Your hostname, ---------- resolves to a ...
1 vote · 0 answers · 122 views
How to get vocabulary from WordEmbeddingsModel in sparknlp
I need to create an embedding matrix from the embeddings generated by WordEmbeddingsModel in sparknlp. Until now I have this code:
from sparknlp.annotator import *
from sparknlp.common import *
from ...