
Questions tagged [johnsnowlabs-spark-nlp]

John Snow Labs’ Spark NLP is a natural language processing library built on top of Apache Spark ML pipelines.

8 votes
1 answer
2k views

Do Spark-NLP pretrained pipelines only work on Linux systems?

I am trying to set up some simple code where I pass a DataFrame and test it with the pretrained explain pipeline provided by John Snow Labs' Spark-NLP library. I am using Jupyter notebooks from Anaconda ...
Asked by StuckProgrammer
8 votes
2 answers
4k views

Unable to download the pipeline provided by the spark-nlp library

I am unable to use the predefined pipeline "recognize_entities_dl" provided by the spark-nlp library. I tried installing different versions of the pyspark and spark-nlp libraries. import sparknlp from ...
Asked by bhawana (81 reputation)
6 votes
2 answers
5k views

spark-nlp: DocumentAssembler initialization fails with 'java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class'

I am trying out the context-aware spell checker described in https://medium.com/spark-nlp/applying-context-aware-spell-checking-in-spark-nlp-3c29c46963bc The first component in the pipeline is a ...
Asked by Abhishek P
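A NoClassDefFoundError for MLWritable$class is commonly reported when a Spark NLP jar built for Scala 2.11 (artifact suffix _2.11) runs on a Spark distribution compiled with Scala 2.12, because Scala 2.12 no longer generates $class helper classes for traits. Whether that is the cause in this question is an assumption. The pure-Python helper below is a hypothetical sketch that compares the Scala suffix of a Maven coordinate with the Scala version Spark reports (spark-submit --version prints it as "Using Scala version ..."):

```python
import re


def scala_binary_version(coordinate):
    """Return the Scala suffix of a 'groupId:artifactId:version' coordinate.

    For example 'com.johnsnowlabs.nlp:spark-nlp_2.11:1.7.3' -> '2.11'.
    Returns None when the artifactId has no _<major>.<minor> suffix.
    """
    _, artifact_id, _ = coordinate.split(":")
    match = re.search(r"_(\d+\.\d+)$", artifact_id)
    return match.group(1) if match else None


def matches_spark_scala(coordinate, spark_scala_version):
    """True if the artifact targets the same Scala binary version (major.minor)
    as Spark; a full version such as '2.12.15' is compared as '2.12'."""
    spark_binary = ".".join(spark_scala_version.split(".")[:2])
    return scala_binary_version(coordinate) == spark_binary
```

For instance, matches_spark_scala("com.johnsnowlabs.nlp:spark-nlp_2.11:1.7.3", "2.12.10") is False, which is the mismatch pattern that tends to produce this error.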
6 votes
1 answer
2k views

How should we use setDictionary for the lemmatization annotator in Spark-NLP?

I have a requirement where I have to add a dictionary in the lemmatization step. While trying to use it in a pipeline and calling pipeline.fit(), I get an ArrayIndexOutOfBoundsException. What is the ...
Asked by StuckProgrammer
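One plausible cause of an ArrayIndexOutOfBoundsException here is a dictionary file whose lines do not contain the key delimiter passed to setDictionary (in Spark NLP it is typically called as setDictionary(path, key_delimiter, value_delimiter), for example with "->" and "\t"). The sketch below is a hypothetical pure-Python parser, assuming each line has the form "lemma -> form1<TAB>form2". It validates lines up front, so a malformed line is reported by number instead of failing with an index error:

```python
def parse_lemma_dictionary(lines, key_delimiter="->", value_delimiter="\t"):
    """Build a {form: lemma} map from lines like 'lemma -> form1<TAB>form2'.

    Assumed file layout: the lemma, the key delimiter, then the inflected
    forms separated by the value delimiter. Blank lines are skipped.
    """
    lemmas = {}
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore empty lines
        parts = line.split(key_delimiter)
        if len(parts) != 2:
            # A missing or repeated key delimiter is the case that would
            # otherwise surface as an index error when reading parts[1].
            raise ValueError(
                f"line {line_no}: expected exactly one {key_delimiter!r}, got {line!r}"
            )
        lemma = parts[0].strip()
        for form in parts[1].strip().split(value_delimiter):
            if form.strip():
                lemmas[form.strip()] = lemma
    return lemmas
```

Running the real dictionary file through a check like this before pipeline.fit() is a quick way to confirm that the delimiters match the file.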
5 votes
3 answers
9k views

After installing sparknlp, cannot import sparknlp

The following ran successfully on a Cloudera CDSW cluster gateway. import pyspark from pyspark.sql import SparkSession spark = (SparkSession .builder .config("spark.jars....
Asked by Clay (2,666 reputation)
4 votes
1 answer
3k views

How to load a spark-nlp pre-trained model from disk

From the spark-nlp Github page I downloaded a .zip file containing a pre-trained NerCRFModel. The zip contains three folders: embeddings, fields, and metadata. How do I load that into a Scala ...
Asked by Marsellus Wallace
4 votes
1 answer
7k views

spark-nlp 'JavaPackage' object is not callable

I am using JupyterLab to run spark-nlp text analysis. At the moment I am just running the sample code: import sparknlp from pyspark.sql import SparkSession from sparknlp.pretrained import ...
Asked by teaoverflow
4 votes
1 answer
693 views

SparkNLP Sentiment Analysis in Java

I want to use SparkNLP to do sentiment analysis on the column column1 of a Spark Dataset, using the default trained model. This is my code: DocumentAssembler docAssembler = (DocumentAssembler) new ...
Asked by AngryLeo (400 reputation)
4 votes
1 answer
5k views

Spark/PySpark: how to flatten a column with an array of dictionaries and embedded dictionaries (sparknlp annotator output)

I'm trying to extract the output from sparknlp (using the pretrained pipeline 'explain_document_dl'). I have spent a lot of time looking for ways (UDFs, explode, etc.) but cannot get anywhere close to ...
Asked by Peggy (93 reputation)
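In PySpark, annotation columns are arrays of structs with fields such as annotatorType, begin, end, result, metadata and embeddings, so the usual route is pyspark.sql.functions.explode followed by selecting the struct fields. For small results that have already been collected (for example with Row.asDict(recursive=True)), a plain-Python sketch of the same flattening, assuming that field layout, looks like this:

```python
def flatten_annotations(annotations, column_prefix=""):
    """Flatten Spark NLP-style annotation dicts into flat row dicts.

    Each annotation is assumed to look like
    {"annotatorType": str, "begin": int, "end": int, "result": str,
     "metadata": {str: str}, "embeddings": [float, ...]}.
    Metadata keys become top-level 'metadata_<key>' columns; embeddings
    are dropped because they are usually too bulky for a flat table.
    """
    rows = []
    for ann in annotations:
        row = {}
        for key, value in ann.items():
            if key == "metadata":
                for meta_key, meta_value in (value or {}).items():
                    row[f"{column_prefix}metadata_{meta_key}"] = meta_value
            elif key != "embeddings":
                row[f"{column_prefix}{key}"] = value
        rows.append(row)
    return rows
```

To line up the token, pos and ner outputs of explain_document_dl, the flattened lists can then be zipped by position, assuming each column holds one annotation per token.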
3 votes
1 answer
1k views

How to use JohnSnowLabs NLP Spell correction module NorvigSweetingModel?

I was going through the JohnSnowLabs SpellChecker here. I found Norvig's algorithm implementation there, and the example section has just the following two lines: import com.johnsnowlabs.nlp....
Asked by user3243499 (3,121 reputation)
3 votes
1 answer
374 views

Where can I find a list of class labels for pretrained SparkNLP NerDLModel?

I have been searching for a while but no luck finding out what NER labels are included in the pretrained NerDL(tensorflow) model. I would think the training data can provide such information, but I do ...
Asked by ZEE (186 reputation)
3 votes
1 answer
2k views

TypeError: 'JavaPackage' object is not callable | using java 11 for spark 3.3.0, sparknlp 4.0.1 and sparknlp jar from spark-nlp-m1_2.12

I got the Spark NLP jar from https://jar-download.com/artifacts/com.johnsnowlabs.nlp/spark-nlp-m1_2.12/4.0.1/source-code JAVA_HOME = C:\Program Files\Java\jdk-18.0.1.1 in the system variables and ...
Asked by Krishna Kumar S
3 votes
1 answer
270 views

Sentence similarity with SparkNLP only works on Google Dataproc with ONE sentence, FAILS when multiple sentences are provided

I deployed the following Colab Python code (see link below) to Dataproc on Google Cloud. It only works when input_list is an array with one item; when input_list has two items, the PySpark ...
Asked by Machine Learning
2 votes
1 answer
537 views

Is it possible to use the library Spark-NLP with Spark Structured Streaming?

I want to perform tweet sentiment analysis on a stream of messages I get from a Kafka cluster that, in turn, gets the tweets from the Twitter API v2. When I try to apply the pre-trained sentiment ...
Asked by Doraemon (337 reputation)
2 votes
2 answers
320 views

Local data cannot be read in a Dataproc cluster, when using SparkNLP

I am trying to build a Dataproc cluster, with Spark NLP installed in it, then quick test it by reading some CoNLL 2003 data. First, I used this codelab as inspiration, to build my own smaller cluster (...
Asked by David Espinosa
2 votes
1 answer
3k views

Spark-nlp pretrained model not loading on Windows

I am trying to install pretrained pipelines in spark-nlp on Windows 10 with Python. The following is the code I have tried so far in a Jupyter notebook on my local system: ! java -version # should ...
Asked by gaurav gund
2 votes
1 answer
3k views

Spark-nlp: can't load pretrained recognize entity model from disk in pyspark

I have a spark cluster set up and would like to integrate spark-nlp to run named entity recognition. I need to access the model from disk rather than download it from the internet at runtime. I have ...
Asked by jdukatz (66 reputation)
2 votes
1 answer
233 views

Multilingual BERT in Spark NLP

I was wondering if pre-trained multilingual BERT is available in sparknlp. As you know, multilingual BERT is pre-trained on more than 100 languages. Are all of these languages covered by the Spark NLP BERT models too? Thanks
Asked by amir haghighi
2 votes
1 answer
815 views

Can't get the John Snow Labs OCR notebook to run on Databricks

I am trying to follow this notebook and get it to work in a Databricks notebook: https://github.com/JohnSnowLabs/spark-nlp-workshop/blob/master/jupyter/ocr-spell/OcrSpellChecking.ipynb However, ...
Asked by Kay (59 reputation)
2 votes
1 answer
429 views

Does John Snow Labs’ NLP library, built on top of Apache Spark, support Java?

John Snow Labs’ NLP library is built on top of Apache Spark and the Spark ML library. All its examples are provided in Scala and Python. Does it support Java? If yes, where can I find the related guides? If ...
Asked by Mahesha999
2 votes
1 answer
165 views

Wrong or missing inputCols annotators - spark-nlp

I'm new to NLP and started with the spark-nlp package for Python. I trained a simple NER model, which I saved and now want to use. However, I am facing the problem of wrong or missing inputCols, ...
Asked by padraig (31 reputation)
2 votes
1 answer
899 views

Spark NLP is not working in PySpark: TypeError: 'JavaPackage' object is not callable

I'm trying to spark-submit a PySpark application but every time I try it throws this error when it tries to download a pre-trained model from Spark NLP: TypeError: 'JavaPackage' object is not callable ...
Asked by Doraemon (337 reputation)
2 votes
1 answer
744 views

BERT embeddings in SPARKNLP or BERT for token classification in huggingface [closed]

Currently I am working on productionizing an NER model on Spark. I have a current implementation that uses Hugging Face DistilBERT with the TokenClassification head, but as the performance is a bit ...
Asked by Ed. (876 reputation)
2 votes
1 answer
435 views

NLP analysis for some pyspark dataframe columns by numpy vectorization

I would like to do some NLP analysis for a string column in a pyspark dataframe. df: year month u_id rating_score p_id review 2010 09 tvwe 1 p_5 I do not like it because its size is not ...
Asked by user3448011 (1,549 reputation)
2 votes
1 answer
131 views

How to extract embeddings generated from sparknlp WordEmbeddingsModel to feed a RNN model using keras and tensorflow

I have a text classification problem. I'm particularly interested in this embedding model in sparknlp because I have a dataset from Wikipedia in 'sq' language. I need to convert sentences of my ...
Asked by Aiha (51 reputation)
2 votes
0 answers
45 views

Generate a summarizing word based on a set of words

I'm very new to NLP, so I have a theoretical question. Let's say I have the following Spark dataframe: +--+------------------------------------------+ |id| word_list|...
Asked by Hilary (485 reputation)
2 votes
0 answers
158 views

I am getting a TypeError: 'JavaPackage' object is not callable when calling DocumentAssembler() in Google Colab

While trying to call DocumentAssembler() in Google Colab, I am getting the above error. I have used '!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash /dev/stdin -p 2.4.5 -s 2.6.5' for ...
Asked by Neelabh P Bhaskar
2 votes
0 answers
509 views

java.lang.VerifyError: Bad return type Reason: Type 'java/lang/Object' (current frame, stack[0]) is not assignable to 'org/tensorflow/Tensor'

I want to run sparknlp in Python. I am using apache-spark 3.2.1, spark-nlp==3.4.1 and pyspark==3.1.2, and I am following this guide. I am able to get the spark session using this code: sc = pyspark....
Asked by Rishabhg (118 reputation)
2 votes
1 answer
894 views

How to load a SparkNLP model offline in Python

I need to use sparknlp to do lemmatization in Python. I want to use the pretrained pipeline, but I need to do it offline. What is the correct way to do this? I am not able to find any Python example...
Asked by Fiona (85 reputation)
2 votes
1 answer
1k views

Using pretrained models from sparknlp on Databricks

I am trying to follow the official examples from John Snow Labs but every time I get a TypeError: 'JavaPackage' object is not callable error. I followed all of the steps in the Databricks install ...
Asked by Frank B. (1,863 reputation)
2 votes
0 answers
40 views

How to identify the main entity (category) if a query contains multiple categories

I want to extract the user's key intent by identifying the key category among the probable categories identified by some process. E.g., the query "Christmas tree ornament" has 2 categories in it: 1) ...
Asked by Aman Tandon (1,489 reputation)
2 votes
0 answers
250 views

Version Compatibility issues with Scala, Spark, Spark NLP

I am new to Spark NLP and I got stuck on version compatibility issues. This may seem silly, but I would appreciate help with it: Spark NLP is built on top of Apache Spark 2.4.0 ...
Asked by amandeep1991 (1,384 reputation)
1 vote
1 answer
547 views

How to use an NER model fine-tuned with Hugging Face Transformers in Spark NLP on Databricks

I needed to train (fine tune) NER token classifier to recognize our custom tokens. The easiest way to do that I found was: Token Classification with W-NUT Emerging Entities But now I encountered a ...
Asked by Lord_JABA (2,605 reputation)
1 vote
2 answers
269 views

"Param poolingLayer does not exist" error coming while loading BERT embedding model in spark-nlp

My NLP pipeline uses the pre-trained BERT embedding model "bert_base_uncased" from johnsnowlabs. But while loading this downloaded model, I am getting the following exception. Caused by: java.util....
Asked by dev.ak (29 reputation)
1 vote
1 answer
523 views

Regex in Spark NLP Normalizer is not working correctly

I'm using the Spark NLP pipeline to preprocess my data. Instead of only removing punctuation, the normalizer also removes umlauts. My code: documentAssembler = DocumentAssembler() \ .setInputCol("...
Asked by jonas (380 reputation)
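Whether umlauts survive depends on the cleanup pattern. An ASCII class such as [^A-Za-z] deletes every non-ASCII letter; older Spark NLP releases are reported to use a pattern like that as the Normalizer default, but treat that as an assumption and check the value in your version. The Python sketch below only contrasts the two kinds of regex; it is not the Normalizer itself:

```python
import re

# ASCII-only cleanup class: every character outside A-Z/a-z is removed,
# which also strips letters such as ä, ö, ü and ß.
ASCII_ONLY = re.compile(r"[^A-Za-z]")

# Unicode-aware alternative: for str patterns Python's \w already covers
# letters such as ü and ß, so [\W\d_] matches exactly the non-letters.
NON_LETTERS = re.compile(r"[\W\d_]")


def normalize(token, pattern):
    """Delete every character of `token` that `pattern` matches."""
    return pattern.sub("", token)
```

Here normalize("Grüße!", ASCII_ONLY) leaves "Gre", while normalize("Grüße!", NON_LETTERS) keeps "Grüße". Spark NLP compiles cleanup patterns as Java regular expressions, where the Unicode letter class is written \p{L}, so a pattern such as [^\p{L}] passed through setCleanupPatterns should keep umlauts; verify the parameter name against your release.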
1 vote
1 answer
509 views

Cannot use SparkNLP pre-trained T5Transformer, executor fails with error "No Operation named [encoder_input_ids] in the Graph"

Downloaded T5-small model from SparkNLP website, and using this code (almost entirely from the examples): import com.johnsnowlabs.nlp.SparkNLP import com.johnsnowlabs.nlp.annotators.seq2seq....
Asked by shay__ (3,895 reputation)
1 vote
1 answer
190 views

Palantir Foundry - run Spark-NLP library offline

I am trying to run the spark-nlp library offline in Palantir Foundry. We do not have egress configured to make http calls, so I'm attempting to use spark-nlp in offline mode by downloading ...
Asked by tessa (826 reputation)
1 vote
1 answer
131 views

Remove repeated punctuation from a PySpark DataFrame

I need to remove repeated punctuation and keep only the last occurrence. For example: !!!! -> ! and !!$$ -> !$. I have a dataset that looks like this: temp = spark.createDataFrame([...
Asked by merkle (1,741 reputation)
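Since a run of identical characters looks the same whichever copy is kept, the task reduces to collapsing each run of one repeated punctuation mark into a single copy, which a backreference expresses directly. A minimal Python sketch, treating "punctuation" as any character that is neither a word character nor whitespace (an assumption about what should count):

```python
import re

# One punctuation character (not a word character, not whitespace)
# followed by one or more copies of that same character.
REPEATED_PUNCT = re.compile(r"([^\w\s])\1+")


def collapse_repeated_punctuation(text):
    """Replace each run of the same punctuation mark with a single copy."""
    return REPEATED_PUNCT.sub(r"\1", text)
```

For a DataFrame column, pyspark.sql.functions.regexp_replace should accept the same pattern; because it uses Java regex, the backreference in the replacement string is written $1 rather than \1.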
1 vote
1 answer
3k views

How to start Spark session on Vertex AI workbench Jupyterlab notebook?

Can you kindly show me how to start a Spark session in a Google Cloud Vertex AI Workbench JupyterLab notebook? This works fine in Google Colaboratory, by the way. What is missing here? # ...
Asked by gracenz (129 reputation)
1 vote
1 answer
1k views

Converting spaCy NER entity format to CoNLL 2003 format

I am working on an NER application where I have data annotated in the following format. [('The F15 aircraft uses a lot of fuel', {'entities': [(4, 7, 'aircraft')]}), ('did you see the F16 landing?',...
Asked by imhans33 (133 reputation)
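The conversion amounts to tokenizing each text, tagging every token from the character-offset spans, and writing one "TOKEN POS CHUNK NER" line per token with a blank line between sentences. The sketch below assumes whitespace tokenization, IOB2 tags (B-/I-), and -X- placeholders for the POS and chunk columns. Tokens keep attached punctuation, so a token that extends past an entity boundary (for example "F16," with a trailing comma) is tagged O and would need a finer tokenizer:

```python
import re


def spacy_to_conll(examples, placeholder="-X-"):
    """Convert spaCy-style (text, {'entities': [(start, end, label)]}) pairs
    into CoNLL 2003-style text with one 'TOKEN POS CHUNK NER' line per token.

    Assumptions: whitespace tokenization, IOB2 tags, placeholder POS/chunk.
    """
    lines = [f"-DOCSTART- {placeholder} {placeholder} O", ""]
    for text, annotations in examples:
        entities = annotations.get("entities", [])
        for match in re.finditer(r"\S+", text):
            start, end = match.start(), match.end()
            tag = "O"
            for ent_start, ent_end, label in entities:
                if start >= ent_start and end <= ent_end:
                    # First token of a span gets B-, following tokens get I-.
                    tag = ("B-" if start == ent_start else "I-") + label
                    break
            lines.append(f"{match.group()} {placeholder} {placeholder} {tag}")
        lines.append("")  # blank line ends the sentence
    return "\n".join(lines)
```

With the first example above, "F15" becomes "F15 -X- -X- B-aircraft" and every other token is tagged O.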
1 vote
1 answer
157 views

SparkNLP PipelineModel which includes AnnotatorApproach in stages

In a Spark NLP PipelineModel, all the stages have to be of type AnnotatorModel. But what if one of those annotator models requires a certain column in the dataset as input, and this input column is the ...
Asked by Martin Wunderlich
1 vote
1 answer
1k views

How to install offline Spark NLP packages

How can I install Spark NLP packages offline, without an internet connection? I've downloaded the package (recognize_entities_dl) and uploaded it to the cluster. I've installed Spark NLP using pip ...
Asked by John Doe (10.1k reputation)
1 vote
1 answer
976 views

How do we extract named entities in Scala using any NLP library?

I have a huge text file and I have to extract only the named entities from this file. I am using Scala and a Databricks cluster for this. val input = sc.textFile('....Mypath...').flatMap(line =>...
Asked by Reetish Chand
1 vote
1 answer
2k views

Persist BERT model on disk as pickle file

I have managed to get the BERT model to work on johnsnowlabs-spark-nlp library. I am able to save the "trained model" on disk as follows. Fit Model df_bert_trained = bert_pipeline.fit(textRDD) ...
Asked by user8291021
1 vote
2 answers
1k views

requirement failed: Wrong or missing inputCols annotators in johnsnowlabs.nlp

I'm using com.johnsnowlabs.nlp-2.2.2 with spark-2.4.4 to process some articles. In those articles, there are some very long words I'm not interested in, which slow down the POS tagging a lot. I ...
Asked by ticapix (1,622 reputation)
1 vote
1 answer
757 views

Not able to use JohnSnowLabs pretrained model in Zeppelin

I want to use the JohnSnowLabs pretrained spell check module in my Zeppelin notebook. As mentioned here I have added com.johnsnowlabs.nlp:spark-nlp_2.11:1.7.3 to the Zeppelin dependency section as ...
Asked by user3243499 (3,121 reputation)
1 vote
0 answers
85 views

spark-nlp: DocumentAssembler initialization fails with java.lang.NoClassDefFoundError: org/apache/spark/ml/util/MLWritable$class (ESG model)

I'm trying to use the John Snow Labs ESG model, and I keep getting the following error: Line document_assembler = DocumentAssembler().setInputCol('text').setOutputCol('document') Error java.lang....
Asked by Dolev Mitz
1 vote
0 answers
52 views

I have a Spark DataFrame and I want to generate n-grams the way the gensim bigram model does

I have a text dataframe (tweets), I am using Spark for high volume data handling and I want to generate Bigrams in the same way as Gensim bigrams models do. I have been using Spark NLP for ...
Asked by Criscas05
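Spark ML's NGram transformer (pyspark.ml.feature.NGram) emits every n-gram, whereas gensim's Phrases keeps only bigrams whose collocation score passes a threshold. The sketch below reimplements gensim's default ("original") scorer in plain Python, mirroring its defaults of min_count=5 and threshold=10 and approximating the vocabulary size as the number of distinct unigrams plus bigrams; pruning and other gensim details are omitted. On large data the two counters are the part that would be computed with Spark aggregations:

```python
from collections import Counter


def find_phrases(sentences, min_count=5, threshold=10.0):
    """Return {(a, b): score} for bigrams whose score exceeds `threshold`.

    score = (count(a b) - min_count) / count(a) / count(b) * vocab_size
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams) + len(bigrams)  # approximation of gensim's vocab
    phrases = {}
    for (a, b), count in bigrams.items():
        score = (count - min_count) / unigrams[a] / unigrams[b] * vocab_size
        if score > threshold:
            phrases[(a, b)] = score
    return phrases


def apply_phrases(tokens, phrases, delimiter="_"):
    """Greedily join adjacent token pairs found in `phrases`, left to right."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrases:
            out.append(tokens[i] + delimiter + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```

The phrase table is small once scored, so in Spark it could be broadcast and applied per row with apply_phrases inside a UDF.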
1 vote
0 answers
40 views

SparkNLP messages disabled

When I run the code "spark = sparknlp.start()", it always prints messages like this to the terminal, which is very annoying: 23/07/13 14:23:33 WARN Utils: Your hostname, ---------- resolves to a ...
Asked by user22219872
1 vote
0 answers
122 views

How to get vocabulary from WordEmbeddingsModel in sparknlp

I need to create an embedding matrix from embeddings generated by WordEmbeddingsModel in sparknlp. So far I have this code: from sparknlp.annotator import * from sparknlp.common import * from ...
Asked by Aiha (51 reputation)
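The WordEmbeddingsModel output column can be exploded into one row per token, where (assuming the standard annotation fields) result holds the token and embeddings holds its vector. Once those (token, vector) pairs are collected, a Keras-style word index and embedding matrix can be built as below; the function and its padding convention are hypothetical choices, not part of Spark NLP:

```python
def build_embedding_matrix(token_vectors, reserve_padding=True):
    """Build (word_index, matrix) from an iterable of (token, vector) pairs.

    - word_index maps each distinct token to its row in `matrix`.
    - Row 0 is an all-zero padding row when `reserve_padding` is True,
      matching the common Keras convention of padding sequences with 0.
    - The first vector seen for a token is kept; later duplicates are ignored.
    """
    word_index = {}
    matrix = []
    dim = None
    for token, vector in token_vectors:
        if dim is None:
            dim = len(vector)
            if reserve_padding:
                matrix.append([0.0] * dim)
        if len(vector) != dim:
            raise ValueError(f"inconsistent embedding dimension for {token!r}")
        if token not in word_index:
            word_index[token] = len(matrix)
            matrix.append(list(vector))
    return word_index, matrix
```

The matrix can then be converted with numpy.array and used to initialise a Keras Embedding layer, for example through tf.keras.initializers.Constant, with trainable=False if the Spark NLP vectors should stay fixed while the RNN trains.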