Highest scored 'scikit-learn' questions

324 votes

25 answers

390k views

Label encoding across multiple columns in scikit-learn

I'm trying to use scikit-learn's LabelEncoder to encode a pandas DataFrame of string labels. As the dataframe has many (50+) columns, I want to avoid creating a LabelEncoder object for each column; I'...

Bryan

6,129

asked Jun 27, 2014 at 18:29

317 votes

16 answers

1.1m views

How to normalize a numpy array to a unit vector

I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function: def normalize(v): norm = np.linalg.norm(v) if ...

Donbeo

17.4k

asked Jan 9, 2014 at 20:25
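A dependency-free sketch of the normalisation this question describes, using only the standard library rather than NumPy. The excerpt above is truncated, so the zero-vector behaviour here (returning the input unchanged) is an assumption, and this `normalize` is an illustrative stand-in, not the asker's original function.

```python
import math

def normalize(v):
    """Scale a sequence of numbers to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        # Assumption: a zero vector has no direction, so return it unchanged.
        return list(v)
    return [x / norm for x in v]

unit = normalize([3.0, 4.0])   # a 3-4-5 triangle, so the result has length 1
zero = normalize([0.0, 0.0])   # exercises the zero-norm guard
```

Dividing by the norm without the guard would raise `ZeroDivisionError` for an all-zero input, which is why the check comes first.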

272 votes

14 answers

538k views

Is there a library function for Root mean square error (RMSE) in python?

I know I could implement a root mean squared error function like this: def rmse(predictions, targets): return np.sqrt(((predictions - targets) ** 2).mean()) What I'm looking for if this rmse ...

siamii

23.9k

asked Jun 19, 2013 at 17:24
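For reference, the formula in the excerpt can also be written without NumPy. This is a minimal sketch using only the standard library; the length check and the requirement that the inputs are non-empty are assumptions added here, not part of the question.

```python
import math

def rmse(predictions, targets):
    """Root mean squared error of two equal-length, non-empty sequences."""
    if len(predictions) != len(targets):
        # Assumption: mismatched lengths are treated as a caller error.
        raise ValueError("predictions and targets must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Errors are 1 and -2, so the mean squared error is (1 + 4) / 2 = 2.5.
error = rmse([2.0, 4.0], [1.0, 6.0])
```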

267 votes

27 answers

828k views

sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

I am using sklearn and having a problem with the affinity propagation. I have built an input matrix and I keep getting the following error. ValueError: Input contains NaN, infinity or a value too ...

Ethan Waldie

2,869

asked Jul 9, 2015 at 16:40

264 votes

15 answers

497k views

ImportError: No module named sklearn.cross_validation

I am using python 2.7 in Ubuntu 14.04. I installed scikit-learn, numpy and matplotlib with these commands: sudo apt-get install build-essential python-dev python-numpy \ python-numpy-dev python-...

arthurckl

5,371

asked Jun 5, 2015 at 13:15

258 votes

7 answers

163k views

Save classifier to disk in scikit-learn

How do I save a trained Naive Bayes classifier to disk and use it to predict data? I have the following sample program from the scikit-learn website: from sklearn import datasets iris = datasets....

garak

4,763

asked May 15, 2012 at 0:06

251 votes

11 answers

417k views

Find p-value (significance) in scikit-learn LinearRegression

How can I find the p-value (significance) of each coefficient? lm = sklearn.linear_model.LinearRegression() lm.fit(x,y)

elplatt

3,337

asked Jan 13, 2015 at 17:46

250 votes

9 answers

339k views

pandas dataframe columns scaling with sklearn

I have a pandas dataframe with mixed type columns, and I'd like to apply sklearn's min_max_scaler to some of the columns. Ideally, I'd like to do these transformations in place, but haven't figured ...

flyingmeatball

7,817

asked Jul 9, 2014 at 3:57

249 votes

13 answers

245k views

How to split data into 3 sets (train, validation and test)?

I have a pandas dataframe and I wish to divide it into 3 separate sets. I know that using train_test_split from sklearn.cross_validation, one can divide the data in two sets (train and test). However, I ...

CentAu

11k

asked Jul 7, 2016 at 16:26
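One way to picture a three-way split is to shuffle once and cut the shuffled data at two positions. The sketch below works on a plain list using the standard library rather than on a pandas dataframe. The function name `train_validate_test_split`, the 60/20/20 default fractions, and the `seed` parameter are all assumptions made for illustration.

```python
import random

def train_validate_test_split(rows, train_frac=0.6, validate_frac=0.2, seed=None):
    """Shuffle rows once, then cut them into train, validation and test parts."""
    rng = random.Random(seed)          # private generator, leaves global state alone
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n = len(shuffled)
    train_end = round(train_frac * n)
    validate_end = train_end + round(validate_frac * n)
    # Whatever remains after the two cuts becomes the test set.
    return shuffled[:train_end], shuffled[train_end:validate_end], shuffled[validate_end:]

train, validate, test = train_validate_test_split(range(10), seed=0)
```

Because the three slices come from a single shuffled copy, every row lands in exactly one set.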

245 votes

9 answers

352k views

A column-vector y was passed when a 1d array was expected

I need to fit RandomForestRegressor from sklearn.ensemble. forest = ensemble.RandomForestRegressor(**RF_tuned_parameters) model = forest.fit(train_fold, train_y) yhat = model.predict(test_fold) This ...

Klausos Klausos

15.8k

asked Dec 8, 2015 at 20:47

236 votes

11 answers

132k views

Is it possible to specify your own distance function using scikit-learn K-Means Clustering?

bmasc

2,490

asked Apr 3, 2011 at 12:39

234 votes

20 answers

358k views

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Importing from pyxdameraulevenshtein gives the following error, I have pyxdameraulevenshtein==1.5.3 pandas==1.1.4 scikit-learn==0.20.2. Numpy is 1.16.1. Works well in Python 3.6, Issue in Python 3.7....

Sachit Jani

2,441

asked Feb 5, 2021 at 9:11

227 votes

16 answers

949k views

ModuleNotFoundError: No module named 'sklearn'

I want to import sklearn but there is no module apparently: ModuleNotFoundError: No module named 'sklearn' I am using Anaconda and Python 3.6.1; I have checked everywhere but still can't find ...

Hareez Rana

2,283

asked Sep 8, 2017 at 9:56

211 votes

26 answers

172k views

How to extract the decision rules from scikit-learn decision-tree?

Can I extract the underlying decision-rules (or 'decision paths') from a trained tree in a decision tree as a textual list? Something like: if A>0.4 then if B<0.2 then if C>0.8 then class='X'

Dror Hilman

7,347

asked Nov 26, 2013 at 17:58

210 votes

8 answers

297k views

Random state (Pseudo-random number) in Scikit learn

I want to implement a machine learning algorithm in scikit learn, but I don't understand what this parameter random_state does? Why should I use it? I also could not understand what is a Pseudo-...

Elizabeth Susan Joseph

6,455

asked Jan 21, 2015 at 10:17
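The general idea behind a seed parameter can be shown with Python's own `random` module: a pseudo-random generator started from the same seed produces the same sequence every time. This is an analogy only and does not reproduce scikit-learn's internals; `shuffled_copy` is a hypothetical helper whose `random_state` argument simply mirrors the parameter name in the question.

```python
import random

def shuffled_copy(items, random_state=None):
    """Return a shuffled copy; a fixed random_state makes the order repeatable."""
    rng = random.Random(random_state)  # same seed, same pseudo-random sequence
    result = list(items)
    rng.shuffle(result)
    return result

first = shuffled_copy(range(10), random_state=42)
second = shuffled_copy(range(10), random_state=42)
# first and second hold the same order because both generators started from seed 42.
```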

198 votes

9 answers

152k views

what is the difference between 'transform' and 'fit_transform' in sklearn

In the sklearn-python toolbox, there are two functions, transform and fit_transform, for sklearn.decomposition.RandomizedPCA. The descriptions of the two functions are as follows. But what is the ...

tqjustc

3,764

asked May 23, 2014 at 20:42

180 votes

2 answers

207k views

How does the class_weight parameter in scikit-learn work?

I am having a lot of trouble understanding how the class_weight parameter in scikit-learn's Logistic Regression operates. The Situation I want to use logistic regression to do binary classification ...

kilgoretrout

3,627

asked Jun 22, 2015 at 4:11

177 votes

11 answers

157k views

RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility

I get this error when trying to load a saved SVM model. I have tried uninstalling sklearn, NumPy and SciPy, and reinstalling the latest versions all together again (using pip). I am still getting this ...

Blue482

3,016

asked Nov 28, 2016 at 13:17

172 votes

6 answers

313k views

Parameter "stratify" from method "train_test_split" (scikit Learn)

I am trying to use train_test_split from package scikit Learn, but I am having trouble with parameter stratify. Hereafter is the code: from sklearn import cross_validation, datasets X = iris.data[:,:...

Daneel Olivaw

2,273

asked Jan 17, 2016 at 19:05

172 votes

30 answers

206k views

How to convert a Scikit-learn dataset to a Pandas dataset

How do I convert data from a Scikit-learn Bunch object to a Pandas DataFrame? from sklearn.datasets import load_iris import pandas as pd data = load_iris() print(type(data)) data1 = pd. # Is there a ...

SANBI samples

2,108

asked Jun 27, 2016 at 7:28

166 votes

9 answers

303k views

Can anyone explain me StandardScaler?

I am unable to understand the page of the StandardScaler in the documentation of sklearn. Can anyone explain this to me in simple terms?

nitinvijay23

1,831

asked Nov 23, 2016 at 7:37

164 votes

3 answers

457k views

How can I plot a confusion matrix? [duplicate]

I am using scikit-learn for classification of text documents (22000) into 100 classes. I use scikit-learn's confusion matrix method for computing the confusion matrix. model1 = LogisticRegression() ...

minks

2,979

asked Feb 23, 2016 at 8:06

158 votes

2 answers

138k views

Logistic regression python solvers' definitions

I am using the logistic regression function from sklearn, and was wondering what each of the solvers is actually doing behind the scenes to solve the optimization problem. Can someone briefly describe ...

Clement

1,730

asked Jul 28, 2016 at 15:02

158 votes

11 answers

224k views

How to use sklearn fit_transform with pandas and return dataframe instead of numpy array?

I want to apply scaling (using StandardScaler() from sklearn.preprocessing) to a pandas dataframe. The following code returns a numpy array, so I lose all the column names and indices. This is not ...

Louic

2,553

asked Mar 1, 2016 at 12:51

152 votes

4 answers

104k views

What is exactly sklearn.pipeline.Pipeline?

I can't figure out how the sklearn.pipeline.Pipeline works exactly. There are a few explanations in the doc. For example what do they mean by: Pipeline of transforms with a final estimator. To ...

farhawa

10.3k

asked Oct 12, 2015 at 22:42

150 votes

7 answers

88k views

How are feature_importances in RandomForestClassifier determined?

I have a classification task with a time-series as the data input, where each attribute (n=23) represents a specific point in time. Besides the absolute classification result I would like to find out, ...

user2244670

1,501

asked Apr 4, 2013 at 11:53

149 votes

5 answers

78k views

What are the pros and cons between get_dummies (Pandas) and OneHotEncoder (Scikit-learn)?

I'm learning different methods to convert categorical variables to numeric for machine-learning classifiers. I came across the pd.get_dummies method and sklearn.preprocessing.OneHotEncoder() and I ...

O.rka

30.5k

asked Apr 14, 2016 at 18:28

145 votes

4 answers

319k views

How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn?

I'm working on a sentiment analysis problem; the data looks like this: label instances 5 1190 4 838 3 239 1 204 2 127 So my data is unbalanced since 1190 ...

new_with_python

1,607

asked Jul 15, 2015 at 4:17

145 votes

4 answers

111k views

Sklearn, gridsearch: how to print out progress during the execution?

I am using GridSearch from sklearn to optimize parameters of the classifier. There is a lot of data, so the whole process of optimization takes a while: more than a day. I would like to watch the ...

doubts

1,822

asked Jun 9, 2014 at 13:08

144 votes

4 answers

78k views

What are the different use cases of joblib versus pickle?

Background: I'm just getting started with scikit-learn, and read at the bottom of the page about joblib, versus pickle. it may be more interesting to use joblib’s replacement of pickle (joblib....

msunbot

1,951

asked Sep 27, 2012 at 6:39

141 votes

10 answers

310k views

UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples

I'm getting this weird error: classification.py:1113: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples. 'precision', 'predicted', average, ...

Sticky

3,859

asked Apr 1, 2017 at 22:05

138 votes

10 answers

360k views

how to check which version of nltk, scikit learn installed?

In a shell script I am checking whether these packages are installed, and installing them if they are not. So within the shell script: import nltk echo nltk.__version__ but it stops the shell script at ...

nlper

2,377

asked Feb 13, 2015 at 13:46

136 votes

6 answers

282k views

Run an OLS regression with Pandas Data Frame

I have a pandas data frame and I would like to be able to predict the values of column A from the values in columns B and C. Here is a toy example: import pandas as pd df = pd.DataFrame({"A": [10,20,30,...

Michael

13.7k

asked Nov 15, 2013 at 0:47

135 votes

9 answers

310k views

Stratified Train/Test-split in scikit-learn

I need to split my data into a training set (75%) and test set (25%). I currently do that with the code below: X, Xt, userInfo, userInfo_train = sklearn.cross_validation.train_test_split(X, userInfo) ...

pir

5,785

asked Apr 3, 2015 at 19:11

134 votes

13 answers

358k views

ImportError in importing from sklearn: cannot import name check_build

I am getting the following error while trying to import from sklearn: >>> from sklearn import svm Traceback (most recent call last): File "<pyshell#17>", line 1, in <module> ...

ayush singhal

1,919

asked Mar 7, 2013 at 15:12

132 votes

3 answers

40k views

Why does one hot encoding improve machine learning performance? [closed]

I have noticed that when One Hot encoding is used on a particular data set (a matrix) and used as training data for learning algorithms, it gives significantly better results with respect to ...

maheshakya

2,208

asked Jul 4, 2013 at 12:04

130 votes

3 answers

375k views

LogisticRegression: Unknown label type: 'continuous' using sklearn in python

I have the following code to test some of most popular ML algorithms of sklearn python library: import numpy as np from sklearn import metrics, svm from sklearn.linear_model ...

mllamazares

8,006

asked Jan 29, 2017 at 19:43

128 votes

21 answers

272k views

Scikit-learn: How to obtain True Positive, True Negative, False Positive and False Negative

My problem: I have a dataset which is a large JSON file. I read it and store it in the trainList variable. Next, I pre-process it - in order to be able to work with it. Once I have done that I ...

Euskalduna

1,607

asked Jul 9, 2015 at 17:19
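For a binary problem, the four quantities named in this question's title can be counted directly from paired labels. The sketch below uses only the standard library. `confusion_counts` and its `positive` parameter are hypothetical names chosen for illustration, and it assumes every label other than `positive` counts as negative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for binary labels."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}

counts = confusion_counts([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```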

128 votes

6 answers

110k views

Understanding min_df and max_df in scikit CountVectorizer

I have five text files that I input to a CountVectorizer. When specifying min_df and max_df to the CountVectorizer instance what does the min/max document frequency exactly mean? Is it the frequency ...

moeabdol

4,979

asked Dec 29, 2014 at 23:57

128 votes

10 answers

396k views

sklearn plot confusion matrix with labels

I want to plot a confusion matrix to visualize the classifier's performance, but it shows only the numbers of the labels, not the labels themselves: from sklearn.metrics import confusion_matrix import ...

hmghaly

1,452

asked Oct 7, 2013 at 20:08

125 votes

4 answers

268k views

ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT

I have a dataset consisting of both numeric and categorical data and I want to predict adverse outcomes for patients based on their medical characteristics. I defined a prediction pipeline for my ...

sums22

1,983

asked Jun 30, 2020 at 13:08

124 votes

3 answers

195k views

Will scikit-learn utilize GPU?

Reading implementation of scikit-learn in TensorFlow: http://learningtensorflow.com/lesson6/ and scikit-learn: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html I'm ...

blue-sky

53.3k

asked Jan 10, 2017 at 11:37

116 votes

8 answers

170k views

Passing categorical data to Sklearn Decision Tree

There are several posts about how to encode categorical data to Sklearn Decision trees, but from Sklearn documentation, we got these Some advantages of decision trees are: (...) Able to handle both ...

0xhfff

1,275

asked Jun 29, 2016 at 19:47

115 votes

6 answers

140k views

scikit-learn .predict() default threshold

I'm working on a classification problem with unbalanced classes (5% 1's). I want to predict the class, not the probability. In a binary classification problem, is scikit's classifier.predict() using 0....

ADJ

5,112

asked Nov 14, 2013 at 18:00

115 votes

4 answers

101k views

A progress bar for scikit-learn?

Is there any way to have a progress bar for the fit method in scikit-learn? Is it possible to include a custom one with something like Pyprind?

user5674731

asked Dec 13, 2015 at 14:07

114 votes

8 answers

338k views

Accuracy Score ValueError: Can't Handle mix of binary and continuous target

I'm using linear_model.LinearRegression from scikit-learn as a predictive model. It works and it's perfect. I have a problem to evaluate the predicted results using the accuracy_score metric. This is ...

Arij SEDIRI

2,118

asked Jun 24, 2016 at 13:57

112 votes

10 answers

219k views

sklearn: Found arrays with inconsistent numbers of samples when calling LinearRegression.fit()

Just trying to do a simple linear regression but I'm baffled by this error for: regr = LinearRegression() regr.fit(df2.iloc[1:1000, 5].values, df2.iloc[1:1000, 2].values) which produces: ValueError:...

sunny

3,861

asked Jun 12, 2015 at 22:26

110 votes

2 answers

305k views

Converting list to numpy array

I have managed to load images in a folder using the sklearn command load_sample_images(). I would now like to convert it to a numpy.ndarray format with the float32 datatype. I was able to convert it ...

Priya Narayanan

1,277

asked Nov 10, 2014 at 18:22

106 votes

14 answers

90k views

sklearn.LabelEncoder with never seen before values

If a sklearn.LabelEncoder has been fitted on a training set, it might break if it encounters new values when used on a test set. The only solution I could come up with for this is to map everything ...

cjauvin

3,623

asked Jan 11, 2014 at 1:54

105 votes

3 answers

42k views

RandomForestClassifier vs ExtraTreesClassifier in scikit learn

Can anyone explain the difference between the RandomForestClassifier and ExtraTreesClassifier in scikit learn. I've spent a good bit of time reading the paper: P. Geurts, D. Ernst., and L. Wehenkel, ...

denson

2,416

asked Mar 14, 2014 at 15:50
