Questions tagged [pandas]
Pandas is a Python library for data manipulation and analysis, e.g. dataframes, multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data science libraries in Python.
287,969 questions
4119 votes, 34 answers, 7.5m views
How can I iterate over rows in a Pandas DataFrame?
I have a pandas dataframe, df:
c1 c2
0 10 100
1 11 110
2 12 120
How do I iterate over the rows of this dataframe? For every row, I want to access its elements (values in cells) by the name ...
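A minimal sketch, rebuilding the c1/c2 frame from the question. `itertuples()` yields one namedtuple per row, so each cell can be read by column name. `iterrows()` also works, but it returns every row as a Series and is slower. When the per-row work can be written as column arithmetic, no loop is needed at all.

```python
import pandas as pd

# The frame from the question
df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

rows = []
for row in df.itertuples():
    # row.Index is the index label; the other fields are named after the columns
    rows.append((row.Index, row.c1, row.c2))
    print(row.Index, row.c1, row.c2)

# Loop-free alternative for a per-row computation
total = df["c1"] + df["c2"]
```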
3541 votes, 18 answers, 6.5m views
How do I select rows from a DataFrame based on column values?
How can I select rows from a DataFrame based on values in some column in Pandas?
In SQL, I would use:
SELECT *
FROM table
WHERE column_name = some_value
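A sketch of the usual pandas equivalent, using a boolean mask. The frame and the `other` column are placeholders.

```python
import pandas as pd

df = pd.DataFrame({"column_name": [1, 2, 2, 3], "other": ["a", "b", "c", "d"]})
some_value = 2

# Equivalent of: SELECT * FROM table WHERE column_name = some_value
selected = df.loc[df["column_name"] == some_value]

# Several conditions combine with & (and) / | (or); each one needs parentheses
both = df.loc[(df["column_name"] >= 2) & (df["other"] != "d")]
```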
2993 votes, 33 answers, 6.6m views
Renaming column names in Pandas
I want to change the column labels of a Pandas DataFrame from
['$a', '$b', '$c', '$d', '$e']
to
['a', 'b', 'c', 'd', 'e']
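Three common ways to do this, sketched on a one-row frame carrying the question's labels:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["$a", "$b", "$c", "$d", "$e"])

# Option 1: assign a complete new list of labels (its length must match)
df1 = df.copy()
df1.columns = ["a", "b", "c", "d", "e"]

# Option 2: rename() with a function applied to every column label
df2 = df.rename(columns=lambda name: name.lstrip("$"))

# Option 3: string methods on the column Index; regex=False treats "$" literally
df3 = df.copy()
df3.columns = df3.columns.str.replace("$", "", regex=False)
```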
2264 votes, 22 answers, 4.3m views
Delete a column from a Pandas DataFrame
To delete a column in a DataFrame, I can successfully use:
del df['column_name']
But why can't I use the following?
del df.column_name
Since it is possible to access the Series via df.column_name, I ...
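In practice, attribute access such as `df.column_name` is best treated as a read-only convenience, with columns added and removed through item syntax or methods. Here is a sketch of the usual removal options on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

del df["a"]                  # in place, item syntax
df = df.drop(columns=["b"])  # returns a new frame without "b"
c = df.pop("c")              # removes "c" in place and returns it as a Series
```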
1948 votes, 19 answers, 4.4m views
How do I get the row count of a Pandas DataFrame?
How do I get the number of rows of a pandas dataframe df?
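A quick sketch. Note that `df.count()` answers a different question: it counts the non-missing values in each column.

```python
import pandas as pd

df = pd.DataFrame({"x": range(5), "y": range(5)})

n_rows = len(df)        # number of rows
n_rows_2 = df.shape[0]  # shape is (rows, columns)
n_cols = df.shape[1]
```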
1761 votes, 22 answers, 4.1m views
Selecting multiple columns in a Pandas dataframe
How do I select columns a and b from df, and save them into a new dataframe df1?
index a b c
1 2 3 4
2 3 4 5
Unsuccessful attempt:
df1 = df['a':'b']
df1 = df.ix[:, 'a':'b']
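Putting a slice directly inside `df[...]` slices rows, not columns, and `.ix` was removed in pandas 1.0. Here is a sketch of both working alternatives on the frame from the question:

```python
import pandas as pd

df = pd.DataFrame({"a": [2, 3], "b": [3, 4], "c": [4, 5]}, index=[1, 2])

# A list of labels inside [] selects those columns; .copy() detaches df1 from df
df1 = df[["a", "b"]].copy()

# Label slicing over columns goes through .loc (both ends are included)
df2 = df.loc[:, "a":"b"]
```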
1647 votes, 43 answers, 2.5m views
How to change the order of DataFrame columns?
I have the following DataFrame (df):
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(10, 5))
I add more column(s) by assignment:
df['mean'] = df.mean(1)
How can I move the ...
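One sketch, assuming the goal is to move `mean` to the front. It builds the new order explicitly and then reindexes the columns with it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5))
df["mean"] = df.mean(axis=1)

# "mean" first, then the original columns in their existing order
cols = ["mean"] + [c for c in df.columns if c != "mean"]
df = df[cols]

# An in-place alternative would be: df.insert(0, "mean", df.pop("mean"))
```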
1540 votes, 17 answers, 4.0m views
Change column type in pandas
I created a DataFrame from a list of lists:
table = [
['a', '1.2', '4.2' ],
['b', '70', '0.03'],
['x', '5', '0' ],
]
df = pd.DataFrame(table)
How do I convert the columns to ...
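A sketch assuming that columns 1 and 2 should become numeric and column 0 should stay text. `pd.to_numeric` parses the strings; passing `errors="coerce"` would turn unparseable values into NaN instead of raising.

```python
import pandas as pd

table = [
    ["a", "1.2", "4.2"],
    ["b", "70", "0.03"],
    ["x", "5", "0"],
]
df = pd.DataFrame(table)

# Convert only the columns that hold numeric strings
for col in [1, 2]:
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
```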
1452 votes, 17 answers, 2.2m views
How to drop rows of Pandas DataFrame whose value in a certain column is NaN
I have this DataFrame and want only the records whose EPS column is not NaN:
STK_ID EPS cash
STK_ID RPT_Date
601166 20111231 601166 NaN NaN
600036 20111231 ...
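A sketch using `dropna` with `subset`, so that only `EPS` decides which rows are removed. The values below are toy data; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Toy rows; only the column names follow the question
df = pd.DataFrame({
    "STK_ID": [601166, 600036, 600016],
    "EPS": [np.nan, 0.5, 1.2],
    "cash": [np.nan, np.nan, 3.0],
})

kept = df.dropna(subset=["EPS"])  # NaN in "cash" alone does not drop a row
kept_2 = df[df["EPS"].notna()]    # the same filter as a boolean mask
```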
1435 votes, 26 answers, 2.4m views
How to deal with SettingWithCopyWarning in Pandas
Background
I just upgraded my Pandas from 0.11 to 0.13.0rc1. Now, the application is popping out many new warnings. One of them looks like this:
E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value ...
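The warning usually points at chained assignment, where a value is written into something that may be a temporary copy. Below is a minimal sketch of the two usual fixes on a toy frame (not the question's code). Newer pandas releases with Copy-on-Write change this behaviour, so the warning itself may not appear there.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Chained form that typically triggers the warning (it may update a temporary copy):
#     df[df["a"] > 1]["b"] = 0
# Fix 1: a single .loc call with row mask and column label writes into df itself
df.loc[df["a"] > 1, "b"] = 0

# Fix 2: if an independent subset is wanted, make the copy explicit
subset = df[df["a"] > 1].copy()
subset["b"] = -1  # changes only subset
```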
1405 votes, 32 answers, 2.3m views
Create a Pandas Dataframe by appending one row at a time [duplicate]
How do I create an empty DataFrame, then add rows, one by one?
I created an empty DataFrame:
df = pd.DataFrame(columns=('lib', 'qty1', 'qty2'))
Then I can add a new row at the end and fill a single ...
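Growing a DataFrame one row at a time copies data on every step, and `DataFrame.append` was removed in pandas 2.0. A common pattern, sketched here with made-up values, is to collect plain dicts in a list and build the frame once at the end. For an occasional single row, `df.loc[len(df)] = ...` works, but only while the index is the default 0..n-1 range.

```python
import pandas as pd

records = []
for i in range(3):
    # Placeholder values standing in for whatever each step produces
    records.append({"lib": f"name{i}", "qty1": i, "qty2": i * 2})

df = pd.DataFrame(records, columns=["lib", "qty1", "qty2"])

# Occasional single append, relying on the default RangeIndex
df.loc[len(df)] = ["name3", 3, 6]
```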
1362 votes, 25 answers, 2.2m views
Get a list from Pandas DataFrame column headers
I want to get a list of the column headers from a Pandas DataFrame. The DataFrame will come from user input, so I won't know how many columns there will be or what they will be called.
For example, ...
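A sketch on a toy frame. The column names are placeholders, and the approach works however many columns the user input produces.

```python
import pandas as pd

df = pd.DataFrame({"y": [1], "gdp": [2], "cap": [3]})  # toy frame

headers = df.columns.tolist()  # list(df.columns) gives the same result
```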
1356 votes, 8 answers, 1.7m views
Use a list of values to select rows from a Pandas dataframe
Let’s say I have the following Pandas dataframe:
df = DataFrame({'A': [5,6,3,4], 'B': [1,2,3,5]})
df
A B
0 5 1
1 6 2
2 3 3
3 4 5
I can subset based on a specific value:
x =...
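`Series.isin` takes the list of values directly, and negating it with `~` gives the complement. Here is a sketch on the frame from the question, where the list `[3, 6]` is an arbitrary choice:

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 6, 3, 4], "B": [1, 2, 3, 5]})
wanted = [3, 6]

rows_in = df[df["A"].isin(wanted)]       # like SQL IN
rows_not_in = df[~df["A"].isin(wanted)]  # like SQL NOT IN
```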
1324 votes, 33 answers, 2.7m views
How to add a new column to an existing DataFrame
I have the following indexed DataFrame with named columns and non-consecutive row numbers:
a b c d
2 0.671399 0.101208 -0.181532 0.241273
3 0.446172 -0.243316 0....
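With a non-consecutive index like this one, it matters whether the new values are a plain list (placed by position) or a Series (aligned by index label). Here is a sketch on a random toy frame:

```python
import numpy as np
import pandas as pd

# Toy frame with a non-consecutive index, like the one in the question
df = pd.DataFrame(np.random.randn(3, 4), columns=list("abcd"), index=[2, 3, 5])

# A list or array is placed by position (its length must equal the row count)
df["e"] = [1.0, 2.0, 3.0]

# A Series is aligned on index labels; labels it lacks become NaN
df["f"] = pd.Series([10.0, 30.0], index=[2, 5])

# assign() returns a new frame, which suits method chains
df2 = df.assign(g=df["e"] * 2)
```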
1213 votes, 14 answers, 1.5m views
Pretty-print an entire Pandas Series / DataFrame
I work with Series and DataFrames on the terminal a lot. The default __repr__ for a Series returns a reduced sample, with some head and tail values, but the rest missing.
Is there a builtin way to ...
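A sketch of two options. `pd.option_context` lifts the display limits only inside a `with` block, while `to_string()` renders the complete text regardless of those limits.

```python
import pandas as pd

s = pd.Series(range(200))

# Lift the row/column limits only inside this block
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    print(s)

# Or build the full text yourself, e.g. to write it to a file
full_text = s.to_string()
```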
1191 votes, 8 answers, 951k views
Convert list of dictionaries to a pandas DataFrame
How can I convert a list of dictionaries into a DataFrame?
I want to turn
[{'points': 50, 'time': '5:00', 'year': 2010},
{'points': 25, 'time': '6:00', 'month': "february"},
{'points':90,...
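Passing the list straight to the constructor is usually enough. Here is a sketch using only the two records shown in full above (the third is cut off):

```python
import pandas as pd

# The first two records from the question
data = [
    {"points": 50, "time": "5:00", "year": 2010},
    {"points": 25, "time": "6:00", "month": "february"},
]

# Keys become columns; a key missing from a record becomes NaN in that row
df = pd.DataFrame(data)
print(df)
```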
1184 votes, 16 answers, 354k views
"Large data" workflows using pandas [closed]
I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work and it is great for its out-of-core support. However, SAS is horrible ...
1129 votes, 10 answers, 2.6m views
Writing a pandas DataFrame to CSV file
I have a dataframe in pandas which I would like to write to a CSV file.
I am doing this using:
df.to_csv('out.csv')
And getting the following error:
UnicodeEncodeError: 'ascii' codec can't encode ...
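The `'ascii' codec` in the message suggests that an ASCII default was in effect, so naming the codec explicitly is the usual fix. The sketch below writes non-ASCII text to a temporary file and reads it back. `tempfile` is only there to keep the example self-contained.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["café", "naïve"], "n": [1, 2]})

path = os.path.join(tempfile.mkdtemp(), "out.csv")
# State the codec explicitly instead of relying on a default
df.to_csv(path, index=False, encoding="utf-8")

back = pd.read_csv(path, encoding="utf-8")
```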
1034 votes, 20 answers, 2.0m views
Deleting DataFrame row in Pandas based on column value
I have the following DataFrame:
daysago line_race rating rw wrating
line_date
2007-03-31 62 11 56 1.000000 ...
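Deleting rows is usually written as keeping the complement. In this sketch the values and the condition `line_race == 0` are hypothetical; only the column names follow the question.

```python
import pandas as pd

# Toy values; only the column names follow the question
df = pd.DataFrame({"daysago": [62, 83, 99], "line_race": [11, 0, 8], "rating": [56, 67, 56]})

# Keep every row whose line_race is not 0, i.e. delete the rows where it is 0
df = df[df["line_race"] != 0]
```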
1016 votes, 23 answers, 1.7m views
How do I expand the output display to see more columns of a Pandas DataFrame?
Is there a way to widen the display of output in either interactive or script-execution mode?
Specifically, I am using the describe() function on a Pandas DataFrame. When the DataFrame is five ...
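Output from `describe()` gets wrapped or elided when it exceeds pandas' display limits. This sketch raises those limits for the whole session; the width of 200 characters is an arbitrary choice.

```python
import numpy as np
import pandas as pd

pd.set_option("display.max_columns", None)  # never elide columns
pd.set_option("display.width", 200)         # characters per line before wrapping

df = pd.DataFrame(np.random.rand(5, 12))
print(df.describe())
```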
977 votes, 22 answers, 1.9m views
Combine two columns of text in pandas dataframe
I have a dataframe that looks like
Year quarter
2000 q2
2001 q3
How do I add a new column by combining these columns to get the following dataframe?
Year quarter period
2000 q2 ...
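A sketch assuming the desired `period` is the two values joined with no separator (the expected output is cut off in the excerpt). `Year` is numeric, so it has to be converted to text first.

```python
import pandas as pd

df = pd.DataFrame({"Year": [2000, 2001], "quarter": ["q2", "q3"]})

# Convert the numeric column to str, then concatenate element-wise
df["period"] = df["Year"].astype(str) + df["quarter"]
```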
974 votes, 7 answers, 823k views
How are iloc and loc different?
Can someone explain how these two methods of slicing are different? I've seen the docs
and I've seen previous similar questions (1, 2), but I still find myself unable to understand how they are ...
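The short version is that `loc` works with index labels (and includes the end of a slice), while `iloc` works with integer positions (and excludes the end). Integer labels that differ from positions make the contrast visible in this sketch:

```python
import pandas as pd

df = pd.DataFrame({"v": ["w", "x", "y", "z"]}, index=[1, 2, 3, 4])

by_label = df.loc[1:3, "v"]    # labels 1..3, end label included -> w, x, y
by_position = df.iloc[1:3, 0]  # positions 1 and 2, end excluded -> x, y

first_by_label = df.loc[1, "v"]    # the row labelled 1 -> "w"
first_by_position = df.iloc[1, 0]  # the second row -> "x"
```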
931 votes, 8 answers, 436k views
Pandas Merging 101
How can I perform a (INNER| (LEFT|RIGHT|FULL) OUTER) JOIN with pandas?
How do I add NaNs for missing rows after a merge?
How do I get rid of NaNs after merging?
Can I merge on the index?
How do I ...
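A compact sketch covering the join types, filling NaNs after a join, and merging on the index. The two toy frames share only the keys B and C.

```python
import pandas as pd

left = pd.DataFrame({"key": ["A", "B", "C"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["B", "C", "D"], "rval": [20, 30, 40]})

inner = left.merge(right, on="key", how="inner")     # keys in both: B, C
left_join = left.merge(right, on="key", how="left")  # all left keys; rval is NaN for A
outer = left.merge(right, on="key", how="outer")     # keys from either side: A, B, C, D

# Fill the NaNs introduced by the join, per column
filled = left_join.fillna({"rval": 0})

# Merging on the index: left_index/right_index instead of on=
by_index = left.set_index("key").merge(
    right.set_index("key"), left_index=True, right_index=True
)
```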
900 votes, 18 answers, 1.5m views
Filter pandas DataFrame by substring criteria
I have a pandas DataFrame with a column of string values. I need to select rows based on partial string matches.
Something like this idiom:
re.search(pattern, cell_in_question)
returning a boolean. ...
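`Series.str.contains` applies `re.search` semantics to each cell and returns the boolean mask directly. Here is a sketch on toy strings:

```python
import pandas as pd

df = pd.DataFrame({"s": ["apple pie", "banana", "pineapple", None]})

# Regex search per cell; na=False treats missing values as non-matches
mask = df["s"].str.contains("apple", na=False)
hits = df[mask]

# regex=False matches the text literally (useful when it contains ., *, etc.)
literal = df[df["s"].str.contains("pie", regex=False, na=False)]
```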
886 votes, 8 answers, 2.4m views
Creating an empty Pandas DataFrame, and then filling it
I'm starting from the pandas DataFrame documentation here: Introduction to data structures
I'd like to iteratively fill the DataFrame with values in a time series kind of calculation. I'd like to ...
874 votes, 12 answers, 1.3m views
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
How can I achieve the equivalents of SQL's IN and NOT IN?
I have a list with the required values. Here's the scenario:
df = pd.DataFrame({'country': ['US', 'UK', 'Germany', 'China']})
...
869 votes, 15 answers, 912k views
Shuffle DataFrame rows
I have the following DataFrame:
Col1 Col2 Col3 Type
0 1 2 3 1
1 4 5 6 1
...
20 7 8 9 2
21 10 11 12 2
...
45 13 14 15 ...
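`sample(frac=1)` draws every row exactly once, in random order. Here is a sketch on a shortened toy version of the frame, where `random_state=0` only serves to make the result repeatable:

```python
import pandas as pd

df = pd.DataFrame({"Col1": [1, 4, 7, 10], "Type": [1, 1, 2, 2]})

# Shuffle all rows, then renumber the index 0..n-1
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
```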
866 votes, 15 answers, 2.5m views
Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
I want to filter my dataframe with an or condition to keep rows with a particular column's values that are outside the range [-0.25, 0.25]. I tried:
df = df[(df['col'] < -0.25) or (df['col'] > 0....
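Python's `or` asks each Series for a single True/False, which is what raises the error. The element-wise operators `|` and `&` avoid that. Here is a sketch on toy values, using the [-0.25, 0.25] range from the question:

```python
import pandas as pd

df = pd.DataFrame({"col": [-0.5, -0.1, 0.0, 0.3]})

# | works element-wise; the parentheses are required because | binds tighter than < and >
outside = df[(df["col"] < -0.25) | (df["col"] > 0.25)]

# Same filter: negate an inclusive between()
outside_2 = df[~df["col"].between(-0.25, 0.25)]
```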
840 votes, 10 answers, 1.3m views
How to convert index of a pandas dataframe into a column
How to convert an index of a dataframe into a column?
For example:
gi ptt_loc
0 384444683 593
1 384444684 594
2 384444686 596
to
index1 gi ptt_loc
...
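`reset_index()` moves the index into an ordinary column called `index`, which can then be renamed to the `index1` shown in the question. A sketch:

```python
import pandas as pd

df = pd.DataFrame({"gi": [384444683, 384444684, 384444686], "ptt_loc": [593, 594, 596]})

# The old index becomes the first column; a fresh 0..n-1 index replaces it
out = df.reset_index().rename(columns={"index": "index1"})
```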
833 votes, 24 answers, 1.5m views
Constructing DataFrame from values in variables yields "ValueError: If using all scalar values, you must pass an index"
I have two variables as follows.
a = 2
b = 3
I want to construct a DataFrame from this:
df2 = pd.DataFrame({'A':a, 'B':b})
This generates an error:
ValueError: If using all scalar values, you must ...
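When every value is a scalar, pandas cannot tell how many rows to create. Either wrap the scalars in lists or pass an index, as in this sketch:

```python
import pandas as pd

a = 2
b = 3

# Wrap the scalars in lists: one row
df2 = pd.DataFrame({"A": [a], "B": [b]})

# Or keep them scalar and supply the index explicitly
df3 = pd.DataFrame({"A": a, "B": b}, index=[0])
```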
808 votes, 32 answers, 1.5m views
How do I count the NaN values in a column in pandas DataFrame?
I want to find the number of NaN in each column of my data.
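`isna()` gives a boolean frame, and summing it counts each True as 1. Here is a sketch on toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

nan_counts = df.isna().sum()       # NaNs per column
one_column = df["b"].isna().sum()  # NaNs in a single column
```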
804 votes, 12 answers, 1.7m views
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
I have a dataframe df and I use several columns from it to groupby:
df[['col1','col2','col3','col4']].groupby(['col1','col2']).mean()
In the above way, I almost get the table (dataframe) that I need. ...
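A sketch on a toy frame. Passing a list of function names to `agg()` produces several statistics per column. `size()` gives the number of rows in each group, including rows with NaN, whereas `count` skips them.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": ["x", "x", "y"],
    "col3": [1.0, 3.0, 5.0],
    "col4": [2.0, None, 6.0],
})

grouped = df.groupby(["col1", "col2"])
stats = grouped[["col3", "col4"]].agg(["count", "mean"])  # MultiIndex columns
sizes = grouped.size()                                     # rows per group, NaNs included
```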
795 votes, 25 answers, 1.8m views
Set value for particular cell in pandas DataFrame using index
I have created a Pandas DataFrame
df = DataFrame(index=['A','B','C'], columns=['x','y'])
Now, I would like to assign a value to particular cell, for example to row C and column x. In other words, I ...
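Use a single indexing call with both row and column labels. A chained form like `df["x"]["C"] = 10` may write into a temporary object instead of `df`. A sketch:

```python
import pandas as pd

df = pd.DataFrame(index=["A", "B", "C"], columns=["x", "y"])

df.loc["C", "x"] = 10  # label-based; a single call, so it writes into df
df.at["C", "y"] = 20   # scalar-only accessor, faster for single cells
```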
768 votes, 20 answers, 986k views
Import multiple CSV files into pandas and concatenate into one DataFrame
I would like to read several CSV files from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out though. Here is what I have so far:
import glob
...
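A sketch of the usual `glob` plus `pd.concat` pattern. The folder and the two CSV files are placeholders created on the spot so that the example runs by itself.

```python
import glob
import os
import tempfile

import pandas as pd

# Create two small placeholder files
folder = tempfile.mkdtemp()
for i in range(2):
    part = pd.DataFrame({"file": [i, i], "value": [1, 2]})
    part.to_csv(os.path.join(folder, f"part{i}.csv"), index=False)

# Read every CSV in the folder and stack them into one frame
files = sorted(glob.glob(os.path.join(folder, "*.csv")))
combined = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
```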
750 votes, 16 answers, 1.2m views
How to apply a function to two columns of Pandas dataframe
Suppose I have a function and a dataframe defined as below:
def get_sublist(sta, end):
    return mylist[sta:end+1]
df = pd.DataFrame({'ID':['1','2','3'], 'col_1': [0,2,3], 'col_2':[1,4,5]})
mylist = ...
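Here is a sketch of two ways to feed two columns into the function. Because the question's `mylist` is cut off, the list below is a hypothetical stand-in.

```python
import pandas as pd

# Hypothetical list: the question's mylist is elided
mylist = ["a", "b", "c", "d", "e", "f"]

def get_sublist(sta, end):
    return mylist[sta:end + 1]

df = pd.DataFrame({"ID": ["1", "2", "3"], "col_1": [0, 2, 3], "col_2": [1, 4, 5]})

# Row-wise apply: each row arrives as a Series indexed by column name
df["col_3"] = df.apply(lambda row: get_sublist(row["col_1"], row["col_2"]), axis=1)

# Often simpler and faster: zip the two columns in a comprehension
df["col_3b"] = [get_sublist(s, e) for s, e in zip(df["col_1"], df["col_2"])]
```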
750 votes, 6 answers, 840k views
How to avoid pandas creating an index in a saved csv
I am trying to save a csv to a folder after making some edits to the file.
Every time I use df.to_csv('C:/Path of file.csv') the csv file has a separate column of indexes. I want to avoid printing ...
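`to_csv` is a method on the DataFrame, and `index=False` leaves the index column out. Called without a path, it returns the CSV text, which makes the difference easy to see in this sketch. Writing to a file path behaves the same way.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

with_index = df.to_csv()               # first field of each line is the index
without_index = df.to_csv(index=False)
print(without_index)
```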
732 votes, 12 answers, 536k views
Difference between map, applymap and apply methods in Pandas
Can you tell me when to use these vectorization methods with basic examples?
I see that map is a Series method whereas the rest are DataFrame methods. I got confused about apply and applymap methods ...
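A rough sketch of what each method receives. `Series.map` works element by element, `DataFrame.apply` works on whole columns (or rows with `axis=1`), and `applymap` works element by element over a whole frame. In pandas 2.1, `applymap` was renamed `DataFrame.map`, hence the version check below.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
doubled = s.map(lambda v: v * 2)  # Series.map: one element at a time

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
col_sums = df.apply(lambda col: col.sum())          # one column (a Series) at a time
row_sums = df.apply(lambda row: row.sum(), axis=1)  # axis=1: one row at a time

# Element-wise over a whole DataFrame (DataFrame.map exists from pandas 2.1)
as_text = df.map(str) if hasattr(pd.DataFrame, "map") else df.applymap(str)
```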
729 votes, 19 answers, 2.6m views
How can I get a value from a cell of a dataframe?
I have constructed a condition that extracts exactly one row from my dataframe:
d2 = df[(df['l_ext']==l_ext) & (df['item']==item) & (df['wn']==wn) & (df['wd']==1)]
Now I would like to ...
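A filter like `d2` is still a DataFrame, even when it holds just one row. Here is a sketch of three ways to get the scalar out, using toy columns:

```python
import pandas as pd

df = pd.DataFrame({"item": ["x", "y", "z"], "price": [1.5, 2.5, 3.5]})
d2 = df[df["item"] == "y"]  # a one-row DataFrame

value = d2["price"].iloc[0]   # first element by position, whatever its label
value_2 = d2["price"].item()  # raises ValueError unless there is exactly one element
value_3 = d2.iat[0, 1]        # scalar by (row, column) position
```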
728 votes, 30 answers, 1.5m views
How to check if any value is NaN in a Pandas DataFrame
How do I check whether a pandas DataFrame has NaN values?
I know about pd.isna, but it returns a DataFrame of booleans. I also found this post but it doesn't exactly answer my question either.
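The boolean frame from `isna()` can be reduced to a single answer or to one answer per column. A sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [np.nan, 4.0]})

has_nan = df.isna().values.any()    # one answer for the whole frame
per_column = df.isna().any()        # one answer per column
total = int(df.isna().sum().sum())  # how many NaNs in total
```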
713 votes, 27 answers, 1.1m views
UnicodeDecodeError when reading CSV file in Pandas
I'm running a program which is processing 30,000 similar files. A random number of them are stopping and producing this error...
File "C:\Importer\src\dfman\importer.py", line 26, in ...
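The usual fix is to tell `read_csv` which codec the files were actually written in. Which one that is depends on how the files were produced, so Latin-1 is purely an assumption in this sketch. Since pandas 1.3, `encoding_errors="replace"` can also keep a batch job running, at the cost of replaced characters.

```python
import os
import tempfile

import pandas as pd

# A placeholder file saved as Latin-1, which is not valid UTF-8
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "wb") as fh:
    fh.write("name,n\ncafé,1\n".encode("latin-1"))

# Reading with the file's real codec avoids UnicodeDecodeError
df = pd.read_csv(path, encoding="latin-1")
```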
707 votes, 16 answers, 1.7m views
Convert pandas dataframe to NumPy array
How do I convert a pandas dataframe into a NumPy array?
DataFrame:
import numpy as np
import pandas as pd
index = [1, 2, 3, 4, 5, 6, 7]
a = [np.nan, np.nan, np.nan, 0.1, 0.1, 0.1, 0.1]
b = [0.2, np....
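`DataFrame.to_numpy()` (available since pandas 0.24) returns the values and drops the index. This sketch builds only column `A`, because the other columns in the question are cut off.

```python
import numpy as np
import pandas as pd

index = [1, 2, 3, 4, 5, 6, 7]
a = [np.nan, np.nan, np.nan, 0.1, 0.1, 0.1, 0.1]
df = pd.DataFrame({"A": a}, index=index)

arr = df.to_numpy()                       # values only
with_index = df.reset_index().to_numpy()  # keep the index as the first column
```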
699 votes, 50 answers, 2.1m views
pandas.parser.CParserError: Error tokenizing data
I'm trying to use pandas to manipulate a .csv file but I get this error:
pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12
I have tried to read the ...
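This error means a line has more fields than the parser expects from the earlier lines. Common causes are a preamble, a different delimiter, or stray commas, so checking `sep` and `skiprows` comes first. As a last resort, pandas 1.3+ can skip malformed lines with `on_bad_lines="skip"`, but that silently drops data. The sketch uses inline placeholder text. In current pandas the exception is `pandas.errors.ParserError`.

```python
import io

import pandas as pd

# Placeholder CSV: line 3 has four fields while the header has two
text = "a,b\n1,2\n3,4,5,6\n7,8\n"

try:
    pd.read_csv(io.StringIO(text))
except pd.errors.ParserError as exc:
    print("parse failed:", exc)

# Drop malformed lines; check what gets lost before relying on this
df = pd.read_csv(io.StringIO(text), on_bad_lines="skip")
```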
684 votes, 6 answers, 986k views
How do I check if a pandas DataFrame is empty?
How do I check if a pandas DataFrame is empty? I'd like to print some message in the terminal if the DataFrame is empty.
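`DataFrame.empty` is True when either axis has length zero. A frame that holds only NaN values is not empty, which this sketch also shows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=["a", "b"])
if df.empty:
    print("DataFrame is empty")

# Only-NaN is not empty; drop the NaNs first if that case should count
all_nan = pd.DataFrame({"a": [np.nan]})
```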
684 votes, 13 answers, 1.1m views
Converting a Pandas GroupBy multiindex output from Series back to DataFrame
I have a dataframe:
City Name
0 Seattle Alice
1 Seattle Bob
2 Portland Mallory
3 Seattle Mallory
4 Seattle Bob
5 Portland Mallory
I perform the following grouping:
g1 ...
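The grouping in the question is cut off. This sketch assumes a count per (Name, City) pair: `size()` returns a Series with a MultiIndex, and `reset_index(name=...)` turns it back into a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
})

counts = df.groupby(["Name", "City"]).size()   # Series with a (Name, City) MultiIndex
counts_df = counts.reset_index(name="count")   # index levels become columns again
```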
683 votes, 6 answers, 1.7m views
How to delete rows from a pandas DataFrame based on a conditional expression [duplicate]
I have a pandas DataFrame and I want to delete rows from it where the length of the string in a particular column is greater than 2.
I expect to be able to do this (per this answer):
df[(len(df['...
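`len(df['col'])` returns the number of rows in the column, not a length per row. The `.str.len()` accessor gives the per-row string lengths. Here is a sketch with a hypothetical `code` column:

```python
import pandas as pd

df = pd.DataFrame({"code": ["ab", "abc", "x", "wxyz"]})

# Keep rows whose string length is at most 2, i.e. delete the longer ones
df = df[df["code"].str.len() <= 2]
```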
676 votes, 15 answers, 1.3m views
How to sort pandas dataframe by one column
I have a dataframe like this:
0 1 2
0 354.7 April 4.0
1 55.4 August 8.0
2 176.5 December 12.0
3 95.5 February 2.0
4 85.6 January 1.0
5 ...
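`sort_values(by=...)` takes a column label, which here is the integer `2`. The sketch uses the five rows shown in the excerpt.

```python
import pandas as pd

# The five rows shown in the question
df = pd.DataFrame({
    0: [354.7, 55.4, 176.5, 95.5, 85.6],
    1: ["April", "August", "December", "February", "January"],
    2: [4.0, 8.0, 12.0, 2.0, 1.0],
})

by_month = df.sort_values(by=2)                   # ascending month number
descending = df.sort_values(by=2, ascending=False)
```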
659 votes, 16 answers, 1.3m views
How to replace NaN values in a dataframe column
I have a Pandas Dataframe as below:
itm Date Amount
67 420 2012-09-30 00:00:00 65211
68 421 2012-09-09 00:00:00 29424
69 421 2012-09-16 00:00:00 29877
70 421 ...
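A sketch with toy values; only the column names follow the question. It assigns the filled column back rather than relying on `inplace=True`. To fill several columns with different values at once, `df.fillna({"colA": 0, "colB": ""})` takes a dict.

```python
import numpy as np
import pandas as pd

# Toy values; only the column names follow the question
df = pd.DataFrame({"itm": [420, 421, 421], "Amount": [65211.0, np.nan, 29877.0]})

# Replace NaN in one column with 0 and store the result
df["Amount"] = df["Amount"].fillna(0)
```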
632 votes, 6 answers, 689k views
How to check if a column exists in Pandas
How do I check if a column exists in a Pandas DataFrame df?
A B C
0 3 40 100
1 6 30 200
How would I check if the column "A" exists in the above DataFrame so that I can compute:...
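A membership test against `df.columns` does this directly. The computation in the question is cut off, so the one below is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 6], "B": [40, 30], "C": [100, 200]})

if "A" in df.columns:      # membership test on the column labels
    df["D"] = df["A"] * 2  # hypothetical computation

has_all = {"A", "C"}.issubset(df.columns)  # check several columns at once
```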
631 votes, 8 answers, 1.3m views
Create new column based on values from other columns / apply a function of multiple columns, row-wise in Pandas
I want to apply my custom function (it uses an if-else ladder) to these six columns (ERI_Hispanic, ERI_AmerInd_AKNatv, ERI_Asian, ERI_Black_Afr.Amer, ERI_HI_PacIsl, ERI_White) in each row of my ...
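A sketch of the row-wise `apply` pattern. The question's actual if-else ladder is not shown, so `label_race` and its rules are hypothetical. The toy frame has only two of the six columns.

```python
import pandas as pd

# Two of the six indicator columns, with made-up 0/1 values
df = pd.DataFrame({"ERI_Hispanic": [1, 0, 0], "ERI_White": [0, 1, 0]})

def label_race(row):
    # Hypothetical if-else ladder standing in for the question's rules
    if row["ERI_Hispanic"] == 1:
        return "Hispanic"
    if row["ERI_White"] == 1:
        return "White"
    return "Other"

# axis=1 passes one row at a time, so the function can read several columns
df["race_label"] = df.apply(label_race, axis=1)
```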
628 votes, 5 answers, 75k views
How can I pivot a dataframe? [closed]
What is pivot?
How do I pivot?
Long format to wide format?
I've seen a lot of questions that ask about pivot tables, even if they don't know it. It is virtually impossible to write a canonical ...
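A minimal long-to-wide sketch on toy data. `pivot()` requires each (index, columns) pair to be unique; `pivot_table()` aggregates duplicates instead of raising.

```python
import pandas as pd

long = pd.DataFrame({
    "row": ["r1", "r1", "r2", "r2"],
    "col": ["c1", "c2", "c1", "c2"],
    "val": [1, 2, 3, 4],
})

wide = long.pivot(index="row", columns="col", values="val")
wide_mean = long.pivot_table(index="row", columns="col", values="val", aggfunc="mean")
```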