pandas Cheat Sheet
pandas
concat
pandas.concat(objs, *, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=None)
The `concat` method concatenates pandas objects along a particular axis.
parameter | input | description |
---|---|---|
objs | a sequence or mapping of Series/DataFrame objects | |
join | {`inner`, `outer`} | |
verify_integrity | bool, default False | check whether the new concatenated axis contains duplicates (expensive) |
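
A minimal sketch of how `join` and `verify_integrity` behave. The frames `a` and `b` are made-up illustration data, not part of the original sheet:

```python
import pandas as pd

a = pd.DataFrame({"id": [1, 2], "x": [10, 20]})
b = pd.DataFrame({"id": [3, 4], "y": [30, 40]})

# join='outer' (the default) keeps the union of columns; missing cells become NaN.
outer = pd.concat([a, b], ignore_index=True)

# join='inner' keeps only the columns shared by every input.
inner = pd.concat([a, b], join="inner", ignore_index=True)

# Both frames use the index labels 0 and 1, so the concatenated index has
# duplicates and verify_integrity=True raises ValueError.
try:
    pd.concat([a, b], verify_integrity=True)
    duplicates_detected = False
except ValueError:
    duplicates_detected = True
```

Passing `ignore_index=True` renumbers the result 0..n-1, which is usually the simplest way to avoid duplicate labels in the first place.
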
to_datetime
pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=False, format=None, exact=no_default, unit=None, infer_datetime_format=no_default, origin='unix', cache=True)
The `to_datetime` method converts a scalar, array-like, Series, or DataFrame to a pandas datetime object.
parameter | input | description |
---|---|---|
arg | int, float, datetime, list, tuple, etc. | object to convert to a datetime |
errors | {`ignore`, `raise`, `coerce`} | `raise` raises an exception, `coerce` sets invalid values to `NaT`, `ignore` returns the input |
dayfirst | bool, default False | parses with the day first |
yearfirst | bool, default False | parses with the year first |
format | str, default None | strftime format, e.g. `'%d/%m/%Y'`; `'ISO8601'` for ISO 8601 strings; `'mixed'` to infer the format of each element (should go with `dayfirst`) |
unit | str, default 'ns' | |
infer_datetime_format | bool, default False | if True and no `format` is given, infer the format of the datetime strings from the first non-NaN element (deprecated since pandas 2.0) |
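
The following sketch illustrates `dayfirst`, `format`, `errors`, and `unit`. The input strings and the epoch value are arbitrary examples chosen for illustration:

```python
import pandas as pd

# dayfirst=True reads "10/11/2023" as 10 November rather than 11 October.
d = pd.to_datetime("10/11/2023", dayfirst=True)

# An explicit format removes the ambiguity altogether.
s = pd.to_datetime(pd.Series(["25/12/2023", "01/02/2024"]), format="%d/%m/%Y")

# errors='coerce' turns unparseable values into NaT instead of raising.
bad = pd.to_datetime(pd.Series(["2023-01-15", "not a date"]), errors="coerce")

# unit tells pandas how to interpret numeric input
# (here: seconds since the Unix epoch).
epoch = pd.to_datetime(1_700_000_000, unit="s")
```
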
merge
pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x','_y'), copy=None, indicator=False, validate=None)
The `merge` method merges DataFrame or named Series objects with a database-style join.
parameter | input | description |
---|---|---|
left | DataFrame or Series | |
right | DataFrame or Series | |
how | {`left`, `right`, `outer`, `inner`, `cross`} | |
on | label or list | column or index level names to join on |
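
As a rough illustration of the `how` options, the sketch below joins two small made-up frames (`left` and `right` are hypothetical data) on a shared `key` column:

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "rval": [20, 30, 40]})

# inner (default): only keys present in both frames.
inner = pd.merge(left, right, on="key")

# left: every key from the left frame; unmatched rows get NaN on the right side.
left_join = pd.merge(left, right, how="left", on="key")

# outer: the union of keys from both frames.
outer = pd.merge(left, right, how="outer", on="key")

# cross: Cartesian product of the rows; no `on` is given, and the overlapping
# column name receives the default suffixes '_x' and '_y'.
cross = pd.merge(left, right, how="cross")
```
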
pandas.Series
apply
pandas.Series.apply(func, convert_dtype=no_default, args=(), *, by_row='compat', **kwargs)
The `apply` method invokes a function on the values of a Series.
map
pandas.Series.map(arg, na_action=None)
The `map` method maps the values of a Series according to an input mapping or function.
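
A short sketch contrasting `apply` and `map`. The Series contents and the `offset` argument name are invented for the example:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# apply calls the function once per value.
squared = s.apply(lambda v: v ** 2)

# Extra positional arguments are forwarded through `args`.
shifted = s.apply(lambda v, offset: v + offset, args=(10,))

animals = pd.Series(["cat", "dog", None])

# map with a dict looks each value up; values without an entry become NaN.
sounds = animals.map({"cat": "meow", "dog": "woof"})

# na_action='ignore' skips missing values instead of passing them to the function.
upper = animals.map(str.upper, na_action="ignore")
```
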
pandas.DataFrame
describe
pandas.DataFrame.describe(percentiles=None, include=None, exclude=None)
parameter | input |
---|---|
percentiles | list-like of numbers, default [.25, .5, .75] |
include | `'all'`, list-like of dtypes, or None |
exclude | list-like of dtypes or None |
Consider a `df` that has not only integer but also object dtype columns: `df.describe(include=['object'])` returns statistics for the object columns only. To summarize the object columns together with the numeric ones, use `df.describe(include='all')`.
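
A small sketch of how `include` and `percentiles` change the output. The frame is hypothetical, and the `city` column is given `dtype=object` explicitly so the example does not depend on how a particular pandas version infers string columns:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47],
    "city": pd.Series(["Paris", "Rome", "Paris"], dtype=object),
})

# Default: numeric columns only.
numeric = df.describe()

# Object columns only: count, unique, top, freq.
objects = df.describe(include=["object"])

# Every column; statistics that do not apply to a column are NaN.
everything = df.describe(include="all")

# Custom percentiles replace the default quartiles.
custom = df.describe(percentiles=[0.1, 0.9])
```
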
reindex
pandas.DataFrame.reindex(labels=None, *, index=None, columns=None, axis=None, method=None, copy=None, level=None, fill_value=None, limit=None, tolerance=None)
parameter | input | description |
---|---|---|
labels | array-like | |
index | array-like | |
columns | array-like | |
axis | int or str | can be either the axis name (`index`, `columns`) or number (0, 1) |
method | {`None`, `bfill`, `ffill`, `nearest`} | method to use for filling holes in the reindexed DataFrame |
copy | bool | return a new object |
level | int or name | |
fill_value | scalar, default np.nan | value to use for missing values |
limit | int, default None | max number of consecutive elements to ffill or bfill |
# setting index to 'name' column
df = df.set_index('name')
# resetting the index in place (modifies df itself)
df.reset_index(inplace=True)
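
To make the `reindex` parameters concrete, here is a minimal sketch using a made-up `price` frame whose index skips the labels 1 and 3:

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 12.0, 15.0]}, index=[0, 2, 4])
new_index = [0, 1, 2, 3, 4]

# Labels missing from the original index get NaN by default.
gaps = df.reindex(new_index)

# fill_value replaces the default NaN for the new labels.
filled = df.reindex(new_index, fill_value=0.0)

# method='ffill' carries the last valid observation forward
# (this requires a monotonic index).
ffilled = df.reindex(new_index, method="ffill")

# Reindexing the columns adds 'qty' as an all-NaN column.
cols = df.reindex(columns=["price", "qty"])
```
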
duplicated
The `duplicated` method returns a boolean Series denoting duplicate rows.
pandas.DataFrame.duplicated(subset=None, keep='first')
parameter | input | description |
---|---|---|
subset | column label or sequence of labels | only consider certain columns for the duplicate check |
keep | {`first`, `last`, `False`} | mark duplicates as True except for the first or last occurrence. `False` marks all duplicates as True |
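
The effect of `keep` and `subset` can be sketched on a small hypothetical frame in which rows 0 and 2 are identical:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Ann"],
    "team": ["x", "y", "x", "z"],
})

# keep='first' (default): the first occurrence is not marked.
first = df.duplicated()

# keep='last': the last occurrence is not marked.
last = df.duplicated(keep="last")

# keep=False: every member of a duplicate group is marked.
all_marked = df.duplicated(keep=False)

# subset restricts the comparison to the given column(s).
by_name = df.duplicated(subset="name")
```
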
drop, drop_duplicates, dropna
pandas.DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
parameter | input | description |
---|---|---|
labels | single label or list-like | index or column labels to drop |
axis | 0 or `index`, 1 or `columns` | |
index, columns | single label or list-like | |
inplace | bool, default False | |
errors | {`ignore`, `raise`} | `ignore` suppresses the error and only existing labels are dropped |
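
A brief sketch of the `drop` parameters, using an invented three-column frame:

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},
    index=["x", "y", "z"],
)

# Two equivalent ways to drop a column.
no_b = df.drop(columns="b")
no_b_axis = df.drop("b", axis=1)

# Drop a row by its index label.
no_x = df.drop(index="x")

# errors='ignore' silently skips labels that do not exist.
safe = df.drop(columns=["c", "missing"], errors="ignore")
```

Because `inplace` defaults to False, each call returns a new DataFrame and `df` itself keeps all three columns.
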
pandas.DataFrame.drop_duplicates(subset=None, *, keep='first', inplace=False, ignore_index=False)
parameter | input | description |
---|---|---|
subset | column label or sequence of labels | only consider certain columns for identifying duplicates; by default all columns are used |
keep | {`first`, `last`, `False`} | drop duplicates except for the first or last occurrence. `False` drops all duplicates |
ignore_index | bool, default False | if True, the resulting axis will be labeled 0, 1, ..., n-1 |
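
A short sketch of `drop_duplicates` on hypothetical data in which rows 0 and 2 are identical:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Ann"],
    "team": ["x", "y", "x", "z"],
})

# Default: drop later copies of fully identical rows, keeping original labels.
unique_rows = df.drop_duplicates()

# ignore_index=True relabels the result 0..n-1.
renumbered = df.drop_duplicates(ignore_index=True)

# Keep only the last row for each name.
last_per_name = df.drop_duplicates(subset="name", keep="last")

# keep=False removes every row whose name occurs more than once.
only_unique = df.drop_duplicates(subset="name", keep=False)
```
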
pandas.DataFrame.dropna(*, axis=0, how=no_default, thresh=no_default, subset=None, inplace=False, ignore_index=False)
parameter | input | description |
---|---|---|
how | {'any', 'all'}, default 'any' | `any`: drop the row or column if any NA value is present. `all`: drop it only if all values are NA |
thresh | int | short for threshold. Require that many non-NA values to keep the row or column |
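
The difference between `how`, `thresh`, and `subset` is easiest to see on a frame with a decreasing number of valid values per row. The data below is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan, np.nan],
    "c": [3.0, 6.0, 9.0, np.nan],
})

# how='any' (default): drop rows containing at least one NA.
any_na = df.dropna()

# how='all': drop only rows where every value is NA.
all_na = df.dropna(how="all")

# thresh=2: keep rows that have at least two non-NA values.
at_least_two = df.dropna(thresh=2)

# subset: only look at column 'b' when deciding what to drop.
by_subset = df.dropna(subset=["b"])
```
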
groupby
pandas.DataFrame.groupby(by=None, axis=no_default, level=None, as_index=True, sort=True, group_keys=True, observed=no_default, dropna=True)
parameter | input | description |
---|---|---|
by | mapping, function, label, list, etc. | used to determine the groups for the groupby |
as_index | bool, default True | return object with group labels as the index |
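
A minimal sketch of `by` and `as_index`, assuming a small hypothetical frame of team scores:

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["x", "x", "y", "y"],
    "points": [1, 2, 3, 4],
})

# as_index=True (default): the group labels become the index of the result.
indexed = df.groupby("team").sum()

# as_index=False: the group labels stay as a regular column.
flat = df.groupby("team", as_index=False).sum()

# `by` can also be a function; it is called on each index label.
# Here rows are grouped by whether their index label is even or odd.
by_func = df.groupby(lambda i: i % 2)["points"].sum()
```
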
sort_values, sort_index
pandas.DataFrame.sort_values(by, *, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)
parameter | input | description |
---|---|---|
by | str or list of str | if axis=0 (`index`), `by` may contain index levels and/or column labels. If axis=1 (`columns`), `by` may contain column levels and/or index labels |
axis | {0 or `index`, 1 or `columns`}, default 0 | axis to be sorted |
kind | {`quicksort`, `mergesort`, `heapsort`, `stable`}, default `quicksort` | |
na_position | {`first`, `last`} | |
key | callable | apply the key function to the values before sorting |
pandas.DataFrame.sort_index(*, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)
`sort_index` returns a new DataFrame sorted by its labels (index); the original is left unchanged unless `inplace=True`.
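
The sketch below shows `na_position`, `key`, and `sort_index` side by side. The names and scores are invented for the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"name": ["bob", "Ann", "Cat"], "score": [2.0, np.nan, 1.0]},
    index=[2, 0, 1],
)

# NaN values are placed last by default.
by_score = df.sort_values("score")

# na_position='first' moves them to the top instead.
nan_first = df.sort_values("score", na_position="first")

# Plain string sorting puts uppercase letters before lowercase ones.
plain = df.sort_values("name")

# `key` receives the whole column and must return a same-length result.
case_insensitive = df.sort_values("name", key=lambda col: col.str.lower())

# sort_index orders rows by their index labels and returns a new frame.
by_index = df.sort_index()
```
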