pandas Cheat Sheet

pandas

concat
pandas.concat(objs, *, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=None)

concat concatenates pandas objects along a particular axis

| parameter | input | description |
| --- | --- | --- |
| objs | a sequence or mapping of Series/DataFrame objects | |
| join | {inner, outer} | |
| verify_integrity | bool, default False | check whether the new concatenated axis contains duplicates (expensive) |

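A minimal sketch of how join and verify_integrity behave. The frames df1 and df2 and their column names are made up for illustration.

```python
import pandas as pd

# Hypothetical frames: they share column 'a' and both use index labels 0 and 1
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "c": [7, 8]})

# join='outer' keeps the union of columns; cells with no data become NaN
outer = pd.concat([df1, df2], join="outer", ignore_index=True)

# join='inner' keeps only the columns present in every frame
inner = pd.concat([df1, df2], join="inner", ignore_index=True)

# verify_integrity=True raises ValueError when the concatenated axis
# ends up with duplicate labels (here 0 and 1 appear twice)
try:
    pd.concat([df1, df2], verify_integrity=True)
    duplicates_detected = False
except ValueError:
    duplicates_detected = True
```
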
to_datetime
pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=False, format=None, exact=no_default, unit=None, infer_datetime_format=no_default, origin='unix', cache=True)

to_datetime converts a scalar, array-like, Series, or DataFrame to a pandas datetime object

| parameter | input | description |
| --- | --- | --- |
| arg | int, float, datetime, list, tuple, etc. | object to convert to a datetime |
| errors | {ignore, raise, coerce} | raise raises an exception, coerce sets invalid values to NaT, ignore returns the input |
| dayfirst | bool, default False | parses with day first |
| yearfirst | bool, default False | parses with year first |
| format | str, default None | e.g. '%d/%m/%Y'; ISO8601 for any ISO 8601 string; mixed to infer the format of each element individually (should go with dayfirst) |
| unit | str, default 'ns' | |
| infer_datetime_format | bool, default False | if True and no format is given, infer the datetime format from the first non-NaN element |

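A short sketch of the parameters above in use. The input strings are illustrative only.

```python
import pandas as pd

# dayfirst=True reads the ambiguous "10/11/2012" as 10 November 2012
d = pd.to_datetime("10/11/2012", dayfirst=True)

# An explicit format removes the guesswork entirely
explicit = pd.to_datetime(pd.Series(["25/12/2023", "01/02/2024"]), format="%d/%m/%Y")

# errors='coerce' turns unparseable entries into NaT instead of raising
coerced = pd.to_datetime(pd.Series(["2024-01-15", "not a date"]), errors="coerce")

# unit says how to interpret numeric input (here: seconds since the Unix epoch)
from_epoch = pd.to_datetime(86400, unit="s")
```
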
merge
pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x','_y'), copy=None, indicator=False, validate=None)

merge merges DataFrame or named Series objects with a database-style join

| parameter | input | description |
| --- | --- | --- |
| left | DataFrame or named Series | |
| right | DataFrame or named Series | |
| how | {left, right, outer, inner, cross} | |
| on | label or list | column or index level names to join on |
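
A minimal sketch comparing how='inner' with how='outer'. The frames and the 'key' and 'value' column names are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "value": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "value": [20, 30, 40]})

# how='inner' keeps only the keys found in both frames; the overlapping
# 'value' column receives the default suffixes _x and _y
inner = pd.merge(left, right, how="inner", on="key")

# how='outer' keeps every key; indicator=True adds a _merge column that
# records whether a row came from left_only, right_only, or both
outer = pd.merge(left, right, how="outer", on="key", indicator=True)
```
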

pandas.Series

apply
pandas.Series.apply(func, convert_dtype=no_default, args=(), *, by_row='compat', **kwargs)

apply invokes a function on the values of a Series

map
pandas.Series.map(arg, na_action=None)

map maps values of Series according to an input mapping or function
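
A small sketch contrasting map and apply. The Series contents are invented for illustration.

```python
import pandas as pd

s = pd.Series(["cat", "dog", None])

# map with a dict: values without an entry in the mapping become NaN
mapped = s.map({"cat": "kitten", "dog": "puppy"})

# map with a function: na_action='ignore' leaves missing values alone
# instead of passing them to the function
upper = s.map(str.upper, na_action="ignore")

# apply calls the function on each value; extra positional arguments
# are supplied through args
n = pd.Series([1, 2, 3])
scaled = n.apply(lambda x, factor: x * factor, args=(10,))
```
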

pandas.DataFrame

describe
pandas.DataFrame.describe(percentiles=None, include=None, exclude=None)

| parameter | input |
| --- | --- |
| percentiles | list-like of numbers. Default is [.25, .5, .75] |
| include | all, list-like of dtypes, or None |
| exclude | list-like of dtypes or None |

Consider a df that has object columns as well as integer columns:

df.describe(include=['object'])

will return statistics for the object columns only (count, unique, top, freq). Use include='all' to summarize the numeric columns as well.
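
A quick sketch of what each include setting selects. The frame is hypothetical, and the 'name' column is forced to object dtype so the example does not depend on the pandas version's default string dtype.

```python
import pandas as pd

df = pd.DataFrame({
    "name": pd.Series(["a", "b", "a"], dtype="object"),
    "score": [1, 2, 3],
})

numeric_only = df.describe()                   # default: numeric columns only
object_only = df.describe(include=["object"])  # object columns only
everything = df.describe(include="all")        # every column; NaN where a statistic does not apply
```
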

reindex
pandas.DataFrame.reindex(labels=None, *, index=None, columns=None, axis=None, method=None, copy=None, level=None, fill_value=nan, limit=None, tolerance=None)

| parameter | input | description |
| --- | --- | --- |
| labels | array-like | |
| index | array-like | |
| columns | array-like | |
| axis | int or str | can be either axis name (index, columns) or number (0, 1) |
| method | {None, bfill, ffill, nearest} | method to use for filling holes in the reindexed DataFrame |
| copy | bool | return a new object |
| level | int or name | |
| fill_value | scalar, default np.nan | value to use for missing values |
| limit | int, default None | max number of consecutive elements to ffill or bfill |

# setting index to 'name' column
df = df.set_index('name')

# resetting index onto df itself
df.reset_index(inplace=True)
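
A minimal sketch of reindex itself, using fill_value, method, and columns. The frames and labels are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["x", "y", "z"], "value": [10, 20, 30]}).set_index("name")

# Labels absent from the original index ('w') are filled with fill_value
filled = df.reindex(["x", "y", "w"], fill_value=0)

# method='ffill' needs a monotonic index; each new label takes the
# last existing row at or before it
steps = pd.DataFrame({"value": [1.0, 2.0]}, index=[0, 10])
forward = steps.reindex([0, 5, 10, 15], method="ffill")

# columns= reindexes the column axis; unknown columns are added as NaN
wider = df.reindex(columns=["value", "extra"])
```
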
duplicated

duplicated returns a boolean Series denoting duplicate rows.

pandas.DataFrame.duplicated(subset=None, keep='first')

| parameter | input | description |
| --- | --- | --- |
| subset | column label or sequence of labels | only consider certain columns for the duplicate check |
| keep | {first, last, False} | mark duplicates as True except for the first or last occurrence. False marks all duplicates as True |

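A short sketch of the three keep settings. The frame is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"k": ["a", "b", "a", "a"], "v": [1, 2, 1, 3]})

# Default: compare whole rows, leave the first occurrence unmarked
first = df.duplicated()

# Compare only column 'k', leave the last occurrence unmarked
by_k_last = df.duplicated(subset="k", keep="last")

# keep=False marks every member of a duplicate group
all_marked = df.duplicated(subset=["k"], keep=False)
```
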
drop, drop_duplicates, dropna
pandas.DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

| parameter | input | description |
| --- | --- | --- |
| labels | single label or list-like | index or column labels to drop |
| axis | 0 or index, 1 or columns | |
| index, columns | single label or list-like | |
| inplace | bool, default False | |
| errors | {ignore, raise} | ignore suppresses the error and only existing labels are dropped |

pandas.DataFrame.drop_duplicates(subset=None, *, keep='first', inplace=False, ignore_index=False)

| parameter | input | description |
| --- | --- | --- |
| subset | column label or sequence of labels | only consider certain columns for identifying duplicates; by default use all of the columns |
| keep | {first, last, False} | drop duplicates except for the first or last occurrence. False drops all duplicates |
| ignore_index | bool, default False | if True, the resulting axis will be labeled 0, 1, ..., n-1 |

pandas.DataFrame.dropna(*, axis=0, how=no_default, thresh=no_default, subset=None, inplace=False, ignore_index=False)

| parameter | input | description |
| --- | --- | --- |
| how | {any, all}, default any | any: drop the row or column if any NA value is present. all: drop it only if all values are NA |
| thresh | int | short for threshold. Require that many non-NA values. Cannot be combined with how |

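A minimal sketch covering the three methods together. The frame and the label 'missing' are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, np.nan, 4],
    "b": [2, 2, np.nan, np.nan],
    "c": [3, 3, np.nan, 6],
})

dropped_col = df.drop(columns="b")
# errors='ignore' drops the labels that exist and skips the ones that do not
dropped_safe = df.drop(columns=["c", "missing"], errors="ignore")

# Rows 0 and 1 are identical, so one of them is removed
deduped = df.drop_duplicates(ignore_index=True)

any_na = df.dropna()                # how='any' is the default
all_na = df.dropna(how="all")       # only the all-NA row is dropped
at_least_two = df.dropna(thresh=2)  # keep rows with at least 2 non-NA values
```
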
groupby
pandas.DataFrame.groupby(by=None, axis=no_default, level=None, as_index=True, sort=True, group_keys=True, observed=no_default, dropna=True)

| parameter | input | description |
| --- | --- | --- |
| by | mapping, function, label, list, etc. | used to determine the groups for groupby |
| as_index | bool, default True | return object with group labels as index |

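A short sketch of by and as_index. The 'team' and 'points' columns are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"team": ["x", "y", "x", "y"], "points": [1, 2, 3, 4]})

# as_index=True (default): group labels become the index of the result
indexed = df.groupby("team")["points"].sum()

# as_index=False: group labels stay as a regular column
flat = df.groupby("team", as_index=False)["points"].sum()

# by can also be a function, which is called on each index label
parity = df.groupby(lambda i: "even" if i % 2 == 0 else "odd")["points"].sum()
```
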
sort_values, sort_index
pandas.DataFrame.sort_values(by, *, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)

| parameter | input | description |
| --- | --- | --- |
| by | str or list of str | if axis=0 (index), by can contain index levels or column labels. If axis=1 (columns), column levels or index labels |
| axis | {0 or index, 1 or columns}, default 0 | axis to be sorted |
| kind | {quicksort, mergesort, heapsort, stable}, default quicksort | |
| na_position | {first, last} | |
| key | callable | apply key function to the values before sorting |

pandas.DataFrame.sort_index(*, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)

sort_index returns a new DataFrame sorted by its labels (index); the original is left unchanged unless inplace=True
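
A minimal sketch of sort_values and sort_index side by side. The frame, its labels, and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"name": ["bob", "Alice", "Carol"], "score": [2.0, np.nan, 1.0]},
    index=[2, 0, 1],
)

# Descending by score, with missing scores placed first
by_score = df.sort_values("score", ascending=False, na_position="first")

# key receives the whole column as a Series and must return one of the
# same length; lowercasing makes the sort case-insensitive
by_name = df.sort_values("name", key=lambda col: col.str.lower())

# Sort by index labels; df itself keeps its original order
by_label = df.sort_index()
```
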