pandas Cheat Sheet

`pandas`

`concat`

pandas.concat(objs, *, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=None)

concat method concatenates pandas objects along a particular axis

parameter	input	description
`objs`	a sequence or mapping of Series/DataFrame objects
`join`	{`inner`,`outer`}
`verify_integrity`	bool, default `False`	check whether the new concatenated axis contains duplicates (expensive)
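
A minimal sketch of how `join`, `ignore_index`, and `verify_integrity` change the result. The frames `a` and `b` and their labels are made up for illustration.

```python
import pandas as pd

# Two illustrative frames that share column "x" and the row label "r1".
a = pd.DataFrame({"x": [1, 2]}, index=["r0", "r1"])
b = pd.DataFrame({"x": [3, 4], "y": [5, 6]}, index=["r1", "r2"])

# Default: stack rows (axis=0) and keep the union of columns (join='outer').
outer = pd.concat([a, b])

# join='inner' keeps only the columns present in every input.
inner = pd.concat([a, b], join="inner")

# ignore_index=True discards the original labels and renumbers 0..n-1.
renumbered = pd.concat([a, b], ignore_index=True)

# verify_integrity=True raises ValueError because label "r1" appears twice.
try:
    pd.concat([a, b], verify_integrity=True)
    raised = False
except ValueError:
    raised = True
```

With the outer join, rows that came from `a` get `NaN` in column `y`.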

`to_datetime`

pandas.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=False, format=None, exact=no_default, unit=None, infer_datetime_format=no_default, origin='unix', cache=True)

to_datetime method converts a scalar, array-like, Series, or DataFrame to a pandas datetime object

parameter	input	description
`arg`	int, float, datetime, list, tuple, etc	object to convert to a datetime
`errors`	{`ignore`,`raise`,`coerce`}	`raise` raises exception, `coerce` will set invalid to `NaT`, `ignore` will return the input
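`dayfirst`	bool, default `False`	parses with day first, e.g. '10/11/12' is read as 2012-11-10
`yearfirst`	bool, default `False`	parses with year first, e.g. '10/11/12' is read as 2010-11-12
`format`	str, default `None`	strftime format such as '%d/%m/%Y'; `ISO8601` for any ISO 8601 string; `mixed` to infer the format of each element individually (best combined with `dayfirst`)
`unit`	str, default 'ns'	unit of a numeric `arg` (D, s, ms, us, ns), counted from `origin`
`infer_datetime_format`	bool, default `False`	if `True` and no `format` is given, infer the format from the first non-NaN element (deprecated since pandas 2.0, where strict inference is already the default)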
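
A short sketch of the most common `to_datetime` options: an explicit `format`, `errors='coerce'`, and `unit` for numeric input. The sample strings are illustrative.

```python
import pandas as pd

# An explicit format removes the day-first vs. month-first ambiguity.
s = pd.Series(["31/12/2023", "01/02/2024"])
parsed = pd.to_datetime(s, format="%d/%m/%Y")

# errors='coerce' turns unparseable entries into NaT instead of raising.
messy = pd.Series(["2024-01-15", "not a date"])
coerced = pd.to_datetime(messy, errors="coerce")

# unit says how to interpret a number counted from origin ('unix' by default).
from_epoch = pd.to_datetime(1_700_000_000, unit="s")
```

Here `parsed[1]` is 1 February 2024, not 2 January, because the format string puts the day first.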

`merge`

pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x','_y'), copy=None, indicator=False, validate=None)

merge method merges DataFrame or named Series objects with a database-style join

parameter	input	description
`left`	df or series
`right`	df or series
`how`	{`left`,`right`,`outer`,`inner`,`cross`}
`on`	label or list	column or index level names to join on; they must exist in both objects
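
A minimal sketch of an inner and an outer merge on a shared `key` column. The frames are made up; `indicator=True` adds a `_merge` column showing where each row came from.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "val": [20, 30, 40]})

# Default how='inner': only keys found in both frames survive.
# The overlapping column "val" gets the default suffixes _x and _y.
inner = pd.merge(left, right, on="key")

# how='outer' keeps every key; indicator=True records each row's origin.
outer = pd.merge(left, right, on="key", how="outer", indicator=True)
origin = outer.set_index("key")["_merge"].astype(str).to_dict()
```

`origin` maps each key to `left_only`, `right_only`, or `both`, which is handy for auditing a join.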

`pandas.Series`

`apply`

pandas.Series.apply(func, convert_dtype=no_default, args=(), *, by_row='compat', **kwargs)

apply method invokes function on values of Series
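
A small sketch of `Series.apply` with a lambda and with extra arguments passed through `args` and keyword arguments. The helper `add_then_scale` is hypothetical.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# The function is called once per value.
squared = s.apply(lambda v: v ** 2)


def add_then_scale(v, offset, scale=1):
    """Hypothetical helper: shift a value, then multiply it."""
    return (v + offset) * scale


# Positional extras go in args; keyword extras are forwarded via **kwargs.
adjusted = s.apply(add_then_scale, args=(1,), scale=10)
```

The result keeps the original index, so `adjusted["a"]` is `(1 + 1) * 10`.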

`map`

pandas.Series.map(arg, na_action=None)

map maps values of Series according to an input mapping or function
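
A brief sketch of `Series.map` with a dict and with a function. The values are illustrative; keys missing from the dict become `NaN`, and `na_action='ignore'` skips missing values instead of passing them to the function.

```python
import pandas as pd

s = pd.Series(["cat", "dog", None, "bird"])

# Dict mapping: values without a matching key ("bird", None) become NaN.
mapped = s.map({"cat": "meow", "dog": "woof"})

# Function mapping: na_action='ignore' leaves missing values untouched,
# so len() is never called on the missing entry.
lengths = s.map(len, na_action="ignore")
```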

`pandas.DataFrame`

`describe`

pandas.DataFrame.describe(percentiles=None, include=None, exclude=None)

parameter	input
`percentiles`	list-like numbers. Default is [.25, .5, .75]
`include`	`all`, list-like of dtypes, or `None`
`exclude`	list-like dtypes or `None`

Consider a df whose columns include both integer and object dtypes:

df.describe(include=['object'])

will return statistics (count, unique, top, freq) for the object columns only. Pass include='all' to summarize the object columns together with the numeric columns.
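
A minimal sketch comparing the `include` options. The frame is made up, and the `city` column is forced to `object` dtype so the example does not depend on the default string dtype of the installed pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47],
    "city": pd.Series(["Oslo", "Lima", "Oslo"], dtype=object),
})

numeric = df.describe()                      # numeric columns only (default)
objects = df.describe(include=["object"])    # object columns only
everything = df.describe(include="all")      # every column

# Custom percentiles; the median (50%) is always included.
custom = df.describe(percentiles=[0.1, 0.9])
```

For object columns the summary rows are `count`, `unique`, `top`, and `freq` rather than mean and quantiles.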

`reindex`

pandas.DataFrame.reindex(labels=None, *, index=None, columns=None, axis=None, method=None, copy=None, level=None, fill_value=None, limit=None, tolerance=None)

parameter	input	description
`labels`	array-like
`index`	array-like
`columns`	array-like
`axis`	int or str	can be either axis name (`index`, `columns`) or number (0, 1)
`method`	{`None`, `bfill`, `ffill`, `nearest`}	method to use for filling holes in the reindexed DataFrame
`copy`	boolean	return new object
`level`	int or name
`fill_value`	scalar, default `np.nan`	value to use for missing values
`limit`	int, default `None`	max number of consecutive elements to `ffill` or `bfill`

# setting index to 'name' column
df = df.set_index('name')

# resetting index onto df itself
df.reset_index(inplace=True)
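
A short sketch of `reindex` itself, using `fill_value`, `method`, and `columns`. The frames and labels are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"score": [10, 20, 30]}, index=["a", "b", "c"])

# Labels missing from the original ("d") get fill_value instead of NaN.
filled = df.reindex(["a", "c", "d"], fill_value=0)

# method='ffill' carries the last valid row forward into new labels;
# it requires a monotonically increasing or decreasing index.
ts = pd.DataFrame({"v": [1.0, 2.0]}, index=[0, 2])
ffilled = ts.reindex([0, 1, 2, 3], method="ffill")

# Reindexing columns adds any unknown column filled with NaN.
with_cols = df.reindex(columns=["score", "grade"])
```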

`duplicated`

duplicated method returns boolean Series denoting duplicate rows.

pandas.DataFrame.duplicated(subset=None, keep='first')

parameter	input	description
`subset`	column label or sequence of labels	Only consider certain columns for duplicate check
`keep`	{`first`, `last`, `False`}	mark duplicates as `True` except for the `first` or `last` occurrence. `False` will mark all duplicates as `True`
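
A minimal sketch of how `subset` and `keep` change the boolean mask. The rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["ann", "bob", "ann", "ann"],
    "team": ["x", "y", "x", "z"],
})

# Whole-row comparison, keep='first': only the repeated (ann, x) row is flagged.
first = df.duplicated()

# Compare on "name" only and treat the last occurrence as the original.
by_name_last = df.duplicated(subset="name", keep="last")

# keep=False flags every member of a duplicate group.
all_dupes = df.duplicated(subset="name", keep=False)
```

The mask can be used directly for filtering, for example `df[~df.duplicated()]`.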

`drop`, `drop_duplicates`, `dropna`

pandas.DataFrame.drop(labels=None, *, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

parameter	input	description
`labels`	single label or list-like	index or column labels to drop
`axis`	0 or `index`, 1 or `columns`
`index`,`columns`	single label or list-like
`inplace`	bool, default `False`
`errors`	{`ignore`,`raise`}	`ignore` suppresses the error, and only existing labels are dropped.
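
A small sketch of dropping rows and columns, including `errors='ignore'` for labels that may not exist. The frame and the label `missing` are made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["r0", "r1", "r2"])

# A bare label is looked up on axis=0 (the index) by default.
no_row = df.drop("r1")

# columns= is equivalent to labels=..., axis=1.
no_col = df.drop(columns="b")

# errors='ignore' skips labels that are absent instead of raising KeyError.
safe = df.drop(columns=["b", "missing"], errors="ignore")
```

Because `inplace` defaults to `False`, `df` itself is unchanged by all three calls.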

pandas.DataFrame.drop_duplicates(subset=None, *, keep='first', inplace=False, ignore_index=False)

parameter	input	description
`subset`	column label or sequence of labels	Only consider certain columns for identifying duplicates; by default use all of the columns
`keep`	{`first`,`last`,`False`}	drop duplicates except for the `first` or `last` occurrence. `False` drops all duplicates
`ignore_index`	bool, default `False`	if `True`, the resulting axis will be labeled 0, 1, ..., n-1
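
A minimal sketch of `drop_duplicates` with the three `keep` settings. The rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["ann", "bob", "ann"], "team": ["x", "y", "z"]})

# Keep the first row for each name; original index labels are preserved.
unique_names = df.drop_duplicates(subset="name")

# Keep the last row for each name and renumber the result 0..n-1.
last_kept = df.drop_duplicates(subset="name", keep="last", ignore_index=True)

# keep=False removes every row whose name occurs more than once.
none_kept = df.drop_duplicates(subset="name", keep=False)
```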

pandas.DataFrame.dropna(*, axis=0, how=no_default, thresh=no_default, subset=None, inplace=False, ignore_index=False)

parameter	input	description
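`how`	{'any', 'all'}, default `any`	`any`: drop the row or column if any NA value is present. `all`: drop it only if all values are NA.
`thresh`	int	short for threshold. Require that many non-NA values to keep the row or column; cannot be combined with `how`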
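
A short sketch contrasting `how`, `thresh`, and `subset`. The frame is made up so that each row has a different number of missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 3.0, np.nan],
    "c": [4.0, 5.0, np.nan],
})

any_na = df.dropna()                # drop rows with at least one NA
all_na = df.dropna(how="all")       # drop rows where every value is NA
at_least_two = df.dropna(thresh=2)  # keep rows with >= 2 non-NA values
only_a = df.dropna(subset=["a"])    # look for NA in column "a" only
```

Row 1 has two non-NA values, so it survives `thresh=2` but not the default `how='any'`.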

`groupby`

pandas.DataFrame.groupby(by=None, axis=no_default, level=None, as_index=True, sort=True, observed=no_default, dropna=True)

parameter	input	description
`by`	mapping, function, label, list etc.	used to determine the groups for groupby
`as_index`	bool, default `True`	return object with group labels as index
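
A minimal sketch of the effect of `as_index` on an aggregated result. The frame and column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"team": ["x", "y", "x", "y"], "points": [1, 2, 3, 4]})

# Default as_index=True: the group labels become the index of the result.
totals = df.groupby("team")["points"].sum()

# as_index=False: the group labels stay as an ordinary column.
flat = df.groupby("team", as_index=False).sum()
```

The `as_index=False` form is often more convenient when the result is merged or exported afterwards.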

`sort_values`, `sort_index`

pandas.DataFrame.sort_values(by, *, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)

parameter	input	description
`by`	`str` or list of `str`	if axis=0 (`index`), `by` may contain index levels and/or column labels; if axis=1 (`columns`), column levels and/or index labels
`axis`	{0 or `index`, 1 or `columns`}, default 0	axis to be sorted
`kind`	{`quicksort`,`mergesort`,`heapsort`,`stable`}, default `quicksort`
`na_position`	{`first`,`last`}, default `last`	where to place NaN values in the sorted result
`key`	callable	apply key function to the values before sorting
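
A small sketch of `sort_values` with `na_position` and a `key` function. The data is made up; the key receives the whole column as a Series and must return one of the same length.

```python
import pandas as pd

df = pd.DataFrame({"name": ["bob", "Cy", "ann"], "score": [2.0, None, 1.0]})

# Highest score first, with missing scores placed at the top.
by_score = df.sort_values("score", ascending=False, na_position="first")

# Case-insensitive sort: without the key, "Cy" would sort before "ann".
by_name = df.sort_values("name", key=lambda col: col.str.lower())
```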

pandas.DataFrame.sort_index(*, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)

sort_index returns a new DataFrame sorted by its labels (index); the original is modified only when inplace=True
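
A minimal sketch of `sort_index` in both directions. The frame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3]}, index=["b", "c", "a"])

ascending = df.sort_index()                  # labels a, b, c
descending = df.sort_index(ascending=False)  # labels c, b, a
```

With the default `inplace=False`, `df` keeps its original row order.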
