Pandas Part V: Data Loading, Storage, and File Formats

Reading and Writing Data in Text Format
Function        Description
read_csv        delimited text (CSV) from a file, URL, or file-like object
read_clipboard  variant of read_csv that reads from the clipboard; useful for converting tables from web pages
read_excel      Excel files (xls, xlsx)
read_hdf        HDF5 files written by pandas
read_html       tables found in an HTML document
read_json       JSON
read_feather    Feather binary file
read_orc        Apache ORC binary file
read_parquet    Apache Parquet binary file
read_pickle     object stored in Python pickle format
read_sas        SAS dataset
read_spss       SPSS file
read_sql        SQL query or database table
read_sql_table  SQL table
read_stata      Stata file
read_xml        XML file
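
The readers listed above share a common calling pattern: each takes a path or file-like object and returns a DataFrame. The following is a minimal sketch of that pattern using read_csv and read_json on in-memory buffers; the sample data and variable names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data: the same two records in CSV and JSON form.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"
json_text = '[{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]'

# Wrapping the text in StringIO lets the readers treat it like an open file.
df_csv = pd.read_csv(io.StringIO(csv_text))
df_json = pd.read_json(io.StringIO(json_text))

print(df_csv)
print(df_json)
```

Both calls should produce a two-row frame with columns a, b, and c. The other readers in the table take a file path or connection in the same position.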
read_csv()
pandas.read_csv(filepath_or_buffer, *, sep=_NoDefault.no_default,
              delimiter=None, header='infer', names=_NoDefault.no_default,
              index_col=None, usecols=None, dtype=None,
              engine=None, converters=None,
              true_values=None, false_values=None,
              skipinitialspace=False, skiprows=None, skipfooter=0,
              nrows=None, na_values=None, keep_default_na=True, na_filter=True,
              verbose=_NoDefault.no_default, skip_blank_lines=True, parse_dates=None,
              infer_datetime_format=_NoDefault.no_default,
              keep_date_col=_NoDefault.no_default, date_parser=_NoDefault.no_default,
              date_format=None, dayfirst=False, cache_dates=True,
              iterator=False, chunksize=None, compression='infer',
              thousands=None, decimal='.', lineterminator=None,
              quotechar='"', quoting=0, doublequote=True,
              escapechar=None, comment=None, encoding=None, encoding_errors='strict',
              dialect=None, on_bad_lines='error',
              delim_whitespace=_NoDefault.no_default,
              low_memory=True, memory_map=False,
              float_precision=None, storage_options=None, dtype_backend=_NoDefault.no_default)
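
Most calls use only a handful of the parameters in the signature above. As a sketch, assuming a pipe-delimited file with no header row (the data and column names here are invented for the example), sep, header, names, and index_col can be combined like this:

```python
import io

import pandas as pd

# Hypothetical pipe-delimited data with no header row.
raw = "1|2|3|hello\n5|6|7|world\n"

df = pd.read_csv(
    io.StringIO(raw),
    sep="|",                           # field separator
    header=None,                       # the data has no header row
    names=["a", "b", "c", "message"],  # supply column names explicitly
    index_col="message",               # use this column as the row index
)

print(df)
```

Passing names without header=None would also work here, since supplying names makes the default 'infer' behave like header=None; spelling it out keeps the intent visible.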
Argument         Description
sep, delimiter   character sequence or regular expression used to split fields in each row
header           row number to use as column names; the default 'infer' behaves like 0 when names is not passed; set to None if the file has no header row
index_col        column numbers or names to use as the row index
names            list of column names
skiprows         number of rows to skip at the beginning of the file, or a list of row numbers (starting from 0) to skip
na_values        sequence of values to treat as NA
keep_default_na  whether to include the default NA values list
comment          character that marks the rest of a line as a comment
parse_dates      attempt to parse data to datetime; off by default
keep_date_col    if joining columns to parse a date, keep the joined columns; False by default
converters       dictionary mapping column numbers or names to functions, e.g. {'foo': f}
dayfirst         parse ambiguous dates as day-first; False by default
date_parser      function to use to parse dates (deprecated since pandas 2.0 in favor of date_format)
nrows            number of rows to read from the beginning of the file (not counting the header)
iterator         return a TextFileReader object for reading the file piecemeal
chunksize        for iteration, number of rows per chunk
skipfooter       number of lines to ignore at the end of the file
verbose          print various parsing information
encoding         text encoding; utf-8 by default
squeeze          if the parsed data contains only one column, return a Series (removed in pandas 2.0 and absent from the signature above; call .squeeze("columns") on the result instead)
thousands        separator for thousands; None by default
engine           CSV parsing and conversion engine to use: 'c', 'python', or 'pyarrow'
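
Several of the arguments above are easiest to understand in combination. The sketch below uses invented sample data (a title line, a sentinel string for a missing price, and a trailing comment) to show skiprows, comment, na_values, converters, nrows, and chunksize; none of the names or values come from a real dataset.

```python
import io

import pandas as pd

# Hypothetical messy export: a title line, a "missing" sentinel, a trailing comment.
raw = (
    "exported by a hypothetical tool\n"
    "id,price,status\n"
    "1,10.5,ok\n"
    "2,missing,ok  # trailing note\n"
    "3,7.25,bad\n"
    "4,3.0,ok\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=1,                         # skip the title line
    comment="#",                        # drop everything after '#'
    na_values={"price": ["missing"]},   # treat this sentinel as NA in 'price'
    converters={"status": str.strip},   # remove whitespace left before the comment
)
print(df)

# nrows limits how many data rows are read (the header is not counted).
df_head = pd.read_csv(io.StringIO(raw), skiprows=1, comment="#", nrows=2)

# chunksize returns a TextFileReader that yields DataFrames of that many rows.
with pd.read_csv(io.StringIO(raw), skiprows=1, comment="#", chunksize=2) as reader:
    chunk_sizes = [len(chunk) for chunk in reader]
print(chunk_sizes)
```

Reading in chunks keeps only one piece of the file in memory at a time, which is the usual reason to reach for chunksize or iterator on large files.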