Creating and Loading Table Objects¶
Motivation¶
The Table is similar in concept to the DataFrame (either in R or in pandas) but is optimized for fast appends, iteration, and copying into databases. Because they are structured as lists of lists, all of the Python methods for operating with lists apply to Tables. Furthermore, Table objects provide their own set of specialized methods, with an API inspired by R’s dplyr package.
The key difference between an R or pandas DataFrame and a pgreaper Table is that the Table is designed to be copied into a SQL database, rather than attempt to replace its functionality. As such, most of the methods are geared towards collecting, cleaning and restructuring data, rather than analyzing it.
Structure¶
Each list in a Table represents a row, while an item in each row represents a cell. If you had a table stored as–say–world_gdp_data, then world_gdp_data[0] would return a list representing the first row and world_gdp_data[0][1] would be the second column of the first row.
Type-Inference¶
Table objects keep track of the data types inserted into every column, and uses this information to determine the final column type. By default, PGReaper will attempt to treat unrecognized data types as text columns. Currently, Table is able to recognize text, integer, float, jsonb (list or dict), and datetime types as well as most numpy types.
Brief Example: Creating and Loading Table Objects¶
from pgreaper import Table, table_to_pg
planes = Table(
name='planes',
col_names=['weight', 'length', 'wingspan', 'cost', 'manufacturer']
)
planes.append(...)
table_to_pg(planes, dbname='vehicles')
Full Reference: Creating and Modifying Tables¶
Slicing¶
Exotic Methods¶
These are methods which be handy for some edge cases
Operations Which Create New Tables¶
Groupby¶
Adding Rows to a Table¶
Since a Table is really just a nested list, you can use the append() method. However, Table objects will refuse to add rows which are shorter or longer than the existing rows. This is motivated by the fact that a lot of input sources, especially HTML tables, are not very clean and do not guarantee consistent record lengths even if they should.
If you want to intentionally extend a table, you can use the add_col() method to create a new column, or add_dict() method which implicitly creates new columns.
Full Reference: Loading Tables to Postgres¶
Dumping to Files¶
It is also possible to dump the contents of Table objects into files instead of SQL databases.
Jupyter Notebooks¶
Table objects take advantage of Jupyter’s pretty HTML display.