2.4. Statistics with pandas
Recall functions such as np.mean() and np.max(). Their pandas counterparts, for example .mean() and .max(), compute the statistics of a row or a column. Say you want to know the average hardness of the different minerals:
import pandas as pd
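# Path to the data file (adjust it if the file lives elsewhere)
file_location = "mineral_properties.txt"
# file_location already contains the file name, so it is passed on its own
df4 = pd.read_csv(file_location, sep=',', header=[1],
                  skiprows=None, index_col=0, skipinitialspace=True)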
df4['hardness'].mean()
4.666666666666667
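The sketch below illustrates the same idea on a small table that can be run without the data file. The DataFrame `minerals` is a hand-typed subset of the values shown in this section, and the variable names are chosen only for illustration. It shows a statistic of one column, a statistic per column, and how the `axis` argument switches to a statistic per row.

```python
import pandas as pd

# Hand-typed subset of the mineral table (illustrative, not read from the file)
minerals = pd.DataFrame(
    {"hardness": [5.50, 2.75, 3.00], "sp. gr.": [2.800, 3.000, 2.720]},
    index=pd.Index(["Amphibole", "Biotite", "Calcite"], name="name"),
)

mean_hardness = minerals["hardness"].mean()  # statistic of a single column
max_per_column = minerals.max()              # one value per column (axis=0 is the default)
mean_per_row = minerals.mean(axis=1)         # one value per row

print(mean_hardness)
print(max_per_column)
print(mean_per_row)
```

Averaging hardness and specific gravity across a row has no physical meaning here; the row-wise call is only meant to show what `axis=1` does.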
Often we don’t know much about the data, and printing all the values is inconvenient. In that case, it’s wise to take a look at some of its attributes first.
See the labels of the columns and rows.
print(df4.columns)
print('----------------------')
print(df4.index)
Index(['hardness', 'sp. gr.', 'cleavage'], dtype='object')
----------------------
Index(['Amphibole', 'Biotite', 'Calcite', 'Dolomite', 'Feldspars', 'Garnet',
'Graphite', 'Kyanite', 'Muscovite', 'Pyroxene', 'Quartz',
'Sillimanite'],
dtype='object', name='name')
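A few other attributes are often useful for a first look. The following is a minimal sketch on a hand-typed subset of the table (the name `df_small` is hypothetical); it prints the dimensions, the data type of each column, and the first rows instead of the whole table.

```python
import pandas as pd

# Hand-typed subset of the mineral table (illustrative, not read from the file)
df_small = pd.DataFrame(
    {"hardness": [7.00, 1.50, 6.00], "cleavage": ["Fracture", "One", "One"]},
    index=pd.Index(["Garnet", "Graphite", "Kyanite"], name="name"),
)

n_rows, n_cols = df_small.shape  # (number of rows, number of columns)
print(df_small.shape)
print(df_small.dtypes)           # data type of each column
print(df_small.head(2))          # only the first two rows
```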
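Writing df4.info without parentheses does not call the method: the notebook shows the bound method's representation, the same text as print(df4.info), which happens to contain the whole table. For the actual summary (column data types, non-null counts, memory usage), call df4.info().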
df4.info
<bound method DataFrame.info of              hardness  sp. gr.  cleavage
name
Amphibole        5.50    2.800       Two
Biotite          2.75    3.000       One
Calcite          3.00    2.720     Three
Dolomite         3.00    2.850     Three
Feldspars        6.00    2.645       Two
Garnet           7.00    3.900  Fracture
Graphite         1.50    2.300       One
Kyanite          6.00    4.010       One
Muscovite        2.25    2.930       One
Pyroxene         5.50    3.325       Two
Quartz           7.00    2.650  Fracture
Sillimanite      6.50    3.230       One>
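For comparison, here is a small sketch of calling the method with parentheses. It uses a hand-typed two-row table (the name `df_small` is hypothetical) and writes the report into a text buffer through the `buf` parameter, so the summary can be inspected as a string.

```python
import io
import pandas as pd

# Hand-typed subset of the mineral table (illustrative, not read from the file)
df_small = pd.DataFrame(
    {"hardness": [7.0, 1.5], "cleavage": ["Fracture", "One"]},
    index=pd.Index(["Quartz", "Graphite"], name="name"),
)

buffer = io.StringIO()
result = df_small.info(buf=buffer)  # info() writes its report to buf and returns None
summary = buffer.getvalue()

print(summary)
```

Without the `buf` argument, df_small.info() prints the same report directly.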
2.4.1. Deep copying a DataFrame
As you have seen in Notebook 4, shallow copies can cause trouble if you are not aware of them. The same holds in pandas.
To make a deep copy, use the DataFrame.copy(deep=True) method.
df_deep = df4.copy(deep=True)
Now altering df_deep will not alter df4, and vice versa.
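The difference can be checked with a short sketch. It contrasts a plain assignment, which only creates a second name for the same object, with a deep copy. The table and the names `df_alias` and `df_deep` are illustrative.

```python
import pandas as pd

# Hand-typed subset of the mineral table (illustrative, not read from the file)
df = pd.DataFrame(
    {"hardness": [3.0, 7.0]},
    index=pd.Index(["Calcite", "Quartz"], name="name"),
)

df_alias = df                  # no copy at all: both names refer to the same object
df_deep = df.copy(deep=True)   # independent copy of the data and the index

df_deep.loc["Calcite", "hardness"] = 9.0   # change only the deep copy
df_alias.loc["Quartz", "hardness"] = 0.0   # change through the alias

print(df)       # Quartz is changed (via the alias), Calcite is untouched
print(df_deep)  # Calcite is changed, Quartz keeps its original value
```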
2.5. Additional study material:
After this Notebook you should be able to:
- understand Series and DataFrames
- concatenate DataFrames
- work with different labels of a DataFrame
- drop unwanted rows and columns
- access and modify values within your DataFrame
- import data into a pandas DataFrame
- manipulate a DataFrame in several important ways