Statistics with `pandas`

2.4. Statistics with `pandas`

Recall functions such as np.mean() and np.max(). These, and the equivalent pandas methods such as .mean() and .max(), can be used to calculate statistics for a row or a column. Say you want to know the average hardness of the different minerals:

import pandas as pd

# Path to the data file; adjust it if the file lives in another folder.
file_location = "mineral_properties.txt"
df4 = pd.read_csv(file_location, sep=',', header=[1],
                  skiprows=None, index_col=0, skipinitialspace=True)

df4['hardness'].mean()

4.666666666666667
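As a minimal sketch of other column statistics, the example below builds a small DataFrame by hand instead of reading the file. The name `minerals` and the choice of four rows are assumptions made for illustration; the values are taken from the table shown in this section.

```python
import pandas as pd

# Hypothetical stand-in for df4: four minerals with values from this section.
minerals = pd.DataFrame(
    {"hardness": [3.0, 6.0, 7.0, 1.5], "sp. gr.": [2.72, 2.645, 2.65, 2.3]},
    index=pd.Index(["Calcite", "Feldspars", "Quartz", "Graphite"], name="name"),
)

mean_hardness = minerals["hardness"].mean()  # average of one column
max_hardness = minerals["hardness"].max()    # largest value in one column
hardest = minerals["hardness"].idxmax()      # row label of the largest value
column_means = minerals.mean()               # one mean per numeric column
summary = minerals.describe()                # count, mean, std, min, quartiles, max

print(mean_hardness)
print(hardest)
print(summary)
```

Calling a method on a single column returns one number, while calling it on the whole DataFrame returns one value per column.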

Often we don’t know much about the data, and printing all the values is inconvenient. In that case, it’s wise to take a look at some of its attributes first.

Look at the labels of the columns and rows:

print(df4.columns)
print('----------------------')
print(df4.index)

Index(['hardness', 'sp. gr.', 'cleavage'], dtype='object')
----------------------
Index(['Amphibole', 'Biotite', 'Calcite', 'Dolomite', 'Feldspars', 'Garnet',
       'Graphite', 'Kyanite', 'Muscovite', 'Pyroxene', 'Quartz',
       'Sillimanite'],
      dtype='object', name='name')

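Evaluating df4.info without parentheses does not call the method. It only displays the bound method, whose representation happens to include the contents of the DataFrame, much like print(df4.info). To get a summary of the column types and non-null counts, call df4.info() instead.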

df4.info

<bound method DataFrame.info of              hardness  sp. gr.  cleavage
name                                    
Amphibole        5.50    2.800       Two
Biotite          2.75    3.000       One
Calcite          3.00    2.720     Three
Dolomite         3.00    2.850     Three
Feldspars        6.00    2.645       Two
Garnet           7.00    3.900  Fracture
Graphite         1.50    2.300       One
Kyanite          6.00    4.010       One
Muscovite        2.25    2.930       One
Pyroxene         5.50    3.325       Two
Quartz           7.00    2.650  Fracture
Sillimanite      6.50    3.230       One>
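A short sketch of inspecting a DataFrame through its attributes and the `info()` method is given below. The two-row DataFrame `df` and the variable names `buffer` and `info_text` are assumptions made for illustration, not part of the original notebook.

```python
import io
import pandas as pd

# Hypothetical two-row DataFrame with values from the table in this section.
df = pd.DataFrame(
    {"hardness": [3.0, 7.0], "cleavage": ["Three", "Fracture"]},
    index=pd.Index(["Calcite", "Quartz"], name="name"),
)

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # data type of each column

# info() prints its summary; passing buf captures the text in a string instead.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)
```

Here `shape` and `dtypes` are attributes, so they take no parentheses, while `info()` is a method and must be called.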

2.4.1. Deep copying a `DataFrame`

As you have seen in Notebook 4, shallow copies can be troublesome if you’re not aware of them. In pandas, it’s the same story.

To make a deep copy, use the DataFrame.copy(deep=True) method.

df_deep = df4.copy(deep=True)

Now altering df_deep will not alter df4, and vice versa.
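The difference can be seen in a minimal sketch. Plain assignment is used here as the contrast to a deep copy, because the behaviour of `copy(deep=False)` depends on the pandas version and its copy-on-write setting. The names `original`, `alias`, and `deep` are assumptions made for illustration.

```python
import pandas as pd

original = pd.DataFrame({"hardness": [3.0, 7.0]}, index=["Calcite", "Quartz"])

alias = original                 # no copy: a second name for the same object
deep = original.copy(deep=True)  # independent copy of the data and the index

deep.loc["Calcite", "hardness"] = 99.0  # changes only the deep copy
alias.loc["Quartz", "hardness"] = 0.0   # changes `original` as well

print(original)
print(deep)
```

After these two assignments, `original` still holds 3.0 for Calcite but now holds 0.0 for Quartz, while `deep` keeps its own values.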

2.5. Additional study material:

After this Notebook you should be able to:

understand Series and DataFrames
concatenate DataFrames
work with different labels of a DataFrame
drop unwanted rows and columns
access and modify values within your DataFrame
import data into a pandas DataFrame
manipulate a DataFrame in several important ways
