Bootstrapping: resampling the data with replacement to estimate the standard deviation, a 95% confidence interval, or some other interval.

Monte Carlo: repeated random sampling to estimate the probability of an event (or some other quantity).

If one just wants the mean of a single random sample, I don't think that counts as either Monte Carlo or bootstrapping.
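A minimal sketch of the distinction, using made-up data rather than the core samples below (the normal distribution and the threshold of 12 are placeholders):

```python
import numpy as np

rng = np.random.default_rng(42)
observed = rng.normal(loc=10, scale=2, size=100)  # hypothetical observed data

# Bootstrapping: resample the observed data WITH replacement many times
# and collect the statistic of interest from each resample
boot_means = [rng.choice(observed, size=observed.size, replace=True).mean()
              for _ in range(10_000)]
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])  # 95% interval
boot_se = np.std(boot_means)  # bootstrap standard error of the mean

# Monte Carlo: simulate the assumed process many times and count how often
# the event of interest occurs
draws = rng.normal(loc=10, scale=2, size=10_000)
p_event = np.mean(draws > 12)  # estimated P(value > 12)
```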

## Here is the code to get the mean and std of n random cores from 100 core samples.

```python
sample_number = 3

# choose n random cores from each stratum (df holds the 100 core samples)
df_3_random = df.groupby('CEA_strata').apply(lambda x: x.sample(sample_number)).reset_index(drop=True)

sample_number = str(sample_number)  # keep as a string, e.g. for plot labels

# count of cores per group, as a sanity check
print(df_3_random.groupby(['CEA', 'separate_strata_name'])['cumulative_whole_carbon_mass'].count().reset_index().round(2))

# mean and std of carbon mass per group
df_3_random.groupby(['CEA', 'separate_strata_name']).agg(
    mean_carbon=('cumulative_whole_carbon_mass', 'mean'),
    std_carbon=('cumulative_whole_carbon_mass', 'std'),
).reset_index()
```

--

### If I wanted to get the p-value to display on each graph

Steps: first partition the string representation of the test result on 'pvalue=', then convert the text before the closing ')' to a four-decimal-place float.

```python
from scipy import stats

# partition the string form of the result on 'pvalue='
before, sep, p_CEA1 = str(stats.kruskal(cea_1_T0, cea_1_T1)).partition('pvalue=')

# keep everything before the closing ')' and format to 4 decimal places
p_CEA1 = f'{float(p_CEA1.split(")")[0]):.4f}'

p_CEA1
```
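An alternative that avoids the string parsing entirely: scipy's test result exposes the p-value as an attribute, so it can be formatted directly.

```python
from scipy import stats

result = stats.kruskal(cea_1_T0, cea_1_T1)
p_CEA1 = f'{result.pvalue:.4f}'  # access the p-value directly, no string parsing
```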

--

Different methods of normalizing data:

```python
import numpy as np

a = np.random.rand(3, 2)

# Normalised [0, 1]
b = (a - np.min(a)) / np.ptp(a)

# Normalised [0, 255] as integer: don't forget the parentheses before astype(int)
c = (255 * (a - np.min(a)) / np.ptp(a)).astype(int)

# Normalised [-1, 1]
d = 2. * (a - np.min(a)) / np.ptp(a) - 1
```

Scaling can also be done with sklearn, including normalising the standard deviation.
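A minimal sketch with sklearn's StandardScaler (the array `a` is assumed to be the one from the block above):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

a = np.random.rand(3, 2)

scaler = StandardScaler()  # centres each column to zero mean, unit std dev
a_scaled = scaler.fit_transform(a)

print(a_scaled.mean(axis=0))  # approximately 0 per column
print(a_scaled.std(axis=0))   # approximately 1 per column
```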

Or write a function that scales each column by its maximum absolute value:

```python
import numpy as np

def normalize_columns(arr):
    """Scale each column of arr in place by its maximum absolute value."""
    rows, cols = arr.shape
    for col in range(cols):
        arr[:, col] /= np.abs(arr[:, col]).max()
```
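Hypothetical usage, to show the in-place effect:

```python
arr = np.array([[1.0, -2.0],
                [3.0,  4.0]])
normalize_columns(arr)
print(arr)  # each column's maximum absolute value is now 1
```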

--

Merging is simple enough with two DataFrames. However, with multiple DataFrames, concat is simpler.

Remember to use [], and add axis=1 so the DataFrames are bound as columns.

```python
import pandas as pd

# bind the four DataFrames side by side as columns
df = pd.concat([cea_sample_97, cea_b_103, cea_model_only_90, cea_model_MV_sites_102], axis=1)
```

--

Read in the SQL db from the Audit pack:

```python
import sqlite3
import pandas as pd

# create a SQL connection to our SQLite database
con = sqlite3.connect('Audit_Pack_DataAggregation_220705.db')

# select one of the tables from the db (read_sql_query only needs the connection, not a cursor)
df = pd.read_sql_query('SELECT * FROM "0-sites_data";', con)

# close the connection
con.close()

df
```
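If the table names in the db are not known in advance, they can be listed from sqlite_master first (a standard SQLite catalogue query):

```python
import sqlite3
import pandas as pd

con = sqlite3.connect('Audit_Pack_DataAggregation_220705.db')

# list all tables in the database
tables = pd.read_sql_query("SELECT name FROM sqlite_master WHERE type='table';", con)
print(tables)

con.close()
```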

--

```python
import pandas as pd

# join cores & lats & longs for 7 properties
# Note: check whether the data comes from the 30 cm layer or the 120 cm layer

for p in range(len(property)):

    # read in files and correct column headings
    cov_df = pd.read_csv('C:/Users/MelZeppel/OneDrive - McCosker Contracting Pty Ltd/ML_development/2022_covariates_QGIS/' + property.iloc[p] + '_ML_covariates_2208.csv')
    cov_df = cov_df.rename(columns={'core_numbe': 'core_number'})

    property_name = cov_df['property_n'].iloc[0]
    print(property_name)

    layer_30 = pd.read_csv('C:/Users/MelZeppel/OneDrive - McCosker Contracting Pty Ltd/ML_development/carbon_point_join_files/point_join_' + property_name + '_30_no_nulls.csv')

    # join and export files
    df = cov_df.merge(layer_30, on='core_number', how='left')
    df = df.rename(columns={'cea_name_x': 'cea_name', 'sampling_r': 'sampling_round'})
    df = df.drop(columns=['cea_name_y'])

    lower_depth = str(df['lower_depth'].iloc[0])
    print(df.head())

    # df = df[['property_name', 'lower_depth', 'cea_name', 'strata_name', 'sampling_round', 'core_number', 'actual_latitude', 'actual_longitude', 'core_carbon_mass']]

    df.to_csv('C:/Users/MelZeppel/OneDrive - McCosker Contracting Pty Ltd/ML_development/2022_covariates_QGIS/' + property.iloc[p] + lower_depth + '_ML_covariates_2208.csv')
```

--

Read in the Mullion FlintPro sqlite db:

```python
import sqlite3
import pandas as pd

# create a SQL connection to our SQLite database (only one connection is needed)
con = sqlite3.connect('Simplified_RothC_DB.sqlite')

# select one of the tables from the db
df = pd.read_sql_query('SELECT * FROM soil_inputs;', con)

# close the connection
con.close()

df
```
