Jan 25
Random sampling — mean & std deviation: Monte Carlo & bootstrapping in Python

Bootstrapping is random sampling (with replacement) to estimate the standard deviation, a 95% confidence interval, or some other interval. Monte Carlo is estimating the probability of an event by repeated random simulation. If one only wants the mean of a random sample, I don't think that is either Monte Carlo or bootstrapping.

## Here is the code to get the mean and std of n random cores from 100 core samples.
sample_number = 3  # choose n random samples from the group
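The snippet above stops after setting the sample size, so here is a minimal sketch of how the bootstrap might continue, assuming the 100 core measurements live in a pandas Series called cores (the data and the variable name are made up for illustration):

import numpy as np
import pandas as pd

# hypothetical data: 100 soil-core values
rng = np.random.default_rng(42)
cores = pd.Series(rng.normal(loc=2.5, scale=0.4, size=100))

sample_number = 3      # choose n random samples from the group
n_iterations = 1000    # number of bootstrap resamples

# bootstrap: resample with replacement and record the mean of each resample
boot_means = [cores.sample(n=sample_number, replace=True).mean()
              for _ in range(n_iterations)]

print(np.mean(boot_means))                     # bootstrap estimate of the mean
print(np.std(boot_means))                      # std deviation of the bootstrap means
print(np.percentile(boot_means, [2.5, 97.5]))  # 95% confidence interval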
Jan 18
Convert a DataFrame into an array or array to DataFrame

# DataFrame to array
ndarray = df.to_numpy()

# Converting an array to a DataFrame:
df = pd.DataFrame(X, columns=['col1'])
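A self-contained round trip with made-up column names, showing that to_numpy() keeps only the values, so column names have to be supplied again when rebuilding the DataFrame:

import pandas as pd

df = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col2': [4.0, 5.0, 6.0]})

# DataFrame -> array (column labels and index are dropped, only values remain)
X = df.to_numpy()

# array -> DataFrame (column names must be given explicitly)
df_back = pd.DataFrame(X, columns=['col1', 'col2'])
print(df_back)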
Dec 21, 2022
Converting an object (Kruskal Wallis) to a 3 decimal place number

### If I wanted to get the p value to list on the column of each graph
## steps — first partition the string
before, sep, p_CEA1 = str(stats.kruskal(cea_1_T0, cea_1_T1)).partition('pvalue=')

# then convert the second half of the text, with ')' in there, to a 4 decimal place float
p_CEA1 = f'{float(p_CEA1.split(")")[0]):.4f}'
p_CEA1
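The string partitioning works, but scipy's kruskal() also returns a result object whose pvalue attribute can be formatted directly. A small sketch with made-up sample arrays (the cea_1_T0 / cea_1_T1 data here are simulated):

import numpy as np
from scipy import stats

# hypothetical CEA measurements at two time points
rng = np.random.default_rng(0)
cea_1_T0 = rng.normal(5.0, 1.0, 30)
cea_1_T1 = rng.normal(5.5, 1.0, 30)

# the result object exposes the p value directly, no string parsing needed
result = stats.kruskal(cea_1_T0, cea_1_T1)
p_CEA1 = f'{result.pvalue:.4f}'   # format to 4 decimal places for the plot label
print(p_CEA1)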
Dec 21, 2022
Non-parametric stats, using loops

To run non-parametric stats (Kruskal Wallis, for n > 30) on a number of properties, use this code:

## not all data are normally distributed — use a Mann-Whitney or Kruskal Wallis
# transform the data using SQRT transformation
for p in range(0, len(property)):
    df = pd.read_excel('C:/Users/MelZeppel/OneDrive…
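The original snippet is cut off at the file path, so the sketch below only shows the general shape of such a loop; the property list, folder, file naming, and column names ('soc', 'depth') are all assumptions, not the post's actual data:

import numpy as np
import pandas as pd
from scipy import stats

# hypothetical list of property names and a placeholder folder
properties = pd.Series(['property_A', 'property_B', 'property_C'])
folder = 'C:/path/to/data/'   # stand-in for the OneDrive path in the post

for p in range(0, len(properties)):
    # read one spreadsheet per property (file naming is assumed)
    df = pd.read_excel(folder + properties.iloc[p] + '.xlsx')

    # square-root transform the measurement column before testing (column name assumed)
    df['soc_sqrt'] = np.sqrt(df['soc'])

    # Kruskal-Wallis across groups defined by an assumed 'depth' column
    groups = [g['soc_sqrt'].values for _, g in df.groupby('depth')]
    stat, pval = stats.kruskal(*groups)
    print(properties.iloc[p], f'H = {stat:.3f}, p = {pval:.4f}')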
Nov 30, 2022
Transforming and normalising data for ML

Different methods of normalising data:

import numpy as np

a = np.random.rand(3, 2)

# Normalised [0, 1]
b = (a - np.min(a)) / np.ptp(a)

# Normalised [0, 255] as integer: don't forget the parentheses before astype(int)
c = (255 * (a - np.min(a)) / np.ptp(a)).astype(int)

# Normalised [-1, 1]
d = 2. * (a - np.min(a)) / np.ptp(a) - 1

Scaling using sklearn, including normalising the std dev. Write a function:
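The post breaks off at "Write a function:", so the sketch below is one common way to do it with sklearn's StandardScaler, which centres each column and scales it to unit standard deviation; the wrapper function name is made up:

import numpy as np
from sklearn.preprocessing import StandardScaler

def scale_features(X):
    """Centre each column to mean 0 and scale to std dev 1 (hypothetical helper)."""
    scaler = StandardScaler()
    return scaler.fit_transform(X), scaler   # keep the scaler to transform new data later

a = np.random.rand(3, 2)
a_scaled, scaler = scale_features(a)
print(a_scaled.mean(axis=0))   # ~0 for each column
print(a_scaled.std(axis=0))    # ~1 for each column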
Sep 29, 2022
How to column-bind in Python

Merging is simple enough with two DataFrames. However, with multiple DataFrames, concat is simpler. Remember to use [], and add axis=1 to ensure columns are bound.

df = pd.concat([cea_sample_97, cea_b_103, cea_model_only_90, cea_model_MV_sites_102], axis=1)
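A tiny self-contained example with made-up Series, to show that axis=1 binds the inputs side by side as columns (the default axis=0 would stack them as rows):

import pandas as pd

# hypothetical columns to bind side by side
cea_sample = pd.Series([1.2, 1.5, 1.9], name='cea_sample')
cea_model = pd.Series([1.1, 1.6, 2.0], name='cea_model')

# axis=1 binds as columns; inputs are aligned on the index,
# so mismatched indexes produce NaN rather than an error
df = pd.concat([cea_sample, cea_model], axis=1)
print(df)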
Jul 28, 2022
Create new FlintPro residue SQLite database

Including 3 tables in the database.

# Read in the Mullion FlintPro sqlite db
con = sqlite3.connect("Simplified_RothC_DB.sqlite")
cur = con.cursor()
df = pd.read_sql_query('SELECT * FROM soil_inputs;', con)
# df = pd.read_sql_query('SELECT * FROM soil_inputs;', con)

# close the connection
con.close()
df

# make a loop to make a dataframe (with…
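The post is cut off at "make a loop to make a dataframe", so here is a hedged sketch of one way to write three tables into a new SQLite database with DataFrame.to_sql; the table names, column names, and output file name are assumptions:

import sqlite3
import pandas as pd

# hypothetical residue tables to write into a new database
tables = {
    'soil_inputs':    pd.DataFrame({'site': ['Mullion'], 'clay_pct': [22.0]}),
    'residue_inputs': pd.DataFrame({'site': ['Mullion'], 'residue_t_ha': [3.1]}),
    'climate_inputs': pd.DataFrame({'site': ['Mullion'], 'rain_mm': [640.0]}),
}

con = sqlite3.connect('Simplified_RothC_DB_new.sqlite')   # assumed output file name
for name, frame in tables.items():
    # to_sql creates the table if needed; replace it when re-running
    frame.to_sql(name, con, if_exists='replace', index=False)
con.close()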
Jul 13, 2022
Read in SQL db from the Audit pack

import sqlite3
import pandas as pd

# Create a SQL connection to our SQLite database
con = sqlite3.connect('Audit_Pack_DataAggregation_220705.db')
cur = con.cursor()

# select one of the tables from the db
df = pd.read_sql_query('SELECT * FROM "0-sites_data";', con)

# close the connection
con.close()
df
Jul 12, 2022
Read in 2 CSVs from separate folders, merge and export

# join cores & latts & longs for 7 properties
# Note — check if reading data from 30 cm layer or 120 cm layer
for p in range(0, len(property)):
    # read in files and correct column headings
    cov_df = pd.read_csv('C:/Users/MelZeppel/OneDrive - McCosker Contracting Pty Ltd/ML_development/2022_covariates_QGIS/'
                         + property.iloc[p] + '_ML_covariates_2208.csv')
    cov_df = cov_df.rename(columns={'core_numbe': 'core_number'})
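The snippet stops after the first CSV is read, so the sketch below shows the rest of the pattern (second folder, merge on core_number, export); the folders, file suffixes, and the coordinates file name are all placeholders, not the post's actual paths:

import pandas as pd

# hypothetical property list and placeholder folders
properties = pd.Series(['Mullion', 'Glenrock'])
covariate_dir = 'C:/path/to/covariates/'
coords_dir = 'C:/path/to/coordinates/'
out_dir = 'C:/path/to/merged/'

for p in range(0, len(properties)):
    # covariates from QGIS, with the column-name typo fixed
    cov_df = pd.read_csv(covariate_dir + properties.iloc[p] + '_ML_covariates_2208.csv')
    cov_df = cov_df.rename(columns={'core_numbe': 'core_number'})

    # core latitudes/longitudes from the second folder (file naming assumed)
    coords_df = pd.read_csv(coords_dir + properties.iloc[p] + '_core_latlong.csv')

    # merge on the shared core_number column and export one file per property
    merged = cov_df.merge(coords_df, on='core_number', how='inner')
    merged.to_csv(out_dir + properties.iloc[p] + '_merged.csv', index=False)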
Jul 1, 2022
Read in SQL files

import sqlite3
import pandas as pd

# Read in the Mullion FlintPro sqlite db
# Create a SQL connection to our SQLite database
con = sqlite3.connect("Simplified_RothC_DB.sqlite")
cur = con.cursor()

# select one of the tables from the db
df = pd.read_sql_query('SELECT * FROM soil_inputs;', con)

# close the connection
con.close()
df
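If the table names in the database aren't known in advance, sqlite_master can list them before deciding what to query; a small sketch reusing the same database file name:

import sqlite3
import pandas as pd

con = sqlite3.connect("Simplified_RothC_DB.sqlite")

# list the tables in the database before choosing which one to read
tables = pd.read_sql_query(
    "SELECT name FROM sqlite_master WHERE type='table';", con)
print(tables)

con.close()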