# create a new column, CEA, from the project name

# .copy() avoids pandas' SettingWithCopyWarning when the new column is assigned

CEA1 = df[df['project'].str.contains("CEA1")].copy()

CEA2 = df[df['project'].str.contains("CEA2")].copy()

CEA3 = df[df['project'].str.contains("CEA3")].copy()

CEA4 = df[df['project'].str.contains("CEA4")].copy()

CEA5 = df[df['project'].str.contains("CEA5")].copy()

CEA1['CEA'] = 'CEA1'

CEA2['CEA'] = 'CEA2'

CEA3['CEA'] = 'CEA3'

CEA4['CEA'] = 'CEA4'

CEA5['CEA'] = 'CEA5'

# stack the five labelled subsets back into one DataFrame

merged_df2 = pd.concat([CEA1, CEA2, CEA3, CEA4, CEA5])

merged_df2
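The same labelling can probably be done in one step with a regular expression. This is only a sketch of an alternative, not the method used above: the sample data and the `value` column are made up, and it assumes each project name contains at most one of the labels CEA1 to CEA5.

```python
import pandas as pd

# Hypothetical sample data; only the 'project' column name comes from the post.
df = pd.DataFrame({
    "project": ["CEA1 north", "site CEA3", "CEA2-b", "other", "CEA5 x"],
    "value": [10, 20, 30, 40, 50],
})

# Pull the first CEA1..CEA5 label out of each project name.
# expand=False returns a Series; rows with no match get NaN.
df["CEA"] = df["project"].str.extract(r"(CEA[1-5])", expand=False)

# Drop rows with no CEA label, mirroring the filter-and-concat version,
# then sort so the rows are grouped by label as pd.concat would leave them.
merged_df2 = df.dropna(subset=["CEA"]).sort_values("CEA", kind="stable")
print(merged_df2)
```

One difference to be aware of: a project name containing two labels would appear twice in the filter-and-concat version, but only once here, under the first label found.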


VS Code & Python: packages & set-up

How to install packages:

# check which packages are installed, in the Command Prompt

pip list

# check the paths Python searches for modules (run inside Python, not the Command Prompt)

import sys
sys.path

# install and uninstall packages, in the terminal

pip install lux --user  # NB: a space and two dashes before "user"

pip uninstall lux

# check which interpreters are installed, in the Command Prompt

py -0

# verify the Python installation, in the Command Prompt

py -3 --version  # NB: a space plus two dashes

# pip install packages

# Note the "--user" flag: a double dash followed by user (singular), so that the files are stored in your per-user site-packages directory. Add any other packages, separated by spaces, as required.


Stats packages and code I have found useful include the following:

df.groupby('col1').sum()

The aggregation methods available after a groupby include these:

  1. sum(): Compute sum of group values
  2. size(): Compute group sizes (number of rows, including missing values)
  3. count(): Compute count of non-missing values in each group
  4. std(): Standard deviation of groups
  5. var(): Compute variance of groups
  6. sem(): Standard error of the mean of groups
  7. describe(): Generates descriptive statistics
  8. first(): Compute first of group values
  9. last(): Compute last of group values
  10. nth(): Take nth value, or a subset if n is a list
  11. min(): Compute min of group values
  12. max(): Compute max of group values
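Several of the functions listed above can be requested in one call with `agg`. A minimal sketch, using a made-up DataFrame (the `col2` column and the values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "b"],
    "col2": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Several aggregations at once, one output column per function.
summary = df.groupby("col1")["col2"].agg(["sum", "count", "mean", "std", "min", "max"])
print(summary)

# size() counts rows per group (including missing values) and returns a Series.
sizes = df.groupby("col1").size()
print(sizes)
```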

From the Command Prompt:

# check which packages you have installed

pip list

# check your paths (run inside Python)

import sys
sys.path

# install and uninstall packages, in the terminal; note the double dash and "user" (singular)

pip install lux --user

pip uninstall lux

First of all, find out where your Python installation is located.

# Then change directory to its Scripts folder.

cd C:\ProgramData\python36\Scripts\

# Next add the packages
# NB: on PyPI the names are scikit-learn (imported as sklearn) and plotly (which provides plotly.graph_objects)

.\pip install pandas seaborn matplotlib numpy scikit-learn statsmodels pandas-profiling plotly --user
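To see where Python and its packages actually live, the standard library can report it from inside Python. This is a small sketch assuming Python 3.8 or later (for `importlib.metadata`); `installed_version` is a hypothetical helper name, not part of any library.

```python
import site
import sys
from importlib import metadata

# The interpreter currently running, e.g. the one VS Code has selected.
print(sys.executable)

# Where `pip install --user` places packages for this interpreter.
user_site = site.getusersitepackages()
print(user_site)


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("pandas"))
```

If the interpreter printed here is not the one you expected, the packages were most likely installed for a different Python.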


If your question is ‘how does the dependent variable change across quite a few co-variates’, I like to use (1) an ANOVA, and (2) data visualisations with facets, to see which are increasing and which are decreasing.
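As a rough illustration of the ANOVA step, here is a one-way ANOVA with `scipy.stats.f_oneway`. The data, the `group` and `outcome` column names, and the choice of SciPy are all assumptions for this sketch; it covers a single factor only, and several covariates at once would need a multi-factor model instead.

```python
import pandas as pd
from scipy import stats

# Hypothetical data: a numeric outcome measured in three groups.
df = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
    "outcome": [1.0, 2.0, 3.0, 2.0, 4.0, 5.0, 6.0, 5.0, 8.0, 9.0, 7.0, 8.0],
})

# One array of outcome values per group.
samples = [g["outcome"].to_numpy() for _, g in df.groupby("group")]

# One-way ANOVA: tests whether the group means are all equal.
f_stat, p_value = stats.f_oneway(*samples)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```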

Here are some of the most common types of graphs I’ve used, largely from Seaborn. I include those that allow faceting, for more complex analysis.

Within Seaborn, for means with error bars, check out sns.pointplot.

Note that for some of these one needs to do a group by first, to get the mean of each group.

e.g. Q = df.groupby(['DeviceType', 'MobileUsage', 'Generation'])['IsAccountActive'].mean().reset_index()
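A faceted point plot might look like the sketch below. It reuses the column names from the groupby example, but the data are invented, and it uses `sns.catplot` with `kind="point"` as one way to get facets. In this case Seaborn computes the group means itself, so no prior groupby is needed.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data reusing the post's column names.
df = pd.DataFrame({
    "DeviceType": ["phone", "phone", "tablet", "tablet"] * 4,
    "Generation": ["GenX", "GenZ"] * 8,
    "MobileUsage": ["low"] * 8 + ["high"] * 8,
    "IsAccountActive": [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1],
})

# kind="point" draws a point plot in each facet: the mean of y per x
# category with an error bar; col= makes one panel per Generation.
g = sns.catplot(
    data=df, x="DeviceType", y="IsAccountActive",
    hue="MobileUsage", col="Generation", kind="point",
)
```

Reading across the panels shows which groups are increasing and which are decreasing.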

sns.pointplot

sns.pointplot = point…


Handy code:

###########################################

### Select specific columns

df = df[['Name']]

###########################################

### Drop columns

df.drop('column_name', axis=1, inplace=True)

###########################################

# filter for a smaller group

df = df[df['Month'] == "2020-03-31"]

# remove outliers
df = df[(df.Price < 500000) & (df.Landsize < 5000)]

###########################################

# rename column headings

df = df.rename(columns={'DueMonth': 'Target_Month'})

##############################

## Cross tab, as row percentages

pd.crosstab(df.A, df.B, normalize='index').round(4) * 100
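To make the effect of normalize='index' concrete, here is a small worked sketch. The data are made up; only the column names A and B follow the snippet above.

```python
import pandas as pd

# Hypothetical data; the column names A and B follow the snippet above.
df = pd.DataFrame({
    "A": ["x", "x", "x", "y", "y"],
    "B": ["p", "q", "q", "p", "p"],
})

# normalize='index' divides each row by its row total, so every row sums to 1.
# Multiplying by 100 turns the proportions into row percentages.
row_pct = (pd.crosstab(df["A"], df["B"], normalize="index") * 100).round(2)
print(row_pct)
```

Each row of the result adds up to 100 (up to rounding), so it reads as "of all rows with this value of A, what percentage fall in each value of B".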

#################

# write to csv

dfxxx.to_csv(r'dfxxx.csv')

# read csv

gender = pd.read_csv(r'gender_traits_df.csv')
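One thing worth knowing about the round trip: by default `to_csv` also writes the row index, which comes back as an extra "Unnamed: 0" column when the file is read again. A minimal sketch of avoiding that with `index=False`, using a made-up DataFrame and a temporary file:

```python
import os
import tempfile

import pandas as pd

# Hypothetical data and file name for illustration.
df = pd.DataFrame({"Name": ["Ana", "Ben"], "Price": [100, 250]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.csv")

    # index=False leaves the row index out of the file, so reading it back
    # does not produce an extra "Unnamed: 0" column.
    df.to_csv(path, index=False)
    round_trip = pd.read_csv(path)

print(round_trip.columns.tolist())
```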


Mel Zeppel
