This notebook shows a general example of mass balance calculation, we currently support two algorithms:

Non-negative algorithm, it solves KKT (Karush-Kuhn-Tucker) conditions for the non-negative least squares problem
Matrix decomposition, following Li et al. (2020) and Ghiorso 1983. The compositional matrix will be broken down by singular value decomposition, the coefficients will then calculated.

[1]:

# import python modules
import pandas as pd
import matplotlib.pyplot as plt
from massbalance.mb_tools import MassBalance

1. Define elements for calculation

We first need to define element list for mass balance calculation, special attention goes to FeO term, which we use FeO, not FeOt nor FeOtot, this is consistent with the input excel files. Feel free to change the order of the elements

[2]:

cmpnts = ['SiO2','Al2O3', 'TiO2', 'MgO', 'FeO', 'MnO',  'CaO', 'Na2O', 'K2O', 'P2O5', 'Cr2O3']  # change your desired elements for mass balance calculation

if you also want to integrate standard deviation on elements, this can be done manually or by the following line

[3]:

cmpnts_std =[x + '_std' for x in cmpnts]  # save columns for std
cmpnts_std

[3]:

['SiO2_std',
 'Al2O3_std',
 'TiO2_std',
 'MgO_std',
 'FeO_std',
 'MnO_std',
 'CaO_std',
 'Na2O_std',
 'K2O_std',
 'P2O5_std',
 'Cr2O3_std']

2. Load data

The easist way is to copy paste data in the input excel file, please also check README file for details of data prepration

[4]:

nat_comp = pd.ExcelFile("input_comp.xlsx")

We load the whole excel spreadsheet to a pandas ExcelFile, to access data in your spreadsheet, you can do so by looping through ExcelFile to a dictionary:

[5]:

nat_dict = {}  # create dictionary to collect compositions of different phases
nat_phases = []  # create list to save all phases in the calculation. Note bulk will be the bulk compsition, run_index will be your sample number, or rock id . etc
for sheet_name in nat_comp.sheet_names:
    nat_phases.append(sheet_name)
    nat_dict[sheet_name] = nat_comp.parse(sheet_name)

nat_phases stores the sheet names, nat_dict stores the data of each sheet.

[6]:

nat_phases

[6]:

['gl', 'ol', 'sp', 'py', 'pureFe', 'pureNa', 'bulk', 'run_index']

Use the sheet name to access data, let’s what we have in the gl phase

[7]:

nat_dict['gl'].head()

[7]:

	Run_no	SiO2	Al2O3	P2O5	CaO	FeO	Na2O	MgO	TiO2	K2O	...	Al2O3_std	P2O5_std	CaO_std	FeO_std	Na2O_std	MgO_std	TiO2_std	K2O_std	MnO_std	Cr2O3_std
0	02As1	46.66	10.16	0.17	11.64	13.06	1.95	13.15	2.43	0.49	...	0.17	0.10	0.22	0.34	0.10	0.26	0.20	0.04	0.04	0.04
1	01As1	46.96	10.34	0.20	11.81	13.21	1.84	12.16	2.53	0.44	...	0.11	0.07	0.20	0.21	0.10	0.12	0.14	0.05	0.03	0.08
2	04As1	47.20	10.80	0.23	12.13	12.77	1.87	11.04	2.65	0.48	...	0.11	0.05	0.21	0.45	0.14	0.09	0.16	0.05	0.05	0.03
3	03As1	48.38	11.36	0.18	12.63	13.03	1.42	10.35	2.54	0.43	...	0.16	0.06	0.12	0.27	0.07	0.19	0.27	0.05	0.05	0.05
4	05As1	47.18	11.63	0.23	12.84	12.57	2.04	9.42	2.60	0.51	...	0.12	0.05	0.25	0.20	0.12	0.18	0.23	0.05	0.04	0.05

5 rows × 25 columns

we use run_index sheet to store all runs we wish to perform mass balance with, we can check by doing:

[8]:

nat_dict['run_index']

[8]:

	Run_no	T_C	fO2
0	02As1	1320	QFM
1	01As1	1302	QFM
2	04As1	1280	QFM
3	03As1	1260	QFM
4	05As1	1240	QFM
5	06As1	1200	QFM
6	15As1	1180	QFM
7	17As1	1170	QFM
8	14As1	1160	QFM

3. Mass balance calculation with MCMC progagting errors

MassBalance class does the core calculation. To start the calculation, we need to define the MassBalance class by passing all necessary infos for calculations: 1. input_comp – load the ExcelFile variable 2. comp_col – load element list 3. comp_std_col – load element std list (if avaiable), default is None 4. match_column – this is the column name that saves your expts run no., sample no., or rock id. in each spreadsheet 5. bulk_sheet – sheet name of which saves bulk composition(s) 6. index_sheet – sheet name stores entire infos about expts run numbers ± condition, sample numbers or rock id. 7. normalize – normalize your data to 100 during calculation, default is True

[9]:

mb_cal = MassBalance(
    input_comp=nat_comp,
    comp_col=cmpnts,
    comp_std_col=cmpnts_std,
    match_column="Run_no",
    bulk_sheet="bulk",
    index_sheet="run_index",
    normalize=True,
)

mb_cal is a variable we create for mass balance calculation The calculation is done by calling the compute method in the MassBalance class as mb_cal.compute() Four parameters can be tuned during computing: 1. mc – define the number of MC calculations, the default is None, which will be a one time calculation 2. exportFiles – we can export the results to excel file output.xlsx and output_mean_median_std.xlsx. If mc is given, output_mean_median_std.xlsxstores statistical infos for mc results 3. batch_bulk: define if you have different bulk for each group of phases, this is useful for layers in layered intrusions or expts with variable starting compositions for different runs. If you only have one bulk composition, can turn it to False 4. method: use nnl for non-negative method, svd for matrix decomposition method

Below we give examples to perform 100 times MC calculation with non-negative method, export files, and we have pre-defined a batch bulk composition in the input file. More examples are given in the python scipt along with this code

[10]:

res_dict = mb_cal.compute(mc=100, exportFiles=True, batch_bulk=True, method='nnl')

The result is saved in a python dictionary, with each key responses to the expts run no., sample no, or rock id you defined in the index sheet Let’s take a look the results of first experimental run 02As1

[11]:

res_02As1 = res_dict['02As1']
res_02As1

[11]:

	gl	ol	sp	py	pureFe	pureNa	r2	residues
0	0.980309	0.025299	0.0	0.0	0.0	0.000665	0.318216	0.101261
1	0.959695	0.031032	0.0	0.0	0.0	0.001112	1.174406	1.379230
2	0.979235	0.019375	0.0	0.0	0.0	0.002412	1.023295	1.047133
3	0.981838	0.011746	0.0	0.0	0.0	0.001503	0.629961	0.396851
4	0.968629	0.035022	0.0	0.0	0.0	0.000457	0.707438	0.500469
...	...	...	...	...	...	...	...	...
95	0.981687	0.022148	0.0	0.0	0.0	0.000000	1.030632	1.062203
96	0.980176	0.015765	0.0	0.0	0.0	0.000000	0.661183	0.437163
97	0.980972	0.025854	0.0	0.0	0.0	0.001035	0.950629	0.903696
98	0.974249	0.021821	0.0	0.0	0.0	0.000774	1.953085	3.814540
99	0.993793	0.003956	0.0	0.0	0.0	0.000000	0.733781	0.538435

100 rows × 8 columns

This is the MC results (100 times), to calculate statistical infos, you can simply call the built-in pandas functions as:

[12]:

res_02As1.describe()

[12]:

	gl	ol	sp	py	pureFe	pureNa	r2	residues
count	100.000000	100.000000	100.0	100.0	100.000000	100.000000	100.000000	100.000000
mean	0.981366	0.020590	0.0	0.0	0.000010	0.000551	0.876400	0.870697
std	0.009831	0.007873	0.0	0.0	0.000092	0.000666	0.321958	0.630622
min	0.957179	0.000337	0.0	0.0	0.000000	0.000000	0.275481	0.075890
25%	0.976014	0.015349	0.0	0.0	0.000000	0.000000	0.663642	0.440423
50%	0.981365	0.020979	0.0	0.0	0.000000	0.000344	0.812141	0.660398
75%	0.985775	0.025433	0.0	0.0	0.000000	0.000914	1.063028	1.130062
max	1.012758	0.042551	0.0	0.0	0.000910	0.002896	1.953085	3.814540

or only interested parameters

[13]:

res_02As1.agg(['mean', 'median', 'std'])

[13]:

	gl	ol	pureFe	pureNa	r2	residues
mean	0.981366	0.020590	0.000010	0.000551	0.876400	0.870697
median	0.981365	0.020979	0.000000	0.000344	0.812141	0.660398
std	0.009831	0.007873	0.000092	0.000666	0.321958	0.630622

To plot the result, you can simply do:

[14]:

fig, axes = plt.subplots(1,2, constrained_layout=True)
axes[0].hist(res_02As1['gl'], density=True)
axes[1].hist(res_02As1['ol'], density=True)
axes[0].set_xlabel('gl')
axes[1].set_xlabel('ol')
plt.show()

We pre-defined pure FeO and pure Na2O phases as the evaluation of Fe loss and Na loss, if you don’t want to check this, simply delete these sheets Below we give an example to calculate Fe loss (%) and Na loss (%) in the first experimental run 02As1

[15]:

nat_bulk = nat_dict['bulk'].set_index('Run_no')
nat_bulk.loc['02As1', 'FeO']

[15]:

12.42

[16]:

nat_bulk.loc['02As1', 'Na2O']

[16]:

1.96

[17]:

res_02As1['Fe_loss'] = res_02As1['pureFe'] / nat_bulk.loc['02As1', 'FeO'] * 100 * 100
res_02As1['Fe_loss'].agg(['mean', 'median', 'std'])

[17]:

mean      0.008275
median    0.000000
std       0.073758
Name: Fe_loss, dtype: float64

[18]:

res_02As1['Na2O_loss'] = res_02As1['pureNa'] / nat_bulk.loc['02As1', 'Na2O'] * 100 * 100
res_02As1['Na2O_loss'].agg(['mean', 'median', 'std'])

[18]:

mean      2.813345
median    1.756814
std       3.398196
Name: Na2O_loss, dtype: float64

Fin ~ happy coding!