XML Schema for Monte Carlo averages


The complexity of developing XML schemas for physics simulations forces us to split the project into smaller pieces and to first identify those areas where common ground between many applications can be found.

One element common to all Monte Carlo simulations is the set of averages obtained from the measurements. To facilitate common evaluation and archival programs, we should agree on a standardized schema. Here is our draft proposal:
We currently see two types of averages: scalar-valued and vector-valued ones. Future extensions could include histograms. Any number of these are collected in an <AVERAGES> element:
  <AVERAGES>
    <SCALAR_AVERAGE name="Energy">
      ...
    </SCALAR_AVERAGE>

    <SCALAR_AVERAGE name="Magnetization">
      ...
    </SCALAR_AVERAGE>

    <VECTOR_AVERAGE name="Correlations" nvalues="100">
      <SCALAR_AVERAGE indexvalue="0">
        ...
      </SCALAR_AVERAGE>
      ...
      <SCALAR_AVERAGE indexvalue="99">
        ...
      </SCALAR_AVERAGE>
    </VECTOR_AVERAGE>
  </AVERAGES>
Vector-valued averages are represented as a list of scalar ones, one for each vector element. Another option would be to keep them as a single average with all values (mean, error, ...) vector-valued. This, however, would not allow easy extraction of specific indices by XSLT or other tools.
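
To illustrate the point about extracting specific indices, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. Only the element and attribute names come from the draft above. The sample document, the MEAN values in it and the helper name scalar_at_index are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the numbers are placeholders, not real results.
SAMPLE = """
<AVERAGES>
  <VECTOR_AVERAGE name="Correlations" nvalues="3">
    <SCALAR_AVERAGE indexvalue="0"><MEAN>1.0</MEAN></SCALAR_AVERAGE>
    <SCALAR_AVERAGE indexvalue="1"><MEAN>0.5</MEAN></SCALAR_AVERAGE>
    <SCALAR_AVERAGE indexvalue="2"><MEAN>0.25</MEAN></SCALAR_AVERAGE>
  </VECTOR_AVERAGE>
</AVERAGES>
"""


def scalar_at_index(root, name, index):
    """Return the SCALAR_AVERAGE for one index of a named vector average, or None."""
    path = f"./VECTOR_AVERAGE[@name='{name}']/SCALAR_AVERAGE[@indexvalue='{index}']"
    return root.find(path)


root = ET.fromstring(SAMPLE)
element = scalar_at_index(root, "Correlations", 1)
mean_at_1 = float(element.findtext("MEAN"))
print(mean_at_1)
```

Because each index is its own element, a single path expression is enough to select it. With vector-valued MEAN and ERROR entries, the tool would first have to split the text content.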

The SCALAR_AVERAGE element contains (optionally) all the relevant information such as mean value, error, variance, number of measurements, and the number of Monte Carlo steps used for thermalization:
  <SCALAR_AVERAGE name="Energy">
    <MEAN method="simple">-0.9469</MEAN>
    <ERROR method="binning">0.00362</ERROR>
    <VARIANCE method="simple">0.000917</VARIANCE>
    <COUNT>10000</COUNT>
    <AUTOCORR method="binning">12.4</AUTOCORR>
    <THERMALIZATION>1000</THERMALIZATION>
  </SCALAR_AVERAGE>

The method attribute can be used to specify the method employed to obtain the result (e.g. a binning, jackknife or bootstrap analysis). Optionally, a program attribute can be included to record the program or library used to evaluate and obtain the information.
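
A reader of this element has to cope with every child being optional. The following Python sketch shows one way to do that, again with xml.etree.ElementTree. The function name read_scalar_average and the choice of a dictionary of (value, method) pairs are assumptions made for this illustration and are not part of the proposal.

```python
import xml.etree.ElementTree as ET

# The SCALAR_AVERAGE element from the example above.
SAMPLE = """
<SCALAR_AVERAGE name="Energy">
  <MEAN method="simple">-0.9469</MEAN>
  <ERROR method="binning">0.00362</ERROR>
  <VARIANCE method="simple">0.000917</VARIANCE>
  <COUNT>10000</COUNT>
  <AUTOCORR method="binning">12.4</AUTOCORR>
  <THERMALIZATION>1000</THERMALIZATION>
</SCALAR_AVERAGE>
"""

FIELDS = ("MEAN", "ERROR", "VARIANCE", "COUNT", "AUTOCORR", "THERMALIZATION")


def read_scalar_average(element):
    """Collect the children that are present as {tag: (value, method)}."""
    result = {}
    for tag in FIELDS:
        child = element.find(tag)
        if child is None or child.text is None:
            continue  # every child is optional in the draft, so skip missing ones
        # method is None when the attribute was not given
        result[tag] = (float(child.text), child.get("method"))
    return result


energy = read_scalar_average(ET.fromstring(SAMPLE))
print(energy["MEAN"], energy["ERROR"])
```

Keeping the method next to each value lets an evaluation program tell, for example, a binning error from a jackknife error without further conventions.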

In addition to these main results, further information is often available, e.g. a binning analysis or a time series. Currently only the results of a binning analysis have been defined; they are stored in additional BINNED elements:
  <SCALAR_AVERAGE name="Energy">
    <!-- best estimates here -->
    <MEAN method="simple">-0.9469</MEAN>
    <ERROR method="binning">0.00362</ERROR>
    ...
    <!-- additional estimates from binning here -->
    <BINNED size="128">
      <COUNT>78</COUNT>
      <MEAN>-0.9469</MEAN>
      <ERROR>0.00343</ERROR>
    </BINNED>
    <BINNED size="256">
      <COUNT>39</COUNT>
      <MEAN>-0.9469</MEAN>
      <ERROR>0.00362</ERROR>
    </BINNED>
  </SCALAR_AVERAGE>

The additional BINNED elements can give errors, mean values, etc. obtained using binning with fixed-size bins. This extra information helps to judge the quality of the best estimate, which should be given in the direct child elements of SCALAR_AVERAGE.
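
As a rough sketch of how such BINNED elements could be produced from a time series, the Python code below averages the series in bins of fixed size and writes COUNT, MEAN and ERROR for each bin size. The proposal does not say how ERROR is computed. The code assumes the standard error of the bin means and drops an incomplete trailing bin. The function name binned_element and the random input data are placeholders for illustration only.

```python
import math
import random
import xml.etree.ElementTree as ET


def binned_element(series, size):
    """Build a BINNED element from a time series using bins of fixed size."""
    count = len(series) // size  # an incomplete trailing bin is dropped
    bin_means = [sum(series[i * size:(i + 1) * size]) / size for i in range(count)]
    mean = sum(bin_means) / count
    # Assumed error definition: standard error of the bin means.
    variance = sum((b - mean) ** 2 for b in bin_means) / (count - 1)
    error = math.sqrt(variance / count)

    element = ET.Element("BINNED", size=str(size))
    ET.SubElement(element, "COUNT").text = str(count)
    ET.SubElement(element, "MEAN").text = repr(mean)
    ET.SubElement(element, "ERROR").text = repr(error)
    return element


# Placeholder data: 10000 uncorrelated random numbers, not simulation output.
rng = random.Random(0)
series = [rng.gauss(-0.95, 0.03) for _ in range(10000)]

average = ET.Element("SCALAR_AVERAGE", name="Energy")
for size in (128, 256):
    average.append(binned_element(series, size))
print(ET.tostring(average, encoding="unicode"))
```

With 10000 measurements this gives 78 bins of size 128 and 39 bins of size 256, the same COUNT values as in the example above. For correlated data the error is expected to grow with the bin size until the bins are longer than the autocorrelation time, which is why comparing several bin sizes helps to judge the best estimate.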

Topics to be discussed, and possible extensions include:
We encourage your comments and ideas.

