
Overview of uncertainty tools for Python

Abstract

In this page, we gather a list of Python packages useful for uncertainty studies.

Introduction

In practice, OpenTURNS (OT) is the main tool that we use for uncertainty studies. It provides a simple, efficient and comprehensive solution for most of the studies that we perform. Nevertheless, some parts of a computation cannot always be done entirely within OT, because OT does not provide every scientific computing tool that a practical study may call for. Even with infinite resources, there would be little point in redeveloping and integrating all the existing tools within OT.

This is why other software components are needed. Although OT is a C++ library, many users work only with its Python layer. From this point of view, it makes sense to use OT in combination with other Python tools. The remaining difficulty is knowing which Python tools are available for uncertainty studies. The goal of this wiki page is to fill this need.

Scipy

SciPy (pronounced "Sigh Pie") is open-source software for mathematics, science, and engineering. The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. The SciPy library is built to work with NumPy arrays, and provides many user-friendly and efficient numerical routines such as routines for numerical integration and optimization.

SciPy's license is free for both commercial and non-commercial use, under the BSD terms.

These are, at least, the modules within Scipy which are relevant to uncertainty studies:

  • special: special function types (bessel, gamma, airy, etc.)
  • maxentropy: support for fitting maximum entropy models, either discrete or continuous (this module was removed in SciPy 0.11)
  • stats: statistical functions (stdev, var, mean, etc.)
  • optimize: constrained and unconstrained optimization methods and root-finding algorithms
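
As a rough illustration of the special and optimize modules listed above, here is a minimal sketch. The functions being evaluated (a gamma function value, the root of x**2 - 2, a two-variable quadratic) are arbitrary choices made for this example, not taken from any particular study.

```python
import math

from scipy import optimize, special

# special: the gamma function extends the factorial, gamma(n) = (n - 1)!
g = special.gamma(5.0)

# optimize: root of f(x) = x**2 - 2 on the bracket [0, 2] (Brent's method)
root = optimize.brentq(lambda x: x**2 - 2.0, 0.0, 2.0)

# optimize: unconstrained minimization of a simple quadratic bowl
res = optimize.minimize(
    lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2,
    x0=[0.0, 0.0],
)

print(g, root, math.sqrt(2.0), res.x)
```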

More details on Scipy are available at:

http://www.scipy.org/

Scipy.stats

The scipy.stats package is the statistical package in Scipy.

There are two general distribution classes that have been implemented for encapsulating continuous random variables and discrete random variables. Over 80 continuous random variables (RVs) and 10 discrete random variables have been implemented using these classes.

The main public methods for continuous RVs are:

  • rvs: Random Variates
  • pdf: Probability Density Function
  • cdf: Cumulative Distribution Function
  • sf: Survival Function (1-CDF)
  • ppf: Percent Point Function (Inverse of CDF)
  • isf: Inverse Survival Function (Inverse of SF)
  • stats: Return mean, variance, (Fisher’s) skew, or (Fisher’s) kurtosis
  • moment: non-central moments of the distribution
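
The methods above can be tried on a frozen distribution. The following sketch assumes a normal distribution with mean 1 and standard deviation 2, chosen only for illustration.

```python
from scipy import stats

# Freeze a normal distribution: loc is the mean, scale the standard deviation
dist = stats.norm(loc=1.0, scale=2.0)

sample = dist.rvs(size=5, random_state=0)  # random variates
density = dist.pdf(1.0)                    # density at the mean
p = dist.cdf(1.0)                          # P(X <= 1)
tail = dist.sf(1.0)                        # P(X > 1) = 1 - cdf
q95 = dist.ppf(0.95)                       # 95% quantile
q95_bis = dist.isf(0.05)                   # same quantile, via the survival function

# Mean, variance, skewness and excess kurtosis in one call
mean, var, skew, kurt = dist.stats(moments="mvsk")

# Non-central moment of order 2, i.e. E[X**2] = var + mean**2
m2 = dist.moment(2)

print(sample, density, p, tail, q95, q95_bis, mean, var, skew, kurt, m2)
```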

The main additional methods of a non-frozen distribution (one whose parameters have not been fixed) are related to the estimation of distribution parameters:

  • fit: maximum likelihood estimation of distribution parameters, including location and scale
  • fit_loc_scale: estimation of location and scale when shape parameters are given
  • nnlf: negative log likelihood function
  • expect: Calculate the expectation of a function against the pdf or pmf
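
A minimal sketch of these estimation methods, assuming synthetic gamma-distributed data (shape 2, scale 3) generated only for this example. Fixing the location with floc=0.0 is a choice made here to keep the fit well posed; it is not required by the API.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=3.0, size=2000)

# fit: maximum likelihood estimate of (shape, loc, scale), with loc fixed at 0
a_hat, loc_hat, scale_hat = stats.gamma.fit(data, floc=0.0)

# fit_loc_scale: estimate loc and scale from the data, given the shape parameter
loc_ls, scale_ls = stats.gamma.fit_loc_scale(data, a_hat)

# nnlf: negative log-likelihood of the data at theta = (shape, loc, scale)
nll_hat = stats.gamma.nnlf((a_hat, loc_hat, scale_hat), data)
nll_other = stats.gamma.nnlf((a_hat + 1.0, loc_hat, scale_hat), data)

# expect: E[X**2] for a normal distribution with loc=1 and scale=2
second_moment = stats.norm.expect(lambda x: x**2, loc=1.0, scale=2.0)

print(a_hat, scale_hat, loc_ls, scale_ls, nll_hat, nll_other, second_moment)
```

The negative log-likelihood at the fitted parameters should be lower than at the perturbed ones, which gives a quick sanity check on the fit.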

The stats package also has methods to analyze one sample, including the T-test (ttest_1samp) and KS-test (kstest). There are additional methods (skewtest and kurtosistest) to test whether a sample could have been drawn from a normal distribution.

To compare two samples, Scipy provides the Kolmogorov-Smirnov test for two samples (ks_2samp).
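
The one-sample and two-sample tests mentioned above can be sketched as follows. The samples are synthetic normal draws, and the sample sizes and the shift of 2 are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=500)

# One-sample tests: each returns a (statistic, p-value) pair
t_stat, t_pvalue = stats.ttest_1samp(x, popmean=0.0)  # is the mean equal to 0?
ks_stat, ks_pvalue = stats.kstest(x, "norm")          # is x drawn from N(0, 1)?
skew_stat, skew_pvalue = stats.skewtest(x)            # is the skewness that of a normal?
kurt_stat, kurt_pvalue = stats.kurtosistest(x)        # is the kurtosis that of a normal?

# Two-sample Kolmogorov-Smirnov test
y_same = rng.normal(loc=0.0, scale=1.0, size=300)
y_shifted = rng.normal(loc=2.0, scale=1.0, size=300)
same = stats.ks_2samp(x, y_same)
shifted = stats.ks_2samp(x, y_shifted)

print(t_pvalue, ks_pvalue, skew_pvalue, kurt_pvalue)
print(same.pvalue, shifted.pvalue)
```

A small p-value is evidence against the null hypothesis; here the shifted sample is expected to be rejected by ks_2samp.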

The gaussian_kde class computes a kernel density estimate, with Gaussian kernels, from univariate or multivariate data.
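
A minimal sketch of gaussian_kde on synthetic standard normal samples (the sample sizes and the evaluation grid are assumptions made for this example). For multivariate data, the array must have shape (number of dimensions, number of points).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Univariate case: build the estimator, then evaluate it on a grid
sample = rng.normal(size=1000)
kde = stats.gaussian_kde(sample)
grid = np.linspace(-4.0, 4.0, 201)
density = kde(grid)

# Probability mass that the estimated density puts on [-4, 4]
mass = kde.integrate_box_1d(-4.0, 4.0)

# Bivariate case: data has shape (2, n_points)
sample_2d = rng.normal(size=(2, 500))
kde_2d = stats.gaussian_kde(sample_2d)
value_at_origin = kde_2d([[0.0], [0.0]])

print(density.max(), mass, value_at_origin)
```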

Scikit-learn

scikit-learn is a Python module integrating classic machine learning algorithms. It aims to provide simple and efficient solutions to learning problems, accessible to everybody and reusable in various contexts: machine-learning as a versatile tool for science and engineering.

Scikit-learn is provided under the BSD license (3 clause), implying that it is free for both commercial and non-commercial use.

The authors of Scikit-learn are David Cournapeau, Matthieu Brucher, Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort and Vincent Michel.

Binaries of Scikit-learn are available with pip or easy_install. Windows binaries are also available for Python 2.6 and 2.7.

An example is presented at:

http://scikit-learn.org/stable/auto_examples/plot_classification_probability.html
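
In the spirit of that example, here is a minimal sketch of how class probabilities can be obtained from a scikit-learn classifier. The choice of LogisticRegression, the iris dataset and max_iter=1000 are assumptions made for this illustration and do not reproduce the linked script.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Small built-in dataset: 150 flowers, 4 features, 3 classes
X, y = load_iris(return_X_y=True)

# Fit a classifier; max_iter is raised so that the solver converges
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# One row per instance, one column per class; each row sums to 1
proba = clf.predict_proba(X[:3])
predicted = clf.predict(X[:3])

print(proba)
print(predicted)
```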

This package is developed with Git:

https://github.com/scikit-learn/scikit-learn

More details on Scikit-learn are available at:

http://scikit-learn.org/stable/

Orange

Orange is a comprehensive, component-based software suite for machine learning and data mining. It can be used through visual programming or Python scripting. Its graphical user interface builds upon the cross-platform Qt framework.

Orange provides features for machine learning. Induction of models in Orange is implemented through a two-class schema. A learning algorithm is represented by an instance of a class derived from Orange.classification.Learner. The learner stores all parameters of the learning algorithm. Induced models are represented by instances of classes derived from Orange.classification.Classifier. The provided classification algorithms are the following:

  • Naive Bayes classifier
  • k-nearest neighbors
  • Rule induction
  • Support Vector Machines
  • Classification trees
  • Logistic regression
  • Majority
  • Lookup classifiers
  • Classifier from variable
  • Constant Classifier
  • Neural Network Learner
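
The learner/classifier schema described above can be sketched in plain Python. This is not Orange code: the class names below (Learner, Classifier, MajorityLearner, ConstantClassifier) are hypothetical stand-ins that only mimic the pattern, in which a learner object holds the algorithm's parameters and, when called on data, returns a separate classifier object.

```python
from collections import Counter


class Learner:
    """Hypothetical base class: stores the parameters of a learning algorithm."""

    def __call__(self, instances, labels):
        raise NotImplementedError


class Classifier:
    """Hypothetical base class: an induced model that predicts a label."""

    def __call__(self, instance):
        raise NotImplementedError


class ConstantClassifier(Classifier):
    """Always predicts the same label, whatever the instance."""

    def __init__(self, value):
        self.value = value

    def __call__(self, instance):
        return self.value


class MajorityLearner(Learner):
    """Induces a constant classifier predicting the most frequent label."""

    def __call__(self, instances, labels):
        most_common_label, _ = Counter(labels).most_common(1)[0]
        return ConstantClassifier(most_common_label)


# Training: the learner is called on data and returns a classifier
learner = MajorityLearner()
model = learner([[0.1], [0.4], [0.9]], ["a", "b", "b"])

# Prediction: the classifier is called on a new instance
prediction = model([0.5])
print(prediction)
```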

For regression, Orange provides several classes:

  • Linear regression (linear)
  • Lasso regression (lasso)
  • Partial least squares regression (PLS)
  • Multivariate Adaptive Regression Splines (earth)
  • Regression trees (tree)
  • Mean (mean)
  • Base class for regression

With respect to graphics, Orange can create parallel coordinates plots (also known as cobweb plots in OpenTURNS).

Orange is developed at the Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana, Slovenia, together with the open source community.

Orange is provided at:

http://orange.biolab.si/

Orange is open source software released under the GNU General Public License.

It is developed at:

http://orange.biolab.si/trac

It uses Mercurial as the version control system.

Orange is provided for Linux, Windows and Mac. For Windows, Orange is provided in binary form either as a full package (including Python 2.5, 2.6 or 2.7), or as a pure Python package (for Python 2.5, 2.6, or 2.7). For Mac, Orange is provided either as a bundle or with easy_install/pip.

PYMC

PyMC is a Python package that implements Markov Chain Monte Carlo (MCMC) routines. It is especially useful in Bayesian inference, for sampling from posterior distributions. PyMC implements the Metropolis-Hastings algorithm as a Python class. PyMC includes methods for summarizing output, plotting, goodness-of-fit and convergence diagnostics. PyMC only depends on NumPy.
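
To make the Metropolis-Hastings idea concrete, here is a minimal random-walk sketch written with NumPy only. It does not use the PyMC API: the function name metropolis_hastings, the toy posterior (normal likelihood with known unit variance and a flat prior on the mean), the step size and the burn-in length are all assumptions made for this illustration.

```python
import numpy as np


def metropolis_hastings(log_post, x0, n_samples, step, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    chain = np.empty(n_samples)
    x = x0
    lp = log_post(x)
    for i in range(n_samples):
        proposal = x + step * rng.normal()
        lp_prop = log_post(proposal)
        # Accept with probability min(1, post(proposal) / post(x))
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop
        chain[i] = x  # on rejection, the current state is repeated
    return chain


# Toy problem: infer the mean mu of normal data with known sigma = 1
rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=50)


def log_post(mu):
    # Log-posterior up to an additive constant (flat prior on mu)
    return -0.5 * np.sum((data - mu) ** 2)


chain = metropolis_hastings(log_post, x0=0.0, n_samples=5000, step=0.5, rng=rng)
posterior_mean = chain[1000:].mean()  # discard the first 1000 draws as burn-in

print(posterior_mean, data.mean())
```

For this toy model the exact posterior mean is the sample mean of the data, so the two printed values should be close.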

PyMC is provided under the 3-clause BSD License.

Link to download:

http://pypi.python.org/pypi/pymc/

The authors of PyMC are Christopher Fonnesbeck, Anand Patil and David Huard.

Link to home page:

http://github.com/pymc-devs/pymc

Link to bug tracker:

http://github.com/pymc-devs/pymc/issues

Sympy

SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python and does not require any external libraries.

http://sympy.org/en/index.html
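
As a small illustration of how symbolic computation can serve uncertainty studies, the following sketch derives the mean and variance of an exponential distribution by symbolic integration. The choice of distribution and the symbol names are assumptions made for this example.

```python
import sympy as sp

x = sp.symbols("x", real=True)
lam = sp.symbols("lambda", positive=True)

# Density of the exponential distribution on [0, +oo)
pdf = lam * sp.exp(-lam * x)

# Total probability mass, mean and variance, computed in closed form
total_mass = sp.integrate(pdf, (x, 0, sp.oo))
mean = sp.integrate(x * pdf, (x, 0, sp.oo))
second_moment = sp.integrate(x**2 * pdf, (x, 0, sp.oo))
variance = sp.simplify(second_moment - mean**2)

print(total_mass, mean, variance)
```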

Mayavi

MayaVi1 is a free, easy to use scientific data visualizer. It is written in Python and uses the amazing Visualization Toolkit (VTK) for the graphics. It provides a GUI written using Tkinter. MayaVi is free and distributed under the conditions of the BSD license. It is also cross platform and should run on any platform where both Python and VTK are available (which is almost any *nix, Mac OSX or Windows).

http://mayavi.sourceforge.net/

Rpy2

TODO

Other

Acknowledgments

  • Merlin Keller (EDF R&D)
  • Guillaume Damblin (EDF R&D)
Last modified on 05/07/14 21:51:15