Random Variables

Classes and methods for general and discrete random variables.

class osc_physrisk_financial.random_variables.DiscreteRandomVariable(probabilities: Sequence[float | int], values: Sequence[float | int] | None = None, intervals: Sequence[float | int] | None = None, convert_to_osc_format: bool | None = False)

Bases: RandomVariable

A class to represent a discrete random variable derived from observed data.

Parameters:
  • probabilities (array-like) – The probabilities associated with each interval or value in the histogram.

  • values (array-like, optional) – The specific values taken by the discrete random variable. Required if intervals is not provided.

  • intervals (array-like, optional) – The intervals (bins) of the histogram representing the discrete random variable. Required if values is not provided.

  • convert_to_osc_format (bool, optional) – If True, ensure that the probabilities sum to 1 by assigning any missing probability mass to the zero-impact bin. This is needed for ImpactDistrib from OS-C. Default is False.

Examples

Values Example:

>>> from osc_physrisk_financial.random_variables import DiscreteRandomVariable
>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]  # This should sum up to 1
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)

Intervals Example:

>>> intervals = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]  # This should sum up to 1
>>> drv = DiscreteRandomVariable(intervals=intervals, probabilities=probabilities)

Notes

  • We use intervals following the OS-C convention. Internally, each interval is represented by its midpoint.

  • We define this class because classes such as rv_discrete from SciPy do not support some important operations, such as multiplying the random variable by a scalar or adding a scalar to it. These operations are standard, so it would be preferable to take them from an existing library, possibly one other than SciPy.

  • When the probabilities do not sum to one, as in the case of the ImpactDistrib class from OS-C, we assign the missing probability to the value zero so that the total equals one. This creates a “mass point” at zero: every interval is represented by its midpoint, except for zero, which receives the remaining probability. TODO: Check the output (the methodology implemented in code) of the OS-C impact distribution to make sure the constructor of this class is properly defined, that is, verify that this is methodologically what we want given the OS-C code.
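
The conversion described in these notes can be sketched as follows. This is a minimal illustration, not the library's constructor: `intervals_to_values` is a hypothetical helper, and it assumes that `intervals` lists consecutive bin edges (one more entry than `probabilities`) and that the missing mass either goes to an existing zero value or becomes a new point mass at zero.

```python
import numpy as np


def intervals_to_values(intervals, probabilities, convert_to_osc_format=False):
    """Hypothetical helper: turn histogram bins into (values, probabilities).

    Assumes len(intervals) == len(probabilities) + 1, i.e. consecutive
    entries of `intervals` are the bin edges.
    """
    edges = np.asarray(intervals, dtype=float)
    probs = np.array(probabilities, dtype=float)  # copy, so the caller's data is untouched
    if edges.size != probs.size + 1:
        raise ValueError("intervals must have one more entry than probabilities")
    # Represent each bin [edges[i], edges[i + 1]] by its midpoint.
    values = (edges[:-1] + edges[1:]) / 2
    if convert_to_osc_format:
        missing = 1.0 - probs.sum()
        if missing > 1e-12:  # ignore floating-point round-off
            at_zero = np.isclose(values, 0.0)
            if at_zero.any():
                # Assumption: an existing zero value absorbs the missing mass.
                probs[np.argmax(at_zero)] += missing
            else:
                # Assumption: otherwise a new point mass at zero is created.
                values = np.concatenate(([0.0], values))
                probs = np.concatenate(([missing], probs))
    return values, probs
```

With the bins of the Intervals Example above, this yields the same midpoints as the Values Example.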

check_values(min_value: float = 0, max_value: float = 1) → bool

Check if all values of the DiscreteRandomVariable instance fall within a specified range.

This method verifies that each value defined in the DiscreteRandomVariable instance is between a specified minimum value and maximum value, inclusive. By default, it checks whether the values are between 0 and 1.

Parameters:
  • min_value (float, optional) – The minimum allowable value for the values. This value is inclusive, meaning that values can be equal to this minimum value. The default is 0.

  • max_value (float, optional) – The maximum allowable value for the values. This value is inclusive, meaning that values can be equal to this maximum value. The default is 1.

Returns:

Returns True if all values are within the specified range (min_value to max_value, inclusive). Otherwise, returns False.

Return type:

bool

Examples

>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)
>>> drv.check_values()
True
>>> drv.check_values(0, 0.5)
False

Notes

The method uses NumPy’s vectorized operations to compare all values against both bounds at once, which is efficient for instances with a large number of values.
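
A minimal sketch of the inclusive range check, written as a free function over a plain array of values. The function signature and the assumption that the instance stores its values as an array are illustrative, not the library's code.

```python
import numpy as np


def check_values(values, min_value=0.0, max_value=1.0):
    """Return True if every value lies in [min_value, max_value], bounds included."""
    v = np.asarray(values, dtype=float)
    # Two element-wise comparisons, combined with &, then reduced with np.all.
    return bool(np.all((v >= min_value) & (v <= max_value)))
```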

compute_cdf()

Compute the Cumulative Distribution Function (CDF) for the discrete random variable.

The CDF is defined as the probability that the variable takes a value less than or equal to x. Formally, for a discrete random variable X with values x_i and corresponding probabilities p_i, the CDF at a point x is given by:

\[F_X(x) = P(X \leq x) = \sum_{x_i \leq x} p_i\]

Returns:

cdf – An array representing the cumulative probabilities corresponding to the values of the random variable.

Return type:

np.ndarray

Examples

>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)
>>> drv.compute_cdf()
array([0.1, 0.4, 0.7, 0.9, 1. ])
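
The CDF formula above reduces to a cumulative sum once the values are in ascending order. A minimal sketch with a hypothetical free function `compute_cdf(values, probabilities)` (the method itself takes no arguments). Sorting matters because, for a transformed variable such as `1 / drv`, the mapped values may come out in descending order.

```python
import numpy as np


def compute_cdf(values, probabilities):
    """Return (sorted values, F_X at those values) for a discrete random variable.

    Assumes the values are distinct; repeated values would need their
    probabilities merged first.
    """
    x = np.asarray(values, dtype=float)
    p = np.asarray(probabilities, dtype=float)
    order = np.argsort(x, kind="stable")  # F_X accumulates mass in ascending order of x
    return x[order], np.cumsum(p[order])
```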

compute_es(percentile=95)

Compute the Expected Shortfall \(\mathrm{ES}^{p}_{X}\) for a discrete random variable \(X\).

The Expected Shortfall at level \(p\) of a discrete random variable \(X\) is defined as:

\[\mathrm{ES}^{p}_{X} = \frac{1}{1-p} \int_{p}^{1} V^{q}_{X} \, dq,\]

where \(V^{q}_{X}\) is the Value at Risk at level \(q\).

Parameters:

percentile (float, optional) – The confidence level (\(p\)) for ES, expressed as a percentile (0-100). Default is 95.

Returns:

es_value – The computed ES at the given percentile (confidence level).

Return type:

float

Raises:

ValueError – If percentile is not within the range (0, 100).

Examples

>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)
>>> drv.compute_es()
0.899999999999998
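
Because the quantile function \(q \mapsto V^{q}_{X}\) of a discrete random variable is a step function, the integral in the ES definition can be evaluated exactly as a weighted sum: each sorted support point \(x_i\) is the quantile on the interval \((F_X(x_{i-1}), F_X(x_i)]\), with \(F_X(x_0) = 0\), and only the part of that interval lying in \([p, 1]\) contributes. A minimal sketch under that reading, with a hypothetical free function `expected_shortfall`. The library may evaluate the integral differently, so results can differ from the documented output in the last floating-point digits.

```python
import numpy as np


def expected_shortfall(values, probabilities, percentile=95):
    """ES at level p = percentile / 100 for a discrete random variable.

    The quantile function q -> V^q_X is a step function, so the integral
    over [p, 1] reduces to a weighted sum over the support points.
    """
    if not 0 < percentile < 100:
        raise ValueError("percentile must be in (0, 100)")
    p = percentile / 100
    x = np.asarray(values, dtype=float)
    w = np.asarray(probabilities, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order]
    upper = np.cumsum(w)  # V^q_X = x[i] for q in (upper[i] - w[i], upper[i]]
    lower = upper - w
    # Length of the part of each step that lies inside [p, 1].
    overlap = np.clip(upper - np.maximum(lower, p), 0.0, None)
    return float(np.sum(x * overlap) / (1 - p))
```

At the 50th percentile, for example, the steps at 0.5, 0.7 and 0.9 contribute lengths 0.2, 0.2 and 0.1, giving (0.1 + 0.14 + 0.09) / 0.5 = 0.66.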

static compute_es_vectorized(drvs, percentile=95)

Compute the Expected Shortfall (ES) for an array of DiscreteRandomVariable instances using a vectorized approach.

Parameters:
  • drvs (np.ndarray) – An array of DiscreteRandomVariable instances.

  • percentile (float, optional) – The confidence level (\(p\)) for ES expressed as a percentile (0-100). Default is 95.

Returns:

An array of floats representing the ESs of the discrete random variables.

Return type:

np.ndarray

Notes

This method utilizes np.vectorize to apply the ES calculation to each instance in the array. It is primarily for convenience and does not offer performance benefits over a traditional loop.

Examples

>>> import numpy as np
>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)
>>> drvs = np.array([drv, 1 / drv])
>>> DiscreteRandomVariable.compute_es_vectorized(drvs)
array([ 0.9, 10. ])

compute_exceedance_probability()

Compute the exceedance probabilities at the values of the discrete random variable.

The exceedance probability is the probability that the discrete random variable exceeds a given value x. Formally:

\[F_X^c(x) = P(X > x) = 1 - F_X(x)\]

Returns:

exceed_prob – An array representing the exceedance probabilities corresponding to the values of the random variable.

Return type:

np.ndarray

Examples

>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)
>>> drv.compute_exceedance_probability()
array([9.00000000e-01, 6.00000000e-01, 3.00000000e-01, 1.00000000e-01,
       1.11022302e-16])
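
A minimal sketch of the same computation as a hypothetical free function. It also suggests where the tiny last entry (1.11022302e-16) in the documented output likely comes from: in exact arithmetic \(1 - F_X(x)\) is zero at the largest value, but the cumulative sum of the probabilities can land one rounding step below 1.

```python
import numpy as np


def exceedance_probability(values, probabilities):
    """P(X > x_i) = 1 - F_X(x_i) at each support point, in ascending order of x_i."""
    order = np.argsort(values)
    cdf = np.cumsum(np.asarray(probabilities, dtype=float)[order])
    # At the largest value the exact answer is 0; 1 - cdf[-1] may leave a
    # round-off residue of order 1e-16 instead.
    return 1.0 - cdf
```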

static compute_exceedance_probability_vectorized(drvs, x)

Compute the exceedance probabilities for an array of DiscreteRandomVariable instances using a vectorized approach.

Parameters:
  • drvs (np.ndarray) – An array of DiscreteRandomVariable instances.

  • x (float) – Value at which to evaluate the exceedance probability function.

Returns:

An array of floats representing the exceedance probabilities of the discrete random variables evaluated at x.

Return type:

np.ndarray

Notes

This method utilizes np.vectorize to apply the exceedance probability calculation to each instance in the array. It is primarily for convenience and does not offer performance benefits over a traditional loop.

Examples

>>> import numpy as np
>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)
>>> drvs = np.array([drv, 1 / drv])
>>> DiscreteRandomVariable.compute_exceedance_probability_vectorized(drvs, 2)
array([1.11022302e-16, 4.00000000e-01])
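
The vectorized helpers are described as thin `np.vectorize` wrappers around the per-instance computation. A minimal sketch of that pattern for the exceedance probability at a scalar `x`, using a hypothetical stand-in class `TinyDRV` and method `exceedance_at` (neither is part of the library). Summing the mass strictly above `x` gives exactly 0.0 for the first variable here, rather than a round-off residue such as the 1.11022302e-16 in the documented output.

```python
import numpy as np


class TinyDRV:
    """Hypothetical stand-in for DiscreteRandomVariable (not the library's class)."""

    def __init__(self, values, probabilities):
        self.values = np.asarray(values, dtype=float)
        self.probabilities = np.asarray(probabilities, dtype=float)

    def exceedance_at(self, x):
        # P(X > x): total probability of the support points strictly above x.
        return float(self.probabilities[self.values > x].sum())


def exceedance_vectorized(drvs, x):
    # np.vectorize still calls the lambda once per element in Python; x is
    # captured by the closure and shared by every element.
    f = np.vectorize(lambda drv: drv.exceedance_at(x), otypes=[float])
    return f(np.asarray(drvs, dtype=object))
```

The same wrapper pattern applies to the mean, variance, VaR, ES and occurrence-probability helpers by changing the method called inside the lambda.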

compute_occurrence_probability(lambda_value)

Compute the occurrence probability \(O(x)\) for the discrete random variable using a Poisson process model.

We assume i.i.d. random variables.

In this case we have:

\[F_X(x) = \frac{1}{\lambda} \log\bigl(1 - O(x)\bigr) + 1,\]

where \(F_X(x)\) is the CDF of the random variable.

Parameters:

lambda_value (float) – The rate parameter of the Poisson process (number of occurrences per time unit).

Returns:

occurrence_prob – An array representing the occurrence probabilities O(x) at the values of the random variable.

Return type:

np.ndarray

Examples

>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)
>>> lambda_value = 0.5  # Example rate parameter for the Poisson process
>>> drv.compute_occurrence_probability(lambda_value)
array([0.36237185, 0.25918178, 0.13929202, 0.04877058, 0.        ])
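
Solving the relation above for \(O(x)\) gives \(O(x) = 1 - e^{-\lambda (1 - F_X(x))}\). This has a direct reading: if events arrive as a Poisson process with rate \(\lambda\) and their intensities are i.i.d. copies of \(X\), the events exceeding \(x\) form a Poisson process with rate \(\lambda (1 - F_X(x))\), and \(O(x)\) is the probability of at least one of them per unit of time. A minimal sketch as a hypothetical free function that takes the CDF values directly:

```python
import numpy as np


def occurrence_probability(cdf, lambda_value):
    """O(x) = 1 - exp(-lambda * (1 - F_X(x))), evaluated element-wise on F_X values."""
    cdf = np.asarray(cdf, dtype=float)
    # -expm1(-y) equals 1 - exp(-y) but stays accurate when y is tiny.
    return -np.expm1(-lambda_value * (1.0 - cdf))
```

Applied to the CDF of the example above with λ = 0.5, it reproduces the documented array up to round-off.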

static compute_occurrence_probability_vectorized(drvs, lambda_value, x)

Compute the occurrence probabilities at x for an array of DiscreteRandomVariable instances using a vectorized approach.

Parameters:
  • drvs (np.ndarray) – An array of DiscreteRandomVariable instances.

  • lambda_value (float) – The rate parameter of the Poisson process (number of occurrences per time unit).

  • x (float) – Value at which to evaluate the occurrence probability function.

Returns:

An array of floats representing the occurrence probabilities of the discrete random variables evaluated at x.

Return type:

np.ndarray

Notes

This method utilizes np.vectorize to apply the occurrence probability calculation to each instance in the array. It is primarily for convenience and does not offer performance benefits over a traditional loop.

Examples

>>> import numpy as np
>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)
>>> drvs = np.array([drv, 1 / drv])
>>> lambda_value = 0.5  # Example rate parameter for the Poisson process
>>> DiscreteRandomVariable.compute_occurrence_probability_vectorized(drvs, lambda_value, 0.3)
array([0.25918178, 0.39346934])

compute_var(percentile=95)

Compute the Value at Risk \(V^{p}_{X}\) for a discrete random variable \(X\).

The Value at Risk \(V^{p}_{X}\) of a discrete random variable \(X\) at level \(p \in (0, 1)\) is the \(p\)-quantile of \(X\), that is, the smallest \(x\) at which the cumulative distribution function \(F_{X}(x)\) is greater than or equal to \(p\). Formally, \(V^{p}_{X}\) is given by:

\[V^{p}_{X} := \inf\{x \in \mathbb{R} : P(X \leq x) \geq p\}.\]

Parameters:

percentile (float, optional) – The confidence level (\(p\)) for VaR expressed as a percentile (0-100). Default is 95.

Returns:

var_value – The computed VaR at the given percentile (confidence level).

Return type:

float

Examples

>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)
>>> drv.compute_var()
0.9
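
For a discrete random variable the infimum in the definition is attained at a support point, so the VaR is the first sorted value whose cumulative probability reaches \(p\). A minimal sketch using `np.searchsorted` on the CDF; the free function `value_at_risk` is hypothetical.

```python
import numpy as np


def value_at_risk(values, probabilities, percentile=95):
    """V^p_X: the smallest support point x with F_X(x) >= p, where p = percentile / 100."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(probabilities, dtype=float)
    order = np.argsort(x)
    x, cdf = x[order], np.cumsum(w[order])
    # side="left" returns the first index i with cdf[i] >= p.
    i = int(np.searchsorted(cdf, percentile / 100, side="left"))
    # If round-off leaves cdf[-1] just below p (p close to 1), fall back to the largest value.
    return float(x[min(i, x.size - 1)])
```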

static compute_var_vectorized(drvs, percentile=95)

Compute VaRs for an array of DiscreteRandomVariable instances using a vectorized approach.

Parameters:
  • drvs (np.ndarray) – An array of DiscreteRandomVariable instances.

  • percentile (float, optional) – The confidence level (\(p\)) for VaR expressed as a percentile (0-100). Default is 95.

Returns:

An array of floats representing the VaRs of the discrete random variables.

Return type:

np.ndarray

Notes

This method utilizes np.vectorize to apply the VaR calculation to each instance in the array. It is primarily for convenience and does not offer performance benefits over a traditional loop.

Examples

>>> import numpy as np
>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)
>>> drvs = np.array([drv, 1 / drv])
>>> DiscreteRandomVariable.compute_var_vectorized(drvs)
array([ 0.9, 10. ])

mean()

Calculate the mean of the discrete random variable.

Returns:

The mean of the discrete random variable.

Return type:

float

Examples

>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)
>>> drv.mean()
0.48000000000000004
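
The mean is the probability-weighted sum \(E[X] = \sum_i p_i x_i\). The same formula applied to transformed values gives \(E[g(X)] = \sum_i p_i \, g(x_i)\), which is consistent with the value 2.9968254 reported for `1 / drv` in the vectorized example. A minimal sketch with a hypothetical free function `mean`:

```python
import numpy as np


def mean(values, probabilities):
    """E[X] = sum_i p_i * x_i for a discrete random variable."""
    p = np.asarray(probabilities, dtype=float)
    x = np.asarray(values, dtype=float)
    return float(np.dot(p, x))


values = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
probabilities = np.array([0.1, 0.3, 0.3, 0.2, 0.1])
mu = mean(values, probabilities)                 # about 0.48
mu_reciprocal = mean(1 / values, probabilities)  # E[1/X], about 2.9968254
```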

static means_vectorized(drvs)

Compute means for an array of DiscreteRandomVariable instances using a vectorized approach.

Parameters:

drvs (np.ndarray) – An array of DiscreteRandomVariable instances.

Returns:

An array of floats representing the means of the discrete random variables.

Return type:

np.ndarray

Notes

This method utilizes np.vectorize to apply the mean calculation to each instance in the array. It is primarily for convenience and does not offer performance benefits over a traditional loop.

Examples

>>> import numpy as np
>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)
>>> drvs = np.array([drv, 1 / drv])
>>> DiscreteRandomVariable.means_vectorized(drvs)
array([0.48     , 2.9968254])

plot_pmf()

Plot an interactive histogram representing the probability mass function (PMF) of the discrete random variable.

This method uses Plotly to create an interactive histogram that provides a visual representation of how probabilities are distributed across different intervals.

sample(n: int | None = 1)

Generate n random samples from the discrete random variable.

Parameters:

n (int, optional) – The number of samples to generate. The default is 1.

Returns:

An array of sampled values.

Return type:

np.ndarray

Examples

>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)
>>> sample = drv.sample(5)
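
Sampling from a finite distribution maps directly onto NumPy's `Generator.choice`. A minimal sketch, assuming the probabilities already sum to 1; the free function and its `seed` parameter are hypothetical additions for illustration and reproducibility, not part of the documented signature.

```python
import numpy as np


def sample(values, probabilities, n=1, seed=None):
    """Draw n i.i.d. samples from a discrete distribution.

    `seed` is a hypothetical extra parameter that makes the draws reproducible.
    """
    rng = np.random.default_rng(seed)
    # Generator.choice requires the probabilities to sum to 1 (up to a small tolerance).
    return rng.choice(np.asarray(values, dtype=float), size=n, p=probabilities)
```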

var()

Calculate the variance of the discrete random variable.

Returns:

The variance of the discrete random variable.

Return type:

float

Examples

>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)
>>> drv.var()
0.05160000000000001
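
The variance is \(\mathrm{Var}(X) = \sum_i p_i (x_i - E[X])^2\). A minimal sketch with a hypothetical free function `variance`. It uses this centred form rather than \(E[X^2] - E[X]^2\), a common choice because subtracting two nearly equal numbers can lose precision.

```python
import numpy as np


def variance(values, probabilities):
    """Var[X] = sum_i p_i * (x_i - mu)^2, with mu = E[X]."""
    x = np.asarray(values, dtype=float)
    p = np.asarray(probabilities, dtype=float)
    mu = np.dot(p, x)
    return float(np.dot(p, (x - mu) ** 2))
```

For the example above it gives 0.0516, and for the reciprocal values about 6.08399093, matching the documented results up to round-off.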

static vars_vectorized(drvs)

Compute variances for an array of DiscreteRandomVariable instances using a vectorized approach.

Parameters:

drvs (np.ndarray) – An array of DiscreteRandomVariable instances.

Returns:

An array of floats representing the variances of the discrete random variables.

Return type:

np.ndarray

Notes

This method utilizes np.vectorize to apply the variance calculation to each instance in the array. It is primarily for convenience and does not offer performance benefits over a traditional loop.

Examples

>>> import numpy as np
>>> values = [0.1, 0.3, 0.5, 0.7, 0.9]
>>> probabilities = [0.1, 0.3, 0.3, 0.2, 0.1]
>>> drv = DiscreteRandomVariable(values=values, probabilities=probabilities)
>>> drvs = np.array([drv, 1 / drv])
>>> DiscreteRandomVariable.vars_vectorized(drvs)
array([0.0516    , 6.08399093])

class osc_physrisk_financial.random_variables.RandomVariable

Bases: ABC

Abstract class with the common methods and attributes of discrete and continuous random variables.

Ideally, we wouldn’t have to implement this class from scratch, but an initial search suggests that what we want doesn’t exist in other libraries (such as SciPy).

abstract compute_cdf()

Compute the Cumulative Distribution Function (CDF) for the random variable.

abstract compute_var(percentile=95)

Compute the Value at Risk \(V^{p}_{X}\) for a random variable \(X\).

The Value at Risk \(V^{p}_{X}\) of a random variable \(X\) at level \(p \in (0, 1)\) is the \(p\)-quantile of \(X\), that is, the smallest \(x\) at which the cumulative distribution function \(F_{X}(x)\) is greater than or equal to \(p\). Formally, \(V^{p}_{X}\) is given by:

\[V^{p}_{X} := \inf\{x \in \mathbb{R} : P(X \leq x) \geq p\}.\]

Notes

This is an abstract method and must be implemented by subclasses.

abstract static compute_var_vectorized(rvs)

Compute VaRs for an array of RandomVariable instances using a vectorized approach.

Parameters:

rvs (Sequence[RandomVariable]) – An array or sequence of RandomVariable instances.

Returns:

An array of floats representing the VaRs of the random variables.

Return type:

np.ndarray

Notes

This is an abstract method and must be implemented by subclasses.

abstract mean()

Calculate the mean of the random variable.

Returns:

The mean of the random variable.

Return type:

float

Notes

This is an abstract method and must be implemented by subclasses.

abstract static means_vectorized(rvs: Sequence[RandomVariable]) → ndarray

Abstract static method to compute means for an array of RandomVariable instances using a vectorized approach.

Parameters:

rvs (Sequence[RandomVariable]) – An array or sequence of RandomVariable instances.

Returns:

An array of floats representing the means of the random variables.

Return type:

np.ndarray

Notes

This is an abstract method and must be implemented by subclasses.

abstract var()

Calculate the variance of the random variable.

Returns:

The variance of the random variable.

Return type:

float

Notes

This is an abstract method and must be implemented by subclasses.

abstract static vars_vectorized(rvs: Sequence[RandomVariable]) → ndarray

Abstract static method to compute variances for an array of RandomVariable instances using a vectorized approach.

Parameters:

rvs (Sequence[RandomVariable]) – An array or sequence of RandomVariable instances.

Returns:

An array of floats representing the variances of the random variables.

Return type:

np.ndarray

Notes

This is an abstract method and must be implemented by subclasses.
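
The abstract methods above form a contract that concrete subclasses such as DiscreteRandomVariable must fulfil. A minimal sketch of that pattern, using a hypothetical trimmed-down base class `RandomVariableSketch` (only three of the seven abstract methods) and an illustrative `PointMass` subclass; neither is part of the library.

```python
from abc import ABC, abstractmethod
from typing import Sequence

import numpy as np


class RandomVariableSketch(ABC):
    """Hypothetical, trimmed-down mirror of the RandomVariable contract."""

    @abstractmethod
    def mean(self) -> float:
        """Mean of the random variable."""

    @abstractmethod
    def var(self) -> float:
        """Variance of the random variable."""

    # abstractmethod must be the innermost decorator when combined with staticmethod.
    @staticmethod
    @abstractmethod
    def means_vectorized(rvs: Sequence["RandomVariableSketch"]) -> np.ndarray:
        """Means of a sequence of random variables."""


class PointMass(RandomVariableSketch):
    """Illustrative subclass: a degenerate variable equal to c with probability 1."""

    def __init__(self, c: float):
        self.c = float(c)

    def mean(self) -> float:
        return self.c

    def var(self) -> float:
        return 0.0

    @staticmethod
    def means_vectorized(rvs: Sequence[RandomVariableSketch]) -> np.ndarray:
        # A plain loop: like np.vectorize, it offers convenience, not speed.
        return np.array([rv.mean() for rv in rvs], dtype=float)
```

Instantiating a subclass that leaves any abstract method unimplemented raises TypeError, which is how the base class enforces this contract.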