Returns the minimal value in col. None values are ignored.
Parameters: | col (str) – column name |
---|
Returns the maximum value in col. None values are ignored.
Parameters: | col (str) – column name |
---|
Returns the row containing the cell with the maximal value in col. If several rows have the highest value, only the first one is returned. None values are ignored.
Parameters: | col (str) – column name |
---|---|
Returns: | row with maximal col value or None if the table is empty |
Returns the row containing the cell with the minimal value in col. If several rows have the lowest value, only the first one is returned. None values are ignored.
Parameters: | col (str) – column name |
---|---|
Returns: | row with minimal col value or None if the table is empty |
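The tie-breaking and None handling described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the table is assumed to be a list of dicts, and `max_row` is a hypothetical helper name.

```python
def max_row(rows, col):
    """Return the first row with the largest non-None value in col, or None."""
    best = None
    for row in rows:
        value = row[col]
        if value is None:
            continue  # None cells are ignored
        # A strict '>' keeps the first row when several rows tie.
        if best is None or value > best[col]:
            best = row
    return best


# Hypothetical data: rows "c" and "d" share the highest score.
rows = [
    {"name": "a", "score": 3},
    {"name": "b", "score": None},
    {"name": "c", "score": 7},
    {"name": "d", "score": 7},
]
first_max = max_row(rows, "score")
```

The minimal-row variant follows by swapping the comparison to `<`.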
Returns the mean of the given column. Cells with None are ignored. Returns None if the column doesn’t contain any elements. The column must be of numeric (‘float’, ‘int’) or boolean type.
If the column type is bool, the function returns the ratio of the number of True values to the total number of elements.
Parameters: | col (str) – column name |
---|---|
Raises : | TypeError if column type is string |
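A minimal sketch of these rules, assuming the column is available as a plain Python list (`column_mean` is a hypothetical name, not part of the documented API). Because True counts as 1 and False as 0 in Python, the boolean case yields the ratio of True values without special handling.

```python
def column_mean(values):
    """Mean of a column, ignoring None; None if nothing is left."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    if any(isinstance(v, str) for v in present):
        raise TypeError("mean is not defined for string columns")
    # bool is a subclass of int, so True/False sum as 1/0.
    return sum(present) / len(present)


numeric_mean = column_mean([1, None, 4])            # None is skipped
bool_ratio = column_mean([True, False, True, None])  # two of three are True
```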
Returns the median of the given column. Cells with None are ignored. Returns None if the column doesn’t contain any elements. The column must be of numeric (‘float’, ‘int’) or boolean type.
Parameters: | col (str) – column name |
---|---|
Raises : | TypeError if column type is string |
Returns the standard deviation of the given column. Cells with None are ignored. Returns None if the column doesn’t contain any elements. The column must be of numeric (‘float’, ‘int’) or boolean type.
Parameters: | col (str) – column name |
---|---|
Raises : | TypeError if column type is string |
Counts the number of cells in the column that are not equal to None.
Parameters: |
|
---|
Calculate the Pearson correlation coefficient between col1 and col2, only taking rows into account where both of the values are not equal to None. If there are not enough data points to calculate a correlation coefficient, None is returned.
Parameters: |
|
---|
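The pairwise filtering can be illustrated with a short pure-Python sketch. It is not the library's code: `pearson` is a hypothetical helper, and treating fewer than two complete pairs (or a column with zero variance) as "not enough data points" is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson r over rows where both values are present, else None."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    if n < 2:  # assumption: fewer than two pairs is "not enough"
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:  # assumption: constant column gives None
        return None
    return sxy / math.sqrt(sxx * syy)


# The third row is dropped because col1 is None there.
r = pearson([1, 2, None, 4], [2, 4, 5, 8])
```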
Calculate the Spearman correlation coefficient between col1 and col2, only taking rows into account where both of the values are not equal to None. If there are not enough data points to calculate a correlation coefficient, None is returned.
Warning : | The function depends on the following module: scipy.stats.mstats |
---|---|
Parameters: |
|
Computes the receiver operating characteristic (ROC) of column score_col classified according to class_col.
For this, the data points must be classified into positives and negatives. This can be done in two ways:
- by using one ‘bool’ column (class_col) which contains True for positives and False for negatives
- by using a non-bool column (class_col), a cutoff value (class_cutoff) and the classification column’s direction (class_dir). This generates the classification on the fly:
  - if class_dir=='-': values in the classification column that are less than or equal to class_cutoff are counted as positives
  - if class_dir=='+': values in the classification column that are larger than or equal to class_cutoff are counted as positives
During the calculation, the table is sorted according to score_dir. A value of ‘-’ sorts the smallest values first, i.e. the smaller the score, the better.
If class_col does not contain any positives (no True values for a ‘bool’ column, or no values that satisfy class_dir and class_cutoff for an ‘int’ or ‘float’ column), the ROC is not defined and the function returns None.
Warning : | If either the value of class_col or score_col is None, the data in this row is ignored. |
---|
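The procedure can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the library's implementation: `roc_points` is a hypothetical name, columns are passed as lists, tied scores get no special treatment, and returning None when there are no negatives is an added guard that the text above does not specify.

```python
def roc_points(scores, classes, score_dir="-", class_dir="-",
               class_cutoff=None):
    """Return (fpr, tpr) lists, or None if there are no positives."""
    points = []
    for score, cls in zip(scores, classes):
        if score is None or cls is None:
            continue  # rows with a missing value are ignored
        if isinstance(cls, bool):
            positive = cls
        elif class_dir == "-":
            positive = cls <= class_cutoff
        else:
            positive = cls >= class_cutoff
        points.append((score, positive))

    n_pos = sum(1 for _, positive in points if positive)
    n_neg = len(points) - n_pos
    if n_pos == 0:
        return None  # ROC undefined without positives
    if n_neg == 0:
        return None  # assumption: also undefined without negatives

    # '-' means smallest scores first (smaller is better).
    points.sort(key=lambda p: p[0], reverse=(score_dir == "+"))
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for _, positive in points:
        if positive:
            tp += 1
        else:
            fp += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


# Hypothetical data; the last row is skipped because its score is None.
curve = roc_points([0.1, 0.2, 0.3, 0.4, None],
                   [True, True, False, False, True])
```

Passing a numeric class column with `class_cutoff` and `class_dir` instead of booleans exercises the on-the-fly classification path.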
Computes the enrichment of column score_col classified according to class_col.
For this, the data points must be classified into positives and negatives. This can be done in two ways:
- by using one ‘bool’ type column (class_col) which contains True for positives and False for negatives
- by specifying a classification column (class_col), a cutoff value (class_cutoff) and the classification column’s direction (class_dir). This generates the classification on the fly:
  - if class_dir=='-': values in the classification column that are less than or equal to class_cutoff are counted as positives
  - if class_dir=='+': values in the classification column that are larger than or equal to class_cutoff are counted as positives
During the calculation, the table is sorted according to score_dir. A value of ‘-’ sorts the smallest values first, i.e. the smaller the score, the better.
Warning : | If either the value of class_col or score_col is None, the data in this row is ignored. |
---|
Computes the Matthews correlation coefficient (MCC) for one column (score_col), with the points classified into true positives, false positives, true negatives and false negatives according to a specified classification column (class_col).
The data points in score_col and class_col are each classified into positives and negatives. This can be done in two ways:
- by using ‘bool’ columns which contain True for positives and False for negatives
- by using ‘float’ or ‘int’ columns and specifying a cutoff value and the column’s direction. This generates the classification on the fly:
  - if class_dir/score_dir=='-': values in the respective column that are less than or equal to class_cutoff/score_cutoff are counted as positives
  - if class_dir/score_dir=='+': values in the respective column that are larger than or equal to class_cutoff/score_cutoff are counted as positives
The two possibilities can be combined, i.e. ‘bool’ type for one column and ‘float’/‘int’ type with cutoff and direction for the other column.
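As an illustration of the classification rules, the following sketch counts the four outcome classes and applies the standard MCC formula. It is not the library's code: `classify` and `mcc` are hypothetical helpers, skipping rows that contain None is an assumption, and so is returning None when the denominator is zero.

```python
import math


def classify(value, direction=None, cutoff=None):
    """True for positives: bools as-is, numbers via cutoff and direction."""
    if isinstance(value, bool):
        return value
    if direction == "-":
        return value <= cutoff
    return value >= cutoff


def mcc(score_vals, class_vals, score_dir=None, score_cutoff=None,
        class_dir=None, class_cutoff=None):
    tp = fp = tn = fn = 0
    for s, c in zip(score_vals, class_vals):
        if s is None or c is None:
            continue  # assumption: incomplete rows are skipped
        predicted = classify(s, score_dir, score_cutoff)
        actual = classify(c, class_dir, class_cutoff)
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return None  # assumption: undefined when a margin is empty
    return (tp * tn - fp * fn) / denom


# Mixed usage: numeric scores with cutoff/direction, bool classes.
scores = [0.1, 0.2, 0.8, 0.9]
perfect = mcc(scores, [True, True, False, False],
              score_dir="-", score_cutoff=0.5)
uninformative = mcc(scores, [True, False, True, False],
                    score_dir="-", score_cutoff=0.5)
```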
This returns the optimal prefactor values (i.e. a, b, c, ...) for the following equation
$$\vec{z} = a\vec{u} + b\vec{v} + c\vec{w} + \dots \qquad (1)$$
where u, v, w and z are vectors. In matrix notation
$$A\vec{p} = \vec{z} \qquad (2)$$
where A contains the data from the table (u,v,w,...), p are the prefactors to optimize (a,b,c,...) and z is the vector containing the result of equation (1).
The parameter ref_col corresponds to z in both equations, and *args are the columns u, v and w (or A in (2)). All columns must be specified by their names.
Example:
tab.optimal_prefactors('colC', 'colA', 'colB')
The function returns a list containing the prefactors a, b, c, ... in the same order as the columns were specified in *args.
Weighting: If the kwarg weights="columX" is specified, the equations are weighted by the values in that column. Each row is multiplied by the weight in that row, which leads to (3):
$$W A\vec{p} = W\vec{z} \qquad (3)$$
where W is the diagonal matrix holding the weight of each row.
Weights must be float or int and can have any value. A weight of 0 causes the corresponding equation to be ignored; a weight of 1 is equivalent to no weighting. If all rows have the same weight, the result is the same as without weights.
Example:
tab.optimal_prefactors('colC', 'colA', 'colB', weights='colD')
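To make the weighted system concrete, here is a pure-Python sketch that multiplies each row by its weight and solves the resulting problem in the least-squares sense via the normal equations. That "optimal" means least squares is an assumption, and `fit_prefactors` and `solve_linear` are hypothetical helpers, not the library's implementation.

```python
def solve_linear(m, rhs):
    """Solve m * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(m, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_prefactors(z, columns, weights=None):
    """Least-squares prefactors for z = a*u + b*v + ..., rows scaled by weight."""
    n_rows = len(z)
    w = weights if weights is not None else [1.0] * n_rows
    k = len(columns)
    # Normal equations of the row-weighted system: (WA)^T (WA) p = (WA)^T (Wz)
    ata = [[sum(w[i] ** 2 * columns[a][i] * columns[b][i]
                for i in range(n_rows)) for b in range(k)]
           for a in range(k)]
    atz = [sum(w[i] ** 2 * columns[a][i] * z[i] for i in range(n_rows))
           for a in range(k)]
    return solve_linear(ata, atz)


# Hypothetical data constructed so that z = 2*u + 3*v exactly.
u = [1.0, 2.0, 3.0, 4.0]
v = [2.0, 1.0, 0.0, 5.0]
z = [8.0, 7.0, 6.0, 23.0]
unweighted = fit_prefactors(z, [u, v])
weighted = fit_prefactors(z, [u, v], weights=[1.0, 0.0, 2.0, 1.0])
```

Because the data are exactly consistent, the weights do not change the solution here; with noisy data they would shift the fit toward rows with larger weights. A singular system (for example linearly dependent columns) is not handled in this sketch.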
Adds a new column of type ‘float’ with a specified name (mean_col_name), containing the mean of all specified columns for each row.
Cols are specified by their names and must be of numeric column type (‘float’, ‘int’) or boolean column type. Cells with None are ignored. Adds None if the row doesn’t contain any values.
Parameters: |
|
---|---|
Raises : | TypeError if any of the specified columns is of type string |
== Example ==
Starting with the following table:
x | y | u |
---|---|---|
1 | 10 | 100 |
2 | 15 | None |
3 | 20 | 400 |
taking the row-wise mean of columns x and u and storing it in a new column named ‘mean’ yields the table below:
x | y | u | mean |
---|---|---|---|
1 | 10 | 100 | 50.5 |
2 | 15 | None | 2 |
3 | 20 | 400 | 201.5 |
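The values in the ‘mean’ column match a row-wise average over x and u with None skipped. A small sketch reproduces them, assuming the table is held as a list of dicts (`row_mean` is a hypothetical helper, not the documented method):

```python
def row_mean(rows, cols):
    """Per-row mean over cols, ignoring None; None if a row has no values."""
    means = []
    for row in rows:
        present = [row[c] for c in cols if row[c] is not None]
        means.append(sum(present) / len(present) if present else None)
    return means


rows = [
    {"x": 1, "y": 10, "u": 100},
    {"x": 2, "y": 15, "u": None},
    {"x": 3, "y": 20, "u": 400},
]
means = row_mean(rows, ["x", "u"])  # [50.5, 2.0, 201.5]
```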
Returns the sum of the given column. Cells with None are ignored. Returns 0.0 if the column doesn’t contain any elements. The column must be of numeric (‘float’, ‘int’) or boolean type.
Parameters: | col (str) – column name |
---|---|
Raises : | TypeError if column type is string |
Two-sided test for the null hypothesis that two related samples have the same average (expected value).
Parameters: |
|
---|---|
Returns: | P-value between 0 and 1 for the null hypothesis that the two columns have the same average. The smaller the value, the stronger the evidence that the averages differ. |