OmicsIntegrator2
For more information about the scientific uses of OmicsIntegrator2, please see the Readme. To report an issue, please refer to the Issues.
-
class
graph.
Graph
(interactome_file, params={})[source]¶ A Graph object is a representation of a graph, with convenience methods for using the pcst_fast package, which approximately minimizes the Prize-Collecting Steiner Forest objective.
-
__init__
(interactome_file, params={})[source]¶ Builds a representation of a graph from an interactome file.
From the interactome_file, populates:
- graph.interactome_dataframe (pandas.DataFrame)
- graph.interactome_graph (networkx.Graph)
- graph.nodes (pandas.Index)
- graph.edges (list of pairs)
- graph.costs and graph.edge_penalties (lists, such that the ordering is the same as in graph.edges)
- graph.node_degrees (list, such that the ordering is the same as in graph.nodes)
Parameters: - interactome_file (str or FILE) – tab-delimited text file containing edges in interactome and their weights formatted like “ProteinA ProteinB Cost”
- params (dict) – params with which to run the program
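As a rough illustration of the input format, the sketch below parses a tiny tab-delimited edge list with the standard library only; it does not call OmicsIntegrator2. It assumes three columns and no header row (the text here does not say whether the real file has one), the protein names and costs are made up, and the variable names only mirror the attributes listed above.

```python
import csv
import io
from collections import Counter

# Hypothetical interactome contents: one edge per line,
# "ProteinA<TAB>ProteinB<TAB>Cost". Assumes no header row.
interactome_text = "A\tB\t0.5\nB\tC\t0.25\nA\tC\t0.75\n"

edges = []   # list of (source, target) pairs
costs = []   # costs[i] belongs to edges[i]
for source, target, cost in csv.reader(io.StringIO(interactome_text), delimiter="\t"):
    edges.append((source, target))
    costs.append(float(cost))

# Node index and per-node degree, kept in the same order.
nodes = sorted({name for edge in edges for name in edge})
degree_count = Counter(name for edge in edges for name in edge)
node_degrees = [degree_count[name] for name in nodes]
```

Keeping costs parallel to edges, and degrees parallel to nodes, is what lets later steps refer to vertices and edges by integer index.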
-
prepare_prizes
(prize_file)[source]¶ Parses a prize file and adds prizes and other attributes to the graph object.
The file passed to this function must have at least two columns: node name and prize. Any additional columns will be assumed to be node attributes. However, in order to know the names of those attributes, this function requires the input file to contain headers, i.e. the first row of the tsv must be the names of the columns.
Sets the graph attributes:
- graph.bare_prizes (numpy.array): properly indexed (same as graph.nodes) prizes from the file.
- graph.prizes (numpy.array): properly indexed prizes, scaled by beta (graph.params.b)
- graph.terminals (numpy.array): their indices
- graph.node_attributes (pandas.DataFrame): any node attributes passed in with the prize file (columns 3, …)
Parameters: prize_file (str or FILE) – a filepath or file object containing a tsv with column headers.
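The sketch below shows one way a headed prize file could be turned into index-aligned prizes, using the standard library instead of OmicsIntegrator2. Several details are assumptions made for illustration: the column names, the rule that nodes missing from the file get a prize of 0, the reading of "scaled by beta" as multiplication, and the definition of terminals as nodes with a positive prize.

```python
import csv
import io

# Hypothetical prize file: header row, then name, prize, and one extra attribute column.
prize_text = "name\tprize\ttype\nA\t2.0\tprotein\nC\t1.5\tmetabolite\n"
nodes = ["A", "B", "C"]   # stand-in for graph.nodes
beta = 2.0                # stand-in for graph.params.b

reader = csv.DictReader(io.StringIO(prize_text), delimiter="\t")
rows = list(reader)
# The header supplies the attribute names: columns 3 onward are node attributes.
name_col, prize_col, *attribute_cols = reader.fieldnames

prize_by_name = {row[name_col]: float(row[prize_col]) for row in rows}

# Prizes aligned with `nodes`; assumption: nodes absent from the file get 0.
bare_prizes = [prize_by_name.get(name, 0.0) for name in nodes]
prizes = [beta * prize for prize in bare_prizes]
# Assumption: a terminal is any node with a positive prize.
terminals = [index for index, prize in enumerate(bare_prizes) if prize > 0]
node_attributes = {
    row[name_col]: {col: row[col] for col in attribute_cols} for row in rows
}
```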
-
pcsf
(pruning='strong', verbosity_level=0)[source]¶ Select the subgraph which approximately optimizes the Prize-Collecting Steiner Forest objective.
This function mostly defers to pcst_fast, but does one important pre-processing step: it adds a dummy node which will serve as the PCSF root and connects that dummy node to either terminals, non-terminals, or all other nodes with edges weighted by self.params.w.
To interpret the results of this function, pass them to output_forest_as_networkx.
Parameters: - pruning (str) – a string value indicating the pruning method. Possible values are ‘none’, ‘simple’, ‘gw’, and ‘strong’ (all literals are case-insensitive).
- verbosity_level (int) – an integer indicating how much debug output the function should produce.
Returns: indices of the selected vertices, and indices of the selected edges
Return type: numpy.array, numpy.array
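The dummy-root pre-processing step can be sketched on a toy graph as follows. This is not the library's code: the index layout (dummy node appended after the real nodes) and all variable names are assumptions, and the call to pcst_fast itself is left out.

```python
# Toy graph: edges are pairs of node indices, with one cost per edge.
nodes = ["A", "B", "C", "D"]
edges = [(0, 1), (1, 2), (2, 3)]
costs = [0.5, 0.25, 0.75]
terminals = [0, 3]   # indices of prize-carrying nodes
w = 5.0              # stand-in for self.params.w

# Append the dummy root after the real nodes, then link it to each chosen node.
dummy_id = len(nodes)
targets = terminals  # could instead be the non-terminals or every node
dummy_edges = [(dummy_id, node_id) for node_id in targets]

# Every dummy edge costs w, so each extra tree rooted at the dummy node pays w.
augmented_edges = edges + dummy_edges
augmented_costs = costs + [w] * len(dummy_edges)
```

Because every dummy edge has the same cost, w acts as a per-tree penalty: a solver rooted at the dummy node pays w once for each separate tree it keeps.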
-
output_forest_as_networkx
(vertex_indices, edge_indices)[source]¶ Construct a networkx graph from a set of vertex and edge indices (i.e. a pcsf output)
Parameters: - vertex_indices (list) – indices of the vertices selected in self.nodes. Note: the elements of this list must be of type int or boolean; empty numpy arrays of dtype=’object’ may cause errors.
- edge_indices (list) – indices of the edges selected in self.edges
Returns: a networkx graph object
Return type: networkx.Graph
-
pcsf_objective_value
(forest)[source]¶ Calculate PCSF objective function
Parameters: forest (networkx.Graph) – a forest like the one returned by output_forest_as_networkx – Not an augmented forest!
Returns: PCSF objective function score
Return type: float
-
randomizations
(noisy_edges_reps=0, random_terminals_reps=0)[source]¶ Macro function which performs randomizations and merges the results
Note that these parameters are additive, not multiplicative: noisy_edges_reps = 5 and random_terminals_reps = 5 makes 10 PCSF runs, not 25.
Parameters: - noisy_edges_reps (int) – Number of “Noisy Edges” type randomizations to perform
- random_terminals_reps (int) – Number of “Random Terminals” type randomizations to perform
Returns: forest, and augmented_forest
Return type: networkx.Graph, networkx.Graph
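The additive relationship between the two repetition counts can be written out as a simple run schedule. The labels below are hypothetical and no PCSF runs are performed; the sketch only counts them.

```python
noisy_edges_reps = 5
random_terminals_reps = 5

# One run per repetition of each type; the two types are scheduled independently,
# so the counts add rather than multiply.
schedule = (
    ["noisy_edges"] * noisy_edges_reps
    + ["random_terminals"] * random_terminals_reps
)
total_runs = len(schedule)
```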
-
grid_randomization
(prize_file, Ws=[5], Bs=[1], Gs=[3], noisy_edges_reps=0, random_terminals_reps=0)[source]¶ Macro function which performs grid search or randomizations or both.
Parameters: - prize_file (str) – filepath
- Gs (list) – Values of gamma
- Bs (list) – Values of beta
- Ws (list) – Values of omega
- noisy_edges_reps (int) – Number of robustness experiments
- random_terminals_reps (int) – Number of specificity experiments
Returns: Forest and augmented forest networkx graphs, keyed by parameter string
Return type: dict
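A grid over Ws, Bs and Gs is the Cartesian product of the three lists, so the number of experiments grows multiplicatively with each list. The sketch below enumerates such a grid with the standard library; the key format is hypothetical and may not match the parameter strings the library produces.

```python
from itertools import product

Ws, Bs, Gs = [5, 10], [1, 2], [3]

# One experiment per (w, b, g) combination.
grid = list(product(Ws, Bs, Gs))

# Hypothetical key format, used here only to label each combination.
results = {f"W_{w}_B_{b}_G_{g}": {"w": w, "b": b, "g": g} for w, b, g in grid}
```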
-
grid_search
(prize_file, Ws, Bs, Gs)[source]¶ Macro function which performs grid search.
Parameters: - prize_file (str) – filepath
- Gs (list) – Values of gamma
- Bs (list) – Values of beta
- Ws (list) – Values of omega
Returns: forest, augmented_forest, and a dataframe of parameters and node membership lists
Return type: networkx.Graph, networkx.Graph, pd.DataFrame
-
-
graph.
betweenness
(nxgraph)[source]¶ Compute and add as an attribute the betweenness of each node.
Betweenness centrality of a node v is the sum of the fraction of all-pairs shortest paths that pass through v.
Parameters: nxgraph (networkx.Graph) – a networkx graph, usually the augmented_forest.
-
graph.
louvain_clustering
(nxgraph)[source]¶ Compute “Louvain”/”Community” clustering on a networkx graph, and add the cluster labels as attributes on the nodes.
Parameters: nxgraph (networkx.Graph) – a networkx graph, usually the augmented_forest.
-
graph.
k_clique_clustering
(nxgraph, k)[source]¶ Compute “k-Clique” clustering on a networkx graph, and add the cluster labels as attributes on the nodes.
See the [networkx docs](https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.community.kclique.k_clique_communities.html#networkx.algorithms.community.kclique.k_clique_communities)
Parameters: - nxgraph (networkx.Graph) – a networkx graph, usually the augmented_forest.
- k (int) – size of the smallest clique
-
graph.
spectral_clustering
(nxgraph, k)[source]¶ Compute “spectral” clustering on a networkx graph, and add the cluster labels as attributes on the nodes.
Parameters: nxgraph (networkx.Graph) – a networkx graph, usually the augmented_forest.
-
graph.
annotate_graph_nodes
(nxgraph)[source]¶ Parameters: nxgraph (networkx.Graph) – a networkx graph, usually the augmented_forest.
-
graph.
summarize_grid_search
(results, mode, top_n=inf)[source]¶ Summarizes results of grid_randomization or grid_search into a matrix where each row is a gene and each column is a parameter run. If summarizing “membership”, entries will be 0 or 1 indicating whether or not a node appeared in each experiment. If summarizing “robustness” or “specificity”, entries indicate robustness or specificity values for each experiment.
Parameters: - results (dict) – Results of grid_randomization or grid_search, of the form {‘paramstring’: { ‘forest’: object, ‘augmented forest’: object}}
- mode (str) – Which values to report: “membership”, “robustness”, or “specificity”
- top_n (int) – Takes the top_n values of the summary dataframe. top_n=-1 sets no threshold
Returns: Columns correspond to each parameter experiment, indexed by nodes
Return type: pd.DataFrame
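For the “membership” mode, the shape of the summary can be sketched with plain dictionaries. The input below is a simplified stand-in (each run maps directly to a set of node names, not to networkx graphs), and the run names are hypothetical.

```python
# Hypothetical results: each parameter string maps to the nodes in that run's forest.
forests = {
    "run_1": {"A", "B"},
    "run_2": {"B", "C"},
}

genes = sorted(set().union(*forests.values()))
runs = list(forests)

# membership[gene][run] is 1 if the gene appeared in that run's forest, else 0.
membership = {
    gene: {run: int(gene in forests[run]) for run in runs} for gene in genes
}
```

Each outer key corresponds to a row (gene) and each inner key to a column (parameter run) of the summary matrix.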
-
graph.
get_robust_subgraph_from_randomizations
(nxgraph, max_size=400, min_component_size=5)[source]¶ Given a graph with robustness attributes, take the top max_size robust nodes and prune any “small” components.
Parameters: - nxgraph (networkx.Graph) – Network from randomization experiment
- max_size (int) – Max size of robust network
- min_component_size (int) – Min size of components to keep in the robust network
Returns: Robust network
Return type: networkx.Graph
-
graph.
filter_graph_by_component_size
(nxgraph, min_size=5)[source]¶ Removes any components that are smaller than min_size.
Parameters: - nxgraph (networkx.Graph) – Network from randomization experiment
- min_size (int) – Min size of components in nxgraph. Set to 2 to remove singletons only.
Returns: Network with components smaller than the specified size removed.
Return type: networkx.Graph
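Filtering by component size can be sketched without networkx by using a breadth-first search over an adjacency dictionary. The function and variable names below are hypothetical and this is not the library's implementation; it only illustrates the behaviour described above, including the min_size=2 case that removes singletons only.

```python
from collections import deque

def filter_by_component_size(adjacency, min_size=5):
    """Return a copy of `adjacency` without components smaller than `min_size`."""
    seen = set()
    keep = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects the component containing `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) >= min_size:
            keep |= component
    return {node: set(nbrs) for node, nbrs in adjacency.items() if node in keep}

# Toy undirected graph: a triangle, a pair, and a singleton.
toy = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
    "D": {"E"}, "E": {"D"},
    "F": set(),
}
without_singletons = filter_by_component_size(toy, min_size=2)
triangle_only = filter_by_component_size(toy, min_size=3)
```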
-
graph.
get_networkx_graph_as_dataframe_of_nodes
(nxgraph)[source]¶ Parameters: nxgraph (networkx.Graph) – any instance of networkx.Graph
Returns: nodes from the input graph and their attributes as a dataframe
Return type: pd.DataFrame
-
graph.
get_networkx_graph_as_dataframe_of_edges
(nxgraph)[source]¶ Parameters: nxgraph (networkx.Graph) – any instance of networkx.Graph
Returns: edges from the input graph and their attributes as a dataframe
Return type: pd.DataFrame
-
graph.
output_networkx_graph_as_pickle
(nxgraph, output_dir='.', filename='pcsf_results.pickle')[source]¶ Parameters: - nxgraph (networkx.Graph) – any instance of networkx.Graph
- output_dir (str) – the directory in which to output the graph.
- filename (str) – Filenames ending in .gz or .bz2 will be compressed.
Returns: the filepath the graph was written to
Return type: Path
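Extension-based compression of a pickle can be sketched with the standard library as below. The helper name write_pickle is hypothetical and this is not necessarily how the library does it; the example pickles a plain dictionary, not a networkx graph, into a temporary directory.

```python
import bz2
import gzip
import pickle
import tempfile
from pathlib import Path

def write_pickle(obj, output_dir=".", filename="pcsf_results.pickle"):
    """Pickle `obj`, compressing when the filename ends in .gz or .bz2."""
    path = Path(output_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    # Pick the opener from the file extension; plain `open` means no compression.
    opener = {".gz": gzip.open, ".bz2": bz2.open}.get(path.suffix, open)
    with opener(path, "wb") as handle:
        pickle.dump(obj, handle)
    return path.resolve()

with tempfile.TemporaryDirectory() as tmp:
    written = write_pickle({"nodes": ["A", "B"]}, output_dir=tmp, filename="toy.pickle.gz")
    # Reading back requires the matching opener for the chosen extension.
    with gzip.open(written, "rb") as handle:
        restored = pickle.load(handle)
```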
-
graph.
output_networkx_graph_as_graphml_for_cytoscape
(nxgraph, output_dir='.', filename='pcsf_results.graphml.gz')[source]¶ Parameters: - nxgraph (networkx.Graph) – any instance of networkx.Graph
- output_dir (str) – the directory in which to output the graph.
- filename (str) – Filenames ending in .gz or .bz2 will be compressed.
Returns: the filepath the graph was written to
Return type: Path
-
graph.
output_networkx_graph_as_interactive_html
(nxgraph, attribute_metadata={}, output_dir='.', filename='graph.html')[source]¶ Parameters: - nxgraph (networkx.Graph) – any instance of networkx.Graph
- output_dir (str) – the directory in which to output the file
- filename (str) – the filename of the output file
Returns: the filepath the file was written to
Return type: Path