OmicsIntegrator2

For more information about the scientific uses of OmicsIntegrator2, see the Readme. To report an issue, use the project's Issues page.

class graph.Graph(interactome_file, params={})

A Graph object is a representation of a graph, with convenience methods for using the pcst_fast package, which approximately minimizes the Prize-Collecting Steiner Forest objective.

__init__(interactome_file, params={})

Builds a representation of a graph from an interactome file.

From the interactome_file, populates:
- graph.interactome_dataframe (pandas.DataFrame)
- graph.interactome_graph (networkx.Graph)
- graph.nodes (pandas.Index)
- graph.edges (list of pairs)
- graph.costs and graph.edge_penalties (lists, ordered the same as graph.edges)
- graph.node_degrees (list, ordered the same as graph.nodes)

Parameters:
- interactome_file (str or FILE) – tab-delimited text file containing the interactome's edges and their weights, formatted like “ProteinA ProteinB Cost”
- params (dict) – params with which to run the program
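The attribute list above relies on one ordering convention: graph.edges, graph.costs and graph.edge_penalties line up position by position, as do graph.nodes and graph.node_degrees. The stdlib-only sketch below illustrates that bookkeeping on a three-column file. It is not OmicsIntegrator2's implementation (which builds pandas and networkx objects); the function name parse_interactome, the absence of a header row, and first-appearance node ordering are all assumptions.

```python
import csv
import io


def parse_interactome(handle):
    """Hypothetical sketch: read 'ProteinA<TAB>ProteinB<TAB>Cost' lines.

    Returns nodes, edges (as index pairs into nodes), costs (aligned with
    edges) and node degrees (aligned with nodes). Assumes no header row.
    """
    node_index = {}  # node name -> position in `nodes`
    nodes, edges, costs = [], [], []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        a, b, cost = row[0], row[1], float(row[2])
        for name in (a, b):
            if name not in node_index:
                node_index[name] = len(nodes)
                nodes.append(name)
        edges.append((node_index[a], node_index[b]))
        costs.append(cost)  # costs[i] belongs to edges[i]
    degrees = [0] * len(nodes)  # degrees[i] belongs to nodes[i]
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return nodes, edges, costs, degrees


text = "TP53\tMDM2\t0.1\nMDM2\tCDKN1A\t0.4\nTP53\tCDKN1A\t0.2\n"
nodes, edges, costs, degrees = parse_interactome(io.StringIO(text))
print(nodes)    # ['TP53', 'MDM2', 'CDKN1A']
print(edges)    # [(0, 1), (1, 2), (0, 2)]
print(degrees)  # [2, 2, 2]
```

Keeping parallel lists indexed by position is what lets a solver such as pcst_fast work purely with integer indices.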

prepare_prizes(prize_file)

Parses a prize file and adds prizes and other attributes to the graph object.

The file passed to this function must have at least two columns: node name and prize. Any additional columns are treated as node attributes. Because the attribute names are read from the header, the input file must contain one: the first row of the TSV must name the columns.

Sets the graph attributes:
- graph.bare_prizes (numpy.array): prizes from the file, indexed the same as graph.nodes
- graph.prizes (numpy.array): the same prizes, scaled by beta (graph.params.b)
- graph.terminals (numpy.array): the indices of the terminal nodes
- graph.node_attributes (pandas.DataFrame): any node attributes passed in with the prize file (columns 3, …)

Parameters:	prize_file (str or FILE) – a filepath or file object containing a tsv with column headers.
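As a rough illustration of how prizes are aligned to graph.nodes and scaled by beta, the following sketch reads a headered TSV with the standard csv module. The helper align_prizes is hypothetical, and several behaviours are assumed rather than taken from OmicsIntegrator2: prize-file nodes missing from the interactome are skipped, unprized nodes get 0, and a terminal is any node with a nonzero prize.

```python
import csv
import io


def align_prizes(nodes, prize_handle, beta):
    """Hypothetical sketch of prize alignment.

    Assumptions: column 1 names the node, column 2 holds the prize, extra
    columns are attributes named by the header; prize-file nodes absent
    from `nodes` are ignored; terminals are nodes with a nonzero prize.
    """
    reader = csv.reader(prize_handle, delimiter="\t")
    header = next(reader)            # e.g. ['name', 'prize', 'type']
    attribute_names = header[2:]     # any extra columns are attributes
    position = {name: i for i, name in enumerate(nodes)}
    bare_prizes = [0.0] * len(nodes)
    node_attributes = {}
    for row in reader:
        if not row or row[0] not in position:
            continue
        bare_prizes[position[row[0]]] = float(row[1])
        node_attributes[row[0]] = dict(zip(attribute_names, row[2:]))
    prizes = [beta * p for p in bare_prizes]  # scaled by beta
    terminals = [i for i, p in enumerate(bare_prizes) if p != 0]
    return bare_prizes, prizes, terminals, node_attributes


nodes = ["TP53", "MDM2", "CDKN1A"]
prize_file = io.StringIO(
    "name\tprize\ttype\n"
    "TP53\t2.0\tprotein\n"
    "CDKN1A\t0.5\tprotein\n"
    "XYZ\t9\tprotein\n"  # not in the interactome, so ignored here
)
bare, scaled, terminals, attrs = align_prizes(nodes, prize_file, beta=2.0)
print(bare)       # [2.0, 0.0, 0.5]
print(scaled)     # [4.0, 0.0, 1.0]
print(terminals)  # [0, 2]
```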

pcsf(pruning='strong', verbosity_level=0)

Select the subgraph which approximately optimizes the Prize-Collecting Steiner Forest objective.

This function mostly defers to pcst_fast, but does one important pre-processing step: it adds a dummy node which will serve as the PCSF root and connects that dummy node to either terminals, non-terminals, or all other nodes with edges weighted by self.params.w.

To interpret the results, pass the returned indices to output_forest_as_networkx.

Parameters:
- pruning (str) – the pruning method. Possible values are ‘none’, ‘simple’, ‘gw’, and ‘strong’ (case-insensitive).
- verbosity_level (int) – how much debug output the function should produce.
Returns:
- numpy.array: indices of the selected vertices
- numpy.array: indices of the selected edges
Return type:	tuple of numpy.array
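The dummy-root pre-processing step described above can be sketched on plain edge lists. In this hypothetical helper, the dummy root receives the next free index and is linked to the chosen node set with edges of cost w; the connect_to option names are illustrative, not OmicsIntegrator2 parameters. Under the usual Forest formulation, removing the root after solving splits the single tree into a forest, so each tree effectively pays one root edge of cost w.

```python
def add_dummy_root(num_nodes, edges, costs, terminals, w, connect_to="terminals"):
    """Hypothetical sketch: append a dummy root (index num_nodes) and link it
    to the chosen node set with edges of cost w."""
    terminal_set = set(terminals)
    if connect_to == "terminals":
        targets = sorted(terminal_set)
    elif connect_to == "non_terminals":
        targets = [i for i in range(num_nodes) if i not in terminal_set]
    elif connect_to == "all":
        targets = list(range(num_nodes))
    else:
        raise ValueError(f"unknown connect_to: {connect_to!r}")
    root = num_nodes  # first index not used by a real node
    new_edges = list(edges) + [(root, t) for t in targets]
    new_costs = list(costs) + [w] * len(targets)
    return root, new_edges, new_costs


root, aug_edges, aug_costs = add_dummy_root(
    3, [(0, 1), (1, 2)], [0.1, 0.4], terminals=[0, 2], w=5.0
)
print(root)       # 3
print(aug_edges)  # [(0, 1), (1, 2), (3, 0), (3, 2)]
print(aug_costs)  # [0.1, 0.4, 5.0, 5.0]
```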

output_forest_as_networkx(vertex_indices, edge_indices)

Constructs a networkx graph from a set of vertex and edge indices (e.g. the output of pcsf).

Parameters:
- vertex_indices (list) – indices of the vertices selected in self.nodes. This list must be of type int or boolean; an empty numpy array of dtype=’object’ may raise errors.
- edge_indices (list) – indices of the edges selected in self.edges
Returns:	a networkx graph object
Return type:	networkx.Graph
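Converting indices back into a graph amounts to looking positions up in self.nodes and self.edges. The sketch below shows that mapping with plain lists instead of networkx, and illustrates the int-or-boolean requirement: a boolean mask is first converted to integer positions. The helper indices_to_forest is hypothetical.

```python
def indices_to_forest(nodes, edges, vertex_indices, edge_indices):
    """Hypothetical sketch: map selected indices back to node names.

    vertex_indices may be integer positions or a boolean mask the same
    length as `nodes`; edges holds (i, j) index pairs into `nodes`.
    """
    vertex_indices = list(vertex_indices)
    if vertex_indices and all(isinstance(v, bool) for v in vertex_indices):
        # boolean mask -> integer positions
        vertex_indices = [i for i, keep in enumerate(vertex_indices) if keep]
    selected_nodes = [nodes[i] for i in vertex_indices]
    selected_edges = [(nodes[edges[e][0]], nodes[edges[e][1]]) for e in edge_indices]
    return selected_nodes, selected_edges


nodes = ["TP53", "MDM2", "CDKN1A"]
edges = [(0, 1), (1, 2), (0, 2)]
print(indices_to_forest(nodes, edges, [True, True, False], [0]))
# (['TP53', 'MDM2'], [('TP53', 'MDM2')])
print(indices_to_forest(nodes, edges, [0, 1], [0]))
# (['TP53', 'MDM2'], [('TP53', 'MDM2')])
```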

pcsf_objective_value(forest)

Calculates the value of the PCSF objective function.

Parameters:	forest (networkx.Graph) – a forest like the one returned by output_forest_as_networkx – Not an augmented forest!
Returns:	PCSF objective function score
Return type:	float
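For reference, a minimal sketch of the objective, assuming the standard Prize-Collecting Steiner Forest form used by Omics Integrator: the beta-scaled prizes of nodes left out of the forest, plus the costs of edges kept in it, plus w times the number of trees. The function name and the dict-based inputs are assumptions; trees are counted as connected components with a small union-find.

```python
def pcsf_objective(prizes, costs, forest_nodes, forest_edges, w):
    """Hypothetical sketch of the Forest objective:
    (prizes of excluded nodes) + (costs of included edges) + w * (#trees).

    prizes: dict node -> beta-scaled prize; costs: dict edge -> cost;
    forest_edges: (u, v) pairs, each also a key in `costs`.
    """
    excluded = sum(p for node, p in prizes.items() if node not in forest_nodes)
    edge_cost = sum(costs[e] for e in forest_edges)

    # Count trees (connected components of the forest) with union-find.
    parent = {n: n for n in forest_nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in forest_edges:
        parent[find(u)] = find(v)
    num_trees = len({find(n) for n in forest_nodes})
    return excluded + edge_cost + w * num_trees


prizes = {"A": 3.0, "B": 0.0, "C": 2.0, "D": 1.0}
costs = {("A", "B"): 0.5, ("B", "C"): 0.5, ("C", "D"): 4.0}
score = pcsf_objective(prizes, costs, {"A", "B", "C"}, [("A", "B"), ("B", "C")], w=2.0)
print(score)  # 4.0  (1.0 excluded prize + 1.0 edge cost + 2.0 * 1 tree)
```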

randomizations(noisy_edges_reps=0, random_terminals_reps=0)

Macro function which performs randomizations and merges the results

Note that these parameters are additive, not multiplicative: noisy_edges_reps = 5 and random_terminals_reps = 5 make 10 PCSF runs, not 25.

Parameters:
- noisy_edges_reps (int) – Number of “Noisy Edges” type randomizations to perform
- random_terminals_reps (int) – Number of “Random Terminals” type randomizations to perform
Returns:
- networkx.Graph: forest
- networkx.Graph: augmented_forest
Return type:	tuple of networkx.Graph
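Because the two kinds of randomization are run separately, the run count is a sum. This hypothetical helper only enumerates the randomized runs described in the note above; it does not perturb edges or terminals itself.

```python
def randomization_schedule(noisy_edges_reps=0, random_terminals_reps=0):
    """Hypothetical sketch: list the randomized PCSF runs implied by the counts.

    The two kinds of randomization run independently, so the total is a
    sum (additive), not a product.
    """
    runs = [("noisy_edges", i) for i in range(noisy_edges_reps)]
    runs += [("random_terminals", i) for i in range(random_terminals_reps)]
    return runs


schedule = randomization_schedule(noisy_edges_reps=5, random_terminals_reps=5)
print(len(schedule))  # 10, not 5 * 5 = 25
```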

grid_randomization(prize_file, Ws=[5], Bs=[1], Gs=[3], noisy_edges_reps=0, random_terminals_reps=0)

Macro function which performs grid search or randomizations or both.

Parameters:
- prize_file (str) – filepath
- Gs (list) – Values of gamma
- Bs (list) – Values of beta
- Ws (list) – Values of omega
- noisy_edges_reps (int) – Number of robustness experiments
- random_terminals_reps (int) – Number of specificity experiments
Returns:	Forest and augmented forest networkx graphs, keyed by parameter string
Return type:	dict
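Unlike the randomization counts, the grid dimensions combine multiplicatively: every (omega, beta, gamma) combination is a separate experiment. The sketch enumerates the grid with itertools.product; the parameter-string format used as the dictionary key is hypothetical and may not match the keys OmicsIntegrator2 produces.

```python
from itertools import product


def parameter_grid(Ws=(5,), Bs=(1,), Gs=(3,)):
    """Hypothetical sketch: one entry per (w, b, g) combination, keyed by a
    parameter string. The 'W_..._B_..._G_...' format is an assumption."""
    return {
        f"W_{w}_B_{b}_G_{g}": {"w": w, "b": b, "g": g}
        for w, b, g in product(Ws, Bs, Gs)
    }


grid = parameter_grid(Ws=[2, 5], Bs=[1, 2, 4], Gs=[3])
print(len(grid))        # 6  (2 * 3 * 1 combinations)
print(sorted(grid)[0])  # W_2_B_1_G_3
```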

grid_search(prize_file, Ws, Bs, Gs)

Macro function which performs grid search.

Parameters:
- prize_file (str) – filepath
- Gs (list) – Values of gamma
- Bs (list) – Values of beta
- Ws (list) – Values of omega
Returns:
- networkx.Graph: forest
- networkx.Graph: augmented_forest
- pd.DataFrame: parameters and node membership lists
Return type:	tuple

graph.betweenness(nxgraph)

Computes the betweenness of each node and adds it as a node attribute.

Betweenness centrality of a node v is the sum of the fraction of all-pairs shortest paths that pass through v.

Parameters:	nxgraph (networkx.Graph) – a networkx graph, usually the augmented_forest.
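The definition above can be computed with Brandes' algorithm. The sketch below is a stdlib-only version for unweighted, undirected graphs given as adjacency dicts, and it returns the unnormalized sum described above. It is not necessarily what OmicsIntegrator2 calls internally; networkx.betweenness_centrality, for example, normalizes by default, so its values differ by a constant factor.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes).

    adjacency: dict node -> iterable of neighbours (must be symmetric).
    """
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack = []
        preds = {v: [] for v in adjacency}
        sigma = dict.fromkeys(adjacency, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adjacency, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(adjacency, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(betweenness(path))  # {'A': 0.0, 'B': 2.0, 'C': 2.0, 'D': 0.0}
```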

graph.louvain_clustering(nxgraph)

Compute “Louvain”/“Community” clustering on a networkx graph, and add the cluster labels as attributes on the nodes.

Parameters:	nxgraph (networkx.Graph) – a networkx graph, usually the augmented_forest.
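Louvain clustering searches greedily for a partition with high modularity. Rather than reimplementing Louvain, the sketch below computes the modularity of a given labeling (unweighted graph, resolution 1), which is one way to compare the cluster labels this function assigns. The helper name and input shapes are assumptions.

```python
def modularity(edges, labels):
    """Newman modularity of a node partition on an unweighted, undirected graph.

    edges: list of (u, v) pairs without duplicates; labels: dict node -> cluster.
    Q = sum over clusters c of L_c / m - (d_c / (2m))**2, where L_c counts
    edges inside c and d_c is the total degree of c's nodes.
    """
    m = len(edges)
    inside, degree_sum = {}, {}
    for u, v in edges:
        for node in (u, v):
            c = labels[node]
            degree_sum[c] = degree_sum.get(c, 0) + 1
        if labels[u] == labels[v]:
            inside[labels[u]] = inside.get(labels[u], 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items())


# Two triangles joined by a single bridge edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
labels = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, labels), 4))  # 0.3571
```

Putting every node in one cluster gives a modularity of 0, so the two-triangle split scores higher, as expected.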

graph.k_clique_clustering(nxgraph, k)

Compute “k-Clique” clustering on a networkx graph, and add the cluster labels as attributes on the nodes.

See the networkx documentation for k_clique_communities: https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.community.kclique.k_clique_communities.html#networkx.algorithms.community.kclique.k_clique_communities

Parameters:
- nxgraph (networkx.Graph) – a networkx graph, usually the augmented_forest.
- k (int) – size of the smallest clique, as in networkx's k_clique_communities.
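k-clique clustering is based on clique percolation: a community is the set of nodes reachable by chaining k-cliques that share k-1 nodes. The brute-force sketch below illustrates that definition on a tiny graph. It enumerates every k-node subset, so it is only suitable for small examples, and it is not the networkx implementation. Nodes that belong to no k-clique end up in no community.

```python
from itertools import combinations


def k_clique_communities(adjacency, k):
    """Brute-force k-clique percolation, for small graphs only.

    adjacency: dict node -> set of neighbours (symmetric).
    Returns a list of node sets, one per community.
    """
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adjacency), k)
        if all(b in adjacency[a] for a, b in combinations(c, 2))
    ]
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge k-cliques that share k - 1 nodes.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(groups.values(), key=sorted)


# Two triangles sharing the edge (b, c), plus a separate triangle.
adj = {
    "a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"}, "d": {"b", "c"},
    "x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"},
}
print([sorted(g) for g in k_clique_communities(adj, 3)])
# [['a', 'b', 'c', 'd'], ['x', 'y', 'z']]
```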

graph.spectral_clustering(nxgraph, k)

Compute “spectral” clustering on a networkx graph, and add the cluster labels as attributes on the nodes.

Parameters:	nxgraph (networkx.Graph) – a networkx graph, usually the augmented_forest.

graph.annotate_graph_nodes(nxgraph)

Parameters:	nxgraph (networkx.Graph) – a networkx graph, usually the augmented_forest.

graph.summarize_grid_search(results, mode, top_n=inf)

Summarizes results of grid_randomization or grid_search into a matrix where each row is a gene and each column is a parameter run. If summarizing “membership”, entries will be 0 or 1 indicating whether or not a node appeared in each experiment. If summarizing “robustness” or “specificity”, entries indicate robustness or specificity values for each experiment.

Parameters:
- results (dict) – results of grid_randomization or grid_search, of the form {‘paramstring’: { ‘forest’: object, ‘augmented forest’: object}}
- mode (str) – the values to report: “membership”, “robustness”, or “specificity”
- top_n (int) – takes the top_n values of the summary dataframe; top_n=-1 sets no threshold
Returns:	Columns correspond to each parameter experiment, indexed by nodes
Return type:	pd.DataFrame
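For the “membership” mode, the summary is a node-by-experiment 0/1 table. The sketch below builds it from a dict mapping each parameter string to the set of nodes in that run's forest, using lists instead of a pandas DataFrame. The input shape, the helper name, and the row ordering (most frequently selected nodes first) are assumptions.

```python
def membership_matrix(forest_nodes_by_param):
    """Hypothetical sketch of a 'membership' summary.

    forest_nodes_by_param: dict paramstring -> set of nodes in that run's forest.
    Returns (row_labels, column_labels, rows) where rows[i][j] is 1 if node
    row_labels[i] appeared in run column_labels[j], else 0.
    """
    params = sorted(forest_nodes_by_param)
    all_nodes = set().union(*forest_nodes_by_param.values())
    rows = {n: [int(n in forest_nodes_by_param[p]) for p in params] for n in all_nodes}
    # Most frequently selected nodes first, ties broken by name.
    ordered = sorted(rows, key=lambda n: (-sum(rows[n]), n))
    return ordered, params, [rows[n] for n in ordered]


runs = {"W_5_B_1_G_3": {"TP53", "MDM2"}, "W_5_B_2_G_3": {"TP53", "CDKN1A"}}
row_labels, column_labels, rows = membership_matrix(runs)
print(row_labels)  # ['TP53', 'CDKN1A', 'MDM2']
print(rows)        # [[1, 1], [0, 1], [1, 0]]
```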

graph.get_robust_subgraph_from_randomizations(nxgraph, max_size=400, min_component_size=5)

Given a graph with robustness attributes, take the top max_size robust nodes and prune any “small” components.

Parameters:
- nxgraph (networkx.Graph) – Network from randomization experiment
- max_size (int) – Max size of robust network
- min_component_size (int) – components with fewer nodes than this are pruned
Returns:	Robust network
Return type:	networkx.Graph
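The two steps described above (keep the max_size most robust nodes, then prune small components) can be sketched without networkx as follows. The helper robust_subgraph, its dict-based robustness input, and the tie-breaking by node name are hypothetical.

```python
def robust_subgraph(robustness, edges, max_size=400, min_component_size=5):
    """Hypothetical sketch: keep the max_size most robust nodes, then drop
    connected components with fewer than min_component_size nodes.

    robustness: dict node -> robustness value; edges: list of (u, v) pairs.
    Returns the surviving nodes and the edges among them.
    """
    top = set(sorted(robustness, key=lambda n: (-robustness[n], n))[:max_size])
    kept_edges = [(u, v) for u, v in edges if u in top and v in top]
    neighbours = {n: set() for n in top}
    for u, v in kept_edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    survivors, seen = set(), set()
    for start in top:
        if start in seen:
            continue
        component, stack = {start}, [start]  # depth-first search
        while stack:
            for nxt in neighbours[stack.pop()] - component:
                component.add(nxt)
                stack.append(nxt)
        seen |= component
        if len(component) >= min_component_size:
            survivors |= component
    return survivors, [(u, v) for u, v in kept_edges if u in survivors]


scores = {"A": 1.0, "B": 0.9, "C": 0.8, "D": 0.7, "E": 0.1}
nodes, kept = robust_subgraph(scores, [("A", "B"), ("B", "C"), ("D", "E")],
                              max_size=4, min_component_size=2)
print(sorted(nodes), kept)  # ['A', 'B', 'C'] [('A', 'B'), ('B', 'C')]
```

Here E falls outside the top four, which leaves D as a singleton that the component filter then removes.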

graph.filter_graph_by_component_size(nxgraph, min_size=5)

Removes any connected components with fewer than min_size nodes.

Parameters:
- nxgraph (networkx.Graph) – Network from randomization experiment
- min_size (int) – Min size of components in nxgraph. Set to 2 to remove singletons only.
Returns:	Network with components smaller than the specified size removed.
Return type:	networkx.Graph
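The component filter can be sketched with a union-find over plain node and edge lists. The helper filter_by_component_size is hypothetical; as in the description above, min_size=2 removes only singletons.

```python
def filter_by_component_size(nodes, edges, min_size=5):
    """Hypothetical sketch: drop connected components with fewer than min_size nodes.

    nodes: node names, isolated nodes included; every edge endpoint must be
    in `nodes`. edges: (u, v) pairs.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    size = {}
    for n in parent:
        root = find(n)
        size[root] = size.get(root, 0) + 1
    kept = {n for n in parent if size[find(n)] >= min_size}
    return kept, [(u, v) for u, v in edges if u in kept]


kept, kept_edges = filter_by_component_size(["A", "B", "C", "Z"], [("A", "B"), ("B", "C")], min_size=2)
print(sorted(kept))  # ['A', 'B', 'C']  (the singleton Z is removed)
```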

graph.get_networkx_graph_as_dataframe_of_nodes(nxgraph)

Parameters:	nxgraph (networkx.Graph) – any instance of networkx.Graph
Returns:	nodes from the input graph and their attributes as a dataframe
Return type:	pd.DataFrame

graph.get_networkx_graph_as_dataframe_of_edges(nxgraph)

Parameters:	nxgraph (networkx.Graph) – any instance of networkx.Graph
Returns:	edges from the input graph and their attributes as a dataframe
Return type:	pd.DataFrame

graph.output_networkx_graph_as_pickle(nxgraph, output_dir='.', filename='pcsf_results.pickle')

Parameters:
- nxgraph (networkx.Graph) – any instance of networkx.Graph
- output_dir (str) – the directory in which to output the graph.
- filename (str) – Filenames ending in .gz or .bz2 will be compressed.
Returns:	the path of the file that was written
Return type:	Path
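Compression by filename suffix can be reproduced with the standard library's gzip and bz2 modules. The sketch below is a hypothetical helper, not OmicsIntegrator2's code: it chooses an opener from the suffix, pickles an object, and returns the written Path.

```python
import bz2
import gzip
import pickle
import tempfile
from pathlib import Path


def open_for_suffix(path, mode):
    """Pick gzip, bz2 or plain open() based on the filename suffix (sketch)."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, mode)
    if path.suffix == ".bz2":
        return bz2.open(path, mode)
    return open(path, mode)


def write_pickle(obj, output_dir=".", filename="pcsf_results.pickle"):
    """Hypothetical helper: pickle obj to output_dir/filename and return the Path."""
    path = Path(output_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    with open_for_suffix(path, "wb") as handle:
        pickle.dump(obj, handle)
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = write_pickle({"nodes": ["TP53", "MDM2"]}, tmp, "results.pickle.gz")
    with open_for_suffix(path, "rb") as handle:
        restored = pickle.load(handle)
    is_gzip = path.read_bytes()[:2] == b"\x1f\x8b"  # gzip magic number
print(restored)  # {'nodes': ['TP53', 'MDM2']}
print(is_gzip)   # True
```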

graph.output_networkx_graph_as_graphml_for_cytoscape(nxgraph, output_dir='.', filename='pcsf_results.graphml.gz')

Parameters:
- nxgraph (networkx.Graph) – any instance of networkx.Graph
- output_dir (str) – the directory in which to output the graph.
- filename (str) – Filenames ending in .gz or .bz2 will be compressed.
Returns:	the path of the file that was written
Return type:	Path

graph.output_networkx_graph_as_interactive_html(nxgraph, attribute_metadata={}, output_dir='.', filename='graph.html')

Parameters:
- nxgraph (networkx.Graph) – any instance of networkx.Graph
- output_dir (str) – the directory in which to output the file
- filename (str) – the filename of the output file
Returns:	the path of the file that was written
Return type:	Path