Iterative run of TimeXNet from command line
TimeXNet can be run iteratively over a range of gamma1 and gamma2 values to identify the combination that yields an optimal subnetwork: one with the largest number of starting genes and the fewest low-confidence edges.
Note that this option can take a long time to run, depending on how many combinations of gamma1 and gamma2 are tested. The program can be executed as follows:
java -classpath [full installation directory]/timexnet.jar timexnet.NetIterate [input network] [initial gene list] [intermediate gene list] [late gene list] [gamma_min] [gamma_max] [gamma_int] [output directory] [glpsol location]
It is necessary to specify the fully qualified path where the jar file is installed in order to run TimeXNet. "glpsol location" is an optional parameter.
The input parameters must be given in the specified order.
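Because the argument order matters, it can help to assemble the command programmatically. The following is a minimal Python sketch, not part of TimeXNet: the function name build_netiterate_command and every file path in the example are hypothetical placeholders. It only builds the argument list in the order shown above; the optional glpsol location is appended last when supplied.

```python
from typing import List, Optional


def build_netiterate_command(
    jar_path: str,
    network: str,
    initial_genes: str,
    intermediate_genes: str,
    late_genes: str,
    gamma_min: str,
    gamma_max: str,
    gamma_int: str,
    output_dir: str,
    glpsol: Optional[str] = None,
) -> List[str]:
    """Return the argument list for timexnet.NetIterate in the documented order."""
    cmd = [
        "java", "-classpath", jar_path, "timexnet.NetIterate",
        network, initial_genes, intermediate_genes, late_genes,
        gamma_min, gamma_max, gamma_int,
        output_dir,
    ]
    # "glpsol location" is optional, so it is only added when given.
    if glpsol is not None:
        cmd.append(glpsol)
    return cmd


# All paths below are hypothetical and only illustrate the argument order.
example = build_netiterate_command(
    "/opt/timexnet/timexnet.jar",
    "/data/network.txt",
    "/data/initial_genes.txt",
    "/data/intermediate_genes.txt",
    "/data/late_genes.txt",
    "0.5", "2", "0.5",
    "/data/timexnet_out",
)
print(" ".join(example))
```

Passing this list to subprocess.run(example, check=True) would launch the run. Using a list rather than a single command string avoids quoting problems when a path contains spaces.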
- Input network: A list of interactions with their directionality and their reliability score in the format:
- Molecules 1 and 2 can be an ID or a name.
- The direction of the interaction can be either "--" (bidirectional interaction) or "->" (uni-directional interaction).
- The reliability score is between 0 and 1.
- This data is currently provided in the form of a tab-delimited flat file.
The molecule name should be the same as in the gene lists.
- Initial gene list: A list of the genes of interest showing their greatest change in expression in the early hours after stimulation along with their scores in the format:
- Molecule can be an ID or name. The node identifier used here is the same as that used in the interactions file above.
- The score can be any positive real number that indicates the amount of change in expression of the gene.
- This data is also provided in the form of a tab-delimited flat-file.
- Intermediate gene list: A list of the genes of interest showing their greatest change in expression in the intermediate hours after stimulation along with their scores in the format specified above.
- Late gene list: A list of the genes of interest showing their greatest change in expression in the late hours after stimulation along with their scores in the format specified above.
- Gamma_min: The minimum value for gamma1 and gamma2. This should be a real positive value.
- Gamma_max: The maximum value for gamma1 and gamma2. This should be a real positive value.
- Gamma_int: A real positive value used as an interval to increment gamma1 and gamma2 from gamma_min to gamma_max. For example, if gamma_min is 0 and gamma_max is 2, with gamma_int = 0.5, then network predictions will be performed for all combinations of gamma1 and gamma2 for the values 0, 0.5, 1, 1.5 and 2.
- Output directory: Fully qualified path of the location where the output files should be stored.
- GLPSOL location: The fully qualified path to the GNU executable GLPSOL (including the name of the executable) used to solve the optimization problem, e.g. "c:\\Program Files (x86)\\GnuWin32\\bin\\glpsol.exe". This is an optional parameter and is used only if TimeXNet is not able to find a GLPK installation.
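To estimate how many runs a given gamma range implies, the grid described for gamma_min, gamma_max and gamma_int can be enumerated in advance. The Python sketch below is an illustration only: the helper gamma_values is hypothetical, and it assumes gamma_max is included whenever it falls exactly on the grid, as in the example above. How TimeXNet treats a gamma_max that is not a multiple of gamma_int from gamma_min is not specified here.

```python
from decimal import Decimal
from itertools import product


def gamma_values(gamma_min, gamma_max, gamma_int):
    """List gamma_min, gamma_min + gamma_int, ... without exceeding gamma_max."""
    # Decimal avoids floating-point drift when the step is added repeatedly.
    lo, hi, step = (Decimal(str(v)) for v in (gamma_min, gamma_max, gamma_int))
    if step <= 0:
        raise ValueError("gamma_int must be positive")
    values = []
    current = lo
    while current <= hi:
        values.append(float(current))
        current += step
    return values


values = gamma_values(0, 2, 0.5)
# Every pairing of a gamma1 value with a gamma2 value is one run.
combinations = list(product(values, repeat=2))
print(values)             # [0.0, 0.5, 1.0, 1.5, 2.0]
print(len(combinations))  # 25
```

The number of runs grows with the square of the number of gamma values, so halving gamma_int roughly quadruples the total run time.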
For each combination of gamma1 and gamma2 values, TimeXNet creates a folder, net_g1_g2 (g1, g2 indicate the values of gamma1 and gamma2 used), to store the following output files obtained for each run:
- lp_form_g1-g2: The problem formulation in the format required by glpsol
- lp_sol_g1-g2: The output file generated by glpsol that contains the solution to the optimization problem
- lp_sol_g1-g2.edges: The list of interactions with their flows parsed from the glpsol output file. The format is as follows:
Here "Type" can be one of "pp" or "pd" indicating a bi-directional or uni-directional interaction, respectively. This file is in a tab-delimited format and can be directly uploaded to Cytoscape to visualize the network.
- lp_sol_g1-g2.nodes: The list of nodes in the network given in lp_sol_g1-g2.edges with their associated flows, calculated by adding all the incoming flows per node. The format of this file is as follows:
In this case, Type can be one of SRC (initial genes), INT (intermediate genes), SNK2 (late genes) or NOD (predicted gene showing no change in expression). This file is also in a tab-delimited format and can be uploaded into Cytoscape.
- log_g1-g2: Log file showing the detailed progress of the TimeXNet run including the duplicate edges identified and ignored, edges and nodes with erroneous weights and scores, and the detailed output of the glpsol program.
- edge_lst_g1-g2: List of edges used to run the final cost flow optimization problem. This file represents the final input network.
- Additional files: TimeXNet also generates node and edge attribute files, along with a .sif file representing the network, that can be uploaded into Cytoscape.
TimeXNet also creates a summary file "net_stats" in the main output folder with the details of each run given in the following columns:
- Nodes: Total number of genes in the predicted response network
- Sources: Total number of "initial" genes in the predicted response network
- Intermediates: Total number of "intermediate" genes in the predicted response network
- Sinks: Total number of "late" genes in the predicted response network
- Predicted: Total number of genes without significant changes in expression predicted in the response network
- Edges: Total number of edges predicted in the response network
- Edges0.5: Total number of edges with reliability less than 0.5 predicted in the response network
- Edges0.3: Total number of edges with reliability less than 0.3 predicted in the response network
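The columns above can be used to shortlist gamma combinations. The Python sketch below shows one possible ranking and is not TimeXNet's own selection rule. It assumes that "starting genes" means the sum of Sources, Intermediates and Sinks, that ties are broken by Edges0.5 and then Edges0.3, and that each row of net_stats has already been read into a dictionary. The gamma1 and gamma2 keys and all numbers in the example are invented purely for illustration.

```python
def rank_runs(rows):
    """Sort runs: most input genes recovered first, then fewest low-confidence edges."""
    def key(row):
        recovered = row["Sources"] + row["Intermediates"] + row["Sinks"]
        # Negate the gene count so that larger counts sort first.
        return (-recovered, row["Edges0.5"], row["Edges0.3"])
    return sorted(rows, key=key)


# Invented values, for illustration only (not real TimeXNet output).
rows = [
    {"gamma1": 0.5, "gamma2": 0.5, "Sources": 10, "Intermediates": 8,
     "Sinks": 6, "Edges0.5": 12, "Edges0.3": 4},
    {"gamma1": 0.5, "gamma2": 1.0, "Sources": 12, "Intermediates": 9,
     "Sinks": 7, "Edges0.5": 15, "Edges0.3": 5},
    {"gamma1": 1.0, "gamma2": 0.5, "Sources": 12, "Intermediates": 9,
     "Sinks": 7, "Edges0.5": 11, "Edges0.3": 3},
]

best = rank_runs(rows)[0]
print(best["gamma1"], best["gamma2"])  # 1.0 0.5
```

A strict lexicographic order is only one way to balance the two criteria. A weighted score or a manual inspection of the top few runs may suit a particular data set better.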
Human Genome Centre, Institute of Medical Science, University of Tokyo