The primary difficulties in doing this are as follows:
The Pair HMM of Holmes 2020 (representing evolution along a single branch) is fully connected, whereas Historian assumes that the D->I transition probability is zero. This has knock-on ramifications for the composite HMMs used for parent-sibling triads (two branches) and grandparent-parent-sibling tetrads (three branches). The tetrads are never used directly; instead, Redelings-Suchard kernels are used to propose moves.
Any parts of the code that group insertions before deletions (implicitly assuming that the I->D transition is allowed but the D->I transition is not) may need to be updated. It is not yet clear whether any such code exists.
Alternatively (and probably better), when calculating the likelihood of a gap that includes both deletions and insertions, use combinatorial formulae to compute the likelihood of the gap summed over all I/D orderings.
Historian currently assumes that the transition probabilities of the Pair HMM can be computed in closed form; under the new model, a numerical approximation (e.g. fourth-order Runge-Kutta, RK4) is needed instead.
The current parameter-fitting method of Historian uses a (fudged) EM method to compute the indel rates. Instead, we will need a gradient ascent approach based on autodiff of RK4, and the methods that accumulate indel counts will need to be adapted to tally transition counts sorted by branch length.
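To make the numerical-approximation point concrete, here is a minimal sketch of a classical RK4 integrator in plain Python. The right-hand side `toy_rhs` is a hypothetical two-state decay process chosen only because its exact solution is known; it is NOT the system of ODEs from Holmes 2020, and the names `rk4_step`, `integrate` and `toy_rhs` are illustrative rather than anything that exists in Historian.

```python
import math

def rk4_step(f, t, y, h):
    """Advance dy/dt = f(t, y) by one classical RK4 step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def integrate(f, y0, t_end, steps=100):
    """Integrate from t = 0 to t = t_end (the branch length) in fixed steps."""
    h = t_end / steps
    t, y = 0.0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y

def toy_rhs(t, y, rate=0.3):
    # Placeholder dynamics (an assumption, not the Holmes 2020 equations):
    # y[0] = P(residue still present), y[1] = P(residue deleted).
    return [-rate * y[0], rate * y[0]]

# Integrate along a branch of length 2.0 and compare with the exact solution.
y = integrate(toy_rhs, [1.0, 0.0], t_end=2.0)
exact = math.exp(-0.3 * 2.0)
print(y[0], exact)
```

In the real update the state vector would hold the quantities from which the Pair HMM transition probabilities are derived, and the integrator would need to be written in a form that an autodiff tool can differentiate with respect to the indel rates.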
(A limited strategy may be to leave the current inference code in place for the ML alignment/reconstruction phase (and associated HMMs), and update only the MCMC and parameter-fitting code.
The viability of this strategy is unclear, though; it seems a little risky from a correctness standpoint.)
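Returning to the D->I issue listed among the difficulties: summing a gap's likelihood over I/D orderings can be sketched as a small forward recursion rather than a closed-form combinatorial formula. The sketch below is an illustration under assumptions, not Historian code: the state names `M`, `I`, `D`, the function `gap_likelihood` and the numbers in `TRANS` are all hypothetical, and emission probabilities are ignored so that only the transition part of the gap likelihood is computed.

```python
def gap_likelihood(trans, n_ins, n_del):
    """Sum, over every interleaving of n_ins insertions and n_del deletions,
    the product of transition probabilities along M -> ... -> M."""
    # fwd[(i, d)][s] = total probability of all paths that have used
    # i insertions and d deletions so far and currently sit in state s.
    fwd = {(0, 0): {"M": 1.0}}
    for i in range(n_ins + 1):
        for d in range(n_del + 1):
            for s, p in fwd.get((i, d), {}).items():
                if i < n_ins:  # extend the gap with one more insertion
                    nxt = fwd.setdefault((i + 1, d), {})
                    nxt["I"] = nxt.get("I", 0.0) + p * trans[s]["I"]
                if d < n_del:  # extend the gap with one more deletion
                    nxt = fwd.setdefault((i, d + 1), {})
                    nxt["D"] = nxt.get("D", 0.0) + p * trans[s]["D"]
    # Close the gap by returning to the match state.
    return sum(p * trans[s]["M"] for s, p in fwd[(n_ins, n_del)].items())

# Hypothetical fully connected transition matrix (each row sums to 1).
TRANS = {
    "M": {"M": 0.90, "I": 0.05, "D": 0.05},
    "I": {"M": 0.60, "I": 0.30, "D": 0.10},
    "D": {"M": 0.60, "I": 0.10, "D": 0.30},
}

# Same matrix with the D->I transition forbidden, as Historian assumes today.
NO_DI = {s: dict(row) for s, row in TRANS.items()}
NO_DI["D"]["I"] = 0.0

print(gap_likelihood(TRANS, 1, 1), gap_likelihood(NO_DI, 1, 1))
```

With one insertion and one deletion, the fully connected matrix counts both the I-then-D and the D-then-I path (about 0.006 in total here), whereas zeroing D->I leaves only the I-then-D path (about 0.003). That difference is the contribution that code grouping insertions before deletions would miss.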
The goal is to update the HMMs of Historian (ML and MCMC components) to use the systematic approximation to the indel model described here: https://academic.oup.com/genetics/article/216/4/1187/6065876
For a full update, at a bare minimum, the following code will need to be updated:
ProbModel::transProb
IndelCounts, EventCounts
One question is... would it be better to build a generic, parallelizable, MCMC-only system using Machine Boss?
Pros of doing it in Machine Boss:
Cons of doing it in Machine Boss: