Astrophysics comprises at least two distinct fields requiring different mathematics. Profiling black holes is a modeling exercise, while discovering unknown Saturn moons requires precise raw data mining.
Using one math to solve the other won’t work. In particular, using old discoveries to build a model that is blindly applied to new data would yield mostly false positives and negatives. To be clear, models are useful for gross pre-filtering, but not for novel discovery, because they cannot embody yet-to-be-discovered insights.
It’s easy to explain why clinical proteomics has had low success
The math is unfit for purpose. Labs mistakenly conflate proteomics’s multiple mathematical fields into one, and blindly apply ad hoc models (often free and fast PC downloads) that produce costly false positives.
Recall that proteomics originated as a modeling exercise to catalog every protein in the human ‘proteome’. Since a first draft need not be real science, “homemade” and even opaque statistical models can pass peer review, 1% FDR is adequate, and individual accuracy is irrelevant. Such analyses can be highly irreproducible.
But along the way, “clinical proteomics” evolved, for example to discover novel biomarkers. Targets of interest may be ultra-low-abundance, modified proteins manifested in relatively few data-points hiding in huge datasets. Unfortunately, the same ad hoc models (e.g. “probability scores” instead of rigorous p-values) are still used for this very different job, which is clearly an oversight. As a result, every lab has been overwhelmed by false positives.
For the record, we recently invented the solution (e.g. SorcererScore™) based on precise raw data mining similar to discovering Saturn moons. Parts were published in posters presented to top researchers with no pushback. It turns out that any needle-in-a-haystack discovery necessarily involves very few data-points, so the math has to be simple. In a nutshell, around eight of the lowest singly charged (+1) y-ions, if near-perfectly matched, are typically enough to identify the modified protein form to a large degree.
Therefore, as with dots from elusive Saturn moons, the computational challenge is not the math but finding the critical few data-points without knowing exactly what to expect, whether they even exist, or where they are.
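To make the “eight lowest +1 y-ions” idea concrete, here is a minimal sketch of what such a match could look like. It is not SorcererScore™ or any published algorithm; the function names, the 10 ppm tolerance, and the example peptide are all assumptions chosen for illustration. The only fixed inputs are the standard monoisotopic residue masses.

```python
# Illustrative sketch only: hypothetical helper names, assumed 10 ppm tolerance.
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, added to every y-ion
PROTON = 1.007276   # one proton for the +1 charge state


def lowest_y_ions(peptide, n=8):
    """Return m/z values of the n lowest singly charged y-ions, y1 first."""
    mz_values = []
    running = WATER + PROTON
    # y-ions grow from the C-terminus, so walk the sequence backwards.
    # peptide[1:] skips the first residue so the intact peptide is excluded.
    for residue in reversed(peptide[1:]):
        running += RESIDUE_MASS[residue]
        mz_values.append(running)
        if len(mz_values) == n:
            break
    return mz_values


def count_matches(theoretical, observed, tol_ppm=10.0):
    """Count theoretical ions that have an observed peak within tol_ppm."""
    matched = 0
    for t in theoretical:
        if any(abs(o - t) / t * 1e6 <= tol_ppm for o in observed):
            matched += 1
    return matched


# Hypothetical example: a made-up peptide and a made-up peak list.
ions = lowest_y_ions("SAMPLEPEPTIDEK")
peaks = [mz + 0.0005 for mz in ions[:7]] + [500.0]
print(count_matches(ions, peaks), "of", len(ions), "y-ions matched")
```

The point of the sketch is that the scoring itself is simple arithmetic. The hard part, as argued here, is locating the handful of relevant peaks in a huge dataset in the first place.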
The discovery and fix of the math problem changes everything.
Proteomics seems to be in a holding pattern awaiting a data analysis breakthrough. Most people probably expected the breakthrough to arrive as a scholarly statistics paper from an academic lab, not an email from a Silicon Valley math geek from MIT. On the other hand, maybe it’s not shocking that computational discoveries can come from computation experts.
To be honest, something always seemed amiss. Proteomics uniquely relies on fast PC programs instead of powerful servers for huge datasets. (It’s hard to imagine serious astrophysicists hunting moons with PCs.) But since no one found the problem, the field staggered forward with minimal success. Once projected to grow rapidly (33%/year, or doubling roughly every 2.5 years), it stalled just short of $1B/year because of the hidden problem.
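As a quick sanity check on the growth figure, compound growth at a fixed annual rate doubles in ln(2) / ln(1 + rate) years. The short sketch below applies that standard formula to the 33%/year projection; the function name is just for illustration.

```python
import math


def doubling_time_years(annual_growth_rate):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)


# 33%/year works out to a little under 2.5 years per doubling.
print(round(doubling_time_years(0.33), 2))
```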
The near-$1B market will soon double once it realizes its potential to accelerate medical discoveries. The key is to be able to mine the content-rich data. This is not a push-button software play, but rather a semi-custom script on a data mining platform. Note that infrastructure, whether bridges or data platforms, is ill-suited to first-time builders because of the demands of robustness, maintenance, and the cost of failure.
The bottom line
Odds are, any clinical proteomics project using conventional statistical workflows will fail from unfit math, with false positives prolonging false hopes. But labs are caught in a Catch-22: their funding probably stipulates using peer-reviewed workflows that won’t work. One political solution may be to run a parallel data mining workflow, both to cross-check false positives and as an insurance policy if/when the conventional workflow fails.
In my experience, too many labs conflate software and technology. A “software” product is always available, so maybe you wait for a special deal, or get your grad student to write a little app. A “technology” product is more like a stock investment: you get in as early as possible, before it’s too late.