The 2018 Nobel laureate James Allison brings inspiration to all maverick researchers. He bucked convention and toiled at the fringes, relying on custom assays for the protein work behind his breakthroughs in immunotherapy.
Conventional assays use chemistry and can take days to weeks to create, if they can be created at all. That's why one experiment can take weeks to months, while breakthroughs take years to decades. But what if we could accelerate custom assays by 10X?
Digitalizing peptides with mass spectrometry transforms assays from chemical to algorithmic (and labs into tech startups). Although it converts tricky chemistry into tricky m/z Sudoku, the latter becomes solvable with computers, which, unlike chemistry, improve exponentially over time. That's the game-changer!
For example, drug discovery can be accelerated by growing cell cultures with versus without isotope-labeled amino acids, then differentially quantifying specific pathway proteins with a high-sensitivity SILAC “cyber-assay” (SORCERER™ algorithm).
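To make the idea concrete, here is a minimal sketch of the differential readout, not the SORCERER™ algorithm itself: each peptide appears as a “light” peak and an isotope-shifted “heavy” peak, and the heavy/light intensity ratio reports up- or down-regulation. The label shift, tolerance, and toy peak list below are illustrative assumptions.

```python
# Minimal SILAC-style differential quantification sketch (illustrative only).
# Each tryptic peptide shows up twice in the MS1 data: a "light" form and a
# "heavy" form shifted by the label mass (e.g., +8.0142 Da for one Lys-8 label).

LYS8_SHIFT = 8.0142   # Da, heavy-lysine (13C6 15N2) label; assumes one labeled Lys
PPM_TOL = 10e-6       # 10 ppm matching tolerance (assumed)

def find_peak(peaks, target_mz, tol_ppm=PPM_TOL):
    """Return the most intense (m/z, intensity) peak within tolerance, or None."""
    matches = [(mz, inten) for mz, inten in peaks
               if abs(mz - target_mz) <= target_mz * tol_ppm]
    return max(matches, key=lambda p: p[1]) if matches else None

def silac_ratio(peaks, light_mz):
    """Heavy/light intensity ratio for one singly charged peptide peak pair."""
    light = find_peak(peaks, light_mz)
    heavy = find_peak(peaks, light_mz + LYS8_SHIFT)
    if light and heavy:
        return heavy[1] / light[1]
    return None  # one or both forms below the limit of detection

# Toy MS1 peak list: (m/z, intensity) pairs for a hypothetical pathway peptide.
ms1_peaks = [(738.3724, 1.2e6), (746.3866, 3.1e6)]
print(silac_ratio(ms1_peaks, 738.3724))   # ~2.6x up-regulated in the labeled culture
```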
With SORCERER CyberAssay™ technology designed for rapid turnaround, a high-skill lab can in theory create and run a custom assay practically every day, which is 10X more productive than labs running a conventional assay every two weeks.
The secret: exponential productivity
The Industrial Revolution delivered high-throughput with physical machines (e.g. printing press) whose capability increases linearly with time.
The Information Revolution raises this to another level with exponential productivity through electronic machines, notably computers and mass spectrometers, whose capability doubles every year or two. At least, that's the potential; it takes skilled engineers and scientists to realize it.
Mass spectrometry data actually grow faster than Moore's Law, so the acquisition-vs-analysis gap widens exponentially every year. That means having the latest PC is not enough. Just keeping pace means adding exponentially more CPU cycles every year. In practice, this means running multi-core servers overnight and shifting to a programmable heterogeneous workflow that focuses most CPU cycles on the most promising subsets. For example, infection researchers would deep-analyze only the non-human data, which make up less than 1% of the bio-sample.
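As a rough sketch of such a heterogeneous workflow (the function names, scores, and cutoff are our own illustrative assumptions, not a SORCERER interface), a cheap first pass tags spectra that match the human host, and only the small unmatched remainder is routed to the expensive deep analysis:

```python
# Sketch of a programmable heterogeneous workflow (illustrative only).
# Stage 1: a fast, shallow pass scores every spectrum against the host (human) proteome.
# Stage 2: only the unmatched remainder (often <1% of an infection bio-sample)
#          receives the exhaustive, CPU-hungry deep analysis.

def shallow_host_search(spectrum):
    """Cheap first-pass score against the host database (stand-in for a real search)."""
    return spectrum.get("host_score", 0.0)    # placeholder scoring

def deep_analysis(spectrum):
    """Expensive wide-open search reserved for promising spectra (stand-in)."""
    return {"scan": spectrum["scan"], "verdict": "deep-searched"}

def heterogeneous_workflow(spectra, host_cutoff=0.9):
    confident_host, candidates = [], []
    for s in spectra:
        (confident_host if shallow_host_search(s) >= host_cutoff else candidates).append(s)
    # Spend the bulk of CPU cycles only on the small candidate subset.
    return [deep_analysis(s) for s in candidates], len(confident_host)

# Toy run: two host-like spectra, one unexplained spectrum worth deep analysis.
spectra = [{"scan": 1, "host_score": 0.95},
           {"scan": 2, "host_score": 0.99},
           {"scan": 3, "host_score": 0.10}]
deep_hits, n_host = heterogeneous_workflow(spectra)
print(n_host, deep_hits)   # 2 [{'scan': 3, 'verdict': 'deep-searched'}]
```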
Understanding software as cyber-assays
Many chemists and biologists seem intimidated by computational workflows, but parallels can be drawn between chemical assays and data processing that deepen understanding of both.
A conventional assay uses chemistry to detect/quantify a specific protein within a bio-sample. An immunoassay is created by purifying the protein (hard), injecting the protein into a rat, and harvesting its antibodies to affix to a surface. The assay is then bathed in a bio-sample solution to see how much sticks. Note that low-quality assays can work for test mixtures but yield irreproducible results for real-world bio-samples.
Tandem mass spectrometry (MSMS) can also detect/quantify one or more proteins, but using fragment ion masses instead. Since every peptide can be fragmented, MSMS can be viewed as a universal multiplexed assay. As noted, its capability grows exponentially over time, but computing requirements grow even faster. Like chemical assays, low-quality MSMS cyber-assays can work for test mixtures but not necessarily for bio-samples. Cyber-assays have significant differences that can be confusing.
We can think of a chemical assay in terms of each protein carrying an ID barcode and each antibody acting as a scanner looking for a match. Agitating the solution allows every molecule to be scanned by an antibody. In contrast, cyber-assays can only scan digitalized molecules.
For the cyber-assay of a single protein, its barcode is the protein sequence and the scanner is the computer algorithm. Since MSMS data come from digested proteins, the barcode is broken up and scattered as peptide sequences. Additionally, signal-to-noise and limit-of-detection constraints mean m/z peaks from the lowest-abundance peptides are distorted or missing. Therefore, as a practical matter, any protein of clinical interest tends to be elusive, with incomplete or ambiguous data.
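A tiny sketch makes the scattered barcode concrete: an in-silico tryptic digest breaks a protein sequence into the peptide pieces a cyber-assay must actually look for, each identified by its mass. The toy sequence and simplified cleavage rule below are our own illustration:

```python
import re

# Monoisotopic residue masses (Da); peptide mass = sum of residues + one water.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
           "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
           "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
           "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER = 18.01056

def tryptic_digest(protein):
    """Simplified trypsin rule: cleave after K or R unless followed by P."""
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", protein) if pep]

def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER

toy_protein = "MKWVTFISLLFLFSSAYSRGVFRRDAHK"   # illustrative sequence only
for pep in tryptic_digest(toy_protein):
    print(f"{pep:20s} {peptide_mass(pep):10.4f}")   # the scattered 'barcode' pieces
```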
Three key insights: First, sensitivity is proportional to data volume, preferably unfiltered data with wide dynamic range. Second, computation scales linearly with data volume but exponentially with signal noise, so finite computing calls for a targeted approach. Finally, no automated algorithm can match a scientist's sixth sense, since humans have a higher understanding than any finite-dimensional computer model.
Hence the three parts needed to detect/quantify an elusive protein: [1] massive amounts of deep data, [2] a semi-custom server algorithm to find raw m/z evidence, and [3] scientific experts to interpret ambiguous detection/quantitation.
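As a minimal sketch of part [2] (not the SORCERER implementation; the spectrum format, fragment list, and thresholds are assumed for illustration), finding raw m/z evidence can be as simple as scanning every raw spectrum for the fragment ions predicted from the target peptide and handing borderline scans to an expert:

```python
# Sketch of targeted raw-evidence extraction (illustrative only).
# Given fragment m/z values predicted for a target peptide, count how many appear
# (within tolerance) in each raw MS/MS spectrum, and keep candidate scans for
# expert review instead of auto-accepting a statistical match.

TOL = 0.02   # Da fragment tolerance (assumed)

def fragment_evidence(spectrum_mz, predicted_fragments, tol=TOL):
    """Fraction of predicted fragment ions observed in one spectrum's peak list."""
    hits = sum(any(abs(mz - frag) <= tol for mz in spectrum_mz)
               for frag in predicted_fragments)
    return hits / len(predicted_fragments)

def screen_raw_file(spectra, predicted_fragments, min_coverage=0.5):
    """Return (scan, coverage) for spectra with enough raw evidence to review."""
    return [(scan, cov) for scan, mz_list in spectra
            if (cov := fragment_evidence(mz_list, predicted_fragments)) >= min_coverage]

# Toy data: predicted fragment m/z values for a hypothetical target peptide,
# plus two raw spectra given as (scan number, m/z peak list).
predicted = [175.119, 276.166, 405.209, 534.251]
spectra = [(101, [175.120, 405.210, 534.250, 880.1]),   # 3 of 4 fragments present
           (102, [150.0, 300.0, 700.0])]                # no supporting evidence
print(screen_raw_file(spectra, predicted))   # [(101, 0.75)]
```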
The cyber-assay abstraction clarifies misconceptions. Some labs worried about data-analysis throughput choose instruments that generate smaller data files, but that compromises sensitivity. (The only correct answer is to fully fund IT in any competitive information science.) Many labs seek fast, fully automated PC programs for high-throughput productivity. But the math shows they are limited to abundant proteins at best and irreproducible results at worst, hardly delivering true productivity.
Most importantly, any cyber-assay that uses spectral filtering or statistical matching (basically all except CyberAssay [click here]) means low specificity, a fundamental problem for biomarker discovery.
Immuno-proteomics is a prototypical field ripe for ground-breaking discoveries but poorly served by canned methodologies. Given current understanding, immune-peptide sequencing is computationally intractable for large spectra sets. However, any one spectrum can be sequenced with high confidence using an outlined strategy [click here]. Custom cyber-assays are ideal for unlocking its secrets.
In our view
If exponential productivity requires a flexible server platform applying custom cyber-assays, then the de facto standard of canned PC programs implies constant low productivity. This inconvenient truth is consistent with objective reality.
The problem: The same echo-chamber dynamics that perpetuate fake news via social media are also propagating irreproducible proteomics via peer review. In a paradigm shift, peers may no longer be experts, but politics and misinformation get in the way of seeking out new experts and new methodologies. This allows out-of-the-box upstarts to leapfrog established leaders.
The misconceptions: Chemists know the data best, and data analysis (given high accuracy) is trivial “programming,” so the best software must be written by chemists who program. And since all “software” superficially looks the same (common benchmarks show 90%+ agreement), the best software must also be the most efficient, i.e., the fastest and cheapest while reporting the most IDs, usually from academic labs.
The fallacy: Misunderstanding signal-to-noise conflates shallow and deep analysis. Noise-free data analysis is easy but practically useless. Robust real-world application, including noise suppression, is the art and focus of engineering.
As an analogy, consider three categories of tax software: [1] low-priced PC products for beginners (e.g. TurboTax), [2] a professional platform for CPAs handling multi-nationals, and [3] downloadable student projects using statistical models to guesstimate tax from summary data.
They correspond to: [1] canned shallow analysis, [2] interactive deep analysis, and [3] hit-or-miss approximation, respectively. Tax pros make modest pay with #1, millions with #2, and lose their job with #3.
A common rookie mistake is to run a benchmark without objectives or hypotheses, using simple test data to “see what happens.” For starters, all three would compute near-identical shallow results, which falsely implies they are functionally equivalent. Worse, faulty error estimation actually makes the least accurate software look the best. Instead, benchmarking with real data for deep identifications will tell a more meaningful story.
Every few years the community hails a new PC algorithm as a global breakthrough, only to have it fade away, a predictable outcome when research prototypes are confused with robust products. Like alchemists set on turning Pb into gold, labs do the same thing over and over expecting different results. Mismatched exponential scaling explains why PC programs are increasingly futile for real research. A proteomics lab with no servers is basically a tech startup with no tech.
SORCERER accelerates success
Scientists are anxious about funding, but that's inside-out thinking. It's exponentially more productive to focus on success, which solves funding and most other problems. But time is of the essence.
Cyber-assays are a novel inside-out paradigm for precise deep proteomics that anyone can understand. However, quality development requires both the science of peptide fragmentation and sound engineering trade-offs.
Proteomics may be reaching an inflection point, transformed by 10X technologies like ion mobility MSMS and CyberAssay. This means the race is on to master key technologies, with a typical window of opportunity of perhaps two years to position yourself. For most bio-science labs, the Achilles' heel is the IT platform and know-how.
CyberAssay is a collection of patented and patent-pending algorithmic technologies on the scalable SORCERER platform. SORCERER is available in three primary configurations: the symbolically priced developer license (virtual machine suitable for laptops), the low-priced SORCERER X™ entry-level integrated data appliance (iDA), and the flagship SORCERER Pro™ iDA for serious research.
During this critical time, Sage-N Research, Inc. is focusing on contract services on our platform for rapid cyber-assay development. We seek scientific consultants and business partners to help develop custom cyber-assays for present and future clients pursuing breakthrough discoveries.
Please contact Terri (Sales@SageNResearch.com) for information.