Eight of the ten top-selling pharmaceutical products in 2016 were biopharmaceuticals – drugs produced in or obtained from biological sources – most of which are large therapeutic proteins. The advantages of proteins as therapeutic agents are their high specificity, large therapeutic impact, and long in vivo half-life. Because therapeutic proteins have a very complex structure and are produced in host organisms, drug development is very challenging: the biotechnological production process and its natural variability significantly affect the chemical structure and hence the biological and therapeutic function of the target molecule. Therefore, biotherapeutics have to be extensively characterized in order to prove the structural integrity and therapeutic safety of the product. From the “totality-of-the-evidence” concept of biosimilarity, we learn that multiparametric, physico-chemical characterization represents the foundation of the evidence, which must provide sufficient structural, functional, nonclinical, and clinical data to demonstrate that no clinically meaningful differences in quality, safety, or efficacy are observed compared to a reference product. This concept is also essential in the approval of “generic” biopharmaceuticals (so-called biosimilars) after expiration of patent protection for the originator biopharmaceuticals.

The Christian Doppler Laboratory for Innovative Tools for the Characterization of Biosimilars (in short, CDL for Biosimilar Characterization) was, therefore, established in 2013 at the Paris-Lodron University of Salzburg with the ambitious aim of developing and implementing novel analytical tools for the multiparametric, physico-chemical characterization of molecular attributes of biotherapeutics. Five scientific experts of the Department of Biosciences contributed their know-how: Hans Brandstetter in protein structural and functional analysis, Chiara Cabrele (later together with Mario Schubert) in polypeptide synthesis and modification as well as in nuclear magnetic resonance spectroscopy, Gabriele Gadermaier in protein expression and biochemical and functional protein characterization, Christian Huber in protein separation and analysis by chromatography and mass spectrometry, and Hanno Stutz in protein separations by electromigrative techniques. The scientific program was shaped through the cooperation with two industry partners that rank among the largest companies worldwide in their fields: Novartis in the area of biosimilar production, and Thermo Fisher Scientific in the marketing of analytical solutions for protein characterization. Consequently, the CDL research team was confronted with cutting-edge, real-life problems of the biopharmaceutical and instrument industry. The collaboration in the Christian Doppler Laboratory was extremely friendly and productive, leading to a number of novel tools for the characterization of both the structure and the function of therapeutic proteins.

Project area 1.1 delivered and archived a dedicated set of molecules highly relevant in the context of therapeutic protein characterization. Biotherapeutics for various therapeutic indications ranging from cancer to inflammatory diseases, including MabThera®, Rituxan®, Neupogen®, Neulasta®, Enbrel®, Avastin®, Humira®, Herceptin®, and Ovitrelle®, were mainly supplied as originator molecules and obtained from the respective provider. Moreover, a set of allergenic proteins was recombinantly produced and characterized in order to provide protein-based model systems for studying different parameters such as protein sequence, 3D structure, post-translational modifications (PTMs), and degradation. These proteins were utilized throughout the research program to develop and validate novel characterization methods targeting the structure and function of the biotherapeutics.

Therapeutic proteins typically consist of 20 structurally diverse building blocks – amino acids – that are assembled into long linear chains of up to several hundred building blocks, which fold into a characteristic three-dimensional structure that is essential for the correct biological function of the biomolecule. Moreover, small chemical or enzymatic modifications (post-translational modifications) may be introduced during or after production via bioprocesses, and these usually influence the biological function of the biotherapeutic quite significantly. Additionally, therapeutic proteins are commonly quite sensitive to external conditions such as heat, UV light, and inappropriate buffer conditions, which lead to artificial chemical modifications. In project area 1.2, the analytical technique of nuclear magnetic resonance (NMR) spectroscopy, which is recognized as the gold standard for verifying the exact identity of an organic compound or identifying an unknown compound, was implemented to study the detailed molecular structure of biotherapeutics. The principal limitation of NMR technology with respect to large molecules could be overcome by measuring the macromolecular compounds under conditions that unfold protein structures into flexible chains (denaturing conditions), which eliminates the problems of spectral complexity and low signal quality normally encountered with biotherapeutics.

In addition to the known signals of all 20 amino acids, post-translational modifications lead to a characteristic pattern of signals, which can be exploited in the search for structurally and functionally relevant modifications in biotherapeutics. Hence, the goal of project area 1.2 was to explore the characteristic signatures of a variety of modifications, which can be used to prove their presence in an investigated protein. Unique and characteristic NMR signatures were obtained using small synthetic peptides containing the modification under investigation. The expertise in peptide synthesis available in this project area was a clear advantage. Modifications that could be unambiguously identified include the oxidation of methionines and tryptophans, the cyclization of N-terminal glutamine, the deamidation of asparagines, peptide chain cleavage, and N-terminal gluconoylation.

For therapeutic antibodies, the exact recognition site on the antigen (the epitope, i.e. the structure recognized in the target molecule) is of crucial importance. Until now, no reliable, robust, and efficient method has been available to identify the epitope of an antibody. X-ray crystallography is presumably the best available method for this purpose, but it is quite laborious and offers no guarantee of finding the epitope, because it relies on crystal formation of antibody-antigen complexes. As an alternative, we developed an NMR method based on the measurement of the exchange rate between hydrogen and deuterium for each amino acid residue in the complex between antibody and antigen. This approach, which we called hydrogen/deuterium exchange memory (HDXMEM), was successfully applied to identify epitopes of the allergen Art v 3 from common mugwort (Artemisia vulgaris) recognized by three different monoclonal antibodies. The method is generally applicable to identifying the epitope of any monoclonal antibody.
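
The comparison idea behind such an exchange-based epitope search can be sketched in a few lines of Python. This is only an illustration, not the HDXMEM procedure itself: the function name, the five-fold threshold, and all exchange rates below are hypothetical and chosen purely for demonstration. The sketch assumes that residues buried in the antibody-antigen interface exchange hydrogen for deuterium more slowly than the same residues in the free antigen.

```python
def protected_residues(free_rates, bound_rates, min_fold_reduction=5.0):
    """Return residue numbers whose exchange slows markedly upon antibody binding.

    free_rates / bound_rates map residue number -> exchange rate (arbitrary units).
    min_fold_reduction is a hypothetical cut-off, not a value from the project.
    """
    protected = []
    for residue, k_free in sorted(free_rates.items()):
        k_bound = bound_rates.get(residue)
        if k_bound is None:
            continue  # no measurement for this residue in the complex
        # A rate of zero in the complex means exchange was fully suppressed.
        if k_bound == 0 or k_free / k_bound >= min_fold_reduction:
            protected.append(residue)
    return protected


# Hypothetical exchange rates for four residues of an antigen.
free = {10: 2.0, 11: 1.8, 12: 2.2, 13: 1.9}
bound = {10: 1.9, 11: 0.2, 12: 0.1, 13: 1.7}

print(protected_residues(free, bound))
```

In this toy data set only residues 11 and 12 exchange much more slowly in the complex, so they would be the candidates for the epitope.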

Most changes in the chemical structure of a biotherapeutic are associated with a change in the properties of the molecular surface or in the molecular mass of the protein, which can be sensed by the analytical techniques of high-performance liquid chromatography (HPLC) and mass spectrometry (MS), respectively. Therefore, project area 1.3 explored new approaches to detect and quantify such changes by HPLC or MS. Very often, both techniques are utilized in combination, then termed HPLC-MS. Taking advantage of the high sensitivity of HPLC and MS towards small structural differences, it was possible to reveal the different molecular species evolving during the biotechnological production of therapeutic antibodies in a rapid and efficient “dilute-and-shoot” workflow. In another workflow employing a special MS technique, native MS, more than 100 slightly different molecular species, so-called proteoforms, of the protein therapeutic etanercept (Enbrel®) were discovered. The insight that nature provides the tools to produce functional proteins in an enormous space of structural variability, and that we can find ways to experimentally verify and characterize (at least part of) the complexity of such proteoforms, was well appreciated by the scientific community. It led to an invitation to join an international panel of experts in protein characterization to contribute to a perspective article dealing with the question “How many human protein isoforms are there”, which was published in Nature Chemical Biology and has been cited more than 350 times since publication in 2018.
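
To illustrate why mass spectrometry can sense such modifications, the following minimal Python sketch applies well-known monoisotopic mass shifts to an unmodified protein mass. The 25,000 Da protein and the helper names are hypothetical and serve only as an example; the shift values are standard figures rounded to four decimals, not data from the project.

```python
# Monoisotopic mass shifts in dalton (Da) for a few common modifications.
MASS_SHIFT_DA = {
    "oxidation": 15.9949,                # addition of one oxygen atom
    "deamidation": 0.9840,               # asparagine side-chain amide -> acid
    "pyroglutamate_from_gln": -17.0265,  # N-terminal glutamine cyclizes, losing NH3
}


def modified_mass(unmodified_mass_da, modification_counts):
    """Return the expected mass after applying the given numbers of modifications."""
    shift = sum(MASS_SHIFT_DA[name] * count
                for name, count in modification_counts.items())
    return unmodified_mass_da + shift


def relative_difference_ppm(reference_mass_da, other_mass_da):
    """Return the mass difference relative to the reference, in parts per million."""
    return abs(other_mass_da - reference_mass_da) / reference_mass_da * 1e6


# Hypothetical protein subunit of 25,000 Da carrying a single deamidation.
native = 25000.0
deamidated = modified_mass(native, {"deamidation": 1})
print(round(deamidated, 4), round(relative_difference_ppm(native, deamidated), 1))
```

A single deamidation changes the mass of this hypothetical protein by less than 1 Da, which gives a feeling for the mass accuracy and resolution needed to distinguish such variants.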

Changes in the three-dimensional structure of proteins, so-called conformational changes, are known to affect protein functions. Capillary electrophoresis (CE) separates biotherapeutics according to their electrical charge and molecular shape in an electrical field applied across fused silica capillaries of miniaturized diameter; its general applicability to the distinction of such conformational variants was tested in project area 1.4. Individual protein variants were identified by direct combination with MS (CE-MS). The recombinant major allergen from mugwort (rArt v 3.0201) was selected as a model protein and exposed to different stress conditions such as extreme pH values and increased temperatures for various durations. The induced conformational changes were confirmed with far-UV circular dichroism spectroscopy and affected the reactivity of the allergen with immunoglobulin E, which is highly relevant for the allergic response of the immune system to the allergen. A disruption of the original intramolecular bridging (i.e. by disulfide bonds) was identified as the molecular origin of the changes in the protein structure. Instead, new intramolecular thioether bonds, so-called lanthionines, were formed, which are responsible for the changes in the protein structure. Moreover, oxidations and deamidations of the allergen were identified, generating a total of 41 novel variants of the allergen. The identified molecular principle of the structural changes of the model protein in response to heating helps to explain the observed loss of allergenicity of food allergens in heat-processed food. Since similar changes were previously also observed in mAbs, the demonstrated applicability of CE-MS may constitute an innovative tool for differentiating and identifying such variants.

An improved performance of CE for the separation of structural variants of the biotherapeutics was achieved through chemical modification of the surface of the fused silica separation capillaries. A novel coating composed of alternating layers of countercharged polyelectrolytes was developed and successfully applied in the separation of variants of a challenging therapeutic mAb, i.e. rituximab. An innovative mode of atomic force microscopy (AFM-TREC) was applied for simultaneous imaging of the topography and the surface charge distribution of the coating. This multilayer coating of capillaries was successfully applied in CE-MS analysis of mAbs. For the first time, quasi-native separation conditions were realized. Enzymatically generated fragments of two mAbs (original and copy product of rituximab) were separated and identified by CE-MS, which facilitated the distinction of charge variants (lysine- and deamidation variants) as well as glycovariants. In addition, another CE-based separation mode performed in a pH gradient (capillary isoelectric focusing, CIEF), which is based on a distinction of isoelectric points of different protein (variants), was developed for the characterization of mAbs. For the first time, an integrated CIEF strategy under non-denaturing conditions, which combines results of intact mAbs and corresponding fragments, was realized.

Proteins, like all biomolecules, in general serve important biological functions. They do so either alone or in concert with other biomolecules. The latter “contextual function” is mediated by principles like molecular recognition and modifications. To characterize these properties, we developed two novel key techniques in project area 1.5 that inherently capitalize on exactly these molecular principles, i.e. the aptamer technology and the enzyme cascade.
An aptamer is a nucleic-acid-based biomacromolecule (typically an oligonucleotide) that is able to adopt a characteristic three-dimensional structure, which binds very specifically to a well-defined sub-structure of a target molecule. The term aptamer derives from Latin and Greek roots: aptus (“fitting”) and meros (“part”). Indeed, aptamers are perfectly fitting parts with manifold applications, including the characterization and discrimination of proteins of interest by their binding properties. Starting from large compound banks with billions of different oligonucleotides, the aptamer technology uses repetitive amplification and selection steps to identify oligonucleotides that bind to the protein of interest with high affinity and selectivity. We have utilized a set of selected aptamers to probe the surface of biotherapeutics in order to confirm their correct structure or to detect minute differences in the structure, which result in weaker binding of the specific aptamers.
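
Why repeated selection and amplification steps can pick a rare binder out of billions of candidates may be illustrated with a deliberately simplified Python model. All numbers and names are hypothetical, and the model ignores experimental realities such as amplification bias or non-specific binding; it is a sketch of the enrichment principle, not of the protocol used in the project.

```python
def selection_round(fractions, binding_probabilities):
    """Apply one selection step followed by amplification to a pool.

    fractions: share of each oligonucleotide species in the pool (sums to 1).
    binding_probabilities: chance that a species is retained on the target.
    """
    # Selection: each species is retained in proportion to its binding probability.
    retained = [f * p for f, p in zip(fractions, binding_probabilities)]
    total = sum(retained)
    # Amplification: the retained material is copied back up to a full pool.
    return [r / total for r in retained]


def enrich(fractions, binding_probabilities, rounds):
    """Repeat selection and amplification for the given number of rounds."""
    for _ in range(rounds):
        fractions = selection_round(fractions, binding_probabilities)
    return fractions


# Hypothetical pool: one strong binder among a billion weak background sequences.
start = [1e-9, 1.0 - 1e-9]   # [binder, background]
binding = [0.5, 0.001]       # the binder is retained 500 times more efficiently

for n in (1, 2, 3, 4):
    print(n, enrich(start, binding, n)[0])
```

In this toy model, the ratio of binder to background grows by a constant factor in every round, so a sequence that starts at one part per billion dominates the pool after only a handful of rounds.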

The enzyme cascade approach relies on the selective recognition of proteins of interest by analytical enzymes. Enzymes are proteins that catalyze tailor-made modification reactions, e.g., the cleavage of a peptide bond or the conversion of an amino acid within another protein. The type and extent of the modifications are sensitive to the three-dimensional structure of the protein of interest. By combining several enzymatic modifications, it is possible to pick up small variations in the protein in a cascade-like manner, thus allowing their ultra-sensitive detection, even if these variations affect only a small fraction of the protein. Protein modifications can be assembled into a massive digital database, which should ultimately cover all proteins. To fully exploit such a “modificatome” database, algorithms are used that are currently expert-trained but will eventually use artificial intelligence to recognize distinct modifications and assign them to a characteristic physiological or pathological context. These features combine to make the enzyme cascade an attractive method for many relevant applications, including the early detection of age-dependent diseases like cancer or neurodegeneration, where aberrations in biomarker proteins develop only over years, yet early detection is critical.

An important lesson learnt in the CDL was the importance of meticulous interpretation of the vast amount of data generated by modern analytical methods, which is only feasible via elaborate and dedicated (bio)informatics workflows. Thus, the CDL had significant support from bioinformatics experts, who managed to make sense of the generated data and to bring to light the protein heterogeneity hidden in the experimental data. The research performed in project areas 2.1-2.3, therefore, focused on computational treatment of the vast amount of data generated in project areas 1.1-1.5. For instance, the typical readout of a single mass spectrometry measurement is a mass spectrum, which can be generated on modern instruments at a frequency of 40 spectra per second, summing up to 144,000 spectra per hour. It is impossible to evaluate such a high number of experimental outputs manually. Therefore, we implemented and evaluated different computational tools in order to extract the requested analytical information from the raw data. In some cases, however, no software tools were available to comprehensively answer the research questions under investigation. Therefore, the CDL developed its own software tools, especially for assisting the very challenging interpretation of mass spectrometric data from very complex glycoprotein structures. Thus, it was possible for the first time to provide experimental proof of the existence of more than 1000 different protein variants in the biotherapeutic Ovitrelle®, which contains a recombinant version of the pregnancy hormone human chorionic gonadotropin. Moreover, a clear documentation of the developed methods in the form of standardized protocols is essential to guarantee a correct transfer of knowledge between operators. This is of particular relevance for the translation of research, in order to foster method transfer from academic institutions to industry.
Numerous standard operating procedures (SOPs), performance qualifications (PQ), and standardized workflows were compiled and transferred to the industry partners. This form of knowledge transfer constitutes an integral aspect of the industrial method implementation.
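
The data volume quoted above can be checked with a few lines of Python. The acquisition rate of 40 spectra per second is taken from the text; the assumed 30 seconds of manual inspection per spectrum is a purely hypothetical figure used to illustrate the scale of the problem.

```python
SPECTRA_PER_SECOND = 40   # acquisition rate quoted in the text
SECONDS_PER_HOUR = 3600


def spectra_per_hour(rate_per_second=SPECTRA_PER_SECOND):
    """Return the number of spectra recorded in one hour of measurement."""
    return rate_per_second * SECONDS_PER_HOUR


def manual_review_hours(n_spectra, seconds_per_spectrum):
    """Return the hours needed to inspect n_spectra by hand."""
    return n_spectra * seconds_per_spectrum / SECONDS_PER_HOUR


n = spectra_per_hour()
# Hypothetical assumption: an expert needs 30 seconds to inspect one spectrum.
print(n, manual_review_hours(n, seconds_per_spectrum=30))
```

Under this assumption, a single hour of measurement would correspond to 1200 hours of manual inspection, which is why automated evaluation is indispensable.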

Feedback from the industry partners regarding the work performed and transferred in the CDL listed the following major categories of benefits: (1) Truly innovative approaches, representing high-risk endeavors that were unlikely to be investigated in a solely industrial environment. The aptamer technology, the analytical cascade of enzymes, and the application of nuclear magnetic resonance spectroscopy to the study of biotherapeutics fall under this category. The scientific output represented by publications is recognized as the strongest benefit, supporting activities especially in the instrument-manufacturing industry. Ten of forty-one publications were co-authored by colleagues from the industry partners, documenting the significant level of scientific collaboration in the project. (2) Transferable approaches, constituting a significant enhancement over technologies already established at the industry partners. Specifically, two approaches, one related to the investigation of protein modification with polyethylene glycol (PEG) in pegfilgrastim and one to the study of antibody oxidation, proved to be of high value for internal industrial use, and the former was even recognized by regulatory agencies as a new gold standard in the analysis of pegfilgrastim. The implementation of native MS to study protein glycosylation is seen as groundbreaking work for the dissection of highly complex glycosylation patterns by MS. (3) Public visibility for topics related to the characterization of biotherapeutics. Publication and scientific presentation of topics related to biotherapeutics support the industry partners in their international visibility and help them to demonstrate and maintain their efforts in proactive research related to manufacturing highest-quality biopharmaceuticals for medical treatment. These activities convey the scientific concepts of the new approaches, together with the need for thorough biosimilar characterization, to a broad audience outside of Novartis, Thermo Fisher Scientific, and the pharmaceutical industry in general. (4) Education of researchers who continue within the academic field or enter the industrial workforce in the field of biopharmaceutical production and quality control. Several former students who worked in the CDL or in its periphery have joined the companies, which underlines the importance of the CDL as an educational institution preparing the young generation for a career in the pharmaceutical industry.

Besides more than 40 publications in peer-reviewed scientific journals between 2013 and 2021, the most valuable output of the Christian Doppler Laboratory was the teaching and education of 3 bachelor's, 12 master's, and 11 PhD students, who either continued their studies or were readily hired to fill interesting and responsible positions, mostly in the pharmaceutical and lab instrumentation industry. Moreover, two advanced researchers of the Christian Doppler Laboratory, Gabriele Gadermaier and Therese Wohlschlager, obtained their academic lecturing qualification. They all represent a well-educated and multidisciplinary network of experts in therapeutic protein characterization ready to spread around the world. A unique and distinguishing experience, both for advanced academic researchers and for bachelor's, master's, and PhD students in the Christian Doppler Laboratory, was to gain insights into how challenges are approached in the pharmaceutical and instrument industry, including well-defined research problems, distinct milestones, and careful reporting as helpful tools of project management. On the other hand, our partners were also very open to unexpected findings and innovative approaches that could possibly open the doors to novel characterization approaches for protein structures. Nevertheless, it was sometimes challenging to reconcile the diametrically opposed interests of the commercial production and marketing of pharmaceuticals and scientific instruments on the one hand and the academic goal of performing pure and unbiased scientific research on the other.
In the end, we are all very thankful for any kind of support that enabled research in this extremely interesting area of science: from the industry partners Novartis and Thermo Fisher Scientific, from the Christian Doppler Society and its financiers, the Federal Ministry for Digital and Economic Affairs and the National Foundation, from the State of Salzburg, from the Allergy Cancer BioNano Research Center, from the Department of Biosciences, from the University of Salzburg, and from many more, too numerous to be named individually.

What comes next?

We have joined forces with another Christian Doppler Laboratory, the Christian Doppler Laboratory of Mechanistic and Physiological Methods for Improved Bioprocesses installed at the Technical University of Vienna, with the ambitious goal of combining molecular characterization attributes as provided by our analytical tools with a comprehensive set of bioprocess parameters together with host-cell “-omics” data in order to derive empirical and mechanistic models for predictable tuning of bioprocesses to obtain desired molecular product quality attributes. A proposal for a research group (“DigiTherapeutX: Integrated Digitalized Production of Protein Therapeutics: Linking Molecular Attributes to Process Optimization via Digital Twins”) was recently granted by the Austrian Science Fund, which enables us to continue this exciting and challenging research into production, structure, and function of biotherapeutics.

The group hypothesizes that simultaneous monitoring of quality attributes facilitates instant tuning of critical process parameters (CPPs) to achieve a desired critical quality attribute (CQA) product profile. The relationship between CQAs and CPPs is highly nonlinear and multidimensional. Therefore, additional metabolome and proteome data will be merged for the first time with well-controlled CPPs and enhanced CQA analytics. However, due to the complexity of the product, not all quality attributes can be measured. Hence, there is a strong need for enhanced modeling approaches that capture the process understanding and deploy it in digital twins (DTs) for robust control strategies. Computational modeling is based on the unique combination of comprehensive process data, omics data, and multiple molecular attributes of the therapeutic proteins. Please visit our new website to learn more about this initiative.