Hannah Murfet (MCQI, CQP, BSc, DipQ) - Product Quality Manager, Horizon Discovery
Martin Kristensson (MSc) - Director of Sales & Project Manager, Visiopharm
(JW): Hannah Murfet, our Product Quality Manager here at Horizon, will be presenting on the Blueprint to Bluesky proposal. She is a medical biochemist turned quality professional with a strong interest in the quality and regulatory framework. And our other presenter, I’m very pleased to announce, is Visiopharm’s Martin Kristensson, Director of Sales and Project Manager at Visiopharm, which many of you may well know. Martin has a strong interest in digital pathology and continuously supports clinical laboratories in the transition to digital pathology. So without further ado, I’d like to pass you over to the first presenter of the day, Hannah Murfet.
HM: Thank you Joe. Hi everyone, it’s a pleasure to be presenting this webinar today. This webinar will cover recent developments, tools and research in the field of immunohistochemistry. Much of what I will explore is based on the FDA-AACR-ASCO public workshop, as well as developments within Horizon Diagnostics.
So to begin with, my first slide covers the impact of biomarkers. In the 1990s, the framework for companion diagnostics was very much a one drug, one assay paradigm, with around 5% of drugs being based on a targeted therapy. As we move into 2013, we can see that this has shifted to around 45% targeted therapies, with multiple drugs and multiple assays. If we look back to the 1990s, it was one drug, one disease indication, one test and one allele – for example, the Abbott break-apart FISH probe for the drug crizotinib. As we move into the present time, we come to EGFR with two different assays – we have the Roche cobas and the Qiagen therascreen, each with their own individual drug indications.
So new biomarkers are clearly increasing in complexity. There are different tumour indications and different combinations of influencing biomarkers. There are different IHC antibody clones available, with different staining protocols, different platforms, different clinical decision points and different assessment methods. There are also challenges in the nature of the assessment itself – challenges posed by the limitations of IHC precision, and challenges posed by limited biopsy tissue, particularly in small cell lung cancer.
So these challenges exist within the clinical pipeline. And there are several opportunities here where new tools and technologies can assist laboratories with development and with routine monitoring by internal controls. If we look at the beginning of the slide here, in research and development – be that assay development, sample screening or patient stratification – the challenge is the availability of clinical samples. Raiding the tissue archives presents many difficulties, especially for those rare variants.
If we move into the second part of this slide, clinical studies – the different phases of clinical trials, the CLIA evaluations, patient stratification – these all present new clinical challenges. In clinical trials you have multiple partners and multiple geographies, and these can lead to drift in assay performance. Outsourcing to contract research organisations can make managing variation much more difficult. There is also unnecessary and sustained variability that can affect study performance.
And then the final part covers market and companion diagnostics and proficiency testing. There are a number of other challenges here, such as manual evaluation methods leading to the potential for variation. New drugs rely on successful and effective adoption of diagnostics to ensure their reimbursement. And in proficiency testing, there is the challenge of ensuring that standardisation and accuracy of diagnostic testing are met.
So the regulatory challenge is quite significant, and the new biomarker complexity can impact this. The safety and efficacy of new drugs place a large emphasis on marker-positive patients, without always considering the impact of those treatments on marker-negative patients. With multiple tests, each test has the potential to change the selected group: just because a biomarker is indicated by one test or by another, it does not mean that those two tests are equivalent. For new tests for new drug treatments, bridging studies may become necessary to support safety and efficacy. And as we move away from the one drug, one marker, one assay model into a matrix of companion diagnostics, there is a real risk of mismatched approved drug and device combinations, even before considering the LDT scenario.
To put this into context, at the FDA-AACR-ASCO public workshop in March earlier this year, a large discussion took place with regard to one particular biomarker, PD-L1. So what is PD-L1? It is a biomarker with a major role in suppressing tumour immunity, and it shows variation in tumour expression and response. It is subject to a range of pre-analytical and analytical variables and shows differences in the application of assays, so defining patient stratification will become more and more important. Several open questions still remain on the utility of PD-L1 as a predictive marker, especially in comparison with other biomarkers.
So the PD-L1 challenge, to put it into a few summary words: there are 4–8 drugs in development, each with its own parallel development programme and each with variation in clinical trial design. In addition to all of this there are multiple companion diagnostics – a different test for each drug.
This is the introduction to the Blueprint and the Bluesky. The Blueprint proposal is derived from the work undertaken by an industry consortium, proposed and discussed at the FDA-AACR-ASCO public workshop. Bluesky is the development of novel IHC tools, and this discussion will focus on Horizon Diagnostics and Visiopharm.
So in terms of Bluesky development, this year at Horizon Diagnostics we’ve launched a range of IHC reference standards, and we are continuing to increase that range by developing new targets such as PD-L1. The advantage of our reference standards is that they provide an unlimited supply mechanism. They are cell line derived and therefore provide consistency and accessibility compared to tissue archives. They undergo a reproducible production pathway, taking the cell lines all the way through to the on-slide control. They are independent of tissue archives and so can be used as a supplement and an addition. The same material source, being the cell lines, can be used for multiple tests – you no longer have the issue of availability of tumour material for validation. The same material can be used for NGS testing, IHC testing or FISH testing. And they provide an on-slide control: every slide is like a miniature assay – it has its own environment and its own staining, and so the on-slide control provides a reference point for the material that may be placed on that slide. In our development it was critical to find a reproducible method of QCing our material, and we embarked upon a series of assessments using quantitative digital pathology. At this point I’d like to hand you over to Martin, who will further explore the value in quantitative digital pathology.
MK: Well thank you very much Hannah. So as Hannah mentioned, tissue-based immunohistochemistry data is already used extensively in clinical trials, and in some indication areas, as drug treatments develop, this kind of input may increase further in importance, as we’ve already talked about. The problem is that there is huge variability between centres in staining protocols, staining quality, and even in the reading and interpretation of biomarker data. All of these can have a big impact on the precision and accuracy of data. At the very fundamental level, the challenge we are facing in all of these situations is data quality, in the sense of sensitivity, specificity and reproducibility. And when it comes to tissue-based data, the journey from biopsy to data is very long and complex, with several challenges along the way. It’s not difficult to imagine that any one of these steps can cause artefacts, errors or variation.
Some of these steps include tissue preparation, which covers fixation, tissue thickness and all sorts of other things. They include staining protocols and staining efficiency, and of course the final reading, interpretation and quantification of biomarker expression. Now you’ve already heard from Hannah that some of these challenges can be solved quite efficiently by using genetically defined reference material and cell lines, but it’s also well known that manual assessment of biomarker expression is associated with significant intra- and inter-reader variability. And in some cases there are also limitations when it comes to the sensitivity and specificity of manual biomarker assessment.
Now when we start to fill in some of these gaps and apply standardisation, we get closer to a completely standardised workflow for working with human samples and biomarker development. So let me show you one of the difficulties in the reading and interpretation of biomarker expression. This series of images illustrates my point quite well. Here we’ve arranged nuclei from a Ki67-stained sample from lowest reacting to highest reacting, and we’ve asked one pathologist, in this case, to determine at what point they would call a nucleus positive or negative. As you can see, many of the ones on the right are negative – the nucleus in question is highlighted by the red square – and we have to go to the second column to find the cut-off point between negative and positive. But in my non-pathologist opinion, just as an image analysis expert, there is very little difference between the positive and negative nuclei, and it can be extremely difficult for a human to consistently detect that minute difference.
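[Editor's note] Martin's point about inconsistent human cut-offs can be made concrete with a small sketch: a fixed numeric threshold on nuclear stain intensity classifies every nucleus the same way on every read. This is purely an illustration – the threshold value, intensity figures and function names below are invented, not Visiopharm's actual algorithm.

```python
# A minimal sketch of deterministic nucleus classification. A fixed
# threshold draws the same positive/negative boundary on every read,
# which is exactly the consistency a human reader struggles to achieve
# near the cut-off. All numbers here are illustrative assumptions.

DAB_POSITIVITY_THRESHOLD = 0.25  # assumed cut-off on mean DAB optical density

def classify_nucleus(mean_dab_od: float) -> str:
    """Label a nucleus from its mean DAB optical density, deterministically."""
    return "positive" if mean_dab_od >= DAB_POSITIVITY_THRESHOLD else "negative"

# Nuclei sorted from lowest to highest staining, as in the slide's arrangement.
nuclei_od = [0.05, 0.12, 0.21, 0.24, 0.26, 0.31, 0.48, 0.77]
labels = [classify_nucleus(od) for od in nuclei_od]
print(labels)
# Note: the 0.24 vs 0.26 pair differs by a margin a human can barely
# perceive, yet the rule places the boundary identically every time.
```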
And of course I have the luxury of visiting a lot of pathology labs, research institutions and industry, and it is very rare that I see two pathologists or readers set the exact same cut-off. Some would believe that the image at the top right should also be positive, while others believe the positive boundary should be a bit lower. So it is very difficult to standardise the reading and interpretation of these biomarker expressions.
In multiple locations, we have shown that image analysis, if used correctly, can effectively improve the accuracy and reproducibility of biomarker assessment. Here I’ve included just two examples, but of course there are many more. In the one on the left, the pure contribution of inter-reader variability associated with Ki67 assessment was quantified across 20 tumours and 106 participating labs. This was a huge study conducted by NordiQC. In that study it was demonstrated that image analysis could be used to significantly reduce inter-reader variability, by using a different method to quantify the tumour areas and the positive and negative nuclei. Another study, the one on the right, was the Danish national validation study for HER2 assessment using algorithms. It demonstrated how the improved sensitivity and specificity of the algorithm when quantifying HER2 protein expression could significantly reduce inconclusive cases when comparing to gene amplification. When we reduce the number of inconclusive cases, this translates directly into huge time and cost savings, especially as you can limit the number of reflex tests, which typically require more expensive assays such as HER2 FISH.
There’s a lot more detail offered on our website and other webinars on these kinds of studies.
So of course we see many obvious applications for this kind of automated method using image analysis. By automating aspects of stain quality control, it could be possible – and maybe even scalable – for EQA organisations to offer more frequent, perhaps on-demand, proficiency testing and calibration services, maybe even using genetically defined material together with image analysis to standardise as much of the process as possible. It would also be possible to perform an objective and quantitative estimation against a standard, to improve compliance with the protocol recommendations given by these EQA organisations.
In multi-centre clinical trials it would be a lot easier to standardise and monitor the data from each of the centres, because they can all use the same method, the same reference material and the same reading algorithms. It is of course our big hope that larger diagnostic pathology labs will be able to benefit from such methods to closely monitor drift in staining quality for biomarkers, but also to use the same kind of algorithms on their human samples for patient diagnostics.
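[Editor's note] The drift monitoring Martin describes can be sketched as a simple control-chart rule: score the on-slide reference standard on every staining run and flag runs that fall outside the established baseline range. The H-score values, baseline and two-SD rule below are invented for illustration; they do not describe any vendor's software.

```python
# Illustrative drift check for an on-slide reference standard, in the
# style of a Levey-Jennings control chart: establish a baseline from
# validation runs, then flag any run whose control score falls outside
# mean +/- 2 SD. All numbers are assumed example data.
import statistics

# Control H-scores recorded during validation runs (hypothetical).
baseline = [182, 185, 179, 188, 183, 181, 186, 184]
mean = statistics.mean(baseline)
sd = statistics.stdev(baseline)

def in_control(score: float, n_sd: float = 2.0) -> bool:
    """Return True if the control score lies within mean +/- n_sd * SD."""
    return abs(score - mean) <= n_sd * sd

# Daily runs: the third score has drifted well below baseline.
for run, score in [("Mon", 184), ("Tue", 180), ("Wed", 168)]:
    status = "OK" if in_control(score) else "DRIFT - investigate staining"
    print(run, score, status)
```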
So with those few words I’ll leave it back to you Hannah.
HM: Thank you Martin. To continue on from that, we’re now looking at the Horizon Diagnostics Bluesky application, and I’d just like to discuss a couple of key questions. The first question: does your assay accurately and reproducibly measure what you expect it to – what you say it does? Does the assay actually identify a biological difference of clinical significance? Does your validation show evidence to support improvements in the clinical workflow? And how do you monitor your IHC analytical performance?
Keep these kinds of questions at the back of your mind as we continue through a case study with one of our partner laboratories. The IHC Bluesky application can be shown through a number of uses of the reference standards. Performance can be assessed when optimising the protocol or platform. The reference standard can guide the evaluation of antibodies. Protocols can be assessed for use of the same antibody clone, and concordance or discordance can be assessed to determine reproducibility.
As part of our development of an upcoming variant, the PD-L1 IHC reference standard, we produced an IHC PD-L1 array containing 8 cell lines with negative and positive protein expression levels. The results shown here display the PD-L1 expression levels using the Cell Signaling antibody, shown at the bottom. You can see the positive cores in the brown stain and the negative cores in blue.
To improve our confidence, part of our assessment was to use quantitative digital pathology. These two graphs show our initial evaluations, giving the percentage positive and the H-scores of each of those individual cores, allowing us to see a clear difference between what is determined to be positive and what is determined to be negative.
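[Editor's note] For readers unfamiliar with the two read-outs in these graphs: percentage positive is the fraction of cells with any staining, and the H-score is the standard IHC composite (1 × %weak + 2 × %moderate + 3 × %strong, giving a range of 0–300). The sketch below uses invented cell counts purely to illustrate the arithmetic; it is not Horizon's analysis code.

```python
# Sketch of the two core-level metrics. The H-score formula is standard
# in IHC; the cell counts per intensity bin (0 = unstained, 1 = weak,
# 2 = moderate, 3 = strong) are made-up illustrative data.

def h_score(counts: dict) -> float:
    """H-score = sum of intensity bin x percentage of cells in that bin."""
    total = sum(counts.values())
    return sum(i * 100.0 * counts.get(i, 0) / total for i in (1, 2, 3))

def percent_positive(counts: dict) -> float:
    """Percentage of cells with any staining (bins 1-3)."""
    total = sum(counts.values())
    return 100.0 * sum(counts.get(i, 0) for i in (1, 2, 3)) / total

positive_core = {0: 100, 1: 200, 2: 400, 3: 300}  # strongly expressing core
negative_core = {0: 950, 1: 40, 2: 10, 3: 0}      # near-negative core

print(h_score(positive_core))       # 190.0
print(h_score(negative_core))       # 6.0
print(percent_positive(positive_core))  # 90.0
```

A clear gap between the two cores on both metrics is what makes the positive/negative separation in the graphs easy to see.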
And here you can see the individual images of the positive reference standard and then the negative.
In working up our development, we provided our reference standards to a top US reference laboratory with a validated protocol for PD-L1, and these were their results for comparison. As you can see, they used the same antibody at the same concentration, and there is a lot more positive staining, even within the negative cores.
And here we have the close-up of the positive core and what is supposed to be the negative core. You can see a large amount of off-target staining, and clearly some investigation needs to take place to understand why the negative core is presenting as positive. The laboratory did some investigation and performed an orthogonal method, which showed, on the basis of RNA, that the issue was in fact with the assay and not with the performance of the reference standard.
So I’d just like you to think about how you monitor your everyday reagent performance. How do you evaluate potential fault points, such as the serial dilution of antibodies? And consider how LDTs have positive and negative aspects, but how this potential drawback could be mitigated using new technologies.
And finally I’d like to touch on the great work undertaken by the IHC Blueprint proposal. The goal of this group is to package together information on analytical comparisons of the various diagnostic assays that may be conducted, increasing understanding of the analytical performance of different PD-L1 biomarker assays. The study of the (----audio cuts out----20.22-20.27). The samples being used for this study are a mix of sample types that are representative of target patient populations, with a focus on non-small cell lung cancer and PD-L1 testing. The staining will take place on investigational-use-only assays at the individual diagnostic stakeholders, and the evaluation will be done both by a company pathologist and by an independent third party. I think this kind of study goes to show what great research is being undertaken, and Horizon is always a great advocate of these kinds of studies, so we are very much looking forward to seeing the results.
So for the penultimate slide, this shows Horizon Discovery’s range of products and services. For us, collaboration is absolutely key, and so we engage with a range of different industry stakeholders and academics to provide an end-to-end product and service to support changes in healthcare practices.