Key Facts in COVID-19 Testing
Wayne Dimech, NRL Executive Manager, provides a useful overview of SARS-CoV-2 testing and the COVID-19 illness.
Part 1: Testing for COVID-19
“Test, test, test.” That was the advice of Tedros Adhanom Ghebreyesus, the Director-General of the World Health Organization (WHO). Over the past month, numerous media releases have extolled the release of SARS-CoV-2 (COVID-19) tests: 5-minute tests; only a drop of blood required; photo ops in the Rose Garden for CEOs of test kit manufacturers. Laboratory testing has been pushed to the forefront of the global response to the COVID-19 pandemic. In this unprecedented situation, it is important to understand the role of pathology testing, and critical not to assume that testing is always accurate at this early stage of a new and emerging infection.
First, a brief introduction. I have been a medical microbiology scientist since the early 1980s and for the past two decades have worked at NRL, Australia, a not-for-profit WHO Collaborating Centre based in Melbourne, whose mission is to promote the quality of tests for infectious diseases globally. NRL actively collaborates with WHO, the US Centers for Disease Control and Prevention (CDC) and non-government organisations (NGOs) such as Fondation Mérieux (Lyon, France) and FIND (Geneva, Switzerland). We monitor Australian infectious disease testing laboratories, funded by the Australian Government, and work closely with most of the largest in-vitro diagnostic device (IVD; or test kit) manufacturers, designing and performing test kit evaluations on their behalf. NRL is one of a small number of laboratories worldwide authorised to evaluate test kits on behalf of the WHO Prequalification Program. Understanding the performance of test kits and monitoring their performance is our bread and butter.
It is important to know a bit about testing for COVID-19 to understand the impact that this emergency has on the quality of testing. For those not closely connected to laboratory testing, I will describe aspects of COVID-19 testing. Early in the infection, the virus grows in cells at the back of the nose and throat, taking about 5-14 days before symptoms appear. A swab of the nose or throat can be taken and the nucleic acid of the virus detected. The virus is detectable several days before symptoms appear and up to eight days after symptoms begin. However, this period of detectability will depend on the analytical sensitivity of the nucleic acid test (NAT), how well the sample was taken and the amount of virus present at the time of swabbing. Once the antibody response begins, the virus is cleared from the body. Once the virus is no longer present in the nose/throat, NAT will be negative. In the initial response to the pandemic, NAT protocols were developed by laboratories using one of several methods described by WHO, the Charité Institute of Virology in Germany, the University of Hong Kong (HKU), the US CDC and others. Other laboratories have developed their own methods using different approaches.
NAT can target different parts of the viral nucleic acid, which may affect performance, especially with regard to cross-reactivity with other, similar viruses. If the test is less specific, it may have a high rate of false positive results. If it is not designed to target a stable part of the viral sequence and that part changes, it may miss some true positives. Development of laboratory-based NAT allowed for rapid introduction of much-needed testing. Unfortunately, the CDC test faced initial difficulties, delaying the response and highlighting that, even in the best of laboratories, things can go wrong. These initial “home-brew” or “in-house” tests must be performed in a laboratory, using specialised equipment, by scientists and technicians with a high level of expertise. More recently, IVD manufacturers have developed NAT that can be manufactured and supplied in large quantities under strict manufacturing processes. This has allowed rapid scale-up of testing and access to testing by a greater number of laboratories using automated test platforms.
Generally in viral infections, around the time when symptoms start, as part of a series of complex chemical reactions, white blood cells start making antibodies to parts of the virus. At first these antibodies are relatively non-specific and react to certain parts of the virus called antigens, but may also cross-react with other non-COVID-19 antigens of similar structure. Over time the antibody response matures and becomes more specific and targeted to COVID-19 viral antigens. There is still a possibility that the antibodies may cross-react with antigens of other (non-COVID-19) viruses of the same family. There are several different types of antibodies, including IgM and IgG. IgM antibodies are usually the first to be detectable, but are transient and become undetectable after a few weeks. IgG is formed around the time of resolution of symptoms and the level continues to increase for a period. Generally, IgG remains detectable in a person’s bloodstream. Another antibody type, IgA, is secreted by mucous membranes in the nose and throat, but is also found in blood. Tests for antibody (serology tests) can be used to determine if a person has been exposed to a particular infection. They are an important tool to determine the percentage of people with IgG antibodies, who are therefore considered immune, potentially allowing them to return to normal activities, and/or to estimate herd immunity (seroprevalence), that is, when a sufficient percentage of the population has immunity to slow or stop viral transmission. Detecting different antibodies may help in differentiating those recently infected (by detecting IgM and possibly IgA) from those who have had the infection in the past. This will be very important after the pandemic subsides and possibly a new wave emerges.
Serology tests come in different forms. Some detect one of the immunoglobulin classes, others detect all types (total antibody tests) without differentiating between which type is detected, and others detect each class individually. Currently for COVID-19 testing, the majority of tests are rapid test devices (RDTs), similar to personal use pregnancy tests. RDTs usually require a drop of whole blood from a finger prick. Although they are relatively rapid, taking less than 15 minutes to perform, they are often performed singly or in small batches. It should be noted that an unusually large number of RDTs have come to market since the emergence of COVID-19, including from manufacturers that have not been active in the past. The quality of these tests is unknown. IVD manufacturers will soon release laboratory-based antibody tests that can be performed in large numbers on automated test platforms. One such test is already on the market (SNIBE Diagnostics). This will allow more widespread serology testing.
To summarise, testing for COVID-19, like any other infectious disease, is complex and requires detailed knowledge of the performance of the test kits used. Initial screening for COVID-19 is by detecting viral nucleic acid from swabs of the nose and/or throat. NAT can be developed either in-house by the laboratory or by commercial test manufacturers. A swab test that can detect virus only in large quantities may miss early and/or late infections, and its accuracy will depend on good quality samples. Blood tests detect antibodies to the virus. Different types of antibodies can be detected either as a combination or separately. Serology tests are currently point-of-care tests (PoCTs) that use finger-prick samples, but high-throughput, serum-based laboratory serology tests are on the way. Each of these tests will be positive at different stages of the disease, as different markers are detectable at different times. So, although the statement “Test, test, test!” is essentially correct, it should come with a caveat: “Test” – using test kits for their stated purpose; “Test” – understanding what you are testing for and what the limitations of the test are; and “Test” – with test kits of known quality.
Part 2: Maintaining Quality of Testing in a State of Emergency
Surely the quick release of so many COVID-19 test kits, also known as in-vitro diagnostic devices (IVDs), is a good thing? Well, yes and no. I’ll let you into a trade secret. No medical pathology test is 100% sensitive and specific. To put it another way, all pathology tests will report some false positive and some false negative results; some tests more than others. Like pharmaceuticals, the sale of IVDs is highly regulated in countries with developed regulatory systems such as Europe, USA, Canada, Australia, Japan, Korea and others. Stringent regulatory conditions apply to the sale of IVDs, with the extent of regulation depending on the risk that false results pose to the individual and/or to the community. Many countries have immature IVD regulatory systems, and some have none. Often governments in these countries select test kits through a tender process in which the lowest priced product is chosen, rather than the one best suited to the purpose. In response to this situation, the World Health Organization (WHO) created the IVD Prequalification Program. WHO and collaborating laboratories like NRL assess the performance of IVDs and make the results public to help inform decisions made by governments and NGOs procuring test kits.
IVDs for HIV and hepatitis and all test kits used to screen the blood supply are highly regulated, as a false result can cause harm to both the individual and the community. Tests for other infectious diseases are usually classified at a lower risk level, and blood tests for markers such as glucose or liver/renal function are lower still. To register a test kit for high-risk organisms, the manufacturer must have international accreditation as a manufacturer (ISO 13485), provide the Regulator with a dossier containing comprehensive scientific evidence of the performance of the IVD, and have package inserts or instructions for use (IFU) reviewed for clarity and completeness and approved for compliance with the regulations. A risk assessment for the safety of the IVD is also undertaken. Often the manufacturing facility is assessed for conformance to the standard. Registration of IVDs is complex and expensive, but is in place to minimise the risk of poorly performing tests being released into the market and the misuse of IVDs. A Global Harmonisation Task Force, now superseded by the International Medical Device Regulators Forum (IMDRF), was established to encourage Regulators in different countries to recognise the registration of IVDs in other signatory countries, thereby reducing the regulatory burden on IVD manufacturers. Most leading economies are signatories to the IMDRF.
An IVD is registered for an “Intended Use”, and laboratories cannot legally use it for any other purpose. If an IVD is modified, it becomes an “in-house” test and is therefore used “off license”; the laboratory then takes responsibility for the quality of the test results and any implications for patients. Some countries, including Australia, have specific regulations around the use of “in-house” tests. Manufacturers of high-risk IVDs must show scientific evidence of the test kit performance and its suitability to the “Intended Use” stated in the IFU. Examples of Intended Use are: screening of blood and tissue donations; general laboratory testing for diagnostic purposes; monitoring effectiveness of treatment or disease progression; confirmatory testing; rapid point of care test for community testing; or for home use only. Depending on the IVD and the Intended Use, a range of performance characteristics may be assessed, including:
Sensitivity – the ability of a test to report a positive result on a truly positive sample;
Specificity – the ability of a test to report a negative result on a truly negative sample;
Precision – the amount of variation inherent in the test;
Bias – the accuracy of the test measured against a true result (usually a reference measurement or standard);
Limit of detection (LOD) – a measure of the lowest amount of analyte that the test kit can detect;
Limit of quantification (LOQ) – a measure of the lowest amount of analyte the test kit can quantify (usually reported as 95% confidence);
Linearity – a demonstration that, as the amount of analyte increases, the signal increases proportionally;
Cross reactivity – whether analytes other than the analyte being detected cause the test kit to report positive results. This is common in some disease states such as autoimmune diseases, or infections with organisms similar to that being detected;
Serotype/genotype variation – Many organisms have a number of different circulating serotypes or genotypes, which may not be detectable in a poorly designed test kit;
Stability – Ensuring the reagents are stable and do not deteriorate over time.
Not all of these performance characteristics are relevant for all test kits, but where they are, they must be understood by the user so an appropriate interpretation of the result can be made. Most of these characteristics are self-explanatory, but a brief review of how each may be important for COVID-19 testing follows.
Sensitivity and specificity are important performance characteristics for almost all test kits. To estimate these characteristics, access to large numbers of known positive and negative samples is required. The European Common Technical Specifications currently require about 500 positive samples and 5,000 negative samples to assess specificity and sensitivity for blood screening assays. For Rapid Diagnostic Tests (RDTs) for HIV, HCV and HBsAg, 500 positive and 1,500 negative samples are required, with an expected sensitivity of > 99.0%. Samples obtained from blood donors, clinical samples, pregnant women and samples with potentially interfering substances need to be sourced and tested. NAT for HIV RNA, HCV RNA and HBV DNA need to have the LOD estimated, as well as the LOQ for quantitative assays, by testing against an international standard. At least 10 samples for each HIV genotype must be tested to demonstrate the ability to detect each genotype equally. Samples containing cross-reacting analytes and interfering substances are tested to demonstrate specificity. Precision experiments are conducted both with a limited number of variables, i.e. the same instrument, lot number and operator over a short period of time (repeatability), and with multiple variables over a longer period of time (reproducibility). Demonstration of minimal lot-to-lot variation is required, and in Europe and USA the regulator must test and release each batch of high-risk IVDs until they are satisfied with the IVD performance. WHO Prequalification protocols have similar but often less stringent criteria. By the time an IVD is released to the market, extensive performance evaluations have been conducted and are in the public domain for potential users to assess.
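As a minimal sketch of how sensitivity and specificity are estimated from a panel of samples of known status, the Python below applies the definitions given in the list of performance characteristics. The class name and all counts are hypothetical and chosen only for illustration; the panel size simply mirrors the 500 positive and 5,000 negative samples mentioned above.

```python
from dataclasses import dataclass


@dataclass
class PanelResult:
    """Counts from testing a panel of samples whose true status is known."""
    true_positive: int   # known-positive samples reported positive
    false_negative: int  # known-positive samples reported negative
    true_negative: int   # known-negative samples reported negative
    false_positive: int  # known-negative samples reported positive

    def sensitivity(self) -> float:
        """Proportion of truly positive samples reported positive."""
        return self.true_positive / (self.true_positive + self.false_negative)

    def specificity(self) -> float:
        """Proportion of truly negative samples reported negative."""
        return self.true_negative / (self.true_negative + self.false_positive)


# Hypothetical counts for a panel of 500 positive and 5,000 negative samples.
panel = PanelResult(true_positive=496, false_negative=4,
                    true_negative=4990, false_positive=10)

print(f"Sensitivity: {panel.sensitivity():.1%}")
print(f"Specificity: {panel.specificity():.1%}")
```

Point estimates like these say nothing about their own uncertainty, which depends on how many samples were tested.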
Quality of Current COVID-19 Testing
However, in the current situation with COVID-19 testing, a different quality paradigm exists. Correctly, governments waived the regulatory requirements to allow use of kits without the complete manufacturer evidence normally required. According to IVD Directive 98/79/EC, COVID-19 diagnostic devices fall under Annex 3, and the manufacturer has to specify device performance characteristics and self-declare conformity with the safety and performance requirements listed in the Directive. No performance testing or batch release is required by Notified Bodies. Europe is currently transitioning to a new regulatory framework based on the IMDRF; however, this transition is not yet complete. It is uncertain if COVID-19 would be in the highest risk category, requiring a full assessment including batch release. Most likely, Europe would follow the same path as USA, Australia and WHO and allow Emergency Use provisions. In Australia, special legislation was passed to exempt IVDs from normal regulatory scrutiny. WHO implemented an Emergency Use Listing Procedure for COVID-19 NAT assays (but not serology), requiring limited evidence of performance. Similarly, the US FDA, Canada, Japan, Korea and Singapore have implemented Emergency Use Listing for NAT and serology without requiring complete performance evidence. Europe and Australia have referred validation of COVID-19 tests to public health laboratories outside the usual regulatory processes. These laboratories are the front line of COVID-19 testing, but have limited experience in formal test kit evaluations and are currently overloaded dealing with large volumes of samples. FIND (Geneva, Switzerland) is compiling results of evaluations for COVID-19 NAT and serology assays.
There are some technical difficulties faced when designing evaluation protocols for COVID-19 tests. Unlike HIV, syphilis and HBsAg serology tests, there are no acknowledged reference or confirmatory methods. In HIV serology, a testing strategy using multiple tests, including screening tests followed by supplemental and/or confirmatory tests such as western blots, is used to confirm positivity. Similarly, syphilis testing uses multiple specific anti-treponemal tests such as EIAs, CHLIA and TPPA in the testing strategy. HBsAg and HIV p24 positivity can be confirmed using neutralisation testing. COVID-19 serology currently has no reference test; however, NRL is developing a western blot. Viral neutralisation may be useful. An international standard of a known viral load is required to assess the LOD or LOQ of COVID-19 NAT. However, this takes a considerable amount of time to prepare. In the meantime, multiple panels of serial dilutions of virus should be made available and all NAT compared using the same dilution series so a comparison of LOD can be made. Without this level of discipline, our understanding of the performance of COVID-19 test kits will remain limited.
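To illustrate why a shared dilution series matters, the sketch below compares two assays tested on the same serial dilutions. It uses a simple hit-rate rule as an assumption: the LOD is taken as the lowest concentration at which at least 95% of replicates are detected, with every higher concentration also meeting that rate. Formal evaluations typically use a statistical model rather than this shortcut. The assay names, concentrations (in copies/mL) and replicate counts are all hypothetical.

```python
def estimate_lod(hits_by_concentration, min_hit_rate=0.95):
    """Estimate LOD from a dilution series using a hit-rate rule.

    hits_by_concentration maps concentration -> (detected, tested).
    Returns the lowest concentration whose detection rate, and that of
    every higher concentration tested, is at least min_hit_rate.
    Returns None if even the highest concentration falls short.
    """
    lod = None
    # Walk from the most concentrated dilution down to the most dilute.
    for concentration in sorted(hits_by_concentration, reverse=True):
        detected, tested = hits_by_concentration[concentration]
        if detected / tested < min_hit_rate:
            break  # detection has become unreliable; stop here
        lod = concentration
    return lod


# Hypothetical results: both assays tested on the SAME dilution series,
# 20 replicates per level, concentrations in copies/mL.
dilution_results = {
    "Assay A": {10000: (20, 20), 1000: (20, 20), 100: (19, 20), 10: (8, 20)},
    "Assay B": {10000: (20, 20), 1000: (18, 20), 100: (9, 20), 10: (1, 20)},
}

lods = {name: estimate_lod(results) for name, results in dilution_results.items()}
for name, lod in lods.items():
    print(f"{name}: estimated LOD = {lod} copies/mL")
```

Because both assays see identical material, the difference in estimated LOD reflects the assays themselves rather than differences between panels.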
It is important that we recognise that there is currently little performance data available for COVID-19 test kits. A recent publication by the National University of Singapore (NUS) included an annex of the test kits available at the time. Disappointingly, most kits had little or no performance testing data available. Those that did inspired little confidence, having unacceptable sensitivity and specificity. One assay that does have data presented in the NUS study is the BioMedomics/Jiangsu Medomics Medical COVID-19 IgM/IgG Rapid Test. Using this as an example only, the stated sensitivity is 88.6% and the specificity is 90.6%. Firstly, any report of sensitivity or specificity without 95% confidence limits should be queried. If an assay detects 9 out of 10 positive samples correctly, it will have a sensitivity of 90% (95% CI: 54.1 – 99.5%), meaning that the real sensitivity could be as low as 54.1%. If 1,000 samples were tested and 900 were detected, the sensitivity is also 90% (95% CI: 87.9 – 91.7%), demonstrating much greater confidence in the estimate. In the example, 397 positive samples were tested and 352 were detected, giving a sensitivity of 88.6% (95% CI: 85.0 – 91.5%). To put it differently, for every 100,000 true positive individuals tested, 11,400 will incorrectly be reported as negative. The danger of using poorly performing test kits is stark, both in health and economic terms.
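The effect of sample size on the confidence interval can be reproduced with a short calculation. The text does not say which method produced the quoted intervals; the sketch below assumes the Wilson score interval with continuity correction, which gives figures in line with the 10-sample and 1,000-sample examples. The function name is illustrative only.

```python
import math


def wilson_cc_interval(successes, n, z=1.959964):
    """Approximate 95% CI for a proportion (Wilson score, continuity-corrected).

    successes: number of samples correctly detected
    n:         number of samples tested
    z:         normal quantile (1.959964 for a two-sided 95% interval)
    Returns (lower, upper) as proportions between 0 and 1.
    """
    p = successes / n
    q = 1.0 - p
    z2 = z * z
    denominator = 2.0 * (n + z2)

    if successes == 0:
        lower = 0.0  # the lower limit is defined as 0 when nothing is detected
    else:
        root = z * math.sqrt(z2 - 2.0 - 1.0 / n + 4.0 * p * (n * q + 1.0))
        lower = max(0.0, (2.0 * n * p + z2 - 1.0 - root) / denominator)

    if successes == n:
        upper = 1.0  # the upper limit is defined as 1 when everything is detected
    else:
        root = z * math.sqrt(z2 + 2.0 - 1.0 / n + 4.0 * p * (n * q - 1.0))
        upper = min(1.0, (2.0 * n * p + z2 + 1.0 + root) / denominator)

    return lower, upper


# Same point estimate (90%), very different certainty.
for detected, tested in [(9, 10), (900, 1000)]:
    low, high = wilson_cc_interval(detected, tested)
    print(f"{detected}/{tested}: sensitivity {detected / tested:.1%} "
          f"(95% CI: {low:.1%} to {high:.1%})")
```

Under this assumption, 9 of 10 gives an interval of about 54.1% to 99.5%, and 900 of 1,000 gives an interval within about 0.1 percentage points of the one quoted. Other interval methods will give slightly different limits.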
We are flying blind. At the time of writing, just over one million people have confirmed COVID-19 infection, and the true number is assumed to be much greater. However, difficulties in obtaining sufficient numbers of samples of appropriate quantity to undertake comprehensive studies remain. One problem is that, when individuals test NAT positive, they are put into isolation and samples are difficult to obtain. Those on the frontline are unaware of the need for evaluation samples and do not understand the type of sample required. Yes, there are many “evaluations” being undertaken using samples of convenience. FIND is collecting and compiling these data for public use. Manufacturers are required to obtain data to support their claims for future submission once COVID-19 tests are no longer covered by emergency use provisions.
However, to my knowledge, there has not been a systematic and scientifically constructed process to evaluate test kits using the same well-constructed panel of samples. Currently, there are over 200 known test kits, not to mention in-house tests, available or being developed. What is required is a panel of samples that can be used to evaluate the relevant performance characteristics of each COVID-19 test kit. These panels should be in sufficient quantity to allow testing of as many test kits as possible. Compiling the results obtained from each test kit will add value to the evaluation panel. The WHO Prequalification Program has assembled such panels of samples for HIV serology, and NRL is currently assembling panels for syphilis. This activity will take a coordinated effort and knowledge of evaluation protocols.
NRL would welcome communicating with potential collaborators that can support ethically accessed, non-commercial clinical samples to be used in development of an evaluation panel. As a scientific community, and to inform our governments, the need for comprehensive, scientifically-sound information on COVID-19 testing is urgent and important. Ad-hoc validations, using samples of convenience without well-constructed scientific protocols will just not suffice; and may even be dangerous and misleading. We need a coordinated international approach to solve this problem.
If your organisation is willing and able to support the development of universal sample panels for evaluations of serology tests, please email firstname.lastname@example.org