Explanation of paper with a new way of getting the TSH reference range

I've sent Lyn Mynott a copy of a paper in which we describe a new method of getting a better fix on the TSH reference range. Some of you have asked for a copy, but you might find it very technical (it is). I'll try to describe in short what we've done and its advantage over the classical method. The classical method simply takes all the TSH values in a euthyroid population, converts them to logarithms, and plots how often each TSH value occurs across the whole range. What you get is a very skewed distribution, bunched towards the bottom end of the range, with most people clustered about 0.6-1.5 and a long "tail" of a few people stretching up towards the hypothyroid region, with numbers going up to 3.5 and beyond. The problem is that these few people's results have a very big effect on setting the top end of the range, especially if some of them are doubtful in diagnosis. But by good statistics you can't ignore them unless you have a good reason for doing so. Our new method gives a lot more weight to the many people in the midrange rather than the few I mentioned before, and it shows clearly where the hyperthyroid or hypothyroid subjects' TSH values diverge from the euthyroid line, thus defining where the euthyroid range should end. So the method is based on more powerful statistics that minimise the effect of possibly aberrant, wrongly diagnosed patients near the hypo end of the TSH spectrum and maximise the influence of the majority. Hope this helps those who read the paper.
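
As a rough illustration of why a thin tail matters so much, here is a minimal Python sketch of the classical log-transformed 95% interval. It is not the method from the paper. The cohort is simulated, and the numbers (a log-normal "core" centred near 1.0 mIU/L, plus 5% of subjects spread between 3.5 and 8.0) are assumptions chosen only to show the effect.

```python
import math
import random
import statistics


def log_reference_interval(tsh_values, z=1.96):
    """Classical parametric 95% interval computed on log-transformed TSH."""
    logs = [math.log(v) for v in tsh_values]
    mean = statistics.fmean(logs)
    sd = statistics.stdev(logs)
    # Back-transform the limits from the log scale to mIU/L.
    return math.exp(mean - z * sd), math.exp(mean + z * sd)


random.seed(0)
# Hypothetical euthyroid majority: log-normal, median 1.0 mIU/L (assumption).
core = [random.lognormvariate(0.0, 0.35) for _ in range(950)]
# Hypothetical tail: a few doubtful subjects with raised TSH (assumption).
tail = [random.uniform(3.5, 8.0) for _ in range(50)]

low_core, high_core = log_reference_interval(core)
low_all, high_all = log_reference_interval(core + tail)

print(f"core only:      {low_core:.2f} to {high_core:.2f} mIU/L")
print(f"core plus tail: {low_all:.2f} to {high_all:.2f} mIU/L")
```

In this simulation the 5% of subjects in the tail push the upper limit up noticeably, even though the other 95% have not changed at all.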


29 Replies

  • Diogenes, thanks for the explanation, it is a very technical paper. PR

  • Thank you diogenes. It will be interesting to know who will be open to acknowledging your research and changing the method they use at present.

  • "most people clustered about 0.6-1.5" - may I ask if you could give any idea of the (rough) percentage?

    apologies for potential shortened mis-quote.... but this is so hopeful for us! :)

  • At a rough guess 85-90% in that range. The remaining 10-15% spread out ever more thinly as you approach 3.5 or thereabouts as the top of the range.

  • diogenes - many thanks for the clarification.

    85-90% had a TSH 0.6 to 1.5.... and 3.5 top of range...

    I do hope the 'powers that be' finally take note... but we sufferers will anyway...

    I would like to be able to have the correct words of appreciation...

    I will ask Lyn for a copy of the paper - & have a crack at reading techy stuff, I used to do this in a past life - dare I take it to my next GP appointment? ... perhaps not yet!

    P.S. finally diagnosed at Xmas via TSH 19.79 - previous years' clinical signs/symptoms & after PT op discounted as CFS - but worried if treatment plan will be per blood test - i.e. if in range, left... (to own devices....) J :D

  • Given the reported daily variation of TSH, how much of an impact would it have if TSH tests were performed within a relatively tight range of time? (E.g. between 07:00 and 09:00.) After all, we are trying to see where the reference range should lie when the within-day variation could be 50% of that range. Perhaps taking steps to avoid the worst impact of this daily variation would be appropriate?

  • Owing to the great diurnal variation in TSH levels, a narrow, constant time window, preferably in the morning, should always be chosen for sampling and measurement.

  • Many thanks for this - just to check, are we at liberty to show the paper to our GPs/Endos?

  • Yes you can do so because it's in the public domain, but you'll likely burst the poor endo/GP's brain. Just explain where most people's TSH lies in the range and, if yours is 2.5+, to start having a suspicion that you need some followup later to make sure it doesn't get any worse and end up needing treatment.

  • Thank you, you mention 2.5+. Is that regardless of the differentials in current TSH ranges throughout the country? i.e. in my area the range is 0.3-6.0, but in other areas the upper limit is 4.0 (and the 'current' upper limit which was stated in Table 2 in the article - presumably this relates to Germany?)? Or have I completely misunderstood?

  • Unfortunately, TSH and the other TFT tests suffer in that various manufacturers have detectably different ranges, coming from their inability to calibrate properly before they put the product on the market. TSH has the best consistency, but still the tests can vary by up to 25% one with another. This is an additional headache on top of the individual hospital lab's determinations for setting a normal range by their own patients. Thankfully this year this problem (and the same for FT4 whose consistency is even worse) is being practically addressed by the biggest 10 manufacturers under the leadership of Prof L Thienpont of Ghent University in Belgium. At long last, she tells me, the ducks are in a row, to do the work this year starting I believe in earnest in midyear when the carefully defined samples are all ready. Unhappily FT3 (the worst of the lot in inconsistency) isn't yet considered - largely because it isn't used so much (the irony).

  • Many thanks for your comprehensive reply. It's good to hear about the progress being made. Does this cover the UK or does it only relate to the Low Countries/Germany?

    A further couple of questions (and please forgive my ignorance about such matters):

    1. does the fact that the article has been published mean it has already been peer reviewed (pre-publication) or does publication lead to peer review? I'm not sure how these things work. My endo takes great pains to say he will only take peer-reviewed articles seriously (just as he sees TSH as, quote, the 'gold standard').

    2. if FT3 is so unreliable, doesn't that confirm what GPs and endos have been saying to many of us and why they refuse to use it as a diagnostic tool vis-a-vis thyroid function per se and/or conversion problems?

    I hope you don't mind me posing further questions.

  • 1) Yes the article is peer reviewed and in the published literature.

    2) FT3 testing will always be under a cloud until the wretched incompetents (i.e. the manufacturers) finally are forced to standardise their products. A 60% max variation in performance isn't good enough.

  • Will the Royal College of Physicians, the British Thyroid Association and the BTF be given details of your findings? Particularly as they state that the TSH should reach 10 before medicating.

  • I'm afraid that you have to realise the parochial nature of medical practice. By that I mean that in my case a mix of German physicians/endocrinologists and an English retired biochemist have constituted our work background. The acceptance will therefore be greatest in Germany and surrounding countries. The UK/US contingent will have to repeat the work to their own satisfaction before even deigning to look at its importance or change practice. And try to steal intellectual priority while they do it (esp the US). The first response to any work coming out of "left field" is to ignore it until it can be ignored no longer. Sorry for the gloomy prediction, but I've seen it far too many times. For example, 30 odd years ago I used to get the greatest putdown when I argued, against the then prevailing view, that people with severe non-thyroidal illness were also prone (especially when critically ill) to disturbances in thyroid function, which did not indicate thyroid failure but could show e.g. in aberrant FT4/3/TSH values outside the ref ranges. The prevailing view was that such people were still euthyroid and should show normal thyroid values. If they didn't, it was the test at fault and not a change in the patient. Papers I submitted showing the opposite were squashed because of that thinking, which is now utterly demolished, and what I said then is now accepted gospel - with other people getting the intellectual kudos of course. Such is medicine (and life).

  • So, I will probably be looking down from heaven (I hope) when things are polished and dusted. :)

    It must be most frustrating for people like you/your team when you know you're right. We, the patients know you are but the 'closed minds' stop progress and, most of all, prevent patients recovering their health.

    Thank you and your team for your hard work and success in getting a publication.

  • TSH isn't that valuable of a test anyway. In the old days doctors were taught to treat by elimination of symptoms.

  • but the correct TSH assay would have been very valuable especially now they have to have clinical evidential proof....

  • I really shouldn't comment as I can't truly grasp what is being said here. I just know that if you have pituitary issues, the test may show that. Once meds are introduced, many have a very suppressed TSH and can still have low levels of FT4 and FT3, as happens to me. This causes the doctors to tell me I am severely overmedicated. The TSH test has caused many so much harm. Look at all the research Dr. Lowe did and all the patients he made well, and yet no doctor knows about his work, or is interested in it for that matter. I just think TSH, as a thyroid test, just isn't needed. In my opinion, it would make more sense to go back to basics, looking at symptoms. We are too reliant on testing and not focused on people.

  • I agree TSH shouldn't be used on its own - please don't stop commenting!

    If only symptoms had such weight with GPs now...

    here is the (abstract) paper ...


    but here is a previous post


  • Thanks so much! Really scary and sad video. I have felt so ill for so long, I guess I am not able to see the humor in it. It is that total nonsense that keeps people ill, and has kept me ill. The doctors will not look into why my thyroid hormone isn't working well, because they say I have fibromyalgia, migraines and chronic fatigue syndrome. I thought going on T3 only, on my own, would be the answer, and I was wrong. It must be just having Hashimoto's that is causing its own set of symptoms.

  • Jane, having a more intelligent reference range would be an improvement but it would not negate the fact that the thyroid function tests still all suffer from a low index of individuality. We all have our own personal reference range and our own set point, which varies from person to person based on our genetics, our epigenetics and our metagenomics. In Anderson et al. the width of the 15 subjects' individual reference ranges varied from 0.32 mIU/L to 2.35 mIU/L and each had their own individual set point. The reference range for the group was 0.16-2.39. Any individual's own reference range is much narrower than the laboratory reference range, at about 1/2 to 1/3 of its width or less. The laboratory reference range is not the individual's reference range. Diogenes has a graph in their study which illustrates this principle. For many people, the thyroid function tests don't show anything until there is an extreme variation from their normal set point. Science does not have any real understanding of how far off your normal set point you have to be to cause problems. The TFTs are not designed to show optimum for an individual; in fact they are incapable of showing optimum for any given individual. All you can make are very general statements such as, "Many people on thyroid medication, but not all, generally feel pretty good when their TSH is between complete suppression and about, say, 1.5-2.0 or so." Or "A lot of people feel pretty good when their FT3 is in the top quartile of the range." The TFTs are useful, but only if you understand their limitations, and most doctors do not understand the science. As Diogenes has said many times, the 'goal post' mentality is not how the tests work: a result within the reference range does not guarantee normality, either for diagnosis or for titrating the dose. And this discussion hasn't even touched on the other problems of accuracy affecting the TFTs. PR
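
To make the "index of individuality" idea concrete, here is a minimal Python sketch. The repeated TSH readings are invented for illustration and are not taken from the study mentioned above. It uses the usual simplified definition, within-subject CV divided by between-subject CV, and ignores analytical variation. A value well below roughly 0.6 is the commonly quoted sign that a population reference range says little about any one person.

```python
import statistics

# Hypothetical repeated TSH results (mIU/L) for four subjects (invented data).
subjects = {
    "A": [0.6, 0.7, 0.65, 0.75],
    "B": [1.2, 1.1, 1.3, 1.25],
    "C": [1.9, 2.1, 2.0, 1.8],
    "D": [0.9, 1.0, 0.95, 1.05],
}

# Each subject's personal set point is taken as the mean of their own results.
set_points = {name: statistics.fmean(vals) for name, vals in subjects.items()}

# Within-subject CV: how much one person varies around their own set point,
# averaged over all subjects.
within_cv = statistics.fmean(
    [statistics.stdev(vals) / set_points[name] for name, vals in subjects.items()]
)

# Between-subject CV: how much the set points differ from person to person.
points = list(set_points.values())
between_cv = statistics.stdev(points) / statistics.fmean(points)

index_of_individuality = within_cv / between_cv
print(f"within-subject CV:  {within_cv:.3f}")
print(f"between-subject CV: {between_cv:.3f}")
print(f"index of individuality: {index_of_individuality:.2f}")
```

With these made-up numbers each person moves only a little around their own set point while the set points themselves are spread widely, so the index comes out low.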

  • Where's the paper? I'm still undiagnosed with a TSH of 4.3!

  • Contact Lyn Mynott or Louise Warvill - they have it on file

  • aDoctor, regarding the 'low index of individuality' I can't remember where I came across that description, whether it was from one of Dr. Spencer's articles or one of the 7 studies that I know of dealing with the fact an individual's reference range is much smaller than the laboratory reference range therefore making any assumption questionable at best when the patient's clinical presentation is ignored. I quite agree with you that the TSH is most relevant when elevated, unfortunately, for many individuals it can be elevated within the reference range. PR

  • No the paper is not comparing methods, but using more robust statistical techniques to obtain a reference range for TSH applicable to any method. The ordinary way of doing it exaggerates the influence of the relatively few patients with TSH's above 2.5 in setting the top end of the range. This method gives everyone a more equal influence in determining the range.

  • Diogenes, I would like to draw on your knowledge and experience since you have actually spent time developing and refining reference ranges and you understand the math.

    The 3rd generation, and possibly the 2nd generation, TSH tests have never had a normal Gaussian curve; it has always been a highly distorted, or warped, curve with a front porch.

    As stated in the 2002 NACB Guidelines, "Serum TSH concentrations determined in normal euthyroid subjects are skewed with a relatively long "tail" towards the higher values of the distribution. The values become more normally distributed when log-transformed. For reference range calculations, it is customary to log-transform the TSH results to calculate the 95% reference interval (typical population mean value ~1.5 mIU/L, range 0.4 to 4.0 mIU/L in iodide-sufficient populations) (202,206). However, given the high prevalence of mild (subclinical) hypothyroidism in the general population, it is likely that the current upper limit of the population reference range is skewed by the inclusion of persons with occult thyroid dysfunction (18)."

    Also stated in the Guidelines, "Even the current sensitive TPOAb immunoassays may not identify all individuals with occult thyroid insufficiency. In the future, it is likely that the upper limit of the serum TSH euthyroid reference range will be reduced to 2.5 mIU/L because >95% of rigorously screened normal euthyroid volunteers have serum TSH values between 0.4 and 2.5 mIU/L."

    The 95% was by one of the assays used on the 13,400 qualified subjects although they were not screened using Ultrasound.

    You have also talked about the 10-15% of subjects in the upper range that distort it thus the study by your group trying to remove the 'exaggerated influence'.

    You also said that 85-90% fall in a range of 0.6-1.5.

    First question. Why not use 2.5 for the top? Does the math not support it? I have heard that with a bottom of 0.3-0.5 and a mean of 1.5, (the numbers from NHANES III) the normal top (using a normal shaped Gaussian curve) would be 2.5. Does your experience not bear this out?

    Second question. How accurate are the antibody tests, TPO + TG? I know they have improved but do they still lag at being able to detect at an early enough stage? Is it possible that the upper skew is because we are not detecting problems in that group of subjects?

    Third question. Do the reference ranges for T4, FT4, T3 and FT3 actually fall in a normal Gaussian curve? Is it only the TSH that has a warped curve?

    Last question. I came across an interesting statement in an article by ZRT labs here in the US that I haven't seen before.

    "In classical laboratory work, reference ranges are determined by looking at a relatively large number of normal subjects with respect to the analyte of interest. A reference range is usually determined statistically by using the average value plus and minus two standard deviations. Theoretically, this range represents 90 percent of the population (5th to 95th percentile) as shown in Figure 1."

    Figure 1 is the classic example of the standard Gaussian curve with 2 standard deviations.

    It won't copy for me but I'm sure you are familiar with it.

    "The above approach to determining “normal” ranges is effective when testing patients for conditions that are present in less than five percent of the population; however, this method is insufficient for conditions found in a larger percentage of the population."

    The last paragraph is the one I find of interest. It implies that for conditions which exist in greater than 5% of the population the standard method of determining a reference range becomes inaccurate.

    My question is, of course, is this a valid argument? Have you heard of this before?

    As always, I would appreciate your insight. PR

  • You are asking a lot of questions here, some of which I can’t answer.

    Answer to Q1. You are falling into the Goalpost error here. The top of the TSH reference range is only a probability function as is any range limit. It should be the place where simultaneously, its placing is most efficient for discriminating the greatest number of ill from healthy people. It does NOT mean that there are no ill people with results in the “healthy” range or healthy people with results in the “ill” range. The placing of the range limit is purely a statistical exercise to optimize the range setting. Then the “grey areas” have to be set either side to encourage followup as I’ve said before. As regards TSH even log transforming does not normalize to a symmetrical Gaussian distribution. That’s why we used a different statistical approach to minimize the effects of the “tail” of high results. The value of 3.5 is therefore merely a result from the statistics and the calibration of the test. The problem of subclinicals is an important one, that gives headaches in placing the hypo limit with complete confidence.

    Answer to Q3. For the thyroid hormones, total or free, the distribution in health is virtually Gaussian without transformation – maybe v slightly skewed towards the bottom end but not seriously so. There are serious overlaps for T4/FT4 at the hypo end and also for T3/FT3, but very little at the hyper end.

    Answer to Q2. This lack of sensitivity is probably because of heterogeneity in the antibodies formed – they will be a mix of weak, moderate and strong with different specificities and therefore according to the mix in an individual they may show up detectably at different start points. This affects overall sensitivity.

    Answer to Q4. 2SDs gives a 95% certainty not 90. I don’t know what the labs are saying about conditions where more than 5% of the population will be outside the ref range. One always sets the range to the 5% limit by statistics. This one defeats me as I don’t know the basis for their observation, which I think is faulty.
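
The 95%-versus-90% point in the answer to Q4 can be checked directly. This is a small Python sketch that assumes a perfectly Gaussian distribution; the helper name `gaussian_coverage` is made up for the example.

```python
import math


def gaussian_coverage(k):
    """Fraction of a Gaussian population lying within +/- k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


# Mean +/- 2 SD, as described in the quoted lab article.
two_sd = gaussian_coverage(2.0)
# The 5th to 95th percentile band corresponds to about +/- 1.645 SD.
p5_to_p95 = gaussian_coverage(1.645)

print(f"mean +/- 2 SD covers     {two_sd:.1%}")
print(f"mean +/- 1.645 SD covers {p5_to_p95:.1%}")
```

So plus and minus two standard deviations takes in about 95% of a Gaussian population, while a 5th to 95th percentile band is narrower, at about 1.645 standard deviations either side.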

  • Diogenes, thank you, that clarifies some things. PR
