Failure of formal logic in the determination of... - Thyroid UK

Failure of formal logic in the determination of normal thyroid function. Accepting the Null Hypothesis.

11 years ago • 25 Replies

A sample of TSH values is taken from normal people. People with hypothyroid or hyperthyroid symptoms would necessarily be excluded from this sample. The mean and standard deviation of this sample are computed. If a newly tested individual has symptoms, they are already excluded from this test, since their symptoms exclude them from this group. If the individual has no symptoms, then a tested value below the mean minus 1.65 SD would indicate that they were likely hyperthyroid (low TSH), and a value above the mean plus 1.65 SD would indicate that they were likely hypothyroid (high TSH). A value between these limits would result in no conclusion. It should not lead to the conclusion that the individual has normal function. To draw such a conclusion is to commit the logical fallacy called 'Accepting the Null Hypothesis.'
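A minimal sketch of the decision rule described above, assuming a Gaussian reference sample. The reference values are invented for illustration and are not real TSH data; the function name `classify` is hypothetical. The point it shows is that the rule has three outcomes, and the middle one is "no conclusion", not "normal".

```python
from statistics import mean, stdev

def classify(value, reference_sample, z=1.65):
    """Compare one value with mean +/- z*SD of a reference sample."""
    m = mean(reference_sample)
    sd = stdev(reference_sample)
    lower, upper = m - z * sd, m + z * sd
    if value < lower:
        return "below range: reject 'belongs to reference group'"
    if value > upper:
        return "above range: reject 'belongs to reference group'"
    # Failing to reject is NOT evidence of normal function.
    return "no conclusion"

# Hypothetical reference values, invented for illustration only.
reference = [1.0, 1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 2.6, 3.0, 3.5]
for v in (0.2, 2.0, 5.0):
    print(v, "->", classify(v, reference))
```

Returning "no conclusion" for an in-range value, instead of "normal", is the whole design choice here: it keeps the code from accepting the null hypothesis.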

Apparently the medical profession does not understand statistical logic, because this fallacy is what I see being committed. Specifically, this is my observation in the Kaiser Permanente medical system in the San Francisco, California, area.

Written by

jlovell88

LouiseRoberts • 11 years ago

There has been some recent research exploring this. We hope to see it published very soon.

helvella (Administrator) • 11 years ago

It is far worse than you suggest.

First, TSH distribution is absolutely NOT a Gaussian (also called "normal") distribution. Therefore use of the probability tables that are appropriate to Gaussian distributions is fundamentally wrong.

The range of TSH values that individuals could have (whilst remaining "healthy") is far narrower than the mean plus or minus 1.65 standard deviations. That is, individual ranges are far narrower than population ranges.

Excluding those with thyroid symptoms is inadequate. This has been better recognised over the years, and anyone with any question of thyroid issues, including presence of thyroid peroxidase antibodies or thyroglobulin antibodies, should be excluded. (I'd argue that there are probably many other exclusion criteria that need to be applied, such as "having appropriate levels of B12, iron, folates, ...", not having suffered any head or brain injury, not currently or recently suffering any of numerous diseases, not having anti-T4 or anti-T3 antibodies, and so on.)

It is also essential to use the right lab reference range and not, as I have seen a senior endocrinologist write, to use a "standard" range if the specific reference range is not available.

Certainly accepting the null hypothesis is an easy trap to fall into.

I am convinced that there are woefully too few medical statisticians. It would be appropriate, in my view, for pretty much every published paper to be reviewed by such a specialist to identify the researchers' misuse of statistical methods.

Rod

LindaC • in reply to helvella • 11 years ago

As always, very interesting. Been off here for a while (still gut... now affecting thyroid in weird ways) BUT the subject of stats certainly caught my eye.

Have noted how well everyone is doing, one way or another! Hats off to Mary, Louise and T UK! Off to check out how people are doing...

The Null Hypothesis is not, as such, 'accepted': usually an experimenter frames a null hypothesis with the intent of rejecting it, that is, intending to run an experiment which produces data that shows the thing under study does make a difference. In statistics, a null hypothesis is a statement that the thing being studied produces no effect or makes no difference. An example of a null hypothesis is the statement "This diet has no effect on people's weight." I believe the term used is, "The null hypothesis cannot be rejected".

A favourite of mine is Type I and Type II Errors. I can no longer these days rattle this off the top of my head (I can say it, just not write it well) so this is the main focus a la wiki:

Concluding that the means were different - when in reality they were not - would be a Type I error.

Concluding the means were not different - when in reality they were - would be a Type II error.

So, given what everyone has said here about comparing two means (WHEN and WHERE appropriate), then consider that all statistical hypothesis testing has a probability of making type I and type II errors: the problem surely multiplies!

A type I error (or error of the first kind) is the incorrect rejection of a true null hypothesis. With respect to the non-null hypothesis, it represents a false positive. Usually a type I error leads one to conclude that a supposed effect or relationship exists when in fact it doesn't. Examples of type I errors include a test that shows a patient to have a disease when in fact the patient does not have the disease, a fire alarm going off indicating a fire when in fact there is no fire or an experiment indicating that a medical treatment should cure a disease when in fact it does not.

A type II error (or error of the second kind) is the failure to reject a false null hypothesis. With respect to the non-null hypothesis, it represents a false negative. Examples of type II errors would be a blood test failing to detect the disease it was designed to detect, in a patient who really has the disease; a fire breaking out and the fire alarm does not ring or a clinical trial of a medical treatment failing to show that the treatment works when really it does.
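As a rough illustration of the two definitions above, the sketch below computes both error rates for a single cut-off. The two distributions are hypothetical (a "healthy" group and a "diseased" group on an arbitrary test score) and are assumed to be Gaussian purely to keep the arithmetic simple.

```python
from statistics import NormalDist

# Hypothetical distributions of some test score, invented for illustration.
healthy = NormalDist(mu=0.0, sigma=1.0)   # null hypothesis: person is healthy
diseased = NormalDist(mu=1.0, sigma=1.0)  # alternative: person has the disease

# Flag any score in the top 5% of the healthy group.
cutoff = healthy.inv_cdf(0.95)

# Type I error: a healthy person is flagged (false positive).
type_1 = 1.0 - healthy.cdf(cutoff)
# Type II error: a diseased person is not flagged (false negative).
type_2 = diseased.cdf(cutoff)

print(f"cutoff       = {cutoff:.2f}")
print(f"Type I rate  = {type_1:.3f}")
print(f"Type II rate = {type_2:.3f}")
```

Under these assumed numbers the type I rate is fixed at 5% by the choice of cut-off, while the type II rate depends entirely on how far apart the two groups are, which the cut-off alone says nothing about.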

What's important for us all... 'significantly' [statistically? LOL] educated endos

Best to everyone, each and every one of you!

jlovell88 • in reply to LindaC • 11 years ago

Comparing means is not quite the same as deciding whether an individual's test score indicates that they are likely not in a given distribution. A type 2 error would be a related idea for a situation in which an individual was tested to see if he was hypothyroid. Let us suppose that he had symptoms and the TSH value was 3.0, in reference to the graph

web.archive.org/web/2005020...

This value would not allow the inference that the individual was hypothyroid (rejecting the null hypothesis). But it would also not allow the inference that the individual is not hypothyroid (this inference would be the fallacy of accepting the null hypothesis). It would make sense for a doctor to pay attention to symptoms if test results do not indicate a proper inference. But in the case given, the fallacy of accepting the null hypothesis may cause the doctor to ignore the symptoms, which are actually the most important information.

I will give an illustration of the same logic: Let us suppose that men have an average IQ of 100 and a SD of 15. Women have an average IQ of 101 and a SD of 15. An individual's IQ is tested and shown to be 100. The null hypothesis is that the individual is a member of the women's distribution. We can not reject the null hypothesis, but it would be silly to accept it. If we accepted the null hypothesis, the average man would be judged to be a woman.

In the parallel of thyroid testing the average hypothyroid might be judged to be normal and symptoms to the contrary would be ignored.
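The IQ illustration above can be checked numerically. This sketch uses the figures given in the post (means of 100 and 101, SD of 15) and assumes both distributions are Gaussian and that the test is two-sided at the 5% level; those last two choices are assumptions made here, not something the post specifies.

```python
from statistics import NormalDist

men = NormalDist(mu=100, sigma=15)
women = NormalDist(mu=101, sigma=15)  # null hypothesis: individual is from this group

score = 100
z = (score - women.mean) / women.stdev
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

# Two-sided 5% rejection limits for the women's distribution.
lower = women.inv_cdf(0.025)
upper = women.inv_cdf(0.975)

# Share of men whose score falls inside those limits,
# i.e. men for whom the null hypothesis is not rejected.
men_not_rejected = men.cdf(upper) - men.cdf(lower)

print(f"z = {z:.3f}, two-sided p = {p_two_sided:.3f}")
print(f"men not rejected: {men_not_rejected:.1%}")
```

The p-value is far above 0.05, so the null hypothesis cannot be rejected, and the great majority of men would also fail to be rejected. Treating "not rejected" as "is a woman" would therefore misclassify almost all of them.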

LindaC • in reply to jlovell88 • 11 years ago

That is understood and comparing means was not what was being said. It seems that 'accepting of the null hypothesis' and Gaussian distribution - as pointed out by Rod - are not helpful in this particular scenario. My response was to state that statistical error, of those two varieties (there are, of course, more) is something akin to hanging the innocent man or letting the guilty go free... either side of skewed parameters can lead to poor outcomes (as you say in your last sentence above).

The point being made is that doctors are not sufficiently versed (or trained) in probabilistic inference and likelihood methods, yet these are the very areas absolutely required in medicine (and many other fields). With the relevant skills, good old Bayesian revision of beliefs when new information 'comes in' (whether test results are accepted or rejected), combined with good old clinical diagnostic skills, would be pertinent to each individual. It is only cogent statistical analysis that will surely help eliminate these apparent layman's statistical errors (carried out by medics); something which seems to affect many of us who are woefully misdiagnosed.
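A small sketch of the Bayesian revision mentioned above, assuming a simple yes/no test. All three inputs (the prior probability based on symptoms, the sensitivity and the specificity) are hypothetical numbers chosen only to show the mechanics; they are not estimates for any real thyroid test.

```python
def posterior_given_negative(prior, sensitivity, specificity):
    """P(condition | test negative) by Bayes' theorem."""
    p_neg_given_cond = 1.0 - sensitivity   # false negative rate
    p_neg_given_no_cond = specificity      # true negative rate
    numerator = prior * p_neg_given_cond
    denominator = numerator + (1.0 - prior) * p_neg_given_no_cond
    return numerator / denominator

# Hypothetical inputs, for illustration only.
prior = 0.50  # belief before the test, e.g. based on symptoms
posterior = posterior_given_negative(prior, sensitivity=0.60, specificity=0.95)

print(f"prior = {prior:.2f}, posterior after a negative test = {posterior:.2f}")
```

With these assumed inputs a negative result lowers the probability but leaves it well above zero, which is the formal version of saying that an in-range result should revise, not replace, the evidence from symptoms.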

Thank you for your further comment - perhaps we come from different fields and seem merely at cross-purposes here rather than disagreement. Best wishes

jlovell88 • in reply to LindaC • 11 years ago

I didn't think we were at cross purposes. What you said was right on the issue. I am trying to distinguish between different approaches and outright illogic. For instance, the distribution can be turned normal by transforming the x axis. Whether to do that would be a judgment call. But it is never proper to accept the null hypothesis. If you are trying to manage type 2 error, you need to sample the alternative distribution. I suspect that would be nearly impossible in the case of hypothyroidism. The mean might be very near the normal mean, and the SD might be large. But the issue remains as to whether the doctor should ignore symptoms. And they do ignore symptoms in favor of an incorrect understanding of statistical logic.
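One way to picture "transforming the x axis" is a log transform, sketched below on a synthetic right-skewed sample. The data are generated from a log-normal distribution with arbitrary parameters and a fixed seed; they are not real TSH values, and the choice of a log transform is an assumption made for this example.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Synthetic right-skewed sample (log-normal); NOT real TSH data.
sample = [rng.lognormvariate(0.4, 0.6) for _ in range(5000)]

def interval(values, z=1.96):
    """Return mean - z*SD and mean + z*SD for a list of values."""
    m, sd = mean(values), stdev(values)
    return m - z * sd, m + z * sd

# Interval computed directly on the skewed values.
raw_low, raw_high = interval(sample)

# Interval computed on the log scale, then transformed back.
log_low, log_high = interval([math.log(v) for v in sample])
back_low, back_high = math.exp(log_low), math.exp(log_high)

print(f"raw scale: {raw_low:.2f} to {raw_high:.2f}")
print(f"log scale: {back_low:.2f} to {back_high:.2f}")
```

On skewed data like this, the interval computed on the raw scale has a negative lower limit, which is impossible for a concentration, while the back-transformed interval stays positive. Neither interval changes the logic about the null hypothesis.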

humanbean • in reply to helvella • 11 years ago

A graph of TSH distribution :

web.archive.org/web/2005020...

The population studied had been partly stripped of people who had thyroid disease. Some people with thyroid antibodies were left in if their TSH was 4.0 or less. They so nearly got it right, it's a shame that they didn't go the whole hog and do the job properly. But the graph is interesting anyway - it would just have been even more impressive if they had truly tested a healthy population.

helvella (Administrator) • in reply to humanbean • 11 years ago

That is a good link.

When you look at that graph it is difficult to understand how anyone could argue that values from 3.6 to 4.5 could ever be regarded as being in range.

The article also ends up highlighting the fact that choosing a 2.5% band top and bottom is entirely arbitrary. If only 1% of the population have thyroid issues, that would be too wide; if 20% it would be far too narrow. But that choice of 2.5% is based on what seems to me to be false statistical interpretation. Indeed, there isn't even the slightest reason that the bands should be the same top and bottom!

Rod

humanbean • in reply to helvella • 11 years ago

The source for the data that went into creating that graph and the article is available on the web in full, just in case anyone is interested :

eje.org/content/143/5/639.f...

PinkNinja • in reply to humanbean • 11 years ago

Great link. Thanks

jlovell88 • in reply to helvella • 11 years ago

I agree with your comments. These are big problems, but perhaps more subtle to argue. No one should argue that it is adequate to accept the null hypothesis. But the Kaiser doctors that I have talked to simply do not care.

rosetrees • 11 years ago

It's worse than that.

As I understand it, the range is worked out by taking samples from people who do not have a diagnosis of hypo/hyper. Whether or not they have symptoms is unknown to the lab. This in itself widens the range as a lot of the samples used will be from people who are undiagnosed. All are taken from people who were deemed to be in need of a blood test, and most will be ill.

They don't take the tests from a healthy population who have been interviewed to find out if they have symptoms. At least, this is what my local lab told me.

Angel_of_the_North • in reply to rosetrees • 11 years ago

And where do you get blood samples from people who aren't sick? I've never been for a blood test when I was well. Diseases other than "official" thyroid problems can affect TSH, perhaps because of changes in RT3 or pituitary problems.

rosetrees • in reply to Angel_of_the_North • 11 years ago

Exactly. The whole system is flawed. When I asked if they took samples from healthy volunteers I was told that they aren't allowed to do that.

silverfox7 • 11 years ago

I've often been told you can prove or disprove anything with statistics. That's not to say I don't believe any of the comments above; I found them very interesting. It is true to say, though, that I know nothing about statistics, and I expect that is true of many, including medics. I do know that in the past medics got lectured in the subject but know nothing of the content. My late husband, a science-based endocrinologist, told me and my then GP that the ranges were flawed and were not based on anything scientifically proven. My children, two of them graduates in music and in history with politics respectively, so not science orientated, saw at once that the argument was flawed when I told them about 'a group of normal people' etc., so why can't the medics? I've also said to disbelieving friends: how often do you meet an old acquaintance in the surgery, ask how they are, and get the reply 'fine'? In the main they are not fine if they have a need to see a doctor, so my argument is about the unknown state of the normal people this so-called respected data is based on. Recently I went into the surgery to get some annual results and commented that I felt great! Apparently that gave the impression I was over-medicated!! Had I walked in feeling terrible I would have been given antidepressants. Aren't we allowed to be well?

PinkNinja • 11 years ago

"There are three kinds of lies: lies, damned lies, and statistics".

- Often attributed to Benjamin Disraeli or Mark Twain but was likely coined by someone else entirely...

Angel_of_the_North • 11 years ago

I once read a report (on the devious nature of stats) that showed that the consumption of Guinness was directly related to the number of monks who wore sandals ...

HarryE • in reply to Angel_of_the_North • 11 years ago

I now have a lovely vision of lots of monks in sandals drinking Guinness!

helvella (Administrator) • in reply to HarryE • 11 years ago

Another classic is the direct correlation between birth rate and consumption of orange marmalade! Cannot remember exactly which years but the reason was the end of a war (WW1? WW2?) - which allowed imports of Seville oranges to re-commence and increase, and people found they could start families (troops returning, more optimistic about the future).

Let this image go with the monks and Guinness...

Rod

HarryE • in reply to helvella • 11 years ago

• in reply to Angel_of_the_North • 11 years ago

Lol! We all know the TSH test is just a measure of pituitary hormone. Can't get my head round stats today (or ever!) - love the raccoon & cat pic 'tho!

humanbean • in reply to Angel_of_the_North • 11 years ago

A wikipedia article : Correlation does not imply causation...

en.wikipedia.org/wiki/Corre...

jlovell88 • in reply to humanbean • 11 years ago

Likewise, correlation does not deny causation.

jlovell88 • 11 years ago

The 2.5% comes from a cultural practice among statisticians of setting the desired level of dependability of a test at no more than 1 chance in 20 of being wrong. The top and bottom 2.5% together make 5%, which is 1 in 20. So values more extreme than these cut-offs would arise with a probability of less than 1 in 20 in someone who belongs to the given distribution (that is, has normal thyroid function).
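For reference, the tail percentages above can be converted into SD multiples with a short calculation. This assumes a Gaussian distribution, which, as noted earlier in the thread, TSH does not follow, so the numbers only show where the conventional multipliers come from.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# 2.5% in each tail (5% in total): the usual two-tailed cut-off.
z_two_tailed = std_normal.inv_cdf(0.975)
# 5% in a single tail: the one-tailed cut-off.
z_one_tailed = std_normal.inv_cdf(0.95)

print(f"2.5% in each tail -> mean +/- {z_two_tailed:.2f} SD")
print(f"5% in one tail    -> mean + or - {z_one_tailed:.2f} SD")
```

So the 1.65 SD in the original post corresponds to 5% in one tail, whereas 2.5% at the top and bottom corresponds to about 1.96 SD on each side.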

helvella (Administrator) • in reply to jlovell88 • 11 years ago

I almost agree - it certainly has every appearance of an unthinking use of 2.5% at each end. But anyone who would use 2.5% without critical analysis as to why does not deserve to be called a statistician.

Rod