For example:
Results: A total of 300 subjects (139 men and 161 women) of various ethnicities with a mean (± SD) body mass index (in kg/m²) of 25.1 ± 5.4 met the study entry criteria. The mean conceptual model-derived TBK-SM ratio was 122 mmol/kg, which was comparable to the measurement-derived TBK-SM ratios in men and women (119.9 ± 6.7 and 118.7 ± 8.4 mmol/kg, respectively), although the ratio tended to be lower in subjects aged ≥ 70 y. A strong linear correlation was observed between TBK and SM (r = 0.98, P < 0.001), with sex, race, and age as small but significant prediction model covariates.
I think I understand what the p means (like percent change or something?) but what does the r mean? Would an r of 1 mean a 1:1 relationship in the dataset if it were graphed?

4 Answers
best answer
What they are presenting is actually r-squared, not r.
R-squared is the coefficient of determination in statistics. It is a measure of how well a least-squares fit describes the outcome.
In plain English, it is used with linear regression as a measure of how well a model predicts observed outcomes. The number is the percentage of variation that can be explained by the regression. The higher the r-squared, the better the model, and an r-squared of 1 means that the model perfectly predicts the observations (and thus the real world).
The p-value is, essentially, a measure of the likelihood that the observed value is extreme given that the null hypothesis is true. The p-value, along with chi-square, allows one to determine the likelihood of statistical significance. A p-value of 0.05 between two populations would say that there is a 95% likelihood that the change between the two populations is a real change, and not a random artifact.
So, in your scenario (r-squared = .98), all but 2% of the variation could be explained with regression. This is a good result; we typically like to see 3-5 9's for computational measures, but in human models a .98 is very strong. This means that the model can be used to predict, for example, the TBK-SM ratio for a random person given the same entry criteria and given the same protocol within 2%.
The p < 0.001 means that there is a very strong statistical likelihood that these findings are accurate and not a result of randomness. This is an absurdly low p-value. Given the sample size, 300, I would suspect bias in experimental design (but that's just a shot in the dark as I have not read the study).
Also, it's always important to distinguish statistical significance from actual significance. Suppose that, over the course of 10 years, one diet caused its population to lose 1 pound more than another diet. While that may be a statistically significant result, in practice a 1-pound difference is not significant.
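To make the statistical-versus-practical point concrete, here is a rough sketch. The numbers are hypothetical (a 1-pound difference between two diets, a standard deviation of 15 pounds, equal group sizes), and the test is a simple normal-approximation z-test rather than anything taken from the study. It only illustrates that the same small difference can move from "not significant" to "extremely significant" purely because the sample gets larger.

```python
import math

def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means.

    Assumes both groups share the same SD and size, and uses a
    normal approximation (an illustrative simplification).
    """
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of a standard normal
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical inputs: same 1-pound difference, three different sample sizes.
p_values = {
    n: two_sample_z_p_value(mean_diff=1.0, sd=15.0, n_per_group=n)
    for n in (50, 5_000, 500_000)
}

for n, p in p_values.items():
    print(f"n per group = {n:>7}: p = {p:.3g}")
```

The effect size never changes in this sketch; only the p-value does, which is why a small p on its own says nothing about whether a difference matters in practice.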
Again, I have not looked at this study in particular, but just a few things that I look for in all studies.
CD's answer is mostly good, but two points need correction.
First of all, r-squared does represent the amount of variation that is explained by the model. However, unless a paper explicitly says they're reporting r^2, they are NOT reporting r^2, just r. You need to square it yourself to find out what the actual meaning of the correlation is. One reason they do this is that it's important to know whether a correlation is positive or negative, but after you square it it's always positive.
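A small sketch of why the sign matters, using made-up data (the `xs`, `rising`, and `falling` lists are illustrative only, not from the paper). A rising and a falling relationship give r values of opposite sign but exactly the same r^2. For the abstract's figure, r = 0.98 squares to about 0.96.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: one series goes up with x, the other is its mirror image.
xs = [1, 2, 3, 4, 5, 6]
rising = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]
falling = [-y for y in rising]

r_up = pearson_r(xs, rising)
r_down = pearson_r(xs, falling)

print("r (rising): ", r_up, " r^2:", r_up ** 2)
print("r (falling):", r_down, " r^2:", r_down ** 2)
print("r = 0.98 gives r^2 =", round(0.98 ** 2, 4))
```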
Second, p is not a measure of how likely it is that your results are true. This is a very common misunderstanding and even a lot of scientists get drawn into it. Apologies in advance for the lecture, but this is kind of hard to explain. What p tells you is this:
If the two variables weren't actually related, what is the probability you'd get a result like this due to random chance?
For example, it's theoretically possible that smoking doesn't actually cause lung cancer. Maybe it's just that by a random fluke of chance, every study that's ever shown a relation just happened to recruit a bunch of nonsmokers who got really, really lucky with regard to cancer and a bunch of smokers who were astronomically unlucky. Just like it's possible that you could flip a fair coin and have it come up heads 1000 times in a row. You can calculate the odds of that happening, and obviously they're very low; that's what the p value represents.
A p value tells you how UNlikely your results are, in a theoretical world where the results aren't actually true. It DOESN'T tell you how likely it is that your result actually is true.
For example, imagine that I run a study and find that people who drink 10 beers a day are healthier and live longer than people who drink 0-2. My result has p = .05, meaning that there's only a 5% chance I would have found a difference that big by chance, if drinking actually didn't affect your health at all.
Would you conclude from this that there's a 95% chance that drinking 10 beers a day is good for you? Or would you still think that it's probably bad, and assume that my results were the equivalent of rolling a 20 on a 20-sided die: something that's not common, but still totally possible?
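One way to see the "theoretical world where the variables aren't related" idea is a permutation test, sketched below. This is not how the paper computed its p value; it is a swapped-in illustration with hypothetical data. Shuffling `ys` breaks any real link to `xs`, so the shuffled correlations show what chance alone produces, and p is the share of shuffles that look at least as extreme as the real data.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Estimate a two-sided p-value for r by shuffling ys.

    Each shuffle simulates the null hypothesis of no relationship.
    The +1 terms count the observed data as one of the arrangements,
    so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_shuffles + 1)

# Hypothetical, strongly related data (not from the study).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.3, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 9.0]

p_value = permutation_p_value(xs, ys)
print("estimated p:", p_value)
```

Note what the number does and does not say: it is the chance of seeing a correlation this strong if there were no relationship, not the chance that the relationship is real.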
Someone who's taken more than intro stats can better answer this, but I believe r is the correlation coefficient, which is a measurement of the strength of correlation (-1 ≤ r ≤ 1; r = 1 is a perfectly linear positive association). The p-value is a little more complicated, but in this case it would basically be the probability of finding this particular positive relationship between TBK and SM if in actuality there was NO relationship at all between TBK and SM. Whatever those are. Whether the p-value is above or below a certain threshold (often .05, depending on the discipline) determines the significance of the result.
r simply means "you would expect it to be true". If it's near 1 or -1, yes. If it's near zero, no. Now what are SM and TBK?
AxialGentleman
(2624)
on July 13, 2013
at 02:58 PM
It's kind of funny that we're having this argument over a result with r (or r^2) = .98. In this case the difference really is immaterial.
AxialGentleman
(2624)
on July 13, 2013
at 02:54 PM
The quote Stephen provided is definitely showing r, not r^2. I'm sure the paper does talk about r^2 somewhere; you're right that that's the more important metric for determining if a relationship is big enough to actually matter.
AxialGentleman
(2624)
on July 10, 2013
at 03:39 PM
I know that p values much lower than .05 are the norm in fields like bioinformatics where you're testing a huge number of relationships at once, often with less theoretical guidance, since that increases the number of false positives you'd expect. I think that nonacademic survey research typically uses a de facto p of around .05.
AxialGentleman
(2624)
on July 10, 2013
at 03:35 PM
... And the conclusion about whether two variables are really related hinges on both the size of the correlation and the strength of the evidence (and the plausibility of the relationship, if we want to get Bayesian).
AxialGentleman
(2624)
on July 10, 2013
at 03:34 PM
There's a lot wrong with traditional NHST, and I wouldn't be disappointed if p values were wiped off the face of the earth. That doesn't change the fact that r measures the strength of an association and not whether you should expect the association to be true. In the social sciences it's rare for two variables to be correlated at anywhere near 1; for one thing, that would mean that one is the sole and complete cause of the other, with no influence by any other factor (or that they both share 100% of the same causal factors)...
CD
(26217)
on July 10, 2013
at 12:38 PM
Outside of academia, I have never seen anyone use a p-value less than .01. Typically we look for at least .001. But I work in data science, so low Ns are never an issue for us. Maybe in other fields they still use the Fisher ratio?
thhq
(10601)
on July 09, 2013
at 10:29 PM
To put r in slightly different terms, r defines an envelope around the predicted line into which all the data falls. For what you're doing, the slope of the line is of more interest, because it tells you what might happen if you intentionally increase your K level. In loose theory, eating the bananas and increasing your K should increase your skeletal mass. But in reality it probably would have to be done for months at controlled BMI to see a real effect. You'd have to reequilibrate.
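To illustrate the point about the slope, and the earlier question of whether r = .98 means ".98 units of y per unit of x", here is a sketch with hypothetical numbers. Multiplying every y by 10 makes the fitted line ten times steeper but leaves r untouched, so r alone cannot tell you how much y changes per unit of x.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x

# Hypothetical data; ys_large is the same pattern on a 10x larger scale.
xs = [1, 2, 3, 4, 5, 6]
ys_small = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
ys_large = [10 * y for y in ys_small]

r_small = pearson_r(xs, ys_small)
r_large = pearson_r(xs, ys_large)
slope_small = least_squares_slope(xs, ys_small)
slope_large = least_squares_slope(xs, ys_large)

print("r:    ", r_small, "vs", r_large)
print("slope:", slope_small, "vs", slope_large)
```

The amount of scatter around the line is the same in both cases, which is what r captures; the change in y per unit of x is a separate quantity read from the slope.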
thhq
(10601)
on July 09, 2013
at 10:22 PM
To put it in slightly different terms, r can be used to put a confidence envelope around the predicted line to show where the data falls. For what you're trying to do, you're more interested in the slope of the line than in the scatter. If you raise mmol/kg by 10, what's the effect on skeletal mass? Knowing that and your starting K level, you could commence your N=1
thhq
(10601)
on July 09, 2013
at 10:13 PM
Durianrider strikes again....aaaaaaaagh...
Stephen_4
(10979)
on July 09, 2013
at 09:21 PM
And it leaves a plausible theory that more potassium-rich foods, on top of an otherwise balanced diet (adequate protein, fat, carbs and meeting RDAs of other vitamins/minerals), may be advantageous for those looking to be metabolically fit. It also makes sense that something that lowers the risk of heart disease would also be conducive to lean body mass, since those are both characteristics of being metabolically fit.
Stephen_4
(10979)
on July 09, 2013
at 09:18 PM
I was actually looking for a study I read a while ago showing a positive correlation between dietary potassium intake and lean body mass in the elderly. I wasn't able to find that one, but still found this one interesting if only because total body potassium can actually be used to predict the skeletal mass of an individual with such a high r and such a low p value. I was reading up on potassium this morning and how it relates to blood sugar and its inverse correlation with heart disease. So to me this says multiple things. 1) You can't get tons of muscle without potassium.
thhq
(10601)
on July 09, 2013
at 08:41 PM
Just like a statistician. Have you read The Black Swan axial? Gaussian statistics are pilloried for trying to overreach and in doing so get things exactly wrong. You're better off being approximately right, which is what I was trying to do.
thhq
(10601)
on July 09, 2013
at 08:35 PM
So does it make sense? Potassium level high with lots of skeletal muscle? Why would it make sense? Do fit people concentrate potassium for some reason? If you supplemented with potassium would you become muscular? All that out of a high r-value, and I have NO idea what I'm talking about. You have to be a subject matter expert to get value out of it.
CD
(26217)
on July 09, 2013
at 08:32 PM
Read the paper; they are talking about r-squared, not r. That nuance, while technically correct, does not add to the understanding for a lay person. I tried to give a description that would enable translation of results.
Matt_11
(41747)
on July 09, 2013
at 08:04 PM
Just to add, a value of p < 0.05 is usually considered statistically significant. p values greater than that usually don't get treated as significant.
AxialGentleman
(2624)
on July 09, 2013
at 07:55 PM
No, r tells you the strength of the relationship. p gives some information about whether to believe the result is true (sort of; see my answer). For example, imagine that people who eat food x have a very slight tendency to get disease y more often. Since it's only a slight tendency, r is small. But if we've seen that small relationship in lots of different studies, in different populations, etc., we can have very strong reason to believe that it's real.
Stephen_4
(10979)
on July 09, 2013
at 07:53 PM
Skeletal Muscle and Total Body Potassium.
Stephen_4
(10979)
on July 09, 2013
at 07:53 PM
Thanks Axial .
Stephen_4
(10979)
on July 09, 2013
at 07:52 PM
TBK is Total Body Potassium and SM is Skeletal Muscle.
thhq
(10601)
on July 09, 2013
at 07:03 PM
What is TBK? All I'm finding is stuff like Taco Burrito King.
AxialGentleman
(2624)
on July 09, 2013
at 06:40 PM
Good answer, but not quite. First of all, unless a paper explicitly says r^2, they're reporting r and not r^2 (otherwise we wouldn't be able to tell a positive correlation from a negative one). Second, p is the likelihood that a false result could have occurred by chance, NOT the likelihood that your result DID occur by chance. See my answer for more details.
Stephen_4
(10979)
on July 09, 2013
at 04:56 PM
Thanks for going into so much detail CD, especially the graph above really helps. The full abstract is here if you want to check it out sometime: http://www.ncbi.nlm.nih.gov/pubmed/12499326 .
Stephen_4
(10979)
on July 09, 2013
at 04:55 PM
Yea, it is just a study showing correlation between total body potassium (TBK) and skeletal muscle (SM). I think it's interesting and so I wanted to make sure I was interpreting it correctly. The full abstract is http://www.ncbi.nlm.nih.gov/pubmed/12499326 if you want to check it out sometime in more detail. Thanks for explaining this so thoroughly.
CD
(26217)
on July 09, 2013
at 04:54 PM
It's really about whether you can build a model to make predictions. Either with new subjects or by extrapolating to longer timeframes, etc.
CD
(26217)
on July 09, 2013
at 04:53 PM
No, it would mean that 98% of the variance would be described by a model. If you look at these two graphs: http://postimg.org/image/o9w5f5vq9/ the graph on the left can be described by a linear model with an r-squared of .98. So it is very strongly coupled to the model (this is a linear model, but any model would work). The graph on the right would be described by a linear model with an r-squared of .84.
Stephen_4
(10979)
on July 09, 2013
at 04:49 PM
So would that mean that if I graphed their data set and it had an r of .98, then for every 1 unit of x I would see a .98 unit increase in y?
Stephen_4
(10979)
on July 09, 2013
at 04:49 PM
So would that mean that if I graphed their data set and it had an r of .98, then for every 1 unit of x I would see a 1 unit increase in y?