Sunday, October 4, 2009

Where is your information coming from?

Even though it's been a year since I've written in this blog, I still get offers of sponsorship or "partnership" with other sites, services and products. Most of the time, it's to create some sort of linking relationship: If I put their site on my links, they'll put my site on their links and presumably, the traffic to both of our sites equilibrates to a higher level than ever before.

But more recently, I got an email from a company asking me if I would be willing to write reviews on other websites and to post my reviews of those websites in my blog:

"Hello,

I'm Joy from XXXXXXX.com.

I would like to know if by any chance you would be interested in getting paid to publish reviews of products and websites on your blog http://evidencebasedfitness.blogspot.com/.

If you are interested please let us know the amount of money you want in order to publish a review by clicking the following link: XXXXXXXXX

As soon as you do that we'll start sending you paid review proposals from our customers.

Thanks,

The XXXXXXXX Team"

The customer of this company is presumably a website that is looking to increase its traffic. And the way in which the company can help that website is by paying blogs to write about their customer.

It seems like a fairly straightforward method of advertisement, only there's a minor catch: There's no obligation on a blogger's part to declare any conflict of interest when they write a review in their blog. And this is where things get murky.

I decided to take a look at their offer--What do I have to lose? And here's the kicker:



If I'm willing to NOT declare my conflict of interest, I can make much more money from this company. If I write an endorsement of a website on MY blog and conceal the fact that I was paid to write the endorsement from my readers, I can make more money.

Now, I haven't written a research review here in a year. I don't know when I will write another one still. My blog's purpose is not to review other people's websites, and I don't intend on changing that. So, I'm not overly inclined to change my relatively inactive blog to make a few bucks. But, you have to admit, this system is a pretty... interesting one.

Where do you think your information is coming from?

Friday, October 10, 2008

You don't always get what you want, even if you get what you need

One of the newer classes of supplements on the market is the aromatase inhibitors. They purport to increase free testosterone levels by inhibiting the enzyme that is responsible for converting androstenedione to estrone as well as converting testosterone to estradiol. By preventing the breakdown of testosterone precursors and the breakdown of testosterone itself, the concentration of testosterone should theoretically increase.

Sounds great, doesn't it?

If any of you have read the advertising, Gaspari Nutrition's Novedex touts the Baylor University study in its ad literature. This paper was published in 2007.

Willoughby DS, Wilborn, C et al. Eight weeks of aromatase inhibition using the nutritional supplement Novedex XT: Effects in young, eugonadal men. International Journal of Sport Nutrition and Exercise Metabolism, 17: 92-108, 2007.

Introduction:

Lots of guys are interested in getting more muscular. Some of them turn to anabolic steroids. Others have tried testosterone precursors like androstenedione, but both of these substances are technically banned or outright illegal in many countries. However, if you can block, or partially block the action of the enzyme that converts androstenedione to estrone, and testosterone to estradiol (a task accomplished by the same enzyme), you should be able to increase free testosterone and therefore achieve an anabolic effect (e.g. increase lean mass, decrease fat mass) because estradiol is the main hormone that feeds back to your brain to tell it to produce less stimulating hormone to your testes (which is where testosterone ultimately comes from). This enzyme is in the class of aromatases. And thus, the ingredient in Novedex that was tested in this study is in the class of aromatase inhibitors.

[Edit: I forgot to mention that this study was funded by Gaspari Nutrition, and appropriately disclaimed in the paper.]

Methods:

The authors chose to study eugonadal men in this study. Eugonadal means that they were producing normal amounts of testosterone. Guys who had used any nutritional supplements in the 2 months prior to the study were excluded from participation (this included creatine). All of the subjects had at least 3 years of resistance training experience. So, we're looking at non-beginner lifters who didn't use creatine or androstenedione or steroids for 2 months before the study.

[I think it's easy for novice readers to dismiss this study on the basis of these inclusion criteria, but I wouldn't share that opinion. This study does use the right kind of guy: beginner lifters probably shouldn't be using aromatase inhibitors right off the bat. Most non-beginners are probably using creatine, but as a researcher, you want to give your new "drug" as good a chance of succeeding as possible. Having guys on creatine or other supplements means that you have to account for gains by creatine or other substances and therefore whatever gains you see might not be attributable to Novedex. I suppose there's an outside chance that there's a synergistic effect, but I'm always leery of claims of synergy of two substances that, on their own, don't do much, but together somehow magically make huge differences (not that I'm saying creatine does nothing, because the evidence for creatine is quite good)]

Measurements were taken at 0, 4, 8 and 11 weeks. The researchers measured percent body fat, fat mass, fat-free mass, total body water as well as total and free testosterone, testosterone precursors and metabolites. Blood tests for safety were also performed, but I'm not going to focus much on those.

The subjects were simply told to continue their workout and diet schedules. Logs of physical activity and diet were kept during selected intervals of the experiment.

Subjects were split into two groups, matched by age and body mass, then assigned to get either Novedex or a placebo.

[The reporting gets a bit sketchy here, since it's not clear how they decided who would get which pill.]

The Novedex group took 4 Novedex capsules at bedtime. The placebo group took 4 placebo capsules (maltodextrin) at bedtime. After 8 weeks of taking either pill every night, the subjects didn't take either of the pills for 3 weeks. Subjects were not told which group they were in, and apparently, neither were the researchers.

[Again, the reporting gets sketchy because although the paper says, "double blind", we're not sure who they're referring to as the second blind. We're also not told that steps were taken to make sure people didn't somehow figure out their group assignment.]

Statistics:


The researchers used 2x4 ANOVAs with repeated measures for every variable. This included their safety variables--about 45-50 of them ranging from complete blood count to triglycerides and urine ketones.

[That's a lot of variables to test!]

They then did separate 1-way ANOVAs to test for differences between groups and between each of the time intervals.

[This is redundant and unnecessary, and actually increases your chance of finding a significant p-value when a true difference does not exist in the "real world".]

The researchers estimated their required sample size on an unknown variable, presumably free testosterone, and figured they needed 8 subjects per group to detect a difference between the two groups of between 0.8 and 1.25.

[I have no idea if this was calculated on possible values of free testosterone or not, or what the units of 0.8 or 1.25 are, or if this was a ratio difference or what. It also seems highly unlikely that you would need the same number of people to detect a difference of 0.8 as you would need to detect a difference of 1.25, but maybe you only needed 4 or 6 people to detect the larger difference? Their sample size estimation method wasn't referenced.]
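To illustrate why one sample size for two different differences looks odd, here is a rough sketch. It assumes, purely for illustration, that 0.8 and 1.25 are standardized effect sizes (the paper doesn't say), and it uses the textbook normal-approximation formula for comparing two independent group means at a two-sided alpha of 0.05 and 80% power. The function name and the alpha and power defaults are my own assumptions. This is not the method the authors used, which wasn't referenced, and it ignores the repeated-measures design, so the absolute numbers shouldn't be compared to their 8 per group.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group (hypothetical helper).

    Normal approximation for a two-sided comparison of two independent
    group means: n = 2 * (z_alpha/2 + z_power)^2 / effect_size^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # value for the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Assumed interpretation: 0.8 and 1.25 as standardized effect sizes.
for d in (0.8, 1.25):
    print(f"effect size {d}: about {n_per_group(d)} per group")
```

Under these assumptions the smaller difference needs more than twice as many subjects as the larger one, which is the point: a single sample size can't be tuned to both.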

Results:


The guys:

On average, the subjects were aged 26.1 (SD 4.4), were around 182 cm tall (71 inches, or 5'11"ish), weighed about 91kg (SD 14), or about 200 lbs, had a fat free mass of 75 kg (SD 9.5), or 165 lbs, and had a body fat percentage of about 17 (SD 5.9).

Diet and Physical Activity:

The two groups did not really differ in any meaningful way from one another in terms of macronutrient ratios or total calories consumed. "Subjective" analysis of workout logs showed that none of the subjects changed up their workout routines.

Hormones:

Total testosterone and free testosterone were observed to increase at the 4 and 8 week points for the Novedex group. On average, total serum testosterone was noted to rise about 4 times (from about 5pg/ml to about 25pg/ml, with a very wide variance for total), and from less than 25ng/ml to just under 150ng/ml with a very large variance for free testosterone. However, even with this huge variance in effect between subjects in the Novedex group, there is no question it was larger than the non-existent change in testosterone of the placebo group. By the 11 week mark, both groups had returned to the week 0 level of total and free testosterone.

Body composition:

The authors reported that there were no statistically significant changes in body comp measurements except that the Novedex group lost more fat mass than the placebo group (about 3.5% on average). But looking at the graph and the data, I'd be hard pressed to make the same conclusion and I would attribute the two statistically significant p-values to be accidental due to the massive number of significance tests that were performed. Even if it really is "statistically significant", I would say that the observed difference isn't actually remarkable. At all.


In fact, if you look at the reported numbers, the placebo group started at an average body-fat percentage of 18.4 (SD 6.3) and ended week 8 with a body-fat percentage of 18.7 (SD 6.5) for an average change of +0.3%, while the Novedex group started at an average body-fat percentage of 16.1 (SD 5.6) and at week 8, had an average body-fat percentage of 15.3 (SD 5.3), for an average change of -0.8%. That makes an average difference between the two groups of a mind-numbing 1.1% over 8 weeks.
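The arithmetic above can be checked in a few lines. This is just a sketch using the group means quoted in this review; the variable names are mine, and nothing here accounts for the standard deviations.

```python
# Mean body-fat percentages as quoted in the review (week 0 and week 8).
placebo_start, placebo_week8 = 18.4, 18.7
novedex_start, novedex_week8 = 16.1, 15.3

# Change within each group, in percentage points.
placebo_change = placebo_week8 - placebo_start
novedex_change = novedex_week8 - novedex_start

# Difference between the two groups' changes.
between_group = placebo_change - novedex_change

print(f"placebo change: {placebo_change:+.1f}")
print(f"Novedex change: {novedex_change:+.1f}")
print(f"between-group difference: {between_group:.1f}")
```

Note that the standard deviations in each group (5 to 6 percentage points) are several times larger than this between-group difference.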

Safety markers:

There were no marked differences in general blood work noticed.

Discussion:


We know that testosterone in _supraphysiological_ doses increases muscle mass and strength. However, even with a 4-fold increase of total testosterone and a 5-6-fold increase of free testosterone, 8 weeks of Novedex doesn't seem to put a dent in body composition. Apart from the moderately poor reporting of this randomized controlled trial, it's not bad. And the weaknesses in the study aren't really fatal flaws because all the biases in this study either affect generalizability (as opposed to validity) or are all in the same direction as the findings. For instance, the fact that no beginner lifters were included in the study doesn't affect the validity of the trial, only the generalizability (i.e. we can't use this study as justification for beginner lifters to use Novedex, but there's nothing glaringly wrong with the study itself). The fact that we don't know who was blinded or whether some of the subjects figured out their "pill assignment" doesn't really affect the validity of the study because if control group subjects found out they were on the placebo pill, they would have been biased to show that it did nothing (which it did anyways).

The only real weakness in this study of consequence is the coincidental finding of a "significant" reduction in fat mass (as determined by a significant p-value). But with so many tests of significance, we would expect there to be a very high chance (i.e. greater than 50%) that at least one of their tests would be significant by chance alone. And when you look at the actual numbers, you see that it probably is a coincidence that the p-value is less than 0.05, as opposed to being something meaningful, since none of the body comp numbers really changes at all.
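To put a rough number on "greater than 50%": if every test is run at a 0.05 threshold and the tests were independent (an assumption that won't hold exactly here, since many of these variables are correlated), the chance of at least one false positive across n tests is 1 - 0.95^n. The sketch below uses a hypothetical helper name and the approximate count of 45 safety variables mentioned earlier.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent tests is 'significant'
    by chance alone when no true difference exists (hypothetical helper)."""
    return 1 - (1 - alpha) ** n_tests


# 13 and 14 bracket the 50% mark; 45 is the rough count of safety variables.
for n in (1, 13, 14, 45):
    print(f"{n} tests: {chance_of_false_positive(n):.0%}")
```

Under the independence assumption, the 50% mark is crossed at 14 tests, and by 45 tests the chance is about 90%.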

One additional problem that you might (correctly) pick out is that this study involved only 16 guys. But again, this goes to narrowing the field of people this study applies to, not the overall validity of the study. If you don't fit into a demographic profile that is similar to these guys, it's really hard to use this study as justification to use Novedex.

One thing that I found very interesting is the range of values that subjects had in response to Novedex. The range of total testosterone and free testosterone in response to Novedex was pretty huge, which leads me to believe that there are probably guys who respond more than others on aromatase inhibitors. If there was a way to predict who was a responder, or if a larger study had been done so that a sub-analysis of high-responders vs. low-responders could be performed, I think we would have had a winner of a study on our hands. Maybe we would have seen substantial changes in body comp that would justify the use of Novedex. However, the average change in fat-free muscle mass and fat mass wasn't very large and unfortunately, neither were the variances, which means that in spite of very impressive increases in testosterone levels, these increases don't seem to translate to a noticeable benefit. You can get what you want (more testosterone), but still not get what you need (more muscle, less fat), even if you're a theoretical high-responder.

The bottom line: Unfortunately, this study leaves us with more questions than answers. If the fact that this study is cited in the ads for Novedex is why you're thinking of, or have decided to use Novedex, you might want to reconsider your decision. Of course, anyone can try just about anything based on hype or anecdotal personal testimonials, but that's not why you're reading this, is it?

Tuesday, February 26, 2008

Why I'm not writing about beta-alanine lately

I've had more than a few requests to write more reviews on beta-alanine, since it seems to be all the rage. It is so much the rage lately, that I get more hits on my reviews of the three beta-alanine studies than anything else, by a very large margin. The reviews have been linked by so many people that this blog is on the third page of a Google search for the term "beta alanine" (and this search includes hits from ALL of the supplement sites that SELL beta-alanine), and the FIRST hit when you search Google for "beta alanine studies". The. First. Hit. Holy. Crap.

However, I haven't been writing on beta-alanine for a few reasons:

1) This isn't the beta-alanine blog. There are other studies to review, and apart from reviewing for content, I like to review studies that also highlight particular common methodological mistakes, or, ones that highlight particular methodological strengths to build on the fact that good research in fitness IS possible (if we should just line up all the strengths seen in multiple studies into a single study).

2) I don't want to seem like I'm attacking the work of a single research group. Most of the beta-alanine studies come out of a small number of research centres. The authors of these studies often overlap with one another, or come up repetitively. What I did with the three reviews was pick out the ones that I thought would have the most relevance with respect to generalizability to the largest number of people, or would be considered foundational studies. To continue to review each and every beta-alanine study (which I have been challenged to do) makes it seem like I'm malicious towards people who are probably very nice and respectable.

3) There isn't anything in the other studies that would actually change the current level of evidence for beta-alanine supplementation from "There is inadequate evidence to support using beta-alanine," to "Beta-alanine is worth using." It seems repetitive and, frankly, a bit boring to review yet another beta-alanine study that does not add substantially to the existing body of knowledge regarding its efficacy or effectiveness. If a landmark study of higher AND sufficient quality is published, you can be sure I will definitely review it here. This has not yet happened.

Am I aware that there are new studies? Yes. However, these studies have not yet been indexed. Many of them have not yet been fully published in peer-reviewed journals. With the exception of the Trapp thesis, I generally only reviewed peer-reviewed articles. I think we all know what my feeling is on reviewing abstracts.

Am I aware that beta-alanine has been proven to increase muscle carnosine levels? Yes. However, the fact remains that DESPITE this "significant" increase in muscle carnosine levels, beta-alanine remains associated with non-meaningful (as statistically significant as they might be) changes in performance--except possibly at the highest elite level (which has not yet been adequately studied).

I understand that my blog makes it appear like I have a vendetta against beta-alanine, and there really isn't anything I can write here that would change the opinion of people who have that opinion of me. However, the standards I apply here to my reviews are the same standards that I would apply in a review of a submitted manuscript to the journals for which I am invited to be a peer-reviewer. They are consistent with international standards (such as the CONSORT statement on reporting standards for randomized clinical trials--which is openly linked in my link list.) I don't make the standards up, nor do I make the evidence up. The studies I review are available publicly in many university libraries, and are indexed as part of the Index Medicus (which anyone can access through PubMed.) I have no financial interest in seeing beta-alanine succeed or fail. I have no relationships with any supplement companies other than the fact that I buy supplements for myself. My only interest with respect to this blog is simply to present the evidence in as critical and unbiased a way as possible, so that others can make informed health decisions as they pertain to fitness.

I appreciate all of the feedback and notes that I have received throughout this blog's existence. Most of it has been very positive. And I appreciate all of the support that I have received so far. This blog is linked to many sites and I am flattered that people think it's worth reading to the point that they would recommend it to their friends and blog readers. I hope that someday, it will be more than "the beta-alanine blog", but I'll take whatever successes I can glean.

Thanks again for reading.

Sunday, February 17, 2008

Rest vs. Active Recovery

Lots of stuff happens when you're not doing anything. It's amazing. Your muscles rebuild (hopefully stronger than before). Your bones deteriorate less (if you've been doing weight bearing exercise). Britney does something silly (again). All while you're doing nothing! Rest is an integral part of any training program. Certainly, we know that inadequate recovery is responsible for a myriad of bad things, like decreased performance, and an increased risk of injury. But what about this thing called "active recovery"?

Active recovery can be loosely defined as a low-intensity activity (such as submaximal cycling or low-intensity weight training) used to enhance the recovery process between training sessions or competitions. The theory is that by increasing blood flow (your heart rate increases, therefore your blood is making more 'rounds' as it were), lactate and other 'waste products' are cleared faster, thereby minimizing their detrimental effects in tissues. This should translate practically to a faster recovery than if your blood were moving at its normal velocity. This would mean that you could train more frequently at sustained or higher intensity levels without exposing yourself to the risks of inadequate recovery. Sounds like a great idea, eh?

A recent study however, puts this translation of theory to practice into question. Its scope is somewhat limited, but worth looking at.

Andersson, H., et al. Neuromuscular fatigue and recovery in elite female soccer: Effects of active recovery. Medicine and Science in Sport and Exercise. 40(2):372-80, 2008.

Before we even get into the guts of this study, you can probably tell that we are looking at two major limitations: 1) the results of this study are only generalizable to the sport of female soccer; and 2) elite soccer, at that. So, while this study does challenge the concept that active recovery is useful, it only challenges that concept in the context of elite female soccer players, which likely excludes most of you (It definitely excludes me, on all three levels).

Introduction

Soccer is a high-intensity sport, but we don't understand a lot about recovery, particularly after games, and particularly about female soccer players. Most of the studies to date have either been inconclusive or have failed to demonstrate many changes in the biochemistry of soccer players, despite an observed performance decrease after games. Active recovery has been studied in male soccer players, but not in female ones. Until now.

These researchers wanted to know two things: 1) what happens neuromuscularly and biochemically to elite female soccer players after a game, and 2) do the same things happen to them if they're on passive or active recovery?

Methods

To answer this question, they recruited 22 elite female soccer players from the highest division in Sweden and Norway. Only 17 of these players were studied, because two of them were goal-keepers (and while a very difficult and demanding position, not the same activity profile) and three of the remaining 20 were not available for testing. These 22 players played two 90-minute friendly games, 72 hours apart. The same players participated in both games and played the same positions each time. After the first game, each player was randomly assigned to either passive or active recovery, with balancing for age, height, weight, VO2 max, and field playing position.

[With that many balancing factors, one has to wonder how random it actually was]

Active recovery consisted of 2 recovery sessions, at 22 and 46 hours after the first game (20 minutes of cycling at 60% of their peak heart rate, 30 minutes of low-intensity resistance training and 10 minutes of 60% cycling again).

Prior to the first match, players were tested for 20 meter sprint time, countermovement jump, maximal isokinetic knee flexion and extension and perceived muscle soreness. Blood samples were taken 3 hours prior to the first game, immediately after the first game, and then at 21, 45, and 69 hours after the first game and again immediately after the second game.

The blood was analyzed for creatine kinase (otherwise known as CK, a general inflammatory marker), urea, and uric acid (both waste products).

All players wore heart rate monitors during their games, and each player was filmed for the entire game. These films were later reviewed to tabulate the intensity of the game. Distance covered, running intensity as well as time spent at each running intensity was calculated to ensure that the players weren't slacking off when compared to one another.

All players were given a meal plan to attempt to standardize diet.

Statistics

The data was analysed with multiple repeated-measures two-way ANOVAs, with Dunnett's test as the post hoc test.

[That's a lot of tests!]

Results

Work intensity: The average heart rate was significantly higher within the two groups in game 2 vs. game 1. But it was higher in both groups, so the groups remained comparable.

Physiology after the first game: All performance tests were worse after the first game. And all three biochemical markers were elevated too.

Recovery time: Almost everything was back to baseline by 69 hours after the first game, regardless of which group the subjects were in. Sprint time was the first to recover (5h). Knee extension strength recovered by 27 hours, and knee flexion strength at 51 hours. Countermovement jump (similar to vertical jump) never recovered in either group in time for the second game. CK calmed down by 69 hours, while urea and uric acid returned to baseline by 21 hours. Muscle soreness was reportedly gone by 69 hours.

Between groups: The researchers failed to find a difference between the two groups at any time point. If we consider sprint time to be the major variable of interest, at 69 hours post-game 1, which was before game 2 at 72 hours, the 20m sprint time for the active recovery group was 3.25 seconds (SE 0.03) and 3.23 seconds (SE 0.04) in the passive recovery group.
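For a sense of how small that gap is, here is a back-of-the-envelope sketch using only the two means and standard errors quoted above. It assumes the two groups are independent and that a normal approximation is acceptable, which is generous with groups this small, so treat the interval as illustrative rather than as anything the authors reported.

```python
from math import sqrt

# 20 m sprint times at 69 hours, as quoted in the review.
active_mean, active_se = 3.25, 0.03
passive_mean, passive_se = 3.23, 0.04

# Difference in means and its standard error (independent groups assumed).
diff = active_mean - passive_mean
se_diff = sqrt(active_se ** 2 + passive_se ** 2)

# Approximate 95% interval using the normal critical value 1.96.
low = diff - 1.96 * se_diff
high = diff + 1.96 * se_diff

print(f"difference: {diff:.2f} s")
print(f"SE of difference: {se_diff:.2f} s")
print(f"approx. 95% interval: {low:.2f} to {high:.2f} s")
```

The observed difference is two hundredths of a second, and the rough interval comfortably includes zero.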

What I liked about this study is that the researchers went out of their way to determine that the groups remained comparable throughout the study, hopefully recognizing that their randomization scheme might not be enough. What I also liked about this study was that they showed that there was a detrimental effect to performance and biochemistry after the first game. We are unable to say that these soccer players were SO elite that a single game was insufficient to cause performance decreases from which they would have to recover.

Limitations (or, why some of the limitations you might think are here aren't):

The biggest limitations to this study are the ones I've already mentioned. You can't really use this study to justify why _you_ (or I) should sit on a couch--unless you happen to be an elite, female soccer player.

There were definitely some reporting issues. I definitely wonder about the quality of their randomization. If you have to balance for 5 things between 17 players, it's not going to be that random. How many choices are you going to have if you have to find another 22 year old, 5'8, 125 pound forward with a specific VO2 max? I suppose it's possible that elite female soccer players might be all very similar to one another...

We don't know anything about the blinding. And we definitely don't know about adherence. The paper doesn't mention whether the people doing the testing knew which group each player was in, though, with most of these measurements, you'd be hard pressed to bias one way or another short of deceptively entering a false number. However, we don't know what "passive" recovery meant for the passive recovery group. Did they sneak off to do some active recovery on their own? In some ways this isn't a limitation, as it simply reformats the question to ask whether adding 2 sessions of structured active recovery aids in recovery from a game, as opposed to unstructured active recovery (which the non-active group may have done, on purpose or not).

However, adherence aside, the groups did remain comparable throughout the study for most of the variables we would consider as confounders. And ultimately, the goal of randomization is to create comparable groups.

One of the criticisms that I usually make is that there were multiple tests of significance. However, that's only really a problem if you find you have a significant result and focus on it as though you had set out to look specifically for it. In this case, there were none between the two groups. So despite the fact that the chance of seeing a statistically significant difference by random chance was higher, they failed to detect one.

Lastly, one might think that 17 players is too small a sample size and that that quality makes this a bad study. Remember that statistical significance does not dictate whether an effect size is important or not. You use a statistic to bolster the argument that the difference between two groups (which you have deemed important beforehand) is not one that you got purely by random chance alone. However, the differences observed between the two groups were always minuscule. One could argue that statistical testing is unnecessary for such numbers because even if they were statistically "different", it wouldn't be enough of a practical difference to justify one behaviour over the other.

The argument one CAN make with a sample of 17, however, is that these 17 players are somehow not an accurate representative sample of all female elite soccer players. I can't speak to that, not knowing what female elite soccer players are like in general. Certainly, you could make a case that this study may even only apply to Scandinavian elite female soccer players (which, I found out to my embarrassment this past summer, does not include Icelandic elite female soccer players), if you could justify why other elite female soccer players from other countries are distinctly and substantially different than Scandinavian ones.

The bottom line:

Strictly speaking, you don't really get to use this study to change anything you do, unless you're a Scandinavian elite female soccer player. If you are, it might be okay for you to sit on the couch between games. For the rest of us, loosely, you can probably do whatever you like best, whether it's sitting on the couch, or getting some active recovery in, feeling relatively assured that it's probably not going to hurt you. But certainly, this study draws attention to the question of whether active recovery, though theoretically sound, is actually any more beneficial than passive recovery.

Monday, November 12, 2007

This entry's title is most definitely not, "Ice, Ice Baby"

But maybe I should call it, "Just because there are mistakes, doesn't mean it's all bad."

Ice--the ubiquitous item in every trainer, coach, therapist, doctor's arsenal. Whether you use a frozen bag of peas, a "magic bag", or actual ice attached via several feet of cling wrap, or actual cold-water immersion, there are many reasons to use ice, or more fancily, "cryotherapy". One of these reasons is delayed-onset muscle soreness, or DOMS. DOMS is the pain that you experience 1-2 days after a workout, usually after a significant change in your routine or program. It goes away on its own, but can, in some cases, hinder training, since athletes who are sore after strength training may not be able to train to the same level in their sport. Some sport teams encourage or even mandate contrast baths post-training, for a variety of reasons. I don't know of any lay-people that fill their tubs with ice water, but I'm sure someone will tell me that they, or someone they know does. So, the question is, "Does cold-water immersion reduce DOMS?" Is it worth getting into a tub up to your waist in ice-cold water?

Sellwood KL, Brukner PB, Williams D et al. Ice-water immersion and delayed-onset muscle soreness: a randomised controlled trial. British Journal of Sport Medicine, 41: 392-397, 2007.

Introduction:

The mechanisms by which DOMS actually occurs are still relatively poorly understood. We do know that there are structural changes seen on microscopy and biochemical changes in serum levels of things like creatine kinase and prostaglandins. We also know that DOMS tends to manifest more after eccentric exercise. Ice-water immersion is used, particularly by high-level athletes, to minimize DOMS, and it is theorized that it decreases inflammation and also causes blood vessel constriction, so as to prevent some of the swelling (which is also part of the inflammatory process). There have been other studies looking at ice-water immersion on DOMS, but they haven't been very good, and the authors give several reasons why previous trials have tried and failed: most were underpowered (i.e. not enough people), were not blinded, or used resistance-trained people (thus decreasing the likelihood of DOMS). So, these authors decided to put it to the test properly. In Oz, they seem to use a one-minute-on, one-minute-off cycle, for three immersions.

Methods:

This was a well-reported study. There are a few issues that I have concerns with, but on the whole, there were almost no elements in this paper that I found were missing.

This study recruited volunteers from the University of Melbourne using posters around the schools of physiotherapy and medicine. Subjects had to be older than 18 years of age. They could not have performed any eccentric quadriceps exercise for 3 months prior. They could not have any neurological disease in the lower limbs, could not have any current injury to the lower limbs, could not be diabetic or have a disease for which cold-water immersion would not be allowed (e.g. Raynaud's phenomenon). They also had to understand English.

All subjects went through a protocol to determine their 1RM for a seated leg extension on their non-dominant leg. They then went through 5 sets of 10 reps of _eccentric_ leg extensions using 120% of their 1RM. The subjects got one minute of rest between sets.

Subjects were randomly allocated to receive either a) an ice-water bath, or b) a warm-water bath. The ice-water bath was "...melting ice water at 5 (plus or minus 1) degree Celsius." (That's 41 F, for the backward countries who refuse to join the rest of the civilised world :P ) The warm water bath was 24 degrees Celsius. Subjects had to stand submerged up to the level of the anterior superior iliac spines (basically just below your belly button). Three sets of one-minute-in, one-minute-out were done. [Disappointingly, the authors were a little sparse on their reporting of randomization, stating that the sequence was generated using a random numbers table (which is fine), but not saying whether patients were allocated by any kind of blocking or whether it was just simple randomization. The fact that they ended up with exactly 20 people in each group is somewhat fortuitous for them.] They did mention that the evaluators of the outcomes were blinded though, which is a plus, and also mentioned that subjects were not told which intervention was considered therapeutic (which is an excellent way, if you can ethically justify it--and you can--to blind patients from whom you cannot conceal the actual treatment).

The subjects came back at 24, 48 and 72 hours after their eccentric workout, and filled out visual analogue scales rating their quad pain for:
-pain on sit-to-standing
-passive quadriceps stretch
-one-legged hop for distance (and distance was also recorded for this test as a measure of quad function)
-maximal isometric contraction

They were also tested for tenderness on pressure, which was assessed using a pressure algometer, which is basically a device that can measure how much pressure is being delivered through it. They exerted a force of 6 pounds per square cm (what an odd mix of metric and imperial...) on two standardized points of the quads and asked the subjects to rate their pain during pressure.

Subjects' thigh circumference was also measured and recorded at two standardized reference points. Blood work was drawn to measure creatine kinase (CK) levels.

The authors calculated a sample size of 30 subjects to detect a 25% difference between the two groups (i.e. the cold-water group were expected to have at least 25% less pain than the warm-water group). They based their calculation on the fact that a previous study found that there was an average increase of 69mm on the 100mm VAS for pain at 48 hours after eccentric exercise.

[This will prove to be the Achilles heel later on. They decided to recruit 40 patients in case people dropped out--which is also good planning. However, they used an alpha level of 0.05 and a power of 0.8 (i.e. a beta level of 0.2) to calculate this sample size--which is puzzling because their alpha level when it came to analysing their data was 0.01. What saves them in the end here, is that with an alpha level of 0.01, they needed 21 subjects per group, and they ended up recruiting 40. Unfortunately, it doesn't save them enough. Read on.]

Statistics

The authors used an intention-to-treat analysis (which means that regardless of whether someone stayed in the study or not, or whether they went off on their own or not to immerse themselves in freezing water, they were part of the analysis and in the group they were randomly allocated to), which is pretty much the accepted standard in randomized controlled trials. They carried the last value forward for any missing values (also the going standard, which many studies don't do).

[Again, disappointingly, they decided on an alpha level of 0.01 to protect against a type I error (finding a significant difference when one does not truly exist) because of the number of significance tests they were going to perform. I stopped counting at 50. On a conservative Bonferroni adjustment, an alpha level of 0.01 would be the appropriate adjustment for 5 significance tests. So, even with the more conservative alpha level, 50 tests is just downright inappropriate. This is a case of poor prioritization as to defining a single primary outcome. However, despite the gross error of judgement, it surprisingly doesn't really affect the conclusions of the study all that much.]
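To put rough numbers on the multiple-comparisons problem, here is a small Python sketch. It assumes the tests are independent (they almost certainly are not, since they all come from the same subjects), and the function names are my own, not anything from the paper.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test alpha that keeps the overall type I error near family_alpha."""
    return family_alpha / n_tests


def familywise_error(per_test_alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - per_test_alpha) ** n_tests


# The adjustment that an alpha of 0.01 corresponds to (5 tests).
print(bonferroni_alpha(0.05, 5))
# What a Bonferroni adjustment for 50 tests would call for.
print(bonferroni_alpha(0.05, 50))
# Risk of at least one spurious result with 50 tests at alpha 0.01.
print(familywise_error(0.01, 50))
```

Under those assumptions, 50 tests at an alpha of 0.01 still leave roughly a 40% chance of at least one spurious "significant" result.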

Results

First off, the authors reported a few demographic statistics with respect to age, body-mass index and so on, but then went on to state that, "No significant difference was noted between the participants in the two treatment groups at baseline..."

[It is a well-established caution that significance testing on baseline values in the context of a randomized controlled trial is inappropriate. This is for 2 reasons: 1) You cannot use classic significance tests to positively find "no difference". You can only find that there is insufficient evidence that a difference exists. Absence of evidence is not the same as evidence of absence. 2) A significance test asks how probable it would be to observe data as or more extreme than yours if chance alone were responsible for the difference between the groups. However, in the case of randomization, the group a person ends up in IS up to random chance! So the probability that a baseline difference arose by random chance is...1! The null hypothesis is true by design, which means any "significant" baseline p-value is a false positive. So the interpretation of a significant p-value in baseline comparisons is problematic at best, and completely nonsensical at worst.]
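A quick simulation makes the baseline-testing point concrete. This is only a sketch: the baseline characteristic (an "age" with mean 25 and SD 5) is made up, the p-value uses a normal approximation rather than a proper t-test, and none of it comes from the paper.

```python
import random
from statistics import NormalDist, mean, stdev


def baseline_p_value(group_a, group_b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = (stdev(group_a) ** 2 / len(group_a)
          + stdev(group_b) ** 2 / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def fraction_significant(n_trials=2000, n_subjects=40, alpha=0.05, seed=1):
    """Randomize simulated subjects into two groups, many times over,
    and return the share of trials with a 'significant' baseline difference."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Hypothetical baseline characteristic drawn from one population.
        ages = [rng.gauss(25, 5) for _ in range(n_subjects)]
        rng.shuffle(ages)  # random allocation to the two groups
        half = n_subjects // 2
        if baseline_p_value(ages[:half], ages[half:]) < alpha:
            hits += 1
    return hits / n_trials


print(fraction_significant())
```

Every subject here comes from the same population, so each "significant" baseline difference the simulation reports is chance and nothing else. The share of such trials should land somewhere near the alpha level you picked.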

I'm not going to go through every significance test that the authors did here. The bottom line is that apart from a few significant p-values in tests that weren't that important, the authors failed to find a significant difference in any of the outcomes that actually mattered. This is why the more conservative alpha level, though an inappropriate way to deal with multiple comparisons when you're planning more than FIFTY tests, is not that big of a deal in this case. They did find that the ice-water group had "significantly" more pain at 24h than the warm-water group, but with over 50 tests, there's bound to be a few spurious p-values. I certainly would not agree with the statement that ice-water immersion, "...may make athletes more sore the following day." on this basis. That's data fishing.

However...

The highest median pain score in this study was 38mm (interquartile range 13.8-55.0mm), which is FAR below the score we expected to see compared to the previous study that had a mean pain score of 69mm. So, unfortunately, even though they recruited 40 subjects, if they wanted to detect a difference of 25%, they would have needed at least 45 people in each group (with an alpha of 0.05) or 67 in each group (with an alpha of 0.01). The problem with using a percentage as your criterion for practical relevance is that the estimation equation for sample size doesn't care--it only cares about the absolute difference between the two groups (and the variance within each group). So, while 75% of 69mm is 51.75, for an absolute difference of 17.25, 75% of 38 is 28.5, for an absolute difference of 9.5. It is invariably tougher to detect a smaller difference than it is to detect a larger one. So, for all its efforts and criticism of previous underpowered studies, this one is, alas, underpowered. But, as with the closing sentence of many paragraphs in this review, this "mistake" is also somewhat moot.
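The dependence on absolute difference can be sketched in Python with the usual normal-approximation formula for comparing two means. The standard deviation below is a made-up placeholder (the SD the authors used isn't reported here), so the printed sample sizes will not reproduce the figures above exactly. The point is the pattern, not the specific numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(abs_diff, sd, alpha=0.05, power=0.8):
    """Approximate subjects per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / abs_diff) ** 2)


ASSUMED_SD = 17.0  # hypothetical SD in mm, for illustration only

for expected_pain in (69.0, 38.0):
    abs_diff = 0.25 * expected_pain  # a 25% reduction, expressed in mm
    for alpha in (0.05, 0.01):
        print(expected_pain, abs_diff, alpha,
              n_per_group(abs_diff, ASSUMED_SD, alpha))

# The SD cancels out of this ratio: shrinking the detectable difference
# from 17.25mm to 9.5mm multiplies the required sample size by this factor.
scale_factor = (17.25 / 9.5) ** 2
print(scale_factor)
```

Whatever SD you plug in, the same "25%" asks for a bit more than three times as many subjects once the expected pain score drops from 69mm to 38mm.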

Discussion

As with any critical review of a study, it's easy to poke holes in things. This is by far the excuse I have heard the most from people who would prefer not to use studies as evidence for why things work or don't work. "You can find a study to prove anything," is the second. But the trained reviewer understands that it's not enough to poke holes in papers--you have to understand how the hole ultimately affects the study's conclusions. In this study's case, it wasn't really that important that the authors did a bajillion significance tests (I think bajillion is somewhere higher than 50, but less than a gazillion) because they didn't actually affect the study's conclusion that they failed to find a difference in pain reduction between ice-water and tepid-water immersion.

The authors acknowledge the limitation of their study in that they failed to elicit as high a pain as other studies (and more importantly the study they based their sample size calculation on), or alternatively, maybe they just had tough-as-nails subjects who didn't rate their pain very high. They said that the strength deficits were pretty small with respect to the DOMS, and also the CK levels didn't rise as much as other studies, but honestly...5 sets of 10 reps of eccentric leg extensions at 120% of your 1RM (if it's a true 1RM, and there is a debate as to whether an untrained individual can generate a true 1RM) seems like more than I would do or would recommend as a strength workout, so how much more would be comparable to a "trained" or "athletic" workout? And do those other studies demand workloads that are far in excess of what is actually done in the "real world" in an effort to create SUPER SOUL-ANNIHILATING DOMS!!!! (tm) ?

And, as always, the authors hedged on the fact that maybe ice-baths have a psychological benefit to athletes (quite like taping!), and that even if there might not be any benefit (from both a physiological and a pain perspective), who's to take that value away from an athlete who might become mentally crippled if he/she were unable to take an ice-bath or tape their ankles?

Looking at the numbers, I come away from this study with 2 thoughts (apart from the ones above): 1) Maybe DOMS isn't that crippling for most people, and if it is, maybe we should be asking whether that kind of training load is necessary rather than trying to come up with ways of preventing or treating DOMS; and 2) Regardless of the power of the study, if it's an accurate picture of what DOMS pain looks like, I don't need a statistical test to tell me getting in a tub of ice-water up to my belly button is an experience I could probably go without, because the difference between warm and ice-water is tiny.

The bottom line:

Getting into a tub of testicle-shrinking ice-cold water is probably unnecessary for most of us in terms of preventing or treating DOMS. It's probably more fun if there's vodka and a sauna involved though (but they didn't study that).

Monday, October 15, 2007

A return?

For those of you who have followed my blog, thanks for all your support. I realize that I have been remiss in keeping it up, but residency...well, it's a whole new kettle of fish, and I'm not even sure they're all fish in there. I will likely not be updating my blog every week, though, I will try, depending on my call schedule. Basically, a review takes about 2 hours to type up, proof-read and edit (and even then, it could probably use a bit more work), but when I'm sleep-deprived, the last thing I want to do is a blog entry--as fun as it is. The tutorial entries don't take quite as long. Two entries a week would definitely be wishful thinking at this point. If I knew how to send out auto-update notices, I would, but seeing as I don't, I guess you'll just have to keep checking back. RSS feeds can be useful that way.

Going the extra mile doesn't always make things better (but then again, it might)

I picked this study for two reasons: 1) It's actually not a half-bad study, and 2) It addresses a significant fitness issue that has been plaguing athletes, trainers and coaches for decades--to stretch or not to stretch. However, despite the study's many strengths, it falls just short of making it truly useful in helping active people make the decision whether or not to perform static stretching.

Kokkonen J, Nelson AG, Eldredge C, Winchester JB. Chronic static stretching improves exercise performance. Medicine and Science in Sports and Exercise. 39(10): 1825-1831, 2007.

Introduction:

I really liked this introduction. The authors did a great job of presenting the literature, including why the previous literature doesn't really give us any definitive answers on whether non-pre-event static stretching has detrimental effects on performance. They define their boundaries early on and don't dispute the clear literature base on pre-event stretching. This is not the issue here. However, they make a great case as to why we are still in the dark with regular static stretching and where it should be placed (if at all) in a training program. Clearly, it is not before significant events (which, for many active people, includes their "workout"--whether that be lifting weights, going for a run or otherwise), but is there a place for stretching on say, rest days, or post-workout? And if so, what kinds of benefits might we see from it?

The authors go on to review the literature on performance benefits derived from static stretching, most of which are strength related. Yes, that's right. Stretching might make you stronger. Well, some of you anyways, as we will see.

The purpose of this study therefore, was to determine whether a static stretching program could have an impact on strength, muscle endurance and power. (For those of you who are scratching your heads at "static stretching", this refers to the "stretch and hold" variety of stretching--as opposed to "bounce" or "move" varieties, also referred to loosely as "dynamic stretching")

Methods:

The authors recruited 40 students (why 40, I have no idea, as there is no sample size justification in this paper) who were attending Brigham Young University in Hawaii (Damn it, I should have gone to university in Hawaii). The authors describe these students to be either inactive or recreationally active. They further narrowed this vague term down to exclude anyone who did specific endurance or strength training on multiple days during the week; as well as anyone who did more than 60 minutes of physical activity more than 3 times per week. This basically excluded anyone who lifted weights more than once a week, anyone who went for a run more than once a week (strength and endurance training, respectively), anyone who played basketball for more than 60 minutes, more than 3 times per week, and anyone who went for a walk for more than 60 minutes, 3 times per week. They also excluded people who did any sporting activities more than 6 times per month. Basically if they thought you were doing any kind of structured or planned physical activity, you got booted from the study. And from the sounds of the description, they were fairly strict about it.

[The authors' selection of subjects is both one of the biggest strengths and biggest weaknesses of this study. In order to isolate the effect of stretching on strength, muscular endurance and power, they chose to study people who were basically sedentary. This has two advantages: 1) If you study a group of people who do basically no physical activity other than stretching, then improvements in physical activity testing can be attributed to the stretching; and 2) you are most likely to see the biggest changes in those people who are furthest away from their "athletic potential". You are less likely to see an improvement in reading ability if you study lawyers than if you study third graders (debate what you will about lawyers, but they read good). The same principle applies here.

The drawback to this approach, is naturally, the extent to which we can generalize these results to anyone but basically sedentary people. If you study third graders' reading, you can't go and claim that your revolutionary program is going to improve the reading ability of the lawyers--or even fifth graders. BUT, given that we know so little about static stretching and performance in this context, this study IS the most appropriate starting point, as it gives us the best chance of detecting a benefit. If these researchers had gone with "trained college men" and NOT found a difference, we would still be in the dark because that population is in the middle of the road. Given that the study finds that static stretching improves performance in the least-trained individual, we can then start working our way up to see if this effect persists with more-trained people.]

Subjects were "randomly assigned" to either the stretching group or the control group. The control group were asked to refrain from any stretching activities, but otherwise, the groups were told to maintain what little physical activity they were doing. The researchers also made sure that there were the same number of men and women in each group. Everyone had to keep a log of their activities.

[I really wish they had gone into just a little more detail about randomly assigning people to groups.]

In addition to all of this, the authors report that all local recreational facilities were monitored by research staff and, "...the presence of any subjects was noted and collaborated with their exercise logs."

[Holy big brother, Batman. This is, by far, the greatest length anyone has gone to in verifying non-activity that I have ever encountered. A little creepy perhaps, but from a scientific perspective, one of the more rigorous ways of ensuring subject compliance (or non-compliance, as the case may be).]

Outcomes:

The authors tested the following things:

1) Sit and reach (flexibility)
2) 20m sprint time (power)
3) Vertical jump height (power)
4) Standing long jump length (power)
5) Knee extension and flexion 1RM (strength)
6) Knee extension and flexion endurance (endurance)
7) VO2 peak (endurance)

Testing was done over 3 days. The details of this aren't that important. If you're really interested, read the article.

[We are not told who is blind to what in this article. So, for all we know, everyone knows what everyone is doing (i.e. subjects, evaluators, group-assigners, the gym spies...I still can't get over that.) ]

The stretching program:

The stretching program was a 10-week program with 15 different stretches. Each stretch was held for 15 seconds and each stretch was repeated 3 times. All stretches were performed non-assisted, and 12 of the 15 stretches were performed again, but with assistance. Stretching sessions lasted about 40 minutes each.

[I won't go into each of the individual stretches, although they are described in the paper. Forty minutes is a long time to stretch though, even for a static stretching program. I would say that this is perhaps one of the bigger weaknesses of the study. How feasible would this be if this were you?]

Statistics:

This is by far the least liked part of the study for me. The authors decided to treat the data as four separate experiments, in order to avoid adjusting their alpha-level to oblivion with 7 multiple tests. So, instead of having to meet an alpha-level of 0.007 (which is 0.05 divided by 7), they had to meet an alpha-level of 0.05 for the "flexibility experiment" (only one test there), 0.017 for the "power experiment" (3 tests), etc. There is an argument that can be made for this kind of strategy--certainly the "experiments" are grouped logically and not haphazard, but given that it's all on the same population, it's still ONE experiment. However, this point is relatively moot when we get to the results.

Otherwise, statistical testing was appropriate with a two-way ANOVA with the protected Tukey as a post-hoc test.

[I'm always a bit conflicted with respect to using the two-way ANOVA in these studies. Basically, you're trying to see if a significant difference exists either between the two groups (the first way), or within each group (the second way), or both. However, in the context of a randomized controlled trial, we are really not interested in whether one group did better compared to itself (i.e. the stretching group improved their, say, strength, when compared to their pre-testing), because whether the stretching group did better compared to itself is irrelevant, if the control group ALSO did similarly well (significantly, or not). So why bother seeing if the groups do better compared to themselves?]

Results:

Thirty-eight subjects finished the study. One subject from each group dropped out because they wanted to exercise. Of the remaining subjects, no one exercised for more than 18 days of the 10 weeks. No one in the stretching group missed more than 2 stretching sessions.

[Overall, I did not like the reporting style, nor the statistical reporting of this paper. Everything was reported and interpreted in percentages, despite the authors providing the actual numbers in a table. This leads me to wonder whether the statistical tests were done with the percentages as well, as opposed to the actual number results--which has implications for the likelihood of uncovering a significant difference, since differences may become magnified once converted to percentages. This includes confidence intervals, which were used, and I think any authors who use them should get some kudos. I just wish they had done them on the actual numbers.]

Flexibility: The stretching group was more flexible than the control group after 10 weeks. I'm sure you're all floored. Fortunately, the control group's flexibility did not improve (on average, it got worse), which is a good thing because even if they DID stretch on the sly (away from the piercing eyes of the gym spies), they didn't get more flexible, so we can still make statements about the effects of increasing flexibility on other things.

I could go through all of the outcomes, but in the end, the stretching group always did better than the control group from a percentage point of view.

However, when I look at the all-important results table with the actual numbers, what I see is very very small improvements, with not a whole lot of convincing numbers to show that the "significant differences" detected through percentages carry through with the raw performance numbers. For example, the stretching group ended up with an average knee extension 1RM of 82.0 kg, but a standard deviation of 25.8, while the control group had an average 1RM of 71.0 kg, but a standard deviation of 20.8! While the difference is 11 kg, on average, the variance is SO large that I would be surprised if that turned out to be statistically significant. Less dramatic would be something like the 20m sprint time: Stretching group average 3.8 (SD 0.51), control group average 3.68 (SD 0.31), both post- values. Even if that was statistically significant, I'm not sure that a difference of 0.18 seconds is anything to write home about--remembering that we are not talking about Olympic athletes for whom tenths of a second are important.
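Here is a back-of-the-envelope check of that knee extension hunch in Python. It is a sketch, not the authors' analysis: it compares only the post-test values, it assumes the 38 finishers split evenly into 19 per group, and it uses the large-sample cutoff rather than the slightly larger t cutoff.

```python
from statistics import NormalDist


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Test statistic for a difference in two means with unequal variances."""
    se = (sd_a ** 2 / n_a + sd_b ** 2 / n_b) ** 0.5
    return (mean_a - mean_b) / se


N_PER_GROUP = 19  # assumption: 38 finishers, split evenly between groups

# Knee extension 1RM, post-test: stretching vs. control (kg).
t_stat = welch_t(82.0, 25.8, N_PER_GROUP, 71.0, 20.8, N_PER_GROUP)
critical = NormalDist().inv_cdf(0.975)  # two-sided cutoff at alpha 0.05

print(t_stat, critical, abs(t_stat) > critical)
```

Under those assumptions the statistic falls short of the cutoff, which fits the suspicion that an 11 kg gap with that much spread would not reach significance on the raw numbers. It says nothing about the paper's own ANOVA, which is a different test.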

So, in the end, what we have is a bit of a mixed bag of a study. It has some REALLY strong points (I mean, KUDOS for spying on all your subjects at the gym!), and a few weak points, but overall, fails to produce data that shows us that static stretching is a worthwhile pursuit, unless your life is judged on percentages alone. Given that one would have to stretch for FORTY minutes, 3 times a week, I can think of several things I would rather do for 40 minutes, 3 times per week, which might produce the same (i.e. possibly very little) _absolute_ benefit in strength, power and muscular endurance.

Still, I suppose if I was unable to do any activity other than stretching, I could be convinced that it might prevent me from deteriorating too much. And that's not necessarily a benefit to sneeze at, given that there are many athletes who are sidelined into inactivity due to injury or other circumstances, not to mention several patient groups in inactivity for health reasons. So, in a way, it all depends on which side of the fence you're approaching the problem from.

The bottom line:

If you can't do anything else, static stretching for 40 minutes, 3 times a week could keep you from deteriorating from a muscular strength, power and muscular endurance perspective.