Tuesday, July 22, 2008

Washington Nationals' Injury Report

In a 162-game season, injuries will undoubtedly play a role on almost every major-league team. The key to a successful season lies in part with avoiding too many significant injuries to key players.


In following Major League Baseball this season, and the Washington Nationals along with it, I have noticed an abnormal number of injuries.


Here is the injury report for the Nationals this season:


1B Nick Johnson: Out for season (only played in 38 games this year), making $5.5 million this year.


Closer Chad Cordero: Out for season (only pitched in six games this year), making $6.2 million this year.


3B Ryan Zimmerman: Out since May 26, making $465,000 this year.


1B Aaron Boone: Out since July 7, making $1 million this year.


OF Elijah Dukes: Out from beginning of season until May 9, and has been out since July 6, making $392,500 this year.


Starting Pitcher Shawn Hill: Out from March 20 to April 18 and has been out since June 25, making $402,000 this year.


OF Lastings Milledge: Out since June 29, making $402,500 this year.


C Paul Lo Duca: Out April 18 to May 2 and May 9 to June 17, making $5 million this year.


1B Dmitri Young: Out from April 8 to May 15 and out since July 19, making $5 million this year.


OF Austin Kearns: Out from May 22 to July 3, making $5 million this year.


OF Wily Mo Pena: Out March 20 to April 13 and out since July 18, making $2 million this year.

2B/3B Ronnie Belliard: Out May 20 through June 10, making $1.6 million this year.


Relief Pitcher Ryan Wagner: Out since March 20, making $450,000 this year.


C Johnny Estrada: Out from March 26 to April 9 and from May 9 to July 18, making $1.25 million this year.


Starting Pitcher Odalis Perez: Out from June 14 to June 26, making $850,000 this year.


This is a very long list, and these are all players who played, or would have played, significant roles on the Nationals this year.


The only position-player starters from the beginning of the season who have avoided the DL are the middle infielders: SS Cristian Guzman and 2B Felipe Lopez. However, Lopez has lost his starting spot multiple times throughout the year.


The Nationals’ payroll this season is $43.3 million. The players on this list, all of whom have spent some time on the disabled list, account for $35.5 million of it.


That is, financially, 82 percent of the team.


The Nationals currently have just over 50 percent, financially, of their team on the DL.
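For anyone who wants to check the math, here is a minimal Python sketch of the two percentages. The salaries and the $43.3 million payroll come straight from the list above; the True/False flag is my own reading of which players the list shows as still out ("Out for season" or "out since"), so treat it as an assumption.

```python
# Salaries in millions of dollars, copied from the injury list above.
# The flag marks players the list shows as still out; that reading of
# the list is an assumption made for this sketch.
injured = {
    "Nick Johnson": (5.5, True),
    "Chad Cordero": (6.2, True),
    "Ryan Zimmerman": (0.465, True),
    "Aaron Boone": (1.0, True),
    "Elijah Dukes": (0.3925, True),
    "Shawn Hill": (0.402, True),
    "Lastings Milledge": (0.4025, True),
    "Paul Lo Duca": (5.0, False),
    "Dmitri Young": (5.0, True),
    "Austin Kearns": (5.0, False),
    "Wily Mo Pena": (2.0, True),
    "Ronnie Belliard": (1.6, False),
    "Ryan Wagner": (0.45, True),
    "Johnny Estrada": (1.25, False),
    "Odalis Perez": (0.85, False),
}
PAYROLL = 43.3  # total team payroll, in millions

# Salary of everyone who has spent any time on the DL this season.
ever_on_dl = sum(salary for salary, _ in injured.values())
# Salary of the players who are out right now.
currently_out = sum(salary for salary, out in injured.values() if out)

share_ever = 100 * ever_on_dl / PAYROLL
share_now = 100 * currently_out / PAYROLL

print(f"Spent time on DL: ${ever_on_dl:.1f}M ({share_ever:.0f}% of payroll)")
print(f"Currently out:    ${currently_out:.1f}M ({share_now:.1f}% of payroll)")
```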


This is not normal, and from what I remember, this does not seem to be much different from last year, either. Nick Johnson missed almost the entire season last year as well, and Cristian Guzman missed the whole second half.


For a team that has been a bottom dweller for all of recent memory, rebuilding is extremely difficult when all of your players are injured.


The list of players the Nationals have used in left field this season is longer than most teams’ lists of available infielders: Rob Mackowiak, Wily Mo Pena, Elijah Dukes, Willie Harris, Paul Lo Duca, Ryan Langerhans, and Kory Casto.


I am not one to think the injury bug in DC is coincidence, and I have two possible explanations.


The first is that the Nationals’ medical staff and trainers are totally incompetent.


And the second, more plausible explanation is that players have no interest in coming off the disabled list.


Who can blame them? Who wants to play for a team that has been outscored by more than 100 runs this season?


The Washington Nationals have a lot of issues to address. But first and foremost, they need to get, and keep, their players healthy.


Even the players who have been healthy have no risk of being demoted because there are no available replacements. For most of the first half of the season, both Willie Harris and Wily Mo Pena struggled to hit .200. Aside from Guzman, the rest of the averages haven’t been much higher, either.

Sunday, July 20, 2008

Decade's Best NCAA Basketball Champions Revisited

I wrote an article analyzing the past nine NCAA tournament champions a few weeks ago. However, after some suggestions and more thought, I decided on some different and possibly improved rankings.

To go along with the rankings in the original article, I have three more sets of rankings here that look at things slightly differently.

Here are the original rankings. The rating, in parentheses, weights each team's average margin of victory by the average seed of the teams it defeated.

1. 2001 Duke Blue Devils (2.5)

2. 2007 Florida Gators (2.36)

3. 2002 Maryland Terrapins (2.33)

4. 2000 Michigan State Spartans (2.14)

5. 2006 Florida Gators (2.09)

6. 2004 Connecticut Huskies (2)

7. 2005 North Carolina Tar Heels (1.98)

8. 2008 Kansas Jayhawks (1.77)

9. 2003 Syracuse Orangemen (1.53)

These rankings account for the strength of opponents played, but treat each champion the same regardless of seed. The champions that were not No. 1 seeds do not look as good in these rankings. This may be realistic or unrealistic, but the following two ranking sets look more into that issue.

Now here are very similar rankings, except these weight each team's average margin of victory by the average seed difference in the games it played. It is possible to have a negative seed difference (example: a No. 3 seed playing a No. 1 seed would be a "-2" seed difference).

1. 2006 Florida Gators (3.43)

2. 2003 Syracuse Orangemen (3.25)

3. 2001 Duke Blue Devils (2.94)

4. 2004 Connecticut Huskies (2.86)

5. 2007 Florida Gators (2.83)

6. 2002 Maryland Terrapins (2.8)

7. 2000 Michigan State Spartans (2.49)

8. 2005 North Carolina Tar Heels (2.31)

9. 2008 Kansas Jayhawks (2.02)

This set was much more generous to the lower seeded champions. It basically gives a No. 3 seed more credit for beating a No. 1 seed than if a No. 1 seed were to beat a No. 1 seed.

These next rankings are the same thing as above, except they ignore the first round of the tournament. I noticed that several teams were benefiting greatly from beating No. 16 seeds by 40+ points so I tried to eliminate that as a significant factor.

1. 2003 Syracuse Orangemen (8.6)

2. 2002 Maryland Terrapins (4.6)

3. 2004 Connecticut Huskies (4.2)

4. 2006 Florida Gators (4.12)

5. 2001 Duke Blue Devils (3)

6. 2000 Michigan State Spartans (2.95)

7. 2007 Florida Gators (2.8)

8. 2005 North Carolina Tar Heels (2.62)

9. 2008 Kansas Jayhawks (2.26)

It's actually remarkable that Syracuse's rating is nearly double every other champion's rating in this set. Syracuse had a very tough road to the championship, which boosted their rating a ton. Their average seed difference after the first round was exactly 1 because they played three teams that were seeded higher than they were.
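I never spelled out the formula, so here is a rough Python sketch of one way to compute a rating like this. It assumes the rating is simply the average margin of victory divided by the average seed difference, which is consistent with a rating of 8.6 when the average seed difference is exactly 1. The function names and the game data are made up for illustration and are not actual box scores.

```python
from statistics import mean


def seed_difference(opponent_seed, own_seed):
    """Opponent's seed minus the champion's own seed.

    A No. 3 seed playing a No. 1 seed gives 1 - 3 = -2.
    """
    return opponent_seed - own_seed


def rating(margins, seed_weights):
    """Average margin of victory divided by the average seed weight.

    The division is an assumption about how the weighting works.
    """
    avg_weight = mean(seed_weights)
    if avg_weight == 0:
        raise ValueError("average seed weight is zero; rating is undefined")
    return mean(margins) / avg_weight


# Hypothetical No. 3 seed, five games after the first round.
own_seed = 3
opponent_seeds = [6, 10, 1, 1, 2]   # illustrative only
margins = [12, 1, 16, 11, 3]        # illustrative only

diffs = [seed_difference(s, own_seed) for s in opponent_seeds]
print(diffs, rating(margins, diffs))
```

Note how sensitive this kind of rating is when the average seed difference gets close to zero: a small denominator inflates the result, which is worth keeping in mind when comparing a No. 3 seed with the No. 1 seeds.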

Finally, I decided to make a set of rankings totally ignoring opponents. This rating is based solely on average margin of victory. The assumption here is that each team should have a roughly equally difficult path to a championship. I would actually argue that this is a fairly reasonable assumption.

1. 2001 Duke Blue Devils (16.67)

2. 2006 Florida Gators (16)

3. 2000 Michigan State Spartans (15.33)

4-T. 2007 Florida Gators (14.17)

4-T. 2008 Kansas Jayhawks (14.17)

6. 2002 Maryland Terrapins (14)

7. 2005 North Carolina Tar Heels (13.83)

8. 2004 Connecticut Huskies (13.33)

9. 2003 Syracuse Orangemen (8.67)

As you may have noticed, the 2003 Syracuse team is probably the most interesting team in these rankings. In two of the polls they are at or near the top of the list, and they are last in the other two. This is because their main strength was beating very highly seeded opponents (those with low seed numbers).

In general, these four polls don't agree on a whole lot. The biggest thing I noticed in common is that the 2005 UNC team and the 2008 Kansas team were probably the consensus lowest ranked champions. Beyond that, it depends on how you believe teams should be measured.

The main question is whether you think the No. 3 seeds should be rewarded because they weren't seeded as favorably or whether they should be treated the same as the No. 1 seeds.

I thought about consolidating the rankings into one (and still may do that in a future post) but I think I like the variety that each one gives.

I'd love to hear people's opinions on how they would rank the teams, either objectively or subjectively.

Wednesday, July 16, 2008

MLB All Star Break Report: Statistical Predictions


Teams that outscore their opponents, on average, should win a lot of games. Likewise teams that get outscored on average should lose most of their games.

This is a very simple concept, and I will use it to analyze the MLB season before the All-Star break and make some predictions for the rest of the year.

I have used the run differential (total runs scored minus total runs allowed) in a linear regression to try to explain the win percentage of each major league team.

In general, it would make sense that teams with the highest positive run differential would also have the highest winning percentage. And vice versa: teams with the most negative run differential would have the lowest win percentage. Teams with a run differential around 0 should have a win percentage around .500 because, on average, they should win just as many games as they lose.

Of course this would never work out in real life. In addition to random variation and luck, some teams also just perform really well in close games while others do not. Some teams are more over-matched by the better teams and some teams are better at pounding the bad teams.

However, I propose that teams with a run differential much higher than their record would suggest have a strong potential to find more success in the remainder of the season because they have shown the ability to consistently outscore opponents. The reverse is also true: teams that have a win percentage much higher than their run differential would indicate (teams that are getting “lucky”) have the potential for a less successful second half of the season.

Under these assumptions, I interpreted the results from my regression and will exhibit them below. The graph of predicting win percentage from scoring difference can be seen at the top of the article.

I was very pleased with how the graph turned out for several reasons. The regression equation is:

Winning Percentage = .500 + .000941(Run Differential)

This says that for every additional run by which a team outscores its opponents, its winning percentage increases by .000941, or .0941 percentage points. It is a very good sign that this is a positive number, or else scoring more runs than your opponent would be a bad thing.

The equation also says that a team with a run differential of exactly 0 would be expected to have a winning percentage of .500. This makes sense and I was very pleased that this worked out exactly. It is a good sign that run differential is a good predictor of winning percentage.
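To make the equation concrete, here is a small Python sketch that plugs run differentials into it. The intercept and slope are the ones from the regression above; the function name and the sample run differentials are just illustrations.

```python
INTERCEPT = 0.500    # expected winning percentage at a run differential of 0
SLOPE = 0.000941     # change in winning percentage per run of differential


def expected_win_pct(run_differential):
    """Winning percentage predicted by the regression line."""
    return INTERCEPT + SLOPE * run_differential


# Sample run differentials, chosen only for illustration.
for rd in (-100, 0, 100):
    print(f"run differential {rd:+4d} -> expected pct {expected_win_pct(rd):.3f}")
```

In other words, a team that has outscored its opponents by 100 runs would be expected to sit about 94 points of winning percentage above .500.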

Finally, the R-squared value for the regression was 71.3%. This means that 28.7% of the variability in team winning percentage is left unexplained by only using run differential. This makes sense from my discussion before; some teams get lucky and some teams also have a knack for winning or losing close games. However, 71.3% is fairly high for only using one variable. While run differential is not a precise predictor of win percentage, it is a very reasonable approximation.
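For readers curious how a slope, intercept, and R-squared like these are produced, here is a bare-bones least-squares sketch in plain Python. It is not the exact tool I used, and the five data points are hypothetical, not real team totals.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y explained by the line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical run differentials and winning percentages for five teams.
run_diffs = [-90, -40, 0, 35, 80]
win_pcts = [0.420, 0.470, 0.510, 0.520, 0.585]

slope, intercept, r_squared = fit_line(run_diffs, win_pcts)
print(f"slope={slope:.6f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```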

Now that I feel fairly safe with my assumptions, here are the interpretations for the results.

First, the most interesting teams on the graph are ones that fall far from the regression line. Teams underneath the line have won fewer games than their run differential would suggest (“unlucky”), and teams above the line have won more games than their run differential would suggest (“lucky”). The further a team is from the line, the more lucky or unlucky they have been. Note that I use the terms lucky and unlucky very loosely here, as there is certainly some skill involved in winning close games.
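The distance from the line is just the actual winning percentage minus the predicted one. Here is a short Python sketch of that idea using the regression equation from above; the function name and the sample records and run differentials are hypothetical.

```python
def distance_from_line(wins, losses, run_differential):
    """Actual winning percentage minus the regression's prediction.

    Positive means above the line ("lucky" or over-achieving),
    negative means below the line ("unlucky" or under-achieving).
    """
    actual = wins / (wins + losses)
    predicted = 0.500 + 0.000941 * run_differential
    return actual - predicted


# Hypothetical teams: (wins, losses, run differential).
samples = {
    "Team A": (55, 40, 5),     # winning far more than the runs suggest
    "Team B": (41, 53, 20),    # outscoring opponents but losing anyway
}

for name, (w, l, rd) in samples.items():
    gap = distance_from_line(w, l, rd)
    label = "over-achiever" if gap > 0 else "under-achiever"
    print(f"{name}: {gap:+.3f} ({label})")
```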

Based on the results, here are ten teams that should expect the biggest change in winning percentage for the rest of the season. The over-achievers will likely perform worse, and the under-achievers should do better.

Top 5 Over-Achievers

1. Angels

2. Marlins

3. Twins

4. Rays

5. Rangers

Top 5 Under-Achievers

1. Indians

2. Braves

3. Mariners

4. Phillies

5. Blue Jays

Now I will break down the MLB pre-All Star break season, still based on my results, for each division. I have calculated a modified version of the standings assuming that win percentage only depends on scoring margin. I included the original standings for comparison. Teams with significant changes in standings have strong potential to have differing success for the rest of the season.
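Here is a rough Python sketch of how a modified games-behind number can be built from run differential alone. It assumes expected wins are the predicted winning percentage times games played, then applies the usual games-behind formula. The function names and the two sample teams are hypothetical, and this is only one reasonable way to do the conversion.

```python
def expected_record(games_played, run_differential):
    """Expected (wins, losses) if winning percentage depended only on runs."""
    pct = 0.500 + 0.000941 * run_differential
    wins = pct * games_played
    return wins, games_played - wins


def games_behind(leader, team):
    """Standard games-behind formula from two (wins, losses) pairs."""
    leader_wins, leader_losses = leader
    team_wins, team_losses = team
    return ((leader_wins - team_wins) + (team_losses - leader_losses)) / 2


# Hypothetical division leader and chaser: (games played, run differential).
leader = expected_record(96, 80)
chaser = expected_record(95, 10)

modified_gb = games_behind(leader, chaser)
print(f"modified games behind: {modified_gb:.1f}")
```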

AL East:

Team        Modified Standings (GB)   Original Standings (GB)
Red Sox     -                         -
Rays        5                         .5
Yankees     7                         6
Blue Jays   7                         9
Orioles     10                        10

The biggest flag here is the Tampa Bay Rays. They could be much further behind the Red Sox now, so don’t be surprised to see them fall further behind after the All-Star break.

Also, look out for the Blue Jays in the second half. It will be difficult for any team to dethrone the Red Sox, but the Blue Jays could have a strong run and at least contend for the Wild Card.

AL Central:

Team        Modified Standings (GB)   Original Standings (GB)
White Sox   -                         -
Twins       6                         1.5
Indians     7                         13
Tigers      7                         7
Royals      13                        12

While many consider the White Sox season to date to be a fluke, the numbers suggest otherwise. They have a comfortable division lead in the modified standings, so I wouldn’t expect them to fade very much.

After some small glimmers of hope, Royals fans should expect another very poor end to the season.

The Tigers still need to improve a lot to make a run at the division, and the Indians could also move up the standings a lot in the second half of the season.

The Twins may have already played their best baseball of the season, but could still hang around for a while.

AL West:

Team        Modified Standings (GB)   Original Standings (GB)
A’s         -                         6
Angels      4                         -
Rangers     8                         7.5
Mariners    11.5                      20

The modified standings show a huge reversal at the top of this division. Even though the A’s just traded away their ace, the Angels should be a lot more worried about being caught than most people think.

The Rangers have over-achieved so far, so don’t expect them to make a serious run towards the playoffs.

Also, the Mariners aren’t quite as bad as their record would suggest. They could win a lot more games in the second half and build some momentum going into next season.

NL East:

Team        Modified Standings (GB)   Original Standings (GB)
Phillies    -                         -
Mets        3.5                       .5
Braves      3.5                       6.5
Marlins     9.5                       1.5
Nationals   17.5                      16

Like the other Florida team that had a lot of first half success, the Marlins should continue to slide in the standings.

The Phillies look like they are going to be tough to beat this year, but they will have to watch out not only for the Mets but the Braves as well.

The Nationals are flat out bad and should easily secure the worst record in baseball after the All-Star break.

NL Central:

Team        Modified Standings (GB)   Original Standings (GB)
Cubs        -                         -
Cardinals   7.5                       4.5
Brewers     8                         5
Astros      13.5                      13
Reds        14                        11.5
Pirates     15.5                      12.5

From these results, the Cubs look to have the safest division lead out of all the division leaders. Every team in this division has actually over-achieved, but the Cubs and Astros have over-achieved the least.

The Brewers may be the only team with hope of making a run at the Cubs after adding C.C. Sabathia to the top of their starting rotation.

NL West:

Team           Modified Standings (GB)   Original Standings (GB)
Dodgers        -                         1
Diamondbacks   .5                        -
Giants         6                         7
Rockies        9                         8.5
Padres         9                         10

Amazingly, all of these teams under-achieved in the first part of the season. That’s a very good sign considering how bad these teams have been so far. No team has a winning record.

The Dodgers and Diamondbacks should have a very close race for the division lead, and the Giants will be looking to make it a three-way race.

The defending N.L. Champion Rockies might need another miraculous win streak to have a chance to defend their title.

Of course it is impossible to predict what is really going to happen in the future, but hopefully this analysis provides some good insight for what to expect. It will be interesting to see how well these discrepancies match up with what actually plays out, and I will be sure to keep an eye on that.

For those interested, detailed MLB Standings can be found here. I found it interesting to look at the probabilities for each team to make the playoffs, win the division, and win the Wild Card.

Monday, July 14, 2008

New York Yankees' Playoff Chances

I’d like to start off with a question: What’s wrong with the Yankees?

And an answer: Nothing.

Entering the All-Star break, the New York Yankees sit in third place in the AL East. At 50-45, they sit 6 games behind the Boston Red Sox and 5.5 games back of the surprising Tampa Bay Rays.

In this article I will explain why I think the Yankees will still make the playoffs, and discuss some trade moves/additions they could make in the next part of the season.

The first reason is the most important. Last year at this time the Yankees were in an even worse position. Heading into the All-Star break last season they were 9.5 games behind the Red Sox and had a winning percentage of only .500, at 43-43.

They then went on to go 51-25 the rest of the season and win the wild card to qualify for the playoffs. They finished only 2 games behind the Red Sox.

This season the Yankees have a better record than last, and also are 3.5 games closer to the division leaders in the standings. They came back in the standings easily last year, so why not this year too?
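The numbers behind that comparison are easy to verify. Here is a quick Python sketch using only the records and deficits quoted above; the variable names are my own.

```python
def win_pct(wins, losses):
    """Winning percentage from a win-loss record."""
    return wins / (wins + losses)


# 2007: record at the All-Star break plus the record afterward.
first_half_2007 = (43, 43)
second_half_2007 = (51, 25)
final_2007 = (
    first_half_2007[0] + second_half_2007[0],
    first_half_2007[1] + second_half_2007[1],
)

# 2008 at the break, and how much closer the Yankees are to first place.
pct_2008 = win_pct(50, 45)
games_closer = 9.5 - 6.0

print(f"2007 final record: {final_2007[0]}-{final_2007[1]}")
print(f"2007 pct at break: {win_pct(*first_half_2007):.3f}")
print(f"2008 pct at break: {pct_2008:.3f}")
print(f"games closer to first place than last year: {games_closer}")
```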

Reason number two is the Tampa Bay Rays. The Rays have never won more than 70 games in a season and have finished anywhere other than last in the division only once in their franchise history (and that was second to last).

While the Yankees have more experience than any other team as far as qualifying for the playoffs, the Rays have absolutely zero experience of playing meaningful games towards the end of the season.

The Rays also enter the All-Star break on a seven-game losing streak. Despite the current margin, it would be very surprising for the Rays to end the season ahead of the Yankees in the standings.

In fact, the Rays should even help the Yankees catch their ultimate foe, the Red Sox. The Rays have swept the Red Sox at home this season thus far (6-0), and have a home series against the Sox late in the season. The Rays host Boston in a three-game set from September 15-17. If Tampa Bay’s winning trend at home against the Red Sox continues, the Yankees could gain a lot of ground quickly on the Red Sox late in the season.

The third reason is pitching. This may come as a surprise, considering pitching is usually pointed to as the Yankees’ most glaring weakness. Post All-Star break, their pitching will be better, and here’s why:

Joba Chamberlain has a 2.81 ERA through 41.1 innings pitched as a starter. He will only get better as he gets accustomed to his starting role and will emerge as the ace of this staff.

Mike Mussina and Andy Pettitte are both extremely experienced pitchers, and have posted double-digit wins and ERAs under 4 so far.

Chien-Ming Wang is arguably the most reliable pitcher on the Yankees staff. He has an 8-2 record and an ERA just over 4. Amazingly, he has given up only four home runs this year in 95.0 innings.

That is four solid starting pitchers the Yankees will use for the rest of the season. The fifth and final spot belongs to Sidney Ponson for the time being. Ponson is an accomplished pitcher who has performed reasonably well in his first three starts as a Yankee. Ponson has had some struggles at this point in his career, so the Yankees are sure to keep a sharp eye on him for his next few starts.

Should Ponson falter in the slightest, expect the Yankees to make a deal for another starting pitcher. While big names like Harden and Sabathia have already been traded, there are several good pitchers the Yankees could still go after. These include Cleveland’s Paul Byrd, Toronto’s A.J. Burnett, Seattle’s Erik Bedard, and Washington’s Tim Redding. It would be in the Yankees’ best interest to get another young pitching prospect in the long term, but for the short term they just need another reliable arm to send out to the mound.

Also, the Yankee bullpen is one of the best in the game. Mariano Rivera is still a lights-out closer. He has 26 saves and a minuscule 1.06 ERA.

Kyle Farnsworth is a tall, hard-throwing right-hander who is very difficult to score runs off of. His ERA is a little high at 3.51, but he has added approximately 1.45 wins to the Yankees’ cause thus far (measured by WPA, or Win Probability Added).

Jose Veras and Edwar Ramirez have also been very successful in the Yankee bullpen; both have ERAs under 3.

With a fantastic closer and three very good relievers, the Yankee bullpen should have no trouble through the end of the season and into the playoffs.

Derek Jeter is my fourth reason. While he is the starting shortstop for the AL in the upcoming All-Star game, Jeter’s statistics have been below his career mark across the board thus far. I expect him to have a big second half of the season.

Jeter is striking out far less this year than in any other year of his career, in only 11.8% of his at-bats. So his problem is certainly not putting the ball in play.

However, his batting average on balls in play (BABIP) is far lower than his career average. Jeter’s BABIP to date this season is a modest .315. Compare this to .368 last year, .394 the year before, and a .356 BABIP for his career. In fact, the .315 mark would be the lowest of Jeter’s entire career if the season ended today.

A lower BABIP means that more of the balls you put in play are being successfully fielded. The only explanations for this are that Jeter is getting unlucky and hitting the ball right at fielders, or that he is mis-hitting balls. The latter would mean either his timing or contact point is slightly off, which would be a rather easy mechanical fix with enough work.
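For reference, here is a small Python sketch of the commonly used BABIP formula: hits minus home runs, divided by at-bats minus strikeouts minus home runs plus sacrifice flies. The formula is the standard definition rather than something taken from this article, and the stat line below is hypothetical, not Jeter’s actual numbers.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play.

    Home runs and strikeouts are removed because neither gives
    the fielders a chance to make a play.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play; BABIP is undefined")
    return (hits - home_runs) / balls_in_play


# Hypothetical stat line, for illustration only.
value = babip(hits=120, home_runs=5, at_bats=400, strikeouts=47, sac_flies=2)
print(f"BABIP: {value:.3f}")
```

A hitter who rarely strikes out puts more balls in play, so a stretch of bad luck or slightly off contact shows up in BABIP rather than in the strikeout column.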

Since Jeter is the Yankee captain and has come up big when the Yankees have needed him most throughout his career, I expect Jeter would put the necessary work in to fix any of these problems. His BABIP should stabilize and move closer to his career mark of .356. This means his productivity will increase, and this should be (along with my next reason) enough to catalyze the Yankee offense into some explosive run production.

The fifth reason is injuries. While all teams deal with injuries, the Yankees have had four different starters injured during the first part of the season. Alex Rodriguez and Jorge Posada both missed significant time at the beginning part of the season, but have since returned to the lineup. The Yankees also currently have outfielders Johnny Damon and Hideki Matsui on the DL.

Damon and Matsui were both hitting well above .300 before their injuries. Damon is expected to return shortly after the All-Star break and the Yankees are hopeful to get Matsui back soon as well. Matsui has started rehab assignments after injuring his knee in June.

With a healthy lineup, the Yankees still pose the biggest offensive threat in baseball. The return of Damon and Matsui not only gives the Yankees two more very successful bats in their already potent lineup, but it allows them to stop using Melky Cabrera in a starting role.

Cabrera has been the Yankees’ least productive position player by far this season. Melky has 356 at-bats and only a .244 average. He also has a WPA (win probability added) of -1.6, which means he has cost the Yankees approximately 1.6 wins already this season. Cabrera has seen a decline in most of his statistics over his few years with the Yankees, so there is not a lot of hope for significant improvement in the second half of the season.

If any other injuries come up, or even if they don’t, I wouldn’t be surprised if the Yankees try to trade for another hitter as well. Jason Giambi has been having a fantastic season, but they could use someone who could play first base and/or DH, and maybe more importantly a utility fielder: someone who could play second base to spell Robinson Cano, who is the next least productive Yankee behind Cabrera, and fill in the outfield for injured players as needed.

Unfortunately for the Yankees, there are not many players of this type considered to be on the trading market right now. But the Yankees seem to always find ways of making rich enough offers to lure the players that they want.

Finally, the last reason I’m going to give for why the Yankees will come back to make the playoffs is Yankee Stadium. Yankee Stadium as we know it is going to be demolished after this season. There are simply too many traditions, spirits, and memories within the confines of Yankee Stadium for it not to get one last chance at baseball’s highest crown.

The Yankees have made the playoffs every year in recent memory, and there is no reason to think this year will be any different. Catching the Red Sox may prove difficult, especially after Boston finally broke through to win the division last year (first non-Yankee AL East Champion since 1997).

However, the Yankees will certainly give the Sox a run for their money. And if they fail to win the division, the Wild Card team from the American League has traditionally come out of the AL East as well in the modern era. The Yankees should have no problem locking up the second position in the division.