
[R] Chap2 Ex5 Pitcher Strikeout / Walk Ratios


Chapter 2, Exercise 5
Pitcher Strikeout / Walk Ratios

Analyzing Baseball Data with R, Introduction to R, page 58


(a) Read the Lahman "pitching.csv" data file into an R data frame called Pitching.

Pitching <- read.csv("pitching.csv")
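Before going further, it can help to confirm that the columns the later steps rely on (playerID, yearID, SO, BB, IPouts) are actually present. The sketch below writes a tiny made-up CSV to a temporary file so that it runs without the real Lahman file; the rows and the ID pitchA01 are hypothetical. With the real data, only the read.csv() call and the column check are needed.

```r
# Made-up two-season file standing in for the Lahman pitching.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("playerID,yearID,SO,BB,IPouts",
             "pitchA01,1990,150,50,600",
             "pitchA01,1991,170,45,650"), tmp)

Pitching <- read.csv(tmp)

# Columns used by stats(), ddply() and the later plot
needed <- c("playerID", "yearID", "SO", "BB", "IPouts")
missing.cols <- setdiff(needed, names(Pitching))
if (length(missing.cols) > 0) {
  stop("Missing columns: ", paste(missing.cols, collapse = ", "))
}
str(Pitching)
```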


(b) The following function computes the cumulative strikeouts, cumulative walks, mid-career year, and total innings pitched (measured in outs) for a pitcher whose season statistics are stored in the data frame d.

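# Career totals for one pitcher; d holds that pitcher's season rows
stats <- function(d){
  c.SO <- sum(d$SO, na.rm=TRUE)              # career strikeouts
  c.BB <- sum(d$BB, na.rm=TRUE)              # career walks
  c.IPouts <- sum(d$IPouts, na.rm=TRUE)      # career outs recorded (innings x 3)
  c.midYear <- median(d$yearID, na.rm=TRUE)  # middle season of the career
  data.frame(SO=c.SO, BB=c.BB, IPouts=c.IPouts, midYear=c.midYear)
}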
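A quick way to check stats() is to run it on one pitcher by hand. In this sketch the three seasons and the ID pitchA01 are hypothetical; the NA strikeout total is skipped because of na.rm = TRUE.

```r
stats <- function(d){
  c.SO <- sum(d$SO, na.rm=TRUE)
  c.BB <- sum(d$BB, na.rm=TRUE)
  c.IPouts <- sum(d$IPouts, na.rm=TRUE)
  c.midYear <- median(d$yearID, na.rm=TRUE)
  data.frame(SO=c.SO, BB=c.BB, IPouts=c.IPouts, midYear=c.midYear)
}

# Hypothetical three-season pitcher; one season has a missing SO value
one.pitcher <- data.frame(playerID = "pitchA01",
                          yearID = c(1990, 1991, 1992),
                          SO = c(150, 170, NA),
                          BB = c(50, 45, 40),
                          IPouts = c(600, 650, 500))
res <- stats(one.pitcher)
res
# SO = 320, BB = 135, IPouts = 1750, midYear = 1991
```

Because midYear is a median over rows rather than over distinct seasons, a season that a pitcher split between two teams (two stints, so two rows) is counted twice in that median.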

Using the function ddply (plyr package) together with the function stats, find the career statistics for all pitchers in the pitching dataset. Call this new data frame career.pitching.

library(plyr)

career.pitching <- ddply(Pitching, "playerID", stats)
# or, equivalently: the .() function lets
# playerID be given without quotes
career.pitching <- ddply(Pitching, .(playerID), stats)
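If plyr is not installed (the package is now retired), the same split-apply-combine step can be sketched in base R with split(), lapply() and do.call(rbind, ...). This is a swapped-in technique, not the book's solution, and the data frame toy.pitching and its values are hypothetical.

```r
stats <- function(d){
  c.SO <- sum(d$SO, na.rm=TRUE)
  c.BB <- sum(d$BB, na.rm=TRUE)
  c.IPouts <- sum(d$IPouts, na.rm=TRUE)
  c.midYear <- median(d$yearID, na.rm=TRUE)
  data.frame(SO=c.SO, BB=c.BB, IPouts=c.IPouts, midYear=c.midYear)
}

# Hypothetical season rows for two pitchers
toy.pitching <- data.frame(playerID = c("a", "a", "b"),
                           yearID = c(2000, 2001, 2000),
                           SO = c(100, 120, 30),
                           BB = c(40, 30, 20),
                           IPouts = c(500, 550, 150))

# Split by pitcher, apply stats() to each piece, then stack the results
pieces <- lapply(split(toy.pitching, toy.pitching$playerID), function(d)
  data.frame(playerID = d$playerID[1], stats(d)))
career.pitching <- do.call(rbind, pieces)
rownames(career.pitching) <- NULL
career.pitching
```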

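(c) Use the merge function to merge the Pitching and career.pitching data frames.

Because SO, BB and IPouts appear in both data frames, the join should be on playerID only. By default merge() matches on every shared column, which would keep only the seasons whose totals happen to equal the career totals.

Pitching <- merge(Pitching, career.pitching, by = "playerID", suffixes = c("", ".career"))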
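A toy example shows why the join key matters. Both made-up data frames share playerID, SO, BB and IPouts, so a default merge() joins on all four, while by = "playerID" keeps every season. The names toy.pitching, toy.career, default.join and by.player are hypothetical.

```r
# Hypothetical season rows: pitcher "a" has two seasons, "b" only one
toy.pitching <- data.frame(playerID = c("a", "a", "b"),
                           yearID = c(2000, 2001, 2000),
                           SO = c(100, 120, 30),
                           BB = c(40, 30, 20),
                           IPouts = c(500, 550, 150))
# Matching hypothetical career totals
toy.career <- data.frame(playerID = c("a", "b"),
                         SO = c(220, 30), BB = c(70, 20),
                         IPouts = c(1050, 150), midYear = c(2000.5, 2000))

# Default: joins on every shared column (playerID, SO, BB, IPouts),
# so only rows whose season totals equal the career totals survive
default.join <- merge(toy.pitching, toy.career)

# Joining on playerID alone keeps one row per pitcher-season;
# career columns that clash with season columns get a ".career" suffix
by.player <- merge(toy.pitching, toy.career, by = "playerID",
                   suffixes = c("", ".career"))

nrow(default.join)  # 1 (only pitcher "b")
nrow(by.player)     # 3
```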

(d) Use the subset function to construct a new data frame career.10000 consisting of data for only those pitchers with at least 10,000 career IPouts.

career.10000 <- subset(career.pitching, IPouts >= 10000)
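One behavioural difference between subset() and plain bracket indexing is how a condition that evaluates to NA is handled: subset() drops those rows, while `[` returns a row of NAs. In career.pitching this cannot happen for IPouts, since stats() sums with na.rm = TRUE, but it matters for columns that can be missing. The toy data below is hypothetical.

```r
# Hypothetical career totals; pitcher "d" has a missing IPouts value
toy.career <- data.frame(playerID = c("a", "b", "c", "d"),
                         IPouts = c(12000, 9999, 10000, NA))

via.subset <- subset(toy.career, IPouts >= 10000)              # NA condition dropped
via.index  <- toy.career[toy.career$IPouts >= 10000, ]         # NA condition gives an all-NA row
via.which  <- toy.career[which(toy.career$IPouts >= 10000), ]  # which() also drops NA

nrow(via.subset)  # 2
nrow(via.index)   # 3
nrow(via.which)   # 2
```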

(e) For the pitchers with at least 10,000 career IPouts, construct a scatterplot of mid-career year against the ratio of strikeouts to walks. Comment on the general pattern in this scatterplot.

with(career.10000, plot(midYear, SO/BB))
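One way to make the general pattern easier to describe is to overlay a smooth trend line with base R's lowess(). In this sketch the numbers in the stand-in career.10000 are invented purely so the code runs, so the trend they produce says nothing about the real Lahman data; with the real data, drop the data.frame() call. The column name SO.BB is also an assumption.

```r
# Invented values standing in for career.10000 (NOT real Lahman data)
career.10000 <- data.frame(midYear = c(1890, 1910, 1930, 1950, 1970, 1990, 2005),
                           SO = c(1500, 1800, 1400, 1600, 2500, 2800, 2600),
                           BB = c(900, 700, 900, 1000, 1000, 900, 700))

# Strikeout-to-walk ratio for each pitcher
career.10000$SO.BB <- with(career.10000, SO / BB)

# Smoothed trend of the ratio against mid-career year
trend <- with(career.10000, lowess(midYear, SO.BB))

with(career.10000, plot(midYear, SO.BB, xlab = "Mid-career year",
                        ylab = "Strikeouts / walks"))
lines(trend, col = "red")
```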
