This is my title - take me out if neccessary
========================================================

The series of === are just making a line.

The three "back ticks" (`) must be followed by curly brackets "{", and then "r" to tell the computer that you are using R code.  This line is then closed off by another curly bracket "}". 

Anything before three more back ticks "```" are then considered R code (a script).  

I'm reading in the bike lanes here.
```{r readin}
# readin is just a "label" for this code chunk
## code chunk is just a "chunk" of code, where this code usually
## does just one thing, aka a module
### comments are still # here
### you can do all your reading in there
data.dir <- "~/Dropbox/WinterR_2014/Lectures/data"
### let's say we loaded some packages
library(stringr)
library(plyr)
fname <- file.path(data.dir, "Bike_Lanes.csv")
## file.path takes a directory and makes a full name with a full file path
bike = read.csv(fname, as.is=TRUE)
getwd()
```

You can write your introduction here.

## Introduction
Bike lanes are in Baltimore.  People like them.  Why are they so long?

## Exploratory Analysis

Let's look at some plots of bike length.  Let's say we wanted to look at what affects bike length.


### Plots of bike length
Note we made the subsection by using three "hashes" (pound signs): ###.


```{r}
no.missyear <- bike[ bike$dateInstalled != 0,]
plot(no.missyear$dateInstalled, no.missyear$length)
no.missyear$dateInstalled = factor(no.missyear$dateInstalled)
boxplot(no.missyear$length ~ no.missyear$dateInstalled, main="Boxplots of Bike Lenght by Year", xlab="Year", ylab="Bike Length")

```

What does it look like if we took the log (base 10) of the bike length:

```{r}
no.missyear$log.length <- log10(no.missyear$length)
### see here that if you specify the data argument, you don't need to do the $ 
boxplot(log.length ~ dateInstalled, data=no.missyear, main="Boxplots of Bike Lenght by Year", xlab="Year", ylab="Bike Length")
```

I want my boxplots colored, so I set the `col` argument.

```{r}
boxplot(log.length ~ dateInstalled, data=no.missyear, main="Boxplots of Bike Lenght by Year", xlab="Year", ylab="Bike Length", col="red")
```

As we can see, 2006 had a much higher bike length.  What about for the type of bike path?

```{r}
### type is a character, but when R sees a "character" in a "formula", then it automatically converts it to factor
### a formula is something that has a y ~ x, which says I want to plot y against x
### or if it were a model you would do y ~ x, which meant regress against y
boxplot(log.length ~ type, data=no.missyear, main="Boxplots of Bike Lenght by Year", xlab="Year", ylab="Bike Length")
```

What if we want to extract means by each type?

Let's show a few ways:
```{r}
### tapply takes in vector 1, then does a function by vector 2, and then you tell what 
### that function is
tapply(no.missyear$log.length, no.missyear$type, mean)

## aggregate
aggregate(x=no.missyear$log.length, by=list(no.missyear$type), FUN=mean)
### now let's specify the data argument and use a "formula" - much easier to read and 
## more "intuitive"
aggregate(log.length ~ type, data=no.missyear, FUN=mean)

## ddply is from the plyr package
##takes in a data frame, (the first d refers to data.frame) 
## splits it up by some variables (let's say type)
## then we'll use summarise to summarize whatever we want
## then returns a data.frame (the second d) - hence why it's ddply
## if we wanted to do it on a "list" thne return data.frame, it'd be ldply
ddply(no.missyear, .(type), summarise,
      mean=mean(log.length)
      )
```


`ddply` (and other functions in the `plyr` package) is cool because you can do multiple functions really easy.   

Let's show a what if we wanted to go over `type` and `dateInstalled`:
```{r}
### For going over 2 variables, we need to do it over a "list" of vectors
tapply(no.missyear$log.length, 
       list(no.missyear$type, no.missyear$dateInstalled), 
       mean)

tapply(no.missyear$log.length, 
       list(no.missyear$type, no.missyear$dateInstalled), 
       mean, na.rm=TRUE)

## aggregate - looks better
aggregate(log.length ~ type + dateInstalled, data=no.missyear, FUN=mean)

## ddply is from the plyr package
ddply(no.missyear, .(type, dateInstalled), summarise,
      mean=mean(log.length),
      median=median(log.length),
      Mode=mode(log.length),
      Std.Dev=sd(log.length)
      )
```


OK let's do an linear model
```{r}
### type is a character, but when R sees a "character" in a "formula", then it automatically converts it to factor
### a formula is something that has a y ~ x, which says I want to plot y against x
### or if it were a model you would do y ~ x, which meant regress against y
mod.type = lm(log.length ~ type, data=no.missyear)
mod.yr = lm(log.length ~ factor(dateInstalled), data=no.missyear)
mod.yrtype = lm(log.length ~ type + factor(dateInstalled), data=no.missyear)
summary(mod.type)
```

That's rather UGLY, so let's use a package called `xtable` and then make this model into an `xtable` object and then print it out nicely.  

```{r}
### DON'T DO THIS.  YOU SHOULD ALWAYS DO library() statements in the FIRST code chunk.
### this is just to show you the logic of a report/analysis.
require(xtable)
# smod <- summary(mod.yr)
xtab <- xtable(mod.yr)
```

Well `xtable` can make html tables, so let's print this.  We must tell R that the results is actually an html output, so we say the results should be embedded in the html "asis" (aka just print out whatever R spits out).
```{r, results='asis'}
print.xtable(xtab, type="html")
```

OK, that's pretty good, but let's say we have all three models.  Another package called `stargazer` can put models together easily and pritn them out.  So `xtable` is really good when you are trying to print out a table (in html, otherwise make the table and use `write.csv` to get it in Excel and then format) really quickly and in a report.  But it doesn't work so well with *many* models together.  So let's use stargazer.  Again, you need to use `install.packages("stargazer")` if you don't have function.  

```{r}
require(stargazer)
```


OK, so what's the difference here?  First off, we said results are "markup", so that it will not try to reformat the output.  Also, I didn't want those # for comments, so I just made comment an empty string "". 

```{r, results='markup', comment=""}
stargazer(mod.yr, mod.type, mod.yrtype, type="text")
```


## Data Extraction
Let's say I want to get data INTO my text.  Like there are N number of bike lanes with a date installed that isn't zero.  There are `r nrow(no.missyear)` bike lanes with a date installed after 2006.  So you use one backtick ` and then you say "r" to tell that it's R code.  And then you run R code that gets evaulated and then returns the value.  Let's say you want to compute a bunch of things:

```{r computes}
### let's get number of bike lanes installed by year
n.lanes = ddply(no.missyear, .(dateInstalled), nrow)
names(n.lanes) <- c("date", "nlanes")
n2009 <- n.lanes$nlanes[ n.lanes$date == 2009]
n2010 <- n.lanes$nlanes[ n.lanes$date == 2010]
getwd()
```

Now I can just say there are `r n2009` lanes in 2009 and `r n2010` in 2010.  

```{r}
fname <- file.path(data.dir, "Charm_City_Circulator_Ridership.csv")
## file.path takes a directory and makes a full name with a full file path
charm = read.csv(fname, as.is=TRUE)

library(chron)
days = levels(weekdays(1, abbreviate=FALSE))
charm$day <- factor(charm$day, levels=days)
charm$date <- as.Date(charm$date, format="%m/%d/%Y")
cn <- colnames(charm)
daily <- charm[, c("day", "date", "daily")]

```


```{r}
charm$daily <- NULL
require(reshape2)
long.charm <- melt(charm, id.vars = c("day", "date"))
long.charm$type <- "Boardings"
long.charm$type[ grepl("Alightings", long.charm$variable)] <- "Alightings"
long.charm$type[ grepl("Average", long.charm$variable)] <- "Average"

long.charm$line <- "orange"
long.charm$line[ grepl("purple", long.charm$variable)] <- "purple"
long.charm$line[ grepl("green", long.charm$variable)] <- "green"
long.charm$line[ grepl("banner", long.charm$variable)] <- "banner"
long.charm$variable <- NULL

long.charm$line <-factor(long.charm$line, levels=c("orange", "purple", 
                                                   "green", "banner"))

head(long.charm)

### NOW R has a column of day, the date, a "value", the type of value and the 
### circulator line that corresponds to it
### value is now either the Alightings, Boardings, or Average from the charm dataset
```

Let's do some plotting now!
```{r plots}
require(ggplot2)
### let's make a "ggplot"
### the format is ggplot(dataframe, aes(x=COLNAME, y=COLNAME))
### where COLNAME are colnames of the dataframe
### you can also set color to a different factor
### other options in AES (fill, alpha level -which is the "transparency" of points)
g <- ggplot(long.charm, aes(x=date, y=value, color=line)) 
### let's change the colors to what we want- doing this manually, not letting it choose
### for me
g <- g + scale_color_manual(values=c("orange", "purple", "green", "blue"))
### plotting points
g + geom_point()
### Let's make Lines!
g + geom_line()
### let's make a new plot of poitns
gpoint <- g + geom_point()
### let's plot the value by the type of value - boardings/average, etc
gpoint + facet_wrap(~ type)
```

OK let's turn off some warnings - making `warning=FALSE` as an option.
```{r, warning=FALSE}
## let's compare vertically 
gpoint + facet_wrap(~ type, ncol=1)

gfacet = g + facet_wrap(~ type, ncol=1)
## let's smooth this - get a rough estimate of what's going on
gfacet + geom_smooth(se=FALSE)
```

OK, I've seen enough code, let's turn that off, using `echo=FALSE`.
```{r, echo=FALSE, warning=FALSE, fig.width=10, fig.height=5}
#### COMBINE! - let's make the line width bigger (lwd)
### also making the "alpha level" (transparency) low for the point sos we can see the lines
g + geom_point(alpha=0.2) +  geom_smooth(se=FALSE, lwd=1.5) + facet_wrap( ~ type)


```