Working with a table in R [Part II]

Hey, scientist!
Today we’ll continue our series of posts about tables in R. In the first post we talked about .xls and .xlsx files; today we’ll work with .csv (comma-separated values) files. Let’s go!

.csv files usually contain plain text and can be opened in any text editor. Cells holding different values are separated by commas, hence the name.
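To make the format concrete, here is a small sketch that parses a few comma-separated lines straight from a string, using read.csv(), the function used in the rest of this post. The two data rows are copied from the salary table shown further down; the names csv_text and tiny are only illustrative.

```r
# Three lines of plain text: one header line and two data rows
csv_text <- "yearID,teamID,lgID,playerID,salary
1985,ATL,NL,barkele01,870000
1985,ATL,NL,bedrost01,550000"

# read.csv() can parse a string directly through its `text` argument
tiny <- read.csv(text = csv_text, header = TRUE, sep = ",")
print(tiny)
```

Each comma marks the boundary between two cells, and each line of text becomes one row of the table.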

In this post we’ll use Lahman’s Baseball Database, which contains “(…) complete batting and pitching statistics from 1871 to 2014, plus fielding statistics, standings, team stats, managerial records, post-season data, and more”. The data are distributed as a small .zip package containing several .csv files. Here we’ll use the file Salaries.csv, because… money is awesome! hahaha

Working with .csv files is easier than working with .xls files. We just have to use read.csv():

sal <- read.csv('Salaries.csv', header = TRUE, sep = ',')

Here we pass the file name, whether the first line of the file holds the column names (in this case it does; header=TRUE) and the cell separator (in our case, the comma; sep=','). The resulting table is stored in the variable sal.
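The same two arguments also cover files that look a little different. As a sketch, assume a made-up file with no header line and semicolons as separators; the text and the column names below are invented for illustration and are not part of the Lahman data.

```r
# Hypothetical data: no header line, fields separated by semicolons
raw_text <- "1985;ATL;870000
1985;ATL;550000"

# header = FALSE: the first line is data, so we supply the column names
other <- read.csv(text = raw_text, header = FALSE, sep = ";",
                  col.names = c("yearID", "teamID", "salary"))
print(other)
```

Without col.names, R would name the columns V1, V2 and V3.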

To check the variables imported into sal, we use names():

names(sal)
[1] "yearID"   "teamID"   "lgID"     "playerID" "salary"  

Let’s also see the top of the table using head():

head(sal)
  yearID teamID lgID  playerID salary
1   1985    ATL   NL barkele01 870000
2   1985    ATL   NL bedrost01 550000
3   1985    ATL   NL benedbr01 545000
4   1985    ATL   NL  campri01 633333
5   1985    ATL   NL ceronri01 625000
6   1985    ATL   NL chambch01 800000

To see the end of the table, we use tail():

tail(sal)
      yearID teamID lgID  playerID   salary
24753   2014    WAS   NL stammcr01  1375000
24754   2014    WAS   NL storedr01  3450000
24755   2014    WAS   NL strasst01  3975000
24756   2014    WAS   NL werthja01 20000000
24757   2014    WAS   NL zimmejo02  7500000
24758   2014    WAS   NL zimmery01 14000000
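Both head() and tail() take an optional argument n for the number of rows to show; the default is 6, which is why six rows appear above. A minimal sketch on a small stand-in data frame (the values are invented, not taken from Salaries.csv):

```r
# Stand-in data frame with ten rows
demo <- data.frame(id = 1:10, value = (1:10) * 100)

first3 <- head(demo, n = 3)  # first three rows
last2 <- tail(demo, n = 2)   # last two rows

print(first3)
print(last2)
```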

If we’d like to see only the 2010 salaries, we can do this:

sal[sal$yearID==2010,]
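The expression inside the brackets is a logical test applied to every row: rows where it is TRUE are kept, and the empty slot after the comma means "all columns". Here is a sketch on a small stand-in data frame (values invented for illustration), which also shows subset() as an equivalent base R alternative:

```r
# Stand-in data frame; the real table has more columns
demo <- data.frame(yearID = c(2009, 2010, 2010, 2011),
                   salary = c(100, 200, 300, 400))

# Same idea as in the post: keep the rows where yearID is 2010
by_index <- demo[demo$yearID == 2010, ]

# subset() expresses the same filter without repeating `demo$`
by_subset <- subset(demo, yearID == 2010)

# Naming a column after the comma returns just that column
sal_2010 <- demo[demo$yearID == 2010, "salary"]
print(sal_2010)
```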

And to get some info about the data, we use summary():

summary(sal)
     yearID         teamID      lgID            playerID         salary        
 Min.   :1985   CLE    :  893   AL:12123   moyerja01:   25   Min.   :       0  
 1st Qu.:1993   LAN    :  893   NL:12635   vizquom01:   24   1st Qu.:  260000  
 Median :2000   PHI    :  893              glavito02:   23   Median :  525000  
 Mean   :2000   SLN    :  886              bondsba01:   22   Mean   : 1932905  
 3rd Qu.:2007   BAL    :  883              griffke02:   22   3rd Qu.: 2199643  
 Max.   :2014   BOS    :  883              thomeji01:   22   Max.   :33000000  
                (Other):19427              (Other)  :24620    
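The counts per team, league and player in this output appear because those columns were read as factors. Since R 4.0.0, read.csv() reads text columns as character by default, and summary() then reports only their length and class. A sketch of how to get the counts back, on a small stand-in data frame with invented values:

```r
# Stand-in data frame with a text column kept as character
demo <- data.frame(teamID = c("ATL", "ATL", "BOS"),
                   salary = c(100, 200, 300),
                   stringsAsFactors = FALSE)
print(summary(demo))  # teamID is summarised as a plain character column

# Converting the column to a factor makes summary() count each level
demo$teamID <- factor(demo$teamID)
print(summary(demo))

# table() gives the same counts directly
team_counts <- table(demo$teamID)
print(team_counts)
```

When reading a file, passing stringsAsFactors = TRUE to read.csv() has the same effect for all text columns at once.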

That’s all, scientists! Try the other tables contained in the .zip file! Use other parameters and tell us your results!
Gigaregards! See you next week!


Like this? Please comment and share with your friends!
Like us also on Facebook: www.facebook.com/programandociencia
I’m on Twitter! Follow me if you can! @alexdesiqueira
