FrontPage


Welcome to the Modern Graphics 2011 wiki  --- 

     Bob Pruzek, University at Albany, SUNY  (NB: WE MEET IN THE APPLE LAB, B-13, SOE, NOT A & S bldg.)

Office Hours: Before class from 3:30, after class, or Wednesday afternoons after 3 pm. Also by appointment, and via phone or electronic communication.

There are now two Waller articles in the Files, both 2008: Waller and Jones, posted by J.B., and Waller-fungible, posted by bp. They are closely related, but the one I had been thinking of in the last class was the second one. Go to the Upload Files page to get them. The encyclopedia article on exploratory factor analysis is here: beh.science.encycl05.fa.expl.rp.pdf . Have a look when you can, but this will not be part of the Exam. bp

Here is the final Exam: EPSY887 Final.Spring 2011.pdf . Please ask questions of a general kind on the Comments page, and those that seem specific or personal by email. Return it no later than one week from today, May 9 at 5 pm. Good luck. BP


                                                  

I hear, and I forget
I see, and I remember
I do, and I understand.
                         Anon.

 

This wiki is aimed at facilitating effective use of modern graphical and visualization methods, especially for research in the educational and behavioral sciences. Since this is a wiki, it is intended that readers will not only acquire information and files from it, but that they will also edit and make their own contributions. Feel free to invite others to use this workspace with you. Collaboration is essential.

 

Three principal forms of information will be provided: URLs, Files for download, and the products of conversations among users as supported at this site. As in the course to follow (see below), most of the emphasis here is on graphics created in R. The reason is that R provides excellent support for graphics of many kinds, and the software is free. Indeed, it appears that R is now the de facto standard platform for statisticians worldwide. You can start to learn R in many ways, but one is to use another wiki: epysm08learnR.pbworks.com.  In due course, additional URLs and files will be added here, and of course the value of the online conversations will in large measure derive from the questions you ask and that are answered ... by whomever.

 

Assignments/Readings: (...this section will be updated regularly, so visit it often)

The URL for the Dependent Sample paper is http://www.amstat.org/publications/jse/v17n1/helmreich.html . Note that once you open R you can easily copy and paste the data from our appendix into R. If the package 'psych' has been loaded*, then just type read.clipboard(), and as long as what you copied includes the header, you should get the whole enchilada. Ask questions if you have problems. (Please read this paper at least twice before next Thursday; the second time should be w/ R open, trying to replicate the analyses you read about.) *which means after you've run the 'install.packages' line below and then typed library(psych).
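For those who cannot get the clipboard route to work, here is a minimal sketch of the same idea in base R. The two-column data and the names pre and post are hypothetical stand-ins, not the values from the paper's appendix; paste your own copied text between the quotes.

```r
# Hypothetical two-column data standing in for text copied from the appendix;
# the first non-blank line is the header
pasted <- "
pre post
12 15
10 14
14 13
 9 12
"

# read.table() can parse a character string directly via its 'text' argument
dat <- read.table(text = pasted, header = TRUE)
str(dat)

# With 'psych' loaded and the data on the clipboard, the corresponding call
# would be:  dat <- read.clipboard()
```

Either way you end up with a data frame whose column names come from the header line.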

In addition, the 1-way anova and dependent sample sections of the pdf listed at the top of MOST RECENT (below) should be read before Thursday. I shall be posting an edited, and slightly expanded, version of last week's Optional Exercise for Thursday's (Feb 17th) first assignment. Because no student seemed to have started that work, I'd like you to begin with that if you want to get started sooner than my post (about 3 pm today).  FINALLY, it is here: Feb17Homework.pdf . I'll leave it to you to go to the upload page to get the two bootstrapping documents.

Added: I just uploaded a pdf that will be of interest to those of you w/ a theoretical bent, those who want to know how the normal (z), t, F and chi square distributions relate to one another. This was given to me by my colleague Professor Bruce Dudek (Psych Dept), for use by you! bDudek.RelationsAmngDistrbs.pdf Aim to read before class the two page pdf, here: Essence of bootstrapping.pdf
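As a quick numerical companion to the distribution pdf, here is a small sketch (mine, not taken from Professor Dudek's document) that checks three standard relations among the normal, t, F and chi-square distributions using R's quantile functions. The choice of 12 degrees of freedom is arbitrary.

```r
df <- 12  # arbitrary degrees of freedom for the illustration

# Squaring a t variate with df degrees of freedom gives an F(1, df) variate,
# so the squared two-sided t critical value equals the F critical value
t_sq  <- qt(0.975, df)^2
f_val <- qf(0.95, 1, df)

# Squaring a standard normal variate gives a chi-square variate with 1 df
z_sq    <- qnorm(0.975)^2
chi_val <- qchisq(0.95, 1)

# As df grows, t approaches the standard normal
t_large <- qt(0.975, 1e6)

print(all.equal(t_sq, f_val))
print(all.equal(z_sq, chi_val))
print(c(t_large = t_large, z = qnorm(0.975)))
```

Try other values of df to see that the first relation holds for any of them.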

For our March 3 class, I'd like you to read (two or three times?) ANOVAplannedContrsts&More.pdf and then try to begin to use the granova.contr function as described there. Please POST YOUR QUESTIONS as you read!  New: See item M. under Files Available below. Strongly recommended.

Re: Latter part of class tomorrow (March 3), I shall introduce some basic regression concepts, methods and of course graphics. Files soon to be available.

For March 10 class, do the exercises here: ExercisesOnRegression.pdf . Note: I have now added the last part while keeping the name the same.

For March 18 class, please read each of these: IntroLogisticRegressionPengEducResearch.pdf ; PropensityScoreAnalysisNotesRP11#.pdf ; RubinpsaexpositPSA.pdf  You will note that they have all been placed in the folder ArticlesPSA. You may also want to begin examining this: PSAintroBP.Mar11.ppt.pdf

AND ASK QUESTIONS, for everyone's sake. On this point, I numbered all the paragraphs in my PSANotes.. document here to facilitate your reference to parts of it. Let me know if that helps. b

re: March 24 Assignment: Apparently I was not clear, as more than one of you has asked questions about what I meant. My intention was that each of you would follow my example involving several different analyses, and graphic constructions, in reanalyses of the birthwt data from the MASS library, but that each of you would choose a different logistic regression model -- which is to say the specification for y ~ ... in the glm statement -- so that each of you would (probably) be producing a vector of propensity scores that differed from all others. The second Phase of the analysis will of course be different for each of you too, as the PS's are central to the response data analyses, and graphics. I also said that if you wanted to, and had other data that would be SUITABLE for PSA, you were (in principle) welcome to use the other data, but I wanted you to confer w/ me on your choice before investing much in analyses of different data. Note that it is not enough for you to say, "I found some other data, and want to use it." You need to persuade me (and yourself) that the alternative data are appropriate for such study; this means that there must be a 'reasonable set' of covariates to account for what can be argued is 'much if not most' of the selection bias that is associated w/ the (binary) treatment variable, and that there is a reasonable response variable. Details about the data are therefore essential before you begin! Always. Let me know what you are doing -- and if you would like me to comment at the early stages on your approach to PS estimation, I will of course be happy to do that. bob
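To make the first Phase concrete, here is a minimal sketch of estimating propensity scores with glm on the birthwt data. It is only one possible specification and is not the model from my illustration: treating smoke as the binary treatment, and the particular covariates on the right-hand side, are assumptions made for this example. Your own model should differ.

```r
library(MASS)   # the birthwt data ship with MASS
data(birthwt)

# Assumption: 'smoke' is the binary treatment; race is recoded as a factor
bw <- transform(birthwt,
                race = factor(race, labels = c("white", "black", "other")))

# One of many possible logistic regression specifications (illustrative only)
ps_model <- glm(smoke ~ age + lwt + race + ptl + ht + ui,
                data = bw, family = binomial)

# Fitted probabilities are the estimated propensity scores, one per mother
ps <- fitted(ps_model)

# A first look at overlap of the scores across the two treatment groups
boxplot(ps ~ bw$smoke, names = c("non-smoker", "smoker"),
        ylab = "Estimated propensity score")
```

Changing the formula changes the vector ps, which is why each of you should end up with different scores and a different second Phase.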

 

I shall go over this IllustrationMatchingbwtdata.doc in the March 24 class (to the extent to which we have time); Jason will present for about the last hour. Link to Elizabeth A. Stuart's page, also: Stuart, E.A. (2010). Matching Methods for Causal Inference: A review and a look forward. Statistical Science 25(1): 1-21. [PDF from Dr. Stuart's page] [Journal page URL].

Please, before our March 31 class, begin to study the Sarkar pdf, including some trials w/ his code in: Sarkar08latticeLab.pdf  
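If you have not used lattice before, here is a very small starting sketch (not taken from Sarkar's lab; the mtcars data and the choice of panels are mine) showing the conditioning idea that runs through his material: one panel per level of a factor, with a fitted line in each.

```r
library(lattice)

# mpg against weight, one panel per number of cylinders;
# type "p" draws the points and "r" adds a least-squares line in each panel
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            layout = c(3, 1), type = c("p", "r"),
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Lattice functions return an object; printing it draws the plot
print(p)
```

Note that in a script, unlike at the console, nothing is drawn until the object is printed.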

 

Re: MISSING DATA, I've posted two items, the first of which I'd like everyone to have read by Thursday's class (4/14): schafer_graham_MissingDataPsychMethods02.pdf . The second is a (lengthy) pdf, based on a talk last year by Prof. R. Yucel (Biostats): Prof.RecaiYucel.MissingDataSlides2011.pdf . Note that the second contains some ways to use R for missing data problems, but that info comes at the very end. Here is a primer on missing data and multiple imputation, published in 2007: http://circoutcomes.ahajournals.org/content/3/1/98.full

  Mea Culpa: I thought, in my edits of this 1st page, that I'd made these changes on this page after I uploaded the Schafer-Graham article on Monday, but I forgot to check to see that they were properly saved... and they were not. So do the best you can in the time you now have. Some of you, I hope, will have seen the Schafer-Graham article earlier this week... indeed I hope you've been reading it. Also see Missing DataOverview+Sources.pdf .

      I trust you have all been working through at least some of the content in the Sarkar/lattice pdfs loaded last week. Please bring your questions, and if you can, some examples of your use of Sarkar's code to get lattice plots, with data of your own choosing. bp 

        For our last class (!), I want to spend a little time on matrix operations relevant to data analysis. To help this effort as much as possible, after you have reviewed, and asked questions about, the material on missing data, we shall get into the details of Basic.Matrix.Ops.correl.regrsn11.pdf  and at least briefly, Details4L2X-AlsoEigen+SVD.11.pdf . You will see there are exercises in each of these too.
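As a warm-up for the matrix material, here is a short sketch of the kind of operations involved. It is my own illustration using the built-in mtcars data, not an excerpt from either pdf: regression coefficients from the normal equations, and the link between the eigenvalues of a correlation matrix and the singular values of the standardized data.

```r
# Regression via the normal equations: b = (X'X)^{-1} X'y
X <- cbind(1, mtcars$wt, mtcars$hp)          # column of 1s for the intercept
y <- mtcars$mpg
b <- solve(crossprod(X), crossprod(X, y))    # crossprod(X) is t(X) %*% X

# The same coefficients from lm(), for comparison
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(all.equal(as.vector(b), unname(coef(fit))))

# Eigenvalues of the correlation matrix ...
vars <- mtcars[, c("mpg", "wt", "hp")]
R <- cor(vars)
e <- eigen(R)

# ... equal the squared singular values of the standardized data over (n - 1)
Z <- scale(vars)                             # center, then divide by the SDs
s <- svd(Z)
print(all.equal(s$d^2 / (nrow(Z) - 1), e$values))
```

Solving the normal equations directly is fine for teaching; lm() itself uses a QR decomposition, which is numerically safer.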

Added 4/27. Since I know you have all read all pdf's related to missing data (yes?) and have few questions on that topic (yes?), and since tomorrow is our last class, w/ one big topic remaining (cf. the last two items above this line), I've decided tomorrow to restrict attention to missingness by ONLY answering your m.d. questions (briefly). Be prepared to go over the Basic.Matrix... document first, and then the one that follows (if you have time to read, while at the console).  b

 

I've posted two new pdfs (4/6), one on Indicator matrices, the other on Correspondence Analysis, Illustrated, with comments. I ask you to read the Ind. Matrices document before Thursday's class, and as much of what has been posted for the past week as well. The new pdfs are: Indicator matrices.pdf  and IllustrationsCAw-comments.pdf. b

 

Also see these intros to Correspondence Analysis, and Mosaics

 

 

Mosaic Plots (emphasis on package 'vcd' and 'vcdExtra'): 

 

 

I uploaded an electronic copy of Thoemmes & Kim (2011) A systematic review of propensity score methods in the social sciences

 

I will present a two-way ANOVA using the STAR dataset (probably worth reading the homepage here: http://www.heros-inc.org/star.htm). This dataset was the result of a longitudinal experimental study conducted in Tennessee to examine the effects of class size. The data are freely available and can be downloaded here: http://www.heros-inc.org/data.htm You can download my R script file here: http://moderngraphics11.pbworks.com/w/file/36285605/Star.R and the data file here: http://www.heros-inc.org/starDataFiles/STAR_Students.sav
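For anyone who wants to see the general shape of a two-way ANOVA in R before the presentation, here is a minimal sketch. It is not the Star.R script, and it uses the built-in ToothGrowth data as a stand-in, because the variable names in the STAR file are not assumed here.

```r
# Stand-in data: tooth length by supplement type and dose (built into R).
# The STAR .sav file could presumably be read with foreign::read.spss(),
# using to.data.frame = TRUE, but its variables are not assumed here.
tg <- transform(ToothGrowth, dose = factor(dose))

# Two factors and their interaction
fit2 <- aov(len ~ supp * dose, data = tg)
print(summary(fit2))

# Cell means plotted to show the interaction graphically
with(tg, interaction.plot(dose, supp, len,
                          xlab = "Dose", ylab = "Mean tooth length",
                          trace.label = "Supplement"))
```

Non-parallel lines in the interaction plot are the graphical counterpart of the supp:dose row in the ANOVA table.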

 

 

Comments and Questions here
    Click to move to that page.

 

You can see the Homework Assignment for February 3 here: Feb3homework.pdf .  And note that the Syllabus is here: Syllabus for EPSY 887.Spring11.pdf

NOTE: I have written a small function for creating dependent sample data (2 columns) here: dep.dat.txt
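The function in dep.dat.txt is not reproduced here. For readers who want to see the general idea, the following is a hypothetical sketch of one way to generate two correlated columns of scores; the name make_dep_data and all of its arguments are made up for this illustration.

```r
# Hypothetical generator of dependent (paired) sample data.
# n: number of pairs; rho: population correlation between the columns;
# means, sds: population means and standard deviations of the two columns.
make_dep_data <- function(n, rho = 0.7, means = c(0, 0), sds = c(1, 1)) {
  z1 <- rnorm(n)
  # z2 shares a proportion rho of z1, so cor(z1, z2) is rho in the population
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
  data.frame(x1 = means[1] + sds[1] * z1,
             x2 = means[2] + sds[2] * z2)
}

set.seed(887)  # any seed will do; fixed so the example is repeatable
d <- make_dep_data(50, rho = 0.8, means = c(10, 12), sds = c(2, 2))
head(d)

# Dependent sample t test on the generated pairs
print(t.test(d$x2, d$x1, paired = TRUE))
```

Varying rho shows how strongly the correlation between the columns affects the precision of the paired comparison.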

I shall post the homework asap on Saturday. But for the nonce, see the recently posted Guidelines for Homework: Guidelines4Homework.modGraphics.pdf

Homework for Feb10 is here: Feb10.HomworkMGs.pdf ; also see the blood lead handout: BloodLead.handout-2ways.pdf

Added: Because you should have at least one credible source on TWO way ANOVA, see this: Chapter09.schwartz.simfrazu.pdf  Read by Thursday, perhaps partly as a reference for your two way run. Also: see ONE WAY ANOVA+introContrasts11.pdf and 07_Factorial1.pdf which is a more comprehensive reference concerning 2 way ANOVA, well worth further study.

 

     ---------------------------------------------------------------------------------------------------------

NOTE: I've started a category, MOST RECENT, near the bottom, to direct your attention to the latest additions to this Wiki. And I entered the pdf (concerning what I call 'elemental graphics') for a new paper that I just submitted for publication (w/ a coauthor) that I would be pleased to have you all read (and if you would, respond to) by the time classes start. I'll use this color to help identify items in this category.

 

NOTE2 (with new Addendum!): Since all students have 'writing privileges' for this wiki, you are welcome to make whatever edits, and especially additions, that you care to make. Once you have logged in you can go to the Upload Files page, and input files of your own choosing. I suggest that if files are 'large' that you contact me before doing so, as we have only 2 Gb of space for this semester, and -- especially given the sizes of many documents with graphics -- large sizes may be common. I have also begun to use folders (let me do that); most recently, I put up two files that provide instructional information to help you learn R (in directory learnR), one by Venables and Smith, the other by a biostatistician at Vanderbilt University, Theresa Smith. Everyone can profit from reading these, so I trust you will download and use them often. 

Addendum: Just today I acquired (based on purchase) the pdf of a long and useful book about R, viz., R_in_a_Nutshell. From the information in the book you can see how to purchase your own copy; but you may want to download the book and use this copy in the meantime. Chapters 12 - 20 are especially relevant for us (but I want to emphasize that I see R as instrumental for this course; EPSY887 should NOT be seen as 'a course about R', even though R will often be used). (I expect, given its size, to delete the Nutshell book in about 2 weeks, so don't hesitate to take a look or download soon.)

 

NOTE3: For those of you who are preparing your laptops with R in this course, I have some suggestions for getting started:

First, download and install the most recent version of R (2.12.1 at r-project.org) if you have not done so (it was posted in Dec. last year). Then install these packages, perhaps not all at once, but in the next week or two: MASS, Hmisc, BHH2, TeachingDemos, rgl, granova, PSAgraphics, YaleToolkit, ggplot2, ISwR, sos, psych, foreign, nutshell, vcd, car, ellipse, cluster, plyr, rpart, doBy, corrgram, UsingR, RSiteSearch, mi, session, bootstrap, boot. It will help if you can bring your laptop to each class (but Macs are virtually identical for R use, and these will be available for you).

    Here is the R code to install all these packages (and dependencies) from the main R repository (copy, and paste to your R session):

install.packages(c('MASS','Hmisc','BHH2','TeachingDemos','rgl','granova','PSAgraphics', 'YaleToolkit', 'ggplot2','ISwR','sos', 'psych','foreign','nutshell','vcd','car','ellipse','cluster','plyr', 'rpart', 'doBy', 'corrgram', 'UsingR', 'RSiteSearch','mi','session', 'bootstrap', 'boot'), repos="http://cran.r-project.org", dependencies=TRUE)
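If you would rather not reinstall everything at once, here is a small sketch for finding out which packages are still missing on your machine. The short vector wanted is only an example subset of the list above; extend it as you like. The install line is left commented out so that nothing is downloaded until you choose to run it.

```r
# Example subset of the course packages; add the others as needed
wanted <- c("MASS", "psych", "ggplot2", "vcd", "granova", "PSAgraphics")

# installed.packages() returns a matrix whose row names are package names
missing_pkgs <- setdiff(wanted, rownames(installed.packages()))
print(missing_pkgs)

# To install only what is missing, uncomment the next two lines:
# install.packages(missing_pkgs, repos = "http://cran.r-project.org",
#                  dependencies = TRUE)
```

An empty result, character(0), means everything in wanted is already installed.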

 

NOTE4: I've developed some ideas I'd like you to think about (and act on); download here: Suggestions2AidLearning.pdf

 

Here are several potentially helpful URLs:

 

1. http://addictedtor.free.fr/graphiques/thumbs.php?sort=votes

    Hundreds of graphics here; study at least a few of them and note that the (R) code for constructing each is also included.

 

2. http://pfp7.cc.yamaguchi-u.ac.jp/~ichikawa/iv/index.html Information visualization and visualization techniques

 

3. http://www.imagemagick.org/script/index.php Software you may just want to try. It, like R, is free, and seems quite mature.

 

4. http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR See this for intriguing graphics, also Tufte's homepage, next

 

4a. http://www.edwardtufte.com/tufte/index Scroll down to see several excellent graphics

 

5. http://www.stat.auckland.ac.nz/~ihaka/courses/787/slides.html Ross Ihaka's website, see below (Ihaka is, with Robert Gentleman, the originator of R in the 90's).

 

6. http://had.co.nz/ggplot/ Main webpage for Hadley Wickham’s ggplot2 package in R, see book and ppt below, (A and B)

 

6a. http://www.slideshare.net/hadley/19-critique?src=related_normal&rel=1657864  General, but useful in some ways. Not technical; the good news is that it is Hadley Wickham's work.

 

7.  http://www.stat.ucla.edu/~jeroen/ggplot2/ Youtube intro to ggplot2 graphics

 

7a.  http://www.yeroon.net/ Several videos Yeroon has developed to show R applications

 

8.  http://www.r-bloggers.com Large and growing site, with daily editions; archives available (Add others here as you find them useful!)

 

9.  http://www.statmethods.net/advgraphs/interactive.html Part of the Quick-R pages, see others there as well

 

10.  http://www.math.yorku.ca/SCS/StatResource.html#DataVis Excellent (international) source for Visualization and Graphics. Spend some time here; back down to the html as well.

 

11.  http://openmx.psyc.virginia.edu/  This is quite new; an add-on for R to support structural equation modeling [SEM] (there is a forum too, w/ lots of help available).

 

12.  www.blender.org Blender is a free open source 3D content creation suite that works w/ all OS’s, facilitates modeling, shading, animation, rendering, composites; also interactive.

 

13.  http://www-958.ibm.com/software/data/cognos/manyeyes/ Nice graphics for you to try. Can support text analysis, and dynamic graphics. This is the new URL.

 

14. http://www.stat.uiowa.edu/~luke/classes/295-vis/   Some great stuff here on Luke Tierney’s website; he’s an R guru, worth checking out.

 

15a. http://www.dataspora.com/blog/ This blog is by Michael Driscoll, a major advocate, developer and user of R. The following two items show what he can do; I think these are quite good. 

 

15b. http://www.slideshare.net/dataspora/a-survey-of-r-graphics Here's a survey of graphics for R. Two years old, but good for its time, and likely to be useful for many.

 

15c. http://www.slideshare.net/dataspora/an-interactive-introduction-to-r-programming-language-for-statistics Same author as for 15b; be sure to see other slideshows on right side of Driscoll's screen.

 

16. http://goo.gl/1NRiV Alex showed that a short URL gets us to the same place (thanks Alex). World poverty and graphics is the topic: a dynamic plot. More on this later.

 

17. https://oli.web.cmu.edu/jcourse/workbook/activity/page?context=90e061a380020ca600b998f05be1e8aa&view=frameset Carnegie Mellon University offers Free courses, some on data analysis and/or statistical reasoning. Includes some graphics too.

 

18.  URLs representing visualization techniques in eye-tracking: (a) interactive minds GmbH, (b) Metrovision, (c) Applied Science Laboratories(on YouTube), (d) Cambridge Research Systems, (e)El-Mar Inc., (f) Ergoneers GmbH, (g) EyeTracking, Inc., (h) ISCAN, Inc. on YouTube, (i) Mangold International GmbH, (j) Mirametrix (on YouTube), (k) Tobii Technology AB (on YouTube), (l) iMotions (on YouTube), (m) OGAMA. Additional URLs related to eye-tracking and visualization: (a) Visual Directed Browsing (thanks Sandy), (b) Emotion Response Analysis (ERA) through EEG (SimpleUsability on YouTube), (c) Project for the Department of Homeland Security, (d) EyeTube, (e) The StarGazer experimental typing system, (f) GazeTalk 5 eye communication system, (g) using similar principles: Head Tracking for iPad: Glasses-Free 3D Display (on YouTube) - Alex T.  

 

19. Here is a nice blog post on various ways of creating pairwise plots: http://r-ecology.blogspot.com/2011/03/for-all-your-pairwise-comparison-needs.html  -Jason

 


Next, we make files in this wiki available for download. Just click. (These, and many others, are on a separate page on this wiki. Let me know if you need access)

 

A.  Wickham.ggplot2.2010basics.pdf  An introduction to ggplot2, one of the most impressive recent additions to R for graphics, written by the author of the package and the book (next).

 

B.  ggplot2-Book09hWickham.pdf Professor Wickham made this pdf available at his website not long before his book was published less than a year ago.

 

C.  Ihaka-presentn-graphics.ppt.pdf Ross Ihaka was one of the originators of R, roughly 15 years ago. This file is from his university website.

 

D.  Ihaka.high-dimensional.Graphics.pdf Counterpart of the preceding, but it goes into larger questions or problems

 

E.  Graphics courseUCLA.codeToo.0125.pdf Intermediate level material, assumes some familiarity with R (not too much tho')

 

F.  sarkar.Lattice09.pdf Lattice is a superb R package that has already been used in various ppt/pdfs above. It has become invaluable to many users.

 

G.  R-GraphicsShortCourse08.ppt.pdf Georges M. (U.K.) with especially interesting material. Code included in the next URL.

 

H.  R-GraphicsGeorgesM.ScriptCigConsumData.R This file will make no sense except in relation to G. above.

 

I.  wilkinson_1999.DotPlots.pdf Not everything graphical is R-related. Here is a helpful article on dotplots; other articles follow.

 

J.  feeney_etal_2000.pdf a book chapter, How People Extract Information from Graphs... results from an illustrative experiment.

 

K.  cumming_finch.Conf.Intervals_2005.pdf Not just graphics, but confidence intervals, and about their interpretation

 

L.  friendly.ExtendingMosaicPlots_99.pdf Article by a psychologist who has developed several effective methods for displaying data.

 

M. Lumley06.225pages.R-fundamentals.pdf Lengthy pdf derived from a ppt presentation; 4 years old, but excellent material; Lumley is a member of the R core team.

 

N.  statgraphicsReferences.BD.Oct_2010_2up.pdf An especially useful listing of a large number of references to topics graphical, with special relevance to psychology and social sciences. Most of these pdfs will be available for download, thanks to the efforts of BD. (BD refers to Professor Bruce Dudek, SUNYA Psych Department)

 


MOST RECENT:

a. ElementalGraphics4ANOVA.RP+JH.pdf  This paper was recently sent off for publication (J. of Statistics Education), and will be revised at least once before being published. It would be timely to receive feedback that might improve (or correct?) parts of it, so I solicit your feedback (preferably by early January to fit w/ our revision schedule). Thank you. BP (Related to granova is a YouTube video: http://www.youtube.com/watch?v=HhOjDn32GIc - Alex T.)

 

b. cleveland&mcgill_1984b.pdf This is one of the first papers I'd like you to read (and use as a starting point for exercises) in this class. The article appeared more than 25 years ago, when producing the scatterplots the authors illustrate was difficult; with R, virtually everything they show can be done 'rather easily.' I've used quotes here because knowledge of R commands and functions is required to do these things. For those wanting to begin early (and start to do what I shall be spelling out in the syllabus next week) you may find it useful to begin to construct plots that (in form) mirror what Cleveland and McGill show. (Don't try to do them all yet; better to read/study other sources, and edit a. above, if time permits.) Beyond the plot function (R base), several other packages/functions are likely to be helpful: see #1 above (addicted2r site) for examples AND R code that will be most helpful. BP
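As a starting point for such plots, here is a minimal base R sketch: a scatterplot with a lowess smooth added. It does not reproduce any particular figure from the article; the built-in cars data and the smoothing span are my own choices for illustration.

```r
# Basic scatterplot with the base plot function
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Scatterplot with lowess smooth")

# lowess() returns the smoothed (x, y) pairs, ordered by x;
# f is the smoother span (the proportion of points used at each x)
sm <- lowess(cars$speed, cars$dist, f = 2/3)
lines(sm, lwd = 2)

# Light reference grid
grid()
```

Try changing f to see how the span controls the smoothness of the curve.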

 

c. http://mind42.com/pub/mindmap?mid=c4e081c4-46c5-4c99-b3f3-04138a3a67a2 This site provides a 'map' for learning R; it may have some value to those who are still learning basic R functionality. Let me know if it was useful to you or not, and why. BP

 

d. Re: textbook for this course: book by William Cleveland (1993). Although I do not expect to use this book for more than about 1/3 of this course, no other book goes as far as this one does in providing basic information I want to emphasize in the class. Purchase is not necessary (about $50 from Amazon, but only 3 copies there), but it can be rented, e.g. $15 for 130 days from http://www.collegebookrenter.com/details.cfm/isbn/0963488406 [isbn is last part].  (You may want to share with one another; order asap.) I (Alex) got the book from collegebookrenter.com 3 business days after ordering it. It shipped from Little Rock, AR via USPS Priority Mail (2-3 day estimate). Condition is like new (crisp pages, clean, no dust, etc.), except that someone made highlights on a few pages; otherwise great. - Alex T.

RE: Textbook rental - I tried to rent the book from College Book Renter - they processed my order & then I just received an email saying they cancelled my order because "they were unable to fill it" - I ordered it on Amazon today for $50 - Catherine

To see first few pages go to: http://www.amazon.com/Visualizing-Data-William-S-Cleveland/dp/0963488406#reader_0963488406

  Book summary: Visualizing Data is about visualization tools that provide deep insight into the structure of data. There are graphical tools such as coplots, multiway dot plots, and the equal count algorithm. There are fitting tools such as loess ... that fit equations, nonparametric curves, and nonparametric surfaces to data. But the book is much more than just a compendium of useful tools. It conveys a strategy for data analysis that stresses the use of visualization to thoroughly study the structure of data and to check the validity of statistical models fitted to data. The result of the tools and the strategy is a vast increase in what you can learn from your data. The book demonstrates this by reanalyzing many data sets from the scientific literature, revealing missed effects and inappropriate models fitted to data.  

 

Here is a link to William Cleveland's personal website at Bell Labs: http://stat.bell-labs.com/wsc/ From there you can download the datasets and S code (often perfectly compatible with R) to generate all the graphics in Visualizing Data.

 

Some additional resources for learning R:

 

          UPDATE2 (bp): while I really like this book (R in Action), I found that it cost $44 to get; it is worth that, but what I'm trying to do is make the book available (but not for download) online. It is probably even better (for you students) than R_in_a_Nutshell, so wait a bit for closure on this unless you have the $$ to spare. It IS a fine book on both R and statistics.

Working With R: Text Editors and IDEs

 

R on both Windows and Mac includes a reasonable text editor. However, if you are going to be working with R a lot, I would suggest trying out some other programs to edit your R script files. Many of these are considered IDEs (Integrated Development Environments), which just means they are designed to provide all the features useful for program development. Most importantly, these provide nice syntax highlighting and lots of additional features that help with manipulating text files (e.g. tabbed browsing, superior text searching and replacing, etc.)

 

 

Data Sources

These are links to some freely available data sources.

 

 

Sweave & LaTeX

I uploaded a PDF of a presentation I gave about R, LaTeX, and Sweave. I also have some resources on a page here: http://bryer.org/2009/latex-and-sweave. Lastly, I uploaded a PDF showing how to create a ggplot version of Bob's loess.psa graphic. The source Rnw file is also available.

  

The catalog description of this course follows:


EPSY 887 Modern graphics for social science research

Graphical and related methods for applications: modern fixed and dynamic graphics and related methods; tools for analysis and interpretations of data. Major goals will be to tailor the course to needs of students for analysis and presentation of data, to help ensure operational skills as well as sound understanding of modern methods for both analysis and graphics to facilitate communication. Software applications, especially R, will be emphasized, taking advantage of new vehicles (youtube, web videos, cloud computing, etc.) to facilitate analysis and display. Prerequisites: EPSY 630 or equivalent. (3 credits)   Thursdays 4.15 – 7.05