FrontPage
last edited
by bob pruzek 13 years, 4 months ago
Welcome to the Modern Graphics 2011 wiki ---
Bob Pruzek, University at Albany, SUNY (NB: WE MEET IN THE APPLE LAB, B-13, SOE, NOT A & S bldg.)
Office Hours: before class from 3:30, after class, or Wednesday afternoons after 3 pm; also by appointment, and via phone or electronic communication.
There are now two Waller articles (both 2008) in the Files: Waller and Jones, posted by J.B., and Waller-fungible, posted by bp. They are closely related, but the one I had in mind in the last class was the second. Go to the Upload Files page to get them. The encyclopedia article on exploratory factor analysis is here: beh.science.encycl05.fa.expl.rp.pdf . Have a look when you can, but it will not be part of the Exam. bp
Here is the final Exam: EPSY887 Final.Spring 2011.pdf . Please ask questions of a general kind on the Comments page, and those that seem specific or personal by email. Return it no later than one week from today, May 9 at 5 pm. Good luck. BP
I hear, and I forget. I see, and I remember. I do, and I understand. Anon.
This wiki is aimed at facilitating effective use of modern graphical and visualization methods, especially for research in the educational and behavioral sciences. Since this is a wiki, it is intended that readers will not only acquire information and files from it, but that they will also edit and make their own contributions. Feel free to invite others to use this workspace with you. Collaboration is essential.
Three principal forms of information will be provided: URLs, Files for download, and the products of conversations among users as supported at this site. As in the course to follow (see below), most of the emphasis here is on graphics created in R. The reason is that R provides excellent support for graphics of many kinds, and the software is free. Indeed, R now appears to be the de facto standard platform for statisticians worldwide. You can start to learn R in many ways, but one is to use another wiki: epysm08learnR.pbworks.com. In due course, additional URLs and files will be added here, and of course the value of the online conversations will in large measure derive from the questions you ask and the answers you get ... from whoever supplies them.
Assignments/Readings (this section will be updated regularly, so visit it often):
The URL for the Dependent Sample paper is http://www.amstat.org/publications/jse/v17n1/helmreich.html . Note that once you open R you can easily copy the data from our appendix and paste it into R. If the package 'psych' has been loaded*, just type read.clipboard( ), and as long as what you copied includes the header, you should get the whole enchilada. Ask questions if you have problems. (Please read this paper at least twice before next Thursday, the second time with R open, trying to replicate the analyses you read about.) *That is, after you've run the 'install.packages' line below and then typed library(psych).
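For anyone who wants to try the idea before opening the paper, here is a minimal base-R sketch of a dependent-sample analysis. The numbers are invented for illustration (they are not the appendix data); with real data read in via read.clipboard( ), you would substitute the two columns for pre and post. It shows that a paired t-test is just a one-sample t-test on the within-pair differences, and draws a simple post-vs-pre scatterplot with the line y = x, which is only a rough stand-in for the graphics in the paper.

```r
# Hypothetical pre/post scores for 8 subjects (made up for illustration,
# NOT the appendix data from the paper)
pre  <- c(12, 15, 11, 18, 14, 10, 16, 13)
post <- c(14, 17, 12, 21, 15, 13, 18, 13)

# Dependent-sample analyses work on the within-pair differences
d <- post - pre
mean(d)                                  # average gain: 1.75

# A paired t-test is the same as a one-sample t-test on the differences
paired     <- t.test(post, pre, paired = TRUE)
one_sample <- t.test(d, mu = 0)
all.equal(unname(paired$statistic), unname(one_sample$statistic))

# Simple graphic: post vs. pre with the identity line y = x (dashed);
# points above the line are subjects whose scores went up
plot(pre, post, asp = 1, xlab = "pre", ylab = "post")
abline(a = 0, b = 1, lty = 2)
```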
In addition, the 1-way ANOVA and dependent sample sections of the pdf listed at the top of MOST RECENT (below) should be read before Thursday. I shall be posting an edited, and slightly expanded, version of last week's Optional Exercise as the first assignment for Thursday (Feb 17th). Because no student seems to have started that work, if you want to get going before my post (about 3 pm today), begin with last week's version. FINALLY, it is here: Feb17Homework.pdf . I'll leave it to you to go to the upload page to get the two bootstrapping documents.
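As a rough companion to the two bootstrapping documents, here is a minimal base-R sketch of the nonparametric bootstrap for a sample mean: resample the data with replacement many times, recompute the mean each time, and use the spread of those means as an estimate of sampling variability. The data vector, the number of resamples B and the seed are arbitrary choices for illustration; the 'boot' and 'bootstrap' packages in the install list offer more complete tools.

```r
set.seed(887)   # arbitrary seed, so the resampling is reproducible
x <- c(4.1, 5.3, 2.8, 6.0, 4.7, 5.9, 3.6, 4.4, 5.1, 7.2)   # hypothetical sample

B <- 2000       # number of bootstrap resamples
# Each resample draws length(x) values from x WITH replacement
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

# Bootstrap standard error of the mean, and a simple percentile 95% interval
se_boot <- sd(boot_means)
ci_pct  <- quantile(boot_means, c(0.025, 0.975))

# Picture of the bootstrap distribution, with the original sample mean marked
hist(boot_means, breaks = 30, main = "Bootstrap distribution of the mean")
abline(v = mean(x), lwd = 2)
```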
Added: I just uploaded a pdf that will interest those of you w/ a theoretical bent who want to know how the normal (z), t, F and chi-square distributions relate to one another. It was given to me, for your use, by my colleague Professor Bruce Dudek (Psych Dept): bDudek.RelationsAmngDistrbs.pdf . Before class, aim to read the two-page pdf here: Essence of bootstrapping.pdf
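Some of these relations can be checked numerically with R's quantile functions. This sketch is my own illustration, not taken from the Dudek pdf: it compares critical values to show that a squared z is chi-square with 1 df, that a squared t with nu df is F with (1, nu) df, and that t approaches z as its df grow. The last lines simulate a chi-square with k df as a sum of k squared standard normals. The values of alpha, nu and k are arbitrary choices; each all.equal() comparison is expected to agree up to floating-point tolerance.

```r
alpha <- 0.05

# 1. A squared standard normal is chi-square with 1 df:
#    the two-sided z critical value, squared, matches the chi-square(1) critical value
z_crit <- qnorm(1 - alpha / 2)
all.equal(z_crit^2, qchisq(1 - alpha, df = 1))

# 2. A squared t with nu df is F with (1, nu) df
nu <- 12
all.equal(qt(1 - alpha / 2, nu)^2, qf(1 - alpha, 1, nu))

# 3. As df grow, t approaches the standard normal (qt() allows df = Inf)
all.equal(qt(1 - alpha / 2, Inf), z_crit)

# 4. A chi-square with k df is a sum of k independent squared standard normals;
#    a quick simulation check on the mean, which should be close to k
set.seed(1)
k <- 4
sims <- colSums(matrix(rnorm(k * 10000), nrow = k)^2)
mean(sims)
```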
For our March 3 class, I'd like you to read ANOVAplannedContrsts&More.pdf (two or three times?) and then begin to use the granova.contr function as described there. Please POST YOUR QUESTIONS as you read! New: see item M. under the files available below. Strongly recommended.
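The granova.contr function and its arguments are described in the pdf, so it is not reproduced here. Instead, this is a base-R sketch of the same underlying idea: fitting two planned, orthogonal contrasts in a one-way layout by assigning a contrast matrix to a factor before calling lm(). The three groups, their simulated data and the contrast weights are all hypothetical. With orthogonal contrasts and equal group sizes, each contrast coefficient equals the contrast estimate (the weighted sum of group means) divided by the sum of its squared weights.

```r
# Hypothetical scores for three groups (control, treatment A, treatment B), 10 per group
set.seed(3)
g <- factor(rep(c("ctrl", "A", "B"), each = 10), levels = c("ctrl", "A", "B"))
y <- rnorm(30, mean = c(10, 12, 13)[as.integer(g)], sd = 2)

# Two planned contrasts; rows follow the factor levels (ctrl, A, B)
#   c1: control vs. the average of the two treatments
#   c2: treatment A vs. treatment B
C <- cbind(c1 = c(-2, 1, 1), c2 = c(0, -1, 1))
stopifnot(sum(C[, 1] * C[, 2]) == 0)   # the two contrasts are orthogonal
contrasts(g) <- C

fit <- lm(y ~ g)
summary(fit)$coefficients              # rows "gc1" and "gc2" test the two contrasts

# The same contrasts estimated directly from the group means
m <- tapply(y, g, mean)
psi1 <- sum(C[, 1] * m)                # -2*ctrl + A + B
psi2 <- sum(C[, 2] * m)                #  B - A
```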
Re: the latter part of class tomorrow (March 3): I shall introduce some basic regression concepts, methods and, of course, graphics. Files will be available soon.
For the March 10 class, do the exercises here: ExercisesOnRegression.pdf . Note: I have now added the last part while keeping the file name the same.
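A minimal base-R sketch of the simplest regression case, using R's built-in cars data (my choice of example, not from the exercises): fit a least-squares line with lm(), confirm the slope against the textbook formula b = cov(x, y)/var(x), and draw the two most basic regression graphics, the data with the fitted line and a residuals-vs-fitted plot.

```r
data(cars)                              # built-in: stopping distance vs. speed, 50 cars
fit <- lm(dist ~ speed, data = cars)
coef(fit)                               # intercept and slope

# Slope and intercept from the textbook formulas
b <- with(cars, cov(speed, dist) / var(speed))
a <- with(cars, mean(dist) - b * mean(speed))

op <- par(mfrow = c(1, 2))              # two panels side by side
plot(dist ~ speed, data = cars, main = "Data with least-squares line")
abline(fit, lwd = 2)
plot(fitted(fit), resid(fit), xlab = "fitted", ylab = "residual",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)
par(op)                                 # restore the previous layout
```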
AND ASK QUESTIONS, for everyone's sake. On this point, I numbered all the paragraphs in my PSANotes.. document here to make it easier to refer to parts of it. Let me know if that helps. b
re: March 24 Assignment: Apparently I was not clear, as more than one of you has asked what I meant. My intention was that each of you would follow my example, involving several different analyses and graphic constructions, in reanalyses of the birthwt data from the MASS package, but that each of you would choose a different logistic regression model (that is, a different specification for y ~ in the glm statement), so that each of you would (probably) produce a vector of propensity scores that differed from everyone else's. The second phase of the analysis will of course also differ for each of you, as the PSs are central to the response data analyses and graphics. I also said that if you had other data that would be SUITABLE for PSA, you were (in principle) welcome to use them, but I wanted you to confer w/ me on your choice before investing much in analyses of different data. Note that it is not enough for you to say, "I found some other data, and want to use it." You need to persuade me (and yourself) that the alternative data are appropriate for such a study; this means there must be a 'reasonable set' of covariates to account for what can be argued is 'much if not most' of the selection bias associated w/ the (binary) treatment variable, and there must be a reasonable response variable. Details about the data are therefore essential before you begin! Always. Let me know what you are doing, and if you would like me to comment at the early stages on your approach to PS estimation, I will of course be happy to do that. bob
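To make the two phases concrete, here is one possible sketch using birthwt, with smoke as the binary treatment and bwt (grams) as the response. The covariate set in the glm() call is just one arbitrary choice (each of you is meant to choose your own), and stratifying on propensity-score quintiles is only one of several ways to use the scores; this is not a reproduction of the analyses in my example.

```r
library(MASS)                       # provides the birthwt data
data(birthwt)
bw <- transform(birthwt, race = factor(race, labels = c("white", "black", "other")))

# Phase 1: estimate propensity scores for the binary "treatment" smoke
# (one possible covariate set, chosen only for illustration)
ps_model <- glm(smoke ~ age + lwt + race + ptl + ht + ui + ftv,
                family = binomial, data = bw)
bw$ps <- fitted(ps_model)           # estimated P(smoke = 1 | covariates)

# Phase 2: form 5 strata from the quintiles of the propensity score
bw$stratum <- cut(bw$ps, breaks = quantile(bw$ps, probs = seq(0, 1, 0.2)),
                  include.lowest = TRUE, labels = 1:5)

# Within each stratum: mean birth weight of smokers minus non-smokers, and stratum size
by_stratum <- sapply(split(bw, bw$stratum), function(s)
  c(diff = mean(s$bwt[s$smoke == 1]) - mean(s$bwt[s$smoke == 0]), n = nrow(s)))
by_stratum

# Stratum-size-weighted average of the within-stratum differences
ate_strat <- sum(by_stratum["diff", ] * by_stratum["n", ]) / sum(by_stratum["n", ])

# Overlap check: propensity scores by treatment group
boxplot(ps ~ smoke, data = bw, names = c("non-smoker", "smoker"),
        ylab = "propensity score")
```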
I shall go over IllustrationMatchingbwtdata.doc in the March 24 class (to the extent that time allows); Jason will present for about the last hour. Also see Elizabeth A. Stuart's page: Stuart, E.A. (2010). Matching Methods for Causal Inference: A review and a look forward. Statistical Science 25(1): 1-21. [PDF from Dr. Stuart's page] [Journal page URL].
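Matching is one alternative to stratification. The sketch below does greedy 1:1 nearest-neighbour matching, without replacement and without a caliper, on the logit of an estimated propensity score for the birthwt data. The helper greedy_match() is hypothetical, written only for illustration; it is not the method used in IllustrationMatchingbwtdata.doc, and dedicated packages (e.g. MatchIt) handle calipers, ties and balance diagnostics far more carefully. The covariates in the glm() call are an arbitrary choice.

```r
library(MASS)
data(birthwt)

# Estimated propensity scores (illustrative covariate set), matched on the logit scale
ps  <- fitted(glm(smoke ~ age + lwt + factor(race) + ptl + ht + ui,
                  family = binomial, data = birthwt))
lps <- qlogis(ps)
treated  <- which(birthwt$smoke == 1)
controls <- which(birthwt$smoke == 0)

# Hypothetical helper: for each treated unit in turn, take the closest
# still-unused control (greedy, 1:1, without replacement, no caliper)
greedy_match <- function(t_idx, c_idx, score) {
  matched <- matrix(NA_integer_, nrow = 0, ncol = 2,
                    dimnames = list(NULL, c("t", "c")))
  available <- c_idx
  for (i in t_idx) {
    if (length(available) == 0) break
    j <- available[which.min(abs(score[available] - score[i]))]
    matched <- rbind(matched, c(i, j))
    available <- setdiff(available, j)   # each control is used at most once
  }
  matched
}

m <- greedy_match(treated, controls, lps)

# Mean within-pair difference in birth weight (grams): smokers minus matched non-smokers
mean(birthwt$bwt[m[, "t"]] - birthwt$bwt[m[, "c"]])
```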
Please, before our March 31 class, begin to study the Sarkar pdf, including some trials w/ his code in: Sarkar08latticeLab.pdf
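If you want a quick warm-up before working through Sarkar's code, here is a minimal lattice sketch in the same spirit: a conditioning (trellis) plot of R's built-in quakes data, with one panel per overlapping depth interval created by equal.count(). The choice of 4 intervals and 10% overlap is arbitrary.

```r
library(lattice)
data(quakes)     # built-in: 1000 seismic events near Fiji

# A shingle: 4 overlapping depth intervals with roughly equal counts
depth_grp <- equal.count(quakes$depth, number = 4, overlap = 0.1)

# Trellis display: latitude vs. longitude, one panel per depth interval
p <- xyplot(lat ~ long | depth_grp, data = quakes,
            layout = c(2, 2), pch = 20, cex = 0.5,
            xlab = "Longitude", ylab = "Latitude",
            main = "Quake locations, conditioned on depth")
print(p)   # lattice plots must be printed explicitly inside scripts and functions
```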
Re: MISSING DATA, I've posted two items, the first of which I'd like everyone to have read by Thursday's class (4/14): schafer_graham_MissingDataPsychMethods02.pdf . The second is a (lengthy) pdf, based on a talk last year by Prof. R. Yucel (Biostats): Prof.RecaiYucel.MissingDataSlides2011.pdf . Note that the second contains some ways to use R for missing data problems, but that info comes at the very end. Here is a primer on missing data and multiple imputation, published in 2010: http://circoutcomes.ahajournals.org/content/3/1/98.full
Mea Culpa: I thought I had made these changes to this page after I uploaded the Schafer-Graham article on Monday, but I forgot to check that they were properly saved... and they were not. So do the best you can in the time you now have. Some of you, I hope, will have seen the Schafer-Graham article earlier this week... indeed I hope you've been reading it. Also see Missing DataOverview+Sources.pdf .
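Before the readings, it may help to look at missingness in a familiar data set. This base-R sketch uses the built-in airquality data (my choice, not from the readings) to count missing values, show that lm() quietly performs listwise deletion, and illustrate one known drawback of single mean imputation: it understates variability. Schafer and Graham recommend maximum likelihood and multiple imputation instead; the 'mi' package in the install list is one R route to the latter, not shown here.

```r
data(airquality)                     # built-in; Ozone and Solar.R contain NAs
colSums(is.na(airquality))           # number of missing values per variable
mean(complete.cases(airquality))     # proportion of rows with no missing values

# Listwise deletion: by default lm() drops every row with an NA in a model variable
fit_cc <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
nobs(fit_cc)                         # rows actually used in the fit

# Single mean imputation, shown only to illustrate its drawback:
# filling every gap with the mean shrinks the standard deviation
oz <- airquality$Ozone
oz_imp <- ifelse(is.na(oz), mean(oz, na.rm = TRUE), oz)
c(observed_sd = sd(oz, na.rm = TRUE), mean_imputed_sd = sd(oz_imp))
```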
I trust you have all been working through at least some of the content in the Sarkar/lattice pdfs loaded last week. Please bring your questions, and if you can, some examples of your use of Sarkar's code to get lattice plots, with data of your own choosing. bp
For our last class (!), I want to spend a little time on matrix operations relevant to data analysis. To make this effort as productive as possible, after you have reviewed the material on missing data and asked your questions about it, we shall get into the details of Basic.Matrix.Ops.correl.regrsn11.pdf and, at least briefly, Details4L2X-AlsoEigen+SVD.11.pdf . You will see there are exercises in each of these too.
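Several identities of the kind these documents cover can be checked directly in R. This sketch (using the built-in mtcars data, my own choice) verifies that the correlation matrix equals Z'Z/(n - 1) for column-standardized data Z, that the normal equations b = (X'X)^{-1}X'y reproduce lm()'s coefficients, and that the eigenvalues of R are the squared singular values of Z divided by n - 1. The comparisons are expected to hold up to floating-point tolerance; solving the normal equations is shown for insight, while lm() itself uses a more stable QR decomposition.

```r
data(mtcars)
X <- as.matrix(mtcars[, c("wt", "hp", "disp")])
y <- mtcars$mpg
n <- nrow(X)

# Correlation matrix from standardized scores: R = Z'Z / (n - 1)
Z <- scale(X)                        # center each column and divide by its SD
R <- crossprod(Z) / (n - 1)          # crossprod(Z) computes t(Z) %*% Z
all.equal(R, cor(X), check.attributes = FALSE)

# OLS coefficients from the normal equations: b = (X'X)^{-1} X'y
X1 <- cbind(1, X)                    # prepend a column of 1s for the intercept
b  <- solve(crossprod(X1), crossprod(X1, y))
all.equal(as.vector(b), unname(coef(lm(mpg ~ wt + hp + disp, data = mtcars))))

# Eigen decomposition of R and its link to the SVD of Z:
# if Z = U D V', then Z'Z = V D^2 V', so eigenvalues of R are d^2 / (n - 1)
e <- eigen(R, symmetric = TRUE)
s <- svd(Z)
all.equal(e$values, s$d^2 / (n - 1))
```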
Added 4/27. Since I know you have all read all the pdfs related to missing data (yes?) and have few questions on that topic (yes?), and since tomorrow is our last class, w/ one big topic remaining (cf. the last two items above this line), I've decided to limit tomorrow's attention to missingness to ONLY answering your m.d. questions (briefly). Be prepared to go over the Basic.Matrix... document first, and then the one that follows (if you have time to read it while at the console). b
I've posted two new pdfs (4/6), one on Indicator matrices, the other on Correspondence Analysis, Illustrated, with comments. I ask you to read the Ind. Matrices document before Thursday's class, and as much of what has been posted for the past week as well. The new pdfs are: Indicator matrices.pdf and IllustrationsCAw-comments.pdf. b
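A small base-R sketch of the two ideas, with made-up data for 8 respondents: model.matrix(~ f - 1) builds an indicator matrix, the cross-product of two indicator matrices reproduces their contingency table, and a bare-bones correspondence analysis is the SVD of the table's standardized residuals. This is only the core computation, not the procedure in IllustrationsCAw-comments.pdf; the squared singular values (principal inertias) should sum to the Pearson chi-square statistic divided by n.

```r
# Two hypothetical categorical variables for 8 respondents
f1 <- factor(c("a", "b", "a", "c", "b", "a", "c", "c"))
f2 <- factor(c("x", "x", "y", "y", "x", "y", "y", "x"))

# Indicator matrices: one column per category, exactly one 1 in each row
G1 <- model.matrix(~ f1 - 1)
G2 <- model.matrix(~ f2 - 1)
stopifnot(all(rowSums(G1) == 1), all(rowSums(G2) == 1))

# The cross-product of two indicator matrices is their contingency table
N <- crossprod(G1, G2)               # t(G1) %*% G2
N
table(f1, f2)                        # same counts

# Bare-bones correspondence analysis: SVD of the standardized residuals
P  <- N / sum(N)                     # correspondence matrix
r  <- rowSums(P)                     # row masses
cc <- colSums(P)                     # column masses
S  <- diag(1 / sqrt(r)) %*% (P - r %o% cc) %*% diag(1 / sqrt(cc))
sv <- svd(S)
sv$d^2                               # principal inertias
```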
Also see these intros to Correspondence Analysis, and Mosaics
Mosaic Plots (emphasis on package 'vcd' and 'vcdExtra'):
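Base R's mosaicplot() gives a first look before turning to 'vcd'. This sketch (my own, using the built-in HairEyeColor table collapsed over sex) shades the tiles by Pearson residuals from the independence model; vcd's mosaic() offers much more control over shading and labelling.

```r
data(HairEyeColor)                                  # built-in 4 x 4 x 2 table
hair_eye <- margin.table(HairEyeColor, c(1, 2))     # collapse over Sex: Hair x Eye
hair_eye

# shade = TRUE colours each tile by its Pearson residual under independence
mosaicplot(hair_eye, shade = TRUE, main = "Hair and eye colour")
```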
NOTE: I have written a small function for creating dependent sample data (2 columns) here: dep.dat.txt
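The dep.dat.txt function itself is not reproduced here. As an independent, hypothetical sketch of the same kind of tool, make_dep_data() below generates n pairs of scores with chosen means, a common SD and a population correlation rho, using the standard construction z2 = rho*z1 + sqrt(1 - rho^2)*e. The function name and arguments are my own invention.

```r
# Hypothetical helper (NOT the dep.dat.txt function): n pairs of scores with
# means mu1 and mu2, common SD sigma, and population correlation rho
make_dep_data <- function(n, mu1 = 0, mu2 = 0, sigma = 1, rho = 0.5) {
  z1 <- rnorm(n)
  z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)   # unit variance, corr(z1, z2) = rho
  data.frame(x1 = mu1 + sigma * z1, x2 = mu2 + sigma * z2)
}

set.seed(42)
dd <- make_dep_data(500, mu1 = 50, mu2 = 53, sigma = 10, rho = 0.7)
round(cor(dd$x1, dd$x2), 2)    # sample correlation, near 0.7
colMeans(dd)                   # sample means, near 50 and 53
```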
I shall post the homework asap on Saturday. But for the nonce, see the recently posted guidelines for homework: Guidelines4Homework.modGraphics.pdf
Added: Because you should have at least one credible source on TWO-way ANOVA, see this: Chapter09.schwartz.simfrazu.pdf . Read it by Thursday, perhaps partly as a reference for your two-way run. Also see ONE WAY ANOVA+introContrasts11.pdf and 07_Factorial1.pdf; the latter is a more comprehensive reference on 2-way ANOVA, well worth further study.
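A minimal two-way ANOVA sketch, using R's built-in ToothGrowth data (tooth length by supplement type and dose, 10 guinea pigs per cell) as an example of my own choosing: fit both main effects and their interaction with aov(), then draw an interaction plot of the cell means. Dose is converted to a factor so that it is treated as categorical rather than as a numeric predictor.

```r
data(ToothGrowth)                       # built-in: len by supp (OJ, VC) and dose (0.5, 1, 2)
tg <- transform(ToothGrowth, dose = factor(dose))

fit2 <- aov(len ~ supp * dose, data = tg)   # main effects plus the supp:dose interaction
summary(fit2)

# Cell means of len, one line per supplement type, dose on the x-axis
with(tg, interaction.plot(dose, supp, len, ylab = "mean tooth length"))
```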
---------------------------------------------------------------------------------------------------------
NOTE: I've started a category, MOST RECENT, near the bottom, to direct your attention to the latest additions to this Wiki. I have also posted the pdf (concerning what I call 'elemental graphics') of a new paper that I just submitted for publication (w/ a coauthor); I would be pleased to have you all read it (and, if you would, respond) by the time classes start. I'll use this color to help identify items in this category.
NOTE2 (with new Addendum!): Since all students have 'writing privileges' for this wiki, you are welcome to make whatever edits, and especially additions, you care to make. Once you have logged in, you can go to the Upload Files page and upload files of your own choosing. If files are 'large', I suggest you contact me before uploading them, as we have only 2 GB of space for this semester, and (especially given the sizes of many documents with graphics) large files may be common. I have also begun to use folders (let me do that); most recently, I put up two files that provide instructional information to help you learn R (in directory learnR), one by Venables and Smith, the other by a biostatistician at Vanderbilt University, Theresa Smith. Everyone can profit from reading these, so I trust you will download and use them often.
Addendum: Just today I purchased the pdf of a long and useful book about R, R_in_a_Nutshell. From the information in the book you can see how to purchase your own copy, but you may want to download this copy and use it in the meantime. Chapters 12 - 20 are especially relevant for us. (I want to emphasize, though, that I see R as instrumental for this course; EPSY887 should NOT be seen as 'a course about R', even though R will often be used.) Given its size, I expect to delete the Nutshell book in about 2 weeks, so don't hesitate to take a look or download it soon.
NOTE3: For those of you who are preparing your laptops with R in this course, I have some suggestions for getting started:
First, download and install the most recent version of R (2.12.1 at r-project.org, posted in Dec. last year) if you have not done so. Then install these packages, perhaps not all at once, but in the next week or two: MASS, Hmisc, BHH2, TeachingDemos, rgl, granova, PSAgraphics, YaleToolkit, ggplot2, ISwR, sos, psych, foreign, nutshell, vcd, car, ellipse, cluster, plyr, rpart, doBy, corrgram, UsingR, RSiteSearch, mi, session, bootstrap, boot. It will help if you can bring your laptop to each class (but Macs will also be available for you, and they are virtually identical for R use).
Here is the R code to install all these packages (and dependencies) from the main R repository (copy, and paste to your R session):
install.packages(c('MASS','Hmisc','BHH2','TeachingDemos','rgl','granova','PSAgraphics', 'YaleToolkit', 'ggplot2','ISwR','sos', 'psych','foreign','nutshell','vcd','car','ellipse','cluster','plyr', 'rpart', 'doBy', 'corrgram', 'UsingR', 'RSiteSearch','mi','session', 'bootstrap', 'boot'), repos="http://cran.r-project.org", dependencies=TRUE)
NOTE4: I've developed some ideas I'd like you to think about (and act on); download here: Suggestions2AidLearning.pdf
Here are several potentially helpful URLs:
Hundreds of graphics here; study at least a few of them and note that the (R) code for constructing each is also included.
6. http://had.co.nz/ggplot/ Main webpage for Hadley Wickham’s ggplot2 package in R; see the book and ppt below (A and B).
7a. http://www.yeroon.net/ Several videos Yeroon has developed to show R applications
8. http://www.r-bloggers.com Large and growing site, with daily editions; archives available (Add others here as you find them useful!)
10. http://www.math.yorku.ca/SCS/StatResource.html#DataVis Excellent (international) source for Visualization and Graphics. Spend some time here; back down to the html as well.
11. http://openmx.psyc.virginia.edu/ This is quite new; an add-on for R to support structural equation modeling [SEM] (there is a forum too, w/ lots of help available).
12. www.blender.org Blender is a free open source 3D content creation suite that works w/ all OS’s, facilitates modeling, shading, animation, rendering, composites; also interactive.
15a. http://www.dataspora.com/blog/ This blog is by Michael Driscoll, a major advocate, developer and user of R. The following two items show what he can do; I think these are quite good.
16. http://goo.gl/1NRiV Alex showed that a short URL gets us to the same place (thanks Alex). World poverty and graphics is the topic: a dynamic plot. More on this later.
Next, we make files in this wiki available for download. Just click. (These, and many others, are on a separate page on this wiki. Let me know if you need access)
A. Wickham.ggplot2.2010basics.pdf An introduction to ggplot2, one of the most impressive recent additions to R for graphics, written by the author of the package and the book (next).
B. ggplot2-Book09hWickham.pdf Professor Wickham made this pdf available at his website not long before his book was published less than a year ago.
C. Ihaka-presentn-graphics.ppt.pdf Ross Ihaka was one of the originators of R, roughly 15 years ago. This file is from his university website.
D. Ihaka.high-dimensional.Graphics.pdf Counterpart of the preceding, but it goes into larger questions or problems
E. Graphics courseUCLA.codeToo.0125.pdf Intermediate level material, assumes some familiarity with R (not too much tho')
F. sarkar.Lattice09.pdf Lattice is a superb R package that has already been used in various ppt/pdfs above. It has become invaluable to many users.
G. R-GraphicsShortCourse08.ppt.pdf Georges M. (U.K.) with especially interesting material. Code included in the next URL.
I. wilkinson_1999.DotPlots.pdf Not everything graphical is R-related. Here is a helpful article on dotplots; other articles follow.
J. feeney_etal_2000.pdf a book chapter, How People Extract Information from Graphs... results from an illustrative experiment.
K. cumming_finch.Conf.Intervals_2005.pdf Not just graphics, but confidence intervals, and about their interpretation
L. friendly.ExtendingMosaicPlots_99.pdf Article by a psychologist who has developed several effective methods for displaying data.
M. Lumley06.225pages.R-fundamentals.pdf Lengthy pdf derived from a ppt presentation; 4 years old, but excellent material; Lumley is a member of the R Core Team.
N. statgraphicsReferences.BD.Oct_2010_2up.pdf An especially useful listing of a large number of references to topics graphical, with special relevance to psychology and social sciences. Most of these pdfs will be available for download, thanks to the efforts of BD. (BD refers to Professor Bruce Dudek, SUNYA Psych Department)
MOST RECENT:
a. ElementalGraphics4ANOVA.RP+JH.pdf This paper was recently sent off for publication (J. of Statistics Education), and will be revised at least once before being published. It would be timely to receive feedback that might improve (or correct?) parts of it, so I solicit your feedback (preferably by early January to fit w/ our revision schedule). Thank you. BP (Related to granova is a YouTube video: http://www.youtube.com/watch?v=HhOjDn32GIc - Alex T.)
b. cleveland&mcgill_1984b.pdf This is one of the first papers I'd like you to read (and use as a starting point for exercises) in this class. The article appeared more than 25 years ago, when constructing the scatterplots the authors illustrate was difficult; with R, virtually everything they show can be done 'rather easily.' I've used quotes here because knowledge of R commands and functions is required to do these things. For those wanting to begin early (and start to do what I shall be spelling out in the syllabus next week), you may find it useful to begin to construct plots that (in form) mirror what Cleveland and McGill show. (Don't try to do them all yet; better to read/study other sources, and edit a. above, if time permits.) Beyond the plot function (R base), several other packages/functions are likely to be helpful: see #1 above (addicted2r site) for examples AND R code that will be most helpful. BP
c. http://mind42.com/pub/mindmap?mid=c4e081c4-46c5-4c99-b3f3-04138a3a67a2 This site provides a 'map' for learning R; it may have some value to those who are still learning basic R functionality. Let me know if it was useful to you or not, and why. BP
d. Re: textbook for this course: book by William Cleveland (1993). Although I do not expect to use this book for more than about 1/3 of this course, no other book goes as far as this one does in providing basic information I want to emphasize in the class. Purchase is not necessary (about $50 from Amazon, but only 3 copies there), but it can be rented, e.g. $15 for 130 days from http://www.collegebookrenter.com/details.cfm/isbn/0963488406 [isbn is last part]. (You may want to share with one another; order asap.) I (Alex) got the book from collegebookrenter.com 3 business days after ordering it. It shipped from Little Rock, AL via USPS Priority Mail (2-3 day estimate). Condition is like new (crispy pages, clean, no dust, etc.), except someone made highlights on a few pages; otherwise great. - Alex. T
RE: Textbook rental - I tried to rent the book from College Book Renter - they processed my order & then I just received an email saying they cancelled my order because "they were unable to fill it" - I ordered it on Amazon today for $50 - Catherine
Book summary: Visualizing Data is about visualization tools that provide deep insight into the structure of data. There are graphical tools such as coplots, multiway dot plots, and the equal count algorithm. There are fitting tools such as loess ... that fit equations, nonparametric curves, and nonparametric surfaces to data. But the book is much more than just a compendium of useful tools. It conveys a strategy for data analysis that stresses the use of visualization to thoroughly study the structure of data and to check the validity of statistical models fitted to data. The result of the tools and the strategy is a vast increase in what you can learn from your data. The book demonstrates this by reanalyzing many data sets from the scientific literature, revealing missed effects and inappropriate models fitted to data.
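Two of the tools named in the summary, loess and coplots, are available in base R. The sketch below uses the ethanol data shipped with lattice (NOx emissions, equivalence ratio E and compression ratio C; to my knowledge this is one of the data sets from Cleveland's book). The span of 0.5 is an arbitrary choice, and because C takes only a few distinct values, it is conditioned on as a factor in coplot().

```r
library(lattice)
data(ethanol, package = "lattice")

# loess: locally weighted quadratic regression of NOx on E
fit_lo <- loess(NOx ~ E, data = ethanol, span = 0.5, degree = 2)
grid <- data.frame(E = seq(min(ethanol$E), max(ethanol$E), length.out = 100))
plot(NOx ~ E, data = ethanol, main = "loess fit, span = 0.5")
lines(grid$E, predict(fit_lo, newdata = grid), lwd = 2)

# coplot: NOx vs. E, one panel per compression ratio, with a smooth in each panel
eth <- transform(ethanol, Cf = factor(C))
coplot(NOx ~ E | Cf, data = eth, panel = panel.smooth)
```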
Here is a link to William Cleveland's personal website at Bell Labs: http://stat.bell-labs.com/wsc/ From there you can download the datasets and S code (often perfectly compatible with R) to generate all the graphics in Visualizing Data.
Some additional resources for learning R:
-
Quick-R http://www.statmethods.net/ This is the site I go to first when I have a question about how to do something with R. I would also very much recommend his book, which is currently 40% off.
UPDATE2 (bp): while I really like this book (R in Action), I found that it cost $44 to get; it is worth that, but what I'm trying to do is make the book available (but not for download) online. It is probably even better (for you students) than R_in_a_Nutshell, so wait a bit for closure on this unless you have the $$ to spare. It IS a fine book on both R and statistics.
-
R-bloggers http://www.r-bloggers.com/ This is actually a collection of many other blogs. There are generally a half dozen or so posts a day covering all aspects of research and data analysis. This is how I generally discover new information about R.
-
The R Journal http://journal.r-project.org/ Worth browsing the archive. Again, some really good articles here.
-
Nabble http://r.789695.n4.nabble.com/ A forum for asking and getting answers to R questions. Searching this site often reveals answers to your questions.
-
Web interfaces for lme4 (package for multilevel modeling) and ggplot2 (graphics). http://yeroon.net/ There are also some videos there.
Working With R: Text Editors and IDEs
R on both Windows and Mac includes a reasonable text editor. However, if you are going to be working with R a lot, I would suggest trying out some other programs to edit your R script files. Many of these are considered IDEs (Integrated Development Environment) which just means they are designed to provide all the features useful for program development. Most importantly, these provide nice syntax highlighting and lots of additional features that help with manipulating text files (e.g. tabbed browsing, superior text searching and replacing, etc.)
-
RStudio (http://www.rstudio.org/), a new IDE designed specifically for R, has been released. It is easy to install and has some really nice features.
-
For Windows: Notepad++ along with NppToR is fantastic. (A screenshot of R language in Notepad++. - Alex T.)
-
If you use Textmate (not free) for Mac, then you can connect it with R
-
For Windows and Mac, I have been using Komodo Edit (though I recently switched to Eclipse). You will also need to install the SciViews-K extension. Some people have reported issues getting this to work, though I have had none personally. My suggestion: try to install it, and if it doesn't work, delete it and move on.
-
Eclipse is perhaps the most popular IDE for Java and other programming languages. The StatEt plugin embeds R directly in Eclipse. I used it with R when I was doing more Java programming. Eclipse is perhaps one of the finest programs out there, but it can be quite complicated; it is the Swiss army knife of IDEs, and I now prefer the simpler Komodo Edit.
Data Sources
These are links to some freely available data sources.
Sweave & LaTeX
I uploaded a PDF of a presentation I gave about R, LaTeX, and Sweave. I also have some resources on a page here: http://bryer.org/2009/latex-and-sweave. Lastly, I uploaded a PDF showing how to create a ggplot version of Bob's loess.psa graphic. The source Rnw file is also available.
The catalog description of this course follows:
EPSY 887 Modern graphics for social science research
Graphical and related methods for applications: modern fixed and dynamic graphics and related methods; tools for analysis and interpretations of data. Major goals will be to tailor the course to needs of students for analysis and presentation of data, to help ensure operational skills as well as sound understanding of modern methods for both analysis and graphics to facilitate communication. Software applications, especially R, will be emphasized, taking advantage of new vehicles (youtube, web videos, cloud computing, etc.) to facilitate analysis and display. Prerequisites: EPSY 630 or equivalent. (3 credits) Thursdays 4.15 – 7.05
Comments (1)
Jason Bryer said
at 9:09 am on Apr 16, 2011
I have uploaded a PDF of the multilevel PSA poster on my website here: http://bryer.org/2011/comparing-public-and-private-schools