From: presnell@stat.ufl.edu
To: r-help@stat.math.ethz.ch
Subject: [R] Contingency tables as data frames
Date: Wed, 1 Mar 2000 10:25:00 +0100 (MET)
Reply-To: Martin Maechler

{again a message that was sent to owner-r-help (which is me, currently); why on earth ???!??!? Reply to R-help or to the original sender, Brett Presnell.}

I'm teaching a categorical data analysis course this term, and a minor "problem" has resurfaced that I have often thought about before. This applies equally to Splus I suppose, but my undergrads aren't using Splus. It seems natural to read/represent a contingency table as a data frame, with one column representing the cell counts (as in the example appended below; data taken from Agresti, "An Introduction to Categorical Data Analysis").
However, functions like ftable, mantelhaen.test, chisq.test, fisher.test, etc. don't work naturally with this representation, and instead require the user to first manipulate the data, say by using tapply to convert the data into an array. This is not difficult, of course, but it's one of those things that I'd rather not have to explain to students, who usually need to be focusing on other things.

So, am I missing something obvious (not unlikely), or would it be a good idea to extend the methods/arguments of these functions to analyze/manipulate data represented in this way without any preprocessing by the user? It seems that a "count" (or "weight" or "freq" or whatever) argument would do it in most cases.

Funny, I can't help but wonder if the answer from those who have thought about this more deeply than I have might be "it's a can of worms".

-- 
Brett Presnell
Department of Statistics
University of Florida
(presnell@stat.ufl.edu)

City       Smoker  Cancer  Count
Beijing    Yes     Yes       126
Beijing    Yes     No        100
Beijing    No      Yes        35
Beijing    No      No         61
Shanghai   Yes     Yes       908
Shanghai   Yes     No        688
Shanghai   No      Yes       497
Shanghai   No      No        807
Shenyang   Yes     Yes       913
Shenyang   Yes     No        747
Shenyang   No      Yes       336
Shenyang   No      No        598
Nanjing    Yes     Yes       235
Nanjing    Yes     No        172
Nanjing    No      Yes        58
Nanjing    No      No        121
Harbin     Yes     Yes       402
Harbin     Yes     No        308
Harbin     No      Yes       121
Harbin     No      No        215
Zhengzhou  Yes     Yes       182
Zhengzhou  Yes     No        156
Zhengzhou  No      Yes        72
Zhengzhou  No      No         98
Taiyuan    Yes     Yes        60
Taiyuan    Yes     No         99
Taiyuan    No      Yes        11
Taiyuan    No      No         43
Nanchang   Yes     Yes       104
Nanchang   Yes     No         89
Nanchang   No      Yes        21
Nanchang   No      No         36
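
For illustration only, here is a minimal sketch of the preprocessing step the message describes, using the table from the post. It assumes a current version of R in which xtabs() is available (that may not have been the case when the message was written), and the object names `smoking`, `tab`, `tab2` and `mh` are hypothetical, not taken from the original post. It shows two ways to turn the counts column into a 3-way array that ftable and mantelhaen.test accept: xtabs with the counts on the left-hand side of the formula, and the tapply route mentioned in the message.

```r
# Data from the message, read into a data frame.
# `smoking` is a hypothetical name chosen for this sketch.
smoking <- read.table(header = TRUE, stringsAsFactors = TRUE, text = "
City      Smoker Cancer Count
Beijing   Yes    Yes    126
Beijing   Yes    No     100
Beijing   No     Yes     35
Beijing   No     No      61
Shanghai  Yes    Yes    908
Shanghai  Yes    No     688
Shanghai  No     Yes    497
Shanghai  No     No     807
Shenyang  Yes    Yes    913
Shenyang  Yes    No     747
Shenyang  No     Yes    336
Shenyang  No     No     598
Nanjing   Yes    Yes    235
Nanjing   Yes    No     172
Nanjing   No     Yes     58
Nanjing   No     No     121
Harbin    Yes    Yes    402
Harbin    Yes    No     308
Harbin    No     Yes    121
Harbin    No     No     215
Zhengzhou Yes    Yes    182
Zhengzhou Yes    No     156
Zhengzhou No     Yes     72
Zhengzhou No     No      98
Taiyuan   Yes    Yes     60
Taiyuan   Yes    No      99
Taiyuan   No     Yes     11
Taiyuan   No     No      43
Nanchang  Yes    Yes    104
Nanchang  Yes    No      89
Nanchang  No     Yes     21
Nanchang  No     No      36
")

# Route 1: xtabs, with the counts column on the left-hand side of the formula.
# The result is a 2 x 2 x 8 table: one Smoker-by-Cancer table per city.
tab <- xtabs(Count ~ Smoker + Cancer + City, data = smoking)

# Route 2: the tapply conversion mentioned in the message.
# Each cell occurs once in the data, so sum() just returns its count.
tab2 <- with(smoking, tapply(Count, list(Smoker, Cancer, City), sum))

# Functions that expect an array or table can now be applied directly.
print(ftable(tab))
mh <- mantelhaen.test(tab)
print(mh)
```

Putting City last in the formula makes it the stratifying (third) dimension, which is the layout mantelhaen.test expects for a set of 2 x 2 tables.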