{"id":412,"date":"2020-03-03T13:04:00","date_gmt":"2020-03-03T12:04:00","guid":{"rendered":"http:\/\/www.labo.mathieurella.fr\/?p=412"},"modified":"2020-05-13T14:44:39","modified_gmt":"2020-05-13T12:44:39","slug":"what-is-a-tidy-dataset","status":"publish","type":"post","link":"https:\/\/www.labo.mathieurella.fr\/?p=412","title":{"rendered":"What is a Tidy Dataset"},"content":{"rendered":"\n<p>A\u00a0<a rel=\"noreferrer noopener\" target=\"_blank\" href=\"https:\/\/cran.r-project.org\/web\/packages\/tidyr\/vignettes\/tidy-data.html\">tidy dataset<\/a>\u00a0is a tabular dataset where:<\/p>\n\n\n\n<ul><li>each variable is a column<\/li><li>each observation is a row<\/li><li>each type of observational unit is a table<\/li><\/ul>\n\n\n\n<p>The first three images below show a tidy healthcare dataset with two tables: a patients table (patient ID, name, and age) and a treatments table (patient ID, the drug the patient is taking, and the dose of that drug).<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/video.udacity-data.com\/topher\/2018\/January\/5a6278e8_tidy-data-one\/tidy-data-one.png\" alt=\"\" width=\"473\" height=\"160\"\/><noscript><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/video.udacity-data.com\/topher\/2018\/January\/5a6278e8_tidy-data-one\/tidy-data-one.png\" alt=\"\" width=\"473\" height=\"160\"\/><\/noscript><figcaption>Each variable in a tidy dataset must have its own column<\/figcaption><\/figure><\/div>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" 
data-src=\"https:\/\/video.udacity-data.com\/topher\/2018\/January\/5a6278ea_tidy-data-two\/tidy-data-two.png\" alt=\"\" width=\"486\" height=\"164\"\/><noscript><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/video.udacity-data.com\/topher\/2018\/January\/5a6278ea_tidy-data-two\/tidy-data-two.png\" alt=\"\" width=\"486\" height=\"164\"\/><\/noscript><figcaption>Each observation in a tidy dataset must have its own row<\/figcaption><\/figure><\/div>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/video.udacity-data.com\/topher\/2018\/January\/5a6278ec_tidy-data-three\/tidy-data-three.png\" alt=\"\" width=\"492\" height=\"166\"\/><noscript><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/video.udacity-data.com\/topher\/2018\/January\/5a6278ec_tidy-data-three\/tidy-data-three.png\" alt=\"\" width=\"492\" height=\"166\"\/><\/noscript><figcaption>Each type of observational unit in a tidy dataset must have its own table<\/figcaption><\/figure><\/div>\n\n\n\n<p>The next image shows the same data in one non-tidy format (other non-tidy representations are possible). The\u00a0<em>Drug A<\/em>,\u00a0<em>Drug B<\/em>, and\u00a0<em>Drug C<\/em>\u00a0column headers are values of a single variable, so they should be combined into one &#8216;Drug&#8217; column. 
The table should also be split into two tables, a patients table and a treatments table, because it mixes two types of observational unit.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/video.udacity-data.com\/topher\/2018\/January\/5a6278e7_tidy-data-four\/tidy-data-four.png\" alt=\"\"\/><noscript><img decoding=\"async\" src=\"https:\/\/video.udacity-data.com\/topher\/2018\/January\/5a6278e7_tidy-data-four\/tidy-data-four.png\" alt=\"\"\/><\/noscript><figcaption>Only the second rule of tidy data (each observation forms a row) is satisfied in this non-tidy representation of the data above<\/figcaption><\/figure>\n\n\n\n<p>In practice, you may need to tidy your data before exploring it. You should be comfortable reshaping your data and performing transformations that split or combine features into new columns. This work belongs in the wrangling stage of the data analysis process.<\/p>\n\n\n\n<p>That said, tidy data is not the\u00a0<em>only<\/em>\u00a0useful form that data can take. 
In fact, as you work with a dataset, you might need to summarize it in a non-tidy form in order to generate appropriate visualizations.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A\u00a0tidy dataset\u00a0is a tabular dataset where: each variable is a column each observation is a row each type of observational &#8230;<\/p>\n","protected":false},"author":1,"featured_media":414,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16],"tags":[],"_links":{"self":[{"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=\/wp\/v2\/posts\/412"}],"collection":[{"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=412"}],"version-history":[{"count":2,"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=\/wp\/v2\/posts\/412\/revisions"}],"predecessor-version":[{"id":415,"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=\/wp\/v2\/posts\/412\/revisions\/415"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=\/wp\/v2\/media\/414"}],"wp:attachment":[{"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=412"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=412"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=412"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}