{"id":455,"date":"2020-05-11T11:38:00","date_gmt":"2020-05-11T09:38:00","guid":{"rendered":"http:\/\/www.labo.mathieurella.fr\/?p=455"},"modified":"2020-05-17T12:01:08","modified_gmt":"2020-05-17T10:01:08","slug":"introduction-to-scatter-plot","status":"publish","type":"post","link":"https:\/\/www.labo.mathieurella.fr\/?p=455","title":{"rendered":"Introduction to Scatter Plot"},"content":{"rendered":"\n<p>If we want to inspect the relationship between two numeric variables, the standard choice of plot is the&nbsp;<strong>scatterplot<\/strong>. In a scatterplot, each data point is plotted individually as a point, its x-position corresponding to one feature value and its y-position corresponding to the second. One basic way of creating a scatterplot is through Matplotlib&#8217;s&nbsp;<a target=\"_blank\" href=\"https:\/\/matplotlib.org\/api\/_as_gen\/matplotlib.pyplot.scatter.html\" rel=\"noreferrer noopener\"><code>scatter<\/code><\/a>&nbsp;function:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>plt.scatter(data = df, x = 'num_var1', y = 'num_var2')<\/code><\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.54.51-AM-1024x518.png\" alt=\"\" class=\"wp-image-456\" width=\"427\" height=\"216\"\/><noscript><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.54.51-AM-1024x518.png\" alt=\"\" class=\"wp-image-456\" width=\"427\" height=\"216\" srcset=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.54.51-AM-1024x518.png 1024w, 
https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.54.51-AM-300x152.png 300w, https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.54.51-AM-768x388.png 768w, https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.54.51-AM.png 1072w\" sizes=\"(max-width: 427px) 100vw, 427px\" \/><\/noscript><\/figure><\/div>\n\n\n\n<p>We can see a generally positive relationship between the two variables, as higher values of the x-axis variable are associated with greatly increasing values of the variable plotted on the y-axis.<\/p>\n\n\n\n<p>One point is usually plotted for every observation we have in our data, resulting in a cloud of points.<\/p>\n\n\n\n<p>The pattern in the cloud of points can clearly show what kind of relationship exists between our two variables, and its strength. 
We are often interested in quantifying the strength of the relationship between two variables through a correlation coefficient.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Pearson Correlation<\/h2>\n\n\n\n<p>The most commonly used is the Pearson correlation coefficient (denoted r). Here is a quick overview of this coefficient:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/pearson-1.png\" alt=\"\" class=\"wp-image-460\"\/><noscript><img loading=\"lazy\" decoding=\"async\" width=\"307\" height=\"149\" src=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/pearson-1.png\" alt=\"\" class=\"wp-image-460\" srcset=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/pearson-1.png 307w, https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/pearson-1-300x146.png 300w\" sizes=\"(max-width: 307px) 100vw, 307px\" \/><\/noscript><\/figure><\/div>\n\n\n\n<p>The statistic takes a value between -1 and 1.<\/p>\n\n\n\n<p>Positive values indicate a relationship in which an increase in one variable is associated with an increase in the second variable.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.31.27-AM-1024x757.png\" alt=\"\" class=\"wp-image-461\" width=\"345\" height=\"254\"\/><noscript><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.31.27-AM-1024x757.png\" alt=\"\" class=\"wp-image-461\" 
width=\"345\" height=\"254\" srcset=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.31.27-AM-1024x757.png 1024w, https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.31.27-AM-300x222.png 300w, https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.31.27-AM-768x568.png 768w, https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.31.27-AM.png 1418w\" sizes=\"(max-width: 345px) 100vw, 345px\" \/><\/noscript><figcaption>Positive Relationship : <strong>r \u2248 1<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<p>Negative values of r indicate that when one variable increases, the second variable tends to decrease.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.32.55-AM-1024x587.png\" alt=\"\" class=\"wp-image-462\" width=\"398\" height=\"228\"\/><noscript><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.32.55-AM-1024x587.png\" alt=\"\" class=\"wp-image-462\" width=\"398\" height=\"228\" srcset=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.32.55-AM-1024x587.png 1024w, https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.32.55-AM-300x172.png 300w, 
https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.32.55-AM-768x441.png 768w, https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.32.55-AM-1536x881.png 1536w, https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.32.55-AM.png 1548w\" sizes=\"(max-width: 398px) 100vw, 398px\" \/><\/noscript><figcaption>Negative Relationship : <strong>r \u2248 -1<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<p>Values close to the extremes of -1 or 1 indicate a stronger, more predictable relationship, while values close to zero indicate a weaker relationship.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.35.26-AM.png\" alt=\"\" class=\"wp-image-463\" width=\"278\" height=\"188\"\/><noscript><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.35.26-AM.png\" alt=\"\" class=\"wp-image-463\" width=\"278\" height=\"188\" srcset=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.35.26-AM.png 668w, https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-11.35.26-AM-300x203.png 300w\" sizes=\"(max-width: 278px) 100vw, 278px\" \/><\/noscript><figcaption>No correlation : <strong>r \u2248 0<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<p>Finally, the Pearson correlation only captures linear relationships.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" 
id=\"alternative-approach\">Alternative Approach<\/h2>\n\n\n\n<p>Seaborn&#8217;s&nbsp;<a target=\"_blank\" href=\"https:\/\/seaborn.pydata.org\/generated\/seaborn.regplot.html\" rel=\"noreferrer noopener\"><code>regplot<\/code><\/a>&nbsp;function combines scatterplot creation with regression function fitting:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sb.regplot(data = df, x = 'num_var1', y = 'num_var2')<\/code><\/pre>\n\n\n\n<p>The basic function parameters, &#8220;data&#8221;, &#8220;x&#8221;, and &#8220;y&#8221; are the same for&nbsp;<code>regplot<\/code>&nbsp;as they are for matplotlib&#8217;s&nbsp;<code>scatter<\/code>.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.54.57-AM-1024x560.png\" alt=\"\" class=\"wp-image-457\" width=\"419\" height=\"229\"\/><noscript><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.54.57-AM-1024x560.png\" alt=\"\" class=\"wp-image-457\" width=\"419\" height=\"229\" srcset=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.54.57-AM-1024x560.png 1024w, https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.54.57-AM-300x164.png 300w, 
https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.54.57-AM-768x420.png 768w, https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.54.57-AM.png 1064w\" sizes=\"(max-width: 419px) 100vw, 419px\" \/><\/noscript><\/figure><\/div>\n\n\n\n<p>By default, the regression function is linear, and includes a shaded confidence region for the regression estimate. In this case, since the trend looks like a\u00a0log(<em>y<\/em>)\u221d<em>x<\/em>\u00a0relationship (that is, linear increases in the value of x are associated with linear increases in the log of y), plotting the regression line on the raw units is not appropriate. If we don&#8217;t care about the regression line, then we could set\u00a0<code>fit_reg = False<\/code>\u00a0in the\u00a0<code>regplot<\/code>\u00a0function call. Otherwise, if we want to plot the regression line on the observed relationship in the data, we need to transform the data.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sb\n\ndef log_trans(x, inverse = False):\n    if not inverse:\n        return np.log10(x)\n    else:\n        return np.power(10, x)\n\n# x and y are passed by keyword; recent seaborn releases reject them as positional arguments\nsb.regplot(x = df&#91;'num_var1'], y = df&#91;'num_var2'].apply(log_trans))\ntick_locs = &#91;10, 20, 50, 100, 200, 500]\nplt.yticks(log_trans(tick_locs), tick_locs)<\/code><\/pre>\n\n\n\n<p>In this example, the x- and y- values sent to&nbsp;<code>regplot<\/code>&nbsp;are set directly as Series, extracted from the dataframe.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"data:image\/gif;base64,R0lGODlhAQABAIAAAAAAAP\/\/\/yH5BAEAAAAALAAAAAABAAEAAAIBRAA7\" data-src=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.55.04-AM-1024x542.png\" alt=\"\" class=\"wp-image-458\" width=\"415\" height=\"219\"\/><noscript><img loading=\"lazy\" 
decoding=\"async\" src=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.55.04-AM-1024x542.png\" alt=\"\" class=\"wp-image-458\" width=\"415\" height=\"219\" srcset=\"https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.55.04-AM-1024x542.png 1024w, https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.55.04-AM-300x159.png 300w, https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.55.04-AM-768x406.png 768w, https:\/\/www.labo.mathieurella.fr\/wp-content\/uploads\/2020\/05\/Capture-d\u2019e\u0301cran-2020-05-17-a\u0300-10.55.04-AM.png 1096w\" sizes=\"(max-width: 415px) 100vw, 415px\" \/><\/noscript><\/figure><\/div>\n","protected":false},"excerpt":{"rendered":"<p>If we want to inspect the relationship between two numeric variables, the standard choice of plot is the&nbsp;scatterplot. 
In a &#8230;<\/p>\n","protected":false},"author":1,"featured_media":468,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[17],"tags":[],"_links":{"self":[{"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=\/wp\/v2\/posts\/455"}],"collection":[{"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=455"}],"version-history":[{"count":2,"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=\/wp\/v2\/posts\/455\/revisions"}],"predecessor-version":[{"id":469,"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=\/wp\/v2\/posts\/455\/revisions\/469"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=\/wp\/v2\/media\/468"}],"wp:attachment":[{"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=455"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=455"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.labo.mathieurella.fr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=455"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}