{"id":194,"date":"2023-04-15T00:29:52","date_gmt":"2023-04-15T00:29:52","guid":{"rendered":"https:\/\/blog.amalgamcs.com\/?p=194"},"modified":"2023-04-15T00:29:52","modified_gmt":"2023-04-15T00:29:52","slug":"exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python","status":"publish","type":"post","link":"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/","title":{"rendered":"Exploring Dimensionality Reduction and Data Visualization with PCA in Python"},"content":{"rendered":"\n<p>PCA is a technique for transforming a set of correlated variables into a set of linearly uncorrelated variables, known as principal components. The first principal component is the direction in the data with the highest variance, and each subsequent principal component is the direction of highest remaining variance that is orthogonal to all previous principal components.<\/p>\n\n\n\n<p>PCA can be used for a variety of purposes, such as data compression, noise reduction, and visualization. In this blog post, we will focus on how PCA can be used for data visualization and dimensionality reduction.<\/p>\n\n\n\n<h2>PCA with Python<\/h2>\n\n\n\n<p>Let&#8217;s start by importing the necessary libraries and loading the dataset that we will use for our examples.<\/p>\n\n\n\n<pre class=\"wp-block-code has-green-color has-black-background-color has-text-color has-background\"><code>import numpy as np\r\nimport matplotlib.pyplot as plt\r\nfrom sklearn.datasets import load_digits\r\nfrom sklearn.decomposition import PCA\r\n\r\ndigits = load_digits()\r\nX = digits.data\r\ny = digits.target\r<\/code><\/pre>\n\n\n\n<p>The <code>load_digits()<\/code> function from <code>sklearn.datasets<\/code> loads the handwritten digits dataset. This dataset contains images of handwritten digits, with each image represented as an 8&#215;8 matrix of pixel values. 
The <code>data<\/code> attribute of the dataset contains a flattened version of these matrices, with each row representing one image as 64 pixel values. The <code>target<\/code> attribute contains the corresponding digit labels.<\/p>\n\n\n\n<p>Now that we have loaded the dataset, we can perform PCA on it. We will start by scaling the data using <code>StandardScaler<\/code> from <code>sklearn.preprocessing<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code has-green-color has-black-background-color has-text-color has-background\"><code>from sklearn.preprocessing import StandardScaler\r\n\r\nscaler = StandardScaler()\r\nX_scaled = scaler.fit_transform(X)\r<\/code><\/pre>\n\n\n\n<p>We scale the data because PCA is sensitive to the scale of the variables: without scaling, features with larger variance would dominate the principal components.<\/p>\n\n\n\n<p>Next, we will perform PCA using <code>PCA<\/code> from <code>sklearn.decomposition<\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code has-green-color has-black-background-color has-text-color has-background\"><code>pca = PCA(n_components=2)\r\nX_pca = pca.fit_transform(X_scaled)\r<\/code><\/pre>\n\n\n\n<p>We specify <code>n_components=2<\/code> to reduce the dimensionality of the data to 2 principal components, which will allow us to visualize the data. Calling <code>fit_transform()<\/code> fits the PCA model to the scaled data and returns the data projected onto those two components.<\/p>\n\n\n\n<p>Now that we have transformed the data, we can plot it using Matplotlib.<\/p>\n\n\n\n<pre class=\"wp-block-code has-green-color has-black-background-color has-text-color has-background\"><code>plt.scatter(X_pca&#91;:, 0], X_pca&#91;:, 1], c=y, cmap='viridis')\r\nplt.xlabel('First principal component')\r\nplt.ylabel('Second principal component')\r\nplt.colorbar()\r\nplt.show()\r<\/code><\/pre>\n\n\n\n<p>This code produces a scatter plot of the data, with the x-axis representing the first principal component and the y-axis representing the second principal component. The color of each point represents the corresponding digit label. 
The <code>colorbar()<\/code> function adds a color legend to the plot.<\/p>\n\n\n\n<div style=\"height:52px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>We have demonstrated how PCA can be used for data visualization and dimensionality reduction, using the handwritten digits dataset as an example. PCA is a valuable tool for any data scientist or machine learning practitioner, and we encourage you to explore its many applications.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/amalgamcs.com\/\"><img decoding=\"async\" loading=\"lazy\" width=\"1024\" height=\"378\" src=\"http:\/\/blog.amalgamcs.com\/wp-content\/uploads\/2023\/03\/Original-Logo-1024x378.png\" alt=\"AmalgamCS Logo\" class=\"wp-image-76\" srcset=\"https:\/\/blog.amalgamcs.com\/wp-content\/uploads\/2023\/03\/Original-Logo-1024x378.png 1024w, https:\/\/blog.amalgamcs.com\/wp-content\/uploads\/2023\/03\/Original-Logo-300x111.png 300w, https:\/\/blog.amalgamcs.com\/wp-content\/uploads\/2023\/03\/Original-Logo-768x284.png 768w, https:\/\/blog.amalgamcs.com\/wp-content\/uploads\/2023\/03\/Original-Logo-1536x567.png 1536w, https:\/\/blog.amalgamcs.com\/wp-content\/uploads\/2023\/03\/Original-Logo-2048x756.png 2048w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption class=\"wp-element-caption\"><a href=\"https:\/\/amalgamcs.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/amalgamcs.com\/<\/a><\/figcaption><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>PCA is a technique for transforming a set of correlated variables into a set of linearly uncorrelated variables, known as principal components. 
The first principal component is the direction in the data with the highest variance, and each subsequent principal component has the highest variance orthogonal to the previous principal [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[3],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Exploring Dimensionality Reduction and Data Visualization with PCA in Python - AmalgamCS Tech Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Exploring Dimensionality Reduction and Data Visualization with PCA in Python - AmalgamCS Tech Blog\" \/>\n<meta property=\"og:description\" content=\"PCA is a technique for transforming a set of correlated variables into a set of linearly uncorrelated variables, known as principal components. 
The first principal component is the direction in the data with the highest variance, and each subsequent principal component has the highest variance orthogonal to the previous principal [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/\" \/>\n<meta property=\"og:site_name\" content=\"AmalgamCS Tech Blog\" \/>\n<meta property=\"article:published_time\" content=\"2023-04-15T00:29:52+00:00\" \/>\n<meta name=\"author\" content=\"Garrik Hoyt\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Garrik Hoyt\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/\"},\"author\":{\"name\":\"Garrik Hoyt\",\"@id\":\"https:\/\/blog.amalgamcs.com\/#\/schema\/person\/97a98f183f3f756243c26dbed73f8922\"},\"headline\":\"Exploring Dimensionality Reduction and Data Visualization with PCA in Python\",\"datePublished\":\"2023-04-15T00:29:52+00:00\",\"dateModified\":\"2023-04-15T00:29:52+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/\"},\"wordCount\":355,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/blog.amalgamcs.com\/#organization\"},\"articleSection\":[\"A.I.\/M.L.\/Data 
Science\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/\",\"url\":\"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/\",\"name\":\"Exploring Dimensionality Reduction and Data Visualization with PCA in Python - AmalgamCS Tech Blog\",\"isPartOf\":{\"@id\":\"https:\/\/blog.amalgamcs.com\/#website\"},\"datePublished\":\"2023-04-15T00:29:52+00:00\",\"dateModified\":\"2023-04-15T00:29:52+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/blog.amalgamcs.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Exploring Dimensionality Reduction and Data Visualization with PCA in Python\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blog.amalgamcs.com\/#website\",\"url\":\"https:\/\/blog.amalgamcs.com\/\",\"name\":\"AmalgamCS Tech Blog\",\"description\":\"Curated information on the latest in 
tech\",\"publisher\":{\"@id\":\"https:\/\/blog.amalgamcs.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blog.amalgamcs.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/blog.amalgamcs.com\/#organization\",\"name\":\"AmalgamCS\",\"url\":\"https:\/\/blog.amalgamcs.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.amalgamcs.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/blog.amalgamcs.com\/wp-content\/uploads\/2023\/03\/cropped-cropped-Transparent-Logo.png\",\"contentUrl\":\"https:\/\/blog.amalgamcs.com\/wp-content\/uploads\/2023\/03\/cropped-cropped-Transparent-Logo.png\",\"width\":2493,\"height\":485,\"caption\":\"AmalgamCS\"},\"image\":{\"@id\":\"https:\/\/blog.amalgamcs.com\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/blog.amalgamcs.com\/#\/schema\/person\/97a98f183f3f756243c26dbed73f8922\",\"name\":\"Garrik Hoyt\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.amalgamcs.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/91f854d9f252604310ae9cef7d5ab86d?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/91f854d9f252604310ae9cef7d5ab86d?s=96&d=mm&r=g\",\"caption\":\"Garrik Hoyt\"},\"sameAs\":[\"http:\/\/blog.amalgamcs.com\"],\"url\":\"https:\/\/blog.amalgamcs.com\/index.php\/author\/amalgamdvlpmnt\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Exploring Dimensionality Reduction and Data Visualization with PCA in Python - AmalgamCS Tech Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/","og_locale":"en_US","og_type":"article","og_title":"Exploring Dimensionality Reduction and Data Visualization with PCA in Python - AmalgamCS Tech Blog","og_description":"PCA is a technique for transforming a set of correlated variables into a set of linearly uncorrelated variables, known as principal components. The first principal component is the direction in the data with the highest variance, and each subsequent principal component has the highest variance orthogonal to the previous principal [&hellip;]","og_url":"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/","og_site_name":"AmalgamCS Tech Blog","article_published_time":"2023-04-15T00:29:52+00:00","author":"Garrik Hoyt","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Garrik Hoyt","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/#article","isPartOf":{"@id":"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/"},"author":{"name":"Garrik Hoyt","@id":"https:\/\/blog.amalgamcs.com\/#\/schema\/person\/97a98f183f3f756243c26dbed73f8922"},"headline":"Exploring Dimensionality Reduction and Data Visualization with PCA in Python","datePublished":"2023-04-15T00:29:52+00:00","dateModified":"2023-04-15T00:29:52+00:00","mainEntityOfPage":{"@id":"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/"},"wordCount":355,"commentCount":0,"publisher":{"@id":"https:\/\/blog.amalgamcs.com\/#organization"},"articleSection":["A.I.\/M.L.\/Data Science"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/","url":"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/","name":"Exploring Dimensionality Reduction and Data Visualization with PCA in Python - AmalgamCS Tech 
Blog","isPartOf":{"@id":"https:\/\/blog.amalgamcs.com\/#website"},"datePublished":"2023-04-15T00:29:52+00:00","dateModified":"2023-04-15T00:29:52+00:00","breadcrumb":{"@id":"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/blog.amalgamcs.com\/index.php\/2023\/04\/15\/exploring-dimensionality-reduction-and-data-visualization-with-pca-in-python\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/blog.amalgamcs.com\/"},{"@type":"ListItem","position":2,"name":"Exploring Dimensionality Reduction and Data Visualization with PCA in Python"}]},{"@type":"WebSite","@id":"https:\/\/blog.amalgamcs.com\/#website","url":"https:\/\/blog.amalgamcs.com\/","name":"AmalgamCS Tech Blog","description":"Curated information on the latest in tech","publisher":{"@id":"https:\/\/blog.amalgamcs.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.amalgamcs.com\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/blog.amalgamcs.com\/#organization","name":"AmalgamCS","url":"https:\/\/blog.amalgamcs.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.amalgamcs.com\/#\/schema\/logo\/image\/","url":"https:\/\/blog.amalgamcs.com\/wp-content\/uploads\/2023\/03\/cropped-cropped-Transparent-Logo.png","contentUrl":"https:\/\/blog.amalgamcs.com\/wp-content\/uploads\/2023\/03\/cropped-cropped-Transparent-Logo.png","width":2493,"height":485,"caption":"AmalgamCS"},"image":{"@id":"https:\/\/blog.amalgamcs.com\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/blog.amalgamcs.com\/#\/schema\/person\/97a98f183f3f756243c26dbed73f8922","name":"Garrik Hoyt","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.amalgamcs.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/91f854d9f252604310ae9cef7d5ab86d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/91f854d9f252604310ae9cef7d5ab86d?s=96&d=mm&r=g","caption":"Garrik 
Hoyt"},"sameAs":["http:\/\/blog.amalgamcs.com"],"url":"https:\/\/blog.amalgamcs.com\/index.php\/author\/amalgamdvlpmnt\/"}]}},"_links":{"self":[{"href":"https:\/\/blog.amalgamcs.com\/index.php\/wp-json\/wp\/v2\/posts\/194"}],"collection":[{"href":"https:\/\/blog.amalgamcs.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.amalgamcs.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.amalgamcs.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.amalgamcs.com\/index.php\/wp-json\/wp\/v2\/comments?post=194"}],"version-history":[{"count":1,"href":"https:\/\/blog.amalgamcs.com\/index.php\/wp-json\/wp\/v2\/posts\/194\/revisions"}],"predecessor-version":[{"id":195,"href":"https:\/\/blog.amalgamcs.com\/index.php\/wp-json\/wp\/v2\/posts\/194\/revisions\/195"}],"wp:attachment":[{"href":"https:\/\/blog.amalgamcs.com\/index.php\/wp-json\/wp\/v2\/media?parent=194"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.amalgamcs.com\/index.php\/wp-json\/wp\/v2\/categories?post=194"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.amalgamcs.com\/index.php\/wp-json\/wp\/v2\/tags?post=194"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}