12/21/2023

Parallel plot ggplot2

Parallel Coordinate Plots are useful to visualize multivariate data. R provides several packages and functions to draw Parallel Coordinate Plots (PCPs), among them ggparcoord from GGally, ggparallel, and geom_path from ggplot2. In this post I will compare these approaches using a randomly generated data set with three discrete variables.

We need some multivariate, categorical data for our PCPs. As an example from practice, assume that we ran a survey with several questions. Each question is asked three times, each time in a different context, and can be answered on a discrete scale from 1 to 7. The distribution of answers across the three dimensions should be displayed per question. This is ideal for a PCP, because the three dimensions have the same unit and scale and can therefore be compared directly on parallel coordinates (you can also use different units and scales on parallel coordinates, but the interpretation can become quite tricky then). It would also be easy to display more than three dimensions.

Let’s generate a data set for one question with the three dimensions (q1_d1 to q1_d3):

    library(triangle)
    # set an "id" string that denotes the value combination

The count per group is automatically stored in a column “n”. We additionally set an “id” column which denotes the unique answer combination. We also sorted by count and then selected only the rows with the highest counts, that is, the most popular answer combinations. This is optional, but will generate less chaotic plots.

Let’s try out ggparcoord, which is easy to use:

    library(GGally)
    ggparcoord(df_grouped, columns = 1:3, groupColumn = 'id', scale = 'globalminmax')

We only need to supply the grouped data frame, set the columns which should appear on the x-axis (the first three columns with the answer combinations) and a column that identifies groups for coloring (“id”). Unfortunately, it’s not possible to make the line width of the PCP dependent on “n”, therefore it only gives a general idea about the most popular “answer paths”.

ggparallel is specially designed for categorical data and does not produce a “classical” parallel coordinate output like ggparcoord. It implements several methods for this purpose: “hammock plots, parallel sets plots, common angle plots, and common angle plots with a hammock-like adjustment for line widths”. It’s very good at displaying “movements” of groups. You could for example display voter movement between parties across different elections with it. We can use it directly with the raw data (df) by specifying only the columns which should be used on the x-axis:

    ggparallel(list('q1_d1', 'q1_d2', 'q1_d3'), df, order = 0)

We fixed the order on the y-axis (order = 0), but this still produces hardly readable output, because every single combination (of about 200) gets displayed. We can fix this by using our grouped and filtered data frame that only contains the top ten combinations:

    df_pcp <- as.data.frame(df_grouped)  # this is important!

ggparallel can’t handle dplyr’s “tbl” data frames, so we have to convert it to a traditional data frame first. By specifying a weight, we can make the width of the lines dependent on “n”.

Another solution is to use geom_path from ggplot2.
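A geom_path based PCP with line widths that depend on “n” could look roughly like the sketch below. It is a minimal illustration and not the original code of this post: the data are generated with base R's sample() (weighted towards the middle answers) instead of the triangle package, the counting uses aggregate() instead of dplyr, and the names n_resp, counts, top and long are made up for this example. It assumes ggplot2 3.4.0 or newer for the linewidth aesthetic (older versions use size instead).

```r
library(ggplot2)

set.seed(42)
n_resp <- 1000                      # assumed number of respondents
w <- c(1, 2, 4, 6, 4, 2, 1)         # assumed answer weights, peak at 4

# Simulated answers (1 to 7) for one question in three contexts.
df <- data.frame(
  q1_d1 = sample(1:7, n_resp, replace = TRUE, prob = w),
  q1_d2 = sample(1:7, n_resp, replace = TRUE, prob = w),
  q1_d3 = sample(1:7, n_resp, replace = TRUE, prob = w)
)

# Count each answer combination; the count ends up in column "n".
counts <- aggregate(n ~ q1_d1 + q1_d2 + q1_d3,
                    data = transform(df, n = 1), FUN = sum)

# Keep the ten most frequent combinations and label them with an id.
counts <- counts[order(-counts$n), ]
top <- head(counts, 10)
top$id <- paste(top$q1_d1, top$q1_d2, top$q1_d3, sep = "-")

# Reshape to long format: one row per combination and dimension.
dims <- c("q1_d1", "q1_d2", "q1_d3")
long <- data.frame(
  id        = rep(top$id, times = length(dims)),
  n         = rep(top$n, times = length(dims)),
  dimension = factor(rep(dims, each = nrow(top)), levels = dims),
  answer    = unlist(top[dims], use.names = FALSE)
)

# One path per combination; line width is mapped to the count n.
p <- ggplot(long, aes(x = dimension, y = answer, group = id, colour = id)) +
  geom_path(aes(linewidth = n), alpha = 0.6, lineend = "round") +
  scale_y_continuous(breaks = 1:7, limits = c(1, 7)) +
  labs(x = NULL, y = "answer", linewidth = "n")

print(p)
```

Because geom_path connects points in the order in which they appear in the data, the long data frame is built dimension by dimension, so each combination is drawn from q1_d1 through q1_d2 to q1_d3. The semi-transparent lines keep overlapping paths distinguishable.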