How to find the levels of each discrete variable group

Time: 2021-07-16 23:47:07

I'm not sure exactly how to title my question, so please provide feedback if I can make it clearer.

I have a dataframe where a couple of the columns look similar to this:

    Node     Component  Value
1     A        os.name RedHat
2     A     os.version  16.04
3     A docker.version 1.13.1
4     A kernel.version 3.10.0
5     B        os.name RedHat
6     B     os.version  16.04
7     B docker.version 1.12.1
8     B kernel.version 3.11.0
9     C        os.name Ubuntu
10    C     os.version  18.04
11    C docker.version 1.12.1
12    C kernel.version 3.12.0
13    D        os.name RedHat
14    D     os.version  17.04
15    D docker.version 1.13.1
16    D kernel.version 3.13.0

This can be reproduced with:

    df <- structure(list(
        Node = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L),
            .Label = c("A", "B", "C", "D"), class = "factor"),
        Component = structure(c(3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L),
            .Label = c("docker.version", "kernel.version", "os.name", "os.version"),
            class = "factor"),
        Value = structure(c(10L, 3L, 2L, 6L, 10L, 3L, 1L, 7L, 11L, 5L, 1L, 8L, 10L, 4L, 2L, 9L),
            .Label = c("1.12.1", "1.13.1", "16.04", "17.04", "18.04", "3.10.0",
                       "3.11.0", "3.12.0", "3.13.0", "RedHat", "Ubuntu"),
            class = "factor")),
        class = "data.frame", row.names = c(NA, -16L))

This data is rendered in a Shiny app that is used to find nodes that are not running the correct version (represented by Value) of a component. I have created a "baseline" data frame that lists the version each component should be on. Using the rpivotTable package, I display the nodes that do not match this baseline. In some cases, the user may need to update the baseline. I am trying to find a way to present the user with all of the possible values for each component, so they can reactively modify the baseline and have the pivot table update.
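The baseline check described above can be sketched in base R with merge(). This is only an illustration, not the asker's actual app code. The baseline values below are hypothetical (they happen to be node A's values), and the data is re-created with character columns so the block runs on its own.

```r
# Same data as in the question, stored as character columns
df <- data.frame(
  Node = rep(c("A", "B", "C", "D"), each = 4),
  Component = rep(c("os.name", "os.version", "docker.version", "kernel.version"), times = 4),
  Value = c("RedHat", "16.04", "1.13.1", "3.10.0",
            "RedHat", "16.04", "1.12.1", "3.11.0",
            "Ubuntu", "18.04", "1.12.1", "3.12.0",
            "RedHat", "17.04", "1.13.1", "3.13.0"),
  stringsAsFactors = FALSE
)

# Hypothetical baseline: the expected value for each component
baseline <- data.frame(
  Component = c("os.name", "os.version", "docker.version", "kernel.version"),
  Expected = c("RedHat", "16.04", "1.13.1", "3.10.0"),
  stringsAsFactors = FALSE
)

# Attach the expected value to every row, then keep only the rows that differ
checked <- merge(df, baseline, by = "Component")
mismatches <- checked[checked$Value != checked$Expected,
                      c("Node", "Component", "Value", "Expected")]
mismatches <- mismatches[order(mismatches$Node, mismatches$Component), ]
print(mismatches, row.names = FALSE)
```

With this particular baseline, node A matches on every component, so only B, C and D appear in the output.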

I considered using selectInputs in the UI, and maybe even a handsontable, but I cannot figure out how to render these selections without hard-coding them into the "choices" or "levels", which is why I am here. (Note: there are more than 100 components, so I plan to use a loop to generate the selectInputs dynamically, or use a handsontable instead.)
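A minimal sketch of the loop idea follows, assuming the shiny package. The choices for each selectInput are derived from the data with split() and unique() instead of being hard-coded. The helper make_baseline_inputs() and its input ID scheme ("baseline_" plus the component name, with dots replaced by underscores) are hypothetical names chosen for this sketch. The shiny part only runs if the package is installed.

```r
# Same data as in the question, stored as character columns
df <- data.frame(
  Node = rep(c("A", "B", "C", "D"), each = 4),
  Component = rep(c("os.name", "os.version", "docker.version", "kernel.version"), times = 4),
  Value = c("RedHat", "16.04", "1.13.1", "3.10.0",
            "RedHat", "16.04", "1.12.1", "3.11.0",
            "Ubuntu", "18.04", "1.12.1", "3.12.0",
            "RedHat", "17.04", "1.13.1", "3.13.0"),
  stringsAsFactors = FALSE
)

# One vector of distinct values per component, taken from the data (not hard-coded)
choices_by_component <- lapply(split(df$Value, df$Component),
                               function(v) sort(unique(v)))

# Hypothetical helper: builds one selectInput per component.
# Input IDs look like "baseline_os_version" (hypothetical naming).
make_baseline_inputs <- function(choices) {
  lapply(names(choices), function(comp) {
    shiny::selectInput(
      inputId = paste0("baseline_", gsub(".", "_", comp, fixed = TRUE)),
      label = comp,
      choices = choices[[comp]]
    )
  })
}

# Only build the UI elements if shiny is available
if (requireNamespace("shiny", quietly = TRUE)) {
  baseline_inputs <- shiny::tagList(make_baseline_inputs(choices_by_component))
}
```

In a real app, this list would typically be returned from renderUI() and shown with uiOutput(), so the inputs follow whatever components appear in the data.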

Is there a way to access the levels for each variable group? For example, the user would modify the baseline for os.version and select 16.04, 17.04, or 18.04 as the target baseline. Is there a way to leverage group_by for this?

Here is a sample app of what I am trying to accomplish in the UI, without having to configure the choices manually:

[Image: sample app]

EDIT:

Hopefully this clarifies what I am asking. In the same way that levels(df$Component) returns the factor levels of the Component column, is there a way to drill down into each component to get its levels? I know some functions can do something like this; for example,

    library(dplyr)

    df %>% group_by(Component) %>%
      add_count(Value)

adds a count of each Value within its Component group.
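levels(df$Value) returns all 11 values across every component. One way to get per-component levels in base R is to split Value by Component and then drop the unused levels in each piece. Below is a sketch using the data from the question, re-created with data.frame() so the block is self-contained.

```r
# Same data as in the question; stringsAsFactors = TRUE makes every column a factor
df <- data.frame(
  Node = rep(c("A", "B", "C", "D"), each = 4),
  Component = rep(c("os.name", "os.version", "docker.version", "kernel.version"), times = 4),
  Value = c("RedHat", "16.04", "1.13.1", "3.10.0",
            "RedHat", "16.04", "1.12.1", "3.11.0",
            "Ubuntu", "18.04", "1.12.1", "3.12.0",
            "RedHat", "17.04", "1.13.1", "3.13.0"),
  stringsAsFactors = TRUE
)

# levels() on the whole column mixes the values of every component
length(levels(df$Value))  # 11

# split() gives one factor per component, but each piece still carries all
# 11 levels, so droplevels() keeps only the levels that occur in that component
levels_by_component <- lapply(split(df$Value, df$Component),
                              function(v) levels(droplevels(v)))

levels_by_component$os.version
# [1] "16.04" "17.04" "18.04"
levels_by_component$docker.version
# [1] "1.12.1" "1.13.1"
```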

1 Answer


If I understand your question correctly, the solution is to reshape your data from long to wide format. Currently, every attribute of a node is on its own row, which makes it hard to see which values each component can take. Once the data is in wide format, each component is its own column, so its values are easy to access.

Try this:

library(tidyverse)

newdf <- spread(df, Component, Value, convert = TRUE)

#   Node docker.version kernel.version os.name os.version
# 1    A         1.13.1         3.10.0  RedHat      16.04
# 2    B         1.12.1         3.11.0  RedHat      16.04
# 3    C         1.12.1         3.12.0  Ubuntu      18.04
# 4    D         1.13.1         3.13.0  RedHat      17.04

unique(newdf$docker.version)

# [1] "1.13.1" "1.12.1"

Setting convert = TRUE automatically chooses a suitable data type for each new column, so check whether that is what you want. In this case, the os.version column becomes numeric, which you might not want. It also turns string columns into character vectors rather than factors. This means you can get the distinct values with unique(), or use levels() after converting the columns back into factors.
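For comparison, the same long-to-wide reshape can be sketched in base R with stats::reshape(), without tidyr. reshape() does not convert types, so os.version is not turned into a number. Stripping the "Value." prefix from the column names with sub() is only a cosmetic choice for this sketch.

```r
# Same data as in the question; stringsAsFactors = TRUE makes every column a factor
df <- data.frame(
  Node = rep(c("A", "B", "C", "D"), each = 4),
  Component = rep(c("os.name", "os.version", "docker.version", "kernel.version"), times = 4),
  Value = c("RedHat", "16.04", "1.13.1", "3.10.0",
            "RedHat", "16.04", "1.12.1", "3.11.0",
            "Ubuntu", "18.04", "1.12.1", "3.12.0",
            "RedHat", "17.04", "1.13.1", "3.13.0"),
  stringsAsFactors = TRUE
)

# Long -> wide: one row per Node, one column per Component ("Value.<component>")
wide <- reshape(df, idvar = "Node", timevar = "Component", direction = "wide")
names(wide) <- sub("^Value\\.", "", names(wide))

# No type conversion happens, so os.version is not numeric
is.numeric(wide$os.version)  # FALSE

# Distinct values per component: levels() after converting to a factor
# and dropping the levels that do not occur in that column
values_by_component <- lapply(wide[names(wide) != "Node"],
                              function(x) levels(droplevels(as.factor(x))))
values_by_component$os.version
# [1] "16.04" "17.04" "18.04"
```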