Extension of CreateNodes() to work with levels specified by separators #40

Open
panoptikum opened this issue Oct 19, 2018 · 6 comments

@panoptikum

panoptikum commented Oct 19, 2018

Hello,

Sorry for my novice behaviour regarding pull requests and so on.

As announced, I'm opening an issue for this:

It would be great if nodes could be specified by a separator such as an underscore.

This way, the function could handle node names of different lengths.
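To illustrate the idea, here is a minimal base-R sketch (not the code from the fork): split each bottom-level name on the separator and build one key per depth by concatenating the first k parts without the separator, so the labels come out like the AA100 and AA100A names in the output below. `level_keys()` is a hypothetical helper, and the sketch assumes every name splits into the same number of parts.

```r
# Hypothetical helper (not part of hts or the fork): split bottom-level names
# on a separator and build one key per depth from the first k parts.
level_keys <- function(bnames, sep = "_") {
  parts <- strsplit(bnames, sep, fixed = TRUE)
  depth <- unique(lengths(parts))
  # Assumption: every name splits into the same number of parts
  stopifnot(length(depth) == 1L)
  lapply(seq_len(depth), function(k) {
    # concatenate the first k parts without the separator, e.g. "AA100"
    vapply(parts, function(p) paste(p[seq_len(k)], collapse = ""), character(1))
  })
}

keys <- level_keys(c("AA_100_A", "AA_10_B1Z2", "A_10_C", "A_2_AB"))
keys[[2]]  # returns c("AA100", "AA10", "A10", "A2")
```

Because the parts are located by the separator rather than by fixed character counts, names such as AA_100_A and A_2_AB can sit at the same depth even though their parts have different lengths.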

My current version of hts.R can be found in my fork of the repo. This time I only use functions from base R:

panoptikum@1d1ee22

I've tested it with the following examples:

abc <- ts(5 + matrix(sort(rnorm(1000)), ncol = 10, nrow = 100))
colnames(abc) <- c("AA_100_A_172", "AA_100_A_172", "A_10_C_A", "A_2_B_21", "A_2_B_DA","B30_A_1_H", "B30_B_3_Z", "B30_B_1_%", "B_40_A_2", "B_40_A_3")
y <- hts(abc, characters = c(1, 2, 1), sep="_")

and

abc <- ts(5 + matrix(sort(rnorm(1000)), ncol = 10, nrow = 100))
colnames(abc) <- c("AA_100_A", "AA_10_B1Z2", "A_10_C", "A_2_AB", "A_2_B","B30_A_1", "B30_B_3", "B30_CA_1", "B_40_A", "B_40_B")
y <- hts(abc, characters = c(1, 2, 1), sep="_")

which gave me the following for y$nodes:

$`Level 1`
[1] 4

$`Level 2`
 AA   A B30   B 
  2   3   3   2 

$`Level 3`
AA100   A10    A2  B30A  B30B   B40 
    2     1     2     1     2     2 

$`Level 4`
AA100A   A10C    A2B  B30A1  B30B3  B30B1   B40A 
     2      1      2      1      1      1      2 

and, respectively, for the second example:

$`Level 1`
[1] 4

$`Level 2`
 AA   A B30   B 
  2   3   3   2 

$`Level 3`
AA100  AA10   A10    A2  B30A  B30B B30CA   B40 
    1     1     1     2     1     1     1     2 

Best

@jnuvenus

@panoptikum Hello, I have used your function CreateNodes() to create nodes, but it dropped into debug mode. Then I tried your hts() function, and it also dropped into debug mode. Could you give a detailed example? Thanks.

@panoptikum
Author

panoptikum commented Oct 25, 2018

@jnuvenus Thanks for trying out my function. I forgot to remove a browser() call inside the function that was there for debugging purposes. It should work with the above examples now, but I can highlight the crucial code changes as well.
You have to either source the whole hts.R file or load my fork of HTS for it to work.

Let me know if it works or not.

@jnuvenus

@panoptikum In the end, I used Hive SQL to calculate the number of nodes at every level, and then used the R arrange() function to sort them.

@panoptikum
Author

@jnuvenus Well, I'm sure others would like to stay within R and this package, but I'm pleased to hear that you've found a solution that works for you.

@jnuvenus

jnuvenus commented Nov 28, 2018

@panoptikum Hello, I used CreateNodes() in your new hts() function:
cols <- c('yCN01_y755Y_y755AC','yCN01_y755Y_y755AG','yCN01_y023Y_y023Y00001','yCN02_y010A_y010AAC')
gtnode <- CreateNodes(bnames=cols, characters = c(1, 2, 1), sep="_")$nodes
Then I got the following result:
[[1]]
[1] 2

[[2]]
yCN01 yCN02
    3     1

[[3]]
yCN01y755Y yCN01y023Y yCN02y010A
         2          1          1
This result has one error: the count for yCN01 should be 2, since it has only two distinct children (yCN01y755Y and yCN01y023Y).
So I fixed the bug in the cnt counting part of your CreateNodes():
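cnt <- sapply(x_1, function(z) {
  # unique parent + child pairs at level x, joined by the separator
  vec1 <- sapply(bnames_split, function(i) {paste(c(paste(i[1:(x-1)], collapse = ""), i[x]), collapse = "_")})
  vec1 <- unique(vec1)
  # count how many distinct children belong to parent z
  vec <- sapply(vec1, function(j) {strsplit(j, "_")[[1]][1] == z})
  sum(vec, na.rm = TRUE)
})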
Then I got the following result:
[[1]]
[1] 2

[[2]]
yCN01 yCN02
    2     1

[[3]]
yCN01y755Y yCN01y023Y yCN02y010A
         2          1          1

Maybe you can try this fix on other examples.
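The quantity being fixed here is the number of distinct children per parent. A minimal base-R sketch of that computation is below, assuming underscore-separated names of equal depth. `count_nodes()` is a hypothetical helper, not the fork's CreateNodes(): instead of splitting joined strings, it removes repeated (parent, child) pairs with unique() on a data frame and then counts, returning a list in the same layout as the outputs above (number of top-level groups first, then named counts per level).

```r
# Hypothetical sketch (not the fork's CreateNodes()): build an hts-style nodes
# list from separator-delimited names, counting *distinct* children per parent.
count_nodes <- function(bnames, sep = "_") {
  parts <- strsplit(bnames, sep, fixed = TRUE)
  depth <- unique(lengths(parts))
  stopifnot(length(depth) == 1L)  # assumes all names have the same depth
  # keys[[k]]: label at depth k for each series, e.g. "yCN01y755Y"
  keys <- lapply(seq_len(depth), function(k) {
    vapply(parts, function(p) paste(p[seq_len(k)], collapse = ""), character(1))
  })
  # first element: number of distinct top-level groups under the total
  nodes <- list(length(unique(keys[[1]])))
  for (k in seq_len(depth)[-1]) {
    parent <- keys[[k - 1L]]
    # at the bottom level every series is its own node, so use its index
    child <- if (k == depth) seq_along(bnames) else keys[[k]]
    # drop repeated (parent, child) pairs before counting
    pairs <- unique(data.frame(parent = parent, child = child,
                               stringsAsFactors = FALSE))
    # named integer vector: parent label -> number of distinct children
    nodes[[k]] <- vapply(unique(parent),
                         function(p) sum(pairs$parent == p), integer(1))
  }
  nodes
}

cols <- c("yCN01_y755Y_y755AC", "yCN01_y755Y_y755AG",
          "yCN01_y023Y_y023Y00001", "yCN02_y010A_y010AAC")
str(count_nodes(cols))
```

For the cols vector above this gives 2, then yCN01 = 2 and yCN02 = 1, then 2, 1 and 1 for the three depth-2 groups, which matches the corrected result.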

@medoeje

medoeje commented Apr 22, 2020

Is there any news regarding this functionality?
I agree, it's painful to prepare groups. I had to write some functions that create a new fixed-length key column in a data frame, where the key encodes the length of each group.

If I have 3 groups, it finds the maximum character length of each column and pads the end of each value with extra symbols so that it has the same length as the longest value in that column. Then it concatenates the 3 columns, pivots by the concatenated key into columns, and passes the result to HTS, also giving the length of each group (the maximum character length).
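The padding workaround described above can be sketched in base R as follows, assuming the grouping columns sit in a data frame. `make_fixed_keys()`, the `pad` character and the example columns are hypothetical, and the pivot step is left out. The returned widths are the per-level character counts intended for the `characters` argument of hts().

```r
# Hypothetical helper sketching the padding workaround described above:
# right-pad each grouping column to its longest value, then concatenate.
make_fixed_keys <- function(df, cols, pad = "-") {
  # longest value (in characters) of each grouping column
  widths <- vapply(df[cols], function(v) max(nchar(as.character(v))), integer(1))
  padded <- lapply(cols, function(col) {
    v <- as.character(df[[col]])
    # append pad characters until each value reaches the column's width
    paste0(v, strrep(pad, widths[[col]] - nchar(v)))
  })
  # key: fixed-length concatenation; characters: per-level widths
  list(key = do.call(paste0, padded), characters = unname(widths))
}

groups <- data.frame(region = c("AA", "A", "B30"),
                     store = c("100", "10", "1"),
                     item = c("A", "C", "Z1"),
                     stringsAsFactors = FALSE)
make_fixed_keys(groups, c("region", "store", "item"))
```

Here the keys come out as "AA-100A-", "A--10-C-" and "B301--Z1" with widths 3, 3 and 2. The pad character should be one that never occurs in the data, so that a padded value cannot be confused with a genuine one.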
