Thursday 20 October 2011

R-Statistics Error messages in Pipeline Pilot

Updated!
(I am seeing about 5-10 views a day on the Pipeline Pilot pages; please be so kind as to acknowledge / cite my blog when you use these tools and guides.)

Over the last years I have been using R to create my models. However, the interface running on top of R (doing the data shaping and fingerprint folding) was Pipeline Pilot. This works quite nicely and efficiently (one could think of better solutions, but for my work this setup suffices).



When there are errors in your data, though, things go wrong. Not all error messages are as intuitive as you would like. The Pipeline Pilot help can't really help here either, so over the last years I have kept a list of error messages and what they mean in practice. I have listed it here so that anyone else struggling with an unknown error might find it. It is also convenient for myself, as these things are retrieved quicker online than on network share xxx :).

The list is organised as follows: the top-level bullet is the actual error message received (trimmed), the bullet indented below it gives a possible cause, and the innermost bullet a solution or a more specific cause.

Related to SVM as performed in the “e1071” package:
  • Error in svm.default(x, y, scale = scale, ..., na.action = na.action) : dependent variable has to be of factor or integer type for classification mode. Calls: doCV -> modelfunc -> svm -> svm.formula -> svm.default
    • Fingerprint properties are not recognized as fingerprints
      • Set the property type of the properties to learn from to “fingerprint” (like 'SciTegic.value.IntegerFingerprintValue')
      • Set the option to convert fingerprints to “Fixed-Length array of bits”
      • Possibly, due to a merge, array properties are present (multiple values for one property)

  • Error in …. Subscript out of bounds
    • The property to learn is incorrect
      • Two values present in one property where there should be one
      • Only actives are present
    • No properties present to learn from
      • Possibly through ignore properties

  • Empty beginning of file
    • The property to learn is incorrect.
      • Either not present in the stream
      • The name is misspelled

  • Missing properties in file
    • Problem with the fingerprints that are being input into a learned model.
      • The ‘change fingerprints to fixed length bit size’ step is executed incorrectly
      • This specific property is missing
      • Setting the property type to fingerprint has not been performed ('SciTegic.value.IntegerFingerprintValue')

  • "Error in svd(x, nu = 0) : 0 extent dimensions"
    • When performing a PCA, (multiple) properties are not considered to be numeric.
      • Decimal comma instead of dot

  • “Error in svm.default(x,y,scale,…..): C <= 0!”
    • The allocation of a cost value is incorrect.
      • Decimal comma instead of dot

  • “Error in matrix(ret$dec, nrow = nrow(newdata), byrow = TRUE, dimnames = list(rowns, : matrix: invalid 'ncol' value (< 0) Execution halted”
    • Properties to learn from defined incorrectly
      • “allpropertiesonfirstdata” instead of “user set” when not all properties are present in all records

  • Error in svm.default(x, y, scale = scale, ..., na.action = na.action) : Need numeric dependent variable for regression. In addition: Warning message: data length exceeds size of matrix
    • Property to learn from contains non-numeric characters
    • Continuous model selected for classification data

  • Error in cor(preds[[1]], preds[[2]], method = "pearson") : missing observations in cov/cor. In addition: Warning messages: 1-5: data length exceeds size of matrix
    • Non-numeric properties are used to learn from in regression.
      • Use ‘IgnoreProperties’ to exclude non-numeric properties
    • Possibly, the property type should be changed to 'SciTegic.value.IntegerFingerprintValue' when using regression.

  • Error in c(1e-05/nx, 0.001/nx, 1/nx, ) : argument 4 is empty
    • The list of gamma values to be sampled ends with a comma rather than a value
      • Remove the comma at the end or add a value

  • Error in svm.default(x, y, scale = scale, ..., na.action = na.action) :  NA/NaN/Inf in foreign function call (arg 4) Calls: doCV ... modelfunc -> svm -> svm.formula -> svm.default -> .C
    • Property to learn from is non-numeric
      • ‘Inf’ rather than a numeric value

  • Error in withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning")) : invalid multibyte string at '<b2>II' (or at '<a0>hydra) Calls: readxy -> cleandata -> FactorOrNumber Execution halted
    • Array property present formatted as blabla[1], blabla[2], etc.
      • Flatten to single properties (e.g. turn a binary flag on for properties present, named by value in the array property).
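
Several of the causes above (decimal commas, ‘Inf’, non-numeric text, missing values) can be spotted in the data before it ever reaches R. The sketch below is one possible pre-flight check on a delimited text export. It is written in Python purely for illustration and is not part of Pipeline Pilot or e1071; the function names classify_cell and scan_csv, the column names and the semicolon delimiter are all assumptions made for this example.

```python
import csv
import io
import math

# Strings treated as "no value"; this set is an assumption, extend as needed.
MISSING_MARKERS = {"", "NA", "n/a"}


def classify_cell(text):
    """Return 'ok', 'missing', 'decimal_comma', 'non_finite' or 'non_numeric'."""
    value = text.strip()
    if value in MISSING_MARKERS:
        return "missing"
    try:
        number = float(value)
    except ValueError:
        # "0,5" does not parse, but does once the comma becomes a dot.
        try:
            float(value.replace(",", ".", 1))
        except ValueError:
            return "non_numeric"
        return "decimal_comma"
    # float() accepts "Inf" and "nan"; flag them because they are not finite.
    return "ok" if math.isfinite(number) else "non_finite"


def scan_csv(handle, numeric_columns, delimiter=";"):
    """Collect (row_number, column, problem) for every suspicious cell."""
    problems = []
    reader = csv.DictReader(handle, delimiter=delimiter)
    for row_number, row in enumerate(reader, start=1):
        for column in numeric_columns:
            cell = row.get(column)
            # A column absent from the header or the row counts as missing.
            status = "missing" if cell is None else classify_cell(cell)
            if status != "ok":
                problems.append((row_number, column, status))
    return problems


# Hypothetical export with one clean row and three problematic ones.
sample = io.StringIO(
    "activity;logP;cost\n"
    "1;2.5;10\n"
    "0;3,1;10\n"
    "1;Inf;0,1\n"
    "0;;abc\n"
)
report = scan_csv(sample, ["logP", "cost"])
for entry in report:
    print(entry)
```

Each reported tuple names the data row, the property and the kind of problem, which should make it easier to match a record to one of the messages in the list.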


Related to decision tree forests as performed in the “randomForest” package:
  • Error in randomForest.default(xy[-1], y, ntree = 500, mtry = mtry, importance = imp) :   NA/NaN/Inf in foreign function call (arg 2) Calls: randomForest -> randomForest.default -> .C
    • Property to learn from is non-numeric
      • ‘Inf’ rather than a numeric value

  • Error in randomForest.default(xy[-1], y, ntree = 70, mtry = mtry, importance = imp, : NA not permitted in predictors
    • Property to learn from is numeric when classifying

  • Error in comps[c1, c2] <- round(roc12, digits = 4) : replacement has length zero Calls: print -> genroc
    • One of the classes might be present only once, making out-of-bag validation impossible

  • Error in read.table(file = file, header = header, sep = sep, quote = quote, : empty beginning of file Calls: readxy -> read.csv -> read.table
    • Property to learn from is missing from the data
      • Possibly removed using keep / remove properties


  • Error in `rownames<-`(`*tmp*`, value = row.names(x)) :  attempt to set rownames on object with no dimensions  Calls: randomForest ... randomForest.default -> is.na -> is.na.data.frame -> rownames<-
    • One of the observations has an incomplete set of variables; one or more descriptors are missing (n/a)

  • Error in predict.randomForest(model, x, type = "response") : New factor levels not present in the training data Calls: predict -> predict.randomForest
    • One of the observations has a level for one of the variables that was not observed in the training set (e.g. present in the training set: 0, 1, 2, 3; value in the test set: 6)
      • Make sure each level is seen in the training set
      • Alternatively, use continuous variables to describe the data points rather than categorical ones

  • Error in withCallingHandlers(expr, warning = function(w) invokeRestart("muffleWarning")) : invalid multibyte string at '<b2>II' (or at '<a0>hydra) Calls: readxy -> cleandata -> FactorOrNumber Execution halted
    • Array property present formatted as blabla[1], blabla[2], etc.
      • Flatten to single properties (e.g. turn a binary flag on for properties present, named by value in the array property).
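
Two of the randomForest causes above concern how the data is split rather than the values themselves: a level in the test set that never occurred in the training set, and a class that is present only once. A minimal sketch of both checks follows, again in Python for illustration only; the function names unseen_levels and rare_classes and the example values are hypothetical and do not come from the randomForest package.

```python
from collections import Counter


def unseen_levels(train_values, test_values):
    """Levels that occur in the test set but never in the training set."""
    return sorted(set(test_values) - set(train_values))


def rare_classes(labels, minimum=2):
    """Classes with fewer than `minimum` members, mapped to their counts."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < minimum}


# Categorical descriptor: training set has levels 0-3, test set contains a 6.
train_levels = [0, 1, 2, 3, 1, 0]
test_levels = [1, 6, 2]
print(unseen_levels(train_levels, test_levels))

# Class labels: a single active record among several inactives.
train_labels = ["active", "inactive", "inactive", "inactive"]
print(rare_classes(train_labels))
```

If the first check returns anything, either move records with those levels into the training set or describe the property with a continuous variable, as suggested in the list.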

Hope this helps anyone who is stuck (and that this page is indexed by Google, which probably isn't the case).

5 comments:

  1. Yes, this helps! I am taking a class with Dr. Jeff Leek from Johns Hopkins, and yours is the only blog with helpful hints for the following R message:
    Error in randomForest.default(trainingset, na.action = false) :
    NA/NaN/Inf in foreign function call (arg 1)

    Thank you! dank u!

  2. No problem at all! If you have other error messages that are not listed and should be on the page, please feel free to contact me.

  3. Error in predict.svm(ret, xhold, decision.values = TRUE) :
    NA/NaN/Inf in foreign function call (arg 9)

  4. PLEASE WHAT IS THE MEANING OF THIS ERROR MESSAGE IN R:
    Error in predict.svm(ret, xhold, decision.values = TRUE) :
    NA/NaN/Inf in foreign function call (arg 9)

  5. It would appear you are trying to model a non-numeric value with your SVM while it is expecting a numeric value. See also above:

    Error in svm.default(x, y, scale = scale, ..., na.action = na.action) : NA/NaN/Inf in foreign function call (arg 4) Calls: doCV ... modelfunc -> svm -> svm.formula -> svm.default -> .C

    Property to learn from non-numeric

    ‘Inf’ rather than numeric
