Native Interfaces for R
Seth Falcon
Fred Hutchinson Cancer Research Center
20-21 May, 2010

Native interfaces for R

Presentation on native interfaces for the R programming language given as part of a course in advanced R programming at FHCRC: https://secure.bioconductor.org/SeattleMay10/
Published on: Mar 3, 2016
Published in: Technology      
Source: www.slideshare.net


Transcripts - Native interfaces for R

  • 1. Native Interfaces for R Seth Falcon Fred Hutchinson Cancer Research Center 20-21 May, 2010
  • 2. Outline
       The .Call Native Interface
       R types at the C-level
       Memory management in R
       Self-Study Exercises
       Odds and Ends
       Resources
  • 3. which as implemented in R prior to R-2.11
       which <- function(x) {
           seq_along(x)[x & !is.na(x)]
       }
  • 4. which as implemented in R prior to R-2.11
       which <- function(x) {
           seq_along(x)[x & !is.na(x)]
       }
       Operations performed, in order:
       1. is.na
       2. !
       3. &
       4. seq_along
       5. [
  • 5. which in C
       #include <string.h>
       #include <Rinternals.h>

       SEXP nid_which(SEXP v) {
           SEXP ans;
           int i, j = 0, len = Rf_length(v), *buf, *tf;
           /* scratch buffer; R reclaims R_alloc memory when .Call returns */
           buf = (int *) R_alloc(len, sizeof(int));
           tf = LOGICAL(v);
           for (i = 0; i < len; i++) {
               /* record the 1-based index of each TRUE; FALSE and NA are skipped */
               if (tf[i] == TRUE) {
                   buf[j] = i + 1;
                   j++;
               }
           }
           ans = Rf_allocVector(INTSXP, j);
           memcpy(INTEGER(ans), buf, sizeof(int) * j);
           return ans;
       }
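The core loop of the slide can be tried outside R. The following is a minimal sketch in plain C, assuming the logical vector is already available as an int array; the name which_true is hypothetical and is not part of R or of the nidemo package. It shows why the counter must advance only inside the if: otherwise the result has one slot per input element instead of one per TRUE.

```c
#include <stddef.h>

/* Hypothetical helper, not part of R: writes the 1-based positions of the
   elements of tf that equal 1 (TRUE) into out, which must have room for
   len ints, and returns how many positions were written. */
size_t which_true(const int *tf, size_t len, int *out)
{
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        if (tf[i] == 1) {
            out[j] = (int) (i + 1);  /* R indices start at 1 */
            j++;                     /* advance only when an element matched */
        }
    }
    return j;
}
```

Any value other than 1 is skipped, so a missing value stored as some other int (R uses a special int for NA in logical vectors) never appears in the result.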
  • 6. which is now 10x faster
       system.time(which(v))

       R Version   System time   Elapsed time
       R-2.10      0.018         0.485
       R-2.11      0.001         0.052

       v is a logical vector with 10 million elements
  • 7. Making use of existing code Access algorithms Interface to other systems RBGL RSQLite RCurl netcdf Rsamtools SJava
  • 8. Review of C
       #include <stdio.h>

       /* mean of an int array; len must be greater than 0 */
       double _nid_avg(int *data, int len) {
           int i;
           double ans = 0.0;
           for (i = 0; i < len; i++) {
               ans += data[i];
           }
           return ans / len;
       }

       int main(void) {
           int ex_data[5] = {1, 2, 3, 4, 5};
           double m = _nid_avg(ex_data, 5);
           printf("%f\n", m);
           return 0;
       }
  • 9. The .Call interface
  • 10. Symbolic Expressions (SEXPs)
       The SEXP is R’s fundamental data type at the C-level
       SEXP is short for symbolic expression and is borrowed from Lisp
       History at http://en.wikipedia.org/wiki/S-expression
  • 11. .Call Start to Finish Demo Demo
  • 12. .Call Dissection
       R
         # R/somefile.R
         avg <- function(x) .Call(.nid_avg, x)

         # NAMESPACE
         useDynLib("yourPackage", .nid_avg = nid_avg)
       C
         /* src/some.c */
         #include <Rinternals.h>
         SEXP nid_avg(SEXP data)
         {
             ... INTEGER(data) ...
         }
  • 13. .Call C function
       #include <Rinternals.h>

       SEXP nid_avg(SEXP data)
       {
           SEXP ans;
           double v = _nid_avg(INTEGER(data), Rf_length(data));
           PROTECT(ans = Rf_allocVector(REALSXP, 1));
           REAL(ans)[0] = v;
           UNPROTECT(1);
           return ans;
           /* shortcut: return Rf_ScalarReal(v); */
       }
  • 20. Function Registration
       In NAMESPACE
       In C via R_CallMethodDef (see ShortRead/src/R_init_ShortRead.c for a nice example)
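To make the idea of a registration table concrete, here is a plain C analogy. It is a sketch only: the types demo_method_def and generic_fn and the functions demo_avg and demo_lookup are hypothetical stand-ins, not R's API. R's real R_CallMethodDef entry likewise holds a routine name, a function pointer, and an argument count, and a table of such entries ends with a NULL entry.

```c
#include <stddef.h>
#include <string.h>

/* Generic function pointer type, standing in for the untyped routine
   address stored in a registration table (hypothetical). */
typedef void (*generic_fn)(void);

/* Hypothetical analogue of one registration entry: name, address, arity. */
typedef struct {
    const char *name;
    generic_fn fun;
    int num_args;
} demo_method_def;

/* A routine to register (hypothetical). */
double demo_avg(const int *data, int len)
{
    double sum = 0.0;
    for (int i = 0; i < len; i++)
        sum += data[i];
    return len > 0 ? sum / len : 0.0;
}

/* The table ends with an all-NULL sentinel entry. */
const demo_method_def demo_methods[] = {
    {"demo_avg", (generic_fn) demo_avg, 2},
    {NULL, NULL, 0}
};

/* Look a routine up by name and check the declared argument count. */
const demo_method_def *demo_lookup(const char *name, int num_args)
{
    for (const demo_method_def *d = demo_methods; d->name != NULL; d++) {
        if (strcmp(d->name, name) == 0 && d->num_args == num_args)
            return d;
    }
    return NULL;
}
```

Storing the argument count next to the address lets a caller reject a call with the wrong number of arguments before the function pointer is ever cast back and invoked, which is one reason registering routines is preferable to looking symbols up by name alone.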
  • 22. Just one thing to learn
       Everything is a SEXP
       But there are many different types of SEXPs:
       INTSXP, REALSXP, STRSXP, LGLSXP, VECSXP, and more.
  • 23. What’s a SEXP?
  • 24. Common SEXP subtypes
       R function     SEXP subtype   Data accessor
       integer()      INTSXP         int *INTEGER(x)
       numeric()      REALSXP        double *REAL(x)
       logical()      LGLSXP         int *LOGICAL(x)
       character()    STRSXP         CHARSXP STRING_ELT(x, i)
       list()         VECSXP         SEXP VECTOR_ELT(x, i)
       NULL           NILSXP         R_NilValue
       externalptr    EXTPTRSXP      SEXP (accessor funcs)
  • 25. STRSXP/CHARSXP Exercise Try this: .Internal(inspect(c("abc", "abc", "xyz")))
  • 26. STRSXP/CHARSXP Exercise
       > .Internal(inspect(c("abc", "abc", "xyz")))
       @101d3f280 16 STRSXP g0c3 [] (len=3, tl=1)
         @101cc6278 09 CHARSXP g0c1 [gp=0x20] "abc"
         @101cc6278 09 CHARSXP g0c1 [gp=0x20] "abc"
         @101cc6308 09 CHARSXP g0c1 [gp=0x20] "xyz"
  • 27. character vectors == STRSXPs + CHARSXPs
  • 28. CHARSXPs are special
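In the inspect output, both "abc" elements report the same address, which is what one would expect if equal strings share a single cached CHARSXP. The following plain C sketch illustrates that idea with a toy string cache. It is an assumption-laden model, not R's implementation: the function intern, the fixed-size array, and the linear search are all hypothetical choices made for brevity.

```c
#include <stdlib.h>
#include <string.h>

#define CACHE_MAX 64

/* Toy global cache of distinct strings (hypothetical; R's own cache is
   organized differently). */
static const char *cache[CACHE_MAX];
static size_t cache_len = 0;

/* Returns the single shared copy of str, adding it to the cache on first
   use. Returns NULL if the cache is full or memory runs out. */
const char *intern(const char *str)
{
    for (size_t i = 0; i < cache_len; i++) {
        if (strcmp(cache[i], str) == 0)
            return cache[i];          /* already cached: reuse the pointer */
    }
    if (cache_len == CACHE_MAX)
        return NULL;
    size_t n = strlen(str) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return NULL;
    memcpy(copy, str, n);
    cache[cache_len++] = copy;
    return copy;
}
```

Calling intern("abc") twice yields the same pointer, while intern("xyz") yields a different one, mirroring the two distinct addresses in the output above. Because cached strings are shared, they must be treated as read-only.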
  • 29. STRSXP/CHARSXP API Example
       SEXP s = Rf_allocVector(STRSXP, 5);
       PROTECT(s);
       SET_STRING_ELT(s, 0, mkChar("hello"));
       SET_STRING_ELT(s, 1, mkChar("goodbye"));
       SEXP c = STRING_ELT(s, 0);
       const char *v = CHAR(c);
       UNPROTECT(1);
  • 35. R’s memory model (dramatization)
       [Figure: the set of all SEXPs, with regions labeled PROTECT stack, Precious list, and reachable]
  • 36. R’s memory model
       R allocates, tracks, and garbage collects SEXPs
       gc is triggered by R functions (any call that allocates memory may run it)
       SEXPs that are not in-use are recycled.
       A SEXP is in-use if:
         it is on the protection stack (PROTECT/UNPROTECT)
         it is in the precious list (R_PreserveObject/R_ReleaseObject)
         it is reachable from a SEXP in the in-use list.
  • 37. Preventing gc of SEXPs with the protection stack
       PROTECT(s)     Push s onto the protection stack
       UNPROTECT(n)   Pop top n items off of the protection stack
  • 38. Generic memory: When everything isn’t a SEXP
       Allocation function   Lifecycle
       R_alloc               Memory is freed when returning from .Call
       Calloc/Free           Memory persists until Free is called
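The push and pop-n behavior of the protection stack can be modeled in a few lines of plain C. This is a toy sketch, not R's code: toy_protect, toy_unprotect, and toy_is_protected are hypothetical names, the stack holds plain pointers rather than SEXPs, and no garbage collector is involved.

```c
#include <stddef.h>

#define TOY_STACK_MAX 128

/* Toy protection stack holding untyped object pointers (hypothetical). */
static const void *toy_stack[TOY_STACK_MAX];
static size_t toy_top = 0;

/* Push obj; returns 0 on success, -1 if the stack is full. */
int toy_protect(const void *obj)
{
    if (toy_top == TOY_STACK_MAX)
        return -1;
    toy_stack[toy_top++] = obj;
    return 0;
}

/* Pop the top n entries; returns 0 on success, -1 if fewer than n remain. */
int toy_unprotect(size_t n)
{
    if (n > toy_top)
        return -1;
    toy_top -= n;
    return 0;
}

/* Returns 1 if obj is currently on the stack, 0 otherwise. */
int toy_is_protected(const void *obj)
{
    for (size_t i = 0; i < toy_top; i++) {
        if (toy_stack[i] == obj)
            return 1;
    }
    return 0;
}
```

Note that unprotecting takes a count, not an object: it always removes the most recently pushed entries. That is why a C function called from R should pop exactly as many items as it pushed before it returns.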
  • 39. Case Study: C implementation of which Code review of nidemo/src/which.c
  • 41. Alphabet frequency of a text file
       [Figure: two bar charts of letter frequency, one for Yeast Gene YDL143W and one for an English Dictionary; axis label: frequency]
  • 42. Self-Study in nidemo package
       1. Alphabet Frequency
          Explore R and C implementations
          Make C implementation robust
          Attach names attribute in C
          multiple file, matrix return enhancement
       2. External pointer example
       3. Calling R from C example and enhancement
       4. Debugging: a video how-to
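As a starting point for the alphabet frequency exercise, the counting step might look like the sketch below. It is not the nidemo implementation: the function name letter_counts is hypothetical, it works on an in-memory string rather than a file, and it assumes an ASCII-compatible character set in which 'a' through 'z' are contiguous.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the letters a-z in text, ignoring case and
   skipping every other character. counts[0] is 'a', counts[25] is 'z'.
   Returns the total number of letters counted. */
size_t letter_counts(const char *text, size_t counts[26])
{
    size_t total = 0;
    for (int k = 0; k < 26; k++)
        counts[k] = 0;
    /* unsigned char avoids undefined behavior in the <ctype.h> calls */
    for (const unsigned char *p = (const unsigned char *) text; *p != '\0'; p++) {
        int c = tolower(*p);
        if (c >= 'a' && c <= 'z') {
            counts[c - 'a']++;
            total++;
        }
    }
    return total;
}
```

The frequency of a letter is then its count divided by the returned total. In a .Call version, the counts would be copied into an R vector, and the names attribute mentioned in the exercise would label each entry with its letter.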
  • 44. Odds and Ends
       1. Debugging
       2. Public API
       3. Calling R from C
       4. Making sense of the remapped function names
       5. Finding C implementations of R functions
       6. Using a TAGS file
  • 45. Debugging Native Code
       Rprintf("DEBUG: v => %d, x => %s\n", v, x);

       $ R -d gdb
       (gdb) run
       > # now R is running

       There are two ways to write error-free programs; only the third one works. – Alan Perlis, Epigrams on Programming
  • 46. R’s Public API
       mkdir R-devel-build; cd R-devel-build
       ~/src/R-devel-src/configure && make
       ## hint: configure --help
       ls include
       R.h  Rdefines.h  Rinterface.h  S.h  R_ext/  Rembedded.h  Rinternals.h

       Details in WRE
  • 47. Calling R functions from C
       1. Build a pairlist
       2. Call Rf_eval(s, rho)
  • 48. Remapped functions
       Source is unadorned
       Preprocessor adds Rf_
       Object code contains Rf_
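The three points about remapping can be reproduced with an ordinary preprocessor macro. The sketch below uses made-up names (Demo_add, add, sum_three) purely for illustration; R's headers apply the same technique with the Rf_ prefix.

```c
/* The implementation carries the prefixed name; this is the symbol that
   appears in the compiled object code. */
int Demo_add(int a, int b)
{
    return a + b;
}

/* The remap: from here on the preprocessor rewrites add to Demo_add. */
#define add Demo_add

/* Source written with the unadorned name. After preprocessing, both calls
   below refer to Demo_add. */
int sum_three(int a, int b, int c)
{
    return add(add(a, b), c);
}
```

This explains a common source of confusion when debugging: the name typed in the source file is not the name a debugger or linker reports, so breakpoints and symbol searches need the prefixed form.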
  • 49. Finding C implementations
       cd R-devel/src/main
       grep '"any"' names.c
       {"any", do_logic3, 2, ...}
       grep -l do_logic3 *.c
       logic.c
  • 50. R CMD rtags
       cd R-devel/src
       R CMD rtags --no-Rd
       # creates index file TAGS
  • 52. Writing R Extensions RShowDoc("R-exts")
  • 53. Book: The C Programming Language (K & R)
       Java in a Nutshell             1264 pages
       The C++ Programming Language   1030 pages
       The C Programming Language      274 pages
  • 54. Resources
       Read WRE and “K & R”
       Use the sources for R and packages
       R-devel, bioc-devel mailing lists
       Patience
