RIOT: I/O-efficient numerical computing without SQL
Journal
CIDR 2009 - 4th Biennial Conference on Innovative Data Systems Research
Date Issued
2009
Author(s)
Abstract
R is a numerical computing environment that is widely popular for
statistical data analysis. Like many such environments, R performs
poorly for large datasets whose sizes exceed physical memory. We
present our vision of RIOT (R with I/O Transparency), a system that
makes R programs I/O-efficient in a way transparent to the users. We
describe our experience with RIOT-DB, an initial prototype that uses a
relational database system as a backend. Despite the overhead and
inadequacy of generic database systems in handling array data and
numerical computation, RIOT-DB significantly outperforms R in many
large-data scenarios, thanks to a suite of high-level, inter-operation
optimizations that integrate seamlessly into R. While many techniques
in RIOT are inspired by databases (and, for RIOT-DB, realized by a
database system), RIOT users are insulated from anything database
related. Compared with previous approaches that require users to learn
new languages and rewrite their programs to interface with a database,
RIOT will, we believe, be easier for the majority of R users to adopt.
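The abstract does not spell out what the "inter-operation optimizations" are. As a hedged illustration only, the Python sketch below shows one general technique of that kind, deferred (lazy) evaluation: arithmetic on vectors builds an expression instead of computing a result, so a chain of operations can be evaluated block by block without writing full-size intermediate vectors, and a request for a few elements computes only those elements. The `Vec` class, its methods, and the block size are hypothetical names chosen for this sketch; they are not RIOT's or R's API, and an in-memory list stands in for disk-resident data.

```python
from typing import Callable, List, Optional, Tuple


class Vec:
    """Hypothetical deferred vector: operators build an expression, not a result."""

    def __init__(self, n: int, compute: Callable[[int, int], List[float]]):
        self.n = n                # logical length of the vector
        self._compute = compute   # produces elements [lo, hi) on demand

    @staticmethod
    def source(data: List[float],
               reads: Optional[List[Tuple[int, int]]] = None) -> "Vec":
        # Stand-in for a vector stored on disk; each slice simulates one block read.
        def compute(lo: int, hi: int) -> List[float]:
            if reads is not None:
                reads.append((lo, hi))  # record which range was "read"
            return data[lo:hi]
        return Vec(len(data), compute)

    def _binop(self, other: "Vec", op: Callable[[float, float], float]) -> "Vec":
        if self.n != other.n:
            raise ValueError("length mismatch")

        def compute(lo: int, hi: int) -> List[float]:
            # Both operands are produced for the same range and combined at once,
            # so no full-length intermediate vector is ever materialized.
            left, right = self._compute(lo, hi), other._compute(lo, hi)
            return [op(a, b) for a, b in zip(left, right)]
        return Vec(self.n, compute)

    def __add__(self, other: "Vec") -> "Vec":
        return self._binop(other, lambda a, b: a + b)

    def __sub__(self, other: "Vec") -> "Vec":
        return self._binop(other, lambda a, b: a - b)

    def __mul__(self, other: "Vec") -> "Vec":
        return self._binop(other, lambda a, b: a * b)

    def head(self, k: int) -> List[float]:
        # Selective evaluation: only the first k elements are computed.
        return self._compute(0, min(k, self.n))

    def evaluate(self, block: int = 4) -> List[float]:
        # Stream the whole expression one block at a time.
        out: List[float] = []
        for lo in range(0, self.n, block):
            out.extend(self._compute(lo, min(lo + block, self.n)))
        return out


x = Vec.source([float(i) for i in range(10)])
y = Vec.source([2.0] * 10)
expr = x * y + Vec.source([1.0] * 10)  # builds an expression; no arithmetic yet
print(expr.head(3))  # [1.0, 3.0, 5.0]
```

The design choice being illustrated is that evaluation is postponed until a result is actually needed, which gives the system the whole chain of operations to optimize at once rather than one operation at a time. A real system would choose block sizes to fit memory and would also need to avoid recomputing subexpressions that are referenced more than once, which this sketch does not attempt.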

