R (programming language)
Paradigms  Multi-paradigm: procedural, object-oriented, functional, reflective, imperative, array^{[1]} 

Designed by  Ross Ihaka and Robert Gentleman 
Developer  R Core Team 
First appeared  August 1993 
Stable release  
Typing discipline  Dynamic 
Platform  arm64 and x86-64 
License  GNU GPL v2^{[3]} 
Filename extensions 

Website  www 
Influenced by  
Influenced  Julia^{[7]} 

R is a programming language for statistical computing and data visualization. It has been adopted in the fields of data mining, bioinformatics, and data analysis.^{[8]}
The core R language is augmented by a large number of extension packages, containing reusable code, documentation, and sample data.
R software is open-source and free software. It is licensed by the GNU Project and available under the GNU General Public License.^{[3]} It is written primarily in C, Fortran, and R itself. Precompiled executables are provided for various operating systems.
As an interpreted language, R has a native command-line interface. Moreover, multiple third-party graphical user interfaces are available, such as RStudio (an integrated development environment) and Jupyter (a notebook interface).
History
R was started by professors Ross Ihaka and Robert Gentleman as a programming language to teach introductory statistics at the University of Auckland.^{[9]} The language was inspired by the S programming language, with most S programs able to run unaltered in R.^{[6]} The language was also inspired by Scheme's lexical scoping, allowing for local variables.^{[1]}
The name of the language, R, comes from being both an S language successor and the shared first letter of the authors, Ross and Robert.^{[10]} In August 1993, Ihaka and Gentleman posted a binary of R on StatLib, a data archive website. At the same time, they announced the posting on the s-news mailing list.^{[11]} On December 5, 1997, R became a GNU project when version 0.60 was released.^{[12]} On February 29, 2000, the first official 1.0 version was released.^{[13]}
Examples
Mean - a measure of center
A numeric data set may have a central tendency — where some of the most typical data points reside.^{[14]} The arithmetic mean (average) is the most commonly used measure of central tendency.^{[14]} The mean of a numeric data set is the sum of the data points divided by the number of data points.^{[14]}
 Let $\bar{x}$ = the mean of a data set.
 Let $x$ = a list of data points.
 Let $n$ = the number of data points.
Suppose a sample of four observations of Celsius temperature measurements was taken 12 hours apart.
 Let $x$ = a list of degrees Celsius data points of 30, 27, 31, 28.
This R computer program will output the mean of $x$:
# The c() function "combines" a list into a single object.
x <- c( 30, 27, 31, 28 )
sum <- sum( x )
length <- length( x )
mean <- sum / length
message( "Mean:" )
print( mean )
Note: R can have the same identifier represent both a function name and its result. For more information, see scope.
Output:
Mean:
[1] 29
This R program will execute the native mean() function to output the mean of $x$:
x <- c( 30, 27, 31, 28 )
message( "Mean:" )
print( mean( x ) )
Output:
Mean:
[1] 29
Standard deviation - a measure of dispersion
The standard deviation of a numeric data set indicates the average distance of the data points from the mean.^{[15]} For a data set with a small amount of variation, each data point will be close to the mean, so the standard deviation will be small.^{[15]}
 Let $s$ = the standard deviation of a data set.
 Let $x$ = a list of data points.
 Let $n$ = the number of data points.
Suppose a sample of four observations of Celsius temperature measurements was taken 12 hours apart.
 Let $x$ = a list of degrees Celsius data points of 30, 27, 31, 28.
This R program will output the standard deviation of $x$:
x <- c( 30, 27, 31, 28 )
distanceFromMean <- x - mean( x )
distanceFromMeanSquared <- distanceFromMean ** 2
distanceFromMeanSquaredSum <- sum( distanceFromMeanSquared )
variance <- distanceFromMeanSquaredSum / ( length( x ) - 1 )
standardDeviation <- sqrt( variance )
message( "Standard deviation:" )
print( standardDeviation )
Output:
Standard deviation:
[1] 1.825742
This R program will execute the native sd() function to output the standard deviation of $x$:
x <- c( 30, 27, 31, 28 )
message( "Standard deviation:" )
print( sd( x ) )
Output:
Standard deviation:
[1] 1.825742
Linear regression - a measure of relation
A phenomenon may be the result of one or more observable events. For example, the phenomenon of skiing accidents may be the result of having snow in the mountains. A method to measure whether or not a numeric data set is related to another data set is linear regression.^{[17]}
 Let $x$ = a data set of independent data points, in which each point occurred at a specific time.
 Let $y$ = a data set of dependent data points, in which each point occurred at the same time as an independent data point.
If a linear relationship exists, then a scatter plot of the two data sets will show a pattern that resembles a straight line.^{[18]} If a straight line is embedded into the scatter plot such that the average distance from all the points to the line is minimal, then the line is called a regression line. The equation of the regression line is called the regression equation.^{[19]}
The regression equation is a linear equation; therefore, it has a slope and a y-intercept. The format of the regression equation is $\hat{y} = b_0 + b_1 x$.^{[20]}^{[a]}
 Let $b_1$ = the slope of the regression equation.
 Let $b_0$ = the y-intercept of the regression equation.
Suppose a sample of four observations of Celsius temperature measurements were taken 12 hours apart. At the same time, the thermometer was switched to Fahrenheit temperature and another measurement was taken.
 Let $x$ = a list of degrees Celsius data points of 30, 27, 31, 28.
 Let $y$ = a list of degrees Fahrenheit data points of 86.0, 80.6, 87.8, 82.4.
This R program will output the slope and y-intercept of a linear relationship in which $y$ depends upon $x$:
x <- c( 30, 27, 31, 28 )
y <- c( 86.0, 80.6, 87.8, 82.4 )
# Build the numerator
independentDistanceFromMean <- x - mean( x )
sampledDependentDistanceFromMean <- y - mean( y )
independentDistanceTimesSampledDistance <-
    independentDistanceFromMean *
    sampledDependentDistanceFromMean
independentDistanceTimesSampledDistanceSum <-
    sum( independentDistanceTimesSampledDistance )
# Build the denominator
independentDistanceFromMeanSquared <-
    independentDistanceFromMean ** 2
independentDistanceFromMeanSquaredSum <-
    sum( independentDistanceFromMeanSquared )
# Slope is rise over run
slope <-
    independentDistanceTimesSampledDistanceSum /
    independentDistanceFromMeanSquaredSum
yIntercept <- mean( y ) - slope * ( mean( x ) )
message( "Slope:" )
print( slope )
message( "Y-intercept:" )
print( yIntercept )
Output:
Slope:
[1] 1.8
Y-intercept:
[1] 32
This R program will execute the native functions to output the slope and y-intercept:
x <- c( 30, 27, 31, 28 )
y <- c( 86.0, 80.6, 87.8, 82.4 )
# Execute lm() with Fahrenheit depends upon Celsius
linearModel <- lm( y ~ x )
# coefficients() returns a structure containing the slope and y-intercept
coefficients <- coefficients( linearModel )
# Extract the slope from the structure
slope <- coefficients[["x"]]
# Extract the y-intercept from the structure
yIntercept <- coefficients[["(Intercept)"]]
message( "Slope:" )
print( slope )
message( "Y-intercept:" )
print( yIntercept )
Output:
Slope:
[1] 1.8
Y-intercept:
[1] 32
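Once a model has been fitted, the regression equation can be used to estimate a response for a new observation. The following is a minimal sketch using R's predict() function; the new reading of 25 degrees Celsius and the names newCelsius and predictedFahrenheit are illustrative assumptions, not part of the example above.

```r
x <- c( 30, 27, 31, 28 )
y <- c( 86.0, 80.6, 87.8, 82.4 )
linearModel <- lm( y ~ x )
# Hypothetical new observation; the column name must match the predictor x.
newCelsius <- data.frame( x = 25 )
# predict() passes the new value through the fitted regression equation.
predictedFahrenheit <- predict( linearModel, newdata = newCelsius )
print( predictedFahrenheit )
```

Because the fitted equation is 32 + 1.8x, the estimate for 25 degrees Celsius is approximately 77 degrees Fahrenheit.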
Coefficient of determination - a percentage of variation
The coefficient of determination determines the percentage of variation explained by the independent variable.^{[21]} It always lies between 0 and 1.^{[22]} A value of 0 indicates no relationship between the two data sets, and a value near 1 indicates the regression equation is extremely useful for making predictions.^{[23]}
 Let $\hat{y}$ = the data set of predicted response data points when the independent data points are passed through the regression equation.
 Let $r^2$ = the coefficient of determination in a relationship between an independent variable and a dependent variable.
This R program will output the coefficient of determination of the linear relationship between $x$ and $y$:
x <- c( 30, 27, 31, 28 )
y <- c( 86.0, 80.6, 87.8, 82.4 )
# Build the numerator
linearModel <- lm( y ~ x )
coefficients <- coefficients( linearModel )
slope <- coefficients[["x"]]
yIntercept <- coefficients[["(Intercept)"]]
predictedResponse <- yIntercept + ( slope * x )
predictedResponseDistanceFromMean <-
    predictedResponse - mean( y )
predictedResponseDistanceFromMeanSquared <-
    predictedResponseDistanceFromMean ** 2
predictedResponseDistanceFromMeanSquaredSum <-
    sum( predictedResponseDistanceFromMeanSquared )
# Build the denominator
sampledResponseDistanceFromMean <- y - mean( y )
sampledResponseDistanceFromMeanSquared <-
    sampledResponseDistanceFromMean ** 2
sampledResponseDistanceFromMeanSquaredSum <-
    sum( sampledResponseDistanceFromMeanSquared )
coefficientOfDetermination <-
    predictedResponseDistanceFromMeanSquaredSum /
    sampledResponseDistanceFromMeanSquaredSum
message( "Coefficient of determination:" )
print( coefficientOfDetermination )
Output:
Coefficient of determination:
[1] 1
This R program will execute the native functions to output the coefficient of determination:
x <- c( 30, 27, 31, 28 )
y <- c( 86.0, 80.6, 87.8, 82.4 )
linearModel <- lm( y ~ x )
summary <- summary( linearModel )
coefficientOfDetermination <- summary[["r.squared"]]
message( "Coefficient of determination:" )
print( coefficientOfDetermination )
Output:^{[b]}
Coefficient of determination:
[1] 1
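For a regression with a single independent variable and an intercept, the coefficient of determination equals the square of the correlation between the two data sets. The sketch below checks this with R's cor() function. It assumes a different, non-perfect data set (the integers 1 to 6 and their squares) so that the fit is not exact; the names rSquaredFromModel and rSquaredFromCorrelation are illustrative.

```r
x <- 1:6
y <- x^2
linearModel <- lm( y ~ x )
# Coefficient of determination as reported by the model summary
rSquaredFromModel <- summary( linearModel )[["r.squared"]]
# Square of the correlation between the two data sets
rSquaredFromCorrelation <- cor( x, y ) ^ 2
print( rSquaredFromModel )
print( rSquaredFromCorrelation )
```

Both values agree up to floating point rounding.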
Single plot
This R program will display a scatter plot with an embedded regression line and regression equation illustrating the relationship between $x$ and $y$:
x <- c( 30, 27, 31, 28 )
y <- c( 86.0, 80.6, 87.8, 82.4 )
linearModel <- lm( y ~ x )
coefficients <- coefficients( linearModel )
slope <- coefficients[["x"]]
intercept <- coefficients[["(Intercept)"]]
# Execute paste() to build the regression equation string
regressionEquation <- paste( "y =", intercept, "+", slope, "x" )
# Display a scatter plot with the regression equation as a subtitle
plot(
    x,
    y,
    main = "Fahrenheit Depends Upon Celsius",
    sub = regressionEquation,
    xlab = "Degrees Celsius",
    ylab = "Degrees Fahrenheit" )
# Draw the regression line on top of the scatter plot
abline( linearModel )
Output:
Multi plot
This R program will generate a multiplot and a table of residuals.
# The independent variable is a list of numbers 1 to 6.
x <- 1:6
# The dependent variable is a list of each independent variable squared.
y <- x^2
# Executing the linear model on a quadratic equation will produce residuals.
linearModel <- lm(y ~ x)
# Display the residuals.
summary( linearModel )
# Create a 2 by 2 multiplot.
par(mfrow = c(2, 2))
# Output the multiplot.
plot( linearModel )
Output:
Residuals:
      1       2       3       4       5       6
 3.3333 -0.6667 -2.6667 -2.6667 -0.6667  3.3333
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -9.3333     2.8441  -3.282 0.030453 *
x             7.0000     0.7303   9.585 0.000662 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.055 on 4 degrees of freedom
Multiple R-squared:  0.9583, Adjusted R-squared:  0.9478
F-statistic: 91.88 on 1 and 4 DF,  p-value: 0.000662
Mandelbrot graphic
This Mandelbrot set example highlights the use of complex numbers. It models the first 20 iterations of the equation z = z^{2} + c, where c represents different complex constants.
Install the package that provides the write.gif() function beforehand:
install.packages("caTools")
R program:
library(caTools)
jet.colors <-
    colorRampPalette(
        c("green", "pink", "#007FFF", "cyan", "#7FFF7F",
          "white", "#FF7F00", "red", "#7F0000"))
dx <- 1500 # define width
dy <- 1400 # define height
C <-
    complex(
        real =
            rep(
                seq(-2.2, 1.0, length.out = dx), each = dy),
        imag = rep(seq(-1.2, 1.2, length.out = dy),
                   dx))
# reshape as matrix of complex numbers
C <- matrix(C, dy, dx)
# initialize output 3D array
X <- array(0, c(dy, dx, 20))
Z <- 0
# loop with 20 iterations
for (k in 1:20) {
    # the central difference equation
    Z <- Z^2 + C
    # capture the results
    X[, , k] <- exp(-abs(Z))
}
write.gif(
X,
"Mandelbrot.gif",
col = jet.colors,
delay = 100)
Output:
Programming
R is an interpreted language, so programmers typically access it through a command-line interpreter. If a programmer types 1+1 at the R command prompt and presses enter, the computer replies with 2.^{[24]} Programmers also save R programs to a file and then execute them with the batch interpreter Rscript.^{[25]}
Object
R stores data inside an object. An object is assigned a name which the computer program uses to set and retrieve a value.^{[26]} An object is created by placing its name to the left of the symbol-pair <-.^{[27]} The symbol-pair <- is called the assignment operator.^{[28]}
To create an object named x and assign it the integer value 82:
x <- 82L
print( x )
Output:
[1] 82
The [1] displayed before the number is a subscript. It shows that the container for this integer is index one of an array.
Vector
The most primitive R object is the vector.^{[29]} A vector is a one-dimensional array of data. To assign multiple elements to the array, use the c() function to "combine" the elements. The elements must be the same data type.^{[30]} R lacks scalar data types, which are placeholders for a single word (usually an integer). Instead, a single integer is stored into the first element of an array. The single integer is retrieved using the index subscript of [1].^{[c]}
R program to store and retrieve a single integer:
store <- 82L
retrieve <- store[1]
print( retrieve[1] )
Output:
[1] 82
Elementwise operation
When an operation is applied to a vector, R will apply the operation to each element in the array. This is called an elementwise operation.^{[31]}
This example creates the object named x and assigns it integers 1 through 3. The object is displayed and then displayed again with one added to each element:
x <- 1:3
print( x )
print( x + 1 )
Output:
[1] 1 2 3
[1] 2 3 4
To achieve the many additions, R implements vector recycling.^{[31]} The numeral one following the plus sign (+) is converted into an internal array of three ones. The + operation simultaneously loops through both arrays and performs the addition on each element pair. The results are stored into another internal array of three elements which is returned to the print() function.
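Recycling is not limited to a single number. As a minimal sketch, the example below adds a two-element vector to a four-element vector; the shorter vector is reused from its start until it matches the longer one. The object names long, short, and recycled are illustrative.

```r
long <- c( 1, 2, 3, 4 )
short <- c( 10, 20 )
# short is recycled to 10, 20, 10, 20 before the element-wise addition.
recycled <- long + short
print( recycled )
```

The result holds the values 11, 22, 13, and 24.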
Numeric vector
A numeric vector is used to store integers and floating point numbers.^{[32]} The primary characteristic of a numeric vector is the ability to perform arithmetic on the elements.^{[32]}
Integer vector
By default, integers (numbers without a decimal point) are stored as floating point. To force integer memory allocation, append an L to the number. As an exception, the sequence operator : will, by default, allocate integer memory.
R program:
x <- 82L
print( x[1] )
message( "Data type:" )
typeof( x )
Output:
[1] 82
Data type:
[1] "integer"
R program:
x <- c( 1L, 2L, 3L )
print( x )
message( "Data type:" )
typeof( x )
Output:
[1] 1 2 3
Data type:
[1] "integer"
R program:
x <- 1:3
print( x )
message( "Data type:" )
typeof( x )
Output:
[1] 1 2 3
Data type:
[1] "integer"
Double vector
A double vector stores real numbers, which are also known as floating point numbers. The memory allocation for a floating point number is double precision.^{[32]} Double precision is the default memory allocation for numbers with or without a decimal point.
R program:
x <- 82
print( x[1] )
message( "Data type:" )
typeof( x )
Output:
[1] 82
Data type:
[1] "double"
R program:
x <- c( 1, 2, 3 )
print( x )
message( "Data type:" )
typeof( x )
Output:
[1] 1 2 3
Data type:
[1] "double"
Logical vector
A logical vector stores binary data: either TRUE or FALSE. The purpose of this vector is to store the result of a comparison.^{[33]} A logical datum is expressed as either TRUE, T, FALSE, or F.^{[33]} The capital letters are required, and no quotes surround the constants.^{[33]}
R program:
x <- 3 < 4
print( x[1] )
message( "Data type:" )
typeof( x )
Output:
[1] TRUE
Data type:
[1] "logical"
Two vectors may be compared using the following logical operators:^{[34]}
Operator  Syntax  Tests 

>  a > b  Is a greater than b? 
>=  a >= b  Is a greater than or equal to b? 
<  a < b  Is a less than b? 
<=  a <= b  Is a less than or equal to b? 
==  a == b  Is a equal to b? 
!=  a != b  Is a not equal to b? 
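The operators in the table compare vectors element by element and return a logical vector of the same length. A minimal sketch, with illustrative vectors a and b:

```r
a <- c( 1, 5, 3 )
b <- c( 2, 5, 1 )
# Each comparison is applied to the pair of elements at the same index.
greaterThan <- a > b
equalTo <- a == b
print( greaterThan )
print( equalTo )
```

Here only the third pair satisfies a > b, and only the second pair satisfies a == b.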
Character vector
A character vector stores character strings.^{[35]} Strings are created by surrounding text in double quotation marks.^{[35]}
R program:
x <- "hello world"
print( x[1] )
message( "Data type:" )
typeof( x )
Output:
[1] "hello world"
Data type:
[1] "character"
R program:
x <- c( "hello", "world" )
print( x )
message( "Data type:" )
typeof( x )
Output:
[1] "hello" "world"
Data type:
[1] "character"
Factor
A factor is a vector that stores a categorical variable.^{[36]} The factor() function converts a text string into an enumerated type, which is stored as an integer.^{[37]}
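The integer storage behind a factor can be inspected directly. The sketch below assumes a made-up categorical variable named size; levels() lists the categories and as.integer() reveals the integer code stored for each element.

```r
# Hypothetical categorical data with an explicit level order
size <- factor( c( "small", "large", "small" ), levels = c( "small", "large" ) )
sizeLevels <- levels( size )
sizeCodes <- as.integer( size )
print( sizeLevels )
print( sizeCodes )
print( typeof( size ) )
```

Each element is stored as the position of its category in the level list, so "small" is stored as 1 and "large" as 2.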
In experimental design, a factor is an independent variable to test (an input) in a controlled experiment.^{[38]} A controlled experiment is used to establish causation, not just association.^{[39]} For example, one could notice that an increase in hot chocolate sales is associated with an increase in skiing accidents.
An experimental unit is an item that an experiment is being performed upon. If the experimental unit is a person, then it is known as a subject. A response variable (also known as a dependent variable) is a possible outcome from an experiment. A factor level is a characteristic of a factor. A treatment is an environment consisting of a combination of one level (characteristic) from each of the input factors. A replicate is the execution of a treatment on an experimental unit and yields response variables.^{[40]}
This example builds two R programs to model an experiment to increase the growth of a species of cactus. Two factors are tested:
 water levels of none, light, or medium
 superabsorbent polymer levels of not used or used
R program to set up the design:
# Step 1 is to establish the levels of a factor.
# Vector of the water levels:
waterLevel <-
    c(
        "none",
        "light",
        "medium" )
# Step 2 is to create the factor.
# Vector of the water factor:
waterFactor <-
    factor(
        # Although a subset is possible, use all of the levels.
        waterLevel,
        levels = waterLevel )
# Vector of the polymer levels:
polymerLevel <-
    c(
        "notUsed",
        "used" )
# Vector of the polymer factor:
polymerFactor <-
    factor(
        polymerLevel,
        levels = polymerLevel )
# The treatments are the Cartesian product of both factors.
treatmentCartesianProduct <-
    expand.grid(
        waterFactor,
        polymerFactor )
message( "Water factor:" )
print( waterFactor )
message( "\nPolymer factor:" )
print( polymerFactor )
message( "\nTreatment Cartesian product:" )
print( treatmentCartesianProduct )
Output:
Water factor:
[1] none light medium
Levels: none light medium
Polymer factor:
[1] notUsed used
Levels: notUsed used
Treatment Cartesian product:
Var1 Var2
1 none notUsed
2 light notUsed
3 medium notUsed
4 none used
5 light used
6 medium used
R program to store and display the results:
experimentalUnit <- c( "cactus1", "cactus2", "cactus3" )
replicateWater <- c( "none", "light", "medium" )
replicatePolymer <- c( "notUsed", "used", "notUsed" )
replicateInches <- c( 82L, 83L, 84L )
response <-
    data.frame(
        experimentalUnit,
        replicateWater,
        replicatePolymer,
        replicateInches )
print( response )
Output:
experimentalUnit replicateWater replicatePolymer replicateInches
1 cactus1 none notUsed 82
2 cactus2 light used 83
3 cactus3 medium notUsed 84
Data frame
A data frame stores a two-dimensional array.^{[41]} The horizontal dimension is a list of vectors. The vertical dimension is a list of rows. It is the most useful structure for data analysis.^{[42]} Data frames are created using the data.frame() function. The input is a list of vectors (of any data type). Each vector becomes a column in a table. The elements in each vector are aligned to form the rows in the table.
R program:
integer <- c( 82L, 83L )
string <- c( "hello", "world" )
data.frame <- data.frame( integer, string )
print( data.frame )
message( "Data type:" )
class( data.frame )
Output:
integer string
1 82 hello
2 83 world
Data type:
[1] "data.frame"
Data frames can be deconstructed by providing a vector's name between double brackets. This returns the original vector. Each element in the returned vector can be accessed by its index number.
R program to extract the word "world". It is stored in the second element of the "string" vector:
integer <- c( 82L, 83L )
string <- c( "hello", "world" )
data.frame <- data.frame( integer, string )
vector <- data.frame[["string"]]
print( vector[2] )
message( "Data type:" )
typeof( vector )
Output:
[1] "world"
Data type:
[1] "character"
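Double brackets are one of several ways to reach into a data frame. As a hedged sketch, the example below shows two other common base R forms: the $ operator, which extracts a column by name, and row and column indexing with single brackets. The object names df, byDollar, and byPosition are illustrative, and stringsAsFactors = FALSE is passed explicitly so the result does not depend on the R version.

```r
df <- data.frame(
    integer = c( 82L, 83L ),
    string = c( "hello", "world" ),
    stringsAsFactors = FALSE )
# Extract a whole column by name
byDollar <- df$string
# Extract a single cell by row number and column name
byPosition <- df[ 2, "string" ]
print( byDollar )
print( byPosition )
```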
Vectorized coding
Vectorized coding is a method to produce quality R computer programs that take advantage of R's strengths.^{[43]} The R language is designed to be fast at logical testing, subsetting, and element-wise execution.^{[43]} On the other hand, R does not have a fast for loop.^{[44]} For example, R can search-and-replace faster using logical vectors than by using a for loop.^{[44]}
For loop
A for loop repeats a block of code for a specific number of iterations.^{[45]}
Example to search-and-replace using a for loop:
vector <- c( "one", "two", "three" )
for ( i in 1:length( vector ) )
{
    if ( vector[ i ] == "one" )
    {
        vector[ i ] <- "1"
    }
}
message( "Replaced vector:" )
print( vector )
Output:
Replaced vector:
[1] "1" "two" "three"
Subsetting
R's syntax allows for a logical vector to be used as an index to a vector.^{[46]} This method is called subsetting.^{[47]}
R example:
vector <- c( "one", "two", "three" )
print( vector[ c( TRUE, FALSE, TRUE ) ] )
Output:
[1] "one" "three"
Change a value using an index number
R allows for the assignment operator <- to overwrite an existing value in a vector by using an index number.^{[28]}
R example:
vector <- c( "one", "two", "three" )
vector[ 1 ] <- "1"
print( vector )
Output:
[1] "1" "two" "three"
Change a value using subsetting
R also allows for the assignment operator <- to overwrite an existing value in a vector by using a logical vector.
R example:
vector <- c( "one", "two", "three" )
vector[ c( TRUE, FALSE, FALSE ) ] <- "1"
print( vector )
Output:
[1] "1" "two" "three"
Vectorized code to search-and-replace
Because a logical vector may be used as an index, and because the logical operator returns a vector, a search-and-replace can take place without a for loop.
R example:
vector <- c( "one", "two", "three" )
vector[ vector == "one" ] <- "1"
print( vector )
Output:
[1] "1" "two" "three"
Functions
A function is an object that stores computer code instead of data.^{[48]} The purpose of storing code inside a function is to be able to reuse it in another context.^{[48]}
Native functions
R comes with over 1,000 native functions to perform common tasks.^{[49]} To execute a function:
 type in the function's name
 type in an open parenthesis (
 type in the data to be processed
 type in a close parenthesis )
This example rolls a die one time. The native function's name is sample(). The data to be processed are:
 a numeric integer vector from one to six
 the size parameter, which instructs sample() to execute the roll one time
sample( 1:6, size=1 )
Possible output:
[1] 6
The R interpreter provides a help screen for each native function. The help screen is displayed after typing in a question mark followed by the function's name:
?sample
Partial output:
Description:
‘sample’ takes a sample of the specified size from the elements of
‘x’ using either with or without replacement.
Usage:
sample(x, size, replace = FALSE, prob = NULL)
Function parameters
The sample() function has four input parameters available. Input parameters are pieces of information that control the function's behavior. Input parameters may be communicated to the function in a combination of three ways:
 by position separated with commas
 by name separated with commas and the equal sign
 left empty
For example, each of these calls to sample() will roll a die one time:
sample( 1:6, 1, F, NULL )
sample( 1:6, 1 )
sample( 1:6, size=1 )
sample( size=1, x=1:6 )
Every input parameter has a name.^{[50]} If a function has many parameters, setting name = data will make the source code more readable.^{[51]} If the parameter's name is omitted, R will match the data in the position order.^{[51]} Usually, parameters that are rarely used will have a default value and may be omitted.
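The same three calling styles apply to functions that a programmer writes. The sketch below defines a hypothetical function, rollDie(), whose parameters sides and rolls both have default values; neither the function nor its parameter names come from R itself.

```r
# Hypothetical function: both parameters have default values.
rollDie <- function( sides = 6, rolls = 1 )
{
    sample( 1:sides, size = rolls, replace = TRUE )
}
# Left empty: both defaults are used.
byDefault <- rollDie()
# By name: the order of the parameters does not matter.
byName <- rollDie( rolls = 3, sides = 20 )
# By position: data is matched to sides first, then rolls.
byPosition <- rollDie( 20, 3 )
```

Naming the parameters makes the second call easier to read than the third, even though both roll a 20-sided die three times.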
Data coupling
The output from a function may become the input to another function. This is the basis for data coupling.^{[52]}
This example executes the function sample() and sends the result to the function sum(). It simulates the roll of two dice and adds them up.
sum( sample( 1:6, size=2, replace=TRUE ) )
Possible output:
[1] 7
Functions as parameters
A function has parameters typically to input data. Alternatively, a function (A) can use a parameter to input another function (B). Function (A) will assume responsibility to execute function (B).
For example, the function replicate() has an input parameter that accepts an expression, such as a call to another function. This example will execute replicate() once, and replicate() will execute sample() five times. It will simulate rolling a die five times:
replicate( 5, sample( 1:6, size=1 ) )
Possible output:
[1] 2 4 1 4 5
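A function object can also be handed over by name, without parentheses, so that the receiving function decides when to call it. The sketch below uses base R's sapply(), which calls the supplied function once for each element of a vector; the helper square() is hypothetical.

```r
# Hypothetical helper: square a number.
square <- function( n )
{
    n * n
}
# sapply() receives the function object itself and executes it per element.
squares <- sapply( 1:5, square )
print( squares )
```

The result holds the values 1, 4, 9, 16, and 25.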
Uniform distribution
Because each face of a die is equally likely to appear on top, rolling a die many times generates the uniform distribution.^{[53]} This example displays a histogram of a die rolled 10,000 times:
hist( replicate( 10000, sample( 1:6, size=1 ) ) )
The output is likely to have a flat top:
Central limit theorem
Whereas a numeric data set may have a central tendency, it also may not have a central tendency. Nonetheless, a data set of the arithmetic mean of many samples will have a central tendency to converge to the population's mean. The arithmetic mean of a sample is called the sample mean.^{[54]} The central limit theorem states that for a sample size of 30 or more, the distribution of the sample mean ($\bar{x}$) is approximately normally distributed, regardless of the distribution of the variable under consideration ($x$).^{[55]} A histogram displaying a frequency of data point averages will show the distribution of the sample mean resembles a bell-shaped curve.
For example, rolling one die many times generates the uniform distribution. Nonetheless, rolling 30 dice and calculating each average ($\bar{x}$) over and over again generates a normal distribution.
R program to roll 30 dice 10,000 times and plot the frequency of averages:
hist(
replicate(
10000,
mean(
sample(
1:6,
size=30,
replace=T ) ) ) )
The output is likely to have a bell shape:
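The convergence can also be checked numerically rather than visually. The sketch below repeats the same simulation and compares the results with the values expected for a fair die: a population mean of 3.5 and a standard deviation of the sample mean of sqrt(35/12) / sqrt(30). The seed value and object names are arbitrary assumptions.

```r
# Fix the random seed so the simulation is repeatable.
set.seed( 42 )
sampleMeans <-
    replicate(
        10000,
        mean( sample( 1:6, size = 30, replace = TRUE ) ) )
# Theoretical values for a fair six-sided die
populationMean <- 3.5
expectedSpread <- sqrt( 35 / 12 ) / sqrt( 30 )
message( "Mean of sample means:" )
print( mean( sampleMeans ) )
message( "Standard deviation of sample means:" )
print( sd( sampleMeans ) )
```

Both printed values should land close to 3.5 and roughly 0.31 respectively.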
Programmer-created functions
To create a function object, execute the function() statement and assign the result to a name.^{[56]} A function receives input both from global variables and input parameters (often called arguments). Objects created within the function body remain local to the function.
R program to create a function:
# The input parameters are x and y.
# The return value is a numeric double vector.
f <- function(x, y)
{
    first_expression <- x * 2
    second_expression <- y * 3
    first_expression + second_expression
# The return statement may be omitted
# if the last expression is unassigned.
# This will save a few clock cycles.
}
Usage output:
> f(1, 2)
[1] 8
Function arguments are passed in by value.
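A minimal sketch of what passing by value means in practice: a change made to an argument inside a function does not alter the caller's object. The function addOne() and the object names are hypothetical.

```r
# Hypothetical function: changes its own copy of the argument.
addOne <- function( v )
{
    v[1] <- v[1] + 1
    v
}
original <- c( 10, 20, 30 )
modified <- addOne( original )
# The caller's vector is unchanged; only the returned copy differs.
print( original )
print( modified )
```

After the call, original still holds 10, 20, 30, while modified holds 11, 20, 30.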
If statements
R program illustrating if statements:
minimum <- function( a, b )
{
    if ( a < b )
        minimum <- a
    else
        minimum <- b
    return( minimum )
}
maximum <- function( a, b )
{
    if ( a > b )
        maximum <- a
    else
        maximum <- b
    return( maximum )
}
range <- function( a, b, c )
{
    range <-
        maximum( a, maximum( b, c ) ) -
        minimum( a, minimum( b, c ) )
    return( range )
}
range( 10, 4, 7 )
Output:
[1] 6
Generic functions
R supports generic functions. They act differently depending on the class of the argument passed in. The process is to dispatch the method specific to the class. A common implementation is R's print() function. It can print almost every class of object. For example, print(objectName).^{[57]}
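Dispatch can be illustrated with a small sketch. It assumes a made-up class named "temperature" and supplies a method for it by following the generic.class naming convention, so that print() selects print.temperature() for objects of that class.

```r
# Hypothetical object tagged with the class "temperature"
reading <- structure( list( degrees = 30 ), class = "temperature" )
# Method that print() dispatches to for the "temperature" class
print.temperature <- function( x, ... )
{
    cat( "Temperature:", x$degrees, "degrees Celsius\n" )
    invisible( x )
}
print( reading )
# Capture the printed line for inspection
printed <- capture.output( print( reading ) )
```

The call to print( reading ) displays "Temperature: 30 degrees Celsius" instead of the default list layout.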
Programming shortcuts
R provides three notable shortcuts available to programmers.
Omit the print() function
If an object is present on a line by itself, then the interpreter will send the object to the print() function.^{[58]}
R example:
integer <- 82L
integer
Output:
[1] 82
Omit the return() statement
If a programmer-created function omits the return() statement, then the interpreter will return the last unassigned expression.^{[59]}
R example:
f <- function()
{
# Don't assign the expression to an object.
82L + 1L
}
Usage output:
> f()
[1] 83
Alternate assignment operator
The symbol-pair <- assigns a value to an object.^{[28]} Alternatively, = may be used as the assignment operator. However, care must be taken because = closely resembles the logical operator for equality, which is ==.^{[60]}
R example:
integer = 82L
print( integer )
Output:
[1] 82
Normal distribution
If a numeric data set has a central tendency, it also may have a symmetric-looking histogram: a shape that resembles a bell. If a data set has an approximately bell-shaped histogram, it is said to have a normal distribution.^{[61]}
Chest size of Scottish militiamen data set
In 1817, a Scottish army contractor measured the chest sizes of 5,732 members of a militia unit. The frequency of each size was:^{[62]}
Chest size (inches)  Frequency 

33  3 
34  19 
35  81 
36  189 
37  409 
38  753 
39  1062 
40  1082 
41  935 
42  646 
43  313 
44  168 
45  50 
46  18 
47  3 
48  1 
Create a comma-separated values file
R has the write.csv() function to convert a data frame into a CSV file.
R program to create chestsize.csv:
chestsize <-
    c( 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48 )
frequency <-
    c( 3, 19, 81, 189, 409, 753, 1062, 1082, 935, 646, 313, 168, 50, 18, 3, 1 )
dataFrame <- data.frame( chestsize, frequency )
write.csv(
dataFrame,
file="chestsize.csv",
# By default, write.csv() creates the first column as the row number.
row.names = FALSE )
Import a data set
The first step in data science is to import a data set.^{[63]}
R program to import chestsize.csv into a data frame:
dataFrame <- read.csv( "chestsize.csv" )
print( dataFrame )
Output:
chestsize frequency
1 33 3
2 34 19
3 35 81
4 36 189
5 37 409
6 38 753
7 39 1062
8 40 1082
9 41 935
10 42 646
11 43 313
12 44 168
13 45 50
14 46 18
15 47 3
16 48 1
Transform a data set
The second step in data science is to transform the data into a format that the functions expect.^{[63]} The chestsize data set is summarized by frequency; however, R's normal distribution functions require a numeric double vector.
R function to convert a data frame summarized by frequency into a vector:
# Filename: frequencyDataFrameToVector.R
frequencyDataFrameToVector <-
    function(
        dataFrame,
        dataColumnName,
        frequencyColumnName = "frequency" )
{
    dataVector <- dataFrame[[ dataColumnName ]]
    frequencyVector <- dataFrame[[ frequencyColumnName ]]
    vectorIndex <- 1
    frequencyIndex <- 1
    vector <- NA
    for ( datum in dataVector )
    {
        frequency <- frequencyVector[ frequencyIndex ]
        # seq_len() yields an empty loop when the frequency is zero.
        for ( i in seq_len( frequency ) )
        {
            vector[ vectorIndex ] <- datum
            vectorIndex <- vectorIndex + 1
        }
        frequencyIndex <- frequencyIndex + 1
    }
    return ( vector )
}
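In the spirit of the vectorized coding section, the same expansion can be sketched without loops by using base R's rep() function with its times parameter. This is an alternative to the function above, not a replacement used elsewhere in this article; the two-row data frame is a made-up stand-in for the chest size table.

```r
# Hypothetical two-row frequency table for illustration
dataFrame <- data.frame( chestsize = c( 33, 34 ), frequency = c( 3, 2 ) )
# rep() repeats each data point by its matching frequency.
expanded <-
    rep(
        dataFrame[["chestsize"]],
        times = dataFrame[["frequency"]] )
print( expanded )
```

The value 33 appears three times and the value 34 appears twice.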
R has the source() function to include another R source file into the current program.
R program to load and display a summary of the 5,732 member data set:
source( "frequencyDataFrameToVector.R" )
dataFrame <- read.csv( "chestsize.csv" )
chestSizeVector <-
    frequencyDataFrameToVector(
        dataFrame,
        "chestsize" )
message( "Head:" )
head( chestSizeVector )
message( "\nTail:" )
tail( chestSizeVector )
message( "\nCount:" )
length( chestSizeVector )
message( "\nMean:" )
mean( chestSizeVector )
message( "\nStandard deviation:" )
sd( chestSizeVector )
Output:
Head:
[1] 33 33 33 34 34 34
Tail:
[1] 46 46 47 47 47 48
Count:
[1] 5732
Mean:
[1] 39.84892
Standard deviation:
[1] 2.073386
Visualize a data set
The third step in data science is to visualize the data set.^{[63]} If a histogram of a data set resembles a bell shape, then it is normally distributed.^{[61]}
R program to display a histogram of the data set:
source( "frequencyDataFrameToVector.R" )
dataFrame <- read.csv( "chestsize.csv" )
chestSizeVector <-
    frequencyDataFrameToVector(
        dataFrame,
        "chestsize" )
hist( chestSizeVector )
Output: a histogram of chestSizeVector (figure not reproduced here).
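Because the figure is not available here, the following sketch shows one way to inspect the histogram numerically. It assumes one bin per integer chest size (the binEdges values are a choice made for this illustration, not the article's default bins) and rebuilds the data inline:

```r
# Frequencies from the chestsize table, rebuilt inline.
frequency <- c( 3, 19, 81, 189, 409, 753, 1062, 1082,
                935, 646, 313, 168, 50, 18, 3, 1 )
chestSizeVector <- rep( 33:48, times = frequency )

# One bin per integer size: edges at 32.5, 33.5, ..., 48.5 (an assumption).
binEdges <- seq( 32.5, 48.5, by = 1 )

# plot = FALSE returns the histogram object without drawing it.
histogram <- hist( chestSizeVector, breaks = binEdges, plot = FALSE )
print( histogram$counts )

# Draw the stored histogram; the counts rise to a peak and fall again.
plot( histogram, main = "Chest size", xlab = "chestsize" )
```

With these bin edges, the bin counts reproduce the frequency column of the original table.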
Standardized variable
Any variable ($x$) in a data set can be converted into a standardized variable ($z$). The standardized variable is also known as a z-score.^{[64]} To calculate the z-score, subtract the mean and divide by the standard deviation.^{[65]}
 Let $X$ = a set of data points.
 Let $\mu$ = the mean of the data set.
 Let $\sigma$ = the standard deviation of the data set.
 Let $x_i$ = the $i$th element in the set.
 Let $z_i$ = the z-score of the $i$th element in the set.
$$z_i = \frac{x_i - \mu}{\sigma}$$
R function to convert a measurement to a z-score:
# Filename: zScore.R
zScore <- function( measurement, mean, standardDeviation )
{
    # Subtract the mean, then divide by the standard deviation.
    ( measurement - mean ) / standardDeviation
}
R program to convert a chest size measurement of 38 to a z-score:
source( "zScore.R" )
print( zScore( 38, 39.84892, 2.073386 ) )
Output:
[1] -0.8917394
R program to convert a chest size measurement of 42 to a z-score:
source( "zScore.R" )
print( zScore( 42, 39.84892, 2.073386 ) )
Output:
[1] 1.037472
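Since R arithmetic is vectorized, both measurements can be converted in a single call. The sketch below repeats the zScore() definition so that it runs without zScore.R; the mean and standard deviation are the rounded values printed earlier:

```r
# Same definition as in zScore.R, repeated to keep the snippet self-contained.
zScore <- function( measurement, mean, standardDeviation )
{
    ( measurement - mean ) / standardDeviation
}

# Passing a vector of measurements returns a vector of z-scores.
zScores <- zScore( c( 38, 42 ), 39.84892, 2.073386 )
print( zScores )
```

A measurement below the mean (38) produces a negative z-score, and one above the mean (42) produces a positive z-score.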
Standardized data set
A standardized data set is a data set in which each member of an input data set was run through the zScore function.
R function to convert a numeric vector into a z-score vector:
# Filename: zScoreVector.R
source( "zScore.R" )
zScoreVector <- function( vector )
{
    # Compute the mean and standard deviation once, not on every iteration.
    vectorMean <- mean( vector )
    vectorSd <- sd( vector )
    zScores <- NA
    for ( i in seq_along( vector ) )
    {
        zScores[ i ] <-
            zScore(
                vector[ i ],
                vectorMean,
                vectorSd )
    }
    return( zScores )
}
Standardized chest size data set
R program to standardize the chest size data set:
source( "frequencyDataFrameToVector.R" )
source( "zScoreVector.R" )
dataFrame <- read.csv( "chestsize.csv" )
chestSizeVector <-
    frequencyDataFrameToVector(
        dataFrame,
        dataColumnName = "chestsize" )
zScoreVector <-
    zScoreVector(
        chestSizeVector )
message( "Head:" )
head( zScoreVector )
message( "\nTail:" )
tail( zScoreVector )
message( "\nCount:" )
length( zScoreVector )
message( "\nMean:" )
round( mean( zScoreVector ) )
message( "\nStandard deviation:" )
sd( zScoreVector )
hist( zScoreVector )
Output:
Head:
[1] -3.303253 -3.303253 -3.303253 -2.820950 -2.820950 -2.820950
Tail:
[1] 2.966684 2.966684 3.448987 3.448987 3.448987 3.931290
Count:
[1] 5732
Mean:
[1] 0
Standard deviation:
[1] 1
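Base R also provides scale(), which centers a vector on its mean and divides by its standard deviation. The following sketch uses it as an alternative to the zScoreVector() loop (it is not the article's function), with the data rebuilt inline:

```r
# Frequencies from the chestsize table, rebuilt inline.
frequency <- c( 3, 19, 81, 189, 409, 753, 1062, 1082,
                935, 646, 313, 168, 50, 18, 3, 1 )
chestSizeVector <- rep( 33:48, times = frequency )

# scale() returns a one-column matrix; as.vector() flattens it.
standardized <- as.vector( scale( chestSizeVector ) )

print( head( standardized ) )
print( round( mean( standardized ) ) )
print( sd( standardized ) )
```

The standardized values should match the head, mean, and standard deviation printed above, up to floating-point rounding.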
Standard normal curve
A histogram of a normally distributed data set that is converted to its standardized data set also resembles a bell-shaped curve. The curve is called the standard normal curve or the z-curve. The four basic properties of the z-curve are:^{[66]}
 The total area under the curve is 1.
 The curve extends indefinitely to the left and right. It never touches the horizontal axis.
 The curve is symmetric and centered at 0.
 Almost all of the area under the curve lies between -3 and 3.
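The four properties can be checked numerically with base R's dnorm() (the density of the curve), pnorm() (the area to the left of a point), and integrate(). This is only a sketch; the sample points 1.5 and 10 are arbitrary choices for illustration:

```r
# 1. Total area under the curve (numerical integration over the whole line).
totalArea <- integrate( dnorm, lower = -Inf, upper = Inf )$value

# 2. The curve stays above the horizontal axis, even far from the center.
tailHeight <- dnorm( 10 )

# 3. Symmetry about 0: the heights at -1.5 and 1.5 agree.
isSymmetric <- isTRUE( all.equal( dnorm( -1.5 ), dnorm( 1.5 ) ) )

# 4. Area between -3 and 3.
areaWithin3 <- pnorm( 3 ) - pnorm( -3 )

print( c( totalArea = totalArea, areaWithin3 = areaWithin3 ) )
print( c( tailHeight = tailHeight ) )
print( isSymmetric )
```

The area between -3 and 3 comes out at roughly 0.997, which is what "almost all" means in the last property.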
Area under the standard normal curve
The probability that a future measurement will fall within a designated range is equal to the area under the standard normal curve between the designated range's two z-scores.^{[67]}
For example, suppose the Scottish militia's quartermaster wanted to stock up on uniforms. What is the probability that the next recruit will need a size between 38 and 42?
R program:
library( tigerstats )
source( "frequencyDataFrameToVector.R" )
source( "zScore.R" )
dataFrame <- read.csv( "chestsize.csv" )
chestSizeVector <-
    frequencyDataFrameToVector(
        dataFrame,
        dataColumnName = "chestsize" )
zScore38 <-
    zScore( 38, mean( chestSizeVector ), sd( chestSizeVector ) )
zScore42 <-
    zScore( 42, mean( chestSizeVector ), sd( chestSizeVector ) )
areaLeft38 <- tigerstats::pnormGC( zScore38 )
areaLeft42 <- tigerstats::pnormGC( zScore42 )
areaBetween <- areaLeft42 - areaLeft38
message( "Probability:" )
print( areaBetween )
Output:
Probability:
[1] 0.6639757
The pnormGC() function can compute the probability between a range without first calculating the z-score.
R program:
library( tigerstats )
source( "frequencyDataFrameToVector.R" )
dataFrame <- read.csv( "chestsize.csv" )
chestSizeVector <-
    frequencyDataFrameToVector(
        dataFrame,
        dataColumnName = "chestsize" )
areaBetween <-
    tigerstats::pnormGC(
        c( 38, 42 ),
        mean = mean( chestSizeVector ),
        sd = sd( chestSizeVector ),
        region = "between",
        graph = TRUE )
message( "Probability:" )
print( areaBetween )
Output:
Probability:
[1] 0.6639757
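The same area can be sketched with base R's pnorm(), which returns the area to the left of a value for a given mean and standard deviation. This avoids the tigerstats dependency but draws no graph; the data is rebuilt inline so the snippet runs without chestsize.csv:

```r
# Frequencies from the chestsize table, rebuilt inline.
frequency <- c( 3, 19, 81, 189, 409, 753, 1062, 1082,
                935, 646, 313, 168, 50, 18, 3, 1 )
chestSizeVector <- rep( 33:48, times = frequency )

chestMean <- mean( chestSizeVector )
chestSd <- sd( chestSizeVector )

# Area left of 42 minus area left of 38 gives the area between them.
areaBetween <-
    pnorm( 42, mean = chestMean, sd = chestSd ) -
    pnorm( 38, mean = chestMean, sd = chestSd )

message( "Probability:" )
print( areaBetween )
```

The result should agree with the pnormGC() output above.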
Packages
R packages are collections of functions, documentation, and data that expand R.^{[68]} For example, packages add report features such as RMarkdown, knitr and Sweave. Easy package installation and use have contributed to the language's adoption in data science.^{[69]}
The Comprehensive R Archive Network (CRAN) was founded in 1997 by Kurt Hornik and Fritz Leisch to host R's source code, executable files, documentation, and user-created packages.^{[70]} Its name and scope mimic the Comprehensive TeX Archive Network and the Comprehensive Perl Archive Network.^{[70]} CRAN originally had three mirrors and 12 contributed packages.^{[71]} As of December 2022, it has 103 mirrors^{[72]} and 18,976 contributed packages.^{[73]} Packages are also available on the repositories R-Forge, Omegahat, and GitHub.
The Task Views on the CRAN website list packages in fields such as finance, genetics, high-performance computing, machine learning, medical imaging, meta-analysis, social sciences, and spatial statistics.
The Bioconductor project provides packages for genomic data analysis, complementary DNA, microarray, and high-throughput sequencing methods.
Packages add the capability to implement various statistical techniques such as linear, generalized linear and nonlinear modeling, classical statistical tests, spatial analysis, time-series analysis, and clustering.
The tidyverse package is organized to have a common interface. Each function in the package is designed to work together with the other functions in the package.^{[68]}
A package needs to be installed only once. To install tidyverse:^{[68]}
> install.packages( "tidyverse" )
To load the functions, data, and documentation of a package, call the library() function. To load tidyverse:^{[d]}
> library( tidyverse )
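Because installation is needed only once, a script can check whether a package is present before installing it. The sketch below uses base R's requireNamespace() for the check; the helper name isInstalled() is hypothetical, and the tidyverse lines are left commented out because installing requires network access:

```r
# Hypothetical helper: TRUE if the named package is already installed.
isInstalled <- function( packageName )
{
    requireNamespace( packageName, quietly = TRUE )
}

# The stats package ships with R, so this check should succeed.
statsAvailable <- isInstalled( "stats" )
print( statsAvailable )

# Install tidyverse only when it is missing, then load it.
# if ( !isInstalled( "tidyverse" ) ) install.packages( "tidyverse" )
# library( tidyverse )
```

Guarding install.packages() this way keeps a script from downloading the package again on every run.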
Interfaces
R comes installed with a command line console. Various integrated development environments (IDEs) are available for installation. IDEs for R include R.app (OS X/macOS only), Rattle GUI, R Commander, RKWard, RStudio, and Tinn-R.
General-purpose IDEs that support R include Eclipse via the StatET plugin and Visual Studio via R Tools for Visual Studio.
Editors that support R include Emacs, Vim via the Nvim-R plugin, Kate, LyX via Sweave, WinEdt, and Jupyter.
Scripting languages that support R include Python, Perl, Ruby, F#, and Julia.
General-purpose programming languages that support R include Java via the Rserve socket server, and .NET C#.
Statistical frameworks which use R in the background include Jamovi and JASP.
Community
The R Core Team was founded in 1997 to maintain the R source code. The R Foundation for Statistical Computing was founded in April 2003 to provide financial support. The R Consortium is a Linux Foundation project to develop R infrastructure.
The R Journal is an open-access academic journal that features short to medium-length articles on the use and development of R. It includes articles on packages, programming tips, CRAN news, and foundation news.
The R community hosts many conferences and in-person meetups. These include:
 UseR!: an annual international R user conference
 Directions in Statistical Computing (DSC)
 R-Ladies: an organization to promote gender diversity in the R community
 SatRdays: R-focused conferences held on Saturdays
 R Conference
 posit::conf (formerly known as rstudio::conf)
Implementations
The main R implementation is written primarily in C, Fortran, and R itself. Other implementations include:
 pretty quick R (pqR), by Radford M. Neal, attempts to improve memory management.
 Renjin is an implementation of R for the Java Virtual Machine.
 CXXR and Riposte^{[74]} are implementations of R written in C++.
 Oracle's FastR is an implementation of R, built on GraalVM.
 TIBCO Software, creator of S-PLUS, wrote TERR, an R implementation to integrate with Spotfire.^{[75]}
Microsoft R Open (MRO) was an R implementation. As of 30 June 2021, Microsoft started to phase out MRO in favor of the CRAN distribution.^{[76]}
Commercial support
Although R is an open-source project, some companies provide commercial support:
 Revolution Analytics provides commercial support for Revolution R.
 Oracle provides commercial support for the Big Data Appliance, which integrates R into its other products.
 IBM provides commercial support for in-Hadoop execution of R.
See also
 Comparison of numerical-analysis software
 Comparison of statistical packages
 List of numerical-analysis software
 List of statistical software
 Rmetrics
Notes
 ^ The format of the regression equation differs from the usual algebraic format. The y-intercept is placed first, and all of the independent variables are appended to the right.
 ^ This may display to standard error a warning message that the summary may be unreliable. Nonetheless, the output of 1 is correct.
 ^ To retrieve the value of an array of length one, the index subscript is optional.
 ^ This displays to standard error a listing of all the packages that tidyverse depends upon. It may also display two errors showing conflict. The errors may be ignored.
References
 ^ ^{a} ^{b} ^{c} Morandat, Frances; Hill, Brandon; Osvald, Leo; Vitek, Jan (11 June 2012). "Evaluating the design of the R language: objects and functions for data analysis". European Conference on Object-Oriented Programming. 2012: 104–131. doi:10.1007/978-3-642-31057-7_6. Retrieved 17 May 2016 – via SpringerLink.
 ^ Peter Dalgaard (29 February 2024). "R 4.3.3 is released". Retrieved 1 March 2024.
 ^ ^{a} ^{b} "R - Free Software Directory". directory.fsf.org. Retrieved 26 January 2024.
 ^ "R scripts". mercury.webster.edu. Retrieved 17 July 2021.
 ^ "R Data Format Family (.rdata, .rda)". Loc.gov. 9 June 2017. Retrieved 17 July 2021.
 ^ ^{a} ^{b} Hornik, Kurt; The R Core Team (12 April 2022). "R FAQ". The Comprehensive R Archive Network. 3.3 What are the differences between R and S?. Archived from the original on 28 December 2022. Retrieved 27 December 2022.
 ^ "Introduction". The Julia Manual. Archived from the original on 20 June 2018. Retrieved 5 August 2018.
 ^ Giorgi, Federico M.; Ceraolo, Carmine; Mercatelli, Daniele (27 April 2022). "The R Language: An Engine for Bioinformatics and Data Science". Life. 12 (5): 648. Bibcode:2022Life...12..648G. doi:10.3390/life12050648. PMC 9148156. PMID 35629316.
 ^ Ihaka, Ross. "The R Project: A Brief History and Thoughts About the Future" (PDF). p. 12. Archived (PDF) from the original on 28 December 2022. Retrieved 27 December 2022.
We set a goal of developing enough of a language to teach introductory statistics courses at Auckland.
 ^ Hornik, Kurt; The R Core Team (12 April 2022). "R FAQ". The Comprehensive R Archive Network. 2.13 What is the R Foundation?. Archived from the original on 28 December 2022. Retrieved 28 December 2022.
 ^ Ihaka, Ross. "R: Past and Future History" (PDF). p. 4. Archived (PDF) from the original on 28 December 2022. Retrieved 28 December 2022.
 ^ Ihaka, Ross (5 December 1997). "New R Version for Unix". stat.ethz.ch. Archived from the original on 12 February 2023. Retrieved 12 February 2023.
 ^ Ihaka, Ross. "The R Project: A Brief History and Thoughts About the Future" (PDF). p. 18. Archived (PDF) from the original on 28 December 2022. Retrieved 27 December 2022.
 ^ ^{a} ^{b} ^{c} Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 90. ISBN 0201710587.
 ^ ^{a} ^{b} Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 105. ISBN 0201710587.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 155. ISBN 0201710587.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 146. ISBN 0201710587.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 148. ISBN 0201710587.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 156. ISBN 0201710587.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 157. ISBN 0201710587.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 170. ISBN 0201710587.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 175. ISBN 0201710587.
The coefficient of determination always lies between 0 and 1 ...
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 175. ISBN 0201710587.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 4. ISBN 9781449359010.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 20. ISBN 9781449359010.
An R script is just a plain text file that you save R code in.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 7. ISBN 9781449359010.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 8. ISBN 9781449359010.
 ^ ^{a} ^{b} ^{c} Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 77. ISBN 9781449359010.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 37. ISBN 9781449359010.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 38. ISBN 9781449359010.
 ^ ^{a} ^{b} Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 10. ISBN 9781449359010.
 ^ ^{a} ^{b} ^{c} Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 39. ISBN 9781449359010.
 ^ ^{a} ^{b} ^{c} Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 42. ISBN 9781449359010.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 81. ISBN 9781449359010.
 ^ ^{a} ^{b} Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 41. ISBN 9781449359010.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 49. ISBN 9781449359010.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 50. ISBN 9781449359010.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 25. ISBN 0201710587.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 23. ISBN 0201710587.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 24. ISBN 0201710587.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 55. ISBN 9781449359010.
Data frames are the two-dimensional version of a list.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 55. ISBN 9781449359010.
They are far and away the most useful storage structure for data analysis[.]
 ^ ^{a} ^{b} Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 173. ISBN 9781449359010.
 ^ ^{a} ^{b} Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 185. ISBN 9781449359010.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 165. ISBN 9781449359010.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 69. ISBN 9781449359010.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 80. ISBN 9781449359010.
 ^ ^{a} ^{b} Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 16. ISBN 9781449359010.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 29. ISBN 9781449359010.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 13. ISBN 9781449359010.
 ^ ^{a} ^{b} Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 14. ISBN 9781449359010.
 ^ Schach, Stephen R. (1990). Software Engineering. Aksen Associates Incorporated Publishers. p. 231. ISBN 0256085153.
 ^ Downing, Douglas; Clark, Jeffrey (2003). Business Statistics. Barron's. p. 163. ISBN 0764119834.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 95. ISBN 0201710587.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 314. ISBN 0201710587.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 17. ISBN 9781449359010.
 ^ R Core Team. "Print Values". R Documentation. R Foundation for Statistical Computing. Retrieved 30 May 2016.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 147. ISBN 9781449359010.
R calls print each time it displays a result in your console window.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 17. ISBN 9781449359010.
R will execute all of the code in the body and then return the result of the last line of code.
 ^ Grolemund, Garrett (2014). Hands-On Programming with R. O'Reilly. p. 82. ISBN 9781449359010.
Be careful not to confuse = with ==. = does the same thing as <-.
 ^ ^{a} ^{b} Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 256. ISBN 0201710587.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 257. ISBN 0201710587.
 ^ ^{a} ^{b} ^{c} Wickham, Hadley; Çetinkaya-Rundel, Mine; Grolemund, Garrett (2023). R for Data Science, Second Edition. O'Reilly. p. xiii. ISBN 9781492097402.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 133. ISBN 0201710587.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 134. ISBN 0201710587.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 266. ISBN 0201710587.
 ^ Weiss, Neil A. (2002). Elementary Statistics, Fifth Edition. Addison-Wesley. p. 265. ISBN 0201710587.
 ^ ^{a} ^{b} ^{c} Wickham, Hadley; Çetinkaya-Rundel, Mine; Grolemund, Garrett (2023). R for Data Science, Second Edition. O'Reilly. p. xvii. ISBN 9781492097402.
 ^ Chambers, John M. (2020). "S, R, and Data Science". The R Journal. 12 (1): 462–476. doi:10.32614/RJ-2020-028. ISSN 2073-4859.
The R language and related software play a major role in computing for data science. ... R packages provide tools for a wide range of purposes and users.
 ^ ^{a} ^{b} Hornik, Kurt (2012). "The Comprehensive R Archive Network". WIREs Computational Statistics. 4 (4): 394–398. doi:10.1002/wics.1212. ISSN 1939-5108. S2CID 62231320.
 ^ Kurt Hornik (23 April 1997). "Announce: CRAN". r-help. Wikidata Q101068595.
 ^ "The Status of CRAN Mirrors". cran.r-project.org. Retrieved 30 December 2022.
 ^ "CRAN - Contributed Packages". cran.r-project.org. Retrieved 29 December 2022.
 ^ Talbot, Justin; DeVito, Zachary; Hanrahan, Pat (1 January 2012). "Riposte: A trace-driven compiler and parallel VM for vector code in R". Proceedings of the 21st international conference on Parallel architectures and compilation techniques. ACM. pp. 43–52. doi:10.1145/2370816.2370825. ISBN 9781450311823. S2CID 1989369.
 ^ Jackson, Joab (16 May 2013). TIBCO offers free R to the enterprise. PC World. Retrieved 20 July 2015.
 ^ "Looking to the future for R in Azure SQL and SQL Server". 30 June 2021. Retrieved 7 November 2021.
Notes
This article is a direct transclusion of the Wikipedia article and therefore may not meet the same editing standards as LIMSwiki.