Data Mining…

less than 1 minute read

Google Tech Talk; Data1

Some interesting stuff we learned in the course

Using R

# ploting:
x<-seq(1,7,by=0.1)
plot(x,sin(x))
#
# help:
?seq
###
# read data
###
# read data from input files
setwd('/tmp') # set working directory
#
# header ... defines whether the first line contains a header
data<-read.csv("apache.log", sep=" ", header=F)
#
###
# view specific parts of the data
###
#
# view the first 5 rows
data[1:5,]    # data[rows,colums]
#
# vector:
v<-c(1,3)
#
# retrieve specific rows/columns:
data[c(1,7,21),1] # retrieves data from row nr 1,7, 21

Type of attributes (data types):

  • categorical vs. numeric
    • categorical (qualitative) - ip, eye color;
  • nominal (no order)
  • ordinal (meaningful order; rankings, grades)
  • quantitative (numeric) - weight, price
    • interval (there is no “true” zero; no division; calendar dates, temperature in celsius)
    • ratio (there is a true zero; there is division; temperature in kelvin, length time)
  • discrete vs. continuous

Attribute type and mathematical operations:

  • distinctness: =, !=
  • order: > <
  • addition: + -
  • multiplication: * /

  • nominal : distinctness
  • ordinal : distinctness, order
  • interval: distinctness, order, addition
  • ratio : all 4