Fine-grained Benchmark Subsetting for System Selection

Accompanying Material

P. de Oliveira Castro, Y. Kashnikov, C. Akel, M. Popov, W. Jalby

Université de Versailles St-Quentin-en-Yvelines

Exascale Computing Research


Summary

  1. Preamble
  2. Implementation of the Clustering and Prediction Methodology
  3. Numerical Recipes Results
  4. NAS SER Results
  5. Example: Using the Prediction Matrix
  6. Information for Reproducing NAS Benchmarking
  7. Function to generate the random clusterings in Figure 7
  8. Genetic Algorithm exploration for selecting the features

Preamble

This IPython Notebook reproduces the results of the paper Fine-grained Benchmark Subsetting for System Selection, accepted for publication at the 2014 International Symposium on Code Generation and Optimization.

The Notebook and the original experimental datasets can be downloaded to replay the data analysis on your own computer.

Our subsetting framework depends on a set of GNU R libraries, which we load first.

In [1]:
%load_ext rmagic
In [2]:
%%R
require(ggplot2)
require(plyr)
require(scales)
require(grid)
require(reshape)
require(reshape2)
require(permute)
require(GMD)
require(lattice)
require(ggdendro)
require(RGraphics) 
require(gridExtra)
Loading required package: ggplot2
Loading required package: plyr
Loading required package: scales
Loading required package: grid
Loading required package: reshape

Attaching package: ‘reshape’

The following objects are masked from ‘package:plyr’:

    rename, round_any

Loading required package: reshape2

Attaching package: ‘reshape2’

The following objects are masked from ‘package:reshape’:

    colsplit, melt, recast

Loading required package: permute
Loading required package: GMD
Loading required package: gplots
Loading required package: gtools

Attaching package: ‘gtools’

The following object is masked from ‘package:permute’:

    permute

Loading required package: gdata
gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.

gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.

Attaching package: ‘gdata’

The following object is masked from ‘package:stats’:

    nobs

The following object is masked from ‘package:utils’:

    object.size

Loading required package: caTools
Loading required package: KernSmooth
KernSmooth 2.23 loaded
Copyright M. P. Wand 1997-2009
Loading required package: MASS

Attaching package: ‘gplots’

The following object is masked from ‘package:stats’:

    lowess

Loading required package: lattice
Loading required package: ggdendro
Loading required package: RGraphics
Loading required package: gridExtra

We now define some constants and configuration parameters. Ensure that DATA_PATH points to the directory containing the experimental datasets.

In [3]:
%%R

# Declare the directory from which to load the experimental datasets
DATA_PATH <- "."

# Set to a local directory to produce plots as pdf
PDF_OUTPUT_DIR <- ""

# Declare the architecture names
arch_names <- list(
  'sandybridge'="Sandy Bridge",
  'core2'="Core 2",
  'atom'="Atom"
)

# Architecture clock frequencies in kHz (dividing RDTSC cycles by them gives milliseconds)
clks = list("nehalem" = 1860*10^3,
            "atom" = 1662*10^3,
            "core2" = 2933*10^3,
            "sandybridge" = 3330*10^3)

# Declare well-behaved threshold: a codelet is considered well-behaved
# when its CPI_mismatch is at most this value
wellbehaved = 10

# Declare feature set
feature_set = c(
    "DP.MFlops.s..DP.assumed.",           
    "L2.bandwidth..MBytes.s.",     
    "L3.miss.rate",               
    "Memory.bandwidth..MBytes.s.", 
    "Nb.FLOP..div",               
    "Nb.insn..SD",                 
    "Nb.uops.P1",                 
    "Ratio.ADD.SUB...MUL",         
    "RecMII..cycles.",            
    "Vec..ratio......mul..FP.",    
    "Vec..ratio......other",      
    "Vec..ratio......other..FP.",  
    "X.L1..Bytes.stored...cycle", 
    "X.L1..IPC")

# Declare color palette
colours <- c("#000000","#AE1C3E", "#F5A275")
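
As a small illustration of how the `clks` table is used later, the following Python sketch converts a cycle count to milliseconds. It assumes the frequencies are expressed in kHz (so that 1860*10^3 corresponds to 1.86 GHz). The function name `cycles_to_ms` and the sample cycle count are hypothetical and not part of the notebook.

```python
# Clock frequencies copied from the `clks` list above (assumed to be in kHz).
CLKS_KHZ = {
    "nehalem": 1860 * 10**3,
    "atom": 1662 * 10**3,
    "core2": 2933 * 10**3,
    "sandybridge": 3330 * 10**3,
}


def cycles_to_ms(cycles, arch):
    """Convert an RDTSC cycle count to milliseconds (cycles / kHz = ms)."""
    return cycles / CLKS_KHZ[arch]


# Hypothetical cycle count, for illustration only.
print(cycles_to_ms(3_330_000, "sandybridge"))  # 1.0
```

This is the same division that the plotting cells perform when they compute expressions such as `ref_vivo_CPInv/S$ref_clk`.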

Implementation of the Clustering and Prediction Methodology

In this section we implement the clustering algorithm used to select a subset of representatives. We also implement the functions required to predict the full benchmark suite from the representatives' measurements.

First, we need a function to load the experimental datasets.

In [4]:
%%R
# new_session loads our experimental dataset for a given benchmark 
# and target architecture
new_session <- function(bench, target) {

  ref_file = paste(DATA_PATH, "/", bench, "-reference.csv", sep="")
  tar_file = paste(DATA_PATH, "/", bench, "-", target,".csv", sep="")
  features_file = paste(DATA_PATH, "/", bench, "-features.csv", sep="")
  
  ref =  read.csv(ref_file, header=T)
  tar =  read.csv(tar_file, header=T)
  S = list(data = merge(ref,tar, by=c("BenchName", "CodeletName")), 
           features = read.csv(features_file, header=T),
           tar_clk = clks[[target]],
           ref_clk = clks[["nehalem"]])
    
  # Sort the features data frame in the same order as the timing data 
  S$features = S$features[order(match(S$features[,"CodeletName"], S$data[,"CodeletName"])),]
  return(S)
}

Features are cleaned and scaled before clustering the codelets.

In [5]:
%%R

#
# scale_features: scales the features, so they have the same weight
# -> as a result each feature has variance 1 and mean 0 across all codelets
#
scale_features <- function(features) {
  # remove non numeric columns
  features = features[sapply(features, is.numeric)]
  # replace NA by zero
  features[is.na(features)] <- 0 
  # scale features
  features = scale(features) 
  # remove all-zero columns
  features = features[,!(colSums(abs(features)) == 0)]
  return (data.frame(features))
}
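
The scaling step can be restated outside R. The Python sketch below is an illustrative re-implementation, not the code used for the paper: it replaces missing values by zero and z-scores each column with the sample standard deviation, as R's `scale()` does. Dropping zero-variance columns explicitly is an assumption of this sketch about how constant features should be handled.

```python
import statistics


def scale_features(rows):
    """Z-score each column of `rows` (a list of equal-length lists).

    None stands in for a missing value and is replaced by 0.
    Zero-variance columns are dropped (an assumption of this sketch).
    """
    cleaned = [[0.0 if v is None else float(v) for v in row] for row in rows]
    scaled_columns = []
    for col in zip(*cleaned):
        sd = statistics.stdev(col)  # sample standard deviation (n - 1)
        if sd == 0:
            continue  # a constant column carries no information
        mean = statistics.fmean(col)
        scaled_columns.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical feature table: three codelets, three features.
features = [[1.0, 10.0, 5.0],
            [2.0, None, 5.0],
            [3.0, 20.0, 5.0]]
scaled = scale_features(features)
print(scaled)
```

After scaling, every remaining column has mean 0 and variance 1, so no single feature dominates the Euclidean distances used by the clustering.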

Groups of similar codelets are gathered using Ward's hierarchical clustering on the feature vectors. As described in the paper, we modify the clustering to guarantee that every cluster contains at least one well-behaved codelet.

In [6]:
%%R

# Hierarchical clustering

#
# myhiera: performs a hierarchical clustering of the codelets using Ward's criterion
# k is the desired number of clusters
# if k == -1 (the default) the elbow method is used to automatically select
# an optimal number of clusters 
#
myhiera <- function(d, features, k=-1) {
  # note: R >= 3.1.0 renames the "ward" method to "ward.D"
  fit <- hclust(d, method="ward")
  if (k != -1) {
    clusters = cutree(fit, k=k)
  } else {
    # apply elbow method    
    css.obj <- css.hclust(d, fit, k=nrow(features))
    elbow.obj <- elbow.batch(css.obj)
    clusters = cutree(fit, k = elbow.obj$k)
  }
  return (list(clusters=clusters, tree=fit))
}

#
# distancematrix: returns the symmetric matrix of the distances
# between all pairs of codelets
#
distancematrix <- function(features) {
  d = dist(features, method="euclidean")
  return(d)
}

#
# closest_not_in: used to migrate ill-behaved codelets during representative
# selection. Returns the codelet closest to the given codelet that is not
# in the `not_in` set of codelets
#
closest_not_in <- function(CodeletsNames, dist, CodeletName, not_in) {
    pos = which(CodeletsNames == CodeletName)
    dv = data.frame(CodeletName = CodeletsNames, dists = dist[pos,])

    dv = dv[!(dv$CodeletName %in% not_in), ]
    dv = dv[order(dv$dists),]
    return(dv[1,]$CodeletName)
}  

#
# cluster_codelets: returns a hierarchical clustering of codelets that
# guarantees that each cluster contains at least one well-behaved codelet
#
cluster_codelets <- function(S, feature_names, number_of_clusters) {
  # Scale the features
  features = S$features[feature_names]
  features = scale_features(features)
    
  # Compute the distance matrix
  distances = distancematrix(features)
  D = as.matrix(distances)

  # Compute a Ward's hierarchical clustering
  hiera = myhiera(distances, features, number_of_clusters)
  clusters = data.frame(CodeletName = S$features$CodeletName,
                        Cluster = hiera[["clusters"]])

  # Find clusters without any well-behaved codelet
  # and migrate their codelets to other clusters
  S$data$Cluster =  clusters$Cluster
  S$data$dists = c(0)
  for (cluster in unique(S$data$Cluster)) {
      cl = S$data[S$data$Cluster==cluster,]
      dists = cl$CPI_mismatch
      S$data[S$data$Cluster==cluster,]$dists = dists

      if (min(dists) > wellbehaved) { 
          for (codelet in cl$CodeletName) {
              closest = closest_not_in(S$features$CodeletName, 
                                       D,
                                       codelet,
                                       cl$CodeletName)

              closest_cluster = S$data[S$data$CodeletName == closest,]$Cluster
              S$data[S$data$CodeletName == codelet,]$Cluster = closest_cluster 
          }
      } 
  }
  S$data$Cluster = as.numeric(as.factor(S$data$Cluster))
  return(S$data)
}
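
The migration step in `cluster_codelets` may be easier to follow on a toy example. The Python sketch below is a simplified restatement under stated assumptions: clusters, mismatches and distances are plain dictionaries, and the function name `migrate_ill_behaved` and the sample data are hypothetical. A cluster whose smallest CPI mismatch exceeds the threshold is dissolved, and each of its codelets joins the cluster of its nearest codelet outside that cluster.

```python
def migrate_ill_behaved(names, clusters, mismatch, dist, threshold=10):
    """Return a name -> cluster mapping where clusters without any
    well-behaved codelet have been merged into neighbouring clusters."""
    result = dict(clusters)
    for cluster in sorted(set(clusters.values())):
        members = [n for n in names if result[n] == cluster]
        if not members or min(mismatch[n] for n in members) <= threshold:
            continue  # the cluster has at least one well-behaved codelet
        outsiders = [n for n in names if n not in members]
        if not outsiders:
            continue  # nothing to migrate to
        for codelet in members:
            closest = min(outsiders, key=lambda n: dist[codelet][n])
            result[codelet] = result[closest]
    # Renumber the surviving clusters consecutively, starting at 1.
    ids = sorted(set(result.values()))
    return {n: ids.index(result[n]) + 1 for n in names}


# Hypothetical codelets placed on a line; distance is the absolute difference.
positions = {"a": 0.0, "b": 1.0, "c": 5.0, "d": 6.0}
names = list(positions)
dist = {p: {q: abs(positions[p] - positions[q]) for q in names} for p in names}
clusters = {"a": 1, "b": 1, "c": 2, "d": 3}
mismatch = {"a": 2.0, "b": 30.0, "c": 50.0, "d": 4.0}
migrated = migrate_ill_behaved(names, clusters, mismatch, dist)
print(migrated)
```

Here cluster 2 contains only the ill-behaved codelet `c`, so `c` joins the cluster of its nearest neighbour `d`, and the two remaining clusters are renumbered 1 and 2.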

Once clusters are formed, we elect one representative per cluster. Representatives must be well-behaved and close to the cluster centroid.

In [7]:
%%R

# Representative election


#
# clust.centroid: returns the centroid of a cluster
#
clust.centroid = function(i, dat, clusters) {
  ind = (clusters == i)
  return(colMeans(dat[ind,, drop=FALSE]))
}

#
# elect_representatives: elect a well-behaved codelet as representative in 
# each cluster. The codelet is chosen as the closest well-behaved codelet 
# to the centroid.
#
elect_representatives <- function(S, feature_names, data) {
  features = S$features[feature_names]
  features = scale_features(features)

  #compute centroid of each cluster
  Centroids <- sapply(sort(unique(data$Cluster)), clust.centroid, features, data$Cluster)
  features$CodeletName = S$features$CodeletName

  data$is.representative = F
  for (cl in unique(data$Cluster)) {
    feat_cl = features[data$Cluster==cl & data$CPI_mismatch <= wellbehaved,]
    feat_cl$CPI_mismatch = data[data$Cluster==cl & data$CPI_mismatch <= wellbehaved,]$CPI_mismatch 
    if (nrow(feat_cl) == 0) {
      feat_cl = features[data$Cluster==cl,]
      feat_cl$CPI_mismatch = data[data$Cluster==cl,]$CPI_mismatch
    }

    #compute for each codelet distance from centroid
    feat_cl = ddply(feat_cl, c("CodeletName"), function(x) {
        x$CodeletName <- NULL
        CPI_mismatch = x$CPI_mismatch
        x$CPI_mismatch <- NULL

        if (length(x) == 1) {
            c = Centroids[cl]
        } else {
            c = Centroids[, cl]
        }
        data.frame(DistToCentroid = sum((x - c) ^ 2), CPI_mismatch = CPI_mismatch)
    })
    feat_cl <- feat_cl[order(feat_cl$CPI_mismatch), ] #order second by matching distance
    feat_cl <- feat_cl[order(feat_cl$DistToCentroid), ] #order first by distance to centroid

    # we select as representative the closest codelet to the centroid
    data[data$Cluster==cl, ]$is.representative = 
      ifelse(data[data$Cluster==cl, ]$CodeletName %in% feat_cl[1, ]$CodeletName, T, F)
  }
   
  return(data)
}
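
The election rule can be summarised for a single cluster as follows. This Python sketch is illustrative only: the function name `elect_representative` and the sample data are hypothetical. It computes the centroid over all cluster members, restricts the candidates to well-behaved codelets (falling back to every member if none qualifies), and picks the candidate closest to the centroid, breaking ties by the smaller CPI mismatch.

```python
def elect_representative(members, features, mismatch, threshold=10):
    """Return the name of the representative of one cluster.

    members: codelet names in the cluster
    features: name -> list of scaled feature values
    mismatch: name -> CPI mismatch
    """
    dims = len(features[members[0]])
    centroid = [sum(features[n][d] for n in members) / len(members)
                for d in range(dims)]
    # Prefer well-behaved codelets; fall back to all members if none exists.
    candidates = [n for n in members if mismatch[n] <= threshold] or list(members)

    def sq_dist(name):
        return sum((x - c) ** 2 for x, c in zip(features[name], centroid))

    return min(candidates, key=lambda n: (sq_dist(n), mismatch[n]))


# Hypothetical cluster: "y" is closest to the centroid but ill-behaved.
members = ["x", "y", "z"]
features = {"x": [0.0, 0.0], "y": [1.0, 0.0], "z": [5.0, 0.0]}
mismatch = {"x": 3.0, "y": 40.0, "z": 5.0}
rep = elect_representative(members, features, mismatch)
print(rep)  # x
```

The example shows why the well-behaved filter matters: without it, `y` would be elected even though its standalone measurement does not match its in-application behaviour.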

The following functions predict the performance of all the codelets, or of all the applications, from the representatives' measurements on a target architecture.

In [8]:
%%R
#
# cycle_prediction: predict cycles per invocation for all the codelets
#
cycle_prediction <- function(S, feature_names, data) {
  data = elect_representatives(S, feature_names, data)
  

  #Isolate our representatives
  representatives <- data[data$is.representative, ]

  #Now we need to compute for each cluster the speedup 
  representatives = within(representatives, 
    {Speedup = 
       tar_vitro_CPInv/ref_vivo_CPInv})

  # Order representatives per clusters
  representatives <- representatives[order(representatives$Cluster), ]

  # Merge prediction Speedup, real Speedup and predicted cycles into data
  representatives <- representatives[!duplicated(representatives$Cluster), ]
  data$Speedup = representatives[data$Cluster, "Speedup"]
  data$realSpeedup = (data$tar_vivo_CPInv) / (data$ref_vivo_CPInv)  
  data$Predicted_cyclesPerIteration = data$ref_vivo_CPInv * data$Speedup
  data$per_err = abs(data$tar_vivo_CPInv - data$Predicted_cyclesPerIteration)/
    pmax(data$Predicted_cyclesPerIteration, data$tar_vivo_CPInv)

    
  return(data)
}

#
# compute_appli_cycles: predict each application run time by aggregating
# its codelets predictions.
#
compute_appli_cycles <- function(S, data) {
  tmp = ddply(data, c("BenchName"), function(x) {
    clusterTotal = sum(x$ref_vivo_invocations * x$ref_vivo_CPInv)
    data.frame(Speedup = 
               1/sum((x$ref_vivo_invocations * x$ref_vivo_CPInv)/(clusterTotal*x$Speedup)))
    }
  )
  
  data <- data[!duplicated(data$BenchName), ]
  data <- data[order(data$BenchName), ]
    
  tmp$Reference_APPLI_TIME = data$ref_app_cycles/S$ref_clk
  tmp$Target_APPLI_TIME = data$tar_app_cycles/S$tar_clk
  tmp$PredictedTime = (data$ref_app_cycles*tmp$Speedup)/S$tar_clk
      
  return(tmp)
}
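
The arithmetic behind these two functions can be sketched in a few lines of Python. This is a restatement under assumptions, with hypothetical function names and made-up numbers: the per-cluster ratio is the representative's target cycles divided by its reference cycles, every codelet in the cluster is predicted by multiplying its reference cycles by that ratio, and the application-level ratio mirrors the weighted aggregation written in `compute_appli_cycles`.

```python
def predict_cycles(ref_cycles, rep_ref_cycles, rep_tar_cycles):
    """Predict target cycles of a codelet from its cluster representative."""
    ratio = rep_tar_cycles / rep_ref_cycles
    return ref_cycles * ratio


def relative_error(real, predicted):
    """Error measure used for `per_err`: |real - predicted| / max of both."""
    return abs(real - predicted) / max(real, predicted)


def application_ratio(codelets):
    """Aggregate per-codelet ratios as in compute_appli_cycles.

    codelets: list of (invocations, ref_cycles_per_invocation, ratio)
    """
    total = sum(inv * cpi for inv, cpi, _ in codelets)
    return 1.0 / sum((inv * cpi) / (total * ratio) for inv, cpi, ratio in codelets)


# Hypothetical representative: 100 cycles on the reference, 250 on the target.
predicted = predict_cycles(40, 100, 250)
print(predicted)                       # 100.0
print(relative_error(125, predicted))  # 0.2
print(application_ratio([(10, 100, 2.0), (10, 100, 4.0)]))
```

Dividing by the larger of the real and predicted values keeps the error between 0 and 1 whichever way the prediction is off.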

Finally, we need a set of functions to evaluate the accuracy and benchmark reduction factor of our method.

In [9]:
%%R

#
# compute_benchmarking_cost: computes the time needed to benchmark only the
# representative codelets on the target architecture
#
compute_benchmarking_cost <- function(S, data) {
    data$BenchmarkTime = data$tar_vitro_CPInv*data$tar_vitro_invocations/S$tar_clk
    data = data[data$is.representative,]
    return(sum(data$BenchmarkTime)) 
}

#
# prediction_statistics: computes a set of statistics about the quality and speed of the
# prediction.
#
prediction_statistics <- function(S, data) {
  score_stats = data.frame(
    mean_per_err = mean(data$per_err),
    median_per_err = median(data$per_err),
    number_of_clusters = max(data$Cluster),
    benchmark_cost_ms = compute_benchmarking_cost(S, data) 
  )
  return(score_stats)
}

#
# gm_mean: returns the geometric mean
#
gm_mean = function(a){prod(a)^(1/length(a))}

#
# compute_global_speedup: computes the speedup for a whole architecture
# by returning the geometric mean of the applications speedups.
#
compute_global_speedup <- function(appli_prediction) {
    x=appli_prediction
    x$real_speedup = (x$Reference_APPLI_TIME)/(x$Target_APPLI_TIME)
    x$predicted_speedup = (x$Reference_APPLI_TIME)/(x$PredictedTime)
    appli_benchmark_cost_ms = sum(x$Target_APPLI_TIME)
    real_global_speedup = gm_mean(x$real_speedup) 
    predicted_global_speedup = gm_mean(x$predicted_speedup) 
    error = abs(real_global_speedup-predicted_global_speedup)/real_global_speedup
    stats = data.frame(real_speedup = real_global_speedup, 
                       predicted_speedup = predicted_global_speedup,
                       appli_benchmark_cost_ms = appli_benchmark_cost_ms,
                       error = error)
    return(stats)
}

# Utility functions:

# Keep only codelets from one single application
keep_only_app <- function(S, app) {
  S$data = S$data[S$data$BenchName == app, ]
  S$features = S$features[S$features$BenchName == app, ]
  return(S)
}
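
For readers who prefer a language-neutral view of the evaluation metrics, here is a minimal Python sketch. The function names and the sample timings are hypothetical. It computes the geometric-mean speedup over applications, the relative error between real and predicted global speedups, and the benchmarking reduction factor, defined as the cost of running the full applications divided by the cost of running only the representatives.

```python
import math


def gm_mean(values):
    """Geometric mean of positive values."""
    return math.prod(values) ** (1.0 / len(values))


def global_speedup(apps):
    """apps: list of (reference_time, target_time, predicted_time).

    Returns (real speedup, predicted speedup, relative error).
    """
    real = gm_mean([ref / tar for ref, tar, _ in apps])
    predicted = gm_mean([ref / pred for ref, _, pred in apps])
    return real, predicted, abs(real - predicted) / real


def reduction_factor(app_target_times, representative_times):
    """How much cheaper it is to run the representatives only."""
    return sum(app_target_times) / sum(representative_times)


# Hypothetical timings in milliseconds for two applications.
apps = [(8.0, 2.0, 2.0), (3.0, 3.0, 4.0)]
real, predicted, error = global_speedup(apps)
print(real, predicted, error)
print(reduction_factor([2.0, 3.0], [0.25, 0.25]))  # 10.0
```

The geometric mean is used so that each application contributes equally to the global speedup regardless of its absolute running time.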

Numerical Recipes Results

In this section, we produce the Numerical Recipes plots and tables used in the paper. You can reproduce these results by downloading this notebook and running it on your computer.

In [10]:
%%R
# Setup experiment parameters for NR results
number_of_clusters=14
bench="NR"
architecture="atom"
In [11]:
%%R -w 177 -h 89 -u mm -r 100

if(PDF_OUTPUT_DIR != "") {
    outputpdf=paste(PDF_OUTPUT_DIR, "nr-table.pdf", sep="/")
    pdf(outputpdf, width=7, height=3.5)
}

# Create a new session with the adequate data
S = new_session(bench, architecture)

# Clean the feature list
features = S$features[feature_set]
features = scale_features(features)

# Compute the distance in the feature space between every pair of codelets
distances = distancematrix(features)

# Compute a hierarchical clustering dendrogram
fit <- hclust(distances, method="ward")
fit = as.dendrogram(fit)

# Run the clustering and prediction method
clusters = cluster_codelets(S, feature_set, number_of_clusters)
prediction = cycle_prediction(S, feature_set, clusters)
prediction$Vec.Ratio.= round(S$features$Vec..ratio......all..FP)

# Read high-level classification data
hlclass = read.csv(paste(DATA_PATH,"hlclassification.csv",sep="/"), header=T, sep=",")
prediction = merge(prediction, hlclass, by="CodeletName")

# Reorder table prediction using the dendrogram codelet order
newprediction = data.frame()
labels <- prediction$CodeletName
for (i in labels[order.dendrogram(fit)]) {
  newprediction = rbind(prediction[prediction$CodeletName == i,], newprediction)
}

# Reorder cluster numbers by dendrogram
newprediction$Cluster = as.integer(factor(as.character(newprediction$Cluster), 
                                          levels=unique(as.character(newprediction$Cluster))))
prediction = newprediction

# Internally, times (and therefore accelerations) are expressed in machine cycles;
# convert them to absolute time using the architecture clock frequencies

freq_ratio = S$ref_clk / S$tar_clk
prediction[,"realSpeedup"] <- 1/(prediction[,"realSpeedup"] * (freq_ratio))
prediction[,"Speedup"] = 1/(prediction[,"Speedup"] * (freq_ratio))

# Put acceleration of representatives between < > 
isrepr = prediction[,"is.representative"] == "TRUE"
prediction[!isrepr,"realSpeedup"] = sprintf("%.4s",prediction[!isrepr,"realSpeedup"])
prediction[isrepr,"realSpeedup"] = sprintf("<%.4s>",prediction[isrepr,"realSpeedup"])

# Clean the codelet names suffixes
cleanCodeletNames <- function(D) {
    for (suffix in c("_dp_sse", "_bet1_dt0_sse_initbet1", "_sp_sse", "_square_sp_sse", "_mp_sse")) 
    {
        D$CodeletName = factor(sub(suffix, "", D$CodeletName))
    }
    D$CodeletName = factor(sub("hqr_12_square", "hqr_12_sq", D$CodeletName))
    return(D)
}
prediction = cleanCodeletNames(prediction)

# Summarize codelet data by cluster number
D = ddply(prediction, .(Cluster), summarize, 
          Codelet=CodeletName, 
          "Computation Pattern"=Algorithm, 
          Stride=Array.Access..Stride, 
          Vec.=Vectorization,
          "Vec. %" =  Vec.Ratio.,
          s=realSpeedup)

# Rename column Cluster to C
colnames(D)[1] = "C"

# Clean the D table so identical clusters in successive lines
# are not repeated
cleanf <- function(x){ 
   oldx <- c(FALSE, x[-1]==x[-length(x)]) 
   res <- x
   res[oldx] <- ""        
   res
}
D[c("C")] <- lapply(D[c("C")], cleanf)


# Compute cluster separations row numbers
seps = as.list(which(D$C != "")-1)

# Plot the dendrogram and the table
par(mar=c(0,0,0,0), no.readonly=TRUE)
plot.new() 
gl <- grid.layout(nrow=1, ncol=2, widths=unit(c(1,5), 'null'), heights=unit(c(1), 'null'))

vp.1 <- viewport(layout.pos.col=1, layout.pos.row=1)
vp.2 <- viewport(layout.pos.col=2, layout.pos.row=1)

pushViewport(viewport(layout=gl))
pushViewport(vp.1)

# Plot Dendrogram

p = ggdendrogram(fit, rotate=TRUE, labels=F) 
p = p + theme(plot.margin = unit(c(0.2,-1.38,-0.94,0), "cm"))
p = p + geom_hline(yintercept=4.5, linetype="dashed", color="darkred")
p = p + annotate("text", size=3, y=14, x=1, color="darkred", label="cut for K = 14")
print(p, newpage=F)
popViewport()

pushViewport(vp.2)

# Plot Table 
grid.table(D, gp =gpar(fontsize=6.5, lwd=1), show.rownames=F, padding.v = unit(1.2, "mm"), 
           padding.h = unit(3.4, "mm"))

# Plot Separators
for (s in seps[-1]) {
  ss = nrow(D)-s
  y = unit(1,"mm")+unit(ss*2.985,"mm")
  grid.segments(0.058, y, 1-0.058, y)
}
popViewport()

if(PDF_OUTPUT_DIR != "") { dev.off(); print(paste("Saved to", outputpdf)) }
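
The conversion `1/(ratio * freq_ratio)` used in the cell above turns a ratio of cycle counts into a speedup in wall-clock time. The Python sketch below checks that identity on made-up numbers. The function name `time_speedup` and the cycle counts are hypothetical, and the clock values are the ones declared in `clks`.

```python
def time_speedup(cycle_ratio, ref_clk, tar_clk):
    """Convert a target/reference cycle ratio into a time speedup.

    time_ref / time_tar = (cyc_ref / ref_clk) / (cyc_tar / tar_clk)
                        = 1 / (cycle_ratio * ref_clk / tar_clk)
    """
    freq_ratio = ref_clk / tar_clk
    return 1.0 / (cycle_ratio * freq_ratio)


REF_CLK = 1860 * 10**3  # Nehalem, from `clks`
TAR_CLK = 1662 * 10**3  # Atom, from `clks`

# Hypothetical codelet: 1000 cycles on the reference, 3000 on the target.
ref_cycles, tar_cycles = 1000.0, 3000.0
direct = (ref_cycles / REF_CLK) / (tar_cycles / TAR_CLK)
converted = time_speedup(tar_cycles / ref_cycles, REF_CLK, TAR_CLK)
print(direct, converted)
```

A value below 1 means the codelet runs slower on the target than on the reference machine.
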
In [12]:
%%R -w 5 -h 3 -u in -r 150

if(PDF_OUTPUT_DIR != "") {
    outputpdf=paste(PDF_OUTPUT_DIR, "nr-codelets.pdf", sep="/")
    pdf(outputpdf, width=3.7, height=3)
}

data_sorted = prediction
  
#Put angle-brackets around representative names
list_rep =  data_sorted[data_sorted[,"is.representative"]=="TRUE","CodeletName"]  
levels(data_sorted$CodeletName) <- c(levels(data_sorted$CodeletName), 
                                     paste("<", list_rep ,">",sep=""))
data_sorted[data_sorted[,"is.representative"]=="TRUE","CodeletName"] = paste("<", 
                                                                             list_rep ,">",sep="")
 
#Compute prediction statistics
stats = prediction_statistics(S, data_sorted)


# Choose clusters to plot
data_sorted = data_sorted[data_sorted$Cluster==1 | data_sorted$Cluster==2,]

# Prepare plot labels
data_sorted$info_plot = 
  paste(
      paste("error = ", signif(data_sorted$per_err*100,3), "%", sep = ""),  
  "\n\n",sep="")

data_sorted[,"Speedup"] = sprintf("%.4s",data_sorted[,"Speedup"])
data_sorted[,"Speedup"] = paste("Cluster ",data_sorted[,"Cluster"], "  ",
                                "(s = ",data_sorted[,"Speedup"],")",sep="")

# Set codelet order in plot 
data_sorted$CodeletName = factor(data_sorted$CodeletName,
                                 levels=factor(c("<toeplz_1>","rstrct_29","mprove_8",
                                                 "toeplz_4","<realft_4>")))

# Plot
p <- ggplot(data_sorted, aes())
p <- p + geom_point(aes(y=ref_vivo_CPInv/S$ref_clk,
                        x=CodeletName, col="R", shape="R"), 
                    alpha=1,size=3)

# Plot prediction
iteration = 1
for (element in data_sorted$is.representative) {
    colour_info = "black"
    transparency = 1
    if (element == "TRUE") 
    { 
      colour_info = "darkgreen"
      transparency = 1
    } else {
      colour_info = "black"
      transparency = 0.75
    }        
    p <- p + geom_text(data=data_sorted[iteration, ],aes(y=ref_vivo_CPInv/S$ref_clk, 
                                                         x=CodeletName,col="R", 
                                                         shape="R"), 
                       hjust=0.25, vjust=0.5, size=2.5, alpha=transparency, 
                       colour=colour_info, angle=90) 
    p <- p + aes_string(label = "info_plot") 

    iteration = iteration + 1
}

p <- p + geom_point(aes(y=tar_vivo_CPInv/S$tar_clk, x=CodeletName,col="T1", 
                          shape="T1"), alpha=0.8,size=3)
p <- p + geom_point(aes(y=Predicted_cyclesPerIteration/S$tar_clk,x=CodeletName, col="T2", 
                          shape="T2"), alpha=0.8,size=3)
p <- p + geom_segment(aes(y=ref_vivo_CPInv/S$ref_clk,x = CodeletName,  
                            yend=Predicted_cyclesPerIteration/S$tar_clk, xend=CodeletName), 
                            arrow=arrow(length=unit(0.2,"cm")), size=0.3, color="blue", alpha=0.8)   
p <- p + scale_y_log10(breaks=c(1,2,5,10,20,40,80,160,320), limits=c(2,160))
p <- p + labs(y = "Execution time (ms / invocation)", x = "")
p <- p + theme(axis.text.x = element_blank())
p <- p + theme(legend.position="top")
p <- p + facet_wrap(~Speedup, scales="free_x") 
p <- p + theme_bw(base_size=9) 
p <- p + theme(legend.position="top", legend.direction="horizontal") 
p <- p + theme(legend.key = element_blank())
p <- p + theme(axis.text.x = element_text(angle = 45, hjust = 1))  
p <- p + scale_color_manual("", values=colours, 
                            labels=c("Reference (Nehalem)", "Atom real", "Atom predicted"))
p <- p + scale_shape_manual("", 
                            labels=c("Reference (Nehalem)", "Atom real", "Atom predicted"), 
                            values=c(3,4,5))
print(p)

if(PDF_OUTPUT_DIR != "") { dev.off(); print(paste("Saved to", outputpdf)) }
In [13]:
%%R

all = data.frame()
for (nclusters in c(14,-1)) {
  for (architecture in c("atom","sandybridge")) { 
    S= new_session(bench, architecture)
    clusters = cluster_codelets(S, feature_set, nclusters)
    prediction = cycle_prediction(S, feature_set, clusters)

    stats = prediction_statistics(S, prediction)

    # Convert to percentages
    stats$median_per_err = stats$median_per_err * 100
    stats$mean_per_err = stats$mean_per_err * 100

    D = data.frame(Architecture = as.factor(as.character(arch_names[architecture])), 
                   Clusters = nclusters,
                   "Median Error" = signif(stats$median_per_err,2),
                   "Average Error" = signif(stats$mean_per_err,2))

    all = rbind(all, D)
  }
}

print(all, row.names=F)
 Architecture Clusters Median.Error Average.Error
         Atom       14          1.8         12.00
 Sandy Bridge       14          3.2          9.30
         Atom       -1          0.0          1.70
 Sandy Bridge       -1          0.0          0.97

NAS SER Results

In this section, we produce the NAS Serial plots and tables used in the paper. You can reproduce these results by downloading this notebook and running it on your computer.

In [14]:
%%R 
# Setup experiment parameters for NAS SER results
bench="NAS"
architecture="sandybridge"
number_of_clusters=-1
In [15]:
%%R -w 7 -h 3 -u in -r 300

if(PDF_OUTPUT_DIR != "") {
    outputpdf=paste(PDF_OUTPUT_DIR, "nas-codelets.pdf", sep="/")
    pdf(outputpdf, width=7, height=3.6)
}

# Load session data
S= new_session(bench, architecture)

# Run the benchmark reduction
clusters = cluster_codelets(S, feature_set, number_of_clusters)
prediction = cycle_prediction(S, feature_set, clusters)
stats = prediction_statistics(S, prediction)

# Sort codelets by running time
data_sorted = arrange(prediction, -ref_vivo_CPInv)
data_sorted$CodeletName = factor(data_sorted$CodeletName, levels=unique(data_sorted$CodeletName))

# Plot data, group by application
p <- ggplot(data_sorted, aes(x=CodeletName, col=ARCH, shape=ARCH))
p <- p + geom_point(aes(y=ref_vivo_CPInv/S$ref_clk,
                          col="R", shape="R"), size=1.5)
p <- p + geom_point(aes(y=tar_vivo_CPInv/S$tar_clk,
                          col="T2", shape="T2"), size=1.5)
p <- p + geom_point(aes(y=Predicted_cyclesPerIteration/S$tar_clk,
                          col="T1", shape="T1"), size=1.5)
p <- p + scale_y_log10(breaks = c(1,2,5,10,25,50,100,200,500,1000,2000,4000))
p <- p + labs(y = "Execution time (ms / invocation)", x=NULL)
p <- p + facet_wrap(~Cluster ~Speedup, scales="free_x")
p <- p + scale_x_discrete(breaks = NA)
p <- p + theme_bw(base_size=9)
p <- p + scale_color_manual("", values=colours, labels=c("Reference (Nehalem)", 
                                                         "Sandy Bridge real", 
                                                         "Sandy Bridge predicted"))
p <- p + scale_shape_manual("", labels=c("Reference (Nehalem)", 
                                         "Sandy Bridge real", 
                                         "Sandy Bridge predicted"), 
                            values=c(3,4,5))
p <- p + theme(legend.position="top") + theme(legend.key = element_blank())
p <- p + facet_wrap(~BenchName, nrow=1, scales="free")
print(p)


if(PDF_OUTPUT_DIR != "") { dev.off(); print(paste("Saved to", outputpdf)) }
In [16]:
%%R -w 4 -h 5 -u in -r 150
if(PDF_OUTPUT_DIR != "") {
    outputpdf=paste(PDF_OUTPUT_DIR, "nas-applications.pdf", sep="/")
    pdf(outputpdf, width=3.5, height=4)
}

all = data.frame()
# Gather application prediction statistics on the three target architectures
for (arch in c("atom", "core2", "sandybridge")) { 
  architecture=arch
  S= new_session(bench, architecture)
  number_of_clusters=-1
  clusters = cluster_codelets(S, feature_set, number_of_clusters)            
  prediction = cycle_prediction(S, feature_set, clusters)                                 
  appli = compute_appli_cycles(S, prediction)                                    
  appli$Architecture = as.factor(as.character(arch_names[arch]))
  all = rbind(all, appli)
}

# Plot the data, group by architecture
all$Diff <- NULL                 
all$Speedup <- NULL                  
short.m <- melt(all, id.vars=c("Architecture", "BenchName")) 
p <- ggplot(short.m, aes(x=BenchName, y=value/1000, fill=variable))                
p <- p + geom_bar(stat="identity", position=position_dodge(), width=0.8)                 
p <- p + labs(y = "Execution time (s)", x = "")  
p <- p + scale_fill_manual(values=colours, labels = c("Reference", "Real", "Predicted")) 
p <- p + facet_wrap(~ Architecture, ncol=1,scale="free_y")                        
p <- p + guides(fill=guide_legend(title=NULL))
p <- p + theme_bw(base_size=9)
p <- p + theme(legend.key= element_blank(), legend.position="top", legend.direction="horizontal")

print(p)


if(PDF_OUTPUT_DIR != "") { dev.off(); print(paste("Saved to", outputpdf)) }
In [17]:
%%R -w 3 -h 3 -u in -r 150

if(PDF_OUTPUT_DIR != "") {
    outputpdf=paste(PDF_OUTPUT_DIR, "speedup.pdf", sep="/")
    pdf(outputpdf, width=3, height=2.5)
}


# Gather data from previous cell
all2 = data.frame(real_spu = all$Reference_APPLI_TIME/all$Target_APPLI_TIME,
                  predicted_spu = all$Reference_APPLI_TIME/all$PredictedTime,
                  BenchName = all$BenchName,
                  Architecture = all$Architecture)

# Compute the geometric mean by architecture
all2 = ddply(all2, .(Architecture), summarize, real_spu = gm_mean(real_spu), 
             predicted_spu = gm_mean(predicted_spu))

# Plot the data by architecture
short.m = melt(all2, id.vars=c("Architecture"))
p <- ggplot(short.m, aes(x=Architecture, y=value, fill=variable))                
p <- p + geom_bar(stat="identity", position=position_dodge())                 
p <- p + labs(y="Geometric mean speedup", x = NULL)                     
p <- p + scale_fill_manual(values=colours[-c(1)], labels = c("Real Speedup", "Predicted Speedup")) 
   
p <- p + ylim(0,2.2)
p <- p + geom_hline(yintercept=1.0, linetype="dashed")
p <- p + geom_text(aes(label=format(round(value,2),2)), position=position_dodge(width=0.9), 
                   vjust=-1, color="black", size=3) 
p <- p + guides(fill=guide_legend(title=NULL))
p <- p + theme_bw(base_size=9)
p <- p + theme(legend.key= element_blank(),legend.justification = c(0, 1), 
               legend.position = c(0, 1))


print(p)

if(PDF_OUTPUT_DIR != "") { dev.off(); print(paste("Saved to", outputpdf)) }
ymax not defined: adjusting position using y instead

In [18]:
%%R -w 7 -h 3 -u in -r 300

if(PDF_OUTPUT_DIR != "") {
    outputpdf=paste(PDF_OUTPUT_DIR, "speedupvserror.pdf", sep="/")
    pdf(outputpdf, width=7, height=3)
}

# Aggregate data for the three target architectures
all = data.frame()
for (arch in c("atom", "core2", "sandybridge")) { 
  architecture=arch
  number_of_clusters=-1
  
  S= new_session(bench, architecture)
  clusters = cluster_codelets(S, feature_set, number_of_clusters)            
  prediction = cycle_prediction(S, feature_set, clusters) 
  stats = prediction_statistics(S, prediction)


  P = expand.grid(ncluster=seq(1,35))
  D = ddply(P, 
            .(ncluster), 
            function(x) { P = cycle_prediction(S, feature_set, 
                                               cluster_codelets(S, feature_set, x$ncluster)) 
                            cbind(prediction_statistics(S, P), 
                                  compute_global_speedup(compute_appli_cycles(S, P)))
            }
            )
  D$benchmarking_speedup = D$appli_benchmark_cost_ms/D$benchmark_cost_ms

  D = D[!duplicated(D$number_of_clusters),]
  D$ncluster = D$number_of_clusters

  D = D[,c("ncluster", "median_per_err", "benchmarking_speedup")]
  D = melt(D, id.vars=c("ncluster"))
  D$label = c(rep(D[D$variable == "median_per_err" & 
                    D$ncluster==stats$number_of_clusters,]$value,nrow(D)/2),
               rep(D[D$variable == "benchmarking_speedup" & 
                    D$ncluster==stats$number_of_clusters,]$value,nrow(D)/2)
             )

  D$Architecture = as.factor(as.character(arch_names[arch]))
  all = rbind(all, D)
}



# Set the plot scales
maxsp = 50
maxer = 25
convert = maxer/maxsp

all[all$variable == "benchmarking_speedup",]$value = all[
                                        all$variable == "benchmarking_speedup",]$value * convert
all[all$variable == "median_per_err",]$value = all[
                                        all$variable == "median_per_err",]$value * 100 

all$variable = factor(all$variable, levels=c("median_per_err", "benchmarking_speedup"))

best_c = stats$number_of_clusters

# Plot the speedup & error by increasing number of representatives, group by architecture
trellis.par.set(theme = col.whitebg())
trellis.par.set(fontsize=list(text=10, points=8)) 
p <- xyplot( value ~ ncluster | Architecture, data = all, groups = variable, type="l", 
            ylim=c(0, 30), xlim=c(0,25),
            between = list(x = 0), scales = list(tck = c(1,0), alternating=1),
            xlab="Number of clusters", ylab="Median % error",
            par.settings=list(layout.widths=list(right.padding=10)), 
            auto.key=list(text=c("Median % error","Benchmarking reduction factor"),space="top", 
                          lines=TRUE, points=FALSE, columns=2),
            strip=strip.custom(bg="lightgray"),
            panel=function(x, y, ...) {
             panel.xyplot(x,y, ...)


             xscale <- current.viewport()$xscale

             if (panel.number() == 3) {
               yscale <- current.viewport()$yscale
               pushViewport(viewport(width=2, height=2, clip=TRUE))
               pushViewport(viewport(width=0.5, height=0.5,
                                      xscale=xscale, yscale=yscale))

               at <- pretty(c(0, maxsp))
               panel.axis("right", half = FALSE, outside=TRUE,
                          at = at * convert, labels = at)

               panel.text(32, maxer/2, "Benchmarking reduction factor", srt=+90)
               popViewport()
               popViewport()

             }
             error = y[best_c]
             speed = y[length(y)/2+best_c]/convert
             panel.abline(v=best_c, col.line="darkblue", lty=3)
             panel.points(best_c, error, pch=1, cex=1.75, col="darkgreen")
             panel.points(best_c, speed*convert, pch=1, cex=1.75, col="darkred")
             panel.text(best_c-2.5, error-1.5, paste0(signif(error,2), "%"), cex=0.8, 
                        col="darkgreen")
             panel.text(best_c-2.5, (speed+2)*convert, paste0("x", signif(speed,2)), cex=0.8, 
                        col="darkred")
       })

print(p)
if(PDF_OUTPUT_DIR != "") { dev.off(); print(paste("Saved to", outputpdf)) }
In [19]:
%%R -w 4 -h 3 -u in -r 150

if(PDF_OUTPUT_DIR != "") {
    outputpdf=paste(PDF_OUTPUT_DIR, "comp_random_clustering.pdf", sep="/")
    pdf(outputpdf, width=3.7, height=3)
}

data = read.csv(paste(DATA_PATH, "random-1000.csv", sep="/"), header=T)
data$cluster = all[all$variable == "median_per_err",]$value/100
g <- ggplot(data, aes(x=n))
g <- g + geom_line(aes(y=median*100,linetype="B"))
g <- g + geom_line(aes(y=minimum*100,linetype="C"))
g <- g + geom_line(aes(y=maximum*100,linetype="A"))
g <- g + geom_ribbon(data=data,aes(ymin=minimum*100,ymax=maximum*100, fill="Range"),
                     alpha=0.3, fill="gray")
g <- g + geom_line(aes(y=cluster*100,   color="GA features"), size = 1.3)
g <- g + labs(x = "Number of clusters", y = "Median % error")
g <- g + xlim(1,24)
#g <- g + scale_y_continuous(labels = percent_format())
g <- g + guides(col=guide_legend(title=NULL))
g <- g + guides(linetype=guide_legend(title=NULL))
g <- g + facet_wrap(~architecture, ncol=1, scales = "free_y")
g <- g + theme_bw(base_size=9)
g <- g + theme(legend.position = "none")
g <- g + theme(legend.key= element_blank(),legend.position = "right",
               legend.direction = "vertical")
g <- g + scale_linetype_manual(values=c(1,2,3), labels=c("Worst", "Median", "Best"))
#g <- g + guides(linetype = guide_legend(nrow = 2))
print(g)
if(PDF_OUTPUT_DIR != "") { dev.off(); print(paste("Saved to", outputpdf)) }
In [20]:
%%R -w 4 -h 4 -u in -r 150

D = data.frame()
# Limit our method to extracting representatives per application only

for (arch in c("atom", "core2", "sandybridge")) { 
     S = new_session(bench, arch)
     prediction = cycle_prediction(S, feature_set, cluster_codelets(S, feature_set, 1))
     G = compute_global_speedup(compute_appli_cycles(S, prediction))
     print(s)

  for (ncluster in seq(1,12)) {
     errors = c()
     nclusters = 0
     time_required = 0
     # mg cannot be predicted with codelets from mg, because they are all ill-behaved
     # (another disadvantage of being limited to a single application)
     # so it's not included. 
     for (app in c("bt", "cg", "ft", "is", "lu", "sp")) {
       S = new_session(bench, arch)
       S = keep_only_app(S, app)

       # Run the benchmark reduction
       clusters = cluster_codelets(S, feature_set, min(ncluster, nrow(S$data)))
       prediction = cycle_prediction(S, feature_set, clusters)
       errors = c(errors, prediction$per_err)
       stats = prediction_statistics(S, prediction)
       nclusters = nclusters + max(prediction$Cluster)
       time_required = time_required + stats$benchmark_cost_ms
     }
     D = rbind(D, data.frame(Architecture=as.factor(as.character(arch_names[arch])), 
                             method="Per Application", median_per_err=median(errors)*100, 
                             benchmarking_speedup=G$appli_benchmark_cost_ms/time_required, 
                             ncluster=nclusters))
  }
}
if(PDF_OUTPUT_DIR != "") {
    outputpdf=paste(PDF_OUTPUT_DIR, "across_vs_single.pdf", sep="/")
    pdf(outputpdf, width=3, height=4)
}

E = cast(all[,c("ncluster", "Architecture", "variable", "value")])
E$method = "Across Applications"
print(colnames(E))
print(colnames(D))
data = rbind(E, D)

p = ggplot(data=data, aes(y=median_per_err, x=ncluster, color=method, 
                          linetype=method, shape=method))  + geom_line() + geom_point()
p = p + facet_wrap(~Architecture, ncol=1)
p = p + theme_bw(base_size=9)
p = p + theme(legend.position="top") + theme(legend.key = element_blank())
p <- p + scale_shape_manual("Subsetting", values=c(1,2), 
                            labels=c("Across Applications", "Per Application"))
p <- p + scale_linetype_manual("Subsetting", values=c(1,2), 
                            labels=c("Across Applications", "Per Application"))
p <- p + scale_color_manual("Subsetting", values=colours[1:2],
                            labels=c("Across Applications", "Per Application"))
p <- p + labs(x = "Number of clusters", y = "Median % error") + xlim(0,25)
print(p)


if(PDF_OUTPUT_DIR != "") { dev.off(); print(paste("Saved to", outputpdf)) }
[1] 26
[1] 26
[1] 26
[1] "ncluster"             "Architecture"         "median_per_err"      
[4] "benchmarking_speedup" "method"              
[1] "Architecture"         "method"               "median_per_err"      
[4] "benchmarking_speedup" "ncluster"            

Codelet clustering is an important part of the method: it keeps a single representative for each group of similar codelets. One may wonder whether clustering actually reduces the system selection overhead. The question is valid because the acceleration of our method comes from two factors: the codelets are benchmarked as standalone microbenchmarks instead of running the full applications, and clustering keeps only one representative per group of similar codelets.

The following experiment examines the contribution of each factor separately.

In [21]:
%%R

compute_speedup_breakdown <- function(S) {
  clusters = cluster_codelets(S, feature_set, number_of_clusters)  
  prediction = cycle_prediction(S, feature_set, clusters) 
  stats = prediction_statistics(S, prediction)
  speedup = compute_global_speedup(compute_appli_cycles(S, prediction))
  without_clustering = sum(S$data$tar_vitro_CPInv*S$data$tar_vitro_invocations/S$tar_clk)
  return(c(speedup$appli_benchmark_cost_ms/stats$benchmark_cost_ms, 
           speedup$appli_benchmark_cost_ms/without_clustering))
}

bench="NAS"
table = data.frame()
for (arch in c("atom", "core2", "sandybridge")) {
  architecture=arch
  number_of_clusters=-1

  S = new_session(bench, architecture)
  both = compute_speedup_breakdown(S)
  table = rbind(table, data.frame(architecture=arch, with_clusters=both[1], 
                                  without_clusters = both[2], ratio=both[1]/both[2]))
}
print(table)
  architecture with_clusters without_clusters    ratio
1         atom      44.29743        12.079576 3.667134
2        core2      24.72561         8.687125 2.846237
3  sandybridge      22.54264         6.293770 3.581739
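
As a reading aid, the table can be recomputed outside R. The sketch below is a minimal Python illustration, not part of the original framework: it copies the printed values and treats `without_clusters` as the reduction obtained when every codelet is benchmarked, so that the `ratio` column is the extra factor contributed by clustering. The helper name `clustering_gain` is hypothetical.

```python
# Reduction factors copied from the table printed above (NAS benchmarks):
# architecture -> (with_clusters, without_clusters)
measured = {
    "atom": (44.29743, 12.079576),
    "core2": (24.72561, 8.687125),
    "sandybridge": (22.54264, 6.293770),
}


def clustering_gain(with_clusters, without_clusters):
    """Extra reduction factor obtained by keeping only the representatives."""
    return with_clusters / without_clusters


for arch, (with_c, without_c) in measured.items():
    gain = clustering_gain(with_c, without_c)
    # Total reduction = factor without clustering * factor added by clustering
    print(f"{arch}: {with_c:.2f} = {without_c:.2f} x {gain:.2f}")
```

Read this way, the total reduction factor is the product of the two contributions, and the `ratio` column isolates the part due to clustering alone.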

Example: Using the prediction matrix

This section shows how to build the prediction matrix, which transforms the representatives' measurements on a target architecture into performance estimates for all the codelets.

In [22]:
%%R

# Load session data
S= new_session("NR", "atom")

# Run the benchmark reduction
clusters = cluster_codelets(S, feature_set, number_of_clusters=14)            
prediction = cycle_prediction(S, feature_set, clusters)

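The matrix \(\mathbf{M}\) has one row per codelet \(p_i\) and one column per cluster \(C_k\). With \(r_k\) the representative of cluster \(C_k\) and \(t^{ref}\) the measurements on the reference architecture, it is defined as:

\[ \mathbf{M}_{i,k} = \left\{ \begin{array}{l l} 0 &\quad \text{ if } p_i \notin C_k\\ \frac{t^{ref}_i}{t^{ref}_{r_k}} &\quad \text{ if } p_i \in C_k \end{array} \right. \]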
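
As a toy illustration with made-up numbers (they are not taken from the datasets), consider three codelets with reference measurements \(t^{ref} = (10, 20, 5)\), grouped into \(C_1 = \{p_1, p_2\}\) represented by \(p_1\), and \(C_2 = \{p_3\}\) represented by \(p_3\). Applying the definition entry by entry gives:

\[ \mathbf{M} = \left( \begin{array}{c c} 10/10 & 0 \\ 20/10 & 0 \\ 0 & 5/5 \end{array} \right) = \left( \begin{array}{c c} 1 & 0 \\ 2 & 0 \\ 0 & 1 \end{array} \right) \]

Each row has a single non-zero entry, in the column of the cluster the codelet belongs to, and the row of a representative holds 1 in that column.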

In [23]:
%%R

# Build the prediction matrix M (one row per codelet, one column per cluster)
# and collect the representative of each cluster

build_pred_matrix <- function(prediction) {
  N=nrow(prediction)
  K=max(prediction$Cluster)
  M=matrix(0,nrow=N, ncol=K)
  colnames(M) = seq(K)
  rownames(M) = cleanCodeletNames(prediction)$CodeletName

  reprs = data.frame()
  for (C in seq(K)) {
    # Codelets of cluster C and its representative
    clus = prediction[prediction$Cluster == C,]
    repr = clus[clus$is.representative,]
    reprs = rbind(reprs, repr)
    t_ref_rk = repr$ref_vivo_CPInv
    for (i in seq(nrow(clus))) {
        # The row names of prediction are used as row positions in M
        n = as.integer(rownames(clus[i,]))
        t_ref_i = clus[i,]$ref_vivo_CPInv
        # Ratio between the codelet and its representative on the reference
        M[n,C] = t_ref_i/t_ref_rk
    }
  }
  return(list(M=M, reprs=reprs))
}

pred = build_pred_matrix(prediction)

pred$M = round(pred$M, 2)
print(as.table(local({pred$M[pred$M == 0] = NA;pred$M})))
             1    2    3    4    5    6    7    8    9   10   11   12   13   14
balanc_3  1.00                                                                 
elmhes_10 1.41                                                                 
elmhes_11      1.43                                                            
four1_2             0.57                                                       
hqr_12                   0.48                                                  
hqr_12_sq                1.00                                                  
hqr_13                   3.31                                                  
hqr_15                        1.00                                             
jacobi_5                 0.48                                                  
lop_13                             1.00                                        
ludcmp_4                                1.00                                   
matadd_16      0.16                                                            
mprove_8                                     0.16                              
mprove_9       0.18                                                            
realft_4                                          1.00                         
relax2_26                                              1.00                    
rstrct_29                                    0.24                              
svbksb_3                                                    1.00               
svdcmp_11      1.00                                                            
svdcmp_13                                                        1.00          
svdcmp_14                                                        1.37          
svdcmp_6                                                              1.00     
toeplz_1                                     1.00                              
toeplz_2            1.00                                                       
toeplz_3                                                    8.94               
toeplz_4                                          1.51                         
tridag_1                                                                   1.00
tridag_2            1.29                                                       

We time the 14 representatives on the target architecture and gather the measurements in a vector: \(\vec{t^{tar}_{repr}}\)

In [24]:
%%R
t_repr_tar = matrix(pred$reprs$tar_vitro_CPInv)
print(t_repr_tar)
              [,1]
 [1,]   50744366.0
 [2,]  572788950.3
 [3,]   97536878.2
 [4,]    7131365.7
 [5,]     125812.3
 [6,]  124148677.3
 [7,]   91914428.2
 [8,]  124580017.1
 [9,]   74166891.8
[10,] 2778535280.0
[11,]    9366759.0
[12,]  148014005.5
[13,]  362832247.3
[14,]  372923685.6

The predictions for all the codelets are computed with the following formula:

\[ \vec{t^{tar}_{all}} = \mathbf{M} \cdot \vec{t^{tar}_{repr}} \]

In [25]:
%%R
predicted = pred$M %*% t_repr_tar

print(signif(data.frame(predicted=predicted, real=prediction$tar_vivo_CPInv),2))
          predicted    real
balanc_3    5.1e+07 5.1e+07
elmhes_10   7.2e+07 7.7e+07
elmhes_11   8.2e+08 5.8e+08
four1_2     5.6e+07 9.1e+07
hqr_12      3.4e+06 4.6e+06
hqr_12_sq   7.1e+06 7.1e+06
hqr_13      2.4e+07 2.7e+07
hqr_15      1.3e+05 1.3e+05
jacobi_5    3.4e+06 4.6e+06
lop_13      1.2e+08 1.2e+08
ludcmp_4    9.2e+07 9.2e+07
matadd_16   9.2e+07 5.7e+07
mprove_8    2.0e+07 3.1e+07
mprove_9    1.0e+08 6.8e+07
realft_4    7.4e+07 7.4e+07
relax2_26   2.8e+09 2.8e+09
rstrct_29   3.0e+07 2.8e+07
svbksb_3    9.4e+06 9.4e+06
svdcmp_11   5.7e+08 5.7e+08
svdcmp_13   1.5e+08 1.5e+08
svdcmp_14   2.0e+08 1.2e+08
svdcmp_6    3.6e+08 3.6e+08
toeplz_1    1.2e+08 1.2e+08
toeplz_2    9.8e+07 9.8e+07
toeplz_3    8.4e+07 9.6e+07
toeplz_4    1.1e+08 1.1e+08
tridag_1    3.7e+08 3.7e+08
tridag_2    1.3e+08 1.0e+08
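
To make the matrix product concrete, the sketch below redoes the computation in plain Python for two of the clusters printed above: cluster 1 (represented by balanc_3) and cluster 11 (represented by svbksb_3). It is an illustration only. The matrix entries and target measurements are copied from the outputs above, the real values are the two-digit figures from the last table, and the relative error formula is an assumption, since the notebook's own error computation is defined elsewhere. The helper names `predict` and `relative_error` are hypothetical.

```python
# Rows of M restricted to clusters 1 and 11 (entries rounded to 2 digits,
# as printed above). Columns: [cluster 1, cluster 11].
codelets = ["balanc_3", "elmhes_10", "svbksb_3", "toeplz_3"]
M = [
    [1.00, 0.00],  # balanc_3, representative of cluster 1
    [1.41, 0.00],  # elmhes_10
    [0.00, 1.00],  # svbksb_3, representative of cluster 11
    [0.00, 8.94],  # toeplz_3
]

# Target measurements of the two representatives (rows 1 and 11 of t_repr_tar)
t_repr_tar = [50744366.0, 9366759.0]

# Real target measurements, only known to 2 significant digits here
real = [5.1e7, 7.7e7, 9.4e6, 9.6e7]


def predict(matrix, t_repr):
    """Matrix-vector product: one predicted value per codelet."""
    return [sum(m * t for m, t in zip(row, t_repr)) for row in matrix]


def relative_error(predicted, real):
    """Assumed error metric: |predicted - real| / real for each codelet."""
    return [abs(p - r) / r for p, r in zip(predicted, real)]


predicted = predict(M, t_repr_tar)
errors = relative_error(predicted, real)
for name, p, e in zip(codelets, predicted, errors):
    print(f"{name}: predicted {p:.2e}, relative error {e:.1%}")
```

Because each row of the matrix has a single non-zero entry, every prediction is the representative's target measurement scaled by the ratio observed on the reference architecture. The errors printed for the representatives are not exactly zero only because the real values are rounded.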

Information for Reproducing the NAS measurements

The table below lists the original NAS codelets, the file each one was extracted from, and its position in that file. The source lines delimit the innermost loop, because they are reported by our static analysis, which focuses on innermost loops. The extracted codelets, however, correspond to the outermost loop enclosing those source lines.

In [26]:
%%R
# Load session data
S= new_session("NAS", "core2")
#print(colnames(S$features))
print(S$features[,c("CodeletName","BenchName", "Source.file", "Source.lines")])
        CodeletName BenchName                 Source.file Source.lines
3  codelet_1q1uztgi        bt    /NPB3.0-SER/BT/y_solve.f       42-124
12 codelet_9salavfj        bt    /NPB3.0-SER/BT/z_solve.f       42-124
26 codelet_f5wo0uxw        bt  /NPB3.0-SER/BT/exact_rhs.f        24-26
32 codelet_jbbvz7xk        bt        /NPB3.0-SER/BT/rhs.f       61-110
34 codelet_jxc4wc0u        bt  /NPB3.0-SER/BT/exact_rhs.f      331-333
35 codelet_k2brr96z        bt        /NPB3.0-SER/BT/rhs.f      266-311
41 codelet_n94ixut8        bt        /NPB3.0-SER/BT/rhs.f      244-251
43 codelet_npabl6hq        bt        /NPB3.0-SER/BT/rhs.f      371-373
48 codelet_oo8c273h        bt /NPB3.0-SER/BT/initialize.f        28-30
54 codelet_rbjjtmtw        bt        /NPB3.0-SER/BT/add.f        20-22
55 codelet_salgyh4a        bt        /NPB3.0-SER/BT/rhs.f      339-343
61 codelet_vp8kou9q        bt        /NPB3.0-SER/BT/rhs.f        46-48
62 codelet_y3z9unfb        bt        /NPB3.0-SER/BT/rhs.f        23-33
65 codelet_zdssy5el        bt    /NPB3.0-SER/BT/x_solve.f       45-127
67 codelet_zvxgk4ke        bt      /NPB3.0-SER/BT/error.f        70-72
6  codelet_3lkaf7j3        cg         /NPB3.0-SER/CG/cg.f      594-595
8  codelet_77bv9i9o        cg         /NPB3.0-SER/CG/cg.f      609-610
14 codelet_ah8c616d        cg         /NPB3.0-SER/CG/cg.f      195-196
27 codelet_ffhehgar        cg         /NPB3.0-SER/CG/cg.f      797-799
52 codelet_psux6bds        cg         /NPB3.0-SER/CG/cg.f      818-823
1  codelet_04wki9m9        ft    /NPB3.0-SER/FT/auxfnct.f      168-171
15 codelet_ajh8suhy        ft      /NPB3.0-SER/FT/fft3d.f        77-82
18 codelet_bfvnd0aj        ft      /NPB3.0-SER/FT/appft.f        45-47
20 codelet_cwibzte7        is         /NPB3.0-SER/IS/is.c      341-349
39 codelet_lwgg2vdo        is         /NPB3.0-SER/IS/is.c      475-476
53 codelet_qbx4gern        is         /NPB3.0-SER/IS/is.c      387-389
59 codelet_v392t8ky        is         /NPB3.0-SER/IS/is.c      375-376
64 codelet_zds65i92        is         /NPB3.0-SER/IS/is.c      380-381
7  codelet_5g6luxej        lu     /NPB3.0-SER/LU/l2norm.f        45-46
10 codelet_7ztudex2        lu       /NPB3.0-SER/LU/erhs.f        37-39
13 codelet_a5i9q5oc        lu       /NPB3.0-SER/LU/blts.f       73-243
17 codelet_bdcevv5m        lu       /NPB3.0-SER/LU/buts.f       70-239
25 codelet_f5scgu90        lu       /NPB3.0-SER/LU/erhs.f        49-57
28 codelet_gowmx4sy        lu       /NPB3.0-SER/LU/jacu.f       40-123
29 codelet_id2f9qxy        lu        /NPB3.0-SER/LU/rhs.f      271-278
30 codelet_id4su9eh        lu       /NPB3.0-SER/LU/erhs.f      287-294
45 codelet_nzbfxce7        lu        /NPB3.0-SER/LU/rhs.f       77-100
49 codelet_p6nf3vla        lu       /NPB3.0-SER/LU/erhs.f       96-120
51 codelet_pixsirbt        lu      /NPB3.0-SER/LU/jacld.f       40-123
56 codelet_syvfdqbb        lu       /NPB3.0-SER/LU/erhs.f      408-415
60 codelet_vdku9m9a        lu        /NPB3.0-SER/LU/rhs.f      391-398
63 codelet_yl02rn5o        lu        /NPB3.0-SER/LU/rhs.f        36-38
5  codelet_2xepi1y9        mg         /NPB3.0-SER/MG/mg.f      520-531
11 codelet_9guj5g8w        mg         /NPB3.0-SER/MG/mg.f    1164-1164
22 codelet_e77x2p6t        mg         /NPB3.0-SER/MG/mg.f      687-697
23 codelet_eh02x069        mg         /NPB3.0-SER/MG/mg.f      919-927
31 codelet_j5k2371m        mg         /NPB3.0-SER/MG/mg.f      590-606
33 codelet_jhez9gww        mg         /NPB3.0-SER/MG/mg.f    1084-1103
2  codelet_0612t675        sp        /NPB3.0-SER/SP/rhs.f        23-38
4  codelet_2v56b796        sp        /NPB3.0-SER/SP/add.f        20-22
9  codelet_7hw4mhgq        sp /NPB3.0-SER/SP/initialize.f        29-30
16 codelet_alcg4uf1        sp  /NPB3.0-SER/SP/exact_rhs.f        25-27
19 codelet_ciposq9y        sp        /NPB3.0-SER/SP/rhs.f      259-263
21 codelet_e3rjjpkq        sp        /NPB3.0-SER/SP/rhs.f       66-115
24 codelet_el0upqbv        sp        /NPB3.0-SER/SP/rhs.f      389-391
36 codelet_k2ftuhal        sp    /NPB3.0-SER/SP/z_solve.f       94-108
37 codelet_l044uhir        sp      /NPB3.0-SER/SP/ninvr.f        21-36
38 codelet_lwazhy7r        sp        /NPB3.0-SER/SP/rhs.f        52-54
40 codelet_n3f9ckan        sp     /NPB3.0-SER/SP/txinvr.f        23-48
42 codelet_n9k4kgux        sp      /NPB3.0-SER/SP/pinvr.f        21-36
44 codelet_nrpad2bf        sp        /NPB3.0-SER/SP/rhs.f      275-320
46 codelet_o8g6pwm1        sp     /NPB3.0-SER/SP/tzetar.f        24-53
47 codelet_ocwkgsmf        sp  /NPB3.0-SER/SP/exact_rhs.f      330-332
50 codelet_p8woixip        sp    /NPB3.0-SER/SP/y_solve.f       93-107
57 codelet_tw2jfd61        sp    /NPB3.0-SER/SP/x_solve.f       91-105
58 codelet_un3n5c0a        sp      /NPB3.0-SER/SP/error.f        68-70
66 codelet_ztezhmhm        sp        /NPB3.0-SER/SP/rhs.f      353-357

The 18 representatives selected by the elbow method (number_of_clusters set to -1) are listed below.

In [27]:
%%R
clusters = cluster_codelets(S, feature_set, number_of_clusters=-1)            
prediction = cycle_prediction(S, feature_set, clusters) 
pred = build_pred_matrix(prediction)

print(pred$reprs[, c("CodeletName")])
 [1] codelet_9salavfj codelet_jbbvz7xk codelet_jxc4wc0u codelet_k2brr96z
 [5] codelet_npabl6hq codelet_rbjjtmtw codelet_salgyh4a codelet_lwazhy7r
 [9] codelet_y3z9unfb codelet_un3n5c0a codelet_vdku9m9a codelet_lwgg2vdo
[13] codelet_f5scgu90 codelet_zds65i92 codelet_bdcevv5m codelet_gowmx4sy
[17] codelet_id4su9eh codelet_p8woixip
67 Levels: codelet_04wki9m9 codelet_0612t675 ... codelet_zvxgk4ke

An archive with the extracted 18 standalone microbenchmarks is available on request.

Prediction matrix:

In [28]:
%%R
print(signif(pred$M))
                        1        2        3        4         5       6        7
codelet_1q1uztgi 0.977837 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_9salavfj 1.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_f5wo0uxw 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_jbbvz7xk 0.000000 1.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_jxc4wc0u 0.000000 0.000000 1.000000 0.000000  0.000000 0.00000 0.000000
codelet_k2brr96z 0.000000 0.000000 0.000000 1.000000  0.000000 0.00000 0.000000
codelet_n94ixut8 0.000000 1.046650 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_npabl6hq 0.000000 0.000000 0.000000 0.000000  1.000000 0.00000 0.000000
codelet_oo8c273h 0.000000 0.000000 0.000000 0.000000  0.000000 1.11039 0.000000
codelet_rbjjtmtw 0.000000 0.000000 0.000000 0.000000  0.000000 1.00000 0.000000
codelet_salgyh4a 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 1.000000
codelet_vp8kou9q 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_y3z9unfb 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_zdssy5el 0.935144 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_zvxgk4ke 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_3lkaf7j3 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_77bv9i9o 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_ah8c616d 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_ffhehgar 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_psux6bds 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_04wki9m9 0.000000 0.000000 0.000000 0.000000 31.600800 0.00000 0.000000
codelet_ajh8suhy 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_bfvnd0aj 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_cwibzte7 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_lwgg2vdo 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_qbx4gern 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_v392t8ky 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_zds65i92 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_5g6luxej 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_7ztudex2 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_a5i9q5oc 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_bdcevv5m 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_f5scgu90 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_gowmx4sy 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_id2f9qxy 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_id4su9eh 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_nzbfxce7 0.000000 1.394120 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_p6nf3vla 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_pixsirbt 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_syvfdqbb 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_vdku9m9a 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_yl02rn5o 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_2xepi1y9 0.000000 0.302138 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_9guj5g8w 0.000000 0.000000 0.000000 0.000000  0.000000 2.30734 0.000000
codelet_e77x2p6t 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_eh02x069 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_j5k2371m 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 1.341950
codelet_jhez9gww 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_0612t675 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_2v56b796 0.000000 0.000000 0.000000 0.000000  0.000000 1.00073 0.000000
codelet_7hw4mhgq 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_alcg4uf1 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_ciposq9y 0.000000 1.050710 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_e3rjjpkq 0.000000 0.977226 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_el0upqbv 0.000000 0.000000 0.000000 0.000000  0.974259 0.00000 0.000000
codelet_k2ftuhal 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_l044uhir 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_lwazhy7r 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_n3f9ckan 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_n9k4kgux 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_nrpad2bf 0.000000 0.000000 0.000000 0.897745  0.000000 0.00000 0.000000
codelet_o8g6pwm1 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_ocwkgsmf 0.000000 0.000000 0.970971 0.000000  0.000000 0.00000 0.000000
codelet_p8woixip 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_tw2jfd61 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_un3n5c0a 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.000000
codelet_ztezhmhm 0.000000 0.000000 0.000000 0.000000  0.000000 0.00000 0.992263
                         8        9       10       11       12      13
codelet_1q1uztgi  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_9salavfj  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_f5wo0uxw  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_jbbvz7xk  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_jxc4wc0u  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_k2brr96z  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_n94ixut8  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_npabl6hq  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_oo8c273h  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_rbjjtmtw  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_salgyh4a  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_vp8kou9q  0.958280 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_y3z9unfb  0.000000 1.000000 0.000000  0.00000 0.000000  0.0000
codelet_zdssy5el  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_zvxgk4ke  0.000000 0.000000 0.987196  0.00000 0.000000  0.0000
codelet_3lkaf7j3  0.000000 0.000000 0.000000 16.78090 0.000000  0.0000
codelet_77bv9i9o  0.000000 0.000000 0.000000  0.00000 0.231473  0.0000
codelet_ah8c616d  0.812552 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_ffhehgar  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_psux6bds 15.888100 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_04wki9m9  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_ajh8suhy  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_bfvnd0aj  0.000000 0.000000 0.000000  0.00000 0.000000 14.2924
codelet_cwibzte7  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_lwgg2vdo  0.000000 0.000000 0.000000  0.00000 1.000000  0.0000
codelet_qbx4gern  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_v392t8ky  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_zds65i92  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_5g6luxej  0.000000 0.000000 0.990139  0.00000 0.000000  0.0000
codelet_7ztudex2  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_a5i9q5oc  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_bdcevv5m  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_f5scgu90  0.000000 0.000000 0.000000  0.00000 0.000000  1.0000
codelet_gowmx4sy  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_id2f9qxy  0.000000 0.000000 0.000000  0.89671 0.000000  0.0000
codelet_id4su9eh  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_nzbfxce7  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_p6nf3vla  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_pixsirbt  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_syvfdqbb  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_vdku9m9a  0.000000 0.000000 0.000000  1.00000 0.000000  0.0000
codelet_yl02rn5o  0.000000 1.361830 0.000000  0.00000 0.000000  0.0000
codelet_2xepi1y9  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_9guj5g8w  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_e77x2p6t  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_eh02x069  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_j5k2371m  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_jhez9gww  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_0612t675  0.000000 1.133210 0.000000  0.00000 0.000000  0.0000
codelet_2v56b796  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_7hw4mhgq  0.964511 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_alcg4uf1  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_ciposq9y  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_e3rjjpkq  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_el0upqbv  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_k2ftuhal  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_l044uhir  0.000000 0.580994 0.000000  0.00000 0.000000  0.0000
codelet_lwazhy7r  1.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_n3f9ckan  0.000000 0.955131 0.000000  0.00000 0.000000  0.0000
codelet_n9k4kgux  0.000000 0.577596 0.000000  0.00000 0.000000  0.0000
codelet_nrpad2bf  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_o8g6pwm1  0.000000 1.091430 0.000000  0.00000 0.000000  0.0000
codelet_ocwkgsmf  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_p8woixip  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_tw2jfd61  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
codelet_un3n5c0a  0.000000 0.000000 1.000000  0.00000 0.000000  0.0000
codelet_ztezhmhm  0.000000 0.000000 0.000000  0.00000 0.000000  0.0000
                         14      15      16       17         18
codelet_1q1uztgi 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_9salavfj 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_f5wo0uxw 0.00000000 0.00000 0.00000 0.000000 0.25231700
codelet_jbbvz7xk 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_jxc4wc0u 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_k2brr96z 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_n94ixut8 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_npabl6hq 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_oo8c273h 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_rbjjtmtw 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_salgyh4a 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_vp8kou9q 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_y3z9unfb 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_zdssy5el 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_zvxgk4ke 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_3lkaf7j3 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_77bv9i9o 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_ah8c616d 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_ffhehgar 0.00000000 0.00000 0.00000 0.000000 0.38075000
codelet_psux6bds 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_04wki9m9 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_ajh8suhy 0.00000000 0.00000 0.00000 0.000000 0.00290214
codelet_bfvnd0aj 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_cwibzte7 2.03710000 0.00000 0.00000 0.000000 0.00000000
codelet_lwgg2vdo 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_qbx4gern 0.00786582 0.00000 0.00000 0.000000 0.00000000
codelet_v392t8ky 0.02705570 0.00000 0.00000 0.000000 0.00000000
codelet_zds65i92 1.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_5g6luxej 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_7ztudex2 0.00000000 0.00000 0.00000 0.000000 0.23910400
codelet_a5i9q5oc 0.00000000 1.00895 0.00000 0.000000 0.00000000
codelet_bdcevv5m 0.00000000 1.00000 0.00000 0.000000 0.00000000
codelet_f5scgu90 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_gowmx4sy 0.00000000 0.00000 1.00000 0.000000 0.00000000
codelet_id2f9qxy 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_id4su9eh 0.00000000 0.00000 0.00000 1.000000 0.00000000
codelet_nzbfxce7 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_p6nf3vla 0.00000000 0.00000 0.00000 0.693504 0.00000000
codelet_pixsirbt 0.00000000 0.00000 1.01719 0.000000 0.00000000
codelet_syvfdqbb 0.00000000 0.00000 0.00000 0.925292 0.00000000
codelet_vdku9m9a 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_yl02rn5o 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_2xepi1y9 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_9guj5g8w 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_e77x2p6t 0.00000000 0.00000 0.00000 0.000000 0.09039200
codelet_eh02x069 0.01047910 0.00000 0.00000 0.000000 0.00000000
codelet_j5k2371m 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_jhez9gww 0.01228560 0.00000 0.00000 0.000000 0.00000000
codelet_0612t675 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_2v56b796 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_7hw4mhgq 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_alcg4uf1 0.00000000 0.00000 0.00000 0.000000 0.27349800
codelet_ciposq9y 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_e3rjjpkq 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_el0upqbv 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_k2ftuhal 0.00000000 0.00000 0.00000 0.000000 1.30909000
codelet_l044uhir 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_lwazhy7r 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_n3f9ckan 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_n9k4kgux 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_nrpad2bf 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_o8g6pwm1 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_ocwkgsmf 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_p8woixip 0.00000000 0.00000 0.00000 0.000000 1.00000000
codelet_tw2jfd61 0.00000000 0.00000 0.00000 0.000000 0.77052400
codelet_un3n5c0a 0.00000000 0.00000 0.00000 0.000000 0.00000000
codelet_ztezhmhm 0.00000000 0.00000 0.00000 0.000000 0.00000000

Function to generate the random clusterings in figure 7

In [29]:
%%R
#
# Called on S$data, this function generates a random clustering with K clusters
#
random_clustering <- function(data, K) {
   data$Cluster = rep(1,nrow(data))

   # We want exactly K non-empty clusters, so randomly select K distinct codelets
   # and assign them to clusters 1 to K.
   ver = rep(T,nrow(data))
   for(i in (1:K))
   {
      n = which(ver)[round(runif(1,min=1,max=(nrow(data)-i+1)),digits=0)]
      data$Cluster[n] <- i
      ver[n] <- F
   }
   
   # Now label all other codelets with a random cluster between 1 and K
   data$Cluster [ver] <- round(runif(nrow(data)-K,min=1,max=K),digits=0)
   return (data)
}

Example: generate a random clustering for NR codelets.

In [30]:
%%R
S = new_session("NR", "atom")
cluster = random_clustering(S$data,10)
print(cluster[,c("CodeletName", "Cluster")])
                      CodeletName Cluster
1                 balanc_3_dp_sse       8
2                elmhes_10_dp_sse       7
3                elmhes_11_dp_sse      10
4                  four1_2_mp_sse       2
5                   hqr_12_sp_sse       2
6            hqr_12_square_sp_sse       2
7                   hqr_13_dp_sse       7
8                   hqr_15_sp_sse      10
9                 jacobi_5_sp_sse       2
10                  lop_13_dp_sse       4
11                ludcmp_4_sp_sse       1
12               matadd_16_dp_sse       9
13                mprove_8_mp_sse       8
14                mprove_9_dp_sse       3
15                realft_4_dp_sse       4
16               relax2_26_dp_sse       3
17               rstrct_29_dp_sse       3
18                svbksb_3_sp_sse       4
19               svdcmp_11_dp_sse       8
20               svdcmp_13_dp_sse       7
21               svdcmp_14_dp_sse       4
22                svdcmp_6_dp_sse       7
23                toeplz_1_dp_sse       9
24                toeplz_2_dp_sse       1
25                toeplz_3_dp_sse      10
26                toeplz_4_dp_sse       5
27 tridag_1_bet1_dt0_sse_initbet1       6
28                tridag_2_dp_sse       5
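
As a variation on random_clustering, the same construction can be sketched with R's sample.int, which draws every index with equal probability (rounding a runif draw makes the two endpoint values half as likely as the others). This is only an illustrative sketch, not the code used to produce figure 7; the name random_clustering_uniform, the toy data frame and the seed are assumptions made for the example.

```r
# Sketch: random clustering with exactly K non-empty clusters, using sample.int().
# random_clustering_uniform is a hypothetical name, not part of the framework.
random_clustering_uniform <- function(data, K) {
  n <- nrow(data)
  stopifnot(K >= 1, K <= n)

  # Label every codelet with a cluster drawn uniformly from 1..K
  labels <- sample.int(K, n, replace = TRUE)

  # Pick K distinct codelets and force them into clusters 1..K,
  # so that no cluster is left empty
  seeds <- sample.int(n, K)
  labels[seeds] <- seq_len(K)

  data$Cluster <- labels
  data
}

set.seed(42)  # arbitrary seed, only to make the example repeatable
toy <- data.frame(CodeletName = paste0("codelet_", 1:28))  # stand-in for S$data
res <- random_clustering_uniform(toy, 10)
print(table(res$Cluster))
```

Calling random_clustering_uniform(S$data, 10) instead of the toy data frame should give a clustering of the same shape as the one printed above.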

Genetic Algorithm exploration for selecting the features

In [31]:
%%R
require(genalg)
require(permute)

bench = "NR"
number_of_clusters = -1 # -1: the number of clusters is chosen with the elbow method during GA exploration

features = scan(file=paste(DATA_PATH, "features.list", sep="/"), what="character")

S = list("atom" = new_session(bench, "atom"),
         "sandybridge" = new_session(bench, "sandybridge"))

eval_arch <- function(clusters, arch, features) {
  prediction = cycle_prediction(S[[arch]], features, clusters) 
  stats = prediction_statistics(S[[arch]], prediction)
  return(stats$mean_per_err)
}

eval_func <- function(c) {
  # Chromosomes selecting three features or fewer are assigned an error of 100 percent
  if (sum(c) <= 3) {
    return(100)
  }

  select_features = features[as.logical(c)]
  
  clusters = cluster_codelets(S[["sandybridge"]], select_features, number_of_clusters)            

  sandybridge = eval_arch(clusters, "sandybridge", select_features)
  
  atom_clusters = S[["atom"]]$data
  atom_clusters$Cluster = clusters$Cluster
  atom = eval_arch(atom_clusters, "atom", select_features)

  return (max(atom, sandybridge)*max(atom_clusters$Cluster))
}

monitor <- function(obj) {
    minEval = min(obj$evaluations)
    filter = obj$evaluations == minEval
    bestObjectCount = sum(rep(1, obj$popSize)[filter])
    # ok, deal with the situation that more than one object is best
    if (bestObjectCount > 1) {
        bestSolution = obj$population[filter,][1,]
    } else {
        bestSolution = obj$population[filter,]
    }

    # print the best and mean evaluations of each generation, then the GA summary
    print(obj$best)
    print(obj$mean)
    cat(summary.rbga(obj))
    
    print(features[as.logical(bestSolution)])
}
Loading required package: genalg
Read 76 items
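
The value returned by eval_func, which the GA minimizes, is the worst mean percentage error over the two architectures multiplied by the number of clusters, so a feature set is rewarded both for accurate predictions and for needing few representatives. The sketch below isolates that cost; ga_cost is a hypothetical helper and the error values are made up for illustration.

```r
# Hypothetical helper mirroring the last line of eval_func:
# worst-case mean percentage error times the number of clusters.
ga_cost <- function(err_atom, err_sandybridge, n_clusters) {
  max(err_atom, err_sandybridge) * n_clusters
}

# Made-up error values, for illustration only
cost_many_clusters <- ga_cost(err_atom = 2, err_sandybridge = 3, n_clusters = 12)
cost_few_clusters  <- ga_cost(err_atom = 4, err_sandybridge = 5, n_clusters = 6)

print(c(many = cost_many_clusters, few = cost_few_clusters))
```

With these made-up numbers the feature set that yields fewer clusters gets the lower cost even though its error is higher, which is the trade-off the multiplication encodes.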

The GA exploration reported in the paper used a population of 1000 individuals over 100 generations. It took more than 12 hours to run and was run outside the IPython Notebook. Here we demonstrate it with a toy population. Feel free to set the parameters to those of the paper to reproduce the original experiment.

In [32]:
%%R

PopulationSize = 10 # The real value used in our experiments was 1000;
                    # the exploration took more than 12 hours.
Generations = 5 # The real value used was 100
GAmodel <- rbga.bin(size = length(features), popSize = PopulationSize, 
                    iters = Generations, mutationChance = 0.01, 
                    elitism = T, evalFunc = eval_func, monitorFunc = monitor)

plot(GAmodel)
save(GAmodel, file="gamodel.data")
[1] 1.294498       NA       NA       NA       NA
[1] 11.58573       NA       NA       NA       NA
GA Settings
  Type                  = binary chromosome
  Population size       = 10
  Number of Generations = 5
  Elitism               = TRUE
  Mutation Chance       = 0.01

Search Domain
  Var 1 = [,]
  Var 0 = [,]

GA Results
  Best Solution : 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
[1] "L2.request.rate"     "Load.to.Store.ratio" "X87.MFlops.s"       
[4] "Packed.MUOPS.s"      "Nb.instr."           "Nb.uops.P0"         
[7] "Nb.uops.P2"          "Nb.uops.P5"          "Vec..ratio......all"
[1] 1.294498 1.294498       NA       NA       NA
[1] 11.585730  1.730113        NA        NA        NA
GA Settings
  Type                  = binary chromosome
  Population size       = 10
  Number of Generations = 5
  Elitism               = TRUE
  Mutation Chance       = 0.01

Search Domain
  Var 1 = [,]
  Var 0 = [,]

GA Results
  Best Solution : 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
[1] "L2.request.rate"     "Load.to.Store.ratio" "X87.MFlops.s"       
[4] "Packed.MUOPS.s"      "Nb.instr."           "Nb.uops.P0"         
[7] "Nb.uops.P2"          "Nb.uops.P5"          "Vec..ratio......all"
[1] 1.294498 1.294498 1.294498       NA       NA
[1] 11.585730  1.730113  1.591848        NA        NA
GA Settings
  Type                  = binary chromosome
  Population size       = 10
  Number of Generations = 5
  Elitism               = TRUE
  Mutation Chance       = 0.01

Search Domain
  Var 1 = [,]
  Var 0 = [,]

GA Results
  Best Solution : 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
[1] "L2.request.rate"     "Load.to.Store.ratio" "X87.MFlops.s"       
[4] "Packed.MUOPS.s"      "Nb.instr."           "Nb.uops.P0"         
[7] "Nb.uops.P2"          "Nb.uops.P5"          "Vec..ratio......all"
[1] 1.294498 1.294498 1.294498 1.116076       NA
[1] 11.585730  1.730113  1.591848  1.490417        NA
GA Settings
  Type                  = binary chromosome
  Population size       = 10
  Number of Generations = 5
  Elitism               = TRUE
  Mutation Chance       = 0.01

Search Domain
  Var 1 = [,]
  Var 0 = [,]

GA Results
  Best Solution : 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 
[1] "L2.request.rate"                     "Load.to.Store.ratio"                
[3] "X87.MFlops.s"                        "Packed.MUOPS.s"                     
[5] "Nb.uops.P1"                          "Nb.uops.P5"                         
[7] "Vec..ratio......store..FP."          "X.L1..Bytes.loaded...cycle"         
[9] "X.L1..Nb.cycles.if.fully.vectorized"
[1] 1.294498 1.294498 1.294498 1.116076 1.116076
[1] 11.585730  1.730113  1.591848  1.490417  1.399273
GA Settings
  Type                  = binary chromosome
  Population size       = 10
  Number of Generations = 5
  Elitism               = TRUE
  Mutation Chance       = 0.01

Search Domain
  Var 1 = [,]
  Var 0 = [,]

GA Results
  Best Solution : 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 
[1] "L2.request.rate"                     "Load.to.Store.ratio"                
[3] "X87.MFlops.s"                        "Packed.MUOPS.s"                     
[5] "Nb.uops.P1"                          "Nb.uops.P5"                         
[7] "Vec..ratio......store..FP."          "X.L1..Bytes.loaded...cycle"         
[9] "X.L1..Nb.cycles.if.fully.vectorized"




The following cell is used to style this notebook and can be safely ignored.

In [33]:
# Use a custom CSS style for this Notebook
from IPython.display import HTML
def css_styling():
    # Read the stylesheet and close the file once done
    with open("custom.css", "r") as f:
        styles = f.read()
    return HTML(styles)
css_styling()
Out[33]: