Tuesday, September 22, 2015

Usage trends on R and Python (2014 to 2015)

According to a recent KDnuggets poll, both R and Python have extended their dominance as programming languages for analytics, data analysis, data mining and modeling. As a self-learning task, I made my very first chord diagram using the R package circlize. It was quite easy to use: it took me about 2 minutes to make a first draft and another 30 minutes to refine the layout, colors, etc. (given that I was remote-desktopping from an iPad to the 10-year-old Windows PC in my office!).
The chord diagram shows that both Python and R gained new users. The biggest movements come from previously R-only or Python-only users who decided to adopt the other language. This is natural, as R and Python offer very different user experiences and in some areas complement each other. New analytics researchers and data scientists with little or no prior experience almost exclusively chose R as a starting point. Users who decided to start using Python all had prior programming experience.

This confirms what I have been speculating (which by no means can be claimed as novel or original). The most attractive aspect of R is its relative ease of use. R has its own programming challenges and can sometimes be hard to debug. But new users do not need long before they can start hacking data. This is precisely the bottleneck for Python: not everyone is willing to make the leap into script programming. The problem areas for R are computing speed, memory management and its interface with other programming tools, all of which are improving. As statisticians, we can leverage our years of experience with R and pick up new computational tricks and tools from other languages that have been interfaced with R, without needing to leave R.

Here is the R code:
> library(circlize)
> mat.v
             R2015 Python2015 other2015 none2015
R2014      0.40480     0.0506   0.00460    0.000
Python2014 0.01771     0.2093   0.00529    0.000
other2014  0.04370     0.0253   0.16100    0.000
none2014   0.04400     0.0000   0.00000    0.036
> circos.clear()
> circos.par(start.degree=-105)
> circos.par(gap.degree=c(rep(2, nrow(mat.v)-1), 30, 
                          rep(2, ncol(mat.v)-1), 30))
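> # grid.col is a vector of sector colors defined elsewhere in the session (not shown in this listing)
> chordDiagram(mat.v, order=c("R2014", "none2014", 
                             "other2014", "Python2014",
                             "Python2015", "other2015", 
                             "none2015", "R2015"), 
              grid.col=grid.col, directional=TRUE)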
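
The listing above prints mat.v and passes grid.col to chordDiagram() without showing how either object was created. As a minimal sketch, assuming the printed values are the complete matrix, the two objects could be set up like this before the plotting calls. The colors in grid.col are hypothetical placeholders, not the ones used for the original figure, and the shares data frame is an extra helper added here to read the 2014 and 2015 totals per tool off the matrix.

```r
# Rebuild the transition matrix from the values printed above.
# Rows: tool used in 2014; columns: tool used in 2015.
mat.v <- matrix(
  c(0.40480, 0.0506, 0.00460, 0.000,
    0.01771, 0.2093, 0.00529, 0.000,
    0.04370, 0.0253, 0.16100, 0.000,
    0.04400, 0.0000, 0.00000, 0.036),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("R2014", "Python2014", "other2014", "none2014"),
    c("R2015", "Python2015", "other2015", "none2015")
  )
)

# Hypothetical colors (assumption): one per sector, same color for
# the 2014 and 2015 sectors of the same tool.
grid.col <- c(
  R2014 = "steelblue",   R2015 = "steelblue",
  Python2014 = "orange", Python2015 = "orange",
  other2014 = "grey60",  other2015 = "grey60",
  none2014 = "grey85",   none2015 = "grey85"
)

# Row sums give each tool's total in 2014, column sums its total in 2015.
shares <- data.frame(
  tool  = c("R", "Python", "other", "none"),
  y2014 = unname(rowSums(mat.v)),
  y2015 = unname(colSums(mat.v))
)
shares$change <- shares$y2015 - shares$y2014
print(shares)
```

The change column is positive for both R and Python, which matches the reading that both languages gained users. The none2014 row has a nonzero entry under R2015 and a zero under Python2015, which is where the observation about newcomers starting with R comes from.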




