If you regularly deal with statistical problems and you haven’t even heard of R, then you’ve definitely committed a data sin. The R language has become a de facto standard among data analysts and data miners for developing statistical software.
R has many advantages: it’s free as in freedom, and a lot of prepackaged functionality is already available (read the popular New York Times article). R has its own conference, useR!, which in 2009 will be held in Rennes, France.
I’ve been a regulaR user since 2006, when I first discovered the power of R thRough “The R Book” by Michael Crawley. I wrote some code for a recent paper and right now I’m working on my first R package. Unfortunately, R is not at all popular in Greece, where (almost) everyone uses SPSS or other commercial software. R isn’t mainstream here, but I think it’s here to stay.
So, don’t be lazy, try R and see for yourself if it fits your needs.
Suppose this is a blog, traditionally speaking: a diary, a place to keep notes and thoughts on past and upcoming events, my character and the life lessons I’ve learned. Finally, add vanity (ματαιοδοξία, in Greek) as an important ingredient; otherwise, I’d have written everything down in a diary and kept it locked in a drawer.
After this prologue I can safely present some notes on my upcoming research in the field of (mostly Multivariate) Data Analysis:
I plan to…
- …further study the theoretical aspects of Correspondence Analysis and Related Methods (CARME for short), and in particular examine the parametrization that embeds different methods into a common super-family. According to M. Greenacre, it can be useful as a tuning parameter in supervised learning when there is an outcome variable.
- …analyze, explore and interpret data from social sciences and humanities, as well as from machine learning approaches, such as Collaborative Filtering.
- …develop data analysis software that is reasonably comprehensive, fairly easy to use and free as in freedom. After the CHIC Analysis software, I’m experimenting with Tcl/Tk and R in order to build a package which implements a graphical user interface for the ca package of Nenadic & Greenacre.
Not boring enough?
It was love at first sight. The Singular Value Decomposition (SVD) came into my (research) life back in 2002 as a crucial step of a whole family of Multivariate Data Analysis methods, such as Correspondence Analysis, Principal Component Analysis, Multidimensional Scaling and others. Most of us take advantage of its nice geometric properties as a “black box”, since it’s an important factorization technique with a wide spectrum of applications in statistics and machine learning.
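For anyone curious about what sits inside the black box, here is a minimal sketch in plain Python that approximates only the largest singular value of a small matrix and its pair of singular vectors. It uses power iteration on AᵀA, which I picked purely for illustration; it is not how real SVD routines work, and in practice a single call to R’s svd() returns the full decomposition. The function name `leading_singular_triple` and the toy matrix are my own assumptions, and the sketch assumes a nonzero matrix whose largest singular value is strictly bigger than the second one.

```python
import math


def leading_singular_triple(a, iterations=200):
    """Approximate the largest singular value of `a` and its singular vectors.

    `a` is a list of equal-length rows. Hypothetical helper for illustration:
    power iteration on A^T A, assuming the start vector is not orthogonal to
    the leading right singular vector.
    """
    n_rows, n_cols = len(a), len(a[0])

    def times_a(vec):
        # Matrix-vector product A v.
        return [sum(a[i][j] * vec[j] for j in range(n_cols)) for i in range(n_rows)]

    def times_a_transposed(vec):
        # Matrix-vector product A^T w.
        return [sum(a[i][j] * vec[i] for i in range(n_rows)) for j in range(n_cols)]

    # Arbitrary start vector (1, 2, 3, ...), scaled to unit length.
    v = [float(j + 1) for j in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    for _ in range(iterations):
        w = times_a_transposed(times_a(v))  # one step: v <- A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    av = times_a(v)
    sigma = math.sqrt(sum(x * x for x in av))  # sigma = ||A v||
    u = [x / sigma for x in av]                # u = A v / sigma
    return sigma, u, v


sigma, u, v = leading_singular_triple([[3.0, 0.0], [4.0, 5.0]])
print(sigma)  # close to sqrt(45), about 6.708
print(u, v)
```

The outer product of `u` and `v`, scaled by `sigma`, is the best rank-one approximation of the matrix in the least-squares sense. That is the geometric property that methods like Correspondence Analysis and Principal Component Analysis lean on when they draw low-dimensional maps.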
I was wondering how many research articles mention the term SVD between 1970 and 2008, so I did a series of searches in Google Scholar. SVD has been gaining more and more attention (see Figure below), especially after 1990, with a boom during the 00s. However, I can’t explain the fall in 2008 (I bet it’s not statistically significant :p). How long can the SVD stand the competition of more efficient methods? We’ll see.

[Figure: SVD Search Results in Google Scholar]
[read more at Wikipedia]