<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>A blog by any name</title>
	<atom:link href="http://www.ssfak.org/bilog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ssfak.org/bilog</link>
	<description>Stelios&#039; random notes and ideas</description>
	<lastBuildDate>Sat, 30 Jan 2010 16:10:18 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Mapping R functions to SQL</title>
		<link>http://www.ssfak.org/bilog/2010/01/30/mapping-r-functions-to-sql/</link>
		<comments>http://www.ssfak.org/bilog/2010/01/30/mapping-r-functions-to-sql/#comments</comments>
		<pubDate>Sat, 30 Jan 2010 02:29:08 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://www.ssfak.org/bilog/?p=19</guid>
		<description><![CDATA[Introduction
Assume the following data frame definition:
d&#60;-data.frame(a=1:10, b=rnorm(10), c=rnorm(10))
with output similar to
    a          b          c
1   1 -1.4340611  1.3757397
2   2  0.1826867  1.4184245
3   3 -0.6749343 [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>Assume the following data frame definition:</p>
<pre>d&lt;-data.frame(a=1:10, b=rnorm(10), c=rnorm(10))</pre>
<p>with output similar to</p>
<pre>    a          b          c
1   1 -1.4340611  1.3757397
2   2  0.1826867  1.4184245
3   3 -0.6749343  0.2227527
4   4 -2.1024966 -1.4681461
5   5 -1.1555581 -0.4754581
6   6  0.4927246  1.5400675
7   7  1.7117311 -0.2342912
8   8  1.5880962 -0.5721555
9   9  0.5862021 -1.1019397
10 10  1.3399340  0.6076585</pre>
<hr />
<h2><a name="_subset"></a>subset</h2>
<p>The <a title="subset" href="http://stat.ethz.ch/R-manual/R-patched/library/base/html/subset.html"><em>subset</em></a> base R function filters the rows of a data frame with a logical expression (its second argument) and keeps only the rows for which that expression is true. The optional <em>select</em> argument is an expression indicating which columns to keep.</p>
<h3>Example</h3>
<p>For the rows where b is positive, return a data frame consisting of the a and c columns:</p>
<pre>subset(d, b&gt;0, c(a,c))</pre>
<p>The equivalent SQL statement is</p>
<pre>SELECT a, c FROM d WHERE b &gt; 0</pre>
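<p>A minimal sketch of the same selection, assuming a small hand-made data frame with one missing value (the values are illustrative and not taken from the example above). It shows that <em>subset</em> drops rows where the condition is NA, much as a SQL WHERE clause drops rows where the condition is NULL, while plain bracket indexing needs <em>which</em> to behave the same way:</p>
<pre># Small fixed data frame so the result is reproducible (illustrative values).
d &lt;- data.frame(a = 1:4, b = c(-1, 2, NA, 3), c = c(10, 20, 30, 40))

# subset(): the second argument filters rows, select picks columns.
# The row with b = NA is dropped because the condition is not TRUE.
s1 &lt;- subset(d, b &gt; 0, select = c(a, c))

# Bracket indexing with a logical vector would return an all-NA row for
# the NA condition, so which() is used to keep only the TRUE positions.
s2 &lt;- d[which(d$b &gt; 0), c("a", "c")]

print(s1)
print(s2)</pre>
<p>Both calls keep the second and fourth rows of this sample data frame.</p>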
<hr />
<h2>transform</h2>
<p>This <a href="http://stat.ethz.ch/R-manual/R-patched/library/base/html/transform.html">function</a> takes a data frame and additional arguments of the form <em>tag=value</em>. The tags are matched against the <em>names</em> of the data frame. For those that match, the values replace the corresponding variables in the data frame; the others are appended to it as new columns.</p>
<h3><a name="_example_2"></a>Example</h3>
<pre>transform(d, a=a*c, aa=1)</pre>
<p>which is equivalent to the SQL query</p>
<pre>SELECT a*c AS a, b, c, 1 AS aa
FROM d</pre>
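<p>A small sketch, again assuming a hand-made data frame with illustrative values, of one detail the mapping relies on: every <em>tag=value</em> expression in <em>transform</em> is evaluated against the original columns, just as every item in a SQL SELECT list sees the input row rather than the other output columns.</p>
<pre># Illustrative data; the column names mirror the example above.
d &lt;- data.frame(a = 1:3, b = c(0.5, 1.5, 2.5), c = c(2, 2, 2))

# a is replaced by a * c, and aa is appended.
# aa is computed from the ORIGINAL a (1, 2, 3), not from the new a.
t1 &lt;- transform(d, a = a * c, aa = a + 1)

print(t1)</pre>
<p>Here the new a column is 2, 4, 6 while aa is 2, 3, 4, so the corresponding SQL would be <em>SELECT a*c AS a, b, c, a+1 AS aa FROM d</em>.</p>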
<hr />
<h2><a name="_aggregate"></a>aggregate</h2>
<p>This <a href="http://stat.ethz.ch/R-manual/R-patched/library/stats/html/aggregate.html">function</a> is used to return scalar data summaries for one or more columns of a data frame or matrix. The first argument to aggregate is a data frame or matrix containing the variables to be summarized, the second argument is a list containing the variables to be used for grouping, and the third argument is the function to be used to summarize the data. The grouping list determines how the data are split into subsets and then each column in these subsets is given to the aggregation function (third argument). This function, being an aggregate function, should always return a scalar.</p>
<h3><a name="_example_3"></a>Example</h3>
<p>To compute the column sums separately for the rows where a is odd and the rows where a is even:</p>
<pre>aggregate(d, list(odd=d$a %% 2), sum)</pre>
<p>which gives</p>
<pre>   odd  a          b          c
1   0 30  1.5009449  1.5258489
2   1 25 -0.9666203 -0.2131966</pre>
<p>In SQL this is similar to</p>
<pre>SELECT odd, SUM(a), SUM(b), SUM(c)
FROM (SELECT *, a % 2 AS odd
      FROM d) AS t
GROUP BY odd</pre>
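<p>As a sketch of the same two-step structure in R, <em>aggregate</em> also accepts a formula. The example below assumes a small hand-made data frame (illustrative values) and sums only b and c, which corresponds to listing just those columns in the outer SELECT:</p>
<pre># Illustrative data with the same column names as above.
d &lt;- data.frame(a = 1:6, b = c(1, 2, 3, 4, 5, 6), c = c(10, 20, 30, 40, 50, 60))

# transform() adds the grouping column (the inner SELECT);
# aggregate() then sums b and c within each group (the GROUP BY).
g &lt;- aggregate(cbind(b, c) ~ odd, data = transform(d, odd = a %% 2), FUN = sum)

print(g)</pre>
<p>The result has one row per value of odd, with columns odd, b and c.</p>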
<hr />
<h2><a name="_see_also"></a>See also</h2>
<ul>
<li><a href="http://cbare.org/">Christopher Bare</a> has a series of <a href="http://digitheadslabnotebook.blogspot.com/2009/07/select-operations-on-r-data-frames.html">related</a> <a href="http://digitheadslabnotebook.blogspot.com/2009/12/joining-data-frames-in-r.html">blog</a> <a href="http://digitheadslabnotebook.blogspot.com/2009/12/sql-group-by-in-r.html">posts</a>.</li>
<li><a href="http://cran.r-project.org/web/packages/doBy/">doBy</a> offers a set of methods for grouping and summarizing data using a formula-based interface.</li>
<li><a href="http://code.google.com/p/sqldf/">sqldf</a> offers an SQL interface for data frame manipulation.</li>
<li>And (finally!) <a href="http://had.co.nz/plyr/">plyr</a>, which provides an array of split/apply/combine-like methods for lists, arrays, data frames, etc.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.ssfak.org/bilog/2010/01/30/mapping-r-functions-to-sql/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Estimating effort</title>
		<link>http://www.ssfak.org/bilog/2009/07/22/estimating-effort/</link>
		<comments>http://www.ssfak.org/bilog/2009/07/22/estimating-effort/#comments</comments>
		<pubDate>Wed, 22 Jul 2009 10:23:02 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[decision-making]]></category>

		<guid isPermaLink="false">http://www.ssfak.org/bilog/?p=16</guid>
		<description><![CDATA[When producing estimates, anchoring can be useful.
]]></description>
			<content:encoded><![CDATA[<p>When producing estimates, <a href="http://www.bennorthrop.com/Essays/2009/anchoring_estimates.php">anchoring</a> can be useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ssfak.org/bilog/2009/07/22/estimating-effort/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The thought of the day</title>
		<link>http://www.ssfak.org/bilog/2009/07/10/tod/</link>
		<comments>http://www.ssfak.org/bilog/2009/07/10/tod/#comments</comments>
		<pubDate>Fri, 10 Jul 2009 13:59:31 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.ssfak.org/bilog/?p=11</guid>
		<description><![CDATA[When a middleware technology starts to be the favourite one in EU funded projects, it is near its death and ultimate demise.
]]></description>
			<content:encoded><![CDATA[<p>When a <a href="http://en.wikipedia.org/wiki/Service_Oriented_Architecture">middleware technology</a> starts to be the favourite one in <a href="http://ec.europa.eu/research/fp7/">EU</a> funded projects, it is near its death and <a href="http://www.flickr.com/photos/psd/1428661128/">ultimate demise</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ssfak.org/bilog/2009/07/10/tod/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Zoubin&#8217;s &#8220;controversial&#8221; ideas</title>
		<link>http://www.ssfak.org/bilog/2009/07/10/zubins-controversial-ideas/</link>
		<comments>http://www.ssfak.org/bilog/2009/07/10/zubins-controversial-ideas/#comments</comments>
		<pubDate>Fri, 10 Jul 2009 10:00:39 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[bayesian]]></category>
		<category><![CDATA[machine-learning]]></category>

		<guid isPermaLink="false">http://www.ssfak.org/bilog/?p=3</guid>
		<description><![CDATA[I just found the following interesting opinions of Zoubin Ghahramani in the presentation he gave at the Machine Learning Summer School 2005:


I have no idea why anyone would want to use non-subjective priors. Objective priors are fraught with inconsistencies and no modeling is truly objective anyway. If you want robustness make sure your [...]]]></description>
			<content:encoded><![CDATA[<p>I just found the following interesting opinions of <a title="Title" href="http://learning.eng.cam.ac.uk/zoubin/">Zoubin Ghahramani</a> in the <a href="http://videolectures.net/mlss05us_ghahramani_bl/">presentation</a> he gave at the <a href="http://videolectures.net/mlss05us_chicago/">Machine Learning Summer School 2005</a>:</p>
<blockquote>
<ul>
<li>I have no idea why anyone would want to use non-subjective priors. Objective priors are fraught with inconsistencies and no modeling is truly objective anyway. If you want robustness make sure your prior captures a wide range of reasonable outcomes and use decision theory to capture your losses.</li>
<li><a href="http://en.wikipedia.org/wiki/Bayesian_inference">Bayesian methods</a> don&#8217;t overfit, because they don&#8217;t fit anything! Approximate Bayesian methods can have failure modes that look like overfitting.</li>
<li>Anything you can do easily with an SVM you can do with a <a href="http://www.gaussianprocess.org/">Gaussian Process</a> better.</li>
<li>Learning theory is useful to analyze bounds on the performance of algorithms but I&#8217;m not sure it should be used to design algorithms.</li>
<li>Algorithms should be designed to be sensible given the problem at hand, ignoring prior knowledge seems very silly.</li>
<li>Well designed <a href="http://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo">MCMC</a> methods can sometimes be much faster and perform better than optimization algorithms.</li>
<li>MAP methods, i.e. using a log prior as a regularizer, are not Bayesian.</li>
</ul>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.ssfak.org/bilog/2009/07/10/zubins-controversial-ideas/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
