Posts Tagged ‘PHP’

PHP Stream Filters: Unchunking HTTP Streams

Slinging PHP code to and fro one day, I found myself needing to process a potentially large result from a URL: a result too large to fit within PHP’s memory limit.  However, I could process this result a line at a time, so I could avoid buffering the entire thing in memory.  I didn’t want to use cURL, since the usual curl_exec() call with CURLOPT_RETURNTRANSFER hands back the whole response as one string, but I could use PHP’s handy file-like stream interface: fetch the URL with fopen('http://my-url.n.e.t/', 'r'); and then use fgets() to keep only a line in memory at a time.
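Here is a minimal sketch of that line-at-a-time loop. The helper name process_lines() is made up for illustration, and php://memory stands in for the remote URL so the example runs without a network. With a real URL, the handle would come from fopen('http://...', 'r') instead.

```php
<?php
// Hypothetical helper (not from the original code): calls $handle once per
// line, so only one line is held in memory at a time.
function process_lines($fp, callable $handle): int
{
    $count = 0;
    while (($line = fgets($fp)) !== false) {
        $handle(rtrim($line, "\r\n"));
        $count++;
    }
    return $count;
}

// php://memory stands in for fopen('http://...', 'r') so the sketch runs offline.
$fp = fopen('php://memory', 'w+');
fwrite($fp, "alpha\nbeta\ngamma\n");
rewind($fp);

$seen = [];
$total = process_lines($fp, function (string $line) use (&$seen): void {
    $seen[] = $line;
});
fclose($fp);
```

The callback only collects lines here to keep the sketch short. A real consumer would process each line and discard it.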

It was a great plan, but I noticed that I occasionally got garbage lines or bogus input. HTTP command-line tools like wget and curl revealed nothing out of the ordinary (they decode chunking transparently), until I realized that those garbage lines were the uninterpreted length markers for Transfer-Encoding: chunked. PHP’s HTTP stream wrapper does not decode chunked transfers.
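To make the problem concrete: a chunked body is a series of chunks, each a hexadecimal length line, CRLF, that many bytes of data, and another CRLF, ending with a zero-length chunk. The sketch below decodes such a body when it is already in a string. The function name unchunk_string() is hypothetical and only illustrates the format; it is not a library function and not the stream filter developed later in this post.

```php
<?php
// Hypothetical illustration of the chunked format: decode a complete,
// already-buffered chunked body. Chunk extensions and trailers are ignored.
function unchunk_string(string $raw): string
{
    $out = '';
    $offset = 0;
    $length = strlen($raw);
    while ($offset < $length && ($eol = strpos($raw, "\r\n", $offset)) !== false) {
        // The size line is hex digits, optionally followed by ";extension".
        $sizeLine = substr($raw, $offset, $eol - $offset);
        $sizeField = trim(explode(';', $sizeLine, 2)[0]);
        if (!ctype_xdigit($sizeField)) {
            throw new UnexpectedValueException('Bad chunk size line');
        }
        $size = hexdec($sizeField);
        if ($size === 0) {
            break; // last chunk reached
        }
        $out .= substr($raw, $eol + 2, $size);
        $offset = $eol + 2 + $size + 2; // skip the data and its trailing CRLF
    }
    return $out;
}

$body = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
$decoded = unchunk_string($body);
```

Read line by line without decoding, the same body would yield the stray "4", "5", and "0" lines that showed up as garbage.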

There is a PECL function, http_chunked_decode(), but it operates on strings, not streams, so I would still have to buffer the entire input first.

PHP’s streams allow you to attach a chain of stream filters to a stream to process input and output (it’s the same mechanism the built-in zlib.inflate and zlib.deflate filters use). My plan was to create a stream filter to transparently unchunk the stream. Unfortunately, the documentation on writing your own stream filter is pretty sparse, and the examples I could find on the web were all very trivial.

After a few false starts, however, I was able to create an http stream unchunker:

/**
* A stream filter for removing the 'chunking' of a 'Transfer-Encoding: chunked'
* http response
*
* The http stream wrapper on php does not support chunked transfer
* encoding, making this filter necessary.
*
* Add to a file resource with <code>stream_filter_append($fp, 'http_unchunk_filter',
* STREAM_FILTER_READ);</code>
*
* If the wrapper metadata for $fp does not contain a <code>transfer-encoding:
* chunked</code> header, this filter passes data through unchanged.
*
* @license BSD
* @author Francis Avila
*/
// Stream filters must subclass php_user_filter
class http_unchunk_filter extends php_user_filter {
	protected $chunkremaining = 0; //bytes remaining in the current chunk
	protected $ischunked = null; //whether the stream is chunk-encoded. null=not sure yet

	// this is the meat of the filter.
	// The class must have a function with this name and prototype
	// It must return a status--one of the PSFS_* constants;
	function filter($in, $out, &$consumed, $closing) {
		if ($this->ischunked===null) {
			$this->ischunked = self::ischunked($this->stream);
		}
		// $in and $out are opaque "bucket brigade" objects which consist of a
		// sequence of opaque "buckets", which contain the actual stream data.
		// The only way to use these objects is the stream_bucket_* functions.
		// Unfortunately, there doesn't seem to be any way to access a bucket
		// without turning it into a string using stream_bucket_make_writeable(),
		// even if you want to pass the bucket along unmodified.

		// Each call to this pops a bucket from the bucket brigade and
		// converts it into an object with two properties: datalen and data.
		// This same object interface is accepted by stream_bucket_append().
		while ($bucket = stream_bucket_make_writeable($in)) {
			if (!$this->ischunked) {
				$consumed += $bucket->datalen;
				stream_bucket_append($out, $bucket);
				continue;
			}
			$outbuffer = '';
			$offset = 0;
			// Loop through the string.  For efficiency, we don't advance a character
			// at a time but try to zoom ahead to where we think the next chunk
			// boundary should be.

			// Since the stream filter divides the data into buckets arbitrarily,
			// we have to maintain state ($this->chunkremaining) across filter() calls.
			while ($offset < $bucket->datalen) {
				if ($this->chunkremaining===0) { // start of new chunk, or the start of the transfer
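					$firstline = strpos($bucket->data, "\r\n", $offset);
					if ($firstline === false) {
						// The chunk-size line is split across buckets. This filter
						// does not buffer partial lines, so fail rather than misparse.
						return PSFS_ERR_FATAL;
					}
					$chunkline = substr($bucket->data, $offset, $firstline-$offset);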
					$chunklen = current(explode(';', $chunkline, 2)); // ignore MIME-like extensions
					$chunklen = trim($chunklen);
					if (!ctype_xdigit($chunklen)) {
					// There should have been a chunk length specifier here, but since
					// there are non-hex digits something must have gone wrong.
						return PSFS_ERR_FATAL;
					}
					$this->chunkremaining = hexdec($chunklen);
					// $firstline already includes $offset in it
					$offset = $firstline+2; // +2 is CRLF
					if ($this->chunkremaining===0) { //end of the transfer
						break;  // ignore possible trailing headers
					}
				}
				// get as much data as available in a single go...
				$nibble = substr($bucket->data, $offset, $this->chunkremaining);
				$nibblesize = strlen($nibble);
				$offset += $nibblesize; // ...but recognize we may not have got all of it
				if ($nibblesize === $this->chunkremaining) {
					$offset += 2; // skip over trailing CRLF
				}
				$this->chunkremaining -= $nibblesize;
				$outbuffer .= $nibble;
			}
			$consumed += $bucket->datalen;
			$bucket->data = $outbuffer;
			stream_bucket_append($out, $bucket);
		}
		return PSFS_PASS_ON;
	}

	protected static function ischunked($stream) {
		$metadata = stream_get_meta_data($stream);
		$headers = $metadata['wrapper_data'];
		return (bool) preg_grep('/^Transfer-Encoding:\s+chunked\s*$/i', $headers);
	}

	function onCreate() {
		if (isset($this->stream)) { // This is usually not defined until the first filter() call.
			$this->ischunked = self::ischunked($this->stream);
		}
	}
}

stream_filter_register('http_unchunk_filter', 'http_unchunk_filter');

What you are left with is a stream filter you can then use like so:

$fp = fopen('http://my.url', 'r');
stream_filter_append($fp, 'http_unchunk_filter', STREAM_FILTER_READ);

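If the HTTP stream has a chunked transfer encoding, the filter will automatically unchunk it. However, it ignores chunk extensions (anything after the hex-encoded chunk length on the size line) and trailing headers, both of which are in the HTTP specification but hardly ever used. It also keeps no partial input between buckets, so a chunk-size line that happens to be split across two buckets is not handled.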


Broadband for the People

Technology author and activist Drew Clark turned to Dancing Mammoth when he wanted to make his idea for Broadbandcensus.com into a reality. He envisioned a site capable of providing the most accurate and up-to-date information on broadband technologies to consumers in the United States.

Dancing Mammoth built blogs, wikis, speed tests, comments, real-time graphs, and carrier data into Broadbandcensus.com and designed the clearinghouse Clark imagined.

The first step in creating the site was gathering data for the “What are your broadband internet options?” function. Dancing Mammoth collected data from the FCC and maps from the U.S. Postal Service. Data was also gathered from individual carriers’ websites, where it is usually buried deep in the site or, worse yet, requires some programming knowledge to scrape. We did the scraping, and we spent the hours manipulating data to create a tool that lets users search their market by zip code.

The website also continues to learn about broadband markets through a census that surveys its users about location, carrier, promised speeds, and each individual’s rating of their service. By combining the survey data with the search function described above, a user can automatically correlate carriers to specific zip codes, along with promised speeds and any comments about that location and carrier.

The second part of the census involves a speed test. Broadbandcensus.com has worked closely with Internet2 and Virginia Tech to implement a modified Java-based NDT (Network Diagnostic Tool) client.

Based on the location the user provides in the census, the site finds the closest online NDT server that is accepting connections. The speed test takes approximately 30 seconds and collects roughly 50 data points, which measure everything from total speed to where bottlenecks in the network are occurring. Once this data is collected, the site can display real-time percentages of user ratings and of users getting their promised speeds. This is crucial when trying to find the right (or only) carrier in your market, and it makes the site a great research tool for consumers.
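As a rough sketch of how a "closest online server" lookup could work, the code below picks the nearest server by great-circle distance using the haversine formula. This is an assumption for illustration, not the site's actual method: the function names, the server record layout ('host', 'lat', 'lon', 'online'), and the use of latitude and longitude are all hypothetical.

```php
<?php
// Great-circle distance in kilometres between two points (haversine formula).
function haversine_km(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $radius = 6371.0; // mean Earth radius in km
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    return 2 * $radius * asin(min(1.0, sqrt($a)));
}

// Hypothetical record layout: ['host' => string, 'lat' => float,
// 'lon' => float, 'online' => bool]. Returns null if no server is online.
function closest_online_server(array $servers, float $lat, float $lon): ?array
{
    $best = null;
    $bestDistance = INF;
    foreach ($servers as $server) {
        if (!$server['online']) {
            continue; // skip servers that are not accepting connections
        }
        $distance = haversine_km($lat, $lon, $server['lat'], $server['lon']);
        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $server;
        }
    }
    return $best;
}
```

A linear scan like this is enough for a short server list. The user's coordinates would have to be derived from the location given in the census, for example from a zip code lookup.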

Broadbandcensus.com is now a publicly available resource that provides real data to consumers about broadband in the U.S. and facilitates consumer research and competition in the broadband carrier sector.

Technologies:

  • Custom ORM Framework written in PHP/MySQL
  • Java
  • JavaScript
  • WordPress
  • custom wiki software

Data:

  • 37,000 zip codes
  • 30,000 Federal data points
  • 95,000 relationships
  • 110,000 objects
  • 1,500 providers (and growing)

Please Take a Number

In the Internet world of seemingly endless computer resources, it’s not often that a website requires visitors to wait in line to visit, but a recent Dancing Mammoth project called for just that.

The Requirements:

  • Visitors will be added to a Virtual Waiting Room prior to advancing to website content.
  • Visitors will be advanced according to First In First Out.
  • Page must indicate current position in line via a client-provided Flash object.
  • An administrator must be able to control flow of traffic.

The client expected light traffic to their video chat feature, so we decided that capturing queue data in a single table was the most efficient way to go. We could then poll at a set interval to determine a visitor’s place in line, and take action based on the result.

The client-provided Flash object required a bit of JavaScript, so I decided to go ahead and implement the polling with the jQuery library’s ajax functionality (I ♥ jQuery). Here’s what the JavaScript function looks like:

function updatePosition()
{
	$.ajax({
		type: "GET",
		url: "queue/index.php",
		data: "sess=<?php echo $session_id ?>&random=" + new Date().getTime(),
		dataType: "xml",
		success: function(xml){
			var ky = "";
			var val = "";
			var action = "";
			var pos = 0;
			$("response", xml).each(function(){
				$("action", this).each(function(){
					action = $("key", this).text();
					val = $("value", this).text();
					if(action == "forward")
					{
						// Forward
						forwardToUrl(val);
					}else{
						//Update
						pos = val;
						thisMovie('queueCountdown').update(pos);
					}
				});
			});
		}
	});
}

If you’d like to review all the sample files, you can download them here, but no warranty is expressed or implied.

jQuery sends the visitor’s session id (as well as a random string: always send one when making ajax GET calls, or IE will give you cached content) to a script that checks the queue and returns some XML with the action and its associated value. Then the visitor is either forwarded to the content, or their position in line is displayed.
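The shape of that XML can be inferred from the selectors in the JavaScript: a response element containing an action, which in turn holds a key and a value. Below is a minimal sketch of how the server side might build it. The function name queue_response_xml() is hypothetical. The "forward" key comes from the JavaScript above, while "update" is only a guess at what the position-update key might be called.

```php
<?php
// Hypothetical builder for the polling response. The element names match the
// jQuery selectors ("response", "action", "key", "value").
function queue_response_xml(string $key, string $value): string
{
    // Escape &, <, > and quotes so URLs with query strings stay well-formed.
    $escape = function (string $text): string {
        return htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    };
    return '<?xml version="1.0" encoding="UTF-8"?>'
        . '<response><action>'
        . '<key>' . $escape($key) . '</key>'
        . '<value>' . $escape($value) . '</value>'
        . '</action></response>';
}

$forward = queue_response_xml('forward', 'http://example.com/chat?room=1&seat=2');
$update  = queue_response_xml('update', '7'); // "update" is an assumed key name
```

A script serving this would send header('Content-Type: text/xml') before echoing the string, so that jQuery's dataType: "xml" parsing succeeds.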

On the back end, the VirtualWaitingRoom class provides functionality to retrieve queue position, administratively advance visitors through the queue, and remove records for abandoned sessions.
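The VirtualWaitingRoom class itself is not shown here, so the following is only a sketch of the three operations just described. An in-memory array stands in for the single queue table. The class name WaitingRoomSketch, its method names, and the idle-timeout approach to detecting abandoned sessions are all assumptions.

```php
<?php
// Hypothetical stand-in for the queue table: rows are kept in arrival order.
class WaitingRoomSketch
{
    /** @var array<int, array{id: string, seen: int}> */
    private array $rows = [];

    // Add a visitor to the back of the line (no-op if already queued).
    public function join(string $id, int $now): void
    {
        if ($this->position($id) === null) {
            $this->rows[] = ['id' => $id, 'seen' => $now];
        }
    }

    // 1-based place in line, or null if the session is not queued.
    public function position(string $id): ?int
    {
        foreach ($this->rows as $index => $row) {
            if ($row['id'] === $id) {
                return $index + 1;
            }
        }
        return null;
    }

    // Record a poll so the session is not treated as abandoned.
    public function touch(string $id, int $now): void
    {
        foreach ($this->rows as &$row) {
            if ($row['id'] === $id) {
                $row['seen'] = $now;
            }
        }
        unset($row);
    }

    // Administrator action: release the first $count visitors, first in first out.
    public function advance(int $count = 1): array
    {
        $released = array_splice($this->rows, 0, $count);
        return array_column($released, 'id');
    }

    // Drop sessions that have not polled within $maxIdle seconds.
    public function purgeAbandoned(int $now, int $maxIdle): int
    {
        $before = count($this->rows);
        $this->rows = array_values(array_filter(
            $this->rows,
            function (array $row) use ($now, $maxIdle): bool {
                return $now - $row['seen'] <= $maxIdle;
            }
        ));
        return $before - count($this->rows);
    }
}
```

In the real project these operations would be queries against the queue table. The array version only shows the first-in-first-out behaviour that the polling script relies on.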

The project was a success, and while there’s nothing especially complex about it, this Virtual Waiting Room is a good short example of how various web technologies can come together to provide a unique solution.
