RSS Feeds for FTP Servers

March 22, 2006

Mark Woodman


The applications for RSS have extended far beyond a way to distribute news items. RSS is now used for everything from tracking packages to car dealer inventories. These reflect one of the great aspects of RSS: you can use it to tell you when something happens that you care about, rather than having to go check for yourself. In that spirit, this article will show you how to write a PHP script that will monitor an FTP server for you, notifying you of the newest files added or changed.

PHP, FTP, and Thee

With all the emphasis on web-related functionality, the FTP commands in PHP are often overlooked. The good news is that these functions are included with standard PHP 4, so no external libraries are required.

It is important to make sure your PHP install has the FTP functions enabled, however. To do this, use phpinfo() in a simple file to see what has been enabled:

<?php
   phpinfo();
?>

When you view the above script in your web browser, you'll see the classic PHP Info page with nearly all the configuration information you'll ever need. Scroll down to the FTP section to see if "FTP support" has been "enabled." It should look something like this:

Figure 1. PHP Info showing FTP is enabled

(If the FTP functions are not enabled, you'll need to make arrangements to either get it enabled or host this tutorial's script on a different server.)
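If you would rather check from code than read through the PHP Info page, a minimal sketch along these lines should work. The helper name ftpSupportEnabled() is made up for this illustration; extension_loaded() and function_exists() are standard PHP functions.

```php
<?php
// Hypothetical helper (not part of ftp_monitor.php): reports whether
// PHP's FTP extension is loaded and its functions are callable.
function ftpSupportEnabled()
{
   return extension_loaded('ftp') && function_exists('ftp_connect');
}

echo ftpSupportEnabled()
   ? "FTP support: enabled\n"
   : "FTP support: not available\n";
```

This only confirms that the functions exist; it says nothing about whether your host allows outbound FTP connections.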

It is a lot to absorb, but the PHP Manual on FTP functions is a useful reference. You may want to keep it close at hand while going through this tutorial. (And if you have insomnia, you could always try reading RFC 959, the specification for FTP itself.)

Know the Code

For this tutorial, we will create a PHP script called ftp_monitor.php. We will go through the script piece by piece, but you might also want to download the complete source code for reference.

The exploreFtpServer() Function

Let's start with the heart of the FTP functionality, encapsulated in the exploreFtpServer() function. This will take parameters for the FTP hostname, username, password, and initial path. The purpose of this function is to explore the FTP server, recurse through any directories, and then return an associative array of filenames and corresponding file timestamps.

After declaring the function signature, use PHP's ftp_connect() to attempt a connection to the FTP server. If the connection is created, we will keep the connection ID in the variable $cid to be used with all of PHP's FTP-related functions.

function exploreFtpServer($host, $user, $pass, $path)
{
   // Connect
   $cid = ftp_connect($host) or die("Couldn't connect to server");

For the sake of simplicity, the above code will summarily halt the script execution if an FTP connection cannot be established. After you have used this script for a while, you may want to add in some more robust error handling.

Once we have a connection, we will attempt to authenticate with the server by using the PHP function ftp_login(). Like most of PHP's FTP functions, the first argument is the connection ID ($cid). This particular one also takes a username and password.

If the login is successful, we'll use ftp_pasv() to switch to passive mode. In passive mode the script, rather than the server, opens the data connections as well as the control connection, which lets the script run from behind a firewall that blocks inbound connections.

Now that the connection is all set up, we can recurse through the FTP server directories, starting at the specified path. We'll use a scanDirectory() function to accomplish this, to be written after this one.

Finally, whether authentication worked or not, we'll need to close the connection to the server using ftp_close(). Again, this example will halt the script with die() if authentication fails, but you may choose to handle the failure in a different way.

   // Login
   if (ftp_login($cid, $user, $pass))
   {
      // Passive mode
      ftp_pasv($cid, true);

      // Recurse directory structure
      $fileList = scanDirectory($cid, $path);

      // Disconnect
      ftp_close($cid);
   }
   else
   {
      // Disconnect
      ftp_close($cid);

      die("Couldn't authenticate.");
   }

At this point we have a populated $fileList variable, which is an associative array. The keys are the file names, and the values are the timestamps of the files. This array will be most useful if sorted by timestamp--newest first--so sort it with arsort() and return it.

   // Sort by timestamp, newest (largest number) first
   arsort($fileList);

   // Return the result
   return $fileList;
}
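As one possible shape for the "more robust error handling" mentioned above, the sketch below throws exceptions instead of calling die(), so the calling script can decide how to respond. The function name openFtpSession() and the 10-second timeout are assumptions made for this example and are not part of ftp_monitor.php.

```php
<?php
// Hypothetical variant of the connect/login steps: throws instead of die().
// The name openFtpSession() and the default timeout are assumptions.
function openFtpSession($host, $user, $pass, $timeout = 10)
{
   // ftp_connect() returns false on failure; @ suppresses the PHP warning
   $cid = @ftp_connect($host, 21, $timeout);
   if ($cid === false) {
      throw new RuntimeException("Couldn't connect to $host");
   }

   // Close the connection before reporting a failed login
   if (!@ftp_login($cid, $user, $pass)) {
      ftp_close($cid);
      throw new RuntimeException("Couldn't authenticate as $user");
   }

   // Passive mode, as in the tutorial
   ftp_pasv($cid, true);
   return $cid;
}
```

A caller could wrap openFtpSession() in try/catch and, for example, return an HTTP error status rather than a half-finished feed.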

The scanDirectory() Function

Now we are ready to write the scanDirectory() function, which is called from our exploreFtpServer() described above. The purpose of this function is to scan an FTP directory for files and subdirectories, adding the former to a list and recursing through the latter. The parameters to pass in are the FTP connection ID ($cid) and a starting directory ($dir). We'll also declare a static variable $fileList which will be used to retain our file list across recursive calls to the function.

To get the contents of a given directory in FTP, we'll use the ftp_nlist() function. Unfortunately, this function isn't perfect. On most FTP servers I have tested, it will return a list of file and directory names. But there are a few, like WU-FTPD, which only return a listing of filenames. On such servers our script can monitor only the initial directory provided; no subdirectories will be monitored.

The alternative to ftp_nlist() is ftp_rawlist(), which should provide all directory contents regardless of type. Unfortunately, the format of data returned by ftp_rawlist() does not seem to be standardized, so any attempt to come up with a "universal parser" is a daunting task. (Read the user comments on ftp_rawlist() to see what I mean.) Thus, for the sake of the tutorial, we'll stick with the imperfect but much simpler ftp_nlist().
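To give a feel for why a universal parser is hard, here is a rough sketch that handles just one common ftp_rawlist() format: Unix "ls -l" style lines. The function name parseUnixListLine() is hypothetical, and the field layout is an assumption about the server; DOS-style listings, symlink entries ("name -> target"), and other variants would need their own handling.

```php
<?php
// Hypothetical parser for ONE ftp_rawlist() format: Unix "ls -l" style.
// Other server formats are not handled and yield null or wrong fields.
function parseUnixListLine($line)
{
   // Fields: perms, link count, owner, group, size, 3 date fields, name.
   // The limit of 9 keeps spaces inside the file name intact.
   $parts = preg_split('/\s+/', trim($line), 9);
   if (count($parts) < 9) {
      return null; // e.g. a "total 12" line or an unknown format
   }
   return array(
      'name'  => $parts[8],
      'isDir' => $parts[0][0] === 'd',
      'size'  => (int) $parts[4],
   );
}

print_r(parseUnixListLine('drwxr-xr-x  2 ftp ftp 4096 Mar 22  2006 snapshots'));
```

The tutorial itself continues with ftp_nlist():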

function scanDirectory($cid, $dir)
{
   // Use static value to collect results
   static $fileList=array();

   // Get a listing of directory contents
   $contents = ftp_nlist($cid, $dir);

The $contents variable is now populated with a simple array of file and directory names. Depending on the server, these items may or may not contain the path of the file itself. (We may get "foo.txt", or we may get "/i/pity/the/foo.txt".)

This next section of scanDirectory() will iterate through each name and use ftp_size() to determine whether the name is a file or directory. (This is a cheap trick: ftp_size() returns -1 on error, which is what most servers produce when asked for the size of a directory.) If the item is a file, we'll prepend a leading slash if needed to keep our paths consistent, then use the ftp_mdtm() function to get its modification timestamp. We'll then add the filename as the key in our $fileList associative array, and use its timestamp as the value:

     

   // Iterate through the directory contents
   if($contents!=null)
   {
      foreach ($contents as $item)
      {
         // Is the item a file?
         if (ftp_size($cid, $item)>=0)
         {
            // Prepend slash if not present
            if($item[0]!="/") $item = "/" . $item;

            // Add file and modify timestamp to results
            $fileList[$item] = ftp_mdtm($cid, $item);
         }

Now we'll need to deal with an item returned by ftp_nlist() that is a directory. We'll be sure to ignore aliases to the same or parent directory. If we have a usable directory name, we can call scanDirectory() to recurse into it. (This requires some extra logic to handle variations among servers that use full or relative paths.) With both files and directories handled, we can return the $fileList containing every file found thus far. Here's how it all looks:

         else
         // Item is a directory
         {
            // Exclude self/parent aliases
            if($item!="." && $item!=".." && $item!="/")
            {
               // Server uses full path names
               if($item==strstr($item, $dir))
               {
                  scanDirectory($cid, $item);
               }
               else
               {
                  // Server uses relative path names
                  if($dir=="/")
                  {
                     scanDirectory($cid, $dir . $item);
                  }
                  else
                  {
                     scanDirectory($cid, $dir . "/" . $item);
                  }
               }
            }
         }
      }
   }

   // Return the results
   return $fileList;
}
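The full-path versus relative-path handling above can also be expressed as a small standalone helper, which may be easier to reason about and test in isolation. This is only a sketch: resolveItemPath() is a hypothetical name, and it assumes a server returns either an absolute path (leading slash) or a bare name relative to the listed directory.

```php
<?php
// Hypothetical helper: turns an ftp_nlist() entry into an absolute path.
// Assumes entries are either absolute or bare names relative to $dir.
function resolveItemPath($dir, $item)
{
   // Already absolute: use as-is
   if ($item !== '' && $item[0] === '/') {
      return $item;
   }
   // Relative: join with the directory, avoiding a doubled slash
   return rtrim($dir, '/') . '/' . $item;
}

echo resolveItemPath('/i/pity/the', 'foo.txt'), "\n";
echo resolveItemPath('/', 'foo.txt'), "\n";
```

Servers that return something in between (for example a path that repeats the directory name without a leading slash) would need extra cases.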

The generateRssFeed() Function

This has started to feel like a PHP tutorial, hasn't it? Good news: the hard part is over, and now all that remains is to write the function that actually generates the RSS feed.

This function is the one that we'll call directly from another PHP script, passing in all the parameters needed to make it work:

  • $host: The FTP server hostname. Example: "ftp.foo.com".
  • $user: The FTP username. Example: "anonymous".
  • $pass: The FTP user password. Example: "guest".
  • $path: The starting directory on the server. Example: "/pub/crawl"
  • $itemCount: The number of items to return in the RSS feed.

The first thing we'll do is call our exploreFtpServer() function to get the list of files and their timestamps from the FTP server. Once the list is returned, we can see whether the list is shorter than the $itemCount passed in, and use the smaller number of the two:

function generateRssFeed($host, $user, $pass, $path, $itemCount)
{
   // Get associative array of file path => timestamp
   $fileList = exploreFtpServer($host, $user, $pass, $path);

   // Use user's count or # of files found, whichever is smaller
   if(count($fileList)<$itemCount) $itemCount=count($fileList);
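To see the "newest first, capped at a limit" idea in isolation, here is a small self-contained sketch. The paths and timestamps are made-up sample data standing in for the result of exploreFtpServer(), and using array_slice() is an alternative to the counting approach the tutorial takes, not the article's own method.

```php
<?php
// Sample data standing in for exploreFtpServer()'s result;
// the paths and timestamps are invented for illustration.
$fileList = array(
   '/pub/a.txt' => 1142985600,
   '/pub/b.txt' => 1143072000,
   '/pub/c.txt' => 1142899200,
);

// Newest (largest timestamp) first, keys preserved
arsort($fileList);

// Cap the requested count at the number of files found
$itemCount = min(2, count($fileList));

// Take the first $itemCount entries; true also protects any
// numeric-looking keys from being renumbered
$newest = array_slice($fileList, 0, $itemCount, true);

print_r(array_keys($newest));
```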

We have a few variables to declare before continuing. First is a $linkPrefix to hold the hostname prefixed with the FTP protocol. Next is a $channelPubDate which will hold the publication date of the RSS feed. We will also create an $items array to hold the RSS items we create.

   // Create link prefix for feed
   $linkPrefix = 'ftp://' . $host;

   // Declare date for channel/pubDate
   $channelPubDate = null;

   // Array for item strings
   $items = array();

We're now ready to grab the most recent files and create RSS items for each one. For this tutorial, we're going to keep it nice and simple: Each item will have a title, link, and date. Feel free, however, to spice up your feed with more information. (One fun idea would be to display a file icon which matches the file's extension.)

As you'll recall, the $fileList returned from exploreFtpServer() is sorted by timestamp, newest file first. As we loop through the array we'll use the timestamp to create the publication date of the RSS item. The first (newest) timestamp will also be used to create the publication date for the RSS feed as a whole.


   // Create array of RSS items from most recent files
   foreach ($fileList as $filePath => $time)
   {
      // Create item/pubDate according to RFC822
      $itemPubDate = date("r", $time);

      // Also use first item/pubDate as channel/pubDate
      $channelPubDate = ($channelPubDate==null) ? $itemPubDate : $channelPubDate;
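If you are curious what date("r") produces, this short sketch prints one value. The timestamp is an arbitrary example, and pinning the time zone to UTC is an assumption made only so the output is reproducible; your server's configured time zone will change the offset shown.

```php
<?php
// Fix the time zone so the output is reproducible; UTC is an assumption.
date_default_timezone_set('UTC');

// 1142985600 is midnight UTC on March 22, 2006
echo date('r', 1142985600), "\n";
// prints "Wed, 22 Mar 2006 00:00:00 +0000"
```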

Next we'll create the file's URI, starting with "ftp://". This $fileUri variable will be used to populate both the RSS item title and the link. We should replace any spaces in the filenames with the encoded value of "%20" to ensure the URI is well-formed.

Now that we have all the information we need, it is time to create the XML for each RSS item. When that is done, we'll add it to our $items array for use later on. We will also be sure to end this loop if we have reached the $itemCount threshold.

      // Create URI to ftp file
      $fileUri = str_replace(" ", "%20", $linkPrefix . $filePath);

      // Create item
      $item = '<item>'
      .'<title>' . $fileUri . '</title>'
      .'<link>'. $fileUri . '</link>'
      .'<pubDate>' . $itemPubDate . '</pubDate>'
      .'</item>';

      // Add to item array
      array_push($items, $item);

      // If max items for feed reached, stop
      if(count($items)==$itemCount) break;
   }
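Spaces are not the only characters that can cause trouble: a filename containing "&" or "<" would produce malformed XML. As a hedged sketch of one way to harden the item-building step, the hypothetical helper below percent-encodes each path segment for the link and XML-escapes the title. The name buildRssItem() and its parameters are assumptions for illustration, not part of ftp_monitor.php.

```php
<?php
// Hypothetical helper: builds one RSS <item> with a percent-encoded
// link and an XML-escaped title. Not part of ftp_monitor.php.
function buildRssItem($host, $filePath, $time)
{
   // Percent-encode each path segment, leaving the slashes intact
   $segments = array_map('rawurlencode', explode('/', $filePath));
   $link = 'ftp://' . $host . implode('/', $segments);

   // Readable title; escape &, <, > and quotes for use inside XML
   $title = htmlspecialchars('ftp://' . $host . $filePath, ENT_QUOTES, 'UTF-8');

   return '<item>'
      . '<title>' . $title . '</title>'
      . '<link>' . htmlspecialchars($link, ENT_QUOTES, 'UTF-8') . '</link>'
      . '<pubDate>' . date('r', $time) . '</pubDate>'
      . '</item>';
}

echo buildRssItem('ftp.example.com', '/pub/R&D notes.txt', 1142985600), "\n";
```

Here the title stays human-readable while the link is safe to follow, which is a small departure from the tutorial's choice of using the same string for both.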

Finally we get to create the RSS feed itself. Building XML in PHP using strings is easy, but rarely pretty to look at. Note that we're using join() to add in all of our RSS items, with a line break after each. (The line breaks aren't necessary, but they make the feed easier to read when you're troubleshooting.)

   // Build the RSS feed
   $rss = '<rss version="2.0">'
   . '<channel>'
   . '<title>FTP Monitor: ' . $host . '</title>'
   . '<link>' . $linkPrefix . '</link>'
   . '<description>The ' . $itemCount.' latest changes on '
   . $host . $path . ' (out of ' . count($fileList)
   . ' files)</description>'
   . '<pubDate>' . $channelPubDate . '</pubDate>' . "\n"
   . join("\n", $items) . "\n"
   . '</channel>'
   . '</rss>';

The feed is ready to go. All that remains is to set the HTTP header to indicate that we are returning an XML document, then output the feed:

   // Set header for XML mime type
   header("Content-type: text/xml; charset=UTF-8");

   // Display RSS feed
   echo($rss);
}

And that's the last of the real work. If you haven't done so yet, be sure to download the complete source code of ftp_monitor.php to see it all in one place.

Put It to Work

After placing ftp_monitor.php on your PHP-enabled web server, you can reference it from any other PHP script. Here is an example of how that might look:

<?php

   // Import the FTP Monitor
   require_once('ftp_monitor.php');

   // Connection params to monitor FreeBSD snapshots
   $host = "ftp.freebsd.org";
   $user = "anonymous";
   $pass = "guest@anon.com";
   $path = "/pub/FreeBSD/snapshots";

   // Generate RSS 2.0 feed showing newest FreeBSD snapshots
   generateRssFeed($host, $user, $pass, $path, 10);
?>

Here is a sample output file from the above connection parameters: freebsd.xml. When viewed in SharpReader, the items look like this:

Figure 2. FTP Monitor items for ftp.freebsd.org

Many RSS aggregators will automatically follow an item's link if it does not have a description element. SharpReader is one of these aggregators, and it also supports the ftp:// protocol. Thus, clicking on one of the items from our FTP monitor will start to download it automatically. This usually works just fine if the FTP server allows anonymous connections. If you had to provide a real username and password in ftp_monitor.php, however, your ability to "click and download" will depend on whether your RSS reader can prompt you for FTP credentials.

Enhance Your Performance

There are a few caveats to keep in mind when using this script. First, many FTP servers aren't exactly speedy, so the performance of this script will be bound by the response times of the FTP commands themselves. Simply put, the more directories it has to recurse, the longer it will take. Try to limit the scope of what you need to monitor.

Second, this script is not intended to be hit by a lot of concurrent users. The speed issue is one factor, but the other is FTP connections. For every concurrent hit to this script, a connection is made to the FTP server. It won't take much for the available connections to max out. So, if you want to provide an RSS feed to a lot of users, you should hide this script behind a cache which calls it on a periodic basis. Let the real load be handled by your cache, not the ftp_monitor script.

If you keep these constraints in mind, you can offer your users a useful service that delivers the information they need without frustrating response times.
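As a rough illustration of the caching idea, the sketch below serves a stored copy of the feed while it is still fresh and only regenerates it once the copy has expired. Everything here is an assumption made for the example: the function name cachedFeed(), the file-based storage, and the use of a callable to produce the feed are not part of ftp_monitor.php.

```php
<?php
// Hypothetical file cache: returns the cached feed while it is fresh,
// otherwise calls $generator and stores the result.
function cachedFeed($cacheFile, $maxAgeSeconds, $generator)
{
   // Make sure we see the file's current state, not a cached stat
   clearstatcache(true, $cacheFile);

   if (is_file($cacheFile)
      && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
      return file_get_contents($cacheFile);
   }

   // Cache is missing or stale: regenerate the feed
   $rss = call_user_func($generator);

   // LOCK_EX avoids interleaved writes from concurrent requests
   file_put_contents($cacheFile, $rss, LOCK_EX);
   return $rss;
}
```

Because generateRssFeed() echoes its result instead of returning it, one way to plug it in would be a generator that wraps the call in ob_start() and ob_get_clean(). The calling script would then send the Content-type header and echo whatever cachedFeed() returns.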

Give It Back

If you find this script useful, or if you come up with a cool modification to it, I'd love to hear from you. Post a comment and share what you have learned with the xml.com community.