Distributed applications with Horde 4

Synopsis

Horde’s powerful RPC API has been used numerous times to integrate Horde-based data into external applications or remote sites. It also provides an easy-to-set-up basis for distributed applications with headless workers. In this article I will give you a brief introduction on how to build a scalable distributed architecture based on Horde 4.

Distributed Architecture

Assumptions:

  •  You want your application to be scalable over several hosts. We call the controlling instance the master and the reacting instances the workers.
  •  You don’t want to keep a lot of state on the worker. Adding or removing a worker instance should not require complicated setup. Most cloud layers like OpenStack assume worker instances to be virtually stateless. The master is the single source of truth and should be able to rebuild any broken or lost worker setup from stored information.
  •  You are working in a hostile environment, e.g. the internet. Firewalls allow only select ports and data has to travel over lines you cannot trust. You want to resort to HTTPS transport with real certificates.

The master:

I won’t go into too many details on the master setup this time. Create a basic app from the skeleton as the Horde wiki describes. Keep the communication driver for worker API calls separate from the driving logic in your app and don’t couple them too tightly. Usually you want to commit changes in small steps to both the master’s idea and the worker’s reality, and check back that everything worked out. This doesn’t scale well for large-scale changes, though.

Sometimes you want to make complex changes to the “truth” or “theory” in the master’s db before you commit them to the worker world out there.

Accessing the worker from the master:

The core piece of your communication with the worker is just a few lines of code:

    protected function callWorker(WorkerInstance $worker, $callMethod, array $parameters = array())
    {
        try {
            // HTTP basic auth credentials and a 20 second timeout for this worker
            $http = new Horde_Http_Client(array(
                'request.username' => $worker->rpcuser,
                'request.password' => $worker->rpcuserpass,
                'request.timeout' => 20
            ));
            $response = Horde_Rpc::request(
                'xmlrpc',
                'https://' . $worker->worker_hostname . '/' . $worker->worker_subdir . '/rpc.php',
                $callMethod,
                $http,
                array($parameters) // wrapped so the Api method receives one array
            );
        } catch (Exception $e) {
            throw new Appname_Exception($e);
        }
        return $response;
    }

This is a dumbed down version for demonstration purposes. You might want to model WorkerInstance based on Horde_Rdo, the horde ORM layer. It is desirable to evaluate lazy relations and lazy attributes. This has important performance implications but more on this in another post. We’re also selling consulting 😉

Worker setup:

We want a stateless worker instance. Obviously, this is theory. Truth is: You need a unique IP and you probably want a unique hostname. Nowadays cloud layers can provide that level of configuration. How about a horde instance without db?

horde/config/registry.local.php

You want the worker to talk under a specific API name. Add a block to your registry.local.php:

    'myvpnworkerworker' => array(
        'name' => _("someworkerfooname"), /* we can even drop the _() as nobody will localize this */
        'provides' => 'myvpnworkerapp',
    )

horde/config/conf.php

This is stripped down to just the important lines:

$conf['auth']['params']['htpasswd_file'] = '/not/in/webroot/passwords.secret';
$conf['auth']['params']['encryption'] = 'plain'; /* In the real world, you want to use some encryption instead */
$conf['auth']['driver'] = 'http'; /* We want authentication by the http layer after all */

We want the server to be stateless and not to rely on external data. We don’t want a local mysqld running and we don’t want a remote LDAP either. We will store the credentials in a .htpasswd-style file. For demonstration purposes, the passwords are stored in plain text.

The password file (passwords.secret in the config above) would look like this:

rpcuser:totallysecretrpcuserpass
adminuser:adminpass
localdebuguser:secretlocaldebugpass

We also want to get rid of any components which cannot work without an SQL backend.

$conf['log']['priority'] = 'DEBUG';
$conf['log']['ident'] = 'HORDE';
$conf['log']['name'] = LOG_USER;
$conf['log']['type'] = 'syslog';
$conf['log']['enabled'] = true;
$conf['log_accesskeys'] = false;

As the worker will probably only show the admin UI to localhost or VPN, you want to log any debug-relevant data locally; in this config it goes to syslog.

$conf['prefs']['driver'] = 'Session';
$conf['alarms']['driver'] = false;

We don’t want user prefs or alarms on the worker. You might consider setting up some basic email delivery and sending alarms by mail. I won’t cover this here.

$conf['datatree']['driver'] = 'null';
$conf['group']['driver'] = 'Mock';

Datatree support is SQL-only. Datatree is mostly legacy support and it isn't particularly fast either. There is no guarantee future Horde revisions will support datatree. You don't want it. Period. You don't want groups either. The primary user of your instance is the RPC user.

$conf['perms']['driver'] = 'Null';

Only the master speaks to your worker and this must be ensured on the SSL/HTTPS layer. There is no need for a perms backend.

$conf['cache']['driver'] = 'File';

If we use caching at all, we want to use a primitive one.

$conf['lock']['driver'] = 'Null';
$conf['token']['driver'] = 'Null';

Horde_Lock is a cool library. Ben Klang wrote it in 2008 when I was working in a non-public project that needed it and I mailed some stuff to him. But it’s SQL-only. We don’t want it here.
Horde_Token is essential for a lot of verification tasks, but the worker is not the single source of truth, so we don’t need it on the worker either.

$conf['vfs']['type'] = 'File';

You probably don’t want a vfs at all. Vfs means state.

$conf['sessionhandler']['type'] = 'Builtin';

Anything but sql. You probably don’t want sessions.

These should be the key parts needed to make your stock Horde installation run without a database at all.

The RPC worker app:

The key to your RPC worker app is Api.php

This is the entry point for any Horde RPC calls.

Basically it works this way:

  • The upper layer of array() is internal to the horde rpc request layer
  • In our client example we wrapped our params into an additional array() to facilitate optional parameters. This means any method in Api.php accepts an array as the single parameter. You could also use a fixed list of parameters with optionals in the rear positions.
  • While the Horde registry calls application APIs as ‘domain/function’, the RPC API calls them as ‘domain.function’. Examples are horde.listApis and myvpnapp.fetchData.

Any function you can call from the outside is a method in Fooworkername_Api in Fooworkername/lib/Api.php.
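To make the calling convention concrete, here is a minimal sketch of such an Api.php. The method name fetchData, the 'section' parameter and the returned fields are made up for illustration. In Horde 4 apps the Api class normally extends Horde_Registry_Api; the stand-in class at the top only exists so the sketch also runs outside a Horde installation.

```php
<?php
// Stand-in so this sketch runs without Horde installed. In a real worker
// app the framework provides Horde_Registry_Api and this block is not needed.
if (!class_exists('Horde_Registry_Api')) {
    class Horde_Registry_Api {}
}

class Fooworkername_Api extends Horde_Registry_Api
{
    /**
     * Hypothetical RPC method, reachable as "<api name>.fetchData".
     *
     * @param array $params  The single array the master wrapped its arguments in.
     * @return array  Plain data the RPC layer can serialize.
     */
    public function fetchData(array $params = array())
    {
        // Optional parameters are read with a default ('section' is an assumption).
        $section = isset($params['section']) ? $params['section'] : 'all';

        return array(
            'section'  => $section,
            'hostname' => php_uname('n'),
            'time'     => time(),
        );
    }
}
```

Called from the master with the callWorker() method shown earlier, this would be something like callWorker($worker, 'myvpnworkerapp.fetchData', array('section' => 'vpn')), assuming the API name from the registry block above.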

Concurrency and queueing:

Horde is written in PHP. PHP is generally lacking in thread safety and doesn’t support real forking from within an Apache module. You can, however, fork and detach processes using shell_exec. Horde ships some classes which help you use PHP in a shell environment, but sometimes you want to resort to shell scripts or Perl or anything else because it already exists or is more suitable for the job. shell_exec allows you to use all of these.

Usually you want your API calls to return fast, because calls that block on long-running work don’t scale well. Make sure an individual call finishes, even in predictable worst-case scenarios, within 1/3 of the client’s response timeout. In our example we chose a timeout of 20 seconds, which leaves a budget of roughly 6 to 7 seconds per call. Mind network latency and the worst-case runtime of external scripts.
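A minimal sketch of the detach pattern with shell_exec, assuming a POSIX shell with nohup available on the worker. The function name startDetached and the log file parameter are made up for this example. The command itself is not escaped, so it must be a fixed, trusted path and never come from RPC input.

```php
<?php
/**
 * Start a command in the background and return without waiting for it.
 *
 * @param string $command  Trusted executable name or path (not escaped here).
 * @param array  $args     Arguments, each escaped for the shell.
 * @param string $logFile  Where the job's output goes (hypothetical default).
 * @return int  PID of the background job as reported by the shell.
 */
function startDetached($command, array $args = array(), $logFile = '/dev/null')
{
    $parts = array($command);
    foreach ($args as $arg) {
        $parts[] = escapeshellarg($arg);
    }

    // Redirecting stdout/stderr and backgrounding with "&" lets shell_exec
    // return at once; "echo $!" prints the PID of the background job.
    $line = sprintf(
        'nohup %s > %s 2>&1 & echo $!',
        implode(' ', $parts),
        escapeshellarg($logFile)
    );

    return (int) trim((string) shell_exec($line));
}
```

An API method can call startDetached() for the slow part, return the PID or a job id right away, and let the master ask for the result later.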

The solution here is decoupling:

  • Don’t make any UI element depend on live data from the worker.
  • Make a service/daemon or cron job collect worker state at short intervals and serialize these data points in time-stamped files or directories.
  • Create an API entry point to collect the most recent state/results.
  • Collect the results of all workers from a command-line script, daemon, cron job or service at reasonable intervals.
  • Don’t expect most tasks to complete immediately; add them to a queue instead. Horde_Queue may help you with that task.
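The second and third points of the list can be sketched in plain PHP as follows. The function names, the file naming scheme and the JSON layout are assumptions for this example. storeSnapshot() would be called from the cron job or daemon, while latestSnapshot() would form the body of an Api.php method.

```php
<?php
/**
 * Write one state snapshot as a time-stamped JSON file.
 *
 * @param string   $dir    Existing, writable directory outside the webroot.
 * @param array    $state  Whatever the collector gathered.
 * @param int|null $now    Unix timestamp; defaults to the current time.
 * @return string  Path of the file written.
 */
function storeSnapshot($dir, array $state, $now = null)
{
    $now  = ($now === null) ? time() : $now;
    $file = $dir . '/state-' . gmdate('Ymd\THis\Z', $now) . '.json';

    // Write to a temp file and rename it, so a reader never sees a
    // half-written snapshot.
    $tmp = $file . '.tmp';
    file_put_contents($tmp, json_encode(array('collected' => $now, 'state' => $state)));
    rename($tmp, $file);

    return $file;
}

/**
 * Return the newest snapshot, or null if none has been written yet.
 */
function latestSnapshot($dir)
{
    $files = glob($dir . '/state-*.json');
    if (!$files) {
        return null;
    }
    // Fixed-width UTC timestamps sort chronologically as plain strings.
    sort($files);

    return json_decode(file_get_contents(end($files)), true);
}
```

Because the API method only reads a small file, it returns quickly no matter how long the collection itself took. Old snapshots still need to be pruned by the collector, which this sketch leaves out.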

Choose wisely where to call existing external apps and where to resort to PHP and the Horde Framework to solve common data collection, processing, formatting and returning tasks.

Remember to have fun.

The author is severely biased towards all things Horde and has used Horde classes and applications to solve various work-for-hire problems. The Horde Framework is one of the oldest and most mature PHP projects and drives mission-critical collaboration and data retrieval software all over the globe.