bookmark_borderSimplifying Routing / PSR-15 bootstrap in Horde

As you might remember from a previous post, Horde Core’s design is more complex than necessary or desirable for two main reasons:

  • Horde predates today’s standards like the Composer Autoloader and tries to solve problems on its own. Changing that will impair Horde’s ability to run without composer which we were hesitant to do, focusing on not breaking things previously possible.
  • Horde is highly flexible, extensible and configurable, which creates some chicken-egg problems to solve on each and every call to any endpoint inside a given app.

Today’s article concentrates on the latter problem. More precisely, we want to make routing more straight forward when a route is called.

A typical standalone or monolithic app usually is a composer root package. It knows its location relative to the autoloader, relative to the dependencies and relative to the fileroot of the composer installation. Moreover, the program usually knows about all its available routes. It will also have some builtin valid assumptions about how its different parts’ routes relate to the webroot.
None of this is true with a typical composer-based horde installation.

  • None of the apps is the root package
  • Apps are exposed to a web-readable subdir of the root package
  • While the relative filesystem location is known, each app can live in a separate subdomain, in the webroot or somewhere down the tree
  • Each app may reconfigure its template path, js path, themes path
  • The composer installer plugin makes a sane default. This default can be overridden
  • Each app can be served through multiple domains / vhosts with different registry and config settings in the same installation
  • Administrators can add local override routes
  • Parts of the code base rely on horde’s own runtime-configurable autoloader rather than composer.

This creates a lot of necessary complexity. The router needs to know the possible routes before it can map a request. To have the routes, the context described above must be established. Complexity cannot be removed without reducing flexibility. There is, however, a way out. The routing problem can be divided into three phases with different problems:

  • Development time – when routes are defined and changed frequently for a given app or service
  • Installation/Configuration time – when the administrator decides which apps’ routes will be available for your specific installation
  • Runtime – when a request comes in and the router must decide which route of which app needs to react – or none at all

Let’s ignore development time for now. It is just a complication of the other two cases. The design goal is to make runtime lean and simple. Runtime should initialize what the current route needs to work and as little as possible on top of that. Runtime needs to know all the routes and a minimal setup to make the router work. Complexity needs to be offloaded into installation time. Installation time needs to create a format of definite routes that runtime can process without a lot of setup and processing. As a side effect, we can gain speed for each individual call.

Modern Autoloaders are similar in concept: They have a setup stage where all known autoloader rules of the different packages are collected. In composer, the autoloader is re-collected in each installation or update process. The autoloader is exposed through a well-known location relative to the root package, vendor/autoload.php – it can be consumed by the application without further runtime setup. The autoloader can be optimized further by processing the autoloading rules into a fixed map of classes to filenames (Level 1), making these maps authoritative without an attempt to fail over on misses (Level 2a) and finally caching hits and misses into the in-memory APCu opcache. Each optimization process makes the lookup faster. This comes at the cost of flexibility. The mapping must be re-done whenever the installation changes. Otherwise things will fail. This is OK for production but it gets in the way of development. The same is true for the router.

The best optimization relies on the application code and configuration being static. The list of routes needs to be refreshed on change. Code updates are run through composer. The composer installer plugin can automatically refresh the router. Configuration updates can happen either through the horde web ui or through adding/editing files into the configuration area. Admins already know they need to run the composer horde-reconfigure command after they added new config files or removed files. Now they also need to run it when they changed file content. In development, routing information may change on the fly multiple times per hour. Offering a less optimized, more involved version of this route collection stage can help address the problem.

A new version of the RampageBootstrap codebase in horde/core is currently in development. It will offload more of horde’s early initialisation stages into a firmware stack and will reduce the early initialisation to the bare minimum. At the moment, I am still figuring out how we can do this in a backward compatible way.

bookmark_borderRdo: Persistence is not your model

Remember that post on how your backend might betray you? You can store your Turba addressbook into an LDAP tree, but if the addressbook is manipulated from LDAP side, your CardDAV Sync may be ignorant of this. The bottom line is: You cannot trust the backend. Nor should the persistence model govern your application internal model.

The development team at B1 works with a lot of inherited code from different areas and eras of FOSS and closed-source development. We see all styles of code and we have made all sorts of bad micro decisions ourselves over the years. One particulary hard question that comes up time and again is what is the “true” model of data in an application. Developers who come from traditional, monolithic web applications may say the true model is what is in the (relational) database and point out that it will closely influence the application’s internal model. Developers with a background in APIs and distributed apps will tend to think the messages going in and out of a service are the main thing. Application will be modeled after the messages and persistence is a secondary thought. Then we have the school of DDD-trained developers who will say neither is right. The internal model should be mostly ignorant of persistence and external API is just another type of persistence.

Let’s explore the benefits and impacts of these options.

DB-centric applications

In a DB-centric application, your entities closely represent rows of data in DB tables. If you use ActiveRecord or DataMapper style ORMs, you will often use the ORM Entities as your business objects in the application. This works fairly well for CRUD style applications where most if not all fields in the database will be editable form fields on screen.

If your application does little beyond storing, retrieving, updating and deleting “records” of whatever type, you will have a straight-forward development experience. Some frameworks may support autogenerating create/view/edit forms and searchable lists (html tables) out of your DB schema. Because it is the easiest way to add features to these types of application, the object model tends to be rich in attributes and limited in relations and segregation.

For example, it would be common for a “customer” entity to include separate fields for the customer’s street, city, country, email address… some fields would be marked as mandatory and others as optional, each accessible via getters and setters or public properties. This becomes cumbersome for entities which have subtypes with different sets of attributes or behaviours. Fields may be mandatory for one subtype, but optional or even forbidden for other subtypes. Attribute values are usually primitives (string, int) that can be mapped to DB column types easily. Developing these types of apps is really fast and concise as long as you do nothing fancy. As soon as you have any real amount of business logic and variance, it becomes cumbersome to maintain and hard to test.

API-centric applications

A similar school of thoughts comes out of microservice development. In many cases, CRUD services can be prototyped from autogenerated code defined through a Swagger/OpenAPI definition file. Sometimes there is little to do after this autogeneration, including persistence to the relational database. If you use a non-relational persistence like CouchDB, you may even get along with some variance and depth in the schema. Even a json or yaml file on disk gets you very far (at least in the prototype stage).

However, you will have scenarios where the same application object looks quite different between API requests. Users may have limited permissions on querying details of an object, different APIs with different use cases may have limited needs on the deeper details of objects. Retrieving your list of customer IDs and customer names for populating a dropdown is different from being able to view their billing data or contact persons’ details.

Domain Model centric approach

If your application has to deal with structured aggregates and consistency rules, exhibits rich behaviour, services multiple types of backends or is otherwise complicated, this might be the right approach.

At its core, the (micro)application is fairly ignorant of both persistence, export formats and external API.
The domain model deviates from persistence model or message representations. It has internal constraints and business rules. A domain entity is retrieved from repository in valid and complete state and all transformations will result in valid state. Transformations usually happen through defined operations rather than accessing a set of getters and setters. Setters may even not exist for many types of domain objects.

Through this, you can trust your objects by contract and reduce validations in the actual operations. This comes at a price: In a domain centric application, your classes may double or triple compared to other approaches.

You have your business object with all the internal integrity aspects. You have another version of the same object which is used for I/O – the so-called Data Transfer Object or DTO. There are versions where DTOs are either typed classes or untyped plain objects with just arbitrary public attributes. They may even just be structured arrays. Each option comes with a drawback. These world-readable objects expose all their relevant data to some kind of I/O – views rendered on the server side, files to export, REST API messages coming in or going out. A third version of your entity may be the persistence model(s), especially in relational databases. They may decompose your domain entity into multiple database tables, LDAP tree nodes or even multiple representations in no-sql databases. The main effort is getting all these transformations right. This type of application may seem verbose and repetitive. On the other hand, a strong object model with hard internal constraints and few dependencies on unrelated aspects is very straight-forward to unit test. This makes it suitable for large scale applications.

Compromises and practice

You do not have to commit to one approach with all consequences. This might even make less sense in languages which do not really enforce type hints (like python) or have no enforced concept of private and protected properties (again, like python). In many cases, you can get very far by discipline alone. Just restrict usage of setters to the persistence and I/O parts of your application. Add methods and behaviour to your persistence layer entities and even better, add an interface. Hint your business logic against the interface. You can also make your ORM mapper double as a Repository. You can scale out to real domain objects step by step as needed. Add conversion methods which turn a domain entity into one or many ORM entities and commit them to the DB. If your aggregate is not very suitable for this, hide your ORM mappers in a dedicated repository class. You can scale out as little or as much as your needs dictate.

On the other hand, you may abuse your ORM entities to populate views or data formats like XML, json or yaml.

Why we move away from putting logic into Rdo entities

Rdo is Rampage Data Objects, Horde’s minimalistic ORM solution. We have come a long way using the described tricks to make Horde_Rdo_Mappers work as repository implementations and Horde_Rdo_Base entities as business objects. However, we run into more and more situations where this is not appropriate. Using and lazy-loading relations to child objects is fine in the read use case, but persisting changes to such structures becomes tedious and error prone. Any logic tightly coupled to Rdo objects is also hard to unit test. Rdo-centric development does not mix well with the Shares library or with handling users and identities. As Rdo entities carry around references to the mapper and the database driver, they tend to bloat debug output. More fundamentally, our object models are evolving to a design which does not really look like database tables at all. We still love Rdo and may resort to Rdo entities with interfaces here and there, especially in prototyping. But our development model has long shifted towards unittest-early or unittest-driven rather.

Beware of the edges

Domain Driven Design purists may emphasize how domain models will rarely have attribute setters and some will even argue you should minimize your getters (tell don’t ask). However, at some point we need to get raw attributes to compose messages to the edges, to answer API requests or to persist any objects. There are multiple ways to do it, with different implications. Remember the Data Transfer Objects? I tend to have method on my domain aggregate to “eject” a fully detailed world readable object. In many cases, it is even typed. This chatty object can now be used for Formatters to transform them into API messages or data files. So the operation would be: Construct the domain aggregate – and fail if data is invalid – and from the domain aggregate, construct the DTO. Use the DTO for chatty jobs.

But how about the other way around? Messages from the outside can be malformed in many ways. They may be tampered with, they may miss details by design. If I exported an object to a user’s limited API and he sends an update, how to handle that? If an API user does not even know advanced attributes that are subject to other views and use cases, how would he create new entities?

I tend to have a method on the repository which accepts these messages and tries to turn them into Domain Objects. If the partial message contains some ID field or a sufficient set of data to form a unique key, I will first ask the backend to get the original details of the object and then apply the message details – and fail if this violates the constraits of my aggregate. If no previous version exists in the backend, missing information is added from defaults and generators, including a unique id of some sort. The message is applied on top.

The is approach may be expensive. I am creating full domain objects just to render database content to another format which may not even contain most of the retrieved data. On the other hand, if I know an object exists in a relational db, I could use some cheap SQL to apply partial changes without ever constructing the full model.

Creating straight-to-output repos with optimized queries is a tradeoff. They mean additional maintenance burden whenever the model changes, additional parts to cover with tests. I only do this when performance actually becomes an issue, not by default.

Creating straight-to-persistence methods for partial messages is even less desirable. It is circumventing integrity check at code level. However, sometimes the message itself is the artifact to persist – to be later processed by a queue or other asynchronous process.

bookmark_borderPEAR down – Taking Horde to Composer

Since Horde 4, the Horde ecosystem heavily relied on the PEAR infrastructure. Sadly, this infrastructure is in bad health. It’s time to add alternatives.

Everybody has noticed the recent PEAR break-in.

A security breach has been found on the http://pear.php.net webserver, with a tainted go-pear.phar discovered. The PEAR website itself has been disabled until a known clean site can be rebuilt. A more detailed announcement will be on the PEAR Blog once it’s back online. If you have downloaded this go-pear.phar in the past six months, you should get a new copy of the same release version from GitHub (pear/pearweb_phars) and compare file hashes. If different, you may have the infected file.

While I am writing these lines, pear.php.net is down. Retrieval links for individual pear packages are down. Installation of pear packages is still possible from private mirrors or linux software distribution packages (openSUSE, Debian, Ubuntu). Separate pear servers like pear.horde.org are not directly affected. However, a lot of pear software relies on one or many libraries from pear.php.net – it’s a tough situation. A lot of software projects have moved on to composer, an alternative solution to dependency distribution. However, some composer projects have dependency on PEAR channels.

I am currently submitting some changes to Horde upstream to make Horde libs (both released and from git) more usable from composer projects.
Short-term goal is making use of some highlight libraries easier in other contexts. For example, Horde_ActiveSync and Horde_Mail, Horde_Smtp, Horde_Imap_Client are really shiny. I use Horde_Date so much I even introduced it in some non-horde software – even though most functionality is also somewhere in php native classes.

The ultimate goal however is to enable horde groupware installations out of composer. This requires more work to be done. There are several issues.

  • The db migration tool checks for some pear path settings during runtime https://github.com/horde/Core/pull/2 Most likely there are other code paths which need to be addressed.
  • Horde Libraries should not be web readable but horde apps should be in a web accessible structure. Traditionally, they are installed below the base application (“horde dir”) but they can also be installed to separate dirs.
  • Some libraries like Horde_Core contain files like javascript packages which need to be moved or linked to a location inside another package. Traditionally, this is handled either by the “git-tools” tool linking the code directory to a separate web directory or by pear placing various parts of the package to different root paths. Composer doesn’t have that out of the box.

Horde already has been generating composer manifest files for quite a while. Unfortunately, they were thin wrappers around the existing pear channel. The original generator even took all package information from the pear manifest file (package.xml) and converted it. Which means, it relied on a working pear installation. I wrote an alternative implementation which directly converts from .horde.yml to composer.json – Calling the packages by their composer-native names. As horde packages have not been released on packagist yet, the composer manifest also includes repository links to the relevant git repository. This should later be disabled for releases and only turned on in master/head scenarios. Releases should be pulled from packagist authority, which is much faster and less reliant on existing repository layouts. https://github.com/horde/components/pull/3

To address the open points, composer needs to be amended. I currently generate the manifests using package types “horde-library” and “horde-application” – I also added a package type “horde-theme” for which no precedent exists yet. Composer doesn’t understand these types unless one adds an installer plugin https://github.com/maintaina-com/installers. Once completed and accepted, this should be upstreamed into composer/installers. The plugin currently handles installing apps to appropriate places rather than /vendor/ – however, I think we should avoid having a super-special case “horde-base” and default to installing apps directly below the project dir. Horde base should also live on the same hierarchy. This needs some additional tools and autoconfiguration to make it convenient. Still much way to go.

That said, I don’t think pear support should be dropped anytime soon. It’s the most sensible way for distribution packaging php software. As long as we can bear the cost involved in keeping it up, we should try.

bookmark_borderhorde trustr – A new horde CA app step by step

Trustr is my current project to create a simple certificate management app.
I decided that it is just about the right scope to demonstrate a few things about application development in Horde 5.

I have not made any research if the name is already occupied by some other software. Should any problems arise, please contact me and we will find a good solution. I just wanted to start without losing much time on unrelated issues.

My goals as of now:

– Keep everything neat, testable, fairly decoupled and reusable. The core logic should be exportable to a separate library without much change. There won’t be any class of static shortcut methods pulling stuff out of nowhere. Config and registry are only accessed at select points, never in the deeper layers.
– Provide a CLI using Horde_Cli and Horde_Cli_Application (modeled after the backup tool in horde base git)
– Store to relational database using Horde_Db and Horde_Rdo for abstraction
– Use php openssl extension for certificate actions, but design with future options in mind
– Rely on magic openssl defaults as little as possible
– Use conf.xml / conf.php for any global defaults
– Show how to use the inter-app API (reusable for xml-rpc and json-rpc)
– Showcase an approach to REST danabol ds in Horde (experimental)

The app is intended as a resource provider. The UI is NOT a top priority. However, I am currently toying around with a Flux-like design in some unrelated larger project and I may or may not try some ideas later on.

Initial Steps: Creating the working environment

I set up a new horde development container using the horde tumbleweed image from Open Build Service and a docker compose file from my colleague Florian Frank. Please mind both are WIP and improve-as-needed projects.


git clone https://github.com/FrankFlorian/hordeOnTumbelweed.git
cd hordeOnTumbelweed
docker-compose -f docker-compose.yml up


This yields a running horde instance on localhost and a database container.
I needed to perform a little manual setup in the web admin ui to get the DB to run and create all default horde schemas.

Next I entered the developer container with a shell
docker exec -it hordeOnTumbelWeed_php_1 bash

There are other ways to work with a container but that’s what I did.

 

Creating a  skeleton app

The container comes with a fairly complete horde git checkout in /srv/git/horde and a clone of the horde git tools in /srv/git/git-tools

A new skeleton app can be created using

horde-git-tools dev new --author "Ralf Lang <lastname@b1-systems.de>" --app-name trustr

The new app needs to be linked to the web directory using

horde-git-tools dev install

Also, a registry entry needs to be created by putting a little file into /srv/git/horde/base/config/registry.d

cat trustr-registry.d.php

<?php
// Copy this example snipped to horde/registry.d
$this->applications['trustr'] = array(
'name' => _('Certificates'),
'provides' => array('certificates')
);

 

This makes the new app show up in the admin menu. To actually use it and make it appear in topbar, you also need to go to /admin/config and create the config file for this app. Even though the settings don’t actually mean anything by now, the file must be present.

I hope to follow up soon with articles on the architecture and sub systems of the little app.