bookmark_borderRdo: Persistence is not your model

Remember that post on how your backend might betray you? You can store your Turba addressbook into an LDAP tree, but if the addressbook is manipulated from LDAP side, your CardDAV Sync may be ignorant of this. The bottom line is: You cannot trust the backend. Nor should the persistence model govern your application internal model.

The development team at B1 works with a lot of inherited code from different areas and eras of FOSS and closed-source development. We see all styles of code and we have made all sorts of bad micro decisions ourselves over the years. One particulary hard question that comes up time and again is what is the “true” model of data in an application. Developers who come from traditional, monolithic web applications may say the true model is what is in the (relational) database and point out that it will closely influence the application’s internal model. Developers with a background in APIs and distributed apps will tend to think the messages going in and out of a service are the main thing. Application will be modeled after the messages and persistence is a secondary thought. Then we have the school of DDD-trained developers who will say neither is right. The internal model should be mostly ignorant of persistence and external API is just another type of persistence.

Let’s explore the benefits and impacts of these options.

DB-centric applications

In a DB-centric application, your entities closely represent rows of data in DB tables. If you use ActiveRecord or DataMapper style ORMs, you will often use the ORM Entities as your business objects in the application. This works fairly well for CRUD style applications where most if not all fields in the database will be editable form fields on screen.

If your application does little beyond storing, retrieving, updating and deleting “records” of whatever type, you will have a straight-forward development experience. Some frameworks may support autogenerating create/view/edit forms and searchable lists (html tables) out of your DB schema. Because it is the easiest way to add features to these types of application, the object model tends to be rich in attributes and limited in relations and segregation.

For example, it would be common for a “customer” entity to include separate fields for the customer’s street, city, country, email address… some fields would be marked as mandatory and others as optional, each accessible via getters and setters or public properties. This becomes cumbersome for entities which have subtypes with different sets of attributes or behaviours. Fields may be mandatory for one subtype, but optional or even forbidden for other subtypes. Attribute values are usually primitives (string, int) that can be mapped to DB column types easily. Developing these types of apps is really fast and concise as long as you do nothing fancy. As soon as you have any real amount of business logic and variance, it becomes cumbersome to maintain and hard to test.

API-centric applications

A similar school of thoughts comes out of microservice development. In many cases, CRUD services can be prototyped from autogenerated code defined through a Swagger/OpenAPI definition file. Sometimes there is little to do after this autogeneration, including persistence to the relational database. If you use a non-relational persistence like CouchDB, you may even get along with some variance and depth in the schema. Even a json or yaml file on disk gets you very far (at least in the prototype stage).

However, you will have scenarios where the same application object looks quite different between API requests. Users may have limited permissions on querying details of an object, different APIs with different use cases may have limited needs on the deeper details of objects. Retrieving your list of customer IDs and customer names for populating a dropdown is different from being able to view their billing data or contact persons’ details.

Domain Model centric approach

If your application has to deal with structured aggregates and consistency rules, exhibits rich behaviour, services multiple types of backends or is otherwise complicated, this might be the right approach.

At its core, the (micro)application is fairly ignorant of both persistence, export formats and external API.
The domain model deviates from persistence model or message representations. It has internal constraints and business rules. A domain entity is retrieved from repository in valid and complete state and all transformations will result in valid state. Transformations usually happen through defined operations rather than accessing a set of getters and setters. Setters may even not exist for many types of domain objects.

Through this, you can trust your objects by contract and reduce validations in the actual operations. This comes at a price: In a domain centric application, your classes may double or triple compared to other approaches.

You have your business object with all the internal integrity aspects. You have another version of the same object which is used for I/O – the so-called Data Transfer Object or DTO. There are versions where DTOs are either typed classes or untyped plain objects with just arbitrary public attributes. They may even just be structured arrays. Each option comes with a drawback. These world-readable objects expose all their relevant data to some kind of I/O – views rendered on the server side, files to export, REST API messages coming in or going out. A third version of your entity may be the persistence model(s), especially in relational databases. They may decompose your domain entity into multiple database tables, LDAP tree nodes or even multiple representations in no-sql databases. The main effort is getting all these transformations right. This type of application may seem verbose and repetitive. On the other hand, a strong object model with hard internal constraints and few dependencies on unrelated aspects is very straight-forward to unit test. This makes it suitable for large scale applications.

Compromises and practice

You do not have to commit to one approach with all consequences. This might even make less sense in languages which do not really enforce type hints (like python) or have no enforced concept of private and protected properties (again, like python). In many cases, you can get very far by discipline alone. Just restrict usage of setters to the persistence and I/O parts of your application. Add methods and behaviour to your persistence layer entities and even better, add an interface. Hint your business logic against the interface. You can also make your ORM mapper double as a Repository. You can scale out to real domain objects step by step as needed. Add conversion methods which turn a domain entity into one or many ORM entities and commit them to the DB. If your aggregate is not very suitable for this, hide your ORM mappers in a dedicated repository class. You can scale out as little or as much as your needs dictate.

On the other hand, you may abuse your ORM entities to populate views or data formats like XML, json or yaml.

Why we move away from putting logic into Rdo entities

Rdo is Rampage Data Objects, Horde’s minimalistic ORM solution. We have come a long way using the described tricks to make Horde_Rdo_Mappers work as repository implementations and Horde_Rdo_Base entities as business objects. However, we run into more and more situations where this is not appropriate. Using and lazy-loading relations to child objects is fine in the read use case, but persisting changes to such structures becomes tedious and error prone. Any logic tightly coupled to Rdo objects is also hard to unit test. Rdo-centric development does not mix well with the Shares library or with handling users and identities. As Rdo entities carry around references to the mapper and the database driver, they tend to bloat debug output. More fundamentally, our object models are evolving to a design which does not really look like database tables at all. We still love Rdo and may resort to Rdo entities with interfaces here and there, especially in prototyping. But our development model has long shifted towards unittest-early or unittest-driven rather.

Beware of the edges

Domain Driven Design purists may emphasize how domain models will rarely have attribute setters and some will even argue you should minimize your getters (tell don’t ask). However, at some point we need to get raw attributes to compose messages to the edges, to answer API requests or to persist any objects. There are multiple ways to do it, with different implications. Remember the Data Transfer Objects? I tend to have method on my domain aggregate to “eject” a fully detailed world readable object. In many cases, it is even typed. This chatty object can now be used for Formatters to transform them into API messages or data files. So the operation would be: Construct the domain aggregate – and fail if data is invalid – and from the domain aggregate, construct the DTO. Use the DTO for chatty jobs.

But how about the other way around? Messages from the outside can be malformed in many ways. They may be tampered with, they may miss details by design. If I exported an object to a user’s limited API and he sends an update, how to handle that? If an API user does not even know advanced attributes that are subject to other views and use cases, how would he create new entities?

I tend to have a method on the repository which accepts these messages and tries to turn them into Domain Objects. If the partial message contains some ID field or a sufficient set of data to form a unique key, I will first ask the backend to get the original details of the object and then apply the message details – and fail if this violates the constraits of my aggregate. If no previous version exists in the backend, missing information is added from defaults and generators, including a unique id of some sort. The message is applied on top.

The is approach may be expensive. I am creating full domain objects just to render database content to another format which may not even contain most of the retrieved data. On the other hand, if I know an object exists in a relational db, I could use some cheap SQL to apply partial changes without ever constructing the full model.

Creating straight-to-output repos with optimized queries is a tradeoff. They mean additional maintenance burden whenever the model changes, additional parts to cover with tests. I only do this when performance actually becomes an issue, not by default.

Creating straight-to-persistence methods for partial messages is even less desirable. It is circumventing integrity check at code level. However, sometimes the message itself is the artifact to persist – to be later processed by a queue or other asynchronous process.

bookmark_borderHorde/Rdo ORM: PSR-4 and BC Breaks

Summary: Horde/Rdo ORM got upgraded for Namespaces. User code conversion is straight-forward. Backward Compatibility is limited.

If you ever wondered, RDO stands for Rampage Data Objects. This has been on my list for quite long, but it took some time to get it right. The horde/rdo library is horde’s Object Relational Mapping (ORM) solution. It allows you to store objects into sql databases or retrieve them without writing SQL. Originally, it has been written by Chuck Hagenbuch way back in the PHP 4 days and I have been a heavy user for years. If you know Laravel’s Eloquent, Doctrine, Hibernate, ActiveRecord, nHydrate or Dapper, this one is Chuck’s “as light as possible” implementation of the concept. Colleagues from B1 Systems have been users and contributors over the years. However, it has long been time to rethink Rdo in the light of newer capabilities of PHP 7.4 or even PHP 8. This time is now.

But you should not fear upgrading. First, the library still keeps the unnamespaced PSR-0 code, at least for the time being. Second, there is a straight-forward upgrade path for existing users.

Horde_Rdo_Base -> Horde\Rdo\Base
Horde_Rdo_Mapper -> Horde\Rdo\BaseMapper
Horde_Rdo_Factory -> Horde\Rdo\Factory
Horde_Rdo_List -> Horde\Rdo\DefaultList
Horde_Rdo_Iterator -> Horde\Rdo\DefaultIterator
Horde_Rdo_Query -> Horde\Rdo\DefaultQuery
Horde_Rdo_Exception -> Horde\Rdo\RdoException
Horde_Rdo:: -> Horde\Rdo\Constants::

It’s about as easy as it looks. Converting an application took me a few minutes. You might have noticed the names do not exactly match. Some names were not practical to simply turn into namespaced classes. In other cases, I turned class names into interface names. I found myself implementing the same enhancements over and over in multiple projects and I found myself wishing there was an easy way to do some others.

Less Boilerplate Mappers and Entities

Rdo is much more fun than some other ORMs as it comes with very little configuration. The library autodetects properties from the table columns. Datetime fields are automatically cast to Horde_Date objects. By default, Rdo tries for a convention over configuration approach for mapping table names, mappers, entities etc. Unfortunately, for most of my projects, that default does not fit too well to the class structure and file layout I choose. But still, implementing a new pair of mappers and entities takes two files and only two or three settings I ever need to think about.

Most often, subclassing the base mapper and the base entity is the right thing to do. But sometimes, you do not really care. If all you ever do to your entity is call ->toArray() and serialize it to json, you would be served very well by a generic entity instead. This is something on my list. I would even go one step further: If all you are changing to a mapper is subclassing and telling it the name of its database table and entity class, why subclass at all? Yes, I would want to turn the optional Factory class into something smarter. It will give you your mapper, be it an instance of the generic mapper with the right table name or something very customized.

Custom List Objects

Rdo queries always return a Horde\Rdo\List. This is a good default implementation and it makes common tasks easy. However, there are situations where you want your list of entities to be specific to an entity type or maybe a subclass of some base list class completely external to Rdo. Maybe you want to manipulate a list or merge results from two different queries.

Custom Entities
Sometimes the default entity implementation does not serve you well. There’s a range of things you would want.
– You want to inherit from a base class to attach behavior to your data. So you attach an interface and a trait to that foreign class to make it Rdo aware.
– You want to implement your own behaviour altogether
– You want Rdo to implement a proper repository with strongly encapsulated, less chatty domain objects. Rdo should provide a mechanism to produce those objects for you rather than having you cast or wrap Rdo\Base objects into your actual models. But it should not force you to think about such concepts before you really need them.

NoSql backends
Remember the name Rampage Data Objects? Rdo is mostly about mapping data to objects. It’s not about autogenerating the smartest SQL for the most obscure use cases. But once you have your prototype version ready, your first feedback comes in, you think about new features – and suddenly you want to support a new backend for some of your domain objects. Be it a NoSql database, a key/value store, a limited scope within a directory like LDAP or even a REST service. In a traditional horde application, you would wrap Rdo into a Driver called Driver/Rdo or Driver/Sql and implement a different backend. But what if you do not want to flip all your data to the new backend? Only the shopping list should go to the nosql backend but not the customers or the product inventory? You end up implementing individual drivers with individual backends. But you used Rdo’s relations feature … things get messy.

To achieve these capabilities, I want to make the Mapper less dominant. The formerly optional Factory gets promoted to take care of managing the right mappers, entities, backends, list types. This is what these interfaces are for. Mappers should mostly take care of mapping between an object class and a plain data format. Currently the mapper and query do too much, tightly coupled with the single mandatory list type. This will change.

Rampage Data Objects provides out of the box defaults for easy and common use cases. It gets you started really quick. We will add the capabilities needed when your application is maturing and your use cases get more demanding. This will be fun.

Backward Compatibility Breaks
The Horde\Rdo\Base* classes and their return types will be your best bet for backward compatibility. If you don’t try to use entities and mappers for side effects, you will be very safe. Factory’s constructor will change very soon. Factory should best be created from a Dependency Injector. Mappers should be created from Factory.

You should not rely on mappers exposing adapter or factory for creating other objects. Also, trying to manipulate sql session modes or transactions through Rdo’s adapter is not a good idea.