Rdo: Persistence is not your model

Remember that post on how your backend might betray you? You can store your Turba address book in an LDAP tree, but if the address book is manipulated from the LDAP side, your CardDAV sync may be ignorant of the change. The bottom line: you cannot trust the backend. Nor should the persistence model govern your application's internal model.

The development team at B1 works with a lot of inherited code from different areas and eras of FOSS and closed-source development. We see all styles of code, and we have made all sorts of bad micro-decisions ourselves over the years. One particularly hard question that comes up time and again is what the “true” model of data in an application is. Developers who come from traditional, monolithic web applications may say the true model is what is in the (relational) database and point out that it will closely influence the application's internal model. Developers with a background in APIs and distributed apps will tend to think the messages going in and out of a service are the main thing: the application is modeled after the messages, and persistence is an afterthought. Then we have the school of DDD-trained developers who will say neither is right: the internal model should be mostly ignorant of persistence, and the external API is just another type of persistence.

Let’s explore the benefits and trade-offs of these options.

DB-centric applications

In a DB-centric application, your entities closely represent rows of data in DB tables. If you use ActiveRecord- or DataMapper-style ORMs, you will often use the ORM entities as your business objects in the application. This works fairly well for CRUD-style applications where most, if not all, fields in the database are editable form fields on screen.

If your application does little beyond storing, retrieving, updating and deleting “records” of whatever type, you will have a straightforward development experience. Some frameworks can autogenerate create/view/edit forms and searchable lists (HTML tables) from your DB schema. Because adding attributes is the easiest way to add features to these types of applications, the object model tends to be rich in attributes and limited in relations and segregation.

For example, it would be common for a “customer” entity to include separate fields for the customer's street, city, country, email address… Some fields would be marked as mandatory and others as optional, each accessible via getters and setters or public properties. This becomes cumbersome for entities which have subtypes with different sets of attributes or behaviours: fields may be mandatory for one subtype, but optional or even forbidden for others. Attribute values are usually primitives (string, int) that map easily to DB column types. Developing these types of apps is really fast and concise as long as you do nothing fancy. As soon as you have any real amount of business logic and variance, it becomes cumbersome to maintain and hard to test.
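
A minimal sketch of such an anemic, DB-centric entity (a hypothetical class, not actual Horde code; one property per column):

<?php

class Customer
{
    public ?int $id = null;          // primary key
    public string $name = '';        // mandatory
    public string $street = '';      // mandatory
    public string $city = '';        // mandatory
    public string $country = '';     // mandatory
    public ?string $email = null;    // optional
    public ?string $vatId = null;    // mandatory for business customers,
                                     // meaningless for consumers: the
                                     // subtype problem in one line
}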

API-centric applications

A similar school of thought comes out of microservice development. In many cases, CRUD services can be prototyped from code autogenerated from a Swagger/OpenAPI definition file. Sometimes there is little left to do after this autogeneration, including the persistence to a relational database. If you use non-relational persistence like CouchDB, you may even get along with some variance and depth in the schema. Even a JSON or YAML file on disk gets you very far (at least in the prototype stage).

However, you will have scenarios where the same application object looks quite different between API requests. Users may have limited permissions to query details of an object, and different APIs with different use cases may need only a subset of an object's deeper details. Retrieving a list of customer IDs and names to populate a dropdown is different from being able to view customers' billing data or their contact persons' details.
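
One way to express this variance is a separate message shape per use case. A hypothetical sketch:

<?php

// Hypothetical per-use-case messages for the same underlying customer.
class CustomerOption          // just enough to populate a dropdown
{
    public function __construct(
        public readonly int $id,
        public readonly string $name,
    ) {}
}

class CustomerBillingView     // for users permitted to see billing data
{
    public function __construct(
        public readonly int $id,
        public readonly string $name,
        public readonly string $billingAddress,
        public readonly ?string $vatId,
    ) {}
}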

Domain-model-centric approach

If your application has to deal with structured aggregates and consistency rules, exhibits rich behaviour, services multiple types of backends or is otherwise complicated, this might be the right approach.

At its core, the (micro)application is fairly ignorant of persistence, export formats and the external API. The domain model deviates from the persistence model and from message representations. It has internal constraints and business rules. A domain entity is retrieved from the repository in a valid and complete state, and all transformations result in a valid state. Transformations usually happen through defined operations rather than through a set of getters and setters; setters may not even exist for many types of domain objects.
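
In code, that might look roughly like this (hypothetical classes; the invariants live in constructors and named operations):

<?php

// Hypothetical value object: immutable, validates itself on construction.
final class Address
{
    public function __construct(
        public readonly string $street,
        public readonly string $city,
        public readonly string $country,
    ) {
        if ($street === '' || $city === '' || $country === '') {
            throw new DomainException('Address must be complete.');
        }
    }
}

// Hypothetical domain entity: valid on construction, no setters.
final class Customer
{
    public function __construct(
        private readonly string $id,
        private string $name,
        private Address $address,
    ) {
        if ($name === '') {
            throw new DomainException('Customer needs a name.');
        }
    }

    // A defined operation instead of a pile of setters.
    public function relocate(Address $newAddress): void
    {
        $this->address = $newAddress;
    }
}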

Through this, you can trust your objects by contract and reduce validations in the actual operations. This comes at a price: in a domain-centric application, the number of classes may double or triple compared to the other approaches.

You have your business object with all the internal integrity aspects. You have another version of the same object which is used for I/O: the so-called Data Transfer Object or DTO. DTOs may be typed classes, untyped plain objects with arbitrary public attributes, or even just structured arrays; each option comes with its own drawbacks. These world-readable objects expose all their relevant data to some kind of I/O – views rendered on the server side, files to export, REST API messages coming in or going out.

A third version of your entity may be the persistence model(s), especially with relational databases. They may decompose your domain entity into multiple database tables, LDAP tree nodes or even multiple representations in NoSQL databases. The main effort is getting all these transformations right. This type of application may seem verbose and repetitive. On the other hand, a strong object model with hard internal constraints and few dependencies on unrelated aspects is very straightforward to unit test. This makes the approach suitable for large-scale applications.
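
A typed DTO is the most self-documenting of these variants. A hypothetical sketch:

<?php

// Hypothetical typed DTO: all public, no behaviour; safe to hand to
// server-side views, exporters or API serializers.
final class CustomerDto
{
    public function __construct(
        public readonly string $id,
        public readonly string $name,
        public readonly string $street,
        public readonly string $city,
        public readonly string $country,
    ) {}
}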

Compromises and practice

You do not have to commit to one approach with all its consequences. Full commitment might even make little sense in languages which do not really enforce type hints (like Python) or have no enforced concept of private and protected properties (again, like Python). In many cases, you can get very far by discipline alone. Restrict the use of setters to the persistence and I/O parts of your application. Add methods and behaviour to your persistence-layer entities and, even better, add an interface; type-hint your business logic against the interface. You can also make your ORM mapper double as a repository. You can then scale out to real domain objects step by step as needed: add conversion methods which turn a domain entity into one or many ORM entities and commit them to the DB, or, if your aggregate is not very suitable for this, hide your ORM mappers in a dedicated repository class. You can scale out as little or as much as your needs dictate.
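
For illustration, the discipline-based middle ground might look like this (the interface, fields and logic are hypothetical; only Horde_Rdo_Base is real):

<?php

// Hypothetical domain-facing interface.
interface Invoice
{
    public function total(): float;
}

// The ORM entity doubles as the business object by implementing it.
// Rdo exposes DB columns ('net', 'tax') as magic properties.
class InvoiceRdo extends Horde_Rdo_Base implements Invoice
{
    public function total(): float
    {
        return (float) $this->net + (float) $this->tax;
    }
}

// Business logic is hinted against the interface, not the ORM class.
function buildReminder(Invoice $invoice): string
{
    return sprintf('Please pay %.2f EUR.', $invoice->total());
}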

On the other hand, you may abuse your ORM entities to populate views or data formats like XML, JSON or YAML.

Why we move away from putting logic into Rdo entities

Rdo is Rampage Data Objects, Horde's minimalistic ORM solution. We have come a long way using the tricks described above to make Horde_Rdo_Mapper classes work as repository implementations and Horde_Rdo_Base entities as business objects. However, we run into more and more situations where this is not appropriate. Using and lazy-loading relations to child objects is fine in the read use case, but persisting changes to such structures becomes tedious and error-prone. Any logic tightly coupled to Rdo objects is also hard to unit test. Rdo-centric development does not mix well with the Shares library or with handling users and identities. As Rdo entities carry around references to the mapper and the database driver, they tend to bloat debug output. More fundamentally, our object models are evolving towards designs which do not really look like database tables at all. We still love Rdo and may resort to Rdo entities with interfaces here and there, especially in prototyping, but our development model has long shifted towards unittest-early or even unittest-driven.

Beware of the edges

Domain Driven Design purists may emphasize that domain models rarely have attribute setters, and some will even argue you should minimize your getters (tell, don't ask). However, at some point we need to get at raw attributes to compose messages for the edges, to answer API requests or to persist objects. There are multiple ways to do it, with different implications. Remember the Data Transfer Objects? I tend to have a method on my domain aggregate to “eject” a fully detailed, world-readable object. In many cases, it is even typed. This chatty object can then be handed to formatters which transform it into API messages or data files. So the operation is: construct the domain aggregate – failing if the data is invalid – and from the domain aggregate, construct the DTO. Use the DTO for the chatty jobs.
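
A hypothetical sketch of that flow, continuing the Customer and CustomerDto classes from above:

<?php

final class Customer
{
    // ... constructor and relocate() as sketched earlier ...

    // "Eject" a fully detailed, world-readable DTO.
    public function toDto(): CustomerDto
    {
        return new CustomerDto(
            $this->id,
            $this->name,
            $this->address->street,
            $this->address->city,
            $this->address->country,
        );
    }
}

// Usage: construct the aggregate (fails on invalid data), then go chatty.
$dto = $customerRepository->find($customerId)->toDto(); // hypothetical repository
echo json_encode($dto); // public properties serialize cleanly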

But how about the other way around? Messages from the outside can be malformed in many ways. They may be tampered with, they may miss details by design. If I exported an object to a user’s limited API and he sends an update, how to handle that? If an API user does not even know advanced attributes that are subject to other views and use cases, how would he create new entities?

I tend to have a method on the repository which accepts these messages and tries to turn them into domain objects. If the partial message contains some ID field or a sufficient set of data to form a unique key, I will first ask the backend for the original details of the object and then apply the message details – failing if this violates the constraints of my aggregate. If no previous version exists in the backend, missing information is filled in from defaults and generators, including a unique ID of some sort. The message is applied on top.
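
A hypothetical sketch of such a repository method, reusing the classes from the earlier sketches:

<?php

final class CustomerRepository
{
    // Partial, untrusted message in; valid aggregate (or exception) out.
    public function fromMessage(array $message): Customer
    {
        $existing = isset($message['id']) ? $this->findById($message['id']) : null;

        // Start from the stored state, or from defaults and generators,
        // then apply the incoming message on top.
        $base = $existing
            ? get_object_vars($existing->toDto())
            : ['id' => bin2hex(random_bytes(8))]; // generated unique id
        $data = array_merge($base, $message);

        // The aggregate's constructor throws if constraints are violated.
        return new Customer(
            $data['id'],
            $data['name'] ?? '',
            new Address($data['street'] ?? '', $data['city'] ?? '', $data['country'] ?? '')
        );
    }

    private function findById(string $id): ?Customer
    {
        return null; // backend lookup elided in this sketch
    }
}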

This approach may be expensive. I am creating full domain objects just to render database content into another format which may not even contain most of the retrieved data. On the other hand, if I know an object exists in a relational DB, I could use some cheap SQL to apply partial changes without ever constructing the full model.
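
Such a shortcut might be as simple as the following (hypothetical table and columns, using PDO):

<?php

// Straight-to-persistence partial update: no domain object is built,
// so none of the aggregate's constraints are checked.
function applyPartialUpdate(PDO $db, string $id, array $changes): void
{
    $allowed = ['name', 'street', 'city', 'country']; // whitelist of columns
    $changes = array_intersect_key($changes, array_flip($allowed));
    if (!$changes) {
        return;
    }
    $set = implode(', ', array_map(fn ($col) => "$col = :$col", array_keys($changes)));
    $stmt = $db->prepare("UPDATE customers SET $set WHERE id = :id");
    $stmt->execute($changes + ['id' => $id]);
}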

Creating straight-to-output repositories with optimized queries is a trade-off: they mean an additional maintenance burden whenever the model changes and additional parts to cover with tests. I only do this when performance actually becomes an issue, not by default.

Creating straight-to-persistence methods for partial messages is even less desirable, as it circumvents the integrity checks at code level. However, sometimes the message itself is the artifact to persist – to be processed later by a queue or some other asynchronous process.
