bookmark_borderModernizing horde/text_diff

If you ever read a github pull request or similar extension proposal, you will likely have seen side by side comparisons of the original and the changed file. You may also have seen some text format that highlights only differences and a little context but hides the unchanged rest of the file. Both of these formats are called Diff, named after the popular diff and patch utilities dating back to ancient Unix times. The git diff command does something very similar. The horde/text_diff library and its ancestor, the pear/text_diff library, are tools to generate and format such difference information for different usage scenarios.

Apart from Horde’s internal usage in its repository viewer, horde/chora, and its wiki software, horde/wicked, the tool is also used by external parties. WebSVN maintainer Michael O. approached me because he wanted to use a PHP 8.2 ready version of horde/text_diff to substitute an older component which did not do the job. Michael has been very helpful in getting me started, pointing me to some issues to solve and also providing his own solutions in some parts. The result is a conservative update of horde/text_diff that will run in the upcoming versions of PHP without causing any trouble. But this is only where I started.

Breaking bad habits

A closer look at the internal structure of the library showed that it deserved a major overhaul. The solution was to refrain from a verbatim upgrade to namespaces and the likes but to actually change some things. This meant breaking backwards compatibility. I go to great lengths to keep a conservative drop-in version of everything I touch in the lib/ folder. Sometimes it is just an interface or wrapper, sometimes the new and old code do not really share a lot.

I began with adding type hints to most methods. Targetting PHP 8.1+ for the src/ path allows to use union types and intersection types. A lot of knowledge hidden in phpdoc comments is moved into actual code and makes it more robust.

Exploring the code for base classes and interfaces, I noticed that some things I did not like.

Method signatures did not add up

Some method signatures did not add up. Depending on the type of Diff Engine, the diff() method would take different types of arguments. The interface was mixing the specifics of how the diff engine is set up with the command to create a list of operations objects. Loading the engine is now separated from running it. The running method is now always called in the same way.

Internal dependency creation

The Diff, Threeway Diff and Mapped Diff utilities all created their diff engine internally. To do this, they needed a very flexible constructor that allowed passing whatever is needed to set up the actual engine. That was bad enough but they also did it in different ways. The Differs’ constructor now only accepted a pre-constructed engine. For convenience, I created a factory which would take over the responsibilities originally assigned to the differs’ constructor: Building a differ from input and if no specific differ was chosen, selecting one by some priority logic. In the end it turned out that the Differ does not need the engine at all but rather needs the product of the Engine: A set of operations to transform document A to document B. Born is the OperationList class. I did not want to just pass an array. I added a small static method as a named constructor. It frees the actual constructor of too many responsibilities and allows to keep the interface clean and strict.

More explicit type juggling

Creation if diffs contains some interesting math. The algorithms use a lot of short variable names and operations that make sense if you know the underlying theory but otherwise look like garbage. I added some explicit conversions between string and integer and made some changes to ensure a number zero or an empty string is not mistaken for a “false” or “null” value which would have another meaning.
Overall it is now much easier for static analyzers to spot any issues.

Dual stacks have a price tag

Essentially maintaining two different sets of the library comes with some cost. One must ensure that unit tests targeted at the newer platform are not run when testing for compatibility with the older platform. The conservative lib/ is ready to run PHP 7.4 code but the modern version in src/ must be transpiled to be run in PHP 7.4 – I do that on release but still it is another aspect that needs minding. As I also use automated tools for some upgrade tasks, I need to ensure I do not upgrade the lib/ path. The price is worth it as I cannot convert the code base at once and I want to provide a good development experience to all who are caught in between maintaining an older release or creating new code. I am in that spot myself. Essentially it allows me to run two conflicting major versions of some critical libraries and pick the right one for different sub systems. The need will go away as code is gradually migrating towards the newer implementations. At some point a next major version will drop the conservative path. Anybody interested is free to maintain the older major version and keep using it.

Upcoming work

I consider the external interface of the newer horde/text_diff implementation fairly stable by now. Internally, however, there is a lot of room for improvement. Some functionality should move out of the base classes and into separate traits – which the base classes will use. Some getters should be added and used, preparing to move some public variables to internal state in a next major release. The new OperationList gets unloaded to plain arrays in several places – It needs to learn some tricks without degrading into a glorified array. None of this should stop early adopters from using the new code base. None of this is supposed to break any user code.

Out of scope for now

There are some items which I decided to postpone for now. One thing which bothers me is the amount of dependencies. While a dependency on horde/exception makes sense, it pulls in horde/translation for no good reason. Horde\Util is pulled in but really only used in two places: A horde/string call which could be reduced to a direct call to an internal function and one call to a helper for handling temporary files. That helper should maybe live in its own library, nicely decoupled from unrelated utilities. There was a reason why they were packed together but it is no longer relevant.

Also, some functionality is missing in the Xdiff-based engine. Most distributions do not even offer php-xdiff, including my own development platform. I will add that feature once I get it into CI and into the development setup. I do not want to delay other items to do that right now. Patches welcome 🙂

bookmark_borderCode Generators: Bad, worse & ugly

Code generators have been invented and forgotten at least four times in software history. They have an appeal to developers like the sun to Daedalus’ son. Let’s not be Icarus, let’s keep them generators at a distance and watch them carefully.

Whenever a language, framework or paradigm forces developers to do the same thing over and over and over and over again they will try to get rid of that repetition. One approach is to automate writing code. It is not pretty but it saves time to concentrate on more interesting and useful things. Seasoned and reasonable software developers have resorted to that solution and many inexperienced have followed. Outside very narrow use cases I think generated code should better not exist at all.

Valid use case

Generated code and other kinds of boilerplate code are valid where avoiding them is not practical. This is often true for entry points. Depending on language, it might not be trivial for a the first piece of running code to find

  • its configuration
  • its collaborators, base classes, dependencies
  • useful data from previous runs

I have written a long, long piece on two-stage autoloaders and other two-stage bootstrapping topics and I keep rewriting it, breaking out separate topics. It is still unreleased for a reason. Any two-stage process that splits automated detection or definition of artifacts from the production run that uses them is essentially code generation. Avoiding it might be possible but impractical. Some level of repetition cannot be avoided at all and is best automated.

Another valid use case is generating code from code or transpiling. Nothing is wrong with that.

Unfortunate Use Cases

There are other use cases that should be avoided. Your framework follows convention over configuration so making magic work requires having some artifacts in the right places. Even if they have no natural place in your specific solution they are needed for technical reasons so you copy/paste or auto-generate the minimum sufficient implementation and make it fit. This is something to look for. Often there are ways around it. Another case is limits of the underlying language. You evolved from using magic properties and methods to implementing type safe, explicit equivalents but now you have to re-invent the type specific list or container type and you automate it. Bonus points if your ORM tool requires it. If your language does not support generics or another templating method, you are stuck between repetitive, explicit code and weakly typed magic. You end up using a code generator. Hopefully at one point somebody is annoyed enough and ventures to bring generics into your language. That would be the better solution but it is likely out of scope for your day to day work.

Stinking unnecessary use cases

Beyond that you are likely in the land of fluff where things just happen while they can and lines of code are generated just because it is customary. This is a foolish thing best avoided. Granted, automating code is better than hand-writing it. It does however not mean the code should exist at all. If you have no specific reason to repeat code, it is likely a design smell. This is not new, the Cunningham wiki had this thought a decade or more ago. Likely they were not even the first to recognize it. Refactoring, abstraction, configuring the specifics can help reduce the necessity for repetitive code.

My programming tools were full of wizards. Little dialog boxes waiting for me to click “Next” and “Next” and “Finish.” Click and drag and shazzam! — thousands of lines of working code. No need to get into the “hassle” of remembering the language. No need to even learn it. It is a powerful siren-song lure: You can make your program do all these wonderful and complicated things, and you don’t really need to understand.

Ellen Ullman: The dumbing-down of programming, Salon, May 1998 https://www.salon.com/1998/05/12/feature_321/

Let us take the input to a code generator and make it the input to abstracted, ready to run code instead. We will know when it is not practical, not performant or not possible. Then code generation is a blessed thing. Otherwise it is a sin.

bookmark_borderTools to build better Tools faster

Behind every lofty architecture mantra there is mundane execution. This is best left to tools and I don’t mean anybody in particular but programs that help us make better programs. It basically goes like this: Build tool. Use tool. Build better tool. Build tool to build better tool. Build better tool to build better tool faster. And so on. Implementing this in practice can be quite boring but the alternative is to do boring things again and again and again and that’s enough already. So let’s see.

Maintaining 100+ libraries and programs involves doing a few things over and over again. Automating these seems natural but requires some thought. Developers want to spend their time in interesting and useful ways. Querying and manipulating git repositories is repetitive. Updating a changelog file with a select subset of messages also present in the git commits is repetitive. Rewriting project metadata and updating CI jobs for new PHPUnit and PHP versions or base operating systems is repetitive and requires no brains at all, why should I do this 100+ times?

Off the shelf tools

Using tools that already exist and are maintained by other parties is a no-brainer. Which tools can help?

  • PHPUnit helps us spot and eliminate regressions before any user is affected. The tool itself is maintained by Sebastian Bergmann but writing and upgrading the actual test code is a chore.
  • PHPStan or Psalm – I prefer PHPStan – are static analyzers which help developers spot places where signatures, types and assumptions don’t add up. To get the best out of it, either phpdoc annotations or parameter and return types must be added. No tests to write, which is good – but PHPStan is organized in progressively strict levels and each library needs to be checked against the level it is supposed to pass. Micromanaging that is boring as hell, tools are needed.
  • php-cs-fixer is developed by friendsofphp – it is a basic code manipulation tool which helps anywhere from adhering to PSR-12 or PER-1 to automatically upgrading from array() to [] notation. Configuring this beast is easy but ensuring the most current rules are used in every project is another burden.
  • rector is another tool that transpiles code either up or down to select standards. It will move implicit knowledge or phpdoc data into actual code or do the different thing. It will choose older ways to express something over new ones or vice versa. Configuring it to do only what is helpful is quite a challenge. Also ensuring the most recent config is used is just boring and cumbersome. Tool needed.

Homegrown tools

The horde project has some home grown tools that can help but need development themselves.

  • horde/git-tools by Michael Rubinsky used to be the way to assemble a bleeding edge developer copy from zillions of github repos. In a modern composer based installation this tool is less useful but it contains a lot of interesting capabilities that should be factored out
  • horde/components can generate composer and pear metadata from a self-defined yaml format. It can create tar archives from repositories, implements a basic workflow engine for release and quality check tasks and does some other things. Its internal architecture is rooted in history and while some of its functionality seems out of touch with 2022, many other parts deserve expanding or factoring out into modern self-contained libraries for reuse.
  • horde/hordectl is a command line tool to interact with a Horde installation. Inject users and passwords, configure permissions, groups or app-specific resources from yaml files and defaults. It needs some upgrading, it could do so much more to facilitate proof of concept, showcase or CI installations.
  • horde/horde-installer-plugin is a plugin for composer that helps bootstrap a horde installation and its web-readable part. Much of its code would best be moved out to separate libraries.

Building blocks

Existing and new libraries should inherit functionality moved out from existing tools or newly created

  • horde/vcs is a version control library. Its main origin are the horde/chora application and the installation/development tools. Recently I began to move or re-implement code from git-tools and horde/components into this library. I am less interested in the rcs, cvs and svn parts. The original library followed an approach abstracting the differences between git, cvs, svn & friends. This limits its usefulness. I see how it facilitates creating an application that consumes and shows code from these. Still, there should be a lower level of abstraction that provides the unique capabilities of git in a programmatic fashion. This is one thing I currently work on
  • horde/rampage used to be a dead end but I am reusing the library for deployment and introspection related code factored out from other tools.
  • horde/filesystem is a new library, focused on object-oriented filesystem traversal and manipulation. Still very immature but I hope to turn it into a standalone and reusable tool.
  • horde/registry is the stub of an upcoming redesign of the core bootstrapping process. No more globals, reliance on PSR-11 DI containers and PSR-4 autoloading – this registry will do less than its ancestors yet be much more powerful and easy to use. This is still much work.
  • horde/cli_modular is a tool to write extensible, pluggable commandline interfaces. It is used by horde/git-tools, horde/components, horde/hordectl and a few others. In the current upgrade cycle some redesign is necessary to make it viable for modern environments and free it from problems already solved by autoloaders or DI containers.

So much work to do but devoting some time to better tools is better than doing mindless conversions of existing code over and over.

bookmark_borderWhy you should develop for latest, greatest

Developers sometimes choose not to use the latest available language features that would be appropriate to tackle a problem for fear of alienating users and collaborators. This is a bad habit and we should stop doing that. Part of the solution are transpilers. What are transpilers, where are they used and what is the benefit? Why should we consider transpiling all our code?

I cut this piece from an upcoming article that is way too long anyway. I made this new article by reusing and reshaping existing text for a new audience and frame. You are reading a new text that first was built as a part of another text. – Yes. This was transpiling: Rephrasing an input, including externally supplied, derivative or implicit facts about it to an output that generally expresses the same. Excuse me, what? Let me go into some details.

Transpiling: Saying the same but different

In software, transpilers are also known as source to source compilers. They take in a program written in one language and write out a roughly equivalent program for another language. The other language may be another version or dialect of the input language or something entirely different. Don’t be too critical about the words: transpilers are just like all other compilers. Source code is machine-intelligible, otherwise it could not be compiled. Machine code is intelligible by humans, at least in principle.

Preprocessors are transpilers

A preprocessor is essentially a transpiler even if it does not interpret the program itself. The C language preprocessor is a mighty tool. It allows you to write placeholders that will be exchanged for code before the actual C compiler touches it. These placeholders may even have parameters or make the program include more code only if needed. Concatenating many source files into one and minifying these by stripping unnecessary whitespace can also be seen as a primitive form of transpiling.

Coding Style Fixers are transpilers

Automatic tools that edit your source code are transpilers. They might only exchange tabs for four space characters or make sure your curly braces are always in the same place or they may do much more involved stuff. For example php-cs-fixer transforms your technically correct code written in plain PHP into technically correct code in standards-conforming plain PHP. One such standard is PSR-2, later deprecated in favor of PSR-12 and PER-1 – these are all maintained by the PHP FIG. Software projects may define their own standards and configure tools to transpile existing code to conform to their evolving standards.

Compilers are transpilers

A compiler is a transpiler. It takes in the source code and builds a machine-executable artifact, the binary code. It might also build a byte code for some execution platform like Java’s JVM. It might build code for a relatively primitive intermediate language like CIL or a machine specific Assembly Language. Another compiler or an interpreter will be able to work with that to either run the software or turn it into a further optimized format. These transformations are potentially lossy.

Decompilers are transpilers

Earlier in life I used tools like SoftICE that would translate back from binary machine instructions to Assembly Language so that I could understand what exactly the machine is doing and make it do some unorthodox things. Compiling back from Machine Code to the machine-specific Assembly Language is technically possible and lossless but the result is not pretty.

Lost in translation

When humans rewrite a text for another target audience, they will remove remarks that are unintelligible or irrelevant to the new audience. They may also add things that were previously understood without saying or generally known in the former audience. Transpilers do the same. When they transpile for machine consumption, they remove what the machine has no interest in: Whitespace, comments, etc. They can also replace higher concept expressions by detailed instructions in lower concepts. Imagine I compiled a program from assembly language into binary machine code and then decompiled back to assembly language. Is it still the same program? Yes and no. It is still able to compile back into the same machine program. It does not look like the program I originally wrote. Any macro with a meaninful name was replaced by the equivalent step by step instructions without any guidance what their intention is. Any comments I wrote to help me or others navigate and reason about the code are lost in translation. The same is true anytime when we translate from a more expressive higher concept to a lower concept. Any implicit defaults I did not express now show up as deliberate and explicit artifacts or the other way around, depending on tool settings.

Lost in Translation but with Humans

You may know that from machine translated text. Put any non trivial text into a machine translator, translate it from English to Russian to Chinese to German and then back to English. In the best case, it still expresses the core concept. In the worst case it is complete garbage and misleading.

Another such thing are Controlled Languages like Simple English, Français fondamental, Leichte Sprache, etc. They use a reduced syntax with less options and variations and a smaller selection of words. Some like Aviation English or Seaspeak also try to reduce chance for fatal ambiguity or mishearing.

These reduced languages are supposedly helpful for those who cannot read very well, are still learning the language or have a learning disability. They may also enable speakers of a closely related foreign language to understand a text and they generally cater to machine translation. For those who easily navigate the full blown syntax and vocabulary and can cope with ambiguity and pun, simplified language can be repetitive, boring and an unnecessary burden to actual communication. Choosing a phrase or well known roughly fitting word over a less used but more precise word is an intellectual effort. Reading a known specific word can be easier on the brain than constructing a meaning from a group of more common words. Speaking to an expert in a language deliberately evading technical terms may have an unintended subtext. Speaking to a layman in lawyerese or technobabble might not only make it hard for them to understand you but also hard for them to like you. Readers will leave if I make this section any longer.

Useful application

Now that everybody is bored enough, let’s see why it is useful and how good it is.

Upgrading Code to newer language versions

You can use a transpiler to upgrade code to a newer version of the language. Why would you want that? Languages evolve. New features are added that allow you to write less and mean the same. Old expressions become ambiguous by new syntax and features. Keywords can be reserved that previously weren’t. Old features become deprecated and will finally stop working in later versions. A transpiler can rewrite your code in a way that it will run in the current and next version of a language. It can also move meta information from comments or annotations into actual language code.

    // Before
    /**
     * @readonly
     * @access private
     * @var FluffProviderInterface Tool that adds bovine output
     */
    var $fluffProvider;
    /**
     * Constructs an example
     * 
     * @access public
     *
     * @param IllustrableTopicInterface A topic to explain by example
     * @param bool $padWithFluff        Whether to make it longer than needed
     * @param int  $targetLength        How long to make the article
     *
     * @return string The Article
     */
    function constructExample($topic, $padWithFluff=true, $targetLength=3000)

Now that’s what I call a contrived example. Code might look like this if it was originally written in PHP 4 and later enhanced over the years, only using new expressiveness where needed. While it technically runs, it is not how we would possibly write it today.

    // After
    /**
     * @var FluffProviderInterface Tool that adds bovine output
     */
    private readonly FluffProviderInterface $fluffProvider;
    /**
     * Constructs an example
     * 
     * @param IllustrableTopicInterface A topic to explain by example
     * @param bool $padWithFluff        Whether to make it longer than needed
     * @param int  $targetLength        How long to make the article
     *
     * @return string The Article
     */
    public function constructExample(
        IllustrableTopicInterface $topic,
        bool $padWithFluff=true,
        int $targetLength=3000
    ): string

That could be the output of a transpiler. It takes meta information from controlled language in the comments and uses the advanced grammar of the improved PHP language to express them.
In other words, the upgraded code has turned instructions for the human or for external tools into instructions that the language can actually enforce at runtime. Before it helped you understand what to put in and what to expect out. Now it forbids you from putting in the wrong things and errors if the code tries to give back anything but text.

It may drop comments that are already expressed in the actual code. Some project standards suggest to drop @param and @return altogether to make the code more consise to read. I am a little conservative on this topic. A documentation block may be removed if it does not contain any guidance beyond the code. There is no need to rephrase “this is the constructor” or “The parameter of type integer with name $targetLength tells you how long it should be”. But sometimes things deserve explaining and sometimes the type annotations exceed what actual the language expresses. Intersection types are PHP 8.1+. PHP 8.2 can express “return this class or false but not true” while before the language only allowed “This class or a boolean (either true or false)”. Annotations can be read by tools to work with your code. As demonstrated, a transpiler can use them to rewrite your code to a more robust form. Static analyzers can detect type mismatch that can lead to all sorts of bugs and misbehaviours. Documentation generators can strip away the actual code and transform the comments and structural information into something you can easily navigate and reason about. Code including high concept and documentation is first and foremost for humans. Adapting it for machines often means dumbing it down.

Downgrading to an older platform or language version

Code can be transformed in the other direction, stripping or replacing advanced expressiveness to make the older runtime understand the code. This is very popular with Frontend Developers: Both Javascript and CSS are usually no longer shipped the way they are written. A variety of type safe and advanced languages exist that are not even intended to be run in their source form but compiled down to a more or less modern standard of JavaScript, then minified to the smallest valid representation. Possibly variable and function identifiers are changed to avoid them colliding between unrelated software loaded into the same browser. In other languages, we are used to develop against a target baseline and only use the features it provides, plus annotations for concepts it does not support. We choose the baseline by deciding on the lowest platform we want to or have to support. This is jolly insane and I mean it in a nice way.

Imagine we create a book for small children. We will first create a compelling story, lovely characters and possibly some educational tangent using our words and our thoughts, the level of abstraction we are fluent in and the tools we can handle. We finally take care to adapt wording, level of detail and difficult concepts to fit the desired product.
We don’t write to the agent, the publisher or the printing house in baby english. So why should we use anything less than our own development environment supports? It is not healthy. Outside very special situations or for the joy of it, we generally don’t work with one hand tied to the back, using antiquated tools and following outmoded standards.

This Catapult resembles the state of the art centuries ago. Shooting it is fun. For any other purpose it is the wrong tool.
This Catapult resembles the state of the art centuries ago. Shooting it is fun. For any other purpose it is the wrong tool.

If we cater to the lowest assumable set of capabilities at development time, we limit ourselves in a costly way. We cannot benefit from the latest and most convenient, i.e. effortless and reliable set of tools. We are slower than we could be, we will make more mistakes and it will exhaust us more than needed.

Provided our production pipeline from the development laptop or container to the CI are able to work with the latest tools, we can use them.

Deliver using a transpiler

The source branch should always target your development baseline, tools as modern as you can come by. Delivery artifacts, i.e. released versions, should deviate from the source distribution anyway:

  • Why should you ship build time dependencies with a release?
  • Why should you ship CI recipes or linter configurations with a release?
  • Depending on circumstances, shipping the unit tests might be useful or waste.
  • You would not normally ship your .git directory, would you?

Adding a transpiler step is just another item, just another reason. Transpiling to your lowest supported baseline is not really different from zipping a file, editing a version string or running a test suite to abort faulty builds before they ship. But still, it is not perfect. The shipped code will run on the oldest supported environment but it will miss many runtime benefits of newer versions. This is especially true if your library is a build time dependency of another project. In the best scenario, a build for a fairly recent but reasonable platform expectation exists and another build for an well-chosen older target exists. Both need to run through the test suite and ideally the older build will pass the test suite both when actually run on the old platform and when run on an upgraded platform. There are some details, edge cases and precautions needed to make this feasible and reliable. This will be detailed in an upcoming article which just shrank by a good portion.

bookmark_borderHorde/Yaml: Graceful degradation

Tonight’s work was polishing maintaina’s version of horde/yaml. The test suite and the CI jobs now run successfully both on PHP 7.4 and PHP 8.1. The usual bits of upgrading, you might say.

However, I had my eye on horde/yaml for a reason. I wanted to use it as part of my improvements to the horde composer plugin. Composer famously has been rejecting reading any yaml files for roughly a decade so I need to roll my own yaml reader if I want to deal with horde’s changelog, project definition and a few other files. I wanted to keep the footprint small though, not install half a framework along with the installer utility.

You never walk alone – There is no singular in horde

The library used to pull in quite a zoo of horde friends and I wondered why exactly. The answer was quite surprising. There is no singular in horde. Almost none of the packages can be installed without at least one dependency. In detail, horde/yaml pulled in horde/util because it used exactly one static method in exactly one place. It turned out while that method is powerful and has many use cases, it was used in a way that resulted in a very simple call to a PHP builtin function. I decided whenever the library is not around I will directly call that function and lose whatever benefits the other library might grant over this. This pattern is called graceful degradation. If a feature is missing, deliver the next best available alternative rather than just give up and fail. The util library kept installing although the yaml parser no longer needed it. The parser still depended on the horde/exception package which in turn depended on horde/translation and a few other helpers. Finally horde/test also depended on horde/util. It was time to allow a way out. While all of these are installed in any horde centric use case, anybody who wants only a neat little yaml parser would be very unhappy about that dependency crowd.

Alternative exceptions

The library already used native PHP exceptions in many places but wrapped Horde exceptions for some more intricate cases. While this is all desirable, we can also do without it. If the horde/exception package is available, it will be used. Otherwise one of the builtin exceptions is raised instead. This required to update the test suite to make it run correctly either way. But what is the point if the test suite will install horde/util anyway?

Running tests without horde/test unless it is available

I noticed none of the tests really depended on horde/test functionality. Only some glue code for utilities like the horde/test runner or horde/components really did anything useful. I decided to change the bootstrap code so that it would not outright fail if horde/test was not around. Now the library can be tested by an external phpunit installation, phar or whatever. It does not even need a “composer install” run, only a “composer dump-autoload --dev” to build the autoloader file.

A standalone yaml parser

The final result is a horde/yaml that still provides all integrations when run together with its peer libraries but can be used as a standalone yaml parser if that is desirable. I hope this helps make the package more popular outside the horde context.

Lessons learned

Sometimes less is more. Factoring out aspects for reuse is good. Factoring out aspects into all-powerful utility libraries like “util”, “support” and the likes can glue an otherwise self contained piece of software together with too many other things. That makes them less attractive and harder to work with. Gracefully running nevertheless is one part. The other is redesigning said packages which cover too many aspects at once. This is a topic for another article in another night though.

bookmark_borderPHP: Tentative Return Types

PHP 8.1 has introduced tentative return types. This can make older code spit out warnings like mad.
Let’s examine what it means and how to deal with it.

PHP 8.1 Warnings that will become syntax errors by PHP 9

PHP 7.4 to PHP 8.1 have introduced a lot of parameter types and return types to builtin classes that previously did not have types in their signatures. This would make any class extending builtin classes or implementing builtin interface break for the new PHP versions if they did not have the return type specified and would create interesting breaks on older PHP versions.

Remember the Liskov Substitution Principle (LSP): Objects of a parent class can be replaced by objects of the child class. For this to work, several conditions must be met:

  • Return types must be covariant, meaning the same as the parent’s return type or a more specific sub type. If the parent class guarantees to return an iterable then the child class must guarantee an iterable or something more specific, i.e. an ArrayObject or a MyFooList (implements an iterable type).
  • Parameter types must be contravariant, meaning they must allow all parameters the parent would allow, and can possibly allow a wider set of inputs. The child class cannot un-allow anything the parent would accept.
  • Exceptions are often forgotten: Barbara Liskov‘s work implies that Exceptions thrown by a subtype must be the same type as exceptions of the parent type. This allows for child exceptions or wrapping unrelated exceptions into related types.
  • There are some more expectations on the behaviour and semantics of derived classes which usually are ignored by many novice and intermediate programmers and sadly also some senior architects.

Historically, PHP was very lax about any of these requirements. PHP 4 brought classes and some limited inheritance, PHP 5 brought private and protected methods and properties, a new type of constructor and some very limited type system for arrays and classes. PHP 7 and 8 brought union types, intersection types, return type declaration and primitive types (int, string) along with the strict mode. Each version introduced some more constraints on inheritance in the spirit of LSP and gave us the traits feature to keep us from abusing inheritance for language assisted copy/paste. Each version also came with some subtle exceptions from LSP rules to allow backward compatibility, at least for the time being.

In parallel to return types, a lot of internal classes have changed from returning bare PHP resources to actual classes. Library code usually hides these differences and can be upgraded to work with either, depending on which PHP version they run. However, libraries that extend internal classes rather than wrapping them are facing some issues.

PHP’s solution was to make the return type tentative. Extending classes are supposed to declare compatible return types. Incompatible return types are a syntax error just like in a normal user class. Missing return types, no declaration at all, however, are handled more gracefully. Before PHP 8.1, they were silently ignored. Starting in PHP 8.1 they still work as before, but emit a deprecation notice to PHP’s error output, usually a logfile or the systemd journal. Starting in PHP 9 they will be turned into regular syntax errors.

Why is this good?

Adding types to internal classes helps developers use return values more correctly. Modern editors and IDEs like Visual Studio Code or PhpStorm are aware of class signatures and can inform the users about the intended types just as they write the code. Static analysis tools recognize types and signatures as well as some special comments (phpdoc) and can give insight into more subtle edge cases. One such utility is PHPStan. All together they allow us to be more productive, write more robust code with less bugs of the trivial and not so trivial types. This frees us from being super smart on the technical level or hunting down inexplicable, hard to reproduce issues. We can use this saved time and effort to be smarter on the conceptual level: This is where features grow, this is where most performance is usually won and lost.

Why is this bad?

Change is inevitable. Change is usually for the better, even if we don’t see it at first. However, change brings maintenance burden. In the past, Linux distributions often shipped well-tested but old PHP versions to begin with and release cycles, especially in the enterprise environment, were quite long. Developers would have had to write code that would run on the most recent PHP as well as versions released many years ago. Administrators would frown upon developers who always wanted the latest, greatest versions for their silly PHP toys. Real men use Perl anyway. But this has changed a lot. Developers and administrators now coexist peacefully in DevOps teams, CI pipelines bundle OS components, PHP and the latest application code into container images. Containers are bundled into deployments and somebody out there on the internet consumes these bundles with a shell oneliner or a click in some UI and expects a whole zoo of software to start up and cooperate. Things are moving much faster now. The larger the code base you own, the more time you spend on technically boring conversion work. You can be lucky and leverage a lot of external code. The downside is you are now caught in the intersection between PHP’s release cycle and the external code developer’s release cycles – the more vendors the more components that must be kept in sync. PHP 9 is far away but the time window for these technical changes can be more narrow than you think. After all, you have to deliver features and keep up with subtle changes in the behaviour and API of databases, consumed external services, key/value stores and so on. Just keeping a larger piece of software running in a changing and diverse environment is actually hard work. Let’s look at the available options.

How to silence it – Without breaking PHP 5

You can leverage a new attribute introduced in PHP 8.1 – just add it to your code base right above the method. It signals to PHP that it should not emit a notice about the mismatch.

<?php
class Horde_Ancient_ArrayType implements ArrayAccess {
    /**
     * @return bool PHP 8.1 would require a bool return time 
     */
    #[\ReturnTypeWillChange]
    public function offsetExists(mixed $offset) {
        // Implementation here
    }
...
}

Older PHP that does not know this attribute would just read it as a comment. Hash style comments have been around for long and while most style guides avoid them, they are enabled in all modern PHP versions. This approach will work fine until PHP 9.

How to fix it properly – Be safe for upcoming PHP 9

The obvious way forward is to just change the signature of your extending class.

<?php
class Horde_Ancient_ArrayType implements ArrayAccess {
    public function offsetExists(mixed $offset): bool {
        // Implementation here
    }
...
}

The change itself is simple enough. If your class is part of a wider type hierarchy, you will need to update all downstream inheriting classes as well. If you like to, you can also reduce checking code on the receiving side that previously guarded against unexpected input or just satisfied your static analyzer.
Tools like rector can help you mastering such tedious upgrade work over a large code base though they require non-trivial time to properly configure them for your specific needs. There are experts out there who can do this for you if you like to hire professional services – but don’t ask me please.

<?php
...
$exists = isset($ancient['element1']);
// No longer necessary - never mind the silly example
if (!is_bool($exists)) {
    throw new Horde_Exception("Some issue or other");
} 

Doing nothing is OK – For now

In many situations, reacting at all is a choice and not doing anything is a sane alternative. As always, it depends. You are planning a major refactoring, replace larger parts of code with a new library or major revision? Your customer has signaled he might move away from the code base? Don’t invest.

My approach for the maintaina-com code base

The maintaina-com github organization holds a fork of the Horde groupware and framework. With over 100 libraries and applications to maintain, it is a good example. While end users likely won’t see the difference, the code base is adapted for modern PHP versions, more recent major versions of external libraries, databases, composer as an installer and autoloader. Newer bits of code support the PHP-FIG standards from PSR-3 Logging to PSR-18 HTTP Client. Older pieces show their age in design and implementation. Exactly the amount of change described above makes it hard to merge back changes into the official horde builds – this is an ongoing effort. Changes from upstream horde are integrated as soon as possible.

I approach signature upgrades and other such tasks by grouping code in three categories:

  • Traditional code lives in /lib and follows a coding convention largely founded on PHP 5.x idioms, PSR-0 autoloading, PSR-1/PSR-2 guidelines with some exceptions. This code is mostly unnamespaced, some of it traces back into PHP 4 times. Coverage with unit tests is mostly good for libraries and lacking for applications. Some of this is just wrapping more modern implementations for consumption by older code, hiding incompatible improvements. This is where I adopt attributes when upstream does or when I happen to touch code but I make no active effort.
  • More modern code in /src follows PSR-4 autoloading, namespaces, PSR-12 coding standards, modern signatures and features to an increasing degree. This generally MUST run on PHP 7.4 and SHOULD run on recent PHP releases. This is where I actively pursue forward compatibility. Unit tests usually get a facelift to these standards and PHPStan coverage in a systematic fashion.
  • Glue code, utility code and interfaces are touched in a pragmatic fashion. Major rewrites come with updated standards and approaches, minor updates mostly ensure compatibility with the ever changing ecosystem.

If you maintain a large code base, you are likely know your own tradeoffs, the efforts you keep postponing in favour of more interesting or more urgent work until you have to. Your strategy might be different, porting everything to a certain baseline standard before approaching the next angle maybe. There is no right or wrong as long as it works for you.

bookmark_borderHorde/Skeleton: Modernized

Over the last months, a lot of new technologies have entered the Horde ecosystem. It was long overdue to modernize the Skeleton example app.

No more un-namespaced code

Skeleton has completely migrated to PSR-4 namespaced code in /src/ rather than traditional, unnamespaced code in /lib/. This includes framework integration classes like Application, Api, Ajax\Application but also the portal blocks.

All application internal classes of skeleton are now served via the Composer Autoloader and follow the PSR-4 standard. This requires the very latest releases of horde/horde (6.0.0alpha6) and horde/core (v3.0.0alpha9) to work correctly. The only exception is the database schema migration which intentionally does not follow regular autoloading conventions.

No more index.php

Client pages that traditionally called into the application’s internal classes have been removed and replaced by routes. This includes the index.php file. A default route handles the skeleton/ and skeleton/index.php cases. The contents of the example UI have not been changed. They are only implemented differently

Using horde/routes and horde/http_server

The new skeleton uses horde/routes to describe available routes in the app and horde/http_server to implement the controller classes behind these routes. The code comes with extensive documentation comments. horde/http_server implements the PSR-15: HTTP Server Request Handlers standard used by most modern PHP Frameworks.

Inter-App API

Skeleton now includes an example of the inter-app API implemented through the registry. The same interface is used as the basis for json-rpc and XMLRPC APIs.

Full Backward Compatibility

The changed libraries still work with unnamespaced or partially converted apps. Implementers can work according to their own schedule. However, there are some rules to keep in mind:

  • The namespaced versions of Application, Api, Test and Ajax\Application take precedence over their unnamespaced counterparts. Implementers can leave the unnamespaced code as-is or turn it into wrappers like
    lib/Application.php
    
    <?php
    /**
     * Backward compatibility wrapper.
     *
     * @deprecated Call into Horde\Skeleton\Application directly instead. 
     */ 
    Skeleton_Application extends Horde\Skeleton\Application {}
    
    
    
  • Portal Blocks should NOT be wrapped or duplicated. They should exist as either namespaced or unnamespaced versions. You can have both types in the same application, but if you have a wrapped or copied block, it will show up twice.
  • Ajax application handler classes only have one integration point, /{lib, src}/Ajax/Application.php. You can upgrade them any time, just change the reference in the Application class. The Ajax Application class itself can be duplicated or wrapped, but the namespaced version will always be chosen. The wrapper would be just a transitional backwards compatibility measure so your application still works with earlier alpha versions of the framework.

Not in this release

The current version of skeleton still leaves room for improvement. Not all external libraries used in the code base are already namespaced or otherwise modernized. The PageOutput helper is still emitting output which needs to be caught and redirected to the output stream as part of the PSR-7 HTTP Response object. Future versions should use a stream-ready implementation to reduce boilerplate. Also, there should be some ready-made controllers for standard cases like UI output or REST.

References

bookmark_borderHorde’s HTTP component goes PSR

This weekend, I gave the horde/http component a some major redesign. See how things escalated. Oh my.
My minimum goals were namespacing, PSR-4 (Revised Autoloading Standard) and some minor, schematic adjustments. The final result is quite different. I ended up implementing PSR-7 (HTTP Message Interface), PSR-17 (HTTP Factories) and PSR-18 (HTTP Client). The code largely complies with PSR-12 (Extended Coding Style Guide) and thus, implicitly, PSR-1 (Basic Coding Standard). I am sure you will find some deviations and issues, so I welcome any Pull Requests against my repo. You will find the new code in /src/. The original, incompatible implementation is untouched and resides in /lib/. They can coexist as they are so different (namespaced vs unnamespaced, among others).

This is not a total rewrite. I could leverage most of the existing code base with some tweaks. This work would not exist without the foundation by Chuck Hagenbuch and all the contributions from the different Horde maintainers over the years. You will also notice similarities to other php PSR-7/18 implementations out there. I checked out Guzzle, httplug/php-http and some others. It was a great learning experience and I will not pretend I am not influenced by it.

As with all my modernisation activities, I made use of features allowed by PHP 7.4. This excludes Constructor Property Promotion and, sadly, Union Types as both are PHP 8. Union Types have been relegated to phpdoc annotations or check methods. Please mind most of the PSR’s target compatibility with PHP releases older than PHP 7.4 and thus do not sport return types or scalar type hints. I followed these signatures where applicable.

One major change between the old codebase and the new one is clients and Request/Response classes.
In the old implementation, there would be one client but different Request/Response implementations using different backing technologies like pecl/http, fopen or curl. The new implementation moves transport code to clients implementing PSR-18. Optionally, they can be wrapped by a Horde\Http\HordeClientWrapper which exposes the PSR-18 itself, but otherwise mimics the old Horde_Http_Client class.

Horde/http is used by very different parts of Horde, including the horde/dav adapter to SabreDAV, various service integrations (Gravatar, Twitter, …), the horde/feed library and application code all over the place. I intend to upgrade those use cases to the new implementation. I am looking forward to criticism or acceptance of that approach.

The goal of this project is more far-reaching. While Horde 4 and Horde 5 already had horde/controller, they made very limited use of it. In my non-public projects, I relied heavily on controllers and I made several attempts at improving the way controllers are set up in horde/core. However, I always felt the results were clunky and not really what I wished to achieve. While horde/controller knows prefilters and postfilters, these are not easy to use and there are few examples. While doing research, I made up my mind. I want to replace Controller/Prefilter/Postfilter with their PSR-15 (HTTP Handlers and Middleware) equivalents. Controllers will be Handlers, Pre/Postfilters will be Middlewares. Together they will be stacks. Authentication, Authorization, Logging etc will be relegated to Middlewares. There will be a default stack to mimic the default controller behaviour in Horde 5 (Be authenticated or be relegated to the login page). You will be able to define application-specific default stacks or request-specific stacks. As Middlewares are a public standard, we might be able to leverage middlewares existing out in the wild or attract microframework users to some horde built middlewares. I want to make it easier for coders to build horde apps without relearning everything they needed to learn for laravel, laminas or symfony. I also want to make it easier for everyone to cooperate. Horde is among the oldest framework vendors, predating most of PEAR and Zend. I think we still have some bits to offer.

Missing Bits:

  • While I did some implementation of UploadedFileInterface, it is still quite basic
  • UploadedFileFactoryInterface is missing as I have not yet built the server side use cases
  • Unit Tests need to be adapted to the new code base. Is there some PSR Acid Test out there?
  • I began implementing the ext-http (PECL_HTTP) backend but stopped as I am unsure about it. That extension is in version 4 now and still services version 3, but we have backends for versions 1 and 2. I need to learn more about it and decice if it makes sense to invest into that aspect.