Posted by kaycebasques 2 days ago
It seems to be most common in systems that cross major business domains (or come from completely separate companies). If you could just add an entry to the featureful, versioned, controlled config file then you'd do that, but if you can't developers frequently resort to the "does this path name resolve" heuristic.
eject mt
[ test
chgrp chmod
cksum md5 sha1 sha256 sha512
cpio pax tar
ksh rksh sh
Taken to its logical extreme you end up with something like crunchgen https://man.openbsd.org/crunchgen, which merges many independent programs into one and selects which one to run based on the name.

And I am guilty of abusing symbolic links as a simple single-value key/value store. It turns out the link does not need to point to anything, and using readlink(1) was easier than parsing a file.
so I had, in my home directory
~me/bin/snoopy
and if I wanted to log into snoopy, I'd just type
$ snoopy
and it'd rsh me into snoopy.
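Both tricks (crunchgen-style dispatch on the invoked name, and the dangling symlink as a value store) can be sketched in a few lines of Python. This is a hypothetical illustration, not anyone's actual setup:

```python
import os
import tempfile

# Trick 1: argv[0] dispatch. One program, many names; behavior is chosen
# from the name it was invoked as (like crunchgen, or a ~me/bin/snoopy link).
def command_for(argv0):
    host = os.path.basename(argv0)
    return ["rsh", host]          # invoked as "snoopy", it would rsh to snoopy

# Trick 2: a dangling symlink as a single-value key/value store.
with tempfile.TemporaryDirectory() as d:
    key = os.path.join(d, "current-release")
    os.symlink("v1.2.3", key)     # the target does not need to exist
    value = os.readlink(key)      # reads back "v1.2.3", no file parsing needed
```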
Hilarity ensued the day someone ran

    cd /export/home && for user in * ; do chown -R $user:users $user ; done
(Note the lack of the -h flag, which makes chown change the ownership of the symbolic link itself instead of the file it references.)

In fact, it's hard to see where a Level 4 language perfectly fits. After you've surpassed the abilities of JSON or YAML (and you don't opt for slapping on a templating engine like Helm does), it feels like jumping straight to Level 5 is worth the effort for the tooling and larger community.
https://developer.hashicorp.com/terraform/language/syntax/js...
Later on I migrated the solution to Terramate which made it a lot more maintainable because you write HCL to template Terraform config instead of JQ filters.
I really wish there were a first-party solution or a well-established library for this, but I suspect that while it's easy to build just enough to support specific use cases, building a solution generic enough for everyone would be quite an undertaking.
HCL is best used when the problem you're solving is nearly one you could use a level 3 language for, whereas in my experience, Starlark is only really worth it when what you need is nearly Python.
Level 4 is also far more declarative by nature: you cannot fully compute things, so a lot is abstracted away declaratively. This also leads to simpler code, since you're less encouraged to get into the weeds of instantiation and can instead just declare what you'd like.
Overall it's about forcing simplicity by not allowing the scope of possibilities to explode. Certainly there are cases where you can't represent problems cleanly, but I think that tradeoff is worth it because of lowered complexity.
Another benefit of Level 4 is that your code can more easily stay the same while you change the underlying system you're configuring, since there's a driver layer between the Level 4 configuration and the system which can (ideally) be swapped out.
I disagree with this. YAML has too many footguns (boolean conversions being the first among them), not to mention it is a superset of JSON. Plain old JSON or TOML are much simpler.
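To make the boolean footgun concrete: under YAML 1.1 implicit typing, a whole family of unquoted scalars resolves to booleans, the infamous "Norway problem". A toy resolver, just for illustration (real parsers like PyYAML implement the full 1.1 rules):

```python
# The unquoted scalars that YAML 1.1 treats as booleans.
TRUE_1_1 = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
FALSE_1_1 = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}

def resolve_scalar_1_1(s):
    """Toy version of YAML 1.1 implicit typing, booleans only."""
    if s in TRUE_1_1:
        return True
    if s in FALSE_1_1:
        return False
    return s  # everything else stays a string (numbers etc. omitted here)

resolve_scalar_1_1("NO")   # False: the country code for Norway disappears
resolve_scalar_1_1("USA")  # "USA": unaffected, which makes the bug intermittent
```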
Copying my own comment from elsewhere: https://news.ycombinator.com/item?id=43670716.
This has been fixed since 2009 with YAML 1.2. The problem is that everyone uses libyaml (e.g. PyYAML, etc.), which is stuck on 1.1 for reasons.
The 1.2 spec just treats all scalar types as opaque strings, along with a configurable mechanism[0] for auto-converting non-quoted scalars if you so please.
As such, I really don't quite grok why upstream libraries haven't moved to YAML 1.2. Would love to hear details from anyone with more info.
[0]: https://yaml.org/spec/1.2.2/#chapter-10-recommended-schemas
Configuration can be a lot more complicated. Look at dockerfiles, which are filesystems overlaid over each other, often sourced from the internet.
https://docs.spring.io/spring-boot/reference/features/extern...
Look at that: a massive 15-deep precedence order just for pulling individual values (and it doesn't even touch things like maps/lists that get merged/overridden).
That includes sources like the OS, environment-specific files, a directory service (the JNDI registry), XML, JSON, .properties files, and hardcoded values. Honestly, I remember this being even deeper; I suspect they have simplified it.
This doesn't even get into secrets/secured configuration, which may require a web service invocation. I used to also pull config via ssh, from private gits or GitHub, and from AWS web service calls (THAT required another layer of config to get a TOTP-cycled cred).
https://crushedby1sand0s.blogspot.com/2021/02/stages-of-desp...
I was right, the Spring config fallthru was deeper.
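A precedence order like that is essentially a chain of lookups where the first match wins. Python's collections.ChainMap captures the idea; the sources and values below are invented for illustration, not Spring's actual list:

```python
from collections import ChainMap

# Highest precedence first, like command-line args beating environment
# variables beating a properties file.
cli_args   = {"server.port": "9090"}
env_vars   = {"server.port": "8081", "server.host": "prod.example.com"}
file_props = {"server.port": "8080", "server.host": "localhost", "app.name": "demo"}

config = ChainMap(cli_args, env_vars, file_props)

config["server.port"]  # "9090": the first source that defines a key wins
```

Note this only covers single-value fallthrough; merging maps and lists across layers, as mentioned above, is where it gets genuinely hairy.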
Disagree. YAML is considerably easier to work with than JSON, and it’s worth dying on that hill.
I think DER is better (for structured data); although it is a binary format, it is in canonical form. I made up the TER format, a text format that you can compile to DER, with some additional types that can be used (such as a key/value list type). While Unicode is supported, there are other (sometimes better) character sets you can also use.
(However, not all configuration files need structured data, and sometimes programs are also useful to include, and these and other considerations are also relevant for other formats, so not everything should use the same file formats anyways.)
Debugging a configuration becomes tedious once computation is involved. You think some value should be "foo" but it is "bar". Why is it "bar"? If someone wrote it there, the fix is simply to change it. If "bar" is the result of some computation, you have to understand the algorithm and its inputs, which is significantly harder.
Given a "reversible" programming language that might be easier. Such languages are weird though and I don't know much about them. For example: https://en.wikipedia.org/wiki/Janus_(time-reversible_computi...
This issue becomes even harder when you have some kind of solver involved, like a constraint solver or unification. As a user, the solver is supposed to make your life easier, but if it rejects something without a good enough error message you are stuck; having to examine the solver code to work out why is a much worse experience than not having a solver at all. (This is the same issue with clever type systems that need a solver.)
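One mitigation for the written-not-computed case is to record provenance while merging layers, so "why is it 'bar'?" at least has a traceable answer. A minimal sketch of my own, not a standard library feature:

```python
def merge_with_provenance(layers):
    """layers: (source_name, dict) pairs; later layers override earlier ones.
    Returns a mapping of key -> (value, source_that_set_it)."""
    result = {}
    for source, values in layers:
        for key, value in values.items():
            result[key] = (value, source)
    return result

merged = merge_with_provenance([
    ("defaults",    {"color": "foo", "size": "10"}),
    ("user-config", {"color": "bar"}),
])

merged["color"]  # ("bar", "user-config"): now you know who wrote "bar"
```

Once values are computed rather than written, provenance alone is not enough: you would also have to record the expression and its inputs, which is exactly the debugging burden described above.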
Dave also designed a way of doing "object oriented" programming in Nix which eventually turned into what is now known as overlays.
P.S. I'm pretty sure jsonnet is Turing complete. Once you get any level of programming, it's very hard not to be Turing complete.
For anyone interested, this is how I'd illustrate Nix's overlays in Haskell (I know I know, I'm using one obscure lang to explain another...):
    {-# LANGUAGE DeriveFunctor, DerivingStrategies, OverloadedStrings,
                 GeneralizedNewtypeDeriving, ScopedTypeVariables #-}
    import Data.Function (fix)
    import Data.HashMap.Strict (HashMap)
    import qualified Data.HashMap.Strict as HMap
    import Data.Text (Text)

    data Attr a = Leaf a | Node (AttrSet a)
      deriving stock (Show, Functor)

    newtype AttrSet a = AttrSet (HashMap Text (Attr a))
      deriving stock (Show, Functor)
      deriving newtype (Semigroup, Monoid)

    type Overlay a = AttrSet a -> AttrSet a -> AttrSet a

    apply :: forall a. [Overlay a] -> AttrSet a -> AttrSet a
    apply overlays attrSet = fix go
      where
        go :: AttrSet a -> AttrSet a
        go final =
          let fs = map (\overlay -> overlay final) overlays
           in foldr (\f as -> f as <> as) attrSet fs
This uses fix to tie the knot, so that each overlay has access to the final result of applying all overlays. To illustrate, define:

    find :: AttrSet a -> Text -> Maybe (Attr a)
    find (AttrSet m) k = HMap.lookup k m

    set :: AttrSet a -> Text -> Attr a -> AttrSet a
    set (AttrSet m) k v = AttrSet $ HMap.insert k v m
    overlayed =
      apply
        [ \final prev -> set prev "a" $ maybe (Leaf 0) (fmap (* 2)) (find final "b"),
          \_final prev -> set prev "b" $ Leaf 2
        ]
        (AttrSet $ HMap.fromList [("a", Leaf 1), ("b", Leaf 1)])
we get:

    λ> overlayed
    AttrSet (fromList [("a",Leaf 4),("b",Leaf 2)])
Note that "a" is 4, not 2. Even though the "a = 2 * b" overlay was applied before the "b = 2" overlay, it had access to the final value of "b". The order of overlays still matters (it's right-to-left in my example, thanks to foldr). For example, if I were to add another "b = 3" overlay in the middle, then "a" would be 6, not 4 (and if I added it to the end instead, "a" would stay 4).

# Object Oriented Programming library for Nix
# By Russell O'Connor in collaboration with David Cunningham.
#
# This library provides support for object oriented programming in Nix.
# The library uses the following concepts.
#
# A *class* is an open recursive set. An open recursive set is a function from
# self to a set. For example:
#
# self : { x = 4; y = self.x + 1; }
#
# Technically an open recursive set is not recursive at all, however the function
# is intended to be used to form a fixed point where self will be the resulting
# set.
#
# An *object* is a value which is the fixed point of a class. For example:
#
# let class = self : { x = 4; y = self.x + 1; };
# object = class object; in
# object
#
# The value of this object is '{ x = 4; y = 5; }'. The 'new' function in this
# library takes a class and returns an object.
#
# new (self : { x = 4; y = self.x + 1; });
#
# The 'new' function also adds an attribute called 'nixClass' that returns the
# class that was originally used to define the object.
#
# The attributes of an object are sometimes called *methods*.
#
# Classes can be extended using the 'extend' function in this library.
# The extend function takes a class and an extension, and returns a new class.
# An *extension* is a function from self and super to a set containing method
# overrides. The super argument provides access to methods prior to being
# overloaded. For example:
#
# let class = self : { x = 4; y = self.x + 1; };
# subclass = extend class (self : super : { x = 5; y = super.y * self.x; });
# in new subclass
#
# denotes '{ x = 5; y = 30; nixClass = <LAMBDA>; }'. (30 equals (5 + 1) * 5.)
#
# An extension can also omit the 'super' argument.
#
# let class = self : { x = 4; y = self.x + 1; };
# subclass = extend class (self : { y = self.x + 5; });
# in new subclass
#
# denotes '{ x = 4; y = 9; nixClass = <LAMBDA>; }'.
#
# An extension can also omit both the 'self' and 'super' arguments.
#
# let class = self : { x = 4; y = self.x + 1; };
# subclass = extend class { x = 3; };
# in new subclass
#
# denotes '{ x = 3; y = 4; nixClass = <LAMBDA>; }'.
#
# The 'newExtend' function is a composition of new and extend. It takes a
# class and an extension and returns an object which is an instance of the
# class extended by the extension.
rec {
new = class :
let instance = class instance // { nixClass = class; }; in instance;
extend = class : extension : self :
let super = class self; in super //
(if builtins.isFunction extension
then let extensionSelf = extension self; in
if builtins.isFunction extensionSelf
then extensionSelf super
else extensionSelf
else extension
);
newExtend = class : extension : new (extend class extension);
}
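For comparison, here is the same new/extend machinery sketched in Python (my own translation, not part of any library; zero-argument thunks stand in for Nix's laziness, and a proxy object plays the role of self):

```python
def new(cls):
    """Tie the knot: the class body sees the finished object through a proxy."""
    class Self:
        def __getattr__(self, name):
            return attrs[name]()      # always resolve against the final attr set
    attrs = cls(Self())               # attrs maps names to zero-argument thunks
    obj = {name: thunk() for name, thunk in attrs.items()}
    obj["nixClass"] = cls
    return obj

def extend(cls, extension):
    """Build a subclass whose overrides can see both self and super."""
    def subclass(self):
        super_ = cls(self)
        return {**super_, **extension(self, super_)}
    return subclass

base = lambda self: {"x": lambda: 4, "y": lambda: self.x + 1}
sub = extend(base, lambda self, super_: {"x": lambda: 5,
                                         "y": lambda: super_["y"]() * self.x})

new(base)  # {"x": 4, "y": 5, "nixClass": <base>}
new(sub)   # {"x": 5, "y": 30, "nixClass": <sub>}, matching the Nix example
```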
In Nix overlays, the names "final" and "prev" used to be called "self" and "super", owing to this OOP heritage, but people seemed to find those names confusing. Maybe you can still find old instances of the names "self" and "super" in places.

Pulumi is the best-known example. Also, any time a normal programming language is used to generate something like an ARM template or any other kind of declarative deployment file.
This is the best of all worlds in my opinion: full capability, but with the safety of an output that can be reviewed, committed to source control, diffed, etc...
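A sketch of that workflow: use the full language to compute, then emit inert, declarative JSON as the reviewable artifact. The resource shape below is loosely modeled on an ARM template; the names are invented for illustration:

```python
import json

def storage_account(name, replicated=False):
    # Loosely ARM-shaped; not a complete or validated template.
    return {
        "type": "Microsoft.Storage/storageAccounts",
        "name": name,
        "sku": {"name": "Standard_GRS" if replicated else "Standard_LRS"},
    }

# Loops, conditionals, functions: full-language power on the way in...
template = {
    "resources": [storage_account(f"logs{i}", replicated=(i == 0)) for i in range(3)]
}

# ...but the artifact you commit and review is stable, diffable JSON.
rendered = json.dumps(template, indent=2, sort_keys=True)
```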