Show HN: TypeSchema – A JSON specification to describe data models

Posted by k42b3 10/24/2024

Show HN: TypeSchema – A JSON specification to describe data models (typeschema.org)

117 points | 49 comments

fridental 10/25/2024|

Three big turn-offs for me:

1) They do not publish the rationale for why the world needs yet another protocol / language / framework on the homepage. It is hidden at https://typeschema.org/history

2) In the history page, they confuse strongly typed and statically typed languages. I have a prejudice against people who do this.

3) The biggest challenge about data models is not auto-generated code (that many people would avoid in principle anyway), but compressed, optimized wire serialization. So you START with selecting this for your application (e.g. Avro, Cap'n Proto, MessagePack etc.) and then use the schema definition language coming with the serialization tool you've chosen.
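
On point 2, a minimal Python sketch of the distinction, with a made-up helper name: Python is dynamically typed (a name can be rebound to a value of another type and nothing is checked before run time), yet strongly typed (mixing incompatible types raises an error instead of coercing silently).

```python
def add_strict(a, b):
    """Add two values; Python refuses to coerce int + str implicitly."""
    return a + b


# Dynamic typing: the same name may hold values of different types over time.
value = 1
value = "one"  # allowed, no declaration and no compile-time check

# Strong typing: incompatible operands are rejected at run time.
try:
    add_strict(1, "1")
    outcome = "coerced"
except TypeError:
    outcome = "TypeError"

print(outcome)
```

A statically typed language would reject the `add_strict(1, "1")` call before the program runs; that is a separate axis from how strictly types are enforced.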

deskr 10/25/2024||

> ... auto-generated code (that many people would avoid in principle anyway)

Auto-generated code is 100% enough, sometimes.

dirkt 10/27/2024||

I still have not found any way to use autogenerated code for Java/Spring that can handle updates to an external OpenAPI spec.

Any pointers?

(Serious question).

ericyd 10/25/2024|||

Point #1 was my biggest turn off. Numbers 2 and 3 are good points too.

owlstuffing 10/25/2024|||

> 1) Yet another protocol etc.

Agreed.

> 3) The biggest challenge about data models is not auto-generated code

I would say auto-generated code is most definitely the harder problem to solve, and I’d also go out on a limb and say it is THE problem to solve.

Whether it’s JSON, XML, JavaScript, SQL, or what have you, integrating both data and behavior between languages is paramount. But nothing has changed in the last 40+ years solving this problem, we still generate code the same clumsy way… Chinese wall between systems, separate build steps, and all the problems that go with it.

Something like project manifold[1] for the JVM world is in my view the way forward. Shrug.

1. https://github.com/manifold-systems/manifold

dominicrose 10/25/2024|||

also the output in markdown and php doesn't seem good

nsjdjwnn 10/25/2024||

I mean, Java and Go are strongly typed languages if you consider Object a = Integer.valueOf(1); a = Float.valueOf(1f); to be strong.

They are also strict, of course.

benatkin 10/24/2024||

Have you heard of wit? I suspect we'll see use outside of WebAssembly. https://component-model.bytecodealliance.org/design/wit.html

It has non-nullable types, via option, which makes non-nullable the default, since you have to explicitly wrap it in option. https://component-model.bytecodealliance.org/design/wit.html...

A way to represent types commonly found in major languages would be nice, but it would be better to start with something like wit and build on top of it, or at least have a lot of overlap with it.

pdimitar 10/27/2024|

That was a great read and it gave me several ideas. Thank you.

matthewtovbin 10/24/2024||

Why reinvent https://json-schema.org ?? Pros/cons?

michaelsalim 10/25/2024||

From my understanding, JSON schema describes the schema of JSON objects with JSON. This one describes a variety of types of schemas with JSON.

So it could be TypeScript, Go, GraphQL, etc. It seems to output to JSON Schema as well. I guess its main purpose is to share the schema between different languages. Which I imagine works with JSON Schema too, but this takes it a step further and handles all the mapping you'd need to do otherwise.

froh 10/25/2024||

JSON Schema has nuanced and expressive constraints to validate information exchanged in JSON serialization.

TypeSchema in contrast seems to focus on describing just the structure of data, with the goal of generating stubs in a wide variety of programming languages.

oaiey 10/25/2024||

So why not subset JSON Schema, as was done with XML Infoset compared to XSD, for example? Extensions are also possible to achieve POCO details as needed.

cmgriffing 10/24/2024||

I find it interesting that the Go serialization just duplicates the props rather than using composition: https://typeschema.org/example/go

Seems a bit naively implemented.

Ideally, the duplicated props in Student would just be a single line of `Human`.

Onawa 10/24/2024||

Comparison between TypeSchema and LinkML for those interested as I was. https://www.perplexity.ai/search/please-compare-and-contrast...

whizzter 10/24/2024||

What's the benefit over existing variants like Swagger/OpenAPI/JsonSchema ?

mariocesar 10/24/2024||

It feels like a converter solution, as it can transform TypeSchema into JSON Schema.

8338550bff96 10/24/2024|||

Yeah, I'm not really following the line of reasoning presented on the "/history" page: https://typeschema.org/history

It seems to me like a mischaracterization of JSON Schema to say you can't define a concrete type without actual data.

I am a very stupid individual so I could be misunderstanding the argument.

andix 10/24/2024|||

I can't really follow those arguments either. For example the empty object example {}. Why is this bad? Types without properties are a real thing. Also an empty schema is a real thing.

The thought I do get: JSON Schema primarily describes one main document (object/thing). And additionally defines named types (#/definitions/Student). But it's totally fine to just use the definitions for code generation.

The reference semantics of JSON Schema is quite powerful, a little bit like XML with XSD and all the different imports and addons.

llamaLord 10/25/2024||

Maybe it's just me, but I've never been able to get a complex type schema to work properly with JSON schema.

The moment you have types referencing other types in a way that can become recursive in ANY way, the whole thing seems to explode.
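
For reference, a recursive type in JSON Schema is usually written with a `$ref` that points back at its own definition. Below is a minimal sketch, assuming a hypothetical `Node` type and a toy checker that understands only `type`, `properties`, `items`, and local `$ref`. It is not a real JSON Schema validator and says nothing about how any particular code generator copes with such a schema.

```python
# Hypothetical schema: a tree node whose children are nodes again.
SCHEMA = {
    "definitions": {
        "Node": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "children": {
                    "type": "array",
                    "items": {"$ref": "#/definitions/Node"},
                },
            },
        }
    },
    "$ref": "#/definitions/Node",
}

PY_TYPES = {"object": dict, "array": list, "string": str}


def resolve(schema, root):
    """Follow local '#/a/b' references until a concrete schema is reached."""
    while "$ref" in schema:
        target = root
        for part in schema["$ref"][2:].split("/"):  # drop the leading '#/'
            target = target[part]
        schema = target
    return schema


def check(value, schema, root):
    """Return True if value matches the toy subset of JSON Schema used here."""
    schema = resolve(schema, root)
    expected = schema.get("type")
    if expected and not isinstance(value, PY_TYPES[expected]):
        return False
    if isinstance(value, dict):
        for key, sub in schema.get("properties", {}).items():
            if key in value and not check(value[key], sub, root):
                return False
    if isinstance(value, list) and "items" in schema:
        return all(check(item, schema["items"], root) for item in value)
    return True


tree = {"name": "root", "children": [{"name": "leaf", "children": []}]}
print(check(tree, SCHEMA, SCHEMA))
```

The recursion terminates because it follows the data, which is finite, and not the schema, which is cyclic.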

dangsux 10/24/2024|||

[dead]

RedShift1 10/24/2024|||

Heh feels like Json schema to me too... Same, but different.

drdaeman 10/24/2024||

Feels much weaker/naive than JSON Schema, as TypeSchema barely has any constraints.

The TypeSchema spec is hard to comprehend, as it doesn't delve into any details and looks more like a bunch of random examples with comments than a proper definitive document (e.g. they never seem to define what the "date-time" string format is). I don't see a way to say, e.g., that a string must be a UUIDv7, or that an integer must be non-negative, or support for heterogeneous collections, etc.

Maybe it has some uses for code generation across multiple languages for very simple JSON structures, but that feels like a very niche use case. And even then, if you have to hook up per-language validation logic anyway (and probably language-specific patterns too, to express concepts idiomatically), what's the point of a code generator?

amanzi 10/24/2024||

"What is the difference to JSON Schema? JSON Schema is a constraint system which is designed to validate JSON data. Such a constraint system is not great for code generation, with TypeSchema our focus is to model data to be able to generate high quality code."

They have more details on the History page.

dragonwriter 10/24/2024|||

Those are certainly words, but since the words they use to describe what differentiates them from JSON Schema just assert that their thing is for exactly what has always motivated schema languages, including but not limited to JSON Schema, and since JSON Schema supports that purpose far better, I am left confused.

At best, I can guess that maybe they are trying to get at the fact that JSON schema supports some structures that can be awkward or unidiomatic for data models in some languages (a lot of what you can do via allOf or oneOf fits this) and they want to narrow down to something where what can be defined in the schema language is also simple idiomatic structures nearly everywhere, but a restricted profile of JSON Schema would get you there much faster than starting from the ground up.

drdaeman 10/24/2024|||

> narrow down to something where what can be defined in the schema language is also simple idiomatic structures nearly everywhere

It feels more like a lowest common denominator to me, which is frequently (in presence of anything non-trivial) the opposite of idiomatic.

For example, JSON does not have monetary/decimal type, best option available is a string. It would be very opposite of idiomatic to have a C# or Python code use a string in the record/dataclass, instead of a decimal, if the actual JSON document field has the "monetary value" semantic.
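
A minimal sketch of that point in Python, assuming a hypothetical `Invoice` record whose `amount` travels as a JSON string: the wire format stays a string, while the dataclass exposes a `Decimal`.

```python
import json
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    # Hypothetical record: 'amount' is a string on the wire, a Decimal in code.
    id: str
    amount: Decimal

    @classmethod
    def from_json(cls, text: str) -> "Invoice":
        raw = json.loads(text)
        return cls(id=raw["id"], amount=Decimal(raw["amount"]))

    def to_json(self) -> str:
        return json.dumps({"id": self.id, "amount": str(self.amount)})


invoice = Invoice.from_json('{"id": "inv-1", "amount": "19.99"}')
total = invoice.amount * 3  # exact decimal arithmetic, no float rounding
print(total)
```

A generator that only knows "string" cannot produce the `Decimal` field on its own; the mapping has to come from extra schema information or from hand-written code.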

And TypeSchema seems to ignore aspects like nullability and presence requirements, making the assumption that everything can be null (which can be wrong and even harmful, as if Java hasn't taught us anything).

Maybe I'm thinking wrong about it and the idea is to have separate wire and public API formats, though, where the wire format is minimal JSON (TypeSchema can work, I guess, although I still have concerns about nulls - and distinguishing between nulls and absence of the field) and then that intermediate almost-over-the-wire-but-deserialized-from-JSON-blob object representation is adapted into a language-specific idiomatic structure. I always felt that such an approach is way too verbose and adds a lot of boilerplate without a good reason, but I could be wrong about it.
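
The null-versus-absent distinction mentioned above can be sketched in Python with a sentinel. `MISSING` and `read_nickname` are hypothetical names for illustration, not part of TypeSchema or any library.

```python
import json

# Sentinel that marks "field not present", as opposed to an explicit null.
MISSING = object()


def read_nickname(document: str):
    """Return the 'nickname' value, None for an explicit null, or MISSING if absent."""
    data = json.loads(document)
    return data.get("nickname", MISSING)


present = read_nickname('{"nickname": "Bob"}')
explicit_null = read_nickname('{"nickname": null}')
absent = read_nickname('{}')

print(present, explicit_null is None, absent is MISSING)
```

A model that maps every field to a single nullable type collapses the last two cases into one.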

dragonwriter 10/24/2024||

Yeah, “idiomatic” may have been a poor word choice, I really meant closer to “simply representable”. oneOf, for instance, lets you very easily define flexible, concise structures in JSON Schema that OO languages without union types may not express naturally if at all, and which may not be natural to work with even if they can be expressed in many languages.
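
As a rough illustration of the kind of structure oneOf describes, here is a Python sketch of a tagged union. The `Card` and `BankTransfer` types and the `kind` discriminator field are invented for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Card:
    number: str


@dataclass
class BankTransfer:
    iban: str


# The union that a oneOf with two object branches would map to.
Payment = Union[Card, BankTransfer]


def parse_payment(raw: dict) -> Payment:
    """Pick the branch from a hypothetical 'kind' discriminator field."""
    if raw["kind"] == "card":
        return Card(number=raw["number"])
    if raw["kind"] == "bank_transfer":
        return BankTransfer(iban=raw["iban"])
    raise ValueError(f"unknown payment kind: {raw['kind']!r}")


payment = parse_payment({"kind": "card", "number": "4111"})
print(payment)
```

In a language without union types, the same shape typically needs a common base class or a wrapper with one nullable field per branch, which is where the awkwardness comes from.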

drdaeman 10/24/2024||

This makes sense, but I think it's even a better reason to not use a code generator (which forces certain patterns on your code), but rather think about the best language-native way to express a certain concept you want to express.

HelloNurse 10/25/2024||||

Priority to high quality generation of good code from nice schemas that allow it (accepting that the schemas will be not very expressive and often too loose) vs. priority to faithfully representing and validating JSON documents that conform to general, detailed schemas (accepting that code generation won't be particularly flexible).

andix 10/24/2024|||

Restricting JSON Schema would've been my approach to this "problem" too.

drdaeman 10/24/2024|||

Yeah, I've edited my comment above and added the last paragraph with a note about it. It must be a really weird use case when you need to write a bunch of code in different languages (probably writing libraries for some API or JSON-based data interchange format?) and are also not concerned about validation and language - because if you need validation, you're writing code by hand either way, so code generation becomes a curse rather than a blessing.

I would've understood if it did the inverse - read source code in any of the supported languages and check whether the structures it defines conform to the schema. That would make sense for testing that those structs don't diverge too much between codebases (have the same-shaped fields). Even then I'm not sure I see the point, because various languages tend to use different language-specific things (like a UUID type or enums/symbols/atoms, etc.) to make the developer feel at home rather than in a barren JSONland.

mchicken 10/24/2024||

It looks far more constrained, especially when it comes to the validation logic, which makes sense validation-wise but honestly quickly becomes a "fate shovels shit in my face" kind of situation when it comes to code generation. As much as I love this sort of constraints I also find the union-type discrimination style "meh".

ssousa666 10/25/2024||

Kotlin classes are (seemingly) all generated as open classes, rather than data classes. Surprising choice - is this an intentional design decision? Wondering if I am missing something

tauntz 10/25/2024|

The output in various languages is rather questionable. Not wrong per se, as it's totally valid code, but just.. not idiomatic and not how a developer fluent in that language would implement it.

nicholaswmin 10/29/2024||

Hi man - Don't take my tone the wrong way but it's the only way I can express this. I will never, ever - EVER use your craft project without a complete series of unit-tests. Especially one like yours. I stop reading immediately and just go on about my life.

Good effort though.

Edit: Oh I thought it was yours. Well I'll leave this up anyway.

cernocky 10/25/2024||

I once read a paper about Apache/Meta Thrift [1,2]. Similarly, it allows the definition of data types/interfaces and code generation for many programming languages. It was specifically designed for RPCs and microservices.

[1]: https://thrift.apache.org/

[2]: https://github.com/facebook/fbthrift

bobbylarrybobby 10/24/2024|

The Rust generator seems not to place generic parameters on the type itself?

    use serde::{Serialize, Deserialize};

    #[derive(Serialize, Deserialize)]
    pub struct Map {
        #[serde(rename = "totalResults")]
        total_results: Option<u64>,

        #[serde(rename = "entries")]
        entries: Option<Vec<T>>,
    }
