And context can be extremely tailored to your niche: specific inventory, from a specific supplier, for a specific user of a specific B2B client of a specific business model subtype, who should or shouldn’t see certain features on that specific inventory at certain times.
When you can write your own logic, and just run this in a tight loop as easily and performantly as you can use a constant, it makes your business incredibly agile. Think some text might change for some customers? Just write the code to make it configurable, and you get tests and flags for free.
Sadly, that zero-hop setup requires a sophisticated client execution engine, which it doesn’t appear Cloudflare has implemented here. Makes sense for their memory-constrained workers, less sense for traditional infrastructure.
Statsig has an approach here that I quite like:
> To be able to do this, Server SDKs hold the entire ruleset of your project in memory - a representation of each gate or experiment in JSON. On client SDKs, we evaluate all of the gates/experiments when you call initialize - on our servers.
https://docs.statsig.com/sdks/how-evaluation-works
You can also roll your own - just sync your rulesets to a few data structures every few seconds in a background thread and atomically swap the reference to them. Then you just need a CRUD interface over the applicability ruleset dimensions.
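A minimal sketch of that roll-your-own approach, assuming a hypothetical `fetch_rulesets` callable that returns the whole ruleset as a dict. `RulesetStore` and its method names are made up for illustration, and the lock-free swap leans on CPython's atomic attribute assignment.

```python
import threading
from typing import Callable, Dict, Optional


class RulesetStore:
    """Holds a ruleset snapshot that a background thread refreshes."""

    def __init__(self, fetch_rulesets: Callable[[], Dict[str, dict]],
                 interval_s: float = 5.0) -> None:
        self._fetch = fetch_rulesets      # hypothetical loader (DB, HTTP, file)
        self._interval_s = interval_s
        self._rules: Dict[str, dict] = dict(fetch_rulesets())  # initial load
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def refresh(self) -> None:
        # Build the new snapshot completely, then swap the reference in one
        # assignment. Readers see either the old dict or the new one.
        new_rules = dict(self._fetch())
        self._rules = new_rules

    def _loop(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval_s):
            try:
                self.refresh()
            except Exception:
                pass  # keep serving the last good snapshot

    def start(self) -> None:
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def get(self, key: str, default=None):
        # Zero-hop read: no lock, no network, just a dict lookup.
        rules = self._rules  # grab the current snapshot once
        return rules.get(key, default)
```

Swallowing refresh errors is a design choice for the sketch: a failed sync keeps the previous snapshot in place rather than taking reads down with it.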
Just be careful to have governance on who can play with which would-be constants. Great power and great responsibility and all that!
For me feature flags go along with trunk based development to enable features in QA settings, but not on PROD yet, for PO/PM testing. Trunk based development allows for fast/easy devops, without complicated branching strategies.
Application configuration is, for me, part of the application and has the business context for customizing the application accordingly. Not sure if there are specific frameworks/tools out there. But one should clearly distinguish these two.
feature flags are perfect for configuration and customization, why using them for this purpose is 'misuse' is beyond me and I've heard this claim from multiple people. they're literally configuration. feature with a flag to turn it on, off or give the flag a value. where's the misuse? is it a problem I'm not running experiments when switching over redis to valkey or whatever?
If it's config/customization, it should be in code. If it's experimental it can be a flag until it solidifies, and then it needs to get moved to code.
When I was at Shopify a couple of years ago they mandated that feature flags had to be short-lived (like 2-4w lifetime tops, some had exceptions) because they would otherwise get left in the code for months, or never cleaned up at all. Hard to tell if it's genuinely a "feature flag" or actually just a normal part of the system at that point.
Feature flags being flipped in prod was also a major source of incidents, in part because people didn't treat them as experimental and with the associated risk profile of something experimental.
The only exception where having long-lived flags was useful and required was for operational killswitches (E.g. disable Apple Pay because it's having issues), but that is explicitly not application config.
This is the kind of design wisdom that’s both true and difficult to win an argument over.
It reminds me of arguments related to over-engineering and complexity. The principles are super important to having a codebase that scales and continues to be efficient to work in as the team grows, but they are hard to objectively measure.
Locally or in isolation something may sound like a great idea. Being able to step back and see the greater ripple effects requires some experience and intuition that can’t always be used to convince people otherwise.
Notably feature flags triggering incidents is expected and desired vs the alternative of shipping the code and having to roll a release back because there is no other way to remove the feature from prod.
When someone else flips a flag that impacts your team and they have no idea they even caused a problem, it becomes very difficult to resolve the issue. Usually you can check for recent deploys; instead you have to go and guess which recently flipped feature flag could possibly be affecting your code. I experienced this several times.
Also, it was actually more desirable for most of these things to go straight to production. Test it properly before shipping, then when you ship it soaks on a 5% traffic canary at which point you can monitor and cancel the deploy if you see errors. That is generally safer than a feature flag rollout unless you are doing something very high impact/risk, in large part because it gives any other team affected by your rollout the ability to respond and be able to easily find the source of errors.
In my org it was a fairly common failure mode to ship something and accidentally cause an issue for another team. Usually it was other teams/orgs shipping things that impacted us.
You just have to label them as such and prevent other teams from fiddling with them.
This is not an antipattern, it's just semantic hand-wringing.
My team managed critical systems in the online flow of billions of dollars of daily payment volume. We also wrote the feature flag system that the rest of the company used. Not only were we completely fine with feature flags as long-lived control plane levers, we heavily used the system that way ourselves.
You just have to clearly distinguish between ephemeral rollout flags (and clean them up or expire them) and the permanent control plane levers.
It's the exact same functionality for both sets of tools. Just different practices around the two usages.
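One way to encode that distinction, as a rough sketch: the same flag record carries a kind, and only rollout flags are forced to have an expiry. `FlagKind`, `FlagDef`, and `expired_flags` are hypothetical names, not part of any particular flag system.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class FlagKind(Enum):
    ROLLOUT = "rollout"          # ephemeral: must carry an expiry date
    KILL_SWITCH = "kill_switch"  # permanent control plane lever


@dataclass(frozen=True)
class FlagDef:
    name: str
    kind: FlagKind
    owner: str
    expires_on: Optional[date] = None

    def __post_init__(self) -> None:
        # Rollout flags without an expiry are rejected at definition time.
        if self.kind is FlagKind.ROLLOUT and self.expires_on is None:
            raise ValueError(f"rollout flag {self.name!r} needs an expiry date")


def expired_flags(flags: List[FlagDef], today: date) -> List[str]:
    """Names of rollout flags past their expiry, e.g. to fail a CI check."""
    return [
        f.name for f in flags
        if f.kind is FlagKind.ROLLOUT
        and f.expires_on is not None
        and f.expires_on < today
    ]
```

The `owner` field is there for the governance point: a lever labelled with its owning team is easier to keep other teams away from.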
I don't think that is what most people colloquially mean by "feature flags" though. Even most teams in Shopify abused "ephemeral" flags for long periods of time.
When they rolled out the mandate it was very annoying for my team because we had a lot of operational flags like you're describing that we needed to get exemptions for.
Feature flags are gates for whether a piece of code runs; basically, an if-condition. Remote configs are a mechanism for changing runtime values without redeploying[1].
For example:
    # Feature flag: variant gate for rollout
    flag = sdk.check_gate(user, "checkout_flow")
    if flag == 'open':
        render_new_checkout()
    elif flag == 'warning':
        render_warning_checkout()
    else:
        render_old_checkout()

    # Raw remote config pulled: structured values for tuning behavior.
    # If the config changes based on user or context, this "remote" config
    # is considered "dynamic".
    config = sdk.get_config(user, "checkout_settings")
    timeout_ms = config.get("timeout_ms", 5000)
    max_items = config.get("max_items", 50)
    allowed_tlds = config.get("allowed_tlds", [".com", ".org"])
In practice, feature flags are implemented on top of dynamic configs[2] to manage the temporary lifecycle of a feature: ship a new block of code, ramp its execution up to 100%, then delete the flag. Dynamic configs, by contrast, are a deeper primitive meant for semi-permanent/safer operations like tuning rate limits or changing text copy on a marketing website.
As I've seen it, the forcing function that separates the two concepts is the experimentation platform: that is where human control of feature flags is shared (via dynamic configs) with automated & randomized assignments. That's how Statsig built their system and, in part, why they could sell for a billion. Companies that ignored the difference, like LaunchDarkly, struggled outside of feature flags.
[1] https://engineering.atspotify.com/2020/10/spotifys-new-exper...
[2] https://docs.statsig.com/dynamic-config/overview https://blog.x.com/engineering/en_us/topics/infrastructure/2...
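To make the "flags on top of dynamic configs" layering concrete, here is a small sketch. The store, the `gate.` key prefix, and the `enabled` / `allow_users` fields are all assumptions for illustration and do not reflect any vendor's actual schema.

```python
from typing import Any, Dict


class DynamicConfigStore:
    """In-memory stand-in for a remote config service (hypothetical)."""

    def __init__(self, configs: Dict[str, Dict[str, Any]]) -> None:
        self._configs = configs

    def get_config(self, name: str) -> Dict[str, Any]:
        # Unknown configs resolve to an empty dict so callers use defaults.
        return self._configs.get(name, {})


def check_gate(store: DynamicConfigStore, gate: str, user_id: str) -> bool:
    """A gate is a config with a conventional shape, reduced to a bool."""
    cfg = store.get_config(f"gate.{gate}")
    if not cfg.get("enabled", False):
        return False
    allow = cfg.get("allow_users")  # None means "everyone"
    return allow is None or user_id in allow
```

Deleting the flag at the end of a rollout then amounts to removing one config entry and the branch that reads it, while plain configs like `checkout_settings` stay around.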
They literally are configuration.
Guys this is exactly the kind of banal crap that makes a simple app into a monstrous beast that won't work unless it's connected to the internet.
Feature flags are set once at startup (or specific events like hard refresh, or new login) and then simply included in the request headers.
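A rough sketch of that evaluate-once model, assuming a made-up `X-Feature-Flags` header and a hypothetical `Session` wrapper. The flag values are computed once and the same header is attached to every request afterwards.

```python
from typing import Dict, FrozenSet

FLAG_HEADER = "X-Feature-Flags"  # hypothetical header name


def encode_flags(flags: Dict[str, bool]) -> str:
    """Serialize the enabled flags as a sorted, comma-separated list."""
    return ",".join(sorted(name for name, on in flags.items() if on))


def decode_flags(header_value: str) -> FrozenSet[str]:
    """Parse the header back into the set of enabled flag names."""
    return frozenset(p.strip() for p in header_value.split(",") if p.strip())


class Session:
    """Evaluates flags once (startup or login) and reuses them per request."""

    def __init__(self, flags: Dict[str, bool]) -> None:
        self._header_value = encode_flags(flags)

    def headers(self) -> Dict[str, str]:
        return {FLAG_HEADER: self._header_value}
```

A new login or hard refresh would simply build a new `Session`; nothing is re-evaluated in between.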
It's not rocket science, but I'm sure people are free to overcomplicate it.
So glad Flipper exists and I don't have to deal with this stuff anymore.
It doesn't have to be sophisticated and they don't need to implement it themselves. They piggy-back on OpenFeature where the client libraries have a simple targeting rule evaluation engine integrated.
https://www.stigg.io/blog-posts/entitlements-untangled-the-m...
The per-seat billing we have in our agreement is a bit rough but it's workable.
wait what? what kind of logic do you need to do that CF Workers can't do?
If you boil it down to this, you may as well boil down every service that exists to bits-as-a-service.
Turns out there's legitimate business value in these things, and complexity in delivering them.
Dropbox has modified it
https://github.com/dropbox/librsync
This is why I prefer open source software. I can modify it
One person can use librsync to create a Dropbox company. Another person can use librsync for noncommercial purposes, e.g., to transfer and sync their own files
Either way, it's librsync
You could call any SaaS tool "excel-as-a-service" and it would hold the same power as your comment.
JSON in the repo also risks introducing customer data to git if you want to rollout based on specific customer attributes (sometimes, for us, it's a list of early opt-in customers we have meetings with to discuss/develop new features)
It's also less accessible for "business users" like product/project managers, sales, and marketing, who want to coordinate feature rollout with other business initiatives (and don't want to bother engineers when they do).
Often problems are more complex than they seem at first sight and I have found it’s a good approach to think “what am I missing” rather than “lots of people must be making very obviously bad decisions” and reach the latter conclusion only after more work. Usually I’ve missed something.
If it were that easy people would not be paying for it.
We're a small company but new feature release for big features is typically targeted at low risk users/customers first. That usually means a few attributes are taken into account (age, customer value, customer sentiment, which features they use)
> The client provider requires an API token to fetch flag values. This token is not scoped to a single app, so anyone with the token can evaluate flags across all apps in your account. Use the client provider with caution in public-facing applications.
https://developers.cloudflare.com/flagship/sdk/client-provid...
Can anyone clarify... why the client SDK, designed to be deployed to browsers, requires caution? Does this mean that any client could send requests with a new targetingKey and observe other users' flags?
While flags probably shouldn't be critical information, this seems like an interesting design choice.
There is no way 6 months ago someone at Cloudflare thought it was a good idea to build a competitor to, say, LaunchDarkly.
Their recent features / announcements have been equivalent to:
(LaunchDarkly)
Resend, Firecrawl, CrewAI, Helicone, Replicate, Pinecone
-
Which like… many companies have a painful procurement process. If all you need is Cloudflare, and prices are within reason, why not use them?
instead of polishing their existing products (and most of them do require a lot of work) they jump into any other niche someone thought was a good idea. My guess is that with ai being able to prototype things quickly they just started doing everything that is even a bit relevant.
which won't end well
Their core network stuff always seemed pretty robust but all the newer stuff was much more thrown together. Thinking specifically of Zero Trust/Argo Tunnels which has been around a few years (and I do like) but has some rough edges.
If you’re specifically thinking of native ephemeral workers with very fast startup, it seems like those would have to be sandboxed somehow, and WebAssembly seems like a decent solution. Is there really a significant native code gap between WebAssembly workers and native containers?
Kind of relevant on those cheapskate projects that only start paying licenses after the SOW is signed, but already expect some kind of prototyping in place.
WebAssembly is a solution looking for a problem outside the browser, with worse development experience.
If I want bytecode based runtimes, I already have them with first class development experience, and decades of deployment experience, between Erlang, JVM and CLR.
Not sure if that's possible/how easy it is on Vercel
Here's why we built it!
How many minutes do I need to wait until app-scoped tokens are live?
How can we possibly trust the AI to disable the 'CODE_IS_SKYNET' flag.
This does not apply to Cloudflare, especially not for an auth token that needs to be published on your website that cannot be restricted.
I think you're thinking this:
> If you are not embarrassed by the first version of your product, you've launched too late. -Reid Hoffman
https://blog.cloudflare.com/enterprise-grade-features-for-al...
—-
I don’t believe a single enterprise only feature has made its way to lower tier (paid) account yet.
I’m most interested in:
https://developers.cloudflare.com/speed/optimization/content...
https://community.cloudflare.com/t/making-enterprise-product...
Still using AWS for email sending so that will be great when it comes
> It is in the works. The billing team has been sprinting to fix a lot of debt in this area. I don’t have a date.
I agree that problem is not particularly hard in the grand scheme of things, but it is actually quite big, meaning it requires a lot of features that aren't obvious at first glance.
Edit: Thought of another analogy that may help explain the complexity. At their heart, feature flags are really a permissioning system: only certain users get access to certain pieces of functionality. Anyone who has ever dealt with permission systems knows how complex they can be: group membership, including hierarchical groups, roles, ACLs, etc. All of those things are really analogous (actually, a subset really) to the various types of targeting rules that can be used in a feature flags system.
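The permissioning analogy can be sketched as ACL-style targeting rules. This is a deliberately tiny model with hypothetical `User`, `Rule`, and `is_enabled` names; real systems add hierarchical groups, roles, and rule ordering on top.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class User:
    user_id: str
    groups: FrozenSet[str] = frozenset()
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Rule:
    """Matches if the user is in any listed group AND has all listed attributes.

    An empty condition is skipped, so Rule() with no conditions matches everyone.
    """
    any_group: FrozenSet[str] = frozenset()
    attributes: Dict[str, str] = field(default_factory=dict)

    def matches(self, user: User) -> bool:
        if self.any_group and not (self.any_group & user.groups):
            return False
        return all(user.attributes.get(k) == v
                   for k, v in self.attributes.items())


def is_enabled(rules: List[Rule], user: User) -> bool:
    """The flag is on if any rule matches; with no rules it defaults to off."""
    return any(rule.matches(user) for rule in rules)
```

Even this toy version already has the shape of an access check: group membership on one axis, attribute conditions on the other.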
Cloudflare themselves even uses them internally as such, by shipping new features/builds to their free customers first, and then progressively larger customers after a settling period.
Feature flags can also be randomly turned on, for a sort of fuzz testing. Don't think of them just as 'new things' - it could be 'changed behavior'.
I guess you could think of them as a boolean on every client but they're generally not implemented that way.
To me the main appeal of feature flags is that they allow you to work on large features, which often require months and many commits to finish, in the main branch. This, at least to me, results in a more lightweight and more iterative development process. This contrasts with maintaining a separate branch, perhaps with a separate deployment target, for a large in-development feature.
How do you set a boolean to only return true for queries to 5% of the fleet? And which 5% of the fleet? And then ramp up on a predefined cadence? Or how about returning true only for customers in the preview group for the feature? Does the database return false automatically if the 5% of the fleet where it's true start crashing or throwing exceptions? Does it hook into your observability stack?
Fundamentally, sure, you could just implement it as a boolean in the database. It's the integration and tooling that works with the rest of your stack that makes it worthy of the name "feature flag".
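For the "which 5% of the fleet" question, a common technique is deterministic hash bucketing. The sketch below assumes each host or user has a stable string ID; the `bucket` and `in_rollout` names are illustrative only.

```python
import hashlib


def bucket(flag_name: str, unit_id: str, buckets: int = 10_000) -> int:
    """Map (flag, unit) to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{flag_name}:{unit_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def in_rollout(flag_name: str, unit_id: str, percent: float) -> bool:
    """True if this unit falls inside the first `percent` of buckets."""
    # With 10_000 buckets, each bucket is 0.01% of the population.
    return bucket(flag_name, unit_id) < percent * 100
```

Because the bucket depends only on the flag name and the unit ID, the same hosts stay enabled across calls, and raising the percentage only ever adds hosts. Ramping on a cadence, auto-rollback, and observability hooks are the tooling that sits on top of this.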
I guess I like boring software too much to reach for a dependency but I do see how the tooling matters here.
Oh, that's right, you just spouted a "big company bad" mantra without bothering to read the article. Look, I know saying RTFA goes against the HN guidelines, but the amount of increasingly lazy spew I see from folks (or bots) who haven't bothered to read the article is so tiresome and annoying.
Full disclosure, I am the CTO of Flagsmith, and we have seen a clear curve in adoption of OpenFeature over the last few years. It used to be that we were pushing customers to try it out, now they come to us with OpenFeature as a requirement.
The vendor support is pretty mature now and there is coverage across almost all languages. If you're integrating feature flags into a new service, or looking to migrate from e.g. home-grown to a third party solution, OpenFeature is definitely the way I would recommend going.
It took like 2 weeks to build a full custom backend. SDKs across languages worked flawlessly (okay, we did find one bug, reported it, and it was fixed within the day)
Check a config, DB value, or env var to dynamically go one path or the other.
That’s all, you must either have a small feature or refactor the code to easily switch at a high level.
If you are not able to do so easily, then yes, complex feature flags implementations might help you, to coordinate feature activation between micro services.
Or if you have many features then a dashboard might be useful.
But I would argue that both are serious indicators that you should avoid feature flags. They are better for local and temporary changes; otherwise the complexity compounds and it becomes hard to manage and maintain.
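The simple version described here (check an env var, switch at one high-level point) might look like the following. The `FEATURE_<NAME>` naming convention and both function names are assumptions for the sketch.

```python
import os
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}


def flag_enabled(name: str, env: Optional[Mapping[str, str]] = None,
                 default: bool = False) -> bool:
    """Read FEATURE_<NAME> from the environment; unset means `default`."""
    env = os.environ if env is None else env
    raw = env.get(f"FEATURE_{name.upper()}")
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


def checkout_path(env: Optional[Mapping[str, str]] = None) -> str:
    # Switch at one high-level point rather than scattering checks around.
    return "new" if flag_enabled("new_checkout", env) else "old"
```

Passing `env` explicitly keeps the check testable without touching the real process environment.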
Of course you don't want users to lose the feature once they exceed your revenue threshold or cross the border, so you'll need to implement some kind of tracking. Your analytics and error tracking also need to communicate with the feature flag service.
Definitely not rocket science but more complex than an environment variable.
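The tracking part can be sketched as a "sticky" flag: once a user has qualified, they stay enrolled even if their attributes later stop matching. `StickyFlag` is a hypothetical name and the in-memory set stands in for whatever persistent store you would actually use.

```python
from typing import Callable, Dict, Set


class StickyFlag:
    """Once a user qualifies, they keep the feature even if they stop qualifying."""

    def __init__(self, eligible: Callable[[Dict[str, float]], bool]) -> None:
        self._eligible = eligible
        self._enrolled: Set[str] = set()  # in-memory stand-in for a real store

    def is_enabled(self, user_id: str, attrs: Dict[str, float]) -> bool:
        if user_id in self._enrolled:
            return True  # previously enrolled: eligibility is not re-checked
        if self._eligible(attrs):
            self._enrolled.add(user_id)
            return True
        return False
```

This is the piece an environment variable cannot give you: the answer depends on history, not just on the current attributes.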
That is, features are contractual and when you've only got 50 customers but they're all paying high 6 figures does anyone really care about feature flag complexity?
"One customer would like the main page blue and another one red". Would that be a feature flag for you?
Not just an argument, it's the entire point of feature flags for ui experiments which is an essential practice. Dynamic adjustment of the cohorts (or even just an immediate kill switch if it's a disaster) is required.
We used Statsig at Function. It started out as 2 of us using it on one product and within 12 months, large amounts of our product copy and rollouts were driven off of it.
Statsig has client-side evals so you can write rules and rollouts based on internal concepts without Statsig’s servers processing a piece of user data. Hoping Cloudflare can build a sophisticated product here so I don’t have to use another product in the future!