Posted by thomas_witt 7/6/2025
The move is so we can avoid allocating a string each time we declare and use it, since it will be frozen by default. It is a big optimization, mainly for the GC. Before, we had to do this optimization by hand if we did not intend to modify the string:
# before
def my_method
  do_stuff_with("My String") # 1 allocation at each call
end
# before, optim
MY_STRING = "My String".freeze # this does 2 allocations, with the 1 at init being GC'd quite early
def my_method
  do_stuff_with(MY_STRING)
end
# after
def my_method
  do_stuff_with("My String") # 1 allocation the first time
end
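One way to see the sharing described above without relying on the magic comment is to compare object identity. This is only a minimal sketch, assuming CRuby, where a literal immediately followed by `.freeze` is deduplicated at compile time; the method names `frozen_literal` and `fresh_copy` are made up for illustration.

```ruby
# Hypothetical helper: returns a frozen literal.
# On CRuby, "literal".freeze is resolved once at compile time,
# so every call hands back the same object.
def frozen_literal
  "My String".freeze
end

# Hypothetical helper: String.new always allocates a new mutable string.
def fresh_copy
  String.new("My String")
end

same_object       = frozen_literal.equal?(frozen_literal)
different_objects = !fresh_copy.equal?(fresh_copy)

puts "frozen literal reused: #{same_object}"
puts "fresh copies distinct: #{different_objects}"
```

`equal?` compares identity, not content, so it is a cheap stand-in for "was a new object allocated here?".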
But this move also complicates string manipulation, in the sense that it will nudge users toward immutable operations that tend to allocate a lot of strings.
foo.upcase.reverse
# VS
bar = foo.dup
bar.upcase!
bar.reverse!
So now we have to be deliberate about it:
my_string = +"My String" # it is not frozen
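To make the unary plus concrete, here is a small sketch combining it with the bang methods from the comparison above. `String#+@` returns the receiver if it is already mutable and an unfrozen copy otherwise; the variable names are just for illustration.

```ruby
frozen = "My String".freeze

# +frozen gives back an unfrozen duplicate, leaving the original untouched.
mutable = +frozen
mutable.upcase!   # mutates in place, no new string
mutable.reverse!  # mutates in place, no new string

puts mutable         # GNIRTS YM
puts frozen          # My String
puts mutable.frozen? # false
```

The single copy happens at `+frozen`; after that the two bang calls work on the same object.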
We have had frozen string literals for quite a while now, enabled file by file with the "frozen_string_literal: true" comment, and I've seen it as the recommended way by the community and the de facto standard in most codebases I've seen. It is generally enforced by code quality tools like RuboCop. So mutable vs. immutable is well known, and as it is part of the language, well, people should know the ins and outs.
I'm just a bit surprised that they devised this long path toward real frozen string literals, because it has already been ongoing for years with the "frozen_string_literal: true" comment. Maybe to add proper warnings etc. in a way that does not "touch" code? I prefer the explicit file-by-file comment. And for deps, well, the Ruby version bump that adds frozen string literals by default is quite a filter already.
Well, Ruby is well alive and it is what matters)
The original plan was to make the breaking change in 3.0, but that plan was canceled because it broke too much code all at once.
Hence why I proposed this multi-step plan to ease the transition.
See the discussion on the tracker if you are curious: https://bugs.ruby-lang.org/issues/20205
I say sorta late to the party, as I think it is more than fair to say there was not much of a party that folks were interested in in the lisp world. :D
Oh, I think I see some nameless person I know over there. Well-met Lisper, but goodbye!
Would Ruby be as successful if they had all those complicated features right from the start?
Or do all languages start from a nice, simple, clean slate to get developers hooked, until the language is famous enough to get well developed and starts to resemble all the other big programming languages?
> The selectors in parentheses may be replaced with other selectors by modifying the compiler and recompiling all methods in the system. The other selectors are built into the virtual machine.
> Any objects referred to in a CompiledMethod's bytecodes that do not fall into one of the categories above must appear in its literal frame. The objects ordinarily contained in a literal frame are
> shared variables (global, class, and pool)
> most literal constants (numbers, characters, strings, arrays, and symbols)
> most message selectors (those that are not special)
> Objects of these three types may be intermixed in the literal frame. If an object in the literal frame is referenced twice in the same method, it need only appear in the literal frame once. The two bytecodes that refer to the object will refer to the same location in the literal frame.
> Two types of object that were referred to above, temporary variables and shared variables, have not been used in the example methods. The following example method for Rectangle merge: uses both types. The merge: message is used to find a Rectangle that includes the areas in both the receiver and the argument.
http://www.mirandabanda.org/bluebook/bluebook_chapter26.html
'justastring' at: 6 put: $S; yourself
'justaString' .
However:
#'justasymbol' at: 6 put: $S; yourself
errorNoModification
self error: 'symbols can not be modified.'
Evaluating this yields an error in both Squeak and Pharo. What Smalltalk are you using? I'm going to guess Cuis, in which case your example holds, but is misleading. Consider:
a:='justastring'.
b:='justastring'.
a at: 6 put: $S.
a, ' = ', b.
'justaString = justaString' .
Notice, modifying "a" also modified "b," because of the shared literal frame entry. This is why you were traditionally admonished to avoid directly modifying string literals. (Which wasn't an issue given the design of the string classes, and the general poor manners of destructively modifying a string argument of unknown origin.)
| a b |
a := 'justastring'.
b := 'justastring'.
a == b
true .
The standard library also has String, CString, CStr, OsString, and OsStr.
The latter four are for niche situations. 99.9% of the time, it's similar to Java: &str is Java's String, String is Java's StringBuffer/StringBuilder.
String
&str
&mut str
&'static str
etc.
These are just the language semantics.
The other string types are non-Rust strings: filesystem strings, C strings, etc. You only deal with them when working with specific OS and binding interfaces.
95% of the time you'll just be using String.
Why do you say that? I would say the opposite.
Ruby isn’t making all strings immutable here. Just string literals. You are free to allocate mutable strings that can be appended to, to your heart’s content. It is extremely rare that modifying a literal is intended behavior, since their contents are permanently persisted throughout the lifetime of your program. With your example, this would be like having one shared global buffer for your final document.
Ruby is not a web focused scripting language.
JavaScript is much more of a "web-focused scripting language" than Ruby is, and it is quite happy with immutable strings (only).
> I think the comment is about that you now need to choose mutable vs immutable, and that is framed as a consequence of broader adoption.
Ruby has also had immutable (frozen) strings for a very long time, so you've always had the choice. What is changing is that string literals are (eventually) going to migrate from "mutable with a strongly encouraged file-level switch to make them immutable" to "immutable".
Mutable strings are totally possible (and not even especially hard) in compiled, statically typed, and lower-level languages. They're just not especially performant, and are sometimes a footgun.
> all those complicated features right from the start
Arguably, mutable strings are the more complicated feature. Removing them by default simplifies the language, or at least forces you to go out of your way to find the complexity.
What? Mutable strings are more performant generally. Sometimes immutability allows you to use high level algorithms that provide better performance, but most code doesn't take advantage of that.
<< is the in-place append operator for strings/arrays, while + makes a copy. So += will create a new string and rebind the variable.
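The difference is easiest to see by keeping a second reference to the original object. A minimal sketch (variable names are illustrative; the leading `+` just guarantees a mutable string regardless of the file's frozen-literal setting):

```ruby
# << appends in place: every reference to the object sees the change.
a = +"foo"
alias_of_a = a
a << "bar"

# += builds a new string and rebinds only the variable on the left.
b = +"foo"
alias_of_b = b
b += "bar"

puts alias_of_a           # foobar
puts a.equal?(alias_of_a) # true
puts alias_of_b           # foo
puts b.equal?(alias_of_b) # false
```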
Good reminder that anyone can go on the internet, just say stuff, and be wrong.
Most but not all of these were performance related. If it took a few days to run that’s fine. Major versions don’t come out that often.
Before that, Ruby did "support encodings" in a sense, but a lot of the APIs were byte oriented. It was awkward in general.
https://web.archive.org/web/20180331093051/http://graysoftin...
I recall it was a bit bumpy, but not all that rough in the end. I suppose static type checking helps here to find all the ways how it could be used. There was a switch to allow running old code (to make strings and buffers interchangeable).
Ruby is not doing that, it's transitioning from mutable strings that can be frozen with no special treatment of literals (unless you opt-in to literals being frozen on per file basis) to mutable strings with all string literals frozen.
With immutable string literals, string literals can be reused.
You make an arrow function that takes an object as input, and calls another with a string and a field from the object, for instance to populate a lookup table. You probably don’t want someone changing map keys out from under you, because you’ll break resize. So copies are being made to ensure this?
fooLit = "foo"
fooVar = "f".concat("o").concat("o")
This would have fooLit be frozen at parse time. In this situation there would be "foo", "f", and "o" as frozen strings; and fooLit and fooVar would be two different strings since fooVar was created at runtime.
Creating a string that happens to be present in the frozen strings wouldn't create a new one.
irb(main):001> str = "f".concat("o").concat("o")
=> "foo"
irb(main):002> str.frozen?
=> false
irb(main):003> str.freeze
=> "foo"
irb(main):004> str.frozen?
=> true
irb(main):005> str = str.concat("bar")
(irb):5:in 'String#concat': can't modify frozen #<Class:#<String:0x000000015807ec58>>: "foo" (FrozenError)
from (irb):5:in '<main>'
from <internal:kernel>:168:in 'Kernel#loop'
from /opt/homebrew/Cellar/ruby/3.4.4/lib/ruby/gems/3.4.0/gems/irb-1.14.3/exe/irb:9:in '<top (required)>'
from /opt/homebrew/opt/ruby/bin/irb:25:in 'Kernel#load'
from /opt/homebrew/opt/ruby/bin/irb:25:in '<main>'
1. Strings have a flag (FL_FREEZE) that is set when the string is frozen. This is checked whenever a string would be mutated, to prevent it.
2. There is an interned string table for frozen strings.
> Does it keep a reference count to each unique string that requires a set lookup to update on each string instance’s deallocation?
This I am less sure about. I poked around in the implementation for a bit, but I am not sure of the answer. It appears to me that it just deletes it, but that cannot be right; I suspect I'm missing something. I only dig around in Ruby internals once or twice a year :)
The interned string table uses weak references. Any string added to the interned string table has the `FL_FSTR` flag set on it, and when a string is freed, if it has that flag the GC knows to remove it from the interned string table.
The keyword to know to search for this in the VM is `fstring`, that's what interned strings are called internally:
- https://github.com/ruby/ruby/blob/b146eae3b5e9154d3fb692e8fe...
- https://github.com/ruby/ruby/blob/b146eae3b5e9154d3fb692e8fe...
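From Ruby code, the interned table can be reached explicitly through `String#-@`, which returns a frozen, deduplicated string. The sketch below assumes CRuby's behavior for plain `String` instances with no instance variables; the strings are built at runtime on purpose so that no literal dedup is involved.

```ruby
# Two separately built strings with the same content...
first  = "f" + "oo"
second = "fo" + "o"

# ...are different objects until they are interned.
distinct_before = !first.equal?(second)

# String#-@ looks the content up in the interned (fstring) table.
a = -first
b = -second

puts distinct_before # true
puts a.equal?(b)     # true on CRuby: both resolve to one interned object
puts a.frozen?       # true
```

Nothing here observes the weak-reference cleanup itself; it only shows that lookups by content land on the same entry.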
Though since Ruby already has symbols which act as immutable interned strings, frozen literals might just piggyback on that, with frozen strings being symbols under the hood.
Variables don't "contain" a string, they just point to objects on the heap.
So:
my_string = same_string = "Hello World"
Here both variables are essentially pointers to a pre-existing object on the heap, and that object is immutable.
SUB_ME = ':sub_me'.freeze
def my_method(method_argument)
  foo = 'foo_:sub_me'
  foo.sub!(SUB_ME, method_argument)
  foo
end
which, without `# frozen_string_literal: true`, I believe allocates a string when the application loads (it sounds like it might be 2) and another string at runtime and then mutates that.
That seems like it's better than doing
# frozen_string_literal: true
FOO = 'foo_:sub_me'
SUB_ME = ':sub_me'
def my_method(method_argument)
  FOO.sub(SUB_ME, method_argument)
end
because that will allocate the frozen string to `FOO` when the application loads, then make a copy of it to `foo` at runtime, then mutate that copy. That means two strings that never leave memory (FOO, SUB_ME) and one that has to be GCed (return value) instead of just one that never leaves memory (SUB_ME) and one that has to be GCed (foo/return value).
This is true in particular when FOO is only used in `my_method`. If it's also used in `my_other_method` and it logically makes sense for both methods to use the same base string, then it's beneficial to use the wider-scope constant.
(The reason this seems reasonable in an application is that the method defines the string, mutates it, and sends it along, which primarily works because I work on a small team. Ostensibly it should send a frozen string, though I rarely do that in practice because my rule is don't mutate a string outside the context in which it was defined, and that seems sensible enough.)
Am I mistaken and/or is there another, perhaps more common pattern that I'm not thinking about that makes this desirable? Presumably I can just add # frozen_string_literal: false to my files if I want so this isn't a complaint. I'm just curious to know the reasoning since it is not obvious to me.
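For reference, here are the two variants from this comment side by side in a form that runs whether or not literals are frozen. This is only a sketch: the method names `with_local_buffer` and `with_constant` are invented, and the `+` on the local literal is an addition, needed because `sub!` on a frozen literal would raise FrozenError.

```ruby
SUB_ME = ':sub_me'.freeze
FOO    = 'foo_:sub_me'.freeze

# Variant 1: a method-local mutable string, edited in place.
def with_local_buffer(method_argument)
  foo = +'foo_:sub_me' # unary plus guarantees a mutable string
  foo.sub!(SUB_ME, method_argument)
  foo
end

# Variant 2: a shared frozen constant; sub returns a modified copy.
def with_constant(method_argument)
  FOO.sub(SUB_ME, method_argument)
end

local_result    = with_local_buffer('bar')
constant_result = with_constant('bar')

puts local_result    # foo_bar
puts constant_result # foo_bar
puts FOO             # foo_:sub_me (unchanged)
```

Both return a fresh, unfrozen string per call; what differs is whether the template lives in a long-lived constant or is produced from the literal inside the method.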