Why is Java String immutable?

Take a look at the source code of String:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence,
               Constable, ConstantDesc {

    /**
     * The value is used for character storage.
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     *
     * Additionally, it is marked with {@link Stable} to trust the contents
     * of the array. No other facility in JDK provides this functionality (yet).
     * {@link Stable} is safe here, because value is never null.
     */
    @Stable
    private final byte[] value;

    /**
     * The identifier of the encoding used to encode the bytes in
     * {@code value}. The supported values in this implementation are
     *
     * LATIN1
     * UTF16
     *
     * @implNote This field is trusted by the VM, and is a subject to
     * constant folding if String instance is constant. Overwriting this
     * field after construction will cause problems.
     */
    private final byte coder;

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    /**
     * Cache if the hash has been calculated as actually being zero, enabling
     * us to avoid recalculating this.
     */
    private boolean hashIsZero; // Default to false;
    
    ...

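The value field is declared private final, so once it is assigned in a constructor it cannot be pointed at another array. Note that final alone does not make the array contents immutable. Immutability also relies on the field being private, on String never modifying or exposing the array after construction, and on the class itself being final so that no subclass can change its behavior.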
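To see why final on its own is not enough, here is a minimal sketch. LeakyText is a hypothetical class invented for this illustration (it is not part of the JDK): its field is final, yet its contents can still be changed because the array is shared with callers. String avoids this by never handing out its internal array; getBytes() returns a copy.

```java
import java.nio.charset.StandardCharsets;

class FinalVsImmutableDemo {

    // Hypothetical class, for illustration only: the field is final,
    // but the array is shared with callers, so the contents can still change.
    static final class LeakyText {
        private final byte[] value;

        LeakyText(byte[] value) {
            this.value = value; // no defensive copy
        }

        byte[] raw() {
            return value; // exposes the internal array
        }

        @Override
        public String toString() {
            return new String(value, StandardCharsets.ISO_8859_1);
        }
    }

    // Mutates LeakyText through the exposed array.
    static String mutateLeaky() {
        LeakyText text = new LeakyText(new byte[] {'a', 'b', 'c'});
        text.raw()[0] = 'X';
        return text.toString();
    }

    // Tries the same trick on a String: only the copy changes.
    static String mutateCopyOfString() {
        String s = "abc";
        byte[] copy = s.getBytes(StandardCharsets.ISO_8859_1);
        copy[0] = 'X';
        return s;
    }

    public static void main(String[] args) {
        System.out.println(mutateLeaky());        // Xbc
        System.out.println(mutateCopyOfString()); // abc
    }
}
```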

Why was it designed to be immutable?

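Thread safety. The same string instance can be shared by multiple threads. Because a string cannot change, reading it needs no synchronization.
Hashing and caching. A String's hash code is used very often, for example when the string is a Map key. Because the contents never change, the hash code never changes either, so it can be computed once and cached in the hash field.
Security. URLs, paths, and passwords are usually held as String. If a String could be modified after it had been checked, all kinds of security holes would follow.
String constant pool. String literals (and strings passed through intern()) are cached in the string constant pool, so the next use of the same literal returns the cached reference instead of creating a new object. Sharing one instance like this is only safe because the instance cannot change.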
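The pooling and hashing points can be checked with a short sketch. The class and method names below are made up for this example. It relies only on documented behavior: equal literals refer to the same pooled instance, new String(...) creates a separate object, and intern() returns the pooled one.

```java
import java.util.HashMap;
import java.util.Map;

class StringPoolDemo {

    // Two identical literals resolve to the same pooled instance.
    static boolean literalsShareOneInstance() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // new String(...) always creates a distinct object with equal contents.
    static boolean newStringIsSeparateObject() {
        String a = "hello";
        String c = new String("hello");
        return a != c && a.equals(c);
    }

    // intern() maps an equal string back to the pooled instance.
    static boolean internReturnsPooledInstance() {
        String a = "hello";
        String c = new String("hello");
        return c.intern() == a;
    }

    // Equal strings have equal, stable hash codes, so a different
    // String object with the same contents finds the same map entry.
    static Integer lookupWithEqualKey() {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("alice", 30);
        return ages.get(new String("alice"));
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneInstance());    // true
        System.out.println(newStringIsSeparateObject());   // true
        System.out.println(internReturnsPooledInstance()); // true
        System.out.println(lookupWithEqualKey());          // 30
    }
}
```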

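APIs such as substring() do not modify the original string. They create and return a new String object, which the caller then assigns to a variable.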
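A small sketch of this behavior, using made-up class and method names: every call below produces a new String, and the original keeps its contents even when the variable is reassigned.

```java
import java.util.Locale;

class NewObjectDemo {

    // Returns the original string followed by three strings derived from it.
    static String[] derive() {
        String s = "hello world";
        String sub = s.substring(0, 5);
        String upper = s.toUpperCase(Locale.ROOT);
        String joined = s.concat("!");
        return new String[] {s, sub, upper, joined};
    }

    // Reassigning the variable rebinds it to a new object;
    // the object that alias still points to is untouched.
    static boolean reassignmentLeavesOriginalAlone() {
        String s = "hello";
        String alias = s;
        s = s.concat(" world");
        return alias.equals("hello") && s.equals("hello world") && alias != s;
    }

    public static void main(String[] args) {
        for (String part : derive()) {
            System.out.println(part);
        }
        System.out.println(reassignmentLeavesOriginalAlone()); // true
    }
}
```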

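Tips: Why did the type of value change from char[] to byte[] in JDK 9?
Mainly to reduce the memory used by String.
In most Java programs, String objects are among the largest consumers of heap memory, and the great majority of them contain only Latin-1 characters, each of which fits in 1 byte.
Before JDK 9, String stored its characters in a char array, and each char takes 2 bytes. Even a string whose characters needed only 1 byte each was allocated 2 bytes per character, wasting half the space.
Since JDK 9, each string is checked when it is created. If it contains only Latin-1 characters, it is stored at 1 byte per character (coder is LATIN1); otherwise it is stored at 2 bytes per character (coder is UTF16). This improves memory utilization, and a smaller heap footprint generally means fewer GC cycles.
However, Latin-1 covers a limited character set and does not include Chinese characters, for example. A string that contains any character outside Latin-1 is stored entirely as UTF-16 at 2 bytes per character, so for such strings the byte[] representation uses the same amount of memory as the old char[].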
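The rule above can be approximated with a rough sketch. The coder field is private, so this code does not read it; fitsLatin1 and estimatedPayloadBytes are hypothetical helpers that mimic the decision with the public charset API, and the estimate covers only the character data, not object or array headers. It also assumes compact strings are enabled, which is the default (they can be turned off with -XX:-CompactStrings).

```java
import java.nio.charset.StandardCharsets;

class CompactStringDemo {

    // Hypothetical helper: true if every character of s is in Latin-1.
    static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    // Hypothetical helper: estimated size of the character data only,
    // 1 byte per char for Latin-1 strings, 2 bytes per char otherwise.
    static int estimatedPayloadBytes(String s) {
        return fitsLatin1(s) ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        System.out.println(estimatedPayloadBytes("hello"));             // 5
        System.out.println(estimatedPayloadBytes("h\u00e9llo"));        // 5, U+00E9 is in Latin-1
        System.out.println(estimatedPayloadBytes("\u4f60\u597d"));      // 4, two Chinese characters
        System.out.println(estimatedPayloadBytes("hello\u4f60"));       // 12, one non-Latin-1 char doubles everything
    }
}
```

The last case shows why mixed strings gain nothing: a single character outside Latin-1 moves the whole string to 2 bytes per character.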
