Strings and Methods — Interview Questions
Junior Level (5-7 Questions)
Q1: What is the difference between == and .equals() for Strings?
- `==` compares **references** (memory addresses): it checks whether two variables point to the same object.
- `.equals()` compares **content**: it checks whether two strings contain the same characters.

**Always use `.equals()` for String content comparison.**

Q2: What does it mean that Strings are immutable?
Immutability means that once a String object is created, its value cannot be changed. Every method that appears to modify a String (like `toUpperCase()`, `replace()`, `concat()`) actually returns a **new** String object, or the original one when nothing needs to change.

**Benefits of immutability:**
- Thread-safe without synchronization
- Safe to use as HashMap keys
- Enables the String Pool optimization
- Prevents accidental modification of shared data

Q3: What is the String Pool?
The String Pool (String Intern Pool) is a special area in the Java heap where the JVM stores unique string literals. When you create a string literal, the JVM checks the pool first:
- If the same string exists, it returns a reference to the existing object.
- If not, it creates a new string in the pool.

This saves memory when the same string value is used in multiple places.

Q4: What is the difference between String, StringBuilder, and StringBuffer?
| Feature | String | StringBuilder | StringBuffer |
|---------|--------|---------------|--------------|
| Mutability | Immutable | Mutable | Mutable |
| Thread-safe | Yes (immutable) | No | Yes (synchronized) |
| Performance | Slow for concatenation | Fast | Slower than StringBuilder |
| When to use | Fixed text | Building strings (single thread) | Building strings (multi-thread) |

Q5: How many String objects are created by new String("Hello")?
**1 or 2 objects:**
1. If `"Hello"` is **not** already in the String Pool, **2 objects** are created:
   - One in the String Pool (from the literal `"Hello"`)
   - One on the heap (from the `new` keyword)
2. If `"Hello"` is **already** in the pool, **1 object** is created:
   - Only the heap object (the pool already has it)

The `new` keyword **always** creates a new object on the heap, regardless of the pool.

Q6: What does intern() do?
`intern()` returns the String Pool reference for a given string:
- If the string is already in the pool, it returns the existing reference.
- If not, it adds the string to the pool and returns the new reference.

Q7: What is the output of this code?
```java
String s1 = "Hello";
String s2 = "Hel" + "lo";
String s3 = "Hel";
String s4 = s3 + "lo";
System.out.println(s1 == s2);
System.out.println(s1 == s4);
```
- `s1 == s2` is **true** because the compiler performs **constant folding**: `"Hel" + "lo"` is computed at compile time to `"Hello"`, which refers to the same pool object.
- `s1 == s4` is **false** because `s3 + "lo"` involves a **variable** (`s3` is not `final`, so it is not a compile-time constant). The concatenation therefore happens at runtime and creates a new object on the heap.

Middle Level (4-6 Questions)
Q1: Explain Compact Strings in Java 9+.
Before Java 9, Strings used `char[]` internally (2 bytes per character). Java 9 introduced Compact Strings:
- Strings are now backed by `byte[]` instead of `char[]`.
- A `coder` field indicates the encoding:
  - `LATIN1` (0): 1 byte per character, for strings that contain only ISO-8859-1 characters
  - `UTF16` (1): 2 bytes per character, for strings that contain any character outside Latin-1

**Impact:**
- Up to ~50% less memory for the character data of Latin-1 strings, which make up most of the text in typical ASCII-heavy applications
- No API changes: completely transparent to application code
- Can be disabled with `-XX:-CompactStrings` if it causes issues

Q2: Why should you compile regex patterns as static final fields?
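Methods like `String.matches()`, `String.split()`, and `String.replaceAll()` compile the regex pattern on **every call**. The one exception is a fast path in `split()` for simple single-character delimiters such as `,`. Compiling a regex is expensive, so hoist the `Pattern` into a `static final` field:

```java
import java.util.List;
import java.util.regex.Pattern;

class DateFilter {
    // ✅ Compiles the regex ONCE, when the class is initialized
    private static final Pattern DATE_PATTERN = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    static void process(List<String> lines) {
        // ❌ Compiles the regex once per line (10,000 times for 10,000 lines)
        for (String line : lines) {
            if (line.matches("\\d{4}-\\d{2}-\\d{2}")) { /* ... */ }
        }
        // ✅ Reuses the precompiled Pattern
        for (String line : lines) {
            if (DATE_PATTERN.matcher(line).matches()) { /* ... */ }
        }
    }
}
```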
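The same idea applies to `split()` and `replaceAll()`. Below is a minimal sketch that uses two illustrative patterns (runs of commas/semicolons, and runs of whitespace) chosen for the example. `Pattern.split()` and `Matcher.replaceAll()` are the precompiled equivalents of the `String` convenience methods:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PrecompiledPatterns {
    // Illustrative patterns: runs of commas/semicolons, and runs of whitespace
    private static final Pattern SEPARATORS = Pattern.compile("[,;]+");
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Same result as line.split("[,;]+"), without recompiling the regex on each call
    static String[] fields(String line) {
        return SEPARATORS.split(line);
    }

    // Same result as text.replaceAll("\\s+", " "), reusing the compiled Pattern
    static String collapseWhitespace(String text) {
        return WHITESPACE_RUN.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields("a,b;;c")));          // [a, b, c]
        System.out.println(collapseWhitespace("too   many \t spaces")); // too many spaces
    }
}
```

A compiled `Pattern` is immutable and safe to share across threads. Each `matcher()` call creates a fresh, single-threaded `Matcher`.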
Q3: How does invokedynamic improve String concatenation in Java 9+?
Java 8 compiled `a + b` to `new StringBuilder().append(a).append(b).toString()`. Java 9+ compiles it to an `invokedynamic` call that delegates to `StringConcatFactory`. The factory chooses the concatenation strategy at runtime:
1. **Pre-sizes the result buffer** based on the known parts.
2. **May skip StringBuilder entirely** and construct the result `byte[]` directly.
3. **Binds the strategy once per call site:** it is linked on first execution and reused afterwards.

Benefits:
- Fewer intermediate objects (less GC pressure)
- Better performance (especially for simple concatenations)
- Future JVM versions can improve the strategy without recompiling code

Q4: What is the difference between trim() and strip() in Java 11+?
| Method | Characters Removed | Java Version |
|--------|--------------------|--------------|
| `trim()` | All characters <= `\u0020` (space and ASCII control characters) | Java 1.0+ |
| `strip()` | Whitespace as defined by `Character.isWhitespace()`, including Unicode spaces | Java 11+ |

```java
String s = "\u2003Hello\u2003"; // Em-space (Unicode)
System.out.println(s.trim().equals("Hello"));  // false: trim doesn't recognize em-space
System.out.println(s.strip().equals("Hello")); // true: strip handles Unicode whitespace
// Also available:
" hello ".stripLeading();  // "hello "
" hello ".stripTrailing(); // " hello"
```
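The two methods also differ at the edges of their definitions. In this small sketch (the sample strings are illustrative), `Character.isWhitespace()` excludes no-break spaces such as `'\u00A0'`, so `strip()` keeps them. `trim()`, in contrast, removes control characters like `'\u0000'` that are not whitespace at all:

```java
class TrimVsStrip {
    public static void main(String[] args) {
        String nbsp = "\u00A0Hello"; // leading U+00A0 NO-BREAK SPACE
        String nul = "\u0000Hello";  // leading U+0000, a control char below U+0020

        // isWhitespace() excludes no-break spaces, so strip() leaves U+00A0 in place
        System.out.println(Character.isWhitespace('\u00A0')); // false
        System.out.println(nbsp.strip().equals(nbsp));         // true
        // trim() compares code values only: U+00A0 is above U+0020, so it stays too
        System.out.println(nbsp.trim().equals(nbsp));          // true

        // trim() removes every char <= U+0020, including non-whitespace control chars
        System.out.println(nul.trim().equals("Hello"));        // true
        System.out.println(nul.strip().equals("Hello"));       // false
    }
}
```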
Q5: Explain String deduplication in G1 GC.
G1 GC String deduplication (enabled with `-XX:+UseStringDeduplication`) works as follows:
1. During young-generation GC, the collector identifies String objects as deduplication candidates.
2. A background thread computes a hash of each candidate's internal `byte[]` array.
3. If another String with the same content exists, both Strings are updated to share the same `byte[]` array.
4. The duplicate `byte[]` becomes eligible for garbage collection.

**Key points:**
- It deduplicates the underlying `byte[]`, not the String objects themselves: two String objects still exist, but they share one array.
- The deduplication work runs concurrently on a low-priority background thread rather than inside the GC pause.
- Only strings that have survived a number of GCs are candidates. The age is configurable with `-XX:StringDeduplicationAgeThreshold` (default 3), so short-lived strings are skipped.
- No API or behavior change: completely transparent.

**When to use:** applications with many duplicate strings loaded from external sources (databases, files, network).

Q6: What are the security implications of storing passwords as Strings?
Strings are a **security risk for sensitive data** because:
1. **Immutability:** you cannot overwrite the contents, so the password remains in memory until it is garbage collected.
2. **String Pool:** if interned, the password may stay in memory far longer than the code that used it.
3. **Heap dumps:** memory dumps expose all string values in plain text.
4. **Core dumps:** OS-level crashes can write heap contents to disk.

**Best practice:** use `char[]`, which you can overwrite (for example with `Arrays.fill(password, '\0')`) as soon as the password is no longer needed. This is why `JPasswordField.getPassword()` returns `char[]`, not `String`.

Senior Level (4-6 Questions)
Q1: How would you architect a high-throughput text processing pipeline in Java?
Key architectural decisions:
1. **Decode late:** keep data as `byte[]` or `ByteBuffer` as long as possible, and decode to String only when text operations are needed.
2. **Use `CharSequence` in APIs:** accept `CharSequence` instead of `String` so callers can pass `StringBuilder`, `CharBuffer`, etc. without conversion.
3. **Pool builders:** Use `ThreadLocal`…

Q2: Explain the Rope data structure and when you would use it over String.
A **Rope** is a balanced binary tree whose leaf nodes contain short string fragments. It provides:

| Operation | String | Rope |
|-----------|--------|------|
| Concatenation | O(N): copy the entire array | O(1): create a new branch node |
| Insertion at position | O(N): shift and copy | O(log N): split and rejoin |
| Deletion | O(N): copy the remaining characters | O(log N): restructure the tree |
| charAt | O(1) | O(log N) |
| toString | O(1): return self | O(N): traverse and collect |

**When to use a Rope:**
- Text editors with frequent insert/delete operations
- Document processing systems
- Very large strings (MB+) with many modifications

**When NOT to use:**
- Small strings: the overhead of the tree structure exceeds the benefit
- Read-heavy workloads: `charAt()` is O(log N) vs O(1)
- Applications that need the full String API

Q3: How does the JIT compiler optimize String.equals()?
The HotSpot JIT compiler intrinsifies `String.equals()`. The cheap checks run first, and the byte comparison is replaced with hand-tuned machine code:
1. **Reference check:** if `this == other`, return `true` immediately.
2. **Type check:** if `other` is not a String, return `false`.
3. **Coder check:** compare the `coder` fields (a LATIN1 vs UTF16 mismatch means not equal).
4. **Length check:** compare `value.length` (different lengths mean not equal).
5. **Vectorized comparison:** use SIMD instructions (SSE4.2 / AVX2) to compare 16-32 bytes at a time.

On x86_64 with AVX2:

This makes `equals()` significantly faster than the pure Java implementation, especially for long strings.

Q4: What are the implications of StringTable sizing for application performance?
The `StringTable` (the JVM's intern table) is a hash table with a fixed number of buckets. Performance implications:

**Too small** for the number of interned strings (the default is 65536 buckets):
- Many hash collisions lead to long chains.
- `intern()` degrades from O(1) to O(N) per lookup.
- Constant pool resolution during class loading slows down.

**Too large:**
- Wastes native memory.
- Cache-unfriendly if mostly empty.

**Tuning approach:** set the bucket count with `-XX:StringTableSize=<N>` and inspect bucket statistics with `-XX:+PrintStringTableStatistics`.

**Java 15+:** StringTable supports concurrent resizing, making manual tuning less critical.

**Key metric:** if the maximum bucket size is greater than 10, the table is too small.

Q5: How would you diagnose and fix a memory issue caused by String retention?
**Diagnosis steps:**
1. **Heap dump analysis:**

```bash
jmap -dump:live,format=b,file=heap.hprof <pid>
# Open in Eclipse MAT or VisualVM
# Look for: String dominator tree, retained size
```
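Once the dump shows many equal String instances (for example repeated status or country codes), a common fix is to canonicalize them. Below is a minimal sketch of a `ConcurrentHashMap`-based registry along the lines of the `StringRegistry` mentioned in Scenario 1. The class and its methods are hypothetical application code, not a JDK or library API, and the sketch assumes the set of distinct values is small and bounded:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical application-level canonicalizer; not a JDK or library class.
final class StringRegistry {
    private final ConcurrentHashMap<String, String> canonical = new ConcurrentHashMap<>();

    // Returns one shared instance per distinct value; the first instance seen wins.
    String canonicalize(String value) {
        if (value == null) {
            return null;
        }
        String existing = canonical.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    int size() {
        return canonical.size();
    }

    public static void main(String[] args) {
        StringRegistry registry = new StringRegistry();
        // new String(...) simulates equal values decoded separately, e.g. from different rows
        String a = registry.canonicalize(new String("ACTIVE"));
        String b = registry.canonicalize(new String("ACTIVE"));
        System.out.println(a == b);          // true: both callers now share one instance
        System.out.println(registry.size()); // 1
    }
}
```

The map never evicts, so it only suits low-cardinality fields. Unlike `String.intern()`, its memory is plain heap that you own and can size or clear.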
Scenario-Based Questions (3-5)
Scenario 1: You notice your microservice has high memory usage. Heap dump shows 40% of heap is String objects with many duplicates. How do you fix this?
**Immediate actions:**
1. Enable G1 String Deduplication (`-XX:+UseG1GC -XX:+UseStringDeduplication`). This requires no code changes and can reduce the memory held by duplicate strings by roughly 20-40%, depending on the workload.
2. Profile the source of duplicates.

**Code-level fixes based on source:**
- **Database queries returning repeated values:** use a `StringRegistry` (ConcurrentHashMap-based canonicalization) for columns like status and country_code.
- **JSON deserialization:** configure Jackson to intern field names.
- **Logging:** switch to parameterized logging and verify log levels.

**Verify impact:**

Scenario 2: Your REST API endpoint that generates a CSV export is running out of memory for large datasets. The current code builds the entire CSV as a String before returning. How do you fix it?
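**Problem:** Building a complete CSV String for 1M rows can consume gigabytes of heap.

**Solution:** stream the response instead of buffering it:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.Stream;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody;

@GetMapping(value = "/export", produces = "text/csv")
public ResponseEntity<StreamingResponseBody> exportCsv() {
    StreamingResponseBody stream = outputStream -> {
        // BufferedWriter writes to the response whenever its buffer fills,
        // so only one buffer's worth of CSV is held in memory at a time
        BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(outputStream, StandardCharsets.UTF_8));
        writer.write("id,name,email\n");
        // Stream rows from the database instead of loading them all
        // (userRepository.streamAll(), User, and escape() are application code, not shown)
        try (Stream<User> users = userRepository.streamAll()) {
            users.forEach(user -> {
                try {
                    writer.write(user.getId() + "," +
                            escape(user.getName()) + "," +
                            escape(user.getEmail()) + "\n");
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        // Push any remaining buffered data to the response
        writer.flush();
    };
    return ResponseEntity.ok()
            .header("Content-Disposition", "attachment; filename=export.csv")
            .body(stream);
}
```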
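The controller calls an `escape()` helper that the snippet does not define. Below is a minimal sketch that assumes RFC 4180-style quoting: a field containing a comma, a double quote, or a line break is wrapped in double quotes, and embedded quotes are doubled. The `Csv` class name is hypothetical; the controller could use it through a static import or a private method:

```java
// Hypothetical helper for the escape(...) calls in the controller; RFC 4180-style quoting.
final class Csv {
    private Csv() {}

    static String escape(String field) {
        if (field == null) {
            return ""; // write nulls as empty fields
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // Wrap in double quotes and double any embedded quotes
        return '"' + field.replace("\"", "\"\"") + '"';
    }
}
```

Escaping each field as it is written keeps the streaming property: no row is ever held longer than it takes to write it.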
Scenario 3: A developer on your team proposes using String.intern() for all user inputs to "save memory." What is your response?
**This is a bad idea.** Here is why:
1. **User inputs are mostly unique:** interning unique strings saves nothing, because there are no duplicates to share. It also adds lookup overhead (~800ns per `intern()` call for new strings).
2. **StringTable growth:** the intern pool is a JVM-wide hash table. Flooding it with millions of unique strings causes:
   - Long hash chains, so lookups become O(N) instead of O(1)
   - Entries that are only cleaned up lazily, after the strings become unreachable
   - Slower `intern()` calls across the whole application (contention on the shared table)
3. **Security risk:** interned user input persists in memory longer than necessary.

**Better alternatives:**
- For high-repetition fields (status codes, country codes): use application-level `ConcurrentHashMap`…

FAQ
Q: Is String concatenation with + always bad?
A: No. For simple expressions like `String msg = "Hello " + name + "!"`, the compiler optimizes the concatenation efficiently (especially with Java 9+ invokedynamic). It is only problematic inside loops, where each iteration creates a new String and copies everything accumulated so far, for O(N²) total work.
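Here is a minimal sketch of the loop case. The class and method names are illustrative. It contrasts `+=` inside a loop with a single `StringBuilder`:

```java
import java.util.List;

class JoinInLoop {
    // ❌ Each += allocates a new String and copies everything accumulated so far (quadratic)
    static String slow(List<String> parts) {
        String result = "";
        for (String part : parts) {
            result += part + ",";
        }
        return result;
    }

    // ✅ One growable buffer; total copying stays linear (amortized)
    static String fast(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "b", "c");
        System.out.println(slow(parts));                     // a,b,c,
        System.out.println(slow(parts).equals(fast(parts))); // true
    }
}
```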
Q: When should I use StringBuffer over StringBuilder?
A: Almost never. StringBuffer was the only option before Java 5. Today, use StringBuilder for single-threaded code (99% of cases). Use StringBuffer only when multiple threads are appending to the same builder — but even then, consider collecting per-thread and merging.
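The collect-per-thread-and-merge approach can be sketched with `Collectors.joining()` on a parallel stream. Each worker accumulates its own partial result, and the collector merges the pieces in encounter order. No shared `StringBuffer` is needed, and there is no locking on the append path. The class name and the number-joining task are illustrative:

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelJoin {
    // Each parallel subtask builds its own partial string; joining() merges them in order
    static String joinInParallel(int n) {
        return IntStream.range(0, n)
                .parallel()
                .mapToObj(Integer::toString)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(joinInParallel(5)); // 0,1,2,3,4
    }
}
```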
Q: Does substring() share the underlying array?
A: Before Java 7u6, `substring()` shared the parent String's `char[]` array. This could cause memory leaks, because a small substring kept a huge parent array alive. Since Java 7u6, `substring()` copies the characters into a new array, eliminating this issue.
Q: How does String.format() compare to + concatenation performance-wise?
A: String.format() is 3-5x slower than + concatenation because it parses the format string at runtime. Use it for readability when performance is not critical. For high-performance code, use StringBuilder.
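Here is a small sketch of the three forms producing the same text. The names and values are illustrative. Only `String.format()` has to parse a format string each time it runs. `Locale.ROOT` is passed so that the `%d` output does not depend on the default locale:

```java
import java.util.Locale;

class FormatVsConcat {
    // Parses the format string on every call
    static String viaFormat(String name, int count) {
        return String.format(Locale.ROOT, "%s has %d items", name, count);
    }

    // Compiled to a direct concatenation (invokedynamic on Java 9+)
    static String viaPlus(String name, int count) {
        return name + " has " + count + " items";
    }

    // Explicit builder: the form to reach for when appending inside a loop
    static String viaBuilder(String name, int count) {
        return new StringBuilder().append(name).append(" has ")
                .append(count).append(" items").toString();
    }

    public static void main(String[] args) {
        System.out.println(viaFormat("Ada", 3));                           // Ada has 3 items
        System.out.println(viaPlus("Ada", 3).equals(viaBuilder("Ada", 3))); // true
    }
}
```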
Q: What is the maximum String length in Java?
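A: Theoretically, `Integer.MAX_VALUE` (2^31 - 1, about 2.1 billion) characters, because `length()` returns an `int`. Since Java 9, a String that contains any non-Latin-1 character is stored as UTF-16, using 2 bytes per character in its `byte[]`. That caps such strings at about half the limit, roughly 1.07 billion characters. In practice, the limit is available heap memory: a ~2-billion-character Latin-1 String needs about 2 GB for its backing array alone, and a UTF-16 String needs 2 bytes per character.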