Entities — Professional¶
What? The JPA/Hibernate machinery a working backend engineer must control: @Id and four @GeneratedValue strategies (IDENTITY, SEQUENCE, AUTO, UUID), surrogate vs. natural key in real schemas, optimistic locking via @Version, soft delete with @SQLDelete and @Where, and the architecturally important question of whether to keep the domain entity and the JPA entity as one class or two.
How? By choosing each annotation deliberately (knowing what SQL Hibernate will emit, when ids are assigned, what happens on flush, what optimistic locking actually checks) and by separating domain concerns from persistence concerns when the codebase justifies the cost.
1. @Id and @GeneratedValue strategies in depth¶
@Entity
public class Customer {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
}
Four strategies are covered here (GenerationType.TABLE also exists but is left out). Each has different timing and SQL behaviour:
a. IDENTITY — column auto-increment¶
- DB column is BIGINT AUTO_INCREMENT (MySQL), BIGSERIAL (Postgres), or IDENTITY (SQL Server, H2).
- Id is assigned after the INSERT: Hibernate must execute the INSERT immediately on persist() to get the id back. This disables JDBC batch inserts for that entity.
- Simple, fast for low-volume writes, awkward at scale.
b. SEQUENCE — database sequence¶
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "cust_seq")
@SequenceGenerator(name = "cust_seq", sequenceName = "customer_seq", allocationSize = 50)
private Long id;
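- Hibernate pre-fetches allocationSize ids in one round-trip and assigns them client-side.
- Id is known before INSERT, so batch inserts work.
- Native on Postgres and Oracle; emulated with a slower table-based structure on databases without sequences (for example MySQL).
- allocationSize = 50 means one sequence call per 50 entities (huge throughput win), provided the database sequence is created with a matching INCREMENT BY 50.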
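The pre-fetching described above can be sketched without a database. This is a simplified model, not Hibernate's actual optimizer: FakeSequence and PooledIdAllocator are hypothetical names, and the sequence is assumed to start at 1 with INCREMENT BY equal to allocationSize.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a database sequence created with INCREMENT BY n.
final class FakeSequence {
    private final int incrementBy;
    private long current;
    int roundTrips = 0; // how many times the "database" was asked for a value

    FakeSequence(long startWith, int incrementBy) {
        this.incrementBy = incrementBy;
        this.current = startWith - incrementBy;
    }

    // One call models one "SELECT nextval(...)" round-trip.
    long nextVal() {
        roundTrips++;
        current += incrementBy;
        return current;
    }
}

// Simplified client-side allocator: each sequence value opens a block of ids.
final class PooledIdAllocator {
    private final FakeSequence sequence;
    private final int allocationSize;
    private long next = 0;
    private long blockEnd = 0; // exclusive upper bound of the current block

    PooledIdAllocator(FakeSequence sequence, int allocationSize) {
        this.sequence = sequence;
        this.allocationSize = allocationSize;
    }

    long nextId() {
        if (next >= blockEnd) {            // block exhausted: one round-trip
            next = sequence.nextVal();
            blockEnd = next + allocationSize;
        }
        return next++;                     // otherwise assigned purely in memory
    }

    // Convenience for the demo: allocate ids for a batch of new entities.
    List<Long> allocate(int count) {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) ids.add(nextId());
        return ids;
    }
}
```

With allocationSize = 50, allocating 120 ids in this model costs 3 sequence calls rather than 120, and every id is known before any INSERT is sent.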
c. AUTO — let the provider decide¶
- Hibernate picks based on dialect. On Postgres before Hibernate 6, it meant SEQUENCE backed by a single shared hibernate_sequence; that legacy default has bitten many teams. Avoid AUTO in new code; be explicit.
d. UUID — application-generated¶
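- Hibernate assigns at persist() time (or you can pre-assign yourself).
- Available as @org.hibernate.annotations.UuidGenerator on Hibernate 6. Note that style = TIME produces a time-based version 1 UUID, not UUID v7; v7 needs a Hibernate release whose UuidGenerator offers a version 7 style, or a custom generator.
- 128 bits per id; mitigate with a binary representation (for example a BINARY(16) column) or with the Postgres-native UUID type, which stores it as 16 bytes natively.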
| Strategy | Id known before INSERT? | Batch insert? | Default index locality | Notes |
|---|---|---|---|---|
| IDENTITY | No | No | Excellent | Disables batching |
| SEQUENCE | Yes (pre-fetch) | Yes | Excellent | Postgres/Oracle |
| AUTO | Depends | Depends | Depends | Avoid |
| UUID v7 | Yes | Yes | Good | Best of both worlds |
| UUID v4 | Yes | Yes | Poor | Avoid for high-write tables |
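The "good index locality" of UUID v7 comes from its layout: a 48-bit Unix millisecond timestamp in the most significant bits, followed by random bits. The sketch below builds such a value by hand to make the layout visible; UuidV7 is a hypothetical helper, not a Hibernate or JDK API, and a production generator would normally also handle ordering within the same millisecond.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical helper illustrating the UUID v7 bit layout.
final class UuidV7 {
    private static final SecureRandom RANDOM = new SecureRandom();

    static UUID at(long epochMillis) {
        byte[] r = new byte[10];
        RANDOM.nextBytes(r);

        long msb = (epochMillis & 0xFFFF_FFFF_FFFFL) << 16;   // 48-bit timestamp
        msb |= 0x7000L;                                        // version nibble = 7
        msb |= ((r[0] & 0x0FL) << 8) | (r[1] & 0xFFL);         // 12 random bits

        long lsb = 0L;
        for (int i = 2; i < 10; i++) {
            lsb = (lsb << 8) | (r[i] & 0xFFL);                 // 64 random bits
        }
        lsb = (lsb & 0x3FFF_FFFF_FFFF_FFFFL) | 0x8000_0000_0000_0000L; // variant 10

        return new UUID(msb, lsb);
    }

    static UUID now() {
        return at(System.currentTimeMillis());
    }

    // 16-byte form, as stored in a BINARY(16) column.
    static byte[] toBytes(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }
}
```

Ids generated in later milliseconds sort after earlier ones, so new rows land near the end of a B-tree index instead of at random positions as with v4. Within one millisecond this sketch gives no ordering guarantee.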
2. Surrogate vs natural key — schema reality¶
A surrogate key has no meaning outside the system (a number/UUID assigned for our own convenience). A natural key is something the world already uses to identify the thing (ISBN, VIN, email).
// Surrogate primary key, natural unique constraint on top
@Entity
@Table(name = "users",
       uniqueConstraints = @UniqueConstraint(columnNames = "email"))
public class User {
    @Id private UUID id;                            // surrogate: never changes
    @Column(nullable = false) private String email; // natural: may change
}
This is the canonical professional pattern: surrogate PK + uniqueness constraint on the human identifier. You get:
- Stable foreign keys (no cascading updates when someone changes email).
- A short, indexable PK column.
- A queryable natural identifier with database-enforced uniqueness.
The mistake is using the natural key as the PK and then discovering, three years in, that it has to change. Surrogate is almost always the right call.
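Why the surrogate key keeps foreign keys stable can be shown with a small in-memory model. UserDirectory is a hypothetical class: one map plays the users table keyed by the surrogate id, the other plays the unique index on email.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical in-memory model: surrogate PK plus a unique index on email.
final class UserDirectory {
    private final Map<UUID, String> emailById = new HashMap<>(); // "users" table
    private final Map<String, UUID> idByEmail = new HashMap<>(); // unique index

    UUID register(String email) {
        if (idByEmail.containsKey(email)) {
            throw new IllegalStateException("email already taken: " + email);
        }
        UUID id = UUID.randomUUID();          // surrogate key, never changes
        emailById.put(id, email);
        idByEmail.put(email, id);
        return id;
    }

    void changeEmail(UUID id, String newEmail) {
        String old = emailById.get(id);
        if (old == null) throw new IllegalArgumentException("unknown user " + id);
        if (idByEmail.containsKey(newEmail)) {
            throw new IllegalStateException("email already taken: " + newEmail);
        }
        idByEmail.remove(old);                // only the index entry moves
        idByEmail.put(newEmail, id);
        emailById.put(id, newEmail);          // the primary key is untouched
    }

    Optional<UUID> findByEmail(String email) {
        return Optional.ofNullable(idByEmail.get(email));
    }

    String emailOf(UUID id) {
        return emailById.get(id);
    }
}
```

Anything that stored the UUID as a foreign key (an order, an audit row) still resolves after changeEmail; had the email been the primary key, each of those references would have needed an update.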
3. Optimistic locking with @Version¶
@Entity
public class Account {
    @Id private UUID id;
    @Version private long version;
    private Money balance;
}
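What Hibernate emits on flush is an UPDATE guarded by the version, of the form UPDATE account SET ..., version = ? WHERE id = ? AND version = ?.
If two transactions both loaded version = 7, both modified, and both flushed, only the first's UPDATE matches (rows affected = 1, version becomes 8). The second's UPDATE matches zero rows; Hibernate throws StaleObjectStateException, which a JPA EntityManager surfaces wrapped in OptimisticLockException (Spring translates it further to ObjectOptimisticLockingFailureException). The application typically retries.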
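// The retry wraps the transaction: the version check fails at flush/commit,
// and a transaction that has failed cannot be reused for another attempt.
// txTemplate is assumed to be an injected Spring TransactionTemplate; Spring
// reports the failure as ObjectOptimisticLockingFailureException.
public void debit(UUID accountId, Money amount) {
    int attempts = 0;
    while (true) {
        try {
            // each attempt runs in a new transaction and reloads fresh state
            txTemplate.executeWithoutResult(status -> {
                Account a = accountRepo.findById(accountId).orElseThrow();
                a.debit(amount, TransactionReason.PURCHASE);
                accountRepo.save(a);
            });
            return;
        } catch (ObjectOptimisticLockingFailureException ex) {
            if (++attempts >= 3) throw ex;
        }
    }
}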
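What the version check does, and why a retry must reload state, can be modelled without a database. This is a sketch under stated assumptions: AccountTable, VersionedRow, StaleRowException and DebitService are hypothetical names, and update(...) stands in for the guarded UPDATE statement.

```java
import java.util.HashMap;
import java.util.Map;

// Immutable snapshot of one row as a transaction saw it.
final class VersionedRow {
    final long balance;
    final long version;

    VersionedRow(long balance, long version) {
        this.balance = balance;
        this.version = version;
    }
}

final class StaleRowException extends RuntimeException {
    StaleRowException(String id) {
        super("row " + id + " was changed by another transaction");
    }
}

// Hypothetical in-memory "account" table.
final class AccountTable {
    private final Map<String, VersionedRow> rows = new HashMap<>();

    synchronized void insert(String id, long balance) {
        rows.put(id, new VersionedRow(balance, 0));
    }

    synchronized VersionedRow select(String id) {
        return rows.get(id);
    }

    // Models: UPDATE account SET balance = ?, version = version + 1
    //         WHERE id = ? AND version = ?    (returns rows affected)
    synchronized int update(String id, long newBalance, long expectedVersion) {
        VersionedRow current = rows.get(id);
        if (current == null || current.version != expectedVersion) return 0;
        rows.put(id, new VersionedRow(newBalance, expectedVersion + 1));
        return 1;
    }
}

final class DebitService {
    // One attempt: read, modify, write with the version that was read.
    static void debitOnce(AccountTable table, String id, long amount) {
        VersionedRow row = table.select(id);
        if (table.update(id, row.balance - amount, row.version) == 0) {
            throw new StaleRowException(id);
        }
    }

    // Retry loop: every attempt starts from a fresh read.
    static int debitWithRetry(AccountTable table, String id, long amount, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                debitOnce(table, id, amount);
                return attempt;
            } catch (StaleRowException ex) {
                if (attempt >= maxAttempts) throw ex;
            }
        }
    }
}
```

If two callers both hold a snapshot at version 0, the first update(...) returns 1 and the second returns 0; nothing is overwritten silently, and the loser has to re-read before trying again.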
Use @Version on every entity that gets updated, not just on a chosen few. The cost is one extra column (bigint or int) and one AND version = ? predicate per UPDATE.
Versioning columns can be:
- long/int: incrementing counter (default).
- Instant/Timestamp: last-modified time. Less reliable on systems where two updates can land in the same clock tick.
Numeric is strongly preferred.
4. Soft delete — @SQLDelete and @Where¶
For audit/compliance reasons, many systems never physically DELETE rows. Instead, they set a deleted = true (or deleted_at = NOW()) flag.
@Entity
@Table(name = "customers")
@SQLDelete(sql = "UPDATE customers SET deleted_at = NOW() WHERE id = ?")
@Where(clause = "deleted_at IS NULL")
public class Customer {
    @Id private UUID id;
    @Column(name = "deleted_at") private Instant deletedAt;
    // ...
}
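- @SQLDelete rewrites em.remove(c) into the UPDATE statement.
- @Where filters every SELECT to exclude already-deleted rows.
- Both annotations are Hibernate-specific (no JPA equivalent). @Where is deprecated since Hibernate 6.3 in favour of @SQLRestriction, and Hibernate 6.4 adds a dedicated @SoftDelete annotation.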
Caveats:
- @Where applies to associations too: eager-loading a deleted child will return null/empty, which is usually correct.
- Unique constraints on natural keys must accommodate "deleted then re-created", typically by indexing (email) WHERE deleted_at IS NULL (partial unique index in Postgres) or by appending the deleted timestamp to the unique column.
- "Restore deleted" requires raw SQL (bypassing @Where).
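The interplay of the rewritten delete, the read filter, and the partial unique index can be illustrated with an in-memory model. SoftDeleteTable and CustomerRow are hypothetical names; the comments mark which piece of the mapping each method stands in for.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class CustomerRow {
    final long id;
    final String email;
    Instant deletedAt; // null while the row is live

    CustomerRow(long id, String email) {
        this.id = id;
        this.email = email;
    }
}

// Hypothetical in-memory "customers" table with soft delete.
final class SoftDeleteTable {
    private final List<CustomerRow> rows = new ArrayList<>();
    private long nextId = 1;

    long insert(String email) {
        // Models a partial unique index: only live rows are checked.
        for (CustomerRow r : rows) {
            if (r.deletedAt == null && r.email.equals(email)) {
                throw new IllegalStateException("duplicate live email: " + email);
            }
        }
        CustomerRow row = new CustomerRow(nextId++, email);
        rows.add(row);
        return row.id;
    }

    // Models @SQLDelete: the "delete" is an UPDATE that stamps deleted_at.
    void remove(long id) {
        for (CustomerRow r : rows) {
            if (r.id == id && r.deletedAt == null) {
                r.deletedAt = Instant.now();
            }
        }
    }

    // Models the @Where filter: deleted rows are invisible to normal reads.
    Optional<CustomerRow> findByEmail(String email) {
        return rows.stream()
                .filter(r -> r.deletedAt == null && r.email.equals(email))
                .findFirst();
    }

    // What an auditor sees with raw SQL: every row, deleted or not.
    int physicalRowCount() {
        return rows.size();
    }
}
```

After a remove, the same email can be inserted again because the uniqueness check ignores deleted rows, while the old row remains physically present for audit.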
5. Domain Entity vs JPA Entity — when to split¶
Two valid choices, each appropriate in different contexts:
Single-class (domain == JPA entity)¶
package shop.order;
@Entity
@Table(name = "orders")
public class Order {
    @Id private UUID id;
    @Version private long version;
    // start in DRAFT so a freshly created order passes requireDraft()
    @Enumerated(EnumType.STRING) private OrderStatus status = OrderStatus.DRAFT;
    @OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
    @JoinColumn(name = "order_id")
    private List<LineItem> items = new ArrayList<>();

    public void addItem(ProductId p, int qty, Money price) {
        requireDraft();
        items.add(new LineItem(p, qty, price));
    }
    public void submit() {
        requireDraft();
        if (items.isEmpty()) throw new IllegalStateException("Empty order");
        this.status = OrderStatus.SUBMITTED;
    }
    private void requireDraft() {
        if (status != OrderStatus.DRAFT) throw new IllegalStateException("Not draft");
    }
}
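Pros: one file, no mapper, easy to onboard. Cons: domain depends on JPA and inherits its constraints (no-arg constructor, non-final class, mutable fields); behaviour that relies on lazy loading or flushing is hard to test without a persistence context; harder to swap persistence.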
Two-class (domain pure POJO; JPA record mirrors it)¶
// shop/order/Order.java: pure domain, no jakarta.* imports
package shop.order;
public final class Order { ... }

// shop/order/infrastructure/OrderRecord.java: JPA-only
package shop.order.infrastructure;
@Entity @Table(name = "orders")
public class OrderRecord {
    @Id private UUID id;
    @Version private long version;
    private String status;
    // ... no behaviour, only mapping
}

// shop/order/infrastructure/JpaOrderRepository.java
package shop.order.infrastructure;
public class JpaOrderRepository implements OrderRepository {
    @PersistenceContext private EntityManager em;

    public Optional<Order> find(OrderId id) {
        // em.find returns the entity or null, so wrap it before mapping
        return Optional.ofNullable(em.find(OrderRecord.class, id.value()))
                .map(OrderMapper::toDomain);
    }
    public void save(Order o) {
        OrderRecord r = OrderMapper.toRecord(o);
        em.merge(r);
    }
}
Pros: domain depends on no framework; trivial to test; persistence swappable. Cons: mapper drift (one field added, mapper forgotten); two test surfaces; slower to write.
Senior rule: start single-class. Split into two only when (a) you have evidence you need ORM-free domain tests, or (b) the JPA mapping is contorting the domain model (e.g., you need a child entity in the database for performance but it doesn't belong in the domain). Splitting prematurely is a top cause of "the boring code is in the mapper, the interesting code is buried" syndrome.
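To make the cost of the two-class variant concrete, here is a minimal sketch of the mapper it requires. The field set (id, version, status) and the constructor shape are assumptions for illustration; the classes are simplified stand-ins without JPA annotations so the sketch runs on its own.

```java
import java.util.UUID;

enum OrderStatus { DRAFT, SUBMITTED }

// Pure domain object: behaviour plus invariants, no persistence annotations.
final class Order {
    private final UUID id;
    private final long version; // carried along so optimistic locking survives the round trip
    private OrderStatus status;

    Order(UUID id, long version, OrderStatus status) {
        this.id = id;
        this.version = version;
        this.status = status;
    }

    void submit() {
        if (status != OrderStatus.DRAFT) throw new IllegalStateException("Not draft");
        status = OrderStatus.SUBMITTED;
    }

    UUID id() { return id; }
    long version() { return version; }
    OrderStatus status() { return status; }
}

// Persistence-shaped class: data only (the real one would carry @Entity etc.).
final class OrderRecord {
    UUID id;
    long version;
    String status;
}

// The mapper is where drift happens: every new field must be added twice here.
final class OrderMapper {
    static Order toDomain(OrderRecord r) {
        return new Order(r.id, r.version, OrderStatus.valueOf(r.status));
    }

    static OrderRecord toRecord(Order o) {
        OrderRecord r = new OrderRecord();
        r.id = o.id();
        r.version = o.version();
        r.status = o.status().name();
        return r;
    }
}
```

A round-trip test (record to domain and back) is a cheap guard against mapper drift: a field forgotten in either direction shows up as a mismatch.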
6. Equality and identity inside Hibernate's persistence context¶
Hibernate guarantees that within one persistence context, two loads of the same id return the same Java reference:
Customer a = em.find(Customer.class, id);
Customer b = em.find(Customer.class, id);
a == b; // true — identity map
This is the first-level cache (or identity map). It means that within a single session, == and equals agree. Across sessions, == no longer holds; you rely on equals by id.
This is also why a session-scoped entity is unsafe to leak across request threads — two threads using the same persistence context end up sharing mutable entities, and Hibernate's session is documented as not thread-safe.
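The identity-map behaviour, and the need for id-based equals across sessions, can be sketched in plain Java. FakePersistenceContext is a hypothetical stand-in for a session; the shared map plays the database, and the id is assumed to be assigned at construction so hashCode never changes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class Customer {
    private final UUID id; // assigned up front, so equals/hashCode are stable
    private final String name;

    Customer(UUID id, String name) {
        this.id = id;
        this.name = name;
    }

    UUID id() { return id; }
    String name() { return name; }

    // Equality by id only, as recommended for entities.
    @Override
    public boolean equals(Object other) {
        return other instanceof Customer && id.equals(((Customer) other).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

// Hypothetical model of one session with its first-level cache.
final class FakePersistenceContext {
    private final Map<UUID, String> database;                       // shared storage
    private final Map<UUID, Customer> identityMap = new HashMap<>(); // per session

    FakePersistenceContext(Map<UUID, String> database) {
        this.database = database;
    }

    Customer find(UUID id) {
        Customer cached = identityMap.get(id);
        if (cached != null) return cached;       // same reference on repeat loads
        String name = database.get(id);
        if (name == null) return null;           // like em.find: null when absent
        Customer loaded = new Customer(id, name);
        identityMap.put(id, loaded);
        return loaded;
    }
}
```

Two finds in one context return the same reference; a find in a second context returns a different object that is still equal by id, which is what makes entities safe to compare or put in sets across sessions.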
7. Cascade and orphan-removal — entity-level decisions¶
@OneToMany(
    mappedBy = "order",
    cascade = CascadeType.ALL,
    orphanRemoval = true,
    fetch = FetchType.LAZY)
private List<LineItem> items = new ArrayList<>();
- CascadeType.PERSIST: saving the parent saves new children.
- CascadeType.MERGE: merging the parent merges children.
- CascadeType.REMOVE: deleting the parent deletes children.
- CascadeType.ALL: all of the above plus refresh/detach.
- orphanRemoval = true: removing a child from the collection issues DELETE for that row.
The combination ALL + orphanRemoval = true matches the aggregate ownership semantics: child entities live and die with their parent. This is exactly the JPA-level encoding of "non-root entity inside an aggregate" from senior-level DDD.
Cascade across aggregate boundaries (Customer ↔ Order) is a design mistake; cross-aggregate references should be by id, with the other aggregate loaded separately via its own repository.
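The effect of CascadeType.ALL with orphanRemoval = true on an owned collection can be pictured as a diff between the collection as loaded and the collection at flush time. The sketch below is a deliberately simplified model, not Hibernate's flush algorithm; FlushPlanner, the table name line_item, and the generated statements are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class LineItem {
    final long id;
    final String sku;

    LineItem(long id, String sku) {
        this.id = id;
        this.sku = sku;
    }
}

// Hypothetical planner: compares the loaded snapshot with the current state.
final class FlushPlanner {
    static List<String> plan(List<LineItem> snapshot, List<LineItem> current) {
        List<String> statements = new ArrayList<>();

        // orphanRemoval: a child no longer in the collection is deleted.
        for (LineItem old : snapshot) {
            if (!current.contains(old)) {
                statements.add("DELETE FROM line_item WHERE id = " + old.id);
            }
        }
        // cascade PERSIST: a child newly added to the collection is inserted.
        for (LineItem item : current) {
            if (!snapshot.contains(item)) {
                statements.add("INSERT INTO line_item (id, sku) VALUES ("
                        + item.id + ", '" + item.sku + "')");
            }
        }
        return statements;
    }
}
```

Application code only edits the parent's list; the children never need their own repository, which is the aggregate-ownership rule expressed at the persistence level.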
8. Auditing — @CreatedDate, @LastModifiedDate, @Version¶
@Entity
@EntityListeners(AuditingEntityListener.class)
public class Customer {
    @Id private UUID id;
    @Version private long version;

    @CreatedDate
    @Column(updatable = false)
    private Instant createdAt;

    @LastModifiedDate
    private Instant updatedAt;

    @CreatedBy
    @Column(updatable = false)
    private String createdBy;

    @LastModifiedBy
    private String lastModifiedBy;
}
Spring Data's auditing populates these automatically when paired with @EnableJpaAuditing; @CreatedBy and @LastModifiedBy additionally need an AuditorAware bean that supplies the current user. They are operational metadata, not domain attributes, and belong on every persistent entity, much like @Version.
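What an auditing listener does on create and on update can be sketched in plain Java. This is a model of the behaviour, not Spring Data's implementation: AuditingListener and AuditedCustomer are hypothetical names, the Clock supplies the time, and the Supplier plays the role of the current-user lookup.

```java
import java.time.Clock;
import java.time.Instant;
import java.util.function.Supplier;

final class AuditedCustomer {
    Instant createdAt;
    Instant updatedAt;
    String createdBy;
    String lastModifiedBy;
}

// Hypothetical listener: fills audit fields so domain code never has to.
final class AuditingListener {
    private final Clock clock;
    private final Supplier<String> currentUser;

    AuditingListener(Clock clock, Supplier<String> currentUser) {
        this.clock = clock;
        this.currentUser = currentUser;
    }

    // Called before the first INSERT: sets all four fields.
    void onCreate(AuditedCustomer entity) {
        Instant now = clock.instant();
        String user = currentUser.get();
        entity.createdAt = now;
        entity.createdBy = user;
        entity.updatedAt = now;
        entity.lastModifiedBy = user;
    }

    // Called before each UPDATE: creation fields stay as they were.
    void onUpdate(AuditedCustomer entity) {
        entity.updatedAt = clock.instant();
        entity.lastModifiedBy = currentUser.get();
    }
}
```

Injecting the Clock keeps the behaviour testable with a fixed instant, and the creation fields are never touched again after onCreate, mirroring updatable = false on the columns.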
9. Quick rules¶
- Choose one id strategy per codebase. UUID v7 default; SEQUENCE for legacy Long schemas; never AUTO.
- Surrogate PK + unique constraint on natural identifier: almost always the right call.
- @Version on every entity that gets UPDATEs. Numeric, not timestamp.
- Soft delete via @SQLDelete + @Where; index unique columns partially to allow recreation.
- Start single-class (domain == JPA entity); split only on evidence.
- @OneToMany for owned children: CascadeType.ALL + orphanRemoval = true + LAZY is the default for aggregate-internal entities.
- Never cascade across aggregate boundaries; reference other aggregates by id.
- equals/hashCode by id only; constant hashcode while id is null; UUID at construction eliminates that whole problem.
- IDENTITY disables batch inserts; for write-heavy tables prefer SEQUENCE or UUID.
- Spring Data auditing fields (@CreatedDate, @LastModifiedDate) plus @Version belong on every entity by default.
10. What's next¶
| Topic | File |
|---|---|
| Formal Entity contract: JLS/JPA spec references | specification.md |
| Bugs around @Id, @Version, Lombok @Data, lazy proxies | find-bug.md |
| Hibernate caches, dirty checking, batch hints | optimize.md |
| Hands-on exercises | tasks.md |
| Interview Q&A | interview.md |
| Value Objects: what JPA @Embeddable is for | ../01-value-objects/ |
| Aggregates: entity clusters with one root | ../03-aggregates/ |
| Repositories: how aggregate roots are loaded and stored | ../04-repository-concept/ |
Memorize this: at the JPA layer, an entity is @Id + an id strategy + @Version + behaviour-bearing methods + the persistence annotations that mirror your aggregate ownership. Pick UUID v7 (or SEQUENCE with allocationSize) over IDENTITY whenever you can; always carry a surrogate PK with the human identifier as a unique constraint; soft-delete via @SQLDelete/@Where; audit via Spring Data listeners. Start with one class that's both domain and JPA entity; split only when the evidence justifies the cost. The JPA professional knows what SQL each annotation emits, when ids are assigned, what optimistic locking actually checks, and why AUTO is a footgun. Vlad Mihalcea's High-Performance Java Persistence (2016, 2nd ed.) is the canonical reference.