Today, I opened our error logs and was welcomed by 101 brand new exceptions on our production system. 😱
Surprise exceptions on prod? You can probably imagine my terror.
As usual, I checked out the stacktrace. What should I say - I was a bit puzzled.
Unexpected exception with 'Could not commit JPA transaction; nested exception is javax.persistence.RollbackException: Error while committing the transaction'
org.springframework.transaction.TransactionSystemException: Could not commit JPA transaction; nested exception is javax.persistence.RollbackException: Error while committing the transaction
at org.springframework.orm.jpa.JpaTransactionManager.doCommit(JpaTransactionManager.java:571)
at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:711)
at org.springframework.transaction.interceptor.TransactionAspectSupport.commitTransactionAfterReturning(TransactionAspectSupport.java:654)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:407)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:119)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:753)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:698)
at com.aitme.ucs.module.unit.core.service.AcceptUnitHeartbeatService$$EnhancerBySpringCGLIB$$d550b74d.execute(<generated>)
....
Caused by: java.lang.NullPointerException
at org.hibernate.type.LongType.next(LongType.java:62)
at org.hibernate.type.LongType.next(LongType.java:23)
at org.hibernate.engine.internal.Versioning.increment(Versioning.java:92)
...
“Could not commit JPA transaction”? Uhm… Okay.
Luckily, there was a small hint. It had something to do with a NullPointerException
, thrown in hibernate’s Versioning
class. The actual reason was something inside of our AcceptUnitHeartbeatService.execute
method (yes I know it’s a bad name). This is was a bit unintuitive, because of spring’s AOP proxy generated by the @Transactional
annotation.
I went looking for the cause. Why did this suddenly happen? Why did it not happen in the development environment? What code change caused this issue? I was recently working on this class and did some changes on repositories, but these changes had been deployed for weeks. So my changes probably weren’t the issue here. I went back and checked the exception cause again. Ok, Versioning
was the culprit.
The entity that’s modified here is the representation of one of our robots. It has a version
field annotated with @Versioned
. Is versioning suddenly broken? Did we update hibernate to a newer version? And if so, why a NullPointerException at org.hibernate.type.LongType.next
? The next
-function looks like this:
public Long next(Long current, SharedSessionContractImplementor session) {
return current + 1L;
}
The only thing that could have caused a NullPointer here is current
being null
. Does that mean that there is a versioned entity with a null
version in the database? How is this possible? Is our data corrupted or did somebody modify the data (shouldn’t this be impossible on prod)?
It turns out, it was way simpler. Today, somebody turned on one of the older prototype robots. It was happily sending heartbeat events, but it did have a null
version entry in it’s database row. The simple reason this never happened until today is that this robot has never been live in production before.
And what’s the moral of the story you ask?
First, when you encounter a strange bug, check the whole environment. This goes for other services as well for the people that use your services (or robots). Second, it’s a good idea to have a not-null constraint on your database table’s version column. Oh yeah, and obviously: Select (or in this case, @Versioned
) isn’t broken.