[18.0] Add 'Allow Commit' option on job functions by guewen · Pull Request #899 · OCA/queue

guewen · 2026-02-19T15:01:00Z

It is forbidden to commit inside a job, because it releases the job lock and can cause it to start again, while still being run, by the dead jobs requeuer. For some use cases, it may actually be legitimate, or at least be needed in the short term before actual updates in the code.

A new option on the job function, false by default, allow to run the job in a new transaction, at the cost of an additional connection + transaction overhead.

Related to #889

OCA-git-bot · 2026-02-19T15:01:05Z

Hi @sbidoul,
some modules you are maintaining are being modified, check this out!

queue_job/controllers/main.py

sbidoul · 2026-02-19T15:28:58Z

queue_job/models/queue_job_function.py

        "enable, func_name, kwargs.\n"
        "See the module description for details.",
    )
+    run_in_new_cursor = fields.Boolean(


Would it make sense to name this option with something that conveys "allow commit in job". Not sure.

I'll do that, just allow_commit should be enough?

amh-mw · 2026-02-19T15:24:37Z

queue_job/models/queue_job_function.py

            related_action_func_name=None,
            related_action_kwargs={},
            job_function_id=None,
+            run_in_new_cursor=False,


In the interest of magically un-breaking other modules that depend on queue_job, perhaps run_in_new_cursor could default to True for 18.0 and then back to False for 19.0?

Or get the default value from a system parameter (False by default) ?

System parameter that is False by default but with a migration that sets it to True automatically on existing databases?

We could modify _prevent_commit to update the job function option when the error happens, that would give the best of both worlds, but might be a bit too much; also I fear it could create other problems with several jobs attempting to update the same job function.

System parameter that is False by default but with a migration that sets it to True automatically on existing databases?

Why not. I'd be fine with breaking by default with a message explaining all the solutions and tradeoff, in order to raise awareness, but if people prefer to not break by default, good for me too.

We could modify _prevent_commit to update the job function option when the error happens, that would give the best of both worlds, but might be a bit too much; also I fear it could create other problems with several jobs attempting to update the same job function.

Sounds overkill to me too.

I have mixed feelings about the 2 options. My preference would go about breaking by default with an explanation, because otherwise nobody will care and databases will have the option enabled on all jobs forever. On the other side, I'm sure that in many situations nobody monitor the jobs and the issue will be discovered after some time, maybe after real issues due to non-running jobs (well there is a point about accountability regarding monitoring errors and jobs here).

I updated the PR yesterday evening with a system parameter and migration to allow commits by default, but I tend to favor breaking by default after a night sleep, so I'm tempted to remove it. We could add a link in the error message to a page such as this draft: https://github.com/OCA/queue/wiki/%5BDRAFT%5D-Upgrade-warning:-commits-inside-jobs

What do you think @amh-mw @sbidoul ?

My preference would go about breaking by default with an explanation, because otherwise nobody will care and databases will have the option enabled on all jobs forever. On the other side, I'm sure that in many situations nobody monitor the jobs and the issue will be discovered after some time, maybe after real issues due to non-running jobs (well there is a point about accountability regarding monitoring errors and jobs here).

Yeah, doubling the number of database cursors used for all jobs is not the best. 😓 I also lean towards breaking by default and removing the config parameter. Make sure to semver major!

I updated with a link to the explanation

queue_job/job.py

It is forbidden to commit inside a job, because it releases the job lock and can cause it to start again, while still being run, by the dead jobs requeuer. For some use cases, it may actually be legitimate, or at least be needed in the short term before actual updates in the code. A new option on the job function, false by default, allow to run the job in a new transaction, at the cost of an additional connection + transaction overhead. Related to OCA#889

False on new databases, True on existing databases. Should always be False by default on future versions.

This reverts commit b4f3bec.

queue_job/job.py

As the controller changes env on Job instances.

amh-mw

During functional testing, encountered error:

web-1    | 2026-02-24 13:23:47,685 80 ERROR odoo odoo.http: Exception during request handling. 
web-1    | Traceback (most recent call last):
web-1    |   File "/usr/lib/python3/dist-packages/odoo/sql_db.py", line 364, in execute
web-1    |     res = self._obj.execute(query, params)
web-1    |           ^^^^^^^^^
web-1    |   File "/usr/lib/python3/dist-packages/odoo/sql_db.py", line 510, in __getattr__
web-1    |     raise psycopg2.InterfaceError("Cursor already closed")
web-1    | psycopg2.InterfaceError: Cursor already closed
web-1    | 
web-1    | During handling of the above exception, another exception occurred:
web-1    | 
web-1    | Traceback (most recent call last):
web-1    |   File "/usr/lib/python3/dist-packages/odoo/http.py", line 2578, in __call__
web-1    |     response = request._serve_db()
web-1    |                ^^^^^^^^^^^^^^^^^^^
web-1    |   File "/usr/lib/python3/dist-packages/odoo/http.py", line 2105, in _serve_db
web-1    |     return self._transactioning(
web-1    |            ^^^^^^^^^^^^^^^^^^^^^
web-1    |   File "/usr/lib/python3/dist-packages/odoo/http.py", line 2168, in _transactioning
web-1    |     return service_model.retrying(func, env=self.env)
web-1    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1    |   File "/usr/lib/python3/dist-packages/odoo/service/model.py", line 157, in retrying
web-1    |     result = func()
web-1    |              ^^^^^^
web-1    |   File "/usr/lib/python3/dist-packages/odoo/http.py", line 2135, in _serve_ir_http
web-1    |     response = self.dispatcher.dispatch(rule.endpoint, args)
web-1    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1    |   File "/usr/lib/python3/dist-packages/odoo/http.py", line 2296, in dispatch
web-1    |     return self.request.registry['ir.http']._dispatch(endpoint)
web-1    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1    |   File "/usr/lib/python3/dist-packages/odoo/addons/base/models/ir_http.py", line 333, in _dispatch
web-1    |     result = endpoint(**request.params)
web-1    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1    |   File "/usr/lib/python3/dist-packages/odoo/http.py", line 756, in route_wrapper
web-1    |     result = endpoint(self, *args, **params_ok)
web-1    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
web-1    |   File "/mnt/extra-addons/oca/queue/queue_job/controllers/main.py", line 221, in runjob
web-1    |     self._runjob(env, job)
web-1    |   File "/mnt/extra-addons/oca/queue/queue_job/controllers/main.py", line 190, in _runjob
web-1    |     cls._enqueue_dependent_jobs(env, job)
web-1    |   File "/mnt/extra-addons/oca/queue/queue_job/controllers/main.py", line 118, in _enqueue_dependent_jobs
web-1    |     with job.env.cr.savepoint():
web-1    |          ^^^^^^^^^^^^^^^^^^^^^^
web-1    |   File "/usr/lib/python3/dist-packages/odoo/sql_db.py", line 189, in savepoint
web-1    |     return _FlushingSavepoint(self)
web-1    |            ^^^^^^^^^^^^^^^^^^^^^^^^
web-1    |   File "/usr/lib/python3/dist-packages/odoo/sql_db.py", line 126, in __init__
web-1    |     super().__init__(cr)
web-1    |   File "/usr/lib/python3/dist-packages/odoo/sql_db.py", line 101, in __init__
web-1    |     cr.execute('SAVEPOINT "%s"' % self.name)
web-1    |   File "/usr/lib/python3/dist-packages/odoo/sql_db.py", line 367, in execute
web-1    |     _logger.error("bad query: %s\nERROR: %s", self._obj.query or query, e)
web-1    |                                               ^^^^^^^^^
web-1    |   File "/usr/lib/python3/dist-packages/odoo/sql_db.py", line 510, in __getattr__
web-1    |     raise psycopg2.InterfaceError("Cursor already closed")
web-1    | psycopg2.InterfaceError: Cursor already closed

I have not managed to chase down the origin of this error yet, but I will quickly note that my development environment...

makes heavy use of chain and group to order jobs
contains #767, #768 and some as-yet unpublished changes to queue_job_batch
is running 8 queue job workers out of total 12 workers
made it about 7,500 jobs in before I encountered this error
contains jobs that use savepoint to encapsulate partial work and commit the cursor after success

queue_job/job.py

guewen · 2026-02-24T15:11:38Z

@amh-mw

During functional testing, encountered error:

Thanks for the details, I think the problem is the same as this one: #883 (comment)

Looks good. While you are at it, could you add a return at line 174 of the controller (#883 (comment))?

We try to enqueue the dependent jobs using a closed cursor, but we actually should not try to enqueue them because the job failed. I added the return in the except RetryableJobError as err: branch.

I pushed a new commit with the DRY simplifications you propose, and took benefit of the new temporary env method of the job to use it in the controller:

-            with Registry(job.env.cr.dbname).cursor() as new_cr:
-                job.env = job.env(cr=new_cr)
+            with job.in_temporary_env():

So I made the env setter private in Job, as we no longer change it from the outside.

I created many job graphs / single jobs with the last changes with variations of (added a new failure_retry_seconds to have the job raise a retryable error)

http://localhost:8069/queue_job/create_test_job?commit_within_job=True&failure_rate=0.1&size=400&failure_retry_seconds=1
http://localhost:8069/queue_job/create_test_job?commit_within_job=True&failure_rate=0.2&size=1
http://localhost:8069/queue_job/create_test_job?commit_within_job=True&failure_rate=1&size=1&failure_retry_seconds=1

No issue on 10k jobs, but of course it does not reflect the reality of production.
I tried to find a way to write tests for the controller but could find a convincing way :/

amh-mw

During functional testing, started encountering this error after about 70,000 jobs:

web-1    | 2026-02-24 15:56:06,362 88 ERROR ? odoo.addons.queue_job.jobrunner.runner: exception in GET http://localhost:8069/queue_job/runjob?db=odoo&job_uuid=cd2eb546-40e4-4f8b-acce-04d5aa974157 
web-1    | Traceback (most recent call last):
web-1    |   File "/mnt/extra-addons/oca/queue/queue_job/jobrunner/runner.py", line 106, in urlopen
web-1    |     response.raise_for_status()
web-1    |   File "/usr/lib/python3/dist-packages/requests/models.py", line 1021, in raise_for_status
web-1    |     raise HTTPError(http_error_msg, response=self)
web-1    | requests.exceptions.HTTPError: 500 Server Error: INTERNAL SERVER ERROR for url: http://localhost:8069/queue_job/runjob?db=odoo&job_uuid=cd2eb546-40e4-4f8b-acce-04d5aa974157

All instances of this error have the same process id, which I have confirmed is the WorkerJobRunner. This coincides with a number of jobs raising FailedJobError (internally handled FileNotFoundError for absent production data). Things seem to be continuing as expected (currently at job 200,000), so maybe this is just some additional verbosity?

Edit: I'm going to increase my odoo.addons.queue_job.jobrunner.runner log verbosity and see if I can get a better picture.

amh-mw · 2026-02-24T17:26:13Z

Correlation between raised FailedJobError and HTTPError 500 on my development system:

web-1    | 2026-02-24 17:04:34,487 67 DEBUG odoo odoo.addons.queue_job.controllers.main: <Job 6068b4b1-6255-41be-963a-d750f0b66391, priority:801> started 
web-1    | 2026-02-24 17:04:34,497 67 ERROR odoo odoo.addons.queue_job.controllers.main: Traceback (most recent call last):
...
web-1    |     raise FailedJobError(e)
web-1    | odoo.addons.queue_job.exception.FailedJobError: handled FileNotFoundError
web-1    | 2026-02-24 17:04:34,535 88 DEBUG ? odoo.addons.queue_job.jobrunner.channels: job 6068b4b1-6255-41be-963a-d750f0b66391 marked failed in channel root(C:8,Q:60064,R:7,F:1628) 
web-1    | 2026-02-24 17:04:34,537 88 ERROR ? odoo.addons.queue_job.jobrunner.runner: exception in GET http://localhost:8069/queue_job/runjob?db=odoo&job_uuid=6068b4b1-6255-41be-963a-d750f0b66391 
web-1    | Traceback (most recent call last):
web-1    |   File "/mnt/extra-addons/oca/queue/queue_job/jobrunner/runner.py", line 106, in urlopen
web-1    |     response.raise_for_status()
web-1    |   File "/usr/lib/python3/dist-packages/requests/models.py", line 1021, in raise_for_status
web-1    |     raise HTTPError(http_error_msg, response=self)
web-1    | requests.exceptions.HTTPError: 500 Server Error: INTERNAL SERVER ERROR for url: http://localhost:8069/queue_job/runjob?db=odoo&job_uuid=6068b4b1-6255-41be-963a-d750f0b66391

Contrast to HTTPError 400 on staging system running queue_job 18.0.1.7.0

2026-02-24 02:10:06,499 63 ERROR ? odoo.addons.queue_job.jobrunner.runner: exception in GET http://localhost:8069/queue_job/runjob?db=odoo&job_uuid=a8068655-eb98-4bc8-895d-449462d86ccf
--
Traceback (most recent call last):
File "/usr/local/src/metricwise/oca/queue/queue_job/jobrunner/runner.py", line 253, in urlopen
response.raise_for_status()
File "/usr/lib/python3/dist-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: BAD REQUEST for url: http://localhost:8069/queue_job/runjob?db=odoo&job_uuid=a8068655-eb98-4bc8-895d-449462d86ccf

amh-mw · 2026-02-24T17:30:10Z

I don't have an easy way to git bisect the origin of the HTTPError 400 to 500 change, but if anyone else can confirm that it wasn't introduced by this pull request, I don't have any other objections to approval.

guewen force-pushed the 18.0-add-optional-new-cr branch 2 times, most recently from 75725a3 to 8f24e19 Compare February 19, 2026 15:21

sbidoul reviewed Feb 19, 2026

View reviewed changes

amh-mw suggested changes Feb 19, 2026

View reviewed changes

guewen force-pushed the 18.0-add-optional-new-cr branch 2 times, most recently from 018331b to 7bc3c77 Compare February 19, 2026 15:58

guewen changed the title ~~[18.0] Add 'Run in new cursor' option on job functions~~ [18.0] Add 'Allow Commit' option on job functions Feb 19, 2026

sbidoul reviewed Feb 19, 2026

View reviewed changes

queue_job/job.py Outdated Show resolved Hide resolved

guewen force-pushed the 18.0-add-optional-new-cr branch from 7bc3c77 to 2dec167 Compare February 19, 2026 16:34

guewen added 2 commits February 19, 2026 17:36

Add parameter to allow commit by default in jobs

b4f3bec

False on new databases, True on existing databases. Should always be False by default on future versions.

guewen force-pushed the 18.0-add-optional-new-cr branch from 2dec167 to b4f3bec Compare February 19, 2026 16:36

guewen added 2 commits February 20, 2026 15:15

Revert "Add parameter to allow commit by default in jobs"

6f50739

This reverts commit b4f3bec.

Improve documentation on allow commit

3b0619e

guewen requested review from amh-mw and sbidoul February 23, 2026 13:54

amh-mw suggested changes Feb 23, 2026

View reviewed changes

queue_job/job.py Show resolved Hide resolved

Fix missing job.env setter

b7e0c83

As the controller changes env on Job instances.

amh-mw suggested changes Feb 24, 2026

View reviewed changes

queue_job/job.py Outdated Show resolved Hide resolved

queue_job/job.py Outdated Show resolved Hide resolved

queue_job/job.py Outdated Show resolved Hide resolved

guewen added 2 commits February 24, 2026 16:00

Add failure retry in queue job test job

bd2b1ff

Simplify job env management

33ca9e7

guewen force-pushed the 18.0-add-optional-new-cr branch from 4647d3e to 33ca9e7 Compare February 24, 2026 15:10

amh-mw suggested changes Feb 24, 2026

View reviewed changes

Uh oh!

Comments

Conversation

guewen commented Feb 19, 2026

Uh oh!

OCA-git-bot commented Feb 19, 2026

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

guewen Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

amh-mw left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

guewen commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

amh-mw left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

amh-mw commented Feb 24, 2026

Uh oh!

amh-mw commented Feb 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

guewen Feb 20, 2026 •

edited

Loading

guewen commented Feb 24, 2026 •

edited

Loading

amh-mw left a comment •

edited

Loading