[16.0][ADD] dms_import: Migration data from documents EE to dms CE #409

Open
xaviedoanhduy wants to merge 4 commits into OCA:16.0 from xaviedoanhduy:16.0-add-dms_import

Conversation

@xaviedoanhduy xaviedoanhduy commented Apr 17, 2025

Purpose

This migration script allows moving data from Enterprise documents* (EE) to the OCA dms* (CE) modules.

The goal is to preserve:

  • Folder hierarchy
  • Access rights (read/write groups)
  • Tags and tag categories
  • Files and their attachments

Approach

The migration is implemented as a post-init hook and works directly at the SQL level to avoid dependencies on EE models. This ensures:

  • Compatibility with CE environments (no EE models required).
  • Preservation of hierarchy, security, and data.
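A minimal sketch of how such a hook could be laid out, assuming the Odoo 16 hook signature `(cr, registry)`. The `EE_TABLES` list, the helper `ee_tables_present`, and the three step functions are hypothetical placeholders, not the module's actual code:

```python
import logging

_logger = logging.getLogger(__name__)

# EE source tables the migration reads from (this exact list is an assumption).
EE_TABLES = (
    "documents_facet",
    "documents_tag",
    "documents_folder",
    "documents_document",
)


def ee_tables_present(cr):
    """Return True when every EE source table exists in the database."""
    cr.execute(
        "SELECT table_name FROM information_schema.tables WHERE table_name IN %s",
        (EE_TABLES,),
    )
    found = {row[0] for row in cr.fetchall()}
    return found.issuperset(EE_TABLES)


def migrate_tags(cr):
    _logger.info("Step 1: tags")  # placeholder for the real SQL


def migrate_folders(cr):
    _logger.info("Step 2: folders")  # placeholder for the real SQL


def migrate_files(cr):
    _logger.info("Step 3: files")  # placeholder for the real SQL


def post_init_hook(cr, registry):
    """Run the three steps in order, or skip everything on a pure CE database."""
    if not ee_tables_present(cr):
        _logger.info("EE documents tables not found, nothing to migrate.")
        return False
    for step in (migrate_tags, migrate_folders, migrate_files):
        step(cr)
    _logger.info("Migration completed successfully.")
    return True
```

Because only raw SQL goes through `cr`, nothing here imports an EE model. The return value is only for illustration; Odoo does not use it.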

The migration flow has three steps:

1. Tags migration

  • documents.facet → dms.category
  • documents.tag → dms.tag
  • Prevents duplicates by checking existing categories and tags.
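The duplicate check above can be sketched as a get-or-create lookup that also builds the old-id to new-id maps used by the later steps. This is only an illustration: it uses `sqlite3` as a stand-in for PostgreSQL so that it runs on its own, the table columns are simplified, and `demo_db`, `get_or_create`, and `migrate_tags` are hypothetical names:

```python
import sqlite3


def demo_db():
    """In-memory stand-in for the real PostgreSQL tables (columns simplified)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE documents_facet (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE documents_tag (id INTEGER PRIMARY KEY, name TEXT, facet_id INTEGER);
        CREATE TABLE dms_category (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dms_tag (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
        """
    )
    return conn


def get_or_create(cur, select_sql, insert_sql, params):
    """Return the id of the matching row, inserting it only when it is missing."""
    cur.execute(select_sql, params)
    row = cur.fetchone()
    if row is not None:
        return row[0]
    cur.execute(insert_sql, params)
    return cur.lastrowid


def migrate_tags(cur):
    """Copy facets to categories and tags to tags; return old-id -> new-id maps."""
    category_map, tag_map = {}, {}
    facets = cur.execute("SELECT id, name FROM documents_facet ORDER BY id").fetchall()
    for facet_id, name in facets:
        category_map[facet_id] = get_or_create(
            cur,
            "SELECT id FROM dms_category WHERE name = ?",
            "INSERT INTO dms_category (name) VALUES (?)",
            (name,),
        )
    tags = cur.execute(
        "SELECT id, name, facet_id FROM documents_tag ORDER BY id"
    ).fetchall()
    for tag_id, name, facet_id in tags:
        tag_map[tag_id] = get_or_create(
            cur,
            # "IS" is SQLite's NULL-safe equality for tags without a facet.
            "SELECT id FROM dms_tag WHERE name = ? AND category_id IS ?",
            "INSERT INTO dms_tag (name, category_id) VALUES (?, ?)",
            (name, category_map.get(facet_id)),
        )
    return category_map, tag_map
```

On PostgreSQL the placeholders would be `%s`, the NULL-safe comparison would be `IS NOT DISTINCT FROM`, and the new id would come from `INSERT ... RETURNING id`.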

2. Folders migration

  • documents.folder → dms.directory
  • Maintains parent/child relationships.
  • Root folders receive a default dms.storage.
  • Access rights migrated via:
    • documents_folder_res_groups_rel → dms.access.group (Write groups)
    • documents_folder_read_groups_rel → dms.access.group (Read groups)
  • Folder-level tags are also migrated via facet → tag mapping.
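To keep parent/child relationships, each parent directory has to exist before its children are created. A rough sketch of that ordering in plain Python follows. The function names are hypothetical, the `dms.directory` field names (`is_root_directory`, `storage_id`, `parent_id`) are assumed, and the new ids are simulated with a counter where the real script would read them back from the database:

```python
from collections import deque


def order_parents_first(folders):
    """folders maps folder id -> parent id (None for roots).

    Returns the ids breadth-first, so every parent precedes its children.
    Folders whose parent id is missing from the mapping are not reached.
    """
    children = {}
    for folder_id, parent_id in folders.items():
        children.setdefault(parent_id, []).append(folder_id)
    ordered = []
    queue = deque(sorted(children.get(None, [])))
    while queue:
        folder_id = queue.popleft()
        ordered.append(folder_id)
        queue.extend(sorted(children.get(folder_id, [])))
    return ordered


def build_directory_vals(folders, names, default_storage_id):
    """Return (values per new directory, old folder id -> new directory id)."""
    directory_map = {}
    vals_list = []
    for new_id, folder_id in enumerate(order_parents_first(folders), start=1):
        parent_id = folders[folder_id]
        vals = {"name": names[folder_id]}
        if parent_id is None:
            # Root folders are attached to the default storage.
            vals.update(is_root_directory=True, storage_id=default_storage_id)
        else:
            # The parent was handled earlier, so its new id is already known.
            vals.update(is_root_directory=False, parent_id=directory_map[parent_id])
        directory_map[folder_id] = new_id  # simulated id of the created row
        vals_list.append(vals)
    return vals_list, directory_map
```

The returned `directory_map` is what a later step would use to place each file in the directory that replaced its folder.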

3. Files migration

  • documents.document (binary only) → dms.file
  • Preserves folder assignment and file metadata.
  • Keeps linked attachments by updating ir.attachment.res_model/res_id to the new dms.file.
  • Migrates file tags using the tag mapping.
  • Uses batch processing (1000 docs per batch) for scalability.
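The batch processing mentioned in the last point can be sketched as a simple chunking helper. The names `iter_batches` and `migrate_files` and the `migrate_batch` callback are hypothetical, and the default size of 1000 is taken from the description above:

```python
def iter_batches(ids, size=1000):
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def migrate_files(doc_ids, migrate_batch, size=1000):
    """Call migrate_batch on each slice and return the total it reports."""
    total = 0
    for batch in iter_batches(doc_ids, size):
        # migrate_batch is expected to return how many files it migrated.
        total += migrate_batch(batch)
    return total
```

For example, `migrate_files(list(range(2500)), len)` processes batches of 1000, 1000, and 500 ids. Keeping the size as a parameter makes it easy to lower if memory or lock time becomes a problem.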

Data Mapping Details

| EE Model / Table | DMS Model / Table | Notes |
|---|---|---|
| documents.facet | dms.category | Folder tag categories → Categories |
| documents.tag | dms.tag | Tag names + facet → Tags, with deduplication |
| documents.folder | dms.directory | Folder hierarchy → Directory hierarchy, sequence → color (fallback) |
| documents_folder_res_groups_rel | dms.access.group | Write group permissions (create/write/unlink) |
| documents_folder_read_groups_rel | dms.access.group | Read-only group permissions |
| documents.document (binary) | dms.file | Binary documents only; inactive or non-binary docs are skipped |
| documents_document_tag_rel | dms_file_tag_rel | Many2many doc ↔ tags mapping |
| ir.attachment (linked to document) | ir.attachment (relinked) | Updated to point to new dms.file |

Extra Notes

  • Colors default to randint(1, 11) when no valid sequence color is available.

  • A default database storage (dms.storage) is created if none exists.

  • Logging is verbose:

    • Created directories
    • Assigned groups
    • Migrated tags and files
  • Errors are logged per record without blocking the migration.

  • When documents.document content is stored via attachments (ir.attachment), the save_type of the dms.storage matters:

    • With save_type="database" (the default), the content is recalculated and stored in dms.file. This can put additional load on the database.

    • With save_type="attachment", the storage requires directories that act as references, each carrying the res_model and res_id of a record. Files within these directories are linked to these values. For example, if the Contact directory references the res.partner model, a directory named Admin is created storing res_model="res.partner" and res_id="3", and files attached to this partner are then stored in that Admin directory. Referencing existing attachments this way would therefore require generating multiple directories, one for each attachment.

    • Given these options, I chose save_type="filestore", even though there is limited documentation about this type. In the current context it seems to be the most appropriate choice.
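Two of the notes above, the color fallback and the per-record error logging, can be illustrated with a short sketch. The helper names are hypothetical, and treating a sequence between 1 and 11 as a "valid" color is an assumption:

```python
import logging
from random import randint

_logger = logging.getLogger(__name__)


def pick_color(sequence):
    """Reuse the folder sequence as color when it is in 1..11, else pick randomly."""
    if isinstance(sequence, int) and 1 <= sequence <= 11:
        return sequence
    return randint(1, 11)


def migrate_each(records, migrate_one):
    """Migrate records one by one; log failures and keep going.

    Returns a (migrated, failed) pair of counters.
    """
    migrated = failed = 0
    for record in records:
        try:
            migrate_one(record)
        except Exception:
            # Log the full traceback for this record, then continue with the rest.
            _logger.exception("Failed to migrate record %s", record)
            failed += 1
        else:
            migrated += 1
    return migrated, failed
```

On a real PostgreSQL cursor, a failed statement aborts the current transaction, so each `migrate_one` call would also need to run inside a savepoint (for example Odoo's `cr.savepoint()` context manager) for the loop to continue safely.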


Example Workflow

  1. Run module installation with the migration hook.
  2. Check that:
    • Categories and tags exist in dms.category and dms.tag.
    • Directories are created with correct hierarchy.
    • Access groups are properly assigned.
    • Files are migrated and linked to attachments.
    • Tags are assigned to both directories and files.

Validation

  • Before migration: Ensure EE tables (documents_*) exist and contain data.
  • After migration: Validate:
    • Number of categories, tags, and directories.
    • Randomly sample files to confirm attachments and tags.
    • Check group permissions are correctly migrated.
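The count checks can be scripted. The sketch below compares row counts between each source table and its target; it uses `sqlite3` only so that it runs on its own, and `TABLE_PAIRS`, `make_demo_cursor`, `count_rows`, and `compare_counts` are hypothetical names. Because of deduplication, the target count for categories and tags can legitimately differ from the source count, so the function reports the numbers instead of asserting equality:

```python
import sqlite3

# (source table, target table) pairs taken from the mapping table above.
TABLE_PAIRS = (
    ("documents_facet", "dms_category"),
    ("documents_tag", "dms_tag"),
    ("documents_folder", "dms_directory"),
    ("documents_document", "dms_file"),
)


def make_demo_cursor(row_counts):
    """Build an in-memory database with the given number of rows per table."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    for table, count in row_counts.items():
        cur.execute(f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY)')
        cur.executemany(
            f'INSERT INTO "{table}" (id) VALUES (?)',
            [(i,) for i in range(1, count + 1)],
        )
    return cur


def count_rows(cur, table):
    # Table names come from the fixed tuple above, never from user input.
    cur.execute(f'SELECT COUNT(*) FROM "{table}"')
    return cur.fetchone()[0]


def compare_counts(cur, pairs=TABLE_PAIRS):
    """Return {"source -> target": (source_count, target_count)}."""
    return {
        f"{source} -> {target}": (count_rows(cur, source), count_rows(cur, target))
        for source, target in pairs
    }
```

`count_rows` and `compare_counts` use no placeholders, so they should work unchanged with a PostgreSQL DB-API cursor.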

Migration should complete successfully and log a summary:

Successfully migrated X tags.
Successfully migrated Y folders (Z new root directories).
Successfully migrated N files.
Migration completed successfully.

@kobros-tech
Contributor

@xaviedoanhduy

Could you write a description of how to use it, or how to configure it if necessary, so that we can review it easily?

@nilshamerlinck

Hi @kobros-tech, you will receive an invitation to join the Slack workspace, where you'll find more information. See you there ;-)

@kobros-tech
Contributor

kobros-tech commented May 29, 2025

@nilshamerlinck
Thanks, I now have a Slack account!

Contributor

@kobros-tech kobros-tech left a comment

Implementation review only; I would recommend adding test cases.

@kobros-tech
Contributor

@xaviedoanhduy

What happens if a file is meant for a specific partner? After migration, will the same file be accessible only by that same partner?

@kobros-tech
Contributor

kobros-tech commented Jun 1, 2025

All right, I will ping someone else to describe a real-life scenario, and then we can apply it.

@wlin-kencove

@xaviedoanhduy
Author

xaviedoanhduy commented Jul 30, 2025

@kobros-tech,

As far as I recall, dms.file/dms.directory are shared at the user level (res.users), while documents.document/documents.folder are assigned to contacts (res.partner).

I’d appreciate your thoughts if you have any ideas on how to map these fields between the two models.

Sorry for my omission, I'd like to clarify some points:

  • To answer your question: it seems the answer is no. Data in EE documents is not shared with individual users (a document can only be owned by one partner); access comes only through share links or permission groups. For example, suppose the Internal folder has Read Groups (or Write Groups) containing the Documents/Administrator permission group, and the user named Demo is not in this group. Demo will not be able to see that folder in the backend view, but he gets full read and write permissions if he holds a link generated from Share links.
  • In EE documents, documents are mostly not directly accessible to portal users, even when that user is the owner of the document (it is only visible through the backend, in the Contact app). Portal users, and even users who are not logged in, only need to know a link created from Share links (the documents.share model), and that is the only way they can access these documents.
  • In CE dms, these sensitive records are controlled through a separate group model (dms.access.group), which can let public users see these documents if their group (res.groups, for example the portal group) is included, or possibly if the users themselves (res.users) are.
  • In the current module, I create dms.access.group records based on the name of the document and its Read Groups (Write Groups). As a result, only users in those permission groups can access it, and one thing is certain: those permission groups do not contain portal users.
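A sketch of the mechanism described in the last point, building the values for one access group per permission level. The function name and the group naming scheme are hypothetical, and the `dms.access.group` field names (`perm_create`, `perm_write`, `perm_unlink`, `group_ids`) are assumed rather than confirmed by this thread:

```python
def access_group_vals(folder_name, read_group_ids, write_group_ids):
    """Return the values for the access groups that mirror a folder's EE groups."""
    vals_list = []
    if write_group_ids:
        vals_list.append(
            {
                "name": f"{folder_name} (write)",
                # Write groups get create/write/unlink, as in the mapping table.
                "perm_create": True,
                "perm_write": True,
                "perm_unlink": True,
                # (6, 0, ids) is the Odoo command that replaces a many2many set.
                "group_ids": [(6, 0, sorted(write_group_ids))],
            }
        )
    if read_group_ids:
        vals_list.append(
            {
                "name": f"{folder_name} (read)",
                # Read groups stay read-only.
                "perm_create": False,
                "perm_write": False,
                "perm_unlink": False,
                "group_ids": [(6, 0, sorted(read_group_ids))],
            }
        )
    return vals_list
```

Since only the res.groups ids from the EE relation tables are copied, portal users stay excluded as long as those groups do not contain them.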

@kobros-tech
Contributor

yes, it is much better.

once you are done you can ask me to review, good luck!

@xaviedoanhduy xaviedoanhduy changed the title [16.0][ADD] dms_import [16.0][ADD] dms_import: Migration data from documents EE to dms CE Sep 25, 2025
@xaviedoanhduy xaviedoanhduy force-pushed the 16.0-add-dms_import branch 3 times, most recently from de8ccd8 to d98f346 Compare September 29, 2025 10:19
@xaviedoanhduy xaviedoanhduy force-pushed the 16.0-add-dms_import branch 2 times, most recently from 9844447 to e06570c Compare October 16, 2025 09:36
@andreampiovesana

nice to have

@xaviedoanhduy xaviedoanhduy force-pushed the 16.0-add-dms_import branch 7 times, most recently from 564fae4 to 0c05e18 Compare October 23, 2025 09:53
@xaviedoanhduy xaviedoanhduy force-pushed the 16.0-add-dms_import branch 4 times, most recently from 578bb77 to 2cdf519 Compare October 29, 2025 05:24
[IMP] dms_import: move pre_init_hook to post_init_hook

[REF][FIX] dms_import: Use new syntax, avoid sql injection and fix group creation bug

dms_import: also migrate achived data

[REF] dms_import: avoid duplication by forcing tags, categories, and default group permissions

[IMP] dms_import: improve performance, reduce batch size, and bypass heavy compute fields

[FIX] dms_import: handle duplicate name

[FIX] dms_import: standardize file names

[IMP] dms_import: improve unique_name_new to avoid long name

[FIX] dms_import: avoid check size when uploading files
marcos-mendez

This comment was marked as outdated.

@pedrobaeza
Member

@marcos-mendez please stop these AI reviews, which are introducing a lot of noise, until an AI policy is set at the OCA level.

@kobros-tech
Contributor

@dnplkndll

What do you think?

@marcos-mendez requested changes. I would like us to finish this and get it done in the best way.

@kobros-tech
Contributor

@marcos-mendez please stop these AI reviews, which are introducing a lot of noise, until an AI policy is set at the OCA level.

Hi I have an open PR somewhere and I need your help there :)

@marcos-mendez

@pedrobaeza: Understood, and I respect that. Reviews are paused immediately.

But I'd like to put this in context, because this isn't just about an AI bot; it's about a systemic problem that the OCA has been unable to solve for over a decade.

The Problem Is Not New

The OCA Board itself acknowledged in its 2014 meeting minutes that "many current reviewers are simply not performing reviews" and that there was a "lack of reviews leading to slow turnaround of PRs" (source). That was 11 years ago.

In 2019, the OCA Contributors mailing list had an entire thread called "Stale PR closing" where community members described the exact same frustration. One contributor summarized it perfectly: a developer submits a PR, gets all tests green, follows the OCA rule of reviewing 3 PRs for every 1 submitted, and still nobody reviews their work. Their conclusion? "OCA is not for me, I did everything I must but nobody review it" (source).

In 2024, the OCA's own OpenUpgrade blog confirmed: "reviewer work is a very demanding and time-consuming task. As a result, many pull requests can remain open for months on end". For version 17, released in October 2023, the base module migration was completed one year later (source).

The Numbers Tell the Story

The 2023 Dixmit analysis of OCA GitHub data shows that a single person (you, Pedro) performed 1,921 reviews in one year. That's ~5.3 reviews per day, every day, weekends and holidays included. The top 3 reviewers account for the vast majority of all OCA reviews. This is not a healthy ecosystem; it's a single point of failure.

Right now, stock-logistics-workflow alone has 147 open PRs. Across the OCA, dozens of repositories have PRs carrying the "stale" label: contributions that were simply ignored until they died.

The OCA's Own Documentation Acknowledges This

The official Maintainer Role document states: "when addons have no active maintainers, PSC members must take care of it, but they may be too busy or may not feel concerned" (source).

The PSC Guide defines the PSC's role as ensuring "balanced and wide scale peer review and collaboration do happen." If hundreds of PRs are aging without review, this responsibility is not being met.

What Happened Here

When I asked for a review on PR #2276, your answer was: "review other related PRs and ask in exchange that they review yours" (comment). I took that seriously. I built a tool to help: not to replace human reviewers, but to give traction to PRs that would otherwise sit with the "stale" label until they die.

The tool runs a local open-source model (qwen3-coder:30b) on my own workstation. Zero data is sent to third parties. No Microsoft, no OpenAI, no cloud APIs. More private than GitHub Copilot, which many OCA contributors already use silently.

The Industry Has Already Moved

Speaking of GitHub Copilot, it now has over 60 million code reviews performed, with 12,000+ organizations running it automatically on every PR (source). GitHub Copilot Code Review reached general availability in April 2025 and hit 1 million users in its first month. AI-assisted code review is not experimental anymore; it's the industry standard.

At the OCA's own AMA at Odoo Experience 2025, the association confirmed it is "actively discussing how LLMs should be used" and emphasized the need for "clear guidelines and acceptance criteria" (source).

My Position

I stopped the bot the moment you asked; no argument there. I respect the community's right to define its own policies.

But you can't block solutions to a problem the organization has acknowledged for 11 years and hasn't solved. If the PSCs can't guarantee timely reviews, which is their statutory responsibility, then tools that help with that bottleneck should be discussed, not dismissed.

I'm not here to fight. I'm here to contribute. If there's an appropriate channel to propose an AI review policy (GitHub Discussion, OCA Board meeting, mailing list), I'd like to participate and present what we've built, transparently, collaboratively.

@pedrobaeza
Member

I'm not here to fight either. That there's a problem: of course. But this solution is not the best one, as it's very verbose and not fine-tuned. Imagine the damage done by telling a newbie to do things that are not correct...

@pedrobaeza pedrobaeza added this to the 16.0 milestone Mar 16, 2026
@pedrobaeza pedrobaeza dismissed marcos-mendez’s stale review March 16, 2026 08:04

Incorrect review.
