Sync Obsidian Attachments When Copying Vault Folders
When you copy only selected folders from one Obsidian vault to another, the attachments embedded in those notes are not copied along with them. Going through each note and manually copying the missing attachments would be tedious, so in this post I walk through a small Python script that automates the process. At the end, I break down the regular expression the script uses to identify attachments.
1. Create New Vault or Select Destination Vault
The first step is to create a new Obsidian vault to house the old vault’s notes and attachments. If you are copying into a vault that already exists, you can skip this step.
2. Copy Obsidian Vault Folders and Notes
If you haven’t already done so, the next step is to copy the original vault’s folders into the new vault’s storage location. To find where your vault is stored, right-click any folder within the vault and select Show in system explorer. Then drag and drop the desired folders into the new vault.
After the folders are copied over, the new vault contains all of the Markdown notes, but none of their attachments can be found.
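Dragging and dropping works fine, but if you prefer to script this step, here is a minimal sketch using shutil.copytree. copy_vault_folders is a hypothetical helper, and the folder names you pass to it are placeholders for whichever folders you want to move. It copies only the named folders, which is exactly why attachments kept in a separate folder get left behind.

```python
import shutil
from pathlib import Path


def copy_vault_folders(old_vault: Path, new_vault: Path, folders: list[str]) -> list[Path]:
    """Copy the named top-level folders from old_vault into new_vault.

    Only these folders are copied, so attachments stored elsewhere in the
    old vault (e.g. a shared Attachments folder) are not brought along.
    """
    copied = []
    for name in folders:
        src = old_vault / name
        dst = new_vault / name
        # dirs_exist_ok=True (Python 3.8+) makes re-running the copy safe
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(dst)
    return copied
```

For example, copy_vault_folders(Path(r"C:\Old\Vault\Path"), Path(r"C:\New\Vault\Path"), ["Projects"]) would copy a Projects folder and nothing else.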
3. Update New Vault Default Attachments Directory (OPTIONAL)
This step is optional, since you can store all of the attachments in the root folder of your vault; however, I prefer to keep my attachments in a dedicated folder for easier management. To set up an attachments folder:
- Create New Folder
- Settings > Files and Links > Default Location for New Attachments > Set to “In the folder specified below”
- Set Attachment Folder Path to the Folder Created Above
Setting Obsidian Default Directory for Attachments
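If you want to confirm which folder is configured (for example, to pass it to the script in the next step), the setting can also be read from disk. As far as I can tell, Obsidian saves it in .obsidian/app.json inside the vault under an attachmentFolderPath key, but treat that file and key name as an assumption and check your own vault before relying on it. read_attachment_folder is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Optional


def read_attachment_folder(vault: Path) -> Optional[str]:
    """Return the attachment folder configured for a vault, or None if unset.

    ASSUMPTION: Obsidian stores this setting in <vault>/.obsidian/app.json
    under the "attachmentFolderPath" key; verify against your own vault.
    """
    config = vault / ".obsidian" / "app.json"
    if not config.is_file():
        return None  # settings file not found
    settings = json.loads(config.read_text(encoding="utf-8"))
    return settings.get("attachmentFolderPath")
```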
4. Run My Python Script
Grab my vaultAttachSync Python script from my GitHub. The script uses only the Python 3 standard library, so there is nothing additional to install. Run the script from a command line (it works on both Windows and Linux):
python vaultAttachSync.py -n "C:\New\Vault\Path" --na "Attachments" -o "C:\Old\Vault\Path" --oa "Attachments"
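For reference, here is a sketch of how a command line like the one above could be parsed with argparse. The flag names come from the example command; the destination names, help text, and the empty-string default for the attachment folders (meaning the vault root) are my assumptions, not necessarily how vaultAttachSync defines them.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    # Hypothetical re-creation of the CLI: only the flag names are from the post
    parser = argparse.ArgumentParser(
        description="Copy attachments referenced in a new vault from an old vault."
    )
    parser.add_argument("-n", dest="new_vault", type=Path, required=True,
                        help="path to the new vault")
    parser.add_argument("--na", dest="new_attach", default="",
                        help="attachments folder inside the new vault (empty = vault root)")
    parser.add_argument("-o", dest="old_vault", type=Path, required=True,
                        help="path to the old vault")
    parser.add_argument("--oa", dest="old_attach", default="",
                        help="attachments folder inside the old vault (empty = vault root)")
    return parser.parse_args(argv)
```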
The script recursively goes through the new vault’s files and folders to find all references to missing images. It starts with the new vault so that only attachments the copied notes actually reference get copied. The script will notify you if any exceptions or copy failures occur. Re-running the script multiple times is safe: it will not create duplicate attachments.
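The steps described above can be sketched roughly as follows. This is not the actual vaultAttachSync code: sync_attachments is a hypothetical function, it assumes the old vault keeps every attachment directly inside a single attachments folder, and it uses a slightly tightened variant of the RegEx explained below (negated character classes instead of .*?) so that one match cannot run into the next link on the same line.

```python
import re
import shutil
from pathlib import Path

# Variant of the post's RegEx: optional folder prefix, captured file name, optional |alias.
# The negated classes stop a match from running past the closing ]] of a link.
ATTACHMENT_RE = re.compile(
    r"\[\[(?:[^\]|]*/)?([^\]|/]+\.(?:png|jpg|gif))(?:\|[^\]]*)?\]\]"
)


def sync_attachments(new_vault: Path, new_attach: str,
                     old_vault: Path, old_attach: str) -> list[Path]:
    """Copy attachments referenced by notes in new_vault from old_vault.

    Assumes every attachment sits directly inside old_vault/old_attach.
    Returns the files copied on this run.
    """
    source_dir = old_vault / old_attach
    target_dir = new_vault / new_attach
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # Start from the NEW vault so only attachments its notes reference are copied
    for note in new_vault.rglob("*.md"):
        text = note.read_text(encoding="utf-8", errors="replace")
        for name in sorted(set(ATTACHMENT_RE.findall(text))):
            target = target_dir / name
            if target.exists():
                continue  # already present, so re-running creates no duplicates
            source = source_dir / name
            try:
                shutil.copy2(source, target)
                copied.append(target)
            except OSError as err:  # e.g. the file is missing from the old vault
                print(f"Could not copy {source}: {err}")
    return copied
```

Called as sync_attachments(Path(r"C:\New\Vault\Path"), "Attachments", Path(r"C:\Old\Vault\Path"), "Attachments"), it mirrors the example command.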
vaultAttachSync Execution Output
Checking the same note with missing attachments from Step 2, we can see that its attachments now display.
Explaining The Magic
The script uses the following regular expression to find all references to images in an Obsidian note:
images = re.findall(r"\[\[(?:.*?/)?(.*?\.(?:png|jpg|gif))(?:\|.*?)?]]", codecs.decode(vaultFile.read_bytes())) # Return List of Images
Currently, the script only finds PNG, JPG, and GIF attachments, but adding other attachment types would not be difficult.
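Here is a quick check, using made-up sample links, of how the pattern behaves with and without a ? after its two non-capturing groups. Without the ? quantifiers both groups are required, so only a link containing both a folder prefix and a |alias matches.

```python
import re

# Folder and alias groups without a trailing "?": both are required
WITHOUT_OPTIONAL = r"\[\[(?:.*?/)(.*?\.(?:png|jpg|gif))(?:\|.*?)]]"
# Same pattern with a trailing "?" on both groups, making them optional
WITH_OPTIONAL = r"\[\[(?:.*?/)?(.*?\.(?:png|jpg|gif))(?:\|.*?)?]]"

samples = [
    "![[diagram.png]]",               # plain embed
    "![[Attachments/photo.jpg]]",     # folder prefix only
    "![[Attachments/anim.gif|300]]",  # folder prefix and |alias
]

for sample in samples:
    print(sample,
          "| without ?:", re.findall(WITHOUT_OPTIONAL, sample),
          "| with ?:", re.findall(WITH_OPTIONAL, sample))
```

Only the last sample matches without the ? quantifiers; with them, all three return the file name.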
RegEx Breakdown
Let’s break the RegEx down:
- \[\[...]]: Obsidian uses the ![[<attachment>]] syntax to embed an attachment, so this part of the RegEx looks for text enclosed in the double square brackets.
- (?:.*?/): A non-capturing group that matches a folder prefix ending in a slash, as in [[<anything>/<attachment>]], without including that prefix in the output. The prefix isn’t required: written with a trailing ? as (?:.*?/)?, the group becomes optional, so links without a folder prefix still match the rest of the RegEx. I added this group because some of my attachment names contained the vault’s attachments directory name, which I believe happened when I imported my Notion notes into Obsidian.
- (.*?\.(?:png|jpg|gif)): A capturing group that matches the attachment’s file name, such as the <anything>.png part of [[<anything>/<anything>.png]]. Inside it, the non-capturing group (?:png|jpg|gif) accepts any of the listed extensions without adding a separate entry to the output list.
- (?:\|.*?): Another non-capturing group, matching a | and any text that follows the file extension. Like the folder prefix, it only becomes optional with a trailing ?, as in (?:\|.*?)?. I added this group because some of my attachment names contained alternative names after a |. Again, I believe this happened when I imported my Notion notes into Obsidian.
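- codecs.decode(vaultFile.read_bytes()): Previously I used vaultFile.read_text(); however, I ran into some Unicode decoding errors. Reading the file’s raw bytes with read_bytes() and then decoding them as UTF-8 with codecs.decode() (whose default encoding is UTF-8) stopped this issue from occurring. The likely cause is that read_text() without an encoding argument uses the platform’s default encoding, such as cp1252 on many Windows systems. If you still run into UTF-8 decoding errors, pass an explicit encoding: codecs.decode(vaultFile.read_bytes(), encoding="latin-1"). Latin-1 maps every byte to a character, so it never raises a decoding error, but non-ASCII characters in attachment names may come out garbled.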
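If you would rather not edit the call by hand, here is a minimal sketch that tries UTF-8 first and falls back to Latin-1 only when decoding fails. read_note_text is a hypothetical helper name, not part of vaultAttachSync.

```python
import codecs
from pathlib import Path


def read_note_text(note: Path) -> str:
    """Return the note's text decoded as UTF-8, or as Latin-1 if UTF-8 fails."""
    raw = note.read_bytes()
    try:
        return codecs.decode(raw, "utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises,
        # but non-ASCII characters may come out garbled.
        return codecs.decode(raw, "latin-1")
```

Falling back only on failure keeps correctly encoded notes intact, while still letting the script get past the occasional file saved in a legacy encoding.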
Supporting Additional Attachments
I haven’t tested attachment types other than those listed in the RegEx. However, supporting another type should be as simple as adding |<extension> to the extension group in the current RegEx:
images = re.findall(r"\[\[(?:.*?/)?(.*?\.(?:png|jpg|gif|<EXT HERE>))(?:\|.*?)?]]", codecs.decode(vaultFile.read_bytes())) # Return List of Images
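If you expect to add several types, building the pattern from a list may be easier to maintain. Here is a minimal sketch; build_attachment_pattern and the extension list are hypothetical. It also swaps the .*? pieces for negated character classes, because with a lazy, optional folder group a line such as ![[a.png]] ![[b/c.png]] yields only c.png: the folder group keeps extending past the first ]] until it finds a /.

```python
import re


def build_attachment_pattern(extensions):
    """Compile an Obsidian embed pattern that accepts the given file extensions."""
    ext_group = "|".join(re.escape(ext) for ext in extensions)
    return re.compile(
        r"\[\["                           # opening [[
        r"(?:[^\]|]*/)?"                  # optional folder prefix; cannot cross ] or |
        rf"([^\]|/]+\.(?:{ext_group}))"   # captured file name ending in an allowed extension
        r"(?:\|[^\]]*)?"                  # optional |alias
        r"\]\]"                           # closing ]]
    )


pattern = build_attachment_pattern(["png", "jpg", "gif", "pdf"])
note_line = "![[a.png]] ![[Attachments/report.pdf|Report]] ![[b/c.gif]]"
print(pattern.findall(note_line))  # ['a.png', 'report.pdf', 'c.gif']
```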