Convert bank statements to CSV format
How to convert PDF bank statements to CSV!
Aug 15, 2023 | Read time 10 minutes📗 Table of contents
Introduction
One of the annoying things when dealing with bank statements is taking out the data out of them and converting to CSV format.
Most bank statements come in PDF format, which isn’t always straightforward to convert into a more manageable format, such as CSV.
A common task that I have to do is to take the bank statement transactions can copy it to a CSV file. I would of thought this a easy process - all we need to do is hit copy and paste right? … right??
Now this can lead to a few issues:
- Scanned PDF bank statements can not be copy and pasted
- The columns and values can be jumbled up and not correct
- Since bank transactions contains currency values - this can screw up your CSV files due to separating commas (eg $20,001.00).
In this post, I will go over 4 ways we can export our PDF bank statements into CSV format.
Essentially, we have a few options:
- Use PdfDodo to extract the data
- Use Microsoft Excel and use the “Get Data” feature
- Use Microsoft Word and convert the PDF to Word format.
- Use Adobe Acrobat Pro
Two types of PDFs
There are two main types of PDFs:
- Text-based PDFs - these are the PDF files you can select the text. This type is easier to convert.
- Image-based PDFs - these PDF files are the ones that are scanned. So all the content is hidden in a image… even if it looks like text. These require OCR (Optical Character Recognition) to convert.
A note on CSVs
CSV, which stands for “Comma-Separated Values,” is a simple and widely-used file format that’s used to store tabular data (like a spreadsheet or database). Here’s a breakdown of the CSV format:
By default, fields are separated by commas, which is where the format gets its name. However, sometimes other delimiters like semicolons are used, especially in regions where the comma is used as a decimal separator.
Each line in a CSV file corresponds to a row in the table. As for columns - values on the same line, separated by commas, correspond to different columns
Consider the following table:
Name Age Occupation John Doe 30 Engineer Jane Doe 28 Doctor The above table would look like this in CSV format:
Name,Age,Occupation John Doe,30,Engineer Jane Doe,28,Doctor
Limitations with CSV
-
Complex Data - might not be suitable for extremely complex data structures since CSV is inherently flat data structure - eg can cause problems if your bank statement data contains nested tables.
-
Delimiters in Data: If the data itself contains commas or line breaks, it can complicate parsing. To handle this, field values with special characters are usually wrapped in double quotes. So for example instead of $20,001 - we change that to have double quotes “$20,001”
So to get around these limitations and convert our PDF statements to CSVs then we can do the following.
Option 1: PdfDodo
This option is created by yours truly. Just sign up with an account and you try it out for free.
To do start the CSV conversion, we go through the following steps:
- Step 1: Get an account with PdfDodo
- Step 2: Go to the home page and click “Upload Document”
- Step 3: After the document has uploaded - click “Start Analysis”. This will usually take a few seconds.
- Step 4: Click on the uploaded document link to see the results. After PdfDodo has completed analysis, it will list out all the data that it thinks is in a table format.
- Step 5: Find the table that you are interested in and hit “CSV”. This will give you the CSV format.
PdfDodo works with scanned PDFs by using AI and OCR technology
OCR (Optical Character Recognition) is the technology used to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data.
Through a combination of image preprocessing, feature detection, and machine learning algorithms, OCR systems identify these patterns and match them to corresponding characters in a predefined set.
Option 2: Use Microsoft Excel
If you possess a recent version of Excel and wish to extract data from your text-based PDF bank statement, this method might be handy for you.
Keep in mind, this technique is suited for text PDFs, not scanned documents.
To utilize this, ensure you have the most recent version of Excel. Older editions, like 2019 or 2016, won’t support this feature. Moreover, its availability is influenced by your Microsoft 365 subscription tier. It’s fully accessible in the Enterprise edition but might be restricted on more basic plans.
Steps to Export Data from PDF to Excel:
- Step 1. Launch Excel.
- Step 2. Navigate to the “Data” tab.
- Step 3. Select: Get Data > From File > From PDF.
- Step 4. Choose the desired PDF file, then click “Import”. The Navigator panel will display, showcasing tables and pages from your PDF with a preview.
- Step 5. Highlight the table you want and click “Load” to bring it into your Excel sheet.
- Step 6. Once your data populates in Excel, save it in CSV format by going to File -> Save As -> Save As Type -> CSV."
Limitations:
- Requires you to have the latest Excel installed. Older versions of Excel such as 2019, 2016 does not work.
- I found its not 100% accurate.
- Does not work with scanned PDFs
Option 3: Use Microsoft Word
One option that not a lot of people have thought about is to first convert your PDF file to Word.
Now if you already have Microsoft Word this is easy.
- Step 1: Open up Microsoft Word
- Step 2: Select a PDF file. If all goes well, Word converted the PDF file to a Word document.
- Step 3: You can then find your table of transactions and copy and paste all the row and columns into Excel.
- Step 4: We can then save the Excel file as CSV using File -> Save As -> Save As Type -> CSV
Limitations:
- Requires you to have Word installed.
- I found its not 100% accurate.
Option 4: Use Adobe Pro
- Step 1: Open the PDF in Adobe Acrobat - Launch Adobe Acrobat and open the PDF file you wish to convert.
- Step 2. Export the PDF - Navigate to File in the top-left corner, then select Export To from the dropdown menu.
- Step 3. Choose Spreadsheet Option - From the sub-menu of the export options, select Spreadsheet and then choose Microsoft Excel Workbook. Note: Adobe Acrobat doesn’t provide a direct “export to CSV” option, so we first convert to Excel.
- Step 4. Save the File - Choose a location on your computer to save the Excel file and click Save.
- Step 5. Open the Excel File - After the conversion is completed, open the resulting Excel file in Microsoft Excel.
- Step 6. Save the File as CSV - In Excel, go to File -> Save As. Choose the location where you want to save your CSV file. From the “Save as type” dropdown, select CSV (Comma delimited) (*.csv) and then click Save.
Now, you should have your PDF content in a CSV format.
Limitations:
- Accuracy of this process can vary, especially with more complex PDFs. If your PDF is a scanned image or contains intricate layouts, you might need to do some manual adjustments in Excel before saving it as a CSV.
- You will have to pay for Adobe Pro - which can be quite steep.
Tips for a Smooth Conversion
- Review After Conversion: Always check the CSV post-conversion. The original format of the PDF, especially if complex, might cause misalignments or data loss.
- Prioritize Privacy: If using online tools, remember that bank statements contain sensitive data. Always ensure that any platform you use respects user data privacy.
- Backup Original Files: Always keep a backup of the original bank statements. They might be needed for verification or in case the conversion process omits any data.
Final thoughts
Overall - if you want a all purpose solution that will work with text or even scanned PDFs then consider using PdfDodo.
If you already have the latest version of Excel (with the Enterprise subscription) and that your PDFs are quite simple then consider using the Excel “Get Data” method.
If you have Adobe Pro - then there is a option to convert PDF to Excel. However I found this option is limited if you are dealing with scanned PDFs or that you have a low specked machine - it can chew up CPU time!
👋 About the Author
G'day! I am Ken Ito a software engineer based in Australia. I have worked as a software engineer for more than 10 years ranging from roles such as tech lead, data analyst to individual contributor. I have experience in creating code (python, javascript) to extract data from PDFs for ETL and financial statement analysis for accounting purposes.
During those years, I have worked for Government departments, companies like NTT (Nippon Telegraph and Telephone), Bupa and not-for-profit organizations.
Converting Commonwealth Bank statement to Excel
Aug 15, 2023Step by step guide on converting your CommBank bank statements to Excel
How to convert pdf bank statement to Excel format
Aug 14, 2023Tips to convert and extract bank statements to Excel!