Collect Data

https://www.canva.com/design/DAGGgKTyPkM/uUO__9IJHqRy05npGsuITw/edit?utm_content=DAGGgKTyPkM&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton

Git

https://www.w3schools.com/git/default.asp?remote=github

OCR

https://aws.amazon.com/vi/what-is/ocr/

PaddleOCR (runs on the PaddlePaddle framework): https://github.com/PaddlePaddle/Paddle.git

EasyOCR: https://github.com/JaidedAI/EasyOCR.git

Tesseract: https://github.com/tesseract-ocr/tesseract.git

| Engine | Pros | Cons | Note |
|---|---|---|---|
| Paddle | | | |
| EasyOCR | | | |
| Tesseract | | | |
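
One possible way to fill in the comparison above is to score each engine's output against a hand-typed ground truth. The sketch below uses character error rate (CER), which is an assumption on my part and not a metric these notes prescribe. The engine names and sample strings are hypothetical placeholders, not real results from Paddle, EasyOCR, or Tesseract.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not prediction else 1.0
    return edit_distance(reference, prediction) / len(reference)


# Hypothetical placeholder data; real text would come from each OCR engine.
reference = "Invoice 2024"
outputs = {
    "engine_a": "Invoice 2024",
    "engine_b": "Invoice 2O24",
}
scores = {name: cer(text, reference) for name, text in outputs.items()}
```

A lower CER means the recognized text is closer to the reference. Averaging the score over a small set of representative images gives one number per engine for the Note column.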

Table Recognition

https://viblo.asia/p/deep-learning-table-recognition-simple-is-better-than-complex-bai-toan-tai-cau-truc-du-lieu-bang-bieu-voi-deep-learning-Qbq5QBYLKD8

CVPR2023-TR: https://youtu.be/Onf5En9AI30?si=2jd6OnkrXYt3kqwj

Paper:

tsr: https://arxiv.org/pdf/1908.04729

2html: https://arxiv.org/pdf/2105.01848

lgpma: https://arxiv.org/pdf/2105.06224

| TSR Method Type | Pros | Cons | Method Name | Note |
|---|---|---|---|---|
| Split & Merge | | | Split-Embed-Merge, Deep Split-Merge | https://arxiv.org/pdf/2107.05214 |
| Detect & Classify (Graphs) | | | GraphTSR | |
| Image-to-markup | | Creating additional data is extremely difficult and not feasible | TableMaster, MTL TabNet, EDD-third-party, tsr-convstem | |
| | | Fairly complex design (made up of many internal sub-modules) | Davar-OCR | |
| | | | PaddleStructure | |
| | | Modeling is too simple; cannot detect span cells | DeepTSR + TableNet | |
| | | | TGRNet | https://arxiv.org/pdf/2106.10598 |
| Transformers | | | | https://openaccess.thecvf.com/content/CVPR2022/papers/Smock_PubTables-1M_Towards_Comprehensive_Table_Extraction_From_Unstructured_Documents_CVPR_2022_paper.pdf |
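
To make the "Split & Merge" row and the span-cell issue more concrete, here is a toy sketch of the general idea. It is not the Split-Embed-Merge or Deep Split-Merge model: the row and column boundaries and the merge decision are hard-coded here, whereas the actual methods predict them from the image. The names `split_grid` and `merge_cells` and all coordinates are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def split_grid(xs: List[int], ys: List[int]) -> List[List[Box]]:
    """Split step: sorted column (xs) and row (ys) boundaries -> grid of cell boxes."""
    return [
        [(xs[c], ys[r], xs[c + 1], ys[r + 1]) for c in range(len(xs) - 1)]
        for r in range(len(ys) - 1)
    ]


def merge_cells(grid: List[List[Box]], r0: int, c0: int, r1: int, c1: int) -> Box:
    """Merge step: join grid cells (r0, c0)..(r1, c1), inclusive, into one spanning box."""
    x0, y0, _, _ = grid[r0][c0]
    _, _, x1, y1 = grid[r1][c1]
    return (x0, y0, x1, y1)


# Assumed boundaries for a 2-row, 3-column table (illustrative values only).
xs = [0, 100, 200, 300]
ys = [0, 30, 60]
grid = split_grid(xs, ys)

# A header that spans all three columns of the first row.
header = merge_cells(grid, 0, 0, 0, 2)
colspan = 2 - 0 + 1
```

A method that stops after the split step can only output the regular grid, which is one way to read the "cannot detect span cells" remark in the table.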

WorkFlow Table Recognition.png