https://www.canva.com/design/DAGGgKTyPkM/uUO__9IJHqRy05npGsuITw/edit?utm_content=DAGGgKTyPkM&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton
https://www.w3schools.com/git/default.asp?remote=github
https://aws.amazon.com/vi/what-is/ocr/
OCR-Paddle: https://github.com/PaddlePaddle/Paddle.git
EasyOCR: https://github.com/JaidedAI/EasyOCR.git
Tesseract: https://github.com/tesseract-ocr/tesseract.git
| Engine | Pros | Cons | Note |
|---|---|---|---|
| Paddle | | | |
| EasyOCR | | | |
| Tesseract | | | |
https://viblo.asia/p/deep-learning-table-recognition-simple-is-better-than-complex-bai-toan-tai-cau-truc-du-lieu-bang-bieu-voi-deep-learning-Qbq5QBYLKD8
CVPR2023-TR: https://youtu.be/Onf5En9AI30?si=2jd6OnkrXYt3kqwj
Papers:
TSR: https://arxiv.org/pdf/1908.04729
2html: https://arxiv.org/pdf/2105.01848
LGPMA: https://arxiv.org/pdf/2105.06224
| TSR Method Type | Pros | Cons | Method Name | Note |
|---|---|---|---|---|
| Split & Merge | | | Split-Embed-Merge, Deep Split-Merge | https://arxiv.org/pdf/2107.05214 |
| Detect & Classify (Graphs) | | | GraphTSR | |
| Image-to-markup | | Creating additional data is extremely difficult and infeasible | TableMaster, MTL TabNet, EDD-third-party, tsr-convstem | |
| | | Fairly complex design (many sub-modules inside) | Davar-OCR | |
| | | | PaddleStructure | |
| | | Modeling is too simple; cannot detect span-cells | DeepTSR + TableNet | |
| | | | TGRNet | https://arxiv.org/pdf/2106.10598 |
| Transformers | | | | https://openaccess.thecvf.com/content/CVPR2022/papers/Smock_PubTables-1M_Towards_Comprehensive_Table_Extraction_From_Unstructured_Documents_CVPR_2022_paper.pdf |
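The Split & Merge row and the "cannot detect span-cells" con both come down to how a grid of split lines turns into cells with row/column spans. That same span structure is what Image-to-markup methods emit directly as an HTML token sequence. The following is a minimal sketch under two assumptions: the split stage yields an `n_rows x n_cols` grid of basic cells, and the merge stage yields groups of `(row, col)` grid positions that should be fused. `Cell`, `merge_grid` and `to_html` are hypothetical names for illustration only. They are not the API of SEM, TableMaster, Davar-OCR or any other tool listed above.

```python
from dataclasses import dataclass

# Hypothetical helpers for illustration; not the API of any method in the table.


@dataclass(frozen=True)
class Cell:
    """A table cell anchored at its top-left grid position."""
    row: int
    col: int
    rowspan: int = 1
    colspan: int = 1


def merge_grid(n_rows, n_cols, merge_groups):
    """Combine a split grid with merge groups into (possibly spanning) cells."""
    owner = {}  # (row, col) grid position -> Cell that covers it
    cells = []
    for group in merge_groups:
        group = set(group)
        if not group:
            continue
        for r, c in group:
            if not (0 <= r < n_rows and 0 <= c < n_cols):
                raise ValueError(f"grid position {(r, c)} is outside the grid")
        rows = [r for r, _ in group]
        cols = [c for _, c in group]
        top, left = min(rows), min(cols)
        height, width = max(rows) - top + 1, max(cols) - left + 1
        # A valid merge must fill its bounding box exactly, i.e. be a rectangle.
        if len(group) != height * width:
            raise ValueError(f"merge group {sorted(group)} is not a rectangle")
        cell = Cell(top, left, height, width)
        for pos in group:
            if pos in owner:
                raise ValueError(f"grid position {pos} is in two merge groups")
            owner[pos] = cell
        cells.append(cell)
    # Every grid position not covered by a merge becomes a plain 1x1 cell.
    for r in range(n_rows):
        for c in range(n_cols):
            if (r, c) not in owner:
                cells.append(Cell(r, c))
    return sorted(cells, key=lambda cell: (cell.row, cell.col))


def to_html(cells, n_rows):
    """Serialize cells as an HTML table skeleton (structure only, no cell text)."""
    by_row = [[] for _ in range(n_rows)]
    for cell in cells:
        by_row[cell.row].append(cell)
    parts = ["<table>"]
    for row_cells in by_row:
        parts.append("<tr>")
        for cell in sorted(row_cells, key=lambda x: x.col):
            attrs = ""
            if cell.rowspan > 1:
                attrs += f' rowspan="{cell.rowspan}"'
            if cell.colspan > 1:
                attrs += f' colspan="{cell.colspan}"'
            parts.append(f"<td{attrs}></td>")
        parts.append("</tr>")
    parts.append("</table>")
    return "".join(parts)


if __name__ == "__main__":
    # 2 x 3 grid where the first two header positions are merged into one cell.
    cells = merge_grid(2, 3, [{(0, 0), (0, 1)}])
    print(to_html(cells, 2))
    # prints <table><tr><td colspan="2"></td><td></td></tr><tr><td></td><td></td><td></td></tr></table>
```

If you pass no merge groups to `merge_grid`, the sketch produces only 1x1 cells. That shows the kind of limitation the table records as "cannot detect span-cells". Image-to-markup methods skip the explicit grid and predict a markup sequence like the one `to_html` returns directly from the image.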