Effective URL: https://kili-technology.com/data-labeling/training-an-id-information-extraction-algorithm-with-kili-technology-the-story-of-lcl
TRAINING INFORMATION EXTRACTION MODELS: THE STORY OF LCL

Banks are amongst the most regulated establishments in the world. Discover how LCL built a powerful ID information extraction algorithm using Kili Technology to classify the IDs of their customers.

Axel Cypel, AI Expert at LCL

Table of Contents
* Why information extraction? Another issue...
* Where AI & extracting information comes into action
* Working with Kili Technology for information extraction: the right solution
* Back to the topic at hand
* That’s not the end of the story

WHY INFORMATION EXTRACTION? ANOTHER ISSUE...

Banking is one of the most regulated sectors in advanced countries. The Basel Committee and the European Central Bank supervise the commercial banking system, which has the power to create money through credit. With such responsibility, banks are subject to national and international regulations. Regulations from influential jurisdictions such as the United States or the EC also apply to our own institutions through extraterritorial laws (for instance, when using the US dollar, or when abiding by embargoes).

One of the pillars of good management for a bank is the well-known KYC: Know Your Customer. A massive amount of data needs to be collected to ensure a suitable KYC, and banks’ customers are perfectly aware of this obligation. One unavoidable step in building a robust KYC is the collection of identity documents. To comply with international regulations, banking institutions are required to hold a recent copy of an identity document for each of their customers. This copy must be of sufficient quality to be used for verification or control purposes. The issue begins when you need to check your entire client base against this constraint in a short time frame, for all the clients in your portfolio (a few million). This, of course, cannot be done manually.

Another key pillar of good management is Corporate Social Responsibility (CSR). As banks are a major transmission belt in the economy, policies driving ecological change apply to the finance industry. Because banks are the natural ally of citizens buying real estate, setting conditions on the energy score of a newly built property or a renovation is important.
The “DPE” (Diagnostic de Performance Énergétique, or Energy Performance Diagnostic) is now a mandatory document when a real estate loan is signed. It contains data that allows banks to produce the extra-financial regulatory reporting required by regulators.

WHERE AI & EXTRACTING INFORMATION COMES INTO ACTION

Many formats of ID documents (national ID cards, passports, residence permits) and DPEs, for which there is no single template, are collected and stored in the LCL Electronic Document Management system. For each document, a dozen text fields must be extracted to ensure that the proper information is registered either in the bank’s CRM or in the appropriate reporting. And this information extraction must be executed successfully millions of times, for our millions of customers (we are not complaining!).

To recognize, categorize and extract structured data from these documents, we have two options: build an army of labelers to annotate 100% of the raw data manually, or use state-of-the-art recognition algorithms to develop AI models. The former is not feasible in real life, while the latter is more than an option: it is the solution. And LCL is equipped for this challenge: it already has a team specialized in building AI products that process images, text files and voice samples, and that run named entity recognition or natural language processing techniques on legal documents. Document categorization and natural language processing are particularly well served by supervised machine learning.

The challenge lies in the high expectations of the business units (Compliance Department, CSR Management): we cannot afford any mistakes given the importance of our two missions. But that’s the way our business works!

Even with our choice to use artificial intelligence and information extraction, building a labeling team is still a challenging task. As we said, we did not plan to hire an extensive team of annotators.
But with supervised learning, it is well known that training a model requires an example database containing thousands of labeled and annotated images. This is where we need software to handle document extraction and pre-processing, and to simplify the job of our five dedicated labelers handling the extracted data.

WORKING WITH KILI TECHNOLOGY FOR INFORMATION EXTRACTION: THE RIGHT SOLUTION

To label our existing data, we chose to work with Kili Technology, a labeling platform for building high-quality training data from structured and unstructured data. Having extracted tens of thousands of images from our Enterprise Data Management system, we stored them on our on-premises servers. A bridge with the Kili Technology software, also installed on-premises, allows our remote team of labelers to work on entity identification, classification and global annotation tasks (e.g. the letter grade from the energy diagnostic, or the expiry date of a national identity card). The Kili Technology SDK allows us to use our custom models to run OCR on areas of interest, extract information, and prepare these documents for manual annotation.

Labelers, but also business experts, enjoy working on Kili Technology because they can focus on the task at hand and complete it easily and on time. This contrasts with our former constant fear of losing work, or of nightmarish file management across a dozen Excel files. From our perspective, not having to bother with data transfer and backups is a great relief: Kili Technology was installed in line with our data safety standards.

After some use, both hired and volunteer labelers found Kili Technology very comfortable to work with. People from our business units wanted to be involved in building AI: engaging many people as a workforce helped unite the company around AI.
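As an illustration of the pre-annotation idea described above (a custom model reads areas of interest so that labelers start from a draft rather than a blank document), here is a minimal sketch. It does not use the Kili Technology SDK and is not LCL's implementation: the function names, field names, box coordinates and the 0.9 review threshold are all hypothetical, and `fake_ocr` stands in for a real OCR model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical box format: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def pre_annotate(
    doc_id: str,
    regions: Dict[str, Box],
    ocr: Callable[[str, Box], Tuple[str, float]],
    review_below: float = 0.9,  # assumed threshold, for illustration only
) -> dict:
    """Run OCR on each area of interest and build a draft annotation."""
    fields: List[dict] = []
    for name, box in regions.items():
        text, confidence = ocr(doc_id, box)
        fields.append({
            "name": name,
            "box": box,
            "text": text.strip(),
            "confidence": confidence,
            # Low-confidence fields are flagged so labelers check them first.
            "needs_review": confidence < review_below,
        })
    return {"doc_id": doc_id, "fields": fields}


def fake_ocr(doc_id: str, box: Box) -> Tuple[str, float]:
    """Stand-in for a custom OCR model; returns canned values."""
    canned = {
        (40, 300, 260, 330): (" 2031-05-14 ", 0.97),
        (40, 120, 400, 150): ("DUPONT", 0.62),
    }
    return canned[box]


draft = pre_annotate(
    "scan-0001",
    {"expiry_date": (40, 300, 260, 330), "surname": (40, 120, 400, 150)},
    fake_ocr,
)
for field in draft["fields"]:
    print(field["name"], repr(field["text"]), field["needs_review"])
```

In this sketch the expiry date is accepted as a draft value while the low-confidence surname is flagged for review. A real pipeline would then upload such drafts to the labeling platform through its SDK; that call is left out here.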
In the end, Kili Technology provides strong labeling software focused on dataset quality, with an easy-to-use interface, but there’s also a team behind the scenes. Our counterparts at Kili Technology gathered very quickly whenever we encountered an issue. Our dedicated customer success manager pays close attention to any pain point that arises and organizes meetings with technical staff whenever we need a customization or training on specific features of the application.

On the one hand, all the features needed for labeling are present, and a few of them are used often. On the other hand, we sometimes need a feature that is not (yet) developed but that can be added to the roadmap. As a large corporation collaborating with a start-up, our paces sometimes differ. But even with the differences in working model, Kili Technology remains very attentive to our challenges.

BACK TO THE TOPIC AT HAND

For each of our two labeling campaigns, we used Kili Technology’s labeling platform. It allowed us to push, label and retrieve the data that feeds our machine learning algorithms. Deep learning consumes a lot of data if the output model is to meet the business requirement of a very small error rate. A standard annotation campaign typically covers 5,000 documents, annotated in 2 or 3 weeks. Obtaining them requires significant data preparation work upstream.

Now that our AI infrastructure includes Kili Technology, people trained on the platform can use the tool for all kinds of projects. Simply by requesting that the relevant raw data be loaded, LCL teams can drastically accelerate the creation of their training datasets, which means a significant improvement for all parties involved. Once our models are ready, they are integrated by our IT team and ready to be pushed to production at scale.
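To give a feel for the campaign figures above, the following back-of-the-envelope calculation estimates the daily load per labeler. It rests on assumptions that the article does not state: the work is split evenly across the five dedicated labelers mentioned earlier, volunteer labelers are not counted, and a week has five working days.

```python
def docs_per_labeler_per_day(documents, labelers, weeks, working_days_per_week=5):
    """Average number of documents each labeler must annotate per working day."""
    working_days = weeks * working_days_per_week
    return documents / (labelers * working_days)


for weeks in (2, 3):
    rate = docs_per_labeler_per_day(documents=5000, labelers=5, weeks=weeks)
    print(f"{weeks} weeks: about {rate:.0f} documents per labeler per day")
# 2 weeks: about 100 documents per labeler per day
# 3 weeks: about 67 documents per labeler per day
```

Under these assumptions, each labeler handles roughly 67 to 100 documents per working day, which helps explain why pre-processing and an efficient labeling interface matter.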
See for yourself: more than 13 million documents were run in batches to check that every single KYC file was complete with a readable ID document. The data extracted from every scanned copy fed a compliance tool that allowed the retail network of advisors to update documents with their clients. All of that is done algorithmically, with a training dataset built on Kili Technology.

As for the extra-financial reporting, our algorithms will capture the relevant information contained in the DPEs during the credit process. That is, once again, something that a manual process could only do at great cost. And all these AI-powered services need a labeled database. Without a tool such as Kili Technology, fulfilling the regulators’ requirements would take months and make us miss the compliance schedule. Regulators do not wait!

THAT’S NOT THE END OF THE STORY

As you may have experienced, banks collect data from customers all the time: images, e-mails, phone calls, contracts, etc. We can even process voice recordings, where speech-to-text algorithms bring the power of NLP to live conversations. We will undoubtedly use Kili Technology to annotate additional assets, all with the goal of improving our security and customer satisfaction. Regulators never sleep!

To learn more about information extraction, optical character recognition, natural language processing, automatic annotation and information retrieval, check out our webinars and other articles!