community.dataiku.com
2600:9000:2491:6800:1:9db:4040:93a1
Public Scan
Submitted URL: https://pages.dataiku.com/e3t/Ctc/GA+113/cfvmy04/MX7RCxG68F-W1K9m4m4gwCcBW3QbRmh4N1D4NN4313K53q3pBV1-WJV7Cg-3QMh_shM2xftKW...
Effective URL: https://community.dataiku.com/t5/Dataiku-Frontrunner-Awards/tkb-p/Awards?utm_campaign=Dataiku%20Frontrunner%20Awards%202022&ut...
Submission: On August 08 via api from DE — Scanned from DE
Form analysis
3 forms found in the DOM
Name: form_3bb9fcc7484389 — POST https://community.dataiku.com/t5/tkb/v2/page.searchformv32.form.form
<form enctype="multipart/form-data" class="lia-form lia-form-inline SearchForm" action="https://community.dataiku.com/t5/tkb/v2/page.searchformv32.form.form" method="post" id="form_3bb9fcc7484389" name="form_3bb9fcc7484389">
<div class="t-invisible"><input
value="blog-id/Awards/q-p/dXRtX2NhbXBhaWduOkRhdGFpa3UrRnJvbnRydW5uZXIrQXdhcmRzKzIwMjI6OnV0bV9tZWRpdW06ZW1haWw6Ol9oc21pOjIyMTA3ODA5NTo6X2hzZW5jOnAyQU5xdHotX2FkZWhpUWdJanUwaGZpQ0poWnVFakNtSUMxcmRQNlhhSlNrYWN5MnFJdkZGMEoySC1Mb28xU3FpM2lyVHFhTUhlZ3ZhVWZURHo4SVBLejc5Qi1Ec1dZZVNVRXJuWmdCVnNicGpzalJ3LW5NaElOSjA6OnV0bV9jb250ZW50OjIyMTA3NzQwNDo6dXRtX3NvdXJjZTpoc19lbWFpbA.."
name="t:ac" type="hidden"><input value="search/contributions/page" name="t:cp" type="hidden"><input
value="KPDW8ZSwAIFqo03E6yOAINvqzsqLo6nMUoFTCo0lsO2L8MrzJ7WQFnt40pH2Y1T1eO9fjKGyy1AkcmdtyIA4i69oaunFB6S7W8S3r-cHeRdaXd_1zsri5OELal9hDZjWpjASzq3c3lW_MFsgl8snT1Bpov6t7vY5mccbxEPH9-6Y8pdzsY-rdguyeYcPv2PfyQuyEdEFqK9Has-FLCfe7qx2O5xRLxHi7x-HW28ZK7vT1B8GRRPyJMnP7dukPIoVrcAOYpTIFRXb1sKnTbUEeSaNK-_Ac0uWcFKaJIYBmEgH0ioZ2yDhBiiS0ha1QR5fKKZZM976rH9q6uqnfoeNhj1H_XMlbiSJz_garAq-NPElh1RrErEcYRQugygnQRrZ4AJ2QVXhNeoMVHQxrq6fOjE4qLeiWvwPSmM0JakR6Akybl7_7Nee8OjrMa0JlnNWvSaGtLxIMcE40zNhJhRx0PBANb0h42JwvjbKRXBqx2tUWG9Eq3hTY54ykXIA4fc_EtE8EMu_cja_3E41ezIg3iGuj1nZaCVkCHxKnX7MFrggakRO9F4kwbVahdoLQ0PJ9k8OwvoauqotaIo5GP_A89FO3K4sEshBwTDsmfQOIHZKTS7eNFdMwpNDDEhCzdepn6XtehHfwM-lml20OtSLdtumVAcosiFkwOWgT2zioYgtdNKangkPDgg4r8MOKZS78Hi8X2TiTJpQMJrhZvQdP-aHkLznfWStcULKSmjVNKuaxXdIeha94QHLvz9nsll4evaa9CbPTm_aJkdDZt96vCK3eOtv__fJcpYoav5ue1D5WEiEQZKND1gwGf_ytDVvkAdnq1ip9qxOxy0Y4x3lxcV2oEnxuplLAK_KVcjT0o3ymB5o6MKuv1k4q5JXMOAeyVBWu7DRUYvbW3bnXBy0PqJ4Jy8vMzB2zqQnO4VJbfJD-dUPeip3d28Ox9x01fnFNUnAoo1_SkzFzkNj1TGBgkCCML9BiUuWJeh6_VJcSweJdZ1ndQH0hsCtL-961kvr6qQUsRaAejgQ6JRf296EJGWmSYlu5IaCeMTbaGWgXtjx2ulTaUZYb7dLSAPedTuneirwEqfmIO81xHm1O1t-QS-5MGOmeRTVfWKaLEBYjCHAzLbwF_xKcNh__ldTTJYUwF32ELnLzriZO-5fQP1XRmio8nqDYXbyUq6CSJlk6bz4Yp6yU82XaHm_KwbwrYq5m3MxWE9WuBy6yAYO8MTQbKf2GmpH_0ckpJj3TZ1un-Jjd771QP558YL9KrPtpkMQU2r41p1xa8RTAdlpqOfp-p8SDsyTji6SySG4Zdx0kxM."
name="lia-form-context" type="hidden"><input value="TkbPage:blog-id/Awards:searchformv32.form:" name="liaFormContentKey" type="hidden"><input
value="5DI9GWMef1Esyz275vuiiOExwpQ=:H4sIAAAAAAAAALVSTU7CQBR+krAixkj0BrptjcpCMSbERGKCSmxcm+kwlGrbqTOvFDYexRMYL8HCnXfwAG5dubDtFKxgYgu4mrzvm3w/M+/pHcphHQ4kI4L2dMo9FLYZoM09qbeJxQ4V0+XC7e/tamqyBPEChwgbh1JAjQtLIz6hPaYh8ZlEMaxplAvm2KZmEsm0hhmBhOKpzZzOlsEw8LevR5W3zZfPEqy0oJIYc+eCuAyh2rolfaI7xLN0I8rjWfWBj7CuzJvf5osmbxRN3hacMimNwHRtKSOr0XNnv/vx+FoCGPjhMRzljhNLYHrEt9kA5T08ACCsKvREoYuqxqLl8BLO84q4UcMITcG49y/QOGs1pYyESl5p6V6qwRW086rinVmoxMZsiZud/zBUTc6gmVc4kExkJafmcYG1GM9+wfIsCkf2OP54hal5EjnG54z8h0XhjfcF7wQUs5Kz0GTjU2rOjc/llTT4Au07pDOcBQAA"
name="t:formdata" type="hidden"></div>
<div class="lia-inline-ajax-feedback">
<div class="AjaxFeedback" id="feedback_3bb9fcc7484389"></div>
</div>
<input value="-z4vhJDU0069q5UfIwniXfSRaFiWvTv0h0bC8O5Z7-Q." name="lia-action-token" type="hidden">
<input value="form_3bb9fcc7484389" id="form_UIDform_3bb9fcc7484389" name="form_UID" type="hidden">
<input value="" id="form_instance_keyform_3bb9fcc7484389" name="form_instance_key" type="hidden">
<span class="lia-search-granularity-wrapper">
<select title="Search Granularity" class="lia-search-form-granularity search-granularity" aria-label="Search Granularity" id="searchGranularity_3bb9fcc7484389" name="searchGranularity">
<option title="All community" value="gqmyn45884|community">All community</option>
<option title="This category" value="Programs|category">This category</option>
<option title="Knowledge base" selected="selected" value="Awards|tkb-board">Knowledge base</option>
<option title="Users" value="user|user">Users</option>
</select>
</span>
<span class="lia-search-input-wrapper">
<span class="lia-search-input-field">
<span class="lia-button-wrapper lia-button-wrapper-secondary lia-button-wrapper-searchForm-action"><input value="searchForm" name="submitContextX" type="hidden"><input class="lia-button lia-button-secondary lia-button-searchForm-action"
value="Search" id="submitContext_3bb9fcc7484389" name="submitContext" type="submit"></span>
<input placeholder="Search Dataiku use cases and success stories" aria-label="Search" title="Search" class="lia-form-type-text lia-autocomplete-input search-input lia-search-input-message" value="" id="messageSearchField_3bb9fcc7484389_0"
name="messageSearchField" type="text" aria-autocomplete="both" autocomplete="off">
<div class="lia-autocomplete-container" style="display: none; position: absolute;">
<div class="lia-autocomplete-header">Enter a search word</div>
<div class="lia-autocomplete-content">
<ul></ul>
</div>
<div class="lia-autocomplete-footer">
<a class="lia-link-navigation lia-autocomplete-toggle-off lia-link-ticket-post-action lia-component-search-action-disable-auto-complete" data-lia-action-token="cgw667_8zzcQfauOzudhf_fJmyOde3Xt22LvKYxz-A4." rel="nofollow" id="disableAutoComplete_3bb9fcc783c01e" href="https://community.dataiku.com/t5/tkb/v2/page.disableautocomplete:disableautocomplete?t:ac=blog-id/Awards/q-p/dXRtX2NhbXBhaWduOkRhdGFpa3UrRnJvbnRydW5uZXIrQXdhcmRzKzIwMjI6OnV0bV9tZWRpdW06ZW1haWw6Ol9oc21pOjIyMTA3ODA5NTo6X2hzZW5jOnAyQU5xdHotX2FkZWhpUWdJanUwaGZpQ0poWnVFakNtSUMxcmRQNlhhSlNrYWN5MnFJdkZGMEoySC1Mb28xU3FpM2lyVHFhTUhlZ3ZhVWZURHo4SVBLejc5Qi1Ec1dZZVNVRXJuWmdCVnNicGpzalJ3LW5NaElOSjA6OnV0bV9jb250ZW50OjIyMTA3NzQwNDo6dXRtX3NvdXJjZTpoc19lbWFpbA..&t:cp=action/contributions/searchactions">Turn off suggestions</a>
</div>
</div>
<input placeholder="Search Dataiku use cases and success stories" aria-label="Search" title="Search" class="lia-form-type-text lia-autocomplete-input search-input lia-search-input-tkb-article lia-js-hidden" value=""
id="messageSearchField_3bb9fcc7484389_1" name="messageSearchField_0" type="text" aria-autocomplete="both" autocomplete="off">
<div class="lia-autocomplete-container" style="display: none; position: absolute;">
<div class="lia-autocomplete-header">Enter a search word</div>
<div class="lia-autocomplete-content">
<ul></ul>
</div>
<div class="lia-autocomplete-footer">
<a class="lia-link-navigation lia-autocomplete-toggle-off lia-link-ticket-post-action lia-component-search-action-disable-auto-complete" data-lia-action-token="zXYy1AWwZGN4D5aUHyUfPx71BRnDSz6Z-XLf6rO6uH4." rel="nofollow" id="disableAutoComplete_3bb9fcc7b56f61" href="https://community.dataiku.com/t5/tkb/v2/page.disableautocomplete:disableautocomplete?t:ac=blog-id/Awards/q-p/dXRtX2NhbXBhaWduOkRhdGFpa3UrRnJvbnRydW5uZXIrQXdhcmRzKzIwMjI6OnV0bV9tZWRpdW06ZW1haWw6Ol9oc21pOjIyMTA3ODA5NTo6X2hzZW5jOnAyQU5xdHotX2FkZWhpUWdJanUwaGZpQ0poWnVFakNtSUMxcmRQNlhhSlNrYWN5MnFJdkZGMEoySC1Mb28xU3FpM2lyVHFhTUhlZ3ZhVWZURHo4SVBLejc5Qi1Ec1dZZVNVRXJuWmdCVnNicGpzalJ3LW5NaElOSjA6OnV0bV9jb250ZW50OjIyMTA3NzQwNDo6dXRtX3NvdXJjZTpoc19lbWFpbA..&t:cp=action/contributions/searchactions">Turn off suggestions</a>
</div>
</div>
<input placeholder="Search Dataiku use cases and success stories" ng-non-bindable="" title="Enter a user name or rank" class="lia-form-type-text UserSearchField lia-search-input-user search-input lia-js-hidden lia-autocomplete-input"
aria-label="Enter a user name or rank" value="" id="userSearchField_3bb9fcc7484389" name="userSearchField" type="text" aria-autocomplete="both" autocomplete="off">
<div class="lia-autocomplete-container" style="display: none; position: absolute;">
<div class="lia-autocomplete-header">Enter a user name or rank</div>
<div class="lia-autocomplete-content">
<ul></ul>
</div>
<div class="lia-autocomplete-footer">
<a class="lia-link-navigation lia-autocomplete-toggle-off lia-link-ticket-post-action lia-component-search-action-disable-auto-complete" data-lia-action-token="6M7HFFaZ7EXAZeyEnjXQm47AiIat8eRiwvOKzKp-Z2M." rel="nofollow" id="disableAutoComplete_3bb9fcc7de4ef7" href="https://community.dataiku.com/t5/tkb/v2/page.disableautocomplete:disableautocomplete?t:ac=blog-id/Awards/q-p/dXRtX2NhbXBhaWduOkRhdGFpa3UrRnJvbnRydW5uZXIrQXdhcmRzKzIwMjI6OnV0bV9tZWRpdW06ZW1haWw6Ol9oc21pOjIyMTA3ODA5NTo6X2hzZW5jOnAyQU5xdHotX2FkZWhpUWdJanUwaGZpQ0poWnVFakNtSUMxcmRQNlhhSlNrYWN5MnFJdkZGMEoySC1Mb28xU3FpM2lyVHFhTUhlZ3ZhVWZURHo4SVBLejc5Qi1Ec1dZZVNVRXJuWmdCVnNicGpzalJ3LW5NaElOSjA6OnV0bV9jb250ZW50OjIyMTA3NzQwNDo6dXRtX3NvdXJjZTpoc19lbWFpbA..&t:cp=action/contributions/searchactions">Turn off suggestions</a>
</div>
</div>
<input title="Enter a search word" class="lia-form-type-text NoteSearchField lia-search-input-note search-input lia-js-hidden lia-autocomplete-input" aria-label="Enter a search word" value="" id="noteSearchField_3bb9fcc7484389_0"
name="noteSearchField" type="text" aria-autocomplete="both" autocomplete="off" placeholder="Search Dataiku use cases and success stories">
<div class="lia-autocomplete-container" style="display: none; position: absolute;">
<div class="lia-autocomplete-header">Enter a search word</div>
<div class="lia-autocomplete-content">
<ul></ul>
</div>
<div class="lia-autocomplete-footer">
<a class="lia-link-navigation lia-autocomplete-toggle-off lia-link-ticket-post-action lia-component-search-action-disable-auto-complete" data-lia-action-token="K20ERmlcIxyndcDPC1kbPi4wKOw9ajZ4p79Ys8EELF0." rel="nofollow" id="disableAutoComplete_3bb9fcc8071fab" href="https://community.dataiku.com/t5/tkb/v2/page.disableautocomplete:disableautocomplete?t:ac=blog-id/Awards/q-p/dXRtX2NhbXBhaWduOkRhdGFpa3UrRnJvbnRydW5uZXIrQXdhcmRzKzIwMjI6OnV0bV9tZWRpdW06ZW1haWw6Ol9oc21pOjIyMTA3ODA5NTo6X2hzZW5jOnAyQU5xdHotX2FkZWhpUWdJanUwaGZpQ0poWnVFakNtSUMxcmRQNlhhSlNrYWN5MnFJdkZGMEoySC1Mb28xU3FpM2lyVHFhTUhlZ3ZhVWZURHo4SVBLejc5Qi1Ec1dZZVNVRXJuWmdCVnNicGpzalJ3LW5NaElOSjA6OnV0bV9jb250ZW50OjIyMTA3NzQwNDo6dXRtX3NvdXJjZTpoc19lbWFpbA..&t:cp=action/contributions/searchactions">Turn off suggestions</a>
</div>
</div>
<input title="Enter a search word" class="lia-form-type-text ProductSearchField lia-search-input-product search-input lia-js-hidden lia-autocomplete-input" aria-label="Enter a search word" value="" id="productSearchField_3bb9fcc7484389"
name="productSearchField" type="text" aria-autocomplete="both" autocomplete="off" placeholder="Search Dataiku use cases and success stories">
<div class="lia-autocomplete-container" style="display: none; position: absolute;">
<div class="lia-autocomplete-header">Enter a search word</div>
<div class="lia-autocomplete-content">
<ul></ul>
</div>
<div class="lia-autocomplete-footer">
<a class="lia-link-navigation lia-autocomplete-toggle-off lia-link-ticket-post-action lia-component-search-action-disable-auto-complete" data-lia-action-token="ZG6qDTboUerQ708hlx1fzBQ500eAQrH2t18Zkiw8UkY." rel="nofollow" id="disableAutoComplete_3bb9fcc82bbd02" href="https://community.dataiku.com/t5/tkb/v2/page.disableautocomplete:disableautocomplete?t:ac=blog-id/Awards/q-p/dXRtX2NhbXBhaWduOkRhdGFpa3UrRnJvbnRydW5uZXIrQXdhcmRzKzIwMjI6OnV0bV9tZWRpdW06ZW1haWw6Ol9oc21pOjIyMTA3ODA5NTo6X2hzZW5jOnAyQU5xdHotX2FkZWhpUWdJanUwaGZpQ0poWnVFakNtSUMxcmRQNlhhSlNrYWN5MnFJdkZGMEoySC1Mb28xU3FpM2lyVHFhTUhlZ3ZhVWZURHo4SVBLejc5Qi1Ec1dZZVNVRXJuWmdCVnNicGpzalJ3LW5NaElOSjA6OnV0bV9jb250ZW50OjIyMTA3NzQwNDo6dXRtX3NvdXJjZTpoc19lbWFpbA..&t:cp=action/contributions/searchactions">Turn off suggestions</a>
</div>
</div>
<input class="lia-as-search-action-id" name="as-search-action-id" type="hidden">
</span>
</span>
<span class="lia-cancel-search">cancel</span>
</form>
POST /restapi/vc/users/id/-1/media/albums/default/public/images/upload
<form id="logo_ajax_submition" action="/restapi/vc/users/id/-1/media/albums/default/public/images/upload" method="post" enctype="multipart/form-data" style="display: none;">
<input type="file" id="submission_logo_hidden" name="image.content" accept=".jpg,.png,.jpeg" autocomplete="off">
</form>
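This hidden helper form uploads the organization logo as a single file field named `image.content` (JPG/PNG only, per the `accept` attribute). A rough stdlib sketch of the equivalent multipart/form-data body — the endpoint path and field name are taken from the form above; the host, session cookies, and the `lia-action-token` handling are out of scope here:

```python
import mimetypes
import uuid

def build_multipart(field_name, filename, payload):
    """Build a multipart/form-data body equivalent to submitting the
    hidden logo-upload form (one file input named "image.content")."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode() + payload + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

# Target (from the form's action attribute):
#   POST /restapi/vc/users/id/-1/media/albums/default/public/images/upload
body, content_type = build_multipart("image.content", "logo.png", b"\x89PNG...")
```

Sending it with `urllib.request` or `requests` would additionally require an authenticated community session, which this scan does not capture.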
POST /t5/tkb/articleeditorpage/tkb-id/Awards/template-id/freeform?submission=new
<form id="awards_form" action="/t5/tkb/articleeditorpage/tkb-id/Awards/template-id/freeform?submission=new" method="post">
<fieldset id="awards_form_first" disabled="disabled">
<div class="shadowed-box">
<legend>Your Submission</legend>
<ol>
<li>
<fieldset style="margin-top: 24px">
<legend>Select your award category:</legend>
<p>You may enter your submission into multiple categories at once!</p>
<h4>Use Cases</h4>
<p>These categories focus on the practical applications of Dataiku:</p>
<div class="award_category">
<input type="checkbox" id="award_cat_1" data-group="1" name="award_cat_1" value="Data Science for Good">
<label for="award_cat_1"><strong>Data Science for Good</strong><br>Turning the spotlight on the best use of Dataiku by companies and individuals to make a positive impact on the world.</label>
</div>
<div class="award_category">
<input type="checkbox" id="award_cat_2" data-group="1" name="award_cat_2" value="Responsible AI">
<label for="award_cat_2"><strong>Responsible AI</strong><br>Highlighting the individuals and organizations who are using Dataiku to develop foundational AI for the future, that is governable, sustainable, transparent, and seeks to
remove bias.</label>
</div>
<div class="award_category">
<input type="checkbox" id="award_cat_3" data-group="1" name="award_cat_3" value="Value at Scale">
<label for="award_cat_3"><strong>Value at Scale</strong><br>Showcasing the pioneering individual and organizational use of Dataiku to manage the full lifecycle of models and pipelines, and deliver value at scale.</label>
</div>
<div class="award_category">
<input type="checkbox" id="award_cat_10" data-group="1" name="award_cat_10" value="Partner Acceleration">
<label for="award_cat_10"><strong>Partner Acceleration</strong><br>Featuring successful partnerships between Dataiku, partner organizations, and customers to bring a use case to fruition faster, smarter, and/or better.</label>
</div>
<div class="award_category">
<input type="checkbox" id="award_cat_4" data-group="1" name="award_cat_4" value="Moonshot Pioneer(s)">
<label for="award_cat_4"><strong>Moonshot Pioneer(s)</strong><br>Rewarding the pioneers who are pushing the boundaries of Dataiku to build innovative projects - including for fun!</label>
</div>
<h4>Success Stories</h4>
<p>These categories highlight individual and collective achievements:</p>
<div class="award_category">
<input type="checkbox" id="award_cat_5" data-group="2" name="award_cat_5" value="Most Impactful Transformation Story">
<label for="award_cat_5"><strong>Most Impactful Transformation Story</strong><br>Recognizing inspiring transformation stories from organizations which have systematized the use of data and AI with Dataiku.</label>
</div>
<div class="award_category">
<input type="checkbox" id="award_cat_6" data-group="2" name="award_cat_6" value="Most Impactful Ikigai Story">
<label for="award_cat_6"><strong>Most Impactful Ikigai Story</strong><br>Turning the spotlight on nonprofit organizations or volunteers who leverage Dataiku to accelerate their organization’s mission and grow their positive social
and/or environmental impact.</label>
</div>
<div class="award_category">
<input type="checkbox" id="award_cat_7" data-group="2" name="award_cat_7" value="Excellence in Teaching">
<label for="award_cat_7"><strong>Excellence in Teaching</strong><br>Recognizing members of the teaching faculty for their invaluable contribution to educate the next generation of analytical talent with Dataiku, driving innovation in
the field and aligning with real world use cases.</label>
</div>
<div class="award_category">
<input type="checkbox" id="award_cat_8" data-group="2" name="award_cat_8" value="Excellence in Research">
<label for="award_cat_8"><strong>Excellence in Research</strong><br>Starring academic researchers who are leveraging Dataiku to gain impactful insights from their data and push the frontiers of our knowledge.</label>
</div>
<div class="award_category">
<input type="checkbox" id="award_cat_9" data-group="2" name="award_cat_9" value="Most Extraordinary AI Maker">
<label for="award_cat_9"><strong>Most Extraordinary AI Maker(s)</strong><br>Spotlighting inspiring stories of AI makers who have made a bigger impact with Dataiku through individual upskill, business & tech collaboration, or
elevating others to harness the power of data.</label>
</div>
</fieldset>
</li>
<li>
<label for="submission_role">You are applying as</label>
<select id="submission_role" name="submission_role" autocomplete="off">
<option value="" disabled="" selected="" hidden="">--Please choose an option--</option>
<option value="Large-scale company (revenue over $1 billion USD)">Large-scale company (revenue over $1 billion USD)</option>
<option value="Small or medium-sized company">Small or medium-sized company</option>
<option value="Partner organization">Partner organization</option>
<option value="Nonprofit organization">Nonprofit organization</option>
<option value="Academic(s)">Academic(s)</option>
<option value="Individual user(s)">Individual user(s)</option>
</select>
</li>
</ol>
</div>
</fieldset>
<fieldset id="awards_form_first_b" style="display: none;" disabled="disabled">
<div class="shadowed-box">
<legend>Your Use Case</legend>
<ol start="3">
<li>
<label for="submission_challenge">What business challenge were you encountering?</label>
<p>Feel free to contextualize by describing your industry, listing pain points, any frictions or obstacles that you met… <span class="formhint" filled-state="none"><span word-limit="300">0</span>/300
words</span><!--span data-text="Feel free to contextualize by describing your industry, listing pain points, any frictions or obstacles that you met…" class="tooltip">?</span--></p>
<textarea id="submission_challenge" name="submission_challenge" style="height:200px;"></textarea>
</li>
<li>
<label for="submission_solve">How did you solve it with Dataiku?</label>
<p><span class="corporate-option">You can highlight the reasons behind choosing Dataiku and how it helped you, how many users were involved across different roles, any techniques or other technologies you used, steps to complete your
project, and more generally describe your journey to success.</span><span class="noncorporate-option" style="display: none;">Here’s the place to detail your success story - you can highlight the reasons behind choosing Dataiku and how
it helped you reach your goals, as well as any important steps along your journey to success.</span><span id="excellence_in_teaching_text" class="dynamic-answer-text" style="display:none;"><br>Can you share more about your course
content and how it aligns with real-word use cases that prepare students for their careers?</span><span id="excellence_in_research_text" class="dynamic-answer-text" style="display:none;"><br>Can you detail the innovative approach of
your project and the impact your research has?</span> <span class="formhint" filled-state="none"><span word-limit="300">0</span>/300 words</span></p>
<textarea id="submission_solve" name="submission_solve" style="height:200px;"></textarea>
</li>
<li>
<label for="submission_businessarea">Business area enhanced</label>
<select id="submission_businessarea" name="submission_businessarea" autocomplete="off">
<option value="" disabled="" selected="" hidden="">--Please choose an option--</option>
<option value="Accounting/Finance">Accounting/Finance</option>
<option value="Analytics">Analytics</option>
<option value="Communication/Strategy/Competitive Intelligence">Communication/Strategy/Competitive Intelligence</option>
<option value="Human Resources">Human Resources</option>
<option value="Internal Operations">Internal Operations</option>
<option value="IT/Cybersecurity/Data">IT/Cybersecurity/Data</option>
<option value="Manufacturing">Manufacturing</option>
<option value="Marketing/Sales/Customer Relationship Management">Marketing/Sales/Customer Relationship Management</option>
<option value="Product & Service Development">Product & Service Development</option>
<option value="Risk/Compliance/Legal/Internal Audit">Risk/Compliance/Legal/Internal Audit</option>
<option value="Supply-chain/Supplier Management/Service Delivery">Supply-chain/Supplier Management/Service Delivery</option>
<option value="Financial Services Specific">Financial Services Specific</option>
<option value="Other">Other - please specify</option>
<option value="Unknown">Unknown</option>
</select>
<input type="text" id="submission_businessarea_other" name="submission_businessarea_other" placeholder="please specify" style="display:none;">
</li>
<li>
<label for="submission_stage">Use case stage</label>
<select id="submission_stage" name="submission_stage" autocomplete="off">
<option value="" disabled="" selected="" hidden="">--Please choose an option--</option>
<option value="Proof of Concept">Proof of Concept</option>
<option value="In Progress">In Progress</option>
<option value="Built & Functional">Built & Functional</option>
<option value="In Production">In Production</option>
<option value="Planned">Planned</option>
<option value="Archived/Paused">Archived/Paused</option>
<option value="Unknown">Unknown</option>
</select>
</li>
</ol>
</div>
</fieldset>
<fieldset id="awards_form_first_c" style="display: none;" disabled="disabled">
<div class="shadowed-box">
<legend>Value Generated</legend>
<ol start="7">
<li>
<label for="submission_value">Can you explain the value created with this use case or success story?</label>
<p><span class="corporate-option">Now is the time to explain the impact achieved - this can be ROI, metrics, and/or any other indicators of success!</span><span class="noncorporate-option" style="display: none;">Now is the time to explain
the value generated - this can be ROI, metrics, and/or any other indicators of success!</span> <span class="formhint" filled-state="none"><span word-limit="300">0</span>/300 words</span></p>
<textarea id="submission_value" name="submission_value" style="height:200px;"></textarea>
</li>
<li>
<label for="submission_valuespecific">What is the specific value brought by Dataiku?</label>
<p><span class="corporate-option">Some food for thought: speed and agility through increased team efficiency, enhanced tech stack efficiency, improved risk management and governance through transparency and explainability, upskilling and
networking with resources such as the Dataiku Academy and Community…</span><span class="noncorporate-option" style="display: none;">Some food for thought: speed and agility through increased team efficiency, enhanced tech stack
efficiency, improved risk management, and governance through transparency and explainability, upskilling and networking through resources such as the Academy and Community…</span> <span class="formhint" filled-state="none"><span
word-limit="300">0</span>/300 words</span></p>
<textarea id="submission_valuespecific" name="submission_valuespecific" style="height:200px;"></textarea>
</li>
<li class="multiple-selection-li">
<label for="submission_valuetype">Value type</label>
<label class="inset-checkbox" for="submission_valuetype_2"><input type="checkbox" id="submission_valuetype_2" name="submission_valuetype_2">
<p>Improve customer/employee satisfaction</p>
</label>
<label class="inset-checkbox" for="submission_valuetype_3"><input type="checkbox" id="submission_valuetype_3" name="submission_valuetype_3">
<p>Increase revenue</p>
</label>
<label class="inset-checkbox" for="submission_valuetype_4"><input type="checkbox" id="submission_valuetype_4" name="submission_valuetype_4">
<p>Reduce cost</p>
</label>
<label class="inset-checkbox" for="submission_valuetype_5"><input type="checkbox" id="submission_valuetype_5" name="submission_valuetype_5">
<p>Reduce risk</p>
</label>
<label class="inset-checkbox" for="submission_valuetype_6"><input type="checkbox" id="submission_valuetype_6" name="submission_valuetype_6">
<p>Save time</p>
</label>
<label class="inset-checkbox" for="submission_valuetype_7"><input type="checkbox" id="submission_valuetype_7" name="submission_valuetype_7">
<p>Increase trust</p>
</label>
<label class="inset-checkbox" for="submission_valuetype_8"><input type="checkbox" id="submission_valuetype_8" name="submission_valuetype_8">
<p>Other</p>
</label>
<input type="text" id="submission_valuetype_other" name="submission_valuetype_other" placeholder="please specify" style="display:none;">
<label class="inset-checkbox" for="submission_valuetype_1"><input type="checkbox" id="submission_valuetype_1" name="submission_valuetype_1">
<p>Unknown</p>
</label>
</li>
<li>
<label for="submission_valuerange">Value range</label>
<select id="submission_valuerange" name="submission_valuerange" autocomplete="off">
<option value="" disabled="" selected="" hidden="">--Please choose an option--</option>
<option value="Less than $1,000">Less than $1,000</option>
<option value="Thousands of $">Thousands of $</option>
<option value="Hundreds of thousands of $">Hundreds of thousands of $</option>
<option value="Millions of $">Millions of $</option>
<option value="Dozens of millions of $">Dozens of millions of $</option>
<option value="Unknown">Unknown</option>
</select>
</li>
</ol>
</div>
</fieldset>
<fieldset id="awards_form_second" style="display: none;" disabled="disabled">
<div class="shadowed-box">
<legend>About your organization <span class="formhint">(optional if applying as an individual)</span></legend>
<ol start="11">
<li>
<label for="submission_orgname">Organization name</label>
<input type="text" id="submission_orgname" name="submission_orgname">
</li>
<li>
<label for="submission_boilerplate">Boilerplate</label>
<p>Short, standard description of your organization <span class="formhint" filled-state="none"><span word-limit="100">0</span>/100 words</span></p>
<textarea id="submission_boilerplate" name="submission_boilerplate" style="height:200px;"></textarea>
</li>
<li>
<label for="submission_logo">Logo</label>
<input type="text" id="submission_logo" name="submission_logo" autocomplete="off" style="display: none;">
<input type="text" id="submission_logo_id" name="submission_logo_id" autocomplete="off" style="display: none;">
<label class="lia-attachments-drop-zone" for="submission_logo_hidden">
<div class="lia-file-upload-wrapper" id="filedragdrop">
<div class="lia-file-upload">
<div class="lia-file-upload-content">
<div class="lia-attachment-description">
<div class="lia-cloud-symbol">
<span class="lia-img-icon-cloud-upload lia-fa-icon lia-fa-cloud-upload lia-fa"></span>
</div>
<div class="lia-attachment-description-details">
<div class="lia-attachment-description-text">Browse files to attach</div>
<div class="lia-attachment-constraints">Maximum size: 15 MB • File types allowed: JPG, PNG</div>
</div>
</div>
</div>
</div>
</div>
</label>
</li>
<li>
<label for="submission_industry">Industry</label>
<select id="submission_industry" name="submission_industry" autocomplete="off">
<option value="" disabled="" selected="" hidden="">--Please choose an option--</option>
<option>Aerospace & Defence</option>
<option>Agriculture</option>
<option>Auto Transportation & Logistics</option>
<option>Construction & Real Estate</option>
<option>Energy & Utilities</option>
<option>Financial Services Banking & Insurance</option>
<option>Health & Pharmaceuticals</option>
<option>Higher Education</option>
<option>Manufacturing & Chemical</option>
<option>Media Information & Entertainment</option>
<option>Nonprofit</option>
<option>Professional Services & Consulting</option>
<option>Public Sector & Government</option>
<option>Retail Ecommerce & CPG</option>
<option>Software & Technology</option>
<option>Telecommunications</option>
<option>Travel & Hospitality</option>
<option>Other</option>
</select>
</li>
</ol>
</div>
</fieldset>
<fieldset id="awards_form_third" style="display: none;" disabled="disabled">
<div class="shadowed-box">
<legend>About you</legend>
<ol start="15">
<li>
<label for="submission_fullname">Your full name</label>
<input type="text" id="submission_fullname" name="submission_fullname" value=" ">
</li>
<li>
<label for="submission_title">Your title</label>
<input type="text" id="submission_title" name="submission_title">
</li>
<li>
<label for="submission_country">Your country</label>
<select id="submission_country" name="submission_country" autocomplete="off">
<option value="" disabled="" selected="" hidden="">--Please choose an option--</option>
<option value="United States">United States</option>
<option value="United Kingdom">United Kingdom</option>
<option value="France">France</option>
<option value="Germany">Germany</option>
<option value="India">India</option>
<option value="Canada">Canada</option>
<option value="Afghanistan">Afghanistan</option>
<option value="Albania">Albania</option>
<option value="Algeria">Algeria</option>
<option value="Andorra">Andorra</option>
<option value="Angola">Angola</option>
<option value="Antigua & Deps">Antigua & Deps</option>
<option value="Argentina">Argentina</option>
<option value="Armenia">Armenia</option>
<option value="Australia">Australia</option>
<option value="Austria">Austria</option>
<option value="Azerbaijan">Azerbaijan</option>
<option value="Bahamas">Bahamas</option>
<option value="Bahrain">Bahrain</option>
<option value="Bangladesh">Bangladesh</option>
<option value="Barbados">Barbados</option>
<option value="Belarus">Belarus</option>
<option value="Belgium">Belgium</option>
<option value="Belize">Belize</option>
<option value="Benin">Benin</option>
<option value="Bermuda">Bermuda</option>
<option value="Bhutan">Bhutan</option>
<option value="Bolivia">Bolivia</option>
<option value="Bosnia Herzegovina">Bosnia Herzegovina</option>
<option value="Botswana">Botswana</option>
<option value="Brazil">Brazil</option>
<option value="Brunei">Brunei</option>
<option value="Bulgaria">Bulgaria</option>
<option value="Burkina">Burkina</option>
<option value="Burundi">Burundi</option>
<option value="Cambodia">Cambodia</option>
<option value="Cameroon">Cameroon</option>
<option value="Cape Verde">Cape Verde</option>
<option value="Central African Rep">Central African Rep</option>
<option value="Chad">Chad</option>
<option value="Chile">Chile</option>
<option value="China">China</option>
<option value="Colombia">Colombia</option>
<option value="Comoros">Comoros</option>
<option value="Congo">Congo</option>
<option value="Congo {Democratic Rep}">Congo {Democratic Rep}</option>
<option value="Costa Rica">Costa Rica</option>
<option value="Croatia">Croatia</option>
<option value="Cuba">Cuba</option>
<option value="Cyprus">Cyprus</option>
<option value="Czech Republic">Czech Republic</option>
<option value="Denmark">Denmark</option>
<option value="Djibouti">Djibouti</option>
<option value="Dominica">Dominica</option>
<option value="Dominican Republic">Dominican Republic</option>
<option value="East Timor">East Timor</option>
<option value="Ecuador">Ecuador</option>
<option value="Egypt">Egypt</option>
<option value="El Salvador">El Salvador</option>
<option value="Equatorial Guinea">Equatorial Guinea</option>
<option value="Eritrea">Eritrea</option>
<option value="Estonia">Estonia</option>
<option value="Ethiopia">Ethiopia</option>
<option value="Fiji">Fiji</option>
<option value="Finland">Finland</option>
<option value="Gabon">Gabon</option>
<option value="Gambia">Gambia</option>
<option value="Georgia">Georgia</option>
<option value="Ghana">Ghana</option>
<option value="Greece">Greece</option>
<option value="Grenada">Grenada</option>
<option value="Guatemala">Guatemala</option>
<option value="Guinea">Guinea</option>
<option value="Guinea-Bissau">Guinea-Bissau</option>
<option value="Guyana">Guyana</option>
<option value="Haiti">Haiti</option>
<option value="Honduras">Honduras</option>
<option value="Hong Kong">Hong Kong</option>
<option value="Hungary">Hungary</option>
<option value="Iceland">Iceland</option>
<option value="Indonesia">Indonesia</option>
<option value="Iran">Iran</option>
<option value="Iraq">Iraq</option>
<option value="Ireland {Republic}">Ireland {Republic}</option>
<option value="Israel">Israel</option>
<option value="Italy">Italy</option>
<option value="Ivory Coast">Ivory Coast</option>
<option value="Jamaica">Jamaica</option>
<option value="Japan">Japan</option>
<option value="Jordan">Jordan</option>
<option value="Kazakhstan">Kazakhstan</option>
<option value="Kenya">Kenya</option>
<option value="Kiribati">Kiribati</option>
<option value="Korea North">Korea North</option>
<option value="Korea South">Korea South</option>
<option value="Kuwait">Kuwait</option>
<option value="Kyrgyzstan">Kyrgyzstan</option>
<option value="Latvia">Latvia</option>
<option value="Lebanon">Lebanon</option>
<option value="Lesotho">Lesotho</option>
<option value="Liberia">Liberia</option>
<option value="Libya">Libya</option>
<option value="Liechtenstein">Liechtenstein</option>
<option value="Lithuania">Lithuania</option>
<option value="Luxembourg">Luxembourg</option>
<option value="Macedonia">Macedonia</option>
<option value="Madagascar">Madagascar</option>
<option value="Malawi">Malawi</option>
<option value="Malaysia">Malaysia</option>
<option value="Maldives">Maldives</option>
<option value="Mali">Mali</option>
<option value="Malta">Malta</option>
<option value="Marshall Islands">Marshall Islands</option>
<option value="Mauritania">Mauritania</option>
<option value="Mauritius">Mauritius</option>
<option value="Mexico">Mexico</option>
<option value="Micronesia">Micronesia</option>
<option value="Moldova">Moldova</option>
<option value="Monaco">Monaco</option>
<option value="Mongolia">Mongolia</option>
<option value="Montenegro">Montenegro</option>
<option value="Morocco">Morocco</option>
<option value="Mozambique">Mozambique</option>
<option value="Namibia">Namibia</option>
<option value="Nauru">Nauru</option>
<option value="Nepal">Nepal</option>
<option value="Netherlands">Netherlands</option>
<option value="New Zealand">New Zealand</option>
<option value="Nicaragua">Nicaragua</option>
<option value="Niger">Niger</option>
<option value="Nigeria">Nigeria</option>
<option value="Norway">Norway</option>
<option value="Oman">Oman</option>
<option value="Pakistan">Pakistan</option>
<option value="Palau">Palau</option>
<option value="Panama">Panama</option>
<option value="Papua New Guinea">Papua New Guinea</option>
<option value="Paraguay">Paraguay</option>
<option value="Peru">Peru</option>
<option value="Philippines">Philippines</option>
<option value="Poland">Poland</option>
<option value="Portugal">Portugal</option>
<option value="Qatar">Qatar</option>
<option value="Romania">Romania</option>
<option value="Russian Federation">Russian Federation</option>
<option value="Rwanda">Rwanda</option>
<option value="Samoa">Samoa</option>
<option value="San Marino">San Marino</option>
<option value="Saudi Arabia">Saudi Arabia</option>
<option value="Senegal">Senegal</option>
<option value="Serbia">Serbia</option>
<option value="Seychelles">Seychelles</option>
<option value="Sierra Leone">Sierra Leone</option>
<option value="Singapore">Singapore</option>
<option value="Slovakia">Slovakia</option>
<option value="Slovenia">Slovenia</option>
<option value="Solomon Islands">Solomon Islands</option>
<option value="Somalia">Somalia</option>
<option value="South Africa">South Africa</option>
<option value="South Sudan">South Sudan</option>
<option value="Spain">Spain</option>
<option value="Sri Lanka">Sri Lanka</option>
<option value="Sudan">Sudan</option>
<option value="Suriname">Suriname</option>
<option value="Sweden">Sweden</option>
<option value="Switzerland">Switzerland</option>
<option value="Taiwan">Taiwan</option>
<option value="Tajikistan">Tajikistan</option>
<option value="Thailand">Thailand</option>
<option value="Togo">Togo</option>
<option value="Tonga">Tonga</option>
<option value="Tunisia">Tunisia</option>
<option value="Turkey">Turkey</option>
<option value="Turkmenistan">Turkmenistan</option>
<option value="Tuvalu">Tuvalu</option>
<option value="Uganda">Uganda</option>
<option value="Ukraine">Ukraine</option>
<option value="United Arab Emirates">United Arab Emirates</option>
<option value="Uruguay">Uruguay</option>
<option value="Uzbekistan">Uzbekistan</option>
<option value="Vanuatu">Vanuatu</option>
<option value="Vatican City">Vatican City</option>
<option value="Venezuela">Venezuela</option>
<option value="Vietnam">Vietnam</option>
<option value="Yemen">Yemen</option>
<option value="Zambia">Zambia</option>
<option value="Zimbabwe">Zimbabwe</option>
</select>
</li>
<li>
<label for="submission_collaborators">If you are applying as a team or organization, enter the name of your teammates (and usernames on the Dataiku Community if relevant):</label>
<textarea id="submission_collaborators" name="submission_collaborators" style="height:122px;"></textarea>
</li>
</ol>
</div>
<div class="shadowed-box">
<input type="checkbox" id="submission_tos" name="submission_tos" value="true" required="">
<label for="submission_tos">I have reviewed and accepted the <a href="https://downloads.dataiku.com/publicdocs/Dataiku_Frontrunner_Awards_2021_Terms_latest.pdf" target="_blank">Submission Terms & Conditions</a>.</label>
<div class="preview-and-add-images"><a class="button lia-button lia-button-primary" href="javascript:void(0);">Preview and Submit</a></div>
</div>
</fieldset>
</form>
Text Content
You now have until September 15th to submit your use case or success story to the 2022 Dataiku Frontrunner Awards! ENTER YOUR SUBMISSION

DATAIKU FRONTRUNNER AWARDS

Celebrating extraordinary people who are paving the way for Everyday AI with Dataiku

The Dataiku Frontrunner Awards recognize the success of Dataiku customers, partners, nonprofits, academics, and all individual users. Enter your submission today to share your pioneering achievements — be it automating everyday tasks, elevating more people to harness the power of data, systemizing transformation, or tackling moonshot projects!
To participate, fill out the submission form to detail the impact that you, your team, or your organization have achieved with Dataiku in one (or several!) of the award categories. Winners will be determined by a panel of judges, which includes Dataiku executives as well as independent industry experts, and announced in the fall of 2022. For any questions, please email the team at community@dataiku.com. We're here to help you celebrate your success!

WHY PARTICIPATE?

BE RECOGNIZED AS A THOUGHT LEADER
Video features and speaking opportunities will enable winners and finalists to earn visibility in the industry, while all participants will gain exposure on Dataiku's networks.

CELEBRATE YOUR INDIVIDUAL & TEAM'S SUCCESS
Inspire the data science community by sharing your achievements and the value you have generated, individually or collectively.

ENHANCE YOUR EMPLOYER BRANDING
Showcase your innovation with the data science community and entice the brightest talents to join your organization and contribute to your success.

WIN SPECIAL PRIZES AND SWAG
Winners will be offered a unique trophy and a donation to the charity of their choice, and special Dataiku swag will be sent to all participants to thank you for your contribution to knowledge sharing!

Submissions are open until Thursday, September 15th at 11:59am EST. We recommend drafting your entry in a separate document. Once ready, copy it over to the form, upload any helpful visual elements (e.g. graphs, screenshots, infographics, or videos), and hit submit! By entering your submission, you agree to the Submission Terms & Conditions.

Your Submission

1. Select your award category: You may enter your submission into multiple categories at once!

USE CASES
These categories focus on the practical applications of Dataiku:

Data Science for Good
Turning the spotlight on the best use of Dataiku by companies and individuals to make a positive impact on the world.
Responsible AI
Highlighting the individuals and organizations who are using Dataiku to develop foundational AI for the future, that is governable, sustainable, transparent, and seeks to remove bias.

Value at Scale
Showcasing the pioneering individual and organizational use of Dataiku to manage the full lifecycle of models and pipelines, and deliver value at scale.

Partner Acceleration
Featuring successful partnerships between Dataiku, partner organizations, and customers to bring a use case to fruition faster, smarter, and/or better.

Moonshot Pioneer(s)
Rewarding the pioneers who are pushing the boundaries of Dataiku to build innovative projects - including for fun!

SUCCESS STORIES
These categories highlight individual and collective achievements:

Most Impactful Transformation Story
Recognizing inspiring transformation stories from organizations which have systematized the use of data and AI with Dataiku.

Most Impactful Ikigai Story
Turning the spotlight on nonprofit organizations or volunteers who leverage Dataiku to accelerate their organization's mission and grow their positive social and/or environmental impact.

Excellence in Teaching
Recognizing members of the teaching faculty for their invaluable contribution to educating the next generation of analytical talent with Dataiku, driving innovation in the field and aligning with real-world use cases.

Excellence in Research
Starring academic researchers who are leveraging Dataiku to gain impactful insights from their data and push the frontiers of our knowledge.

Most Extraordinary AI Maker(s)
Spotlighting inspiring stories of AI makers who have made a bigger impact with Dataiku through individual upskilling, business & tech collaboration, or elevating others to harness the power of data.

2. You are applying as
--Please choose an option-- Large-scale company (revenue over $1 billion USD) Small or medium-sized company Partner organization Nonprofit organization Academic(s) Individual user(s)

Your Use Case

3.
What business challenge were you encountering? Feel free to contextualize by describing your industry, listing pain points, and any frictions or obstacles that you met… 0/300 words

4. How did you solve it with Dataiku? You can highlight the reasons behind choosing Dataiku and how it helped you reach your goals, how many users were involved across different roles, any techniques or other technologies you used, important steps to complete your project, and more generally describe your journey to success. Can you share more about your course content and how it aligns with real-world use cases that prepare students for their careers? Can you detail the innovative approach of your project and the impact your research has? 0/300 words

5. Business area enhanced
--Please choose an option-- Accounting/Finance Analytics Communication/Strategy/Competitive Intelligence Human Resources Internal Operations IT/Cybersecurity/Data Manufacturing Marketing/Sales/Customer Relationship Management Product & Service Development Risk/Compliance/Legal/Internal Audit Supply-chain/Supplier Management/Service Delivery Financial Services Specific Other - please specify Unknown

6. Use case stage
--Please choose an option-- Proof of Concept In Progress Built & Functional In Production Planned Archived/Paused Unknown

Value Generated

7. Can you explain the value created with this use case or success story? Now is the time to explain the impact achieved - this can be ROI, metrics, and/or any other indicators of success! 0/300 words

8. What is the specific value brought by Dataiku?
Some food for thought: speed and agility through increased team efficiency, enhanced tech stack efficiency, improved risk management and governance through transparency and explainability, upskilling and networking with resources such as the Dataiku Academy and Community… 0/300 words

9. Value type
Improve customer/employee satisfaction Increase revenue Reduce cost Reduce risk Save time Increase trust Other Unknown

10. Value range
--Please choose an option-- Less than $1,000 Thousands of $ Hundreds of thousands of $ Millions of $ Dozens of millions of $ Unknown

About your organization (optional if applying as an individual)

11. Organization name

12. Boilerplate
Short, standard description of your organization 0/100 words

13. Logo
Browse files to attach Maximum size: 15 MB • File types allowed: JPG, PNG

14. Industry
--Please choose an option-- Aerospace & Defence Agriculture Auto Transportation & Logistics Construction & Real Estate Energy & Utilities Financial Services Banking & Insurance Health & Pharmaceuticals Higher Education Manufacturing & Chemical Media Information & Entertainment Nonprofit Professional Services & Consulting Public Sector & Government Retail Ecommerce & CPG Software & Technology Telecommunications Travel & Hospitality Other

About you

15. Your full name

16. Your title

17.
Your country
--Please choose an option--

18. If you are applying as a team or organization, enter the name of your teammates (and usernames on the Dataiku Community if relevant):

I have reviewed and accepted the Submission Terms & Conditions.
Explore use cases and success stories from outstanding Dataiku users below, and give kudos to your favorites to show your support! All kudos given by August 31 on the 2022 submissions of the Dataiku Frontrunner Awards will be taken into consideration by our jury members.

Use the following labels to filter submissions by industry:
* Energy & Utilities
* Financial Services, Banking & Insurance
* Health & Pharmaceuticals
* Higher Education
* Manufacturing & Chemical
* Nonprofit
* Software & Technology
* Retail Ecommerce & CPG
* Professional Services & Consulting
* Telecommunications
* Other

Dayananda Sagar University - Developing Management Professionals with Data-Driven Problem Solving and Decision-Making Skills

Name: Prof Alok Chakravarty, Prof H N Shankar, Prof Sai Praveen, Prof A. Nagaraj Subbarao
Country: India
Organization: SCMS-PG, Dayananda Sagar University

The School of Commerce & Management Studies, Dayananda Sagar University, Bengaluru, India, is a prestigious Business School with an emphasis on crafting superior business leaders and entrepreneurs. The ethos of the B School is to stay fully conscious of the changing business environment, particularly the technological environment and the need for digital literacy, and to disseminate this knowledge to our students. The school has an Executive MBA program for working professionals and a full-time MBA program, which attracts many graduate students with no work experience.

Awards Categories: Excellence in Teaching

Business Challenge:

Our objective at the School of Commerce & Management Studies is to integrate Business Analytics into each functional management area: Marketing, Finance, Supply Chain Management, and Human Resource Management. In this process, we prepare our students for the future and position them to add value to digital transformation.
Besides functional electives such as HR, Marketing, Finance, and Supply Chain, we also offer Technology electives like Business Analytics, Artificial Intelligence, and Information Technology as specialization electives to our students. Students can choose a major and a minor specialization in their second year. In the first year, we orient the students with the foundations of business analytics. In the second year, we offer the following courses to our students who choose Business Analytics as a specialization:

* Data Management Systems: Master Data Management, RDBMS, Data Warehouse, NoSQL, Big Data, Data Lake
* Data Visualization Using Tableau
* Applied Analytics (in different functional areas)
* Predictive Analytics Using R
* Exploratory Data Analysis Using Python

We faced the following challenges while designing and delivering our curriculum:

* Identifying competent faculty members who understand functional areas as well as business analytics.
* Addressing the fear of coding amongst students.
* Focusing on problem-solving without getting bogged down in technicalities.
* Addressing the process life-cycle view (such as CRISP-DM or SEMMA) of business analytics.
* Conveying the core concepts of Business Analytics in an easy-to-understand manner.
* Helping students appreciate the integrated and interdependent way in which industry professionals work while executing a business analytics project.
* Providing exposure to an industrial-strength platform that gives students hands-on experience in business problem solving.

Business Solution:

We at the School of Commerce & Management Studies are proud to be the first Business School in India to have an academic alliance with Dataiku, USA. We were introduced to Dataiku through our senior Prof H N Shankar in early 2021. Dataiku made available a cloud instance with approximately 100 user IDs. We are utilizing these for our full-time and Executive MBA program students.
One of the advantages that we received because of this academic partnership was Dataiku's Academy, which is a rich repository of courses such as Basics 101 to 103, Visual Recipes, Visual Machine Learning, Advanced Analytics, and many more. We begin our orientation to Business Analytics by introducing our students to the Basics 101 to Basics 103 and Intro to Machine Learning course modules in Academy. We then encourage our students to take the Core Designer certification. We follow it up by solving several use cases available in Academy, such as:

* Predictive Maintenance in the Manufacturing Industry
* Customer Churn Prediction
* Referrer and Visitor Analysis Using Web Logs
* Network Optimization for a Car Rental Company
* Bike Sharing Usage Patterns

Value Generated:

For the present batch of first-year students, our Dataiku Academy metrics are as follows:

Master of Business Administration
* Basics 101: 111
* Basics 102: 103
* Basics 103: 94
* Intro to ML: 76
* Use Case Predictive Maintenance: 5
* Core Designer Certificate: 10

Executive Master of Business Administration
* Basics 101: 42
* Basics 102: 42
* Basics 103: 42
* Intro to ML: 42
* Use Case Predictive Maintenance: 42
* Core Designer Certificate: 42

Dataiku Data Scientists Mr Devesh and Mr Shubham conducted an on-premise workshop on Dataiku's platform, and 103 students from MBA 1st Year attended it. We also had healthy coverage of Dataiku Academy courses in our outgoing batch of second-year students, and that is reflected in our overall placement percentage. Specifically, 90% of our outgoing students who had opted for Business Analytics as a major specialization got relevant analytics positions in good companies. Ten of our first-year students, along with four faculty members, got the opportunity to attend the Everyday AI Conference in Bangalore, and that had a major impact on the motivation and understanding of AI and Business Analytics amongst students.
Value Brought by Dataiku:

During the past decade or so, business analytics platforms have evolved from supporting only IT and finance functions to enabling business users across the organization or enterprise. However, many firms find themselves struggling to take advantage of this promise and the richness of data afforded. The data analytics program at the School of Commerce & Management Studies is working to provide the industry with well-trained resources to address digital-world issues. By integrating the Dataiku platform into our curriculum, we could effectively address the challenges mentioned earlier:

* Identifying competent faculty members who understand functional areas as well as business analytics.
* Addressing the fear of coding amongst students.
* Focusing on problem-solving without getting bogged down in technicalities.
* Addressing the process lifecycle view (such as CRISP-DM or SEMMA) of business analytics.
* Conveying the core concepts of Business Analytics in an easy-to-understand manner.
* Helping students appreciate the integrated and interdependent way in which industry professionals work while executing a business analytics project.
* Providing exposure to an industrial-strength platform that gives students hands-on experience in business problem solving.

It is a problem when organizations decide to embark on a digital transformation journey without having a clear strategy, action plan, or agenda, let alone a vision, for what it might mean and the path ahead. Many organizations face and will continue to face problems as they grapple with the process of change. At the School of Commerce & Management Studies, we think that we have an able partner in Dataiku in addressing this issue in an impactful manner.
Posted by alokchakravarty

Crowley - Leveraging Analytics & ML to Increase Revenue in Container Shipping

Name: Harsh Vora, Lead Data Scientist; Zachary Thorell, Business Data Analyst; Sandeep Punjari, Data Analyst 3; Irwin Castellino, Director of Data and Analytics; Deepak Arora, Vice President Corporate Strategy; Javier Diaz, Senior Analyst Quality Assurance Ops; Federico Gervasio, Industrial Engineer; Shannon Sarkees, Sustainability, Strategy & Digitization Manager; Tishlee Rivera, Business Intelligence and Analytics Director; Sudip Roy, Big Data Solutions Architect; Sanjay Khobragade, MLOps Architect
Country: United States
Organization: Crowley

Crowley, founded in 1892, is a privately-held, U.S.-owned and operated logistics, government, marine, and energy solutions company headquartered in Jacksonville, Florida. Services are provided worldwide by four primary business units - Crowley Logistics, Crowley (Government) Solutions, Crowley Shipping and Crowley Fuels. Crowley owns, operates, and/or manages a fleet of more than 200 vessels, consisting of RO/RO (roll-on-roll-off) vessels, LO/LO (lift-on-lift-off) vessels, articulated tug-barges (ATBs), LNG-powered container/roll-on, roll-off ships (ConRos) and multipurpose tugboats and barges. Land-based facilities and equipment include port terminals, warehouses, tank farms, gas stations, office buildings, trucks, trailers, containers, chassis, cranes, and other specialized vehicles.

Awards Categories: Most Extraordinary AI Maker(s), Most Impactful Transformation Story

Business Challenge:

Crowley did not have a centralized platform to utilize data and machine learning for decision-making in our logistics business unit, where we face several fundamental issues:

A. Missed revenue from dummy bookings - Customers book extra slots for containers on container ships and eventually show up at port with fewer containers, since no cancellation fees are enforced (industry standard).

B.
Lack of demand forecast for each node - The availability of empty containers at the right nodes/ports in the supply chain is the key to meeting our customers' demand. We did not have a historical and forecasted view of the demand for each container type, and between each origin and discharge node, which is key to enabling decision-making for empty container repositioning.

C. Late customs documentation - Improper or late customs documentation provided by customers resulted in offloaded containers residing at the port, costing the port time and space, and incremental efforts for planning fulfillment.

D. Unknown container weights - Each container ship has a maximum weight capacity. However, the weights of containers booked on a ship were only known once they were weighed at the port, resulting in last-minute planning for stowage (placement of containers on the ship) and accommodating weight constraints.

E. Lack of carbon footprint estimation - Our customers seek to estimate the carbon footprint of their supply chain. We did not have the technology and tools to automate and expose the calculations of the carbon footprint from container shipments.

F. Lack of predictive maintenance - Port equipment used to load containers onto ships, trains and trucks is prone to failure due to extreme loads. Unplanned and immediate maintenance requests are disruptive and expensive.

G. Non-targeted promotions - The marketing methods for logistics customers were a manual and subjective process. A data-driven methodology to predict customer churn can improve the targeting of marketing efforts, especially for non-contract customers.

Business Solution:

As a 130-year-old company undergoing digital transformation, we seek to utilize predictive and prescriptive analytics with our business leadership to boost our revenue, customer experience, employee experience and sustainability efforts.
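The per-lane demand gap described in challenge B was ultimately addressed with Dataiku's AutoML (see the solutions below). Purely as an illustration of the underlying problem, and not of Crowley's actual model, a naive moving-average baseline per shipping lane could be sketched like this; the function name, the lane codes, and the record shape are all assumptions:

```python
from collections import defaultdict

def weekly_demand_baseline(bookings, window=4):
    """Forecast next week's demand per (origin, destination, container_type)
    lane as the mean of that lane's most recent `window` weekly totals.

    `bookings` is an iterable of (week, origin, destination, container_type,
    count) records -- a hypothetical, simplified booking feed.
    """
    # Aggregate weekly totals per lane.
    totals = defaultdict(dict)  # lane -> {week: count}
    for week, origin, dest, ctype, count in bookings:
        lane = (origin, dest, ctype)
        totals[lane][week] = totals[lane].get(week, 0) + count

    # Average the last `window` observed weeks for each lane.
    forecast = {}
    for lane, by_week in totals.items():
        recent = [by_week[w] for w in sorted(by_week)[-window:]]
        forecast[lane] = sum(recent) / len(recent)
    return forecast
```

A real forecaster would of course model seasonality and trend; the point here is only the per-lane (origin, discharge node, container type) granularity the story describes.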
We pioneer digital transformation in the supply chain industry through (1) centralization of our operational, commercial and sustainability data into a data warehouse, (2) utilization of a single platform (Dataiku) to develop predictive and prescriptive analytics that enables all personas through no-code, low-code, and full-code capabilities, and (3) democratization of data engineering and machine learning activities through employee upskilling programs.

Through Dataiku, we developed/are developing solutions to our focus areas:

A. Container sail/rollover model [in production] - We developed a classification model to predict the probability of show/no-show for each container booked on our ships, providing visibility of at-risk containers to our voyage planning team for improved decision making.

B. Demand forecasting [in production] - We utilized Dataiku's AutoML capabilities to forecast demand associated with each container type, between each load and discharge node, enabling strategic decisions around empty container repositioning on a weekly basis.

C. Customs documentation classification [in production] - We developed a classification model to predict the probability of improper/late documentation for each container, reducing manual work for our claims and customs department.

D. Predict container weights [in development] - We are developing a regression model to predict the weight of containers booked on our voyages before they arrive at the port, enabling improved voyage planning with constrained weights of booked containers.

E. Estimate carbon footprint [in development] - We are developing a methodology to dynamically calculate and serve the estimated carbon footprint as a service using Dataiku's API capabilities.

F.
Predictive fleet maintenance [in development] - We are developing an anomaly detection model to identify concerning signatures from sensors on port equipment to implement a recommender system for inspection, reducing unplanned maintenance.

G. Predict customer churn [in development] - We are developing a customer churn classification model to improve the targeting of our marketing and promotional efforts in our logistics business.

Value Generated:

* Container sail/rollover classification model - Expected revenue gain of $5,000-$10,000/week, and 10-15 hours of employee time saved per week.
* Demand forecasting model - Expected revenue gain of approximately $10,000-$20,000/week, and approximately 5 hours of employee time saved per week.
* Customs documentation classification model - Expected cost savings of approximately $5,000-$10,000/week, and approximately 5 hours of employee time saved per week.
* Container weight regression model - Expected revenue gain of approximately $5,000-$10,000/week, and 5-10 hours of employee time saved per week.
* Carbon footprint estimation - This will be rolled out in a new product offering that enables optimization of supply chains based on carbon footprint and will position Crowley as a sustainability leader in the supply chain industry. Two potential customers have been identified, potentially generating revenue within the first year.
* Predictive fleet maintenance - Expected reduction in unplanned maintenance and last-minute planning of port equipment, and potential reduction in scheduled maintenance time. Potential cost savings of tens to hundreds of thousands of dollars per year.
* Customer churn prediction - Improved promotional targeting, reduction in manual hours for marketing, and data-driven identification of at-risk customers are expected to enable superior customer service to at-risk customers, increasing customer retention.
In addition, Dataiku has generated further value at Crowley by democratizing data analytics through the upskilling and enablement of Crowley’s business analysts. Value Brought by Dataiku: Prior to Dataiku, each department worked in a silo, utilizing disparate ETL, analysis, and reporting tools that did not integrate well. Dataiku provides a centralized, end-to-end platform for business analysts, data engineers, and data scientists to work together on analytics use cases. Another significant value addition comes from the interactive visual interface and the strong suite of AutoML models provided by Dataiku, enabling data analysts to design predictive and prescriptive models. For the MLOps team, Dataiku provides a seamless way of registering and deploying models to production. The deployer enables the necessary governance checkpoints, and the built-in drift monitoring, metrics, and checks enable the development of appropriate post-production alert systems. Finally, Dataiku simplifies the infrastructure needs of a maturing company. Our compute needs are always changing and increasing, and Dataiku’s Fleet Manager enables seamless scaling of servers and Kubernetes clusters. Due to the popularity of data science workflows developed in Dataiku at Crowley, the tool has drawn increasing interest from data engineering teams that are exploring the ETL functionality Dataiku provides, especially through seamless integration with Snowflake.
Posted by harsh9127
Excelion Partners - Building a Free Plugin to Efficiently Catalog and View Data Lineage
Team members: Ryan Moore & Tony Olson
Country: United States
Organization: Excelion Partners
Excelion Partners is a consulting organization with cloud data architects, data scientists, data engineers, and data analysts who are passionate about finding answers and building solutions with data. We help you "Decide with Data."
Awards Categories: Partner Acceleration, Moonshot Pioneers. Business Challenge: At Excelion Partners, we work with numerous customers who utilize Dataiku in their data science and analytics practice. Many of these organizations and analytics groups have not yet invested in an enterprise data cataloging or data lineage tool, which are often cost-prohibitive. As part of the productionalization process for these customers, we have often witnessed them creating "homegrown" data cataloging solutions that typically consist of a combination of spreadsheets, Dataiku, and their preferred visualization tool. These “homegrown” solutions are labor-intensive to maintain and do not integrate with the workflows of the developers who are hands-on with the Dataiku projects. Additionally, our clients struggle with data lineage. They create numerous downstream datasets in Dataiku, and we often hear them ask, “Where did that column come from?” Without upstream data lineage visibility, our clients lose trust in the data and, ultimately, in the solution’s business outcomes. Business Solution: Because of this cataloging and lineage challenge, Excelion has created a free Dataiku plugin called Thread. Thread is a lightweight catalog and lineage tool that integrates directly with Dataiku and its datasets. It provides a single location to document data connected to Dataiku and to consume the catalog's contents in a manner that is easy and efficient for business practices. Thread is implemented as a Dataiku web app plugin with a very simple installation process; it can securely scan an entire (or partial) Dataiku node to provide lineage views and documentation. The indexes and metadata generated by Thread are saved as Dataiku datasets in a project flow, making it very easy to export them for exposure in third-party visualization tools such as Power BI or Tableau.
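Conceptually, the upstream lineage question Thread answers ("Where did that column come from?") is a graph walk over a Dataiku flow, where recipes link output datasets back to their inputs. A minimal sketch of that idea, with hypothetical dataset names (this is not Thread's actual implementation):

```python
# Minimal sketch of upstream lineage as a graph walk over flow edges.
# Dataset names are invented for illustration, not from Thread itself.

# Each entry maps an output dataset to the inputs it was built from.
FLOW_EDGES = {
    "sales_enriched": ["sales_raw", "customers"],
    "sales_report": ["sales_enriched"],
}

def upstream(dataset, edges):
    """Return every dataset upstream of `dataset`, depth-first."""
    seen = []
    stack = list(edges.get(dataset, []))
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.append(d)
            stack.extend(edges.get(d, []))
    return seen

print(sorted(upstream("sales_report", FLOW_EDGES)))
# ['customers', 'sales_enriched', 'sales_raw']
```

In Thread's case the edges come from scanning the node, and the resulting index is persisted as Dataiku datasets so it can be exported to visualization tools.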
Use Case Stage: In Production. Value Generated: Thread has already been deployed on hundreds of projects at multiple joint Excelion and Dataiku clients. Here are some areas of business value Thread users have obtained:
Creates Efficiencies – Fewer clicks and saved time by having the data definition at the time and location the information is needed. More and better insights during exploratory data analysis through better-documented columns. This all leads to faster solution building and data enrichment through documentation and improved data understanding.
Improves Governance – Clear measurement of governance through KPIs showing the percent of columns documented in any dataset. Creates a repository for data documentation, making definitions easier to keep up to date and easily auditable (exportable). Natively integrated with Dataiku permissions, which limit editing of data definitions to those with access.
Improves Trust – Creates easy transparency for data analysts, data engineers, data scientists, and business leaders to see what data was used in a project (data catalog), where it was used (upstream/downstream data lineage), and how that data is defined throughout the project (data dictionary). Builds a common language between the business and analysts.
Training & Onboarding Efficiencies – Helps new team members learn company-specific jargon and abbreviations faster. Streamlines onboarding and training by keeping all individuals in Dataiku instead of a myriad of spreadsheets and code documentation.
Saves Money and Labor – Saves analytics leaders $200k+ in purchasing, implementing, and supporting an enterprise-grade data catalog and data lineage tool for their Dataiku environment.
Value Brought by Dataiku: Thread is built on top of Dataiku! All the value Thread creates is an extension of, and possible because of, Dataiku. Dataiku’s flexible and extensible platform allows the community to easily contribute and share solutions across organizations and industries.
The ability to write custom plugins and integrate with the Python API provides the capability to achieve exceptional business value through custom integrations. The native security integration removes governance concerns about building application solutions on top of Dataiku and thus increases the speed of innovation. Value Type: Reduce risk, Save time, Increase trust.
Posted by rmoore. Last reply Wednesday by rmoore.
Tom Brown (41xRT) - Helping Nonprofits Leverage Insights From Their Data
Name: Tom Brown
Title: Non-Profit Data Science & Analytics Advocate
Country: United States
Organization: 41xRT
Description: 41xRT is “Where Arts and Technology Meet.” This is a name that I use while working with cultural non-profit organizations, whether on data technology, group facilitation, or my own computational artwork. In this context, I typically work on opportunities that allow data from patrons to speak more clearly to organizations, helping them take smarter actions. Almost all of the work that I have done with organizations is on a pro bono basis, helping them build new data-oriented capabilities.
Awards Categories: Organizational Transformation, Data Science for Good, AI Democratization & Inclusivity, Alan Tuning
Challenge: As a personal passion and professional mission, I am helping non-profit organizations around the world better understand their stakeholders through data and take actions based on these insights. Two challenges commonly arise when it comes to data science in the non-profit sector, particularly when trying to move beyond basic monitoring and evaluation toward the use of predictive models to drive more productive action: 1. Lack of proper infrastructure for data management & analysis. As a striking example, when I started contributing to data analysis for a community college a few years ago, the collection pipeline was... making tick marks on a sheet of paper and having a work-study student convert those tick marks to a spreadsheet!
This was then used to produce end-of-year summary reporting. That is at the extreme end, to be sure, but many grassroots organizations rely on Excel spreadsheets. Even the largest cultural non-profit organizations don’t typically have data science tools to support the building of data pipelines and predictive modeling. 2. Vision and skills challenge for data science & AI. The second challenge reflects a deeper issue: stakeholders’ awareness and ability to understand what data science is, what value it can bring, and what is possible through model operationalization to drive optimal action. In an already-tense market, it is difficult for nonprofits to hire for specialty data skills, especially as they have historically prioritized hiring for “people skills” (including word literacy, fundraising, and passion for mission) over skills for building models that drive optimal performance. Solution: Hence, my work revolves around pro bono consulting to build awareness and capabilities around how nonprofits think about data, data pipelines, and predictive analytics for their organizations - and Dataiku, as a company, community, and tool, has in so many ways helped move these endeavors forward. 1. Cleaning data for visualization at a community college. I started using the free version of Dataiku back in early 2017 (version 3) for a project with a community college. This project enabled me to develop my own awareness of data science and of tools that were accessible to non-programmers. During this project, I used the visual recipes to turn messy data into clean data for visualization. The first major project helped the library understand seasonal student flow at the reference desk. This understanding allowed staff to improve staffing levels at needed times to improve the student experience. 2.
Predicting attendance at a children's science museum. Then I brought Dataiku to Liberty Science Center, where I was spearheading Digital Projects & Analytics, and they benefited from a donated license as part of the Ikigai program. Our initial objective was to forecast future-year attendance. Thanks to Dataiku resources and the versatility of the platform, we grew our data science skills to create features and develop a model that confirmed some staff hunches: at a children's science museum, attendance is strongly correlated with weather! By simulating future years based on 20 years of past weather data, we found upper and lower bounds on attendance to inform the annual budgeting process. Our next project, which started just before COVID-19 hit, was to use the pipeline features to manage customer records in the fundraising and ticketing CRM system. 3. Equipping nonprofits with data science they can use. Subsequently, I helped various non-profit organizations design and implement data science projects using Dataiku, which provides an interface to operationalize the work and leverage parts of it for other data initiatives: from cleaning data to re-import into the CRM system of Synchronicity, a women-run theater in Georgia, to audience segmentation and retention projects for the Cascade Bicycle Club, and membership churn modeling for a children's museum in Minnesota. 4. Building communities to share data science knowledge & learnings. To expand my own knowledge throughout these endeavors, I sought to exchange with peers in the industry. That involves hosting events for the Dataiku New York User Groups, helping users solve their issues on the Dataiku Community as a Dataiku Neuron, and facilitating an ‘Analytic Coffee’ group as well as a fast.ai study group among cultural non-profit administrators.
Each of these activities helps build awareness and capabilities for myself and for emerging nonprofit leaders, who now better understand the value of data science to facilitate smarter action. Impact: Data science is still in its infancy at most non-profit organizations. With Dataiku and growing expertise to develop projects, enable team members, and communicate value to stakeholders, I was able to: 1. Deliver projects into organizations that wouldn’t have tackled them by themselves. All the projects listed above arose from challenges known to the nonprofits, which they did not have the awareness, the technology, or the skills to solve. Having a single platform to build and operationalize projects enabled us to build solutions that wouldn’t have been possible with the previously manual spreadsheet work. 2. Convert data into practical insights for the organizations. Dataiku’s visualization features proved invaluable for communicating insights from data analysis to the broader organizations. This was key to showing the value of data science initiatives and enabling further investment of staff time and resources. 3. Build repositories of data science projects to leverage for future endeavors. With the visual interface, workflows become understandable, even for non-data-literate team members. This enables everyone to build upon existing work from more technical people and leverage parts of it to conduct their own projects. 4. Onboard, enable, and upskill staff members and volunteers to extract more insights from their data. Thanks to the user-friendly interface, online resources, and programs such as Ikigai, which provided a full-featured license and training, I was not only able to bring data science into all these organizations but, more importantly, to provide a pathway for them to build their own vision of what data science can bring them.
Users are quickly able to learn new data skills, and some have started to produce their own insights and build more advanced projects to grow their organizations. Although we’re living in a world of data science, most non-profit organizations still have a long way to go to embed the value of data science into their organizations and reap the benefits of smarter stakeholder interactions. With Dataiku, I have been planting the seeds of data democratization, enabling more stakeholders to leverage it and drive organizational change to fulfill their missions and change the world for the betterment of all.
Posted by tgb417
Cascade Bicycle Club - Laying the Foundation For Volunteers Collaboration on Data Insights
Team members:
Christopher Shainin, Technology Manager
Tom Brown, Volunteer Data Scientist
Akshay Kotha, Volunteer Data Scientist
Sindhujaa Narasimhan, Volunteer Data Scientist
Anas Patankar, Volunteer Data Scientist
Sankash Shankar, Volunteer Data Scientist
Megan Thomas, Volunteer Data Scientist
Country: United States
Organization: Cascade Bicycle Club
Description: Cascade Bicycle Club, the nation’s largest statewide bicycling nonprofit, serves bike riders of all ages, races, genders, income levels, and abilities throughout the state of Washington. We teach the joys of bicycling, advocate for safe places to ride, and produce world-class rides and events. Our signature programs include the Seattle to Portland, Free Group Rides, the Pedaling Relief Project, the Advocacy Leadership Institute, the Bike Walk Roll Summit, Let's Go, and the Major Taylor Project.
Awards Categories: Organizational Transformation, Data Science for Good, AI Democratization & Inclusivity
Challenge: Cascade Bicycle Club, the nation’s largest statewide bicycle nonprofit, serves bike riders of all ages and abilities throughout the state of Washington.
With a mission to improve lives through bicycling, they teach the joys of bicycling, advocate for safe places to ride, and produce world-class rides and events. In the fall of 2020, Cascade Bicycle Club invited a team of pro bono data scientists to help them understand and re-engage riders during and after the COVID-19 pandemic. The intent was to use existing transactional data held in Salesforce to model rider segments, as well as past drivers of engagement and churn behavior, to better understand how they could engage with riders. At the time of making this offer, Cascade Bicycle Club had no infrastructure appropriate for data science work. Cascade was also wary of allowing Personally Identifiable Information (PII) onto infrastructure not under its direct control. How could Cascade Bicycle Club quickly create an enterprise-class data science infrastructure that would allow a small team of volunteer data scientists from across the United States to work together? The solution had to provide familiar data science tools like Python, Jupyter notebooks, R, and SQL, as well as access to Salesforce data for analysis, while eliminating the need to move customer data to analysts’ computers. Solution: As we started on this endeavor, we reached out to the Dataiku team about the Ikig.ai program. With a donated license, we were able to provide the platform to a small team of volunteer data scientists to collaborate on data analysis. In less than a month, we built out an AWS instance, connected data from Salesforce via a standard plugin, and made it available in Dataiku for collaboration - whereas the whole setup would usually take a nonprofit several months or more to accomplish. This was made possible by a team effort involving support from Dataiku, the willingness of Cascade to invest in some additional AWS infrastructure, and the willingness of team members to move to a new platform (and move their Jupyter notebooks!).
We were able to gain a quick impact by launching several projects: rider segmentation, in order to better understand riders' objectives and behaviors; rider retention and, conversely, ways to minimize churn; and CRM cleaning through de-duplication to lay the basis for further analysis. To work on these, we were able to invite an additional five pro bono data scientists into the process, who were quickly onboarded to Dataiku since we could reuse existing Python notebooks and Dataiku data flows. Impact: Cascade wouldn’t have been able to securely leverage data science tools and techniques without a central platform. Dataiku has provided a home for the organization's data science operations, around three main pillars: 1. Enable collaboration between team members & volunteers. Dataiku DSS provides a controlled environment that gives volunteers from around the United States an opportunity to collaborate on a common set of data and work with standard data science tools. Furthermore, thanks to its versatility, the platform allows each contributor to leverage the technologies and techniques they’re most familiar with, which has been pivotal in allowing volunteers to help as a side activity. This project provides a basic roadmap showing that nonprofit organizations can find creative ways to build infrastructure and leverage data science skills in order to participate in today’s data science revolution. 2. Facilitate reusability of past projects & workflows. The visual interface allows everyone to view the workflows of other participants and assess where they can contribute their time and expertise. It also makes it easy to onboard new volunteers, as we did with a second round of contributors, and enables them to gain a quick understanding of the projects conducted, as well as to reuse parts of them for their own endeavors (thanks to copy/pasting steps of the flow and duplicating projects!). 3.
Adopting a data-driven approach. As we were able to conduct our first data science projects in Dataiku in a short time, and already show an impact on the organization, we’re planting the seeds of a data science culture at Cascade Bicycle Club - and laying the foundation for further engagement by staff and future groups of volunteers. This project becomes a template that can be reproduced by others wishing to leverage data science at the scale of a nonprofit organization.
Posted by cshainin1. Last reply 09-01-2021 by angie-gallagher.
Premera Blue Cross - Building a Feature Store for Quicker and More Accurate Machine Learning Models
Team members:
Marlan Crosier, Senior Data Scientist
Norm Preston, Manager of Data Science team
Jing Xie, Data Scientist
Adam Welly, Data Scientist
Jim Davis, Statistician
Greg Smith, Healthcare Data Analyst
Gene Prather, Dataiku System Admin
Country: United States
Organization: Premera Blue Cross
Description: Premera Blue Cross is a not-for-profit, independent licensee of the BCBS Association. The company provides health, life, vision, dental, stop-loss, and disability benefits to 1.8 million people in the Pacific Northwest.
Awards Categories: AI Democratization & Inclusivity, Responsible AI, Value at Scale, Alan Tuning
Challenge: The feature store is an emergent concept in data science. It consists of a storehouse for features that can be used in a variety of machine learning models. It streamlines the process of building machine learning models and makes it much more efficient overall, thanks to hundreds to thousands of features being easily available. Before the development of our feature store, we had to build features for each new model from scratch. Building a feature from scratch can take several days or even weeks. Besides the significant additional time required to build new features, such one-off features were often not as well tested, so models were more likely to be impacted by errors.
The other big impact was that we were often unable to test as many features as we might have liked, so our models were not as accurate as they could have been. Solution: Overview. Our feature store currently includes 283 features. As a health insurance company, members are foundational entities in our business, and all features are currently linked to a member. Our features are built from data in a SQL-based data warehouse. All features are pre-calculated (vs. calculated on the fly), and all processing runs in-database via SQL. In other words, we used SQL to build our features; with the amount of data we are working with, using Python would not be practical. Given the pre-calculated approach, the resulting feature tables are fairly large since, for many of our features, we store daily values for our entire member base. Most features are thus updated daily (that is, new values are calculated daily). Day-level feature values are sufficient for the vast majority of our use cases. 1. Structure. Our feature store includes a core table and then several tables for specific types of features. The data in each of these other tables is of a particular type or source and is available with particular timing. The benefits of this approach are multiple: easier development (e.g., each table has its own Dataiku project); better scaling over time, as we don't have to worry about limits on the number of columns; and options for data scientists regarding what data to include in their models. The data types for each feature store table have been carefully selected to minimize storage requirements and, more importantly, to minimize the memory footprint when data is read into Python-based machine learning processes. 2. Development & Deployment. We currently use a team approach for developing new features, and Dataiku’s collaboration features have been very helpful here.
Each developer is provided with a current copy of the relevant feature store project, and then uses either version control tracking to identify changes and additions, or git-based project merging to facilitate integrating the changes back into the main project. We deploy updates to our feature store using Dataiku’s automation instances: development and testing take place on the Design instance, then updates are deployed to a Test instance, and finally to a Production one. We have incorporated a variety of design-time and run-time checks (via Dataiku’s Metrics and Checks) to assure data accuracy and reliability. Additionally, we developed Python recipes that largely automate the feature table update process - for instance: copy data from the existing table, drop the existing table, create the new table, copy the existing data back in, and then add the new feature data. 3. Metadata & Discoverability. Each of our feature projects includes a metadata component. This metadata is entered via a Dataiku editable dataset and includes attributes like name, data type, definition, a standard indicator to use for missing values, and feature effective date. Since we were building the store for a small team, and we wanted to try all store features in our models, we initially focused on building features rather than on discoverability. We are now building a fairly simple webapp in Dataiku to provide data discoverability and exploration, in preparation for rolling out the feature store to more teams in our company. This discoverability tool utilizes the feature store metadata described above. 4. Usage. Data scientists can incorporate feature store features into their projects using the Dataiku macro feature. The macro enables selection of the subject areas to include and specification of how to join feature data to project data. The macro handles missing-value logic and maintenance of data types to minimize memory demands in Python-based machine learning processes.
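Conceptually, what such a retrieval macro does is left-join feature store rows onto project data and fill gaps with each feature's standard missing-value indicator. A minimal sketch of that idea in plain Python (all names and values are invented for illustration; this is not Premera's actual macro):

```python
# Sketch: join feature store features onto project rows, filling members
# absent from the store with a standard missing-value indicator.
# member IDs, feature names, and indicator values are all hypothetical.

feature_store = {  # member_id -> pre-calculated feature values
    101: {"tenure_days": 412, "claims_90d": 3},
    102: {"tenure_days": 88, "claims_90d": 0},
}
missing_indicator = {"tenure_days": -1, "claims_90d": -1}

def join_features(project_rows):
    """Left-join store features onto project rows by member_id."""
    out = []
    for row in project_rows:
        feats = feature_store.get(row["member_id"], missing_indicator)
        out.append({**row, **feats})
    return out

project = [{"member_id": 101, "label": 1}, {"member_id": 999, "label": 0}]
print(join_features(project))
# [{'member_id': 101, 'label': 1, 'tenure_days': 412, 'claims_90d': 3},
#  {'member_id': 999, 'label': 0, 'tenure_days': -1, 'claims_90d': -1}]
```

In practice this join runs over SQL tables rather than in-memory dicts, and the standard indicator per feature comes from the metadata component described above.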
Impact: The overriding benefit of our feature store is, of course, how much more quickly we can develop machine learning models. Developing an initial model often takes just a few hours, whereas without our feature store that same model might have taken days, weeks, or even months. In some cases, our final ML models use only features from our feature store, although more commonly we supplement these with features designed for the particular problem at hand. Additional benefits include: models are less likely to be impacted by errors or leakage, as features are more extensively tested; and better accuracy in our models, as we are able to test many more features than we could without a feature store. We've also experienced a bit of a virtuous-cycle effect with our feature store. As the store expands and its value increases, it's easier to justify investing the resources to develop new features, test them thoroughly, assure that leakage is not occurring, etc. This in turn further increases the value of the store, which makes it even easier to invest in further enhancements. And so on! At the company level, our ability to develop more accurate models more quickly also enables more areas of the organization to benefit from our data science investments.
Posted by Marlan
AstraZeneca - Toward Self-service AI and Analytics for World-changing Innovation
AstraZeneca has never attempted to solve the full landscape of data pipelines, machine learning, and data visualisation within a single tool, due to the inherent complexity of building and maintaining the broad spectrum of capabilities that would be required. As a result, the lifecycle of a project - from data wrangling through cleaning, manipulation, data science, visualisation, and deployment - could see a user working across multiple tools and platforms for each stage of their pipeline.
Posted by ak12
RiseHill Data Analysis - Using AI to Combat the Rise in Corporate Fraud in Malaysia
Name: Siti Sulaiha Binti Subiono
Title: Data Scientist
Country: Malaysia
Organization: RiseHill Data Analysis Sdn. Bhd.
Description: RiseHill Data Analysis Sdn. Bhd. (RDA) is a high-tech development and service company registered in Kuala Lumpur, Malaysia, which specializes in petroleum technique consulting, services, and data analytics. The company is committed to comprehensive technical research, development, and consultation based on the concept of ‘the integration of multiple sources of data.’ Currently, the company holds several software copyrights and technical patents, along with tailored workflows and solutions for particularly challenging problems. The company aims to be a world-class integrated service provider in data analytics, acknowledged for its state-of-the-art technology.
Awards Categories: Data Science for Good, AI Democratization & Inclusivity, Responsible AI, Value at Scale
Challenge: To detect fraudulent activity, most organizations used to rely on a rule-based approach, which requires an algorithm to execute several predefined scenarios - and the workflow must be manually updated whenever new scenarios or trends come in. As fraud tactics have become more advanced, this approach is now outdated. The vast number and size of the datasets at hand have also made fraud detection more challenging. Based on the Crime Statistics Malaysia 2020 report by the Department of Statistics Malaysia, corporate fraud involving bribery, corruption, and asset misappropriation recorded an increase from 2018 to 2019. The COVID-19 pandemic also contributed to the rising trend in fraud cases, as it accelerated the need for effective payment channels between consumers and companies - and faster payments can potentially mean faster crime. In addition, Malaysian organizations have been quite slow to adopt AI technologies for combating fraud, due to a number of factors.
First, the increasing amount of data of questionable quality, which makes it harder to leverage. Second, corporations still do not trust technology as a tool for detecting fraud effectively and tend to keep to conventional investigation methods, which are time-consuming. The last challenge lies in the shortage of local talent, which hinders progress in detecting fraudulent activities. As a data scientist, I also face challenges in building the whole workflow, which is a very lengthy process: from joining data from various sources, doing exploration, building machine learning models in Java or Python, and fine-tuning them or optimizing compute time, through to deployment. We found that Dataiku enabled us to fill these gaps, so the RiseHill Data Analysis team stands together in combating the rise in corporate fraud in Malaysia using AI and data analytics. We want many companies in Malaysia to open their eyes and use advanced technology and tools to combat this issue before it worsens. Solution: RiseHill Data Analysis Sdn. Bhd. leverages Dataiku to develop machine learning models as a more effective method for detecting fraudulent activities, as well as a more secure and efficient approach - moving past the old-school "rule-based" approach. We are now able to centralize data exploration, wrangling, and the creation of machine learning models in one platform - hence Dataiku helps us save time in the development and deployment phases of the models. Our favorite feature is data partitioning, which enables us to refresh the data on a daily basis while Dataiku re-builds only the partition of the workflow that contains the new data. This is especially helpful for re-training models efficiently. Machine learning relies on pattern recognition and classification to distinguish legitimate transactions from fraudulent ones occurring through online payment channels.
The types of classification we use are based on user identity, order history, location of the payment, time of transactions, and amount spent: In identity classification, we use the age of the customer, the number of characters in their email address, the fraud rate of their IP address, and the number of devices with which they access the organization's site. In order history, we use when orders were placed (the time period), the amount spent in each transaction, and data on how many orders were attempted and failed. In location classification, fraudulent activities can be detected through a mismatch between the billing and shipping addresses, or between the user's IP location and the shipping address. In method-of-payment classification, the credit card details, the name of the customer, and the shipping information must reference the same country, and the credit card used by the customer must not be issued by a bank with a reputation for fraudulent transactions. Impact: Machine learning is particularly helpful to organizations that implement these models in the long run, as they are able to weed out non-legitimate transactions and streamline the acquisition of new, reliable customers. It also enables risk mitigation, as these techniques detect more advanced fraud than the traditional rule-based approach. Our approach has the potential to be generalized across Malaysian organizations to fight fraud. Dataiku has benefited both our organization and our customers through fraud detection - and has enabled us to save time on the development, execution, and deployment of modeling. No more hard code!
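Signals like those listed above (identity, order history, location, method of payment) can be turned into numeric model features. A minimal sketch of such feature extraction, where every field name and example value is invented for illustration rather than taken from RiseHill's actual pipeline:

```python
# Hedged sketch: derive fraud-model features from one transaction record.
# All field names, countries, and counts below are hypothetical.

def fraud_features(txn):
    """Map a raw transaction dict to numeric features for a classifier."""
    return {
        # Identity signal: length of the email address.
        "email_len": len(txn["email"]),
        # Order-history signal: attempted-and-failed orders.
        "failed_orders": txn["failed_attempts"],
        # Location signal: billing vs. shipping country mismatch.
        "addr_mismatch": int(txn["billing_country"] != txn["shipping_country"]),
        # Location signal: IP country vs. shipping country mismatch.
        "ip_mismatch": int(txn["ip_country"] != txn["shipping_country"]),
    }

txn = {
    "email": "a9x2q@example.com", "failed_attempts": 4,
    "billing_country": "MY", "shipping_country": "SG", "ip_country": "MY",
}
print(fraud_features(txn))
# {'email_len': 17, 'failed_orders': 4, 'addr_mismatch': 1, 'ip_mismatch': 1}
```

A classifier trained on such features can then score transactions instead of relying on hand-maintained rules, which is the shift away from the rule-based approach described above.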
Posted by sulaihasubi

ALMA Observatory - Empowering a Data-driven Organization to Improve Astronomical Operations
Team members: Ignacio Toledo, Data Analyst, Data Science Initiative Lead; Tomás Staig, Software Development Lead, Data Science Initiative Lead; Rosita Hormann, Software Engineer; Jorge García, Science Archive Content Manager; Jose Luís Ortiz, Technical Lead - Digital Systems; Mark Gallilee, Technical Lead - Mechanics; Sergio Pavez, Software Engineer; Takeshi Okuda, Senior Instrument Engineer; Gastón Velez, Systems Administrator; Maxs Simmonds, Technical Lead and Deputy - Archive and Pipeline Operations; Jorge Ibsen, Head of the Department of Computing
Country: Chile
Organization: ALMA Observatory
Description: The Atacama Large Millimeter/submillimeter Array (ALMA) is an international partnership of the European Southern Observatory (ESO), the U.S. National Science Foundation (NSF) and the National Institutes of Natural Sciences (NINS) of Japan, together with NRC (Canada), NSC and ASIAA (Taiwan), and KASI (Republic of Korea), in cooperation with the Republic of Chile. ALMA - the largest astronomical project in existence - is a single telescope of revolutionary design, composed of 66 high-precision antennas located on the Chajnantor plateau at 5,000 meters altitude in northern Chile.
Awards Categories: Organizational Transformation; Data Science for Good; AI Democratization & Inclusivity

Challenge: As recently as 15 years ago, most earth-based observatories were small facilities producing data for astronomical research, bearing more resemblance to laboratories than to industries. However, since the beginning of the 2000s, more complex and ambitious observatories have been built, with multi-million-dollar budgets. A major issue emerged: these could not be operated with a staff of 5 to 10 people, with one or two astronomers coming onsite to do their own experiments.
As institutions, today's big astronomical observatories have become gigantic "data industries", producing terabytes (and soon petabytes) of data every year to power scientific research. ALMA requires a staff of 300+ people and, to provide 4,300 hours of useful scientific data from our skies in a given year, the same amount of time must be spent on maintenance and updating activities. That includes hardware components such as the "radio interferometer" (a virtual telescope made of 66 antennas and two giant computers to join their signals) and the software systems used to collect and process the data, but also monitoring power supplies and weather conditions to ensure that observations are performed with a sufficient level of quality. In short, the volume of data from observations increased, along with the variables to consider to operate an observatory correctly. Yet, we didn't have the proper tools and processes to make sense of this new data. While we asked ourselves questions, we did not have the ability to provide quick and efficient answers. For instance, we once received an avalanche of problems reported from a particular hardware component, which became of critical importance as it impacted the quality of the observations performed. We began analyzing the number of successful hours observed that month with this particular component - it turned out to be the most productive month ever for that component! This seemed contradictory, but the component registered more problems simply because it was used much more. Finding this out required only a simple data analysis, but we didn't realize it sooner because we had neither the tools nor the infrastructure to query and parse the data, clean it, and enrich it with other data sources. This lack of efficient analytical tools for system diagnostics pushed us to look for them outside the organization. This is where Dataiku came in, through the Ikigai program, which gives free licenses to nonprofit organizations.
Solution: With Dataiku, we are building an infrastructure that allows the observatory's staff to take their analytical work to the next level through:
1. Giving everyone access to all the relevant data sources
Our databases were previously only accessible by astronomers to process data for scientific research. As the central data science platform, Dataiku enables our whole organization to participate in the analytical process and find answers for their day-to-day work. For instance, engineers and data analysts can now access the CMMS*, Jira tickets, and log files from a data warehouse populated using the ETL and data preparation capabilities provided by Dataiku, and they can enrich their analysis by joining and correlating data that was previously difficult to access and analyse.
2. Enabling them to upskill by integrating with a big technology stack
Dataiku provides a visual interface that enables all technical levels to collaborate, while integrating with most current technologies to facilitate upskilling - for instance, learning a bit of SQL to query the data in various ways. The resources provided in the Dataiku Academy, as well as the Community platform where anyone can get quick answers from other users and experts (thanks to fellow Neurons!), are highly valuable for everyone to gain new knowledge.
3. Providing ways to leverage more advanced techniques, including machine learning
Dataiku also provides ways for even the less technical staff to foray into machine learning, thanks to its user-friendly AutoML features and the visual interface showing (and explaining) the most relevant performance indicators of different models - also conveniently summarized in the models competition page!
4. Easily presenting insights with user-friendly data visualization capabilities
Anyone on our staff is able to perform exploratory data analysis, thanks to visual features and a drag-and-drop charting interface - and those willing to code can dig deeper at their own pace.
Presenting final results is also highly accessible, with dashboards composed of tiles that centralize insights from other parts of the project in just a few clicks.
5. Giving guidance and resources to onboard and enable everyone in the organization
Lastly, Dataiku has been key to easily onboarding new users and making them realize the value of data insights. We've developed a Working Group with members of the Software, Engineering, and Science teams, with the mission to train new users and propagate best practices. We're leveraging content from the Dataiku Academy, and are highly involved with the Community platform where any user can go to ask questions and share knowledge. We're also currently leading a hands-on challenge in which volunteer users give their time and expertise to bring a valuable contribution to ALMA by seeking to automate quality assurance assessment. Ever more people, internally and externally, are collaborating in Dataiku to advance the 'search for our cosmic origins'!
*A computerized maintenance management system (CMMS), also known as a computerized maintenance management information system (CMMIS), is a software package that maintains a computer database of information about an organization's maintenance operations.

Impact: Today, the ALMA Observatory is one of the first earth-based observatories, if not the first, to make advancements in using data science, machine learning, and automation to improve its operations. By bringing people together on a single platform, Dataiku helped grow general awareness of data analytics and of making decisions based on the information produced by the data. Now the value of analytical work is broadly recognized across the organization, triggering fruitful cross-functional collaborations between various profiles - astronomers, but also analysts, archive managers, software engineers, system engineers, etc.
This leads to many wins across the organization, in which Dataiku replaces old processes to improve efficiency by saving time and resources in building and maintaining data projects, and by optimizing through automation, machine learning, and easy monitoring, among other features. For instance, the data management team needs to keep track of observation times to comply with those requested, and to create indicators enabling them to identify possible problems which might hinder the delivery of the observation data to the scientific community. It formerly took years to create such a tracking tool due to the efforts and resources required; now it is only a matter of months, as the approach to solving the problem moved from a software development perspective to a data science perspective. Dataiku supports every step, from accessing the data to providing the tools to present the results to the consumer, and the focus of the analysts is no longer debugging code but understanding the data and obtaining the information needed out of it. Ultimately, the biggest value brought by Dataiku relates to powering scientific discoveries: not only are we producing scientific data, but we are starting to look into it to make our operations more efficient, so as to increase the number of hours in the sky by lowering the hours needed to keep everything working as expected, and to make the best possible use of those hours by improving the quality of the observations.
Posted by Ignacio_Toledo

Malakoff Humanis - Leveraging AI to Democratize Insights From Customer Feedback
Team members: Nikola Lackovic, Data Scientist (NLP & voice technology specialist); Gauthier Lalande; Layal Saad-Koubeissi; Zhijie Zhou
Country: France
Organization: Malakoff Humanis
Description: Malakoff Humanis is one of France's leading social protection groups, covering all the insurance needs of people in supplementary pensions, health, welfare, and savings.
Awards Categories: Organizational Transformation; AI Democratization & Inclusivity; Value at Scale; Alan Tuning

Challenge: Speech Analytics aims to analyze the categories of calls within the CRM framework, so as to enable different internal stakeholders to leverage the oral feedback received to improve our product and customer experience. We therefore needed a solution able to receive, treat, analyze, classify, and output the data to a visualisation tool - from an external server to a PowerBI interface. Dataiku enabled us to overcome the main challenges encountered:
It helped us integrate the fully scaled solution with AWS S3 containers to store the data. The entire pipeline was then set up without using any additional components, and everything was built using the graphical user interface, apart from Python recipes which were needed for various reasons.
The dynamic and adaptive type handling was a feature which eased the implementation process all along the way.
Data preparation and several painful jobs were done using the built-in recipes, which allowed us to bypass the weight of coding everything in Python 3.
The graph-based solution is very nice for grasping the entire workflow at a glance, also easing the process of metacognition over the entire pipeline.
Dataiku then exposed the data back to S3, to which PowerBI was linked in order to display it.
Solution: Speech Analytics is a horizontal product available at everyone's fingertips - from a technician looking to solve product issues thanks to client feedback, to a high-level manager visualizing the interactivity of the client with multiple teams within the organization. Input data consists of different types of client data: conversational transcripts of calls, metadata from IVS, and the CRM knowledge base. Call transcripts are established thanks to a Speech-to-Text external partner, along with several description metrics to facilitate data comprehension - hence integrating multi-modal data presented an interesting challenge for the project. A state-of-the-art fine-tuned transformer for the French language, called CamemBERT, was implemented for Natural Language Processing. We also leveraged a tonal (positive, neutral, negative) model built by Dataiku Data Scientists in order to predict the sentiment of a conversation. All along the process, every step was built within a design node in order to make a prototype that was then tested within the pipeline. Once the use case was working on the design node, we built the scenario to run every hour during call center hours, and migrated to the automation node. The automation node is up 24/7; it is a sanctuary to which we migrate the data workflow from the design node. The recipes used in the flow are: SQL recipes, Python recipes, Data Preparation, and Machine Learning recipes. The entire flow built within the Dataiku platform now runs every day from Monday to Friday during working hours, 9 AM to 7 PM (GMT+02). As a latest development, a retro-feedback loop based on call center helpers has been implemented to feed the transformer - this will be pushed to production in the next internal release. This was also made possible by the integration of EKS cluster technology within the DSS framework with one of our Data Engineers, which will enable us to scale to maximum monitoring of the data (85% of all calls).
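The scheduling constraint described above - an hourly scenario that only does work Monday to Friday, 9 AM to 7 PM in GMT+02 - can be sketched as a small gate function. This is a stand-alone illustration, not Malakoff Humanis's actual scenario code:

```python
# Sketch of the scheduling gate described above: the hourly scenario only
# processes new calls Monday-Friday, 9 AM to 7 PM, GMT+02 (call center hours).
# The time zone and cut-offs follow the description in the text.
from datetime import datetime, timezone, timedelta

CALL_CENTER_TZ = timezone(timedelta(hours=2))  # GMT+02

def within_call_center_hours(now_utc: datetime) -> bool:
    """Return True if a scenario run at this instant should process new calls."""
    local = now_utc.astimezone(CALL_CENTER_TZ)
    is_weekday = local.weekday() < 5          # Monday=0 ... Friday=4
    in_window = 9 <= local.hour < 19          # 9 AM up to (not incl.) 7 PM
    return is_weekday and in_window

# A Monday 10 AM GMT+02 run processes data; a Sunday run does not.
print(within_call_center_hours(datetime(2022, 6, 6, 8, 0, tzinfo=timezone.utc)))   # True
print(within_call_center_hours(datetime(2022, 6, 5, 12, 0, tzinfo=timezone.utc)))  # False
```

In Dataiku itself this windowing is typically expressed through scenario triggers; the function above only makes the described run window explicit.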
Impact: The benefits are multi-faceted:
Cost savings - The solution will enable us to automate 45 seconds per call over 12 million calls handled every year, which leads to tremendous cost savings in terms of human resources dedicated to answering those calls.
Improved customer satisfaction - Customers are getting faster answers to their questions, and more valuable interactions as they are directed to our most relevant team members for their requests - who will provide them with support and guidance, beyond the usual transactional interaction.
Data science democratization - By making conversational data available to the broader organization, Speech Analytics empowers people to gain insights from customer feedback. To display the results, we leverage a visual stack in Microsoft PowerBI, which is a very easy and affordable way to enhance information-gathering capabilities.
The next development is to trigger actions within other components of the informational ecosystem. For instance, we're looking into developing in Dataiku a suggestion trigger for tele-counsellor allocation, so that for every traffic group within an IVS cluster, we will be able to predict the t+1 call volume in order to adapt tele-counsellor presence in call centers hour by hour.

Posted by onevirtual

Ericsson – Human-centered Machine Learning for Dimensioning Resources in Telecoms
Name: Marcial Gutierrez
Title: Senior Specialist in AI, ML and Data Analytics
Country: Sweden
Organization: Ericsson
Description: Ericsson provides high-performing solutions to enable its customers to capture the full value of connectivity.
The Company supplies communication infrastructure, services, and software to the telecom industry and other sectors.
Awards Categories: Organizational Transformation; AI Democratization & Inclusivity; Responsible AI; Value at Scale; Alan Tuning

Challenge: The IP Multimedia Subsystem, or IMS, is a core network technology that provides Communication Services to people on wireless and wireline networks. These services range from Voice and Video (e.g. over 4G and 5G) to Emergency Calling and Enriched Messaging. Typically, IMS is offered in the form of network functions as software, and it is deployed on a specific operator's private cloud infrastructure. Before deploying/instantiating IMS network functions to provide the aforementioned services, a dimensioning process is conducted by the supplier (e.g. Ericsson) in order to estimate, based on the network's user traffic model, how many resources (in the form of CPU load or memory) the IMS network functions will require from the target cloud environment so as to serve those subscribers accordingly. In other words, dimensioning is the process of predicting how much CPU load, memory, storage, and networking would be required. Dimensioning IMS - or any Ericsson products and services - with the highest accuracy is critical, so that a proper offer is submitted to potential Ericsson customers. This is also key to avoiding contractual penalties which, if incurred, can impact Ericsson and, above all, customer trust. Given the high stakes and complexity of the dimensioning task, the process needs to be conducted with a human-centered approach supported by interfaces and a trustworthy calculation backend. Given that our IMS network elements generate statistical traffic data (in the form of Performance Management counters), data-driven ways to perform dimensioning were identified as the next evolutionary step to address these important needs.
Before Dataiku DSS, the overall challenge we faced was to have a Machine Learning-based backend which could take this statistical traffic data, treat it, and deal with everything we needed in terms of model training and inference for dimensioning, while having the ability to interact with it as a service (black box) via REST API calls only. This approach would enable us to build a Web Application in front of this Machine Learning backend, in order to address our user-centered needs while achieving high accuracy, as depicted below. Our application has a working title of Data-Driven CANDI (CANDI = Capacity and Dimensioning).

Solution: Dataiku DSS allowed the successful realization of the whole concept we were after. More specifically, the major pain points Dataiku addresses for us are:
Industrializing data preparation, model training, evaluation, deployment, and life-cycle management in one single place, completely controlled via REST API calls.
Training different estimators (AutoML) with datasets from different live networks and selecting the best of them for deployment and inference.
Creating specific estimators, in the form of custom plugins, which are resilient to the characteristics of telecommunications data, and which can be added to the industrialization described above.
Ensuring that explainability of the deployed models is available to the users of the Web Application.
We have a systematic flow that takes the data from a MongoDB database after the user uploads it via our WebApp. The following depicts a figure with the flow that every project of Data-Driven CANDI typically has. Via the WebApp, the user is able to request that a new ML model be trained for their dimensioning needs. This is translated into a scenario execution that takes care of running the flow. DSS connects to the data from the MongoDB instance, prepares it, and then runs an AutoML workflow.
The best model is selected and finally deployed, so that the user gets the necessary predictions for dimensioning via the exposed REST API endpoint for inference. Feedback on what is happening under the hood is continuously provided to the user via the WebApp. The following depicts all the steps created for the scenario execution, and specifically the AutoML step. The resulting dimensioning estimators have high accuracy. This is based on the data science work we have done around this, and the custom plugin we created for the modeling phase, which deals with the very particular aspects of our data to ensure generalization of our models. The following picture shows the specific custom plugin used.

Impact: Data-Driven CANDI as a whole is meant to provide considerable savings (~90%) in R&D costs compared to the current dimensioning tool. Moreover, it will provide us with the possibility to understand how our IMS software performs over many different networks with different cloud environment characteristics, beyond the ones we use internally in our labs for IMS verification. This all translates into achieving accuracy levels we never had before, and thus increasing our forecasting capabilities and our customers' trust along the way. Explainability is important to achieve in the telecommunications industry as well. Providing explainability of all our dimensioning models to the Data-Driven CANDI user is very important for trust in this system, and ultimately for the customer to also trust the predictions and forecasting capabilities obtained from it. The way Data-Driven CANDI has been architected is innovative in the sense that all the complexity of dimensioning is hidden from the user. This means that the dimensioning user is still able to perform their task trusting that the system will provide an accurate result.
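The REST-only interaction described above - a client (such as the CANDI WebApp) posting a traffic model to a deployed inference endpoint - could be sketched as follows. The host, service and endpoint names, and the traffic-model feature names are all hypothetical; the `{"features": {...}}` payload shape follows the convention Dataiku's API node uses for prediction queries, but this is a sketch, not Ericsson's actual client code:

```python
# Sketch of how a client could build a dimensioning-prediction request
# for a model deployed behind a REST endpoint (hypothetical names).
import json
from urllib import request

API_NODE = "https://dss-api.example.com/public/api/v1"   # hypothetical host
SERVICE, ENDPOINT = "candi-dimensioning", "cpu-load"     # hypothetical ids

def build_prediction_request(traffic_model: dict) -> request.Request:
    """Build (but do not send) the POST request for one dimensioning query."""
    url = f"{API_NODE}/{SERVICE}/{ENDPOINT}/predict"
    body = json.dumps({"features": traffic_model}).encode()
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"})

# Hypothetical traffic-model features for one network.
req = build_prediction_request({"subscribers": 2_000_000,
                                "calls_per_hour": 150_000,
                                "registration_rate": 0.8})
print(req.full_url)
```

Sending `req` with `urllib.request.urlopen` would return the predicted CPU load; the sketch stops at request construction so it stays runnable offline.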
Moreover, this architecture and approach allow the possibility of exposing the WebApp directly to our customers, so that they are enabled to own and plan their network CAPEX in terms of IMS and cloud resources. Our approach has the potential to be generalized beyond IMS products (i.e. to other products in Ericsson such as 5G Core, 5G New Radio, etc.). You may find out more about our modeling strategy in the Ericsson's next-gen AI-driven network dimensioning solution blog article.

Posted by gmarcial

Pr. Zervoudakis (New York University) - Shortening Time to Insights For Students
Name: Stavros Zervoudakis
Title: Assistant Professor (Adjunct)
Country: United States
Organization: New York University
Description: Founded in 1831, NYU is one of the world's foremost research universities and is a member of the selective Association of American Universities. The first Global Network University, NYU has degree-granting university campuses in New York and Abu Dhabi, and has announced a third in Shanghai; has a dozen other global academic sites, including London, Paris, Florence, Tel Aviv, Buenos Aires, and Accra; and sends more students to study abroad than any other U.S. college or university. Through its numerous schools and colleges, NYU conducts research and provides education in the arts and sciences, law, medicine, business, dentistry, education, nursing, the cinematic and performing arts, music and studio arts, public administration, social work, and continuing and professional studies, among other areas.
Awards Categories: Excellence in Teaching

Challenge: I designed and have been teaching a 2-semester course on Applied Data Analytics at New York University, which has been running for the last few years. It starts with basic topics in statistics and simple visuals, and ends the 2nd semester with Deep Learning and AI frameworks. We start with Excel and Excel Data Analysis, and we move to Python and Python Data Science packages.
The challenge has always been to instill in students the necessary curiosity so they can master the basics and learn how to approach data science problem-solving in a way that they own the answers. Typically, we go through learning what the concepts mean while practicing with tools and code.

Solution: Dataiku comes into this learning journey after students have learned how to solve data science problems manually, the "harder" way. By design of the course, Dataiku DSS is employed once students know how to answer these challenges. They are expected to have mastered the theory, and they know how to practice solving such problems in the lab. Having a plethora of related capabilities, Dataiku creates a "wow" effect. It shows them how they can go through the pipeline faster and more thoroughly. A quote from one of my students this past Spring 2021 semester was: "So now, by using Dataiku, I can complete the course project in a matter of a few days instead of a few weeks?" My answer was a simple "yes", knowing from their homework submissions that they knew how to complete the project without Dataiku.

Impact: The course uses Python, Python statistical packages and Data Science/Machine Learning/Deep Learning packages, plus Excel and Excel Data Analysis add-ons, as its core tools for practicing the concepts. At a high level, the concepts we cover start with the theory of data and analytics, then we move to the basic use of spreadsheets and visualizations. At the same time, we touch upon basic Python programming and move quickly to related packages. Next we do statistics and probability theory, followed by more practice using both tool categories while we continue with sampling, estimation, and statistical inference. After these foundational ideas are mastered, and we have covered descriptive analytics thoroughly, we move to predictive and prescriptive analytics concepts while we introduce machine learning.
A good amount of time is spent learning how a number of algorithms work and the ins and outs of the related math, while practicing each of them with an appropriate dataset (a sort of mini-project in the form of a team homework). In the 2nd half of the 2nd semester, we review frameworks, cloud computing, and big data, and we move to Deep Learning, Deep Learning architectures and related packages, and close the course by touching upon machine learning operations. Most of these concepts can be seen at play in the Dataiku user interface. When my students learn to use Dataiku, it becomes the 'aha' moment, where they see that once they know what data science means, they can use tools to help them execute a project faster and more thoroughly.

Posted by Stavros

RBC's RaptOR - Dynamic Audit Planning Through Machine Learning-Based Risk Assessment
Team Members: Masood Ali (Senior Director, Data Strategy & Governance); Vincent Huang (Director, Data Science); Mark Subryan (Director, Data Engineering); YuShing Law (Director, Analytics Ecosystem); Kanika Vij (Sr. Director, Data Science and Automation)
Country: Canada
Organization: Royal Bank of Canada
Description: Royal Bank of Canada (RY on TSX and NYSE) and its subsidiaries operate under the master brand name RBC. We are one of Canada's biggest banks, and among the largest in the world based on market capitalization. We are one of North America's leading diversified financial services companies, and provide personal and commercial banking, wealth management, insurance, investor services and capital markets products and services on a global basis.
Awards Categories: Organizational Transformation; Value at Scale

Challenge: Background: Internal Audit's annual audit planning exercise comprises two key components: 1) risk assessment and 2) compilation of the audit plan. The risk assessment process results in the risk rating of auditable entities (organizational units).
Internal Audit conducts risk assessments on over 400 auditable entities annually. The outcome of the risk assessment forms the basis of the audit plan.
Business Problem: The annual audit planning process is a subjective and manually intensive process comprising several non-standardized offline processes to gather data points for risk assessment from different sources and compile the audit plan. It is therefore a time-intensive process spanning many months.
Key Challenge: Our objective was to build a continuous risk assessment tool that automates the monitoring of risk status and trends, provides a comprehensive and dynamic view of risk for an audit entity at any given time, and automates the compilation of the audit plan. This challenge required a platform which provided the ability to perform extensive ETL-related functions - such as building a system to ingest and process data from various systems and sources across the Enterprise - coupled with the ability to build and productionize machine learning models, all in one place. The scale of our project is enterprise-wide and the impact is department-wide, i.e. Internal Audit. This is where Dataiku provided the ability to perform extensive ETL and machine learning all in one platform.
Where did Dataiku fit into the picture?

Solution: To enable data-driven risk assessment in an automated way across the entire department, the following are the key areas in which Dataiku has helped:
1. Performing ETL and integrating Machine Learning models in one platform
i. Data Acquisition - Setting up connections to source systems across the Enterprise. Currently, there are 96 connections to databases throughout the enterprise, with only 2 platforms partially onboarded. We anticipate the final number to be approximately 400 database connections when all platforms have been onboarded.
ii. Data Pre-processing - All transformations to each dataset are captured within their own project.
The visualization of the pipelines reduces the need for manual documentation of workflows and execution instructions, and the risk of key-people dependencies. When data is refreshed or new data arrives, pipelines can easily be executed to re-perform the calculations. We currently have over 700 intermediate datasets between the raw inputs and the final staging dataset, encompassing a wide range of transformations and calculations. Manual maintenance of these workflows would have been challenging.
iii. Automated Productionized Workflows - Dataiku enables IA to put workflows into production with a fraction of the staff and effort required by custom-coded or bespoke applications. At the moment, we have 21 scenarios set up, 6 of which execute on a weekly or daily basis. The team receives email notifications of scenario executions and will promptly address failed runs. This fits our agile approach because we can respond to user enhancement requests faster. The entire process is also de-risked, as we can roll back changes easily.
iv. Computations - Raptor in its current form consumes approximately 7.58 million rows of data and performs over 174 million computations. Setting up a large-scale project like this would normally have been impossible without a complete, dedicated development team. Dataiku provided the piping and basic infrastructure, which makes it easy for small teams such as ours to put together large projects.
v. Machine Learning Models - Through Dataiku, we were able to easily set up a pipeline to consume data from an API, engineer features, prototype two different models with Dataiku's Lab, and deploy them with minimal friction. The model outputs were integrated with additional Enterprise data to derive further insights. Dataiku was instrumental here, as it allowed us to monitor model performance and schedule model retraining and executions.
vi.
Workflow Management - Without Dataiku, there would be a lot of spaghetti code to deal with on people's laptops, given the number of individuals involved in the project. Dataiku facilitates the organization and visualization of the workflows, which makes for an easier review and reduces key-people dependency.
vii. Scheduling workflows and adding dependencies - The risk assessments are to be updated on a quarterly basis. This entails a number of upstream and downstream dependencies. Dataiku makes it easier to schedule workflows and take the dependencies into account.
viii. Dataiku visual recipes - Dataiku's visual recipes helped in joining and pre-processing datasets in an efficient manner. This saved time that would have been spent writing long and cumbersome Spark/SQL code.
ix. Freedom to focus on the problem - Dataiku has enabled IA to reduce the coding footprint to one-tenth of what it would be with a custom-coded application. It gives Data Scientists, Engineers, and Analysts the freedom to focus on the problem they are trying to solve rather than having to wade through the overhead of handling miscellaneous IT issues, e.g. code environment issues where code works on one person's desktop but not another's. Also, the data scientist doesn't need a deep understanding of how the system is being solutioned, which allows them to focus on solving their task.
2. Data Governance
Due to the project scope, data is being sourced and processed from various source systems and teams across the Enterprise. This brings with it key concerns around Data Governance that Dataiku has helped address, such as:
i. Data Lineage - Automating data lineage allows us to accurately capture what is actually happening to the data, not what employees believe is happening. An in-house-built solution leverages the Dataiku API to scan metadata in order to establish a catalogue of data assets and their associated lineage at a data-element level.
This insight helped us identify that IA has 407 datasets reused 310 times, and 16,273 datasets comprising 840,000 data elements consumed across analytics projects.

ii. Dataiku Metadata Integration with Collibra - Lineage results are then integrated with our enterprise data governance platform, Collibra, via APIs. Dataiku helped speed up documenting the lineage of Raptor-related KRIs to instill transparency in the data consumed by risk-assessed audit entities. Without the Dataiku/Collibra integration, it would have been 75% more costly, 66% more time-consuming, and perhaps not feasible to contribute an inventory of 1 million data assets for lineage and keep it up to date on a daily basis.

iii. Data Quality - The Raptor application derives hundreds of Key Risk Indicators (KRIs) using thousands of critical data elements from a variety of enterprise data sources. Knowing the quality of the critical data elements that inform KRIs for audit planning decisions is very important. Dataiku's data profiling, tagging, recipe sharing, and Python integration capabilities provided the framework through which data quality checks were easily built and embedded in-line with the data ingestion process. Results are harvested automatically using Dataiku APIs and integrated with Collibra on a regular basis, avoiding a great deal of manual effort.

iv. Adherence to Coding Practices and Version Control - It would simply be impossible to adhere to coding practices and version control in a project of this scale if code were maintained offline on team members' laptops. Dataiku's project libraries let us modularize code that team members can access, applying the same function across different datasets. For example, to streamline the same data quality (DQ) checks across all datasets, we built a library of DQ checks that the various data analysts on the project team can leverage in a standardized manner.
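A shared DQ-check library of the kind described might look like the following sketch. All function and dataset names are hypothetical illustrations of the pattern, not IA's actual code; in Dataiku such code would live in the project library so every recipe can import the same checks.

```python
# Illustrative sketch of a shared data-quality (DQ) check library.
# Datasets are modeled as lists of dicts; each check returns a
# pass-rate between 0.0 and 1.0.

def check_not_null(rows, column):
    """Fraction of rows where `column` is present and non-null."""
    if not rows:
        return 1.0
    ok = sum(1 for r in rows if r.get(column) is not None)
    return ok / len(rows)

def check_in_range(rows, column, lo, hi):
    """Fraction of non-null values that fall inside [lo, hi]."""
    vals = [r[column] for r in rows if r.get(column) is not None]
    if not vals:
        return 1.0
    return sum(1 for v in vals if lo <= v <= hi) / len(vals)

def run_dq_checks(rows, checks, threshold=0.95):
    """Apply every named check; flag any that score below threshold."""
    results = {name: fn(rows) for name, fn in checks.items()}
    failures = [name for name, score in results.items() if score < threshold]
    return results, failures

# Example: the same standardized checks applied to one dataset.
data = [{"kri": 0.4, "entity": "A"}, {"kri": 1.7, "entity": None}]
checks = {
    "entity_not_null": lambda rows: check_not_null(rows, "entity"),
    "kri_in_range": lambda rows: check_in_range(rows, "kri", 0.0, 1.0),
}
results, failures = run_dq_checks(data, checks)
```

Because each check is a plain function keyed by name, the same dictionary of checks can be reused verbatim across datasets, which is the standardization benefit the library provides.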
Impact: Benefits are multi-faceted, with the greatest impact in two major areas:

1. Operational Efficiencies Department-Wide
i. Time savings from automating the continuous risk assessment process by streamlining administrative processes related to data sourcing, processing, and risk calculations.
ii. Reduction in manual processes and various end-user computing tools such as Excel files.
iii. Flexibility to divert resources to platforms with elevated areas of risk and the highest impact.
iv. Increased consistency and repeatability of the risk assessment process.

2. Quicker Adjustments to the Audit Plan
i. Enterprise audit plan coverage can be aligned to areas of elevated risk.
ii. Visibility into emerging and changing risks on a continuous basis, which helps audit teams respond to changes in the risk environment by pivoting on the audit plan.

Posted by LisaB

Schlumberger - Using Dataiku to Democratize AI Within the Organization

Team members: Valerian Guillot (Nerve Center Data Science Architect), with Sampath Reddy, Jean-Marc Pietrzyk, Jimmy Klinger, Eimund Liland
Country: United Kingdom
Organization: Schlumberger
Description: Schlumberger is a technology company that partners with customers to access energy. Our people, representing over 160 nationalities, provide leading digital solutions and deploy innovative technologies to enable performance and sustainability for the global energy industry.
Awards Categories: Organizational Transformation; AI Democratization & Inclusivity

Challenge: Democratizing AI within Schlumberger

Schlumberger is investing significantly in research and development to improve our products and services for customers, and has been embarking on digital transformation internally, as well as supporting our customers through their own transformations.
The main challenges Schlumberger has been facing are:

- Re-skilling cohorts of petro-technical experts in digital skills
- Ensuring prior work on data-driven topics is discoverable and reproducible
- Ensuring that access to data is democratized, with a focus on data discoverability, ease of consumption, and rights-of-use controls
- Ensuring that solutions designed and prototyped have a clear delivery path to yield business impact

Solution:

Schlumberger needed a single data science platform to access Schlumberger domain data through no-code and low-code interfaces, where prior work is easily discoverable, and whose technology is close to the systems where the insights and models will be deployed. Leveraging Dataiku, we have put in place a mechanism where Schlumberger data scientists and technical experts can:

- Leverage the Dataiku Data Catalog to access curated domain views of Schlumberger business systems data
- Leverage Dataiku's code samples and custom plugin capabilities to access the high-frequency historical environmental exposure of Schlumberger drilling equipment
- Leverage Dataiku visual and code recipes to build insights and models that improve well construction performance and reliability
- Leverage Dataiku automation and API node capabilities, and their close integration with BI solutions, to easily make models and insights available to wider populations in the field.

To support the internal adoption of Dataiku within Schlumberger, we have developed and delivered a number of custom data science classes, focusing on use cases relevant to Schlumberger's population of technical experts. As usage has scaled out, we have leveraged Microsoft's Yammer to build a technical community within Schlumberger whose members help each other.
Impact:

- 8x increase in Dataiku usage in the last 18 months
- 6TB of data analyzed per day
- 40% of contributions by non-data scientists
- 42% yield on training classes
- 4x increase in community help in 12 months
- 35% of data science projects are collaborative
- 720 days since the last day without data science commits
- Models and insights used in 70 countries

Dataiku, and its close integration with the DELFI E&P cognitive environment, has been a key driver in democratizing the use of data science within Schlumberger. We measure the effectiveness of democratization through:

- The number of active users, where active users are users making technical contributions (e.g., code changes, flow changes)
- The job codes of the active users
- The usage of the data-access helpers
- The number of projects going into production

The graph above shows the growth in the number of users per week making contributions to data science projects, growing 8-fold since early 2019. The growth has been worldwide. The data democratization effort has been successful in onboarding our existing population of data scientists as well as technical experts, ranging from maintenance technicians and service quality engineers to well engineers, who are now able to speak a common language and make data-driven decisions such as:

- Choosing the types of batteries to include in a downhole tool, by looking at the historical environmental exposure of the tool
- Choosing the drilling bottom hole assembly to maximize operational reliability, using BI solutions fed by data flows in Dataiku
- Choosing when to replace equipment to reduce the risk of downhole failure, using PHM models trained in Dataiku
- Optimizing the choice of drilling parameters to maximize performance and minimize energy consumption.

The distribution of contributions to data science projects (chart below, 2020-2021) shows that ~70% were made by users who are not data scientists.
Early 2021 data shows further growth in non-data-science contributions. Dataiku has also enabled collaborative work between data scientists and domain experts: 35% of the data science projects in Dataiku are collaborative (defined as projects where commits are spread amongst multiple users). The growth in usage, and the diversity in the job codes of users, has proven the transformative value of Dataiku as a collaborative data science platform for Schlumberger. Supporting the growth has been done along three axes:

- Domain views and helpers to access data
- Custom training (instructor-led, virtual, and self-training)
- Community-based technical support

Community engagement: As the user base grew, we put in place a bulletin board where Dataiku practitioners can ask any technical question on Dataiku or data science, so that we collectively learn from each other (snapshot of the Dataiku bulletin board on Yammer). Community engagement, measured as the number of messages read on a technical Yammer chat over the last 365 days, has tripled over the last year.

Easing data access: Accessing time series data from Schlumberger's operations had historically been a challenge, especially for predictive maintenance purposes, which require tracing back the entire history of each piece of equipment. Leveraging Dataiku plugins and globally shared code, domain-specific helpers were implemented to retrieve the entire historical exposure of each piece of downhole drilling equipment using only a single line of code. The helpers are used up to 12,000 times per day, with approximately 6TB of data analyzed per day.

Training: Dataiku is a platform simple enough that a user can get started on their own. The complexity starts with learning how to leverage Dataiku to access Schlumberger data, including time series data.
In order to train our users effectively, the focus had to be on ways to leverage Dataiku to access Schlumberger data, based on use cases relevant to the technical experts. To that effect, custom training manuals were developed:

- Predicting the chances of success of a drilling run, and identifying which controllable parameters would improve those chances
- Accessing historical time series to identify the operating environments of the equipment
- Accessing drilling time series data to identify similarities and differences between drilling operations
- Accessing historical time series and tool failure information to build failure prediction models.

Cumulative views of the manuals exceed 5,000. The yield of instructor-led and virtual classes is 42% on average (the fraction of onboarded users still using Dataiku 6 months after the training), where virtual classes had a 50% yield and instructor-led classes 30%.

Posted by Valerian

INSEEC U. - Dataiku as a Leading User-Friendly Data Science Platform for MBA Students

Name: Linda ATTARI
Title: Director of MSc 1 Data Management and MSc 2 Data Analytics, INSEEC U. Campus Lyon; CEO, Attari Consulting
Country: France
Organization: INSEEC U.
Description: INSEEC U. is a private institution of higher education and multidisciplinary research in Management, Engineering Sciences, Communication & Digital, and Political Sciences. With locations in Paris, Lyon, Bordeaux, and Chambéry-Savoie, INSEEC U. trains 25,000 students and 5,000 executives each year in classroom and distance learning, from Bachelor's to DBA.

The question of processing and analyzing data is becoming a major issue for companies. How can data be exploited so that it supports companies in their strategic choices? This is the objective of the MSc 1 Data Management. This training allows students to acquire the fundamentals of data marketing, big data, data mining, and data processing.
An introduction to AI (issues, challenges, and ethics) is provided, and the specific lens of AI applied to marketing is taught with regard to data modeling and predictive issues. The objective of the MSc 2 Data Analytics is to provide technical expertise centered on 4 major axes:

- Understanding consumer behavior through the optimization of the customer experience induced by the exploitation of unstructured data (photos, blogs, articles, comments)
- Improving decision-making through online data analysis, predictive analytics, and machine learning
- Processing and improving the quality of data, as well as adding value to it
- Conducting and managing big data projects

Awards Categories: Excellence in Teaching

Challenge: In the digital age, the deluge of data is creating new economic opportunities for companies, and therefore our students must be prepared for it in our master's programs specialized in data analytics. The ability to analyze massive data by training our students on market tools represents a significant competitive advantage: from the collection of heterogeneous data to its analysis and visualization in real time. We needed to extract the most relevant online data for the business to identify the right information at the right time and place, so as to improve decision-making and optimize organizational performance. We had to choose appropriate tools to understand and capitalize on this new reality: predictive analytics and data intelligence. We also had to assess the value of datasets and evaluate the evolution of the data market, from the collection process to cleaning, valorization, and interpretation. The questions that arose were: what evangelizing tools exist on the market? How can the new ecosystem be understood, and how can it best be explained? Old key performance indicators (KPIs) become obsolete as soon as they are defined, due to the agility of big data; therefore, how can newer and more relevant indicators, such as Knowledge Value Added (KVA), be put to use?
We needed to build new skills within our MSc Data Analytics by training future executives to become quickly operational. We did not have technical solutions, so we turned to the Dataiku platform in 2016 for students to practice with real datasets and be supported in the decision-making process.

Solution: As part of the Academic Program offering, Dataiku licenses have been provided free of charge to students and teachers. The benefits are multiple:

- Students can download Dataiku directly from the website and activate their license when they first log in to the interface. Different deployment options are provided (Mac, PC with VM, Amazon, or Azure).
- Teachers and students have access to the Dataiku e-learning website, which contains all the resources needed to quickly onboard onto the platform.

Multiple solutions have been implemented with Dataiku:

- Data Wrangling: Dataiku offers interactive data cleansing and enrichment. The user can easily access more than 80 visual processors for code-free wrangling. Contextual transformations are automatically suggested, and it is possible to create new ones, as well as to perform mass actions on the data.
- Machine Learning: The platform offers guided machine learning, enabling users to clean the data, create new features, and build a model in a unified environment.
- Data Mining: Dataiku provides visual insights thanks to a user-friendly interface. Using drag-and-drop, users can easily create visualizations for data exploration. The 25 built-in ranking formats make it possible to understand the data at a glance.
- Data Visualization: Users can quickly create histograms, maps, heatmaps, box plots, etc. Visualizations can be set up very easily, and the data can be explored using an intuitive drag-and-drop system.

Dataiku is well suited to teamwork and knowledge sharing, with different features that facilitate collaboration.
It is possible to add documentation and comments on each object, along with "To do" lists that facilitate data project planning and delivery.

Project example: MSc 1 Data Analytics & Marketing Manager, students from the 2020-2021 course. Manon Proton, in collaboration with Jérémy Kodaday and Johanna Tournadre: the Adidas project, implemented Nov 03, 2020. An example of data exploration: discovering the data and displaying it in the software interface. The platform is intuitive and easy to use. The editor's interface is fluid and well designed, which enhances the user experience. Moreover, the user easily understands the organization of the tools. It is possible to work in groups and in remote mode, and to use the software on Windows or Mac operating systems.

Impact: My course is dedicated to the big data provider market and to benchmarking technical and functional solutions. The students had to apply the testing methodology seen in class by following all the steps of data preparation: importing the data, discovering it, organizing it, cleaning it, and enriching it in order to perform their analysis. In addition, they benchmarked different solutions by working through the functional and technical characteristics of the platform. For the functional characteristics, students had to find out whether it was possible to perform quality data preparation, build relevant visualizations, and ensure traceability of the data, among other things. Regarding the technical specifications, they had to check the import and export formats, the possible types of external sources, the various statistical representations, the recognition of a variety of data formats, the volume of data accepted, and the UX. They were also tasked with setting up a competitive mapping, and finally with exploring the economic model of the solution in order to make a recommendation.
Dataiku stood out for its adaptability to different operating systems, its recognition and quality of data, its easy handling, and its ancillary software. Dataiku is a leader in terms of completeness of vision, execution, and capability, so the platform is a reference model in the algorithmic fields that can assist employees both in marketing and in the prediction of events. The variety of possible applications is as follows: evangelizing solution, connectivity, cleaning and enrichment, machine learning, data mining, data visualization, workflow, real-time scoring, and collaboration. See the INSEEC Campus Lyon MSc 2 Data Analytics student report (Johanna Tournadre, Jérémy Kodaday, Manon Proton); extract of some specifications studied: volume, recognition, representation, and data quality.

Posted by L_Attari

Pr. Vazacopoulos (Stevens Institute of Technology) - Upskilling Students From All Levels with Dataiku

Name: Alkividis Vazacopoulos
Title: Professor
Country: United States
Organization: Stevens Institute of Technology
Description: Stevens Institute of Technology is a premier, private research university situated in Hoboken, N.J., overlooking the Manhattan skyline. Founded in 1870, technological innovation has been the hallmark and legacy of Stevens' education and research programs for more than 140 years.
Within the university's three schools and one college, more than 6,100 undergraduate and graduate students collaborate with more than 350 faculty members in an interdisciplinary, student-centric, entrepreneurial environment to advance the frontiers of science and leverage technology to confront global challenges.

Awards Categories: Excellence in Teaching

Challenge: We have come across four main challenges with which Dataiku helped:

1. Companies want to hire students who can attest that they know how to use Dataiku. This challenge is resolved with the Dataiku Academy and its certified Learning Paths, and we have put together a special non-credit course that students can take in order to learn Dataiku.
2. In our Business Analytics program for MBAs and Executive MBAs, the students do not have programming skills (R or Python). Thanks to Dataiku's visual interface, we have integrated Dataiku into the BIA 600 and 610 courses for students to learn about machine learning and practice with several examples.
3. Grading Python code is very tedious. Dataiku makes it simpler to track and assess progress in the machine learning course.
4. Many times, we want to combine visual tools with Python code. This is easily handled in Dataiku, so that students can leverage the most relevant environment for each challenge.

Solution: Dataiku has helped in many respects:

- Collaboration, which also helps students learn from each other.
- AutoML has helped a lot, especially by letting students find out whether a specific dataset can lead to relevant results.
- Students can improve their skills much faster thanks to the many different technologies available in Dataiku.
- Students can combine Python code with visual recipes to select the most efficient route for each step in their data workflow.
- Merging and cleaning datasets is very important for us, and is made easy in Dataiku with just a few clicks.

For all those reasons, we are starting to use Dataiku in my industry capstone course!
Impact: Our students are currently using it for the vaccination analytics summer project. Students from the Fall 2020 classes I taught using Dataiku were hired by a major pharmaceutical company, helping us teach the next generation of best-in-class data talent. Several projects have been completed using Dataiku. Undergraduate students were able to complete more advanced projects, such as sentiment analysis, and upskill with Dataiku.

Posted by avazacopoulos

Schlumberger HR - Talent Acquisition Enablement with Machine Learning

Team members:
Modhar Khan - Head of People Analytics
Richard De Moucheron - Director, Total Talent Management
Wesley Noah - Global Compliance Managing Counsel, Operations
Sejal Sagar Mehta - Application Engineer
Sudeep Goswami - HR Applications Manager
Ryan Stewart - Global Talent Acquisition Planning Manager
Sonia Badilla - Talent Acquisition Manager, Western Hemisphere
Philip Irele Evbomoen - Talent Acquisition Manager, Eastern Hemisphere
Beth Kremer - North America Recruiting Manager
Zhi Chi - Data Engineer, HRIT
Simon Spero (Dataiku) - Senior Enterprise Customer Success Manager

Country: United States
Organization: Schlumberger
Description: Schlumberger is a technology company that partners with customers to access energy. Our people, representing over 160 nationalities, provide leading digital solutions and deploy innovative technologies to enable performance and sustainability for the global energy industry. With expertise in more than 120 countries, we collaborate to create technology that unlocks access to energy for the benefit of all.

Awards Categories: Responsible AI; Value at Scale

Challenge: Every year, more than 500k candidates apply to Schlumberger across the globe. With our PeopleFirst strategy, we have made a commitment to improving Diversity & Inclusion in everything we do as a company.
Our Talent Acquisition team had stretched investment and resources to vet these candidates, match them to business demand, and do all of that efficiently with the utmost compliance. The challenge in using AI & ML was to ensure that it would not have any negative impact on candidates, and to continuously monitor the models so they can be vetted and improved in case they generate bias against any class. After vetting many ready-made solutions, we found that they cover neither the complexities of 80+ countries nor the number of profiles we hire for.

Solution:

- Complex data engineering: Making the data ready for exploration was a complex process, as it involved many internal and external data sources, as well as numerous engineering steps and feature generation. With Dataiku, we were able to do that at scale, quickly and with quality. See the example of a project showing Dataiku's ability to handle complexity at scale.
- Ensemble modeling: From advanced embedding models for text and feature extraction to a probabilistic predictive workflow, Dataiku handled the customizations needed in our ML workflows seamlessly.
- Model deployment: The API Deployer proved to be an efficient and cost-effective feature, without requiring additional infrastructure in the pipeline.
- Collaboration and adoption: Recruiters were able to interact with the predictions and provide feedback in a truly collaborative manner.

Impact: Pilot results (Q2-Q3 2021): The ensemble model developed was able to rank 10,000 applicants with 82% agreement on the output by a committee of recruiters and managers. The API deployed and connected to the candidate processing system is now in the test phase, and we plan to deploy it in select countries by mid-year. We estimate that this will reduce candidate processing time by more than 80%, while providing a better experience to applicants through timely feedback. It will also support agility in responding to critical business needs.
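An ensemble of the kind described combines the scores of several member models into one ranking. A minimal sketch, assuming a simple weighted probability-averaging ensemble with stand-in scoring functions (not Schlumberger's actual models or features):

```python
# Minimal probability-averaging ensemble sketch. The two "models"
# are hypothetical stand-ins; a real workflow would combine, e.g.,
# an embedding-based text model with a structured-features model.

def text_model(candidate):
    # Hypothetical score from a text/embedding model.
    return 0.9 if "python" in candidate["resume"].lower() else 0.2

def tabular_model(candidate):
    # Hypothetical score from a structured-features model.
    return min(1.0, candidate["years_experience"] / 10)

def ensemble_score(candidate, weights=(0.5, 0.5)):
    """Weighted average of the member models' probabilities."""
    w_text, w_tab = weights
    return w_text * text_model(candidate) + w_tab * tabular_model(candidate)

def rank_candidates(candidates):
    """Rank candidates, highest ensemble score first."""
    return sorted(candidates, key=ensemble_score, reverse=True)

# Usage: rank two hypothetical applicants.
candidates = [
    {"id": 1, "resume": "Java developer", "years_experience": 3},
    {"id": 2, "resume": "Python and ML", "years_experience": 5},
]
ranked = rank_candidates(candidates)
```

In a responsible-AI setting like the one described, the per-model scores would additionally be logged so that ranking outcomes can be audited for bias against any class.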
Posted by modhar

Australia Post - Leveraging ML-Based Forecasting To Optimize Capacity Planning at Processing Facilities in a Large-Scale Logistics Network

Team members:
James Walter, Senior Data Scientist
Yohan Ko, Senior Data Engineer
Btou Zhang, Network Operations Lead
Duc Nguyen, Shift Production Manager
Normy Chamoun, Head of Processing NSW/ACT
Sheral Rifat, Data Science Manager
Phil Chan, Data Engineering Manager
David Barrett, Facility Manager
Boris Savkovic, Data Science Manager

Country: Australia
Organization: Australia Post
Australia Post is a government business enterprise that provides postal services in Australia. We are also Australia's leading logistics and integrated services business. Last year, we processed 2.6 billion items, delivered to 12.4 million delivery points across the nation, and continued to provide essential government and financial services via the country's largest retail network.

Awards Categories: Value at Scale; Partner Acceleration

Business Challenge: The global pandemic has accelerated e-commerce growth, with more households shopping online than ever before. Whilst Australia Post has a long and proud history to lean on, we continue to face challenges from ever-increasing parcel volumes and a great digital disruption that is shaking up the wider logistics industry. This requires us to innovate and transform. A key daily activity at facilities within our logistics network involves shift production managers making daily resource/staffing planning decisions that seek to ensure we process parcel demand in a timely manner whilst controlling for cost.
Currently, these decisions are made based on limited, but best-available, information. Too few staffing hours can result in sub-optimal throughput and parcel delays, whilst too many staffing hours can unnecessarily increase labor spend. To address this pain point, our Data Science team developed a shift volume forecasting algorithm in Dataiku. The model provides facility operators with daily shift volume forecasts and translates this information into staffing requirements. The algorithm was trialed in partnership with one of the biggest processing facilities in the Southern Hemisphere, the Sydney Parcel Facility, and is now used to inform daily planning activities. Feedback from shift production managers is that "based on the volume prediction, we were comfortable with not running overtime the following morning. This paid off." The model is thus empowering managers to confidently make decisions regarding the need for overtime. The approach is changing the way facility operators make decisions, resulting in significant operational savings (~15 million Australian Dollars [AUD] p.a. once rolled out nationally).

Business Solution: We chose Dataiku because we were looking for an end-to-end data science platform that simplified and automated many aspects of the data science and data engineering workflow, allowing our team to deliver results faster and with fewer frustrations. The team made use of Dataiku from initial exploratory data analysis (EDA), Python coding, and use of custom modules, through to production deployment and BAU operation, including model performance monitoring, data monitoring, and resource monitoring. The end-to-end MLOps process in Dataiku is streamlined, integrated, and easy to use.
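The translation from a shift volume forecast into a staffing requirement can be sketched as a simple capacity calculation. The throughput and shift-length figures below are hypothetical planning parameters for illustration, not Australia Post's actual numbers:

```python
# Sketch of translating a forecast shift volume into staffing needs.
# parcels_per_staff_hour and shift_hours are hypothetical planning
# parameters, not actual operational figures.

import math

def staffing_requirement(forecast_parcels, parcels_per_staff_hour=250,
                         shift_hours=8):
    """Whole staff count needed to clear the forecast volume in one shift."""
    staff_hours = forecast_parcels / parcels_per_staff_hour
    return math.ceil(staff_hours / shift_hours)

# Usage: a 90,000-parcel shift forecast at the assumed throughput
# needs 90,000 / 250 = 360 staff-hours, i.e. 45 staff on an 8h shift.
needed = staffing_requirement(90_000)
```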
Specifically, Dataiku has enabled us to easily manage the following key aspects of the MLOps workflow:

- Complex dependencies in terms of libraries and virtual environments, by abstracting away many of the complexities one usually faces when working with dockerization or virtual environments.
- Scalability of our models, by providing a streamlined way to leverage a Kubernetes cluster on GCP to attain scale and to enable further scale-out of the model to future facilities.
- Version control: functionality in Dataiku Data Science Studio (DSS) enables lineage tracking and model versioning across the full lifecycle.
- Collaboration: data scientists, data engineers, and developers could co-develop and then serve the models seamlessly to business users.
- ETL pipeline development, leveraging both time-based and event-driven scenario execution, to process the real-time data feeding into our models.

We rapidly developed an ML model (a random forest model in Dataiku, leveraging the Visual ML capability) to forecast shift volumes and labor/staffing requirements at each facility. Model deployment to production in Dataiku was speedy and required minimal resources from data engineering, as many of the production processes are automated and handled by the Dataiku platform. The metrics, checks, and testing capabilities have enabled us to add quality assurance to our models.

Business Area: Supply Chain / Supplier Management / Service Delivery
Use Case Stage: In Production

Value Generated: This project is highly innovative, novel, and transformative within Australia Post, as it brings real-time forecasts to users at our facilities, enabling a level of real-time, data-driven decision-making that has not been possible to date. In short, operational decisions can now be made in a timely manner, as required by the time-constrained daily cycle of our network operations teams.
Most importantly, these forecasts are relevant, actionable, accurate, and highly automated through the use of the Dataiku platform. From a scale and technical point of view, the forecast generation process is now streamlined end-to-end in Dataiku, and can easily be scaled out to more facilities nationally. Specific business metrics of success include:

- The data-driven forecasting approach is changing the way facility operators make decisions, resulting in significant operational savings (~15 million AUD p.a. once rolled out nationally) and significant uplifts in overall parcel throughput within our network. The savings result from reduced labor costs at facilities (reduced spending on on-demand agency staff), the uplift in service quality, increased throughput at facilities, and freeing up shift managers' time.
- The process is now fully automated, whereas previously human operators would laboriously collate data from multiple sources (including Excel spreadsheets), which was costly in human resources and lacked automated quality assurance.
- The model is 25% more accurate at shift volume forecasting than traditional human approaches.
- Uplift in the repeatability and consistency of labor forecasting for planning. We now have a consistent standard and process that can be scaled out nationally in a repeatable manner.

Value Brought by Dataiku: The specific value brought by the Dataiku platform and the Dataiku team includes:

- The ability to develop, deploy, and operationalize ML models at speed and at scale, within a controlled and governed end-to-end data science workflow.
- Dataiku is a single, end-to-end, integrated data science platform, from development to deployment to BAU operation. This results in a streamlined and consistent process across the full spectrum of data science work. Specifically, the full data science and data engineering lifecycle is native to DSS.
- ModelOps and MLOps frameworks are native to DSS, including versioning and dependency management (two key challenges when it comes to production-grade deployments).
- The option to leverage advanced models easily and deploy them at scale (using Kubernetes), subject to best-practice MLOps as governed by the Dataiku platform. In the future, we are also looking to leverage Apache Spark as an execution engine within Dataiku as we continue to scale up and roll out the solution nationally.
- The ability to leverage Kubernetes to train models at scale, and to easily deploy many models in a production environment via elastic compute options. Dataiku acts as a seamless abstraction layer over the complexity of the underlying big data processing technologies.
- Dataiku enabled us to test multiple models in parallel, including champion-challenger frameworks, which accelerated the model development and field-testing cycles.
- The Dataiku Academy and the Australian and global Dataiku teams provided excellent support to uplift our team, and supported our end-to-end journey from onboarding the platform all the way to our first production deployments and operations, and beyond.

Value Type: Increase revenue; Reduce cost; Save time; Increase trust
Value Range: Dozens of millions of $

Posted by boriss. Last reply Friday by SteveG.

bp T&S - Re-Imagining Fundamental Analytics in bp Trading & Shipping

Team members:
David Maerz, SVP Trading Analytics & Insight
Robert Doubble, VP Trading Data Analytics
Carl Hale, VP Programme Management
Dan Parisian, VP Fundamentals Modelling & Infrastructure
For and on behalf of the Trading Analytics & Insight and I&E dTA organizations in bp Trading & Shipping

Country: United Kingdom
Organization: bp Trading & Shipping
T&S is the energy and commodity trading arm of bp and is one of the world's leading energy, marketing, operations, and trading organizations.
We buy, sell, and move energy across the globe to provide integrated solutions to over 12,000 customers in 140 countries. With upwards of 300 ships on the water at any given moment for bp, T&S moves around 240 million tonnes of oil, gas, and refined products every year.

Awards Categories: Most Impactful Transformation Story

Business Challenge:

Immediately following the appointment of Bernard Looney as the new bp CEO in 2020, the company announced an ambitious net zero low carbon agenda and the transition from an international oil company to an integrated energy company. Trading & Shipping (T&S), the energy and commodities trading division within bp, is a key enabler of this strategic intent. With over 12,000 customers worldwide, and a business spanning crude oil, refined products, natural gas, power, LNG, biofuels, and low carbon, T&S helps keep the planet’s energy moving. Its commercial success is underpinned by a world-class analytics capability, comprising a global team of 160+ analysts in Europe, the Americas, and Asia. They deliver actionable insights, advanced pricing models, and valuations of complex structured deals that inform the deployment of risk by the commercial teams. Possessing strong business acumen, seasoned market knowledge, and deep technical know-how, they are the backbone of our ‘analytics edge’.

Historically, many analysts have worked in vertically integrated silos, sourcing, cleaning, and exploring data, building models, and producing outputs largely independently of one another. This frequently led to parochial, duplicative solutions that were sometimes frail and often highly manual. Opportunities to collaborate, share best practices, or seek peer reviews were limited, and the development of modular, re-usable solutions to common business problems was a rarity. With bp T&S mandated to grow revenue by expanding into new countries, entering new markets, and scaling up existing business lines, demand for analytics will only increase.
Successfully navigating the energy transition will require an agile, flexible analytics capability, one that our legacy working practices and Excel tooling cannot provide. Eliciting change required a disruptive paradigm shift in our ways of working.

Business Solution:

bp’s new strategic direction provided a powerful catalyst for a radical rethink of our analytics working practices and organizational design. In 2021 we re-organized analytics along technical discipline lines, embraced Agile, spun up four multidisciplinary Agile Squads, and agreed that Dataiku would be the cornerstone of our modern strategic analytics tooling. In addition, we created a specialist fundamentals modeling discipline, one that would spearhead our transformation activity.

Dataiku was a natural choice for an enterprise AI platform for the T&S analytics organization. With its concept of ‘Clickers’ and ‘Coders’, it was well matched to a population equipped with a broad set of technical skills and differing levels of proficiency, ranging from Excel novices to deep Python experts. Dataiku’s emphasis on the collaborative development of end-to-end model workflows also resonated powerfully with our goal of empowering cross-discipline squads to reimagine our next generation of predictive models.

In 2021, Agile squads in London, Singapore, Houston, and Chicago kicked off our analytics transformation journey. Informed by a small group of enthusiastic product owners, the squads set about reimagining complex, high-value, and business-critical Excel models in Dataiku. Favouring progress over perfection, our goal was to continuously accrue benefits by engineering intuitive model workflows that benefit from superior automation and increased robustness. We now have a growing number of business-critical models in Dataiku, executing intraday as new market data arrives without human intervention.
Linked Excel Workbooks have been replaced by simplified workflows comprising both visual recipes and bespoke Python code, organized in logical Flow Zone groupings that afford standardization through design modularity and bespoke, re-usable Plugins. What’s more, model outputs are disseminated to traders via highly interactive self-service dashboards. As we transform, we routinely engage with Dataiku to share feedback, seek technical reviews of our design thinking, and learn how their customers are tackling similar problems.

Value Generated:

Now 18 months into our transformation journey, we have a growing number of business-critical models executing daily on Dataiku with a high degree of automation. Traders can interrogate model outputs using interactive dashboards and experiment with custom market scenarios that would be impracticable in Excel. By embracing Agile and fine-tuning it to our business context, we have been able to continuously accrue benefits in double-quick time. Furthermore, by eliminating manual processes for loading and preparing data, utilizing job scheduling, and embracing superior automation, we free up analysts from clerical tasks to instead focus on highly dynamic energy and commodity markets.

Through a process of continuous learning, we have identified design patterns for common recurring tasks that are ripe for modularization, either in the form of reusable Dataiku plugins, or by creating bespoke Python libraries. By building out a suite of shared components, our transformation trajectory is accelerating, with new models deployed more quickly. As our momentum builds, so does our business impact across the trading floors, as we transform legacy models at pace. Our work has received high praise from senior T&S leadership, citing its ‘game-changing’ nature, as well as recognition from Franziska Bell, SVP Digital Technology.
In the case of low carbon analytics, starting from greenfield, we have built out an entire suite of analytics on Dataiku which has very quickly delivered material value. A key enabler of our success is a strong partnership with the central IT team. The provision of a robust multi-tenanted platform with ~150+ users is key to building confidence in Dataiku and critical to embedding our new ways of working. Arguably, our trailblazing analytics transformation is demonstrating to both T&S and the wider organization how new digital investments can advance bp’s commercial strategy.

Value Brought by Dataiku:

- Over the course of our 18-month transformation journey, we have retired 140 Excel Workbooks, eliminating 500 spreadsheet tabs in the process. By replacing onerous clerical processes with superior automation, we have saved 174 analyst hours per year. Analysts now have more time to focus on high-value analytics.
- Models now run more quickly and more often, allowing us to quickly disseminate actionable insights to the commercial teams in response to market-moving events. Scenario analysis allows front-line traders to quickly understand how changes to model parameters impact the numerical output, helping them to build greater trade conviction and to deploy risk with increased confidence.
- Agile working practices allow us to accrue benefits rateably, unlike a waterfall-based approach. Analysts reap the benefits of manual work being taken out of the system, while traders gain from having access to more powerful tools to understand markets.
- Duplicative, siloed model development processes have been superseded by collaborative, cross-discipline working practices, and a centralized repository of models and libraries of reusable components. Company knowledge is institutionalized, and key person risk is reduced.
With our oil, natural gas, power, and low carbon models now in a single central location, we can seamlessly construct cross-commodity views and generate new commercial insights that were impossible while working in silos. Powerful machine learning algorithms and Dataiku’s ability to handle large data sets provide the foundation for building our next generation of advanced predictive models, something inconceivable in Excel. Our team is enthusiastic and energized by what can be delivered through our new ways of working and by embedding Dataiku at the heart of what we do. Empowered and encouraged, the team will continue to employ Dataiku in innovative and novel ways to underpin the future commercial success of bp T&S.

Posted by BobDoubble

Unilever - Building Self-service NLP for Analysts Worldwide

Names: Linda Hoeberigs, Head of Data Science & AI, PDC Lab; Ash Tapia, Data Partnerships & Tools Stack Manager
Country: United Kingdom
Organization: Unilever

Description: Every day, 2.5 billion people use a Unilever product to look good, feel good, or get more out of life. Our purpose is to make sustainable living commonplace. We are home to some of the UK’s best-known brands like Persil, Dove, and Marmite, plus some others that are on their way to becoming household favourites, like Seventh Generation and Grom. We have always been at the front of media revolutions, whether that be the first print advertisements in the 1890s or in 1955 when we became the first company to advertise on British TV screens. Experimentation and bravery drive us and have helped us become one of the UK’s most successful consumer goods companies.

Awards Categories: AI Democratization & Inclusivity

Challenge:

Our Unilever People Data Centre (PDC) teams across the globe deal with vast amounts of unstructured text data to gain insight into our customers, how they engage with our brands and products, and the needs we have yet to tap into.
The industry is moving at a rapid pace, which consequently requires rapid generation of insights to stay on top of the latest trends. The sheer amount of data and the skills required to analyse it efficiently exacerbate this problem. The answers our marketeers, product research and development, and supply chain specialists seek also require analytics approaches tailored to the business.

Analyzing text data is a complex task and often requires understanding complex language models and Natural Language Processing techniques, which most of our marketeers do not have. Their skills are focused on data analysis, so we had to find a way to synthesize our text data into something that can be analyzed by our PDC analysts, without compromising on our technical data science approach. Building on this, the solution had to be flexible and able to work in multiple languages, with the aim of supplying all analysts a tool that would be accessible in their market.

Solution:

This solution was born via the democratization of a project flow made up of several code recipes. As with most data science work, it is often unknown how applicable and reusable a piece of code is until it is put into practice. In this case, we were able to take these code recipes written by our data scientists and encapsulate them into a plugin by collaborating with our data engineers. Using the ability to create custom plugins, we developed a plugin called Language Analyser which is readily available for use by anyone in the PDC across the globe. It has allowed hundreds of analysts to apply Natural Language Processing (NLP), increasing the efficiency, quality, and granularity of their work. What’s more, the ability to compare two text datasets was implemented using the ability to have multiple datasets as input to a single plugin, thus increasing the range of applications of this tool.

To solve the challenge of flexibility, we employed a custom front-end using HTML, CSS, and JavaScript.
By creating a user-friendly interface, we were able to break the barrier between technical terminology and algorithms with analyst-friendly terms. For an analyst to use this plugin, they merely need to supply a dataset and select their pre-processing steps, such as removing spammy authors from social media data, removing unnecessary stop words, and cleaning the data of noise. From there, they can choose which NLP techniques to apply to their data, including identifying general grammatical entities, emojis, and Unilever-relevant terms such as ingredients and fragrances. Building from this, they can choose to enrich their analysis with pre-tagged sentiment, adding a layer of depth to generated insights, such as which emojis are used in a positive context when discussing vegan foods.

Our data scientists are often focused on the accuracy and the processes behind the scenes that turn unstructured data into something more structured; our analysts, on the other hand, are focused on finding insights and presenting these back to their stakeholders. Our solution makes use of static insights within Dataiku to create a way of visualizing the data returned from the pre-processing and data science processes. Being able to leverage JavaScript libraries such as D3 allowed us to collaborate with a dedicated design team to present the data in a way that aided information presentation and insight discovery.

Impact:

The tool has been received extremely well by analysts and other data scientists. It sees strong usage every day across a wide span of research projects. The outputs serve both as inspiration for further analyses such as theme detection, and as discovery of language intricacies. One of the key reasons this solution was implemented as a plugin was that it gave a single interface to multiple common NLP options.
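The pre-processing steps described above can be sketched in plain Python. This is a minimal illustration, not the Language Analyser's actual implementation: the spam threshold, the stop-word list, and the emoji ranges are all illustrative assumptions.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "and", "to", "of"}  # illustrative subset
# Basic emoji ranges (emoticons, symbols & pictographs); not exhaustive.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def preprocess(posts, spam_threshold=3):
    """posts: list of {"author": str, "text": str}.
    Drops authors posting more than spam_threshold times, removes stop
    words, and pulls emojis into a separate field for later tagging."""
    counts = Counter(p["author"] for p in posts)
    cleaned = []
    for p in posts:
        if counts[p["author"]] > spam_threshold:
            continue  # spammy author: drop the post entirely
        emojis = EMOJI_RE.findall(p["text"])
        text = EMOJI_RE.sub("", p["text"])
        tokens = [t for t in re.findall(r"[a-z']+", text.lower())
                  if t not in STOP_WORDS]
        cleaned.append({"author": p["author"], "tokens": tokens,
                        "emojis": emojis})
    return cleaned

posts = [
    {"author": "amy", "text": "The vegan burger is great \U0001F600"},
    {"author": "bot", "text": "buy now"}, {"author": "bot", "text": "buy now"},
    {"author": "bot", "text": "buy now"}, {"author": "bot", "text": "buy now"},
]
result = preprocess(posts)
```

In a plugin, logic like this would sit behind the form fields the analyst fills in, with the thresholds and word lists supplied by the recipe configuration rather than hard-coded.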
This resulted in analysts being able to use the Language Analyser for data cleaning, tagging Unilever entities, or completing a full comparative language analysis on two datasets. It allows the analysts to see their text data in a new light in a matter of minutes. It goes without saying that this is now our analysts’ go-to text analysis tool.

In addition, this tool has been up and running for more than a year and has changed the way in which Unilever informed marketing strategy for Hellmann’s, who found out which foods over-index for lunch compared to other meal-moments, and were thus able to generate more relatable meal moments in their campaigns. It has also informed Comfort on which words to use in tone-of-voice strategy, by finding out which words over-index for millennials.

The current team continues to improve the tool by integrating with other existing capabilities, for example topics and themes. Which adjectives and adverbs describe each theme best? What beauty ingredients are most common for each topic? As we uncover more insights, our questions grow more advanced, and this requires a forward-thinking strategy. In addition, as we expand globally, questions like this are starting to pour in all the way from Mexico to Japan – we have continuously worked to improve our language coverage, with the tool going from supporting 12 languages at the start to 30 languages currently. We design and develop with the analyst in mind, and market coverage has been a significant milestone.

The Language Analyser has allowed data scientists, data engineers, and visualization experts to collaborate in a way that was previously siloed. It has paved the way for future projects with regards to how we think about which data science processes are democratized into plugins for our global analysts to use.
At the end of the day, the Language Analyser has fundamentally changed how we view text analysis and visualization – it has opened the business to new ideas and possibilities across the globe.

Posted by ash

Fanalists - Bringing Data Marketing to Sports & Entertainment Organizations of All Sizes

Name: Thierry de Reus
Title: Head of Tech & Data
Country: Netherlands
Organization: Fanalists

Description: Nowadays, personalized experiences are the norm. Personal attention, or the lack of it, can make or break your business. Fanalists supports organizations in sports, media, and entertainment to get to know their fans and to get a better grasp of their business. Fanalists breaks data silos, centralizes and enriches data, creates rich fan profiles, and makes them available to analyze and communicate with. Fanalists works for event and festival organizers, sports federations like the Dutch hockey federation (KNHB), media companies such as The Walt Disney Company, and sports organizations like cycling team Team Jumbo-Visma and football club Anderlecht.

Awards Categories: Organizational Transformation, AI Democratization & Inclusivity, Value at Scale

Challenge:

The world of entertainment and sports is inspiring. Talented event organizers create the most creative live concepts, and popular sports clubs create stimulating experiences for their fans. But really getting a grip on the actual individual fan? Nah. Data is generally not their comfort zone. Let alone understanding data science concepts like predictive modeling to understand their fans and customers even better. It turned out to be hard to truly get a grasp on the power of data, customer segmentation, and personalization by explaining the theory behind it. Something was missing. Something that makes it transparent, visible, and usable for creative marketers. Without that missing link, organizations would never outgrow bulk marketing campaigns and generic strategies.
And that would be a shame: the interplay between creativity and data results in such a powerful combination. Moreover, Fanalists wants to support organizations in sports and entertainment in a scalable manner. Creating ad hoc analyses and customized data models for more than a handful of organizations is not feasible. On the other hand, there will always be elements fully tailored to the needs of the individual organization.

How can we create a clear and scalable workflow for our own data analysts and experts? How can we achieve that while making it understandable for the talented marketers and campaign managers on the other end of the table? How can we make great use of the best features of the Dataiku platform? And how can we split project-specific details and configuration from the models in Dataiku, so our data flows won't grow into non-transferable messes over time?

Solution:

We created a framework that ensures that our in-house data analysts make use of a data flow that is as standardized as possible, while enabling the individual data analyst to make changes or build data models specifically for an individual project. In short: we created an extensive ETL flow, divided the flow into multiple phases, and started managing project-specific settings and definitions externally as much as possible. And we launched an interface that brings transparency to the marketers and specialists at the other end of the table. That interface is called the Fanalists Terminal, where everyone working on the project, both Fanalists team members and our clients, can log in and see what is in the data model.

Phase 1: Integration Layer

This phase consists of bringing in the data and connecting to external sources. Generally this phase contains a mix of sources, such as SQL databases, SFTP files, and datasets retrieved by means of an API. At the end of this phase, all data is combined and standardized according to the conventions we defined within the framework.
All columns in the available datasets are shown in the Fanalists Terminal as so-called "data fields", to enable everyone to get a grasp of what is actually in the database.

Phase 2: Context and Settings

Most data in entertainment and sports does not speak for itself. A lot of information and context is in the heads of humans. Keeping it that way does not really work for data models. That's why everyone involved in the project can use the Fanalists Terminal to add information and context to the available datasets. Which artists were performing at the music festival last year? What type of sports event did we sell for? What are the definitions of our marketing permissions? All this information is managed in the Fanalists Terminal and applied later on in the data flow. This way, every project can be unique without making concessions to the standard flow.

Phase 3: Grouping and Modeling

In this phase all data is combined to create 360-profiles. After this phase, everyone involved in the project can analyze and act on rich fan profiles. Moreover, within this phase our data analysts can add prediction and clustering models. Usually a project starts without advanced predictive models - but with the growing maturity of the organization, models can be added when the project is ready for them. At Fanalists, we defined multiple business models and have multiple predictive models on the shelf that we can implement to match specific business models. For instance, when our client has a membership model, we can add churn prediction to the mix. But when we come across a client who is selling tickets for a yearly festival, we can add a model predicting repeat customers to the stack.

Phase 4: Segmentation

Based on the created 360-profiles, everyone involved in the project can use the Fanalists Terminal to define fan segments. Again, all so-called "data fields" are available to use and configure segments with.
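The split between a standardized flow and externally managed, project-specific context could be applied in a flow step along these lines. This is a hedged sketch: the settings shape, field names, and `apply_context` helper are hypothetical, not the actual Terminal schema.

```python
import json

# Hypothetical settings export for one project: the context lives outside
# the flow, so the flow itself stays standardized across projects.
settings_json = """{
  "event_type": "music_festival",
  "lineup": ["Headliner A", "Headliner B"],
  "marketing_permissions": {"email_optin": "field_optin_email"}
}"""

def apply_context(records, settings):
    """Enrich standardized records with project-specific context."""
    out = []
    for r in records:
        enriched = dict(r)
        enriched["event_type"] = settings["event_type"]
        # Resolve the project's own permission definition to a concrete field.
        optin_field = settings["marketing_permissions"]["email_optin"]
        enriched["email_optin"] = bool(r.get(optin_field))
        out.append(enriched)
    return out

settings = json.loads(settings_json)
records = [{"fan_id": 1, "field_optin_email": 1}, {"fan_id": 2}]
enriched = apply_context(records, settings)
```

The design point is that changing the lineup, event type, or permission definitions only touches the settings object, never the flow code, which is what keeps the per-project flows transferable.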
Because of phases 2 and 3, the existing data fields are enriched with additional information and data points related to the predictive or clustering models.

Impact:

With a clear data infrastructure, Fanalists is able to support organizations in sports and entertainment. By making it as transparent and comprehensible as possible, everything clicks into place, while also giving them the tools to play with their data. Through this combination, our clients can get the most out of their data-driven marketing strategy.

1. Easily setting up the baseline infrastructure for data analysis and dashboarding

The data infrastructure makes it possible to both analyze and act on the data of their fans. With an implemented data infrastructure, including a Fanalists Terminal, marketers and strategists can capitalize fully on the segments they created by analyzing them through dashboarding tools, e.g. Qlik Sense and Looker. On the other hand, marketers can use this information to create personalized marketing campaigns and communication flows, by syncing this information to marketing platforms like email services. Using segments based on predictive models is therefore effortless and comprehensible. And after Fanalists has helped a client reach that stage, the fun begins.

2. Enabling organizations of all sizes to leverage data insights, without the need to hire specialists

It speaks for itself that analyzing and understanding their business and fans leads to better decision-making, marketing efficiency, and eventually revenue increase. Rolling out a data-driven marketing strategy not only leads to great fan experiences, it also results in more loyal fans and more valuable customers. Fanalists makes it possible for organizations without huge budgets and in-house data specialists to implement this innovative way of working.

3. Reducing efforts (and cost) to take the plunge toward data-driven marketing

The described solution is beneficial for Fanalists in improving efficiency, reducing the necessary capacity, and improving quality of delivery in the long term. Since the data flow is as standardized as possible, new projects are kicked off faster. And because enrichments, definitions, and segmentations are configured by the clients themselves, there is significantly less back-and-forth communication. Essentially it means hitting two targets with one shot: better internal workflows result in lower costs for the client and therefore lower the risk of taking the plunge. So we can create great personalized fan experiences together.

Posted by thierrydereus

Leidos - Software Analysis Execution Process Improvement and Prediction Program

Team members: Karen Cheng, Principal Investigator, Data Scientist; Ron Keesing, Division Manager; Mark Clark, Program Manager; Caitlin Burgess, Program Management Support; Tifani O’Brien, Pilot Project Lead and Concept Initiator; Coleen Davis, Data Scientist; David Morgenthaler, Data Scientist; Jevon Spivey, Architecture Administrator

Country: United States
Organization: Leidos

Description: Leidos, formerly known as Science Applications International Corporation (SAIC), is an American defense, aviation, information technology (Lockheed Martin IS&GS), and biomedical research company headquartered in Reston, Virginia, that provides scientific, engineering, systems integration, and technical services. The Leidos Innovations Center (LInC) rapidly prototypes and fields solutions in areas such as Artificial Intelligence/Machine Learning, big data, cyber, surveillance systems, autonomy, sensors, applied biology, and directed energy.
This project is a Machine Learning and data analytics web-based deployment that analyzes project execution data for continuous process evaluation and improvement, using the full lifecycle pipeline of Dataiku: 1) data preparation, 2) data exploration and visualization, 3) AutoML machine learning, and 4) web-based user dashboard deployment.

Awards Categories: Organizational Transformation, Value at Scale, Excellence in Research, Alan Turing

Challenge:

Background: Software development teams often don’t have sufficient actionable information and analysis to reliably forecast efforts, or real-time metrics to monitor and assess the production of software development teams. Our goal in this effort is to use analytics to improve agile-based software project execution processes by identifying key drivers of success and predicting various outcomes.

Business Problem: The Software Development Analytics project creates data-mining analytical and visualization approaches that Leidos will use to identify and analyze software best practices. The team will use predictive machine learning classification approaches that incorporate the identified key performance indicators to accurately forecast software development success probabilities. Predictive analytics will learn from historical performance data to predict and quality-check anticipated levels of effort for successful task completion. Lastly, the visualizations will be deployed via a web-accessible dashboard to support ongoing program performance tracking and to make the data-mined visualizations and predictive analytics accessible to interested parties.

Implementation Challenges: This research analyzes various data produced during the agile software development process that indicate measurable business activity impacting the quality and delivery of software code. Efficient data Extraction, Transformation, Loading (ETL), data cleaning, aggregation, and joining is required to assemble and store the data.
Our project plan was to initially analyze pilot software programs, then scale in the future to support evaluation of multiple programs. Therefore, an understandable and reproducible pipeline is ideal. We desire to use state-of-the-art machine learning and Bayesian analytics to identify the key drivers of successful software execution, as well as discover pitfalls. We will also identify the best technical approaches for classification and supervised predictive learning. This requires extensive data analysis and an iterative model exploration approach. Lastly, as the insights discovered will also be used for process monitoring and evaluation, a dashboard will enable our technical development team to make the results accessible to various stakeholders. This project involves the full data analysis lifecycle, from data wrangling to an interactive dashboard that showcases the resulting visualizations and analytics.

Solution:

We employed Dataiku in all phases of our pipeline:

1. Repeatable pipelines and workflow analysis

Dataiku greatly facilitates the organization and visualization of the pipeline workflows. Dataiku’s DSS pipeline allows us to easily scale the project to evaluate additional software programs, because we are able to quickly identify the single point within the pipeline that needs modification without disturbing the common components of the pipeline. The clean workflow presentation helps our team keep the code more maintainable and understandable. The sequential and modularized organization of the pipeline steps supports an easier transition when adding new developers to the project, as the flow visualization is inherently self-documenting since the processing steps are more apparent.

2. Data acquisition and storage

Dataiku was used to assemble, store, and “data wrangle” the various input files.
Dataiku’s built-in file system and database solutions allowed us to quickly access the data and utilize SQL on the resulting datasets, without requiring us to spend our time on building a data lake. 3. Data processing Dataiku’s visual recipes supported rapid data transformations in data joining, column manipulation and data pruning. Dataiku’s ability to combine pre-packaged analyses with our own customized scripts gave us the significant flexibility we required to accomplish all of our data transformation needs. 4. Data visualization and analysis Dataiku’s rapid visualization of the raw and processed data was invaluable in allowing us to gain a quick understanding of the data distributions and data integrity. Dataiku greatly facilitates identification of missing data, invalid data, and outliers, allowing us to have confidence in the data we are processing. Dataiku’s built-in graphics were intuitive, allowing us to quickly look at the composition of the data and the relationships between datasets and enabling us to gain rapid understanding of the value within the data. 5. Auto-ML Machine Learning We deployed Dataiku’s Auto-ML approaches to verify performance of our candidate machine learning classification and predictive models, as well as identify additional candidate models that we should consider. Dataiku’s metrics evaluation interfaces allowed us to quickly look at performance trade-offs using multiple industry-standard metrics, and to identify overfitting conditions when training a model. DSS’s model Evaluation Recipe allows us to ascertain performance on a given test set. 6. Web-based deployment We took advantage of Dataiku’s ability to integrate web-based applications into the workflow. We were pleased that Dataiku supported current leading-edge web-based deployment technologies, thus allowing us to maintain our entire deployment implementation within the DSS workflow and to host it from Dataiku’s web application services. 7. 
Amazon Elastic Container Service for Kubernetes (EKS) architecture We instantiated Dataiku’s EKS capabilities which allows us to integrate with AWS security and scale our future development efforts. Impact: Dataiku had a great impact on numerous aspects of this project throughout the entire pipeline, the most important ones are highlighted below. 1. Deployment efficiency Significant time-saving was achieved in the combination, manipulation, and storage of data. We were able to implement the data processing pipeline in days, as opposed to months. 2. Ability to focus on our area of expertise in Machine Learning Not having to invest time in database setup and file system organization allowed us to focus on our core research interests that will address our machine learning challenges. By taking advantage of Dataiku’s web deployment capabilities, we saved a significant amount of time by avoiding the need to setup additional webservers. Consequently, our team did not require a web application specialist. 3. More robust organization and maintainability While this benefit can be overlooked, the impact to an organization can be tremendous. Dataiku provided us with additional version control, a framework for teamwork contribution, and process step readability and maintainability. 4. Rapid Machine Learning exploration and performance assessment Dataiku allowed us to search the algorithmic space and performance efficiently. We were able to consider additional models we might not have originally considered and were able to rapidly perform model tradeoffs. It would normally be time-consuming to consider a large number of models, but Dataiku makes this process efficient enabling us to look at tradeoffs between candidate approaches such as neural network versus decision-tree implementations. The model building process also allows us to fine-tune and compare hyperparameter settings. 5. 
Excellence in Research category only

While predictive approaches such as neural networks and decision trees are often used to predict the influence of various data on the dependent variable, we are interested in more than just the predictive results. One of our key research areas in this project involves identifying the key drivers of the dependent variable, in this case software project implementation planning and timeliness success. This capability provides us the ability to learn from our data to guide actionable software process improvement. The other research aspect of this project is the identification of the best classification and predictive approaches when predicting performance. Dataiku’s AutoML feature has greatly helped us to rapidly identify and assess candidate algorithms, explore hyperparameter settings, and consider additional algorithms we may not have thought of. We are also able to quickly retrain models using different optimization goals. Since we can explore the algorithmic space quickly, we can be confident that our final model is optimal for our problem set.

6. Alan Turing category only

In addition to the above, our project innovations include combining the web deployment pipeline with the overall data preparation and modeling pipeline. Historically, these project steps are performed by different teams and require web developer support. The combined pipeline approach was made possible by the latest version of Dataiku's dashboard capabilities, which include state-of-the-art web development libraries. This end-to-end pipeline capability is visionary and leading-edge, enabling us to deploy the latest models in near real time to the end users of our dashboard.
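The "key driver" analysis described above can be sketched with scikit-learn as a stand-in for Dataiku's AutoML variable-importance view: fit a tree ensemble and rank feature importances. The feature names and the synthetic data below are illustrative assumptions, not taken from the project.

```python
# Minimal sketch: ranking candidate drivers of a dependent variable
# ("on-time delivery") with a random forest's feature importances.
# All feature names and data here are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
# Hypothetical planning features for a software project.
team_size = rng.integers(2, 20, n)           # pure noise w.r.t. the label
requirements_churn = rng.random(n)
schedule_buffer = rng.random(n)

# Make "on-time delivery" depend only on churn and buffer.
on_time = ((schedule_buffer - requirements_churn) > 0).astype(int)

X = np.column_stack([team_size, requirements_churn, schedule_buffer])
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, on_time)

# Rank drivers of the dependent variable by importance.
names = ["team_size", "requirements_churn", "schedule_buffer"]
ranked = sorted(zip(names, model.feature_importances_),
                key=lambda t: t[1], reverse=True)
```

On this synthetic data the noise feature lands at the bottom of the ranking, which is exactly the kind of signal-versus-noise separation the submission uses to guide process improvement.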
Posted by chengke

Frank Romo (University of Michigan) - Mapping Police Fatal Encounters to Inform Future Policy

Team members: Frank Romo, Master of Urban Planning Researcher; Harley Etienne, Professor
Country: United States
Organization: University of Michigan - Independent Researchers
Description: The team consists of Professor Harley Etienne and myself, Frank Romo. Our research on this topic has been going on for over five years and has been presented in various formats: presentations, maps, videos, and community workshops. Harley and I are independent researchers working to support community safety and change through our academic research and community action.
Awards Categories: Organizational Transformation; Excellence in Research

Challenge: The research project focuses on mapping police fatal encounters. Our team cleaned, mapped, and analyzed thousands of records from various data sets to better understand the spatial distribution of fatal encounters in the United States between the years 2015 and 2020. The main challenge we faced was comparing records across multiple data sets and building a comprehensive dataset from various partial sources. Using Dataiku, our team was able to combine multiple datasets and create the first ever comprehensive dataset on this topic.

Solution: Dataiku helped by working with us to establish a workflow that allowed us not only to create a comprehensive dataset but also to perform spatial analysis, as well as run regressions and statistical tests. We had great support from Lorena De La Parra (AI Strategist) and other team members to develop our strategy and testing methods, and to deliver a final dataset that we could use for future maps and visualizations.

Impact: The results of this collaboration allowed Professor Harley Etienne and myself to submit abstracts to multiple academic journals. Currently, our research on Race and Policing in America is being examined and reviewed by multiple academic journals for potential publication.
In addition, the dataset we created during this process was mapped and used in various community presentations. In fact, our research, maps, and analysis have been highlighted on recent podcasts at MIT Community Innovators Lab and within the geospatial industry with ESRI. We will continue to build on this great momentum and continue to use the tools that Dataiku provides to help clean and refine our data so that it can be presented to the public and help inform future policy discussions.

Posted by fromo

IME - Building an Emotion Classification System on Videos

Emotion classification system on videos using the Dataiku deep learning for images plugin.

Posted by mohamed-khamis

Dr. Haug (University of Bern) - Showing Students Industry Data Solutions with Dataiku

Name: Sigve Haug
Title: PD Dr.
Country: Switzerland
Organization: Data Science Lab, University of Bern
Description: The Data Science Lab at the University of Bern is a cross-faculty initiative for open collaboration on data science, machine learning, and artificial intelligence topics. It supports and conducts research projects and shares knowledge via a wide range of training offers and seminars. The University of Bern is a university in the Swiss capital of Bern and was founded in 1834. It is regulated and financed by the Canton of Bern. It is a comprehensive university offering a broad choice of courses and programs in eight faculties and some 150 institutes. With around 18,000 students, the University of Bern is the third largest university in Switzerland.
Awards Categories: Excellence in Teaching

Challenge: We offer data science and machine learning training to students and professionals at all levels. The offers range from initial data model design via programming, collaboration tools, mathematics, statistical inference, ethics, etc. to deep learning with all its application possibilities.
A comprehensive and intuitive most-in-one solution that simplifies and connects the full workflow was missing from our portfolio.

Solution: By introducing students to Dataiku in voluntary training courses, we are able to exemplify a high-level industry solution for data science and machine learning, with its advantages and possible limitations. Students particularly appreciate the comprehensive capture of large parts of the workflow in a data science project.

Impact: The Dataiku training courses show students what high-level industry solutions for data science and machine learning look like. In particular, they gain an impression of their advantages and possible limitations. The students appreciate the comprehensive capture of large parts of the workflow in a data science project. In future engagements, they may need to work with Dataiku or similar products.

Posted by shaug

Oncrawl - Leveraging Dataiku for Predictive SEO as a Product Strategy

Team members: Vincent Terrasi, Product Director; Elodie Mondon, Data Engineer; Damien Garaud, Data Scientist
Country: France
Organization: Oncrawl
Description: Enterprise SEO platform powered by the industry-leading SEO Crawler and Log Analyzer. Combine the power of technical SEO, machine learning, and data science for increased revenues from search engines. Oncrawl offers two product suites to help you open Google's black box and increase website revenues based on reliable SEO data.

Oncrawl Insights: Unleash your SEO potential with prescriptive analysis. Unify your search data and improve your site's traffic, rankings, and online revenues:
- Analyze your website like Google does, no matter how large or complex your website is.
- Understand the impact of ranking factors on crawl budget and organic traffic.
- Rely on 600+ indicators, advanced data exploration, and actionable dashboards.

And Oncrawl Genius: Empower your SEO with data science and automation.
Use SEO data to build a more profitable business through BI, data science, and machine learning:
- Build custom solutions to business and marketing problems with our API.
- Use ready-made machine learning projects and adaptable models applied to SEO.
- Connect with Business Intelligence solutions for better strategic decision-making.

Awards Category: Alan Turing

Challenge: Due to the complexity of today's markets, the growing opacity of search engine ranking algorithms, and the sheer volume of data available affecting Search Engine Optimization (SEO), the ability to easily manipulate and analyze data now makes the difference between using it as a marketing tool and leveraging it as an executive-level product strategy.

In SEO, the goal is to rank pages at the top of search engine results. However, search engine ranking algorithms are based on many factors and generally constitute a black box. Our clients wanted to know the ranking factors that are most influential for their website. This is the goal of predictive SEO. Exhaustive data, including indexed pages, links, logs, etc., is collected to train a machine learning model to recognize the patterns between ranking factors and actual page rank. It is designed to answer questions frequently encountered in the field: how to predict crawl budget, how to detect anomalies based on trends, how to generate SEO text, and so on. Integrating technical SEO with a data science platform is the best solution to provide the most efficient and relevant insights to answer these questions.
Within the field, many different use cases are possible:
- Identification of new or unindexed content for real-time indexing requests
- SEO text generation
- Anomaly reporting based on trends in your crawl results
- Prediction of future long-tail trends
- Finding ranking factors per URL or group of URLs
- Monitoring your crawl budget by category or subcategory to detect SEO issues
- Detecting the best new products for the next few weeks for featured highlights
- Monitoring your crawl budget based on different Google bots to focus on the right technologies
And lots more!

Another challenge is access to SEO data and data analysis skills in the field: few specialists are also skilled in data analysis, and few data analysis platforms have the ability to easily interface with the sources of data used in SEO.

Solution: API usage is limited by calculation speed in API-based solutions that then use Python or R to manipulate the data. Dataiku makes data manipulation simpler and more robust when compared to traditional API usage, and enables faster data integration. The Oncrawl plugin for Dataiku provides a recipe enabling the easy export of URLs or aggregated data from crawls, as well as log monitoring events. Here's the step-by-step process:

Step 1: Import the data
You can retrieve different projects from Oncrawl and request the latest crawls. You can therefore use both data related to your site and data related to your competitors. This is not possible directly in Oncrawl, where each project corresponds to a specific website.

Step 2: Prepare the data
Then, you need to prepare the data: clean up missing data, rename columns, and enrich the data if necessary.

Step 3: Add additional datasets
Beyond the data linked to the crawl, you can add data from other tools: keyword tools, backlink tools, etc.

Step 4: Merge the data
Then, you simply have to merge the data, i.e. merge all the datasets based on the URL. The goal is to understand what impacts the SEO for each URL or group of URLs.
Once the final dataset is ready, you can create a visual analysis.

Step 5: AutoML Prediction
You can click on ‘AutoML Prediction’: the interface helps you test the most efficient algorithms and recommends the best one. You must then choose which variable the model should base its prediction on. This is an essential step, as you must determine it according to your needs. You will then see the results of different algorithms on the same page and be able to compare their accuracy to select the best fit.

You should now have access to your results! For each of the algorithms used, an AUC score between 0 and 1 is available, with scores closest to 1 presenting the best results. You can dive into the details to assess the accuracy and efficiency of the model, while 'Interpretation' will give you detailed explanations about the metrics. You can also look into which keywords boost or penalize each URL, which will help you determine where to focus your efforts, depending on each site and the metrics involved.

Impact: The use cases we mentioned above are not "new" in data science or machine learning, but they are newly accessible to the SEO community. As SEO specialists often don't have advanced data skills, our work has made it possible for them to work visually with SEO data. It also enables experienced data scientists and analysts to more easily obtain SEO data, which was, until now, not a typical type of data they had access to.

SEO is a confirmed and growing field with an increasingly important role in business strategy. Improving data analysis and making this kind of data available for other purposes opens the door to more effective and broader-reaching strategies, as well as significant cost savings. These strategies rely on the ability to implement the use cases listed above. For example, analyzing data related to ranking factors and keywords, combined with crawl data, can help identify URLs with textual content that should be improved.
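Steps 4 and 5 above can be condensed into a short sketch using pandas and scikit-learn as stand-ins for Dataiku's Join recipe and AutoML screen. The URLs, column names, and data below are invented for illustration, not taken from an Oncrawl crawl.

```python
# Minimal sketch of predictive SEO: merge datasets on the URL (step 4),
# then fit a model and report the AUC score (step 5). All data is invented.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

crawl = pd.DataFrame({
    "url": ["/a", "/b", "/c", "/d"],
    "word_count": [1200, 300, 800, 150],   # hypothetical ranking factors
    "inlinks": [40, 2, 15, 1],
})
rankings = pd.DataFrame({
    "url": ["/a", "/b", "/c", "/d"],
    "top10": [1, 0, 1, 0],                 # 1 = page ranks in the top 10
})

# Step 4: merge all the datasets based on the URL.
data = crawl.merge(rankings, on="url")

# Step 5: train a classifier and score it with AUC, as the AutoML
# interface reports (a score between 0 and 1, closer to 1 is better).
X, y = data[["word_count", "inlinks"]], data["top10"]
model = LogisticRegression().fit(X, y)
auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
```

In Dataiku the same merge-then-train flow is built visually, with the AUC surfaced in the model results page rather than computed by hand.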
For one customer, rewriting meta descriptions through machine learning and text generation led to savings of 30 man-hours and 24,000 USD/month in SEO "production" costs alone. This project has made it easier to get the data, implement machine learning with Dataiku, and train a broader audience of practitioners.

In terms of productivity, the overall process is twice as efficient: everything that is done in Dataiku would previously have been developed in R or Python, would have had to be tested extensively, and would have taken a lot of time to implement. In a few minutes, Dataiku is able to output all the variables to be worked on in priority; for each of the variables the analysis is detailed, and for each of the URLs we know what boosts or penalizes it. Once the machine learning model is in place, we can add new URLs and know, even before publishing the content, whether it has a chance to be in the top 10 rankings!

Up next: the value of machine learning for SEO
The next step in the democratization of machine learning for SEO will be to integrate the results of a Dataiku analysis directly into the tools and interfaces known by SEO users. Oncrawl is working on a big project to make steps even easier and with fewer clicks for the final user. Stay tuned…

Posted by VincentOnCrawl

ADNOC – Building an Audit Intelligence Framework For Insights-driven Risk and Performance Analytics

Team members: Ahmed Abujarad (SVP – Audit and Assurance); Darsan Krishnan (Manager – Quality Assurance and Excellence, A&A); Malav Patel (VP, Internal Audit); Niladri Das (Sr. Auditor, Internal Audit Analytics); Shiju Nair (Sr. Auditor, Internal Audit Analytics); Aneeth Menon (Sr. Auditor, Quality Assurance and Excellence); Mohammed K Al Mansoori (Manager – Business and Commercial Solutions); Antonio Rivas (Sr. Architect, IT Business Solutions)
Country: United Arab Emirates
Organization: Abu Dhabi National Oil Company (ADNOC)
Description: We are a leading oil and gas company in the UAE.
Established in 1971, Abu Dhabi National Oil Company (ADNOC) is a diversified group of energy and petrochemical companies that employs more than 50,000 people and is a major contributor to the GDP of the United Arab Emirates (UAE). ADNOC's Group companies operate in the fields of exploration and production; oil refining and gas processing; chemicals and petrochemicals; refined products and distribution; maritime transportation; and support services including sales and marketing, human capital, legal, finance, and IT. ADNOC has been named the UAE's most valuable brand for a second consecutive year starting from 2019, with a 28.6% increase over the previous year and a 145% increase since the launch of its transformation strategy in 2017, making it the fastest-growing brand in the Middle East and the first UAE brand to surpass $10 billion in value.
Awards Categories: Organizational Transformation; AI Democratization & Inclusivity; Value at Scale

Challenge: At ADNOC, the Internal Audit team works on a wide domain of auditing services across ADNOC Headquarters and 14+ Group Companies. Below are the key challenges that Internal Audit was facing:

1. Necessity of a centralized monitoring solution for Internal Audit governance, planning, execution, and quality assurance & improvement programs

The Audit Management System hosted at ADNOC HQ was rolled out to ADNOC Group Companies in 2019. The primary challenge was to design an Audit Intelligence framework to drive multiple data-driven analytical solutions that would connect Group Companies on a near real-time basis and support Internal Audit management in deriving key insights on performance, audit completion, time tracking, efficiency, and cost optimization. Audit management did not have a suitable digital platform to continuously measure, monitor, and improve performance.

2.
Necessity for an automated process for Internal Audit action tracking and performance analysis across the Group

With 180+ auditors across 14+ Group Companies, there were more than 30,000 Internal Audit findings issued at various levels within the organization. There was a pressing need to thoroughly analyze the nature of audit findings and provide insights to the ADNOC Group Management, Audit Committees, or Boards on general vulnerabilities and the effectiveness of policies to drive improvements and value generation. Some of the key challenges faced by the Internal Audit team were:
- Labor-intensive manual consolidation and report generation of the findings, and corresponding computation of action statistics, which was prone to human error.
- Highly time-consuming consolidation of all the Group Companies' data to gain insights.
- Interaction with the business focal points and auditee management was manual, and the efficiency of the action follow-up process needed improvement.
- Monitoring and reporting the status of audit actions to the Audit Committee and respective company management was a time-consuming process due to the lack of a centralized repository of audit information.
- Data exploration and performance analysis across the various Group Companies was nearly impossible, as the information was scattered and not systemically controlled with proper standardization.
- Long-standing overdue findings were impacting companies' ability to improve internal controls and realize value benefits in a timely manner.

The ADNOC Group Audit Excellence objective was to consolidate all the findings and perform insights-driven risk and performance analytics across the Group. The need for an appropriate Audit Intelligence platform was imminent to drive Internal Audit performance and value.

3. Other challenges demanding a data science and analytics tool were as follows:
- Retrieving information from the Audit Management System and SharePoint/OneDrive flat files via APIs, and automating scheduled analytical jobs.
- Live connection to a central Audit Analytics Data Mart to perform all analytical trends and risk predictions from findings across the Group.
- One central data science and analytics platform as an enterprise tool to connect all Group Companies' data and perform analytics.
- Complex organization structures and business hierarchies across the Group.
- New competency requirements and the ability to quickly cross-train auditors to perform Extraction, Transformation & Load (ETL) and analytics with minimal help and guidance from the central analytics team.
- Access control and confidentiality of information.
- Establishing governance and process.
- A daily refresh and scheduling process.

Solution: Audit Intelligence Framework

As part of the Audit Intelligence framework, two digital solutions were established:
- Group Internal Audit Performance Analytics
- Group Central Audit Action Follow-up Analytics

An end-to-end architectural design was established, from data ingestion to visualization, prior to the development.

In 2019, the ADNOC Group Audit Excellence team took pilot steps in standardizing the audit process, classifying audit findings based on risk rating and the management action plans. Committed action closure target dates were added so that the most critical and high-value actions were taken up by the business on a priority basis to minimize risk and optimize value realization. In view of this, the Group Central Follow-up Analytics Solution was established and rolled out in 2019-2020 across the Group.

In 2020, the Group Internal Audit Performance Analytics tool was implemented to measure and monitor core Internal Audit KPIs on audit governance, execution, and performance against set benchmarks and approved audit plans. The analytics were provided to Internal Audit leaders, managers, and Audit Committees across the ADNOC Group to monitor audit execution rate, timelines, and findings, and to take proactive measures for driving productivity and efficiency.
This has resulted in considerable value generation and cost savings by increasing in-house productivity and reducing outsourcing costs. With its user-friendly interface, visual debugging capabilities, and workflow segregation, along with the power of data engineering, ETL, and analytics, Dataiku helped establish the Audit Performance Analytics application quickly and with minimal training effort. Dataiku extensively supported the following areas to deliver the solution:

1. Data Acquisition & Profiling
Data source connections were set up across the source systems, especially the Audit Management System API and SharePoint for the enterprise. We were able to integrate structured and unstructured data and flat files across Group Companies. Dataiku's Prepare recipe helped in profiling the data to determine accuracy, completeness, and uniqueness.

2. Data Standardization and Enrichment
With Dataiku's Prepare recipe, data fields were standardized to a common format to help prepare complex joins of datasets for further analytical processing.

3. Data Processing & Transformation
Various transformations were applied to the data to prepare intermediary logic encompassing multiple calculations. Dataiku's data visualization and Artificial Intelligence (AI) driven features helped a great deal in understanding the calculation outputs, even before executing the recipe. Visual recipes (including stacking, window functions, and join operations) are great features in Dataiku and extensively helped in processing datasets effectively without writing complex SQL scripts. We have over 300 datasets and a number of visual recipes for transformation, rolling aggregations, and various data handling processes shared across projects.

4. Forecast Analytics
Models were developed to understand and predict the potential spillover of the Internal Audit plan based on the execution rate for each Group Company.
These were continuously measured by Internal Audit users, and operation plans were adjusted accordingly.

5. Processed Data Output
Through Dataiku, we were able to push the processed output to a central analytics environment for generating further Business Intelligence (BI) visualizations and reporting. The in-database engine supported performing complex calculations at the database level to generate results in a short span of time.

6. Automated Workflows
Dataiku's automation and scheduling capabilities helped read data from multiple sources and provided job process-level insights, allowing the projects to run without manual intervention at set frequencies. Dataiku's email notification feature alerts users to any issues encountered during data ingestion and processing.

Impact: Driving efficiency and performance with the highest productivity was one of the key leadership messages last year, and Internal Audit, through digital projects, has significantly contributed to all areas of ADNOC's strategic pillars through well-defined KPIs across 14+ Group Companies. ADNOC Group Internal Audit was able to improve the Measurable Value of Audit (in AED billions) actions and follow them effectively through to completion using the analytics dashboards provided to audit and business management.

More notable benefits of the data analytics solution powered by Dataiku include:
- A centralized data repository using a single version of the truth, connecting Internal Audit information across ADNOC HQ and 14+ Group Companies.
- Better insights into the Group Internal Audit spectrum.
- Improved and informed decision-making with up-to-date information.
- Cost optimization by reducing outsourcing demand, thanks to increased in-house Internal Audit productivity.
- Improved operational efficiency through KPIs & SLAs.
- Better focus on identifying trends and Internal Audit performance across Group Companies.
- Quick turnaround of performance with accurate reporting of KPIs to management.
- Leveraging Dataiku to augment Internal Audit activities.
- Reduction of overdue Internal Audit actions and overall improvement in risk assurance and internal controls.

One year into our Dataiku journey, ADNOC is running 6 large projects in Internal Audit execution and performance, actioning insights across 180+ auditors and 1,500+ business clients in 14+ Group Companies. These projects are all automated, running on an enterprise server with data stored in SQL schemas, and providing end-to-end visualizations in a corporate BI tool. With the help of Dataiku, we are also able to plan and design a central framework for Continuous Control Assurance analytics, which will provide analytics, augmented audits, and advanced analytics with predictive risk detection, helping to reduce leakages and establish the right level of governance and controls.

In 2021, the ADNOC Group Audit Excellence vision is to successfully roll out continuous control assurance projects running on top of the SAP ERP platform for procurement across all Group Companies, providing near real-time detection of high-risk activities, as well as proactive insights to ADNOC senior management and deeper assurance to the Audit Committee and Board. The predictive and machine learning capabilities of the tool are already being explored and under development, which will be interesting to share in the next Awards submission.

Posted by LisaB; last reply 03-29-2022 by Data_Optimist

Ericsson - Optimizing Warehouse Space with Citizen Data Science

Team members: Ting Xiao, Automation Developer; Rafael Maia C, Automation Developer; Michel Benites Nascimento, Analytics Solution Designer; Yao Lu, Supply Chain Manager
Country: United States
Organization: Ericsson
Description: Ericsson provides high-performing solutions to enable its customers to capture the full value of connectivity.
The company supplies communication infrastructure, services, and software to the telecom industry and other sectors.
Awards Categories: AI Democratization & Inclusivity; Value at Scale

Challenge: As the world leader in the rapidly changing environment of communications technology, Ericsson operates many warehouses on a global scale. Optimizing the use of this space is a key part of Ericsson's lean supply chain. Our project goal is to provide accurate estimates of the space for current and future inventory needs. For most of the products stored, the occupied space can be calculated using simple formulas. However, for the remaining products this is not possible, so historically the estimates of the available space were inaccurate. Simply applying the formulas will not satisfy the stakeholder requirements for high accuracy. Our challenges can be summarized as follows:
- Impossible to measure every single package on a daily basis.
- The logic to calculate the size of unknown packaging is unclear, and so complex that it cannot be defined by the business.
- No centralized platform to perform the calculation.
- Stakeholders require high reporting accuracy.

Solution: After being introduced to the Dataiku platform at Ericsson, we realized that machine learning could be used to estimate the space occupied by those remaining products in order to increase accuracy, and all of this in just one platform. Since we already had the data stored in our data lake, Dataiku made it easy to extract, clean, and transform the data. The seamless way it integrates the data flow simplified the process of dealing with the data sources. Data selection in Dataiku is made easy through the dataset explorer window: picking the right variable type and identifying incompatible data types is very easy and fast. We could also test multiple machine learning algorithms for benchmarking, without having to code them. We were able to compare KNN, Random Forest, XGBoost, and a few others.
The hyperparameter setup, metric selection, and comparison with the chart outputs made it easy to spot the best algorithm. Not only that, but automatic retraining and selection of the best technique also allow our models to pick a better approach should it change with future data. Model lifecycle management is achieved by scheduled model retraining, which picks up changes in warehouse behavior and provides more accurate estimates. The entire flow of pulling data, scoring, and outputting is automated via the Scenario feature.

The output of our project is a dataset containing accurate estimates of the space available in all warehouses. This dataset is read by a dashboard in Tableau, displaying the insights visually to all supply hub managers.

Impact: The ultimate financial impact of our project is that Ericsson saves on real-estate costs by continuously optimizing the use of the existing warehouses, thanks to the more accurate estimates from our project. On top of the financial savings, our project provides operational visibility on demand and a data-driven warehouse space management process, and it aligns with our strategy for digital transformation. The work that we initiated in our Americas region is now being showcased within our Group Supply organization. Being reusable around the world, our work will have an even greater impact on Ericsson's transformation journey. Personally, as citizen data scientists, we are able to use AI to augment and optimize an existing process, all without writing a single line of code. This inspires us to do the same for many other use cases at Ericsson.
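The no-code benchmarking described above can be sketched directly in scikit-learn. GradientBoostingRegressor stands in for XGBoost so the example needs only scikit-learn, and the "packaging" data is synthetic; Dataiku's visual ML lab runs this comparison without any code.

```python
# Minimal sketch of benchmarking several regressors for package-space
# estimation. Data and feature names are invented; GradientBoosting is a
# stand-in for XGBoost.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(42)
n = 400
length = rng.uniform(0.2, 2.0, n)   # hypothetical package dimensions (m)
width = rng.uniform(0.2, 1.5, n)
height = rng.uniform(0.1, 1.0, n)
# Occupied space is roughly the bounding volume plus measurement noise.
space = length * width * height + rng.normal(0, 0.02, n)

X = np.column_stack([length, width, height])
X_train, X_test, y_train, y_test = train_test_split(X, space, random_state=0)

models = {
    "KNN": KNeighborsRegressor(),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}
# Fit each candidate and compare held-out error, as the visual ML lab does.
scores = {name: mean_absolute_error(y_test, m.fit(X_train, y_train).predict(X_test))
          for name, m in models.items()}
best = min(scores, key=scores.get)
```

Scheduled retraining then amounts to re-running this comparison on fresh data and keeping whichever candidate wins, which is what the Scenario-driven retrain automates.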
Posted by ayako

Effilab - Building More Robust Data Products for Digital Advertising, in Less Time

Name: Caroline Cochet-Escartin
Title: Data Scientist
Country: France
Organization: Effilab
Description: Effilab was initially a digital advertising agency, which was acquired by Solocal to develop a digital product, Booster Contact, which offers customers higher visibility on search engines through optimized campaigns and bids on Bing and Google AdWords. Solocal offers these services for a fixed monthly fee, which is generated on a prospect-by-prospect basis depending on their needs, especially for small local enterprises.
Awards Categories: Organizational Transformation; Value at Scale

Challenge: The data team at Effilab was originally composed of Python developers, data analysts, and data scientists. Our core missions were the deployment and operationalization of two models:
- A pricer to provide automated yet customized, per-customer quotes for core products, and deliver these quotes through an API.
- A budgetizer to generate dynamic bidding on Google and Bing ads, which is automatically tuned each day to keep up with recent performance and campaign behavior. This project is quite sensitive for our customers given that it automatically spends hundreds of thousands of euros a month.

The solution used before Dataiku was entirely homemade in Python code, with the following challenges:
- Lack of robustness in the data pipelines and model operationalization.
- Gap between data engineers, developers, data scientists, and data analysts.
- Slow changes, slow integration of new features, and slow model deployment.
- Restricted data access.

Solution: Both projects were deployed quite quickly on Dataiku, using the Design, Automation, and API nodes, with the following benefits:

1.
Data scientists are more independent

The team moved from the in-house app developed by Python developers to Dataiku on Google Cloud Platform, which removed the barrier between research/development of algorithms and productization. This ultimately gives data scientists more independence and more control over the data pipelines, as well as more time to focus on the models that bring business value.

2. Time to production decreased

Dataiku enabled Effilab's data team to reduce time-to-production by at least 3x. This change was driven by the nature of Dataiku as a robust solution, including allowing algorithmic R&D to be facilitated by the model interface.

3. Smoother and more robust overall processes

In going from a mass of different tools cobbled together in-house (data connections, Python recipes, Jupyter notebooks, library integration for development, wiki, scheduled scenarios, monitoring) to Dataiku as an all-in-one solution, the overall processes and efficiency of the team are improved.

4. Data democratization

Later on, easy onboarding of data analysts, who got access to data pipelines and databases and were able to contribute more and more easily.

Machine learning modeling: load and preprocess historical data; visual feature and model exploration; advanced fine-tuned model building. Refreshing actionable data: sync business data; update geo-demographic data.

Impact: The value generated revolves around three main elements:

Cost savings
The need for data engineering to maintain the two products has largely decreased, thanks to the robustness of the solution developed in Dataiku. The all-in-one platform enabled us to remove bugs in the automatic bidding models, leading to cost savings on the order of tens of thousands of euros a month (largely paying for the Dataiku licence!).
Time savings
As explained above, Dataiku enabled us to decrease time-to-production by at least 3x thanks to the central, flexible platform, which allows for integrating different technologies (e.g. algorithmic R&D) within the visual interface.

Room for innovation
Through fast onboarding of new team members and an easy way to act on data, new ideas and new features emerged to improve the two products. Priceless!

Posted by Caroline

Pr. Wartenberg (Hochschule Hannover) - Dataiku as a Versatile Platform for BI & Beyond

Name: Maylin Wartenberg
Title: Professor, Doctor
Country: Germany
Organization: University of Applied Sciences and Arts Hannover (Hochschule Hannover)
Description: With around 10,000 students, the Hochschule Hannover is the second largest university in Hanover, the capital of the federal state of Lower Saxony. Institutionalized in 1971 from various educational precursors - the oldest dating back to the year 1791 - the Hochschule Hannover offers a particularly wide range of subjects in five faculties. The degree course 'Business Information Systems' treats business information systems as an independent discipline. The experience that business information systems specialists must have both thorough knowledge of business administration and a comprehensive basic knowledge of computer science is reflected in the contents of the degree course. One of the specializations to choose from is 'Business Intelligence'.
Awards Categories: Excellence in Teaching

Challenge: I teach a class in a Bachelor's degree on advanced topics in Business Intelligence. Students focus on 'Business Information Systems' and can choose 'Business Intelligence' as a specialization, with two courses: 'Data Warehouse' and 'Business Intelligence'. In my course, Advanced Topics in BI, I want to give the students a good overview of possible further topics in the field of data science, not only theoretically, but with lots of practical experience.
The topics include different aspects of data science like data preparation, data visualization, and data analysis, as well as topics such as data governance, collaboration, code languages, machine learning, and even deep learning. Therefore, I needed software which facilitates many different use cases, based on a broad variety of topics and the interests of the students, and is easy to work with without prior experience.

In addition to that, I teach another class in a Master's degree called 'Digital Transformation'. This is a consecutive Master's based on a Bachelor's degree in Business Informatics. Some students have a background in business, others in IT. The topic I teach is an introduction to artificial intelligence. Some of the students already have experience in machine learning topics and are able to program in Python, but some do not have any experience regarding AI. Therefore it was difficult to work out hands-on exercises with such differences in prior experience.

Solution: I have been using Dataiku DSS for over 3 years in teaching for both classes, and it works very well. It only takes a few basic tutorials to get to know the software and to be able to work with it. The students can even work on complex machine learning tasks within one semester. They have the ability to use the integrated algorithms, or code their own. Dataiku DSS offers such a great variety of topics in tutorials, documentation, and articles that the students are able to get to know many different aspects of working with data. Each year I try to work with the newest version and constantly explore new additions to the software.

Impact: I would like to share this year's experience in my class 'Advanced Topics of BI' as an example of the possibilities. The students work in small groups and each group gets a special topic. They have to present on the topic in general, and then create a small hands-on workshop for other students in the class.
This year, the topics were:
- Data Visualization & Storytelling, especially Waterfall, Treemap, and Sunburst Charts
- Geospatial Analytics, especially Map Charts using Reverse Geocoding/Admin maps
- Graph Analytics, especially Social Network Analysis
- Exploratory Data Analysis using Interactive Statistics
- Data Governance
- Connectivity/Data Sources, especially SQL Data Tables and SQL Recipes
- ETL using recipes based on Code, especially Python recipes
- Code - Notebooks, especially Python Notebooks
- Webapps, using Dash
- Deep Learning - Different Libraries, especially Keras
- Natural Language Processing - Sentiment Analysis

It really is a wide variety of topics that can be addressed within one software framework. The students create their own scenarios and create or find their own data as the setting for their workshop. Some integrate pictures in their project overviews, some use the Wiki for describing the tasks in the workshop, and some use data that already creates an interest in the topic for other students, like the Social Network Analysis on the Marvel Universe or Game of Thrones data.

The feedback of the students regarding the software is always very good. They can be very creative and get to know different topics in data science. The added hands-on experience makes the presentations more interesting and increases the learning experience.

Posted by MW

Pr. Enobi (Live University) - Facilitating & Enhancing the Data Science Learning Experience with Dataiku

Name: Fernando Enobi
Title: Professor
Country: Brazil
Organization: Live University
Description: Live University was created 17 years ago, with the dream of transforming education. We believe that people should enjoy every step of their lives, and that includes when they decide to study.
We are crazy about our 5 schools focused on different areas: HR, Purchasing and Supply Chain, Market and Commercial Intelligence, Tax and Accounting, Data Science and IT.
Awards Categories: Excellence in Teaching

Challenge: As a Professor teaching MBA classes in data science, I faced multiple challenges:
- Students at more executive levels or from non-IT departments were struggling to implement the practical data science use cases.
- Students were developing the use cases on their desktops, without any collaboration or knowledge-sharing.
- All the classes have business use cases that relate to a data science solution. Without Dataiku, students did not have sufficient time to develop these projects.
- For me as a teacher, it was very hard to monitor and support students during project execution.
- There was no "real" production environment for data projects. Students were always complaining about running Python code on their notebooks and using Excel spreadsheets, which works fine for experimentation but is not enough for bringing corporate value.

Solution: Our MBA program focuses on implementing real business cases in leading data solutions. We have many students in leadership positions who are building data-driven teams, organizations, and data stack strategies. Dataiku came to our attention when we were searching the Gartner Magic Quadrant to select well positioned data solutions, and Dataiku has been named a leader two years running! We were amazed at the first contact with the Dataiku team, who saw the importance of educating MBA students (also professionals in the Brazilian market) on new technologies to fast track data solutions implementation and adoption.

Impact: First, all the activities are prepared and organized on the platform, including use case scenarios, distribution of tasks, and insights documentation. Dataiku has a very fast learning curve.
Students are very excited to use the platform, and in the very first class they already started planning capstone projects because of all the available resources. The dynamics and interaction during the classes are now very critical to guarantee the quality of students' experimentation and learning.

Below is a screenshot with 5 students interacting in a use case about Industry 4.0: they were working on a use case for predicting preventive maintenance in an e-coating manufacturing plant, based on 2 datasets requiring unsupervised learning. The use case was presented with little information to increase complexity and stimulate data investigation. The students were able to carry out the analysis using more concepts and models than covered in the original MBA module, which was made possible by Dataiku's friendly visual environment and ability to connect with most current technologies. The students were therefore able to upskill and do a very complete diagnosis in a short time frame, leading to insights guiding business action plans.

In summary, the Dataiku platform greatly increased the learning experience! I've not seen this kind of experience in other schools in Brazil yet.

Posted by fenobi

Standard Chartered Bank - Building an Intelligent Data Operations for Financial Planning and Performance Management

Team members: Craig Turrell, Head of Digital Centre of Excellence P2P, with: Christopher Harvey, David Rogers, Rajesh A., Karthik C., Ramakrishnan D., Mahesh Iyer, Priyanka Jaiswal, Rajasekar Kanniappan, Benjamin Koh, Vignesh Kp, Vivek Kumar, Naresh Babu, Joshua Samuel, K Santosh Satuluri, Suhas Talanki, Pushya Thimmaiah, Dheerendra Yadav
Country: Singapore
Organization: Standard Chartered Bank
Description: We are a leading international banking group, with a presence in more than 60 of the world's most dynamic markets.
Our purpose is to drive commerce and prosperity through our unique diversity, and our heritage and values are expressed in our brand promise, Here for good.
Awards Categories: Organizational Transformation; AI Democratization & Inclusivity; Value at Scale

Challenge: At Standard Chartered Bank, the Financial Operations Plan to Performance (P2P) division works on a broad array of core financial statement and performance management systems of the bank. We need to be able to look five years back and five years forward to identify abnormalities and trends, do balance sheet analytics, and conduct cost analysis to answer complex questions around how and where the bank is making profit, how the bank behaves, who should be hired and where they should be placed as related to cost profiles, etc. We provide the enterprise financial performance data fabric that drives the organisation's financial, operational, and strategic thinking - the data and insight behind the decision.

Of course, we had the systems in place to do all of this for many years, but operationally we were limited to millions of rows of data. While it sounds like a lot, the reality is that teams could provide one or two levels of detail for the 10 core products of the bank, or core primary country markets, and look at basic account structure over about three months - and even at those dimensions, we had to start splitting the analysis into pieces. Answering a question such as cost trend across the entire bank, with every account line and every cost centre, runs to nearly 10 trillion possible questions.

So we started digitizing reports for CFOs, but soon realized that this approach wasn't going to influence the behaviors of the bank. We needed to find a way to impact the day-to-day work of financial analysts, making them more efficient and effective.
When diving into the issue, we found out it was primarily a question of volume to get from 10 million to 400 million rows of data, not a question of underlying infrastructure - in fact, we already had robust compute warehousing, but almost no one was using it. We needed to find a way to leverage that existing ecosystem.

Solution: In addition to finding a solution that leveraged our existing infrastructure investments, we didn't want to have to go looking for another tool again in a few years when our team became mature enough to start doing machine learning on their data. That's when we found Dataiku, and it solved volume straightaway. Within three to four weeks, we managed to turn over a 4.5 billion row table in a single operation.

But Dataiku made us realize we could do so much more than that. We had an army of people copying and pasting data and, since we were now able to centralize all treatment within the platform in a lightning-fast manner, Dataiku allowed us to have different conversations about data. In the first nine months with Dataiku, the team churned out use cases from across FP&A. The next step was productionalizing their systems and patterns, including ensuring there was discipline with data pipelines, SLAs, and more stringent DigitalOps processes. That's the power of Dataiku: unbounded freedom, but also features to facilitate structure and processes. It made our vision possible and our strategy a reality.

Impact: The Digital MI team at Standard Chartered Bank, led by Craig Turrell, overhauled three major systems at the bank that produce summary financials and expose performance and planning dashboards to thousands of stakeholders across the bank. This project is a major achievement, automating laborious tasks previously done in spreadsheets, increasing the scale and frequency of analytics, and delivering self-service analytics capabilities in a governed, standardized way.
Key KPIs include:
- Processing 10 million to 400 million+ rows of data, opening doors to future innovation.
- Turning 2,500 hours into a 10-minute process, using governed process automation.
- Increasing analyst productivity by a factor of 30 through replacing spreadsheet processes with governed self-service analytics.
- Accelerating overall time-to-market, delivering use cases in production in less than 9 months and bringing idea-to-prototype time to under 12 weeks.

We're also developing Standard Chartered Bank's unique brand of data democratization, or self-service analytics, with a center of excellence (CoE) owning the core structured intelligence of the bank. All enterprise-level data is centralized, with product owners for every dataset and defined governance. From there, the team builds specific experiences to deliver answers through core apps, and the ultimate "self-serve" flexibility comes from how people around the organization use those apps to solve business problems in their day-to-day work.

The CoE at Standard Chartered Bank is currently made up of 16 people, but it will be expanding to 30 and is expected to reach hundreds in the next few years to support the growing demand and continue driving efficiencies around the business. There are numerous communities across the bank leveraging Dataiku and building "digital bridges" to the CoE's core structured intelligence. On average, we estimate that two people armed with Dataiku are doing the work of about 70 people limited to spreadsheets. The goal in the coming years will be to continue to upskill people with Dataiku to increase efficiency across more areas of Standard Chartered Bank. In the months and years to come, we will also move into more predictive analytics in the FP&A division, with a focus on predicting within the mid/short term (3 months to a year).
The vision is around a "supermind," or a smart group of independent agents working together to create a benchmark of intelligence (if, say, 10 machines independently make predictions, taking those predictions collectively will probably be close to reality). There will likely be interesting learnings to share in next year's Awards submission!

Posted by LisaB. Last reply 07-22-2021 by tinaresh

Lect. Beaumont, PhD (Columbia University) - Data Analysis Bridges Finance Theory and Practice

Name: Perry Beaumont, PhD
Title: Lecturer
Country: United States
Organization: Columbia University, School of Professional Studies
Description: The School of Professional Studies is one of the schools comprising Columbia University in the city of New York. It offers seventeen master's degrees, courses for advancement and graduate school preparation, and certificate programs.
Awards Categories: Excellence in Teaching

Challenge: For context, the applicable course for this submission is Security Analysis, a finance class taught at Columbia University within the School of Professional Studies. Security Analysis is also the name of a classic investment book written by Benjamin Graham and David L. Dodd; Benjamin Graham was a professor of Warren Buffett when Buffett attended the Columbia Business School. The class is very popular with students, and is taught each semester.

The course is a post-baccalaureate offering, and many attendees are enrolled in a master's degree program for business or public service. The class lasts 12 weeks, and students generally have a basic understanding of statistics and analytics. A core element of the course involves building bridges for students between finance theory and practice, and the homework exercise involving Dataiku specifically relates to identifying important distinctions between the available attributes of a successful brick-and-mortar retail business and an online retail business.
A helpful way of approaching this is to tap into a real-world dataset, as well as an online enterprise platform. Accordingly, Google Query was used to access actual (anonymized) eCommerce metrics from Google's merchandise website, and Dataiku was selected for performing the analysis.

Solution: Dataiku was immensely helpful, in different ways:
- Admin & tech support: It was a great pleasure to collaborate with Dataiku personnel, including Adela Deanova (concept development), Damien Jacquemart (programming contributions), and Josh Hewitt (classroom account setups), who each uniquely contributed to the success of the initiative.
- Product flexibility & extensiveness: By virtue of Dataiku making its product available through a variety of venues, from a 14-day free trial to leveraging synergies with AWS, Azure, Oracle VM VirtualBox, and more, an array of learning opportunities are presented to help students appreciate the value of the Dataiku proposition.
- Visual user interface: The visual enhancement tools available within Dataiku per recipe, the display of model results, and the graphing possibilities all combine to make for a meaningful interactive learning experience.

Impact: Generally speaking, the conversion rate for a brick-and-mortar retail store is about 20%; that is, about 20% of the persons who enter the store end up making a purchase. By contrast, the conversion rate for a person who visits an online retail store is closer to 2%. Accordingly, with the appreciably smaller number of conversions online, yet with the ability to collect dozens of metrics related to a customer's online experience (i.e., the device used to access the site, length of time on pages, page path to checkout, and so forth), there is an opportunity to identify the factors that contribute to a greater likelihood of success in driving online sales.
Even an insight that results in an additional 0.5 percentage point in sales (from 2% to 2.5%) represents a 25% improvement in conversions (2.5%/2% - 1 = 25%). As a result of working through the Dataiku module, students were able to obtain a variety of invaluable insights. Not only were they able to better grasp the enormous amounts of data that can be generated by an eCommerce business, but they were able to appreciate the tremendous power of Dataiku to generate meaningful analyses from especially large files.

Their analytical skills increased markedly, though perhaps even more impressive was the greater comfort level they exhibited with regard to drawing connections between the mathematical results and the practical implications. In the process, it became quite evident that students were becoming increasingly confident in developing a bilingual vocabulary to constructively evaluate both the quantitative and qualitative dynamics of decision-making. The recommendations they made very much reflected a depth and breadth of understanding that went well beyond what would have been possible for them to achieve simply by reading a case study.

By way of one particular example, the univariate analysis tool within Dataiku provided a useful guide for students to evaluate the information content and value-add of each variable within the dataset, and opened up constructive conversations related to the true key performance indicators within an eCommerce context. In brief, by virtue of digging into the data themselves, they were able to have a far richer learning experience, and one that will surely stay with them for a long time to come. By using actual Google Query data in combination with Dataiku, students were able to see for themselves what the customer experience looks like with well-defined data relationships, all while building bridges between textbook theory and real-world insights.
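The conversion-lift arithmetic mentioned above can be checked in a couple of lines (a minimal sketch using the illustrative rates from the text, not data from the course):

```python
# Relative improvement in conversion rate: (new / old) - 1
baseline = 0.02   # ~2% online conversion rate
improved = 0.025  # after an additional 0.5 percentage point

lift = improved / baseline - 1
print(f"{lift:.0%}")  # prints "25%"
```

The same formula applies to the brick-and-mortar figures, which is why a seemingly tiny absolute change online can be a large relative win.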
As an additional element of measuring success with this initiative, students who have taken this course routinely contact me to say that they are using Dataiku with other applications, both academic and in the business world. In brief, they are taking the basic skills developed in the classroom and actively applying them in a variety of other contexts.

Posted by phb2120

Schlumberger HR - Federation of Data Science to Accelerate Talent Performance Enablement

Team members:
Modhar Khan - Head of People Analytics
Richard De Moucheron - Director Total Talent Management
Wesley Noah - Global Compliance Managing Counsel Operations
Rupinder Kaur - Data Scientist Talent Analytics
Sampath Reddy - Analytics Product Champion
Vipin Sharma - Technical Leads Analytics
Juliette Murray Lamotte - Global Compensation Value Manager
Rafael Fejervary - Global Talent Manager
Simon Spero (Dataiku) - Senior Enterprise Customer Success Manager
Country: United States
Organization: Schlumberger
Description: Schlumberger is a technology company that partners with customers to access energy. Our people, representing over 160 nationalities, are providing leading digital solutions and deploying innovative technologies to enable performance and sustainability for the global energy industry. With expertise in more than 120 countries, we collaborate to create technology that unlocks access to energy for the benefit of all.
Awards Categories: Organizational Transformation; AI Democratization & Inclusivity

Challenge: With superior talent and a vast data warehouse available to Talent Management teams across the globe, the journey towards applying machine learning on the edge was challenged by the following requirements:
- Investment in learning and training,
- Compliance monitoring and ethical use of data (assurance),
- Bringing stakeholders together to discuss and assure the value of such projects.
Furthermore, a challenge on capacity and resourcing also emerged in complex scenarios, in which talent teams across the world needed the technical expertise of the central data science team to support and enable components of talent-specific data projects (talent planning, acquisition, identification, skilling, and retention) involving a multitude of techniques: unsupervised learning (e.g. clustering), text mining and NLP (e.g. embedding, NLP - identity), and supervised learning (ensemble modeling).

Solution:
- Training: The material provided by Dataiku covered all the needs and catered to various competencies and profiles (e.g. data engineering, analysts, business partners), which shortened our journey to data science at scale by months or even years.
- Collaboration: The platform enabled connectedness across multiple teams and drove efficiency in project decisions, as well as visibility on where support was needed. In the past, we had many reviews to get stakeholders to understand what data was used and how engineering was applied, which went on for months. Today, they have instant visibility on the entire data pipeline.
- Compliance Monitoring: A big challenge was how to ensure that all the projects being done on the edge are compliant with privacy regulations and bias elimination, without stopping creativity. With clear reporting tools and automation of such reviews, teams are able to work more efficiently at scale, where it would previously take weeks or months to complete such reviews before projects could begin.

Impact: During the pilot conducted over one month with 10 members from various teams (compensation, talent acquisition) and personas (recruiters, compensation analysts, talent acquisition planners), we saw more than 5 projects deployed to solve local business needs. We have a plan to expand and add more than 100 HR personnel on the platform in Q3-4 of this year.
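As an illustration of the kind of unsupervised learning mentioned in the challenge above, a talent-segmentation pass can be sketched with a tiny k-means loop. This is a hypothetical, minimal example: the feature names and data are invented for illustration and are not Schlumberger's actual pipeline.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means: returns k centroids for a list of (x, y) points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # Recompute each non-empty centroid as the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids

# Hypothetical features per employee: (tenure_years, skills_score).
employees = [(1, 2), (1.5, 1.8), (1.2, 2.1), (8, 9), (8.5, 8.7), (7.8, 9.2)]
print(sorted(kmeans(employees, k=2)))
```

In practice a platform like Dataiku wraps this kind of clustering in visual recipes, but the underlying assign-then-recompute loop is the same.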
Posted by modhar

Standard Chartered Bank - Learning Together, Faster Through 100 Days of Coding

Name: Craig Turrell
Title: Head of Digital, Finance Operations
Country: Poland
Organization: Standard Chartered Bank
We are a leading international banking group, with a presence in more than 60 of the world's most dynamic markets. Our purpose is to drive commerce and prosperity through our unique diversity, and our heritage and values are expressed in our brand promise, Here for good.
Awards Category: Most Extraordinary AI Maker(s)

Business Challenge: At Standard Chartered Bank, within Financial Operations / Financial Planning & Analytics, we have been on a journey. This journey has taken us from a small team of financial analysts living 'groundhog day' lives, trying to gather and integrate information sources and, if time allowed, discover value from them - but most of the time we could only publish the numbers and hope someone found them interesting. That was our beginning in early 2019, and our progress from those early spreadsheet days to enterprise-class pipelining, analytical translation, and the ongoing pursuit of Everyday AI is well documented.

But something that happened in early 2022 shocked us: we came to a realisation of the '10,000 hour' learning challenge - an almost impossible hurdle that meant we could not scale. No matter how advanced the tool we were using, it would be worthless, because we did not have the talent and structures for managing and using it.

The 10,000 Hours

So the first question is the 10,000 hours - where did this come from? The answer lies in our belief that there are digital unicorns: analytical engineering experts with design and hands-on knowledge of the full analytical stack. From data ingestion to normalisation, feature engineering, metrics calculation, machine learning modelling, visualisation, and automation - a person able to work across a multi-sided analytical platform and design next-generation analytics.
We broke this down into three broad categories of skills: data pipelining and data structures; metrics, scenarios, and machine learning; and human-computer interaction / UI UX design. Each of these required upskilling and certification to establish credible skills in a centre of expertise business model. This impacted not only hands-on engineers but also the top of executive digital management - critical skills drift was inhibiting us from scaling success.

Business Solution: Dataiku helped us in three key ways:
- Ongoing evolution of the platform features
- Academy programme
- Partner ecosystem and interoperability

Platform Features: The ongoing evolution of the Dataiku platform and the incremental business value generated by each new release bring ready-to-use business solutions and analytical features which no longer need to be discovered, built, and adopted across the team - they come out of the box. For example, in the recent version 11.0, the native time series features no longer require our team to learn the theory, build a model/visualisation, and share the feature - best practice is already there. The ongoing development of business solutions and best practice templates will also be a game changer. Anything which accelerates time-to-value and reduces the learning overhead creates significant and immediate value. We can do more because Dataiku gives us that 'helping hand' to get to best practice.

Academy Program: This is our driving licence for analytics. It is how we decide how to enable people on our production environment and set guardrails on the use of development features and machine learning. The courses are well structured, and the video content and use cases are on topic and aligned to real work situations and problems we face. When we were struggling to learn how to establish data and machine learning operations (DataOps / MLOps), Dataiku already had a new learning path for us to follow and certify against.
Even for a senior digital executive such as Craig Turrell (Head of Digital Operations), taking the courses and achieving certification helped close the skills drift and make better platform decisions.

Partner Ecosystem: Having a broad range of plug-in extensions, interoperability options, and cross-platform solutions not only provided immediate solutions but also reduced the learning burden: 10,000 hours became 300 hours - accelerating time-to-value and the ability to scale analytics & AI.

Impact:

Without Dataiku's help:
- Original estimate of learning = 10,000 hours (data, machine learning, and visualisation) across three technology stacks to expert level within the Digital COE
- Estimated learning cost per engineer - 20,000 USD / 9 months to achieve full-stack delivery productivity

With Dataiku's help:
- Following platform improvements in Dataiku, extension of the Academy programme, and use of partner/Dataiku solutions + 100 Days of Coding learning sprints = 200-300 hours
- Estimated learning cost per engineer - between 500 and 1,000 USD / 2 months to achieve full-stack productivity

Value Brought by Dataiku: 100 Days of Coding

The personal journey of the Head of Digital FinOps, Craig Turrell, is the best example of not only the impact on upskilling and the enhancement to tech stack efficiency, but also the network effects of the platform. Dataiku is a multi-sided platform for artificial intelligence and advanced analytics - an ecosystem of data, services, standards, and tools upon which different analytical personas individually, but also collectively, create and extract value. It is co-creative analytical thinking, and the analytical network effects of the platform and learning environment. This had an exponentially valuable effect on this critical skills problem, as we were able to seamlessly share and co-create learning journeys, tutorials, and 'hackathon' challenges in a community-driven learning marketplace enabled by the Dataiku platform and a homogeneous data environment.
The 100 Days of Coding was a call to arms to ensure the skills of the most senior digital leader were on par with the rest of the team - not through words, but by sweating through the courses and the certification process. Dataiku gave us the environment to build big, fast, and intelligent systems; it enabled us to achieve amazing results that were irreversible and transformative. But the 100 Days of Coding, the improvements in the product platform, the ongoing enhancement of Dataiku Academy, and the contributions of partners who are continuously extending and enriching the available solutions to real business problems allowed us to do something beyond the technical. Dataiku reduced the cost and time of teaching new ways of digital and democratised advanced analytics and machine learning; it reduced the time it takes to become a digital unicorn. It allowed us to see how we got to 'enterprise' leveraging Everyday AI. And through socializing our journey on social media, we're now building momentum outside of Standard Chartered Bank!

Posted by CraigTurrell

Ranjan Relan - High Traction Online Course on "How To Build your first Data Pipeline with Dataiku"

Name: Ranjan Relan
Title: AI and Data Strategy Manager
Country: India
Awards Categories: Excellence in Teaching

Business Challenge: In 2020, demand for data scientists was increasing at an exponential pace, but the number of skilled professionals in data science was very small. Many had to go through the rigorous process of understanding the issue at hand and writing a lot of code to build a data science pipeline. I was primarily looking for a low-code/no-code AI platform which could be leveraged by many to quickly build data science pipelines. Since many organizations had started using and exploring the Dataiku platform, I also started leveraging it in 2020.
I was extremely excited to discover this low-code/no-code Enterprise AI platform, which had so many features: an amazing UX design, automated ML, visual recipes which make it easy to maintain and run data science pipelines, extensibility to Python and R, the ability to do data demography analysis, and data engineering with a few clicks. Looking at its product features, the Dataiku platform was clearly set to become more and more powerful in the coming years.

Business Solution: Given the industry trends in 2020, i.e. the lack of Data Science skills and the limited number of low-code/no-code AI platforms, I thought there would surely be a course on Dataiku. Since I had already published courses with some of the major, well-known EdTech companies, I firmly believed, based on the AI and data market landscape and Dataiku's growing popularity, that a course on Dataiku would help the community a lot. Hence, I created a course on coursera.org named "Build your first ML pipeline using DataIku", which was published in May/June 2021. Within just one year, this course has been taken by more than 2,400 users. Of the 5 courses I published last year on Coursera, the Dataiku course has been my fastest-growing and most loved course in terms of the users who have taken it and the rating it attained. This course also has one of the highest completion rates, which speaks to the ease of use of the Dataiku platform. In this course, students learn how to build their first data science pipeline using easy-to-use features in Dataiku. It leverages COVID datasets, and students learn how to use visual recipes to perform data transformations such as splitting and aggregating data, as well as how to train and score a model and spin up their first data pipeline in less than an hour.

Value Generated: This course has been taken by more than 2,400 students (it was launched last year in June), has a 4.5 average rating, and has a completion ratio of 40%, which is very high in the online learning world.
Value Brought by Dataiku: Dataiku has a great UX and, as a low-code/no-code platform, it helps increase the team's efficiency, with an easy-to-use interface and high user adoption amongst the citizen data scientist and ML engineering communities.

Posted by RanjanRelan

Unilever - Designing a Responsible, Self-service Tool for Natural Language Processing
Name: Ash Tapia (Linda Hoeberigs, Head of Data Science and AI, PDC Lab | CMI People Data Centre)
Title: Data Partnerships & Tools Stack Manager
Country: United Kingdom
Organization: Unilever
Description: Every day, 2.5 billion people use a Unilever product to look good, feel good or get more out of life. Our purpose is to make sustainable living commonplace. We are home to some of the UK's best-known brands like Persil, Dove and Marmite, plus some others that are on their way to becoming household favourites like Seventh Generation and Grom. We have always been at the front of media revolutions, whether that be the first print advertisements in the 1890s or in 1955 when we became the first company to advertise on British TV screens. Experimentation and bravery drive us and have helped us become one of the UK's most successful consumer goods companies.
Awards Categories: Responsible AI

Challenge: Our Unilever People Data Centre (PDC) teams across the globe deal with vast amounts of unstructured text data on a daily basis to gain insight into our customers, how they engage with our brands and products, and what needs we have yet to tap into. The industry is moving at a rapid pace, which consequently requires rapid generation of insights to stay on top of the latest trends. The sheer amount of data and the skills required to analyse it efficiently exacerbate this problem. The answers our marketeers, product research and development, and supply chain specialists need also require analytics approaches tailored to the business.
Analyzing text data is a complex task and often requires an understanding of complex language models and Natural Language Processing techniques, which most of our marketeers do not have. To help with this, our data scientists and software engineers in PDC have built a range of NLP methodologies and plugins, the most complex being the Language Analyser. The Language Analyser uses pre-trained language models for Part-of-Speech tagging and Named Entity Recognition, performs string matching against existing entities relevant to Unilever, and visualises a range of insights in an interactive dashboard in the shape of network graphs, word clouds and sentiment scale bubble charts, amongst others. Responsible AI is one of the fundamentals to ensure our business is responsible, ethical, and sustainable, and this is key across all business areas. We set out to understand whether the Language Analyser, one of the plugins most used by analysts and one that employs NLP methods, is ethical. Finally, we also set out to understand whether the way it is used within Dataiku by the analysts is ethical.

Solution: To assess how ethical the Language Analyser is, and the way it's used as part of our Dataiku ecosystem, we engaged with Adriano Koshiyama, a Research Fellow in Computer Science at University College London (UCL) and a co-founder of Holistic AI, a start-up focused on auditing and providing assurance of AI systems. Adriano has worked as a data scientist for many years across industries such as retail, finance, recruitment, and R&D. All aspects of the plugin were assessed: its internal components, the wider environment in which it sits, and the kinds of datasets analysts pull through the plugin. Since the plugin is available within Dataiku, we can easily assess how and where people use it. Dataiku and its collaborative, open environment have enabled full transparency on how the plugin is used across different research projects. We are able to monitor usage and assess its applications.
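The string-matching part of such a pipeline (matching consumer text against existing entities relevant to Unilever) can be illustrated with a minimal, dictionary-based matcher. The entity list, labels, and comments below are invented for the sketch; the real Language Analyser additionally applies pre-trained Part-of-Speech tagging and Named Entity Recognition models:

```python
import re
from collections import Counter

# Hypothetical entity list standing in for Unilever's internal entity sets.
KNOWN_ENTITIES = {
    "dove": "BRAND",
    "persil": "BRAND",
    "marmite": "BRAND",
    "shampoo": "PRODUCT",
}

def match_entities(text):
    """Return (surface form, label) pairs found by case-insensitive matching."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [(tok, KNOWN_ENTITIES[tok]) for tok in tokens if tok in KNOWN_ENTITIES]

comments = [
    "Love the new Dove shampoo!",
    "Persil works better than before",
]
# Aggregate mention counts per entity label, as a dashboard widget might.
mentions = Counter(label for c in comments for _, label in match_entities(c))
print(mentions)  # Counter({'BRAND': 2, 'PRODUCT': 1})
```

In the real plugin, matches like these feed the network graphs and word clouds in the interactive dashboard.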
Holistic AI assessed our capability on privacy, fairness, robustness, and explainability using the following assessment framework. Thanks to the use of Dataiku, we were able to clearly outline each step of our development process, as both current and historical versions had been stored in the flow and via the timeline versioning. We were also able to share how, when and by whom the plugin was used, thanks to the usage stats available on a dashboard we created in Dataiku. Furthermore, it was extremely clear where the data came from, thanks to the end-to-end visibility of each flow in the project. All of this meant that we were able to provide white-box access levels, and could be judged on each of the following dimensions:

Impact: The Language Analyser plugin received the green stamp of approval from the AI auditing start-up. After a full assessment of one of our most successful plugins against the fair, responsible, ethical and unbiased criteria, the analysts and data scientists can now be assured that the tool they use as part of their work fits within our responsible business practices. The plugin passed the assessment with flying colors on each dimension, thanks to it being fully transparent in Dataiku, and was the first capability in all of Unilever to do so. More widely, we can assure the business that Dataiku supports our teams in ensuring the longevity and continued transparency of our capabilities. Additionally, we have full control and visibility over what we develop, how we develop it and what components we bring together to design a responsible tool. Combined with sufficient version control, we are able to mitigate risks and know which areas to pay particular attention to.
Posted by ash | Last reply 09-01-2021 by Triveni

Unilever - Developing a Scalable Digital Voice of the Consumer Capability
Name: Anand Patel (Digital Voice of the Consumer team)
Title: Analytics Manager
Country: United Kingdom
Organization: Unilever
Description: Every day, 2.5 billion people use a Unilever product to look good, feel good or get more out of life. Our purpose is to make sustainable living commonplace. We are home to some of the UK's best-known brands like Persil, Dove and Marmite, plus some others that are on their way to becoming household favourites like Seventh Generation and Grom. We have always been at the front of media revolutions, whether that be the first print advertisements in the 1890s or in 1955 when we became the first company to advertise on British TV screens. Experimentation and bravery drive us and have helped us become one of the UK's most successful consumer goods companies.
Awards Categories: AI Democratization & Inclusivity

Challenge: As a consumer-obsessed business, Unilever had an ambition to ensure all business decisions are centered around our consumers. In order to do this, we must listen to, understand, and adapt to the changing needs and wants of our consumers. Our challenge was to develop a scalable digital voice of the consumer capability that would enable us to interpret large, complex and unstructured consumer feedback datasets through leading data science techniques, and serve relevant consumer insights to business decision makers for actioning. The solution would be developed in partnership with Unilever's Quality function, to enable them to identify product issues and opportunity areas to improve and innovate based on consumer comments. The comments would be sourced from social media, product reviews, and Unilever's customer engagement centers. We therefore required a platform that would enable us to build an industrialized AI solution end-to-end, including data merging, cleansing, manipulation, modeling, and output.
The platform should enable the processing of large volumes of data in near real-time, and enable the development and deployment of sophisticated AI-driven natural language processing models. The AI models would be key to automation and to deriving prescriptive insights from the large unstructured datasets. Models should also be made available to the rest of the business for other products or analyses if needed. The developed flow should be repeatable and run on a cloud setup, to benefit from distributed data storage and processing.

Solution: An industrialized solution called Digital Voice Of the Consumer (DVOC) was developed to fulfill the ambition. With Dataiku as the underlying platform, we were able to develop an automated and scaled solution that is updated daily and democratizes consumer insights across Unilever's entire organisation. Through developing multiple flows in Dataiku, we were able to bring together an array of internal and external consumer feedback datasets and enrich them with additional product data. The datasets were then cleansed and structured using a combination of Dataiku's pre-built nodes and Python scripts. Machine learning-based natural language models were developed using leading-edge methods, with unsupervised learning to identify the key topics and themes consumers were talking about. Deep learning algorithms were also developed for sentiment classification. Through the versatility and intuitive nature of Dataiku, the flow could be developed by data scientists and analysts of all levels of experience, and provided a great way for more junior team members to upskill themselves. The machine learning models that were developed have been deployed and also made available to the rest of the Unilever community through the plugins feature in Dataiku.
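To illustrate the per-comment scoring interface such a sentiment step produces, here is a deliberately simple, lexicon-based scorer. The DVOC team used deep learning models for sentiment classification, so this stand-in (with an invented cue-word lexicon) only sketches the shape of the step, not the actual method:

```python
import re

# Invented cue-word lexicons; the real DVOC sentiment step is a deep
# learning classifier, not a lexicon lookup.
POSITIVE = {"love", "great", "amazing", "better"}
NEGATIVE = {"leak", "broken", "bad", "worse"}

def sentiment_score(comment):
    """Score in [-1, 1]: balance of positive vs. negative cue words."""
    words = re.findall(r"[a-z]+", comment.lower())
    hits = [1 if w in POSITIVE else -1 for w in words if w in POSITIVE | NEGATIVE]
    return sum(hits) / len(hits) if hits else 0.0

for comment in ["Love the great new pack", "Bottle arrived broken, bad seal"]:
    print(comment, "->", sentiment_score(comment))
```

In the real flow, a score like this is computed for every incoming comment and then aggregated by topic and market to feed the dashboards described above.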
The overall DVOC flows are complex; however, the Dataiku platform enables us to visually display the flow and create groupings and scenarios to cluster related nodes together, making it easier to make developments, test, and diagnose bugs and errors.

Impact: Our Digital Voice Of the Consumer solution is now live in over 60 markets and 50 factories across the globe, and available in more than 20 languages with an active user base of over 2,000 people. The tool is embedded into the day-to-day operations of the Quality team, and is also widely used across other business functions including marketing, research & development, and customer development teams. There have been over 750 business insights logged from the capability to date, with over 500 of these actioned and closed. DVOC insights have been actioned across Unilever's business to make a real impact on consumers. Examples include: helping Unilever develop and launch new products in line with what consumers really want; optimising existing products so they are better suited for eCommerce, by improving product packaging to reduce leakage and breakages; redesigning products and packaging to make them more sustainable; helping fight counterfeits; tracing issues back to specific Unilever factories; and enabling Unilever's business to be agile during the Covid outbreak by reacting to changes in consumer behaviors. This has led to tangible business results: cost savings, increased profits, and ultimately improved product quality and better consumer experiences. Individual cases have unlocked cost savings of up to €350k each.

Posted by anand_patel

ENGIE GEM - Building a Path For All Users to Easily and Securely Gain Insights From Their Data
Name: Stéphane Raguideau
Title: Digital & Data Accelerator
Country: France
Organization: ENGIE - Global Energy Management
Description: Global Energy Management (GEM) is one of ENGIE's Business Units.
At the heart of the energy value chain, we optimize the Group's asset portfolio, including electricity, renewable technologies, natural gas, environmental products and bulk commodities such as biomass. We also develop our own external commercial franchise worldwide and rely on four main areas of expertise to offer tailor-made, innovative and competitive solutions. We provide services in energy supply & global commodities, energy transition services, risk management & market access, and asset management. With a staff of 1,400 and offices in 15 countries including 8 main hubs, GEM has extended geographical coverage in Europe, the US and Asia-Pacific.
Awards Categories: AI Democratization & Inclusivity

Challenge: Data science is at the core of our activities at ENGIE - Global Energy Management. Users across departments manage various sources of data, including: energy consumption, market data, weather information, deal and order books, etc. This data is leveraged by the business for many purposes, including: pricing, risk management, data reconciliation from various sources, reporting, etc. But access to the data was limited due to its sheer volume, security considerations, and tooling segmentation. In addition, coding skills were required for accessing it, which excluded many users who did not have a technical background. Users needed to manually retrieve the data through a variety of applications, which caused several issues: task repetitiveness, which was very time-consuming and namely included extracting data from the different systems in place; data availability, as not all data sources were referenced and only IT may have been able to access some of them; operational risk, related to the quality of the data and the manual processing taking place (e.g. mistakes in copy/paste steps); and the coding skills required to manipulate the data and automate part of the process, e.g. "Visual Basic for Applications" (VBA) in Excel, or Python.
Finally, the tooling was not fit for the volume of data (in particular, Excel).

Solution: Dataiku enabled us to solve these pain points. Data is now easily accessible through a number of plugins created internally, which enable users to easily and securely interact with the different data sources. Low-code/no-code data manipulation: visual recipes enable users to prepare and transform the data to fit their needs, without any coding skills required. For more complex operations, the collaborative visual interface enables our IT teams to work hand-in-hand with the business on building and editing workflows. Sharing insights from the data is made easy with the dashboarding features. Process automation has led to: a shortened time-to-market, now that reporting and analysis are available on-demand; increased monitoring capabilities, as monthly and weekly analyses can easily be turned into daily reports; and reduced operational risk, as manual operations are now automated.

Impact: As with every new tool, Dataiku requires specific onboarding to maximize its benefits. At ENGIE Global Energy Management, our users have different profiles and backgrounds, hence they are not all familiar with data manipulation and analysis. It is therefore important to provide them with training opportunities, regardless of their division (trading, risk, back office, finance, IT, etc.). This includes: understanding their needs and identifying a use case for a Proof of Concept (POC); developing the most relevant training with regard to their profile and skills; building the Dataiku plugins and connectors to allow them to easily and securely access the data; and hosting regular workshops (at least once per week) on select topics throughout the POC, including partitioning, Python recipes, machine learning, automation, dashboarding, pattern recognition, etc.
This training path is set to two months, after which users are given autonomy to access the data, manipulate it for their day-to-day needs, and most importantly, explore new areas to gain more insights from their data. This has been a key pillar of data democratization within ENGIE - Global Energy Management. Becoming a data scientist or an engineer doesn't happen overnight though, hence we've developed a framework to monitor the projects created in Dataiku and ensure they're following established governance and best practices, covering data connections, scenarios, data sharing, partitioning, plugin types, etc. All users are therefore able to produce insights safely!

Posted by s-raguideau

NXP Semiconductors - Reducing Detection Time of Manufacturing Issues with Real-time Automated Process Control
Team members: Adnan Chowdhury, Manufacturing Quality Engineer; David Meyer
Country: United States
Organization: NXP Semiconductors
Description: NXP (originating from Motorola and Philips) is one of the largest semiconductor suppliers in the world. Key products range across automotive solutions, communication, infrastructure, mobile, industrial, and smart city/home. NXP has over 60 years of experience in the industry and has brought key innovations to the world.
Awards Categories: AI Democratization & Inclusivity; Value at Scale

Challenge: In semiconductor manufacturing, a critical quality and manufacturability figure of merit is the ability to detect and resolve manufacturing issues as quickly as possible, i.e. the "Time to Detect" or TTD. Advanced process control is one of the key contributors enabling factories to minimize this TTD. Reducing TTD is a critical quality and manufacturability goal, because a high TTD means manufacturing issues are not detected and resolved rapidly, which allows further production material to be exposed to faulty processing, incurring material costs, engineering costs, and delays in meeting customer demand.
In this article, I will present a comparison of the current typical process control using test wafer measurements, which has a high TTD, versus real-time automated process control using virtual metrology built with machine learning in Dataiku, which greatly reduces TTD.

Solution: In this Virtual Metrology solution, the inputs consist of various data sources from the manufacturing production line (e.g. sensor data). We build machine learning models to generate predictions of the key measurement of interest, which then feed directly into our Statistical Process Control systems for making process decisions. Examples of key measurements of semiconductor components include physical geometries (depths/angles) and electrical characteristics such as voltages/currents/resistances. This figure shows a high-level comparison between the previous method of test wafer metrology for process control and the new virtual metrology method: We observe that, in the previous method, there is a delay in detecting issues in the manufacturing line because we only take test measurements every 3-4 days. When an issue occurs, it goes undetected until the next scheduled test measurement. The new method with virtual metrology provides continuous detection of manufacturing issues by creating virtual measurements on all materials. As manufacturing issues come up, we are able to observe the effects through the virtual measurements, which enables the manufacturing team to take immediate action and contain the problem. The Virtual Metrology solution can be broken down into 4 components: What data will be used: examples of the data sources include (but are not limited to) Fault Detection Control data (i.e. sensor data from the process tool), hardware component information of consumables in the process, material volume data, electrical testing, etc. How the data will be used: Feature Selection, i.e.
identifying the most significant inputs, and Feature Engineering, i.e. gaining deeper insights from the inputs, are both used heavily to structure the input data before modeling. How to model the data: an advanced machine learning model was used for the final predictions. How to use the predictions for process control: to develop an end-to-end solution driving production process control, we identified and implemented the following components (cf. graph): accessing and querying the input manufacturing data in real time; performing all the Feature Selection and Engineering tasks; running the input data through the model and generating predictions; exporting the predictions into Statistical Process Control charts; and identifying suitable control limits and defining appropriate out-of-control actions.

Results: We determined the effectiveness of the model by focusing on the following metrics: the explainability of variation between inputs and outputs, and the error between the actual and predicted results. The graph below shows the target (actual) vs. predicted values of the critical parameter of interest over a randomly sampled span in time. It can be observed that the predictions closely match the actual measurement values.

Impact: Key advantages of using virtual metrology for process control include: significantly reduced TTD of manufacturing issues without significant financial investment from the factory, i.e. such analytics/ML solutions simply use existing data sources; reduction/elimination of the material and engineering cost of test wafer measurements; increased production volume through minimized tooling downtime, where we previously waited for test wafer measurements; and faster root cause problem solving by using features like variable importance from the models. Through the deployment of the end-to-end solution, we estimate savings in the millions of dollars based on the material and engineering costs associated with avoidable manufacturing excursions, now that Virtual Metrology is in place.
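The pipeline above (fit a model on historical data, generate a virtual measurement for every run, check it against SPC control limits) can be sketched in miniature. The sensor values, the single feature, and the ordinary least squares fit below are illustrative stand-ins for the fab's real multi-feature Fault Detection Control data and advanced ML model:

```python
import statistics

# Hypothetical sensor feature and historically measured values.
sensor   = [1.0, 1.1, 0.9, 1.2, 1.05, 0.95]
measured = [50.1, 50.9, 49.2, 51.8, 50.4, 49.6]

# Ordinary least squares on a single feature (closed form: slope = Sxy/Sxx).
xm, ym = statistics.mean(sensor), statistics.mean(measured)
sxx = sum((x - xm) ** 2 for x in sensor)
sxy = sum((x - xm) * (y - ym) for x, y in zip(sensor, measured))
slope = sxy / sxx
intercept = ym - slope * xm

def predict(x):
    """Virtual measurement for a production run, from its sensor reading."""
    return slope * x + intercept

# SPC control limits (mean +/- 3 sigma) from the historical measurements.
sigma = statistics.stdev(measured)
ucl, lcl = ym + 3 * sigma, ym - 3 * sigma

# Every run now receives a virtual measurement checked against the limits,
# instead of waiting 3-4 days for the next test wafer.
for reading in [1.0, 1.6]:
    vm = predict(reading)
    status = "OK" if lcl <= vm <= ucl else "OUT OF CONTROL: stop and contain"
    print(f"sensor={reading:.2f}  virtual_measurement={vm:.2f}  {status}")
```

An anomalous sensor reading immediately produces an out-of-limit virtual measurement, which is exactly how the continuous TTD reduction described above works.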
Posted by Adnan325 | Last reply 08-06-2021 by BasB

InfoCepts - An End-to-end Data Workflow to Conduct Clinical Research at Scale
Team members: Nilesh Lahoti, Anil Kumar M.S., Mohit A. Jichkar, Ananth Kumar Chamarthi
Country: United States
Organization: InfoCepts
Description: InfoCepts, a global leader in end-to-end data and analytics, enables customers to become data-driven and stay modern. We bring people, process, and technology together the InfoCepts way to deliver predictable outcomes with guaranteed ROI. Working in partnership with you, we help businesses modernize data platforms, advance data-driven capabilities, build augmented business applications, create data products, and support systems. Founded in 2004, InfoCepts is headquartered in Tysons Corner, VA, with offices throughout North America, Europe, and Asia. Every day, more than 160,000 users use solutions powered by InfoCepts to make smarter decisions and help businesses achieve better outcomes. For more information, please visit www.infocepts.com or follow @InfoCepts on Twitter.
Awards Categories: Value at Scale; Excellence in Research

Challenge: The client is a leading pharma company that wanted to analyze the market and decide where to invest in drug research, in order to avoid risks, save time, and be profitable in the near future. The lack of both qualitative and quantifiable data at the client's hand was a big obstacle to correctly analyzing and understanding the present market in order to plan and organize future business. To develop any predictive model or to draw insights about any business using machine learning algorithms, it is very important to have real, quality data about the health symptoms that users are experiencing. The purpose of this research project was to collect real-time data from end users, store the collected data, perform analytics, and build a predictive model on top of it.
The challenges with the previous approach are summarized below: It was not easy to collect data from individuals, as no one wants to share their identity while disclosing their health information; hence the need to anonymize the personal identity of the user. Manual data collection was a tedious process that involved sending an email and getting the details back in an Excel spreadsheet. There was no central storage mechanism or process to save and update the data regularly. Extensive coding was involved to prepare, clean, and aggregate the collected data before it could be analyzed. There was heavy reliance on a third-party application to perform analytics and build predictive models. High costs were involved in purchasing data from third-party sources to perform analytics. End reports derived from multiple tools had to be integrated to create a single view of insights. The setup relied on a custom web graphical user interface, a standalone app processing on the server, and user BI reporting tools. Data integration and pipeline orchestration across multiple technologies and scripts were time-consuming.

Solution: To meet the above objective, our team built a web-based survey form to collect data, created a storage mechanism to store both temporary and permanent data, and built a processing engine that can run advanced analytics on the existing and newly collected data. In addition, we built a business intelligence dashboard to surface insights, plots, and analytics, along with predictions derived from user-given inputs, back to the end user. This dashboard was presented to the user to explain his/her current and predicted future health condition. The following steps summarize the activities carried out to solve the business case: 1. Real-time data collection: Dataiku's web application capability was leveraged to create a survey form for the end users.
Our team used the Rshiny templates within Dataiku, which made it simpler and faster to create the form. The web app was made public (within the intranet) to be accessible to all users within the organization. Apart from user input in forms, data was also fetched from internet sources like Google Trends to augment the data science models; Dataiku time-based scenarios are used to automate the collection of the latest trends. 2. Data preparation: A mix of visual and code-based recipes in Dataiku was used to perform the data cleaning and preprocessing activities. 3. Model development: The following models were developed using Python and RStudio within Dataiku: Disease prediction: a classification model to predict the disease condition of the end user and indicate whether the user is disease-free or has been impacted by the disease. Survival analysis: a model predicting the expected age at which the disease condition is attained under different given medical conditions. Sales forecasting: a predictive model which makes sales predictions based on user-given inputs. 4. Automation and end-user reports: Real-time predictions and analytics are presented to the end user via an Rshiny web app, based on the inputs provided. The output includes the prediction of the disease condition, a survival analysis graph which predicts at what age the disease is expected under different given medical conditions, a segmentation which shows similar medical symptoms across different age groups, etc. The Python-based models were invoked from the Rshiny web app using the APIs provided by Dataiku. The entire workflow (screenshot 1.3) was seamlessly automated using a mix of scenario-based triggers and API-based calls from the web apps (screenshot 1.4).

Impact: 1. Cost savings: The solution enabled $300k of cost savings from optimized infrastructure, improved process orchestration, and avoidance of third-party data purchases. 2. Time savings: The solution saved 50% of the effort that was involved in the earlier manual process.
3. Bridging the gap between technology and business: The business users were closely involved in the iterative development, review, and continuous research. The visual recipes in Dataiku enabled business stakeholders to understand the technology and the general challenges in the process very well. This increased adoption by 2X. 4. Real-time ingestion and analytics: Processing time was saved in data collection and data integration from the end users, since as soon as a user fills in the form, the rest of the data preprocessing and analytics process is automated within Dataiku itself. 5. Opportunities for innovation: Real-time data collection opened additional avenues to better understand current pharmaceutical market conditions. 6. Improved decision-making: Central access by all departments helped users make data-driven decisions based on current market conditions, avoid risks, and be more profitable.

Posted by keogabriel | Last reply 07-22-2021 by Uday

Hospital de Clínicas de Porto Alegre - Streamlining Data Workflows for Clinical Research
Name: Tiago Andres Vaz
Title: Head of A.I. (From Research-to-Production) | IT Advisor in Healthcare
Country: Brazil
Organization: Hospital de Clínicas de Porto Alegre
Description: Hospital de Clínicas de Porto Alegre is a large teaching hospital located in Porto Alegre, Brazil. Affiliated with the Federal University of Rio Grande do Sul, it was inaugurated in 1970, gradually becoming a reference for the state of Rio Grande do Sul and southern Brazil. It provides care in about 60 specialties, from the simplest procedures to the most complex, with priority for patients of the public Unified Health System (SUS).
Awards Categories: Excellence in Research

Challenge: Hospital de Clínicas de Porto Alegre is a general, public and tertiary health care institution partnering with the medical, nursing, pharmacy and dental schools of the public university UFRGS, in Porto Alegre, Brazil.
We develop our own Electronic Health Record, called AGHUse, which is open source and the most widely adopted university hospital information system in Brazil. We faced multiple challenges: Data acquisition and preparation was time-consuming, leading to many transformations and lower data quality. Large amounts of data took hours to open and process even for simple modifications, and querying such complex databases usually required more than one system analyst plus business experts. The need to query each dataset multiple times to understand the information, and to create manual pipelines for machine learning, made for complex and confusing processes, without a clear graphical explanation of what was going on. Each modification in methods or statistical analyses involved creating a new branch in one centralized repository used only for syntax and code versioning, and data was re-generated each time we had to roll back, sometimes making reproduction impossible even in our own laboratory. Comments on our data were managed in docs without any link or meaningful integration to our code or data. We faced limits on the number of columns in traditional relational databases, and switching databases was an almost forbidden process; switching machine learning pipelines from R to Python, and vice versa, was just as hard. And lately, before Dataiku, we were starting to feel pain points around broader data governance: tracing access to data, defining user profiles, and logging every aspect of our research projects with data.

Solution: We started using Dataiku after a Tableau representative sent me a comparison between Dataiku and Databricks. We analyzed both platforms, comparing the features important to us, and I remember the moment our research team voted unanimously for Dataiku to start our research. We then sent a message to the company's Academic and Education relations team and, after a fast response, received a donated license and installed Dataiku on premises.
After a few configuration and installation steps, with almost no support needed from the IT department, we went through the following stages:

- Statistical description of all our incoming data
- De-identification
- Cleaning and formatting
- Interpretation and curation strategy
- Definition of roles and task planning
- Notes and code standardization
- Graphical pipeline definition
- Pipeline execution
- Statistical description of processed data
- Machine learning modeling
- Parameter tuning
- Artificial intelligence deployment

In our journey, we learned that innovation tools like Dataiku will revamp clinical research, and that there is therefore a need to formally define ontologies and new methods for healthcare research using large hospital datasets. This is the motivation for our current work!

Impact: The time saved using Dataiku is remarkable. I teach at the medical school, where students using Dataiku as a substitute for MS Excel and Power BI are leading the process. I think this is due to the innovative “first layer” that the Dataiku interface puts on top of old spreadsheet concepts, backed by solid (auditable and automatic) backend processing, which enables us to move to a DataOps culture.

Posted by tiagoandresvaz

One Acre Fund - Scaling Data Science Insights to Better Serve Smallholder Farmers
Name: Emiel Veersma
Title: Data Scientist
Country: Rwanda
Organization: One Acre Fund
Description: One Acre Fund is a nonprofit organization that supplies smallholder farmers in East Africa with asset-based financing and agriculture training services to reduce hunger and poverty. Headquartered in Kakamega, Kenya, the organization works with farmers in rural villages throughout Kenya, Rwanda, Burundi, Tanzania, Uganda, Malawi, Nigeria, Zambia, Ethiopia, and India. One Acre Fund actively serves more than 1 million farmer families.
One Acre Fund offers smallholder farmers an asset-based loan that includes: 1) distribution of seeds and fertilizer; 2) financing for farm inputs; 3) training on agriculture techniques; and 4) market facilitation to maximize profits. Each service bundle is around US$80 in value and includes crop insurance to mitigate the risks of drought and disease. To receive the One Acre Fund loan and training, farmers must join a village group that is supported by a local One Acre Fund field officer. Field officers meet regularly with the farmer groups to coordinate delivery of farm inputs, administer trainings, and collect repayments. One Acre Fund offers a flexible repayment system: farmers may pay back their loans in any increment at any time during the growing season. Beyond its core program model, One Acre Fund also offers smallholder farmers opportunities to purchase additional products and services on credit, including solar lights and reusable sanitary pads.
Awards Categories: Organizational Transformation, Data Science for Good, AI Democratization & Inclusivity, Value at Scale

Challenge: Operationalizing data science projects
The biggest challenge we faced at One Acre Fund was operationalizing our data science projects. Over the years, many clever data scientists came and went at our organization. They conducted impressive analyses, but the results were soon outdated and forgotten once they left. There was no single root cause; rather, we faced many different challenges.

1. Coding makes reusability more difficult
The first challenge was that the data scientists were doing everything with code. It's hard to take over someone's project when the data, the model, and the steps are not visible. When the data scientists' timelines did not overlap, taking over a project was so challenging that the new data scientist would simply start over.

2.
One-time insights through local computing
Furthermore, our data scientists were not used to working with servers, so code would run locally on their computers. Code that runs locally can't interact with “live” systems, pushing data science to the back of the organization. Results were not taken into production, only used as one-time insights.

3. No shared infrastructure for accessing data
Our final challenge was that we didn't have an infrastructure for sharing our data. We were not used to interacting with databases, so our data resided on our own computers. When a project was finished, the deliverable was the report, and it was hard to reproduce.

Solution: Since Dataiku is a full-stack data science platform, it helped us in many ways:

1. Automation to facilitate workflow maintenance
Initially, we were looking for a solution to schedule and run our Python and R code, integrate with Git, and run code in isolated environments. When we tried out Dataiku, we set up a project to predict client repayments. We had analysed this before, but it was a complex process that took a lot of effort to maintain. With Dataiku, we could easily run our code, connect to our data warehouse, and schedule the flow.

2. Optimized modeling thanks to model competition
Dataiku enabled us to try out different types of models and investigate the data. These features helped us more than we expected.

3. Visual interface to democratize data insights
In the next projects, we worked with less tech-savvy colleagues. Dataiku's point-and-click functionality enabled them to build complex ETL processes and store the data in the database. This helps the organization democratize data analysis and store data in a central place. Because of the visual nature of the flows, we can easily work together and discuss the challenges we face during a project.
Seeing the datasets halfway through the flow makes it easy to understand what is going on in the data and to share it with different stakeholders. Visualizing the steps of a process prevents many mistakes, and it is something we couldn't work without anymore.

Impact:

1. Scaling our data science initiatives
Currently, we have created 70+ projects, of which 25 are in production, and we maintain more than 1,000 datasets on 33 connections. We're working with a small team, and this would not have been possible without a platform such as Dataiku. Although our team is small, more and more colleagues are working in Dataiku and are able to perform their own advanced data science. We have 25 active users who work together on the platform daily, and this number is growing rapidly. Before, we would not have been able to collaborate in such a big group.

2. Faster user onboarding & enablement
Dataiku saves us a lot of time. Last week, we introduced a Rwandan data analyst to the platform. We reproduced a project he had been working on and took it to production within an hour. He no longer has to download the dataset manually: thanks to the visual recipes, he can run his code on the database and easily inspect his intermediate steps. Before Dataiku, the project took him 5 days to build and run.

3. Upskilling our team & stakeholders
Dataiku also lets us use techniques that weren't accessible to our data scientists before. For example, our farmers can now talk to a chatbot to receive information about the weather. This chatbot talks to a Dataiku API endpoint, which accesses our stored forecasts. Without Dataiku, our data scientists wouldn't be able to set up an API by themselves; the same goes for scheduling and deploying code to a server. Overall, Dataiku really helps us become a data-driven organization, and we couldn't work without it anymore.
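The chatbot-to-endpoint round trip described above can be sketched in a few lines. Dataiku API-node prediction endpoints accept a JSON body carrying a `features` record; everything else here (the URL, the `weather`/`forecast` service and endpoint names, the field names) is hypothetical, since the post does not give those details:

```python
import json
from urllib import request

# Hypothetical API-node URL -- the post doesn't name the service or endpoint.
API_URL = "https://api-node.example.org/public/api/v1/weather/forecast/predict"

def build_payload(region: str, day: str) -> bytes:
    """Build the JSON body for a prediction endpoint: a 'features' record
    (the 'region'/'day' field names are illustrative)."""
    return json.dumps({"features": {"region": region, "day": day}}).encode()

def parse_forecast(body: str):
    """Pull the prediction out of the endpoint's JSON response."""
    return json.loads(body)["result"]

def query_forecast(region: str, day: str):
    """Full round trip -- requires a live API node, so not exercised here."""
    req = request.Request(API_URL, data=build_payload(region, day),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return parse_forecast(resp.read().decode())
```

Keeping payload building and response parsing separate from the HTTP call makes the sketch testable without a running instance.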
Posted by Emiel_Veersma

Atlantic Plant Maintenance - Bringing Workers Home Safe Through Defect Detection
Name: Aaron Crouch
Title: Data Analytics Manager
Country: United States
Organization: Atlantic Plant Maintenance
Description: We specialize in the repair and maintenance of power plant equipment, mostly GE coal, steam, nuclear, and gas turbines and boilers. It is difficult and dangerous work performed by skilled union labor, often in the elements. Very few laborers work directly for APM on a permanent basis; most are hired out of union halls as needed. Since our work requires taking generators offline, most of it is done in spring and fall, when power demand is lowest. Our labor pool has many opportunities outside of APM, especially in the spring and fall outage seasons, so it is imperative that our union employees WANT to work for APM. Part of building that loyalty is getting our labor home safely.
Awards Categories: Organizational Transformation, Data Science for Good

Challenge: The main issue that drove us to Dataiku is the moral imperative to send our workers home safe. Beyond that, safety has a financial and business impact too. If our workers go home hurt, they won't want to come back and work with us again. In addition, the industry, insurance companies, and government regulators track our Injury and Illness (I&I) rate: the number of our most serious injuries, times 200,000, divided by the number of hours we work. This represents the number of recordable injuries per 100 full-time employees per year. Many plants require contractors to have an I&I rate below a certain threshold to work on site, and ours was close to getting us blocked from certain customers. We wanted to combat safety issues, but injuries seemed so random that safety professionals and leadership could only react instead of being proactive.
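The I&I formula quoted above is the standard OSHA-style incidence-rate calculation; a minimal sketch (the injury and hour figures in the example are made up):

```python
def incident_rate(recordable_injuries: int, hours_worked: float) -> float:
    """OSHA-style Injury & Illness (I&I) rate.

    The 200,000 factor corresponds to 100 full-time employees working
    40 hours/week for 50 weeks/year, so the result reads as recordable
    injuries per 100 full-time employees per year.
    """
    return recordable_injuries * 200_000 / hours_worked

# Hypothetical figures: 4 recordable injuries over 250,000 labor hours -> 3.2
print(incident_rate(4, 250_000))
```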
We needed to flag problem jobs, send professionals to problem sites, and train problem employees. We decided that the data we had could help flag these problem jobs and possibly prevent an incident before it happened.

Solution: We used Dataiku to combine all the inputs we had on jobs: defect history by job site, superintendent defect history, employee ratings, lines of business, job duration, headcount, turbine type, work scope, and more. Dataiku combined all of these variables, compared them to past jobs that had safety or quality incidents, and calculated the likelihood of a defect on upcoming jobs. Dataiku's ability to show which variables cause an individual job to be flagged as high risk has allowed management to reduce risk: they can identify the most impactful variables (where possible) and make changes up front accordingly, leading to fewer high-risk jobs to begin with. Dataiku's ability to flag which metrics matter most to the model as a whole lets our field personnel see how the model works, and even suggest metrics the data team had not considered, increasing buy-in from the field. We put mitigation strategies in place, such as required twice-weekly hazard hunts and leadership site audits, to reduce the likelihood of an injury or quality defect. Dataiku has also enabled us to measure how effective our mitigation strategies are after a job has occurred, by looking at the percentage of high-risk jobs that used the strategies and comparing whether those jobs had a safety or quality defect.

Impact: The results are dramatic. In 2018, before we launched the high-risk jobs program, close to 26% of our jobs had some sort of safety or quality defect. That number has declined steadily to less than 11% in YTD 2021. In 2018, over 86% of the jobs that would have been flagged high risk had defects; in YTD 2021, only 68% of high-risk jobs did.
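A drop in defect rate like the one above (86% of flagged jobs in 2018 vs. 68% in YTD 2021) can be checked for significance with a two-proportion z-test, presumably the kind of comparison Dataiku's statistics tools run. A standard-library sketch with hypothetical job counts, since the post reports only percentages:

```python
from math import sqrt, erf

def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Two-sided two-proportion z-test; null hypothesis is p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                        # pooled proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # standard error
    z = (p1 - p2) / se
    # two-sided p-value from the normal CDF, Phi(x) = 0.5 * (1 + erf(x/sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: 86 defective flagged jobs out of 100 in 2018,
# 68 out of 100 in YTD 2021 (the post does not give job counts).
z, p = two_proportion_z(86, 100, 68, 100)
print(round(z, 2), round(p, 4))
```

With these made-up counts the difference comes out highly significant; with fewer jobs per year the same percentages could fail the test, which is why the counts matter.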
Using the statistical tools in Dataiku, we were able to see that high-risk jobs where a leadership site audit was performed had a 77% chance of having a defect, while high-risk jobs without one had an 83% chance, with a statistically significant p-value of .046. We obviously can't point to an injury or quality event that did not occur, but we can assume from the data that workers who otherwise would have been hurt made it home safely. This keeps our employees safe, keeps our customers happy, and reduces insurance rates.

Posted by AaronCrouch

Westpac – Driving Organizational Change with Collaborative, Self-service Data Science
Team members: Cameron Wasilewsky, Discovery Success Lead, with: Malcolm Wanstall, Daniel Mccarthy, Leonardo Silva, Upul Senanayake, Astria Bothello, Nicholas Lillywhite, Kelly Tsang, Collin Zheng, Victoria Zheng, Vincent Chen, Dylan Dowling, Rupa Dutta, Di Gao, Ryan Hopson, Jun (Blake) Im, Tim Spencer
Country: Australia
Organization: Westpac
Description: Westpac is Australia's oldest bank and company, one of four major banking organisations in Australia and one of the largest banks in New Zealand. We provide a broad range of banking and financial services in these markets, including consumer, business and institutional banking and wealth management services. We also have offices in key financial centres around the world, including London, New York, and Singapore. Westpac Group's purpose is Helping Australians Succeed. It's what we do, who we are and why we come to work every day. What's most important to us is understanding what success means to our customers and helping them get there.
Awards Categories: Organizational Transformation, AI Democratization & Inclusivity

Challenge: As Australia's oldest bank, and one of the major banking organizations in the country, Westpac encountered the typical data science challenges of companies in the financial services industry:

1.
Complexity in processes
As a large, highly regulated organisation, it can be a complex process to get approval on a business idea and bring it into production quickly. We set out to tackle this issue to improve innovation and our ability to solve complex business problems. We not only had to provide a pathway to move forward with more velocity, but also make it an understandable pathway, with dedicated governance and documentation to fit within the industry's security and regulatory framework.

2. Change in ways of working
Data practitioners at Westpac are comfortable with SQL, which shaped an understanding that data comes in tables and products are like reports. This is a hurdle to work through, given how comfortable they are with this way of working, and the key is not only to show them new possibilities but also to communicate the value they will gain. Learning new tools and changing ways of working is extremely difficult, so we had to show them that we could build much bigger products and services, that data can come in any shape or form, and expand the very idea of what data can do, instead of it just being an answer to a current question.

3. Technological change
The former tool structure was very simple, but we needed new tools to achieve bigger outcomes, which added complexity. As highlighted above, people are naturally resistant to change, and renewing the structure is a very cumbersome process; hence the need for a central data platform to bring everyone together on this journey.

Solution: Dataiku played a key role in this transformation, in several ways:

1. New organizational structure leveraging Dataiku as the central data platform
When we implemented Dataiku in September 2020, we created the Discovery Lab which, with only 5 full-time and 5 part-time team members, supports 40 business labs and 120+ employees in their data science endeavors.
The Discovery Lab is structured into two teams:

- The Success Team focuses on business engagement, use case prioritization, onboarding and orientation in Dataiku, and supporting the delivery of data initiatives with the different labs.
- The Platform Team oversees building the data applications, as well as developing the capabilities to integrate new technologies within the security and regulatory framework of the organization.

With Dataiku as the central data science platform, this new setup helped break down the barriers between business and tech, so that both can seamlessly work toward building cutting-edge data products and services.

2. New collaborative, self-service operating model to upskill and drive closer alignment between business and tech teams
We also developed a new operating model and new processes to ensure strong alignment between the Lab and the business teams, while enabling them to broaden their understanding and gain new data skills throughout a project. When a new team member joins, we plan an orientation session so that they understand how the team can leverage Dataiku. They then go through the 'Discovery Suitability Assessment' to further define their use case, its objectives and expected impact, and how they visualize the outputs and outcomes of the initiative. This is key to understanding the potential value the use case can deliver to our customers. It enables us to assess where we can help, whether the use case aligns with our data strategy, and which resourcing gaps need to be covered to carry it through. Adopting this high-level, end-to-end project view also enables us to identify the business's needs in terms of data literacy, wrangling, and visualization, and to train them in these new skills. With this practical approach, they immediately see the benefits in their day-to-day job.
Dataiku's visual interface also acts as an enabler: users understand the whole data workflow, while being able to dig deeper into the more technical work in one click.

3. Documentation & reporting made easy to efficiently carry new data initiatives through production
Our main best practice is to document every request and action performed throughout the project for public knowledge, which not only helps with alignment and upskilling but has also enabled us to take on more and bigger undertakings. After the 'Discovery Suitability Assessment', the overall objectives, key metrics, and outputs are entered in a public Confluence page, along with the main developments and any issues or bugs encountered. Conducting each project in this transparent manner helps set the right expectations for what the Lab can achieve: being clear on what we're capable of doing, what we're working toward, and what we're building and when it will be delivered. It's critical to building trust with our business counterparts. Everyone can add and edit, and it's everyone's responsibility to keep it updated so that we move forward quickly and comfortably, while anticipating and mitigating potential risks. Besides the Confluence page, all requests and actions are logged in Jira so that stakeholders are aligned on progress. Dataiku's tracking and monitoring capabilities are a critical piece of this equation. The built-in functionalities accurately reflect the work being done by the team and all actions performed on the project, in a visual and easily accessible manner. We have also created dedicated tracking projects to automate reporting and gather high-level risk metrics on the projects conducted by all our 40 labs (examples below); this saves us precious time on formerly tedious reporting tasks and lets our team focus on the bigger, more interesting undertakings that will transform the organization.
[Dataiku dashboard tracking the number of weekly active users]
[Dataiku dashboard tracking our tickets and time to address them]

We have also implemented a range of continued support for our labs in the form of weekly drop-in meetings, 1-on-1 sessions, and Westpac-specific video training material, as well as continued engagement through announcements and our recent showcase competition.

Impact: This new organizational structure implemented with Dataiku has brought tremendous benefits in driving data transformation within the organization. The most striking aspect is the change in perspective, which is critical to enabling change. The new collaborative, self-service model enables the tech team to serve the business, help them upskill, and work together to drive innovation at scale. It is altogether a different attitude from a lot of the former processes that were in place! Breaking down the barrier between tech and the business has been our biggest success with Dataiku. We are now able to tackle bigger data projects and get them to production state in a much quicker, yet realistic, time frame, whereas many used to be stuck in ideation due to misalignment and tooling segmentation. As the central data platform, Dataiku enables us to gather all stakeholders around the table, regardless of their profile and data skills, which brings a diverse mix of perspectives to a project, helps everyone upskill, and eventually leads to more innovative and transformative data products and services. This structure and platform have enabled us to bring our 100+ users and 300+ analytics community members closer to the data, ensuring they are able to ideate and develop data-driven insights with the appropriate governance, support, and mindset.
Posted by camwasi | Last reply 07-13-2021 by JamesO

The Ocean Cleanup – Empowering Citizen Data Scientists Across the Organization
Name: Bruno Sainte-Rose
Title: Lead Computational Modeler
Country: Netherlands
Organization: The Ocean Cleanup, Stichting
Description: Every year, millions of tons of plastic enter the oceans, primarily from rivers. The plastic afloat across the oceans (legacy plastic) isn't going away by itself. Solving ocean plastic pollution therefore requires a combination of stemming the inflow and cleaning up what has already accumulated. The Ocean Cleanup, a non-profit organization, designs and develops advanced technologies to rid the world's oceans of plastic by means of ocean cleanup systems and river interception solutions.
Awards Categories: Data Science for Good, AI Democratization & Inclusivity

Challenge: At The Ocean Cleanup, data science needs to be applied not only to develop the technical solution for ridding the oceans of plastic, but also to maximize opportunities for funding and sustaining the broader organization. We started using Dataiku in 2018, as we were facing numerous data science challenges:

1. Data pipelines management
We lacked an adequate tool to manage data processing pipelines that would allow for ad-hoc data updating and processing with optimal computing time. Some of the data we were manipulating was faulty (in part because satellite transmissions shorten messages), and we were missing a tool for a quick scan through the data in order to work out the right approach to correct it. We were also missing a tool to automate the updating of our pipeline, especially accounting for specific triggers, while also allowing for dashboarding and reporting options.

2. Handling different formats and types of data
The data we manipulate can be structured or unstructured and comes in various formats from different providers.
As a consequence, being able to handle single .csv files and databases at the same time, and retrieving data from built-in and/or provider-specific APIs, was compulsory. Along with this diversity of form, we also manipulate data of different natures (scientific measurements, geospatial information, natural language, financial data, etc.), which also calls for a versatile data science processing solution.

3. Lack of a centralized platform
The AI/machine learning frameworks we came across were not very user-friendly and required too much expertise to be promoted internally. Finally, we were looking for a collaborative data science platform allowing multiple users with specific roles, rights, and access.

Solution: Thanks to Dataiku, we were able to address a large part of the aforementioned challenges. Through the Ikigai program, we were given access both to Dataiku as a platform at a company-wide level and to Dataiku staff and expert knowledge to support the implementation of our data science projects.

1. Empower people across the organization to gain insights
Having access to Dataiku allowed us to ramp up our data science analysis. The user-oriented, code-minimalistic approach of the Dataiku pipeline was a game changer for both our data pre-processing and post-processing steps. The extensive built-in operations to manipulate and prepare data made it possible for less programming-savvy staff to perform their usually very time-consuming operations; they felt like they were using a steroid-powered Excel! The built-in Git version control and the logging of each individual operation allow for a readable and sustainable project approach. The collaborative environment and the overall user experience allowed for company-wide adoption. As part of the Ikigai program for nonprofits, Dataiku provided support to enable users on the platform through trainings, project co-development, and support.
The Dataiku staff also contributed directly to developing some of the most innovative projects, including Emilie Stojanowski, Matthieu Scordia, Paul Hervet, and Jacqueline Kuo, among others.

2. Leverage & reuse data pipelines and features to save time
Regarding the product itself, Dataiku helped us better manage our data pipelines, so as to track what has been done and leverage accomplishments for future projects. We first started using Dataiku for the testing of our barriers in November 2018; less than a year later, we easily replicated the same data workflow for a new test campaign, leveraging these new efficiencies to spend more time developing features. In November 2020, during a campaign in the North Sea, our engineers only needed a quick Dataiku training to be able to reuse the previous data pipelines and features, so they could focus their time where they add the most value.

3. Adapt to different data (in format & type) and use cases as we expand
Thanks to its great versatility, Dataiku enabled us to connect to many different data formats, APIs, plugins, etc. This is paramount as we handle data of different natures as well, and the platform's capabilities are key to adapting. For instance, visual pre-treatment features allow us to identify when satellite data is cut off, without needing to complete the entire preparation process, and filter it out, which saves much-needed time and resources for our nonprofit organization.

Impact: As a non-profit organization, our main key performance indicator is the quality/time ratio of the tools we use. In other words, our biggest objective is to have the most reliable yet versatile data science platform to efficiently conduct data projects and create the biggest impact across the organization. Dataiku enabled us to dramatically improve this KPI through several levers:

1.
Improved operational efficiencies to focus resources on innovation
Before, we had to extract data from SQL databases, aggregate it, build interpolations, etc., with a combination of Excel and MatLab. Dataiku enabled us to centralize the whole workflow, while all practitioners can still work with the technologies they are used to, making us move faster and go further into the most innovative parts.

2. Gather everyone on the same platform for quicker decision-making
Thanks to the visual interface, Dataiku enables both technical and non-technical stakeholders to understand the data workflow and the success metrics of the projects developed. This allows us to make quicker decisions, for instance regarding the success of specific campaigns, and to make adjustments on the go to meet our goals.

3. Easy onboarding to bring in more people to better fit projects' needs
The user-friendly interface makes it easy to onboard new people onto the platform. The learning resources, as well as the catalog of events and content, give people a broad perspective on data science topics. Our core data science team is aided by five times as many people across the organization who have been given access on a temporary basis to bring their expertise to various projects.

4. Visual "recipes" enable everyone to bring in their skills and shorten the time-to-insight
The visual features for data wrangling and visualization enable everyone to contribute their skills to successfully conduct data projects. Even the project managers and director are able to draw insights from the data at hand.

5. Democratize data science through a versatile, all-in-one platform
Building upon all the levers described above, Dataiku's biggest impact lies in the democratization of data science across the organization.
From the original technical testing project, we've expanded usage to finance (understanding fundraising dynamics to increase the impact of our campaigns) and communications (optimizing social media content and timing to maximize visibility, and therefore improve fundraising abilities). By bringing everyone together on the same platform and fostering the rise of "citizen data science", Dataiku enabled us to embed data science across the organization to create more value toward fulfilling our mission.

Posted by BrunoTOC

HES-SO - Teaching the Next Generation of Chief Data Officers with Dataiku
Team members: Cédric Gaspoz, Professor UAS; Dominique Genoud, Professor UAS
Country: Switzerland
Organization: University of Applied Sciences and Arts Western Switzerland (HES-SO)
Description: HES-SO is a network of 28 schools of higher education offering degree programmes in six key fields to some 21,000 students. Our universities play a key role in the social, economic and cultural development of each of western Switzerland's seven cantons. The Master of Science in Business Administration (MSc BA) gives students the opportunity to develop the understanding of management they acquired during their Bachelor's course and specialise in a fast-growing area of competence.
Awards Categories: Excellence in Teaching

Challenge: To accompany our students and their future employers in the digital transition, we have thoroughly revised our business intelligence courses. Our students must not only be able to analyze data, but also become information producers, with all the steps that this includes. During the master's three BI courses, we start by refreshing the students' knowledge of R before beginning the discovery of data science, which takes us from data acquisition to deep learning. We were able to introduce the students to the different types of model training and evaluation using the metrics and data-splitting approaches commonly used in machine learning.
The built-in graphical explanations of the results greatly facilitated the understanding of model tuning. Another challenge we wanted to address with this redesign was the production stage. Curricula often stop at model training and evaluation. However, from a business point of view, it is only when models are deployed that we start to create value. It was therefore important to see concretely how to use the models to support business processes. When redesigning these courses, we faced several challenges:

- Multiple tools to implement depending on the language (R or Python)
- Tools dedicated to only one part of the workflow (data cleaning, machine learning...)
- Lack of tools for releasing models into production
- Lack of understanding of the metrics used to check the quality of models in production
- No possibility of collaboration between students on the same project
- Feedback and corrections taking a lot of time (file transfer between students and teachers)
- IT support needed for multiple tools

The lack of integration between the tools also prevented us from successfully proposing integrative pedagogical scenarios, because it was difficult for several people to actively collaborate on the same task.

Solution: During our review of different tools, we had the opportunity to test Dataiku. The ability to support all phases of the lifecycle, as well as the integration of notebooks, convinced us to pursue discussions with the Dataiku academic team. The most important weakness was the absence of the API services in the academic offer, which Dataiku finally integrated into its offering. Our Dataiku instance has been deployed in our global infrastructure on Azure and is perfectly integrated into our processes (incl. provisioning, authentication...). After one year of classes, we have 109 users in 39 groups who have produced 646 projects, 2,491 recipes, 614 notebooks, 752 models, and 67 API services.
This usage includes exercises and work done in class, individual projects, group projects, a hackathon, and some master's theses. Using the Dataiku API allows us to efficiently create projects, assign rights, track progress, and evaluate results. As teachers, especially during the pandemic year, Dataiku allowed us to support all teaching activities. The students' first encounter with Dataiku was through the R notebooks. While revisiting statistical basics and the R language, they started to use the dataset features. Then, during a day led by a data scientist from Dataiku, the students discovered data preparation and classification with integrated R recipes. As the weeks went by, we introduced more advanced notions, finishing with image recognition using deep learning. Finally, we saw how to publish a model using the API services and integrate it into a simple webapp, also created in Dataiku. Various group projects allowed the students to put their knowledge into practice on datasets tied to concrete business problems (e.g. sales prediction, churn, audit, mortgages). The groups performed all the tasks of the lifecycle: data preparation, feature creation, feature selection, model training, hyperparameter selection, selecting the best model, deploying the model on the API services, and querying the model with a webapp.

Impact:

The course content is organized around the tasks of the CDO (Chief Data Officer). In collaboration with CDOs, who are also involved in the course, we have defined 26 user stories that cover all aspects of the function. While the theoretical aspects are covered in a more traditional way, the practical aspects are realized in Dataiku. Thus, without being data scientists, the students had the opportunity to concretely explore the different aspects of the job and to implement them through various use cases.
This allows them to specialize in data science or in managerial functions where they will be required to manage the different aspects of data projects within multidisciplinary teams. Because of the importance of practice during the courses, we have adapted the assessments to put students in situations close to reality. At the end of the course, a hackathon was organized with the goal of developing a webapp for investors, allowing them to identify financing opportunities based on a dataset describing the success of startups across financing rounds. Over 10 hours, groups of students from the management and computer science departments had to complete 15 tasks (data balance analysis, feature creation with R, subpopulation analysis, ...) and produce 10 deliverables (map of failures and geographical disparities of factors, evaluation of model results, ...). A board meeting was also organized in the middle of the day to review the intermediate results and distribute new data. It should also be noted that the 10 groups (40 students) had to work remotely due to health restrictions, which would not have been possible without a tool such as Dataiku. At the end of the day, the groups were able to present their results and demonstrate their webapp based on the best trained model.

After the first iteration, student satisfaction was very high and Dataiku was quickly adopted; several students chose to do their master's thesis using Dataiku. Quotes from the course evaluation:

* “Really cool to have discovered and tested R and Dataiku. Thanks for the Hackathon experience and the whole organization!”
* “Very rich content, intervention of a Dataiku data scientist.”
* “Very interesting material and dynamic presentation, good alternation between theory and exercises.”

Our students' knowledge of Dataiku allowed us to propose master's thesis subjects involving advanced machine learning algorithms.
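As an aside, the bulk project provisioning and rights assignment mentioned earlier can be scripted against Dataiku's published Python client (`dataikuapi`). The sketch below is a minimal illustration under assumptions of our own: the host URL, API key, course name, and key-naming scheme are hypothetical, not HES-SO's actual setup, and the live API calls are shown only as comments.

```python
# Sketch: deriving one DSS project per student group, then (optionally)
# creating them via dataikuapi. Naming scheme and credentials are assumptions.
import re

def project_key_for(course: str, group: str) -> str:
    """Derive a DSS-legal project key (uppercase alphanumerics/underscores)."""
    key = f"{course}_{group}".upper()
    return re.sub(r"[^A-Z0-9_]", "_", key)

def provisioning_plan(course, groups):
    """Return (project_key, project_name) pairs, one per group."""
    return [(project_key_for(course, g), f"{course} - {g}") for g in groups]

# Against a live DSS instance, each plan entry could then be created with
# Dataiku's public client (illustrative, not our exact code):
#
#   import dataikuapi
#   client = dataikuapi.DSSClient("https://dss.example.edu", "API_KEY")
#   for key, name in provisioning_plan("BI3", ["group01", "group02"]):
#       client.create_project(key, name, owner="admin")
```

The same client can then be used to inspect project contents and collect deliverables for grading, which is what makes tracking 39 groups tractable for a small teaching team.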
As the Dataiku tools were sufficiently well understood, many students chose subjects involving machine learning, ranging from sleep-cycle analysis to object recognition on geographical maps and face recognition. They will all use the features provided by the Dataiku framework.

Number of distinct users per day - showing a strong interest in Dataiku, even outside of class!

Posted by cgaspoz

Epsilon DX Machine Learning - Leveraging Dataiku to Build a Real-time Decision Engine

Team members: Mark Sucrese (VP of Marketing Sciences), with Kevin Ng, Ravi Nagabhyru, Ben Eubank, Felice Brezina, Raghavan Kirthivasan, Kevin Elwood, Ben McVay, and Wayne Townsend.
Country: United States
Organization: Epsilon DX Machine Learning
Description: Epsilon DX is an organization focused on delivering value through partnerships, engineering, creativity, strategy, software implementation, and best-in-class managed services. We work with our clients to help them take on the challenges of today, tomorrow, and beyond. We have deep expertise in partnerships with Adobe, Salesforce, Dataiku, IBM, Microsoft, and Sitecore, which we combine to bring our clients value not seen elsewhere.
Awards Categories: Value at Scale

Challenge:

We were asked by a large US retailer to create a universal decision engine that leverages modern machine learning technology to create omnichannel personalized experiences that can scale across the enterprise for digital and non-digital customer engagements. Areas of focus for personalization are:

* Product and offer recommendations
* Optimized content and messaging
* Improved and targeted pricing
* Ability to reduce fraud

The brand wants to ensure that model development can be leveraged by both data scientists and business analysts in a collaborative way, and can go from development to production in a short amount of time.
Lastly, the brand needs improved model transparency and interpretability to ensure compliance with its legal, data, IT, and marketing teams.

Solution:

Epsilon worked with Dataiku to build a real-time decision engine, leveraging Dataiku for model development, workflow, and execution; the automation node for job scheduling and monitoring; and the API node for integrated services to the various applications for batch and real-time processing. We integrated this solution into the brand's enterprise CRM and email marketing applications to deliver hyper-personalized email experiences. As each email campaign is generated by the marketing teams, the system calls our environment to return the next best actions for things like product recommendations, offers and promotions, best content, and best subject lines. This sense-and-respond environment ensures that no two emails are ever the same, and each one is uniquely personalized for every customer profile.

Impact:

* The machine learning test groups have outperformed the control groups by 47% for revenue per email open.
* The machine learning test groups have driven a 20% lift in conversion rates over the control groups for all emails that were opened.
* To date, the program has generated ~31k in net new revenue week over week.
* Created a first-of-its-kind content optimization delivery system using deep learning and computer vision models in Dataiku.

Posted by msucrese

Pr. Kurnicki (Hult International Business School) - Frictionless Data Mining for Advanced Learning

Name: Thomas Kurnicki
Title: Professor
Country: United States
Organization: Hult International Business School
Description: Hult International Business School is a new kind of non-profit business school that constantly innovates to meet the needs of students, employers, and society in a world that is changing faster than ever before.
More than a business school, Hult is a dynamic and multicultural community that educates, inspires, and connects some of the most forward-thinking business talent from around the world.
Awards Categories: Excellence in Teaching

Challenge:

Before using Dataiku, our Data Mining class was based on 5 different data environments and IDEs, including SQL Server, MongoDB, Hadoop, and ArangoDB. Students had to navigate from one access point to another. They would report multiple tech issues in the different systems, and our teaching assistants spent a lot of time troubleshooting.

Solution:

Dataiku helped us consolidate the environments. Instead of using 5 different environments (accessible from different apps), we used Dataiku as an all-in-one platform where students could seamlessly move data between the different environments. In the end, students used a single piece of software, Dataiku, throughout the entire class.

Impact:

The Data Mining class allows students to explore many different data environments and learn the advantages and disadvantages of using a particular data environment for a given business case. Students learn when they should use a SQL, NoSQL, or Hadoop environment and how to make that decision. By enabling them to switch in just a few clicks, Dataiku removes the friction and lets them quickly compare and assess the most relevant technology for their project - saving much-needed time and headaches. In the real world, many data teams need to connect to multiple data sources to collect required data - this class shows students how to do that.
Posted by LisaB

MandM Direct - Managing Models at Scale to Deliver Faster Insights

Team members: Ben Powis - Head of Data Science; Joel Lenden - Junior Data Scientist; Tobi Osinowo - Data Scientist; Jim Taylor - Data Analyst; Oisin Devitt - Data Analyst
Country: United Kingdom
Organization: MandM Direct
Description: Our journey began in 1987 with founders Mark Ellis and Martin Churchward (the 'two Ms') selling end-of-line sports products directly to customers in the UK. Now, more than 30 years on, we're one of Europe's leading online off-price retailers, with over 2 million active customers. We have dedicated local market websites in Ireland, Germany, France, the Netherlands, Denmark and Poland, as well as dispatching to another 20+ countries worldwide. Our success is down to our commitment and passion for seeking out the biggest fashion, sport and outdoor brands at unbeatably low prices all year round, to make sure you get even more for your money.
Awards Categories: AI Democratization & Inclusivity; Value at Scale

Challenge:

MandM Direct is one of the largest online retailers in the United Kingdom, with over 3.5 million active customers and seven dedicated local market websites across Europe. The company delivers more than 300 brands annually to 25+ countries worldwide - which means that in 2020, we grew fast. This meant more customers and therefore more data, which magnified some of our challenges:

Getting all the available data out of silos and into a unified, analytics-ready environment: The core data team is made up of four people (two data scientists and two data analysts), but we extend our reach by leveraging a hub-and-spoke model for our data center of excellence, meaning we work with analysts embedded across the business lines to scale our efforts. However, this requires an easy way, not necessarily involving code, to enable those teams to leverage data to answer business questions.
Scaling out AI deployment in a traceable, transparent, and collaborative manner: MandM's first machine learning models were written in Python (.py files) and run on a data scientist's local machine, and we needed a way to prevent interruptions or failures of the machine learning deployments. In an attempt to tackle this second challenge, our team moved these .py files to Google Cloud Platform (GCP), and the outcome was well received by the business and technical teams in the organization. However, once the number of models in production grew from one to three and more, we quickly realized the burden involved in maintaining them. There were too many disconnected datasets and Python files running on the virtual machine, and we had no way to check or stop the machine learning pipeline.

Solution:

We turned to the powerful combination of Dataiku and GCP to answer these critical challenges. With Google BigQuery's fully-managed, serverless data warehouse, we were able to break down the data silos and democratize data access across teams. MandM Direct was one of the first online retailers to implement Google BigQuery across the organization. At the same time, thanks to Dataiku's visual and collaborative interface for data pipelining, data preparation, model training, and MLOps, our team could also easily scale out the models in production without failures or interruptions - all in a transparent and traceable way. MandM now has hundreds of live models doing everything from scoring customer propensity to generating pricing models, all with visibility into model performance metrics, clear separation of design and production environments, and many more MLOps capabilities built into the platform. Teams can now easily push down and offload computations for both data preparation and machine learning to GCP. Using Dataiku means this capability is accessible to all user profiles across the organization, without needing to know the underlying technologies or complexity.
We love the flexibility offered by Dataiku. We have a mix of people who lean toward AutoML and visual tools, as well as one data scientist who loves to work in code. That's the beauty of the platform and why we chose it: we didn't want a low-code tool where we could get lazy and just click a few buttons. Now the team has the best of both worlds: if they want to nerd out and go under the hood, they can do that. If they need a quick model, they can do that too.

Impact:

The benefits we have seen from using Dataiku and GCP aren't limited to time saved from tedious maintenance work - we're also having more impact across the business. Since we began our journey with Dataiku in January 2020, 54 projects have been created, handling 1,171 different datasets and orchestrated by 53 different scenarios that make sure our models build only when the data is available and validated. We have 9 large projects deployed to an automation node, solving complex business problems or providing advanced insight on a daily basis. Our data team is now able to deliver a variety of solutions to business problems from adtech to customer lifetime value, whether that's a dashboard, a more detailed piece of analysis, or a machine learning project deployed in production. For example, business users in the buying and merchandising teams can interact with machine learning models in their day-to-day work through Dataiku applications, which provide a non-technical interface for projects developed by the data team. We've also built out a feature library with Dataiku that contains more than 400 features specific to MandM's business. The feature library is now the first place people go, a sort of shop window for machine learning projects, and it takes away the monotony and repetition of their work.
Having a platform like Dataiku allows our data scientists to focus on building cool things, not spending hours and hours on maintenance and making sure things are running. With workflows deployed in Dataiku, we save days of work every month.

Posted by ben_p

Template Submission - Predicting the Sakura Blooming Day

Name: Makoto Miyazaki
Title: Data Scientist
Country: France
Organization: Dataiku
Description: Dataiku is the world's leading AI and machine learning platform, supporting agility in organizations' data efforts via collaborative, elastic, and responsible AI, all at enterprise scale. Hundreds of companies use Dataiku to underpin their essential business operations and ensure they stay relevant in a changing world.
Awards Categories: Alan Tuning

Challenge:

Sakura, the world-famous cherry blossom of Japan, blooms every year in the spring. It is a world-renowned attraction, and many people travel from afar to witness its wonders. However, sakura blooms only for a short period of time: seven days after the flowers open, they already start to scatter, so many people simply miss it. As a Data Scientist at Dataiku, I took it as a challenge to build a prediction model for the bloom of Sakura using Dataiku DSS - and see if I could obtain more accurate predictions than other websites!

Solution:

Dataiku enabled me to automatically update the prediction on a daily basis, thanks to the scenario automation feature of Dataiku DSS: every day at 2 a.m., a Python recipe scraped the previous day's weather information for the three cities and updated the predictions, as in the chart below. The two other main forecasting websites, tenki.jp and the Japan Weather Association (JWA), updated their predictions only once a week and once every two weeks, respectively. Daily updates are a big plus for producing accurate forecasts of a precise blooming day!
My Dataiku DSS flow can be seen below. It consists of two zones: data preprocessing and machine learning.

Data preprocessing

* Inputs: daily weather data from 1991 until today in the three cities, as well as the historical blooming days from the past 30 years, plus daily weather data scraped from the Japan Meteorological Agency using a Python recipe, including average, highest, and lowest temperature, precipitation, and daylight hours.
* Feature generation with a window recipe: generating rolling averages over the past one month, three months, and six months for each of the weather-related variables, for each of the three cities. I also took the average of the blooming days from previous years for each city, assuming that the blooming day does not differ much from year to year.

Machine learning

The flow includes two Random Forest models, each scoring its own dataset; the two scored datasets are then combined to create a single prediction result. I built it this way because I set the target variable to “number of days until blossom.” This target variable takes a value between 0 and 365 (or even more), but I wanted the model to treat it as a cyclical variable, so that it could correctly assess the error. For this, I scaled the variable to a range of 0 to 2π, then decomposed it into sine and cosine: one model predicted the sine value, the other predicted the cosine value. I then combined the prediction results and converted them back to a day unit.

Impact:

I ran my prediction, humorously called ‘Random Sakura Forest’, for three cities in Japan: Oita prefecture (southern Japan), Aomori prefecture (northern Japan), and Tokyo. My predictions were a few days behind the two other forecasting websites: Sakura in Tokyo bloomed on March 15, so my prediction was already proven wrong - but both of the other forecasting websites also missed the forecast, although to a smaller extent.
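As an aside, the window-recipe rolling averages and the sine/cosine treatment of the cyclical target described in the Solution can be sketched in a few lines of Python. This is a minimal illustration under assumed column names and window sizes, not the actual project code.

```python
# Sketch: rolling-average features and cyclical target encoding/decoding.
# Column name "avg_temp" and the 30/90/180-day windows are assumptions.
import numpy as np
import pandas as pd

def add_rolling_features(df, col="avg_temp", windows=(30, 90, 180)):
    """Rolling means over roughly 1, 3, and 6 months of daily data,
    mimicking what the window recipe produces."""
    for w in windows:
        df[f"{col}_roll_{w}d"] = df[col].rolling(w, min_periods=1).mean()
    return df

def encode_cyclical(days, period=365.0):
    """Scale 'days until blossom' to [0, 2*pi) and split into sine/cosine,
    so that day 364 and day 1 end up close together on the unit circle."""
    angle = 2 * np.pi * (np.asarray(days, dtype=float) % period) / period
    return np.sin(angle), np.cos(angle)

def decode_cyclical(sin_pred, cos_pred, period=365.0):
    """Combine the two model outputs back into a day count."""
    angle = np.arctan2(sin_pred, cos_pred) % (2 * np.pi)
    return angle * period / (2 * np.pi)
```

One Random Forest would then regress the sine component and another the cosine component; `decode_cyclical` merges their predictions back into a day count, matching the two-model flow described above.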
However, my predictions were closer for Oita, which blossomed on March 24 (I predicted March 22, while the other websites had March 15). I was also more in line, together with JWA, for Aomori, which bloomed on April 14 (I predicted April 21). A random forest with 500 trees and a maximum depth of 100 yielded the best result, and I was able to reduce the error to four days. One interesting finding is that the model favored only the temperature-related features; all the other features, such as precipitation and daylight hours, had very little impact on the result.

In Japan, forecasting the Sakura blooming day makes daily news headlines throughout spring. Hence, since the 1950s, many methodologies have been proposed, including multiple regression analysis. Nowadays, most Sakura blossom forecasters use a formula based on a method developed in 2003 by Yasuyuki Aono, Associate Professor at Osaka Metropolitan University. Aono's approach is unique in that it is composed of two parts that incorporate the biology of Sakura trees. First, it computes a D-day, when the trees wake from their winter dormancy. This D-day is computed from a place's latitude, distance from the seashore, and average temperature during January and March, and therefore depends on the location. What the Aono method tells us is that the blooming day depends solely on a place's geographical position and its temperature, which is indeed consistent with my prediction result!

Posted by Makoto

2021 WINNERS & FINALISTS

Discover the winners and finalists of the 2021 edition, and read their stories to learn about their pioneering achievements in data science and AI!
Use the following labels to filter by award category:

* Organizational Transformation
* AI Democratization & Inclusivity
* Data Science for Good
* Responsible AI
* Value at Scale
* Excellence in Teaching
* Excellence in Research
* Alan Tuning

ORGANIZATIONAL TRANSFORMATION
Recognizing individuals and organizations who are building the foundations of a data-centric culture with Dataiku.

* WINNER: Building an Intelligent Data Operations for Financial Planning and Performance Management (Standard Chartered Bank) - Craig Turrell, Head of Digital Centre of Excellence P2P
* FINALIST: Using Dataiku to Democratize AI Within the Organization (Schlumberger) - Valerian Guillot, Nerve Center Data Science Architect
* FINALIST: Empowering a Data-driven Organization to Improve Astronomical Operations (ALMA Observatory) - Ignacio Toledo, Data Analyst

AI DEMOCRATIZATION & INCLUSIVITY
Revealing the outstanding work of individuals and organizations who are leveraging Dataiku to enable all people to gain insights from their data.

* WINNER: Using Dataiku to Democratize AI Within the Organization (Schlumberger) - Valerian Guillot, Nerve Center Data Science Architect
* FINALIST: Building Self-service NLP for Analysts Worldwide (Unilever) - Linda Hoeberigs, Head of Data Science and AI, PDC Lab, & Ash Tapia, Data Partnerships & Tools Stack Manager
* FINALIST: Developing a Scalable Digital Voice of the Consumer Capability (Unilever) - Anand Patel, Analytics Manager

DATA SCIENCE FOR GOOD
Turning the spotlight on the best use of Dataiku by nonprofits, companies, and individuals, to make a positive impact on the world.

* WINNER: Bringing Workers Home Safe Through Defect Detection (Atlantic Plant Maintenance) - Aaron Crouch, Data Analytics Manager
* FINALIST: Empowering Citizen Data Scientists Across the Organization (The Ocean Cleanup) - Bruno Sainte-Rose, Lead Computational Modeler
* FINALIST: Helping Nonprofits Leverage Insights From Their Data (41xRT) - Tom Brown, Non Profit Data Science & Analytics Advocate

RESPONSIBLE AI
Highlighting the individuals and organizations who are using Dataiku to develop foundational AI for the future that is governable, sustainable, transparent, and free of unintended bias.

* WINNER: Designing a Responsible, Self-service Tool for Natural Language Processing (Unilever) - Linda Hoeberigs, Head of Data Science and AI, PDC Lab, & Ash Tapia, Data Partnerships & Tools Stack Manager
* FINALIST: Building a Feature Store for Quicker and More Accurate Machine Learning Models (Premera Blue Cross) - Marlan Crosier, Senior Data Scientist
* FINALIST: Talent Acquisition Enablement with Machine Learning (Schlumberger) - Modhar Khan, Head of People Analytics

VALUE AT SCALE
Showcasing the pioneering individual and organizational use of Dataiku to manage the full lifecycle of models and pipelines, and deliver value at scale.
* WINNER: Dynamic Audit Planning Through Machine Learning-based Risk Assessment (Royal Bank of Canada) - Masood Ali, Senior Director, Data Strategy & Governance
* WINNER: Reducing Detection Time of Manufacturing Issues with Real-time Automated Process Control (NXP Semiconductors) - Adnan Chowdhury, Manufacturing Quality Engineer
* FINALIST: Streamlining & Augmenting the Well Evaluation Process at Scale (Schlumberger) - Rasesh Saraiya, Data Scientist
* FINALIST: Leveraging AI to Democratize Insights From Customer Feedback (Malakoff Humanis) - Nikola Lackovic, Data Scientist
* FINALIST: Human-centered Machine Learning for Dimensioning Resources in Telecoms (Ericsson) - Marcial Gutierrez, System Manager

EXCELLENCE IN TEACHING
Recognizing members of the teaching faculty for their invaluable contribution to educating the next generation of data science talent with Dataiku, driving innovation in the field and aligning with real-world use cases.

* WINNER: Teaching the Next Generation of Chief Data Officers with Dataiku (HES-SO) - Cédric Gaspoz and Dominique Genoud, Professors UAS
* FINALIST: Data Analysis Bridges Finance Theory and Practice (Columbia University) - Perry Beaumont, PhD, Lecturer
* FINALIST: Dataiku as a Leading User-Friendly Data Science Platform for MBA Students (INSEEC U.) - Linda Attari, Director of MSc 1 Data Management and MSc 2 Data Analytics
* FINALIST: Facilitating & Enhancing the Data Science Learning Experience with Dataiku (Live University) - Fernando Enobi, Professor
* FINALIST: Dataiku as a Versatile Platform for BI & Beyond (Hochschule Hannover) - Dr. Maylin Wartenberg, Professor

EXCELLENCE IN RESEARCH
Starring academic researchers who are leveraging Dataiku to gain impactful insights from their data and push the frontiers of our knowledge.

* WINNER: Mapping Police Fatal Encounters to Inform Future Policy (University of Michigan) - Frank Romo, Master of Urban Planning Researcher, & Harley Etienne, Professor
* FINALIST: Streamlining Data Workflows for Clinical Research (Hospital de Clínicas de Porto Alegre) - Tiago Andres Vaz, Head of AI
* FINALIST: Software Analysis Execution Process Improvement and Prediction Program (Leidos) - Karen Cheng, Data Scientist

ALAN TUNING
Rewarding the pioneers who are pushing the boundaries of Dataiku to build innovative projects - including for fun!

* WINNER: Leveraging AI to Democratize Insights From Customer Feedback (Malakoff Humanis) - Nikola Lackovic, Data Scientist
* FINALIST: Software Analysis Execution Process Improvement and Prediction Program (Leidos) - Karen Cheng, Data Scientist
* FINALIST: Streamlining & Augmenting the Well Evaluation Process at Scale (Schlumberger) - Rasesh Saraiya, Data Scientist
* FINALIST: Building an Emotion Classification System on Videos (IME) - Mohamed AbdElAziz Khamis Omar, Senior Data Scientist

© 2012-2022 Dataiku. All rights reserved.