community.dataiku.com
2600:9000:2491:6800:1:9db:4040:93a1  Public Scan

Submitted URL: https://pages.dataiku.com/e3t/Ctc/GA+113/cfvmy04/MX7RCxG68F-W1K9m4m4gwCcBW3QbRmh4N1D4NN4313K53q3pBV1-WJV7Cg-3QMh_shM2xftKW...
Effective URL: https://community.dataiku.com/t5/Dataiku-Frontrunner-Awards/tkb-p/Awards?utm_campaign=Dataiku%20Frontrunner%20Awards%202022&ut...
Submission: On August 08 via api from DE — Scanned from DE

Form analysis: 3 forms found in the DOM
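A form inventory like the one below can be reproduced from the captured DOM in a few lines of Python. A minimal sketch using BeautifulSoup (an illustrative choice; this is not necessarily how urlscan itself extracts forms):

from bs4 import BeautifulSoup  # pip install beautifulsoup4

# "dom.html" is assumed to hold the captured DOM of the scanned page.
with open("dom.html", encoding="utf-8") as fh:
    soup = BeautifulSoup(fh.read(), "html.parser")

for form in soup.find_all("form"):
    name = form.get("name") or form.get("id") or "(unnamed)"
    method = (form.get("method") or "GET").upper()
    print(f"Name: {name}  {method} {form.get('action', '')}")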

Name: form_3bb9fcc7484389
POST https://community.dataiku.com/t5/tkb/v2/page.searchformv32.form.form

<form enctype="multipart/form-data" class="lia-form lia-form-inline SearchForm" action="https://community.dataiku.com/t5/tkb/v2/page.searchformv32.form.form" method="post" id="form_3bb9fcc7484389" name="form_3bb9fcc7484389">
  <div class="t-invisible"><input
      value="blog-id/Awards/q-p/dXRtX2NhbXBhaWduOkRhdGFpa3UrRnJvbnRydW5uZXIrQXdhcmRzKzIwMjI6OnV0bV9tZWRpdW06ZW1haWw6Ol9oc21pOjIyMTA3ODA5NTo6X2hzZW5jOnAyQU5xdHotX2FkZWhpUWdJanUwaGZpQ0poWnVFakNtSUMxcmRQNlhhSlNrYWN5MnFJdkZGMEoySC1Mb28xU3FpM2lyVHFhTUhlZ3ZhVWZURHo4SVBLejc5Qi1Ec1dZZVNVRXJuWmdCVnNicGpzalJ3LW5NaElOSjA6OnV0bV9jb250ZW50OjIyMTA3NzQwNDo6dXRtX3NvdXJjZTpoc19lbWFpbA.."
      name="t:ac" type="hidden"><input value="search/contributions/page" name="t:cp" type="hidden"><input
      value="KPDW8ZSwAIFqo03E6yOAINvqzsqLo6nMUoFTCo0lsO2L8MrzJ7WQFnt40pH2Y1T1eO9fjKGyy1AkcmdtyIA4i69oaunFB6S7W8S3r-cHeRdaXd_1zsri5OELal9hDZjWpjASzq3c3lW_MFsgl8snT1Bpov6t7vY5mccbxEPH9-6Y8pdzsY-rdguyeYcPv2PfyQuyEdEFqK9Has-FLCfe7qx2O5xRLxHi7x-HW28ZK7vT1B8GRRPyJMnP7dukPIoVrcAOYpTIFRXb1sKnTbUEeSaNK-_Ac0uWcFKaJIYBmEgH0ioZ2yDhBiiS0ha1QR5fKKZZM976rH9q6uqnfoeNhj1H_XMlbiSJz_garAq-NPElh1RrErEcYRQugygnQRrZ4AJ2QVXhNeoMVHQxrq6fOjE4qLeiWvwPSmM0JakR6Akybl7_7Nee8OjrMa0JlnNWvSaGtLxIMcE40zNhJhRx0PBANb0h42JwvjbKRXBqx2tUWG9Eq3hTY54ykXIA4fc_EtE8EMu_cja_3E41ezIg3iGuj1nZaCVkCHxKnX7MFrggakRO9F4kwbVahdoLQ0PJ9k8OwvoauqotaIo5GP_A89FO3K4sEshBwTDsmfQOIHZKTS7eNFdMwpNDDEhCzdepn6XtehHfwM-lml20OtSLdtumVAcosiFkwOWgT2zioYgtdNKangkPDgg4r8MOKZS78Hi8X2TiTJpQMJrhZvQdP-aHkLznfWStcULKSmjVNKuaxXdIeha94QHLvz9nsll4evaa9CbPTm_aJkdDZt96vCK3eOtv__fJcpYoav5ue1D5WEiEQZKND1gwGf_ytDVvkAdnq1ip9qxOxy0Y4x3lxcV2oEnxuplLAK_KVcjT0o3ymB5o6MKuv1k4q5JXMOAeyVBWu7DRUYvbW3bnXBy0PqJ4Jy8vMzB2zqQnO4VJbfJD-dUPeip3d28Ox9x01fnFNUnAoo1_SkzFzkNj1TGBgkCCML9BiUuWJeh6_VJcSweJdZ1ndQH0hsCtL-961kvr6qQUsRaAejgQ6JRf296EJGWmSYlu5IaCeMTbaGWgXtjx2ulTaUZYb7dLSAPedTuneirwEqfmIO81xHm1O1t-QS-5MGOmeRTVfWKaLEBYjCHAzLbwF_xKcNh__ldTTJYUwF32ELnLzriZO-5fQP1XRmio8nqDYXbyUq6CSJlk6bz4Yp6yU82XaHm_KwbwrYq5m3MxWE9WuBy6yAYO8MTQbKf2GmpH_0ckpJj3TZ1un-Jjd771QP558YL9KrPtpkMQU2r41p1xa8RTAdlpqOfp-p8SDsyTji6SySG4Zdx0kxM."
      name="lia-form-context" type="hidden"><input value="TkbPage:blog-id/Awards:searchformv32.form:" name="liaFormContentKey" type="hidden"><input
      value="5DI9GWMef1Esyz275vuiiOExwpQ=:H4sIAAAAAAAAALVSTU7CQBR+krAixkj0BrptjcpCMSbERGKCSmxcm+kwlGrbqTOvFDYexRMYL8HCnXfwAG5dubDtFKxgYgu4mrzvm3w/M+/pHcphHQ4kI4L2dMo9FLYZoM09qbeJxQ4V0+XC7e/tamqyBPEChwgbh1JAjQtLIz6hPaYh8ZlEMaxplAvm2KZmEsm0hhmBhOKpzZzOlsEw8LevR5W3zZfPEqy0oJIYc+eCuAyh2rolfaI7xLN0I8rjWfWBj7CuzJvf5osmbxRN3hacMimNwHRtKSOr0XNnv/vx+FoCGPjhMRzljhNLYHrEt9kA5T08ACCsKvREoYuqxqLl8BLO84q4UcMITcG49y/QOGs1pYyESl5p6V6qwRW086rinVmoxMZsiZud/zBUTc6gmVc4kExkJafmcYG1GM9+wfIsCkf2OP54hal5EjnG54z8h0XhjfcF7wQUs5Kz0GTjU2rOjc/llTT4Au07pDOcBQAA"
      name="t:formdata" type="hidden"></div>
  <div class="lia-inline-ajax-feedback">
    <div class="AjaxFeedback" id="feedback_3bb9fcc7484389"></div>
  </div>
  <input value="-z4vhJDU0069q5UfIwniXfSRaFiWvTv0h0bC8O5Z7-Q." name="lia-action-token" type="hidden">
  <input value="form_3bb9fcc7484389" id="form_UIDform_3bb9fcc7484389" name="form_UID" type="hidden">
  <input value="" id="form_instance_keyform_3bb9fcc7484389" name="form_instance_key" type="hidden">
  <span class="lia-search-granularity-wrapper">
    <select title="Search Granularity" class="lia-search-form-granularity search-granularity" aria-label="Search Granularity" id="searchGranularity_3bb9fcc7484389" name="searchGranularity">
      <option title="All community" value="gqmyn45884|community">All community</option>
      <option title="This category" value="Programs|category">This category</option>
      <option title="Knowledge base" selected="selected" value="Awards|tkb-board">Knowledge base</option>
      <option title="Users" value="user|user">Users</option>
    </select>
  </span>
  <span class="lia-search-input-wrapper">
    <span class="lia-search-input-field">
      <span class="lia-button-wrapper lia-button-wrapper-secondary lia-button-wrapper-searchForm-action"><input value="searchForm" name="submitContextX" type="hidden"><input class="lia-button lia-button-secondary lia-button-searchForm-action"
          value="Search" id="submitContext_3bb9fcc7484389" name="submitContext" type="submit"></span>
      <input placeholder="Search Dataiku use cases and success stories" aria-label="Search" title="Search" class="lia-form-type-text lia-autocomplete-input search-input lia-search-input-message" value="" id="messageSearchField_3bb9fcc7484389_0"
        name="messageSearchField" type="text" aria-autocomplete="both" autocomplete="off">
      <div class="lia-autocomplete-container" style="display: none; position: absolute;">
        <div class="lia-autocomplete-header">Enter a search word</div>
        <div class="lia-autocomplete-content">
          <ul></ul>
        </div>
        <div class="lia-autocomplete-footer">
          <a class="lia-link-navigation lia-autocomplete-toggle-off lia-link-ticket-post-action lia-component-search-action-disable-auto-complete" data-lia-action-token="cgw667_8zzcQfauOzudhf_fJmyOde3Xt22LvKYxz-A4." rel="nofollow" id="disableAutoComplete_3bb9fcc783c01e" href="https://community.dataiku.com/t5/tkb/v2/page.disableautocomplete:disableautocomplete?t:ac=blog-id/Awards/q-p/dXRtX2NhbXBhaWduOkRhdGFpa3UrRnJvbnRydW5uZXIrQXdhcmRzKzIwMjI6OnV0bV9tZWRpdW06ZW1haWw6Ol9oc21pOjIyMTA3ODA5NTo6X2hzZW5jOnAyQU5xdHotX2FkZWhpUWdJanUwaGZpQ0poWnVFakNtSUMxcmRQNlhhSlNrYWN5MnFJdkZGMEoySC1Mb28xU3FpM2lyVHFhTUhlZ3ZhVWZURHo4SVBLejc5Qi1Ec1dZZVNVRXJuWmdCVnNicGpzalJ3LW5NaElOSjA6OnV0bV9jb250ZW50OjIyMTA3NzQwNDo6dXRtX3NvdXJjZTpoc19lbWFpbA..&amp;t:cp=action/contributions/searchactions">Turn off suggestions</a>
        </div>
      </div>
      <input placeholder="Search Dataiku use cases and success stories" aria-label="Search" title="Search" class="lia-form-type-text lia-autocomplete-input search-input lia-search-input-tkb-article lia-js-hidden" value=""
        id="messageSearchField_3bb9fcc7484389_1" name="messageSearchField_0" type="text" aria-autocomplete="both" autocomplete="off">
      <div class="lia-autocomplete-container" style="display: none; position: absolute;">
        <div class="lia-autocomplete-header">Enter a search word</div>
        <div class="lia-autocomplete-content">
          <ul></ul>
        </div>
        <div class="lia-autocomplete-footer">
          <a class="lia-link-navigation lia-autocomplete-toggle-off lia-link-ticket-post-action lia-component-search-action-disable-auto-complete" data-lia-action-token="zXYy1AWwZGN4D5aUHyUfPx71BRnDSz6Z-XLf6rO6uH4." rel="nofollow" id="disableAutoComplete_3bb9fcc7b56f61" href="https://community.dataiku.com/t5/tkb/v2/page.disableautocomplete:disableautocomplete?t:ac=blog-id/Awards/q-p/dXRtX2NhbXBhaWduOkRhdGFpa3UrRnJvbnRydW5uZXIrQXdhcmRzKzIwMjI6OnV0bV9tZWRpdW06ZW1haWw6Ol9oc21pOjIyMTA3ODA5NTo6X2hzZW5jOnAyQU5xdHotX2FkZWhpUWdJanUwaGZpQ0poWnVFakNtSUMxcmRQNlhhSlNrYWN5MnFJdkZGMEoySC1Mb28xU3FpM2lyVHFhTUhlZ3ZhVWZURHo4SVBLejc5Qi1Ec1dZZVNVRXJuWmdCVnNicGpzalJ3LW5NaElOSjA6OnV0bV9jb250ZW50OjIyMTA3NzQwNDo6dXRtX3NvdXJjZTpoc19lbWFpbA..&amp;t:cp=action/contributions/searchactions">Turn off suggestions</a>
        </div>
      </div>
      <input placeholder="Search Dataiku use cases and success stories" ng-non-bindable="" title="Enter a user name or rank" class="lia-form-type-text UserSearchField lia-search-input-user search-input lia-js-hidden lia-autocomplete-input"
        aria-label="Enter a user name or rank" value="" id="userSearchField_3bb9fcc7484389" name="userSearchField" type="text" aria-autocomplete="both" autocomplete="off">
      <div class="lia-autocomplete-container" style="display: none; position: absolute;">
        <div class="lia-autocomplete-header">Enter a user name or rank</div>
        <div class="lia-autocomplete-content">
          <ul></ul>
        </div>
        <div class="lia-autocomplete-footer">
          <a class="lia-link-navigation lia-autocomplete-toggle-off lia-link-ticket-post-action lia-component-search-action-disable-auto-complete" data-lia-action-token="6M7HFFaZ7EXAZeyEnjXQm47AiIat8eRiwvOKzKp-Z2M." rel="nofollow" id="disableAutoComplete_3bb9fcc7de4ef7" href="https://community.dataiku.com/t5/tkb/v2/page.disableautocomplete:disableautocomplete?t:ac=blog-id/Awards/q-p/dXRtX2NhbXBhaWduOkRhdGFpa3UrRnJvbnRydW5uZXIrQXdhcmRzKzIwMjI6OnV0bV9tZWRpdW06ZW1haWw6Ol9oc21pOjIyMTA3ODA5NTo6X2hzZW5jOnAyQU5xdHotX2FkZWhpUWdJanUwaGZpQ0poWnVFakNtSUMxcmRQNlhhSlNrYWN5MnFJdkZGMEoySC1Mb28xU3FpM2lyVHFhTUhlZ3ZhVWZURHo4SVBLejc5Qi1Ec1dZZVNVRXJuWmdCVnNicGpzalJ3LW5NaElOSjA6OnV0bV9jb250ZW50OjIyMTA3NzQwNDo6dXRtX3NvdXJjZTpoc19lbWFpbA..&amp;t:cp=action/contributions/searchactions">Turn off suggestions</a>
        </div>
      </div>
      <input title="Enter a search word" class="lia-form-type-text NoteSearchField lia-search-input-note search-input lia-js-hidden lia-autocomplete-input" aria-label="Enter a search word" value="" id="noteSearchField_3bb9fcc7484389_0"
        name="noteSearchField" type="text" aria-autocomplete="both" autocomplete="off" placeholder="Search Dataiku use cases and success stories">
      <div class="lia-autocomplete-container" style="display: none; position: absolute;">
        <div class="lia-autocomplete-header">Enter a search word</div>
        <div class="lia-autocomplete-content">
          <ul></ul>
        </div>
        <div class="lia-autocomplete-footer">
          <a class="lia-link-navigation lia-autocomplete-toggle-off lia-link-ticket-post-action lia-component-search-action-disable-auto-complete" data-lia-action-token="K20ERmlcIxyndcDPC1kbPi4wKOw9ajZ4p79Ys8EELF0." rel="nofollow" id="disableAutoComplete_3bb9fcc8071fab" href="https://community.dataiku.com/t5/tkb/v2/page.disableautocomplete:disableautocomplete?t:ac=blog-id/Awards/q-p/dXRtX2NhbXBhaWduOkRhdGFpa3UrRnJvbnRydW5uZXIrQXdhcmRzKzIwMjI6OnV0bV9tZWRpdW06ZW1haWw6Ol9oc21pOjIyMTA3ODA5NTo6X2hzZW5jOnAyQU5xdHotX2FkZWhpUWdJanUwaGZpQ0poWnVFakNtSUMxcmRQNlhhSlNrYWN5MnFJdkZGMEoySC1Mb28xU3FpM2lyVHFhTUhlZ3ZhVWZURHo4SVBLejc5Qi1Ec1dZZVNVRXJuWmdCVnNicGpzalJ3LW5NaElOSjA6OnV0bV9jb250ZW50OjIyMTA3NzQwNDo6dXRtX3NvdXJjZTpoc19lbWFpbA..&amp;t:cp=action/contributions/searchactions">Turn off suggestions</a>
        </div>
      </div>
      <input title="Enter a search word" class="lia-form-type-text ProductSearchField lia-search-input-product search-input lia-js-hidden lia-autocomplete-input" aria-label="Enter a search word" value="" id="productSearchField_3bb9fcc7484389"
        name="productSearchField" type="text" aria-autocomplete="both" autocomplete="off" placeholder="Search Dataiku use cases and success stories">
      <div class="lia-autocomplete-container" style="display: none; position: absolute;">
        <div class="lia-autocomplete-header">Enter a search word</div>
        <div class="lia-autocomplete-content">
          <ul></ul>
        </div>
        <div class="lia-autocomplete-footer">
          <a class="lia-link-navigation lia-autocomplete-toggle-off lia-link-ticket-post-action lia-component-search-action-disable-auto-complete" data-lia-action-token="ZG6qDTboUerQ708hlx1fzBQ500eAQrH2t18Zkiw8UkY." rel="nofollow" id="disableAutoComplete_3bb9fcc82bbd02" href="https://community.dataiku.com/t5/tkb/v2/page.disableautocomplete:disableautocomplete?t:ac=blog-id/Awards/q-p/dXRtX2NhbXBhaWduOkRhdGFpa3UrRnJvbnRydW5uZXIrQXdhcmRzKzIwMjI6OnV0bV9tZWRpdW06ZW1haWw6Ol9oc21pOjIyMTA3ODA5NTo6X2hzZW5jOnAyQU5xdHotX2FkZWhpUWdJanUwaGZpQ0poWnVFakNtSUMxcmRQNlhhSlNrYWN5MnFJdkZGMEoySC1Mb28xU3FpM2lyVHFhTUhlZ3ZhVWZURHo4SVBLejc5Qi1Ec1dZZVNVRXJuWmdCVnNicGpzalJ3LW5NaElOSjA6OnV0bV9jb250ZW50OjIyMTA3NzQwNDo6dXRtX3NvdXJjZTpoc19lbWFpbA..&amp;t:cp=action/contributions/searchactions">Turn off suggestions</a>
        </div>
      </div>
      <input class="lia-as-search-action-id" name="as-search-action-id" type="hidden">
    </span>
  </span>
  <span class="lia-cancel-search">cancel</span>
</form>
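One readable detail in this form: the q-p path segment carried by the t:ac hidden input (and repeated in every "Turn off suggestions" link) is simply the inbound HubSpot tracking parameters, base64url-encoded, with trailing dots apparently standing in for = padding. A small Python sketch that decodes a prefix of the value shown above, cut at a 4-character boundary so it decodes cleanly:

import base64

# Prefix of the t:ac value from the hidden input above; the full string
# ends in ".." where the platform appears to substitute "." for "=" padding.
qp = ("dXRtX2NhbXBhaWduOkRhdGFpa3UrRnJvbnRydW5uZXIrQXdhcmRzKzIwMjI6"
      "OnV0bV9tZWRpdW06ZW1haWw6Ol9oc21pOjIyMTA3ODA5NTo6")

def decode_token(token: str) -> str:
    token = token.replace(".", "=")        # restore standard padding
    token += "=" * (-len(token) % 4)       # re-pad after truncation
    return base64.urlsafe_b64decode(token).decode("utf-8")

print(decode_token(qp))
# utm_campaign:Dataiku+Frontrunner+Awards+2022::utm_medium:email::_hsmi:221078095::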

POST /restapi/vc/users/id/-1/media/albums/default/public/images/upload

<form id="logo_ajax_submition" action="/restapi/vc/users/id/-1/media/albums/default/public/images/upload" method="post" enctype="multipart/form-data" style="display: none;">
  <input type="file" id="submission_logo_hidden" name="image.content" accept=".jpg,.png,.jpeg" autocomplete="off">
</form>
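The second form is a hidden helper the awards page uses to upload a logo image through the community's REST endpoint. A hedged sketch of the equivalent request with Python's requests library; the URL and the image.content field name come straight from the form, while authentication (cookies from a signed-in session) is omitted here and would be required in practice:

import requests

UPLOAD_URL = ("https://community.dataiku.com/restapi/vc/users/id/-1"
              "/media/albums/default/public/images/upload")

# Multipart POST mirroring the form: one file part named "image.content".
# The page constrains uploads to JPG/PNG of at most 15 MB.
with open("logo.png", "rb") as fh:
    resp = requests.post(
        UPLOAD_URL,
        files={"image.content": ("logo.png", fh, "image/png")},
    )
print(resp.status_code)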

POST /t5/tkb/articleeditorpage/tkb-id/Awards/template-id/freeform?submission=new

<form id="awards_form" action="/t5/tkb/articleeditorpage/tkb-id/Awards/template-id/freeform?submission=new" method="post">
  <fieldset id="awards_form_first" disabled="disabled">
    <div class="shadowed-box">
      <legend>Your Submission</legend>
      <ol>
        <li>
          <fieldset style="margin-top: 24px">
            <legend>Select your award category:</legend>
            <p>You may enter your submission into multiple categories at once!</p>
            <h4>Use Cases</h4>
            <p>These categories focus on the practical applications of Dataiku:</p>
            <div class="award_category">
              <input type="checkbox" id="award_cat_1" data-group="1" name="award_cat_1" value="Data Science for Good">
              <label for="award_cat_1"><strong>Data Science for Good</strong><br>Turning the spotlight on the best use of Dataiku by companies and individuals to make a positive impact on the world.</label>
            </div>
            <div class="award_category">
              <input type="checkbox" id="award_cat_2" data-group="1" name="award_cat_2" value="Responsible AI">
              <label for="award_cat_2"><strong>Responsible AI</strong><br>Highlighting the individuals and organizations who are using Dataiku to develop foundational AI for the future, that is governable, sustainable, transparent, and seeks to
                remove bias.</label>
            </div>
            <div class="award_category">
              <input type="checkbox" id="award_cat_3" data-group="1" name="award_cat_3" value="Value at Scale">
              <label for="award_cat_3"><strong>Value at Scale</strong><br>Showcasing the pioneering individual and organizational use of Dataiku to manage the full lifecycle of models and pipelines, and deliver value at scale.</label>
            </div>
            <div class="award_category">
              <input type="checkbox" id="award_cat_10" data-group="1" name="award_cat_10" value="Moonshot Pioneer(s)">
              <label for="award_cat_10"><strong>Partner Acceleration</strong><br>Featuring successful partnerships between Dataiku, partner organizations, and customers to bring a use case to fruition faster, smarter, and/or better.</label>
            </div>
            <div class="award_category">
              <input type="checkbox" id="award_cat_4" data-group="1" name="award_cat_4" value="Moonshot Pioneer(s)">
              <label for="award_cat_4"><strong>Moonshot Pioneer(s)</strong><br>Rewarding the pioneers who are pushing the boundaries of Dataiku to build innovative projects - including for fun!</label>
            </div>
            <h4>Success Stories</h4>
            <p>These categories highlight individual and collective achievements:</p>
            <div class="award_category">
              <input type="checkbox" id="award_cat_5" data-group="2" name="award_cat_5" value="Most Impactful Transformation Story">
              <label for="award_cat_5"><strong>Most Impactful Transformation Story</strong><br>Recognizing inspiring transformation stories from organizations which have systematized the use of data and AI with Dataiku.</label>
            </div>
            <div class="award_category">
              <input type="checkbox" id="award_cat_6" data-group="2" name="award_cat_6" value="Most Impactful Ikigai Story">
              <label for="award_cat_6"><strong>Most Impactful Ikigai Story</strong><br>Turning the spotlight on nonprofit organizations or volunteers who leverage Dataiku to accelerate their organization’s mission and grow their positive social
                and/or environmental impact.</label>
            </div>
            <div class="award_category">
              <input type="checkbox" id="award_cat_7" data-group="2" name="award_cat_7" value="Excellence in Teaching">
              <label for="award_cat_7"><strong>Excellence in Teaching</strong><br>Recognizing members of the teaching faculty for their invaluable contribution to educate the next generation of analytical talent with Dataiku, driving innovation in
                the field and aligning with real world use cases.</label>
            </div>
            <div class="award_category">
              <input type="checkbox" id="award_cat_8" data-group="2" name="award_cat_8" value="Excellence in Research">
              <label for="award_cat_8"><strong>Excellence in Research</strong><br>Starring academic researchers who are leveraging Dataiku to gain impactful insights from their data and push the frontiers of our knowledge.</label>
            </div>
            <div class="award_category">
              <input type="checkbox" id="award_cat_9" data-group="2" name="award_cat_9" value="Most Extraordinary AI Maker">
              <label for="award_cat_9"><strong>Most Extraordinary AI Maker(s)</strong><br>Spotlighting inspiring stories of AI makers who have made a bigger impact with Dataiku through individual upskill, business &amp; tech collaboration, or
                elevating others to harness the power of data.</label>
            </div>
          </fieldset>
        </li>
        <li>
          <label for="submission_role">You are applying as</label>
          <select id="submission_role" name="submission_role" autocomplete="off">
            <option value="" disabled="" selected="" hidden="">--Please choose an option--</option>
            <option value="Large-scale company (revenue over $1 billion USD)">Large-scale company (revenue over $1 billion USD)</option>
            <option value="Small or medium-sized company">Small or medium-sized company</option>
            <option value="Partner organization">Partner organization</option>
            <option value="Nonprofit organization">Nonprofit organization</option>
            <option value="Academic(s)">Academic(s)</option>
            <option value="Individual user(s)">Individual user(s)</option>
          </select>
        </li>
      </ol>
    </div>
  </fieldset>
  <fieldset id="awards_form_first_b" style="display: none;" disabled="disabled">
    <div class="shadowed-box">
      <legend>Your Use Case</legend>
      <ol start="3">
        <li>
          <label for="submission_challenge">What business challenge were you encountering?</label>
          <p>Feel free to contextualize by describing your industry, listing pain points, any frictions or obstacles that you met… <span class="formhint" filled-state="none"><span word-limit="300">0</span>/300
              words</span><!--span data-text="Feel free to contextualize by describing your industry, listing pain points, any frictions or obstacles that you met…" class="tooltip">?</span--></p>
          <textarea id="submission_challenge" name="submission_challenge" style="height:200px;"></textarea>
        </li>
        <li>
          <label for="submission_solve">How did you solve it with Dataiku?</label>
          <p><span class="corporate-option">You can highlight the reasons behind choosing Dataiku and how it helped you, how many users were involved across different roles, any techniques or other technologies you used, steps to complete your
              project, and more generally describe your journey to success.</span><span class="noncorporate-option" style="display: none;">Here’s the place to detail your success story - you can highlight the reasons behind choosing Dataiku and how
              it helped you reach your goals, as well as any important steps along your journey to success.</span><span id="excellence_in_teaching_text" class="dynamic-answer-text" style="display:none;"><br>Can you share more about your course
              content and how it aligns with real-world use cases that prepare students for their careers?</span><span id="excellence_in_research_text" class="dynamic-answer-text" style="display:none;"><br>Can you detail the innovative approach of
              your project and the impact your research has?</span> <span class="formhint" filled-state="none"><span word-limit="300">0</span>/300 words</span></p>
          <textarea id="submission_solve" name="submission_solve" style="height:200px;"></textarea>
        </li>
        <li>
          <label for="submission_businessarea">Business area enhanced</label>
          <select id="submission_businessarea" name="submission_businessarea" autocomplete="off">
            <option value="" disabled="" selected="" hidden="">--Please choose an option--</option>
            <option value="Accounting/Finance">Accounting/Finance</option>
            <option value="Analytics">Analytics</option>
            <option value="Communication/Strategy/Competitive Intelligence">Communication/Strategy/Competitive Intelligence</option>
            <option value="Human Resources">Human Resources</option>
            <option value="Internal Operations">Internal Operations</option>
            <option value="IT/Cybersecurity/Data">IT/Cybersecurity/Data</option>
            <option value="Manufacturing">Manufacturing</option>
            <option value="Marketing/Sales/Customer Relationship Management">Marketing/Sales/Customer Relationship Management</option>
            <option value="Product &amp; Service Development">Product &amp; Service Development</option>
            <option value="Risk/Compliance/Legal/Internal Audit">Risk/Compliance/Legal/Internal Audit</option>
            <option value="Supply-chain/Supplier Management/Service Delivery">Supply-chain/Supplier Management/Service Delivery</option>
            <option value="Financial Services Specific">Financial Services Specific</option>
            <option value="Other">Other - please specify</option>
            <option value="Unknown">Unknown</option>
          </select>
          <input type="text" id="submission_businessarea_other" name="submission_businessarea_other" placeholder="please specify" style="display:none;">
        </li>
        <li>
          <label for="submission_stage">Use case stage</label>
          <select id="submission_stage" name="submission_stage" autocomplete="off">
            <option value="" disabled="" selected="" hidden="">--Please choose an option--</option>
            <option value="Proof of Concept">Proof of Concept</option>
            <option value="In Progress">In Progress</option>
            <option value="Built &amp; Functional">Built &amp; Functional</option>
            <option value="In Production">In Production</option>
            <option value="Planned">Planned</option>
            <option value="Archived/Paused">Archived/Paused</option>
            <option value="Unknown">Unknown</option>
          </select>
        </li>
      </ol>
    </div>
  </fieldset>
  <fieldset id="awards_form_first_c" style="display: none;" disabled="disabled">
    <div class="shadowed-box">
      <legend>Value Generated</legend>
      <ol start="7">
        <li>
          <label for="submission_value">Can you explain the value created with this use case or success story?</label>
          <p><span class="corporate-option">Now is the time to explain the impact achieved - this can be ROI, metrics, and/or any other indicators of success!</span><span class="noncorporate-option" style="display: none;">Now is the time to explain
              the value generated - this can be ROI, metrics, and/or any other indicators of success!</span> <span class="formhint" filled-state="none"><span word-limit="300">0</span>/300 words</span></p>
          <textarea id="submission_value" name="submission_value" style="height:200px;"></textarea>
        </li>
        <li>
          <label for="submission_valuespecific">What is the specific value brought by Dataiku?</label>
          <p><span class="corporate-option">Some food for thought: speed and agility through increased team efficiency, enhanced tech stack efficiency, improved risk management and governance through transparency and explainability, upskilling and
              networking with resources such as the Dataiku Academy and Community…</span><span class="noncorporate-option" style="display: none;">Some food for thought: speed and agility through increased team efficiency, enhanced tech stack
              efficiency, improved risk management, and governance through transparency and explainability, upskilling and networking through resources such as the Academy and Community…</span> <span class="formhint" filled-state="none"><span
                word-limit="300">0</span>/300 words</span></p>
          <textarea id="submission_valuespecific" name="submission_valuespecific" style="height:200px;"></textarea>
        </li>
        <li class="multiple-selection-li">
          <label for="submission_valuetype">Value type</label>
          <label class="inset-checkbox" for="submission_valuetype_2"><input type="checkbox" id="submission_valuetype_2" name="submission_valuetype_2">
            <p>Improve customer/employee satisfaction</p>
          </label>
          <label class="inset-checkbox" for="submission_valuetype_3"><input type="checkbox" id="submission_valuetype_3" name="submission_valuetype_3">
            <p>Increase revenue</p>
          </label>
          <label class="inset-checkbox" for="submission_valuetype_4"><input type="checkbox" id="submission_valuetype_4" name="submission_valuetype_4">
            <p>Reduce cost</p>
          </label>
          <label class="inset-checkbox" for="submission_valuetype_5"><input type="checkbox" id="submission_valuetype_5" name="submission_valuetype_5">
            <p>Reduce risk</p>
          </label>
          <label class="inset-checkbox" for="submission_valuetype_6"><input type="checkbox" id="submission_valuetype_6" name="submission_valuetype_6">
            <p>Save time</p>
          </label>
          <label class="inset-checkbox" for="submission_valuetype_7"><input type="checkbox" id="submission_valuetype_7" name="submission_valuetype_7">
            <p>Increase trust</p>
          </label>
          <label class="inset-checkbox" for="submission_valuetype_8"><input type="checkbox" id="submission_valuetype_8" name="submission_valuetype_8">
            <p>Other</p>
          </label>
          <input type="text" id="submission_valuetype_other" name="submission_valuetype_other" placeholder="please specify" style="display:none;">
          <label class="inset-checkbox" for="submission_valuetype_1"><input type="checkbox" id="submission_valuetype_1" name="submission_valuetype_1">
            <p>Unknown</p>
          </label>
        </li>
        <li>
          <label for="submission_valuerange">Value range</label>
          <select id="submission_valuerange" name="submission_valuerange" autocomplete="off">
            <option value="" disabled="" selected="" hidden="">--Please choose an option--</option>
            <option value="Less than $1,000">Less than $1,000</option>
            <option value="Thousands of $">Thousands of $</option>
            <option value="Hundreds of thousands of $">Hundreds of thousands of $</option>
            <option value="Millions of $">Millions of $</option>
            <option value="Dozens of millions of $">Dozens of millions of $</option>
            <option value="Unknown">Unknown</option>
          </select>
        </li>
      </ol>
    </div>
  </fieldset>
  <fieldset id="awards_form_second" style="display: none;" disabled="disabled">
    <div class="shadowed-box">
      <legend>About your organization <span class="formhint">(optional if applying as an individual)</span></legend>
      <ol start="11">
        <li>
          <label for="submission_orgname">Organization name</label>
          <input type="text" id="submission_orgname" name="submission_orgname">
        </li>
        <li>
          <label for="submission_boilerplate">Boilerplate</label>
          <p>Short, standard description of your organization <span class="formhint" filled-state="none"><span word-limit="100">0</span>/100 words</span></p>
          <textarea id="submission_boilerplate" name="submission_boilerplate" style="height:200px;"></textarea>
        </li>
        <li>
          <label for="submission_logo">Logo</label>
          <input type="text" id="submission_logo" name="submission_logo" autocomplete="off" style="display: none;">
          <input type="text" id="submission_logo_id" name="submission_logo_id" autocomplete="off" style="display: none;">
          <label class="lia-attachments-drop-zone" for="submission_logo_hidden">
            <div class="lia-file-upload-wrapper" id="filedragdrop">
              <div class="lia-file-upload">
                <div class="lia-file-upload-content">
                  <div class="lia-attachment-description">
                    <div class="lia-cloud-symbol">
                      <span class="lia-img-icon-cloud-upload lia-fa-icon lia-fa-cloud-upload lia-fa"></span>
                    </div>
                    <div class="lia-attachment-description-details">
                      <div class="lia-attachment-description-text">Browse files to attach</div>
                      <div class="lia-attachment-constraints">Maximum size: 15 MB • File types allowed: JPG, PNG</div>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </label>
        </li>
        <li>
          <label for="submission_industry">Industry</label>
          <select id="submission_industry" name="submission_industry" autocomplete="off">
            <option value="" disabled="" selected="" hidden="">--Please choose an option--</option>
            <option>Aerospace &amp; Defence</option>
            <option>Agriculture</option>
            <option>Auto Transportation &amp; Logistics</option>
            <option>Construction &amp; Real Estate</option>
            <option>Energy &amp; Utilities</option>
            <option>Financial Services Banking &amp; Insurance</option>
            <option>Health &amp; Pharmaceuticals</option>
            <option>Higher Education</option>
            <option>Manufacturing &amp; Chemical</option>
            <option>Media Information &amp; Entertainment</option>
            <option>Nonprofit</option>
            <option>Professional Services &amp; Consulting</option>
            <option>Public Sector &amp; Government</option>
            <option>Retail Ecommerce &amp; CPG</option>
            <option>Software &amp; Technology</option>
            <option>Telecommunications</option>
            <option>Travel &amp; Hospitality</option>
            <option>Other</option>
          </select>
        </li>
      </ol>
    </div>
  </fieldset>
  <fieldset id="awards_form_third" style="display: none;" disabled="disabled">
    <div class="shadowed-box">
      <legend>About you</legend>
      <ol start="15">
        <li>
          <label for="submission_fullname">Your full name</label>
          <input type="text" id="submission_fullname" name="submission_fullname" value=" ">
        </li>
        <li>
          <label for="submission_title">Your title</label>
          <input type="text" id="submission_title" name="submission_title">
        </li>
        <li>
          <label for="submission_country">Your country</label>
          <select id="submission_country" name="submission_country" autocomplete="off">
            <option value="" disabled="" selected="" hidden="">--Please choose an option--</option>
            <option value="United States">United States</option>
            <option value="United Kingdom">United Kingdom</option>
            <option value="France">France</option>
            <option value="Germany">Germany</option>
            <option value="India">India</option>
            <option value="Canada">Canada</option>
            <option value="Afghanistan">Afghanistan</option>
            <option value="Albania">Albania</option>
            <option value="Algeria">Algeria</option>
            <option value="Andorra">Andorra</option>
            <option value="Angola">Angola</option>
            <option value="Antigua &amp; Deps">Antigua &amp; Deps</option>
            <option value="Argentina">Argentina</option>
            <option value="Armenia">Armenia</option>
            <option value="Australia">Australia</option>
            <option value="Austria">Austria</option>
            <option value="Azerbaijan">Azerbaijan</option>
            <option value="Bahamas">Bahamas</option>
            <option value="Bahrain">Bahrain</option>
            <option value="Bangladesh">Bangladesh</option>
            <option value="Barbados">Barbados</option>
            <option value="Belarus">Belarus</option>
            <option value="Belgium">Belgium</option>
            <option value="Belize">Belize</option>
            <option value="Benin">Benin</option>
            <option value="Bermuda">Bermuda</option>
            <option value="Bhutan">Bhutan</option>
            <option value="Bolivia">Bolivia</option>
            <option value="Bosnia Herzegovina">Bosnia Herzegovina</option>
            <option value="Botswana">Botswana</option>
            <option value="Brazil">Brazil</option>
            <option value="Brunei">Brunei</option>
            <option value="Bulgaria">Bulgaria</option>
            <option value="Burkina">Burkina</option>
            <option value="Burundi">Burundi</option>
            <option value="Cambodia">Cambodia</option>
            <option value="Cameroon">Cameroon</option>
            <option value="Cape Verde">Cape Verde</option>
            <option value="Central African Rep">Central African Rep</option>
            <option value="Chad">Chad</option>
            <option value="Chile">Chile</option>
            <option value="China">China</option>
            <option value="Colombia">Colombia</option>
            <option value="Comoros">Comoros</option>
            <option value="Congo">Congo</option>
            <option value="Congo {Democratic Rep}">Congo {Democratic Rep}</option>
            <option value="Costa Rica">Costa Rica</option>
            <option value="Croatia">Croatia</option>
            <option value="Cuba">Cuba</option>
            <option value="Cyprus">Cyprus</option>
            <option value="Czech Republic">Czech Republic</option>
            <option value="Denmark">Denmark</option>
            <option value="Djibouti">Djibouti</option>
            <option value="Dominica">Dominica</option>
            <option value="Dominican Republic">Dominican Republic</option>
            <option value="East Timor">East Timor</option>
            <option value="Ecuador">Ecuador</option>
            <option value="Egypt">Egypt</option>
            <option value="El Salvador">El Salvador</option>
            <option value="Equatorial Guinea">Equatorial Guinea</option>
            <option value="Eritrea">Eritrea</option>
            <option value="Estonia">Estonia</option>
            <option value="Ethiopia">Ethiopia</option>
            <option value="Fiji">Fiji</option>
            <option value="Finland">Finland</option>
            <option value="Gabon">Gabon</option>
            <option value="Gambia">Gambia</option>
            <option value="Georgia">Georgia</option>
            <option value="Ghana">Ghana</option>
            <option value="Greece">Greece</option>
            <option value="Grenada">Grenada</option>
            <option value="Guatemala">Guatemala</option>
            <option value="Guinea">Guinea</option>
            <option value="Guinea-Bissau">Guinea-Bissau</option>
            <option value="Guyana">Guyana</option>
            <option value="Haiti">Haiti</option>
            <option value="Honduras">Honduras</option>
            <option value="Hong Kong">Hong Kong</option>
            <option value="Hungary">Hungary</option>
            <option value="Iceland">Iceland</option>
            <option value="Indonesia">Indonesia</option>
            <option value="Iran">Iran</option>
            <option value="Iraq">Iraq</option>
            <option value="Ireland {Republic}">Ireland {Republic}</option>
            <option value="Israel">Israel</option>
            <option value="Italy">Italy</option>
            <option value="Ivory Coast">Ivory Coast</option>
            <option value="Jamaica">Jamaica</option>
            <option value="Japan">Japan</option>
            <option value="Jordan">Jordan</option>
            <option value="Kazakhstan">Kazakhstan</option>
            <option value="Kenya">Kenya</option>
            <option value="Kiribati">Kiribati</option>
            <option value="Korea North">Korea North</option>
            <option value="Korea South">Korea South</option>
            <option value="Kuwait">Kuwait</option>
            <option value="Kyrgyzstan">Kyrgyzstan</option>
            <option value="Latvia">Latvia</option>
            <option value="Lebanon">Lebanon</option>
            <option value="Lesotho">Lesotho</option>
            <option value="Liberia">Liberia</option>
            <option value="Libya">Libya</option>
            <option value="Liechtenstein">Liechtenstein</option>
            <option value="Lithuania">Lithuania</option>
            <option value="Luxembourg">Luxembourg</option>
            <option value="Macedonia">Macedonia</option>
            <option value="Madagascar">Madagascar</option>
            <option value="Malawi">Malawi</option>
            <option value="Malaysia">Malaysia</option>
            <option value="Maldives">Maldives</option>
            <option value="Mali">Mali</option>
            <option value="Malta">Malta</option>
            <option value="Marshall Islands">Marshall Islands</option>
            <option value="Mauritania">Mauritania</option>
            <option value="Mauritius">Mauritius</option>
            <option value="Mexico">Mexico</option>
            <option value="Micronesia">Micronesia</option>
            <option value="Moldova">Moldova</option>
            <option value="Monaco">Monaco</option>
            <option value="Mongolia">Mongolia</option>
            <option value="Montenegro">Montenegro</option>
            <option value="Morocco">Morocco</option>
            <option value="Mozambique">Mozambique</option>
            <option value="Namibia">Namibia</option>
            <option value="Nauru">Nauru</option>
            <option value="Nepal">Nepal</option>
            <option value="Netherlands">Netherlands</option>
            <option value="New Zealand">New Zealand</option>
            <option value="Nicaragua">Nicaragua</option>
            <option value="Niger">Niger</option>
            <option value="Nigeria">Nigeria</option>
            <option value="Norway">Norway</option>
            <option value="Oman">Oman</option>
            <option value="Pakistan">Pakistan</option>
            <option value="Palau">Palau</option>
            <option value="Panama">Panama</option>
            <option value="Papua New Guinea">Papua New Guinea</option>
            <option value="Paraguay">Paraguay</option>
            <option value="Peru">Peru</option>
            <option value="Philippines">Philippines</option>
            <option value="Poland">Poland</option>
            <option value="Portugal">Portugal</option>
            <option value="Qatar">Qatar</option>
            <option value="Romania">Romania</option>
            <option value="Russian Federation">Russian Federation</option>
            <option value="Rwanda">Rwanda</option>
            <option value="Samoa">Samoa</option>
            <option value="San Marino">San Marino</option>
            <option value="Saudi Arabia">Saudi Arabia</option>
            <option value="Senegal">Senegal</option>
            <option value="Serbia">Serbia</option>
            <option value="Seychelles">Seychelles</option>
            <option value="Sierra Leone">Sierra Leone</option>
            <option value="Singapore">Singapore</option>
            <option value="Slovakia">Slovakia</option>
            <option value="Slovenia">Slovenia</option>
            <option value="Solomon Islands">Solomon Islands</option>
            <option value="Somalia">Somalia</option>
            <option value="South Africa">South Africa</option>
            <option value="South Sudan">South Sudan</option>
            <option value="Spain">Spain</option>
            <option value="Sri Lanka">Sri Lanka</option>
            <option value="Sudan">Sudan</option>
            <option value="Suriname">Suriname</option>
            <option value="Sweden">Sweden</option>
            <option value="Switzerland">Switzerland</option>
            <option value="Taiwan">Taiwan</option>
            <option value="Tajikistan">Tajikistan</option>
            <option value="Thailand">Thailand</option>
            <option value="Togo">Togo</option>
            <option value="Tonga">Tonga</option>
            <option value="Tunisia">Tunisia</option>
            <option value="Turkey">Turkey</option>
            <option value="Turkmenistan">Turkmenistan</option>
            <option value="Tuvalu">Tuvalu</option>
            <option value="Uganda">Uganda</option>
            <option value="Ukraine">Ukraine</option>
            <option value="United Arab Emirates">United Arab Emirates</option>
            <option value="Uruguay">Uruguay</option>
            <option value="Uzbekistan">Uzbekistan</option>
            <option value="Vanuatu">Vanuatu</option>
            <option value="Vatican City">Vatican City</option>
            <option value="Venezuela">Venezuela</option>
            <option value="Vietnam">Vietnam</option>
            <option value="Yemen">Yemen</option>
            <option value="Zambia">Zambia</option>
            <option value="Zimbabwe">Zimbabwe</option>
          </select>
        </li>
        <li>
          <label for="submission_collaborators">If you are applying as a team or organization, enter the name of your teammates (and usernames on the Dataiku Community if relevant):</label>
          <textarea id="submission_collaborators" name="submission_collaborators" style="height:122px;"></textarea>
        </li>
      </ol>
    </div>
    <div class="shadowed-box">
      <input type="checkbox" id="submission_tos" name="submission_tos" value="true" required="">
      <label for="submission_tos">I have reviewed and accepted the <a href="https://downloads.dataiku.com/publicdocs/Dataiku_Frontrunner_Awards_2021_Terms_latest.pdf" target="_blank">Submission Terms &amp; Conditions</a>.</label>
      <div class="preview-and-add-images"><a class="button lia-button lia-button-primary" href="javascript:void(0);">Preview and Submit</a></div>
    </div>
  </fieldset>
</form>
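Each free-text question above carries a hint span of the form <span word-limit="300">0</span>/300 words, which implies a plain whitespace-delimited word count on the client side. The page's script is not captured in this scan; a minimal Python sketch of the implied check, under that assumption:

def word_count(text: str) -> int:
    # Whitespace-delimited tokens, matching the "N/300 words" counter.
    return len(text.split())

def within_limit(text: str, limit: int = 300) -> bool:
    return word_count(text) <= limit

print(within_limit("We automated monthly reporting with Dataiku.", 300))  # True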

Text Content

You now have until September 15th to submit your use case or success story to
the 2022 Dataiku Frontrunner Awards!

ENTER YOUR SUBMISSION


DATAIKU FRONTRUNNER AWARDS

Celebrating extraordinary people who are paving the way for Everyday AI with
Dataiku



The Dataiku Frontrunner Awards recognize the success of Dataiku customers,
partners, nonprofits, academics, and all individual users.

Enter your submission today to share your pioneering achievements ⁠— be it
automating everyday tasks, elevating more people to harness the power of data,
systemizing transformation, or tackling moonshot projects!

To participate, fill out the submission form to detail the impact that you, your
team, or your organization have achieved with Dataiku in one (or several!) of
the award categories.

Winners will be determined by a panel of judges, which includes Dataiku
executives as well as independent industry experts, and announced in the fall of
2022.

For any questions, please email the team at community@dataiku.com. We’re here to
help you celebrate your success!

Enter Your Submission


WHY PARTICIPATE?


BE RECOGNIZED AS A THOUGHT LEADER

Video features and speaking opportunities will enable winners and finalists to
earn visibility in the industry, while all participants will gain exposure on
Dataiku’s networks.


CELEBRATE YOUR INDIVIDUAL & TEAM’S SUCCESS

Inspire the data science community by sharing your achievements and the value
you have generated, individually or collectively.


ENHANCE YOUR EMPLOYER BRANDING

Showcase your innovation to the data science community and entice the brightest
talent to join your organization and contribute to your success.


WIN SPECIAL PRIZES AND SWAG

Winners will be offered a unique trophy and a donation to the charity of their
choice, and special Dataiku swag will be sent to all participants to thank you
for your contribution to knowledge sharing!

Submissions are open until Thursday, September 15th at 11:59am EST.

We recommend drafting your entry in a separate document. Once ready, copy it
into the form, upload any helpful visual elements (e.g., graphs, screenshots,
infographics, or videos), and hit submit!

By entering your submission, you agree to the Submission Terms & Conditions.

Sign in to enter your submission



Explore use cases and success stories from outstanding Dataiku users below, and
give kudos to your favorites to show your support!

All kudos given by August 31 on the 2022 submissions of the Dataiku Frontrunner
Awards will be taken into consideration by our jury members.

Use the following labels to filter submissions by industry:

 * Energy & Utilities
 * Financial Services, Banking & Insurance
 * Health & Pharmaceuticals
 * Higher Education
 * Manufacturing & Chemical
 * Nonprofit
 * Software & Technology
 * Retail Ecommerce & CPG
 * Professional Services & Consulting
 * Telecommunications
 * Other

Dayananda Sagar University - Developing Management Professionals with
Data-Driven Problem Solving and Decision-Making Skills
Name: Prof Alok Chakravarty, Prof H N Shankar, Prof Sai Praveen, Prof A. Nagaraj Subbarao
Country: India
Organization: SCMS-PG, Dayananda Sagar University
Description: The School of Commerce & Management Studies, Dayananda Sagar University, Bengaluru, India, is a prestigious business school with an emphasis on crafting superior business leaders and entrepreneurs. The ethos of the school is to stay fully conscious of the changing business environment, particularly the technological environment and the need for digital literacy, and to disseminate this knowledge to our students. The school has an Executive MBA program for working professionals and a full-time MBA program, which attracts many graduate students with no work experience.
Awards Categories: Excellence in Teaching

Business Challenge: Our objective at the School of Commerce & Management Studies is to
integrate Business Analytics into each functional management area: Marketing, Finance, Supply Chain Management, and Human Resource Management. In this process, we prepare our students for the future and position them to add value to digital transformation. Besides functional electives such as HR, Marketing, Finance, and Supply Chain, we also offer technology electives such as Business Analytics, Artificial Intelligence, and Information Technology as specialization electives. Students choose a major and a minor specialization in their second year. In the first year, we orient the students in the foundations of business analytics. In the second year, we offer the following courses to students who choose Business Analytics as a specialization:

 * Data Management Systems: Master Data Management, RDBMS, Data Warehouse, NoSQL, Big Data, Data Lake
 * Data Visualization Using Tableau
 * Applied Analytics (in different functional areas)
 * Predictive Analytics Using R
 * Exploratory Data Analysis Using Python

We faced the following challenges while designing and delivering our curriculum:

 * Identifying competent faculty who understand functional areas as well as business analytics.
 * Addressing the fear of coding amongst students.
 * Focusing on problem-solving without getting bogged down in technicalities.
 * Addressing the process life-cycle view (such as CRISP-DM or SEMMA) of business analytics.
 * Conveying the core concepts of Business Analytics in an easy-to-understand manner.
 * Conveying the integrated and interdependent way in which industry professionals work while executing a business analytics project.
 * Providing exposure to an industrial-strength platform that gives students hands-on experience in business problem solving.

Business Solution: We at the School of Commerce & Management Studies are proud to be the
first Business School in India to have an academic alliance with Dataiku, USA.
We were introduced to Dataiku through our senior Prof H N Shankar in early 2021.
Dataiku made available a cloud instance with approximately 100 user IDs. We are
utilizing these for our full-time and Executive MBA program students. One of the
advantages that we received because of this academic partnership was Dataiku's
Academy, which is a rich repository of courses such as Basics 101 to 103, Visual
Recipes, Visual Machine Learning, Advanced Analytics, and many more. We begin
our orientation to Business Analytics by introducing our students to Basics 101
to Basics 103 and Intro to Machine Learning course modules in Academy. We then
encourage our students to take Core Designer certification. We follow it up by
solving several use cases available in the Academy, such as:

 * Predictive Maintenance in the Manufacturing Industry
 * Customer Churn Prediction
 * Referrer and Visitor Analysis Using Web Logs
 * Network Optimization for a Car Rental Company
 * Bike Sharing Usage Patterns

Value Generated: For the present batch of first-year
students, our Dataiku Academy course completions are as follows:

Master of Business Administration
 * Basics 101: 111
 * Basics 102: 103
 * Basics 103: 94
 * Intro to ML: 76
 * Use Case Predictive Maintenance: 5
 * Core Designer Certificate: 10

Executive Master of Business Administration
 * Basics 101: 42
 * Basics 102: 42
 * Basics 103: 42
 * Intro to ML: 42
 * Use Case Predictive Maintenance: 42
 * Core Designer Certificate: 42

Dataiku Data Scientists Mr Devesh and Mr Shubham conducted an on-premise
workshop on Dataiku's platform, and 103 students from MBA 1st Year attended it.
We also had a healthy coverage of Dataiku Academy courses in our outgoing batch
of second-year students, and that is reflected in our overall placement
percentage. Specifically, 90% of our outgoing students who had opted for Business Analytics as a major specialization secured relevant analytics positions at good companies. Ten of our first-year students, along with four faculty members, got the opportunity to attend the Everyday AI Conference in Bangalore, which had a major impact on students' motivation and their understanding of AI and Business Analytics.

Value Brought by Dataiku: During the past decade or so,
business analytics platforms have evolved from supporting IT and finance
functions only to enabling business users across the organization or enterprise.
However, many firms find themselves struggling to take advantage of its promise and the richness of the data afforded. The data analytics program at the School of Commerce & Management Studies works to provide the industry with well-trained talent able to address digital-world issues. By integrating the Dataiku platform into our curriculum, we could effectively address each of the curriculum challenges listed earlier, from identifying competent faculty through to giving students hands-on exposure to an industrial-strength platform. It is a
problem when organizations decide to embark on a digital transformation journey
without having a clear strategy, action plan, or agenda, let alone a vision, for
what it might mean and the path ahead. Many organizations face and will continue
to face problems as they grapple with the process of change. At the School of
Commerce & Management Studies, we think that we have an able partner in Dataiku
in addressing this issue in an impactful manner.
Posted by alokchakravarty
Crowley - Leveraging Analytics & ML to Increase Revenue in Container Shipping
Name: Harsh Vora, Lead Data Scientist; Zachary Thorell, Business Data Analyst; Sandeep Punjari, Data Analyst 3; Irwin Castellino, Director of Data and Analytics; Deepak Arora, Vice President Corporate Strategy; Javier Diaz, Senior Analyst Quality Assurance Ops; Federico Gervasio, Industrial Engineer; Shannon Sarkees, Sustainability, Strategy & Digitization Manager; Tishlee Rivera, Business Intelligence and Analytics Director; Sudip Roy, Big Data Solutions Architect; Sanjay Khobragade, MLOps Architect
Country: United States
Organization: Crowley
Description: Crowley, founded in 1892, is a privately held, U.S.-owned and operated logistics, government, marine, and energy solutions company headquartered in Jacksonville, Florida. Services are provided worldwide by four primary business units: Crowley Logistics, Crowley (Government) Solutions, Crowley Shipping, and Crowley Fuels. Crowley owns, operates, and/or manages a fleet of more than 200 vessels, consisting of RO/RO (roll-on/roll-off) vessels, LO/LO (lift-on/lift-off) vessels, articulated tug-barges (ATBs), LNG-powered container/roll-on, roll-off ships (ConRos), and multipurpose tugboats and barges. Land-based facilities and equipment include port terminals, warehouses, tank farms, gas stations, office buildings, trucks, trailers, containers, chassis, cranes, and other specialized vehicles.
Awards Categories: Most Extraordinary AI Maker(s), Most Impactful Transformation Story

Business Challenge: Crowley did not
have a centralized platform for using data and machine learning in decision-making in our logistics business unit, where we face several fundamental issues:

 * A. Missed revenue from dummy bookings – Customers book extra slots for containers on container ships and eventually show up at port with fewer containers, since no cancellation fees are enforced (the industry standard).
 * B. Lack of demand forecast for each node – The availability of empty containers at the right nodes/ports in the supply chain is the key to meeting our customers' demand. We did not have a historical and forecasted view of demand for each container type between each origin and discharge node, which is key to enabling decision-making for empty container repositioning.
 * C. Late customs documentation – Improper or late customs documentation provided by customers resulted in offloaded containers residing at the port, costing the port time and space, plus incremental planning and fulfillment effort.
 * D. Unknown container weights – Each container ship has a maximum weight capacity. However, the weights of containers booked on a ship were only known once they were weighed at the port, resulting in last-minute planning for stowage (placement of containers on the ship) and for accommodating weight constraints.
 * E. Lack of carbon footprint estimation – Our customers seek to estimate the carbon footprint of their supply chain. We did not have the technology and tools to automate and expose carbon footprint calculations for container shipments.
 * F. Lack of predictive maintenance – Port equipment used to load containers onto ships, trains, and trucks is prone to failure due to extreme loads. Unplanned and immediate maintenance requests are disruptive and expensive.
 * G. Non-targeted promotions – Marketing to logistics customers was a manual and subjective process. A data-driven methodology to predict customer churn can improve the targeting of marketing efforts, especially for non-contract customers.

Business Solution: As a 130-year-old company undergoing digital transformation, we seek
to utilize predictive and prescriptive analytics with our business leadership to boost our revenue, customer experience, employee experience, and sustainability efforts. We pioneer digital transformation in the supply chain industry through (1) centralization of our operational, commercial, and sustainability data into a data warehouse, (2) use of a single platform (Dataiku) for developing predictive and prescriptive analytics that enables all personas through no-code, low-code, and full-code capabilities, and (3) democratization of data engineering and machine learning activities through employee upskilling programs. Through Dataiku, we developed or are developing solutions for each of our focus areas:

 * A. Container sail/rollover model [in production] – We developed a classification model to predict the probability of show/no-show for each container booked on our ships, giving our voyage planning team visibility into at-risk containers for improved decision-making (a sketch of this kind of model follows this list).
 * B. Demand forecasting [in production] – We utilized Dataiku's AutoML capabilities to forecast demand associated with each container type, between each load and discharge node, enabling strategic decisions around empty container repositioning on a weekly basis.
 * C. Customs documentation classification [in production] – We developed a classification model to predict the probability of improper/late documentation for each container, reducing manual work for our claims and customs department.
 * D. Predict container weights [in development] – We are developing a regression model to predict the weight of containers booked on our voyages before they arrive at the port, enabling improved voyage planning under the weight constraints of booked containers.
 * E. Estimate carbon footprint [in development] – We are developing a methodology to dynamically calculate and serve the estimated carbon footprint as a service using Dataiku's API capabilities.
 * F. Predictive fleet maintenance [in development] – We are developing an anomaly detection model to identify concerning signatures from sensors on port equipment and drive a recommender system for inspection, reducing unplanned maintenance.
 * G. Predict customer churn [in development] – We are developing a customer churn classification model to improve the targeting of our marketing and promotional efforts in our logistics business.
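To make the modeling approach in (A) concrete, here is a minimal sketch of a container show/no-show classifier. It is illustrative only: the file and column names (booking lead time, customer no-show rate, booked slots, route utilization) are hypothetical stand-ins for the actual features, and the post itself builds the model with Dataiku's visual ML rather than hand-written scikit-learn.

    # Minimal sketch of a show/no-show classifier; names are hypothetical.
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    bookings = pd.read_csv("bookings.csv")  # hypothetical historical bookings extract
    features = ["booking_lead_days", "customer_noshow_rate",
                "slots_booked", "route_utilization"]
    X = bookings[features]
    y = bookings["showed_up"]  # 1 = container arrived at port, 0 = no-show

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    model = GradientBoostingClassifier().fit(X_train, y_train)
    proba_show = model.predict_proba(X_test)[:, 1]
    print("AUC:", roc_auc_score(y_test, proba_show))

    # Bookings with low predicted show probability are surfaced to planners.
    at_risk = X_test.assign(no_show_risk=1 - proba_show) \
                    .sort_values("no_show_risk", ascending=False)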
Value Generated:

 * Container sail/rollover classification model – Expected revenue gain of $5,000-$10,000/week, and 10-15 hours of employee time saved per week.
 * Demand forecasting model – Expected revenue gain of approximately $10,000-$20,000/week, and approximately 5 hours of employee time saved per week.
 * Customs documentation classification model – Expected cost savings of approximately $5,000-$10,000/week, and approximately 5 hours of employee time saved per week.
 * Container weight regression model – Expected revenue gain of approximately $5,000-$10,000/week, and 5-10 hours of employee time saved per week.
 * Carbon footprint estimation – This will be rolled out in a new product offering that enables optimization of supply chains based on carbon footprint and will position Crowley as a sustainability leader in the supply chain industry. Two potential customers have been identified, potentially generating revenue within the first year.
 * Predictive fleet maintenance – Expected reduction in unplanned maintenance and last-minute planning of port equipment, and potential reduction in scheduled maintenance time, with potential cost savings of tens to hundreds of thousands of dollars per year.
 * Customer churn prediction – Improved promotional targeting, fewer manual hours for marketing, and data-driven identification of at-risk customers are expected to enable superior service to those customers, increasing retention.

In addition, Dataiku has generated further value at Crowley by democratizing data analytics through the upskilling and enablement of Crowley's business analysts.

Value Brought by Dataiku: Prior to Dataiku, each department
worked in a silo utilizing disparate ETL, analysis, and reporting tools that did
not integrate well. Dataiku provides a centralized, end-to-end platform for
business analysts, data engineers, and data scientists to work together on
analytics use cases. Another significant value addition comes from the
interactive visual interface and a great suite of AutoML models provided by
Dataiku, enabling data analysts to design predictive and prescriptive models.
For the MLOps team, Dataiku provides a seamless manner of registering and
deploying models to production. The deployer enables the necessary governance
checkpoints and the inbuilt drift monitoring, metrics and checks enable the
development of appropriate post-production alert systems. Finally, Dataiku
simplifies the infrastructure needs of a maturing company. Our compute needs are
always changing/increasing, and Dataiku’s Fleet Manager enables seamless scaling
of servers and Kubernetes clusters. Due to the popularity of data science
workflows developed in Dataiku at Crowley, the tool has got increasing interest
from data engineering teams that are exploring the ETL functionalities that
Dataiku provides, especially through seamless integration with Snowflake.
Posted by harsh9127
Excelion Partners - Building a Free Plugin to Efficiently Catalog and View Data
Lineage
Team members: Ryan Moore & Tony Olson
Country: United States
Organization: Excelion Partners
Description: Excelion Partners is a consulting organization of cloud data architects, data scientists, data engineers, and data analysts who are passionate about finding answers and building solutions with data. We help you "Decide with Data."
Awards Categories: Partner Acceleration, Moonshot Pioneers

Business Challenge: At Excelion Partners, we work with numerous
customers who utilize Dataiku in their Data Science and Analytics practice. Many
of these organizations and analytics groups have not yet invested in an
enterprise data cataloging tool or data lineage tool, which are often
cost-prohibitive. As part of the productionalization process for these
customers, we have often witnessed them creating "homegrown" data cataloging
solutions that typically consist of a combination of spreadsheets, Dataiku, and
their preferred visualization tool. Their “homegrown” data cataloging solutions
are labor-intensive to maintain and do not integrate with their developers, who
are hands-on with the Dataiku projects. Additionally, our clients struggle with
data lineage. They create numerous downstream datasets in Dataiku, and we often hear them ask, "Where did that column come from?" Without upstream data lineage visibility, our clients lose trust in the data and, ultimately, in the solution's business outcomes.

Business Solution: Because of this cataloging
and lineage challenge, Excelion has created a free Dataiku plugin called Thread.
Thread is a lightweight catalog and lineage tool that directly integrates with
Dataiku and its datasets. This tool allows for a single location to document
data connected to Dataiku and to consume the catalog's contents in a manner that
is easy and efficient for business use. Thread is implemented as a Dataiku web app plugin with a simple installation process and the ability to securely scan an entire (or partial) Dataiku node for lineage viewing and documentation. The indexes and metadata generated by Thread are saved as Dataiku datasets in a project flow, making it easy to export them for exposure in third-party visualization tools such as PowerBI or Tableau (see the sketch below).
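Because Thread persists its indexes as ordinary Dataiku datasets, they can be pulled into any downstream analysis with the standard Dataiku Python API. A minimal sketch, assuming a hypothetical dataset name thread_column_metadata and hypothetical column names produced by the plugin:

    # Minimal sketch: reading Thread's metadata inside a Dataiku Python recipe
    # or notebook. Dataset and column names are hypothetical.
    import dataiku

    metadata = dataiku.Dataset("thread_column_metadata").get_dataframe()

    # e.g., a simple governance KPI: share of documented columns per dataset.
    documented = (
        metadata.assign(has_doc=metadata["description"].notna())
                .groupby("dataset_name")["has_doc"]
                .mean()
    )
    print(documented.sort_values())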
Use Case Stage: In Production

Value Generated: Thread has already been deployed on hundreds of projects at multiple joint Excelion and Dataiku clients. Here are some areas of business value Thread users have obtained:

Creates efficiencies
 * Fewer clicks and less time spent, by having the data definition at the time and place the information is needed.
 * More and better insights during exploratory data analysis, through better-documented columns.
 * Faster solution building and data enrichment through documentation and improved data understanding.

Improves governance
 * Clear measurement of governance through KPIs showing the percentage of columns documented in any dataset.
 * A single repository for data documentation, making definitions easier to keep up to date.
 * Definitions are easily auditable (exportable).
 * Native integration with Dataiku permissions limits the editing of data definitions to those with access.

Improves trust
 * Creates transparency for data analysts, data engineers, data scientists, and business leaders to see: what data was used in a project (data catalog), where it was used (upstream/downstream data lineage), and how that data is defined throughout the project (data dictionary).
 * Builds a common language between the business and analysts.

Training & onboarding efficiencies
 * Helps new team members learn company-specific jargon and abbreviations faster.
 * Streamlines onboarding and training by keeping everyone in Dataiku instead of a myriad of spreadsheets and code documentation.

Saves money and labor
 * Saves analytics leaders $200k+ in purchasing, implementing, and supporting an enterprise-grade data catalog and lineage tool for their Dataiku environment.
Value Brought by Dataiku: Thread is built on top of Dataiku! All the value Thread creates is an extension of, and possible because of, Dataiku. Dataiku's flexible and extensible platform lets the community contribute and share solutions across organizations and industries easily. The ability to write custom plugins and integrate with the Python API provides the means to achieve exceptional business value through custom integrations. The native security integration removes governance concerns about building application solutions on top of Dataiku, and thus speeds up the innovation process.

Value Type: Reduce risk, Save time, Increase trust
Posted by rmoore
Last reply Wednesday by rmoore
Tom Brown (41xRT) - Helping Nonprofits Leverage Insights From Their Data
Name: Tom Brown
Title: Nonprofit Data Science & Analytics Advocate
Country: United States
Organization: 41xRT
Description: 41xRT is "Where Arts and Technology Meet". This is a name that I use while working with cultural nonprofit organizations, whether on data technology, group facilitation, or my own computational art work. In this context, I typically work on opportunities to let data from patrons speak more clearly to organizations, helping them take smarter actions. Almost all of my work with organizations is done on a pro bono basis, helping them build new data-oriented capabilities.
Awards Categories: Organizational Transformation, Data Science for Good, AI Democratization & Inclusivity, Alan Tuning

Challenge: As a personal passion, and
professional mission, I am helping non-profit organizations around the world
better understand their stakeholders through data and take actions based on
these insights. Two challenges commonly arise when it comes to data science in the nonprofit sector, particularly when trying to move beyond basic monitoring and evaluation toward the use of predictive models to drive more productive action:

1. Lack of proper infrastructure for data management & analysis

As a
striking example, when I started contributing to data analysis for a community
college a few years ago, the collection pipeline was... making tick marks on a
sheet of paper and having a work study student convert those tick marks to a
spreadsheet! This was then used to produce end-of-year summary reporting. That
is at the extreme end, to be sure, but many grassroots organizations rely on
Excel spreadsheets. Even the largest cultural non-profit organizations don’t
typically have data science tools to support the building of data pipelines and
predictive modeling.

2. Vision and skills challenge for data science & AI

The
second challenge reflects a deeper issue of stakeholders’ awareness and ability
to understand what data science is, what value it can bring, and what is
possible through model operationalization to drive optimal action. In an
already-tense market, it is difficult for nonprofits to hire for specialty data skills, especially as they have historically prioritized hiring for "people skills" (including written literacy, fundraising, and passion for mission) over the skills needed to build models that drive optimal performance.

Solution: Hence my
work revolves around pro bono consulting to build awareness and capabilities
around how nonprofits think about data, data pipelines, and predictive analytics
for their organizations - and Dataiku as a company, community, and tool has in
so many ways helped move these endeavors forward.

1. Cleaning data for visualization at a community college

I started using the free version of Dataiku back in early 2017 (version 3) for a project with a community college. This project let me develop my own awareness of data science and of tools accessible to non-programmers. During this project, I used the visual recipes to turn messy data into clean data for visualization. The first major
project helped the library to understand seasonal student flow at the reference
desk. This understanding allowed staff to improve staffing levels at needed
times to improve the student experience.

2. Predicting attendance at a children's science museum

Then, I brought Dataiku to Liberty Science Center, where I was spearheading Digital Projects & Analytics, and the organization benefited from a donated license as part of the Ikigai program. Our initial objective was to forecast future-year attendance. Thanks to Dataiku resources and the versatility of the platform, we grew our data science skills to create features and develop a model that confirmed some staff hunches: at a children's science museum, attendance is strongly correlated with weather! By simulating future years based on 20 years of past weather data, we found upper and lower bounds on attendance to inform the annual budgeting process (a sketch of this simulation idea follows below).
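As an illustration of that bounding approach, here is a minimal sketch with entirely hypothetical file and feature names and a stand-in model: train an attendance model on weather features, replay each of the 20 historical weather years through it, and take the spread of simulated annual totals as the budgeting range.

    # Minimal sketch of weather-driven attendance bounds.
    # File and feature names are hypothetical illustrations.
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    history = pd.read_csv("daily_attendance.csv")  # temp, precip, is_weekend, attendance
    features = ["temp", "precip", "is_weekend"]

    model = RandomForestRegressor(random_state=0)
    model.fit(history[features], history["attendance"])

    # Replay each historical weather year through the model and total it up.
    weather = pd.read_csv("weather_20_years.csv")  # year, temp, precip, is_weekend
    annual_totals = (
        weather.assign(predicted=model.predict(weather[features]))
               .groupby("year")["predicted"]
               .sum()
    )
    print("Budgeting range:", annual_totals.min(), "to", annual_totals.max())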
Our next project, which started just before COVID-19 hit, was to use the pipeline features to manage customer records in the fundraising and ticketing CRM system.

3. Equipping nonprofits with data science they can use

Subsequently, I helped various nonprofit organizations design and
implement data science projects using Dataiku, which provides an interface to
operationalize the work and leverage parts of it for other data initiatives. Projects ranged from cleaning data for re-import into the CRM system of Synchronicity, a women-run theater in Georgia, to audience segmentation and retention projects for the Cascade Bicycle Club, and membership churn modeling for a children's museum in Minnesota.

4. Building communities to share data science knowledge & learnings
To expand my own knowledge throughout these endeavors, I sought to exchange with peers in the industry. That involves hosting events for the Dataiku New York User Groups, helping users solve their issues on the Dataiku Community as a Dataiku Neuron, and facilitating an 'Analytic Coffee' group as well as a fast.ai study group among cultural nonprofit administrators. Each of these activities builds awareness and capabilities for myself and for emerging nonprofit leaders who now better understand the value of data science in facilitating smarter action.

Impact: Data science is still at the infancy stage for most
nonprofit organizations. With Dataiku and growing expertise in developing projects, enabling team members, and communicating value to stakeholders, I was able to:

1. Deliver projects to organizations that wouldn't have tackled them by themselves

All the projects listed above grew out of challenges the nonprofits knew about but did not have the awareness, the technology, or the skills to solve. Having a single platform to build and operationalize projects enabled us to build solutions that wouldn't have been possible with the previous manual spreadsheet work.

2. Convert data into practical insights for the organizations

Dataiku's visualization features proved invaluable for communicating insights from data analysis to the broader organizations. This was key to showing the value of data science initiatives and enabling further investment of staff time and resources.

3. Build repositories of data science projects to leverage for future endeavors

With the visual interface, workflows become understandable, even for non-data-literate team members. This enables everyone to build upon existing work from more technical people and leverage parts of it to conduct their own projects.

4. Onboard, enable, and upskill staff members and volunteers to draw more insights from their data

Thanks to the user-friendly interface, online resources, and programs such as Ikigai, which provided a full-featured license and training, I was not only able to bring data science into all these organizations but, more importantly, to provide a pathway for them to build their own vision of what data science can bring them. Users quickly learn new data skills, and some have started to produce their own insights and build more advanced projects to grow their organizations.

Although we're living in a world of data science, most nonprofit organizations still have a long way to go to embed the value of data science into their operations and reap the benefits of smarter stakeholder interactions. With Dataiku, I have been planting the seeds of data democratization, enabling more stakeholders to leverage it and drive the organizational change needed to fulfill their missions and change the world for the betterment of all.
Posted by tgb417
Cascade Bicycle Club - Laying the Foundation for Volunteer Collaboration on Data Insights
Team members: Christopher Shainin, Technology Manager; Tom Brown, Volunteer Data Scientist; Akshay Kotha, Volunteer Data Scientist; Sindhujaa Narasimhan, Volunteer Data Scientist; Anas Patankar, Volunteer Data Scientist; Sankash Shankar, Volunteer Data Scientist; Megan Thomas, Volunteer Data Scientist
Country: United States
Organization: Cascade Bicycle Club
Description: Cascade Bicycle Club, the nation's largest statewide bicycling nonprofit, serves bike riders of all ages, races, genders, income levels, and abilities throughout the state of Washington. We teach the joys of bicycling, advocate for safe places to ride, and produce world-class rides and events. Our signature programs include the Seattle to Portland, Free Group Rides, the Pedaling Relief Project, the Advocacy Leadership Institute, the Bike Walk Roll Summit, Let's Go, and the Major Taylor Project.
Awards Categories: Organizational Transformation, Data Science for Good, AI Democratization & Inclusivity

Challenge: Cascade Bicycle Club, the nation's
largest statewide bicycle nonprofit, serves bike riders of all ages and
abilities throughout the state of Washington. With a mission to improve lives
through bicycling, they teach the joys of bicycling, advocate for safe places to
ride, and produce world-class rides and events. In the fall of 2020, Cascade
Bicycle Club invited a team of Pro Bono data scientists to help them understand
and re-engage riders during and after the COVID-19 pandemic. The intent was to
use existing transactional data held in Salesforce to model rider segments, as
well as past drivers of engagement and churn behavior to better understand how
they could better engage with riders. At the time of this offer, Cascade Bicycle Club had no infrastructure appropriate for data science work. Cascade
was also wary of allowing Personally Identifiable Information (PII) on
infrastructure not under Cascade Bicycle Club’s direct control. How could
Cascade Bicycle Club quickly create an enterprise-class data science
infrastructure that would allow a small team of volunteer data scientists from
across the United States to work together? The solution had to involve providing
familiar data science tools like Python, Jupyter notebooks, R, SQL, as well as
access to Salesforce data for analysis, and eliminating the need to move
customer data to analysts’ computers. Solution: As we started on this endeavor,
we reached out to the Dataiku team about the Ikig.ai program. With a donated
license, we were able to provide the platform to a small team of volunteer data
scientists to collaborate on data analysis. In less than a month, we built out an AWS instance, connected data from Salesforce via a standard plugin, and made it available in Dataiku for collaboration, whereas the whole setup would usually take a nonprofit several months or more to accomplish. This was made possible by a team effort: support from Dataiku, Cascade's willingness to invest in some additional AWS infrastructure, and team members' willingness to move to a new platform (and move their Jupyter notebooks!). We gained quick impact by launching several projects:

 * Rider segmentation, to better understand riders' objectives and behaviors.
 * Rider retention and, conversely, ways to minimize churn.
 * CRM cleaning through de-duplication, to lay the basis for further analysis.

To work on these, we were able to invite an additional five pro bono data scientists into the process, who were quickly onboarded onto Dataiku as we were able to reuse existing Python notebooks and Dataiku data flows.

Impact: Cascade wouldn't have
been able to securely leverage data science tools and techniques without a
central platform. Dataiku has provided a home for data science operations for
the organization, around three main pillars:

1. Enable collaboration between team members & volunteers

Dataiku DSS provides a controlled environment that gives volunteers from around the United States an opportunity to collaborate on a common set of data and work with standard data science tools. Furthermore, thanks to its versatility, the platform allows each contributor to use the technologies and techniques they are most familiar with, which has been pivotal in allowing volunteers to help as a side activity. This project provides a basic roadmap showing that nonprofit organizations can find creative ways to build infrastructure and leverage data science skills in order to participate in today's data science revolution.

2. Facilitate reusability of past projects & workflows

The visual interface allows everyone to view the workflow of other participants and assess where they can contribute their time and expertise. It also makes it easy to onboard new volunteers, as we did with a second round of contributors, and enables them to gain a quick understanding of the projects conducted, as well as to reuse parts of them for their own endeavors (thanks to copy/pasting steps of the flow and duplicating projects!).

3. Adopting a data-driven approach

Because we were able to conduct our first data science projects in Dataiku in a short time, and already show an impact on the organization, we're planting the seeds of a data science culture at Cascade Bicycle Club and laying the foundation for further engagement by staff and future groups of volunteers. This project becomes a template that can be reproduced by others wishing to leverage data science at the scale of a nonprofit organization.
Posted by cshainin1
Last reply 09-01-2021 by angie-gallagher
Premera Blue Cross - Building a Feature Store for Quicker and More Accurate
Machine Learning Models
Team members: Marlan Crosier, Senior Data Scientist; Norm Preston, Manager of Data Science team; Jing Xie, Data Scientist; Adam Welly, Data Scientist; Jim Davis, Statistician; Greg Smith, Healthcare Data Analyst; Gene Prather, Dataiku System Admin
Country: United States
Organization: Premera Blue Cross
Description: Premera Blue Cross is a not-for-profit, independent licensee of the BCBS Association. The company provides health, life, vision, dental, stop-loss, and disability benefits to 1.8 million people in the Pacific Northwest.
Awards Categories: AI Democratization & Inclusivity, Responsible AI, Value at Scale, Alan Tuning
Challenge: The feature store is an emergent concept in data science. It consists
of a storehouse for features, which can be used in a variety of Machine Learning
models. It streamlines the process of building machine learning models and makes it much more efficient overall, thanks to hundreds or thousands of readily available features. Before the development of our feature store, we had to build
features for each new model from scratch. Building a feature from scratch can
take several days or even weeks. Besides the significant additional time
required to build new features, such one-off features were often not as well
tested and so models were more likely to be impacted by errors. The other big
impact was that often we were unable to test as many features as we might have
liked to. The result is that our models were not as accurate as they could have
been.

Solution: Overview
Our feature store currently includes 283 features. As a health insurance company, members are foundational entities in our business, and all features are currently linked to a member. Our features are built from data in a SQL-based data warehouse. All features are pre-calculated (vs. calculated on the fly), and all processing runs in-database via SQL. In other words, we used SQL to build our features; with the amount of data we are working with, using Python would not be practical. Given the pre-calculated approach, the resulting feature tables are fairly large since, for many of our features, we store daily values for our entire member base. Most features are updated daily (that is, new values are calculated daily), and day-level feature values are sufficient for the vast majority of our use cases.

1. Structure
Our feature
store includes a core table and then several tables for specific types of
features. The data in these other tables is of a particular type or source, and
is available at a particular timing. The benefits of this approach are multiple:

 * Easier development, e.g., each table has its own Dataiku project.
 * Scales better over time, as we don't have to worry about limits on the number of columns.
 * Gives data scientists options regarding what data to include in their models.

The data types for each feature store table have been carefully selected to minimize storage requirements and, more importantly, to minimize the memory footprint when data is read into Python-based machine learning processes.

2. Development & Deployment
We currently use a team approach for developing new
features, and Dataiku's collaboration features have been very helpful here. Each developer is provided with a current copy of the relevant feature store project, and then uses either version control tracking to identify changes and additions, or git-based project merging to integrate the changes back into the main project. We deploy updates to our feature store using Dataiku's automation instances. Development and testing take place on the Design instance; updates are then deployed to a Test instance, and finally to a Production one. We have incorporated a variety of design-time and run-time checks (via Dataiku's Metrics and Checks) to assure data accuracy and reliability. Additionally, we developed Python recipes that largely automate the feature table update process: for instance, copy data from the existing table, drop the existing table, create the new table, copy the existing data back in, and then add the new feature data (a sketch of this pattern follows below).
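As a rough illustration of that update pattern, here is a minimal sketch of such a recipe. It is hedged throughout: run_sql() is a hypothetical stand-in for whatever executes SQL in-database (for instance, a Dataiku SQL executor), and the table and column names are illustrative rather than Premera's actual schema.

    # Minimal sketch of the feature-table update pattern described above.
    # run_sql() is a hypothetical stand-in for an in-database SQL executor;
    # table and column names are illustrative only.

    def update_feature_table(run_sql, table, new_feature_sql):
        staging = table + "_staging"
        # 1. Copy data from the existing table into a staging table.
        run_sql(f"CREATE TABLE {staging} AS SELECT * FROM {table}")
        # 2. Drop the existing table and recreate it with the new feature joined in.
        run_sql(f"DROP TABLE {table}")
        run_sql(f"CREATE TABLE {table} AS "
                f"SELECT s.*, f.new_feature "
                f"FROM {staging} s LEFT JOIN ({new_feature_sql}) f "
                f"ON s.member_id = f.member_id AND s.feature_date = f.feature_date")
        # 3. Clean up the staging copy.
        run_sql(f"DROP TABLE {staging}")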
3. Metadata & Discoverability
Each of our feature projects includes a metadata component. This metadata is entered via a Dataiku editable dataset and includes attributes like name, data type, definition, a standard indicator to use for missing values, and feature effective date. Since we were building the store for a small team, and we wanted to try all store features in our models, we initially focused on building features rather than on discoverability. We are now building a fairly simple webapp in Dataiku to provide data discoverability and exploration, in preparation for rolling out the feature store to more teams in our company. This discoverability tool utilizes the feature store metadata described above.

4. Usage
Data scientists can incorporate feature store features into their projects
using the Dataiku macro feature. The macro enables selection of subject areas to
include and specification of how to join feature data to project data. The macro
handles missing value logic and maintenance of data types to minimize memory
demands in Python-based machine learning processes.

Impact: The overriding benefit of our feature store is, of course, how much more quickly we can develop machine learning models. Developing an initial model often takes just a few hours, whereas without our feature store that same model might have taken days, weeks, or even months. In some cases, our final ML models only use features from our feature store, although more commonly we supplement these with features designed for the particular problem at hand. Additional benefits include:

 * Models are less likely to be impacted by errors or leakage, as features are more extensively tested.
 * Better accuracy in our models, as we are able to test many more features than we would be able to without a feature store.

We've also experienced a bit of a virtuous cycle effect with our feature
store. As the store expands and the value increases, it's easier to justify
investing the resources to develop new features, test them thoroughly, assure
that leakage is not occurring, etc. This in turn further increases the value of
the store which makes it even easier to invest in further enhancements. And so
on! At the company level, our ability to develop more accurate models more
quickly also enables more areas in the organization to benefit from our data
science investments.
Posted by Marlan
AstraZeneca - Toward Self-service AI and Analytics for World-changing Innovation
AstraZeneca has never attempted to solve the full landscape of data pipelines, machine learning, and data visualisation within a single tool, due to the inherent complexity of building and maintaining the broad spectrum of capabilities that would be required. As a result, the lifecycle of a project, from data wrangling through cleaning, manipulation, data science, visualisation, and deployment, could see a user working across multiple tools and platforms for each stage of their pipeline.
Posted by ak12
RiseHill Data Analysis - Using AI to Combat the Rise in Corporate Fraud in Malaysia
Name: Siti Sulaiha Binti Subiono
Title: Data Scientist
Country: Malaysia
Organization: RiseHill Data Analysis Sdn. Bhd.
Description: RiseHill Data Analysis Sdn. Bhd. (RDA) is a high-tech development and service company registered in Kuala Lumpur, Malaysia, specialized in petroleum technique consulting, services, and data analytics. The company is committed to comprehensive technical research, development, and consultation based on the concept of 'the integration of multiple sources of data'. The company holds several software copyrights and technical patents, along with tailored workflows and solutions for particularly challenging problems. It aims to be a world-class integrated data analytics service, acknowledged as a state-of-the-art technology provider.
Awards Categories: Data Science for Good, AI Democratization & Inclusivity, Responsible AI, Value at Scale

Challenge: To detect fraudulent
activity, most organizations used to rely on a rule-based approach, which
requires an algorithm to perform several defined scenarios - and the workflow
must be manually updated if new scenarios or trends come in. As fraud tactics
have become more advanced, this approach is now outdated. The vast number and
size of datasets at hand also made fraud detection more challenging. Based on
the Crime Statistics Malaysia 2020 by the Department of Statistics Malaysia,
corporate fraud, which involves bribery, corruption, and asset misappropriation,
recorded an increase from 2018 to 2019. The Covid-19 pandemic also contributed
to the rising trend in fraud cases, as it accelerated the need for effective
payment channels between consumers and companies - and faster payments can
potentially mean faster crime. In addition, Malaysian organizations have been quite slow to adopt AI technologies for combating fraud, due to a number of factors. First, the increasing amount of data of questionable quality, which makes it harder to leverage. Second, corporations still do not trust technology as a tool for detecting fraud effectively and tend to keep conventional investigation methods, which are time-consuming. The last challenge lies in the shortage of local talent, which hinders progress in detecting fraudulent activities. As a Data Scientist, I also face challenges in building the whole workflow, which is a very lengthy process: joining data from various sources, doing exploration, building machine learning models using Java or Python, fine-tuning them and optimizing computing time, all the way to deployment. We found that Dataiku fills these gaps, so the RiseHill Data Analysis team stands together in combating the rise in corporate fraud in Malaysia using AI and data analytics. We want many companies in Malaysia to open their eyes and use advanced technology and tools to combat this issue before it worsens.
Solution: RiseHill Data Analysis Sdn. Bhd. leverages Dataiku to develop Machine
Learning models as a more effective method in detecting fraudulent activities,
as well as a more secure and efficient approach - moving past the old school
“rule-based” approach. We are now able to centralize data exploration,
wrangling, and the creation of machine learning models in one platform - hence
Dataiku helps us save time in the development and deployment phases of the
models. Our favorite feature is Data Partitioning, which enables us to refresh
the data on a daily basis, while Dataiku will only re-build the workflow with
the partition that contains the new data. This is especially helpful to re-train
models efficiently. Machine learning relies on pattern recognition and classification to distinguish legitimate transactions from fraudulent ones occurring through online payment channels. The classifications we use are based on user identity, order history, location of the payment, time of transactions, and amount spent (see the feature sketch below):

 * In identity classification, we use the age of the customer, the number of characters in their email address, the fraud rate of their IP address, and the number of devices with which they access the organization's site.
 * In order history classification, we use data on when orders were placed (or the time period), the amount spent in each transaction, and data on how many orders were attempted and failed.
 * In location classification, fraudulent activities can be detected through a mismatch between the billing and shipping addresses, or between the user's IP location and the shipping address.
 * In method-of-payment classification, the credit card details, the name of the customer, and the shipping information must reference the same country, and the credit card used by the customer must not be issued by a bank with a reputation for fraudulent transactions.
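As an illustration of the identity and location signals described above, here is a minimal sketch of the feature engineering step. All file and column names are hypothetical stand-ins, and the actual models are built with Dataiku's visual recipes and ML rather than hand-written pandas.

    # Minimal sketch of identity/location fraud features; names are hypothetical.
    import pandas as pd

    transactions = pd.read_csv("transactions.csv")

    features = pd.DataFrame({
        "customer_age": transactions["customer_age"],
        "email_length": transactions["email"].str.len(),
        # Historical fraud rate of each IP address.
        "ip_fraud_rate": transactions["ip_address"].map(
            transactions.groupby("ip_address")["is_fraud"].mean()
        ),
        # Number of distinct devices used by each customer.
        "device_count": transactions.groupby("customer_id")["device_id"]
                                    .transform("nunique"),
        # Location mismatch signal from the location classification above.
        "address_mismatch": (transactions["billing_country"]
                             != transactions["shipping_country"]).astype(int),
    })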
Impact: Machine learning is particularly helpful to organizations that implement these models over the long run, as they are able to screen out illegitimate transactions and streamline the acquisition of new, reliable customers. It also enables risk mitigation, as these techniques detect more advanced fraud than the traditional rule-based approach. Our approach has the potential to be generalized across Malaysian organizations to fight fraud. Dataiku has benefited both our organization and our customers through fraud detection, and has enabled us to save time on the development, execution, and deployment of models. No more hard-coding!
Posted by sulaihasubi
ALMA Observatory - Empowering a Data-driven Organization to Improve Astronomical
Operations
Team members: Ignacio Toledo, Data Analyst, Data Science Initiative Lead; Tomás Staig, Software Development Lead, Data Science Initiative Lead; Rosita Hormann, Software Engineer; Jorge García, Science Archive Content Manager; Jose Luís Ortiz, Technical Lead - Digital Systems; Mark Gallilee, Technical Lead - Mechanics; Sergio Pavez, Software Engineer; Takeshi Okuda, Senior Instrument Engineer; Gastón Velez, Systems Administrator; Maxs Simmonds, Technical Lead and Deputy - Archive and Pipeline Operations; Jorge Ibsen, Head of the Department of Computing
Country: Chile
Organization: ALMA Observatory
Description: The Atacama Large Millimeter/submillimeter Array (ALMA) is an international partnership of the European Southern Observatory (ESO), the U.S. National Science Foundation (NSF) and the National Institutes of Natural Sciences (NINS) of Japan, together with NRC (Canada), NSC and ASIAA (Taiwan), and KASI (Republic of Korea), in cooperation with the Republic of Chile. ALMA, the largest astronomical project in existence, is a single telescope of revolutionary design, composed of 66 high-precision antennas located on the Chajnantor plateau at 5,000 meters altitude in northern Chile.
Awards Categories: Organizational Transformation, Data Science for Good, AI Democratization & Inclusivity

Challenge: As recently as 15 years ago,
most of the earth-based observatories were small facilities producing data for
astronomical research, bearing more resemblance to laboratories than to
industries. However, since the beginning of the 2000s, more complex and
ambitious observatories have been built, with multi-million dollar budgets. A
major issue emerged: these could not be operated with a staff of 5 to 10 people,
with one or two astronomers coming onsite to do their own experiments. As
institutions, today's big astronomical observatories have become gigantic "data
industries", producing terabytes (and soon petabytes) of data every year to
power scientific research. ALMA requires a staff of 300+ people and, to provide
4,300 hours of useful scientific data from our skies in a given year, the same
time must be spent on maintenance and updating activities. That includes
hardware components such as the "radio interferometer" (a virtual telescope made
of 66 antennas and two giant computers to join their signals), software systems
used to collect and process the data, but also monitoring power supplies and
weather conditions to ensure that observations are being performed with a
sufficient level of quality. In short, the volume of data from observations
increased, along with the variables to consider to operate an observatory
correctly. Yet, we didn’t have the proper tools and processes to make sense of
this new data. While we asked ourselves questions, we did not have the ability
to provide quick and efficient answers. For instance, we once received an
avalanche of problems reported from a particular hardware component, which
became of critical importance as it impacted the quality of the observations
performed. We began analyzing the number of successful hours observed in that
month with this particular component - it turns out, it was the most productive
month ever for that component! Obviously this seemed contradictory, but it
registered more problems because it was simply used much more. This all required
a simple data analysis to find out, but we didn’t realize this sooner because we
did not have the tools nor the infrastructure to query and parse the data, clean
it, and enrich it with other data sources. This lack of efficient analytical
tools for system diagnostics pushed us to look for them outside the organization. Enter Dataiku, and the Ikigai program giving free licenses to nonprofit organizations.

Solution: With Dataiku, we're building an
infrastructure that is allowing the observatory’s staff to take their analytical
work to the next level through:

1. Giving everyone access to all the relevant data sources

Our databases were previously only accessible to astronomers processing data for scientific research. As the central data science platform, Dataiku enables our whole organization to participate in the analytical process and find answers for their day-to-day work. For instance, engineers and data analysts can now access the CMMS*, Jira tickets, and log files from a data warehouse populated using the ETL and data preparation capabilities provided by Dataiku, and they can enrich their analysis by joining and correlating data that was previously difficult to access and analyse (see the sketch below).
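To give a flavor of the kind of correlation this enables (such as the "most problematic component" anecdote above), here is a minimal sketch joining hypothetical extracts of two of the sources just named; all file and column names are illustrative only.

    # Minimal sketch: correlating maintenance tickets with component usage.
    # File and column names are hypothetical illustrations.
    import pandas as pd

    tickets = pd.read_csv("jira_tickets.csv")    # component, opened_month, ...
    usage = pd.read_csv("component_usage.csv")   # component, month, hours_observed

    monthly_tickets = (tickets.groupby(["component", "opened_month"])
                              .size().rename("n_tickets").reset_index())

    joined = monthly_tickets.merge(
        usage, left_on=["component", "opened_month"],
        right_on=["component", "month"])

    # Problems per hour of use: avoids mistaking "most used" for "most problematic".
    joined["tickets_per_hour"] = joined["n_tickets"] / joined["hours_observed"]
    print(joined.sort_values("tickets_per_hour", ascending=False).head())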
2. Enabling them to upskill through integration with a big technology stack

Dataiku provides a visual interface that lets all technical levels collaborate, while integrating with most current technologies to facilitate upskilling, for instance, learning a bit of SQL to query the data in various ways. The resources provided in the Dataiku Academy, as well as the Community platform where anyone can get quick answers from other users and experts (thanks to fellow Neurons!), are highly valuable for everyone to gain new knowledge.

3. Providing ways to leverage more advanced techniques, incl. machine learning

Dataiku also provides ways for even the less technical staff to foray into machine learning, thanks to its user-friendly AutoML features and the visual interface showing (and explaining) the most relevant performance indicators of different models, also conveniently summarized in the model competition page!

4. Easily presenting insights with user-friendly data visualization capabilities

Anyone on our staff is able to perform exploratory data analysis, thanks to visual features and a drag-and-drop charting interface, and those willing to code can go deeper at their own pace. Presenting final results is also highly accessible, with dashboards composed of tiles that centralize results from other parts of the project in just a few clicks.

5. Giving guidance and resources to onboard and enable everyone in the organization

Lastly, Dataiku has been key to easily onboarding new users and making them realize the value of data insights. We've developed a Working Group with members of the Software, Engineering, and Science teams, with the mission to train new users and propagate best practices. We're leveraging content from the Dataiku Academy and are highly involved with the Community platform, where any user can go to ask questions and share knowledge. We're also currently leading a hands-on challenge in which volunteer users give their time and expertise to bring a valuable contribution to ALMA by seeking to automate quality assurance assessment. Ever more people, internally and externally, are collaborating in Dataiku to advance the 'search for our cosmic origins'!

*A computerized maintenance management system (CMMS), also known as a computerized maintenance management information system (CMMIS), is a software package that maintains a computer database of information about an organization's maintenance operations.
Impact: Today, the ALMA Observatory is one of the first earth-based observatories, if not the first, to make advances in using data science, machine learning, and automation to improve its operations. By bringing people together on a single platform, Dataiku helped grow general awareness of data analytics and of making decisions based on information produced by the data. Now the value of analytical work is broadly recognized across the organization, triggering fruitful cross-functional collaborations between various profiles: astronomers, but also analysts, archive managers, software engineers, system engineers, and more. This leads to many wins across the organization, in which Dataiku replaces old processes to improve efficiency, saving time and resources for building and maintaining data projects, plus optimizing through automation, machine learning, and easy monitoring, among other features. For instance, the data management team needs to keep track of observation times to comply with those requested, and to create indicators enabling them to identify possible problems which might hinder the delivery of observation data to the scientific community. It formerly took years to create that tracking tool, due to the effort and resources required; now it is only a matter of months, as the approach to solving the problem moved from a software development perspective to a data science perspective, where Dataiku supports every step, from accessing the data to providing the tools to present the results to the consumer, and the analysts' focus is no longer debugging code but understanding the data and extracting the information needed from it. Ultimately, the biggest value brought by Dataiku relates to powering scientific discoveries: not only are we producing scientific data, but we are starting to look into it to make our operations more efficient, so as to increase the number of hours on the sky by lowering the hours needed to keep everything working as expected, and to make the best possible use of those hours by improving the quality of the observations.
Posted by Ignacio_Toledo
Malakoff Humanis - Leveraging AI to Democratize Insights From Customer Feedback
Team members: Nikola Lackovic, Data Scientist (NLP & voice technology specialist); Gauthier Lalande; Layal Saad-Koubeissi; Zhijie Zhou
Country: France
Organization: Malakoff Humanis
Description: Malakoff Humanis is one of France's leading social protection groups, covering all the insurance needs of people in supplementary pensions, health, welfare, and savings.
Awards Categories: Organizational Transformation, AI Democratization & Inclusivity, Value at Scale, Alan Tuning

Challenge: Speech Analytics aims at analyzing the category
of calls within the CRM framework, so as to enable different internal
stakeholders to leverage the oral feedback received to improve our product and customer experience. We therefore needed a solution able to receive, treat, analyze, classify, and output the data to a visualization tool, from an external server to a PowerBI interface. Dataiku enabled us to overcome the main challenges encountered:

 * It helped us integrate the fully scaled solution with AWS S3 containers to store the data. The entire pipeline was then set up without using any additional components, and everything was built using the graphical user interface, apart from Python recipes, which were needed for various reasons.
 * The dynamic and adaptive type handling eased the implementation process all along the way.
 * Data preparation and several painful jobs were done using the built-in recipes, letting us bypass the weight of coding everything in Python 3.
 * The graph-based view makes it easy to grasp the entire workflow at a glance, also easing metacognition over the entire pipeline.
 * Dataiku then exposed the data back to S3, to which PowerBI was linked in order to display it.
Solution: Speech Analytics is a horizontal product available at everyone's fingertips - from a technician looking to solve product issues thanks to client feedback, to a high-level manager visualizing how clients interact with multiple teams within the organization. Input data consists of different types of client data: conversational transcripts of calls, metadata from the IVS, and the CRM knowledge base. Call transcripts are produced by a Speech-to-Text external partner, along with several descriptive metrics to facilitate data comprehension - so integrating this multi-modal data presented an interesting challenge for the project. A state-of-the-art fine-tuned transformer for the French language, called camemBERT, was implemented for Natural Language Processing (a sketch of this scoring step follows below). We also leveraged a tonal (positive, neutral, negative) model built by Dataiku Data Scientists to predict the sentiment of a conversation. Throughout the process, every step was first built on a design node to create a prototype that was then tested within the pipeline. Once the use case was working on the design node, we built the scenario to run hourly during call-center hours and migrated it to the automation node. The automation node is up 24/7; it is a sanctuary to which we migrate the data workflow from the design node. The recipes used in the flow are SQL recipes, Python recipes, data preparation recipes, and machine learning recipes. The entire flow built within the Dataiku platform now runs every day from Monday to Friday during working hours, 9 AM to 7 PM (GMT+02). As the latest development, a retro-feedback loop based on call-center helpers has been implemented to feed the transformer - this will be pushed to production in the next internal release. This was made possible by the integration of EKS cluster technology within the DSS framework, with the help of one of our Data Engineers, and will enable us to scale monitoring to 85% of all calls.
Impact: The benefits are multi-faceted:
Cost savings: The solution will enable us to automate away 45 seconds per call over the 12 million calls handled every year - roughly 150,000 agent-hours - which leads to tremendous cost savings in terms of human resources dedicated to answering those calls.
Improved customer satisfaction: Customers are getting faster answers to their questions, and more valuable interactions as they are directed to our most relevant team members for their requests - who will provide them with support and guidance, beyond the usual transactional interaction.
Data science democratization: By making conversational data available to the broader organization, Speech Analytics empowers people to gain insights from customer feedback.
To display the results, we leverage a visual stack in Microsoft PowerBI, which is an easy and affordable way to enhance our information-gathering capabilities. The next development is to trigger actions within other components of the informational ecosystem. For instance, we're looking into developing in Dataiku a suggestion trigger for tele-counsellor allocation, so that for every traffic group within an IVS cluster we can predict the t+1 call volume and adjust tele-counsellor presence in call centers hour by hour.
Posted by onevirtual
Ericsson – Human-centered Machine Learning for Dimensioning Resources in
Telecoms
Name: Marcial Gutierrez
Title: Senior Specialist in AI, ML and Data Analytics
Country: Sweden
Organization: Ericsson
Description: Ericsson provides high-performing solutions to enable its customers to capture the full value of connectivity. The Company supplies communication infrastructure, services and software to the telecom industry and other sectors.
Awards Categories: Organizational Transformation, AI Democratization & Inclusivity, Responsible AI, Value at Scale, Alan Tuning
Challenge: The IP Multimedia Subsystem, or IMS, is
a core network technology that provides Communication Services to people in wireless and wireline networks. These services range from Voice and Video (e.g. over 4G and 5G) to Emergency Calling and Enriched Messaging. Typically, IMS is offered in the form of network functions as software, and it is deployed on a specific operator's private cloud infrastructure. Before deploying/instantiating IMS network functions to provide the aforementioned services, a dimensioning process is conducted by the supplier (e.g. Ericsson) in order to estimate, based on the network's user traffic model, how many resources (in the form of CPU load or memory) the IMS network functions will require from the target cloud environment so as to serve those subscribers accordingly. In other words, dimensioning is the process of predicting how much CPU load, memory, storage, and networking will be required. Dimensioning IMS - or any Ericsson product or service - with the highest accuracy is critical, so that a proper offer is submitted to potential Ericsson customers. This is also key to avoiding contractual penalties, which, if incurred, can impact Ericsson and, above all, customer trust. Given the high stakes and complexity of the dimensioning task, the process needs to be conducted with a human-centered approach supported by interfaces and a trustworthy calculation backend. Given that our IMS network elements generate statistical traffic data (in the form of Performance Management counters), data-driven ways to perform dimensioning were identified as the next evolutionary step to address these important needs. Before Dataiku DSS, the overall challenge we faced was to actually have a Machine Learning-based backend which could take this statistical traffic data, process it, and handle everything we needed in terms of model training and inference for dimensioning, while having the ability to interact with it, as a service (black box), via REST API calls only. This approach would enable us to build a Web Application in front of this Machine Learning backend, in order to address our user-centered needs while achieving high accuracy, as depicted below. Our application has a working title of Data-Driven CANDI (CANDI = Capacity and Dimensioning).
Solution: Dataiku DSS allowed the successful
realization of the whole concept we were after. More specifically, the major pain points Dataiku addresses for us are:
- Industrializing data preparation, model training, evaluation, deployment, and life-cycle management in one single place, completely controlled via REST API calls (illustrated in the sketch below).
- Training different estimators (AutoML) with datasets from different live networks and selecting the best of them for deployment and inference.
- Creating specific estimators, in the form of custom plugins, which are resilient to the characteristics of telecommunications data and which can be added to the industrialization described above.
- Ensuring that explainability of the deployed models is available to the users of the Web Application.
We have a systematic flow that takes the data from a MongoDB database after the user uploads it via our WebApp; the figure below depicts the flow that every Data-Driven CANDI project typically has. Via the WebApp, the user is able to request that a new ML model be trained for their dimensioning needs. This is translated into a scenario execution that takes care of running the above flow. DSS connects to the data in the MongoDB instance, prepares it, and then runs an AutoML workflow. The best model is selected and finally deployed, so that the user gets the necessary predictions for dimensioning via the exposed REST API endpoint for inference. Feedback on what is happening under the hood is continuously provided to the user via the WebApp. The following depicts all the steps created for the scenario execution, and specifically the AutoML step. The resulting dimensioning estimators have high accuracy. This is based on the data science work we have done around this, and on the custom plugin we created for the modeling phase, which deals with the very particular aspects of our data to ensure generalization of our models. The following picture shows the specific custom plugin used.
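Because the WebApp drives DSS purely as a service, the interaction pattern can be sketched with the public dataikuapi client; the host, API key, project key, scenario id, endpoint id, and feature names below are illustrative placeholders, not Ericsson's actual setup.

```python
# Hypothetical sketch: driving DSS "as a black box" over REST, as described
# above. All hosts, keys, and ids below are illustrative placeholders.
import dataikuapi

# 1. Ask the design/automation node to train a new dimensioning model by
#    triggering the scenario that runs the whole flow (prepare + AutoML + deploy).
client = dataikuapi.DSSClient("https://dss.example.com", "API_KEY")
scenario = client.get_project("CANDI").get_scenario("train_dimensioning_model")
scenario.run_and_wait()  # raises if the scenario run fails

# 2. Query the deployed model on the API node for a dimensioning prediction.
api_node = dataikuapi.APINodeClient("https://api.example.com:12000", "candi")
prediction = api_node.predict_record(
    "dimensioning",  # endpoint id
    {"busy_hour_call_attempts": 120000, "registered_subscribers": 2500000},
)
print(prediction["result"])  # e.g. predicted CPU load / memory requirements
```

The continuous feedback shown to the WebApp user while the scenario runs could likewise be polled through the same client.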
Impact: Data-Driven CANDI as a whole is meant to provide considerable savings (~90%) in R&D costs compared to the current dimensioning tool. Moreover, it will give us the ability to understand how our IMS software performs over many different networks with different cloud environment characteristics, beyond the ones we use internally in our labs for IMS verification. This all translates into achieving accuracy levels we never had before, and thus improving our forecasts and strengthening customer trust along the way. Explainability is important to achieve in the telecommunications industry as well. Providing explainability for all of our dimensioning models to the Data-Driven CANDI user is essential for trust in the system, and ultimately for the customer to trust the predictions and forecasting capabilities obtained from it. The way Data-Driven CANDI has been architected is innovative in the sense that all the complexity of dimensioning is hidden from the user. This means that the dimensioning user is still able to perform their task, trusting that the system will provide an accurate result. Moreover, this architecture and approach make it possible to expose the WebApp directly to our customers, so that they are able to own and plan their network CAPEX in terms of IMS and cloud resources. Our approach has the potential to be generalized beyond IMS (i.e. to other Ericsson products such as 5G Core, 5G New Radio, etc.). You can find out more about our modeling strategy in Ericsson's next-gen AI-driven network dimensioning solution blog article.
Posted by gmarcial
Pr. Zervoudakis (New York University) - Shortening Time to Insights For Students
Name: Stavros Zervoudakis
Title: Assistant Professor (Adjunct)
Country: United States
Organization: New York University
Description: Founded in 1831, NYU is one of the world's foremost research universities and is a member of the selective Association of American Universities. The first Global Network University, NYU has degree-granting university campuses in New York and Abu Dhabi, and has announced a third in Shanghai; has a dozen other global academic sites, including London, Paris, Florence, Tel Aviv, Buenos Aires, and Accra; and sends more students to study abroad than any other U.S. college or university. Through its numerous schools and colleges, NYU conducts research and provides education in the arts and sciences, law, medicine, business, dentistry, education, nursing, the cinematic and performing arts, music and studio arts, public administration, social work, and continuing and professional studies, among other areas.
Awards Categories: Excellence in Teaching
Challenge: I have designed
and have been teaching a 2-semester course on Applied Data Analytics at New York
University. The course has been running for the last few years. It starts with
basic topics on statistics and simple visuals, and ends the 2nd semester with
Deep Learning and AI frameworks. We start with Excel and Excel Data Analysis and
we move to Python and Python Data Science packages. The challenge has always
been to instill in students the necessary curiosity so they can master the basics and learn how to approach data science problem solving in a way that they own the answers. Typically, we go through learning what the concepts mean while
practicing using tools and code. Solution: Dataiku comes into this learning
journey after students have learned how to solve data science problems manually,
the “harder” way. By design of the course, Dataiku DSS is employed at the time
that students know how to answer these challenges. They are expected to have
mastered the theory and they know how to practice solving such problems in the
lab. Having a plethora of related capabilities, Dataiku creates a “wow” effect.
It shows them how they can go through the pipeline faster and more thoroughly. A
quote from one of my students this past 2021 Spring semester was: "So now, by using Dataiku, I can complete the course project in a matter of a few days instead of a few weeks?" My answer was a simple "yes", knowing from their homework
submissions that they knew how to complete the project without Dataiku. Impact:
The course uses Python, Python Statistical packages and Data Science/Machine
Learning/Deep Learning packages, Excel and Excel Data Analysis add-ons, as its
core tools to practice the concepts. At a high level, concepts that we cover
start with theory of data and analytics, then we move to the basic use of
spreadsheets and visualizations. At the same time, we touch upon basic Python
programming and move quickly to related packages. Next we do statistics and
probability theory, followed by more practice using both tool categories while
we continue with sampling, estimation, and statistical inference. After these
foundational ideas are mastered, and descriptive analytics has been covered thoroughly, we move to predictive and prescriptive analytics concepts while we introduce
machine learning. A good amount of time is spent learning about how a good
number of algorithms work, the ins and outs of related math, while practicing
each of them with the appropriate dataset (sort of a mini project in the form of
a team homework). In the 2nd half of the 2nd semester, we review frameworks,
cloud computing, big data and we move to Deep Learning, Deep Learning
architectures and related packages, and close the course by touching upon
machine learning operations. Most of these concepts can be seen at play in the Dataiku user interface. When my students learn to use Dataiku, it becomes the 'aha' moment, where they see that once they know what data science means, they can use tools to help them execute a project faster and more thoroughly.
Posted by Stavros
RBC’s RaptOR - Dynamic Audit Planning Through Machine Learning-Based Risk
Assessment
Team Members: Masood Ali (Senior Director, Data Strategy & Governance); Vincent Huang (Director, Data Science); Mark Subryan (Director, Data Engineering); YuShing Law (Director, Analytics Ecosystem); Kanika Vij (Sr. Director, Data Science and Automation)
Country: Canada
Organization: Royal Bank of Canada
Description: Royal Bank of Canada (RY on TSX and NYSE) and its subsidiaries operate under the master brand name RBC. We are one of Canada's biggest banks, and among the largest in the world based on market capitalization. We are one of North America's leading diversified financial services companies, and provide personal and commercial banking, wealth management, insurance, investor services and capital markets products and services on a global basis.
Awards Categories: Organizational Transformation, Value at Scale
Challenge: Background: Internal Audit's annual
audit planning exercise comprises two key components: 1) risk assessment and 2) compilation of the audit plan. The risk assessment process results in the risk rating of auditable entities (organizational units). Internal Audit conducts risk assessments on over 400 auditable entities annually. The outcome of the risk assessment forms the basis of the audit plan.
Business Problem: The annual audit planning process is subjective and manually intensive, comprising several non-standardized offline processes to gather the data points needed to assess risk from different sources and compile the audit plan. It is therefore a time-intensive process, spanning many months to compile the annual audit plan.
Key Challenge: Our objective was to build a continuous risk assessment tool that automates the monitoring of risk status and trends, provides a comprehensive and dynamic view of risk for an audit entity at any given time, and automates the compilation of the audit plan. This challenge required a platform providing the ability to perform extensive ETL-related functions - such as building a system to ingest and process data from various systems and sources across the Enterprise - coupled with the ability to build and productionize machine learning models, all in one place. The scale of our project is enterprise-wide and the impact is department-wide, i.e. Internal Audit. This is where Dataiku provided the ability to perform extensive ETL and machine learning in one platform. Where did Dataiku fit into the picture?
Solution: To enable a
data-driven risk assessment in an automated way across the entire department, the key areas in which Dataiku facilitated this are the following:
1. Performing ETL and integrating Machine Learning models in one platform
i. Data Acquisition – Setting up connections to source systems across the Enterprise. Currently, there are 96 connections to databases throughout the enterprise, with only 2 platforms partially on-boarded. We anticipate the final number to be approximately 400 database connections when all platforms have been on-boarded.
ii. Data Pre-processing – All transformations to each dataset are captured within their own project. The visualization of the pipelines reduces the need for manual documentation of workflows and execution instructions, and the risk of key-people dependencies. When data is refreshed or new data arrives, pipelines can easily be executed to re-perform the calculations. We currently have over 700 intermediate datasets between raw inputs and the final staging dataset, encompassing a wide range of transformations and calculations. Manual maintenance of these workflows would have been challenging.
iii. Automated Productionized Workflows – Dataiku enables IA to put workflows into production with a fraction of the staff and effort required by custom-coded or bespoke applications. At the moment, we have 21 scenarios set up, 6 of which execute on a weekly or daily basis. The team receives email notifications of scenario executions and promptly addresses failed runs. This fits our agile approach because we can respond to user enhancement requests faster. The entire process is also de-risked, as we can roll back changes easily.
iv. Computations – Raptor in its current form consumes approximately 7.58 million rows of data and performs over 174 million computations. Without a complete and dedicated development team, setting up a large-scale project like this would have been impossible. Dataiku provided the piping and basic infrastructure, which makes it easy for small teams, such as ours, to put together large projects.
v. Machine Learning Models – Through Dataiku, we were able to easily set up a pipeline to consume data from an API, engineer features, prototype two different models with Dataiku's Lab, and deploy them with minimal friction. The model outputs were integrated with additional Enterprise data to derive additional insights. Dataiku was instrumental in this, as it allowed us to monitor model performance and schedule model retraining and executions.
vi. Workflow Management – Without Dataiku, there would be a lot of spaghetti code to deal with on people's laptops, given the number of individuals involved in the project. Dataiku facilitates the organization and visualization of the workflows, which makes for an easier review and reduces key-people dependency.
vii. Scheduling workflows and adding dependencies – The risk assessments are to be updated on a quarterly basis. This entails a number of upstream and downstream dependencies. Dataiku makes it easier to schedule workflows and take the dependencies into account.
viii. Dataiku visual recipes – Dataiku's visual recipes helped in joining and pre-processing datasets in an efficient manner. This prevented time being spent on writing long and cumbersome Spark/SQL code.
ix. Freedom to focus on the problem – Dataiku has enabled IA to reduce the coding footprint to one-tenth of what it would be with a custom-coded application. It gives Data Scientists, Engineers, and Analysts the freedom to focus on the problem they are trying to solve, rather than having to wade through the overhead of handling miscellaneous IT issues (e.g., code environment issues where code works on one person's desktop but not another's). The data scientist also doesn't need a deep understanding of how the system is engineered, which allows them to focus on their task.
2. Data Governance
Due to the project scope, data is being sourced and processed from various source systems and teams across the Enterprise. This lends itself to key concerns around Data Governance that Dataiku has helped address, such as:
i. Data Lineage – Automating data lineage allows us to accurately capture what is actually happening to the data, not what employees believe is happening. An in-house-built solution leverages the Dataiku API to scan metadata in order to establish a catalogue of data assets and their associated lineage at the data element level (a sketch of this kind of scan follows this list). This insight helped identify that, at IA, 407 datasets are reused 310 times, with 16,273 datasets and 840,000 data elements consumed across analytics projects.
ii. Dataiku Metadata Integration with Collibra – Lineage results are then integrated with the Enterprise Data Governance Platform, Collibra, leveraging APIs. Dataiku helped speed up documenting the lineage of Raptor-related KRIs, to instill transparency in the data consumed by risk-assessed audit entities. Without the Dataiku/Collibra integration, it would have been 75% more costly, 66% more time consuming, and perhaps not feasible to contribute an inventory of 1 million data assets for lineage and keep it up to date on a daily basis.
iii. Data Quality – The Raptor application derives hundreds of Key Risk Indicators (KRIs) using thousands of critical data elements from a variety of enterprise data sources. Knowing the quality of the critical data elements informing KRIs for audit planning decisions is very important. Dataiku's data profiling, tagging, recipe sharing, and Python integration capabilities provided the framework through which data quality checks were easily built and embedded in-line with the data ingestion process. Results are harvested automatically using Dataiku APIs and integrated with Collibra on a regular basis, avoiding a great deal of manual effort.
iv. Adherence to coding practices and version control – It would be simply impossible to adhere to coding practices and version control in a project of such a large scale if code were maintained offline on team members' laptops. Dataiku's project libraries help to modularize code and build libraries that team members can access, applying the same function across different datasets. For example, to streamline the same data quality (DQ) check across all datasets, we built a library of DQ checks which the various data analysts on the project team can leverage in a standardized manner.
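As an illustration of the lineage scan described in point i, a minimal sketch with the public dataikuapi client might look like the following; the host, key, and record shape are assumptions, and the real solution also resolves lineage down to the data element level.

```python
# Hypothetical sketch of the metadata scan behind the lineage catalogue
# described above. Host, API key, and the record shape are illustrative.
import dataikuapi

client = dataikuapi.DSSClient("https://dss.example.com", "API_KEY")

catalogue = []
for project_key in client.list_project_keys():
    project = client.get_project(project_key)
    for item in project.list_datasets():
        # Each dataset's schema gives the data elements it exposes.
        schema = project.get_dataset(item.name).get_schema()
        catalogue.append({
            "project": project_key,
            "dataset": item.name,
            "data_elements": [col["name"] for col in schema.get("columns", [])],
        })

# These records can then be pushed to Collibra through its REST API (not
# shown), keeping the governance catalogue in sync with what is in DSS.
print(f"Catalogued {len(catalogue)} datasets")
```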
Impact: The benefits are multi-faceted, and most impactful in two major areas:
1. Operational efficiencies department-wide
i. Time savings from automating the continuous risk assessment process, by streamlining administrative work related to data sourcing, processing, and risk calculations.
ii. Reduction in manual processes and various end-user computing tools, such as Excel files.
iii. Flexibility to divert resources to platforms with elevated areas of risk and the highest impact.
iv. Increased consistency and repeatability of the risk assessment process.
2. Quicker adjustments to the audit plan
i. Enterprise audit plan coverage can be aligned to areas of elevated risk.
ii. Visibility into emerging and changing risks on a continuous basis, which will help audit teams respond to changes in the risk environment by pivoting the audit plan.
Posted by LisaB
Schlumberger - Using Dataiku to Democratize AI Within the Organization
Team members: Valerian Guillot (Nerve Center Data Science Architect), with: Sampath Reddy; Jean-Marc Pietrzyk; Jimmy Klinger; Eimund Liland
Country: United Kingdom
Organization: Schlumberger
Description: Schlumberger is a technology company that partners with customers to access energy. Our people, representing over 160 nationalities, are providing leading digital solutions and deploying innovative technologies to enable performance and sustainability for the global energy industry.
Awards Categories: Organizational Transformation, AI Democratization & Inclusivity
Challenge: Democratizing AI within Schlumberger
Schlumberger is
investing significantly in research and development to improve our products and services for customers, and has been embarking on digital transformation internally, as well as supporting our customers through their own transformation. The main challenges Schlumberger has been facing are:
- Re-skilling cohorts of petro-technical experts in digital skills
- Ensuring prior work on data-driven topics is discoverable and reproducible
- Ensuring that access to data is democratized, with a focus on: data discoverability; data ease of consumption; rights-of-use controls
- Ensuring that solutions designed & prototyped have a clear delivery path to yield business impact
Solution: Schlumberger needed a single data science platform to access Schlumberger domain data through no-code & low-code interfaces, where prior work is easily discoverable, and whose technology is close to the systems where the insights & models will be deployed. Leveraging Dataiku, we have put in place a mechanism where Schlumberger data scientists and technical experts can:
- Leverage the Dataiku Data Catalogue to access curated domain views of Schlumberger business systems data
- Leverage Dataiku's code samples & custom plugin capabilities to access the high-frequency historical environmental exposure of Schlumberger drilling equipment
- Leverage Dataiku visual & code recipes to build insights & models to improve well construction performance and reliability
- Leverage Dataiku automation and API node capabilities, and their close integration with BI solutions, to easily make models and insights available to wider populations in the field.
To support the internal adoption of Dataiku within Schlumberger, we've developed and delivered a number of custom data science classes, focusing on use cases relevant to Schlumberger's population of technical experts. As usage has scaled out, we've leveraged Microsoft's Yammer to build a technical community whose members help each other within Schlumberger.
Impact:
- 8x increase in Dataiku usage in the last 18 months
- 6TB of data analyzed per day
- 40% of contributions by non-data scientists
- 42% yield on training classes
- 4x increase in community help in 12 months
- 35% of data science projects are collaborative
- 720 days since the last day without data science commits
- Models & insights used in 70 countries
Dataiku, and its close integration with the DELFI E&P cognitive environment, has been a key driver in democratizing the use of data science within Schlumberger. We are measuring the effectiveness of democratization through:
- The number of active users, where active users are users making technical contributions (e.g. code change, flow change…)
- The job code of the active users
- The usage of the data access helpers
- The number of projects going into production
The graph above shows the growth in the number of users per week making contributions to data science projects, growing 8-fold since early 2019. The growth has been worldwide, with users all around the world. The data democratization effort has been successful in onboarding our existing population of data scientists, as well as technical experts - ranging from maintenance technicians, service quality engineers, well engineers, etc. - who are now able to speak a common language and make data-driven decisions such as:
- Choosing the types of batteries to include in a downhole tool, by looking at the historical environmental exposure of the tool
- Choosing the drilling bottom hole assembly to maximize operational reliability, using BI solutions resulting from data flows in Dataiku
- Choosing when to replace equipment to reduce the risk of downhole failure, using PHM models trained in Dataiku
- Optimizing the choice of drilling parameters to maximize performance and minimize energy consumption.
The distribution of contributions to data science projects (chart below, in 2020-2021) shows that ~70% were made by users who are not data scientists. Early 2021 data shows further growth in non-data science contributions. Dataiku has also enabled collaborative work between data scientists and domain experts, with 35% of the data science projects in Dataiku being collaborative projects (defined as projects where the distribution of commits is spread amongst multiple users). The growth in usage, and the diversity in the job codes of users, has proven the transformative value of Dataiku as a collaborative data science platform for Schlumberger. Supporting the growth has been done along three axes:
- Domain views and helpers to access data
- Custom training (instructor-led, virtual, and self-training)
- Community-based technical support
Community engagement: As the user base grew, we have put in place a Bulletin Board where Dataiku practitioners can ask any technical question on Dataiku or data science, in order to collectively learn from each other (snapshot of the Dataiku bulletin board on Yammer). Community engagement, measured here as the number of messages read on a technical Yammer chat over the last 365 days, has tripled over the last year.
Easing data access
Accessing time series data from Schlumberger's
operation had historically been a challenge, especially for predictive
maintenance purposes, which required being able to trace back the entire history
of each piece of equipment. Leveraging Dataiku plugins and globally shared code, domain-specific helpers are implemented to retrieve the entire historical exposure of each piece of downhole drilling equipment using only a single line of code. The helpers are used up to 12,000 times per day, with approximately 6TB of data being analysed per day.
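To illustrate what such a single-line helper can look like from the analyst's side, here is a minimal, hypothetical sketch; the function, dataset, and serial number are invented for illustration and are not Schlumberger's actual internal API.

```python
# Hypothetical sketch of a shared "data access helper" exposed through a
# plugin or globally shared code. All names are illustrative placeholders.
import dataiku
import pandas as pd


def get_equipment_exposure(serial_number: str) -> pd.DataFrame:
    """Return the full historical environmental exposure (e.g. temperature,
    vibration, pressure over time) of one piece of downhole equipment."""
    # The plumbing - locating the right curated domain view and partitions -
    # is centralized here rather than re-implemented in every project.
    df = dataiku.Dataset("equipment_exposure_history").get_dataframe()
    return df[df["serial_number"] == serial_number].sort_values("timestamp")


# Analyst-facing usage: a single line yields the whole exposure history.
exposure = get_equipment_exposure("DHT-001234")
```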
Training
Dataiku is a platform that is simple enough for a user to get started on their own. The complexity starts when learning how to leverage Dataiku to access Schlumberger data, and time series data in particular. To train our users effectively, the focus had to be on ways to leverage Dataiku to access Schlumberger data, based on use cases relevant to the technical experts. To that effect, custom training manuals were developed:
- Predicting the chances of success of a drilling run, and identifying which controllable parameters would improve the chances of success
- Accessing the historical time series to identify the operating environments of the equipment
- Accessing drilling time series data to identify similarities and differences between drilling operations
- Accessing historical time series & tool failure information to build failure predictive models.
The cumulative views on the manuals exceed 5,000. The yield of instructor-led & virtual classes is on average 42% (the fraction of onboarded users still using Dataiku 6 months after the training), where virtual classes had a 50% yield, and instructor-led classes 30%.
Posted by Valerian
INSEEC U. - Dataiku as a Leading User-Friendly Data Science Platform for MBA
Students
Name: Linda ATTARI
Title: Director of MSc 1 Data Management and MSc 2 Data Analytics, INSEEC U. Campus Lyon; CEO, Attari Consulting
Country: France
Organization: INSEEC U.
Description: INSEEC U. is a private institution of higher education and multidisciplinary research in Management, Engineering Sciences, Communication & Digital and Political Sciences. With locations in Paris, Lyon, Bordeaux and Chambéry-Savoie, INSEEC U. trains 25,000 students and 5,000 executives each year in classroom and distance learning, from Bachelor's to DBA. The question of processing and analyzing data is becoming a major issue for companies. How can data be exploited so that it supports companies in their strategic choices? This is the objective of the MSc 1 Data Management. This training allows students to acquire the fundamentals of data marketing, big data, data mining, and data processing. An introduction to AI - its issues, challenges, and ethics - is provided, and the specific lens of AI applied to marketing is taught with regard to data modeling and prediction. The objective of the MSc 2 Data Analytics is to provide technical expertise, centered on 4 major axes:
- Understanding consumer behavior through the optimization of the customer experience, driven by the exploitation of unstructured data (photos, blogs, articles, comments)
- Improving decision making through online data analysis, predictive analytics, and machine learning
- Processing and improving the quality of data, as well as adding value to it
- Conducting and managing big data projects
Awards Categories: Excellence in Teaching
Challenge: In the digital
age, the deluge of data is creating new economic opportunities for companies, and our students must therefore be prepared for this in our master's programs specialized in data analytics. The ability to analyze massive data by training our students in market tools represents a significant competitive advantage: from the collection of heterogeneous data to its analysis and visualization in real time. We needed to extract the most relevant online data for the business, to identify the right information at the right time and place, so as to improve decision making and optimize organizational performance. We had to choose appropriate tools to understand and capitalize on this new reality: predictive analytics and data intelligence. We also had to assess the value of datasets and evaluate the evolution of the data market - from the collection process to cleaning, valorization, and interpretation. The questions that arose were: which evangelization tools exist on the market? How can the new ecosystem be understood, and how can it best be explained? Traditional key performance indicators (KPIs) become obsolete as soon as they are defined, due to the agility of big data - therefore, how can newer and more relevant indicators, such as Knowledge Value Added (KVA), be valued? We needed to build new skills within our MSc Data Analytics by training future executives to become operational quickly: we did not have technical solutions, so we turned to the Dataiku platform in 2016 for students to practice with real datasets and be supported in the decision-making process.
Solution: As part of
the Academic program offering, Dataiku licenses have been provided free of charge to students and teachers. The benefits are multiple: students can download Dataiku directly from the website and activate their license when they first log in to the interface. Different deployment options are provided (Mac, PC with VM, Amazon, or Azure). Teachers and students have access to the Dataiku e-learning website, which contains all the resources to quickly onboard onto the platform. Multiple solutions have been implemented with Dataiku:
Data Wrangling: Dataiku offers interactive data cleansing and enrichment. The user can easily access more than 80 visual processors for code-free wrangling. Contextual transformations are automatically suggested, and it is possible to suggest new ones, as well as to perform mass actions on the data.
Machine Learning: The platform offers guided Machine Learning, enabling users to clean the data, create new features, and build a model in a unified environment.
Data Mining: Dataiku provides visual insights thanks to a user-friendly interface. Using drag-and-drop technology, users can easily create views for data exploration. The 25 built-in ranking formats make it possible to understand the data at a glance.
Data Visualization: Users can quickly create histograms, maps, heatmaps, box plots, etc. Visualizations can be set up very easily, and the data can be explored using an intuitive drag-and-drop system.
Dataiku is suitable for teamwork and knowledge sharing, with different features that facilitate collaboration. It is possible to add documentation and comments on each object, along with "To do" lists that facilitate data project planning and delivery.
Project example: MSc 1 Data Analytics & Marketing Manager – students from the 2020-2021 course: Manon Proton, in collaboration with Jérémy Kodaday and Johanna Tournadre: the Adidas project, implemented Nov 03, 2020. An example of data exploration: discovery of the data and its display in the software interface. The platform is intuitive and easy to use. The editor's interface is fluid and well-designed, which enhances the user experience. Moreover, the user easily understands the organization of the tools. It is possible to work in groups and in remote mode, and to use the software on Windows or Mac operating systems.
Impact: My course is dedicated to the big data provider market and to benchmarking technical and functional solutions. The students had to apply the testing methodology seen in class by following all the steps of data preparation: import the data, discover it, learn how to organize it, clean it, and enrich it in order to perform their analysis. In addition, they benchmarked different solutions by working on the functional and technical characteristics of the platform. For the functional characteristics, students had to find out whether it was possible to achieve quality data preparation, build relevant visualizations, and ensure traceability of the data, among other things. Regarding the technical specifications, they had to check the import and export formats, the different possible types of external sources, the various statistical representations, the recognition of a variety of data formats, the volume of data accepted, and the UX. They were also tasked with setting up a competitive mapping, and finally with exploring the economic model of the solution in order to make a recommendation. Dataiku stood out for its adaptability to different operating systems, its recognition and quality of data, its easy handling, and its ancillary software. Dataiku is a leader in terms of completeness of vision, execution, and capability - so the platform is a reference model in the algorithmic fields that can assist employees both in marketing and in the prediction of events. The variety of possible applications is as follows:
- Evangelization Solution
- Connectivity
- Cleaning and Enrichment
- Machine Learning
- Data Mining
- Data Visualization
- Workflow
- Real-time Scoring
- Collaboration
See: INSEEC Campus Lyon MSc 2 Data Analytics student report – Johanna Tournadre, Jérémy Kodaday, Manon Proton. Extract of some specifications studied: volume, recognition, representation, and data quality.
Posted by L_Attari
Pr. Vazacopoulos (Stevens Institute of Technology) - Upskilling Students From
All Levels with Dataiku
Name: Alkividis Vazacopoulos
Title: Professor
Country: United States
Organization: Stevens Institute of Technology
Description: Stevens Institute of Technology is a premier, private research university situated in Hoboken, N.J., overlooking the Manhattan skyline. Founded in 1870, technological innovation has been the hallmark and legacy of Stevens' education and research programs for more than 140 years. Within the university's three schools and one college, more than 6,100 undergraduate and graduate students collaborate with more than 350 faculty members in an interdisciplinary, student-centric, entrepreneurial environment to advance the frontiers of science and leverage technology to confront global challenges.
Awards Categories: Excellence in Teaching
Challenge: We've come across four main challenges on which Dataiku
helped:
1. Companies want to hire students who can attest that they know how to use Dataiku. This challenge is resolved with the Dataiku Academy and its certified Learning Path, and we've put together a special non-credit course that students can take in order to learn Dataiku.
2. In our Business Analytics program for MBAs and Executive MBAs, the students do not have programming skills (R or Python). Thanks to Dataiku's visual interface, we have integrated Dataiku into the BIA 600 and 610 courses for students to learn about Machine Learning and practice with several examples.
3. Grading Python code is very tedious. Dataiku makes it simpler to track and assess progress in the Machine Learning course.
4. Many times, we want to combine visual tools with Python code. This is easily handled in Dataiku, so students can leverage the most relevant environment for each challenge.
Solution: Dataiku has helped in many respects:
- Collaboration, which also helps students learn from each other.
- AutoML has helped a lot, especially by giving students the ability to find out whether a specific data set can lead to relevant results.
- Students can improve their skills much faster thanks to the many different technologies available in Dataiku.
- Students can combine Python code with visual recipes to select the most efficient route for each step in their data workflow.
- Merging and cleaning data sets is very important for us, and is made easy in Dataiku with just a few clicks.
For all those reasons, we are starting to use Dataiku in my industry capstone course!
Impact: Our students are currently using it for the Vaccination Analytics summer project. Students from the Fall 2020 classes I taught using Dataiku were hired by a major pharmaceutical company. Helping us teach the next generation of best-in-class data talent! Several projects have been completed using Dataiku. Undergraduate students were able to complete more advanced projects, such as sentiment analysis, and upskill with Dataiku.
Posted by avazacopoulos
Schlumberger HR - Talent Acquisition Enablement with Machine Learning
Team members: Modhar Khan, Head of People Analytics; Richard De Moucheron, Director Total Talent Management; Wesley Noah, Global Compliance Managing Counsel Operations; Sejal Sagar Mehta, Application Engineer; Sudeep Goswami, HR Applications Manager; Ryan Stewart, Global Talent Acquisition Planning Manager; Sonia Badilla, Talent Acquisition Manager Western Hem; Philip Irele Evbomoen, Talent Acquisition Manager Eastern Hem; Beth Kremer, North America Recruiting Manager; Zhi Chi, Data Engineer HRIT; Simon Spero (Dataiku), Senior Enterprise Customer Success Manager
Country: United States
Organization: Schlumberger
Description: Schlumberger is a technology company that partners with customers to access energy. Our people, representing over 160 nationalities, are providing leading digital solutions and deploying innovative technologies to enable performance and sustainability for the global energy industry. With expertise in more than 120 countries, we collaborate to create technology that unlocks access to energy for the benefit of all.
Awards Categories: Responsible AI, Value at Scale
Challenge: Every year, more than 500k
candidates apply to Schlumberger across the globe. With our PeopleFirst Strategy, we made a commitment to improving Diversity & Inclusion in everything we do as a company. Our Talent Acquisition team had to stretch investment and resources to vet these candidates, match them to business demand, and do all of that efficiently with the utmost compliance. The challenge in using AI & ML was to ensure that it would not have any negative impact on the candidates, and to continuously monitor the models so that they can be vetted and improved in case they generate bias against any class. After vetting many ready-made solutions, we found that they cover neither the complexities of 80+ countries nor the number of profiles we hire for.
Solution:
Complex data engineering: Making the data ready for exploration was a complex process, as it involved many internal and external data sources, as well as numerous engineering steps and feature generations. With Dataiku, we were able to do that at scale, quickly and with quality. See the example of a project showing Dataiku's ability to handle complexity at scale.
Ensemble modeling: From advanced embedding models for text and feature extraction to a probabilistic predictive workflow, Dataiku was able to handle the customizations needed in our ML workflows seamlessly.
Model deployment: The API deployer proved to be an efficient and cost-effective feature, without requiring additional infrastructure in the pipeline.
Collaboration and adoption: Recruiters were able to interact with the predictions and provide feedback in a truly collaborative manner.
Impact: Pilot results (Q2-Q3 2021): The ensemble model developed was
able to rank 10,000 applicants with 82% agreement on the output by a committee
of recruiters and managers. The API deployed and connected to the candidate
processing system is now in the test phase, and we look to deploy it in select
countries by mid-year. We estimate that this will reduce the time of processing
candidates by more than 80%, while providing a better experience to applicants
with timely feedback. This will also support agility in responding to critical
business needs.
Posted by modhar
Australia Post - Leveraging ML-based Forecasting To Optimize Capacity Planning
at Processing Facilities in a Large-scale Logistics Network
Team members: James Walter, Senior Data Scientist; Yohan Ko, Senior Data Engineer; Btou Zhang, Network Operations Lead; Duc Nguyen, Shift Production Manager; Normy Chamoun, Head of Processing NSW/ACT; Sheral Rifat, Data Science Manager; Phil Chan, Data Engineering Manager; David Barrett, Facility Manager; Boris Savkovic, Data Science Manager
Country: Australia
Organization: Australia Post
Description: Australia Post is a government business enterprise that provides postal services in Australia. We are also Australia's leading logistics and integrated services business. Last year, we processed 2.6 billion items, delivered to 12.4 million delivery points across the nation, and continued to provide essential government and financial services via the country's largest retail network.
Awards Categories: Value at Scale, Partner Acceleration
Business Challenge:
The global pandemic has accelerated e-commerce growth, with more households shopping online than ever before. Whilst Australia Post has a long and proud history to lean on, we continue to face challenges from ever-increasing parcel volumes and a great digital disruption that is shaking up the wider logistics industry. This requires us to innovate and transform. A key daily activity at facilities within our logistics network is shift production managers making daily resource/staffing planning decisions that seek to ensure we process parcel demand in a timely manner whilst controlling for cost. Currently, these decisions are being made based on limited, but best-available, information. Too few staffing hours can result in sub-optimal throughput and parcel delays, whilst too many staffing hours can unnecessarily increase labor spend. To address this pain point, our Data Science team developed a shift volume forecasting algorithm in Dataiku. The model provides facility operators with daily shift volume forecasts and translates this information into staffing requirements. The algorithm was trialed in partnership with one of the biggest processing facilities in the Southern Hemisphere, Sydney Parcel Facility, and is now used to inform daily planning activities. Feedback from shift production managers is that "based on the volume prediction, we were comfortable with not running overtime the following morning. This paid off". Thus the model is empowering managers to confidently make decisions regarding the need for overtime. The approach is changing the way that facility operators make decisions, resulting in significant operational dollar value savings (~15 million Australian Dollars [AUD] p.a. once rolled out nationally).
Business Solution: We chose Dataiku as we were looking for an end-to-end data science platform that simplified and automated many aspects of the data science and data engineering workflow, allowing our team to deliver results faster and with fewer frustrations. The team made use of Dataiku from initial exploratory data analysis (EDA), Python coding, and use of custom modules, through to production deployment and BAU operation, including model performance monitoring, data monitoring, and resource monitoring. The end-to-end MLOps process in Dataiku is streamlined, integrated, and easy to use. Specifically, Dataiku has enabled us to easily manage the following key aspects of the MLOps workflow:
- Complex dependencies in terms of libraries and virtual environments, by abstracting many of the complexities that one usually faces when working with dockerization or virtual environments.
- Scalability of our models, by providing a streamlined way to leverage a Kubernetes cluster on GCP to attain scale and to enable further scale-out of the model to future facilities.
- Version control functionality in Dataiku Data Science Studio (DSS), which enables lineage tracking and model versioning across the full lifecycle.
- Collaboration functionality whereby data scientists, data engineers, and developers could co-develop and then serve the models seamlessly to business users.
- ETL pipeline development, leveraging both time-based and event-driven scenario execution, to process the real-time data feeding into our models.
We rapidly developed an ML model (a random forest model in Dataiku, leveraging the Visual ML capability) to forecast shift volumes and labor/staffing requirements at each facility, as sketched below. Model deployment to production in Dataiku was speedy and required minimal resources from data engineering, as many of the production processes are automated and handled by the Dataiku platform. The metrics, checks, and testing capabilities have enabled us to add quality assurance to our models.
Business Area: Supply chain / Supplier Management / Service Delivery
Use Case Stage: In Production
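As a rough illustration of the forecasting idea, a hand-coded equivalent might look like the sketch below; in the actual project the model was built with Dataiku's Visual ML rather than custom code, and the file, feature names, and throughput figure are invented for the sketch.

```python
# Hypothetical sketch: a random forest forecasting next-shift parcel volumes
# from calendar and lagged-volume features. All names/figures are illustrative.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

history = pd.read_csv("shift_volumes.csv", parse_dates=["date"])  # invented extract

# Calendar effects plus recent demand carry most of the short-term signal.
history["day_of_week"] = history["date"].dt.dayofweek
history["volume_lag_1"] = history["volume"].shift(1)   # previous shift
history["volume_lag_7"] = history["volume"].shift(7)   # same shift last week
history = history.dropna()

features = ["day_of_week", "volume_lag_1", "volume_lag_7"]
model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(history[features], history["volume"])

# Forecast the next shift and translate volume into a staffing requirement,
# assuming an illustrative throughput of 180 parcels per labor-hour.
next_shift = history[features].iloc[[-1]]
forecast = model.predict(next_shift)[0]
print(f"Forecast volume: {forecast:,.0f} parcels "
      f"-> {forecast / 180:,.0f} labor-hours at 180 parcels/labor-hour")
```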
Value Generated: This project is highly innovative, novel, and transformative within Australia Post, as it brings real-time forecasts to users at our facilities, enabling a level of real-time, data-driven decision-making that has not been possible to date. In short, operational decisions can now be made in a timely manner, as required by the time-constrained daily cycle of our network operation teams. Most importantly, these forecasts are relevant, actionable, accurate, and highly automated through the use of the Dataiku platform. From a scale and technical point of view, the forecast generation process is now streamlined end-to-end in Dataiku, and can easily be scaled out to more facilities nationally. Specific business metrics of success include:
- The data-driven forecasting approach is changing the way that facility operators make decisions, resulting in significant operational dollar value savings (~15 million AUD p.a. once rolled out nationally) and significant uplifts in overall parcel throughput within our network. The dollar value savings result from reduced labor costs at facilities (reduced spending on on-demand agency staff), the uplift in service quality, increased throughput at facilities, and freeing up shift managers' time.
- The process is now also fully automated, whereas previously human operators would laboriously have to collate data from multiple sources (including Excel spreadsheets), which was costly in human resources and lacked automated quality assurance.
- The model is 25% more accurate at shift volume forecasting than traditional human approaches.
- Uplift in repeatability and consistency of labor forecasting for planning: we now have a consistent standard and process that can be scaled out nationally in a repeatable manner.
Value Brought by Dataiku: The specific value
brought by the Dataiku platform and the Dataiku team includes:
- The ability to develop, deploy, and operationalize ML models at speed and at scale, within a controlled and governed end-to-end data science workflow.
- Dataiku is a single, end-to-end integrated data science platform, from development to deployment to BAU operation. This results in a streamlined and consistent process across the full spectrum of ML and data science work. Specifically: the full data science and data engineering lifecycle is native to DSS; ModelOps and MLOps frameworks are native to DSS, including versioning and dependency management (two key challenges when it comes to production-grade deployments).
- The option to leverage advanced models easily and to deploy at scale (using Kubernetes), subject to best-practice MLOps as dictated/governed by the Dataiku platform. In the future, we are also looking to leverage Apache Spark as an execution engine within Dataiku as we continue to scale up and roll out the solution nationally.
- The ability to leverage Kubernetes to train models at scale, and to easily deploy many models in a production environment via elastic compute options. Dataiku acts as a seamless abstraction layer over the complexity of the underlying big data processing technologies.
- Dataiku enabled us to test many models in parallel, including champion-challenger frameworks, which accelerated the model development and field testing cycles.
- The Dataiku Academy and the Australian and global Dataiku teams provided excellent support to uplift our team, and supported our end-to-end journey from onboarding the platform all the way to our first production deployments and operations, and beyond.
Value Type: Increase revenue; Reduce cost; Save time; Increase trust
Value Range: Dozens of millions of $
Posted by boriss
bp T&S - Re-Imagining Fundamental Analytics in bp Trading & Shipping
Team members: David Maerz, SVP Trading Analytics & Insight; Robert Doubble, VP Trading Data Analytics; Carl Hale, VP Programme Management; Dan Parisian, VP Fundamentals Modelling & Infrastructure - for and on behalf of the Trading Analytics & Insight and I&E dTA organizations in bp Trading & Shipping
Country: United Kingdom
Organization: bp Trading & Shipping
Description: T&S is the energy and commodity trading arm of bp and is one of the world's leading energy, marketing, operations, and trading organizations. We buy, sell, and move energy across the globe to provide integrated solutions to over 12,000 customers in 140 countries. With upwards of 300 ships on the water at any given moment for bp, T&S moves around 240 million tonnes of oil, gas, and refined products every year.
Awards Categories: Most Impactful Transformation Story
Business
Challenge: Immediately following the appointment of Bernard Looney as the new bp
CEO in 2020, the company announced an ambitious net zero low carbon agenda and
the transition from an international oil company to an integrated energy
company. Trading & Shipping (T&S), the energy and commodities trading division
within bp, is a key enabler of this strategic intent. With over 12,000 customers
worldwide, and a business spanning crude oil, refined products, natural gas,
power, LNG, biofuels, and low carbon, T&S helps keep the planet’s energy moving.
Its commercial success is underpinned by a world-class analytics capability,
comprising a global team of 160+ analysts in Europe, the Americas, and Asia.
They deliver actionable insights, advanced pricing models, and valuations of
complex structured deals that inform the deployment of risk by the commercial
teams. Possessing strong business acumen, seasoned market knowledge, and deep
technical know-how, they are the backbone of our ‘analytics edge’. Historically
many analysts have worked in vertically integrated silos, sourcing, cleaning,
exploring data, building models, and producing outputs largely independently of
one another. This frequently led to parochial, duplicative solutions that were
sometimes frail and often highly manual. Opportunities to collaborate, share
best practices, or to seek peer reviews were limited, and the development of
modular re-usable solutions to common business problems was a rarity. With bp
T&S mandated to grow revenue by expanding into new countries, entering new
markets, and scaling up existing business lines, demand for analytics will only
increase. Successfully navigating the energy transition will require an agile,
flexible analytics capability, one that our legacy working practices and Excel
tooling cannot provide. Effecting change required a disruptive paradigm shift in our ways of working.
Business Solution: bp's new strategic direction provided
a powerful catalyst for a radical rethink of our analytics working practices and
organizational design. In 2021 we re-organized analytics along technical
discipline lines, embraced Agile, spun up four multidisciplinary Agile Squads,
and agreed that Dataiku would be the cornerstone of our modern strategic
analytics tooling. In addition, we created a specialist fundamentals modeling
discipline, one that would spearhead our transformation activity. Dataiku was a
natural choice for an enterprise AI platform for the T&S analytics organization.
With its concept of ‘Clickers’ and ‘Coders’, it was well matched to a population
equipped with a broad set of technical skills and differing levels of
proficiency, ranging from Excel novices to deep Python experts. Dataiku’s
emphasis on the collaborative development of end-to-end model workflows also
resonated powerfully with our goal of empowering cross-discipline squads to
reimagine our next generation of predictive models. In 2021, Agile squads in
London, Singapore, Houston, and Chicago kicked off our analytics transformation
journey. Informed by a small group of enthusiastic product owners, the squads
set about reimagining complex, high-value, and business-critical Excel models in
Dataiku. Favouring progress over perfection, our goal was to continuously accrue
benefits by engineering intuitive model workflows that benefit from superior
automation and increased robustness. We now have a growing number of
business-critical models in Dataiku, executing intraday as new market data
arrives without human intervention. Linked Excel Workbooks have been replaced by
simplified workflows comprising both visual recipes and bespoke Python code,
organized in logical Flow Zone groupings that afford standardization through
design modularity and bespoke, re-usable Plugins. What’s more, model outputs are
disseminated to traders via highly interactive self-service dashboards. As we
transform, we routinely engage with Dataiku to share feedback, seek technical
reviews of our design thinking, and learn how their customers are tackling
similar problems.   Value Generated: Now 18 months into our transformation
journey, we have a growing number of business-critical models executing daily on
Dataiku with a high degree of automation. Traders can interrogate model outputs
using interactive dashboards and experiment with custom market scenarios that
would be impracticable in Excel. By embracing Agile and fine-tuning it to our
business context, we have been able to continuously accrue benefits in
double-quick time. Furthermore, by eliminating manual processes for loading and
preparing data, utilizing job scheduling, and embracing superior automation, we
free up analysts from clerical tasks to instead focus on highly dynamic energy
and commodity markets. Through a process of continuous learning, we have
identified design patterns for common recurring tasks that are ripe for
modularization, either in the form of reusable Dataiku plugins, or by creating
bespoke Python libraries. By building out a suite of shared components, our
transformation trajectory is accelerating, with new models deployed more
quickly. As our momentum builds, so does our business impact across the trading
floors, as we transform legacy models at pace. Our work has received high praise
from senior T&S leadership, citing its ‘game-changing’ nature, as well as
recognition from Franziska Bell, SVP Digital Technology. In the case of low
carbon analytics, starting from greenfield, we have built out an entire suite of
analytics on Dataiku which has very quickly delivered material value. A key
enabler of our success is a strong partnership with the central IT team. The
provision of a robust multi-tenanted platform with 150+ users is key to
building confidence in Dataiku and critical to embedding our new ways of
working. Arguably, our trailblazing analytics transformation is demonstrating to
both T&S and the wider organization how new digital investments can advance bp’s
commercial strategy.   Value Brought by Dataiku: Over the course of our 18-month
transformation journey, we have retired 140 Excel Workbooks, eliminating 500
spreadsheet tabs in the process. By replacing onerous clerical processes with
superior automation, we have saved 174 analyst hours per year. Analysts now have
more time to focus on high-value analytics. Models now run more quickly and more
often, allowing us to quickly disseminate actionable insights to the commercial
teams in response to market-moving events. Scenario analysis allows front-line
traders to quickly understand how changes to model parameters impact the
numerical output, helping them to build greater trade conviction and to deploy
risk with increased confidence. Agile working practices allow us to accrue
benefits rateably, unlike a waterfall-based approach. Analysts reap the benefits
of manual work being taken out of the system, while traders gain from having
access to more powerful tools to understand markets. Duplicative, siloed model
development processes have been superseded by collaborative, cross-discipline
working practices, and a centralized repository of models and libraries of
reusable components. Company knowledge is institutionalized, and key person risk
is reduced. With our oil, natural gas, power, and low carbon models now in a
single central location we can seamlessly construct cross-commodity views and
generate new commercial insights that were impossible while working in silos.
Powerful machine learning algorithms and Dataiku’s ability to handle large data
sets provide the foundation for building our next generation of advanced
predictive models, something inconceivable in Excel. Our team is enthusiastic
and energized by what can be delivered through our new ways of working and by
embedding Dataiku at the heart of what we do. Empowered and encouraged, the team
will continue to employ Dataiku in innovative and novel ways to underpin the
future commercial success of bp T&S.
Posted by BobDoubble
Unilever - Building Self-service NLP for Analysts Worldwide
Names: Linda Hoeberigs, Head of Data Science & AI, PDC Lab; Ash Tapia, Data
Partnerships & Tools Stack Manager Country: United
Kingdom Organization: Unilever Description: Every day 2.5 billion people use a
Unilever product to look good, feel good or get more out of life. Our purpose is
to make sustainable living commonplace. We are home to some of the UK’s best-
known brands like Persil, Dove and Marmite, plus some others that are on their
way to becoming household favourites like Seventh Generation and Grom. We have
always been at the front of media revolutions whether that be the 1st print
advertisements in the 1890s or in 1955 when we became the 1st company to
advertise on British TV screens. Experimentation and bravery drive us and have
helped us become one of the UK’s most successful consumer goods companies. Awards
Categories: AI Democratization & Inclusivity Challenge: Our Unilever People Data
Centre (PDC) teams across the globe deal with vast amounts of unstructured text
data to gain insight into our customers, how they engage with our brands and
products, and what are the needs we are yet to tap into. The industry is moving
at a rapid pace which consequently requires a rapid generation of insights to
stay on top of the latest trends. The sheer amount of data and the skills
required to analyse it efficiently exacerbate this problem. The answers our
marketeers, product research and development, and supply chain specialists need
also require analytics approaches tailored to the business. Analyzing text data is a
complex task and often requires understanding complex language models and
Natural Language Processing techniques, which most of our marketeers do not
have. Their skills are focused on data analysis, so we had to find a way to
synthesize our text data into something that can be analyzed by our PDC
analysts, without compromising on our technical data science approach. Building
on this, the solution had to be flexible and able to work in multiple languages
with the aim to supply all analysts a tool that would be accessible in their
market. Solution: This solution was born via the democratization of a project
flow made up of several code recipes. As with most data science work, it is
often unknown how applicable and reusable a piece of code is until it is put
into practice. In this case, we were able to take these code recipes written by
our data scientists and encapsulate them into a plugin by collaborating with
our data engineers. Using the ability to create custom plugins, we developed a
plugin called Language Analyser which is readily available for use by anyone in
the PDC across the globe. It has allowed hundreds of analysts to be able to
apply Natural Language Processing (NLP), increasing the efficiency, quality, and
granularity of their work. What’s more, the ability to compare two text datasets
was implemented by allowing multiple datasets as input to a single
plugin, thus increasing the range of applications of this tool. To solve the
challenge of flexibility we employed a custom front-end, using HTML, CSS, and
JavaScript. By creating a user-friendly interface we were able to
break the barrier of technical terminology and algorithms, replacing them with
analyst-friendly terms. For an analyst to be able to use this plugin, they merely
needed to supply a dataset and select their pre-processing steps such as
removing spammy authors from social media data, removing unnecessary stop words,
and cleaning the data of noise. From this they can then choose which NLP
techniques to apply to their data, including identifying general grammatical
entities, emojis, and Unilever relevant terms such as ingredients and
fragrances. Building from this, they can choose to enrich their analysis with
pre-tagged sentiment, adding a layer of depth to generated insights such as
which emojis are used in a positive context when discussing vegan foods.
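To make this concrete, here is a minimal sketch of how such a plugin recipe can be structured in Dataiku; the role names, configuration key, text column, and the toy stop-word list are illustrative assumptions, not the actual Language Analyser implementation.

    import dataiku
    from dataiku.customrecipe import (get_input_names_for_role,
                                      get_output_names_for_role,
                                      get_recipe_config)

    # Read the dataset the analyst supplied to the plugin's input role
    input_name = get_input_names_for_role('input_dataset')[0]
    df = dataiku.Dataset(input_name).get_dataframe()

    # Apply only the pre-processing steps the analyst ticked in the plugin form
    config = get_recipe_config()
    if config.get('remove_stopwords', False):
        stopwords = {'the', 'a', 'an', 'and', 'or', 'of', 'to'}  # toy list
        df['text'] = df['text'].apply(
            lambda t: ' '.join(w for w in str(t).split()
                               if w.lower() not in stopwords))

    # Write the cleaned text back for downstream NLP and visualization
    output_name = get_output_names_for_role('output_dataset')[0]
    dataiku.Dataset(output_name).write_with_schema(df)

Our data scientists are often focused on the accuracy and the processes behind the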
scenes to turn unstructured data into something more structured, our analysts on
the other hand are focused on finding insights and presenting these back to
their stakeholders. Our solution makes use of static insights within Dataiku to
create a way of visualizing the data returned from the pre-processing and data
science processes. Being able to leverage JavaScript libraries like D3
allowed us to collaborate with a dedicated design team to present the data in a
way that aided information presentation and insight discovery. Impact: The tool
has been received extremely well by analysts and other data scientists. It sees
strong usage every day across a wide span of research projects. The outputs
serve both as inspiration for further analyses such as theme detection, and as
discovery of language intricacies. One of the key reasons this solution was
implemented in a plugin was due to how it gave a single interface to multiple
common options using NLP. This resulted in analysts being able to use the
Language Analyser for data cleaning, tagging Unilever entities, or completing a
full comparative language analysis on two datasets. It allows the analysts to
see their text data in a new light in a matter of minutes. It goes without
saying this is now our analysts’ go-to text analysis tool. In addition, this
tool has been up and running for more than a year and has changed the way
Unilever informs marketing strategy. For Hellmann’s, the team found out which
foods over-index for lunch compared to other meal-moments, and thus was able to
generate more relatable meal moments in their campaigns. It has also informed
Comfort on which words to use in tone-of-voice strategy by finding out which
words over-index for millennials. The current team continues to improve the tool
by integrating it with other existing capabilities, for example topics and themes.
Which adjectives and adverbs describe each theme the best? What beauty
ingredients are most common for each topic? As we uncover more insights, our
questions grow more advanced, and this requires a forward-thinking strategy. In
addition, as we expand globally, questions like this are starting to pour in all
the way from Mexico to Japan – we have continuously worked to improve our
language coverage, with the tool going from supporting 12 languages at the start
to 30 languages currently. We design and develop with the analyst in mind, and
market coverage has been a significant milestone. The Language Analyser has
allowed data scientists, data engineers, and visualization experts to
collaborate in a way that was previously siloed. It has paved the way for future
projects with regard to how we think about which data science processes are
democratized into plugins for our global analysts to use. At the end of the day,
the Language Analyser has fundamentally changed how we view text analysis and
visualization – it has opened the business to new ideas and possibilities across
the globe.
Posted by ash
Fanalists - Bringing Data Marketing to Sports & Entertainment Organizations of
All Sizes
Name: Thierry de Reus Title: Head of Tech &
Data Country: Netherlands Organization: Fanalists Description: Nowadays, personalized
experiences are the norm. Personal attention, or the lack of it, can make or break
your business. Fanalists supports organizations in sports, media and
entertainment to get to know their fans and to get a better grasp of their
business. Fanalists breaks data silos, centralizes and enriches data, creates
rich fan profiles and makes them available to analyze and communicate with.
Fanalists works for event and festival organizers, sports federations like the
Dutch hockey federation (KNHB), media companies such as The Walt Disney Company,
and sports organizations like cycling team Team Jumbo-Visma and football club
Anderlecht. Awards Categories: Organizational Transformation AI Democratization &
Inclusivity Value at Scale Challenge: The world of entertainment and sports is
inspiring. Talented event organizers create the most creative live concepts and
popular sports clubs create stimulating experiences for their fans. But really
getting a grip on the actual individual fan? Nah. Data is generally not their
comfort zone. Let alone understanding data science concepts like predictive
modeling to understand their fans and customers even better. It turned out to be
hard to truly get a grasp on the power of data, customer segmentation and
personalization by explaining the theory behind it. Something was missing.
Something that makes it transparent, visible and usable for creative marketers.
Without that missing link, organizations would never outgrow bulk marketing
campaigns and generic strategies. And that would be a shame: the interplay
between creativity and data results in such a powerful combination. Moreover,
Fanalists wants to support organizations in sports and entertainment in a
scalable manner. Creating ad hoc analyses and customized data models for more
than a handful of organizations is not feasible. On the other hand, there will
always be elements fully tailored to the needs of the individual organization.
How can we create a clear and scalable workflow for our own data analysts and
experts? How can we achieve that, while making it understandable for the
talented marketers and campaign managers on the other end of the table? How can
we use Dataiku to make great use of the best features of the platform? And how
can we split project-specific details and configuration from the models in
Dataiku so our data flows won't grow into non-transferable messes over time?
Solution: We created a framework that ensures that our in-house data analysts
make use of a data flow that is as standardized as possible, while enabling the
individual data analyst to create changes or data models specifically for an
individual project. In short: we created an extensive ETL flow, divided the flow
into multiple phases, and started managing project-specific settings and
definitions externally as much as possible. And we launched an interface that
brings transparency to the marketers and specialists at the other end of the
table. That interface is called the Fanalists Terminal, where everyone working
for the project, both Fanalists team members and our clients, can log in and see
what is in the data model. Phase 1: Integration Layer This phase consists of
bringing in the data and connecting to external sources. Generally this phase
contains a mix of sources, such as SQL databases, SFTP files, and datasets
retrieved by means of an API. At the end of this phase, all data is combined and
standardized according to the conventions we defined within the framework. All
columns in the available datasets are shown in the Fanalists Terminal as
so-called "data fields" to enable everyone to get a grasp of what actually is in
the database. Phase 2: Context and Settings Most data in entertainment and
sports does not speak for itself. A lot of information and context is in the
heads of humans. Keeping it that way does not really work for data models.
That's why everyone involved in the project can use the Fanalists Terminal to
add information and context to the available datasets. Which artists were
performing at the music festival last year? What type of sports event did we
sell for? What are the definitions of our marketing permissions? All this
information is managed in the Fanalists Terminal, and applied later on in the
data flow. This way, every project can be unique without making concessions to
the standard flow. Phase 3: Grouping and Modeling In this phase all data is
combined to create 360-profiles. After this phase, everyone involved in the
project can analyze and act on rich fan profiles. Moreover, within this phase
our data analysts can add prediction and clustering models. Usually a project
starts without advanced predictive models - but with a growing maturity of the
organization, models could be added when the project is ready for it. At
Fanalists, we defined multiple business models and have multiple predictive
models on the shelf that we can implement to match specific business models. For
instance, when our client has a membership model, we can add churn prediction to
the mix. But when we come across a client who is selling tickets for a yearly
festival, we can add a model predicting repeat customers to the stack. Phase 4:
Segmentation Based on the created 360-profiles, everyone involved in the project
can use the Fanalists Terminal to define fan segments. Again, all so-called
"data fields" are available to use and configure segments with. Because of phase
2 and 3, the existing data fields are enriched with additional information and
data points related to the predictive or clustering models.
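As a rough illustration of how externally managed definitions can drive this phase, the sketch below applies hypothetical segment rules to a 360-profile table; the file, field names, and rule format are invented for illustration and are not the actual Fanalists Terminal schema.

    import pandas as pd

    profiles = pd.read_csv("fan_profiles.csv")  # hypothetical 360-profile export

    # Segment rules as a marketer might configure them in the Terminal
    segment_rules = [
        {"field": "tickets_last_year", "op": "ge", "value": 3},
        {"field": "churn_probability", "op": "lt", "value": 0.2},
    ]

    # Combine the rules into a single boolean mask over the profiles
    mask = pd.Series(True, index=profiles.index)
    for rule in segment_rules:
        col = profiles[rule["field"]]
        if rule["op"] == "ge":
            mask &= col >= rule["value"]
        elif rule["op"] == "lt":
            mask &= col < rule["value"]

    loyal_fans = profiles[mask]  # ready to sync to an email marketing platform

Impact: With a clear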
data infrastructure, Fanalists is able to support organizations in sports and
entertainment. By making it as transparent and comprehensible as possible,
everything clicks into place, while also giving them the tools to play with
their data. Through this combination, our clients can get the most out of their
data driven marketing strategy. 1. Easily setting up the baseline infrastructure
for data analysis and dashboarding The data infrastructure makes it possible to
both analyze and act on the data of their fans. With an implemented data
infrastructure, including a Fanalists Terminal, marketers and strategists can
capitalize fully on the segments they created by analyzing them through
dashboarding tools, e.g. Qlik Sense and Looker. On the other hand, marketers can use
this information to create personalized marketing campaigns and communication
flows, by syncing this information to marketing platforms like email services.
Using segments based on predictive models is therefore effortless and
comprehensible. And after Fanalists helped a client reach that stage, the fun
begins. 2. Enabling organizations of all sizes to leverage data insights,
without the need to hire specialists. It speaks for itself that analyzing and
understanding their business and fans leads to better decision-making,
marketing efficiency, and eventually increased revenue. Rolling out a data-driven
marketing strategy not only leads to great fan experiences, it also results in
more loyal fans and more valuable customers. Fanalists makes it possible for
organizations without huge budgets and in-house data specialists to implement
this innovative way of working. 3. Reducing efforts (and cost) to take the
plunge toward data-driven marketing The described solution is beneficial for
Fanalists in improving efficiency, reducing the necessary capacity, and
improving quality of delivery in the long term. Since the data flow is as
standardized as possible, new projects are kicked off faster. And because
enrichments, definitions and segmentations are configured by the clients
themselves, there is significantly less back and forth communication.
Essentially it means hitting two targets with one shot: better internal
workflows result in lower costs for the client and therefore lower the risk to
take the plunge. So we can create great personalized fan experiences together.
Posted by thierrydereus
Leidos - Software Analysis Execution Process Improvement and Prediction Program
Team members: Karen Cheng, Principal Investigator, Data Scientist; Ron Keesing,
Division Manager; Mark Clark, Program Manager; Caitlin Burgess, Program Management
Support; Tifani O’Brien, Pilot Project Lead and Concept Initiator; Coleen Davis,
Data Scientist; David Morgenthaler, Data Scientist; Jevon Spivey, Architecture
Administrator Country: United States Organization: Leidos Description: Leidos,
formerly known as Science Applications International Corporation (SAIC), is an
American defense, aviation, information technology (Lockheed Martin IS&GS), and
biomedical research company headquartered in Reston, Virginia, that provides
scientific, engineering, systems integration, and technical services. The Leidos
Innovations Center (LInC) rapidly prototypes and fields solutions in areas such
as Artificial Intelligence/Machine Learning, big data, cyber, surveillance
systems, autonomy, sensors, applied biology, and directed energy. This project
is a Machine Learning and data analytics web-based deployment that analyzes
project execution data for continuous process evaluation and improvement using
the full lifecycle pipeline of Dataiku 1) data preparation, 2) data exploration
and visualization, 3) AutoML machine learning, and 4) web-based user dashboard
deployment. Awards Categories: Organizational Transformation Value at Scale
Excellence in Research Alan Turing Challenge: Background: Software development
teams often don’t have sufficient actionable information and analysis to
reliably forecast effort, or real-time metrics to monitor and assess the
output of software development teams. Our goal in this effort is to use
analytics to improve agile-based software project execution processes by
identifying key drivers of success, and predicting various outcomes. Business
Problem: The Software Development Analytics project creates data-mining
analytical and visualization approaches that Leidos will use to identify and
analyze software best practices. The team will use predictive machine learning
classification approaches that incorporate the identified key performance
indicators to accurately forecast software development success probabilities.
Predictive analytics will learn from historical performance data to predict and
quality-check anticipated level of efforts for successful task completions.
Lastly, the visualizations will be deployed via a web-accessible dashboard to
support ongoing program performance tracking and to make the data-mined
visualizations and predictive analytics accessible to interested parties.
Implementation Challenges: This research analyzes various data produced during
the agile software development process that indicates measurable business
activity impacting the quality and delivery of software code. Efficient data
Extraction, Translation, Loading (ETL), data cleaning, aggregation, and joining
is required to assemble and store the data. Our project plan was to initially
analyze pilot software programs that could scale in the future to support
evaluation of multiple programs. Therefore, an understandable and reproducible
pipeline is ideal. We desire to use state-of-the-art machine learning and
Bayesian analytics to identify the key drivers for successful software
execution, as well as discover pitfalls. We will also identify the best
technical approaches for classification and supervised predictive learning
approaches. This requires extensive data analysis and an iterative model
exploration approach. Lastly, as the insights discovered will also be used for
process monitoring and evaluation, a dashboard will enable our technical
development team to make the results accessible to various stakeholders. This
project involves the full data analysis lifecycle from data wrangling to an
interactive dashboard that showcases the resulting visualizations and analytics
as depicted below. Solution: We employed Dataiku in all phases of our pipeline:
1. Repeatable pipelines and workflow analysis Dataiku greatly facilitates the
organization and visualization of the pipeline workflows. Dataiku’s DSS pipeline
allows us to easily scale the project to evaluate additional software programs
because we are able to quickly identify the single point within the pipeline
that needs modification, without disturbing the common components of the
pipeline. The clean workflow presentation helps our team keep the code more
maintainable and understandable. The sequential and modularized organizational
approach of the pipeline steps supports an easier transition when adding new
developers to the project, as the flow visualization is inherently
self-documenting since the processing steps are more apparent. 2. Data
acquisition and storage Dataiku was used to assemble, store, and “data wrangle”
the various input files. Dataiku’s built-in file system and database solutions
allowed us to quickly access the data and utilize SQL on the resulting datasets,
without requiring us to spend our time on building a data lake. 3. Data
processing Dataiku’s visual recipes supported rapid data transformations in data
joining, column manipulation and data pruning. Dataiku’s ability to combine
pre-packaged analyses with our own customized scripts gave us the significant
flexibility we required to accomplish all of our data transformation needs. 4.
Data visualization and analysis Dataiku’s rapid visualization of the raw and
processed data was invaluable in allowing us to gain a quick understanding of
the data distributions and data integrity. Dataiku greatly facilitates
identification of missing data, invalid data, and outliers, allowing us to have
confidence in the data we are processing. Dataiku’s built-in graphics were
intuitive, allowing us to quickly look at the composition of the data and the
relationships between datasets and enabling us to gain rapid understanding of
the value within the data. 5. Auto-ML Machine Learning We deployed Dataiku’s
Auto-ML approaches to verify performance of our candidate machine learning
classification and predictive models, as well as identify additional candidate
models that we should consider. Dataiku’s metrics evaluation interfaces allowed
us to quickly look at performance trade-offs using multiple industry-standard
metrics, and to identify overfitting conditions when training a model. DSS’s
model Evaluation Recipe allows us to ascertain performance on a given test set.
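For readers curious what this looks like outside the visual interface, the same AutoML loop can be driven through Dataiku's public Python API; a minimal sketch follows, in which the host, API key, project key, dataset, and target names are illustrative assumptions rather than our actual project.

    import dataikuapi

    # Connect to the DSS instance (hypothetical host and API key)
    client = dataikuapi.DSSClient("https://dss.example.com", "MY_API_KEY")
    project = client.get_project("SW_ANALYTICS")

    # Create an AutoML prediction task on a prepared training dataset
    mltask = project.create_prediction_ml_task(
        input_dataset="sprint_history_prepared",
        target_variable="completed_on_time")
    mltask.wait_guess_complete()

    # Train the candidate algorithms and compare an industry-standard metric
    mltask.start_train()
    mltask.wait_train_complete()
    for model_id in mltask.get_trained_models_ids():
        details = mltask.get_trained_model_details(model_id)
        print(model_id, details.get_performance_metrics().get("auc"))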
6. Web-based deployment We took advantage of Dataiku’s ability to integrate
web-based applications into the workflow. We were pleased that Dataiku supported
current leading-edge web-based deployment technologies, thus allowing us to
maintain our entire deployment implementation within the DSS workflow and to
host it from Dataiku’s web application services. 7. Amazon Elastic Container
Service for Kubernetes (EKS) architecture We instantiated Dataiku’s EKS
capabilities which allows us to integrate with AWS security and scale our future
development efforts. Impact: Dataiku had a great impact on numerous aspects of
this project throughout the entire pipeline; the most important ones are
highlighted below. 1. Deployment efficiency Significant time savings were achieved
in the combination, manipulation, and storage of data. We were able to implement
the data processing pipeline in days, as opposed to months. 2. Ability to focus
on our area of expertise in Machine Learning Not having to invest time in
database setup and file system organization allowed us to focus on our core
research interests that will address our machine learning challenges. By taking
advantage of Dataiku’s web deployment capabilities, we saved a significant
amount of time by avoiding the need to set up additional web servers.
Consequently, our team did not require a web application specialist. 3. More
robust organization and maintainability While this benefit can be overlooked,
the impact to an organization can be tremendous. Dataiku provided us with
additional version control, a framework for teamwork contribution, and process
step readability and maintainability. 4. Rapid Machine Learning exploration and
performance assessment Dataiku allowed us to search the algorithmic space and
assess performance efficiently. We were able to consider additional models we might not
have originally considered and were able to rapidly perform model tradeoffs. It
would normally be time-consuming to consider a large number of models, but
Dataiku makes this process efficient, enabling us to look at tradeoffs between
candidate approaches such as neural network versus decision-tree
implementations. The model building process also allows us to fine-tune and
compare hyperparameter settings. 5. Excellence in research category only While
predictive approaches such as neural networks and decision trees are often used
to predict various data influences on the dependent variable, we are
interested in more than just the predictive results. One of our key research
areas in this project involves identifying the key drivers of the dependent
variable, in this case software project implementation planning and
timeliness success. This capability gives us the ability to learn from our
data to guide actionable software process improvement. The other research aspect
of this project is the identification of the best classification and predictive
approaches when predicting performance. Dataiku’s AutoML feature has greatly
helped us to rapidly identify and assess candidate algorithms, explore
hyperparameter settings, and to consider additional algorithms we may not have
thought of. We are also quickly able to retrain models using different
optimization goals. Since we are able to explore the algorithmic space quickly,
we are able to become confident that our final model is optimal for our problem
set. 6. Alan Turing category only In addition to the above, our project
innovations include combining the web deployment pipeline with the overall data
preparation and modeling pipeline. Historically, these project steps are
performed by different teams and require web developer support. The combined
pipeline approach was made possible by the latest version of Dataiku dashboard
capabilities, which include state-of-the-art web development libraries. This
end-to-end pipeline capability is visionary and leading-edge, enabling us to deploy
the latest models in near real-time to our end users of the dashboard.
Posted by chengke
Researcher Frank Romo (University of Michigan) - Mapping Police Fatal Encounters
to Inform Future Policy
Team members: Frank Romo, Master of Urban Planning Researcher; Harley Etienne,
Professor Country: United States Organization: University of Michigan - Independent
Researchers Description: The team consists of Professor Harley Etienne and myself,
Frank Romo. Our research on this topic has been going on for over five years and
has been presented in various formats: presentations, maps, videos, and community
workshops. Harley and I are independent researchers working to support community
safety and change through our academic research and community action. Awards
Categories: Organizational Transformation Excellence in Research Challenge: The
research project focuses on mapping Police Fatal Encounters. Our team cleaned,
mapped, and analyzed thousands of records from various data sets to better
understand the spatial distribution of fatal encounters in the United States
between the years 2015 and 2020. The main challenge we faced was comparing
records across multiple data sets and building a comprehensive dataset from
various partial sources. Using Dataiku, our team was able to combine multiple
datasets and create the first ever comprehensive dataset on this topic.
Solution: Dataiku helped by working with us to establish a workflow that allowed
us to not only create a comprehensive dataset but also perform spatial analysis,
as well as run regressions and statistical tests. We had great support from Lorena
De La Parra (AI Strategist) and other team members to develop our strategy,
testing methods, and to deliver a final dataset that we could use for future
maps and visualizations. Impact: The results of this collaboration allowed
Professor Harley Etienne and myself to submit abstracts to multiple academic
journals. Currently, our research on Race and Policing in America is being
examined and reviewed by multiple academic journals for potential publication.
In addition, the dataset we created during this process was mapped and used in
various community presentations. In fact, our research, maps and analysis have
been highlighted on recent podcasts at MIT Community Innovators Lab and within
the geospatial industry with ESRI. We will continue to build on this great
momentum and continue to use the tools that Dataiku provides to help clean and
refine our data so that it can be presented to the public and help inform future
policy discussions.
Posted by fromo
IME - Building an Emotion Classification System on Videos
Emotion classification system on videos using the Dataiku deep learning for
images plugin.
Posted by mohamed-khamis
Dr. Haug (University of Bern) - Showing Students Industry Data Solutions with
Dataiku
Name: Sigve Haug Title: PD Dr. Country: Switzerland Organization: Data Science Lab,
University of Bern Description: The Data Science Lab at the University of Bern is
a cross-faculty initiative for open collaboration on data science, machine
learning and artificial intelligence topics. It supports and conducts research
projects and shares knowledge via a wide range of training offers and seminars.
The University of Bern is a university in the Swiss capital of Bern and was
founded in 1834. It is regulated and financed by the Canton of Bern. It is a
comprehensive university offering a broad choice of courses and programs in
eight faculties and some 150 institutes. With around 18’000 students, the
University of Bern is the third largest university in Switzerland. Awards
Categories: Excellence in Teaching Challenge: We offer data science and machine
learning training to students and professionals at all levels. The offers range
from initial data model design via programming, collaboration tools,
mathematics, statistical inference, ethics, etc. to deep learning with all its
application possibilities. A comprehensive and intuitive all-in-one solution
which simplifies and connects the full workflow was missing in our portfolio.
Solution: Through introducing students to Dataiku in voluntary training courses,
we are able to exemplify a high-level industry solution to data science and
machine learning with its advantages and possible limitations. Students in
particular appreciate the comprehensive capture of large parts of the workflow
in a data science project. Impact: The Dataiku training courses show students
what high-level industry solutions to data science and machine learning may look
like. In particular, they gain an impression of their advantages and possible
limitations. The students appreciate the comprehensive capture of large parts of
the workflow in a data science project. In future engagements, they may need to
work with Dataiku or similar products.
Posted by shaug
Oncrawl - Leveraging Dataiku for Predictive SEO as a Product Strategy
Team members: Vincent Terrasi, Product Director; Elodie Mondon, Data Engineer;
Damien Garaud, Data
Scientist Country: France Organization: Oncrawl Description: Enterprise SEO platform
powered by the industry-leading SEO Crawler and Log Analyzer. Combine the power
of technical SEO, machine learning and data science for increased revenues from
search engines. Oncrawl offers two product suites to help you open Google’s
black box and increase website revenues based on reliable SEO data. Oncrawl
Insights: Unleash your SEO potential with prescriptive analysis. Unify your
search data and improve your site’s traffic, rankings and online revenues:
Analyze your website like Google does, no matter how large or complex your
website is. Understand the impact of ranking factors on crawl budget and organic
traffic. Relies on 600+ indicators, advanced data exploration, and actionable
dashboards. And Oncrawl Genius: Empower your SEO with data science and
automation. Use SEO data to build a more profitable business through BI, data
science and machine learning: Build custom solutions to business and marketing
problems with our API Use ready-made machine learning projects and adaptable
models applied to SEO Connect with Business Intelligence solutions for better
strategic decision-making Awards Category: Alan Turing Challenge: Due to the
complexity of today’s markets, the growing opacity of search engine ranking
algorithms, and the sheer volume of data available affecting Search Engine
Optimization (SEO), the ability to easily manipulate and analyze data now makes
the difference between using it as a marketing tool, and leveraging it as an
executive-level product strategy. In SEO, the goal is to rank pages at the top
of search engine results. However, search engine ranking algorithms are based on
many factors and generally constitute a black box. Our clients wanted to know
the ranking factors that are most influential for their website. This is the
goal of predictive SEO. Exhaustive data, incl. indexed pages, links, logs, etc.
is collected to train a ML model to recognize the patterns between ranking
factors and actual page rank. It is designed to answer questions frequently
encountered in the field: how to predict crawl budget, how to detect anomalies
based on trends, how to generate SEO text, etc. Integrating technical SEO with a
data science platform is the best solution to provide the most efficient and
relevant insights to answer these questions. Within the field, many different use cases
are possible: Identification of new or unindexed content for real-time indexing
requests SEO text generation Anomaly reporting based on trends in your crawl
results Prediction of future long tail trends Find ranking factors per URL or
group of URLs Monitor your crawl budget by category or subcategory to detect SEO
issues Detect the best new products for the next few weeks for featured
highlights Monitor your crawl budget based on different Google bots to focus on
the right technologies And lots more! Additionally, another challenge is access
to SEO data and data analysis skills in the field: few specialists are also
skilled in data analysis, and few data analysis platforms have the ability to
easily interface with the sources of data used in SEO. Solution: In API-based
solutions that pull data and then use Python or R to manipulate it, usage is
limited by calculation speed. Dataiku makes data manipulation simpler and more
robust than traditional API usage and enables faster data integration. The
Oncrawl plugin for Dataiku provides a recipe enabling the easy export of URLs or
aggregated data from crawls, as well as logging monitoring events. Here's the
step-by-step process: Step 1: Import the data You can retrieve different
projects from Oncrawl, and request the latest crawls. You can therefore use both
data related to your site and data related to your competitors. This is not
possible directly in Oncrawl where each project corresponds to a specific
website. Step 2: Prepare the data Then, you need to prepare the data: clean up
missing data, rename columns, enrich the data if necessary. Step 3: Add
additional datasets Beyond the data linked to the crawl, you can add data from
other tools: keywords tool, backlink tool, etc. Step 4: Merge the data Then, you
simply have to merge the data, i.e. merge all the datasets based on the URL. The
goal is to understand what impacts the SEO for each URL or group of URLs. Once
the final dataset is ready, you can create a visual analysis.
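For illustration, the merge in Step 4 amounts to the following pandas sketch (file and column names are hypothetical; in Dataiku this would typically be done with visual Join recipes rather than code):

    import pandas as pd

    crawl = pd.read_csv("oncrawl_export.csv")      # pages with crawl indicators
    keywords = pd.read_csv("keyword_tool.csv")     # e.g. search volume per URL
    backlinks = pd.read_csv("backlink_tool.csv")   # e.g. referring domains per URL

    # Merge all datasets on the URL to build one row per page
    merged = (crawl
              .merge(keywords, on="url", how="left")
              .merge(backlinks, on="url", how="left"))
    merged.to_csv("seo_training_set.csv", index=False)

Step 5: AutoML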
Prediction You can click on ‘AutoML Prediction’: the interface helps you test
the most efficient algorithms and recommends the best one. You must then choose
on which variable the model should base its prediction. This is an essential
step, as you must determine it according to your needs. You will then see the
results of different algorithms on the same page, and be able to compare their
accuracy to select the best fit. You should now have access to your results!
For each of the algorithms used, an AUC score between 0 and 1 is available, with
scores closest to 1 indicating the best results. You can dive into the details to
assess the accuracy and efficiency of the model, while 'Interpretation' will
give you detailed explanations about the metrics. You can also look into which
keywords boost or penalize each URL, which will help you determine where to
focus your efforts, depending on each site and the metrics involved. Impact: The
use cases we mentioned above are not "new" in data science or machine learning,
but they are newly accessible to the SEO community. As they often don't have
advanced data skills, our work has made it possible for SEO specialists to work
visually with SEO data. It also enables experienced data scientists and analysts
to more easily obtain SEO data, which was, until now, not a typical type of data
they had access to. SEO is a confirmed and growing field with an increasingly
important role in business strategy. Improving data analysis and making this
kind of data available for other purposes opens the door to more effective and
broader-reaching strategies, as well as significant savings in cost. These
strategies rely on the ability to implement the use cases listed above. For
example, analyzing data related to ranking factors and keywords, combined with
crawl data, can help identify URLs with textual content that should be improved.
For one customer, rewriting meta descriptions through machine learning and text
generation led to savings of 30 man-hours and 24,000 USD/month in SEO
"production" costs alone. This project has made it easier to get the data,
implement machine learning with Dataiku, and train a broader audience of
practitioners. In terms of productivity, the overall process is twice as
efficient: everything that is done in Dataiku would previously have been
developed in R or Python, would have had to be tested extensively, and would
have taken a lot of time to implement. In a few minutes, Dataiku is able to
output all the variables to be worked on in priority; for each of the variables
the analysis is detailed, and for each of the URLs we know what boosts or
penalizes it. Once the machine
learning model is in place, we can add new URLs and know even before putting the
content if there is a chance to be in the top 10 rankings! Up next: the value of
machine learning for SEO The next step in the democratization of machine
learning for SEO will be to integrate the results of a Dataiku analysis directly
into the tools and interfaces known by SEO users. Oncrawl is working on a big
project to make steps even easier and with fewer clicks for the final user. Stay
tuned…
Posted by VincentOnCrawl
ADNOC – Building an Audit Intelligence Framework For Insights-driven Risk and
Performance Analytics
Team members: Ahmed Abujarad (SVP – Audit and Assurance), Darsan Krishnan (Manager
– Quality Assurance and Excellence, A&A), Malav Patel (VP, Internal Audit),
Niladri Das (Sr. Auditor, Internal Audit Analytics), Shiju Nair (Sr. Auditor,
Internal Audit Analytics), Aneeth Menon (Sr. Auditor, Quality Assurance and
Excellence), Mohammed K Al Mansoori (Manager – Business and Commercial Solutions),
Antonio Rivas (Sr. Architect, IT Business Solutions) Country: United Arab
Emirates Organization: Abu Dhabi National Oil Company (ADNOC) Description: We are a
leading Oil and Gas Company in UAE. Established in 1971, Abu Dhabi National Oil
Company (ADNOC) is a diversified group of energy and petrochemical companies
that employs more than 50,000 people and is a major contributor to the GDP of
the United Arab Emirates (UAE). ADNOC's Group companies operate in the fields of
exploration and production; oil refining and gas processing; chemicals and
petrochemicals; refined products and distribution; maritime transportation; and
support services including sales and marketing, human capital, legal, finance
and IT. ADNOC has been named the UAE’s most valuable brand for a second
consecutive year as of 2019, a 28.6% increase over the previous year and a
145% increase since the launch of its transformation strategy in 2017, making it
the fastest-growing brand in the Middle East and the first UAE brand to surpass
$10 billion in value. Awards Categories: Organizational Transformation AI
Democratization & Inclusivity Value at Scale Challenge: At ADNOC, the Internal
Audit team works on a wide domain of Auditing services across ADNOC Head
Quarters and 14+ Group Companies. Below are the key challenges that Internal
Audit was facing: 1. Necessity of having a centralized monitoring solution for
Internal Audit governance, planning, execution, and quality assurance &
improvement programs Audit Management System hosted in ADNOC HQ was rolled out
to ADNOC Group Companies in 2019. The primary challenge was to figure out an
Audit Intelligence framework to drive multiple data-driven analytical solutions
that would connect group companies on a near real-time basis and support
Internal Audit management in deriving key insights on performance, audit
completion, time tracking, efficiency, and cost optimization. Audit
management did not have a suitable digital platform to continuously measure,
monitor, and improve performance. 2. Necessity for an automated process for
Internal Audit action tracking and performance analysis across the Group With
180+ auditors across 14+ Group Companies, there were more than 30,000 Internal
Audit findings issued at various levels within the organization. There was a
pressing need to thoroughly analyze the nature of audit findings and provide
insights to the ADNOC Group Management, Audit Committees, or Boards on general
vulnerabilities and effectiveness of policies to drive improvements & value
generation. Some of the key challenges faced by the Internal Audit team were:
Labor-intensive manual consolidation and report generation of the findings, and
corresponding computation of action statistics, which was prone to human error.
Highly time-consuming consolidation of all the Group Companies data to gain
insights. Interacting with the business focal points & auditee management was
manual, and efficiency of the action follow up process needed improvement.
Monitoring and reporting status of audit actions to the Audit Committee and
respective Company Management was a time-consuming process due to lack of
centralized repository of audit information. Data exploration and performance
analysis across various Group companies was nearly impossible, as the
information was scattered and not systemically controlled with proper
standardization. Long-standing overdue findings were impacting Companies’ ability
to improve internal controls and realize value benefits in a timely manner. ADNOC Group
Audit Excellence objective was to consolidate all the findings and perform
insights-driven risk and performance analytics across the Group. A need for an
appropriate Audit Intelligence platform was evident to drive Internal Audit
performance and value. 3. Other challenges demanding a data science and
analytics tool were as below: Retrieving information from the Audit Management
system and SharePoint/OneDrive flat files via APIs, and automating scheduled
analytical jobs. Live connection to a central Audit Analytics Data Mart to perform all
analytical trends and risks predictions from findings across the Group. Have one
central data science and analytics platform as an enterprise tool to connect all
group company’s data and perform analytics. Complex organization structures and
business hierarchy across the Group. New competency requirements and the ability to
quickly cross-train auditors to perform Extraction, Transformation & Load (ETL)
and analytics with minimal help and guidance from the central analytics team. Access
control & confidentiality of information. Establishing governance & process. To
have a daily refresh and scheduling process. Solution: Audit Intelligence
Framework As part of the Audit Intelligence framework, two digital solutions
were established: Group Internal Audit Performance Analytics, and Group Central Audit
Action Follow-up Analytics. An end-to-end architectural design was established
from data ingestion to visualization prior to the development. In 2019, ADNOC
Group Audit Excellence team took pilot steps in standardizing the audit process,
classifying audit findings based on risk rating and the management action plans.
Committed action closure target dates were added, so that the most critical and
high value actions were taken up by business on priority basis to minimize the
risk and optimize value realization. In view of this, the Group Central
Follow-up Analytics Solution was established and rolled out in 2019 – 2020
across the Group. In 2020, Group Internal Audit Performance analytics tool was
implemented to measure and monitor core Internal Audit KPIs on audit governance,
execution, and performance against set benchmarks and approved audit plans. The
analytics were provided to Internal Audit Leaders, Managers and Audit Committee
across the ADNOC Group to monitor audit execution rate, timelines, and findings
to take proactive measures for driving productivity and efficiency. This has
resulted in considerable value generation and cost savings by increasing
in-house productivity and reducing outsourcing costs. With its user-friendly
interface, visual debugging capabilities and workflow segregations along with
the power of data engineering, ETL, and analytics, Dataiku helped establish the
Audit Performance Analytics application in quick time with minimal training
efforts. Dataiku extensively supported the following areas to deliver the
solution: 1. Data Acquisition & Profiling Data source connections were set up
across the source systems, especially the Audit Management System API and
SharePoint for the Enterprise. We were able to integrate structured and
unstructured data and flat files across group companies. The Prepare recipe of
Dataiku helped in profiling the data to determine the accuracy, completeness,
and uniqueness. 2. Data Standardization and Enrichment With Dataiku’s Prepare
recipe, data fields were standardized to a common format to help prepare complex
joining of datasets for further analytical processing. 3. Data Processing &
Transformation Various transformations to the data were applied to prepare
intermediary logic encompassing multiple calculations. Dataiku’s data
visualization and Artificial Intelligence (AI) driven features helped to a great
extent in understanding the calculation outputs - even before performing the
recipe execution. Visual recipes (incl. stacking, window function, join
operations) provide great features in Dataiku and extensively helped in
processing datasets effectively without writing complex SQL scripts. We have
over 300 datasets and a number of visual recipes for transformation, rolling
aggregations, and various data handling processes shared across projects. 4.
Forecast Analytics Models were developed to understand and predict the potential
spillover of the Internal Audit plan based on execution rate for each Group Company.
These were continuously measured by Internal Audit users and operation plans
adjusted accordingly. 5. Processed Data Output Through Dataiku, we were able to
push the processed output to a central analytics environment for generating
further Business Intelligence (BI) visualizations and reporting. The
‘In-Database’ engine helped perform complex calculations at the database
level to generate results in a short span of time. 6. Automated Workflows Dataiku's
automation and scheduling capabilities helped read data from multiple sources
and provided job process-level insights, allowing the projects to remove
all manual intervention and execute at set frequencies. Dataiku's email
notification feature alerted users to any issues
encountered during data ingestion and processing.
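A minimal sketch of what one such automated run can look like as a custom Python step inside a Dataiku scenario is shown below; the dataset names are hypothetical.

    from dataiku.scenario import Scenario

    scenario = Scenario()
    # Rebuild the ingestion and transformation outputs in dependency order
    scenario.build_dataset("audit_findings_raw")
    scenario.build_dataset("audit_findings_standardized")
    scenario.build_dataset("audit_kpis")
    # If any build fails, the run aborts and the scenario's email reporter
    # notifies the team of the ingestion or processing issue

Impact: Driving efficiency and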
performance with highest productivity was one of the key leadership messages
last year, and Internal Audit through digital projects has significantly
contributed in all areas of ADNOC Strategic pillars through well-defined KPIs
across 14+ Group Companies. ADNOC Group Internal Audit was able to improve the
Measurable Value of Audit (in AED billions) and follow actions effectively
through to completion using analytics dashboards provided to the audit and
business management. Further notable benefits of the data analytics solution powered by
Dataiku include: Centralized data repository using a single version of truth, by
connecting Internal Audit information across ADNOC HQ and 14+ Group Companies.
Gain better insights on the Group Internal Audit spectrum. Improved and informed
decision making with up-to-date information. Cost optimization by reducing
outsourcing demand, thanks to increased in-house Internal Audit productivity.
Improved operational efficiency through KPIs & SLAs. Better focus on identifying
trends and Internal Audit performance across Group Companies. Quick turnaround
of performance with accurate reporting of KPIs to management. Leveraging Dataiku
to augment Internal Audit activities. Reduction of Internal Audit action overdue
and overall improvement in Risk Assurance & Internal Controls. One year into our
Dataiku journey, ADNOC is running 6 large projects in Internal Audit execution
and performance, actioning insights across 180+ auditors and 1,500+ business
clients in 14+ group companies. These projects are all automated and running on
enterprise server with data stored in SQL schemas and providing end-to-end
visualizations in corporate BI tool. With the help of Dataiku, we are also able
to plan and design a central framework of Continuous Control Assurance
analytics, which will provide analytics, augmented audits, and advanced
analytics with predictive risks and detections - which will help reduce the
leakages and establish the right level of governance and controls. In 2021,
ADNOC Group Audit Excellence vision is to successfully roll out continuous
control assurance projects running on top of SAP ERP platform for procurement
across all Group Companies, provide near real-time detections of high-risk
activities, as well as proactive insights to the ADNOC senior management &
deeper assurance to the Audit Committee & Board. The predictive and Machine
Learning capabilities in the tool are already being explored and under
development, which will be interesting to share in the next Awards submission.
Posted by LisaB
Last reply 03-29-2022 by Data_Optimist
Ericsson - Optimizing Warehouse Space with Citizen Data Science
Team members: Ting Xiao, Automation Developer; Rafael Maia C, Automation Developer;
Michel Benites Nascimento, Analytics Solution Designer; Yao Lu, Supply Chain
Manager Country: United States Organization: Ericsson Description: Ericsson provides
high-performing solutions to enable its customers to capture the full value of
connectivity. The Company supplies communication infrastructure, services and
software to the telecom industry and other sectors. Awards Categories: AI
Democratization & Inclusivity Value at Scale Challenge: As the world leader in
the rapidly changing environment of communications technology, Ericsson operates
many warehouses on a global scale. Optimizing the use of this space is a key
part of Ericsson’s lean supply chain. Our project goal is to provide accurate
estimates of the space needed for current and future inventory. For most of the
products stored, the occupied space can be calculated using simple formulas.
However, for the remaining products this is not possible, thus historically the
estimates of the available space were inaccurate. Simply applying the formulas
will not satisfy the stakeholder requirements for high accuracy. Our challenges
can be summarized as such: Impossible to measure every single packaging on a
daily basis Logic to calculate the size of unknown packaging is unclear, and so
complex that it cannot be defined by the business No centralized platform to
perform the calculation Stakeholders require high reporting accuracy Solution:
After being introduced to the Dataiku platform at Ericsson, we realized that
Machine Learning could be used to estimate the space occupied by those remaining
products in order to increase the accuracy - and all this, just in one platform.
Since we already had the data stored in our data lake, Dataiku made it easy to
extract, clean and transform the data. The seamless way it integrates the data
flow simplified the process of dealing with the data sources. Data selection in
Dataiku is made easy through the dataset explorer window: picking the right
variable type and identifying incompatible data types is fast and
straightforward.
We could also test multiple different Machine Learning algorithms for
benchmarking, without having to code them. We were able to compare KNN, Random
Forest, XGBoost and a few others. The hyperparameter setup, metric selection,
and comparison with the charts output made it easy to spot the best algorithm.
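As a rough illustration of this benchmarking step, a scikit-learn equivalent
might look like the sketch below. Everything here is hypothetical (file name,
column names), GradientBoostingRegressor merely stands in for XGBoost, and
inside Dataiku this is all done visually rather than in code:

    # Hypothetical sketch of benchmarking several regression algorithms.
    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

    df = pd.read_csv("packaging_history.csv")  # hypothetical historical extract
    X = df[["length_cm", "width_cm", "height_cm", "units_per_pallet"]]
    y = df["occupied_space_m3"]

    candidates = {
        "KNN": KNeighborsRegressor(n_neighbors=5),
        "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
        "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    }
    for name, model in candidates.items():
        r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
        print(f"{name}: mean cross-validated R^2 = {r2:.3f}")

The model with the best cross-validated score would be the one promoted, which
is the same selection logic the visual interface surfaces through its charts.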
Not only that, but automatic retraining and selection of the best technique
also allow our models to switch to a better approach should the best performer
change with future data. Model lifecycle management is achieved by scheduled
model retraining, which picks up changes in warehouse behavior and provides
more accurate estimates. The entire flow of pulling data, scoring, and outputting is
automated via the Scenario feature. The output of our project is a dataset
containing accurate estimates for the space available in all warehouses. This
dataset is read by a dashboard in Tableau, displaying the insights visually to
all supply hub managers. Impact: The ultimate financial impact of our project is
that Ericsson saves on real-estate costs, by continuously optimizing the use of
the existing warehouses. This is thanks to the more accurate estimates from our
project. On top of the financial savings, our project provides operational
visibility on demand, a data-driven warehouse space management process, and
aligns with our strategy for digital transformation. The work that we initiated
in our Americas region is now being showcased within our Group Supply
organization. Being reusable around the world, our work will have an even
greater impact on Ericsson's transformation journey. Personally, as Citizen Data
Scientists, we are able to use AI to augment and optimize an existing process,
all without writing a single line of code. This inspires us to do the same for
many other use cases at Ericsson.
0 4
Posted by ayako
Effilab - Building More Robust Data Products for Digital Advertising, in Less
Time
Name: Caroline Cochet-Escartin. Title: Data Scientist. Country: France.
Organization: Effilab. Description: Effilab was initially a digital advertising
agency, which was acquired by Solocal to develop a digital product - Booster
Contact - which offers customers higher visibility on search engines through
optimized campaigns and bids on Bing and Google AdWords. Solocal offers these
services for a fixed monthly fee, generated on a prospect-by-prospect basis
depending on each prospect's needs, especially for small local
enterprises. Awards Categories: Organizational Transformation; Value at Scale.
Challenge: The data team at Effilab was originally composed of Python
developers, data analysts, and data scientists. Our core missions were the
deployment and operationalization of two models:
- A pricer to provide automated yet customized, per-customer quotes for core products, and deliver these quotes through an API.
- A budgetizer to generate dynamic bidding on Google and Bing ads, automatically tuned each day to keep up with recent performance and campaign behavior (a toy sketch of this kind of daily tuning appears at the end of this post). This project is quite sensitive for our customers, given that it automatically spends hundreds of thousands of euros a month.
The solution used before Dataiku was entirely homemade Python code, with the
following challenges:
- Lack of robustness in the data pipelines and model operationalization.
- A gap between data engineers, developers, data scientists, and data analysts.
- Slow changes, slow integration of new features, and slow model deployment.
- Restricted data access.
Solution: Both projects were deployed quite
quickly on Dataiku, using the Design, Automation and API nodes with the
following benefits: 1. Data scientists are more independent The team moved from
the in-house app developed by Python developers to Dataiku on Google Cloud
Platform, which removed the barrier between research/development of algorithms
and productization. This ultimately gives data scientists more independence and
more control over the data pipelines, as well as more time to focus on the
models that bring business value. 2. Time to production decreased Dataiku
enabled Effilab’s data team to reduce time-to-production by at least 3x. This
change was driven by the nature of Dataiku as a robust solution, including the
way algorithmic R&D is facilitated by the model interface. 3.
Smoother and more robust overall processes In going from a mass of different
tools and attempting to cobble them together in-house (data connections, Python
recipes, Jupyter Notebooks, libraries integration for development, wiki,
scheduled scenarios, monitoring) to Dataiku as an all-in-one solution, the
overall processes and efficiency of the team have improved. 4. Data
democratization Later on, data analysts were easily onboarded, got access to
data pipelines and databases, and were able to contribute more and more easily.
The project flow covers machine learning modeling (loading and preprocessing
historical data, visual feature and model exploration, and advanced fine-tuned
model building) and refreshing actionable data (syncing business data and
updating geo-demographic data). Impact: The value generated revolves around 3
main elements: Cost savings The need for data engineering to maintain the two
products has largely decreased, thanks to the robustness of the solution
developed in Dataiku. The all-in-one platform enabled us to remove bugs in the
automatic bidding models, leading to cost savings on the order of tens of
thousands of euros a month (largely paying for the Dataiku licence!). Time
savings As explained above, Dataiku enabled us to
decrease time-to-production by at least 3x thanks to the central, flexible
platform, which allows for integrating different technologies (e.g. algorithmic
R&D) within the visual interface. Room for innovation Through enabling fast
onboarding of new team members and giving an easy way to act on data, new ideas
and new features emerged to improve the two products. Priceless!
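As a footnote to the budgetizer described above, the kind of daily tuning it
performs can be caricatured in a few lines. This is hypothetical pacing logic
for illustration only, not Effilab's actual bidding model:

    # Toy daily budget pacing: keep monthly spend on track to hit the target.
    def next_daily_budget(monthly_target, spent_so_far, day_of_month, days_in_month=30):
        remaining_budget = monthly_target - spent_so_far
        remaining_days = max(days_in_month - day_of_month, 1)
        return max(remaining_budget / remaining_days, 0.0)

    # Example: 10,000 EUR target, 4,000 EUR spent after day 10 -> 300.0 EUR/day.
    print(next_daily_budget(10_000, 4_000, 10))

The real system tunes bids per campaign using recent performance data; the
point here is simply that the logic runs automatically every day.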
0 2
Posted by Caroline
Pr. Wartenberg (Hochschule Hannover) - Dataiku as a Versatile Platform for BI &
Beyond
Name: Maylin Wartenberg. Title: Prof. Dr. Country: Germany. Organization:
University of Applied Sciences and Arts Hannover (Hochschule Hannover).
Description: With around 10,000 students, the Hochschule Hannover is the second
largest university in Hanover, the capital of the federal state of Lower
Saxony. Institutionalized in 1971 from various educational precursors – the
oldest dating back to the year 1791 – the Hochschule Hannover offers a
particularly wide range of subjects in five faculties. The degree course
'Business Information Systems' treats business information systems as an
independent discipline. The experience that business information systems
specialists need both thorough knowledge of business administration and a
comprehensive grounding in computer science is reflected in the contents of the
degree course. One of the specializations to choose from is 'Business
Intelligence'. Awards Categories: Excellence in Teaching. Challenge: I teach a
class in a Bachelor's degree on
advanced topics in Business Intelligence. Students focus on ‘Business
Information Systems’ and can choose ‘Business Intelligence’ as a specialization,
with two courses: ‘Data Warehouse’ and ‘Business Intelligence’. In my course,
Advanced topics in BI, I want to give the students a good overview about
possible further topics in the field of data science, not only theoretically,
but with lots of practical experience. The topics include different aspects of
data science like data preparation, data visualization, and data analysis, as
well as topics such as data governance, collaboration, code languages, machine
learning, and even deep learning. Therefore I needed software that facilitates
many different use cases, based on a broad variety of topics and the interests
of the students, and that is easy to work with without prior experience. In addition
to that, I teach another class in a Master’s degree called ‘Digital
Transformation’. This is a consecutive Master based on a Bachelor’s degree in
Business Informatics. Some students have a background in business, others in IT.
The topic I teach is an introduction to artificial intelligence. Some of the
students already have experience in machine learning topics and are able to
program in Python, but some do not have any experience regarding AI. Therefore
it was difficult to work out hands-on exercises with such differences in prior
experience. Solution: I have been using Dataiku DSS for over 3 years in teaching
for both classes, and it works very well. It only takes a few basic tutorials to
get to know the software and to be able to work with it. The students can even
work on complex machine learning tasks within one semester. They have the
ability to use the integrated algorithms, or code their own. Dataiku DSS offers
such a great variety of topics in tutorials, documentation, and articles so
that the students are able to get to know many different aspects of working with
data. Each year I try to work with the newest version and constantly explore new
additions to the software. Impact: I would like to share this year's experience
in my class ‘Advanced Topics of BI’ as an example of the possibilities. The
students work in small groups and each group gets a special topic. They have to
present on the topic in general, and then create a small hands-on workshop for
other students in the class. This year, the topics were:
- Data Visualization & Storytelling, especially Waterfall, Treemap, and Sunburst Charts
- Geospatial Analytics, especially Map Charts using Reverse Geocoding/Admin maps
- Graph Analytics, especially Social Network Analysis
- Exploratory Data Analysis using Interactive Statistics
- Data Governance
- Connectivity/Data Sources, especially SQL Data Tables and SQL Recipes
- ETL using code-based recipes, especially Python recipes
- Code Notebooks, especially Python Notebooks
- Webapps, using Dash
- Deep Learning with different libraries, especially Keras
- Natural Language Processing - Sentiment Analysis
It really is a wide variety of topics that can be addressed within one software
framework. The students create their own scenarios and create or find their own
data as the setting for their workshop. Some integrate pictures in their
project overviews, some use the Wiki for describing the tasks in the workshop,
and some use data that already sparks interest in the topic for other students,
like a Social Network Analysis on the Marvel Universe or Game of Thrones data.
The feedback from the students regarding the software is always very good. They
can be very creative and get to know different topics in data science. The
added hands-on experience makes the presentations more interesting and enhances
the learning experience.
0 1
Posted by MW
Pr. Enobi (Live University) - Facilitating & Enhancing the Data Science Learning
Experience with Dataiku
Name: Fernando Enobi. Title: Professor. Country: Brazil. Organization: Live
University. Description: Live University was created 17 years ago, with the
dream of transforming education. We believe that people should enjoy every step
of their lives, and that includes when they decide to study. We are crazy about
our 5 schools focused on different areas: HR, Purchasing and Supply Chain,
Market and Commercial Intelligence, Tax and Accounting, Data Science and IT.
Awards Categories: Excellence in Teaching. Challenge: As a Professor teaching MBA
classes in data science, I faced multiple challenges:
- Students at more executive levels or from non-IT departments struggled to implement the practical data science use cases.
- Students were developing the use cases on their desktops, without any collaboration or knowledge-sharing.
- All the classes have business use cases that relate to a data science solution; without Dataiku, students did not have sufficient time to develop these projects.
- For me as a teacher, it was very hard to monitor and support students during project execution.
- There was no “real” production environment for data projects. Students were always complaining about running Python code on their laptops and using Excel spreadsheets, which works fine for experimentation but is not enough for bringing corporate value.
Solution: Our MBA program focuses on implementing real business
cases in leading data solutions. We have many students in leadership positions
who are building data-driven teams, organizations, and data stack strategies.
Dataiku came to our attention when we were searching the Gartner Magic Quadrant
to select well-positioned data solutions, and Dataiku has been named a Leader
two years running! We were amazed at the first contact with the Dataiku team,
who saw the importance of educating MBA students (also professionals in the
Brazilian market) on new technologies to fast track data solutions
implementation and adoption. Impact: First, all the activities are prepared and
organized on the platform, including use case scenarios, distribution of tasks,
and insights documentation. Dataiku has a very fast learning curve. Students
are very excited to use the platform, and in the very first class they already
started planning capstone projects because of all the available resources. The
dynamics and interaction during the classes are now critical to guaranteeing
the quality of students' experimentation and learning. In one example, 5
students collaborated on a use case about Industry 4.0: predicting preventive
maintenance in an e-coating manufacturing plant, based on 2 datasets requiring
unsupervised learning. The use case was presented with little information to increase
complexity and stimulate data investigation. The students were able to carry out
the analysis using more concepts and models than were covered in the original
MBA module, which was made possible by Dataiku’s friendly visual
environment and ability to connect with most current technologies. The students
were therefore able to upskill and do a very complete diagnosis in a short time
frame, leading to insights guiding business action plans. In summary, the
Dataiku platform increased the learning experience by a lot! I’ve not seen this
kind of experience in other schools in Brazil yet.
0 1
Posted by fenobi
Standard Chartered Bank - Building an Intelligent Data Operations for Financial
Planning and Performance Management
Team members: Craig Turrell, Head of Digital Centre of Excellence P2P, with:
Christopher Harvey, David Rogers, Rajesh A., Karthik C., Ramakrishnan D.,
Mahesh Iyer, Priyanka Jaiswal, Rajasekar Kanniappan, Benjamin Koh, Vignesh Kp,
Vivek Kumar, Naresh Babu, Joshua Samuel, K Santosh Satuluri, Suhas Talanki,
Pushya Thimmaiah, Dheerendra Yadav. Country: Singapore. Organization: Standard
Chartered Bank. Description: We are a leading international banking group, with
a presence in more than 60 of the world’s most dynamic markets. Our purpose is
to drive commerce and prosperity through our unique diversity, and our heritage
and values are expressed in our brand promise, Here for good. Awards
Categories: Organizational Transformation; AI Democratization & Inclusivity;
Value at Scale. Challenge: At Standard Chartered
Bank, the Financial Operations Plan to Performance (P2P) division works on a
broad array of core financial statement and performance management systems of
the bank. We need to be able to look five years back and five years forward to
identify abnormalities and trends, do balance sheet analytics, and conduct cost
analysis to answer complex questions around how and where the bank is making
profit, how the bank behaves, who should be hired and where they should be
placed as related to cost profiles, etc. We provide the enterprise financial
performance data fabric that drives the organisation financial operational and
strategic thinking – the data and insight behind the decision. Of course, we had
the systems in place to do all of this for many years, but operationally we were
limited to millions of rows of data. While it sounds like a lot, the reality is
that teams could provide one or two levels of detail for the 10 core products of
the bank, or core primary country markets, and look at basic account structure
over about three months - and even at those dimensions, we had to start
splitting the analysis into pieces. Answering questions about, for example,
cost trends across the entire bank - every account line crossed with every cost
centre - runs to nearly 10 trillion possible questions. So we started
digitizing reports for CFOs, but soon realized that this approach wasn’t going
to influence the behaviors of the bank. We needed to find a way to impact the
day-to-day work of financial analysts, making them more efficient and effective.
When diving into the issue, we found it was primarily a question of volume
(getting from 10 million to 400 million rows of data), not a question of
underlying infrastructure - in fact, we already had robust compute warehousing, but almost
no one was using it. We needed to find a way to leverage that existing
ecosystem. Solution: In addition to finding a solution that leveraged our
existing infrastructure investments, we didn’t want to have to go looking for
another tool again in a few years when our team becomes mature enough to start
doing machine learning on their data. That’s when we found Dataiku, and it
solved volume straightaway. Within three to four weeks, we managed to turn over
a 4.5 billion row table in a single operation. But Dataiku made us realize we
could do so much more than that. We had an army of people copy and pasting data
and, since we were now able to centralize all treatment within the platform in a
lightning-fast manner, Dataiku allowed us to have different conversations about
data. In the first nine months with Dataiku, the team churned out use cases
from across FP&A. The next step was productionizing their systems and patterns,
including ensuring there was discipline with data pipelines, SLAs, and more
stringent DigitalOps processes. That’s the power of Dataiku: unbounded freedom,
but also features that facilitate structure and process. It made our
vision possible and our strategy a reality. Impact: The Digital MI team at
Standard Chartered Bank, led by Craig Turrell, overhauled three major systems at
the bank that produce summary financials and expose performance and planning
dashboards to thousands of stakeholders across the bank. This project is a major
achievement, automating laborious tasks previously done in spreadsheets,
increasing the scale and frequency of analytics, and delivering self-service
analytics capabilities in a governed, standardized way. Key KPIs include:
- Processing 10 million to 400 million+ rows of data, opening doors to future innovation.
- Turning 2,500 hours of work into a 10-minute process, using governed process automation.
- Increasing analyst productivity by a factor of 30 by replacing spreadsheet processes with governed self-service analytics.
- Accelerating overall time-to-market, delivering use cases in production in less than 9 months and bringing idea-to-prototype time under 12 weeks.
We’re
also developing Standard Chartered Bank’s unique brand of data democratization
or self-service analytics, with a center of excellence (CoE) owning the core
structured intelligence of the bank. All enterprise-level data is centralized,
with product owners for every dataset and defined governance. From there, the
team builds specific experiences to deliver answers through core apps, and the
ultimate “self-serve” flexibility comes from how people around the organization
use those apps to solve business problems in their day-to-day. The CoE at
Standard Chartered Bank is currently made up of 16 people, but they will be
expanding to 30 and expect to be hundreds in the next few years to support the
growing demand and continue driving efficiencies around the business. There are
numerous communities across the bank leveraging Dataiku and building “digital
bridges” to the CoE’s core structured intelligence. On average, we estimate
that two people armed with Dataiku are doing the work of about 70 people
limited to spreadsheets. The goal in the coming years will be to continue to upskill people
with Dataiku to increase efficiency across more areas of Standard Chartered
Bank. In the months and years to come, we will also move into more predictive
analytics in the FP&A division, with a focus on predicting within the mid/short
term (3 months to a year). The vision is around a “supermind,” or a smart group
of independent agents working together to create a benchmark of intelligence
(if, say, 10 machines independently make predictions, taking those predictions
collectively will probably be close to reality). There will likely be
interesting learnings to share in next year’s Awards submission!
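(A toy illustration of the "supermind" intuition above, purely illustrative and
not the bank's system: averaging many independent, noisy predictions usually
lands closer to the truth than a typical individual prediction.

    # Ten independent noisy "machines" predicting the same quantity.
    import random

    random.seed(42)
    truth = 100.0
    predictions = [truth + random.gauss(0, 10) for _ in range(10)]
    consensus = sum(predictions) / len(predictions)

    print([round(abs(p - truth), 1) for p in predictions])  # individual errors
    print(round(abs(consensus - truth), 1))  # consensus error, usually smaller

The independent errors partially cancel when averaged, which is why a group of
agents can outperform most of its members.)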
2 10
Posted by LisaB
Last reply 07-22-2021 by tinaresh
Lect. Beaumont, PhD (Columbia University) – Data Analysis Bridges Finance Theory
and Practice
Name: Perry Beaumont, PhD. Title: Lecturer. Country: United States.
Organization: Columbia University, School of Professional Studies. Description:
The School of Professional Studies is one of the schools comprising Columbia
University in the city of New York. It offers seventeen master's degrees,
courses for advancement and graduate school preparation, and certificate
programs. Awards Categories: Excellence in Teaching. Challenge: For context,
the course applicable to this
Submission is Security Analysis, and it is a finance class taught at Columbia
University within the School of Professional Studies. Security Analysis is also
the name of a classic investment book written by Benjamin Graham and David L.
Dodd, and Benjamin Graham was a professor of Warren Buffett when Buffett attended
the Columbia Business School. The class is very popular with students, and is
taught each semester. The course is a post-baccalaureate offering, and many
attendees are enrolled in a master’s degree program for business or public
service. The class lasts 12 weeks, and students generally have a basic
understanding of statistics and analytics. A core element of the course involves
building bridges for students between finance theory and practice, and the
homework exercise involving Dataiku specifically relates to identifying
important distinctions between the available attributes of a successful
brick-and-mortar retail business and an online retail business. A helpful way of
approaching this is to tap into a real-world dataset, as well as an online
enterprise platform. Accordingly, Google Query was used to access actual
(anonymized) eCommerce metrics from Google’s merchandise website, and Dataiku
was selected for performing the analysis. Solution: Dataiku was immensely
helpful, in different ways: Admin & tech support:  It was a great pleasure to
collaborate with Dataiku personnel inclusive of Adela Deanova (concept
development), Damien Jacquemart (programming contributions), and Josh Hewitt
(classroom account setups), who each uniquely contributed to the success of the
initiative. Product flexibility & extensiveness: By virtue of Dataiku making its
product available with a variety of venues, from a 14-day free trial to the
leveraging of synergies with AWS, Azure, Oracle VM VirtualBox, and more, an
array of learning opportunities are presented to help students appreciate the
value of the Dataiku proposition. Visual user interface: The visual enhancement
tools available within Dataiku per recipe, display of model results, and
graphing possibilities - all combine to help make for a meaningful interactive
learning experience. Impact: Generally speaking, the conversion rate for a
brick-and-mortar retail store is about 20%; that is, about 20% of the persons
who enter the store end up making a purchase. By contrast, the conversion rate
for a person who visits an online retail store is closer to 2%. Accordingly,
with the appreciably smaller number of conversions online, yet with the ability
to collect dozens of metrics related to a customer’s online experience (i.e.,
the customer’s device used to access site, length of time on pages, page path to
checkout, and so forth), there is an opportunity to identify the factors that
contribute to a greater likelihood of success in driving online sales. Even an
insight that results in an additional 0.5% point in sales (from 2% to 2.5%)
represents a 25% improvement in conversions (2.5%/2%-1=25%). As a result of
working through the Dataiku module, students were able to obtain a variety of
invaluable insights. Not only were they able to better grasp the enormous
amounts of data that can be generated by an eCommerce business, but they were also able to
appreciate the tremendous power of Dataiku to generate meaningful analyses from
especially large files. Their analytical skills increased markedly, though
perhaps even more impressive was the greater comfort level they exhibited with
regards to drawing connections between the mathematical results and the
practical implications. In the process, it became quite evident that students
were becoming increasingly confident with developing a bilingual vocabulary to
constructively evaluate both quantitative and qualitative dynamics of
decision-making. The recommendations they made very much reflected a depth and
breadth of understanding that went well beyond what would have been possible for
them to achieve simply by reading a case study. By way of one particular
example, the univariate analysis tool within Dataiku very much provided a useful
guide for students to evaluate the information content and value-add of each
variable within the dataset, and opened up constructive conversations related to
the true key performance indicators within an eCommerce context. In brief, by
virtue of digging into the data themselves, they were able to have a far richer
learning experience, and one that will surely stay with them for a long time to
come. By using actual Google BigQuery data in combination with Dataiku, students
were able to see for themselves what the customer experience looks like with
well-defined data relationships, all while building bridges between textbook
theory and real-world insights. As an additional element of measuring success
with this initiative, students who have taken this course routinely contact me
to say that they are using Dataiku with other applications, both academic and in
the business world. In brief, they are taking the basic skills developed in the
classroom and are actively applying them in a variety of other contexts.
0 3
Posted by phb2120
Schlumberger HR - Federation of Data Science to Accelerate Talent Performance
Enablement
Team members: Modhar Khan - Head of People Analytics; Richard De Moucheron -
Director Total Talent Management; Wesley Noah - Global Compliance Managing
Counsel Operations; Rupinder Kaur - Data Scientist Talent Analytics; Sampath
Reddy - Analytics Product Champion; Vipin Sharma - Technical Lead Analytics;
Juliette Murray Lamotte - Global Compensation Value Manager; Rafael Fejervary -
Global Talent Manager; Simon Spero (Dataiku) - Senior Enterprise Customer
Success Manager. Country: United States. Organization: Schlumberger.
Description: Schlumberger is a technology company that partners with customers
to access energy. Our people, representing over 160 nationalities, are
providing leading digital solutions and deploying innovative technologies to
enable performance and sustainability for the global energy industry. With
expertise in more than 120 countries, we collaborate to create technology that
unlocks access to energy for the benefit of all. Awards Categories:
Organizational Transformation; AI Democratization & Inclusivity. Challenge:
With superior talent and a vast data
warehouse available to Talent Management teams across the globe, the journey
towards applying machine learning on the edge was challenged by the following
requirements:
- Investment in learning and training,
- Compliance monitoring and ethical use of data (assurance),
- Bringing stakeholders together to discuss and assure the value of such projects.
Furthermore, a challenge around capacity and resourcing also emerged in complex
scenarios, in which talent teams across the world needed the technical
expertise of the central data science team to support and enable components of
talent-specific data projects such as talent planning, acquisition,
identification, skilling and retention, involving a multitude of unsupervised
learning (e.g. clustering), text mining and NLP (e.g. embedding, NLP –
identity), and supervised learning (ensemble modeling).
Solution: Training: The material provided by Dataiku covered all the needs and
catered to various competencies and profiles (e.g. data engineers, analysts,
business partners), which shortened our journey to data science at scale by
months, even years. Collaboration: The platform enabled connectedness across
multiple teams and drove efficiency in project decisions, as well as visibility
on where support was needed. In the past, we had many reviews to get
stakeholders to understand what data was used and how engineering was applied,
which went on for months. Today, they have instant visibility on the entire
data pipeline. Compliance Monitoring: A big challenge was how to ensure that
all the projects being done on the edge are compliant with privacy regulations
and bias elimination, without stifling creativity. With clear reporting tools
and automation of such reviews, teams are able to work more efficiently at
scale - where it would previously take weeks or months to complete such reviews
before projects could begin. Impact: During the one-month pilot with 10 members
from various teams (compensation, talent acquisition) and personas (recruiters,
compensation analysts, talent acquisition planners), we saw more than 5
projects deployed to solve local business needs. We plan to expand and add more
than 100 HR personnel to the platform in Q3-4 of this year.
0 2
Posted by modhar
Standard Chartered Bank - Learning Together, Faster Through 100 Days of Coding
Name: Craig Turrell Title: Head of Digital, Finance Operations Country: Poland
Organization: Standard Chartered Bank We are a leading international banking
group, with a presence in more than 60 of the world’s most dynamic markets. Our
purpose is to drive commerce and prosperity through our unique diversity, and
our heritage and values are expressed in our brand promise, Here for good.
Awards Category: Most Extraordinary AI Maker(s) Business Challenge: At Standard
Chartered Bank, within Financial Operations / Financial Planning & Analytics,
we have been on a journey. This journey has taken us from a small team of
financial analysts living 'ground-hog' lives, trying to gather information
sources, integrate them, and, if time allowed, discover value in them - but
most of the time we could only publish the numbers and hope someone found them
interesting. This was our beginning in early 2019, and our progress from those
early spreadsheet days to enterprise-class pipelining, analytical translation
and the ongoing pursuit of Everyday AI is well documented. But something that
happened in early 2022 shocked us: we started to reach a realisation of the
'10,000 hour' learning challenge - an almost impossible hurdle that meant we
could not scale. No matter how advanced the tool we were using, it would be
worthless because we did not have the talent and structures for managing and
using it. The 10,000 Hours: So the first question is the 10,000 hours - where
did this come from? Well, the answer is in our belief that there are digital
unicorns: analytical engineering experts with design and hands-on knowledge of
the full analytical stack. From data ingestion to normalisation, feature
engineering, metrics calculation, machine learning modelling, visualisation and
automation - a person able to traverse a multi-sided analytical platform and
design next-generation analytics. We broke this down into three broad
categories of skills: data pipelining and data structures, metrics + scenarios
+ machine learning, and human-computer interaction / UI-UX design. Each of
these required upskilling and certification to establish credible skills in a
centre-of-expertise business model. This impacted not only hands-on engineers
but also the top of executive digital management - we had critical skills drift
inhibiting our ability to scale success. Business Solution:
Dataiku helped us in three key ways:
- Ongoing evolution of the platform features
- Academy programme
- Partner ecosystem and interoperability
Platform Features: The ongoing evolution of the Dataiku platform, and the
incremental business value generated by each new release, bring ready-to-use
business solutions and analytical features which no longer need to be
discovered, built and adopted across the team - they come out of the box. For
example, in the recent version 11.0, the native time series features no longer
require our team to learn the theory, build a model/visualisation and share the
feature - best practice is already there. The ongoing development of business
solutions and best-practice templates will also be a game changer. Anything
which accelerates time-to-value and reduces the learning overhead creates
significant and immediate value. We can do more because Dataiku gives us that
'helping hand' to get to best practice. Academy Programme: This is our driving
licence for analytics. It is how we decide how to enable people on our
production environment, and the guardrails on the use of development features
and machine learning. The courses are well structured, and the video content
and use cases are on topic and aligned to real work situations and problems we
face. When we were struggling to work out how to establish data and machine
learning operations (DataOps / MLOps), Dataiku already had a new learning path
for us to follow and certify against. Even for a senior digital executive, such
as Craig Turrell (Head of Digital Operations), taking the courses and achieving
certification helped close the skills drift and enabled better platform
decisions. Partner Ecosystem: The broad range of plug-in extensions,
interoperability options and cross-platform solutions not only provided
immediate solutions but reduced the learning burden: 10,000 hours became 300
hours - accelerating time-to-value and the ability to scale analytics & AI.
Impact: Without Dataiku's help:
- Original estimate of learning = 10,000 hours (data, machine learning and visualisation) across three technology stacks to reach expert level within the Digital COE.
- Estimated learning cost per engineer - 20,000 USD / 9 months to achieve full-stack delivery productivity.
With Dataiku's help:
- Following platform improvements in Dataiku, extension of the Academy programme, and use of partner/Dataiku solutions + 100 days of coding learning sprints = 200-300 hours.
- Estimated learning cost per engineer - between 500-1,000 USD / 2 months to achieve full-stack productivity.
Value Brought by Dataiku: 100 Days
of Coding: The personal journey of the Head of Digital FinOps, Craig Turrell,
is the best example of not only the impact on upskilling and the enhancement of
tech-stack efficiency, but also the network effects of the platform. Dataiku is
a multi-sided platform for artificial intelligence and advanced analytics - an
ecosystem of data, services, standards, and tools upon which different
analytical personas individually, but also collectively, create and extract
value. It is co-creative analytical thinking, and the analytical network
effects of the platform and learning environment. This created an exponentially
valuable effect on this critical skills problem, as we were able to seamlessly
share and co-create learning journeys, tutorials and 'hackathon' challenges in
a community-driven learning marketplace enabled by the Dataiku platform and a
homogeneous data environment. The 100 Days of Coding was a call to arms to
ensure the skills of the most senior digital leader were on par with the rest
of the team - not through words, but by sweating through the courses and
certification process. Dataiku gave us the environment to build big, fast and
intelligent systems - it enabled us to achieve amazing results that were
irreversible and transformative. But the 100 Days of Coding, the improvements
in the product platform, the ongoing enhancement of Dataiku Academy, and the
contributions of partners continuously extending and enriching the available
solutions to real business problems allowed us to do something beyond the
technical. Dataiku reduced the cost and time to teach new ways of digital
working and to democratise advanced analytics and machine learning; it reduced
the time it takes to become a digital unicorn. It allowed us to see how to get
to 'enterprise' scale leveraging Everyday AI. And through socializing our
journey on social media, we're now building momentum outside of Standard
Chartered Bank!
0 1
Posted by CraigTurrell
Ranjan Relan - High Traction Online Course on "How To Build your first Data
Pipeline with Dataiku"
Name: Ranjan Relan Title: AI and Data Strategy Manager Country: India Awards
Categories: Excellence in Teaching Business Challenge: In 2020, demand for Data
Scientists was increasing at an exponential pace, but skilled Data Science
professionals were very few. Many had to go through the rigorous process of
understanding the issue at hand and writing a lot of code to build a data
science pipeline. I was primarily looking for a low-code/no-code AI platform
which could be leveraged by many to quickly build data science pipelines. Since
many organizations had started using and exploring the Dataiku platform, I also
started leveraging it in 2020. I was extremely excited to discover this
low-code/no-code Enterprise AI platform - with so many features, such as an
amazing UX design, automated ML, visual recipes that make it easy to maintain
and run data science pipelines, extensibility to Python and R, the ability to
do data demography analysis, and data engineering with a few clicks. Looking at
its product features, the Dataiku platform looked set to become more and more
powerful in the coming years. Business Solution: Based on
the current industry trends in 2020, i.e. the lack of Data Science skills and
the limited number of low-code/no-code AI platforms, I thought there would
surely be a course on Dataiku. Since I had already published courses with some
of the major, well-known EdTech companies, I firmly believed, based on the AI
and data market landscape and Dataiku's growing popularity, that a course on
Dataiku would help the community a lot. Hence, I created a course on
coursera.org named "Build your first ML pipeline using Dataiku", which was
published in May/June 2021. Within just one year, this course has been taken by
more than 2,400 users. Of the 5 courses I published last year on Coursera, the
Dataiku course has been my fastest-growing and most loved in terms of the users
who have taken it and the rating it attained. It also has one of the highest
completion rates, which speaks to the ease of use of the Dataiku platform. In
this course, students learn how to build their first data science pipeline
using easy-to-use features in Dataiku. It leverages COVID datasets, and
students learn how to use visual recipes to perform data transformations such
as splitting and aggregating data, as well as how to train and score a model
and spin up their first data pipeline in less than an hour (a rough code
equivalent of these steps is sketched at the end of this post). Value
Generated: This course has been taken by more than 2,400
students (the course was launched last year in June), has a 4.5 average rating,
and has a completion ratio of 40% - which is very high in the online learning
world. Value Brought by Dataiku: Dataiku has a great UX and, as a
low-code/no-code platform, it helps increase a team's efficiency, has an
easy-to-use interface, and enjoys high user adoption amongst the citizen data
scientist and ML engineering communities.
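As promised above, here is a rough code equivalent of the pipeline steps the
course walks through. The dataset and column names are hypothetical, and in
Dataiku each step is a visual recipe rather than code:

    # Hypothetical code equivalent of the course's visual-recipe pipeline.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    df = pd.read_csv("covid_cases.csv")  # hypothetical COVID dataset

    # "Split" recipe: separate training data from data to be scored
    train_df, score_df = train_test_split(df, test_size=0.2, random_state=0)

    # "Group" (aggregate) recipe: total new cases per region
    print(train_df.groupby("region", as_index=False)["new_cases"].sum().head())

    # Train and score a simple model
    features = ["age", "days_since_symptoms"]
    model = LogisticRegression(max_iter=1000)
    model.fit(train_df[features], train_df["hospitalized"])
    score_df = score_df.assign(prediction=model.predict(score_df[features]))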
0 1
Posted by RanjanRelan
Unilever - Designing a Responsible, Self-service Tool for Natural Language
Processing
Name: Ash Tapia (Linda Hoeberigs, Head of Data Science and AI, PDC Lab | CMI
People Data Centre). Title: Data Partnerships & Tools Stack Manager. Country:
United Kingdom. Organization: Unilever. Description: Every day, 2.5 billion
people use a Unilever product to look good, feel good or get more out of life.
Our purpose is to make sustainable living commonplace. We are home to some of
the UK’s best-known brands like Persil, Dove and Marmite, plus some others that
are on their way to becoming household favourites, like Seventh Generation and
Grom. We have always been at the front of media revolutions, whether that be
the 1st print advertisements in the 1890s, or 1955, when we became the 1st
company to advertise on British TV screens. Experimentation and bravery drive
us and have helped us become one of the UK’s most successful consumer goods
companies. Awards Categories: Responsible AI. Challenge: Our Unilever People
Data Centre (PDC)
teams across the globe deal with vast amounts of unstructured text data on a
daily basis to gain insight into our customers, how they engage with our brands
and products, and the needs that we have yet to tap into. The industry is
moving at a rapid pace, which consequently requires rapid generation of
insights to stay on top of the latest trends. The sheer amount of data, and the
skills required to analyse it efficiently, exacerbate this problem. The answers
our marketeers, product research and development, and supply chain specialists
need also require analytics approaches tailored to the business. Analyzing text
data is a complex task and often requires an understanding of complex language
models and Natural Language Processing techniques, which most of our marketeers
do not have. To help with this, our data scientists and software engineers in
PDC have built a range of NLP methodologies and plugins, with the most complex
being the Language Analyser. The Language Analyser uses pre-trained language
models for Part-of-Speech tagging, Named Entity Recognition, and string
matching based on existing entities relevant to Unilever, and visualises a
range of insights in an interactive dashboard in the shape of network graphs,
word clouds and sentiment-scale bubble charts, amongst others. Responsible AI
is one of the fundamentals to ensure our business is responsible, ethical, and
sustainable, and this is key across all business areas. We set out to
understand whether the Language Analyser, one of the plugins most used by
analysts that employs NLP methods, is ethical.
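(To make the plugin's building blocks concrete, the three NLP steps described
above - POS tagging, NER, and entity matching - can be sketched with spaCy's
pre-trained models. This is an illustrative stand-in, not Unilever's actual
plugin code, and the brand list is just an example:

    # Illustrative sketch of the Language Analyser's NLP steps, using spaCy.
    import spacy
    from spacy.matcher import PhraseMatcher

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("I switched to Dove body wash last month and love it.")

    pos_tags = [(t.text, t.pos_) for t in doc]         # Part-of-Speech tagging
    entities = [(e.text, e.label_) for e in doc.ents]  # Named Entity Recognition

    # String matching against known entities (example brand list)
    matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
    matcher.add("BRANDS", [nlp.make_doc(b) for b in ["Dove", "Persil", "Marmite"]])
    brand_hits = [doc[start:end].text for _, start, end in matcher(doc)]

    print(pos_tags, entities, brand_hits, sep="\n")

Each of these steps is exactly the kind of component an ethics audit would need
to inspect individually.)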
Finally, we also set out to understand whether the way it is used within Dataiku
by the analysts is ethical. Solution: To assess how ethical the Language Analyser is
and the way it’s used as part of our Dataiku ecosystem, we engaged with Adriano
Koshiyama, a Research Fellow in Computer Science at University College London
(UCL) and a co-founder of Holistic AI, a start-up focused on auditing and
providing assurance of AI systems. Adriano has been working as a data scientist
for many years across many industries such as Retail, Finance, Recruitment, and
R&D companies. All aspects of the plugin were assessed – its internal
components, the wider environment in which it sits, and the kinds of datasets
analysts pull through the plugin. Since the plugin is available within
Dataiku, we can easily assess how and where people use it. Dataiku and its
collaborative, open environment has enabled full transparency on how the plugin
is used across different research projects. We are able to monitor usage and
assess its applications. Holistic AI assessed our capability on privacy,
fairness, robustness, and explainability using a structured assessment
framework. Thanks to the use of Dataiku, we were able to clearly outline each
step of our development process, as both current and historical versions had
been stored in the flow and using the timeline versioning. We were also able to
share how, when and by whom the plugin was used, thanks to the usage stats
available on a dashboard we created in Dataiku. Furthermore, it was extremely
clear where the data came from thanks to the end-to-end visibility of each flow
in the project. All of this meant that we were able to provide white-box access
levels, and could be judged on each of those dimensions. Impact: The
Language Analyser plugin received the green stamp of approval from the AI
auditing start-up. After a full assessment of one of our most successful
plugins against the fair, responsible, ethical and unbiased criteria, analysts
and data scientists can now be assured that the tool they use as part of their
work fits within our responsible business practices. The plugin passed
the assessment (more details) with flying colors on each dimension, thanks to it
being fully transparent in Dataiku, and was the first capability in all of
Unilever to do so. More widely, we can assure the business that Dataiku supports
our teams in ensuring longevity and continued transparency of our capabilities.
Additionally, we have full control and visibility over what we develop, how we
develop it, and what components we bring together to design a responsible tool.
Combined with sufficient version control, we are able to mitigate risks and know
which areas to pay particular attention to.
1 2
Posted by ash
Last reply 09-01-2021 by Triveni
Unilever - Developing a Scalable Digital Voice of the Consumer Capability
Name: Anand Patel (Digital Voice of the Consumer team). Title: Analytics
Manager. Country: United Kingdom. Organization: Unilever. Description: Every
day, 2.5 billion people use a Unilever product to look good, feel good or get
more out of life. Our purpose is to make sustainable living commonplace. We are
home to some of the UK’s best-known brands like Persil, Dove and Marmite, plus
some others that are on their way to becoming household favourites, like
Seventh Generation and Grom. We have always been at the front of media
revolutions, whether that be the 1st print advertisements in the 1890s, or
1955, when we became the 1st company to advertise on British TV screens.
Experimentation and bravery drive us and have helped us become one of the UK’s
most successful consumer goods companies. Awards Categories: AI Democratization
& Inclusivity. Challenge: As a
consumer obsessed business, Unilever had an ambition to ensure all business
decisions are centered around our consumers. In order to do this, we must listen
to, understand, and adapt to the changing needs and wants of our consumers. Our
challenge was to develop a scalable digital voice of the consumer capability,
that would enable us to interpret large, complex and unstructured consumer
feedback datasets through leading data science techniques and serve relevant
consumer insights to business decision makers for actioning. The solution would
be developed in partnership with Unilever’s Quality function, to enable them to
identify product issues and opportunity areas to improve and innovate based on
consumer comments. The comments would be sourced from social media, product
reviews, and Unilever’s customer engagement centers. We therefore required a
platform that would enable us to build an industrialized AI solution end-to-end,
including data merging, cleansing, manipulation, modeling, and output. The
platform should enable the processing of large volumes of datasets in near
real-time and enable the development and deployment of sophisticated AI-driven
natural language processing models. The AI models will be key to automation and
deriving prescriptive insights from the large unstructured datasets. Models
should also be made available to be used by the rest of the business for other
products or analyses if needed. The developed flow should be repeatable and run
on a cloud setup, to benefit from distributed data storage and
processing. Solution: An industrialized solution called Digital Voice Of the
Consumer (DVOC) was developed to fulfill the ambition. With Dataiku as the
underlying platform, we were able to develop an automated and scaled solution
that is updated daily and democratizes consumer insights to Unilever’s entire
organisation. Through developing multiple flows in Dataiku, we were able to
bring together an array of internal and external consumer feedback datasets and
enrich with additional product data. The datasets were then cleansed and
structured using a combination of Dataiku’s pre-built nodes and Python scripts.
Machine learning-based natural language models were developed using
leading-edge methods, with unsupervised learning used to identify the key
topics and themes consumers were talking about (a generic sketch of this kind
of topic discovery appears at the end of this post). Deep learning algorithms
were also developed for sentiment classification. Through the versatility and
intuitive nature of Dataiku, the flow was developed by data scientists and
analysts with all levels of experience, and provided a great way for more
junior team members to upskill themselves. The developed machine learning
models have been deployed and also made
available to the rest of the Unilever community through the plugins feature on
Dataiku. The overall DVOC flows are complex; however, the Dataiku platform
enables us to visually display the flow and create groupings and scenarios to
cluster related nodes together – making it easier to develop, test, and
diagnose bugs and errors. Impact: Our Digital Voice Of the Consumer solution is
now live in over 60 markets, 50 factories across the globe and available in more
than 20 languages with an active user base of over 2,000 people. The tool is
embedded into the day-to-day operations of the Quality team, and is also widely
used across other business functions including marketing, research &
development, and customer development teams. There have been over 750 business
insights logged from the capability to date with over 500 of these actioned and
closed. DVOC insights have been actioned across Unilever’s business to make a
real impact on consumers. Examples include:
- Helping Unilever develop and launch new products in line with what consumers really want.
- Optimising existing products so that they are better suited for eCommerce, by improving product packaging to reduce leakage and breakages.
- Redesigning products and packaging to make them more sustainable.
- Helping fight counterfeits.
- Tracing issues back to specific Unilever factories and enabling Unilever’s business to be agile during the Covid outbreak by reacting to changes in consumer behaviors.
This has led to tangible
business results: cost savings, increased profits and, ultimately, improved
product quality and better consumer experiences. Individual cases have unlocked
cost savings of up to €350k each.
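As referenced in the Solution section, here is a generic sketch of unsupervised
topic discovery on consumer comments, using scikit-learn. It is illustrative
only (toy comments, arbitrary topic count), not the DVOC implementation:

    # Generic topic-discovery sketch on toy consumer comments.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import NMF

    comments = [
        "bottle leaked in the delivery box",
        "love the new fragrance, lasts all day",
        "pump broke after a week",
        "packaging arrived damaged and leaking",
    ]

    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(comments)

    nmf = NMF(n_components=2, random_state=0)  # 2 latent topics
    nmf.fit(X)

    terms = tfidf.get_feature_names_out()
    for i, topic in enumerate(nmf.components_):
        top = [terms[j] for j in topic.argsort()[-3:][::-1]]
        print(f"topic {i}: {', '.join(top)}")

At production scale the same idea runs over millions of comments, with the
discovered themes surfaced to business users.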
0 1
Posted by anand_patel
ENGIE GEM - Building a Path For All Users to Easily and Securely Gain Insights
From Their Data
Name: Stéphane Raguideau. Title: Digital & Data Accelerator. Country: France.
Organization: ENGIE - Global Energy Management. Description: Global Energy
Management (GEM) is one of ENGIE’s Business Units. At the heart of the energy
value chain, we optimize the Group’s asset portfolio, including electricity,
renewable technologies, natural gas, environmental products and bulk
commodities such as biomass. We also develop our own external commercial
franchise worldwide and rely on four main areas of expertise to offer
tailor-made, innovative and competitive solutions. We provide services in
energy supply & global commodities, energy transition services, risk management
& market access, and asset management. With a staff of 1,400 and offices in 15
countries including 8 main locations, GEM has extended geographical coverage in
Europe, the US and Asia-Pacific. Awards Categories: AI Democratization &
Inclusivity. Challenge: Data science is at the core of our activities at ENGIE -
Global Energy Management. Users across departments manage various sources of
data, including: energy consumption, market data, weather information, deal and
order books, etc. This data is leveraged by the business for many purposes,
including pricing, risk management, data reconciliation from various sources,
reporting, etc. But access to the data was limited due to its sheer volume,
security considerations, and tooling segmentation. In addition, coding skills
were required for accessing it, which excluded many users who did not have a
technical background. Users needed to manually retrieve the data through a
variety of applications, which caused several issues:
- Task repetitiveness, which was very time-consuming - namely, extracting data from the different systems in place.
- Data availability, as not all data sources were referenced, and only IT may have been able to access some of them.
- Operational risk, related to the quality of the data and the manual processing taking place (e.g. mistakes in copy/paste steps).
- Coding skills required to manipulate the data and automate part of the process, e.g. “Visual Basic for Applications” (VBA) in Excel, or Python.
- Tooling that was not fit for the volume of data (in particular, Excel).
Solution: Dataiku enabled us to solve some of these pain
points:
- Data is now easily accessible through a number of plugins created internally, which enable users to easily and securely interact with the different data sources.
- Low-code/no-code data manipulation: visual recipes enable users to prepare and transform the data to fit their needs, without any coding skills required. For more complex operations, the collaborative visual interface enables our IT teams to work hand-in-hand with the business on building and editing workflows.
- Sharing insights from the data is made easy with the dashboarding features.
- Process automation, leading to: shortened time-to-market, now that reporting and analysis are available on demand; increased monitoring capabilities, as monthly and weekly analyses can easily be turned into daily reports; and reduced operational risk, as manual operations are now automated.
Impact: As with every new tool, Dataiku requires specific
onboarding to maximize its benefits. At ENGIE Global Energy Management, our
users have different profiles and backgrounds, hence they are not all familiar
with data manipulation and analysis.  It is therefore important to provide them
with training opportunities, regardless of their division (trading, risk, back
office, finance, IT, etc.). This includes:
- Understanding their needs and identifying a use case to conduct a Proof of Concept (POC).
- Developing the most relevant training with regard to their profile and skills.
- Building the Dataiku plugins and connectors to allow them to easily and securely access the data.
- Hosting regular workshops (at least once per week) on select topics throughout the POC, including partitioning, Python recipes, machine learning, automation, dashboarding, pattern recognition, etc.
This training path is set at two months, after which users are given autonomy
to access the data, manipulate it for their day-to-day needs, and most
importantly, explore new areas to gain more insights from their data. This has
been a key pillar of data democratization within ENGIE - Global Energy
Management. Becoming a data scientist or an engineer doesn’t happen overnight
though, hence we’ve developed a framework to monitor the projects created in
Dataiku and ensure they’re following established governance and best practices,
covering data connections, scenarios, data sharing, partitioning, plugin types,
etc. All users are
therefore able to produce insights safely!
0 8
Posted by s-raguideau
NXP Semiconductors - Reducing Detection Time of Manufacturing Issues with
Real-time Automated Process Control
Team members: Adnan Chowdhury, Manufacturing Quality Engineer; David Meyer
Country: United States
Organization: NXP Semiconductors
Description: NXP (originating from Motorola and Philips) is one of the largest
semiconductor suppliers in the world. Key products span automotive,
communications, infrastructure, mobile, industrial, and smart city/home
solutions. NXP has over 60 years of experience in the industry and has brought
key innovations to the world.
Awards Categories: AI Democratization & Inclusivity, Value at Scale
Challenge: In semiconductor manufacturing, a critical quality and
manufacturability figure of merit is the ability to detect and resolve
manufacturing issues as quickly as possible, i.e. the "Time to Detect" (TTD).
Advanced process control is one of the key contributors that enable factories
to minimize TTD. Reducing TTD is critical because a high TTD means
manufacturing issues are not detected and resolved rapidly, allowing further
production material to be exposed to faulty processing, which incurs material
costs, engineering costs, and delays in meeting customer demand. In this
article, I will compare the current typical process control, based on test
wafer measurements with high TTD, against real-time automated process control
using virtual metrology built with machine learning in Dataiku, which greatly
reduces TTD.
Solution: In this Virtual Metrology solution, inputs consist of various data
sources from the manufacturing production line (e.g. sensor data). We build
machine learning models to generate predictions of the key measurement of
interest, which then feed directly into our Statistical Process Control (SPC)
systems for making process decisions. Some examples of key
measurements of semiconductor components may include the measurement of physical
geometries (depths/angles) and electrical characteristics such as
voltages/currents/resistances. This figure shows a high-level comparison
between the previous method of test wafer metrology for process control and the
new virtual metrology method: we observe that, in the previous method, there is
a delay in detecting issues in the manufacturing line because test measurements
are only taken every 3-4 days. When an issue occurs, it goes undetected until
the next scheduled test measurement. The new method with virtual metrology
provides continuous detection of manufacturing issues by creating virtual
measurements on all materials. As manufacturing issues come up, we are able to
observe their effects through the virtual measurements, which enables the
manufacturing team to take immediate action and contain the problem.
The key sections of the Virtual Metrology solution can be broken down into four
components:
- What data will be used: data sources included (but were not limited to) Fault Detection Control data (i.e. sensor data from the process tool), hardware component information for consumables in the process, material volume data, electrical testing, etc.
- How the data will be used: Feature Selection (identifying the most significant inputs) and Feature Engineering (gaining deeper insights from inputs) are both used heavily to structure the input data before modeling.
- How to model the data: an advanced machine learning model was used for the final predictions.
- How to use the predictions for process control: to develop an end-to-end solution driving production process control, we identified and implemented the following steps (cf. graph): accessing and querying the input manufacturing data in real time; performing all the feature selection and engineering tasks; running the input data through the model to generate predictions; exporting the predictions into Statistical Process Control charts; and identifying suitable control limits with appropriate out-of-control actions (see the sketch below).
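To make the last steps concrete, here is a minimal sketch of how model predictions can feed an SPC check; the file and column names (fdc_history.csv, sensor_*, key_measurement) are hypothetical, and the model and the classic +/- 3 sigma rule are illustrative choices, not NXP's actual implementation.

```python
# Illustrative virtual-metrology-to-SPC sketch; names/data are assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

history = pd.read_csv("fdc_history.csv")   # past sensor data + real measurements
features = [c for c in history.columns if c.startswith("sensor_")]

model = GradientBoostingRegressor().fit(history[features],
                                        history["key_measurement"])

# Control limits from the historical distribution (+/- 3 sigma SPC rule)
mu, sigma = history["key_measurement"].mean(), history["key_measurement"].std()
ucl, lcl = mu + 3 * sigma, mu - 3 * sigma

live = pd.read_csv("fdc_live.csv")         # incoming production sensor data
live["virtual_measurement"] = model.predict(live[features])
out_of_control = live[(live["virtual_measurement"] > ucl) |
                      (live["virtual_measurement"] < lcl)]
print(out_of_control)  # candidates for immediate containment actions
```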
Results: We determined the effectiveness of the model by focusing on the
following metrics: the explainability of variation between inputs and outputs,
and the error between actual and predicted results. The graph below shows the
target (actual) vs. predicted values of the critical parameter of interest over
a randomly sampled span of time. It can be observed that the predictions
closely match the actual measurement values.
Impact: Key advantages of using virtual metrology for process control include:
- Significantly reduced TTD of manufacturing issues without significant financial investment from the factory, i.e. such analytics/ML solutions simply use existing data sources.
- Reduction/elimination of the material and engineering cost of test wafer measurements.
- Increased production volume through minimized tooling downtime, where we previously waited for test wafer measurements.
- Faster root cause problem solving by using features like variable importance from the models.
Through the deployment of the end-to-end solution, we estimate savings in the
millions of dollars based on the material and engineering costs associated with
avoidable manufacturing excursions, now that Virtual Metrology is in place.
2 9
Posted by Adnan325
Last reply 08-06-2021 by BasB
InfoCepts - An End-to-end Data Workflow to Conduct Clinical Research at Scale
Team members: Nilesh Lahoti, Anil Kumar M.S., Mohit A. Jichkar, Ananth Kumar Chamarthi
Country: United States
Organization: InfoCepts
Description: InfoCepts, a global leader in end-to-end data and analytics,
enables customers to become data-driven and stay modern. We bring people,
process, and technology together the InfoCepts way to deliver predictable
outcomes with guaranteed ROI. Working in partnership with you, we help
businesses modernize data platforms, advance data-driven capabilities, build
augmented business applications, create data products, and support systems.
Founded in 2004, InfoCepts is headquartered in Tysons Corner, VA, with offices
throughout North America, Europe, and Asia. Every day, more than 160,000 users
use solutions powered by InfoCepts to make smarter decisions and help
businesses achieve better outcomes. For more information, please visit
www.infocepts.com or follow @InfoCepts on Twitter.
Awards Categories: Value at Scale, Excellence in Research
Challenge: The client is a
leading pharma company that wanted to analyze the market and decide where to
invest in drug research, in order to avoid risks, save time, and remain
profitable in the near future. The lack of both qualitative and quantitative
data at the client's hand was a big concern for correctly analyzing and
understanding the present market to plan and organize future business. To
develop any predictive model, or to draw business insights using machine
learning algorithms, it is very important to have real, quality data describing
the health symptoms that users are experiencing. The purpose of this research
project was to collect real-time data from the end users, store the collected
data, perform analytics, and build a predictive model on top of it. The
challenges involved with the previous approach are summarized below:
- Collecting data from individuals was not easy, as no one wants to share their identity while disclosing health information, hence the need to anonymize the personal identity of the user.
- Manual data collection was a tedious process that involved sending an email and getting the details back in an Excel spreadsheet.
- Lack of a central storage mechanism and process to save and update the data regularly.
- Extensive coding was involved to prepare, clean, and aggregate the collected data before it could be analyzed.
- Heavy reliance on a third-party application to perform analytics and build predictive models.
- High cost involved in purchasing data from third-party sources to perform analytics.
- Need to integrate the end reports derived from multiple tools to create a single view of insights.
- Reliance on a custom web graphical user interface, a standalone app running on the server, and user BI reporting tools.
- Time-consuming data integration and pipeline orchestration across multiple technologies and scripts.
Solution: To meet the above objective, our team built a web-based user interface
survey form to collect data, created a storage mechanism to store temporary as
well as permanent data, and a processing engine that can run the advanced
analytics based on the existing and newly collected data. In addition, we built
a business intelligence dashboard to visualize insights, plots, and analytics,
along with predictions derived from user-given inputs, back to the end user.
This dashboard was presented as an output to users to explain their current and
predicted future health condition. The following steps summarize the activities
carried out to solve the business case:
1. Real-time data collection: Dataiku's web application capability was
leveraged to create a survey form for the end users. Our team used the RShiny
templates with Dataiku, which made it simpler and faster to create the form.
The web app was made public (within the intranet) to be accessible to all users
in the organization. Apart from user input in forms, data was also fetched from
internet sources like Google Trends to augment the data science models, and
Dataiku time-based scenarios were used to automate the collection of the latest
trends.
2. Data preparation: A mix of visual and code-based recipes in Dataiku was used
to perform the data cleaning and preprocessing activities.
3. Model development: The
following models were developed using Python and RStudio within Dataiku:
- Disease prediction: a classification model to predict the disease condition of the end user, indicating whether the user is disease-free or has been impacted by the disease.
- Survival analysis model: predicts the expected age of onset of the disease condition under different given medical conditions.
- Sales forecasting: a predictive model that makes sales predictions based on user-given inputs.
4. Automation and end-user reports: Real-time predictions and analytics are
presented to the end user via an RShiny web app, based on the inputs provided.
The output includes the predicted disease condition, a survival analysis graph
showing at what age the disease is expected under different given medical
conditions, a segmentation showing similar medical symptoms across different
age groups, etc. Python-based models were invoked from the RShiny web app using
the APIs provided by Dataiku (a sketch of such a call is shown below). The
entire workflow (screenshot 1.3) was seamlessly automated using a mix of
scenario-based triggers and API-based calls from the web apps (screenshot 1.4).
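As an illustration of invoking a deployed model through Dataiku's API node, here is a minimal sketch using the dataikuapi Python client; the URL, service and endpoint IDs, and feature names are hypothetical, and the exact response shape can vary by Dataiku version.

```python
# Hedged sketch: score one survey response against a Dataiku API node endpoint.
import dataikuapi

client = dataikuapi.APINodeClient("https://apinode.example.com",
                                  "clinical_research")  # assumed service ID
record = {"age": 54, "symptom_score": 7, "smoker": "yes"}  # hypothetical inputs
result = client.predict_record("disease_prediction", record)
print(result["result"]["prediction"])  # e.g. "disease" vs. "disease-free"
```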
Impact:
1. Cost savings: The solution enabled $300k of cost savings from optimized
infrastructure, improved process orchestration, and the avoidance of
third-party data purchases.
2. Time savings: The solution saved 50% of the effort involved in the earlier
manual process.
3. Bridging the gap between technology and business: The business users were
closely involved in the iterative development, review, and continuous research.
The visual recipes in Dataiku enabled business stakeholders to understand the
technology and the general challenges in the process very well. This increased
adoption by 2x.
4. Real-time ingestion and analytics: Processing time was saved in data
collection and integration from the end users - as soon as a user fills in the
form, the rest of the data preprocessing and analytics is automated within
Dataiku itself.
5. Opportunities for innovation: Real-time data collection opened additional
avenues to better understand current pharmaceutical market conditions.
6. Improved decision-making: Central access across all departments helped users
make data-driven decisions based on current market conditions, avoiding risks
and improving profitability.
1 11
Posted by keogabriel
Last reply 07-22-2021 by Uday
Hospital de Clínicas de Porto Alegre - Streamlining Data Workflows for Clinical
Research
Name: Tiago Andres Vaz
Title: Head of A.I. (From Research-to-Production) | IT Advisor in Healthcare
Country: Brazil
Organization: Hospital de Clínicas de Porto Alegre
Description: Hospital de Clínicas de Porto Alegre is a large teaching hospital
located in Porto Alegre, Brazil. Affiliated with the Federal University of Rio
Grande do Sul, it was inaugurated in 1970, gradually becoming a reference for
the state of Rio Grande do Sul and southern Brazil. It provides care in about
60 specialties, from the simplest procedures to the most complex, with priority
for patients of Brazil's Unified Health System (SUS).
Awards Categories: Excellence in Research
Challenge: Hospital de Clínicas de Porto Alegre is a
general, public and tertiary health care institution partnering with the
medical, nursing, pharmacy and dental schools of the public university UFRGS, in
Porto Alegre, Brazil. We develop our own Electronic Health Record called AGHUse,
which is open source and the most adopted university hospital information system
in Brazil. We faced multiple challenges:
- Data acquisition and preparation were time-consuming, involved lots of transformations, and degraded data quality. Large amounts of data took hours to open and process for simple modifications, and querying such complex databases usually required more than one system analyst plus business experts.
- The need to query each dataset multiple times to understand the information, and to create manual pipelines for machine learning, made for complex and confusing processes, without a clear graphical explanation of what was going on.
- Each modification in methods or statistical analyses involved creating a new branch in one centralized repository just for syntax and code versioning, and data was re-generated each time we had to roll back, sometimes making reproduction impossible even in our own laboratory.
- Comments on our data were managed in docs without any link or meaningful integration to our code or data.
- We faced limits on the number of columns in traditional relational databases, and switching databases was an almost forbidden process. Switching machine learning pipelines from R to Python, and vice versa, was equally difficult.
- And lately, before Dataiku, we were starting to feel pain points around broader data governance: tracing access to data, defining user profiles, and logging every aspect of our research projects involving data.
Solution: We started using Dataiku
after a Tableau representative sent me a comparison between Dataiku and
Databricks. We analyzed both platforms, comparing the features that matter to
us, and I remember the moment our research team voted unanimously to start our
research with Dataiku. We then sent a message to the company's Academic and
Education relations team and, after a fast response, received a donated license
and installed Dataiku on premises. After a few configuration and installation
steps, accomplished with almost no need for support from the IT department, we
went through the following steps (a de-identification sketch is shown below):
- Statistical description of all our incoming data
- De-identification
- Cleaning and formatting
- Interpretation and curation strategy
- Definition of roles and task planning
- Notes and code standardization
- Graphical pipeline definition
- Pipeline execution
- Statistical description of processed data
- Machine learning modeling
- Parameter tuning
- Artificial intelligence deployment
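As an illustration of the de-identification step only, here is a minimal sketch under assumed column names (patient_id, name, address) and an assumed salt-handling scheme; it is not the hospital's actual procedure.

```python
# Minimal de-identification sketch: pseudonymize the record key, then drop
# direct identifiers. Column names and salt handling are assumptions.
import hashlib
import pandas as pd

SALT = "store-me-outside-the-dataset"  # assumption: secret kept separately

def pseudonymize(patient_id) -> str:
    # Salted hash keeps records linkable across tables without exposing IDs
    return hashlib.sha256((SALT + str(patient_id)).encode()).hexdigest()[:16]

records = pd.read_csv("ehr_extract.csv")  # hypothetical EHR extract
records["patient_key"] = records["patient_id"].map(pseudonymize)
records = records.drop(columns=["patient_id", "name", "address"])
```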
In our journey, we learned that innovative tools like Dataiku will revamp
clinical research, and so there is a need to formally define ontologies and new
methods for healthcare research using large hospital datasets. This is the
motivation of our current work!
Impact: The time
saved using Dataiku is remarkable. I am teaching at a medical school where
students are using Dataiku as a substitute for MS Excel and Power BI, and they
are leading the process. I think this is due to the innovative "first layer"
that the Dataiku interface puts on top of old spreadsheet concepts, used
alongside solid backend processing (auditable and automatic), which enables us
to move to a DataOps culture.
0 2
Posted by tiagoandresvaz
One Acre Fund - Scaling Data Science Insights to Better Serve Smallholder
Farmers
Name: Emiel Veersma
Title: Data Scientist
Country: Rwanda
Organization: One Acre Fund
Description: One Acre Fund is a nonprofit organization that supplies
smallholder farmers in East Africa with asset-based financing and agriculture
training services to reduce hunger and poverty. Headquartered in Kakamega,
Kenya, the organization works with farmers in rural villages throughout Kenya,
Rwanda, Burundi, Tanzania, Uganda, Malawi, Nigeria, Zambia, Ethiopia, and India.
One Acre Fund actively serves more than 1 million farmer families. One Acre
Fund offers smallholder farmers an asset-based loan that includes: 1)
distribution of seeds and fertilizer; 2) financing for farm inputs; 3) training
on agriculture techniques; and 4) market facilitation to maximize profits. Each
service bundle is around US$80 in value and includes crop insurance to mitigate
the risks of drought and disease. To receive the One Acre Fund loan and
training, farmers must join a village group that is supported by a local One
Acre Fund field officer. Field officers meet regularly with the farmer groups
to coordinate delivery of farm inputs, administer trainings, and collect
repayments. One Acre Fund offers a flexible repayment system: farmers may pay
back their loans in any increment at any time during the growing season. Beyond
their core program model, One Acre Fund also offers smallholder farmers
opportunities to purchase additional products and services on credit. These
include solar lights and reusable sanitary pads.
Awards Categories: Organizational Transformation, Data Science for Good, AI
Democratization & Inclusivity, Value at Scale
Challenge: Operationalizing data science projects
The biggest challenge we faced at One Acre Fund was operationalizing our data
science projects. Over the years, many clever data scientists came and went at
our organization. They conducted impressive analyses, but the results were soon
outdated and forgotten once they left. There was not one root problem that
caused this; rather, we faced several distinct challenges.
1. Coding makes reusability more difficult: The first challenge was that the
data scientists were doing everything with code. It's hard to take over
someone's project when the data, the model, and the steps are not visible. When
the timelines of the data scientists did not overlap, taking over a project
would be so challenging that the new data scientist would just start over
again.
2. One-time insights through local computing: Furthermore, our data scientists
were not used to working with servers, so the code would run locally on their
computers. If you run code locally, you can't interact with "live" systems,
pushing data science to the back of the organization. Results were not taken
into production, but just used as one-time insights.
3. No shared infrastructure for accessing data: Our final challenge was that we
didn't have an infrastructure set up to share our data. We were not used to
interacting with databases, and thus our data would reside on our computers.
When a project was finished, the deliverable was the report, but it would be
hard to reproduce.
Solution: Since Dataiku is a full stack data
science platform, it helped us in so many ways:
1. Automation to facilitate workflow maintenance: Initially we were looking for
a solution where we could schedule and run our Python and R code. We wanted it
to integrate with Git and run code in isolated environments. When we tried out
Dataiku, we set up a project to predict client repayments. We had analysed this
before, but it was a complex process that took a lot of effort to maintain.
With Dataiku, we could easily run our code, connect with our data warehouse,
and schedule the flow.
2. Optimized modeling thanks to model competition: Dataiku enabled us to try
out different types of models and investigate the data. These features helped
us more than we expected.
3. Visual interface to democratize data insights: In the next projects, we
worked with less tech-savvy colleagues. Dataiku helped them use the
click-and-play functionality to build complex ETL processes and store the data
in the database. This helps the organization democratize our data analysis and
store the data in a central place. Because of the visual nature of the flows,
we can easily work together and discuss the challenges we face during a
project. Seeing the datasets halfway through the flow allows us to easily
understand what is going on in the data and share it with different
stakeholders. Visualizing the steps of a process clearly reduces mistakes, and
it is something we couldn't work without anymore.
Impact:
1. Scaling our data science initiatives: Currently, we have
created 70+ projects, of which 25 are in production. We maintain more than 1,000
datasets on 33 connections. We’re working with a small team and this would not
have been possible without a platform such as Dataiku. And although our team is
small, more and more colleagues are working on Dataiku and are able to perform
their own advanced data science. We have 25 active users on Dataiku who work
together on the platform on a daily basis, and this number is growing rapidly.
Before, we wouldn’t be able to work together with such a big group. 2. Faster
user onboarding & enablement Dataiku saves us a lot of time. Last week, we
introduced a Rwandan data analyst to the platform. We reproduced a project he
had been working on and we were able to take it to production within an hour.
This meant that he didn’t have to manually download the dataset anymore, thanks
to the visual recipes he could run his code on the database and he could easily
investigate his intermediate steps. Before Dataiku, the project took him 5 days
to build and run. 3. Upskilling our team & stakeholders It also allows us to use
techniques which weren’t accessible to our data scientists before. For example,
our farmers can now talk to a chatbot, to receive information about the weather.
This chatbot talks to a Dataiku API endpoint, which accesses our stored
forecasts. Without Dataiku, our data scientists wouldn’t be able to set up an
API by themselves. The same can be said for the scheduling and the deployment of
code to a server. Overall Dataiku really helps us to become a data-driven
organization and we couldn’t work without it anymore.
0 5
Posted by Emiel_Veersma
Atlantic Plant Maintenance - Bringing Workers Home Safe Through Defect Detection
Name: Aaron Crouch
Title: Data Analytics Manager
Country: United States
Organization: Atlantic Plant Maintenance
Description: We specialize in the repair and maintenance of power plant
equipment. We mostly work with GE coal, steam, nuclear, and gas turbines and
boilers. It is difficult and dangerous work performed by skilled union labor,
often in the elements. Very few laborers work directly for APM on a permanent
basis; most are hired out of union halls as needed. Since our work requires
taking generators offline, most of it is done in spring and fall, when power
demand is lowest. Our labor pool has many opportunities outside of APM,
especially in the spring and fall outage seasons, so it is imperative that our
union employees WANT to work for APM. Part of building that loyalty is getting
our labor home safely.
Awards Categories: Organizational Transformation, Data Science for Good
Challenge: The main issue
which drove us to Dataiku is the moral imperative to send our workers home safe.
Beyond that, there is a financial and business impact for safety too. If our
workers go home hurt, they won’t want to come back and work with us again. Also,
the industry, insurance companies, and government regulators track our Injury
and Illness (or I&I) rate, a formula that takes the number of our most serious
injuries, multiplies it by 200,000, and divides by the number of hours we work.
This represents the number of full-time employees per one hundred that will
experience a recordable injury in a year (see the sketch below).
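For concreteness, here is a minimal sketch of the rate exactly as described; the 200,000 constant corresponds to 100 full-time employees working 40 hours a week, 50 weeks a year, and the example numbers are hypothetical.

```python
# I&I rate as described above: injuries x 200,000 / hours worked,
# i.e. recordable injuries per 100 full-time employees per year.
def incident_rate(recordable_injuries: int, hours_worked: float) -> float:
    return recordable_injuries * 200_000 / hours_worked

print(incident_rate(3, 250_000))  # -> 2.4 (hypothetical numbers)
```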
Many plants require contractors to have an I&I rate below a certain threshold
to work on site, and our I&I rate was close to getting us blocked from certain
customers. So we wanted to combat safety issues, but injuries seemed so random
that safety professionals and leadership could only react instead of being
proactive. We needed to flag problem jobs, send professionals to problem sites,
and train problem employees. It was decided that the data we had could help
flag these problem jobs and possibly prevent an incident before it happened.
Solution: We
used Dataiku to combine all the inputs we had on jobs: defects by job site
history, superintendent defect history, employee ratings, lines of business, job
duration, headcount, turbine type, work scope… Dataiku was able to combine all
of these variables and compare them to the past jobs that had safety or quality
incidents and calculate the likelihood of a defect on upcoming jobs. Dataiku's
ability to show which variables cause an individual job to be flagged as high
risk has allowed management to reduce risk: by identifying the most impactful
variables (where possible) and making changes on the front end accordingly,
there are fewer high-risk jobs to begin with. Dataiku's ability to flag which
metrics matter most to the model as a whole has allowed our field personnel to
see how the model works, and even to suggest other metrics the data team had
not considered, increasing field buy-in. We put in place mitigation strategies,
such as required twice-weekly hazard hunts and leadership site audits, to
reduce the likelihood of an injury or quality defect. Dataiku has also enabled
us to measure how effective our mitigation strategies are after a job has
occurred, by looking at the percentage of high-risk jobs that used our
mitigation strategies and comparing whether those jobs had a safety or quality
defect. Impact: The results are dramatic. In
2018, before we launched the high risk jobs program, close to 26% of our jobs
had some sort of safety or quality defect. That number has declined steadily to
less than 11% in YTD 2021. In 2018, over 86% of the jobs that would have been
flagged high risk had defects; in 2021 YTD, only 68% of high-risk jobs had a
defect. Using the statistical tools in Dataiku, we were able to see that
high-risk jobs where a leadership site audit was performed had a 77% chance of
having a defect, while high-risk jobs without a leadership site audit had an
83% chance, with a statistically significant p-value of .046 (a sketch of such
a comparison appears below).
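One way such a comparison can be run is a two-proportion test; the counts below are hypothetical placeholders, not APM's actual job numbers, and the library choice is illustrative rather than what Dataiku used internally.

```python
# Hypothetical two-proportion z-test: do audited high-risk jobs have a lower
# defect rate than unaudited ones? Counts are placeholders, not APM data.
from statsmodels.stats.proportion import proportions_ztest

defects = [77, 83]    # jobs with a defect: audited vs. not audited
jobs = [100, 100]     # high-risk jobs in each group
stat, p_value = proportions_ztest(defects, jobs, alternative="smaller")
print(p_value)        # compare against the 0.05 significance threshold
```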
We obviously can't point to a specific injury or quality event that did not
occur, but we can assume from the data that workers who otherwise would have
been hurt made it home safely. This keeps our employees safe and our customers
happy, and it reduces insurance rates.
0 1
Posted by AaronCrouch
Westpac – Driving Organizational Change with Collaborative, Self-service Data
Science
Team members: Cameron Wasilewsky, Discovery Success Lead, with: Malcolm
Wanstall, Daniel Mccarthy, Leonardo Silva, Upul Senanayake, Astria Bothello,
Nicholas Lillywhite, Kelly Tsang, Collin Zheng, Victoria Zheng, Vincent Chen,
Dylan Dowling, Rupa Dutta, Di Gao, Ryan Hopson, Jun (Blake) Im, Tim Spencer
Country: Australia
Organization: Westpac
Description: Westpac is Australia's oldest bank and company, one of four major
banking organisations in Australia and one of the largest banks in New Zealand.
We provide a broad range of banking and financial services in these markets,
including consumer, business and institutional banking and wealth management
services. We also have offices in key financial centres around the world
including London, New York, and Singapore. Westpac Group's purpose is Helping
Australians Succeed. It's what we do, who we are and why we come to work every
day. What's most important to us is understanding what success means to our
customers and helping them get there.
Awards Categories: Organizational Transformation, AI Democratization &
Inclusivity
Challenge: As Australia's oldest bank, and one of the major banking
organizations in the country, Westpac encountered the typical data science
challenges of companies in the financial services industry:
1. Complexity in processes: Being a large, highly regulated organisation, it
can sometimes be a complex process to get approval on a business idea and bring
it into production quickly. We set out to tackle this issue to improve
innovation and the ability to solve complex business problems. We not only had
to provide a pathway to move forward with more velocity, but also to make it an
understandable pathway, through dedicated governance and documentation, fitting
within the industry's security and regulatory framework.
2. Change in ways of working: Data practitioners at Westpac are comfortable
with SQL, which shaped a certain understanding that data comes in tables and
products are like reports. It is a hurdle to work through, given that they are
comfortable with this way of working, and the key is not only to show them new
possibilities, but also to communicate the value they will gain. Learning new
tools and changing ways of working is extremely difficult, hence we had to make
them see that we could build much bigger products and services, that data can
come in any shape or form, and expand the very idea of what data can do -
instead of it just being an answer to a current question.
3. Technological change: The former tool structure was very simple, but we
needed new tools to achieve bigger outcomes, which added complexity. But,
relating to the challenges previously highlighted, people are naturally
resistant to change and renewing the structure is a very cumbersome process -
hence the need for a central data platform to bring everyone together on this
journey.
Solution: Dataiku played a
key role in this transformation, in several ways:
1. New organizational structure leveraging Dataiku as the central data
platform: As we implemented Dataiku in September 2020, we created the Discovery
Lab which, with only 5 full-time and 5 part-time team members, supports 40
business labs and 120+ employees in their data science endeavors. The Discovery
Lab is structured into two teams: the Success Team focuses on business
engagement, use case prioritization, onboarding and orientation in Dataiku, and
supporting the delivery of data initiatives with the different labs; the
Platform Team oversees building the data applications, as well as developing
the capabilities to integrate new technologies within the security and
regulatory framework of the organization. Embodied through using Dataiku as the
central data science platform, this new setup helped break down the barriers
between business and tech to seamlessly work toward building cutting-edge data
products and services.
2. New
collaborative, self-service operating model to upskill and drive a closer
alignment between the business and tech teams: We also developed a new
operating model and new processes to ensure a strong alignment between the Lab
and the business teams, while enabling them to broaden their understanding and
gain new data skills throughout the project. When a new team member joins, we
plan an orientation session so that they understand how the team can leverage
Dataiku. Then they go through the 'Discovery Suitability Assessment' to further
define their use case, the objectives and expected impact, as well as how they
visualize the outputs and outcomes of the initiative. This is key to
understanding the potential value that can be delivered to our customers
through the use case. It enables us to assess where we can help, whether it
aligns with our data strategy, and the potential resourcing gaps to cover in
order to carry it through. Adopting this high-level, end-to-end project view
also enables us to identify the needs of the business in terms of data
literacy, wrangling, and visualization, and to train them to develop these new
skills. With this practical approach, they immediately see the benefits they
will gain in their day-to-day job. Dataiku's visual interface also acts as an
enabler, since they understand the whole data workflow while being able to
learn more and dig deeper into the more technical work in one click.
3. Documentation & reporting made easy to
efficiently carry out new data initiatives through production: Our main best
practice is to document every request and action performed throughout the
project for public knowledge - which not only helps with alignment and
upskilling, but has also enabled us to carry out more and bigger undertakings.
After the 'Discovery Suitability Assessment', the overall objectives, key
metrics, and outputs are entered in a public Confluence page (example below),
as well as the main developments and any issues or bugs encountered. Conducting
each project in this transparent manner helps set the right expectations for
what the Lab can achieve - being clear on what we're capable of doing, what
we're working toward, as well as what we're building and when it will be
delivered. It's critical to build trust with our business counterparts.
Everyone can add and edit, and it's everyone's responsibility to keep it
updated so that we move forward quickly and comfortably, while anticipating and
mitigating any potential risks. Besides the Confluence page, all requests and
actions are logged in Jira so that stakeholders are aligned on progress.
Dataiku's tracking and monitoring capabilities are a critical piece of this
equation. The built-in functionalities accurately reflect the work being done
by the team and all actions performed on the project, in a visual and easily
accessible manner. We have also created dedicated tracking projects to automate
reporting and gather high-level risk metrics on the projects conducted by all
our 40 labs (examples can be seen below) - this saves us precious time on
formerly tedious reporting tasks and enables our team to focus on the bigger,
more interesting undertakings that will transform the organization.
[Figures: Dataiku dashboard tracking the number of weekly active users; Dataiku
dashboard tracking our tickets and time to address them]
We have also implemented a range of continued support for our labs, in the form
of weekly drop-in meetings, 1-on-1 sessions, and Westpac-specific video
training material, as well as continued engagement through announcements and
our recent showcase competition.
Impact: This new
organizational structure implemented with Dataiku has brought tremendous
benefits to drive data transformation within the organization. The most striking
aspect is the change in perspective, which is critical to enable change. The new
collaborative, self-service model enables the tech team to serve the business,
help them upskill, and work together to drive innovation at scale. It is
altogether a different attitude to a lot of the former processes that were in
place! Breaking down the barrier between tech and the business has been our
biggest success with Dataiku. Now we are able to tackle bigger data projects and
get them in production state in a much quicker, yet realistic, time frame -
whereas many used to be stuck in ideation due to misalignment and tooling
segmentation. As the central data platform, Dataiku enables us to gather all
stakeholders around the table, regardless of their profile and data skills –
which brings about a diverse mix of perspectives to the project, helps everyone
upskill, and eventually leads to developing more innovative and transformative
data products and services. This structure and platform have enabled us to bring
our 100+ users and 300+ analytics community members closer to the data, ensuring
they are able to ideate and develop data-driven insights with the appropriate
governance, support, and mindset.
1 4
Posted by camwasi
Last reply 07-13-2021 by JamesO
The Ocean Cleanup – Empowering Citizen Data Scientists Across the Organization
Name: Bruno Sainte-Rose
Title: Lead Computational Modeler
Country: Netherlands
Organization: The Ocean Cleanup, Stichting
Description: Every year, millions of tons of plastic enter the oceans,
primarily from rivers. The plastic afloat across the oceans - legacy plastic -
isn't going away by itself. Therefore, solving ocean plastic pollution requires
a combination of stemming the inflow and cleaning up what has already
accumulated. The Ocean Cleanup, a non-profit organization, designs and develops
advanced technologies to rid the world's oceans of plastic by means of ocean
cleanup systems and river interception solutions.
Awards Categories: Data Science for Good, AI Democratization & Inclusivity
Challenge: At The Ocean Cleanup, data
science needs to be applied not only to develop the technical solution to rid
the oceans of plastic, but also to maximize opportunities for funding and
sustaining the broader organization. We started using Dataiku in 2018, as we
were facing numerous data science challenges:
1. Data pipelines management: We initially lacked an adequate tool to manage
data processing pipelines that would allow for ad-hoc data updating and
processing with optimal computing time. Some of the data we were manipulating
was faulty (in part because satellite transmissions shorten messages), and we
were missing a tool to scan quickly through the data in order to work out the
right approach to correct it. We were also missing a tool to automate the
updating of our pipeline, especially accounting for specific triggers, while
allowing for dashboarding and reporting options.
2. Handling different formats and types of data: The data we manipulate can be
structured or unstructured and comes in various formats from different
providers. As a consequence, being able to handle single .csv files and
databases at the same time, retrieving the data from built-in and/or
provider-specific APIs, was compulsory. Along with the diversity in form of the
data we manipulate, we also manipulate data of different natures (scientific
measurements, geospatial information, natural language, financial data, etc.),
which also calls for a versatile data science processing solution.
3. Lack of a centralized platform: All the AI/machine learning frameworks we
came across were not very user-friendly and required too much expertise to be
promoted internally. Finally, we were looking for a collaborative data science
platform allowing for multiple users with specific roles/rights/access.
Solution:
Thanks to Dataiku, we were able to address a large part of the aforementioned
challenges. Through the Ikigai program, we were given access both to Dataiku as
a platform at a company-wide level and to Dataiku staff and expert knowledge to
support the implementation of our data science projects.
1. Empower people across the organization to gain insights: Having access to
Dataiku allowed us to ramp up our data science analyses. The user-oriented,
code-minimalistic approach provided by the Dataiku pipeline was a game changer
for both our data pre-processing and post-processing steps. The extensiveness
of built-in operations to manipulate and prepare the data made it possible for
less programming-savvy staff to perform their usually very time-consuming
operations; they felt like they were using a steroid-powered Excel! The
built-in Git version control and the logging of each individual operation allow
for a readable and sustainable project approach. The collaborative environment
and the overall user experience allowed for company-wide adoption. As part of
the Ikigai program for nonprofits, Dataiku provided support to enable users on
the platform through trainings, project co-development, and support. Dataiku
staff also contributed directly to developing some of the most innovative
projects, including Emilie Stojanowski, Matthieu Scordia, Paul Hervet, and
Jacqueline Kuo, among others.
2. Leverage & reuse data pipelines and features to save time: Regarding the
product itself, Dataiku helped us better manage our data pipelines, so as to
track what has been done and leverage accomplishments for future projects. We
first started using Dataiku for the testing of our barriers in November 2018;
less than a year later, we easily replicated the same data workflow for a new
test campaign - leveraging these new efficiencies to spend more time developing
features. In November 2020, during a campaign in the North Sea, our engineers
only needed a quick Dataiku training to be able to reuse the previous data
pipelines and features, so as to focus their time where they can add the most
value.
3. Adapt to different data (in format & type) and use cases as we expand:
Thanks to its great versatility, Dataiku enabled us to connect to many
different data formats, APIs, plugins, etc. This is paramount as we also handle
data of different natures, and the platform capabilities are key to adapting.
For instance, visual pre-treatment features allow us to identify when satellite
data is cut off, without needing to complete the entire preparation process,
and to filter this out - which saves much-needed time and resources for our
nonprofit organization (a sketch of such a filter is shown below).
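As an illustration only, here is a minimal pandas sketch of flagging truncated satellite messages; the file name, column names, and expected payload length are all hypothetical assumptions, not The Ocean Cleanup's actual pipeline.

```python
# Hedged sketch: drop satellite messages that arrived truncated.
import pandas as pd

EXPECTED_LEN = 64  # assumption: length of a complete transmission payload

msgs = pd.read_csv("satellite_messages.csv")        # hypothetical dataset
truncated = msgs["payload"].str.len() < EXPECTED_LEN
print(f"dropping {truncated.sum()} truncated messages")
clean = msgs[~truncated]
```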
Impact: As a non-profit organization, our main Key Performance Indicator is the
quality/time ratio of the tools we use. In other words, our biggest objective
is to have the most reliable yet versatile data science platform to efficiently
conduct data projects and create the biggest impact across the organization.
Dataiku enabled us to dramatically improve this KPI through different levers:
1. Improved operational efficiencies
to focus resources on innovation: Before, we had to extract data from SQL
databases, aggregate it, build interpolations, etc., with a combination of
Excel and MATLAB. Dataiku enabled us to centralize the whole workflow, while
all practitioners are able to work with the technologies they're used to -
making us move faster and go further into the most innovative parts.
2. Gather everyone on the same platform for quicker decision-making: Thanks to
the visual interface, Dataiku enables both technical and non-technical
stakeholders to understand the data workflow and the success metrics of the
projects developed. This enables us to make quicker decisions, for instance
regarding the success of specific campaigns, and to make adjustments on the go
to meet our goals.
3. Easy onboarding to bring in more people to better fit projects' needs: The
user-friendly interface makes it easy to onboard new people onto the platform.
The learning resources, as well as the catalog of events and content, give
people a vast perspective on data science topics. Our core data science team is
aided by five times more people across the organization who have been given
access on a temporary basis to bring their expertise to various projects.
4. Visual "recipes" enable everyone to bring in their skills and shorten the
time-to-insight: The visual features for data wrangling and visualization
enable everyone to contribute their skills to successfully conduct data
projects. Even the project managers and director are able to draw insights from
the data at hand.
5. Democratize data science through a versatile, all-in-one platform: Building
upon all the levers described above, Dataiku's biggest impact lies in the
democratization of data science across the organization. From the original
technical testing project, we've expanded usage to finance (understanding
fundraising dynamics to increase the impact of our campaigns) and
communications (optimizing social media content and timing to maximize
visibility, and therefore improve fundraising abilities). Through bringing
everyone together on the same platform and the rise of "citizen data science",
Dataiku enabled us to embed data science across the organization to create more
value toward fulfilling our mission.
0 2
Posted by BrunoTOC
HES-SO - Teaching the Next Generation of Chief Data Officers with Dataiku
Team members: Cédric Gaspoz, Professor UAS; Dominique Genoud, Professor UAS
Country: Switzerland
Organization: University of Applied Sciences and Arts Western Switzerland
(HES-SO)
Description: HES-SO is a network of 28 schools of higher education offering
degree programmes in six key fields to some 21,000 students. Our universities
play a key role in the social, economic and cultural development of each of
western Switzerland's seven cantons. The Master of Science in Business
Administration (MSc BA) gives students the opportunity to develop the
understanding of management they acquired during their Bachelor's course and
specialise in a fast-growing area of competence.
Awards Categories: Excellence in Teaching
Challenge: To accompany our students and their future employers in the
digital transition, we have thoroughly revised our business intelligence
courses. Our students must not only be able to analyze data, but also to become
information producers with all the steps that this includes. During the three BI
courses of the master, we start by refreshing the knowledge of R, before
starting the discovery of data science that leads us from data acquisition to
Deep Learning. It was possible to introduce the students to the different types
of training and evaluation of the models by using the available metrics and data
splitting that are usually used in machine learning. The built-in graphical
explanations about the results greatly facilitated the understanding of the
tuning of the models and their understanding. Another challenge we wanted to
address with this redesign was the production stage. Often, curricula stop at
the learning of models and their evaluation. However, from a business point of
view, it is only when the models are deployed that we start to create value. It
was therefore important to be able to concretely see how to use the models to
support business processes. When redesigning these courses we faced several
challenges: Multiple tools implemented depending on the languages (R or Python)
Tools dedicated to only one part of the workflow (data cleaning, machine
learning...) Lack of tools for the release of models in production Lack of
understanding of the metrics used to check the quality of the models in
production No possibility of collaboration between students on the same project
Feedbacks and corrections take a lot of time (file transfer between students and
teachers) IT support for multiples tools The lack of integration of the tools
also prevented us to successfully proposing integrative pedagogical scenarios
because it was difficult to actively collaborate with several people on the same
task. Solution: During our review of different tools, we had the opportunity to
test Dataiku. The ability to support all phases of the lifecycle as well as the
integration of notebooks convinced us to pursue the discussions with the Dataiku
academic team. The most important weakness was the absence of the API services
in the academic offer, which Dataiku finally integrated into its offering. Our
Dataiku instance has been deployed in our global infrastructure on Azure and is
perfectly integrated in our processes (incl. provisioning, authentication...).
After one year of classes, we have 109 users in 39 groups who have produced 646
projects, 2,491 recipes, 614 notebooks, 752 models and 67 API services. This
usage includes exercises and work done in class, individual projects, group
projects, a hackathon, and some master's theses. Using the Dataiku API allows
us to efficiently create projects, assign rights, track progress, and evaluate
results (a sketch is shown below).
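For illustration, here is a minimal sketch of batch-creating group projects and assigning rights through the Dataiku public Python API; the URL, API key, owner, and group names are placeholders, and the exact permission fields may vary by Dataiku version.

```python
# Hypothetical course setup: one project per student group, with group rights.
import dataikuapi

client = dataikuapi.DSSClient("https://dss.hes-so.example", "ADMIN_API_KEY")
for i in range(1, 40):
    project = client.create_project(f"BI_GROUP_{i:02d}",
                                    f"BI course - group {i}", "professor")
    perms = project.get_permissions()
    perms["permissions"].append({"group": f"students_{i:02d}",
                                 "readProjectContent": True,
                                 "writeProjectContent": True})
    project.set_permissions(perms)
```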
As teachers, especially in the pandemic year, Dataiku allowed us to support all
teaching activities. The first discovery of Dataiku was through the R
notebooks. By revisiting the statistical basics and the R language, students
started to use the dataset features. Then, during a day led by a data scientist
from Dataiku, the students discovered data preparation and classification with
the integration of R recipes. As the weeks went by, we introduced more advanced
notions, finishing with image recognition using deep learning. Finally, we saw
how to publish a model using the API services and integrate it into a simple
webapp, also created in Dataiku. Various group
projects allowed the students to put their knowledge into practice on different
datasets related to concrete business problems (e.g. sales prediction, churn,
audit, mortgages). The groups performed all the tasks related to the lifecycle:
data preparation, feature creation, feature selection, model learning,
hyperparameter selection, selecting the best model, deploying the model on the
API services, and querying the model with a webapp: Impact: The course content
is organized around the tasks of the CDO (Chief Data Officer). In collaboration
with CDOs, who are also involved in the course, we have defined 26 user stories
that cover all aspects of the function. While the theoretical aspects are
covered in a more traditional way, the practical aspects are realized on
Dataiku. Thus, without being data scientists, the students had the opportunity
to concretely explore the different aspects of the job and to implement them
through various use cases. This allows them to specialize in data science or in
managerial functions where they will be required to manage the different aspects
of data projects within multidisciplinary teams. Because of the importance of
practice during the courses, we have adapted the assessments to allow students
to be in a situation close to reality. At the end of the course, a hackathon was
organized with the goal of developing a webapp for investors, allowing them to
determine the financing opportunities, based on a dataset on the success of
startups according to their financing rounds. Over 10 hours, groups of students
from management and computer science departments had to complete 15 tasks (data
balance analysis, feature creation with R, subpopulation analysis, ...) and
produce 10 deliverables (map of failures and geographical disparities of
factors, evaluation of model results, ...). A board meeting was also organized
in the middle of the day to review the intermediate results and distribute new
data. It should also be noted that the 10 groups (40 students) had to work
remotely due to health restrictions, which would not have been possible without
a tool such as Dataiku. At the end of the day, the groups were able to present
their results and demonstrate their webapp based on the best trained model. 
After the first iteration, student satisfaction was very high and the Dataiku
tool was quickly adopted. Several students chose to do their master's thesis
using Dataiku. Quotes from the course evaluation: "Really cool to have
discovered and tested R and Dataiku. Thanks for the Hackathon experience and
the whole organization!" "Very rich content, intervention of a Dataiku data
scientist." "Very interesting material and dynamic presentation, good
alternation between theory and exercises." Our students' knowledge of Dataiku
allowed us to propose master's thesis subjects involving advanced machine
learning algorithms. As the Dataiku tools were sufficiently well understood,
many students chose subjects involving machine learning, ranging from sleep
cycle analysis to recognition of objects on geographical maps and face
recognition. They will all use the features provided by the Dataiku framework.
[Figure: Number of distinct users per day - showing a strong interest in
Dataiku, even outside of class!]
0 3
Posted by cgaspoz
Epsilon DX Machine Learning - Leveraging Dataiku to Build a Real-time Decision
Engine
Team members: Mark Sucrese (VP of Marketing Sciences), with Kevin Ng, Ravi
Nagabhyru, Ben Eubank, Felice Brezina, Raghavan Kirthivasan, Kevin Elwood, Ben
McVay, and Wayne Townsend.
Country: United States
Organization: Epsilon DX Machine Learning
Description: Epsilon DX is an organization focused on delivering value through
partnerships, engineering, creativity, strategy, software implementation, and
best-in-class managed services. We work with our clients to help them take on
the challenges of today, tomorrow, and beyond. We have deep expertise with
partners like Adobe, Salesforce, Dataiku, IBM, Microsoft, and Sitecore,
creating unique partnerships that bring value to our clients not seen
elsewhere.
Awards Categories: Value at Scale
Challenge: We have been asked by a
large US retailer to create a universal decision engine that leverages modern
machine learning technology to create omnichannel personalized experiences,
able to scale across the enterprise for digital and non-digital customer
engagements. The areas of focus for personalization are: product and offer
recommendations, optimized content and messaging, improved and targeted
pricing, and the ability to reduce fraud.
The brand wants to ensure that model development can be leveraged by both data
scientists and business analysts in a collaborative way, and go from development
to production in a short amount of time. Lastly, the brand needs improved model
transparency and interpretation to ensure compliance with legal, data, IT, and
marketing teams. Solution: Epsilon worked with Dataiku to build a real-time
decision engine leveraging Dataiku for model development, workflow, and
execution; the automation node for job scheduling and monitoring; and the API
node for integrated services to the various applications for batch and
real-time processing. We integrated this solution into the brand's enterprise
CRM and
email marketing applications to deliver hyper-personalized email experiences. As
each email campaign is generated from the marketing teams, the system calls our
environment to return the next best actions for things like product
recommendations, offers and promotions, best content, and best subject lines.
This sense and response environment ensures that no two emails are ever the
same, and each one is uniquely personalized for every customer profile. Impact:
The machine learning test groups have outperformed the control groups by 47% for
revenue per email open. The machine learning test groups have driven a 20% lift
in conversion rates over the control groups for all emails that were opened. To
date, the program has generated ~31k in net new revenue week over week. We also
created a first-of-its-kind content optimization delivery system using deep
learning and computer vision models in Dataiku.
0 1
Posted by msucrese
Pr. Kurnicki (Hult International Business School) - Frictionless Data Mining for
Advanced Learning
Name: Thomas Kurnicki
Title: Professor
Country: United States
Organization: Hult International Business School
Description: Hult International Business School is a new kind of non-profit
business school that constantly innovates to meet the needs of students,
employers, and society in a world that is changing faster than ever before.
More than a business school, Hult is a dynamic and multicultural community that
educates, inspires, and connects some of the most forward-thinking business
talent from around the world.
Awards Categories: Excellence in Teaching
Challenge: Before using Dataiku, our Data Mining class
was based on five different data environments and IDEs: SQL Server, MongoDB,
Hadoop, and ArangoDB. Students had to navigate from one access point to
another. Students would report multiple tech issues in different systems, and
our teaching assistants spent a lot of time troubleshooting. Solution: Dataiku
helped us
consolidate the environments. Instead of using five different environments
(accessible from different apps), we used Dataiku as an all-in-one platform
where students could seamlessly move data between the different environments.
In the end, students used one tool, Dataiku, throughout the entire class.
Impact: The Data Mining class allows students to explore many different data
environments and learn the advantages and disadvantages of using a particular
data environment for a given business case. Students learn when they should use
a SQL, NoSQL, Hadoop environment and how to make that decision. By enabling them
to switch in just a few clicks, Dataiku removes the friction and enables them to
quickly compare and assess the most relevant technology for their project -
saving much-needed time and headaches. In the real-world, many data teams need
to connect to multiple data sources to collect required data - this class shows
students how to do that.
0 2
Posted by LisaB
MandM Direct - Managing Models at Scale to Deliver Faster Insights
Team members: Ben Powis - Head of Data Science; Joel Lenden - Junior Data
Scientist; Tobi Osinowo - Data Scientist; Jim Taylor - Data Analyst; Oisin
Devitt - Data Analyst
Country: United Kingdom
Organization: MandM Direct
Description: Our journey began in 1987 with founders Mark Ellis and Martin
Churchward (the 'two Ms'), selling end-of-line sports products directly to
customers in the UK. More than 30 years on, we're now one of Europe's leading
online off-price retailers, with over 2 million active customers. We have
dedicated local market websites in Ireland, Germany, France, the Netherlands,
Denmark and Poland, as well as dispatching to another 20+ countries worldwide.
Our success is down to our commitment and passion for seeking out the biggest
fashion, sport and outdoor brands at unbeatably low prices all year round, to
make sure you get even more for your money.
Awards Categories: AI Democratization & Inclusivity, Value at Scale
Challenge: MandM Direct is one of the largest online retailers in the United Kingdom, with over 3.5 million active customers and seven dedicated local market websites across Europe. The company delivers more than 300 brands annually to 25+ countries worldwide - and in 2020, we grew fast. More customers meant more data, which magnified some of our challenges:

 * Getting all the available data out of silos and into a unified, analytics-ready environment: The core data team is made up of four people (two data scientists and two data analysts), but we extend our reach through a hub-and-spoke model for our data center of excellence, working with analysts embedded across the business lines to scale our efforts. However, this requires an easy way for those teams to leverage data to answer business questions that doesn't necessarily involve code.
 * Scaling out AI deployment in a traceable, transparent, and collaborative manner: MandM's first machine learning models were written in Python (.py files) and run on the data scientist's local machine, and we needed a way to prevent interruptions or failures of the machine learning deployments.

In an attempt to tackle the second challenge, our team moved these .py files to Google Cloud Platform (GCP), and the outcome was well received by the business and technical teams in the organization. However, once the number of models in production went from one to three and more, we quickly realized the burden involved in maintaining them. There were too many disconnected datasets and Python files running on the virtual machine, and we had no way to check or stop the machine learning pipeline.
Solution: We turned to the powerful combination of Dataiku and GCP to answer these critical challenges. With Google BigQuery's fully managed, serverless data warehouse, we were able to break the data silos and democratize data access across teams. MandM Direct was one of the first online retailers to implement Google BigQuery across the organization. At the same time, thanks to Dataiku's visual and collaborative interface for data pipelining, data preparation, model training, and MLOps, our team could easily scale out the models in production without failure or interruption - all in a transparent and traceable way. MandM now has hundreds of live models doing everything from scoring customer propensity to generating pricing models, all with visibility into model performance metrics, clear separation of design and production environments, and many more MLOps capabilities built into the platform. Teams can now easily push down and offload computations for both data preparation and machine learning to GCP. Using Dataiku means this capability is accessible to all user profiles across the organization, without needing to know the underlying technologies or their complexity.

We love the flexibility offered by Dataiku. We have a mix of people who lean toward AutoML and visual tools, as well as one data scientist who loves to work in code. That's the beauty of the platform and why we chose it - we didn't want a low-code tool where we could get lazy and just click a few buttons. Now the team has the best of both worlds: if they want to nerd out and go under the hood, they can do that. If they need a quick model, they can do that too.

Impact: The benefits we have seen by using Dataiku and GCP aren't limited to time saved on tedious maintenance work - we're also having more impact across the business. Since we began our journey with Dataiku in January 2020, we have created 54 projects, which handle 1,171 different datasets and are orchestrated by 53 different scenarios, making sure our models build only when the data is available and validated. We have 9 large projects deployed to an automation node, solving complex business problems or providing advanced insight on a daily basis.

Our data team is now able to deliver a variety of solutions to business problems, from adtech to customer lifetime value, whether that's a dashboard, a more detailed piece of analysis, or a machine learning project deployed in production. For example, business users in the buying and merchandising teams interact with machine learning models in their day-to-day work through Dataiku applications, which provide a nontechnical interface to projects developed by the data team. We've also built out a feature library in Dataiku that contains more than 400 features specific to MandM's business. The feature library is now the first place people go - sort of like a shop window for machine learning projects - and it takes away the monotony and repetition of their work. Having a platform like Dataiku allows our data scientists to focus on building cool things, not spending hours and hours on maintenance and making sure things are running. With workflows deployed in Dataiku, we save days of work every month.
Posted by ben_p
Template Submission - Predicting the Sakura Blooming Day
Name: Makoto Miyazaki
Title: Data Scientist
Country: France
Organization: Dataiku
Description: Dataiku is the world's leading AI and machine learning platform, supporting agility in organizations' data efforts via collaborative, elastic, and responsible AI, all at enterprise scale. Hundreds of companies use Dataiku to underpin their essential business operations and ensure they stay relevant in a changing world.
Awards Categories: Alan Tuning

Challenge: Sakura, the world-famous cherry blossom in Japan, happens
every year in the spring. It is a world-renowned attraction, and many people
travel from afar to witness its wonders. However, sakura blooms only for a short
period of time: seven days after the flowers open, they already start to
scatter, so many people simply miss it. As I’m a Data Scientist at Dataiku, I
took it as a challenge to build a prediction model for the bloom of Sakura using
Dataiku DSS - and see if I could obtain more accurate predictions than other
websites!

Solution: Dataiku enabled me to automatically update the prediction on a daily basis, thanks to the scenario automation feature of Dataiku DSS: every day at 2 a.m., a Python recipe scraped the previous day's weather information for the three cities and updated the predictions. [Chart: daily-updated blooming day predictions for the three cities]
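As a rough illustration, the nightly step of such a scenario might run a Python recipe shaped like the sketch below. It assumes it runs inside DSS (where the dataiku package is available); the weather endpoint and dataset names are placeholders, not the actual ones used:

    # Sketch of a scheduled Python recipe: fetch yesterday's weather for each
    # city and append it to the weather history. URL and dataset names are
    # placeholders standing in for the Japan Meteorological Agency source.
    import datetime as dt
    import pandas as pd
    import requests
    import dataiku

    yesterday = dt.date.today() - dt.timedelta(days=1)
    rows = []
    for city in ["Tokyo", "Oita", "Aomori"]:
        resp = requests.get("https://weather.example.jp/daily",
                            params={"city": city, "date": yesterday.isoformat()})
        rows.append({"city": city, "date": yesterday, **resp.json()})

    # Read the input dataset, append the new rows, write the output dataset
    history = dataiku.Dataset("weather_history").get_dataframe()
    updated = pd.concat([history, pd.DataFrame(rows)], ignore_index=True)
    dataiku.Dataset("weather_history_updated").write_with_schema(updated)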
The two other main forecasting websites, tenki.jp and the Japan Weather Association (JWA), updated their predictions once a week and once every two weeks, respectively. Daily updates are a big plus for pinning down the precise blooming day!

My Dataiku DSS flow consists of two zones, data preprocessing and machine learning.

Data preprocessing: Inputs are daily weather data from 1991 until today in the three cities, the historical blooming days from the past 30 years, and the daily weather data scraped from the Japan Meteorological Agency using a Python recipe, including average, highest, and lowest temperature, precipitation, and daylight hours. Feature generation is done with a window recipe: rolling averages over the past one, three, and six months for each of the weather-related variables in each of the three cities. I also averaged the blooming days of previous years for each city, assuming that the blooming day does not differ much from year to year.
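For illustration, the window recipe's rolling averages could be reproduced in a Python recipe along these lines - a minimal sketch with hypothetical file and column names, approximating the one-, three-, and six-month windows as 30, 90, and 180 days:

    # Sketch: rolling averages like those produced by a Dataiku window recipe.
    import pandas as pd

    # Daily weather, one row per (city, date); names are hypothetical
    df = pd.read_csv("daily_weather.csv", parse_dates=["date"])
    df = df.sort_values(["city", "date"])

    weather_cols = ["avg_temp", "max_temp", "min_temp",
                    "precipitation", "daylight_hours"]
    for window_days, label in [(30, "1m"), (90, "3m"), (180, "6m")]:
        for col in weather_cols:
            # Rolling mean computed independently within each city
            df[f"{col}_avg_{label}"] = (
                df.groupby("city")[col]
                  .transform(lambda s, w=window_days:
                             s.rolling(w, min_periods=1).mean())
            )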
Machine learning: This zone includes two Random Forest models, each scoring one dataset; the two scored datasets are then combined to create a single prediction. I built it this way because I set the target variable to "number of days until blossom." This target itself takes a value between 0 and 365 (or even more), but I wanted the model to treat it as a cyclical variable so that it can correctly assess the error. For this, I scaled the variable to a range of 0 to 2π, then decomposed it into sine and cosine: one model predicted the sine value, the other predicted the cosine value. I combined the two predictions and converted the result back into days.
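A minimal sketch of this sine/cosine trick, assuming placeholder features and a yearly cycle of 366 days (the feature data here is synthetic, purely to make the example self-contained):

    # Sketch: treating "days until blossom" as a cyclical target by decomposing
    # it into sine and cosine, training one model per component, recombining.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))            # placeholder weather features
    y_days = rng.integers(0, 366, size=500)  # placeholder target in days

    # Scale the cyclical target to [0, 2*pi) and decompose it
    theta = 2 * np.pi * y_days / 366.0
    y_sin, y_cos = np.sin(theta), np.cos(theta)

    # One Random Forest per component (500 trees, max depth 100, as in the story)
    model_sin = RandomForestRegressor(500, max_depth=100, random_state=0).fit(X, y_sin)
    model_cos = RandomForestRegressor(500, max_depth=100, random_state=0).fit(X, y_cos)

    # Recombine: atan2 recovers the angle, which converts back to days
    pred_theta = np.arctan2(model_sin.predict(X), model_cos.predict(X)) % (2 * np.pi)
    pred_days = pred_theta * 366.0 / (2 * np.pi)

The point of the decomposition is that the error between day 364 and day 1 is measured as two days, not 363, so the models are not penalized for predictions that wrap around the cycle boundary.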
Impact: I run my prediction, humorously called 'Random Sakura Forest', for three cities in Japan: Oita prefecture (southern Japan), Aomori prefecture (northern Japan), and Tokyo. My predictions were a few days behind the two other forecasting websites: Sakura in Tokyo bloomed on March 15, so my prediction was already proven wrong - but both of the other forecasting websites also missed that forecast, although by a smaller margin. However, my prediction was closer for Oita, which blossomed on March 24 (I predicted March 22, versus March 15 on the other websites). For Aomori, which opened on April 14, my prediction (April 21) was also in line with JWA's. A random forest with 500 trees and a maximum depth of 100 yielded the best result, and I was able to reduce the error to four days.

One interesting finding is that the model favored only the temperature-related features. All the other features, such as precipitation and daylight hours, had very little impact on the result.

In Japan, forecasting the Sakura blooming day makes the daily news headlines throughout spring, so many methodologies have been proposed since the 1950s, including multiple regression analysis. Nowadays, most Sakura blossom forecasters use a formula based on a method developed in 2003 by Yasuyuki Aono, Associate Professor at Osaka Metropolitan University. Aono's approach is unique in that it is composed of two parts that incorporate the biology of Sakura trees. First, it computes a D-day, the date when the trees wake up from their winter dormancy. This D-day is computed from the place's latitude, distance from the seashore, and average temperature during January and March, and therefore varies by location. What the Aono method tells us is that the blooming day depends solely on the place's geographical position and its temperature - which is indeed consistent with my prediction result!
Posted by Makoto



2021 WINNERS & FINALISTS

Discover the winners and finalists of the 2021 edition, and read their stories to
learn about their pioneering achievements in data science and AI!

Use the following labels to filter by award category:

 * Organizational Transformation
 * AI Democratization & Inclusivity
 * Data Science for Good
 * Responsible AI
 * Value at Scale
 * Excellence in Teaching
 * Excellence in Research
 * Alan Tuning


ORGANIZATIONAL TRANSFORMATION

Recognizing individuals and organizations who are building the foundations of a
data-centric culture with Dataiku.

WINNER



SUBMISSION

Building an Intelligent Data Operations for Financial Planning and Performance
Management

ORGANIZATION

Standard Chartered Bank

TEAM MEMBER

Craig Turrell, Head of Digital Centre of Excellence P2P

Read their story

FINALIST



SUBMISSION

Using Dataiku to Democratize AI Within the Organization

ORGANIZATION

Schlumberger

TEAM MEMBER

Valerian Guillot, Nerve Center Data Science Architect

Read their story

FINALIST



SUBMISSION

Empowering a Data-driven Organization to Improve Astronomical Operations

ORGANIZATION

ALMA Observatory

TEAM MEMBER

Ignacio Toledo, Data Analyst

Read their story


AI DEMOCRATIZATION & INCLUSIVITY

Revealing the outstanding work of individuals and organizations who are
leveraging Dataiku to enable all people to gain insights from their data.

WINNER



SUBMISSION

Using Dataiku to Democratize AI Within the Organization

ORGANIZATION

Schlumberger

TEAM MEMBER

Valerian Guillot, Nerve Center Data Science Architect

Read their story

FINALIST



SUBMISSION

Building Self-service NLP for Analysts Worldwide

ORGANIZATION

Unilever

TEAM MEMBERS

Linda Hoeberigs, Head of Data Science and AI, PDC Lab & Ash Tapia, Data
Partnerships & Tools Stack Manager

Read their story

FINALIST



SUBMISSION

Developing a Scalable Digital Voice of the Consumer Capability

ORGANIZATION

Unilever

TEAM MEMBER

Anand Patel, Analytics Manager

Read their story


DATA SCIENCE FOR GOOD

Turning the spotlight on the best use of Dataiku by nonprofits, companies, and
individuals to make a positive impact on the world.

WINNER



SUBMISSION

Bringing Workers Home Safe Through Defect Detection

ORGANIZATION

Atlantic Plant Maintenance

TEAM MEMBER

Aaron Crouch, Data Analytics Manager

Read their story

FINALIST



SUBMISSION

Empowering Citizen Data Scientists Across the Organization

ORGANIZATION

The Ocean Cleanup

TEAM MEMBER

Bruno Sainte-Rose, Lead Computational Modeler

Read their story

FINALIST



SUBMISSION

Helping Nonprofits Leverage Insights From Their Data

ORGANIZATION

41xRT

TEAM MEMBER

Tom Brown, Non Profit Data Science & Analytics Advocate

Read their story


RESPONSIBLE AI

Highlighting the individuals and organizations who are using Dataiku to develop
foundational AI for the future that is governable, sustainable, transparent, and
free of unintended bias.

WINNER



SUBMISSION

Designing a Responsible, Self-service Tool for Natural Language Processing

ORGANIZATION

Unilever

TEAM MEMBERS

Linda Hoeberigs, Head of Data Science and AI, PDC Lab & Ash Tapia, Data
Partnerships & Tools Stack Manager

Read their story

FINALIST



SUBMISSION

Building a Feature Store for Quicker and More Accurate Machine Learning Models

ORGANIZATION

Premera Blue Cross

TEAM MEMBER

Marlan Crosier, Senior Data Scientist

Read their story

FINALIST



SUBMISSION

Talent Acquisition Enablement with Machine Learning

ORGANIZATION

Schlumberger

TEAM MEMBER

Modhar Khan, Head of People Analytics

Read their story


VALUE AT SCALE

Showcasing the pioneering individual and organizational use of Dataiku to manage
the full lifecycle of models and pipelines, and deliver value at scale.

WINNER



SUBMISSION

Dynamic Audit Planning Through Machine Learning-based Risk Assessment

ORGANIZATION

Royal Bank of Canada

TEAM MEMBER

Masood Ali, Senior Director, Data Strategy & Governance

Read their story

WINNER



SUBMISSION

Reducing Detection Time of Manufacturing Issues with Real-time Automated Process
Control

ORGANIZATION

NXP Semiconductors

TEAM MEMBER

Adnan Chowdhury, Manufacturing Quality Engineer

Read their story

FINALIST



SUBMISSION

Streamlining & Augmenting the Well Evaluation Process at Scale

ORGANIZATION

Schlumberger

TEAM MEMBER

Rasesh Saraiya, Data Scientist

FINALIST



SUBMISSION

Leveraging AI to Democratize Insights From Customer Feedback

ORGANIZATION

Malakoff Humanis

TEAM MEMBER

Nikola Lackovic, Data Scientist

Read their story

FINALIST



SUBMISSION

Human-centered Machine Learning for Dimensioning Resources in Telecoms

ORGANIZATION

Ericsson

TEAM MEMBER

Marcial Gutierrez, System Manager

Read their story


EXCELLENCE IN TEACHING

Recognizing members of the teaching faculty for their invaluable contribution to
educating the next generation of data science talent with Dataiku, driving
innovation in the field and aligning with real-world use cases.

WINNER



SUBMISSION

Teaching the Next Generation of Chief Data Officers with Dataiku

ORGANIZATION

HES-SO

TEAM MEMBERS

Cédric Gaspoz and Dominique Genoud, Professors UAS

Read their story

FINALIST



SUBMISSION

Data Analysis Bridges Finance Theory and Practice

ORGANIZATION

Columbia University

TEAM MEMBER

Perry Beaumont, PhD, Lecturer

Read their story

FINALIST



SUBMISSION

Dataiku as a Leading User-Friendly Data Science Platform for MBA Students

ORGANIZATION

INSEEC U.

TEAM MEMBER

Linda Attari, Director of MSc 1 Data Management and MSc 2 Data Analytics

Read their story

FINALIST



SUBMISSION

Facilitating & Enhancing the Data Science Learning Experience with Dataiku

ORGANIZATION

Live University

TEAM MEMBER

Fernando Enobi, Professor

Read their story

FINALIST



SUBMISSION

Dataiku as a Versatile Platform for BI & Beyond

ORGANIZATION

Hochschule Hannover

TEAM MEMBER

Dr. Maylin Wartenberg, Professor

Read their story


EXCELLENCE IN RESEARCH

Starring academic researchers who are leveraging Dataiku to gain impactful
insights from their data and push the frontiers of our knowledge.

WINNER



SUBMISSION

Mapping Police Fatal Encounters to Inform Future Policy

ORGANIZATION

University of Michigan

TEAM MEMBERS

Frank Romo, Master of Urban Planning Researcher, & Harley Etienne, Professor

Read their story

FINALIST



SUBMISSION

Streamlining Data Workflows for Clinical Research

ORGANIZATION

Hospital de Clínicas de Porto Alegre

TEAM MEMBER

Tiago Andres Vaz, Head of AI

FINALIST



SUBMISSION

Software Analysis Execution Process Improvement and Prediction Program

ORGANIZATION

Leidos

TEAM MEMBER

Karen Cheng, Data Scientist

Read their story


ALAN TUNING

Rewarding the pioneers who are pushing the boundaries of Dataiku to build
innovative projects - including for fun!

WINNER



SUBMISSION

Leveraging AI to Democratize Insights From Customer Feedback

ORGANIZATION

Malakoff Humanis

TEAM MEMBER

Nikola Lackovic, Data Scientist

Read their story

FINALIST



SUBMISSION

Software Analysis Execution Process Improvement and Prediction Program

ORGANIZATION

Leidos

TEAM MEMBER

Karen Cheng, Data Scientist

Read their story

FINALIST



SUBMISSION

Streamlining & Augmenting the Well Evaluation Process at Scale

ORGANIZATION

Schlumberger

TEAM MEMBER

Rasesh Saraiya, Data Scientist

FINALIST



SUBMISSION

Building an Emotion Classification System on Videos

ORGANIZATION

IME

TEAM MEMBER

Mohamed AbdElAziz Khamis Omar, Senior Data Scientist

Read their story



© 2012-2022 Dataiku. All rights reserved.

 * Privacy Policy
 * Cookie Policy
 * Events Code Of Conduct
