Feb 04, 2018

Source: https://www.analyticsvidhya.com

In a previous blog post, I wrote about how Information Architecture is an enabler for Big Data Analytics. This post will focus on how Information Architecture is used to enable Big Data Analytics and, in essence, becomes AI's secret ingredient!

Most of the discussion around Artificial Intelligence (AI) is focused on Machine Learning, Cognitive Computing, and Big Data Analytics. However, you must prepare your organization's data in order to properly take advantage of AI tools focused on Big Data Analytics (such as those from IBM Watson, Amazon, Microsoft, and Google). To properly prepare your data, you will need to apply Information Architecture.

Information Architecture (IA) is AI's secret ingredient. IA provides the processes, procedures, and methods to perform curation of content (semi-structured and unstructured data). Applying IA to content curation focuses on the semi-structured and unstructured data that comprises over 90% of the data analyzed by big data analytics. Semi-structured data is a form of data that does not conform to the formal structure of data in databases or data tables, but contains tags to separate elements and enforce hierarchies within the data (e.g., spreadsheets, XML files). Unstructured data is a form of data with no tagging, metadata, or inherent structure associated with it (e.g., images, text, voice, video). Content typically refers to the container that the semi-structured and unstructured data resides in (e.g., .pdf, .doc, .xml, .ppt, .csv).
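To make the distinction concrete, here is a minimal Python sketch (the policy example, tags, and field names are purely illustrative): semi-structured XML carries tags that expose its own hierarchy, while unstructured text has no inherent structure, so descriptive metadata must be attached to it during curation.

```python
import xml.etree.ElementTree as ET

# Semi-structured data: tags separate elements and enforce a hierarchy,
# so structure can be recovered directly from the content itself.
semi_structured = "<policy><type>auto</type><state>NY</state></policy>"
root = ET.fromstring(semi_structured)
fields = {child.tag: child.text for child in root}
print(fields)  # {'type': 'auto', 'state': 'NY'}

# Unstructured data: no tags or inherent structure, so the curation
# process must impose structure by attaching descriptive metadata.
unstructured = "The customer called about an auto policy in New York."
curated = {
    "content": unstructured,
    "metadata": {"format": "text", "domain": "auto", "source": "call-log"},
}
print(curated["metadata"]["domain"])  # auto
```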

Content curation enables the extraction of value from data, and it is a capability required in areas that depend on complex and/or continuous data integration and classification. Improving content curation tools and methods directly increases the efficiency of the knowledge discovery process, maximizes return on investment per data item through reuse, and improves organizational transparency.

Most organizations deal only with creating a common data model depicting all structured data in the organization. Although this is a great start, it is only part of your data picture. Creating a common or enterprise content model depicting semi-structured and unstructured data will complete that picture and lay the foundation for data centralization, providing the most accurate and holistic representation of your organization's data.

Creating a common view of your semi-structured and unstructured data is a daunting task! Due to the massive amount of data and the variety of sources, it is important to start small, splitting the data into specific domains. To align your common data model and content model, you must have common terms and consistent structures. In particular, for your semi-structured and unstructured data, applying consistent metadata to fully describe the data is important.
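As an illustration, consistent metadata can be captured in a small shared schema that every content item in a domain must fill in. This is a hedged sketch: the field names here are hypothetical, and a real schema would typically build on an established standard such as Dublin Core.

```python
from dataclasses import dataclass, asdict

# Hypothetical minimal metadata schema -- illustrative only, not a
# standard. Consistency comes from every item filling the same fields.
@dataclass
class ContentMetadata:
    title: str
    domain: str        # the specific domain the data has been split into
    content_type: str  # the container, e.g. .pdf, .doc, .xml
    source: str        # the authoritative/trusted source of the content

doc = ContentMetadata(
    title="2018 Rate Manual",
    domain="underwriting",
    content_type=".pdf",
    source="policy-admin",
)
print(asdict(doc))
```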

Content Curation Process for Big Data Analytics

Content curation provides the methodological and technological data management support to address data quality issues, maximize the usability of the data, actively manage data through its lifecycle, perform data discovery and retrieval, create and maintain quality, add value, and provide for reuse over time.

The content curation process includes the following activities:

  • Content Audit
    • Gather the requirements for content, as well as measurement and evaluation criteria. Perform a content audit of the various content sources under consideration: determine what content is ready to be consumed, evaluate the quality of the content, determine the gaps in content, and identify the measurements that determine what content is used (and not used).
  • Content Analysis
    • Content analysis examines information concepts, relationships, business rules, and metadata. This provides a sharable, stable, and organized structure of content (information and knowledge) for the enterprise. Semantics addresses the meaning of the concepts identified in the content model as well as the meaning of the relationships between those concepts (usually expressed as business rules).
  • Address Content Gaps
    • The results of the content audit will determine the gaps in content and identify the additional sources of content needed for effective big data analytics.
  • Content Selection & Validation
    • Content selection should be considered in terms of significance (how essential or basic is it to the discipline?); validity (is the content accurate, current, and relevant to the domain under consideration?); relevance (what is the discipline, workplace, or societal value of this content?); and utility (how useful will the content be to the overall domain under consideration?). Validation, or content validity, is concerned with making sure that the content is accessed from and/or based on an authoritative or trusted source, reviewed on a regular basis (per the specific governance policies), modified when needed, and archived when it becomes obsolete.
  • Classification
    • The classification of content will take the form of one or more ontologies/taxonomies. Classification of information will also be realized through controlled vocabularies and thesauri. Structure refers to the methods used to aggregate the concepts and metadata into the domain ontology/taxonomy.
  • Align Content to Domain Ontology/taxonomy
    • Categorizing content and aligning it to a common ontology/taxonomy is essential to big data analytics, given the variety of data sources under consideration.
  • Transformation
    • Transformation provides a consistent look and feel between similar content types, ensures content is consistently identified with standard and precise metadata, and aligns it to an accurate and exact ontology/taxonomy.
  • Preservation and Governance

Preservation and Governance has the following characteristics:

    • Content and Classification Stewardship: The focus here is on establishing accountability for the accuracy, consistency and timeliness of content, content relationships, metadata and taxonomy within areas of the enterprise and the applications that are being used.
    • IA Management and Maintenance: This refers to the specific details on how the enterprise manages and maintains changes to content, content relationships, metadata, and taxonomy. This is facilitated through the use of specific processes and workflows.
    • Policies and Procedures: This refers to establishing and/or conforming to information policies for the generation, consumption, and access of content (information and knowledge). It also addresses how information is handled; the organization maintains detailed information policies associated with specific information types (e.g., Underwriting Guidelines, Rate Manuals, Pricing Strategies).
    • Enforcement: Enforcement of governance pertains to the implementation and execution of the policies and procedures identified in the governance plan. Establishment of a governance board will be the organizational entity to carry out the enforcement of governance, while the applications/tools must be configured to enforce governance of content on a day-to-day basis.
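The activities above can be sketched as a simple pipeline. This is a minimal illustration, not a real curation tool: the step functions stand in for the audit, classification, and transformation work described in the list, and the taxonomy and content items are invented for the example.

```python
# Hedged sketch of curation activities as a pipeline over content items
# represented as plain dicts; each function stands in for one activity.

def audit(items):
    # Content Audit: keep only items that are ready to be consumed.
    return [i for i in items if i.get("ready")]

def classify(items, taxonomy):
    # Classification: align each item to a node in the domain taxonomy,
    # flagging anything outside the taxonomy as 'unclassified'.
    for i in items:
        i["category"] = taxonomy.get(i["topic"], "unclassified")
    return items

def transform(items):
    # Transformation: apply consistent metadata so similar content
    # types get the same look and feel.
    for i in items:
        i.setdefault("metadata", {})["curated"] = True
    return items

taxonomy = {"rates": "pricing", "claims": "operations"}
raw = [
    {"topic": "rates", "ready": True},
    {"topic": "claims", "ready": False},  # fails the audit, dropped
]
curated = transform(classify(audit(raw), taxonomy))
print(curated)
# [{'topic': 'rates', 'ready': True, 'category': 'pricing', 'metadata': {'curated': True}}]
```

In practice each step would sit behind the governance processes described above (stewardship, maintenance workflows, and enforcement), rather than running as a one-shot script.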

There is a need to discover patterns and create models that address a specific task or business objective. Semi-structured and unstructured data is vital to the decision-making process. Defining a structured representation of the data allows users to compare, aggregate, and transform it. With more data available, the barrier of data acquisition is reduced. To extract value from the data, it needs to be systematically processed, transformed, and repurposed into a new context. Curation of semi-structured and unstructured data in big data analytics is driven by the need to reduce time-to-market, reduce the time to create new products, repurpose existing content, and improve the accessibility and visibility of information artifacts. Curation is important because of the growth in the variety of sources used in Big Data. Selecting your data from a variety of well-curated sources will add richness to your Big Data Analytics results!


May 31, 2017

AI and KM

This is the first of a three-part post on the connection between Artificial Intelligence and Knowledge Management.

Artificial Intelligence (AI) has become the latest "buzzword" in the industry today. However, AI has been around for decades. The intent of AI is to enable computers to perform tasks that normally require human intelligence; as such, AI will evolve to take on many jobs once performed by humans. I studied and developed applications in AI from the mid-to-late 1980s through the early 2000s. AI in the late 1980s and early 1990s evolved into a multidisciplinary science that included expert systems, neural networks, robotics, Natural Language Processing (NLP), speech recognition, and virtual reality.

Knowledge Management (KM) is also a multidisciplinary field. KM encompasses psychology, epistemology, and cognitive science. The goals of KM are to enable people and organizations to collaborate, share, create, use, and reuse knowledge. With this understanding, KM is leveraged to improve performance, increase innovation, and expand what we know, both from an individual and an organizational perspective.

KM and AI are, at their core, about knowledge. AI provides the mechanisms that enable machines to learn; it allows machines to acquire, process, and use knowledge to perform tasks and to unlock knowledge that can be delivered to humans to improve decision-making. I believe that AI and KM are two sides of the same coin: KM allows an understanding of knowledge to occur, while AI provides the capabilities to expand, use, and create knowledge in ways we have not yet imagined.

The connection between KM and AI has led the way for cognitive computing. Cognitive computing uses computerized models to simulate human thought processes. It involves self-learning, deep-learning artificial neural network software that uses text/data mining, pattern recognition, and natural language processing to mimic the way the human brain works. Cognitive computing is leading the way for future applications involving AI and KM.

In recent years, the ability to mine large amounts of data, information, and knowledge to gain competitive advantage has gained momentum, as has the importance of data and text analytics to this effort. As the proliferation of structured and unstructured data continues, we will have an ongoing need to uncover the knowledge contained within these big data resources. Cognitive computing will be key to extracting knowledge from big data. Research spanning strategy, process-centric approaches, the interorganizational aspects of decision support, and new technology will continue to provide insights into how we process big data to enhance decision-making.

Cognitive computing is the next evolution of the connection between AI and KM. In future posts, I will examine and discuss the industries where cognitive computing is becoming a disruptive force. This disruption will lead to dramatic changes in how people work in these industries.