Feb 04, 2018
 

Source: https://www.analyticsvidhya.com

In a previous blog post I wrote about how Information Architecture is an enabler for Big Data Analytics. This post focuses on how Information Architecture is used to enable Big Data analytics and, in effect, becomes AI’s Secret Ingredient!

Most of the discussion around Artificial Intelligence (AI) is focused on Machine Learning, Cognitive Computing and Big Data Analytics. However, you must prepare your organization’s data in order to take full advantage of AI tools that are focused on Big Data Analytics (such as those from IBM Watson, Amazon, Microsoft, and Google). To properly prepare your data you will need to apply Information Architecture.

Information Architecture (IA) is AI’s secret ingredient. IA provides the processes, procedures and methods to perform Content (semi-structured and unstructured data) Curation. Applied to content curation, IA focuses on the semi-structured and unstructured data that comprises over 90% of the data being analyzed by big data analytics. Semi-structured data is a form of data that does not conform to the formal structure of databases or data tables, but contains tags to separate elements and enforce hierarchies within the data (e.g., spreadsheets, XML files). Unstructured data is a form of data with no tagging, metadata or inherent structure associated with it (e.g., image, text, voice, video). Content typically refers to the container that the semi-structured and unstructured data resides in (e.g., .pdf, .doc, .xml, .ppt, .csv).
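To make the distinction concrete, here is a minimal Python sketch. The sample policy record, its tag names and the `tagged_fields` helper are hypothetical, invented purely for illustration. It shows how the tags in a semi-structured XML fragment let a program pull out named fields directly, while the same facts written as free text offer no tags to key on.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for illustration.
SEMI_STRUCTURED = (
    "<policy>"
    "<holder>Ada</holder>"
    "<coverage>5000</coverage>"
    "</policy>"
)
UNSTRUCTURED = "Ada holds a policy with 5000 in coverage."


def tagged_fields(xml_text):
    """Return the child elements of the root as a tag -> text mapping."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


fields = tagged_fields(SEMI_STRUCTURED)
print(fields)        # the tags supply field names and hierarchy
print(UNSTRUCTURED)  # free text: the same facts, but nothing to key on
```

Getting the same two fields out of the free-text sentence would need some form of text analysis, which is where curation and metadata come in.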

Content curation enables the extraction of value from data, and it is a capability that is required for areas that are dependent on complex and/or continuous data integration and classification. Improving data curation tools and methods directly increases the efficiency of the knowledge discovery process, maximizes return on investment per data item through reuse, and improves organizational transparency.

Most organizations only create a common data model depicting all structured data in the organization. Although this is a great start, it is only part of your data picture. Creating a common or enterprise content model depicting semi-structured and unstructured data will complete your data picture and lay the foundation for data centralization, providing the most accurate and holistic representation of your organization’s data.

Creating a common view of your semi-structured and unstructured data is a daunting task! Due to the massive amount of data and the variety of sources, it is important to start small, splitting the data into specific domains. In aligning your common data model and content model you must have common terms and consistent structures. In particular, for your semi-structured and unstructured data, applying consistent metadata to fully describe this data is important.
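One way to picture "common terms and consistent structures" is a single metadata record type that every piece of content must fill in, with values drawn from a shared vocabulary. The sketch below is only an illustration under assumptions: the field names, the `DOMAINS` and `CONTENT_TYPES` vocabularies and the `ContentMetadata` class are all hypothetical, and a real organization would define its own.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabularies; a real organization would define its own.
DOMAINS = {"underwriting", "pricing", "claims"}
CONTENT_TYPES = {"guideline", "manual", "report"}


@dataclass(frozen=True)
class ContentMetadata:
    """One consistent metadata structure applied to every content item."""
    title: str
    domain: str
    content_type: str
    source: str

    def __post_init__(self):
        # Reject terms outside the shared vocabulary so all records use common terms.
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain!r}")
        if self.content_type not in CONTENT_TYPES:
            raise ValueError(f"unknown content type: {self.content_type!r}")


record = ContentMetadata(
    title="2018 Rate Manual",
    domain="pricing",
    content_type="manual",
    source="shared-drive",
)
print(record)
```

Starting with one domain and a short vocabulary, then widening both over time, matches the "start small" advice above.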

Content Curation Process for Big Data Analytics

Content curation provides the methodological and technological data management support to address data quality issues; maximize the usability of the data; provide active and ongoing management of data through its lifecycle; perform data discovery and retrieval; create and maintain quality; add value; and provide for re-use over time.

The content curation process includes the following activities:

  • Content Audit
    • Gather the requirements for content, as well as measurement and evaluation criteria. Perform a content audit of the various content sources under consideration: determine what content is ready to be consumed, evaluate the quality of content, determine the gaps in content, and identify the measurements to determine what content is used (and not used).
  • Content Analysis
    • Content analysis examines information concepts, relationships, business rules and metadata. This provides a sharable, stable and organized structure for content (information and knowledge) for the enterprise. Semantics will address the meaning of the concepts identified in the content model as well as the meanings of the relationships between the concepts (usually expressed as business rules).
  • Address Content Gaps
    • The results of performing a Content Audit will determine the gaps in content and identify the additional sources of content that are needed for effective big data analytics.
  • Content Selection & Validation
    • Content selection should be considered in terms of four criteria. Significance: how essential or basic is it to the discipline? Validity: is the content accurate, current and relevant to the domain under consideration? Relevance: what is the discipline, workplace or societal value of this content? Utility: how useful will the content be to the overall domain under consideration? Validation, or content validity, is concerned with making sure that the content is accessed from and/or based on the authoritative or trusted source, reviewed on a regular basis (based on the specific governance policies), modified when needed and archived when it becomes obsolete.
  • Classification
    • The classification of content will be in the form of one or more ontologies/taxonomies. Classification of information will also be realized through controlled vocabularies and thesauri. The structure refers to the methods used to aggregate the concepts and metadata into the domain ontology/taxonomy.
  • Align Content to Domain Ontology/taxonomy
    • Categorizing content and aligning the content to a common ontology/taxonomy is essential to big data analytics due to the varied number of data sources under consideration.
  • Transformation
    • Transformation provides a consistent look and feel between similar content types, with content consistently identified by standard and precise metadata and aligned to an accurate and exact ontology/taxonomy.
  • Preservation and Governance

Preservation and Governance has the following characteristics:

    • Content and Classification Stewardship: The focus here is on establishing accountability for the accuracy, consistency and timeliness of content, content relationships, metadata and taxonomy within areas of the enterprise and the applications that are being used.
    • IA Management and Maintenance: This refers to the specific details on how the enterprise manages and maintains changes to content, content relationships, metadata and taxonomy. This is facilitated through the use of specific processes and workflows.
    • Policies and Procedures: This refers to establishing and/or conforming to information policies for generation, consumption and access of content (information and knowledge). This also addresses how information is handled: the organization has detailed information policies associated with specific information types (e.g., Underwriting Guidelines, Rate Manuals, Pricing Strategies).
    • Enforcement: Enforcement of governance pertains to the implementation and execution of the policies and procedures identified in the governance plan. A governance board will be the organizational entity that carries out the enforcement of governance, while the applications/tools must be configured to enforce governance of content on a day-to-day basis.
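The Content Audit and Address Content Gaps activities in the process above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the `ContentItem` fields, the one-year review threshold and the sample inventory are hypothetical, and a real audit would use whatever measurement and evaluation criteria the organization has gathered.

```python
from dataclasses import dataclass


@dataclass
class ContentItem:
    """Hypothetical inventory entry for one piece of content."""
    name: str
    topic: str
    has_metadata: bool
    days_since_review: int
    times_accessed: int


def audit(items, required_topics, max_review_age_days=365):
    """Report which content is ready, which is unused, and which topics are gaps."""
    ready = [
        i.name
        for i in items
        if i.has_metadata and i.days_since_review <= max_review_age_days
    ]
    unused = [i.name for i in items if i.times_accessed == 0]
    covered = {i.topic for i in items}
    gaps = sorted(set(required_topics) - covered)
    return {"ready": ready, "unused": unused, "gaps": gaps}


inventory = [
    ContentItem("rate_manual.pdf", "pricing", True, 90, 42),
    ContentItem("old_guidelines.doc", "underwriting", False, 800, 0),
]
report = audit(inventory, required_topics=["pricing", "underwriting", "claims"])
print(report)
```

In this made-up inventory the missing "claims" topic is the gap that points to an additional content source, and the unused, unreviewed document is a candidate for the archiving that governance calls for.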

There is a need to discover patterns and create models to address a specific task or a business objective. Semi-structured and unstructured data is vital to the decision-making process. Defining a structured representation associated with the data allows users to compare, aggregate, and transform the data. With more data available, the barrier of data acquisition is reduced. To extract value from the data it needs to be systematically processed, transformed, and repurposed into a new context. Curation of semi-structured and unstructured data in big data analytics is driven by the need to reduce the time-to-market, reduce the time to create new products, repurpose existing content and improve accessibility and visibility of information artifacts. Curation is important because of the growth in the variety of sources used in Big Data. Selecting your data from a variety of well-curated sources will add richness to your Big Data Analytics results!

 

Dec 18, 2017
 

Big-data-analytics-solutions: http://hpc-asia.com/japans-nec-opens-10mn-centre-for-big-data-analytics/

The use of Information Architecture (IA) covers the spectrum from Big Data Analytics to Content Visualization. In my previous post “Information Architecture and Big Data Analytics” I indicated how IA is an enabler for Big Data Analytics and that Big Data includes all data (i.e., Unstructured, Semi-structured, and Structured). Big data analytics examines large amounts of data to uncover hidden patterns, correlations and other insights. Over 90% of any organization’s data is either unstructured or semi-structured. Therefore, creating a consistent structure for this data is extremely important for the success of any big data analytics effort.

Big Data Analytics

The Big Data Analytics process consists of identifying the big data repositories, building the statistical/mathematical algorithm, building the analytics model (one or a combination of descriptive, diagnostic, predictive and/or prescriptive models, collectively the Big Data Analytic Models), and executing by passing the data through the model. The model then provides insights into the data, which an expert analyzes in order to take some action. (Big Data Analytics: collecting, organizing and analyzing large sets of data to discover patterns that provide actionable insights.)
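As a toy illustration of that flow (repositories, then a model, then execution, then insights for an expert to act on), here is a minimal Python sketch. Everything in it is an assumption made for the example: the two in-memory "repositories", the record fields and the `descriptive_model` function, which stands in for only the simplest of the four model types.

```python
from statistics import mean

# Hypothetical repositories, modeled as in-memory lists for illustration.
repositories = {
    "crm": [
        {"region": "east", "amount": 100.0},
        {"region": "west", "amount": 300.0},
    ],
    "claims": [
        {"region": "east", "amount": 200.0},
    ],
}


def descriptive_model(records):
    """A descriptive model: summarize the count and mean amount per region."""
    by_region = {}
    for record in records:
        by_region.setdefault(record["region"], []).append(record["amount"])
    return {
        region: {"count": len(amounts), "mean": mean(amounts)}
        for region, amounts in by_region.items()
    }


# Execution: pass the data from all repositories through the model.
records = [rec for repo in repositories.values() for rec in repo]
insights = descriptive_model(records)
print(insights)
```

The sketch only works because both repositories already share the same field names, which is the kind of consistent structure IA is meant to provide.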

To analyze the data, the data must first be organized and understood. Since the majority of big data is unstructured and semi-structured, from various systems and repositories with no relationship to each other, applying IA through content modeling, metadata and taxonomic analysis will provide the structure necessary for big data analytics and its various tools (such as Hadoop, Spark, and MongoDB) to be effective.

Information architecture enables Big Data to rapidly explore and analyze any combination of structured, semi-structured and unstructured sources. Big Data requires information architecture to exploit relationships and synergies between your data. This infrastructure enables organizations to make decisions utilizing the full spectrum of your big data sources.

Content Visualization

Content Visualization is also enabled through Information Architecture. Content Visualization’s primary goal is to present content in an appealing and effective way. Content visualization takes content and makes it easily consumable by the user. This is an essential outcome of effectively applying IA to your content. The IA content model component identifies the content types and the relationships between content. Content modeling is the process that provides a visual representation of content in its appropriate context by identifying content types and their relationships and constructing content models. It also represents content in a way that captures the intention, stakeholder needs, and functional requirements so they can be translated into the user experience design and into something that can be built by developers. Content modeling is a critical portion of the implementation of your website, CMS and/or KMS. The content model is your first view of the content components that will be displayed to a user.

The other essential IA component for content visualization is building the content taxonomy, which will lead to the navigation scheme on the user interface displaying the content. The taxonomy will provide a conceptual framework for content retrieval. Incorporating a consistent taxonomy structure will classify and name the content in an orderly manner that will produce usable content visualization elements on any software solution providing content.
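To show how a taxonomy can drive a navigation scheme, here is a small Python sketch. The taxonomy terms and the `navigation_paths` helper are hypothetical; the point is only that a consistently structured hierarchy can be walked mechanically to produce the breadcrumb-style paths a user interface would display.

```python
# Hypothetical taxonomy, stored as nested dictionaries (term -> child terms).
TAXONOMY = {
    "Insurance": {
        "Underwriting": {"Guidelines": {}},
        "Pricing": {"Rate Manuals": {}},
    }
}


def navigation_paths(tree, prefix=()):
    """Walk the taxonomy depth-first and return one breadcrumb per term."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        paths.append(" > ".join(path))
        paths.extend(navigation_paths(children, path))
    return paths


for breadcrumb in navigation_paths(TAXONOMY):
    print(breadcrumb)
```

Because the navigation is derived from the taxonomy rather than maintained by hand, renaming or moving a term in one place updates every path that passes through it.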

Information Architecture provides the methods and tools for organizing, labeling, building relationships (through associations), and describing (through metadata) your unstructured content. Information Architecture is important not only for producing usable systems that provide content, making content more consumable for users and improving search and findability; it is also necessary on the front end to provide the structure that enables big data analytics to be effective for your organization. When deciding to implement IA, it is extremely important that you examine all the benefits it will bring to the organization and the ROI you can achieve, from big data analytics to content visualization!

Oct 31, 2017
 

Information Architecture is an enabler for Big Data Analytics. You may be asking why I would say this, or how IA enables Big Data Analytics. We need to remember that Big Data includes all data (i.e., Unstructured, Semi-structured, and Structured). The primary characteristics of Big Data (Volume, Velocity, and Variety) are a challenge to your existing architecture and to how you will effectively, efficiently and economically process data to achieve operational efficiencies.

In order to derive the maximum benefit from Big Data, organizations must be able to handle the rapid rate of delivery and extraction of huge volumes of data, with varying data types. This can then be integrated with the organization’s enterprise data and analyzed. Information Architecture provides the methods and tools for organizing, labeling, building relationships (through associations), and describing (through metadata) your unstructured content, adding this source to your overall pool of Big Data. In addition, information architecture enables Big Data to rapidly explore and analyze any combination of structured, semi-structured and unstructured sources. Big Data requires information architecture to exploit relationships and synergies between your data. This infrastructure enables organizations to make decisions utilizing the full spectrum of their big data sources.

| Information Architecture Element | Big Data Component: Volume | Big Data Component: Velocity | Big Data Component: Variety |
|---|---|---|---|
| Content Consumption | Provides an understanding of the universe of relevant content through performing a content audit. This contributes directly to the volume of available content. | This directly contributes to the speed at which content is accessed by providing the initial volume of the available content. | Identifies the initial variety of content that will be a part of the organization’s Big Data resources. |
| Content Generation | Fills gaps identified in the content audit by gathering the requirements for content creation/generation, which contributes directly to increasing the amount of content that is available in the organization’s Big Data resources. | This directly contributes to the speed at which content is accessed due to the fact that volumes are increasing. | Contributes to the creation of a variety of content (documents, spreadsheets, images, video, voice) to fill identified gaps. |
| Content Organization | Content Organization will provide business rules to identify relationships between content and create a metadata schema to assign content characteristics to all content. This contributes to increasing the volume of data available and in some ways leverages existing data to assign metadata values. | This directly contributes to improving the speed at which content is accessed by applying metadata, which in turn will give context to the content. | The Variety of Big Data will oftentimes drive the relationships and organization between the various types of content. |
| Content Access | Content Access is about search and establishing the standard types of search (i.e., keyword, guided, and faceted). This will contribute to the volume of data through establishing the parameters, oftentimes additional metadata fields and values, to enhance search. | Contributes to the ability to access content and the speed and efficiency with which content is accessed. | Contributes to how the variety of content is accessed. The Variety of Big Data will oftentimes drive the search parameters used to access the various types of content. |
| Content Governance | The focus here is on establishing accountability for the accuracy, consistency and timeliness of content, content relationships, metadata and taxonomy within areas of the enterprise and the applications that are being used. Content Governance will often “prune” the volume of content available in the organization’s Big Data resources by only allowing access to pertinent/relevant content, while either deleting or archiving other content. | When the volume of content available in the organization’s Big Data resources is trimmed through Content Governance it will improve velocity by making available a smaller, more pertinent universe of content. | When the volume of content available in the organization’s Big Data resources is trimmed through Content Governance the variety of content available may be affected as well. |
| Content Quality of Service | Content Quality of Service focuses on security, availability, scalability and usefulness of the content, and improves the overall quality of the volume of content in the organization’s Big Data resources by: (1) defending content from unauthorized access, use, disclosure, disruption, modification, perusal, inspection, recording or destruction; (2) eliminating or minimizing disruptions from planned system downtime; (3) making sure that the content that is accessed is from and/or based on the authoritative or trusted source, reviewed on a regular basis (based on the specific governance policies), modified when needed and archived when it becomes obsolete; (4) enabling the content to behave the same no matter what application/tool implements it, and to be flexible enough to be used at an enterprise level as well as a local level without changing its meaning, intent of use and/or function; (5) tailoring the content to the specific audience to ensure that the content serves a distinct purpose, is helpful to its audience and is practical. | Content Quality of Service will eliminate or minimize delays and latency from your content and business processes by speeding up analysis and decision-making, directly affecting the content’s velocity. | Content Quality of Service will improve the overall quality of the variety of content in the organization’s Big Data resources through aspects of security, availability, scalability, and usefulness of content. |

The table above aligns key information architecture elements to the primary components of Big Data. This alignment will facilitate a consistent structure in order to effectively apply analytics to your pool of Big Data. The Information Architecture Elements include Content Consumption, Content Generation, Content Organization, Content Access, Content Governance and Content Quality of Service. It is this framework that will align all of your data to enable business value to be gained from your Big Data resources.
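The Content Access element names keyword, guided and faceted search. As a rough illustration of two of them, the Python sketch below runs a keyword search and a faceted search over a handful of content records. The sample records, the metadata field names and the `search` and `facet_counts` helpers are all hypothetical, and guided search is not shown.

```python
from collections import Counter

# Hypothetical content records with metadata, invented for illustration.
CONTENT = [
    {"id": "doc-1", "text": "Auto rate manual for 2018",
     "metadata": {"domain": "pricing", "kind": "manual"}},
    {"id": "doc-2", "text": "Underwriting guidelines for auto policies",
     "metadata": {"domain": "underwriting", "kind": "guideline"}},
    {"id": "doc-3", "text": "Home rate manual",
     "metadata": {"domain": "pricing", "kind": "manual"}},
]


def search(items, keyword=None, **facets):
    """Keep items whose text contains the keyword and whose metadata matches every facet."""
    hits = []
    for item in items:
        if keyword is not None and keyword.lower() not in item["text"].lower():
            continue
        if all(item["metadata"].get(name) == value for name, value in facets.items()):
            hits.append(item)
    return hits


def facet_counts(items, facet):
    """Count how many items carry each value of one metadata field."""
    return Counter(item["metadata"].get(facet) for item in items)


keyword_hits = [item["id"] for item in search(CONTENT, keyword="auto")]
faceted_hits = [item["id"] for item in search(CONTENT, keyword="rate", domain="pricing")]
print(keyword_hits)
print(faceted_hits)
print(facet_counts(CONTENT, "domain"))
```

The faceted query is only possible because every record carries the same metadata fields, which is how Content Organization and Content Access depend on each other in the table.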

Note: This table originally appeared in the book Knowledge Management in Practice (ISBN: 978-1-4665-6252-3) by Anthony J. Rhem.

Feb 29, 2016
 

On January 12, 2016, in his State of the Union address, President Obama called for America to become “the country that cures cancer once and for all” as he introduced the “Moonshot” initiative that will be guided by Vice President Joe Biden.

Dr. Tom Coburn, former Republican Senator from the state of Oklahoma and three-time cancer survivor, in his January 14th article in the Wall Street Journal, “A Cancer ‘Moonshot’ Needs Big Data,” indicates that “harnessing that information (“big data”) would allow us to personalize prevention and treatment based on the genetic characteristics of a patient’s tumor, family history and personal preferences, while minimizing unwanted side effects.”

On February 5, 2016, in the segment “Big data could be a health care game-changer” on CNN’s Global Public Square show, author and doctor David Agus tells Fareed Zakaria how using big data and examining thousands of cases might increase how long we live and our quality of life.

At this time in our history, with the continuing electronic capture of patient information from intake to discharge, the opportunity could not be brighter to cure cancer. The Obama administration’s 2010 initiative to capture electronic health records has enabled the opportunity to improve patient care, increase patient participation, improve diagnostics and patient outcomes, improve care coordination, as well as create practice efficiencies and cost savings.

The electronic capture of patient information has created medical big data repositories. One such repository is the American College of Surgeons/American Cancer Society’s National Cancer Database (NCDB). Resources such as these will benefit from knowledge management and information architecture techniques that identify and unlock knowledge patterns contained within these big data sources. In several of my blog posts dating back to January 2013, I wrote about the advantages of applying KM to big data. From understanding Contextual Intelligence, KM and Big Data to devoting a chapter to KM and Big Data in my upcoming book KM in Practice, I believe that, when executed the right way, KM powered by information architecture will provide the essential ingredient when applied to big data. This will enable researchers to discover better treatments and possible cures for many diseases including cancer, and we will realize the dream presented by the Moonshot initiative!

Jan 31, 2015
 

After a year-long hiatus, I am back facilitating the flow of knowledge through the Knowledge Management Depot. During my absence we concluded 2013 in KM, which presented an increase of social media tools being incorporated in the workplace, the rise of analytics and BIG Data to tie KM to actionable business results, and knowledge-related tools on mobile applications. In the year just concluded (2014) we experienced more enterprise collaboration and a rise in search-related tools and functionality (incorporating Information Architecture) within mobile and enterprise applications to improve findability and to respond to customer inquiries more effectively and efficiently. Now as we enter 2015, I see several opportunities where KM will make an impact.

In 2015 KM will impact M&A transactions, specifically when it comes to understanding who the key knowledge holders are and properly valuing a firm’s knowledge. The legal community is experiencing success with KM, and more legal entities will be leveraging KM in 2015. BIG Data continues to make noise in the industry, and how KM is positioned to glean knowledge from all of this proliferation of content will be critical to organizations (NASA-KM-meeting-Big-Data-Strategy). Finally, interacting with the customer will continue to leverage KM to provide organizations with a competitive edge, not only to attract new customers but also to retain and provide more interaction with the current customer base (Forrester’s Top Trends in Customer Service).

Although I have been absent… I have been busy!! I am concluding my next book on KM, “Knowledge Management in Practice,” as well as a two (2) day class in Information Architecture for Knowledge Management Systems. I look forward to your comments and to participating in knowledge management as 2015 unfolds!