Feb 04, 2018

Source: https://www.analyticsvidhya.com

In a previous blog post I wrote about how Information Architecture is an enabler for Big Data Analytics. This post focuses on how Information Architecture is used to enable Big Data Analytics and essentially becomes AI's secret ingredient!

Most of the discussion around Artificial Intelligence (AI) focuses on Machine Learning, Cognitive Computing and Big Data Analytics. However, you must prepare your organization's data in order to properly take advantage of AI tools focused on Big Data Analytics (such as those from IBM Watson, Amazon, Microsoft, and Google). To properly prepare your data you will need to apply Information Architecture.

Information Architecture (IA) is AI's secret ingredient. IA provides the processes, procedures and methods to perform content curation on semi-structured and unstructured data, which comprises over 90% of the data analyzed by big data analytics. Semi-structured data is a form of data that does not conform to the formal structure of data in a database or data tables, but contains tags to separate elements and enforce hierarchies within the data (e.g., spreadsheets, XML files). Unstructured data is a form of data with no tagging, metadata or inherent structure associated with it (e.g., image, text, voice, video). Content typically refers to the container that the semi-structured and unstructured data resides in (e.g., .pdf, .doc, .xml, .ppt, .csv).
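
To make the distinction concrete, here is a small sketch (the element names and sample values are invented for illustration): semi-structured data carries tags a parser can exploit, while unstructured data offers no such handles.

```python
import xml.etree.ElementTree as ET

# Semi-structured data: tags separate elements and enforce a hierarchy,
# even though there is no fixed database schema behind it.
semi_structured = """
<policy>
  <holder>Jane Doe</holder>
  <coverage type="auto"><limit>50000</limit></coverage>
</policy>
"""

root = ET.fromstring(semi_structured)
holder = root.find("holder").text               # a parser can navigate the tags
coverage_type = root.find("coverage").get("type")

# Unstructured data: the same facts, but with no tags or metadata;
# any structure must be inferred (e.g., by text analytics).
unstructured = "Jane Doe holds an auto policy with a 50,000 limit."
```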

Content curation enables the extraction of value from data, and it is a capability required in areas that depend on complex and/or continuous data integration and classification. Improving data curation tools and methods directly increases the efficiency of the knowledge discovery process, maximizes return on investment per data item through reuse, and improves organizational transparency.

Most organizations deal only with creating a common data model depicting all the structured data in the organization. Although this is a great start, it is only part of your data picture. Creating a common or enterprise content model depicting semi-structured and unstructured data will complete your data picture and lay the foundation for data centralization, providing the most accurate and holistic representation of your organization's data.

Creating a common view of your semi-structured and unstructured data is a daunting task! Due to the massive amount of data and the variety of sources, it is important to start small, splitting the data into specific domains. To align your common data model and content model you must have common terms and consistent structures. In particular, for your semi-structured and unstructured data, applying consistent metadata to fully describe the data is important.
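
As a sketch of what "applying consistent metadata" can look like in practice (the schema fields and sample values here are hypothetical, not a standard):

```python
from dataclasses import dataclass, field

# Hypothetical metadata schema -- the field names are illustrative only.
@dataclass
class ContentMetadata:
    title: str
    content_type: str        # e.g. "document", "spreadsheet", "image"
    domain: str              # the business domain the item belongs to
    source_system: str       # the system of record the item came from
    keywords: list = field(default_factory=list)

# Applying the same schema to items from different sources gives
# otherwise-unstructured content a consistent, queryable description.
items = [
    ContentMetadata("Underwriting Guide", "document", "underwriting", "DMS"),
    ContentMetadata("Rate Manual 2018", "spreadsheet", "pricing", "SharePoint"),
]

# A domain-level view now falls out of the shared metadata:
pricing_titles = [i.title for i in items if i.domain == "pricing"]
```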

Content Curation Process for Big Data Analytics

Content curation provides the methodological and technological data management support to address data quality issues; maximize the usability of the data; provide active, ongoing management of data through its lifecycle; perform data discovery and retrieval; create and maintain quality; add value; and provide for re-use over time.

The content curation process includes the following activities:

  • Content Audit
    • Gather the requirements for content, as well as measurement and evaluation criteria. Perform a Content Audit of the various content sources under consideration: determine what content is ready to be consumed, evaluate the quality of the content, determine the gaps in content, and identify the measurements that determine what content is used (and not used).
  • Content Analysis
    • Content analysis examines information concepts, relationships, business rules and metadata. This provides a sharable, stable and organized structure for content (information and knowledge) for the enterprise. Semantics will address the meaning of the concepts identified in the content model as well as the meanings of the relationships between the concepts (usually expressed as business rules).
  • Address Content Gaps
    • The results of performing a Content Audit will determine the gaps in content and identify the additional sources of content that are needed for effective big data analytics.
  • Content Selection & Validation
    • Content selection should be considered in terms of significance (how essential or basic the content is to the discipline), validity (is the content accurate, current and relevant to the domain under consideration?), relevance (what is the discipline/workplace/societal value of this content?) and utility (how useful will the content be to the overall domain under consideration?). Validation, or content validity, is concerned with making sure that the content is accessed from and/or based on the authoritative or trusted source, reviewed on a regular basis (based on the specific governance policies), modified when needed and archived when it becomes obsolete.
  • Classification
    • The classification of content will take the form of one or more ontologies/taxonomies. Classification of information will also be realized through controlled vocabularies and thesauri. Structure refers to the methods used to aggregate the concepts and metadata into the domain ontology/taxonomy.
  • Align Content to Domain Ontology/taxonomy
    • Categorizing content and aligning the content to a common ontology/taxonomy is essential to big data analytics due to the varied number of data sources under consideration.
  • Transformation
    • Transformation provides a consistent look and feel between similar content types: content is consistently identified with standard and precise metadata and aligned to an accurate and exact ontology/taxonomy.
  • Preservation and Governance

Preservation and Governance has the following characteristics:

    • Content and Classification Stewardship: The focus here is on establishing accountability for the accuracy, consistency and timeliness of content, content relationships, metadata and taxonomy within areas of the enterprise and the applications that are being used.
    • IA Management and Maintenance: This refers to the specific details on how the enterprise manages and maintains changes to content, content relationships, metadata and taxonomy. This is facilitated through the use of specific process and workflows.
    • Policies and Procedures: This refers to establishing and/or conforming to information policies for the generation, consumption and access of content (information and knowledge). It also addresses how information is handled; the organization maintains detailed information policies associated with specific information types (e.g., Underwriting Guidelines, Rate Manuals, Pricing Strategies).
    • Enforcement: Enforcement of governance pertains to the implementation and execution of the policies and procedures identified in the governance plan. Establishment of a governance board will be the organizational entity to carry out the enforcement of governance, while the applications/tools must be configured to enforce governance of content on a day-to-day basis.
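
The curation activities above can be sketched as a simple pipeline. The stage names mirror the list; the audit threshold, taxonomy entries and file names are all stand-ins for illustration, not a real implementation.

```python
# Minimal sketch of the content curation pipeline described above.
# Each stage is a placeholder; a real implementation would call out to
# audit tools, classifiers, and a governance workflow.

def audit(items):
    """Content Audit: keep only items that meet the quality criteria."""
    return [i for i in items if i.get("quality", 0) >= 0.8]

def classify(items, taxonomy):
    """Classification: align each item to a node in the domain taxonomy."""
    for i in items:
        i["category"] = taxonomy.get(i["type"], "uncategorized")
    return items

def transform(items):
    """Transformation: apply consistent metadata to similar content types."""
    for i in items:
        i.setdefault("metadata", {})["curated"] = True
    return items

taxonomy = {"pdf": "documents", "csv": "data-sets"}
raw = [
    {"name": "rates.csv", "type": "csv", "quality": 0.9},
    {"name": "draft.pdf", "type": "pdf", "quality": 0.4},  # fails the audit
]
curated = transform(classify(audit(raw), taxonomy))
```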

There is the need to discover patterns and create models to address a specific task or business objective. Semi-structured and unstructured data is vital to the decision-making process. Defining a structured representation associated with the data allows users to compare, aggregate, and transform the data. With more data available, the barrier of data acquisition is reduced. To extract value from the data, it needs to be systematically processed, transformed, and repurposed into a new context. Curation of semi-structured and unstructured data in big data analytics is driven by the need to reduce time-to-market, reduce the time to create new products, repurpose existing content, and improve the accessibility and visibility of information artifacts. Curation is important because of the growth in the variety of sources used in Big Data. Selecting your data from a variety of well-curated sources will add richness to your Big Data Analytics results!


Dec 18, 2017

Image source: http://hpc-asia.com/japans-nec-opens-10mn-centre-for-big-data-analytics/

The use of Information Architecture (IA) covers the spectrum from Big Data Analytics to Content Visualization. In my previous post, "Information Architecture and Big Data Analytics," I indicated how IA is an enabler for Big Data Analytics and that Big Data includes all data (i.e., unstructured, semi-structured, and structured). Big data analytics examines large amounts of data to uncover hidden patterns, correlations and other insights. Over 90% of any organization's data is either unstructured or semi-structured. Therefore, creating a consistent structure for this data is extremely important to the success of any big data analytics effort.

Big Data Analytics

The Big Data Analytics process consists of identifying the big data repositories, building the statistical/mathematical algorithm, building the analytics model (which could be one or a combination of descriptive, diagnostic, predictive and/or prescriptive models), and executing the analysis by passing the data through the model. The model will provide insights into the data, which will in turn be analyzed by an expert in order to take some action. (Big Data Analytics: collecting, organizing and analyzing large sets of data to discover patterns that provide actionable insights.)
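
As a toy illustration of that process flow, with a trivial threshold rule standing in for a real analytic model (the data values and the 1.5x threshold are invented):

```python
from statistics import mean

# Toy end-to-end sketch: identify a repository, build a (trivial) model,
# then execute by passing the data through the model.
repository = [12, 15, 41, 9, 38, 14]   # e.g. daily claim counts

baseline = mean(repository)            # descriptive step: what is typical?

def model(value, threshold=baseline):
    """A stand-in 'diagnostic' rule flagging unusual observations."""
    return "anomaly" if value > 1.5 * threshold else "normal"

# Execution: every data point passes through the model; an expert would
# then review the flagged values and decide what action to take.
insights = [(v, model(v)) for v in repository]
```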

Before the data can be analyzed, it must first be organized and understood. Since the majority of big data is unstructured and semi-structured, coming from various systems and repositories with no relationship to each other, applying IA through content modeling, metadata and taxonomic analysis will provide the structure necessary for big data analytics and its various tools (such as Hadoop, Spark, and MongoDB) to be effective.
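
A brief sketch of why this matters: once consistent metadata and taxonomy terms are attached, unstructured documents become records an analytics tool can filter and aggregate. The field names and taxonomy paths below are invented for illustration.

```python
# Unstructured text wrapped with consistent metadata becomes a queryable
# record -- the shape a document store such as MongoDB (or a Spark
# DataFrame) could ingest directly.
documents = [
    {"text": "Claim approved after review ...",
     "metadata": {"type": "claim-note", "taxonomy": "claims/auto"}},
    {"text": "Quarterly pricing summary ...",
     "metadata": {"type": "report", "taxonomy": "pricing/quarterly"}},
]

# A simple analytic question, answerable only because of the metadata:
claims_docs = [d for d in documents
               if d["metadata"]["taxonomy"].startswith("claims/")]
```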

Information architecture enables Big Data to rapidly explore and analyze any combination of structured, semi-structured and unstructured sources. Big Data requires information architecture to exploit relationships and synergies between your data. This infrastructure enables organizations to make decisions utilizing the full spectrum of your big data sources.

Content Visualization

Content Visualization is also enabled through Information Architecture. Content Visualization's primary goal is to present content in an appealing and effective way. Content visualization takes content and makes it easily consumable by the user. This is an essential outcome of effectively applying IA to your content. The IA content model component identifies the content types and the relationships between content. Content modeling is the process that provides a visual representation of content in its appropriate context through the identification of content types and their relationships and the construction of content models. This also allows content to be represented in a way that translates the intention, stakeholder needs, and functional requirements into the user experience design and into something that can be built by developers. Content modeling is a critical portion of the implementation of your website, CMS and/or KMS. The content model is your initial exposure to the content components that will be displayed to a user.

The other essential IA component for content visualization is building the content taxonomy, which will lead to the navigation scheme on the user interface displaying the content. The taxonomy will provide a conceptual framework for content retrieval. Incorporating a consistent taxonomy structure will classify and name the content in an orderly manner that will produce usable content visualization elements on any software solution providing content.
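
A minimal sketch of how a taxonomy can drive a navigation scheme (the category names are hypothetical): the same nested structure that classifies content also renders as the interface's navigation outline.

```python
# A content taxonomy expressed as a nested dict; the same structure
# drives the navigation scheme of the interface displaying the content.
taxonomy = {
    "Products": {"Auto": {}, "Home": {}},
    "Support": {"FAQs": {}, "Contact": {}},
}

def to_menu(node, depth=0):
    """Render the taxonomy as an indented navigation outline."""
    lines = []
    for label, children in node.items():
        lines.append("  " * depth + label)
        lines.extend(to_menu(children, depth + 1))
    return lines

menu = to_menu(taxonomy)
```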

Information Architecture provides the methods and tools for organizing, labeling, building relationships (through associations), and describing (through metadata) your unstructured content. Information Architecture is not only important for producing usable systems that provide content, making content more consumable for users and improving search and findability. IA is also necessary up front to provide the structure that enables big data analytics to be effective for your organization. When deciding to implement IA, it is extremely important to examine all the benefits it will bring to the organization and the ROI you can achieve, from big data analytics to content visualization!

Oct 31, 2017

Information Architecture is an enabler for Big Data Analytics. You may be asking why I would say this, or how IA enables Big Data Analytics. We need to remember that Big Data includes all data (i.e., unstructured, semi-structured, and structured). The primary characteristics of Big Data (Volume, Velocity, and Variety) challenge your existing architecture and how you will effectively, efficiently and economically process data to achieve operational efficiencies.

In order to derive the maximum benefit from Big Data, organizations must be able to handle the rapid rate of delivery and extraction of huge volumes of data, with varying data types. This can then be integrated with the organization’s enterprise data and analyzed. Information Architecture provides the methods and tools for organizing, labeling, building relationships (through associations), and describing (through metadata) your unstructured content adding this source to your overall pool of Big Data. In addition, information architecture enables Big Data to rapidly explore and analyze any combination of structured, semi-structured and unstructured sources. Big Data requires information architecture to exploit relationships and synergies between your data. This infrastructure enables organizations to make decisions utilizing the full spectrum of your big data sources.

Information Architecture Elements Aligned to the Big Data Components

  • Content Consumption
    • Volume: Provides an understanding of the universe of relevant content through performing a content audit. This contributes directly to the volume of available content.
    • Velocity: Directly contributes to the speed at which content is accessed by providing the initial volume of the available content.
    • Variety: Identifies the initial variety of content that will be part of the organization's Big Data resources.
  • Content Generation
    • Volume: Fills gaps identified in the content audit by gathering the requirements for content creation/generation, which directly contributes to increasing the amount of content available in the organization's Big Data resources.
    • Velocity: Directly contributes to the speed at which content is accessed, since volumes are increasing.
    • Variety: Contributes to the creation of a variety of content (documents, spreadsheets, images, video, voice) to fill identified gaps.
  • Content Organization
    • Volume: Provides business rules to identify relationships between content and creates a metadata schema to assign content characteristics to all content. This increases the volume of data available and, in some cases, leverages existing data to assign metadata values.
    • Velocity: Directly improves the speed at which content is accessed by applying metadata, which in turn gives context to the content.
    • Variety: The variety of Big Data will often drive the relationships and organization between the various types of content.
  • Content Access
    • Volume: Content Access is about search and establishing the standard types of search (i.e., keyword, guided, and faceted). This contributes to the volume of data by establishing search parameters, often additional metadata fields and values that enhance search.
    • Velocity: Contributes to the ability to access content and the speed and efficiency with which content is accessed.
    • Variety: Contributes to how the variety of content is accessed. The variety of Big Data will often drive the search parameters used to access the various types of content.
  • Content Governance
    • Volume: Establishes accountability for the accuracy, consistency and timeliness of content, content relationships, metadata and taxonomy within areas of the enterprise and the applications being used. Content Governance will often "prune" the volume of content available in the organization's Big Data resources by only allowing access to pertinent/relevant content, while deleting or archiving the rest.
    • Velocity: Trimming the volume of content through Content Governance improves velocity by making available a smaller, more pertinent universe of content.
    • Variety: Trimming the volume of content through Content Governance may affect the variety of content available as well.
  • Content Quality of Service
    • Volume: Focuses on the security, availability, scalability and usefulness of content, improving the overall quality of the volume of content in the organization's Big Data resources by: defending content from unauthorized access, use, disclosure, disruption, modification, perusal, inspection, recording or destruction; eliminating or minimizing disruptions from planned system downtime; making sure that content is accessed from and/or based on the authoritative or trusted source, reviewed on a regular basis (based on the specific governance policies), modified when needed and archived when it becomes obsolete; enabling content to behave the same no matter what application/tool implements it, and to be flexible enough to be used at the enterprise level as well as the local level without changing its meaning, intent of use and/or function; and tailoring content to the specific audience to ensure that it serves a distinct purpose, is helpful to its audience and is practical.
    • Velocity: Eliminates or minimizes delays and latency in content and business processes, speeding the ability to analyze and make decisions and directly affecting the content's velocity.
    • Variety: Improves the overall quality of the variety of content in the organization's Big Data resources through security, availability, scalability, and usefulness of content.

The alignment above maps key information architecture elements to the primary components of Big Data. This alignment will facilitate a consistent structure in order to effectively apply analytics to your pool of Big Data. The Information Architecture Elements include: Content Consumption, Content Generation, Content Organization, Content Access, Content Governance and Content Quality of Service. It is this framework that will align all of your data to enable business value to be gained from your Big Data resources.

Note: This table originally appeared in the book Knowledge Management in Practice (ISBN: 978-1-4665-6252-3) by Anthony J. Rhem.

Mar 31, 2017

There are approximately 22,000 new cases of lung cancer each year, with an overall 5-year survival rate of only ~18 percent (American Cancer Society). The economic burden of lung cancer, based on per-patient cost alone, is estimated at $46,000 per patient (Lung Cancer journal). Treatment efforts using drugs and chemotherapy are effective for some; however, more effective treatment has been hampered by the inability of clinicians to better target treatments to patients. It has been determined that Big Data holds the key to providing clinicians with the ability to develop more effective, patient-centered cancer treatments.

Analysis of Big Data may also improve drug development by allowing researchers to better target novel treatments to patient populations. Providing the ability for clinicians to harness Big Data repositories to develop better targeted lung cancer treatments and to enhance the decision-making process to improve patient care can only be accomplished through the use of cognitive computing. However, having a source or sources of data available to “mine” for answers to improve lung cancer treatments is a challenge!

There is also a lack of available applications that can take advantage of Big Data repositories to recognize patterns of knowledge and extract that knowledge in any meaningful way. The extraction of knowledge must be presented in a way that researchers can use to improve patient centric diagnosis and the development of patient centric treatments. Having the ability to use cognitive computing and KM methods to uncover knowledge from large cancer repositories will provide researchers in hospitals, universities, and pharmaceutical companies with the ability to use Big Data to identify anomalies, discover new treatment combinations and enhance diagnostic decision making.

Content Curation

An important aspect of cognitive computing and Big Data is the ability to perform a measure of content curation. The lung cancer Big Data environment to be analyzed should include both structured and unstructured data (unstructured being documents, spreadsheets, images, video, etc.). In order to ingest the data from the Big Data resource, the data will need to be prepared. This preparation includes applying Information Architecture (IA) to the unstructured data within the repository. Understanding the organization and classification schemes relating to the data, both structured and unstructured, is essential to unifying the data into one consistent ontology.

Are We Up for the Challenge!

Even if a Big Data source were available and content curation were successful, the vast amounts of patient data are governed by HIPAA laws, which makes it difficult for researchers to gain access to clinical and genomic data shared across multiple institutions or firms, including research institutions and hospitals. According to Dr. Tom Coburn in his January 14th Wall Street Journal article, "A Cancer 'Moonshot' Needs Big Data," gaining access to a big data repository inclusive of all patient-specific data is essential to offering patient-centered cancer treatments. Besides the technology challenges, there are data and regulation challenges. I'm sure that many of these challenges are being addressed; thus far, however, there have been no solutions. Are we up for the challenge? Big Data analysis could help tell us which cancer patients are most likely to be cured with standard approaches, and which need more aggressive treatment and monitoring. It is time we solve these challenges to make a moonshot a certain reality!

Jun 30, 2016

Content modeling is a powerful tool for fostering communication and alignment between User Experience (UX) design, editorial, and technical resources on an Information Architecture effort. By clearly defining the content domains, content types, content attributes (metadata) and relationships, we can make sure that the envisioned content strategy becomes a reality for the content creators.

The Content Model is a logical depiction of what an organization knows about things of interest to the business and graphically shows how they relate to each other in an entity relationship (ER) diagram or class diagram. An entity relationship diagram is an abstract conceptual representation of structured data. It uses standard symbols to denote the things of interest to the business (entities), the relationships between entities, and the cardinality and optionality of those relationships. The Content Model contains detailed characteristics of the content types or concepts, their attributes or properties, and their definitions. It is the result of detailed analysis of the business requirements.

When starting a content modeling effort, it is important to begin with a high-level (conceptual) content model. The conceptual content model is the first output from content modeling. After some initial work identifying, naming and agreeing on which content domains and content types are important within your problem domain, you are ready to structure them together into a conceptual content model.

It is essential that content strategists, information architects and business stakeholders engage with content modeling early in the process. These are the people best positioned to find and classify content types that make sense for the business. They bring an understanding of why content needs to be structured, named and related in a certain way. In addition, the business subject matter experts bring knowledge of the rules about content that drive the naming of content types and the determination of relationships between them.

Finding Content Types

Content types live in existing web sites, customer call centers (call logs), product documentation, and communications; as input and output of processes and functions; and in the minds of people performing various tasks. The mission is to find them, document them and define them. Here are other reasons to make something a separate type of content:

  1. Distinct, reusable elements. You might decide to create an Author content type that contains the name, bio and photo of each author. These can then be associated with any piece of content that person writes.
  2. Functional requirements. A Video might be a different type of content because the presentation layer needs to be prepared to invoke the video player.
  3. Organizational requirements. A Press Release may be very similar to a general Content Page, but only the Press Release is going to appear in an automatically aggregated Newsroom. It’s easier for these to be filtered out if they’re a unique type of content.
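
The Author example above can be sketched as two related content types (the attribute names are illustrative, not a CMS schema): the reusable Author element is associated with, not duplicated on, each piece of content.

```python
from dataclasses import dataclass

# Sketch of the reusable Author content type described above.
@dataclass
class Author:
    name: str
    bio: str
    photo_url: str

@dataclass
class Article:
    title: str
    body: str
    author: Author   # association to the shared Author element

jane = Author("Jane Doe", "Technology writer.", "https://example.com/jane.jpg")
post = Article("Content Modeling 101", "...", jane)
another = Article("Taxonomy Basics", "...", jane)  # same Author, reused
```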

Content models progress along a continuum of constant refinement; there are three important stages in the content modeling lifecycle:

  • Conceptual: The initial content model which aims to capture the content domains, content types and high level relationships between content types.
  • Design: Adds the descriptive elements (metadata) to each content type and further refines the structural relationships between them.
  • Implementation: Models the content within the context of the target technology, e.g. CMS, Search Engines, Semantic Tools, etc.

Remember Content is KING!

Apr 18, 2015

A lot has been said about the next big movement: the Internet of Things (IoT). Simply put, IoT is a massive network of connected devices and/or objects (which also includes people). The relationships will be people-to-people, people-to-device, and device-to-device. These devices will have network connectivity, allowing them to send and receive data.

The IoT will lead to Smart Grids and Smart Cities and Information Architecture (IA) will enable a “smart architecture” for delivering content in the right context to the right device (or object)!

So where does IA come into this scenario?

IA is all about connecting people to content (information and knowledge) and it is this ability that is at the core of enabling a myriad of devices and/or objects to connect and to send and receive content. It delivers that “smart architecture”.

The larger amounts of data brought in through the internet need a viable and clear information architecture to deliver consistency across a variety of devices. IA offers a viable approach in which content (information and knowledge) can be represented in a flexible, object-oriented fashion. However, whatever option is used for representing content, it will have to define the "base" structure for all human content, everywhere. This, of course, is impossible.

It’s impossible because we simply cannot comprehend the extent of all content that is or will be available. This fully flexible object-oriented structure will need to be built similarly to how the human genome project scientists map DNA. This will allow the structure to continue to evolve and grow, which will continue to enable the delivery of content to devices and objects as they become connected to the internet.