
Learner Data

An introduction to Learning Analytics *

By Pierre-Julien Guay. February 1, 2016


Pierre-Julien Guay contributes to the development of the ISO/IEC 20748 series of standards for learning analytics as part of the ISO/IEC JTC1 SC36 committee, which is tasked with developing international standards in the field of information technology for education, training and learning. Alexandre Enkerli explores issues surrounding the use of learning data in Quebec's collegiate environment.


Many types of learning activities are currently conducted with sophisticated digital tools. This technology is essential to the online courses that have come to dominate distance learning. It has also become an increasingly important part of in-class training, and an essential component of informal learning and in-company training.

These educational materials are distributed via a broad range of platforms and media, including the Web (accessible by desktop computer as well as by smartphone), wearable technology (sensors, smart watches) and even the Internet of Things. It is now possible to harvest data from a significant portion of learning activities and analyze learners' progression.

This article, after establishing the uses for these analyses, will examine the different types of data gathered and the pre-processing used for its analysis.

Applications of learning analytics

The most common use for data on learning is the construction of a dashboard that compares the learning pathways of a single learner with larger groups that can vary in size from a single team to a class group or even an entire establishment. This type of tool is used in environments such as Moodle (the Learning Analytics Enriched Rubric module and SmartKlass™) and Brightspace, as well as in tools supplied by large American publishers such as Pearson Learning Studio.

The learner’s dashboard charts their standing along several dimensions, including level of progression, test-score histograms and number of visits. One of its principal goals is to involve learners in managing and planning their own activities.

The instructor’s dashboard allows for the screening of learners having difficulty based on pre-established criteria. This allows the instructor to act promptly and provide the means required to help. Some tools even go so far as to claim predictive analysis: the frequency and duration of visits over the first three weeks of a course appear to be a reliable predictor of success or failure in that course.

Another application consists of rapidly detecting learning difficulties in individual learners. Language problems (dysorthographia and dyslexia), arithmetic problems (dyscalculia) and even motor skills problems (dyspraxia) can skew learning achievements. Rapid detection and remediation can make a notable difference; especially when the intervention takes into account all the factors resulting from the assessment.

In certain cases, instructors and administrators can even access a broader pool of data for all learners in a single organization. As a result, managers can assign employee-specific training, and instructors can confer with colleagues about the absentee rate of a specific learner in their course.

Analyses of class groups can also be performed as a straightforward quality assurance measure. Because this is no longer an individual coaching effort, best practices include dissociating personal information from learning pathways to preserve anonymity, ensure data confidentiality and protect the privacy of [1] all associated parties (including the relatives and associates of learners).

Some promoters of learning analytics focus on automated pathways for personalized learning based on personal preferences and the material’s level of difficulty. However, this function requires access to a precise and stable representation of the curricula structure, a list of associated skills and a sizeable collection of digital resources with clearly defined parameters. For the time being, this information is rarely found in interoperable digital formats. 

Rather than being interpreted as a solid structure of precise facts or robust prescriptions, the results of these analyses should instead be understood as comments, or rather suggestions. Predictive patterns are based on a simplified model of learning and do not take into account the intrinsic complexity of individuals, their personal backgrounds or the greater context that surrounds learning.

Image: Data flow of learning analytics

Data collection

Though the big data movement aims for exhaustiveness, it is important to highlight that much information is still absent from digital platforms. Additionally, a large part of learning activities take place offline, even for online courses.

For learning analytics, the main source of data about the learner is the learning environment itself. Elements such as the time and duration of a session spent consulting a pedagogical resource, the external resources consulted and the results of a questionnaire are all typically retained by these tools.

Specific data formats have been developed for learning analytics, such as the xAPI specification and the Caliper system. These formats are based on actor-verb-object statements about learners that read like natural language. For example:

  •  Learner X has sent a message to Y
  • Learner X has watched the 123.avi file in its entirety

Several publishing houses provide educational activity modules that can be used on their own, outside of learning platforms. Associating the data from these activities with CMS and LMS platforms can be very useful. The data produced by SCORM modules can be transformed into xAPI using the CMI-5 application profile.

Several types of learning activities take place online. However, many of them happen outside of dedicated learning environments, for example on social networks. Using formats such as Caliper and xAPI, there are many possibilities for interconnecting this external data with specialized platforms.

 Given the constant evolution of practices, it should come as no surprise that information can also be transmitted via Web sensors. A webcam can be used to track eye movements during reading or when looking at a visual work. Other tools can read the contraction of muscles during facial expressions to discern the learner’s emotions. This information can be complemented by other data on movement and health and the detection of physical presence in an established location. Geolocation data sent by mobile devices and the Internet of Things are also frequently usable.

Other information coded a posteriori, such as observations of interactions within a group, along with elements from admission services, registrars and student files, can be used as well.

 Data processing and storage

 Data encoded using xAPI or Caliper are typically stored in a Learning Record Store or LRS; though beforehand, some adjustments or additions may be necessary.

 Some learners (or their legal guardians) may prefer not to have their learning pathways retained or used at all. Some devices and applications lack a simple interface to deactivate the transmission of data footprints. This is why it is important to have a data import filter in place. The protection of private data is an essential condition for the public acceptance of learning analytics systems, as demonstrated in the United States with the resounding failure of InBloom, a service that collected data about learners’ criminal offences, their management by social services, academic sanctions, diagnoses of intellectual disability, autism and physical disabilities—to name but a few. The model proposed by ISO/IEC 20748 keeps the information registered outside of the data stream.
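Such an import filter can be sketched as a simple predicate applied to each incoming statement before it reaches the LRS. The opt-out list and the field used to identify the learner are hypothetical choices for illustration:

```python
# Hypothetical opt-out filter applied before statements reach the LRS.
# Learners (or their guardians) on the opt-out list have their
# statements dropped at import time rather than stored.
OPTED_OUT = {"mailto:learner.y@example.edu"}

def accept_statement(statement: dict) -> bool:
    """Return True only if the statement's actor has not opted out."""
    actor_id = statement.get("actor", {}).get("mbox")
    return actor_id not in OPTED_OUT

incoming = [
    {"actor": {"mbox": "mailto:learner.x@example.edu"},
     "verb": {"id": "http://adlnet.gov/expapi/verbs/launched"}},
    {"actor": {"mbox": "mailto:learner.y@example.edu"},
     "verb": {"id": "http://adlnet.gov/expapi/verbs/launched"}},
]

# Only statements from learners who have not opted out are kept.
stored = [s for s in incoming if accept_statement(s)]
print(len(stored))
```

A production filter would also need to honour opt-outs retroactively, removing already-stored pathways, as the policy statements at the end of this article suggest.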

Data compliant with xAPI and Caliper can be validated against application profiles. The main xAPI application profiles are CMI-5 for SCORM data and W3C Activity Streams for interactions on social networks. Other data must be converted.
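At its simplest, profile validation checks that a statement carries the structure the profile requires. The sketch below is a deliberately minimal structural check; a real profile such as CMI-5 imposes far stricter rules on verbs, context and sequencing:

```python
# Minimal structural validation of an xAPI-style statement.
# A real application profile (e.g. CMI-5) is far stricter; the
# required keys here are a simplified illustration.
REQUIRED_KEYS = ("actor", "verb", "object")

def validate(statement: dict) -> list:
    """Return the list of missing required keys (empty if valid)."""
    return [key for key in REQUIRED_KEYS if key not in statement]

good = {"actor": {"mbox": "mailto:x@example.edu"},
        "verb": {"id": "http://adlnet.gov/expapi/verbs/passed"},
        "object": {"id": "http://example.edu/quiz/1"}}
bad = {"verb": {"id": "http://adlnet.gov/expapi/verbs/passed"}}

print(validate(good))  # []
print(validate(bad))   # ['actor', 'object']
```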

Data such as time stamps and unique identifiers can also be added before statements are uploaded to the Learning Record Store.
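This enrichment step can be sketched as follows. The field names `id` and `timestamp` follow xAPI conventions; the helper function itself is a hypothetical illustration, not part of any particular LRS:

```python
import uuid
from datetime import datetime, timezone

def enrich(statement: dict) -> dict:
    """Add a unique id and an ISO 8601 timestamp to a statement,
    without overwriting values the source already provided."""
    enriched = dict(statement)
    enriched.setdefault("id", str(uuid.uuid4()))
    enriched.setdefault("timestamp",
                        datetime.now(timezone.utc).isoformat())
    return enriched

# Illustrative statement enriched just before upload to the LRS.
s = enrich({"actor": {"mbox": "mailto:x@example.edu"},
            "verb": {"id": "http://adlnet.gov/expapi/verbs/launched"},
            "object": {"id": "http://example.edu/course/42"}})
print(s["id"], s["timestamp"])
```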

 Success Factors

In education, any new idea or technological practice tends to generate enthusiasm and create unrealistic expectations. We are of the opinion that entirely automated management of personalized learner follow-up is utopian. Unless sophisticated equipment is used, we can only track the online portion of the learning process, which excludes social interactions observed in an active-learning or collaborative classroom. Additionally, it is impossible to conceive of modelling learning as a whole, given significant variables such as learning styles, culture and learners’ prior experiences and previously acquired knowledge.

 Despite the massive investment in analyzing the evaluation processes of prior learning, docimology has not yet provided instruments that are capable of measuring learning in an adequate, robust and reliable manner. The jury is still out on the relative merits of measuring knowledge and observing the performance of developed skills.

Moreover, analyzing big data and the graphs produced by sociograms is beyond the capabilities of most teachers. Scholars such as Mike Caulfield have also criticized much of learning analytics for its purely empirical approach, stripped of any pre-established theory.

 That said, by interpreting the indicators provided by learning analytics in their proper contexts, teachers can broaden, facilitate and improve their approaches to the accompaniment of learners.

Ensuring respect for privacy is an essential condition for the acceptance and use of learning analytics. Identity federation, an authentication broker that ensures no essential elements are shared, is a vital firewall; it should be included as part of the online environment and made readily available to learners by their establishments. Unfortunately, too few establishments have codes of conduct for the implementation of this type of tool. Such practices would ideally include statements such as:

  • A non-consent option for the gathering and use of learning pathways and their automatic removal
  • The use of a mechanism such as identity federation for all online services (registrar, library) and the external services used as a part of learning (social networks, YouTube, Google Apps for Education, Office 365, Adobe Creative Cloud, etc.)
  • Policy analysis for each external service provider (data hosting site, length of archiving, use)
  • The type of data collected, the time frame for its archiving and use—including research purposes
  • A statement on the measures implemented to protect privacy
  • A statement on compliance with laws on the protection of privacy and universal accessibility.

[1] In colloquial usage, it is common to conflate these distinct concepts. A strategic business plan must be confidential without being associated with private life. Personal information, related to private life, can be provided without identifying the individuals from whom it was obtained.

* In collaboration with Alexandre Enkerli 

