Introduction

PALS Islandora Workday 2019: Keeping Digital Archives simple through Metadata and AI

2019 PALS Islandora Workday attendees

On Friday, August 16th, the PALS Islandora community met at Bethany Lutheran College. PALS began the day with a brief update, noting that thanks to the staff at Augsburg University, PALS will look at testing the user engagement features of image annotation and user comments on repository objects. Our community expressed interest in building these features at an office hour held earlier in the year.

A glimpse at Islandora 8

PALS provided a first glance at Islandora 8. Islandora 8 represents the culmination of the Islandora CLAW project, which brings Islandora to Drupal 8 and Fedora 4. Because Islandora 8 is built on Drupal 8, it provides considerably more administrative control over how repository objects behave and display. Islandora 8 also substantially increases the number of available third-party Drupal modules, themes, and distributions. See the official Islandora 8 announcement for more information, including the results of a survey sent to the international community about coming developments.

A busy year in Digital Archives

After looking at Islandora 8, our partners gave updates on projects from 2019. Minnesota State University, Mankato focused on three migrations. These included their website and the move from Archon to ArchivesSpace. Minnesota State University, Mankato has also nearly completed its move to Primo and has finished loading handbooks and all collection guides. Moving the guides to Islandora was part of migrating the website. Work continues on The School Progress. This fall, archives staff hope to do more projects with students, possibly including loading the Today magazine and additional photographs.

Northwestern Health Sciences University worked on creating and adding material to their faculty scholarship collection. Staff continued adding photographs through the summer and will be making an additional push on faculty scholarship material this fall. Another priority will be loading oral histories.

2019 was the first year for Augsburg University to load new content into their repository. Previous work focused on migrating to the PALS Islandora digital repository solution. This year, students worked on digitizing the Women’s Athletic image collection, which dates back to 1973. Staff at Augsburg have worked on getting the Augsburg AMail into the repository and recently completed work on making oral histories available. This includes a workflow that lets students load their own oral histories using simplified data entry forms.

Southwest Minnesota State University staff focused on Alma this past year and are now able to do more work with their digital repository. This work includes first making undergraduate research posters and booklets available followed by the undergraduate research journal. Over the summer, staff began to scan academic catalogs. The Southwest Minnesota State University archives also made a significant acquisition of art history material and photographs related to Southwest Minnesota State University from the Southwest Regional History Center.

The total number of documents in the repository for Minnesota Water Research Digital Library (MNWRL) is now double the original amount. Staff at MNWRL continue to talk with other agencies about the value and importance of our solution.

Our host for the workday, Bethany Lutheran College, finished loading The Echo over the summer. Archives staff also added some photographs and are excited to welcome a student intern this fall.

Keeping metadata entry simple

The next session of the workday began with a presentation by Anne Stenzel of Minnesota State University, Mankato on data entry and their experiences with student workers. Minnesota State Mankato archives staff started with a more complicated form. While students are fully capable of using detailed forms, archives staff quickly found that work progresses much more smoothly and quickly when the forms are kept simple. When thinking about creating data entry forms, some pertinent questions to ask are:

  • What fields never change?
  • How many fields are left blank or are unnecessary?
  • How many fields become visual clutter and are confusing?

Answering these questions led to creating a new data entry form for born-digital records that consists of only five elements. These are:

  • Minnesota Digital Library Identifier
  • Local Identifier
  • Title
  • Date of Creation
  • Date Digital

Students appreciated the simplicity of the form. Another aspect they appreciate is having instructional text on entering data directly on the form, with no extra navigation, clicks, or effort needed. When working with students, think about what metadata is essential. Remember that the data entry form should be straightforward. If interested, see the presentation slides by Anne Stenzel.
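As an illustration only, the five-element form described above could be modeled as a small record with a check for blank fields. This is a minimal sketch in Python, not the form Minnesota State Mankato actually uses: the field names, the sample values, and the missing_fields helper are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class BornDigitalRecord:
    """The five elements listed above. Field names are hypothetical."""
    mdl_identifier: str    # Minnesota Digital Library Identifier
    local_identifier: str  # Local Identifier
    title: str             # Title
    date_created: str      # Date of Creation
    date_digital: str      # Date Digital


def missing_fields(record: BornDigitalRecord) -> List[str]:
    """Return the names of any fields that were left blank."""
    return [
        f.name
        for f in fields(record)
        if not getattr(record, f.name).strip()
    ]


# Made-up sample values; the last field is deliberately left blank.
record = BornDigitalRecord("mdl-0001", "loc-0001", "Sample title", "2019-08-16", "")
blank = missing_fields(record)
```

With only five required elements, a check like this stays short, and any instructional text can sit next to each field rather than in a separate guide.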

Simplifying Oral Histories with AI

After Anne Stenzel’s presentation, Stewart Van Cleeve of Augsburg University spoke about his work on oral histories and on finding a method that makes them simple for student workers to load. In his talk, he mentioned the program Otter.ai, which uses artificial intelligence for speech transcription. He found that this program can save a lot of time and effort for both staff and students. It isn’t perfect, however, and students are needed to review the transcript and fix errors. The benefit is that the program learns from this work and gets better at the automatic transcription process. Augsburg University recommends including a statement in the transcript similar to “Transcript generated by AI and contains errors” to address any accuracy concerns.

One highly useful feature in Otter.ai is that it creates a list of keywords based on frequency. These get listed at the top of the transcript. It’s also possible to export the transcript to a PDF. Once these are in Islandora, the PDFs (transcripts) are full-text searchable, including these keywords. One could also use these as metadata subject terms in the data entry form if desired. Oral histories can be time-consuming for staff, but Otter.ai addresses some of those concerns and helps make things simpler.
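To give a rough sense of what frequency-based keywords look like, here is a minimal Python sketch that counts words in a transcript and returns the most common ones. It does not reproduce how Otter.ai works, which was not described at the workday. The stopword list, the frequent_keywords function, and the sample text are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import List

# A deliberately tiny, hypothetical stopword list; a real one would be longer.
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "it",
    "that", "was", "we", "i", "on", "for",
}


def frequent_keywords(transcript: str, top_n: int = 5) -> List[str]:
    """Return the top_n most frequent non-stopword words, most common first."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


# Made-up sample text, not an actual oral history transcript.
sample = (
    "The archives keep photographs. Archives staff scan photographs "
    "and archives records."
)
keywords = frequent_keywords(sample, top_n=2)
```

A list produced this way would still need human review before being used as subject terms, since frequent words are not always meaningful ones.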

Bethany Lutheran College tour

After project updates and talking metadata, attendees toured the Bethany Lutheran College archives and some of the campus. Everyone enjoyed seeing the moving shelves, and all agreed that the archives space at Bethany Lutheran College is more organized than most. Next, attendees visited Old Main and viewed an impressive photomontage showing campus life at Bethany Lutheran College. Bethany Lutheran College’s digital archives feature this photomontage. After this stop, attendees saw more of the library, exploring the stacks and talking about labeling conventions.

After the tour, attendees had lunch and decided to dive into afternoon discussions early, starting with digital preservation strategies.

Touring the Bethany Lutheran College Archives

Digital Preservation Strategies and Policies: Works in progress

Minnesota State University, Mankato archives staff started discussions with a story about crashing the campus server one day. They kept adding objects to the server until it failed. No one had informed them they shouldn’t be using the space, and they received no automatic warnings that the server was nearing capacity. The crash led archives staff to work with the University IT department on a plan to add storage space and organize current storage. Archives staff and IT are conducting a year-long storage test to determine how many copies to save, which backups to keep, and more. IT is looking at how to gain access to Microsoft Azure.

Augsburg University has a simple preservation plan – backups in three different locations. First, Augsburg University’s IT gave the archives staff access to a server. Also, Augsburg University archives staff are using Google Drive. With Islandora, the staff at Augsburg feel they now have three viable separate preservation options to keep their assets safe.

Discussions continued about backup plans, leading to questions about how to create realistic policies that can help determine what to keep and digitize. Attendees felt that simplified procedures are ideal for helping limit the content that archives can take. There needs to be a legitimate reason to preserve the material. Figuring out the primary purpose or core mission statement for the archives can help. Stated policies assist the archivist in saying no, which saves space, time, and effort for what is truly important. From these discussions, PALS recommends that one should look into forming a partnership with IT, if feasible. This connection can prevent instances like crashing an entire server. It can also allow both IT and the library to figure out digital preservation strategies together.

As the planning progresses, it is essential to remember that the plan does not need to be complicated and detailed. It could state just the core areas that are important to the archives, providing a framework that defines what the archives collects and preserves. Just having this can be very helpful. A framework preservation and collection management policy combined with a statement to consult the archivist or archives further is a good start. It can be easy to get overwhelmed by digital preservation and trying to formulate a plan. Try to remember to keep it simple. Keeping it simple became an overall theme of the day.

Attendees found it invaluable to sit down and talk through digital preservation and potential strategies. There was a realization that everyone is currently talking through, dealing with, and struggling with how to approach this topic. No one is alone in this, and everyone appreciated the chance to talk through this issue. Providing the opportunity to discuss this and similar questions is the real goal of the workday.

Collection management: The things you see in the Archives

The discussion on digital preservation merged into one on collection management. This conversation touched on the overwhelming nature of born-digital objects, especially photographs. For example, does one need to load an entire folder of 200 photos? Doing so can quickly get repetitive and makes digital archives look dull and hard to get through. Attendees recommended the following during the discussion:

  • Do not load the entire file.
  • Identify a percentage of the folder to keep, a “representative sample.”
    • The sample can be as low as 3-5%.
  • Pick out the best ones.
    • Students can review photos and choose the best.
  • Pull repetitive photographs.
    • Students can review and identify which ones not to load.
  • Keep an entire file in a backup.
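The percentage guideline above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a workflow any attendee described: the representative_sample function, the 5% default, and the file names are hypothetical. The random draw stands in for the human review that attendees recommended, so it only suggests a starting set for students to refine.

```python
import random
from typing import List, Sequence


def representative_sample(
    filenames: Sequence[str], fraction: float = 0.05, seed: int = 0
) -> List[str]:
    """Pick roughly `fraction` of the files (at least one) for review."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    if not filenames:
        return []
    count = max(1, round(len(filenames) * fraction))
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return sorted(rng.sample(list(filenames), count))


# A hypothetical folder of 200 photographs.
folder = [f"photo_{n:03d}.jpg" for n in range(1, 201)]
to_review = representative_sample(folder, fraction=0.05)
```

For a folder of 200 photographs, a 5% sample is 10 files. The full folder would still be kept in a backup, as the list above recommends.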

Attendees and PALS agreed that not all photographs in a folder need to get loaded into a repository. One can include a statement along the lines of “Duplicates were not loaded, but backups were kept” to make it clear that not everything from the source was loaded online. There is a current expectation that everything will be digitized and available online. In reality, this is not feasible. Storage space would be quickly overwhelmed if every photo got loaded. Having some simple digital preservation and collection management guidelines available can help counter this expectation and aid the archivist.

Sensitive material reflects the time and culture

The afternoon discussion also covered what to do with potentially offensive materials. There will be particular material in the archives that can be offensive. The shift to digital has made things a lot more open and available. It is a lot easier for a single photograph to get taken out of context, and this can reflect negatively on the archives. A straightforward way to help with this issue is to state on the About page, or another prominent location, that “Things you see in the archives reflect the time and culture and not necessarily the views of the University and Archives.” This statement can also be a subtle reminder that the archives’ job is to preserve, not to weed out parts of history.

A chance for collaboration

All attendees agreed that the workday provided an invaluable opportunity to talk through issues and learn from one another. The discussions were a good reminder to keep the complicated issue of digital preservation simple whenever possible. More straightforward policy statements can only assist the public and archives staff with collection management and preservation issues. Also, remember that everyone is dealing with the difficult questions of collection management and digital preservation. Look at partnerships at your institution or with similar institutions to help ease the difficulty. PALS looks forward to holding another workday in 2020.