|
| 1 | +# DRUM ETD Loader |
| 2 | + |
| 3 | +## Introduction |
| 4 | + |
| 5 | +The DRUM ETD Loader is UMD custom functionality for processing files uploaded |
| 6 | +from ProQuest into DRUM. |
| 7 | + |
| 8 | +"ETD" stands for "electronic theses and dissertations". |
| 9 | + |
| 10 | +## ETD Workflow |
| 11 | + |
| 12 | +ProQuest periodically uploads Zip files to DRUM via SFTP to a specific |
| 13 | +"incoming" directory for processing. ProQuest sends an email to |
| 14 | +"<lib-drum@umd.edu>" with a list of the ETD files that were delivered |
| 15 | +(or failed to deliver). |
| 16 | + |
| 17 | +Each Zip file contains |
| 18 | + |
| 19 | +* An XML file containing the metadata for the thesis/dissertation |
| 20 | +* One or more PDF files |
| 21 | + |
| 22 | +The "load-etd-nightly" cron job processes each Zip file in the "incoming" |
| 23 | +directory, adding its contents to DRUM. Successfully processed Zip files are |
| 24 | +moved to a "processed" directory so that they are not processed again. |
| 25 | + |
| 26 | +Upon completion, the "load-etd-nightly" cron job sends an email containing the |
| 27 | +log messages it generated. |
| 28 | + |
| 29 | +If an error occurs when processing a Zip file, the Zip file will be "skipped" |
| 30 | +and remain in the "incoming" directory, and will be processed again on the next |
| 31 | +cron run. |
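The move-on-success, retry-on-failure behavior described above can be sketched as follows. This is an illustration in Python only, not the actual loader (which lives in the "load-etd" scripts and Java classes). The function name `process_incoming`, the `load_one` callback, and the log message wording are all hypothetical.

```python
import shutil
from pathlib import Path


def process_incoming(incoming: Path, processed: Path, load_one) -> list[str]:
    """Run load_one on every Zip file in `incoming`.

    Files that load cleanly are moved to `processed`; files that raise
    are left in place so the next run picks them up again.
    Returns the log lines that would go into the summary email.
    """
    log = []
    processed.mkdir(parents=True, exist_ok=True)
    for zip_path in sorted(incoming.glob("*.zip")):
        try:
            load_one(zip_path)  # hypothetical: loads one ETD into DRUM
        except Exception as exc:
            # Skipped files stay in "incoming" and are retried next run.
            log.append(f"SKIPPED {zip_path.name}: {exc}")
            continue
        shutil.move(str(zip_path), str(processed / zip_path.name))
        log.append(f"LOADED {zip_path.name}")
    return log
```

Because a failed file is never moved, a persistent problem (for example, an unmatched department) will show up in every run's email until it is resolved.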
| 32 | + |
| 33 | +## ETD Loader Components |
| 34 | + |
| 35 | +The ETD Loader functionality consists of: |
| 36 | + |
| 37 | +* An SFTP server for receiving files from ProQuest |
| 38 | +* The "load-etd-nightly"/"load-etd" scripts that load the Zip files |
| 39 | +* Java classes in the DSpace "additions" modules |
| 40 | +* Angular components in the "umd-lib/dspace-angular" repository supporting |
| 41 | + the creation/editing/deletion of "ETD Departments". |
| 42 | +* A special "dspace/config/log4j2-etdloader.xml" Log4J configuration for |
| 43 | + controlling the log format |
| 44 | +* Configuration properties in "local.cfg" |
| 45 | + |
| 46 | +## Related Documentation |
| 47 | + |
| 48 | +* [DrumCronTasks.md](DrumCronTasks.md) - contains information on the |
| 49 | + "load-etd-nightly" cron job that loads the Zip files received from ProQuest. |
| 50 | +* [DrumEmbargoAndAccessRestrictions.md](DrumEmbargoAndAccessRestrictions.md) - |
| 51 | + for information on embargo functionality. |
| 52 | +* [DrumLogging.md](DrumLogging.md) - contains information pertaining to the ETD |
| 53 | + logging functionality and email. |
| 54 | +* [DrumTestPlan.md](DrumTestPlan.md) - contains test steps for verifying the |
| 55 | + "ETD Departments" CRUD functionality, and SFTP connectivity. |
| 56 | +* [dspace/src/main/docker/README.md](../src/main/docker/README.md) - contains |
| 57 | + information about the SFTP Docker container |
| 58 | + |
| 59 | +## ETD Departments |
| 60 | + |
| 61 | +---- |
| 62 | + |
| 63 | +**Note**: "ETD Departments" is the human-friendly GUI-based name -- the |
| 64 | +Java and Angular source code uses "ETD Units". |
| 65 | + |
| 66 | +---- |
| 67 | + |
| 68 | +The XML metadata provided by ProQuest includes one (or more) "DISS_inst_contact" |
| 69 | +entries, for example: |
| 70 | + |
| 71 | +```xml |
| 72 | +<?xml version="1.0" encoding="ISO-8859-1"?> |
| 73 | +<DISS_submission publishing_option="0" embargo_code="0" third_party_search="Y"> |
| 74 | + ... |
| 75 | + <DISS_description ...> |
| 76 | + ... |
| 77 | + <DISS_institution> |
| 78 | + ... |
| 79 | + <DISS_inst_contact>English Language and Literature</DISS_inst_contact> |
| 80 | +``` |
| 81 | + |
| 82 | +Each "DISS_inst_contact" must match an existing "ETD Department" in DRUM, which |
| 83 | +is used to map the ETD into the appropriate DRUM collection. |
| 84 | + |
| 85 | +Each ETD is also added to the DRUM collection specified in the |
| 86 | +"drum.etdloader.collection" configuration property. |
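As a rough illustration of the mapping described above, the sketch below reads the "DISS_inst_contact" entries and resolves them to collections. It is written in Python for brevity and is not the loader's actual Java code. The function name `target_collections`, the `department_map` dictionary, and the choice to raise an error on an unmatched department are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def target_collections(xml_bytes, department_map, default_collection):
    """Return the collections an ETD would be added to.

    department_map: hypothetical dict of ETD Department name -> collection UUID.
    default_collection: stands in for the "drum.etdloader.collection" value.
    """
    root = ET.fromstring(xml_bytes)
    contacts = [
        el.text.strip() for el in root.iter("DISS_inst_contact") if el.text
    ]
    unknown = [name for name in contacts if name not in department_map]
    if unknown:
        # Assumption: an unmatched department is treated as an error.
        raise LookupError(f"No ETD Department matches: {unknown}")
    return [department_map[name] for name in contacts] + [default_collection]
```

Passing the raw bytes of the XML file lets the parser honor the encoding named in the XML declaration (ISO-8859-1 in the sample above).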
| 87 | + |
| 88 | +## ETD Loader Configuration Properties |
| 89 | + |
| 90 | +The following properties are used to configure the ETD Loader. |
| 91 | + |
| 92 | +### drum.etdloader.collection |
| 93 | + |
| 94 | +The UUID of the collection that all ETD submissions are added to (in addition |
| 95 | +to the collection mapped from the "DISS_inst_contact" XML element). |
| 96 | + |
| 97 | +### drum.etdloader.eperson |
| 98 | + |
| 99 | +The email address of the DRUM EPerson used to load the ETD submissions. |
| 100 | + |
| 101 | +### drum.etdloader.maxFileSize |
| 102 | + |
| 103 | +Operational parameter that sets a limit (in bytes) on the size of files that |
| 104 | +can be processed by the ETD Loader. |
| 105 | + |
| 106 | +This parameter is necessary to prevent the ETD Loader from uncompressing a |
| 107 | +Zip file entry that exceeds the resource limit of the "drum-cron-ephemeral-vol" |
| 108 | +ephemeral volume in Kubernetes (which would cause the pod to reboot). |
| 109 | + |
| 110 | +If a Zip file contains an entry that exceeds the limit, the entire Zip file |
| 111 | +will be skipped, and a message is added to the ETD log (and email). |
| 112 | + |
| 113 | +This parameter is optional -- if not set (or set to "-1") no file size limit |
| 114 | +will be enforced. |
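One way such a check could work is to inspect the sizes recorded in the Zip directory before extracting anything. The sketch below is a Python illustration under that assumption, not the loader's actual implementation. The function name `oversized_entries` is hypothetical, and `file_size` is the uncompressed size declared in the Zip metadata.

```python
import zipfile


def oversized_entries(zip_path, max_file_size=-1):
    """List entries whose declared uncompressed size exceeds the limit.

    A limit of None or -1 means "no limit", mirroring the documented
    behavior of an unset "drum.etdloader.maxFileSize".
    """
    if max_file_size is None or max_file_size == -1:
        return []
    with zipfile.ZipFile(zip_path) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if info.file_size > max_file_size
        ]
```

A caller would skip the whole Zip file whenever the returned list is non-empty, so nothing is written to the ephemeral volume for that file.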
| 115 | + |
| 116 | +### drum.mail.etd.recipient |
| 117 | + |
| 118 | +Email address that receives the output message from the ETD Loader. |
| 119 | + |
| 120 | +### drum.mail.duplicate_title |
| 121 | + |
| 122 | +Email address that receives notifications of duplicate titles from the ETD |
| 123 | +Loader. |
| 124 | + |
| 125 | +## SFTP |
| 126 | + |
| 127 | +A ProQuest-provided public key is added to the SFTP configuration to enable |
| 128 | +ProQuest to upload files. |
| 129 | + |
| 130 | +See the "docs/Secrets.md" document in the "umd-lib/k8s-drum" repository. |