Commit 2ecb1ee

LIBDRUM-991. Add additional Etd Loader documentation
https://umd-dit.atlassian.net/browse/LIBDRUM-991
1 parent ff6c490 commit 2ecb1ee

3 files changed

Lines changed: 133 additions & 1 deletion

dspace/docs/DrumEmbargoAndAccessRestrictions.md

Lines changed: 1 addition & 1 deletion
@@ -94,7 +94,7 @@ system simply relies on those administrators maintaining both policies.
 
 When ingesting ETD items from ProQuest, the bitstreams will either have no
 embargo, or a specific date for lifting the embargo. For embargoed items, the
-ETD loaded automatically adds both policies.
+ETD loader automatically adds both policies.
 
 ### Embargo List
 
dspace/docs/DrumEtdLoader.md

Lines changed: 130 additions & 0 deletions
# DRUM ETD Loader

## Introduction

The DRUM ETD Loader is custom UMD functionality for processing the files that
ProQuest uploads into DRUM.

"ETD" stands for "electronic theses and dissertations".

## ETD Workflow

ProQuest periodically uploads Zip files via SFTP to a specific "incoming"
directory in DRUM for processing. ProQuest also sends an email to
"<lib-drum@umd.edu>" listing the ETD files that were delivered (or that failed
to deliver).

Each Zip file contains:

* An XML file containing the metadata for the thesis or dissertation
* One or more PDF files

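The expected Zip layout (one XML metadata file plus one or more PDFs) can be checked with the standard `java.util.zip` API. This is a minimal sketch, not the loader's actual code: the `EtdZipContents` class is hypothetical, entries are identified only by their `.xml`/`.pdf` suffix, and treating a second XML file as an error is an assumption made for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/**
 * Hypothetical helper that sorts the entries of an ETD Zip file into the
 * XML metadata entry and the PDF entries, identifying each by file suffix.
 */
final class EtdZipContents {
    final String xmlEntry;          // name of the single XML metadata entry
    final List<String> pdfEntries;  // names of the PDF entries, in Zip order

    private EtdZipContents(String xmlEntry, List<String> pdfEntries) {
        this.xmlEntry = xmlEntry;
        this.pdfEntries = pdfEntries;
    }

    static EtdZipContents read(String zipPath) throws IOException {
        String xml = null;
        List<String> pdfs = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                String lower = entry.getName().toLowerCase(Locale.ROOT);
                if (lower.endsWith(".xml")) {
                    // Assumption: a second XML file is treated as an error.
                    if (xml != null) {
                        throw new IOException("More than one XML file in " + zipPath);
                    }
                    xml = entry.getName();
                } else if (lower.endsWith(".pdf")) {
                    pdfs.add(entry.getName());
                }
                // Any other entries are ignored by this sketch.
            }
        }
        if (xml == null || pdfs.isEmpty()) {
            throw new IOException("Expected one XML file and at least one PDF in " + zipPath);
        }
        return new EtdZipContents(xml, pdfs);
    }
}
```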
The "load-etd-nightly" cron job processes each Zip file in the "incoming"
directory, loading its contents into DRUM. Successfully processed Zip files
are moved to a "processed" directory so that they are not processed again.

Upon completion, the "load-etd-nightly" cron job sends an email containing the
log messages it generated.

If an error occurs while processing a Zip file, the Zip file is "skipped": it
remains in the "incoming" directory and is processed again on the next cron
run.

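The incoming/processed handling described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the code run by "load-etd-nightly": the `EtdProcessor` interface and `EtdBatch.run` method are hypothetical, loading an ETD into DRUM is hidden behind the interface, and `System.err` stands in for the ETD log. The directory listing is snapshotted before any file is moved so that moving files does not interfere with the iteration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the code that loads one Zip file into DRUM. */
interface EtdProcessor {
    void process(Path zipFile) throws Exception;
}

final class EtdBatch {
    /**
     * Processes each "*.zip" file in the incoming directory. Files that load
     * successfully are moved to the processed directory; files that fail are
     * left in place so that the next run retries them.
     *
     * @return the names of the skipped files
     */
    static List<String> run(Path incoming, Path processed, EtdProcessor processor)
            throws IOException {
        Files.createDirectories(processed);

        // Snapshot the listing first; the glob is case-sensitive on Linux.
        List<Path> zips = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(incoming, "*.zip")) {
            for (Path zip : stream) {
                zips.add(zip);
            }
        }
        Collections.sort(zips);

        List<String> skipped = new ArrayList<>();
        for (Path zip : zips) {
            try {
                processor.process(zip);
            } catch (Exception e) {
                // "Skip" the file: leave it in the incoming directory.
                System.err.println("Skipping " + zip.getFileName() + ": " + e.getMessage());
                skipped.add(zip.getFileName().toString());
                continue;
            }
            Files.move(zip, processed.resolve(zip.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        return skipped;
    }
}
```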
## ETD Loader Components

The ETD Loader functionality consists of:

* An SFTP server for receiving files from ProQuest
* The "load-etd-nightly" and "load-etd" scripts that load the Zip files
* Java classes in the DSpace "additions" modules
* Angular components in the "umd-lib/dspace-angular" repository supporting
  the creation, editing, and deletion of "ETD Departments"
* A dedicated "dspace/config/log4j2-etdloader.xml" Log4J configuration that
  controls the log format
* Configuration properties in "local.cfg"

## Related Documentation

* [DrumCronTasks.md](DrumCronTasks.md) - contains information about the
  "load-etd-nightly" cron job that loads the Zip files received from ProQuest
* [DrumEmbargoAndAccessRestrictions.md](DrumEmbargoAndAccessRestrictions.md) -
  contains information about embargo functionality
* [DrumLogging.md](DrumLogging.md) - contains information about ETD logging
  and the log email
* [DrumTestPlan.md](DrumTestPlan.md) - contains test steps for verifying the
  "ETD Departments" CRUD functionality and SFTP connectivity
* [dspace/src/main/docker/README.md](../src/main/docker/README.md) - contains
  information about the SFTP Docker container

## ETD Departments

----

**Note**: "ETD Departments" is the human-friendly name used in the GUI; the
Java and Angular source code uses "ETD Units".

----

The XML metadata provided by ProQuest includes one or more "DISS_inst_contact"
entries, for example:

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<DISS_submission publishing_option="0" embargo_code="0" third_party_search="Y">
...
<DISS_description ...>
...
<DISS_institution>
...
<DISS_inst_contact>English Language and Literature</DISS_inst_contact>
```

Each "DISS_inst_contact" value must match an existing "ETD Department" in DRUM;
the matching department is used to map the ETD into the appropriate DRUM
collection.

Each ETD is also added to the DRUM collection specified by the
"drum.etdloader.collection" configuration property.

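The department-to-collection mapping can be illustrated with the JDK's DOM parser. In this sketch the in-memory `Map` stands in for the "ETD Department" records stored in DRUM, the `EtdCollections` class is hypothetical, collection UUIDs are plain strings, and rejecting an unmatched department with an exception is an assumption; the real loader's data access and error handling are not shown.

```java
import java.io.InputStream;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class EtdCollections {
    /**
     * Returns the collections an ETD would be added to: one per
     * DISS_inst_contact (looked up in the department map) plus the
     * collection configured by "drum.etdloader.collection".
     */
    static Set<String> resolve(InputStream metadataXml,
                               Map<String, String> departmentToCollection,
                               String etdCollection) throws Exception {
        // The parser honors the encoding in the XML declaration (e.g. ISO-8859-1).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(metadataXml);
        NodeList contacts = doc.getElementsByTagName("DISS_inst_contact");

        Set<String> collections = new LinkedHashSet<>();
        for (int i = 0; i < contacts.getLength(); i++) {
            String department = contacts.item(i).getTextContent().trim();
            String collection = departmentToCollection.get(department);
            if (collection == null) {
                // Assumption: an unmatched department is treated as an error.
                throw new IllegalArgumentException(
                        "No ETD Department matches \"" + department + "\"");
            }
            collections.add(collection);
        }
        collections.add(etdCollection);
        return collections;
    }
}
```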
## ETD Loader Configuration Properties

The following properties configure the ETD Loader.

### drum.etdloader.collection

The UUID of the collection that all ETD submissions are added to (in addition
to the collection mapped from the "DISS_inst_contact" XML element).

### drum.etdloader.eperson

The email address of the DRUM EPerson used to load the ETD submissions.

### drum.etdloader.maxFileSize

Operational parameter that sets a limit (in bytes) on the size of the
individual files in a Zip file that the ETD Loader can process.

This parameter prevents the ETD Loader from uncompressing a Zip file entry that
exceeds the resource limit of the "drum-cron-ephemeral-vol" ephemeral volume in
Kubernetes (which would cause the pod to restart).

If a Zip file contains an entry that exceeds the limit, the entire Zip file is
skipped, and a message is added to the ETD log (and email).

This parameter is optional; if it is not set (or is set to "-1"), no file size
limit is enforced.

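A check along these lines could run before any entry is extracted, using the uncompressed sizes recorded in the Zip directory. This is a sketch under stated assumptions: the property is read from a `java.util.Properties` object rather than from DSpace's configuration service, the `EtdSizeLimit` class is hypothetical, and an entry whose size is not recorded (reported as -1 by `ZipEntry.getSize()`) is conservatively treated as too large, which may differ from the real loader.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Optional;
import java.util.Properties;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class EtdSizeLimit {
    /** Reads drum.etdloader.maxFileSize; a missing value or -1 means "no limit". */
    static long maxFileSize(Properties config) {
        return Long.parseLong(config.getProperty("drum.etdloader.maxFileSize", "-1").trim());
    }

    /**
     * Returns the name of the first entry that exceeds the limit, or empty if
     * the whole Zip file may be processed. Nothing is extracted.
     */
    static Optional<String> firstOversizedEntry(String zipPath, long limit) throws IOException {
        if (limit < 0) {
            return Optional.empty(); // no limit configured
        }
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                long size = entry.getSize(); // uncompressed size, -1 if unknown
                if (size < 0 || size > limit) {
                    return Optional.of(entry.getName());
                }
            }
        }
        return Optional.empty();
    }
}
```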
### drum.mail.etd.recipient

Email address that receives the output messages from the ETD Loader.

### drum.mail.duplicate_title

Email address that receives notifications of duplicate titles from the ETD
Loader.

## SFTP

ProQuest provides a public key, which is added to the SFTP configuration to
enable ProQuest to upload files.

See the "docs/Secrets.md" document in the "umd-lib/k8s-drum" repository.

dspace/docs/DrumFeatures.md

Lines changed: 2 additions & 0 deletions
@@ -31,6 +31,8 @@ information.
 
 ## Electronic Theses and Dissertations (ETD)
 
+See [dspace/docs/DrumEtdLoader.md](DrumEtdLoader.md) for additional information.
+
 * LIBDRUM-671 - "ETD Department" CRUD functionality
 * LIBDRUM-680 - Loader for loading ProQuest ETDs into DRUM
 * transform ProQuest metadata to dublin core
