Commit 33364aa

Homogenizes loader API.
1 parent 0596a3f commit 33364aa
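
The commit message is terse. Judging from the documentation changes in this commit, the homogenized loader pattern appears to be: the constructor accepts an optional path, loads data on construction unless `autoload=False`, and exposes the loaded data as attributes (`frames`, `classes`, `framesets`, `synsets`). Below is a minimal, self-contained sketch of that pattern. `SketchLoader`, `DEFAULT_DATA_DIR`, and the `sketch.jsonl` file name are hypothetical stand-ins rather than glazing's actual classes, and the `lazy` flag that the real loaders take is not modeled here.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterator, Optional

# Assumption: a per-user default directory, mirroring the path in the removed doc lines.
DEFAULT_DATA_DIR = Path.home() / ".local" / "share" / "glazing" / "converted"


class SketchLoader:
    """Illustrative stand-in for a dataset loader; not glazing's real class."""

    def __init__(self, data_path: Optional[Path] = None, autoload: bool = True) -> None:
        # Fall back to the default location when no path is given.
        self.data_path = data_path or DEFAULT_DATA_DIR / "sketch.jsonl"
        self.records: Dict[str, dict] = {}
        if autoload:
            self.load()

    def iter_records(self) -> Iterator[dict]:
        """Yield one parsed JSONL record at a time without caching anything."""
        with self.data_path.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    yield json.loads(line)

    def load(self) -> Dict[str, dict]:
        """Read every record into memory, keyed by its 'id' field."""
        self.records = {record["id"]: record for record in self.iter_records()}
        return self.records


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sketch.jsonl"
    path.write_text('{"id": "give-13.1"}\n{"id": "run-51.3.2"}\n', encoding="utf-8")

    eager = SketchLoader(path)                 # data is loaded by the constructor
    lazy = SketchLoader(path, autoload=False)  # no I/O until iteration starts

    eager_ids = sorted(eager.records)
    lazy_ids = [record["id"] for record in lazy.iter_records()]
```

With this shape, callers read an attribute instead of passing a path to a `load_*` method, which matches the direction of the documentation edits in this commit.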

18 files changed

Lines changed: 569 additions & 264 deletions

README.md

Lines changed: 26 additions & 16 deletions
@@ -36,8 +36,9 @@ Then start using the data:
 ```python
 from glazing.search import UnifiedSearch
 
+# Automatically uses default data directory after 'glazing init'
 search = UnifiedSearch()
-results = search.search_by_query("give")
+results = search.search("give")
 
 for result in results[:5]:
     print(f"{result.dataset}: {result.name} - {result.description}")
@@ -65,31 +66,36 @@ Load and work with individual datasets:
 ```python
 from glazing.framenet.loader import FrameNetLoader
 from glazing.verbnet.loader import VerbNetLoader
-from pathlib import Path
 
-# Load datasets
-data_dir = Path.home() / ".local/share/glazing/converted"
-
-fn_loader = FrameNetLoader()
-frames = fn_loader.load_frames(data_dir / "framenet.jsonl")
+# Loaders automatically use default paths and load data after 'glazing init'
+fn_loader = FrameNetLoader()  # Data is already loaded
+frames = fn_loader.frames
 
-vn_loader = VerbNetLoader()
-verb_classes = vn_loader.load_verb_classes(data_dir / "verbnet.jsonl")
+vn_loader = VerbNetLoader()  # Data is already loaded
+verb_classes = list(vn_loader.classes.values())
 ```
 
 Cross-reference resolution:
 
 ```python
-from glazing.references.resolver import ReferenceResolver
 from glazing.references.extractor import ReferenceExtractor
+from glazing.verbnet.loader import VerbNetLoader
+from glazing.propbank.loader import PropBankLoader
 
-# Extract and resolve references
-extractor = ReferenceExtractor()
-references = extractor.extract_from_datasets(data_dir)
+# Load datasets
+vn_loader = VerbNetLoader()
+pb_loader = PropBankLoader()
 
-resolver = ReferenceResolver(references)
-related = resolver.resolve("give.01", source="propbank")
-print(f"VerbNet classes: {related.verbnet_classes}")
+# Extract references
+extractor = ReferenceExtractor()
+extractor.extract_verbnet_references(list(vn_loader.classes.values()))
+extractor.extract_propbank_references(list(pb_loader.framesets.values()))
+
+# Access PropBank cross-references
+if "give.01" in extractor.propbank_refs:
+    refs = extractor.propbank_refs["give.01"]
+    vn_classes = refs.get_verbnet_classes()
+    print(f"VerbNet classes for give.01: {vn_classes}")
 ```
 
 ## Supported Datasets
@@ -142,3 +148,7 @@ MIT License - see [LICENSE](LICENSE) file for details.
 - [PyPI Package](https://pypi.org/project/glazing/)
 - [Documentation](https://glazing.readthedocs.io)
 - [Issue Tracker](https://github.com/aaronstevenwhite/glazing/issues)
+
+## Acknowledgments
+
+This project was funded by a [National Science Foundation](https://www.nsf.gov/) ([BCS-2040831](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2040831)) and builds upon the foundational work of the FrameNet, PropBank, VerbNet, and WordNet teams.

docs/api/index.md

Lines changed: 3 additions & 3 deletions
@@ -74,8 +74,8 @@ from glazing.references.resolver import ReferenceResolver
 from pathlib import Path
 from glazing.verbnet.loader import VerbNetLoader
 
-loader = VerbNetLoader()
-verb_classes = loader.load_verb_classes(Path("data/verbnet.jsonl"))
+loader = VerbNetLoader(Path("data/verbnet.jsonl"))
+verb_classes = loader.load_verb_classes()
 ```
 
 ### Searching
@@ -84,7 +84,7 @@ verb_classes = loader.load_verb_classes(Path("data/verbnet.jsonl"))
 from glazing.search import UnifiedSearch
 
 search = UnifiedSearch()
-results = search.search_by_query("abandon")
+results = search.search("abandon")
 ```
 
 ## Type Safety
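
The hunks above rename `search_by_query` to `search` in the documented examples. For code that has to run against versions on both sides of this commit, one option is a small adapter that picks whichever method exists. This is only a sketch: `run_search`, `OldStyleSearch`, and `NewStyleSearch` are hypothetical names introduced here for illustration, and the diff does not say whether the old method name is still available as an alias.

```python
from typing import Any, List


def run_search(search: Any, query: str) -> List[Any]:
    """Call `search.search` when present, otherwise fall back to `search_by_query`."""
    method = getattr(search, "search", None)
    if not callable(method):
        # Older objects only expose the pre-rename method name.
        method = getattr(search, "search_by_query")
    return method(query)


class OldStyleSearch:
    """Hypothetical stand-in exposing only the old method name."""

    def search_by_query(self, query: str) -> List[str]:
        return [f"old:{query}"]


class NewStyleSearch:
    """Hypothetical stand-in exposing only the new method name."""

    def search(self, query: str) -> List[str]:
        return [f"new:{query}"]


old_result = run_search(OldStyleSearch(), "abandon")
new_result = run_search(NewStyleSearch(), "abandon")
```

In real use, the object passed to `run_search` would be a `UnifiedSearch` instance.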

docs/index.md

Lines changed: 9 additions & 4 deletions
@@ -48,11 +48,12 @@ After initialization, you can immediately start exploring the data:
 
 ```python
 from glazing.search import UnifiedSearch
-from pathlib import Path
 
-# Search across all datasets
+# Automatically uses default data directory after 'glazing init'
 search = UnifiedSearch()
-results = search.search_by_query("give")
+
+# Search across all datasets
+results = search.search("give")
 
 for result in results[:5]:
     print(f"{result.dataset}: {result.name} - {result.description}")
@@ -95,7 +96,7 @@ Glazing is actively maintained and welcomes contributions. The project follows s
 
 ## License
 
-MIT License - see [LICENSE](https://github.com/aaronstevenwhite/glazing/blob/main/LICENSE) file for details.
+This package is licensed under an MIT License. See [LICENSE](https://github.com/aaronstevenwhite/glazing/blob/main/LICENSE) file for details.
 
 ## Citation
 
@@ -109,3 +110,7 @@ If you use Glazing in your research, please cite:
 url = {https://github.com/aaronstevenwhite/glazing}
 }
 ```
+
+## Acknowledgments
+
+This project was funded by a [National Science Foundation](https://www.nsf.gov/) ([BCS-2040831](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2040831)) and builds upon the foundational work of the FrameNet, PropBank, VerbNet, and WordNet teams.

docs/quick-start.md

Lines changed: 38 additions & 48 deletions
@@ -19,14 +19,14 @@ This one-time setup downloads ~120MB of data and prepares it for use.
 Find entries across all datasets:
 
 ```bash
-# Search for "give" in all datasets
-glazing search query "give" --data-dir ~/.local/share/glazing/converted
+# Search for "give" in all datasets (uses default data directory)
+glazing search query "give"
 
 # Search only in VerbNet
-glazing search query "give" --dataset verbnet --data-dir ~/.local/share/glazing/converted
+glazing search query "give" --dataset verbnet
 
 # Get JSON output
-glazing search query "give" --json --data-dir ~/.local/share/glazing/converted
+glazing search query "give" --json
 ```
 
 ### Find Cross-References
@@ -35,8 +35,7 @@ Discover connections between datasets:
 
 ```bash
 # Find VerbNet classes for a PropBank roleset
-glazing search cross-ref --source propbank --target verbnet --id "give.01" \
-  --data-dir ~/.local/share/glazing/converted
+glazing search cross-ref --source propbank --target verbnet --id "give.01"
 ```
 
 ### Get Dataset Information
@@ -57,13 +56,12 @@ glazing download info verbnet
 
 ```python
 from glazing.search import UnifiedSearch
-from pathlib import Path
 
-# Initialize search with default data directory
+# Initialize search (automatically uses default paths)
 search = UnifiedSearch()
 
 # Search across all datasets
-results = search.search_by_query("abandon")
+results = search.search("abandon")
 
 for result in results[:5]:
     print(f"{result.dataset}: {result.name}")
@@ -77,32 +75,27 @@ for result in results[:5]:
 ```python
 from glazing.framenet.loader import FrameNetLoader
 from glazing.verbnet.loader import VerbNetLoader
-from pathlib import Path
 
-data_dir = Path.home() / ".local/share/glazing/converted"
-
-# Load FrameNet
-fn_loader = FrameNetLoader()
-frames = fn_loader.load_frames(data_dir / "framenet.jsonl")
+# Loaders automatically use default paths and load data after 'glazing init'
+fn_loader = FrameNetLoader()  # Data is already loaded
+frames = fn_loader.frames
 print(f"Loaded {len(frames)} frames")
 
-# Load VerbNet
-vn_loader = VerbNetLoader()
-verb_classes = vn_loader.load_verb_classes(data_dir / "verbnet.jsonl")
+vn_loader = VerbNetLoader()  # Data is already loaded
+verb_classes = list(vn_loader.classes.values())
 print(f"Loaded {len(verb_classes)} verb classes")
 ```
 
 ### Work with VerbNet Classes
 
 ```python
 from glazing.verbnet.loader import VerbNetLoader
-from pathlib import Path
 
-data_dir = Path.home() / ".local/share/glazing/converted"
+# Loader automatically uses default path and loads data
 loader = VerbNetLoader()
 
-# Load all verb classes
-classes = loader.load_verb_classes(data_dir / "verbnet.jsonl")
+# Access already loaded verb classes
+classes = list(loader.classes.values())
 
 # Find a specific class
 give_class = next(
@@ -125,13 +118,12 @@ if give_class:
 
 ```python
 from glazing.propbank.loader import PropBankLoader
-from pathlib import Path
 
-data_dir = Path.home() / ".local/share/glazing/converted"
+# Loader automatically uses default path and loads data
 loader = PropBankLoader()
 
-# Load framesets
-framesets = loader.load_framesets(data_dir / "propbank.jsonl")
+# Access already loaded framesets
+framesets = list(loader.framesets.values())
 
 # Find rolesets for "give"
 give_framesets = [fs for fs in framesets if fs.lemma == "give"]
@@ -149,16 +141,20 @@ for frameset in give_framesets:
 ```python
 from glazing.references.extractor import ReferenceExtractor
 from glazing.references.resolver import ReferenceResolver
-from pathlib import Path
+from glazing.verbnet.loader import VerbNetLoader
+from glazing.propbank.loader import PropBankLoader
 
-data_dir = Path.home() / ".local/share/glazing/converted"
+# Load datasets
+vn_loader = VerbNetLoader()  # Automatically loads data
+pb_loader = PropBankLoader()  # Automatically loads data
 
-# Extract all references
+# Extract references
 extractor = ReferenceExtractor()
-references = extractor.extract_from_datasets(data_dir)
+extractor.extract_verbnet_references(list(vn_loader.classes.values()))
+extractor.extract_propbank_references(list(pb_loader.framesets.values()))
 
 # Resolve references for a PropBank roleset
-resolver = ReferenceResolver(references)
+resolver = ReferenceResolver(extractor.mapping_index)
 related = resolver.resolve("give.01", source="propbank")
 
 print(f"PropBank roleset: give.01")
@@ -171,13 +167,10 @@ print(f"WordNet senses: {related.wordnet_senses}")
 
 ```python
 from glazing.wordnet.loader import WordNetLoader
-from pathlib import Path
 
-data_dir = Path.home() / ".local/share/glazing/converted"
+# Loader automatically uses default path and loads data
 loader = WordNetLoader()
-
-# Load synsets
-synsets = loader.load_synsets(data_dir / "wordnet.jsonl")
+synsets = list(loader.synsets.values())  # Already loaded
 
 # Find synsets for "dog"
 dog_synsets = [s for s in synsets if any(
@@ -203,13 +196,12 @@ For memory-efficient processing:
 
 ```python
 from glazing.verbnet.loader import VerbNetLoader
-from pathlib import Path
 
-data_dir = Path.home() / ".local/share/glazing/converted"
-loader = VerbNetLoader()
+# For memory-efficient streaming, use lazy loading
+loader = VerbNetLoader(lazy=True, autoload=False)
 
 # Stream verb classes one at a time
-for verb_class in loader.stream_verb_classes(data_dir / "verbnet.jsonl"):
+for verb_class in loader.iter_verb_classes():
     # Process each class without loading all into memory
     if "run" in [m.name for m in verb_class.members]:
         print(f"Found 'run' in class: {verb_class.id}")
@@ -222,10 +214,11 @@ for verb_class in loader.stream_verb_classes(data_dir / "verbnet.jsonl"):
 
 ```python
 from glazing.verbnet.search import VerbNetSearch
-from pathlib import Path
+from glazing.verbnet.loader import VerbNetLoader
 
-data_dir = Path.home() / ".local/share/glazing/converted"
-search = VerbNetSearch(data_dir / "verbnet.jsonl")
+# Loader automatically loads data
+loader = VerbNetLoader()
+search = VerbNetSearch(list(loader.classes.values()))
 
 # Find all classes with an Agent role
 agent_classes = []
@@ -241,13 +234,10 @@ print(f"Classes with Agent role: {len(agent_classes)}")
 ```python
 import json
 from glazing.framenet.loader import FrameNetLoader
-from pathlib import Path
 
-data_dir = Path.home() / ".local/share/glazing/converted"
+# Loader automatically uses default path and loads data
 loader = FrameNetLoader()
-
-# Load frames
-frames = loader.load_frames(data_dir / "framenet.jsonl")
+frames = loader.frames  # Already loaded
 
 # Export as simple JSON
 simple_frames = []

docs/user-guide/cli.md

Lines changed: 7 additions & 7 deletions
@@ -98,11 +98,11 @@ Common Options:
 # 1. Initialize everything
 glazing init
 
-# 2. Search for a concept
-glazing search query "give" --data-dir ~/.local/share/glazing/converted
+# 2. Search for a concept (uses default data directory)
+glazing search query "give"
 
 # 3. Find cross-references
-glazing search cross-ref --source propbank --id "give.01" --target verbnet --data-dir ~/.local/share/glazing/converted
+glazing search cross-ref --source propbank --id "give.01" --target verbnet
 ```
 
 ### Custom Data Directory
@@ -122,13 +122,13 @@ glazing search query "run"
 
 ```bash
 # Download only VerbNet
-glazing download dataset --dataset verbnet --output-dir raw/
+glazing download dataset --dataset verbnet
 
 # Convert it
-glazing convert dataset --dataset verbnet --input-dir raw/verbnet-3.4 --output-dir converted/
+glazing convert dataset --dataset verbnet
 
-# Search it
-glazing search query "run" --dataset verbnet --data-dir converted/
+# Search it (uses default converted directory)
+glazing search query "run" --dataset verbnet
 ```
 
 ## Output Formats
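
The CLI examples above drop the explicit `--data-dir`, `--input-dir`, and `--output-dir` arguments in favor of a default location. One plausible way to implement that fallback is sketched below. It assumes the default is the `~/.local/share/glazing/converted` path that the removed lines passed explicitly. `resolve_data_dir` and `DEFAULT_CONVERTED_DIR` are hypothetical names, and the actual CLI may also consult environment variables or platform-specific directories.

```python
from pathlib import Path
from typing import Optional, Union

# Assumption: the default equals the path the removed doc lines passed explicitly.
DEFAULT_CONVERTED_DIR = Path.home() / ".local" / "share" / "glazing" / "converted"


def resolve_data_dir(data_dir: Optional[Union[str, Path]] = None) -> Path:
    """Return the explicit directory when one is given, else the default."""
    if data_dir is None:
        return DEFAULT_CONVERTED_DIR
    # Expand a leading "~" so shell-style paths work when passed as strings.
    return Path(data_dir).expanduser()


default_dir = resolve_data_dir()
custom_dir = resolve_data_dir("converted/")
```

An explicit argument always wins here, so commands that still pass a directory keep working unchanged.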

docs/user-guide/cross-references.md

Lines changed: 18 additions & 9 deletions
@@ -51,16 +51,20 @@ glazing search cross-ref --source propbank --id "give.01" \
 ```python
 from glazing.references.extractor import ReferenceExtractor
 from glazing.references.resolver import ReferenceResolver
-from pathlib import Path
+from glazing.verbnet.loader import VerbNetLoader
+from glazing.propbank.loader import PropBankLoader
 
-data_dir = Path.home() / ".local/share/glazing/converted"
+# Load datasets
+vn_loader = VerbNetLoader()  # Automatically loads data
+pb_loader = PropBankLoader()  # Automatically loads data
 
-# Extract all references
+# Extract references
 extractor = ReferenceExtractor()
-references = extractor.extract_from_datasets(data_dir)
+extractor.extract_verbnet_references(list(vn_loader.classes.values()))
+extractor.extract_propbank_references(list(pb_loader.framesets.values()))
 
 # Resolve for specific item
-resolver = ReferenceResolver(references)
+resolver = ReferenceResolver(extractor.mapping_index)
 related = resolver.resolve("give.01", source="propbank")
 
 print(f"VerbNet: {related.verbnet_classes}")
@@ -74,14 +78,19 @@ print(f"FrameNet: {related.framenet_frames}")
 
 ```python
 from glazing.references.extractor import ReferenceExtractor
+from glazing.verbnet.loader import VerbNetLoader
 
-extractor = ReferenceExtractor()
+# Load dataset
+vn_loader = VerbNetLoader()
 
-# Extract from all datasets
-all_refs = extractor.extract_from_datasets(data_dir)
+# Extract references
+extractor = ReferenceExtractor()
 
 # Extract from specific dataset
-vn_refs = extractor.extract_from_verbnet(verb_classes)
+extractor.extract_verbnet_references(list(vn_loader.classes.values()))
+
+# Access extracted references
+vn_refs = extractor.verbnet_refs
 ```
 
 ### Manual Mapping
