KeyError in DaviesBouldin clustering metric  #12

@qetdr

Versions

river version: 0.14.0
Python version: 3.10.4
Operating system: macOS Ventura 13.2

Describe the bug

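I get a KeyError when trying to reproduce the example from the DaviesBouldin class docstring. The only difference from the docstring example is that the metric is imported from river_extra instead of river. The error is raised when the metric is displayed at the end of the loop (the final line below), which calls DaviesBouldin.get() through __repr__.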

The code used:

from river import cluster
from river import stream
from river_extra.metrics.cluster import DaviesBouldin

X = [
    [1, 2],
    [1, 4],
    [1, 0],
    [4, 2],
    [4, 4],
    [4, 0],
    [-2, 2],
    [-2, 4],
    [-2, 0]
]

k_means = cluster.KMeans(n_clusters=3, halflife=0.4, sigma=3, seed=0)
metric = DaviesBouldin()

for x, _ in stream.iter_array(X):
    k_means = k_means.learn_one(x)
    y_pred = k_means.predict_one(x)
    metric = metric.update(x, y_pred, k_means.centers)
metric

The output:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/Library/Python/3.10/lib/python/site-packages/IPython/core/formatters.py:706, in PlainTextFormatter.__call__(self, obj)
    699 stream = StringIO()
    700 printer = pretty.RepresentationPrinter(stream, self.verbose,
    701     self.max_width, self.newline,
    702     max_seq_length=self.max_seq_length,
    703     singleton_pprinters=self.singleton_printers,
    704     type_pprinters=self.type_printers,
    705     deferred_pprinters=self.deferred_printers)
--> 706 printer.pretty(obj)
    707 printer.flush()
    708 return stream.getvalue()

File ~/Library/Python/3.10/lib/python/site-packages/IPython/lib/pretty.py:410, in RepresentationPrinter.pretty(self, obj)
    407                         return meth(obj, self, cycle)
    408                 if cls is not object \
    409                         and callable(cls.__dict__.get('__repr__')):
--> 410                     return _repr_pprint(obj, self, cycle)
    412     return _default_pprint(obj, self, cycle)
    413 finally:

File ~/Library/Python/3.10/lib/python/site-packages/IPython/lib/pretty.py:778, in _repr_pprint(obj, p, cycle)
    776 """A pprint that just redirects to the normal repr function."""
    777 # Find newlines and replace them with p.break_()
--> 778 output = repr(obj)
    779 lines = output.splitlines()
    780 with p.group():

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/river_extra/metrics/cluster/base.py:42, in ClusteringMetric.__repr__(self)
     40 def __repr__(self):
     41     """Returns the class name along with the current value of the metric."""
---> 42     return f"{self.__class__.__name__}: {self.get():{self._fmt}}".rstrip("0")

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/river_extra/metrics/cluster/daviesbouldin.py:95, in DaviesBouldin.get(self)
     90 for j in range(i + 1, n_clusters):
     91     distance_ij = math.sqrt(
     92         utils.math.minkowski_distance(self._centers[i], self._centers[j], 2)
     93     )
     94     ij_partner_cluster_index = (
---> 95         self._inter_cluster_distances[i] / self._n_points_by_clusters[i]
     96         + self._inter_cluster_distances[j] / self._n_points_by_clusters[j]
     97     ) / distance_ij
     98     if ij_partner_cluster_index > max_partner_clusters_index:
     99         max_partner_clusters_index = ij_partner_cluster_index

KeyError: 0
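
One possible reading of the traceback, offered as an assumption and not verified against the river_extra source: get() loops over range(n_clusters), while the per-cluster accumulators only hold entries for labels that predict_one has actually returned. If no point has been assigned to cluster 0 yet, indexing with 0 would fail exactly like this. The sketch below reproduces that failure mode with plain dictionaries; the names update, scatter_unguarded and scatter_guarded are hypothetical stand-ins, not the library's API.

```python
# Hypothetical stand-ins for the metric's internal state: per-cluster
# accumulators that only get an entry once a label has been seen.
inter_cluster_distances = {}
n_points_by_clusters = {}


def update(label, distance):
    """Accumulate a point-to-center distance for the given cluster label."""
    inter_cluster_distances[label] = inter_cluster_distances.get(label, 0.0) + distance
    n_points_by_clusters[label] = n_points_by_clusters.get(label, 0) + 1


# Assumed scenario: every point so far landed in cluster 1 or 2, never 0.
for label, distance in [(1, 0.5), (2, 1.0), (1, 1.5)]:
    update(label, distance)

n_clusters = 3  # the model still reports three centers


def scatter_unguarded(i):
    """Mirrors the failing expression: direct indexing by cluster index."""
    return inter_cluster_distances[i] / n_points_by_clusters[i]


def scatter_guarded(i):
    """Treats a cluster with no assigned points as having zero scatter."""
    n = n_points_by_clusters.get(i, 0)
    return inter_cluster_distances.get(i, 0.0) / n if n else 0.0


try:
    scatter_unguarded(0)
    raised = False
    missing_key = None
except KeyError as err:
    raised = True
    missing_key = err.args[0]

guarded = [scatter_guarded(i) for i in range(n_clusters)]
```

If this is indeed the cause, guarding the lookup (or skipping clusters with no assigned points) would avoid the exception, although whether an empty cluster should count as zero scatter is a design decision for the maintainers.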
