
Wikimedia Commons has this feature. Editors can manually bless certain combinations of traits as "subcategories".

For example, https://commons.wikimedia.org/wiki/Category:Paintings_of_cas... contains the subcategories "Paintings of castles by country" (nested hierarchy), "Frescos of castles" (a medium), "Paintings of Château de Chillon" (a subject), and "Young Knight in a Landscape by Carpaccio" (multiple views onto a specific item). Each item may appear in multiple subcategories. As far as I can tell, the UI won't let you search for frescos of Italian castles (unless somebody's made a subcategory for that), or view all paintings of castles regardless of their subcategory.

I'm not very fond of this approach. I'd prefer for each item to have an unstructured set of tags ("fresco", "depiction of a castle", "depiction of Italy"), with automatic derivation of parent tags ("fresco" implies "painting") and the option to search by multiple tags. It should be possible to automatically discover tags which best refine a search, so that the UI can still suggest them to the user, as it does today.



> I'd prefer for each item to have an unstructured set of tags ("fresco", "depiction of a castle", "depiction of Italy"), with automatic derivation of parent tags ("fresco" implies "painting") and the option to search by multiple tags.

It's definitely possible to do this. IMSLP (a large repository of freely available sheet music, whose items vary along cross-cutting features such as genre, historical period, contributors (composers and others), instrumentation, etc.) is MediaWiki-based and has a plugin that does exactly that. These days they would probably want to host all the tags on Wikidata so that they can be multilingual and queryable out of the box, though.


Which is actually done on Commons; it just isn't very popular (on images, click the structured data tab and then look at depicts). [Admittedly, I think a big part of the problem is implementation choices and UI decisions.]


That's only "depicts" claims and is nowhere near comprehensive. It doesn't even come close to matching what's currently stated using categories. Running searches on that data is also hard compared to what IMSLP gives you for their own system.


The Library of Congress uses both approaches, to an extent.

The cataloguing system uses a hierarchical classification, based on one originally developed by Thomas Jefferson, whose personal library (sold to Congress in 1815) formed the basis of the Library's rebuilt collection. This is known as the Library of Congress Classification, and is used to specifically locate a given title or work within the stacks; that is, each item has one and only one location.

There are also subject headings, which are more tag-based, though also drawn from a controlled vocabulary. A given work is assigned a relatively small number of subjects with which it's associated. These are not hierarchical, though of course the listing of subject headings itself follows a sequence. Unlike the classification, which assigns a single location to each work, the headings are a search aid for patrons looking for a set of related works within a subject heading, and they make it easier to branch a search out to possibly related subjects.

Tagging systems, especially ad hoc tags supplied by untrained users, are popular but tend to produce numerous issues over time. Not that formal systems (as with the LoC systems mentioned here) are immune to same. One feature of the LoC systems is that they've evolved processes for managing change over time. Examples would be terminology or classifications which are now deprecated, or of regions and polities which have changed or no longer exist (e.g., the Austro-Hungarian empire, the USSR), or of changes in underlying classifications (e.g., of chemical elements or of biological classifications, both of which have evolved significantly over the life of the Library of Congress).

The history of hierarchical information classifications is long and IMO fascinating, dating at least to Aristotle and his Categories, as well as numerous variants used in classifications of knowledge (such as Francis Bacon's) or encyclopedias, including Diderot's and Britannica.



