Thank you. This is indeed interesting... Currently, I use the `CancellationTokenSource` / `Task` concept of C# and I'm pretty happy with it, but this is definitely worth reading.
Would you like to talk more about the things you're working on? I am interested in performance, software architecture and problem solving.
One problem I have with computer systems is data liveness and synchronization. In many situations you want to react to changes, but without doing something inefficient such as polling at regular intervals.
Ideally, you react to an event at the moment it happens, so you don't need to poll and compare.
You also have the problem of identity: how you map data to other data and keep the two in sync.
If you can capture events at the source, you can respond correctly. But capturing events at the source is very hard in modern computing systems, because not every API has a callback or event log mechanism.
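A minimal sketch of what "capturing events at the source" means, written in Python only for brevity. The `ObservableStore` class and its `subscribe`/`set` methods are hypothetical: the store itself emits an event only when a value actually changes, so subscribers never have to poll and compare.

```python
from typing import Any, Callable, Dict, List

Listener = Callable[[str, Any, Any], None]


class ObservableStore:
    """Hypothetical key-value store that pushes change events to listeners."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> Callable[[], None]:
        """Register a listener; the returned function unsubscribes it."""
        self._listeners.append(listener)
        return lambda: self._listeners.remove(listener)

    def set(self, key: str, value: Any) -> None:
        old = self._data.get(key)
        if old == value:
            return  # nothing changed, so no event is emitted
        self._data[key] = value
        # The change is observed where it happens and pushed to subscribers.
        for listener in list(self._listeners):
            listener(key, old, value)


events = []
store = ObservableStore()
unsubscribe = store.subscribe(lambda k, old, new: events.append((k, old, new)))
store.set("volume", 5)
store.set("volume", 5)  # unchanged: no event
store.set("volume", 7)
unsubscribe()
store.set("volume", 9)  # no listeners left
print(events)  # [('volume', None, 5), ('volume', 5, 7)]
```

The hard part mentioned above is that many real APIs don't offer such a hook, which is exactly when polling creeps back in.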
Sure, why not... So my pet project is basically for managing my audio files. There are already navidrome[1] and audiobookshelf[2]. They work great so far, but some minor details are kind of annoying...
The first milestone will be a basic API for my files. Its main components will be the database (postgres), the API (C# + swashbuckle + JsonApiDotNet + Websockets) and the file indexer (C# HostedService). All parts except the file indexer are pretty much done, but the indexer is a critical component because it has to be as fast and as correct as possible.
There are multiple approaches to indexing files... The best-case scenario would be an "import" / "move" of files into a library or repository. That way you would always be up to date and always perfectly sorted. Unfortunately, an import would also be a lot of work, because analysing the files and fetching metadata from online sources is... let's say a huge project. And NOT fetching metadata would mean that I could not move the files while another app manages the metadata. So I took another path: scanning an existing, well-tagged library (which I manage with beets[3] for music and m4b-tool[4] / tone[5] for audio books).
My current idea is to have a file indexer that:
- can run on multiple sources
- runs one full index scan after starting the app
- registers a filesystem watcher for every file source and reacts to events
- processes each source in fixed-size batches and then moves on to the next source, so that no source blocks the others
- decides what to do with already running indexers and registered file watchers when sources are modified: added sources just go to the queue, while changed or deleted sources cancel the running tasks for that source only
- hashes every file (content only), so that changing metadata or tags does not change the hash, and a moved file is recognized and updated instead of deleted and re-inserted
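The batching and hashing rules in this list could be sketched roughly like this. It is a Python sketch under several assumptions, not the C# HostedService: sources are plain lists of paths, `BATCH_SIZE`, `index_round_robin` and `apply_scan` are hypothetical names, a `threading.Event` per source stands in for a `CancellationTokenSource`, and `content_hash` simply hashes the bytes it is given (extracting only the audio payload so that tag edits don't change the hash is assumed to happen elsewhere). It also assumes each content hash maps to exactly one file.

```python
import hashlib
import threading
from collections import deque
from typing import Dict, List, Tuple

BATCH_SIZE = 2  # hypothetical fixed batch size per turn


def index_round_robin(
    sources: Dict[str, List[str]], cancelled: Dict[str, threading.Event]
) -> List[Tuple[str, List[str]]]:
    """Take at most BATCH_SIZE paths from one source, then rotate to the
    next source, so a large source cannot starve the others."""
    turns = deque((name, deque(paths)) for name, paths in sources.items())
    processed: List[Tuple[str, List[str]]] = []
    while turns:
        name, pending = turns.popleft()
        if cancelled[name].is_set() or not pending:
            continue  # cancelled or empty: drop only this source's work
        batch = [pending.popleft() for _ in range(min(BATCH_SIZE, len(pending)))]
        processed.append((name, batch))  # real code would index this batch
        if pending:
            turns.append((name, pending))  # back of the line for the next round
    return processed


def content_hash(payload: bytes) -> str:
    """SHA-256 of the audio payload (assumed to exclude the tags)."""
    return hashlib.sha256(payload).hexdigest()


def apply_scan(index: Dict[str, str], scanned: Dict[str, bytes]) -> List[str]:
    """Reconcile index (hash -> path) with a scan result (path -> payload).
    A known hash at a new path is a move, not a delete plus an insert."""
    actions: List[str] = []
    seen = set()
    for path, payload in scanned.items():
        digest = content_hash(payload)
        seen.add(digest)
        old_path = index.get(digest)
        if old_path is None:
            actions.append(f"insert {path}")
        elif old_path != path:
            actions.append(f"move {old_path} -> {path}")
        index[digest] = path
    for digest in [d for d in index if d not in seen]:
        actions.append(f"delete {index.pop(digest)}")
    return actions


sources = {"music": ["a", "b", "c", "d", "e"], "books": ["x"]}
cancelled = {name: threading.Event() for name in sources}
print(index_round_robin(sources, cancelled))
# [('music', ['a', 'b']), ('books', ['x']), ('music', ['c', 'd']), ('music', ['e'])]

index: Dict[str, str] = {}
print(apply_scan(index, {"old/song.mp3": b"audio"}))  # ['insert old/song.mp3']
print(apply_scan(index, {"new/song.mp3": b"audio"}))  # ['move old/song.mp3 -> new/song.mp3']
```

Rotating a deque of per-source work keeps the fairness rule explicit, and setting one source's event drops only that source's remaining batches.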
The database will contain a Tag entry for every possible value, e.g.:
- File.Location: `music/album/AC_DC/Back in Black/01 - Hells Bells.mp3`
- FileTag.Type: `Artist`
- Tag.Value: `AC/DC`
That way I can add a full-text index on the Tag.Value field, which holds the searchable value, while keeping FileTag.Type for recommendations.
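To make the File / FileTag / Tag split concrete, here is a small sketch using Python's built-in `sqlite3` as a stand-in for PostgreSQL. The `Id`, `FileId` and `TagId` columns and the `add_tag` helper are hypothetical, and `LIKE` stands in for a real full-text index (in PostgreSQL that would typically be a GIN index over `to_tsvector(...)`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE File    (Id INTEGER PRIMARY KEY, Location TEXT NOT NULL UNIQUE);
CREATE TABLE Tag     (Id INTEGER PRIMARY KEY, Value TEXT NOT NULL UNIQUE);
-- FileTag links a file to a tag value and records which kind of tag it is.
CREATE TABLE FileTag (FileId INTEGER REFERENCES File(Id),
                      TagId  INTEGER REFERENCES Tag(Id),
                      Type   TEXT NOT NULL,
                      PRIMARY KEY (FileId, TagId, Type));
""")


def add_tag(location: str, tag_type: str, value: str) -> None:
    """Hypothetical helper: store each value once and link it to the file."""
    conn.execute("INSERT OR IGNORE INTO File (Location) VALUES (?)", (location,))
    conn.execute("INSERT OR IGNORE INTO Tag (Value) VALUES (?)", (value,))
    conn.execute(
        """INSERT OR IGNORE INTO FileTag (FileId, TagId, Type)
           SELECT f.Id, t.Id, ? FROM File f, Tag t
           WHERE f.Location = ? AND t.Value = ?""",
        (tag_type, location, value),
    )


song = "music/album/AC_DC/Back in Black/01 - Hells Bells.mp3"
add_tag(song, "Artist", "AC/DC")
add_tag(song, "Title", "Hells Bells")

# One search over Tag.Value; FileTag.Type tells us what kind of match it was.
rows = conn.execute(
    """SELECT ft.Type, t.Value, f.Location
       FROM Tag t
       JOIN FileTag ft ON ft.TagId = t.Id
       JOIN File f ON f.Id = ft.FileId
       WHERE t.Value LIKE ? ORDER BY ft.Type""",
    ("%AC/DC%",),
).fetchall()
print(rows)
```

Because each value is stored once in Tag, the text search only has to cover one column, while the type information stays available for grouping suggestions.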
Let's say I search for `AC/DC`: it will offer an autocomplete entry for every FileTag.Type that has a match, plus a generic one for searching ALL values:
- Artist: AC/DC
- FullText: AC/DC
Searching for 2010 will show:
- Released: 2010
- Title: 2010
- FullText: 2010

because there are matches in both the release date and the title.
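The suggestion logic from these two examples could look roughly like this. It is a sketch that assumes tags arrive as `(type, value)` pairs; `suggest` is a hypothetical name, and a case-insensitive substring match stands in for the real full-text query.

```python
from typing import List, Tuple


def suggest(term: str, tags: List[Tuple[str, str]]) -> List[str]:
    """One suggestion per tag type that has a matching value (first-seen
    order), followed by a generic FullText suggestion."""
    needle = term.casefold()
    types: List[str] = []
    for tag_type, value in tags:
        if needle in value.casefold() and tag_type not in types:
            types.append(tag_type)
    return [f"{t}: {term}" for t in types] + [f"FullText: {term}"]


tags = [
    ("Artist", "AC/DC"),
    ("Title", "Hells Bells"),
    ("Released", "2010"),
    ("Title", "2010"),
    ("Title", "2010: Odyssey Two"),
]
print(suggest("AC/DC", tags))  # ['Artist: AC/DC', 'FullText: AC/DC']
print(suggest("2010", tags))   # ['Released: 2010', 'Title: 2010', 'FullText: 2010']
```

Deduplicating by type keeps the list short even when many titles match, which mirrors the "one entry per FileTag.Type" idea.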
There may be a lot to optimize, but I think my current plan holds up pretty well. Let me know what you think about this approach :-)
Do file watchers registered in the main thread get called and then enqueue a message to a worker thread for processing?
I'm guessing you want the code that handles file events from the watcher and the code that runs on startup to be the same code, used in two places.
Guessing you scan multiple source directories of files recursively.
Does C# have a thread safe queue object? You could create a pool of worker threads and the file watcher can enqueue events
You could have threads that scan the file sources (one per source) and enqueue file names for worker threads that do the actual work. You could have one queue per source thread / worker thread pair.
The problem with the file watcher code is that I don't know what context that event runs in, so you would have to enqueue events from the main thread context to one of the worker thread queues.
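A sketch of that producer-consumer layout, assuming Python's thread-safe `queue.Queue` in place of whatever C# collection is used (e.g. `BlockingCollection<T>` or TPL Dataflow's `BufferBlock<T>`). The function names and the three-worker pool are hypothetical. The point is that the watcher callback may run on any thread, so it does nothing except enqueue, and the startup scan goes through the same entry point.

```python
import queue
import threading
from typing import List, Optional

STOP = None  # sentinel that tells a worker to exit


def on_file_event(q: "queue.Queue[Optional[str]]", path: str) -> None:
    """Watcher callback: may run on any thread, so it only enqueues."""
    q.put(path)


def initial_scan(q: "queue.Queue[Optional[str]]", paths: List[str]) -> None:
    """Startup scan reuses exactly the same entry point as the watcher."""
    for p in paths:
        on_file_event(q, p)


def worker(q: "queue.Queue[Optional[str]]", processed: List[str], lock: threading.Lock) -> None:
    while True:
        path = q.get()
        try:
            if path is STOP:
                return
            with lock:
                processed.append(path)  # real code: hash, read tags, update DB
        finally:
            q.task_done()


q: "queue.Queue[Optional[str]]" = queue.Queue()
processed: List[str] = []
lock = threading.Lock()
workers = [threading.Thread(target=worker, args=(q, processed, lock)) for _ in range(3)]
for w in workers:
    w.start()

initial_scan(q, ["a.mp3", "b.mp3"])
watcher = threading.Thread(target=on_file_event, args=(q, "c.mp3"))  # simulated watcher thread
watcher.start()
watcher.join()  # make sure the event is enqueued before waiting on the queue
q.join()  # block until every enqueued path has been processed
for _ in workers:
    q.put(STOP)
for w in workers:
    w.join()
print(sorted(processed))  # ['a.mp3', 'b.mp3', 'c.mp3']
```

Since the queue is the only shared hand-off point, it no longer matters which thread the watcher event fires on.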
> Do file watchers registered in the main thread get called and then enqueue a message to a worker thread for processing?
Yes, the producer-consumer pattern. Currently there is a single thread for each side, but that could be scaled later. For now I'm trying to keep things simple.
> Guessing you scan multiple source directories of files recursively. Does C# have a thread safe queue object? You could create a pool of worker threads and the file watcher can enqueue events
Yes, there are a few, such as `ConcurrentQueue<T>` and `BlockingCollection<T>`. I use `BufferBlock<T>` [1], which is pretty flexible.
> The problem with the file watcher code is that I don't know what context that event runs in, so you would have to enqueue events from the main thread context to one of the worker thread queues.
This is the long-term plan. Using events is much more flexible than "polling" for the next batch of file items (even if it happens in real time). The architecture seems to support this, but for now I think I'm pretty close to a working solution. Maybe I'll just go for it, develop a small UI in Flutter and see where problems show up :-) Currently there is too much "theory"; I would like to see this in practice.