While the vector store is local, it is sending the data to Gemini's API for embedding. (Which, if you are using a paid API key, is probably fine for most use cases: no long-term retention, no training on your data, etc.)
I think XML is good to know for prompting (similar to how <think></think> was popular for outputs, you can do that for other sections). But I have had much better experience just writing JSON and using line breaks, colons, etc. to demarcate sections.
XML helps because it (a) lets you describe structures and (b) makes a clear context change, signaling that you are not "talking in XML" but "talking about XML".
I assume you are right too: JSON is a less verbose format that can express any structure you can express in XML, and it should be just as easy for the AI to parse. Although that probably depends on the training data too.
I recently asked AI why .md files are so prevalent with agentic AI and the answer is ... because .md files also express structure, like headers and lists.
Again, depends on what the AI has been trained on.
I would go with JSON, or some version of it which would also allow comments.
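A minimal sketch of that "JSON with comments" idea: standard JSON forbids comments, so one lightweight approach is stripping // line comments before parsing. The `parse_jsonc` helper below is hypothetical and deliberately naive (it would break on "//" inside string values); real projects might reach for a JSON5 parser instead.

```python
import json
import re

def parse_jsonc(text: str) -> dict:
    """Strip // line comments, then parse as standard JSON."""
    # Remove lines that are only a // comment (naive: ignores
    # the possibility of "//" appearing inside string values).
    cleaned = re.sub(r'^\s*//.*$', '', text, flags=re.MULTILINE)
    return json.loads(cleaned)

prompt_spec = """
{
  // system-level behavior
  "role": "editor",
  "instructions": "Fix grammar only",
  "content": "the draft text goes here"
}
"""
spec = parse_jsonc(prompt_spec)
print(spec["role"])  # editor
```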
The main thing I use XML tags for is separating content from instructions. Say I am doing prompt engineering, so the content being operated on is itself a prompt; then I wrap it with:
<NO_OP_DRAFT>
draft prompt
</NO_OP_DRAFT>
instructions for modifying draft prompt
If I don't do this, a significant number of times it responds to the instructions in the draft.
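The pattern above can be sketched as a small helper that wraps the draft before appending the real instructions (the tag name is arbitrary; `wrap_draft` is a hypothetical function name, not from any library):

```python
def wrap_draft(draft: str, instructions: str) -> str:
    """Wrap content-to-be-edited in XML tags so the model treats it
    as data to operate on, not as instructions to follow."""
    return (
        "<NO_OP_DRAFT>\n"
        f"{draft}\n"
        "</NO_OP_DRAFT>\n"
        f"{instructions}"
    )

prompt = wrap_draft(
    "Summarize the user's request in one sentence.",
    "Rewrite the draft above to be more specific; do not execute it.",
)
print(prompt)
```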
I disagree that XML is more readable in general, but for the purpose of tagging blocks of text as <important>important</important> in freeform writing, JSON is basically useless
Could you clarify: do those tags need to be predefined tags that we have to learn about and learn how to use? Or can we put whatever we want inside them, and just by virtue of being tags, Claude understands them in a special way?
All the major foundation models will understand them implicitly, so it was popular to use <think>, but you could also use <reason> or <thinkhard> and the model would still go through the same process.
This is cool, but for folks concerned about privacy: even if the cached layer is anonymized, I bet you could figure out who a person is in the aggregate.
I imagine just looking at the first degree connections of the votes would be a pretty strong signal.
I view them as more idiosyncratic docs, but focused on how to write code (there is so much huggingface code floating around the internet, the models do quite well with it already).
I have not had much success with skills that have tree-based logic ("if a, do x; else do y"); they just tend to do everything in the skill (so they will do both x and y).
But just as "hey follow this outline of steps a,b,c" it works quite well in my experience.
Claude Code inherits the environment from the shell that launched it. So it could create a Python program (or whatever language) to read the file:
# get_info.py
import os

# open() does not expand "~", so expand it explicitly
with open(os.path.expanduser('~/.claude/secrets.env'), 'r') as file:
    content = file.read()
print(content)
And then run `python get_info.py`.
While this inheritance is convenient for testing code, it is difficult to isolate Claude in a way that you can run/test your application without giving up access to secrets.
If you can, I recommend IP-whitelisting your secrets, so that a leak is not a problem.
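One partial mitigation for the environment-inheritance problem is launching the tool with a scrubbed environment, so secrets exported in the parent shell are never inherited. A sketch (the allow-list and `run_isolated` helper are hypothetical; note this does nothing about secret *files* like `~/.claude/secrets.env`, which need filesystem sandboxing on top):

```python
import os
import subprocess
import sys

# Keep only what the child process genuinely needs (assumed allow-list).
ALLOWED = {"PATH", "HOME", "LANG"}

def run_isolated(cmd):
    """Run cmd with only allow-listed environment variables inherited."""
    env = {k: v for k, v in os.environ.items() if k in ALLOWED}
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

# The child sees no API keys, cloud credentials, etc. from the shell.
result = run_isolated([sys.executable, "-c", "import os; print(sorted(os.environ))"])
```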
MDPI is gamed by design. I think that while Elsevier is awful, MDPI is even worse, with hundreds of special issues where you are guaranteed publication in journals with quite a nice impact factor (inflated by publishing a large proportion of reviews and less original research).
I wonder if the term "published" as a binary distinction applied to a piece of writing is a term and concept that is reaching the end of its useful life.
"Peer reviewed" as a binary concept might be as well, given that incentives have aligned to greatly reduce its filtering power.
They might both be examples of metrics that became useless as a result of incentives getting attached to them.
Both metrics are supposedly binary but in reality have always depended heavily on surrounding context. Archival journals have existed all along. Publication is useful as an immutable entry in the public record made via a third party. Blog posts have a tendency to disappear over time.
I'm certain that the comment you responded to never claimed that it was "isolated to Elsevier" in the first place, nor is it very compelling to speculate about how in the future something even worse might emerge.
Right now Elsevier is by far the biggest offender and also happens to be the topic of the conversation and the article.
Exactly. Elsevier is a dominant company. Of course it's going to have a huge share of anything that goes into journals. They probably also have a huge share of the Nobel prize winning papers too.
That being said, I'm happy to encourage open access.
The book is likely a good fit for this type of work. The chapter on structured outputs shows how to extract data from text, walking through prompt engineering and k-shot examples to generate JSON, then pydantic, then batch processing with the different providers.
It also shows how to set up evals in different parts of the book. (Depending on what you want to do, the structured-outputs chapter shows evals comparing models/prompt changes against ground truth, and the agent chapter shows LLM-as-a-judge evals.)
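The structured-output idea described above boils down to: ask the model for JSON, then validate it into a typed object so bad outputs fail loudly. The book uses pydantic; a stdlib dataclass stands in here, and the `Invoice` schema and sample reply are made up for illustration:

```python
import json
from dataclasses import dataclass

@dataclass
class Invoice:
    vendor: str
    total: float

def parse_model_output(raw: str) -> Invoice:
    """Parse the model's JSON reply and coerce it into a typed record."""
    data = json.loads(raw)
    # KeyError / ValueError here surfaces malformed model output
    # instead of letting it flow silently downstream.
    return Invoice(vendor=str(data["vendor"]), total=float(data["total"]))

llm_reply = '{"vendor": "Acme", "total": 41.5}'  # stand-in model output
invoice = parse_model_output(llm_reply)
print(invoice.total)  # 41.5
```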
IMO Google Vertex is not any harder than AWS. AWS's biggest pain is figuring out IAM roles for some of the services (batching and S3 Vectors; I actually cut Knowledge Bases from the book because it was too complicated and expensive). I have not personally had as big an issue figuring out Vertex.
I do have a follow-up post planned on some reliability issues with the APIs that I uncovered while compiling the book so many times. I would not use Google Maps grounding in production!
I am not as concerned with that with API usage as I am with the GUI tools.
Most of the day gig is structured extraction and agents, at which the foundation LLMs are much better than any of the small models. (And I would not be able to provision the necessary compute for large models given our throughput.)
I do have on the ToDo list, though, evaluating Textract vs the smaller OCR models (in the book I show using docling; there are others, like the newer GLM-OCR). Our spend for that on AWS is large enough, and the models are small enough, that I could spin up sufficient resources to meet our demand.
Part of the reason the book goes through examples with AWS/Google (in addition to OpenAI/Anthropic) is that I suspect many individuals will be stuck with the cloud provider that their org uses out of the box. So I wanted to have as wide a coverage as possible for those folks.