While the vector store is local, it is sending the data to Gemini's API for embedding. (Which, if you are using a paid API key, is probably fine for most use cases: no long-term retention, no training on your data, etc.)
I think XML is good to know for prompting (similar to how <think></think> was popular for outputs, you can do that for other sections). But I have had much better experience just writing JSON and using line breaks, colons, etc. to demarcate sections.
XML helps because it (a) lets you describe structures and (b) makes a clear context change, signaling that you are not "talking in XML" but "talking about XML".
I assume you are right too: JSON is a less verbose format that can express any structure you can express in XML, and it should be just as easy for the AI to parse. Although that probably depends on the training data too.
I recently asked AI why .md files are so prevalent with agentic AI and the answer is ... because .md files also express structure, like headers and lists.
Again, depends on what the AI has been trained on.
I would go with JSON, or some version of it which would also allow comments.
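A minimal sketch of that "JSON with comments" idea: standard JSON forbids comments, so one lightweight approach is stripping // line comments before parsing. The `parse_jsonc` helper below is hypothetical and deliberately naive (it would break on "//" inside string values); real projects might reach for a JSON5 parser instead.

```python
import json
import re

def parse_jsonc(text: str) -> dict:
    """Strip // line comments, then parse as standard JSON."""
    # Remove lines that are only a // comment (naive: ignores
    # the possibility of "//" appearing inside string values).
    cleaned = re.sub(r'^\s*//.*$', '', text, flags=re.MULTILINE)
    return json.loads(cleaned)

prompt_spec = """
{
  // system-level behavior
  "role": "editor",
  "instructions": "Fix grammar only",
  "content": "the draft text goes here"
}
"""
spec = parse_jsonc(prompt_spec)
print(spec["role"])  # editor
```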
The main thing I use XML tags for is separating content from instructions. Say I am doing prompt engineering, so the content being operated on is itself a prompt; then I wrap it with:
<NO_OP_DRAFT>
draft prompt
</NO_OP_DRAFT>
instructions for modifying draft prompt
If I don't do this, a significant number of times it responds to the instructions in the draft.
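The pattern above can be sketched as a small helper that wraps the draft before appending the real instructions (the tag name is arbitrary; `wrap_draft` is a hypothetical function name, not from any library):

```python
def wrap_draft(draft: str, instructions: str) -> str:
    """Wrap content-to-be-edited in XML tags so the model treats it
    as data to operate on, not as instructions to follow."""
    return (
        "<NO_OP_DRAFT>\n"
        f"{draft}\n"
        "</NO_OP_DRAFT>\n"
        f"{instructions}"
    )

prompt = wrap_draft(
    "Summarize the user's request in one sentence.",
    "Rewrite the draft above to be more specific; do not execute it.",
)
print(prompt)
```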
I disagree that XML is more readable in general, but for the purpose of tagging blocks of text as <important>important</important> in freeform writing, JSON is basically useless
Could you clarify: do those tags need to be predefined tags that we have to learn about and learn how to use? Or can we put whatever we want inside them, and just by virtue of being tags, Claude understands them in a special way?
All the major foundation models will understand them implicitly, so it was popular to use <think>, but you could also use <reason> or <thinkhard> and the model would still go through the same process.
This is cool, but for folks concerned about privacy: even if the cached layer is anonymized, I bet you could figure out who a person is in the aggregate.
I imagine just looking at the first degree connections of the votes would be a pretty strong signal.
I view them as more idiosyncratic docs, but focused on how to write code (there is so much huggingface code floating around the internet, the models do quite well with it already).
I have not had much success with skills that have tree-based logic ("if a, do x; else do y"); they just tend to do everything in the skill (so they will do both x and y).
But just as "hey follow this outline of steps a,b,c" it works quite well in my experience.
Claude Code inherits the environment from the shell that launched it. So it could create a Python program (or whatever language) to read the file:
# get_info.py
import os

# open() does not expand "~", so expand it explicitly
with open(os.path.expanduser('~/.claude/secrets.env'), 'r') as file:
    content = file.read()
print(content)
And then run `python get_info.py`.
While this inheritance is convenient for testing code, it is difficult to isolate Claude in a way that you can run/test your application without giving up access to secrets.
If you can, I recommend IP-whitelisting your secrets, so that a leak is not a problem.
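One partial mitigation for the environment-inheritance problem is launching the tool with a scrubbed environment, so secrets exported in the parent shell are never inherited. A sketch (the allow-list and `run_isolated` helper are hypothetical; note this does nothing about secret *files* like `~/.claude/secrets.env`, which need filesystem sandboxing on top):

```python
import os
import subprocess
import sys

# Keep only what the child process genuinely needs (assumed allow-list).
ALLOWED = {"PATH", "HOME", "LANG"}

def run_isolated(cmd):
    """Run cmd with only allow-listed environment variables inherited."""
    env = {k: v for k, v in os.environ.items() if k in ALLOWED}
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

# The child sees no API keys, cloud credentials, etc. from the shell.
result = run_isolated([sys.executable, "-c", "import os; print(sorted(os.environ))"])
```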
MDPI is gamed by design. I think that while Elsevier is awful, MDPI is even worse, with hundreds of special issues where you are guaranteed publication in journals with quite a nice impact factor (inflated by publishing a large proportion of reviews and less original research).
I wonder if the term "published" as a binary distinction applied to a piece of writing is a term and concept that is reaching the end of its useful life.
"Peer reviewed" as a binary concept might be as well, given that incentives have aligned to greatly reduce its filtering power.
They might both be examples of metrics that became useless as a result of incentives getting attached to them.
Both metrics are supposedly binary but in reality have always depended heavily on surrounding context. Archival journals have existed all along. Publication is useful as an immutable entry in the public record made via a third party. Blog posts have a tendency to disappear over time.
I'm certain that the comment you responded to never claimed that it was "isolated to Elsevier" in the first place, nor is it very compelling to speculate about how in the future something even worse might emerge.
Right now Elsevier is by far the biggest offender and also happens to be the topic of the conversation and the article.
Exactly. Elsevier is a dominant company. Of course it's going to have a huge share of anything that goes into journals. They probably also have a huge share of the Nobel prize winning papers too.
That being said, I'm happy to encourage open access.
The book is likely a good fit for this type of work. The chapter on structured outputs shows how to extract data from text, walking through prompt engineering and k-shot examples to generate JSON, then pydantic, then batch processing with the different providers.
It also shows how to set up evals in different parts of the book. (Depending on what you want to do, the structured-outputs chapter shows evals comparing models/prompt changes against ground truth, and the agent chapter shows LLM-as-a-judge evals.)
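The structured-output idea described above boils down to: ask the model for JSON, then validate it into a typed object so bad outputs fail loudly. The book uses pydantic; a stdlib dataclass stands in here, and the `Invoice` schema and sample reply are made up for illustration:

```python
import json
from dataclasses import dataclass

@dataclass
class Invoice:
    vendor: str
    total: float

def parse_model_output(raw: str) -> Invoice:
    """Parse the model's JSON reply and coerce it into a typed record."""
    data = json.loads(raw)
    # KeyError / ValueError here surfaces malformed model output
    # instead of letting it flow silently downstream.
    return Invoice(vendor=str(data["vendor"]), total=float(data["total"]))

llm_reply = '{"vendor": "Acme", "total": 41.5}'  # stand-in model output
invoice = parse_model_output(llm_reply)
print(invoice.total)  # 41.5
```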
IMO Google Vertex is not any harder than AWS. AWS's biggest pain is figuring out IAM roles for some of the services (batching and S3 Vectors; I actually cut Knowledge Bases from the book because it was too complicated and expensive). I have not personally had as big an issue figuring out Vertex.
I do have a follow-up post planned on some reliability issues with the APIs that I uncovered while compiling the book so many times. I would not use Google Maps grounding in production!
I am not as concerned with that with API usage as I am with the GUI tools.
Most of the day gig is structured extraction and agents, at which the foundation LLMs are much better than any of the small models. (And I would not be able to provision the necessary compute for large models given our throughput.)
I do have on the ToDo list, though, evaluating Textract vs the smaller OCR models (in the book I show using docling; there are others, like the newer GLM-OCR). Our spend for that on AWS is large enough, and the models are small enough, that I could spin up sufficient resources to meet our demand.
Part of the reason the book goes through examples with AWS/Google (in addition to OpenAI/Anthropic) is that I suspect many individuals will be stuck with the cloud provider that their org uses out of the box. So I wanted to have as wide a coverage as possible for those folks.