Show HN: I used Claude Code to discover connections between 100 books (trails.pieterma.es)
322 points by pmaze 16 hours ago | 75 comments
I think LLMs are overused to summarise and underused to help us read deeper.

I built a system for Claude Code to browse 100 non-fiction books and find interesting connections between them.

I started out with a pipeline in stages, chaining together LLM calls to build up a context of the library. I was mainly getting back the insight that I was baking into the prompts, and the results weren't particularly surprising.

On a whim, I gave CC access to my debug CLI tools and found that it wiped the floor with that approach. It gave actually interesting results and required very little orchestration in comparison.

One of my favourite trails of excerpts goes from Jobs’ reality distortion field to Theranos’ fake demos, to Thiel on startup cults, to Hoffer on mass movement charlatans (https://trails.pieterma.es/trail/useful-lies/). A fun tendency is that Claude kept getting distracted by topics of secrecy, conspiracy, and hidden systems - as if the task itself summoned a Foucault’s Pendulum mindset.

Details:

* The books are picked from HN’s favourites (which I collected before: https://hnbooks.pieterma.es/).
* Chunks are indexed by topic using Gemini Flash Lite. The whole library cost about £10.
* Topics are organised into a tree structure using recursive Leiden partitioning and LLM labels. This gives a high-level sense of the themes.
* There are several ways to browse. The most useful are embedding similarity, topic tree siblings, and topics co-occurring within a chunk window.
* Everything is stored in SQLite and manipulated using a set of CLI tools.

I wrote more about the process here: https://pieterma.es/syntopic-reading-claude/

I’m curious if this way of reading resonates for anyone else - LLM-mediated or not.
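
The post doesn't publish its schema, but "topics co-occurring within a chunk window" maps naturally onto a SQLite self-join. What follows is a minimal sketch, assuming a hypothetical `chunk_topics(book_id, chunk_idx, topic)` table with one row per topic tag on a chunk. The table, its columns, and the `cooccurring_topics` helper are all illustrative and are not the OP's actual tooling:

```python
import sqlite3

# Hypothetical schema: one row per (book, chunk position, topic label).
SCHEMA = """
CREATE TABLE chunk_topics (
    book_id   INTEGER NOT NULL,
    chunk_idx INTEGER NOT NULL,
    topic     TEXT    NOT NULL
);
"""

# Count pairs of distinct topics that appear within :window chunks of each
# other in the same book. `a.topic < b.topic` keeps each unordered pair once
# and excludes a row being paired with itself.
COOCCUR_SQL = """
SELECT a.topic, b.topic, COUNT(*) AS n
FROM chunk_topics AS a
JOIN chunk_topics AS b
  ON a.book_id = b.book_id
 AND ABS(a.chunk_idx - b.chunk_idx) <= :window
 AND a.topic < b.topic
GROUP BY a.topic, b.topic
ORDER BY n DESC, a.topic, b.topic;
"""


def cooccurring_topics(conn, window=2):
    """Return (topic_a, topic_b, count) rows, most frequent pairs first."""
    return conn.execute(COOCCUR_SQL, {"window": window}).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO chunk_topics VALUES (?, ?, ?)",
        [(1, 0, "secrecy"), (1, 1, "charisma"), (1, 5, "fraud"),
         (2, 0, "charisma"), (2, 2, "secrecy")],
    )
    print(cooccurring_topics(conn, window=2))  # → [('charisma', 'secrecy', 2)]
```

Widening `window` trades precision for recall. A larger window surfaces topics discussed a few pages apart, but it also picks up more incidental pairings.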

drakeballew 8 hours ago
This is a beautiful piece of work. The actual data or outputs seem to be more or less... trash? Maybe too strong a word. But perhaps you are outsourcing too much critical thought to a statistical model. We are all guilty of it. But some of these are egregious, obviously referential LLM dog. The world has more going on than whatever these models seem to believe.

Edit/update: if you are looking for the phantom thread between texts, believe me that an LLM cannot achieve it. I have interrogated the most advanced models for hours, and they cannot do the task to any sort of satisfactory end that a smoked-out, half-asleep college freshman could. The models don't have sufficient capacity... yet.

chrisgd 2 hours ago
Really great work, but I have to agree with others that I don’t see the threads. The one I found most connected, which the LLM didn’t find, was a connection between Jobs and The Elephant in the Brain.

The Elephant in the Brain: The less we know of our own ugly motives, the easier it is to hide them from others. Self-deception is therefore strategic, a ploy our brains use to look good while behaving badly.

Jobs: “He can deceive himself,” said Bill Atkinson. “It allowed him to con people into believing his vision, because he has personally embraced and internalized it.”

8organicbits 10 hours ago
Can someone break this down for me? I'm seeing "Theranos committing fraud" in a section about "useful lies". Given that the founder is currently in prison, it seems odd to consider the lie useful instead of harmful. It kinda seems like the AI found a bunch of loosely related things and mislabeled the group.

If you've read these books, I'm not seeing what value this adds.

theturtletalks 11 hours ago
In a similar vein, I've been using Claude Code to "read" GitHub projects I have no business understanding. I found this one trending on GitHub with everything in Russian and went down the rabbit hole of deep packet inspection[0].

smusamashah 11 hours ago
I don't understand the lines connecting two pieces of text. In most cases, the connected words have absolutely zero connection with each other. In "Father wound", the words "abandoned at birth" are connected to "did not".

This makes it look like those visual connections are just a stylistic choice and don't carry any meaning at all.

johnwatson11218 9 hours ago
I did something similar. I used pdfplumber to extract text from my PDF book collection, dumped it into PostgreSQL, then chunked the text into 100-character chunks with a 10-character overlap. These chunks were embedded directly into a 384-dimensional space using Python sentence_transformers. Then I simply averaged all the chunks for a doc and wrote that single vector back to PostgreSQL. Then I used UMAP + HDBSCAN to perform dimensionality reduction and clustering. I ended up with a 2D data set that I can plot with Plotly to see my clusters.

It is very cool to play with. It takes hours to import 100 PDF files, but I can take one folder that contains a mix of programming titles, self-help, math, science fiction, etc., and after the fully automated analysis you can clearly see the different topic clusters.

I just spent time getting it all running on Docker Compose and moved my web UI from Express.js to Flask. I want to get the code cleaned up and open-source it at some point.
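
The chunk-then-average step is simple enough to sketch without dependencies. This is a minimal illustration that assumes a caller-supplied `embed` function, which is hypothetical here. With sentence_transformers it could wrap `SentenceTransformer(...).encode`, for example with the 384-dimensional `all-MiniLM-L6-v2` model, though the comment doesn't say which model was used. A 100-character window with a 10-character overlap advances 90 characters at a time:

```python
from typing import Callable, List, Sequence

Vector = List[float]


def chunk_text(text: str, size: int = 100, overlap: int = 10) -> List[str]:
    """Split text into `size`-character windows; consecutive windows share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average equal-length vectors component-wise into a single vector."""
    if not vectors:
        raise ValueError("mean_pool needs at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def document_vector(text: str, embed: Callable[[str], Sequence[float]]) -> Vector:
    """Embed every chunk of a document, then average the chunk vectors into one."""
    return mean_pool([embed(chunk) for chunk in chunk_text(text)])


if __name__ == "__main__":
    # Toy stand-in for a real model: [chunk length, count of the letter "a"].
    toy_embed = lambda s: [float(len(s)), float(s.count("a"))]
    print(document_vector("a" * 250, toy_embed))  # → [90.0, 90.0]
```

Mean pooling is cheap, but it collapses a whole book into one point. That is probably acceptable for coarse topic clusters like the ones described above. Keeping the per-chunk vectors as well would leave room for finer-grained search later.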

zkmon 34 minutes ago
Given the common goals of every book (fame and sales by grabbing user attention), the general themes and styles would have high similarity. It's like flowers with bright colors and nice shapes. Orwellian motives (sheer egoism, aesthetic enthusiasm, historical impulse and political purpose) are somewhat dated.

urbandw311er 10 hours ago
This feels like a nice idea, but the connection between the theme and the overarching arc of each book seems tenuous at best. In some cases it just seems to have found one paragraph out of thousands and extrapolated a theme that doesn’t really thread through the greater piece.

I do like the idea, though. Perhaps there is a way to refine the prompting to do a second pass, or even multiple passes, to iteratively extract themes before the linking step.

bonkusbingus 10 hours ago
"There are, you see, two ways of reading a book: you either see it as a box with something inside and start looking for what it signifies, and then if you're even more perverse or depraved you set off after signifiers. And you treat the next book like a box contained in the first or containing it. And you annotate and interpret and question, and write a book about the book, and so on and on. Or there's the other way: you see the book as a little non-signifying machine, and the only question is "Does it work, and how does it work?" How does it work for you? If it doesn't work, if nothing comes through, you try another book. This second way of reading's intensive: something comes through or it doesn't. There's nothing to explain, nothing to understand, nothing to interpret."

- Gilles Deleuze

pxc 11 hours ago
I read a book maybe a decade ago on the "digital humanities". I wish now I could remember the title and author. :( Anyway, it introduced me to the idea of using computational methods in the humanities, including literature. I found it really interesting at the time!

One of the terms it introduced me to is "distant reading", whose name mirrors that of a technique you may have studied in your gen eds if you went to university ("close reading"). The idea is that rather than zooming in on some tiny piece of text to examine very subtle or nuanced meanings, you zoom out to hundreds or thousands of texts, using computers to search them for insights that only emerge from large bodies of work as wholes. The book argued that there are likely some questions that it is only feasible to ask this way.

An old friend of mine used techniques like this for her dissertation in rhetoric, learning enough Python along the way to write the code needed for the analyses she wanted to do. I thought it was pretty cool!

I imagine LLMs are probably positioned now to push distant reading forward in a number of ways: enabling new techniques, allowing old techniques to be used without writing code, and helping novices get started with writing some code. (A lot of the maintainability issues that come with LLM code generation happily don't apply to research projects like this.)

Anyway, if you're interested in other computational techniques you can use to enrich this kind of reading, you might enjoy looking into "distant reading": https://en.wikipedia.org/wiki/Distant_reading

tolerance 10 hours ago
I don’t like this product as a service to readers (i.e., people who read as a cognitive/philosophical exploit), but I do think that somewhere embedded in its backend there are things of benefit.

I think that this sucks the discreet joy out of reading and learning. Having the ways that the topics within a certain book can cross over and lead into another book of a different topic externalized is hollowing, and I don’t find it useful.

On the other hand, I feel like seeing this process externalized gives us a glimpse at how “the algorithms” (read: recommender systems) suggest seemingly disjunctive content to users. So as a technical achievement I can’t knock what you’ve done, and I’m satisfied to see that you’re the guy behind the HN Book map that I thought was nice too.

At its core this looks like a representation of the advantages that LLMs can afford to the humanities. Most of us know how Rob Pike feels about them. I wonder if his former senior colleague feels the same: https://www.cs.princeton.edu/~bwk/hum307/index.html. That’s a digression, but I’d like to see some people think in public about how to reasonably use these tools in that domain.

lkbm 10 hours ago
Earlier today, I was thinking about doing something somewhat similar to this. I was recently trying to remember a portal fantasy I read as a kid. Goodreads has some impressive lists: not just "Portal Fantasies"[0], but "Portal Fantasies where the portal is on water"[1], and seven more "where/what's the portal" categories like that. But the portal fantasy I was seeking is on the water and not on the list. LLMs have failed me so far, as has browsing the larger portal fantasy list.

So, I thought, what if I had an LLM look through a list of kids' books published in the 1990s and categorize each one: "is this a portal fantasy?" and "which category is the portal?" I would 1. possibly find my book and 2. possibly find dozens of books I could add to the lists. (And potentially help augment other Goodreads-like sites.) Haven't done it, but I still might.

Anyway, thanks for making this. It's a really cool project!

[0] https://www.goodreads.com/list/show/103552.Portal_Fantasy_Bo...

[1] https://www.goodreads.com/list/show/172393.Fiction_Portal_is...

timoth3y 10 hours ago
What meaningful connections did it uncover? You have an interesting idea here, but looking over the LLM output, it's not clear what these "connections" actually mean, or if they mean anything at all.

Feeding a dataset into an LLM and getting it to output something is rather trivial. How is this particular output insightful or helpful? What specific connections gave you, the author, new insight into these works?

You correctly, and importantly, point out that "LLMs are overused to summarise and underused to help us read deeper", but you published the LLM summary without explaining how the LLM helped you read deeper.

guidoism 4 hours ago
Nice! I've been using Claude Code and ChatGPT for something similar. My inspiration is Adler's concept of The Great Conversation and Adler's Propædia. I've been able to jump between books to read about the same concept from different authors' perspectives.

hecanjog 6 hours ago
You really know what a good interface should be like; this is really inspiring. So is the design of everything I've seen on your website!

I won't pile on to what everyone else has said about the book connections / AI part of this (though I agree that part is not the really interesting or useful thing about your project), but I think a walk-through of how you approach UI design would be very interesting!

amadeuswoo 10 hours ago
The feedback loop you describe (watching Claude's logs, then just asking it what functionality it wished it had) feels like an underexplored pattern. Did you find its suggestions converged toward a stable toolset, or did it keep wanting new capabilities as the trails got more sophisticated?

lisdexan 8 hours ago
Finally, Schizophrenia as a Service (SaaS).

hising 11 hours ago
Yeah, I had a similar idea. I used the OpenAI API to break down movies into the three-act structure, narrative, pacing, character arcs, etc., and then tried to find similar movies using PostgreSQL with pgvector. The idea was to have another way to find movies I would like to watch next, based on more than "similar movies" in IMDb. I threw some hours at it, but I guess it is a system that needs a lot of data, a lot of tokens, and an enormous amount of tweaking to be useful.

I love your idea! I agree with you that we could use LLMs for this kind of stuff that we as humans are quite bad at.
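
The "find similar movies" step described above is nearest-neighbour search over embeddings. pgvector's `<=>` operator orders rows by cosine distance, which is one minus cosine similarity. Below is a minimal sketch of the same ranking in plain Python. The three-dimensional vectors and movie IDs are hypothetical toy data, not output from any real model:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query_id: str, vectors: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Rank every other item by cosine similarity to `query_id`, highest first."""
    query = vectors[query_id]
    scored = [
        (item_id, cosine_similarity(query, vec))
        for item_id, vec in vectors.items()
        if item_id != query_id
    ]
    # Sort by descending similarity; break ties by ID so results are deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical embeddings; real ones would come from an embedding model.
    toy = {
        "movie_a": [1.0, 0.0, 0.2],
        "movie_b": [0.9, 0.1, 0.3],
        "movie_c": [0.0, 1.0, 0.0],
    }
    print(most_similar("movie_a", toy, k=2))
```

This brute-force scan costs O(n) per query, which is fine for a personal collection. pgvector's HNSW and IVFFlat indexes are the usual next step once it isn't.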

Aurornis 12 hours ago
It’s interesting how many of the descriptions have a distinct LLM-style voice. Even if you hadn’t posted how it was generated, I would have immediately recognized many of the motifs and patterns as LLM writing style.

The visual style of linking phrases from one section to the next looks neat, but the connections don’t seem correct. There’s a link from “fictions” to “internal motives” near the top of the first link, and several other links are not really obviously correct.

itsangaris 9 hours ago
Surprised that "Seeing Like a State" didn't get included in the "legibility tax" category.

JimmyJamesJames 9 hours ago
Like this initial step and its findings.

#1: Would a larger dataset increase the depth and breadth of insight? (Go to #2.)

#2: With the initial top 100, are there key ‘super node’ books that stand out as ones to read due to the breadth they offer? Would a larger dataset identify further ‘super node’ books?
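
One rough way to look for "super node" books is to treat each book as a node and link two books whenever a trail passes through both. Books can then be ranked by how many distinct books they connect to, which is degree centrality. Here is a minimal sketch using placeholder book IDs. The trail data and the `book_graph`/`super_nodes` helpers are hypothetical. If "super node" is meant as a bridge between otherwise separate clusters, betweenness centrality would probably be the better measure:

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def book_graph(trails: Iterable[Sequence[str]]) -> Dict[str, Set[str]]:
    """Undirected graph: two books are neighbours if any trail contains both."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for trail in trails:
        for a, b in combinations(sorted(set(trail)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def super_nodes(trails: Iterable[Sequence[str]], top: int = 3) -> List[Tuple[str, int]]:
    """Books with the most distinct neighbours, ties broken alphabetically."""
    graph = book_graph(trails)
    ranked = sorted(graph, key=lambda book: (-len(graph[book]), book))
    return [(book, len(graph[book])) for book in ranked[:top]]


if __name__ == "__main__":
    # Placeholder trails; each inner list is the sequence of books one trail visits.
    trails = [["A", "B", "C", "D"], ["B", "D"], ["C", "E"]]
    print(super_nodes(trails, top=2))  # → [('C', 4), ('A', 3)]
```

Running the same count on a larger library would show whether the same books keep surfacing as hubs, which is roughly what question #2 asks.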

adsharma 6 hours ago
This is GraphRAG using SQLite. Wouldn't it be good if recursive Leiden and Cypher were built into an embedded DB? That's what I'm looking into with mcp-server-ladybug.
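
For the tree-walking part, plain SQLite already gets some of the way without Cypher, because a recursive CTE can follow parent links. Below is a minimal sketch that assumes a hypothetical `topics(id, parent_id, label)` table holding whatever tree the partitioning step produced, whether from Leiden or any other method. It covers the "topic tree siblings" lookup and "where does this topic sit" queries. The schema, the helpers, and the sample labels are illustrative and are not the OP's:

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE topics (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES topics(id),  -- NULL for the root
    label     TEXT NOT NULL UNIQUE
);
"""

# Other topics that share the given topic's parent.
SIBLINGS_SQL = """
SELECT s.label
FROM topics AS t
JOIN topics AS s ON s.parent_id = t.parent_id AND s.id != t.id
WHERE t.label = ?
ORDER BY s.label;
"""

# Walk parent links upward from a topic to the root, then list root-first.
ANCESTORS_SQL = """
WITH RECURSIVE lineage(id, parent_id, label, depth) AS (
    SELECT id, parent_id, label, 0 FROM topics WHERE label = ?
    UNION ALL
    SELECT p.id, p.parent_id, p.label, lineage.depth + 1
    FROM topics AS p
    JOIN lineage ON p.id = lineage.parent_id
)
SELECT label FROM lineage ORDER BY depth DESC;
"""


def siblings(conn: sqlite3.Connection, label: str) -> List[str]:
    return [row[0] for row in conn.execute(SIBLINGS_SQL, (label,))]


def ancestors(conn: sqlite3.Connection, label: str) -> List[str]:
    """Path from the root down to `label`, inclusive."""
    return [row[0] for row in conn.execute(ANCESTORS_SQL, (label,))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO topics (id, parent_id, label) VALUES (?, ?, ?)",
        [(1, None, "library"), (2, 1, "power"), (3, 2, "deception"),
         (4, 2, "charisma"), (5, 2, "secrecy"), (6, 1, "science")],
    )
    print(siblings(conn, "deception"))   # → ['charisma', 'secrecy']
    print(ancestors(conn, "deception"))  # → ['library', 'power', 'deception']
```

The partition itself (Leiden or otherwise) is still computed outside the database. SQLite only stores and traverses the result.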

amelius 10 hours ago
Makes me wonder, how well could an LLM-based solution score on the Netflix Prize? https://en.wikipedia.org/wiki/Netflix_Prize

(Are people still trying to improve upon the original winning solution?)

dev_l1x_be 8 hours ago
Claude Code is good for arranging random things into categories. With code, configuration, and documentation files, it barely goes down random rabbit holes or hallucinates categories for me.

threecheese 6 hours ago
Where did you come across Leiden partitioning? I’m facing a similar use case and wonder what you’re reading.

sciences44 10 hours ago
Love the originality here - makes you curious to explore more. Solid technical execution too. Well done!

pharrington 2 hours ago
Please don't give yourself LLM-induced psychosis.

chromanoid 8 hours ago
> A fun tendency is that Claude kept getting distracted by topics of secrecy, conspiracy, and hidden systems - as if the task itself summoned a Foucault’s Pendulum mindset.

I really appreciate you mentioning this. I think this is the nature of LLMs in general. Any symbol it processes can affect its reasoning capabilities.

dangoodmanUT 11 hours ago
The UI animations are so fun

jgalt212 7 hours ago
What did it say about who wrote To Kill a Mockingbird?

wormpilled 12 hours ago
> A fun tendency is that Claude kept getting distracted by topics of secrecy, conspiracy, and hidden systems

Interesting... seems like it wants the keys on your system! ;)

typon 4 hours ago
The website design and content are much nicer than the "ideas" here. It's just standard LLM slop once you've actually read some of these books.

miracoli 9 hours ago
wow I hope the bubble pops soon.. now that you discovered books with AI that was illegally trained on them, how about reading them?

only-one1701 9 hours ago
This is an IQ test lol

jereees 10 hours ago
now do this for research papers! fun stuff :)

mannanj 9 hours ago
Seems like a lot of successful leaders have a history of or normalize deception and lying for some benefit.

joe_the_user 11 hours ago
> A fun tendency is that Claude kept getting distracted by topics of secrecy, conspiracy, and hidden systems - as if the task itself summoned a Foucault’s Pendulum mindset.

It's all fun and games 'til someone loses an eye/mind/even-tenuous-connection-to-reality.

Edit: I'd mention that the themes Claude finds qualify as important stuff, imo. But they're all pretty grim, and it's a bit problematic focusing on them for a long period. Also, they are often the grimmest spin on things that are well known.

napolux 11 hours ago
Monetize it!