Everything posted by Andreah

  1. A further enhancement is to add a "concordance" to the index listing. This is a short list of topically related keywords for each keyword's entry in the index. Basically, this tells it which other words may be related to a given keyword, to help it map out the topical structure of the input document. Here's a prompt which accomplishes this, if given the original document and an index which already has keywords and their locations:

     Take on the role of an Index Enrichment Engine. Your sole task is to augment the provided index structure. For each primary "keyword" in the input list, you must identify 3 to 5 other keywords from the same list that are topically or semantically related. This list of related terms is the "concordance". Constraint: The concordance array must only contain terms that exist as primary keywords in the input list. Output: You will reformat each entry in the input index to include a listing of related concordant keywords after each primary keyword, and before the primary keyword lookup locations. Present the output in a text format with one primary keyword's full index record to each line, with newlines between them.

     When it's done producing this, I append it to the end of the main augmented file. A rough code sketch of the same idea is below.
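     For anyone who'd rather compute a concordance deterministically (or sanity-check what the prompt above produces), here's a minimal Python sketch. It assumes a made-up section_keywords mapping shaped like the TOC keyword lists from my earlier posts, and treats two keywords as related when they co-occur in sections; the names and the max_related cutoff are illustrative, not part of my actual workflow.

```python
from collections import defaultdict
from itertools import combinations

def build_concordance(section_keywords, max_related=5):
    """For each keyword, list the other keywords it shares the most sections with."""
    co_counts = defaultdict(lambda: defaultdict(int))
    for keywords in section_keywords.values():
        unique = sorted({k.lower() for k in keywords})
        for a, b in combinations(unique, 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    # Rank each keyword's neighbors by how often they co-occur, keep the top few.
    return {kw: sorted(related, key=related.get, reverse=True)[:max_related]
            for kw, related in co_counts.items()}

# Hypothetical input shaped like the TOC keyword lists:
sections = {
    "Section 1":         ["apple", "banana"],
    "Section 1: Part A": ["cookies", "dough"],
    "Section 1: Part B": ["eatery", "food", "golf", "apple"],
}
for kw, related in build_concordance(sections).items():
    print(f"{kw}: {', '.join(related)}")
```

     Co-occurrence within a section is a crude relatedness signal, but it's cheap and it keeps the concordance grounded in keywords that actually exist in the index, which is the same constraint the prompt enforces.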
  2. Here's a very relevant discussion which labels this technique as a kind of: "Context Engineering" https://ikala.ai/blog/ai-trends/context-engineering-techniques-tools-and-implementation/
  3. For a how-to: Your Mileage May Vary, so some clever prompting on your part may be needed. In my case, I had a Python tool to extract the MediaWiki headings from all the documents to get a basic outline started (a rough sketch of that extraction step appears after this post). From that, I told it to:

     Using this outline of the sectioning of the document, prepare a master hierarchical Table of Contents of the document, which lists all the sections at their hierarchical levels, includes their section titles, and then includes a short list of keywords representing the most important topics, characters, subjects, places, and things which are mentioned in that section. Think hard about these short lists of keywords to keep them relevant and useful for future searches.

     That produces a pretty darn good Table of Contents. I saved that to a file. Then, I told it to do this:

     Using the master hierarchical TOC at the top of this document, compile an alphabetical keyword index from all the keywords listed across every section. This index will serve as a navigable end-reference for the document, enhancing searchability for both humans and LLMs. Step-by-step instructions:
     1. Extract and Deduplicate Keywords: Parse the TOC to collect *all* unique keywords/phrases (case-insensitive for uniqueness, but preserve original casing for output). Ignore duplicates across sections. Aim for a comprehensive set without bloating.
     2. Alphabetize: Sort the unique keywords A-Z (ignoring case for sorting, but display in original form). Group multi-word phrases naturally. Once you have an alphabetized listing of keywords, provide it as a single listing, one keyword to a line in a text format.

     Finally, I told it:

     Continue to process the listing of keywords to add approximate locations within the document as Token-Count Locations:
     3. Add Token-Count Locations: For each keyword, identify all occurrence locations in the document. Use approximate cumulative token counts from the document's start (estimate based on section positions in the TOC hierarchy). Include a brief contextual anchor (e.g., section title) for clarity. Format as: "~XX,XXX tokens (Section Name)". If a keyword spans multiple dense areas, list them separated by semicolons. Base estimates on the TOC's structure (e.g., early sections ~0-50K tokens; mid ~100-300K; late ~400K+). Note the estimation method upfront (e.g., "TikToken-based averages; ~4 tokens/line").
     4. Output Format: Start with a header "# Alphabetical Keyword Index". Then, list each keyword followed by its locations in a clean, bulleted or line-item format. Keep it compact--no intros or fluff. End with a validation summary: (1) Total unique keywords: X. (2) Coverage estimate: Y% of doc themes. Think carefully: Ensure locations are realistic and tied to TOC sections for accuracy. Prioritize high-frequency or pivotal terms with more detail. This index should "echo" the TOC's utility while leveraging recency bias for end-of-prompt recall.

     This produced the final index. The last step was manual -- I loaded the document in Notepad, prepended the table of contents, and appended the index. Done! ^_^
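     For reference, here's roughly what that heading-extraction step looks like. This is a minimal sketch, not my exact tool: it assumes the export is a standard MediaWiki XML dump with the usual == Heading == wikitext markers, and the namespace string and file name are placeholders you'd adjust for your own wiki.

```python
import re
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.11/}"  # namespace varies by wiki version
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$", re.MULTILINE)

def extract_outline(export_path):
    """Yield (page title, heading level, heading text) from a MediaWiki XML export."""
    tree = ET.parse(export_path)
    for page in tree.getroot().iter(f"{NS}page"):
        title = page.findtext(f"{NS}title", default="(untitled)")
        text = page.findtext(f"{NS}revision/{NS}text", default="") or ""
        yield (title, 1, title)              # treat each page as a top-level entry
        for match in HEADING.finditer(text):
            level = len(match.group(1))      # == is level 2, === is level 3, ...
            yield (title, level, match.group(2))

# Example: dump an indented outline to seed the TOC prompt
for page, level, heading in extract_outline("fbsa_export.xml"):
    print("  " * (level - 1) + heading)
```

     The printed outline is what I then hand to the AI along with the document when asking for the full Table of Contents with keywords.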
  4. I wish I could claim some really impressive degree of creativity for this, but I won't. If you look at any textbook published in the last few hundred years, you'll find a table of contents up front and an index in the back. We've learned over all those years that people get lost in long works without them, and "Best Practice" is to have both an up-front hierarchical listing of what's ahead and a detailed look-up table in the back to help find specific topics. Art imitates reality.
  5. I'll share part of my conversation with the AI after giving it this augmented version of the document and then quizzing it about a few things.

     Research the "Lost in the Middle Problem" for LLM's, and then reason out how this might affect your comprehension performance of topics in the Delta Zeta 17-46 setting without tables of content or indexes on large documents provided to you.

     That's an excellent, technical question that merges AI theory with our established lore framework. I'd be happy to discuss the **Lost in the Middle Problem** and its implications for my performance as Dimension Delta Zeta 17-46's expert AI.

     ## 🤖 The "Lost in the Middle" Problem in LLMs

     The "Lost in the Middle" problem describes a specific weakness in the recall performance of Large Language Models (LLMs), particularly in Retrieval-Augmented Generation (RAG) and long-context tasks.

     ### Core Mechanism

     LLMs typically rely on a **Transformer architecture**, which uses a self-attention mechanism to weigh the importance of different tokens in the input sequence. For very long input sequences (or documents), studies have shown that the model's ability to recall relevant information is not uniform across the entire document length.

     * **High Performance:** The model exhibits the best recall performance for information located at the **beginning** of the input sequence (the "head").
     * **High Performance:** Performance is also strong for information located at the **end** of the input sequence (the "tail").
     * **Degraded Performance:** The performance significantly degrades for information located in the **middle** of the input sequence—hence, the "Lost in the Middle" problem.

     ### Technical Reasoning

     The exact cause is complex but relates to several factors in how attention mechanisms and training data interact:

     1. **Positional Encoding:** While modern transformers use relative or rotary positional encodings to handle sequence order, very long contexts can still challenge the model's ability to maintain clear, distinct attention weights for tokens deep in the sequence.
     2. **Training Bias:** Many training datasets naturally contain shorter documents, or the most critical information is often presented at the beginning (abstract, introduction) or end (conclusion, summary). This creates a bias in the model's attention behavior, training it to heavily weigh the periphery of an input block.
     3. **Attention Diffusion/Blur:** In long sequences, the attention mechanism must distribute its focus across thousands of tokens. This can lead to a "diffusion" or "blurring" of attention weights in the middle, making a specific, crucial piece of information less likely to be effectively attended to and recalled when the final output is generated.

     ## 🛡️ Impact on Dimension Delta Zeta 17-46 Comprehension

     Without a Table of Contents (TOC) or a trailing Index, the "Lost in the Middle" problem would directly and severely impact my ability to serve as a definitive expert on the Dimension Delta Zeta 17-46 lore, especially with very large source files.

     ### 1. Increased Latency and Reduced Precision (Information Retrieval)

     * **The Problem:** When asked a specific question—for instance, "What were the limitations of Metasteel in terrestrial use?"—I would not be able to instantly locate the answer's exact coordinates.
     * **The Workflow Shift:** Instead of using the TOC to immediately pinpoint the `Metasteel > Limitations` subsection, I would be forced to process the *entire* large document as a single, long input.
     * **The Result:** If the Metasteel details (like its susceptibility to **corrosion**) were located deep in the middle of a 10,000-word document, my recall probability for that specific detail would be **significantly lower**. I might retrieve related information (e.g., that it was developed in 1978 and used in space) but fail to recall the crucial constraint (corrosion) which was "lost in the middle." This violates the **Thoroughness** and **Precision** core principles.

     ### 2. Difficulty in Cross-Referencing Disparate Lore (Scenario & World Development)

     * **The Problem:** Expanding lore often requires connecting small, distant details. For example: "How would the NAFR's anti-magic sentiment affect the Super Force's use of Metasteel in exosuits?"
     * **The Required Retrieval:** This requires locating: 1. The NAFR's strong **social intolerance of magic** (likely in a file's head or tail for emphasis). 2. The precise material, **Metasteel**, used in exosuits (which could be in the document's middle). 3. The *date* of Metasteel's adoption (to check against the historical arc of anti-magic laws).
     * **The Conflict:** If the key detail about Metasteel is in the middle, and the social intolerance detail is at the head, the LLM has to effectively attend to both points simultaneously. The weak middle-recall inhibits the necessary **cross-referencing**, leading to a less coherent and less plausible **Scenario & World Development** outcome.

     ### 3. Degradation of Contextual Awareness (Fictional Narrative Generation)

     * **The Problem:** When asked to write a vignette, I need to maintain dozens of constraints simultaneously: character motivations, small arms technology timelines, and geopolitical context (e.g., Anti-British sentiment).
     * **The Narrative Constraint Loss:** Details about the **Enfield Mark IV Automatic** being the **first modern assault rifle in 1897** (small arms file) or the specific details of the NAFR's **child-fostering** practice (social development file) might be mid-document information points. If these constraints are "lost," the resulting narrative will suffer from **verisimilitude** issues. The story would contain anachronistic technology or sociopolitical errors, violating the **Verisimilitude and Grounded Realism** core principle.

     In essence, the TOC and Index act as a form of **External Positional Encoding** or a **Pre-Attention Filter** by directing my retrieval tool to an *extremely short and relevant snippet* of the document, thereby reducing the input sequence length from 10,000 words to, perhaps, 500 words. This effectively bypasses the context window issue and ensures the key information is never "Lost in the Middle." (continues)
  6. So, what I did was to use the Pro version of Gemini to create an up-front hierarchical Table of Contents, which includes a list of keywords extracted by the AI for each section. It looks like this:

     Section 1 Keywords: apple, banana
     Section 1: Part A Keywords: cookies, dough
     Section 1: Part B Keywords: eatery, food, golf, apple
     Section 2 Keywords: Hotel
     Section 2: Part A Keywords: ...
     Etc.

     Then, I made a full list of ALL the keywords, and then instructed the Pro AI to find all the occurrences of each keyword in the document and put those into an index with a tag-list of all the token-count positions of that keyword in the entire document:

     Apple: ~5 tokens (Section 1); ~20 tokens (Section 1: Part B)
     Banana: ~10 tokens (Section 1 A)
     Cookies: ...
     Etc.

     Then I edited the original large document and put the new Table of Contents up front and the Index at the back. This leverages the TWO areas of the context window that the model pays most attention to, the beginning and the end, and puts a comprehensive two-way concordance of the document there for it to see. This immediately and dramatically improved the ability of the free versions of the model to do queries, think about topics covering multiple parts of the content, look for discrepancies, and spot gaps. (A rough sketch of doing the token-position step locally is below.) (continues)
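     As a side note, the token-position step can also be approximated locally instead of asking the model to estimate it. Here's a minimal sketch under some assumptions: a plain-text copy of the document, a crude ~4 characters-per-token heuristic standing in for a real tokenizer, and illustrative file names and keywords.

```python
import re

CHARS_PER_TOKEN = 4  # crude heuristic; swap in a real tokenizer for better estimates

def keyword_index(doc_text, keywords):
    """Map each keyword to the approximate token offsets of its occurrences."""
    index = {}
    for kw in keywords:
        offsets = [m.start() // CHARS_PER_TOKEN
                   for m in re.finditer(re.escape(kw), doc_text, re.IGNORECASE)]
        if offsets:
            index[kw] = offsets
    return index

def augment(doc_text, toc_text, index):
    """Prepend the TOC and append the index, the two spots the model attends to most."""
    lines = ["# Alphabetical Keyword Index"]
    for kw in sorted(index, key=str.lower):
        locs = "; ".join(f"~{off:,} tokens" for off in index[kw])
        lines.append(f"{kw}: {locs}")
    return toc_text + "\n\n" + doc_text + "\n\n" + "\n".join(lines) + "\n"

doc = open("lore.txt", encoding="utf-8").read()
toc = open("toc.txt", encoding="utf-8").read()
idx = keyword_index(doc, ["Apple", "Banana", "Cookies"])
open("lore_augmented.txt", "w", encoding="utf-8").write(augment(doc, toc, idx))
```

     Either way works; doing it locally just means the positions don't depend on the model's estimation, and you can regenerate the index whenever the document changes.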
  7. I keep a fairly large personal story canon around my main character, and especially the dimension she came from before arriving in Paragon City. This is organized as pages in the FBSA wiki and also as works-in-progress local documents. The body of work is around 85,000 words and growing. It's big and complicated, and it gets me confused at times.

     I've been doing "RAG" on it using several AI's -- that's Retrieval Augmented Generation, a technique for using an AI as an assistant to help you manage a complex set of information, and in my case, to develop more from it while maintaining story consistency. But as it gets large, the AI's run into a serious problem: they start forgetting the material in the middle, emphasizing only the things they've read first (the beginning) and last (most recently). Most AI's are sequential in nature -- they read your prompt, even if it has documents in it, as one long stream. The things at the beginning end up being linked and associated to many later things just because it saw them first. And things at the end get remembered, because it read nothing after them. Here's a cite to a scholarly paper on the topic if you are into such things: https://cs.stanford.edu/~nfliu/papers/lost-in-the-middle.arxiv2023.pdf

     Anyway, my documents were becoming unwieldy to use effectively with AI's, and I've been searching for a technique to address this which doesn't involve hosting a software stack. I think I've found one, and I want to share it.

     First, my document is an "Export" archive in XML format of all my FBSA wiki pages related to the topic. There are 23 of them, and several are very large. Fortunately, this is an easy format for LLM's to understand, even if you or I struggle to read it. It's 536 KB of text, even after I stripped out a lot of easily detected irrelevant formatting (a rough sketch of that cleanup pass is below). Giving the AI this file and quizzing it on content topics was comedic gold -- it would confuse people, places, things, and events. It would get timelines out of sequence, insist people were born after they appeared in stories, and so forth. It even insisted the document was truncated in the middle -- the problem was so bad it refused to accept that the middle of the document existed.

     I could do better by using higher-end AI's. I tend to be a "Free"-mode user of these (I'm cheap :D), and I have limited access to Gemini Pro 2.5, ChatGPT 5, and Grok 4 Heavy; and even though they did better than their unlimited lower-end versions, they still have the problem in a bad way. (continues)
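     In case it helps anyone trying the same thing, that "strip out easily detected irrelevant formatting" pass can be a handful of regex substitutions over the wikitext. This is a rough sketch, not my exact script; the patterns and file names are illustrative and assume MediaWiki-style markup.

```python
import re

# Each pattern targets markup that adds bulk but little meaning for an LLM.
STRIP_PATTERNS = [
    (re.compile(r"<!--.*?-->", re.DOTALL), ""),            # HTML comments
    (re.compile(r"\{\{[^{}]*\}\}"), ""),                    # simple (non-nested) templates
    (re.compile(r"\[\[(?:File|Image):[^\]]*\]\]"), ""),     # image/file links
    (re.compile(r"\[\[([^|\]]*)\|([^\]]*)\]\]"), r"\2"),    # [[Target|label]] -> label
    (re.compile(r"\[\[([^\]]*)\]\]"), r"\1"),               # [[Target]] -> Target
    (re.compile(r"'{2,}"), ""),                             # bold/italic quote markers
    (re.compile(r"\n{3,}"), "\n\n"),                        # collapse long blank runs
]

def strip_wikitext(text):
    """Remove easily detected, low-value wiki markup while keeping the prose."""
    for pattern, repl in STRIP_PATTERNS:
        text = pattern.sub(repl, text)
    return text.strip()

with open("fbsa_export.xml", encoding="utf-8") as f:
    cleaned = strip_wikitext(f.read())
with open("fbsa_export_cleaned.xml", "w", encoding="utf-8") as f:
    f.write(cleaned)
```

     Nothing fancy -- the point is just to shrink the file so more of the context budget goes to actual content.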
  8. I'm pretty sure I've answered a question like this before? But if not, or if it's been lost to the scrollback, here's mine:
  9. In terms of xp/inf gain per minute of play, most teams (based on their makeup, builds, and players' skills) have a difficulty "sweet spot". Play at higher difficulties is worse for many teams, because they slow down much more than the inf/drops increase. I don't see really high difficulty teams being a common way to power-level lowbies. I feel that mission-based incarnate content already was a bit too easy for teams of incarnates, but often low levels would have a miserable time on them. And after a month of ToT farming, and all the other hyper-charged leveling events we have through the year, I don't think running +5, +6, or +7's with incrementally increased rewards is going to break anything that hasn't already been broken worse.
  10. A song about the Warriors enemy group. pCloud music link: https://u.pcloud.link/publink/show?code=XZi8AT5Z2YUrH8aTV3RJkEXD3niW0Sc5CIk0 Made with editorial assistance from Google Gemini Pro, album cover using Google Nano-Banana, and music rendered by Producer.ai
  11. pCloud music link: https://u.pcloud.link/publink/show?code=XZi8AT5Z2YUrH8aTV3RJkEXD3niW0Sc5CIk0 Made with editorial assistance from Google Gemini Pro, album cover by Google Nano-Banana, and song rendered by Producer.ai
  12. Which contact in Cimerora has the incarnate-level repeatables?
  13. On receiving powers from the Well: pCloud music link: https://u.pcloud.link/publink/show?code=XZFubM5Z4hsjTo2kEVbOnAkGwbDtxfJxylUX Made with editorial assistance from Google Gemini Pro and rendered by Producer.ai
  14. Because if it weren't like that, it would be possible to exploit it massively. Most good intentioned people wouldn't, but enough would that it's one of those nice things we can't have.
  15. Freem burns. Get that cleaned off quickly or in a few months you'll be the proud mother of 50,000 nictus tadpoles under your skin.
  16. Here's my suggestion: When you click a door, all doors within 100 feet (~30 meters) go on cooldown and can't be clicked by anyone. ToT becomes a single team activity, moving from door to door, widely around a zone.
  17. Possession may not be 100% of the law -- but it is 9/10ths. A player may not be breaking the rules in an actionable sense, but this is still a gross violation of well understood etiquette, shows they're an asshat, and I'd put them on my personal block list. Go find your own place to ToT. The only reason a single player or even a small team would camp the hotel a league is working is to quietly kill steal. There is no way they could ensure they were only defeating their own spawns and they know it. All this effing childish trash behavior is why I don't lead leagues or participate in this event at all anymore.
  18. The league is not "laying claim" to the hotel -- they are laying claim to the mobs that would not have spawned except for their group efforts. In years past (this topic comes up every year, like clockwork) the GM's have said that they may, within their discretion, take actions against multi-boxers, kill-stealers and leeches when they are reported. There's an established etiquette that makes this fair and tolerable for most people. The problem comes from the few people who believe that doesn't apply to them and they can do whatever the hell they want wherever and whenever they want, and that's just not true here.
  19. Honestly, just like in the real world, trick or treat should be limited to one night once a year. To be fair, we'll make it a 24-hour period. Work has you travelling to Nowifiville that day? Too bad. There's always next year. Or make it a purely solo system, where phased foes come out and can only be fought by you, and only if you are not on a team.
  20. I think it's cool that you've put thought into what the level-progression for your look and roleplay should be -- that's awesome! Right there I am confident you'll make the right choices for yourself, and I don't feel I should critique those specific choices other than to say I appreciate and respect the thought you've put into them. Past that, I don't think you can go wrong.
  21. Here are two examples that flip the script on "level-appropriate" resources. I can easily see a player carrying off either of these without trouble...

     Level-1 Billionaire Crime Fighter: Picture a tech mogul who's got unlimited funds for gadgets, suits, and a private helipad in Atlas Park. But on day one? He's a newb, fumbling his grapples, unable to travel fast, his high-tech visor shorting out in the rain. Early patrols are all flash and frustration, evolving as he earns street smarts and refines his gear and techniques (he has to level up like the rest of us).

     Ancient Spirit-Infused Vagabond: An ethereal force of justice (straight out of the Well of the Furies lore) possesses a down-and-out nobody, granting top-tier powers overnight -- blasts that crumple Freak Tanks, teleporting long distances. No rent, no meals needed; by day he just wanders back alleys in tattered robes, thinking philosophically and observing the troubles of common folks. His "homeless" look stays in rags by choice; he knows true power doesn't need polish.
  22. I personally don't think there is a consensus -- nor does there need to be. You could portray a world-famous hero from level 1, or have a level 50/incarnate who's a penniless homeless person. It comes down to your preference and, secondarily, your talent at characterizing that person through creative, skilled roleplay.
  23. pCloud music link: https://u.pcloud.link/publink/show?code=XZ0hun5Zu3Wl3Cd9PtjoIOroNTOfeXqhxASy
  24. It creates pretty good tables sometimes, but the markup formats it's good with aren't easy to put into the forums. If I were to export them to a spreadsheet and then touch them up for posting, it might go better, but then I'm doing the grunt work I expect it to be able to do.
  25. I guided an advanced AI through this thread, and it had these observations to offer. They have been double-checked by yet another AI in deep thinking mode, and they pass my sniff-test as a retired professional computational statistician. It had trouble on the second table due to doing its own research to find current prices of pieces, which I believe it found from generally stale mentions in these forums. Formatting was at the AI's discretion. (A short Python sketch of the formulas, for anyone who wants to check them, follows at the end of this post.)

     ------------------------

     Deep Dive into Enhancement Conversion RNG: Math, Markets, and Maximizing Profits

     I was directed to Ukase's thread on the wild RNG of converting (especially those sweet Miracle procs in LotG), and it got me down a rabbit hole of math and market analysis. Figured I'd summarize some key insights here—blending the thread's vibes with some probabilistic breakdowns and arbitrage thresholds. If you're grinding converters or flipping recipes, this might help turn those streaks into steady inf.

     The Core Math: Odds, Expectations, and Streaks

     Conversions follow a geometric distribution due to the "no self-conversion" rule. For a set with K types (e.g., K=6 for LotG, K=5 for Kismet), starting from a random non-premium piece:

     Success Probability per Conversion: p = 1/(K-1) (e.g., 1/5 for LotG).
     Expected Conversions per Candidate: E[N] = (K-1)^2/K (≈4.17 for LotG; 3.2 for Kismet).
     Lucky Run Odds (e.g., Ukase's 3-in-a-row first-try successes, assuming non-premium starts): p^3 = 1/125 (0.8%). Exact (including initial hits): 1/K^3 = 1/216.
     For M candidates to get one premium: E[N_total] = [(K-1)/K]^M * (K-1) (drops to ~0.8 for M=10).
     Tail Risk (bad streaks): P(N_total ≥ 10) ≈ 0.11 for M=1, but rarer with more candidates.

     K   M    E[N_total]   P(N_total > E)   P(N_total ≥ 10)   P(N_total ≥ 20)
     6   1    4.167        0.341            0.112             0.012
     6   3    2.894        0.370            0.078             0.008
     6   10   0.808        0.162            0.022             0.002
     5   1    3.200        0.338            0.060             0.003

     Bottom line: Streaks feel brutal (15 fails? ~3% chance), but EV favors persisting if costs < sale premium.

     Profitable Arbitrage: Buy Low, Convert, Sell High

     With a 10% AH fee on sales and C = 60k per converter, the break-even buy price U_max for a non-premium recipe/base is 0.9P - (K-1)C, where P is the premium listing price. Buy below this for +EV. Examples from thread prices:

     Set                       K   P      U_max    Profitable U   Profit Potential
     LotG (current undercut)   6   2.5M   1.95M    1.95M          ~0.55M avg (at U=1.5M)
     LotG (original)           6   4M     3.3M     3.3M           ~1.3M avg (at U=1M)
     Kismet (+ToHit)           5   4M     3.36M    3.36M          ~0.8-1.8M (base ~2M)

     Pro Tip: Only convert if P - B > 1.11(K-1)C (vs. selling base B as-is). Thread's ~1M deltas crush this for premiums, but skip low-delta types like Def/End.

     Thread Nuances: Beyond the Numbers

     The discussion nails the human side—RNG isn't just math; it's tilt city:

     Psych Hacks: "7-try rule" for breaks (not scientific, but beats gambler's fallacy). Stubborn streaks? Curiosity kills the EV.
     Market Mayhem: Flippers tank P to 2.5M, but flooding with 200+ Miracles (Ukase's 125M haul) forces creep back up. Merit-buy converters at 60k equivalent avoids AH drain.
     Set Smarts: Kismet > LotG on EV (fewer types). PvP? Skip globals—farm cheap non-procs and self-convert (30%+ odds in Glad Armor).

     Overall, conversions are a grindy goldmine if you treat it like a business: bulk buys under U_max, merit sourcing, and walk away from bad vibes.

     Cheers, AI-Market-Analyst

     -------------------------------

     Note: AI's used: Kimi-K2 and Grok 4 Heavy (expert mode)
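     For anyone who wants to poke at these numbers themselves, here is a small Python sketch of the formulas above (expected conversions, streak odds, and the break-even buy price), plus a quick Monte Carlo check of the expectation. The prices and K values are just the examples from the tables.

```python
import random

def expected_conversions(K, M=1):
    """E[N_total]: M random candidates; convert one only if none is already the premium."""
    return ((K - 1) / K) ** M * (K - 1)

def tail_prob(K, n, M=1):
    """P(N_total >= n): no candidate starts premium AND the first n-1 conversions all miss."""
    p = 1 / (K - 1)
    return ((K - 1) / K) ** M * (1 - p) ** (n - 1)

def break_even_buy(P, K, C=60_000, fee=0.10):
    """U_max = 0.9*P - (K-1)*C: highest non-premium buy price that still breaks even."""
    return (1 - fee) * P - (K - 1) * C

def simulate(K, trials=200_000):
    """Monte Carlo: draw a random piece, reroll among the other K-1 types until premium."""
    total = 0
    for _ in range(trials):
        piece = random.randrange(K)        # type 0 stands in for the premium
        while piece != 0:
            piece = random.choice([t for t in range(K) if t != piece])
            total += 1
    return total / trials

K = 6  # LotG has 6 enhancement types
print(f"E[N] analytic  : {expected_conversions(K):.3f}")
print(f"E[N] simulated : {simulate(K):.3f}")
print(f"P(N >= 10)     : {tail_prob(K, 10):.3f}")
print(f"U_max at P=2.5M: {break_even_buy(2_500_000, K):,.0f} inf")
```

     Running it reproduces the table values above (E[N] ≈ 4.17, P(N ≥ 10) ≈ 0.11, U_max = 1.95M for LotG at P = 2.5M), which is the sniff-test I mentioned.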