From 58e862b55eb97ee017d2b0f9dec8b20e73cf52bb Mon Sep 17 00:00:00 2001
From: Alphonso Escalante
Date: Sun, 9 Feb 2025 17:13:50 +0100
Subject: [PATCH] Add Simon Willison's Weblog
---
 Simon-Willison%27s-Weblog.md | 42 ++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
 create mode 100644 Simon-Willison%27s-Weblog.md

diff --git a/Simon-Willison%27s-Weblog.md b/Simon-Willison%27s-Weblog.md
new file mode 100644
index 0000000..f42a3d4
--- /dev/null
+++ b/Simon-Willison%27s-Weblog.md
@@ -0,0 +1,42 @@
That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
+
There's a whole lot of stuff in the new release.
+
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
+
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
+
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two are something I can easily play with myself. That's where the new distilled models come in.
+
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
+
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
+
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
+
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those over.)
+
Licensing aside, these distilled models are fascinating beasts.
+
Running DeepSeek-R1-Distill-Llama-8B-GGUF
+
Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF, released by Unsloth AI, and it's really fun to play with.
+
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
+
First I fetched and ran the model using Ollama itself:
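Something like this should work; Ollama can fetch GGUF models directly from Hugging Face via hf.co/ paths, and the Q8_0 quantization tag is my assumption for an ~8.5GB download of an 8B model:

```bash
# Pull the GGUF straight from the Unsloth Hugging Face repo and start chatting
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```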
+
This downloads around 8.5GB of model data and starts an interactive chat interface.
+
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
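As an aside, those logs can be explored later with LLM's own subcommands; a quick sketch:

```bash
llm logs path   # print the location of the SQLite database holding the logs
llm logs -n 1   # show the most recent logged prompt and response
```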
+
I installed the plugin and ran it like this:
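A sketch of those steps, reusing the hf.co path from the Ollama fetch above:

```bash
# Install the plugin that lets LLM talk to Ollama-hosted models
llm install llm-ollama
# Run a prompt against the fetched model
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```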
+
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
+
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
+
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
+
[...] +
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:
+
The Joke:
+
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
+
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
+
Explanation:
+
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
+
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
+
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
+
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this (a 34GB download):
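A sketch of the commands that could produce that setup, assuming Unsloth's 70B GGUF at the Q3_K_M quantization (a roughly 34GB file; both the repo path and tag are assumptions patterned on the 8B steps above):

```bash
# Fetch the quantized 70B distill, then prompt it through LLM
ollama pull hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'a joke about a pelican and a walrus who run a tea room together'
```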
+
Can it draw a pelican?
+
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:
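For reference, that prompt as an LLM invocation; the model identifier is an assumption (the post doesn't say which distilled model was used here, so this reuses the 70B distill pulled above):

```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```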
+
It looked to me like it got the order of the elements wrong, so I followed up with:
+
the background ended up covering the rest of the image
+
It thought some more and gave me this:
+
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
+
Other ways to try DeepSeek-R1
+
If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
+
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
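A sketch of that YAML entry, following LLM's documented format for extra OpenAI-compatible models (treat the field values as assumptions to verify against DeepSeek's API docs):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```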
+
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
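Spelled out as shell commands:

```bash
llm keys set deepseek
# paste your DeepSeek API key when prompted, then:
llm -m deepseek-reasoner 'Your prompt here'
```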
+
This won't show you the reasoning tokens, sadly. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.
\ No newline at end of file