17] Top ML Papers of the Week

(discuss.pytorch.kr)

4 points by ninebow 2024-03-19 | 6 comments

[2024/03/11 ~ 03/17] Top ML Papers of the Week

This post is a machine translation of the ML paper roundup that DAIR.AI publishes every week.
This week, research on large language models (LLMs) was the dominant trend, with several papers using LLMs to solve or understand a range of problems. For example, "SIMA", "Retrieval Augmented Thoughts", "LMs Can Teach Themselves to Think Before Speaking", "Knowledge Conflicts for LLMs", and "LLMs Predict Neuroscience Results" all use LLMs or address issues related to their performance. In addition, "Stealing Part of a Production Language Model" shows that security research on language models is under way as well.
This trend reflects the change LLMs have brought to the AI research community over the past few years. LLMs have not only established a foothold in natural language processing (NLP) but have also positioned themselves as capable foundation models across many domains. They perform strongly on a wide range of language understanding and generation tasks and are being explored extensively in applied research. Furthermore, "Multimodal LLM Pre-training" reflects the recent trend of combining LLMs with other data modalities, such as images and audio, to strengthen multimodal learning.
Based on this analysis, research on LLMs can be expected to keep advancing natural language understanding, expand into varied new applications, and play an important role in the development of AI technology. Beyond improving LLM performance, applied research, security, and ethical issues are also expected to be explored more broadly.

SIMA

Paper introduction

A generalist AI agent that follows natural-language instructions in a broad range of 3D virtual environments and video games. SIMA is evaluated across 600 basic skills spanning navigation, object interaction, and menu use. Language seems to be a huge factor in performance.

Abstract

Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructions across a diverse range of virtual 3D environments, including curated research environments as well as open-ended, commercial video games. Our goal is to develop an instructable agent that can accomplish anything a human can do in any simulated 3D environment. Our approach focuses on language-driven generality while imposing minimal assumptions. Our agents interact with environments in real-time using a generic, human-like interface: the inputs are image observations and language instructions and the outputs are keyboard-and-mouse actions. This general approach is challenging, but it allows agents to ground language across many visually complex and semantically rich environments while also allowing us to readily run agents in new environments. In this paper we describe our motivation and goal, the initial progress we have made, and promising preliminary results on several diverse research environments and a variety of commercial video games.
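The interface described in the abstract (image observations and language instructions in, keyboard-and-mouse actions out) can be pictured with a small type sketch. Everything below is hypothetical: the class names, the field names, and the trivial stand-in policy are made up for illustration and are not taken from the SIMA paper or any released code.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Observation:
    """What the agent sees at one time step (hypothetical layout)."""
    pixels: list      # one image frame, e.g. rows of RGB tuples
    instruction: str  # free-form language instruction


@dataclass
class Action:
    """Keyboard-and-mouse output for one time step (hypothetical layout)."""
    keys: list = field(default_factory=list)  # keys held down this step
    mouse_dx: int = 0
    mouse_dy: int = 0
    click: bool = False


class Agent(Protocol):
    """Any policy that maps an observation to an action."""
    def act(self, obs: Observation) -> Action: ...


class KeywordAgent:
    """Toy stand-in policy: presses 'a' when told to go left, else idles."""
    def act(self, obs: Observation) -> Action:
        if "left" in obs.instruction.lower():
            return Action(keys=["a"])
        return Action()


action = KeywordAgent().act(Observation(pixels=[], instruction="Turn left"))
```

Because the interface carries nothing game-specific, the same `act` signature would apply to any environment that can supply frames and accept key and mouse events, which is the generality the abstract emphasizes.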

Paper link

https://storage.googleapis.com/deepmind-media/DeepMind.com/…

Read more

https://discuss.pytorch.kr/t/gn-google-sima-3d-ai/3764

https://x.com/GoogleDeepMind/status/1767918515585994818

RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation

Paper introduction

Shows that iteratively revising a chain of thoughts with information retrieval can significantly improve LLM reasoning and generation in long-horizon generation tasks. The key idea is that each thought step is revised with retrieved information relevant to the task query and to the current and past thought steps. Retrieval-augmented thoughts (RAT) can be applied to different models such as GPT-4 and CodeLLaMA-7b to improve long-horizon generation tasks (e.g., creative writing and embodied task planning). RAT is a zero-shot prompting approach and provides significant improvements over baselines that include zero-shot CoT prompting and vanilla RAG.

Abstract

We explore how iterative revising a chain of thoughts with the help of information retrieval significantly improves large language models' reasoning and generation ability in long-horizon generation tasks, while hugely mitigating hallucination. In particular, the proposed method -- retrieval-augmented thoughts (RAT) -- revises each thought step one by one with retrieved information relevant to the task query, the current and the past thought steps, after the initial zero-shot CoT is generated. Applying RAT to GPT-3.5, GPT-4, and CodeLLaMA-7b substantially improves their performances on various long-horizon generation tasks; on average of relatively increasing rating scores by 13.63% on code generation, 16.96% on mathematical reasoning, 19.2% on creative writing, and 42.78% on embodied task planning. The demo page can be found at https://craftjarvis.github.io/RAT
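The revision loop described in the abstract can be sketched in a few lines. This is only an outline under assumptions: `retrieve` and `revise` are hypothetical callables standing in for a retriever and an LLM call, the way the query is built (joining the task, the already-revised steps, and the current step) is a guess at one reasonable reading, and the toy stubs exist only so the loop runs.

```python
def rat(task, draft_steps, retrieve, revise):
    """Revise an initial chain of thought one step at a time.

    draft_steps: thought steps from an initial zero-shot CoT draft.
    retrieve(query) -> list of text snippets (hypothetical retriever).
    revise(task, past, step, evidence) -> revised step (hypothetical LLM call).
    """
    revised = []
    for step in draft_steps:
        # Query combines the task, past (already revised) steps, and the current step.
        query = "\n".join([task, *revised, step])
        evidence = retrieve(query)
        revised.append(revise(task, revised, step, evidence))
    return revised


# Toy stubs so the sketch runs without a model or a search index.
notes = {"boil": "water boils at 100 C at sea level"}

def toy_retrieve(query):
    return [fact for key, fact in notes.items() if key in query]

def toy_revise(task, past, step, evidence):
    return step + (" [" + "; ".join(evidence) + "]" if evidence else "")

steps = rat("Make tea", ["boil water", "steep leaves"], toy_retrieve, toy_revise)
```

Note that later steps see the revised versions of earlier ones, so a correction made early can influence what is retrieved afterwards.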

Paper link

https://arxiv.org/abs/2403.05313

Read more

https://x.com/omarsar0/status/1767251740443746435

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper introduction

Presents a generalization of STaR, called Quiet-STaR, that enables language models (LMs) to learn to reason in more general and scalable ways. Quiet-STaR enables LMs to generate rationales at each token to explain future text. It proposes a token-wise parallel sampling algorithm that helps improve LM predictions by efficiently generating internal thoughts, and the rationale generation is improved using REINFORCE.

Abstract

When writing and talking, people sometimes pause to think. Although reasoning-focused works have often framed reasoning as a method of answering questions or completing agentic tasks, reasoning is implicit in almost all written text. For example, this applies to the steps not stated between the lines of a proof or to the theory of mind underlying a conversation. In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting -- ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions. We address key challenges, including 1) the computational cost of generating continuations, 2) the fact that the LM does not initially know how to generate or use internal thoughts, and 3) the need to predict beyond individual next tokens. To resolve these, we propose a tokenwise parallel sampling algorithm, using learnable tokens indicating a thought's start and end, and an extended teacher-forcing technique. Encouragingly, generated rationales disproportionately help model difficult-to-predict tokens and improve the LM's ability to directly answer difficult questions. In particular, after continued pretraining of an LM on a corpus of internet text with Quiet-STaR, we find zero-shot improvements on GSM8K (5.9%$\rightarrow$10.9%) and CommonsenseQA (36.3%$\rightarrow$47.2%) and observe a perplexity improvement of difficult tokens in natural text. Crucially, these improvements require no fine-tuning on these tasks. Quiet-STaR marks a step towards LMs that can learn to reason in a more general and scalable way.

Paper link

https://arxiv.org/abs/2403.09629

Read more

https://x.com/omarsar0/status/1768681638009975088

Knowledge Conflicts for LLMs: A Survey

Paper introduction

An overview of the common issue of knowledge conflict when working with LLMs. The survey categorizes these conflicts into context-memory, inter-context, and intra-memory conflict, and provides insights into their causes and potential ways to mitigate them.

Abstract

This survey provides an in-depth analysis of knowledge conflicts for large language models (LLMs), highlighting the complex challenges they encounter when blending contextual and parametric knowledge. Our focus is on three categories of knowledge conflicts: context-memory, inter-context, and intra-memory conflict. These conflicts can significantly impact the trustworthiness and performance of LLMs, especially in real-world applications where noise and misinformation are common. By categorizing these conflicts, exploring the causes, examining the behaviors of LLMs under such conflicts, and reviewing available solutions, this survey aims to shed light on strategies for improving the robustness of LLMs, thereby serving as a valuable resource for advancing research in this evolving area.

Paper link

https://arxiv.org/abs/2403.08319

Read more

https://x.com/omarsar0/status/1768288774532858003

Stealing Part of a Production Language Model

Paper introduction

Presents the first model-stealing attack that extracts information from production language models such as ChatGPT or PaLM-2. It shows that the embedding projection layer of a transformer-based model can be recovered through typical API access. As an example, the entire projection matrix of OpenAI's Ada and Babbage models was extracted for under $20.

Abstract

We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under $20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Babbage language models. We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix. We conclude with potential defenses and mitigations, and discuss the implications of possible future work that could extend our attack.
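The abstract does not spell out the mechanism, so the following is only a toy sketch of one reason a hidden dimension can leak through an API. If every logit vector is a linear image of a hidden state (logits = W h), then any collection of logit vectors spans at most as many dimensions as the hidden size. The matrix `W`, the hidden states, and the helper names are all made up for illustration, and the exact rational rank used here stands in for whatever numerical procedure is applied to real, noisy API outputs.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a pivot row at or below the current rank position.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every row below the pivot.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def logits(W, h):
    """Final projection: one logit per vocabulary entry."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


# Hypothetical "secret" projection: vocabulary of 5 tokens, hidden size 2.
W = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]

# Hidden states produced by four different prompts (also made up).
hidden_states = [[1, 2], [3, 1], [0, 1], [2, 2]]

# The attacker only sees the logit vectors, never W or the hidden states.
observed = [logits(W, h) for h in hidden_states]
estimated_hidden_dim = matrix_rank(observed)
```

Four observed vectors of length 5 could in principle have rank 4, yet the rank comes out as 2, which is the hidden size of the toy model.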

Paper link

https://arxiv.org/abs/2403.06634

Read more

https://x.com/omarsar0/status/1767641831079067694

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Paper introduction

Proposes mixing expert LLMs into a mixture-of-experts LLM as a more compute-efficient approach to training LLMs. It is shown to be more efficient than training a larger generalist LLM or several separate specialized LLMs. The approach, BTX, first trains (in parallel) multiple copies of a seed LLM specialized in different domains (i.e., expert LLMs), merges them into a single LLM using MoE feed-forward layers, and then fine-tunes the overall unified model.

Abstract

We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge. Our method, named Branch-Train-MiX (BTX), starts from a seed model, which is branched to train experts in embarrassingly parallel fashion with high throughput and reduced communication cost. After individual experts are asynchronously trained, BTX brings together their feedforward parameters as experts in Mixture-of-Expert (MoE) layers and averages the remaining parameters, followed by an MoE-finetuning stage to learn token-level routing. BTX generalizes two special cases, the Branch-Train-Merge method, which does not have the MoE finetuning stage to learn routing, and sparse upcycling, which omits the stage of training experts asynchronously. Compared to alternative approaches, BTX achieves the best accuracy-efficiency tradeoff.
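The merge step in the abstract (feed-forward parameters kept as separate experts, remaining parameters averaged) can be sketched on toy data. This assumes each trained copy is a flat dictionary with an `"ffn"` entry and other entries holding lists of floats; that layout and the function name `btx_merge` are invented for illustration. The router and the MoE fine-tuning stage that learns token-level routing are not shown.

```python
def btx_merge(expert_models):
    """Combine separately trained copies of one seed model.

    Feed-forward weights are kept side by side as MoE experts;
    all remaining weights are averaged elementwise.
    """
    n = len(expert_models)
    merged = {"experts": [m["ffn"] for m in expert_models]}
    for name in expert_models[0]:
        if name == "ffn":
            continue
        # Average the same parameter across all copies.
        columns = zip(*(m[name] for m in expert_models))
        merged[name] = [sum(col) / n for col in columns]
    return merged


# Two toy "experts" branched from the same seed (values are made up).
code_expert = {"attn": [1.0, 2.0], "ffn": [0.1, 0.2]}
math_expert = {"attn": [3.0, 4.0], "ffn": [0.3, 0.4]}

merged = btx_merge([code_expert, math_expert])
```

Averaging is plausible for the shared parameters because every copy starts from the same seed model, while keeping the feed-forward blocks separate preserves what each expert learned in its own domain.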

Paper link

https://arxiv.org/abs/2403.07816

Read more

https://x.com/jaseweston/status/1767727740952682667

Large language models surpass human experts in predicting neuroscience results

Paper introduction

Proposes a benchmark, BrainBench, for evaluating the ability of LLMs to predict neuroscience results. It finds that LLMs surpass experts in predicting experimental outcomes, and an LLM tuned on neuroscience literature performed even better.

Abstract

Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. To evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results. We find that LLMs surpass experts in predicting experimental outcomes. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet. Like human experts, when LLMs were confident in their predictions, they were more likely to be correct, which presages a future where humans and LLMs team together to make discoveries. Our approach is not neuroscience-specific and is transferable to other knowledge-intensive endeavors.

Paper link

https://arxiv.org/abs/2403.03230

Read more

https://x.com/ProfData/status/1765689739682754824

C4AI Command-R

Paper introduction

Command-R is a 35B-parameter model with a context length of 128K, optimized for use cases that include reasoning, summarization, and question answering. It supports multilingual generation, evaluated in 10 languages, and has performant tool-use and RAG capabilities. It has been released for research purposes.

Paper link

https://huggingface.co/CohereForAI/c4ai-command-r-v01

Read more

https://x.com/CohereForAI/status/1767275927505977455

Is Cosine-Similarity of Embeddings Really About Similarity?

Paper introduction

Studies embeddings derived from regularized linear models and derives analytically how cosine similarity can yield arbitrary and meaningless similarities. It also finds that for some linear models the similarities are not even unique, while for others they are controlled by the regularization. The authors caution against blindly using cosine similarity and present considerations and alternatives.

Abstract

Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. This can work better but sometimes also worse than the unnormalized dot-product between embedded vectors in practice. To gain insight into this empirical observation, we study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights. We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless `similarities.' For some linear models the similarities are not even unique, while for others they are implicitly controlled by the regularization. We discuss implications beyond linear models: a combination of different regularizations are employed when learning deep models; these have implicit and unintended effects when taking cosine-similarities of the resulting embeddings, rendering results opaque and possibly arbitrary. Based on these insights, we caution against blindly using cosine-similarity and outline alternatives.
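A small numeric example can make the "arbitrary similarities" point concrete. The sketch below is a toy illustration, not the paper's derivation: it assumes a factorization-style model whose predictions are dot products between a user vector and an item vector, and all vectors and the scaling `d` are made up. Rescaling each latent dimension of the users by `d` and of the items by `1/d` leaves every prediction unchanged, yet the cosine similarity between two items moves from 0 to about 0.88.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def rescale(vec, scales):
    """Multiply each latent dimension by its own scale factor."""
    return [x * s for x, s in zip(vec, scales)]


user = [2.0, 3.0]
item_a = [1.0, 1.0]
item_b = [1.0, -1.0]

# Arbitrary per-dimension scaling and its inverse.
d = [1.0, 4.0]
d_inv = [1.0 / s for s in d]

user_2 = rescale(user, d)
item_a_2 = rescale(item_a, d_inv)
item_b_2 = rescale(item_b, d_inv)

# The model's predictions are identical under both parameterizations...
pred_before = dot(user, item_a)
pred_after = dot(user_2, item_a_2)

# ...but the item-item cosine similarity is not.
cos_before = cosine(item_a, item_b)
cos_after = cosine(item_a_2, item_b_2)
```

Since both parameterizations fit the data equally well, nothing in the predictions singles out one of the two cosine values as the "right" one.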

Paper link

https://arxiv.org/abs/2403.05440

Read more

https://x.com/_reachsumit/status/1767045820384477575

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper introduction

Provides a comprehensive overview of methods, analysis, and insights into multimodal LLM pre-training. It studies different architecture components and finds that carefully mixing image-caption, interleaved image-text, and text-only data is key to state-of-the-art performance. It also proposes a family of multimodal models of up to 30B parameters that achieve SOTA in pre-training metrics and show properties such as enhanced in-context learning and multi-image reasoning, which enable few-shot chain-of-thought prompting.

Abstract

In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for large-scale multimodal pre-training using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published pre-training results. Further, we show that the image encoder together with image resolution and the image token count has substantial impact, while the vision-language connector design is of comparatively negligible importance. By scaling up the presented recipe, we build MM1, a family of multimodal models up to 30B parameters, consisting of both dense models and mixture-of-experts (MoE) variants, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning, and multi-image reasoning, enabling few-shot chain-of-thought prompting.

This post was summarized with a GPT model, so parts of it may be inaccurate. Please check the linked originals as well! If you notice odd wording or incorrect information while reading, please let us know in the comments.


6 comments

prelude9903 2024-03-19

Could you tell me which automatic translation tool you are using?

ninebow 2024-03-19

Yes, I'm using DeepL as well, haha.
It recently added a feature for creating a Korean translation glossary, so I tried it out, but there are a few problems, orz...

libner 2024-03-19

It looks like in the paper introduction for RAT, "rat" and "rag" were translated literally as the animal and the cleaning cloth, respectively. Probably because the model read the lowercase words as they were.

ninebow 2024-03-20

I've corrected it as follows. Thank you! :D

Shows that iteratively revising a chain of thought (CoT) with information retrieval can significantly improve both LLM reasoning and generation in long-horizon generation tasks. The key idea is that each thought step is revised with retrieved information relevant to the task query, the current thought step, and the past thought steps. Retrieval-Augmented Thoughts (RAT) can be applied to other models such as GPT-4 and CodeLlama-7b for long-horizon generation tasks (e.g., creative writing and embodied task planning); RAT is a zero-shot prompting approach and significantly outperforms various baselines, including zero-shot CoT prompting and vanilla RAG.

ninebow 2024-03-19

Oh, you're right; I'll fix the original, haha.
Thank you!

ninebow 2024-03-19

Oh, the title... please change it to 'Top ML Papers of the Week' ;;
