June 6, 2023
Tech Enthusiast Weekly (Issue 253): The day the training material runs out

This post is long because more than 17 people contribute content for our readers. Enjoy.

This weekly records technology content worth sharing and is published every Friday.

This magazine is open source, and submissions are welcome. The weekly also runs a “Who’s Hiring” service that publishes programmer recruitment information. For cooperative promotion, please get in touch by email (yifeng.ruan@gmail.com).

cover picture

This is not an art gallery, but a red bayberry shed in Sankou Village, Lin’an, Hangzhou, stacked together along the hillside. (via

Topic of the Week: The Day the Training Material Runs Out

There is AI news every day at the moment, and the reports mention many different models.

A key indicator of a model's strength is how many parameters it has. In general, the more parameters, the stronger the model.

GPT-2 has 1.5 billion parameters, and GPT-3 and ChatGPT have 175 billion. GPT-4's figure has not been published, but it is said to be more than 5 times larger than the previous generation.

So, what are parameters?

As I roughly understand it, the parameter count is comparable to the number of nodes in the neural network that the model uses for prediction. The more parameters there are, the more possibilities the model considers, the greater the amount of computation, and the better the results.

If more parameters are better, will parameter counts grow without limit?

The answer is no, because the parameters are constrained by the training material. Sufficient training material must be available to calculate these parameters. If the parameters grew without limit, the training material would have to grow without limit too.

One argument I’ve seen is that the training material should be at least 10 times the parameters. For example, a model that distinguishes cat photos from dog photos, assuming 1,000 parameters, should be trained on at least 10,000 images.

ChatGPT has 175 billion parameters, so its training material should preferably be no less than 1.75 trillion tokens. Tokens are the individual words and symbols of a text. Take the novel “Dream of Red Mansions” as an example: it has 788,451 characters, so call it roughly 1 million tokens. ChatGPT's training material would then be equivalent to 1.75 million copies of “Dream of Red Mansions”.
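The arithmetic behind this estimate can be checked in a few lines. This is only a sketch of the rule of thumb quoted above (training tokens of at least 10 times the parameter count). The names are made up for illustration, and the one-million-token figure for the novel is the rounding used in the text.

```python
# Rule of thumb from the text: at least 10 training tokens per parameter.
TOKENS_PER_PARAMETER = 10
PARAMETERS = 175_000_000_000    # 175 billion parameters
TOKENS_PER_NOVEL = 1_000_000    # "Dream of Red Mansions", rounded up from 788,451 characters


def required_tokens(parameters: int, ratio: int = TOKENS_PER_PARAMETER) -> int:
    """Minimum number of training tokens suggested by the rule of thumb."""
    return parameters * ratio


tokens = required_tokens(PARAMETERS)
novels = tokens // TOKENS_PER_NOVEL

print(f"{tokens:,} tokens")   # 1,750,000,000,000 tokens
print(f"{novels:,} novels")   # 1,750,000 novels
```

Ten tokens per parameter gives 1.75 trillion tokens, which is where the 1.75 million copies of the novel comes from.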

According to reports, ChatGPT actually used 570 GB of training material from Wikipedia, online libraries, Reddit, Twitter, and other sources.

Think about it: a more powerful model requires more training material. The question is whether we can find that much material, and whether one day there will not be enough.

As it happens, some scholars have written a paper studying exactly this question.

Over the past 10 years, AI training datasets have grown much faster than the world’s data stock. If this trend continues, exhausting the data stock is inevitable.
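To see why a faster-growing dataset must eventually catch up with a slower-growing stock, here is a small sketch. Every number in it (the starting sizes and both growth rates) is a made-up assumption for illustration, not a figure from the paper, and the function name is hypothetical.

```python
def exhaustion_year(start_year, dataset, stock, dataset_growth, stock_growth, limit=200):
    """Return the first year in which the training dataset would need at
    least as much data as exists, or None if that does not happen within
    `limit` years. Growth rates are yearly multipliers (1.5 means +50%)."""
    for offset in range(limit + 1):
        if dataset >= stock:
            return start_year + offset
        dataset *= dataset_growth
        stock *= stock_growth
    return None


# Hypothetical inputs: datasets grow 50% a year, the data stock grows 7% a
# year, and today's largest dataset is one thousandth of the stock.
year = exhaustion_year(2023, dataset=1.0, stock=1000.0,
                       dataset_growth=1.5, stock_growth=1.07)
print(year)
```

With any dataset growth rate above the stock growth rate the function returns some year; only the date changes, which is why the paper's estimates are given as ranges.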

The paper gives three time points.

  • 2026: Exhaustion of general language data
  • 2030-2050: use up all language data
  • 2030-2060: use up all visual data

That is, according to their predictions, new training material will be hard to find in about three or four years. In thirty years at the latest, all the material in the world will not be enough for AI training.

[Figure: trend chart from the paper]

The chart above is the trend chart given by the authors: the dotted line is the growth of the training material, and the red and blue lines are different projections of model growth. After 2035, the three lines merge and the curve becomes flatter and flatter.

At that point, the authors argue, AI model development could slow down significantly due to insufficient training material.

If the authors' prediction is correct, then contrary to what everyone thinks, the rapid development of AI will not last long. Now may be the fastest-growing stage; after this it will start to slow, and by the middle of this century it will have slowed markedly and be close to stagnation, similar to the current state of quantum physics.

Technology dynamics

1、wheel steering system

South Korea’s Hyundai has released a new technology that allows each wheel to turn 90 degrees independently.

In the demo video, the concept car can drive sideways or turn around on the spot.

Although very practical, this technology increases the complexity and cost of the vehicle, and it is unknown whether it will affect normal driving. Hyundai did not say whether it will be put into production.

2、Static electricity from computer chairs

A foreign netizen posted that the monitor in his home often went dark for a few seconds for no reason, and then turned back on again.

He thought it was a monitor problem, but later found that the glitch only occurred when moving the computer chair, or sitting down and standing up.

His computer chair is MARKUS from IKEA. Many netizens replied that their computer chair also has this problem.

The fabric material or metal frame of this chair is prone to static electricity, and any movement will cause discharge, causing the computer monitor to shut down for a short time.

The obvious solution is to replace the chair, but some handy netizens attached a ground wire to the chair to earth it, which solved the discharge problem.

3、The Hearing Aid Function of Wireless Headphones

Wireless headphones could replace hearing aids and help the hearing-impaired, a study finds.

Apple's AirPods have a “Live Listen” feature that amplifies external sounds, much like a hearing aid, and it works really well.

Hearing aids are very expensive: tens of thousands of yuan for good ones, and several thousand for ordinary ones. If wireless headphones can replace them, many hearing-impaired people will benefit.

4、sand dam reservoir

South Korea built the country’s first sand dam reservoir in order to solve the problem of water cut-off in mountainous areas during the dry season.

There is a sandstone reservoir inside the dam body, which is usually used to store water. When necessary, the pipeline is opened to let the water flow downstream.

Doing so is said to have three benefits: water evaporation is greatly reduced; water quality is improved as it passes through the sand bed; and water does not freeze in winter.

5、smart wedding ring

A Czech company has launched a “smart wedding ring”, which can sense the wearer’s heartbeat and display the heartbeat curve on the ring.

What’s interesting is that it doesn’t show your own heartbeat, but the heartbeat of the other party.

It communicates with the phone via Bluetooth, and whenever the wearer presses on the ring, the phone contacts the other paired ring.

The heartbeat frequency of the other party will be transmitted to your mobile phone, and the heartbeat curve will also be displayed on the ring.

According to its inventors, it allows you to feel the romantic heartbeat of your lover at all times. It is made of rose gold and the price is $3,000 per pair.

article

1、my open source experience(Chinese)

The author shares his experience of developing an open-source web application for image editing. (@nihaojob post)

2、How to implement CodePen yourself(English)

CodePen is a well-known tool for editing web pages with a live preview. This article shows how to implement its main features, which turns out to be quite simple.

3、tcpdump quick start(English)

The author teaches you how to use the command line tool tcpdump to view the TCP communication of a website.

4、Why WebGPU matters(English)

Operating systems currently have no unified graphics API: Windows uses DirectX, Apple uses Metal, and Linux uses Vulkan.

WebGPU is a cross-platform solution that provides a unified interface. This long article is recommended.

5、My 30 years developing PCalc(English)

The author wrote PCalc, a calculator for the Macintosh, in 1992. He has maintained the project for 30 years since then and ported it to other Apple devices, such as the iPhone and Apple Watch. In this article he looks back on those 30 years.

6、Automate HTTP testing with hurl(English)

This article introduces a simple method, using the software hurl, to automate the testing of the website API to see if it responds correctly.

7、Error Handling Mechanisms in Programming Languages(English)

This article discusses how different languages handle errors: Java, for example, throws an exception, while Go returns the error as a value that is assigned to a variable.

Another article on the same topic is also worth reading.
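The two styles can be contrasted within a single language. The sketch below uses Python for both: the first function raises an exception, as Java code typically would, and the second returns the error as a value next to the result, imitating Go's convention. The function names are invented for this example and are not taken from either article.

```python
from typing import Optional, Tuple


def parse_port_raising(text: str) -> int:
    """Exception style: a failure interrupts the normal flow."""
    port = int(text)  # raises ValueError on non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def parse_port_returning(text: str) -> Tuple[int, Optional[str]]:
    """Error-as-value style: the caller must inspect the second item."""
    try:
        return parse_port_raising(text), None
    except ValueError as exc:
        return 0, str(exc)


# Exception style: handled with try/except.
try:
    parse_port_raising("http")
except ValueError as exc:
    print("caught:", exc)

# Error-as-value style: handled with an explicit check.
port, err = parse_port_returning("8080")
if err is None:
    print("port:", port)
```

The exception version cannot be ignored by accident, since an unhandled error stops the program. The value version makes every failure point visible at the call site, at the cost of more checks.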

8、crazy C strings(English)

This article is a tutorial on strings in C, covering everything from the terminating \0 to Unicode. Its conclusion is that handling strings correctly in C is a lot of trouble.
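Two of the pitfalls mentioned here, the terminating \0 and the gap between bytes and characters, can be shown without writing C. The sketch below imitates in Python what C's strlen does, which is to count bytes up to the first zero byte. The helper name c_strlen is made up for this example.

```python
def c_strlen(buffer: bytes) -> int:
    """Count the bytes before the first NUL byte, the way C's strlen does.

    Raises ValueError if the buffer has no terminator. In C the same
    mistake is undefined behaviour, because strlen reads past the buffer.
    """
    return buffer.index(b"\x00")


text = "héllo"
buffer = text.encode("utf-8") + b"\x00"   # C strings end with a zero byte

print(len(text))          # 5 characters (code points)
print(c_strlen(buffer))   # 6 bytes, because "é" takes two bytes in UTF-8

# An embedded zero byte silently truncates the string from C's point of view.
print(c_strlen(b"ab\x00cd\x00"))   # 2
```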

tool

1、stagit

This software can turn a Git repository into a static website, generating a page for each file and commit.

2、meta tag generator

When you share an external URL, many social media sites display a card with the page's title, a thumbnail image, and a short summary. This information comes from the meta tags inside the web page, and this tool helps you generate those tags.
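As a rough illustration of what such a generator produces, the sketch below builds a few Open Graph tags and a Twitter card tag. It is not the tool's own code. The function name and the sample values are hypothetical, and real pages often carry more tags than these four.

```python
from html import escape


def social_meta_tags(title: str, description: str, image_url: str) -> str:
    """Build Open Graph and Twitter card tags for a page's <head>."""
    properties = {
        "og:title": title,
        "og:description": description,
        "og:image": image_url,
    }
    lines = [
        # escape() keeps quotes and ampersands from breaking the attribute.
        f'<meta property="{name}" content="{escape(value, quote=True)}">'
        for name, value in properties.items()
    ]
    # Twitter uses name= rather than property= for its card type.
    lines.append('<meta name="twitter:card" content="summary_large_image">')
    return "\n".join(lines)


print(social_meta_tags("Issue 253",
                       'The day "training material" runs out',
                       "https://example.com/cover.webp"))
```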

3、CJK font recognition

Upload a picture of East Asian text, and this open source tool can identify what fonts are used for these texts. (@JeffersonQin post)

4、microblog.pub

A self-hosted, open-source microblogging site for a single user (it has no multi-user support) that supports the ActivityPub protocol.

5、Textual Markdown Browser

A Markdown file renderer for the terminal window, suitable for reading Markdown files in the terminal.

6、HorusPass

This site takes the text entered by the user and generates a URL for sharing. The URL can be opened only once and no longer exists on the second visit, a bit like “burn after reading”.
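The open-once behaviour can be sketched with an in-memory store that deletes an entry when it is read. This is an assumption about how such a service could work, not a description of HorusPass itself. The class and method names are invented, and a real service would also need persistence and expiry.

```python
import secrets
from typing import Dict, Optional


class OneTimeStore:
    """In-memory store whose entries can be read exactly once."""

    def __init__(self) -> None:
        self._notes: Dict[str, str] = {}

    def create(self, text: str) -> str:
        """Save the text and return a hard-to-guess token for the share URL."""
        token = secrets.token_urlsafe(16)
        self._notes[token] = text
        return token

    def open(self, token: str) -> Optional[str]:
        """Return the text and delete it; later calls return None."""
        return self._notes.pop(token, None)


store = OneTimeStore()
token = store.create("meet at noon")
print(store.open(token))   # meet at noon
print(store.open(token))   # None
```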

7、Progress-up

A web multi-file upload JS library with upload progress display.

8、snappify

A tool to generate screenshots of code snippets.

9、RustDesk

An open source remote desktop software that allows you to remotely operate the desktops of other computers, with clients for various operating systems.

10、LosslessCut

A video editor whose main feature is that it does not re-encode: it cuts and joins clips in the original video format, so it is extremely fast.
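One common way to get the same effect from the command line is ffmpeg's stream copy mode. The sketch below only builds the command. It assumes ffmpeg is installed, the file names are placeholders, and it is not meant to describe how LosslessCut itself invokes its backend.

```python
from typing import List


def lossless_cut_command(source: str, start: str, end: str, target: str) -> List[str]:
    """Build an ffmpeg command that copies the streams between two
    timestamps instead of decoding and re-encoding them."""
    return [
        "ffmpeg",
        "-ss", start,    # seek to the start time
        "-to", end,      # stop at the end time
        "-i", source,
        "-c", "copy",    # stream copy: no re-encoding
        target,
    ]


command = lossless_cut_command("input.mp4", "00:01:00", "00:02:30", "clip.mp4")
print(" ".join(command))
```

Passing the list to subprocess.run would perform the cut. Because nothing is re-encoded, cuts land on keyframes, so the clip boundaries may be slightly off from the requested times.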

resource

1、ChatGPT Prompt Engineering for Developers

A free English course from Andrew Ng in collaboration with OpenAI that teaches you how to write ChatGPT prompts and build your own chatbot.

2、The Complete Guide to Next.js and React

A Chinese-subtitled version of a highly rated paid Udemy course. (@lyf61 post)

3、Graphical QUIC connection (Chinese version)

Explains the meaning of every byte in a QUIC connection; a translation of the original English version. (@cangSDARM post)

4、musician

An AI model that automatically generates music. You can listen to music generated by the model on its official website.

picture

1、cloud expression

An American artist specializes in adding expressions to photos of clouds, making them look like cartoon characters.

Originally, out of boredom, he took some random photos of clouds, drew expressions on them, and posted them on the Internet.

Later, he found that many people liked these works, so he persisted.

“Looking at the clouds gives you endless inspiration,” he said.

Now, more and more readers are sending him their own submissions. He is also preparing to publish a book.

abstract

1、The Seven Levels of Busyness

The busyness of life can be divided into seven levels.

See which level you are at.

Level 1: Not busy at all.

The time is very free, you can arrange it however you want, there are no things that must be done, and you can sleep as long as you want on weekends.

Level 2: There are some little things.

You are aware that there are things to do. They are real tasks with no deadlines, but you know they have to be done sooner or later.

Level 3: Something important.

You have things that must be done and need to be tracked in time without procrastination, and you will always remind yourself of these things.

Level 4: The schedule is packed.

Your schedule is full, and you have to constantly ask yourself “what is more important?” in order to decide which things to do first and which things to do later.

You have no unplanned time, but you can still control the schedule.

Level 5: Chaos occurs in life.

You can’t finish your work during working hours, and you start working overtime.

You often say “sorry” to others because things are late. You have not given up on those tasks, but you have to rush, and some of them get done sloppily.

Level 6: The task is endless.

You need to do more than you can schedule. Even if you give up some things, you still can’t finish the rest.

Your working hours are greatly extended, affecting normal life. You feel very tired.

Level 7: Life can’t get by.

Tasks fill every waking minute of your life. Meals and other necessities have to be squeezed in, and at the busiest times you don’t even have time to eat.

You stop writing your schedule because there is no time to plan and things change every hour.

You are absent-minded even when you walk, and you often feel that you are about to collapse and cannot go on.

remarks

1、

I left Google so that I could speak out about the risks of AI; it was not convenient to talk about these things while at Google.

Geoffrey Hinton, the “father of deep learning”, announcing his resignation from Google

2、

The problem in Europe is that, instead of seeing the Internet as an economic opportunity to be exploited, it is seen as something to be regulated.

“Europe Is Not Ready to Become a ‘Third Superpower'”

3、

Most people think it’s okay to have people under them who are smarter than them. Typically, leaders hire advisers and staff who are smarter than themselves.

So why do people feel threatened when the people under them are replaced by AI models that are smarter than they are?

Yann LeCun, Chief AI Scientist at Meta

4、

To be a good programmer, write a lot of code; to be a top programmer, read a lot of code.

“Please Write CRISP Code”

this week in history

How to move past disappointment and doubt(2022 #206)

Graphics card out of stock and competition from other industries(2021 #156)

digital nomad(2020 #106)

Why is it hard for liberal arts students to find jobs?(2019 #56)

thank you

The weekly received support from FlowUs, a new-generation Chinese knowledge management and collaboration platform. Many thanks for the help.

FlowUs = documents + tables + cloud drive. You can use it to write documents, make a home page, manage data, store files, and more.

Each issue of the magazine is simultaneously published in the FlowUs column. Everyone is welcome to open their own column and homepage.

(End)

document information

  • Copyright statement: free reprint – non-commercial – non-derivative – keep the signature (Creative Commons 3.0 License)
  • Date published: May 5, 2023

Ewen Eagle

I am the founder of Urbantechstory, a technology blog where you can find all kinds of trending technology, gaming news, and much more.
