This weekly records technology content worth sharing and is published every Friday.
This magazine is open source, and submissions are welcome. The weekly also runs a “Who’s Hiring” service for posting programmer job openings. For promotional partnerships, please contact us by email (email@example.com).
In the small town of Tirau, New Zealand, many quirky buildings made of corrugated iron sheets stand along the roadside. The dog pictured above is actually a public toilet. (via)
Topic of the Week: Big Data is Dead
Everyone must be familiar with the term “big data”. It was one of the hottest IT buzzwords, and society as a whole once loved it.
Baidu Index shows that “big data” started appearing in search queries in 2011, then spread rapidly, peaking between 2017 and 2019.
At that time, everyone believed that data would grow exponentially and that the world would be flooded with massive amounts of it. How to handle this data became a key issue, one that would determine the competitiveness of an enterprise, or even a country, in the information age.
As a result, companies rushed to adopt big data solutions, and many related, highly paid positions appeared. Universities also responded eagerly: according to one report, more than 600 colleges and universities in China set up “big data majors” or “big data colleges”, including prestigious schools such as Peking University and Fudan University.
However, ten years later, big data has not become a bottleneck for development, and we are still able to handle all the data generated, and this will be the case in the foreseeable future.
The predicted era of big data never seems to have arrived; instead it feels further away than ever. The popularity of the term “big data” is cooling, it is mentioned less and less, and related job postings have gradually dried up.
Correspondingly, the field of “big data” has made little progress: no new concepts or theories have emerged, there have been no technological breakthroughs, and many directions have stagnated.
For example, NoSQL databases designed for big data are losing momentum, while traditional relational databases (SQLite, Postgres, MySQL) are growing strongly and becoming ever more popular.
What is going on?
Jordan Tigani, a former Google engineer who was one of the founding engineers of BigQuery, recently put it bluntly: “Big Data is Dead.”
In his view, the era of big data is over: storing and analyzing large amounts of data has been solved as a technical problem. Users no longer need to worry about data size, because no amount of data is a problem anymore.
He gives six reasons why “big data is dead”. I find them very convincing and share them below.
(1) The vast majority of companies never reach big data scale. Most companies have less than 1 TB of data, and many have less than 100 GB.
Suppose a medium-sized manufacturer has 1,000 customers, each placing one order a day with 100 line items. The data this company generates per day is still under 1 MB. After three years the total is only about 1 GB, and reaching 1 TB would take thousands of years.
Even large internet companies rarely operate at big data scale. Suppose one marketing campaign has 1 million users and the company runs dozens of such campaigns at the same time: the daily data volume is still under 1 GB, and even with all kinds of logs it may only be a few GB. That is far from big data.
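As a rough check of this arithmetic, here is a minimal Python sketch. It assumes about 9 bytes per line item, a figure implied by the text's numbers (about 1 GB after three years) rather than stated in it, and uses decimal units (1 GB = 10^9 bytes).

```python
# Back-of-the-envelope data growth for the hypothetical manufacturer above.
CUSTOMERS = 1_000
ORDERS_PER_CUSTOMER_PER_DAY = 1
ITEMS_PER_ORDER = 100
BYTES_PER_ITEM = 9  # assumption: very compact rows, implied by ~1 GB in 3 years

GB = 10**9
TB = 10**12


def rows_per_day() -> int:
    """Order line items written per day."""
    return CUSTOMERS * ORDERS_PER_CUSTOMER_PER_DAY * ITEMS_PER_ORDER


def bytes_per_day() -> int:
    """Raw bytes added per day under the assumed row size."""
    return rows_per_day() * BYTES_PER_ITEM


def years_to_reach(target_bytes: int) -> float:
    """Years of linear growth needed to accumulate target_bytes."""
    return target_bytes / bytes_per_day() / 365


if __name__ == "__main__":
    print(f"{rows_per_day():,} rows/day, {bytes_per_day() / 10**6:.2f} MB/day")
    print(f"1 GB after {years_to_reach(GB):.1f} years")
    print(f"1 TB after {years_to_reach(TB):,.0f} years")
```

Under these assumptions the company adds 100,000 rows but under 1 MB per day, needs roughly three years to reach 1 GB, and needs around three thousand years to reach 1 TB. Doubling the row size only halves those times, which does not change the conclusion.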
(2) Storage and compute are being separated. Big data involves two aspects, “data storage” and “data computing”, and handling both in a single system is genuinely difficult.
Today, however, the two can be decoupled into independent systems, each scaling on its own. This means “data computing” is no longer limited by “data storage” (database size), and vice versa.
As a result, big data no longer exists as a single problem; it splits into two separate problems: massive storage and large-scale computation.
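To illustrate the decoupling, here is a minimal Python sketch in which the storage layer and the compute layer are separate components. `ObjectStore`, `InMemoryStore` and `distributed_sum` are hypothetical names; a real system would keep chunks in an object store and run workers on separate machines rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Protocol


class ObjectStore(Protocol):
    """Storage layer: lists and fetches chunks, knows nothing about compute."""

    def list_keys(self) -> List[str]: ...

    def get(self, key: str) -> List[int]: ...


class InMemoryStore:
    """Hypothetical stand-in for a remote object store."""

    def __init__(self, chunks: Dict[str, List[int]]) -> None:
        self._chunks = chunks

    def list_keys(self) -> List[str]:
        return sorted(self._chunks)

    def get(self, key: str) -> List[int]:
        return self._chunks[key]


def distributed_sum(store: ObjectStore, workers: int) -> int:
    """Compute layer: the worker count is chosen independently of how data is stored."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker fetches one chunk and returns a partial sum.
        partials = pool.map(lambda key: sum(store.get(key)), store.list_keys())
        return sum(partials)


if __name__ == "__main__":
    store = InMemoryStore({f"part-{i}": list(range(i * 10, i * 10 + 10)) for i in range(8)})
    for n in (1, 2, 8):
        print(n, distributed_sum(store, n))
```

The same stored chunks give the same answer with 1, 2 or 8 workers: storage can grow without adding compute, and compute can scale up for a heavy query without moving the data.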
(3) Without new lines of business, data grows linearly. That is, the data added each day has the same structure as the data before it.
Once earlier data has been written to the database, it usually never changes, and there is no new computation to do on it because the relevant calculations were already done. You only need to process the newly added data and save the result. You rarely need to scan old data every day: that data is immutable, so why compute it over and over again?
Therefore, the assumption that “data grows exponentially” does not hold for a typical business. Moreover, the demand for computation is actually much smaller than the demand for storage, because old data rarely needs to be recomputed.
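The “compute only the new data and save it” pattern is incremental aggregation. Below is a minimal Python sketch of it; the `(day, product, quantity)` record shape and the `DailyTotals` class are hypothetical, chosen only to contrast an incremental update with a full rescan.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record shape: (day, product, quantity).
Record = Tuple[date, str, int]


class DailyTotals:
    """Keeps saved totals so each day only that day's new records are processed."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()
        self.processed_days: Set[date] = set()

    def ingest_day(self, day: date, records: Iterable[Record]) -> None:
        """Fold one day's new records into the saved totals."""
        if day in self.processed_days:
            return  # that day's data is immutable and already counted
        delta: Counter = Counter()
        for rec_day, product, qty in records:
            if rec_day != day:
                raise ValueError(f"record for {rec_day} passed as {day}")
            delta[product] += qty
        self.totals.update(delta)  # merge only the new day's delta
        self.processed_days.add(day)


def full_recompute(records: Iterable[Record]) -> Dict[str, int]:
    """The naive alternative: rescan all historical records every day."""
    totals: Counter = Counter()
    for _, product, qty in records:
        totals[product] += qty
    return dict(totals)
```

Both approaches produce the same totals, but the incremental version touches one day of data per run instead of the whole history, which is why daily compute stays small even as storage keeps growing.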
(4) People mostly care about recent data. The most frequent queries target data from the last 24 hours; data from a week ago is about 20 times less likely to be queried, and data older than a month is queried only occasionally.
This means big data behaves more like data at rest than data in motion. Since older data is rarely used, it can be compressed and archived. A table holding 10 years of data may reach the petabyte level, yet with the historical data compressed it may come to less than 50 GB.
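A common way to act on this is hot/cold tiering: keep recent rows readily queryable and compress the rest. Here is a minimal Python sketch, assuming a hypothetical row shape with an ISO-8601 `ts` field and a 24-hour hot window taken from the text; the compression ratio you get depends entirely on your data, so this does not reproduce the petabyte-to-50-GB figure.

```python
import json
import zlib
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical row shape: {"ts": ISO timestamp, "event": str, "value": int}
Row = dict


def split_hot_cold(
    rows: List[Row], now: datetime, hot_window: timedelta = timedelta(hours=24)
) -> Tuple[List[Row], List[Row]]:
    """Rows inside the hot window stay uncompressed; older rows become cold."""
    cutoff = now - hot_window
    hot = [r for r in rows if datetime.fromisoformat(r["ts"]) >= cutoff]
    cold = [r for r in rows if datetime.fromisoformat(r["ts"]) < cutoff]
    return hot, cold


def archive(rows: List[Row]) -> bytes:
    """Compress cold rows; they are rarely read, so decompression cost is acceptable."""
    return zlib.compress(json.dumps(rows).encode("utf-8"), level=9)


def restore(blob: bytes) -> List[Row]:
    """Decompress an archived block for the occasional historical query."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Queries on the last day only read the small hot set, while the compressed cold archive is unpacked only when someone actually asks about older data.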
(5) Companies that really do have big data almost never query all of it. 90% of their queries touch less than 100 MB of data, and very few touch terabytes.
Even when a query does scan terabytes, its performance is usually not a priority; waiting over a weekend or a few days for the results is typically acceptable.
Also, queries over large datasets are expensive. Google BigQuery charges about $5,000 to scan a petabyte, so even large companies rarely run such queries.
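Because on-demand engines like this bill by bytes scanned, restricting a query to recent partitions is what keeps it cheap. The Python sketch below assumes a flat $5 per TB, which is what the text's $5,000 per petabyte works out to in decimal units; the per-day partition sizes in the usage are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict

PRICE_PER_TB_USD = 5.0  # assumption: $5,000 per PB as quoted in the text
TB = 10**12


def query_cost_usd(bytes_scanned: int) -> float:
    """On-demand cost depends on bytes scanned, not on the size of the result."""
    return bytes_scanned / TB * PRICE_PER_TB_USD


def bytes_scanned(partition_sizes: Dict[date, int], start: date, end: date) -> int:
    """Partition pruning: only daily partitions inside [start, end] are read."""
    return sum(size for day, size in partition_sizes.items() if start <= day <= end)


if __name__ == "__main__":
    # Hypothetical table: one year of daily partitions, 1 GB each.
    parts = {date(2023, 1, 1) + timedelta(days=i): 10**9 for i in range(365)}
    full_year = bytes_scanned(parts, date(2023, 1, 1), date(2023, 12, 31))
    last_day = bytes_scanned(parts, date(2023, 12, 31), date(2023, 12, 31))
    print(f"full year: ${query_cost_usd(full_year):.2f}")
    print(f"last day:  ${query_cost_usd(last_day):.4f}")
    print(f"1 PB scan: ${query_cost_usd(10**15):,.0f}")
```

A query over the most recent day costs a tiny fraction of a full-table scan, which matches the observation that most queries touch well under 100 MB.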
(6) Rapid hardware progress has greatly increased the computing power of a single machine. When Google published the MapReduce paper in 2004, single machines were still relatively weak, and many computations had to be distributed across many machines.
When AWS launched its EC2 cloud hosts in 2006, you could only get a single-core CPU and 2 GB of memory. Today, a standard AWS instance has 64 cores and 256 GB of memory, and if you are willing to pay more, you can get 445 cores and over 24 TB of memory.
With single machines this powerful, distributed computing, the biggest difficulty of big data, has become far less difficult even when it is still needed.
In summary, the conclusion is: there is no need to pay special attention to the amount of data, and no need to worry about being unable to handle massive data. As a technical problem, big data has already been solved.
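A quick way to see the effect is to ask which machine could hold a given dataset entirely in memory. This Python sketch uses the instance figures quoted above (24 TB treated as 24,000 GB) and a hypothetical 2× overhead factor for indexes and intermediate results; both the labels and the overhead are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# (label, cores, memory in GB), using the figures quoted in the text.
INSTANCES: List[Tuple[str, int, int]] = [
    ("EC2 at launch (2006)", 1, 2),
    ("standard instance today", 64, 256),
    ("largest instance today", 445, 24_000),
]


def smallest_fit(dataset_gb: float, overhead: float = 2.0) -> Optional[str]:
    """Return the first listed instance whose RAM holds the data times an assumed overhead."""
    for label, _cores, mem_gb in INSTANCES:
        if dataset_gb * overhead <= mem_gb:
            return label
    return None  # would genuinely need distribution


if __name__ == "__main__":
    for size_gb in (1, 100, 1_000, 50_000):
        print(size_gb, "GB ->", smallest_fit(size_gb))
```

With point (1)'s observation that most companies hold under 1 TB, nearly every such dataset fits in the memory of one large machine today, which was not true in 2006.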
An African artist used AI to generate a fashion show.
He told the AI that he wanted a fashion show with elderly Africans wearing their traditional costumes, and the AI generated images accordingly. After repeated adjustments, he got quite satisfying results.
These images arguably look better and have more impact than a real fashion show would.
In real life, finding these models and preparing these costumes would be very difficult and expensive.
There will surely be many AI-generated fashion shows in the future, which could have a big impact on the fashion industry. Who would have thought that AI could also threaten the livelihood of models?
2、Aerosols from toilet flushes
Public health scientists have long warned that toilet flushing produces aerosol particles that could spread pathogens, but there has been no direct evidence of how fast and how far these particles spread.
A team of engineers at the University of Colorado used green lasers and high-speed camera equipment to film toilet flushes and confirmed that aerosol diffusion does exist.
The experiment found that during a flush, particles are ejected at 2 meters per second and can reach 1.5 meters above the toilet (pictured above).
While larger particles settle on the toilet surface within seconds, smaller aerosol particles can remain suspended in the air for several minutes or longer.
The picture below shows the laser setup during the experiment.
Google Maps can display a movement track based on GPS signals. Many people use this to draw pictures on the map, which is called GPS art.
To propose to his girlfriend, a Japanese man traced “Marry Me” across the map of Japan on Google Maps, finished with a heart pierced by an arrow.
He traveled from Hokkaido in the north of Japan to Kagoshima in the south, covering 7,163 kilometers over 6 months. It is the largest GPS artwork in the world.
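The distance of a GPS artwork is simply the length of the recorded track. Here is a minimal Python sketch that sums great-circle (haversine) distances between consecutive GPS fixes, assuming a spherical Earth with a mean radius of 6,371 km; it is an illustration, not how the 7,163 km figure above was actually measured.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (latitude, longitude) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def track_length_km(points: Iterable[Tuple[float, float]]) -> float:
    """Sum of segment lengths along a recorded GPS track."""
    pts = list(points)
    return sum(haversine_km(p, q) for p, q in zip(pts, pts[1:]))
```

Because GPS fixes are sampled frequently, summing short straight segments closely approximates the road distance actually driven.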
There is a special theater performance in Reykjavik, the capital of Iceland, showing lava flowing out of a volcano.
It uses real lava from a 1918 volcanic eruption, 600 kilograms per show. Reheated to its melting point (1100 degrees Celsius), the cooled rock turns back into a fiery liquid and flows down a tall slide, letting visitors experience what a volcanic eruption feels like.
Fresh volcanic lava contains toxic sulfur gases, but repeated heating has already released them, so the show is safe.
The organizer said that during a volcanic eruption he saw crowds of tourists rushing to watch it, which gave him the idea of staging a lava show.
1、How do I set up my own blog(English)
The author describes in detail how he built his blog with the Next.js framework and other React tools. He wanted to embed React components in Markdown files, so he used the MDX format.
2、One Year Anniversary of Steam Deck(English)
This month marks the one-year anniversary of the handheld game console Steam Deck. This article describes the many innovations of this device, especially in software.
3、How to configure the nano editor(English)
Most servers ship with the nano editor, which is generally considered weak in features. This article explains how, with the right configuration, it becomes very pleasant to use.
4、How Stripe builds interactive documentation with Markdoc(English)
Markdoc is an extension of Markdown syntax that lets you insert components when generating HTML pages, so readers can interact with the documentation. This article describes how Stripe uses Markdoc to build its documentation.
5、CSS Color Formatting(English)
This long article introduces the various formats of CSS colors in detail. If you want to learn more about how to express colors, it is recommended to read this article.
6、Query GitHub with ClickHouse(English)
ClickHouse is a well-known data warehouse, and its official website has a playground where you can query all GitHub events since 2011 online, 3.1 billion records in total. This article demonstrates how to use this database.
7、Explaining HTTPS with carrier pigeons(English)
The author uses carrier pigeons delivering letters as an analogy to explain what the HTTPS protocol is. It is well written, and the analogy is very apt.
8、How many layers of UI does Windows 11 have?(English)
A very interesting article: the author examines how many older styles survive in the Windows 11 UI, and even finds leftovers from Windows XP and Windows 3.1.
A bookmarklet that removes sticky elements from web pages. Many websites use such elements to create overlays, which is very annoying. A similar script is unsticky.
An English-learning tool: the user uploads a video and its subtitles, and the software automatically generates a vocabulary list. When the video is played later, every word in the list automatically shows its definition as a scrolling on-screen comment (danmaku). (@tangshimin post)
A web-based image hosting system. (@it-chenliang post)
An open source data exploration tool that can import data from various data sources, and then customize the query and presentation of data (charts, dashboards, maps, etc.). (@jerrylususu post)
A Python-based web automation tool that can write scripts to manipulate browsers. (@g1879 post)
A command-line tool written in JS that can estimate the approximate time spent developing a code base.
An open source full-text search engine that claims to replace Elasticsearch and supports Chinese.
A command-line tool that automatically removes silent segments from videos, especially suited to speeches and lecture videos.
A Bootstrap-based web UI framework designed for building admin dashboards.
This website can convert code in one language into another language, such as converting JS code into PHP code. It is a paid service, but there are free credits.
A YouTube mobile client built by community developers, supporting Android and iOS.
This page collects and presents various classless minimalist CSS frameworks. Check it out if you want to pick a simple CSS framework.
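The command-line tool above that estimates development time from a code base does not say how it works. One common heuristic, sketched here in Python as an assumption rather than that tool's actual method, groups commit timestamps into work sessions: short gaps between commits count as working time, and each new session gets a fixed credit for work done before its first commit. Both 2-hour thresholds are hypothetical parameters.

```python
from datetime import datetime, timedelta
from typing import Iterable

MAX_GAP = timedelta(hours=2)  # assumption: a longer gap starts a new session
FIRST_COMMIT_CREDIT = timedelta(hours=2)  # assumption: work before a session's first commit


def estimate_hours(commit_times: Iterable[datetime]) -> float:
    """Estimate development hours from commit timestamps by grouping them into sessions."""
    times = sorted(commit_times)
    if not times:
        return 0.0
    total = FIRST_COMMIT_CREDIT  # the very first commit opens a session
    for prev, cur in zip(times, times[1:]):
        gap = cur - prev
        # Within a session, count the real gap; otherwise credit a new session.
        total += gap if gap <= MAX_GAP else FIRST_COMMIT_CREDIT
    return total.total_seconds() / 3600
```

Commit timestamps could come from `git log --format=%cI`, parsed with `datetime.fromisoformat`. The estimate is only as good as the thresholds, so treat it as an order-of-magnitude figure.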
2、C language manual (GNU C Language Manual)
A C language tutorial recently written by Richard Stallman. This link is to the source code; converted PDF files are available for download on GitHub.
A highly legible English font in which easily confused characters, such as 1 and I or i and l, are clearly distinguished.
A simple and practical English tutorial for getting started with Python.
1、AI portraits of US presidents
An American columnist used AI to generate Pixar-like cartoon-style portraits of every U.S. president.
Below are the US presidents of the last half century.
In the 1960s and 1970s, Brussels, the capital of Belgium, let urban development run unchecked, resulting in many incongruous high-rise buildings in traditional neighborhoods.
In the picture above, high-rise buildings are built randomly in traditional blocks, destroying urban functions and landscapes.
Later, in architecture, “Brusselification” came to mean haphazard urban planning.
The K-219 was a nuclear submarine of the Soviet Navy that could carry 16 missiles armed with either 32 or 48 nuclear warheads.
On Friday, October 3, 1986, it was sailing in the Atlantic Ocean when a missile tube exploded and caught fire. The Soviet Union later claimed this was caused by a collision with a U.S. submarine, but the U.S. Navy denied it.
Two Soviet sailors were killed instantly in the explosion, and a third soon died from the toxic gas. To make matters worse, the explosion tore a hole in the hull, seawater poured in, and the submarine quickly sank from 40 meters to 300 meters below the surface.
The captain immediately had to seal the watertight doors between all compartments to stop the flooding from spreading.
Twenty-five sailors were trapped in a sealed compartment and could not get out. After a tense discussion, the captain finally agreed to open it and let them out.
Worst of all, the nuclear reactor, which should have shut down automatically, was still running. Left unchecked, the consequences would be disastrous. The reactor compartment was already over 60°C and filled with toxic nitric acid fumes, but someone still had to go in and manually lower the control rods to shut the reactor down.
The first sailor to enter the reactor compartment managed to insert only one control rod (out of four) before running out of oxygen. He had to retreat and collapsed as soon as he got out.
Then a 20-year-old sailor volunteered for the task. Wearing a chemical protection suit, he entered the compartment and successfully shut down the reactor. However, a fire broke out inside and raised the pressure, and the pressure difference made it impossible for him to open the door. He eventually suffocated in the reactor compartment and was later posthumously named a Hero of the Russian Federation.
With the reactor shut down, the submarine lost its main power, and the captain kept the K-219 afloat on battery power alone. The Soviet Union then prepared to send a freighter to tow the submarine back to port.
However, seawater kept seeping in, and by October 6, three days later, the situation was beyond saving. All personnel had to be evacuated, and the submarine sank with its nuclear weapons to the bottom of the Atlantic Ocean at a depth of 6,000 meters, where it remains today.
Moore’s Law only says that the number of transistors on a chip doubles every 18 months; it doesn’t say that, today, achieving each doubling takes 18 times as many researchers as it did in the 1970s.
— Breakthroughs in Biology 2022
I have started more than 70 ventures, and only 4 succeeded. Overall, my success rate is only about 5%; 95% of what I do fails.
So… I’m going to do more projects.
Some scientists are like birds: they look at problems from a bird’s-eye view, taking in the vast landscape before them without dwelling on details. Other scientists are like frogs: they focus only on what is right in front of them and love to dig into the details.
— The Birds and Frogs of Physics
You should start blogging. If you don’t know what to write about, write about what you’ve learned and what you’ve created or built.
The ultimate, hidden truth of the world is that it is something we make, and could just as easily make differently.
This week in history
If there is a happy machine in this world (2022 #197)
Find the pain you are willing to endure (2021 #147)
Telecommuting exposes redundant positions (2020 #95)
Asimov’s memoir “The Stage of Life” (2019 #45)
This issue of the weekly is sponsored by FlowUs, a new-generation Chinese knowledge management and collaboration platform. Many thanks for the support.
FlowUs = documents + forms + cloud drive. You can use it to write documents, build a homepage, manage data, store files, and more.
Each issue of the weekly is also published in the FlowUs column. Everyone is welcome to open their own column and homepage there.
- Copyright statement: free reprint – non-commercial – non-derivative – keep the signature (Creative Commons 3.0 License)
- Date published: March 3, 2023