A commercial crawler's journey of growth


I recently read an article in which programmers complained about the learning environment at their jobs. The environment around you really can shape your attitude towards learning.

For example, if your English is terrible but you are thrown into an environment where you have to communicate with foreigners in English, your English will likely improve by leaps and bounds, and you may find that you can communicate fluently.

Programming is the same as learning English. In college you may not take it seriously, but once you are counting on this skill to support your family, I believe you will improve just as suddenly. And speaking of programmers and English, the programming language closest to English in its syntax is arguably Python.

With the explosion of the artificial intelligence and data mining industries in recent years, Python has gradually entered the mainstream. If 2014 to 2017 was the era of mobile development, the present can be called the era of artificial intelligence and data mining. Technology in the programming industry changes so quickly that if you slack off on learning even a little, you fall behind the times. But remember that choice is more important than effort.

Choice is more important than effort.

Three to five years ago, when mobile programming dominated development, Android and iOS required different system environments. iOS development required a Mac, and a good Mac cost a lot of money. Android development, on the other hand, could be done on a Windows computer. Indirectly, this created a boom in which Android development overwhelmed iOS development. How hot was it? Put it this way: as hot as artificial intelligence is today.

Computer science students in college tend to learn general, foundational material such as Java and C#, and rarely come into contact with Android/iOS development or languages like Python. So in the hottest years of mobile development, many computer science graduates chose Android or iOS development. Most of them have since changed direction: of the few Android and iOS developers around me, most have now chosen to learn Python. Why Python rather than another language? There are two main reasons.

1. First and foremost, Python is hugely popular right now, and it offers more career paths to choose from: web development, AI algorithm roles, data analysis, crawler engineering, server operations and maintenance, and other mainstream Internet programming positions.

2. Second, when they looked for work after graduating, they found that demand for mobile development had shrunk badly, and work and life still had to be paid for (once you are older, it is time to earn money). If you have a little programming background, you can start using Python in as little as a week or so. Even without any background, Python is easier to pick up in a short time than other languages, which can take anywhere from three or four months to a year and a half to learn.

And right now is just the right time, while Python is in the limelight and this bonus period has not yet passed. Those who got on the bus are already working their way up to a higher level, while those who didn't are still waiting on the sidelines. Thanks to its ease of learning and broad prospects, crawler development has become the entry point of choice for many IT professionals who want to get into programming, and many people have turned their interest in crawlers into a career.

But when a hobby becomes your career, you realize how hard it is to get good at it.

After I became a professional crawler engineer, the first task I received on the job was to write a browser-plugin crawler.

WTF! What? Browser plugins can be crawlers too? I was dumbfounded at the time and had no idea what I was supposed to do with this. But it was the first task my boss had assigned me, so saying I couldn't do it was out of the question, and I had to figure it out bit by bit.

The ability to learn must be one of the most important abilities for a professional crawler engineer, or for any professional programmer, because on the job you will run into many areas you are not familiar with, such as browser-plugin crawlers. Browser plugins are written entirely in front-end languages, which requires you to know HTML, JavaScript, and CSS. You say you only know Python and not these? Sorry, then you'll have to go. Luckily, some of the logic of a crawler is generic.

Professional crawler

A professional crawler engineer's day-to-day job is helping the company get all kinds of data: maintaining the existing crawler code and keeping the crawlers fetching the data the company needs every day. But professional crawlers are fundamentally different from the crawlers you write for practice.

With a practice crawler, if you can't get the data, you may give up after struggling with it for a few hours. In real work, no matter what method you use, the boss must see the data at the end of the day. A practice crawler might be 100 lines of code at most. In a professional crawler, the XPath statements in a single parsing function can run to thousands of lines. I'm the guy who spent a day writing thousands of lines of XPath statements and was nearly worn out by XPath.
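To give a feel for what an XPath-based parsing function looks like, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath; the sample page, the class names, and the function name parse_items are all made up for illustration and are not taken from any real project.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; real pages are rarely this tidy.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="item"><a href="/p/1">First post</a><span class="price">10</span></div>
    <div class="item"><a href="/p/2">Second post</a><span class="price">25</span></div>
    <div class="ad"><a href="/ad">Sponsored</a></div>
  </body>
</html>
"""


def parse_items(page: str) -> list:
    """Extract title, url and price from every <div class="item"> block."""
    root = ET.fromstring(page.strip())
    items = []
    # ".//div[@class='item']" selects item blocks anywhere below the root
    # and skips other blocks such as the "ad" div.
    for node in root.findall(".//div[@class='item']"):
        link = node.find("./a")
        price = node.find("./span[@class='price']")
        if link is None or price is None:
            continue  # skip blocks that do not have the expected shape
        items.append({
            "title": link.text,
            "url": link.get("href"),
            "price": int(price.text),
        })
    return items


if __name__ == "__main__":
    for item in parse_items(SAMPLE_PAGE):
        print(item)
```

A production parsing function usually has one such expression per field and per page variant, which is how the line count grows so quickly.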

If a practice crawler gets blocked, so be it; at worst you crawl a different site. But for a professional crawler, a single account may cost thousands or even tens of thousands to register, so having it banned is a serious financial loss. Because of that one account your month's salary may be gone, and you won't escape a big scolding from the boss either. So the boss tells you explicitly to write this crawler, and the requirement is a crawler that never gets blocked but still fetches a lot of data. My inner reaction: ***

And these are just the things that have happened in my first three months as a professional crawler engineer.

Crawler techniques on the market are many and varied, but few people truly master them, and those who reach the level of commercial crawlers are almost non-existent.

The short explanation of so-called commercial grade is: crawl whatever you want!

Many sites now have anti-crawling strategies, such as IP restrictions, access-frequency limits, User-Agent checks, data encryption, captchas, and login requirements. When it runs into these, the average crawler is at its wits' end.
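Two of the items in that list, access-frequency limits and User-Agent checks, are easy to illustrate. The following is a minimal sketch, not a commercial-grade solution: it spaces requests out by a minimum interval and sends an explicit User-Agent header. The class name Throttle, the function build_request, and the User-Agent string are hypothetical names chosen for this example; only urllib.request.Request and the time functions come from the standard library.

```python
import time
import urllib.request


class Throttle:
    """Keep at least min_interval seconds between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Sleep if the previous request was too recent; return seconds slept."""
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        self._last = self._clock()
        return waited


def build_request(url, user_agent="example-crawler/0.1"):
    """Build a request that identifies itself with an explicit User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    throttle = Throttle(min_interval=2.0)
    for url in ["https://example.com/a", "https://example.com/b"]:
        throttle.wait()
        request = build_request(url)
        # urllib.request.urlopen(request) would send it; omitted in this sketch.
        print(request.full_url, request.get_header("User-agent"))
```

Staying under a site's frequency limit this way is slower than hammering it, but it is also the simplest way to avoid the account bans described earlier.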

Looking back from when I first started learning Python crawlers until now, I took many detours and fell into many misunderstandings along the way. Below is a summary of my experience learning Python crawlers, shared for your reference (item 7 is the key one).

1. Learn the foundations of Python

2. First introduction to Python data analysis

3. Getting started with Python web crawlers

4. Study "Learn Python Web Crawler from Scratch" and learn crawlers systematically

5. First acquaintance with machine learning; study "Machine Learning Python Practice"

6. Study of Data Analysis with Python

7. Study Liao Xuefeng's blog, official website, learning tutorials, etc.

When it comes to learning Python, the tutorials of Liao Xuefeng, the "godfather of Python", are the best!

With that in mind, Liao Xuefeng, the "godfather of Python" and best-selling author of "Spring 2.0 Core Technologies and Best Practices", and his team of instructors have released an official crawler course to help you become a Python crawler engineer in the IT jungle!

Also available at the end of this article is a free Python tutorial compiled by Mr. Liao Xuefeng (free for the first 300 people).

Mr. Liao Xuefeng has worked as a senior technical expert at well-known companies such as Siemens, Motorola, and Huobi. His official blog is a common reference tutorial for many technical people, with more than 50,000 visits a day.

What we saw before on Liao's blog was a text version of the Python tutorial, but this time it's different: a full course with videos, notes, and hands-on case studies!

Without further ado, let's look at the contents.

This crawler course contains 13 hands-on project cases. It teaches you not only professional commercial crawlers, but also how anti-crawling works and, beyond that, how to get past anti-crawler measures.

All for one purpose: let there be no data in the world that can't be crawled!

Friendly reminder: this set of videos was created by Mr. Liao Xuefeng. It covers the theory and also includes case explanations drawn from his years of development experience. I hope you will study it seriously after receiving the material!

