Python Web Scraping | Scraping Tesla Charger Locations in the US (with Latitude and Longitude)

warning: This article was last modified 778 days ago; some of its content may be out of date.

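Overview

I have recently been practicing for the MCM/ICM modeling contest, and Problem D of the 2018 contest needs location data for Tesla's two kinds of chargers in the US (Supercharger and Destination Charging). This post shows how to use Python to scrape the Tesla website for the state, county, street address, latitude/longitude coordinates and FIPS code of chargers that are already built, together with the approximate location and planned opening date of chargers that are yet to be built.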

Details

Data source

Resulting data

tesla_charger_us.xlsx
tesla_supercharger_us.xlsx
Download: https://cowtransfer.com/s/7c7a8523a76945

Implementation

Environment

Python 3.9.7
Jupyter 1.0.0
Packages

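| Name | Version | Notes |
| --- | --- | --- |
| lxml | 4.6.3 | - |
| numpy | 1.20.3 | - |
| pandas | 1.3.4 | - |
| requests | 2.26.0 | - |
| regex | 2021.8.3 | Third-party regular expression package; the script below imports the standard-library re module instead |
| progressbar2 | 4.0.0 | Displays a progress bar in the console |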
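
Since the post is old, it may help to check how far a local environment has drifted from the versions listed above. The following is a minimal sketch using the standard-library importlib.metadata; the helper names `installed_version` and `compare_versions` are made up for illustration, and a version mismatch does not necessarily mean the script will fail.

```python
from importlib import metadata

# Pinned versions copied from the package list above.
EXPECTED = {
    "lxml": "4.6.3",
    "numpy": "1.20.3",
    "pandas": "1.3.4",
    "requests": "2.26.0",
    "regex": "2021.8.3",
    "progressbar2": "4.0.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected):
    """Map each package name to a (listed, installed) pair of version strings."""
    return {name: (want, installed_version(name)) for name, want in expected.items()}


if __name__ == "__main__":
    for name, (want, have) in compare_versions(EXPECTED).items():
        status = "ok" if have == want else "differs"
        print(f"{name:<14} listed {want:<10} installed {have or 'missing':<10} {status}")
```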

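Code

Because the pages for the two charger types are laid out almost identically, the code below can scrape both. Remember to change the URL passed to requests.get() for the index page to the one for the charger type you want (see Data source), and to change the CSV file name at the end of the script so that the earlier output is not overwritten.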
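
One way to avoid editing the script by hand for each charger type is to derive both the URL and the output file name from a single key. The sketch below is only an illustration: `job_for`, `SEGMENTS` and the key names are hypothetical, and treating the "chargers" listing as the Destination Charging page is an assumption. The two URLs and the file name stems are the ones that appear in this post.

```python
from pathlib import Path

BASE_URL = "https://www.tesla.com/findus/list/{segment}/United+States"

# Key -> URL path segment. Both segments appear in the script below;
# reading "chargers" as the Destination Charging listing is an assumption.
SEGMENTS = {
    "supercharger": "superchargers",
    "charger": "chargers",
}


def job_for(kind, out_dir="data"):
    """Return (index_url, csv_path) for one charger type."""
    if kind not in SEGMENTS:
        raise ValueError(f"unknown charger kind: {kind!r}")
    url = BASE_URL.format(segment=SEGMENTS[kind])
    csv_path = Path(out_dir) / f"tesla_{kind}_us.csv"
    return url, csv_path


for kind in SEGMENTS:
    url, csv_path = job_for(kind)
    print(kind, url, csv_path)
```

The returned URL would replace the hard-coded string in the index-page request, and the returned path would replace the file name passed to to_csv().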

#!/usr/bin/env python
# coding: utf-8

# # 2018-MCM-D
# ## Data scraping
# ### Scraping basic information on Tesla Superchargers

import numpy as np
import pandas as pd
import requests
from lxml import etree

# Content: Tesla Supercharger locations in the US
# Source : https://www.tesla.com/findus/list/superchargers/United+States
data_col = ['name',
           'common_name',
           'street_address',
           'extended_address',
           'locality',
           'google_navi',
           'charging',
           'coming',
           'opening_date',
           'detail_url']
full_data = pd.DataFrame(columns=data_col) # holds the scraped rows

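# Index page (root): lists every charging site, grouped by state.
# Swap in the superchargers URL from the header comment to scrape Superchargers instead.
html = requests.get("https://www.tesla.com/findus/list/chargers/United+States")
etree_html = etree.HTML(html.text)

# Count how many states have charging stations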
states_content = etree_html.xpath('//*[@id="find-us-list-container"]/div/section/div/h2')
state_count = 0
for each_state in states_content:
    print(each_state.text)
    state_count = state_count + 1
print('-------------\n{} states in total'.format(state_count))
# 57 entries in total, but ME and Maine are the same state listed twice,
# so there are 56 distinct entries in fact

# Collect the detail-page URL of every charging station
detail_url = etree_html.xpath('//*[@id="find-us-list-container"]/div/section/div/div/address/a')

root_data = pd.DataFrame(columns=['name','detail_url'])
index = 0
for each_url in detail_url:
    print(each_url.text)
    print(each_url.attrib)
    full_url = "https://www.tesla.com"+each_url.attrib.get('href')
    root_data.loc[index] = [each_url.text,full_url]
    index = index + 1

print(root_data)


# ### Parsing the detail page of each Tesla Supercharger

import re
import progressbar

p = progressbar.ProgressBar()
p.start(len(root_data))
# get further information through 'detail_url'
for index,row in root_data.iterrows():
    collected_row = {'name':'',
           'common_name':'',
           'street_address':'',
           'extended_address':'',
           'locality':'',
           'google_navi':'',
           'charging':'',
           'coming':'',
           'opening_date':'',
           'detail_url':''}
    response = requests.get(row['detail_url'])
    etree_html = etree.HTML(response.text)
    #print(row['detail_url'])
    # contents
    pattern0 = re.compile(r'<address.*?class=\"vcard\">(.*?)</address>',re.S)
    root=re.findall(pattern0,response.text)
    if not root:
        # No vcard block on this page: keep the row with only name and URL
        root = ['']
    
    pattern1 = re.compile(r'<span.*?class=\"common-name\">(.*?)</span>',re.S)
    common_name = re.findall(pattern1,root[0])
    
    pattern2 = re.compile(r'<span.*?class=\"street-address\">(.*?)</span>',re.S)
    street_address = re.findall(pattern2,root[0])
    
    # The other classes are hyphenated, so accept "extended-address" as well as "extended_address"
    pattern3 = re.compile(r'<span.*?class=\"extended[-_]address\">(.*?)</span>',re.S)
    extended_address = re.findall(pattern3,root[0])
    
    pattern4 = re.compile(r'<span.*?class=\"locality\">(.*?)</span>',re.S)
    locality = re.findall(pattern4,root[0])
    
    pattern5 = re.compile(r'<a.*?href=\"(.*?)\" target=\"_blank\">',re.S)
    google_navi = re.findall(pattern5,root[0])
    
    # 'harging' matches the label whether it is written 'Charging' or 'charging'
    pattern6 = re.compile(r'harging</strong>.*?>(.*?)</p>',re.S)
    charging = re.findall(pattern6,root[0])
    
    pattern7 = re.compile(r'<span.*?class=\"coming-soon\">(.*?)</span>',re.S)
    coming = re.findall(pattern7,root[0])
    
    pattern8 = re.compile(r'<p><strong>Target opening in (.*?)</strong></p>',re.S)
    opening_date = re.findall(pattern8,root[0])

    if(len(common_name)>0):
        collected_row['common_name']=common_name[0]
        #print(common_name[0])
    if(len(street_address)>0):
        collected_row['street_address'] = street_address[0]
        #print(street_address[0])
    if(len(extended_address)>0):
        collected_row['extended_address'] = extended_address[0]
        #print(extended_address[0])
    if(len(locality)>0):
        collected_row['locality'] = locality[0]
        #print(locality[0])
    if(len(google_navi)>0):
        collected_row['google_navi'] = google_navi[0]
        #print(google_navi[0])
    if(len(charging)>0):
        collected_row['charging'] = charging[0]

    if(len(coming)>0):
        collected_row['coming'] = coming[0]
        #print(coming[0])
    if(len(opening_date)>0):
        collected_row['opening_date'] = opening_date[0]
        #print(opening_date[0])
    
    collected_row['name'] = row['name']
    collected_row['detail_url'] = row['detail_url']
    full_data.loc[index] = collected_row
    
    
    p.update(index)
p.finish()

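# Make sure the output directory exists before writing the CSV
import os
os.makedirs('data', exist_ok=True)
full_data.to_csv('data/tesla_charger_us.csv')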

print(full_data)
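
The detail-page loop above repeats the same "compile, findall, take the first hit" steps for every field. As a hedged alternative, the sketch below drives the same idea from a dictionary of patterns. The function name `extract_fields` and the sample HTML are invented for illustration (the sample only imitates the markup that the regular expressions in the script expect, and is not copied from the Tesla site), and only four of the fields are shown.

```python
import re

# Outer block that holds all address fields, as in pattern0 of the script.
VCARD = re.compile(r'<address.*?class="vcard">(.*?)</address>', re.S)

# Field name -> pattern, mirroring a subset of the patterns in the script.
FIELD_PATTERNS = {
    "common_name": re.compile(r'<span.*?class="common-name">(.*?)</span>', re.S),
    "street_address": re.compile(r'<span.*?class="street-address">(.*?)</span>', re.S),
    "locality": re.compile(r'<span.*?class="locality">(.*?)</span>', re.S),
    "opening_date": re.compile(r'<p><strong>Target opening in (.*?)</strong></p>', re.S),
}


def extract_fields(page_text):
    """Return a dict with one entry per field; missing fields stay empty strings."""
    row = {name: "" for name in FIELD_PATTERNS}
    block = VCARD.search(page_text)
    if block is None:
        return row  # no vcard block on this page
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(block.group(1))
        if match:
            row[name] = match.group(1).strip()
    return row


# Made-up sample that imitates the expected markup.
SAMPLE_PAGE = """<address class="vcard">
<span class="common-name">Example Supercharger</span>
<span class="street-address">123 Main St</span>
<span class="locality">Springfield, IL 62701</span>
<p><strong>Target opening in 2022</strong></p>
</address>"""

print(extract_fields(SAMPLE_PAGE))
```

Compiling the patterns once, outside the loop, also avoids rebuilding them for every detail page.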

References

https://blog.csdn.net/qq_32392597/article/details/96147620
