# Introduction
***
DrissionPage (a blend of "driver" and "session") is a Python-based, integrated tool for web automation.
It switches seamlessly between selenium and requests,
so it combines the convenience of selenium with the efficiency of requests.
It integrates common page operations behind an API that is the same in both modes, which makes it easy to use.
It wraps frequently used element operations in the POM (Page Object Model) style, which makes it well suited to building and extending automation features.
Best of all, its usage is concise and user-friendly: it needs little code and is friendly to newcomers.

**Project address:**

- https://github.com/g1879/DrissionPage
- https://gitee.com/g1879/DrissionPage

**Sample address:** [Crawling common websites and automating tasks with DrissionPage](https://gitee.com/g1879/DrissionPage-demos)

**Contact email:** g1879@qq.com
# Concept and background
***
## Idea
**Concise, easy to use, extensible**
## Background
When a requests-based crawler targets a site that requires login, you have to analyze network packets and JS source code, construct complex requests, and often deal with anti-scraping measures such as CAPTCHAs, JS obfuscation, and signature parameters, which sets a high bar. If the data is computed by JS, that computation has to be reproduced. The experience is poor and development is slow.
Selenium sidesteps most of these pitfalls, but it is slow. This library therefore combines selenium and requests, lets you switch to whichever mode the task needs, and provides a user-friendly API that improves both development and runtime efficiency.
Beyond merging the two, the library also wraps common web page operations and simplifies selenium's statements. For web automation, this means less attention to details and more focus on the functionality itself.
Keep everything simple: the library aims for simple, direct usage that is friendly to newcomers.
# Features
***
- Simple code comes first.
- Switches seamlessly between selenium and requests, sharing the session.
- Both modes expose a consistent API, so the user experience is the same.
- Intuitive element locating reduces the work of page analysis and coding.
- Common functions are integrated and optimized to match real-world needs.
- Compatible with selenium code, which eases project migration.
- Packaged in the POM style for easy extension.
- A unified file download method makes up for the shortcomings of browser downloads.
- Simple configuration that spares you tedious browser setup.
# Project structure
***
## Structure diagram
![](https://gitee.com/g1879/DrissionPage-demos/raw/master/pics/20201118164542.jpg)
## Drission Class
Manages the WebDriver and Session objects that communicate with web pages; it plays the role of the driver.
## MixPage Class
MixPage wraps common page operations. It uses the driver and session managed by the Drission class to access and operate on the page, and it can switch between driver and session mode. The login status is synchronized automatically when switching.
## DriverElement Class
The page element class for driver mode. It can click, input text, modify attributes, run JS and more, and it can also search for descendant elements beneath itself.
## SessionElement Class
The page element class for session mode. It can read attribute values and search for descendant elements beneath itself.
# Simple demo
***
## Comparison with selenium code
Each pair of snippets below does exactly the same thing; compare how much code each needs:

- Find the first element whose text contains some text, using an explicit wait
```python
# Use selenium:
element = WebDriverWait(driver, 10).until(ec.presence_of_element_located((By.XPATH, '//*[contains(text(), "some text")]')))
# Use DrissionPage:
element = page('some text')
```
- Switch to the first tab
```python
# Use selenium:
driver.switch_to.window(driver.window_handles[0])
# Use DrissionPage:
page.to_tab(0)
```
- Select a drop-down option by its text
```python
# Use selenium:
from selenium.webdriver.support.select import Select
select_element = Select(element)
select_element.select_by_visible_text('text')
# Use DrissionPage:
element.select('text')
```
- Drag and drop an element
```python
# Use selenium:
ActionChains(driver).drag_and_drop(ele1, ele2).perform()
# Use DrissionPage:
ele1.drag_to(ele2)
```
- Scroll the window to the bottom (keeping the horizontal scroll position unchanged)
```python
# Use selenium:
driver.execute_script("window.scrollTo(document.documentElement.scrollLeft, document.body.scrollHeight);")
# Use DrissionPage:
page.scroll_to('bottom')
```
- Set headless mode
```python
# Use selenium:
options = webdriver.ChromeOptions()
options.add_argument('--headless')
# Use DrissionPage:
set_headless()  # from DrissionPage.easy_set
```
- Get pseudo-element content
```python
# Use selenium:
text = driver.execute_script('return window.getComputedStyle(arguments[0], "::after").getPropertyValue("content");', element)
# Use DrissionPage:
text = element.after
```
- Get a shadow-root
```python
# Use selenium:
shadow_element = driver.execute_script('return arguments[0].shadowRoot', element)
# Use DrissionPage:
shadow_element = element.shadow_root
```
- Use xpath to get attributes or text nodes directly
```python
# Use selenium:
# Quite complicated: find_element() must return an element, so XPath results such as @class or text() need a workaround
# Use DrissionPage:
class_name = element('xpath://div[@id="div_id"]/@class')
text = element('xpath://div[@id="div_id"]/text()[2]')
```
## Comparison with requests code
Each pair of snippets below does exactly the same thing; compare how much code each needs:
- Get element content
```python
url = 'https://baike.baidu.com/item/python'
# Use requests:
import requests
from lxml import etree
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36'}
response = requests.get(url, headers=headers)
html = etree.HTML(response.text)
element = html.xpath('//h1')[0]
title = element.text
# Use DrissionPage:
page = MixPage('s')
page.get(url)
title = page('tag:h1').text
```
Tips: DrissionPage sends default headers.
- Download a file
```python
url = 'https://www.baidu.com/img/flexible/logo/pc/result.png'
save_path = r'C:\download'
# Use requests:
r = requests.get(url)
with open(f'{save_path}\\img.png', 'wb') as fd:
    for chunk in r.iter_content(chunk_size=1024):
        fd.write(chunk)
# Use DrissionPage:
page.download(url, save_path, 'img')  # Supports renaming and handles file name conflicts
```
## Mode switch
Log in to a website with selenium, then switch to requests to read the page. The two modes share the login information.
```python
from DrissionPage import MixPage

page = MixPage()  # Create a page object; driver mode by default
page.get('https://gitee.com/profile')  # Visit the personal center page (redirects to the login page when not logged in)
page.ele('@id:user_login').input('your_user_name')  # Enter the account and password with selenium to log in
page.ele('@id:user_password').input('your_password\n')
page.change_mode()  # Switch to session mode
print('Title after login:', page.title, '\n')  # Output in session mode after login
```
Output:
```
Title after login: Personal Information - Code Cloud Gitee.com
```
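Mode switching works because both sides end up holding the same cookies. As a rough illustration only (not DrissionPage's implementation), the sketch below takes cookies in the list-of-dicts shape that selenium's `driver.get_cookies()` returns and keeps the name/value pairs a request to a given host should carry, using simplified domain matching. The helper name `cookies_for_host` and the sample cookies are hypothetical.

```python
def cookies_for_host(cookies, host):
    """Return {name: value} for the cookies that apply to `host`.

    `cookies` uses the shape of selenium's driver.get_cookies():
    a list of dicts with at least 'name' and 'value', usually 'domain'.
    Simplified matching: path, expiry and secure flags are ignored.
    """
    jar = {}
    for cookie in cookies:
        # A leading dot ('.gitee.com') also covers subdomains
        domain = cookie.get('domain', host).lstrip('.')
        if host == domain or host.endswith('.' + domain):
            jar[cookie['name']] = cookie['value']
    return jar


# Hypothetical cookies as a browser might report them
browser_cookies = [
    {'name': 'sid', 'value': 'abc', 'domain': '.gitee.com'},
    {'name': 'lang', 'value': 'en', 'domain': 'www.gitee.com'},
    {'name': 'other', 'value': '1', 'domain': 'example.com'},
]
print(cookies_for_host(browser_cookies, 'www.gitee.com'))  # {'sid': 'abc', 'lang': 'en'}
```

The resulting dict could be merged into a requests `Session` with `session.cookies.update(...)`; a faithful sync would also carry over each cookie's domain and path.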
## Get and print element attributes
```python
# Continuing from the previous code
foot = page.ele('@id:footer-left')  # Find an element by id
first_col = foot.ele('css:>div')  # Use a css selector to find the first child div of the element
lnk = first_col.ele('text:command learning')  # Find an element by its text content
text = lnk.text  # Get the element text
href = lnk.attr('href')  # Get an attribute value
print(text, href, '\n')

# Concise chained lookup
text = page('@id:footer-left')('css:>div')('text:command learning').text
print(text)
```
Output:
```
Git command learning https://oschina.gitee.io/learn-git-branching/
Git command learning
```
## Download file
```python
url = 'https://www.baidu.com/img/flexible/logo/pc/result.png'
save_path = r'C:\download'
page.download(url, save_path)
```
# Installation
***
```
pip install DrissionPage
```
Only Python 3.6 and above is supported, and driver mode currently supports Chrome only. The library has only been tested on Windows.

To use driver mode, you must download Chrome and the **matching version** of chromedriver. [[chromedriver download]](https://chromedriver.chromium.org/downloads)

The get_match_driver() method in the easy_set tool can detect the Chrome version automatically and download the matching driver.
# Instructions
***
## Import module
```python
from DrissionPage import Drission, MixPage
```
## Initialization
If you only use session mode, you can skip this section.

Before using selenium, you must configure the paths of chrome.exe and chromedriver.exe and make sure their versions match.

There are four ways to configure the paths:
- Use the get_match_driver() method of the easy_set tool (recommended)
- Write the paths to this library's ini file
- Add both paths to the system environment variables
- Pass in the paths manually when using the library
### Use get_match_driver() method
If you choose the first method, run the following code before first use. The program automatically detects the Chrome version installed on your computer, downloads the matching driver, and records its path in the ini file.
```python
from DrissionPage.easy_set import get_match_driver
get_match_driver()
```
Output:
```
ini文件中chrome.exe路径 (chrome.exe path in the ini file) D:\Google Chrome\Chrome\chrome.exe
version 75.0.3770.100
chromedriver_win32.zip
Downloading to: D:\python\projects\DrissionPage\DrissionPage
100% Success.
解压路径 (unzip path) D:\python\projects\chromedriver.exe
正在检测可用性... (checking availability...)
版本匹配,可正常使用。 (versions match; ready to use.)
```
Then you can start using it.

If you want to use a specific chrome.exe (for example a portable version) and choose where the ini file and chromedriver.exe are saved, you can write:
```python
get_match_driver(ini_path='ini file path', save_path='save path', chrome_path='chrome path')
```
Tips: When you specify chrome_path, the program writes this path to the ini file after the check succeeds.
### Use set_paths() method
If the previous method fails, you can download chromedriver.exe yourself and then run the following code to record its path in the ini file.
```python
from DrissionPage.easy_set import set_paths
driver_path = 'D:\\chrome\\chromedriver.exe'  # Your chromedriver.exe path; if omitted, it is looked up in the system environment variables
chrome_path = 'D:\\chrome\\chrome.exe'  # Your chrome.exe path; if omitted, it is looked up in the system environment variables
set_paths(driver_path, chrome_path)
```
This method also checks whether the chrome and chromedriver versions match, and displays:
```
正在检测可用性... (checking availability...)
版本匹配,可正常使用。 (versions match; ready to use.)
```
or
```
出现异常: (exception raised:)
Message: session not created: Chrome version must be between 70 and 73
(Driver info: chromedriver=73.0.3683.68 (47787ec04b6e38e22703e856e101e840b65afe72),platform=Windows NT 10.0.19631 x86_64)
可执行easy_set.get_match_driver()自动下载匹配的版本。 (You can run easy_set.get_match_driver() to download the matching version automatically.)
或自行从以下网址下载 (or download it yourself from) https://chromedriver.chromium.org/downloads
```
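The check above fails when chromedriver and Chrome are out of step. ChromeDriver's download page states that, from version 73 on, each ChromeDriver release supports Chrome builds with matching MAJOR.MINOR.BUILD numbers (for example, ChromeDriver 73.0.3683.20 supports all Chrome versions starting with 73.0.3683). The sketch below compares version strings that way; it illustrates the rule and is not how easy_set performs its check (the output above suggests it actually tries to start a browser session). Both function names are hypothetical.

```python
def parse_version(text):
    """'75.0.3770.100' -> (75, 0, 3770, 100)"""
    return tuple(int(part) for part in text.strip().split('.'))


def driver_supports(chrome_version, driver_version):
    """True if the first three components (MAJOR.MINOR.BUILD) agree."""
    return parse_version(chrome_version)[:3] == parse_version(driver_version)[:3]


print(driver_supports('75.0.3770.100', '75.0.3770.140'))  # True
print(driver_supports('75.0.3770.100', '73.0.3683.68'))   # False
```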
Once the check passes, driver mode can be used normally.

In addition to the two paths above, this method can also set the following paths:
```python
debugger_address  # Address of the browser to debug, e.g. 127.0.0.1:9222
download_path  # Download folder path
global_tmp_path  # Temporary folder path
user_data_path  # User data path
cache_path  # Cache path
```
Tips:
- Different projects may need different versions of chrome and chromedriver; you can keep several ini files and use whichever one a project needs.
- It is recommended to use a portable version of chrome and set its path manually, so that browser upgrades don't break the match with chromedriver.
- While debugging a project, it is recommended to set debugger_address and debug against a manually opened browser, which saves time and effort.
## Create the driver object Drission

This step is optional. If you want to get started quickly, you can skip this section; MixPage creates a Drission object automatically.

A Drission object manages the driver and session objects. When multiple pages work together, the Drission object is used to pass the driver around, so that several page objects can control the same browser or Session object.
It can be created by reading the configuration from an ini file, or by passing the configuration in at initialization.
```python
from DrissionPage import Drission

# Create from the default ini file
drission = Drission()
# Create from another ini file
drission = Drission(ini_path='D:\\settings.ini')
# Create without reading an ini file
drission = Drission(read_file=False)
```
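Keeping one ini file per project, as the tips above suggest, comes down to storing a handful of path settings per file. The standard-library sketch below reads and updates such settings with `configparser`; the `[paths]` section, the key names and the helpers `load_paths` / `set_path` are assumptions for illustration and may not match the library's own ini layout.

```python
import configparser
import io

# Hypothetical ini content: section and key names are illustrative only
SAMPLE_INI = r"""
[paths]
chromedriver_path = D:\chrome\chromedriver.exe
download_path = D:\download
"""


def load_paths(ini_text):
    """Parse ini text and return the [paths] section as a plain dict."""
    parser = configparser.ConfigParser(interpolation=None)  # paths may contain '%'
    parser.read_string(ini_text)
    return dict(parser['paths'])


def set_path(ini_text, key, value):
    """Return new ini text with `key` in [paths] set to `value`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    parser['paths'][key] = value
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


updated = set_path(SAMPLE_INI, 'debugger_address', '127.0.0.1:9222')
print(load_paths(updated)['debugger_address'])  # 127.0.0.1:9222
```

Working on text rather than files keeps the example self-contained; real code would use `parser.read(path)` and write back to the file.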
To pass in the configuration manually (ignoring the ini file):
```python
from DrissionPage.config import DriverOptions
from DrissionPage import Drission

# Create a driver configuration object; read_file=False means the ini file is not read
do = DriverOptions(read_file=False)
# Set the paths; can be skipped if they are already in the system environment variables
do.set_paths(chrome_path='D:\\chrome\\chrome.exe',
             driver_path='D:\\chrome\\chromedriver.exe')
# Settings for s mode
session_options = {'headers': {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)'}}

# Pass in the configuration; both arguments are optional, pass whichever the mode you use needs
drission = Drission(do, session_options)
```
## Use the page object MixPage
The MixPage page object wraps common web page operations and handles switching between driver and session modes.
MixPage must receive a Drission object and uses the driver or session inside it. If none is passed in, MixPage creates a Drission object itself (using the configuration from the default ini file).

Tips: When multiple objects work together, you can pass the Drission object of one MixPage to another, so that they share login information or operate on the same page.
### Create Object
There are three ways to create the object: the simple way, passing in a Drission object, and passing in configuration. Choose according to your needs.
```python
# Simple creation; a Drission object is created automatically from the default ini file configuration
page = MixPage()
page = MixPage('s')

# Create by passing in a Drission object
page = MixPage(drission)
page = MixPage(drission, mode='s', timeout=5)  # Session mode, element wait timeout of 5 seconds (default 10 seconds)

# Pass in configuration; MixPage creates a Drission internally from it
page = MixPage(driver_options=do, session_options=session_options)  # Driver mode by default; do and session_options as created in the previous section
```
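Sharing one Drission object between pages means every page talks to the same browser and session. The sketch below shows that pattern in plain Python with hypothetical `SharedDrivers` and `Page` classes standing in for Drission and MixPage: each resource is created lazily on first use, and a page creates its own holder only when none is passed in. The `dict` factories are placeholders; real code would build a WebDriver and a requests Session.

```python
class SharedDrivers:
    """Hypothetical stand-in for Drission: creates each resource once, on first use."""

    def __init__(self, make_driver, make_session):
        self._make_driver = make_driver
        self._make_session = make_session
        self._driver = None
        self._session = None

    @property
    def driver(self):
        if self._driver is None:
            self._driver = self._make_driver()
        return self._driver

    @property
    def session(self):
        if self._session is None:
            self._session = self._make_session()
        return self._session


class Page:
    """Hypothetical stand-in for MixPage."""

    def __init__(self, drivers=None):
        # Create a private holder only when none is shared with this page
        if drivers is None:
            drivers = SharedDrivers(make_driver=dict, make_session=dict)
        self.drivers = drivers


shared = SharedDrivers(make_driver=dict, make_session=dict)
login_page, data_page = Page(shared), Page(shared)
print(login_page.drivers.session is data_page.drivers.session)  # True
print(Page().drivers.session is login_page.drivers.session)     # False
```

Lazy creation means a page used only in session mode never starts a browser, which matches the section's note that Drission only needs the configuration for the mode actually used.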
### Visit a website
If a connection error occurs, the program automatically retries twice. The number of retries and the wait interval can be specified.
```python
# Default mode
page.get(url)
page.post(url, data, **kwargs)  # Only session mode has a post method
# Specify the number of retries and interval
page.get(url, retry=5, interval=0.5)
```
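The automatic retry described above follows a common pattern. Here is a minimal, hypothetical `call_with_retry` helper (not part of DrissionPage) showing how `retry` and `interval` interact: the call is attempted once, then retried up to `retry` more times with a pause of `interval` seconds, and the last error is re-raised. Which exceptions count as connection errors, and the default interval, are assumptions.

```python
import time


def call_with_retry(func, retry=2, interval=1.0, exceptions=(ConnectionError,)):
    """Call func(); on one of `exceptions`, retry up to `retry` more times,
    sleeping `interval` seconds between attempts. Re-raise the last error.
    (The default interval here is an assumption.)"""
    for attempt in range(retry + 1):
        try:
            return func()
        except exceptions:
            if attempt == retry:
                raise
            time.sleep(interval)


# Simulated connection that fails twice before succeeding
attempts = []


def flaky_get():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError('connection reset')
    return 'page html'


print(call_with_retry(flaky_get, retry=2, interval=0))  # page html
```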
### Switch mode
Switch between s and d modes. The cookies and the URL being visited are synchronized automatically when switching.
```python
page.change_mode(go=False)  # go=False: do not load the current URL in the new mode after switching
```
### Page properties
```python
page.url  # Currently visited URL
page.mode  # Current mode
page.drission  # Drission object currently in use
page.driver  # WebDriver object currently in use
page.session  # Session object currently in use
page.cookies  # Cookies
page.html  # Page source code
page.title  # Current page title

# d mode only:
page.tabs_count  # Number of tabs
page.tab_handles  # List of handles of all tabs
page.current_tab_num  # Index of the current tab
page.current_tab_handle  # Handle of the current tab
```
### Page operations

Calling a method that only exists in d mode switches to d mode automatically. See the APIs for detailed usage.
```python
page.change_mode()  # Switch mode
page.cookies_to_session()  # Copy cookies from the WebDriver object to the Session object
page.cookies_to_driver()  # Copy cookies from the Session object to the WebDriver object
page.get(url, retry, interval, **kwargs)  # Visit a page with get; the number of retries and the interval can be specified
page.ele(loc_or_ele, timeout)  # Get the first matching element, node or attribute
page.eles(loc_or_ele, timeout)  # Get all matching elements, nodes or attributes
page.download(url, save_path, rename, file_exists, **kwargs)  # Download a file
page.close_driver()  # Close the WebDriver object
page.close_session()  # Close the Session object

# s mode only:
page.post(url, data, retry, interval, **kwargs)  # Visit a page with post; the number of retries and the interval can be specified

# d mode only:
page.wait_ele(loc_or_ele, mode, timeout)  # Wait for an element to be removed from the DOM, displayed, or hidden
page.run_script(js, *args)  # Run a JS statement
page.create_tab(url)  # Create a new tab at the end and switch to it
page.to_tab(num_or_handle)  # Switch to a tab
page.close_current_tab()  # Close the current tab
page.close_other_tabs(num)  # Close the other tabs
page.to_iframe(iframe)  # Switch into an iframe
page.screenshot(path)  # Take a screenshot of the page
page.scroll_to_see(element)  # Scroll until an element is visible
page.scroll_to(mode, pixel)  # Scroll as specified; mode can be 'top', 'bottom', 'rightmost', 'leftmost', 'up', 'down', 'left', 'right'
page.refresh()  # Refresh the current page
page.back()  # Browser back
page.set_window_size(x, y)  # Set the browser window size; maximizes by default
page.check_page()  # Check whether the page meets expectations
page.chrome_downloading()  # Get the list of files Chrome is downloading
page.process_alert(mode, text)  # Handle a prompt box
```
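`scroll_to()` takes a direction name. One plausible way to map those names onto browser scrolling is the hypothetical `scroll_script()` helper below, which returns a JavaScript snippet for selenium's `driver.execute_script()`. The 'bottom' case reuses the selenium snippet from the comparison at the top of this document; the other mappings and the 300-pixel default step are assumptions, not the library's actual code.

```python
def scroll_script(mode, pixel=300):
    """Return a JavaScript snippet that scrolls the window as `mode` says.

    Absolute modes keep the other axis where it is; relative modes move by `pixel`.
    """
    keep_x = 'document.documentElement.scrollLeft'
    keep_y = 'document.documentElement.scrollTop'
    scripts = {
        'top': f'window.scrollTo({keep_x}, 0);',
        'bottom': f'window.scrollTo({keep_x}, document.body.scrollHeight);',
        'leftmost': f'window.scrollTo(0, {keep_y});',
        'rightmost': f'window.scrollTo(document.body.scrollWidth, {keep_y});',
        'up': f'window.scrollBy(0, {-pixel});',
        'down': f'window.scrollBy(0, {pixel});',
        'left': f'window.scrollBy({-pixel}, 0);',
        'right': f'window.scrollBy({pixel}, 0);',
    }
    if mode not in scripts:
        raise ValueError(f'unknown scroll mode: {mode!r}')
    return scripts[mode]


print(scroll_script('bottom'))
print(scroll_script('up', 100))  # window.scrollBy(0, -100);
```

With selenium, the snippet would be run as `driver.execute_script(scroll_script('bottom'))`.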
## Find element

ele() returns the first matching element, and eles() returns a list of all matching elements.
Both can be called on a page object, or on an element object to search among its descendants.

Notes:
- The element search timeout defaults to 10 seconds and can be set as needed.
- In the query strings below, a colon `:` means a fuzzy (contains) match and an equals sign `=` means an exact match.
- There are five types of query strings: @attribute name, tag, text, xpath, and css.
```python
# Find by attribute
page.ele('@id:ele_id', timeout=2)  # Find the element whose id contains ele_id, waiting up to 2 seconds
page.eles('@class')  # Find all elements that have a class attribute
page.eles('@class:class_name')  # Find all elements whose class contains class_name
page.eles('@class=class_name')  # Find all elements whose class equals class_name

# Find by tag name
page.ele('tag:li')  # Find the first li element
page.eles('tag:li')  # Find all li elements

# Find by tag name and attribute
page.ele('tag:div@class=div_class')  # Find the div element whose class equals div_class
page.ele('tag:div@class:ele_class')  # Find the div element whose class contains ele_class
page.ele('tag:div@class=ele_class')  # Find the div element whose class equals ele_class
page.ele('tag:div@text():search_text')  # Find the div element whose text contains search_text
page.ele('tag:div@text()=search_text')  # Find the div element whose text equals search_text

# Find by text content
page.ele('search text')  # Find the element containing the given text
page.eles('text:search text')  # If the text itself starts with @, tag:, css:, xpath: or text:, prefix it with text: to avoid conflicts
page.eles('text=search text')  # Find elements whose text equals search text

# Find by xpath or css selector
page.eles('xpath://div[@class="ele_class"]')
page.eles('css:div.ele_class')

# Find by a selenium locator tuple
from selenium.webdriver.common.by import By
loc1 = (By.ID, 'ele_id')
loc2 = (By.XPATH, '//div[@class="ele_class"]')
page.ele(loc1)
page.ele(loc2)

# Find descendant elements
element = page.ele('@id:ele_id')
element.ele('@class:class_name')  # Find the first element under element whose class contains class_name
element.eles('tag:li')  # Find all li elements under element

# Find by relative position
element.parent  # Parent element
element.next  # Next sibling element
element.prev  # Previous sibling element

# Get a shadow-root (only open shadow-roots are supported)
ele1 = element.shadow_root.ele('tag:div')

# Chained search
page.ele('@id:ele_id').ele('tag:div').next.ele('some text').eles('tag:a')

# Simplified syntax
eles = page('@id:ele_id')('tag:div').next('some text').eles('tag:a')
ele2 = ele1('tag:li').next('some text')
```
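To make the query syntax concrete, the sketch below translates the attribute, tag and text forms listed above into XPath. `to_xpath` and `_condition` are hypothetical helpers written for illustration, not DrissionPage's own parser: they ignore `css:` queries and selenium locator tuples, do not escape quotes inside values, and the library may well build different expressions.

```python
def _condition(expr):
    """Turn 'class', 'class:x', 'class=x' or 'text():x' into an XPath predicate."""
    # The first ':' means fuzzy (contains) match, the first '=' means exact match
    positions = [i for i in (expr.find(':'), expr.find('=')) if i != -1]
    if not positions:
        return f'@{expr}'  # attribute must exist, e.g. '@class'
    i = min(positions)
    name, op, value = expr[:i], expr[i], expr[i + 1:]
    target = name if name == 'text()' else f'@{name}'
    if op == '=':
        return f'{target}="{value}"'
    return f'contains({target},"{value}")'


def to_xpath(query):
    """Translate a DrissionPage-style query string into XPath (illustration only)."""
    if query.startswith('xpath:'):
        return query[len('xpath:'):]
    tag = '*'
    if query.startswith('tag:'):
        # 'tag:div@class=x' -> tag 'div' plus the condition 'class=x'
        tag, sep, cond = query[len('tag:'):].partition('@')
        if not sep:
            return f'//{tag}'
        query = '@' + cond
    if query.startswith('@'):
        return f'//{tag}[{_condition(query[1:])}]'
    if query.startswith('text='):
        value = query[len('text='):]
        return f'//{tag}[text()="{value}"]'
    if query.startswith('text:'):
        query = query[len('text:'):]
    # Anything else is treated as text to search for (fuzzy match)
    return f'//{tag}[contains(text(),"{query}")]'


print(to_xpath('@id:ele_id'))               # //*[contains(@id,"ele_id")]
print(to_xpath('tag:div@class=div_class'))  # //div[@class="div_class"]
print(to_xpath('text=search text'))         # //*[text()="search text"]
```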
## Get element attributes
```python
element.html  # Element outerHTML
element.inner_html  # Element innerHTML
element.tag  # Element tag name
element.text  # Element innerText
element.link  # Absolute href or src value of the element
element.texts()  # Text of all direct child nodes (elements and text nodes); can be limited to text nodes only
element.attrs  # Dictionary of all element attributes
element.attr(attr)  # Value of the specified attribute
element.css_path  # Absolute css path of the element
element.xpath  # Absolute xpath of the element
element.parent  # Parent element
element.next  # Next sibling element
element.prev  # Previous sibling element
element.parents(num)  # The num-th ancestor element
element.nexts(num, mode)  # The num-th following sibling element or node
element.prevs(num, mode)  # The num-th preceding sibling element or node
element.ele(loc_or_str, timeout)  # First matching descendant element, attribute or node text
element.eles(loc_or_str, timeout)  # All matching descendant elements, attributes or node texts

# d mode only:
element.before  # Content of the ::before pseudo-element
element.after  # Content of the ::after pseudo-element
element.is_valid  # Whether the element is still in the DOM
element.size  # Element size
element.location  # Element location
element.shadow_root  # The shadow-root attached to the element
element.get_style_property(style, pseudo_ele)  # Style property value; pseudo-elements are supported
element.is_selected()  # Whether the element is selected
element.is_enabled()  # Whether the element is enabled
element.is_displayed()  # Whether the element is visible
```
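The `texts()` description can be hard to picture. The sketch below reproduces the described behavior on a parsed XML fragment using the standard library's `xml.etree.ElementTree`: it collects the text of each direct child node in document order and, with `text_node_only=True`, keeps only bare text nodes. `direct_texts` and its parameter name are hypothetical; DrissionPage works on live HTML, not ElementTree objects.

```python
import xml.etree.ElementTree as ET


def direct_texts(elem, text_node_only=False):
    """Texts of the direct child nodes of `elem`, in document order.

    Child elements contribute their full text; bare text nodes (ElementTree
    stores them as elem.text and each child's .tail) are always included.
    Whitespace-only nodes are skipped.
    """
    texts = []
    if elem.text and elem.text.strip():
        texts.append(elem.text.strip())
    for child in elem:
        if not text_node_only:
            child_text = ''.join(child.itertext()).strip()
            if child_text:
                texts.append(child_text)
        if child.tail and child.tail.strip():
            texts.append(child.tail.strip())
    return texts


div = ET.fromstring('<div>Hello <b>bold</b> world <i>tail</i></div>')
print(direct_texts(div))                       # ['Hello', 'bold', 'world', 'tail']
print(direct_texts(div, text_node_only=True))  # ['Hello', 'world']
```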
## Element operation
Element operations are unique to d mode; calling any of the following methods switches to d mode automatically.
```python
element.click(by_js)  # Click the element; optionally click via JS
element.input(value)  # Input text
element.run_script(js)  # Run JavaScript on the element
element.submit()  # Submit
element.clear()  # Clear the element
element.screenshot(path, filename)  # Take a screenshot of the element
element.select(text)  # Select a drop-down option by its text
element.set_attr(attr, value)  # Set an attribute value
element.remove_attr(attr)  # Remove an attribute
element.drag(x, y, speed, shake)  # Drag the element by a relative offset; speed and random shaking can be set
element.drag_to(ele_or_loc, speed, shake)  # Drag the element to another element or to a coordinate; speed and random shaking can be set
element.hover()  # Hover the mouse over the element
```
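`drag()` and `drag_to()` accept a speed and a shake flag. A common way to implement such a drag is to split the motion into small steps and, optionally, jitter the intermediate points so the movement looks less mechanical. The sketch below only computes the cumulative offsets; `drag_path` and its `steps`, `jitter` and `rng` parameters are hypothetical and not taken from DrissionPage.

```python
import random


def drag_path(dx, dy, steps=10, shake=False, jitter=2, rng=None):
    """Split a relative drag (dx, dy) into `steps` cumulative offsets.

    With shake=True every intermediate point is nudged by up to `jitter`
    pixels on each axis; the final point is always exactly (dx, dy).
    """
    rng = rng or random.Random()
    points = []
    for i in range(1, steps + 1):
        x = round(dx * i / steps)
        y = round(dy * i / steps)
        if shake and i < steps:
            x += rng.randint(-jitter, jitter)
            y += rng.randint(-jitter, jitter)
        points.append((x, y))
    return points


print(drag_path(100, 50, steps=5))  # [(20, 10), (40, 20), (60, 30), (80, 40), (100, 50)]
```

With selenium, the differences between consecutive points could be replayed through `ActionChains(driver).click_and_hold(element)`, one `move_by_offset(x, y)` per step, and a final `release().perform()`; the speed setting would then govern the pause between steps.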
## Shadow-dom operations
Getting a shadow-root and its internal elements is supported. The shadow-root object is of type ShadowRootElement; it is used like a normal element, but with a reduced feature set.

**Note:**
- Only open shadow-roots can be obtained
- XPath cannot be used to find elements inside a shadow-root

Get the shadow-root attached to an ordinary element:
```python
shadow_root_element = element.shadow_root # element is an ordinary element containing shadow-root
```
Properties and methods:
```python
shadow_root_element.tag  # Returns 'shadow-root'
shadow_root_element.html  # HTML content
shadow_root_element.parent  # Parent element
shadow_root_element.next  # Next sibling element
shadow_root_element.parents(num)  # The num-th ancestor element
shadow_root_element.nexts(num)  # The num-th following sibling element
shadow_root_element.ele(loc_or_str)  # First matching internal element
shadow_root_element.eles(loc_or_str)  # All matching internal elements
shadow_root_element.run_script(js_text)  # Run a JS script
shadow_root_element.is_enabled()  # Whether the element is enabled
shadow_root_element.is_valid()  # Whether the element is still in the DOM
```
**Tips:** The elements returned by the properties and methods above are ordinary DriverElement objects; see the previous sections for their usage.
## Mixing with selenium or requests code

DrissionPage code can be mixed seamlessly with selenium and requests code. You can use a selenium WebDriver object directly, or hand DrissionPage's own WebDriver over to selenium code. A requests Session object can also be passed in directly. This makes migrating existing projects very convenient.
### selenium to DrissionPage
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from DrissionPage import Drission, MixPage

driver = webdriver.Chrome()
driver.get('https://www.baidu.com')
page = MixPage(Drission(driver))  # Pass the driver to Drission and create a MixPage object
print(page.title)  # Prints: 百度一下,你就知道
element = driver.find_element(By.XPATH, '//div')  # Use selenium's native methods
```
### DrissionPage to selenium
```python
page = MixPage()
page.get('https://www.baidu.com')
driver = page.driver # Get the WebDriver object from the MixPage object
print(driver.title)  # Prints: 百度一下,你就知道
```
### requests to DrissionPage
```python
import requests
from DrissionPage import Drission, MixPage

session = requests.Session()
drission = Drission(session_or_options=session)
page = MixPage(drission, mode='s')
page.get('https://www.baidu.com')
```
### DrissionPage to requests
```python
page = MixPage('s')
session = page.session
response = session.get('https://www.baidu.com')
```
## Download files
Selenium offers little control over files downloaded by the browser: it is hard to track the download status, rename files or handle failures.
Downloading files with requests handles these tasks better, but the code is more cumbersome.
DrissionPage therefore wraps downloading in one method that combines the advantages of both: login information can be taken from selenium and the file downloaded with requests.
This makes up for the shortcomings of selenium and keeps downloading simple and efficient.
### Features
- Specify download path
- Rename the file without typing the extension; the program adds it automatically
- When a file with the same name exists, choose to rename, overwrite or skip it
- Show download progress
- Support the POST method
- Support custom connection parameters
### Demo
```python
from DrissionPage import MixPage

page = MixPage('s')  # Session mode is enough here, since the download is done with requests
url = 'https://www.baidu.com/img/flexible/logo/pc/result.png'  # File url
save_path = r'C:\download'  # Save path

# Save as img.png; if that name already exists, add a serial number to the end of the file name. Show the download progress
page.download(url, save_path, 'img', 'rename', show_msg=True)
```
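The naming rules listed under Features can be sketched with the standard library alone. `resolve_save_path` below is a hypothetical helper written for illustration, not DrissionPage's own code; in particular, the `_1`, `_2` serial-number format is an assumption, and the real `download()` method also performs the HTTP request itself.

```python
import os
import tempfile


def resolve_save_path(goal_path: str, file_url: str, rename: str = None,
                      file_exists: str = 'rename'):
    """Return the path a download should be written to, or None to skip it.
    Hypothetical helper; the '_1', '_2' serial-number format is an assumption."""
    original_name = os.path.basename(file_url.split('?')[0])
    ext = os.path.splitext(original_name)[1]
    # A new name is given without extension; the original extension is added back
    name = f'{rename}{ext}' if rename else original_name

    path = os.path.join(goal_path, name)
    if not os.path.exists(path) or file_exists == 'overwrite':
        return path
    if file_exists == 'skip':
        return None
    # 'rename': append a serial number until the name is free
    stem, ext = os.path.splitext(name)
    num = 1
    while os.path.exists(os.path.join(goal_path, f'{stem}_{num}{ext}')):
        num += 1
    return os.path.join(goal_path, f'{stem}_{num}{ext}')


url = 'https://www.baidu.com/img/flexible/logo/pc/result.png'
with tempfile.TemporaryDirectory() as folder:
    first = resolve_save_path(folder, url, rename='img')
    open(first, 'wb').close()  # Pretend the first download has finished
    second = resolve_save_path(folder, url, rename='img')
    skipped = resolve_save_path(folder, url, rename='img', file_exists='skip')
    names = (os.path.basename(first), os.path.basename(second))

print(names, skipped)  # ('img.png', 'img_1.png') None
```

Checking for a free name before writing keeps the rule easy to follow; a real downloader would also need to guard against two downloads picking the same name at the same time.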
## Chrome Quick Settings
Configuring Chrome is tedious. To simplify it, this library provides setter methods for common configurations.
### DriverOptions Object
The DriverOptions object inherits from the Options class in selenium.webdriver.chrome.options and adds the following methods:
```python
options.remove_argument(value)  # Remove an argument
options.remove_experimental_option(key)  # Remove an experimental_option setting
options.remove_all_extensions()  # Remove all extensions
options.save()  # Save to the ini file currently opened
options.save('D:\\settings.ini')  # Save to the ini file at the specified path
options.save('default')  # Save the current settings to the default ini file
options.set_argument(arg, value)  # Set an argument
options.set_headless(on_off)  # Set whether to run in headless mode (no browser window)
options.set_no_imgs(on_off)  # Set whether to stop loading images
options.set_no_js(on_off)  # Set whether to disable js
options.set_mute(on_off)  # Set whether to mute
options.set_user_agent(user_agent)  # Set user agent
options.set_proxy(proxy)  # Set proxy address
options.set_paths(driver_path, chrome_path, debugger_address, download_path, user_data_path, cache_path)  # Set browser-related paths
```
### Usage
```python
from DrissionPage import Drission, MixPage
from DrissionPage.configs import *  # Same import as in the OptionsManager example; assumed to provide DriverOptions

do = DriverOptions(read_file=False)  # Create a chrome configuration object without reading the ini file
do.set_headless(False)  # Show the browser window
do.set_no_imgs(True)  # Do not load images
do.set_paths(driver_path='D:\\chromedriver.exe', chrome_path='D:\\chrome.exe')  # Set paths
do.set_headless(False).set_no_imgs(True)  # Chained calls are supported

drission = Drission(driver_or_options=do)  # Create a Drission object with the configuration object
page = MixPage(drission)  # Create a MixPage object with the Drission object

do.save()  # Save to the ini file currently opened
do.save('default')  # Save the current settings to the default ini file
```
## Save configuration
Because Chrome and the request headers have many settings, an ini file is dedicated to saving common configurations. Use the OptionsManager object to read and save the configuration, and the DriverOptions object to modify the Chrome configuration. You can also save several ini files and load different ones for different projects.

**Tips:** Save frequently used configuration files to a separate path so that they are not reset when the library is upgraded.

### ini file content
By default the ini file has three sections: paths, chrome_options and session_options. Its initial content is as follows:
```ini
[paths]
; chromedriver.exe path
chromedriver_path =
; Temporary folder path, used to save screenshots, file downloads, etc.
global_tmp_path =

[chrome_options]
; The address and port of an already opened browser, such as 127.0.0.1:9222
debugger_address =
; chrome.exe path
binary_location =
; Configuration information
arguments = [
    ; Hide browser window
    '--headless',
    ; Mute
    '--mute-audio',
    ; No sandbox
    '--no-sandbox',
    ; Google documentation mentions that this argument is needed to avoid bugs
    '--disable-gpu',
    ; Ignore certificate errors
    'ignore-certificate-errors',
    ; Do not display the information bar
    '--disable-infobars'
    ]
; Extensions
extensions = []
; Experimental configuration
experimental_options = {
    'prefs': {
        ; Do not show a pop-up for downloads
        'profile.default_content_settings.popups': 0,
        ; Disable notification pop-ups
        'profile.default_content_setting_values': {'notifications': 2},
        ; Disable PDF plugin
        'plugins.plugins_list': [{"enabled": False, "name": "Chrome PDF Viewer"}]
    },
    ; Set to developer mode to reduce anti-crawler detection
    'excludeSwitches': ["enable-automation"],
    'useAutomationExtension': False
    }

[session_options]
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Connection": "keep-alive",
    "Accept-Charset": "utf-8;q=0.7,*;q=0.7"
    }
```
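The list- and dict-valued entries above are written as Python literals spread over indented continuation lines. As a sketch of how such a file can be read with only the standard library (an assumption about the format, not a description of how OptionsManager works internally), `configparser` handles the sections and skips the `;` comment lines, and `ast.literal_eval` turns the stored text back into Python objects:

```python
import ast
import configparser

# A trimmed copy of the default ini layout shown above
INI_TEXT = """
[paths]
chromedriver_path =
global_tmp_path =

[chrome_options]
debugger_address =
binary_location =
arguments = [
    ; Hide browser window
    '--headless',
    ; Mute
    '--mute-audio'
    ]
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Empty keys come back as empty strings
driver_path = parser.get('paths', 'chromedriver_path')
# List- and dict-valued keys are stored as Python literals, so parse them safely
arguments = ast.literal_eval(parser.get('chrome_options', 'arguments'))
print(parser.sections(), arguments)  # ['paths', 'chrome_options'] ['--headless', '--mute-audio']
```

`ast.literal_eval` only accepts literals, so a hand-edited ini file cannot run arbitrary code the way `eval` could.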
### OptionsManager Object
The OptionsManager object is used to read, set and save the configuration.
```python
manager.paths  # Return the path settings as a dictionary
manager.chrome_options  # Return the chrome settings as a dictionary
manager.session_options  # Return the session settings as a dictionary
manager.get_value(section, item)  # Get the value of a configuration item
manager.get_option(section)  # Return all items of a section as a dictionary
manager.set_item(section, item, value)  # Set a configuration item
manager.save()  # Save to the ini file currently opened
manager.save('D:\\settings.ini')  # Save to the ini file at the specified path
manager.save('default')  # Save the current settings to the default ini file
```
### Usage example
```python
from DrissionPage import Drission
from DrissionPage.configs import *

options_manager = OptionsManager()  # Create an OptionsManager object from the default ini file
options_manager = OptionsManager('D:\\settings.ini')  # Create an OptionsManager object from another ini file
driver_path = options_manager.get_value('paths', 'chromedriver_path')  # Read path information
options_manager.save()  # Save to the ini file currently opened
options_manager.save('D:\\settings.ini')  # Save to the ini file at the specified path

drission = Drission(ini_path='D:\\settings.ini')  # Create the object with the specified ini file
```
## easy_set methods
The easy_set methods quickly modify frequently used settings. Note that calling them modifies the content of the default ini file.
```python
from DrissionPage.easy_set import *

set_headless(True)  # Turn on headless mode
set_no_imgs(True)  # Turn on no-image mode
set_no_js(True)  # Disable JS
set_mute(True)  # Turn on mute mode
set_user_agent('Mozilla/5.0 (Macintosh; Int......')  # Set user agent
set_proxy('127.0.0.1:8888')  # Set proxy
set_paths(paths)  # See the [Initialization] section
set_argument(arg, value)  # Set an argument. For a switch without a value, value is a bool that turns it on or off;
                          # otherwise value is a str (such as 'zh_CN.UTF-8'). When value is '' or False, the argument is deleted
```
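The rule described for set_argument can be illustrated on a plain list of Chrome command-line arguments. `apply_argument` is a hypothetical helper written for this example, not the library's code, and the `--arg=value` form it produces for valued arguments is an assumption:

```python
def apply_argument(arguments: list, arg: str, value) -> list:
    """Hypothetical helper mirroring the set_argument rule: a bool for switches,
    a str for valued arguments, '' or False to delete."""
    # Drop any existing entry for this argument, with or without a value
    kept = [a for a in arguments if a != arg and not a.startswith(arg + '=')]
    if value is True:
        kept.append(arg)  # Switch-style argument: turn it on
    elif value:
        kept.append(f'{arg}={value}')  # Argument that carries a str value
    # value is False or '': the argument stays deleted
    return kept


args = ['--headless', '--lang=en-US']
after_lang = apply_argument(args, '--lang', 'zh_CN.UTF-8')
after_headless = apply_argument(after_lang, '--headless', False)
print(after_lang)      # ['--headless', '--lang=zh_CN.UTF-8']
print(after_headless)  # ['--lang=zh_CN.UTF-8']
```

Removing the old entry first means that setting an argument twice replaces it instead of adding a duplicate.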
# POM mode
***
MixPage encapsulates common page operations, so it is easy to extend with the POM (Page Object Model) pattern.

Example: extend MixPage into a class that reads list pages
```python
import re
from time import sleep
from DrissionPage import Drission, MixPage


class ListPage(MixPage):
    """This class encapsulates reading list pages. Given the 4 required xpaths,
    any list page with the same structure can be read."""

    def __init__(self, drission: Drission, url: str = None, **xpaths):
        super().__init__(drission)
        self._url = url
        self.xpath_column_name = xpaths['column_name']  # [xpath string, regular expression]
        self.xpath_next_page = xpaths['next_page']
        self.xpath_lines = xpaths['lines']
        self.xpath_page_number = xpaths['page_number']  # [xpath string, regular expression]
        self.total_pages = None
        if url:
            self.get(url)
            self.total_pages = self.get_total_pages()  # Read only after the page has loaded

    def get_column_name(self) -> str:
        text = self.ele(f'xpath:{self.xpath_column_name[0]}').text
        if self.xpath_column_name[1]:  # Extract the name with the regular expression
            return re.search(self.xpath_column_name[1], text).group(1)
        return text

    def get_total_pages(self) -> int:
        text = self.ele(f'xpath:{self.xpath_page_number[0]}').text
        if self.xpath_page_number[1]:  # Extract the number with the regular expression
            return int(re.search(self.xpath_page_number[1], text).group(1))
        return int(text)

    def click_next_page(self, wait: float = None):
        self.ele(f'xpath:{self.xpath_next_page}').click()
        if wait:
            sleep(wait)

    def get_current_page_list(self, targets: list) -> list:
        """
        Format of targets (content to be crawled): [[xpath1, attribute1], [xpath2, attribute2]...]
        Return list format: [[value1, value2...], [value1, value2...]...]
        """
        result_list = []
        for line in self.eles(f'xpath:{self.xpath_lines}'):
            row_result = [line.ele(f'xpath:{xpath}').attr(attr) for xpath, attr in targets]
            result_list.append(row_result)
        return result_list

    def get_list(self, targets: list, wait: float = None) -> list:
        if self.total_pages is None:
            self.total_pages = self.get_total_pages()
        result = self.get_current_page_list(targets)
        for _ in range(self.total_pages - 1):  # The first page has already been read
            self.click_next_page(wait)
            result.extend(self.get_current_page_list(targets))
        return result
```
# Other
***
## DriverPage and SessionPage
If you don't need to switch modes, you can use DriverPage or SessionPage on their own; their usage is the same as MixPage.
```python
from DrissionPage.session_page import SessionPage
from DrissionPage.driver_page import DriverPage
from DrissionPage.drission import Drission

session = Drission().session
page = SessionPage(session)  # Pass in a Session object
page.get('http://www.baidu.com')
print(page.ele('@id:su').text)  # Output: Baidu

driver = Drission().driver
page = DriverPage(driver)  # Pass in a WebDriver object
page.get('http://www.baidu.com')
print(page.ele('@id:su').text)  # Output: Baidu
```
# APIs
***
## Drission Class
### class Drission()
The Drission class manages the WebDriver and Session objects and plays the role of the driver.
Parameter Description:
- driver_or_options: [WebDriver, dict, Options] - WebDriver object or chrome configuration parameters.
- session_or_options: [Session, dict] - Session object or session configuration parameters
- ini_path: str - ini file path, the default is the ini file under the DrissionPage folder
- proxy: dict - proxy settings
### session
Returns the Session object, which is initialized automatically from the configuration information.
Returns: Session - the managed Session object
### driver
Returns the WebDriver object, which is initialized automatically from the configuration information.
Returns: WebDriver - the managed WebDriver object
### driver_options
Returns or sets the driver configuration.
Returns: dict
### session_options
Returns the session configuration.
Returns: dict
### session_options()
Sets the session configuration.
Returns: None
### proxy
Returns the proxy configuration.
Returns: dict
### cookies_to_session()
Copy the cookies of the driver object to the session object.
Parameter Description:
- copy_user_agent: bool - whether to copy user_agent to session
- driver: WebDriver - the WebDriver object to copy cookies from
- session: Session - the Session object that receives the cookies
Returns: None
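Conceptually, copying cookies from the driver to the session means turning the cookie dictionaries returned by selenium's `WebDriver.get_cookies()` into entries of a cookie jar. A requests `Session.cookies` object is a `RequestsCookieJar`, which subclasses the standard `http.cookiejar.CookieJar`, so a stdlib-only sketch can target that base class. `copy_driver_cookies` is a hypothetical helper, not Drission's implementation, and the sample cookie values are made up:

```python
from http.cookiejar import Cookie, CookieJar


def copy_driver_cookies(driver_cookies: list, jar: CookieJar) -> None:
    """Copy selenium-style cookie dicts into a cookie jar (hypothetical helper)."""
    for c in driver_cookies:
        domain = c.get('domain', '')
        jar.set_cookie(Cookie(
            version=0,
            name=c['name'],
            value=c['value'],
            port=None,
            port_specified=False,
            domain=domain,
            domain_specified=bool(domain),
            domain_initial_dot=domain.startswith('.'),
            path=c.get('path', '/'),
            path_specified=True,
            secure=c.get('secure', False),
            expires=c.get('expiry'),  # selenium uses 'expiry' (Unix time); None means a session cookie
            discard=c.get('expiry') is None,
            comment=None,
            comment_url=None,
            rest={'HttpOnly': None} if c.get('httpOnly') else {},
        ))


# Shape of one entry returned by driver.get_cookies(); the values are made up
sample = [{'name': 'token', 'value': 'abc123', 'domain': '.example.com',
           'path': '/', 'secure': False, 'httpOnly': True}]
jar = CookieJar()  # With requests, session.cookies could be passed here instead
copy_driver_cookies(sample, jar)
copied = {cookie.name: cookie.value for cookie in jar}
print(copied)  # {'token': 'abc123'}
```

Keeping the domain and path matters: a cookie jar only sends a cookie to requests whose host and path match them.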
### cookies_to_driver()
Copy cookies from session to driver.
Parameter Description:
- url: str - the domain of cookies
- driver: WebDriver - the WebDriver object that receives the cookies
- session: Session - the Session object to copy cookies from
Returns: None
### user_agent_to_session()
Copy the user agent from the driver to the session.
Parameter Description:
- driver: WebDriver - the WebDriver object to copy the user agent from
- session: Session - the Session object that receives the user agent
Returns: None
### close_driver()
Close the browser and set the driver to None.
Returns: None
### close_session()
Close the session and set it to None.
Returns: None
### close()
Close the driver and session.
Returns: None
## MixPage Class
### class MixPage()
MixPage encapsulates the common functions of page operation and can seamlessly switch between driver and session modes. Cookies are automatically synchronized when switching.
Functions that read information are shared by both modes, while functions that operate page elements are only available in d mode. Calling a function specific to one mode automatically switches to that mode.
It inherits from the DriverPage and SessionPage classes, which implement these functions; MixPage acts as the dispatcher between them.
Parameter Description:
- drission: Drission - Drission object; one is created if none is passed in. Passing 's' or 'd' instead quickly configures the corresponding mode
- mode: str - mode, 'd' or 's', default 'd'
- timeout: float - timeout; in driver mode it is the time to wait when finding elements, in session mode the connection waiting time
### url
Returns the URL currently visited by the MixPage object.
Returns: str
### mode
Returns the current mode ('s' or 'd').
Returns: str
### drission
Returns the Drission object currently in use.
Returns: Drission
### driver
Returns the driver object, creating it if it does not exist. Calling it switches to driver mode.
Returns: WebDriver
### session
Returns the session object, creating it if it does not exist.
Returns: Session
### response
Returns the Response object obtained in s mode. Calling it switches to s mode.
Returns: Response
### cookies
Returns the cookies of the current mode.
Returns: [dict, list]
### html
Returns the html text of the page.
Returns: str
### title
Returns the page title.
Returns: str
### url_available
Returns whether the current url is available.
Returns: bool
### change_mode()
Switches the mode, 'd' or 's'. When switching, the cookies of the current mode are copied to the target mode.
Parameter Description:
- mode: str - the target mode, 'd' or 's'
- go: bool - whether to jump to the current url after switching mode
Returns: None
### ele()
Returns the matching element on the page; the first match is returned by default.
If the query parameter is a string, the prefixes '@attribute name:', 'tag:', 'text:', 'css:', 'xpath:', '.' and '#' are available. Without a prefix, the string is searched as text by default.
If it is a loc tuple, it is used directly for the query.
Parameter Description:
- loc_or_str: [Tuple[str, str], str, DriverElement, SessionElement, WebElement] - the positioning information of the element, which can be an element object, a loc tuple or a query string
- mode: str - 'single' or 'all', to find one or all matching elements
- timeout: float - timeout for finding the element, valid in driver mode
Example:
- When an element object is passed in: returns the element object itself
- Find with a loc tuple:
  - page.ele((By.CLASS_NAME, 'ele_class')) - returns the first element whose class is ele_class
- Find with a query string:

  Supports attributes, tag name with attributes, text, xpath, css selector, id and class.

  @ means attribute, . means class, # means id, = means exact match and : means fuzzy match. Without a prefix, the string is searched as text by default.

  - page.ele('.ele_class') - returns the first element whose class is equal to ele_class
  - page.ele('.:ele_class') - returns the first element whose class contains ele_class
  - page.ele('#ele_id') - returns the first element whose id is ele_id
  - page.ele('#:ele_id') - returns the first element whose id contains ele_id
  - page.ele('@class:ele_class') - returns the first element whose class contains ele_class
  - page.ele('@name=ele_name') - returns the first element whose name is equal to ele_name
  - page.ele('@placeholder') - returns the first element with a placeholder attribute
  - page.ele('tag:p') - returns the first p element
  - page.ele('tag:div@class:ele_class') - returns the first div element whose class contains ele_class
  - page.ele('tag:div@class=ele_class') - returns the first div element whose class is equal to ele_class
  - page.ele('tag:div@text():some_text') - returns the first div element whose text contains some_text
  - page.ele('tag:div@text()=some_text') - returns the first div element whose text is equal to some_text
  - page.ele('text:some_text') - returns the first element whose text contains some_text
  - page.ele('some_text') - returns the first element whose text contains some_text (equivalent to the previous line)
  - page.ele('text=some_text') - returns the first element whose text is equal to some_text
  - page.ele('xpath://div[@class="ele_class"]') - returns the first element that matches the xpath
  - page.ele('css:div.ele_class') - returns the first element that matches the css selector
Returns: [DriverElement, SessionElement, str] - element object or attribute, text node text
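To make the string syntax above concrete, here is a hypothetical, simplified translation of such query strings into selenium-style `(by, value)` locator tuples. It is written for illustration and is not DrissionPage's own parser; for example, it does not escape quotes inside values, and mapping `:` to an XPath `contains()` test is an assumption based on the "fuzzy match" description.

```python
import re


def query_to_loc(query: str) -> tuple:
    """Translate a query string into a selenium-style (by, value) locator.
    Hypothetical and simplified: quotes inside values are not escaped."""
    # '.' and '#' are shorthands for the class and id attributes
    for short, attr in (('.', 'class'), ('#', 'id')):
        if query.startswith(short):
            rest = query[1:]
            query = f'@{attr}{rest}' if rest[:1] in (':', '=') else f'@{attr}={rest}'
            break

    if query.startswith('xpath:'):
        return 'xpath', query[len('xpath:'):]
    if query.startswith('css:'):
        return 'css selector', query[len('css:'):]

    tag = '*'
    if query.startswith('tag:'):
        tag, sep, rest = query[len('tag:'):].partition('@')
        if not sep:  # 'tag:p' has no extra condition
            return 'xpath', f'//{tag}'
        query = '@' + rest

    if query.startswith('@'):
        match = re.match(r'([^:=]+)([:=])(.*)', query[1:])
        if not match:  # '@placeholder': the attribute only has to exist
            return 'xpath', f'//{tag}[@{query[1:]}]'
        name, op, value = match.groups()
        target = 'text()' if name == 'text()' else f'@{name}'
    elif query.startswith(('text:', 'text=')):
        op, value, target = query[4], query[5:], 'text()'
    else:  # No prefix: fuzzy text search
        op, value, target = ':', query, 'text()'

    if op == '=':  # Exact match
        return 'xpath', f'//{tag}[{target}="{value}"]'
    return 'xpath', f'//{tag}[contains({target},"{value}")]'


print(query_to_loc('tag:div@class:ele_class'))
# ('xpath', '//div[contains(@class,"ele_class")]')
```

Turning everything into XPath except the `css:` prefix keeps the translation to one target language, which is why the shorthands are rewritten into the `@attribute` form first.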
### eles()
Returns a list of the elements matching the query parameters. The query parameters are used in the same way as for ele().
Parameter Description:
- loc_or_str: [Tuple[str, str], str] - query condition parameter
- timeout: float - Find the timeout of the element, valid in driver mode
Returns: [List[DriverElement or str], List[SessionElement or str]] - a list of element objects or attributes and text node text
### cookies_to_session()
Copy cookies from the WebDriver object to the Session object.
Parameter Description:
- copy_user_agent: bool - whether to copy user agent at the same time
Returns: None
### cookies_to_driver()
Copy cookies from the Session object to the WebDriver object.
Parameter Description:
- url: str - the domain or url of cookies
Returns: None
### get()
Jumps to a url. Cookies are synchronized before the jump, and whether the target url is available is returned after it.
Parameter Description:
- url: str - target url
- go_anyway: bool - Whether to force a jump. If the target url is the same as the current url, it will not redirect by default.
- show_errmsg: bool - whether to display and throw an exception
- retry: int - the number of retries when a connection error occurs
- interval: float - Retry interval (seconds)
- **kwargs - connection parameters for requests
Returns: [bool, None] - whether the url is available
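The retry and interval parameters describe a retry loop around the connection. A minimal sketch, assuming retry counts the extra attempts after the first one and that only connection errors are retried; `connect_with_retry` and the fake `flaky_connect` are hypothetical, make no network request, and use Python's built-in `ConnectionError` as a stand-in for whatever error the HTTP library raises:

```python
import time


def connect_with_retry(connect, url: str, retry: int = 2, interval: float = 1.0,
                       show_errmsg: bool = False):
    """Call connect(url), retrying up to `retry` more times on ConnectionError
    and sleeping `interval` seconds between attempts (hypothetical helper)."""
    for attempt in range(retry + 1):
        try:
            return connect(url)
        except ConnectionError:
            if attempt == retry:
                if show_errmsg:
                    raise  # Re-raise the last connection error
                return None  # Give up quietly, like a "not available" result
            time.sleep(interval)


attempts = []


def flaky_connect(url):
    # Stand-in for a real request: fails twice, then succeeds
    attempts.append(url)
    if len(attempts) < 3:
        raise ConnectionError('temporary failure')
    return 'connected'


result = connect_with_retry(flaky_connect, 'https://www.baidu.com', retry=2, interval=0)
print(result, len(attempts))  # connected 3
```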
### post()
Jumps to a url using the POST method; calling it automatically switches to session mode.
Parameter Description:
- url: str - target url
- data: dict - submitted data
- go_anyway: bool - Whether to force a jump. If the target url is the same as the current url, it will not redirect by default.
- show_errmsg: bool - whether to display and throw an exception
- retry: int - the number of retries when a connection error occurs
- interval: float - Retry interval (seconds)
- **kwargs - connection parameters for requests
Returns: [bool, None] - whether the url is available
### download()
Downloads a file and returns whether it succeeded together with a status message. The method automatically avoids name clashes with existing files in the target path.
Parameter Description:
- file_url: str - file url
- goal_path: str - storage path, the default is the temporary folder specified in the ini file
- rename: str - rename the file without changing the extension
- file_exists: str - how to handle an existing file with the same name: 'rename', 'overwrite' or 'skip'
- post_data: dict - data submitted in post mode
- show_msg: bool - whether to show download information
- show_errmsg: bool - whether to display and throw an exception
- **kwargs - connection parameters for requests
Returns: Tuple[bool, str] - a tuple of whether the download was successful (bool) and status information (the information is the file path when successful)
The following methods and properties only take effect in driver mode; calling them automatically switches to driver mode.
***
### tabs_count
Returns the number of tab pages.
Returns: int
### tab_handles
Returns the handle list of all tabs.
Returns: list
### current_tab_num
Returns the serial number of the current tab page.
Returns: int
### current_tab_handle
Returns the handle of the current tab page.
Returns: str
### wait_ele()
Waits for an element to be deleted from the DOM, displayed or hidden.
Parameter Description:
- loc_or_ele: [str, tuple, DriverElement, WebElement] - how to find the element, same as ele()
- mode: str - waiting mode: 'del', 'display' or 'hidden'
- timeout: float - waiting timeout
Returns: bool - whether the wait is successful
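Waiting of this kind is usually a polling loop: check a condition repeatedly until it holds or the timeout expires. A minimal sketch with a hypothetical `wait_until` helper, not the library's implementation; in a real browser the condition could be selenium's `WebElement.is_displayed()` for 'display', its negation for 'hidden', or a check that the element can no longer be found for 'del'. selenium's own `WebDriverWait` offers the same pattern.

```python
import time


def wait_until(condition, timeout: float = 10, poll: float = 0.5) -> bool:
    """Call condition() until it returns a truthy value or `timeout` seconds pass.
    Returns whether the wait succeeded (hypothetical helper)."""
    end_time = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= end_time:
            return False
        time.sleep(poll)


# Stand-in condition that becomes true after about 0.2 seconds
start = time.monotonic()
appeared = wait_until(lambda: time.monotonic() - start > 0.2, timeout=2, poll=0.05)
# A condition that never holds: the wait fails once the timeout expires
never = wait_until(lambda: False, timeout=0.1, poll=0.05)
print(appeared, never)  # True False
```

`time.monotonic()` is used instead of `time.time()` so that a change of the system clock cannot shorten or lengthen the wait.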
### check_page()
Checks in d mode whether the web page meets expectations. By default the response status is checked; override this method to implement targeted checks.
Parameter Description:
- by_requests: bool - force checking with the built-in response
Returns: [bool, None] - a bool tells whether the page is available, None means unknown
### run_script()
Execute JavaScript code.
Parameter Description:
- script: str - JavaScript code text
- *args - arguments passed to the script
Returns: Any
### create_tab()
Creates a new tab page at the end and switches to it.
Parameter Description:
- url: str - the URL to open in the new tab page
Returns: None
### close_current_tab()
Close the current tab.
Returns: None
### close_other_tabs()
Closes all tab pages except the given one; by default the current tab page is kept.
Parameter Description:
- num_or_handle: [int, str] - the serial number or handle of the tab page to keep; the first serial number is 0 and the last is -1
Returns: None
### to_tab()
Jump to the tab page.
Parameter Description:
- num_or_handle: [int, str] - tab page serial number or handle string; the first serial number is 0 and the last is -1
Returns: None
### to_iframe()
Switches into an iframe; switches to the top level by default. Selenium's native parameters are also accepted.
Parameter Description:
- loc_or_ele: [int, str, tuple, WebElement, DriverElement] - condition for finding the iframe element. Accepts the iframe serial number (starting at 0), id or name, a query string, a loc tuple, a WebElement object or a DriverElement object; pass 'main' to jump to the top level and 'parent' to jump to the parent level
Example:
- to_iframe('tag:iframe') - locate the iframe with a query string
- to_iframe('iframe_id') - locate by the id attribute of the iframe
- to_iframe('iframe_name') - locate by the name attribute of the iframe
- to_iframe(iframe_element) - locate by passing in the element object
- to_iframe(0) - locate by the serial number of the iframe
- to_iframe('main') - jump to the top level
- to_iframe('parent') - jump to the parent level
Returns: None
### scroll_to_see()
Scroll until the element is visible.
Parameter Description:
- loc_or_ele: [str, tuple, WebElement, DriverElement] - condition for finding the element, same as for the ele() method
Returns: None
### scroll_to()
Scroll the page and decide how to scroll according to the parameters.
Parameter Description:
- mode: str - scroll direction, top, bottom, rightmost, leftmost, up, down, left, right
- pixel: int - number of pixels to scroll
Returns: None
### refresh()
Refreshes the page.
Returns: None
### back()
Goes back to the previous page.
Returns: None
### set_window_size()
Sets the window size; maximizes the window by default.
Parameter Description:
- x: int - target width
- y: int - target height
Returns: None
### screenshot()
Takes a screenshot of the web page and returns the path of the screenshot file.
Parameter Description:
- path: str - The screenshot save path, the default is the temporary folder specified in the ini file
- filename: str - the name of the screenshot file, the default is the page title as the file name
Returns: str
### chrome_downloading()
Returns the list of files downloaded by the browser.
Parameter Description:
- download_path: str - download folder path
Returns: list
### process_alert()
Handles a prompt box.
Parameter Description:
- mode: str - 'ok' or 'cancel'; with any other value no button is pressed, but the text is still returned
- text: str - text to enter when handling the prompt box
Returns: [str, None] - the text of the prompt box content
### close_driver()
Close the driver and browser.
Returns: None
### close_session()
Close the session.
Returns: None
## DriverElement class
### class DriverElement()
The element object in driver mode wraps a WebElement object and encapsulates common functions.
Parameter Description:
- ele: WebElement - WebElement object
- page: DriverPage - the page object the element belongs to
- timeout: float - Find the timeout of the element (it can be set separately each time the element is searched)
### inner_ele
The wrapped WebElement object.
Returns: WebElement
### html
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns the outerHTML text of the element.
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns: str
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
### inner_html
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns the innerHTML text of the element.
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns: str
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
### tag
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns the element tag name.
2020-05-25 23:52:38 +08:00
2020-11-12 17:57:37 +08:00
Returns: str
2020-05-25 23:52:38 +08:00
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
### attrs
2020-06-18 14:33:05 +08:00
2020-11-12 17:57:37 +08:00
Return all attributes and values of the element in a dictionary.
2020-06-18 14:33:05 +08:00
2020-11-12 17:57:37 +08:00
Returns: dict
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
### text
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns the text inside the element.
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns: str
2020-05-25 14:42:44 +08:00
### link
Returns absolute href or src value of the element.
Returns: str
2020-11-12 17:57:37 +08:00
### css_path
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns the absolute path of the element css selector.
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns: str
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
### xpath
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns the absolute path of the element xpath.
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns: str
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
### parent
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns the parent element object.
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns: DriverElement
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
### next
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Return the next sibling element object.
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns: DriverElement
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
### prev
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns the previous sibling element object.
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns: DriverElement
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
### size
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Return the element size in a dictionary.
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns: dict
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
### location
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Replace the element coordinates in a dictionary.
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns: dict
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
### shadow_root
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns the shadow_root element object of the current element
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns: ShadowRoot
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
### before
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns the content of the ::before pseudo- element of the current element
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns: str
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
### after
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns the content of the ::after pseudo element of the current element
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns: str
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
### texts()
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns the text of all direct child nodes within the element, including elements and text nodes
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Parameter Description:
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
- text_node_only: bool - whether to return only text nodes
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
Returns: List[str]
2020-05-25 14:42:44 +08:00
2020-11-12 17:57:37 +08:00
### parents()
2020-08-13 11:44:26 +08:00
2020-11-12 17:57:37 +08:00
Returns the Nth level parent element object.
2020-08-13 11:44:26 +08:00
2020-11-12 17:57:37 +08:00
Parameter Description:
2020-08-13 11:44:26 +08:00
2020-11-12 17:57:37 +08:00
- num: int - which level of parent element
2020-08-13 11:44:26 +08:00
2020-11-12 17:57:37 +08:00
Returns: DriverElement

### nexts()

Returns the num-th following sibling element or node. Text nodes are returned as strings.

Parameter Description:

- num: int - the position of the following sibling element or node
- mode: str - 'ele', 'node' or 'text', to match elements, nodes or text nodes respectively

Returns: [DriverElement, str]

### prevs()

Returns the num-th preceding sibling element or node. Text nodes are returned as strings.

Parameter Description:

- num: int - the position of the preceding sibling element or node
- mode: str - 'ele', 'node' or 'text', to match elements, nodes or text nodes respectively

Returns: [DriverElement, str]
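
The num and mode parameters of nexts() and prevs() can be sketched on an xml.dom.minidom tree. sibling() is a hypothetical helper, not DrissionPage's implementation. It assumes that 'node' counts both elements and text nodes (comments are ignored), that text nodes come back as strings as the return type above suggests, and that None is returned when there are not enough siblings.

```python
from xml.dom import minidom


def _matches(node, mode: str) -> bool:
    """Whether node counts for mode 'ele', 'text' or 'node'."""
    if mode == 'ele':
        return node.nodeType == node.ELEMENT_NODE
    if mode == 'text':
        return node.nodeType == node.TEXT_NODE
    return node.nodeType in (node.ELEMENT_NODE, node.TEXT_NODE)  # 'node'


def sibling(node, num: int = 1, mode: str = 'ele', forward: bool = True):
    """Hypothetical helper: the num-th following (or preceding) matching sibling."""
    step = 'nextSibling' if forward else 'previousSibling'
    count = 0
    current = getattr(node, step)
    while current is not None:
        if _matches(current, mode):
            count += 1
            if count == num:
                # Elements are returned as nodes, text nodes as their text.
                return current.data if current.nodeType == current.TEXT_NODE else current
        current = getattr(current, step)
    return None  # fewer than num matching siblings


doc = minidom.parseString('<div><a/>one<b/>two<c/></div>')
b = doc.getElementsByTagName('b')[0]
print(sibling(b, 1, 'text'))                  # two
print(sibling(b, 1, 'ele', forward=False).tagName)  # a
```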

### attr()

Gets the value of an attribute of the element.

Parameter Description:

- attr: str - attribute name

Returns: str

### ele()

Returns the child elements, attributes or node texts of the current element that meet the conditions.
If the query parameter is a string, the prefixes '@attribute name:', 'tag:', 'text:', 'css:', 'xpath:', '.' and '#' are available. When no prefix is given, text mode is used by default.
If it is a loc tuple, the query is performed directly according to its content.

Parameter Description:

- loc_or_str: [Tuple[str, str], str] - the positioning information of the element, either a loc tuple or a query string
- mode: str - 'single' or 'all', to find the first match or all matches
- timeout: float - timeout for finding the element

Example:

- Find with a loc tuple:

- ele.ele((By.CLASS_NAME, 'ele_class')) - returns the first child element whose class is ele_class

- Find with a query string:

Supported query types: attributes, tag name with attributes, text, xpath, css selector, id and class.

@ means attribute, . means class, # means id, = means exact match and : means fuzzy match. When there is no prefix, the text is searched by default.

- ele.ele('.ele_class') - returns the first child element whose class is ele_class
- ele.ele('.:ele_class') - returns the first child element whose class contains ele_class
- ele.ele('#ele_id') - returns the first child element whose id is ele_id
- ele.ele('#:ele_id') - returns the first child element whose id contains ele_id
- ele.ele('@class:ele_class') - returns the first child element whose class contains ele_class
- ele.ele('@name=ele_name') - returns the first child element whose name equals ele_name
- ele.ele('@placeholder') - returns the first child element that has a placeholder attribute
- ele.ele('tag:p') - returns the first p child element
- ele.ele('tag:div@class:ele_class') - returns the first div child element whose class contains ele_class
- ele.ele('tag:div@class=ele_class') - returns the first div child element whose class equals ele_class
- ele.ele('tag:div@text():some_text') - returns the first div child element whose text contains some_text
- ele.ele('tag:div@text()=some_text') - returns the first div child element whose text equals some_text
- ele.ele('text:some_text') - returns the first child element whose text contains some_text
- ele.ele('some_text') - returns the first child element whose text contains some_text (equivalent to the previous line)
- ele.ele('text=some_text') - returns the first child element whose text equals some_text
- ele.ele('xpath://div[@class="ele_class"]') - returns the first child element that matches the xpath
- ele.ele('css:div.ele_class') - returns the first child element that matches the css selector

Returns: [DriverElement, str]
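
The query-string prefixes listed above can be made concrete with a small translator into Selenium-style (by, value) pairs, where 'xpath' and 'css selector' are the values of Selenium's By.XPATH and By.CSS_SELECTOR. This is a hypothetical sketch (query_to_locator and its helpers are not part of DrissionPage). It assumes values contain no double quotes and that '.' and '#' compare against the whole class or id attribute. DrissionPage's own translation may differ, for example in how it treats multi-valued class attributes.

```python
def _predicate(spec: str) -> str:
    """Turn 'name=value', 'name:value' or 'name' into an XPath predicate."""
    # The first '=' (exact match) or ':' (fuzzy match) separates name and value.
    seps = [i for i in (spec.find('='), spec.find(':')) if i != -1]
    cut = min(seps) if seps else len(spec)
    name = spec[:cut]
    target = name if name == 'text()' else '@' + name
    if cut == len(spec):
        return f'[{target}]'  # only require the attribute to exist
    op, value = spec[cut], spec[cut + 1:]
    if op == '=':
        return f'[{target}="{value}"]'
    return f'[contains({target},"{value}")]'


def _shorthand(attr: str, rest: str) -> str:
    """'.x' / '#x' mean an exact match, '.:x' / '#:x' mean contains."""
    return attr + rest if rest[:1] in (':', '=') else attr + '=' + rest


def query_to_locator(query: str) -> tuple:
    """Hypothetical translation of a query string into a (by, value) pair."""
    if query.startswith('xpath:'):
        return ('xpath', query[len('xpath:'):])
    if query.startswith('css:'):
        return ('css selector', query[len('css:'):])
    if query.startswith('tag:'):
        tag, _, attr = query[len('tag:'):].partition('@')
        return ('xpath', './/' + tag + (_predicate(attr) if attr else ''))
    if query.startswith(('text:', 'text=')):
        # 'text:x' -> contains, 'text=x' -> exact, via the text() predicate.
        return ('xpath', './/*' + _predicate('text()' + query[4:]))
    if query.startswith('.'):
        return ('xpath', './/*' + _predicate(_shorthand('class', query[1:])))
    if query.startswith('#'):
        return ('xpath', './/*' + _predicate(_shorthand('id', query[1:])))
    if query.startswith('@'):
        return ('xpath', './/*' + _predicate(query[1:]))
    # No prefix: fuzzy text search, as described above.
    return ('xpath', './/*' + _predicate('text():' + query))


print(query_to_locator('tag:div@class:ele_class'))
# ('xpath', './/div[contains(@class,"ele_class")]')
```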

### eles()

Gets a list of the elements that meet the conditions. The query parameters are used in the same way as in ele().

Parameter Description:

- loc_or_str: [Tuple[str, str], str] - query condition
- timeout: float - timeout for finding the elements

Returns: List[DriverElement or str]

### get_style_property()

Returns the value of a style property of the element.

Parameter Description:

- style: str - style property name
- pseudo_ele: str - pseudo-element name

Returns: str

### click()

Clicks the element. If the click fails, a JS click is attempted instead. Use by_js to click via JS directly.

Parameter Description:

- by_js: bool - whether to click with JS

Returns: bool

### input()

Enters text and returns whether it succeeded.

Parameter Description:

- value: str - text value
- clear: bool - whether to clear the text box before typing

Returns: bool

### run_script()

Executes JavaScript code, passing the element itself as the first argument.

Parameter Description:

- script: str - JavaScript text
- *args - arguments passed in

Returns: Any

### submit()

Submits the form.

Returns: None

### clear()

Clears the text box.

Returns: None

### is_selected()

Returns whether the element is selected.

Returns: bool

### is_enabled()

Returns whether the element is enabled on the page.

Returns: bool

### is_displayed()

Returns whether the element is visible.

Returns: bool

### is_valid()

Returns whether the element is still in the DOM. Use it to check whether an element has become unusable after the page navigates.

Returns: bool

### screenshot()

Takes a screenshot of the element and returns the path of the screenshot file.

Parameter Description:

- path: str - directory to save the screenshot in; defaults to the temporary folder specified in the ini file
- filename: str - name of the screenshot file; defaults to the page title

Returns: str

### select()

Selects an option from a drop-down list.

Parameter Description:

- text: str - option text

Returns: bool - whether it succeeded

### set_attr()

Sets an element attribute.

Parameter Description:

- attr: str - attribute name
- value: str - attribute value

Returns: bool - whether it succeeded

### remove_attr()

Removes an element attribute.

Parameter Description:

- attr: str - attribute name

Returns: bool - whether it succeeded

### drag()

Drags the current element by a given distance and returns whether the drag succeeded.

Parameter Description:

- x: int - drag distance along the x axis
- y: int - drag distance along the y axis
- speed: int - drag speed
- shake: bool - whether to add random shaking

Returns: bool

### drag_to()

Drags the current element to a target, which is either another element or a coordinate tuple, and returns whether the drag succeeded.

Parameter Description:

- ele_or_loc: [tuple, WebElement, DrissionElement] - another element, or coordinates relative to the current position; coordinates refer to the element's midpoint
- speed: int - drag speed
- shake: bool - whether to add random shaking

Returns: bool
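
Because drag_to() treats the element's midpoint as its position, the distance between two elements can be sketched from their location and size. The dictionaries below mirror the shapes of Selenium's WebElement.location ({'x', 'y'}) and WebElement.size ({'width', 'height'}). midpoint and drag_offset are hypothetical helpers, not DrissionPage's implementation.

```python
def midpoint(location: dict, size: dict) -> tuple:
    """Centre of an element, given Selenium-style location and size dicts."""
    return (location['x'] + size['width'] / 2,
            location['y'] + size['height'] / 2)


def drag_offset(src_loc: dict, src_size: dict, dst_loc: dict, dst_size: dict) -> tuple:
    """Distance that moves the source's midpoint onto the target's midpoint."""
    sx, sy = midpoint(src_loc, src_size)
    tx, ty = midpoint(dst_loc, dst_size)
    return (tx - sx, ty - sy)


print(drag_offset({'x': 0, 'y': 0}, {'width': 20, 'height': 20},
                  {'x': 100, 'y': 50}, {'width': 40, 'height': 40}))  # (110.0, 60.0)
```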

### hover()

Hovers the mouse over the element.

Returns: None

## SessionElement class

### class SessionElement()

The element object used in session mode. It wraps an lxml HtmlElement object and encapsulates common functions.

Parameter Description:

- ele: HtmlElement - HtmlElement object of the lxml library
- page: SessionPage - the page object the element belongs to

### inner_ele

The wrapped HtmlElement object.

Returns: HtmlElement

### html

Returns the outerHTML text of the element.

Returns: str

### inner_html

Returns the innerHTML text of the element.

Returns: str

### tag

Returns the element tag name.

Returns: str

### attrs

Returns the names and values of all attributes of the element as a dictionary.

Returns: dict

### text

Returns the text within the element, i.e. its innerText.

Returns: str

### link

Returns the absolute URL of the element's href or src attribute.

Returns: str
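
Resolving a relative href or src into an absolute URL is what the standard library's urllib.parse.urljoin does. The sketch below assumes the page URL is known and that href takes precedence over src; absolute_link is a hypothetical helper, not DrissionPage's code.

```python
from urllib.parse import urljoin


def absolute_link(page_url: str, href=None, src=None):
    """Hypothetical helper: resolve href (or, failing that, src) against the page URL."""
    value = href if href is not None else src
    if value is None:
        return None  # the element has neither attribute
    return urljoin(page_url, value)  # absolute values are returned unchanged


print(absolute_link('https://example.com/a/b.html', href='../img/x.png'))
# https://example.com/img/x.png
```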

### css_path

Returns the absolute CSS selector path of the element.

Returns: str

### xpath

Returns the absolute XPath of the element.

Returns: str
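
One way to derive absolute paths like those returned by css_path and xpath is to walk from the element up to the root, recording each step's position among its siblings. This sketch uses the standard library's xml.dom.minidom on well-formed markup. The helper names are hypothetical, and DrissionPage's own output format may differ (for example in when it adds an index).

```python
from xml.dom import minidom


def _element_siblings(node) -> list:
    """Element children of node's parent, in document order."""
    return [n for n in node.parentNode.childNodes if n.nodeType == n.ELEMENT_NODE]


def absolute_xpath(node) -> str:
    """Path like /html/body/div[2]/span; index only when a tag repeats."""
    parts = []
    while node.nodeType == node.ELEMENT_NODE:
        step = node.tagName
        same_tag = [n for n in _element_siblings(node) if n.tagName == node.tagName]
        if len(same_tag) > 1:
            step += f'[{same_tag.index(node) + 1}]'  # XPath indexes are 1-based
        parts.append(step)
        node = node.parentNode
    return '/' + '/'.join(reversed(parts))


def absolute_css_path(node) -> str:
    """Path like html > body:nth-child(1) > div:nth-child(3)."""
    parts = []
    while node.nodeType == node.ELEMENT_NODE:
        parent = node.parentNode
        if parent.nodeType == parent.DOCUMENT_NODE:
            parts.append(node.tagName)  # the root element needs no index
        else:
            index = _element_siblings(node).index(node) + 1
            parts.append(f'{node.tagName}:nth-child({index})')
        node = parent
    return ' > '.join(reversed(parts))


doc = minidom.parseString('<html><body><p>a</p><div>x</div><div><span>y</span></div></body></html>')
span = doc.getElementsByTagName('span')[0]
print(absolute_xpath(span))     # /html/body/div[2]/span
print(absolute_css_path(span))  # html > body:nth-child(1) > div:nth-child(3) > span:nth-child(1)
```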

### parent

Returns the parent element object.

Returns: SessionElement

### next

Returns the next sibling element object.

Returns: SessionElement

### prev

Returns the previous sibling element object.

Returns: SessionElement

### parents()

Returns the parent element object N levels up.

Parameter Description:

- num: int - which level of parent element to return

Returns: SessionElement

### nexts()

Returns the num-th following sibling element or node. Text nodes are returned as strings.

Parameter Description:

- num: int - the position of the following sibling element or node
- mode: str - 'ele', 'node' or 'text', to match elements, nodes or text nodes respectively

Returns: [SessionElement, str]

### prevs()

Returns the num-th preceding sibling element or node. Text nodes are returned as strings.

Parameter Description:

- num: int - the position of the preceding sibling element or node
- mode: str - 'ele', 'node' or 'text', to match elements, nodes or text nodes respectively

Returns: [SessionElement, str]

### attr()

Gets the value of an attribute of the element.

Parameter Description:

- attr: str - attribute name

Returns: str

### ele()

Gets elements based on the query parameters.
If the query parameter is a string, the prefixes '@attribute name:', 'tag:', 'text:', 'css:', 'xpath:', '.' and '#' are available. When no prefix is given, text mode is used by default.
If it is a loc tuple, the query is performed directly according to its content.

Parameter Description:

- loc_or_str: [Tuple[str, str], str] - query condition
- mode: str - find the first match or all matches; pass in 'single' or 'all'

Example:

- Find with a loc tuple:

- ele.ele((By.CLASS_NAME, 'ele_class')) - returns the first child element whose class is ele_class

- Find with a query string:

Supported query types: attributes, tag name with attributes, text, xpath, css selector, id and class.

@ means attribute, . means class, # means id, = means exact match and : means fuzzy match. When there is no prefix, the text is searched by default.

- ele.ele('.ele_class') - returns the first child element whose class is ele_class
- ele.ele('.:ele_class') - returns the first child element whose class contains ele_class
- ele.ele('#ele_id') - returns the first child element whose id is ele_id
- ele.ele('#:ele_id') - returns the first child element whose id contains ele_id
- ele.ele('@class:ele_class') - returns the first child element whose class contains ele_class
- ele.ele('@name=ele_name') - returns the first child element whose name equals ele_name
- ele.ele('@placeholder') - returns the first child element that has a placeholder attribute
- ele.ele('tag:p') - returns the first p child element
- ele.ele('tag:div@class:ele_class') - returns the first div child element whose class contains ele_class
- ele.ele('tag:div@class=ele_class') - returns the first div child element whose class equals ele_class
- ele.ele('tag:div@text():some_text') - returns the first div child element whose text contains some_text
- ele.ele('tag:div@text()=some_text') - returns the first div child element whose text equals some_text
- ele.ele('text:some_text') - returns the first child element whose text contains some_text
- ele.ele('some_text') - returns the first child element whose text contains some_text (equivalent to the previous line)
- ele.ele('text=some_text') - returns the first child element whose text equals some_text
- ele.ele('xpath://div[@class="ele_class"]') - returns the first child element that matches the xpath
- ele.ele('css:div.ele_class') - returns the first child element that matches the css selector

Returns: [SessionElement, str]

### eles()

Gets a list of the elements that meet the conditions. The query parameters are used in the same way as in ele().

Parameter Description:

- loc_or_str: [Tuple[str, str], str] - query condition

Returns: List[SessionElement or str]

## ShadowRootElement class

### class ShadowRootElement()

The shadow-root element inside an element.

Parameter Description:

- inner_ele: WebElement - the shadow-root element obtained by selenium
- parent_ele: DriverElement - the element to which the shadow-root is attached
- timeout: float - timeout

### tag

Element tag name.

Returns: the string 'shadow-root'

### html

Inner html text.

Returns: str

### parent

The parent element to which the shadow-root is attached.

Returns: DriverElement

### next

Returns the next sibling element.

Returns: DriverElement

### parents()

Returns the parent element num levels up.

Parameter Description:

- num: int - which level of parent element

Returns: DriverElement

### nexts()

Returns the num-th following sibling element.

Parameter Description:

- num: int - which sibling element

Returns: DriverElement

### ele()

Returns the first child element that meets the criteria.

Parameter Description:

- loc_or_str: Union[Tuple[str, str], str] - element positioning conditions
- mode: str - 'single' or 'all', to get the first match or all matches
- timeout: float - timeout

Returns: DriverElement - the first element that meets the conditions

### eles()

Returns all child elements that meet the criteria.

Parameter Description:

- loc_or_str: Union[Tuple[str, str], str] - element positioning conditions
- timeout: float - timeout

Returns: List[DriverElement] - a list of all matching elements

### run_script()

Executes JavaScript code on the element.

Parameter Description:

- script: str - JavaScript code
- *args - objects passed in

### is_enabled()

Returns whether the element is enabled.

Returns: bool

### is_valid()

Returns whether the element is still in the DOM.

Returns: bool

## OptionsManager class

### class OptionsManager()

A class that manages the contents of the configuration file.

Parameter Description:

- path: str - path of the ini file; if not passed in, the configs.ini file in the current folder is read by default

### paths

Returns the paths settings.

Returns: dict

### chrome_options

Returns the chrome settings.

Returns: dict

### session_options

Returns the session settings.

Returns: dict

### get_value()

Gets a configured value.

Parameter Description:

- section: str - section name
- item: str - configuration item name

Returns: Any

### get_option()

Returns all configuration items of a section as a dictionary.

Parameter Description:

- section: str - section name

Returns: dict

### set_item()

Sets a configuration value and returns the object itself, so that calls can be chained.

Parameter Description:

- section: str - section name
- item: str - configuration item name
- value: Any - value content

Returns: OptionsManager - the object itself

### save()

Saves the settings to a file and returns the object itself, so that calls can be chained.

Parameter Description:

- path: str - path of the ini file; pass in 'default' to save to the default ini file

Returns: OptionsManager - the object itself
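
The section/item model and the chaining behaviour of set_item() and save() can be sketched with Python's standard configparser module. IniOptions is a hypothetical stand-in, not DrissionPage's OptionsManager. Unlike get_value() above, which is documented as returning Any, this sketch keeps every value as a plain string, and the item names used below are illustrative only.

```python
import configparser


class IniOptions:
    """Hypothetical configparser-backed stand-in for an ini options manager."""

    def __init__(self, text: str = '') -> None:
        self._parser = configparser.ConfigParser()
        self._parser.read_string(text)  # load ini content from a string

    def get_value(self, section: str, item: str) -> str:
        return self._parser.get(section, item)

    def get_option(self, section: str) -> dict:
        # All items of one section as a plain dict.
        return dict(self._parser.items(section))

    def set_item(self, section: str, item: str, value) -> 'IniOptions':
        if not self._parser.has_section(section):
            self._parser.add_section(section)
        self._parser.set(section, item, str(value))  # configparser stores strings
        return self  # returning self is what allows chained calls

    def save(self, path: str) -> 'IniOptions':
        with open(path, 'w', encoding='utf-8') as f:
            self._parser.write(f)
        return self


# Hypothetical item names, used only for illustration.
opts = IniOptions('[paths]\ndownload_path = ./downloads\n')
opts.set_item('paths', 'tmp_path', './tmp').set_item('paths', 'download_path', './files')
print(opts.get_option('paths'))  # {'download_path': './files', 'tmp_path': './tmp'}
```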

## DriverOptions class

### class DriverOptions()

The Chrome browser configuration class. It inherits from the Options class in selenium.webdriver.chrome.options and adds methods for removing settings and saving them to a file.

Parameter Description:

- read_file: bool - whether to read settings from the ini file when the object is created
- ini_path: str - ini file path; if None, the default ini file is read

### driver_path

The path of chromedriver.exe.

Returns: str

### chrome_path

The path of chrome.exe.

Returns: str

### save()

Saves the settings to a file and returns the object itself, so that calls can be chained.

Parameter Description:

- path: str - path of the ini file; pass in 'default' to save to the default ini file

Returns: DriverOptions - the object itself

### remove_argument()

Removes a setting.

Parameter Description:

- value: str - the argument to remove

Returns: DriverOptions - the object itself

### remove_experimental_option()

Removes an experimental option by its key.

Parameter Description:

- key: str - key of the experimental option to remove

Returns: DriverOptions - the object itself

### remove_all_extensions()

Removes all extensions. Because extensions are stored as whole files, removing a single one is difficult; to change them, remove all of them and add the ones you need again.

Returns: DriverOptions - the object itself

### set_argument()

Sets a Chrome argument. For an argument without a value, the value acts as an on/off switch; for an argument with a value, the value is assigned to it.

Parameter Description:

- arg: str - argument name
- value: [bool, str] - argument value; pass the value for arguments that take one, and a bool for arguments that do not

Returns: DriverOptions - the object itself

### set_headless()

Turns headless mode on or off.

Parameter Description:

- on_off: bool - turn on or off

Returns: DriverOptions - the object itself

### set_no_imgs()

Sets whether to block image loading.

Parameter Description:

- on_off: bool - turn on or off

Returns: DriverOptions - the object itself

### set_no_js()

Sets whether to disable JavaScript.

Parameter Description:

- on_off: bool - turn on or off

Returns: DriverOptions - the object itself

### set_mute()

Sets whether to mute audio.

Parameter Description:

- on_off: bool - turn on or off

Returns: DriverOptions - the object itself

### set_user_agent()

Sets the browser user agent.

Parameter Description:

- user_agent: str - user agent string

Returns: DriverOptions - the object itself

### set_proxy()

Sets a proxy.

Parameter Description:

- proxy: str - proxy address

Returns: DriverOptions - the object itself

### set_paths()

Sets browser-related paths.

Parameter Description:

- driver_path: str - path of chromedriver.exe
- chrome_path: str - path of chrome.exe
- debugger_address: str - debug browser address, for example: 127.0.0.1:9222
- download_path: str - download file path
- user_data_path: str - user data path
- cache_path: str - cache path

Returns: DriverOptions - the object itself

## easy_set method

Chrome configuration is rather complicated, so commonly used settings are wrapped in simple functions. Calling them modifies the related content of the ini file.

### get_match_driver()

Automatically detects the Chrome version and downloads the matching chromedriver. It uses the chrome.exe path recorded in the ini file; if none is recorded, it looks for the path in the system environment variables.

Parameter Description:

- ini_path: str - path of the ini file to read and modify
- save_path: str - save path for chromedriver

Returns: None
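
The version-matching step can be sketched as follows, assuming the caller already has Chrome's version string and a list of available chromedriver versions (locating chrome.exe and downloading are left out). The rule used here, matching the MAJOR.MINOR.BUILD part and taking the newest patch, follows ChromeDriver's published version-selection guidance. pick_driver is hypothetical and may not be how get_match_driver() decides.

```python
def parse_version(version: str) -> tuple:
    """'87.0.4280.88' -> (87, 0, 4280, 88), so versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


def pick_driver(chrome_version: str, driver_versions: list):
    """Hypothetical: newest driver sharing Chrome's MAJOR.MINOR.BUILD, else None."""
    wanted = parse_version(chrome_version)[:3]
    candidates = [d for d in driver_versions if parse_version(d)[:3] == wanted]
    return max(candidates, key=parse_version, default=None)


available = ['86.0.4240.22', '87.0.4280.20', '87.0.4280.88', '88.0.4324.27']
print(pick_driver('87.0.4280.88', available))  # 87.0.4280.88
```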

### show_settings()

Prints all configuration information in the ini file.

Parameter Description:

- ini_path: str - ini file path; if None, the default ini file is read

Returns: None
### set_paths()

A convenient way to set paths: the given paths are saved to the ini file, and (if check_version is True) the chrome and chromedriver versions are checked for a match.

Parameter Description:

- driver_path: str - path of chromedriver.exe
- chrome_path: str - path of chrome.exe
- debugger_address: str - address of a browser opened in debug mode, for example: 127.0.0.1:9222
- download_path: str - download folder path
- global_tmp_path: str - temporary folder path
- user_data_path: str - user data folder path
- cache_path: str - cache folder path
- ini_path: str - ini file path; if None, settings are saved to the default ini file
- check_version: bool - whether to check that chromedriver and chrome match

Returns: None
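Saving paths to an ini file can also be sketched with `configparser`. The sketch assumes that a parameter left as None means "keep the stored value"; the `paths` section, the key names and the `save_paths` helper are hypothetical and only illustrate the idea.

```python
import configparser
import io
from typing import Optional


def save_paths(parser: configparser.ConfigParser, section: str = "paths",
               **paths: Optional[str]) -> None:
    """Store each path that was given; a value of None leaves the stored entry unchanged."""
    if not parser.has_section(section):
        parser.add_section(section)
    for key, value in paths.items():
        if value is not None:
            parser.set(section, key, value)


parser = configparser.ConfigParser()
parser.read_string("[paths]\nchromedriver_path = old/chromedriver.exe\ntmp_path = D:/tmp\n")
save_paths(parser, chromedriver_path="D:/tools/chromedriver.exe", tmp_path=None)

# The real function would write to the ini file on disk; a StringIO keeps the sketch self-contained.
buffer = io.StringIO()
parser.write(buffer)
print(buffer.getvalue())
```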
### set_argument()

Set a startup argument. If the argument takes no value (such as 'zh_CN.UTF-8'), pass a bool as value to switch it on or off; otherwise, value is assigned to the argument. When value is '' or False, the argument is deleted.

Parameter Description:

- arg: str - argument name
- value: [bool, str] - argument value: pass a str for arguments that take a value, and a bool for arguments that do not
- ini_path: str - ini file path; if None, settings are saved to the default ini file

Returns: None
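The rules above can be modeled on a plain list of Chrome command-line arguments. `apply_argument` is a hypothetical helper written only to illustrate the described behavior (a bool for switches, a str for valued arguments, '' or False to delete); it does not touch any ini file and is not DrissionPage's implementation.

```python
from typing import List, Union


def apply_argument(arguments: List[str], arg: str, value: Union[bool, str]) -> List[str]:
    """Return a new argument list with `arg` added, replaced or removed.

    A bool switches a value-less argument on or off, a non-empty str becomes
    'arg=value', and '' or False removes the argument.
    """
    # Drop any existing entry for this argument, whether bare ('--x') or valued ('--x=...').
    result = [a for a in arguments if a != arg and not a.startswith(arg + "=")]
    if value is True:
        result.append(arg)
    elif isinstance(value, str) and value:
        result.append(f"{arg}={value}")
    # value is False or '': the argument stays removed
    return result


args = ["--no-sandbox"]
args = apply_argument(args, "--headless", True)
args = apply_argument(args, "--proxy-server", "127.0.0.1:1080")
args = apply_argument(args, "--headless", False)
print(args)  # ['--no-sandbox', '--proxy-server=127.0.0.1:1080']
```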
### set_headless()

Turn headless mode on or off.

Parameter Description:

- on_off: bool - whether to turn on headless mode
- ini_path: str - ini file path; if None, settings are saved to the default ini file

Returns: None
### set_no_imgs()

Turn no-image mode on or off (in no-image mode, pictures are not displayed).

Parameter Description:

- on_off: bool - whether to turn on no-image mode
- ini_path: str - ini file path; if None, settings are saved to the default ini file

Returns: None
### set_no_js()

Turn JavaScript-disabled mode on or off.

Parameter Description:

- on_off: bool - whether to turn on JavaScript-disabled mode
- ini_path: str - ini file path; if None, settings are saved to the default ini file

Returns: None
### set_mute()

Turn mute mode on or off.

Parameter Description:

- on_off: bool - whether to turn on mute mode
- ini_path: str - ini file path; if None, settings are saved to the default ini file

Returns: None
### set_user_agent()

Set the user agent.

Parameter Description:

- user_agent: str - user agent string
- ini_path: str - ini file path; if None, settings are saved to the default ini file

Returns: None
### set_proxy()

Set a proxy.

Parameter Description:

- proxy: str - proxy address
- ini_path: str - ini file path; if None, settings are saved to the default ini file

Returns: None
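The on/off and value setters above can be thought of as shortcuts for individual browser arguments (an assumption, not stated in this document). The sketch below shows one plausible mapping to standard Chromium switches; the `SWITCHES` table and `build_arguments` are hypothetical and may not match what DrissionPage actually writes to the ini file. set_no_js() is left out because its mechanism is not documented here.

```python
from typing import Dict, List, Union

# Assumed mapping from the convenience functions to Chromium command-line switches.
SWITCHES: Dict[str, str] = {
    "set_headless": "--headless",
    "set_no_imgs": "--blink-settings=imagesEnabled=false",
    "set_mute": "--mute-audio",
    "set_user_agent": "--user-agent",
    "set_proxy": "--proxy-server",
}


def build_arguments(settings: Dict[str, Union[bool, str]]) -> List[str]:
    """Translate {function name: on_off or value} into a list of Chrome arguments."""
    arguments: List[str] = []
    for name, value in settings.items():
        switch = SWITCHES[name]
        if value is True:  # on/off switches such as set_headless(True)
            arguments.append(switch)
        elif isinstance(value, str) and value:  # valued settings such as set_proxy('...')
            arguments.append(f"{switch}={value}")
        # False or '' adds nothing, mirroring the delete rule of set_argument()
    return arguments


print(build_arguments({"set_headless": True, "set_mute": False, "set_proxy": "127.0.0.1:1080"}))
```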
### check_driver_version()

Check whether the chrome and chromedriver versions match.

Parameter Description:

- driver_path: str - path of chromedriver.exe
- chrome_path: str - path of chrome.exe

Returns: bool - whether the versions match
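A version check can be approximated by comparing version strings such as "Google Chrome 87.0.4280.66" and "ChromeDriver 87.0.4280.88". The ChromeDriver documentation states that a driver supports Chrome versions with the same major, minor and build numbers, so that is what the hypothetical `versions_match` below compares. DrissionPage's own check may work differently.

```python
import re
from typing import Tuple


def build_prefix(version_text: str) -> Tuple[int, ...]:
    """Return (major, minor, build) from text such as 'Google Chrome 87.0.4280.66'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return tuple(int(group) for group in match.groups())


def versions_match(chrome_version: str, driver_version: str) -> bool:
    """Treat a driver as compatible when the major, minor and build numbers agree."""
    return build_prefix(chrome_version) == build_prefix(driver_version)


print(versions_match("Google Chrome 87.0.4280.66", "ChromeDriver 87.0.4280.88"))  # True
print(versions_match("Google Chrome 88.0.4324.96", "ChromeDriver 87.0.4280.88"))  # False
```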