- Chinese README: [click here](https://github.com/g1879/DrissionPage/blob/master/README.zh-cn.md)
- Examples: [click here](https://gitee.com/g1879/DrissionPage-demos)

# Introduction
***
DrissionPage (a blend of "driver" and "session") is a Python-based web automation tool that integrates selenium and requests.
It switches seamlessly between the two, balancing the convenience of selenium with the efficiency of requests.
It uses the POM (Page Object Model) pattern to encapsulate common methods of page elements, which makes it well suited to extending automation functionality.
Its usage is also concise and user-friendly: it needs little code and is easy for beginners.

**Project address**

- https://github.com/g1879/DrissionPage
- https://gitee.com/g1879/DrissionPage

**Demos:** [https://gitee.com/g1879/DrissionPage-demos](https://gitee.com/g1879/DrissionPage-demos)

**Email:** g1879@qq.com

# Features
***
- Switches seamlessly between selenium and requests, sharing the session.
- Uses the POM pattern to encapsulate common methods for easy extension.
- Both modes provide a unified set of operations and a consistent user experience.
- User-friendly operations on page elements reduce the workload of page analysis and coding.
- Some common functions (such as click) have been optimized to better meet practical needs.
- Simple configuration spares you cumbersome browser setup.

# Idea
***
## Simple, Easy and Extensible
- DrissionPage puts concise code first: it shortens long statements while fully preserving their functionality.
- DrissionPage encapsulates many commonly used functions to make them more convenient to use.
- The core of DrissionPage is a page class, from which subclass pages can be derived directly to suit various scenarios.
- Browser configuration is simple, sparing you tedious settings.

The following snippets implement the same functions, to compare the amount of code each approach requires:

1. Use an explicit wait to find all elements whose text contains 'some text'
```python
# selenium (WebDriverWait requires a timeout in seconds):
elements = WebDriverWait(driver, 10).until(ec.presence_of_all_elements_located((By.XPATH, '//*[contains(text(), "some text")]')))
# DrissionPage (eles() returns all matching elements):
elements = page.eles('some text')
```
2. Jump to the first tab
```python
# selenium
driver.switch_to.window(driver.window_handles[0])
# DrissionPage
page.to_tab(0)
```
3. Drag an element
```python
# selenium
ActionChains(driver).drag_and_drop(ele1, ele2).perform()
# DrissionPage
ele1.drag_to(ele2)
```
4. Scroll the window to the bottom (keep the horizontal scroll bar unchanged)
```python
# selenium
driver.execute_script("window.scrollTo(document.documentElement.scrollLeft,document.body.scrollHeight);")
# DrissionPage
page.scroll_to('bottom')
```
5. Set headless mode
```python
# selenium
options = webdriver.ChromeOptions()
options.add_argument("--headless")
# DrissionPage
set_headless()
```
# Background
***
When beginners learn web scraping and face a website that requires login, they have to analyze data packets and JS source code, construct complex requests, and often deal with CAPTCHAs, JS obfuscation, signature parameters and other countermeasures, which makes learning difficult. In addition, some data is generated by JavaScript, so if you only fetch the source data you must also reproduce the calculation. The experience is poor and development is inefficient.

Using selenium avoids most of these problems, but selenium itself is not efficient. This library therefore combines selenium and requests into one and provides a user-friendly way to use them, improving both development and runtime efficiency.

Besides merging the two, this library also encapsulates commonly used functions on a per-page basis, which simplifies selenium operations and statements. In web page automation, this lets you worry less about details and focus on implementing functionality.

The design philosophy of this library is to keep everything simple: provide simple, direct usage that is friendly to beginners.

# Simple Demo
***
Example: Log in to the website with selenium, then switch to requests to read the web page.
```python
from DrissionPage import MixPage

page = MixPage()  # Create a page object (driver mode by default)
page.get('https://gitee.com/profile')  # Visit the personal center page (redirects to the login page)

page.ele('@id:user_login').input('your_user_name')  # Use selenium to log in
page.ele('@id:user_password').input('your_password\n')
page.change_mode()  # Switch to session mode
print('Title after login:', page.title, '\n')  # Session-mode output after login
```
Output:
```
Title after login: Dashboard - Gitee
```
Example: Find element and print attributes.
```python
foot = page.ele('@id:footer-left')  # Find an element by id
first_col = foot.ele('css:>div')  # Find the first child div using a css selector
lnk = first_col.ele('text:Git Branching')  # Find an element by its text content
text = lnk.text  # Get the element text
href = lnk.attr('href')  # Get an attribute value

print(first_col)
print(text, href)
```
Output:
```
<SessionElement div class='column'>
Learn Git Branching https://oschina.gitee.io/learn-git-branching/
```
# Install
***
```
pip install DrissionPage
```
Only Python 3.6 and above are supported. Driver mode currently supports only chrome.

To use driver mode, you must download chrome and the **matching version** of chromedriver. [chromedriver download](https://chromedriver.chromium.org/downloads)

The library has so far been tested only on Windows.

# Instructions
***
## import
```python
from DrissionPage import *
```
## Initialization
Before using selenium, you must configure the paths of chrome.exe and chromedriver.exe and make sure their versions match.

If you only use session mode, you can skip this section.

There are three ways to configure the paths:

- Add both paths to the system environment variables.
- Pass in the paths manually when using the library.
- Write the paths to this library's ini file (recommended).

If you choose the third method, run the following code once before using the library for the first time to record the two paths in the ini file.
```python
from DrissionPage.easy_set import set_paths
driver_path = 'D:\\chrome\\chromedriver.exe' # Your chromedriver.exe path, optional
chrome_path = 'D:\\chrome\\chrome.exe' # Your chrome.exe path, optional
set_paths(driver_path, chrome_path)
```
This method also checks whether the chrome and chromedriver versions match and prints one of the following (the library prints these messages in Chinese; they are translated here):
```
Versions match; ready to use.
or
An exception occurred:
Message: session not created: Chrome version must be between 70 and 73
  (Driver info: chromedriver=73.0.3683.68 (47787ec04b6e38e22703e856e101e840b65afe72),platform=Windows NT 10.0.19631 x86_64)
chromedriver download URL: https://chromedriver.chromium.org/downloads
```
Once the check passes, driver mode can be used normally.

Besides the two paths above, this method can also set the following:
```python
debugger_address # Opened browser address, eg. 127.0.0.1:9222
download_path # Download path
global_tmp_path # Temporary folder path
user_data_path # User data path
cache_path # Cache path
```
Tips:

- Different projects may require different versions of chrome and chromedriver. You can save multiple ini files and use them as needed.
- A portable ("green") version of chrome with a manually set path is recommended, so that browser upgrades do not break the match with the chromedriver version.
- When debugging a project, it is recommended to set debugger_address and debug with a manually opened browser, which saves time and effort.

## Create Drission Object
A Drission object manages the driver and session objects. When multiple pages work together, it also passes the driver along, so that multiple page classes can control the same browser or Session object.

It can be created from the configuration in an ini file, or the configuration can be passed in at initialization.
```python
# Create from the default ini file
drission = Drission()
# Create from another ini file
drission = Drission(ini_path='D:\\settings.ini')
```
To manually pass in the configuration:
```python
# Create with passed-in configuration (ignores the ini file)
from DrissionPage import Drission
from DrissionPage.config import DriverOptions

driver_options = DriverOptions()  # Create the driver configuration object
driver_options.binary_location = 'D:\\chrome\\chrome.exe'  # chrome.exe path
# The Drission constructor has no driver_path parameter (see APIs), so set it on the options object
driver_options.set_paths(driver_path='D:\\chrome\\chromedriver.exe')  # chromedriver.exe path
session_options = {'headers': {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6)'}}
drission = Drission(driver_options, session_options)  # Create the object from the passed-in configuration
```
## Use MixPage objects
The MixPage object encapsulates commonly used web page operations and implements switching between driver and session modes.

MixPage needs a Drission object and uses its driver or session. If none is passed in, MixPage creates a Drission itself (using the configuration from the default ini file).

Tips: When multiple page objects work together, remember to create the Drission object manually and pass it to each page object. Otherwise each page object creates its own Drission object, and information cannot be shared between them.
```python
# Ways to create MixPage objects
page = MixPage()  # Creates a Drission object automatically; recommended only when a single page object is used
page = MixPage('s')  # Quickly create in session mode; a Drission object is created automatically

page = MixPage(drission)  # Create by passing in a Drission object
page = MixPage(drission, mode='s', timeout=5)  # Session mode with a 5-second timeout (default is 10 seconds)

# Visit URL
page.get(url, **kwargs)
page.post(url, data, **kwargs)  # Only session mode has the post method; calling it switches to session mode automatically

# Switch mode
page.change_mode()

# Page operations
print(page.html)  # Page source code
page.run_script(js)  # Run a js statement
page.close_other_tabs(num)  # Close other tabs
page.to_iframe(iframe)  # Switch to an iframe
page.screenshot(path)  # Take a screenshot of the page
page.scroll_to_see(element)  # Scroll until an element is visible
# See APIs for details...
```
Tips: Calling a method that exists only in driver mode automatically switches to driver mode.
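
The advice about sharing one Drission object between pages can be pictured with a small stand-in model. The sketch below does not use DrissionPage at all: `FakeDrission` and `FakePage` are hypothetical classes that only mimic the ownership rule described above (a page creates its own manager when none is passed in), so treat it as an illustration rather than the library's implementation.

```python
class FakeDrission:
    """Hypothetical stand-in for Drission: owns one shared session-like dict."""

    def __init__(self):
        self.session = {'cookies': {}}


class FakePage:
    """Hypothetical stand-in for MixPage."""

    def __init__(self, drission=None):
        # Mirror the documented behaviour: create a manager if none is given.
        self.drission = drission or FakeDrission()

    def login(self, user):
        self.drission.session['cookies']['user'] = user

    @property
    def logged_in_user(self):
        return self.drission.session['cookies'].get('user')


shared = FakeDrission()
login_page = FakePage(shared)
list_page = FakePage(shared)   # same manager, so state is shared
isolated_page = FakePage()     # creates its own manager

login_page.login('alice')
print(list_page.logged_in_user)      # alice
print(isolated_page.logged_in_user)  # None
```

The page that built its own manager never sees the login state, which is the situation the tip warns about.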
## Find elements
ele() returns the first matching element; eles() returns a list of all matching elements.

Both methods can be called on a page object or an element object to search among its subordinate elements.

Note: The element search timeout is 10 seconds by default; you can change it as required.
```python
from selenium.webdriver.common.by import By  # Needed only for loc tuples

# Find by attribute
page.ele('@id:ele_id', timeout=2)  # Find the element whose id contains ele_id, with a 2-second timeout
page.eles('@class')  # Find all elements that have a class attribute
page.eles('@class:class_name')  # Find all elements whose class contains class_name

# Find by tag name
page.ele('tag:li')  # Find the first li element
page.eles('tag:li')  # Find all li elements

# Find by tag name and attributes
page.ele('tag:div@class=div_class')  # Find the first div element whose class is div_class
page.ele('tag:div@class:ele_class')  # Find the first div element whose class contains ele_class
page.ele('tag:div@class=ele_class')  # Find the first div element whose class equals ele_class
page.ele('tag:div@text():search_text')  # Find the first div element whose text contains search_text
page.ele('tag:div@text()=search_text')  # Find the first div element whose text equals search_text

# Find by text
page.ele('search text')  # Find the first element whose text contains the given string
page.eles('text:search text')  # If the text itself starts with @, tag:, css:, xpath: or text:, prefix it with text: to avoid conflicts
page.eles('text=search text')  # Find elements whose text equals the given string

# Find by xpath or css selector
page.eles('xpath://div[@class="ele_class"]')
page.eles('css:div.ele_class')

# Find by loc
loc1 = By.ID, 'ele_id'
loc2 = By.XPATH, '//div[@class="ele_class"]'
page.ele(loc1)
page.ele(loc2)

# Find subordinate elements
element = page.ele('@id:ele_id')
element.ele('@class:class_name')  # Find the first element below element whose class contains class_name
element.eles('tag:li')  # Find all li elements below element

# Find by location
element.parent  # Parent element
element.next  # Next sibling element
element.prev  # Previous sibling element

# Chained search
page.ele('@id:ele_id').ele('tag:div').next.ele('some text').eles('tag:a')
```
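
The search timeout behaves like an explicit wait: the lookup is retried until it succeeds or the time runs out. The following is a minimal sketch of that idea using only the standard library. `find_with_timeout` and `fake_lookup` are hypothetical names introduced for illustration, and DrissionPage's internal waiting logic may differ.

```python
import time


def find_with_timeout(lookup, timeout=10.0, interval=0.1):
    """Call lookup() repeatedly until it returns something or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = lookup()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


attempts = {'count': 0}


def fake_lookup():
    """Pretend the element only appears on the third attempt."""
    attempts['count'] += 1
    return '<li>' if attempts['count'] >= 3 else None


found = find_with_timeout(fake_lookup, timeout=2, interval=0.01)
missing = find_with_timeout(lambda: None, timeout=0.05, interval=0.01)
print(found, missing)
```

A short timeout makes a failed search return quickly, while a long one tolerates slow pages; that is the trade-off behind the `timeout` argument.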
## Element operations
```python
# Get element information
element = page.ele('@id:ele_id')
element.html # Return html inside element
element.text # Returns the text value after removing the html tag in the element
element.tag # Return element tag name
element.attrs # Returns a dictionary of all attributes of the element
element.attr('class') # Returns the element's class attribute
element.is_valid # Driver mode only, used to determine whether the element is still available
# Operating element
element.click() # Click element
element.input(text) # Enter text
element.run_script(js) # Run js
element.submit()  # Submit the form
element.clear()  # Clear the element
element.is_selected()  # Whether it is selected
element.is_enabled()  # Whether it is enabled
element.is_displayed()  # Whether it is visible
element.is_valid()  # Whether it is still valid; used to detect elements invalidated by a page jump
element.select(text) # Select the drop-down list option
element.set_attr(attr,value) # Set element attributes
element.size # Returns the element size
element.location # Returns the element position
```
## Chrome shortcut settings
Configuring chrome is cumbersome. To simplify it, this library provides setter methods for common configuration items.
### DriverOptions Object
The DriverOptions object inherits from the Options object of selenium.webdriver.chrome.options, and the following methods are added to it:
```python
remove_argument(value) # Delete an argument value
remove_experimental_option(key) # Delete an experimental_option setting
remove_all_extensions() # Delete all plugins
save() # Save the configuration to the default ini file
save('D:\\settings.ini') # Save to other path
set_argument(arg, value) # Set argument attribute
set_headless(on_off)  # Set whether to use headless mode
set_no_imgs(on_off)  # Set whether to disable image loading
set_no_js(on_off)  # Set whether to disable js
set_mute(on_off) # Set whether to mute
set_user_agent(user_agent) # Set user agent
set_proxy(proxy) # Set proxy address
set_paths(driver_path, chrome_path, debugger_address, download_path, user_data_path, cache_path) # Set browser-related paths
```
### Instructions
```python
from DrissionPage import Drission, MixPage
from DrissionPage.config import DriverOptions

do = DriverOptions(read_file=False)  # Create a chrome configuration object without reading the ini file
do.set_headless(False)  # Show the browser window
do.set_no_imgs(True)  # Do not load images
do.set_paths(driver_path='D:\\chromedriver.exe', chrome_path='D:\\chrome.exe')  # Set paths
do.set_headless(False).set_no_imgs(True)  # Chained calls are supported

drission = Drission(driver_options=do)  # Create a Drission object from the configuration object
page = MixPage(drission)  # Create a MixPage object from the Drission object
do.save()  # Save the configuration to the default ini file
```
## Save configuration
Because chrome and headers have many configuration items, an ini file is provided to store commonly used settings. You can use the OptionsManager object to read and save the configuration, and the DriverOptions object to modify the chrome configuration. You can also keep multiple ini files and load them per project.

Tips: It is recommended to save frequently used configuration files to another path so that they are not reset when the library is upgraded.
### ini file
The ini file has three parts by default: paths, chrome_options, and session_options. The initial contents are as follows.
```ini
[paths]
; chromedriver.exe path
chromedriver_path =
; Temporary folder path, used to save screenshots, downloaded files, etc.
global_tmp_path =

[chrome_options]
; Address and port of an already opened browser, such as 127.0.0.1:9222
debugger_address =
; chrome.exe path
binary_location =
; Configuration information
arguments = [
            ; Hide browser window
            '--headless',
            ; Mute
            '--mute-audio',
            ; No sandbox
            '--no-sandbox',
            ; Google documentation mentions the need to add this argument to avoid bugs
            '--disable-gpu',
            ; Ignore certificate errors
            'ignore-certificate-errors',
            ; Hide the info bar
            '--disable-infobars'
            ]
; Extensions
extensions = []
; Experimental configuration
experimental_options = {
                       'prefs': {
                       ; Download without pop-up
                       'profile.default_content_settings.popups': 0,
                       ; No pop-up window
                       'profile.default_content_setting_values': {'notifications': 2},
                       ; Disable PDF plugin
                       'plugins.plugins_list': [{"enabled": False, "name": "Chrome PDF Viewer"}]
                       },
                       ; Set to developer mode to counter anti-crawler detection (ineffective)
                       'excludeSwitches': ["enable-automation"],
                       'useAutomationExtension': False
                       }

[session_options]
headers = {
          "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8",
          "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
          "Connection": "keep-alive",
          "Accept-Charset": "utf-8;q=0.7,*;q=0.7"
          }
```
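
To make the file layout concrete, here is a rough sketch of how an ini file in this style could be read with the Python standard library. It is not necessarily how OptionsManager works internally: the shortened `INI_TEXT` is a hypothetical sample, and decoding list and dict values with `ast.literal_eval` is an assumption based on the values being written as Python literals.

```python
import ast
import configparser

# A shortened, hypothetical ini in the same layout as the file above.
# Continuation lines of multi-line values must be indented.
INI_TEXT = r"""
[paths]
chromedriver_path = D:\chrome\chromedriver.exe

[chrome_options]
arguments = [
    ; Hide browser window
    '--headless',
    '--mute-audio'
    ]

[session_options]
headers = {
    "Connection": "keep-alive"
    }
"""

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(INI_TEXT)

driver_path = parser.get('paths', 'chromedriver_path')
# List and dict values are written as Python literals, so literal_eval can decode them.
arguments = ast.literal_eval(parser.get('chrome_options', 'arguments'))
headers = ast.literal_eval(parser.get('session_options', 'headers'))

print(driver_path)
print(arguments)
print(headers)
```

Full-line `;` comments inside a multi-line value are skipped by configparser, which is why comments can sit between list items.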
### OptionsManager object
The OptionsManager object is used to read, set, and save configurations.
```python
get_value(section, item) -> str # Get the value of a configuration
get_option(section) -> dict # Return all configuration properties in dictionary format
set_item(section, item, value) # Set configuration properties
save() # Save configuration to default ini file
save('D:\\settings.ini') # Save to other path
```
### Usage example
```python
from DrissionPage import Drission
from DrissionPage.config import *  # Same module as the DriverOptions import shown earlier

options_manager = OptionsManager()  # Create an OptionsManager object from the default ini file
options_manager = OptionsManager('D:\\settings.ini')  # Create an OptionsManager object from another ini file
driver_path = options_manager.get_value('paths', 'chromedriver_path')  # Read path information
options_manager.save()  # Save to the default ini file
options_manager.save('D:\\settings.ini')  # Save to another path

drission = Drission(ini_path='D:\\settings.ini')  # Create the object from another ini file
```
**Note**: If you do not pass in a path when saving, the configuration is saved to the ini file in the module directory, even if it was read from a different ini file.

## easy_set methods
Calling an easy_set method modifies the content of the default ini file.
```python
set_headless(True)  # Enable headless mode
set_no_imgs(True)  # Disable image loading
set_no_js(True)  # Disable JavaScript
set_mute(True)  # Mute audio
set_user_agent('Mozilla/5.0 (Macintosh; Int......')  # Set user agent
set_proxy('127.0.0.1:8888')  # Set proxy
set_paths(paths)  # See the Initialization section
set_argument(arg, value)  # Set an argument. If the argument takes no value (e.g. 'zh_CN.utf-8'), value is a bool that switches it on or off; otherwise value is the argument's value. If value is '' or False, the argument entry is deleted
```
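
The rules described for set_argument can be sketched as a small pure function over an argument list. `apply_argument` is a hypothetical helper written only to illustrate those rules; it is not the library's code, and the real method writes to the ini file rather than returning a list.

```python
def apply_argument(arguments, arg, value):
    """Return a new argument list with `arg` switched on, given a value, or removed."""
    # Drop any existing entry for this argument ('--x' or '--x=...').
    kept = [a for a in arguments if a != arg and not a.startswith(f'{arg}=')]
    if value is False or value == '':
        return kept                       # delete the entry
    if value is True:
        return kept + [arg]               # valueless switch
    return kept + [f'{arg}={value}']      # argument with a value


args = ['--headless', '--mute-audio']
args = apply_argument(args, '--headless', False)
args = apply_argument(args, '--proxy-server', '127.0.0.1:8888')
args = apply_argument(args, '--no-sandbox', True)
print(args)
```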
# POM mode
***
MixPage encapsulates common page operations and can easily be extended.

Example: extend MixPage into a class that reads list pages.
```python
import re
from time import sleep

from DrissionPage import *


class ListPage(MixPage):
    """Encapsulates reading of list pages. Given the four required xpaths, it can read any list page with the same structure."""

    def __init__(self, drission: Drission, url: str = None, **xpaths):
        super().__init__(drission)
        self._url = url
        self.xpath_cloumn_name = xpaths['cloumn_name']  # [xpath str, re str]
        self.xpath_next_btn = xpaths['next_btn']
        self.xpath_rows = xpaths['rows']
        self.xpath_total_pages = xpaths['total_pages']  # [xpath str, re str]
        if url:
            self.get(url)  # Load the page first so the page count can be read from it
        self.total_pages = self.get_total_pages()

    def get_cloumn_name(self) -> str:
        """Return the column name, optionally extracted with a regular expression."""
        if self.xpath_cloumn_name[1]:
            s = self.ele(f'xpath:{self.xpath_cloumn_name[0]}').text
            r = re.search(self.xpath_cloumn_name[1], s)
            return r.group(1)
        else:
            return self.ele(f'xpath:{self.xpath_cloumn_name[0]}').text

    def get_total_pages(self) -> int:
        """Return the total number of pages, optionally extracted with a regular expression."""
        if self.xpath_total_pages[1]:
            s = self.ele(f'xpath:{self.xpath_total_pages[0]}').text
            r = re.search(self.xpath_total_pages[1], s)
            return int(r.group(1))
        else:
            return int(self.ele(f'xpath:{self.xpath_total_pages[0]}').text)

    def click_next_btn(self, wait: float = None):
        """Click the next-page button, then optionally wait."""
        self.ele(f'xpath:{self.xpath_next_btn}').click()
        if wait:
            sleep(wait)

    def get_current_page_list(self, content_to_fetch: list) -> list:
        """
        content_to_fetch: [[xpath1, para1], [xpath2, para2], ...]
        output list: [[para1, para2, ...], [para1, para2, ...], ...]
        """
        result_list = []
        rows = self.eles(f'xpath:{self.xpath_rows}')
        for row in rows:
            row_result = []
            for j in content_to_fetch:
                row_result.append(row.ele(f'xpath:{j[0]}').attr(j[1]))
            result_list.append(row_result)
            print(row_result)
        return result_list

    def get_list(self, content_to_fetch: list, wait: float = None) -> list:
        """Read the current page, then every following page, and return all rows."""
        current_list = self.get_current_page_list(content_to_fetch)
        for _ in range(self.total_pages - 1):
            self.click_next_btn(wait)
            current_list.extend(self.get_current_page_list(content_to_fetch))
        return current_list
```
# Others
***
## DriverPage and SessionPage
If you do not need to switch modes, you can use DriverPage or SessionPage on its own as required; usage is consistent with MixPage.
```python
from DrissionPage.session_page import SessionPage
from DrissionPage.driver_page import DriverPage
from DrissionPage.drission import Drission

session = Drission().session
page = SessionPage(session)  # Pass in a Session object
page.get('http://www.baidu.com')
print(page.ele('@id:su').text)  # Output: 百度一下

driver = Drission().driver
page = DriverPage(driver)  # Pass in a Driver object
page.get('http://www.baidu.com')
print(page.ele('@id:su').text)  # Output: 百度一下
```
# APIs
***
## Drission class
class **Drission**(driver_options: Union[dict, Options] = None, session_options: dict = None, ini_path = None, proxy: dict = None)

Used to manage driver and session objects.

Parameter Description:

- driver_options - Chrome configuration parameters; accepts an Options object or a dictionary
- session_options - Session configuration parameters; accepts a dictionary
- ini_path - Path of the ini file; defaults to the ini file in the DrissionPage folder
- proxy - Proxy configuration; accepts a dictionary

### session
Returns the HTMLSession object, which is created automatically when first accessed.

### driver
Returns the WebDriver object, which is created automatically when first accessed and initialized from the passed-in configuration or the ini file.

### driver_options
Returns the driver configuration as a dictionary.

### session_options
Returns the session configuration as a dictionary.

### proxy
Returns the proxy configuration as a dictionary.

### cookies_to_session()
cookies_to_session(copy_user_agent: bool = False, driver: WebDriver = None, session: Session = None) -> None

Copies cookies from the driver to the session. By default, cookies are copied from self.driver to self.session; a specific driver and session can also be passed in.

Parameter Description:

- copy_user_agent - Whether to also copy the user agent to the session
- driver - WebDriver object to copy cookies from
- session - Session object that receives the cookies
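
The idea of carrying browser cookies over to an HTTP session can be sketched without a browser. The cookie dictionaries below follow the shape returned by selenium's `get_cookies()` (`name`, `value`, `domain`, `path`). `cookies_for_domain` is a hypothetical helper and the sample data is made up; the library may copy cookies differently.

```python
def cookies_for_domain(driver_cookies, domain):
    """Pick name/value pairs whose cookie domain matches `domain`."""
    selected = {}
    for cookie in driver_cookies:
        cookie_domain = cookie.get('domain', '').lstrip('.')
        if domain == cookie_domain or domain.endswith('.' + cookie_domain):
            selected[cookie['name']] = cookie['value']
    return selected


# Made-up cookies in the format selenium's driver.get_cookies() returns.
sample_cookies = [
    {'name': 'sid', 'value': 'abc123', 'domain': '.gitee.com', 'path': '/'},
    {'name': 'lang', 'value': 'en', 'domain': 'gitee.com', 'path': '/'},
    {'name': 'other', 'value': 'x', 'domain': 'example.com', 'path': '/'},
]

session_cookies = cookies_for_domain(sample_cookies, 'gitee.com')
cookie_header = '; '.join(f'{k}={v}' for k, v in session_cookies.items())
print(cookie_header)  # sid=abc123; lang=en
```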

### cookies_to_driver()
cookies_to_driver(url: str, driver: WebDriver = None, session: Session = None) -> None

Copies cookies from the session to the driver. By default, cookies are copied from self.session to self.driver; a specific driver and session can also be passed in. A url or domain name must be specified.

Parameter Description:

- url - Domain of the cookies
- driver - WebDriver object that receives the cookies
- session - Session object to copy cookies from

### user_agent_to_session()
user_agent_to_session(driver: WebDriver = None, session: Session = None) -> None

Copies the user agent from the driver to the session. By default, it is copied from self.driver to self.session; a specific driver and session can also be passed in.

Parameter Description:

- driver - WebDriver object to copy the user agent from
- session - Session object that receives the user agent

### close_driver()
close_driver() -> None

Closes the browser and sets the driver to None.

### close_session()
close_session() -> None

Closes the session and sets it to None.

### close()
close() -> None

Closes the driver and the session.

## MixPage class
class **MixPage**(drission: Union[Drission, str] = None, mode: str = 'd', timeout: float = 10)

MixPage encapsulates common page operations and switches seamlessly between driver and session modes. Cookies are synchronized automatically when switching.

Functions that retrieve information are common to both modes, while functions that operate on page elements are available only in d mode. Calling a function unique to one mode automatically switches to that mode.

It inherits from the DriverPage and SessionPage classes, which implement these functions; MixPage itself acts as a dispatcher.

Parameter Description:

- drission - Drission object; if not passed in, one is created. Passing in 's' or 'd' quickly configures the corresponding mode
- mode - Mode, either 'd' or 's'; the default is 'd'
- timeout - Timeout: the element search time in driver mode and the connection time in session mode

### url
Returns the currently visited URL.

### mode
Returns the current mode ('s' or 'd').

### drission
Returns the Drission object currently in use.

### driver
Returns the driver object, creating it if necessary; accessing it switches to driver mode.

### session
Returns the session object, creating it if necessary.

### response
Returns the Response object; accessing it switches to session mode.

### cookies
Returns the cookies, obtained from the current mode.

### html
Returns the page's html text.

### title
Returns the page title text.

### change_mode()
change_mode(mode: str = None, go: bool = True) -> None

Switches the mode. A target mode can be specified; if it is the same as the current mode, the method returns immediately.

Parameter Description:

- mode - Target mode, 'd' or 's'
- go - Whether to jump to the current url after switching modes
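
The switching rules above can be mimicked with a tiny stand-in class. `ModeSwitcher` is hypothetical and does not drive a browser or a session; it only records what a real switch would do under the documented rules, assuming that omitting `mode` toggles between the two modes.

```python
class ModeSwitcher:
    """Hypothetical stand-in that mimics the documented change_mode() rules."""

    def __init__(self, mode='d'):
        self.mode = mode
        self.url = 'https://example.com/profile'
        self.visited = []  # (mode, url) pairs "loaded" after a switch

    def change_mode(self, mode=None, go=True):
        if mode == self.mode:
            return  # target equals the current mode: nothing to do
        self.mode = 's' if self.mode == 'd' else 'd'
        if go:
            self.visited.append((self.mode, self.url))


page = ModeSwitcher()
page.change_mode()               # 'd' -> 's', reloads the current url
page.change_mode('s')            # already in 's': no effect
page.change_mode('d', go=False)  # 's' -> 'd' without reloading
print(page.mode, page.visited)
```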

### get()
get(url: str, go_anyway=False, **kwargs) -> Union[bool, None]

Jumps to a url, syncing cookies beforehand, and returns whether the target url is available after the jump.

Parameter Description:

- url - Target url
- go_anyway - Whether to force the jump. By default, no jump is made if the target url is the same as the current url.
- kwargs - Connection parameters used for the request in session mode

### ele()
2020-05-25 14:42:44 +08:00
ele(loc_or_ele: Union[tuple, str, DriverElement, SessionElement], mode: str = None, timeout: float = None, show_errmsg: bool = False) -> Union[DriverElement, SessionElement]
Get elements according to query parameters and return elements or element lists.
If the query parameter is a string, you can select the '@property name:', 'tag:', 'text:', 'css:', 'xpath:' method. When there is no control mode, it is searched by text mode by default.
If it is loc, query directly according to the content.
Parameter Description:
2020-08-13 11:44:26 +08:00
- loc_or_str - Query condition parameters, if an element object is passed in, return directly
- mode - Find one or more, pass in 'single' or 'all'
- timeout - Search element timeout time, valid in driver mode
- show_errmsg - Whether to throw and display when an exception occurs
2020-05-25 14:42:44 +08:00
Examples:
2020-08-13 11:44:26 +08:00
- When the element object is received: Return the element object object
- Find with loc tuple:
- ele.ele((By.CLASS_NAME, 'ele_class')) - Return the first element whose class is ele_class in children
- Find with query string
Attributes, tag name and attributes, text, xpath, css selector.
Among them, @ means attribute, = means exact match,: means fuzzy match, the string is searched by default when there is no control string.
- page.ele('@class:ele_class') - Return the first class element containing ele_class
- page.ele('@name=ele_name') - Return the first element whose name is equal to ele_name
- page.ele('@placeholder') - Return the first element with placeholder attribute
2020-08-14 00:34:16 +08:00
- page.ele('tag:p') - Return the first p element
2020-08-13 11:44:26 +08:00
- page.ele('tag:div@class:ele_class') - Return the first class div element with ele_class
- page.ele('tag:div@class=ele_class') - Return the first div element whose class is equal to ele_class
- page.ele('tag:div@text():some_text') - Returns the first div element whose text contains some_text
- page.ele('tag:div@text()=some_text') - Returns the first div element whose text is equal to some_text
- page.ele('text:some_text') - Returns the first element whose text contains some_text
- page.ele('some_text') - Return the first text element containing some_text (equivalent to the previous line)
- page.ele('text=some_text') - Returns the first element whose text is equal to some_text
- page.ele('xpath://div[@class="ele_class"]') - Return the first element that matches the xpath
- page.ele('css:div.ele_class') - Return the first element that matches the css selector
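
To make the query-string rules above concrete, here is a minimal sketch of how such strings could be translated into XPath. The helper names `query_to_xpath` and `_condition` are hypothetical and are not part of DrissionPage; the library's real parsing may differ. The sketch only mirrors the documented meaning of `@`, `=`, `:` and the prefixes.

```python
def _condition(cond: str) -> str:
    """Turn 'name=value', 'name:value' or a bare 'name' into an XPath predicate."""
    positions = [i for i in (cond.find('='), cond.find(':')) if i != -1]
    if not positions:
        return f'@{cond}'                      # bare attribute: it only has to exist
    i = min(positions)                         # the first operator splits name and value
    name, op, value = cond[:i], cond[i], cond[i + 1:]
    target = 'text()' if name == 'text()' else f'@{name}'
    if op == '=':
        return f'{target}="{value}"'           # '=' means exact match
    return f'contains({target},"{value}")'     # ':' means fuzzy (contains) match


def query_to_xpath(query: str) -> str:
    """Hypothetical translation of a README-style query string into XPath.

    Values containing double quotes are not escaped in this sketch.
    """
    if query.startswith('xpath:'):
        return query[len('xpath:'):]
    if query.startswith('css:'):
        raise ValueError('css selectors are not XPath and need separate handling')
    if query.startswith('tag:'):
        tag, sep, cond = query[len('tag:'):].partition('@')
        return f'//{tag}[{_condition(cond)}]' if sep else f'//{tag}'
    if query.startswith('@'):
        return f'//*[{_condition(query[1:])}]'
    if query.startswith('text='):
        return f'//*[text()="{query[len("text="):]}"]'
    if query.startswith('text:'):
        query = query[len('text:'):]
    return f'//*[contains(text(),"{query}")]'  # no prefix: fuzzy text search
```

Under these assumptions, `'text:some_text'` and the bare `'some_text'` produce the same expression, which matches the "equivalent to the previous line" note in the examples.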
### eles()

eles(loc_or_str: Union[tuple, str], timeout: float = None, show_errmsg: bool = False) -> List[DriverElement]

Get a list of all elements that match the query parameter. The query parameter is used in the same way as in the ele() method.

Parameter Description:

- loc_or_str - Query condition
- timeout - Timeout for the element search, valid in driver mode
- show_errmsg - Whether to raise and display exceptions when they occur

### cookies_to_session()

cookies_to_session(copy_user_agent: bool = False) -> None

Manually copy cookies from the driver to the session.

Parameter Description:

- copy_user_agent - Whether to also copy the user agent

### cookies_to_driver()

cookies_to_driver(url=None) -> None

Manually copy cookies from the session to the driver.

Parameter Description:

- url - Cookie domain or url

### post()

post(url: str, params: dict = None, data: dict = None, go_anyway: bool = False, **kwargs) -> Union[bool, None]

Navigate with a POST request. Calling this method automatically switches to session mode.

Parameter Description:

- url - Target URL
- params - URL parameters
- data - Submitted data
- go_anyway - Whether to force navigation. By default, nothing happens if the target URL is the same as the current URL.
- kwargs - Connection parameters such as headers

### download()

download(file_url: str, goal_path: str = None, rename: str = None, file_exists: str = 'rename', show_msg: bool = False, **kwargs) -> tuple

Download a file and return whether it succeeded together with a download information string. This method automatically avoids name clashes with existing files in the target path.

Parameter Description:

- file_url - File URL
- goal_path - Storage path, the default is the temporary folder specified in the ini file
- rename - New name for the file, without changing the extension
- file_exists - What to do if a file with the same name exists: 'rename', 'overwrite' or 'skip'
- show_msg - Whether to show download messages
- kwargs - Connection parameters for requests

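
The three `file_exists` choices can be pictured with a small standalone sketch. The function `resolve_target` and the `name_1.ext` renaming pattern are assumptions made for illustration only; the README does not specify how DrissionPage builds the new name.

```python
from pathlib import Path
from typing import Optional


def resolve_target(path: Path, file_exists: str = 'rename') -> Optional[Path]:
    """Decide where a download should be written (hypothetical helper).

    Returns the path to write to, or None if the download should be skipped.
    """
    if not path.exists() or file_exists == 'overwrite':
        return path                      # nothing in the way, or replace it
    if file_exists == 'skip':
        return None                      # keep the existing file untouched
    if file_exists == 'rename':
        counter = 1
        while True:                      # assumed pattern: name_1.ext, name_2.ext, ...
            candidate = path.with_name(f'{path.stem}_{counter}{path.suffix}')
            if not candidate.exists():
                return candidate
            counter += 1
    raise ValueError("file_exists must be 'rename', 'overwrite' or 'skip'")
```

With the default 'rename', an existing report.pdf would leave the original in place and direct the new download to report_1.pdf under this assumed pattern.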
The following methods and attributes only take effect in driver mode. Calling them automatically switches to driver mode.

***

### tabs_count

Returns the number of tabs.

### tab_handles

Returns the list of handles of all tabs.

### current_tab_num

Returns the serial number of the current tab.

### current_tab_handle

Returns the handle of the current tab.

### check_page()

check_page(by_requests: bool = False) -> Union[bool, None]

In d mode, check whether the web page meets expectations. The response status is checked by default, and the method can be overridden to implement targeted checks.

Parameter Description:

- by_requests - Force the check to use the built-in response

### run_script()

run_script(script: str, *args) -> Any

Execute JavaScript code.

Parameter Description:

- script - JavaScript code text
- args - Arguments passed to the script

### create_tab()

create_tab(url: str = '') -> None

Create a new tab at the end of the tab list and switch to it.

Parameter Description:

- url - URL to open in the new tab

### close_current_tab()

close_current_tab() -> None

Close the current tab.

### close_other_tabs()

close_other_tabs(num_or_handle: Union[int, str, None] = None) -> None

Close all tabs except the one passed in. The current tab is kept by default.

Parameter Description:

- num_or_handle - The serial number or handle of the tab to keep. The first serial number is 0 and the last is -1.

### to_tab()

to_tab(num_or_handle: Union[int, str] = 0) -> None

Switch to a tab.

Parameter Description:

- num_or_handle - The serial number or handle of the tab to switch to. The first serial number is 0 and the last is -1.

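
The serial-number convention (0 for the first tab, -1 for the last) is the same as Python list indexing. The sketch below shows how a `num_or_handle` argument could be resolved against a handle list such as the one returned by tab_handles. The function `resolve_tab` and the handle strings are made up for illustration and are not part of DrissionPage.

```python
from typing import List, Union


def resolve_tab(handles: List[str], num_or_handle: Union[int, str]) -> str:
    """Map a serial number or a handle string to a tab handle (hypothetical helper)."""
    if isinstance(num_or_handle, int):
        return handles[num_or_handle]    # 0 is the first tab, -1 the last
    if num_or_handle in handles:
        return num_or_handle             # already a known handle
    raise ValueError(f'unknown tab handle: {num_or_handle!r}')


# Made-up handle values standing in for real browser window handles
handles = ['CDwindow-A', 'CDwindow-B', 'CDwindow-C']
first = resolve_tab(handles, 0)
last = resolve_tab(handles, -1)
```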
### to_iframe()

to_iframe(loc_or_ele: Union[int, str, tuple, WebElement, DriverElement] = 'main') -> None

Switch to an iframe. Compatible with selenium's native parameters.

Parameter Description:

- loc_or_ele - Condition used to find the iframe element. Accepts an iframe serial number (starting at 0), an id or name, a query string, a loc tuple, a WebElement object or a DriverElement object. Pass 'main' to switch to the top level, or 'parent' to switch to the parent level.

Examples:

- to_iframe('tag:iframe') - Locate the iframe by a query string
- to_iframe('iframe_id') - Locate the iframe by its id attribute
- to_iframe('iframe_name') - Locate the iframe by its name attribute
- to_iframe(iframe_element) - Locate the iframe by passing in an element object
- to_iframe(0) - Locate the iframe by its serial number
- to_iframe('main') - Switch to the top level
- to_iframe('parent') - Switch to the parent level

### scroll_to_see()

scroll_to_see(loc_or_ele: Union[str, tuple, WebElement, DriverElement]) -> None

Scroll until the element is visible.

Parameter Description:

- loc_or_ele - Search condition for the element, used in the same way as in the ele() method

### scroll_to()

scroll_to(mode: str = 'bottom', pixel: int = 300) -> None

Scroll the page. How it scrolls is determined by the parameters.

Parameter Description:

- mode - Scrolling direction: 'top', 'bottom', 'rightmost', 'leftmost', 'up', 'down', 'left' or 'right'
- pixel - Number of pixels to scroll

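
One plausible way to picture the eight modes is as a lookup from mode name to a line of JavaScript that could be handed to run_script(). The mapping below is an assumption for illustration: `scroll_script` is a hypothetical helper, and the README does not state which JavaScript DrissionPage actually runs.

```python
def scroll_script(mode: str = 'bottom', pixel: int = 300) -> str:
    """Build a JavaScript snippet for a scroll mode (hypothetical mapping)."""
    scripts = {
        # Absolute jumps to a page edge; the other axis keeps its current offset
        'top': 'window.scrollTo(window.pageXOffset, 0);',
        'bottom': 'window.scrollTo(window.pageXOffset, document.body.scrollHeight);',
        'leftmost': 'window.scrollTo(0, window.pageYOffset);',
        'rightmost': 'window.scrollTo(document.body.scrollWidth, window.pageYOffset);',
        # Relative moves by a number of pixels
        'up': f'window.scrollBy(0, -{pixel});',
        'down': f'window.scrollBy(0, {pixel});',
        'left': f'window.scrollBy(-{pixel}, 0);',
        'right': f'window.scrollBy({pixel}, 0);',
    }
    if mode not in scripts:
        raise ValueError(f'unsupported scroll mode: {mode!r}')
    return scripts[mode]
```

In this sketch the pixel argument only matters for the four relative modes, which is consistent with the edge modes needing no distance.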
### refresh()

refresh() -> None

Refresh the page.

### back()

back() -> None

Go back one page.

### set_window_size()

set_window_size(x: int = None, y: int = None) -> None

Set the window size. The window is maximized by default.

Parameter Description:

- x - Target width
- y - Target height

### screenshot()

screenshot(path: str, filename: str = None) -> str

Take a screenshot of the web page and return the path of the screenshot file.

Parameter Description:

- path - Path to save the screenshot to, the default is the temporary folder specified in the ini file
- filename - Screenshot file name, the default is the page title

### process_alert()

process_alert(mode: str = 'ok', text: str = None) -> Union[str, None]

Handle alert, confirm and prompt boxes.

Parameter Description:

- mode - 'ok' or 'cancel'. If any other value is passed, no button is pressed but the text value is still returned.
- text - Text to enter when handling a prompt box

### chrome_downloading()

chrome_downloading(download_path: str = None) -> list

Check whether the browser is still downloading.

Parameter Description:

- download_path - Download path, the default is the download path in the chrome options configuration

### close_driver()

close_driver() -> None

Close the driver and the browser, and switch to s mode.

### close_session()

close_session() -> None

Close the session and switch to d mode.

## DriverElement class

class DriverElement(ele: WebElement, timeout: float = 10)

The element object of driver mode. It wraps a WebElement object and encapsulates common functions.

Parameter Description:

- ele - WebElement object
- timeout - Timeout for element searches (can also be set separately for each search)

### inner_ele

The wrapped WebElement object.

### driver

The WebDriver object of the element.

### attrs

Returns all attributes and values of the element as a dictionary.

### text

Returns the text inside the element.

### html

Returns the html text inside the element.

### tag

Returns the tag name of the element.

### xpath

Returns the xpath of the element.

### parent

Returns the parent element object.

### next

Returns the next sibling element object.

### prev

Returns the previous sibling element object.

### parents()

parents(num: int = 1) -> Union[DriverElement, None]

Returns the Nth-level parent element object.

Parameter Description:

- num - How many levels up the parent element is

### nexts()

nexts(num: int = 1) -> Union[DriverElement, None]

Returns the Nth following sibling element object.

Parameter Description:

- num - How many siblings forward to go

### prevs()

prevs(num: int = 1) -> Union[DriverElement, None]

Returns the Nth preceding sibling element object.

Parameter Description:

- num - How many siblings backward to go

### size

Returns the element size as a dictionary.

### location

Returns the element coordinates as a dictionary.

### ele()

ele(loc_or_str: Union[tuple, str], mode: str = None, show_errmsg: bool = False, timeout: float = None) -> Union[DriverElement, List[DriverElement], None]

Get elements based on the query parameter.
If the query parameter is a string, you can use the '@property name:', 'tag:', 'text:', 'css:' and 'xpath:' prefixes. Without a prefix, the string is searched as text by default.
If it is a loc tuple, the query is run directly with its content.

Parameter Description:

- loc_or_str - Query condition
- mode - Find one or more, pass in 'single' or 'all'
- show_errmsg - Whether to raise and display exceptions when they occur
- timeout - Timeout for the element search

Examples:

- Find with a loc tuple:
  - ele.ele((By.CLASS_NAME, 'ele_class')) - Returns the first child element whose class is ele_class
- Find with a query string (by attribute, tag name and attribute, text, xpath or css selector). @ denotes an attribute, = an exact match, and : a fuzzy (contains) match. Without a control prefix, the string is searched as text by default.
  - page.ele('@class:ele_class') - Returns the first element whose class contains ele_class
  - page.ele('@name=ele_name') - Returns the first element whose name is equal to ele_name
  - page.ele('@placeholder') - Returns the first element with a placeholder attribute
  - page.ele('tag:p') - Returns the first p element
  - page.ele('tag:div@class:ele_class') - Returns the first div element whose class contains ele_class
  - page.ele('tag:div@class=ele_class') - Returns the first div element whose class is equal to ele_class
  - page.ele('tag:div@text():some_text') - Returns the first div element whose text contains some_text
  - page.ele('tag:div@text()=some_text') - Returns the first div element whose text is equal to some_text
  - page.ele('text:some_text') - Returns the first element whose text contains some_text
  - page.ele('some_text') - Returns the first element whose text contains some_text (equivalent to the previous line)
  - page.ele('text=some_text') - Returns the first element whose text is equal to some_text
  - page.ele('xpath://div[@class="ele_class"]') - Returns the first element that matches the xpath
  - page.ele('css:div.ele_class') - Returns the first element that matches the css selector

### eles()

eles(loc_or_str: Union[tuple, str], show_errmsg: bool = False, timeout: float = None) -> List[DriverElement]

Get a list of all elements that match the query parameter. The query parameter is used in the same way as in the ele() method.

Parameter Description:

- loc_or_str - Query condition
- show_errmsg - Whether to raise and display exceptions when they occur
- timeout - Timeout for the element search

### attr()

attr(attr: str) -> str

Get the value of an attribute of the element.

Parameter Description:

- attr - Attribute name

### click()

click(by_js=None) -> bool

Click the element. If the ordinary click fails, fall back to clicking with js. You can also specify explicitly whether to click with js.

Parameter Description:

- by_js - Whether to click with js

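
The fallback behaviour can be sketched with two plain callables standing in for the ordinary click and the js click. The interpretation of the three `by_js` values (None tries the ordinary click and then js, True uses js only, False never uses js) is an assumption based on the description above, and `click_with_fallback` is a hypothetical name, not the library's code.

```python
from typing import Callable, Optional


def click_with_fallback(native_click: Callable[[], None],
                        js_click: Callable[[], None],
                        by_js: Optional[bool] = None) -> bool:
    """Try an ordinary click, optionally falling back to a js click (sketch)."""
    if not by_js:                        # None or False: try the ordinary click first
        try:
            native_click()
            return True
        except Exception:
            pass                         # whether we retry depends on by_js below
    if by_js is not False:               # None after a failure, or True: click with js
        js_click()
        return True
    return False                         # by_js=False and the ordinary click failed
```

Passing callables keeps the sketch independent of any browser; in real use they would wrap the element's own click and a script call.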
### input()

input(value, clear: bool = True) -> bool

Enter text.

Parameter Description:

- value - Text value
- clear - Whether to clear the text box before entering

### run_script()

run_script(script: str, *args) -> Any

Execute js code, passing the element itself as the first argument.

Parameter Description:

- script - JavaScript text
- args - Additional arguments passed to the script

### submit()

submit() -> None

Submit the form.

### clear()

clear() -> None

Clear the text box.

### is_selected()

is_selected() -> bool

Whether the element is selected.

### is_enabled()

is_enabled() -> bool

Whether the element is enabled on the page.

### is_displayed()

is_displayed() -> bool

Whether the element is visible.

### is_valid()

is_valid() -> bool

Whether the element is still valid. Use this to detect elements that can no longer be used because the page has navigated away.

### screenshot()

screenshot(path: str, filename: str = None) -> str

Take a screenshot of the element and return the path of the screenshot file.

Parameter Description:

- path - Path to save the screenshot to, the default is the temporary folder specified in the ini file
- filename - Screenshot file name, the default is the page title

### select()

select(text: str) -> bool

Choose an option from a drop-down list.

Parameter Description:

- text - Option text

### set_attr()

set_attr(attr: str, value: str) -> bool

Set an attribute of the element.

Parameter Description:

- attr - Attribute name
- value - Attribute value

### drag()

drag(x: int, y: int, speed: int = 40, shake: bool = True) -> bool

Drag the current element a certain distance and return whether the drag succeeded.

Parameter Description:

- x - Drag distance in the x direction
- y - Drag distance in the y direction
- speed - Drag speed
- shake - Whether to add random jitter

### drag_to()

drag_to(ele_or_loc: Union[tuple, WebElement, DrissionElement], speed: int = 40, shake: bool = True) -> bool

Drag the current element to another element or to a coordinate tuple, and return whether the drag succeeded.

Parameter Description:

- ele_or_loc - Another element, or a position relative to the current one. Coordinates refer to the midpoint of the element.
- speed - Drag speed
- shake - Whether to add random jitter

### hover()

hover()

Hover over the element.

## SessionElement class

class SessionElement(ele: Element)

The element object of session mode. It wraps an Element object and encapsulates common functions.

Parameter Description:

- ele - Element object of the requests_html library

### inner_ele

The wrapped Element object.

### attrs

Returns the names and values of all attributes of the element as a dictionary.

### text

Returns the text inside the element.

### html

Returns the html text inside the element.

### tag

Returns the tag name of the element.

### xpath

Returns the xpath of the element.

### parent

Returns the parent element object.

### next

Returns the next sibling element object.

### prev

Returns the previous sibling element object.

### parents()

parents(num: int = 1) -> Union[SessionElement, None]

Returns the Nth-level parent element object.

Parameter Description:

- num - How many levels up the parent element is

### nexts()

nexts(num: int = 1) -> Union[SessionElement, None]

Returns the Nth following sibling element object.

Parameter Description:

- num - How many siblings forward to go

### prevs()

prevs(num: int = 1) -> Union[SessionElement, None]

Returns the Nth preceding sibling element object.

Parameter Description:

- num - How many siblings backward to go

### ele()

ele(loc_or_str: Union[tuple, str], mode: str = None, show_errmsg: bool = False) -> Union[SessionElement, List[SessionElement], None]

Get elements based on the query parameter.
If the query parameter is a string, you can use the '@property name:', 'tag:', 'text:', 'css:' and 'xpath:' prefixes. Without a prefix, the string is searched as text by default.
If it is a loc tuple, the query is run directly with its content.

Parameter Description:

- loc_or_str - Query condition
- mode - Find one or more, pass in 'single' or 'all'
- show_errmsg - Whether to raise and display exceptions when they occur

Examples:

- Find with a loc tuple:
  - ele.ele((By.CLASS_NAME, 'ele_class')) - Returns the first child element whose class is ele_class
- Find with a query string (by attribute, tag name and attribute, text, xpath or css selector). @ denotes an attribute, = an exact match, and : a fuzzy (contains) match. Without a control prefix, the string is searched as text by default.
  - page.ele('@class:ele_class') - Returns the first element whose class contains ele_class
  - page.ele('@name=ele_name') - Returns the first element whose name is equal to ele_name
  - page.ele('@placeholder') - Returns the first element with a placeholder attribute
  - page.ele('tag:p') - Returns the first p element
  - page.ele('tag:div@class:ele_class') - Returns the first div element whose class contains ele_class
  - page.ele('tag:div@class=ele_class') - Returns the first div element whose class is equal to ele_class
  - page.ele('tag:div@text():some_text') - Returns the first div element whose text contains some_text
  - page.ele('tag:div@text()=some_text') - Returns the first div element whose text is equal to some_text
  - page.ele('text:some_text') - Returns the first element whose text contains some_text
  - page.ele('some_text') - Returns the first element whose text contains some_text (equivalent to the previous line)
  - page.ele('text=some_text') - Returns the first element whose text is equal to some_text
  - page.ele('xpath://div[@class="ele_class"]') - Returns the first element that matches the xpath
  - page.ele('css:div.ele_class') - Returns the first element that matches the css selector

### eles()

eles(loc_or_str: Union[tuple, str], show_errmsg: bool = False) -> List[SessionElement]

Get a list of all elements that match the query parameter. The query parameter is used in the same way as in the ele() method.

Parameter Description:

- loc_or_str - Query condition
- show_errmsg - Whether to raise and display exceptions when they occur

### attr()

attr(attr: str) -> str

Get the value of an attribute of the element.

Parameter Description:

- attr - Attribute name

## OptionsManager class

class OptionsManager(path: str = None)

The class that manages the content of the configuration file.

Parameter Description:

- path - Path of the ini file. If not passed in, the configs.ini file in the current folder is read by default.

### get_value()

get_value(section: str, item: str) -> Any

Get a configured value.

Parameter Description:

- section - Section name
- item - Configuration item name

### get_option()

get_option(section: str) -> dict

Returns the configuration of an entire section as a dictionary.

Parameter Description:

- section - Section name

### set_item()

set_item(section: str, item: str, value: str) -> OptionsManager

Set a configuration value.

Parameter Description:

- section - Section name
- item - Configuration item name
- value - The value to set

### save()

save(path: str = None) -> OptionsManager

Save the settings to a file.

Parameter Description:

- path - The path of the ini file, which is saved to the module folder by default

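
The section and item vocabulary used by these methods is that of ordinary ini files. As a rough picture, the sketch below reproduces get_value(), get_option() and set_item() on top of Python's standard configparser module. The section name, item names and paths in the sample are made up, and nothing here is DrissionPage's actual implementation.

```python
from configparser import ConfigParser

# Made-up ini content; real configs.ini sections and items may differ
SAMPLE_INI = """
[paths]
chromedriver_path = /opt/chromedriver
tmp_path = /tmp/drission
"""

parser = ConfigParser()
parser.read_string(SAMPLE_INI)


def get_value(section: str, item: str) -> str:
    """Return a single configured value."""
    return parser.get(section, item)


def get_option(section: str) -> dict:
    """Return every item of one section as a dictionary."""
    return dict(parser.items(section))


def set_item(section: str, item: str, value: str) -> None:
    """Change a value in memory; writing it back would be a separate save step."""
    parser.set(section, item, value)


set_item('paths', 'tmp_path', '/var/tmp/drission')
```

Keeping changes in memory until an explicit save mirrors the separate save() method documented above.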
## DriverOptions class

class DriverOptions(read_file=True)

The chrome browser configuration class. It inherits from the Options class of selenium.webdriver.chrome.options and adds methods to delete configuration and to save it to a file.

Parameter Description:

- read_file - Boolean, specifies whether to read configuration information from the ini file on creation

### driver_path

Path of chromedriver.exe.

### chrome_path

Path of chrome.exe.

### remove_argument()

remove_argument(value: str) -> DriverOptions

Remove a setting.

Parameter Description:

- value - The argument to be removed

### remove_experimental_option()

remove_experimental_option(key: str) -> DriverOptions

Remove an experimental setting by deleting the key passed in.

Parameter Description:

- key - The key of the experimental setting to be removed

### remove_all_extensions()

remove_all_extensions() -> DriverOptions

Remove all extensions. Because extensions are stored as whole files, it is difficult to remove a single one, so if you need to change them, remove them all and set them again.

### save()

save(path: str = None) -> DriverOptions

Save the settings to a file.

Parameter Description:

- path - The path of the ini file, which is saved to the module folder by default

### set_argument()

set_argument(arg: str, value: Union[bool, str]) -> DriverOptions

Set a chrome argument. An argument that takes no value can be switched on or off, and an argument that takes a value can have its value set.

Parameter Description:

- arg - Argument name
- value - Argument value. Pass the value for an argument that takes one, or a bool for an argument that does not.

### set_headless()

set_headless(on_off: bool = True) -> DriverOptions

Turn headless mode on or off.

Parameter Description:

- on_off - On or off, bool

### set_no_imgs()

set_no_imgs(on_off: bool = True) -> DriverOptions

Turn no-image mode on or off.

Parameter Description:

- on_off - On or off, bool

### set_no_js()

set_no_js(on_off: bool = True) -> DriverOptions

Whether to disable js.

Parameter Description:

- on_off - On or off, bool

### set_mute()

set_mute(on_off: bool = True) -> DriverOptions

Whether to mute.

Parameter Description:

- on_off - On or off, bool

### set_user_agent()

set_user_agent(user_agent: str) -> DriverOptions

Set the browser user agent.

Parameter Description:

- user_agent - User agent string

### set_proxy()

set_proxy(proxy: str) -> DriverOptions

Set up a proxy.

Parameter Description:

- proxy - Proxy address

### set_paths()

set_paths(driver_path: str = None, chrome_path: str = None, debugger_address: str = None, download_path: str = None, user_data_path: str = None, cache_path: str = None) -> DriverOptions

Set browser-related paths.

Parameter Description:

- driver_path - Path of chromedriver.exe
- chrome_path - Path of chrome.exe
- debugger_address - Debug browser address, for example: 127.0.0.1:9222
- download_path - File download path
- user_data_path - User data path
- cache_path - Cache path

2020-07-20 16:58:05 +08:00
2020-05-25 14:42:44 +08:00
2020-06-05 23:03:13 +08:00
## easy_set methods
2020-05-25 14:42:44 +08:00
2020-06-05 18:36:37 +08:00
The configuration of chrome is too difficult to remember, so the commonly used configuration is written as a simple method, and the call will modify the relevant content of the ini file.

### set_paths()

set_paths(driver_path: str = None, chrome_path: str = None, debugger_address: str = None, global_tmp_path: str = None, download_path: str = None, user_data_path: str = None, cache_path: str = None, check_version: bool = True) -> None

A convenient way to set paths. It saves the paths passed in to the default ini file and checks whether the Chrome and chromedriver versions match.

Parameter Description:

- driver_path - Path of chromedriver.exe
- chrome_path - Path of chrome.exe
- debugger_address - Debug browser address, e.g. 127.0.0.1:9222
- global_tmp_path - Temporary folder path
- download_path - File download path
- user_data_path - User data path
- cache_path - Cache path
- check_version - Whether to check that the chromedriver and Chrome versions match
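
As a rough illustration of saving the supplied paths to an ini file, the sketch below merges only the values that were passed into a file using Python's `configparser`. The helper name `save_paths`, the section name `paths`, and the file location are all hypothetical. The ini layout that DrissionPage actually uses may differ.

```python
import configparser
import os
import tempfile


def save_paths(ini_path: str, **paths) -> None:
    """Hypothetical helper: merge the given paths into an ini file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(ini_path, encoding="utf-8")  # a missing file is simply skipped
    if not parser.has_section("paths"):      # section name is an assumption
        parser.add_section("paths")
    for key, value in paths.items():
        if value is not None:                # None means "leave unchanged"
            parser.set("paths", key, value)
    with open(ini_path, "w", encoding="utf-8") as f:
        parser.write(f)


ini_file = os.path.join(tempfile.mkdtemp(), "configs.ini")
save_paths(ini_file, chrome_path="/opt/chrome/chrome", download_path=None)
save_paths(ini_file, debugger_address="127.0.0.1:9222")

check = configparser.ConfigParser(interpolation=None)
check.read(ini_file, encoding="utf-8")
print(dict(check["paths"]))
```

The second call adds a new key without discarding the one written by the first, which is the behaviour one would expect from a function whose parameters are all optional.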

### set_argument()

set_argument(arg: str, value: Union[bool, str]) -> None

Set a startup argument. If the argument takes no value (such as 'zh_CN.utf-8'), pass a bool as value to switch it on or off. Otherwise, pass a str as value. When value is "" or False, the argument entry is deleted.

Parameter Description:

- arg - Argument name
- value - Argument value: pass a str if the argument takes a value, or a bool if it does not
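
The rule above can be modelled in a few lines. This is a minimal sketch, assuming the arguments are kept as a plain list of strings; the function `apply_argument` is hypothetical and is not how the library stores them in the ini file.

```python
from typing import List, Union


def apply_argument(arguments: List[str], arg: str,
                   value: Union[bool, str]) -> List[str]:
    """Hypothetical model of the rule above, applied to a list of arguments."""
    # Drop any existing entry for this argument, with or without "=value".
    kept = [a for a in arguments if a != arg and not a.startswith(arg + "=")]
    if value is True:
        kept.append(arg)                 # value-less switch turned on
    elif isinstance(value, str) and value:
        kept.append(f"{arg}={value}")    # argument that carries a value
    # A value of "" or False leaves the entry removed.
    return kept


args = ["--headless", "--window-size=800,600"]
args = apply_argument(args, "--mute-audio", True)         # switch on
args = apply_argument(args, "--window-size", "1280,720")  # replace the value
args = apply_argument(args, "--headless", False)          # delete the entry
print(args)  # ['--mute-audio', '--window-size=1280,720']
```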

### set_headless()

set_headless(on_off: bool) -> None

Turn headless mode on or off.

Parameter Description:

- on_off - Whether to enable headless mode

### set_no_imgs()

set_no_imgs(on_off: bool) -> None

Turn no-image mode (images are not loaded) on or off.

Parameter Description:

- on_off - Whether to enable no-image mode

### set_no_js()

set_no_js(on_off: bool) -> None

Turn no-JS mode (JavaScript is disabled) on or off.

Parameter Description:

- on_off - Whether to enable no-JS mode

### set_mute()

set_mute(on_off: bool) -> None

Turn mute (silent) mode on or off.

Parameter Description:

- on_off - Whether to enable mute mode

### set_user_agent()

set_user_agent(user_agent: str) -> None

Set the user agent.

Parameter Description:

- user_agent - User agent string

### set_proxy()

set_proxy(proxy: str) -> None

Set the proxy.

Parameter Description:

- proxy - Proxy address

### check_driver_version()

check_driver_version(driver_path: str = None, chrome_path: str = None) -> bool

Check whether the Chrome and chromedriver versions match.

Parameter Description:

- driver_path - Path of chromedriver.exe
- chrome_path - Path of chrome.exe
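
This README does not state how the match is decided. A common convention is that Chrome and chromedriver are compatible when their major version numbers are equal, and the sketch below assumes that rule. The helper names and the version strings are illustrative only, and obtaining the version strings from the two executables is left out.

```python
def major_version(version: str) -> int:
    """Return the leading number of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version: str, driver_version: str) -> bool:
    """Assumed rule: the two match when their major versions are equal."""
    return major_version(chrome_version) == major_version(driver_version)


print(versions_match("84.0.4147.105", "84.0.4147.30"))  # True
print(versions_match("84.0.4147.105", "83.0.4103.39"))  # False
```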